pax_global_header00006660000000000000000000000064145030726600014515gustar00rootroot0000000000000052 comment=1949b1621a802ffb1492616adbae6154bfbe64ef
clr-rocm-5.7.1/000077500000000000000000000000001450307266000132455ustar00rootroot00000000000000clr-rocm-5.7.1/.gitignore000066400000000000000000000005661450307266000152440ustar00rootroot00000000000000
# Prerequisites
*.d

# Compiled Object files
*.slo
*.lo
*.o
*.obj

# Precompiled Headers
*.gch
*.pch

# Compiled Dynamic libraries
*.so
*.dylib
*.dll

# Fortran module files
*.mod
*.smod

# Compiled Static libraries
*.lai
*.la
*.a
*.lib

# Executables
*.exe
*.out
*.app

# Directories
build/
release/
debug/
packages/
install/
.vs/
.vscode/

# Editor temp files
*.swp
*.swo
clr-rocm-5.7.1/CHANGELOG.md000066400000000000000000000116721450307266000150650ustar00rootroot00000000000000
# Change Log for HIP

Full documentation for HIP is available at [docs.amd.com](https://docs.amd.com/)

## HIP 5.7.1 (For ROCm 5.7.1)

### Fixed
- The hipPointerGetAttributes API returns the correct HIP memory type, hipMemoryTypeManaged, for managed memory.

## HIP 5.7 (For ROCm 5.7)

### Optimizations

### Added
- Added meta_group_size/rank for getting the number of tiles and the rank of a tile in the partition
- Added new APIs supporting Windows only, under development on Linux
    - hipMallocMipmappedArray for allocating a mipmapped array on the device
    - hipFreeMipmappedArray for freeing a mipmapped array on the device
    - hipGetMipmappedArrayLevel for getting a mipmap level of a HIP mipmapped array
    - hipMipmappedArrayCreate for creating a mipmapped array
    - hipMipmappedArrayDestroy for destroying a mipmapped array
    - hipMipmappedArrayGetLevel for getting a mipmapped array at a given mipmap level

### Changed

### Fixed

### Known Issues
- HIP memory type enum values currently don't support an equivalent of cudaMemoryTypeUnregistered, due to HIP functionality backward compatibility.
- The HIP API hipPointerGetAttributes can return an invalid value if the input memory pointer was not allocated through any HIP API on device or host.
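
A minimal sketch of the 5.7.1 hipPointerGetAttributes behavior described above (illustrative only; assumes a ROCm 5.7.1 install):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
  void* p = nullptr;
  if (hipMallocManaged(&p, 4096) != hipSuccess) return 1;

  hipPointerAttribute_t attr;
  if (hipPointerGetAttributes(&attr, p) == hipSuccess) {
    // As of 5.7.1 this reports hipMemoryTypeManaged for managed allocations.
    // The field is still named 'memoryType' in 5.7; ROCm 6.0 renames it to
    // 'type' (see the upcoming changes below).
    printf("memory type: %d (managed == %d)\n",
           (int)attr.memoryType, (int)hipMemoryTypeManaged);
  }

  // Known issue: passing a pointer not allocated through any HIP API may
  // yield an invalid value, since there is no cudaMemoryTypeUnregistered
  // equivalent yet.
  hipFree(p);
  return 0;
}
```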
### Upcoming changes in ROCm 6.0 release
- Removal of gcnarch from hipDeviceProp_t structure
- Addition of new fields in hipDeviceProp_t structure
    - maxTexture1D
    - maxTexture2D
    - maxTexture1DLayered
    - maxTexture2DLayered
    - sharedMemPerMultiprocessor
    - deviceOverlap
    - asyncEngineCount
    - surfaceAlignment
    - unifiedAddressing
    - computePreemptionSupported
    - hostRegisterSupported
    - uuid
- Removal of deprecated code
    - hip-hcc codes from hip code tree
- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
- HIPMEMCPY_3D fields correction to avoid truncation of "size_t" to "unsigned int" inside hipMemcpy3D()
- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'
- Correct hipGetLastError to return the last error instead of the last API call's return code
- Update hipExternalSemaphoreHandleDesc to add "unsigned int reserved[16]"
- Correct handling of flag values in hipIpcOpenMemHandle for hipIpcMemLazyEnablePeerAccess
- Remove hiparray* and make it opaque with hipArray_t

## HIP 5.6.1 (For ROCm 5.6.1)

### Fixed
- Enabled the xnack+ check in HIP catch2 tests to fix a hang during test execution
- Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
- Fixed a crash happening while using hipGraphAddMemFreeNode

## HIP 5.6 (For ROCm 5.6)

### Optimizations
- Consolidation of hipamd, rocclr and OpenCL projects in clr
- Optimized lock for graph global capture mode

### Added
- Added hipRTC support for amd_hip_fp16
- Added hipStreamGetDevice implementation to get the device associated with the stream
- Added HIP_AD_FORMAT_SIGNED_INT16 in hipArray formats
- hipArrayGetInfo for getting information about the specified array
- hipArrayGetDescriptor for getting a 1D or 2D array descriptor
- hipArray3DGetDescriptor to get a 3D array descriptor

### Changed
- hipMallocAsync to return success for zero size allocation to match hipMalloc
- Separation of hipcc perl binaries from the HIP project to the hipcc project. The hip-devel package depends on the newly added hipcc package
- Consolidation of hipamd, ROCclr, and OpenCL repositories into a single repository called clr. Instructions are updated to build HIP from sources in the HIP Installation guide
- Removed hipBusBandwidth and hipCommander samples from hip-tests

### Fixed
- Fixed regression in hipMemCpyParam3D when offset is applied

### Known Issues
- Limited testing on xnack+ configuration
- Multiple HIP test failures (gpuvm fault or hangs)
- hipSetDevice and hipSetDeviceFlags APIs return hipErrorInvalidDevice instead of hipErrorNoDevice, on a system without a GPU
- Known memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs.
Issue will be fixed in future release ### Upcoming changes in future release - Removal of gcnarch from hipDeviceProp_t structure - Addition of new fields in hipDeviceProp_t structure - maxTexture1D - maxTexture2D - maxTexture1DLayered - maxTexture2DLayered - sharedMemPerMultiprocessor - deviceOverlap - asyncEngineCount - surfaceAlignment - unifiedAddressing - computePreemptionSupported - hostRegisterSupported - uuid - Removal of deprecated code -hip-hcc codes from hip code tree - Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA - HIPMEMCPY_3D fields correction to avoid truncation of "size_t" to "unsigned int" inside hipMemcpy3D() - Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type' - Correct hipGetLastError to return the last error instead of last API call's return code - Update hipExternalSemaphoreHandleDesc to add "unsigned int reserved[16]" - Correct handling of flag values in hipIpcOpenMemHandle for hipIpcMemLazyEnablePeerAccess - Remove hiparray* and make it opaque with hipArray_t clr-rocm-5.7.1/CMakeLists.txt000066400000000000000000000061421450307266000160100ustar00rootroot00000000000000# Copyright (c) 2022 - 2023 Advanced Micro Devices, Inc. All Rights Reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. cmake_minimum_required(VERSION 3.16.8) project(clr) ########## # Defaults ########## if (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC") add_compile_options("/wd4267" "/wd4244" "/wd4996") string(REPLACE "/GR" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS}) string(REPLACE "/W3" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS}) endif() option(CLR_BUILD_HIP "Build HIP" OFF) option(CLR_BUILD_OCL "Build OCL" OFF) # Set default build type if(NOT CMAKE_BUILD_TYPE) set(CMAKE_BUILD_TYPE "Release") endif() ############# # Build steps ############# if(CLR_BUILD_HIP) # Set default HIPCC_BIN_DIR to /opt/rocm/bin if(NOT DEFINED HIPCC_BIN_DIR AND UNIX) set(HIPCC_BIN_DIR "/opt/rocm/bin" CACHE STRING "Default hipcc directory on linux.") endif() message(STATUS "HIPCC Binary Directory: ${HIPCC_BIN_DIR}") if(NOT EXISTS ${HIPCC_BIN_DIR}/hipconfig) message(FATAL_ERROR "Please pass hipcc/build or hipcc/bin using -DHIPCC_BIN_DIR.") endif() message(STATUS "HIP Common Directory: ${HIP_COMMON_DIR}") if(NOT DEFINED HIP_COMMON_DIR) message(FATAL_ERROR "Please pass HIP using -DHIP_COMMON_DIR. 
HIP_COMMON_DIR is incorrect")
  endif()

  # Determine HIP_PLATFORM
  set(__HIPCONFIG_EXECUTABLE__ ${HIPCC_BIN_DIR}/hipconfig)
  if(NOT DEFINED HIP_PLATFORM)
    if(NOT DEFINED ENV{HIP_PLATFORM})
      execute_process(COMMAND ${__HIPCONFIG_EXECUTABLE__} --platform
        OUTPUT_VARIABLE HIP_PLATFORM
        OUTPUT_STRIP_TRAILING_WHITESPACE)
    else()
      set(HIP_PLATFORM $ENV{HIP_PLATFORM} CACHE STRING "HIP Platform")
    endif()
  endif()
endif()

if((CLR_BUILD_HIP AND HIP_PLATFORM STREQUAL "amd") OR CLR_BUILD_OCL)
  add_subdirectory(rocclr)
elseif(HIP_PLATFORM STREQUAL "amd")
  message(FATAL_ERROR "Please enable building of one or more of the below runtimes:\n- HIP (-DCLR_BUILD_HIP=ON)\n- OpenCL (-DCLR_BUILD_OCL=ON)")
endif()

if(CLR_BUILD_HIP)
  add_subdirectory(hipamd)
endif()

if(CLR_BUILD_OCL)
  add_subdirectory(opencl)
endif()
clr-rocm-5.7.1/LICENCE000066400000000000000000000020671450307266000142370ustar00rootroot00000000000000
Copyright (c) 2008 - 2023 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
clr-rocm-5.7.1/README.md000066400000000000000000000066251450307266000145330ustar00rootroot00000000000000
# AMD CLR - Compute Language Runtimes

AMD Common Language Runtime contains source code for AMD's compute language runtimes: `HIP` and `OpenCL™`.

## Project Organisation

- `hipamd` - contains the implementation of the `HIP` language on the AMD platform. Previously this was hosted at [ROCm-Developer-Tools/hipamd](https://github.com/ROCm-Developer-Tools/hipamd)
- `opencl` - contains the implementation of [OpenCL™](https://www.khronos.org/opencl/) on the AMD platform. Previously this was hosted at [RadeonOpenCompute/ROCm-OpenCL-Runtime](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime)
- `rocclr` - contains the common runtime used in `HIP` and `OpenCL™`. Previously this was hosted at [ROCm-Developer-Tools/ROCclr](https://github.com/ROCm-Developer-Tools/ROCclr)

## How to build/install

### Prerequisites

Please refer to the Quick Start Guide in [ROCm Docs](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html). Building clr requires the `rocm-hip-libraries` meta package, which provides the prerequisites for clr.

### Linux

- Clone this repo
- `cd clr && mkdir build && cd build`
- For HIP : `cmake .. -DCLR_BUILD_HIP=ON -DHIP_COMMON_DIR=$HIP_COMMON_DIR`
   - `HIP_COMMON_DIR` points to [HIP](https://github.com/ROCm-Developer-Tools/HIP)
   - `HIPCC_BIN_DIR` points to [HIPCC](https://github.com/ROCm-Developer-Tools/HIPCC)'s bin folder. If not provided, it defaults to `/opt/rocm/bin`.
- For OpenCL™ : `cmake ..
-DCLR_BUILD_OCL=ON`
- `make` : to build
- `make install` : to install

Users can also build `OCL` and `HIP` at the same time by passing `-DCLR_BUILD_HIP=ON -DCLR_BUILD_OCL=ON` to the configure command.

## Disclaimer

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

© 2023 Advanced Micro Devices, Inc. All Rights Reserved.

OpenCL™ is a registered trademark of Apple.
clr-rocm-5.7.1/hipamd/000077500000000000000000000000001450307266000145075ustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/.clang-format000066400000000000000000000004031450307266000170570ustar00rootroot00000000000000
Language: Cpp
BasedOnStyle: Google
AlignEscapedNewlinesLeft: false
AlignOperands: false
ColumnLimit: 100
AlwaysBreakTemplateDeclarations: false
DerivePointerAlignment: false
IndentFunctionDeclarationAfterType: false
MaxEmptyLinesToKeep: 2
SortIncludes: false
clr-rocm-5.7.1/hipamd/.gitattributes000066400000000000000000000011601450307266000174000ustar00rootroot00000000000000
# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto

# Explicitly declare text files you want to always be normalized and converted
# to have LF line endings on checkout.
*.c text eol=lf *.cpp text eol=lf *.cc text eol=lf *.h text eol=lf *.hpp text eol=lf *.txt text eol=lf # Define files to support auto-remove trailing white space # Need to run the command below, before add modified file(s) to the staging area # git config filter.trimspace.clean 'sed -e "s/[[:space:]]*$//g"' *.cpp filter=trimspace *.c filter=trimspace *.h filter=trimspacecpp *.hpp filter=trimspace *.md filter=trimspaceclr-rocm-5.7.1/hipamd/.gitignore000066400000000000000000000004041450307266000164750ustar00rootroot00000000000000.* !.gitignore *.o *.exe *.swp lib packages build bin/hipInfo bin/hipBusBandwidth bin/hipDispatchLatency bin/hipify-clang tags samples/1_Utils/hipInfo/hipInfo samples/1_Utils/hipBusBandwidth/hipBusBandwidth samples/1_Utils/hipDispatchLatency/hipDispatchLatencyclr-rocm-5.7.1/hipamd/CMakeLists.txt000077500000000000000000000535631450307266000172660ustar00rootroot00000000000000# Copyright (c) 2016 - 2021 Advanced Micro Devices, Inc. All Rights Reserved. # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. cmake_minimum_required(VERSION 3.16.8) project(hip) include(GNUInstallDirs) # sample command for hip-rocclr runtime, you'll need to have rocclr built # ROCM_PATH is the path where ROCM is installed # For shared lib of hip-rocclr runtime # For release version # cmake -DHIP_COMMON_DIR="$HIP_DIR" -DHIPCC_BIN_DIR="$HIPCC_DIR/bin" -DAMD_OPENCL_PATH=$OPENCL_DIR -DROCCLR_PATH=$ROCCLR_DIR -DCMAKE_PREFIX_PATH="/" -DCMAKE_INSTALL_PREFIX= .. # For debug version # cmake -DHIP_COMMON_DIR="$HIP_DIR" -DHIPCC_BIN_DIR="$HIPCC_DIR/bin" -DAMD_OPENCL_PATH=$OPENCL_DIR -DROCCLR_PATH=$ROCCLR_DIR -DCMAKE_PREFIX_PATH="/" -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX= .. # For static lib of hip-rocclr runtime # For release version # cmake -DHIP_COMMON_DIR="$HIP_DIR" -DHIPCC_BIN_DIR="$HIPCC_DIR/bin" -DAMD_OPENCL_PATH=$OPENCL_DIR -DROCCLR_PATH=$ROCCLR_DIR -DBUILD_SHARED_LIBS=OFF -DCMAKE_PREFIX_PATH="/" -DCMAKE_INSTALL_PREFIX= .. # For debug version # cmake -DHIP_COMMON_DIR="$HIP_DIR" -DHIPCC_BIN_DIR="$HIPCC_DIR/bin" -DAMD_OPENCL_PATH=$OPENCL_DIR -DROCCLR_PATH=$ROCCLR_DIR -DBUILD_SHARED_LIBS=OFF -DCMAKE_BUILD_TYPE=Debug -DCMAKE_PREFIX_PATH="/" -DCMAKE_INSTALL_PREFIX= .. # If you don't specify CMAKE_INSTALL_PREFIX, hip-rocclr runtime will be installed to "/hip". # By default, CMake will search for a folder named vdi or ROCclr relative to the current path. Specify -DROCCLR_PATH=$ROCCLR_DIR if rocclr source is in obscure location. 
# By default, CMake will search for a folder named opencl or ROCm-OpenCL-Runtime relative to the current path. Specify -DAMD_OPENCL_PATH=$OPENCL_DIR if opencl source is in obscure location. list(APPEND CMAKE_MODULE_PATH ${HIP_COMMON_DIR}/cmake) ############################# # Options ############################# option(BUILD_HIPIFY_CLANG "Enable building the CUDA->HIP converter" OFF) option(__HIP_ENABLE_PCH "Enable/Disable pre-compiled hip headers" ON) option(HIP_OFFICIAL_BUILD "Enable/Disable for mainline/staging builds" ON) # Disable file reorg backward compatibility for ASAN packaging if(NOT ENABLE_ASAN_PACKAGING) option(FILE_REORG_BACKWARD_COMPATIBILITY "Enable File Reorg with backward compatibility" ON) endif() if(MSVC) set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /Zi") set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} /DEBUG:FULL") endif() set(HIPCC_BIN_DIR "" CACHE STRING "HIPCC and HIPCONFIG binary directories") if(__HIP_ENABLE_PCH) set(_pchStatus 1) else() set(_pchStatus 0) endif() message(STATUS "HIPCC_BIN_DIR found at ${HIPCC_BIN_DIR}") message(STATUS "HIP_COMMON_DIR found at ${HIP_COMMON_DIR}") set(HIP_COMMON_INCLUDE_DIR ${HIP_COMMON_DIR}/include) set(HIP_COMMON_BIN_DIR ${HIP_COMMON_DIR}/bin) set(__HIPCONFIG_EXECUTABLE__ ${HIP_COMMON_DIR}/bin/hipconfig) ############################# # Setup config generation ############################# string(TIMESTAMP _timestamp UTC) set(_versionInfo "# Auto-generated by cmake\n") set(_buildInfo "# Auto-generated by cmake on ${_timestamp} UTC\n") macro(add_to_config _configfile _variable) set(${_configfile} "${${_configfile}}${_variable}=${${_variable}}\n") endmacro() ############################# # Setup version information ############################# find_package(Perl REQUIRED) # Determine HIP_BASE_VERSION set(ENV{HIP_PATH} "") file(STRINGS ${HIP_COMMON_DIR}/VERSION VERSION_LIST REGEX "^[0-9]+") list(GET VERSION_LIST 0 HIP_VERSION_MAJOR) list(GET VERSION_LIST 1 HIP_VERSION_MINOR) list(GET VERSION_LIST 2 HIP_VERSION_PATCH) set(HIP_VERSION_GITDATE 0) find_package(Git) # FIXME: Two different version strings used. # Below we use UNIX commands, not compatible with Windows. 
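# For illustration only (hypothetical example, not computed by this comment):
# a HEAD commit dated Friday 2023-09-15 UTC would make the strftime '%y%W%w'
# expression below yield "23375" (year 23, week 37, weekday 5), so unofficial
# builds would get HIP_VERSION_PATCH=23375.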
if(GIT_FOUND) # use the commit date, instead of build date execute_process(COMMAND ${GIT_EXECUTABLE} show -s --format=%ct RESULT_VARIABLE git_result OUTPUT_VARIABLE git_output WORKING_DIRECTORY ${PROJECT_SOURCE_DIR} OUTPUT_STRIP_TRAILING_WHITESPACE) if(git_result EQUAL 0) set(HIP_VERSION_UNIXDATE ${git_output}) endif() # get date information based on UTC # use the last two digits of year + week number + day in the week as HIP_VERSION_GITDATE execute_process(COMMAND ${PERL_EXECUTABLE} "-MPOSIX=strftime" "-le" "print strftime \'%y%W%w\',gmtime(${HIP_VERSION_UNIXDATE})" RESULT_VARIABLE git_result OUTPUT_VARIABLE git_output WORKING_DIRECTORY ${PROJECT_SOURCE_DIR} OUTPUT_STRIP_TRAILING_WHITESPACE) if(git_result EQUAL 0) set(HIP_VERSION_GITDATE ${git_output}) endif() # get commit short hash execute_process(COMMAND ${GIT_EXECUTABLE} rev-parse --short HEAD WORKING_DIRECTORY ${PROJECT_SOURCE_DIR} RESULT_VARIABLE git_result OUTPUT_VARIABLE git_output OUTPUT_STRIP_TRAILING_WHITESPACE) if(git_result EQUAL 0) set(HIP_VERSION_GITHASH ${git_output}) endif() set(HIP_VERSION_BUILD_ID 0) set(HIP_VERSION_BUILD_NAME "") if(NOT DEFINED ENV{HIP_OFFICIAL_BUILD} AND NOT HIP_OFFICIAL_BUILD) set(HIP_VERSION_PATCH ${HIP_VERSION_GITDATE}) endif() if(DEFINED ENV{ROCM_LIBPATCH_VERSION}) set(HIP_PACKAGING_VERSION_PATCH ${HIP_VERSION_PATCH}.$ENV{ROCM_LIBPATCH_VERSION}) else() set(HIP_PACKAGING_VERSION_PATCH ${HIP_VERSION_PATCH}-${HIP_VERSION_GITHASH}) endif() else() set(HIP_VERSION_BUILD_ID 0) set(HIP_VERSION_BUILD_NAME "") # FIXME: Some parts depend on this being set. set(HIP_PACKAGING_VERSION_PATCH "0") endif() ## Debian package specific variables if ( DEFINED ENV{CPACK_DEBIAN_PACKAGE_RELEASE} ) set ( CPACK_DEBIAN_PACKAGE_RELEASE $ENV{CPACK_DEBIAN_PACKAGE_RELEASE} ) else() set ( CPACK_DEBIAN_PACKAGE_RELEASE "local" ) endif() message (STATUS "Using CPACK_DEBIAN_PACKAGE_RELEASE ${CPACK_DEBIAN_PACKAGE_RELEASE}" ) ## RPM package specific variables if ( DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE} ) set ( CPACK_RPM_PACKAGE_RELEASE $ENV{CPACK_RPM_PACKAGE_RELEASE} ) else() set ( CPACK_RPM_PACKAGE_RELEASE "local" ) endif() ## 'dist' breaks manual builds on debian systems due to empty Provides execute_process( COMMAND rpm --eval %{?dist} RESULT_VARIABLE PROC_RESULT OUTPUT_VARIABLE EVAL_RESULT OUTPUT_STRIP_TRAILING_WHITESPACE ) if ( PROC_RESULT EQUAL "0" AND NOT EVAL_RESULT STREQUAL "" ) string ( APPEND CPACK_RPM_PACKAGE_RELEASE "%{?dist}" ) endif() message(STATUS "CPACK_RPM_PACKAGE_RELEASE: ${CPACK_RPM_PACKAGE_RELEASE}") add_to_config(_versionInfo HIP_PACKAGING_VERSION_PATCH) add_to_config(_versionInfo CPACK_DEBIAN_PACKAGE_RELEASE) add_to_config(_versionInfo CPACK_RPM_PACKAGE_RELEASE) add_to_config(_versionInfo HIP_VERSION_MAJOR) add_to_config(_versionInfo HIP_VERSION_MINOR) add_to_config(_versionInfo HIP_VERSION_PATCH) add_to_config(_versionInfo HIP_VERSION_GITHASH) set (HIP_LIB_VERSION_MAJOR ${HIP_VERSION_MAJOR}) set (HIP_LIB_VERSION_MINOR ${HIP_VERSION_MINOR}) if (${ROCM_PATCH_VERSION} ) set (HIP_LIB_VERSION_PATCH ${ROCM_PATCH_VERSION}) elseif (DEFINED HIP_VERSION_GITHASH) set (HIP_LIB_VERSION_PATCH ${HIP_VERSION_PATCH}-${HIP_VERSION_GITHASH}) else () set (HIP_LIB_VERSION_PATCH ${HIP_VERSION_PATCH}) endif () set (HIP_LIB_VERSION_STRING "${HIP_LIB_VERSION_MAJOR}.${HIP_LIB_VERSION_MINOR}.${HIP_LIB_VERSION_PATCH}") # overwrite HIP_VERSION_PATCH for packaging set(HIP_VERSION ${HIP_VERSION_MAJOR}.${HIP_VERSION_MINOR}.${HIP_PACKAGING_VERSION_PATCH}) # Remove when CI is updated if(HIP_PLATFORM STREQUAL "rocclr") set(HIP_PLATFORM "amd") 
endif() ############################# # Configure variables ############################# # Determine HIP_PLATFORM if(NOT DEFINED HIP_PLATFORM) if(NOT DEFINED ENV{HIP_PLATFORM}) execute_process(COMMAND ${__HIPCONFIG_EXECUTABLE__} --platform OUTPUT_VARIABLE HIP_PLATFORM OUTPUT_STRIP_TRAILING_WHITESPACE) else() set(HIP_PLATFORM $ENV{HIP_PLATFORM} CACHE STRING "HIP Platform") endif() endif() message(STATUS "HIP Platform: " ${HIP_PLATFORM}) if(HIP_PLATFORM STREQUAL "nvidia") set(HIP_RUNTIME "cuda" CACHE STRING "HIP Runtime") set(HIP_COMPILER "nvcc" CACHE STRING "HIP Compiler") elseif(HIP_PLATFORM STREQUAL "amd") set(HIP_RUNTIME "rocclr" CACHE STRING "HIP Runtime") set(HIP_COMPILER "clang" CACHE STRING "HIP Compiler") else() message(FATAL_ERROR "Unexpected HIP_PLATFORM: " ${HIP_PLATFORM}) endif() message(STATUS "HIP Runtime: " ${HIP_RUNTIME}) message(STATUS "HIP Compiler: " ${HIP_COMPILER}) add_to_config(_buildInfo HIP_RUNTIME) add_to_config(_buildInfo HIP_COMPILER) if (NOT DEFINED ROCM_PATH ) set ( ROCM_PATH "/opt/rocm" CACHE STRING "Default ROCM installation directory." ) endif () message (STATUS "ROCM Installation path(ROCM_PATH): ${ROCM_PATH}") # Determine HIP install path if (UNIX) set(HIP_DEFAULT_INSTALL_PREFIX "${ROCM_PATH}") endif() if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT) set(CMAKE_INSTALL_PREFIX ${HIP_DEFAULT_INSTALL_PREFIX} CACHE PATH "Installation path for HIP" FORCE) endif() if(DEV_LOG_ENABLE MATCHES "yes") add_definitions(-DDEV_LOG_ENABLE) endif() # Set default install path as "${ROCM_PATH}", can override the path from cmake build. set(CPACK_INSTALL_PREFIX ${HIP_DEFAULT_INSTALL_PREFIX} CACHE PATH "Package Installation path for HIP") if(IS_ABSOLUTE ${CMAKE_INSTALL_PREFIX}) message(STATUS "HIP will be installed in: " ${CMAKE_INSTALL_PREFIX}) else() message(FATAL_ERROR "Don't know where to install HIP. 
Please specify absolute path using -DCMAKE_INSTALL_PREFIX") endif() # set the installation path for the installer package set(CPACK_SET_DESTDIR ON CACHE BOOL "Installer package will install hip to CMAKE_INSTALL_PREFIX instead of CPACK_PACKAGING_INSTALL_PREFIX") if (NOT CPACK_SET_DESTDIR) set(CPACK_PACKAGING_INSTALL_PREFIX "${ROCM_PATH}" CACHE PATH "Default installation path of hcc installer package") endif (NOT CPACK_SET_DESTDIR) ############################# # Build steps ############################# set(BIN_INSTALL_DIR ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}) set(LIB_INSTALL_DIR ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}) set(INCLUDE_INSTALL_DIR ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_INCLUDEDIR}) set(CONFIG_PACKAGE_INSTALL_DIR ${LIB_INSTALL_DIR}/cmake/hip) set(CONFIG_LANG_PACKAGE_INSTALL_DIR ${LIB_INSTALL_DIR}/cmake/hip-lang) set(CONFIG_RTC_PACKAGE_INSTALL_DIR ${LIB_INSTALL_DIR}/cmake/hiprtc) # Build clang hipify if enabled if (BUILD_HIPIFY_CLANG) add_subdirectory(hipify-clang) endif() # Generate hip_version.h set(_versionInfoHeader "// Auto-generated by cmake\n #ifndef HIP_VERSION_H #define HIP_VERSION_H\n #define HIP_VERSION_MAJOR ${HIP_VERSION_MAJOR} #define HIP_VERSION_MINOR ${HIP_VERSION_MINOR} #define HIP_VERSION_PATCH ${HIP_VERSION_PATCH} #define HIP_VERSION_GITHASH \"${HIP_VERSION_GITHASH}\" #define HIP_VERSION_BUILD_ID ${HIP_VERSION_BUILD_ID} #define HIP_VERSION_BUILD_NAME \"${HIP_VERSION_BUILD_NAME}\" #define HIP_VERSION (HIP_VERSION_MAJOR * 10000000 + HIP_VERSION_MINOR * 100000 + HIP_VERSION_PATCH)\n #define __HIP_HAS_GET_PCH ${_pchStatus}\n #endif\n ") file(WRITE "${PROJECT_BINARY_DIR}/include/hip/hip_version.h" ${_versionInfoHeader}) if(HIP_RUNTIME STREQUAL "rocclr") add_subdirectory(src) endif() # Generate .hipInfo file(WRITE "${PROJECT_BINARY_DIR}/.hipInfo" ${_buildInfo}) # Generate .hipVersion file(WRITE "${PROJECT_BINARY_DIR}/.hipVersion" ${_versionInfo}) # Build doxygen documentation find_program(DOXYGEN_EXE doxygen) if(DOXYGEN_EXE) if(EXISTS "${HIP_COMMON_DIR}/docs/doxygen-input/doxy.cfg") add_custom_target(doc COMMAND HIP_PATH=${CMAKE_CURRENT_SOURCE_DIR} ${DOXYGEN_EXE} ${HIP_COMMON_DIR}/docs/doxygen-input/doxy.cfg WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/docs) elseif(EXISTS "${HIP_COMMON_DIR}/docs/.doxygen/Doxyfile") add_custom_target(doc COMMAND HIP_PATH=${CMAKE_CURRENT_SOURCE_DIR} ${DOXYGEN_EXE} ${HIP_COMMON_DIR}/docs/.doxygen/Doxyfile WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/docs) else() message(STATUS "Unable to find doxygen config file. Will not generate doxygen output") endif() endif() ############################# # Install steps ############################# # Install .hipInfo install(FILES ${PROJECT_BINARY_DIR}/.hipInfo DESTINATION ${CMAKE_INSTALL_LIBDIR}) # Install .hipVersion if(WIN32) install(FILES ${PROJECT_BINARY_DIR}/.hipVersion DESTINATION ${CMAKE_INSTALL_BINDIR}) else() install(FILES ${PROJECT_BINARY_DIR}/.hipVersion DESTINATION ${CMAKE_INSTALL_DATADIR}/hip RENAME version) endif() # Install src, bin, include & cmake if necessary execute_process(COMMAND test ${CMAKE_INSTALL_PREFIX} -ef ${CMAKE_CURRENT_SOURCE_DIR} RESULT_VARIABLE INSTALL_SOURCE) if(NOT ${INSTALL_SOURCE} EQUAL 0) if(WIN32) install(DIRECTORY ${HIP_COMMON_BIN_DIR} DESTINATION . 
USE_SOURCE_PERMISSIONS) if (CMAKE_BUILD_TYPE STREQUAL "Debug") install(DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/src/" DESTINATION ${CMAKE_INSTALL_BINDIR} FILES_MATCHING PATTERN "*.pdb" PATTERN "*.ilk" PATTERN "CMakeFiles" EXCLUDE PATTERN "hip_rtc_gen" EXCLUDE PATTERN "libelf" EXCLUDE PATTERN "loader" EXCLUDE PATTERN "pal" EXCLUDE PATTERN "libamdhsacode" EXCLUDE) endif() else() # Exclude .bat files on Linux. #Hip bin files moved to /opt/rocm/bin and the file permission need to set properly install(DIRECTORY ${HIP_COMMON_BIN_DIR} DESTINATION . USE_SOURCE_PERMISSIONS DIRECTORY_PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE PATTERN *.bat EXCLUDE) endif() if(WIN32) #not required for flat folder structure # The following two lines will be removed after upstream updation install(CODE "MESSAGE(\"Removing ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_INCLUDEDIR}\")") install(CODE "file(REMOVE_RECURSE ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_INCLUDEDIR})") endif() install(DIRECTORY include DESTINATION .) install(DIRECTORY ${HIP_COMMON_INCLUDE_DIR}/hip/ DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/hip/) if(WIN32) install(DIRECTORY ${HIP_COMMON_DIR}/cmake DESTINATION .) else() install(DIRECTORY ${HIP_COMMON_DIR}/cmake/ DESTINATION ${CONFIG_PACKAGE_INSTALL_DIR}) endif() endif() # Install generated headers # FIXME: Associate with individual targets. if(HIP_PLATFORM STREQUAL "amd") install(FILES ${PROJECT_BINARY_DIR}/include/hip/amd_detail/hip_prof_str.h DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/hip/amd_detail) install(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/bin DESTINATION . USE_SOURCE_PERMISSIONS) endif() install(FILES ${PROJECT_BINARY_DIR}/include/hip/hip_version.h DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/hip) if (NOT ${HIPCC_BIN_DIR} STREQUAL "") file(TO_CMAKE_PATH "${HIPCC_BIN_DIR}" HIPCC_BIN_DIR) if(${HIPCC_BIN_DIR}/hipcc.bin) set(hipcc_bin ${HIPCC_BIN_DIR}/hipcc.bin) set(hipconfig_bin ${HIPCC_BIN_DIR}/hipconfig.bin) if(WIN32) set(hipcc_bin ${hipcc_bin}.exe) set(hipconfig_bin ${hipconfig_bin}.exe) endif() if(EXISTS ${hipcc_bin} AND EXISTS ${hipconfig_bin}) install(PROGRAMS ${hipcc_bin} DESTINATION bin) install(PROGRAMS ${hipconfig_bin} DESTINATION bin) endif() endif() install(PROGRAMS ${HIPCC_BIN_DIR}/hipcc DESTINATION bin) install(PROGRAMS ${HIPCC_BIN_DIR}/hipconfig DESTINATION bin) install(PROGRAMS ${HIPCC_BIN_DIR}/hipcc.pl DESTINATION bin) install(PROGRAMS ${HIPCC_BIN_DIR}/hipconfig.pl DESTINATION bin) install(PROGRAMS ${HIPCC_BIN_DIR}/hipvars.pm DESTINATION bin) install(PROGRAMS ${HIPCC_BIN_DIR}/hipcc.bat DESTINATION bin) install(PROGRAMS ${HIPCC_BIN_DIR}/hipconfig.bat DESTINATION bin) endif() ############################# # hip-config ############################# include(CMakePackageConfigHelpers) configure_package_config_file( hip-config.cmake.in ${CMAKE_CURRENT_BINARY_DIR}/hip-config.cmake INSTALL_DESTINATION ${CONFIG_PACKAGE_INSTALL_DIR} PATH_VARS LIB_INSTALL_DIR INCLUDE_INSTALL_DIR BIN_INSTALL_DIR ) configure_package_config_file( hip-config-amd.cmake ${CMAKE_CURRENT_BINARY_DIR}/hip-config-amd.cmake INSTALL_DESTINATION ${CONFIG_PACKAGE_INSTALL_DIR} PATH_VARS LIB_INSTALL_DIR INCLUDE_INSTALL_DIR BIN_INSTALL_DIR ) configure_package_config_file( hip-config-nvidia.cmake ${CMAKE_CURRENT_BINARY_DIR}/hip-config-nvidia.cmake INSTALL_DESTINATION ${CONFIG_PACKAGE_INSTALL_DIR} PATH_VARS LIB_INSTALL_DIR INCLUDE_INSTALL_DIR BIN_INSTALL_DIR ) write_basic_package_version_file( ${CMAKE_CURRENT_BINARY_DIR}/hip-config-version.cmake VERSION 
"${HIP_VERSION_MAJOR}.${HIP_VERSION_MINOR}.${HIP_VERSION_GITDATE}" COMPATIBILITY SameMajorVersion ) install( FILES ${CMAKE_CURRENT_BINARY_DIR}/hip-config.cmake ${CMAKE_CURRENT_BINARY_DIR}/hip-config-amd.cmake ${CMAKE_CURRENT_BINARY_DIR}/hip-config-nvidia.cmake ${CMAKE_CURRENT_BINARY_DIR}/hip-config-version.cmake DESTINATION ${CONFIG_PACKAGE_INSTALL_DIR} ) # Packaging invokes UNIX commands, which are not available on Windows. if(NOT WIN32) add_subdirectory(packaging) endif() ############################# # Code formatting ############################# # Target: clangformat find_program(CLANGFORMAT_EXE clang-format PATHS ${HCC_HOME}/bin) if(CLANGFORMAT_EXE) file(GLOB_RECURSE FORMAT_SOURCE_FILE_LIST *.cpp *.hpp *.h) add_custom_target(clangformat COMMAND ${CLANGFORMAT_EXE} -style=file -i ${FORMAT_SOURCE_FILE_LIST} WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}) endif() ############################# # Testing steps ############################# # HIT is not compatible with Windows if(NOT WIN32) set(HIP_ROOT_DIR ${CMAKE_CURRENT_BINARY_DIR}) set(HIP_SRC_PATH ${CMAKE_CURRENT_SOURCE_DIR}) if(HIP_PLATFORM STREQUAL "nvidia") execute_process(COMMAND "${CMAKE_COMMAND}" -E copy_directory "${HIP_SRC_PATH}/include" "${HIP_ROOT_DIR}/include" RESULT_VARIABLE RUN_HIT ERROR_QUIET) endif() execute_process(COMMAND "${CMAKE_COMMAND}" -E copy_directory "${HIP_COMMON_INCLUDE_DIR}/hip/" "${HIP_ROOT_DIR}/include/hip/" RESULT_VARIABLE RUN_HIT ERROR_QUIET) execute_process(COMMAND "${CMAKE_COMMAND}" -E copy_directory "${HIP_COMMON_DIR}/cmake" "${HIP_ROOT_DIR}/cmake" RESULT_VARIABLE RUN_HIT ERROR_QUIET) if(${RUN_HIT} EQUAL 0) execute_process(COMMAND "${CMAKE_COMMAND}" -E copy_directory "${HIP_COMMON_BIN_DIR}" "${HIP_ROOT_DIR}/bin" RESULT_VARIABLE RUN_HIT ERROR_QUIET) endif() file(COPY ${HIPCC_BIN_DIR}/hipcc DESTINATION ${HIP_ROOT_DIR}/bin/) file(COPY ${HIPCC_BIN_DIR}/hipcc.pl DESTINATION ${HIP_ROOT_DIR}/bin/) file(COPY ${HIPCC_BIN_DIR}/hipconfig DESTINATION ${HIP_ROOT_DIR}/bin/) file(COPY ${HIPCC_BIN_DIR}/hipconfig.pl DESTINATION ${HIP_ROOT_DIR}/bin/) file(COPY ${HIPCC_BIN_DIR}/hipvars.pm DESTINATION ${HIP_ROOT_DIR}/bin/) if(HIP_CATCH_TEST EQUAL "1") message(STATUS "Building of catch tests through hipamd is no longer supported. Testing targets will not be available. catch tests have been moved to an independent github project hip-tests. Please refer to hip-tests Readme for build instructions! ") else() if(${RUN_HIT} EQUAL 0) set(CMAKE_MODULE_PATH "${HIP_ROOT_DIR}/cmake" ${CMAKE_MODULE_PATH}) include(${HIP_COMMON_DIR}/tests/hit/HIT.cmake) include(${HIP_COMMON_DIR}/tests/Tests.cmake) else() message(STATUS "Testing targets will not be available. To enable them please ensure that the HIP installation directory is writeable. 
Use -DCMAKE_INSTALL_PREFIX to specify a suitable location")
    endif()
  endif()
endif()

#############################
# Code analysis
#############################
# Target: clang
if(HIP_HIPCC_EXECUTABLE)
  add_custom_target(analyze
    COMMAND ${HIP_HIPCC_EXECUTABLE} -fvisibility=hidden -fvisibility-inlines-hidden
    --analyze --analyzer-output text -isystem ${ROCM_PATH}/${CMAKE_INSTALL_INCLUDEDIR}
    -Wno-unused-command-line-argument -I${ROCM_PATH}/${CMAKE_INSTALL_INCLUDEDIR}
    -c src/*.cpp -Iinclude/ -I./
    WORKING_DIRECTORY ${HIP_SRC_PATH})
  if(CPPCHECK_EXE)
    add_dependencies(analyze cppcheck)
  endif()
endif()

#File reorg Backward compatibility function
if(NOT WIN32)
if(FILE_REORG_BACKWARD_COMPATIBILITY)
# To enable/disable #error in wrapper header files
  if(NOT DEFINED ROCM_HEADER_WRAPPER_WERROR)
      if(DEFINED ENV{ROCM_HEADER_WRAPPER_WERROR})
          set(ROCM_HEADER_WRAPPER_WERROR "$ENV{ROCM_HEADER_WRAPPER_WERROR}"
                  CACHE STRING "Header wrapper warnings as errors.")
      else()
          set(ROCM_HEADER_WRAPPER_WERROR "OFF" CACHE STRING "Header wrapper warnings as errors.")
      endif()
  endif()
  if(ROCM_HEADER_WRAPPER_WERROR)
      set(deprecated_error 1)
  else()
      set(deprecated_error 0)
  endif()

include(hip-backward-compat.cmake)
endif() #FILE_REORG_BACKWARD_COMPATIBILITY
endif()
clr-rocm-5.7.1/hipamd/INSTALL.md000066400000000000000000000041341450307266000161410ustar00rootroot00000000000000
## Prerequisites

- Install mesa-common-dev
- Either build or install [COMGR](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport), [CLANG](https://github.com/RadeonOpenCompute/llvm-project) and [Device Library](https://github.com/RadeonOpenCompute/ROCm-Device-Libs)

## Branch of repository

Before getting the HIP source code, set the expected repository branch in the variable HIP_BRANCH. For example, for the ROCm 5.0 release branch, set
```
export HIP_BRANCH=rocm-5.0.x
```
For the ROCm 5.1 release branch, set
```
export HIP_BRANCH=rocm-5.1.x
```
Similar format for future branches.

## Getting the source code

```bash
git clone -b $HIP_BRANCH https://github.com/ROCm-Developer-Tools/hipamd.git
git clone -b $HIP_BRANCH https://github.com/ROCm-Developer-Tools/hip.git
git clone -b $HIP_BRANCH https://github.com/ROCm-Developer-Tools/ROCclr.git
git clone -b $HIP_BRANCH https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime.git
```

## Set the environment variables

```bash
export HIPAMD_DIR="$(readlink -f hipamd)"
export HIP_DIR="$(readlink -f hip)"
export ROCCLR_DIR="$(readlink -f ROCclr)"
export OPENCL_DIR="$(readlink -f ROCm-OpenCL-Runtime)"
```

## Build HIPAMD

Commands to build hipamd are as follows:

```bash
cd "$HIPAMD_DIR"
mkdir -p build; cd build
cmake -DHIP_COMMON_DIR=$HIP_DIR -DAMD_OPENCL_PATH=$OPENCL_DIR -DROCCLR_PATH=$ROCCLR_DIR -DCMAKE_PREFIX_PATH="/" ..
make -j$(nproc)
sudo make install
```

Please note, HIP_COMMON_DIR looks for the hip common ([HIP](https://github.com/ROCm-Developer-Tools/HIP/)) source code.

By default, the release version of hipamd is built, and HIP will be installed to the default path /hip.
Developers can use the cmake option CMAKE_INSTALL_PREFIX to define the path where HIP is expected to be installed; the commands to build are as follows:

```bash
cd "$HIPAMD_DIR"
mkdir -p build; cd build
cmake -DHIP_COMMON_DIR=$HIP_DIR -DAMD_OPENCL_PATH=$OPENCL_DIR -DROCCLR_PATH=$ROCCLR_DIR -DCMAKE_PREFIX_PATH="/" -DCMAKE_INSTALL_PREFIX=$PWD/install ..
make -j$(nproc)
sudo make install
```

After installation, make sure HIP_PATH points to the path where HIP is installed.
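
As a quick sanity check of the installed runtime (a minimal sketch, not part of the build; the file name `check.cpp` is illustrative), a small device query can be compiled with the installed compiler:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
  int count = 0;
  if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
    printf("No HIP devices found - check the runtime installation\n");
    return 1;
  }
  hipDeviceProp_t prop;
  hipGetDeviceProperties(&prop, 0);  // properties of the first device
  printf("Found %d HIP device(s); device 0: %s\n", count, prop.name);
  return 0;
}
```

Compile with `$HIP_PATH/bin/hipcc check.cpp -o check` and run `./check`.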
clr-rocm-5.7.1/hipamd/LICENSE.txt000066400000000000000000000020701450307266000163310ustar00rootroot00000000000000Copyright (c) 2008 - 2022 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. clr-rocm-5.7.1/hipamd/README.md000066400000000000000000000055251450307266000157750ustar00rootroot00000000000000## What is this repository for? ### This repository provides [HIP](https://github.com/ROCm-Developer-Tools/HIP) implementation specifically for AMD platform. ## DISCLAIMER The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard versionchanges, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated.AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.THIS INFORMATION IS PROVIDED ‘AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. © 2021 Advanced Micro Devices, Inc. All Rights Reserved. ## Repository branches: The hipamd repository maintains several branches. The branches that are of importance are: * Main branch: This is the stable branch. 
It is up to date with the latest release branch, for example, if the latest HIP release is rocm-4.4, main branch will be the repository based on this release. * Develop branch: This is the default branch, on which the new features are still under development and visible. While this maybe of interest to many, it should be noted that this branch and the features under development might not be stable. * Release branches. These are branches corresponding to each ROCM release, listed with release tags, such as rocm-4.4, etc. ## Release tagging: hipamd releases are typically naming convention for each ROCM release to help differentiate them. * rocm x.yy: These are the stable releases based on the ROCM release. This type of release is typically made once a month.*clr-rocm-5.7.1/hipamd/bin/000077500000000000000000000000001450307266000152575ustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/bin/roc-obj000077500000000000000000000234771450307266000165550ustar00rootroot00000000000000#!/bin/bash #| Usage: roc-obj [-h] [-t REGEXP] [-o OUTDIR] [-I REPLACE-STRING|-i] [-d] #| EXECUTABLE... [: [SUFFIX COMMAND [ARGS...] ;]...] #| #| Wrapper for roc-obj-ls and roc-obj-extract which extracts code objects #| embedded in each EXECUTABLE and optionally applies COMMANDs to them. #| #| If the POSIX extended regular expression REGEXP is specified, only embedded #| code objects whose Target ID matches REGEXP are extracted; otherwise all #| code objects are extracted. #| #| If the directory path OUTDIR is specified, it is created if it does not #| already exist, and the code objects are extracted into it; otherwise they #| are extracted into the current working directory. #| #| The extracted files are named by appending a ":" followed by the Target ID #| of the extracted code object to the input filename EXECUTABLE they were #| extracted from. #| #| If the list of EXECUTABLE arguments is terminated with ":" then after all #| selected files are successfully extracted, zero or more additional embedded #| command-lines, separated by ";", are read from the command-line starting #| after the ":". These must specify a SUFFIX used to name the output of the #| corresponding COMMAND, along with the COMMAND name and any ARGS to it. #| #| Then each COMMAND is executed, as if by a POSIX "execvp" function, once for #| each embedded code object that was created in OUTDIR. (Note: Typically this #| means the user must ensure the commands are present in at least one #| directory of the "PATH" environment variable.) For each execution of #| COMMAND: #| #| If REPLACE-STRING is specified, all instances of REPLACE-STRING in ARGS are #| replaced with the file path of the extracted code object before executing #| COMMAND. #| #| The standard input is redirected from the extracted code object. #| #| If SUFFIX is "-" the standard output is not redirected. If SUFFIX is "!" the #| standard output is redirected to /dev/null. Otherwise, the standard output #| is redirected to files named by the file path of the extracted code object #| with SUFFIX appended. #| #| Note: The executables roc-obj-ls, roc-obj-extract, and llvm-objdump (in the #| case of disassembly requested using the -d flag) are searched for in a #| unique way. A series of directories are searched, some conditionally, until #| a suitable executable is found. If all directories are searched without #| finding the executable, an error occurs. The first directory searched is the #| one containing the hard-link to the roc-obj being executed, known as the #| "base directory". 
Next, if the environment variable HIP_CLANG_PATH is set,
#| it is searched; otherwise, the base directory path is appended with
#| "../../llvm/bin" and it is searched. Finally, the PATH is searched as if by
#| a POSIX "execvp" function.
#|
#| Option Descriptions:
#|   -h, --help            print this help text and exit
#|   -t, --target-id       only extract code objects from EXECUTABLE whose Target ID
#|                         matches the POSIX extended regular expression REGEXP
#|   -o, --outdir          set the output directory, which is created if it
#|                         does not exist
#|   -I, --replace-string  replace all occurrences of the literal string
#|                         REPLACE-STRING in ARGS with the input filename
#|   -i, --replace         equivalent to -I{}
#|   -d, --disassemble     disassemble extracted code objects; equivalent to
#|                         : .s llvm-objdump -d - ;
#|
#| Example Usage:
#|
#|  Extract all code objects embedded in a.so:
#|  $ roc-obj a.so
#|
#|  Extract all code objects embedded in a.so, b.so, and c.so:
#|  $ roc-obj a.so b.so c.so
#|
#|  Extract all code objects embedded in a.so with "gfx9" in their Target ID:
#|  $ roc-obj -t gfx9 a.so
#|
#|  Extract all code objects embedded in a.so into output/ (creating it if needed):
#|  $ roc-obj -o output/ a.so
#|
#|  Extract all code objects embedded in a.so with "gfx9" in their Target ID
#|  into output/ (creating it if needed):
#|  $ roc-obj -t gfx9 -o output/ a.so
#|
#|  Extract all code objects embedded in a.so, and then disassemble each of them
#|  to files ending with .s:
#|  $ roc-obj -d a.so
#|
#|  Extract all code objects embedded in a.so, and count the number of bytes in
#|  each, writing the results to files ending with .count:
#|  $ roc-obj a.so : .count wc -c
#|
#|  Extract all code objects embedded in a.so, and inspect their ELF headers
#|  using llvm-readelf (which will not read from standard input), writing to
#|  files ending with .hdr:
#|  $ roc-obj -I'{}' a.so : .hdr llvm-readelf -h '{}'
#|
#|  Extract all code objects embedded in a.so, and then extract each of their
#|  .text sections using llvm-objcopy (which won't read from standard input
#|  or write to standard output):
#|  $ roc-obj -I'{}' a.so : ! llvm-objcopy -O binary --only-section=.text '{}' '{}.text'
#|
#|  Extract all code objects embedded in a.so, b.so, and c.so with target
#|  feature xnack disabled into directory out/. Then, for each:
#|    Write the size in bytes into a file ending with .count, and
#|    Write a textual description of the ELF headers to a file ending with .hdr, and
#|    Extract the .text section to a file ending with .text
#|  $ roc-obj -I'{}' -t xnack- -o out/ a.so b.so c.so : \
#|      .count wc -c \; \
#|      .hdr llvm-readelf -h '{}' \; \
#|      ! llvm-objcopy -O binary --only-section=.text '{}' '{}.text'

set -euo pipefail

usage() {
  sed -n 's/^#| \?\(.*\)$/\1/p' "$0"
}

usage_then_exit() {
  local -r status="$1"; shift
  usage >&$(( status ? 2 : 1 ))
  exit "$status"
}

fail() {
  printf "error: %s\n" "$*" >&2
  exit 1
}

# Account for the fact that we do not necessarily put ROCm tools in the PATH,
# nor do we have a single, unified ROCm "bin/" directory.
#
# Note that this is only used for roc-obj-ls, roc-obj-extract, and "shortcut"
# options like -d, and the user can still use any copy of llvm-* by explicitly
# invoking it with a full path, e.g. : /path/to/llvm-* ...
; find_rocm_executable_or_fail() { local -r command="$1"; shift local file local searched=() for dir in "$BASE_DIR" "${HIP_CLANG_PATH:-"$BASE_DIR/../../llvm/bin"}"; do file="$dir/$command" if [[ -x $file ]]; then printf "%s" "$file" return else searched+=("$dir") fi done if hash "$command" 2>/dev/null; then printf "%s" "$command" else fail could not find "$command" in "${searched[*]}" or PATH fi } # Extract the embedded code objects of the executable file given as the first # argument into OPT_OUTDIR, filtering them via OPT_TARGET_ID. # # Deletes any resulting files which are empty, and prints the paths of the # remaining files. extract() { local -r executable="$1"; shift local prefix prefix="$(basename -- "$executable")" # We want the shell to split the result of roc-obj-ls on whitespace, as # neither the Target ID nor the URI can have embedded spaces. # shellcheck disable=SC2046 set -- $("$ROC_OBJ_LS" -- "$executable" | awk "\$2~/$OPT_TARGET_ID/") while (( $# )); do local output="$prefix:$1"; shift output="$output.$1"; shift local uri="$1"; shift [[ -n $OPT_OUTDIR ]] && output="$OPT_OUTDIR/$output" "$ROC_OBJ_EXTRACT" -o - -- "$uri" >"$output" if [[ -s $output ]]; then printf '%s\n' "$output" else rm "$output" fi done (( $# )) && fail expected even number of fields from roc-obj-ls } # Run a command over a list of inputs, naming output files with the supplied # suffix and applying OPT_REPLACE_STRING if needed. # # Arguments are of the form: # $suffix $command $args... ; $inputs run_command() { local -r suffix="$1"; shift local -r command="$1"; shift local args=() while (( $# )); do local arg="$1"; shift [[ $arg == ';' ]] && break args+=("$arg") done local inputs=("$@") for input in "${inputs[@]}"; do case "$suffix" in '-') output=/dev/stdout;; '!') output=/dev/null;; *) output="$input$suffix";; esac "$command" "${args[@]//$OPT_REPLACE_STRING/$input}" <"$input" >"$output" done } main() { [[ -n $OPT_OUTDIR ]] && mkdir -p "$OPT_OUTDIR" local inputs=() while (( $# )); do local executable="$1"; shift [[ $executable == : ]] && break # Append the file paths extracted from $executable to $inputs readarray -t -O "${#inputs[@]}" inputs < <(extract "$executable") done (( ${#inputs[@]} )) || fail no executables specified while (( $# )); do local suffix="$1"; shift local command="$1"; shift local args=() while (( $# )); do local arg="$1"; shift [[ $arg == \; ]] && break args+=("$arg") done run_command "$suffix" "$command" "${args[@]}" \; "${inputs[@]}" done (( OPT_DISASSEMBLE )) && run_command .s "$OBJDUMP" -d - \; "${inputs[@]}" } OPT_TARGET_ID='' OPT_OUTDIR='' OPT_REPLACE_STRING='' OPT_DISASSEMBLE=0 ! getopt -T || fail util-linux enhanced getopt required getopt="$(getopt -o +ht:o:I:id \ --long help,target-id:,outdir:,replace:,replace-default,disassemble \ -n roc-obj -- "$@")" eval set -- "$getopt" unset getopt while true; do case "$1" in -h | --help) usage_then_exit 0;; -t | --target-id) OPT_TARGET_ID="${2//\//\\\/}"; shift 2;; -o | --outdir) OPT_OUTDIR="$2"; shift 2;; -I | --replace-string) OPT_REPLACE_STRING="$2"; shift 2;; -i | --replace) OPT_REPLACE_STRING='{}'; shift;; -d | --disassemble) OPT_DISASSEMBLE=1; shift;; --) shift; break;; *) usage_then_exit 1;; esac done readonly -- OPT_TARGET_ID OPT_OUTDIR OPT_REPLACE_STRING OPT_DISASSEMBLE # We expect to be installed as ROCM_PATH/hip/bin/roc-obj, which means BASE_DIR # is ROCM_PATH/hip/bin. 
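# For example (hypothetical layout): if this script resolves to
# /opt/rocm/hip/bin/roc-obj, then BASE_DIR is /opt/rocm/hip/bin and, unless
# HIP_CLANG_PATH is set, find_rocm_executable_or_fail also looks in
# /opt/rocm/hip/bin/../../llvm/bin before falling back to a PATH search.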
BASE_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)" (( OPT_DISASSEMBLE )) && OBJDUMP="$(find_rocm_executable_or_fail llvm-objdump)" ROC_OBJ_LS="$(find_rocm_executable_or_fail roc-obj-ls)" ROC_OBJ_EXTRACT="$(find_rocm_executable_or_fail roc-obj-extract)" readonly -- BASE_DIR OBJDUMP ROC_OBJ_LS ROC_OBJ_EXTRACT main "$@" clr-rocm-5.7.1/hipamd/bin/roc-obj-extract000077500000000000000000000204771450307266000202220ustar00rootroot00000000000000#!/usr/bin/perl # Copyright (c) 2020-2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. use strict; use File::Copy; use File::Spec; use File::Basename; use File::Which; use Cwd 'realpath'; use Getopt::Std; use List::Util qw(max); use URI::Encode; my $extract_range_specifier; my $extract_pid; my $extract_file; my $output_file; my $output_path; my $extract_offset; my $extract_size; my $pid_running; my $verbose=0; my $error=0; my $output_to_stdout=0; sub usage { print("Usage: $0 [-o|v|h] URI... \n"); print(" URIs can be read from STDIN, one per line.\n"); print(" From the URIs specified, extracts code objects into files named: "); print("-[pid]-offset-size.co\n\n"); print("Options:\n"); print(" -o \tPath for output. 
If \"-\" specified, code object is printed to STDOUT.\n"); print(" -v \tVerbose output to STDOUT.\n"); print(" -h \tShow this help message.\n"); print("\nURI syntax:\n"); print("\tcode_object_uri ::== file_uri | memory_uri\n"); print("\tfile_uri ::== \"file://\" extract_file [ range_specifier ]\n"); print("\tmemory_uri ::== \"memory://\" process_id range_specifier\n"); print("\trange_specifier ::== range_delimiter range_attribute [\"&\" range_attribute]\n"); print("\trange_delimiter ::== \"#\" | \"?\"\n"); print("\trange_attribute ::== [\"offset=\" number | \"size=\" number ]\n"); print("\textract_file ::== URI_ENCODED_OS_FILE_PATH\n"); print("\tprocess_id ::== DECIMAL_NUMBER\n"); print("\tnumber ::== HEX_NUMBER \| DECIMAL_NUMBER \| OCTAL_NUMBER\n\n"); print("\tExample: file://dir1/dir2/hello_world#offset=133&size=14472 \n"); print("\t memory://1234#offset=0x20000&size=3000\n\n"); print(" NOTES:\n\n"); print("\tWhen specifying a URI in a shell command you will need to escape the \'&\' character in the range_specifier.\n"); print("\tIf \"size=\" is not specified, the default is the remainder of the file from the given offset.\n\n"); exit($error); } # Process options my %options=(); getopts('vho:', \%options); if (defined $options{h}) { usage(); } if (defined $options{v}) { $verbose = 1; } if (defined $options{o}) { $output_path = $options{o}; if ($output_path eq "-") { $output_to_stdout=1; } else { (-d $output_path) || die("Error: Path \'$output_path\' cannot be found.\n"); } } # Only push STDIN if there are no arguments -- otherwise this # consumes the caller's stdin by accident. # push STDIN to ARGV array. if ($#ARGV < 0) { push @ARGV, unless -t STDIN; } # error check: enough arguments presented. if ($#ARGV < 0) { print(STDERR "Error: No arguments.\n"); $error++; usage(); } # error check: command dd is available. my $dd_cmd = which("dd"); (-f $dd_cmd) || die("Error: Can't find dd command\n"); foreach my $uri_str(@ARGV) { chomp $uri_str; my ($uri_protocol, $specs) = split(/:\/\//,$uri_str); my $obj_uri_encode = URI::Encode->new(); my $decoded_extract_file; my $file_size; if (lc($uri_protocol) eq "file") { # expect file path ($extract_file, $extract_range_specifier) = split(/[#,?]/,$specs); # decode the file name. URIs may have file/path names with non-alphanumeric characters, which will be encoded with %. We need to decode these. $decoded_extract_file = $obj_uri_encode->decode($extract_file); # verify file exists: if (! -e $decoded_extract_file) { print(STDERR "Error: can't find file: $decoded_extract_file\n"); $error++; next; } # use the output_path is specified, otherwise use current working dir. if ($output_path ne "") { $output_file = File::Spec->catfile($output_path, basename($decoded_extract_file)); } else { $output_file = basename($decoded_extract_file); } } elsif ( lc($uri_protocol) eq "memory") { # expect memory specifier ($extract_pid, $extract_range_specifier) = split(/[#,?]/,$specs); # verify pid is currently running $pid_running = kill 0, $extract_pid; if (! $pid_running) { print(STDERR "Error: PID: $extract_pid is NOT running\n"); $error++; next; } # get pid filename: $extract_file = "/proc/$extract_pid/mem"; # verify file exists: if (! -e $extract_file) { print(STDERR "Error: can't find file: $extract_file\n"); $error++; next; } # for extracting from a pid, make the output file in the current dir/path with the pid value as a name. $output_file = "pid${extract_pid}"; # need to set $decoded_extract_file, because later we use this for other checks. 
$decoded_extract_file = $extract_file; } else { # error, unrecognized Code Object URI print(STDERR "Error: \'$uri_protocol\' is not recognized as a supported code object URI.\n"); $error++; next; } # it is valid to not give a range specifier in a URI, in which case the entire code object will be extracted. if ($extract_range_specifier ne "") { my @tokens; my $str; my $value; my $size_specified = 0; @tokens = split(/[&]/,$extract_range_specifier); foreach (@tokens) { ($str,$value) = split(/=/,$_); if ($str eq "size") { $extract_size=$value; $size_specified = 1; } elsif ($str eq "offset") { $extract_offset=$value; } } if ($size_specified != 1) { # "size" not specified. default to rest of file (total size - offset) $extract_size = -s $decoded_extract_file; $extract_size -= $extract_offset; } } else { # Error if URI is a memory request, and we have no range_specifier. if ($pid_running) { print(STDERR "Error: must specify a Range Specifier (offset and size) for a memory URI: $uri_str\n"); $error++; next; } $extract_offset = 0; $extract_size = -s $decoded_extract_file; } # We should have at least a valid size to extract; ignore cases with size=0. if ($extract_size != 0) { print("Reading input file \"$extract_file\" ...\n") if ($verbose); # only if this is a File URI. if (lc($uri_protocol) eq "file") { # verify that offset+size does not exceed file size: my $file_size = -s $decoded_extract_file; my $size = int($extract_offset) + int($extract_size); if ( $size > $file_size ) { print(STDERR "Error: requested offset($extract_offset) + size($extract_size) exceeds file size($file_size) for file \"$decoded_extract_file\".\n"); $error++; next; } } open(INPUT_FP, "<", $decoded_extract_file) || die $!; binmode INPUT_FP; # extract the code object my $co_filename; if (!$output_to_stdout) { $co_filename = "of=\'${output_file}-offset${extract_offset}-size${extract_size}.co\'"; } my $dd_cmd_str = "$dd_cmd if=\'$decoded_extract_file\' $co_filename skip=$extract_offset count=$extract_size bs=1 status=none"; print("DD Command: $dd_cmd_str\n") if ($verbose); my $dd_ret = system($dd_cmd_str); if ($dd_ret != 0) { print(STDERR "Error: DD command ($dd_cmd_str) failed with RC: $dd_ret\n"); $error++; } print("Extract request: file: $extract_file offset: $extract_offset size: $extract_size\n") if ($verbose); } else { print("Warning: trying to extract from $extract_file at offset=$extract_offset with size=0. Nothing to extract.\n") if ($verbose); } } # end of for each (URI) argument exit($error); clr-rocm-5.7.1/hipamd/bin/roc-obj-ls000077500000000000000000000156161450307266000171650ustar00rootroot00000000000000#!/usr/bin/perl # Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
use strict;
use File::Copy;
use File::Spec;
use File::Basename;
use File::Which;
use Cwd 'realpath';
use Getopt::Std;
use List::Util qw(max);
use URI::Encode;
sub usage { print("Usage: $0 [-v|h] executable...\n"); print("List the URIs of the code objects embedded in the specified host executables.\n"); print("-v \tVerbose output (includes Entry ID)\n"); print("-h \tShow this help message\n"); exit; }
# sub to read a qword. 1st arg is a FP, 2nd arg is ref to destination var.
sub readq { my ($input_fp, $qword) = @_; read($input_fp, my $bytes, 8) == 8 or die("Error: Failed to read 8 bytes\n"); ${$qword} = unpack("Q<", $bytes); }
# sub to move address to next alignment boundary
# first arg is address to move
# second arg is alignment requirement/boundary
sub align_up { my ($address, $alignment) = @_; $address = int(($address + ($alignment - 1)) / $alignment) * $alignment; }
# Process options
my %options=(); getopts('vhd', \%options);
if (defined $options{h}) { usage(); }
my $verbose = $options{v}; my $debug = $options{d}; my $num_bundles = 1; my $bundle_alignment = 4096;
# look for objdump
my $objdump = which("objdump"); (-f $objdump) || die("Error: Can't find objdump command\n");
# for each argument (which should be an executable):
foreach my $executable_file(@ARGV) {
# debug message
print("Reading input file \"$executable_file\" ...\n") if ($debug);
# verify/open file specified.
open (INPUT_FP, "<", $executable_file) || die("Error: failed to open file: $executable_file\n"); binmode INPUT_FP;
# kernel section information
my $escaped_name=quotemeta($executable_file); my $bundle_section_name = ".hip_fatbin"; my $bundle_section_size = hex(`$objdump -h $escaped_name | grep $bundle_section_name | awk '{print \$3}'`); my $bundle_section_offset = hex(`$objdump -h $escaped_name | grep $bundle_section_name | awk '{print \$6}'`); $bundle_section_size or die("Error: No kernel section found\n"); my $bundle_section_end = $bundle_section_offset + $bundle_section_size;
if ($debug) { printf("Code Objects Bundle section size: %x\n",$bundle_section_size); printf("Code Objects Bundle section offset: %x\n",$bundle_section_offset); printf("Code Objects Bundle section end: %x\n\n",$bundle_section_end); }
my $current_bundle_offset = $bundle_section_offset; printf("Current Bundle offset: 0x%X\n",$current_bundle_offset) if ($debug);
# move fp to current_bundle_offset.
seek(INPUT_FP, $current_bundle_offset, 0);
while ($current_bundle_offset < $bundle_section_end) {
# skip OFFLOAD_BUNDLER_MAGIC_STR
my $magic_str; my $read_bytes = read(INPUT_FP, $magic_str, 24); if (($read_bytes != 24) || ($magic_str ne "__CLANG_OFFLOAD_BUNDLE__")) { print(STDERR "Error: Offload bundle magic string not detected\n") if ($debug); last; }
# read number of bundle entries, which are code objects.
my $num_codeobjects; readq(\*INPUT_FP,\$num_codeobjects);
# header with current bundle number and number of embedded code objects in that bundle.
# print("Bundle Number: $num_bundles with $num_codeobjects Code Objects:\n") if ($very_verbose);
my $end_of_current_bundle = $current_bundle_offset;
# Column Header.
printf("%-8s%-40s%35s\n","Bundle#","Entry ID:","URI:") if ($verbose);
# for each Bundle entry (code object) ....
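# Layout of each bundle entry header, as implied by the reads in the loop
# below (all fields little-endian): a u64 code object offset (relative to the
# start of this bundle), a u64 code object size, a u64 triple length, then the
# triple string itself (e.g. "hipv4-amdgcn-amd-amdhsa--gfx906").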
for (my $iter = 0; $iter < $num_codeobjects; $iter++) { print("\nEntry #$iter\n") if $debug; # read bundle entry (code object) offset my $entry_offset; my $abs_offset; readq(*INPUT_FP,\$entry_offset); printf("entry_offset: 0x%X\n",$entry_offset) if $debug; # read bundle entry (code object) size my $entry_size; readq(*INPUT_FP,\$entry_size); printf("entry_size: 0x%X\n",$entry_size) if $debug; # read triple size my $triple_size; readq(*INPUT_FP,\$triple_size); printf("triple_size: 0x%X\n",$triple_size) if $debug; # read triple string my $triple; my $read_bytes = read(INPUT_FP, $triple, $triple_size); $read_bytes == $triple_size or die("Error: Fail to parse triple\n"); print("triple: $triple\n") if $debug; # because the bundle entry's offset is relative to the beginning of the bundled code object section. $abs_offset = int($current_bundle_offset + $entry_offset); # and we need to keep track of where we are in the current bundle. $end_of_current_bundle = int($abs_offset + $entry_size); printf("abs_offset: 0x%X\n",$abs_offset) if $debug; my $obj_uri_encode = URI::Encode->new(); my $encoded_executable_file = $obj_uri_encode->encode($executable_file); printf("%-8s%-40s%35s%s%s%s%s%s%s\n",$num_bundles,$triple,"file:\/\/",$encoded_executable_file,"\#offset=",$abs_offset, "\&size=",$entry_size); printf("end_of_current_bundle: 0x%X\n",$end_of_current_bundle) if $debug; printf("Hex values: file:\/\/$encoded_executable_file#offset=0x%X$abs_offset\&size=0x%X\n", $abs_offset, $entry_size) if $debug; } # End of for each Bundle entry (code object) ... printf("\n") if ($verbose); # we've finished listing this current bundle ... printf("current_bundle_offset: %x \n",$current_bundle_offset) if ($debug); printf("bundle_section_end: %x \n", $bundle_section_end) if ($debug); # move current_bundle_offset to next alignment boundary. $current_bundle_offset = align_up($end_of_current_bundle,$bundle_alignment); printf("Adjusting for alignment of next bundle: current_bundle_offset: %x \n\n\n", $current_bundle_offset) if ($debug); # seek to the end of the current bundle: seek(INPUT_FP, $current_bundle_offset, 0); # increment the number of bundles listed. $num_bundles = $num_bundles+1; } # End of while loop } # End of for each command line argument exit(0); clr-rocm-5.7.1/hipamd/header_template.hpp.in000066400000000000000000000044151450307266000207540ustar00rootroot00000000000000/* Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef @include_guard@ #define @include_guard@ #ifndef ROCM_HEADER_WRAPPER_WERROR #define ROCM_HEADER_WRAPPER_WERROR @deprecated_error@ #endif #if ROCM_HEADER_WRAPPER_WERROR /* ROCM_HEADER_WRAPPER_WERROR 1 */ #error "@file_name@ has moved to @CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_INCLUDEDIR@/@headerfile_dir@ and package include paths have changed. Provide include path as @CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_INCLUDEDIR@ when using cmake packages." #else /* ROCM_HEADER_WRAPPER_WERROR 0 */ #if defined(__GNUC__) #warning "@file_name@ has moved to @CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_INCLUDEDIR@/@headerfile_dir@ and package include paths have changed. Provide include path as @CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_INCLUDEDIR@ when using cmake packages." #else #pragma message ("@file_name@ has moved to @CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_INCLUDEDIR@/@headerfile_dir@ and package include paths have changed. Provide include path as @CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_INCLUDEDIR@ when using cmake packages.") #endif #endif /* ROCM_HEADER_WRAPPER_WERROR */ @include_statements@ @hashzero_check@ @file_contents@ @hash_endif@ #endif clr-rocm-5.7.1/hipamd/hip-backward-compat.cmake000066400000000000000000000276741450307266000213460ustar00rootroot00000000000000# Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved. # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. cmake_minimum_required(VERSION 3.16.8) set(HIP_BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR}) set(HIP_WRAPPER_DIR ${HIP_BUILD_DIR}/wrapper_dir) set(HIP_WRAPPER_INC_DIR ${HIP_WRAPPER_DIR}/include/hip) set(HIP_WRAPPER_BIN_DIR ${HIP_WRAPPER_DIR}/bin) set(HIP_WRAPPER_LIB_DIR ${HIP_WRAPPER_DIR}/lib) set(HIP_WRAPPER_CMAKE_DIR ${HIP_WRAPPER_DIR}/cmake) set(HIP_WRAPPER_FINDHIP_DIR ${HIP_WRAPPER_DIR}/FindHIP) set(HIP_SRC_INC_DIR ${HIP_SRC_PATH}/include/hip) set(HIP_SRC_BIN_DIR ${HIP_SRC_PATH}/bin) set(HIP_INFO_FILE ".hipInfo") set(HIP_AMD_DETAIL_DIR "amd_detail") set(HIP_NVIDIA_DETAIL_DIR "nvidia_detail") #Function to set actual file contents in wrapper files #Some components grep for the contents in the file function(set_file_contents input_file) set(hashzero_check "#if 0 /* The following is a copy of the original file for the benefit of build systems which grep for values * in this file rather than preprocess it. 
This is just for backward compatibility */")
file(READ ${input_file} file_contents)
set(hash_endif "#endif")
get_filename_component(file_name ${input_file} NAME)
configure_file(${HIP_SRC_PATH}/header_template.hpp.in ${HIP_WRAPPER_INC_DIR}/${file_name})
endfunction()
#use header template file and generate wrapper header files
function(generate_wrapper_header)
#create respective folder in /opt/rocm/hip
file(MAKE_DIRECTORY ${HIP_WRAPPER_INC_DIR}/${HIP_AMD_DETAIL_DIR})
file(MAKE_DIRECTORY ${HIP_WRAPPER_INC_DIR}/${HIP_NVIDIA_DETAIL_DIR})
#find all header files from include/hip
file(GLOB include_files ${HIP_BUILD_DIR}/include/hip/*.h)
#Convert the list of files into #includes
foreach(header_file ${include_files})
# set include guard
get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE)
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
set(include_guard "HIP_WRAPPER_INCLUDE_HIP_${INC_GAURD_NAME}_H")
#set #include statement
get_filename_component(file_name ${header_file} NAME)
set(headerfile_dir "hip")
set(include_statements "#include \"../../../${CMAKE_INSTALL_INCLUDEDIR}/${headerfile_dir}/${file_name}\"\n")
if(${file_name} STREQUAL "hip_version.h") set_file_contents(${header_file}) else() configure_file(${HIP_SRC_PATH}/header_template.hpp.in ${HIP_WRAPPER_INC_DIR}/${file_name}) endif()
endforeach()
#find all header files from include/hip/amd_detail
file(GLOB include_files ${HIP_SRC_INC_DIR}/${HIP_AMD_DETAIL_DIR}/*)
#Convert the list of files into #includes
foreach(header_file ${include_files})
# set include guard
get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE)
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
set(include_guard "HIP_WRAPPER_INCLUDE_HIP_AMD_DETAIL_${INC_GAURD_NAME}_H")
#set #include statement
get_filename_component(file_name ${header_file} NAME)
set(headerfile_dir "hip/${HIP_AMD_DETAIL_DIR}")
set(include_statements "#include \"../../../../${CMAKE_INSTALL_INCLUDEDIR}/${headerfile_dir}/${file_name}\"\n")
configure_file(${HIP_SRC_PATH}/header_template.hpp.in ${HIP_WRAPPER_INC_DIR}/${HIP_AMD_DETAIL_DIR}/${file_name})
endforeach()
#find all header files from include/hip/nvidia_detail
file(GLOB include_files ${HIP_SRC_INC_DIR}/${HIP_NVIDIA_DETAIL_DIR}/*)
#Convert the list of files into #includes
foreach(header_file ${include_files})
# set include guard
get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE)
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
set(include_guard "HIP_WRAPPER_INCLUDE_HIP_NVIDIA_DETAIL_${INC_GAURD_NAME}_H")
#set #include statement
get_filename_component(file_name ${header_file} NAME)
set(headerfile_dir "hip/${HIP_NVIDIA_DETAIL_DIR}")
set(include_statements "#include \"../../../../${CMAKE_INSTALL_INCLUDEDIR}/${headerfile_dir}/${file_name}\"\n")
configure_file(${HIP_SRC_PATH}/header_template.hpp.in ${HIP_WRAPPER_INC_DIR}/${HIP_NVIDIA_DETAIL_DIR}/${file_name})
endforeach()
endfunction()
#function to create symlink to binaries
function(create_binary_symlink)
file(MAKE_DIRECTORY ${HIP_WRAPPER_BIN_DIR})
#get all binaries
file(GLOB binary_files ${HIP_SRC_BIN_DIR}/*)
#Add .hipVersion to binary list
set(binary_files "${binary_files}" ".hipVersion")
foreach(binary_file ${binary_files})
get_filename_component(file_name ${binary_file} NAME)
add_custom_target(link_${file_name} ALL WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} COMMAND ${CMAKE_COMMAND} -E create_symlink ../../${CMAKE_INSTALL_BINDIR}/${file_name} ${HIP_WRAPPER_BIN_DIR}/${file_name})
endforeach()
endfunction()
#function to create symlink to libraries
function(create_library_symlink)
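# Illustrative effect (assuming BUILD_SHARED_LIBS=ON and CMAKE_INSTALL_LIBDIR=lib):
# after install, each wrapper link resolves relative to its own location, e.g.
#   <prefix>/hip/lib/libamdhip64.so -> ../../lib/libamdhip64.so
# so the deprecated <prefix>/hip/lib layout keeps pointing at the real libraries.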
file(MAKE_DIRECTORY ${HIP_WRAPPER_LIB_DIR}) if(BUILD_SHARED_LIBS) set(LIB_AMDHIP "libamdhip64.so") set(MAJ_VERSION "${HIP_LIB_VERSION_MAJOR}") set(SO_VERSION "${HIP_LIB_VERSION_STRING}") set(library_files "${LIB_AMDHIP}" "${LIB_AMDHIP}.${MAJ_VERSION}" "${LIB_AMDHIP}.${SO_VERSION}") set(LIB_HIPRTC "libhiprtc-builtins.so") set(library_files "${library_files}" "${LIB_HIPRTC}" "${LIB_HIPRTC}.${MAJ_VERSION}" "${LIB_HIPRTC}.${SO_VERSION}" ) set(LIB_RTC "libhiprtc.so") set(library_files "${library_files}" "${LIB_RTC}" "${LIB_RTC}.${MAJ_VERSION}" "${LIB_RTC}.${SO_VERSION}" ) else() set(library_files "libamdhip64.a") endif() foreach(file_name ${library_files}) add_custom_target(link_${file_name} ALL WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} COMMAND ${CMAKE_COMMAND} -E create_symlink ../../${CMAKE_INSTALL_LIBDIR}/${file_name} ${HIP_WRAPPER_LIB_DIR}/${file_name}) endforeach() #Add symlink for .hipInfo set(file_name ${HIP_INFO_FILE}) add_custom_target(link_${file_name} ALL WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} COMMAND ${CMAKE_COMMAND} -E create_symlink ../../${CMAKE_INSTALL_LIBDIR}/${file_name} ${HIP_WRAPPER_LIB_DIR}/${file_name}) endfunction() function(create_cmake_symlink) file(MAKE_DIRECTORY ${HIP_WRAPPER_CMAKE_DIR}/hip) #create symlink to all hip config files file(GLOB config_files ${HIP_BUILD_DIR}/hip-config*) foreach(config_name ${config_files}) get_filename_component(file_name ${config_name} NAME) add_custom_target(link_${file_name} ALL WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} COMMAND ${CMAKE_COMMAND} -E create_symlink ../../../../${CMAKE_INSTALL_LIBDIR}/cmake/hip/${file_name} ${HIP_WRAPPER_CMAKE_DIR}/hip/${file_name}) endforeach() unset(config_files) #create symlink to hip-lang file(MAKE_DIRECTORY ${HIP_WRAPPER_CMAKE_DIR}/hip-lang) file(GLOB config_files ${HIP_BUILD_DIR}/src/hip-lang-config*) foreach(config_name ${config_files}) get_filename_component(file_name ${config_name} NAME) add_custom_target(link_${file_name} ALL WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} COMMAND ${CMAKE_COMMAND} -E create_symlink ../../../../${CMAKE_INSTALL_LIBDIR}/cmake/hip-lang/${file_name} ${HIP_WRAPPER_CMAKE_DIR}/hip-lang/${file_name}) endforeach() unset(config_files) #create symlink to hiprtc config files file(MAKE_DIRECTORY ${HIP_WRAPPER_CMAKE_DIR}/hiprtc) file(GLOB config_files ${HIP_BUILD_DIR}/hiprtc-config*) foreach(config_name ${config_files}) get_filename_component(file_name ${config_name} NAME) add_custom_target(link_${file_name} ALL WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} COMMAND ${CMAKE_COMMAND} -E create_symlink ../../../../${CMAKE_INSTALL_LIBDIR}/cmake/hiprtc/${file_name} ${HIP_WRAPPER_CMAKE_DIR}/hiprtc/${file_name}) endforeach() unset(config_files) #create symlink to FindHIP file(MAKE_DIRECTORY ${HIP_WRAPPER_FINDHIP_DIR}/FindHIP) file(GLOB config_files ${HIP_BUILD_DIR}/cmake/FindHIP/*.cmake) foreach(config_name ${config_files}) get_filename_component(file_name ${config_name} NAME) add_custom_target(link_${file_name} ALL WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} COMMAND ${CMAKE_COMMAND} -E create_symlink ../../../${CMAKE_INSTALL_LIBDIR}/cmake/hip/FindHIP/${file_name} ${HIP_WRAPPER_FINDHIP_DIR}/FindHIP/${file_name}) endforeach() unset(config_files) file(GLOB config_files ${HIP_BUILD_DIR}/cmake/*.cmake) foreach(config_name ${config_files}) get_filename_component(file_name ${config_name} NAME) add_custom_target(link_${file_name} ALL WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} COMMAND ${CMAKE_COMMAND} -E create_symlink ../../${CMAKE_INSTALL_LIBDIR}/cmake/hip/${file_name} 
${HIP_WRAPPER_FINDHIP_DIR}/${file_name}) endforeach() unset(config_files) endfunction() #Use template header file and generater wrapper header files generate_wrapper_header() install(DIRECTORY ${HIP_WRAPPER_INC_DIR} DESTINATION hip/include COMPONENT dev) # Create symlink to binaries create_binary_symlink() install(DIRECTORY ${HIP_WRAPPER_BIN_DIR} DESTINATION hip COMPONENT dev) option(BUILD_SHARED_LIBS "Build the shared library" ON) # Create symlink to library files create_library_symlink() if(HIP_PLATFORM STREQUAL "amd" ) if(BUILD_SHARED_LIBS) install(FILES ${HIP_WRAPPER_LIB_DIR}/libamdhip64.so DESTINATION hip/lib COMPONENT binary) install(FILES ${HIP_WRAPPER_LIB_DIR}/libamdhip64.so.${HIP_LIB_VERSION_MAJOR} DESTINATION hip/lib COMPONENT binary) install(FILES ${HIP_WRAPPER_LIB_DIR}/libamdhip64.so.${HIP_LIB_VERSION_STRING} DESTINATION hip/lib COMPONENT binary) install(FILES ${HIP_WRAPPER_LIB_DIR}/libhiprtc-builtins.so DESTINATION hip/lib COMPONENT binary) install(FILES ${HIP_WRAPPER_LIB_DIR}/libhiprtc-builtins.so.${HIP_LIB_VERSION_MAJOR} DESTINATION hip/lib COMPONENT binary) install(FILES ${HIP_WRAPPER_LIB_DIR}/libhiprtc-builtins.so.${HIP_LIB_VERSION_STRING} DESTINATION hip/lib COMPONENT binary) install(FILES ${HIP_WRAPPER_LIB_DIR}/libhiprtc.so DESTINATION hip/lib COMPONENT binary) install(FILES ${HIP_WRAPPER_LIB_DIR}/libhiprtc.so.${HIP_LIB_VERSION_MAJOR} DESTINATION hip/lib COMPONENT binary) install(FILES ${HIP_WRAPPER_LIB_DIR}/libhiprtc.so.${HIP_LIB_VERSION_STRING} DESTINATION hip/lib COMPONENT binary) else() install(FILES ${HIP_WRAPPER_LIB_DIR}/libamdhip64.a DESTINATION hip/lib COMPONENT binary) endif()#End BUILD_SHARED_LIBS endif()#End HIP_PLATFORM AMD #install hipInfo install(FILES ${HIP_WRAPPER_LIB_DIR}/${HIP_INFO_FILE} DESTINATION hip/lib COMPONENT binary) #create symlink to cmake files create_cmake_symlink() install(DIRECTORY ${HIP_WRAPPER_CMAKE_DIR}/hip-lang DESTINATION hip/lib/cmake COMPONENT binary) install(DIRECTORY ${HIP_WRAPPER_CMAKE_DIR}/hiprtc DESTINATION hip/lib/cmake COMPONENT binary) install(DIRECTORY ${HIP_WRAPPER_CMAKE_DIR}/hip DESTINATION hip/lib/cmake COMPONENT dev) install(DIRECTORY ${HIP_WRAPPER_FINDHIP_DIR}/ DESTINATION hip/cmake COMPONENT dev) clr-rocm-5.7.1/hipamd/hip-config-amd.cmake000077500000000000000000000134671450307266000203110ustar00rootroot00000000000000# Copyright (c) 2023 Advanced Micro Devices, Inc. All Rights Reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. 
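# Illustrative downstream usage of the hip package on the AMD platform (the
# project and target names here are hypothetical, not part of this config):
#   find_package(hip REQUIRED)
#   add_executable(my_app main.cpp)
#   target_link_libraries(my_app PRIVATE hip::device)
# Host-only consumers can link hip::host instead of hip::device.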
cmake_minimum_required(VERSION 3.3) # Number of parallel jobs by default is 1 if(NOT DEFINED HIP_CLANG_NUM_PARALLEL_JOBS) set(HIP_CLANG_NUM_PARALLEL_JOBS 1) endif() # Windows Specific Definition here: if(WIN32) if(DEFINED ENV{HIP_PATH}) file(TO_CMAKE_PATH "$ENV{HIP_PATH}" HIP_PATH) elseif(DEFINED ENV{HIP_DIR}) file(TO_CMAKE_PATH "$ENV{HIP_DIR}" HIP_DIR) else() # using the HIP found set(HIP_PATH ${PACKAGE_PREFIX_DIR}) endif() else() # Linux # If HIP is not installed under ROCm, need this to find HSA assuming HSA is under ROCm if(DEFINED ENV{ROCM_PATH}) set(ROCM_PATH "$ENV{ROCM_PATH}") endif() # set a default path for ROCM_PATH if(NOT DEFINED ROCM_PATH) set(ROCM_PATH ${PACKAGE_PREFIX_DIR}) endif() endif() if(WIN32) # Using SDK folder file(TO_CMAKE_PATH "${HIP_PATH}" HIP_CLANG_ROOT) if (NOT EXISTS "${HIP_CLANG_ROOT}/bin/clang.exe") # if using install folder file(TO_CMAKE_PATH "${HIP_PATH}/../lc" HIP_CLANG_ROOT) endif() else() set(HIP_CLANG_ROOT "${ROCM_PATH}/llvm") endif() if(NOT HIP_CXX_COMPILER) set(HIP_CXX_COMPILER ${CMAKE_CXX_COMPILER}) endif() if(NOT WIN32) find_dependency(AMDDeviceLibs) endif() set(AMDGPU_TARGETS "" CACHE STRING "AMD GPU targets to compile for") set(GPU_TARGETS "${AMDGPU_TARGETS}" CACHE STRING "GPU targets to compile for") if(NOT WIN32) find_dependency(amd_comgr) endif() include( "${CMAKE_CURRENT_LIST_DIR}/hip-targets.cmake" ) #Using find_dependency to locate the dependency for the packages #This makes the cmake generated file xxxx-targets to supply the linker libraries # without worrying other transitive dependencies if(NOT WIN32) find_dependency(hsa-runtime64) find_dependency(Threads) endif() set(_IMPORT_PREFIX ${HIP_PACKAGE_PREFIX_DIR}) # Right now this is only supported for amd platforms set_target_properties(hip::host PROPERTIES INTERFACE_COMPILE_DEFINITIONS "__HIP_PLATFORM_HCC__=1;__HIP_PLATFORM_AMD__=1" ) set_target_properties(hip::amdhip64 PROPERTIES INTERFACE_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include" INTERFACE_SYSTEM_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include" ) get_target_property(amdhip64_type hip::amdhip64 TYPE) message(STATUS "hip::amdhip64 is ${amdhip64_type}") if(NOT WIN32) set_target_properties(hip::device PROPERTIES INTERFACE_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include" INTERFACE_SYSTEM_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include" ) endif() get_property(compilePropIsSet TARGET hip::device PROPERTY INTERFACE_COMPILE_OPTIONS SET) if (NOT compilePropIsSet AND HIP_CXX_COMPILER MATCHES ".*clang\\+\\+") hip_add_interface_compile_flags(hip::device -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false) endif() if (NOT compilePropIsSet) hip_add_interface_compile_flags(hip::device -x hip) endif() hip_add_interface_link_flags(hip::device --hip-link) foreach(GPU_TARGET ${GPU_TARGETS}) if (NOT compilePropIsSet) hip_add_interface_compile_flags(hip::device --offload-arch=${GPU_TARGET}) endif() hip_add_interface_link_flags(hip::device --offload-arch=${GPU_TARGET}) endforeach() #Add support for parallel build and link if(${CMAKE_CXX_COMPILER_ID} STREQUAL "Clang") check_cxx_compiler_flag("-parallel-jobs=1" HIP_CLANG_SUPPORTS_PARALLEL_JOBS) endif() if(HIP_CLANG_NUM_PARALLEL_JOBS GREATER 1) if(${HIP_CLANG_SUPPORTS_PARALLEL_JOBS} ) if (NOT compilePropIsSet) hip_add_interface_compile_flags(hip::device -parallel-jobs=${HIP_CLANG_NUM_PARALLEL_JOBS} -Wno-format-nonliteral) endif() hip_add_interface_link_flags(hip::device -parallel-jobs=${HIP_CLANG_NUM_PARALLEL_JOBS}) else() message("clang compiler doesn't support parallel jobs") endif() 
endif() # Use HIP_CXX option -print-libgcc-file-name --rtlib=compiler-rt # To fetch the compiler rt library file name. execute_process( COMMAND ${HIP_CXX_COMPILER} -print-libgcc-file-name --rtlib=compiler-rt OUTPUT_VARIABLE CLANGRT_BUILTINS ERROR_VARIABLE CLANGRT_Error OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_STRIP_TRAILING_WHITESPACE RESULT_VARIABLE CLANGRT_BUILTINS_FETCH_EXIT_CODE) if( CLANGRT_Error ) message( STATUS "${HIP_CXX_COMPILER}: CLANGRT compiler options not supported.") else() # Add support for __fp16 and _Float16, explicitly link with compiler-rt if( "${CLANGRT_BUILTINS_FETCH_EXIT_CODE}" STREQUAL "0" ) # CLANG_RT Builtins found Successfully Set interface link libraries property set_property(TARGET hip::host APPEND PROPERTY INTERFACE_LINK_LIBRARIES "${CLANGRT_BUILTINS}") set_property(TARGET hip::device APPEND PROPERTY INTERFACE_LINK_LIBRARIES "${CLANGRT_BUILTINS}") else() message(STATUS "clangrt builtins lib not found: ${CLANGRT_BUILTINS_FETCH_EXIT_CODE}") endif() # CLANGRT_BUILTINS_FETCH_EXIT_CODE Check endif() # CLANGRT_Error Check clr-rocm-5.7.1/hipamd/hip-config-nvidia.cmake000077500000000000000000000024231450307266000210100ustar00rootroot00000000000000# Copyright (c) 2023 Advanced Micro Devices, Inc. All Rights Reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. cmake_minimum_required(VERSION 3.3) add_library(hip::device INTERFACE IMPORTED) add_library(hip::host INTERFACE IMPORTED) add_library(hip::amdhip64 INTERFACE IMPORTED) clr-rocm-5.7.1/hipamd/hip-config.cmake.in000077500000000000000000000131561450307266000201520ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All Rights Reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. cmake_minimum_required(VERSION 3.3) @PACKAGE_INIT@ include(CheckCXXCompilerFlag) include(CMakeFindDependencyMacro OPTIONAL RESULT_VARIABLE _CMakeFindDependencyMacro_FOUND) if (NOT _CMakeFindDependencyMacro_FOUND) macro(find_dependency dep) if (NOT ${dep}_FOUND) set(cmake_fd_version) if (${ARGC} GREATER 1) set(cmake_fd_version ${ARGV1}) endif() set(cmake_fd_exact_arg) if(${CMAKE_FIND_PACKAGE_NAME}_FIND_VERSION_EXACT) set(cmake_fd_exact_arg EXACT) endif() set(cmake_fd_quiet_arg) if(${CMAKE_FIND_PACKAGE_NAME}_FIND_QUIETLY) set(cmake_fd_quiet_arg QUIET) endif() set(cmake_fd_required_arg) if(${CMAKE_FIND_PACKAGE_NAME}_FIND_REQUIRED) set(cmake_fd_required_arg REQUIRED) endif() find_package(${dep} ${cmake_fd_version} ${cmake_fd_exact_arg} ${cmake_fd_quiet_arg} ${cmake_fd_required_arg} ) string(TOUPPER ${dep} cmake_dep_upper) if (NOT ${dep}_FOUND AND NOT ${cmake_dep_upper}_FOUND) set(${CMAKE_FIND_PACKAGE_NAME}_NOT_FOUND_MESSAGE "${CMAKE_FIND_PACKAGE_NAME} could not be found because dependency ${dep} could not be found.") set(${CMAKE_FIND_PACKAGE_NAME}_FOUND False) return() endif() set(cmake_fd_version) set(cmake_fd_required_arg) set(cmake_fd_quiet_arg) set(cmake_fd_exact_arg) endif() endmacro() endif() set(_HIP_SHELL "SHELL:") if(CMAKE_VERSION VERSION_LESS 3.12) set(_HIP_SHELL "") endif() function(hip_add_interface_compile_flags TARGET) set_property(TARGET ${TARGET} APPEND PROPERTY INTERFACE_COMPILE_OPTIONS "$<$:${_HIP_SHELL}${ARGN}>" ) endfunction() function(hip_add_interface_link_flags TARGET) if(CMAKE_VERSION VERSION_LESS 3.20) set_property(TARGET ${TARGET} APPEND PROPERTY INTERFACE_LINK_LIBRARIES "${ARGN}" ) else() set_property(TARGET ${TARGET} APPEND PROPERTY INTERFACE_LINK_LIBRARIES "$<$:${ARGN}>" ) endif() endfunction() # NOTE: If hip-config is invoked from /opt/rocm-ver/hip/lib/cmake/hip/ # then PACKAGE_PREFIX_DIR will resolve to /opt/rocm-ver/hip, which is for backward compatibility # The following will ensure PACKAGE_PREFIX_DIR will resolves to /opt/rocm-ver # First find the real path to hip-config file with symlinks resolved # Real Path : /opt/rocm-ver/lib/cmake/hip/hip-config.cmake # Then go up 4 levels to get PACKAGE_PREFIX_DIR # PACKAGE_PREFIX_DIR : /opt/rocm-ver # TODO:once file reorg backward compatibility is turned off this can be removed. 
if(IS_SYMLINK ${CMAKE_CURRENT_LIST_FILE}) get_filename_component(CONFIG_FILE_PATH "${CMAKE_CURRENT_LIST_FILE}" REALPATH) get_filename_component(PACKAGE_PREFIX_DIR "${CONFIG_FILE_PATH}/../../../../" ABSOLUTE) endif() # end of TODO set(HIP_PACKAGE_PREFIX_DIR ${PACKAGE_PREFIX_DIR}) set_and_check( hip_INCLUDE_DIR "@PACKAGE_INCLUDE_INSTALL_DIR@" ) set_and_check( hip_INCLUDE_DIRS "${hip_INCLUDE_DIR}" ) set_and_check( hip_LIB_INSTALL_DIR "@PACKAGE_LIB_INSTALL_DIR@" ) set_and_check( hip_BIN_INSTALL_DIR "@PACKAGE_BIN_INSTALL_DIR@" ) if(WIN32) set_and_check(hip_HIPCC_EXECUTABLE "${hip_BIN_INSTALL_DIR}/hipcc.bat") set_and_check(hip_HIPCONFIG_EXECUTABLE "${hip_BIN_INSTALL_DIR}/hipconfig.bat") else() set_and_check(hip_HIPCC_EXECUTABLE "${hip_BIN_INSTALL_DIR}/hipcc") set_and_check(hip_HIPCONFIG_EXECUTABLE "${hip_BIN_INSTALL_DIR}/hipconfig") endif() if(NOT DEFINED HIP_PLATFORM) if(NOT DEFINED ENV{HIP_PLATFORM}) execute_process(COMMAND ${hip_HIPCONFIG_EXECUTABLE} --platform OUTPUT_VARIABLE HIP_PLATFORM OUTPUT_STRIP_TRAILING_WHITESPACE) else() set(HIP_PLATFORM $ENV{HIP_PLATFORM} CACHE STRING "HIP Platform") endif() endif() if(HIP_PLATFORM STREQUAL "amd") set(HIP_RUNTIME "rocclr") set(HIP_COMPILER "clang") include( "${hip_LIB_INSTALL_DIR}/cmake/hip/hip-config-amd.cmake" ) elseif(HIP_PLATFORM STREQUAL "nvidia") set(HIP_RUNTIME "cuda") set(HIP_COMPILER "nvcc") include( "${hip_LIB_INSTALL_DIR}/cmake/hip/hip-config-nvidia.cmake" ) else() message(FATAL_ERROR "Unexpected HIP_PLATFORM: " ${HIP_PLATFORM}) endif() set( hip_LIBRARIES hip::host hip::device) set( hip_LIBRARY ${hip_LIBRARIES}) set(HIP_INCLUDE_DIR ${hip_INCLUDE_DIR}) set(HIP_INCLUDE_DIRS ${hip_INCLUDE_DIRS}) set(HIP_LIB_INSTALL_DIR ${hip_LIB_INSTALL_DIR}) set(HIP_BIN_INSTALL_DIR ${hip_BIN_INSTALL_DIR}) set(HIP_LIBRARIES ${hip_LIBRARIES}) set(HIP_LIBRARY ${hip_LIBRARY}) set(HIP_HIPCC_EXECUTABLE ${hip_HIPCC_EXECUTABLE}) set(HIP_HIPCONFIG_EXECUTABLE ${hip_HIPCONFIG_EXECUTABLE}) clr-rocm-5.7.1/hipamd/include/000077500000000000000000000000001450307266000161325ustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/include/hip/000077500000000000000000000000001450307266000167125ustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/include/hip/amd_detail/000077500000000000000000000000001450307266000207755ustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_channel_descriptor.h000066400000000000000000000266311450307266000256450ustar00rootroot00000000000000/* Copyright (c) 2015 - 2022 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef HIP_INCLUDE_HIP_AMD_DETAIL_CHANNEL_DESCRIPTOR_H #define HIP_INCLUDE_HIP_AMD_DETAIL_CHANNEL_DESCRIPTOR_H #include #include #include
#ifdef __cplusplus
extern "C" HIP_PUBLIC_API hipChannelFormatDesc hipCreateChannelDesc(int x, int y, int z, int w, hipChannelFormatKind f);
static inline hipChannelFormatDesc hipCreateChannelDescHalf() { int e = (int)sizeof(unsigned short) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindFloat); }
static inline hipChannelFormatDesc hipCreateChannelDescHalf1() { int e = (int)sizeof(unsigned short) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindFloat); }
static inline hipChannelFormatDesc hipCreateChannelDescHalf2() { int e = (int)sizeof(unsigned short) * 8; return hipCreateChannelDesc(e, e, 0, 0, hipChannelFormatKindFloat); }
static inline hipChannelFormatDesc hipCreateChannelDescHalf4() { int e = (int)sizeof(unsigned short) * 8; return hipCreateChannelDesc(e, e, e, e, hipChannelFormatKindFloat); }
template <typename T> static inline hipChannelFormatDesc hipCreateChannelDesc() { return hipCreateChannelDesc(0, 0, 0, 0, hipChannelFormatKindNone); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<char>() { int e = (int)sizeof(char) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<signed char>() { int e = (int)sizeof(signed char) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<unsigned char>() { int e = (int)sizeof(unsigned char) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<uchar1>() { int e = (int)sizeof(unsigned char) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<char1>() { int e = (int)sizeof(signed char) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<uchar2>() { int e = (int)sizeof(unsigned char) * 8; return hipCreateChannelDesc(e, e, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<char2>() { int e = (int)sizeof(signed char) * 8; return hipCreateChannelDesc(e, e, 0, 0, hipChannelFormatKindSigned); }
#ifndef __GNUC__ // vector3 is the same as vector4
template <> inline hipChannelFormatDesc hipCreateChannelDesc<uchar3>() { int e = (int)sizeof(unsigned char) * 8; return hipCreateChannelDesc(e, e, e, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<char3>() { int e = (int)sizeof(signed char) * 8; return hipCreateChannelDesc(e, e, e, 0, hipChannelFormatKindSigned); }
#endif
template <> inline hipChannelFormatDesc hipCreateChannelDesc<uchar4>() { int e = (int)sizeof(unsigned char) * 8; return hipCreateChannelDesc(e, e, e, e, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<char4>() { int e = (int)sizeof(signed char) * 8; return hipCreateChannelDesc(e, e, e, e, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<unsigned short>() { int e = (int)sizeof(unsigned short) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<signed short>() { int e = (int)sizeof(signed short) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<ushort1>() { int e = (int)sizeof(unsigned short) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<short1>() { int e = (int)sizeof(signed short) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<ushort2>() { int e = (int)sizeof(unsigned short) * 8; return hipCreateChannelDesc(e, e, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<short2>() { int e = (int)sizeof(signed short) * 8; return hipCreateChannelDesc(e, e, 0, 0, hipChannelFormatKindSigned); }
#ifndef __GNUC__
template <> inline hipChannelFormatDesc hipCreateChannelDesc<ushort3>() { int e = (int)sizeof(unsigned short) * 8; return hipCreateChannelDesc(e, e, e, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<short3>() { int e = (int)sizeof(signed short) * 8; return hipCreateChannelDesc(e, e, e, 0, hipChannelFormatKindSigned); }
#endif
template <> inline hipChannelFormatDesc hipCreateChannelDesc<ushort4>() { int e = (int)sizeof(unsigned short) * 8; return hipCreateChannelDesc(e, e, e, e, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<short4>() { int e = (int)sizeof(signed short) * 8; return hipCreateChannelDesc(e, e, e, e, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<unsigned int>() { int e = (int)sizeof(unsigned int) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<signed int>() { int e = (int)sizeof(signed int) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<uint1>() { int e = (int)sizeof(unsigned int) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<int1>() { int e = (int)sizeof(signed int) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<uint2>() { int e = (int)sizeof(unsigned int) * 8; return hipCreateChannelDesc(e, e, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<int2>() { int e = (int)sizeof(signed int) * 8; return hipCreateChannelDesc(e, e, 0, 0, hipChannelFormatKindSigned); }
#ifndef __GNUC__
template <> inline hipChannelFormatDesc hipCreateChannelDesc<uint3>() { int e = (int)sizeof(unsigned int) * 8; return hipCreateChannelDesc(e, e, e, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<int3>() { int e = (int)sizeof(signed int) * 8; return hipCreateChannelDesc(e, e, e, 0, hipChannelFormatKindSigned); }
#endif
template <> inline hipChannelFormatDesc hipCreateChannelDesc<uint4>() { int e = (int)sizeof(unsigned int) * 8; return hipCreateChannelDesc(e, e, e, e, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<int4>() { int e = (int)sizeof(signed int) * 8; return hipCreateChannelDesc(e, e, e, e, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<float>() { int e = (int)sizeof(float) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindFloat); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<float1>() { int e = (int)sizeof(float) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindFloat); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<float2>() { int e = (int)sizeof(float) * 8; return hipCreateChannelDesc(e, e, 0, 0, hipChannelFormatKindFloat); }
#ifndef __GNUC__
template <> inline hipChannelFormatDesc hipCreateChannelDesc<float3>() { int e = (int)sizeof(float) * 8; return hipCreateChannelDesc(e, e, e, 0, hipChannelFormatKindFloat); }
#endif
template <> inline hipChannelFormatDesc hipCreateChannelDesc<float4>() { int e = (int)sizeof(float) * 8; return hipCreateChannelDesc(e, e, e, e, hipChannelFormatKindFloat); }
#if !defined(__LP64__)
template <> inline hipChannelFormatDesc hipCreateChannelDesc<unsigned long>() { int e = (int)sizeof(unsigned long) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<signed long>() { int e = (int)sizeof(signed long) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<ulong1>() { int e = (int)sizeof(unsigned long) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<long1>() { int e = (int)sizeof(signed long) * 8; return hipCreateChannelDesc(e, 0, 0, 0, hipChannelFormatKindSigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<ulong2>() { int e = (int)sizeof(unsigned long) * 8; return hipCreateChannelDesc(e, e, 0, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<long2>() { int e = (int)sizeof(signed long) * 8; return hipCreateChannelDesc(e, e, 0, 0, hipChannelFormatKindSigned); }
#ifndef __GNUC__
template <> inline hipChannelFormatDesc hipCreateChannelDesc<ulong3>() { int e = (int)sizeof(unsigned long) * 8; return hipCreateChannelDesc(e, e, e, 0, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<long3>() { int e = (int)sizeof(signed long) * 8; return hipCreateChannelDesc(e, e, e, 0, hipChannelFormatKindSigned); }
#endif
template <> inline hipChannelFormatDesc hipCreateChannelDesc<ulong4>() { int e = (int)sizeof(unsigned long) * 8; return hipCreateChannelDesc(e, e, e, e, hipChannelFormatKindUnsigned); }
template <> inline hipChannelFormatDesc hipCreateChannelDesc<long4>() { int e = (int)sizeof(signed long) * 8; return hipCreateChannelDesc(e, e, e, e, hipChannelFormatKindSigned); }
#endif /* !__LP64__ */
#else
struct hipChannelFormatDesc hipCreateChannelDesc(int x, int y, int z, int w, enum hipChannelFormatKind f);
#endif /* __cplusplus */
#endif /* !HIP_INCLUDE_HIP_AMD_DETAIL_CHANNEL_DESCRIPTOR_H */
clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_device_functions.h000066400000000000000000001022061450307266000253170ustar00rootroot00000000000000/* Copyright (c) 2015 - 2023 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */
#ifndef HIP_INCLUDE_HIP_AMD_DETAIL_DEVICE_FUNCTIONS_H
#define HIP_INCLUDE_HIP_AMD_DETAIL_DEVICE_FUNCTIONS_H
#include "host_defines.h"
#include "math_fwd.h"
#if !defined(__HIPCC_RTC__) #include #include #endif // !defined(__HIPCC_RTC__)
#include #include
#if __HIP_CLANG_ONLY__
extern "C" __device__ int printf(const char *fmt, ...);
#else
template <typename... All> static inline __device__ void printf(const char* format, All... all) {}
#endif // __HIP_CLANG_ONLY__
extern "C" __device__ unsigned long long __ockl_steadyctr_u64();
/* Integer Intrinsics */
// integer intrinsic function __popc __clz __ffs __brev
__device__ static inline unsigned int __popc(unsigned int input) { return __builtin_popcount(input); }
__device__ static inline unsigned int __popcll(unsigned long long int input) { return __builtin_popcountll(input); }
__device__ static inline int __clz(int input) { return __ockl_clz_u32((uint)input); }
__device__ static inline int __clzll(long long int input) { return __ockl_clz_u64((uint64_t)input); }
__device__ static inline unsigned int __ffs(unsigned int input) { return ( input == 0 ? -1 : __builtin_ctz(input) ) + 1; }
__device__ static inline unsigned int __ffsll(unsigned long long int input) { return ( input == 0 ? -1 : __builtin_ctzll(input) ) + 1; }
__device__ static inline unsigned int __ffs(int input) { return ( input == 0 ? -1 : __builtin_ctz(input) ) + 1; }
__device__ static inline unsigned int __ffsll(long long int input) { return ( input == 0 ? -1 : __builtin_ctzll(input) ) + 1; }
// Given a 32/64-bit value exec mask and an integer value base (between 0 and WAVEFRONT_SIZE),
// find the n-th (given by offset) set bit in the exec mask from the base bit, and return the bit position.
// If not found, return -1.
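// Worked example of that contract (illustrative values): for mask = 0x5 (bits 0 and 2 set),
//   __fns64(0x5, 0, 1)  == 0   // 1st set bit at or above base 0
//   __fns64(0x5, 0, 2)  == 2   // 2nd set bit at or above base 0
//   __fns64(0x5, 3, -1) == 2   // 1st set bit at or below base 3
//   __fns64(0x5, 1, 0)  == -1  // offset 0 tests only bit 'base', and bit 1 is clear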
__device__ static int32_t __fns64(uint64_t mask, uint32_t base, int32_t offset) { uint64_t temp_mask = mask; int32_t temp_offset = offset; if (offset == 0) { temp_mask &= (1ULL << base); temp_offset = 1; } else if (offset < 0) { temp_mask = __builtin_bitreverse64(mask); base = 63 - base; temp_offset = -offset; } temp_mask = temp_mask & ((~0ULL) << base); if (__builtin_popcountll(temp_mask) < temp_offset) return -1; int32_t total = 0; for (int i = 0x20; i > 0; i >>= 1) { uint64_t temp_mask_lo = temp_mask & ((1ULL << i) - 1); int32_t pcnt = __builtin_popcountll(temp_mask_lo); if (pcnt < temp_offset) { temp_mask = temp_mask >> i; temp_offset -= pcnt; total += i; } else { temp_mask = temp_mask_lo; } } if (offset < 0) return 63 - total; else return total; }
__device__ static int32_t __fns32(uint64_t mask, uint32_t base, int32_t offset) { uint64_t temp_mask = mask; int32_t temp_offset = offset; if (offset == 0) { temp_mask &= (1ULL << base); temp_offset = 1; } else if (offset < 0) { temp_mask = __builtin_bitreverse64(mask); base = 63 - base; temp_offset = -offset; } temp_mask = temp_mask & ((~0ULL) << base); if (__builtin_popcountll(temp_mask) < temp_offset) return -1; int32_t total = 0; for (int i = 0x20; i > 0; i >>= 1) { uint64_t temp_mask_lo = temp_mask & ((1ULL << i) - 1); int32_t pcnt = __builtin_popcountll(temp_mask_lo); if (pcnt < temp_offset) { temp_mask = temp_mask >> i; temp_offset -= pcnt; total += i; } else { temp_mask = temp_mask_lo; } } if (offset < 0) return 63 - total; else return total; }
__device__ static inline unsigned int __brev(unsigned int input) { return __builtin_bitreverse32(input); }
__device__ static inline unsigned long long int __brevll(unsigned long long int input) { return __builtin_bitreverse64(input); }
__device__ static inline unsigned int __lastbit_u32_u64(uint64_t input) { return input == 0 ? -1 : __builtin_ctzl(input); }
__device__ static inline unsigned int __bitextract_u32(unsigned int src0, unsigned int src1, unsigned int src2) { uint32_t offset = src1 & 31; uint32_t width = src2 & 31; return width == 0 ? 0 : (src0 << (32 - offset - width)) >> (32 - width); }
__device__ static inline uint64_t __bitextract_u64(uint64_t src0, unsigned int src1, unsigned int src2) { uint64_t offset = src1 & 63; uint64_t width = src2 & 63; return width == 0 ? 0 : (src0 << (64 - offset - width)) >> (64 - width); }
__device__ static inline unsigned int __bitinsert_u32(unsigned int src0, unsigned int src1, unsigned int src2, unsigned int src3) { uint32_t offset = src2 & 31; uint32_t width = src3 & 31; uint32_t mask = (1 << width) - 1; return ((src0 & ~(mask << offset)) | ((src1 & mask) << offset)); }
__device__ static inline uint64_t __bitinsert_u64(uint64_t src0, uint64_t src1, unsigned int src2, unsigned int src3) { uint64_t offset = src2 & 63; uint64_t width = src3 & 63; uint64_t mask = (1ULL << width) - 1; return ((src0 & ~(mask << offset)) | ((src1 & mask) << offset)); }
__device__ inline unsigned int __funnelshift_l(unsigned int lo, unsigned int hi, unsigned int shift) { uint32_t mask_shift = shift & 31; return mask_shift == 0 ? hi : __builtin_amdgcn_alignbit(hi, lo, 32 - mask_shift); }
__device__ inline unsigned int __funnelshift_lc(unsigned int lo, unsigned int hi, unsigned int shift) { uint32_t min_shift = shift >= 32 ? 32 : shift; return min_shift == 0 ?
hi : __builtin_amdgcn_alignbit(hi, lo, 32 - min_shift); } __device__ inline unsigned int __funnelshift_r(unsigned int lo, unsigned int hi, unsigned int shift) { return __builtin_amdgcn_alignbit(hi, lo, shift); } __device__ inline unsigned int __funnelshift_rc(unsigned int lo, unsigned int hi, unsigned int shift) { return shift >= 32 ? hi : __builtin_amdgcn_alignbit(hi, lo, shift); } __device__ static unsigned int __byte_perm(unsigned int x, unsigned int y, unsigned int s); __device__ static unsigned int __hadd(int x, int y); __device__ static int __mul24(int x, int y); __device__ static long long int __mul64hi(long long int x, long long int y); __device__ static int __mulhi(int x, int y); __device__ static int __rhadd(int x, int y); __device__ static unsigned int __sad(int x, int y,unsigned int z); __device__ static unsigned int __uhadd(unsigned int x, unsigned int y); __device__ static int __umul24(unsigned int x, unsigned int y); __device__ static unsigned long long int __umul64hi(unsigned long long int x, unsigned long long int y); __device__ static unsigned int __umulhi(unsigned int x, unsigned int y); __device__ static unsigned int __urhadd(unsigned int x, unsigned int y); __device__ static unsigned int __usad(unsigned int x, unsigned int y, unsigned int z); struct ucharHolder { union { unsigned char c[4]; unsigned int ui; }; } __attribute__((aligned(4))); struct uchar2Holder { union { unsigned int ui[2]; unsigned char c[8]; }; } __attribute__((aligned(8))); __device__ static inline unsigned int __byte_perm(unsigned int x, unsigned int y, unsigned int s) { struct uchar2Holder cHoldVal; struct ucharHolder cHoldKey; cHoldKey.ui = s; cHoldVal.ui[0] = x; cHoldVal.ui[1] = y; unsigned int result; result = cHoldVal.c[cHoldKey.c[0] & 0x07]; result += (cHoldVal.c[(cHoldKey.c[0] & 0x70) >> 4] << 8); result += (cHoldVal.c[cHoldKey.c[1] & 0x07] << 16); result += (cHoldVal.c[(cHoldKey.c[1] & 0x70) >> 4] << 24); return result; } __device__ static inline unsigned int __hadd(int x, int y) { int z = x + y; int sign = z & 0x8000000; int value = z & 0x7FFFFFFF; return ((value) >> 1 || sign); } __device__ static inline int __mul24(int x, int y) { return __ockl_mul24_i32(x, y); } __device__ static inline long long __mul64hi(long long int x, long long int y) { ulong x0 = (ulong)x & 0xffffffffUL; long x1 = x >> 32; ulong y0 = (ulong)y & 0xffffffffUL; long y1 = y >> 32; ulong z0 = x0*y0; long t = x1*y0 + (z0 >> 32); long z1 = t & 0xffffffffL; long z2 = t >> 32; z1 = x0*y1 + z1; return x1*y1 + z2 + (z1 >> 32); } __device__ static inline int __mulhi(int x, int y) { return __ockl_mul_hi_i32(x, y); } __device__ static inline int __rhadd(int x, int y) { int z = x + y + 1; int sign = z & 0x8000000; int value = z & 0x7FFFFFFF; return ((value) >> 1 || sign); } __device__ static inline unsigned int __sad(int x, int y, unsigned int z) { return x > y ? 
x - y + z : y - x + z; } __device__ static inline unsigned int __uhadd(unsigned int x, unsigned int y) { return (x + y) >> 1; } __device__ static inline int __umul24(unsigned int x, unsigned int y) { return __ockl_mul24_u32(x, y); } __device__ static inline unsigned long long __umul64hi(unsigned long long int x, unsigned long long int y) { ulong x0 = x & 0xffffffffUL; ulong x1 = x >> 32; ulong y0 = y & 0xffffffffUL; ulong y1 = y >> 32; ulong z0 = x0*y0; ulong t = x1*y0 + (z0 >> 32); ulong z1 = t & 0xffffffffUL; ulong z2 = t >> 32; z1 = x0*y1 + z1; return x1*y1 + z2 + (z1 >> 32); } __device__ static inline unsigned int __umulhi(unsigned int x, unsigned int y) { return __ockl_mul_hi_u32(x, y); } __device__ static inline unsigned int __urhadd(unsigned int x, unsigned int y) { return (x + y + 1) >> 1; } __device__ static inline unsigned int __usad(unsigned int x, unsigned int y, unsigned int z) { return __ockl_sadd_u32(x, y, z); } __device__ static inline unsigned int __lane_id() { return __builtin_amdgcn_mbcnt_hi( -1, __builtin_amdgcn_mbcnt_lo(-1, 0)); } __device__ static inline unsigned int __mbcnt_lo(unsigned int x, unsigned int y) {return __builtin_amdgcn_mbcnt_lo(x,y);}; __device__ static inline unsigned int __mbcnt_hi(unsigned int x, unsigned int y) {return __builtin_amdgcn_mbcnt_hi(x,y);}; /* HIP specific device functions */ #if !defined(__HIPCC_RTC__) #include "amd_warp_functions.h" #endif #define MASK1 0x00ff00ff #define MASK2 0xff00ff00 __device__ static inline char4 __hip_hc_add8pk(char4 in1, char4 in2) { char4 out; unsigned one1 = in1.w & MASK1; unsigned one2 = in2.w & MASK1; out.w = (one1 + one2) & MASK1; one1 = in1.w & MASK2; one2 = in2.w & MASK2; out.w = out.w | ((one1 + one2) & MASK2); return out; } __device__ static inline char4 __hip_hc_sub8pk(char4 in1, char4 in2) { char4 out; unsigned one1 = in1.w & MASK1; unsigned one2 = in2.w & MASK1; out.w = (one1 - one2) & MASK1; one1 = in1.w & MASK2; one2 = in2.w & MASK2; out.w = out.w | ((one1 - one2) & MASK2); return out; } __device__ static inline char4 __hip_hc_mul8pk(char4 in1, char4 in2) { char4 out; unsigned one1 = in1.w & MASK1; unsigned one2 = in2.w & MASK1; out.w = (one1 * one2) & MASK1; one1 = in1.w & MASK2; one2 = in2.w & MASK2; out.w = out.w | ((one1 * one2) & MASK2); return out; } __device__ static inline float __double2float_rd(double x) { return __ocml_cvtrtn_f32_f64(x); } __device__ static inline float __double2float_rn(double x) { return x; } __device__ static inline float __double2float_ru(double x) { return __ocml_cvtrtp_f32_f64(x); } __device__ static inline float __double2float_rz(double x) { return __ocml_cvtrtz_f32_f64(x); } __device__ static inline int __double2hiint(double x) { static_assert(sizeof(double) == 2 * sizeof(int), ""); int tmp[2]; __builtin_memcpy(tmp, &x, sizeof(tmp)); return tmp[1]; } __device__ static inline int __double2loint(double x) { static_assert(sizeof(double) == 2 * sizeof(int), ""); int tmp[2]; __builtin_memcpy(tmp, &x, sizeof(tmp)); return tmp[0]; } __device__ static inline int __double2int_rd(double x) { return (int)__ocml_floor_f64(x); } __device__ static inline int __double2int_rn(double x) { return (int)__ocml_rint_f64(x); } __device__ static inline int __double2int_ru(double x) { return (int)__ocml_ceil_f64(x); } __device__ static inline int __double2int_rz(double x) { return (int)x; } __device__ static inline long long int __double2ll_rd(double x) { return (long long)__ocml_floor_f64(x); } __device__ static inline long long int __double2ll_rn(double x) { return (long 
long)__ocml_rint_f64(x);
}

__device__ static inline long long int __double2ll_ru(double x) {
    return (long long)__ocml_ceil_f64(x);
}

__device__ static inline long long int __double2ll_rz(double x) { return (long long)x; }

__device__ static inline unsigned int __double2uint_rd(double x) {
    return (unsigned int)__ocml_floor_f64(x);
}

__device__ static inline unsigned int __double2uint_rn(double x) {
    return (unsigned int)__ocml_rint_f64(x);
}

__device__ static inline unsigned int __double2uint_ru(double x) {
    return (unsigned int)__ocml_ceil_f64(x);
}

__device__ static inline unsigned int __double2uint_rz(double x) { return (unsigned int)x; }

__device__ static inline unsigned long long int __double2ull_rd(double x) {
    return (unsigned long long int)__ocml_floor_f64(x);
}

__device__ static inline unsigned long long int __double2ull_rn(double x) {
    return (unsigned long long int)__ocml_rint_f64(x);
}

__device__ static inline unsigned long long int __double2ull_ru(double x) {
    return (unsigned long long int)__ocml_ceil_f64(x);
}

__device__ static inline unsigned long long int __double2ull_rz(double x) {
    return (unsigned long long int)x;
}

#if defined(__clang__)
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wc++98-compat-pedantic"
#endif

__device__ static inline long long int __double_as_longlong(double x) {
    static_assert(sizeof(long long) == sizeof(double), "");
    long long tmp;
    __builtin_memcpy(&tmp, &x, sizeof(tmp));
    return tmp;
}

#if defined(__clang__)
#pragma clang diagnostic pop
#endif

/*
__device__ unsigned short __float2half_rn(float x);
__device__ float __half2float(unsigned short);

The above device functions are not valid here: CUDA implements half as
unsigned short, whereas HIP does not. Use

__device__ __half __float2half_rn(float x);
__device__ float __half2float(__half);

from hip_fp16.h instead.
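For illustration, a kernel using the __half-based interface might look like
the sketch below (assumes hip_fp16.h is included; the kernel and its names are
examples, not part of HIP):

    #include <hip/hip_fp16.h>

    __global__ void scale_to_half(__half* out, const float* in, float k, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            __half h = __float2half_rn(in[i] * k);  // float -> __half, round-to-nearest
            out[i] = h;
            float back = __half2float(h);           // __half -> float
            (void)back;
        }
    }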
*/

__device__ static inline int __float2int_rd(float x) { return (int)__ocml_floor_f32(x); }
__device__ static inline int __float2int_rn(float x) { return (int)__ocml_rint_f32(x); }
__device__ static inline int __float2int_ru(float x) { return (int)__ocml_ceil_f32(x); }
__device__ static inline int __float2int_rz(float x) { return (int)__ocml_trunc_f32(x); }

__device__ static inline long long int __float2ll_rd(float x) {
    return (long long int)__ocml_floor_f32(x);
}
__device__ static inline long long int __float2ll_rn(float x) {
    return (long long int)__ocml_rint_f32(x);
}
__device__ static inline long long int __float2ll_ru(float x) {
    return (long long int)__ocml_ceil_f32(x);
}
__device__ static inline long long int __float2ll_rz(float x) { return (long long int)x; }

__device__ static inline unsigned int __float2uint_rd(float x) {
    return (unsigned int)__ocml_floor_f32(x);
}
__device__ static inline unsigned int __float2uint_rn(float x) {
    return (unsigned int)__ocml_rint_f32(x);
}
__device__ static inline unsigned int __float2uint_ru(float x) {
    return (unsigned int)__ocml_ceil_f32(x);
}
__device__ static inline unsigned int __float2uint_rz(float x) { return (unsigned int)x; }

__device__ static inline unsigned long long int __float2ull_rd(float x) {
    return (unsigned long long int)__ocml_floor_f32(x);
}
__device__ static inline unsigned long long int __float2ull_rn(float x) {
    return (unsigned long long int)__ocml_rint_f32(x);
}
__device__ static inline unsigned long long int __float2ull_ru(float x) {
    return (unsigned long long int)__ocml_ceil_f32(x);
}
__device__ static inline unsigned long long int __float2ull_rz(float x) {
    return (unsigned long long int)x;
}

__device__ static inline int __float_as_int(float x) {
    static_assert(sizeof(int) == sizeof(float), "");
    int tmp;
    __builtin_memcpy(&tmp, &x, sizeof(tmp));
    return tmp;
}

__device__ static inline unsigned int __float_as_uint(float x) {
    static_assert(sizeof(unsigned int) == sizeof(float), "");
    unsigned int tmp;
    __builtin_memcpy(&tmp, &x, sizeof(tmp));
    return tmp;
}

__device__ static inline double __hiloint2double(int hi, int lo) {
    static_assert(sizeof(double) == sizeof(uint64_t), "");
    // Widen hi before shifting and zero-extend lo so the two halves do not
    // overlap after the OR.
    uint64_t tmp0 = (static_cast<uint64_t>(hi) << 32ull) | static_cast<uint32_t>(lo);
    double tmp1;
    __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0));
    return tmp1;
}

__device__ static inline double __int2double_rn(int x) { return (double)x; }

__device__ static inline float __int2float_rd(int x) { return __ocml_cvtrtn_f32_s32(x); }
__device__ static inline float __int2float_rn(int x) { return (float)x; }
__device__ static inline float __int2float_ru(int x) { return __ocml_cvtrtp_f32_s32(x); }
__device__ static inline float __int2float_rz(int x) { return __ocml_cvtrtz_f32_s32(x); }

__device__ static inline float __int_as_float(int x) {
    static_assert(sizeof(float) == sizeof(int), "");
    float tmp;
    __builtin_memcpy(&tmp, &x, sizeof(tmp));
    return tmp;
}

__device__ static inline double __ll2double_rd(long long int x) { return __ocml_cvtrtn_f64_s64(x); }
__device__ static inline double __ll2double_rn(long long int x) { return (double)x; }
__device__ static inline double __ll2double_ru(long long int x) { return __ocml_cvtrtp_f64_s64(x); }
__device__ static inline double __ll2double_rz(long long int x) { return __ocml_cvtrtz_f64_s64(x); }

__device__ static inline float __ll2float_rd(long long int x) { return __ocml_cvtrtn_f32_s64(x); }
__device__ static inline float __ll2float_rn(long long int x) { return (float)x; }
__device__ static inline float __ll2float_ru(long long int x) { return
__ocml_cvtrtp_f32_s64(x); } __device__ static inline float __ll2float_rz(long long int x) { return __ocml_cvtrtz_f32_s64(x); } __device__ static inline double __longlong_as_double(long long int x) { static_assert(sizeof(double) == sizeof(long long), ""); double tmp; __builtin_memcpy(&tmp, &x, sizeof(tmp)); return tmp; } __device__ static inline double __uint2double_rn(unsigned int x) { return (double)x; } __device__ static inline float __uint2float_rd(unsigned int x) { return __ocml_cvtrtn_f32_u32(x); } __device__ static inline float __uint2float_rn(unsigned int x) { return (float)x; } __device__ static inline float __uint2float_ru(unsigned int x) { return __ocml_cvtrtp_f32_u32(x); } __device__ static inline float __uint2float_rz(unsigned int x) { return __ocml_cvtrtz_f32_u32(x); } __device__ static inline float __uint_as_float(unsigned int x) { static_assert(sizeof(float) == sizeof(unsigned int), ""); float tmp; __builtin_memcpy(&tmp, &x, sizeof(tmp)); return tmp; } __device__ static inline double __ull2double_rd(unsigned long long int x) { return __ocml_cvtrtn_f64_u64(x); } __device__ static inline double __ull2double_rn(unsigned long long int x) { return (double)x; } __device__ static inline double __ull2double_ru(unsigned long long int x) { return __ocml_cvtrtp_f64_u64(x); } __device__ static inline double __ull2double_rz(unsigned long long int x) { return __ocml_cvtrtz_f64_u64(x); } __device__ static inline float __ull2float_rd(unsigned long long int x) { return __ocml_cvtrtn_f32_u64(x); } __device__ static inline float __ull2float_rn(unsigned long long int x) { return (float)x; } __device__ static inline float __ull2float_ru(unsigned long long int x) { return __ocml_cvtrtp_f32_u64(x); } __device__ static inline float __ull2float_rz(unsigned long long int x) { return __ocml_cvtrtz_f32_u64(x); } #if __HIP_CLANG_ONLY__ // Clock functions __device__ long long int __clock64(); __device__ long long int __clock(); __device__ long long int clock64(); __device__ long long int clock(); __device__ long long int wall_clock64(); // hip.amdgcn.bc - named sync __device__ void __named_sync(); #ifdef __HIP_DEVICE_COMPILE__ // Clock function to return GPU core cycle count. // GPU can change its core clock frequency at runtime. The maximum frequency can be queried // through hipDeviceAttributeClockRate attribute. __device__ inline __attribute((always_inline)) long long int __clock64() { #if __has_builtin(__builtin_amdgcn_s_memtime) // Exists on gfx8, gfx9, gfx10.1, gfx10.2, gfx10.3 return (long long int) __builtin_amdgcn_s_memtime(); #else // Subject to change when better solution available return (long long int) __builtin_readcyclecounter(); #endif } __device__ inline __attribute((always_inline)) long long int __clock() { return __clock64(); } // Clock function to return wall clock count at a constant frequency that can be queried // through hipDeviceAttributeWallClockRate attribute. 
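//
// For illustration only (not part of the header), a device-side interval
// measurement could look like this, with the tick rate (reported in kHz)
// queried on the host via
// hipDeviceGetAttribute(&rate_khz, hipDeviceAttributeWallClockRate, dev):
//
//   long long t0 = wall_clock64();
//   /* ... code under measurement ... */
//   long long ticks = wall_clock64() - t0;
//   double elapsed_ms = ticks / (double)rate_khz;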
__device__ inline __attribute__((always_inline)) long long int wall_clock64() { return (long long int) __ockl_steadyctr_u64(); } __device__ inline __attribute__((always_inline)) long long int clock64() { return __clock64(); } __device__ inline __attribute__((always_inline)) long long int clock() { return __clock(); } // hip.amdgcn.bc - named sync __device__ inline void __named_sync() { __builtin_amdgcn_s_barrier(); } #endif // __HIP_DEVICE_COMPILE__ // warp vote function __all __any __ballot __device__ inline int __all(int predicate) { return __ockl_wfall_i32(predicate); } __device__ inline int __any(int predicate) { return __ockl_wfany_i32(predicate); } // XXX from llvm/include/llvm/IR/InstrTypes.h #define ICMP_NE 33 __device__ inline unsigned long long int __ballot(int predicate) { return __builtin_amdgcn_uicmp(predicate, 0, ICMP_NE); } __device__ inline unsigned long long int __ballot64(int predicate) { return __builtin_amdgcn_uicmp(predicate, 0, ICMP_NE); } // hip.amdgcn.bc - lanemask __device__ inline uint64_t __lanemask_gt() { uint32_t lane = __ockl_lane_u32(); if (lane == 63) return 0; uint64_t ballot = __ballot64(1); uint64_t mask = (~((uint64_t)0)) << (lane + 1); return mask & ballot; } __device__ inline uint64_t __lanemask_lt() { uint32_t lane = __ockl_lane_u32(); int64_t ballot = __ballot64(1); uint64_t mask = ((uint64_t)1 << lane) - (uint64_t)1; return mask & ballot; } __device__ inline uint64_t __lanemask_eq() { uint32_t lane = __ockl_lane_u32(); int64_t mask = ((uint64_t)1 << lane); return mask; } __device__ inline void* __local_to_generic(void* p) { return p; } #ifdef __HIP_DEVICE_COMPILE__ __device__ inline void* __get_dynamicgroupbaseptr() { // Get group segment base pointer. return (char*)__local_to_generic((void*)__to_local(__builtin_amdgcn_groupstaticsize())); } #else __device__ void* __get_dynamicgroupbaseptr(); #endif // __HIP_DEVICE_COMPILE__ __device__ inline void *__amdgcn_get_dynamicgroupbaseptr() { return __get_dynamicgroupbaseptr(); } // Memory Fence Functions __device__ inline static void __threadfence() { __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "agent"); } __device__ inline static void __threadfence_block() { __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup"); } __device__ inline static void __threadfence_system() { __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, ""); } // abort __device__ inline __attribute__((weak)) void abort() { return __builtin_trap(); } // The noinline attribute helps encapsulate the printf expansion, // which otherwise has a performance impact just by increasing the // size of the calling function. Additionally, the weak attribute // allows the function to exist as a global although its definition is // included in every compilation unit. #if defined(_WIN32) || defined(_WIN64) extern "C" __device__ __attribute__((noinline)) __attribute__((weak)) void _wassert(const wchar_t *_msg, const wchar_t *_file, unsigned _line) { // FIXME: Need `wchar_t` support to generate assertion message. __builtin_trap(); } #else /* defined(_WIN32) || defined(_WIN64) */ extern "C" __device__ __attribute__((noinline)) __attribute__((weak)) void __assert_fail(const char *assertion, const char *file, unsigned int line, const char *function) { const char fmt[] = "%s:%u: %s: Device-side assertion `%s' failed.\n"; // strlen is not available as a built-in yet, so we create our own // loop in a macro. With a string literal argument, the compiler // usually manages to replace the loop with a constant. 
// // The macro does not check for null pointer, since all the string // arguments are defined to be constant literals when called from // the assert() macro. // // NOTE: The loop below includes the null terminator in the length // as required by append_string_n(). #define __hip_get_string_length(LEN, STR) \ do { \ const char *tmp = STR; \ while (*tmp++); \ LEN = tmp - STR; \ } while (0) auto msg = __ockl_fprintf_stderr_begin(); int len = 0; __hip_get_string_length(len, fmt); msg = __ockl_fprintf_append_string_n(msg, fmt, len, 0); __hip_get_string_length(len, file); msg = __ockl_fprintf_append_string_n(msg, file, len, 0); msg = __ockl_fprintf_append_args(msg, 1, line, 0, 0, 0, 0, 0, 0, 0); __hip_get_string_length(len, function); msg = __ockl_fprintf_append_string_n(msg, function, len, 0); __hip_get_string_length(len, assertion); __ockl_fprintf_append_string_n(msg, assertion, len, /* is_last = */ 1); #undef __hip_get_string_length __builtin_trap(); } extern "C" __device__ __attribute__((noinline)) __attribute__((weak)) void __assertfail() { // ignore all the args for now. __builtin_trap(); } #endif /* defined(_WIN32) || defined(_WIN64) */ __device__ inline static void __work_group_barrier(__cl_mem_fence_flags flags) { if (flags) { __builtin_amdgcn_fence(__ATOMIC_RELEASE, "workgroup"); __builtin_amdgcn_s_barrier(); __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "workgroup"); } else { __builtin_amdgcn_s_barrier(); } } __device__ inline static void __barrier(int n) { __work_group_barrier((__cl_mem_fence_flags)n); } __device__ inline __attribute__((convergent)) void __syncthreads() { __barrier(__CLK_LOCAL_MEM_FENCE); } __device__ inline __attribute__((convergent)) int __syncthreads_count(int predicate) { return __ockl_wgred_add_i32(!!predicate); } __device__ inline __attribute__((convergent)) int __syncthreads_and(int predicate) { return __ockl_wgred_and_i32(!!predicate); } __device__ inline __attribute__((convergent)) int __syncthreads_or(int predicate) { return __ockl_wgred_or_i32(!!predicate); } // hip.amdgcn.bc - device routine /* HW_ID Register bit structure for RDNA2 & RDNA3 WAVE_ID 4:0 Wave id within the SIMD. SIMD_ID 9:8 SIMD_ID within the WGP: [0] = row, [1] = column. WGP_ID 13:10 Physical WGP ID. SA_ID 16 Shader Array ID SE_ID 20:18 Shader Engine the wave is assigned to for gfx11 SE_ID 19:18 Shader Engine the wave is assigned to for gfx10 DP_RATE 31:29 Number of double-precision float units per SIMD HW_ID Register bit structure for GCN and CDNA WAVE_ID 3:0 Wave buffer slot number. 0-9. SIMD_ID 5:4 SIMD which the wave is assigned to within the CU. PIPE_ID 7:6 Pipeline from which the wave was dispatched. CU_ID 11:8 Compute Unit the wave is assigned to. SH_ID 12 Shader Array (within an SE) the wave is assigned to. SE_ID 15:13 Shader Engine the wave is assigned to for gfx908, gfx90a, gfx940 14:13 Shader Engine the wave is assigned to for Vega. TG_ID 19:16 Thread-group ID VM_ID 23:20 Virtual Memory ID QUEUE_ID 26:24 Queue from which this wave was dispatched. STATE_ID 29:27 State ID (graphics only, not compute). ME_ID 31:30 Micro-engine ID. XCC_ID Register bit structure for gfx940 XCC_ID 3:0 XCC the wave is assigned to. 
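
  Example (illustration): on a gfx10 part, a wave on SE_ID = 1, SA_ID = 0,
  WGP_ID = 3 sees __smid() return ((1 << 1 | 0) << 4) | 3 = 0x23, following
  the packing performed at the end of __smid() below.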
*/ #if (defined (__GFX10__) || defined (__GFX11__)) #define HW_ID 23 #else #define HW_ID 4 #endif #if (defined(__GFX10__) || defined(__GFX11__)) #define HW_ID_WGP_ID_SIZE 4 #define HW_ID_WGP_ID_OFFSET 10 #else #define HW_ID_CU_ID_SIZE 4 #define HW_ID_CU_ID_OFFSET 8 #endif #if (defined(__gfx908__) || defined(__gfx90a__) || \ defined(__GFX11__)) #define HW_ID_SE_ID_SIZE 3 #else //4 SEs/XCC for gfx940 #define HW_ID_SE_ID_SIZE 2 #endif #if (defined(__GFX10__) || defined(__GFX11__)) #define HW_ID_SE_ID_OFFSET 18 #define HW_ID_SA_ID_OFFSET 16 #define HW_ID_SA_ID_SIZE 1 #else #define HW_ID_SE_ID_OFFSET 13 #endif #if (defined(__gfx940__)) #define XCC_ID 20 #define XCC_ID_XCC_ID_SIZE 4 #define XCC_ID_XCC_ID_OFFSET 0 #endif #if (!defined(__HIP_NO_IMAGE_SUPPORT) && \ (defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__))) #define __HIP_NO_IMAGE_SUPPORT 1 #endif /* Encoding of parameter bitmask HW_ID 5:0 HW_ID OFFSET 10:6 Range: 0..31 SIZE 15:11 Range: 1..32 */ #define GETREG_IMMED(SZ,OFF,REG) (((SZ) << 11) | ((OFF) << 6) | (REG)) /* __smid returns the wave's assigned Compute Unit and Shader Engine. The Compute Unit, CU_ID returned in bits 3:0, and Shader Engine, SE_ID in bits 5:4. Note: the results vary over time. SZ minus 1 since SIZE is 1-based. */ __device__ inline unsigned __smid(void) { unsigned se_id = __builtin_amdgcn_s_getreg( GETREG_IMMED(HW_ID_SE_ID_SIZE-1, HW_ID_SE_ID_OFFSET, HW_ID)); #if (defined(__GFX10__) || defined(__GFX11__)) unsigned wgp_id = __builtin_amdgcn_s_getreg( GETREG_IMMED(HW_ID_WGP_ID_SIZE - 1, HW_ID_WGP_ID_OFFSET, HW_ID)); unsigned sa_id = __builtin_amdgcn_s_getreg( GETREG_IMMED(HW_ID_SA_ID_SIZE - 1, HW_ID_SA_ID_OFFSET, HW_ID)); #else #if defined(__gfx940__) unsigned xcc_id = __builtin_amdgcn_s_getreg( GETREG_IMMED(XCC_ID_XCC_ID_SIZE - 1, XCC_ID_XCC_ID_OFFSET, XCC_ID)); #endif unsigned cu_id = __builtin_amdgcn_s_getreg( GETREG_IMMED(HW_ID_CU_ID_SIZE - 1, HW_ID_CU_ID_OFFSET, HW_ID)); #endif #if (defined(__GFX10__) || defined(__GFX11__)) unsigned temp = se_id; temp = (temp << HW_ID_SA_ID_SIZE) | sa_id; temp = (temp << HW_ID_WGP_ID_SIZE) | wgp_id; return temp; //TODO : CU Mode impl #elif defined(__gfx940__) unsigned temp = xcc_id; temp = (temp << HW_ID_SE_ID_SIZE) | se_id; temp = (temp << HW_ID_CU_ID_SIZE) | cu_id; return temp; #else return (se_id << HW_ID_CU_ID_SIZE) + cu_id; #endif } /** * Map HIP_DYNAMIC_SHARED to "extern __shared__" for compatibility with old HIP applications * To be removed in a future release. 
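 *
 * Usage sketch (illustrative only):
 *
 *   __global__ void reduce_sum(float* out) {
 *       HIP_DYNAMIC_SHARED(float, smem)  // expands to: extern __shared__ float smem[];
 *       smem[threadIdx.x] = out[threadIdx.x];
 *       // ...
 *   }
 *
 * New code should simply declare `extern __shared__ float smem[];` itself.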
*/
#define HIP_DYNAMIC_SHARED(type, var) extern __shared__ type var[];
#define HIP_DYNAMIC_SHARED_ATTRIBUTE

#endif //defined(__clang__) && defined(__HIP__)

// loop unrolling
static inline __device__ void* __hip_hc_memcpy(void* dst, const void* src, size_t size) {
    auto dstPtr = static_cast<unsigned char*>(dst);
    auto srcPtr = static_cast<const unsigned char*>(src);

    while (size >= 4u) {
        dstPtr[0] = srcPtr[0];
        dstPtr[1] = srcPtr[1];
        dstPtr[2] = srcPtr[2];
        dstPtr[3] = srcPtr[3];

        size -= 4u;
        srcPtr += 4u;
        dstPtr += 4u;
    }
    switch (size) {
        case 3:
            dstPtr[2] = srcPtr[2];
        case 2:
            dstPtr[1] = srcPtr[1];
        case 1:
            dstPtr[0] = srcPtr[0];
    }

    return dst;
}

static inline __device__ void* __hip_hc_memset(void* dst, unsigned char val, size_t size) {
    auto dstPtr = static_cast<unsigned char*>(dst);

    while (size >= 4u) {
        dstPtr[0] = val;
        dstPtr[1] = val;
        dstPtr[2] = val;
        dstPtr[3] = val;

        size -= 4u;
        dstPtr += 4u;
    }
    switch (size) {
        case 3:
            dstPtr[2] = val;
        case 2:
            dstPtr[1] = val;
        case 1:
            dstPtr[0] = val;
    }

    return dst;
}

#ifndef __OPENMP_AMDGCN__
static inline __device__ void* memcpy(void* dst, const void* src, size_t size) {
    return __hip_hc_memcpy(dst, src, size);
}

static inline __device__ void* memset(void* ptr, int val, size_t size) {
    unsigned char val8 = static_cast<unsigned char>(val);
    return __hip_hc_memset(ptr, val8, size);
}
#endif // !__OPENMP_AMDGCN__

#endif
clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_hip_atomic.h000066400000000000000000001421401450307266000241050ustar00rootroot00000000000000/*
Copyright (c) 2015 - Present Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once

#include "amd_device_functions.h"

#if __has_builtin(__hip_atomic_compare_exchange_strong)

template<bool B, typename T, typename F> struct Cond_t;

template<typename T, typename F> struct Cond_t<true, T, F> { using type = T; };
template<typename T, typename F> struct Cond_t<false, T, F> { using type = F; };

#if !__HIP_DEVICE_COMPILE__
//TODO: Remove this after compiler pre-defines the following Macros.
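// For illustration, these scope values are passed straight to the
// __hip_atomic_* builtins used throughout this header; e.g. (sketch):
//   __hip_atomic_fetch_add(&counter, 1u, __ATOMIC_RELAXED,
//                          __HIP_MEMORY_SCOPE_WORKGROUP);
// performs an update that only needs to be coherent within the work-group.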
#define __HIP_MEMORY_SCOPE_SINGLETHREAD 1
#define __HIP_MEMORY_SCOPE_WAVEFRONT 2
#define __HIP_MEMORY_SCOPE_WORKGROUP 3
#define __HIP_MEMORY_SCOPE_AGENT 4
#define __HIP_MEMORY_SCOPE_SYSTEM 5
#endif

#if !defined(__HIPCC_RTC__)
#include "amd_hip_unsafe_atomics.h"
#endif

// Atomic expanders

template<
    int mem_order = __ATOMIC_SEQ_CST,
    int mem_scope = __HIP_MEMORY_SCOPE_SYSTEM,
    typename T,
    typename Op,
    typename F>
inline
__attribute__((always_inline, device))
T hip_cas_expander(T* p, T x, Op op, F f) noexcept
{
    using FP = __attribute__((address_space(0))) const void*;

    __device__
    extern bool is_shared_workaround(FP) asm("llvm.amdgcn.is.shared");

    if (is_shared_workaround((FP)p))
        return f();

    // Emulate the read-modify-write with a compare-and-swap loop on the
    // matching-width unsigned type.
    using U = typename Cond_t<
        sizeof(T) == sizeof(unsigned int), unsigned int, unsigned long long>::type;

    auto q = reinterpret_cast<U*>(p);

    U tmp0{__hip_atomic_load(q, mem_order, mem_scope)};
    U tmp1;
    do {
        tmp1 = tmp0;
        op(reinterpret_cast<T&>(tmp1), x);
    } while (!__hip_atomic_compare_exchange_strong(q, &tmp0, tmp1, mem_order,
                                                   mem_order, mem_scope));

    return reinterpret_cast<T&>(tmp0);
}

template<
    int mem_order = __ATOMIC_SEQ_CST,
    int mem_scope = __HIP_MEMORY_SCOPE_SYSTEM,
    typename T,
    typename Cmp,
    typename F>
inline
__attribute__((always_inline, device))
T hip_cas_extrema_expander(T* p, T x, Cmp cmp, F f) noexcept
{
    using FP = __attribute__((address_space(0))) const void*;

    __device__
    extern bool is_shared_workaround(FP) asm("llvm.amdgcn.is.shared");

    if (is_shared_workaround((FP)p))
        return f();

    using U = typename Cond_t<
        sizeof(T) == sizeof(unsigned int), unsigned int, unsigned long long>::type;

    auto q = reinterpret_cast<U*>(p);

    U tmp{__hip_atomic_load(q, mem_order, mem_scope)};
    while (cmp(x, reinterpret_cast<T&>(tmp)) &&
           !__hip_atomic_compare_exchange_strong(q, &tmp, x, mem_order,
                                                 mem_order, mem_scope));

    return reinterpret_cast<T&>(tmp);
}

__device__
inline int atomicCAS(int* address, int compare, int val) {
    __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED,
                                         __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT);
    return compare;
}

__device__
inline int atomicCAS_system(int* address, int compare, int val) {
    __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED,
                                         __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM);
    return compare;
}

__device__
inline unsigned int atomicCAS(unsigned int* address, unsigned int compare,
                              unsigned int val) {
    __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED,
                                         __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT);
    return compare;
}

__device__
inline unsigned int atomicCAS_system(unsigned int* address, unsigned int compare,
                                     unsigned int val) {
    __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED,
                                         __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM);
    return compare;
}

__device__
inline unsigned long atomicCAS(unsigned long* address, unsigned long compare,
                               unsigned long val) {
    __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED,
                                         __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT);
    return compare;
}

__device__
inline unsigned long atomicCAS_system(unsigned long* address, unsigned long compare,
                                      unsigned long val) {
    __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED,
                                         __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM);
    return compare;
}

__device__
inline unsigned long long atomicCAS(unsigned long long* address,
                                    unsigned long long compare,
                                    unsigned long long val) {
    __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED,
                                         __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT);
    return compare;
}

__device__
inline unsigned long long atomicCAS_system(unsigned long long* address, unsigned long long compare, unsigned long long val) { __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); return compare; } __device__ inline float atomicCAS(float* address, float compare, float val) { __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); return compare; } __device__ inline float atomicCAS_system(float* address, float compare, float val) { __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); return compare; } __device__ inline double atomicCAS(double* address, double compare, double val) { __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); return compare; } __device__ inline double atomicCAS_system(double* address, double compare, double val) { __hip_atomic_compare_exchange_strong(address, &compare, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); return compare; } __device__ inline int atomicAdd(int* address, int val) { return __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline int atomicAdd_system(int* address, int val) { return __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline unsigned int atomicAdd(unsigned int* address, unsigned int val) { return __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline unsigned int atomicAdd_system(unsigned int* address, unsigned int val) { return __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline unsigned long atomicAdd(unsigned long* address, unsigned long val) { return __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline unsigned long atomicAdd_system(unsigned long* address, unsigned long val) { return __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline unsigned long long atomicAdd(unsigned long long* address, unsigned long long val) { return __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline unsigned long long atomicAdd_system(unsigned long long* address, unsigned long long val) { return __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline float atomicAdd(float* address, float val) { #if defined(__AMDGCN_UNSAFE_FP_ATOMICS__) return unsafeAtomicAdd(address, val); #else return __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif } __device__ inline float atomicAdd_system(float* address, float val) { return __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } #if !defined(__HIPCC_RTC__) DEPRECATED("use atomicAdd instead") #endif // !defined(__HIPCC_RTC__) __device__ inline void atomicAddNoRet(float* address, float val) { __ockl_atomic_add_noret_f32(address, val); } __device__ inline double atomicAdd(double* address, double val) { #if defined(__AMDGCN_UNSAFE_FP_ATOMICS__) return unsafeAtomicAdd(address, val); #else return __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif } __device__ inline double atomicAdd_system(double* address, double val) { return 
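/* Illustration (comment only; not part of the header): before native FP
   atomics were available, a double-precision atomicAdd was typically emulated
   with an atomicCAS loop. The sketch below shows that classic pattern, under
   the assumption that only atomicCAS on unsigned long long is available; it
   is not the implementation used by this header:

     __device__ double atomicAdd_cas(double* addr, double val) {
         unsigned long long* p = (unsigned long long*)addr;
         unsigned long long old = *p, assumed;
         do {
             assumed = old;
             old = atomicCAS(p, assumed,
                             __double_as_longlong(val + __longlong_as_double(assumed)));
         } while (assumed != old);
         return __longlong_as_double(old);
     }
*/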
__hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline int atomicSub(int* address, int val) { return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline int atomicSub_system(int* address, int val) { return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline unsigned int atomicSub(unsigned int* address, unsigned int val) { return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline unsigned int atomicSub_system(unsigned int* address, unsigned int val) { return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline unsigned long atomicSub(unsigned long* address, unsigned long val) { return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline unsigned long atomicSub_system(unsigned long* address, unsigned long val) { return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline unsigned long long atomicSub(unsigned long long* address, unsigned long long val) { return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline unsigned long long atomicSub_system(unsigned long long* address, unsigned long long val) { return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline float atomicSub(float* address, float val) { #if defined(__AMDGCN_UNSAFE_FP_ATOMICS__) return unsafeAtomicAdd(address, -val); #else return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif } __device__ inline float atomicSub_system(float* address, float val) { return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline double atomicSub(double* address, double val) { #if defined(__AMDGCN_UNSAFE_FP_ATOMICS__) return unsafeAtomicAdd(address, -val); #else return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif } __device__ inline double atomicSub_system(double* address, double val) { return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline int atomicExch(int* address, int val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline int atomicExch_system(int* address, int val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline unsigned int atomicExch(unsigned int* address, unsigned int val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline unsigned int atomicExch_system(unsigned int* address, unsigned int val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline unsigned long atomicExch(unsigned long* address, unsigned long val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline unsigned long atomicExch_system(unsigned long* address, unsigned long val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline unsigned long long atomicExch(unsigned long long* address, unsigned long long val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, 
__HIP_MEMORY_SCOPE_AGENT); } __device__ inline unsigned long long atomicExch_system(unsigned long long* address, unsigned long long val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline float atomicExch(float* address, float val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline float atomicExch_system(float* address, float val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline double atomicExch(double* address, double val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } __device__ inline double atomicExch_system(double* address, double val) { return __hip_atomic_exchange(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); } __device__ inline int atomicMin(int* address, int val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](int x, int y) { return x < y; }, [=]() { return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline int atomicMin_system(int* address, int val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](int x, int y) { return x < y; }, [=]() { return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned int atomicMin(unsigned int* address, unsigned int val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned int x, unsigned int y) { return x < y; }, [=]() { return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned int atomicMin_system(unsigned int* address, unsigned int val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned int x, unsigned int y) { return x < y; }, [=]() { return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned long long atomicMin(unsigned long* address, unsigned long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned long x, unsigned long y) { return x < y; }, [=]() { return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned long atomicMin_system(unsigned long* address, unsigned long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned long x, unsigned long y) { return x < y; }, [=]() { return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return 
__hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned long long atomicMin(unsigned long long* address, unsigned long long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned long long x, unsigned long long y) { return x < y; }, [=]() { return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned long long atomicMin_system(unsigned long long* address, unsigned long long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned long long x, unsigned long long y) { return x < y; }, [=]() { return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline long long atomicMin(long long* address, long long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](long long x, long long y) { return x < y; }, [=]() { return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline long long atomicMin_system(long long* address, long long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](long long x, long long y) { return x < y; }, [=]() { return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_min(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline float atomicMin(float* addr, float val) { #if defined(__AMDGCN_UNSAFE_FP_ATOMICS__) return unsafeAtomicMin(addr, val); #else #if __has_builtin(__hip_atomic_load) && \ __has_builtin(__hip_atomic_compare_exchange_strong) float value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value > val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned int *uaddr = (unsigned int *)addr; unsigned int value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __uint_as_float(value) > val) { done = __atomic_compare_exchange_n(uaddr, &value, __float_as_uint(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __uint_as_float(value); #endif #endif } __device__ inline float atomicMin_system(float* address, float val) { unsigned int* uaddr { reinterpret_cast(address) }; #if __has_builtin(__hip_atomic_load) unsigned int tmp {__hip_atomic_load(uaddr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM)}; #else unsigned int tmp {__atomic_load_n(uaddr, __ATOMIC_RELAXED)}; #endif float value = __uint_as_float(tmp); while (val < value) { value = atomicCAS_system(address, value, val); } return value; } __device__ inline double atomicMin(double* addr, double val) { #if defined(__AMDGCN_UNSAFE_FP_ATOMICS__) return unsafeAtomicMin(addr, val); #else #if __has_builtin(__hip_atomic_load) && \ 
__has_builtin(__hip_atomic_compare_exchange_strong) double value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value > val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned long long *uaddr = (unsigned long long *)addr; unsigned long long value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __longlong_as_double(value) > val) { done = __atomic_compare_exchange_n(uaddr, &value, __double_as_longlong(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __longlong_as_double(value); #endif #endif } __device__ inline double atomicMin_system(double* address, double val) { unsigned long long* uaddr { reinterpret_cast(address) }; #if __has_builtin(__hip_atomic_load) unsigned long long tmp {__hip_atomic_load(uaddr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM)}; #else unsigned long long tmp {__atomic_load_n(uaddr, __ATOMIC_RELAXED)}; #endif double value = __longlong_as_double(tmp); while (val < value) { value = atomicCAS_system(address, value, val); } return value; } __device__ inline int atomicMax(int* address, int val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](int x, int y) { return y < x; }, [=]() { return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline int atomicMax_system(int* address, int val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](int x, int y) { return y < x; }, [=]() { return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned int atomicMax(unsigned int* address, unsigned int val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned int x, unsigned int y) { return y < x; }, [=]() { return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned int atomicMax_system(unsigned int* address, unsigned int val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned int x, unsigned int y) { return y < x; }, [=]() { return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned long atomicMax(unsigned long* address, unsigned long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned long x, unsigned long y) { return y < x; }, [=]() { return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned long atomicMax_system(unsigned long* address, unsigned 
long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned long x, unsigned long y) { return y < x; }, [=]() { return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned long long atomicMax(unsigned long long* address, unsigned long long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned long long x, unsigned long long y) { return y < x; }, [=]() { return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned long long atomicMax_system(unsigned long long* address, unsigned long long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned long long x, unsigned long long y) { return y < x; }, [=]() { return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline long long atomicMax(long long* address, long long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](long long x, long long y) { return y < x; }, [=]() { return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline long long atomicMax_system(long long* address, long long val) { #if defined(__gfx941__) return hip_cas_extrema_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](long long x, long long y) { return y < x; }, [=]() { return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_max(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline float atomicMax(float* addr, float val) { #if defined(__AMDGCN_UNSAFE_FP_ATOMICS__) return unsafeAtomicMax(addr, val); #else #if __has_builtin(__hip_atomic_load) && \ __has_builtin(__hip_atomic_compare_exchange_strong) float value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value < val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned int *uaddr = (unsigned int *)addr; unsigned int value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __uint_as_float(value) < val) { done = __atomic_compare_exchange_n(uaddr, &value, __float_as_uint(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __uint_as_float(value); #endif #endif } __device__ inline float atomicMax_system(float* address, float val) { unsigned int* uaddr { reinterpret_cast(address) }; #if __has_builtin(__hip_atomic_load) unsigned int tmp {__hip_atomic_load(uaddr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM)}; #else unsigned int tmp {__atomic_load_n(uaddr, __ATOMIC_RELAXED)}; #endif float value = 
__uint_as_float(tmp); while (value < val) { value = atomicCAS_system(address, value, val); } return value; } __device__ inline double atomicMax(double* addr, double val) { #if defined(__AMDGCN_UNSAFE_FP_ATOMICS__) return unsafeAtomicMax(addr, val); #else #if __has_builtin(__hip_atomic_load) && \ __has_builtin(__hip_atomic_compare_exchange_strong) double value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value < val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned long long *uaddr = (unsigned long long *)addr; unsigned long long value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __longlong_as_double(value) < val) { done = __atomic_compare_exchange_n(uaddr, &value, __double_as_longlong(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __longlong_as_double(value); #endif #endif } __device__ inline double atomicMax_system(double* address, double val) { unsigned long long* uaddr { reinterpret_cast(address) }; #if __has_builtin(__hip_atomic_load) unsigned long long tmp {__hip_atomic_load(uaddr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM)}; #else unsigned long long tmp {__atomic_load_n(uaddr, __ATOMIC_RELAXED)}; #endif double value = __longlong_as_double(tmp); while (value < val) { value = atomicCAS_system(address, value, val); } return value; } __device__ inline unsigned int atomicInc(unsigned int* address, unsigned int val) { #if defined(__gfx941__) __device__ extern unsigned int __builtin_amdgcn_atomic_inc( unsigned int*, unsigned int, unsigned int, unsigned int, bool) __asm("llvm.amdgcn.atomic.inc.i32.p0i32"); return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned int& x, unsigned int y) { x = (x >= y) ? 0 : (x + 1); }, [=]() { return __builtin_amdgcn_atomic_inc(address, val, __ATOMIC_RELAXED, 1, false); }); #else return __builtin_amdgcn_atomic_inc32(address, val, __ATOMIC_RELAXED, "agent"); #endif // __gfx941__ } __device__ inline unsigned int atomicDec(unsigned int* address, unsigned int val) { #if defined(__gfx941__) __device__ extern unsigned int __builtin_amdgcn_atomic_dec( unsigned int*, unsigned int, unsigned int, unsigned int, bool) __asm("llvm.amdgcn.atomic.dec.i32.p0i32"); return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned int& x, unsigned int y) { x = (!x || x > y) ? 
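/* Illustration (comment only): atomicInc and atomicDec wrap instead of
   overflowing. With val == 3, repeated atomicInc on a counter starting at 0
   returns 0, 1, 2, 3, 0, 1, ... (the old value is returned each time), and
   atomicDec counts down, wrapping from 0 back to val. */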
y : (x - 1); }, [=]() { return __builtin_amdgcn_atomic_dec(address, val, __ATOMIC_RELAXED, 1, false); }); #else return __builtin_amdgcn_atomic_dec32(address, val, __ATOMIC_RELAXED, "agent"); #endif // __gfx941__ } __device__ inline int atomicAnd(int* address, int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](int& x, int y) { x &= y; }, [=]() { return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline int atomicAnd_system(int* address, int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](int& x, int y) { x &= y; }, [=]() { return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned int atomicAnd(unsigned int* address, unsigned int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned int& x, unsigned int y) { x &= y; }, [=]() { return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned int atomicAnd_system(unsigned int* address, unsigned int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned int& x, unsigned int y) { x &= y; }, [=]() { return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned long atomicAnd(unsigned long* address, unsigned long val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned long& x, unsigned long y) { x &= y; }, [=]() { return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned long atomicAnd_system(unsigned long* address, unsigned long val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned long& x, unsigned long y) { x &= y; }, [=]() { return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned long long atomicAnd(unsigned long long* address, unsigned long long val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned long long& x, unsigned long long y) { x &= y; }, [=]() { return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned long long atomicAnd_system(unsigned long long* address, unsigned long long val) { #if defined(__gfx941__) return 
hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned long long& x, unsigned long long y) { x &= y; }, [=]() { return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_and(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline int atomicOr(int* address, int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](int& x, int y) { x |= y; }, [=]() { return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline int atomicOr_system(int* address, int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](int& x, int y) { x |= y; }, [=]() { return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned int atomicOr(unsigned int* address, unsigned int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned int& x, unsigned int y) { x |= y; }, [=]() { return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned int atomicOr_system(unsigned int* address, unsigned int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned int& x, unsigned int y) { x |= y; }, [=]() { return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned long atomicOr(unsigned long* address, unsigned long val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned long& x, unsigned long y) { x |= y; }, [=]() { return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned long atomicOr_system(unsigned long* address, unsigned long val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned long& x, unsigned long y) { x |= y; }, [=]() { return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned long long atomicOr(unsigned long long* address, unsigned long long val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned long long& x, unsigned long long y) { x |= y; }, [=]() { return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline 
unsigned long long atomicOr_system(unsigned long long* address, unsigned long long val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned long long& x, unsigned long long y) { x |= y; }, [=]() { return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_or(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline int atomicXor(int* address, int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](int& x, int y) { x ^= y; }, [=]() { return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline int atomicXor_system(int* address, int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](int& x, int y) { x ^= y; }, [=]() { return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned int atomicXor(unsigned int* address, unsigned int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned int& x, unsigned int y) { x ^= y; }, [=]() { return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned int atomicXor_system(unsigned int* address, unsigned int val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned int& x, unsigned int y) { x ^= y; }, [=]() { return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned long atomicXor(unsigned long* address, unsigned long val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned long& x, unsigned long y) { x ^= y; }, [=]() { return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); #else return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #endif // __gfx941__ } __device__ inline unsigned long atomicXor_system(unsigned long* address, unsigned long val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM>( address, val, [](unsigned long& x, unsigned long y) { x ^= y; }, [=]() { return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); }); #else return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #endif // __gfx941__ } __device__ inline unsigned long long atomicXor(unsigned long long* address, unsigned long long val) { #if defined(__gfx941__) return hip_cas_expander<__ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT>( address, val, [](unsigned long long& x, unsigned long long y) { x ^= y; }, [=]() { return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); }); 
#else
  return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT);
#endif // __gfx941__
}

__device__ inline unsigned long long atomicXor_system(unsigned long long* address,
                                                      unsigned long long val) {
  return __hip_atomic_fetch_xor(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM);
}

#else  // __hip_atomic_compare_exchange_strong

__device__ inline int atomicCAS(int* address, int compare, int val) {
  __atomic_compare_exchange_n(address, &compare, val, false, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
  return compare;
}

__device__ inline unsigned int atomicCAS(unsigned int* address, unsigned int compare,
                                         unsigned int val) {
  __atomic_compare_exchange_n(address, &compare, val, false, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
  return compare;
}

__device__ inline unsigned long long atomicCAS(unsigned long long* address,
                                               unsigned long long compare,
                                               unsigned long long val) {
  __atomic_compare_exchange_n(address, &compare, val, false, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
  return compare;
}

__device__ inline int atomicAdd(int* address, int val) {
  return __atomic_fetch_add(address, val, __ATOMIC_RELAXED);
}

__device__ inline unsigned int atomicAdd(unsigned int* address, unsigned int val) {
  return __atomic_fetch_add(address, val, __ATOMIC_RELAXED);
}

__device__ inline unsigned long long atomicAdd(unsigned long long* address,
                                               unsigned long long val) {
  return __atomic_fetch_add(address, val, __ATOMIC_RELAXED);
}

__device__ inline float atomicAdd(float* address, float val) {
#if defined(__AMDGCN_UNSAFE_FP_ATOMICS__)
  return unsafeAtomicAdd(address, val);
#else
  return __atomic_fetch_add(address, val, __ATOMIC_RELAXED);
#endif
}

#if !defined(__HIPCC_RTC__)
DEPRECATED("use atomicAdd instead")
#endif // !defined(__HIPCC_RTC__)
__device__ inline void atomicAddNoRet(float* address, float val) {
  __ockl_atomic_add_noret_f32(address, val);
}

__device__ inline double atomicAdd(double* address, double val) {
#if defined(__AMDGCN_UNSAFE_FP_ATOMICS__)
  return unsafeAtomicAdd(address, val);
#else
  return __atomic_fetch_add(address, val, __ATOMIC_RELAXED);
#endif
}

__device__ inline int atomicSub(int* address, int val) {
  return __atomic_fetch_sub(address, val, __ATOMIC_RELAXED);
}

__device__ inline unsigned int atomicSub(unsigned int* address, unsigned int val) {
  return __atomic_fetch_sub(address, val, __ATOMIC_RELAXED);
}

__device__ inline int atomicExch(int* address, int val) {
  return __atomic_exchange_n(address, val, __ATOMIC_RELAXED);
}

__device__ inline unsigned int atomicExch(unsigned int* address, unsigned int val) {
  return __atomic_exchange_n(address, val, __ATOMIC_RELAXED);
}

__device__ inline unsigned long long atomicExch(unsigned long long* address,
                                                unsigned long long val) {
  return __atomic_exchange_n(address, val, __ATOMIC_RELAXED);
}

__device__ inline float atomicExch(float* address, float val) {
  return __uint_as_float(__atomic_exchange_n(reinterpret_cast<unsigned int*>(address),
                                             __float_as_uint(val), __ATOMIC_RELAXED));
}

__device__ inline int atomicMin(int* address, int val) {
  return __atomic_fetch_min(address, val, __ATOMIC_RELAXED);
}

__device__ inline unsigned int atomicMin(unsigned int* address, unsigned int val) {
  return __atomic_fetch_min(address, val, __ATOMIC_RELAXED);
}

__device__ inline unsigned long long atomicMin(unsigned long long* address,
                                               unsigned long long val) {
  unsigned long long tmp{__atomic_load_n(address, __ATOMIC_RELAXED)};
  while (val < tmp) {
    const auto tmp1 = __atomic_load_n(address, __ATOMIC_RELAXED);

    if (tmp1 != tmp) { tmp = tmp1; continue; }

    tmp = atomicCAS(address, tmp,
val); } return tmp; } __device__ inline long long atomicMin(long long* address, long long val) { long long tmp{__atomic_load_n(address, __ATOMIC_RELAXED)}; while (val < tmp) { const auto tmp1 = __atomic_load_n(address, __ATOMIC_RELAXED); if (tmp1 != tmp) { tmp = tmp1; continue; } tmp = atomicCAS(address, tmp, val); } return tmp; } __device__ inline int atomicMax(int* address, int val) { return __atomic_fetch_max(address, val, __ATOMIC_RELAXED); } __device__ inline unsigned int atomicMax(unsigned int* address, unsigned int val) { return __atomic_fetch_max(address, val, __ATOMIC_RELAXED); } __device__ inline unsigned long long atomicMax( unsigned long long* address, unsigned long long val) { unsigned long long tmp{__atomic_load_n(address, __ATOMIC_RELAXED)}; while (tmp < val) { const auto tmp1 = __atomic_load_n(address, __ATOMIC_RELAXED); if (tmp1 != tmp) { tmp = tmp1; continue; } tmp = atomicCAS(address, tmp, val); } return tmp; } __device__ inline long long atomicMax(long long* address, long long val) { long long tmp{__atomic_load_n(address, __ATOMIC_RELAXED)}; while (tmp < val) { const auto tmp1 = __atomic_load_n(address, __ATOMIC_RELAXED); if (tmp1 != tmp) { tmp = tmp1; continue; } tmp = atomicCAS(address, tmp, val); } return tmp; } __device__ inline unsigned int atomicInc(unsigned int* address, unsigned int val) { return __builtin_amdgcn_atomic_inc32(address, val, __ATOMIC_RELAXED, "agent"); } __device__ inline unsigned int atomicDec(unsigned int* address, unsigned int val) { return __builtin_amdgcn_atomic_dec32(address, val, __ATOMIC_RELAXED, "agent"); } __device__ inline int atomicAnd(int* address, int val) { return __atomic_fetch_and(address, val, __ATOMIC_RELAXED); } __device__ inline unsigned int atomicAnd(unsigned int* address, unsigned int val) { return __atomic_fetch_and(address, val, __ATOMIC_RELAXED); } __device__ inline unsigned long long atomicAnd( unsigned long long* address, unsigned long long val) { return __atomic_fetch_and(address, val, __ATOMIC_RELAXED); } __device__ inline int atomicOr(int* address, int val) { return __atomic_fetch_or(address, val, __ATOMIC_RELAXED); } __device__ inline unsigned int atomicOr(unsigned int* address, unsigned int val) { return __atomic_fetch_or(address, val, __ATOMIC_RELAXED); } __device__ inline unsigned long long atomicOr( unsigned long long* address, unsigned long long val) { return __atomic_fetch_or(address, val, __ATOMIC_RELAXED); } __device__ inline int atomicXor(int* address, int val) { return __atomic_fetch_xor(address, val, __ATOMIC_RELAXED); } __device__ inline unsigned int atomicXor(unsigned int* address, unsigned int val) { return __atomic_fetch_xor(address, val, __ATOMIC_RELAXED); } __device__ inline unsigned long long atomicXor( unsigned long long* address, unsigned long long val) { return __atomic_fetch_xor(address, val, __ATOMIC_RELAXED); } #endif // __hip_atomic_compare_exchange_strong clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_hip_bf16.h000066400000000000000000000730401450307266000233710ustar00rootroot00000000000000/** * MIT License * * Copyright (c) 2019 - 2023 Advanced Micro Devices, Inc. All rights reserved. 
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

/**
 * \file
 * \brief hip_bf16.h provides struct for __hip_bfloat16 types
 */

/**
 * \defgroup HIP_INTRINSIC_BFLOAT16 bfloat16 Precision Intrinsics
 * This section describes hip_bfloat16 precision intrinsic functions.
 * To use these functions, include the header file \p hip_bf16.h in your program.
 */
/**
 * \defgroup HIP_INTRINSIC_BFLOAT16_ARITH Bfloat16 Arithmetic Functions
 * \ingroup HIP_INTRINSIC_BFLOAT16
 * To use these functions, include the header file \p hip_bf16.h in your program.
 */
/**
 * \defgroup HIP_INTRINSIC_BFLOAT16_COMP Bfloat16 Comparison Functions
 * \ingroup HIP_INTRINSIC_BFLOAT16
 * To use these functions, include the header file \p hip_bf16.h in your program.
 */
/**
 * \defgroup HIP_INTRINSIC_BFLOAT162_COMP Bfloat162 Comparison Functions
 * \ingroup HIP_INTRINSIC_BFLOAT16
 * To use these functions, include the header file \p hip_bf16.h in your program.
 */
/**
 * \defgroup HIP_INTRINSIC_BFLOAT162_ARITH Bfloat162 Arithmetic Functions
 * \ingroup HIP_INTRINSIC_BFLOAT16
 * To use these functions, include the header file \p hip_bf16.h in your program.
 */
/**
 * \defgroup HIP_INTRINSIC_BFLOAT16_CONV Bfloat16 Conversion Functions
 * \ingroup HIP_INTRINSIC_BFLOAT16
 * To use these functions, include the header file \p hip_bf16.h in your program.
 */
/**
 * \defgroup HIP_INTRINSIC_BFLOAT162_CONV Bfloat162 Conversion Functions
 * \ingroup HIP_INTRINSIC_BFLOAT16
 * To use these functions, include the header file \p hip_bf16.h in your program.
 */
/**
 * \defgroup HIP_INTRINSIC_BFLOAT16_MATH Bfloat16 Math Functions
 * \ingroup HIP_INTRINSIC_BFLOAT16
 * To use these functions, include the header file \p hip_bf16.h in your program.
 */
/**
 * \defgroup HIP_INTRINSIC_BFLOAT162_MATH Bfloat162 Math Functions
 * \ingroup HIP_INTRINSIC_BFLOAT16
 * To use these functions, include the header file \p hip_bf16.h in your program.
 */

#ifndef _HIP_INCLUDE_HIP_AMD_DETAIL_HIP_BF16_H_
#define _HIP_INCLUDE_HIP_AMD_DETAIL_HIP_BF16_H_

#include "amd_hip_vector_types.h"  // float2 etc
#include "device_library_decls.h"  // ocml conversion functions
#include "math_fwd.h"              // ocml device functions

#if defined(__HIPCC_RTC__)
#define __HOST_DEVICE__ __device__
#else
#include <climits>
#define __HOST_DEVICE__ __host__ __device__
#endif
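// Illustrative usage (hypothetical example kernel, not part of the API below):
//
//   __global__ void float_to_bf16(const float* in, __hip_bfloat16* out, size_t n) {
//     size_t i = blockIdx.x * blockDim.x + threadIdx.x;
//     if (i < n) out[i] = __float2bfloat16(in[i]);  // round-to-nearest-even conversion
//   }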
// Since we are using unsigned short to represent data in bfloat16, it can be of different sizes on
// different machines. These naive checks should prevent some undefined behavior on systems which
// have different sizes for basic types.
#if !defined(__HIPCC_RTC__)
static_assert(CHAR_BIT == 8, "byte size should be of 8 bits");
#endif
static_assert(sizeof(unsigned short) == 2, "size of unsigned short should be 2 bytes");

/*! \brief Struct to represent a 16 bit brain floating point number. */
struct __hip_bfloat16 {
  unsigned short data;
};

/*! \brief Struct to represent two 16 bit brain floating point numbers. */
struct __hip_bfloat162 {
  __hip_bfloat16 x;
  __hip_bfloat16 y;
};

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_CONV
 * \brief Converts bfloat16 to float
 */
__HOST_DEVICE__ inline float __bfloat162float(__hip_bfloat16 a) {
  unsigned int uval = 0;
  uval = a.data << 16;
  union {
    unsigned int u32;
    float fp32;
  } u = {uval};
  return u.fp32;
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_CONV
 * \brief Converts float to bfloat16
 */
__HOST_DEVICE__ __hip_bfloat16 __float2bfloat16(float f) {
  __hip_bfloat16 ret;
  union {
    float fp32;
    unsigned int u32;
  } u = {f};
  if (~u.u32 & 0x7f800000) {
    // When the exponent bits are not all 1s, then the value is zero, normal,
    // or subnormal. We round the bfloat16 mantissa up by adding 0x7FFF, plus
    // 1 if the least significant bit of the bfloat16 mantissa is 1 (odd).
    // This causes the bfloat16's mantissa to be incremented by 1 if the 16
    // least significant bits of the float mantissa are greater than 0x8000,
    // or if they are equal to 0x8000 and the least significant bit of the
    // bfloat16 mantissa is 1 (odd). This causes it to be rounded to even when
    // the lower 16 bits are exactly 0x8000. If the bfloat16 mantissa already
    // has the value 0x7f, then incrementing it causes it to become 0x00 and
    // the exponent is incremented by one, which is the next higher FP value
    // to the unrounded bfloat16 value. When the bfloat16 value is subnormal
    // with an exponent of 0x00 and a mantissa of 0x7F, it may be rounded up
    // to a normal value with an exponent of 0x01 and a mantissa of 0x00.
    // When the bfloat16 value has an exponent of 0xFE and a mantissa of 0x7F,
    // incrementing it causes it to become an exponent of 0xFF and a mantissa
    // of 0x00, which is Inf, the next higher value to the unrounded value.
    u.u32 += 0x7fff + ((u.u32 >> 16) & 1);  // Round to nearest, round to even
  } else if (u.u32 & 0xffff) {
    // When all of the exponent bits are 1, the value is Inf or NaN.
    // Inf is indicated by a zero mantissa. NaN is indicated by any nonzero
    // mantissa bit. Quiet NaN is indicated by the most significant mantissa
    // bit being 1. Signaling NaN is indicated by the most significant
    // mantissa bit being 0 but some other bit(s) being 1. If any of the
    // lower 16 bits of the mantissa are 1, we set the least significant bit
    // of the bfloat16 mantissa, in order to preserve signaling NaN in case
    // the bfloat16's mantissa bits are all 0.
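    // Worked example (illustrative): the signaling NaN 0x7f800001 has all
    // exponent bits set and only a low mantissa bit nonzero; plain truncation
    // (>> 16) would yield 0x7f80, the Inf bit pattern. OR-ing in bit 16 first
    // yields the bfloat16 0x7f81, which is still a NaN.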
    u.u32 |= 0x10000;  // Preserve signaling NaN
  }
  ret.data = (u.u32 >> 16);
  return ret;
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Converts and moves bfloat162 to float2
 */
__HOST_DEVICE__ float2 __bfloat1622float2(const __hip_bfloat162 a) {
  return float2{__bfloat162float(a.x), __bfloat162float(a.y)};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Moves bfloat16 value to bfloat162
 */
__device__ __hip_bfloat162 __bfloat162bfloat162(const __hip_bfloat16 a) {
  return __hip_bfloat162{a, a};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Reinterprets bits in a __hip_bfloat16 as a signed short integer
 */
__device__ short int __bfloat16_as_short(const __hip_bfloat16 h) { return (short)h.data; }

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Reinterprets bits in a __hip_bfloat16 as an unsigned short integer
 */
__device__ unsigned short int __bfloat16_as_ushort(const __hip_bfloat16 h) { return h.data; }

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Convert double to __hip_bfloat16
 */
__HOST_DEVICE__ __hip_bfloat16 __double2bfloat16(const double a) {
  return __float2bfloat16((float)a);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Convert float2 to __hip_bfloat162
 */
__HOST_DEVICE__ __hip_bfloat162 __float22bfloat162_rn(const float2 a) {
  return __hip_bfloat162{__float2bfloat16(a.x), __float2bfloat16(a.y)};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Combine two __hip_bfloat16 to __hip_bfloat162
 */
__device__ __hip_bfloat162 __halves2bfloat162(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return __hip_bfloat162{a, b};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Returns high 16 bits of __hip_bfloat162
 */
__device__ __hip_bfloat16 __high2bfloat16(const __hip_bfloat162 a) { return a.y; }

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Returns high 16 bits of __hip_bfloat162
 */
__device__ __hip_bfloat162 __high2bfloat162(const __hip_bfloat162 a) {
  return __hip_bfloat162{a.y, a.y};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Converts high 16 bits of __hip_bfloat162 to float and returns the result
 */
__HOST_DEVICE__ float __high2float(const __hip_bfloat162 a) { return __bfloat162float(a.y); }

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Extracts high 16 bits from each and combines them
 */
__device__ __hip_bfloat162 __highs2bfloat162(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hip_bfloat162{a.y, b.y};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Returns low 16 bits of __hip_bfloat162
 */
__device__ __hip_bfloat16 __low2bfloat16(const __hip_bfloat162 a) { return a.x; }

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Returns low 16 bits of __hip_bfloat162
 */
__device__ __hip_bfloat162 __low2bfloat162(const __hip_bfloat162 a) {
  return __hip_bfloat162{a.x, a.x};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Converts low 16 bits of __hip_bfloat162 to float and returns the result
 */
__HOST_DEVICE__ float __low2float(const __hip_bfloat162 a) { return __bfloat162float(a.x); }

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Swaps both halves
 */
__device__ __hip_bfloat162 __lowhigh2highlow(const __hip_bfloat162 a) {
  return __hip_bfloat162{a.y, a.x};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Extracts low 16 bits from each and combines them
 */
__device__ __hip_bfloat162 __lows2bfloat162(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hip_bfloat162{a.x, b.x};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_CONV
 * \brief Reinterprets short int into a bfloat16
*/ __device__ __hip_bfloat16 __short_as_bfloat16(const short int a) { return __hip_bfloat16{(unsigned short)a}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_CONV * \brief Reinterprets unsigned short int into a bfloat16 */ __device__ __hip_bfloat16 __ushort_as_bfloat16(const unsigned short int a) { return __hip_bfloat16{a}; } /** * \ingroup HIP_INTRINSIC_BFLOAT16_ARITH * \brief Adds two bfloat16 values */ __device__ __hip_bfloat16 __hadd(const __hip_bfloat16 a, const __hip_bfloat16 b) { return __float2bfloat16(__bfloat162float(a) + __bfloat162float(b)); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_ARITH * \brief Subtracts two bfloat16 values */ __device__ __hip_bfloat16 __hsub(const __hip_bfloat16 a, const __hip_bfloat16 b) { return __float2bfloat16(__bfloat162float(a) - __bfloat162float(b)); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_ARITH * \brief Divides two bfloat16 values */ __device__ __hip_bfloat16 __hdiv(const __hip_bfloat16 a, const __hip_bfloat16 b) { return __float2bfloat16(__bfloat162float(a) / __bfloat162float(b)); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_ARITH * \brief Performs FMA of given bfloat16 values */ __device__ __hip_bfloat16 __hfma(const __hip_bfloat16 a, const __hip_bfloat16 b, const __hip_bfloat16 c) { return __float2bfloat16( __ocml_fma_f32(__bfloat162float(a), __bfloat162float(b), __bfloat162float(c))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_ARITH * \brief Multiplies two bfloat16 values */ __device__ __hip_bfloat16 __hmul(const __hip_bfloat16 a, const __hip_bfloat16 b) { return __float2bfloat16(__bfloat162float(a) * __bfloat162float(b)); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_ARITH * \brief Negate a bfloat16 value */ __device__ __hip_bfloat16 __hneg(const __hip_bfloat16 a) { auto ret = a; ret.data ^= 0x8000; return ret; } /** * \ingroup HIP_INTRINSIC_BFLOAT16_ARITH * \brief Returns absolute of a bfloat16 */ __device__ __hip_bfloat16 __habs(const __hip_bfloat16 a) { auto ret = a; ret.data &= 0x7FFF; return ret; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_ARITH * \brief Divides bfloat162 values */ __device__ __hip_bfloat162 __h2div(const __hip_bfloat162 a, const __hip_bfloat162 b) { return __hip_bfloat162{__float2bfloat16(__bfloat162float(a.x) / __bfloat162float(b.x)), __float2bfloat16(__bfloat162float(a.y) / __bfloat162float(b.y))}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_ARITH * \brief Returns absolute of a bfloat162 */ __device__ __hip_bfloat162 __habs2(const __hip_bfloat162 a) { return __hip_bfloat162{__habs(a.x), __habs(a.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_ARITH * \brief Adds two bfloat162 values */ __device__ __hip_bfloat162 __hadd2(const __hip_bfloat162 a, const __hip_bfloat162 b) { return __hip_bfloat162{__hadd(a.x, b.x), __hadd(a.y, b.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_ARITH * \brief Performs FMA of given bfloat162 values */ __device__ __hip_bfloat162 __hfma2(const __hip_bfloat162 a, const __hip_bfloat162 b, const __hip_bfloat162 c) { return __hip_bfloat162{__hfma(a.x, b.x, c.x), __hfma(a.y, b.y, c.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_ARITH * \brief Multiplies two bfloat162 values */ __device__ __hip_bfloat162 __hmul2(const __hip_bfloat162 a, const __hip_bfloat162 b) { return __hip_bfloat162{__hmul(a.x, b.x), __hmul(a.y, b.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_ARITH * \brief Converts a bfloat162 into negative */ __device__ __hip_bfloat162 __hneg2(const __hip_bfloat162 a) { return __hip_bfloat162{__hneg(a.x), __hneg(a.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_ARITH * \brief Subtracts two bfloat162 values */ __device__ 
__hip_bfloat162 __hsub2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hip_bfloat162{__hsub(a.x, b.x), __hsub(a.y, b.y)};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values
 */
__device__ bool __heq(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return __bfloat162float(a) == __bfloat162float(b);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - unordered equal
 */
__device__ bool __hequ(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return !(__bfloat162float(a) < __bfloat162float(b)) &&
      !(__bfloat162float(a) > __bfloat162float(b));
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - greater than
 */
__device__ bool __hgt(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return __bfloat162float(a) > __bfloat162float(b);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - unordered greater than
 */
__device__ bool __hgtu(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return !(__bfloat162float(a) <= __bfloat162float(b));
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - greater than equal
 */
__device__ bool __hge(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return __bfloat162float(a) >= __bfloat162float(b);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - unordered greater than equal
 */
__device__ bool __hgeu(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return !(__bfloat162float(a) < __bfloat162float(b));
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - not equal
 */
__device__ bool __hne(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return __bfloat162float(a) != __bfloat162float(b);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - unordered not equal
 */
__device__ bool __hneu(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return !(__bfloat162float(a) == __bfloat162float(b));
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - return max
 */
__device__ __hip_bfloat16 __hmax(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return __float2bfloat16(__ocml_fmax_f32(__bfloat162float(a), __bfloat162float(b)));
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - return min
 */
__device__ __hip_bfloat16 __hmin(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return __float2bfloat16(__ocml_fmin_f32(__bfloat162float(a), __bfloat162float(b)));
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - less than operator
 */
__device__ bool __hlt(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return __bfloat162float(a) < __bfloat162float(b);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - unordered less than
 */
__device__ bool __hltu(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return !(__bfloat162float(a) >= __bfloat162float(b));
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - less than equal
 */
__device__ bool __hle(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return __bfloat162float(a) <= __bfloat162float(b);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Compare two bfloat16 values - unordered less than equal
 */
__device__ bool __hleu(const __hip_bfloat16 a, const __hip_bfloat16 b) {
  return !(__bfloat162float(a) > __bfloat162float(b));
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Checks if number is inf
 */
__device__ int __hisinf(const __hip_bfloat16 a) { return __ocml_isinf_f32(__bfloat162float(a)); }

/**
 * \ingroup HIP_INTRINSIC_BFLOAT16_COMP
 * \brief Checks if number is nan
 */
__device__ bool __hisnan(const __hip_bfloat16 a) { return __ocml_isnan_f32(__bfloat162float(a)); }

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Checks if two numbers are equal
 */
__device__ bool __hbeq2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __heq(a.x, b.x) && __heq(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Checks if two numbers are equal - unordered
 */
__device__ bool __hbequ2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hequ(a.x, b.x) && __hequ(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a >= b
 */
__device__ bool __hbge2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hge(a.x, b.x) && __hge(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a >= b - unordered
 */
__device__ bool __hbgeu2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hgeu(a.x, b.x) && __hgeu(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a > b
 */
__device__ bool __hbgt2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hgt(a.x, b.x) && __hgt(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a > b - unordered
 */
__device__ bool __hbgtu2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hgtu(a.x, b.x) && __hgtu(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a <= b
 */
__device__ bool __hble2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hle(a.x, b.x) && __hle(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a <= b - unordered
 */
__device__ bool __hbleu2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hleu(a.x, b.x) && __hleu(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a < b
 */
__device__ bool __hblt2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hlt(a.x, b.x) && __hlt(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a < b - unordered
 */
__device__ bool __hbltu2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hltu(a.x, b.x) && __hltu(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a != b
 */
__device__ bool __hbne2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hne(a.x, b.x) && __hne(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a != b - unordered
 */
__device__ bool __hbneu2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hneu(a.x, b.x) && __hneu(a.y, b.y);
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a == b, returns 1.0 if equal, otherwise 0.0
 */
__device__ __hip_bfloat162 __heq2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hip_bfloat162{{__heq(a.x, b.x) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)},
                         {__heq(a.y, b.y) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)}};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a >= b, returns 1.0 if greater than equal, otherwise 0.0
 */
__device__ __hip_bfloat162 __hge2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hip_bfloat162{{__hge(a.x, b.x) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)},
                         {__hge(a.y, b.y) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)}};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a > b, returns 1.0 if greater than, otherwise 0.0
 */
__device__ __hip_bfloat162 __hgt2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hip_bfloat162{{__hgt(a.x, b.x) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)},
                         {__hgt(a.y, b.y) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)}};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a is NaN, returns 1.0 if NaN, otherwise 0.0
 */
__device__ __hip_bfloat162 __hisnan2(const __hip_bfloat162 a) {
  return __hip_bfloat162{
      {__ocml_isnan_f32(__bfloat162float(a.x)) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)},
      {__ocml_isnan_f32(__bfloat162float(a.y)) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)}};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a <= b, returns 1.0 if less than equal, otherwise 0.0
 */
__device__ __hip_bfloat162 __hle2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hip_bfloat162{{__hle(a.x, b.x) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)},
                         {__hle(a.y, b.y) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)}};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Check for a < b, returns 1.0 if less than, otherwise 0.0
 */
__device__ __hip_bfloat162 __hlt2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hip_bfloat162{{__hlt(a.x, b.x) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)},
                         {__hlt(a.y, b.y) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)}};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Returns max of two elements
 */
__device__ __hip_bfloat162 __hmax2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hip_bfloat162{
      __float2bfloat16(__ocml_fmax_f32(__bfloat162float(a.x), __bfloat162float(b.x))),
      __float2bfloat16(__ocml_fmax_f32(__bfloat162float(a.y), __bfloat162float(b.y)))};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Returns min of two elements
 */
__device__ __hip_bfloat162 __hmin2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hip_bfloat162{
      __float2bfloat16(__ocml_fmin_f32(__bfloat162float(a.x), __bfloat162float(b.x))),
      __float2bfloat16(__ocml_fmin_f32(__bfloat162float(a.y), __bfloat162float(b.y)))};
}

/**
 * \ingroup HIP_INTRINSIC_BFLOAT162_COMP
 * \brief Checks for not equal to
 */
__device__ __hip_bfloat162 __hne2(const __hip_bfloat162 a, const __hip_bfloat162 b) {
  return __hip_bfloat162{{__hne(a.x, b.x) ? __float2bfloat16(1.0f) : __float2bfloat16(0.0f)},
                         {__hne(a.y, b.y) ?
__float2bfloat16(1.0f) : __float2bfloat16(0.0f)}}; } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate ceil of bfloat16 */ __device__ __hip_bfloat16 hceil(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_ceil_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate cosine of bfloat16 */ __device__ __hip_bfloat16 hcos(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_cos_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate exponential of bfloat16 */ __device__ __hip_bfloat16 hexp(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_exp_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate exponential 10 of bfloat16 */ __device__ __hip_bfloat16 hexp10(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_exp10_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate exponential 2 of bfloat16 */ __device__ __hip_bfloat16 hexp2(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_exp2_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate floor of bfloat16 */ __device__ __hip_bfloat16 hfloor(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_floor_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate natural log of bfloat16 */ __device__ __hip_bfloat16 hlog(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_log_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate log 10 of bfloat16 */ __device__ __hip_bfloat16 hlog10(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_log10_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate log 2 of bfloat16 */ __device__ __hip_bfloat16 hlog2(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_log2_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate reciprocal */ __device__ __hip_bfloat16 hrcp(const __hip_bfloat16 h) { return __float2bfloat16(1.0f / (__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Round to nearest int */ __device__ __hip_bfloat16 hrint(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_rint_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Reciprocal square root */ __device__ __hip_bfloat16 hrsqrt(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_rsqrt_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate sin of bfloat16 */ __device__ __hip_bfloat16 hsin(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_sin_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate sqrt of bfloat16 */ __device__ __hip_bfloat16 hsqrt(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_sqrt_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT16_MATH * \brief Calculate truncate of bfloat16 */ __device__ __hip_bfloat16 htrunc(const __hip_bfloat16 h) { return __float2bfloat16(__ocml_trunc_f32(__bfloat162float(h))); } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate ceil of bfloat162 */ __device__ __hip_bfloat162 h2ceil(const __hip_bfloat162 h) { return __hip_bfloat162{hceil(h.x), hceil(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate cosine of bfloat162 */ __device__ __hip_bfloat162 h2cos(const __hip_bfloat162 h) { return __hip_bfloat162{hcos(h.x), hcos(h.y)}; } /** * \ingroup 
HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate exponential of bfloat162 */ __device__ __hip_bfloat162 h2exp(const __hip_bfloat162 h) { return __hip_bfloat162{hexp(h.x), hexp(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate exponential 10 of bfloat162 */ __device__ __hip_bfloat162 h2exp10(const __hip_bfloat162 h) { return __hip_bfloat162{hexp10(h.x), hexp10(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate exponential 2 of bfloat162 */ __device__ __hip_bfloat162 h2exp2(const __hip_bfloat162 h) { return __hip_bfloat162{hexp2(h.x), hexp2(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate floor of bfloat162 */ __device__ __hip_bfloat162 h2floor(const __hip_bfloat162 h) { return __hip_bfloat162{hfloor(h.x), hfloor(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate natural log of bfloat162 */ __device__ __hip_bfloat162 h2log(const __hip_bfloat162 h) { return __hip_bfloat162{hlog(h.x), hlog(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate log 10 of bfloat162 */ __device__ __hip_bfloat162 h2log10(const __hip_bfloat162 h) { return __hip_bfloat162{hlog10(h.x), hlog10(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate log 2 of bfloat162 */ __device__ __hip_bfloat162 h2log2(const __hip_bfloat162 h) { return __hip_bfloat162{hlog2(h.x), hlog2(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate vector reciprocal */ __device__ __hip_bfloat162 h2rcp(const __hip_bfloat162 h) { return __hip_bfloat162{hrcp(h.x), hrcp(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate vector round to nearest int */ __device__ __hip_bfloat162 h2rint(const __hip_bfloat162 h) { return __hip_bfloat162{hrint(h.x), hrint(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate vector reciprocal square root */ __device__ __hip_bfloat162 h2rsqrt(const __hip_bfloat162 h) { return __hip_bfloat162{hrsqrt(h.x), hrsqrt(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate sin of bfloat162 */ __device__ __hip_bfloat162 h2sin(const __hip_bfloat162 h) { return __hip_bfloat162{hsin(h.x), hsin(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate sqrt of bfloat162 */ __device__ __hip_bfloat162 h2sqrt(const __hip_bfloat162 h) { return __hip_bfloat162{hsqrt(h.x), hsqrt(h.y)}; } /** * \ingroup HIP_INTRINSIC_BFLOAT162_MATH * \brief Calculate truncate of bfloat162 */ __device__ __hip_bfloat162 h2trunc(const __hip_bfloat162 h) { return __hip_bfloat162{htrunc(h.x), htrunc(h.y)}; } #endif clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_hip_bfloat16.h000066400000000000000000000224141450307266000242500ustar00rootroot00000000000000/** * MIT License * * Copyright (c) 2019 - 2022 Advanced Micro Devices, Inc. All rights reserved. * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell * copies of the Software, and to permit persons to whom the Software is * furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. 
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

/*!\file
 * \brief hip_bfloat16.h provides struct for hip_bfloat16 typedef
 */

#ifndef _HIP_INCLUDE_HIP_AMD_DETAIL_HIP_BFLOAT16_H_
#define _HIP_INCLUDE_HIP_AMD_DETAIL_HIP_BFLOAT16_H_

#include "host_defines.h"

#if defined(__HIPCC_RTC__)
#define __HOST_DEVICE__ __device__
#else
#define __HOST_DEVICE__ __host__ __device__
#endif

#if __cplusplus < 201103L || !defined(__HIPCC__)
// If this is a C compiler, C++ compiler below C++11, or a host-only compiler, we only
// include a minimal definition of hip_bfloat16

#include <stdint.h>
/*! \brief Struct to represent a 16 bit brain floating point number. */
typedef struct {
  uint16_t data;
} hip_bfloat16;

#else // __cplusplus < 201103L || !defined(__HIPCC__)

#include
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wshadow"

struct hip_bfloat16 {
  __hip_uint16_t data;

  enum truncate_t { truncate };

  __HOST_DEVICE__ hip_bfloat16() = default;

  // round upper 16 bits of IEEE float to convert to bfloat16
  explicit __HOST_DEVICE__ hip_bfloat16(float f) : data(float_to_bfloat16(f)) {}

  explicit __HOST_DEVICE__ hip_bfloat16(float f, truncate_t)
      : data(truncate_float_to_bfloat16(f)) {}

  // zero extend lower 16 bits of bfloat16 to convert to IEEE float
  __HOST_DEVICE__ operator float() const {
    union {
      uint32_t int32;
      float fp32;
    } u = {uint32_t(data) << 16};
    return u.fp32;
  }

  __HOST_DEVICE__ hip_bfloat16& operator=(const float& f) {
    data = float_to_bfloat16(f);
    return *this;
  }

  static __HOST_DEVICE__ hip_bfloat16 round_to_bfloat16(float f) {
    hip_bfloat16 output;
    output.data = float_to_bfloat16(f);
    return output;
  }

  static __HOST_DEVICE__ hip_bfloat16 round_to_bfloat16(float f, truncate_t) {
    hip_bfloat16 output;
    output.data = truncate_float_to_bfloat16(f);
    return output;
  }

 private:
  static __HOST_DEVICE__ __hip_uint16_t float_to_bfloat16(float f) {
    union {
      float fp32;
      uint32_t int32;
    } u = {f};
    if (~u.int32 & 0x7f800000) {
      // When the exponent bits are not all 1s, then the value is zero, normal,
      // or subnormal. We round the bfloat16 mantissa up by adding 0x7FFF, plus
      // 1 if the least significant bit of the bfloat16 mantissa is 1 (odd).
      // This causes the bfloat16's mantissa to be incremented by 1 if the 16
      // least significant bits of the float mantissa are greater than 0x8000,
      // or if they are equal to 0x8000 and the least significant bit of the
      // bfloat16 mantissa is 1 (odd). This causes it to be rounded to even when
      // the lower 16 bits are exactly 0x8000. If the bfloat16 mantissa already
      // has the value 0x7f, then incrementing it causes it to become 0x00 and
      // the exponent is incremented by one, which is the next higher FP value
      // to the unrounded bfloat16 value. When the bfloat16 value is subnormal
      // with an exponent of 0x00 and a mantissa of 0x7F, it may be rounded up
      // to a normal value with an exponent of 0x01 and a mantissa of 0x00.
      // When the bfloat16 value has an exponent of 0xFE and a mantissa of 0x7F,
      // incrementing it causes it to become an exponent of 0xFF and a mantissa
      // of 0x00, which is Inf, the next higher value to the unrounded value.
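      // Worked example (illustrative): f = 1.00390625f = 0x3f808000. The low
      // 16 bits are exactly 0x8000 and the bfloat16 mantissa LSB (bit 16) is
      // 0, so we add 0x7fff, giving 0x3f80ffff; the final >> 16 yields 0x3f80,
      // i.e. 1.0: the halfway case is rounded down to the even mantissa.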
      u.int32 += 0x7fff + ((u.int32 >> 16) & 1);  // Round to nearest, round to even
    } else if (u.int32 & 0xffff) {
      // When all of the exponent bits are 1, the value is Inf or NaN.
      // Inf is indicated by a zero mantissa. NaN is indicated by any nonzero
      // mantissa bit. Quiet NaN is indicated by the most significant mantissa
      // bit being 1. Signaling NaN is indicated by the most significant
      // mantissa bit being 0 but some other bit(s) being 1. If any of the
      // lower 16 bits of the mantissa are 1, we set the least significant bit
      // of the bfloat16 mantissa, in order to preserve signaling NaN in case
      // the bfloat16's mantissa bits are all 0.
      u.int32 |= 0x10000;  // Preserve signaling NaN
    }
    return __hip_uint16_t(u.int32 >> 16);
  }

  // Truncate instead of rounding, preserving SNaN
  static __HOST_DEVICE__ __hip_uint16_t truncate_float_to_bfloat16(float f) {
    union {
      float fp32;
      uint32_t int32;
    } u = {f};
    return __hip_uint16_t(u.int32 >> 16) | (!(~u.int32 & 0x7f800000) && (u.int32 & 0xffff));
  }
};

#pragma clang diagnostic pop

typedef struct {
  __hip_uint16_t data;
} hip_bfloat16_public;

static_assert(__hip_internal::is_standard_layout<hip_bfloat16>{},
              "hip_bfloat16 is not a standard layout type, and thus is "
              "incompatible with C.");

static_assert(__hip_internal::is_trivial<hip_bfloat16>{},
              "hip_bfloat16 is not a trivial type, and thus is "
              "incompatible with C.");

#if !defined(__HIPCC_RTC__)
static_assert(sizeof(hip_bfloat16) == sizeof(hip_bfloat16_public) &&
                  offsetof(hip_bfloat16, data) == offsetof(hip_bfloat16_public, data),
              "internal hip_bfloat16 does not match public hip_bfloat16");

inline std::ostream& operator<<(std::ostream& os, const hip_bfloat16& bf16) {
  return os << float(bf16);
}
#endif

inline __HOST_DEVICE__ hip_bfloat16 operator+(hip_bfloat16 a) { return a; }
inline __HOST_DEVICE__ hip_bfloat16 operator-(hip_bfloat16 a) {
  a.data ^= 0x8000;
  return a;
}
inline __HOST_DEVICE__ hip_bfloat16 operator+(hip_bfloat16 a, hip_bfloat16 b) {
  return hip_bfloat16(float(a) + float(b));
}
inline __HOST_DEVICE__ hip_bfloat16 operator-(hip_bfloat16 a, hip_bfloat16 b) {
  return hip_bfloat16(float(a) - float(b));
}
inline __HOST_DEVICE__ hip_bfloat16 operator*(hip_bfloat16 a, hip_bfloat16 b) {
  return hip_bfloat16(float(a) * float(b));
}
inline __HOST_DEVICE__ hip_bfloat16 operator/(hip_bfloat16 a, hip_bfloat16 b) {
  return hip_bfloat16(float(a) / float(b));
}
inline __HOST_DEVICE__ bool operator<(hip_bfloat16 a, hip_bfloat16 b) {
  return float(a) < float(b);
}
inline __HOST_DEVICE__ bool operator==(hip_bfloat16 a, hip_bfloat16 b) {
  return float(a) == float(b);
}
inline __HOST_DEVICE__ bool operator>(hip_bfloat16 a, hip_bfloat16 b) { return b < a; }
inline __HOST_DEVICE__ bool operator<=(hip_bfloat16 a, hip_bfloat16 b) { return !(a > b); }
inline __HOST_DEVICE__ bool operator!=(hip_bfloat16 a, hip_bfloat16 b) { return !(a == b); }
inline __HOST_DEVICE__ bool operator>=(hip_bfloat16 a, hip_bfloat16 b) { return !(a < b); }
inline __HOST_DEVICE__ hip_bfloat16& operator+=(hip_bfloat16& a, hip_bfloat16 b) {
  return a = a + b;
}
inline __HOST_DEVICE__ hip_bfloat16& operator-=(hip_bfloat16& a, hip_bfloat16 b) {
  return a = a - b;
}
inline __HOST_DEVICE__ hip_bfloat16& operator*=(hip_bfloat16& a, hip_bfloat16 b) {
  return a = a * b;
}
inline __HOST_DEVICE__ hip_bfloat16& operator/=(hip_bfloat16& a, hip_bfloat16 b) {
  return a = a / b;
}
inline __HOST_DEVICE__ hip_bfloat16& operator++(hip_bfloat16& a) { return a += hip_bfloat16(1.0f); }
inline __HOST_DEVICE__ hip_bfloat16& operator--(hip_bfloat16& a) { return a -= hip_bfloat16(1.0f); }
inline __HOST_DEVICE__
hip_bfloat16 operator++(hip_bfloat16& a, int) { hip_bfloat16 orig = a; ++a; return orig; } inline __HOST_DEVICE__ hip_bfloat16 operator--(hip_bfloat16& a, int) { hip_bfloat16 orig = a; --a; return orig; } namespace std { constexpr __HOST_DEVICE__ bool isinf(hip_bfloat16 a) { return !(~a.data & 0x7f80) && !(a.data & 0x7f); } constexpr __HOST_DEVICE__ bool isnan(hip_bfloat16 a) { return !(~a.data & 0x7f80) && +(a.data & 0x7f); } constexpr __HOST_DEVICE__ bool iszero(hip_bfloat16 a) { return !(a.data & 0x7fff); } } #endif // __cplusplus < 201103L || !defined(__HIPCC__) #endif // _HIP_BFLOAT16_H_ clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_hip_common.h000066400000000000000000000025321450307266000241210ustar00rootroot00000000000000/* Copyright (c) 2019 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COMMON_H #define HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COMMON_H #if defined(__clang__) && defined(__HIP__) #define __HIP_CLANG_ONLY__ 1 #else #define __HIP_CLANG_ONLY__ 0 #endif #endif // HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COMMON_H clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_hip_complex.h000066400000000000000000000344641450307266000243110ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COMPLEX_H #define HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COMPLEX_H #include "hip/amd_detail/amd_hip_vector_types.h" #if defined(__HIPCC_RTC__) #define __HOST_DEVICE__ __device__ #else #define __HOST_DEVICE__ __host__ __device__ // TODO: Clang has a bug which allows device functions to call std functions // when std functions are introduced into default namespace by using statement. // math.h may be included after this bug is fixed. #if __cplusplus #include #else #include "math.h" #endif #endif // !defined(__HIPCC_RTC__) #if __cplusplus #define COMPLEX_NEG_OP_OVERLOAD(type) \ __HOST_DEVICE__ static inline type operator-(const type& op) { \ type ret; \ ret.x = -op.x; \ ret.y = -op.y; \ return ret; \ } #define COMPLEX_EQ_OP_OVERLOAD(type) \ __HOST_DEVICE__ static inline bool operator==(const type& lhs, const type& rhs) { \ return lhs.x == rhs.x && lhs.y == rhs.y; \ } #define COMPLEX_NE_OP_OVERLOAD(type) \ __HOST_DEVICE__ static inline bool operator!=(const type& lhs, const type& rhs) { \ return !(lhs == rhs); \ } #define COMPLEX_ADD_OP_OVERLOAD(type) \ __HOST_DEVICE__ static inline type operator+(const type& lhs, const type& rhs) { \ type ret; \ ret.x = lhs.x + rhs.x; \ ret.y = lhs.y + rhs.y; \ return ret; \ } #define COMPLEX_SUB_OP_OVERLOAD(type) \ __HOST_DEVICE__ static inline type operator-(const type& lhs, const type& rhs) { \ type ret; \ ret.x = lhs.x - rhs.x; \ ret.y = lhs.y - rhs.y; \ return ret; \ } #define COMPLEX_MUL_OP_OVERLOAD(type) \ __HOST_DEVICE__ static inline type operator*(const type& lhs, const type& rhs) { \ type ret; \ ret.x = lhs.x * rhs.x - lhs.y * rhs.y; \ ret.y = lhs.x * rhs.y + lhs.y * rhs.x; \ return ret; \ } #define COMPLEX_DIV_OP_OVERLOAD(type) \ __HOST_DEVICE__ static inline type operator/(const type& lhs, const type& rhs) { \ type ret; \ ret.x = (lhs.x * rhs.x + lhs.y * rhs.y); \ ret.y = (rhs.x * lhs.y - lhs.x * rhs.y); \ ret.x = ret.x / (rhs.x * rhs.x + rhs.y * rhs.y); \ ret.y = ret.y / (rhs.x * rhs.x + rhs.y * rhs.y); \ return ret; \ } #define COMPLEX_ADD_PREOP_OVERLOAD(type) \ __HOST_DEVICE__ static inline type& operator+=(type& lhs, const type& rhs) { \ lhs.x += rhs.x; \ lhs.y += rhs.y; \ return lhs; \ } #define COMPLEX_SUB_PREOP_OVERLOAD(type) \ __HOST_DEVICE__ static inline type& operator-=(type& lhs, const type& rhs) { \ lhs.x -= rhs.x; \ lhs.y -= rhs.y; \ return lhs; \ } #define COMPLEX_MUL_PREOP_OVERLOAD(type) \ __HOST_DEVICE__ static inline type& operator*=(type& lhs, const type& rhs) { \ type temp{lhs}; \ lhs.x = rhs.x * temp.x - rhs.y * temp.y; \ lhs.y = rhs.y * temp.x + rhs.x * temp.y; \ return lhs; \ } #define COMPLEX_DIV_PREOP_OVERLOAD(type) \ __HOST_DEVICE__ static inline type& operator/=(type& lhs, const type& rhs) { \ type temp; \ temp.x = (lhs.x*rhs.x + lhs.y * rhs.y) / (rhs.x*rhs.x + rhs.y*rhs.y); \ temp.y = (lhs.y * rhs.x - lhs.x * rhs.y) / (rhs.x*rhs.x + rhs.y*rhs.y); \ lhs = temp; \ return lhs; \ } #define COMPLEX_SCALAR_PRODUCT(type, type1) \ __HOST_DEVICE__ static inline type operator*(const type& lhs, type1 rhs) { \ type ret; \ ret.x = lhs.x * rhs; \ ret.y = lhs.y * rhs; \ return ret; \ } #endif typedef float2 hipFloatComplex; __HOST_DEVICE__ static inline float hipCrealf(hipFloatComplex z) { return z.x; } __HOST_DEVICE__ static inline float hipCimagf(hipFloatComplex z) { return z.y; } __HOST_DEVICE__ static inline hipFloatComplex make_hipFloatComplex(float a, float b) { hipFloatComplex z; z.x = a; z.y = b; return z; } __HOST_DEVICE__ static inline hipFloatComplex hipConjf(hipFloatComplex z) { 
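  // Complex conjugation flips the sign of the imaginary part: conj(a + bi) = a - bi.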
hipFloatComplex ret; ret.x = z.x; ret.y = -z.y; return ret; } __HOST_DEVICE__ static inline float hipCsqabsf(hipFloatComplex z) { return z.x * z.x + z.y * z.y; } __HOST_DEVICE__ static inline hipFloatComplex hipCaddf(hipFloatComplex p, hipFloatComplex q) { return make_hipFloatComplex(p.x + q.x, p.y + q.y); } __HOST_DEVICE__ static inline hipFloatComplex hipCsubf(hipFloatComplex p, hipFloatComplex q) { return make_hipFloatComplex(p.x - q.x, p.y - q.y); } __HOST_DEVICE__ static inline hipFloatComplex hipCmulf(hipFloatComplex p, hipFloatComplex q) { return make_hipFloatComplex(p.x * q.x - p.y * q.y, p.y * q.x + p.x * q.y); } __HOST_DEVICE__ static inline hipFloatComplex hipCdivf(hipFloatComplex p, hipFloatComplex q) { float sqabs = hipCsqabsf(q); hipFloatComplex ret; ret.x = (p.x * q.x + p.y * q.y) / sqabs; ret.y = (p.y * q.x - p.x * q.y) / sqabs; return ret; } __HOST_DEVICE__ static inline float hipCabsf(hipFloatComplex z) { return sqrtf(hipCsqabsf(z)); } typedef double2 hipDoubleComplex; __HOST_DEVICE__ static inline double hipCreal(hipDoubleComplex z) { return z.x; } __HOST_DEVICE__ static inline double hipCimag(hipDoubleComplex z) { return z.y; } __HOST_DEVICE__ static inline hipDoubleComplex make_hipDoubleComplex(double a, double b) { hipDoubleComplex z; z.x = a; z.y = b; return z; } __HOST_DEVICE__ static inline hipDoubleComplex hipConj(hipDoubleComplex z) { hipDoubleComplex ret; ret.x = z.x; ret.y = -z.y; return ret; } __HOST_DEVICE__ static inline double hipCsqabs(hipDoubleComplex z) { return z.x * z.x + z.y * z.y; } __HOST_DEVICE__ static inline hipDoubleComplex hipCadd(hipDoubleComplex p, hipDoubleComplex q) { return make_hipDoubleComplex(p.x + q.x, p.y + q.y); } __HOST_DEVICE__ static inline hipDoubleComplex hipCsub(hipDoubleComplex p, hipDoubleComplex q) { return make_hipDoubleComplex(p.x - q.x, p.y - q.y); } __HOST_DEVICE__ static inline hipDoubleComplex hipCmul(hipDoubleComplex p, hipDoubleComplex q) { return make_hipDoubleComplex(p.x * q.x - p.y * q.y, p.y * q.x + p.x * q.y); } __HOST_DEVICE__ static inline hipDoubleComplex hipCdiv(hipDoubleComplex p, hipDoubleComplex q) { double sqabs = hipCsqabs(q); hipDoubleComplex ret; ret.x = (p.x * q.x + p.y * q.y) / sqabs; ret.y = (p.y * q.x - p.x * q.y) / sqabs; return ret; } __HOST_DEVICE__ static inline double hipCabs(hipDoubleComplex z) { return sqrt(hipCsqabs(z)); } #if __cplusplus COMPLEX_NEG_OP_OVERLOAD(hipFloatComplex) COMPLEX_EQ_OP_OVERLOAD(hipFloatComplex) COMPLEX_NE_OP_OVERLOAD(hipFloatComplex) COMPLEX_ADD_OP_OVERLOAD(hipFloatComplex) COMPLEX_SUB_OP_OVERLOAD(hipFloatComplex) COMPLEX_MUL_OP_OVERLOAD(hipFloatComplex) COMPLEX_DIV_OP_OVERLOAD(hipFloatComplex) COMPLEX_ADD_PREOP_OVERLOAD(hipFloatComplex) COMPLEX_SUB_PREOP_OVERLOAD(hipFloatComplex) COMPLEX_MUL_PREOP_OVERLOAD(hipFloatComplex) COMPLEX_DIV_PREOP_OVERLOAD(hipFloatComplex) COMPLEX_SCALAR_PRODUCT(hipFloatComplex, unsigned short) COMPLEX_SCALAR_PRODUCT(hipFloatComplex, signed short) COMPLEX_SCALAR_PRODUCT(hipFloatComplex, unsigned int) COMPLEX_SCALAR_PRODUCT(hipFloatComplex, signed int) COMPLEX_SCALAR_PRODUCT(hipFloatComplex, float) COMPLEX_SCALAR_PRODUCT(hipFloatComplex, unsigned long) COMPLEX_SCALAR_PRODUCT(hipFloatComplex, signed long) COMPLEX_SCALAR_PRODUCT(hipFloatComplex, double) COMPLEX_SCALAR_PRODUCT(hipFloatComplex, signed long long) COMPLEX_SCALAR_PRODUCT(hipFloatComplex, unsigned long long) COMPLEX_NEG_OP_OVERLOAD(hipDoubleComplex) COMPLEX_EQ_OP_OVERLOAD(hipDoubleComplex) COMPLEX_NE_OP_OVERLOAD(hipDoubleComplex) COMPLEX_ADD_OP_OVERLOAD(hipDoubleComplex) 
COMPLEX_SUB_OP_OVERLOAD(hipDoubleComplex) COMPLEX_MUL_OP_OVERLOAD(hipDoubleComplex) COMPLEX_DIV_OP_OVERLOAD(hipDoubleComplex) COMPLEX_ADD_PREOP_OVERLOAD(hipDoubleComplex) COMPLEX_SUB_PREOP_OVERLOAD(hipDoubleComplex) COMPLEX_MUL_PREOP_OVERLOAD(hipDoubleComplex) COMPLEX_DIV_PREOP_OVERLOAD(hipDoubleComplex) COMPLEX_SCALAR_PRODUCT(hipDoubleComplex, unsigned short) COMPLEX_SCALAR_PRODUCT(hipDoubleComplex, signed short) COMPLEX_SCALAR_PRODUCT(hipDoubleComplex, unsigned int) COMPLEX_SCALAR_PRODUCT(hipDoubleComplex, signed int) COMPLEX_SCALAR_PRODUCT(hipDoubleComplex, float) COMPLEX_SCALAR_PRODUCT(hipDoubleComplex, unsigned long) COMPLEX_SCALAR_PRODUCT(hipDoubleComplex, signed long) COMPLEX_SCALAR_PRODUCT(hipDoubleComplex, double) COMPLEX_SCALAR_PRODUCT(hipDoubleComplex, signed long long) COMPLEX_SCALAR_PRODUCT(hipDoubleComplex, unsigned long long) #endif typedef hipFloatComplex hipComplex; __HOST_DEVICE__ static inline hipComplex make_hipComplex(float x, float y) { return make_hipFloatComplex(x, y); } __HOST_DEVICE__ static inline hipFloatComplex hipComplexDoubleToFloat(hipDoubleComplex z) { return make_hipFloatComplex((float)z.x, (float)z.y); } __HOST_DEVICE__ static inline hipDoubleComplex hipComplexFloatToDouble(hipFloatComplex z) { return make_hipDoubleComplex((double)z.x, (double)z.y); } __HOST_DEVICE__ static inline hipComplex hipCfmaf(hipComplex p, hipComplex q, hipComplex r) { float real = (p.x * q.x) + r.x; float imag = (q.x * p.y) + r.y; real = -(p.y * q.y) + real; imag = (p.x * q.y) + imag; return make_hipComplex(real, imag); } __HOST_DEVICE__ static inline hipDoubleComplex hipCfma(hipDoubleComplex p, hipDoubleComplex q, hipDoubleComplex r) { double real = (p.x * q.x) + r.x; double imag = (q.x * p.y) + r.y; real = -(p.y * q.y) + real; imag = (p.x * q.y) + imag; return make_hipDoubleComplex(real, imag); } #endif //HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COMPLEX_H clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_hip_cooperative_groups.h000066400000000000000000000773041450307266000265610ustar00rootroot00000000000000/* Copyright (c) 2015 - 2023 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ /** * @file amd_detail/hip_cooperative_groups.h * * @brief Device side implementation of `Cooperative Group` feature. * * Defines new types and device API wrappers related to `Cooperative Group` * feature, which the programmer can directly use in his kernel(s) in order to * make use of this feature. 
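 *
 * A minimal usage sketch (illustrative only; assumes the this_thread_block()
 * API provided by this header):
 * \code
 * __global__ void kernel(float* data) {
 *   cooperative_groups::thread_block block = cooperative_groups::this_thread_block();
 *   // ... write per-thread partial results ...
 *   block.sync();  // make them visible to the whole block
 * }
 * \endcode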
*/
#ifndef HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COOPERATIVE_GROUPS_H
#define HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COOPERATIVE_GROUPS_H

#if defined(__clang__)
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wc++98-compat"
#pragma clang diagnostic ignored "-Wsign-conversion"
#pragma clang diagnostic ignored "-Wunused-parameter"
#pragma clang diagnostic ignored "-Wreserved-macro-identifier"
#pragma clang diagnostic ignored "-Wpadded"
#endif

#if __cplusplus
#if !defined(__HIPCC_RTC__)
#include
#endif

#define __hip_abort() \
  { asm("trap;"); }
#if defined(NDEBUG)
#define __hip_assert(COND)
#else
#define __hip_assert(COND) \
  {                        \
    if (!COND) {           \
      __hip_abort();       \
    }                      \
  }
#endif

namespace cooperative_groups {

/** @brief The base type of all cooperative group types.
 *
 * \details Holds the key properties of a constructed cooperative group
 * object, such as the group type and its size.
 *
 * @note The cooperative groups feature is implemented on Linux and under
 * development on Windows.
 */
class thread_group {
 protected:
  uint32_t _type;  // thread_group type
  uint32_t _size;  // total number of threads in the thread_group
  uint64_t _mask;  // Lane mask for coalesced and tiled partitioned group types,
                   // LSB represents lane 0, and MSB represents lane 63

  // Construct a thread group, and set the thread group type and other essential
  // thread group properties. This generic thread group is directly constructed
  // only when the group is supposed to contain only the calling thread
  // (through the API `this_thread()`); in all other cases, this thread
  // group object is a sub-object of some other derived thread group object.
  __CG_QUALIFIER__ thread_group(internal::group_type type,
                                uint32_t size = static_cast<uint32_t>(0),
                                uint64_t mask = static_cast<uint64_t>(0)) {
    _type = type;
    _size = size;
    _mask = mask;
  }

  struct _tiled_info {
    bool is_tiled;
    unsigned int size;
    unsigned int meta_group_rank;
    unsigned int meta_group_size;
  };

  struct _coalesced_info {
    lane_mask member_mask;
    unsigned int size;
    struct _tiled_info tiled_info;
  } coalesced_info;

  friend __CG_QUALIFIER__ thread_group tiled_partition(const thread_group& parent,
                                                       unsigned int tile_size);
  friend class thread_block;

 public:
  // Total number of threads in the thread group; this serves the purpose
  // for all derived cooperative group types, since their `size` is directly
  // saved during construction.
  __CG_QUALIFIER__ uint32_t size() const { return _size; }
  __CG_QUALIFIER__ unsigned int cg_type() const { return _type; }
  // Rank of the calling thread within [0, size())
  __CG_QUALIFIER__ uint32_t thread_rank() const;
  // Is this cooperative group type valid?
  __CG_QUALIFIER__ bool is_valid() const;
  // Synchronize the threads in the thread group.
  __CG_QUALIFIER__ void sync() const;
};

/**
 *-------------------------------------------------------------------------------------------------
 *-------------------------------------------------------------------------------------------------
 *  @defgroup CooperativeG Cooperative Groups
 *  @ingroup API
 *  @{
 *  This section describes the cooperative groups functions of the HIP runtime API.
 *
 *  Cooperative groups provide flexible thread-parallel programming: threads
 *  cooperate and share data to perform collective computations.
 *
 *  @note The cooperative groups feature is implemented on Linux and under
 *  development on Windows.
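 *
 *  A minimal usage sketch (the kernel below is hypothetical, not part of this
 *  header):
 *  @code
 *  #include <hip/hip_cooperative_groups.h>
 *  namespace cg = cooperative_groups;
 *  __global__ void block_sum(float* data) {
 *    cg::thread_block block = cg::this_thread_block();
 *    unsigned int rank = block.thread_rank();  // rank within the workgroup
 *    // ... accumulate partial results, e.g. into shared memory ...
 *    block.sync();                             // barrier across the workgroup
 *  }
 *  @endcode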
 */

/** \brief The multi-grid cooperative group type.
 *
 * \details Represents an inter-device cooperative group where the
 * participating threads within the group span multiple devices,
 * running the (same) kernel on those devices.
 * @note The multi-grid cooperative group type is implemented on Linux and
 * under development on Windows.
 */
class multi_grid_group : public thread_group {
  // Only these friend functions are allowed to construct an object of this class
  // and access its resources.
  friend __CG_QUALIFIER__ multi_grid_group this_multi_grid();

 protected:
  // Construct a multi-grid thread group (through the API this_multi_grid())
  explicit __CG_QUALIFIER__ multi_grid_group(uint32_t size)
      : thread_group(internal::cg_multi_grid, size) {}

 public:
  // Number of invocations participating in this multi-grid group. In other
  // words, the number of GPUs.
  __CG_QUALIFIER__ uint32_t num_grids() { return internal::multi_grid::num_grids(); }
  // Rank of this invocation. In other words, an ID number within the range
  // [0, num_grids()) of the GPU this kernel is running on.
  __CG_QUALIFIER__ uint32_t grid_rank() { return internal::multi_grid::grid_rank(); }
  __CG_QUALIFIER__ uint32_t thread_rank() const { return internal::multi_grid::thread_rank(); }
  __CG_QUALIFIER__ bool is_valid() const { return internal::multi_grid::is_valid(); }
  __CG_QUALIFIER__ void sync() const { internal::multi_grid::sync(); }
};

/** @brief User exposed API interface to construct a multi-grid cooperative
 *  group type object - `multi_grid_group`.
 *
 * \details Users are not allowed to construct an object of type
 * `multi_grid_group` directly; it must be constructed through this
 * API function.
 * @note This multi-grid cooperative API type is implemented on Linux and
 * under development on Windows.
 */
__CG_QUALIFIER__ multi_grid_group this_multi_grid() {
  return multi_grid_group(internal::multi_grid::size());
}

/** @brief The grid cooperative group type.
 *
 * \details Represents an inter-workgroup cooperative group where the
 * participating threads within the group span multiple workgroups
 * running the (same) kernel on the same device.
 * @note This is implemented on Linux and under development on Windows.
 */
class grid_group : public thread_group {
  // Only these friend functions are allowed to construct an object of this class
  // and access its resources.
  friend __CG_QUALIFIER__ grid_group this_grid();

 protected:
  // Construct grid thread group (through the API this_grid())
  explicit __CG_QUALIFIER__ grid_group(uint32_t size) : thread_group(internal::cg_grid, size) {}

 public:
  __CG_QUALIFIER__ uint32_t thread_rank() const { return internal::grid::thread_rank(); }
  __CG_QUALIFIER__ bool is_valid() const { return internal::grid::is_valid(); }
  __CG_QUALIFIER__ void sync() const { internal::grid::sync(); }
};

/** @brief User exposed API interface to construct a grid cooperative group
 *  type object - `grid_group`.
 *
 * \details Users are not allowed to construct an object of type
 * `grid_group` directly; it must be constructed through this
 * API function.
 * @note This function is implemented on Linux and under development on Windows.
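 *
 *  A minimal usage sketch (hypothetical kernel; a grid_group is only usable
 *  for grid-wide synchronization when the kernel is launched cooperatively,
 *  e.g. via hipLaunchCooperativeKernel):
 *  @code
 *  __global__ void two_phase(float* buf) {
 *    cooperative_groups::grid_group grid = cooperative_groups::this_grid();
 *    // ... phase 1 writes buf ...
 *    grid.sync();  // every workgroup in the grid reaches this point first
 *    // ... phase 2 may safely read what phase 1 wrote ...
 *  }
 *  @endcode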
 */
__CG_QUALIFIER__ grid_group this_grid() { return grid_group(internal::grid::size()); }

/** @brief The workgroup (thread-block in CUDA terminology) cooperative group
 *  type.
 *
 * \details Represents an intra-workgroup cooperative group where the
 * participating threads within the group are exactly the threads
 * that are participating in the currently executing `workgroup`.
 * @note This is implemented on Linux and under development on Windows.
 */
class thread_block : public thread_group {
  // Only these friend functions are allowed to construct an object of this
  // class and access its resources.
  friend __CG_QUALIFIER__ thread_block this_thread_block();
  friend __CG_QUALIFIER__ thread_group tiled_partition(const thread_group& parent,
                                                       unsigned int tile_size);
  friend __CG_QUALIFIER__ thread_group tiled_partition(const thread_block& parent,
                                                       unsigned int tile_size);

 protected:
  // Construct a workgroup thread group (through the API this_thread_block())
  explicit __CG_QUALIFIER__ thread_block(uint32_t size)
      : thread_group(internal::cg_workgroup, size) {}

  __CG_QUALIFIER__ thread_group new_tiled_group(unsigned int tile_size) const {
    const bool pow2 = ((tile_size & (tile_size - 1)) == 0);
    // Invalid tile size, assert
    if (!tile_size || (tile_size > __AMDGCN_WAVEFRONT_SIZE) || !pow2) {
      __hip_assert(false && "invalid tile size");
    }

    thread_group tiledGroup = thread_group(internal::cg_tiled_group, tile_size);
    tiledGroup.coalesced_info.tiled_info.size = tile_size;
    tiledGroup.coalesced_info.tiled_info.is_tiled = true;
    tiledGroup.coalesced_info.tiled_info.meta_group_rank = thread_rank() / tile_size;
    tiledGroup.coalesced_info.tiled_info.meta_group_size = (size() + tile_size - 1) / tile_size;
    return tiledGroup;
  }

 public:
  // 3-dimensional block index within the grid
  __CG_STATIC_QUALIFIER__ dim3 group_index() { return internal::workgroup::group_index(); }
  // 3-dimensional thread index within the block
  __CG_STATIC_QUALIFIER__ dim3 thread_index() { return internal::workgroup::thread_index(); }
  __CG_STATIC_QUALIFIER__ uint32_t thread_rank() { return internal::workgroup::thread_rank(); }
  __CG_STATIC_QUALIFIER__ uint32_t size() { return internal::workgroup::size(); }
  __CG_STATIC_QUALIFIER__ bool is_valid() { return internal::workgroup::is_valid(); }
  __CG_STATIC_QUALIFIER__ void sync() { internal::workgroup::sync(); }
  __CG_QUALIFIER__ dim3 group_dim() { return internal::workgroup::block_dim(); }
};

/** \brief User exposed API interface to construct a workgroup cooperative
 *  group type object - `thread_block`.
 *
 * \details Users are not allowed to construct an object of type
 * `thread_block` directly; it must be constructed through this API
 * function.
 * @note This function is implemented on Linux and under development on Windows.
 */
__CG_QUALIFIER__ thread_block this_thread_block() {
  return thread_block(internal::workgroup::size());
}

/** \brief The tiled_group cooperative group type.
 *
 * \details Represents one tiled thread group in a wavefront.
 * This group type also supports sub-wave level intrinsics.
 * @note This is implemented on Linux and under development on Windows.
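 *
 *  A minimal usage sketch (hypothetical values; tile sizes must be powers of
 *  two no larger than the wavefront size):
 *  @code
 *  cooperative_groups::thread_block block = cooperative_groups::this_thread_block();
 *  cooperative_groups::thread_group tile8 = cooperative_groups::tiled_partition(block, 8);
 *  unsigned int lane = tile8.thread_rank();  // rank within the 8-wide tile
 *  tile8.sync();                             // synchronize the tile only
 *  @endcode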
 */
class tiled_group : public thread_group {
 private:
  friend __CG_QUALIFIER__ thread_group tiled_partition(const thread_group& parent,
                                                       unsigned int tile_size);
  friend __CG_QUALIFIER__ tiled_group tiled_partition(const tiled_group& parent,
                                                      unsigned int tile_size);

  __CG_QUALIFIER__ tiled_group new_tiled_group(unsigned int tile_size) const {
    const bool pow2 = ((tile_size & (tile_size - 1)) == 0);

    if (!tile_size || (tile_size > __AMDGCN_WAVEFRONT_SIZE) || !pow2) {
      __hip_assert(false && "invalid tile size");
    }

    if (size() <= tile_size) {
      return *this;
    }

    tiled_group tiledGroup = tiled_group(tile_size);
    tiledGroup.coalesced_info.tiled_info.is_tiled = true;
    return tiledGroup;
  }

 protected:
  explicit __CG_QUALIFIER__ tiled_group(unsigned int tileSize)
      : thread_group(internal::cg_tiled_group, tileSize) {
    coalesced_info.tiled_info.size = tileSize;
    coalesced_info.tiled_info.is_tiled = true;
  }

 public:
  __CG_QUALIFIER__ unsigned int size() const { return (coalesced_info.tiled_info.size); }

  __CG_QUALIFIER__ unsigned int thread_rank() const {
    return (internal::workgroup::thread_rank() & (coalesced_info.tiled_info.size - 1));
  }

  __CG_QUALIFIER__ void sync() const { internal::tiled_group::sync(); }
};

/** \brief The coalesced_group cooperative group type.
 *
 * \details Represents an active thread group in a wavefront.
 * This group type also supports sub-wave level intrinsics.
 * @note This is implemented on Linux and under development on Windows.
 */
class coalesced_group : public thread_group {
 private:
  friend __CG_QUALIFIER__ coalesced_group coalesced_threads();
  friend __CG_QUALIFIER__ thread_group tiled_partition(const thread_group& parent,
                                                       unsigned int tile_size);
  friend __CG_QUALIFIER__ coalesced_group tiled_partition(const coalesced_group& parent,
                                                          unsigned int tile_size);

  __CG_QUALIFIER__ coalesced_group new_tiled_group(unsigned int tile_size) const {
    const bool pow2 = ((tile_size & (tile_size - 1)) == 0);

    if (!tile_size || (tile_size > size()) || !pow2) {
      return coalesced_group(0);
    }

    // If a tiled group is passed to be partitioned further into a coalesced_group,
    // prepare a mask for the further partitioning so that it stays coalesced.
    if (coalesced_info.tiled_info.is_tiled) {
      unsigned int base_offset = (thread_rank() & (~(tile_size - 1)));
      unsigned int masklength = min(static_cast<unsigned int>(size()) - base_offset, tile_size);
      lane_mask member_mask = static_cast<lane_mask>(-1) >> (__AMDGCN_WAVEFRONT_SIZE - masklength);

      member_mask <<= (__lane_id() & ~(tile_size - 1));
      coalesced_group coalesced_tile = coalesced_group(member_mask);
      coalesced_tile.coalesced_info.tiled_info.is_tiled = true;
      coalesced_tile.coalesced_info.tiled_info.meta_group_rank = thread_rank() / tile_size;
      coalesced_tile.coalesced_info.tiled_info.meta_group_size = size() / tile_size;
      return coalesced_tile;
    }
    // Here the parent coalesced_group is not partitioned.
else { lane_mask member_mask = 0; unsigned int tile_rank = 0; int lanes_to_skip = ((thread_rank()) / tile_size) * tile_size; for (unsigned int i = 0; i < __AMDGCN_WAVEFRONT_SIZE; i++) { lane_mask active = coalesced_info.member_mask & (1 << i); // Make sure the lane is active if (active) { if (lanes_to_skip <= 0 && tile_rank < tile_size) { // Prepare a member_mask that is appropriate for a tile member_mask |= active; tile_rank++; } lanes_to_skip--; } } coalesced_group coalesced_tile = coalesced_group(member_mask); coalesced_tile.coalesced_info.tiled_info.meta_group_rank = thread_rank() / tile_size; coalesced_tile.coalesced_info.tiled_info.meta_group_size = (size() + tile_size - 1) / tile_size; return coalesced_tile; } return coalesced_group(0); } protected: // Constructor explicit __CG_QUALIFIER__ coalesced_group(lane_mask member_mask) : thread_group(internal::cg_coalesced_group) { coalesced_info.member_mask = member_mask; // Which threads are active coalesced_info.size = __popcll(coalesced_info.member_mask); // How many threads are active coalesced_info.tiled_info.is_tiled = false; // Not a partitioned group } public: __CG_QUALIFIER__ unsigned int size() const { return coalesced_info.size; } __CG_QUALIFIER__ unsigned int thread_rank() const { return internal::coalesced_group::masked_bit_count(coalesced_info.member_mask); } __CG_QUALIFIER__ void sync() const { internal::coalesced_group::sync(); } __CG_QUALIFIER__ unsigned int meta_group_rank() const { return coalesced_info.tiled_info.meta_group_rank; } __CG_QUALIFIER__ unsigned int meta_group_size() const { return coalesced_info.tiled_info.meta_group_size; } template __CG_QUALIFIER__ T shfl(T var, int srcRank) const { static_assert(is_valid_type::value, "Neither an integer or float type."); srcRank = srcRank % static_cast(size()); int lane = (size() == __AMDGCN_WAVEFRONT_SIZE) ? srcRank : (__AMDGCN_WAVEFRONT_SIZE == 64) ? __fns64(coalesced_info.member_mask, 0, (srcRank + 1)) : __fns32(coalesced_info.member_mask, 0, (srcRank + 1)); return __shfl(var, lane, __AMDGCN_WAVEFRONT_SIZE); } template __CG_QUALIFIER__ T shfl_down(T var, unsigned int lane_delta) const { static_assert(is_valid_type::value, "Neither an integer or float type."); // Note: The cuda implementation appears to use the remainder of lane_delta // and WARP_SIZE as the shift value rather than lane_delta itself. // This is not described in the documentation and is not done here. if (size() == __AMDGCN_WAVEFRONT_SIZE) { return __shfl_down(var, lane_delta, __AMDGCN_WAVEFRONT_SIZE); } int lane; if (__AMDGCN_WAVEFRONT_SIZE == 64) { lane = __fns64(coalesced_info.member_mask, __lane_id(), lane_delta + 1); } else { lane = __fns32(coalesced_info.member_mask, __lane_id(), lane_delta + 1); } if (lane == -1) { lane = __lane_id(); } return __shfl(var, lane, __AMDGCN_WAVEFRONT_SIZE); } template __CG_QUALIFIER__ T shfl_up(T var, unsigned int lane_delta) const { static_assert(is_valid_type::value, "Neither an integer or float type."); // Note: The cuda implementation appears to use the remainder of lane_delta // and WARP_SIZE as the shift value rather than lane_delta itself. // This is not described in the documentation and is not done here. 
if (size() == __AMDGCN_WAVEFRONT_SIZE) {
      return __shfl_up(var, lane_delta, __AMDGCN_WAVEFRONT_SIZE);
    }

    int lane;
    if (__AMDGCN_WAVEFRONT_SIZE == 64) {
      lane = __fns64(coalesced_info.member_mask, __lane_id(), -(lane_delta + 1));
    } else if (__AMDGCN_WAVEFRONT_SIZE == 32) {
      lane = __fns32(coalesced_info.member_mask, __lane_id(), -(lane_delta + 1));
    }

    if (lane == -1) {
      lane = __lane_id();
    }

    return __shfl(var, lane, __AMDGCN_WAVEFRONT_SIZE);
  }
};

/** \brief User exposed API to create coalesced groups.
 *
 * \details A collective operation that groups all active lanes into a new thread group.
 * @note This function is implemented on Linux and under development on Windows.
 */
__CG_QUALIFIER__ coalesced_group coalesced_threads() {
  return cooperative_groups::coalesced_group(__builtin_amdgcn_read_exec());
}

/**
 *  Implementation of all publicly exposed base class APIs
 *  @note This function is implemented on Linux and under development on Windows.
 */
__CG_QUALIFIER__ uint32_t thread_group::thread_rank() const {
  switch (this->_type) {
    case internal::cg_multi_grid: {
      return (static_cast<const multi_grid_group*>(this)->thread_rank());
    }
    case internal::cg_grid: {
      return (static_cast<const grid_group*>(this)->thread_rank());
    }
    case internal::cg_workgroup: {
      return (static_cast<const thread_block*>(this)->thread_rank());
    }
    case internal::cg_tiled_group: {
      return (static_cast<const tiled_group*>(this)->thread_rank());
    }
    case internal::cg_coalesced_group: {
      return (static_cast<const coalesced_group*>(this)->thread_rank());
    }
    default: {
      __hip_assert(false && "invalid cooperative group type");
      return -1;
    }
  }
}

/**
 *  Implementation of all publicly exposed thread group API
 *  @note This function is implemented on Linux and under development on Windows.
 */
__CG_QUALIFIER__ bool thread_group::is_valid() const {
  switch (this->_type) {
    case internal::cg_multi_grid: {
      return (static_cast<const multi_grid_group*>(this)->is_valid());
    }
    case internal::cg_grid: {
      return (static_cast<const grid_group*>(this)->is_valid());
    }
    case internal::cg_workgroup: {
      return (static_cast<const thread_block*>(this)->is_valid());
    }
    case internal::cg_tiled_group: {
      return (static_cast<const tiled_group*>(this)->is_valid());
    }
    case internal::cg_coalesced_group: {
      return (static_cast<const coalesced_group*>(this)->is_valid());
    }
    default: {
      __hip_assert(false && "invalid cooperative group type");
      return false;
    }
  }
}

/**
 *  Implementation of all publicly exposed thread group sync API
 *  @note This function is implemented on Linux and under development on Windows.
 */
__CG_QUALIFIER__ void thread_group::sync() const {
  switch (this->_type) {
    case internal::cg_multi_grid: {
      static_cast<const multi_grid_group*>(this)->sync();
      break;
    }
    case internal::cg_grid: {
      static_cast<const grid_group*>(this)->sync();
      break;
    }
    case internal::cg_workgroup: {
      static_cast<const thread_block*>(this)->sync();
      break;
    }
    case internal::cg_tiled_group: {
      static_cast<const tiled_group*>(this)->sync();
      break;
    }
    case internal::cg_coalesced_group: {
      static_cast<const coalesced_group*>(this)->sync();
      break;
    }
    default: {
      __hip_assert(false && "invalid cooperative group type");
    }
  }
}

/**
 *  Implementation of publicly exposed `wrapper` API on top of basic cooperative
 *  group type APIs
 *  @note This function is implemented on Linux and under development on Windows.
 */
template <class CGTy> __CG_QUALIFIER__ uint32_t group_size(CGTy const& g) { return g.size(); }

/**
 *  Implementation of publicly exposed `wrapper` API on top of basic cooperative
 *  group type APIs
 *  @note This function is implemented on Linux and under development on Windows.
 */
template <class CGTy> __CG_QUALIFIER__ uint32_t thread_rank(CGTy const& g) {
  return g.thread_rank();
}

/**
 *  Implementation of publicly exposed `wrapper` API on top of basic cooperative
 *  group type APIs
 *  @note This function is implemented on Linux and under development on Windows.
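 *
 *  A minimal usage sketch of these free-function wrappers (the helper below is
 *  hypothetical, not part of this header):
 *  @code
 *  template <class Group>
 *  __device__ void collective_step(const Group& g) {
 *    if (cooperative_groups::is_valid(g)) {
 *      uint32_t n = cooperative_groups::group_size(g);   // same as g.size()
 *      uint32_t r = cooperative_groups::thread_rank(g);  // same as g.thread_rank()
 *      // ... work partitioned by r out of n ...
 *      cooperative_groups::sync(g);
 *    }
 *  }
 *  @endcode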
 */
template <class CGTy> __CG_QUALIFIER__ bool is_valid(CGTy const& g) { return g.is_valid(); }

/**
 *  Implementation of publicly exposed `wrapper` API on top of basic cooperative
 *  group type APIs
 *  @note This function is implemented on Linux and under development on Windows.
 */
template <class CGTy> __CG_QUALIFIER__ void sync(CGTy const& g) { g.sync(); }

/**
 *  template class tile_base
 *  @note This class is implemented on Linux and under development on Windows.
 */
template <unsigned int tileSize> class tile_base {
 protected:
  _CG_STATIC_CONST_DECL_ unsigned int numThreads = tileSize;

 public:
  // Rank of the thread within this tile
  _CG_STATIC_CONST_DECL_ unsigned int thread_rank() {
    return (internal::workgroup::thread_rank() & (numThreads - 1));
  }

  // Number of threads within this tile
  __CG_STATIC_QUALIFIER__ unsigned int size() { return numThreads; }
};

/**
 *  template class thread_block_tile_base
 *  @note This class is implemented on Linux and under development on Windows.
 */
template <unsigned int tileSize> class thread_block_tile_base : public tile_base<tileSize> {
  static_assert(is_valid_tile_size<tileSize>::value,
                "Tile size is either not a power of 2 or greater than the wavefront size");
  using tile_base<tileSize>::numThreads;

 public:
  __CG_STATIC_QUALIFIER__ void sync() { internal::tiled_group::sync(); }

  template <class T> __CG_QUALIFIER__ T shfl(T var, int srcRank) const {
    static_assert(is_valid_type<T>::value, "Neither an integer or float type.");
    return (__shfl(var, srcRank, numThreads));
  }

  template <class T> __CG_QUALIFIER__ T shfl_down(T var, unsigned int lane_delta) const {
    static_assert(is_valid_type<T>::value, "Neither an integer or float type.");
    return (__shfl_down(var, lane_delta, numThreads));
  }

  template <class T> __CG_QUALIFIER__ T shfl_up(T var, unsigned int lane_delta) const {
    static_assert(is_valid_type<T>::value, "Neither an integer or float type.");
    return (__shfl_up(var, lane_delta, numThreads));
  }

  template <class T> __CG_QUALIFIER__ T shfl_xor(T var, unsigned int laneMask) const {
    static_assert(is_valid_type<T>::value, "Neither an integer or float type.");
    return (__shfl_xor(var, laneMask, numThreads));
  }
};

/** \brief User exposed API that captures the state of the parent group pre-partition
 */
template <unsigned int tileSize, class ParentCGTy> class parent_group_info {
 public:
  // Returns the linear rank of the group within the set of tiles partitioned
  // from a parent group (bounded by meta_group_size)
  __CG_STATIC_QUALIFIER__ unsigned int meta_group_rank() {
    return ParentCGTy::thread_rank() / tileSize;
  }

  // Returns the number of groups created when the parent group was partitioned.
  __CG_STATIC_QUALIFIER__ unsigned int meta_group_size() {
    return (ParentCGTy::size() + tileSize - 1) / tileSize;
  }
};

/** \brief Group type - thread_block_tile
 *
 * \details Represents one tile of a thread group.
 * @note This type is implemented on Linux and under development on Windows.
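 *
 *  A minimal usage sketch of a compile-time tile (hypothetical values):
 *  @code
 *  cooperative_groups::thread_block block = cooperative_groups::this_thread_block();
 *  auto tile4 = cooperative_groups::tiled_partition<4>(block);
 *  // Shift a value down the 4-wide tile: lane i receives lane i+1's value.
 *  unsigned int shifted = tile4.shfl_down(tile4.thread_rank(), 1);
 *  @endcode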
*/ template class thread_block_tile_type : public thread_block_tile_base, public tiled_group, public parent_group_info { _CG_STATIC_CONST_DECL_ unsigned int numThreads = tileSize; protected: __CG_QUALIFIER__ thread_block_tile_type() : tiled_group(numThreads) { coalesced_info.tiled_info.size = numThreads; coalesced_info.tiled_info.is_tiled = true; } }; // Partial template specialization template class thread_block_tile_type : public thread_block_tile_base, public tiled_group { _CG_STATIC_CONST_DECL_ unsigned int numThreads = tileSize; typedef thread_block_tile_base tbtBase; protected: __CG_QUALIFIER__ thread_block_tile_type(unsigned int meta_group_rank, unsigned int meta_group_size) : tiled_group(numThreads) { coalesced_info.tiled_info.size = numThreads; coalesced_info.tiled_info.is_tiled = true; coalesced_info.tiled_info.meta_group_rank = meta_group_rank; coalesced_info.tiled_info.meta_group_size = meta_group_size; } public: using tbtBase::size; using tbtBase::sync; using tbtBase::thread_rank; __CG_QUALIFIER__ unsigned int meta_group_rank() const { return coalesced_info.tiled_info.meta_group_rank; } __CG_QUALIFIER__ unsigned int meta_group_size() const { return coalesced_info.tiled_info.meta_group_size; } // end of operative group /** * @} */ }; /** \brief User exposed API to partition groups. * * \details A collective operation that partitions the parent group into a one-dimensional, * row-major, tiling of subgroups. */ __CG_QUALIFIER__ thread_group tiled_partition(const thread_group& parent, unsigned int tile_size) { if (parent.cg_type() == internal::cg_tiled_group) { const tiled_group* cg = static_cast(&parent); return cg->new_tiled_group(tile_size); } else if(parent.cg_type() == internal::cg_coalesced_group) { const coalesced_group* cg = static_cast(&parent); return cg->new_tiled_group(tile_size); } else { const thread_block* tb = static_cast(&parent); return tb->new_tiled_group(tile_size); } } // Thread block type overload __CG_QUALIFIER__ thread_group tiled_partition(const thread_block& parent, unsigned int tile_size) { return (parent.new_tiled_group(tile_size)); } __CG_QUALIFIER__ tiled_group tiled_partition(const tiled_group& parent, unsigned int tile_size) { return (parent.new_tiled_group(tile_size)); } // If a coalesced group is passed to be partitioned, it should remain coalesced __CG_QUALIFIER__ coalesced_group tiled_partition(const coalesced_group& parent, unsigned int tile_size) { return (parent.new_tiled_group(tile_size)); } template class thread_block_tile; namespace impl { template class thread_block_tile_internal; template class thread_block_tile_internal : public thread_block_tile_type { protected: template __CG_QUALIFIER__ thread_block_tile_internal( const thread_block_tile_internal& g) : thread_block_tile_type(g.meta_group_rank(), g.meta_group_size()) {} __CG_QUALIFIER__ thread_block_tile_internal(const thread_block& g) : thread_block_tile_type() {} }; } // namespace impl template class thread_block_tile : public impl::thread_block_tile_internal { protected: __CG_QUALIFIER__ thread_block_tile(const ParentCGTy& g) : impl::thread_block_tile_internal(g) {} public: __CG_QUALIFIER__ operator thread_block_tile() const { return thread_block_tile(*this); } }; template class thread_block_tile : public impl::thread_block_tile_internal { template friend class thread_block_tile; protected: public: template __CG_QUALIFIER__ thread_block_tile(const thread_block_tile& g) : impl::thread_block_tile_internal(g) {} }; template class thread_block_tile; namespace impl { template struct 
tiled_partition_internal;

template <unsigned int size>
struct tiled_partition_internal<size, thread_block> : public thread_block_tile<size, thread_block> {
  __CG_QUALIFIER__ tiled_partition_internal(const thread_block& g)
      : thread_block_tile<size, thread_block>(g) {}
};

}  // namespace impl

/** \brief User exposed API to partition groups.
 *
 * \details This constructs a templated class derived from thread_group.
 * The template parameter defines the tile size of the new thread group at compile time.
 */
template <unsigned int size, class ParentCGTy>
__CG_QUALIFIER__ thread_block_tile<size, ParentCGTy> tiled_partition(const ParentCGTy& g) {
  static_assert(is_valid_tile_size<size>::value,
                "Tiled partition with size > wavefront size. Currently not supported ");
  return impl::tiled_partition_internal<size, ParentCGTy>(g);
}

}  // namespace cooperative_groups

#if defined(__clang__)
#pragma clang diagnostic pop
#endif

#endif  // __cplusplus
#endif  // HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COOPERATIVE_GROUPS_H
clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_hip_fp16.h000066400000000000000000001623061450307266000234110ustar00rootroot00000000000000/*
Copyright (c) 2015 - 2023 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/ #pragma once #ifndef HIP_INCLUDE_HIP_AMD_DETAIL_HIP_FP16_H #define HIP_INCLUDE_HIP_AMD_DETAIL_HIP_FP16_H #if defined(__clang__) #pragma clang diagnostic push #pragma clang diagnostic ignored "-Wreserved-identifier" #pragma clang diagnostic ignored "-Wreserved-macro-identifier" #pragma clang diagnostic ignored "-Wc++98-compat" #pragma clang diagnostic ignored "-Wc++98-compat-pedantic" #pragma clang diagnostic ignored "-Wsign-conversion" #pragma clang diagnostic ignored "-Wfloat-conversion" #pragma clang diagnostic ignored "-Wdouble-promotion" #pragma clang diagnostic ignored "-Wnested-anon-types" #pragma clang diagnostic ignored "-Wgnu-anonymous-struct" #pragma clang diagnostic ignored "-Wfloat-equal" #endif #if defined(__HIPCC_RTC__) #define __HOST_DEVICE__ __device__ #else #define __HOST_DEVICE__ __host__ __device__ #include #include "hip/amd_detail/host_defines.h" #include #if defined(__cplusplus) #include #include #include #endif #endif // !defined(__HIPCC_RTC__) #if defined(__clang__) && defined(__HIP__) typedef _Float16 _Float16_2 __attribute__((ext_vector_type(2))); struct __half_raw { union { static_assert(sizeof(_Float16) == sizeof(unsigned short), ""); _Float16 data; unsigned short x; }; }; struct __half2_raw { union { static_assert(sizeof(_Float16_2) == sizeof(unsigned short[2]), ""); struct { unsigned short x; unsigned short y; }; _Float16_2 data; }; }; #if defined(__cplusplus) #if !defined(__HIPCC_RTC__) #include "hip_fp16_math_fwd.h" #include "amd_hip_vector_types.h" #include "host_defines.h" #include "amd_device_functions.h" #include "amd_warp_functions.h" #endif namespace std { template<> struct is_floating_point<_Float16> : std::true_type {}; } template using Enable_if_t = typename std::enable_if::type; // BEGIN STRUCT __HALF struct __half { protected: union { static_assert(sizeof(_Float16) == sizeof(unsigned short), ""); _Float16 data; unsigned short __x; }; public: // CREATORS __HOST_DEVICE__ __half() = default; __HOST_DEVICE__ __half(const __half_raw& x) : data{x.data} {} #if !defined(__HIP_NO_HALF_CONVERSIONS__) __HOST_DEVICE__ __half(decltype(data) x) : data{x} {} template< typename T, Enable_if_t{}>* = nullptr> __HOST_DEVICE__ __half(T x) : data{static_cast<_Float16>(x)} {} #endif __HOST_DEVICE__ __half(const __half&) = default; __HOST_DEVICE__ __half(__half&&) = default; __HOST_DEVICE__ ~__half() = default; // CREATORS - DEVICE ONLY #if !defined(__HIP_NO_HALF_CONVERSIONS__) template< typename T, Enable_if_t{}>* = nullptr> __HOST_DEVICE__ __half(T x) : data{static_cast<_Float16>(x)} {} #endif // MANIPULATORS __HOST_DEVICE__ __half& operator=(const __half&) = default; __HOST_DEVICE__ __half& operator=(__half&&) = default; __HOST_DEVICE__ __half& operator=(const __half_raw& x) { data = x.data; return *this; } __HOST_DEVICE__ volatile __half& operator=(const __half_raw& x) volatile { data = x.data; return *this; } volatile __half& operator=(const volatile __half_raw& x) volatile { data = x.data; return *this; } __half& operator=(__half_raw&& x) { data = x.data; return *this; } volatile __half& operator=(__half_raw&& x) volatile { data = x.data; return *this; } volatile __half& operator=(volatile __half_raw&& x) volatile { data = x.data; return *this; } #if !defined(__HIP_NO_HALF_CONVERSIONS__) template< typename T, Enable_if_t{}>* = nullptr> __HOST_DEVICE__ __half& operator=(T x) { data = static_cast<_Float16>(x); return *this; } #endif // MANIPULATORS - DEVICE ONLY #if !defined(__HIP_NO_HALF_CONVERSIONS__) template< typename T, Enable_if_t{}>* = nullptr> __device__ 
__half& operator=(T x) { data = static_cast<_Float16>(x); return *this; } #endif #if !defined(__HIP_NO_HALF_OPERATORS__) __device__ __half& operator+=(const __half& x) { data += x.data; return *this; } __device__ __half& operator-=(const __half& x) { data -= x.data; return *this; } __device__ __half& operator*=(const __half& x) { data *= x.data; return *this; } __device__ __half& operator/=(const __half& x) { data /= x.data; return *this; } __device__ __half& operator++() { ++data; return *this; } __device__ __half operator++(int) { __half tmp{*this}; ++*this; return tmp; } __device__ __half& operator--() { --data; return *this; } __device__ __half operator--(int) { __half tmp{*this}; --*this; return tmp; } #endif // ACCESSORS #if !defined(__HIP_NO_HALF_CONVERSIONS__) template< typename T, Enable_if_t{}>* = nullptr> __HOST_DEVICE__ operator T() const { return data; } #endif __HOST_DEVICE__ operator __half_raw() const { return __half_raw{data}; } __HOST_DEVICE__ operator __half_raw() const volatile { return __half_raw{data}; } #if !defined(__HIP_NO_HALF_CONVERSIONS__) template< typename T, Enable_if_t{}>* = nullptr> __HOST_DEVICE__ operator T() const { return data; } #endif #if !defined(__HIP_NO_HALF_OPERATORS__) __device__ __half operator+() const { return *this; } __device__ __half operator-() const { __half tmp{*this}; tmp.data = -tmp.data; return tmp; } #endif // FRIENDS #if !defined(__HIP_NO_HALF_OPERATORS__) friend inline __device__ __half operator+(const __half& x, const __half& y) { return __half{x} += y; } friend inline __device__ __half operator-(const __half& x, const __half& y) { return __half{x} -= y; } friend inline __device__ __half operator*(const __half& x, const __half& y) { return __half{x} *= y; } friend inline __device__ __half operator/(const __half& x, const __half& y) { return __half{x} /= y; } friend inline __device__ bool operator==(const __half& x, const __half& y) { return x.data == y.data; } friend inline __device__ bool operator!=(const __half& x, const __half& y) { return !(x == y); } friend inline __device__ bool operator<(const __half& x, const __half& y) { return x.data < y.data; } friend inline __device__ bool operator>(const __half& x, const __half& y) { return y.data < x.data; } friend inline __device__ bool operator<=(const __half& x, const __half& y) { return !(y < x); } friend inline __device__ bool operator>=(const __half& x, const __half& y) { return !(x < y); } #endif // !defined(__HIP_NO_HALF_OPERATORS__) }; // END STRUCT __HALF // BEGIN STRUCT __HALF2 struct __half2 { public: union { static_assert( sizeof(_Float16_2) == sizeof(unsigned short[2]), ""); struct { unsigned short x; unsigned short y; }; _Float16_2 data; }; // CREATORS __HOST_DEVICE__ __half2() = default; __HOST_DEVICE__ __half2(const __half2_raw& x) : data{x.data} {} __HOST_DEVICE__ __half2(decltype(data) x) : data{x} {} __HOST_DEVICE__ __half2(const __half& x, const __half& y) : data{ static_cast<__half_raw>(x).data, static_cast<__half_raw>(y).data} {} __HOST_DEVICE__ __half2(const __half2&) = default; __HOST_DEVICE__ __half2(__half2&&) = default; __HOST_DEVICE__ ~__half2() = default; // MANIPULATORS __HOST_DEVICE__ __half2& operator=(const __half2&) = default; __HOST_DEVICE__ __half2& operator=(__half2&&) = default; __HOST_DEVICE__ __half2& operator=(const __half2_raw& x) { data = x.data; return *this; } // MANIPULATORS - DEVICE ONLY #if !defined(__HIP_NO_HALF_OPERATORS__) __device__ __half2& operator+=(const __half2& x) { data += x.data; return *this; } __device__ __half2& 
operator-=(const __half2& x) { data -= x.data; return *this; } __device__ __half2& operator*=(const __half2& x) { data *= x.data; return *this; } __device__ __half2& operator/=(const __half2& x) { data /= x.data; return *this; } __device__ __half2& operator++() { return *this += _Float16_2{1, 1}; } __device__ __half2 operator++(int) { __half2 tmp{*this}; ++*this; return tmp; } __device__ __half2& operator--() { return *this -= _Float16_2{1, 1}; } __device__ __half2 operator--(int) { __half2 tmp{*this}; --*this; return tmp; } #endif // ACCESSORS __HOST_DEVICE__ operator decltype(data)() const { return data; } __HOST_DEVICE__ operator __half2_raw() const { __half2_raw r; r.data = data; return r; } // ACCESSORS - DEVICE ONLY #if !defined(__HIP_NO_HALF_OPERATORS__) __device__ __half2 operator+() const { return *this; } __device__ __half2 operator-() const { __half2 tmp{*this}; tmp.data = -tmp.data; return tmp; } #endif // FRIENDS #if !defined(__HIP_NO_HALF_OPERATORS__) friend inline __device__ __half2 operator+(const __half2& x, const __half2& y) { return __half2{x} += y; } friend inline __device__ __half2 operator-(const __half2& x, const __half2& y) { return __half2{x} -= y; } friend inline __device__ __half2 operator*(const __half2& x, const __half2& y) { return __half2{x} *= y; } friend inline __device__ __half2 operator/(const __half2& x, const __half2& y) { return __half2{x} /= y; } friend inline __device__ bool operator==(const __half2& x, const __half2& y) { auto r = x.data == y.data; return r.x != 0 && r.y != 0; } friend inline __device__ bool operator!=(const __half2& x, const __half2& y) { return !(x == y); } friend inline __device__ bool operator<(const __half2& x, const __half2& y) { auto r = x.data < y.data; return r.x != 0 && r.y != 0; } friend inline __device__ bool operator>(const __half2& x, const __half2& y) { return y < x; } friend inline __device__ bool operator<=(const __half2& x, const __half2& y) { return !(y < x); } friend inline __device__ bool operator>=(const __half2& x, const __half2& y) { return !(x < y); } #endif // !defined(__HIP_NO_HALF_OPERATORS__) }; // END STRUCT __HALF2 namespace { inline __HOST_DEVICE__ __half2 make_half2(__half x, __half y) { return __half2{x, y}; } inline __HOST_DEVICE__ __half __low2half(__half2 x) { return __half{__half_raw{static_cast<__half2_raw>(x).data.x}}; } inline __HOST_DEVICE__ __half __high2half(__half2 x) { return __half{__half_raw{static_cast<__half2_raw>(x).data.y}}; } inline __HOST_DEVICE__ __half2 __half2half2(__half x) { return __half2{x, x}; } inline __HOST_DEVICE__ __half2 __halves2half2(__half x, __half y) { return __half2{x, y}; } inline __HOST_DEVICE__ __half2 __low2half2(__half2 x) { return __half2{ _Float16_2{ static_cast<__half2_raw>(x).data.x, static_cast<__half2_raw>(x).data.x}}; } inline __HOST_DEVICE__ __half2 __high2half2(__half2 x) { return __half2{ _Float16_2{ static_cast<__half2_raw>(x).data.y, static_cast<__half2_raw>(x).data.y}}; } inline __HOST_DEVICE__ __half2 __lows2half2(__half2 x, __half2 y) { return __half2{ _Float16_2{ static_cast<__half2_raw>(x).data.x, static_cast<__half2_raw>(y).data.x}}; } inline __HOST_DEVICE__ __half2 __highs2half2(__half2 x, __half2 y) { return __half2{ _Float16_2{ static_cast<__half2_raw>(x).data.y, static_cast<__half2_raw>(y).data.y}}; } inline __HOST_DEVICE__ __half2 __lowhigh2highlow(__half2 x) { return __half2{ _Float16_2{ static_cast<__half2_raw>(x).data.y, static_cast<__half2_raw>(x).data.x}}; } // Bitcasts inline __device__ short __half_as_short(__half x) { return 
static_cast<__half_raw>(x).x; } inline __device__ unsigned short __half_as_ushort(__half x) { return static_cast<__half_raw>(x).x; } inline __device__ __half __short_as_half(short x) { __half_raw r; r.x = x; return r; } inline __device__ __half __ushort_as_half(unsigned short x) { __half_raw r; r.x = x; return r; } // float -> half | half2 inline __HOST_DEVICE__ __half __float2half(float x) { return __half_raw{static_cast<_Float16>(x)}; } inline __HOST_DEVICE__ __half __float2half_rn(float x) { return __half_raw{static_cast<_Float16>(x)}; } #if !defined(__HIPCC_RTC__) // TODO: rounding behaviour is not correct for host functions. inline __host__ __half __float2half_rz(float x) { return __half_raw{static_cast<_Float16>(x)}; } inline __host__ __half __float2half_rd(float x) { return __half_raw{static_cast<_Float16>(x)}; } inline __host__ __half __float2half_ru(float x) { return __half_raw{static_cast<_Float16>(x)}; } #endif inline __device__ __half __float2half_rz(float x) { return __half_raw{__ocml_cvtrtz_f16_f32(x)}; } inline __device__ __half __float2half_rd(float x) { return __half_raw{__ocml_cvtrtn_f16_f32(x)}; } inline __device__ __half __float2half_ru(float x) { return __half_raw{__ocml_cvtrtp_f16_f32(x)}; } inline __HOST_DEVICE__ __half2 __float2half2_rn(float x) { return __half2{ _Float16_2{ static_cast<_Float16>(x), static_cast<_Float16>(x)}}; } inline __HOST_DEVICE__ __half2 __floats2half2_rn(float x, float y) { return __half2{_Float16_2{ static_cast<_Float16>(x), static_cast<_Float16>(y)}}; } inline __HOST_DEVICE__ __half2 __float22half2_rn(float2 x) { return __floats2half2_rn(x.x, x.y); } // half | half2 -> float inline __HOST_DEVICE__ float __half2float(__half x) { return static_cast<__half_raw>(x).data; } inline __HOST_DEVICE__ float __low2float(__half2 x) { return static_cast<__half2_raw>(x).data.x; } inline __HOST_DEVICE__ float __high2float(__half2 x) { return static_cast<__half2_raw>(x).data.y; } inline __HOST_DEVICE__ float2 __half22float2(__half2 x) { return make_float2( static_cast<__half2_raw>(x).data.x, static_cast<__half2_raw>(x).data.y); } // half -> int inline __device__ int __half2int_rn(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ int __half2int_rz(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ int __half2int_rd(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ int __half2int_ru(__half x) { return static_cast<__half_raw>(x).data; } // int -> half inline __device__ __half __int2half_rn(int x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __int2half_rz(int x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __int2half_rd(int x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __int2half_ru(int x) { return __half_raw{static_cast<_Float16>(x)}; } // half -> short inline __device__ short __half2short_rn(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ short __half2short_rz(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ short __half2short_rd(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ short __half2short_ru(__half x) { return static_cast<__half_raw>(x).data; } // short -> half inline __device__ __half __short2half_rn(short x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __short2half_rz(short x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __short2half_rd(short x) { return 
__half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __short2half_ru(short x) { return __half_raw{static_cast<_Float16>(x)}; } // half -> long long inline __device__ long long __half2ll_rn(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ long long __half2ll_rz(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ long long __half2ll_rd(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ long long __half2ll_ru(__half x) { return static_cast<__half_raw>(x).data; } // long long -> half inline __device__ __half __ll2half_rn(long long x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __ll2half_rz(long long x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __ll2half_rd(long long x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __ll2half_ru(long long x) { return __half_raw{static_cast<_Float16>(x)}; } // half -> unsigned int inline __device__ unsigned int __half2uint_rn(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ unsigned int __half2uint_rz(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ unsigned int __half2uint_rd(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ unsigned int __half2uint_ru(__half x) { return static_cast<__half_raw>(x).data; } // unsigned int -> half inline __device__ __half __uint2half_rn(unsigned int x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __uint2half_rz(unsigned int x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __uint2half_rd(unsigned int x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __uint2half_ru(unsigned int x) { return __half_raw{static_cast<_Float16>(x)}; } // half -> unsigned short inline __device__ unsigned short __half2ushort_rn(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ unsigned short __half2ushort_rz(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ unsigned short __half2ushort_rd(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ unsigned short __half2ushort_ru(__half x) { return static_cast<__half_raw>(x).data; } // unsigned short -> half inline __device__ __half __ushort2half_rn(unsigned short x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __ushort2half_rz(unsigned short x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __ushort2half_rd(unsigned short x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __ushort2half_ru(unsigned short x) { return __half_raw{static_cast<_Float16>(x)}; } // half -> unsigned long long inline __device__ unsigned long long __half2ull_rn(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ unsigned long long __half2ull_rz(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ unsigned long long __half2ull_rd(__half x) { return static_cast<__half_raw>(x).data; } inline __device__ unsigned long long __half2ull_ru(__half x) { return static_cast<__half_raw>(x).data; } // unsigned long long -> half inline __device__ __half __ull2half_rn(unsigned long long x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __ull2half_rz(unsigned long long x) { return __half_raw{static_cast<_Float16>(x)}; } inline __device__ __half __ull2half_rd(unsigned long long x) { return __half_raw{static_cast<_Float16>(x)}; } inline 
__device__ __half __ull2half_ru(unsigned long long x) { return __half_raw{static_cast<_Float16>(x)}; } // Load primitives inline __device__ __half __ldg(const __half* ptr) { return *ptr; } inline __device__ __half __ldcg(const __half* ptr) { return *ptr; } inline __device__ __half __ldca(const __half* ptr) { return *ptr; } inline __device__ __half __ldcs(const __half* ptr) { return *ptr; } inline __HOST_DEVICE__ __half2 __ldg(const __half2* ptr) { return *ptr; } inline __HOST_DEVICE__ __half2 __ldcg(const __half2* ptr) { return *ptr; } inline __HOST_DEVICE__ __half2 __ldca(const __half2* ptr) { return *ptr; } inline __HOST_DEVICE__ __half2 __ldcs(const __half2* ptr) { return *ptr; } // Relations inline __device__ bool __heq(__half x, __half y) { return static_cast<__half_raw>(x).data == static_cast<__half_raw>(y).data; } inline __device__ bool __hne(__half x, __half y) { return static_cast<__half_raw>(x).data != static_cast<__half_raw>(y).data; } inline __device__ bool __hle(__half x, __half y) { return static_cast<__half_raw>(x).data <= static_cast<__half_raw>(y).data; } inline __device__ bool __hge(__half x, __half y) { return static_cast<__half_raw>(x).data >= static_cast<__half_raw>(y).data; } inline __device__ bool __hlt(__half x, __half y) { return static_cast<__half_raw>(x).data < static_cast<__half_raw>(y).data; } inline __device__ bool __hgt(__half x, __half y) { return static_cast<__half_raw>(x).data > static_cast<__half_raw>(y).data; } inline __device__ bool __hequ(__half x, __half y) { return !(static_cast<__half_raw>(x).data < static_cast<__half_raw>(y).data) && !(static_cast<__half_raw>(x).data > static_cast<__half_raw>(y).data); } inline __device__ bool __hneu(__half x, __half y) { return !(static_cast<__half_raw>(x).data == static_cast<__half_raw>(y).data); } inline __device__ bool __hleu(__half x, __half y) { return !(static_cast<__half_raw>(x).data > static_cast<__half_raw>(y).data); } inline __device__ bool __hgeu(__half x, __half y) { return !(static_cast<__half_raw>(x).data < static_cast<__half_raw>(y).data); } inline __device__ bool __hltu(__half x, __half y) { return !(static_cast<__half_raw>(x).data >= static_cast<__half_raw>(y).data); } inline __device__ bool __hgtu(__half x, __half y) { return !(static_cast<__half_raw>(x).data <= static_cast<__half_raw>(y).data); } inline __HOST_DEVICE__ __half2 __heq2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(x).data == static_cast<__half2_raw>(y).data; return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ __half2 __hne2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(x).data != static_cast<__half2_raw>(y).data; return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ __half2 __hle2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(x).data <= static_cast<__half2_raw>(y).data; return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ __half2 __hge2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(x).data >= static_cast<__half2_raw>(y).data; return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ __half2 __hlt2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(x).data < static_cast<__half2_raw>(y).data; return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ __half2 __hgt2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(x).data > static_cast<__half2_raw>(y).data; return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ __half2 __hequ2(__half2 x, __half2 y) { auto r = 
!(static_cast<__half2_raw>(x).data < static_cast<__half2_raw>(y).data) && !(static_cast<__half2_raw>(x).data > static_cast<__half2_raw>(y).data); return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ __half2 __hneu2(__half2 x, __half2 y) { auto r = !(static_cast<__half2_raw>(x).data == static_cast<__half2_raw>(y).data); return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ __half2 __hleu2(__half2 x, __half2 y) { auto r = !(static_cast<__half2_raw>(x).data > static_cast<__half2_raw>(y).data); return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ __half2 __hgeu2(__half2 x, __half2 y) { auto r = !(static_cast<__half2_raw>(x).data < static_cast<__half2_raw>(y).data); return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ __half2 __hltu2(__half2 x, __half2 y) { auto r = !(static_cast<__half2_raw>(x).data >= static_cast<__half2_raw>(y).data); return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ __half2 __hgtu2(__half2 x, __half2 y) { auto r = !(static_cast<__half2_raw>(x).data <= static_cast<__half2_raw>(y).data); return __builtin_convertvector(-r, _Float16_2); } inline __HOST_DEVICE__ bool __hbeq2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(__heq2(x, y)); return r.data.x != 0 && r.data.y != 0; } inline __HOST_DEVICE__ bool __hbne2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(__hne2(x, y)); return r.data.x != 0 && r.data.y != 0; } inline __HOST_DEVICE__ bool __hble2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(__hle2(x, y)); return r.data.x != 0 && r.data.y != 0; } inline __HOST_DEVICE__ bool __hbge2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(__hge2(x, y)); return r.data.x != 0 && r.data.y != 0; } inline __HOST_DEVICE__ bool __hblt2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(__hlt2(x, y)); return r.data.x != 0 && r.data.y != 0; } inline __HOST_DEVICE__ bool __hbgt2(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(__hgt2(x, y)); return r.data.x != 0 && r.data.y != 0; } inline __HOST_DEVICE__ bool __hbequ2(__half2 x, __half2 y) { return __hbeq2(x, y); } inline __HOST_DEVICE__ bool __hbneu2(__half2 x, __half2 y) { return __hbne2(x, y); } inline __HOST_DEVICE__ bool __hbleu2(__half2 x, __half2 y) { return __hble2(x, y); } inline __HOST_DEVICE__ bool __hbgeu2(__half2 x, __half2 y) { return __hbge2(x, y); } inline __HOST_DEVICE__ bool __hbltu2(__half2 x, __half2 y) { return __hblt2(x, y); } inline __HOST_DEVICE__ bool __hbgtu2(__half2 x, __half2 y) { return __hbgt2(x, y); } inline __device__ __half __hmax(const __half x, const __half y) { return __half_raw{__ocml_fmax_f16(static_cast<__half_raw>(x).data, static_cast<__half_raw>(y).data)}; } inline __device__ __half __hmax_nan(const __half x, const __half y) { if(__ocml_isnan_f16(static_cast<__half_raw>(x).data)) { return x; } else if (__ocml_isnan_f16(static_cast<__half_raw>(y).data)) { return y; } return __hmax(x, y); } inline __device__ __half __hmin(const __half x, const __half y) { return __half_raw{__ocml_fmin_f16(static_cast<__half_raw>(x).data, static_cast<__half_raw>(y).data)}; } inline __device__ __half __hmin_nan(const __half x, const __half y) { if(__ocml_isnan_f16(static_cast<__half_raw>(x).data)) { return x; } else if (__ocml_isnan_f16(static_cast<__half_raw>(y).data)) { return y; } return __hmin(x, y); } // Arithmetic inline __device__ __half __clamp_01(__half x) { auto r = static_cast<__half_raw>(x); if (__hlt(x, __half_raw{0})) return __half_raw{0}; if 
(__hlt(__half_raw{1}, x)) return __half_raw{1}; return r; } inline __device__ __half __hadd(__half x, __half y) { return __half_raw{ static_cast<__half_raw>(x).data + static_cast<__half_raw>(y).data}; } inline __device__ __half __habs(__half x) { return __half_raw{ __ocml_fabs_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half __hsub(__half x, __half y) { return __half_raw{ static_cast<__half_raw>(x).data - static_cast<__half_raw>(y).data}; } inline __device__ __half __hmul(__half x, __half y) { return __half_raw{ static_cast<__half_raw>(x).data * static_cast<__half_raw>(y).data}; } inline __device__ __half __hadd_sat(__half x, __half y) { return __clamp_01(__hadd(x, y)); } inline __device__ __half __hsub_sat(__half x, __half y) { return __clamp_01(__hsub(x, y)); } inline __device__ __half __hmul_sat(__half x, __half y) { return __clamp_01(__hmul(x, y)); } inline __device__ __half __hfma(__half x, __half y, __half z) { return __half_raw{__ocml_fma_f16( static_cast<__half_raw>(x).data, static_cast<__half_raw>(y).data, static_cast<__half_raw>(z).data)}; } inline __device__ __half __hfma_sat(__half x, __half y, __half z) { return __clamp_01(__hfma(x, y, z)); } inline __device__ __half __hdiv(__half x, __half y) { return __half_raw{ static_cast<__half_raw>(x).data / static_cast<__half_raw>(y).data}; } inline __HOST_DEVICE__ __half2 __hadd2(__half2 x, __half2 y) { return __half2{ static_cast<__half2_raw>(x).data + static_cast<__half2_raw>(y).data}; } inline __HOST_DEVICE__ __half2 __habs2(__half2 x) { return __half2{ __ocml_fabs_2f16(static_cast<__half2_raw>(x).data)}; } inline __HOST_DEVICE__ __half2 __hsub2(__half2 x, __half2 y) { return __half2{ static_cast<__half2_raw>(x).data - static_cast<__half2_raw>(y).data}; } inline __HOST_DEVICE__ __half2 __hmul2(__half2 x, __half2 y) { return __half2{ static_cast<__half2_raw>(x).data * static_cast<__half2_raw>(y).data}; } inline __HOST_DEVICE__ __half2 __hadd2_sat(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(__hadd2(x, y)); return __half2{ __clamp_01(__half_raw{r.data.x}), __clamp_01(__half_raw{r.data.y})}; } inline __HOST_DEVICE__ __half2 __hsub2_sat(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(__hsub2(x, y)); return __half2{ __clamp_01(__half_raw{r.data.x}), __clamp_01(__half_raw{r.data.y})}; } inline __HOST_DEVICE__ __half2 __hmul2_sat(__half2 x, __half2 y) { auto r = static_cast<__half2_raw>(__hmul2(x, y)); return __half2{ __clamp_01(__half_raw{r.data.x}), __clamp_01(__half_raw{r.data.y})}; } inline __HOST_DEVICE__ __half2 __hfma2(__half2 x, __half2 y, __half2 z) { return __half2{__ocml_fma_2f16(x, y, z)}; } inline __HOST_DEVICE__ __half2 __hfma2_sat(__half2 x, __half2 y, __half2 z) { auto r = static_cast<__half2_raw>(__hfma2(x, y, z)); return __half2{ __clamp_01(__half_raw{r.data.x}), __clamp_01(__half_raw{r.data.y})}; } inline __HOST_DEVICE__ __half2 __h2div(__half2 x, __half2 y) { return __half2{ static_cast<__half2_raw>(x).data / static_cast<__half2_raw>(y).data}; } // Math functions #if defined(__clang__) && defined(__HIP__) inline __device__ float amd_mixed_dot(__half2 a, __half2 b, float c, bool saturate) { return __ockl_fdot2(static_cast<__half2_raw>(a).data, static_cast<__half2_raw>(b).data, c, saturate); } #endif inline __device__ __half htrunc(__half x) { return __half_raw{ __ocml_trunc_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hceil(__half x) { return __half_raw{ __ocml_ceil_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hfloor(__half x) { return 
__half_raw{ __ocml_floor_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hrint(__half x) { return __half_raw{ __ocml_rint_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hsin(__half x) { return __half_raw{ __ocml_sin_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hcos(__half x) { return __half_raw{ __ocml_cos_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hexp(__half x) { return __half_raw{ __ocml_exp_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hexp2(__half x) { return __half_raw{ __ocml_exp2_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hexp10(__half x) { return __half_raw{ __ocml_exp10_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hlog2(__half x) { return __half_raw{ __ocml_log2_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hlog(__half x) { return __half_raw{ __ocml_log_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hlog10(__half x) { return __half_raw{ __ocml_log10_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hrcp(__half x) { return __half_raw{ static_cast<_Float16>(__builtin_amdgcn_rcph(static_cast<__half_raw>(x).data))}; } inline __device__ __half hrsqrt(__half x) { return __half_raw{ __ocml_rsqrt_f16(static_cast<__half_raw>(x).data)}; } inline __device__ __half hsqrt(__half x) { return __half_raw{ __ocml_sqrt_f16(static_cast<__half_raw>(x).data)}; } inline __device__ bool __hisinf(__half x) { return __ocml_isinf_f16(static_cast<__half_raw>(x).data); } inline __device__ bool __hisnan(__half x) { return __ocml_isnan_f16(static_cast<__half_raw>(x).data); } inline __device__ __half __hneg(__half x) { return __half_raw{-static_cast<__half_raw>(x).data}; } inline __HOST_DEVICE__ __half2 h2trunc(__half2 x) { return __half2{__ocml_trunc_2f16(x)}; } inline __HOST_DEVICE__ __half2 h2ceil(__half2 x) { return __half2{__ocml_ceil_2f16(x)}; } inline __HOST_DEVICE__ __half2 h2floor(__half2 x) { return __half2{__ocml_floor_2f16(x)}; } inline __HOST_DEVICE__ __half2 h2rint(__half2 x) { return __half2{__ocml_rint_2f16(x)}; } inline __HOST_DEVICE__ __half2 h2sin(__half2 x) { return __half2{__ocml_sin_2f16(x)}; } inline __HOST_DEVICE__ __half2 h2cos(__half2 x) { return __half2{__ocml_cos_2f16(x)}; } inline __HOST_DEVICE__ __half2 h2exp(__half2 x) { return __half2{__ocml_exp_2f16(x)}; } inline __HOST_DEVICE__ __half2 h2exp2(__half2 x) { return __half2{__ocml_exp2_2f16(x)}; } inline __HOST_DEVICE__ __half2 h2exp10(__half2 x) { return __half2{__ocml_exp10_2f16(x)}; } inline __HOST_DEVICE__ __half2 h2log2(__half2 x) { return __half2{__ocml_log2_2f16(x)}; } inline __HOST_DEVICE__ __half2 h2log(__half2 x) { return __ocml_log_2f16(x); } inline __HOST_DEVICE__ __half2 h2log10(__half2 x) { return __ocml_log10_2f16(x); } inline __HOST_DEVICE__ __half2 h2rcp(__half2 x) { return _Float16_2{static_cast<_Float16>(__builtin_amdgcn_rcph(x.x)), static_cast<_Float16>(__builtin_amdgcn_rcph(x.y))}; } inline __HOST_DEVICE__ __half2 h2rsqrt(__half2 x) { return __ocml_rsqrt_2f16(x); } inline __HOST_DEVICE__ __half2 h2sqrt(__half2 x) { return __ocml_sqrt_2f16(x); } inline __HOST_DEVICE__ __half2 __hisinf2(__half2 x) { auto r = __ocml_isinf_2f16(x); return __half2{_Float16_2{ static_cast<_Float16>(r.x), static_cast<_Float16>(r.y)}}; } inline __HOST_DEVICE__ __half2 __hisnan2(__half2 x) { auto r = __ocml_isnan_2f16(x); return __half2{_Float16_2{ static_cast<_Float16>(r.x), static_cast<_Float16>(r.y)}}; } inline __HOST_DEVICE__ __half2 
__hneg2(__half2 x) { return __half2{-static_cast<__half2_raw>(x).data}; } } // Anonymous namespace. #if !defined(HIP_NO_HALF) using half = __half; using half2 = __half2; #endif __device__ inline __half __shfl(__half var, int src_lane, int width = warpSize) { union { int i; __half h; } tmp; tmp.h = var; tmp.i = __shfl(tmp.i, src_lane, width); return tmp.h; } __device__ inline __half2 __shfl(__half2 var, int src_lane, int width = warpSize) { union { int i; __half2 h; } tmp; tmp.h = var; tmp.i = __shfl(tmp.i, src_lane, width); return tmp.h; } __device__ inline __half __shfl_up(__half var, unsigned int lane_delta, int width = warpSize) { union { int i; __half h; } tmp; tmp.h = var; tmp.i = __shfl_up(tmp.i, lane_delta, width); return tmp.h; } __device__ inline __half2 __shfl_up(__half2 var, unsigned int lane_delta, int width = warpSize) { union { int i; __half2 h; } tmp; tmp.h = var; tmp.i = __shfl_up(tmp.i, lane_delta, width); return tmp.h; } __device__ inline __half __shfl_down(__half var, unsigned int lane_delta, int width = warpSize) { union { int i; __half h; } tmp; tmp.h = var; tmp.i = __shfl_down(tmp.i, lane_delta, width); return tmp.h; } __device__ inline __half2 __shfl_down(__half2 var, unsigned int lane_delta, int width = warpSize) { union { int i; __half2 h; } tmp; tmp.h = var; tmp.i = __shfl_down(tmp.i, lane_delta, width); return tmp.h; } __device__ inline __half __shfl_xor(__half var, int lane_mask, int width = warpSize) { union { int i; __half h; } tmp; tmp.h = var; tmp.i = __shfl_xor(tmp.i, lane_mask, width); return tmp.h; } __device__ inline __half2 __shfl_xor(__half2 var, int lane_mask, int width = warpSize) { union { int i; __half2 h; } tmp; tmp.h = var; tmp.i = __shfl_xor(tmp.i, lane_mask, width); return tmp.h; } #endif // defined(__cplusplus) #elif defined(__GNUC__) #if !defined(__HIPCC_RTC__) #include "hip_fp16_gcc.h" #endif #endif // !defined(__clang__) && defined(__GNUC__) #if defined(__clang__) #pragma clang diagnostic pop #endif #endif // HIP_INCLUDE_HIP_AMD_DETAIL_HIP_FP16_H clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_hip_math_constants.h000066400000000000000000000134021450307266000256540ustar00rootroot00000000000000/* Copyright (c) 2015 - 2023 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/

#ifndef AMD_HIP_MATH_CONSTANTS_H
#define AMD_HIP_MATH_CONSTANTS_H

// single precision constants
#define HIP_INF_F            __int_as_float(0x7f800000U)
#define HIP_NAN_F            __int_as_float(0x7fffffffU)
#define HIP_MIN_DENORM_F     __int_as_float(0x00000001U)
#define HIP_MAX_NORMAL_F     __int_as_float(0x7f7fffffU)
#define HIP_NEG_ZERO_F       __int_as_float(0x80000000U)
#define HIP_ZERO_F           0.0F
#define HIP_ONE_F            1.0F
#define HIP_SQRT_HALF_F      0.707106781F
#define HIP_SQRT_HALF_HI_F   0.707106781F
#define HIP_SQRT_HALF_LO_F   1.210161749e-08F
#define HIP_SQRT_TWO_F       1.414213562F
#define HIP_THIRD_F          0.333333333F
#define HIP_PIO4_F           0.785398163F
#define HIP_PIO2_F           1.570796327F
#define HIP_3PIO4_F          2.356194490F
#define HIP_2_OVER_PI_F      0.636619772F
#define HIP_SQRT_2_OVER_PI_F 0.797884561F
#define HIP_PI_F             3.141592654F
#define HIP_L2E_F            1.442695041F
#define HIP_L2T_F            3.321928094F
#define HIP_LG2_F            0.301029996F
#define HIP_LGE_F            0.434294482F
#define HIP_LN2_F            0.693147181F
#define HIP_LNT_F            2.302585093F
#define HIP_LNPI_F           1.144729886F
#define HIP_TWO_TO_M126_F    1.175494351e-38F
#define HIP_TWO_TO_126_F     8.507059173e37F
#define HIP_NORM_HUGE_F      3.402823466e38F
#define HIP_TWO_TO_23_F      8388608.0F
#define HIP_TWO_TO_24_F      16777216.0F
#define HIP_TWO_TO_31_F      2147483648.0F
#define HIP_TWO_TO_32_F      4294967296.0F
#define HIP_REMQUO_BITS_F    3U
#define HIP_REMQUO_MASK_F    (~((~0U)<<HIP_REMQUO_BITS_F))

#ifdef __cplusplus
extern "C" {
#endif

/**
 * @brief Query the installed library build name.
 *
 * This function can be used even when the library is not initialized.
 *
 * @returns Returns a string describing the build version of the library. The
 * string is owned by the library.
 */
const char* amd_dbgapi_get_build_name();

/**
 * @brief Query the installed library git hash.
 *
 * This function can be used even when the library is not initialized.
 *
 * @returns Returns git hash of the library.
 */
const char* amd_dbgapi_get_git_hash();

/**
 * @brief Query the installed library build ID.
 *
 * This function can be used even when the library is not initialized.
 *
 * @returns Returns build ID of the library.
 */
size_t amd_dbgapi_get_build_id();

#ifdef __cplusplus
} /* extern "c" */
#endif

//---
// Top part of file can be compiled with any compiler
#if !defined(__HIPCC_RTC__)
//#include
#if __cplusplus
#include
#include
#else
#include
#include
#include
#endif  // __cplusplus
#else
typedef unsigned int uint32_t;
typedef unsigned long long uint64_t;
typedef signed int int32_t;
typedef signed long long int64_t;
namespace std {
using ::uint32_t;
using ::uint64_t;
using ::int32_t;
using ::int64_t;
}
#endif  // !defined(__HIPCC_RTC__)

#if __HIP_CLANG_ONLY__

#if !defined(__align__)
#define __align__(x) __attribute__((aligned(x)))
#endif

#define CUDA_SUCCESS hipSuccess

#if !defined(__HIPCC_RTC__)
#include
extern int HIP_TRACE_API;
#endif  // !defined(__HIPCC_RTC__)
#ifdef __cplusplus
#include
#endif
#include
#include
#include
#include
#include
#include

// TODO-HCC remove old definitions ; ~1602 hcc supports __HCC_ACCELERATOR__ define.
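// Illustrative sketch (editorial addition, not part of the original header):
// the __launch_bounds__ macro defined just below maps onto the amdgpu
// flat-work-group-size and waves-per-eu attributes, and hipLaunchKernelGGL
// (also defined below) expands to an ordinary <<<...>>> launch. The kernel
// name here is hypothetical:
//
//   __global__ void __launch_bounds__(256, 2) scale(float* p, float s, int n) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) p[i] *= s;
//   }
//
//   // host side:
//   // hipLaunchKernelGGL(scale, dim3((n + 255) / 256), dim3(256), 0, 0, d_p, 2.0f, n);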
#if defined(__KALMAR_ACCELERATOR__) && !defined(__HCC_ACCELERATOR__)
#define __HCC_ACCELERATOR__ __KALMAR_ACCELERATOR__
#endif

// Feature tests:
#if (defined(__HCC_ACCELERATOR__) && (__HCC_ACCELERATOR__ != 0)) || __HIP_DEVICE_COMPILE__
// Device compile and not host compile:

// 32-bit Atomics:
#define __HIP_ARCH_HAS_GLOBAL_INT32_ATOMICS__ (1)
#define __HIP_ARCH_HAS_GLOBAL_FLOAT_ATOMIC_EXCH__ (1)
#define __HIP_ARCH_HAS_SHARED_INT32_ATOMICS__ (1)
#define __HIP_ARCH_HAS_SHARED_FLOAT_ATOMIC_EXCH__ (1)
#define __HIP_ARCH_HAS_FLOAT_ATOMIC_ADD__ (1)

// 64-bit Atomics:
#define __HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__ (1)
#define __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__ (1)

// Doubles
#define __HIP_ARCH_HAS_DOUBLES__ (1)

// warp cross-lane operations:
#define __HIP_ARCH_HAS_WARP_VOTE__ (1)
#define __HIP_ARCH_HAS_WARP_BALLOT__ (1)
#define __HIP_ARCH_HAS_WARP_SHUFFLE__ (1)
#define __HIP_ARCH_HAS_WARP_FUNNEL_SHIFT__ (0)

// sync
#define __HIP_ARCH_HAS_THREAD_FENCE_SYSTEM__ (1)
#define __HIP_ARCH_HAS_SYNC_THREAD_EXT__ (0)

// misc
#define __HIP_ARCH_HAS_SURFACE_FUNCS__ (0)
#define __HIP_ARCH_HAS_3DGRID__ (1)
#define __HIP_ARCH_HAS_DYNAMIC_PARALLEL__ (0)

#endif /* Device feature flags */


#define launch_bounds_impl0(requiredMaxThreadsPerBlock)                                            \
    __attribute__((amdgpu_flat_work_group_size(1, requiredMaxThreadsPerBlock)))
#define launch_bounds_impl1(requiredMaxThreadsPerBlock, minBlocksPerMultiprocessor)                \
    __attribute__((amdgpu_flat_work_group_size(1, requiredMaxThreadsPerBlock),                     \
                   amdgpu_waves_per_eu(minBlocksPerMultiprocessor)))
#define select_impl_(_1, _2, impl_, ...) impl_
#define __launch_bounds__(...)                                                                     \
    select_impl_(__VA_ARGS__, launch_bounds_impl1, launch_bounds_impl0, )(__VA_ARGS__)

#if !defined(__HIPCC_RTC__)
__host__ inline void* __get_dynamicgroupbaseptr() { return nullptr; }
#endif  // !defined(__HIPCC_RTC__)

// End doxygen API:
/**
 *   @}
 */

//
// hip-clang functions
//
#if !defined(__HIPCC_RTC__)
#define HIP_KERNEL_NAME(...) __VA_ARGS__
#define HIP_SYMBOL(X) X

typedef int hipLaunchParm;

template <std::size_t n, typename... Ts,
          typename std::enable_if<n == sizeof...(Ts)>::type* = nullptr>
void pArgs(const std::tuple<Ts...>&, void*) {}

template <std::size_t n, typename... Ts,
          typename std::enable_if<n != sizeof...(Ts)>::type* = nullptr>
void pArgs(const std::tuple<Ts...>& formals, void** _vargs) {
    using T = typename std::tuple_element<n, std::tuple<Ts...> >::type;

    static_assert(!std::is_reference<T>{},
                  "A __global__ function cannot have a reference as one of its "
                  "arguments.");
#if defined(HIP_STRICT)
    static_assert(std::is_trivially_copyable<T>{},
                  "Only TriviallyCopyable types can be arguments to a __global__ "
                  "function");
#endif
    _vargs[n] = const_cast<void*>(reinterpret_cast<const void*>(&std::get<n>(formals)));
    return pArgs<n + 1>(formals, _vargs);
}

template <typename... Formals, typename... Actuals>
std::tuple<Formals...> validateArgsCountType(void (*kernel)(Formals...),
                                             std::tuple<Actuals...>(actuals)) {
    static_assert(sizeof...(Formals) == sizeof...(Actuals), "Argument Count Mismatch");
    std::tuple<Formals...> to_formals{std::move(actuals)};
    return to_formals;
}

#if defined(HIP_TEMPLATE_KERNEL_LAUNCH)
template <typename... Args, typename F = void (*)(Args...)>
void hipLaunchKernelGGL(F kernel, const dim3& numBlocks, const dim3& dimBlocks,
                        std::uint32_t sharedMemBytes, hipStream_t stream, Args... args) {
    constexpr size_t count = sizeof...(Args);
    auto tup_ = std::tuple<Args...>{args...};
    auto tup = validateArgsCountType(kernel, tup_);
    void* _Args[count];
    pArgs<0>(tup, _Args);

    auto k = reinterpret_cast<void*>(kernel);
    hipLaunchKernel(k, numBlocks, dimBlocks, _Args, sharedMemBytes, stream);
}
#else
#define hipLaunchKernelGGLInternal(kernelName, numBlocks, numThreads, memPerBlock, streamId, ...)  \
    do {                                                                                           \
        kernelName<<<(numBlocks), (numThreads), (memPerBlock), (streamId)>>>(__VA_ARGS__);         \
    } while (0)

#define hipLaunchKernelGGL(kernelName, ...)                                                        \
    hipLaunchKernelGGLInternal((kernelName), __VA_ARGS__)
#endif

#include
#endif  // !defined(__HIPCC_RTC__)

extern "C" __device__ __attribute__((const)) size_t __ockl_get_local_id(uint);
extern "C" __device__ __attribute__((const)) size_t __ockl_get_group_id(uint);
extern "C" __device__ __attribute__((const)) size_t __ockl_get_local_size(uint);
extern "C" __device__ __attribute__((const)) size_t __ockl_get_num_groups(uint);

struct __HIP_BlockIdx {
    __device__ std::uint32_t operator()(std::uint32_t x) const noexcept {
        return __ockl_get_group_id(x);
    }
};
struct __HIP_BlockDim {
    __device__ std::uint32_t operator()(std::uint32_t x) const noexcept {
        return __ockl_get_local_size(x);
    }
};
struct __HIP_GridDim {
    __device__ std::uint32_t operator()(std::uint32_t x) const noexcept {
        return __ockl_get_num_groups(x);
    }
};
struct __HIP_ThreadIdx {
    __device__ std::uint32_t operator()(std::uint32_t x) const noexcept {
        return __ockl_get_local_id(x);
    }
};

#if defined(__HIPCC_RTC__)
typedef struct dim3 {
    uint32_t x;  ///< x
    uint32_t y;  ///< y
    uint32_t z;  ///< z
#ifdef __cplusplus
    constexpr __device__ dim3(uint32_t _x = 1, uint32_t _y = 1, uint32_t _z = 1)
        : x(_x), y(_y), z(_z){};
#endif
} dim3;
#endif  // !defined(__HIPCC_RTC__)

template <typename F>
struct __HIP_Coordinates {
    using R = decltype(F{}(0));

    struct __X {
        __device__ operator R() const noexcept { return F{}(0); }
        __device__ R operator+=(const R& rhs) { return F{}(0) + rhs; }
    };
    struct __Y {
        __device__ operator R() const noexcept { return F{}(1); }
        __device__ R operator+=(const R& rhs) { return F{}(1) + rhs; }
    };
    struct __Z {
        __device__ operator R() const noexcept { return F{}(2); }
        __device__ R operator+=(const R& rhs) { return F{}(2) + rhs; }
    };

    static constexpr __X x{};
    static constexpr __Y y{};
    static constexpr __Z z{};
#ifdef __cplusplus
    __device__ operator dim3() const { return dim3(x, y, z); }
#endif
};

template <typename F>
#if !defined(_MSC_VER)
__attribute__((weak))
#endif
constexpr typename __HIP_Coordinates<F>::__X __HIP_Coordinates<F>::x;
template <typename F>
#if !defined(_MSC_VER)
__attribute__((weak))
#endif
constexpr typename __HIP_Coordinates<F>::__Y __HIP_Coordinates<F>::y;
template <typename F>
#if !defined(_MSC_VER)
__attribute__((weak))
#endif
constexpr typename __HIP_Coordinates<F>::__Z __HIP_Coordinates<F>::z;

extern "C" __device__ __attribute__((const)) size_t __ockl_get_global_size(uint);

inline __device__ std::uint32_t operator*(__HIP_Coordinates<__HIP_GridDim>::__X,
                                          __HIP_Coordinates<__HIP_BlockDim>::__X) noexcept {
    return __ockl_get_global_size(0);
}
inline __device__ std::uint32_t operator*(__HIP_Coordinates<__HIP_BlockDim>::__X,
                                          __HIP_Coordinates<__HIP_GridDim>::__X) noexcept {
    return __ockl_get_global_size(0);
}
inline __device__ std::uint32_t operator*(__HIP_Coordinates<__HIP_GridDim>::__Y,
                                          __HIP_Coordinates<__HIP_BlockDim>::__Y) noexcept {
    return __ockl_get_global_size(1);
}
inline __device__ std::uint32_t operator*(__HIP_Coordinates<__HIP_BlockDim>::__Y,
                                          __HIP_Coordinates<__HIP_GridDim>::__Y) noexcept {
    return __ockl_get_global_size(1);
}
inline __device__ std::uint32_t operator*(__HIP_Coordinates<__HIP_GridDim>::__Z,
                                          __HIP_Coordinates<__HIP_BlockDim>::__Z) noexcept {
    return __ockl_get_global_size(2);
}
inline __device__ std::uint32_t operator*(__HIP_Coordinates<__HIP_BlockDim>::__Z,
                                          __HIP_Coordinates<__HIP_GridDim>::__Z) noexcept {
    return __ockl_get_global_size(2);
}

static constexpr __HIP_Coordinates<__HIP_BlockDim> blockDim{};
static constexpr __HIP_Coordinates<__HIP_BlockIdx> blockIdx{};
static constexpr __HIP_Coordinates<__HIP_GridDim> gridDim{};
static constexpr
__HIP_Coordinates<__HIP_ThreadIdx> threadIdx{}; extern "C" __device__ __attribute__((const)) size_t __ockl_get_local_id(uint); #define hipThreadIdx_x (__ockl_get_local_id(0)) #define hipThreadIdx_y (__ockl_get_local_id(1)) #define hipThreadIdx_z (__ockl_get_local_id(2)) extern "C" __device__ __attribute__((const)) size_t __ockl_get_group_id(uint); #define hipBlockIdx_x (__ockl_get_group_id(0)) #define hipBlockIdx_y (__ockl_get_group_id(1)) #define hipBlockIdx_z (__ockl_get_group_id(2)) extern "C" __device__ __attribute__((const)) size_t __ockl_get_local_size(uint); #define hipBlockDim_x (__ockl_get_local_size(0)) #define hipBlockDim_y (__ockl_get_local_size(1)) #define hipBlockDim_z (__ockl_get_local_size(2)) extern "C" __device__ __attribute__((const)) size_t __ockl_get_num_groups(uint); #define hipGridDim_x (__ockl_get_num_groups(0)) #define hipGridDim_y (__ockl_get_num_groups(1)) #define hipGridDim_z (__ockl_get_num_groups(2)) #include #if __HIP_HCC_COMPAT_MODE__ // Define HCC work item functions in terms of HIP builtin variables. #pragma push_macro("__DEFINE_HCC_FUNC") #define __DEFINE_HCC_FUNC(hc_fun,hip_var) \ inline __device__ __attribute__((always_inline)) uint hc_get_##hc_fun(uint i) { \ if (i==0) \ return hip_var.x; \ else if(i==1) \ return hip_var.y; \ else \ return hip_var.z; \ } __DEFINE_HCC_FUNC(workitem_id, threadIdx) __DEFINE_HCC_FUNC(group_id, blockIdx) __DEFINE_HCC_FUNC(group_size, blockDim) __DEFINE_HCC_FUNC(num_groups, gridDim) #pragma pop_macro("__DEFINE_HCC_FUNC") extern "C" __device__ __attribute__((const)) size_t __ockl_get_global_id(uint); inline __device__ __attribute__((always_inline)) uint hc_get_workitem_absolute_id(int dim) { return (uint)__ockl_get_global_id(dim); } #endif #if !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__ #if !defined(__HIPCC_RTC__) // Support std::complex. #if !_OPENMP || __HIP_ENABLE_CUDA_WRAPPER_FOR_OPENMP__ #pragma push_macro("__CUDA__") #define __CUDA__ #include <__clang_cuda_math_forward_declares.h> #include <__clang_cuda_complex_builtins.h> // Workaround for using libc++ with HIP-Clang. // The following headers requires clang include path before standard C++ include path. // However libc++ include path requires to be before clang include path. // To workaround this, we pass -isystem with the parent directory of clang include // path instead of the clang include path itself. #include #include #include #undef __CUDA__ #pragma pop_macro("__CUDA__") #endif // !_OPENMP || __HIP_ENABLE_CUDA_WRAPPER_FOR_OPENMP__ #endif // !defined(__HIPCC_RTC__) #endif // !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__ #endif // __HIP_CLANG_ONLY__ #endif // HIP_AMD_DETAIL_RUNTIME_H clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_hip_runtime_pt_api.h000066400000000000000000000236161450307266000256560ustar00rootroot00000000000000/* Copyright (c) 2022 - Present Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #ifndef HIP_INCLUDE_HIP_HIP_RUNTIME_PT_API_H #define HIP_INCLUDE_HIP_HIP_RUNTIME_PT_API_H #if (defined(__HIP_PLATFORM_HCC__) || defined(__HIP_PLATFORM_AMD__)) && !(defined(__HIP_PLATFORM_NVCC__) || defined(__HIP_PLATFORM_NVIDIA__)) /// hipStreamPerThread implementation #if defined(HIP_API_PER_THREAD_DEFAULT_STREAM) #define __HIP_STREAM_PER_THREAD #define __HIP_API_SPT(api) api ## _spt #else #define __HIP_API_SPT(api) api #endif #if defined(__HIP_STREAM_PER_THREAD) // Memory APIs #define hipMemcpy __HIP_API_SPT(hipMemcpy) #define hipMemcpyToSymbol __HIP_API_SPT(hipMemcpyToSymbol) #define hipMemcpyFromSymbol __HIP_API_SPT(hipMemcpyFromSymbol) #define hipMemcpy2D __HIP_API_SPT(hipMemcpy2D) #define hipMemcpy2DFromArray __HIP_API_SPT(hipMemcpy2DFromArray) #define hipMemcpy3D __HIP_API_SPT(hipMemcpy3D) #define hipMemset __HIP_API_SPT(hipMemset) #define hipMemset2D __HIP_API_SPT(hipMemset2D) #define hipMemset3D __HIP_API_SPT(hipMemset3D) #define hipMemcpyAsync __HIP_API_SPT(hipMemcpyAsync) #define hipMemset3DAsync __HIP_API_SPT(hipMemset3DAsync) #define hipMemset2DAsync __HIP_API_SPT(hipMemset2DAsync) #define hipMemsetAsync __HIP_API_SPT(hipMemsetAsync) #define hipMemcpy3DAsync __HIP_API_SPT(hipMemcpy3DAsync) #define hipMemcpy2DAsync __HIP_API_SPT(hipMemcpy2DAsync) #define hipMemcpyFromSymbolAsync __HIP_API_SPT(hipMemcpyFromSymbolAsync) #define hipMemcpyToSymbolAsync __HIP_API_SPT(hipMemcpyToSymbolAsync) #define hipMemcpyFromArray __HIP_API_SPT(hipMemcpyFromArray) #define hipMemcpy2DToArray __HIP_API_SPT(hipMemcpy2DToArray) #define hipMemcpy2DFromArrayAsync __HIP_API_SPT(hipMemcpy2DFromArrayAsync) #define hipMemcpy2DToArrayAsync __HIP_API_SPT(hipMemcpy2DToArrayAsync) // Stream APIs #define hipStreamSynchronize __HIP_API_SPT(hipStreamSynchronize) #define hipStreamQuery __HIP_API_SPT(hipStreamQuery) #define hipStreamGetFlags __HIP_API_SPT(hipStreamGetFlags) #define hipStreamGetPriority __HIP_API_SPT(hipStreamGetPriority) #define hipStreamWaitEvent __HIP_API_SPT(hipStreamWaitEvent) #define hipStreamAddCallback __HIP_API_SPT(hipStreamAddCallback) #define hipLaunchHostFunc __HIP_API_SPT(hipLaunchHostFunc) // Event APIs #define hipEventRecord __HIP_API_SPT(hipEventRecord) // Launch APIs #define hipLaunchKernel __HIP_API_SPT(hipLaunchKernel) #define hipLaunchCooperativeKernel __HIP_API_SPT(hipLaunchCooperativeKernel) // Graph APIs #define hipGraphLaunch __HIP_API_SPT(hipGraphLaunch) #define hipStreamBeginCapture __HIP_API_SPT(hipStreamBeginCapture) #define hipStreamEndCapture __HIP_API_SPT(hipStreamEndCapture) #define hipStreamIsCapturing __HIP_API_SPT(hipStreamIsCapturing) #define hipStreamGetCaptureInfo __HIP_API_SPT(hipStreamGetCaptureInfo) #define hipStreamGetCaptureInfo_v2 __HIP_API_SPT(hipStreamGetCaptureInfo_v2) #endif #ifdef __cplusplus extern "C" { #endif hipError_t hipMemcpy_spt(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind); hipError_t hipMemcpyToSymbol_spt(const void* symbol, const void* src, size_t sizeBytes, size_t offset __dparm(0), hipMemcpyKind kind __dparm(hipMemcpyHostToDevice)); hipError_t 
hipMemcpyFromSymbol_spt(void* dst, const void* symbol,size_t sizeBytes, size_t offset __dparm(0), hipMemcpyKind kind __dparm(hipMemcpyDeviceToHost)); hipError_t hipMemcpy2D_spt(void* dst, size_t dpitch, const void* src, size_t spitch, size_t width, size_t height, hipMemcpyKind kind); hipError_t hipMemcpy2DFromArray_spt( void* dst, size_t dpitch, hipArray_const_t src, size_t wOffset, size_t hOffset, size_t width, size_t height, hipMemcpyKind kind); hipError_t hipMemcpy3D_spt(const struct hipMemcpy3DParms* p); hipError_t hipMemset_spt(void* dst, int value, size_t sizeBytes); hipError_t hipMemsetAsync_spt(void* dst, int value, size_t sizeBytes, hipStream_t stream); hipError_t hipMemset2D_spt(void* dst, size_t pitch, int value, size_t width, size_t height); hipError_t hipMemset2DAsync_spt(void* dst, size_t pitch, int value, size_t width, size_t height, hipStream_t stream); hipError_t hipMemset3DAsync_spt(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent, hipStream_t stream); hipError_t hipMemset3D_spt(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent ); hipError_t hipMemcpyAsync_spt(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream); hipError_t hipMemcpy3DAsync_spt(const hipMemcpy3DParms* p, hipStream_t stream); hipError_t hipMemcpy2DAsync_spt(void* dst, size_t dpitch, const void* src, size_t spitch, size_t width, size_t height, hipMemcpyKind kind, hipStream_t stream); hipError_t hipMemcpyFromSymbolAsync_spt(void* dst, const void* symbol, size_t sizeBytes, size_t offset, hipMemcpyKind kind, hipStream_t stream); hipError_t hipMemcpyToSymbolAsync_spt(const void* symbol, const void* src, size_t sizeBytes, size_t offset, hipMemcpyKind kind, hipStream_t stream); hipError_t hipMemcpyFromArray_spt(void* dst, hipArray_const_t src, size_t wOffsetSrc, size_t hOffset, size_t count, hipMemcpyKind kind); hipError_t hipMemcpy2DToArray_spt(hipArray* dst, size_t wOffset, size_t hOffset, const void* src, size_t spitch, size_t width, size_t height, hipMemcpyKind kind); hipError_t hipMemcpy2DFromArrayAsync_spt(void* dst, size_t dpitch, hipArray_const_t src, size_t wOffsetSrc, size_t hOffsetSrc, size_t width, size_t height, hipMemcpyKind kind, hipStream_t stream); hipError_t hipMemcpy2DToArrayAsync_spt(hipArray* dst, size_t wOffset, size_t hOffset, const void* src, size_t spitch, size_t width, size_t height, hipMemcpyKind kind, hipStream_t stream); hipError_t hipStreamQuery_spt(hipStream_t stream); hipError_t hipStreamSynchronize_spt(hipStream_t stream); hipError_t hipStreamGetPriority_spt(hipStream_t stream, int* priority); hipError_t hipStreamWaitEvent_spt(hipStream_t stream, hipEvent_t event, unsigned int flags); hipError_t hipStreamGetFlags_spt(hipStream_t stream, unsigned int* flags); hipError_t hipStreamAddCallback_spt(hipStream_t stream, hipStreamCallback_t callback, void* userData, unsigned int flags); #ifdef __cplusplus hipError_t hipEventRecord_spt(hipEvent_t event, hipStream_t stream = NULL); #else hipError_t hipEventRecord_spt(hipEvent_t event, hipStream_t stream); #endif hipError_t hipLaunchCooperativeKernel_spt(const void* f, dim3 gridDim, dim3 blockDim, void **kernelParams, uint32_t sharedMemBytes, hipStream_t hStream); hipError_t hipLaunchKernel_spt(const void* function_address, dim3 numBlocks, dim3 dimBlocks, void** args, size_t sharedMemBytes, hipStream_t stream); hipError_t hipGraphLaunch_spt(hipGraphExec_t graphExec, hipStream_t stream); hipError_t hipStreamBeginCapture_spt(hipStream_t stream, hipStreamCaptureMode mode); hipError_t 
hipStreamEndCapture_spt(hipStream_t stream, hipGraph_t* pGraph); hipError_t hipStreamIsCapturing_spt(hipStream_t stream, hipStreamCaptureStatus* pCaptureStatus); hipError_t hipStreamGetCaptureInfo_spt(hipStream_t stream, hipStreamCaptureStatus* pCaptureStatus, unsigned long long* pId); hipError_t hipStreamGetCaptureInfo_v2_spt(hipStream_t stream, hipStreamCaptureStatus* captureStatus_out, unsigned long long* id_out, hipGraph_t* graph_out, const hipGraphNode_t** dependencies_out, size_t* numDependencies_out); hipError_t hipLaunchHostFunc_spt(hipStream_t stream, hipHostFn_t fn, void* userData); #ifdef __cplusplus } #endif // extern "C" #endif //(defined(__HIP_PLATFORM_HCC__) || defined(__HIP_PLATFORM_AMD__)) && !(defined(__HIP_PLATFORM_NVCC__) || defined(__HIP_PLATFORM_NVIDIA__)) #endif //HIP_INCLUDE_HIP_HIP_RUNTIME_PT_API_H clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_hip_unsafe_atomics.h000066400000000000000000000574671450307266000256520ustar00rootroot00000000000000/* Copyright (c) 2021 - 2023 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #ifdef __cplusplus #if defined(__clang__) #pragma clang diagnostic push #pragma clang diagnostic ignored "-Wold-style-cast" #endif /** * @brief Unsafe floating point rmw atomic add. * * Performs a relaxed read-modify-write floating point atomic add with * device memory scope. Original value at \p addr is returned and * the value of \p addr is updated to have the original value plus \p value * * @note This operation currently only performs different operations for * the gfx90a target. Other devices continue to use safe atomics. * * It can be used to generate code that uses fast hardware floating point atomic * operations which may handle rounding and subnormal values differently than * non-atomic floating point operations. * * The operation is not always safe and can have undefined behavior unless * following condition are met: * * - \p addr is at least 4 bytes aligned * - If \p addr is a global segment address, it is in a coarse grain allocation. * Passing in global segment addresses in fine grain allocations will result in * undefined behavior and is not supported. * * @param [in,out] addr Pointer to value to be increment by \p value. * @param [in] value Value by \p addr is to be incremented. * @return Original value contained in \p addr. 
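 *
 * @par Example
 * A minimal sketch (editorial addition; assumes \p sum points into a
 * coarse-grained device allocation, e.g. from hipMalloc, and the kernel
 * name is illustrative):
 * @code
 * __global__ void block_sum(const float* in, float* sum, int n) {
 *     int i = blockIdx.x * blockDim.x + threadIdx.x;
 *     if (i < n) unsafeAtomicAdd(sum, in[i]);
 * }
 * @endcode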
*/ __device__ inline float unsafeAtomicAdd(float* addr, float value) { #if defined(__gfx90a__) && \ __has_builtin(__builtin_amdgcn_is_shared) && \ __has_builtin(__builtin_amdgcn_is_private) && \ __has_builtin(__builtin_amdgcn_ds_atomic_fadd_f32) && \ __has_builtin(__builtin_amdgcn_global_atomic_fadd_f32) if (__builtin_amdgcn_is_shared( (const __attribute__((address_space(0))) void*)addr)) return __builtin_amdgcn_ds_atomic_fadd_f32(addr, value); else if (__builtin_amdgcn_is_private( (const __attribute__((address_space(0))) void*)addr)) { float temp = *addr; *addr = temp + value; return temp; } else return __builtin_amdgcn_global_atomic_fadd_f32(addr, value); #elif __has_builtin(__hip_atomic_fetch_add) return __hip_atomic_fetch_add(addr, value, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #else return __atomic_fetch_add(addr, value, __ATOMIC_RELAXED); #endif } /** * @brief Unsafe floating point rmw atomic max. * * Performs a relaxed read-modify-write floating point atomic max with * device memory scope. The original value at \p addr is returned and * the value at \p addr is replaced by \p val if greater. * * @note This operation is currently identical to that performed by * atomicMax and is included for completeness. * * @param [in,out] addr Pointer to value to be updated * @param [in] val Value used to update the value at \p addr. * @return Original value contained in \p addr. */ __device__ inline float unsafeAtomicMax(float* addr, float val) { #if __has_builtin(__hip_atomic_load) && \ __has_builtin(__hip_atomic_compare_exchange_strong) float value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value < val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned int *uaddr = (unsigned int *)addr; unsigned int value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __uint_as_float(value) < val) { done = __atomic_compare_exchange_n(uaddr, &value, __float_as_uint(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __uint_as_float(value); #endif } /** * @brief Unsafe floating point rmw atomic min. * * Performs a relaxed read-modify-write floating point atomic min with * device memory scope. The original value at \p addr is returned and * the value at \p addr is replaced by \p val if lesser. * * @note This operation is currently identical to that performed by * atomicMin and is included for completeness. * * @param [in,out] addr Pointer to value to be updated * @param [in] val Value used to update the value at \p addr. * @return Original value contained in \p addr. */ __device__ inline float unsafeAtomicMin(float* addr, float val) { #if __has_builtin(__hip_atomic_load) && \ __has_builtin(__hip_atomic_compare_exchange_strong) float value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value > val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned int *uaddr = (unsigned int *)addr; unsigned int value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __uint_as_float(value) > val) { done = __atomic_compare_exchange_n(uaddr, &value, __float_as_uint(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __uint_as_float(value); #endif } /** * @brief Unsafe double precision rmw atomic add. 
* * Performs a relaxed read-modify-write double precision atomic add with * device memory scope. Original value at \p addr is returned and * the value of \p addr is updated to have the original value plus \p value * * @note This operation currently only performs different operations for * the gfx90a target. Other devices continue to use safe atomics. * * It can be used to generate code that uses fast hardware floating point atomic * operations which may handle rounding and subnormal values differently than * non-atomic floating point operations. * * The operation is not always safe and can have undefined behavior unless * following condition are met: * * - \p addr is at least 8 byte aligned * - If \p addr is a global segment address, it is in a coarse grain allocation. * Passing in global segment addresses in fine grain allocations will result in * undefined behavior and are not supported. * * @param [in,out] addr Pointer to value to be updated. * @param [in] value Value by \p addr is to be incremented. * @return Original value contained in \p addr. */ __device__ inline double unsafeAtomicAdd(double* addr, double value) { #if defined(__gfx90a__) && __has_builtin(__builtin_amdgcn_flat_atomic_fadd_f64) return __builtin_amdgcn_flat_atomic_fadd_f64(addr, value); #elif defined (__hip_atomic_fetch_add) return __hip_atomic_fetch_add(addr, value, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #else return __atomic_fetch_add(addr, value, __ATOMIC_RELAXED); #endif } /** * @brief Unsafe double precision rmw atomic max. * * Performs a relaxed read-modify-write double precision atomic max with * device memory scope. Original value at \p addr is returned and * the value of \p addr is updated with \p val if greater. * * @note This operation currently only performs different operations for * the gfx90a target. Other devices continue to use safe atomics. * * It can be used to generate code that uses fast hardware floating point atomic * operations which may handle rounding and subnormal values differently than * non-atomic floating point operations. * * The operation is not always safe and can have undefined behavior unless * following condition are met: * * - \p addr is at least 8 byte aligned * - If \p addr is a global segment address, it is in a coarse grain allocation. * Passing in global segment addresses in fine grain allocations will result in * undefined behavior and are not supported. * * @param [in,out] addr Pointer to value to be updated. * @param [in] val Value used to updated the contents at \p addr * @return Original value contained at \p addr. 
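 *
 * @par Example
 * A minimal sketch (editorial addition; assumes \p result points into a
 * coarse-grained device allocation initialized to a very small sentinel,
 * and the kernel name is illustrative):
 * @code
 * __global__ void track_max(const double* in, double* result, int n) {
 *     int i = blockIdx.x * blockDim.x + threadIdx.x;
 *     if (i < n) unsafeAtomicMax(result, in[i]);
 * }
 * @endcode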
*/ __device__ inline double unsafeAtomicMax(double* addr, double val) { #if (defined(__gfx90a__) || defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)) && \ __has_builtin(__builtin_amdgcn_flat_atomic_fmax_f64) return __builtin_amdgcn_flat_atomic_fmax_f64(addr, val); #else #if __has_builtin(__hip_atomic_load) && \ __has_builtin(__hip_atomic_compare_exchange_strong) double value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value < val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned long long *uaddr = (unsigned long long *)addr; unsigned long long value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __longlong_as_double(value) < val) { done = __atomic_compare_exchange_n(uaddr, &value, __double_as_longlong(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __longlong_as_double(value); #endif #endif } /** * @brief Unsafe double precision rmw atomic min. * * Performs a relaxed read-modify-write double precision atomic min with * device memory scope. Original value at \p addr is returned and * the value of \p addr is updated with \p val if lesser. * * @note This operation currently only performs different operations for * the gfx90a target. Other devices continue to use safe atomics. * * It can be used to generate code that uses fast hardware floating point atomic * operations which may handle rounding and subnormal values differently than * non-atomic floating point operations. * * The operation is not always safe and can have undefined behavior unless * following condition are met: * * - \p addr is at least 8 byte aligned * - If \p addr is a global segment address, it is in a coarse grain allocation. * Passing in global segment addresses in fine grain allocations will result in * undefined behavior and are not supported. * * @param [in,out] addr Pointer to value to be updated. * @param [in] val Value used to updated the contents at \p addr * @return Original value contained at \p addr. */ __device__ inline double unsafeAtomicMin(double* addr, double val) { #if (defined(__gfx90a__) || defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)) && \ __has_builtin(__builtin_amdgcn_flat_atomic_fmin_f64) return __builtin_amdgcn_flat_atomic_fmin_f64(addr, val); #else #if __has_builtin(__hip_atomic_load) && \ __has_builtin(__hip_atomic_compare_exchange_strong) double value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value > val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned long long *uaddr = (unsigned long long *)addr; unsigned long long value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __longlong_as_double(value) > val) { done = __atomic_compare_exchange_n(uaddr, &value, __double_as_longlong(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __longlong_as_double(value); #endif #endif } /** * @brief Safe floating point rmw atomic add. * * Performs a relaxed read-modify-write floating point atomic add with * device memory scope. Original value at \p addr is returned and * the value of \p addr is updated to have the original value plus \p value * * @note This operation ensures that, on all targets, we produce safe atomics. 
* This will be the case even when -munsafe-fp-atomics is passed into the compiler. * * @param [in,out] addr Pointer to value to be increment by \p value. * @param [in] value Value by \p addr is to be incremented. * @return Original value contained in \p addr. */ __device__ inline float safeAtomicAdd(float* addr, float value) { #if defined(__gfx908__) || defined(__gfx941__) \ || ((defined(__gfx90a__) || defined(__gfx940__) || defined(__gfx942__)) \ && !__has_builtin(__hip_atomic_fetch_add)) // On gfx908, we can generate unsafe FP32 atomic add that does not follow all // IEEE rules when -munsafe-fp-atomics is passed. Do a CAS loop emulation instead. // On gfx941, we can generate unsafe FP32 atomic add that may not always happen atomically, // so we need to force a CAS loop emulation to ensure safety. // On gfx90a, gfx940 and gfx942 if we do not have the __hip_atomic_fetch_add builtin, we // need to force a CAS loop here. float old_val; #if __has_builtin(__hip_atomic_load) old_val = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #else // !__has_builtin(__hip_atomic_load) old_val = __uint_as_float(__atomic_load_n(reinterpret_cast(addr), __ATOMIC_RELAXED)); #endif // __has_builtin(__hip_atomic_load) float expected, temp; do { temp = expected = old_val; #if __has_builtin(__hip_atomic_compare_exchange_strong) __hip_atomic_compare_exchange_strong(addr, &expected, old_val + value, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #else // !__has_builtin(__hip_atomic_compare_exchange_strong) __atomic_compare_exchange_n(addr, &expected, old_val + value, false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); #endif // __has_builtin(__hip_atomic_compare_exchange_strong) old_val = expected; } while (__float_as_uint(temp) != __float_as_uint(old_val)); return old_val; #elif defined(__gfx90a__) // On gfx90a, with the __hip_atomic_fetch_add builtin, relaxed system-scope // atomics will produce safe CAS loops, but are otherwise not different than // agent-scope atomics. This logic is only applicable for gfx90a, and should // not be assumed on other architectures. return __hip_atomic_fetch_add(addr, value, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #elif __has_builtin(__hip_atomic_fetch_add) return __hip_atomic_fetch_add(addr, value, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #else return __atomic_fetch_add(addr, value, __ATOMIC_RELAXED); #endif } /** * @brief Safe floating point rmw atomic max. * * Performs a relaxed read-modify-write floating point atomic max with * device memory scope. The original value at \p addr is returned and * the value at \p addr is replaced by \p val if greater. * * @note This operation ensures that, on all targets, we produce safe atomics. * This will be the case even when -munsafe-fp-atomics is passed into the compiler. * * @param [in,out] addr Pointer to value to be updated * @param [in] val Value used to update the value at \p addr. * @return Original value contained in \p addr. 
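 *
 * @par Example
 * A minimal sketch (editorial addition; the kernel name is illustrative).
 * Unlike the unsafe variant, this stays well defined even when the compiler
 * is invoked with -munsafe-fp-atomics:
 * @code
 * __global__ void track_max(const float* in, float* result, int n) {
 *     int i = blockIdx.x * blockDim.x + threadIdx.x;
 *     if (i < n) safeAtomicMax(result, in[i]);
 * }
 * @endcode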
*/ __device__ inline float safeAtomicMax(float* addr, float val) { #if __has_builtin(__hip_atomic_load) && \ __has_builtin(__hip_atomic_compare_exchange_strong) float value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value < val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned int *uaddr = (unsigned int *)addr; unsigned int value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __uint_as_float(value) < val) { done = __atomic_compare_exchange_n(uaddr, &value, __float_as_uint(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __uint_as_float(value); #endif } /** * @brief Safe floating point rmw atomic min. * * Performs a relaxed read-modify-write floating point atomic min with * device memory scope. The original value at \p addr is returned and * the value at \p addr is replaced by \p val if lesser. * * @note This operation ensures that, on all targets, we produce safe atomics. * This will be the case even when -munsafe-fp-atomics is passed into the compiler. * * @param [in,out] addr Pointer to value to be updated * @param [in] val Value used to update the value at \p addr. * @return Original value contained in \p addr. */ __device__ inline float safeAtomicMin(float* addr, float val) { #if __has_builtin(__hip_atomic_load) && \ __has_builtin(__hip_atomic_compare_exchange_strong) float value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value > val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned int *uaddr = (unsigned int *)addr; unsigned int value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __uint_as_float(value) > val) { done = __atomic_compare_exchange_n(uaddr, &value, __float_as_uint(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __uint_as_float(value); #endif } /** * @brief Safe double precision rmw atomic add. * * Performs a relaxed read-modify-write double precision atomic add with * device memory scope. Original value at \p addr is returned and * the value of \p addr is updated to have the original value plus \p value * * @note This operation ensures that, on all targets, we produce safe atomics. * This will be the case even when -munsafe-fp-atomics is passed into the compiler. * * @param [in,out] addr Pointer to value to be increment by \p value. * @param [in] value Value by \p addr is to be incremented. * @return Original value contained in \p addr. */ __device__ inline double safeAtomicAdd(double* addr, double value) { #if defined(__gfx90a__) && __has_builtin(__hip_atomic_fetch_add) // On gfx90a, with the __hip_atomic_fetch_add builtin, relaxed system-scope // atomics will produce safe CAS loops, but are otherwise not different than // agent-scope atomics. This logic is only applicable for gfx90a, and should // not be assumed on other architectures. return __hip_atomic_fetch_add(addr, value, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_SYSTEM); #elif defined(__gfx90a__) // On gfx90a, if we do not have the __hip_atomic_fetch_add builtin, we need to // force a CAS loop here. 
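  // The loop below snapshots the current value, then tries to publish
  // old_val + value with a compare-exchange. On failure the compare-exchange
  // rewrites `expected` with the value actually observed, so the next
  // iteration retries against fresh data; the loop exits once the bit
  // pattern it started from (`temp`) matches the one it ended with.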
double old_val; #if __has_builtin(__hip_atomic_load) old_val = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #else // !__has_builtin(__hip_atomic_load) old_val = __longlong_as_double(__atomic_load_n(reinterpret_cast(addr), __ATOMIC_RELAXED)); #endif // __has_builtin(__hip_atomic_load) double expected, temp; do { temp = expected = old_val; #if __has_builtin(__hip_atomic_compare_exchange_strong) __hip_atomic_compare_exchange_strong(addr, &expected, old_val + value, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #else // !__has_builtin(__hip_atomic_compare_exchange_strong) __atomic_compare_exchange_n(addr, &expected, old_val + value, false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); #endif // __has_builtin(__hip_atomic_compare_exchange_strong) old_val = expected; } while (__double_as_longlong(temp) != __double_as_longlong(old_val)); return old_val; #else // !defined(__gfx90a__) #if __has_builtin(__hip_atomic_fetch_add) return __hip_atomic_fetch_add(addr, value, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); #else // !__has_builtin(__hip_atomic_fetch_add) return __atomic_fetch_add(addr, value, __ATOMIC_RELAXED); #endif // __has_builtin(__hip_atomic_fetch_add) #endif } /** * @brief Safe double precision rmw atomic max. * * Performs a relaxed read-modify-write double precision atomic max with * device memory scope. Original value at \p addr is returned and * the value of \p addr is updated with \p val if greater. * * @note This operation ensures that, on all targets, we produce safe atomics. * This will be the case even when -munsafe-fp-atomics is passed into the compiler. * * @param [in,out] addr Pointer to value to be updated. * @param [in] val Value used to updated the contents at \p addr * @return Original value contained at \p addr. */ __device__ inline double safeAtomicMax(double* addr, double val) { #if __has_builtin(__builtin_amdgcn_is_private) if (__builtin_amdgcn_is_private( (const __attribute__((address_space(0))) void*)addr)) { double old = *addr; *addr = __builtin_fmax(old, val); return old; } else { #endif #if __has_builtin(__hip_atomic_load) && \ __has_builtin(__hip_atomic_compare_exchange_strong) double value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value < val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned long long *uaddr = (unsigned long long *)addr; unsigned long long value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __longlong_as_double(value) < val) { done = __atomic_compare_exchange_n(uaddr, &value, __double_as_longlong(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __longlong_as_double(value); #endif #if __has_builtin(__builtin_amdgcn_is_private) } #endif } /** * @brief Safe double precision rmw atomic min. * * Performs a relaxed read-modify-write double precision atomic min with * device memory scope. Original value at \p addr is returned and * the value of \p addr is updated with \p val if lesser. * * @note This operation ensures that, on all targets, we produce safe atomics. * This will be the case even when -munsafe-fp-atomics is passed into the compiler. * * @param [in,out] addr Pointer to value to be updated. * @param [in] val Value used to updated the contents at \p addr * @return Original value contained at \p addr. 
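 *
 * @par Example
 * A minimal sketch (editorial addition; assumes \p result was initialized
 * to a large sentinel such as DBL_MAX, and the kernel name is illustrative):
 * @code
 * __global__ void track_min(const double* in, double* result, int n) {
 *     int i = blockIdx.x * blockDim.x + threadIdx.x;
 *     if (i < n) safeAtomicMin(result, in[i]);
 * }
 * @endcode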
*/ __device__ inline double safeAtomicMin(double* addr, double val) { #if __has_builtin(__builtin_amdgcn_is_private) if (__builtin_amdgcn_is_private( (const __attribute__((address_space(0))) void*)addr)) { double old = *addr; *addr = __builtin_fmin(old, val); return old; } else { #endif #if __has_builtin(__hip_atomic_load) && \ __has_builtin(__hip_atomic_compare_exchange_strong) double value = __hip_atomic_load(addr, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); bool done = false; while (!done && value > val) { done = __hip_atomic_compare_exchange_strong(addr, &value, val, __ATOMIC_RELAXED, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); } return value; #else unsigned long long *uaddr = (unsigned long long *)addr; unsigned long long value = __atomic_load_n(uaddr, __ATOMIC_RELAXED); bool done = false; while (!done && __longlong_as_double(value) > val) { done = __atomic_compare_exchange_n(uaddr, &value, __double_as_longlong(val), false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); } return __longlong_as_double(value); #endif #if __has_builtin(__builtin_amdgcn_is_private) } #endif } #if defined(__clang__) #pragma clang diagnostic pop #endif #endif clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_hip_vector_types.h000066400000000000000000001546111450307266000253650ustar00rootroot00000000000000/* Copyright (c) 2015 - 2022 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ /** * @file amd_detail/hip_vector_types.h * @brief Defines the different newt vector types for HIP runtime. 
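 *
 * Illustrative usage sketch (editorial addition; the fixed-width aliases
 * such as float4 are declared later in this header in terms of
 * HIP_vector_type):
 * @code
 * HIP_vector_type<float, 4> a{1.0f, 2.0f, 3.0f, 4.0f};
 * HIP_vector_type<float, 4> b{0.5f};  // broadcast constructor
 * auto c = a + b;                     // element-wise operator+
 * c *= 2.0f;                          // scalar overloads convert via HIP_vector_type{x}
 * @endcode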
*/ #ifndef HIP_INCLUDE_HIP_AMD_DETAIL_HIP_VECTOR_TYPES_H #define HIP_INCLUDE_HIP_AMD_DETAIL_HIP_VECTOR_TYPES_H #include "hip/amd_detail/host_defines.h" #if defined(__HIPCC_RTC__) #define __HOST_DEVICE__ __device__ #else #define __HOST_DEVICE__ __host__ __device__ #endif #if defined(__has_attribute) #if __has_attribute(ext_vector_type) #define __NATIVE_VECTOR__(n, T) T __attribute__((ext_vector_type(n))) #else #define __NATIVE_VECTOR__(n, T) T[n] #endif #if defined(__cplusplus) #if !defined(__HIPCC_RTC__) #include #include #include #else namespace std { using ::size_t; template struct integral_constant { static constexpr const _Tp value = __v; typedef _Tp value_type; typedef integral_constant type; constexpr operator value_type() const { return value; } constexpr value_type operator()() const { return value; } }; template constexpr const _Tp integral_constant<_Tp, __v>::value; typedef integral_constant true_type; typedef integral_constant false_type; template using bool_constant = integral_constant; typedef bool_constant true_type; typedef bool_constant false_type; template struct enable_if {}; template struct enable_if { typedef __T type; }; template struct true_or_false_type : public false_type {}; template<> struct true_or_false_type : public true_type {}; template struct is_integral : public false_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template <> struct is_integral : public true_type {}; template struct is_arithmetic : public false_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template <> struct is_arithmetic : public true_type {}; template struct is_floating_point : public false_type {}; template<> struct is_floating_point : public true_type {}; template<> struct is_floating_point : public true_type {}; template<> struct is_floating_point : public true_type {}; template struct is_same : public false_type {}; template struct is_same<__T, __T> : public true_type {}; template::value> struct is_signed : public false_type {}; template struct is_signed<_Tp, true> : public true_or_false_type<_Tp(-1) < _Tp(0)> {}; template struct is_convertible : public true_or_false_type<__is_convertible_to(_T1, _T2)> {}; template struct char_traits; template> class 
basic_istream; template> class basic_ostream; typedef basic_istream istream; typedef basic_ostream ostream; template struct is_scalar : public integral_constant {}; } // Namespace std. #endif // defined(__HIPCC_RTC__) namespace hip_impl { inline constexpr unsigned int next_pot(unsigned int x) { // Precondition: x > 1. return 1u << (32u - __builtin_clz(x - 1u)); } } // Namespace hip_impl. template struct HIP_vector_base; template struct HIP_vector_base { using Native_vec_ = __NATIVE_VECTOR__(1, T); union { Native_vec_ data; struct { T x; }; }; using value_type = T; __HOST_DEVICE__ HIP_vector_base() = default; __HOST_DEVICE__ explicit constexpr HIP_vector_base(T x_) noexcept : data{x_} {} __HOST_DEVICE__ constexpr HIP_vector_base(const HIP_vector_base&) = default; __HOST_DEVICE__ constexpr HIP_vector_base(HIP_vector_base&&) = default; __HOST_DEVICE__ ~HIP_vector_base() = default; __HOST_DEVICE__ HIP_vector_base& operator=(const HIP_vector_base&) = default; }; template struct HIP_vector_base { using Native_vec_ = __NATIVE_VECTOR__(2, T); union #if !__has_attribute(ext_vector_type) alignas(hip_impl::next_pot(2 * sizeof(T))) #endif { Native_vec_ data; struct { T x; T y; }; }; using value_type = T; __HOST_DEVICE__ HIP_vector_base() = default; __HOST_DEVICE__ explicit constexpr HIP_vector_base(T x_) noexcept : data{x_, x_} {} __HOST_DEVICE__ constexpr HIP_vector_base(T x_, T y_) noexcept : data{x_, y_} {} __HOST_DEVICE__ constexpr HIP_vector_base(const HIP_vector_base&) = default; __HOST_DEVICE__ constexpr HIP_vector_base(HIP_vector_base&&) = default; __HOST_DEVICE__ ~HIP_vector_base() = default; __HOST_DEVICE__ HIP_vector_base& operator=(const HIP_vector_base&) = default; }; template struct HIP_vector_base { struct Native_vec_ { T d[3]; __HOST_DEVICE__ Native_vec_() = default; __HOST_DEVICE__ explicit constexpr Native_vec_(T x_) noexcept : d{x_, x_, x_} {} __HOST_DEVICE__ constexpr Native_vec_(T x_, T y_, T z_) noexcept : d{x_, y_, z_} {} __HOST_DEVICE__ constexpr Native_vec_(const Native_vec_&) = default; __HOST_DEVICE__ constexpr Native_vec_(Native_vec_&&) = default; __HOST_DEVICE__ ~Native_vec_() = default; __HOST_DEVICE__ Native_vec_& operator=(const Native_vec_&) = default; __HOST_DEVICE__ Native_vec_& operator=(Native_vec_&&) = default; __HOST_DEVICE__ T& operator[](unsigned int idx) noexcept { return d[idx]; } __HOST_DEVICE__ T operator[](unsigned int idx) const noexcept { return d[idx]; } __HOST_DEVICE__ Native_vec_& operator+=(const Native_vec_& x_) noexcept { for (auto i = 0u; i != 3u; ++i) d[i] += x_.d[i]; return *this; } __HOST_DEVICE__ Native_vec_& operator-=(const Native_vec_& x_) noexcept { for (auto i = 0u; i != 3u; ++i) d[i] -= x_.d[i]; return *this; } __HOST_DEVICE__ Native_vec_& operator*=(const Native_vec_& x_) noexcept { for (auto i = 0u; i != 3u; ++i) d[i] *= x_.d[i]; return *this; } __HOST_DEVICE__ Native_vec_& operator/=(const Native_vec_& x_) noexcept { for (auto i = 0u; i != 3u; ++i) d[i] /= x_.d[i]; return *this; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ Native_vec_ operator-() const noexcept { auto r{*this}; for (auto&& x : r.d) x = -x; return r; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ Native_vec_ operator~() const noexcept { auto r{*this}; for (auto&& x : r.d) x = ~x; return r; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ Native_vec_& operator%=(const Native_vec_& x_) noexcept { for (auto i = 0u; i != 3u; ++i) d[i] %= x_.d[i]; 
return *this; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ Native_vec_& operator^=(const Native_vec_& x_) noexcept { for (auto i = 0u; i != 3u; ++i) d[i] ^= x_.d[i]; return *this; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ Native_vec_& operator|=(const Native_vec_& x_) noexcept { for (auto i = 0u; i != 3u; ++i) d[i] |= x_.d[i]; return *this; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ Native_vec_& operator&=(const Native_vec_& x_) noexcept { for (auto i = 0u; i != 3u; ++i) d[i] &= x_.d[i]; return *this; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ Native_vec_& operator>>=(const Native_vec_& x_) noexcept { for (auto i = 0u; i != 3u; ++i) d[i] >>= x_.d[i]; return *this; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ Native_vec_& operator<<=(const Native_vec_& x_) noexcept { for (auto i = 0u; i != 3u; ++i) d[i] <<= x_.d[i]; return *this; } #if defined (__INTEL_COMPILER) typedef struct { int values[4]; } _Vec3_cmp; using Vec3_cmp = _Vec3_cmp; #else using Vec3_cmp = int __attribute__((vector_size(4 * sizeof(int)))); #endif //INTEL __HOST_DEVICE__ Vec3_cmp operator==(const Native_vec_& x_) const noexcept { return Vec3_cmp{d[0] == x_.d[0], d[1] == x_.d[1], d[2] == x_.d[2]}; } }; union { Native_vec_ data; struct { T x; T y; T z; }; }; using value_type = T; __HOST_DEVICE__ HIP_vector_base() = default; __HOST_DEVICE__ explicit constexpr HIP_vector_base(T x_) noexcept : data{x_, x_, x_} {} __HOST_DEVICE__ constexpr HIP_vector_base(T x_, T y_, T z_) noexcept : data{x_, y_, z_} {} __HOST_DEVICE__ constexpr HIP_vector_base(const HIP_vector_base&) = default; __HOST_DEVICE__ constexpr HIP_vector_base(HIP_vector_base&&) = default; __HOST_DEVICE__ ~HIP_vector_base() = default; __HOST_DEVICE__ HIP_vector_base& operator=(const HIP_vector_base&) = default; __HOST_DEVICE__ HIP_vector_base& operator=(HIP_vector_base&&) = default; }; template struct HIP_vector_base { using Native_vec_ = __NATIVE_VECTOR__(4, T); union #if !__has_attribute(ext_vector_type) alignas(hip_impl::next_pot(4 * sizeof(T))) #endif { Native_vec_ data; struct { T x; T y; T z; T w; }; }; using value_type = T; __HOST_DEVICE__ HIP_vector_base() = default; __HOST_DEVICE__ explicit constexpr HIP_vector_base(T x_) noexcept : data{x_, x_, x_, x_} {} __HOST_DEVICE__ constexpr HIP_vector_base(T x_, T y_, T z_, T w_) noexcept : data{x_, y_, z_, w_} {} __HOST_DEVICE__ constexpr HIP_vector_base(const HIP_vector_base&) = default; __HOST_DEVICE__ constexpr HIP_vector_base(HIP_vector_base&&) = default; __HOST_DEVICE__ ~HIP_vector_base() = default; __HOST_DEVICE__ HIP_vector_base& operator=(const HIP_vector_base&) = default; }; template struct HIP_vector_type : public HIP_vector_base { using HIP_vector_base::data; using typename HIP_vector_base::Native_vec_; __HOST_DEVICE__ HIP_vector_type() = default; template< typename U, typename std::enable_if< std::is_convertible::value>::type* = nullptr> __HOST_DEVICE__ explicit constexpr HIP_vector_type(U x_) noexcept : HIP_vector_base{static_cast(x_)} {} template< // TODO: constrain based on type as well. typename... Us, typename std::enable_if< (rank > 1) && sizeof...(Us) == rank>::type* = nullptr> __HOST_DEVICE__ constexpr HIP_vector_type(Us... 
xs) noexcept : HIP_vector_base{static_cast(xs)...} {} __HOST_DEVICE__ constexpr HIP_vector_type(const HIP_vector_type&) = default; __HOST_DEVICE__ constexpr HIP_vector_type(HIP_vector_type&&) = default; __HOST_DEVICE__ ~HIP_vector_type() = default; __HOST_DEVICE__ HIP_vector_type& operator=(const HIP_vector_type&) = default; __HOST_DEVICE__ HIP_vector_type& operator=(HIP_vector_type&&) = default; // Operators __HOST_DEVICE__ HIP_vector_type& operator++() noexcept { return *this += HIP_vector_type{1}; } __HOST_DEVICE__ HIP_vector_type operator++(int) noexcept { auto tmp(*this); ++*this; return tmp; } __HOST_DEVICE__ HIP_vector_type& operator--() noexcept { return *this -= HIP_vector_type{1}; } __HOST_DEVICE__ HIP_vector_type operator--(int) noexcept { auto tmp(*this); --*this; return tmp; } __HOST_DEVICE__ HIP_vector_type& operator+=(const HIP_vector_type& x) noexcept { data += x.data; return *this; } template< typename U, typename std::enable_if< std::is_convertible{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type& operator+=(U x) noexcept { return *this += HIP_vector_type{x}; } __HOST_DEVICE__ HIP_vector_type& operator-=(const HIP_vector_type& x) noexcept { data -= x.data; return *this; } template< typename U, typename std::enable_if< std::is_convertible{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type& operator-=(U x) noexcept { return *this -= HIP_vector_type{x}; } __HOST_DEVICE__ HIP_vector_type& operator*=(const HIP_vector_type& x) noexcept { data *= x.data; return *this; } friend __HOST_DEVICE__ inline constexpr HIP_vector_type operator*( HIP_vector_type x, const HIP_vector_type& y) noexcept { return HIP_vector_type{ x } *= y; } template< typename U, typename std::enable_if< std::is_convertible{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type& operator*=(U x) noexcept { return *this *= HIP_vector_type{x}; } friend __HOST_DEVICE__ inline constexpr HIP_vector_type operator/( HIP_vector_type x, const HIP_vector_type& y) noexcept { return HIP_vector_type{ x } /= y; } __HOST_DEVICE__ HIP_vector_type& operator/=(const HIP_vector_type& x) noexcept { data /= x.data; return *this; } template< typename U, typename std::enable_if< std::is_convertible{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type& operator/=(U x) noexcept { return *this /= HIP_vector_type{x}; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type operator-() const noexcept { auto tmp(*this); tmp.data = -tmp.data; return tmp; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type operator~() const noexcept { HIP_vector_type r{*this}; r.data = ~r.data; return r; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type& operator%=(const HIP_vector_type& x) noexcept { data %= x.data; return *this; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type& operator^=(const HIP_vector_type& x) noexcept { data ^= x.data; return *this; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type& operator|=(const HIP_vector_type& x) noexcept { data |= x.data; return *this; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type& operator&=(const HIP_vector_type& x) noexcept { data &= x.data; return *this; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type& operator>>=(const 
HIP_vector_type& x) noexcept { data >>= x.data; return *this; } template< typename U = T, typename std::enable_if{}>::type* = nullptr> __HOST_DEVICE__ HIP_vector_type& operator<<=(const HIP_vector_type& x) noexcept { data <<= x.data; return *this; } }; template __HOST_DEVICE__ inline constexpr HIP_vector_type operator+( const HIP_vector_type& x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} += y; } template __HOST_DEVICE__ inline constexpr HIP_vector_type operator+( const HIP_vector_type& x, U y) noexcept { return HIP_vector_type{x} += HIP_vector_type{y}; } template __HOST_DEVICE__ inline constexpr HIP_vector_type operator+( U x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} += y; } template __HOST_DEVICE__ inline constexpr HIP_vector_type operator-( const HIP_vector_type& x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} -= y; } template __HOST_DEVICE__ inline constexpr HIP_vector_type operator-( const HIP_vector_type& x, U y) noexcept { return HIP_vector_type{x} -= HIP_vector_type{y}; } template __HOST_DEVICE__ inline constexpr HIP_vector_type operator-( U x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} -= y; } template __HOST_DEVICE__ inline constexpr HIP_vector_type operator*( const HIP_vector_type& x, U y) noexcept { return HIP_vector_type{x} *= HIP_vector_type{y}; } template __HOST_DEVICE__ inline constexpr HIP_vector_type operator*( U x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} *= y; } template __HOST_DEVICE__ inline constexpr HIP_vector_type operator/( const HIP_vector_type& x, U y) noexcept { return HIP_vector_type{x} /= HIP_vector_type{y}; } template __HOST_DEVICE__ inline constexpr HIP_vector_type operator/( U x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} /= y; } template __HOST_DEVICE__ inline constexpr bool _hip_any_zero(const V& x, int n) noexcept { return (n == -1) ? true : ((x[n] == 0) ? 
false : _hip_any_zero(x, n - 1)); } template __HOST_DEVICE__ inline constexpr bool operator==( const HIP_vector_type& x, const HIP_vector_type& y) noexcept { return _hip_any_zero(x.data == y.data, n - 1); } template __HOST_DEVICE__ inline constexpr bool operator==(const HIP_vector_type& x, U y) noexcept { return x == HIP_vector_type{y}; } template __HOST_DEVICE__ inline constexpr bool operator==(U x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} == y; } template __HOST_DEVICE__ inline constexpr bool operator!=( const HIP_vector_type& x, const HIP_vector_type& y) noexcept { return !(x == y); } template __HOST_DEVICE__ inline constexpr bool operator!=(const HIP_vector_type& x, U y) noexcept { return !(x == y); } template __HOST_DEVICE__ inline constexpr bool operator!=(U x, const HIP_vector_type& y) noexcept { return !(x == y); } template< typename T, unsigned int n, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator%( const HIP_vector_type& x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} %= y; } template< typename T, unsigned int n, typename U, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator%( const HIP_vector_type& x, U y) noexcept { return HIP_vector_type{x} %= HIP_vector_type{y}; } template< typename T, unsigned int n, typename U, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator%( U x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} %= y; } template< typename T, unsigned int n, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator^( const HIP_vector_type& x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} ^= y; } template< typename T, unsigned int n, typename U, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator^( const HIP_vector_type& x, U y) noexcept { return HIP_vector_type{x} ^= HIP_vector_type{y}; } template< typename T, unsigned int n, typename U, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator^( U x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} ^= y; } template< typename T, unsigned int n, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator|( const HIP_vector_type& x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} |= y; } template< typename T, unsigned int n, typename U, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator|( const HIP_vector_type& x, U y) noexcept { return HIP_vector_type{x} |= HIP_vector_type{y}; } template< typename T, unsigned int n, typename U, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator|( U x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} |= y; } template< typename T, unsigned int n, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator&( const HIP_vector_type& x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} &= y; } template< typename T, unsigned int n, typename U, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator&( const HIP_vector_type& x, U y) noexcept { return HIP_vector_type{x} &= HIP_vector_type{y}; } template< typename T, unsigned int n, typename U, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline 
constexpr HIP_vector_type operator&( U x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} &= y; } template< typename T, unsigned int n, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator>>( const HIP_vector_type& x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} >>= y; } template< typename T, unsigned int n, typename U, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator>>( const HIP_vector_type& x, U y) noexcept { return HIP_vector_type{x} >>= HIP_vector_type{y}; } template< typename T, unsigned int n, typename U, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator>>( U x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} >>= y; } template< typename T, unsigned int n, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator<<( const HIP_vector_type& x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} <<= y; } template< typename T, unsigned int n, typename U, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator<<( const HIP_vector_type& x, U y) noexcept { return HIP_vector_type{x} <<= HIP_vector_type{y}; } template< typename T, unsigned int n, typename U, typename std::enable_if::value>::type, typename std::enable_if{}>* = nullptr> __HOST_DEVICE__ inline constexpr HIP_vector_type operator<<( U x, const HIP_vector_type& y) noexcept { return HIP_vector_type{x} <<= y; } /* * Map HIP_vector_type to HIP_vector_type */ template __forceinline__ __HOST_DEVICE__ typename std::enable_if<(rankT == 1 && rankU >= 1), const HIP_vector_type>::type __hipMapVector(const HIP_vector_type& u) { return HIP_vector_type(static_cast(u.x)); }; template __forceinline__ __HOST_DEVICE__ typename std::enable_if<(rankT == 2 && rankU == 1), const HIP_vector_type>::type __hipMapVector(const HIP_vector_type& u) { return HIP_vector_type (static_cast(u.x), static_cast(0)); }; template __forceinline__ __HOST_DEVICE__ typename std::enable_if<(rankT == 2 && rankU >= 2), const HIP_vector_type>::type __hipMapVector(const HIP_vector_type& u) { return HIP_vector_type (static_cast(u.x), static_cast(u.y)); }; template __forceinline__ __HOST_DEVICE__ typename std::enable_if<(rankT == 4 && rankU == 1), const HIP_vector_type>::type __hipMapVector(const HIP_vector_type& u) { return HIP_vector_type (static_cast(u.x), static_cast(0), static_cast(0), static_cast(0)); }; template __forceinline__ __HOST_DEVICE__ typename std::enable_if<(rankT == 4 && rankU == 2), const HIP_vector_type>::type __hipMapVector(const HIP_vector_type& u) { return HIP_vector_type(static_cast(u.x), static_cast(u.y), static_cast(0), static_cast(0)); }; template __forceinline__ __HOST_DEVICE__ typename std::enable_if<(rankT == 4 && rankU == 4), const HIP_vector_type>::type __hipMapVector(const HIP_vector_type& u) { return HIP_vector_type (static_cast(u.x), static_cast(u.y), static_cast(u.z), static_cast(u.w)); }; #define __MAKE_VECTOR_TYPE__(CUDA_name, T) \ using CUDA_name##1 = HIP_vector_type;\ using CUDA_name##2 = HIP_vector_type;\ using CUDA_name##3 = HIP_vector_type;\ using CUDA_name##4 = HIP_vector_type; #else #define __MAKE_VECTOR_TYPE__(CUDA_name, T) \ typedef struct {\ T x;\ } CUDA_name##1;\ typedef struct {\ T x;\ T y;\ } CUDA_name##2;\ typedef struct {\ T x;\ T y;\ T z;\ } CUDA_name##3;\ typedef struct {\ T x;\ T y;\ T z;\ T w;\ } CUDA_name##4; #endif __MAKE_VECTOR_TYPE__(uchar, 
unsigned char); __MAKE_VECTOR_TYPE__(char, char); __MAKE_VECTOR_TYPE__(ushort, unsigned short); __MAKE_VECTOR_TYPE__(short, short); __MAKE_VECTOR_TYPE__(uint, unsigned int); __MAKE_VECTOR_TYPE__(int, int); __MAKE_VECTOR_TYPE__(ulong, unsigned long); __MAKE_VECTOR_TYPE__(long, long); __MAKE_VECTOR_TYPE__(ulonglong, unsigned long long); __MAKE_VECTOR_TYPE__(longlong, long long); __MAKE_VECTOR_TYPE__(float, float); __MAKE_VECTOR_TYPE__(double, double); #else // !defined(__has_attribute) #if defined(_MSC_VER) #include #include #include #include /* this is for compatibility with CUDA as CUDA allows accessing vector components in C++ program with MSVC */ typedef union { struct { char x; }; char data; } char1; typedef union { struct { char x; char y; }; char data[2]; } char2; typedef union { struct { char x; char y; char z; char w; }; char data[4]; } char4; typedef union { struct { char x; char y; char z; }; char data[3]; } char3; typedef union { __m64 data; } char8; typedef union { __m128i data; } char16; typedef union { struct { unsigned char x; }; unsigned char data; } uchar1; typedef union { struct { unsigned char x; unsigned char y; }; unsigned char data[2]; } uchar2; typedef union { struct { unsigned char x; unsigned char y; unsigned char z; unsigned char w; }; unsigned char data[4]; } uchar4; typedef union { struct { unsigned char x; unsigned char y; unsigned char z; }; unsigned char data[3]; } uchar3; typedef union { __m64 data; } uchar8; typedef union { __m128i data; } uchar16; typedef union { struct { short x; }; short data; } short1; typedef union { struct { short x; short y; }; short data[2]; } short2; typedef union { struct { short x; short y; short z; short w; }; __m64 data; } short4; typedef union { struct { short x; short y; short z; }; short data[3]; } short3; typedef union { __m128i data; } short8; typedef union { __m128i data[2]; } short16; typedef union { struct { unsigned short x; }; unsigned short data; } ushort1; typedef union { struct { unsigned short x; unsigned short y; }; unsigned short data[2]; } ushort2; typedef union { struct { unsigned short x; unsigned short y; unsigned short z; unsigned short w; }; __m64 data; } ushort4; typedef union { struct { unsigned short x; unsigned short y; unsigned short z; }; unsigned short data[3]; } ushort3; typedef union { __m128i data; } ushort8; typedef union { __m128i data[2]; } ushort16; typedef union { struct { int x; }; int data; } int1; typedef union { struct { int x; int y; }; __m64 data; } int2; typedef union { struct { int x; int y; int z; int w; }; __m128i data; } int4; typedef union { struct { int x; int y; int z; }; int data[3]; } int3; typedef union { __m128i data[2]; } int8; typedef union { __m128i data[4]; } int16; typedef union { struct { unsigned int x; }; unsigned int data; } uint1; typedef union { struct { unsigned int x; unsigned int y; }; __m64 data; } uint2; typedef union { struct { unsigned int x; unsigned int y; unsigned int z; unsigned int w; }; __m128i data; } uint4; typedef union { struct { unsigned int x; unsigned int y; unsigned int z; }; unsigned int data[3]; } uint3; typedef union { __m128i data[2]; } uint8; typedef union { __m128i data[4]; } uint16; typedef union { struct { int x; }; int data; } long1; typedef union { struct { int x; int y; }; __m64 data; } long2; typedef union { struct { int x; int y; int z; int w; }; __m128i data; } long4; typedef union { struct { int x; int y; int z; }; int data[3]; } long3; typedef union { __m128i data[2]; } long8; typedef union { __m128i data[4]; } long16; typedef 
union { struct { unsigned int x; }; unsigned int data; } ulong1; typedef union { struct { unsigned int x; unsigned int y; }; __m64 data; } ulong2; typedef union { struct { unsigned int x; unsigned int y; unsigned int z; unsigned int w; }; __m128i data; } ulong4; typedef union { struct { unsigned int x; unsigned int y; unsigned int z; }; unsigned int data[3]; } ulong3; typedef union { __m128i data[2]; } ulong8; typedef union { __m128i data[4]; } ulong16; typedef union { struct { long long x; }; __m64 data; } longlong1; typedef union { struct { long long x; long long y; }; __m128i data; } longlong2; typedef union { struct { long long x; long long y; long long z; long long w; }; __m128i data[2]; } longlong4; typedef union { struct { long long x; long long y; long long z; }; __m64 data[3]; } longlong3; typedef union { __m128i data[4]; } longlong8; typedef union { __m128i data[8]; } longlong16; typedef union { struct { __m64 x; }; __m64 data; } ulonglong1; typedef union { struct { __m64 x; __m64 y; }; __m128i data; } ulonglong2; typedef union { struct { __m64 x; __m64 y; __m64 z; __m64 w; }; __m128i data[2]; } ulonglong4; typedef union { struct { __m64 x; __m64 y; __m64 z; }; __m64 data[3]; } ulonglong3; typedef union { __m128i data[4]; } ulonglong8; typedef union { __m128i data[8]; } ulonglong16; typedef union { struct { float x; }; float data; } float1; typedef union { struct { float x; float y; }; __m64 data; } float2; typedef union { struct { float x; float y; float z; float w; }; __m128 data; } float4; typedef union { struct { float x; float y; float z; }; float data[3]; } float3; typedef union { __m256 data; } float8; typedef union { __m256 data[2]; } float16; typedef union { struct { double x; }; double data; } double1; typedef union { struct { double x; double y; }; __m128d data; } double2; typedef union { struct { double x; double y; double z; double w; }; __m256d data; } double4; typedef union { struct { double x; double y; double z; }; double data[3]; } double3; typedef union { __m256d data[2]; } double8; typedef union { __m256d data[4]; } double16; #else // !defined(_MSC_VER) /* this is for compatibility with CUDA as CUDA allows accessing vector components in C++ program with MSVC */ typedef union { struct { char x; }; char data; } char1; typedef union { struct { char x; char y; }; char data[2]; } char2; typedef union { struct { char x; char y; char z; char w; }; char data[4]; } char4; typedef union { char data[8]; } char8; typedef union { char data[16]; } char16; typedef union { struct { char x; char y; char z; }; char data[3]; } char3; typedef union { struct { unsigned char x; }; unsigned char data; } uchar1; typedef union { struct { unsigned char x; unsigned char y; }; unsigned char data[2]; } uchar2; typedef union { struct { unsigned char x; unsigned char y; unsigned char z; unsigned char w; }; unsigned char data[4]; } uchar4; typedef union { unsigned char data[8]; } uchar8; typedef union { unsigned char data[16]; } uchar16; typedef union { struct { unsigned char x; unsigned char y; unsigned char z; }; unsigned char data[3]; } uchar3; typedef union { struct { short x; }; short data; } short1; typedef union { struct { short x; short y; }; short data[2]; } short2; typedef union { struct { short x; short y; short z; short w; }; short data[4]; } short4; typedef union { short data[8]; } short8; typedef union { short data[16]; } short16; typedef union { struct { short x; short y; short z; }; short data[3]; } short3; typedef union { struct { unsigned short x; }; unsigned short data; } 
ushort1; typedef union { struct { unsigned short x; unsigned short y; }; unsigned short data[2]; } ushort2; typedef union { struct { unsigned short x; unsigned short y; unsigned short z; unsigned short w; }; unsigned short data[4]; } ushort4; typedef union { unsigned short data[8]; } ushort8; typedef union { unsigned short data[16]; } ushort16; typedef union { struct { unsigned short x; unsigned short y; unsigned short z; }; unsigned short data[3]; } ushort3; typedef union { struct { int x; }; int data; } int1; typedef union { struct { int x; int y; }; int data[2]; } int2; typedef union { struct { int x; int y; int z; int w; }; int data[4]; } int4; typedef union { int data[8]; } int8; typedef union { int data[16]; } int16; typedef union { struct { int x; int y; int z; }; int data[3]; } int3; typedef union { struct { unsigned int x; }; unsigned int data; } uint1; typedef union { struct { unsigned int x; unsigned int y; }; unsigned int data[2]; } uint2; typedef union { struct { unsigned int x; unsigned int y; unsigned int z; unsigned int w; }; unsigned int data[4]; } uint4; typedef union { unsigned int data[8]; } uint8; typedef union { unsigned int data[16]; } uint16; typedef union { struct { unsigned int x; unsigned int y; unsigned int z; }; unsigned int data[3]; } uint3; typedef union { struct { long x; }; long data; } long1; typedef union { struct { long x; long y; }; long data[2]; } long2; typedef union { struct { long x; long y; long z; long w; }; long data[4]; } long4; typedef union { long data[8]; } long8; typedef union { long data[16]; } long16; typedef union { struct { long x; long y; long z; }; long data[3]; } long3; typedef union { struct { unsigned long x; }; unsigned long data; } ulong1; typedef union { struct { unsigned long x; unsigned long y; }; unsigned long data[2]; } ulong2; typedef union { struct { unsigned long x; unsigned long y; unsigned long z; unsigned long w; }; unsigned long data[4]; } ulong4; typedef union { unsigned long data[8]; } ulong8; typedef union { unsigned long data[16]; } ulong16; typedef union { struct { unsigned long x; unsigned long y; unsigned long z; }; unsigned long data[3]; } ulong3; typedef union { struct { long long x; }; long long data; } longlong1; typedef union { struct { long long x; long long y; }; long long data[2]; } longlong2; typedef union { struct { long long x; long long y; long long z; long long w; }; long long data[4]; } longlong4; typedef union { long long data[8]; } longlong8; typedef union { long long data[16]; } longlong16; typedef union { struct { long long x; long long y; long long z; }; long long data[3]; } longlong3; typedef union { struct { unsigned long long x; }; unsigned long long data; } ulonglong1; typedef union { struct { unsigned long long x; unsigned long long y; }; unsigned long long data[2]; } ulonglong2; typedef union { struct { unsigned long long x; unsigned long long y; unsigned long long z; unsigned long long w; }; unsigned long long data[4]; } ulonglong4; typedef union { unsigned long long data[8]; } ulonglong8; typedef union { unsigned long long data[16]; } ulonglong16; typedef union { struct { unsigned long long x; unsigned long long y; unsigned long long z; }; unsigned long long data[3]; } ulonglong3; typedef union { struct { float x; }; float data; } float1; typedef union { struct { float x; float y; }; float data[2]; } float2; typedef union { struct { float x; float y; float z; float w; }; float data[4]; } float4; typedef union { float data[8]; } float8; typedef union { float data[16]; } float16; typedef 
union { struct { float x; float y; float z; }; float data[3]; } float3; typedef union { struct { double x; }; double data; } double1; typedef union { struct { double x; double y; }; double data[2]; } double2; typedef union { struct { double x; double y; double z; double w; }; double data[4]; } double4; typedef union { double data[8]; } double8; typedef union { double data[16]; } double16; typedef union { struct { double x; double y; double z; }; double data[3]; } double3; #endif // defined(_MSC_VER) #endif // defined(__has_attribute) #ifdef __cplusplus #define DECLOP_MAKE_ONE_COMPONENT(comp, type) \ static inline __HOST_DEVICE__ type make_##type(comp x) { \ type r{x}; \ return r; \ } #define DECLOP_MAKE_TWO_COMPONENT(comp, type) \ static inline __HOST_DEVICE__ type make_##type(comp x, comp y) { \ type r{x, y}; \ return r; \ } #define DECLOP_MAKE_THREE_COMPONENT(comp, type) \ static inline __HOST_DEVICE__ type make_##type(comp x, comp y, comp z) { \ type r{x, y, z}; \ return r; \ } #define DECLOP_MAKE_FOUR_COMPONENT(comp, type) \ static inline __HOST_DEVICE__ type make_##type(comp x, comp y, comp z, comp w) { \ type r{x, y, z, w}; \ return r; \ } #else #define DECLOP_MAKE_ONE_COMPONENT(comp, type) \ static inline __HOST_DEVICE__ type make_##type(comp x) { \ type r; \ r.x = x; \ return r; \ } #define DECLOP_MAKE_TWO_COMPONENT(comp, type) \ static inline __HOST_DEVICE__ type make_##type(comp x, comp y) { \ type r; \ r.x = x; \ r.y = y; \ return r; \ } #define DECLOP_MAKE_THREE_COMPONENT(comp, type) \ static inline __HOST_DEVICE__ type make_##type(comp x, comp y, comp z) { \ type r; \ r.x = x; \ r.y = y; \ r.z = z; \ return r; \ } #define DECLOP_MAKE_FOUR_COMPONENT(comp, type) \ static inline __HOST_DEVICE__ type make_##type(comp x, comp y, comp z, comp w) { \ type r; \ r.x = x; \ r.y = y; \ r.z = z; \ r.w = w; \ return r; \ } #endif DECLOP_MAKE_ONE_COMPONENT(unsigned char, uchar1); DECLOP_MAKE_TWO_COMPONENT(unsigned char, uchar2); DECLOP_MAKE_THREE_COMPONENT(unsigned char, uchar3); DECLOP_MAKE_FOUR_COMPONENT(unsigned char, uchar4); DECLOP_MAKE_ONE_COMPONENT(signed char, char1); DECLOP_MAKE_TWO_COMPONENT(signed char, char2); DECLOP_MAKE_THREE_COMPONENT(signed char, char3); DECLOP_MAKE_FOUR_COMPONENT(signed char, char4); DECLOP_MAKE_ONE_COMPONENT(unsigned short, ushort1); DECLOP_MAKE_TWO_COMPONENT(unsigned short, ushort2); DECLOP_MAKE_THREE_COMPONENT(unsigned short, ushort3); DECLOP_MAKE_FOUR_COMPONENT(unsigned short, ushort4); DECLOP_MAKE_ONE_COMPONENT(signed short, short1); DECLOP_MAKE_TWO_COMPONENT(signed short, short2); DECLOP_MAKE_THREE_COMPONENT(signed short, short3); DECLOP_MAKE_FOUR_COMPONENT(signed short, short4); DECLOP_MAKE_ONE_COMPONENT(unsigned int, uint1); DECLOP_MAKE_TWO_COMPONENT(unsigned int, uint2); DECLOP_MAKE_THREE_COMPONENT(unsigned int, uint3); DECLOP_MAKE_FOUR_COMPONENT(unsigned int, uint4); DECLOP_MAKE_ONE_COMPONENT(signed int, int1); DECLOP_MAKE_TWO_COMPONENT(signed int, int2); DECLOP_MAKE_THREE_COMPONENT(signed int, int3); DECLOP_MAKE_FOUR_COMPONENT(signed int, int4); DECLOP_MAKE_ONE_COMPONENT(float, float1); DECLOP_MAKE_TWO_COMPONENT(float, float2); DECLOP_MAKE_THREE_COMPONENT(float, float3); DECLOP_MAKE_FOUR_COMPONENT(float, float4); DECLOP_MAKE_ONE_COMPONENT(double, double1); DECLOP_MAKE_TWO_COMPONENT(double, double2); DECLOP_MAKE_THREE_COMPONENT(double, double3); DECLOP_MAKE_FOUR_COMPONENT(double, double4); DECLOP_MAKE_ONE_COMPONENT(unsigned long, ulong1); DECLOP_MAKE_TWO_COMPONENT(unsigned long, ulong2); DECLOP_MAKE_THREE_COMPONENT(unsigned long, ulong3); 
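// A short usage sketch for the make_* helpers generated in this block -- the
// variable names and values below are illustrative only, not part of this
// header:
//   float4 v = make_float4(1.0f, 2.0f, 3.0f, 4.0f);
//   float3 p = make_float3(v.x, v.y, v.z);
//   uchar2 c = make_uchar2(255, 0);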
DECLOP_MAKE_FOUR_COMPONENT(unsigned long, ulong4); DECLOP_MAKE_ONE_COMPONENT(signed long, long1); DECLOP_MAKE_TWO_COMPONENT(signed long, long2); DECLOP_MAKE_THREE_COMPONENT(signed long, long3); DECLOP_MAKE_FOUR_COMPONENT(signed long, long4); DECLOP_MAKE_ONE_COMPONENT(unsigned long long, ulonglong1); DECLOP_MAKE_TWO_COMPONENT(unsigned long long, ulonglong2); DECLOP_MAKE_THREE_COMPONENT(unsigned long long, ulonglong3); DECLOP_MAKE_FOUR_COMPONENT(unsigned long long, ulonglong4); DECLOP_MAKE_ONE_COMPONENT(signed long long, longlong1); DECLOP_MAKE_TWO_COMPONENT(signed long long, longlong2); DECLOP_MAKE_THREE_COMPONENT(signed long long, longlong3); DECLOP_MAKE_FOUR_COMPONENT(signed long long, longlong4); #endif clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_math_functions.h000066400000000000000000001100541450307266000250110ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #include "hip_fp16_math_fwd.h" #include "amd_hip_vector_types.h" #include "math_fwd.h" #include #if !defined(__HIPCC_RTC__) #include // assert.h is only for the host version of assert. // The device version of assert is implemented in hip/amd_detail/hip_runtime.h. // Users should include hip_runtime.h for the device version of assert. 
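// For instance -- an illustrative sketch, not part of this header -- a kernel
// that uses the device-side assert only needs the umbrella header:
//   #include <hip/hip_runtime.h>
//   __global__ void checked_copy(const float* in, float* out, int n) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       assert(n >= 0);            // device-side assert
//       if (i < n) out[i] = in[i];
//   }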
#if !__HIP_DEVICE_COMPILE__ #include #endif #include #include #include #endif // !defined(__HIPCC_RTC__) #if _LIBCPP_VERSION && __HIP__ namespace std { template <> struct __numeric_type<_Float16> { static _Float16 __test(_Float16); typedef _Float16 type; static const bool value = true; }; } #endif // _LIBCPP_VERSION #pragma push_macro("__DEVICE__") #pragma push_macro("__RETURN_TYPE") #define __DEVICE__ static __device__ #define __RETURN_TYPE bool #if !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__ __DEVICE__ inline uint64_t __make_mantissa_base8(const char* tagp) { uint64_t r = 0; while (tagp) { char tmp = *tagp; if (tmp >= '0' && tmp <= '7') r = (r * 8u) + tmp - '0'; else return 0; ++tagp; } return r; } __DEVICE__ inline uint64_t __make_mantissa_base10(const char* tagp) { uint64_t r = 0; while (tagp) { char tmp = *tagp; if (tmp >= '0' && tmp <= '9') r = (r * 10u) + tmp - '0'; else return 0; ++tagp; } return r; } __DEVICE__ inline uint64_t __make_mantissa_base16(const char* tagp) { uint64_t r = 0; while (tagp) { char tmp = *tagp; if (tmp >= '0' && tmp <= '9') r = (r * 16u) + tmp - '0'; else if (tmp >= 'a' && tmp <= 'f') r = (r * 16u) + tmp - 'a' + 10; else if (tmp >= 'A' && tmp <= 'F') r = (r * 16u) + tmp - 'A' + 10; else return 0; ++tagp; } return r; } __DEVICE__ inline uint64_t __make_mantissa(const char* tagp) { if (!tagp) return 0u; if (*tagp == '0') { ++tagp; if (*tagp == 'x' || *tagp == 'X') return __make_mantissa_base16(tagp); else return __make_mantissa_base8(tagp); } return __make_mantissa_base10(tagp); } #endif // !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__ // DOT FUNCTIONS #if __HIP_CLANG_ONLY__ __DEVICE__ inline int amd_mixed_dot(short2 a, short2 b, int c, bool saturate) { return __ockl_sdot2(a.data, b.data, c, saturate); } __DEVICE__ inline uint amd_mixed_dot(ushort2 a, ushort2 b, uint c, bool saturate) { return __ockl_udot2(a.data, b.data, c, saturate); } __DEVICE__ inline int amd_mixed_dot(char4 a, char4 b, int c, bool saturate) { return __ockl_sdot4(a.data, b.data, c, saturate); } __DEVICE__ inline uint amd_mixed_dot(uchar4 a, uchar4 b, uint c, bool saturate) { return __ockl_udot4(a.data, b.data, c, saturate); } __DEVICE__ inline int amd_mixed_dot(int a, int b, int c, bool saturate) { return __ockl_sdot8(a, b, c, saturate); } __DEVICE__ inline uint amd_mixed_dot(uint a, uint b, uint c, bool saturate) { return __ockl_udot8(a, b, c, saturate); } #endif #if !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__ // BEGIN FLOAT __DEVICE__ inline float abs(float x) { return __ocml_fabs_f32(x); } __DEVICE__ inline float acosf(float x) { return __ocml_acos_f32(x); } __DEVICE__ inline float acoshf(float x) { return __ocml_acosh_f32(x); } __DEVICE__ inline float asinf(float x) { return __ocml_asin_f32(x); } __DEVICE__ inline float asinhf(float x) { return __ocml_asinh_f32(x); } __DEVICE__ inline float atan2f(float x, float y) { return __ocml_atan2_f32(x, y); } __DEVICE__ inline float atanf(float x) { return __ocml_atan_f32(x); } __DEVICE__ inline float atanhf(float x) { return __ocml_atanh_f32(x); } __DEVICE__ inline float cbrtf(float x) { return __ocml_cbrt_f32(x); } __DEVICE__ inline float ceilf(float x) { return __ocml_ceil_f32(x); } __DEVICE__ inline float copysignf(float x, float y) { return __ocml_copysign_f32(x, y); } __DEVICE__ inline float cosf(float x) { return __ocml_cos_f32(x); } __DEVICE__ inline float coshf(float x) { return __ocml_cosh_f32(x); } __DEVICE__ inline float cospif(float x) { return __ocml_cospi_f32(x); } __DEVICE__ inline float cyl_bessel_i0f(float x) { return __ocml_i0_f32(x); } 
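// The single-precision wrappers above and below forward to the __ocml_* device
// library. A minimal usage sketch -- the kernel and buffer names are invented
// for this example:
//   __global__ void polar_to_cartesian(const float* r, const float* theta,
//                                      float* x, float* y, int n) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) {
//           x[i] = r[i] * cosf(theta[i]);
//           y[i] = r[i] * sinf(theta[i]);
//       }
//   }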
__DEVICE__ inline float cyl_bessel_i1f(float x) { return __ocml_i1_f32(x); } __DEVICE__ inline float erfcf(float x) { return __ocml_erfc_f32(x); } __DEVICE__ inline float erfcinvf(float x) { return __ocml_erfcinv_f32(x); } __DEVICE__ inline float erfcxf(float x) { return __ocml_erfcx_f32(x); } __DEVICE__ inline float erff(float x) { return __ocml_erf_f32(x); } __DEVICE__ inline float erfinvf(float x) { return __ocml_erfinv_f32(x); } __DEVICE__ inline float exp10f(float x) { return __ocml_exp10_f32(x); } __DEVICE__ inline float exp2f(float x) { return __ocml_exp2_f32(x); } __DEVICE__ inline float expf(float x) { return __ocml_exp_f32(x); } __DEVICE__ inline float expm1f(float x) { return __ocml_expm1_f32(x); } __DEVICE__ inline float fabsf(float x) { return __ocml_fabs_f32(x); } __DEVICE__ inline float fdimf(float x, float y) { return __ocml_fdim_f32(x, y); } __DEVICE__ inline float fdividef(float x, float y) { return x / y; } __DEVICE__ inline float floorf(float x) { return __ocml_floor_f32(x); } __DEVICE__ inline float fmaf(float x, float y, float z) { return __ocml_fma_f32(x, y, z); } __DEVICE__ inline float fmaxf(float x, float y) { return __ocml_fmax_f32(x, y); } __DEVICE__ inline float fminf(float x, float y) { return __ocml_fmin_f32(x, y); } __DEVICE__ inline float fmodf(float x, float y) { return __ocml_fmod_f32(x, y); } __DEVICE__ inline float frexpf(float x, int* nptr) { int tmp; float r = __ocml_frexp_f32(x, (__attribute__((address_space(5))) int*) &tmp); *nptr = tmp; return r; } __DEVICE__ inline float hypotf(float x, float y) { return __ocml_hypot_f32(x, y); } __DEVICE__ inline int ilogbf(float x) { return __ocml_ilogb_f32(x); } __DEVICE__ inline __RETURN_TYPE isfinite(float x) { return __ocml_isfinite_f32(x); } __DEVICE__ inline __RETURN_TYPE isinf(float x) { return __ocml_isinf_f32(x); } __DEVICE__ inline __RETURN_TYPE isnan(float x) { return __ocml_isnan_f32(x); } __DEVICE__ inline float j0f(float x) { return __ocml_j0_f32(x); } __DEVICE__ inline float j1f(float x) { return __ocml_j1_f32(x); } __DEVICE__ inline float jnf(int n, float x) { // TODO: we could use Ahmes multiplication and the Miller & Brown algorithm // for linear recurrences to get O(log n) steps, but it's unclear if // it'd be beneficial in this case. 
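// The loop below implements the standard three-term forward recurrence
//   J_{i+1}(x) = (2 * i / x) * J_i(x) - J_{i-1}(x),
// seeded with j0f(x) and j1f(x). Note that the forward direction can lose
// accuracy once n grows well past |x|.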
if (n == 0) return j0f(x); if (n == 1) return j1f(x); float x0 = j0f(x); float x1 = j1f(x); for (int i = 1; i < n; ++i) { float x2 = (2 * i) / x * x1 - x0; x0 = x1; x1 = x2; } return x1; } __DEVICE__ inline float ldexpf(float x, int e) { return __ocml_ldexp_f32(x, e); } __DEVICE__ inline float lgammaf(float x) { return __ocml_lgamma_f32(x); } __DEVICE__ inline long long int llrintf(float x) { return __ocml_rint_f32(x); } __DEVICE__ inline long long int llroundf(float x) { return __ocml_round_f32(x); } __DEVICE__ inline float log10f(float x) { return __ocml_log10_f32(x); } __DEVICE__ inline float log1pf(float x) { return __ocml_log1p_f32(x); } __DEVICE__ inline float log2f(float x) { return __ocml_log2_f32(x); } __DEVICE__ inline float logbf(float x) { return __ocml_logb_f32(x); } __DEVICE__ inline float logf(float x) { return __ocml_log_f32(x); } __DEVICE__ inline long int lrintf(float x) { return __ocml_rint_f32(x); } __DEVICE__ inline long int lroundf(float x) { return __ocml_round_f32(x); } __DEVICE__ inline float modff(float x, float* iptr) { float tmp; float r = __ocml_modf_f32(x, (__attribute__((address_space(5))) float*) &tmp); *iptr = tmp; return r; } __DEVICE__ inline float nanf(const char* tagp) { union { float val; struct ieee_float { uint32_t mantissa : 22; uint32_t quiet : 1; uint32_t exponent : 8; uint32_t sign : 1; } bits; static_assert(sizeof(float) == sizeof(ieee_float), ""); } tmp; tmp.bits.sign = 0u; tmp.bits.exponent = ~0u; tmp.bits.quiet = 1u; tmp.bits.mantissa = __make_mantissa(tagp); return tmp.val; } __DEVICE__ inline float nearbyintf(float x) { return __ocml_nearbyint_f32(x); } __DEVICE__ inline float nextafterf(float x, float y) { return __ocml_nextafter_f32(x, y); } __DEVICE__ inline float norm3df(float x, float y, float z) { return __ocml_len3_f32(x, y, z); } __DEVICE__ inline float norm4df(float x, float y, float z, float w) { return __ocml_len4_f32(x, y, z, w); } __DEVICE__ inline float normcdff(float x) { return __ocml_ncdf_f32(x); } __DEVICE__ inline float normcdfinvf(float x) { return __ocml_ncdfinv_f32(x); } __DEVICE__ inline float normf(int dim, const float* a) { // TODO: placeholder until OCML adds support. float r = 0; while (dim--) { r += a[0] * a[0]; ++a; } return __ocml_sqrt_f32(r); } __DEVICE__ inline float powf(float x, float y) { return __ocml_pow_f32(x, y); } __DEVICE__ inline float powif(float base, int iexp) { return __ocml_pown_f32(base, iexp); } __DEVICE__ inline float rcbrtf(float x) { return __ocml_rcbrt_f32(x); } __DEVICE__ inline float remainderf(float x, float y) { return __ocml_remainder_f32(x, y); } __DEVICE__ inline float remquof(float x, float y, int* quo) { int tmp; float r = __ocml_remquo_f32(x, y, (__attribute__((address_space(5))) int*) &tmp); *quo = tmp; return r; } __DEVICE__ inline float rhypotf(float x, float y) { return __ocml_rhypot_f32(x, y); } __DEVICE__ inline float rintf(float x) { return __ocml_rint_f32(x); } __DEVICE__ inline float rnorm3df(float x, float y, float z) { return __ocml_rlen3_f32(x, y, z); } __DEVICE__ inline float rnorm4df(float x, float y, float z, float w) { return __ocml_rlen4_f32(x, y, z, w); } __DEVICE__ inline float rnormf(int dim, const float* a) { // TODO: placeholder until OCML adds support. float r = 0; while (dim--) { r += a[0] * a[0]; ++a; } return __ocml_rsqrt_f32(r); } __DEVICE__ inline float roundf(float x) { return __ocml_round_f32(x); } __DEVICE__ inline float rsqrtf(float x) { return __ocml_rsqrt_f32(x); } __DEVICE__ inline float scalblnf(float x, long int n) { return (n < INT_MAX) ? 
__ocml_scalbn_f32(x, n) : __ocml_scalb_f32(x, n); } __DEVICE__ inline float scalbnf(float x, int n) { return __ocml_scalbn_f32(x, n); } __DEVICE__ inline __RETURN_TYPE signbit(float x) { return __ocml_signbit_f32(x); } __DEVICE__ inline void sincosf(float x, float* sptr, float* cptr) { float tmp; *sptr = __ocml_sincos_f32(x, (__attribute__((address_space(5))) float*) &tmp); *cptr = tmp; } __DEVICE__ inline void sincospif(float x, float* sptr, float* cptr) { float tmp; *sptr = __ocml_sincospi_f32(x, (__attribute__((address_space(5))) float*) &tmp); *cptr = tmp; } __DEVICE__ inline float sinf(float x) { return __ocml_sin_f32(x); } __DEVICE__ inline float sinhf(float x) { return __ocml_sinh_f32(x); } __DEVICE__ inline float sinpif(float x) { return __ocml_sinpi_f32(x); } __DEVICE__ inline float sqrtf(float x) { return __ocml_sqrt_f32(x); } __DEVICE__ inline float tanf(float x) { return __ocml_tan_f32(x); } __DEVICE__ inline float tanhf(float x) { return __ocml_tanh_f32(x); } __DEVICE__ inline float tgammaf(float x) { return __ocml_tgamma_f32(x); } __DEVICE__ inline float truncf(float x) { return __ocml_trunc_f32(x); } __DEVICE__ inline float y0f(float x) { return __ocml_y0_f32(x); } __DEVICE__ inline float y1f(float x) { return __ocml_y1_f32(x); } __DEVICE__ inline float ynf(int n, float x) { // TODO: we could use Ahmes multiplication and the Miller & Brown algorithm // for linear recurrences to get O(log n) steps, but it's unclear if // it'd be beneficial in this case. Placeholder until OCML adds // support. if (n == 0) return y0f(x); if (n == 1) return y1f(x); float x0 = y0f(x); float x1 = y1f(x); for (int i = 1; i < n; ++i) { float x2 = (2 * i) / x * x1 - x0; x0 = x1; x1 = x2; } return x1; } // BEGIN INTRINSICS __DEVICE__ inline float __cosf(float x) { return __ocml_native_cos_f32(x); } __DEVICE__ inline float __exp10f(float x) { return __ocml_native_exp10_f32(x); } __DEVICE__ inline float __expf(float x) { return __ocml_native_exp_f32(x); } #if defined OCML_BASIC_ROUNDED_OPERATIONS __DEVICE__ inline float __fadd_rd(float x, float y) { return __ocml_add_rtn_f32(x, y); } #endif __DEVICE__ inline float __fadd_rn(float x, float y) { return x + y; } #if defined OCML_BASIC_ROUNDED_OPERATIONS __DEVICE__ inline float __fadd_ru(float x, float y) { return __ocml_add_rtp_f32(x, y); } __DEVICE__ inline float __fadd_rz(float x, float y) { return __ocml_add_rtz_f32(x, y); } __DEVICE__ inline float __fdiv_rd(float x, float y) { return __ocml_div_rtn_f32(x, y); } #endif __DEVICE__ inline float __fdiv_rn(float x, float y) { return x / y; } #if defined OCML_BASIC_ROUNDED_OPERATIONS __DEVICE__ inline float __fdiv_ru(float x, float y) { return __ocml_div_rtp_f32(x, y); } __DEVICE__ inline float __fdiv_rz(float x, float y) { return __ocml_div_rtz_f32(x, y); } #endif __DEVICE__ inline float __fdividef(float x, float y) { return x / y; } #if defined OCML_BASIC_ROUNDED_OPERATIONS __DEVICE__ inline float __fmaf_rd(float x, float y, float z) { return __ocml_fma_rtn_f32(x, y, z); } #endif __DEVICE__ inline float __fmaf_rn(float x, float y, float z) { return __ocml_fma_f32(x, y, z); } #if defined OCML_BASIC_ROUNDED_OPERATIONS __DEVICE__ inline float __fmaf_ru(float x, float y, float z) { return __ocml_fma_rtp_f32(x, y, z); } __DEVICE__ inline float __fmaf_rz(float x, float y, float z) { return __ocml_fma_rtz_f32(x, y, z); } __DEVICE__ inline float __fmul_rd(float x, float y) { return __ocml_mul_rtn_f32(x, y); } #endif __DEVICE__ inline float __fmul_rn(float x, float y) { return x * y; } #if defined 
OCML_BASIC_ROUNDED_OPERATIONS __DEVICE__ inline float __fmul_ru(float x, float y) { return __ocml_mul_rtp_f32(x, y); } __DEVICE__ inline float __fmul_rz(float x, float y) { return __ocml_mul_rtz_f32(x, y); } __DEVICE__ inline float __frcp_rd(float x) { return __builtin_amdgcn_rcpf(x); } #endif __DEVICE__ inline float __frcp_rn(float x) { return __builtin_amdgcn_rcpf(x); } #if defined OCML_BASIC_ROUNDED_OPERATIONS __DEVICE__ inline float __frcp_ru(float x) { return __builtin_amdgcn_rcpf(x); } __DEVICE__ inline float __frcp_rz(float x) { return __builtin_amdgcn_rcpf(x); } #endif __DEVICE__ inline float __frsqrt_rn(float x) { return __builtin_amdgcn_rsqf(x); } #if defined OCML_BASIC_ROUNDED_OPERATIONS __DEVICE__ inline float __fsqrt_rd(float x) { return __ocml_sqrt_rtn_f32(x); } #endif __DEVICE__ inline float __fsqrt_rn(float x) { return __ocml_native_sqrt_f32(x); } #if defined OCML_BASIC_ROUNDED_OPERATIONS __DEVICE__ inline float __fsqrt_ru(float x) { return __ocml_sqrt_rtp_f32(x); } __DEVICE__ inline float __fsqrt_rz(float x) { return __ocml_sqrt_rtz_f32(x); } __DEVICE__ inline float __fsub_rd(float x, float y) { return __ocml_sub_rtn_f32(x, y); } #endif __DEVICE__ inline float __fsub_rn(float x, float y) { return x - y; } #if defined OCML_BASIC_ROUNDED_OPERATIONS __DEVICE__ inline float __fsub_ru(float x, float y) { return __ocml_sub_rtp_f32(x, y); } __DEVICE__ inline float __fsub_rz(float x, float y) { return __ocml_sub_rtz_f32(x, y); } #endif __DEVICE__ inline float __log10f(float x) { return __ocml_native_log10_f32(x); } __DEVICE__ inline float __log2f(float x) { return __ocml_native_log2_f32(x); } __DEVICE__ inline float __logf(float x) { return __ocml_native_log_f32(x); } __DEVICE__ inline float __powf(float x, float y) { return __ocml_pow_f32(x, y); } __DEVICE__ inline float __saturatef(float x) { return (x < 0) ? 0 : ((x > 1) ? 
1 : x); } __DEVICE__ inline void __sincosf(float x, float* sptr, float* cptr) { *sptr = __ocml_native_sin_f32(x); *cptr = __ocml_native_cos_f32(x); } __DEVICE__ inline float __sinf(float x) { return __ocml_native_sin_f32(x); } __DEVICE__ inline float __tanf(float x) { return __ocml_tan_f32(x); } // END INTRINSICS // END FLOAT // BEGIN DOUBLE __DEVICE__ inline double abs(double x) { return __ocml_fabs_f64(x); } __DEVICE__ inline double acos(double x) { return __ocml_acos_f64(x); } __DEVICE__ inline double acosh(double x) { return __ocml_acosh_f64(x); } __DEVICE__ inline double asin(double x) { return __ocml_asin_f64(x); } __DEVICE__ inline double asinh(double x) { return __ocml_asinh_f64(x); } __DEVICE__ inline double atan(double x) { return __ocml_atan_f64(x); } __DEVICE__ inline double atan2(double x, double y) { return __ocml_atan2_f64(x, y); } __DEVICE__ inline double atanh(double x) { return __ocml_atanh_f64(x); } __DEVICE__ inline double cbrt(double x) { return __ocml_cbrt_f64(x); } __DEVICE__ inline double ceil(double x) { return __ocml_ceil_f64(x); } __DEVICE__ inline double copysign(double x, double y) { return __ocml_copysign_f64(x, y); } __DEVICE__ inline double cos(double x) { return __ocml_cos_f64(x); } __DEVICE__ inline double cosh(double x) { return __ocml_cosh_f64(x); } __DEVICE__ inline double cospi(double x) { return __ocml_cospi_f64(x); } __DEVICE__ inline double cyl_bessel_i0(double x) { return __ocml_i0_f64(x); } __DEVICE__ inline double cyl_bessel_i1(double x) { return __ocml_i1_f64(x); } __DEVICE__ inline double erf(double x) { return __ocml_erf_f64(x); } __DEVICE__ inline double erfc(double x) { return __ocml_erfc_f64(x); } __DEVICE__ inline double erfcinv(double x) { return __ocml_erfcinv_f64(x); } __DEVICE__ inline double erfcx(double x) { return __ocml_erfcx_f64(x); } __DEVICE__ inline double erfinv(double x) { return __ocml_erfinv_f64(x); } __DEVICE__ inline double exp(double x) { return __ocml_exp_f64(x); } __DEVICE__ inline double exp10(double x) { return __ocml_exp10_f64(x); } __DEVICE__ inline double exp2(double x) { return __ocml_exp2_f64(x); } __DEVICE__ inline double expm1(double x) { return __ocml_expm1_f64(x); } __DEVICE__ inline double fabs(double x) { return __ocml_fabs_f64(x); } __DEVICE__ inline double fdim(double x, double y) { return __ocml_fdim_f64(x, y); } __DEVICE__ inline double floor(double x) { return __ocml_floor_f64(x); } __DEVICE__ inline double fma(double x, double y, double z) { return __ocml_fma_f64(x, y, z); } __DEVICE__ inline double fmax(double x, double y) { return __ocml_fmax_f64(x, y); } __DEVICE__ inline double fmin(double x, double y) { return __ocml_fmin_f64(x, y); } __DEVICE__ inline double fmod(double x, double y) { return __ocml_fmod_f64(x, y); } __DEVICE__ inline double frexp(double x, int* nptr) { int tmp; double r = __ocml_frexp_f64(x, (__attribute__((address_space(5))) int*) &tmp); *nptr = tmp; return r; } __DEVICE__ inline double hypot(double x, double y) { return __ocml_hypot_f64(x, y); } __DEVICE__ inline int ilogb(double x) { return __ocml_ilogb_f64(x); } __DEVICE__ inline __RETURN_TYPE isfinite(double x) { return __ocml_isfinite_f64(x); } __DEVICE__ inline __RETURN_TYPE isinf(double x) { return __ocml_isinf_f64(x); } __DEVICE__ inline __RETURN_TYPE isnan(double x) { return __ocml_isnan_f64(x); } __DEVICE__ inline double j0(double x) { return __ocml_j0_f64(x); } __DEVICE__ inline double j1(double x) { return __ocml_j1_f64(x); } __DEVICE__ inline double jn(int n, double x) { // TODO: we could use Ahmes multiplication 
// and the Miller & Brown algorithm for linear recurrences to get O(log n)
// steps, but it's unclear if it'd be beneficial in this case. Placeholder
// until OCML adds support.
    if (n == 0) return j0(x);
    if (n == 1) return j1(x);

    double x0 = j0(x);
    double x1 = j1(x);
    for (int i = 1; i < n; ++i) {
        double x2 = (2 * i) / x * x1 - x0;
        x0 = x1;
        x1 = x2;
    }

    return x1;
}
__DEVICE__ inline double ldexp(double x, int e) { return __ocml_ldexp_f64(x, e); }
__DEVICE__ inline double lgamma(double x) { return __ocml_lgamma_f64(x); }
__DEVICE__ inline long long int llrint(double x) { return __ocml_rint_f64(x); }
__DEVICE__ inline long long int llround(double x) { return __ocml_round_f64(x); }
__DEVICE__ inline double log(double x) { return __ocml_log_f64(x); }
__DEVICE__ inline double log10(double x) { return __ocml_log10_f64(x); }
__DEVICE__ inline double log1p(double x) { return __ocml_log1p_f64(x); }
__DEVICE__ inline double log2(double x) { return __ocml_log2_f64(x); }
__DEVICE__ inline double logb(double x) { return __ocml_logb_f64(x); }
__DEVICE__ inline long int lrint(double x) { return __ocml_rint_f64(x); }
__DEVICE__ inline long int lround(double x) { return __ocml_round_f64(x); }
__DEVICE__ inline double modf(double x, double* iptr) {
    double tmp;
    double r = __ocml_modf_f64(x, (__attribute__((address_space(5))) double*) &tmp);
    *iptr = tmp;

    return r;
}
__DEVICE__ inline double nan(const char* tagp) {
#if !_WIN32
    union {
        double val;
        struct ieee_double {
            uint64_t mantissa : 51;
            uint32_t quiet : 1;
            uint32_t exponent : 11;
            uint32_t sign : 1;
        } bits;
        static_assert(sizeof(double) == sizeof(ieee_double), "");
    } tmp;

    tmp.bits.sign = 0u;
    tmp.bits.exponent = ~0u;
    tmp.bits.quiet = 1u;
    tmp.bits.mantissa = __make_mantissa(tagp);

    return tmp.val;
#else
    static_assert(sizeof(uint64_t) == sizeof(double));
    uint64_t val = __make_mantissa(tagp);
    val |= 0xFFFull << 51;  // Set the exponent field and the quiet bit.
    return *reinterpret_cast<double*>(&val);
#endif
}
__DEVICE__ inline double nearbyint(double x) { return __ocml_nearbyint_f64(x); }
__DEVICE__ inline double nextafter(double x, double y) { return __ocml_nextafter_f64(x, y); }
__DEVICE__ inline double norm(int dim, const double* a) {
    // TODO: placeholder until OCML adds support.
    double r = 0;
    while (dim--) {
        r += a[0] * a[0];
        ++a;
    }

    return __ocml_sqrt_f64(r);
}
__DEVICE__ inline double norm3d(double x, double y, double z) { return __ocml_len3_f64(x, y, z); }
__DEVICE__ inline double norm4d(double x, double y, double z, double w) {
    return __ocml_len4_f64(x, y, z, w);
}
__DEVICE__ inline double normcdf(double x) { return __ocml_ncdf_f64(x); }
__DEVICE__ inline double normcdfinv(double x) { return __ocml_ncdfinv_f64(x); }
__DEVICE__ inline double pow(double x, double y) { return __ocml_pow_f64(x, y); }
__DEVICE__ inline double powi(double base, int iexp) { return __ocml_pown_f64(base, iexp); }
__DEVICE__ inline double rcbrt(double x) { return __ocml_rcbrt_f64(x); }
__DEVICE__ inline double remainder(double x, double y) { return __ocml_remainder_f64(x, y); }
__DEVICE__ inline double remquo(double x, double y, int* quo) {
    int tmp;
    double r = __ocml_remquo_f64(x, y, (__attribute__((address_space(5))) int*) &tmp);
    *quo = tmp;

    return r;
}
__DEVICE__ inline double rhypot(double x, double y) { return __ocml_rhypot_f64(x, y); }
__DEVICE__ inline double rint(double x) { return __ocml_rint_f64(x); }
__DEVICE__ inline double rnorm(int dim, const double* a) {
    // TODO: placeholder until OCML adds support.
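    // rnorm computes the reciprocal Euclidean norm, 1 / sqrt(a[0]^2 + ... +
    // a[dim - 1]^2), by accumulating the sum of squares and taking a
    // reciprocal square root; e.g. for {3.0, 4.0} it returns 1 / 5 = 0.2.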
double r = 0; while (dim--) { r += a[0] * a[0]; ++a; } return __ocml_rsqrt_f64(r); } __DEVICE__ inline double rnorm3d(double x, double y, double z) { return __ocml_rlen3_f64(x, y, z); } __DEVICE__ inline double rnorm4d(double x, double y, double z, double w) { return __ocml_rlen4_f64(x, y, z, w); } __DEVICE__ inline double round(double x) { return __ocml_round_f64(x); } __DEVICE__ inline double rsqrt(double x) { return __ocml_rsqrt_f64(x); } __DEVICE__ inline double scalbln(double x, long int n) { return (n < INT_MAX) ? __ocml_scalbn_f64(x, n) : __ocml_scalb_f64(x, n); } __DEVICE__ inline double scalbn(double x, int n) { return __ocml_scalbn_f64(x, n); } __DEVICE__ inline __RETURN_TYPE signbit(double x) { return __ocml_signbit_f64(x); } __DEVICE__ inline double sin(double x) { return __ocml_sin_f64(x); } __DEVICE__ inline void sincos(double x, double* sptr, double* cptr) { double tmp; *sptr = __ocml_sincos_f64(x, (__attribute__((address_space(5))) double*) &tmp); *cptr = tmp; } __DEVICE__ inline void sincospi(double x, double* sptr, double* cptr) { double tmp; *sptr = __ocml_sincospi_f64( x, (__attribute__((address_space(5))) double*) &tmp); *cptr = tmp; } __DEVICE__ inline double sinh(double x) { return __ocml_sinh_f64(x); } __DEVICE__ inline double sinpi(double x) { return __ocml_sinpi_f64(x); } __DEVICE__ inline double sqrt(double x) { return __ocml_sqrt_f64(x); } __DEVICE__ inline double tan(double x) { return __ocml_tan_f64(x); } __DEVICE__ inline double tanh(double x) { return __ocml_tanh_f64(x); } __DEVICE__ inline double tgamma(double x) { return __ocml_tgamma_f64(x); } __DEVICE__ inline double trunc(double x) { return __ocml_trunc_f64(x); } __DEVICE__ inline double y0(double x) { return __ocml_y0_f64(x); } __DEVICE__ inline double y1(double x) { return __ocml_y1_f64(x); } __DEVICE__ inline double yn(int n, double x) { // TODO: we could use Ahmes multiplication and the Miller & Brown algorithm // for linear recurrences to get O(log n) steps, but it's unclear if // it'd be beneficial in this case. Placeholder until OCML adds // support. 
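    // The same forward recurrence as jn is used here,
    //   Y_{i+1}(x) = (2 * i / x) * Y_i(x) - Y_{i-1}(x),
    // and forward is the numerically preferred direction for Y_n, which
    // dominates the recurrence as n grows.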
    if (n == 0) return y0(x);
    if (n == 1) return y1(x);

    double x0 = y0(x);
    double x1 = y1(x);
    for (int i = 1; i < n; ++i) {
        double x2 = (2 * i) / x * x1 - x0;
        x0 = x1;
        x1 = x2;
    }

    return x1;
}

// BEGIN INTRINSICS
#if defined OCML_BASIC_ROUNDED_OPERATIONS
__DEVICE__ inline double __dadd_rd(double x, double y) { return __ocml_add_rtn_f64(x, y); }
#endif
__DEVICE__ inline double __dadd_rn(double x, double y) { return x + y; }
#if defined OCML_BASIC_ROUNDED_OPERATIONS
__DEVICE__ inline double __dadd_ru(double x, double y) { return __ocml_add_rtp_f64(x, y); }
__DEVICE__ inline double __dadd_rz(double x, double y) { return __ocml_add_rtz_f64(x, y); }
__DEVICE__ inline double __ddiv_rd(double x, double y) { return __ocml_div_rtn_f64(x, y); }
#endif
__DEVICE__ inline double __ddiv_rn(double x, double y) { return x / y; }
#if defined OCML_BASIC_ROUNDED_OPERATIONS
__DEVICE__ inline double __ddiv_ru(double x, double y) { return __ocml_div_rtp_f64(x, y); }
__DEVICE__ inline double __ddiv_rz(double x, double y) { return __ocml_div_rtz_f64(x, y); }
__DEVICE__ inline double __dmul_rd(double x, double y) { return __ocml_mul_rtn_f64(x, y); }
#endif
__DEVICE__ inline double __dmul_rn(double x, double y) { return x * y; }
#if defined OCML_BASIC_ROUNDED_OPERATIONS
__DEVICE__ inline double __dmul_ru(double x, double y) { return __ocml_mul_rtp_f64(x, y); }
__DEVICE__ inline double __dmul_rz(double x, double y) { return __ocml_mul_rtz_f64(x, y); }
__DEVICE__ inline double __drcp_rd(double x) { return __builtin_amdgcn_rcp(x); }
#endif
__DEVICE__ inline double __drcp_rn(double x) { return __builtin_amdgcn_rcp(x); }
#if defined OCML_BASIC_ROUNDED_OPERATIONS
__DEVICE__ inline double __drcp_ru(double x) { return __builtin_amdgcn_rcp(x); }
__DEVICE__ inline double __drcp_rz(double x) { return __builtin_amdgcn_rcp(x); }
__DEVICE__ inline double __dsqrt_rd(double x) { return __ocml_sqrt_rtn_f64(x); }
#endif
__DEVICE__ inline double __dsqrt_rn(double x) { return __ocml_sqrt_f64(x); }
#if defined OCML_BASIC_ROUNDED_OPERATIONS
__DEVICE__ inline double __dsqrt_ru(double x) { return __ocml_sqrt_rtp_f64(x); }
__DEVICE__ inline double __dsqrt_rz(double x) { return __ocml_sqrt_rtz_f64(x); }
__DEVICE__ inline double __dsub_rd(double x, double y) { return __ocml_sub_rtn_f64(x, y); }
#endif
__DEVICE__ inline double __dsub_rn(double x, double y) { return x - y; }
#if defined OCML_BASIC_ROUNDED_OPERATIONS
__DEVICE__ inline double __dsub_ru(double x, double y) { return __ocml_sub_rtp_f64(x, y); }
__DEVICE__ inline double __dsub_rz(double x, double y) { return __ocml_sub_rtz_f64(x, y); }
__DEVICE__ inline double __fma_rd(double x, double y, double z) { return __ocml_fma_rtn_f64(x, y, z); }
#endif
__DEVICE__ inline double __fma_rn(double x, double y, double z) { return __ocml_fma_f64(x, y, z); }
#if defined OCML_BASIC_ROUNDED_OPERATIONS
__DEVICE__ inline double __fma_ru(double x, double y, double z) { return __ocml_fma_rtp_f64(x, y, z); }
__DEVICE__ inline double __fma_rz(double x, double y, double z) { return __ocml_fma_rtz_f64(x, y, z); }
#endif
// END INTRINSICS
// END DOUBLE

// BEGIN INTEGER
__DEVICE__ inline int abs(int x) {
    int sgn = x >> (sizeof(int) * CHAR_BIT - 1);
    return (x ^ sgn) - sgn;
}
__DEVICE__ inline long labs(long x) {
    long sgn = x >> (sizeof(long) * CHAR_BIT - 1);
    return (x ^ sgn) - sgn;
}
__DEVICE__ inline long long llabs(long long x) {
    long long sgn = x >> (sizeof(long long) * CHAR_BIT - 1);
    return (x ^ sgn) - sgn;
}

#if defined(__cplusplus)
__DEVICE__ inline long abs(long x) { return labs(x); }
__DEVICE__ inline
long long abs(long long x) { return llabs(x); } #endif // END INTEGER __DEVICE__ inline _Float16 fma(_Float16 x, _Float16 y, _Float16 z) { return __ocml_fma_f16(x, y, z); } __DEVICE__ inline float fma(float x, float y, float z) { return fmaf(x, y, z); } #pragma push_macro("__DEF_FLOAT_FUN") #pragma push_macro("__DEF_FLOAT_FUN2") #pragma push_macro("__DEF_FLOAT_FUN2I") #pragma push_macro("__HIP_OVERLOAD") #pragma push_macro("__HIP_OVERLOAD2") // __hip_enable_if::type is a type function which returns __T if __B is true. template struct __hip_enable_if {}; template struct __hip_enable_if { typedef __T type; }; // __HIP_OVERLOAD1 is used to resolve function calls with integer argument to // avoid compilation error due to ambibuity. e.g. floor(5) is resolved with // floor(double). #define __HIP_OVERLOAD1(__retty, __fn) \ template \ __DEVICE__ \ typename __hip_enable_if::is_integer, \ __retty>::type \ __fn(__T __x) { \ return ::__fn((double)__x); \ } // __HIP_OVERLOAD2 is used to resolve function calls with mixed float/double // or integer argument to avoid compilation error due to ambibuity. e.g. // max(5.0f, 6.0) is resolved with max(double, double). #define __HIP_OVERLOAD2(__retty, __fn) \ template \ __DEVICE__ typename __hip_enable_if< \ std::numeric_limits<__T1>::is_specialized && \ std::numeric_limits<__T2>::is_specialized, \ __retty>::type \ __fn(__T1 __x, __T2 __y) { \ return __fn((double)__x, (double)__y); \ } // Define cmath functions with float argument and returns float. #define __DEF_FUN1(retty, func) \ __DEVICE__ \ inline \ float func(float x) \ { \ return func##f(x); \ } \ __HIP_OVERLOAD1(retty, func) // Define cmath functions with float argument and returns retty. #define __DEF_FUNI(retty, func) \ __DEVICE__ \ inline \ retty func(float x) \ { \ return func##f(x); \ } \ __HIP_OVERLOAD1(retty, func) // define cmath functions with two float arguments. 
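// (For example -- illustrative only -- __DEF_FUN2(double, pow) below gives
// pow(2.0f, 3.0f) a float result via powf, while the __HIP_OVERLOAD2 fallback
// makes a mixed call such as pow(2.0f, 3.0) resolve in double precision.)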
#define __DEF_FUN2(retty, func) \ __DEVICE__ \ inline \ float func(float x, float y) \ { \ return func##f(x, y); \ } \ __HIP_OVERLOAD2(retty, func) __DEF_FUN1(double, acos) __DEF_FUN1(double, acosh) __DEF_FUN1(double, asin) __DEF_FUN1(double, asinh) __DEF_FUN1(double, atan) __DEF_FUN2(double, atan2); __DEF_FUN1(double, atanh) __DEF_FUN1(double, cbrt) __DEF_FUN1(double, ceil) __DEF_FUN2(double, copysign); __DEF_FUN1(double, cos) __DEF_FUN1(double, cosh) __DEF_FUN1(double, erf) __DEF_FUN1(double, erfc) __DEF_FUN1(double, exp) __DEF_FUN1(double, exp2) __DEF_FUN1(double, expm1) __DEF_FUN1(double, fabs) __DEF_FUN2(double, fdim); __DEF_FUN1(double, floor) __DEF_FUN2(double, fmax); __DEF_FUN2(double, fmin); __DEF_FUN2(double, fmod); //__HIP_OVERLOAD1(int, fpclassify) __DEF_FUN2(double, hypot); __DEF_FUNI(int, ilogb) __HIP_OVERLOAD1(bool, isfinite) __HIP_OVERLOAD2(bool, isgreater); __HIP_OVERLOAD2(bool, isgreaterequal); __HIP_OVERLOAD1(bool, isinf); __HIP_OVERLOAD2(bool, isless); __HIP_OVERLOAD2(bool, islessequal); __HIP_OVERLOAD2(bool, islessgreater); __HIP_OVERLOAD1(bool, isnan); //__HIP_OVERLOAD1(bool, isnormal) __HIP_OVERLOAD2(bool, isunordered); __DEF_FUN1(double, lgamma) __DEF_FUN1(double, log) __DEF_FUN1(double, log10) __DEF_FUN1(double, log1p) __DEF_FUN1(double, log2) __DEF_FUN1(double, logb) __DEF_FUNI(long long, llrint) __DEF_FUNI(long long, llround) __DEF_FUNI(long, lrint) __DEF_FUNI(long, lround) __DEF_FUN1(double, nearbyint); __DEF_FUN2(double, nextafter); __DEF_FUN2(double, pow); __DEF_FUN2(double, remainder); __DEF_FUN1(double, rint); __DEF_FUN1(double, round); __HIP_OVERLOAD1(bool, signbit) __DEF_FUN1(double, sin) __DEF_FUN1(double, sinh) __DEF_FUN1(double, sqrt) __DEF_FUN1(double, tan) __DEF_FUN1(double, tanh) __DEF_FUN1(double, tgamma) __DEF_FUN1(double, trunc); // define cmath functions with a float and an integer argument. #define __DEF_FLOAT_FUN2I(func) \ __DEVICE__ \ inline \ float func(float x, int y) \ { \ return func##f(x, y); \ } __DEF_FLOAT_FUN2I(scalbn) __DEF_FLOAT_FUN2I(ldexp) template __DEVICE__ inline T min(T arg1, T arg2) { return (arg1 < arg2) ? arg1 : arg2; } template __DEVICE__ inline T max(T arg1, T arg2) { return (arg1 > arg2) ? arg1 : arg2; } __DEVICE__ inline int min(int arg1, int arg2) { return (arg1 < arg2) ? arg1 : arg2; } __DEVICE__ inline int max(int arg1, int arg2) { return (arg1 > arg2) ? arg1 : arg2; } __DEVICE__ inline int min(uint32_t arg1, int arg2) { return (arg1 < arg2) ? arg1 : arg2; } __DEVICE__ inline int max(uint32_t arg1, int arg2) { return (arg1 > arg2) ? 
arg1 : arg2; } __DEVICE__ inline float max(float x, float y) { return fmaxf(x, y); } __DEVICE__ inline double max(double x, double y) { return fmax(x, y); } __DEVICE__ inline float min(float x, float y) { return fminf(x, y); } __DEVICE__ inline double min(double x, double y) { return fmin(x, y); } __HIP_OVERLOAD2(double, max) __HIP_OVERLOAD2(double, min) #if !defined(__HIPCC_RTC__) __host__ inline static int min(int arg1, int arg2) { return std::min(arg1, arg2); } __host__ inline static int max(int arg1, int arg2) { return std::max(arg1, arg2); } #endif // !defined(__HIPCC_RTC__) __DEVICE__ inline float pow(float base, int iexp) { return powif(base, iexp); } __DEVICE__ inline double pow(double base, int iexp) { return powi(base, iexp); } __DEVICE__ inline _Float16 pow(_Float16 base, int iexp) { return __ocml_pown_f16(base, iexp); } #pragma pop_macro("__DEF_FLOAT_FUN") #pragma pop_macro("__DEF_FLOAT_FUN2") #pragma pop_macro("__DEF_FLOAT_FUN2I") #pragma pop_macro("__HIP_OVERLOAD") #pragma pop_macro("__HIP_OVERLOAD2") #endif // !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__ #pragma pop_macro("__DEVICE__") #pragma pop_macro("__RETURN_TYPE") // For backward compatibility. // There are HIP applications e.g. TensorFlow, expecting __HIP_ARCH_* macros // defined after including math_functions.h. #include clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_surface_functions.h000066400000000000000000000252321450307266000255130ustar00rootroot00000000000000/* Copyright (c) 2018 - 2022 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef HIP_INCLUDE_HIP_AMD_DETAIL_SURFACE_FUNCTIONS_H #define HIP_INCLUDE_HIP_AMD_DETAIL_SURFACE_FUNCTIONS_H #if defined(__cplusplus) #include #include #include #include #define __HIP_SURFACE_OBJECT_PARAMETERS_INIT \ unsigned int ADDRESS_SPACE_CONSTANT* i = (unsigned int ADDRESS_SPACE_CONSTANT*)surfObj; // CUDA is using byte address, need map to pixel address for HIP static __HOST_DEVICE__ __forceinline__ int __hipGetPixelAddr(int x, int format, int order) { /* * use below format index to generate format LUT typedef enum { HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT8 = 0, HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT16 = 1, HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8 = 2, HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT16 = 3, HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT24 = 4, HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_555 = 5, HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_565 = 6, HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_101010 = 7, HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT8 = 8, HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT16 = 9, HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT32 = 10, HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8 = 11, HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16 = 12, HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32 = 13, HSA_EXT_IMAGE_CHANNEL_TYPE_HALF_FLOAT = 14, HSA_EXT_IMAGE_CHANNEL_TYPE_FLOAT = 15 } hsa_ext_image_channel_type_t; */ static const int FormatLUT[] = { 0, 1, 0, 1, 3, 1, 1, 1, 0, 1, 2, 0, 1, 2, 1, 2 }; x = FormatLUT[format] == 3 ? x / FormatLUT[format] : x >> FormatLUT[format]; /* * use below order index to generate order LUT typedef enum { HSA_EXT_IMAGE_CHANNEL_ORDER_A = 0, HSA_EXT_IMAGE_CHANNEL_ORDER_R = 1, HSA_EXT_IMAGE_CHANNEL_ORDER_RX = 2, HSA_EXT_IMAGE_CHANNEL_ORDER_RG = 3, HSA_EXT_IMAGE_CHANNEL_ORDER_RGX = 4, HSA_EXT_IMAGE_CHANNEL_ORDER_RA = 5, HSA_EXT_IMAGE_CHANNEL_ORDER_RGB = 6, HSA_EXT_IMAGE_CHANNEL_ORDER_RGBX = 7, HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA = 8, HSA_EXT_IMAGE_CHANNEL_ORDER_BGRA = 9, HSA_EXT_IMAGE_CHANNEL_ORDER_ARGB = 10, HSA_EXT_IMAGE_CHANNEL_ORDER_ABGR = 11, HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB = 12, HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX = 13, HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA = 14, HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA = 15, HSA_EXT_IMAGE_CHANNEL_ORDER_INTENSITY = 16, HSA_EXT_IMAGE_CHANNEL_ORDER_LUMINANCE = 17, HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH = 18, HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL = 19 } hsa_ext_image_channel_order_t; */ static const int OrderLUT[] = { 0, 0, 1, 1, 3, 1, 3, 2, 2, 2, 2, 2, 3, 2, 2, 2, 0, 0, 0, 0 }; return x = OrderLUT[order] == 3 ? 
x / OrderLUT[order] : x >> OrderLUT[order]; } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surf1Dread(T* data, hipSurfaceObject_t surfObj, int x, int boundaryMode = hipBoundaryModeZero) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_1D(i), __ockl_image_channel_order_1D(i)); auto tmp = __ockl_image_load_1D(i, x); *data = __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surf1Dwrite(T data, hipSurfaceObject_t surfObj, int x) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_1D(i), __ockl_image_channel_order_1D(i)); auto tmp = __hipMapTo(data); __ockl_image_store_1D(i, x, tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surf2Dread(T* data, hipSurfaceObject_t surfObj, int x, int y) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_2D(i), __ockl_image_channel_order_2D(i)); auto tmp = __ockl_image_load_2D(i, int2(x, y).data); *data = __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surf2Dwrite(T data, hipSurfaceObject_t surfObj, int x, int y) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_2D(i), __ockl_image_channel_order_2D(i)); auto tmp = __hipMapTo(data); __ockl_image_store_2D(i, int2(x, y).data, tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surf3Dread(T* data, hipSurfaceObject_t surfObj, int x, int y, int z) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_3D(i), __ockl_image_channel_order_3D(i)); auto tmp = __ockl_image_load_3D(i, int4(x, y, z, 0).data); *data = __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surf3Dwrite(T data, hipSurfaceObject_t surfObj, int x, int y, int z) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_3D(i), __ockl_image_channel_order_3D(i)); auto tmp = __hipMapTo(data); __ockl_image_store_3D(i, int4(x, y, z, 0).data, tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surf1DLayeredread(T* data, hipSurfaceObject_t surfObj, int x, int layer) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_1D(i), __ockl_image_channel_order_1D(i)); auto tmp = __ockl_image_load_lod_1D(i, x, layer); *data = __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surf1DLayeredwrite(T data, hipSurfaceObject_t surfObj, int x, int layer) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_1D(i), __ockl_image_channel_order_1D(i)); auto tmp = __hipMapTo(data); __ockl_image_store_lod_1D(i, x, layer, tmp); } template < typename T, typename 
std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surf2DLayeredread(T* data, hipSurfaceObject_t surfObj, int x, int y, int layer) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_2D(i), __ockl_image_channel_order_2D(i)); auto tmp = __ockl_image_load_lod_2D(i, int2(x, y).data, layer); *data = __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surf2DLayeredwrite(T data, hipSurfaceObject_t surfObj, int x, int y, int layer) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_2D(i), __ockl_image_channel_order_2D(i)); auto tmp = __hipMapTo(data); __ockl_image_store_lod_2D(i, int2(x, y).data, layer, tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surfCubemapread(T* data, hipSurfaceObject_t surfObj, int x, int y, int face) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_2D(i), __ockl_image_channel_order_2D(i)); auto tmp = __ockl_image_load_CM(i, int2(x, y).data, face); *data = __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surfCubemapwrite(T data, hipSurfaceObject_t surfObj, int x, int y, int face) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_2D(i), __ockl_image_channel_order_2D(i)); auto tmp = __hipMapTo(data); __ockl_image_store_CM(i, int2(x, y).data, face, tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surfCubemapLayeredread(T* data, hipSurfaceObject_t surfObj, int x, int y, int face, int layer) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_2D(i), __ockl_image_channel_order_2D(i)); auto tmp = __ockl_image_load_lod_CM(i, int2(x, y).data, face, layer); *data = __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void surfCubemapLayeredwrite(T* data, hipSurfaceObject_t surfObj, int x, int y, int face, int layer) { __HIP_SURFACE_OBJECT_PARAMETERS_INIT x = __hipGetPixelAddr(x, __ockl_image_channel_data_type_2D(i), __ockl_image_channel_order_2D(i)); auto tmp = __hipMapTo(data); __ockl_image_store_lod_CM(i, int2(x, y).data, face, layer, tmp); } #endif #endif clr-rocm-5.7.1/hipamd/include/hip/amd_detail/amd_warp_functions.h000066400000000000000000000457351450307266000250460ustar00rootroot00000000000000/* Copyright (c) 2022 - 2023 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

#ifndef HIP_INCLUDE_HIP_AMD_DETAIL_WARP_FUNCTIONS_H
#define HIP_INCLUDE_HIP_AMD_DETAIL_WARP_FUNCTIONS_H

#if defined(__clang__)
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wreserved-identifier"
#pragma clang diagnostic ignored "-Wreserved-macro-identifier"
#pragma clang diagnostic ignored "-Wsign-conversion"
#pragma clang diagnostic ignored "-Wold-style-cast"
#pragma clang diagnostic ignored "-Wc++98-compat"
#pragma clang diagnostic ignored "-Wc++98-compat-pedantic"
#endif

__device__ static inline unsigned __hip_ds_bpermute(int index, unsigned src) {
    union { int i; unsigned u; float f; } tmp;
    tmp.u = src;
    tmp.i = __builtin_amdgcn_ds_bpermute(index, tmp.i);
    return tmp.u;
}

__device__ static inline float __hip_ds_bpermutef(int index, float src) {
    union { int i; unsigned u; float f; } tmp;
    tmp.f = src;
    tmp.i = __builtin_amdgcn_ds_bpermute(index, tmp.i);
    return tmp.f;
}

__device__ static inline unsigned __hip_ds_permute(int index, unsigned src) {
    union { int i; unsigned u; float f; } tmp;
    tmp.u = src;
    tmp.i = __builtin_amdgcn_ds_permute(index, tmp.i);
    return tmp.u;
}

__device__ static inline float __hip_ds_permutef(int index, float src) {
    union { int i; unsigned u; float f; } tmp;
    tmp.f = src;
    tmp.i = __builtin_amdgcn_ds_permute(index, tmp.i);
    return tmp.f;
}

#define __hip_ds_swizzle(src, pattern) __hip_ds_swizzle_N<(pattern)>((src))
#define __hip_ds_swizzlef(src, pattern) __hip_ds_swizzlef_N<(pattern)>((src))

template <int pattern>
__device__ static inline unsigned __hip_ds_swizzle_N(unsigned int src) {
    union { int i; unsigned u; float f; } tmp;
    tmp.u = src;
    tmp.i = __builtin_amdgcn_ds_swizzle(tmp.i, pattern);
    return tmp.u;
}

template <int pattern>
__device__ static inline float __hip_ds_swizzlef_N(float src) {
    union { int i; unsigned u; float f; } tmp;
    tmp.f = src;
    tmp.i = __builtin_amdgcn_ds_swizzle(tmp.i, pattern);
    return tmp.f;
}

#define __hip_move_dpp(src, dpp_ctrl, row_mask, bank_mask, bound_ctrl) \
    __hip_move_dpp_N<(dpp_ctrl), (row_mask), (bank_mask), (bound_ctrl)>((src))

template <int dpp_ctrl, int row_mask, int bank_mask, bool bound_ctrl>
__device__ static inline int __hip_move_dpp_N(int src) {
    return __builtin_amdgcn_mov_dpp(src, dpp_ctrl, row_mask, bank_mask, bound_ctrl);
}

static constexpr int warpSize = __AMDGCN_WAVEFRONT_SIZE;

__device__ inline int __shfl(int var, int src_lane, int width = warpSize) {
    int self = __lane_id();
    int index = (src_lane & (width - 1)) + (self & ~(width - 1));
    return __builtin_amdgcn_ds_bpermute(index << 2, var);
}

__device__ inline unsigned int __shfl(unsigned int var, int src_lane, int width = warpSize) {
    union { int i; unsigned u; float f; } tmp;
    tmp.u = var;
    tmp.i = __shfl(tmp.i, src_lane, width);
    return tmp.u;
}

__device__ inline float __shfl(float var, int src_lane, int width = warpSize) {
    union { int i; unsigned u; float f; } tmp;
    tmp.f = var;
    tmp.i = __shfl(tmp.i, src_lane, width);
    return tmp.f;
}
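// [Editor's usage sketch, not part of the original header; the helper name is
// hypothetical.] __shfl reads 'var' from the lane selected by src_lane within
// each width-wide segment of the wavefront, so reading from lane 0 broadcasts
// that lane's value to every active lane:
__device__ static inline float __hip_doc_broadcast_lane0(float v) {
    return __shfl(v, 0, warpSize);  // all lanes receive lane 0's value of v
}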
__device__ inline double __shfl(double var, int src_lane, int width = warpSize) {
    static_assert(sizeof(double) == 2 * sizeof(int), "");
    static_assert(sizeof(double) == sizeof(uint64_t), "");

    int tmp[2];
    __builtin_memcpy(tmp, &var, sizeof(tmp));
    tmp[0] = __shfl(tmp[0], src_lane, width);
    tmp[1] = __shfl(tmp[1], src_lane, width);

    uint64_t tmp0 = (static_cast<uint64_t>(tmp[1]) << 32ull) | static_cast<uint32_t>(tmp[0]);
    double tmp1;
    __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0));
    return tmp1;
}

__device__ inline long __shfl(long var, int src_lane, int width = warpSize) {
#ifndef _MSC_VER
    static_assert(sizeof(long) == 2 * sizeof(int), "");
    static_assert(sizeof(long) == sizeof(uint64_t), "");

    int tmp[2];
    __builtin_memcpy(tmp, &var, sizeof(tmp));
    tmp[0] = __shfl(tmp[0], src_lane, width);
    tmp[1] = __shfl(tmp[1], src_lane, width);

    uint64_t tmp0 = (static_cast<uint64_t>(tmp[1]) << 32ull) | static_cast<uint32_t>(tmp[0]);
    long tmp1;
    __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0));
    return tmp1;
#else
    static_assert(sizeof(long) == sizeof(int), "");
    return static_cast<long>(__shfl(static_cast<int>(var), src_lane, width));
#endif
}

__device__ inline unsigned long __shfl(unsigned long var, int src_lane, int width = warpSize) {
#ifndef _MSC_VER
    static_assert(sizeof(unsigned long) == 2 * sizeof(unsigned int), "");
    static_assert(sizeof(unsigned long) == sizeof(uint64_t), "");

    unsigned int tmp[2];
    __builtin_memcpy(tmp, &var, sizeof(tmp));
    tmp[0] = __shfl(tmp[0], src_lane, width);
    tmp[1] = __shfl(tmp[1], src_lane, width);

    uint64_t tmp0 = (static_cast<uint64_t>(tmp[1]) << 32ull) | static_cast<uint32_t>(tmp[0]);
    unsigned long tmp1;
    __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0));
    return tmp1;
#else
    static_assert(sizeof(unsigned long) == sizeof(unsigned int), "");
    return static_cast<unsigned long>(__shfl(static_cast<unsigned int>(var), src_lane, width));
#endif
}

__device__ inline long long __shfl(long long var, int src_lane, int width = warpSize) {
    static_assert(sizeof(long long) == 2 * sizeof(int), "");
    static_assert(sizeof(long long) == sizeof(uint64_t), "");

    int tmp[2];
    __builtin_memcpy(tmp, &var, sizeof(tmp));
    tmp[0] = __shfl(tmp[0], src_lane, width);
    tmp[1] = __shfl(tmp[1], src_lane, width);

    uint64_t tmp0 = (static_cast<uint64_t>(tmp[1]) << 32ull) | static_cast<uint32_t>(tmp[0]);
    long long tmp1;
    __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0));
    return tmp1;
}

__device__ inline unsigned long long __shfl(unsigned long long var, int src_lane, int width = warpSize) {
    static_assert(sizeof(unsigned long long) == 2 * sizeof(unsigned int), "");
    static_assert(sizeof(unsigned long long) == sizeof(uint64_t), "");

    unsigned int tmp[2];
    __builtin_memcpy(tmp, &var, sizeof(tmp));
    tmp[0] = __shfl(tmp[0], src_lane, width);
    tmp[1] = __shfl(tmp[1], src_lane, width);

    uint64_t tmp0 = (static_cast<uint64_t>(tmp[1]) << 32ull) | static_cast<uint32_t>(tmp[0]);
    unsigned long long tmp1;
    __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0));
    return tmp1;
}

__device__ inline int __shfl_up(int var, unsigned int lane_delta, int width = warpSize) {
    int self = __lane_id();
    int index = self - lane_delta;
    index = (index < (self & ~(width - 1))) ? self : index;
    return __builtin_amdgcn_ds_bpermute(index << 2, var);
}

__device__ inline unsigned int __shfl_up(unsigned int var, unsigned int lane_delta, int width = warpSize) {
    union { int i; unsigned u; float f; } tmp;
    tmp.u = var;
    tmp.i = __shfl_up(tmp.i, lane_delta, width);
    return tmp.u;
}

__device__ inline float __shfl_up(float var, unsigned int lane_delta, int width = warpSize) {
    union { int i; unsigned u; float f; } tmp;
    tmp.f = var;
    tmp.i = __shfl_up(tmp.i, lane_delta, width);
    return tmp.f;
}

__device__ inline double __shfl_up(double var, unsigned int lane_delta, int width = warpSize) {
    static_assert(sizeof(double) == 2 * sizeof(int), "");
    static_assert(sizeof(double) == sizeof(uint64_t), "");

    int tmp[2];
    __builtin_memcpy(tmp, &var, sizeof(tmp));
    tmp[0] = __shfl_up(tmp[0], lane_delta, width);
    tmp[1] = __shfl_up(tmp[1],
lane_delta, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); double tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; } __device__ inline long __shfl_up(long var, unsigned int lane_delta, int width = warpSize) { #ifndef _MSC_VER static_assert(sizeof(long) == 2 * sizeof(int), ""); static_assert(sizeof(long) == sizeof(uint64_t), ""); int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_up(tmp[0], lane_delta, width); tmp[1] = __shfl_up(tmp[1], lane_delta, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; #else static_assert(sizeof(long) == sizeof(int), ""); return static_cast(__shfl_up(static_cast(var), lane_delta, width)); #endif } __device__ inline unsigned long __shfl_up(unsigned long var, unsigned int lane_delta, int width = warpSize) { #ifndef _MSC_VER static_assert(sizeof(unsigned long) == 2 * sizeof(unsigned int), ""); static_assert(sizeof(unsigned long) == sizeof(uint64_t), ""); unsigned int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_up(tmp[0], lane_delta, width); tmp[1] = __shfl_up(tmp[1], lane_delta, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); unsigned long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; #else static_assert(sizeof(unsigned long) == sizeof(unsigned int), ""); return static_cast(__shfl_up(static_cast(var), lane_delta, width)); #endif } __device__ inline long long __shfl_up(long long var, unsigned int lane_delta, int width = warpSize) { static_assert(sizeof(long long) == 2 * sizeof(int), ""); static_assert(sizeof(long long) == sizeof(uint64_t), ""); int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_up(tmp[0], lane_delta, width); tmp[1] = __shfl_up(tmp[1], lane_delta, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); long long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; } __device__ inline unsigned long long __shfl_up(unsigned long long var, unsigned int lane_delta, int width = warpSize) { static_assert(sizeof(unsigned long long) == 2 * sizeof(unsigned int), ""); static_assert(sizeof(unsigned long long) == sizeof(uint64_t), ""); unsigned int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_up(tmp[0], lane_delta, width); tmp[1] = __shfl_up(tmp[1], lane_delta, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); unsigned long long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; } __device__ inline int __shfl_down(int var, unsigned int lane_delta, int width = warpSize) { int self = __lane_id(); int index = self + lane_delta; index = (int)((self&(width-1))+lane_delta) >= width?self:index; return __builtin_amdgcn_ds_bpermute(index<<2, var); } __device__ inline unsigned int __shfl_down(unsigned int var, unsigned int lane_delta, int width = warpSize) { union { int i; unsigned u; float f; } tmp; tmp.u = var; tmp.i = __shfl_down(tmp.i, lane_delta, width); return tmp.u; } __device__ inline float __shfl_down(float var, unsigned int lane_delta, int width = warpSize) { union { int i; unsigned u; float f; } tmp; tmp.f = var; tmp.i = __shfl_down(tmp.i, lane_delta, width); return tmp.f; } __device__ inline double __shfl_down(double var, unsigned int lane_delta, int width = warpSize) { static_assert(sizeof(double) == 2 * sizeof(int), ""); static_assert(sizeof(double) == sizeof(uint64_t), ""); int tmp[2]; __builtin_memcpy(tmp, &var, 
sizeof(tmp)); tmp[0] = __shfl_down(tmp[0], lane_delta, width); tmp[1] = __shfl_down(tmp[1], lane_delta, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); double tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; } __device__ inline long __shfl_down(long var, unsigned int lane_delta, int width = warpSize) { #ifndef _MSC_VER static_assert(sizeof(long) == 2 * sizeof(int), ""); static_assert(sizeof(long) == sizeof(uint64_t), ""); int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_down(tmp[0], lane_delta, width); tmp[1] = __shfl_down(tmp[1], lane_delta, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; #else static_assert(sizeof(long) == sizeof(int), ""); return static_cast(__shfl_down(static_cast(var), lane_delta, width)); #endif } __device__ inline unsigned long __shfl_down(unsigned long var, unsigned int lane_delta, int width = warpSize) { #ifndef _MSC_VER static_assert(sizeof(unsigned long) == 2 * sizeof(unsigned int), ""); static_assert(sizeof(unsigned long) == sizeof(uint64_t), ""); unsigned int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_down(tmp[0], lane_delta, width); tmp[1] = __shfl_down(tmp[1], lane_delta, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); unsigned long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; #else static_assert(sizeof(unsigned long) == sizeof(unsigned int), ""); return static_cast(__shfl_down(static_cast(var), lane_delta, width)); #endif } __device__ inline long long __shfl_down(long long var, unsigned int lane_delta, int width = warpSize) { static_assert(sizeof(long long) == 2 * sizeof(int), ""); static_assert(sizeof(long long) == sizeof(uint64_t), ""); int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_down(tmp[0], lane_delta, width); tmp[1] = __shfl_down(tmp[1], lane_delta, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); long long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; } __device__ inline unsigned long long __shfl_down(unsigned long long var, unsigned int lane_delta, int width = warpSize) { static_assert(sizeof(unsigned long long) == 2 * sizeof(unsigned int), ""); static_assert(sizeof(unsigned long long) == sizeof(uint64_t), ""); unsigned int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_down(tmp[0], lane_delta, width); tmp[1] = __shfl_down(tmp[1], lane_delta, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); unsigned long long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; } __device__ inline int __shfl_xor(int var, int lane_mask, int width = warpSize) { int self = __lane_id(); int index = self^lane_mask; index = index >= ((self+width)&~(width-1))?self:index; return __builtin_amdgcn_ds_bpermute(index<<2, var); } __device__ inline unsigned int __shfl_xor(unsigned int var, int lane_mask, int width = warpSize) { union { int i; unsigned u; float f; } tmp; tmp.u = var; tmp.i = __shfl_xor(tmp.i, lane_mask, width); return tmp.u; } __device__ inline float __shfl_xor(float var, int lane_mask, int width = warpSize) { union { int i; unsigned u; float f; } tmp; tmp.f = var; tmp.i = __shfl_xor(tmp.i, lane_mask, width); return tmp.f; } __device__ inline double __shfl_xor(double var, int lane_mask, int width = warpSize) { static_assert(sizeof(double) == 2 * sizeof(int), ""); static_assert(sizeof(double) == 
sizeof(uint64_t), ""); int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_xor(tmp[0], lane_mask, width); tmp[1] = __shfl_xor(tmp[1], lane_mask, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); double tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; } __device__ inline long __shfl_xor(long var, int lane_mask, int width = warpSize) { #ifndef _MSC_VER static_assert(sizeof(long) == 2 * sizeof(int), ""); static_assert(sizeof(long) == sizeof(uint64_t), ""); int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_xor(tmp[0], lane_mask, width); tmp[1] = __shfl_xor(tmp[1], lane_mask, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; #else static_assert(sizeof(long) == sizeof(int), ""); return static_cast(__shfl_xor(static_cast(var), lane_mask, width)); #endif } __device__ inline unsigned long __shfl_xor(unsigned long var, int lane_mask, int width = warpSize) { #ifndef _MSC_VER static_assert(sizeof(unsigned long) == 2 * sizeof(unsigned int), ""); static_assert(sizeof(unsigned long) == sizeof(uint64_t), ""); unsigned int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_xor(tmp[0], lane_mask, width); tmp[1] = __shfl_xor(tmp[1], lane_mask, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); unsigned long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; #else static_assert(sizeof(unsigned long) == sizeof(unsigned int), ""); return static_cast(__shfl_xor(static_cast(var), lane_mask, width)); #endif } __device__ inline long long __shfl_xor(long long var, int lane_mask, int width = warpSize) { static_assert(sizeof(long long) == 2 * sizeof(int), ""); static_assert(sizeof(long long) == sizeof(uint64_t), ""); int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_xor(tmp[0], lane_mask, width); tmp[1] = __shfl_xor(tmp[1], lane_mask, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); long long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; } __device__ inline unsigned long long __shfl_xor(unsigned long long var, int lane_mask, int width = warpSize) { static_assert(sizeof(unsigned long long) == 2 * sizeof(unsigned int), ""); static_assert(sizeof(unsigned long long) == sizeof(uint64_t), ""); unsigned int tmp[2]; __builtin_memcpy(tmp, &var, sizeof(tmp)); tmp[0] = __shfl_xor(tmp[0], lane_mask, width); tmp[1] = __shfl_xor(tmp[1], lane_mask, width); uint64_t tmp0 = (static_cast(tmp[1]) << 32ull) | static_cast(tmp[0]); unsigned long long tmp1; __builtin_memcpy(&tmp1, &tmp0, sizeof(tmp0)); return tmp1; } #if defined(__clang__) #pragma clang diagnostic pop #endif #endif clr-rocm-5.7.1/hipamd/include/hip/amd_detail/concepts.hpp000066400000000000000000000023441450307266000233270ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

#pragma once

namespace hip_impl // Documentation only.
{
#define requires(...)
#define FunctionalProcedure typename
} // namespace hip_impl

// File: clr-rocm-5.7.1/hipamd/include/hip/amd_detail/device_library_decls.h

/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

/**
 * @file amd_detail/device_library_decls.h
 * @brief Contains declarations for types and functions in device library.
 *        Uses int64_t and uint64_t instead of long, long long, unsigned
 *        long and unsigned long long types for device library API
 *        declarations.
*/ #ifndef HIP_INCLUDE_HIP_AMD_DETAIL_DEVICE_LIBRARY_DECLS_H #define HIP_INCLUDE_HIP_AMD_DETAIL_DEVICE_LIBRARY_DECLS_H #include "hip/amd_detail/host_defines.h" typedef unsigned char uchar; typedef unsigned short ushort; typedef unsigned int uint; typedef unsigned long ulong; typedef unsigned long long ullong; extern "C" __device__ __attribute__((const)) bool __ockl_wfany_i32(int); extern "C" __device__ __attribute__((const)) bool __ockl_wfall_i32(int); extern "C" __device__ uint __ockl_activelane_u32(void); extern "C" __device__ __attribute__((const)) uint __ockl_mul24_u32(uint, uint); extern "C" __device__ __attribute__((const)) int __ockl_mul24_i32(int, int); extern "C" __device__ __attribute__((const)) uint __ockl_mul_hi_u32(uint, uint); extern "C" __device__ __attribute__((const)) int __ockl_mul_hi_i32(int, int); extern "C" __device__ __attribute__((const)) uint __ockl_sadd_u32(uint, uint, uint); extern "C" __device__ __attribute__((const)) uchar __ockl_clz_u8(uchar); extern "C" __device__ __attribute__((const)) ushort __ockl_clz_u16(ushort); extern "C" __device__ __attribute__((const)) uint __ockl_clz_u32(uint); extern "C" __device__ __attribute__((const)) uint64_t __ockl_clz_u64(uint64_t); extern "C" __device__ __attribute__((const)) float __ocml_floor_f32(float); extern "C" __device__ __attribute__((const)) float __ocml_rint_f32(float); extern "C" __device__ __attribute__((const)) float __ocml_ceil_f32(float); extern "C" __device__ __attribute__((const)) float __ocml_trunc_f32(float); extern "C" __device__ __attribute__((const)) float __ocml_fmin_f32(float, float); extern "C" __device__ __attribute__((const)) float __ocml_fmax_f32(float, float); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtn_f32_f64(double); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtp_f32_f64(double); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtz_f32_f64(double); extern "C" __device__ __attribute__((const)) _Float16 __ocml_cvtrtn_f16_f32(float); extern "C" __device__ __attribute__((const)) _Float16 __ocml_cvtrtp_f16_f32(float); extern "C" __device__ __attribute__((const)) _Float16 __ocml_cvtrtz_f16_f32(float); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtn_f32_s32(int); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtp_f32_s32(int); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtz_f32_s32(int); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtn_f32_u32(uint32_t); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtp_f32_u32(uint32_t); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtz_f32_u32(uint32_t); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtn_f32_s64(int64_t); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtp_f32_s64(int64_t); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtz_f32_s64(int64_t); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtn_f32_u64(uint64_t); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtp_f32_u64(uint64_t); extern "C" __device__ __attribute__((const)) float __ocml_cvtrtz_f32_u64(uint64_t); extern "C" __device__ __attribute__((const)) double __ocml_cvtrtn_f64_s64(int64_t); extern "C" __device__ __attribute__((const)) double __ocml_cvtrtp_f64_s64(int64_t); extern "C" __device__ __attribute__((const)) double __ocml_cvtrtz_f64_s64(int64_t); extern "C" __device__ __attribute__((const)) double __ocml_cvtrtn_f64_u64(uint64_t); extern "C" __device__ __attribute__((const)) 
double __ocml_cvtrtp_f64_u64(uint64_t); extern "C" __device__ __attribute__((const)) double __ocml_cvtrtz_f64_u64(uint64_t); extern "C" __device__ __attribute__((convergent)) void __ockl_gws_init(uint nwm1, uint rid); extern "C" __device__ __attribute__((convergent)) void __ockl_gws_barrier(uint nwm1, uint rid); extern "C" __device__ __attribute__((const)) uint32_t __ockl_lane_u32(); extern "C" __device__ __attribute__((const)) int __ockl_grid_is_valid(void); extern "C" __device__ __attribute__((convergent)) void __ockl_grid_sync(void); extern "C" __device__ __attribute__((const)) uint __ockl_multi_grid_num_grids(void); extern "C" __device__ __attribute__((const)) uint __ockl_multi_grid_grid_rank(void); extern "C" __device__ __attribute__((const)) uint __ockl_multi_grid_size(void); extern "C" __device__ __attribute__((const)) uint __ockl_multi_grid_thread_rank(void); extern "C" __device__ __attribute__((const)) int __ockl_multi_grid_is_valid(void); extern "C" __device__ __attribute__((convergent)) void __ockl_multi_grid_sync(void); extern "C" __device__ void __ockl_atomic_add_noret_f32(float*, float); extern "C" __device__ __attribute__((convergent)) int __ockl_wgred_add_i32(int a); extern "C" __device__ __attribute__((convergent)) int __ockl_wgred_and_i32(int a); extern "C" __device__ __attribute__((convergent)) int __ockl_wgred_or_i32(int a); extern "C" __device__ uint64_t __ockl_fprintf_stderr_begin(); extern "C" __device__ uint64_t __ockl_fprintf_append_args(uint64_t msg_desc, uint32_t num_args, uint64_t value0, uint64_t value1, uint64_t value2, uint64_t value3, uint64_t value4, uint64_t value5, uint64_t value6, uint32_t is_last); extern "C" __device__ uint64_t __ockl_fprintf_append_string_n(uint64_t msg_desc, const char* data, uint64_t length, uint32_t is_last); // Introduce local address space #define __local __attribute__((address_space(3))) #ifdef __HIP_DEVICE_COMPILE__ __device__ inline static __local void* __to_local(unsigned x) { return (__local void*)x; } #endif //__HIP_DEVICE_COMPILE__ // Using hip.amdgcn.bc - sync threads #define __CLK_LOCAL_MEM_FENCE 0x01 typedef unsigned __cl_mem_fence_flags; #endif clr-rocm-5.7.1/hipamd/include/hip/amd_detail/functional_grid_launch.hpp000066400000000000000000000177011450307266000262150ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #pragma once #include "concepts.hpp" #include "helpers.hpp" #include "program_state.hpp" #include "hip_runtime_api.h" #include #include #include #include #include #include hipError_t ihipExtLaunchMultiKernelMultiDevice(hipLaunchParams* launchParamsList, int numDevices, unsigned int flags, hip_impl::program_state& ps); hipError_t hipLaunchCooperativeKernel(const void* f, dim3 gridDim, dim3 blockDim, void** args, size_t sharedMem, hipStream_t stream, hip_impl::program_state& ps); hipError_t hipLaunchCooperativeKernelMultiDevice(hipLaunchParams* launchParamsList, int numDevices, unsigned int flags, hip_impl::program_state& ps); #pragma GCC visibility push(hidden) namespace hip_impl { template {}>::type* = nullptr> inline T round_up_to_next_multiple_nonnegative(T x, T y) { T tmp = x + y - 1; return tmp - tmp % y; } template < std::size_t n, typename... Ts, typename std::enable_if::type* = nullptr> inline hip_impl::kernarg make_kernarg( const std::tuple&, const kernargs_size_align&, hip_impl::kernarg kernarg) { return kernarg; } template < std::size_t n, typename... Ts, typename std::enable_if::type* = nullptr> inline hip_impl::kernarg make_kernarg( const std::tuple& formals, const kernargs_size_align& size_align, hip_impl::kernarg kernarg) { using T = typename std::tuple_element>::type; static_assert( !std::is_reference{}, "A __global__ function cannot have a reference as one of its " "arguments."); #if defined(HIP_STRICT) static_assert( std::is_trivially_copyable{}, "Only TriviallyCopyable types can be arguments to a __global__ " "function"); #endif kernarg.resize(round_up_to_next_multiple_nonnegative( kernarg.size(), size_align.alignment(n)) + size_align.size(n)); std::memcpy( kernarg.data() + kernarg.size() - size_align.size(n), &std::get(formals), size_align.size(n)); return make_kernarg(formals, size_align, std::move(kernarg)); } template inline hip_impl::kernarg make_kernarg( void (*kernel)(Formals...), std::tuple actuals) { static_assert(sizeof...(Formals) == sizeof...(Actuals), "The count of formal arguments must match the count of actuals."); if (sizeof...(Formals) == 0) return {}; std::tuple to_formals{std::move(actuals)}; hip_impl::kernarg kernarg; kernarg.reserve(sizeof(to_formals)); auto& ps = hip_impl::get_program_state(); return make_kernarg<0>(to_formals, ps.get_kernargs_size_align( reinterpret_cast(kernel)), std::move(kernarg)); } HIP_INTERNAL_EXPORTED_API hsa_agent_t target_agent(hipStream_t stream); inline __attribute__((visibility("hidden"))) void hipLaunchKernelGGLImpl( std::uintptr_t function_address, const dim3& numBlocks, const dim3& dimBlocks, std::uint32_t sharedMemBytes, hipStream_t stream, void** kernarg) { const auto& kd = hip_impl::get_program_state().kernel_descriptor(function_address, target_agent(stream)); hipModuleLaunchKernel(kd, numBlocks.x, numBlocks.y, numBlocks.z, dimBlocks.x, dimBlocks.y, dimBlocks.z, sharedMemBytes, stream, nullptr, kernarg); } } // Namespace hip_impl. 
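// [Editor's usage sketch, not part of the original header.] The templates
// below resolve a __global__ function pointer to its kernel descriptor for
// the stream's agent and pack the actual arguments into a kernarg buffer
// (hip_impl::make_kernarg) before dispatching via hipModuleLaunchKernel.
// A typical call site looks like this ('scale_kernel', 'd_ptr', and 'n' are
// hypothetical):
//
//     __global__ void scale_kernel(float* p, float s, int n);
//
//     int min_grid = 0, block_size = 0;
//     hipOccupancyMaxPotentialBlockSize(&min_grid, &block_size, scale_kernel);
//     hipLaunchKernelGGL(scale_kernel, dim3(min_grid), dim3(block_size),
//                        0 /*sharedMemBytes*/, nullptr /*default stream*/,
//                        d_ptr, 2.0f, n);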
template inline hipError_t hipOccupancyMaxPotentialBlockSize(int* gridSize, int* blockSize, T kernel, size_t dynSharedMemPerBlk = 0, int blockSizeLimit = 0) { using namespace hip_impl; hip_impl::hip_init(); auto f = get_program_state().kernel_descriptor(reinterpret_cast(kernel), target_agent(0)); return hipModuleOccupancyMaxPotentialBlockSize(gridSize, blockSize, f, dynSharedMemPerBlk, blockSizeLimit); } template inline hipError_t hipOccupancyMaxPotentialBlockSizeWithFlags(int* gridSize, int* blockSize, T kernel, size_t dynSharedMemPerBlk = 0, int blockSizeLimit = 0, unsigned int flags = 0 ) { using namespace hip_impl; hip_impl::hip_init(); if(flags != hipOccupancyDefault) return hipErrorNotSupported; auto f = get_program_state().kernel_descriptor(reinterpret_cast(kernel), target_agent(0)); return hipModuleOccupancyMaxPotentialBlockSize(gridSize, blockSize, f, dynSharedMemPerBlk, blockSizeLimit); } template inline void hipLaunchKernelGGL(F kernel, const dim3& numBlocks, const dim3& dimBlocks, std::uint32_t sharedMemBytes, hipStream_t stream, Args... args) { hip_impl::hip_init(); auto kernarg = hip_impl::make_kernarg(kernel, std::tuple{std::move(args)...}); std::size_t kernarg_size = kernarg.size(); void* config[]{ HIP_LAUNCH_PARAM_BUFFER_POINTER, kernarg.data(), HIP_LAUNCH_PARAM_BUFFER_SIZE, &kernarg_size, HIP_LAUNCH_PARAM_END}; hip_impl::hipLaunchKernelGGLImpl(reinterpret_cast(kernel), numBlocks, dimBlocks, sharedMemBytes, stream, &config[0]); } template inline __attribute__((visibility("hidden"))) hipError_t hipLaunchCooperativeKernel(F f, dim3 gridDim, dim3 blockDim, void** args, size_t sharedMem, hipStream_t stream) { hip_impl::hip_init(); auto& ps = hip_impl::get_program_state(); return hipLaunchCooperativeKernel(reinterpret_cast(f), gridDim, blockDim, args, sharedMem, stream, ps); } inline __attribute__((visibility("hidden"))) hipError_t hipLaunchCooperativeKernelMultiDevice(hipLaunchParams* launchParamsList, int numDevices, unsigned int flags) { hip_impl::hip_init(); auto& ps = hip_impl::get_program_state(); return hipLaunchCooperativeKernelMultiDevice(launchParamsList, numDevices, flags, ps); } #pragma GCC visibility pop clr-rocm-5.7.1/hipamd/include/hip/amd_detail/grid_launch.h000066400000000000000000000034461450307266000234340ustar00rootroot00000000000000#pragma once #include #include #define GRID_LAUNCH_VERSION 20 // Extern definitions namespace hc{ class completion_future; class accelerator_view; } // 3 dim structure for groups and grids. typedef struct gl_dim3 { int x,y,z; gl_dim3(uint32_t _x=1, uint32_t _y=1, uint32_t _z=1) : x(_x), y(_y), z(_z) {}; } gl_dim3; typedef enum gl_barrier_bit { barrier_bit_queue_default, barrier_bit_none, barrier_bit_wait, } gl_barrier_bit; // grid_launch_parm contains information used to launch the kernel. typedef struct grid_launch_parm { //! Grid dimensions gl_dim3 grid_dim; //! Group dimensions gl_dim3 group_dim; //! Amount of dynamic group memory to use with the kernel launch. //! This memory is in addition to the amount used statically in the kernel. unsigned int dynamic_group_mem_bytes; //! Control setting of barrier bit on per-packet basis: //! See gl_barrier_bit description. //! Placeholder, is not used to control packet dispatch yet enum gl_barrier_bit barrier_bit; //! Value of packet fences to apply to launch. //! The correspond to the value of bits 9:14 in the AQL packet, //! see HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE and hsa_fence_scope_t. unsigned int launch_fence; //! Pointer to the accelerator_view where the kernel should execute. 
//! If NULL, the default view on the default accelerator is used. hc::accelerator_view *av; //! Pointer to the completion_future used to track the status of the command. //! If NULL, the command does not write status. In this case, //! synchronization can be enforced with queue-level waits or //! waiting on younger commands. hc::completion_future *cf; grid_launch_parm() = default; } grid_launch_parm; extern void init_grid_launch(grid_launch_parm *gl); clr-rocm-5.7.1/hipamd/include/hip/amd_detail/grid_launch.hpp000066400000000000000000000025321450307266000237670ustar00rootroot00000000000000#pragma once #include "grid_launch.h" #include "hc.hpp" class grid_launch_parm_cxx : public grid_launch_parm { public: grid_launch_parm_cxx() = default; // customized serialization: don't need av and cf in kernel __attribute__((annotate("serialize"))) void __cxxamp_serialize(Kalmar::Serialize& s) const { s.Append(sizeof(int), &grid_dim.x); s.Append(sizeof(int), &grid_dim.y); s.Append(sizeof(int), &grid_dim.z); s.Append(sizeof(int), &group_dim.x); s.Append(sizeof(int), &group_dim.y); s.Append(sizeof(int), &group_dim.z); } __attribute__((annotate("user_deserialize"))) grid_launch_parm_cxx(int grid_dim_x, int grid_dim_y, int grid_dim_z, int group_dim_x, int group_dim_y, int group_dim_z) { grid_dim.x = grid_dim_x; grid_dim.y = grid_dim_y; grid_dim.z = grid_dim_z; group_dim.x = group_dim_x; group_dim.y = group_dim_y; group_dim.z = group_dim_z; } }; extern inline void grid_launch_init(grid_launch_parm *lp) { lp->grid_dim.x = lp->grid_dim.y = lp->grid_dim.z = 1; lp->group_dim.x = lp->group_dim.y = lp->group_dim.z = 1; lp->dynamic_group_mem_bytes = 0; lp->barrier_bit = barrier_bit_queue_default; lp->launch_fence = -1; // TODO - set to NULL? static hc::accelerator_view av = hc::accelerator().get_default_view(); lp->av = &av; lp->cf = NULL; } clr-rocm-5.7.1/hipamd/include/hip/amd_detail/grid_launch_GGL.hpp000066400000000000000000000023031450307266000244540ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #if GENERIC_GRID_LAUNCH == 1 #include "macro_based_grid_launch.hpp" #endif // GENERIC_GRID_LAUNCHclr-rocm-5.7.1/hipamd/include/hip/amd_detail/helpers.hpp000066400000000000000000000131131450307266000231470ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #include "concepts.hpp" #include // For std::conditional, std::decay, std::enable_if, // std::false_type, std result_of and std::true_type. #include // For std::declval. #ifdef __has_include // Check if __has_include is present # if __has_include() // Check for version header # include # if defined(__cpp_lib_is_invocable) && !defined(HIP_HAS_INVOCABLE) # define HIP_HAS_INVOCABLE __cpp_lib_is_invocable # endif # if defined(__cpp_lib_result_of_sfinae) && !defined(HIP_HAS_RESULT_OF_SFINAE) # define HIP_HAS_RESULT_OF_SFINAE __cpp_lib_result_of_sfinae # endif # endif #endif #ifndef HIP_HAS_INVOCABLE #define HIP_HAS_INVOCABLE 0 #endif #ifndef HIP_HAS_RESULT_OF_SFINAE #define HIP_HAS_RESULT_OF_SFINAE 0 #endif namespace std { // TODO: these should be removed as soon as possible. #if (__cplusplus < 201406L) #if (__cplusplus < 201402L) template using enable_if_t = typename enable_if::type; template using conditional_t = typename conditional::type; template using decay_t = typename decay::type; template using result_of_t = typename result_of::type; template using remove_reference_t = typename remove_reference::type; #endif #endif } // namespace std namespace hip_impl { template using void_t_ = void; #if HIP_HAS_INVOCABLE template struct is_callable_impl; template struct is_callable_impl : std::is_invocable {}; #elif HIP_HAS_RESULT_OF_SFINAE template struct is_callable_impl : std::false_type {}; template struct is_callable_impl::type > > : std::true_type {}; #else template auto simple_invoke(T Base::*pmd, Derived&& ref) -> decltype(static_cast(ref).*pmd); template auto simple_invoke(PMD&& pmd, Pointer&& ptr) -> decltype((*static_cast(ptr)).*static_cast(pmd)); template auto simple_invoke(T Base::*pmd, const std::reference_wrapper& ref) -> decltype(ref.get().*pmd); template auto simple_invoke(T Base::*pmf, Derived&& ref, Args&&... args) -> decltype((static_cast(ref).*pmf)(static_cast(args)...)); template auto simple_invoke(PMF&& pmf, Pointer&& ptr, Args&&... args) -> decltype(((*static_cast(ptr)).*static_cast(pmf))(static_cast(args)...)); template auto simple_invoke(T Base::*pmf, const std::reference_wrapper& ref, Args&&... args) -> decltype((ref.get().*pmf)(static_cast(args)...)); template auto simple_invoke(F&& f, Ts&&... 
xs) -> decltype(f(static_cast(xs)...)); template struct is_callable_impl : std::false_type {}; template struct is_callable_impl(), std::declval()...))> > : std::true_type {}; #endif template struct is_callable : is_callable_impl {}; #define count_macro_args_impl_hip_(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _13, \ _14, _15, _16, _17, _18, _19, _20, _21, _22, _23, _24, _25, \ _26, _27, _28, _29, _30, _31, _n, ...) \ _n #define count_macro_args_hip_(...) \ count_macro_args_impl_hip_(, ##__VA_ARGS__, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, \ 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, \ 0) #define overloaded_macro_expand_hip_(macro, arg_cnt) macro##arg_cnt #define overload_macro_impl_hip_(macro, arg_cnt) overloaded_macro_expand_hip_(macro, arg_cnt) #define overload_macro_hip_(macro, ...) \ overload_macro_impl_hip_(macro, count_macro_args_hip_(__VA_ARGS__))(__VA_ARGS__) } // namespace hip_impl clr-rocm-5.7.1/hipamd/include/hip/amd_detail/hip_cooperative_groups_helper.h000066400000000000000000000200561450307266000272670ustar00rootroot00000000000000/* Copyright (c) 2015 - 2023 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ /** * @file amd_detail/hip_cooperative_groups_helper.h * * @brief Device side implementation of cooperative group feature. * * Defines helper constructs and APIs which aid the types and device API * wrappers defined within `amd_detail/hip_cooperative_groups.h`. 
clr-rocm-5.7.1/hipamd/include/hip/amd_detail/hip_cooperative_groups_helper.h000066400000000000000000000200561450307266000272670ustar00rootroot00000000000000/* Copyright (c) 2015 - 2023 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */

/**
 * @file amd_detail/hip_cooperative_groups_helper.h
 *
 * @brief Device-side implementation of the cooperative groups feature.
 *
 * Defines helper constructs and APIs which aid the types and device API
 * wrappers defined within `amd_detail/hip_cooperative_groups.h`.
 */
#ifndef HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COOPERATIVE_GROUPS_HELPER_H
#define HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COOPERATIVE_GROUPS_HELPER_H

#if __cplusplus
#if !defined(__HIPCC_RTC__)
#include <hip/amd_detail/amd_hip_runtime.h>
#endif

#if !defined(__align__)
#define __align__(x) __attribute__((aligned(x)))
#endif

#if defined(__clang__)
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wreserved-macro-identifier"
#pragma clang diagnostic ignored "-Wc++98-compat"
#pragma clang diagnostic ignored "-Wc++98-compat-pedantic"
#pragma clang diagnostic ignored "-Wshorten-64-to-32"
#endif

#if !defined(__CG_QUALIFIER__)
#define __CG_QUALIFIER__ __device__ __forceinline__
#endif

#if !defined(__CG_STATIC_QUALIFIER__)
#define __CG_STATIC_QUALIFIER__ __device__ static __forceinline__
#endif

#if !defined(_CG_STATIC_CONST_DECL_)
#define _CG_STATIC_CONST_DECL_ static constexpr
#endif

#if __AMDGCN_WAVEFRONT_SIZE == 32
using lane_mask = unsigned int;
#else
using lane_mask = unsigned long long int;
#endif

namespace cooperative_groups {

/* Global scope */
template <unsigned int size>
using is_power_of_2 = std::integral_constant<bool, (size & (size - 1)) == 0>;

template <unsigned int size>
using is_valid_wavefront = std::integral_constant<bool, (size <= __AMDGCN_WAVEFRONT_SIZE)>;

template <unsigned int size>
using is_valid_tile_size =
    std::integral_constant<bool, is_power_of_2<size>::value && is_valid_wavefront<size>::value>;

template <typename T>
using is_valid_type =
    std::integral_constant<bool, std::is_integral<T>::value || std::is_floating_point<T>::value>;

namespace internal {

/**
 * @brief Enums representing different cooperative group types
 * @note This enum is only applicable on Linux.
 */
typedef enum {
  cg_invalid,
  cg_multi_grid,
  cg_grid,
  cg_workgroup,
  cg_tiled_group,
  cg_coalesced_group
} group_type;

/**
 * @ingroup CooperativeG
 * @{
 * This section describes the cooperative groups functions of the HIP runtime API.
 *
 * Cooperative groups provide flexible thread-parallel programming algorithms in
 * which threads cooperate and share data to perform collective computations.
 *
 * @note The cooperative groups feature is implemented on Linux and is under
 * development on Windows.
 */

/**
 * @brief Functionalities related to the multi-grid cooperative group type
 * @note The following cooperative groups functions are only applicable on Linux.
 */
namespace multi_grid {

__CG_STATIC_QUALIFIER__ uint32_t num_grids() {
  return static_cast<uint32_t>(__ockl_multi_grid_num_grids());
}

__CG_STATIC_QUALIFIER__ uint32_t grid_rank() {
  return static_cast<uint32_t>(__ockl_multi_grid_grid_rank());
}

__CG_STATIC_QUALIFIER__ uint32_t size() { return static_cast<uint32_t>(__ockl_multi_grid_size()); }

__CG_STATIC_QUALIFIER__ uint32_t thread_rank() {
  return static_cast<uint32_t>(__ockl_multi_grid_thread_rank());
}

__CG_STATIC_QUALIFIER__ bool is_valid() { return static_cast<bool>(__ockl_multi_grid_is_valid()); }

__CG_STATIC_QUALIFIER__ void sync() { __ockl_multi_grid_sync(); }

}  // namespace multi_grid
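// Illustrative usage sketch (added for exposition; not part of the upstream
// header): grid-wide synchronization is only defined for kernels started
// through a cooperative launch, e.g.
//
//   __global__ void step_kernel(float* data) {
//     auto grid = cooperative_groups::this_grid();
//     // ... phase 1 ...
//     grid.sync();  // backed by internal::grid::sync() below
//     // ... phase 2 ...
//   }
//
//   // Host side (error checking omitted):
//   void* args[] = {&data};
//   hipLaunchCooperativeKernel(reinterpret_cast<void*>(step_kernel),
//                              dim3(blocks), dim3(threadsPerBlock), args, 0, stream);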
/**
 * @brief Functionalities related to the grid cooperative group type
 * @note The following cooperative groups functions are only applicable on Linux.
 */
namespace grid {

__CG_STATIC_QUALIFIER__ uint32_t size() {
  return static_cast<uint32_t>((blockDim.z * gridDim.z) * (blockDim.y * gridDim.y) *
                               (blockDim.x * gridDim.x));
}

__CG_STATIC_QUALIFIER__ uint32_t thread_rank() {
  // Compute the global id of the workgroup to which the current thread belongs
  uint32_t blkIdx = static_cast<uint32_t>((blockIdx.z * gridDim.y * gridDim.x) +
                                          (blockIdx.y * gridDim.x) + (blockIdx.x));

  // Compute the total number of threads in all workgroups that precede the
  // current workgroup within the grid
  uint32_t num_threads_till_current_workgroup =
      static_cast<uint32_t>(blkIdx * (blockDim.x * blockDim.y * blockDim.z));

  // Compute the thread's local rank within the current workgroup
  uint32_t local_thread_rank = static_cast<uint32_t>((threadIdx.z * blockDim.y * blockDim.x) +
                                                     (threadIdx.y * blockDim.x) + (threadIdx.x));

  return (num_threads_till_current_workgroup + local_thread_rank);
}

__CG_STATIC_QUALIFIER__ bool is_valid() { return static_cast<bool>(__ockl_grid_is_valid()); }

__CG_STATIC_QUALIFIER__ void sync() { __ockl_grid_sync(); }

}  // namespace grid

/**
 * @brief Functionalities related to the `workgroup` (thread_block in CUDA terminology)
 * cooperative group type
 * @note The following cooperative groups functions are only applicable on Linux.
 */
namespace workgroup {

__CG_STATIC_QUALIFIER__ dim3 group_index() {
  return (dim3(static_cast<uint32_t>(blockIdx.x), static_cast<uint32_t>(blockIdx.y),
               static_cast<uint32_t>(blockIdx.z)));
}

__CG_STATIC_QUALIFIER__ dim3 thread_index() {
  return (dim3(static_cast<uint32_t>(threadIdx.x), static_cast<uint32_t>(threadIdx.y),
               static_cast<uint32_t>(threadIdx.z)));
}

__CG_STATIC_QUALIFIER__ uint32_t size() {
  return (static_cast<uint32_t>(blockDim.x * blockDim.y * blockDim.z));
}

__CG_STATIC_QUALIFIER__ uint32_t thread_rank() {
  return (static_cast<uint32_t>((threadIdx.z * blockDim.y * blockDim.x) +
                                (threadIdx.y * blockDim.x) + (threadIdx.x)));
}

__CG_STATIC_QUALIFIER__ bool is_valid() { return true; }

__CG_STATIC_QUALIFIER__ void sync() { __syncthreads(); }

__CG_STATIC_QUALIFIER__ dim3 block_dim() {
  return (dim3(static_cast<uint32_t>(blockDim.x), static_cast<uint32_t>(blockDim.y),
               static_cast<uint32_t>(blockDim.z)));
}

}  // namespace workgroup
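// Worked example (added for exposition; not part of the upstream header):
// thread_rank() linearizes (x, y, z) coordinates in row-major order with x
// fastest. For blockDim = (4, 2, 2), the thread at threadIdx = (1, 1, 0) has
//   local rank = 0*(2*4) + 1*4 + 1 = 5,
// and with gridDim = (8, 1, 1) the same thread in blockIdx = (3, 0, 0) has
// grid-wide rank 3*(4*2*2) + 5 = 53.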
namespace tiled_group {

// enforce ordering for memory instructions
__CG_STATIC_QUALIFIER__ void sync() { __builtin_amdgcn_fence(__ATOMIC_ACQ_REL, "agent"); }

}  // namespace tiled_group

namespace coalesced_group {

// enforce ordering for memory instructions
__CG_STATIC_QUALIFIER__ void sync() { __builtin_amdgcn_fence(__ATOMIC_ACQ_REL, "agent"); }

// Masked bit count
//
// For each thread, this function returns the number of active threads which
// have the i-th bit of x set and come before the current thread.
__CG_STATIC_QUALIFIER__ unsigned int masked_bit_count(lane_mask x, unsigned int add = 0) {
  unsigned int counter = 0;
#if __AMDGCN_WAVEFRONT_SIZE == 32
  counter = __builtin_amdgcn_mbcnt_lo(x, add);
#else
  counter = __builtin_amdgcn_mbcnt_lo(static_cast<unsigned int>(x), add);
  counter = __builtin_amdgcn_mbcnt_hi(static_cast<unsigned int>(x >> 32), counter);
#endif

  return counter;
}

}  // namespace coalesced_group

}  // namespace internal

}  // namespace cooperative_groups

/**
 * @}
 */

#if defined(__clang__)
#pragma clang diagnostic pop
#endif

#endif  // __cplusplus
#endif  // HIP_INCLUDE_HIP_AMD_DETAIL_HIP_COOPERATIVE_GROUPS_HELPER_H
clr-rocm-5.7.1/hipamd/include/hip/amd_detail/hip_fp16_gcc.h000066400000000000000000000150041450307266000233760ustar00rootroot00000000000000#pragma once

#if defined(__cplusplus)
#include <cstring>  // For std::memcpy.
#endif

struct __half_raw {
    unsigned short x;
};

struct __half2_raw {
    unsigned short x;
    unsigned short y;
};

#if defined(__cplusplus)
struct __half;

__half __float2half(float);
float __half2float(__half);

// BEGIN STRUCT __HALF
struct __half {
   protected:
    unsigned short __x;

   public:
    // CREATORS
    __half() = default;
    __half(const __half_raw& x) : __x{x.x} {}
#if !defined(__HIP_NO_HALF_CONVERSIONS__)
    __half(float x) : __x{__float2half(x).__x} {}
    __half(double x) : __x{__float2half(x).__x} {}
#endif
    __half(const __half&) = default;
    __half(__half&&) = default;
    ~__half() = default;

    // MANIPULATORS
    __half& operator=(const __half&) = default;
    __half& operator=(__half&&) = default;
    __half& operator=(const __half_raw& x) {
        __x = x.x;
        return *this;
    }
#if !defined(__HIP_NO_HALF_CONVERSIONS__)
    __half& operator=(float x) {
        __x = __float2half(x).__x;
        return *this;
    }
    __half& operator=(double x) { return *this = static_cast<float>(x); }
#endif

    // ACCESSORS
    operator float() const { return __half2float(*this); }
    operator __half_raw() const { return __half_raw{__x}; }
};
// END STRUCT __HALF

// BEGIN STRUCT __HALF2
struct __half2 {
   public:
    __half x;
    __half y;

    // CREATORS
    __half2() = default;
    __half2(const __half2_raw& ix)
        : x{reinterpret_cast<const __half&>(ix.x)}, y{reinterpret_cast<const __half&>(ix.y)} {}
    __half2(const __half& ix, const __half& iy) : x{ix}, y{iy} {}
    __half2(const __half2&) = default;
    __half2(__half2&&) = default;
    ~__half2() = default;

    // MANIPULATORS
    __half2& operator=(const __half2&) = default;
    __half2& operator=(__half2&&) = default;
    __half2& operator=(const __half2_raw& ix) {
        x = reinterpret_cast<const __half&>(ix.x);
        y = reinterpret_cast<const __half&>(ix.y);
        return *this;
    }

    // ACCESSORS
    operator __half2_raw() const {
        return __half2_raw{reinterpret_cast<const unsigned short&>(x),
                           reinterpret_cast<const unsigned short&>(y)};
    }
};
// END STRUCT __HALF2
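// Reference values (added for exposition; not part of the upstream header):
// the helpers below implement IEEE 754 binary16 conversion, so for example
//
//   static_cast<__half_raw>(__half{1.0f}).x    == 0x3C00  // 1.0
//   static_cast<__half_raw>(__half{65504.f}).x == 0x7BFF  // largest finite half
//   static_cast<__half_raw>(__half{1e5f}).x    == 0x7C00  // overflows to +infinity
//
// (The default __float2half path below rounds to nearest, ties to even.)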
inline unsigned short __internal_float2half(float flt, unsigned int& sgn, unsigned int& rem) {
    unsigned int x{};
    std::memcpy(&x, &flt, sizeof(flt));

    unsigned int u = (x & 0x7fffffffU);
    sgn = ((x >> 16) & 0x8000U);

    // NaN/+Inf/-Inf
    if (u >= 0x7f800000U) {
        rem = 0;
        return static_cast<unsigned short>((u == 0x7f800000U) ? (sgn | 0x7c00U) : 0x7fffU);
    }
    // Overflows
    if (u > 0x477fefffU) {
        rem = 0x80000000U;
        return static_cast<unsigned short>(sgn | 0x7bffU);
    }
    // Normal numbers
    if (u >= 0x38800000U) {
        rem = u << 19;
        u -= 0x38000000U;
        return static_cast<unsigned short>(sgn | (u >> 13));
    }
    // +0/-0
    if (u < 0x33000001U) {
        rem = u;
        return static_cast<unsigned short>(sgn);
    }
    // Denormal numbers
    unsigned int exponent = u >> 23;
    unsigned int mantissa = (u & 0x7fffffU);
    unsigned int shift = 0x7eU - exponent;
    mantissa |= 0x800000U;
    rem = mantissa << (32 - shift);
    return static_cast<unsigned short>(sgn | (mantissa >> shift));
}

inline __half __float2half(float x) {
    __half_raw r;
    unsigned int sgn{};
    unsigned int rem{};
    r.x = __internal_float2half(x, sgn, rem);
    if (rem > 0x80000000U || (rem == 0x80000000U && (r.x & 0x1))) ++r.x;

    return r;
}

inline __half __float2half_rn(float x) { return __float2half(x); }

inline __half __float2half_rz(float x) {
    __half_raw r;
    unsigned int sgn{};
    unsigned int rem{};
    r.x = __internal_float2half(x, sgn, rem);

    return r;
}

inline __half __float2half_rd(float x) {
    __half_raw r;
    unsigned int sgn{};
    unsigned int rem{};
    r.x = __internal_float2half(x, sgn, rem);
    if (rem && sgn) ++r.x;

    return r;
}

inline __half __float2half_ru(float x) {
    __half_raw r;
    unsigned int sgn{};
    unsigned int rem{};
    r.x = __internal_float2half(x, sgn, rem);
    if (rem && !sgn) ++r.x;

    return r;
}

inline __half2 __float2half2_rn(float x) { return __half2{__float2half_rn(x), __float2half_rn(x)}; }

inline __half2 __floats2half2_rn(float x, float y) {
    return __half2{__float2half_rn(x), __float2half_rn(y)};
}

inline float __internal_half2float(unsigned short x) {
    unsigned int sign = ((x >> 15) & 1);
    unsigned int exponent = ((x >> 10) & 0x1f);
    unsigned int mantissa = ((x & 0x3ff) << 13);

    if (exponent == 0x1fU) { /* NaN or Inf */
        mantissa = (mantissa ? (sign = 0, 0x7fffffU) : 0);
        exponent = 0xffU;
    } else if (!exponent) { /* Denorm or Zero */
        if (mantissa) {
            unsigned int msb;
            exponent = 0x71U;
            do {
                msb = (mantissa & 0x400000U);
                mantissa <<= 1; /* normalize */
                --exponent;
            } while (!msb);
            mantissa &= 0x7fffffU; /* 1.mantissa is implicit */
        }
    } else {
        exponent += 0x70U;
    }
    unsigned int u = ((sign << 31) | (exponent << 23) | mantissa);
    float f;
    std::memcpy(&f, &u, sizeof(u));

    return f;
}

inline float __half2float(__half x) {
    return __internal_half2float(static_cast<__half_raw>(x).x);
}

inline float __low2float(__half2 x) {
    return __internal_half2float(static_cast<__half2_raw>(x).x);
}

inline float __high2float(__half2 x) {
    return __internal_half2float(static_cast<__half2_raw>(x).y);
}

#if !defined(HIP_NO_HALF)
using half = __half;
using half2 = __half2;
#endif
#endif  // defined(__cplusplus)
clr-rocm-5.7.1/hipamd/include/hip/amd_detail/hip_fp16_math_fwd.h000066400000000000000000000120161450307266000244330ustar00rootroot00000000000000/* Copyright (c) 2015 - 2023 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once // /* // Half Math Functions // */ #if !defined(__HIPCC_RTC__) #include "host_defines.h" #endif #ifndef __CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__ extern "C" { __device__ __attribute__((const)) _Float16 __ocml_ceil_f16(_Float16); __device__ _Float16 __ocml_cos_f16(_Float16); __device__ __attribute__((pure)) _Float16 __ocml_exp_f16(_Float16); __device__ __attribute__((pure)) _Float16 __ocml_exp10_f16(_Float16); __device__ __attribute__((pure)) _Float16 __ocml_exp2_f16(_Float16); __device__ __attribute__((const)) _Float16 __ocml_floor_f16(_Float16); __device__ __attribute__((const)) _Float16 __ocml_fma_f16(_Float16, _Float16, _Float16); __device__ __attribute__((const)) _Float16 __ocml_fabs_f16(_Float16); __device__ __attribute__((const)) int __ocml_isinf_f16(_Float16); __device__ __attribute__((const)) int __ocml_isnan_f16(_Float16); __device__ __attribute__((pure)) _Float16 __ocml_log_f16(_Float16); __device__ __attribute__((pure)) _Float16 __ocml_log10_f16(_Float16); __device__ __attribute__((pure)) _Float16 __ocml_log2_f16(_Float16); __device__ __attribute__((pure)) _Float16 __ocml_pown_f16(_Float16, int); __device__ __attribute__((const)) _Float16 __ocml_rint_f16(_Float16); __device__ __attribute__((const)) _Float16 __ocml_rsqrt_f16(_Float16); __device__ _Float16 __ocml_sin_f16(_Float16); __device__ __attribute__((const)) _Float16 __ocml_sqrt_f16(_Float16); __device__ __attribute__((const)) _Float16 __ocml_trunc_f16(_Float16); __device__ __attribute__((const)) _Float16 __ocml_fmax_f16(_Float16, _Float16); __device__ __attribute__((const)) _Float16 __ocml_fmin_f16(_Float16, _Float16); typedef _Float16 __2f16 __attribute__((ext_vector_type(2))); typedef short __2i16 __attribute__((ext_vector_type(2))); #if defined(__clang__) && defined(__HIP__) __device__ __attribute__((const)) float __ockl_fdot2(__2f16 a, __2f16 b, float c, bool s); #endif __device__ __attribute__((const)) __2f16 __ocml_ceil_2f16(__2f16); __device__ __attribute__((const)) __2f16 __ocml_fabs_2f16(__2f16); __device__ __2f16 __ocml_cos_2f16(__2f16); __device__ __attribute__((pure)) __2f16 __ocml_exp_2f16(__2f16); __device__ __attribute__((pure)) __2f16 __ocml_exp10_2f16(__2f16); __device__ __attribute__((pure)) __2f16 __ocml_exp2_2f16(__2f16); __device__ __attribute__((const)) __2f16 __ocml_floor_2f16(__2f16); __device__ __attribute__((const)) __2f16 __ocml_fma_2f16(__2f16, __2f16, __2f16); __device__ __attribute__((const)) __2i16 __ocml_isinf_2f16(__2f16); __device__ __attribute__((const)) __2i16 __ocml_isnan_2f16(__2f16); __device__ __attribute__((pure)) __2f16 __ocml_log_2f16(__2f16); __device__ __attribute__((pure)) __2f16 __ocml_log10_2f16(__2f16); __device__ __attribute__((pure)) __2f16 __ocml_log2_2f16(__2f16); __device__ __attribute__((const)) __2f16 __ocml_rint_2f16(__2f16); __device__ __attribute__((const)) __2f16 __ocml_rsqrt_2f16(__2f16); __device__ __2f16 __ocml_sin_2f16(__2f16); __device__ __attribute__((const)) __2f16 __ocml_sqrt_2f16(__2f16); __device__ __attribute__((const)) __2f16 __ocml_trunc_2f16(__2f16); __device__ __attribute__((const)) 
_Float16 __ocml_cvtrtn_f16_f32(float);
__device__ __attribute__((const)) _Float16 __ocml_cvtrtp_f16_f32(float);
__device__ __attribute__((const)) _Float16 __ocml_cvtrtz_f16_f32(float);
}
#endif // !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__

// TODO: remove these after they get into the clang header __clang_hip_libdevice_declares.h
extern "C" {
__device__ __attribute__((const)) _Float16 __ocml_fmax_f16(_Float16, _Float16);
__device__ __attribute__((const)) _Float16 __ocml_fmin_f16(_Float16, _Float16);
__device__ __attribute__((const)) _Float16 __ocml_cvtrtn_f16_f32(float);
__device__ __attribute__((const)) _Float16 __ocml_cvtrtp_f16_f32(float);
__device__ __attribute__((const)) _Float16 __ocml_cvtrtz_f16_f32(float);
}
clr-rocm-5.7.1/hipamd/include/hip/amd_detail/hip_ldg.h000066400000000000000000000071041450307266000225560ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/ #ifndef HIP_INCLUDE_HIP_AMD_DETAIL_HIP_LDG_H #define HIP_INCLUDE_HIP_AMD_DETAIL_HIP_LDG_H #if __HIP_CLANG_ONLY__ #include "amd_hip_vector_types.h" #include "host_defines.h" __device__ inline static char __ldg(const char* ptr) { return *ptr; } __device__ inline static char2 __ldg(const char2* ptr) { return *ptr; } __device__ inline static char4 __ldg(const char4* ptr) { return *ptr; } __device__ inline static signed char __ldg(const signed char* ptr) { return ptr[0]; } __device__ inline static unsigned char __ldg(const unsigned char* ptr) { return ptr[0]; } __device__ inline static short __ldg(const short* ptr) { return ptr[0]; } __device__ inline static short2 __ldg(const short2* ptr) { return ptr[0]; } __device__ inline static short4 __ldg(const short4* ptr) { return ptr[0]; } __device__ inline static unsigned short __ldg(const unsigned short* ptr) { return ptr[0]; } __device__ inline static int __ldg(const int* ptr) { return ptr[0]; } __device__ inline static int2 __ldg(const int2* ptr) { return ptr[0]; } __device__ inline static int4 __ldg(const int4* ptr) { return ptr[0]; } __device__ inline static unsigned int __ldg(const unsigned int* ptr) { return ptr[0]; } __device__ inline static long __ldg(const long* ptr) { return ptr[0]; } __device__ inline static unsigned long __ldg(const unsigned long* ptr) { return ptr[0]; } __device__ inline static long long __ldg(const long long* ptr) { return ptr[0]; } __device__ inline static longlong2 __ldg(const longlong2* ptr) { return ptr[0]; } __device__ inline static unsigned long long __ldg(const unsigned long long* ptr) { return ptr[0]; } __device__ inline static uchar2 __ldg(const uchar2* ptr) { return ptr[0]; } __device__ inline static uchar4 __ldg(const uchar4* ptr) { return ptr[0]; } __device__ inline static ushort2 __ldg(const ushort2* ptr) { return ptr[0]; } __device__ inline static uint2 __ldg(const uint2* ptr) { return ptr[0]; } __device__ inline static uint4 __ldg(const uint4* ptr) { return ptr[0]; } __device__ inline static ulonglong2 __ldg(const ulonglong2* ptr) { return ptr[0]; } __device__ inline static float __ldg(const float* ptr) { return ptr[0]; } __device__ inline static float2 __ldg(const float2* ptr) { return ptr[0]; } __device__ inline static float4 __ldg(const float4* ptr) { return ptr[0]; } __device__ inline static double __ldg(const double* ptr) { return ptr[0]; } __device__ inline static double2 __ldg(const double2* ptr) { return ptr[0]; } #endif // __HIP_CLANG_ONLY__ #endif // HIP_LDG_H clr-rocm-5.7.1/hipamd/include/hip/amd_detail/hip_prof_str.h000066400000000000000000022030571450307266000236550ustar00rootroot00000000000000// Generated file. DO NOT EDIT. // // This file is automatically generated by the hip_prof_gen.py script. // If changes are required, run the script and commit the updated file. 
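// Consumer sketch (added for exposition; not part of the generated header):
// tools can translate both ways between callback IDs and API names with the
// helpers defined below, e.g.
//
//   const char* name = hip_api_name(HIP_API_ID_hipMalloc);  // "hipMalloc"
//   uint32_t id = hipApiIdByName("hipMalloc");              // HIP_API_ID_hipMalloc
//
//   // Enumerate every traced API:
//   for (uint32_t i = HIP_API_ID_FIRST; i <= HIP_API_ID_LAST; ++i)
//     printf("%u: %s\n", i, hip_api_name(i));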
#ifndef _HIP_PROF_STR_H #define _HIP_PROF_STR_H #define HIP_PROF_VER 1 // HIP API callbacks ID enumeration enum hip_api_id_t { HIP_API_ID_NONE = 0, HIP_API_ID_FIRST = 1, HIP_API_ID___hipPopCallConfiguration = 1, HIP_API_ID___hipPushCallConfiguration = 2, HIP_API_ID_hipArray3DCreate = 3, HIP_API_ID_hipArrayCreate = 4, HIP_API_ID_hipArrayDestroy = 5, HIP_API_ID_hipChooseDevice = 6, HIP_API_ID_hipConfigureCall = 7, HIP_API_ID_hipCtxCreate = 8, HIP_API_ID_hipCtxDestroy = 9, HIP_API_ID_hipCtxDisablePeerAccess = 10, HIP_API_ID_hipCtxEnablePeerAccess = 11, HIP_API_ID_hipCtxGetApiVersion = 12, HIP_API_ID_hipCtxGetCacheConfig = 13, HIP_API_ID_hipCtxGetCurrent = 14, HIP_API_ID_hipCtxGetDevice = 15, HIP_API_ID_hipCtxGetFlags = 16, HIP_API_ID_hipCtxGetSharedMemConfig = 17, HIP_API_ID_hipCtxPopCurrent = 18, HIP_API_ID_hipCtxPushCurrent = 19, HIP_API_ID_hipCtxSetCacheConfig = 20, HIP_API_ID_hipCtxSetCurrent = 21, HIP_API_ID_hipCtxSetSharedMemConfig = 22, HIP_API_ID_hipCtxSynchronize = 23, HIP_API_ID_hipDestroyExternalMemory = 24, HIP_API_ID_hipDestroyExternalSemaphore = 25, HIP_API_ID_hipDeviceCanAccessPeer = 26, HIP_API_ID_hipDeviceComputeCapability = 27, HIP_API_ID_hipDeviceDisablePeerAccess = 28, HIP_API_ID_hipDeviceEnablePeerAccess = 29, HIP_API_ID_hipDeviceGet = 30, HIP_API_ID_hipDeviceGetAttribute = 31, HIP_API_ID_hipDeviceGetByPCIBusId = 32, HIP_API_ID_hipDeviceGetCacheConfig = 33, HIP_API_ID_hipDeviceGetLimit = 34, HIP_API_ID_hipDeviceGetName = 35, HIP_API_ID_hipDeviceGetP2PAttribute = 36, HIP_API_ID_hipDeviceGetPCIBusId = 37, HIP_API_ID_hipDeviceGetSharedMemConfig = 38, HIP_API_ID_hipDeviceGetStreamPriorityRange = 39, HIP_API_ID_hipDevicePrimaryCtxGetState = 40, HIP_API_ID_hipDevicePrimaryCtxRelease = 41, HIP_API_ID_hipDevicePrimaryCtxReset = 42, HIP_API_ID_hipDevicePrimaryCtxRetain = 43, HIP_API_ID_hipDevicePrimaryCtxSetFlags = 44, HIP_API_ID_hipDeviceReset = 45, HIP_API_ID_hipDeviceSetCacheConfig = 46, HIP_API_ID_hipDeviceSetSharedMemConfig = 47, HIP_API_ID_hipDeviceSynchronize = 48, HIP_API_ID_hipDeviceTotalMem = 49, HIP_API_ID_RESERVED_50 = 50, HIP_API_ID_hipDrvMemcpy2DUnaligned = 51, HIP_API_ID_hipDrvMemcpy3D = 52, HIP_API_ID_hipDrvMemcpy3DAsync = 53, HIP_API_ID_hipEventCreate = 54, HIP_API_ID_hipEventCreateWithFlags = 55, HIP_API_ID_hipEventDestroy = 56, HIP_API_ID_hipEventElapsedTime = 57, HIP_API_ID_hipEventQuery = 58, HIP_API_ID_hipEventRecord = 59, HIP_API_ID_hipEventSynchronize = 60, HIP_API_ID_hipExtGetLinkTypeAndHopCount = 61, HIP_API_ID_hipExtLaunchKernel = 62, HIP_API_ID_hipExtLaunchMultiKernelMultiDevice = 63, HIP_API_ID_hipExtMallocWithFlags = 64, HIP_API_ID_hipExtModuleLaunchKernel = 65, HIP_API_ID_hipExtStreamCreateWithCUMask = 66, HIP_API_ID_hipExtStreamGetCUMask = 67, HIP_API_ID_hipExternalMemoryGetMappedBuffer = 68, HIP_API_ID_hipFree = 69, HIP_API_ID_hipFreeArray = 70, HIP_API_ID_hipFreeHost = 71, HIP_API_ID_hipFreeMipmappedArray = 72, HIP_API_ID_hipFuncGetAttribute = 73, HIP_API_ID_hipFuncGetAttributes = 74, HIP_API_ID_hipFuncSetAttribute = 75, HIP_API_ID_hipFuncSetCacheConfig = 76, HIP_API_ID_hipFuncSetSharedMemConfig = 77, HIP_API_ID_hipGetDevice = 78, HIP_API_ID_hipGetDeviceCount = 79, HIP_API_ID_hipGetDeviceFlags = 80, HIP_API_ID_hipGetDeviceProperties = 81, HIP_API_ID_RESERVED_82 = 82, HIP_API_ID_hipGetErrorString = 83, HIP_API_ID_hipGetLastError = 84, HIP_API_ID_hipGetMipmappedArrayLevel = 85, HIP_API_ID_hipGetSymbolAddress = 86, HIP_API_ID_hipGetSymbolSize = 87, HIP_API_ID_hipHccModuleLaunchKernel = 88, HIP_API_ID_hipHostAlloc = 89, HIP_API_ID_hipHostFree = 90, 
HIP_API_ID_hipHostGetDevicePointer = 91, HIP_API_ID_hipHostGetFlags = 92, HIP_API_ID_hipHostMalloc = 93, HIP_API_ID_hipHostRegister = 94, HIP_API_ID_hipHostUnregister = 95, HIP_API_ID_hipImportExternalMemory = 96, HIP_API_ID_hipImportExternalSemaphore = 97, HIP_API_ID_hipInit = 98, HIP_API_ID_hipIpcCloseMemHandle = 99, HIP_API_ID_hipIpcGetEventHandle = 100, HIP_API_ID_hipIpcGetMemHandle = 101, HIP_API_ID_hipIpcOpenEventHandle = 102, HIP_API_ID_hipIpcOpenMemHandle = 103, HIP_API_ID_hipLaunchByPtr = 104, HIP_API_ID_hipLaunchCooperativeKernel = 105, HIP_API_ID_hipLaunchCooperativeKernelMultiDevice = 106, HIP_API_ID_hipLaunchKernel = 107, HIP_API_ID_hipMalloc = 108, HIP_API_ID_hipMalloc3D = 109, HIP_API_ID_hipMalloc3DArray = 110, HIP_API_ID_hipMallocArray = 111, HIP_API_ID_hipMallocHost = 112, HIP_API_ID_hipMallocManaged = 113, HIP_API_ID_hipMallocMipmappedArray = 114, HIP_API_ID_hipMallocPitch = 115, HIP_API_ID_hipMemAdvise = 116, HIP_API_ID_hipMemAllocHost = 117, HIP_API_ID_hipMemAllocPitch = 118, HIP_API_ID_hipMemGetAddressRange = 119, HIP_API_ID_hipMemGetInfo = 120, HIP_API_ID_hipMemPrefetchAsync = 121, HIP_API_ID_hipMemPtrGetInfo = 122, HIP_API_ID_hipMemRangeGetAttribute = 123, HIP_API_ID_hipMemRangeGetAttributes = 124, HIP_API_ID_hipMemcpy = 125, HIP_API_ID_hipMemcpy2D = 126, HIP_API_ID_hipMemcpy2DAsync = 127, HIP_API_ID_hipMemcpy2DFromArray = 128, HIP_API_ID_hipMemcpy2DFromArrayAsync = 129, HIP_API_ID_hipMemcpy2DToArray = 130, HIP_API_ID_hipMemcpy2DToArrayAsync = 131, HIP_API_ID_hipMemcpy3D = 132, HIP_API_ID_hipMemcpy3DAsync = 133, HIP_API_ID_hipMemcpyAsync = 134, HIP_API_ID_hipMemcpyAtoH = 135, HIP_API_ID_hipMemcpyDtoD = 136, HIP_API_ID_hipMemcpyDtoDAsync = 137, HIP_API_ID_hipMemcpyDtoH = 138, HIP_API_ID_hipMemcpyDtoHAsync = 139, HIP_API_ID_hipMemcpyFromArray = 140, HIP_API_ID_hipMemcpyFromSymbol = 141, HIP_API_ID_hipMemcpyFromSymbolAsync = 142, HIP_API_ID_hipMemcpyHtoA = 143, HIP_API_ID_hipMemcpyHtoD = 144, HIP_API_ID_hipMemcpyHtoDAsync = 145, HIP_API_ID_hipMemcpyParam2D = 146, HIP_API_ID_hipMemcpyParam2DAsync = 147, HIP_API_ID_hipMemcpyPeer = 148, HIP_API_ID_hipMemcpyPeerAsync = 149, HIP_API_ID_hipMemcpyToArray = 150, HIP_API_ID_hipMemcpyToSymbol = 151, HIP_API_ID_hipMemcpyToSymbolAsync = 152, HIP_API_ID_hipMemcpyWithStream = 153, HIP_API_ID_hipMemset = 154, HIP_API_ID_hipMemset2D = 155, HIP_API_ID_hipMemset2DAsync = 156, HIP_API_ID_hipMemset3D = 157, HIP_API_ID_hipMemset3DAsync = 158, HIP_API_ID_hipMemsetAsync = 159, HIP_API_ID_hipMemsetD16 = 160, HIP_API_ID_hipMemsetD16Async = 161, HIP_API_ID_hipMemsetD32 = 162, HIP_API_ID_hipMemsetD32Async = 163, HIP_API_ID_hipMemsetD8 = 164, HIP_API_ID_hipMemsetD8Async = 165, HIP_API_ID_hipModuleGetFunction = 166, HIP_API_ID_hipModuleGetGlobal = 167, HIP_API_ID_hipModuleGetTexRef = 168, HIP_API_ID_hipModuleLaunchKernel = 169, HIP_API_ID_hipModuleLoad = 170, HIP_API_ID_hipModuleLoadData = 171, HIP_API_ID_hipModuleLoadDataEx = 172, HIP_API_ID_hipModuleOccupancyMaxActiveBlocksPerMultiprocessor = 173, HIP_API_ID_hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags = 174, HIP_API_ID_hipModuleOccupancyMaxPotentialBlockSize = 175, HIP_API_ID_hipModuleOccupancyMaxPotentialBlockSizeWithFlags = 176, HIP_API_ID_hipModuleUnload = 177, HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessor = 178, HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags = 179, HIP_API_ID_hipOccupancyMaxPotentialBlockSize = 180, HIP_API_ID_hipPeekAtLastError = 181, HIP_API_ID_hipPointerGetAttributes = 182, HIP_API_ID_hipProfilerStart = 183, 
HIP_API_ID_hipProfilerStop = 184, HIP_API_ID_RESERVED_185 = 185, HIP_API_ID_hipSetDevice = 186, HIP_API_ID_hipSetDeviceFlags = 187, HIP_API_ID_hipSetupArgument = 188, HIP_API_ID_hipSignalExternalSemaphoresAsync = 189, HIP_API_ID_hipStreamAddCallback = 190, HIP_API_ID_hipStreamAttachMemAsync = 191, HIP_API_ID_hipStreamCreate = 192, HIP_API_ID_hipStreamCreateWithFlags = 193, HIP_API_ID_hipStreamCreateWithPriority = 194, HIP_API_ID_hipStreamDestroy = 195, HIP_API_ID_hipStreamGetFlags = 196, HIP_API_ID_hipStreamGetPriority = 197, HIP_API_ID_hipStreamQuery = 198, HIP_API_ID_hipStreamSynchronize = 199, HIP_API_ID_hipStreamWaitEvent = 200, HIP_API_ID_hipStreamWaitValue32 = 201, HIP_API_ID_hipStreamWaitValue64 = 202, HIP_API_ID_hipStreamWriteValue32 = 203, HIP_API_ID_hipStreamWriteValue64 = 204, HIP_API_ID_hipWaitExternalSemaphoresAsync = 205, HIP_API_ID_hipCreateSurfaceObject = 206, HIP_API_ID_hipDestroySurfaceObject = 207, HIP_API_ID_hipGraphAddKernelNode = 208, HIP_API_ID_hipGraphAddMemcpyNode = 209, HIP_API_ID_hipGraphAddMemsetNode = 210, HIP_API_ID_hipGraphCreate = 211, HIP_API_ID_hipGraphDestroy = 212, HIP_API_ID_hipGraphExecDestroy = 213, HIP_API_ID_hipGraphInstantiate = 214, HIP_API_ID_hipGraphLaunch = 215, HIP_API_ID_hipMipmappedArrayCreate = 216, HIP_API_ID_hipMipmappedArrayDestroy = 217, HIP_API_ID_hipMipmappedArrayGetLevel = 218, HIP_API_ID_hipStreamBeginCapture = 219, HIP_API_ID_hipStreamEndCapture = 220, HIP_API_ID_hipTexRefGetAddress = 221, HIP_API_ID_hipTexRefGetFlags = 222, HIP_API_ID_hipTexRefGetFormat = 223, HIP_API_ID_hipTexRefGetMaxAnisotropy = 224, HIP_API_ID_hipTexRefGetMipMappedArray = 225, HIP_API_ID_hipTexRefGetMipmapLevelBias = 226, HIP_API_ID_hipTexRefGetMipmapLevelClamp = 227, HIP_API_ID_hipTexRefSetAddress = 228, HIP_API_ID_hipTexRefSetAddress2D = 229, HIP_API_ID_hipTexRefSetBorderColor = 230, HIP_API_ID_hipTexRefSetFormat = 231, HIP_API_ID_hipTexRefSetMaxAnisotropy = 232, HIP_API_ID_hipTexRefSetMipmapLevelClamp = 233, HIP_API_ID_hipTexRefSetMipmappedArray = 234, HIP_API_ID_hipGLGetDevices = 235, HIP_API_ID_hipGraphAddDependencies = 236, HIP_API_ID_hipGraphAddEmptyNode = 237, HIP_API_ID_hipGraphExecKernelNodeSetParams = 238, HIP_API_ID_hipGraphGetNodes = 239, HIP_API_ID_hipGraphGetRootNodes = 240, HIP_API_ID_hipGraphKernelNodeGetParams = 241, HIP_API_ID_hipGraphKernelNodeSetParams = 242, HIP_API_ID_hipGraphMemcpyNodeGetParams = 243, HIP_API_ID_hipGraphMemcpyNodeSetParams = 244, HIP_API_ID_hipGraphMemsetNodeGetParams = 245, HIP_API_ID_hipGraphMemsetNodeSetParams = 246, HIP_API_ID_hipGraphicsGLRegisterBuffer = 247, HIP_API_ID_hipGraphicsMapResources = 248, HIP_API_ID_hipGraphicsResourceGetMappedPointer = 249, HIP_API_ID_hipGraphicsUnmapResources = 250, HIP_API_ID_hipGraphicsUnregisterResource = 251, HIP_API_ID_hipGraphAddChildGraphNode = 252, HIP_API_ID_hipGraphAddEventRecordNode = 253, HIP_API_ID_hipGraphAddEventWaitNode = 254, HIP_API_ID_hipGraphAddHostNode = 255, HIP_API_ID_hipGraphAddMemcpyNode1D = 256, HIP_API_ID_hipGraphAddMemcpyNodeFromSymbol = 257, HIP_API_ID_hipGraphAddMemcpyNodeToSymbol = 258, HIP_API_ID_hipGraphChildGraphNodeGetGraph = 259, HIP_API_ID_hipGraphClone = 260, HIP_API_ID_hipGraphDestroyNode = 261, HIP_API_ID_hipGraphEventRecordNodeGetEvent = 262, HIP_API_ID_hipGraphEventRecordNodeSetEvent = 263, HIP_API_ID_hipGraphEventWaitNodeGetEvent = 264, HIP_API_ID_hipGraphEventWaitNodeSetEvent = 265, HIP_API_ID_hipGraphExecChildGraphNodeSetParams = 266, HIP_API_ID_hipGraphExecEventRecordNodeSetEvent = 267, HIP_API_ID_hipGraphExecEventWaitNodeSetEvent = 268, 
HIP_API_ID_hipGraphExecHostNodeSetParams = 269, HIP_API_ID_hipGraphExecMemcpyNodeSetParams = 270, HIP_API_ID_hipGraphExecMemcpyNodeSetParams1D = 271, HIP_API_ID_hipGraphExecMemcpyNodeSetParamsFromSymbol = 272, HIP_API_ID_hipGraphExecMemcpyNodeSetParamsToSymbol = 273, HIP_API_ID_hipGraphExecMemsetNodeSetParams = 274, HIP_API_ID_hipGraphExecUpdate = 275, HIP_API_ID_hipGraphGetEdges = 276, HIP_API_ID_hipGraphHostNodeGetParams = 277, HIP_API_ID_hipGraphHostNodeSetParams = 278, HIP_API_ID_hipGraphInstantiateWithFlags = 279, HIP_API_ID_hipGraphMemcpyNodeSetParams1D = 280, HIP_API_ID_hipGraphMemcpyNodeSetParamsFromSymbol = 281, HIP_API_ID_hipGraphMemcpyNodeSetParamsToSymbol = 282, HIP_API_ID_hipGraphNodeFindInClone = 283, HIP_API_ID_hipGraphNodeGetDependencies = 284, HIP_API_ID_hipGraphNodeGetDependentNodes = 285, HIP_API_ID_hipGraphNodeGetType = 286, HIP_API_ID_hipGraphRemoveDependencies = 287, HIP_API_ID_hipStreamGetCaptureInfo = 288, HIP_API_ID_hipStreamGetCaptureInfo_v2 = 289, HIP_API_ID_hipStreamIsCapturing = 290, HIP_API_ID_hipStreamUpdateCaptureDependencies = 291, HIP_API_ID_hipDrvPointerGetAttributes = 292, HIP_API_ID_hipGraphicsGLRegisterImage = 293, HIP_API_ID_hipGraphicsSubResourceGetMappedArray = 294, HIP_API_ID_hipPointerGetAttribute = 295, HIP_API_ID_RESERVED_296 = 296, HIP_API_ID_hipThreadExchangeStreamCaptureMode = 297, HIP_API_ID_hipDeviceGetUuid = 298, HIP_API_ID_hipGetChannelDesc = 299, HIP_API_ID_hipGraphKernelNodeGetAttribute = 300, HIP_API_ID_hipGraphKernelNodeSetAttribute = 301, HIP_API_ID_hipLaunchHostFunc = 302, HIP_API_ID_hipDeviceGetDefaultMemPool = 303, HIP_API_ID_hipDeviceGetMemPool = 304, HIP_API_ID_hipDeviceSetMemPool = 305, HIP_API_ID_hipFreeAsync = 306, HIP_API_ID_hipMallocAsync = 307, HIP_API_ID_hipMallocFromPoolAsync = 308, HIP_API_ID_hipMemPoolCreate = 309, HIP_API_ID_hipMemPoolDestroy = 310, HIP_API_ID_hipMemPoolExportPointer = 311, HIP_API_ID_hipMemPoolExportToShareableHandle = 312, HIP_API_ID_hipMemPoolGetAccess = 313, HIP_API_ID_hipMemPoolGetAttribute = 314, HIP_API_ID_hipMemPoolImportFromShareableHandle = 315, HIP_API_ID_hipMemPoolImportPointer = 316, HIP_API_ID_hipMemPoolSetAccess = 317, HIP_API_ID_hipMemPoolSetAttribute = 318, HIP_API_ID_hipMemPoolTrimTo = 319, HIP_API_ID_hipMemAddressFree = 320, HIP_API_ID_hipMemAddressReserve = 321, HIP_API_ID_hipMemCreate = 322, HIP_API_ID_hipMemExportToShareableHandle = 323, HIP_API_ID_hipMemGetAccess = 324, HIP_API_ID_hipMemGetAllocationGranularity = 325, HIP_API_ID_hipMemGetAllocationPropertiesFromHandle = 326, HIP_API_ID_hipMemImportFromShareableHandle = 327, HIP_API_ID_hipMemMap = 328, HIP_API_ID_hipMemMapArrayAsync = 329, HIP_API_ID_hipMemRelease = 330, HIP_API_ID_hipMemRetainAllocationHandle = 331, HIP_API_ID_hipMemSetAccess = 332, HIP_API_ID_hipMemUnmap = 333, HIP_API_ID_hipDeviceSetGraphMemAttribute = 334, HIP_API_ID_hipDeviceGetGraphMemAttribute = 335, HIP_API_ID_hipDeviceGraphMemTrim = 336, HIP_API_ID_hipDeviceSetLimit = 337, HIP_API_ID_hipTexRefSetArray = 338, HIP_API_ID_hipTexRefSetFlags = 339, HIP_API_ID_hipTexRefSetMipmapLevelBias = 340, HIP_API_ID_hipDriverGetVersion = 341, HIP_API_ID_hipGraphUpload = 342, HIP_API_ID_hipRuntimeGetVersion = 343, HIP_API_ID_hipUserObjectCreate = 344, HIP_API_ID_hipUserObjectRelease = 345, HIP_API_ID_hipUserObjectRetain = 346, HIP_API_ID_hipGraphRetainUserObject = 347, HIP_API_ID_hipGraphReleaseUserObject = 348, HIP_API_ID_hipGraphDebugDotPrint = 349, HIP_API_ID_hipGraphKernelNodeCopyAttributes = 350, HIP_API_ID_hipGraphNodeGetEnabled = 351, 
HIP_API_ID_hipGraphNodeSetEnabled = 352, HIP_API_ID_hipPointerSetAttribute = 353, HIP_API_ID_hipGraphAddMemAllocNode = 354, HIP_API_ID_hipGraphAddMemFreeNode = 355, HIP_API_ID_hipGraphMemAllocNodeGetParams = 356, HIP_API_ID_hipGraphMemFreeNodeGetParams = 357, HIP_API_ID_hipModuleLaunchCooperativeKernel = 358, HIP_API_ID_hipModuleLaunchCooperativeKernelMultiDevice = 359, HIP_API_ID_hipArray3DGetDescriptor = 360, HIP_API_ID_hipArrayGetDescriptor = 361, HIP_API_ID_hipArrayGetInfo = 362, HIP_API_ID_hipStreamGetDevice = 363, HIP_API_ID_LAST = 363, HIP_API_ID_hipBindTexture = HIP_API_ID_NONE, HIP_API_ID_hipBindTexture2D = HIP_API_ID_NONE, HIP_API_ID_hipBindTextureToArray = HIP_API_ID_NONE, HIP_API_ID_hipBindTextureToMipmappedArray = HIP_API_ID_NONE, HIP_API_ID_hipCreateTextureObject = HIP_API_ID_NONE, HIP_API_ID_hipDestroyTextureObject = HIP_API_ID_NONE, HIP_API_ID_hipDeviceGetCount = HIP_API_ID_NONE, HIP_API_ID_hipGetTextureAlignmentOffset = HIP_API_ID_NONE, HIP_API_ID_hipGetTextureObjectResourceDesc = HIP_API_ID_NONE, HIP_API_ID_hipGetTextureObjectResourceViewDesc = HIP_API_ID_NONE, HIP_API_ID_hipGetTextureObjectTextureDesc = HIP_API_ID_NONE, HIP_API_ID_hipGetTextureReference = HIP_API_ID_NONE, HIP_API_ID_hipMemcpy2DArrayToArray = HIP_API_ID_NONE, HIP_API_ID_hipMemcpyArrayToArray = HIP_API_ID_NONE, HIP_API_ID_hipMemcpyAtoA = HIP_API_ID_NONE, HIP_API_ID_hipMemcpyAtoD = HIP_API_ID_NONE, HIP_API_ID_hipMemcpyAtoHAsync = HIP_API_ID_NONE, HIP_API_ID_hipMemcpyDtoA = HIP_API_ID_NONE, HIP_API_ID_hipMemcpyFromArrayAsync = HIP_API_ID_NONE, HIP_API_ID_hipMemcpyHtoAAsync = HIP_API_ID_NONE, HIP_API_ID_hipMemcpyToArrayAsync = HIP_API_ID_NONE, HIP_API_ID_hipModuleLaunchKernelExt = HIP_API_ID_NONE, HIP_API_ID_hipSetValidDevices = HIP_API_ID_NONE, HIP_API_ID_hipTexObjectCreate = HIP_API_ID_NONE, HIP_API_ID_hipTexObjectDestroy = HIP_API_ID_NONE, HIP_API_ID_hipTexObjectGetResourceDesc = HIP_API_ID_NONE, HIP_API_ID_hipTexObjectGetResourceViewDesc = HIP_API_ID_NONE, HIP_API_ID_hipTexObjectGetTextureDesc = HIP_API_ID_NONE, HIP_API_ID_hipTexRefGetAddressMode = HIP_API_ID_NONE, HIP_API_ID_hipTexRefGetArray = HIP_API_ID_NONE, HIP_API_ID_hipTexRefGetBorderColor = HIP_API_ID_NONE, HIP_API_ID_hipTexRefGetFilterMode = HIP_API_ID_NONE, HIP_API_ID_hipTexRefGetMipmapFilterMode = HIP_API_ID_NONE, HIP_API_ID_hipTexRefGetMipmappedArray = HIP_API_ID_NONE, HIP_API_ID_hipTexRefSetAddressMode = HIP_API_ID_NONE, HIP_API_ID_hipTexRefSetFilterMode = HIP_API_ID_NONE, HIP_API_ID_hipTexRefSetMipmapFilterMode = HIP_API_ID_NONE, HIP_API_ID_hipUnbindTexture = HIP_API_ID_NONE, }; // Return the HIP API string for a given callback ID static inline const char* hip_api_name(const uint32_t id) { switch(id) { case HIP_API_ID___hipPopCallConfiguration: return "__hipPopCallConfiguration"; case HIP_API_ID___hipPushCallConfiguration: return "__hipPushCallConfiguration"; case HIP_API_ID_hipArray3DCreate: return "hipArray3DCreate"; case HIP_API_ID_hipArray3DGetDescriptor: return "hipArray3DGetDescriptor"; case HIP_API_ID_hipArrayCreate: return "hipArrayCreate"; case HIP_API_ID_hipArrayDestroy: return "hipArrayDestroy"; case HIP_API_ID_hipArrayGetDescriptor: return "hipArrayGetDescriptor"; case HIP_API_ID_hipArrayGetInfo: return "hipArrayGetInfo"; case HIP_API_ID_hipChooseDevice: return "hipChooseDevice"; case HIP_API_ID_hipConfigureCall: return "hipConfigureCall"; case HIP_API_ID_hipCreateSurfaceObject: return "hipCreateSurfaceObject"; case HIP_API_ID_hipCtxCreate: return "hipCtxCreate"; case HIP_API_ID_hipCtxDestroy: return "hipCtxDestroy"; case 
HIP_API_ID_hipCtxDisablePeerAccess: return "hipCtxDisablePeerAccess"; case HIP_API_ID_hipCtxEnablePeerAccess: return "hipCtxEnablePeerAccess"; case HIP_API_ID_hipCtxGetApiVersion: return "hipCtxGetApiVersion"; case HIP_API_ID_hipCtxGetCacheConfig: return "hipCtxGetCacheConfig"; case HIP_API_ID_hipCtxGetCurrent: return "hipCtxGetCurrent"; case HIP_API_ID_hipCtxGetDevice: return "hipCtxGetDevice"; case HIP_API_ID_hipCtxGetFlags: return "hipCtxGetFlags"; case HIP_API_ID_hipCtxGetSharedMemConfig: return "hipCtxGetSharedMemConfig"; case HIP_API_ID_hipCtxPopCurrent: return "hipCtxPopCurrent"; case HIP_API_ID_hipCtxPushCurrent: return "hipCtxPushCurrent"; case HIP_API_ID_hipCtxSetCacheConfig: return "hipCtxSetCacheConfig"; case HIP_API_ID_hipCtxSetCurrent: return "hipCtxSetCurrent"; case HIP_API_ID_hipCtxSetSharedMemConfig: return "hipCtxSetSharedMemConfig"; case HIP_API_ID_hipCtxSynchronize: return "hipCtxSynchronize"; case HIP_API_ID_hipDestroyExternalMemory: return "hipDestroyExternalMemory"; case HIP_API_ID_hipDestroyExternalSemaphore: return "hipDestroyExternalSemaphore"; case HIP_API_ID_hipDestroySurfaceObject: return "hipDestroySurfaceObject"; case HIP_API_ID_hipDeviceCanAccessPeer: return "hipDeviceCanAccessPeer"; case HIP_API_ID_hipDeviceComputeCapability: return "hipDeviceComputeCapability"; case HIP_API_ID_hipDeviceDisablePeerAccess: return "hipDeviceDisablePeerAccess"; case HIP_API_ID_hipDeviceEnablePeerAccess: return "hipDeviceEnablePeerAccess"; case HIP_API_ID_hipDeviceGet: return "hipDeviceGet"; case HIP_API_ID_hipDeviceGetAttribute: return "hipDeviceGetAttribute"; case HIP_API_ID_hipDeviceGetByPCIBusId: return "hipDeviceGetByPCIBusId"; case HIP_API_ID_hipDeviceGetCacheConfig: return "hipDeviceGetCacheConfig"; case HIP_API_ID_hipDeviceGetDefaultMemPool: return "hipDeviceGetDefaultMemPool"; case HIP_API_ID_hipDeviceGetGraphMemAttribute: return "hipDeviceGetGraphMemAttribute"; case HIP_API_ID_hipDeviceGetLimit: return "hipDeviceGetLimit"; case HIP_API_ID_hipDeviceGetMemPool: return "hipDeviceGetMemPool"; case HIP_API_ID_hipDeviceGetName: return "hipDeviceGetName"; case HIP_API_ID_hipDeviceGetP2PAttribute: return "hipDeviceGetP2PAttribute"; case HIP_API_ID_hipDeviceGetPCIBusId: return "hipDeviceGetPCIBusId"; case HIP_API_ID_hipDeviceGetSharedMemConfig: return "hipDeviceGetSharedMemConfig"; case HIP_API_ID_hipDeviceGetStreamPriorityRange: return "hipDeviceGetStreamPriorityRange"; case HIP_API_ID_hipDeviceGetUuid: return "hipDeviceGetUuid"; case HIP_API_ID_hipDeviceGraphMemTrim: return "hipDeviceGraphMemTrim"; case HIP_API_ID_hipDevicePrimaryCtxGetState: return "hipDevicePrimaryCtxGetState"; case HIP_API_ID_hipDevicePrimaryCtxRelease: return "hipDevicePrimaryCtxRelease"; case HIP_API_ID_hipDevicePrimaryCtxReset: return "hipDevicePrimaryCtxReset"; case HIP_API_ID_hipDevicePrimaryCtxRetain: return "hipDevicePrimaryCtxRetain"; case HIP_API_ID_hipDevicePrimaryCtxSetFlags: return "hipDevicePrimaryCtxSetFlags"; case HIP_API_ID_hipDeviceReset: return "hipDeviceReset"; case HIP_API_ID_hipDeviceSetCacheConfig: return "hipDeviceSetCacheConfig"; case HIP_API_ID_hipDeviceSetGraphMemAttribute: return "hipDeviceSetGraphMemAttribute"; case HIP_API_ID_hipDeviceSetLimit: return "hipDeviceSetLimit"; case HIP_API_ID_hipDeviceSetMemPool: return "hipDeviceSetMemPool"; case HIP_API_ID_hipDeviceSetSharedMemConfig: return "hipDeviceSetSharedMemConfig"; case HIP_API_ID_hipDeviceSynchronize: return "hipDeviceSynchronize"; case HIP_API_ID_hipDeviceTotalMem: return "hipDeviceTotalMem"; case 
HIP_API_ID_hipDriverGetVersion: return "hipDriverGetVersion"; case HIP_API_ID_hipDrvMemcpy2DUnaligned: return "hipDrvMemcpy2DUnaligned"; case HIP_API_ID_hipDrvMemcpy3D: return "hipDrvMemcpy3D"; case HIP_API_ID_hipDrvMemcpy3DAsync: return "hipDrvMemcpy3DAsync"; case HIP_API_ID_hipDrvPointerGetAttributes: return "hipDrvPointerGetAttributes"; case HIP_API_ID_hipEventCreate: return "hipEventCreate"; case HIP_API_ID_hipEventCreateWithFlags: return "hipEventCreateWithFlags"; case HIP_API_ID_hipEventDestroy: return "hipEventDestroy"; case HIP_API_ID_hipEventElapsedTime: return "hipEventElapsedTime"; case HIP_API_ID_hipEventQuery: return "hipEventQuery"; case HIP_API_ID_hipEventRecord: return "hipEventRecord"; case HIP_API_ID_hipEventSynchronize: return "hipEventSynchronize"; case HIP_API_ID_hipExtGetLinkTypeAndHopCount: return "hipExtGetLinkTypeAndHopCount"; case HIP_API_ID_hipExtLaunchKernel: return "hipExtLaunchKernel"; case HIP_API_ID_hipExtLaunchMultiKernelMultiDevice: return "hipExtLaunchMultiKernelMultiDevice"; case HIP_API_ID_hipExtMallocWithFlags: return "hipExtMallocWithFlags"; case HIP_API_ID_hipExtModuleLaunchKernel: return "hipExtModuleLaunchKernel"; case HIP_API_ID_hipExtStreamCreateWithCUMask: return "hipExtStreamCreateWithCUMask"; case HIP_API_ID_hipExtStreamGetCUMask: return "hipExtStreamGetCUMask"; case HIP_API_ID_hipExternalMemoryGetMappedBuffer: return "hipExternalMemoryGetMappedBuffer"; case HIP_API_ID_hipFree: return "hipFree"; case HIP_API_ID_hipFreeArray: return "hipFreeArray"; case HIP_API_ID_hipFreeAsync: return "hipFreeAsync"; case HIP_API_ID_hipFreeHost: return "hipFreeHost"; case HIP_API_ID_hipFreeMipmappedArray: return "hipFreeMipmappedArray"; case HIP_API_ID_hipFuncGetAttribute: return "hipFuncGetAttribute"; case HIP_API_ID_hipFuncGetAttributes: return "hipFuncGetAttributes"; case HIP_API_ID_hipFuncSetAttribute: return "hipFuncSetAttribute"; case HIP_API_ID_hipFuncSetCacheConfig: return "hipFuncSetCacheConfig"; case HIP_API_ID_hipFuncSetSharedMemConfig: return "hipFuncSetSharedMemConfig"; case HIP_API_ID_hipGLGetDevices: return "hipGLGetDevices"; case HIP_API_ID_hipGetChannelDesc: return "hipGetChannelDesc"; case HIP_API_ID_hipGetDevice: return "hipGetDevice"; case HIP_API_ID_hipGetDeviceCount: return "hipGetDeviceCount"; case HIP_API_ID_hipGetDeviceFlags: return "hipGetDeviceFlags"; case HIP_API_ID_hipGetDeviceProperties: return "hipGetDeviceProperties"; case HIP_API_ID_hipGetErrorString: return "hipGetErrorString"; case HIP_API_ID_hipGetLastError: return "hipGetLastError"; case HIP_API_ID_hipGetMipmappedArrayLevel: return "hipGetMipmappedArrayLevel"; case HIP_API_ID_hipGetSymbolAddress: return "hipGetSymbolAddress"; case HIP_API_ID_hipGetSymbolSize: return "hipGetSymbolSize"; case HIP_API_ID_hipGraphAddChildGraphNode: return "hipGraphAddChildGraphNode"; case HIP_API_ID_hipGraphAddDependencies: return "hipGraphAddDependencies"; case HIP_API_ID_hipGraphAddEmptyNode: return "hipGraphAddEmptyNode"; case HIP_API_ID_hipGraphAddEventRecordNode: return "hipGraphAddEventRecordNode"; case HIP_API_ID_hipGraphAddEventWaitNode: return "hipGraphAddEventWaitNode"; case HIP_API_ID_hipGraphAddHostNode: return "hipGraphAddHostNode"; case HIP_API_ID_hipGraphAddKernelNode: return "hipGraphAddKernelNode"; case HIP_API_ID_hipGraphAddMemAllocNode: return "hipGraphAddMemAllocNode"; case HIP_API_ID_hipGraphAddMemFreeNode: return "hipGraphAddMemFreeNode"; case HIP_API_ID_hipGraphAddMemcpyNode: return "hipGraphAddMemcpyNode"; case HIP_API_ID_hipGraphAddMemcpyNode1D: return 
"hipGraphAddMemcpyNode1D"; case HIP_API_ID_hipGraphAddMemcpyNodeFromSymbol: return "hipGraphAddMemcpyNodeFromSymbol"; case HIP_API_ID_hipGraphAddMemcpyNodeToSymbol: return "hipGraphAddMemcpyNodeToSymbol"; case HIP_API_ID_hipGraphAddMemsetNode: return "hipGraphAddMemsetNode"; case HIP_API_ID_hipGraphChildGraphNodeGetGraph: return "hipGraphChildGraphNodeGetGraph"; case HIP_API_ID_hipGraphClone: return "hipGraphClone"; case HIP_API_ID_hipGraphCreate: return "hipGraphCreate"; case HIP_API_ID_hipGraphDebugDotPrint: return "hipGraphDebugDotPrint"; case HIP_API_ID_hipGraphDestroy: return "hipGraphDestroy"; case HIP_API_ID_hipGraphDestroyNode: return "hipGraphDestroyNode"; case HIP_API_ID_hipGraphEventRecordNodeGetEvent: return "hipGraphEventRecordNodeGetEvent"; case HIP_API_ID_hipGraphEventRecordNodeSetEvent: return "hipGraphEventRecordNodeSetEvent"; case HIP_API_ID_hipGraphEventWaitNodeGetEvent: return "hipGraphEventWaitNodeGetEvent"; case HIP_API_ID_hipGraphEventWaitNodeSetEvent: return "hipGraphEventWaitNodeSetEvent"; case HIP_API_ID_hipGraphExecChildGraphNodeSetParams: return "hipGraphExecChildGraphNodeSetParams"; case HIP_API_ID_hipGraphExecDestroy: return "hipGraphExecDestroy"; case HIP_API_ID_hipGraphExecEventRecordNodeSetEvent: return "hipGraphExecEventRecordNodeSetEvent"; case HIP_API_ID_hipGraphExecEventWaitNodeSetEvent: return "hipGraphExecEventWaitNodeSetEvent"; case HIP_API_ID_hipGraphExecHostNodeSetParams: return "hipGraphExecHostNodeSetParams"; case HIP_API_ID_hipGraphExecKernelNodeSetParams: return "hipGraphExecKernelNodeSetParams"; case HIP_API_ID_hipGraphExecMemcpyNodeSetParams: return "hipGraphExecMemcpyNodeSetParams"; case HIP_API_ID_hipGraphExecMemcpyNodeSetParams1D: return "hipGraphExecMemcpyNodeSetParams1D"; case HIP_API_ID_hipGraphExecMemcpyNodeSetParamsFromSymbol: return "hipGraphExecMemcpyNodeSetParamsFromSymbol"; case HIP_API_ID_hipGraphExecMemcpyNodeSetParamsToSymbol: return "hipGraphExecMemcpyNodeSetParamsToSymbol"; case HIP_API_ID_hipGraphExecMemsetNodeSetParams: return "hipGraphExecMemsetNodeSetParams"; case HIP_API_ID_hipGraphExecUpdate: return "hipGraphExecUpdate"; case HIP_API_ID_hipGraphGetEdges: return "hipGraphGetEdges"; case HIP_API_ID_hipGraphGetNodes: return "hipGraphGetNodes"; case HIP_API_ID_hipGraphGetRootNodes: return "hipGraphGetRootNodes"; case HIP_API_ID_hipGraphHostNodeGetParams: return "hipGraphHostNodeGetParams"; case HIP_API_ID_hipGraphHostNodeSetParams: return "hipGraphHostNodeSetParams"; case HIP_API_ID_hipGraphInstantiate: return "hipGraphInstantiate"; case HIP_API_ID_hipGraphInstantiateWithFlags: return "hipGraphInstantiateWithFlags"; case HIP_API_ID_hipGraphKernelNodeCopyAttributes: return "hipGraphKernelNodeCopyAttributes"; case HIP_API_ID_hipGraphKernelNodeGetAttribute: return "hipGraphKernelNodeGetAttribute"; case HIP_API_ID_hipGraphKernelNodeGetParams: return "hipGraphKernelNodeGetParams"; case HIP_API_ID_hipGraphKernelNodeSetAttribute: return "hipGraphKernelNodeSetAttribute"; case HIP_API_ID_hipGraphKernelNodeSetParams: return "hipGraphKernelNodeSetParams"; case HIP_API_ID_hipGraphLaunch: return "hipGraphLaunch"; case HIP_API_ID_hipGraphMemAllocNodeGetParams: return "hipGraphMemAllocNodeGetParams"; case HIP_API_ID_hipGraphMemFreeNodeGetParams: return "hipGraphMemFreeNodeGetParams"; case HIP_API_ID_hipGraphMemcpyNodeGetParams: return "hipGraphMemcpyNodeGetParams"; case HIP_API_ID_hipGraphMemcpyNodeSetParams: return "hipGraphMemcpyNodeSetParams"; case HIP_API_ID_hipGraphMemcpyNodeSetParams1D: return "hipGraphMemcpyNodeSetParams1D"; case 
HIP_API_ID_hipGraphMemcpyNodeSetParamsFromSymbol: return "hipGraphMemcpyNodeSetParamsFromSymbol"; case HIP_API_ID_hipGraphMemcpyNodeSetParamsToSymbol: return "hipGraphMemcpyNodeSetParamsToSymbol"; case HIP_API_ID_hipGraphMemsetNodeGetParams: return "hipGraphMemsetNodeGetParams"; case HIP_API_ID_hipGraphMemsetNodeSetParams: return "hipGraphMemsetNodeSetParams"; case HIP_API_ID_hipGraphNodeFindInClone: return "hipGraphNodeFindInClone"; case HIP_API_ID_hipGraphNodeGetDependencies: return "hipGraphNodeGetDependencies"; case HIP_API_ID_hipGraphNodeGetDependentNodes: return "hipGraphNodeGetDependentNodes"; case HIP_API_ID_hipGraphNodeGetEnabled: return "hipGraphNodeGetEnabled"; case HIP_API_ID_hipGraphNodeGetType: return "hipGraphNodeGetType"; case HIP_API_ID_hipGraphNodeSetEnabled: return "hipGraphNodeSetEnabled"; case HIP_API_ID_hipGraphReleaseUserObject: return "hipGraphReleaseUserObject"; case HIP_API_ID_hipGraphRemoveDependencies: return "hipGraphRemoveDependencies"; case HIP_API_ID_hipGraphRetainUserObject: return "hipGraphRetainUserObject"; case HIP_API_ID_hipGraphUpload: return "hipGraphUpload"; case HIP_API_ID_hipGraphicsGLRegisterBuffer: return "hipGraphicsGLRegisterBuffer"; case HIP_API_ID_hipGraphicsGLRegisterImage: return "hipGraphicsGLRegisterImage"; case HIP_API_ID_hipGraphicsMapResources: return "hipGraphicsMapResources"; case HIP_API_ID_hipGraphicsResourceGetMappedPointer: return "hipGraphicsResourceGetMappedPointer"; case HIP_API_ID_hipGraphicsSubResourceGetMappedArray: return "hipGraphicsSubResourceGetMappedArray"; case HIP_API_ID_hipGraphicsUnmapResources: return "hipGraphicsUnmapResources"; case HIP_API_ID_hipGraphicsUnregisterResource: return "hipGraphicsUnregisterResource"; case HIP_API_ID_hipHccModuleLaunchKernel: return "hipHccModuleLaunchKernel"; case HIP_API_ID_hipHostAlloc: return "hipHostAlloc"; case HIP_API_ID_hipHostFree: return "hipHostFree"; case HIP_API_ID_hipHostGetDevicePointer: return "hipHostGetDevicePointer"; case HIP_API_ID_hipHostGetFlags: return "hipHostGetFlags"; case HIP_API_ID_hipHostMalloc: return "hipHostMalloc"; case HIP_API_ID_hipHostRegister: return "hipHostRegister"; case HIP_API_ID_hipHostUnregister: return "hipHostUnregister"; case HIP_API_ID_hipImportExternalMemory: return "hipImportExternalMemory"; case HIP_API_ID_hipImportExternalSemaphore: return "hipImportExternalSemaphore"; case HIP_API_ID_hipInit: return "hipInit"; case HIP_API_ID_hipIpcCloseMemHandle: return "hipIpcCloseMemHandle"; case HIP_API_ID_hipIpcGetEventHandle: return "hipIpcGetEventHandle"; case HIP_API_ID_hipIpcGetMemHandle: return "hipIpcGetMemHandle"; case HIP_API_ID_hipIpcOpenEventHandle: return "hipIpcOpenEventHandle"; case HIP_API_ID_hipIpcOpenMemHandle: return "hipIpcOpenMemHandle"; case HIP_API_ID_hipLaunchByPtr: return "hipLaunchByPtr"; case HIP_API_ID_hipLaunchCooperativeKernel: return "hipLaunchCooperativeKernel"; case HIP_API_ID_hipLaunchCooperativeKernelMultiDevice: return "hipLaunchCooperativeKernelMultiDevice"; case HIP_API_ID_hipLaunchHostFunc: return "hipLaunchHostFunc"; case HIP_API_ID_hipLaunchKernel: return "hipLaunchKernel"; case HIP_API_ID_hipMalloc: return "hipMalloc"; case HIP_API_ID_hipMalloc3D: return "hipMalloc3D"; case HIP_API_ID_hipMalloc3DArray: return "hipMalloc3DArray"; case HIP_API_ID_hipMallocArray: return "hipMallocArray"; case HIP_API_ID_hipMallocAsync: return "hipMallocAsync"; case HIP_API_ID_hipMallocFromPoolAsync: return "hipMallocFromPoolAsync"; case HIP_API_ID_hipMallocHost: return "hipMallocHost"; case HIP_API_ID_hipMallocManaged: 
return "hipMallocManaged"; case HIP_API_ID_hipMallocMipmappedArray: return "hipMallocMipmappedArray"; case HIP_API_ID_hipMallocPitch: return "hipMallocPitch"; case HIP_API_ID_hipMemAddressFree: return "hipMemAddressFree"; case HIP_API_ID_hipMemAddressReserve: return "hipMemAddressReserve"; case HIP_API_ID_hipMemAdvise: return "hipMemAdvise"; case HIP_API_ID_hipMemAllocHost: return "hipMemAllocHost"; case HIP_API_ID_hipMemAllocPitch: return "hipMemAllocPitch"; case HIP_API_ID_hipMemCreate: return "hipMemCreate"; case HIP_API_ID_hipMemExportToShareableHandle: return "hipMemExportToShareableHandle"; case HIP_API_ID_hipMemGetAccess: return "hipMemGetAccess"; case HIP_API_ID_hipMemGetAddressRange: return "hipMemGetAddressRange"; case HIP_API_ID_hipMemGetAllocationGranularity: return "hipMemGetAllocationGranularity"; case HIP_API_ID_hipMemGetAllocationPropertiesFromHandle: return "hipMemGetAllocationPropertiesFromHandle"; case HIP_API_ID_hipMemGetInfo: return "hipMemGetInfo"; case HIP_API_ID_hipMemImportFromShareableHandle: return "hipMemImportFromShareableHandle"; case HIP_API_ID_hipMemMap: return "hipMemMap"; case HIP_API_ID_hipMemMapArrayAsync: return "hipMemMapArrayAsync"; case HIP_API_ID_hipMemPoolCreate: return "hipMemPoolCreate"; case HIP_API_ID_hipMemPoolDestroy: return "hipMemPoolDestroy"; case HIP_API_ID_hipMemPoolExportPointer: return "hipMemPoolExportPointer"; case HIP_API_ID_hipMemPoolExportToShareableHandle: return "hipMemPoolExportToShareableHandle"; case HIP_API_ID_hipMemPoolGetAccess: return "hipMemPoolGetAccess"; case HIP_API_ID_hipMemPoolGetAttribute: return "hipMemPoolGetAttribute"; case HIP_API_ID_hipMemPoolImportFromShareableHandle: return "hipMemPoolImportFromShareableHandle"; case HIP_API_ID_hipMemPoolImportPointer: return "hipMemPoolImportPointer"; case HIP_API_ID_hipMemPoolSetAccess: return "hipMemPoolSetAccess"; case HIP_API_ID_hipMemPoolSetAttribute: return "hipMemPoolSetAttribute"; case HIP_API_ID_hipMemPoolTrimTo: return "hipMemPoolTrimTo"; case HIP_API_ID_hipMemPrefetchAsync: return "hipMemPrefetchAsync"; case HIP_API_ID_hipMemPtrGetInfo: return "hipMemPtrGetInfo"; case HIP_API_ID_hipMemRangeGetAttribute: return "hipMemRangeGetAttribute"; case HIP_API_ID_hipMemRangeGetAttributes: return "hipMemRangeGetAttributes"; case HIP_API_ID_hipMemRelease: return "hipMemRelease"; case HIP_API_ID_hipMemRetainAllocationHandle: return "hipMemRetainAllocationHandle"; case HIP_API_ID_hipMemSetAccess: return "hipMemSetAccess"; case HIP_API_ID_hipMemUnmap: return "hipMemUnmap"; case HIP_API_ID_hipMemcpy: return "hipMemcpy"; case HIP_API_ID_hipMemcpy2D: return "hipMemcpy2D"; case HIP_API_ID_hipMemcpy2DAsync: return "hipMemcpy2DAsync"; case HIP_API_ID_hipMemcpy2DFromArray: return "hipMemcpy2DFromArray"; case HIP_API_ID_hipMemcpy2DFromArrayAsync: return "hipMemcpy2DFromArrayAsync"; case HIP_API_ID_hipMemcpy2DToArray: return "hipMemcpy2DToArray"; case HIP_API_ID_hipMemcpy2DToArrayAsync: return "hipMemcpy2DToArrayAsync"; case HIP_API_ID_hipMemcpy3D: return "hipMemcpy3D"; case HIP_API_ID_hipMemcpy3DAsync: return "hipMemcpy3DAsync"; case HIP_API_ID_hipMemcpyAsync: return "hipMemcpyAsync"; case HIP_API_ID_hipMemcpyAtoH: return "hipMemcpyAtoH"; case HIP_API_ID_hipMemcpyDtoD: return "hipMemcpyDtoD"; case HIP_API_ID_hipMemcpyDtoDAsync: return "hipMemcpyDtoDAsync"; case HIP_API_ID_hipMemcpyDtoH: return "hipMemcpyDtoH"; case HIP_API_ID_hipMemcpyDtoHAsync: return "hipMemcpyDtoHAsync"; case HIP_API_ID_hipMemcpyFromArray: return "hipMemcpyFromArray"; case HIP_API_ID_hipMemcpyFromSymbol: return 
"hipMemcpyFromSymbol"; case HIP_API_ID_hipMemcpyFromSymbolAsync: return "hipMemcpyFromSymbolAsync"; case HIP_API_ID_hipMemcpyHtoA: return "hipMemcpyHtoA"; case HIP_API_ID_hipMemcpyHtoD: return "hipMemcpyHtoD"; case HIP_API_ID_hipMemcpyHtoDAsync: return "hipMemcpyHtoDAsync"; case HIP_API_ID_hipMemcpyParam2D: return "hipMemcpyParam2D"; case HIP_API_ID_hipMemcpyParam2DAsync: return "hipMemcpyParam2DAsync"; case HIP_API_ID_hipMemcpyPeer: return "hipMemcpyPeer"; case HIP_API_ID_hipMemcpyPeerAsync: return "hipMemcpyPeerAsync"; case HIP_API_ID_hipMemcpyToArray: return "hipMemcpyToArray"; case HIP_API_ID_hipMemcpyToSymbol: return "hipMemcpyToSymbol"; case HIP_API_ID_hipMemcpyToSymbolAsync: return "hipMemcpyToSymbolAsync"; case HIP_API_ID_hipMemcpyWithStream: return "hipMemcpyWithStream"; case HIP_API_ID_hipMemset: return "hipMemset"; case HIP_API_ID_hipMemset2D: return "hipMemset2D"; case HIP_API_ID_hipMemset2DAsync: return "hipMemset2DAsync"; case HIP_API_ID_hipMemset3D: return "hipMemset3D"; case HIP_API_ID_hipMemset3DAsync: return "hipMemset3DAsync"; case HIP_API_ID_hipMemsetAsync: return "hipMemsetAsync"; case HIP_API_ID_hipMemsetD16: return "hipMemsetD16"; case HIP_API_ID_hipMemsetD16Async: return "hipMemsetD16Async"; case HIP_API_ID_hipMemsetD32: return "hipMemsetD32"; case HIP_API_ID_hipMemsetD32Async: return "hipMemsetD32Async"; case HIP_API_ID_hipMemsetD8: return "hipMemsetD8"; case HIP_API_ID_hipMemsetD8Async: return "hipMemsetD8Async"; case HIP_API_ID_hipMipmappedArrayCreate: return "hipMipmappedArrayCreate"; case HIP_API_ID_hipMipmappedArrayDestroy: return "hipMipmappedArrayDestroy"; case HIP_API_ID_hipMipmappedArrayGetLevel: return "hipMipmappedArrayGetLevel"; case HIP_API_ID_hipModuleGetFunction: return "hipModuleGetFunction"; case HIP_API_ID_hipModuleGetGlobal: return "hipModuleGetGlobal"; case HIP_API_ID_hipModuleGetTexRef: return "hipModuleGetTexRef"; case HIP_API_ID_hipModuleLaunchCooperativeKernel: return "hipModuleLaunchCooperativeKernel"; case HIP_API_ID_hipModuleLaunchCooperativeKernelMultiDevice: return "hipModuleLaunchCooperativeKernelMultiDevice"; case HIP_API_ID_hipModuleLaunchKernel: return "hipModuleLaunchKernel"; case HIP_API_ID_hipModuleLoad: return "hipModuleLoad"; case HIP_API_ID_hipModuleLoadData: return "hipModuleLoadData"; case HIP_API_ID_hipModuleLoadDataEx: return "hipModuleLoadDataEx"; case HIP_API_ID_hipModuleOccupancyMaxActiveBlocksPerMultiprocessor: return "hipModuleOccupancyMaxActiveBlocksPerMultiprocessor"; case HIP_API_ID_hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags: return "hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags"; case HIP_API_ID_hipModuleOccupancyMaxPotentialBlockSize: return "hipModuleOccupancyMaxPotentialBlockSize"; case HIP_API_ID_hipModuleOccupancyMaxPotentialBlockSizeWithFlags: return "hipModuleOccupancyMaxPotentialBlockSizeWithFlags"; case HIP_API_ID_hipModuleUnload: return "hipModuleUnload"; case HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessor: return "hipOccupancyMaxActiveBlocksPerMultiprocessor"; case HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags: return "hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags"; case HIP_API_ID_hipOccupancyMaxPotentialBlockSize: return "hipOccupancyMaxPotentialBlockSize"; case HIP_API_ID_hipPeekAtLastError: return "hipPeekAtLastError"; case HIP_API_ID_hipPointerGetAttribute: return "hipPointerGetAttribute"; case HIP_API_ID_hipPointerGetAttributes: return "hipPointerGetAttributes"; case HIP_API_ID_hipPointerSetAttribute: return 
"hipPointerSetAttribute"; case HIP_API_ID_hipProfilerStart: return "hipProfilerStart"; case HIP_API_ID_hipProfilerStop: return "hipProfilerStop"; case HIP_API_ID_hipRuntimeGetVersion: return "hipRuntimeGetVersion"; case HIP_API_ID_hipSetDevice: return "hipSetDevice"; case HIP_API_ID_hipSetDeviceFlags: return "hipSetDeviceFlags"; case HIP_API_ID_hipSetupArgument: return "hipSetupArgument"; case HIP_API_ID_hipSignalExternalSemaphoresAsync: return "hipSignalExternalSemaphoresAsync"; case HIP_API_ID_hipStreamAddCallback: return "hipStreamAddCallback"; case HIP_API_ID_hipStreamAttachMemAsync: return "hipStreamAttachMemAsync"; case HIP_API_ID_hipStreamBeginCapture: return "hipStreamBeginCapture"; case HIP_API_ID_hipStreamCreate: return "hipStreamCreate"; case HIP_API_ID_hipStreamCreateWithFlags: return "hipStreamCreateWithFlags"; case HIP_API_ID_hipStreamCreateWithPriority: return "hipStreamCreateWithPriority"; case HIP_API_ID_hipStreamDestroy: return "hipStreamDestroy"; case HIP_API_ID_hipStreamEndCapture: return "hipStreamEndCapture"; case HIP_API_ID_hipStreamGetCaptureInfo: return "hipStreamGetCaptureInfo"; case HIP_API_ID_hipStreamGetCaptureInfo_v2: return "hipStreamGetCaptureInfo_v2"; case HIP_API_ID_hipStreamGetDevice: return "hipStreamGetDevice"; case HIP_API_ID_hipStreamGetFlags: return "hipStreamGetFlags"; case HIP_API_ID_hipStreamGetPriority: return "hipStreamGetPriority"; case HIP_API_ID_hipStreamIsCapturing: return "hipStreamIsCapturing"; case HIP_API_ID_hipStreamQuery: return "hipStreamQuery"; case HIP_API_ID_hipStreamSynchronize: return "hipStreamSynchronize"; case HIP_API_ID_hipStreamUpdateCaptureDependencies: return "hipStreamUpdateCaptureDependencies"; case HIP_API_ID_hipStreamWaitEvent: return "hipStreamWaitEvent"; case HIP_API_ID_hipStreamWaitValue32: return "hipStreamWaitValue32"; case HIP_API_ID_hipStreamWaitValue64: return "hipStreamWaitValue64"; case HIP_API_ID_hipStreamWriteValue32: return "hipStreamWriteValue32"; case HIP_API_ID_hipStreamWriteValue64: return "hipStreamWriteValue64"; case HIP_API_ID_hipTexRefGetAddress: return "hipTexRefGetAddress"; case HIP_API_ID_hipTexRefGetFlags: return "hipTexRefGetFlags"; case HIP_API_ID_hipTexRefGetFormat: return "hipTexRefGetFormat"; case HIP_API_ID_hipTexRefGetMaxAnisotropy: return "hipTexRefGetMaxAnisotropy"; case HIP_API_ID_hipTexRefGetMipMappedArray: return "hipTexRefGetMipMappedArray"; case HIP_API_ID_hipTexRefGetMipmapLevelBias: return "hipTexRefGetMipmapLevelBias"; case HIP_API_ID_hipTexRefGetMipmapLevelClamp: return "hipTexRefGetMipmapLevelClamp"; case HIP_API_ID_hipTexRefSetAddress: return "hipTexRefSetAddress"; case HIP_API_ID_hipTexRefSetAddress2D: return "hipTexRefSetAddress2D"; case HIP_API_ID_hipTexRefSetArray: return "hipTexRefSetArray"; case HIP_API_ID_hipTexRefSetBorderColor: return "hipTexRefSetBorderColor"; case HIP_API_ID_hipTexRefSetFlags: return "hipTexRefSetFlags"; case HIP_API_ID_hipTexRefSetFormat: return "hipTexRefSetFormat"; case HIP_API_ID_hipTexRefSetMaxAnisotropy: return "hipTexRefSetMaxAnisotropy"; case HIP_API_ID_hipTexRefSetMipmapLevelBias: return "hipTexRefSetMipmapLevelBias"; case HIP_API_ID_hipTexRefSetMipmapLevelClamp: return "hipTexRefSetMipmapLevelClamp"; case HIP_API_ID_hipTexRefSetMipmappedArray: return "hipTexRefSetMipmappedArray"; case HIP_API_ID_hipThreadExchangeStreamCaptureMode: return "hipThreadExchangeStreamCaptureMode"; case HIP_API_ID_hipUserObjectCreate: return "hipUserObjectCreate"; case HIP_API_ID_hipUserObjectRelease: return "hipUserObjectRelease"; case 
HIP_API_ID_hipUserObjectRetain: return "hipUserObjectRetain"; case HIP_API_ID_hipWaitExternalSemaphoresAsync: return "hipWaitExternalSemaphoresAsync"; }; return "unknown"; };

#include <string.h>  // for strcmp(), required by the name-to-ID lookup below

// Return the HIP API callback ID for a given name
static inline uint32_t hipApiIdByName(const char* name) { if (strcmp("__hipPopCallConfiguration", name) == 0) return HIP_API_ID___hipPopCallConfiguration; if (strcmp("__hipPushCallConfiguration", name) == 0) return HIP_API_ID___hipPushCallConfiguration; if (strcmp("hipArray3DCreate", name) == 0) return HIP_API_ID_hipArray3DCreate; if (strcmp("hipArray3DGetDescriptor", name) == 0) return HIP_API_ID_hipArray3DGetDescriptor; if (strcmp("hipArrayCreate", name) == 0) return HIP_API_ID_hipArrayCreate; if (strcmp("hipArrayDestroy", name) == 0) return HIP_API_ID_hipArrayDestroy; if (strcmp("hipArrayGetDescriptor", name) == 0) return HIP_API_ID_hipArrayGetDescriptor; if (strcmp("hipArrayGetInfo", name) == 0) return HIP_API_ID_hipArrayGetInfo; if (strcmp("hipChooseDevice", name) == 0) return HIP_API_ID_hipChooseDevice; if (strcmp("hipConfigureCall", name) == 0) return HIP_API_ID_hipConfigureCall; if (strcmp("hipCreateSurfaceObject", name) == 0) return HIP_API_ID_hipCreateSurfaceObject; if (strcmp("hipCtxCreate", name) == 0) return HIP_API_ID_hipCtxCreate; if (strcmp("hipCtxDestroy", name) == 0) return HIP_API_ID_hipCtxDestroy; if (strcmp("hipCtxDisablePeerAccess", name) == 0) return HIP_API_ID_hipCtxDisablePeerAccess; if (strcmp("hipCtxEnablePeerAccess", name) == 0) return HIP_API_ID_hipCtxEnablePeerAccess; if (strcmp("hipCtxGetApiVersion", name) == 0) return HIP_API_ID_hipCtxGetApiVersion; if (strcmp("hipCtxGetCacheConfig", name) == 0) return HIP_API_ID_hipCtxGetCacheConfig; if (strcmp("hipCtxGetCurrent", name) == 0) return HIP_API_ID_hipCtxGetCurrent; if (strcmp("hipCtxGetDevice", name) == 0) return HIP_API_ID_hipCtxGetDevice; if (strcmp("hipCtxGetFlags", name) == 0) return HIP_API_ID_hipCtxGetFlags; if (strcmp("hipCtxGetSharedMemConfig", name) == 0) return HIP_API_ID_hipCtxGetSharedMemConfig; if (strcmp("hipCtxPopCurrent", name) == 0) return HIP_API_ID_hipCtxPopCurrent; if (strcmp("hipCtxPushCurrent", name) == 0) return HIP_API_ID_hipCtxPushCurrent; if (strcmp("hipCtxSetCacheConfig", name) == 0) return HIP_API_ID_hipCtxSetCacheConfig; if (strcmp("hipCtxSetCurrent", name) == 0) return HIP_API_ID_hipCtxSetCurrent; if (strcmp("hipCtxSetSharedMemConfig", name) == 0) return HIP_API_ID_hipCtxSetSharedMemConfig; if (strcmp("hipCtxSynchronize", name) == 0) return HIP_API_ID_hipCtxSynchronize; if (strcmp("hipDestroyExternalMemory", name) == 0) return HIP_API_ID_hipDestroyExternalMemory; if (strcmp("hipDestroyExternalSemaphore", name) == 0) return HIP_API_ID_hipDestroyExternalSemaphore; if (strcmp("hipDestroySurfaceObject", name) == 0) return HIP_API_ID_hipDestroySurfaceObject; if (strcmp("hipDeviceCanAccessPeer", name) == 0) return HIP_API_ID_hipDeviceCanAccessPeer; if (strcmp("hipDeviceComputeCapability", name) == 0) return HIP_API_ID_hipDeviceComputeCapability; if (strcmp("hipDeviceDisablePeerAccess", name) == 0) return HIP_API_ID_hipDeviceDisablePeerAccess; if (strcmp("hipDeviceEnablePeerAccess", name) == 0) return HIP_API_ID_hipDeviceEnablePeerAccess; if (strcmp("hipDeviceGet", name) == 0) return HIP_API_ID_hipDeviceGet; if (strcmp("hipDeviceGetAttribute", name) == 0) return HIP_API_ID_hipDeviceGetAttribute; if (strcmp("hipDeviceGetByPCIBusId", name) == 0) return HIP_API_ID_hipDeviceGetByPCIBusId; if (strcmp("hipDeviceGetCacheConfig", name) == 0) return
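/* Usage sketch (editor's illustration, not part of the generated table): the
 * ID-to-name switch above and this hipApiIdByName() lookup are intended to be
 * inverses over the generated ID range, so a tracer can round-trip IDs and
 * names. A minimal check, assuming the string table is exposed as
 * hip_api_name() and that HIP_API_ID_FIRST/HIP_API_ID_LAST bound the range as
 * in the shipped hip_prof_str.h (assert.h assumed included):
 *
 *   for (uint32_t id = HIP_API_ID_FIRST; id <= HIP_API_ID_LAST; ++id) {
 *     const char* n = hip_api_name(id);
 *     if (strcmp(n, "unknown") != 0) assert(hipApiIdByName(n) == id);
 *   }
 */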
HIP_API_ID_hipDeviceGetCacheConfig; if (strcmp("hipDeviceGetDefaultMemPool", name) == 0) return HIP_API_ID_hipDeviceGetDefaultMemPool; if (strcmp("hipDeviceGetGraphMemAttribute", name) == 0) return HIP_API_ID_hipDeviceGetGraphMemAttribute; if (strcmp("hipDeviceGetLimit", name) == 0) return HIP_API_ID_hipDeviceGetLimit; if (strcmp("hipDeviceGetMemPool", name) == 0) return HIP_API_ID_hipDeviceGetMemPool; if (strcmp("hipDeviceGetName", name) == 0) return HIP_API_ID_hipDeviceGetName; if (strcmp("hipDeviceGetP2PAttribute", name) == 0) return HIP_API_ID_hipDeviceGetP2PAttribute; if (strcmp("hipDeviceGetPCIBusId", name) == 0) return HIP_API_ID_hipDeviceGetPCIBusId; if (strcmp("hipDeviceGetSharedMemConfig", name) == 0) return HIP_API_ID_hipDeviceGetSharedMemConfig; if (strcmp("hipDeviceGetStreamPriorityRange", name) == 0) return HIP_API_ID_hipDeviceGetStreamPriorityRange; if (strcmp("hipDeviceGetUuid", name) == 0) return HIP_API_ID_hipDeviceGetUuid; if (strcmp("hipDeviceGraphMemTrim", name) == 0) return HIP_API_ID_hipDeviceGraphMemTrim; if (strcmp("hipDevicePrimaryCtxGetState", name) == 0) return HIP_API_ID_hipDevicePrimaryCtxGetState; if (strcmp("hipDevicePrimaryCtxRelease", name) == 0) return HIP_API_ID_hipDevicePrimaryCtxRelease; if (strcmp("hipDevicePrimaryCtxReset", name) == 0) return HIP_API_ID_hipDevicePrimaryCtxReset; if (strcmp("hipDevicePrimaryCtxRetain", name) == 0) return HIP_API_ID_hipDevicePrimaryCtxRetain; if (strcmp("hipDevicePrimaryCtxSetFlags", name) == 0) return HIP_API_ID_hipDevicePrimaryCtxSetFlags; if (strcmp("hipDeviceReset", name) == 0) return HIP_API_ID_hipDeviceReset; if (strcmp("hipDeviceSetCacheConfig", name) == 0) return HIP_API_ID_hipDeviceSetCacheConfig; if (strcmp("hipDeviceSetGraphMemAttribute", name) == 0) return HIP_API_ID_hipDeviceSetGraphMemAttribute; if (strcmp("hipDeviceSetLimit", name) == 0) return HIP_API_ID_hipDeviceSetLimit; if (strcmp("hipDeviceSetMemPool", name) == 0) return HIP_API_ID_hipDeviceSetMemPool; if (strcmp("hipDeviceSetSharedMemConfig", name) == 0) return HIP_API_ID_hipDeviceSetSharedMemConfig; if (strcmp("hipDeviceSynchronize", name) == 0) return HIP_API_ID_hipDeviceSynchronize; if (strcmp("hipDeviceTotalMem", name) == 0) return HIP_API_ID_hipDeviceTotalMem; if (strcmp("hipDriverGetVersion", name) == 0) return HIP_API_ID_hipDriverGetVersion; if (strcmp("hipDrvMemcpy2DUnaligned", name) == 0) return HIP_API_ID_hipDrvMemcpy2DUnaligned; if (strcmp("hipDrvMemcpy3D", name) == 0) return HIP_API_ID_hipDrvMemcpy3D; if (strcmp("hipDrvMemcpy3DAsync", name) == 0) return HIP_API_ID_hipDrvMemcpy3DAsync; if (strcmp("hipDrvPointerGetAttributes", name) == 0) return HIP_API_ID_hipDrvPointerGetAttributes; if (strcmp("hipEventCreate", name) == 0) return HIP_API_ID_hipEventCreate; if (strcmp("hipEventCreateWithFlags", name) == 0) return HIP_API_ID_hipEventCreateWithFlags; if (strcmp("hipEventDestroy", name) == 0) return HIP_API_ID_hipEventDestroy; if (strcmp("hipEventElapsedTime", name) == 0) return HIP_API_ID_hipEventElapsedTime; if (strcmp("hipEventQuery", name) == 0) return HIP_API_ID_hipEventQuery; if (strcmp("hipEventRecord", name) == 0) return HIP_API_ID_hipEventRecord; if (strcmp("hipEventSynchronize", name) == 0) return HIP_API_ID_hipEventSynchronize; if (strcmp("hipExtGetLinkTypeAndHopCount", name) == 0) return HIP_API_ID_hipExtGetLinkTypeAndHopCount; if (strcmp("hipExtLaunchKernel", name) == 0) return HIP_API_ID_hipExtLaunchKernel; if (strcmp("hipExtLaunchMultiKernelMultiDevice", name) == 0) return HIP_API_ID_hipExtLaunchMultiKernelMultiDevice; if 
(strcmp("hipExtMallocWithFlags", name) == 0) return HIP_API_ID_hipExtMallocWithFlags; if (strcmp("hipExtModuleLaunchKernel", name) == 0) return HIP_API_ID_hipExtModuleLaunchKernel; if (strcmp("hipExtStreamCreateWithCUMask", name) == 0) return HIP_API_ID_hipExtStreamCreateWithCUMask; if (strcmp("hipExtStreamGetCUMask", name) == 0) return HIP_API_ID_hipExtStreamGetCUMask; if (strcmp("hipExternalMemoryGetMappedBuffer", name) == 0) return HIP_API_ID_hipExternalMemoryGetMappedBuffer; if (strcmp("hipFree", name) == 0) return HIP_API_ID_hipFree; if (strcmp("hipFreeArray", name) == 0) return HIP_API_ID_hipFreeArray; if (strcmp("hipFreeAsync", name) == 0) return HIP_API_ID_hipFreeAsync; if (strcmp("hipFreeHost", name) == 0) return HIP_API_ID_hipFreeHost; if (strcmp("hipFreeMipmappedArray", name) == 0) return HIP_API_ID_hipFreeMipmappedArray; if (strcmp("hipFuncGetAttribute", name) == 0) return HIP_API_ID_hipFuncGetAttribute; if (strcmp("hipFuncGetAttributes", name) == 0) return HIP_API_ID_hipFuncGetAttributes; if (strcmp("hipFuncSetAttribute", name) == 0) return HIP_API_ID_hipFuncSetAttribute; if (strcmp("hipFuncSetCacheConfig", name) == 0) return HIP_API_ID_hipFuncSetCacheConfig; if (strcmp("hipFuncSetSharedMemConfig", name) == 0) return HIP_API_ID_hipFuncSetSharedMemConfig; if (strcmp("hipGLGetDevices", name) == 0) return HIP_API_ID_hipGLGetDevices; if (strcmp("hipGetChannelDesc", name) == 0) return HIP_API_ID_hipGetChannelDesc; if (strcmp("hipGetDevice", name) == 0) return HIP_API_ID_hipGetDevice; if (strcmp("hipGetDeviceCount", name) == 0) return HIP_API_ID_hipGetDeviceCount; if (strcmp("hipGetDeviceFlags", name) == 0) return HIP_API_ID_hipGetDeviceFlags; if (strcmp("hipGetDeviceProperties", name) == 0) return HIP_API_ID_hipGetDeviceProperties; if (strcmp("hipGetErrorString", name) == 0) return HIP_API_ID_hipGetErrorString; if (strcmp("hipGetLastError", name) == 0) return HIP_API_ID_hipGetLastError; if (strcmp("hipGetMipmappedArrayLevel", name) == 0) return HIP_API_ID_hipGetMipmappedArrayLevel; if (strcmp("hipGetSymbolAddress", name) == 0) return HIP_API_ID_hipGetSymbolAddress; if (strcmp("hipGetSymbolSize", name) == 0) return HIP_API_ID_hipGetSymbolSize; if (strcmp("hipGraphAddChildGraphNode", name) == 0) return HIP_API_ID_hipGraphAddChildGraphNode; if (strcmp("hipGraphAddDependencies", name) == 0) return HIP_API_ID_hipGraphAddDependencies; if (strcmp("hipGraphAddEmptyNode", name) == 0) return HIP_API_ID_hipGraphAddEmptyNode; if (strcmp("hipGraphAddEventRecordNode", name) == 0) return HIP_API_ID_hipGraphAddEventRecordNode; if (strcmp("hipGraphAddEventWaitNode", name) == 0) return HIP_API_ID_hipGraphAddEventWaitNode; if (strcmp("hipGraphAddHostNode", name) == 0) return HIP_API_ID_hipGraphAddHostNode; if (strcmp("hipGraphAddKernelNode", name) == 0) return HIP_API_ID_hipGraphAddKernelNode; if (strcmp("hipGraphAddMemAllocNode", name) == 0) return HIP_API_ID_hipGraphAddMemAllocNode; if (strcmp("hipGraphAddMemFreeNode", name) == 0) return HIP_API_ID_hipGraphAddMemFreeNode; if (strcmp("hipGraphAddMemcpyNode", name) == 0) return HIP_API_ID_hipGraphAddMemcpyNode; if (strcmp("hipGraphAddMemcpyNode1D", name) == 0) return HIP_API_ID_hipGraphAddMemcpyNode1D; if (strcmp("hipGraphAddMemcpyNodeFromSymbol", name) == 0) return HIP_API_ID_hipGraphAddMemcpyNodeFromSymbol; if (strcmp("hipGraphAddMemcpyNodeToSymbol", name) == 0) return HIP_API_ID_hipGraphAddMemcpyNodeToSymbol; if (strcmp("hipGraphAddMemsetNode", name) == 0) return HIP_API_ID_hipGraphAddMemsetNode; if (strcmp("hipGraphChildGraphNodeGetGraph", name) 
== 0) return HIP_API_ID_hipGraphChildGraphNodeGetGraph; if (strcmp("hipGraphClone", name) == 0) return HIP_API_ID_hipGraphClone; if (strcmp("hipGraphCreate", name) == 0) return HIP_API_ID_hipGraphCreate; if (strcmp("hipGraphDebugDotPrint", name) == 0) return HIP_API_ID_hipGraphDebugDotPrint; if (strcmp("hipGraphDestroy", name) == 0) return HIP_API_ID_hipGraphDestroy; if (strcmp("hipGraphDestroyNode", name) == 0) return HIP_API_ID_hipGraphDestroyNode; if (strcmp("hipGraphEventRecordNodeGetEvent", name) == 0) return HIP_API_ID_hipGraphEventRecordNodeGetEvent; if (strcmp("hipGraphEventRecordNodeSetEvent", name) == 0) return HIP_API_ID_hipGraphEventRecordNodeSetEvent; if (strcmp("hipGraphEventWaitNodeGetEvent", name) == 0) return HIP_API_ID_hipGraphEventWaitNodeGetEvent; if (strcmp("hipGraphEventWaitNodeSetEvent", name) == 0) return HIP_API_ID_hipGraphEventWaitNodeSetEvent; if (strcmp("hipGraphExecChildGraphNodeSetParams", name) == 0) return HIP_API_ID_hipGraphExecChildGraphNodeSetParams; if (strcmp("hipGraphExecDestroy", name) == 0) return HIP_API_ID_hipGraphExecDestroy; if (strcmp("hipGraphExecEventRecordNodeSetEvent", name) == 0) return HIP_API_ID_hipGraphExecEventRecordNodeSetEvent; if (strcmp("hipGraphExecEventWaitNodeSetEvent", name) == 0) return HIP_API_ID_hipGraphExecEventWaitNodeSetEvent; if (strcmp("hipGraphExecHostNodeSetParams", name) == 0) return HIP_API_ID_hipGraphExecHostNodeSetParams; if (strcmp("hipGraphExecKernelNodeSetParams", name) == 0) return HIP_API_ID_hipGraphExecKernelNodeSetParams; if (strcmp("hipGraphExecMemcpyNodeSetParams", name) == 0) return HIP_API_ID_hipGraphExecMemcpyNodeSetParams; if (strcmp("hipGraphExecMemcpyNodeSetParams1D", name) == 0) return HIP_API_ID_hipGraphExecMemcpyNodeSetParams1D; if (strcmp("hipGraphExecMemcpyNodeSetParamsFromSymbol", name) == 0) return HIP_API_ID_hipGraphExecMemcpyNodeSetParamsFromSymbol; if (strcmp("hipGraphExecMemcpyNodeSetParamsToSymbol", name) == 0) return HIP_API_ID_hipGraphExecMemcpyNodeSetParamsToSymbol; if (strcmp("hipGraphExecMemsetNodeSetParams", name) == 0) return HIP_API_ID_hipGraphExecMemsetNodeSetParams; if (strcmp("hipGraphExecUpdate", name) == 0) return HIP_API_ID_hipGraphExecUpdate; if (strcmp("hipGraphGetEdges", name) == 0) return HIP_API_ID_hipGraphGetEdges; if (strcmp("hipGraphGetNodes", name) == 0) return HIP_API_ID_hipGraphGetNodes; if (strcmp("hipGraphGetRootNodes", name) == 0) return HIP_API_ID_hipGraphGetRootNodes; if (strcmp("hipGraphHostNodeGetParams", name) == 0) return HIP_API_ID_hipGraphHostNodeGetParams; if (strcmp("hipGraphHostNodeSetParams", name) == 0) return HIP_API_ID_hipGraphHostNodeSetParams; if (strcmp("hipGraphInstantiate", name) == 0) return HIP_API_ID_hipGraphInstantiate; if (strcmp("hipGraphInstantiateWithFlags", name) == 0) return HIP_API_ID_hipGraphInstantiateWithFlags; if (strcmp("hipGraphKernelNodeCopyAttributes", name) == 0) return HIP_API_ID_hipGraphKernelNodeCopyAttributes; if (strcmp("hipGraphKernelNodeGetAttribute", name) == 0) return HIP_API_ID_hipGraphKernelNodeGetAttribute; if (strcmp("hipGraphKernelNodeGetParams", name) == 0) return HIP_API_ID_hipGraphKernelNodeGetParams; if (strcmp("hipGraphKernelNodeSetAttribute", name) == 0) return HIP_API_ID_hipGraphKernelNodeSetAttribute; if (strcmp("hipGraphKernelNodeSetParams", name) == 0) return HIP_API_ID_hipGraphKernelNodeSetParams; if (strcmp("hipGraphLaunch", name) == 0) return HIP_API_ID_hipGraphLaunch; if (strcmp("hipGraphMemAllocNodeGetParams", name) == 0) return HIP_API_ID_hipGraphMemAllocNodeGetParams; if 
(strcmp("hipGraphMemFreeNodeGetParams", name) == 0) return HIP_API_ID_hipGraphMemFreeNodeGetParams; if (strcmp("hipGraphMemcpyNodeGetParams", name) == 0) return HIP_API_ID_hipGraphMemcpyNodeGetParams; if (strcmp("hipGraphMemcpyNodeSetParams", name) == 0) return HIP_API_ID_hipGraphMemcpyNodeSetParams; if (strcmp("hipGraphMemcpyNodeSetParams1D", name) == 0) return HIP_API_ID_hipGraphMemcpyNodeSetParams1D; if (strcmp("hipGraphMemcpyNodeSetParamsFromSymbol", name) == 0) return HIP_API_ID_hipGraphMemcpyNodeSetParamsFromSymbol; if (strcmp("hipGraphMemcpyNodeSetParamsToSymbol", name) == 0) return HIP_API_ID_hipGraphMemcpyNodeSetParamsToSymbol; if (strcmp("hipGraphMemsetNodeGetParams", name) == 0) return HIP_API_ID_hipGraphMemsetNodeGetParams; if (strcmp("hipGraphMemsetNodeSetParams", name) == 0) return HIP_API_ID_hipGraphMemsetNodeSetParams; if (strcmp("hipGraphNodeFindInClone", name) == 0) return HIP_API_ID_hipGraphNodeFindInClone; if (strcmp("hipGraphNodeGetDependencies", name) == 0) return HIP_API_ID_hipGraphNodeGetDependencies; if (strcmp("hipGraphNodeGetDependentNodes", name) == 0) return HIP_API_ID_hipGraphNodeGetDependentNodes; if (strcmp("hipGraphNodeGetEnabled", name) == 0) return HIP_API_ID_hipGraphNodeGetEnabled; if (strcmp("hipGraphNodeGetType", name) == 0) return HIP_API_ID_hipGraphNodeGetType; if (strcmp("hipGraphNodeSetEnabled", name) == 0) return HIP_API_ID_hipGraphNodeSetEnabled; if (strcmp("hipGraphReleaseUserObject", name) == 0) return HIP_API_ID_hipGraphReleaseUserObject; if (strcmp("hipGraphRemoveDependencies", name) == 0) return HIP_API_ID_hipGraphRemoveDependencies; if (strcmp("hipGraphRetainUserObject", name) == 0) return HIP_API_ID_hipGraphRetainUserObject; if (strcmp("hipGraphUpload", name) == 0) return HIP_API_ID_hipGraphUpload; if (strcmp("hipGraphicsGLRegisterBuffer", name) == 0) return HIP_API_ID_hipGraphicsGLRegisterBuffer; if (strcmp("hipGraphicsGLRegisterImage", name) == 0) return HIP_API_ID_hipGraphicsGLRegisterImage; if (strcmp("hipGraphicsMapResources", name) == 0) return HIP_API_ID_hipGraphicsMapResources; if (strcmp("hipGraphicsResourceGetMappedPointer", name) == 0) return HIP_API_ID_hipGraphicsResourceGetMappedPointer; if (strcmp("hipGraphicsSubResourceGetMappedArray", name) == 0) return HIP_API_ID_hipGraphicsSubResourceGetMappedArray; if (strcmp("hipGraphicsUnmapResources", name) == 0) return HIP_API_ID_hipGraphicsUnmapResources; if (strcmp("hipGraphicsUnregisterResource", name) == 0) return HIP_API_ID_hipGraphicsUnregisterResource; if (strcmp("hipHccModuleLaunchKernel", name) == 0) return HIP_API_ID_hipHccModuleLaunchKernel; if (strcmp("hipHostAlloc", name) == 0) return HIP_API_ID_hipHostAlloc; if (strcmp("hipHostFree", name) == 0) return HIP_API_ID_hipHostFree; if (strcmp("hipHostGetDevicePointer", name) == 0) return HIP_API_ID_hipHostGetDevicePointer; if (strcmp("hipHostGetFlags", name) == 0) return HIP_API_ID_hipHostGetFlags; if (strcmp("hipHostMalloc", name) == 0) return HIP_API_ID_hipHostMalloc; if (strcmp("hipHostRegister", name) == 0) return HIP_API_ID_hipHostRegister; if (strcmp("hipHostUnregister", name) == 0) return HIP_API_ID_hipHostUnregister; if (strcmp("hipImportExternalMemory", name) == 0) return HIP_API_ID_hipImportExternalMemory; if (strcmp("hipImportExternalSemaphore", name) == 0) return HIP_API_ID_hipImportExternalSemaphore; if (strcmp("hipInit", name) == 0) return HIP_API_ID_hipInit; if (strcmp("hipIpcCloseMemHandle", name) == 0) return HIP_API_ID_hipIpcCloseMemHandle; if (strcmp("hipIpcGetEventHandle", name) == 0) return 
HIP_API_ID_hipIpcGetEventHandle; if (strcmp("hipIpcGetMemHandle", name) == 0) return HIP_API_ID_hipIpcGetMemHandle; if (strcmp("hipIpcOpenEventHandle", name) == 0) return HIP_API_ID_hipIpcOpenEventHandle; if (strcmp("hipIpcOpenMemHandle", name) == 0) return HIP_API_ID_hipIpcOpenMemHandle; if (strcmp("hipLaunchByPtr", name) == 0) return HIP_API_ID_hipLaunchByPtr; if (strcmp("hipLaunchCooperativeKernel", name) == 0) return HIP_API_ID_hipLaunchCooperativeKernel; if (strcmp("hipLaunchCooperativeKernelMultiDevice", name) == 0) return HIP_API_ID_hipLaunchCooperativeKernelMultiDevice; if (strcmp("hipLaunchHostFunc", name) == 0) return HIP_API_ID_hipLaunchHostFunc; if (strcmp("hipLaunchKernel", name) == 0) return HIP_API_ID_hipLaunchKernel; if (strcmp("hipMalloc", name) == 0) return HIP_API_ID_hipMalloc; if (strcmp("hipMalloc3D", name) == 0) return HIP_API_ID_hipMalloc3D; if (strcmp("hipMalloc3DArray", name) == 0) return HIP_API_ID_hipMalloc3DArray; if (strcmp("hipMallocArray", name) == 0) return HIP_API_ID_hipMallocArray; if (strcmp("hipMallocAsync", name) == 0) return HIP_API_ID_hipMallocAsync; if (strcmp("hipMallocFromPoolAsync", name) == 0) return HIP_API_ID_hipMallocFromPoolAsync; if (strcmp("hipMallocHost", name) == 0) return HIP_API_ID_hipMallocHost; if (strcmp("hipMallocManaged", name) == 0) return HIP_API_ID_hipMallocManaged; if (strcmp("hipMallocMipmappedArray", name) == 0) return HIP_API_ID_hipMallocMipmappedArray; if (strcmp("hipMallocPitch", name) == 0) return HIP_API_ID_hipMallocPitch; if (strcmp("hipMemAddressFree", name) == 0) return HIP_API_ID_hipMemAddressFree; if (strcmp("hipMemAddressReserve", name) == 0) return HIP_API_ID_hipMemAddressReserve; if (strcmp("hipMemAdvise", name) == 0) return HIP_API_ID_hipMemAdvise; if (strcmp("hipMemAllocHost", name) == 0) return HIP_API_ID_hipMemAllocHost; if (strcmp("hipMemAllocPitch", name) == 0) return HIP_API_ID_hipMemAllocPitch; if (strcmp("hipMemCreate", name) == 0) return HIP_API_ID_hipMemCreate; if (strcmp("hipMemExportToShareableHandle", name) == 0) return HIP_API_ID_hipMemExportToShareableHandle; if (strcmp("hipMemGetAccess", name) == 0) return HIP_API_ID_hipMemGetAccess; if (strcmp("hipMemGetAddressRange", name) == 0) return HIP_API_ID_hipMemGetAddressRange; if (strcmp("hipMemGetAllocationGranularity", name) == 0) return HIP_API_ID_hipMemGetAllocationGranularity; if (strcmp("hipMemGetAllocationPropertiesFromHandle", name) == 0) return HIP_API_ID_hipMemGetAllocationPropertiesFromHandle; if (strcmp("hipMemGetInfo", name) == 0) return HIP_API_ID_hipMemGetInfo; if (strcmp("hipMemImportFromShareableHandle", name) == 0) return HIP_API_ID_hipMemImportFromShareableHandle; if (strcmp("hipMemMap", name) == 0) return HIP_API_ID_hipMemMap; if (strcmp("hipMemMapArrayAsync", name) == 0) return HIP_API_ID_hipMemMapArrayAsync; if (strcmp("hipMemPoolCreate", name) == 0) return HIP_API_ID_hipMemPoolCreate; if (strcmp("hipMemPoolDestroy", name) == 0) return HIP_API_ID_hipMemPoolDestroy; if (strcmp("hipMemPoolExportPointer", name) == 0) return HIP_API_ID_hipMemPoolExportPointer; if (strcmp("hipMemPoolExportToShareableHandle", name) == 0) return HIP_API_ID_hipMemPoolExportToShareableHandle; if (strcmp("hipMemPoolGetAccess", name) == 0) return HIP_API_ID_hipMemPoolGetAccess; if (strcmp("hipMemPoolGetAttribute", name) == 0) return HIP_API_ID_hipMemPoolGetAttribute; if (strcmp("hipMemPoolImportFromShareableHandle", name) == 0) return HIP_API_ID_hipMemPoolImportFromShareableHandle; if (strcmp("hipMemPoolImportPointer", name) == 0) return 
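/* Design note (editor's illustration): this lookup is a linear strcmp() chain,
 * so every call scans the table from the top, O(n) in the number of HIP APIs.
 * A tool that translates many names, e.g. to build a callback allow-list,
 * should resolve each name once and cache the ID. A minimal sketch (the cache
 * variable is illustrative, not part of this header):
 *
 *   static uint32_t memcpy_id = HIP_API_ID_NONE;
 *   if (memcpy_id == HIP_API_ID_NONE) memcpy_id = hipApiIdByName("hipMemcpy");
 */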
HIP_API_ID_hipMemPoolImportPointer; if (strcmp("hipMemPoolSetAccess", name) == 0) return HIP_API_ID_hipMemPoolSetAccess; if (strcmp("hipMemPoolSetAttribute", name) == 0) return HIP_API_ID_hipMemPoolSetAttribute; if (strcmp("hipMemPoolTrimTo", name) == 0) return HIP_API_ID_hipMemPoolTrimTo; if (strcmp("hipMemPrefetchAsync", name) == 0) return HIP_API_ID_hipMemPrefetchAsync; if (strcmp("hipMemPtrGetInfo", name) == 0) return HIP_API_ID_hipMemPtrGetInfo; if (strcmp("hipMemRangeGetAttribute", name) == 0) return HIP_API_ID_hipMemRangeGetAttribute; if (strcmp("hipMemRangeGetAttributes", name) == 0) return HIP_API_ID_hipMemRangeGetAttributes; if (strcmp("hipMemRelease", name) == 0) return HIP_API_ID_hipMemRelease; if (strcmp("hipMemRetainAllocationHandle", name) == 0) return HIP_API_ID_hipMemRetainAllocationHandle; if (strcmp("hipMemSetAccess", name) == 0) return HIP_API_ID_hipMemSetAccess; if (strcmp("hipMemUnmap", name) == 0) return HIP_API_ID_hipMemUnmap; if (strcmp("hipMemcpy", name) == 0) return HIP_API_ID_hipMemcpy; if (strcmp("hipMemcpy2D", name) == 0) return HIP_API_ID_hipMemcpy2D; if (strcmp("hipMemcpy2DAsync", name) == 0) return HIP_API_ID_hipMemcpy2DAsync; if (strcmp("hipMemcpy2DFromArray", name) == 0) return HIP_API_ID_hipMemcpy2DFromArray; if (strcmp("hipMemcpy2DFromArrayAsync", name) == 0) return HIP_API_ID_hipMemcpy2DFromArrayAsync; if (strcmp("hipMemcpy2DToArray", name) == 0) return HIP_API_ID_hipMemcpy2DToArray; if (strcmp("hipMemcpy2DToArrayAsync", name) == 0) return HIP_API_ID_hipMemcpy2DToArrayAsync; if (strcmp("hipMemcpy3D", name) == 0) return HIP_API_ID_hipMemcpy3D; if (strcmp("hipMemcpy3DAsync", name) == 0) return HIP_API_ID_hipMemcpy3DAsync; if (strcmp("hipMemcpyAsync", name) == 0) return HIP_API_ID_hipMemcpyAsync; if (strcmp("hipMemcpyAtoH", name) == 0) return HIP_API_ID_hipMemcpyAtoH; if (strcmp("hipMemcpyDtoD", name) == 0) return HIP_API_ID_hipMemcpyDtoD; if (strcmp("hipMemcpyDtoDAsync", name) == 0) return HIP_API_ID_hipMemcpyDtoDAsync; if (strcmp("hipMemcpyDtoH", name) == 0) return HIP_API_ID_hipMemcpyDtoH; if (strcmp("hipMemcpyDtoHAsync", name) == 0) return HIP_API_ID_hipMemcpyDtoHAsync; if (strcmp("hipMemcpyFromArray", name) == 0) return HIP_API_ID_hipMemcpyFromArray; if (strcmp("hipMemcpyFromSymbol", name) == 0) return HIP_API_ID_hipMemcpyFromSymbol; if (strcmp("hipMemcpyFromSymbolAsync", name) == 0) return HIP_API_ID_hipMemcpyFromSymbolAsync; if (strcmp("hipMemcpyHtoA", name) == 0) return HIP_API_ID_hipMemcpyHtoA; if (strcmp("hipMemcpyHtoD", name) == 0) return HIP_API_ID_hipMemcpyHtoD; if (strcmp("hipMemcpyHtoDAsync", name) == 0) return HIP_API_ID_hipMemcpyHtoDAsync; if (strcmp("hipMemcpyParam2D", name) == 0) return HIP_API_ID_hipMemcpyParam2D; if (strcmp("hipMemcpyParam2DAsync", name) == 0) return HIP_API_ID_hipMemcpyParam2DAsync; if (strcmp("hipMemcpyPeer", name) == 0) return HIP_API_ID_hipMemcpyPeer; if (strcmp("hipMemcpyPeerAsync", name) == 0) return HIP_API_ID_hipMemcpyPeerAsync; if (strcmp("hipMemcpyToArray", name) == 0) return HIP_API_ID_hipMemcpyToArray; if (strcmp("hipMemcpyToSymbol", name) == 0) return HIP_API_ID_hipMemcpyToSymbol; if (strcmp("hipMemcpyToSymbolAsync", name) == 0) return HIP_API_ID_hipMemcpyToSymbolAsync; if (strcmp("hipMemcpyWithStream", name) == 0) return HIP_API_ID_hipMemcpyWithStream; if (strcmp("hipMemset", name) == 0) return HIP_API_ID_hipMemset; if (strcmp("hipMemset2D", name) == 0) return HIP_API_ID_hipMemset2D; if (strcmp("hipMemset2DAsync", name) == 0) return HIP_API_ID_hipMemset2DAsync; if (strcmp("hipMemset3D", name) == 0) 
return HIP_API_ID_hipMemset3D; if (strcmp("hipMemset3DAsync", name) == 0) return HIP_API_ID_hipMemset3DAsync; if (strcmp("hipMemsetAsync", name) == 0) return HIP_API_ID_hipMemsetAsync; if (strcmp("hipMemsetD16", name) == 0) return HIP_API_ID_hipMemsetD16; if (strcmp("hipMemsetD16Async", name) == 0) return HIP_API_ID_hipMemsetD16Async; if (strcmp("hipMemsetD32", name) == 0) return HIP_API_ID_hipMemsetD32; if (strcmp("hipMemsetD32Async", name) == 0) return HIP_API_ID_hipMemsetD32Async; if (strcmp("hipMemsetD8", name) == 0) return HIP_API_ID_hipMemsetD8; if (strcmp("hipMemsetD8Async", name) == 0) return HIP_API_ID_hipMemsetD8Async; if (strcmp("hipMipmappedArrayCreate", name) == 0) return HIP_API_ID_hipMipmappedArrayCreate; if (strcmp("hipMipmappedArrayDestroy", name) == 0) return HIP_API_ID_hipMipmappedArrayDestroy; if (strcmp("hipMipmappedArrayGetLevel", name) == 0) return HIP_API_ID_hipMipmappedArrayGetLevel; if (strcmp("hipModuleGetFunction", name) == 0) return HIP_API_ID_hipModuleGetFunction; if (strcmp("hipModuleGetGlobal", name) == 0) return HIP_API_ID_hipModuleGetGlobal; if (strcmp("hipModuleGetTexRef", name) == 0) return HIP_API_ID_hipModuleGetTexRef; if (strcmp("hipModuleLaunchCooperativeKernel", name) == 0) return HIP_API_ID_hipModuleLaunchCooperativeKernel; if (strcmp("hipModuleLaunchCooperativeKernelMultiDevice", name) == 0) return HIP_API_ID_hipModuleLaunchCooperativeKernelMultiDevice; if (strcmp("hipModuleLaunchKernel", name) == 0) return HIP_API_ID_hipModuleLaunchKernel; if (strcmp("hipModuleLoad", name) == 0) return HIP_API_ID_hipModuleLoad; if (strcmp("hipModuleLoadData", name) == 0) return HIP_API_ID_hipModuleLoadData; if (strcmp("hipModuleLoadDataEx", name) == 0) return HIP_API_ID_hipModuleLoadDataEx; if (strcmp("hipModuleOccupancyMaxActiveBlocksPerMultiprocessor", name) == 0) return HIP_API_ID_hipModuleOccupancyMaxActiveBlocksPerMultiprocessor; if (strcmp("hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags", name) == 0) return HIP_API_ID_hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags; if (strcmp("hipModuleOccupancyMaxPotentialBlockSize", name) == 0) return HIP_API_ID_hipModuleOccupancyMaxPotentialBlockSize; if (strcmp("hipModuleOccupancyMaxPotentialBlockSizeWithFlags", name) == 0) return HIP_API_ID_hipModuleOccupancyMaxPotentialBlockSizeWithFlags; if (strcmp("hipModuleUnload", name) == 0) return HIP_API_ID_hipModuleUnload; if (strcmp("hipOccupancyMaxActiveBlocksPerMultiprocessor", name) == 0) return HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessor; if (strcmp("hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags", name) == 0) return HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags; if (strcmp("hipOccupancyMaxPotentialBlockSize", name) == 0) return HIP_API_ID_hipOccupancyMaxPotentialBlockSize; if (strcmp("hipPeekAtLastError", name) == 0) return HIP_API_ID_hipPeekAtLastError; if (strcmp("hipPointerGetAttribute", name) == 0) return HIP_API_ID_hipPointerGetAttribute; if (strcmp("hipPointerGetAttributes", name) == 0) return HIP_API_ID_hipPointerGetAttributes; if (strcmp("hipPointerSetAttribute", name) == 0) return HIP_API_ID_hipPointerSetAttribute; if (strcmp("hipProfilerStart", name) == 0) return HIP_API_ID_hipProfilerStart; if (strcmp("hipProfilerStop", name) == 0) return HIP_API_ID_hipProfilerStop; if (strcmp("hipRuntimeGetVersion", name) == 0) return HIP_API_ID_hipRuntimeGetVersion; if (strcmp("hipSetDevice", name) == 0) return HIP_API_ID_hipSetDevice; if (strcmp("hipSetDeviceFlags", name) == 0) return 
HIP_API_ID_hipSetDeviceFlags; if (strcmp("hipSetupArgument", name) == 0) return HIP_API_ID_hipSetupArgument; if (strcmp("hipSignalExternalSemaphoresAsync", name) == 0) return HIP_API_ID_hipSignalExternalSemaphoresAsync; if (strcmp("hipStreamAddCallback", name) == 0) return HIP_API_ID_hipStreamAddCallback; if (strcmp("hipStreamAttachMemAsync", name) == 0) return HIP_API_ID_hipStreamAttachMemAsync; if (strcmp("hipStreamBeginCapture", name) == 0) return HIP_API_ID_hipStreamBeginCapture; if (strcmp("hipStreamCreate", name) == 0) return HIP_API_ID_hipStreamCreate; if (strcmp("hipStreamCreateWithFlags", name) == 0) return HIP_API_ID_hipStreamCreateWithFlags; if (strcmp("hipStreamCreateWithPriority", name) == 0) return HIP_API_ID_hipStreamCreateWithPriority; if (strcmp("hipStreamDestroy", name) == 0) return HIP_API_ID_hipStreamDestroy; if (strcmp("hipStreamEndCapture", name) == 0) return HIP_API_ID_hipStreamEndCapture; if (strcmp("hipStreamGetCaptureInfo", name) == 0) return HIP_API_ID_hipStreamGetCaptureInfo; if (strcmp("hipStreamGetCaptureInfo_v2", name) == 0) return HIP_API_ID_hipStreamGetCaptureInfo_v2; if (strcmp("hipStreamGetDevice", name) == 0) return HIP_API_ID_hipStreamGetDevice; if (strcmp("hipStreamGetFlags", name) == 0) return HIP_API_ID_hipStreamGetFlags; if (strcmp("hipStreamGetPriority", name) == 0) return HIP_API_ID_hipStreamGetPriority; if (strcmp("hipStreamIsCapturing", name) == 0) return HIP_API_ID_hipStreamIsCapturing; if (strcmp("hipStreamQuery", name) == 0) return HIP_API_ID_hipStreamQuery; if (strcmp("hipStreamSynchronize", name) == 0) return HIP_API_ID_hipStreamSynchronize; if (strcmp("hipStreamUpdateCaptureDependencies", name) == 0) return HIP_API_ID_hipStreamUpdateCaptureDependencies; if (strcmp("hipStreamWaitEvent", name) == 0) return HIP_API_ID_hipStreamWaitEvent; if (strcmp("hipStreamWaitValue32", name) == 0) return HIP_API_ID_hipStreamWaitValue32; if (strcmp("hipStreamWaitValue64", name) == 0) return HIP_API_ID_hipStreamWaitValue64; if (strcmp("hipStreamWriteValue32", name) == 0) return HIP_API_ID_hipStreamWriteValue32; if (strcmp("hipStreamWriteValue64", name) == 0) return HIP_API_ID_hipStreamWriteValue64; if (strcmp("hipTexRefGetAddress", name) == 0) return HIP_API_ID_hipTexRefGetAddress; if (strcmp("hipTexRefGetFlags", name) == 0) return HIP_API_ID_hipTexRefGetFlags; if (strcmp("hipTexRefGetFormat", name) == 0) return HIP_API_ID_hipTexRefGetFormat; if (strcmp("hipTexRefGetMaxAnisotropy", name) == 0) return HIP_API_ID_hipTexRefGetMaxAnisotropy; if (strcmp("hipTexRefGetMipMappedArray", name) == 0) return HIP_API_ID_hipTexRefGetMipMappedArray; if (strcmp("hipTexRefGetMipmapLevelBias", name) == 0) return HIP_API_ID_hipTexRefGetMipmapLevelBias; if (strcmp("hipTexRefGetMipmapLevelClamp", name) == 0) return HIP_API_ID_hipTexRefGetMipmapLevelClamp; if (strcmp("hipTexRefSetAddress", name) == 0) return HIP_API_ID_hipTexRefSetAddress; if (strcmp("hipTexRefSetAddress2D", name) == 0) return HIP_API_ID_hipTexRefSetAddress2D; if (strcmp("hipTexRefSetArray", name) == 0) return HIP_API_ID_hipTexRefSetArray; if (strcmp("hipTexRefSetBorderColor", name) == 0) return HIP_API_ID_hipTexRefSetBorderColor; if (strcmp("hipTexRefSetFlags", name) == 0) return HIP_API_ID_hipTexRefSetFlags; if (strcmp("hipTexRefSetFormat", name) == 0) return HIP_API_ID_hipTexRefSetFormat; if (strcmp("hipTexRefSetMaxAnisotropy", name) == 0) return HIP_API_ID_hipTexRefSetMaxAnisotropy; if (strcmp("hipTexRefSetMipmapLevelBias", name) == 0) return HIP_API_ID_hipTexRefSetMipmapLevelBias; if 
(strcmp("hipTexRefSetMipmapLevelClamp", name) == 0) return HIP_API_ID_hipTexRefSetMipmapLevelClamp; if (strcmp("hipTexRefSetMipmappedArray", name) == 0) return HIP_API_ID_hipTexRefSetMipmappedArray; if (strcmp("hipThreadExchangeStreamCaptureMode", name) == 0) return HIP_API_ID_hipThreadExchangeStreamCaptureMode; if (strcmp("hipUserObjectCreate", name) == 0) return HIP_API_ID_hipUserObjectCreate; if (strcmp("hipUserObjectRelease", name) == 0) return HIP_API_ID_hipUserObjectRelease; if (strcmp("hipUserObjectRetain", name) == 0) return HIP_API_ID_hipUserObjectRetain; if (strcmp("hipWaitExternalSemaphoresAsync", name) == 0) return HIP_API_ID_hipWaitExternalSemaphoresAsync; return HIP_API_ID_NONE; } // HIP API callbacks data structures typedef struct hip_api_data_s { uint64_t correlation_id; uint32_t phase; union { struct { dim3* gridDim; dim3 gridDim__val; dim3* blockDim; dim3 blockDim__val; size_t* sharedMem; size_t sharedMem__val; hipStream_t* stream; hipStream_t stream__val; } __hipPopCallConfiguration; struct { dim3 gridDim; dim3 blockDim; size_t sharedMem; hipStream_t stream; } __hipPushCallConfiguration; struct { hipArray** array; hipArray* array__val; const HIP_ARRAY3D_DESCRIPTOR* pAllocateArray; HIP_ARRAY3D_DESCRIPTOR pAllocateArray__val; } hipArray3DCreate; struct { HIP_ARRAY3D_DESCRIPTOR* pArrayDescriptor; HIP_ARRAY3D_DESCRIPTOR pArrayDescriptor__val; hipArray* array; hipArray array__val; } hipArray3DGetDescriptor; struct { hipArray** pHandle; hipArray* pHandle__val; const HIP_ARRAY_DESCRIPTOR* pAllocateArray; HIP_ARRAY_DESCRIPTOR pAllocateArray__val; } hipArrayCreate; struct { hipArray* array; hipArray array__val; } hipArrayDestroy; struct { HIP_ARRAY_DESCRIPTOR* pArrayDescriptor; HIP_ARRAY_DESCRIPTOR pArrayDescriptor__val; hipArray* array; hipArray array__val; } hipArrayGetDescriptor; struct { hipChannelFormatDesc* desc; hipChannelFormatDesc desc__val; hipExtent* extent; hipExtent extent__val; unsigned int* flags; unsigned int flags__val; hipArray* array; hipArray array__val; } hipArrayGetInfo; struct { int* device; int device__val; const hipDeviceProp_t* prop; hipDeviceProp_t prop__val; } hipChooseDevice; struct { dim3 gridDim; dim3 blockDim; size_t sharedMem; hipStream_t stream; } hipConfigureCall; struct { hipSurfaceObject_t* pSurfObject; hipSurfaceObject_t pSurfObject__val; const hipResourceDesc* pResDesc; hipResourceDesc pResDesc__val; } hipCreateSurfaceObject; struct { hipCtx_t* ctx; hipCtx_t ctx__val; unsigned int flags; hipDevice_t device; } hipCtxCreate; struct { hipCtx_t ctx; } hipCtxDestroy; struct { hipCtx_t peerCtx; } hipCtxDisablePeerAccess; struct { hipCtx_t peerCtx; unsigned int flags; } hipCtxEnablePeerAccess; struct { hipCtx_t ctx; int* apiVersion; int apiVersion__val; } hipCtxGetApiVersion; struct { hipFuncCache_t* cacheConfig; hipFuncCache_t cacheConfig__val; } hipCtxGetCacheConfig; struct { hipCtx_t* ctx; hipCtx_t ctx__val; } hipCtxGetCurrent; struct { hipDevice_t* device; hipDevice_t device__val; } hipCtxGetDevice; struct { unsigned int* flags; unsigned int flags__val; } hipCtxGetFlags; struct { hipSharedMemConfig* pConfig; hipSharedMemConfig pConfig__val; } hipCtxGetSharedMemConfig; struct { hipCtx_t* ctx; hipCtx_t ctx__val; } hipCtxPopCurrent; struct { hipCtx_t ctx; } hipCtxPushCurrent; struct { hipFuncCache_t cacheConfig; } hipCtxSetCacheConfig; struct { hipCtx_t ctx; } hipCtxSetCurrent; struct { hipSharedMemConfig config; } hipCtxSetSharedMemConfig; struct { hipExternalMemory_t extMem; } hipDestroyExternalMemory; struct { hipExternalSemaphore_t extSem; } 
hipDestroyExternalSemaphore; struct { hipSurfaceObject_t surfaceObject; } hipDestroySurfaceObject; struct { int* canAccessPeer; int canAccessPeer__val; int deviceId; int peerDeviceId; } hipDeviceCanAccessPeer; struct { int* major; int major__val; int* minor; int minor__val; hipDevice_t device; } hipDeviceComputeCapability; struct { int peerDeviceId; } hipDeviceDisablePeerAccess; struct { int peerDeviceId; unsigned int flags; } hipDeviceEnablePeerAccess; struct { hipDevice_t* device; hipDevice_t device__val; int ordinal; } hipDeviceGet; struct { int* pi; int pi__val; hipDeviceAttribute_t attr; int deviceId; } hipDeviceGetAttribute; struct { int* device; int device__val; const char* pciBusId; char pciBusId__val; } hipDeviceGetByPCIBusId; struct { hipFuncCache_t* cacheConfig; hipFuncCache_t cacheConfig__val; } hipDeviceGetCacheConfig; struct { hipMemPool_t* mem_pool; hipMemPool_t mem_pool__val; int device; } hipDeviceGetDefaultMemPool; struct { int device; hipGraphMemAttributeType attr; void* value; } hipDeviceGetGraphMemAttribute; struct { size_t* pValue; size_t pValue__val; enum hipLimit_t limit; } hipDeviceGetLimit; struct { hipMemPool_t* mem_pool; hipMemPool_t mem_pool__val; int device; } hipDeviceGetMemPool; struct { char* name; char name__val; int len; hipDevice_t device; } hipDeviceGetName; struct { int* value; int value__val; hipDeviceP2PAttr attr; int srcDevice; int dstDevice; } hipDeviceGetP2PAttribute; struct { char* pciBusId; char pciBusId__val; int len; int device; } hipDeviceGetPCIBusId; struct { hipSharedMemConfig* pConfig; hipSharedMemConfig pConfig__val; } hipDeviceGetSharedMemConfig; struct { int* leastPriority; int leastPriority__val; int* greatestPriority; int greatestPriority__val; } hipDeviceGetStreamPriorityRange; struct { hipUUID* uuid; hipUUID uuid__val; hipDevice_t device; } hipDeviceGetUuid; struct { int device; } hipDeviceGraphMemTrim; struct { hipDevice_t dev; unsigned int* flags; unsigned int flags__val; int* active; int active__val; } hipDevicePrimaryCtxGetState; struct { hipDevice_t dev; } hipDevicePrimaryCtxRelease; struct { hipDevice_t dev; } hipDevicePrimaryCtxReset; struct { hipCtx_t* pctx; hipCtx_t pctx__val; hipDevice_t dev; } hipDevicePrimaryCtxRetain; struct { hipDevice_t dev; unsigned int flags; } hipDevicePrimaryCtxSetFlags; struct { hipFuncCache_t cacheConfig; } hipDeviceSetCacheConfig; struct { int device; hipGraphMemAttributeType attr; void* value; } hipDeviceSetGraphMemAttribute; struct { enum hipLimit_t limit; size_t value; } hipDeviceSetLimit; struct { int device; hipMemPool_t mem_pool; } hipDeviceSetMemPool; struct { hipSharedMemConfig config; } hipDeviceSetSharedMemConfig; struct { size_t* bytes; size_t bytes__val; hipDevice_t device; } hipDeviceTotalMem; struct { int* driverVersion; int driverVersion__val; } hipDriverGetVersion; struct { const hip_Memcpy2D* pCopy; hip_Memcpy2D pCopy__val; } hipDrvMemcpy2DUnaligned; struct { const HIP_MEMCPY3D* pCopy; HIP_MEMCPY3D pCopy__val; } hipDrvMemcpy3D; struct { const HIP_MEMCPY3D* pCopy; HIP_MEMCPY3D pCopy__val; hipStream_t stream; } hipDrvMemcpy3DAsync; struct { unsigned int numAttributes; hipPointer_attribute* attributes; hipPointer_attribute attributes__val; void** data; void* data__val; hipDeviceptr_t ptr; } hipDrvPointerGetAttributes; struct { hipEvent_t* event; hipEvent_t event__val; } hipEventCreate; struct { hipEvent_t* event; hipEvent_t event__val; unsigned int flags; } hipEventCreateWithFlags; struct { hipEvent_t event; } hipEventDestroy; struct { float* ms; float ms__val; hipEvent_t start; 
hipEvent_t stop; } hipEventElapsedTime; struct { hipEvent_t event; } hipEventQuery; struct { hipEvent_t event; hipStream_t stream; } hipEventRecord; struct { hipEvent_t event; } hipEventSynchronize; struct { int device1; int device2; unsigned int* linktype; unsigned int linktype__val; unsigned int* hopcount; unsigned int hopcount__val; } hipExtGetLinkTypeAndHopCount; struct { const void* function_address; dim3 numBlocks; dim3 dimBlocks; void** args; void* args__val; size_t sharedMemBytes; hipStream_t stream; hipEvent_t startEvent; hipEvent_t stopEvent; int flags; } hipExtLaunchKernel; struct { hipLaunchParams* launchParamsList; hipLaunchParams launchParamsList__val; int numDevices; unsigned int flags; } hipExtLaunchMultiKernelMultiDevice; struct { void** ptr; void* ptr__val; size_t sizeBytes; unsigned int flags; } hipExtMallocWithFlags; struct { hipFunction_t f; unsigned int globalWorkSizeX; unsigned int globalWorkSizeY; unsigned int globalWorkSizeZ; unsigned int localWorkSizeX; unsigned int localWorkSizeY; unsigned int localWorkSizeZ; size_t sharedMemBytes; hipStream_t hStream; void** kernelParams; void* kernelParams__val; void** extra; void* extra__val; hipEvent_t startEvent; hipEvent_t stopEvent; unsigned int flags; } hipExtModuleLaunchKernel; struct { hipStream_t* stream; hipStream_t stream__val; unsigned int cuMaskSize; const unsigned int* cuMask; unsigned int cuMask__val; } hipExtStreamCreateWithCUMask; struct { hipStream_t stream; unsigned int cuMaskSize; unsigned int* cuMask; unsigned int cuMask__val; } hipExtStreamGetCUMask; struct { void** devPtr; void* devPtr__val; hipExternalMemory_t extMem; const hipExternalMemoryBufferDesc* bufferDesc; hipExternalMemoryBufferDesc bufferDesc__val; } hipExternalMemoryGetMappedBuffer; struct { void* ptr; } hipFree; struct { hipArray* array; hipArray array__val; } hipFreeArray; struct { void* dev_ptr; hipStream_t stream; } hipFreeAsync; struct { void* ptr; } hipFreeHost; struct { hipMipmappedArray_t mipmappedArray; } hipFreeMipmappedArray; struct { int* value; int value__val; hipFunction_attribute attrib; hipFunction_t hfunc; } hipFuncGetAttribute; struct { hipFuncAttributes* attr; hipFuncAttributes attr__val; const void* func; } hipFuncGetAttributes; struct { const void* func; hipFuncAttribute attr; int value; } hipFuncSetAttribute; struct { const void* func; hipFuncCache_t config; } hipFuncSetCacheConfig; struct { const void* func; hipSharedMemConfig config; } hipFuncSetSharedMemConfig; struct { unsigned int* pHipDeviceCount; unsigned int pHipDeviceCount__val; int* pHipDevices; int pHipDevices__val; unsigned int hipDeviceCount; hipGLDeviceList deviceList; } hipGLGetDevices; struct { hipChannelFormatDesc* desc; hipChannelFormatDesc desc__val; hipArray_const_t array; } hipGetChannelDesc; struct { int* deviceId; int deviceId__val; } hipGetDevice; struct { int* count; int count__val; } hipGetDeviceCount; struct { unsigned int* flags; unsigned int flags__val; } hipGetDeviceFlags; struct { hipDeviceProp_t* props; hipDeviceProp_t props__val; hipDevice_t device; } hipGetDeviceProperties; struct { hipArray_t* levelArray; hipArray_t levelArray__val; hipMipmappedArray_const_t mipmappedArray; unsigned int level; } hipGetMipmappedArrayLevel; struct { void** devPtr; void* devPtr__val; const void* symbol; } hipGetSymbolAddress; struct { size_t* size; size_t size__val; const void* symbol; } hipGetSymbolSize; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t 
pDependencies__val; size_t numDependencies; hipGraph_t childGraph; } hipGraphAddChildGraphNode; struct { hipGraph_t graph; const hipGraphNode_t* from; hipGraphNode_t from__val; const hipGraphNode_t* to; hipGraphNode_t to__val; size_t numDependencies; } hipGraphAddDependencies; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; } hipGraphAddEmptyNode; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; hipEvent_t event; } hipGraphAddEventRecordNode; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; hipEvent_t event; } hipGraphAddEventWaitNode; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; const hipHostNodeParams* pNodeParams; hipHostNodeParams pNodeParams__val; } hipGraphAddHostNode; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; const hipKernelNodeParams* pNodeParams; hipKernelNodeParams pNodeParams__val; } hipGraphAddKernelNode; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; hipMemAllocNodeParams* pNodeParams; hipMemAllocNodeParams pNodeParams__val; } hipGraphAddMemAllocNode; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; void* dev_ptr; } hipGraphAddMemFreeNode; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; const hipMemcpy3DParms* pCopyParams; hipMemcpy3DParms pCopyParams__val; } hipGraphAddMemcpyNode; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; void* dst; const void* src; size_t count; hipMemcpyKind kind; } hipGraphAddMemcpyNode1D; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; void* dst; const void* symbol; size_t count; size_t offset; hipMemcpyKind kind; } hipGraphAddMemcpyNodeFromSymbol; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; const void* symbol; const void* src; size_t count; size_t offset; hipMemcpyKind kind; } hipGraphAddMemcpyNodeToSymbol; struct { hipGraphNode_t* pGraphNode; hipGraphNode_t pGraphNode__val; hipGraph_t graph; const hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t numDependencies; const hipMemsetParams* pMemsetParams; hipMemsetParams pMemsetParams__val; } hipGraphAddMemsetNode; struct { hipGraphNode_t node; hipGraph_t* pGraph; hipGraph_t pGraph__val; } 
hipGraphChildGraphNodeGetGraph; struct { hipGraph_t* pGraphClone; hipGraph_t pGraphClone__val; hipGraph_t originalGraph; } hipGraphClone; struct { hipGraph_t* pGraph; hipGraph_t pGraph__val; unsigned int flags; } hipGraphCreate; struct { hipGraph_t graph; const char* path; char path__val; unsigned int flags; } hipGraphDebugDotPrint; struct { hipGraph_t graph; } hipGraphDestroy; struct { hipGraphNode_t node; } hipGraphDestroyNode; struct { hipGraphNode_t node; hipEvent_t* event_out; hipEvent_t event_out__val; } hipGraphEventRecordNodeGetEvent; struct { hipGraphNode_t node; hipEvent_t event; } hipGraphEventRecordNodeSetEvent; struct { hipGraphNode_t node; hipEvent_t* event_out; hipEvent_t event_out__val; } hipGraphEventWaitNodeGetEvent; struct { hipGraphNode_t node; hipEvent_t event; } hipGraphEventWaitNodeSetEvent; struct { hipGraphExec_t hGraphExec; hipGraphNode_t node; hipGraph_t childGraph; } hipGraphExecChildGraphNodeSetParams; struct { hipGraphExec_t graphExec; } hipGraphExecDestroy; struct { hipGraphExec_t hGraphExec; hipGraphNode_t hNode; hipEvent_t event; } hipGraphExecEventRecordNodeSetEvent; struct { hipGraphExec_t hGraphExec; hipGraphNode_t hNode; hipEvent_t event; } hipGraphExecEventWaitNodeSetEvent; struct { hipGraphExec_t hGraphExec; hipGraphNode_t node; const hipHostNodeParams* pNodeParams; hipHostNodeParams pNodeParams__val; } hipGraphExecHostNodeSetParams; struct { hipGraphExec_t hGraphExec; hipGraphNode_t node; const hipKernelNodeParams* pNodeParams; hipKernelNodeParams pNodeParams__val; } hipGraphExecKernelNodeSetParams; struct { hipGraphExec_t hGraphExec; hipGraphNode_t node; hipMemcpy3DParms* pNodeParams; hipMemcpy3DParms pNodeParams__val; } hipGraphExecMemcpyNodeSetParams; struct { hipGraphExec_t hGraphExec; hipGraphNode_t node; void* dst; const void* src; size_t count; hipMemcpyKind kind; } hipGraphExecMemcpyNodeSetParams1D; struct { hipGraphExec_t hGraphExec; hipGraphNode_t node; void* dst; const void* symbol; size_t count; size_t offset; hipMemcpyKind kind; } hipGraphExecMemcpyNodeSetParamsFromSymbol; struct { hipGraphExec_t hGraphExec; hipGraphNode_t node; const void* symbol; const void* src; size_t count; size_t offset; hipMemcpyKind kind; } hipGraphExecMemcpyNodeSetParamsToSymbol; struct { hipGraphExec_t hGraphExec; hipGraphNode_t node; const hipMemsetParams* pNodeParams; hipMemsetParams pNodeParams__val; } hipGraphExecMemsetNodeSetParams; struct { hipGraphExec_t hGraphExec; hipGraph_t hGraph; hipGraphNode_t* hErrorNode_out; hipGraphNode_t hErrorNode_out__val; hipGraphExecUpdateResult* updateResult_out; hipGraphExecUpdateResult updateResult_out__val; } hipGraphExecUpdate; struct { hipGraph_t graph; hipGraphNode_t* from; hipGraphNode_t from__val; hipGraphNode_t* to; hipGraphNode_t to__val; size_t* numEdges; size_t numEdges__val; } hipGraphGetEdges; struct { hipGraph_t graph; hipGraphNode_t* nodes; hipGraphNode_t nodes__val; size_t* numNodes; size_t numNodes__val; } hipGraphGetNodes; struct { hipGraph_t graph; hipGraphNode_t* pRootNodes; hipGraphNode_t pRootNodes__val; size_t* pNumRootNodes; size_t pNumRootNodes__val; } hipGraphGetRootNodes; struct { hipGraphNode_t node; hipHostNodeParams* pNodeParams; hipHostNodeParams pNodeParams__val; } hipGraphHostNodeGetParams; struct { hipGraphNode_t node; const hipHostNodeParams* pNodeParams; hipHostNodeParams pNodeParams__val; } hipGraphHostNodeSetParams; struct { hipGraphExec_t* pGraphExec; hipGraphExec_t pGraphExec__val; hipGraph_t graph; hipGraphNode_t* pErrorNode; hipGraphNode_t pErrorNode__val; char* pLogBuffer; char 
pLogBuffer__val; size_t bufferSize; } hipGraphInstantiate; struct { hipGraphExec_t* pGraphExec; hipGraphExec_t pGraphExec__val; hipGraph_t graph; unsigned long long flags; } hipGraphInstantiateWithFlags; struct { hipGraphNode_t hSrc; hipGraphNode_t hDst; } hipGraphKernelNodeCopyAttributes; struct { hipGraphNode_t hNode; hipKernelNodeAttrID attr; hipKernelNodeAttrValue* value; hipKernelNodeAttrValue value__val; } hipGraphKernelNodeGetAttribute; struct { hipGraphNode_t node; hipKernelNodeParams* pNodeParams; hipKernelNodeParams pNodeParams__val; } hipGraphKernelNodeGetParams; struct { hipGraphNode_t hNode; hipKernelNodeAttrID attr; const hipKernelNodeAttrValue* value; hipKernelNodeAttrValue value__val; } hipGraphKernelNodeSetAttribute; struct { hipGraphNode_t node; const hipKernelNodeParams* pNodeParams; hipKernelNodeParams pNodeParams__val; } hipGraphKernelNodeSetParams; struct { hipGraphExec_t graphExec; hipStream_t stream; } hipGraphLaunch; struct { hipGraphNode_t node; hipMemAllocNodeParams* pNodeParams; hipMemAllocNodeParams pNodeParams__val; } hipGraphMemAllocNodeGetParams; struct { hipGraphNode_t node; void* dev_ptr; } hipGraphMemFreeNodeGetParams; struct { hipGraphNode_t node; hipMemcpy3DParms* pNodeParams; hipMemcpy3DParms pNodeParams__val; } hipGraphMemcpyNodeGetParams; struct { hipGraphNode_t node; const hipMemcpy3DParms* pNodeParams; hipMemcpy3DParms pNodeParams__val; } hipGraphMemcpyNodeSetParams; struct { hipGraphNode_t node; void* dst; const void* src; size_t count; hipMemcpyKind kind; } hipGraphMemcpyNodeSetParams1D; struct { hipGraphNode_t node; void* dst; const void* symbol; size_t count; size_t offset; hipMemcpyKind kind; } hipGraphMemcpyNodeSetParamsFromSymbol; struct { hipGraphNode_t node; const void* symbol; const void* src; size_t count; size_t offset; hipMemcpyKind kind; } hipGraphMemcpyNodeSetParamsToSymbol; struct { hipGraphNode_t node; hipMemsetParams* pNodeParams; hipMemsetParams pNodeParams__val; } hipGraphMemsetNodeGetParams; struct { hipGraphNode_t node; const hipMemsetParams* pNodeParams; hipMemsetParams pNodeParams__val; } hipGraphMemsetNodeSetParams; struct { hipGraphNode_t* pNode; hipGraphNode_t pNode__val; hipGraphNode_t originalNode; hipGraph_t clonedGraph; } hipGraphNodeFindInClone; struct { hipGraphNode_t node; hipGraphNode_t* pDependencies; hipGraphNode_t pDependencies__val; size_t* pNumDependencies; size_t pNumDependencies__val; } hipGraphNodeGetDependencies; struct { hipGraphNode_t node; hipGraphNode_t* pDependentNodes; hipGraphNode_t pDependentNodes__val; size_t* pNumDependentNodes; size_t pNumDependentNodes__val; } hipGraphNodeGetDependentNodes; struct { hipGraphExec_t hGraphExec; hipGraphNode_t hNode; unsigned int* isEnabled; unsigned int isEnabled__val; } hipGraphNodeGetEnabled; struct { hipGraphNode_t node; hipGraphNodeType* pType; hipGraphNodeType pType__val; } hipGraphNodeGetType; struct { hipGraphExec_t hGraphExec; hipGraphNode_t hNode; unsigned int isEnabled; } hipGraphNodeSetEnabled; struct { hipGraph_t graph; hipUserObject_t object; unsigned int count; } hipGraphReleaseUserObject; struct { hipGraph_t graph; const hipGraphNode_t* from; hipGraphNode_t from__val; const hipGraphNode_t* to; hipGraphNode_t to__val; size_t numDependencies; } hipGraphRemoveDependencies; struct { hipGraph_t graph; hipUserObject_t object; unsigned int count; unsigned int flags; } hipGraphRetainUserObject; struct { hipGraphExec_t graphExec; hipStream_t stream; } hipGraphUpload; struct { hipGraphicsResource** resource; hipGraphicsResource* resource__val; GLuint 
buffer; unsigned int flags; } hipGraphicsGLRegisterBuffer; struct { hipGraphicsResource** resource; hipGraphicsResource* resource__val; GLuint image; GLenum target; unsigned int flags; } hipGraphicsGLRegisterImage; struct { int count; hipGraphicsResource_t* resources; hipGraphicsResource_t resources__val; hipStream_t stream; } hipGraphicsMapResources; struct { void** devPtr; void* devPtr__val; size_t* size; size_t size__val; hipGraphicsResource_t resource; } hipGraphicsResourceGetMappedPointer; struct { hipArray_t* array; hipArray_t array__val; hipGraphicsResource_t resource; unsigned int arrayIndex; unsigned int mipLevel; } hipGraphicsSubResourceGetMappedArray; struct { int count; hipGraphicsResource_t* resources; hipGraphicsResource_t resources__val; hipStream_t stream; } hipGraphicsUnmapResources; struct { hipGraphicsResource_t resource; } hipGraphicsUnregisterResource; struct { hipFunction_t f; unsigned int globalWorkSizeX; unsigned int globalWorkSizeY; unsigned int globalWorkSizeZ; unsigned int blockDimX; unsigned int blockDimY; unsigned int blockDimZ; size_t sharedMemBytes; hipStream_t hStream; void** kernelParams; void* kernelParams__val; void** extra; void* extra__val; hipEvent_t startEvent; hipEvent_t stopEvent; } hipHccModuleLaunchKernel; struct { void** ptr; void* ptr__val; size_t size; unsigned int flags; } hipHostAlloc; struct { void* ptr; } hipHostFree; struct { void** devPtr; void* devPtr__val; void* hstPtr; unsigned int flags; } hipHostGetDevicePointer; struct { unsigned int* flagsPtr; unsigned int flagsPtr__val; void* hostPtr; } hipHostGetFlags; struct { void** ptr; void* ptr__val; size_t size; unsigned int flags; } hipHostMalloc; struct { void* hostPtr; size_t sizeBytes; unsigned int flags; } hipHostRegister; struct { void* hostPtr; } hipHostUnregister; struct { hipExternalMemory_t* extMem_out; hipExternalMemory_t extMem_out__val; const hipExternalMemoryHandleDesc* memHandleDesc; hipExternalMemoryHandleDesc memHandleDesc__val; } hipImportExternalMemory; struct { hipExternalSemaphore_t* extSem_out; hipExternalSemaphore_t extSem_out__val; const hipExternalSemaphoreHandleDesc* semHandleDesc; hipExternalSemaphoreHandleDesc semHandleDesc__val; } hipImportExternalSemaphore; struct { unsigned int flags; } hipInit; struct { void* devPtr; } hipIpcCloseMemHandle; struct { hipIpcEventHandle_t* handle; hipIpcEventHandle_t handle__val; hipEvent_t event; } hipIpcGetEventHandle; struct { hipIpcMemHandle_t* handle; hipIpcMemHandle_t handle__val; void* devPtr; } hipIpcGetMemHandle; struct { hipEvent_t* event; hipEvent_t event__val; hipIpcEventHandle_t handle; } hipIpcOpenEventHandle; struct { void** devPtr; void* devPtr__val; hipIpcMemHandle_t handle; unsigned int flags; } hipIpcOpenMemHandle; struct { const void* hostFunction; } hipLaunchByPtr; struct { const void* f; dim3 gridDim; dim3 blockDimX; void** kernelParams; void* kernelParams__val; unsigned int sharedMemBytes; hipStream_t stream; } hipLaunchCooperativeKernel; struct { hipLaunchParams* launchParamsList; hipLaunchParams launchParamsList__val; int numDevices; unsigned int flags; } hipLaunchCooperativeKernelMultiDevice; struct { hipStream_t stream; hipHostFn_t fn; void* userData; } hipLaunchHostFunc; struct { const void* function_address; dim3 numBlocks; dim3 dimBlocks; void** args; void* args__val; size_t sharedMemBytes; hipStream_t stream; } hipLaunchKernel; struct { void** ptr; void* ptr__val; size_t size; } hipMalloc; struct { hipPitchedPtr* pitchedDevPtr; hipPitchedPtr pitchedDevPtr__val; hipExtent extent; } hipMalloc3D; 
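/* Access sketch (editor's illustration): a callback registered for
 * HIP_API_ID_hipMalloc can read the captured arguments from the hipMalloc
 * member just above. This assumes the union is ultimately exposed as an
 * `args` member of a hip_api_data_t typedef (the closing of the typedef lies
 * further below and is not visible here) and uses a roctracer-style callback
 * signature purely for illustration (stdio.h assumed; phase 0 is the enter
 * phase in the roctracer convention):
 *
 *   void on_hip_api(uint32_t domain, uint32_t cbid, const void* data, void* u) {
 *     const hip_api_data_t* d = (const hip_api_data_t*)data;
 *     if (cbid == HIP_API_ID_hipMalloc && d->phase == 0)
 *       printf("hipMalloc(size=%zu)\n", d->args.hipMalloc.size);
 *   }
 */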
struct { hipArray_t* array; hipArray_t array__val; const hipChannelFormatDesc* desc; hipChannelFormatDesc desc__val; hipExtent extent; unsigned int flags; } hipMalloc3DArray; struct { hipArray** array; hipArray* array__val; const hipChannelFormatDesc* desc; hipChannelFormatDesc desc__val; size_t width; size_t height; unsigned int flags; } hipMallocArray; struct { void** dev_ptr; void* dev_ptr__val; size_t size; hipStream_t stream; } hipMallocAsync; struct { void** dev_ptr; void* dev_ptr__val; size_t size; hipMemPool_t mem_pool; hipStream_t stream; } hipMallocFromPoolAsync; struct { void** ptr; void* ptr__val; size_t size; } hipMallocHost; struct { void** dev_ptr; void* dev_ptr__val; size_t size; unsigned int flags; } hipMallocManaged; struct { hipMipmappedArray_t* mipmappedArray; hipMipmappedArray_t mipmappedArray__val; const hipChannelFormatDesc* desc; hipChannelFormatDesc desc__val; hipExtent extent; unsigned int numLevels; unsigned int flags; } hipMallocMipmappedArray; struct { void** ptr; void* ptr__val; size_t* pitch; size_t pitch__val; size_t width; size_t height; } hipMallocPitch; struct { void* devPtr; size_t size; } hipMemAddressFree; struct { void** ptr; void* ptr__val; size_t size; size_t alignment; void* addr; unsigned long long flags; } hipMemAddressReserve; struct { const void* dev_ptr; size_t count; hipMemoryAdvise advice; int device; } hipMemAdvise; struct { void** ptr; void* ptr__val; size_t size; } hipMemAllocHost; struct { hipDeviceptr_t* dptr; hipDeviceptr_t dptr__val; size_t* pitch; size_t pitch__val; size_t widthInBytes; size_t height; unsigned int elementSizeBytes; } hipMemAllocPitch; struct { hipMemGenericAllocationHandle_t* handle; hipMemGenericAllocationHandle_t handle__val; size_t size; const hipMemAllocationProp* prop; hipMemAllocationProp prop__val; unsigned long long flags; } hipMemCreate; struct { void* shareableHandle; hipMemGenericAllocationHandle_t handle; hipMemAllocationHandleType handleType; unsigned long long flags; } hipMemExportToShareableHandle; struct { unsigned long long* flags; unsigned long long flags__val; const hipMemLocation* location; hipMemLocation location__val; void* ptr; } hipMemGetAccess; struct { hipDeviceptr_t* pbase; hipDeviceptr_t pbase__val; size_t* psize; size_t psize__val; hipDeviceptr_t dptr; } hipMemGetAddressRange; struct { size_t* granularity; size_t granularity__val; const hipMemAllocationProp* prop; hipMemAllocationProp prop__val; hipMemAllocationGranularity_flags option; } hipMemGetAllocationGranularity; struct { hipMemAllocationProp* prop; hipMemAllocationProp prop__val; hipMemGenericAllocationHandle_t handle; } hipMemGetAllocationPropertiesFromHandle; struct { size_t* free; size_t free__val; size_t* total; size_t total__val; } hipMemGetInfo; struct { hipMemGenericAllocationHandle_t* handle; hipMemGenericAllocationHandle_t handle__val; void* osHandle; hipMemAllocationHandleType shHandleType; } hipMemImportFromShareableHandle; struct { void* ptr; size_t size; size_t offset; hipMemGenericAllocationHandle_t handle; unsigned long long flags; } hipMemMap; struct { hipArrayMapInfo* mapInfoList; hipArrayMapInfo mapInfoList__val; unsigned int count; hipStream_t stream; } hipMemMapArrayAsync; struct { hipMemPool_t* mem_pool; hipMemPool_t mem_pool__val; const hipMemPoolProps* pool_props; hipMemPoolProps pool_props__val; } hipMemPoolCreate; struct { hipMemPool_t mem_pool; } hipMemPoolDestroy; struct { hipMemPoolPtrExportData* export_data; hipMemPoolPtrExportData export_data__val; void* dev_ptr; } hipMemPoolExportPointer; struct { 
void* shared_handle; hipMemPool_t mem_pool; hipMemAllocationHandleType handle_type; unsigned int flags; } hipMemPoolExportToShareableHandle; struct { hipMemAccessFlags* flags; hipMemAccessFlags flags__val; hipMemPool_t mem_pool; hipMemLocation* location; hipMemLocation location__val; } hipMemPoolGetAccess; struct { hipMemPool_t mem_pool; hipMemPoolAttr attr; void* value; } hipMemPoolGetAttribute; struct { hipMemPool_t* mem_pool; hipMemPool_t mem_pool__val; void* shared_handle; hipMemAllocationHandleType handle_type; unsigned int flags; } hipMemPoolImportFromShareableHandle; struct { void** dev_ptr; void* dev_ptr__val; hipMemPool_t mem_pool; hipMemPoolPtrExportData* export_data; hipMemPoolPtrExportData export_data__val; } hipMemPoolImportPointer; struct { hipMemPool_t mem_pool; const hipMemAccessDesc* desc_list; hipMemAccessDesc desc_list__val; size_t count; } hipMemPoolSetAccess; struct { hipMemPool_t mem_pool; hipMemPoolAttr attr; void* value; } hipMemPoolSetAttribute; struct { hipMemPool_t mem_pool; size_t min_bytes_to_hold; } hipMemPoolTrimTo; struct { const void* dev_ptr; size_t count; int device; hipStream_t stream; } hipMemPrefetchAsync; struct { void* ptr; size_t* size; size_t size__val; } hipMemPtrGetInfo; struct { void* data; size_t data_size; hipMemRangeAttribute attribute; const void* dev_ptr; size_t count; } hipMemRangeGetAttribute; struct { void** data; void* data__val; size_t* data_sizes; size_t data_sizes__val; hipMemRangeAttribute* attributes; hipMemRangeAttribute attributes__val; size_t num_attributes; const void* dev_ptr; size_t count; } hipMemRangeGetAttributes; struct { hipMemGenericAllocationHandle_t handle; } hipMemRelease; struct { hipMemGenericAllocationHandle_t* handle; hipMemGenericAllocationHandle_t handle__val; void* addr; } hipMemRetainAllocationHandle; struct { void* ptr; size_t size; const hipMemAccessDesc* desc; hipMemAccessDesc desc__val; size_t count; } hipMemSetAccess; struct { void* ptr; size_t size; } hipMemUnmap; struct { void* dst; const void* src; size_t sizeBytes; hipMemcpyKind kind; } hipMemcpy; struct { void* dst; size_t dpitch; const void* src; size_t spitch; size_t width; size_t height; hipMemcpyKind kind; } hipMemcpy2D; struct { void* dst; size_t dpitch; const void* src; size_t spitch; size_t width; size_t height; hipMemcpyKind kind; hipStream_t stream; } hipMemcpy2DAsync; struct { void* dst; size_t dpitch; hipArray_const_t src; size_t wOffset; size_t hOffset; size_t width; size_t height; hipMemcpyKind kind; } hipMemcpy2DFromArray; struct { void* dst; size_t dpitch; hipArray_const_t src; size_t wOffset; size_t hOffset; size_t width; size_t height; hipMemcpyKind kind; hipStream_t stream; } hipMemcpy2DFromArrayAsync; struct { hipArray* dst; hipArray dst__val; size_t wOffset; size_t hOffset; const void* src; size_t spitch; size_t width; size_t height; hipMemcpyKind kind; } hipMemcpy2DToArray; struct { hipArray* dst; hipArray dst__val; size_t wOffset; size_t hOffset; const void* src; size_t spitch; size_t width; size_t height; hipMemcpyKind kind; hipStream_t stream; } hipMemcpy2DToArrayAsync; struct { const hipMemcpy3DParms* p; hipMemcpy3DParms p__val; } hipMemcpy3D; struct { const hipMemcpy3DParms* p; hipMemcpy3DParms p__val; hipStream_t stream; } hipMemcpy3DAsync; struct { void* dst; const void* src; size_t sizeBytes; hipMemcpyKind kind; hipStream_t stream; } hipMemcpyAsync; struct { void* dst; hipArray* srcArray; hipArray srcArray__val; size_t srcOffset; size_t count; } hipMemcpyAtoH; struct { hipDeviceptr_t dst; hipDeviceptr_t src; size_t 
sizeBytes; } hipMemcpyDtoD; struct { hipDeviceptr_t dst; hipDeviceptr_t src; size_t sizeBytes; hipStream_t stream; } hipMemcpyDtoDAsync; struct { void* dst; hipDeviceptr_t src; size_t sizeBytes; } hipMemcpyDtoH; struct { void* dst; hipDeviceptr_t src; size_t sizeBytes; hipStream_t stream; } hipMemcpyDtoHAsync; struct { void* dst; hipArray_const_t srcArray; size_t wOffset; size_t hOffset; size_t count; hipMemcpyKind kind; } hipMemcpyFromArray; struct { void* dst; const void* symbol; size_t sizeBytes; size_t offset; hipMemcpyKind kind; } hipMemcpyFromSymbol; struct { void* dst; const void* symbol; size_t sizeBytes; size_t offset; hipMemcpyKind kind; hipStream_t stream; } hipMemcpyFromSymbolAsync; struct { hipArray* dstArray; hipArray dstArray__val; size_t dstOffset; const void* srcHost; size_t count; } hipMemcpyHtoA; struct { hipDeviceptr_t dst; void* src; size_t sizeBytes; } hipMemcpyHtoD; struct { hipDeviceptr_t dst; void* src; size_t sizeBytes; hipStream_t stream; } hipMemcpyHtoDAsync; struct { const hip_Memcpy2D* pCopy; hip_Memcpy2D pCopy__val; } hipMemcpyParam2D; struct { const hip_Memcpy2D* pCopy; hip_Memcpy2D pCopy__val; hipStream_t stream; } hipMemcpyParam2DAsync; struct { void* dst; int dstDeviceId; const void* src; int srcDeviceId; size_t sizeBytes; } hipMemcpyPeer; struct { void* dst; int dstDeviceId; const void* src; int srcDevice; size_t sizeBytes; hipStream_t stream; } hipMemcpyPeerAsync; struct { hipArray* dst; hipArray dst__val; size_t wOffset; size_t hOffset; const void* src; size_t count; hipMemcpyKind kind; } hipMemcpyToArray; struct { const void* symbol; const void* src; size_t sizeBytes; size_t offset; hipMemcpyKind kind; } hipMemcpyToSymbol; struct { const void* symbol; const void* src; size_t sizeBytes; size_t offset; hipMemcpyKind kind; hipStream_t stream; } hipMemcpyToSymbolAsync; struct { void* dst; const void* src; size_t sizeBytes; hipMemcpyKind kind; hipStream_t stream; } hipMemcpyWithStream; struct { void* dst; int value; size_t sizeBytes; } hipMemset; struct { void* dst; size_t pitch; int value; size_t width; size_t height; } hipMemset2D; struct { void* dst; size_t pitch; int value; size_t width; size_t height; hipStream_t stream; } hipMemset2DAsync; struct { hipPitchedPtr pitchedDevPtr; int value; hipExtent extent; } hipMemset3D; struct { hipPitchedPtr pitchedDevPtr; int value; hipExtent extent; hipStream_t stream; } hipMemset3DAsync; struct { void* dst; int value; size_t sizeBytes; hipStream_t stream; } hipMemsetAsync; struct { hipDeviceptr_t dest; unsigned short value; size_t count; } hipMemsetD16; struct { hipDeviceptr_t dest; unsigned short value; size_t count; hipStream_t stream; } hipMemsetD16Async; struct { hipDeviceptr_t dest; int value; size_t count; } hipMemsetD32; struct { hipDeviceptr_t dst; int value; size_t count; hipStream_t stream; } hipMemsetD32Async; struct { hipDeviceptr_t dest; unsigned char value; size_t count; } hipMemsetD8; struct { hipDeviceptr_t dest; unsigned char value; size_t count; hipStream_t stream; } hipMemsetD8Async; struct { hipMipmappedArray_t* pHandle; hipMipmappedArray_t pHandle__val; HIP_ARRAY3D_DESCRIPTOR* pMipmappedArrayDesc; HIP_ARRAY3D_DESCRIPTOR pMipmappedArrayDesc__val; unsigned int numMipmapLevels; } hipMipmappedArrayCreate; struct { hipMipmappedArray_t hMipmappedArray; } hipMipmappedArrayDestroy; struct { hipArray_t* pLevelArray; hipArray_t pLevelArray__val; hipMipmappedArray_t hMipMappedArray; unsigned int level; } hipMipmappedArrayGetLevel; struct { hipFunction_t* function; hipFunction_t function__val; 
hipModule_t module; const char* kname; char kname__val; } hipModuleGetFunction; struct { hipDeviceptr_t* dptr; hipDeviceptr_t dptr__val; size_t* bytes; size_t bytes__val; hipModule_t hmod; const char* name; char name__val; } hipModuleGetGlobal; struct { textureReference** texRef; textureReference* texRef__val; hipModule_t hmod; const char* name; char name__val; } hipModuleGetTexRef; struct { hipFunction_t f; unsigned int gridDimX; unsigned int gridDimY; unsigned int gridDimZ; unsigned int blockDimX; unsigned int blockDimY; unsigned int blockDimZ; unsigned int sharedMemBytes; hipStream_t stream; void** kernelParams; void* kernelParams__val; } hipModuleLaunchCooperativeKernel; struct { hipFunctionLaunchParams* launchParamsList; hipFunctionLaunchParams launchParamsList__val; unsigned int numDevices; unsigned int flags; } hipModuleLaunchCooperativeKernelMultiDevice; struct { hipFunction_t f; unsigned int gridDimX; unsigned int gridDimY; unsigned int gridDimZ; unsigned int blockDimX; unsigned int blockDimY; unsigned int blockDimZ; unsigned int sharedMemBytes; hipStream_t stream; void** kernelParams; void* kernelParams__val; void** extra; void* extra__val; } hipModuleLaunchKernel; struct { hipModule_t* module; hipModule_t module__val; const char* fname; char fname__val; } hipModuleLoad; struct { hipModule_t* module; hipModule_t module__val; const void* image; } hipModuleLoadData; struct { hipModule_t* module; hipModule_t module__val; const void* image; unsigned int numOptions; hipJitOption* options; hipJitOption options__val; void** optionsValues; void* optionsValues__val; } hipModuleLoadDataEx; struct { int* numBlocks; int numBlocks__val; hipFunction_t f; int blockSize; size_t dynSharedMemPerBlk; } hipModuleOccupancyMaxActiveBlocksPerMultiprocessor; struct { int* numBlocks; int numBlocks__val; hipFunction_t f; int blockSize; size_t dynSharedMemPerBlk; unsigned int flags; } hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags; struct { int* gridSize; int gridSize__val; int* blockSize; int blockSize__val; hipFunction_t f; size_t dynSharedMemPerBlk; int blockSizeLimit; } hipModuleOccupancyMaxPotentialBlockSize; struct { int* gridSize; int gridSize__val; int* blockSize; int blockSize__val; hipFunction_t f; size_t dynSharedMemPerBlk; int blockSizeLimit; unsigned int flags; } hipModuleOccupancyMaxPotentialBlockSizeWithFlags; struct { hipModule_t module; } hipModuleUnload; struct { int* numBlocks; int numBlocks__val; const void* f; int blockSize; size_t dynamicSMemSize; } hipOccupancyMaxActiveBlocksPerMultiprocessor; struct { int* numBlocks; int numBlocks__val; const void* f; int blockSize; size_t dynamicSMemSize; unsigned int flags; } hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags; struct { int* gridSize; int gridSize__val; int* blockSize; int blockSize__val; const void* f; size_t dynSharedMemPerBlk; int blockSizeLimit; } hipOccupancyMaxPotentialBlockSize; struct { void* data; hipPointer_attribute attribute; hipDeviceptr_t ptr; } hipPointerGetAttribute; struct { hipPointerAttribute_t* attributes; hipPointerAttribute_t attributes__val; const void* ptr; } hipPointerGetAttributes; struct { const void* value; hipPointer_attribute attribute; hipDeviceptr_t ptr; } hipPointerSetAttribute; struct { int* runtimeVersion; int runtimeVersion__val; } hipRuntimeGetVersion; struct { int deviceId; } hipSetDevice; struct { unsigned int flags; } hipSetDeviceFlags; struct { const void* arg; size_t size; size_t offset; } hipSetupArgument; struct { const hipExternalSemaphore_t* extSemArray; 
hipExternalSemaphore_t extSemArray__val; const hipExternalSemaphoreSignalParams* paramsArray; hipExternalSemaphoreSignalParams paramsArray__val; unsigned int numExtSems; hipStream_t stream; } hipSignalExternalSemaphoresAsync; struct { hipStream_t stream; hipStreamCallback_t callback; void* userData; unsigned int flags; } hipStreamAddCallback; struct { hipStream_t stream; void* dev_ptr; size_t length; unsigned int flags; } hipStreamAttachMemAsync; struct { hipStream_t stream; hipStreamCaptureMode mode; } hipStreamBeginCapture; struct { hipStream_t* stream; hipStream_t stream__val; } hipStreamCreate; struct { hipStream_t* stream; hipStream_t stream__val; unsigned int flags; } hipStreamCreateWithFlags; struct { hipStream_t* stream; hipStream_t stream__val; unsigned int flags; int priority; } hipStreamCreateWithPriority; struct { hipStream_t stream; } hipStreamDestroy; struct { hipStream_t stream; hipGraph_t* pGraph; hipGraph_t pGraph__val; } hipStreamEndCapture; struct { hipStream_t stream; hipStreamCaptureStatus* pCaptureStatus; hipStreamCaptureStatus pCaptureStatus__val; unsigned long long* pId; unsigned long long pId__val; } hipStreamGetCaptureInfo; struct { hipStream_t stream; hipStreamCaptureStatus* captureStatus_out; hipStreamCaptureStatus captureStatus_out__val; unsigned long long* id_out; unsigned long long id_out__val; hipGraph_t* graph_out; hipGraph_t graph_out__val; const hipGraphNode_t** dependencies_out; const hipGraphNode_t* dependencies_out__val; size_t* numDependencies_out; size_t numDependencies_out__val; } hipStreamGetCaptureInfo_v2; struct { hipStream_t stream; hipDevice_t* device; hipDevice_t device__val; } hipStreamGetDevice; struct { hipStream_t stream; unsigned int* flags; unsigned int flags__val; } hipStreamGetFlags; struct { hipStream_t stream; int* priority; int priority__val; } hipStreamGetPriority; struct { hipStream_t stream; hipStreamCaptureStatus* pCaptureStatus; hipStreamCaptureStatus pCaptureStatus__val; } hipStreamIsCapturing; struct { hipStream_t stream; } hipStreamQuery; struct { hipStream_t stream; } hipStreamSynchronize; struct { hipStream_t stream; hipGraphNode_t* dependencies; hipGraphNode_t dependencies__val; size_t numDependencies; unsigned int flags; } hipStreamUpdateCaptureDependencies; struct { hipStream_t stream; hipEvent_t event; unsigned int flags; } hipStreamWaitEvent; struct { hipStream_t stream; void* ptr; unsigned int value; unsigned int flags; unsigned int mask; } hipStreamWaitValue32; struct { hipStream_t stream; void* ptr; uint64_t value; unsigned int flags; uint64_t mask; } hipStreamWaitValue64; struct { hipStream_t stream; void* ptr; unsigned int value; unsigned int flags; } hipStreamWriteValue32; struct { hipStream_t stream; void* ptr; uint64_t value; unsigned int flags; } hipStreamWriteValue64; struct { hipDeviceptr_t* dev_ptr; hipDeviceptr_t dev_ptr__val; const textureReference* texRef; textureReference texRef__val; } hipTexRefGetAddress; struct { unsigned int* pFlags; unsigned int pFlags__val; const textureReference* texRef; textureReference texRef__val; } hipTexRefGetFlags; struct { hipArray_Format* pFormat; hipArray_Format pFormat__val; int* pNumChannels; int pNumChannels__val; const textureReference* texRef; textureReference texRef__val; } hipTexRefGetFormat; struct { int* pmaxAnsio; int pmaxAnsio__val; const textureReference* texRef; textureReference texRef__val; } hipTexRefGetMaxAnisotropy; struct { hipMipmappedArray_t* pArray; hipMipmappedArray_t pArray__val; const textureReference* texRef; textureReference texRef__val; } 
    hipTexRefGetMipMappedArray;
    struct { float* pbias; float pbias__val; const textureReference* texRef; textureReference texRef__val; } hipTexRefGetMipmapLevelBias;
    struct { float* pminMipmapLevelClamp; float pminMipmapLevelClamp__val; float* pmaxMipmapLevelClamp; float pmaxMipmapLevelClamp__val; const textureReference* texRef; textureReference texRef__val; } hipTexRefGetMipmapLevelClamp;
    struct { size_t* ByteOffset; size_t ByteOffset__val; textureReference* texRef; textureReference texRef__val; hipDeviceptr_t dptr; size_t bytes; } hipTexRefSetAddress;
    struct { textureReference* texRef; textureReference texRef__val; const HIP_ARRAY_DESCRIPTOR* desc; HIP_ARRAY_DESCRIPTOR desc__val; hipDeviceptr_t dptr; size_t Pitch; } hipTexRefSetAddress2D;
    struct { textureReference* tex; textureReference tex__val; hipArray_const_t array; unsigned int flags; } hipTexRefSetArray;
    struct { textureReference* texRef; textureReference texRef__val; float* pBorderColor; float pBorderColor__val; } hipTexRefSetBorderColor;
    struct { textureReference* texRef; textureReference texRef__val; unsigned int Flags; } hipTexRefSetFlags;
    struct { textureReference* texRef; textureReference texRef__val; hipArray_Format fmt; int NumPackedComponents; } hipTexRefSetFormat;
    struct { textureReference* texRef; textureReference texRef__val; unsigned int maxAniso; } hipTexRefSetMaxAnisotropy;
    struct { textureReference* texRef; textureReference texRef__val; float bias; } hipTexRefSetMipmapLevelBias;
    struct { textureReference* texRef; textureReference texRef__val; float minMipMapLevelClamp; float maxMipMapLevelClamp; } hipTexRefSetMipmapLevelClamp;
    struct { textureReference* texRef; textureReference texRef__val; hipMipmappedArray* mipmappedArray; hipMipmappedArray mipmappedArray__val; unsigned int Flags; } hipTexRefSetMipmappedArray;
    struct { hipStreamCaptureMode* mode; hipStreamCaptureMode mode__val; } hipThreadExchangeStreamCaptureMode;
    struct { hipUserObject_t* object_out; hipUserObject_t object_out__val; void* ptr; hipHostFn_t destroy; unsigned int initialRefcount; unsigned int flags; } hipUserObjectCreate;
    struct { hipUserObject_t object; unsigned int count; } hipUserObjectRelease;
    struct { hipUserObject_t object; unsigned int count; } hipUserObjectRetain;
    struct { const hipExternalSemaphore_t* extSemArray; hipExternalSemaphore_t extSemArray__val; const hipExternalSemaphoreWaitParams* paramsArray; hipExternalSemaphoreWaitParams paramsArray__val; unsigned int numExtSems; hipStream_t stream; } hipWaitExternalSemaphoresAsync;
  } args;
  uint64_t *phase_data;
} hip_api_data_t;
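// --- Illustrative usage (editor's sketch, not part of the generated header) ---
// A tool that receives a hip_api_data_t* in a HIP API callback picks the union
// member matching the callback's API id and reads the captured arguments. The
// sketch below assumes roctracer's C callback API (roctracer_enable_domain_callback,
// ACTIVITY_DOMAIN_HIP_API, HIP_API_ID_*) and is guarded out of compilation;
// adapt it to whatever tracing entry point you actually use.
#if 0 /* illustrative only */
#include <stdio.h>
#include <roctracer/roctracer_hip.h>

// Invoked for every traced HIP API call (both enter and exit phases).
static void hip_api_cb(uint32_t domain, uint32_t cid, const void* data, void* arg) {
  const hip_api_data_t* api_data = (const hip_api_data_t*)data;
  if (cid == HIP_API_ID_hipMalloc) {
    // 'size' is captured by value; 'ptr' is the caller's out-pointer.
    printf("hipMalloc(size=%zu)\n", api_data->args.hipMalloc.size);
  }
}

// Registration, typically once at tool load time:
//   roctracer_enable_domain_callback(ACTIVITY_DOMAIN_HIP_API, hip_api_cb, NULL);
#endif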
// HIP API callbacks args data filling macros
// __hipPopCallConfiguration[('dim3*', 'gridDim'), ('dim3*', 'blockDim'), ('size_t*', 'sharedMem'), ('hipStream_t*', 'stream')]
#define INIT___hipPopCallConfiguration_CB_ARGS_DATA(cb_data) { \
  cb_data.args.__hipPopCallConfiguration.gridDim = (dim3*)gridDim; \
  cb_data.args.__hipPopCallConfiguration.blockDim = (dim3*)blockDim; \
  cb_data.args.__hipPopCallConfiguration.sharedMem = (size_t*)sharedMem; \
  cb_data.args.__hipPopCallConfiguration.stream = (hipStream_t*)stream; \
};
// __hipPushCallConfiguration[('dim3', 'gridDim'), ('dim3', 'blockDim'), ('size_t', 'sharedMem'), ('hipStream_t', 'stream')]
#define INIT___hipPushCallConfiguration_CB_ARGS_DATA(cb_data) { \
  cb_data.args.__hipPushCallConfiguration.gridDim = (dim3)gridDim; \
  cb_data.args.__hipPushCallConfiguration.blockDim = (dim3)blockDim; \
  cb_data.args.__hipPushCallConfiguration.sharedMem = (size_t)sharedMem; \
  cb_data.args.__hipPushCallConfiguration.stream = (hipStream_t)stream; \
};
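// --- How the INIT_*_CB_ARGS_DATA macros are consumed (editor's sketch) ---
// Each macro is expanded inside the HIP runtime wrapper for its API, where the
// wrapper's formal parameters (gridDim, blockDim, ...) are in scope; the macro
// body simply snapshots those locals into the callback payload. That is also
// why some macros below reference names that differ from the documented
// argument name (e.g. 'properties' vs. 'prop', 'canAccess' vs. 'canAccessPeer'):
// they bind to whatever the implementation called its local parameter.
// Hypothetical expansion site (INIT_hipMalloc_CB_ARGS_DATA appears later in
// this header's alphabetical run; this sketch assumes it follows the pattern):
#if 0 /* illustrative only */
hipError_t hipMalloc(void** ptr, size_t size) {
  hip_api_data_t api_data;
  INIT_hipMalloc_CB_ARGS_DATA(api_data);  // captures 'ptr' and 'size' from this scope
  /* ... deliver api_data to registered callbacks, then run the real allocation ... */
  return hipSuccess;
}
#endif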
// hipArray3DCreate[('hipArray**', 'array'), ('const HIP_ARRAY3D_DESCRIPTOR*', 'pAllocateArray')]
#define INIT_hipArray3DCreate_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipArray3DCreate.array = (hipArray**)array; \
  cb_data.args.hipArray3DCreate.pAllocateArray = (const HIP_ARRAY3D_DESCRIPTOR*)pAllocateArray; \
};
// hipArray3DGetDescriptor[('HIP_ARRAY3D_DESCRIPTOR*', 'pArrayDescriptor'), ('hipArray*', 'array')]
#define INIT_hipArray3DGetDescriptor_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipArray3DGetDescriptor.pArrayDescriptor = (HIP_ARRAY3D_DESCRIPTOR*)pArrayDescriptor; \
  cb_data.args.hipArray3DGetDescriptor.array = (hipArray*)array; \
};
// hipArrayCreate[('hipArray**', 'pHandle'), ('const HIP_ARRAY_DESCRIPTOR*', 'pAllocateArray')]
#define INIT_hipArrayCreate_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipArrayCreate.pHandle = (hipArray**)array; \
  cb_data.args.hipArrayCreate.pAllocateArray = (const HIP_ARRAY_DESCRIPTOR*)pAllocateArray; \
};
// hipArrayDestroy[('hipArray*', 'array')]
#define INIT_hipArrayDestroy_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipArrayDestroy.array = (hipArray*)array; \
};
// hipArrayGetDescriptor[('HIP_ARRAY_DESCRIPTOR*', 'pArrayDescriptor'), ('hipArray*', 'array')]
#define INIT_hipArrayGetDescriptor_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipArrayGetDescriptor.pArrayDescriptor = (HIP_ARRAY_DESCRIPTOR*)pArrayDescriptor; \
  cb_data.args.hipArrayGetDescriptor.array = (hipArray*)array; \
};
// hipArrayGetInfo[('hipChannelFormatDesc*', 'desc'), ('hipExtent*', 'extent'), ('unsigned int*', 'flags'), ('hipArray*', 'array')]
#define INIT_hipArrayGetInfo_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipArrayGetInfo.desc = (hipChannelFormatDesc*)desc; \
  cb_data.args.hipArrayGetInfo.extent = (hipExtent*)extent; \
  cb_data.args.hipArrayGetInfo.flags = (unsigned int*)flags; \
  cb_data.args.hipArrayGetInfo.array = (hipArray*)array; \
};
// hipChooseDevice[('int*', 'device'), ('const hipDeviceProp_t*', 'prop')]
#define INIT_hipChooseDevice_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipChooseDevice.device = (int*)device; \
  cb_data.args.hipChooseDevice.prop = (const hipDeviceProp_t*)properties; \
};
// hipConfigureCall[('dim3', 'gridDim'), ('dim3', 'blockDim'), ('size_t', 'sharedMem'), ('hipStream_t', 'stream')]
#define INIT_hipConfigureCall_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipConfigureCall.gridDim = (dim3)gridDim; \
  cb_data.args.hipConfigureCall.blockDim = (dim3)blockDim; \
  cb_data.args.hipConfigureCall.sharedMem = (size_t)sharedMem; \
  cb_data.args.hipConfigureCall.stream = (hipStream_t)stream; \
};
// hipCreateSurfaceObject[('hipSurfaceObject_t*', 'pSurfObject'), ('const hipResourceDesc*', 'pResDesc')]
#define INIT_hipCreateSurfaceObject_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCreateSurfaceObject.pSurfObject = (hipSurfaceObject_t*)pSurfObject; \
  cb_data.args.hipCreateSurfaceObject.pResDesc = (const hipResourceDesc*)pResDesc; \
};
// hipCtxCreate[('hipCtx_t*', 'ctx'), ('unsigned int', 'flags'), ('hipDevice_t', 'device')]
#define INIT_hipCtxCreate_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxCreate.ctx = (hipCtx_t*)ctx; \
  cb_data.args.hipCtxCreate.flags = (unsigned int)flags; \
  cb_data.args.hipCtxCreate.device = (hipDevice_t)device; \
};
// hipCtxDestroy[('hipCtx_t', 'ctx')]
#define INIT_hipCtxDestroy_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxDestroy.ctx = (hipCtx_t)ctx; \
};
// hipCtxDisablePeerAccess[('hipCtx_t', 'peerCtx')]
#define INIT_hipCtxDisablePeerAccess_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxDisablePeerAccess.peerCtx = (hipCtx_t)peerCtx; \
};
// hipCtxEnablePeerAccess[('hipCtx_t', 'peerCtx'), ('unsigned int', 'flags')]
#define INIT_hipCtxEnablePeerAccess_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxEnablePeerAccess.peerCtx = (hipCtx_t)peerCtx; \
  cb_data.args.hipCtxEnablePeerAccess.flags = (unsigned int)flags; \
};
// hipCtxGetApiVersion[('hipCtx_t', 'ctx'), ('int*', 'apiVersion')]
#define INIT_hipCtxGetApiVersion_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxGetApiVersion.ctx = (hipCtx_t)ctx; \
  cb_data.args.hipCtxGetApiVersion.apiVersion = (int*)apiVersion; \
};
// hipCtxGetCacheConfig[('hipFuncCache_t*', 'cacheConfig')]
#define INIT_hipCtxGetCacheConfig_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxGetCacheConfig.cacheConfig = (hipFuncCache_t*)cacheConfig; \
};
// hipCtxGetCurrent[('hipCtx_t*', 'ctx')]
#define INIT_hipCtxGetCurrent_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxGetCurrent.ctx = (hipCtx_t*)ctx; \
};
// hipCtxGetDevice[('hipDevice_t*', 'device')]
#define INIT_hipCtxGetDevice_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxGetDevice.device = (hipDevice_t*)device; \
};
// hipCtxGetFlags[('unsigned int*', 'flags')]
#define INIT_hipCtxGetFlags_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxGetFlags.flags = (unsigned int*)flags; \
};
// hipCtxGetSharedMemConfig[('hipSharedMemConfig*', 'pConfig')]
#define INIT_hipCtxGetSharedMemConfig_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxGetSharedMemConfig.pConfig = (hipSharedMemConfig*)pConfig; \
};
// hipCtxPopCurrent[('hipCtx_t*', 'ctx')]
#define INIT_hipCtxPopCurrent_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxPopCurrent.ctx = (hipCtx_t*)ctx; \
};
// hipCtxPushCurrent[('hipCtx_t', 'ctx')]
#define INIT_hipCtxPushCurrent_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxPushCurrent.ctx = (hipCtx_t)ctx; \
};
// hipCtxSetCacheConfig[('hipFuncCache_t', 'cacheConfig')]
#define INIT_hipCtxSetCacheConfig_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxSetCacheConfig.cacheConfig = (hipFuncCache_t)cacheConfig; \
};
// hipCtxSetCurrent[('hipCtx_t', 'ctx')]
#define INIT_hipCtxSetCurrent_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxSetCurrent.ctx = (hipCtx_t)ctx; \
};
// hipCtxSetSharedMemConfig[('hipSharedMemConfig', 'config')]
#define INIT_hipCtxSetSharedMemConfig_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipCtxSetSharedMemConfig.config = (hipSharedMemConfig)config; \
};
// hipCtxSynchronize[]
#define INIT_hipCtxSynchronize_CB_ARGS_DATA(cb_data) { \
};
// hipDestroyExternalMemory[('hipExternalMemory_t', 'extMem')]
#define INIT_hipDestroyExternalMemory_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDestroyExternalMemory.extMem = (hipExternalMemory_t)extMem; \
};
// hipDestroyExternalSemaphore[('hipExternalSemaphore_t', 'extSem')]
#define INIT_hipDestroyExternalSemaphore_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDestroyExternalSemaphore.extSem = (hipExternalSemaphore_t)extSem; \
};
// hipDestroySurfaceObject[('hipSurfaceObject_t', 'surfaceObject')]
#define INIT_hipDestroySurfaceObject_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDestroySurfaceObject.surfaceObject = (hipSurfaceObject_t)surfaceObject; \
};
// hipDeviceCanAccessPeer[('int*', 'canAccessPeer'), ('int', 'deviceId'), ('int', 'peerDeviceId')]
#define INIT_hipDeviceCanAccessPeer_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceCanAccessPeer.canAccessPeer = (int*)canAccess; \
  cb_data.args.hipDeviceCanAccessPeer.deviceId = (int)deviceId; \
  cb_data.args.hipDeviceCanAccessPeer.peerDeviceId = (int)peerDeviceId; \
};
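// --- Reading out-parameters via the '__val' twins (editor's note + sketch) ---
// Every pointer argument in the args union is paired with a '__val' field of
// the pointee type. The INIT_* macros above only capture the raw pointer at
// API entry; the '__val' copy is presumably filled elsewhere once the call has
// returned, so an exit-phase callback can inspect results without chasing live
// pointers itself. Sketch, assuming the hipGetDevice union entry follows the
// same pointer/__val pairing and that the callback knows it runs at exit:
#if 0 /* illustrative only */
#include <stdio.h>

static void on_exit_phase(const hip_api_data_t* api_data, uint32_t cid) {
  if (cid == HIP_API_ID_hipGetDevice) {
    // deviceId__val holds the value written through the deviceId out-pointer.
    printf("hipGetDevice -> %d\n", api_data->args.hipGetDevice.deviceId__val);
  }
}
#endif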
// hipDeviceComputeCapability[('int*', 'major'), ('int*', 'minor'), ('hipDevice_t', 'device')]
#define INIT_hipDeviceComputeCapability_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceComputeCapability.major = (int*)major; \
  cb_data.args.hipDeviceComputeCapability.minor = (int*)minor; \
  cb_data.args.hipDeviceComputeCapability.device = (hipDevice_t)device; \
};
// hipDeviceDisablePeerAccess[('int', 'peerDeviceId')]
#define INIT_hipDeviceDisablePeerAccess_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceDisablePeerAccess.peerDeviceId = (int)peerDeviceId; \
};
// hipDeviceEnablePeerAccess[('int', 'peerDeviceId'), ('unsigned int', 'flags')]
#define INIT_hipDeviceEnablePeerAccess_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceEnablePeerAccess.peerDeviceId = (int)peerDeviceId; \
  cb_data.args.hipDeviceEnablePeerAccess.flags = (unsigned int)flags; \
};
// hipDeviceGet[('hipDevice_t*', 'device'), ('int', 'ordinal')]
#define INIT_hipDeviceGet_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGet.device = (hipDevice_t*)device; \
  cb_data.args.hipDeviceGet.ordinal = (int)deviceId; \
};
// hipDeviceGetAttribute[('int*', 'pi'), ('hipDeviceAttribute_t', 'attr'), ('int', 'deviceId')]
#define INIT_hipDeviceGetAttribute_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetAttribute.pi = (int*)pi; \
  cb_data.args.hipDeviceGetAttribute.attr = (hipDeviceAttribute_t)attr; \
  cb_data.args.hipDeviceGetAttribute.deviceId = (int)device; \
};
// hipDeviceGetByPCIBusId[('int*', 'device'), ('const char*', 'pciBusId')]
#define INIT_hipDeviceGetByPCIBusId_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetByPCIBusId.device = (int*)device; \
  cb_data.args.hipDeviceGetByPCIBusId.pciBusId = (pciBusIdstr) ? strdup(pciBusIdstr) : NULL; \
};
// hipDeviceGetCacheConfig[('hipFuncCache_t*', 'cacheConfig')]
#define INIT_hipDeviceGetCacheConfig_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetCacheConfig.cacheConfig = (hipFuncCache_t*)cacheConfig; \
};
// hipDeviceGetDefaultMemPool[('hipMemPool_t*', 'mem_pool'), ('int', 'device')]
#define INIT_hipDeviceGetDefaultMemPool_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetDefaultMemPool.mem_pool = (hipMemPool_t*)mem_pool; \
  cb_data.args.hipDeviceGetDefaultMemPool.device = (int)device; \
};
// hipDeviceGetGraphMemAttribute[('int', 'device'), ('hipGraphMemAttributeType', 'attr'), ('void*', 'value')]
#define INIT_hipDeviceGetGraphMemAttribute_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetGraphMemAttribute.device = (int)device; \
  cb_data.args.hipDeviceGetGraphMemAttribute.attr = (hipGraphMemAttributeType)attr; \
  cb_data.args.hipDeviceGetGraphMemAttribute.value = (void*)value; \
};
// hipDeviceGetLimit[('size_t*', 'pValue'), ('hipLimit_t', 'limit')]
#define INIT_hipDeviceGetLimit_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetLimit.pValue = (size_t*)pValue; \
  cb_data.args.hipDeviceGetLimit.limit = (hipLimit_t)limit; \
};
// hipDeviceGetMemPool[('hipMemPool_t*', 'mem_pool'), ('int', 'device')]
#define INIT_hipDeviceGetMemPool_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetMemPool.mem_pool = (hipMemPool_t*)mem_pool; \
  cb_data.args.hipDeviceGetMemPool.device = (int)device; \
};
// hipDeviceGetName[('char*', 'name'), ('int', 'len'), ('hipDevice_t', 'device')]
#define INIT_hipDeviceGetName_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetName.name = (char*)name; \
  cb_data.args.hipDeviceGetName.len = (int)len; \
  cb_data.args.hipDeviceGetName.device = (hipDevice_t)device; \
};
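// --- C-string arguments are deep-copied (editor's note + sketch) ---
// Unlike plain pointers, 'const char*' inputs such as pciBusId above are
// duplicated with strdup() when the payload is filled, so the snapshot stays
// valid even if the caller's buffer is gone by the time the exit callback
// runs. The flip side is ownership: whoever retires the hip_api_data_t must
// free the copy. Hedged sketch (this cleanup helper is hypothetical, not part
// of the generated header):
#if 0 /* illustrative only */
#include <stdlib.h>

static void release_api_data(hip_api_data_t* api_data, uint32_t cid) {
  if (cid == HIP_API_ID_hipDeviceGetByPCIBusId) {
    // Release the strdup()'d snapshot of the bus-id string.
    free((void*)api_data->args.hipDeviceGetByPCIBusId.pciBusId);
  }
}
#endif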
// hipDeviceGetP2PAttribute[('int*', 'value'), ('hipDeviceP2PAttr', 'attr'), ('int', 'srcDevice'), ('int', 'dstDevice')]
#define INIT_hipDeviceGetP2PAttribute_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetP2PAttribute.value = (int*)value; \
  cb_data.args.hipDeviceGetP2PAttribute.attr = (hipDeviceP2PAttr)attr; \
  cb_data.args.hipDeviceGetP2PAttribute.srcDevice = (int)srcDevice; \
  cb_data.args.hipDeviceGetP2PAttribute.dstDevice = (int)dstDevice; \
};
// hipDeviceGetPCIBusId[('char*', 'pciBusId'), ('int', 'len'), ('int', 'device')]
#define INIT_hipDeviceGetPCIBusId_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetPCIBusId.pciBusId = (char*)pciBusId; \
  cb_data.args.hipDeviceGetPCIBusId.len = (int)len; \
  cb_data.args.hipDeviceGetPCIBusId.device = (int)device; \
};
// hipDeviceGetSharedMemConfig[('hipSharedMemConfig*', 'pConfig')]
#define INIT_hipDeviceGetSharedMemConfig_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetSharedMemConfig.pConfig = (hipSharedMemConfig*)pConfig; \
};
// hipDeviceGetStreamPriorityRange[('int*', 'leastPriority'), ('int*', 'greatestPriority')]
#define INIT_hipDeviceGetStreamPriorityRange_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetStreamPriorityRange.leastPriority = (int*)leastPriority; \
  cb_data.args.hipDeviceGetStreamPriorityRange.greatestPriority = (int*)greatestPriority; \
};
// hipDeviceGetUuid[('hipUUID*', 'uuid'), ('hipDevice_t', 'device')]
#define INIT_hipDeviceGetUuid_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGetUuid.uuid = (hipUUID*)uuid; \
  cb_data.args.hipDeviceGetUuid.device = (hipDevice_t)device; \
};
// hipDeviceGraphMemTrim[('int', 'device')]
#define INIT_hipDeviceGraphMemTrim_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceGraphMemTrim.device = (int)device; \
};
// hipDevicePrimaryCtxGetState[('hipDevice_t', 'dev'), ('unsigned int*', 'flags'), ('int*', 'active')]
#define INIT_hipDevicePrimaryCtxGetState_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDevicePrimaryCtxGetState.dev = (hipDevice_t)dev; \
  cb_data.args.hipDevicePrimaryCtxGetState.flags = (unsigned int*)flags; \
  cb_data.args.hipDevicePrimaryCtxGetState.active = (int*)active; \
};
// hipDevicePrimaryCtxRelease[('hipDevice_t', 'dev')]
#define INIT_hipDevicePrimaryCtxRelease_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDevicePrimaryCtxRelease.dev = (hipDevice_t)dev; \
};
// hipDevicePrimaryCtxReset[('hipDevice_t', 'dev')]
#define INIT_hipDevicePrimaryCtxReset_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDevicePrimaryCtxReset.dev = (hipDevice_t)dev; \
};
// hipDevicePrimaryCtxRetain[('hipCtx_t*', 'pctx'), ('hipDevice_t', 'dev')]
#define INIT_hipDevicePrimaryCtxRetain_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDevicePrimaryCtxRetain.pctx = (hipCtx_t*)pctx; \
  cb_data.args.hipDevicePrimaryCtxRetain.dev = (hipDevice_t)dev; \
};
// hipDevicePrimaryCtxSetFlags[('hipDevice_t', 'dev'), ('unsigned int', 'flags')]
#define INIT_hipDevicePrimaryCtxSetFlags_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDevicePrimaryCtxSetFlags.dev = (hipDevice_t)dev; \
  cb_data.args.hipDevicePrimaryCtxSetFlags.flags = (unsigned int)flags; \
};
// hipDeviceReset[]
#define INIT_hipDeviceReset_CB_ARGS_DATA(cb_data) { \
};
// hipDeviceSetCacheConfig[('hipFuncCache_t', 'cacheConfig')]
#define INIT_hipDeviceSetCacheConfig_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceSetCacheConfig.cacheConfig = (hipFuncCache_t)cacheConfig; \
};
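// --- APIs with no arguments (editor's note + sketch) ---
// Zero-argument entry points such as hipDeviceReset above get an empty INIT
// macro: there is nothing to capture, but the callback still fires with the
// API id, which is enough for call counting or logging. Sketch, assuming an
// id-to-name helper such as hip_api_name() from this header family (or
// roctracer_op_string() as a fallback):
#if 0 /* illustrative only */
#include <stdio.h>

static void log_api(uint32_t cid) {
  // Maps a HIP_API_ID_* value to its printable name; args are not needed.
  printf("%s\n", hip_api_name(cid));
}
#endif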
// hipDeviceSetGraphMemAttribute[('int', 'device'), ('hipGraphMemAttributeType', 'attr'), ('void*', 'value')]
#define INIT_hipDeviceSetGraphMemAttribute_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceSetGraphMemAttribute.device = (int)device; \
  cb_data.args.hipDeviceSetGraphMemAttribute.attr = (hipGraphMemAttributeType)attr; \
  cb_data.args.hipDeviceSetGraphMemAttribute.value = (void*)value; \
};
// hipDeviceSetLimit[('hipLimit_t', 'limit'), ('size_t', 'value')]
#define INIT_hipDeviceSetLimit_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceSetLimit.limit = (hipLimit_t)limit; \
  cb_data.args.hipDeviceSetLimit.value = (size_t)value; \
};
// hipDeviceSetMemPool[('int', 'device'), ('hipMemPool_t', 'mem_pool')]
#define INIT_hipDeviceSetMemPool_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceSetMemPool.device = (int)device; \
  cb_data.args.hipDeviceSetMemPool.mem_pool = (hipMemPool_t)mem_pool; \
};
// hipDeviceSetSharedMemConfig[('hipSharedMemConfig', 'config')]
#define INIT_hipDeviceSetSharedMemConfig_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceSetSharedMemConfig.config = (hipSharedMemConfig)config; \
};
// hipDeviceSynchronize[]
#define INIT_hipDeviceSynchronize_CB_ARGS_DATA(cb_data) { \
};
// hipDeviceTotalMem[('size_t*', 'bytes'), ('hipDevice_t', 'device')]
#define INIT_hipDeviceTotalMem_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDeviceTotalMem.bytes = (size_t*)bytes; \
  cb_data.args.hipDeviceTotalMem.device = (hipDevice_t)device; \
};
// hipDriverGetVersion[('int*', 'driverVersion')]
#define INIT_hipDriverGetVersion_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDriverGetVersion.driverVersion = (int*)driverVersion; \
};
// hipDrvMemcpy2DUnaligned[('const hip_Memcpy2D*', 'pCopy')]
#define INIT_hipDrvMemcpy2DUnaligned_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDrvMemcpy2DUnaligned.pCopy = (const hip_Memcpy2D*)pCopy; \
};
// hipDrvMemcpy3D[('const HIP_MEMCPY3D*', 'pCopy')]
#define INIT_hipDrvMemcpy3D_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDrvMemcpy3D.pCopy = (const HIP_MEMCPY3D*)pCopy; \
};
// hipDrvMemcpy3DAsync[('const HIP_MEMCPY3D*', 'pCopy'), ('hipStream_t', 'stream')]
#define INIT_hipDrvMemcpy3DAsync_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDrvMemcpy3DAsync.pCopy = (const HIP_MEMCPY3D*)pCopy; \
  cb_data.args.hipDrvMemcpy3DAsync.stream = (hipStream_t)stream; \
};
// hipDrvPointerGetAttributes[('unsigned int', 'numAttributes'), ('hipPointer_attribute*', 'attributes'), ('void**', 'data'), ('hipDeviceptr_t', 'ptr')]
#define INIT_hipDrvPointerGetAttributes_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipDrvPointerGetAttributes.numAttributes = (unsigned int)numAttributes; \
  cb_data.args.hipDrvPointerGetAttributes.attributes = (hipPointer_attribute*)attributes; \
  cb_data.args.hipDrvPointerGetAttributes.data = (void**)data; \
  cb_data.args.hipDrvPointerGetAttributes.ptr = (hipDeviceptr_t)ptr; \
};
// hipEventCreate[('hipEvent_t*', 'event')]
#define INIT_hipEventCreate_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipEventCreate.event = (hipEvent_t*)event; \
};
// hipEventCreateWithFlags[('hipEvent_t*', 'event'), ('unsigned int', 'flags')]
#define INIT_hipEventCreateWithFlags_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipEventCreateWithFlags.event = (hipEvent_t*)event; \
  cb_data.args.hipEventCreateWithFlags.flags = (unsigned int)flags; \
};
// hipEventDestroy[('hipEvent_t', 'event')]
#define INIT_hipEventDestroy_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipEventDestroy.event = (hipEvent_t)event; \
};
// hipEventElapsedTime[('float*', 'ms'), ('hipEvent_t', 'start'), ('hipEvent_t', 'stop')]
#define INIT_hipEventElapsedTime_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipEventElapsedTime.ms = (float*)ms; \
  cb_data.args.hipEventElapsedTime.start = (hipEvent_t)start; \
  cb_data.args.hipEventElapsedTime.stop = (hipEvent_t)stop; \
};
// hipEventQuery[('hipEvent_t', 'event')]
#define INIT_hipEventQuery_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipEventQuery.event = (hipEvent_t)event; \
};
//
hipEventRecord[('hipEvent_t', 'event'), ('hipStream_t', 'stream')] #define INIT_hipEventRecord_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipEventRecord.event = (hipEvent_t)event; \ cb_data.args.hipEventRecord.stream = (hipStream_t)stream; \ }; // hipEventSynchronize[('hipEvent_t', 'event')] #define INIT_hipEventSynchronize_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipEventSynchronize.event = (hipEvent_t)event; \ }; // hipExtGetLinkTypeAndHopCount[('int', 'device1'), ('int', 'device2'), ('unsigned int*', 'linktype'), ('unsigned int*', 'hopcount')] #define INIT_hipExtGetLinkTypeAndHopCount_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipExtGetLinkTypeAndHopCount.device1 = (int)device1; \ cb_data.args.hipExtGetLinkTypeAndHopCount.device2 = (int)device2; \ cb_data.args.hipExtGetLinkTypeAndHopCount.linktype = (unsigned int*)linktype; \ cb_data.args.hipExtGetLinkTypeAndHopCount.hopcount = (unsigned int*)hopcount; \ }; // hipExtLaunchKernel[('const void*', 'function_address'), ('dim3', 'numBlocks'), ('dim3', 'dimBlocks'), ('void**', 'args'), ('size_t', 'sharedMemBytes'), ('hipStream_t', 'stream'), ('hipEvent_t', 'startEvent'), ('hipEvent_t', 'stopEvent'), ('int', 'flags')] #define INIT_hipExtLaunchKernel_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipExtLaunchKernel.function_address = (const void*)hostFunction; \ cb_data.args.hipExtLaunchKernel.numBlocks = (dim3)gridDim; \ cb_data.args.hipExtLaunchKernel.dimBlocks = (dim3)blockDim; \ cb_data.args.hipExtLaunchKernel.args = (void**)args; \ cb_data.args.hipExtLaunchKernel.sharedMemBytes = (size_t)sharedMemBytes; \ cb_data.args.hipExtLaunchKernel.stream = (hipStream_t)stream; \ cb_data.args.hipExtLaunchKernel.startEvent = (hipEvent_t)startEvent; \ cb_data.args.hipExtLaunchKernel.stopEvent = (hipEvent_t)stopEvent; \ cb_data.args.hipExtLaunchKernel.flags = (int)flags; \ }; // hipExtLaunchMultiKernelMultiDevice[('hipLaunchParams*', 'launchParamsList'), ('int', 'numDevices'), ('unsigned int', 'flags')] #define INIT_hipExtLaunchMultiKernelMultiDevice_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipExtLaunchMultiKernelMultiDevice.launchParamsList = (hipLaunchParams*)launchParamsList; \ cb_data.args.hipExtLaunchMultiKernelMultiDevice.numDevices = (int)numDevices; \ cb_data.args.hipExtLaunchMultiKernelMultiDevice.flags = (unsigned int)flags; \ }; // hipExtMallocWithFlags[('void**', 'ptr'), ('size_t', 'sizeBytes'), ('unsigned int', 'flags')] #define INIT_hipExtMallocWithFlags_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipExtMallocWithFlags.ptr = (void**)ptr; \ cb_data.args.hipExtMallocWithFlags.sizeBytes = (size_t)sizeBytes; \ cb_data.args.hipExtMallocWithFlags.flags = (unsigned int)flags; \ }; // hipExtModuleLaunchKernel[('hipFunction_t', 'f'), ('unsigned int', 'globalWorkSizeX'), ('unsigned int', 'globalWorkSizeY'), ('unsigned int', 'globalWorkSizeZ'), ('unsigned int', 'localWorkSizeX'), ('unsigned int', 'localWorkSizeY'), ('unsigned int', 'localWorkSizeZ'), ('size_t', 'sharedMemBytes'), ('hipStream_t', 'hStream'), ('void**', 'kernelParams'), ('void**', 'extra'), ('hipEvent_t', 'startEvent'), ('hipEvent_t', 'stopEvent'), ('unsigned int', 'flags')] #define INIT_hipExtModuleLaunchKernel_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipExtModuleLaunchKernel.f = (hipFunction_t)f; \ cb_data.args.hipExtModuleLaunchKernel.globalWorkSizeX = (unsigned int)globalWorkSizeX; \ cb_data.args.hipExtModuleLaunchKernel.globalWorkSizeY = (unsigned int)globalWorkSizeY; \ cb_data.args.hipExtModuleLaunchKernel.globalWorkSizeZ = (unsigned int)globalWorkSizeZ; \ 
cb_data.args.hipExtModuleLaunchKernel.localWorkSizeX = (unsigned int)localWorkSizeX; \ cb_data.args.hipExtModuleLaunchKernel.localWorkSizeY = (unsigned int)localWorkSizeY; \ cb_data.args.hipExtModuleLaunchKernel.localWorkSizeZ = (unsigned int)localWorkSizeZ; \ cb_data.args.hipExtModuleLaunchKernel.sharedMemBytes = (size_t)sharedMemBytes; \ cb_data.args.hipExtModuleLaunchKernel.hStream = (hipStream_t)hStream; \ cb_data.args.hipExtModuleLaunchKernel.kernelParams = (void**)kernelParams; \ cb_data.args.hipExtModuleLaunchKernel.extra = (void**)extra; \ cb_data.args.hipExtModuleLaunchKernel.startEvent = (hipEvent_t)startEvent; \ cb_data.args.hipExtModuleLaunchKernel.stopEvent = (hipEvent_t)stopEvent; \ cb_data.args.hipExtModuleLaunchKernel.flags = (unsigned int)flags; \ }; // hipExtStreamCreateWithCUMask[('hipStream_t*', 'stream'), ('unsigned int', 'cuMaskSize'), ('const unsigned int*', 'cuMask')] #define INIT_hipExtStreamCreateWithCUMask_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipExtStreamCreateWithCUMask.stream = (hipStream_t*)stream; \ cb_data.args.hipExtStreamCreateWithCUMask.cuMaskSize = (unsigned int)cuMaskSize; \ cb_data.args.hipExtStreamCreateWithCUMask.cuMask = (const unsigned int*)cuMask; \ }; // hipExtStreamGetCUMask[('hipStream_t', 'stream'), ('unsigned int', 'cuMaskSize'), ('unsigned int*', 'cuMask')] #define INIT_hipExtStreamGetCUMask_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipExtStreamGetCUMask.stream = (hipStream_t)stream; \ cb_data.args.hipExtStreamGetCUMask.cuMaskSize = (unsigned int)cuMaskSize; \ cb_data.args.hipExtStreamGetCUMask.cuMask = (unsigned int*)cuMask; \ }; // hipExternalMemoryGetMappedBuffer[('void**', 'devPtr'), ('hipExternalMemory_t', 'extMem'), ('const hipExternalMemoryBufferDesc*', 'bufferDesc')] #define INIT_hipExternalMemoryGetMappedBuffer_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipExternalMemoryGetMappedBuffer.devPtr = (void**)devPtr; \ cb_data.args.hipExternalMemoryGetMappedBuffer.extMem = (hipExternalMemory_t)extMem; \ cb_data.args.hipExternalMemoryGetMappedBuffer.bufferDesc = (const hipExternalMemoryBufferDesc*)bufferDesc; \ }; // hipFree[('void*', 'ptr')] #define INIT_hipFree_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipFree.ptr = (void*)ptr; \ }; // hipFreeArray[('hipArray*', 'array')] #define INIT_hipFreeArray_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipFreeArray.array = (hipArray*)array; \ }; // hipFreeAsync[('void*', 'dev_ptr'), ('hipStream_t', 'stream')] #define INIT_hipFreeAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipFreeAsync.dev_ptr = (void*)dev_ptr; \ cb_data.args.hipFreeAsync.stream = (hipStream_t)stream; \ }; // hipFreeHost[('void*', 'ptr')] #define INIT_hipFreeHost_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipFreeHost.ptr = (void*)ptr; \ }; // hipFreeMipmappedArray[('hipMipmappedArray_t', 'mipmappedArray')] #define INIT_hipFreeMipmappedArray_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipFreeMipmappedArray.mipmappedArray = (hipMipmappedArray_t)mipmappedArray; \ }; // hipFuncGetAttribute[('int*', 'value'), ('hipFunction_attribute', 'attrib'), ('hipFunction_t', 'hfunc')] #define INIT_hipFuncGetAttribute_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipFuncGetAttribute.value = (int*)value; \ cb_data.args.hipFuncGetAttribute.attrib = (hipFunction_attribute)attrib; \ cb_data.args.hipFuncGetAttribute.hfunc = (hipFunction_t)hfunc; \ }; // hipFuncGetAttributes[('hipFuncAttributes*', 'attr'), ('const void*', 'func')] #define INIT_hipFuncGetAttributes_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipFuncGetAttributes.attr = (hipFuncAttributes*)attr; \ 
cb_data.args.hipFuncGetAttributes.func = (const void*)func; \ }; // hipFuncSetAttribute[('const void*', 'func'), ('hipFuncAttribute', 'attr'), ('int', 'value')] #define INIT_hipFuncSetAttribute_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipFuncSetAttribute.func = (const void*)func; \ cb_data.args.hipFuncSetAttribute.attr = (hipFuncAttribute)attr; \ cb_data.args.hipFuncSetAttribute.value = (int)value; \ }; // hipFuncSetCacheConfig[('const void*', 'func'), ('hipFuncCache_t', 'config')] #define INIT_hipFuncSetCacheConfig_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipFuncSetCacheConfig.func = (const void*)func; \ cb_data.args.hipFuncSetCacheConfig.config = (hipFuncCache_t)cacheConfig; \ }; // hipFuncSetSharedMemConfig[('const void*', 'func'), ('hipSharedMemConfig', 'config')] #define INIT_hipFuncSetSharedMemConfig_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipFuncSetSharedMemConfig.func = (const void*)func; \ cb_data.args.hipFuncSetSharedMemConfig.config = (hipSharedMemConfig)config; \ }; // hipGLGetDevices[('unsigned int*', 'pHipDeviceCount'), ('int*', 'pHipDevices'), ('unsigned int', 'hipDeviceCount'), ('hipGLDeviceList', 'deviceList')] #define INIT_hipGLGetDevices_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGLGetDevices.pHipDeviceCount = (unsigned int*)pHipDeviceCount; \ cb_data.args.hipGLGetDevices.pHipDevices = (int*)pHipDevices; \ cb_data.args.hipGLGetDevices.hipDeviceCount = (unsigned int)hipDeviceCount; \ cb_data.args.hipGLGetDevices.deviceList = (hipGLDeviceList)deviceList; \ }; // hipGetChannelDesc[('hipChannelFormatDesc*', 'desc'), ('hipArray_const_t', 'array')] #define INIT_hipGetChannelDesc_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGetChannelDesc.desc = (hipChannelFormatDesc*)desc; \ cb_data.args.hipGetChannelDesc.array = (hipArray_const_t)array; \ }; // hipGetDevice[('int*', 'deviceId')] #define INIT_hipGetDevice_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGetDevice.deviceId = (int*)deviceId; \ }; // hipGetDeviceCount[('int*', 'count')] #define INIT_hipGetDeviceCount_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGetDeviceCount.count = (int*)count; \ }; // hipGetDeviceFlags[('unsigned int*', 'flags')] #define INIT_hipGetDeviceFlags_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGetDeviceFlags.flags = (unsigned int*)flags; \ }; // hipGetDeviceProperties[('hipDeviceProp_t*', 'props'), ('hipDevice_t', 'device')] #define INIT_hipGetDeviceProperties_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGetDeviceProperties.props = (hipDeviceProp_t*)props; \ cb_data.args.hipGetDeviceProperties.device = (hipDevice_t)device; \ }; // hipGetErrorString[] #define INIT_hipGetErrorString_CB_ARGS_DATA(cb_data) { \ }; // hipGetLastError[] #define INIT_hipGetLastError_CB_ARGS_DATA(cb_data) { \ }; // hipGetMipmappedArrayLevel[('hipArray_t*', 'levelArray'), ('hipMipmappedArray_const_t', 'mipmappedArray'), ('unsigned int', 'level')] #define INIT_hipGetMipmappedArrayLevel_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGetMipmappedArrayLevel.levelArray = (hipArray_t*)levelArray; \ cb_data.args.hipGetMipmappedArrayLevel.mipmappedArray = (hipMipmappedArray_const_t)mipmappedArray; \ cb_data.args.hipGetMipmappedArrayLevel.level = (unsigned int)level; \ }; // hipGetSymbolAddress[('void**', 'devPtr'), ('const void*', 'symbol')] #define INIT_hipGetSymbolAddress_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGetSymbolAddress.devPtr = (void**)devPtr; \ cb_data.args.hipGetSymbolAddress.symbol = (const void*)symbol; \ }; // hipGetSymbolSize[('size_t*', 'size'), ('const void*', 'symbol')] #define INIT_hipGetSymbolSize_CB_ARGS_DATA(cb_data) { \ 
cb_data.args.hipGetSymbolSize.size = (size_t*)sizePtr; \ cb_data.args.hipGetSymbolSize.symbol = (const void*)symbol; \ }; // hipGraphAddChildGraphNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('hipGraph_t', 'childGraph')] #define INIT_hipGraphAddChildGraphNode_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddChildGraphNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddChildGraphNode.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddChildGraphNode.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddChildGraphNode.numDependencies = (size_t)numDependencies; \ cb_data.args.hipGraphAddChildGraphNode.childGraph = (hipGraph_t)childGraph; \ }; // hipGraphAddDependencies[('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'from'), ('const hipGraphNode_t*', 'to'), ('size_t', 'numDependencies')] #define INIT_hipGraphAddDependencies_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddDependencies.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddDependencies.from = (const hipGraphNode_t*)from; \ cb_data.args.hipGraphAddDependencies.to = (const hipGraphNode_t*)to; \ cb_data.args.hipGraphAddDependencies.numDependencies = (size_t)numDependencies; \ }; // hipGraphAddEmptyNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies')] #define INIT_hipGraphAddEmptyNode_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddEmptyNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddEmptyNode.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddEmptyNode.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddEmptyNode.numDependencies = (size_t)numDependencies; \ }; // hipGraphAddEventRecordNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('hipEvent_t', 'event')] #define INIT_hipGraphAddEventRecordNode_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddEventRecordNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddEventRecordNode.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddEventRecordNode.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddEventRecordNode.numDependencies = (size_t)numDependencies; \ cb_data.args.hipGraphAddEventRecordNode.event = (hipEvent_t)event; \ }; // hipGraphAddEventWaitNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('hipEvent_t', 'event')] #define INIT_hipGraphAddEventWaitNode_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddEventWaitNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddEventWaitNode.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddEventWaitNode.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddEventWaitNode.numDependencies = (size_t)numDependencies; \ cb_data.args.hipGraphAddEventWaitNode.event = (hipEvent_t)event; \ }; // hipGraphAddHostNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const hipHostNodeParams*', 'pNodeParams')] #define INIT_hipGraphAddHostNode_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddHostNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddHostNode.graph = (hipGraph_t)graph; \ 
cb_data.args.hipGraphAddHostNode.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddHostNode.numDependencies = (size_t)numDependencies; \ cb_data.args.hipGraphAddHostNode.pNodeParams = (const hipHostNodeParams*)pNodeParams; \ }; // hipGraphAddKernelNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const hipKernelNodeParams*', 'pNodeParams')] #define INIT_hipGraphAddKernelNode_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddKernelNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddKernelNode.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddKernelNode.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddKernelNode.numDependencies = (size_t)numDependencies; \ cb_data.args.hipGraphAddKernelNode.pNodeParams = (const hipKernelNodeParams*)pNodeParams; \ }; // hipGraphAddMemAllocNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('hipMemAllocNodeParams*', 'pNodeParams')] #define INIT_hipGraphAddMemAllocNode_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddMemAllocNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddMemAllocNode.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddMemAllocNode.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddMemAllocNode.numDependencies = (size_t)numDependencies; \ cb_data.args.hipGraphAddMemAllocNode.pNodeParams = (hipMemAllocNodeParams*)pNodeParams; \ }; // hipGraphAddMemFreeNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('void*', 'dev_ptr')] #define INIT_hipGraphAddMemFreeNode_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddMemFreeNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddMemFreeNode.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddMemFreeNode.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddMemFreeNode.numDependencies = (size_t)numDependencies; \ cb_data.args.hipGraphAddMemFreeNode.dev_ptr = (void*)dev_ptr; \ }; // hipGraphAddMemcpyNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const hipMemcpy3DParms*', 'pCopyParams')] #define INIT_hipGraphAddMemcpyNode_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddMemcpyNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddMemcpyNode.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddMemcpyNode.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddMemcpyNode.numDependencies = (size_t)numDependencies; \ cb_data.args.hipGraphAddMemcpyNode.pCopyParams = (const hipMemcpy3DParms*)pCopyParams; \ }; // hipGraphAddMemcpyNode1D[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('void*', 'dst'), ('const void*', 'src'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')] #define INIT_hipGraphAddMemcpyNode1D_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddMemcpyNode1D.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddMemcpyNode1D.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddMemcpyNode1D.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddMemcpyNode1D.numDependencies = (size_t)numDependencies; 
\ cb_data.args.hipGraphAddMemcpyNode1D.dst = (void*)dst; \ cb_data.args.hipGraphAddMemcpyNode1D.src = (const void*)src; \ cb_data.args.hipGraphAddMemcpyNode1D.count = (size_t)count; \ cb_data.args.hipGraphAddMemcpyNode1D.kind = (hipMemcpyKind)kind; \ }; // hipGraphAddMemcpyNodeFromSymbol[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')] #define INIT_hipGraphAddMemcpyNodeFromSymbol_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddMemcpyNodeFromSymbol.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddMemcpyNodeFromSymbol.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddMemcpyNodeFromSymbol.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddMemcpyNodeFromSymbol.numDependencies = (size_t)numDependencies; \ cb_data.args.hipGraphAddMemcpyNodeFromSymbol.dst = (void*)dst; \ cb_data.args.hipGraphAddMemcpyNodeFromSymbol.symbol = (const void*)symbol; \ cb_data.args.hipGraphAddMemcpyNodeFromSymbol.count = (size_t)count; \ cb_data.args.hipGraphAddMemcpyNodeFromSymbol.offset = (size_t)offset; \ cb_data.args.hipGraphAddMemcpyNodeFromSymbol.kind = (hipMemcpyKind)kind; \ }; // hipGraphAddMemcpyNodeToSymbol[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')] #define INIT_hipGraphAddMemcpyNodeToSymbol_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddMemcpyNodeToSymbol.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddMemcpyNodeToSymbol.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddMemcpyNodeToSymbol.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddMemcpyNodeToSymbol.numDependencies = (size_t)numDependencies; \ cb_data.args.hipGraphAddMemcpyNodeToSymbol.symbol = (const void*)symbol; \ cb_data.args.hipGraphAddMemcpyNodeToSymbol.src = (const void*)src; \ cb_data.args.hipGraphAddMemcpyNodeToSymbol.count = (size_t)count; \ cb_data.args.hipGraphAddMemcpyNodeToSymbol.offset = (size_t)offset; \ cb_data.args.hipGraphAddMemcpyNodeToSymbol.kind = (hipMemcpyKind)kind; \ }; // hipGraphAddMemsetNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const hipMemsetParams*', 'pMemsetParams')] #define INIT_hipGraphAddMemsetNode_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphAddMemsetNode.pGraphNode = (hipGraphNode_t*)pGraphNode; \ cb_data.args.hipGraphAddMemsetNode.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphAddMemsetNode.pDependencies = (const hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphAddMemsetNode.numDependencies = (size_t)numDependencies; \ cb_data.args.hipGraphAddMemsetNode.pMemsetParams = (const hipMemsetParams*)pMemsetParams; \ }; // hipGraphChildGraphNodeGetGraph[('hipGraphNode_t', 'node'), ('hipGraph_t*', 'pGraph')] #define INIT_hipGraphChildGraphNodeGetGraph_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphChildGraphNodeGetGraph.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphChildGraphNodeGetGraph.pGraph = (hipGraph_t*)pGraph; \ }; // hipGraphClone[('hipGraph_t*', 'pGraphClone'), ('hipGraph_t', 'originalGraph')] #define INIT_hipGraphClone_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphClone.pGraphClone = 
(hipGraph_t*)pGraphClone; \ cb_data.args.hipGraphClone.originalGraph = (hipGraph_t)originalGraph; \ }; // hipGraphCreate[('hipGraph_t*', 'pGraph'), ('unsigned int', 'flags')] #define INIT_hipGraphCreate_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphCreate.pGraph = (hipGraph_t*)pGraph; \ cb_data.args.hipGraphCreate.flags = (unsigned int)flags; \ }; // hipGraphDebugDotPrint[('hipGraph_t', 'graph'), ('const char*', 'path'), ('unsigned int', 'flags')] #define INIT_hipGraphDebugDotPrint_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphDebugDotPrint.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphDebugDotPrint.path = (path) ? strdup(path) : NULL; \ cb_data.args.hipGraphDebugDotPrint.flags = (unsigned int)flags; \ }; // hipGraphDestroy[('hipGraph_t', 'graph')] #define INIT_hipGraphDestroy_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphDestroy.graph = (hipGraph_t)graph; \ }; // hipGraphDestroyNode[('hipGraphNode_t', 'node')] #define INIT_hipGraphDestroyNode_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphDestroyNode.node = (hipGraphNode_t)node; \ }; // hipGraphEventRecordNodeGetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t*', 'event_out')] #define INIT_hipGraphEventRecordNodeGetEvent_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphEventRecordNodeGetEvent.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphEventRecordNodeGetEvent.event_out = (hipEvent_t*)event_out; \ }; // hipGraphEventRecordNodeSetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t', 'event')] #define INIT_hipGraphEventRecordNodeSetEvent_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphEventRecordNodeSetEvent.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphEventRecordNodeSetEvent.event = (hipEvent_t)event; \ }; // hipGraphEventWaitNodeGetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t*', 'event_out')] #define INIT_hipGraphEventWaitNodeGetEvent_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphEventWaitNodeGetEvent.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphEventWaitNodeGetEvent.event_out = (hipEvent_t*)event_out; \ }; // hipGraphEventWaitNodeSetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t', 'event')] #define INIT_hipGraphEventWaitNodeSetEvent_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphEventWaitNodeSetEvent.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphEventWaitNodeSetEvent.event = (hipEvent_t)event; \ }; // hipGraphExecChildGraphNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('hipGraph_t', 'childGraph')] #define INIT_hipGraphExecChildGraphNodeSetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecChildGraphNodeSetParams.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphExecChildGraphNodeSetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphExecChildGraphNodeSetParams.childGraph = (hipGraph_t)childGraph; \ }; // hipGraphExecDestroy[('hipGraphExec_t', 'graphExec')] #define INIT_hipGraphExecDestroy_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecDestroy.graphExec = (hipGraphExec_t)pGraphExec; \ }; // hipGraphExecEventRecordNodeSetEvent[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'hNode'), ('hipEvent_t', 'event')] #define INIT_hipGraphExecEventRecordNodeSetEvent_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecEventRecordNodeSetEvent.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphExecEventRecordNodeSetEvent.hNode = (hipGraphNode_t)hNode; \ cb_data.args.hipGraphExecEventRecordNodeSetEvent.event = (hipEvent_t)event; \ }; // hipGraphExecEventWaitNodeSetEvent[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'hNode'), ('hipEvent_t', 
'event')] #define INIT_hipGraphExecEventWaitNodeSetEvent_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecEventWaitNodeSetEvent.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphExecEventWaitNodeSetEvent.hNode = (hipGraphNode_t)hNode; \ cb_data.args.hipGraphExecEventWaitNodeSetEvent.event = (hipEvent_t)event; \ }; // hipGraphExecHostNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const hipHostNodeParams*', 'pNodeParams')] #define INIT_hipGraphExecHostNodeSetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecHostNodeSetParams.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphExecHostNodeSetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphExecHostNodeSetParams.pNodeParams = (const hipHostNodeParams*)pNodeParams; \ }; // hipGraphExecKernelNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const hipKernelNodeParams*', 'pNodeParams')] #define INIT_hipGraphExecKernelNodeSetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecKernelNodeSetParams.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphExecKernelNodeSetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphExecKernelNodeSetParams.pNodeParams = (const hipKernelNodeParams*)pNodeParams; \ }; // hipGraphExecMemcpyNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('hipMemcpy3DParms*', 'pNodeParams')] #define INIT_hipGraphExecMemcpyNodeSetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecMemcpyNodeSetParams.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphExecMemcpyNodeSetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphExecMemcpyNodeSetParams.pNodeParams = (hipMemcpy3DParms*)pNodeParams; \ }; // hipGraphExecMemcpyNodeSetParams1D[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'src'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')] #define INIT_hipGraphExecMemcpyNodeSetParams1D_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecMemcpyNodeSetParams1D.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphExecMemcpyNodeSetParams1D.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphExecMemcpyNodeSetParams1D.dst = (void*)dst; \ cb_data.args.hipGraphExecMemcpyNodeSetParams1D.src = (const void*)src; \ cb_data.args.hipGraphExecMemcpyNodeSetParams1D.count = (size_t)count; \ cb_data.args.hipGraphExecMemcpyNodeSetParams1D.kind = (hipMemcpyKind)kind; \ }; // hipGraphExecMemcpyNodeSetParamsFromSymbol[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')] #define INIT_hipGraphExecMemcpyNodeSetParamsFromSymbol_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.dst = (void*)dst; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.symbol = (const void*)symbol; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.count = (size_t)count; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.offset = (size_t)offset; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsFromSymbol.kind = (hipMemcpyKind)kind; \ }; // hipGraphExecMemcpyNodeSetParamsToSymbol[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'count'), 
('size_t', 'offset'), ('hipMemcpyKind', 'kind')] #define INIT_hipGraphExecMemcpyNodeSetParamsToSymbol_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.symbol = (const void*)symbol; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.src = (const void*)src; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.count = (size_t)count; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.offset = (size_t)offset; \ cb_data.args.hipGraphExecMemcpyNodeSetParamsToSymbol.kind = (hipMemcpyKind)kind; \ }; // hipGraphExecMemsetNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const hipMemsetParams*', 'pNodeParams')] #define INIT_hipGraphExecMemsetNodeSetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecMemsetNodeSetParams.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphExecMemsetNodeSetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphExecMemsetNodeSetParams.pNodeParams = (const hipMemsetParams*)pNodeParams; \ }; // hipGraphExecUpdate[('hipGraphExec_t', 'hGraphExec'), ('hipGraph_t', 'hGraph'), ('hipGraphNode_t*', 'hErrorNode_out'), ('hipGraphExecUpdateResult*', 'updateResult_out')] #define INIT_hipGraphExecUpdate_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphExecUpdate.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphExecUpdate.hGraph = (hipGraph_t)hGraph; \ cb_data.args.hipGraphExecUpdate.hErrorNode_out = (hipGraphNode_t*)hErrorNode_out; \ cb_data.args.hipGraphExecUpdate.updateResult_out = (hipGraphExecUpdateResult*)updateResult_out; \ }; // hipGraphGetEdges[('hipGraph_t', 'graph'), ('hipGraphNode_t*', 'from'), ('hipGraphNode_t*', 'to'), ('size_t*', 'numEdges')] #define INIT_hipGraphGetEdges_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphGetEdges.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphGetEdges.from = (hipGraphNode_t*)from; \ cb_data.args.hipGraphGetEdges.to = (hipGraphNode_t*)to; \ cb_data.args.hipGraphGetEdges.numEdges = (size_t*)numEdges; \ }; // hipGraphGetNodes[('hipGraph_t', 'graph'), ('hipGraphNode_t*', 'nodes'), ('size_t*', 'numNodes')] #define INIT_hipGraphGetNodes_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphGetNodes.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphGetNodes.nodes = (hipGraphNode_t*)nodes; \ cb_data.args.hipGraphGetNodes.numNodes = (size_t*)numNodes; \ }; // hipGraphGetRootNodes[('hipGraph_t', 'graph'), ('hipGraphNode_t*', 'pRootNodes'), ('size_t*', 'pNumRootNodes')] #define INIT_hipGraphGetRootNodes_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphGetRootNodes.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphGetRootNodes.pRootNodes = (hipGraphNode_t*)pRootNodes; \ cb_data.args.hipGraphGetRootNodes.pNumRootNodes = (size_t*)pNumRootNodes; \ }; // hipGraphHostNodeGetParams[('hipGraphNode_t', 'node'), ('hipHostNodeParams*', 'pNodeParams')] #define INIT_hipGraphHostNodeGetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphHostNodeGetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphHostNodeGetParams.pNodeParams = (hipHostNodeParams*)pNodeParams; \ }; // hipGraphHostNodeSetParams[('hipGraphNode_t', 'node'), ('const hipHostNodeParams*', 'pNodeParams')] #define INIT_hipGraphHostNodeSetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphHostNodeSetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphHostNodeSetParams.pNodeParams = (const 
hipHostNodeParams*)pNodeParams; \ }; // hipGraphInstantiate[('hipGraphExec_t*', 'pGraphExec'), ('hipGraph_t', 'graph'), ('hipGraphNode_t*', 'pErrorNode'), ('char*', 'pLogBuffer'), ('size_t', 'bufferSize')] #define INIT_hipGraphInstantiate_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphInstantiate.pGraphExec = (hipGraphExec_t*)pGraphExec; \ cb_data.args.hipGraphInstantiate.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphInstantiate.pErrorNode = (hipGraphNode_t*)pErrorNode; \ cb_data.args.hipGraphInstantiate.pLogBuffer = (char*)pLogBuffer; \ cb_data.args.hipGraphInstantiate.bufferSize = (size_t)bufferSize; \ }; // hipGraphInstantiateWithFlags[('hipGraphExec_t*', 'pGraphExec'), ('hipGraph_t', 'graph'), ('unsigned long long', 'flags')] #define INIT_hipGraphInstantiateWithFlags_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphInstantiateWithFlags.pGraphExec = (hipGraphExec_t*)pGraphExec; \ cb_data.args.hipGraphInstantiateWithFlags.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphInstantiateWithFlags.flags = (unsigned long long)flags; \ }; // hipGraphKernelNodeCopyAttributes[('hipGraphNode_t', 'hSrc'), ('hipGraphNode_t', 'hDst')] #define INIT_hipGraphKernelNodeCopyAttributes_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphKernelNodeCopyAttributes.hSrc = (hipGraphNode_t)hSrc; \ cb_data.args.hipGraphKernelNodeCopyAttributes.hDst = (hipGraphNode_t)hDst; \ }; // hipGraphKernelNodeGetAttribute[('hipGraphNode_t', 'hNode'), ('hipKernelNodeAttrID', 'attr'), ('hipKernelNodeAttrValue*', 'value')] #define INIT_hipGraphKernelNodeGetAttribute_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphKernelNodeGetAttribute.hNode = (hipGraphNode_t)hNode; \ cb_data.args.hipGraphKernelNodeGetAttribute.attr = (hipKernelNodeAttrID)attr; \ cb_data.args.hipGraphKernelNodeGetAttribute.value = (hipKernelNodeAttrValue*)value; \ }; // hipGraphKernelNodeGetParams[('hipGraphNode_t', 'node'), ('hipKernelNodeParams*', 'pNodeParams')] #define INIT_hipGraphKernelNodeGetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphKernelNodeGetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphKernelNodeGetParams.pNodeParams = (hipKernelNodeParams*)pNodeParams; \ }; // hipGraphKernelNodeSetAttribute[('hipGraphNode_t', 'hNode'), ('hipKernelNodeAttrID', 'attr'), ('const hipKernelNodeAttrValue*', 'value')] #define INIT_hipGraphKernelNodeSetAttribute_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphKernelNodeSetAttribute.hNode = (hipGraphNode_t)hNode; \ cb_data.args.hipGraphKernelNodeSetAttribute.attr = (hipKernelNodeAttrID)attr; \ cb_data.args.hipGraphKernelNodeSetAttribute.value = (const hipKernelNodeAttrValue*)value; \ }; // hipGraphKernelNodeSetParams[('hipGraphNode_t', 'node'), ('const hipKernelNodeParams*', 'pNodeParams')] #define INIT_hipGraphKernelNodeSetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphKernelNodeSetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphKernelNodeSetParams.pNodeParams = (const hipKernelNodeParams*)pNodeParams; \ }; // hipGraphLaunch[('hipGraphExec_t', 'graphExec'), ('hipStream_t', 'stream')] #define INIT_hipGraphLaunch_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphLaunch.graphExec = (hipGraphExec_t)graphExec; \ cb_data.args.hipGraphLaunch.stream = (hipStream_t)stream; \ }; // hipGraphMemAllocNodeGetParams[('hipGraphNode_t', 'node'), ('hipMemAllocNodeParams*', 'pNodeParams')] #define INIT_hipGraphMemAllocNodeGetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphMemAllocNodeGetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphMemAllocNodeGetParams.pNodeParams = 
(hipMemAllocNodeParams*)pNodeParams; \ }; // hipGraphMemFreeNodeGetParams[('hipGraphNode_t', 'node'), ('void*', 'dev_ptr')] #define INIT_hipGraphMemFreeNodeGetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphMemFreeNodeGetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphMemFreeNodeGetParams.dev_ptr = (void*)dev_ptr; \ }; // hipGraphMemcpyNodeGetParams[('hipGraphNode_t', 'node'), ('hipMemcpy3DParms*', 'pNodeParams')] #define INIT_hipGraphMemcpyNodeGetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphMemcpyNodeGetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphMemcpyNodeGetParams.pNodeParams = (hipMemcpy3DParms*)pNodeParams; \ }; // hipGraphMemcpyNodeSetParams[('hipGraphNode_t', 'node'), ('const hipMemcpy3DParms*', 'pNodeParams')] #define INIT_hipGraphMemcpyNodeSetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphMemcpyNodeSetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphMemcpyNodeSetParams.pNodeParams = (const hipMemcpy3DParms*)pNodeParams; \ }; // hipGraphMemcpyNodeSetParams1D[('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'src'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')] #define INIT_hipGraphMemcpyNodeSetParams1D_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphMemcpyNodeSetParams1D.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphMemcpyNodeSetParams1D.dst = (void*)dst; \ cb_data.args.hipGraphMemcpyNodeSetParams1D.src = (const void*)src; \ cb_data.args.hipGraphMemcpyNodeSetParams1D.count = (size_t)count; \ cb_data.args.hipGraphMemcpyNodeSetParams1D.kind = (hipMemcpyKind)kind; \ }; // hipGraphMemcpyNodeSetParamsFromSymbol[('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')] #define INIT_hipGraphMemcpyNodeSetParamsFromSymbol_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.dst = (void*)dst; \ cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.symbol = (const void*)symbol; \ cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.count = (size_t)count; \ cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.offset = (size_t)offset; \ cb_data.args.hipGraphMemcpyNodeSetParamsFromSymbol.kind = (hipMemcpyKind)kind; \ }; // hipGraphMemcpyNodeSetParamsToSymbol[('hipGraphNode_t', 'node'), ('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')] #define INIT_hipGraphMemcpyNodeSetParamsToSymbol_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.symbol = (const void*)symbol; \ cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.src = (const void*)src; \ cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.count = (size_t)count; \ cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.offset = (size_t)offset; \ cb_data.args.hipGraphMemcpyNodeSetParamsToSymbol.kind = (hipMemcpyKind)kind; \ }; // hipGraphMemsetNodeGetParams[('hipGraphNode_t', 'node'), ('hipMemsetParams*', 'pNodeParams')] #define INIT_hipGraphMemsetNodeGetParams_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphMemsetNodeGetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphMemsetNodeGetParams.pNodeParams = (hipMemsetParams*)pNodeParams; \ }; // hipGraphMemsetNodeSetParams[('hipGraphNode_t', 'node'), ('const hipMemsetParams*', 'pNodeParams')] #define INIT_hipGraphMemsetNodeSetParams_CB_ARGS_DATA(cb_data) { \ 
cb_data.args.hipGraphMemsetNodeSetParams.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphMemsetNodeSetParams.pNodeParams = (const hipMemsetParams*)pNodeParams; \ }; // hipGraphNodeFindInClone[('hipGraphNode_t*', 'pNode'), ('hipGraphNode_t', 'originalNode'), ('hipGraph_t', 'clonedGraph')] #define INIT_hipGraphNodeFindInClone_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphNodeFindInClone.pNode = (hipGraphNode_t*)pNode; \ cb_data.args.hipGraphNodeFindInClone.originalNode = (hipGraphNode_t)originalNode; \ cb_data.args.hipGraphNodeFindInClone.clonedGraph = (hipGraph_t)clonedGraph; \ }; // hipGraphNodeGetDependencies[('hipGraphNode_t', 'node'), ('hipGraphNode_t*', 'pDependencies'), ('size_t*', 'pNumDependencies')] #define INIT_hipGraphNodeGetDependencies_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphNodeGetDependencies.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphNodeGetDependencies.pDependencies = (hipGraphNode_t*)pDependencies; \ cb_data.args.hipGraphNodeGetDependencies.pNumDependencies = (size_t*)pNumDependencies; \ }; // hipGraphNodeGetDependentNodes[('hipGraphNode_t', 'node'), ('hipGraphNode_t*', 'pDependentNodes'), ('size_t*', 'pNumDependentNodes')] #define INIT_hipGraphNodeGetDependentNodes_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphNodeGetDependentNodes.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphNodeGetDependentNodes.pDependentNodes = (hipGraphNode_t*)pDependentNodes; \ cb_data.args.hipGraphNodeGetDependentNodes.pNumDependentNodes = (size_t*)pNumDependentNodes; \ }; // hipGraphNodeGetEnabled[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'hNode'), ('unsigned int*', 'isEnabled')] #define INIT_hipGraphNodeGetEnabled_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphNodeGetEnabled.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphNodeGetEnabled.hNode = (hipGraphNode_t)hNode; \ cb_data.args.hipGraphNodeGetEnabled.isEnabled = (unsigned int*)isEnabled; \ }; // hipGraphNodeGetType[('hipGraphNode_t', 'node'), ('hipGraphNodeType*', 'pType')] #define INIT_hipGraphNodeGetType_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphNodeGetType.node = (hipGraphNode_t)node; \ cb_data.args.hipGraphNodeGetType.pType = (hipGraphNodeType*)pType; \ }; // hipGraphNodeSetEnabled[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'hNode'), ('unsigned int', 'isEnabled')] #define INIT_hipGraphNodeSetEnabled_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphNodeSetEnabled.hGraphExec = (hipGraphExec_t)hGraphExec; \ cb_data.args.hipGraphNodeSetEnabled.hNode = (hipGraphNode_t)hNode; \ cb_data.args.hipGraphNodeSetEnabled.isEnabled = (unsigned int)isEnabled; \ }; // hipGraphReleaseUserObject[('hipGraph_t', 'graph'), ('hipUserObject_t', 'object'), ('unsigned int', 'count')] #define INIT_hipGraphReleaseUserObject_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphReleaseUserObject.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphReleaseUserObject.object = (hipUserObject_t)object; \ cb_data.args.hipGraphReleaseUserObject.count = (unsigned int)count; \ }; // hipGraphRemoveDependencies[('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'from'), ('const hipGraphNode_t*', 'to'), ('size_t', 'numDependencies')] #define INIT_hipGraphRemoveDependencies_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphRemoveDependencies.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphRemoveDependencies.from = (const hipGraphNode_t*)from; \ cb_data.args.hipGraphRemoveDependencies.to = (const hipGraphNode_t*)to; \ cb_data.args.hipGraphRemoveDependencies.numDependencies = (size_t)numDependencies; \ }; // 
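/*
 * Editorial aside -- how these records are consumed. The INIT_*_CB_ARGS_DATA
 * macros above fill the 'args' union of a hip_api_data_t record before the
 * registered API callback fires. A minimal consumer sketch follows, assuming
 * roctracer's public domain-callback API (roctracer_enable_domain_callback,
 * ACTIVITY_DOMAIN_HIP_API, the HIP_API_ID_* operation ids); the callback name
 * and the printf payload are illustrative, not part of this header.
 */
#if 0  /* illustrative sketch only -- not compiled as part of this header */
#include <cstdio>
#include <roctracer/roctracer.h>
#include <roctracer/roctracer_hip.h>

static void graph_api_cb(uint32_t /*domain*/, uint32_t cid,
                         const void* callback_data, void* /*arg*/) {
  const hip_api_data_t* data =
      reinterpret_cast<const hip_api_data_t*>(callback_data);
  if (data->phase != ACTIVITY_API_PHASE_ENTER) return;
  if (cid == HIP_API_ID_hipGraphLaunch) {
    /* Fields populated by INIT_hipGraphLaunch_CB_ARGS_DATA above. */
    std::printf("hipGraphLaunch(graphExec=%p, stream=%p)\n",
                (void*)data->args.hipGraphLaunch.graphExec,
                (void*)data->args.hipGraphLaunch.stream);
  }
}

static void install_graph_cb() {
  /* Assumed roctracer entry point; returns 0 on success. */
  roctracer_enable_domain_callback(ACTIVITY_DOMAIN_HIP_API, graph_api_cb,
                                   nullptr);
}
#endif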
hipGraphRetainUserObject[('hipGraph_t', 'graph'), ('hipUserObject_t', 'object'), ('unsigned int', 'count'), ('unsigned int', 'flags')] #define INIT_hipGraphRetainUserObject_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphRetainUserObject.graph = (hipGraph_t)graph; \ cb_data.args.hipGraphRetainUserObject.object = (hipUserObject_t)object; \ cb_data.args.hipGraphRetainUserObject.count = (unsigned int)count; \ cb_data.args.hipGraphRetainUserObject.flags = (unsigned int)flags; \ }; // hipGraphUpload[('hipGraphExec_t', 'graphExec'), ('hipStream_t', 'stream')] #define INIT_hipGraphUpload_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphUpload.graphExec = (hipGraphExec_t)graphExec; \ cb_data.args.hipGraphUpload.stream = (hipStream_t)stream; \ }; // hipGraphicsGLRegisterBuffer[('hipGraphicsResource**', 'resource'), ('GLuint', 'buffer'), ('unsigned int', 'flags')] #define INIT_hipGraphicsGLRegisterBuffer_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphicsGLRegisterBuffer.resource = (hipGraphicsResource**)resource; \ cb_data.args.hipGraphicsGLRegisterBuffer.buffer = (GLuint)buffer; \ cb_data.args.hipGraphicsGLRegisterBuffer.flags = (unsigned int)flags; \ }; // hipGraphicsGLRegisterImage[('hipGraphicsResource**', 'resource'), ('GLuint', 'image'), ('GLenum', 'target'), ('unsigned int', 'flags')] #define INIT_hipGraphicsGLRegisterImage_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphicsGLRegisterImage.resource = (hipGraphicsResource**)resource; \ cb_data.args.hipGraphicsGLRegisterImage.image = (GLuint)image; \ cb_data.args.hipGraphicsGLRegisterImage.target = (GLenum)target; \ cb_data.args.hipGraphicsGLRegisterImage.flags = (unsigned int)flags; \ }; // hipGraphicsMapResources[('int', 'count'), ('hipGraphicsResource_t*', 'resources'), ('hipStream_t', 'stream')] #define INIT_hipGraphicsMapResources_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphicsMapResources.count = (int)count; \ cb_data.args.hipGraphicsMapResources.resources = (hipGraphicsResource_t*)resources; \ cb_data.args.hipGraphicsMapResources.stream = (hipStream_t)stream; \ }; // hipGraphicsResourceGetMappedPointer[('void**', 'devPtr'), ('size_t*', 'size'), ('hipGraphicsResource_t', 'resource')] #define INIT_hipGraphicsResourceGetMappedPointer_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphicsResourceGetMappedPointer.devPtr = (void**)devPtr; \ cb_data.args.hipGraphicsResourceGetMappedPointer.size = (size_t*)size; \ cb_data.args.hipGraphicsResourceGetMappedPointer.resource = (hipGraphicsResource_t)resource; \ }; // hipGraphicsSubResourceGetMappedArray[('hipArray_t*', 'array'), ('hipGraphicsResource_t', 'resource'), ('unsigned int', 'arrayIndex'), ('unsigned int', 'mipLevel')] #define INIT_hipGraphicsSubResourceGetMappedArray_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphicsSubResourceGetMappedArray.array = (hipArray_t*)array; \ cb_data.args.hipGraphicsSubResourceGetMappedArray.resource = (hipGraphicsResource_t)resource; \ cb_data.args.hipGraphicsSubResourceGetMappedArray.arrayIndex = (unsigned int)arrayIndex; \ cb_data.args.hipGraphicsSubResourceGetMappedArray.mipLevel = (unsigned int)mipLevel; \ }; // hipGraphicsUnmapResources[('int', 'count'), ('hipGraphicsResource_t*', 'resources'), ('hipStream_t', 'stream')] #define INIT_hipGraphicsUnmapResources_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphicsUnmapResources.count = (int)count; \ cb_data.args.hipGraphicsUnmapResources.resources = (hipGraphicsResource_t*)resources; \ cb_data.args.hipGraphicsUnmapResources.stream = (hipStream_t)stream; \ }; // 
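/*
 * Editorial aside -- why some macro bodies use names that differ from the
 * signature recorded in the comment above them (e.g. hipHostAlloc captures
 * 'sizeBytes' for its 'size' argument, hipIpcCloseMemHandle captures
 * 'dev_ptr' for 'devPtr', and hipLaunchKernel captures 'hostFunction'/
 * 'gridDim'/'blockDim'): each INIT_* macro reads the local parameter names
 * of the runtime wrapper it is expanded in. A hypothetical expansion site,
 * for illustration only (not the actual HIP runtime wrapper):
 */
#if 0  /* illustrative sketch only */
hipError_t hipHostAlloc(void** ptr, size_t sizeBytes, unsigned int flags) {
  hip_api_data_t cb_data{};
  /* Expands to assignments into cb_data.args.hipHostAlloc.* using the
   * local names 'ptr', 'sizeBytes', and 'flags' visible at this point. */
  INIT_hipHostAlloc_CB_ARGS_DATA(cb_data);
  /* ... invoke the registered callback, then perform the allocation ... */
  return hipSuccess;
}
#endif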
hipGraphicsUnregisterResource[('hipGraphicsResource_t', 'resource')] #define INIT_hipGraphicsUnregisterResource_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipGraphicsUnregisterResource.resource = (hipGraphicsResource_t)resource; \ }; // hipHccModuleLaunchKernel[('hipFunction_t', 'f'), ('unsigned int', 'globalWorkSizeX'), ('unsigned int', 'globalWorkSizeY'), ('unsigned int', 'globalWorkSizeZ'), ('unsigned int', 'blockDimX'), ('unsigned int', 'blockDimY'), ('unsigned int', 'blockDimZ'), ('size_t', 'sharedMemBytes'), ('hipStream_t', 'hStream'), ('void**', 'kernelParams'), ('void**', 'extra'), ('hipEvent_t', 'startEvent'), ('hipEvent_t', 'stopEvent')] #define INIT_hipHccModuleLaunchKernel_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipHccModuleLaunchKernel.f = (hipFunction_t)f; \ cb_data.args.hipHccModuleLaunchKernel.globalWorkSizeX = (unsigned int)globalWorkSizeX; \ cb_data.args.hipHccModuleLaunchKernel.globalWorkSizeY = (unsigned int)globalWorkSizeY; \ cb_data.args.hipHccModuleLaunchKernel.globalWorkSizeZ = (unsigned int)globalWorkSizeZ; \ cb_data.args.hipHccModuleLaunchKernel.blockDimX = (unsigned int)blockDimX; \ cb_data.args.hipHccModuleLaunchKernel.blockDimY = (unsigned int)blockDimY; \ cb_data.args.hipHccModuleLaunchKernel.blockDimZ = (unsigned int)blockDimZ; \ cb_data.args.hipHccModuleLaunchKernel.sharedMemBytes = (size_t)sharedMemBytes; \ cb_data.args.hipHccModuleLaunchKernel.hStream = (hipStream_t)hStream; \ cb_data.args.hipHccModuleLaunchKernel.kernelParams = (void**)kernelParams; \ cb_data.args.hipHccModuleLaunchKernel.extra = (void**)extra; \ cb_data.args.hipHccModuleLaunchKernel.startEvent = (hipEvent_t)startEvent; \ cb_data.args.hipHccModuleLaunchKernel.stopEvent = (hipEvent_t)stopEvent; \ }; // hipHostAlloc[('void**', 'ptr'), ('size_t', 'size'), ('unsigned int', 'flags')] #define INIT_hipHostAlloc_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipHostAlloc.ptr = (void**)ptr; \ cb_data.args.hipHostAlloc.size = (size_t)sizeBytes; \ cb_data.args.hipHostAlloc.flags = (unsigned int)flags; \ }; // hipHostFree[('void*', 'ptr')] #define INIT_hipHostFree_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipHostFree.ptr = (void*)ptr; \ }; // hipHostGetDevicePointer[('void**', 'devPtr'), ('void*', 'hstPtr'), ('unsigned int', 'flags')] #define INIT_hipHostGetDevicePointer_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipHostGetDevicePointer.devPtr = (void**)devicePointer; \ cb_data.args.hipHostGetDevicePointer.hstPtr = (void*)hostPointer; \ cb_data.args.hipHostGetDevicePointer.flags = (unsigned int)flags; \ }; // hipHostGetFlags[('unsigned int*', 'flagsPtr'), ('void*', 'hostPtr')] #define INIT_hipHostGetFlags_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipHostGetFlags.flagsPtr = (unsigned int*)flagsPtr; \ cb_data.args.hipHostGetFlags.hostPtr = (void*)hostPtr; \ }; // hipHostMalloc[('void**', 'ptr'), ('size_t', 'size'), ('unsigned int', 'flags')] #define INIT_hipHostMalloc_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipHostMalloc.ptr = (void**)ptr; \ cb_data.args.hipHostMalloc.size = (size_t)sizeBytes; \ cb_data.args.hipHostMalloc.flags = (unsigned int)flags; \ }; // hipHostRegister[('void*', 'hostPtr'), ('size_t', 'sizeBytes'), ('unsigned int', 'flags')] #define INIT_hipHostRegister_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipHostRegister.hostPtr = (void*)hostPtr; \ cb_data.args.hipHostRegister.sizeBytes = (size_t)sizeBytes; \ cb_data.args.hipHostRegister.flags = (unsigned int)flags; \ }; // hipHostUnregister[('void*', 'hostPtr')] #define INIT_hipHostUnregister_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipHostUnregister.hostPtr = 
(void*)hostPtr; \ }; // hipImportExternalMemory[('hipExternalMemory_t*', 'extMem_out'), ('const hipExternalMemoryHandleDesc*', 'memHandleDesc')] #define INIT_hipImportExternalMemory_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipImportExternalMemory.extMem_out = (hipExternalMemory_t*)extMem_out; \ cb_data.args.hipImportExternalMemory.memHandleDesc = (const hipExternalMemoryHandleDesc*)memHandleDesc; \ }; // hipImportExternalSemaphore[('hipExternalSemaphore_t*', 'extSem_out'), ('const hipExternalSemaphoreHandleDesc*', 'semHandleDesc')] #define INIT_hipImportExternalSemaphore_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipImportExternalSemaphore.extSem_out = (hipExternalSemaphore_t*)extSem_out; \ cb_data.args.hipImportExternalSemaphore.semHandleDesc = (const hipExternalSemaphoreHandleDesc*)semHandleDesc; \ }; // hipInit[('unsigned int', 'flags')] #define INIT_hipInit_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipInit.flags = (unsigned int)flags; \ }; // hipIpcCloseMemHandle[('void*', 'devPtr')] #define INIT_hipIpcCloseMemHandle_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipIpcCloseMemHandle.devPtr = (void*)dev_ptr; \ }; // hipIpcGetEventHandle[('hipIpcEventHandle_t*', 'handle'), ('hipEvent_t', 'event')] #define INIT_hipIpcGetEventHandle_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipIpcGetEventHandle.handle = (hipIpcEventHandle_t*)handle; \ cb_data.args.hipIpcGetEventHandle.event = (hipEvent_t)event; \ }; // hipIpcGetMemHandle[('hipIpcMemHandle_t*', 'handle'), ('void*', 'devPtr')] #define INIT_hipIpcGetMemHandle_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipIpcGetMemHandle.handle = (hipIpcMemHandle_t*)handle; \ cb_data.args.hipIpcGetMemHandle.devPtr = (void*)dev_ptr; \ }; // hipIpcOpenEventHandle[('hipEvent_t*', 'event'), ('hipIpcEventHandle_t', 'handle')] #define INIT_hipIpcOpenEventHandle_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipIpcOpenEventHandle.event = (hipEvent_t*)event; \ cb_data.args.hipIpcOpenEventHandle.handle = (hipIpcEventHandle_t)handle; \ }; // hipIpcOpenMemHandle[('void**', 'devPtr'), ('hipIpcMemHandle_t', 'handle'), ('unsigned int', 'flags')] #define INIT_hipIpcOpenMemHandle_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipIpcOpenMemHandle.devPtr = (void**)dev_ptr; \ cb_data.args.hipIpcOpenMemHandle.handle = (hipIpcMemHandle_t)handle; \ cb_data.args.hipIpcOpenMemHandle.flags = (unsigned int)flags; \ }; // hipLaunchByPtr[('const void*', 'hostFunction')] #define INIT_hipLaunchByPtr_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipLaunchByPtr.hostFunction = (const void*)hostFunction; \ }; // hipLaunchCooperativeKernel[('const void*', 'f'), ('dim3', 'gridDim'), ('dim3', 'blockDimX'), ('void**', 'kernelParams'), ('unsigned int', 'sharedMemBytes'), ('hipStream_t', 'stream')] #define INIT_hipLaunchCooperativeKernel_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipLaunchCooperativeKernel.f = (const void*)f; \ cb_data.args.hipLaunchCooperativeKernel.gridDim = (dim3)gridDim; \ cb_data.args.hipLaunchCooperativeKernel.blockDimX = (dim3)blockDim; \ cb_data.args.hipLaunchCooperativeKernel.kernelParams = (void**)kernelParams; \ cb_data.args.hipLaunchCooperativeKernel.sharedMemBytes = (unsigned int)sharedMemBytes; \ cb_data.args.hipLaunchCooperativeKernel.stream = (hipStream_t)hStream; \ }; // hipLaunchCooperativeKernelMultiDevice[('hipLaunchParams*', 'launchParamsList'), ('int', 'numDevices'), ('unsigned int', 'flags')] #define INIT_hipLaunchCooperativeKernelMultiDevice_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipLaunchCooperativeKernelMultiDevice.launchParamsList = (hipLaunchParams*)launchParamsList; \ 
cb_data.args.hipLaunchCooperativeKernelMultiDevice.numDevices = (int)numDevices; \ cb_data.args.hipLaunchCooperativeKernelMultiDevice.flags = (unsigned int)flags; \ }; // hipLaunchHostFunc[('hipStream_t', 'stream'), ('hipHostFn_t', 'fn'), ('void*', 'userData')] #define INIT_hipLaunchHostFunc_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipLaunchHostFunc.stream = (hipStream_t)stream; \ cb_data.args.hipLaunchHostFunc.fn = (hipHostFn_t)fn; \ cb_data.args.hipLaunchHostFunc.userData = (void*)userData; \ }; // hipLaunchKernel[('const void*', 'function_address'), ('dim3', 'numBlocks'), ('dim3', 'dimBlocks'), ('void**', 'args'), ('size_t', 'sharedMemBytes'), ('hipStream_t', 'stream')] #define INIT_hipLaunchKernel_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipLaunchKernel.function_address = (const void*)hostFunction; \ cb_data.args.hipLaunchKernel.numBlocks = (dim3)gridDim; \ cb_data.args.hipLaunchKernel.dimBlocks = (dim3)blockDim; \ cb_data.args.hipLaunchKernel.args = (void**)args; \ cb_data.args.hipLaunchKernel.sharedMemBytes = (size_t)sharedMemBytes; \ cb_data.args.hipLaunchKernel.stream = (hipStream_t)stream; \ }; // hipMalloc[('void**', 'ptr'), ('size_t', 'size')] #define INIT_hipMalloc_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMalloc.ptr = (void**)ptr; \ cb_data.args.hipMalloc.size = (size_t)sizeBytes; \ }; // hipMalloc3D[('hipPitchedPtr*', 'pitchedDevPtr'), ('hipExtent', 'extent')] #define INIT_hipMalloc3D_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMalloc3D.pitchedDevPtr = (hipPitchedPtr*)pitchedDevPtr; \ cb_data.args.hipMalloc3D.extent = (hipExtent)extent; \ }; // hipMalloc3DArray[('hipArray_t*', 'array'), ('const hipChannelFormatDesc*', 'desc'), ('hipExtent', 'extent'), ('unsigned int', 'flags')] #define INIT_hipMalloc3DArray_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMalloc3DArray.array = (hipArray_t*)array; \ cb_data.args.hipMalloc3DArray.desc = (const hipChannelFormatDesc*)desc; \ cb_data.args.hipMalloc3DArray.extent = (hipExtent)extent; \ cb_data.args.hipMalloc3DArray.flags = (unsigned int)flags; \ }; // hipMallocArray[('hipArray**', 'array'), ('const hipChannelFormatDesc*', 'desc'), ('size_t', 'width'), ('size_t', 'height'), ('unsigned int', 'flags')] #define INIT_hipMallocArray_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMallocArray.array = (hipArray**)array; \ cb_data.args.hipMallocArray.desc = (const hipChannelFormatDesc*)desc; \ cb_data.args.hipMallocArray.width = (size_t)width; \ cb_data.args.hipMallocArray.height = (size_t)height; \ cb_data.args.hipMallocArray.flags = (unsigned int)flags; \ }; // hipMallocAsync[('void**', 'dev_ptr'), ('size_t', 'size'), ('hipStream_t', 'stream')] #define INIT_hipMallocAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMallocAsync.dev_ptr = (void**)dev_ptr; \ cb_data.args.hipMallocAsync.size = (size_t)size; \ cb_data.args.hipMallocAsync.stream = (hipStream_t)stream; \ }; // hipMallocFromPoolAsync[('void**', 'dev_ptr'), ('size_t', 'size'), ('hipMemPool_t', 'mem_pool'), ('hipStream_t', 'stream')] #define INIT_hipMallocFromPoolAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMallocFromPoolAsync.dev_ptr = (void**)dev_ptr; \ cb_data.args.hipMallocFromPoolAsync.size = (size_t)size; \ cb_data.args.hipMallocFromPoolAsync.mem_pool = (hipMemPool_t)mem_pool; \ cb_data.args.hipMallocFromPoolAsync.stream = (hipStream_t)stream; \ }; // hipMallocHost[('void**', 'ptr'), ('size_t', 'size')] #define INIT_hipMallocHost_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMallocHost.ptr = (void**)ptr; \ cb_data.args.hipMallocHost.size = (size_t)size; \ }; // hipMallocManaged[('void**', 
'dev_ptr'), ('size_t', 'size'), ('unsigned int', 'flags')] #define INIT_hipMallocManaged_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMallocManaged.dev_ptr = (void**)dev_ptr; \ cb_data.args.hipMallocManaged.size = (size_t)size; \ cb_data.args.hipMallocManaged.flags = (unsigned int)flags; \ }; // hipMallocMipmappedArray[('hipMipmappedArray_t*', 'mipmappedArray'), ('const hipChannelFormatDesc*', 'desc'), ('hipExtent', 'extent'), ('unsigned int', 'numLevels'), ('unsigned int', 'flags')] #define INIT_hipMallocMipmappedArray_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMallocMipmappedArray.mipmappedArray = (hipMipmappedArray_t*)mipmappedArray; \ cb_data.args.hipMallocMipmappedArray.desc = (const hipChannelFormatDesc*)desc; \ cb_data.args.hipMallocMipmappedArray.extent = (hipExtent)extent; \ cb_data.args.hipMallocMipmappedArray.numLevels = (unsigned int)numLevels; \ cb_data.args.hipMallocMipmappedArray.flags = (unsigned int)flags; \ }; // hipMallocPitch[('void**', 'ptr'), ('size_t*', 'pitch'), ('size_t', 'width'), ('size_t', 'height')] #define INIT_hipMallocPitch_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMallocPitch.ptr = (void**)ptr; \ cb_data.args.hipMallocPitch.pitch = (size_t*)pitch; \ cb_data.args.hipMallocPitch.width = (size_t)width; \ cb_data.args.hipMallocPitch.height = (size_t)height; \ }; // hipMemAddressFree[('void*', 'devPtr'), ('size_t', 'size')] #define INIT_hipMemAddressFree_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemAddressFree.devPtr = (void*)devPtr; \ cb_data.args.hipMemAddressFree.size = (size_t)size; \ }; // hipMemAddressReserve[('void**', 'ptr'), ('size_t', 'size'), ('size_t', 'alignment'), ('void*', 'addr'), ('unsigned long long', 'flags')] #define INIT_hipMemAddressReserve_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemAddressReserve.ptr = (void**)ptr; \ cb_data.args.hipMemAddressReserve.size = (size_t)size; \ cb_data.args.hipMemAddressReserve.alignment = (size_t)alignment; \ cb_data.args.hipMemAddressReserve.addr = (void*)addr; \ cb_data.args.hipMemAddressReserve.flags = (unsigned long long)flags; \ }; // hipMemAdvise[('const void*', 'dev_ptr'), ('size_t', 'count'), ('hipMemoryAdvise', 'advice'), ('int', 'device')] #define INIT_hipMemAdvise_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemAdvise.dev_ptr = (const void*)dev_ptr; \ cb_data.args.hipMemAdvise.count = (size_t)count; \ cb_data.args.hipMemAdvise.advice = (hipMemoryAdvise)advice; \ cb_data.args.hipMemAdvise.device = (int)device; \ }; // hipMemAllocHost[('void**', 'ptr'), ('size_t', 'size')] #define INIT_hipMemAllocHost_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemAllocHost.ptr = (void**)ptr; \ cb_data.args.hipMemAllocHost.size = (size_t)size; \ }; // hipMemAllocPitch[('hipDeviceptr_t*', 'dptr'), ('size_t*', 'pitch'), ('size_t', 'widthInBytes'), ('size_t', 'height'), ('unsigned int', 'elementSizeBytes')] #define INIT_hipMemAllocPitch_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemAllocPitch.dptr = (hipDeviceptr_t*)dptr; \ cb_data.args.hipMemAllocPitch.pitch = (size_t*)pitch; \ cb_data.args.hipMemAllocPitch.widthInBytes = (size_t)widthInBytes; \ cb_data.args.hipMemAllocPitch.height = (size_t)height; \ cb_data.args.hipMemAllocPitch.elementSizeBytes = (unsigned int)elementSizeBytes; \ }; // hipMemCreate[('hipMemGenericAllocationHandle_t*', 'handle'), ('size_t', 'size'), ('const hipMemAllocationProp*', 'prop'), ('unsigned long long', 'flags')] #define INIT_hipMemCreate_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemCreate.handle = (hipMemGenericAllocationHandle_t*)handle; \ cb_data.args.hipMemCreate.size = (size_t)size; \ 
cb_data.args.hipMemCreate.prop = (const hipMemAllocationProp*)prop; \ cb_data.args.hipMemCreate.flags = (unsigned long long)flags; \ }; // hipMemExportToShareableHandle[('void*', 'shareableHandle'), ('hipMemGenericAllocationHandle_t', 'handle'), ('hipMemAllocationHandleType', 'handleType'), ('unsigned long long', 'flags')] #define INIT_hipMemExportToShareableHandle_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemExportToShareableHandle.shareableHandle = (void*)shareableHandle; \ cb_data.args.hipMemExportToShareableHandle.handle = (hipMemGenericAllocationHandle_t)handle; \ cb_data.args.hipMemExportToShareableHandle.handleType = (hipMemAllocationHandleType)handleType; \ cb_data.args.hipMemExportToShareableHandle.flags = (unsigned long long)flags; \ }; // hipMemGetAccess[('unsigned long long*', 'flags'), ('const hipMemLocation*', 'location'), ('void*', 'ptr')] #define INIT_hipMemGetAccess_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemGetAccess.flags = (unsigned long long*)flags; \ cb_data.args.hipMemGetAccess.location = (const hipMemLocation*)location; \ cb_data.args.hipMemGetAccess.ptr = (void*)ptr; \ }; // hipMemGetAddressRange[('hipDeviceptr_t*', 'pbase'), ('size_t*', 'psize'), ('hipDeviceptr_t', 'dptr')] #define INIT_hipMemGetAddressRange_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemGetAddressRange.pbase = (hipDeviceptr_t*)pbase; \ cb_data.args.hipMemGetAddressRange.psize = (size_t*)psize; \ cb_data.args.hipMemGetAddressRange.dptr = (hipDeviceptr_t)dptr; \ }; // hipMemGetAllocationGranularity[('size_t*', 'granularity'), ('const hipMemAllocationProp*', 'prop'), ('hipMemAllocationGranularity_flags', 'option')] #define INIT_hipMemGetAllocationGranularity_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemGetAllocationGranularity.granularity = (size_t*)granularity; \ cb_data.args.hipMemGetAllocationGranularity.prop = (const hipMemAllocationProp*)prop; \ cb_data.args.hipMemGetAllocationGranularity.option = (hipMemAllocationGranularity_flags)option; \ }; // hipMemGetAllocationPropertiesFromHandle[('hipMemAllocationProp*', 'prop'), ('hipMemGenericAllocationHandle_t', 'handle')] #define INIT_hipMemGetAllocationPropertiesFromHandle_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemGetAllocationPropertiesFromHandle.prop = (hipMemAllocationProp*)prop; \ cb_data.args.hipMemGetAllocationPropertiesFromHandle.handle = (hipMemGenericAllocationHandle_t)handle; \ }; // hipMemGetInfo[('size_t*', 'free'), ('size_t*', 'total')] #define INIT_hipMemGetInfo_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemGetInfo.free = (size_t*)free; \ cb_data.args.hipMemGetInfo.total = (size_t*)total; \ }; // hipMemImportFromShareableHandle[('hipMemGenericAllocationHandle_t*', 'handle'), ('void*', 'osHandle'), ('hipMemAllocationHandleType', 'shHandleType')] #define INIT_hipMemImportFromShareableHandle_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemImportFromShareableHandle.handle = (hipMemGenericAllocationHandle_t*)handle; \ cb_data.args.hipMemImportFromShareableHandle.osHandle = (void*)osHandle; \ cb_data.args.hipMemImportFromShareableHandle.shHandleType = (hipMemAllocationHandleType)shHandleType; \ }; // hipMemMap[('void*', 'ptr'), ('size_t', 'size'), ('size_t', 'offset'), ('hipMemGenericAllocationHandle_t', 'handle'), ('unsigned long long', 'flags')] #define INIT_hipMemMap_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemMap.ptr = (void*)ptr; \ cb_data.args.hipMemMap.size = (size_t)size; \ cb_data.args.hipMemMap.offset = (size_t)offset; \ cb_data.args.hipMemMap.handle = (hipMemGenericAllocationHandle_t)handle; \ cb_data.args.hipMemMap.flags = 
(unsigned long long)flags; \ }; // hipMemMapArrayAsync[('hipArrayMapInfo*', 'mapInfoList'), ('unsigned int', 'count'), ('hipStream_t', 'stream')] #define INIT_hipMemMapArrayAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemMapArrayAsync.mapInfoList = (hipArrayMapInfo*)mapInfoList; \ cb_data.args.hipMemMapArrayAsync.count = (unsigned int)count; \ cb_data.args.hipMemMapArrayAsync.stream = (hipStream_t)stream; \ }; // hipMemPoolCreate[('hipMemPool_t*', 'mem_pool'), ('const hipMemPoolProps*', 'pool_props')] #define INIT_hipMemPoolCreate_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPoolCreate.mem_pool = (hipMemPool_t*)mem_pool; \ cb_data.args.hipMemPoolCreate.pool_props = (const hipMemPoolProps*)pool_props; \ }; // hipMemPoolDestroy[('hipMemPool_t', 'mem_pool')] #define INIT_hipMemPoolDestroy_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPoolDestroy.mem_pool = (hipMemPool_t)mem_pool; \ }; // hipMemPoolExportPointer[('hipMemPoolPtrExportData*', 'export_data'), ('void*', 'dev_ptr')] #define INIT_hipMemPoolExportPointer_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPoolExportPointer.export_data = (hipMemPoolPtrExportData*)export_data; \ cb_data.args.hipMemPoolExportPointer.dev_ptr = (void*)ptr; \ }; // hipMemPoolExportToShareableHandle[('void*', 'shared_handle'), ('hipMemPool_t', 'mem_pool'), ('hipMemAllocationHandleType', 'handle_type'), ('unsigned int', 'flags')] #define INIT_hipMemPoolExportToShareableHandle_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPoolExportToShareableHandle.shared_handle = (void*)shared_handle; \ cb_data.args.hipMemPoolExportToShareableHandle.mem_pool = (hipMemPool_t)mem_pool; \ cb_data.args.hipMemPoolExportToShareableHandle.handle_type = (hipMemAllocationHandleType)handle_type; \ cb_data.args.hipMemPoolExportToShareableHandle.flags = (unsigned int)flags; \ }; // hipMemPoolGetAccess[('hipMemAccessFlags*', 'flags'), ('hipMemPool_t', 'mem_pool'), ('hipMemLocation*', 'location')] #define INIT_hipMemPoolGetAccess_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPoolGetAccess.flags = (hipMemAccessFlags*)flags; \ cb_data.args.hipMemPoolGetAccess.mem_pool = (hipMemPool_t)mem_pool; \ cb_data.args.hipMemPoolGetAccess.location = (hipMemLocation*)location; \ }; // hipMemPoolGetAttribute[('hipMemPool_t', 'mem_pool'), ('hipMemPoolAttr', 'attr'), ('void*', 'value')] #define INIT_hipMemPoolGetAttribute_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPoolGetAttribute.mem_pool = (hipMemPool_t)mem_pool; \ cb_data.args.hipMemPoolGetAttribute.attr = (hipMemPoolAttr)attr; \ cb_data.args.hipMemPoolGetAttribute.value = (void*)value; \ }; // hipMemPoolImportFromShareableHandle[('hipMemPool_t*', 'mem_pool'), ('void*', 'shared_handle'), ('hipMemAllocationHandleType', 'handle_type'), ('unsigned int', 'flags')] #define INIT_hipMemPoolImportFromShareableHandle_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPoolImportFromShareableHandle.mem_pool = (hipMemPool_t*)mem_pool; \ cb_data.args.hipMemPoolImportFromShareableHandle.shared_handle = (void*)shared_handle; \ cb_data.args.hipMemPoolImportFromShareableHandle.handle_type = (hipMemAllocationHandleType)handle_type; \ cb_data.args.hipMemPoolImportFromShareableHandle.flags = (unsigned int)flags; \ }; // hipMemPoolImportPointer[('void**', 'dev_ptr'), ('hipMemPool_t', 'mem_pool'), ('hipMemPoolPtrExportData*', 'export_data')] #define INIT_hipMemPoolImportPointer_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPoolImportPointer.dev_ptr = (void**)ptr; \ cb_data.args.hipMemPoolImportPointer.mem_pool = (hipMemPool_t)mem_pool; \ 
cb_data.args.hipMemPoolImportPointer.export_data = (hipMemPoolPtrExportData*)export_data; \ }; // hipMemPoolSetAccess[('hipMemPool_t', 'mem_pool'), ('const hipMemAccessDesc*', 'desc_list'), ('size_t', 'count')] #define INIT_hipMemPoolSetAccess_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPoolSetAccess.mem_pool = (hipMemPool_t)mem_pool; \ cb_data.args.hipMemPoolSetAccess.desc_list = (const hipMemAccessDesc*)desc_list; \ cb_data.args.hipMemPoolSetAccess.count = (size_t)count; \ }; // hipMemPoolSetAttribute[('hipMemPool_t', 'mem_pool'), ('hipMemPoolAttr', 'attr'), ('void*', 'value')] #define INIT_hipMemPoolSetAttribute_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPoolSetAttribute.mem_pool = (hipMemPool_t)mem_pool; \ cb_data.args.hipMemPoolSetAttribute.attr = (hipMemPoolAttr)attr; \ cb_data.args.hipMemPoolSetAttribute.value = (void*)value; \ }; // hipMemPoolTrimTo[('hipMemPool_t', 'mem_pool'), ('size_t', 'min_bytes_to_hold')] #define INIT_hipMemPoolTrimTo_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPoolTrimTo.mem_pool = (hipMemPool_t)mem_pool; \ cb_data.args.hipMemPoolTrimTo.min_bytes_to_hold = (size_t)min_bytes_to_hold; \ }; // hipMemPrefetchAsync[('const void*', 'dev_ptr'), ('size_t', 'count'), ('int', 'device'), ('hipStream_t', 'stream')] #define INIT_hipMemPrefetchAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPrefetchAsync.dev_ptr = (const void*)dev_ptr; \ cb_data.args.hipMemPrefetchAsync.count = (size_t)count; \ cb_data.args.hipMemPrefetchAsync.device = (int)device; \ cb_data.args.hipMemPrefetchAsync.stream = (hipStream_t)stream; \ }; // hipMemPtrGetInfo[('void*', 'ptr'), ('size_t*', 'size')] #define INIT_hipMemPtrGetInfo_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemPtrGetInfo.ptr = (void*)ptr; \ cb_data.args.hipMemPtrGetInfo.size = (size_t*)size; \ }; // hipMemRangeGetAttribute[('void*', 'data'), ('size_t', 'data_size'), ('hipMemRangeAttribute', 'attribute'), ('const void*', 'dev_ptr'), ('size_t', 'count')] #define INIT_hipMemRangeGetAttribute_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemRangeGetAttribute.data = (void*)data; \ cb_data.args.hipMemRangeGetAttribute.data_size = (size_t)data_size; \ cb_data.args.hipMemRangeGetAttribute.attribute = (hipMemRangeAttribute)attribute; \ cb_data.args.hipMemRangeGetAttribute.dev_ptr = (const void*)dev_ptr; \ cb_data.args.hipMemRangeGetAttribute.count = (size_t)count; \ }; // hipMemRangeGetAttributes[('void**', 'data'), ('size_t*', 'data_sizes'), ('hipMemRangeAttribute*', 'attributes'), ('size_t', 'num_attributes'), ('const void*', 'dev_ptr'), ('size_t', 'count')] #define INIT_hipMemRangeGetAttributes_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemRangeGetAttributes.data = (void**)data; \ cb_data.args.hipMemRangeGetAttributes.data_sizes = (size_t*)data_sizes; \ cb_data.args.hipMemRangeGetAttributes.attributes = (hipMemRangeAttribute*)attributes; \ cb_data.args.hipMemRangeGetAttributes.num_attributes = (size_t)num_attributes; \ cb_data.args.hipMemRangeGetAttributes.dev_ptr = (const void*)dev_ptr; \ cb_data.args.hipMemRangeGetAttributes.count = (size_t)count; \ }; // hipMemRelease[('hipMemGenericAllocationHandle_t', 'handle')] #define INIT_hipMemRelease_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemRelease.handle = (hipMemGenericAllocationHandle_t)handle; \ }; // hipMemRetainAllocationHandle[('hipMemGenericAllocationHandle_t*', 'handle'), ('void*', 'addr')] #define INIT_hipMemRetainAllocationHandle_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemRetainAllocationHandle.handle = (hipMemGenericAllocationHandle_t*)handle; \ 
cb_data.args.hipMemRetainAllocationHandle.addr = (void*)addr; \ }; // hipMemSetAccess[('void*', 'ptr'), ('size_t', 'size'), ('const hipMemAccessDesc*', 'desc'), ('size_t', 'count')] #define INIT_hipMemSetAccess_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemSetAccess.ptr = (void*)ptr; \ cb_data.args.hipMemSetAccess.size = (size_t)size; \ cb_data.args.hipMemSetAccess.desc = (const hipMemAccessDesc*)desc; \ cb_data.args.hipMemSetAccess.count = (size_t)count; \ }; // hipMemUnmap[('void*', 'ptr'), ('size_t', 'size')] #define INIT_hipMemUnmap_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemUnmap.ptr = (void*)ptr; \ cb_data.args.hipMemUnmap.size = (size_t)size; \ }; // hipMemcpy[('void*', 'dst'), ('const void*', 'src'), ('size_t', 'sizeBytes'), ('hipMemcpyKind', 'kind')] #define INIT_hipMemcpy_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpy.dst = (void*)dst; \ cb_data.args.hipMemcpy.src = (const void*)src; \ cb_data.args.hipMemcpy.sizeBytes = (size_t)sizeBytes; \ cb_data.args.hipMemcpy.kind = (hipMemcpyKind)kind; \ }; // hipMemcpy2D[('void*', 'dst'), ('size_t', 'dpitch'), ('const void*', 'src'), ('size_t', 'spitch'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind')] #define INIT_hipMemcpy2D_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpy2D.dst = (void*)dst; \ cb_data.args.hipMemcpy2D.dpitch = (size_t)dpitch; \ cb_data.args.hipMemcpy2D.src = (const void*)src; \ cb_data.args.hipMemcpy2D.spitch = (size_t)spitch; \ cb_data.args.hipMemcpy2D.width = (size_t)width; \ cb_data.args.hipMemcpy2D.height = (size_t)height; \ cb_data.args.hipMemcpy2D.kind = (hipMemcpyKind)kind; \ }; // hipMemcpy2DAsync[('void*', 'dst'), ('size_t', 'dpitch'), ('const void*', 'src'), ('size_t', 'spitch'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')] #define INIT_hipMemcpy2DAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpy2DAsync.dst = (void*)dst; \ cb_data.args.hipMemcpy2DAsync.dpitch = (size_t)dpitch; \ cb_data.args.hipMemcpy2DAsync.src = (const void*)src; \ cb_data.args.hipMemcpy2DAsync.spitch = (size_t)spitch; \ cb_data.args.hipMemcpy2DAsync.width = (size_t)width; \ cb_data.args.hipMemcpy2DAsync.height = (size_t)height; \ cb_data.args.hipMemcpy2DAsync.kind = (hipMemcpyKind)kind; \ cb_data.args.hipMemcpy2DAsync.stream = (hipStream_t)stream; \ }; // hipMemcpy2DFromArray[('void*', 'dst'), ('size_t', 'dpitch'), ('hipArray_const_t', 'src'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind')] #define INIT_hipMemcpy2DFromArray_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpy2DFromArray.dst = (void*)dst; \ cb_data.args.hipMemcpy2DFromArray.dpitch = (size_t)dpitch; \ cb_data.args.hipMemcpy2DFromArray.src = (hipArray_const_t)src; \ cb_data.args.hipMemcpy2DFromArray.wOffset = (size_t)wOffsetSrc; \ cb_data.args.hipMemcpy2DFromArray.hOffset = (size_t)hOffset; \ cb_data.args.hipMemcpy2DFromArray.width = (size_t)width; \ cb_data.args.hipMemcpy2DFromArray.height = (size_t)height; \ cb_data.args.hipMemcpy2DFromArray.kind = (hipMemcpyKind)kind; \ }; // hipMemcpy2DFromArrayAsync[('void*', 'dst'), ('size_t', 'dpitch'), ('hipArray_const_t', 'src'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')] #define INIT_hipMemcpy2DFromArrayAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpy2DFromArrayAsync.dst = (void*)dst; \ cb_data.args.hipMemcpy2DFromArrayAsync.dpitch = (size_t)dpitch; \ 
cb_data.args.hipMemcpy2DFromArrayAsync.src = (hipArray_const_t)src; \ cb_data.args.hipMemcpy2DFromArrayAsync.wOffset = (size_t)wOffsetSrc; \ cb_data.args.hipMemcpy2DFromArrayAsync.hOffset = (size_t)hOffsetSrc; \ cb_data.args.hipMemcpy2DFromArrayAsync.width = (size_t)width; \ cb_data.args.hipMemcpy2DFromArrayAsync.height = (size_t)height; \ cb_data.args.hipMemcpy2DFromArrayAsync.kind = (hipMemcpyKind)kind; \ cb_data.args.hipMemcpy2DFromArrayAsync.stream = (hipStream_t)stream; \ }; // hipMemcpy2DToArray[('hipArray*', 'dst'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('const void*', 'src'), ('size_t', 'spitch'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind')] #define INIT_hipMemcpy2DToArray_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpy2DToArray.dst = (hipArray*)dst; \ cb_data.args.hipMemcpy2DToArray.wOffset = (size_t)wOffset; \ cb_data.args.hipMemcpy2DToArray.hOffset = (size_t)hOffset; \ cb_data.args.hipMemcpy2DToArray.src = (const void*)src; \ cb_data.args.hipMemcpy2DToArray.spitch = (size_t)spitch; \ cb_data.args.hipMemcpy2DToArray.width = (size_t)width; \ cb_data.args.hipMemcpy2DToArray.height = (size_t)height; \ cb_data.args.hipMemcpy2DToArray.kind = (hipMemcpyKind)kind; \ }; // hipMemcpy2DToArrayAsync[('hipArray*', 'dst'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('const void*', 'src'), ('size_t', 'spitch'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')] #define INIT_hipMemcpy2DToArrayAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpy2DToArrayAsync.dst = (hipArray*)dst; \ cb_data.args.hipMemcpy2DToArrayAsync.wOffset = (size_t)wOffset; \ cb_data.args.hipMemcpy2DToArrayAsync.hOffset = (size_t)hOffset; \ cb_data.args.hipMemcpy2DToArrayAsync.src = (const void*)src; \ cb_data.args.hipMemcpy2DToArrayAsync.spitch = (size_t)spitch; \ cb_data.args.hipMemcpy2DToArrayAsync.width = (size_t)width; \ cb_data.args.hipMemcpy2DToArrayAsync.height = (size_t)height; \ cb_data.args.hipMemcpy2DToArrayAsync.kind = (hipMemcpyKind)kind; \ cb_data.args.hipMemcpy2DToArrayAsync.stream = (hipStream_t)stream; \ }; // hipMemcpy3D[('const hipMemcpy3DParms*', 'p')] #define INIT_hipMemcpy3D_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpy3D.p = (const hipMemcpy3DParms*)p; \ }; // hipMemcpy3DAsync[('const hipMemcpy3DParms*', 'p'), ('hipStream_t', 'stream')] #define INIT_hipMemcpy3DAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpy3DAsync.p = (const hipMemcpy3DParms*)p; \ cb_data.args.hipMemcpy3DAsync.stream = (hipStream_t)stream; \ }; // hipMemcpyAsync[('void*', 'dst'), ('const void*', 'src'), ('size_t', 'sizeBytes'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')] #define INIT_hipMemcpyAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpyAsync.dst = (void*)dst; \ cb_data.args.hipMemcpyAsync.src = (const void*)src; \ cb_data.args.hipMemcpyAsync.sizeBytes = (size_t)sizeBytes; \ cb_data.args.hipMemcpyAsync.kind = (hipMemcpyKind)kind; \ cb_data.args.hipMemcpyAsync.stream = (hipStream_t)stream; \ }; // hipMemcpyAtoH[('void*', 'dst'), ('hipArray*', 'srcArray'), ('size_t', 'srcOffset'), ('size_t', 'count')] #define INIT_hipMemcpyAtoH_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpyAtoH.dst = (void*)dstHost; \ cb_data.args.hipMemcpyAtoH.srcArray = (hipArray*)srcArray; \ cb_data.args.hipMemcpyAtoH.srcOffset = (size_t)srcOffset; \ cb_data.args.hipMemcpyAtoH.count = (size_t)ByteCount; \ }; // hipMemcpyDtoD[('hipDeviceptr_t', 'dst'), ('hipDeviceptr_t', 'src'), ('size_t', 'sizeBytes')] #define INIT_hipMemcpyDtoD_CB_ARGS_DATA(cb_data) { 
\ cb_data.args.hipMemcpyDtoD.dst = (hipDeviceptr_t)dstDevice; \ cb_data.args.hipMemcpyDtoD.src = (hipDeviceptr_t)srcDevice; \ cb_data.args.hipMemcpyDtoD.sizeBytes = (size_t)ByteCount; \ }; // hipMemcpyDtoDAsync[('hipDeviceptr_t', 'dst'), ('hipDeviceptr_t', 'src'), ('size_t', 'sizeBytes'), ('hipStream_t', 'stream')] #define INIT_hipMemcpyDtoDAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpyDtoDAsync.dst = (hipDeviceptr_t)dstDevice; \ cb_data.args.hipMemcpyDtoDAsync.src = (hipDeviceptr_t)srcDevice; \ cb_data.args.hipMemcpyDtoDAsync.sizeBytes = (size_t)ByteCount; \ cb_data.args.hipMemcpyDtoDAsync.stream = (hipStream_t)stream; \ }; // hipMemcpyDtoH[('void*', 'dst'), ('hipDeviceptr_t', 'src'), ('size_t', 'sizeBytes')] #define INIT_hipMemcpyDtoH_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpyDtoH.dst = (void*)dstHost; \ cb_data.args.hipMemcpyDtoH.src = (hipDeviceptr_t)srcDevice; \ cb_data.args.hipMemcpyDtoH.sizeBytes = (size_t)ByteCount; \ }; // hipMemcpyDtoHAsync[('void*', 'dst'), ('hipDeviceptr_t', 'src'), ('size_t', 'sizeBytes'), ('hipStream_t', 'stream')] #define INIT_hipMemcpyDtoHAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpyDtoHAsync.dst = (void*)dstHost; \ cb_data.args.hipMemcpyDtoHAsync.src = (hipDeviceptr_t)srcDevice; \ cb_data.args.hipMemcpyDtoHAsync.sizeBytes = (size_t)ByteCount; \ cb_data.args.hipMemcpyDtoHAsync.stream = (hipStream_t)stream; \ }; // hipMemcpyFromArray[('void*', 'dst'), ('hipArray_const_t', 'srcArray'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')] #define INIT_hipMemcpyFromArray_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpyFromArray.dst = (void*)dst; \ cb_data.args.hipMemcpyFromArray.srcArray = (hipArray_const_t)src; \ cb_data.args.hipMemcpyFromArray.wOffset = (size_t)wOffsetSrc; \ cb_data.args.hipMemcpyFromArray.hOffset = (size_t)hOffset; \ cb_data.args.hipMemcpyFromArray.count = (size_t)count; \ cb_data.args.hipMemcpyFromArray.kind = (hipMemcpyKind)kind; \ }; // hipMemcpyFromSymbol[('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'sizeBytes'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')] #define INIT_hipMemcpyFromSymbol_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpyFromSymbol.dst = (void*)dst; \ cb_data.args.hipMemcpyFromSymbol.symbol = (const void*)symbol; \ cb_data.args.hipMemcpyFromSymbol.sizeBytes = (size_t)sizeBytes; \ cb_data.args.hipMemcpyFromSymbol.offset = (size_t)offset; \ cb_data.args.hipMemcpyFromSymbol.kind = (hipMemcpyKind)kind; \ }; // hipMemcpyFromSymbolAsync[('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'sizeBytes'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')] #define INIT_hipMemcpyFromSymbolAsync_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpyFromSymbolAsync.dst = (void*)dst; \ cb_data.args.hipMemcpyFromSymbolAsync.symbol = (const void*)symbol; \ cb_data.args.hipMemcpyFromSymbolAsync.sizeBytes = (size_t)sizeBytes; \ cb_data.args.hipMemcpyFromSymbolAsync.offset = (size_t)offset; \ cb_data.args.hipMemcpyFromSymbolAsync.kind = (hipMemcpyKind)kind; \ cb_data.args.hipMemcpyFromSymbolAsync.stream = (hipStream_t)stream; \ }; // hipMemcpyHtoA[('hipArray*', 'dstArray'), ('size_t', 'dstOffset'), ('const void*', 'srcHost'), ('size_t', 'count')] #define INIT_hipMemcpyHtoA_CB_ARGS_DATA(cb_data) { \ cb_data.args.hipMemcpyHtoA.dstArray = (hipArray*)dstArray; \ cb_data.args.hipMemcpyHtoA.dstOffset = (size_t)dstOffset; \ cb_data.args.hipMemcpyHtoA.srcHost = (const void*)srcHost; \ cb_data.args.hipMemcpyHtoA.count = (size_t)ByteCount; \ }; // 
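/*
 * Editorial aside -- filtering a single memcpy API inside a callback,
 * assuming the same roctracer consumer pattern sketched earlier. The
 * byte-count accumulator and callback name are illustrative only.
 */
#if 0  /* illustrative sketch only */
static size_t g_htoa_bytes = 0;

static void memcpy_cb(uint32_t /*domain*/, uint32_t cid,
                      const void* callback_data, void* /*arg*/) {
  const hip_api_data_t* data =
      reinterpret_cast<const hip_api_data_t*>(callback_data);
  /* INIT_hipMemcpyHtoA_CB_ARGS_DATA stored the wrapper's 'ByteCount'
   * argument into args.hipMemcpyHtoA.count; sum it on API entry. */
  if (cid == HIP_API_ID_hipMemcpyHtoA &&
      data->phase == ACTIVITY_API_PHASE_ENTER) {
    g_htoa_bytes += data->args.hipMemcpyHtoA.count;
  }
}
#endif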
// hipMemcpyHtoD[('hipDeviceptr_t', 'dst'), ('void*', 'src'), ('size_t', 'sizeBytes')]
#define INIT_hipMemcpyHtoD_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemcpyHtoD.dst = (hipDeviceptr_t)dstDevice; \
  cb_data.args.hipMemcpyHtoD.src = (void*)srcHost; \
  cb_data.args.hipMemcpyHtoD.sizeBytes = (size_t)ByteCount; \
};
// hipMemcpyHtoDAsync[('hipDeviceptr_t', 'dst'), ('void*', 'src'), ('size_t', 'sizeBytes'), ('hipStream_t', 'stream')]
#define INIT_hipMemcpyHtoDAsync_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemcpyHtoDAsync.dst = (hipDeviceptr_t)dstDevice; \
  cb_data.args.hipMemcpyHtoDAsync.src = (void*)srcHost; \
  cb_data.args.hipMemcpyHtoDAsync.sizeBytes = (size_t)ByteCount; \
  cb_data.args.hipMemcpyHtoDAsync.stream = (hipStream_t)stream; \
};
// hipMemcpyParam2D[('const hip_Memcpy2D*', 'pCopy')]
#define INIT_hipMemcpyParam2D_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemcpyParam2D.pCopy = (const hip_Memcpy2D*)pCopy; \
};
// hipMemcpyParam2DAsync[('const hip_Memcpy2D*', 'pCopy'), ('hipStream_t', 'stream')]
#define INIT_hipMemcpyParam2DAsync_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemcpyParam2DAsync.pCopy = (const hip_Memcpy2D*)pCopy; \
  cb_data.args.hipMemcpyParam2DAsync.stream = (hipStream_t)stream; \
};
// hipMemcpyPeer[('void*', 'dst'), ('int', 'dstDeviceId'), ('const void*', 'src'), ('int', 'srcDeviceId'), ('size_t', 'sizeBytes')]
#define INIT_hipMemcpyPeer_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemcpyPeer.dst = (void*)dst; \
  cb_data.args.hipMemcpyPeer.dstDeviceId = (int)dstDevice; \
  cb_data.args.hipMemcpyPeer.src = (const void*)src; \
  cb_data.args.hipMemcpyPeer.srcDeviceId = (int)srcDevice; \
  cb_data.args.hipMemcpyPeer.sizeBytes = (size_t)sizeBytes; \
};
// hipMemcpyPeerAsync[('void*', 'dst'), ('int', 'dstDeviceId'), ('const void*', 'src'), ('int', 'srcDevice'), ('size_t', 'sizeBytes'), ('hipStream_t', 'stream')]
#define INIT_hipMemcpyPeerAsync_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemcpyPeerAsync.dst = (void*)dst; \
  cb_data.args.hipMemcpyPeerAsync.dstDeviceId = (int)dstDevice; \
  cb_data.args.hipMemcpyPeerAsync.src = (const void*)src; \
  cb_data.args.hipMemcpyPeerAsync.srcDevice = (int)srcDevice; \
  cb_data.args.hipMemcpyPeerAsync.sizeBytes = (size_t)sizeBytes; \
  cb_data.args.hipMemcpyPeerAsync.stream = (hipStream_t)stream; \
};
// hipMemcpyToArray[('hipArray*', 'dst'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('const void*', 'src'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')]
#define INIT_hipMemcpyToArray_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemcpyToArray.dst = (hipArray*)dst; \
  cb_data.args.hipMemcpyToArray.wOffset = (size_t)wOffset; \
  cb_data.args.hipMemcpyToArray.hOffset = (size_t)hOffset; \
  cb_data.args.hipMemcpyToArray.src = (const void*)src; \
  cb_data.args.hipMemcpyToArray.count = (size_t)count; \
  cb_data.args.hipMemcpyToArray.kind = (hipMemcpyKind)kind; \
};
// hipMemcpyToSymbol[('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'sizeBytes'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
#define INIT_hipMemcpyToSymbol_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemcpyToSymbol.symbol = (const void*)symbol; \
  cb_data.args.hipMemcpyToSymbol.src = (const void*)src; \
  cb_data.args.hipMemcpyToSymbol.sizeBytes = (size_t)sizeBytes; \
  cb_data.args.hipMemcpyToSymbol.offset = (size_t)offset; \
  cb_data.args.hipMemcpyToSymbol.kind = (hipMemcpyKind)kind; \
};
// hipMemcpyToSymbolAsync[('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'sizeBytes'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')]
#define INIT_hipMemcpyToSymbolAsync_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemcpyToSymbolAsync.symbol = (const void*)symbol; \
  cb_data.args.hipMemcpyToSymbolAsync.src = (const void*)src; \
  cb_data.args.hipMemcpyToSymbolAsync.sizeBytes = (size_t)sizeBytes; \
  cb_data.args.hipMemcpyToSymbolAsync.offset = (size_t)offset; \
  cb_data.args.hipMemcpyToSymbolAsync.kind = (hipMemcpyKind)kind; \
  cb_data.args.hipMemcpyToSymbolAsync.stream = (hipStream_t)stream; \
};
// hipMemcpyWithStream[('void*', 'dst'), ('const void*', 'src'), ('size_t', 'sizeBytes'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')]
#define INIT_hipMemcpyWithStream_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemcpyWithStream.dst = (void*)dst; \
  cb_data.args.hipMemcpyWithStream.src = (const void*)src; \
  cb_data.args.hipMemcpyWithStream.sizeBytes = (size_t)sizeBytes; \
  cb_data.args.hipMemcpyWithStream.kind = (hipMemcpyKind)kind; \
  cb_data.args.hipMemcpyWithStream.stream = (hipStream_t)stream; \
};
// hipMemset[('void*', 'dst'), ('int', 'value'), ('size_t', 'sizeBytes')]
#define INIT_hipMemset_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemset.dst = (void*)dst; \
  cb_data.args.hipMemset.value = (int)value; \
  cb_data.args.hipMemset.sizeBytes = (size_t)sizeBytes; \
};
// hipMemset2D[('void*', 'dst'), ('size_t', 'pitch'), ('int', 'value'), ('size_t', 'width'), ('size_t', 'height')]
#define INIT_hipMemset2D_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemset2D.dst = (void*)dst; \
  cb_data.args.hipMemset2D.pitch = (size_t)pitch; \
  cb_data.args.hipMemset2D.value = (int)value; \
  cb_data.args.hipMemset2D.width = (size_t)width; \
  cb_data.args.hipMemset2D.height = (size_t)height; \
};
// hipMemset2DAsync[('void*', 'dst'), ('size_t', 'pitch'), ('int', 'value'), ('size_t', 'width'), ('size_t', 'height'), ('hipStream_t', 'stream')]
#define INIT_hipMemset2DAsync_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemset2DAsync.dst = (void*)dst; \
  cb_data.args.hipMemset2DAsync.pitch = (size_t)pitch; \
  cb_data.args.hipMemset2DAsync.value = (int)value; \
  cb_data.args.hipMemset2DAsync.width = (size_t)width; \
  cb_data.args.hipMemset2DAsync.height = (size_t)height; \
  cb_data.args.hipMemset2DAsync.stream = (hipStream_t)stream; \
};
// hipMemset3D[('hipPitchedPtr', 'pitchedDevPtr'), ('int', 'value'), ('hipExtent', 'extent')]
#define INIT_hipMemset3D_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemset3D.pitchedDevPtr = (hipPitchedPtr)pitchedDevPtr; \
  cb_data.args.hipMemset3D.value = (int)value; \
  cb_data.args.hipMemset3D.extent = (hipExtent)extent; \
};
// hipMemset3DAsync[('hipPitchedPtr', 'pitchedDevPtr'), ('int', 'value'), ('hipExtent', 'extent'), ('hipStream_t', 'stream')]
#define INIT_hipMemset3DAsync_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemset3DAsync.pitchedDevPtr = (hipPitchedPtr)pitchedDevPtr; \
  cb_data.args.hipMemset3DAsync.value = (int)value; \
  cb_data.args.hipMemset3DAsync.extent = (hipExtent)extent; \
  cb_data.args.hipMemset3DAsync.stream = (hipStream_t)stream; \
};
// hipMemsetAsync[('void*', 'dst'), ('int', 'value'), ('size_t', 'sizeBytes'), ('hipStream_t', 'stream')]
#define INIT_hipMemsetAsync_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemsetAsync.dst = (void*)dst; \
  cb_data.args.hipMemsetAsync.value = (int)value; \
  cb_data.args.hipMemsetAsync.sizeBytes = (size_t)sizeBytes; \
  cb_data.args.hipMemsetAsync.stream = (hipStream_t)stream; \
};
// hipMemsetD16[('hipDeviceptr_t', 'dest'), ('unsigned short', 'value'), ('size_t', 'count')]
#define INIT_hipMemsetD16_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemsetD16.dest = (hipDeviceptr_t)dst; \
  cb_data.args.hipMemsetD16.value = (unsigned short)value; \
  cb_data.args.hipMemsetD16.count = (size_t)count; \
};
// hipMemsetD16Async[('hipDeviceptr_t', 'dest'), ('unsigned short', 'value'), ('size_t', 'count'), ('hipStream_t', 'stream')]
#define INIT_hipMemsetD16Async_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemsetD16Async.dest = (hipDeviceptr_t)dst; \
  cb_data.args.hipMemsetD16Async.value = (unsigned short)value; \
  cb_data.args.hipMemsetD16Async.count = (size_t)count; \
  cb_data.args.hipMemsetD16Async.stream = (hipStream_t)stream; \
};
// hipMemsetD32[('hipDeviceptr_t', 'dest'), ('int', 'value'), ('size_t', 'count')]
#define INIT_hipMemsetD32_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemsetD32.dest = (hipDeviceptr_t)dst; \
  cb_data.args.hipMemsetD32.value = (int)value; \
  cb_data.args.hipMemsetD32.count = (size_t)count; \
};
// hipMemsetD32Async[('hipDeviceptr_t', 'dst'), ('int', 'value'), ('size_t', 'count'), ('hipStream_t', 'stream')]
#define INIT_hipMemsetD32Async_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemsetD32Async.dst = (hipDeviceptr_t)dst; \
  cb_data.args.hipMemsetD32Async.value = (int)value; \
  cb_data.args.hipMemsetD32Async.count = (size_t)count; \
  cb_data.args.hipMemsetD32Async.stream = (hipStream_t)stream; \
};
// hipMemsetD8[('hipDeviceptr_t', 'dest'), ('unsigned char', 'value'), ('size_t', 'count')]
#define INIT_hipMemsetD8_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemsetD8.dest = (hipDeviceptr_t)dst; \
  cb_data.args.hipMemsetD8.value = (unsigned char)value; \
  cb_data.args.hipMemsetD8.count = (size_t)count; \
};
// hipMemsetD8Async[('hipDeviceptr_t', 'dest'), ('unsigned char', 'value'), ('size_t', 'count'), ('hipStream_t', 'stream')]
#define INIT_hipMemsetD8Async_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMemsetD8Async.dest = (hipDeviceptr_t)dst; \
  cb_data.args.hipMemsetD8Async.value = (unsigned char)value; \
  cb_data.args.hipMemsetD8Async.count = (size_t)count; \
  cb_data.args.hipMemsetD8Async.stream = (hipStream_t)stream; \
};
// hipMipmappedArrayCreate[('hipMipmappedArray_t*', 'pHandle'), ('HIP_ARRAY3D_DESCRIPTOR*', 'pMipmappedArrayDesc'), ('unsigned int', 'numMipmapLevels')]
#define INIT_hipMipmappedArrayCreate_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMipmappedArrayCreate.pHandle = (hipMipmappedArray_t*)mipmapped_array_pptr; \
  cb_data.args.hipMipmappedArrayCreate.pMipmappedArrayDesc = (HIP_ARRAY3D_DESCRIPTOR*)mipmapped_array_desc_ptr; \
  cb_data.args.hipMipmappedArrayCreate.numMipmapLevels = (unsigned int)num_mipmap_levels; \
};
// hipMipmappedArrayDestroy[('hipMipmappedArray_t', 'hMipmappedArray')]
#define INIT_hipMipmappedArrayDestroy_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMipmappedArrayDestroy.hMipmappedArray = (hipMipmappedArray_t)mipmapped_array_ptr; \
};
// hipMipmappedArrayGetLevel[('hipArray_t*', 'pLevelArray'), ('hipMipmappedArray_t', 'hMipMappedArray'), ('unsigned int', 'level')]
#define INIT_hipMipmappedArrayGetLevel_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipMipmappedArrayGetLevel.pLevelArray = (hipArray_t*)level_array_pptr; \
  cb_data.args.hipMipmappedArrayGetLevel.hMipMappedArray = (hipMipmappedArray_t)mipmapped_array_ptr; \
  cb_data.args.hipMipmappedArrayGetLevel.level = (unsigned int)mip_level; \
};
// hipModuleGetFunction[('hipFunction_t*', 'function'), ('hipModule_t', 'module'), ('const char*', 'kname')]
#define INIT_hipModuleGetFunction_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleGetFunction.function = (hipFunction_t*)hfunc; \
  cb_data.args.hipModuleGetFunction.module = (hipModule_t)hmod; \
  cb_data.args.hipModuleGetFunction.kname = (name) ? strdup(name) : NULL; \
};
// hipModuleGetGlobal[('hipDeviceptr_t*', 'dptr'), ('size_t*', 'bytes'), ('hipModule_t', 'hmod'), ('const char*', 'name')]
#define INIT_hipModuleGetGlobal_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleGetGlobal.dptr = (hipDeviceptr_t*)dptr; \
  cb_data.args.hipModuleGetGlobal.bytes = (size_t*)bytes; \
  cb_data.args.hipModuleGetGlobal.hmod = (hipModule_t)hmod; \
  cb_data.args.hipModuleGetGlobal.name = (name) ? strdup(name) : NULL; \
};
// hipModuleGetTexRef[('textureReference**', 'texRef'), ('hipModule_t', 'hmod'), ('const char*', 'name')]
#define INIT_hipModuleGetTexRef_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleGetTexRef.texRef = (textureReference**)texRef; \
  cb_data.args.hipModuleGetTexRef.hmod = (hipModule_t)hmod; \
  cb_data.args.hipModuleGetTexRef.name = (name) ? strdup(name) : NULL; \
};
// hipModuleLaunchCooperativeKernel[('hipFunction_t', 'f'), ('unsigned int', 'gridDimX'), ('unsigned int', 'gridDimY'), ('unsigned int', 'gridDimZ'), ('unsigned int', 'blockDimX'), ('unsigned int', 'blockDimY'), ('unsigned int', 'blockDimZ'), ('unsigned int', 'sharedMemBytes'), ('hipStream_t', 'stream'), ('void**', 'kernelParams')]
#define INIT_hipModuleLaunchCooperativeKernel_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleLaunchCooperativeKernel.f = (hipFunction_t)f; \
  cb_data.args.hipModuleLaunchCooperativeKernel.gridDimX = (unsigned int)gridDimX; \
  cb_data.args.hipModuleLaunchCooperativeKernel.gridDimY = (unsigned int)gridDimY; \
  cb_data.args.hipModuleLaunchCooperativeKernel.gridDimZ = (unsigned int)gridDimZ; \
  cb_data.args.hipModuleLaunchCooperativeKernel.blockDimX = (unsigned int)blockDimX; \
  cb_data.args.hipModuleLaunchCooperativeKernel.blockDimY = (unsigned int)blockDimY; \
  cb_data.args.hipModuleLaunchCooperativeKernel.blockDimZ = (unsigned int)blockDimZ; \
  cb_data.args.hipModuleLaunchCooperativeKernel.sharedMemBytes = (unsigned int)sharedMemBytes; \
  cb_data.args.hipModuleLaunchCooperativeKernel.stream = (hipStream_t)stream; \
  cb_data.args.hipModuleLaunchCooperativeKernel.kernelParams = (void**)kernelParams; \
};
// hipModuleLaunchCooperativeKernelMultiDevice[('hipFunctionLaunchParams*', 'launchParamsList'), ('unsigned int', 'numDevices'), ('unsigned int', 'flags')]
#define INIT_hipModuleLaunchCooperativeKernelMultiDevice_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleLaunchCooperativeKernelMultiDevice.launchParamsList = (hipFunctionLaunchParams*)launchParamsList; \
  cb_data.args.hipModuleLaunchCooperativeKernelMultiDevice.numDevices = (unsigned int)numDevices; \
  cb_data.args.hipModuleLaunchCooperativeKernelMultiDevice.flags = (unsigned int)flags; \
};
// hipModuleLaunchKernel[('hipFunction_t', 'f'), ('unsigned int', 'gridDimX'), ('unsigned int', 'gridDimY'), ('unsigned int', 'gridDimZ'), ('unsigned int', 'blockDimX'), ('unsigned int', 'blockDimY'), ('unsigned int', 'blockDimZ'), ('unsigned int', 'sharedMemBytes'), ('hipStream_t', 'stream'), ('void**', 'kernelParams'), ('void**', 'extra')]
#define INIT_hipModuleLaunchKernel_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleLaunchKernel.f = (hipFunction_t)f; \
  cb_data.args.hipModuleLaunchKernel.gridDimX = (unsigned int)gridDimX; \
  cb_data.args.hipModuleLaunchKernel.gridDimY = (unsigned int)gridDimY; \
  cb_data.args.hipModuleLaunchKernel.gridDimZ = (unsigned int)gridDimZ; \
  cb_data.args.hipModuleLaunchKernel.blockDimX = (unsigned int)blockDimX; \
  cb_data.args.hipModuleLaunchKernel.blockDimY = (unsigned int)blockDimY; \
  cb_data.args.hipModuleLaunchKernel.blockDimZ = (unsigned int)blockDimZ; \
  cb_data.args.hipModuleLaunchKernel.sharedMemBytes = (unsigned int)sharedMemBytes; \
  cb_data.args.hipModuleLaunchKernel.stream = (hipStream_t)hStream; \
  cb_data.args.hipModuleLaunchKernel.kernelParams = (void**)kernelParams; \
  cb_data.args.hipModuleLaunchKernel.extra = (void**)extra; \
};
// hipModuleLoad[('hipModule_t*', 'module'), ('const char*', 'fname')]
#define INIT_hipModuleLoad_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleLoad.module = (hipModule_t*)module; \
  cb_data.args.hipModuleLoad.fname = (fname) ? strdup(fname) : NULL; \
};
// hipModuleLoadData[('hipModule_t*', 'module'), ('const void*', 'image')]
#define INIT_hipModuleLoadData_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleLoadData.module = (hipModule_t*)module; \
  cb_data.args.hipModuleLoadData.image = (const void*)image; \
};
// hipModuleLoadDataEx[('hipModule_t*', 'module'), ('const void*', 'image'), ('unsigned int', 'numOptions'), ('hipJitOption*', 'options'), ('void**', 'optionsValues')]
#define INIT_hipModuleLoadDataEx_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleLoadDataEx.module = (hipModule_t*)module; \
  cb_data.args.hipModuleLoadDataEx.image = (const void*)image; \
  cb_data.args.hipModuleLoadDataEx.numOptions = (unsigned int)numOptions; \
  cb_data.args.hipModuleLoadDataEx.options = (hipJitOption*)options; \
  cb_data.args.hipModuleLoadDataEx.optionsValues = (void**)optionsValues; \
};
// hipModuleOccupancyMaxActiveBlocksPerMultiprocessor[('int*', 'numBlocks'), ('hipFunction_t', 'f'), ('int', 'blockSize'), ('size_t', 'dynSharedMemPerBlk')]
#define INIT_hipModuleOccupancyMaxActiveBlocksPerMultiprocessor_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks = (int*)numBlocks; \
  cb_data.args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.f = (hipFunction_t)f; \
  cb_data.args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.blockSize = (int)blockSize; \
  cb_data.args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.dynSharedMemPerBlk = (size_t)dynSharedMemPerBlk; \
};
// hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags[('int*', 'numBlocks'), ('hipFunction_t', 'f'), ('int', 'blockSize'), ('size_t', 'dynSharedMemPerBlk'), ('unsigned int', 'flags')]
#define INIT_hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks = (int*)numBlocks; \
  cb_data.args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.f = (hipFunction_t)f; \
  cb_data.args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.blockSize = (int)blockSize; \
  cb_data.args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.dynSharedMemPerBlk = (size_t)dynSharedMemPerBlk; \
  cb_data.args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.flags = (unsigned int)flags; \
};
// hipModuleOccupancyMaxPotentialBlockSize[('int*', 'gridSize'), ('int*', 'blockSize'), ('hipFunction_t', 'f'), ('size_t', 'dynSharedMemPerBlk'), ('int', 'blockSizeLimit')]
#define INIT_hipModuleOccupancyMaxPotentialBlockSize_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleOccupancyMaxPotentialBlockSize.gridSize = (int*)gridSize; \
  cb_data.args.hipModuleOccupancyMaxPotentialBlockSize.blockSize = (int*)blockSize; \
  cb_data.args.hipModuleOccupancyMaxPotentialBlockSize.f = (hipFunction_t)f; \
  cb_data.args.hipModuleOccupancyMaxPotentialBlockSize.dynSharedMemPerBlk = (size_t)dynSharedMemPerBlk; \
  cb_data.args.hipModuleOccupancyMaxPotentialBlockSize.blockSizeLimit = (int)blockSizeLimit; \
};
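// NOTE (explanatory comment, not emitted by the generator): string parameters such
// as kname/name/fname in the module macros above are strdup'd rather than captured
// as raw pointers, so the snapshot in cb_data stays valid even if the caller's
// buffer is freed before the profiler consumes the record; whoever consumes the
// hip_api_data_t record is presumably expected to release the duplicated string.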
// hipModuleOccupancyMaxPotentialBlockSizeWithFlags[('int*', 'gridSize'), ('int*', 'blockSize'), ('hipFunction_t', 'f'), ('size_t', 'dynSharedMemPerBlk'), ('int', 'blockSizeLimit'), ('unsigned int', 'flags')]
#define INIT_hipModuleOccupancyMaxPotentialBlockSizeWithFlags_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.gridSize = (int*)gridSize; \
  cb_data.args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.blockSize = (int*)blockSize; \
  cb_data.args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.f = (hipFunction_t)f; \
  cb_data.args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.dynSharedMemPerBlk = (size_t)dynSharedMemPerBlk; \
  cb_data.args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.blockSizeLimit = (int)blockSizeLimit; \
  cb_data.args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.flags = (unsigned int)flags; \
};
// hipModuleUnload[('hipModule_t', 'module')]
#define INIT_hipModuleUnload_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipModuleUnload.module = (hipModule_t)hmod; \
};
// hipOccupancyMaxActiveBlocksPerMultiprocessor[('int*', 'numBlocks'), ('const void*', 'f'), ('int', 'blockSize'), ('size_t', 'dynamicSMemSize')]
#define INIT_hipOccupancyMaxActiveBlocksPerMultiprocessor_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks = (int*)numBlocks; \
  cb_data.args.hipOccupancyMaxActiveBlocksPerMultiprocessor.f = (const void*)f; \
  cb_data.args.hipOccupancyMaxActiveBlocksPerMultiprocessor.blockSize = (int)blockSize; \
  cb_data.args.hipOccupancyMaxActiveBlocksPerMultiprocessor.dynamicSMemSize = (size_t)dynamicSMemSize; \
};
// hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags[('int*', 'numBlocks'), ('const void*', 'f'), ('int', 'blockSize'), ('size_t', 'dynamicSMemSize'), ('unsigned int', 'flags')]
#define INIT_hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks = (int*)numBlocks; \
  cb_data.args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.f = (const void*)f; \
  cb_data.args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.blockSize = (int)blockSize; \
  cb_data.args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.dynamicSMemSize = (size_t)dynamicSMemSize; \
  cb_data.args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.flags = (unsigned int)flags; \
};
// hipOccupancyMaxPotentialBlockSize[('int*', 'gridSize'), ('int*', 'blockSize'), ('const void*', 'f'), ('size_t', 'dynSharedMemPerBlk'), ('int', 'blockSizeLimit')]
#define INIT_hipOccupancyMaxPotentialBlockSize_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipOccupancyMaxPotentialBlockSize.gridSize = (int*)gridSize; \
  cb_data.args.hipOccupancyMaxPotentialBlockSize.blockSize = (int*)blockSize; \
  cb_data.args.hipOccupancyMaxPotentialBlockSize.f = (const void*)f; \
  cb_data.args.hipOccupancyMaxPotentialBlockSize.dynSharedMemPerBlk = (size_t)dynSharedMemPerBlk; \
  cb_data.args.hipOccupancyMaxPotentialBlockSize.blockSizeLimit = (int)blockSizeLimit; \
};
// hipPeekAtLastError[]
#define INIT_hipPeekAtLastError_CB_ARGS_DATA(cb_data) { \
};
// hipPointerGetAttribute[('void*', 'data'), ('hipPointer_attribute', 'attribute'), ('hipDeviceptr_t', 'ptr')]
#define INIT_hipPointerGetAttribute_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipPointerGetAttribute.data = (void*)data; \
  cb_data.args.hipPointerGetAttribute.attribute = (hipPointer_attribute)attribute; \
  cb_data.args.hipPointerGetAttribute.ptr = (hipDeviceptr_t)ptr; \
};
// hipPointerGetAttributes[('hipPointerAttribute_t*', 'attributes'), ('const void*', 'ptr')]
#define INIT_hipPointerGetAttributes_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipPointerGetAttributes.attributes = (hipPointerAttribute_t*)attributes; \
  cb_data.args.hipPointerGetAttributes.ptr = (const void*)ptr; \
};
// hipPointerSetAttribute[('const void*', 'value'), ('hipPointer_attribute', 'attribute'), ('hipDeviceptr_t', 'ptr')]
#define INIT_hipPointerSetAttribute_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipPointerSetAttribute.value = (const void*)value; \
  cb_data.args.hipPointerSetAttribute.attribute = (hipPointer_attribute)attribute; \
  cb_data.args.hipPointerSetAttribute.ptr = (hipDeviceptr_t)ptr; \
};
// hipProfilerStart[]
#define INIT_hipProfilerStart_CB_ARGS_DATA(cb_data) { \
};
// hipProfilerStop[]
#define INIT_hipProfilerStop_CB_ARGS_DATA(cb_data) { \
};
// hipRuntimeGetVersion[('int*', 'runtimeVersion')]
#define INIT_hipRuntimeGetVersion_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipRuntimeGetVersion.runtimeVersion = (int*)runtimeVersion; \
};
// hipSetDevice[('int', 'deviceId')]
#define INIT_hipSetDevice_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipSetDevice.deviceId = (int)device; \
};
// hipSetDeviceFlags[('unsigned int', 'flags')]
#define INIT_hipSetDeviceFlags_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipSetDeviceFlags.flags = (unsigned int)flags; \
};
// hipSetupArgument[('const void*', 'arg'), ('size_t', 'size'), ('size_t', 'offset')]
#define INIT_hipSetupArgument_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipSetupArgument.arg = (const void*)arg; \
  cb_data.args.hipSetupArgument.size = (size_t)size; \
  cb_data.args.hipSetupArgument.offset = (size_t)offset; \
};
// hipSignalExternalSemaphoresAsync[('const hipExternalSemaphore_t*', 'extSemArray'), ('const hipExternalSemaphoreSignalParams*', 'paramsArray'), ('unsigned int', 'numExtSems'), ('hipStream_t', 'stream')]
#define INIT_hipSignalExternalSemaphoresAsync_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipSignalExternalSemaphoresAsync.extSemArray = (const hipExternalSemaphore_t*)extSemArray; \
  cb_data.args.hipSignalExternalSemaphoresAsync.paramsArray = (const hipExternalSemaphoreSignalParams*)paramsArray; \
  cb_data.args.hipSignalExternalSemaphoresAsync.numExtSems = (unsigned int)numExtSems; \
  cb_data.args.hipSignalExternalSemaphoresAsync.stream = (hipStream_t)stream; \
};
// hipStreamAddCallback[('hipStream_t', 'stream'), ('hipStreamCallback_t', 'callback'), ('void*', 'userData'), ('unsigned int', 'flags')]
#define INIT_hipStreamAddCallback_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamAddCallback.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamAddCallback.callback = (hipStreamCallback_t)callback; \
  cb_data.args.hipStreamAddCallback.userData = (void*)userData; \
  cb_data.args.hipStreamAddCallback.flags = (unsigned int)flags; \
};
// hipStreamAttachMemAsync[('hipStream_t', 'stream'), ('void*', 'dev_ptr'), ('size_t', 'length'), ('unsigned int', 'flags')]
#define INIT_hipStreamAttachMemAsync_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamAttachMemAsync.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamAttachMemAsync.dev_ptr = (void*)dev_ptr; \
  cb_data.args.hipStreamAttachMemAsync.length = (size_t)length; \
  cb_data.args.hipStreamAttachMemAsync.flags = (unsigned int)flags; \
};
// hipStreamBeginCapture[('hipStream_t', 'stream'), ('hipStreamCaptureMode', 'mode')]
#define INIT_hipStreamBeginCapture_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamBeginCapture.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamBeginCapture.mode = (hipStreamCaptureMode)mode; \
};
// hipStreamCreate[('hipStream_t*', 'stream')]
#define INIT_hipStreamCreate_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamCreate.stream = (hipStream_t*)stream; \
};
// hipStreamCreateWithFlags[('hipStream_t*', 'stream'), ('unsigned int', 'flags')]
#define INIT_hipStreamCreateWithFlags_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamCreateWithFlags.stream = (hipStream_t*)stream; \
  cb_data.args.hipStreamCreateWithFlags.flags = (unsigned int)flags; \
};
// hipStreamCreateWithPriority[('hipStream_t*', 'stream'), ('unsigned int', 'flags'), ('int', 'priority')]
#define INIT_hipStreamCreateWithPriority_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamCreateWithPriority.stream = (hipStream_t*)stream; \
  cb_data.args.hipStreamCreateWithPriority.flags = (unsigned int)flags; \
  cb_data.args.hipStreamCreateWithPriority.priority = (int)priority; \
};
// hipStreamDestroy[('hipStream_t', 'stream')]
#define INIT_hipStreamDestroy_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamDestroy.stream = (hipStream_t)stream; \
};
// hipStreamEndCapture[('hipStream_t', 'stream'), ('hipGraph_t*', 'pGraph')]
#define INIT_hipStreamEndCapture_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamEndCapture.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamEndCapture.pGraph = (hipGraph_t*)pGraph; \
};
// hipStreamGetCaptureInfo[('hipStream_t', 'stream'), ('hipStreamCaptureStatus*', 'pCaptureStatus'), ('unsigned long long*', 'pId')]
#define INIT_hipStreamGetCaptureInfo_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamGetCaptureInfo.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamGetCaptureInfo.pCaptureStatus = (hipStreamCaptureStatus*)pCaptureStatus; \
  cb_data.args.hipStreamGetCaptureInfo.pId = (unsigned long long*)pId; \
};
// hipStreamGetCaptureInfo_v2[('hipStream_t', 'stream'), ('hipStreamCaptureStatus*', 'captureStatus_out'), ('unsigned long long*', 'id_out'), ('hipGraph_t*', 'graph_out'), ('const hipGraphNode_t**', 'dependencies_out'), ('size_t*', 'numDependencies_out')]
#define INIT_hipStreamGetCaptureInfo_v2_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamGetCaptureInfo_v2.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamGetCaptureInfo_v2.captureStatus_out = (hipStreamCaptureStatus*)captureStatus_out; \
  cb_data.args.hipStreamGetCaptureInfo_v2.id_out = (unsigned long long*)id_out; \
  cb_data.args.hipStreamGetCaptureInfo_v2.graph_out = (hipGraph_t*)graph_out; \
  cb_data.args.hipStreamGetCaptureInfo_v2.dependencies_out = (const hipGraphNode_t**)dependencies_out; \
  cb_data.args.hipStreamGetCaptureInfo_v2.numDependencies_out = (size_t*)numDependencies_out; \
};
// hipStreamGetDevice[('hipStream_t', 'stream'), ('hipDevice_t*', 'device')]
#define INIT_hipStreamGetDevice_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamGetDevice.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamGetDevice.device = (hipDevice_t*)device; \
};
// hipStreamGetFlags[('hipStream_t', 'stream'), ('unsigned int*', 'flags')]
#define INIT_hipStreamGetFlags_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamGetFlags.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamGetFlags.flags = (unsigned int*)flags; \
};
// hipStreamGetPriority[('hipStream_t', 'stream'), ('int*', 'priority')]
#define INIT_hipStreamGetPriority_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamGetPriority.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamGetPriority.priority = (int*)priority; \
};
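// NOTE (explanatory comment, not emitted by the generator): pointer-typed arguments
// such as pGraph, pCaptureStatus, or priority above are captured here as raw
// pointers only; their pointees are snapshotted later, in hipApiArgsInit() below,
// into the corresponding *__val shadow fields once the API has filled in its outputs.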
// hipStreamIsCapturing[('hipStream_t', 'stream'), ('hipStreamCaptureStatus*', 'pCaptureStatus')]
#define INIT_hipStreamIsCapturing_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamIsCapturing.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamIsCapturing.pCaptureStatus = (hipStreamCaptureStatus*)pCaptureStatus; \
};
// hipStreamQuery[('hipStream_t', 'stream')]
#define INIT_hipStreamQuery_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamQuery.stream = (hipStream_t)stream; \
};
// hipStreamSynchronize[('hipStream_t', 'stream')]
#define INIT_hipStreamSynchronize_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamSynchronize.stream = (hipStream_t)stream; \
};
// hipStreamUpdateCaptureDependencies[('hipStream_t', 'stream'), ('hipGraphNode_t*', 'dependencies'), ('size_t', 'numDependencies'), ('unsigned int', 'flags')]
#define INIT_hipStreamUpdateCaptureDependencies_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamUpdateCaptureDependencies.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamUpdateCaptureDependencies.dependencies = (hipGraphNode_t*)dependencies; \
  cb_data.args.hipStreamUpdateCaptureDependencies.numDependencies = (size_t)numDependencies; \
  cb_data.args.hipStreamUpdateCaptureDependencies.flags = (unsigned int)flags; \
};
// hipStreamWaitEvent[('hipStream_t', 'stream'), ('hipEvent_t', 'event'), ('unsigned int', 'flags')]
#define INIT_hipStreamWaitEvent_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamWaitEvent.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamWaitEvent.event = (hipEvent_t)event; \
  cb_data.args.hipStreamWaitEvent.flags = (unsigned int)flags; \
};
// hipStreamWaitValue32[('hipStream_t', 'stream'), ('void*', 'ptr'), ('unsigned int', 'value'), ('unsigned int', 'flags'), ('unsigned int', 'mask')]
#define INIT_hipStreamWaitValue32_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamWaitValue32.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamWaitValue32.ptr = (void*)ptr; \
  cb_data.args.hipStreamWaitValue32.value = (unsigned int)value; \
  cb_data.args.hipStreamWaitValue32.flags = (unsigned int)flags; \
  cb_data.args.hipStreamWaitValue32.mask = (unsigned int)mask; \
};
// hipStreamWaitValue64[('hipStream_t', 'stream'), ('void*', 'ptr'), ('uint64_t', 'value'), ('unsigned int', 'flags'), ('uint64_t', 'mask')]
#define INIT_hipStreamWaitValue64_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamWaitValue64.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamWaitValue64.ptr = (void*)ptr; \
  cb_data.args.hipStreamWaitValue64.value = (uint64_t)value; \
  cb_data.args.hipStreamWaitValue64.flags = (unsigned int)flags; \
  cb_data.args.hipStreamWaitValue64.mask = (uint64_t)mask; \
};
// hipStreamWriteValue32[('hipStream_t', 'stream'), ('void*', 'ptr'), ('unsigned int', 'value'), ('unsigned int', 'flags')]
#define INIT_hipStreamWriteValue32_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamWriteValue32.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamWriteValue32.ptr = (void*)ptr; \
  cb_data.args.hipStreamWriteValue32.value = (unsigned int)value; \
  cb_data.args.hipStreamWriteValue32.flags = (unsigned int)flags; \
};
// hipStreamWriteValue64[('hipStream_t', 'stream'), ('void*', 'ptr'), ('uint64_t', 'value'), ('unsigned int', 'flags')]
#define INIT_hipStreamWriteValue64_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipStreamWriteValue64.stream = (hipStream_t)stream; \
  cb_data.args.hipStreamWriteValue64.ptr = (void*)ptr; \
  cb_data.args.hipStreamWriteValue64.value = (uint64_t)value; \
  cb_data.args.hipStreamWriteValue64.flags = (unsigned int)flags; \
};
// hipTexRefGetAddress[('hipDeviceptr_t*', 'dev_ptr'), ('const textureReference*', 'texRef')]
#define INIT_hipTexRefGetAddress_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefGetAddress.dev_ptr = (hipDeviceptr_t*)dptr; \
  cb_data.args.hipTexRefGetAddress.texRef = (const textureReference*)texRef; \
};
// hipTexRefGetFlags[('unsigned int*', 'pFlags'), ('const textureReference*', 'texRef')]
#define INIT_hipTexRefGetFlags_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefGetFlags.pFlags = (unsigned int*)pFlags; \
  cb_data.args.hipTexRefGetFlags.texRef = (const textureReference*)texRef; \
};
// hipTexRefGetFormat[('hipArray_Format*', 'pFormat'), ('int*', 'pNumChannels'), ('const textureReference*', 'texRef')]
#define INIT_hipTexRefGetFormat_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefGetFormat.pFormat = (hipArray_Format*)pFormat; \
  cb_data.args.hipTexRefGetFormat.pNumChannels = (int*)pNumChannels; \
  cb_data.args.hipTexRefGetFormat.texRef = (const textureReference*)texRef; \
};
// hipTexRefGetMaxAnisotropy[('int*', 'pmaxAnsio'), ('const textureReference*', 'texRef')]
#define INIT_hipTexRefGetMaxAnisotropy_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefGetMaxAnisotropy.pmaxAnsio = (int*)pmaxAnsio; \
  cb_data.args.hipTexRefGetMaxAnisotropy.texRef = (const textureReference*)texRef; \
};
// hipTexRefGetMipMappedArray[('hipMipmappedArray_t*', 'pArray'), ('const textureReference*', 'texRef')]
#define INIT_hipTexRefGetMipMappedArray_CB_ARGS_DATA(cb_data) { \
};
// hipTexRefGetMipmapLevelBias[('float*', 'pbias'), ('const textureReference*', 'texRef')]
#define INIT_hipTexRefGetMipmapLevelBias_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefGetMipmapLevelBias.pbias = (float*)pbias; \
  cb_data.args.hipTexRefGetMipmapLevelBias.texRef = (const textureReference*)texRef; \
};
// hipTexRefGetMipmapLevelClamp[('float*', 'pminMipmapLevelClamp'), ('float*', 'pmaxMipmapLevelClamp'), ('const textureReference*', 'texRef')]
#define INIT_hipTexRefGetMipmapLevelClamp_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefGetMipmapLevelClamp.pminMipmapLevelClamp = (float*)pminMipmapLevelClamp; \
  cb_data.args.hipTexRefGetMipmapLevelClamp.pmaxMipmapLevelClamp = (float*)pmaxMipmapLevelClamp; \
  cb_data.args.hipTexRefGetMipmapLevelClamp.texRef = (const textureReference*)texRef; \
};
// hipTexRefSetAddress[('size_t*', 'ByteOffset'), ('textureReference*', 'texRef'), ('hipDeviceptr_t', 'dptr'), ('size_t', 'bytes')]
#define INIT_hipTexRefSetAddress_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefSetAddress.ByteOffset = (size_t*)ByteOffset; \
  cb_data.args.hipTexRefSetAddress.texRef = (textureReference*)texRef; \
  cb_data.args.hipTexRefSetAddress.dptr = (hipDeviceptr_t)dptr; \
  cb_data.args.hipTexRefSetAddress.bytes = (size_t)bytes; \
};
// hipTexRefSetAddress2D[('textureReference*', 'texRef'), ('const HIP_ARRAY_DESCRIPTOR*', 'desc'), ('hipDeviceptr_t', 'dptr'), ('size_t', 'Pitch')]
#define INIT_hipTexRefSetAddress2D_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefSetAddress2D.texRef = (textureReference*)texRef; \
  cb_data.args.hipTexRefSetAddress2D.desc = (const HIP_ARRAY_DESCRIPTOR*)desc; \
  cb_data.args.hipTexRefSetAddress2D.dptr = (hipDeviceptr_t)dptr; \
  cb_data.args.hipTexRefSetAddress2D.Pitch = (size_t)Pitch; \
};
// hipTexRefSetArray[('textureReference*', 'tex'), ('hipArray_const_t', 'array'), ('unsigned int', 'flags')]
#define INIT_hipTexRefSetArray_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefSetArray.tex = (textureReference*)texRef; \
  cb_data.args.hipTexRefSetArray.array = (hipArray_const_t)array; \
  cb_data.args.hipTexRefSetArray.flags = (unsigned int)flags; \
};
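// NOTE (explanatory comment, not emitted by the generator): unlike its neighbours,
// INIT_hipTexRefGetMipMappedArray_CB_ARGS_DATA above has an empty body, so this
// API's arguments are not captured into cb_data; callbacks for it receive the
// record without any argument snapshot.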
// hipTexRefSetBorderColor[('textureReference*', 'texRef'), ('float*', 'pBorderColor')]
#define INIT_hipTexRefSetBorderColor_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefSetBorderColor.texRef = (textureReference*)texRef; \
  cb_data.args.hipTexRefSetBorderColor.pBorderColor = (float*)pBorderColor; \
};
// hipTexRefSetFlags[('textureReference*', 'texRef'), ('unsigned int', 'Flags')]
#define INIT_hipTexRefSetFlags_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefSetFlags.texRef = (textureReference*)texRef; \
  cb_data.args.hipTexRefSetFlags.Flags = (unsigned int)Flags; \
};
// hipTexRefSetFormat[('textureReference*', 'texRef'), ('hipArray_Format', 'fmt'), ('int', 'NumPackedComponents')]
#define INIT_hipTexRefSetFormat_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefSetFormat.texRef = (textureReference*)texRef; \
  cb_data.args.hipTexRefSetFormat.fmt = (hipArray_Format)fmt; \
  cb_data.args.hipTexRefSetFormat.NumPackedComponents = (int)NumPackedComponents; \
};
// hipTexRefSetMaxAnisotropy[('textureReference*', 'texRef'), ('unsigned int', 'maxAniso')]
#define INIT_hipTexRefSetMaxAnisotropy_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefSetMaxAnisotropy.texRef = (textureReference*)texRef; \
  cb_data.args.hipTexRefSetMaxAnisotropy.maxAniso = (unsigned int)maxAniso; \
};
// hipTexRefSetMipmapLevelBias[('textureReference*', 'texRef'), ('float', 'bias')]
#define INIT_hipTexRefSetMipmapLevelBias_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefSetMipmapLevelBias.texRef = (textureReference*)texRef; \
  cb_data.args.hipTexRefSetMipmapLevelBias.bias = (float)bias; \
};
// hipTexRefSetMipmapLevelClamp[('textureReference*', 'texRef'), ('float', 'minMipMapLevelClamp'), ('float', 'maxMipMapLevelClamp')]
#define INIT_hipTexRefSetMipmapLevelClamp_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefSetMipmapLevelClamp.texRef = (textureReference*)texRef; \
  cb_data.args.hipTexRefSetMipmapLevelClamp.minMipMapLevelClamp = (float)minMipMapLevelClamp; \
  cb_data.args.hipTexRefSetMipmapLevelClamp.maxMipMapLevelClamp = (float)maxMipMapLevelClamp; \
};
// hipTexRefSetMipmappedArray[('textureReference*', 'texRef'), ('hipMipmappedArray*', 'mipmappedArray'), ('unsigned int', 'Flags')]
#define INIT_hipTexRefSetMipmappedArray_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipTexRefSetMipmappedArray.texRef = (textureReference*)texRef; \
  cb_data.args.hipTexRefSetMipmappedArray.mipmappedArray = (hipMipmappedArray*)mipmappedArray; \
  cb_data.args.hipTexRefSetMipmappedArray.Flags = (unsigned int)Flags; \
};
// hipThreadExchangeStreamCaptureMode[('hipStreamCaptureMode*', 'mode')]
#define INIT_hipThreadExchangeStreamCaptureMode_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipThreadExchangeStreamCaptureMode.mode = (hipStreamCaptureMode*)mode; \
};
// hipUserObjectCreate[('hipUserObject_t*', 'object_out'), ('void*', 'ptr'), ('hipHostFn_t', 'destroy'), ('unsigned int', 'initialRefcount'), ('unsigned int', 'flags')]
#define INIT_hipUserObjectCreate_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipUserObjectCreate.object_out = (hipUserObject_t*)object_out; \
  cb_data.args.hipUserObjectCreate.ptr = (void*)ptr; \
  cb_data.args.hipUserObjectCreate.destroy = (hipHostFn_t)destroy; \
  cb_data.args.hipUserObjectCreate.initialRefcount = (unsigned int)initialRefcount; \
  cb_data.args.hipUserObjectCreate.flags = (unsigned int)flags; \
};
// hipUserObjectRelease[('hipUserObject_t', 'object'), ('unsigned int', 'count')]
#define INIT_hipUserObjectRelease_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipUserObjectRelease.object = (hipUserObject_t)object; \
  cb_data.args.hipUserObjectRelease.count = (unsigned int)count; \
};
// hipUserObjectRetain[('hipUserObject_t', 'object'), ('unsigned int', 'count')]
#define INIT_hipUserObjectRetain_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipUserObjectRetain.object = (hipUserObject_t)object; \
  cb_data.args.hipUserObjectRetain.count = (unsigned int)count; \
};
// hipWaitExternalSemaphoresAsync[('const hipExternalSemaphore_t*', 'extSemArray'), ('const hipExternalSemaphoreWaitParams*', 'paramsArray'), ('unsigned int', 'numExtSems'), ('hipStream_t', 'stream')]
#define INIT_hipWaitExternalSemaphoresAsync_CB_ARGS_DATA(cb_data) { \
  cb_data.args.hipWaitExternalSemaphoresAsync.extSemArray = (const hipExternalSemaphore_t*)extSemArray; \
  cb_data.args.hipWaitExternalSemaphoresAsync.paramsArray = (const hipExternalSemaphoreWaitParams*)paramsArray; \
  cb_data.args.hipWaitExternalSemaphoresAsync.numExtSems = (unsigned int)numExtSems; \
  cb_data.args.hipWaitExternalSemaphoresAsync.stream = (hipStream_t)stream; \
};

#define INIT_CB_ARGS_DATA(cb_id, cb_data) INIT_##cb_id##_CB_ARGS_DATA(cb_data)

// Macros for non-public API primitives
// hipBindTexture()
#define INIT_hipBindTexture_CB_ARGS_DATA(cb_data) {};
// hipBindTexture2D()
#define INIT_hipBindTexture2D_CB_ARGS_DATA(cb_data) {};
// hipBindTextureToArray()
#define INIT_hipBindTextureToArray_CB_ARGS_DATA(cb_data) {};
// hipBindTextureToMipmappedArray()
#define INIT_hipBindTextureToMipmappedArray_CB_ARGS_DATA(cb_data) {};
// hipCreateTextureObject()
#define INIT_hipCreateTextureObject_CB_ARGS_DATA(cb_data) {};
// hipDestroyTextureObject()
#define INIT_hipDestroyTextureObject_CB_ARGS_DATA(cb_data) {};
// hipDeviceGetCount()
#define INIT_hipDeviceGetCount_CB_ARGS_DATA(cb_data) {};
// hipGetTextureAlignmentOffset()
#define INIT_hipGetTextureAlignmentOffset_CB_ARGS_DATA(cb_data) {};
// hipGetTextureObjectResourceDesc()
#define INIT_hipGetTextureObjectResourceDesc_CB_ARGS_DATA(cb_data) {};
// hipGetTextureObjectResourceViewDesc()
#define INIT_hipGetTextureObjectResourceViewDesc_CB_ARGS_DATA(cb_data) {};
// hipGetTextureObjectTextureDesc()
#define INIT_hipGetTextureObjectTextureDesc_CB_ARGS_DATA(cb_data) {};
// hipGetTextureReference()
#define INIT_hipGetTextureReference_CB_ARGS_DATA(cb_data) {};
// hipMemcpy2DArrayToArray()
#define INIT_hipMemcpy2DArrayToArray_CB_ARGS_DATA(cb_data) {};
// hipMemcpyArrayToArray()
#define INIT_hipMemcpyArrayToArray_CB_ARGS_DATA(cb_data) {};
// hipMemcpyAtoA()
#define INIT_hipMemcpyAtoA_CB_ARGS_DATA(cb_data) {};
// hipMemcpyAtoD()
#define INIT_hipMemcpyAtoD_CB_ARGS_DATA(cb_data) {};
// hipMemcpyAtoHAsync()
#define INIT_hipMemcpyAtoHAsync_CB_ARGS_DATA(cb_data) {};
// hipMemcpyDtoA()
#define INIT_hipMemcpyDtoA_CB_ARGS_DATA(cb_data) {};
// hipMemcpyFromArrayAsync()
#define INIT_hipMemcpyFromArrayAsync_CB_ARGS_DATA(cb_data) {};
// hipMemcpyHtoAAsync()
#define INIT_hipMemcpyHtoAAsync_CB_ARGS_DATA(cb_data) {};
// hipMemcpyToArrayAsync()
#define INIT_hipMemcpyToArrayAsync_CB_ARGS_DATA(cb_data) {};
// hipModuleLaunchKernelExt()
#define INIT_hipModuleLaunchKernelExt_CB_ARGS_DATA(cb_data) {};
// hipSetValidDevices()
#define INIT_hipSetValidDevices_CB_ARGS_DATA(cb_data) {};
// hipTexObjectCreate()
#define INIT_hipTexObjectCreate_CB_ARGS_DATA(cb_data) {};
// hipTexObjectDestroy()
#define INIT_hipTexObjectDestroy_CB_ARGS_DATA(cb_data) {};
// hipTexObjectGetResourceDesc()
#define INIT_hipTexObjectGetResourceDesc_CB_ARGS_DATA(cb_data) {};
// hipTexObjectGetResourceViewDesc()
#define INIT_hipTexObjectGetResourceViewDesc_CB_ARGS_DATA(cb_data) {};
// hipTexObjectGetTextureDesc()
#define INIT_hipTexObjectGetTextureDesc_CB_ARGS_DATA(cb_data) {};
// hipTexRefGetAddressMode()
#define INIT_hipTexRefGetAddressMode_CB_ARGS_DATA(cb_data) {};
// hipTexRefGetArray()
#define INIT_hipTexRefGetArray_CB_ARGS_DATA(cb_data) {};
// hipTexRefGetBorderColor()
#define INIT_hipTexRefGetBorderColor_CB_ARGS_DATA(cb_data) {};
// hipTexRefGetFilterMode()
#define INIT_hipTexRefGetFilterMode_CB_ARGS_DATA(cb_data) {};
// hipTexRefGetMipmapFilterMode()
#define INIT_hipTexRefGetMipmapFilterMode_CB_ARGS_DATA(cb_data) {};
// hipTexRefGetMipmappedArray()
#define INIT_hipTexRefGetMipmappedArray_CB_ARGS_DATA(cb_data) {};
// hipTexRefSetAddressMode()
#define INIT_hipTexRefSetAddressMode_CB_ARGS_DATA(cb_data) {};
// hipTexRefSetFilterMode()
#define INIT_hipTexRefSetFilterMode_CB_ARGS_DATA(cb_data) {};
// hipTexRefSetMipmapFilterMode()
#define INIT_hipTexRefSetMipmapFilterMode_CB_ARGS_DATA(cb_data) {};
// hipUnbindTexture()
#define INIT_hipUnbindTexture_CB_ARGS_DATA(cb_data) {};

#define INIT_NONE_CB_ARGS_DATA(cb_data) {};

#if HIP_PROF_HIP_API_STRING
// HIP API args filling helper
static inline void hipApiArgsInit(hip_api_id_t id, hip_api_data_t* data) {
  switch (id) {
    // __hipPopCallConfiguration[('dim3*', 'gridDim'), ('dim3*', 'blockDim'), ('size_t*', 'sharedMem'), ('hipStream_t*', 'stream')]
    case HIP_API_ID___hipPopCallConfiguration:
      if (data->args.__hipPopCallConfiguration.gridDim) data->args.__hipPopCallConfiguration.gridDim__val = *(data->args.__hipPopCallConfiguration.gridDim);
      if (data->args.__hipPopCallConfiguration.blockDim) data->args.__hipPopCallConfiguration.blockDim__val = *(data->args.__hipPopCallConfiguration.blockDim);
      if (data->args.__hipPopCallConfiguration.sharedMem) data->args.__hipPopCallConfiguration.sharedMem__val = *(data->args.__hipPopCallConfiguration.sharedMem);
      if (data->args.__hipPopCallConfiguration.stream) data->args.__hipPopCallConfiguration.stream__val = *(data->args.__hipPopCallConfiguration.stream);
      break;
    // __hipPushCallConfiguration[('dim3', 'gridDim'), ('dim3', 'blockDim'), ('size_t', 'sharedMem'), ('hipStream_t', 'stream')]
    case HIP_API_ID___hipPushCallConfiguration:
      break;
    // hipArray3DCreate[('hipArray**', 'array'), ('const HIP_ARRAY3D_DESCRIPTOR*', 'pAllocateArray')]
    case HIP_API_ID_hipArray3DCreate:
      if (data->args.hipArray3DCreate.array) data->args.hipArray3DCreate.array__val = *(data->args.hipArray3DCreate.array);
      if (data->args.hipArray3DCreate.pAllocateArray) data->args.hipArray3DCreate.pAllocateArray__val = *(data->args.hipArray3DCreate.pAllocateArray);
      break;
    // hipArray3DGetDescriptor[('HIP_ARRAY3D_DESCRIPTOR*', 'pArrayDescriptor'), ('hipArray*', 'array')]
    case HIP_API_ID_hipArray3DGetDescriptor:
      if (data->args.hipArray3DGetDescriptor.pArrayDescriptor) data->args.hipArray3DGetDescriptor.pArrayDescriptor__val = *(data->args.hipArray3DGetDescriptor.pArrayDescriptor);
      if (data->args.hipArray3DGetDescriptor.array) data->args.hipArray3DGetDescriptor.array__val = *(data->args.hipArray3DGetDescriptor.array);
      break;
    // hipArrayCreate[('hipArray**', 'pHandle'), ('const HIP_ARRAY_DESCRIPTOR*', 'pAllocateArray')]
    case HIP_API_ID_hipArrayCreate:
      if (data->args.hipArrayCreate.pHandle) data->args.hipArrayCreate.pHandle__val = *(data->args.hipArrayCreate.pHandle);
      if (data->args.hipArrayCreate.pAllocateArray) data->args.hipArrayCreate.pAllocateArray__val = *(data->args.hipArrayCreate.pAllocateArray);
      break;
    // hipArrayDestroy[('hipArray*', 'array')]
    case HIP_API_ID_hipArrayDestroy:
      if (data->args.hipArrayDestroy.array) data->args.hipArrayDestroy.array__val = *(data->args.hipArrayDestroy.array);
      break;
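    // NOTE (explanatory comment, not emitted by the generator): for each pointer
    // argument captured by the INIT_* macros, hipApiArgsInit() snapshots the
    // pointed-to value into the sibling *__val field (guarded against NULL), so a
    // profiler that stringifies the record after the API returns does not chase
    // pointers into memory the caller may have already reused; char* outputs (see
    // hipDeviceGetName below) are duplicated with strdup() for the same reason.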
    // hipArrayGetDescriptor[('HIP_ARRAY_DESCRIPTOR*', 'pArrayDescriptor'), ('hipArray*', 'array')]
    case HIP_API_ID_hipArrayGetDescriptor:
      if (data->args.hipArrayGetDescriptor.pArrayDescriptor) data->args.hipArrayGetDescriptor.pArrayDescriptor__val = *(data->args.hipArrayGetDescriptor.pArrayDescriptor);
      if (data->args.hipArrayGetDescriptor.array) data->args.hipArrayGetDescriptor.array__val = *(data->args.hipArrayGetDescriptor.array);
      break;
    // hipArrayGetInfo[('hipChannelFormatDesc*', 'desc'), ('hipExtent*', 'extent'), ('unsigned int*', 'flags'), ('hipArray*', 'array')]
    case HIP_API_ID_hipArrayGetInfo:
      if (data->args.hipArrayGetInfo.desc) data->args.hipArrayGetInfo.desc__val = *(data->args.hipArrayGetInfo.desc);
      if (data->args.hipArrayGetInfo.extent) data->args.hipArrayGetInfo.extent__val = *(data->args.hipArrayGetInfo.extent);
      if (data->args.hipArrayGetInfo.flags) data->args.hipArrayGetInfo.flags__val = *(data->args.hipArrayGetInfo.flags);
      if (data->args.hipArrayGetInfo.array) data->args.hipArrayGetInfo.array__val = *(data->args.hipArrayGetInfo.array);
      break;
    // hipChooseDevice[('int*', 'device'), ('const hipDeviceProp_t*', 'prop')]
    case HIP_API_ID_hipChooseDevice:
      if (data->args.hipChooseDevice.device) data->args.hipChooseDevice.device__val = *(data->args.hipChooseDevice.device);
      if (data->args.hipChooseDevice.prop) data->args.hipChooseDevice.prop__val = *(data->args.hipChooseDevice.prop);
      break;
    // hipConfigureCall[('dim3', 'gridDim'), ('dim3', 'blockDim'), ('size_t', 'sharedMem'), ('hipStream_t', 'stream')]
    case HIP_API_ID_hipConfigureCall:
      break;
    // hipCreateSurfaceObject[('hipSurfaceObject_t*', 'pSurfObject'), ('const hipResourceDesc*', 'pResDesc')]
    case HIP_API_ID_hipCreateSurfaceObject:
      if (data->args.hipCreateSurfaceObject.pSurfObject) data->args.hipCreateSurfaceObject.pSurfObject__val = *(data->args.hipCreateSurfaceObject.pSurfObject);
      if (data->args.hipCreateSurfaceObject.pResDesc) data->args.hipCreateSurfaceObject.pResDesc__val = *(data->args.hipCreateSurfaceObject.pResDesc);
      break;
    // hipCtxCreate[('hipCtx_t*', 'ctx'), ('unsigned int', 'flags'), ('hipDevice_t', 'device')]
    case HIP_API_ID_hipCtxCreate:
      if (data->args.hipCtxCreate.ctx) data->args.hipCtxCreate.ctx__val = *(data->args.hipCtxCreate.ctx);
      break;
    // hipCtxDestroy[('hipCtx_t', 'ctx')]
    case HIP_API_ID_hipCtxDestroy:
      break;
    // hipCtxDisablePeerAccess[('hipCtx_t', 'peerCtx')]
    case HIP_API_ID_hipCtxDisablePeerAccess:
      break;
    // hipCtxEnablePeerAccess[('hipCtx_t', 'peerCtx'), ('unsigned int', 'flags')]
    case HIP_API_ID_hipCtxEnablePeerAccess:
      break;
    // hipCtxGetApiVersion[('hipCtx_t', 'ctx'), ('int*', 'apiVersion')]
    case HIP_API_ID_hipCtxGetApiVersion:
      if (data->args.hipCtxGetApiVersion.apiVersion) data->args.hipCtxGetApiVersion.apiVersion__val = *(data->args.hipCtxGetApiVersion.apiVersion);
      break;
    // hipCtxGetCacheConfig[('hipFuncCache_t*', 'cacheConfig')]
    case HIP_API_ID_hipCtxGetCacheConfig:
      if (data->args.hipCtxGetCacheConfig.cacheConfig) data->args.hipCtxGetCacheConfig.cacheConfig__val = *(data->args.hipCtxGetCacheConfig.cacheConfig);
      break;
    // hipCtxGetCurrent[('hipCtx_t*', 'ctx')]
    case HIP_API_ID_hipCtxGetCurrent:
      if (data->args.hipCtxGetCurrent.ctx) data->args.hipCtxGetCurrent.ctx__val = *(data->args.hipCtxGetCurrent.ctx);
      break;
    // hipCtxGetDevice[('hipDevice_t*', 'device')]
    case HIP_API_ID_hipCtxGetDevice:
      if (data->args.hipCtxGetDevice.device) data->args.hipCtxGetDevice.device__val = *(data->args.hipCtxGetDevice.device);
      break;
    // hipCtxGetFlags[('unsigned int*', 'flags')]
    case HIP_API_ID_hipCtxGetFlags:
      if (data->args.hipCtxGetFlags.flags) data->args.hipCtxGetFlags.flags__val = *(data->args.hipCtxGetFlags.flags);
      break;
    // hipCtxGetSharedMemConfig[('hipSharedMemConfig*', 'pConfig')]
    case HIP_API_ID_hipCtxGetSharedMemConfig:
      if (data->args.hipCtxGetSharedMemConfig.pConfig) data->args.hipCtxGetSharedMemConfig.pConfig__val = *(data->args.hipCtxGetSharedMemConfig.pConfig);
      break;
    // hipCtxPopCurrent[('hipCtx_t*', 'ctx')]
    case HIP_API_ID_hipCtxPopCurrent:
      if (data->args.hipCtxPopCurrent.ctx) data->args.hipCtxPopCurrent.ctx__val = *(data->args.hipCtxPopCurrent.ctx);
      break;
    // hipCtxPushCurrent[('hipCtx_t', 'ctx')]
    case HIP_API_ID_hipCtxPushCurrent:
      break;
    // hipCtxSetCacheConfig[('hipFuncCache_t', 'cacheConfig')]
    case HIP_API_ID_hipCtxSetCacheConfig:
      break;
    // hipCtxSetCurrent[('hipCtx_t', 'ctx')]
    case HIP_API_ID_hipCtxSetCurrent:
      break;
    // hipCtxSetSharedMemConfig[('hipSharedMemConfig', 'config')]
    case HIP_API_ID_hipCtxSetSharedMemConfig:
      break;
    // hipCtxSynchronize[]
    case HIP_API_ID_hipCtxSynchronize:
      break;
    // hipDestroyExternalMemory[('hipExternalMemory_t', 'extMem')]
    case HIP_API_ID_hipDestroyExternalMemory:
      break;
    // hipDestroyExternalSemaphore[('hipExternalSemaphore_t', 'extSem')]
    case HIP_API_ID_hipDestroyExternalSemaphore:
      break;
    // hipDestroySurfaceObject[('hipSurfaceObject_t', 'surfaceObject')]
    case HIP_API_ID_hipDestroySurfaceObject:
      break;
    // hipDeviceCanAccessPeer[('int*', 'canAccessPeer'), ('int', 'deviceId'), ('int', 'peerDeviceId')]
    case HIP_API_ID_hipDeviceCanAccessPeer:
      if (data->args.hipDeviceCanAccessPeer.canAccessPeer) data->args.hipDeviceCanAccessPeer.canAccessPeer__val = *(data->args.hipDeviceCanAccessPeer.canAccessPeer);
      break;
    // hipDeviceComputeCapability[('int*', 'major'), ('int*', 'minor'), ('hipDevice_t', 'device')]
    case HIP_API_ID_hipDeviceComputeCapability:
      if (data->args.hipDeviceComputeCapability.major) data->args.hipDeviceComputeCapability.major__val = *(data->args.hipDeviceComputeCapability.major);
      if (data->args.hipDeviceComputeCapability.minor) data->args.hipDeviceComputeCapability.minor__val = *(data->args.hipDeviceComputeCapability.minor);
      break;
    // hipDeviceDisablePeerAccess[('int', 'peerDeviceId')]
    case HIP_API_ID_hipDeviceDisablePeerAccess:
      break;
    // hipDeviceEnablePeerAccess[('int', 'peerDeviceId'), ('unsigned int', 'flags')]
    case HIP_API_ID_hipDeviceEnablePeerAccess:
      break;
    // hipDeviceGet[('hipDevice_t*', 'device'), ('int', 'ordinal')]
    case HIP_API_ID_hipDeviceGet:
      if (data->args.hipDeviceGet.device) data->args.hipDeviceGet.device__val = *(data->args.hipDeviceGet.device);
      break;
    // hipDeviceGetAttribute[('int*', 'pi'), ('hipDeviceAttribute_t', 'attr'), ('int', 'deviceId')]
    case HIP_API_ID_hipDeviceGetAttribute:
      if (data->args.hipDeviceGetAttribute.pi) data->args.hipDeviceGetAttribute.pi__val = *(data->args.hipDeviceGetAttribute.pi);
      break;
    // hipDeviceGetByPCIBusId[('int*', 'device'), ('const char*', 'pciBusId')]
    case HIP_API_ID_hipDeviceGetByPCIBusId:
      if (data->args.hipDeviceGetByPCIBusId.device) data->args.hipDeviceGetByPCIBusId.device__val = *(data->args.hipDeviceGetByPCIBusId.device);
      if (data->args.hipDeviceGetByPCIBusId.pciBusId) data->args.hipDeviceGetByPCIBusId.pciBusId__val = *(data->args.hipDeviceGetByPCIBusId.pciBusId);
      break;
    // hipDeviceGetCacheConfig[('hipFuncCache_t*', 'cacheConfig')]
    case HIP_API_ID_hipDeviceGetCacheConfig:
      if (data->args.hipDeviceGetCacheConfig.cacheConfig) data->args.hipDeviceGetCacheConfig.cacheConfig__val = *(data->args.hipDeviceGetCacheConfig.cacheConfig);
      break;
    // hipDeviceGetDefaultMemPool[('hipMemPool_t*', 'mem_pool'), ('int', 'device')]
    case HIP_API_ID_hipDeviceGetDefaultMemPool:
      if (data->args.hipDeviceGetDefaultMemPool.mem_pool) data->args.hipDeviceGetDefaultMemPool.mem_pool__val = *(data->args.hipDeviceGetDefaultMemPool.mem_pool);
      break;
    // hipDeviceGetGraphMemAttribute[('int', 'device'), ('hipGraphMemAttributeType', 'attr'), ('void*', 'value')]
    case HIP_API_ID_hipDeviceGetGraphMemAttribute:
      break;
    // hipDeviceGetLimit[('size_t*', 'pValue'), ('hipLimit_t', 'limit')]
    case HIP_API_ID_hipDeviceGetLimit:
      if (data->args.hipDeviceGetLimit.pValue) data->args.hipDeviceGetLimit.pValue__val = *(data->args.hipDeviceGetLimit.pValue);
      break;
    // hipDeviceGetMemPool[('hipMemPool_t*', 'mem_pool'), ('int', 'device')]
    case HIP_API_ID_hipDeviceGetMemPool:
      if (data->args.hipDeviceGetMemPool.mem_pool) data->args.hipDeviceGetMemPool.mem_pool__val = *(data->args.hipDeviceGetMemPool.mem_pool);
      break;
    // hipDeviceGetName[('char*', 'name'), ('int', 'len'), ('hipDevice_t', 'device')]
    case HIP_API_ID_hipDeviceGetName:
      data->args.hipDeviceGetName.name = (data->args.hipDeviceGetName.name) ? strdup(data->args.hipDeviceGetName.name) : NULL;
      break;
    // hipDeviceGetP2PAttribute[('int*', 'value'), ('hipDeviceP2PAttr', 'attr'), ('int', 'srcDevice'), ('int', 'dstDevice')]
    case HIP_API_ID_hipDeviceGetP2PAttribute:
      if (data->args.hipDeviceGetP2PAttribute.value) data->args.hipDeviceGetP2PAttribute.value__val = *(data->args.hipDeviceGetP2PAttribute.value);
      break;
    // hipDeviceGetPCIBusId[('char*', 'pciBusId'), ('int', 'len'), ('int', 'device')]
    case HIP_API_ID_hipDeviceGetPCIBusId:
      data->args.hipDeviceGetPCIBusId.pciBusId = (data->args.hipDeviceGetPCIBusId.pciBusId) ? strdup(data->args.hipDeviceGetPCIBusId.pciBusId) : NULL;
      break;
    // hipDeviceGetSharedMemConfig[('hipSharedMemConfig*', 'pConfig')]
    case HIP_API_ID_hipDeviceGetSharedMemConfig:
      if (data->args.hipDeviceGetSharedMemConfig.pConfig) data->args.hipDeviceGetSharedMemConfig.pConfig__val = *(data->args.hipDeviceGetSharedMemConfig.pConfig);
      break;
    // hipDeviceGetStreamPriorityRange[('int*', 'leastPriority'), ('int*', 'greatestPriority')]
    case HIP_API_ID_hipDeviceGetStreamPriorityRange:
      if (data->args.hipDeviceGetStreamPriorityRange.leastPriority) data->args.hipDeviceGetStreamPriorityRange.leastPriority__val = *(data->args.hipDeviceGetStreamPriorityRange.leastPriority);
      if (data->args.hipDeviceGetStreamPriorityRange.greatestPriority) data->args.hipDeviceGetStreamPriorityRange.greatestPriority__val = *(data->args.hipDeviceGetStreamPriorityRange.greatestPriority);
      break;
    // hipDeviceGetUuid[('hipUUID*', 'uuid'), ('hipDevice_t', 'device')]
    case HIP_API_ID_hipDeviceGetUuid:
      if (data->args.hipDeviceGetUuid.uuid) data->args.hipDeviceGetUuid.uuid__val = *(data->args.hipDeviceGetUuid.uuid);
      break;
    // hipDeviceGraphMemTrim[('int', 'device')]
    case HIP_API_ID_hipDeviceGraphMemTrim:
      break;
    // hipDevicePrimaryCtxGetState[('hipDevice_t', 'dev'), ('unsigned int*', 'flags'), ('int*', 'active')]
    case HIP_API_ID_hipDevicePrimaryCtxGetState:
      if (data->args.hipDevicePrimaryCtxGetState.flags) data->args.hipDevicePrimaryCtxGetState.flags__val = *(data->args.hipDevicePrimaryCtxGetState.flags);
      if (data->args.hipDevicePrimaryCtxGetState.active) data->args.hipDevicePrimaryCtxGetState.active__val = *(data->args.hipDevicePrimaryCtxGetState.active);
      break;
    // hipDevicePrimaryCtxRelease[('hipDevice_t', 'dev')]
    case HIP_API_ID_hipDevicePrimaryCtxRelease:
      break;
    // hipDevicePrimaryCtxReset[('hipDevice_t', 'dev')]
    case HIP_API_ID_hipDevicePrimaryCtxReset:
      break;
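    // NOTE (explanatory comment, not emitted by the generator): hipDeviceGetName and
    // hipDeviceGetPCIBusId above overwrite the captured char* with a strdup'd copy,
    // since the caller's buffer may be reused immediately after the call; whoever
    // consumes the hip_api_data_t record is presumably expected to free that copy.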
HIP_API_ID_hipDevicePrimaryCtxRetain: if (data->args.hipDevicePrimaryCtxRetain.pctx) data->args.hipDevicePrimaryCtxRetain.pctx__val = *(data->args.hipDevicePrimaryCtxRetain.pctx); break; // hipDevicePrimaryCtxSetFlags[('hipDevice_t', 'dev'), ('unsigned int', 'flags')] case HIP_API_ID_hipDevicePrimaryCtxSetFlags: break; // hipDeviceReset[] case HIP_API_ID_hipDeviceReset: break; // hipDeviceSetCacheConfig[('hipFuncCache_t', 'cacheConfig')] case HIP_API_ID_hipDeviceSetCacheConfig: break; // hipDeviceSetGraphMemAttribute[('int', 'device'), ('hipGraphMemAttributeType', 'attr'), ('void*', 'value')] case HIP_API_ID_hipDeviceSetGraphMemAttribute: break; // hipDeviceSetLimit[('hipLimit_t', 'limit'), ('size_t', 'value')] case HIP_API_ID_hipDeviceSetLimit: break; // hipDeviceSetMemPool[('int', 'device'), ('hipMemPool_t', 'mem_pool')] case HIP_API_ID_hipDeviceSetMemPool: break; // hipDeviceSetSharedMemConfig[('hipSharedMemConfig', 'config')] case HIP_API_ID_hipDeviceSetSharedMemConfig: break; // hipDeviceSynchronize[] case HIP_API_ID_hipDeviceSynchronize: break; // hipDeviceTotalMem[('size_t*', 'bytes'), ('hipDevice_t', 'device')] case HIP_API_ID_hipDeviceTotalMem: if (data->args.hipDeviceTotalMem.bytes) data->args.hipDeviceTotalMem.bytes__val = *(data->args.hipDeviceTotalMem.bytes); break; // hipDriverGetVersion[('int*', 'driverVersion')] case HIP_API_ID_hipDriverGetVersion: if (data->args.hipDriverGetVersion.driverVersion) data->args.hipDriverGetVersion.driverVersion__val = *(data->args.hipDriverGetVersion.driverVersion); break; // hipDrvMemcpy2DUnaligned[('const hip_Memcpy2D*', 'pCopy')] case HIP_API_ID_hipDrvMemcpy2DUnaligned: if (data->args.hipDrvMemcpy2DUnaligned.pCopy) data->args.hipDrvMemcpy2DUnaligned.pCopy__val = *(data->args.hipDrvMemcpy2DUnaligned.pCopy); break; // hipDrvMemcpy3D[('const HIP_MEMCPY3D*', 'pCopy')] case HIP_API_ID_hipDrvMemcpy3D: if (data->args.hipDrvMemcpy3D.pCopy) data->args.hipDrvMemcpy3D.pCopy__val = *(data->args.hipDrvMemcpy3D.pCopy); break; // hipDrvMemcpy3DAsync[('const HIP_MEMCPY3D*', 'pCopy'), ('hipStream_t', 'stream')] case HIP_API_ID_hipDrvMemcpy3DAsync: if (data->args.hipDrvMemcpy3DAsync.pCopy) data->args.hipDrvMemcpy3DAsync.pCopy__val = *(data->args.hipDrvMemcpy3DAsync.pCopy); break; // hipDrvPointerGetAttributes[('unsigned int', 'numAttributes'), ('hipPointer_attribute*', 'attributes'), ('void**', 'data'), ('hipDeviceptr_t', 'ptr')] case HIP_API_ID_hipDrvPointerGetAttributes: if (data->args.hipDrvPointerGetAttributes.attributes) data->args.hipDrvPointerGetAttributes.attributes__val = *(data->args.hipDrvPointerGetAttributes.attributes); if (data->args.hipDrvPointerGetAttributes.data) data->args.hipDrvPointerGetAttributes.data__val = *(data->args.hipDrvPointerGetAttributes.data); break; // hipEventCreate[('hipEvent_t*', 'event')] case HIP_API_ID_hipEventCreate: if (data->args.hipEventCreate.event) data->args.hipEventCreate.event__val = *(data->args.hipEventCreate.event); break; // hipEventCreateWithFlags[('hipEvent_t*', 'event'), ('unsigned int', 'flags')] case HIP_API_ID_hipEventCreateWithFlags: if (data->args.hipEventCreateWithFlags.event) data->args.hipEventCreateWithFlags.event__val = *(data->args.hipEventCreateWithFlags.event); break; // hipEventDestroy[('hipEvent_t', 'event')] case HIP_API_ID_hipEventDestroy: break; // hipEventElapsedTime[('float*', 'ms'), ('hipEvent_t', 'start'), ('hipEvent_t', 'stop')] case HIP_API_ID_hipEventElapsedTime: if (data->args.hipEventElapsedTime.ms) data->args.hipEventElapsedTime.ms__val = *(data->args.hipEventElapsedTime.ms); 
// hipEventQuery[('hipEvent_t', 'event')]
case HIP_API_ID_hipEventQuery:
  break;
// hipEventRecord[('hipEvent_t', 'event'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipEventRecord:
  break;
// hipEventSynchronize[('hipEvent_t', 'event')]
case HIP_API_ID_hipEventSynchronize:
  break;
// hipExtGetLinkTypeAndHopCount[('int', 'device1'), ('int', 'device2'), ('unsigned int*', 'linktype'), ('unsigned int*', 'hopcount')]
case HIP_API_ID_hipExtGetLinkTypeAndHopCount:
  if (data->args.hipExtGetLinkTypeAndHopCount.linktype) data->args.hipExtGetLinkTypeAndHopCount.linktype__val = *(data->args.hipExtGetLinkTypeAndHopCount.linktype);
  if (data->args.hipExtGetLinkTypeAndHopCount.hopcount) data->args.hipExtGetLinkTypeAndHopCount.hopcount__val = *(data->args.hipExtGetLinkTypeAndHopCount.hopcount);
  break;
// hipExtLaunchKernel[('const void*', 'function_address'), ('dim3', 'numBlocks'), ('dim3', 'dimBlocks'), ('void**', 'args'), ('size_t', 'sharedMemBytes'), ('hipStream_t', 'stream'), ('hipEvent_t', 'startEvent'), ('hipEvent_t', 'stopEvent'), ('int', 'flags')]
case HIP_API_ID_hipExtLaunchKernel:
  if (data->args.hipExtLaunchKernel.args) data->args.hipExtLaunchKernel.args__val = *(data->args.hipExtLaunchKernel.args);
  break;
// hipExtLaunchMultiKernelMultiDevice[('hipLaunchParams*', 'launchParamsList'), ('int', 'numDevices'), ('unsigned int', 'flags')]
case HIP_API_ID_hipExtLaunchMultiKernelMultiDevice:
  if (data->args.hipExtLaunchMultiKernelMultiDevice.launchParamsList) data->args.hipExtLaunchMultiKernelMultiDevice.launchParamsList__val = *(data->args.hipExtLaunchMultiKernelMultiDevice.launchParamsList);
  break;
// hipExtMallocWithFlags[('void**', 'ptr'), ('size_t', 'sizeBytes'), ('unsigned int', 'flags')]
case HIP_API_ID_hipExtMallocWithFlags:
  if (data->args.hipExtMallocWithFlags.ptr) data->args.hipExtMallocWithFlags.ptr__val = *(data->args.hipExtMallocWithFlags.ptr);
  break;
// hipExtModuleLaunchKernel[('hipFunction_t', 'f'), ('unsigned int', 'globalWorkSizeX'), ('unsigned int', 'globalWorkSizeY'), ('unsigned int', 'globalWorkSizeZ'), ('unsigned int', 'localWorkSizeX'), ('unsigned int', 'localWorkSizeY'), ('unsigned int', 'localWorkSizeZ'), ('size_t', 'sharedMemBytes'), ('hipStream_t', 'hStream'), ('void**', 'kernelParams'), ('void**', 'extra'), ('hipEvent_t', 'startEvent'), ('hipEvent_t', 'stopEvent'), ('unsigned int', 'flags')]
case HIP_API_ID_hipExtModuleLaunchKernel:
  if (data->args.hipExtModuleLaunchKernel.kernelParams) data->args.hipExtModuleLaunchKernel.kernelParams__val = *(data->args.hipExtModuleLaunchKernel.kernelParams);
  if (data->args.hipExtModuleLaunchKernel.extra) data->args.hipExtModuleLaunchKernel.extra__val = *(data->args.hipExtModuleLaunchKernel.extra);
  break;
// hipExtStreamCreateWithCUMask[('hipStream_t*', 'stream'), ('unsigned int', 'cuMaskSize'), ('const unsigned int*', 'cuMask')]
case HIP_API_ID_hipExtStreamCreateWithCUMask:
  if (data->args.hipExtStreamCreateWithCUMask.stream) data->args.hipExtStreamCreateWithCUMask.stream__val = *(data->args.hipExtStreamCreateWithCUMask.stream);
  if (data->args.hipExtStreamCreateWithCUMask.cuMask) data->args.hipExtStreamCreateWithCUMask.cuMask__val = *(data->args.hipExtStreamCreateWithCUMask.cuMask);
  break;
// hipExtStreamGetCUMask[('hipStream_t', 'stream'), ('unsigned int', 'cuMaskSize'), ('unsigned int*', 'cuMask')]
case HIP_API_ID_hipExtStreamGetCUMask:
  if (data->args.hipExtStreamGetCUMask.cuMask) data->args.hipExtStreamGetCUMask.cuMask__val = *(data->args.hipExtStreamGetCUMask.cuMask);
  break;
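// [Editorial note] For array-valued arguments such as cuMask above, or the
// kernelParams/extra blocks in the launch cases, only element [0] lands in the
// *__val snapshot; the remaining elements stay reachable solely through the
// original pointer, which is not guaranteed to outlive the API call. A tracer
// that needs the whole CU mask would have to copy cuMaskSize words itself
// inside the callback -- an assumption about intended usage, not something
// this generated code does.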
// hipExternalMemoryGetMappedBuffer[('void**', 'devPtr'), ('hipExternalMemory_t', 'extMem'), ('const hipExternalMemoryBufferDesc*', 'bufferDesc')]
case HIP_API_ID_hipExternalMemoryGetMappedBuffer:
  if (data->args.hipExternalMemoryGetMappedBuffer.devPtr) data->args.hipExternalMemoryGetMappedBuffer.devPtr__val = *(data->args.hipExternalMemoryGetMappedBuffer.devPtr);
  if (data->args.hipExternalMemoryGetMappedBuffer.bufferDesc) data->args.hipExternalMemoryGetMappedBuffer.bufferDesc__val = *(data->args.hipExternalMemoryGetMappedBuffer.bufferDesc);
  break;
// hipFree[('void*', 'ptr')]
case HIP_API_ID_hipFree:
  break;
// hipFreeArray[('hipArray*', 'array')]
case HIP_API_ID_hipFreeArray:
  if (data->args.hipFreeArray.array) data->args.hipFreeArray.array__val = *(data->args.hipFreeArray.array);
  break;
// hipFreeAsync[('void*', 'dev_ptr'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipFreeAsync:
  break;
// hipFreeHost[('void*', 'ptr')]
case HIP_API_ID_hipFreeHost:
  break;
// hipFreeMipmappedArray[('hipMipmappedArray_t', 'mipmappedArray')]
case HIP_API_ID_hipFreeMipmappedArray:
  break;
// hipFuncGetAttribute[('int*', 'value'), ('hipFunction_attribute', 'attrib'), ('hipFunction_t', 'hfunc')]
case HIP_API_ID_hipFuncGetAttribute:
  if (data->args.hipFuncGetAttribute.value) data->args.hipFuncGetAttribute.value__val = *(data->args.hipFuncGetAttribute.value);
  break;
// hipFuncGetAttributes[('hipFuncAttributes*', 'attr'), ('const void*', 'func')]
case HIP_API_ID_hipFuncGetAttributes:
  if (data->args.hipFuncGetAttributes.attr) data->args.hipFuncGetAttributes.attr__val = *(data->args.hipFuncGetAttributes.attr);
  break;
// hipFuncSetAttribute[('const void*', 'func'), ('hipFuncAttribute', 'attr'), ('int', 'value')]
case HIP_API_ID_hipFuncSetAttribute:
  break;
// hipFuncSetCacheConfig[('const void*', 'func'), ('hipFuncCache_t', 'config')]
case HIP_API_ID_hipFuncSetCacheConfig:
  break;
// hipFuncSetSharedMemConfig[('const void*', 'func'), ('hipSharedMemConfig', 'config')]
case HIP_API_ID_hipFuncSetSharedMemConfig:
  break;
// hipGLGetDevices[('unsigned int*', 'pHipDeviceCount'), ('int*', 'pHipDevices'), ('unsigned int', 'hipDeviceCount'), ('hipGLDeviceList', 'deviceList')]
case HIP_API_ID_hipGLGetDevices:
  if (data->args.hipGLGetDevices.pHipDeviceCount) data->args.hipGLGetDevices.pHipDeviceCount__val = *(data->args.hipGLGetDevices.pHipDeviceCount);
  if (data->args.hipGLGetDevices.pHipDevices) data->args.hipGLGetDevices.pHipDevices__val = *(data->args.hipGLGetDevices.pHipDevices);
  break;
// hipGetChannelDesc[('hipChannelFormatDesc*', 'desc'), ('hipArray_const_t', 'array')]
case HIP_API_ID_hipGetChannelDesc:
  if (data->args.hipGetChannelDesc.desc) data->args.hipGetChannelDesc.desc__val = *(data->args.hipGetChannelDesc.desc);
  break;
// hipGetDevice[('int*', 'deviceId')]
case HIP_API_ID_hipGetDevice:
  if (data->args.hipGetDevice.deviceId) data->args.hipGetDevice.deviceId__val = *(data->args.hipGetDevice.deviceId);
  break;
// hipGetDeviceCount[('int*', 'count')]
case HIP_API_ID_hipGetDeviceCount:
  if (data->args.hipGetDeviceCount.count) data->args.hipGetDeviceCount.count__val = *(data->args.hipGetDeviceCount.count);
  break;
// hipGetDeviceFlags[('unsigned int*', 'flags')]
case HIP_API_ID_hipGetDeviceFlags:
  if (data->args.hipGetDeviceFlags.flags) data->args.hipGetDeviceFlags.flags__val = *(data->args.hipGetDeviceFlags.flags);
  break;
// hipGetDeviceProperties[('hipDeviceProp_t*', 'props'), ('hipDevice_t', 'device')]
case HIP_API_ID_hipGetDeviceProperties:
  if (data->args.hipGetDeviceProperties.props) data->args.hipGetDeviceProperties.props__val = *(data->args.hipGetDeviceProperties.props);
  break;
// hipGetErrorString[]
case HIP_API_ID_hipGetErrorString:
  break;
// hipGetLastError[]
case HIP_API_ID_hipGetLastError:
  break;
// hipGetMipmappedArrayLevel[('hipArray_t*', 'levelArray'), ('hipMipmappedArray_const_t', 'mipmappedArray'), ('unsigned int', 'level')]
case HIP_API_ID_hipGetMipmappedArrayLevel:
  if (data->args.hipGetMipmappedArrayLevel.levelArray) data->args.hipGetMipmappedArrayLevel.levelArray__val = *(data->args.hipGetMipmappedArrayLevel.levelArray);
  break;
// hipGetSymbolAddress[('void**', 'devPtr'), ('const void*', 'symbol')]
case HIP_API_ID_hipGetSymbolAddress:
  if (data->args.hipGetSymbolAddress.devPtr) data->args.hipGetSymbolAddress.devPtr__val = *(data->args.hipGetSymbolAddress.devPtr);
  break;
// hipGetSymbolSize[('size_t*', 'size'), ('const void*', 'symbol')]
case HIP_API_ID_hipGetSymbolSize:
  if (data->args.hipGetSymbolSize.size) data->args.hipGetSymbolSize.size__val = *(data->args.hipGetSymbolSize.size);
  break;
// hipGraphAddChildGraphNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('hipGraph_t', 'childGraph')]
case HIP_API_ID_hipGraphAddChildGraphNode:
  if (data->args.hipGraphAddChildGraphNode.pGraphNode) data->args.hipGraphAddChildGraphNode.pGraphNode__val = *(data->args.hipGraphAddChildGraphNode.pGraphNode);
  if (data->args.hipGraphAddChildGraphNode.pDependencies) data->args.hipGraphAddChildGraphNode.pDependencies__val = *(data->args.hipGraphAddChildGraphNode.pDependencies);
  break;
// hipGraphAddDependencies[('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'from'), ('const hipGraphNode_t*', 'to'), ('size_t', 'numDependencies')]
case HIP_API_ID_hipGraphAddDependencies:
  if (data->args.hipGraphAddDependencies.from) data->args.hipGraphAddDependencies.from__val = *(data->args.hipGraphAddDependencies.from);
  if (data->args.hipGraphAddDependencies.to) data->args.hipGraphAddDependencies.to__val = *(data->args.hipGraphAddDependencies.to);
  break;
// hipGraphAddEmptyNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies')]
case HIP_API_ID_hipGraphAddEmptyNode:
  if (data->args.hipGraphAddEmptyNode.pGraphNode) data->args.hipGraphAddEmptyNode.pGraphNode__val = *(data->args.hipGraphAddEmptyNode.pGraphNode);
  if (data->args.hipGraphAddEmptyNode.pDependencies) data->args.hipGraphAddEmptyNode.pDependencies__val = *(data->args.hipGraphAddEmptyNode.pDependencies);
  break;
// hipGraphAddEventRecordNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('hipEvent_t', 'event')]
case HIP_API_ID_hipGraphAddEventRecordNode:
  if (data->args.hipGraphAddEventRecordNode.pGraphNode) data->args.hipGraphAddEventRecordNode.pGraphNode__val = *(data->args.hipGraphAddEventRecordNode.pGraphNode);
  if (data->args.hipGraphAddEventRecordNode.pDependencies) data->args.hipGraphAddEventRecordNode.pDependencies__val = *(data->args.hipGraphAddEventRecordNode.pDependencies);
  break;
// hipGraphAddEventWaitNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('hipEvent_t', 'event')]
case HIP_API_ID_hipGraphAddEventWaitNode:
  if (data->args.hipGraphAddEventWaitNode.pGraphNode) data->args.hipGraphAddEventWaitNode.pGraphNode__val = *(data->args.hipGraphAddEventWaitNode.pGraphNode);
  if (data->args.hipGraphAddEventWaitNode.pDependencies) data->args.hipGraphAddEventWaitNode.pDependencies__val = *(data->args.hipGraphAddEventWaitNode.pDependencies);
  break;
// hipGraphAddHostNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const hipHostNodeParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphAddHostNode:
  if (data->args.hipGraphAddHostNode.pGraphNode) data->args.hipGraphAddHostNode.pGraphNode__val = *(data->args.hipGraphAddHostNode.pGraphNode);
  if (data->args.hipGraphAddHostNode.pDependencies) data->args.hipGraphAddHostNode.pDependencies__val = *(data->args.hipGraphAddHostNode.pDependencies);
  if (data->args.hipGraphAddHostNode.pNodeParams) data->args.hipGraphAddHostNode.pNodeParams__val = *(data->args.hipGraphAddHostNode.pNodeParams);
  break;
// hipGraphAddKernelNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const hipKernelNodeParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphAddKernelNode:
  if (data->args.hipGraphAddKernelNode.pGraphNode) data->args.hipGraphAddKernelNode.pGraphNode__val = *(data->args.hipGraphAddKernelNode.pGraphNode);
  if (data->args.hipGraphAddKernelNode.pDependencies) data->args.hipGraphAddKernelNode.pDependencies__val = *(data->args.hipGraphAddKernelNode.pDependencies);
  if (data->args.hipGraphAddKernelNode.pNodeParams) data->args.hipGraphAddKernelNode.pNodeParams__val = *(data->args.hipGraphAddKernelNode.pNodeParams);
  break;
// hipGraphAddMemAllocNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('hipMemAllocNodeParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphAddMemAllocNode:
  if (data->args.hipGraphAddMemAllocNode.pGraphNode) data->args.hipGraphAddMemAllocNode.pGraphNode__val = *(data->args.hipGraphAddMemAllocNode.pGraphNode);
  if (data->args.hipGraphAddMemAllocNode.pDependencies) data->args.hipGraphAddMemAllocNode.pDependencies__val = *(data->args.hipGraphAddMemAllocNode.pDependencies);
  if (data->args.hipGraphAddMemAllocNode.pNodeParams) data->args.hipGraphAddMemAllocNode.pNodeParams__val = *(data->args.hipGraphAddMemAllocNode.pNodeParams);
  break;
// hipGraphAddMemFreeNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('void*', 'dev_ptr')]
case HIP_API_ID_hipGraphAddMemFreeNode:
  if (data->args.hipGraphAddMemFreeNode.pGraphNode) data->args.hipGraphAddMemFreeNode.pGraphNode__val = *(data->args.hipGraphAddMemFreeNode.pGraphNode);
  if (data->args.hipGraphAddMemFreeNode.pDependencies) data->args.hipGraphAddMemFreeNode.pDependencies__val = *(data->args.hipGraphAddMemFreeNode.pDependencies);
  break;
// hipGraphAddMemcpyNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const hipMemcpy3DParms*', 'pCopyParams')]
case HIP_API_ID_hipGraphAddMemcpyNode:
  if (data->args.hipGraphAddMemcpyNode.pGraphNode) data->args.hipGraphAddMemcpyNode.pGraphNode__val = *(data->args.hipGraphAddMemcpyNode.pGraphNode);
  if (data->args.hipGraphAddMemcpyNode.pDependencies) data->args.hipGraphAddMemcpyNode.pDependencies__val = *(data->args.hipGraphAddMemcpyNode.pDependencies);
  if (data->args.hipGraphAddMemcpyNode.pCopyParams) data->args.hipGraphAddMemcpyNode.pCopyParams__val = *(data->args.hipGraphAddMemcpyNode.pCopyParams);
  break;
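// [Editorial note] The hipGraphAdd*Node cases above share a three-part
// capture: *pGraphNode (the handle the API writes back), pDependencies[0],
// and, where present, the whole *NodeParams descriptor struct copied by
// value. Copying the struct is what makes the snapshot stable after the call.
// For example, a callback could read the cached kernel launch geometry like
// this (sketch; field names per the hipKernelNodeParams declaration in
// hip_runtime_api.h):
//
//   dim3 grid = data->args.hipGraphAddKernelNode.pNodeParams__val.gridDim;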
// hipGraphAddMemcpyNode1D[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('void*', 'dst'), ('const void*', 'src'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipGraphAddMemcpyNode1D:
  if (data->args.hipGraphAddMemcpyNode1D.pGraphNode) data->args.hipGraphAddMemcpyNode1D.pGraphNode__val = *(data->args.hipGraphAddMemcpyNode1D.pGraphNode);
  if (data->args.hipGraphAddMemcpyNode1D.pDependencies) data->args.hipGraphAddMemcpyNode1D.pDependencies__val = *(data->args.hipGraphAddMemcpyNode1D.pDependencies);
  break;
// hipGraphAddMemcpyNodeFromSymbol[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipGraphAddMemcpyNodeFromSymbol:
  if (data->args.hipGraphAddMemcpyNodeFromSymbol.pGraphNode) data->args.hipGraphAddMemcpyNodeFromSymbol.pGraphNode__val = *(data->args.hipGraphAddMemcpyNodeFromSymbol.pGraphNode);
  if (data->args.hipGraphAddMemcpyNodeFromSymbol.pDependencies) data->args.hipGraphAddMemcpyNodeFromSymbol.pDependencies__val = *(data->args.hipGraphAddMemcpyNodeFromSymbol.pDependencies);
  break;
// hipGraphAddMemcpyNodeToSymbol[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipGraphAddMemcpyNodeToSymbol:
  if (data->args.hipGraphAddMemcpyNodeToSymbol.pGraphNode) data->args.hipGraphAddMemcpyNodeToSymbol.pGraphNode__val = *(data->args.hipGraphAddMemcpyNodeToSymbol.pGraphNode);
  if (data->args.hipGraphAddMemcpyNodeToSymbol.pDependencies) data->args.hipGraphAddMemcpyNodeToSymbol.pDependencies__val = *(data->args.hipGraphAddMemcpyNodeToSymbol.pDependencies);
  break;
// hipGraphAddMemsetNode[('hipGraphNode_t*', 'pGraphNode'), ('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'pDependencies'), ('size_t', 'numDependencies'), ('const hipMemsetParams*', 'pMemsetParams')]
case HIP_API_ID_hipGraphAddMemsetNode:
  if (data->args.hipGraphAddMemsetNode.pGraphNode) data->args.hipGraphAddMemsetNode.pGraphNode__val = *(data->args.hipGraphAddMemsetNode.pGraphNode);
  if (data->args.hipGraphAddMemsetNode.pDependencies) data->args.hipGraphAddMemsetNode.pDependencies__val = *(data->args.hipGraphAddMemsetNode.pDependencies);
  if (data->args.hipGraphAddMemsetNode.pMemsetParams) data->args.hipGraphAddMemsetNode.pMemsetParams__val = *(data->args.hipGraphAddMemsetNode.pMemsetParams);
  break;
// hipGraphChildGraphNodeGetGraph[('hipGraphNode_t', 'node'), ('hipGraph_t*', 'pGraph')]
case HIP_API_ID_hipGraphChildGraphNodeGetGraph:
  if (data->args.hipGraphChildGraphNodeGetGraph.pGraph) data->args.hipGraphChildGraphNodeGetGraph.pGraph__val = *(data->args.hipGraphChildGraphNodeGetGraph.pGraph);
  break;
// hipGraphClone[('hipGraph_t*', 'pGraphClone'), ('hipGraph_t', 'originalGraph')]
case HIP_API_ID_hipGraphClone:
  if (data->args.hipGraphClone.pGraphClone) data->args.hipGraphClone.pGraphClone__val = *(data->args.hipGraphClone.pGraphClone);
  break;
// hipGraphCreate[('hipGraph_t*', 'pGraph'), ('unsigned int', 'flags')]
case HIP_API_ID_hipGraphCreate:
  if (data->args.hipGraphCreate.pGraph) data->args.hipGraphCreate.pGraph__val = *(data->args.hipGraphCreate.pGraph);
  break;
// hipGraphDebugDotPrint[('hipGraph_t', 'graph'), ('const char*', 'path'), ('unsigned int', 'flags')]
case HIP_API_ID_hipGraphDebugDotPrint:
  if (data->args.hipGraphDebugDotPrint.path) data->args.hipGraphDebugDotPrint.path__val = *(data->args.hipGraphDebugDotPrint.path);
  break;
// hipGraphDestroy[('hipGraph_t', 'graph')]
case HIP_API_ID_hipGraphDestroy:
  break;
// hipGraphDestroyNode[('hipGraphNode_t', 'node')]
case HIP_API_ID_hipGraphDestroyNode:
  break;
// hipGraphEventRecordNodeGetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t*', 'event_out')]
case HIP_API_ID_hipGraphEventRecordNodeGetEvent:
  if (data->args.hipGraphEventRecordNodeGetEvent.event_out) data->args.hipGraphEventRecordNodeGetEvent.event_out__val = *(data->args.hipGraphEventRecordNodeGetEvent.event_out);
  break;
// hipGraphEventRecordNodeSetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t', 'event')]
case HIP_API_ID_hipGraphEventRecordNodeSetEvent:
  break;
// hipGraphEventWaitNodeGetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t*', 'event_out')]
case HIP_API_ID_hipGraphEventWaitNodeGetEvent:
  if (data->args.hipGraphEventWaitNodeGetEvent.event_out) data->args.hipGraphEventWaitNodeGetEvent.event_out__val = *(data->args.hipGraphEventWaitNodeGetEvent.event_out);
  break;
// hipGraphEventWaitNodeSetEvent[('hipGraphNode_t', 'node'), ('hipEvent_t', 'event')]
case HIP_API_ID_hipGraphEventWaitNodeSetEvent:
  break;
// hipGraphExecChildGraphNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('hipGraph_t', 'childGraph')]
case HIP_API_ID_hipGraphExecChildGraphNodeSetParams:
  break;
// hipGraphExecDestroy[('hipGraphExec_t', 'graphExec')]
case HIP_API_ID_hipGraphExecDestroy:
  break;
// hipGraphExecEventRecordNodeSetEvent[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'hNode'), ('hipEvent_t', 'event')]
case HIP_API_ID_hipGraphExecEventRecordNodeSetEvent:
  break;
// hipGraphExecEventWaitNodeSetEvent[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'hNode'), ('hipEvent_t', 'event')]
case HIP_API_ID_hipGraphExecEventWaitNodeSetEvent:
  break;
// hipGraphExecHostNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const hipHostNodeParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphExecHostNodeSetParams:
  if (data->args.hipGraphExecHostNodeSetParams.pNodeParams) data->args.hipGraphExecHostNodeSetParams.pNodeParams__val = *(data->args.hipGraphExecHostNodeSetParams.pNodeParams);
  break;
// hipGraphExecKernelNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const hipKernelNodeParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphExecKernelNodeSetParams:
  if (data->args.hipGraphExecKernelNodeSetParams.pNodeParams) data->args.hipGraphExecKernelNodeSetParams.pNodeParams__val = *(data->args.hipGraphExecKernelNodeSetParams.pNodeParams);
  break;
// hipGraphExecMemcpyNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('hipMemcpy3DParms*', 'pNodeParams')]
case HIP_API_ID_hipGraphExecMemcpyNodeSetParams:
  if (data->args.hipGraphExecMemcpyNodeSetParams.pNodeParams) data->args.hipGraphExecMemcpyNodeSetParams.pNodeParams__val = *(data->args.hipGraphExecMemcpyNodeSetParams.pNodeParams);
  break;
// hipGraphExecMemcpyNodeSetParams1D[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'src'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipGraphExecMemcpyNodeSetParams1D:
  break;
// hipGraphExecMemcpyNodeSetParamsFromSymbol[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipGraphExecMemcpyNodeSetParamsFromSymbol:
  break;
// hipGraphExecMemcpyNodeSetParamsToSymbol[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipGraphExecMemcpyNodeSetParamsToSymbol:
  break;
// hipGraphExecMemsetNodeSetParams[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'node'), ('const hipMemsetParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphExecMemsetNodeSetParams:
  if (data->args.hipGraphExecMemsetNodeSetParams.pNodeParams) data->args.hipGraphExecMemsetNodeSetParams.pNodeParams__val = *(data->args.hipGraphExecMemsetNodeSetParams.pNodeParams);
  break;
// hipGraphExecUpdate[('hipGraphExec_t', 'hGraphExec'), ('hipGraph_t', 'hGraph'), ('hipGraphNode_t*', 'hErrorNode_out'), ('hipGraphExecUpdateResult*', 'updateResult_out')]
case HIP_API_ID_hipGraphExecUpdate:
  if (data->args.hipGraphExecUpdate.hErrorNode_out) data->args.hipGraphExecUpdate.hErrorNode_out__val = *(data->args.hipGraphExecUpdate.hErrorNode_out);
  if (data->args.hipGraphExecUpdate.updateResult_out) data->args.hipGraphExecUpdate.updateResult_out__val = *(data->args.hipGraphExecUpdate.updateResult_out);
  break;
// hipGraphGetEdges[('hipGraph_t', 'graph'), ('hipGraphNode_t*', 'from'), ('hipGraphNode_t*', 'to'), ('size_t*', 'numEdges')]
case HIP_API_ID_hipGraphGetEdges:
  if (data->args.hipGraphGetEdges.from) data->args.hipGraphGetEdges.from__val = *(data->args.hipGraphGetEdges.from);
  if (data->args.hipGraphGetEdges.to) data->args.hipGraphGetEdges.to__val = *(data->args.hipGraphGetEdges.to);
  if (data->args.hipGraphGetEdges.numEdges) data->args.hipGraphGetEdges.numEdges__val = *(data->args.hipGraphGetEdges.numEdges);
  break;
// hipGraphGetNodes[('hipGraph_t', 'graph'), ('hipGraphNode_t*', 'nodes'), ('size_t*', 'numNodes')]
case HIP_API_ID_hipGraphGetNodes:
  if (data->args.hipGraphGetNodes.nodes) data->args.hipGraphGetNodes.nodes__val = *(data->args.hipGraphGetNodes.nodes);
  if (data->args.hipGraphGetNodes.numNodes) data->args.hipGraphGetNodes.numNodes__val = *(data->args.hipGraphGetNodes.numNodes);
  break;
// hipGraphGetRootNodes[('hipGraph_t', 'graph'), ('hipGraphNode_t*', 'pRootNodes'), ('size_t*', 'pNumRootNodes')]
case HIP_API_ID_hipGraphGetRootNodes:
  if (data->args.hipGraphGetRootNodes.pRootNodes) data->args.hipGraphGetRootNodes.pRootNodes__val = *(data->args.hipGraphGetRootNodes.pRootNodes);
  if (data->args.hipGraphGetRootNodes.pNumRootNodes) data->args.hipGraphGetRootNodes.pNumRootNodes__val = *(data->args.hipGraphGetRootNodes.pNumRootNodes);
  break;
// hipGraphHostNodeGetParams[('hipGraphNode_t', 'node'), ('hipHostNodeParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphHostNodeGetParams:
  if (data->args.hipGraphHostNodeGetParams.pNodeParams) data->args.hipGraphHostNodeGetParams.pNodeParams__val = *(data->args.hipGraphHostNodeGetParams.pNodeParams);
  break;
// hipGraphHostNodeSetParams[('hipGraphNode_t', 'node'), ('const hipHostNodeParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphHostNodeSetParams:
  if (data->args.hipGraphHostNodeSetParams.pNodeParams) data->args.hipGraphHostNodeSetParams.pNodeParams__val = *(data->args.hipGraphHostNodeSetParams.pNodeParams);
  break;
// hipGraphInstantiate[('hipGraphExec_t*', 'pGraphExec'), ('hipGraph_t', 'graph'), ('hipGraphNode_t*', 'pErrorNode'), ('char*', 'pLogBuffer'), ('size_t', 'bufferSize')]
case HIP_API_ID_hipGraphInstantiate:
  if (data->args.hipGraphInstantiate.pGraphExec) data->args.hipGraphInstantiate.pGraphExec__val = *(data->args.hipGraphInstantiate.pGraphExec);
  if (data->args.hipGraphInstantiate.pErrorNode) data->args.hipGraphInstantiate.pErrorNode__val = *(data->args.hipGraphInstantiate.pErrorNode);
  data->args.hipGraphInstantiate.pLogBuffer = (data->args.hipGraphInstantiate.pLogBuffer) ? strdup(data->args.hipGraphInstantiate.pLogBuffer) : NULL;
  break;
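// [Editorial note] hipGraphInstantiate above is the one case in this switch
// that deep-copies a string: pLogBuffer is replaced with a strdup()'d copy
// (or NULL) instead of a one-character *__val snapshot. That means whoever
// owns the hip_api_data_t must eventually free() the duplicated buffer to
// avoid a leak -- an ownership obligation, assuming the usual lifecycle of
// this data structure, that the generated code itself never discharges.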
// hipGraphInstantiateWithFlags[('hipGraphExec_t*', 'pGraphExec'), ('hipGraph_t', 'graph'), ('unsigned long long', 'flags')]
case HIP_API_ID_hipGraphInstantiateWithFlags:
  if (data->args.hipGraphInstantiateWithFlags.pGraphExec) data->args.hipGraphInstantiateWithFlags.pGraphExec__val = *(data->args.hipGraphInstantiateWithFlags.pGraphExec);
  break;
// hipGraphKernelNodeCopyAttributes[('hipGraphNode_t', 'hSrc'), ('hipGraphNode_t', 'hDst')]
case HIP_API_ID_hipGraphKernelNodeCopyAttributes:
  break;
// hipGraphKernelNodeGetAttribute[('hipGraphNode_t', 'hNode'), ('hipKernelNodeAttrID', 'attr'), ('hipKernelNodeAttrValue*', 'value')]
case HIP_API_ID_hipGraphKernelNodeGetAttribute:
  if (data->args.hipGraphKernelNodeGetAttribute.value) data->args.hipGraphKernelNodeGetAttribute.value__val = *(data->args.hipGraphKernelNodeGetAttribute.value);
  break;
// hipGraphKernelNodeGetParams[('hipGraphNode_t', 'node'), ('hipKernelNodeParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphKernelNodeGetParams:
  if (data->args.hipGraphKernelNodeGetParams.pNodeParams) data->args.hipGraphKernelNodeGetParams.pNodeParams__val = *(data->args.hipGraphKernelNodeGetParams.pNodeParams);
  break;
// hipGraphKernelNodeSetAttribute[('hipGraphNode_t', 'hNode'), ('hipKernelNodeAttrID', 'attr'), ('const hipKernelNodeAttrValue*', 'value')]
case HIP_API_ID_hipGraphKernelNodeSetAttribute:
  if (data->args.hipGraphKernelNodeSetAttribute.value) data->args.hipGraphKernelNodeSetAttribute.value__val = *(data->args.hipGraphKernelNodeSetAttribute.value);
  break;
// hipGraphKernelNodeSetParams[('hipGraphNode_t', 'node'), ('const hipKernelNodeParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphKernelNodeSetParams:
  if (data->args.hipGraphKernelNodeSetParams.pNodeParams) data->args.hipGraphKernelNodeSetParams.pNodeParams__val = *(data->args.hipGraphKernelNodeSetParams.pNodeParams);
  break;
// hipGraphLaunch[('hipGraphExec_t', 'graphExec'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipGraphLaunch:
  break;
// hipGraphMemAllocNodeGetParams[('hipGraphNode_t', 'node'), ('hipMemAllocNodeParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphMemAllocNodeGetParams:
  if (data->args.hipGraphMemAllocNodeGetParams.pNodeParams) data->args.hipGraphMemAllocNodeGetParams.pNodeParams__val = *(data->args.hipGraphMemAllocNodeGetParams.pNodeParams);
  break;
// hipGraphMemFreeNodeGetParams[('hipGraphNode_t', 'node'), ('void*', 'dev_ptr')]
case HIP_API_ID_hipGraphMemFreeNodeGetParams:
  break;
// hipGraphMemcpyNodeGetParams[('hipGraphNode_t', 'node'), ('hipMemcpy3DParms*', 'pNodeParams')]
case HIP_API_ID_hipGraphMemcpyNodeGetParams:
  if (data->args.hipGraphMemcpyNodeGetParams.pNodeParams) data->args.hipGraphMemcpyNodeGetParams.pNodeParams__val = *(data->args.hipGraphMemcpyNodeGetParams.pNodeParams);
  break;
// hipGraphMemcpyNodeSetParams[('hipGraphNode_t', 'node'), ('const hipMemcpy3DParms*', 'pNodeParams')]
case HIP_API_ID_hipGraphMemcpyNodeSetParams:
  if (data->args.hipGraphMemcpyNodeSetParams.pNodeParams) data->args.hipGraphMemcpyNodeSetParams.pNodeParams__val = *(data->args.hipGraphMemcpyNodeSetParams.pNodeParams);
  break;
// hipGraphMemcpyNodeSetParams1D[('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'src'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipGraphMemcpyNodeSetParams1D:
  break;
// hipGraphMemcpyNodeSetParamsFromSymbol[('hipGraphNode_t', 'node'), ('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipGraphMemcpyNodeSetParamsFromSymbol:
  break;
// hipGraphMemcpyNodeSetParamsToSymbol[('hipGraphNode_t', 'node'), ('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'count'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipGraphMemcpyNodeSetParamsToSymbol:
  break;
// hipGraphMemsetNodeGetParams[('hipGraphNode_t', 'node'), ('hipMemsetParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphMemsetNodeGetParams:
  if (data->args.hipGraphMemsetNodeGetParams.pNodeParams) data->args.hipGraphMemsetNodeGetParams.pNodeParams__val = *(data->args.hipGraphMemsetNodeGetParams.pNodeParams);
  break;
// hipGraphMemsetNodeSetParams[('hipGraphNode_t', 'node'), ('const hipMemsetParams*', 'pNodeParams')]
case HIP_API_ID_hipGraphMemsetNodeSetParams:
  if (data->args.hipGraphMemsetNodeSetParams.pNodeParams) data->args.hipGraphMemsetNodeSetParams.pNodeParams__val = *(data->args.hipGraphMemsetNodeSetParams.pNodeParams);
  break;
// hipGraphNodeFindInClone[('hipGraphNode_t*', 'pNode'), ('hipGraphNode_t', 'originalNode'), ('hipGraph_t', 'clonedGraph')]
case HIP_API_ID_hipGraphNodeFindInClone:
  if (data->args.hipGraphNodeFindInClone.pNode) data->args.hipGraphNodeFindInClone.pNode__val = *(data->args.hipGraphNodeFindInClone.pNode);
  break;
// hipGraphNodeGetDependencies[('hipGraphNode_t', 'node'), ('hipGraphNode_t*', 'pDependencies'), ('size_t*', 'pNumDependencies')]
case HIP_API_ID_hipGraphNodeGetDependencies:
  if (data->args.hipGraphNodeGetDependencies.pDependencies) data->args.hipGraphNodeGetDependencies.pDependencies__val = *(data->args.hipGraphNodeGetDependencies.pDependencies);
  if (data->args.hipGraphNodeGetDependencies.pNumDependencies) data->args.hipGraphNodeGetDependencies.pNumDependencies__val = *(data->args.hipGraphNodeGetDependencies.pNumDependencies);
  break;
// hipGraphNodeGetDependentNodes[('hipGraphNode_t', 'node'), ('hipGraphNode_t*', 'pDependentNodes'), ('size_t*', 'pNumDependentNodes')]
case HIP_API_ID_hipGraphNodeGetDependentNodes:
  if (data->args.hipGraphNodeGetDependentNodes.pDependentNodes) data->args.hipGraphNodeGetDependentNodes.pDependentNodes__val = *(data->args.hipGraphNodeGetDependentNodes.pDependentNodes);
  if (data->args.hipGraphNodeGetDependentNodes.pNumDependentNodes) data->args.hipGraphNodeGetDependentNodes.pNumDependentNodes__val = *(data->args.hipGraphNodeGetDependentNodes.pNumDependentNodes);
  break;
// hipGraphNodeGetEnabled[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'hNode'), ('unsigned int*', 'isEnabled')]
case HIP_API_ID_hipGraphNodeGetEnabled:
  if (data->args.hipGraphNodeGetEnabled.isEnabled) data->args.hipGraphNodeGetEnabled.isEnabled__val = *(data->args.hipGraphNodeGetEnabled.isEnabled);
  break;
// hipGraphNodeGetType[('hipGraphNode_t', 'node'), ('hipGraphNodeType*', 'pType')]
case HIP_API_ID_hipGraphNodeGetType:
  if (data->args.hipGraphNodeGetType.pType) data->args.hipGraphNodeGetType.pType__val = *(data->args.hipGraphNodeGetType.pType);
  break;
// hipGraphNodeSetEnabled[('hipGraphExec_t', 'hGraphExec'), ('hipGraphNode_t', 'hNode'), ('unsigned int', 'isEnabled')]
case HIP_API_ID_hipGraphNodeSetEnabled:
  break;
// hipGraphReleaseUserObject[('hipGraph_t', 'graph'), ('hipUserObject_t', 'object'), ('unsigned int', 'count')]
case HIP_API_ID_hipGraphReleaseUserObject:
  break;
// hipGraphRemoveDependencies[('hipGraph_t', 'graph'), ('const hipGraphNode_t*', 'from'), ('const hipGraphNode_t*', 'to'), ('size_t', 'numDependencies')]
case HIP_API_ID_hipGraphRemoveDependencies:
  if (data->args.hipGraphRemoveDependencies.from) data->args.hipGraphRemoveDependencies.from__val = *(data->args.hipGraphRemoveDependencies.from);
  if (data->args.hipGraphRemoveDependencies.to) data->args.hipGraphRemoveDependencies.to__val = *(data->args.hipGraphRemoveDependencies.to);
  break;
// hipGraphRetainUserObject[('hipGraph_t', 'graph'), ('hipUserObject_t', 'object'), ('unsigned int', 'count'), ('unsigned int', 'flags')]
case HIP_API_ID_hipGraphRetainUserObject:
  break;
// hipGraphUpload[('hipGraphExec_t', 'graphExec'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipGraphUpload:
  break;
// hipGraphicsGLRegisterBuffer[('hipGraphicsResource**', 'resource'), ('GLuint', 'buffer'), ('unsigned int', 'flags')]
case HIP_API_ID_hipGraphicsGLRegisterBuffer:
  if (data->args.hipGraphicsGLRegisterBuffer.resource) data->args.hipGraphicsGLRegisterBuffer.resource__val = *(data->args.hipGraphicsGLRegisterBuffer.resource);
  break;
// hipGraphicsGLRegisterImage[('hipGraphicsResource**', 'resource'), ('GLuint', 'image'), ('GLenum', 'target'), ('unsigned int', 'flags')]
case HIP_API_ID_hipGraphicsGLRegisterImage:
  if (data->args.hipGraphicsGLRegisterImage.resource) data->args.hipGraphicsGLRegisterImage.resource__val = *(data->args.hipGraphicsGLRegisterImage.resource);
  break;
// hipGraphicsMapResources[('int', 'count'), ('hipGraphicsResource_t*', 'resources'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipGraphicsMapResources:
  if (data->args.hipGraphicsMapResources.resources) data->args.hipGraphicsMapResources.resources__val = *(data->args.hipGraphicsMapResources.resources);
  break;
// hipGraphicsResourceGetMappedPointer[('void**', 'devPtr'), ('size_t*', 'size'), ('hipGraphicsResource_t', 'resource')]
case HIP_API_ID_hipGraphicsResourceGetMappedPointer:
  if (data->args.hipGraphicsResourceGetMappedPointer.devPtr) data->args.hipGraphicsResourceGetMappedPointer.devPtr__val = *(data->args.hipGraphicsResourceGetMappedPointer.devPtr);
  if (data->args.hipGraphicsResourceGetMappedPointer.size) data->args.hipGraphicsResourceGetMappedPointer.size__val = *(data->args.hipGraphicsResourceGetMappedPointer.size);
  break;
// hipGraphicsSubResourceGetMappedArray[('hipArray_t*', 'array'), ('hipGraphicsResource_t', 'resource'), ('unsigned int', 'arrayIndex'), ('unsigned int', 'mipLevel')]
case HIP_API_ID_hipGraphicsSubResourceGetMappedArray:
  if (data->args.hipGraphicsSubResourceGetMappedArray.array) data->args.hipGraphicsSubResourceGetMappedArray.array__val = *(data->args.hipGraphicsSubResourceGetMappedArray.array);
  break;
// hipGraphicsUnmapResources[('int', 'count'), ('hipGraphicsResource_t*', 'resources'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipGraphicsUnmapResources:
  if (data->args.hipGraphicsUnmapResources.resources) data->args.hipGraphicsUnmapResources.resources__val = *(data->args.hipGraphicsUnmapResources.resources);
  break;
// hipGraphicsUnregisterResource[('hipGraphicsResource_t', 'resource')]
case HIP_API_ID_hipGraphicsUnregisterResource:
  break;
// hipHccModuleLaunchKernel[('hipFunction_t', 'f'), ('unsigned int', 'globalWorkSizeX'), ('unsigned int', 'globalWorkSizeY'), ('unsigned int', 'globalWorkSizeZ'), ('unsigned int', 'blockDimX'), ('unsigned int', 'blockDimY'), ('unsigned int', 'blockDimZ'), ('size_t', 'sharedMemBytes'), ('hipStream_t', 'hStream'), ('void**', 'kernelParams'), ('void**', 'extra'), ('hipEvent_t', 'startEvent'), ('hipEvent_t', 'stopEvent')]
case HIP_API_ID_hipHccModuleLaunchKernel:
  if (data->args.hipHccModuleLaunchKernel.kernelParams) data->args.hipHccModuleLaunchKernel.kernelParams__val = *(data->args.hipHccModuleLaunchKernel.kernelParams);
  if (data->args.hipHccModuleLaunchKernel.extra) data->args.hipHccModuleLaunchKernel.extra__val = *(data->args.hipHccModuleLaunchKernel.extra);
  break;
// hipHostAlloc[('void**', 'ptr'), ('size_t', 'size'), ('unsigned int', 'flags')]
case HIP_API_ID_hipHostAlloc:
  if (data->args.hipHostAlloc.ptr) data->args.hipHostAlloc.ptr__val = *(data->args.hipHostAlloc.ptr);
  break;
// hipHostFree[('void*', 'ptr')]
case HIP_API_ID_hipHostFree:
  break;
// hipHostGetDevicePointer[('void**', 'devPtr'), ('void*', 'hstPtr'), ('unsigned int', 'flags')]
case HIP_API_ID_hipHostGetDevicePointer:
  if (data->args.hipHostGetDevicePointer.devPtr) data->args.hipHostGetDevicePointer.devPtr__val = *(data->args.hipHostGetDevicePointer.devPtr);
  break;
// hipHostGetFlags[('unsigned int*', 'flagsPtr'), ('void*', 'hostPtr')]
case HIP_API_ID_hipHostGetFlags:
  if (data->args.hipHostGetFlags.flagsPtr) data->args.hipHostGetFlags.flagsPtr__val = *(data->args.hipHostGetFlags.flagsPtr);
  break;
// hipHostMalloc[('void**', 'ptr'), ('size_t', 'size'), ('unsigned int', 'flags')]
case HIP_API_ID_hipHostMalloc:
  if (data->args.hipHostMalloc.ptr) data->args.hipHostMalloc.ptr__val = *(data->args.hipHostMalloc.ptr);
  break;
// hipHostRegister[('void*', 'hostPtr'), ('size_t', 'sizeBytes'), ('unsigned int', 'flags')]
case HIP_API_ID_hipHostRegister:
  break;
// hipHostUnregister[('void*', 'hostPtr')]
case HIP_API_ID_hipHostUnregister:
  break;
// hipImportExternalMemory[('hipExternalMemory_t*', 'extMem_out'), ('const hipExternalMemoryHandleDesc*', 'memHandleDesc')]
case HIP_API_ID_hipImportExternalMemory:
  if (data->args.hipImportExternalMemory.extMem_out) data->args.hipImportExternalMemory.extMem_out__val = *(data->args.hipImportExternalMemory.extMem_out);
  if (data->args.hipImportExternalMemory.memHandleDesc) data->args.hipImportExternalMemory.memHandleDesc__val = *(data->args.hipImportExternalMemory.memHandleDesc);
  break;
// hipImportExternalSemaphore[('hipExternalSemaphore_t*', 'extSem_out'), ('const hipExternalSemaphoreHandleDesc*', 'semHandleDesc')]
case HIP_API_ID_hipImportExternalSemaphore:
  if (data->args.hipImportExternalSemaphore.extSem_out) data->args.hipImportExternalSemaphore.extSem_out__val = *(data->args.hipImportExternalSemaphore.extSem_out);
  if (data->args.hipImportExternalSemaphore.semHandleDesc) data->args.hipImportExternalSemaphore.semHandleDesc__val = *(data->args.hipImportExternalSemaphore.semHandleDesc);
  break;
// hipInit[('unsigned int', 'flags')]
case HIP_API_ID_hipInit:
  break;
// hipIpcCloseMemHandle[('void*', 'devPtr')]
case HIP_API_ID_hipIpcCloseMemHandle:
  break;
// hipIpcGetEventHandle[('hipIpcEventHandle_t*', 'handle'), ('hipEvent_t', 'event')]
case HIP_API_ID_hipIpcGetEventHandle:
  if (data->args.hipIpcGetEventHandle.handle) data->args.hipIpcGetEventHandle.handle__val = *(data->args.hipIpcGetEventHandle.handle);
  break;
// hipIpcGetMemHandle[('hipIpcMemHandle_t*', 'handle'), ('void*', 'devPtr')]
case HIP_API_ID_hipIpcGetMemHandle:
  if (data->args.hipIpcGetMemHandle.handle) data->args.hipIpcGetMemHandle.handle__val = *(data->args.hipIpcGetMemHandle.handle);
  break;
// hipIpcOpenEventHandle[('hipEvent_t*', 'event'), ('hipIpcEventHandle_t', 'handle')]
case HIP_API_ID_hipIpcOpenEventHandle:
  if (data->args.hipIpcOpenEventHandle.event) data->args.hipIpcOpenEventHandle.event__val = *(data->args.hipIpcOpenEventHandle.event);
  break;
// hipIpcOpenMemHandle[('void**', 'devPtr'), ('hipIpcMemHandle_t', 'handle'), ('unsigned int', 'flags')]
case HIP_API_ID_hipIpcOpenMemHandle:
  if (data->args.hipIpcOpenMemHandle.devPtr) data->args.hipIpcOpenMemHandle.devPtr__val = *(data->args.hipIpcOpenMemHandle.devPtr);
  break;
// hipLaunchByPtr[('const void*', 'hostFunction')]
case HIP_API_ID_hipLaunchByPtr:
  break;
// hipLaunchCooperativeKernel[('const void*', 'f'), ('dim3', 'gridDim'), ('dim3', 'blockDimX'), ('void**', 'kernelParams'), ('unsigned int', 'sharedMemBytes'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipLaunchCooperativeKernel:
  if (data->args.hipLaunchCooperativeKernel.kernelParams) data->args.hipLaunchCooperativeKernel.kernelParams__val = *(data->args.hipLaunchCooperativeKernel.kernelParams);
  break;
// hipLaunchCooperativeKernelMultiDevice[('hipLaunchParams*', 'launchParamsList'), ('int', 'numDevices'), ('unsigned int', 'flags')]
case HIP_API_ID_hipLaunchCooperativeKernelMultiDevice:
  if (data->args.hipLaunchCooperativeKernelMultiDevice.launchParamsList) data->args.hipLaunchCooperativeKernelMultiDevice.launchParamsList__val = *(data->args.hipLaunchCooperativeKernelMultiDevice.launchParamsList);
  break;
// hipLaunchHostFunc[('hipStream_t', 'stream'), ('hipHostFn_t', 'fn'), ('void*', 'userData')]
case HIP_API_ID_hipLaunchHostFunc:
  break;
// hipLaunchKernel[('const void*', 'function_address'), ('dim3', 'numBlocks'), ('dim3', 'dimBlocks'), ('void**', 'args'), ('size_t', 'sharedMemBytes'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipLaunchKernel:
  if (data->args.hipLaunchKernel.args) data->args.hipLaunchKernel.args__val = *(data->args.hipLaunchKernel.args);
  break;
// hipMalloc[('void**', 'ptr'), ('size_t', 'size')]
case HIP_API_ID_hipMalloc:
  if (data->args.hipMalloc.ptr) data->args.hipMalloc.ptr__val = *(data->args.hipMalloc.ptr);
  break;
// hipMalloc3D[('hipPitchedPtr*', 'pitchedDevPtr'), ('hipExtent', 'extent')]
case HIP_API_ID_hipMalloc3D:
  if (data->args.hipMalloc3D.pitchedDevPtr) data->args.hipMalloc3D.pitchedDevPtr__val = *(data->args.hipMalloc3D.pitchedDevPtr);
  break;
// hipMalloc3DArray[('hipArray_t*', 'array'), ('const hipChannelFormatDesc*', 'desc'), ('hipExtent', 'extent'), ('unsigned int', 'flags')]
case HIP_API_ID_hipMalloc3DArray:
  if (data->args.hipMalloc3DArray.array) data->args.hipMalloc3DArray.array__val = *(data->args.hipMalloc3DArray.array);
  if (data->args.hipMalloc3DArray.desc) data->args.hipMalloc3DArray.desc__val = *(data->args.hipMalloc3DArray.desc);
  break;
// hipMallocArray[('hipArray**', 'array'), ('const hipChannelFormatDesc*', 'desc'), ('size_t', 'width'), ('size_t', 'height'), ('unsigned int', 'flags')]
case HIP_API_ID_hipMallocArray:
  if (data->args.hipMallocArray.array) data->args.hipMallocArray.array__val = *(data->args.hipMallocArray.array);
  if (data->args.hipMallocArray.desc) data->args.hipMallocArray.desc__val = *(data->args.hipMallocArray.desc);
  break;
// hipMallocAsync[('void**', 'dev_ptr'), ('size_t', 'size'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMallocAsync:
  if (data->args.hipMallocAsync.dev_ptr) data->args.hipMallocAsync.dev_ptr__val = *(data->args.hipMallocAsync.dev_ptr);
  break;
// hipMallocFromPoolAsync[('void**', 'dev_ptr'), ('size_t', 'size'), ('hipMemPool_t', 'mem_pool'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMallocFromPoolAsync:
  if (data->args.hipMallocFromPoolAsync.dev_ptr) data->args.hipMallocFromPoolAsync.dev_ptr__val = *(data->args.hipMallocFromPoolAsync.dev_ptr);
  break;
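// [Editorial note] The allocation cases above enable the classic tracing
// idiom: on API exit, ptr__val / dev_ptr__val holds the device pointer the
// call just produced. A sketch of a leak-style tracker built on that, with
// hypothetical track_alloc/track_free helpers that are not part of this
// header:
//
//   case HIP_API_ID_hipMalloc:   // in a user callback, exit phase
//     track_alloc(data->args.hipMalloc.ptr__val, data->args.hipMalloc.size);
//     break;
//   case HIP_API_ID_hipFree:
//     track_free(data->args.hipFree.ptr);
//     break;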
// hipMallocHost[('void**', 'ptr'), ('size_t', 'size')]
case HIP_API_ID_hipMallocHost:
  if (data->args.hipMallocHost.ptr) data->args.hipMallocHost.ptr__val = *(data->args.hipMallocHost.ptr);
  break;
// hipMallocManaged[('void**', 'dev_ptr'), ('size_t', 'size'), ('unsigned int', 'flags')]
case HIP_API_ID_hipMallocManaged:
  if (data->args.hipMallocManaged.dev_ptr) data->args.hipMallocManaged.dev_ptr__val = *(data->args.hipMallocManaged.dev_ptr);
  break;
// hipMallocMipmappedArray[('hipMipmappedArray_t*', 'mipmappedArray'), ('const hipChannelFormatDesc*', 'desc'), ('hipExtent', 'extent'), ('unsigned int', 'numLevels'), ('unsigned int', 'flags')]
case HIP_API_ID_hipMallocMipmappedArray:
  if (data->args.hipMallocMipmappedArray.mipmappedArray) data->args.hipMallocMipmappedArray.mipmappedArray__val = *(data->args.hipMallocMipmappedArray.mipmappedArray);
  if (data->args.hipMallocMipmappedArray.desc) data->args.hipMallocMipmappedArray.desc__val = *(data->args.hipMallocMipmappedArray.desc);
  break;
// hipMallocPitch[('void**', 'ptr'), ('size_t*', 'pitch'), ('size_t', 'width'), ('size_t', 'height')]
case HIP_API_ID_hipMallocPitch:
  if (data->args.hipMallocPitch.ptr) data->args.hipMallocPitch.ptr__val = *(data->args.hipMallocPitch.ptr);
  if (data->args.hipMallocPitch.pitch) data->args.hipMallocPitch.pitch__val = *(data->args.hipMallocPitch.pitch);
  break;
// hipMemAddressFree[('void*', 'devPtr'), ('size_t', 'size')]
case HIP_API_ID_hipMemAddressFree:
  break;
// hipMemAddressReserve[('void**', 'ptr'), ('size_t', 'size'), ('size_t', 'alignment'), ('void*', 'addr'), ('unsigned long long', 'flags')]
case HIP_API_ID_hipMemAddressReserve:
  if (data->args.hipMemAddressReserve.ptr) data->args.hipMemAddressReserve.ptr__val = *(data->args.hipMemAddressReserve.ptr);
  break;
// hipMemAdvise[('const void*', 'dev_ptr'), ('size_t', 'count'), ('hipMemoryAdvise', 'advice'), ('int', 'device')]
case HIP_API_ID_hipMemAdvise:
  break;
// hipMemAllocHost[('void**', 'ptr'), ('size_t', 'size')]
case HIP_API_ID_hipMemAllocHost:
  if (data->args.hipMemAllocHost.ptr) data->args.hipMemAllocHost.ptr__val = *(data->args.hipMemAllocHost.ptr);
  break;
// hipMemAllocPitch[('hipDeviceptr_t*', 'dptr'), ('size_t*', 'pitch'), ('size_t', 'widthInBytes'), ('size_t', 'height'), ('unsigned int', 'elementSizeBytes')]
case HIP_API_ID_hipMemAllocPitch:
  if (data->args.hipMemAllocPitch.dptr) data->args.hipMemAllocPitch.dptr__val = *(data->args.hipMemAllocPitch.dptr);
  if (data->args.hipMemAllocPitch.pitch) data->args.hipMemAllocPitch.pitch__val = *(data->args.hipMemAllocPitch.pitch);
  break;
// hipMemCreate[('hipMemGenericAllocationHandle_t*', 'handle'), ('size_t', 'size'), ('const hipMemAllocationProp*', 'prop'), ('unsigned long long', 'flags')]
case HIP_API_ID_hipMemCreate:
  if (data->args.hipMemCreate.handle) data->args.hipMemCreate.handle__val = *(data->args.hipMemCreate.handle);
  if (data->args.hipMemCreate.prop) data->args.hipMemCreate.prop__val = *(data->args.hipMemCreate.prop);
  break;
// hipMemExportToShareableHandle[('void*', 'shareableHandle'), ('hipMemGenericAllocationHandle_t', 'handle'), ('hipMemAllocationHandleType', 'handleType'), ('unsigned long long', 'flags')]
case HIP_API_ID_hipMemExportToShareableHandle:
  break;
// hipMemGetAccess[('unsigned long long*', 'flags'), ('const hipMemLocation*', 'location'), ('void*', 'ptr')]
case HIP_API_ID_hipMemGetAccess:
  if (data->args.hipMemGetAccess.flags) data->args.hipMemGetAccess.flags__val = *(data->args.hipMemGetAccess.flags);
  if (data->args.hipMemGetAccess.location) data->args.hipMemGetAccess.location__val = *(data->args.hipMemGetAccess.location);
  break;
// hipMemGetAddressRange[('hipDeviceptr_t*', 'pbase'), ('size_t*', 'psize'), ('hipDeviceptr_t', 'dptr')]
case HIP_API_ID_hipMemGetAddressRange:
  if (data->args.hipMemGetAddressRange.pbase) data->args.hipMemGetAddressRange.pbase__val = *(data->args.hipMemGetAddressRange.pbase);
  if (data->args.hipMemGetAddressRange.psize) data->args.hipMemGetAddressRange.psize__val = *(data->args.hipMemGetAddressRange.psize);
  break;
// hipMemGetAllocationGranularity[('size_t*', 'granularity'), ('const hipMemAllocationProp*', 'prop'), ('hipMemAllocationGranularity_flags', 'option')]
case HIP_API_ID_hipMemGetAllocationGranularity:
  if (data->args.hipMemGetAllocationGranularity.granularity) data->args.hipMemGetAllocationGranularity.granularity__val = *(data->args.hipMemGetAllocationGranularity.granularity);
  if (data->args.hipMemGetAllocationGranularity.prop) data->args.hipMemGetAllocationGranularity.prop__val = *(data->args.hipMemGetAllocationGranularity.prop);
  break;
// hipMemGetAllocationPropertiesFromHandle[('hipMemAllocationProp*', 'prop'), ('hipMemGenericAllocationHandle_t', 'handle')]
case HIP_API_ID_hipMemGetAllocationPropertiesFromHandle:
  if (data->args.hipMemGetAllocationPropertiesFromHandle.prop) data->args.hipMemGetAllocationPropertiesFromHandle.prop__val = *(data->args.hipMemGetAllocationPropertiesFromHandle.prop);
  break;
// hipMemGetInfo[('size_t*', 'free'), ('size_t*', 'total')]
case HIP_API_ID_hipMemGetInfo:
  if (data->args.hipMemGetInfo.free) data->args.hipMemGetInfo.free__val = *(data->args.hipMemGetInfo.free);
  if (data->args.hipMemGetInfo.total) data->args.hipMemGetInfo.total__val = *(data->args.hipMemGetInfo.total);
  break;
// hipMemImportFromShareableHandle[('hipMemGenericAllocationHandle_t*', 'handle'), ('void*', 'osHandle'), ('hipMemAllocationHandleType', 'shHandleType')]
case HIP_API_ID_hipMemImportFromShareableHandle:
  if (data->args.hipMemImportFromShareableHandle.handle) data->args.hipMemImportFromShareableHandle.handle__val = *(data->args.hipMemImportFromShareableHandle.handle);
  break;
// hipMemMap[('void*', 'ptr'), ('size_t', 'size'), ('size_t', 'offset'), ('hipMemGenericAllocationHandle_t', 'handle'), ('unsigned long long', 'flags')]
case HIP_API_ID_hipMemMap:
  break;
// hipMemMapArrayAsync[('hipArrayMapInfo*', 'mapInfoList'), ('unsigned int', 'count'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemMapArrayAsync:
  if (data->args.hipMemMapArrayAsync.mapInfoList) data->args.hipMemMapArrayAsync.mapInfoList__val = *(data->args.hipMemMapArrayAsync.mapInfoList);
  break;
// hipMemPoolCreate[('hipMemPool_t*', 'mem_pool'), ('const hipMemPoolProps*', 'pool_props')]
case HIP_API_ID_hipMemPoolCreate:
  if (data->args.hipMemPoolCreate.mem_pool) data->args.hipMemPoolCreate.mem_pool__val = *(data->args.hipMemPoolCreate.mem_pool);
  if (data->args.hipMemPoolCreate.pool_props) data->args.hipMemPoolCreate.pool_props__val = *(data->args.hipMemPoolCreate.pool_props);
  break;
// hipMemPoolDestroy[('hipMemPool_t', 'mem_pool')]
case HIP_API_ID_hipMemPoolDestroy:
  break;
// hipMemPoolExportPointer[('hipMemPoolPtrExportData*', 'export_data'), ('void*', 'dev_ptr')]
case HIP_API_ID_hipMemPoolExportPointer:
  if (data->args.hipMemPoolExportPointer.export_data) data->args.hipMemPoolExportPointer.export_data__val = *(data->args.hipMemPoolExportPointer.export_data);
  break;
// hipMemPoolExportToShareableHandle[('void*', 'shared_handle'), ('hipMemPool_t', 'mem_pool'), ('hipMemAllocationHandleType', 'handle_type'), ('unsigned int', 'flags')]
case HIP_API_ID_hipMemPoolExportToShareableHandle:
  break;
// hipMemPoolGetAccess[('hipMemAccessFlags*', 'flags'), ('hipMemPool_t', 'mem_pool'), ('hipMemLocation*', 'location')]
case HIP_API_ID_hipMemPoolGetAccess:
  if (data->args.hipMemPoolGetAccess.flags) data->args.hipMemPoolGetAccess.flags__val = *(data->args.hipMemPoolGetAccess.flags);
  if (data->args.hipMemPoolGetAccess.location) data->args.hipMemPoolGetAccess.location__val = *(data->args.hipMemPoolGetAccess.location);
  break;
// hipMemPoolGetAttribute[('hipMemPool_t', 'mem_pool'), ('hipMemPoolAttr', 'attr'), ('void*', 'value')]
case HIP_API_ID_hipMemPoolGetAttribute:
  break;
// hipMemPoolImportFromShareableHandle[('hipMemPool_t*', 'mem_pool'), ('void*', 'shared_handle'), ('hipMemAllocationHandleType', 'handle_type'), ('unsigned int', 'flags')]
case HIP_API_ID_hipMemPoolImportFromShareableHandle:
  if (data->args.hipMemPoolImportFromShareableHandle.mem_pool) data->args.hipMemPoolImportFromShareableHandle.mem_pool__val = *(data->args.hipMemPoolImportFromShareableHandle.mem_pool);
  break;
// hipMemPoolImportPointer[('void**', 'dev_ptr'), ('hipMemPool_t', 'mem_pool'), ('hipMemPoolPtrExportData*', 'export_data')]
case HIP_API_ID_hipMemPoolImportPointer:
  if (data->args.hipMemPoolImportPointer.dev_ptr) data->args.hipMemPoolImportPointer.dev_ptr__val = *(data->args.hipMemPoolImportPointer.dev_ptr);
  if (data->args.hipMemPoolImportPointer.export_data) data->args.hipMemPoolImportPointer.export_data__val = *(data->args.hipMemPoolImportPointer.export_data);
  break;
// hipMemPoolSetAccess[('hipMemPool_t', 'mem_pool'), ('const hipMemAccessDesc*', 'desc_list'), ('size_t', 'count')]
case HIP_API_ID_hipMemPoolSetAccess:
  if (data->args.hipMemPoolSetAccess.desc_list) data->args.hipMemPoolSetAccess.desc_list__val = *(data->args.hipMemPoolSetAccess.desc_list);
  break;
// hipMemPoolSetAttribute[('hipMemPool_t', 'mem_pool'), ('hipMemPoolAttr', 'attr'), ('void*', 'value')]
case HIP_API_ID_hipMemPoolSetAttribute:
  break;
// hipMemPoolTrimTo[('hipMemPool_t', 'mem_pool'), ('size_t', 'min_bytes_to_hold')]
case HIP_API_ID_hipMemPoolTrimTo:
  break;
// hipMemPrefetchAsync[('const void*', 'dev_ptr'), ('size_t', 'count'), ('int', 'device'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemPrefetchAsync:
  break;
// hipMemPtrGetInfo[('void*', 'ptr'), ('size_t*', 'size')]
case HIP_API_ID_hipMemPtrGetInfo:
  if (data->args.hipMemPtrGetInfo.size) data->args.hipMemPtrGetInfo.size__val = *(data->args.hipMemPtrGetInfo.size);
  break;
// hipMemRangeGetAttribute[('void*', 'data'), ('size_t', 'data_size'), ('hipMemRangeAttribute', 'attribute'), ('const void*', 'dev_ptr'), ('size_t', 'count')]
case HIP_API_ID_hipMemRangeGetAttribute:
  break;
// hipMemRangeGetAttributes[('void**', 'data'), ('size_t*', 'data_sizes'), ('hipMemRangeAttribute*', 'attributes'), ('size_t', 'num_attributes'), ('const void*', 'dev_ptr'), ('size_t', 'count')]
case HIP_API_ID_hipMemRangeGetAttributes:
  if (data->args.hipMemRangeGetAttributes.data) data->args.hipMemRangeGetAttributes.data__val = *(data->args.hipMemRangeGetAttributes.data);
  if (data->args.hipMemRangeGetAttributes.data_sizes) data->args.hipMemRangeGetAttributes.data_sizes__val = *(data->args.hipMemRangeGetAttributes.data_sizes);
  if (data->args.hipMemRangeGetAttributes.attributes) data->args.hipMemRangeGetAttributes.attributes__val = *(data->args.hipMemRangeGetAttributes.attributes);
  break;
// hipMemRelease[('hipMemGenericAllocationHandle_t', 'handle')]
case HIP_API_ID_hipMemRelease:
  break;
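// [Editorial note] hipMemPoolGetAttribute/hipMemPoolSetAttribute above take
// their value through an untyped void*, so the generator cannot know the
// pointee type (it varies with hipMemPoolAttr) and deliberately captures
// nothing. A callback wanting the value must cast per-attribute itself, e.g.
// (sketch, assuming a 64-bit counter attribute was queried):
//
//   uint64_t v = *(const uint64_t*)data->args.hipMemPoolGetAttribute.value;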
// hipMemRetainAllocationHandle[('hipMemGenericAllocationHandle_t*', 'handle'), ('void*', 'addr')]
case HIP_API_ID_hipMemRetainAllocationHandle:
  if (data->args.hipMemRetainAllocationHandle.handle) data->args.hipMemRetainAllocationHandle.handle__val = *(data->args.hipMemRetainAllocationHandle.handle);
  break;
// hipMemSetAccess[('void*', 'ptr'), ('size_t', 'size'), ('const hipMemAccessDesc*', 'desc'), ('size_t', 'count')]
case HIP_API_ID_hipMemSetAccess:
  if (data->args.hipMemSetAccess.desc) data->args.hipMemSetAccess.desc__val = *(data->args.hipMemSetAccess.desc);
  break;
// hipMemUnmap[('void*', 'ptr'), ('size_t', 'size')]
case HIP_API_ID_hipMemUnmap:
  break;
// hipMemcpy[('void*', 'dst'), ('const void*', 'src'), ('size_t', 'sizeBytes'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipMemcpy:
  break;
// hipMemcpy2D[('void*', 'dst'), ('size_t', 'dpitch'), ('const void*', 'src'), ('size_t', 'spitch'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipMemcpy2D:
  break;
// hipMemcpy2DAsync[('void*', 'dst'), ('size_t', 'dpitch'), ('const void*', 'src'), ('size_t', 'spitch'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpy2DAsync:
  break;
// hipMemcpy2DFromArray[('void*', 'dst'), ('size_t', 'dpitch'), ('hipArray_const_t', 'src'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipMemcpy2DFromArray:
  break;
// hipMemcpy2DFromArrayAsync[('void*', 'dst'), ('size_t', 'dpitch'), ('hipArray_const_t', 'src'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpy2DFromArrayAsync:
  break;
// hipMemcpy2DToArray[('hipArray*', 'dst'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('const void*', 'src'), ('size_t', 'spitch'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipMemcpy2DToArray:
  if (data->args.hipMemcpy2DToArray.dst) data->args.hipMemcpy2DToArray.dst__val = *(data->args.hipMemcpy2DToArray.dst);
  break;
// hipMemcpy2DToArrayAsync[('hipArray*', 'dst'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('const void*', 'src'), ('size_t', 'spitch'), ('size_t', 'width'), ('size_t', 'height'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpy2DToArrayAsync:
  if (data->args.hipMemcpy2DToArrayAsync.dst) data->args.hipMemcpy2DToArrayAsync.dst__val = *(data->args.hipMemcpy2DToArrayAsync.dst);
  break;
// hipMemcpy3D[('const hipMemcpy3DParms*', 'p')]
case HIP_API_ID_hipMemcpy3D:
  if (data->args.hipMemcpy3D.p) data->args.hipMemcpy3D.p__val = *(data->args.hipMemcpy3D.p);
  break;
// hipMemcpy3DAsync[('const hipMemcpy3DParms*', 'p'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpy3DAsync:
  if (data->args.hipMemcpy3DAsync.p) data->args.hipMemcpy3DAsync.p__val = *(data->args.hipMemcpy3DAsync.p);
  break;
// hipMemcpyAsync[('void*', 'dst'), ('const void*', 'src'), ('size_t', 'sizeBytes'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpyAsync:
  break;
// hipMemcpyAtoH[('void*', 'dst'), ('hipArray*', 'srcArray'), ('size_t', 'srcOffset'), ('size_t', 'count')]
case HIP_API_ID_hipMemcpyAtoH:
  if (data->args.hipMemcpyAtoH.srcArray) data->args.hipMemcpyAtoH.srcArray__val = *(data->args.hipMemcpyAtoH.srcArray);
  break;
// hipMemcpyDtoD[('hipDeviceptr_t', 'dst'), ('hipDeviceptr_t', 'src'), ('size_t', 'sizeBytes')]
case HIP_API_ID_hipMemcpyDtoD:
  break;
// hipMemcpyDtoDAsync[('hipDeviceptr_t', 'dst'), ('hipDeviceptr_t', 'src'), ('size_t', 'sizeBytes'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpyDtoDAsync:
  break;
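// [Editorial note] The long run of empty hipMemcpy*/hipMemset* cases here and
// below is not an omission: dst, src, and sizeBytes travel by value in the
// args union already, so there is nothing extra to snapshot. Only arguments
// that point at data the caller reads back, or descriptor structs such as
// hipMemcpy3DParms, need the *__val treatment.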
// hipMemcpyDtoH[('void*', 'dst'), ('hipDeviceptr_t', 'src'), ('size_t', 'sizeBytes')]
case HIP_API_ID_hipMemcpyDtoH:
  break;
// hipMemcpyDtoHAsync[('void*', 'dst'), ('hipDeviceptr_t', 'src'), ('size_t', 'sizeBytes'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpyDtoHAsync:
  break;
// hipMemcpyFromArray[('void*', 'dst'), ('hipArray_const_t', 'srcArray'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipMemcpyFromArray:
  break;
// hipMemcpyFromSymbol[('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'sizeBytes'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipMemcpyFromSymbol:
  break;
// hipMemcpyFromSymbolAsync[('void*', 'dst'), ('const void*', 'symbol'), ('size_t', 'sizeBytes'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpyFromSymbolAsync:
  break;
// hipMemcpyHtoA[('hipArray*', 'dstArray'), ('size_t', 'dstOffset'), ('const void*', 'srcHost'), ('size_t', 'count')]
case HIP_API_ID_hipMemcpyHtoA:
  if (data->args.hipMemcpyHtoA.dstArray) data->args.hipMemcpyHtoA.dstArray__val = *(data->args.hipMemcpyHtoA.dstArray);
  break;
// hipMemcpyHtoD[('hipDeviceptr_t', 'dst'), ('void*', 'src'), ('size_t', 'sizeBytes')]
case HIP_API_ID_hipMemcpyHtoD:
  break;
// hipMemcpyHtoDAsync[('hipDeviceptr_t', 'dst'), ('void*', 'src'), ('size_t', 'sizeBytes'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpyHtoDAsync:
  break;
// hipMemcpyParam2D[('const hip_Memcpy2D*', 'pCopy')]
case HIP_API_ID_hipMemcpyParam2D:
  if (data->args.hipMemcpyParam2D.pCopy) data->args.hipMemcpyParam2D.pCopy__val = *(data->args.hipMemcpyParam2D.pCopy);
  break;
// hipMemcpyParam2DAsync[('const hip_Memcpy2D*', 'pCopy'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpyParam2DAsync:
  if (data->args.hipMemcpyParam2DAsync.pCopy) data->args.hipMemcpyParam2DAsync.pCopy__val = *(data->args.hipMemcpyParam2DAsync.pCopy);
  break;
// hipMemcpyPeer[('void*', 'dst'), ('int', 'dstDeviceId'), ('const void*', 'src'), ('int', 'srcDeviceId'), ('size_t', 'sizeBytes')]
case HIP_API_ID_hipMemcpyPeer:
  break;
// hipMemcpyPeerAsync[('void*', 'dst'), ('int', 'dstDeviceId'), ('const void*', 'src'), ('int', 'srcDevice'), ('size_t', 'sizeBytes'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpyPeerAsync:
  break;
// hipMemcpyToArray[('hipArray*', 'dst'), ('size_t', 'wOffset'), ('size_t', 'hOffset'), ('const void*', 'src'), ('size_t', 'count'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipMemcpyToArray:
  if (data->args.hipMemcpyToArray.dst) data->args.hipMemcpyToArray.dst__val = *(data->args.hipMemcpyToArray.dst);
  break;
// hipMemcpyToSymbol[('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'sizeBytes'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind')]
case HIP_API_ID_hipMemcpyToSymbol:
  break;
// hipMemcpyToSymbolAsync[('const void*', 'symbol'), ('const void*', 'src'), ('size_t', 'sizeBytes'), ('size_t', 'offset'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpyToSymbolAsync:
  break;
// hipMemcpyWithStream[('void*', 'dst'), ('const void*', 'src'), ('size_t', 'sizeBytes'), ('hipMemcpyKind', 'kind'), ('hipStream_t', 'stream')]
case HIP_API_ID_hipMemcpyWithStream:
  break;
// hipMemset[('void*', 'dst'), ('int', 'value'), ('size_t', 'sizeBytes')]
case HIP_API_ID_hipMemset:
  break;
// hipMemset2D[('void*', 'dst'), ('size_t', 'pitch'), ('int', 'value'), ('size_t', 'width'), ('size_t', 'height')]
case HIP_API_ID_hipMemset2D:
  break;
('int', 'value'), ('size_t', 'width'), ('size_t', 'height'), ('hipStream_t', 'stream')] case HIP_API_ID_hipMemset2DAsync: break; // hipMemset3D[('hipPitchedPtr', 'pitchedDevPtr'), ('int', 'value'), ('hipExtent', 'extent')] case HIP_API_ID_hipMemset3D: break; // hipMemset3DAsync[('hipPitchedPtr', 'pitchedDevPtr'), ('int', 'value'), ('hipExtent', 'extent'), ('hipStream_t', 'stream')] case HIP_API_ID_hipMemset3DAsync: break; // hipMemsetAsync[('void*', 'dst'), ('int', 'value'), ('size_t', 'sizeBytes'), ('hipStream_t', 'stream')] case HIP_API_ID_hipMemsetAsync: break; // hipMemsetD16[('hipDeviceptr_t', 'dest'), ('unsigned short', 'value'), ('size_t', 'count')] case HIP_API_ID_hipMemsetD16: break; // hipMemsetD16Async[('hipDeviceptr_t', 'dest'), ('unsigned short', 'value'), ('size_t', 'count'), ('hipStream_t', 'stream')] case HIP_API_ID_hipMemsetD16Async: break; // hipMemsetD32[('hipDeviceptr_t', 'dest'), ('int', 'value'), ('size_t', 'count')] case HIP_API_ID_hipMemsetD32: break; // hipMemsetD32Async[('hipDeviceptr_t', 'dst'), ('int', 'value'), ('size_t', 'count'), ('hipStream_t', 'stream')] case HIP_API_ID_hipMemsetD32Async: break; // hipMemsetD8[('hipDeviceptr_t', 'dest'), ('unsigned char', 'value'), ('size_t', 'count')] case HIP_API_ID_hipMemsetD8: break; // hipMemsetD8Async[('hipDeviceptr_t', 'dest'), ('unsigned char', 'value'), ('size_t', 'count'), ('hipStream_t', 'stream')] case HIP_API_ID_hipMemsetD8Async: break; // hipMipmappedArrayCreate[('hipMipmappedArray_t*', 'pHandle'), ('HIP_ARRAY3D_DESCRIPTOR*', 'pMipmappedArrayDesc'), ('unsigned int', 'numMipmapLevels')] case HIP_API_ID_hipMipmappedArrayCreate: if (data->args.hipMipmappedArrayCreate.pHandle) data->args.hipMipmappedArrayCreate.pHandle__val = *(data->args.hipMipmappedArrayCreate.pHandle); if (data->args.hipMipmappedArrayCreate.pMipmappedArrayDesc) data->args.hipMipmappedArrayCreate.pMipmappedArrayDesc__val = *(data->args.hipMipmappedArrayCreate.pMipmappedArrayDesc); break; // hipMipmappedArrayDestroy[('hipMipmappedArray_t', 'hMipmappedArray')] case HIP_API_ID_hipMipmappedArrayDestroy: break; // hipMipmappedArrayGetLevel[('hipArray_t*', 'pLevelArray'), ('hipMipmappedArray_t', 'hMipMappedArray'), ('unsigned int', 'level')] case HIP_API_ID_hipMipmappedArrayGetLevel: if (data->args.hipMipmappedArrayGetLevel.pLevelArray) data->args.hipMipmappedArrayGetLevel.pLevelArray__val = *(data->args.hipMipmappedArrayGetLevel.pLevelArray); break; // hipModuleGetFunction[('hipFunction_t*', 'function'), ('hipModule_t', 'module'), ('const char*', 'kname')] case HIP_API_ID_hipModuleGetFunction: if (data->args.hipModuleGetFunction.function) data->args.hipModuleGetFunction.function__val = *(data->args.hipModuleGetFunction.function); if (data->args.hipModuleGetFunction.kname) data->args.hipModuleGetFunction.kname__val = *(data->args.hipModuleGetFunction.kname); break; // hipModuleGetGlobal[('hipDeviceptr_t*', 'dptr'), ('size_t*', 'bytes'), ('hipModule_t', 'hmod'), ('const char*', 'name')] case HIP_API_ID_hipModuleGetGlobal: if (data->args.hipModuleGetGlobal.dptr) data->args.hipModuleGetGlobal.dptr__val = *(data->args.hipModuleGetGlobal.dptr); if (data->args.hipModuleGetGlobal.bytes) data->args.hipModuleGetGlobal.bytes__val = *(data->args.hipModuleGetGlobal.bytes); if (data->args.hipModuleGetGlobal.name) data->args.hipModuleGetGlobal.name__val = *(data->args.hipModuleGetGlobal.name); break; // hipModuleGetTexRef[('textureReference**', 'texRef'), ('hipModule_t', 'hmod'), ('const char*', 'name')] case HIP_API_ID_hipModuleGetTexRef: if 
(data->args.hipModuleGetTexRef.texRef) data->args.hipModuleGetTexRef.texRef__val = *(data->args.hipModuleGetTexRef.texRef); if (data->args.hipModuleGetTexRef.name) data->args.hipModuleGetTexRef.name__val = *(data->args.hipModuleGetTexRef.name); break; // hipModuleLaunchCooperativeKernel[('hipFunction_t', 'f'), ('unsigned int', 'gridDimX'), ('unsigned int', 'gridDimY'), ('unsigned int', 'gridDimZ'), ('unsigned int', 'blockDimX'), ('unsigned int', 'blockDimY'), ('unsigned int', 'blockDimZ'), ('unsigned int', 'sharedMemBytes'), ('hipStream_t', 'stream'), ('void**', 'kernelParams')] case HIP_API_ID_hipModuleLaunchCooperativeKernel: if (data->args.hipModuleLaunchCooperativeKernel.kernelParams) data->args.hipModuleLaunchCooperativeKernel.kernelParams__val = *(data->args.hipModuleLaunchCooperativeKernel.kernelParams); break; // hipModuleLaunchCooperativeKernelMultiDevice[('hipFunctionLaunchParams*', 'launchParamsList'), ('unsigned int', 'numDevices'), ('unsigned int', 'flags')] case HIP_API_ID_hipModuleLaunchCooperativeKernelMultiDevice: if (data->args.hipModuleLaunchCooperativeKernelMultiDevice.launchParamsList) data->args.hipModuleLaunchCooperativeKernelMultiDevice.launchParamsList__val = *(data->args.hipModuleLaunchCooperativeKernelMultiDevice.launchParamsList); break; // hipModuleLaunchKernel[('hipFunction_t', 'f'), ('unsigned int', 'gridDimX'), ('unsigned int', 'gridDimY'), ('unsigned int', 'gridDimZ'), ('unsigned int', 'blockDimX'), ('unsigned int', 'blockDimY'), ('unsigned int', 'blockDimZ'), ('unsigned int', 'sharedMemBytes'), ('hipStream_t', 'stream'), ('void**', 'kernelParams'), ('void**', 'extra')] case HIP_API_ID_hipModuleLaunchKernel: if (data->args.hipModuleLaunchKernel.kernelParams) data->args.hipModuleLaunchKernel.kernelParams__val = *(data->args.hipModuleLaunchKernel.kernelParams); if (data->args.hipModuleLaunchKernel.extra) data->args.hipModuleLaunchKernel.extra__val = *(data->args.hipModuleLaunchKernel.extra); break; // hipModuleLoad[('hipModule_t*', 'module'), ('const char*', 'fname')] case HIP_API_ID_hipModuleLoad: if (data->args.hipModuleLoad.module) data->args.hipModuleLoad.module__val = *(data->args.hipModuleLoad.module); if (data->args.hipModuleLoad.fname) data->args.hipModuleLoad.fname__val = *(data->args.hipModuleLoad.fname); break; // hipModuleLoadData[('hipModule_t*', 'module'), ('const void*', 'image')] case HIP_API_ID_hipModuleLoadData: if (data->args.hipModuleLoadData.module) data->args.hipModuleLoadData.module__val = *(data->args.hipModuleLoadData.module); break; // hipModuleLoadDataEx[('hipModule_t*', 'module'), ('const void*', 'image'), ('unsigned int', 'numOptions'), ('hipJitOption*', 'options'), ('void**', 'optionsValues')] case HIP_API_ID_hipModuleLoadDataEx: if (data->args.hipModuleLoadDataEx.module) data->args.hipModuleLoadDataEx.module__val = *(data->args.hipModuleLoadDataEx.module); if (data->args.hipModuleLoadDataEx.options) data->args.hipModuleLoadDataEx.options__val = *(data->args.hipModuleLoadDataEx.options); if (data->args.hipModuleLoadDataEx.optionsValues) data->args.hipModuleLoadDataEx.optionsValues__val = *(data->args.hipModuleLoadDataEx.optionsValues); break; // hipModuleOccupancyMaxActiveBlocksPerMultiprocessor[('int*', 'numBlocks'), ('hipFunction_t', 'f'), ('int', 'blockSize'), ('size_t', 'dynSharedMemPerBlk')] case HIP_API_ID_hipModuleOccupancyMaxActiveBlocksPerMultiprocessor: if (data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks) data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks__val = 
*(data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks); break; // hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags[('int*', 'numBlocks'), ('hipFunction_t', 'f'), ('int', 'blockSize'), ('size_t', 'dynSharedMemPerBlk'), ('unsigned int', 'flags')] case HIP_API_ID_hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags: if (data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks) data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks__val = *(data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks); break; // hipModuleOccupancyMaxPotentialBlockSize[('int*', 'gridSize'), ('int*', 'blockSize'), ('hipFunction_t', 'f'), ('size_t', 'dynSharedMemPerBlk'), ('int', 'blockSizeLimit')] case HIP_API_ID_hipModuleOccupancyMaxPotentialBlockSize: if (data->args.hipModuleOccupancyMaxPotentialBlockSize.gridSize) data->args.hipModuleOccupancyMaxPotentialBlockSize.gridSize__val = *(data->args.hipModuleOccupancyMaxPotentialBlockSize.gridSize); if (data->args.hipModuleOccupancyMaxPotentialBlockSize.blockSize) data->args.hipModuleOccupancyMaxPotentialBlockSize.blockSize__val = *(data->args.hipModuleOccupancyMaxPotentialBlockSize.blockSize); break; // hipModuleOccupancyMaxPotentialBlockSizeWithFlags[('int*', 'gridSize'), ('int*', 'blockSize'), ('hipFunction_t', 'f'), ('size_t', 'dynSharedMemPerBlk'), ('int', 'blockSizeLimit'), ('unsigned int', 'flags')] case HIP_API_ID_hipModuleOccupancyMaxPotentialBlockSizeWithFlags: if (data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.gridSize) data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.gridSize__val = *(data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.gridSize); if (data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.blockSize) data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.blockSize__val = *(data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.blockSize); break; // hipModuleUnload[('hipModule_t', 'module')] case HIP_API_ID_hipModuleUnload: break; // hipOccupancyMaxActiveBlocksPerMultiprocessor[('int*', 'numBlocks'), ('const void*', 'f'), ('int', 'blockSize'), ('size_t', 'dynamicSMemSize')] case HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessor: if (data->args.hipOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks) data->args.hipOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks__val = *(data->args.hipOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks); break; // hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags[('int*', 'numBlocks'), ('const void*', 'f'), ('int', 'blockSize'), ('size_t', 'dynamicSMemSize'), ('unsigned int', 'flags')] case HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags: if (data->args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks) data->args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks__val = *(data->args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks); break; // hipOccupancyMaxPotentialBlockSize[('int*', 'gridSize'), ('int*', 'blockSize'), ('const void*', 'f'), ('size_t', 'dynSharedMemPerBlk'), ('int', 'blockSizeLimit')] case HIP_API_ID_hipOccupancyMaxPotentialBlockSize: if (data->args.hipOccupancyMaxPotentialBlockSize.gridSize) data->args.hipOccupancyMaxPotentialBlockSize.gridSize__val = *(data->args.hipOccupancyMaxPotentialBlockSize.gridSize); if (data->args.hipOccupancyMaxPotentialBlockSize.blockSize) data->args.hipOccupancyMaxPotentialBlockSize.blockSize__val = 
*(data->args.hipOccupancyMaxPotentialBlockSize.blockSize); break; // hipPeekAtLastError[] case HIP_API_ID_hipPeekAtLastError: break; // hipPointerGetAttribute[('void*', 'data'), ('hipPointer_attribute', 'attribute'), ('hipDeviceptr_t', 'ptr')] case HIP_API_ID_hipPointerGetAttribute: break; // hipPointerGetAttributes[('hipPointerAttribute_t*', 'attributes'), ('const void*', 'ptr')] case HIP_API_ID_hipPointerGetAttributes: if (data->args.hipPointerGetAttributes.attributes) data->args.hipPointerGetAttributes.attributes__val = *(data->args.hipPointerGetAttributes.attributes); break; // hipPointerSetAttribute[('const void*', 'value'), ('hipPointer_attribute', 'attribute'), ('hipDeviceptr_t', 'ptr')] case HIP_API_ID_hipPointerSetAttribute: break; // hipProfilerStart[] case HIP_API_ID_hipProfilerStart: break; // hipProfilerStop[] case HIP_API_ID_hipProfilerStop: break; // hipRuntimeGetVersion[('int*', 'runtimeVersion')] case HIP_API_ID_hipRuntimeGetVersion: if (data->args.hipRuntimeGetVersion.runtimeVersion) data->args.hipRuntimeGetVersion.runtimeVersion__val = *(data->args.hipRuntimeGetVersion.runtimeVersion); break; // hipSetDevice[('int', 'deviceId')] case HIP_API_ID_hipSetDevice: break; // hipSetDeviceFlags[('unsigned int', 'flags')] case HIP_API_ID_hipSetDeviceFlags: break; // hipSetupArgument[('const void*', 'arg'), ('size_t', 'size'), ('size_t', 'offset')] case HIP_API_ID_hipSetupArgument: break; // hipSignalExternalSemaphoresAsync[('const hipExternalSemaphore_t*', 'extSemArray'), ('const hipExternalSemaphoreSignalParams*', 'paramsArray'), ('unsigned int', 'numExtSems'), ('hipStream_t', 'stream')] case HIP_API_ID_hipSignalExternalSemaphoresAsync: if (data->args.hipSignalExternalSemaphoresAsync.extSemArray) data->args.hipSignalExternalSemaphoresAsync.extSemArray__val = *(data->args.hipSignalExternalSemaphoresAsync.extSemArray); if (data->args.hipSignalExternalSemaphoresAsync.paramsArray) data->args.hipSignalExternalSemaphoresAsync.paramsArray__val = *(data->args.hipSignalExternalSemaphoresAsync.paramsArray); break; // hipStreamAddCallback[('hipStream_t', 'stream'), ('hipStreamCallback_t', 'callback'), ('void*', 'userData'), ('unsigned int', 'flags')] case HIP_API_ID_hipStreamAddCallback: break; // hipStreamAttachMemAsync[('hipStream_t', 'stream'), ('void*', 'dev_ptr'), ('size_t', 'length'), ('unsigned int', 'flags')] case HIP_API_ID_hipStreamAttachMemAsync: break; // hipStreamBeginCapture[('hipStream_t', 'stream'), ('hipStreamCaptureMode', 'mode')] case HIP_API_ID_hipStreamBeginCapture: break; // hipStreamCreate[('hipStream_t*', 'stream')] case HIP_API_ID_hipStreamCreate: if (data->args.hipStreamCreate.stream) data->args.hipStreamCreate.stream__val = *(data->args.hipStreamCreate.stream); break; // hipStreamCreateWithFlags[('hipStream_t*', 'stream'), ('unsigned int', 'flags')] case HIP_API_ID_hipStreamCreateWithFlags: if (data->args.hipStreamCreateWithFlags.stream) data->args.hipStreamCreateWithFlags.stream__val = *(data->args.hipStreamCreateWithFlags.stream); break; // hipStreamCreateWithPriority[('hipStream_t*', 'stream'), ('unsigned int', 'flags'), ('int', 'priority')] case HIP_API_ID_hipStreamCreateWithPriority: if (data->args.hipStreamCreateWithPriority.stream) data->args.hipStreamCreateWithPriority.stream__val = *(data->args.hipStreamCreateWithPriority.stream); break; // hipStreamDestroy[('hipStream_t', 'stream')] case HIP_API_ID_hipStreamDestroy: break; // hipStreamEndCapture[('hipStream_t', 'stream'), ('hipGraph_t*', 'pGraph')] case HIP_API_ID_hipStreamEndCapture: if 
(data->args.hipStreamEndCapture.pGraph) data->args.hipStreamEndCapture.pGraph__val = *(data->args.hipStreamEndCapture.pGraph); break; // hipStreamGetCaptureInfo[('hipStream_t', 'stream'), ('hipStreamCaptureStatus*', 'pCaptureStatus'), ('unsigned long long*', 'pId')] case HIP_API_ID_hipStreamGetCaptureInfo: if (data->args.hipStreamGetCaptureInfo.pCaptureStatus) data->args.hipStreamGetCaptureInfo.pCaptureStatus__val = *(data->args.hipStreamGetCaptureInfo.pCaptureStatus); if (data->args.hipStreamGetCaptureInfo.pId) data->args.hipStreamGetCaptureInfo.pId__val = *(data->args.hipStreamGetCaptureInfo.pId); break; // hipStreamGetCaptureInfo_v2[('hipStream_t', 'stream'), ('hipStreamCaptureStatus*', 'captureStatus_out'), ('unsigned long long*', 'id_out'), ('hipGraph_t*', 'graph_out'), ('const hipGraphNode_t**', 'dependencies_out'), ('size_t*', 'numDependencies_out')] case HIP_API_ID_hipStreamGetCaptureInfo_v2: if (data->args.hipStreamGetCaptureInfo_v2.captureStatus_out) data->args.hipStreamGetCaptureInfo_v2.captureStatus_out__val = *(data->args.hipStreamGetCaptureInfo_v2.captureStatus_out); if (data->args.hipStreamGetCaptureInfo_v2.id_out) data->args.hipStreamGetCaptureInfo_v2.id_out__val = *(data->args.hipStreamGetCaptureInfo_v2.id_out); if (data->args.hipStreamGetCaptureInfo_v2.graph_out) data->args.hipStreamGetCaptureInfo_v2.graph_out__val = *(data->args.hipStreamGetCaptureInfo_v2.graph_out); if (data->args.hipStreamGetCaptureInfo_v2.dependencies_out) data->args.hipStreamGetCaptureInfo_v2.dependencies_out__val = *(data->args.hipStreamGetCaptureInfo_v2.dependencies_out); if (data->args.hipStreamGetCaptureInfo_v2.numDependencies_out) data->args.hipStreamGetCaptureInfo_v2.numDependencies_out__val = *(data->args.hipStreamGetCaptureInfo_v2.numDependencies_out); break; // hipStreamGetDevice[('hipStream_t', 'stream'), ('hipDevice_t*', 'device')] case HIP_API_ID_hipStreamGetDevice: if (data->args.hipStreamGetDevice.device) data->args.hipStreamGetDevice.device__val = *(data->args.hipStreamGetDevice.device); break; // hipStreamGetFlags[('hipStream_t', 'stream'), ('unsigned int*', 'flags')] case HIP_API_ID_hipStreamGetFlags: if (data->args.hipStreamGetFlags.flags) data->args.hipStreamGetFlags.flags__val = *(data->args.hipStreamGetFlags.flags); break; // hipStreamGetPriority[('hipStream_t', 'stream'), ('int*', 'priority')] case HIP_API_ID_hipStreamGetPriority: if (data->args.hipStreamGetPriority.priority) data->args.hipStreamGetPriority.priority__val = *(data->args.hipStreamGetPriority.priority); break; // hipStreamIsCapturing[('hipStream_t', 'stream'), ('hipStreamCaptureStatus*', 'pCaptureStatus')] case HIP_API_ID_hipStreamIsCapturing: if (data->args.hipStreamIsCapturing.pCaptureStatus) data->args.hipStreamIsCapturing.pCaptureStatus__val = *(data->args.hipStreamIsCapturing.pCaptureStatus); break; // hipStreamQuery[('hipStream_t', 'stream')] case HIP_API_ID_hipStreamQuery: break; // hipStreamSynchronize[('hipStream_t', 'stream')] case HIP_API_ID_hipStreamSynchronize: break; // hipStreamUpdateCaptureDependencies[('hipStream_t', 'stream'), ('hipGraphNode_t*', 'dependencies'), ('size_t', 'numDependencies'), ('unsigned int', 'flags')] case HIP_API_ID_hipStreamUpdateCaptureDependencies: if (data->args.hipStreamUpdateCaptureDependencies.dependencies) data->args.hipStreamUpdateCaptureDependencies.dependencies__val = *(data->args.hipStreamUpdateCaptureDependencies.dependencies); break; // hipStreamWaitEvent[('hipStream_t', 'stream'), ('hipEvent_t', 'event'), ('unsigned int', 'flags')] case 
HIP_API_ID_hipStreamWaitEvent: break; // hipStreamWaitValue32[('hipStream_t', 'stream'), ('void*', 'ptr'), ('unsigned int', 'value'), ('unsigned int', 'flags'), ('unsigned int', 'mask')] case HIP_API_ID_hipStreamWaitValue32: break; // hipStreamWaitValue64[('hipStream_t', 'stream'), ('void*', 'ptr'), ('uint64_t', 'value'), ('unsigned int', 'flags'), ('uint64_t', 'mask')] case HIP_API_ID_hipStreamWaitValue64: break; // hipStreamWriteValue32[('hipStream_t', 'stream'), ('void*', 'ptr'), ('unsigned int', 'value'), ('unsigned int', 'flags')] case HIP_API_ID_hipStreamWriteValue32: break; // hipStreamWriteValue64[('hipStream_t', 'stream'), ('void*', 'ptr'), ('uint64_t', 'value'), ('unsigned int', 'flags')] case HIP_API_ID_hipStreamWriteValue64: break; // hipTexRefGetAddress[('hipDeviceptr_t*', 'dev_ptr'), ('const textureReference*', 'texRef')] case HIP_API_ID_hipTexRefGetAddress: if (data->args.hipTexRefGetAddress.dev_ptr) data->args.hipTexRefGetAddress.dev_ptr__val = *(data->args.hipTexRefGetAddress.dev_ptr); if (data->args.hipTexRefGetAddress.texRef) data->args.hipTexRefGetAddress.texRef__val = *(data->args.hipTexRefGetAddress.texRef); break; // hipTexRefGetFlags[('unsigned int*', 'pFlags'), ('const textureReference*', 'texRef')] case HIP_API_ID_hipTexRefGetFlags: if (data->args.hipTexRefGetFlags.pFlags) data->args.hipTexRefGetFlags.pFlags__val = *(data->args.hipTexRefGetFlags.pFlags); if (data->args.hipTexRefGetFlags.texRef) data->args.hipTexRefGetFlags.texRef__val = *(data->args.hipTexRefGetFlags.texRef); break; // hipTexRefGetFormat[('hipArray_Format*', 'pFormat'), ('int*', 'pNumChannels'), ('const textureReference*', 'texRef')] case HIP_API_ID_hipTexRefGetFormat: if (data->args.hipTexRefGetFormat.pFormat) data->args.hipTexRefGetFormat.pFormat__val = *(data->args.hipTexRefGetFormat.pFormat); if (data->args.hipTexRefGetFormat.pNumChannels) data->args.hipTexRefGetFormat.pNumChannels__val = *(data->args.hipTexRefGetFormat.pNumChannels); if (data->args.hipTexRefGetFormat.texRef) data->args.hipTexRefGetFormat.texRef__val = *(data->args.hipTexRefGetFormat.texRef); break; // hipTexRefGetMaxAnisotropy[('int*', 'pmaxAnsio'), ('const textureReference*', 'texRef')] case HIP_API_ID_hipTexRefGetMaxAnisotropy: if (data->args.hipTexRefGetMaxAnisotropy.pmaxAnsio) data->args.hipTexRefGetMaxAnisotropy.pmaxAnsio__val = *(data->args.hipTexRefGetMaxAnisotropy.pmaxAnsio); if (data->args.hipTexRefGetMaxAnisotropy.texRef) data->args.hipTexRefGetMaxAnisotropy.texRef__val = *(data->args.hipTexRefGetMaxAnisotropy.texRef); break; // hipTexRefGetMipMappedArray[('hipMipmappedArray_t*', 'pArray'), ('const textureReference*', 'texRef')] case HIP_API_ID_hipTexRefGetMipMappedArray: if (data->args.hipTexRefGetMipMappedArray.pArray) data->args.hipTexRefGetMipMappedArray.pArray__val = *(data->args.hipTexRefGetMipMappedArray.pArray); if (data->args.hipTexRefGetMipMappedArray.texRef) data->args.hipTexRefGetMipMappedArray.texRef__val = *(data->args.hipTexRefGetMipMappedArray.texRef); break; // hipTexRefGetMipmapLevelBias[('float*', 'pbias'), ('const textureReference*', 'texRef')] case HIP_API_ID_hipTexRefGetMipmapLevelBias: if (data->args.hipTexRefGetMipmapLevelBias.pbias) data->args.hipTexRefGetMipmapLevelBias.pbias__val = *(data->args.hipTexRefGetMipmapLevelBias.pbias); if (data->args.hipTexRefGetMipmapLevelBias.texRef) data->args.hipTexRefGetMipmapLevelBias.texRef__val = *(data->args.hipTexRefGetMipmapLevelBias.texRef); break; // hipTexRefGetMipmapLevelClamp[('float*', 'pminMipmapLevelClamp'), ('float*', 'pmaxMipmapLevelClamp'), 
('const textureReference*', 'texRef')] case HIP_API_ID_hipTexRefGetMipmapLevelClamp: if (data->args.hipTexRefGetMipmapLevelClamp.pminMipmapLevelClamp) data->args.hipTexRefGetMipmapLevelClamp.pminMipmapLevelClamp__val = *(data->args.hipTexRefGetMipmapLevelClamp.pminMipmapLevelClamp); if (data->args.hipTexRefGetMipmapLevelClamp.pmaxMipmapLevelClamp) data->args.hipTexRefGetMipmapLevelClamp.pmaxMipmapLevelClamp__val = *(data->args.hipTexRefGetMipmapLevelClamp.pmaxMipmapLevelClamp); if (data->args.hipTexRefGetMipmapLevelClamp.texRef) data->args.hipTexRefGetMipmapLevelClamp.texRef__val = *(data->args.hipTexRefGetMipmapLevelClamp.texRef); break; // hipTexRefSetAddress[('size_t*', 'ByteOffset'), ('textureReference*', 'texRef'), ('hipDeviceptr_t', 'dptr'), ('size_t', 'bytes')] case HIP_API_ID_hipTexRefSetAddress: if (data->args.hipTexRefSetAddress.ByteOffset) data->args.hipTexRefSetAddress.ByteOffset__val = *(data->args.hipTexRefSetAddress.ByteOffset); if (data->args.hipTexRefSetAddress.texRef) data->args.hipTexRefSetAddress.texRef__val = *(data->args.hipTexRefSetAddress.texRef); break; // hipTexRefSetAddress2D[('textureReference*', 'texRef'), ('const HIP_ARRAY_DESCRIPTOR*', 'desc'), ('hipDeviceptr_t', 'dptr'), ('size_t', 'Pitch')] case HIP_API_ID_hipTexRefSetAddress2D: if (data->args.hipTexRefSetAddress2D.texRef) data->args.hipTexRefSetAddress2D.texRef__val = *(data->args.hipTexRefSetAddress2D.texRef); if (data->args.hipTexRefSetAddress2D.desc) data->args.hipTexRefSetAddress2D.desc__val = *(data->args.hipTexRefSetAddress2D.desc); break; // hipTexRefSetArray[('textureReference*', 'tex'), ('hipArray_const_t', 'array'), ('unsigned int', 'flags')] case HIP_API_ID_hipTexRefSetArray: if (data->args.hipTexRefSetArray.tex) data->args.hipTexRefSetArray.tex__val = *(data->args.hipTexRefSetArray.tex); break; // hipTexRefSetBorderColor[('textureReference*', 'texRef'), ('float*', 'pBorderColor')] case HIP_API_ID_hipTexRefSetBorderColor: if (data->args.hipTexRefSetBorderColor.texRef) data->args.hipTexRefSetBorderColor.texRef__val = *(data->args.hipTexRefSetBorderColor.texRef); if (data->args.hipTexRefSetBorderColor.pBorderColor) data->args.hipTexRefSetBorderColor.pBorderColor__val = *(data->args.hipTexRefSetBorderColor.pBorderColor); break; // hipTexRefSetFlags[('textureReference*', 'texRef'), ('unsigned int', 'Flags')] case HIP_API_ID_hipTexRefSetFlags: if (data->args.hipTexRefSetFlags.texRef) data->args.hipTexRefSetFlags.texRef__val = *(data->args.hipTexRefSetFlags.texRef); break; // hipTexRefSetFormat[('textureReference*', 'texRef'), ('hipArray_Format', 'fmt'), ('int', 'NumPackedComponents')] case HIP_API_ID_hipTexRefSetFormat: if (data->args.hipTexRefSetFormat.texRef) data->args.hipTexRefSetFormat.texRef__val = *(data->args.hipTexRefSetFormat.texRef); break; // hipTexRefSetMaxAnisotropy[('textureReference*', 'texRef'), ('unsigned int', 'maxAniso')] case HIP_API_ID_hipTexRefSetMaxAnisotropy: if (data->args.hipTexRefSetMaxAnisotropy.texRef) data->args.hipTexRefSetMaxAnisotropy.texRef__val = *(data->args.hipTexRefSetMaxAnisotropy.texRef); break; // hipTexRefSetMipmapLevelBias[('textureReference*', 'texRef'), ('float', 'bias')] case HIP_API_ID_hipTexRefSetMipmapLevelBias: if (data->args.hipTexRefSetMipmapLevelBias.texRef) data->args.hipTexRefSetMipmapLevelBias.texRef__val = *(data->args.hipTexRefSetMipmapLevelBias.texRef); break; // hipTexRefSetMipmapLevelClamp[('textureReference*', 'texRef'), ('float', 'minMipMapLevelClamp'), ('float', 'maxMipMapLevelClamp')] case HIP_API_ID_hipTexRefSetMipmapLevelClamp: if 
(data->args.hipTexRefSetMipmapLevelClamp.texRef) data->args.hipTexRefSetMipmapLevelClamp.texRef__val = *(data->args.hipTexRefSetMipmapLevelClamp.texRef); break; // hipTexRefSetMipmappedArray[('textureReference*', 'texRef'), ('hipMipmappedArray*', 'mipmappedArray'), ('unsigned int', 'Flags')] case HIP_API_ID_hipTexRefSetMipmappedArray: if (data->args.hipTexRefSetMipmappedArray.texRef) data->args.hipTexRefSetMipmappedArray.texRef__val = *(data->args.hipTexRefSetMipmappedArray.texRef); if (data->args.hipTexRefSetMipmappedArray.mipmappedArray) data->args.hipTexRefSetMipmappedArray.mipmappedArray__val = *(data->args.hipTexRefSetMipmappedArray.mipmappedArray); break; // hipThreadExchangeStreamCaptureMode[('hipStreamCaptureMode*', 'mode')] case HIP_API_ID_hipThreadExchangeStreamCaptureMode: if (data->args.hipThreadExchangeStreamCaptureMode.mode) data->args.hipThreadExchangeStreamCaptureMode.mode__val = *(data->args.hipThreadExchangeStreamCaptureMode.mode); break; // hipUserObjectCreate[('hipUserObject_t*', 'object_out'), ('void*', 'ptr'), ('hipHostFn_t', 'destroy'), ('unsigned int', 'initialRefcount'), ('unsigned int', 'flags')] case HIP_API_ID_hipUserObjectCreate: if (data->args.hipUserObjectCreate.object_out) data->args.hipUserObjectCreate.object_out__val = *(data->args.hipUserObjectCreate.object_out); break; // hipUserObjectRelease[('hipUserObject_t', 'object'), ('unsigned int', 'count')] case HIP_API_ID_hipUserObjectRelease: break; // hipUserObjectRetain[('hipUserObject_t', 'object'), ('unsigned int', 'count')] case HIP_API_ID_hipUserObjectRetain: break; // hipWaitExternalSemaphoresAsync[('const hipExternalSemaphore_t*', 'extSemArray'), ('const hipExternalSemaphoreWaitParams*', 'paramsArray'), ('unsigned int', 'numExtSems'), ('hipStream_t', 'stream')] case HIP_API_ID_hipWaitExternalSemaphoresAsync: if (data->args.hipWaitExternalSemaphoresAsync.extSemArray) data->args.hipWaitExternalSemaphoresAsync.extSemArray__val = *(data->args.hipWaitExternalSemaphoresAsync.extSemArray); if (data->args.hipWaitExternalSemaphoresAsync.paramsArray) data->args.hipWaitExternalSemaphoresAsync.paramsArray__val = *(data->args.hipWaitExternalSemaphoresAsync.paramsArray); break; default: break; }; }
#include <sstream>
#include <string>
// HIP API string method, method name and parameters
static inline const char* hipApiString(hip_api_id_t id, const hip_api_data_t* data) { std::ostringstream oss; switch (id) { case HIP_API_ID___hipPopCallConfiguration: oss << "__hipPopCallConfiguration("; if (data->args.__hipPopCallConfiguration.gridDim == NULL) oss << "gridDim=NULL"; else { oss << "gridDim="; roctracer::hip_support::detail::operator<<(oss, data->args.__hipPopCallConfiguration.gridDim__val); } if (data->args.__hipPopCallConfiguration.blockDim == NULL) oss << ", blockDim=NULL"; else { oss << ", blockDim="; roctracer::hip_support::detail::operator<<(oss, data->args.__hipPopCallConfiguration.blockDim__val); } if (data->args.__hipPopCallConfiguration.sharedMem == NULL) oss << ", sharedMem=NULL"; else { oss << ", sharedMem="; roctracer::hip_support::detail::operator<<(oss, data->args.__hipPopCallConfiguration.sharedMem__val); } if (data->args.__hipPopCallConfiguration.stream == NULL) oss << ", stream=NULL"; else { oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.__hipPopCallConfiguration.stream__val); } oss << ")"; break; case HIP_API_ID___hipPushCallConfiguration: oss << "__hipPushCallConfiguration("; oss << "gridDim="; roctracer::hip_support::detail::operator<<(oss,
data->args.__hipPushCallConfiguration.gridDim); oss << ", blockDim="; roctracer::hip_support::detail::operator<<(oss, data->args.__hipPushCallConfiguration.blockDim); oss << ", sharedMem="; roctracer::hip_support::detail::operator<<(oss, data->args.__hipPushCallConfiguration.sharedMem); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.__hipPushCallConfiguration.stream); oss << ")"; break; case HIP_API_ID_hipArray3DCreate: oss << "hipArray3DCreate("; if (data->args.hipArray3DCreate.array == NULL) oss << "array=NULL"; else { oss << "array="; roctracer::hip_support::detail::operator<<(oss, (void*)data->args.hipArray3DCreate.array__val); } if (data->args.hipArray3DCreate.pAllocateArray == NULL) oss << ", pAllocateArray=NULL"; else { oss << ", pAllocateArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipArray3DCreate.pAllocateArray__val); } oss << ")"; break; case HIP_API_ID_hipArray3DGetDescriptor: oss << "hipArray3DGetDescriptor("; if (data->args.hipArray3DGetDescriptor.pArrayDescriptor == NULL) oss << "pArrayDescriptor=NULL"; else { oss << "pArrayDescriptor="; roctracer::hip_support::detail::operator<<(oss, data->args.hipArray3DGetDescriptor.pArrayDescriptor__val); } if (data->args.hipArray3DGetDescriptor.array == NULL) oss << ", array=NULL"; else { oss << ", array="; roctracer::hip_support::detail::operator<<(oss, data->args.hipArray3DGetDescriptor.array__val); } oss << ")"; break; case HIP_API_ID_hipArrayCreate: oss << "hipArrayCreate("; if (data->args.hipArrayCreate.pHandle == NULL) oss << "pHandle=NULL"; else { oss << "pHandle="; roctracer::hip_support::detail::operator<<(oss, (void*)data->args.hipArrayCreate.pHandle__val); } if (data->args.hipArrayCreate.pAllocateArray == NULL) oss << ", pAllocateArray=NULL"; else { oss << ", pAllocateArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipArrayCreate.pAllocateArray__val); } oss << ")"; break; case HIP_API_ID_hipArrayDestroy: oss << "hipArrayDestroy("; if (data->args.hipArrayDestroy.array == NULL) oss << "array=NULL"; else { oss << "array="; roctracer::hip_support::detail::operator<<(oss, data->args.hipArrayDestroy.array__val); } oss << ")"; break; case HIP_API_ID_hipArrayGetDescriptor: oss << "hipArrayGetDescriptor("; if (data->args.hipArrayGetDescriptor.pArrayDescriptor == NULL) oss << "pArrayDescriptor=NULL"; else { oss << "pArrayDescriptor="; roctracer::hip_support::detail::operator<<(oss, data->args.hipArrayGetDescriptor.pArrayDescriptor__val); } if (data->args.hipArrayGetDescriptor.array == NULL) oss << ", array=NULL"; else { oss << ", array="; roctracer::hip_support::detail::operator<<(oss, data->args.hipArrayGetDescriptor.array__val); } oss << ")"; break; case HIP_API_ID_hipArrayGetInfo: oss << "hipArrayGetInfo("; if (data->args.hipArrayGetInfo.desc == NULL) oss << "desc=NULL"; else { oss << "desc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipArrayGetInfo.desc__val); } if (data->args.hipArrayGetInfo.extent == NULL) oss << ", extent=NULL"; else { oss << ", extent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipArrayGetInfo.extent__val); } if (data->args.hipArrayGetInfo.flags == NULL) oss << ", flags=NULL"; else { oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipArrayGetInfo.flags__val); } if (data->args.hipArrayGetInfo.array == NULL) oss << ", array=NULL"; else { oss << ", array="; roctracer::hip_support::detail::operator<<(oss, data->args.hipArrayGetInfo.array__val); } oss << ")"; break; case 
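// ----------------------------------------------------------------------------
// Usage sketch (illustrative, not part of the generated header): the switch
// earlier in this file copies each non-NULL pointer argument into its "__val"
// shadow field, and hipApiString() then prints either "arg=NULL" or the
// captured pointee value. A tracer tool can combine the two from a HIP API
// callback; the callback shape below follows the roctracer sample tools, and
// the tool/function names here are assumptions for the sketch:
//
//   #include <roctracer/roctracer_hip.h>
//   #include <cstdio>
//   #include <cstdlib>
//
//   static void my_hip_api_callback(uint32_t domain, uint32_t cid,
//                                   const void* callback_data, void* /*arg*/) {
//     const hip_api_data_t* data =
//         reinterpret_cast<const hip_api_data_t*>(callback_data);
//     if (data->phase == ACTIVITY_API_PHASE_EXIT) {
//       // Pointer args have been dereferenced into __val by now, so the
//       // formatted string shows pointee values, not just addresses.
//       const char* str = hipApiString(static_cast<hip_api_id_t>(cid), data);
//       printf("%s\n", str);
//       free(const_cast<char*>(str));  // assuming hipApiString returns
//                                      // heap-allocated (strdup'd) storage
//     }
//   }
//
//   // Registration, as in the roctracer samples:
//   //   roctracer_enable_domain_callback(ACTIVITY_DOMAIN_HIP_API,
//   //                                    my_hip_api_callback, nullptr);
// ----------------------------------------------------------------------------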
HIP_API_ID_hipChooseDevice: oss << "hipChooseDevice("; if (data->args.hipChooseDevice.device == NULL) oss << "device=NULL"; else { oss << "device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipChooseDevice.device__val); } if (data->args.hipChooseDevice.prop == NULL) oss << ", prop=NULL"; else { oss << ", prop="; roctracer::hip_support::detail::operator<<(oss, data->args.hipChooseDevice.prop__val); } oss << ")"; break; case HIP_API_ID_hipConfigureCall: oss << "hipConfigureCall("; oss << "gridDim="; roctracer::hip_support::detail::operator<<(oss, data->args.hipConfigureCall.gridDim); oss << ", blockDim="; roctracer::hip_support::detail::operator<<(oss, data->args.hipConfigureCall.blockDim); oss << ", sharedMem="; roctracer::hip_support::detail::operator<<(oss, data->args.hipConfigureCall.sharedMem); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipConfigureCall.stream); oss << ")"; break; case HIP_API_ID_hipCreateSurfaceObject: oss << "hipCreateSurfaceObject("; if (data->args.hipCreateSurfaceObject.pSurfObject == NULL) oss << "pSurfObject=NULL"; else { oss << "pSurfObject="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCreateSurfaceObject.pSurfObject__val); } if (data->args.hipCreateSurfaceObject.pResDesc == NULL) oss << ", pResDesc=NULL"; else { oss << ", pResDesc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCreateSurfaceObject.pResDesc__val); } oss << ")"; break; case HIP_API_ID_hipCtxCreate: oss << "hipCtxCreate("; if (data->args.hipCtxCreate.ctx == NULL) oss << "ctx=NULL"; else { oss << "ctx="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxCreate.ctx__val); } oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxCreate.flags); oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxCreate.device); oss << ")"; break; case HIP_API_ID_hipCtxDestroy: oss << "hipCtxDestroy("; oss << "ctx="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxDestroy.ctx); oss << ")"; break; case HIP_API_ID_hipCtxDisablePeerAccess: oss << "hipCtxDisablePeerAccess("; oss << "peerCtx="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxDisablePeerAccess.peerCtx); oss << ")"; break; case HIP_API_ID_hipCtxEnablePeerAccess: oss << "hipCtxEnablePeerAccess("; oss << "peerCtx="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxEnablePeerAccess.peerCtx); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxEnablePeerAccess.flags); oss << ")"; break; case HIP_API_ID_hipCtxGetApiVersion: oss << "hipCtxGetApiVersion("; oss << "ctx="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxGetApiVersion.ctx); if (data->args.hipCtxGetApiVersion.apiVersion == NULL) oss << ", apiVersion=NULL"; else { oss << ", apiVersion="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxGetApiVersion.apiVersion__val); } oss << ")"; break; case HIP_API_ID_hipCtxGetCacheConfig: oss << "hipCtxGetCacheConfig("; if (data->args.hipCtxGetCacheConfig.cacheConfig == NULL) oss << "cacheConfig=NULL"; else { oss << "cacheConfig="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxGetCacheConfig.cacheConfig__val); } oss << ")"; break; case HIP_API_ID_hipCtxGetCurrent: oss << "hipCtxGetCurrent("; if (data->args.hipCtxGetCurrent.ctx == NULL) oss << "ctx=NULL"; else { oss << "ctx="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxGetCurrent.ctx__val); } oss << 
")"; break; case HIP_API_ID_hipCtxGetDevice: oss << "hipCtxGetDevice("; if (data->args.hipCtxGetDevice.device == NULL) oss << "device=NULL"; else { oss << "device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxGetDevice.device__val); } oss << ")"; break; case HIP_API_ID_hipCtxGetFlags: oss << "hipCtxGetFlags("; if (data->args.hipCtxGetFlags.flags == NULL) oss << "flags=NULL"; else { oss << "flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxGetFlags.flags__val); } oss << ")"; break; case HIP_API_ID_hipCtxGetSharedMemConfig: oss << "hipCtxGetSharedMemConfig("; if (data->args.hipCtxGetSharedMemConfig.pConfig == NULL) oss << "pConfig=NULL"; else { oss << "pConfig="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxGetSharedMemConfig.pConfig__val); } oss << ")"; break; case HIP_API_ID_hipCtxPopCurrent: oss << "hipCtxPopCurrent("; if (data->args.hipCtxPopCurrent.ctx == NULL) oss << "ctx=NULL"; else { oss << "ctx="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxPopCurrent.ctx__val); } oss << ")"; break; case HIP_API_ID_hipCtxPushCurrent: oss << "hipCtxPushCurrent("; oss << "ctx="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxPushCurrent.ctx); oss << ")"; break; case HIP_API_ID_hipCtxSetCacheConfig: oss << "hipCtxSetCacheConfig("; oss << "cacheConfig="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxSetCacheConfig.cacheConfig); oss << ")"; break; case HIP_API_ID_hipCtxSetCurrent: oss << "hipCtxSetCurrent("; oss << "ctx="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxSetCurrent.ctx); oss << ")"; break; case HIP_API_ID_hipCtxSetSharedMemConfig: oss << "hipCtxSetSharedMemConfig("; oss << "config="; roctracer::hip_support::detail::operator<<(oss, data->args.hipCtxSetSharedMemConfig.config); oss << ")"; break; case HIP_API_ID_hipCtxSynchronize: oss << "hipCtxSynchronize("; oss << ")"; break; case HIP_API_ID_hipDestroyExternalMemory: oss << "hipDestroyExternalMemory("; oss << "extMem="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDestroyExternalMemory.extMem); oss << ")"; break; case HIP_API_ID_hipDestroyExternalSemaphore: oss << "hipDestroyExternalSemaphore("; oss << "extSem="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDestroyExternalSemaphore.extSem); oss << ")"; break; case HIP_API_ID_hipDestroySurfaceObject: oss << "hipDestroySurfaceObject("; oss << "surfaceObject="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDestroySurfaceObject.surfaceObject); oss << ")"; break; case HIP_API_ID_hipDeviceCanAccessPeer: oss << "hipDeviceCanAccessPeer("; if (data->args.hipDeviceCanAccessPeer.canAccessPeer == NULL) oss << "canAccessPeer=NULL"; else { oss << "canAccessPeer="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceCanAccessPeer.canAccessPeer__val); } oss << ", deviceId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceCanAccessPeer.deviceId); oss << ", peerDeviceId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceCanAccessPeer.peerDeviceId); oss << ")"; break; case HIP_API_ID_hipDeviceComputeCapability: oss << "hipDeviceComputeCapability("; if (data->args.hipDeviceComputeCapability.major == NULL) oss << "major=NULL"; else { oss << "major="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceComputeCapability.major__val); } if (data->args.hipDeviceComputeCapability.minor == NULL) oss << ", minor=NULL"; else { oss << ", minor="; 
roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceComputeCapability.minor__val); } oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceComputeCapability.device); oss << ")"; break; case HIP_API_ID_hipDeviceDisablePeerAccess: oss << "hipDeviceDisablePeerAccess("; oss << "peerDeviceId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceDisablePeerAccess.peerDeviceId); oss << ")"; break; case HIP_API_ID_hipDeviceEnablePeerAccess: oss << "hipDeviceEnablePeerAccess("; oss << "peerDeviceId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceEnablePeerAccess.peerDeviceId); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceEnablePeerAccess.flags); oss << ")"; break; case HIP_API_ID_hipDeviceGet: oss << "hipDeviceGet("; if (data->args.hipDeviceGet.device == NULL) oss << "device=NULL"; else { oss << "device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGet.device__val); } oss << ", ordinal="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGet.ordinal); oss << ")"; break; case HIP_API_ID_hipDeviceGetAttribute: oss << "hipDeviceGetAttribute("; if (data->args.hipDeviceGetAttribute.pi == NULL) oss << "pi=NULL"; else { oss << "pi="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetAttribute.pi__val); } oss << ", attr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetAttribute.attr); oss << ", deviceId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetAttribute.deviceId); oss << ")"; break; case HIP_API_ID_hipDeviceGetByPCIBusId: oss << "hipDeviceGetByPCIBusId("; if (data->args.hipDeviceGetByPCIBusId.device == NULL) oss << "device=NULL"; else { oss << "device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetByPCIBusId.device__val); } if (data->args.hipDeviceGetByPCIBusId.pciBusId == NULL) oss << ", pciBusId=NULL"; else { oss << ", pciBusId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetByPCIBusId.pciBusId__val); } oss << ")"; break; case HIP_API_ID_hipDeviceGetCacheConfig: oss << "hipDeviceGetCacheConfig("; if (data->args.hipDeviceGetCacheConfig.cacheConfig == NULL) oss << "cacheConfig=NULL"; else { oss << "cacheConfig="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetCacheConfig.cacheConfig__val); } oss << ")"; break; case HIP_API_ID_hipDeviceGetDefaultMemPool: oss << "hipDeviceGetDefaultMemPool("; if (data->args.hipDeviceGetDefaultMemPool.mem_pool == NULL) oss << "mem_pool=NULL"; else { oss << "mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetDefaultMemPool.mem_pool__val); } oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetDefaultMemPool.device); oss << ")"; break; case HIP_API_ID_hipDeviceGetGraphMemAttribute: oss << "hipDeviceGetGraphMemAttribute("; oss << "device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetGraphMemAttribute.device); oss << ", attr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetGraphMemAttribute.attr); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetGraphMemAttribute.value); oss << ")"; break; case HIP_API_ID_hipDeviceGetLimit: oss << "hipDeviceGetLimit("; if (data->args.hipDeviceGetLimit.pValue == NULL) oss << "pValue=NULL"; else { oss << "pValue="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipDeviceGetLimit.pValue__val); } oss << ", limit="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetLimit.limit); oss << ")"; break; case HIP_API_ID_hipDeviceGetMemPool: oss << "hipDeviceGetMemPool("; if (data->args.hipDeviceGetMemPool.mem_pool == NULL) oss << "mem_pool=NULL"; else { oss << "mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetMemPool.mem_pool__val); } oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetMemPool.device); oss << ")"; break; case HIP_API_ID_hipDeviceGetName: oss << "hipDeviceGetName("; if (data->args.hipDeviceGetName.name == NULL) oss << "name=NULL"; else { oss << "name="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetName.name__val); } oss << ", len="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetName.len); oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetName.device); oss << ")"; break; case HIP_API_ID_hipDeviceGetP2PAttribute: oss << "hipDeviceGetP2PAttribute("; if (data->args.hipDeviceGetP2PAttribute.value == NULL) oss << "value=NULL"; else { oss << "value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetP2PAttribute.value__val); } oss << ", attr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetP2PAttribute.attr); oss << ", srcDevice="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetP2PAttribute.srcDevice); oss << ", dstDevice="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetP2PAttribute.dstDevice); oss << ")"; break; case HIP_API_ID_hipDeviceGetPCIBusId: oss << "hipDeviceGetPCIBusId("; if (data->args.hipDeviceGetPCIBusId.pciBusId == NULL) oss << "pciBusId=NULL"; else { oss << "pciBusId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetPCIBusId.pciBusId__val); } oss << ", len="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetPCIBusId.len); oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetPCIBusId.device); oss << ")"; break; case HIP_API_ID_hipDeviceGetSharedMemConfig: oss << "hipDeviceGetSharedMemConfig("; if (data->args.hipDeviceGetSharedMemConfig.pConfig == NULL) oss << "pConfig=NULL"; else { oss << "pConfig="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetSharedMemConfig.pConfig__val); } oss << ")"; break; case HIP_API_ID_hipDeviceGetStreamPriorityRange: oss << "hipDeviceGetStreamPriorityRange("; if (data->args.hipDeviceGetStreamPriorityRange.leastPriority == NULL) oss << "leastPriority=NULL"; else { oss << "leastPriority="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetStreamPriorityRange.leastPriority__val); } if (data->args.hipDeviceGetStreamPriorityRange.greatestPriority == NULL) oss << ", greatestPriority=NULL"; else { oss << ", greatestPriority="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetStreamPriorityRange.greatestPriority__val); } oss << ")"; break; case HIP_API_ID_hipDeviceGetUuid: oss << "hipDeviceGetUuid("; if (data->args.hipDeviceGetUuid.uuid == NULL) oss << "uuid=NULL"; else { oss << "uuid="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetUuid.uuid__val); } oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGetUuid.device); oss << ")"; break; case HIP_API_ID_hipDeviceGraphMemTrim: oss << "hipDeviceGraphMemTrim("; oss << 
"device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceGraphMemTrim.device); oss << ")"; break; case HIP_API_ID_hipDevicePrimaryCtxGetState: oss << "hipDevicePrimaryCtxGetState("; oss << "dev="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDevicePrimaryCtxGetState.dev); if (data->args.hipDevicePrimaryCtxGetState.flags == NULL) oss << ", flags=NULL"; else { oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDevicePrimaryCtxGetState.flags__val); } if (data->args.hipDevicePrimaryCtxGetState.active == NULL) oss << ", active=NULL"; else { oss << ", active="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDevicePrimaryCtxGetState.active__val); } oss << ")"; break; case HIP_API_ID_hipDevicePrimaryCtxRelease: oss << "hipDevicePrimaryCtxRelease("; oss << "dev="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDevicePrimaryCtxRelease.dev); oss << ")"; break; case HIP_API_ID_hipDevicePrimaryCtxReset: oss << "hipDevicePrimaryCtxReset("; oss << "dev="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDevicePrimaryCtxReset.dev); oss << ")"; break; case HIP_API_ID_hipDevicePrimaryCtxRetain: oss << "hipDevicePrimaryCtxRetain("; if (data->args.hipDevicePrimaryCtxRetain.pctx == NULL) oss << "pctx=NULL"; else { oss << "pctx="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDevicePrimaryCtxRetain.pctx__val); } oss << ", dev="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDevicePrimaryCtxRetain.dev); oss << ")"; break; case HIP_API_ID_hipDevicePrimaryCtxSetFlags: oss << "hipDevicePrimaryCtxSetFlags("; oss << "dev="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDevicePrimaryCtxSetFlags.dev); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDevicePrimaryCtxSetFlags.flags); oss << ")"; break; case HIP_API_ID_hipDeviceReset: oss << "hipDeviceReset("; oss << ")"; break; case HIP_API_ID_hipDeviceSetCacheConfig: oss << "hipDeviceSetCacheConfig("; oss << "cacheConfig="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceSetCacheConfig.cacheConfig); oss << ")"; break; case HIP_API_ID_hipDeviceSetGraphMemAttribute: oss << "hipDeviceSetGraphMemAttribute("; oss << "device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceSetGraphMemAttribute.device); oss << ", attr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceSetGraphMemAttribute.attr); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceSetGraphMemAttribute.value); oss << ")"; break; case HIP_API_ID_hipDeviceSetLimit: oss << "hipDeviceSetLimit("; oss << "limit="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceSetLimit.limit); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceSetLimit.value); oss << ")"; break; case HIP_API_ID_hipDeviceSetMemPool: oss << "hipDeviceSetMemPool("; oss << "device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceSetMemPool.device); oss << ", mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceSetMemPool.mem_pool); oss << ")"; break; case HIP_API_ID_hipDeviceSetSharedMemConfig: oss << "hipDeviceSetSharedMemConfig("; oss << "config="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceSetSharedMemConfig.config); oss << ")"; break; case HIP_API_ID_hipDeviceSynchronize: oss << "hipDeviceSynchronize("; oss << ")"; break; case 
HIP_API_ID_hipDeviceTotalMem: oss << "hipDeviceTotalMem("; if (data->args.hipDeviceTotalMem.bytes == NULL) oss << "bytes=NULL"; else { oss << "bytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceTotalMem.bytes__val); } oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDeviceTotalMem.device); oss << ")"; break; case HIP_API_ID_hipDriverGetVersion: oss << "hipDriverGetVersion("; if (data->args.hipDriverGetVersion.driverVersion == NULL) oss << "driverVersion=NULL"; else { oss << "driverVersion="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDriverGetVersion.driverVersion__val); } oss << ")"; break; case HIP_API_ID_hipDrvMemcpy2DUnaligned: oss << "hipDrvMemcpy2DUnaligned("; if (data->args.hipDrvMemcpy2DUnaligned.pCopy == NULL) oss << "pCopy=NULL"; else { oss << "pCopy="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDrvMemcpy2DUnaligned.pCopy__val); } oss << ")"; break; case HIP_API_ID_hipDrvMemcpy3D: oss << "hipDrvMemcpy3D("; if (data->args.hipDrvMemcpy3D.pCopy == NULL) oss << "pCopy=NULL"; else { oss << "pCopy="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDrvMemcpy3D.pCopy__val); } oss << ")"; break; case HIP_API_ID_hipDrvMemcpy3DAsync: oss << "hipDrvMemcpy3DAsync("; if (data->args.hipDrvMemcpy3DAsync.pCopy == NULL) oss << "pCopy=NULL"; else { oss << "pCopy="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDrvMemcpy3DAsync.pCopy__val); } oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDrvMemcpy3DAsync.stream); oss << ")"; break; case HIP_API_ID_hipDrvPointerGetAttributes: oss << "hipDrvPointerGetAttributes("; oss << "numAttributes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDrvPointerGetAttributes.numAttributes); if (data->args.hipDrvPointerGetAttributes.attributes == NULL) oss << ", attributes=NULL"; else { oss << ", attributes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDrvPointerGetAttributes.attributes__val); } if (data->args.hipDrvPointerGetAttributes.data == NULL) oss << ", data=NULL"; else { oss << ", data="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDrvPointerGetAttributes.data__val); } oss << ", ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipDrvPointerGetAttributes.ptr); oss << ")"; break; case HIP_API_ID_hipEventCreate: oss << "hipEventCreate("; if (data->args.hipEventCreate.event == NULL) oss << "event=NULL"; else { oss << "event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipEventCreate.event__val); } oss << ")"; break; case HIP_API_ID_hipEventCreateWithFlags: oss << "hipEventCreateWithFlags("; if (data->args.hipEventCreateWithFlags.event == NULL) oss << "event=NULL"; else { oss << "event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipEventCreateWithFlags.event__val); } oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipEventCreateWithFlags.flags); oss << ")"; break; case HIP_API_ID_hipEventDestroy: oss << "hipEventDestroy("; oss << "event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipEventDestroy.event); oss << ")"; break; case HIP_API_ID_hipEventElapsedTime: oss << "hipEventElapsedTime("; if (data->args.hipEventElapsedTime.ms == NULL) oss << "ms=NULL"; else { oss << "ms="; roctracer::hip_support::detail::operator<<(oss, data->args.hipEventElapsedTime.ms__val); } oss << ", start="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipEventElapsedTime.start); oss << ", stop="; roctracer::hip_support::detail::operator<<(oss, data->args.hipEventElapsedTime.stop); oss << ")"; break; case HIP_API_ID_hipEventQuery: oss << "hipEventQuery("; oss << "event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipEventQuery.event); oss << ")"; break; case HIP_API_ID_hipEventRecord: oss << "hipEventRecord("; oss << "event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipEventRecord.event); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipEventRecord.stream); oss << ")"; break; case HIP_API_ID_hipEventSynchronize: oss << "hipEventSynchronize("; oss << "event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipEventSynchronize.event); oss << ")"; break; case HIP_API_ID_hipExtGetLinkTypeAndHopCount: oss << "hipExtGetLinkTypeAndHopCount("; oss << "device1="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtGetLinkTypeAndHopCount.device1); oss << ", device2="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtGetLinkTypeAndHopCount.device2); if (data->args.hipExtGetLinkTypeAndHopCount.linktype == NULL) oss << ", linktype=NULL"; else { oss << ", linktype="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtGetLinkTypeAndHopCount.linktype__val); } if (data->args.hipExtGetLinkTypeAndHopCount.hopcount == NULL) oss << ", hopcount=NULL"; else { oss << ", hopcount="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtGetLinkTypeAndHopCount.hopcount__val); } oss << ")"; break; case HIP_API_ID_hipExtLaunchKernel: oss << "hipExtLaunchKernel("; oss << "function_address="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchKernel.function_address); oss << ", numBlocks="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchKernel.numBlocks); oss << ", dimBlocks="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchKernel.dimBlocks); if (data->args.hipExtLaunchKernel.args == NULL) oss << ", args=NULL"; else { oss << ", args="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchKernel.args__val); } oss << ", sharedMemBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchKernel.sharedMemBytes); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchKernel.stream); oss << ", startEvent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchKernel.startEvent); oss << ", stopEvent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchKernel.stopEvent); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchKernel.flags); oss << ")"; break; case HIP_API_ID_hipExtLaunchMultiKernelMultiDevice: oss << "hipExtLaunchMultiKernelMultiDevice("; if (data->args.hipExtLaunchMultiKernelMultiDevice.launchParamsList == NULL) oss << "launchParamsList=NULL"; else { oss << "launchParamsList="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchMultiKernelMultiDevice.launchParamsList__val); } oss << ", numDevices="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchMultiKernelMultiDevice.numDevices); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtLaunchMultiKernelMultiDevice.flags); oss << ")"; break; case HIP_API_ID_hipExtMallocWithFlags: oss << "hipExtMallocWithFlags("; if (data->args.hipExtMallocWithFlags.ptr == NULL) oss << "ptr=NULL"; 
else { oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtMallocWithFlags.ptr__val); } oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtMallocWithFlags.sizeBytes); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtMallocWithFlags.flags); oss << ")"; break; case HIP_API_ID_hipExtModuleLaunchKernel: oss << "hipExtModuleLaunchKernel("; oss << "f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.f); oss << ", globalWorkSizeX="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.globalWorkSizeX); oss << ", globalWorkSizeY="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.globalWorkSizeY); oss << ", globalWorkSizeZ="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.globalWorkSizeZ); oss << ", localWorkSizeX="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.localWorkSizeX); oss << ", localWorkSizeY="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.localWorkSizeY); oss << ", localWorkSizeZ="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.localWorkSizeZ); oss << ", sharedMemBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.sharedMemBytes); oss << ", hStream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.hStream); if (data->args.hipExtModuleLaunchKernel.kernelParams == NULL) oss << ", kernelParams=NULL"; else { oss << ", kernelParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.kernelParams__val); } if (data->args.hipExtModuleLaunchKernel.extra == NULL) oss << ", extra=NULL"; else { oss << ", extra="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.extra__val); } oss << ", startEvent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.startEvent); oss << ", stopEvent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.stopEvent); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtModuleLaunchKernel.flags); oss << ")"; break; case HIP_API_ID_hipExtStreamCreateWithCUMask: oss << "hipExtStreamCreateWithCUMask("; if (data->args.hipExtStreamCreateWithCUMask.stream == NULL) oss << "stream=NULL"; else { oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtStreamCreateWithCUMask.stream__val); } oss << ", cuMaskSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtStreamCreateWithCUMask.cuMaskSize); if (data->args.hipExtStreamCreateWithCUMask.cuMask == NULL) oss << ", cuMask=NULL"; else { oss << ", cuMask="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtStreamCreateWithCUMask.cuMask__val); } oss << ")"; break; case HIP_API_ID_hipExtStreamGetCUMask: oss << "hipExtStreamGetCUMask("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtStreamGetCUMask.stream); oss << ", cuMaskSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtStreamGetCUMask.cuMaskSize); if (data->args.hipExtStreamGetCUMask.cuMask == NULL) oss << ", cuMask=NULL"; else { oss << ", cuMask="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExtStreamGetCUMask.cuMask__val); } oss << ")"; break; case 
HIP_API_ID_hipExternalMemoryGetMappedBuffer: oss << "hipExternalMemoryGetMappedBuffer("; if (data->args.hipExternalMemoryGetMappedBuffer.devPtr == NULL) oss << "devPtr=NULL"; else { oss << "devPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExternalMemoryGetMappedBuffer.devPtr__val); } oss << ", extMem="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExternalMemoryGetMappedBuffer.extMem); if (data->args.hipExternalMemoryGetMappedBuffer.bufferDesc == NULL) oss << ", bufferDesc=NULL"; else { oss << ", bufferDesc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipExternalMemoryGetMappedBuffer.bufferDesc__val); } oss << ")"; break; case HIP_API_ID_hipFree: oss << "hipFree("; oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFree.ptr); oss << ")"; break; case HIP_API_ID_hipFreeArray: oss << "hipFreeArray("; if (data->args.hipFreeArray.array == NULL) oss << "array=NULL"; else { oss << "array="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFreeArray.array__val); } oss << ")"; break; case HIP_API_ID_hipFreeAsync: oss << "hipFreeAsync("; oss << "dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFreeAsync.dev_ptr); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFreeAsync.stream); oss << ")"; break; case HIP_API_ID_hipFreeHost: oss << "hipFreeHost("; oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFreeHost.ptr); oss << ")"; break; case HIP_API_ID_hipFreeMipmappedArray: oss << "hipFreeMipmappedArray("; oss << "mipmappedArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFreeMipmappedArray.mipmappedArray); oss << ")"; break; case HIP_API_ID_hipFuncGetAttribute: oss << "hipFuncGetAttribute("; if (data->args.hipFuncGetAttribute.value == NULL) oss << "value=NULL"; else { oss << "value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncGetAttribute.value__val); } oss << ", attrib="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncGetAttribute.attrib); oss << ", hfunc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncGetAttribute.hfunc); oss << ")"; break; case HIP_API_ID_hipFuncGetAttributes: oss << "hipFuncGetAttributes("; if (data->args.hipFuncGetAttributes.attr == NULL) oss << "attr=NULL"; else { oss << "attr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncGetAttributes.attr__val); } oss << ", func="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncGetAttributes.func); oss << ")"; break; case HIP_API_ID_hipFuncSetAttribute: oss << "hipFuncSetAttribute("; oss << "func="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncSetAttribute.func); oss << ", attr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncSetAttribute.attr); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncSetAttribute.value); oss << ")"; break; case HIP_API_ID_hipFuncSetCacheConfig: oss << "hipFuncSetCacheConfig("; oss << "func="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncSetCacheConfig.func); oss << ", config="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncSetCacheConfig.config); oss << ")"; break; case HIP_API_ID_hipFuncSetSharedMemConfig: oss << "hipFuncSetSharedMemConfig("; oss << "func="; roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncSetSharedMemConfig.func); oss << ", config="; 
roctracer::hip_support::detail::operator<<(oss, data->args.hipFuncSetSharedMemConfig.config); oss << ")"; break; case HIP_API_ID_hipGLGetDevices: oss << "hipGLGetDevices("; if (data->args.hipGLGetDevices.pHipDeviceCount == NULL) oss << "pHipDeviceCount=NULL"; else { oss << "pHipDeviceCount="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGLGetDevices.pHipDeviceCount__val); } if (data->args.hipGLGetDevices.pHipDevices == NULL) oss << ", pHipDevices=NULL"; else { oss << ", pHipDevices="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGLGetDevices.pHipDevices__val); } oss << ", hipDeviceCount="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGLGetDevices.hipDeviceCount); oss << ", deviceList="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGLGetDevices.deviceList); oss << ")"; break; case HIP_API_ID_hipGetChannelDesc: oss << "hipGetChannelDesc("; if (data->args.hipGetChannelDesc.desc == NULL) oss << "desc=NULL"; else { oss << "desc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetChannelDesc.desc__val); } oss << ", array="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetChannelDesc.array); oss << ")"; break; case HIP_API_ID_hipGetDevice: oss << "hipGetDevice("; if (data->args.hipGetDevice.deviceId == NULL) oss << "deviceId=NULL"; else { oss << "deviceId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetDevice.deviceId__val); } oss << ")"; break; case HIP_API_ID_hipGetDeviceCount: oss << "hipGetDeviceCount("; if (data->args.hipGetDeviceCount.count == NULL) oss << "count=NULL"; else { oss << "count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetDeviceCount.count__val); } oss << ")"; break; case HIP_API_ID_hipGetDeviceFlags: oss << "hipGetDeviceFlags("; if (data->args.hipGetDeviceFlags.flags == NULL) oss << "flags=NULL"; else { oss << "flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetDeviceFlags.flags__val); } oss << ")"; break; case HIP_API_ID_hipGetDeviceProperties: oss << "hipGetDeviceProperties("; if (data->args.hipGetDeviceProperties.props == NULL) oss << "props=NULL"; else { oss << "props="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetDeviceProperties.props__val); } oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetDeviceProperties.device); oss << ")"; break; case HIP_API_ID_hipGetErrorString: oss << "hipGetErrorString("; oss << ")"; break; case HIP_API_ID_hipGetLastError: oss << "hipGetLastError("; oss << ")"; break; case HIP_API_ID_hipGetMipmappedArrayLevel: oss << "hipGetMipmappedArrayLevel("; if (data->args.hipGetMipmappedArrayLevel.levelArray == NULL) oss << "levelArray=NULL"; else { oss << "levelArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetMipmappedArrayLevel.levelArray__val); } oss << ", mipmappedArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetMipmappedArrayLevel.mipmappedArray); oss << ", level="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetMipmappedArrayLevel.level); oss << ")"; break; case HIP_API_ID_hipGetSymbolAddress: oss << "hipGetSymbolAddress("; if (data->args.hipGetSymbolAddress.devPtr == NULL) oss << "devPtr=NULL"; else { oss << "devPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetSymbolAddress.devPtr__val); } oss << ", symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetSymbolAddress.symbol); oss << ")"; break; case 
HIP_API_ID_hipGetSymbolSize: oss << "hipGetSymbolSize("; if (data->args.hipGetSymbolSize.size == NULL) oss << "size=NULL"; else { oss << "size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetSymbolSize.size__val); } oss << ", symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGetSymbolSize.symbol); oss << ")"; break; case HIP_API_ID_hipGraphAddChildGraphNode: oss << "hipGraphAddChildGraphNode("; if (data->args.hipGraphAddChildGraphNode.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddChildGraphNode.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddChildGraphNode.graph); if (data->args.hipGraphAddChildGraphNode.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddChildGraphNode.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddChildGraphNode.numDependencies); oss << ", childGraph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddChildGraphNode.childGraph); oss << ")"; break; case HIP_API_ID_hipGraphAddDependencies: oss << "hipGraphAddDependencies("; oss << "graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddDependencies.graph); if (data->args.hipGraphAddDependencies.from == NULL) oss << ", from=NULL"; else { oss << ", from="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddDependencies.from__val); } if (data->args.hipGraphAddDependencies.to == NULL) oss << ", to=NULL"; else { oss << ", to="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddDependencies.to__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddDependencies.numDependencies); oss << ")"; break; case HIP_API_ID_hipGraphAddEmptyNode: oss << "hipGraphAddEmptyNode("; if (data->args.hipGraphAddEmptyNode.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEmptyNode.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEmptyNode.graph); if (data->args.hipGraphAddEmptyNode.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEmptyNode.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEmptyNode.numDependencies); oss << ")"; break; case HIP_API_ID_hipGraphAddEventRecordNode: oss << "hipGraphAddEventRecordNode("; if (data->args.hipGraphAddEventRecordNode.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEventRecordNode.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEventRecordNode.graph); if (data->args.hipGraphAddEventRecordNode.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEventRecordNode.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEventRecordNode.numDependencies); 
oss << ", event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEventRecordNode.event); oss << ")"; break; case HIP_API_ID_hipGraphAddEventWaitNode: oss << "hipGraphAddEventWaitNode("; if (data->args.hipGraphAddEventWaitNode.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEventWaitNode.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEventWaitNode.graph); if (data->args.hipGraphAddEventWaitNode.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEventWaitNode.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEventWaitNode.numDependencies); oss << ", event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddEventWaitNode.event); oss << ")"; break; case HIP_API_ID_hipGraphAddHostNode: oss << "hipGraphAddHostNode("; if (data->args.hipGraphAddHostNode.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddHostNode.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddHostNode.graph); if (data->args.hipGraphAddHostNode.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddHostNode.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddHostNode.numDependencies); if (data->args.hipGraphAddHostNode.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddHostNode.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphAddKernelNode: oss << "hipGraphAddKernelNode("; if (data->args.hipGraphAddKernelNode.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddKernelNode.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddKernelNode.graph); if (data->args.hipGraphAddKernelNode.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddKernelNode.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddKernelNode.numDependencies); if (data->args.hipGraphAddKernelNode.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddKernelNode.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphAddMemAllocNode: oss << "hipGraphAddMemAllocNode("; if (data->args.hipGraphAddMemAllocNode.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemAllocNode.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemAllocNode.graph); if (data->args.hipGraphAddMemAllocNode.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; 
roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemAllocNode.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemAllocNode.numDependencies); if (data->args.hipGraphAddMemAllocNode.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemAllocNode.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphAddMemFreeNode: oss << "hipGraphAddMemFreeNode("; if (data->args.hipGraphAddMemFreeNode.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemFreeNode.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemFreeNode.graph); if (data->args.hipGraphAddMemFreeNode.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemFreeNode.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemFreeNode.numDependencies); oss << ", dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemFreeNode.dev_ptr); oss << ")"; break; case HIP_API_ID_hipGraphAddMemcpyNode: oss << "hipGraphAddMemcpyNode("; if (data->args.hipGraphAddMemcpyNode.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode.graph); if (data->args.hipGraphAddMemcpyNode.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode.numDependencies); if (data->args.hipGraphAddMemcpyNode.pCopyParams == NULL) oss << ", pCopyParams=NULL"; else { oss << ", pCopyParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode.pCopyParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphAddMemcpyNode1D: oss << "hipGraphAddMemcpyNode1D("; if (data->args.hipGraphAddMemcpyNode1D.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode1D.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode1D.graph); if (data->args.hipGraphAddMemcpyNode1D.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode1D.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode1D.numDependencies); oss << ", dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode1D.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode1D.src); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNode1D.count); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipGraphAddMemcpyNode1D.kind); oss << ")"; break; case HIP_API_ID_hipGraphAddMemcpyNodeFromSymbol: oss << "hipGraphAddMemcpyNodeFromSymbol("; if (data->args.hipGraphAddMemcpyNodeFromSymbol.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeFromSymbol.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeFromSymbol.graph); if (data->args.hipGraphAddMemcpyNodeFromSymbol.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeFromSymbol.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeFromSymbol.numDependencies); oss << ", dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeFromSymbol.dst); oss << ", symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeFromSymbol.symbol); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeFromSymbol.count); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeFromSymbol.offset); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeFromSymbol.kind); oss << ")"; break; case HIP_API_ID_hipGraphAddMemcpyNodeToSymbol: oss << "hipGraphAddMemcpyNodeToSymbol("; if (data->args.hipGraphAddMemcpyNodeToSymbol.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeToSymbol.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeToSymbol.graph); if (data->args.hipGraphAddMemcpyNodeToSymbol.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeToSymbol.pDependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeToSymbol.numDependencies); oss << ", symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeToSymbol.symbol); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeToSymbol.src); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeToSymbol.count); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeToSymbol.offset); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemcpyNodeToSymbol.kind); oss << ")"; break; case HIP_API_ID_hipGraphAddMemsetNode: oss << "hipGraphAddMemsetNode("; if (data->args.hipGraphAddMemsetNode.pGraphNode == NULL) oss << "pGraphNode=NULL"; else { oss << "pGraphNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemsetNode.pGraphNode__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemsetNode.graph); if (data->args.hipGraphAddMemsetNode.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemsetNode.pDependencies__val); } oss << ", 
numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemsetNode.numDependencies); if (data->args.hipGraphAddMemsetNode.pMemsetParams == NULL) oss << ", pMemsetParams=NULL"; else { oss << ", pMemsetParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphAddMemsetNode.pMemsetParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphChildGraphNodeGetGraph: oss << "hipGraphChildGraphNodeGetGraph("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphChildGraphNodeGetGraph.node); if (data->args.hipGraphChildGraphNodeGetGraph.pGraph == NULL) oss << ", pGraph=NULL"; else { oss << ", pGraph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphChildGraphNodeGetGraph.pGraph__val); } oss << ")"; break; case HIP_API_ID_hipGraphClone: oss << "hipGraphClone("; if (data->args.hipGraphClone.pGraphClone == NULL) oss << "pGraphClone=NULL"; else { oss << "pGraphClone="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphClone.pGraphClone__val); } oss << ", originalGraph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphClone.originalGraph); oss << ")"; break; case HIP_API_ID_hipGraphCreate: oss << "hipGraphCreate("; if (data->args.hipGraphCreate.pGraph == NULL) oss << "pGraph=NULL"; else { oss << "pGraph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphCreate.pGraph__val); } oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphCreate.flags); oss << ")"; break; case HIP_API_ID_hipGraphDebugDotPrint: oss << "hipGraphDebugDotPrint("; oss << "graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphDebugDotPrint.graph); if (data->args.hipGraphDebugDotPrint.path == NULL) oss << ", path=NULL"; else { oss << ", path="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphDebugDotPrint.path__val); } oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphDebugDotPrint.flags); oss << ")"; break; case HIP_API_ID_hipGraphDestroy: oss << "hipGraphDestroy("; oss << "graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphDestroy.graph); oss << ")"; break; case HIP_API_ID_hipGraphDestroyNode: oss << "hipGraphDestroyNode("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphDestroyNode.node); oss << ")"; break; case HIP_API_ID_hipGraphEventRecordNodeGetEvent: oss << "hipGraphEventRecordNodeGetEvent("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphEventRecordNodeGetEvent.node); if (data->args.hipGraphEventRecordNodeGetEvent.event_out == NULL) oss << ", event_out=NULL"; else { oss << ", event_out="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphEventRecordNodeGetEvent.event_out__val); } oss << ")"; break; case HIP_API_ID_hipGraphEventRecordNodeSetEvent: oss << "hipGraphEventRecordNodeSetEvent("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphEventRecordNodeSetEvent.node); oss << ", event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphEventRecordNodeSetEvent.event); oss << ")"; break; case HIP_API_ID_hipGraphEventWaitNodeGetEvent: oss << "hipGraphEventWaitNodeGetEvent("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphEventWaitNodeGetEvent.node); if (data->args.hipGraphEventWaitNodeGetEvent.event_out == NULL) oss << ", event_out=NULL"; else { oss << ", event_out="; 
roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphEventWaitNodeGetEvent.event_out__val); } oss << ")"; break; case HIP_API_ID_hipGraphEventWaitNodeSetEvent: oss << "hipGraphEventWaitNodeSetEvent("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphEventWaitNodeSetEvent.node); oss << ", event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphEventWaitNodeSetEvent.event); oss << ")"; break; case HIP_API_ID_hipGraphExecChildGraphNodeSetParams: oss << "hipGraphExecChildGraphNodeSetParams("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecChildGraphNodeSetParams.hGraphExec); oss << ", node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecChildGraphNodeSetParams.node); oss << ", childGraph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecChildGraphNodeSetParams.childGraph); oss << ")"; break; case HIP_API_ID_hipGraphExecDestroy: oss << "hipGraphExecDestroy("; oss << "graphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecDestroy.graphExec); oss << ")"; break; case HIP_API_ID_hipGraphExecEventRecordNodeSetEvent: oss << "hipGraphExecEventRecordNodeSetEvent("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecEventRecordNodeSetEvent.hGraphExec); oss << ", hNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecEventRecordNodeSetEvent.hNode); oss << ", event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecEventRecordNodeSetEvent.event); oss << ")"; break; case HIP_API_ID_hipGraphExecEventWaitNodeSetEvent: oss << "hipGraphExecEventWaitNodeSetEvent("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecEventWaitNodeSetEvent.hGraphExec); oss << ", hNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecEventWaitNodeSetEvent.hNode); oss << ", event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecEventWaitNodeSetEvent.event); oss << ")"; break; case HIP_API_ID_hipGraphExecHostNodeSetParams: oss << "hipGraphExecHostNodeSetParams("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecHostNodeSetParams.hGraphExec); oss << ", node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecHostNodeSetParams.node); if (data->args.hipGraphExecHostNodeSetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecHostNodeSetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphExecKernelNodeSetParams: oss << "hipGraphExecKernelNodeSetParams("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecKernelNodeSetParams.hGraphExec); oss << ", node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecKernelNodeSetParams.node); if (data->args.hipGraphExecKernelNodeSetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecKernelNodeSetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphExecMemcpyNodeSetParams: oss << "hipGraphExecMemcpyNodeSetParams("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParams.hGraphExec); oss << ", node="; 
roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParams.node); if (data->args.hipGraphExecMemcpyNodeSetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphExecMemcpyNodeSetParams1D: oss << "hipGraphExecMemcpyNodeSetParams1D("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParams1D.hGraphExec); oss << ", node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParams1D.node); oss << ", dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParams1D.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParams1D.src); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParams1D.count); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParams1D.kind); oss << ")"; break; case HIP_API_ID_hipGraphExecMemcpyNodeSetParamsFromSymbol: oss << "hipGraphExecMemcpyNodeSetParamsFromSymbol("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsFromSymbol.hGraphExec); oss << ", node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsFromSymbol.node); oss << ", dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsFromSymbol.dst); oss << ", symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsFromSymbol.symbol); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsFromSymbol.count); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsFromSymbol.offset); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsFromSymbol.kind); oss << ")"; break; case HIP_API_ID_hipGraphExecMemcpyNodeSetParamsToSymbol: oss << "hipGraphExecMemcpyNodeSetParamsToSymbol("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsToSymbol.hGraphExec); oss << ", node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsToSymbol.node); oss << ", symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsToSymbol.symbol); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsToSymbol.src); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsToSymbol.count); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsToSymbol.offset); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemcpyNodeSetParamsToSymbol.kind); oss << ")"; break; case HIP_API_ID_hipGraphExecMemsetNodeSetParams: oss << "hipGraphExecMemsetNodeSetParams("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemsetNodeSetParams.hGraphExec); oss << ", node="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipGraphExecMemsetNodeSetParams.node); if (data->args.hipGraphExecMemsetNodeSetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecMemsetNodeSetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphExecUpdate: oss << "hipGraphExecUpdate("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecUpdate.hGraphExec); oss << ", hGraph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecUpdate.hGraph); if (data->args.hipGraphExecUpdate.hErrorNode_out == NULL) oss << ", hErrorNode_out=NULL"; else { oss << ", hErrorNode_out="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecUpdate.hErrorNode_out__val); } if (data->args.hipGraphExecUpdate.updateResult_out == NULL) oss << ", updateResult_out=NULL"; else { oss << ", updateResult_out="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphExecUpdate.updateResult_out__val); } oss << ")"; break; case HIP_API_ID_hipGraphGetEdges: oss << "hipGraphGetEdges("; oss << "graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphGetEdges.graph); if (data->args.hipGraphGetEdges.from == NULL) oss << ", from=NULL"; else { oss << ", from="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphGetEdges.from__val); } if (data->args.hipGraphGetEdges.to == NULL) oss << ", to=NULL"; else { oss << ", to="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphGetEdges.to__val); } if (data->args.hipGraphGetEdges.numEdges == NULL) oss << ", numEdges=NULL"; else { oss << ", numEdges="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphGetEdges.numEdges__val); } oss << ")"; break; case HIP_API_ID_hipGraphGetNodes: oss << "hipGraphGetNodes("; oss << "graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphGetNodes.graph); if (data->args.hipGraphGetNodes.nodes == NULL) oss << ", nodes=NULL"; else { oss << ", nodes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphGetNodes.nodes__val); } if (data->args.hipGraphGetNodes.numNodes == NULL) oss << ", numNodes=NULL"; else { oss << ", numNodes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphGetNodes.numNodes__val); } oss << ")"; break; case HIP_API_ID_hipGraphGetRootNodes: oss << "hipGraphGetRootNodes("; oss << "graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphGetRootNodes.graph); if (data->args.hipGraphGetRootNodes.pRootNodes == NULL) oss << ", pRootNodes=NULL"; else { oss << ", pRootNodes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphGetRootNodes.pRootNodes__val); } if (data->args.hipGraphGetRootNodes.pNumRootNodes == NULL) oss << ", pNumRootNodes=NULL"; else { oss << ", pNumRootNodes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphGetRootNodes.pNumRootNodes__val); } oss << ")"; break; case HIP_API_ID_hipGraphHostNodeGetParams: oss << "hipGraphHostNodeGetParams("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphHostNodeGetParams.node); if (data->args.hipGraphHostNodeGetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphHostNodeGetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphHostNodeSetParams: oss << "hipGraphHostNodeSetParams("; oss << "node="; 
roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphHostNodeSetParams.node); if (data->args.hipGraphHostNodeSetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphHostNodeSetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphInstantiate: oss << "hipGraphInstantiate("; if (data->args.hipGraphInstantiate.pGraphExec == NULL) oss << "pGraphExec=NULL"; else { oss << "pGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphInstantiate.pGraphExec__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphInstantiate.graph); if (data->args.hipGraphInstantiate.pErrorNode == NULL) oss << ", pErrorNode=NULL"; else { oss << ", pErrorNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphInstantiate.pErrorNode__val); } if (data->args.hipGraphInstantiate.pLogBuffer == NULL) oss << ", pLogBuffer=NULL"; else { oss << ", pLogBuffer="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphInstantiate.pLogBuffer__val); } oss << ", bufferSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphInstantiate.bufferSize); oss << ")"; break; case HIP_API_ID_hipGraphInstantiateWithFlags: oss << "hipGraphInstantiateWithFlags("; if (data->args.hipGraphInstantiateWithFlags.pGraphExec == NULL) oss << "pGraphExec=NULL"; else { oss << "pGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphInstantiateWithFlags.pGraphExec__val); } oss << ", graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphInstantiateWithFlags.graph); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphInstantiateWithFlags.flags); oss << ")"; break; case HIP_API_ID_hipGraphKernelNodeCopyAttributes: oss << "hipGraphKernelNodeCopyAttributes("; oss << "hSrc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeCopyAttributes.hSrc); oss << ", hDst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeCopyAttributes.hDst); oss << ")"; break; case HIP_API_ID_hipGraphKernelNodeGetAttribute: oss << "hipGraphKernelNodeGetAttribute("; oss << "hNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeGetAttribute.hNode); oss << ", attr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeGetAttribute.attr); if (data->args.hipGraphKernelNodeGetAttribute.value == NULL) oss << ", value=NULL"; else { oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeGetAttribute.value__val); } oss << ")"; break; case HIP_API_ID_hipGraphKernelNodeGetParams: oss << "hipGraphKernelNodeGetParams("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeGetParams.node); if (data->args.hipGraphKernelNodeGetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeGetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphKernelNodeSetAttribute: oss << "hipGraphKernelNodeSetAttribute("; oss << "hNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeSetAttribute.hNode); oss << ", attr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeSetAttribute.attr); if 
(data->args.hipGraphKernelNodeSetAttribute.value == NULL) oss << ", value=NULL"; else { oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeSetAttribute.value__val); } oss << ")"; break; case HIP_API_ID_hipGraphKernelNodeSetParams: oss << "hipGraphKernelNodeSetParams("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeSetParams.node); if (data->args.hipGraphKernelNodeSetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphKernelNodeSetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphLaunch: oss << "hipGraphLaunch("; oss << "graphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphLaunch.graphExec); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphLaunch.stream); oss << ")"; break; case HIP_API_ID_hipGraphMemAllocNodeGetParams: oss << "hipGraphMemAllocNodeGetParams("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemAllocNodeGetParams.node); if (data->args.hipGraphMemAllocNodeGetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemAllocNodeGetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphMemFreeNodeGetParams: oss << "hipGraphMemFreeNodeGetParams("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemFreeNodeGetParams.node); oss << ", dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemFreeNodeGetParams.dev_ptr); oss << ")"; break; case HIP_API_ID_hipGraphMemcpyNodeGetParams: oss << "hipGraphMemcpyNodeGetParams("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeGetParams.node); if (data->args.hipGraphMemcpyNodeGetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeGetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphMemcpyNodeSetParams: oss << "hipGraphMemcpyNodeSetParams("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParams.node); if (data->args.hipGraphMemcpyNodeSetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphMemcpyNodeSetParams1D: oss << "hipGraphMemcpyNodeSetParams1D("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParams1D.node); oss << ", dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParams1D.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParams1D.src); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParams1D.count); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParams1D.kind); oss << ")"; break; case HIP_API_ID_hipGraphMemcpyNodeSetParamsFromSymbol: oss << "hipGraphMemcpyNodeSetParamsFromSymbol("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsFromSymbol.node); oss << ", 
dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsFromSymbol.dst); oss << ", symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsFromSymbol.symbol); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsFromSymbol.count); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsFromSymbol.offset); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsFromSymbol.kind); oss << ")"; break; case HIP_API_ID_hipGraphMemcpyNodeSetParamsToSymbol: oss << "hipGraphMemcpyNodeSetParamsToSymbol("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsToSymbol.node); oss << ", symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsToSymbol.symbol); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsToSymbol.src); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsToSymbol.count); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsToSymbol.offset); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemcpyNodeSetParamsToSymbol.kind); oss << ")"; break; case HIP_API_ID_hipGraphMemsetNodeGetParams: oss << "hipGraphMemsetNodeGetParams("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemsetNodeGetParams.node); if (data->args.hipGraphMemsetNodeGetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemsetNodeGetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphMemsetNodeSetParams: oss << "hipGraphMemsetNodeSetParams("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemsetNodeSetParams.node); if (data->args.hipGraphMemsetNodeSetParams.pNodeParams == NULL) oss << ", pNodeParams=NULL"; else { oss << ", pNodeParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphMemsetNodeSetParams.pNodeParams__val); } oss << ")"; break; case HIP_API_ID_hipGraphNodeFindInClone: oss << "hipGraphNodeFindInClone("; if (data->args.hipGraphNodeFindInClone.pNode == NULL) oss << "pNode=NULL"; else { oss << "pNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeFindInClone.pNode__val); } oss << ", originalNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeFindInClone.originalNode); oss << ", clonedGraph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeFindInClone.clonedGraph); oss << ")"; break; case HIP_API_ID_hipGraphNodeGetDependencies: oss << "hipGraphNodeGetDependencies("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeGetDependencies.node); if (data->args.hipGraphNodeGetDependencies.pDependencies == NULL) oss << ", pDependencies=NULL"; else { oss << ", pDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeGetDependencies.pDependencies__val); } if (data->args.hipGraphNodeGetDependencies.pNumDependencies == NULL) oss << ", pNumDependencies=NULL"; else { oss << ", pNumDependencies="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipGraphNodeGetDependencies.pNumDependencies__val); } oss << ")"; break; case HIP_API_ID_hipGraphNodeGetDependentNodes: oss << "hipGraphNodeGetDependentNodes("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeGetDependentNodes.node); if (data->args.hipGraphNodeGetDependentNodes.pDependentNodes == NULL) oss << ", pDependentNodes=NULL"; else { oss << ", pDependentNodes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeGetDependentNodes.pDependentNodes__val); } if (data->args.hipGraphNodeGetDependentNodes.pNumDependentNodes == NULL) oss << ", pNumDependentNodes=NULL"; else { oss << ", pNumDependentNodes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeGetDependentNodes.pNumDependentNodes__val); } oss << ")"; break; case HIP_API_ID_hipGraphNodeGetEnabled: oss << "hipGraphNodeGetEnabled("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeGetEnabled.hGraphExec); oss << ", hNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeGetEnabled.hNode); if (data->args.hipGraphNodeGetEnabled.isEnabled == NULL) oss << ", isEnabled=NULL"; else { oss << ", isEnabled="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeGetEnabled.isEnabled__val); } oss << ")"; break; case HIP_API_ID_hipGraphNodeGetType: oss << "hipGraphNodeGetType("; oss << "node="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeGetType.node); if (data->args.hipGraphNodeGetType.pType == NULL) oss << ", pType=NULL"; else { oss << ", pType="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeGetType.pType__val); } oss << ")"; break; case HIP_API_ID_hipGraphNodeSetEnabled: oss << "hipGraphNodeSetEnabled("; oss << "hGraphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeSetEnabled.hGraphExec); oss << ", hNode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeSetEnabled.hNode); oss << ", isEnabled="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphNodeSetEnabled.isEnabled); oss << ")"; break; case HIP_API_ID_hipGraphReleaseUserObject: oss << "hipGraphReleaseUserObject("; oss << "graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphReleaseUserObject.graph); oss << ", object="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphReleaseUserObject.object); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphReleaseUserObject.count); oss << ")"; break; case HIP_API_ID_hipGraphRemoveDependencies: oss << "hipGraphRemoveDependencies("; oss << "graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphRemoveDependencies.graph); if (data->args.hipGraphRemoveDependencies.from == NULL) oss << ", from=NULL"; else { oss << ", from="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphRemoveDependencies.from__val); } if (data->args.hipGraphRemoveDependencies.to == NULL) oss << ", to=NULL"; else { oss << ", to="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphRemoveDependencies.to__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphRemoveDependencies.numDependencies); oss << ")"; break; case HIP_API_ID_hipGraphRetainUserObject: oss << "hipGraphRetainUserObject("; oss << "graph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphRetainUserObject.graph); oss << ", 
object="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphRetainUserObject.object); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphRetainUserObject.count); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphRetainUserObject.flags); oss << ")"; break; case HIP_API_ID_hipGraphUpload: oss << "hipGraphUpload("; oss << "graphExec="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphUpload.graphExec); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphUpload.stream); oss << ")"; break; case HIP_API_ID_hipGraphicsGLRegisterBuffer: oss << "hipGraphicsGLRegisterBuffer("; if (data->args.hipGraphicsGLRegisterBuffer.resource == NULL) oss << "resource=NULL"; else { oss << "resource="; roctracer::hip_support::detail::operator<<(oss, (void*)data->args.hipGraphicsGLRegisterBuffer.resource__val); } oss << ", buffer="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsGLRegisterBuffer.buffer); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsGLRegisterBuffer.flags); oss << ")"; break; case HIP_API_ID_hipGraphicsGLRegisterImage: oss << "hipGraphicsGLRegisterImage("; if (data->args.hipGraphicsGLRegisterImage.resource == NULL) oss << "resource=NULL"; else { oss << "resource="; roctracer::hip_support::detail::operator<<(oss, (void*)data->args.hipGraphicsGLRegisterImage.resource__val); } oss << ", image="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsGLRegisterImage.image); oss << ", target="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsGLRegisterImage.target); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsGLRegisterImage.flags); oss << ")"; break; case HIP_API_ID_hipGraphicsMapResources: oss << "hipGraphicsMapResources("; oss << "count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsMapResources.count); if (data->args.hipGraphicsMapResources.resources == NULL) oss << ", resources=NULL"; else { oss << ", resources="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsMapResources.resources__val); } oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsMapResources.stream); oss << ")"; break; case HIP_API_ID_hipGraphicsResourceGetMappedPointer: oss << "hipGraphicsResourceGetMappedPointer("; if (data->args.hipGraphicsResourceGetMappedPointer.devPtr == NULL) oss << "devPtr=NULL"; else { oss << "devPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsResourceGetMappedPointer.devPtr__val); } if (data->args.hipGraphicsResourceGetMappedPointer.size == NULL) oss << ", size=NULL"; else { oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsResourceGetMappedPointer.size__val); } oss << ", resource="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsResourceGetMappedPointer.resource); oss << ")"; break; case HIP_API_ID_hipGraphicsSubResourceGetMappedArray: oss << "hipGraphicsSubResourceGetMappedArray("; if (data->args.hipGraphicsSubResourceGetMappedArray.array == NULL) oss << "array=NULL"; else { oss << "array="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsSubResourceGetMappedArray.array__val); } oss << ", resource="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsSubResourceGetMappedArray.resource); oss << ", arrayIndex="; 
roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsSubResourceGetMappedArray.arrayIndex); oss << ", mipLevel="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsSubResourceGetMappedArray.mipLevel); oss << ")"; break; case HIP_API_ID_hipGraphicsUnmapResources: oss << "hipGraphicsUnmapResources("; oss << "count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsUnmapResources.count); if (data->args.hipGraphicsUnmapResources.resources == NULL) oss << ", resources=NULL"; else { oss << ", resources="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsUnmapResources.resources__val); } oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsUnmapResources.stream); oss << ")"; break; case HIP_API_ID_hipGraphicsUnregisterResource: oss << "hipGraphicsUnregisterResource("; oss << "resource="; roctracer::hip_support::detail::operator<<(oss, data->args.hipGraphicsUnregisterResource.resource); oss << ")"; break; case HIP_API_ID_hipHccModuleLaunchKernel: oss << "hipHccModuleLaunchKernel("; oss << "f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.f); oss << ", globalWorkSizeX="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.globalWorkSizeX); oss << ", globalWorkSizeY="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.globalWorkSizeY); oss << ", globalWorkSizeZ="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.globalWorkSizeZ); oss << ", blockDimX="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.blockDimX); oss << ", blockDimY="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.blockDimY); oss << ", blockDimZ="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.blockDimZ); oss << ", sharedMemBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.sharedMemBytes); oss << ", hStream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.hStream); if (data->args.hipHccModuleLaunchKernel.kernelParams == NULL) oss << ", kernelParams=NULL"; else { oss << ", kernelParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.kernelParams__val); } if (data->args.hipHccModuleLaunchKernel.extra == NULL) oss << ", extra=NULL"; else { oss << ", extra="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.extra__val); } oss << ", startEvent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.startEvent); oss << ", stopEvent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHccModuleLaunchKernel.stopEvent); oss << ")"; break; case HIP_API_ID_hipHostAlloc: oss << "hipHostAlloc("; if (data->args.hipHostAlloc.ptr == NULL) oss << "ptr=NULL"; else { oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostAlloc.ptr__val); } oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostAlloc.size); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostAlloc.flags); oss << ")"; break; case HIP_API_ID_hipHostFree: oss << "hipHostFree("; oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostFree.ptr); oss << ")"; break; case HIP_API_ID_hipHostGetDevicePointer: oss << 
"hipHostGetDevicePointer("; if (data->args.hipHostGetDevicePointer.devPtr == NULL) oss << "devPtr=NULL"; else { oss << "devPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostGetDevicePointer.devPtr__val); } oss << ", hstPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostGetDevicePointer.hstPtr); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostGetDevicePointer.flags); oss << ")"; break; case HIP_API_ID_hipHostGetFlags: oss << "hipHostGetFlags("; if (data->args.hipHostGetFlags.flagsPtr == NULL) oss << "flagsPtr=NULL"; else { oss << "flagsPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostGetFlags.flagsPtr__val); } oss << ", hostPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostGetFlags.hostPtr); oss << ")"; break; case HIP_API_ID_hipHostMalloc: oss << "hipHostMalloc("; if (data->args.hipHostMalloc.ptr == NULL) oss << "ptr=NULL"; else { oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostMalloc.ptr__val); } oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostMalloc.size); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostMalloc.flags); oss << ")"; break; case HIP_API_ID_hipHostRegister: oss << "hipHostRegister("; oss << "hostPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostRegister.hostPtr); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostRegister.sizeBytes); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostRegister.flags); oss << ")"; break; case HIP_API_ID_hipHostUnregister: oss << "hipHostUnregister("; oss << "hostPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipHostUnregister.hostPtr); oss << ")"; break; case HIP_API_ID_hipImportExternalMemory: oss << "hipImportExternalMemory("; if (data->args.hipImportExternalMemory.extMem_out == NULL) oss << "extMem_out=NULL"; else { oss << "extMem_out="; roctracer::hip_support::detail::operator<<(oss, data->args.hipImportExternalMemory.extMem_out__val); } if (data->args.hipImportExternalMemory.memHandleDesc == NULL) oss << ", memHandleDesc=NULL"; else { oss << ", memHandleDesc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipImportExternalMemory.memHandleDesc__val); } oss << ")"; break; case HIP_API_ID_hipImportExternalSemaphore: oss << "hipImportExternalSemaphore("; if (data->args.hipImportExternalSemaphore.extSem_out == NULL) oss << "extSem_out=NULL"; else { oss << "extSem_out="; roctracer::hip_support::detail::operator<<(oss, data->args.hipImportExternalSemaphore.extSem_out__val); } if (data->args.hipImportExternalSemaphore.semHandleDesc == NULL) oss << ", semHandleDesc=NULL"; else { oss << ", semHandleDesc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipImportExternalSemaphore.semHandleDesc__val); } oss << ")"; break; case HIP_API_ID_hipInit: oss << "hipInit("; oss << "flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipInit.flags); oss << ")"; break; case HIP_API_ID_hipIpcCloseMemHandle: oss << "hipIpcCloseMemHandle("; oss << "devPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipIpcCloseMemHandle.devPtr); oss << ")"; break; case HIP_API_ID_hipIpcGetEventHandle: oss << "hipIpcGetEventHandle("; if (data->args.hipIpcGetEventHandle.handle == NULL) oss << "handle=NULL"; else { oss << "handle="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipIpcGetEventHandle.handle__val); } oss << ", event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipIpcGetEventHandle.event); oss << ")"; break; case HIP_API_ID_hipIpcGetMemHandle: oss << "hipIpcGetMemHandle("; if (data->args.hipIpcGetMemHandle.handle == NULL) oss << "handle=NULL"; else { oss << "handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipIpcGetMemHandle.handle__val); } oss << ", devPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipIpcGetMemHandle.devPtr); oss << ")"; break; case HIP_API_ID_hipIpcOpenEventHandle: oss << "hipIpcOpenEventHandle("; if (data->args.hipIpcOpenEventHandle.event == NULL) oss << "event=NULL"; else { oss << "event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipIpcOpenEventHandle.event__val); } oss << ", handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipIpcOpenEventHandle.handle); oss << ")"; break; case HIP_API_ID_hipIpcOpenMemHandle: oss << "hipIpcOpenMemHandle("; if (data->args.hipIpcOpenMemHandle.devPtr == NULL) oss << "devPtr=NULL"; else { oss << "devPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipIpcOpenMemHandle.devPtr__val); } oss << ", handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipIpcOpenMemHandle.handle); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipIpcOpenMemHandle.flags); oss << ")"; break; case HIP_API_ID_hipLaunchByPtr: oss << "hipLaunchByPtr("; oss << "hostFunction="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchByPtr.hostFunction); oss << ")"; break; case HIP_API_ID_hipLaunchCooperativeKernel: oss << "hipLaunchCooperativeKernel("; oss << "f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchCooperativeKernel.f); oss << ", gridDim="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchCooperativeKernel.gridDim); oss << ", blockDimX="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchCooperativeKernel.blockDimX); if (data->args.hipLaunchCooperativeKernel.kernelParams == NULL) oss << ", kernelParams=NULL"; else { oss << ", kernelParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchCooperativeKernel.kernelParams__val); } oss << ", sharedMemBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchCooperativeKernel.sharedMemBytes); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchCooperativeKernel.stream); oss << ")"; break; case HIP_API_ID_hipLaunchCooperativeKernelMultiDevice: oss << "hipLaunchCooperativeKernelMultiDevice("; if (data->args.hipLaunchCooperativeKernelMultiDevice.launchParamsList == NULL) oss << "launchParamsList=NULL"; else { oss << "launchParamsList="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchCooperativeKernelMultiDevice.launchParamsList__val); } oss << ", numDevices="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchCooperativeKernelMultiDevice.numDevices); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchCooperativeKernelMultiDevice.flags); oss << ")"; break; case HIP_API_ID_hipLaunchHostFunc: oss << "hipLaunchHostFunc("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchHostFunc.stream); oss << ", fn="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchHostFunc.fn); oss << ", userData="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipLaunchHostFunc.userData); oss << ")"; break; case HIP_API_ID_hipLaunchKernel: oss << "hipLaunchKernel("; oss << "function_address="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchKernel.function_address); oss << ", numBlocks="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchKernel.numBlocks); oss << ", dimBlocks="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchKernel.dimBlocks); if (data->args.hipLaunchKernel.args == NULL) oss << ", args=NULL"; else { oss << ", args="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchKernel.args__val); } oss << ", sharedMemBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchKernel.sharedMemBytes); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipLaunchKernel.stream); oss << ")"; break; case HIP_API_ID_hipMalloc: oss << "hipMalloc("; if (data->args.hipMalloc.ptr == NULL) oss << "ptr=NULL"; else { oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMalloc.ptr__val); } oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMalloc.size); oss << ")"; break; case HIP_API_ID_hipMalloc3D: oss << "hipMalloc3D("; if (data->args.hipMalloc3D.pitchedDevPtr == NULL) oss << "pitchedDevPtr=NULL"; else { oss << "pitchedDevPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMalloc3D.pitchedDevPtr__val); } oss << ", extent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMalloc3D.extent); oss << ")"; break; case HIP_API_ID_hipMalloc3DArray: oss << "hipMalloc3DArray("; if (data->args.hipMalloc3DArray.array == NULL) oss << "array=NULL"; else { oss << "array="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMalloc3DArray.array__val); } if (data->args.hipMalloc3DArray.desc == NULL) oss << ", desc=NULL"; else { oss << ", desc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMalloc3DArray.desc__val); } oss << ", extent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMalloc3DArray.extent); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMalloc3DArray.flags); oss << ")"; break; case HIP_API_ID_hipMallocArray: oss << "hipMallocArray("; if (data->args.hipMallocArray.array == NULL) oss << "array=NULL"; else { oss << "array="; roctracer::hip_support::detail::operator<<(oss, (void*)data->args.hipMallocArray.array__val); } if (data->args.hipMallocArray.desc == NULL) oss << ", desc=NULL"; else { oss << ", desc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocArray.desc__val); } oss << ", width="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocArray.width); oss << ", height="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocArray.height); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocArray.flags); oss << ")"; break; case HIP_API_ID_hipMallocAsync: oss << "hipMallocAsync("; if (data->args.hipMallocAsync.dev_ptr == NULL) oss << "dev_ptr=NULL"; else { oss << "dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocAsync.dev_ptr__val); } oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocAsync.size); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocAsync.stream); oss << ")"; break; case HIP_API_ID_hipMallocFromPoolAsync: oss << "hipMallocFromPoolAsync("; if 
(data->args.hipMallocFromPoolAsync.dev_ptr == NULL) oss << "dev_ptr=NULL"; else { oss << "dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocFromPoolAsync.dev_ptr__val); } oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocFromPoolAsync.size); oss << ", mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocFromPoolAsync.mem_pool); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocFromPoolAsync.stream); oss << ")"; break; case HIP_API_ID_hipMallocHost: oss << "hipMallocHost("; if (data->args.hipMallocHost.ptr == NULL) oss << "ptr=NULL"; else { oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocHost.ptr__val); } oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocHost.size); oss << ")"; break; case HIP_API_ID_hipMallocManaged: oss << "hipMallocManaged("; if (data->args.hipMallocManaged.dev_ptr == NULL) oss << "dev_ptr=NULL"; else { oss << "dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocManaged.dev_ptr__val); } oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocManaged.size); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocManaged.flags); oss << ")"; break; case HIP_API_ID_hipMallocMipmappedArray: oss << "hipMallocMipmappedArray("; if (data->args.hipMallocMipmappedArray.mipmappedArray == NULL) oss << "mipmappedArray=NULL"; else { oss << "mipmappedArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocMipmappedArray.mipmappedArray__val); } if (data->args.hipMallocMipmappedArray.desc == NULL) oss << ", desc=NULL"; else { oss << ", desc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocMipmappedArray.desc__val); } oss << ", extent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocMipmappedArray.extent); oss << ", numLevels="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocMipmappedArray.numLevels); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocMipmappedArray.flags); oss << ")"; break; case HIP_API_ID_hipMallocPitch: oss << "hipMallocPitch("; if (data->args.hipMallocPitch.ptr == NULL) oss << "ptr=NULL"; else { oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocPitch.ptr__val); } if (data->args.hipMallocPitch.pitch == NULL) oss << ", pitch=NULL"; else { oss << ", pitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocPitch.pitch__val); } oss << ", width="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocPitch.width); oss << ", height="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMallocPitch.height); oss << ")"; break; case HIP_API_ID_hipMemAddressFree: oss << "hipMemAddressFree("; oss << "devPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAddressFree.devPtr); oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAddressFree.size); oss << ")"; break; case HIP_API_ID_hipMemAddressReserve: oss << "hipMemAddressReserve("; if (data->args.hipMemAddressReserve.ptr == NULL) oss << "ptr=NULL"; else { oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAddressReserve.ptr__val); } oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAddressReserve.size); oss << ", alignment="; 
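// Pointer arguments follow a uniform pattern throughout this generated switch:
// each is NULL-checked first, and when non-NULL the dereferenced value recorded
// by the tracer (the <arg>__val shadow field) is streamed instead of the raw
// pointer.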
roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAddressReserve.alignment); oss << ", addr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAddressReserve.addr); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAddressReserve.flags); oss << ")"; break; case HIP_API_ID_hipMemAdvise: oss << "hipMemAdvise("; oss << "dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAdvise.dev_ptr); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAdvise.count); oss << ", advice="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAdvise.advice); oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAdvise.device); oss << ")"; break; case HIP_API_ID_hipMemAllocHost: oss << "hipMemAllocHost("; if (data->args.hipMemAllocHost.ptr == NULL) oss << "ptr=NULL"; else { oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAllocHost.ptr__val); } oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAllocHost.size); oss << ")"; break; case HIP_API_ID_hipMemAllocPitch: oss << "hipMemAllocPitch("; if (data->args.hipMemAllocPitch.dptr == NULL) oss << "dptr=NULL"; else { oss << "dptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAllocPitch.dptr__val); } if (data->args.hipMemAllocPitch.pitch == NULL) oss << ", pitch=NULL"; else { oss << ", pitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAllocPitch.pitch__val); } oss << ", widthInBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAllocPitch.widthInBytes); oss << ", height="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAllocPitch.height); oss << ", elementSizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemAllocPitch.elementSizeBytes); oss << ")"; break; case HIP_API_ID_hipMemCreate: oss << "hipMemCreate("; if (data->args.hipMemCreate.handle == NULL) oss << "handle=NULL"; else { oss << "handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemCreate.handle__val); } oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemCreate.size); if (data->args.hipMemCreate.prop == NULL) oss << ", prop=NULL"; else { oss << ", prop="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemCreate.prop__val); } oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemCreate.flags); oss << ")"; break; case HIP_API_ID_hipMemExportToShareableHandle: oss << "hipMemExportToShareableHandle("; oss << "shareableHandle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemExportToShareableHandle.shareableHandle); oss << ", handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemExportToShareableHandle.handle); oss << ", handleType="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemExportToShareableHandle.handleType); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemExportToShareableHandle.flags); oss << ")"; break; case HIP_API_ID_hipMemGetAccess: oss << "hipMemGetAccess("; if (data->args.hipMemGetAccess.flags == NULL) oss << "flags=NULL"; else { oss << "flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetAccess.flags__val); } if (data->args.hipMemGetAccess.location == NULL) oss << ", location=NULL"; else { oss << ", location="; 
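// location__val holds the dereferenced location descriptor recorded by the
// tracer; the branch above prints "location=NULL" when the pointer was unset.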
roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetAccess.location__val); } oss << ", ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetAccess.ptr); oss << ")"; break; case HIP_API_ID_hipMemGetAddressRange: oss << "hipMemGetAddressRange("; if (data->args.hipMemGetAddressRange.pbase == NULL) oss << "pbase=NULL"; else { oss << "pbase="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetAddressRange.pbase__val); } if (data->args.hipMemGetAddressRange.psize == NULL) oss << ", psize=NULL"; else { oss << ", psize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetAddressRange.psize__val); } oss << ", dptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetAddressRange.dptr); oss << ")"; break; case HIP_API_ID_hipMemGetAllocationGranularity: oss << "hipMemGetAllocationGranularity("; if (data->args.hipMemGetAllocationGranularity.granularity == NULL) oss << "granularity=NULL"; else { oss << "granularity="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetAllocationGranularity.granularity__val); } if (data->args.hipMemGetAllocationGranularity.prop == NULL) oss << ", prop=NULL"; else { oss << ", prop="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetAllocationGranularity.prop__val); } oss << ", option="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetAllocationGranularity.option); oss << ")"; break; case HIP_API_ID_hipMemGetAllocationPropertiesFromHandle: oss << "hipMemGetAllocationPropertiesFromHandle("; if (data->args.hipMemGetAllocationPropertiesFromHandle.prop == NULL) oss << "prop=NULL"; else { oss << "prop="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetAllocationPropertiesFromHandle.prop__val); } oss << ", handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetAllocationPropertiesFromHandle.handle); oss << ")"; break; case HIP_API_ID_hipMemGetInfo: oss << "hipMemGetInfo("; if (data->args.hipMemGetInfo.free == NULL) oss << "free=NULL"; else { oss << "free="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetInfo.free__val); } if (data->args.hipMemGetInfo.total == NULL) oss << ", total=NULL"; else { oss << ", total="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemGetInfo.total__val); } oss << ")"; break; case HIP_API_ID_hipMemImportFromShareableHandle: oss << "hipMemImportFromShareableHandle("; if (data->args.hipMemImportFromShareableHandle.handle == NULL) oss << "handle=NULL"; else { oss << "handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemImportFromShareableHandle.handle__val); } oss << ", osHandle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemImportFromShareableHandle.osHandle); oss << ", shHandleType="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemImportFromShareableHandle.shHandleType); oss << ")"; break; case HIP_API_ID_hipMemMap: oss << "hipMemMap("; oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemMap.ptr); oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemMap.size); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemMap.offset); oss << ", handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemMap.handle); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemMap.flags); oss << ")"; break; case HIP_API_ID_hipMemMapArrayAsync: oss << 
"hipMemMapArrayAsync("; if (data->args.hipMemMapArrayAsync.mapInfoList == NULL) oss << "mapInfoList=NULL"; else { oss << "mapInfoList="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemMapArrayAsync.mapInfoList__val); } oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemMapArrayAsync.count); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemMapArrayAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemPoolCreate: oss << "hipMemPoolCreate("; if (data->args.hipMemPoolCreate.mem_pool == NULL) oss << "mem_pool=NULL"; else { oss << "mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolCreate.mem_pool__val); } if (data->args.hipMemPoolCreate.pool_props == NULL) oss << ", pool_props=NULL"; else { oss << ", pool_props="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolCreate.pool_props__val); } oss << ")"; break; case HIP_API_ID_hipMemPoolDestroy: oss << "hipMemPoolDestroy("; oss << "mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolDestroy.mem_pool); oss << ")"; break; case HIP_API_ID_hipMemPoolExportPointer: oss << "hipMemPoolExportPointer("; if (data->args.hipMemPoolExportPointer.export_data == NULL) oss << "export_data=NULL"; else { oss << "export_data="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolExportPointer.export_data__val); } oss << ", dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolExportPointer.dev_ptr); oss << ")"; break; case HIP_API_ID_hipMemPoolExportToShareableHandle: oss << "hipMemPoolExportToShareableHandle("; oss << "shared_handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolExportToShareableHandle.shared_handle); oss << ", mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolExportToShareableHandle.mem_pool); oss << ", handle_type="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolExportToShareableHandle.handle_type); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolExportToShareableHandle.flags); oss << ")"; break; case HIP_API_ID_hipMemPoolGetAccess: oss << "hipMemPoolGetAccess("; if (data->args.hipMemPoolGetAccess.flags == NULL) oss << "flags=NULL"; else { oss << "flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolGetAccess.flags__val); } oss << ", mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolGetAccess.mem_pool); if (data->args.hipMemPoolGetAccess.location == NULL) oss << ", location=NULL"; else { oss << ", location="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolGetAccess.location__val); } oss << ")"; break; case HIP_API_ID_hipMemPoolGetAttribute: oss << "hipMemPoolGetAttribute("; oss << "mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolGetAttribute.mem_pool); oss << ", attr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolGetAttribute.attr); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolGetAttribute.value); oss << ")"; break; case HIP_API_ID_hipMemPoolImportFromShareableHandle: oss << "hipMemPoolImportFromShareableHandle("; if (data->args.hipMemPoolImportFromShareableHandle.mem_pool == NULL) oss << "mem_pool=NULL"; else { oss << "mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolImportFromShareableHandle.mem_pool__val); } oss 
<< ", shared_handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolImportFromShareableHandle.shared_handle); oss << ", handle_type="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolImportFromShareableHandle.handle_type); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolImportFromShareableHandle.flags); oss << ")"; break; case HIP_API_ID_hipMemPoolImportPointer: oss << "hipMemPoolImportPointer("; if (data->args.hipMemPoolImportPointer.dev_ptr == NULL) oss << "dev_ptr=NULL"; else { oss << "dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolImportPointer.dev_ptr__val); } oss << ", mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolImportPointer.mem_pool); if (data->args.hipMemPoolImportPointer.export_data == NULL) oss << ", export_data=NULL"; else { oss << ", export_data="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolImportPointer.export_data__val); } oss << ")"; break; case HIP_API_ID_hipMemPoolSetAccess: oss << "hipMemPoolSetAccess("; oss << "mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolSetAccess.mem_pool); if (data->args.hipMemPoolSetAccess.desc_list == NULL) oss << ", desc_list=NULL"; else { oss << ", desc_list="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolSetAccess.desc_list__val); } oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolSetAccess.count); oss << ")"; break; case HIP_API_ID_hipMemPoolSetAttribute: oss << "hipMemPoolSetAttribute("; oss << "mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolSetAttribute.mem_pool); oss << ", attr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolSetAttribute.attr); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolSetAttribute.value); oss << ")"; break; case HIP_API_ID_hipMemPoolTrimTo: oss << "hipMemPoolTrimTo("; oss << "mem_pool="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolTrimTo.mem_pool); oss << ", min_bytes_to_hold="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPoolTrimTo.min_bytes_to_hold); oss << ")"; break; case HIP_API_ID_hipMemPrefetchAsync: oss << "hipMemPrefetchAsync("; oss << "dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPrefetchAsync.dev_ptr); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPrefetchAsync.count); oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPrefetchAsync.device); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPrefetchAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemPtrGetInfo: oss << "hipMemPtrGetInfo("; oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPtrGetInfo.ptr); if (data->args.hipMemPtrGetInfo.size == NULL) oss << ", size=NULL"; else { oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemPtrGetInfo.size__val); } oss << ")"; break; case HIP_API_ID_hipMemRangeGetAttribute: oss << "hipMemRangeGetAttribute("; oss << "data="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRangeGetAttribute.data); oss << ", data_size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRangeGetAttribute.data_size); oss << ", attribute="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipMemRangeGetAttribute.attribute); oss << ", dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRangeGetAttribute.dev_ptr); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRangeGetAttribute.count); oss << ")"; break; case HIP_API_ID_hipMemRangeGetAttributes: oss << "hipMemRangeGetAttributes("; if (data->args.hipMemRangeGetAttributes.data == NULL) oss << "data=NULL"; else { oss << "data="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRangeGetAttributes.data__val); } if (data->args.hipMemRangeGetAttributes.data_sizes == NULL) oss << ", data_sizes=NULL"; else { oss << ", data_sizes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRangeGetAttributes.data_sizes__val); } if (data->args.hipMemRangeGetAttributes.attributes == NULL) oss << ", attributes=NULL"; else { oss << ", attributes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRangeGetAttributes.attributes__val); } oss << ", num_attributes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRangeGetAttributes.num_attributes); oss << ", dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRangeGetAttributes.dev_ptr); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRangeGetAttributes.count); oss << ")"; break; case HIP_API_ID_hipMemRelease: oss << "hipMemRelease("; oss << "handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRelease.handle); oss << ")"; break; case HIP_API_ID_hipMemRetainAllocationHandle: oss << "hipMemRetainAllocationHandle("; if (data->args.hipMemRetainAllocationHandle.handle == NULL) oss << "handle=NULL"; else { oss << "handle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRetainAllocationHandle.handle__val); } oss << ", addr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemRetainAllocationHandle.addr); oss << ")"; break; case HIP_API_ID_hipMemSetAccess: oss << "hipMemSetAccess("; oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemSetAccess.ptr); oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemSetAccess.size); if (data->args.hipMemSetAccess.desc == NULL) oss << ", desc=NULL"; else { oss << ", desc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemSetAccess.desc__val); } oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemSetAccess.count); oss << ")"; break; case HIP_API_ID_hipMemUnmap: oss << "hipMemUnmap("; oss << "ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemUnmap.ptr); oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemUnmap.size); oss << ")"; break; case HIP_API_ID_hipMemcpy: oss << "hipMemcpy("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy.src); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy.sizeBytes); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy.kind); oss << ")"; break; case HIP_API_ID_hipMemcpy2D: oss << "hipMemcpy2D("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2D.dst); oss << ", dpitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2D.dpitch); oss << ", src="; 
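// The hipMemcpy* cases below print the transfer endpoints as raw addresses
// together with the transfer geometry (sizeBytes, or pitch/width/height for
// the 2D/3D variants) and the hipMemcpyKind value.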
roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2D.src); oss << ", spitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2D.spitch); oss << ", width="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2D.width); oss << ", height="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2D.height); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2D.kind); oss << ")"; break; case HIP_API_ID_hipMemcpy2DAsync: oss << "hipMemcpy2DAsync("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DAsync.dst); oss << ", dpitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DAsync.dpitch); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DAsync.src); oss << ", spitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DAsync.spitch); oss << ", width="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DAsync.width); oss << ", height="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DAsync.height); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DAsync.kind); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpy2DFromArray: oss << "hipMemcpy2DFromArray("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArray.dst); oss << ", dpitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArray.dpitch); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArray.src); oss << ", wOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArray.wOffset); oss << ", hOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArray.hOffset); oss << ", width="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArray.width); oss << ", height="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArray.height); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArray.kind); oss << ")"; break; case HIP_API_ID_hipMemcpy2DFromArrayAsync: oss << "hipMemcpy2DFromArrayAsync("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArrayAsync.dst); oss << ", dpitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArrayAsync.dpitch); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArrayAsync.src); oss << ", wOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArrayAsync.wOffset); oss << ", hOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArrayAsync.hOffset); oss << ", width="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArrayAsync.width); oss << ", height="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArrayAsync.height); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArrayAsync.kind); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DFromArrayAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpy2DToArray: oss << "hipMemcpy2DToArray("; if (data->args.hipMemcpy2DToArray.dst == 
NULL) oss << "dst=NULL"; else { oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArray.dst__val); } oss << ", wOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArray.wOffset); oss << ", hOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArray.hOffset); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArray.src); oss << ", spitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArray.spitch); oss << ", width="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArray.width); oss << ", height="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArray.height); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArray.kind); oss << ")"; break; case HIP_API_ID_hipMemcpy2DToArrayAsync: oss << "hipMemcpy2DToArrayAsync("; if (data->args.hipMemcpy2DToArrayAsync.dst == NULL) oss << "dst=NULL"; else { oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArrayAsync.dst__val); } oss << ", wOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArrayAsync.wOffset); oss << ", hOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArrayAsync.hOffset); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArrayAsync.src); oss << ", spitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArrayAsync.spitch); oss << ", width="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArrayAsync.width); oss << ", height="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArrayAsync.height); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArrayAsync.kind); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy2DToArrayAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpy3D: oss << "hipMemcpy3D("; if (data->args.hipMemcpy3D.p == NULL) oss << "p=NULL"; else { oss << "p="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy3D.p__val); } oss << ")"; break; case HIP_API_ID_hipMemcpy3DAsync: oss << "hipMemcpy3DAsync("; if (data->args.hipMemcpy3DAsync.p == NULL) oss << "p=NULL"; else { oss << "p="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy3DAsync.p__val); } oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpy3DAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpyAsync: oss << "hipMemcpyAsync("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyAsync.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyAsync.src); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyAsync.sizeBytes); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyAsync.kind); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpyAtoH: oss << "hipMemcpyAtoH("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyAtoH.dst); if (data->args.hipMemcpyAtoH.srcArray == NULL) oss << ", srcArray=NULL"; else { oss << ", srcArray="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipMemcpyAtoH.srcArray__val); } oss << ", srcOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyAtoH.srcOffset); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyAtoH.count); oss << ")"; break; case HIP_API_ID_hipMemcpyDtoD: oss << "hipMemcpyDtoD("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoD.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoD.src); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoD.sizeBytes); oss << ")"; break; case HIP_API_ID_hipMemcpyDtoDAsync: oss << "hipMemcpyDtoDAsync("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoDAsync.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoDAsync.src); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoDAsync.sizeBytes); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoDAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpyDtoH: oss << "hipMemcpyDtoH("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoH.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoH.src); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoH.sizeBytes); oss << ")"; break; case HIP_API_ID_hipMemcpyDtoHAsync: oss << "hipMemcpyDtoHAsync("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoHAsync.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoHAsync.src); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoHAsync.sizeBytes); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyDtoHAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpyFromArray: oss << "hipMemcpyFromArray("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromArray.dst); oss << ", srcArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromArray.srcArray); oss << ", wOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromArray.wOffset); oss << ", hOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromArray.hOffset); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromArray.count); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromArray.kind); oss << ")"; break; case HIP_API_ID_hipMemcpyFromSymbol: oss << "hipMemcpyFromSymbol("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromSymbol.dst); oss << ", symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromSymbol.symbol); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromSymbol.sizeBytes); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromSymbol.offset); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromSymbol.kind); oss << ")"; break; case HIP_API_ID_hipMemcpyFromSymbolAsync: oss << "hipMemcpyFromSymbolAsync("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromSymbolAsync.dst); oss 
<< ", symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromSymbolAsync.symbol); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromSymbolAsync.sizeBytes); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromSymbolAsync.offset); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromSymbolAsync.kind); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyFromSymbolAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpyHtoA: oss << "hipMemcpyHtoA("; if (data->args.hipMemcpyHtoA.dstArray == NULL) oss << "dstArray=NULL"; else { oss << "dstArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyHtoA.dstArray__val); } oss << ", dstOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyHtoA.dstOffset); oss << ", srcHost="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyHtoA.srcHost); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyHtoA.count); oss << ")"; break; case HIP_API_ID_hipMemcpyHtoD: oss << "hipMemcpyHtoD("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyHtoD.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyHtoD.src); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyHtoD.sizeBytes); oss << ")"; break; case HIP_API_ID_hipMemcpyHtoDAsync: oss << "hipMemcpyHtoDAsync("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyHtoDAsync.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyHtoDAsync.src); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyHtoDAsync.sizeBytes); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyHtoDAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpyParam2D: oss << "hipMemcpyParam2D("; if (data->args.hipMemcpyParam2D.pCopy == NULL) oss << "pCopy=NULL"; else { oss << "pCopy="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyParam2D.pCopy__val); } oss << ")"; break; case HIP_API_ID_hipMemcpyParam2DAsync: oss << "hipMemcpyParam2DAsync("; if (data->args.hipMemcpyParam2DAsync.pCopy == NULL) oss << "pCopy=NULL"; else { oss << "pCopy="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyParam2DAsync.pCopy__val); } oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyParam2DAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpyPeer: oss << "hipMemcpyPeer("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyPeer.dst); oss << ", dstDeviceId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyPeer.dstDeviceId); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyPeer.src); oss << ", srcDeviceId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyPeer.srcDeviceId); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyPeer.sizeBytes); oss << ")"; break; case HIP_API_ID_hipMemcpyPeerAsync: oss << "hipMemcpyPeerAsync("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyPeerAsync.dst); oss << ", dstDeviceId="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipMemcpyPeerAsync.dstDeviceId); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyPeerAsync.src); oss << ", srcDevice="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyPeerAsync.srcDevice); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyPeerAsync.sizeBytes); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyPeerAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpyToArray: oss << "hipMemcpyToArray("; if (data->args.hipMemcpyToArray.dst == NULL) oss << "dst=NULL"; else { oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToArray.dst__val); } oss << ", wOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToArray.wOffset); oss << ", hOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToArray.hOffset); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToArray.src); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToArray.count); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToArray.kind); oss << ")"; break; case HIP_API_ID_hipMemcpyToSymbol: oss << "hipMemcpyToSymbol("; oss << "symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToSymbol.symbol); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToSymbol.src); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToSymbol.sizeBytes); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToSymbol.offset); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToSymbol.kind); oss << ")"; break; case HIP_API_ID_hipMemcpyToSymbolAsync: oss << "hipMemcpyToSymbolAsync("; oss << "symbol="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToSymbolAsync.symbol); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToSymbolAsync.src); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToSymbolAsync.sizeBytes); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToSymbolAsync.offset); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToSymbolAsync.kind); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyToSymbolAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemcpyWithStream: oss << "hipMemcpyWithStream("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyWithStream.dst); oss << ", src="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyWithStream.src); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyWithStream.sizeBytes); oss << ", kind="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyWithStream.kind); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemcpyWithStream.stream); oss << ")"; break; case HIP_API_ID_hipMemset: oss << "hipMemset("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset.dst); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset.value); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipMemset.sizeBytes); oss << ")"; break; case HIP_API_ID_hipMemset2D: oss << "hipMemset2D("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset2D.dst); oss << ", pitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset2D.pitch); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset2D.value); oss << ", width="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset2D.width); oss << ", height="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset2D.height); oss << ")"; break; case HIP_API_ID_hipMemset2DAsync: oss << "hipMemset2DAsync("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset2DAsync.dst); oss << ", pitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset2DAsync.pitch); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset2DAsync.value); oss << ", width="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset2DAsync.width); oss << ", height="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset2DAsync.height); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset2DAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemset3D: oss << "hipMemset3D("; oss << "pitchedDevPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset3D.pitchedDevPtr); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset3D.value); oss << ", extent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset3D.extent); oss << ")"; break; case HIP_API_ID_hipMemset3DAsync: oss << "hipMemset3DAsync("; oss << "pitchedDevPtr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset3DAsync.pitchedDevPtr); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset3DAsync.value); oss << ", extent="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset3DAsync.extent); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemset3DAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemsetAsync: oss << "hipMemsetAsync("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetAsync.dst); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetAsync.value); oss << ", sizeBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetAsync.sizeBytes); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetAsync.stream); oss << ")"; break; case HIP_API_ID_hipMemsetD16: oss << "hipMemsetD16("; oss << "dest="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD16.dest); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD16.value); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD16.count); oss << ")"; break; case HIP_API_ID_hipMemsetD16Async: oss << "hipMemsetD16Async("; oss << "dest="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD16Async.dest); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD16Async.value); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD16Async.count); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD16Async.stream); oss << ")"; break; case 
HIP_API_ID_hipMemsetD32: oss << "hipMemsetD32("; oss << "dest="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD32.dest); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD32.value); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD32.count); oss << ")"; break; case HIP_API_ID_hipMemsetD32Async: oss << "hipMemsetD32Async("; oss << "dst="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD32Async.dst); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD32Async.value); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD32Async.count); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD32Async.stream); oss << ")"; break; case HIP_API_ID_hipMemsetD8: oss << "hipMemsetD8("; oss << "dest="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD8.dest); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD8.value); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD8.count); oss << ")"; break; case HIP_API_ID_hipMemsetD8Async: oss << "hipMemsetD8Async("; oss << "dest="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD8Async.dest); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD8Async.value); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD8Async.count); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMemsetD8Async.stream); oss << ")"; break; case HIP_API_ID_hipMipmappedArrayCreate: oss << "hipMipmappedArrayCreate("; if (data->args.hipMipmappedArrayCreate.pHandle == NULL) oss << "pHandle=NULL"; else { oss << "pHandle="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMipmappedArrayCreate.pHandle__val); } if (data->args.hipMipmappedArrayCreate.pMipmappedArrayDesc == NULL) oss << ", pMipmappedArrayDesc=NULL"; else { oss << ", pMipmappedArrayDesc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMipmappedArrayCreate.pMipmappedArrayDesc__val); } oss << ", numMipmapLevels="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMipmappedArrayCreate.numMipmapLevels); oss << ")"; break; case HIP_API_ID_hipMipmappedArrayDestroy: oss << "hipMipmappedArrayDestroy("; oss << "hMipmappedArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMipmappedArrayDestroy.hMipmappedArray); oss << ")"; break; case HIP_API_ID_hipMipmappedArrayGetLevel: oss << "hipMipmappedArrayGetLevel("; if (data->args.hipMipmappedArrayGetLevel.pLevelArray == NULL) oss << "pLevelArray=NULL"; else { oss << "pLevelArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMipmappedArrayGetLevel.pLevelArray__val); } oss << ", hMipMappedArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMipmappedArrayGetLevel.hMipMappedArray); oss << ", level="; roctracer::hip_support::detail::operator<<(oss, data->args.hipMipmappedArrayGetLevel.level); oss << ")"; break; case HIP_API_ID_hipModuleGetFunction: oss << "hipModuleGetFunction("; if (data->args.hipModuleGetFunction.function == NULL) oss << "function=NULL"; else { oss << "function="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleGetFunction.function__val); } oss << ", module="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipModuleGetFunction.module); if (data->args.hipModuleGetFunction.kname == NULL) oss << ", kname=NULL"; else { oss << ", kname="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleGetFunction.kname__val); } oss << ")"; break; case HIP_API_ID_hipModuleGetGlobal: oss << "hipModuleGetGlobal("; if (data->args.hipModuleGetGlobal.dptr == NULL) oss << "dptr=NULL"; else { oss << "dptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleGetGlobal.dptr__val); } if (data->args.hipModuleGetGlobal.bytes == NULL) oss << ", bytes=NULL"; else { oss << ", bytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleGetGlobal.bytes__val); } oss << ", hmod="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleGetGlobal.hmod); if (data->args.hipModuleGetGlobal.name == NULL) oss << ", name=NULL"; else { oss << ", name="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleGetGlobal.name__val); } oss << ")"; break; case HIP_API_ID_hipModuleGetTexRef: oss << "hipModuleGetTexRef("; if (data->args.hipModuleGetTexRef.texRef == NULL) oss << "texRef=NULL"; else { oss << "texRef="; roctracer::hip_support::detail::operator<<(oss, (void*)data->args.hipModuleGetTexRef.texRef__val); } oss << ", hmod="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleGetTexRef.hmod); if (data->args.hipModuleGetTexRef.name == NULL) oss << ", name=NULL"; else { oss << ", name="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleGetTexRef.name__val); } oss << ")"; break; case HIP_API_ID_hipModuleLaunchCooperativeKernel: oss << "hipModuleLaunchCooperativeKernel("; oss << "f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernel.f); oss << ", gridDimX="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernel.gridDimX); oss << ", gridDimY="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernel.gridDimY); oss << ", gridDimZ="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernel.gridDimZ); oss << ", blockDimX="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernel.blockDimX); oss << ", blockDimY="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernel.blockDimY); oss << ", blockDimZ="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernel.blockDimZ); oss << ", sharedMemBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernel.sharedMemBytes); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernel.stream); if (data->args.hipModuleLaunchCooperativeKernel.kernelParams == NULL) oss << ", kernelParams=NULL"; else { oss << ", kernelParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernel.kernelParams__val); } oss << ")"; break; case HIP_API_ID_hipModuleLaunchCooperativeKernelMultiDevice: oss << "hipModuleLaunchCooperativeKernelMultiDevice("; if (data->args.hipModuleLaunchCooperativeKernelMultiDevice.launchParamsList == NULL) oss << "launchParamsList=NULL"; else { oss << "launchParamsList="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernelMultiDevice.launchParamsList__val); } oss << ", numDevices="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipModuleLaunchCooperativeKernelMultiDevice.numDevices); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchCooperativeKernelMultiDevice.flags); oss << ")"; break; case HIP_API_ID_hipModuleLaunchKernel: oss << "hipModuleLaunchKernel("; oss << "f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchKernel.f); oss << ", gridDimX="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchKernel.gridDimX); oss << ", gridDimY="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchKernel.gridDimY); oss << ", gridDimZ="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchKernel.gridDimZ); oss << ", blockDimX="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchKernel.blockDimX); oss << ", blockDimY="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchKernel.blockDimY); oss << ", blockDimZ="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchKernel.blockDimZ); oss << ", sharedMemBytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchKernel.sharedMemBytes); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchKernel.stream); if (data->args.hipModuleLaunchKernel.kernelParams == NULL) oss << ", kernelParams=NULL"; else { oss << ", kernelParams="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchKernel.kernelParams__val); } if (data->args.hipModuleLaunchKernel.extra == NULL) oss << ", extra=NULL"; else { oss << ", extra="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLaunchKernel.extra__val); } oss << ")"; break; case HIP_API_ID_hipModuleLoad: oss << "hipModuleLoad("; if (data->args.hipModuleLoad.module == NULL) oss << "module=NULL"; else { oss << "module="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLoad.module__val); } if (data->args.hipModuleLoad.fname == NULL) oss << ", fname=NULL"; else { oss << ", fname="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLoad.fname__val); } oss << ")"; break; case HIP_API_ID_hipModuleLoadData: oss << "hipModuleLoadData("; if (data->args.hipModuleLoadData.module == NULL) oss << "module=NULL"; else { oss << "module="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLoadData.module__val); } oss << ", image="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLoadData.image); oss << ")"; break; case HIP_API_ID_hipModuleLoadDataEx: oss << "hipModuleLoadDataEx("; if (data->args.hipModuleLoadDataEx.module == NULL) oss << "module=NULL"; else { oss << "module="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLoadDataEx.module__val); } oss << ", image="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLoadDataEx.image); oss << ", numOptions="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLoadDataEx.numOptions); if (data->args.hipModuleLoadDataEx.options == NULL) oss << ", options=NULL"; else { oss << ", options="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLoadDataEx.options__val); } if (data->args.hipModuleLoadDataEx.optionsValues == NULL) oss << ", optionsValues=NULL"; else { oss << ", optionsValues="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleLoadDataEx.optionsValues__val); } oss << ")"; break; case 
HIP_API_ID_hipModuleOccupancyMaxActiveBlocksPerMultiprocessor: oss << "hipModuleOccupancyMaxActiveBlocksPerMultiprocessor("; if (data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks == NULL) oss << "numBlocks=NULL"; else { oss << "numBlocks="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks__val); } oss << ", f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.f); oss << ", blockSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.blockSize); oss << ", dynSharedMemPerBlk="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessor.dynSharedMemPerBlk); oss << ")"; break; case HIP_API_ID_hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags: oss << "hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags("; if (data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks == NULL) oss << "numBlocks=NULL"; else { oss << "numBlocks="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks__val); } oss << ", f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.f); oss << ", blockSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.blockSize); oss << ", dynSharedMemPerBlk="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.dynSharedMemPerBlk); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.flags); oss << ")"; break; case HIP_API_ID_hipModuleOccupancyMaxPotentialBlockSize: oss << "hipModuleOccupancyMaxPotentialBlockSize("; if (data->args.hipModuleOccupancyMaxPotentialBlockSize.gridSize == NULL) oss << "gridSize=NULL"; else { oss << "gridSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxPotentialBlockSize.gridSize__val); } if (data->args.hipModuleOccupancyMaxPotentialBlockSize.blockSize == NULL) oss << ", blockSize=NULL"; else { oss << ", blockSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxPotentialBlockSize.blockSize__val); } oss << ", f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxPotentialBlockSize.f); oss << ", dynSharedMemPerBlk="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxPotentialBlockSize.dynSharedMemPerBlk); oss << ", blockSizeLimit="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxPotentialBlockSize.blockSizeLimit); oss << ")"; break; case HIP_API_ID_hipModuleOccupancyMaxPotentialBlockSizeWithFlags: oss << "hipModuleOccupancyMaxPotentialBlockSizeWithFlags("; if (data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.gridSize == NULL) oss << "gridSize=NULL"; else { oss << "gridSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.gridSize__val); } if (data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.blockSize == NULL) oss << ", blockSize=NULL"; else { oss << ", blockSize="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.blockSize__val); } oss << ", f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.f); oss << ", dynSharedMemPerBlk="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.dynSharedMemPerBlk); oss << ", blockSizeLimit="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.blockSizeLimit); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleOccupancyMaxPotentialBlockSizeWithFlags.flags); oss << ")"; break; case HIP_API_ID_hipModuleUnload: oss << "hipModuleUnload("; oss << "module="; roctracer::hip_support::detail::operator<<(oss, data->args.hipModuleUnload.module); oss << ")"; break; case HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessor: oss << "hipOccupancyMaxActiveBlocksPerMultiprocessor("; if (data->args.hipOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks == NULL) oss << "numBlocks=NULL"; else { oss << "numBlocks="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxActiveBlocksPerMultiprocessor.numBlocks__val); } oss << ", f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxActiveBlocksPerMultiprocessor.f); oss << ", blockSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxActiveBlocksPerMultiprocessor.blockSize); oss << ", dynamicSMemSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxActiveBlocksPerMultiprocessor.dynamicSMemSize); oss << ")"; break; case HIP_API_ID_hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags: oss << "hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags("; if (data->args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks == NULL) oss << "numBlocks=NULL"; else { oss << "numBlocks="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.numBlocks__val); } oss << ", f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.f); oss << ", blockSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.blockSize); oss << ", dynamicSMemSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.dynamicSMemSize); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.flags); oss << ")"; break; case HIP_API_ID_hipOccupancyMaxPotentialBlockSize: oss << "hipOccupancyMaxPotentialBlockSize("; if (data->args.hipOccupancyMaxPotentialBlockSize.gridSize == NULL) oss << "gridSize=NULL"; else { oss << "gridSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxPotentialBlockSize.gridSize__val); } if (data->args.hipOccupancyMaxPotentialBlockSize.blockSize == NULL) oss << ", blockSize=NULL"; else { oss << ", blockSize="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxPotentialBlockSize.blockSize__val); } oss << ", f="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxPotentialBlockSize.f); oss << ", dynSharedMemPerBlk="; roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxPotentialBlockSize.dynSharedMemPerBlk); oss << ", blockSizeLimit="; 
roctracer::hip_support::detail::operator<<(oss, data->args.hipOccupancyMaxPotentialBlockSize.blockSizeLimit); oss << ")"; break; case HIP_API_ID_hipPeekAtLastError: oss << "hipPeekAtLastError("; oss << ")"; break; case HIP_API_ID_hipPointerGetAttribute: oss << "hipPointerGetAttribute("; oss << "data="; roctracer::hip_support::detail::operator<<(oss, data->args.hipPointerGetAttribute.data); oss << ", attribute="; roctracer::hip_support::detail::operator<<(oss, data->args.hipPointerGetAttribute.attribute); oss << ", ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipPointerGetAttribute.ptr); oss << ")"; break; case HIP_API_ID_hipPointerGetAttributes: oss << "hipPointerGetAttributes("; if (data->args.hipPointerGetAttributes.attributes == NULL) oss << "attributes=NULL"; else { oss << "attributes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipPointerGetAttributes.attributes__val); } oss << ", ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipPointerGetAttributes.ptr); oss << ")"; break; case HIP_API_ID_hipPointerSetAttribute: oss << "hipPointerSetAttribute("; oss << "value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipPointerSetAttribute.value); oss << ", attribute="; roctracer::hip_support::detail::operator<<(oss, data->args.hipPointerSetAttribute.attribute); oss << ", ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipPointerSetAttribute.ptr); oss << ")"; break; case HIP_API_ID_hipProfilerStart: oss << "hipProfilerStart("; oss << ")"; break; case HIP_API_ID_hipProfilerStop: oss << "hipProfilerStop("; oss << ")"; break; case HIP_API_ID_hipRuntimeGetVersion: oss << "hipRuntimeGetVersion("; if (data->args.hipRuntimeGetVersion.runtimeVersion == NULL) oss << "runtimeVersion=NULL"; else { oss << "runtimeVersion="; roctracer::hip_support::detail::operator<<(oss, data->args.hipRuntimeGetVersion.runtimeVersion__val); } oss << ")"; break; case HIP_API_ID_hipSetDevice: oss << "hipSetDevice("; oss << "deviceId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipSetDevice.deviceId); oss << ")"; break; case HIP_API_ID_hipSetDeviceFlags: oss << "hipSetDeviceFlags("; oss << "flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipSetDeviceFlags.flags); oss << ")"; break; case HIP_API_ID_hipSetupArgument: oss << "hipSetupArgument("; oss << "arg="; roctracer::hip_support::detail::operator<<(oss, data->args.hipSetupArgument.arg); oss << ", size="; roctracer::hip_support::detail::operator<<(oss, data->args.hipSetupArgument.size); oss << ", offset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipSetupArgument.offset); oss << ")"; break; case HIP_API_ID_hipSignalExternalSemaphoresAsync: oss << "hipSignalExternalSemaphoresAsync("; if (data->args.hipSignalExternalSemaphoresAsync.extSemArray == NULL) oss << "extSemArray=NULL"; else { oss << "extSemArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipSignalExternalSemaphoresAsync.extSemArray__val); } if (data->args.hipSignalExternalSemaphoresAsync.paramsArray == NULL) oss << ", paramsArray=NULL"; else { oss << ", paramsArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipSignalExternalSemaphoresAsync.paramsArray__val); } oss << ", numExtSems="; roctracer::hip_support::detail::operator<<(oss, data->args.hipSignalExternalSemaphoresAsync.numExtSems); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipSignalExternalSemaphoresAsync.stream); oss << ")"; break; case 
HIP_API_ID_hipStreamAddCallback: oss << "hipStreamAddCallback("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamAddCallback.stream); oss << ", callback="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamAddCallback.callback); oss << ", userData="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamAddCallback.userData); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamAddCallback.flags); oss << ")"; break; case HIP_API_ID_hipStreamAttachMemAsync: oss << "hipStreamAttachMemAsync("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamAttachMemAsync.stream); oss << ", dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamAttachMemAsync.dev_ptr); oss << ", length="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamAttachMemAsync.length); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamAttachMemAsync.flags); oss << ")"; break; case HIP_API_ID_hipStreamBeginCapture: oss << "hipStreamBeginCapture("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamBeginCapture.stream); oss << ", mode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamBeginCapture.mode); oss << ")"; break; case HIP_API_ID_hipStreamCreate: oss << "hipStreamCreate("; if (data->args.hipStreamCreate.stream == NULL) oss << "stream=NULL"; else { oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamCreate.stream__val); } oss << ")"; break; case HIP_API_ID_hipStreamCreateWithFlags: oss << "hipStreamCreateWithFlags("; if (data->args.hipStreamCreateWithFlags.stream == NULL) oss << "stream=NULL"; else { oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamCreateWithFlags.stream__val); } oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamCreateWithFlags.flags); oss << ")"; break; case HIP_API_ID_hipStreamCreateWithPriority: oss << "hipStreamCreateWithPriority("; if (data->args.hipStreamCreateWithPriority.stream == NULL) oss << "stream=NULL"; else { oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamCreateWithPriority.stream__val); } oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamCreateWithPriority.flags); oss << ", priority="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamCreateWithPriority.priority); oss << ")"; break; case HIP_API_ID_hipStreamDestroy: oss << "hipStreamDestroy("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamDestroy.stream); oss << ")"; break; case HIP_API_ID_hipStreamEndCapture: oss << "hipStreamEndCapture("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamEndCapture.stream); if (data->args.hipStreamEndCapture.pGraph == NULL) oss << ", pGraph=NULL"; else { oss << ", pGraph="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamEndCapture.pGraph__val); } oss << ")"; break; case HIP_API_ID_hipStreamGetCaptureInfo: oss << "hipStreamGetCaptureInfo("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetCaptureInfo.stream); if (data->args.hipStreamGetCaptureInfo.pCaptureStatus == NULL) oss << ", pCaptureStatus=NULL"; else { oss << ", pCaptureStatus="; roctracer::hip_support::detail::operator<<(oss, 
data->args.hipStreamGetCaptureInfo.pCaptureStatus__val); } if (data->args.hipStreamGetCaptureInfo.pId == NULL) oss << ", pId=NULL"; else { oss << ", pId="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetCaptureInfo.pId__val); } oss << ")"; break; case HIP_API_ID_hipStreamGetCaptureInfo_v2: oss << "hipStreamGetCaptureInfo_v2("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetCaptureInfo_v2.stream); if (data->args.hipStreamGetCaptureInfo_v2.captureStatus_out == NULL) oss << ", captureStatus_out=NULL"; else { oss << ", captureStatus_out="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetCaptureInfo_v2.captureStatus_out__val); } if (data->args.hipStreamGetCaptureInfo_v2.id_out == NULL) oss << ", id_out=NULL"; else { oss << ", id_out="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetCaptureInfo_v2.id_out__val); } if (data->args.hipStreamGetCaptureInfo_v2.graph_out == NULL) oss << ", graph_out=NULL"; else { oss << ", graph_out="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetCaptureInfo_v2.graph_out__val); } if (data->args.hipStreamGetCaptureInfo_v2.dependencies_out == NULL) oss << ", dependencies_out=NULL"; else { oss << ", dependencies_out="; roctracer::hip_support::detail::operator<<(oss, (void*)data->args.hipStreamGetCaptureInfo_v2.dependencies_out__val); } if (data->args.hipStreamGetCaptureInfo_v2.numDependencies_out == NULL) oss << ", numDependencies_out=NULL"; else { oss << ", numDependencies_out="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetCaptureInfo_v2.numDependencies_out__val); } oss << ")"; break; case HIP_API_ID_hipStreamGetDevice: oss << "hipStreamGetDevice("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetDevice.stream); if (data->args.hipStreamGetDevice.device == NULL) oss << ", device=NULL"; else { oss << ", device="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetDevice.device__val); } oss << ")"; break; case HIP_API_ID_hipStreamGetFlags: oss << "hipStreamGetFlags("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetFlags.stream); if (data->args.hipStreamGetFlags.flags == NULL) oss << ", flags=NULL"; else { oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetFlags.flags__val); } oss << ")"; break; case HIP_API_ID_hipStreamGetPriority: oss << "hipStreamGetPriority("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetPriority.stream); if (data->args.hipStreamGetPriority.priority == NULL) oss << ", priority=NULL"; else { oss << ", priority="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamGetPriority.priority__val); } oss << ")"; break; case HIP_API_ID_hipStreamIsCapturing: oss << "hipStreamIsCapturing("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamIsCapturing.stream); if (data->args.hipStreamIsCapturing.pCaptureStatus == NULL) oss << ", pCaptureStatus=NULL"; else { oss << ", pCaptureStatus="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamIsCapturing.pCaptureStatus__val); } oss << ")"; break; case HIP_API_ID_hipStreamQuery: oss << "hipStreamQuery("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamQuery.stream); oss << ")"; break; case HIP_API_ID_hipStreamSynchronize: oss << "hipStreamSynchronize("; oss << "stream="; 
roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamSynchronize.stream); oss << ")"; break; case HIP_API_ID_hipStreamUpdateCaptureDependencies: oss << "hipStreamUpdateCaptureDependencies("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamUpdateCaptureDependencies.stream); if (data->args.hipStreamUpdateCaptureDependencies.dependencies == NULL) oss << ", dependencies=NULL"; else { oss << ", dependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamUpdateCaptureDependencies.dependencies__val); } oss << ", numDependencies="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamUpdateCaptureDependencies.numDependencies); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamUpdateCaptureDependencies.flags); oss << ")"; break; case HIP_API_ID_hipStreamWaitEvent: oss << "hipStreamWaitEvent("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitEvent.stream); oss << ", event="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitEvent.event); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitEvent.flags); oss << ")"; break; case HIP_API_ID_hipStreamWaitValue32: oss << "hipStreamWaitValue32("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitValue32.stream); oss << ", ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitValue32.ptr); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitValue32.value); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitValue32.flags); oss << ", mask="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitValue32.mask); oss << ")"; break; case HIP_API_ID_hipStreamWaitValue64: oss << "hipStreamWaitValue64("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitValue64.stream); oss << ", ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitValue64.ptr); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitValue64.value); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitValue64.flags); oss << ", mask="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWaitValue64.mask); oss << ")"; break; case HIP_API_ID_hipStreamWriteValue32: oss << "hipStreamWriteValue32("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWriteValue32.stream); oss << ", ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWriteValue32.ptr); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWriteValue32.value); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWriteValue32.flags); oss << ")"; break; case HIP_API_ID_hipStreamWriteValue64: oss << "hipStreamWriteValue64("; oss << "stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWriteValue64.stream); oss << ", ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWriteValue64.ptr); oss << ", value="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWriteValue64.value); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipStreamWriteValue64.flags); oss << ")"; break; case 
HIP_API_ID_hipTexRefGetAddress: oss << "hipTexRefGetAddress("; if (data->args.hipTexRefGetAddress.dev_ptr == NULL) oss << "dev_ptr=NULL"; else { oss << "dev_ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetAddress.dev_ptr__val); } if (data->args.hipTexRefGetAddress.texRef == NULL) oss << ", texRef=NULL"; else { oss << ", texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetAddress.texRef__val); } oss << ")"; break; case HIP_API_ID_hipTexRefGetFlags: oss << "hipTexRefGetFlags("; if (data->args.hipTexRefGetFlags.pFlags == NULL) oss << "pFlags=NULL"; else { oss << "pFlags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetFlags.pFlags__val); } if (data->args.hipTexRefGetFlags.texRef == NULL) oss << ", texRef=NULL"; else { oss << ", texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetFlags.texRef__val); } oss << ")"; break; case HIP_API_ID_hipTexRefGetFormat: oss << "hipTexRefGetFormat("; if (data->args.hipTexRefGetFormat.pFormat == NULL) oss << "pFormat=NULL"; else { oss << "pFormat="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetFormat.pFormat__val); } if (data->args.hipTexRefGetFormat.pNumChannels == NULL) oss << ", pNumChannels=NULL"; else { oss << ", pNumChannels="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetFormat.pNumChannels__val); } if (data->args.hipTexRefGetFormat.texRef == NULL) oss << ", texRef=NULL"; else { oss << ", texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetFormat.texRef__val); } oss << ")"; break; case HIP_API_ID_hipTexRefGetMaxAnisotropy: oss << "hipTexRefGetMaxAnisotropy("; if (data->args.hipTexRefGetMaxAnisotropy.pmaxAnsio == NULL) oss << "pmaxAnsio=NULL"; else { oss << "pmaxAnsio="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetMaxAnisotropy.pmaxAnsio__val); } if (data->args.hipTexRefGetMaxAnisotropy.texRef == NULL) oss << ", texRef=NULL"; else { oss << ", texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetMaxAnisotropy.texRef__val); } oss << ")"; break; case HIP_API_ID_hipTexRefGetMipMappedArray: oss << "hipTexRefGetMipMappedArray("; if (data->args.hipTexRefGetMipMappedArray.pArray == NULL) oss << "pArray=NULL"; else { oss << "pArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetMipMappedArray.pArray__val); } if (data->args.hipTexRefGetMipMappedArray.texRef == NULL) oss << ", texRef=NULL"; else { oss << ", texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetMipMappedArray.texRef__val); } oss << ")"; break; case HIP_API_ID_hipTexRefGetMipmapLevelBias: oss << "hipTexRefGetMipmapLevelBias("; if (data->args.hipTexRefGetMipmapLevelBias.pbias == NULL) oss << "pbias=NULL"; else { oss << "pbias="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetMipmapLevelBias.pbias__val); } if (data->args.hipTexRefGetMipmapLevelBias.texRef == NULL) oss << ", texRef=NULL"; else { oss << ", texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetMipmapLevelBias.texRef__val); } oss << ")"; break; case HIP_API_ID_hipTexRefGetMipmapLevelClamp: oss << "hipTexRefGetMipmapLevelClamp("; if (data->args.hipTexRefGetMipmapLevelClamp.pminMipmapLevelClamp == NULL) oss << "pminMipmapLevelClamp=NULL"; else { oss << "pminMipmapLevelClamp="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetMipmapLevelClamp.pminMipmapLevelClamp__val); } 
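// NOTE: every case in this generated switch follows one mechanical shape:
// each pointer argument is NULL-checked before its captured pointee (stored
// alongside the pointer as <arg>__val) is streamed. Distilled to a single
// argument, the pattern is (sketch only; `api` and `arg` are placeholder
// names, not real fields):
//
//   if (data->args.api.arg == NULL)
//     oss << ", arg=NULL";
//   else {
//     oss << ", arg=";
//     roctracer::hip_support::detail::operator<<(oss, data->args.api.arg__val);
//   }
//
// Scalar arguments skip the guard and are streamed directly.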
if (data->args.hipTexRefGetMipmapLevelClamp.pmaxMipmapLevelClamp == NULL) oss << ", pmaxMipmapLevelClamp=NULL"; else { oss << ", pmaxMipmapLevelClamp="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetMipmapLevelClamp.pmaxMipmapLevelClamp__val); } if (data->args.hipTexRefGetMipmapLevelClamp.texRef == NULL) oss << ", texRef=NULL"; else { oss << ", texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefGetMipmapLevelClamp.texRef__val); } oss << ")"; break; case HIP_API_ID_hipTexRefSetAddress: oss << "hipTexRefSetAddress("; if (data->args.hipTexRefSetAddress.ByteOffset == NULL) oss << "ByteOffset=NULL"; else { oss << "ByteOffset="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetAddress.ByteOffset__val); } if (data->args.hipTexRefSetAddress.texRef == NULL) oss << ", texRef=NULL"; else { oss << ", texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetAddress.texRef__val); } oss << ", dptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetAddress.dptr); oss << ", bytes="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetAddress.bytes); oss << ")"; break; case HIP_API_ID_hipTexRefSetAddress2D: oss << "hipTexRefSetAddress2D("; if (data->args.hipTexRefSetAddress2D.texRef == NULL) oss << "texRef=NULL"; else { oss << "texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetAddress2D.texRef__val); } if (data->args.hipTexRefSetAddress2D.desc == NULL) oss << ", desc=NULL"; else { oss << ", desc="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetAddress2D.desc__val); } oss << ", dptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetAddress2D.dptr); oss << ", Pitch="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetAddress2D.Pitch); oss << ")"; break; case HIP_API_ID_hipTexRefSetArray: oss << "hipTexRefSetArray("; if (data->args.hipTexRefSetArray.tex == NULL) oss << "tex=NULL"; else { oss << "tex="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetArray.tex__val); } oss << ", array="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetArray.array); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetArray.flags); oss << ")"; break; case HIP_API_ID_hipTexRefSetBorderColor: oss << "hipTexRefSetBorderColor("; if (data->args.hipTexRefSetBorderColor.texRef == NULL) oss << "texRef=NULL"; else { oss << "texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetBorderColor.texRef__val); } if (data->args.hipTexRefSetBorderColor.pBorderColor == NULL) oss << ", pBorderColor=NULL"; else { oss << ", pBorderColor="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetBorderColor.pBorderColor__val); } oss << ")"; break; case HIP_API_ID_hipTexRefSetFlags: oss << "hipTexRefSetFlags("; if (data->args.hipTexRefSetFlags.texRef == NULL) oss << "texRef=NULL"; else { oss << "texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetFlags.texRef__val); } oss << ", Flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetFlags.Flags); oss << ")"; break; case HIP_API_ID_hipTexRefSetFormat: oss << "hipTexRefSetFormat("; if (data->args.hipTexRefSetFormat.texRef == NULL) oss << "texRef=NULL"; else { oss << "texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetFormat.texRef__val); } oss << ", fmt="; 
roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetFormat.fmt); oss << ", NumPackedComponents="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetFormat.NumPackedComponents); oss << ")"; break; case HIP_API_ID_hipTexRefSetMaxAnisotropy: oss << "hipTexRefSetMaxAnisotropy("; if (data->args.hipTexRefSetMaxAnisotropy.texRef == NULL) oss << "texRef=NULL"; else { oss << "texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetMaxAnisotropy.texRef__val); } oss << ", maxAniso="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetMaxAnisotropy.maxAniso); oss << ")"; break; case HIP_API_ID_hipTexRefSetMipmapLevelBias: oss << "hipTexRefSetMipmapLevelBias("; if (data->args.hipTexRefSetMipmapLevelBias.texRef == NULL) oss << "texRef=NULL"; else { oss << "texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetMipmapLevelBias.texRef__val); } oss << ", bias="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetMipmapLevelBias.bias); oss << ")"; break; case HIP_API_ID_hipTexRefSetMipmapLevelClamp: oss << "hipTexRefSetMipmapLevelClamp("; if (data->args.hipTexRefSetMipmapLevelClamp.texRef == NULL) oss << "texRef=NULL"; else { oss << "texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetMipmapLevelClamp.texRef__val); } oss << ", minMipMapLevelClamp="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetMipmapLevelClamp.minMipMapLevelClamp); oss << ", maxMipMapLevelClamp="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetMipmapLevelClamp.maxMipMapLevelClamp); oss << ")"; break; case HIP_API_ID_hipTexRefSetMipmappedArray: oss << "hipTexRefSetMipmappedArray("; if (data->args.hipTexRefSetMipmappedArray.texRef == NULL) oss << "texRef=NULL"; else { oss << "texRef="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetMipmappedArray.texRef__val); } if (data->args.hipTexRefSetMipmappedArray.mipmappedArray == NULL) oss << ", mipmappedArray=NULL"; else { oss << ", mipmappedArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetMipmappedArray.mipmappedArray__val); } oss << ", Flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipTexRefSetMipmappedArray.Flags); oss << ")"; break; case HIP_API_ID_hipThreadExchangeStreamCaptureMode: oss << "hipThreadExchangeStreamCaptureMode("; if (data->args.hipThreadExchangeStreamCaptureMode.mode == NULL) oss << "mode=NULL"; else { oss << "mode="; roctracer::hip_support::detail::operator<<(oss, data->args.hipThreadExchangeStreamCaptureMode.mode__val); } oss << ")"; break; case HIP_API_ID_hipUserObjectCreate: oss << "hipUserObjectCreate("; if (data->args.hipUserObjectCreate.object_out == NULL) oss << "object_out=NULL"; else { oss << "object_out="; roctracer::hip_support::detail::operator<<(oss, data->args.hipUserObjectCreate.object_out__val); } oss << ", ptr="; roctracer::hip_support::detail::operator<<(oss, data->args.hipUserObjectCreate.ptr); oss << ", destroy="; roctracer::hip_support::detail::operator<<(oss, data->args.hipUserObjectCreate.destroy); oss << ", initialRefcount="; roctracer::hip_support::detail::operator<<(oss, data->args.hipUserObjectCreate.initialRefcount); oss << ", flags="; roctracer::hip_support::detail::operator<<(oss, data->args.hipUserObjectCreate.flags); oss << ")"; break; case HIP_API_ID_hipUserObjectRelease: oss << "hipUserObjectRelease("; oss << "object="; 
roctracer::hip_support::detail::operator<<(oss, data->args.hipUserObjectRelease.object); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipUserObjectRelease.count); oss << ")"; break; case HIP_API_ID_hipUserObjectRetain: oss << "hipUserObjectRetain("; oss << "object="; roctracer::hip_support::detail::operator<<(oss, data->args.hipUserObjectRetain.object); oss << ", count="; roctracer::hip_support::detail::operator<<(oss, data->args.hipUserObjectRetain.count); oss << ")"; break; case HIP_API_ID_hipWaitExternalSemaphoresAsync: oss << "hipWaitExternalSemaphoresAsync("; if (data->args.hipWaitExternalSemaphoresAsync.extSemArray == NULL) oss << "extSemArray=NULL"; else { oss << "extSemArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipWaitExternalSemaphoresAsync.extSemArray__val); } if (data->args.hipWaitExternalSemaphoresAsync.paramsArray == NULL) oss << ", paramsArray=NULL"; else { oss << ", paramsArray="; roctracer::hip_support::detail::operator<<(oss, data->args.hipWaitExternalSemaphoresAsync.paramsArray__val); } oss << ", numExtSems="; roctracer::hip_support::detail::operator<<(oss, data->args.hipWaitExternalSemaphoresAsync.numExtSems); oss << ", stream="; roctracer::hip_support::detail::operator<<(oss, data->args.hipWaitExternalSemaphoresAsync.stream); oss << ")"; break; default: oss << "unknown"; }; return strdup(oss.str().c_str()); } #endif // HIP_PROF_HIP_API_STRING #endif // _HIP_PROF_STR_H clr-rocm-5.7.1/hipamd/include/hip/amd_detail/hip_runtime_prof.h000066400000000000000000000052331450307266000245220ustar00rootroot00000000000000/* Copyright (c) 2019 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
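Aside on the generated hipApiString() helper that ends just above: it returns a
strdup()'ed buffer, so the caller owns the string and must free() it. A minimal
sketch of driving it from a tracer, assuming roctracer's HIP API callback
convention; the callback signature, the hip_api_data_t::phase field, and
ACTIVITY_API_PHASE_ENTER come from roctracer and are not defined in this file:

  #include <cstdio>
  #include <cstdlib>

  // Sketch only: print every HIP API call on entry.
  void hip_api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void* arg) {
      auto* data = reinterpret_cast<const hip_api_data_t*>(callback_data);
      if (data->phase == ACTIVITY_API_PHASE_ENTER) {
          const char* str = hipApiString(static_cast<hip_api_id_t>(cid), data);
          std::printf("%s\n", str);
          std::free(const_cast<char*>(str));  // hipApiString() strdup()'s its result
      }
  }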
*/

#ifndef HIP_INCLUDE_HIP_AMD_DETAIL_HIP_RUNTIME_PROF_H
#define HIP_INCLUDE_HIP_AMD_DETAIL_HIP_RUNTIME_PROF_H

// HIP ROCclr Op IDs enumeration
enum HipVdiOpId {
  kHipVdiOpIdDispatch = 0,
  kHipVdiOpIdCopy     = 1,
  kHipVdiOpIdBarrier  = 2,
  kHipVdiOpIdNumber   = 3
};

// Types of ROCclr commands
enum HipVdiCommandKind {
  kHipVdiCommandKernel            = 0x11F0,
  kHipVdiMemcpyDeviceToHost       = 0x11F3,
  kHipHipVdiMemcpyHostToDevice    = 0x11F4,
  kHipVdiMemcpyDeviceToDevice     = 0x11F5,
  kHipVidMemcpyDeviceToHostRect   = 0x1201,
  kHipVdiMemcpyHostToDeviceRect   = 0x1202,
  kHipVdiMemcpyDeviceToDeviceRect = 0x1203,
  kHipVdiFillMemory               = 0x1207,
};

/**
 * @brief Initializes the activity callback
 *
 * @param [input] id_callback Event ID callback function
 * @param [input] op_callback Event operation callback function
 * @param [input] arg Arguments passed into the callback
 *
 * @returns None
 */
void hipInitActivityCallback(void* id_callback, void* op_callback, void* arg);

/**
 * @brief Enables the activity callback
 *
 * @param [input] op Operation that will trigger a callback (@see HipVdiOpId)
 * @param [input] enable Enable state for the callback
 *
 * @returns True if successful
 */
bool hipEnableActivityCallback(uint32_t op, bool enable);

/**
 * @brief Returns the description string for the operation kind
 *
 * @param [input] id Command kind id (@see HipVdiCommandKind)
 *
 * @returns A pointer to a const string with the command description
 */
const char* hipGetCmdName(uint32_t id);

#endif  // HIP_INCLUDE_HIP_AMD_DETAIL_HIP_RUNTIME_PROF_H
clr-rocm-5.7.1/hipamd/include/hip/amd_detail/host_defines.h000066400000000000000000000160641450307266000236270ustar00rootroot00000000000000
/*
Copyright (c) 2015 - 2022 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

/**
 * @file amd_detail/host_defines.h
 * @brief TODO-doc
 */

#ifndef HIP_INCLUDE_HIP_AMD_DETAIL_HOST_DEFINES_H
#define HIP_INCLUDE_HIP_AMD_DETAIL_HOST_DEFINES_H

// The following macro should be removed after the upstream update.
// It is defined here as a workaround for a rocThrust build failure.
#define HIP_INCLUDE_HIP_HCC_DETAIL_HOST_DEFINES_H

// Add guard to Generic Grid Launch method
#ifndef GENERIC_GRID_LAUNCH
#define GENERIC_GRID_LAUNCH 1
#endif

#if defined(__clang__) && defined(__HIP__)

namespace __hip_internal {
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
typedef unsigned long long uint64_t;
typedef signed char int8_t;
typedef signed short int16_t;
typedef signed int int32_t;
typedef signed long long int64_t;

template <class _Tp, _Tp __v> struct integral_constant {
  static constexpr const _Tp value = __v;
  typedef _Tp value_type;
  typedef integral_constant type;
  constexpr operator value_type() const { return value; }
  constexpr value_type operator()() const { return value; }
};
template <class _Tp, _Tp __v> constexpr const _Tp integral_constant<_Tp, __v>::value;

typedef integral_constant<bool, true> true_type;
typedef integral_constant<bool, false> false_type;

template <bool __b> using bool_constant = integral_constant<bool, __b>;
typedef bool_constant<true> true_type;
typedef bool_constant<false> false_type;

template <bool __B, class __T = void> struct enable_if {};
template <class __T> struct enable_if<true, __T> { typedef __T type; };

template <bool __B> struct true_or_false_type : public false_type {};
template <> struct true_or_false_type<true> : public true_type {};

template <class _Tp> struct is_integral : public false_type {};
template <> struct is_integral<bool> : public true_type {};
template <> struct is_integral<char> : public true_type {};
template <> struct is_integral<signed char> : public true_type {};
template <> struct is_integral<unsigned char> : public true_type {};
template <> struct is_integral<wchar_t> : public true_type {};
template <> struct is_integral<short> : public true_type {};
template <> struct is_integral<unsigned short> : public true_type {};
template <> struct is_integral<int> : public true_type {};
template <> struct is_integral<unsigned int> : public true_type {};
template <> struct is_integral<long> : public true_type {};
template <> struct is_integral<unsigned long> : public true_type {};
template <> struct is_integral<long long> : public true_type {};
template <> struct is_integral<unsigned long long> : public true_type {};

template <class _Tp> struct is_arithmetic : public false_type {};
template <> struct is_arithmetic<bool> : public true_type {};
template <> struct is_arithmetic<char> : public true_type {};
template <> struct is_arithmetic<signed char> : public true_type {};
template <> struct is_arithmetic<unsigned char> : public true_type {};
template <> struct is_arithmetic<wchar_t> : public true_type {};
template <> struct is_arithmetic<short> : public true_type {};
template <> struct is_arithmetic<unsigned short> : public true_type {};
template <> struct is_arithmetic<int> : public true_type {};
template <> struct is_arithmetic<unsigned int> : public true_type {};
template <> struct is_arithmetic<long> : public true_type {};
template <> struct is_arithmetic<unsigned long> : public true_type {};
template <> struct is_arithmetic<long long> : public true_type {};
template <> struct is_arithmetic<unsigned long long> : public true_type {};
template <> struct is_arithmetic<float> : public true_type {};
template <> struct is_arithmetic<double> : public true_type {};

template <class _Tp> struct is_floating_point : public false_type {};
template <> struct is_floating_point<float> : public true_type {};
template <> struct is_floating_point<double> : public true_type {};
template <> struct is_floating_point<long double> : public true_type {};

template <class __T, class __U> struct is_same : public false_type {};
template <class __T> struct is_same<__T, __T> : public true_type {};

template <class _Tp, bool = is_arithmetic<_Tp>::value> struct is_signed : public false_type {};
template <class _Tp> struct is_signed<_Tp, true> : public true_or_false_type<(_Tp(-1) < _Tp(0))> {};

template <class _CharT> struct char_traits;
template <class _CharT, class _Traits = char_traits<_CharT>> class basic_istream;
template <class _CharT, class _Traits = char_traits<_CharT>> class basic_ostream;
typedef basic_istream<char> istream;
typedef basic_ostream<char> ostream;

template <class __T> struct is_standard_layout
    : public integral_constant<bool, __is_standard_layout(__T)> {};
template <class __T> struct is_trivial
    : public integral_constant<bool, __is_trivial(__T)> {};
}

typedef __hip_internal::uint8_t __hip_uint8_t;
typedef __hip_internal::uint16_t __hip_uint16_t;
typedef __hip_internal::uint32_t __hip_uint32_t;
typedef __hip_internal::uint64_t __hip_uint64_t;
typedef __hip_internal::int8_t __hip_int8_t;
typedef __hip_internal::int16_t __hip_int16_t;
typedef __hip_internal::int32_t __hip_int32_t;
typedef __hip_internal::int64_t __hip_int64_t;

#if !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__
#define __host__ __attribute__((host))
#define __device__ __attribute__((device))
#define __global__ __attribute__((global))
#define __shared__ __attribute__((shared))
#define __constant__ __attribute__((constant))
#endif  // !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__
#if !defined(__has_feature) || !__has_feature(cuda_noinline_keyword)
#define __noinline__ __attribute__((noinline))
#endif
#define __forceinline__ inline __attribute__((always_inline))

#if __HIP_NO_IMAGE_SUPPORT
#define __hip_img_chk__ __attribute__((unavailable("The image/texture API not supported on the device")))
#else
#define __hip_img_chk__
#endif

#else  // Non-HCC compiler
/**
 * Function and kernel markers
 */
#define __host__
#define __device__
#define __global__
#define __noinline__
#define __forceinline__ inline
#define __shared__
#define __constant__
#define __hip_img_chk__
#endif

#endif
clr-rocm-5.7.1/hipamd/include/hip/amd_detail/hsa_helpers.hpp000066400000000000000000000062401450307266000240050ustar00rootroot00000000000000
/*
Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once

#include <hsa/hsa.h>

#include <cstdint>
#include <string>
#include <utility>

namespace hip_impl {
inline void* address(hsa_executable_symbol_t x) {
    void* r = nullptr;

    hsa_executable_symbol_get_info(x, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS, &r);

    return r;
}

inline hsa_agent_t agent(hsa_executable_symbol_t x) {
    hsa_agent_t r = {};

    hsa_executable_symbol_get_info(x, HSA_EXECUTABLE_SYMBOL_INFO_AGENT, &r);

    return r;
}

inline std::uint32_t group_size(hsa_executable_symbol_t x) {
    std::uint32_t r = 0u;

    hsa_executable_symbol_get_info(x, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE, &r);

    return r;
}

inline hsa_isa_t isa(hsa_agent_t x) {
    hsa_isa_t r = {};

    hsa_agent_iterate_isas(x,
                           [](hsa_isa_t i, void* o) {
                               *static_cast<hsa_isa_t*>(o) = i;  // Pick the first.

                               return HSA_STATUS_INFO_BREAK;
                           },
                           &r);

    return r;
}

inline std::uint64_t kernel_object(hsa_executable_symbol_t x) {
    std::uint64_t r = 0u;

    hsa_executable_symbol_get_info(x, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &r);

    return r;
}

inline std::string name(hsa_executable_symbol_t x) {
    std::uint32_t sz = 0u;

    hsa_executable_symbol_get_info(x, HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH, &sz);

    std::string r(sz, '\0');
    hsa_executable_symbol_get_info(x, HSA_EXECUTABLE_SYMBOL_INFO_NAME, &r.front());

    return r;
}

inline std::uint32_t private_size(hsa_executable_symbol_t x) {
    std::uint32_t r = 0u;

    hsa_executable_symbol_get_info(x, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE, &r);

    return r;
}

inline std::uint32_t size(hsa_executable_symbol_t x) {
    std::uint32_t r = 0;

    hsa_executable_symbol_get_info(x, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE, &r);

    return r;
}

inline hsa_symbol_kind_t type(hsa_executable_symbol_t x) {
    hsa_symbol_kind_t r = {};

    hsa_executable_symbol_get_info(x, HSA_EXECUTABLE_SYMBOL_INFO_TYPE, &r);

    return r;
}
}  // namespace hip_impl
clr-rocm-5.7.1/hipamd/include/hip/amd_detail/macro_based_grid_launch.hpp000066400000000000000000002045251450307266000263140ustar00rootroot00000000000000
/*
Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
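Aside on the hip_impl helpers in hsa_helpers.hpp above: each is a thin,
value-returning wrapper over hsa_executable_symbol_get_info. A minimal sketch of
how they compose; dump_kernel_symbol is a hypothetical helper and assumes a
valid symbol obtained through the HSA executable API:

  #include <cstdio>

  // Sketch only: print the main properties of a kernel symbol.
  void dump_kernel_symbol(hsa_executable_symbol_t sym) {
      if (hip_impl::type(sym) != HSA_SYMBOL_KIND_KERNEL) return;
      std::printf("kernel %s: object=0x%llx, group segment=%u B, private segment=%u B\n",
                  hip_impl::name(sym).c_str(),
                  static_cast<unsigned long long>(hip_impl::kernel_object(sym)),
                  hip_impl::group_size(sym), hip_impl::private_size(sym));
  }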
*/

#pragma once

#include "concepts.hpp"
#include "helpers.hpp"

#include "hc.hpp"
#include "hip/hip_ext.h"
#include "hip_runtime.h"

#include <functional>
#include <iostream>
#include <stdexcept>
#include <type_traits>
#include <utility>

namespace hip_impl {
namespace {
struct New_grid_launch_tag {};
struct Old_grid_launch_tag {};

template <typename C, typename D>
class RAII_guard {
    D dtor_;

   public:
    RAII_guard() = default;

    RAII_guard(const C& ctor, D dtor) : dtor_{std::move(dtor)} { ctor(); }

    RAII_guard(const RAII_guard&) = default;
    RAII_guard(RAII_guard&&) = default;

    RAII_guard& operator=(const RAII_guard&) = default;
    RAII_guard& operator=(RAII_guard&&) = default;

    ~RAII_guard() { dtor_(); }
};

template <typename C, typename D>
RAII_guard<C, D> make_RAII_guard(const C& ctor, D dtor) {
    return RAII_guard<C, D>{ctor, std::move(dtor)};
}

template <typename F>
using is_new_grid_launch_t = typename std::conditional<is_callable<F>{},
                                                       New_grid_launch_tag,
                                                       Old_grid_launch_tag>::type;
}  // namespace

// TODO: - dispatch rank should be derived from the domain dimensions passed
//         in, and not always assumed to be 3;

template <typename... Ts, typename K>
requires(Domain<K> == {Ts...}) inline void grid_launch_hip_impl_(
    New_grid_launch_tag, dim3 num_blocks, dim3 dim_blocks, int group_mem_bytes,
    const hc::accelerator_view& acc_v, K k) {
    const auto d = hc::extent<3>{num_blocks.z * dim_blocks.z,
                                 num_blocks.y * dim_blocks.y,
                                 num_blocks.x * dim_blocks.x}
                       .tile_with_dynamic(dim_blocks.z, dim_blocks.y, dim_blocks.x,
                                          group_mem_bytes);

    try {
        hc::parallel_for_each(acc_v, d, k);
    } catch (std::exception& ex) {
        std::cerr << "Failed in " << __func__ << ", with exception: " << ex.what() << std::endl;
        hip_throw(ex);
    }
}

// TODO: these are workarounds, they should be removed.
hc::accelerator_view lock_stream_hip_(hipStream_t&, void*&);
void print_prelaunch_trace_(const char*, dim3, dim3, int, hipStream_t);
void unlock_stream_hip_(hipStream_t, void*, const char*, hc::accelerator_view*);

template <typename... Ts, typename K>
requires(Domain<K> == {Ts...}) inline void grid_launch_hip_impl_(
    New_grid_launch_tag, dim3 num_blocks, dim3 dim_blocks, int group_mem_bytes,
    hipStream_t stream, const char* kernel_name, K k) {
    void* lck_stream = nullptr;
    auto acc_v = lock_stream_hip_(stream, lck_stream);
    auto stream_guard =
        make_RAII_guard(std::bind(print_prelaunch_trace_, kernel_name, num_blocks, dim_blocks,
                                  group_mem_bytes, stream),
                        std::bind(unlock_stream_hip_, stream, lck_stream, kernel_name, &acc_v));

    try {
        grid_launch_hip_impl_(New_grid_launch_tag{}, std::move(num_blocks), std::move(dim_blocks),
                              group_mem_bytes, acc_v, std::move(k));
    } catch (std::exception& ex) {
        std::cerr << "Failed in " << __func__ << ", with exception: " << ex.what() << std::endl;
        hip_throw(ex);
    }
}

template <typename... Ts, typename K>
requires(Domain<K> == {hipLaunchParm, Ts...}) inline void grid_launch_hip_impl_(
    Old_grid_launch_tag, dim3 num_blocks, dim3 dim_blocks, int group_mem_bytes,
    hipStream_t stream, K k) {
    grid_launch_hip_impl_(New_grid_launch_tag{}, std::move(num_blocks), std::move(dim_blocks),
                          group_mem_bytes, std::move(stream), std::move(k));
}

template <typename... Ts, typename K>
requires(Domain<K> == {hipLaunchParm, Ts...}) inline void grid_launch_hip_impl_(
    Old_grid_launch_tag, dim3 num_blocks, dim3 dim_blocks, int group_mem_bytes,
    hipStream_t stream, const char* kernel_name, K k) {
    grid_launch_hip_impl_(New_grid_launch_tag{}, std::move(num_blocks), std::move(dim_blocks),
                          group_mem_bytes, std::move(stream), kernel_name, std::move(k));
}

template <typename... Ts, typename K>
requires(Domain<K> == {Ts...}) inline std::enable_if_t<!std::is_function<K>::value>
grid_launch_hip_(dim3 num_blocks, dim3 dim_blocks, int group_mem_bytes, hipStream_t stream,
                 const char* kernel_name, K k) {
    grid_launch_hip_impl_(is_new_grid_launch_t<K (Ts...)>{}, std::move(num_blocks),
std::move(dim_blocks), group_mem_bytes, std::move(stream), kernel_name, std::move(k)); } template requires(Domain == {Ts...}) inline std::enable_if_t< !std::is_function::value> grid_launch_hip_(dim3 num_blocks, dim3 dim_blocks, int group_mem_bytes, hipStream_t stream, K k) { grid_launch_hip_impl_(is_new_grid_launch_t{}, std::move(num_blocks), std::move(dim_blocks), group_mem_bytes, std::move(stream), std::move(k)); } // TODO: these are temporary and purposefully noisy and disruptive. #define make_kernel_name_hip(k, n) \ HIP_kernel_functor_name_begin##_##k##_##HIP_kernel_functor_name_end##_##n #define make_kernel_functor_hip_30(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, \ p22, p23, p24, p25, p26, p27) \ struct make_kernel_name_hip(function_name, 28) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ std::decay_t _p17_; \ std::decay_t _p18_; \ std::decay_t _p19_; \ std::decay_t _p20_; \ std::decay_t _p21_; \ std::decay_t _p22_; \ std::decay_t _p23_; \ std::decay_t _p24_; \ std::decay_t _p25_; \ std::decay_t _p26_; \ std::decay_t _p27_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_, _p17_, _p18_, _p19_, _p20_, _p21_, \ _p22_, _p23_, _p24_, _p25_, _p26_, _p27_); \ } \ } #define make_kernel_functor_hip_29(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, \ p22, p23, p24, p25, p26) \ struct make_kernel_name_hip(function_name, 27) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ std::decay_t _p17_; \ std::decay_t _p18_; \ std::decay_t _p19_; \ std::decay_t _p20_; \ std::decay_t _p21_; \ std::decay_t _p22_; \ std::decay_t _p23_; \ std::decay_t _p24_; \ std::decay_t _p25_; \ std::decay_t _p26_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_, _p17_, _p18_, _p19_, _p20_, _p21_, \ _p22_, _p23_, _p24_, _p25_, _p26_); \ } \ } #define make_kernel_functor_hip_28(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, \ p22, p23, p24, p25) \ struct make_kernel_name_hip(function_name, 26) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ std::decay_t _p17_; \ std::decay_t _p18_; \ std::decay_t _p19_; \ std::decay_t _p20_; \ std::decay_t _p21_; \ std::decay_t _p22_; 
\ std::decay_t _p23_; \ std::decay_t _p24_; \ std::decay_t _p25_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_, _p17_, _p18_, _p19_, _p20_, _p21_, \ _p22_, _p23_, _p24_, _p25_); \ } \ } #define make_kernel_functor_hip_27(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, \ p22, p23, p24) \ struct make_kernel_name_hip(function_name, 25) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ std::decay_t _p17_; \ std::decay_t _p18_; \ std::decay_t _p19_; \ std::decay_t _p20_; \ std::decay_t _p21_; \ std::decay_t _p22_; \ std::decay_t _p23_; \ std::decay_t _p24_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_, _p17_, _p18_, _p19_, _p20_, _p21_, \ _p22_, _p23_, _p24_); \ } \ } #define make_kernel_functor_hip_26(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, \ p22, p23) \ struct make_kernel_name_hip(function_name, 24) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ std::decay_t _p17_; \ std::decay_t _p18_; \ std::decay_t _p19_; \ std::decay_t _p20_; \ std::decay_t _p21_; \ std::decay_t _p22_; \ std::decay_t _p23_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_, _p17_, _p18_, _p19_, _p20_, _p21_, \ _p22_, _p23_); \ } \ } #define make_kernel_functor_hip_25(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, \ p22) \ struct make_kernel_name_hip(function_name, 23) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ std::decay_t _p17_; \ std::decay_t _p18_; \ std::decay_t _p19_; \ std::decay_t _p20_; \ std::decay_t _p21_; \ std::decay_t _p22_; \ __attribute__((used, flatten)) void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_, _p17_, _p18_, _p19_, _p20_, _p21_, \ _p22_); \ } \ } #define make_kernel_functor_hip_24(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21) \ struct make_kernel_name_hip(function_name, 22) { \ 
std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ std::decay_t _p17_; \ std::decay_t _p18_; \ std::decay_t _p19_; \ std::decay_t _p20_; \ std::decay_t _p21_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_, _p17_, _p18_, _p19_, _p20_, _p21_); \ } \ } #define make_kernel_functor_hip_23(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20) \ struct make_kernel_name_hip(function_name, 21) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ std::decay_t _p17_; \ std::decay_t _p18_; \ std::decay_t _p19_; \ std::decay_t _p20_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_, _p17_, _p18_, _p19_, _p20_); \ } \ } #define make_kernel_functor_hip_22(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19) \ struct make_kernel_name_hip(function_name, 20) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ std::decay_t _p17_; \ std::decay_t _p18_; \ std::decay_t _p19_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_, _p17_, _p18_, _p19_); \ } \ } #define make_kernel_functor_hip_21(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16, p17, p18) \ struct make_kernel_name_hip(function_name, 19) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ std::decay_t _p17_; \ std::decay_t _p18_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_, _p17_, _p18_); \ } \ } #define make_kernel_functor_hip_20(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16, p17) \ struct make_kernel_name_hip(function_name, 18) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; 
\ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ std::decay_t _p17_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_, _p17_); \ } \ } #define make_kernel_functor_hip_19(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15, p16) \ struct make_kernel_name_hip(function_name, 17) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ std::decay_t _p16_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_, _p16_); \ } \ } #define make_kernel_functor_hip_18(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14, p15) \ struct make_kernel_name_hip(function_name, 16) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ std::decay_t _p15_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_, _p15_); \ } \ } #define make_kernel_functor_hip_17(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13, p14) \ struct make_kernel_name_hip(function_name, 15) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ std::decay_t _p14_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_, _p14_); \ } \ } #define make_kernel_functor_hip_16(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12, p13) \ struct make_kernel_name_hip(function_name, 14) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ std::decay_t _p13_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_, _p13_); \ } \ } #define make_kernel_functor_hip_15(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11, p12) \ struct make_kernel_name_hip(function_name, 13) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ 
std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ std::decay_t _p12_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_, \ _p12_); \ } \ } #define make_kernel_functor_hip_14(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10, p11) \ struct make_kernel_name_hip(function_name, 12) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ std::decay_t _p11_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_, _p11_); \ } \ } #define make_kernel_functor_hip_13(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9, p10) \ struct make_kernel_name_hip(function_name, 11) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ std::decay_t _p10_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { \ kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_, _p10_); \ } \ } #define make_kernel_functor_hip_12(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8, \ p9) \ struct make_kernel_name_hip(function_name, 10) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ std::decay_t _p9_; \ void operator()(const hc::tiled_index<3>&) const \ [[hc]] { kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_, _p9_); } \ } #define make_kernel_functor_hip_11(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7, p8) \ struct make_kernel_name_hip(function_name, 9) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ std::decay_t _p8_; \ void operator()(const hc::tiled_index<3>&) const \ [[hc]] { kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_, _p8_); } \ } #define make_kernel_functor_hip_10(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6, p7) \ struct make_kernel_name_hip(function_name, 8) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ std::decay_t _p7_; \ void operator()(const hc::tiled_index<3>&) const \ [[hc]] { kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_, _p7_); } \ } #define make_kernel_functor_hip_9(function_name, kernel_name, p0, p1, p2, p3, p4, p5, p6) \ struct make_kernel_name_hip(function_name, 7) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ std::decay_t _p6_; \ void operator()(const hc::tiled_index<3>&) const \ [[hc]] { kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_, _p6_); } \ } #define make_kernel_functor_hip_8(function_name, kernel_name, p0, p1, p2, p3, p4, p5) \ struct make_kernel_name_hip(function_name, 6) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ std::decay_t _p5_; \ 
void operator()(const hc::tiled_index<3>&) const \ [[hc]] { kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_, _p5_); } \ } #define make_kernel_functor_hip_7(function_name, kernel_name, p0, p1, p2, p3, p4) \ struct make_kernel_name_hip(function_name, 5) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ std::decay_t _p4_; \ void operator()(const hc::tiled_index<3>&) const \ [[hc]] { kernel_name(_p0_, _p1_, _p2_, _p3_, _p4_); } \ } #define make_kernel_functor_hip_6(function_name, kernel_name, p0, p1, p2, p3) \ struct make_kernel_name_hip(function_name, 4) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ std::decay_t _p3_; \ void operator()(const hc::tiled_index<3>&) const \ [[hc]] { kernel_name(_p0_, _p1_, _p2_, _p3_); } \ } #define make_kernel_functor_hip_5(function_name, kernel_name, p0, p1, p2) \ struct make_kernel_name_hip(function_name, 3) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ std::decay_t _p2_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { kernel_name(_p0_, _p1_, _p2_); } \ } #define make_kernel_functor_hip_4(function_name, kernel_name, p0, p1) \ struct make_kernel_name_hip(function_name, 2) { \ std::decay_t _p0_; \ std::decay_t _p1_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { kernel_name(_p0_, _p1_); } \ } #define fofo(f, n) kernel_prefix_hip##f##kernel_suffix_hip##n #define make_kernel_functor_hip_3(function_name, kernel_name, p0) \ struct make_kernel_name_hip(function_name, 1) { \ std::decay_t _p0_; \ void operator()(const hc::tiled_index<3>&) const [[hc]] { kernel_name(_p0_); } \ } #define make_kernel_functor_hip_2(function_name, kernel_name) \ struct make_kernel_name_hip(function_name, 0) { \ void operator()(const hc::tiled_index<3>&)[[hc]] { return kernel_name(hipLaunchParm{}); } \ } #define make_kernel_functor_hip_1(...) #define make_kernel_functor_hip_0(...) #define make_kernel_functor_hip_(...) overload_macro_hip_(make_kernel_functor_hip_, __VA_ARGS__) #define hipLaunchNamedKernelGGL(function_name, kernel_name, num_blocks, dim_blocks, \ group_mem_bytes, stream, ...) \ do { \ make_kernel_functor_hip_(function_name, kernel_name, __VA_ARGS__) \ hip_kernel_functor_impl_{__VA_ARGS__}; \ hip_impl::grid_launch_hip_(num_blocks, dim_blocks, group_mem_bytes, stream, #kernel_name, \ hip_kernel_functor_impl_); \ } while (0) #define hipLaunchKernelGGL(kernel_name, num_blocks, dim_blocks, group_mem_bytes, stream, ...) \ do { \ hipLaunchNamedKernelGGL(unnamed, kernel_name, num_blocks, dim_blocks, group_mem_bytes, \ stream, ##__VA_ARGS__); \ } while (0) #define hipLaunchKernel(kernel_name, num_blocks, dim_blocks, group_mem_bytes, stream, ...) \ do { \ hipLaunchKernelGGL(kernel_name, num_blocks, dim_blocks, group_mem_bytes, stream, \ hipLaunchParm{}, ##__VA_ARGS__); \ } while (0) } // namespace hip_impl clr-rocm-5.7.1/hipamd/include/hip/amd_detail/math_fwd.h000066400000000000000000000407721450307266000227510ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #include "host_defines.h" #if defined(__cplusplus) extern "C" { #endif // DOT FUNCTIONS #if __HIP_CLANG_ONLY__ __device__ __attribute__((const)) int __ockl_sdot2( HIP_vector_base::Native_vec_, HIP_vector_base::Native_vec_, int, bool); __device__ __attribute__((const)) unsigned int __ockl_udot2( HIP_vector_base::Native_vec_, HIP_vector_base::Native_vec_, unsigned int, bool); __device__ __attribute__((const)) int __ockl_sdot4( HIP_vector_base::Native_vec_, HIP_vector_base::Native_vec_, int, bool); __device__ __attribute__((const)) unsigned int __ockl_udot4( HIP_vector_base::Native_vec_, HIP_vector_base::Native_vec_, unsigned int, bool); __device__ __attribute__((const)) int __ockl_sdot8(int, int, int, bool); __device__ __attribute__((const)) unsigned int __ockl_udot8(unsigned int, unsigned int, unsigned int, bool); #endif #if !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__ // BEGIN FLOAT __device__ __attribute__((const)) float __ocml_acos_f32(float); __device__ __attribute__((pure)) float __ocml_acosh_f32(float); __device__ __attribute__((const)) float __ocml_asin_f32(float); __device__ __attribute__((pure)) float __ocml_asinh_f32(float); __device__ __attribute__((const)) float __ocml_atan2_f32(float, float); __device__ __attribute__((const)) float __ocml_atan_f32(float); __device__ __attribute__((pure)) float __ocml_atanh_f32(float); __device__ __attribute__((pure)) float __ocml_cbrt_f32(float); __device__ __attribute__((const)) float __ocml_ceil_f32(float); __device__ __attribute__((const)) __device__ float __ocml_copysign_f32(float, float); __device__ float __ocml_cos_f32(float); __device__ float __ocml_native_cos_f32(float); __device__ __attribute__((pure)) __device__ float __ocml_cosh_f32(float); __device__ float __ocml_cospi_f32(float); __device__ float __ocml_i0_f32(float); __device__ float __ocml_i1_f32(float); __device__ __attribute__((pure)) float __ocml_erfc_f32(float); __device__ __attribute__((pure)) float __ocml_erfcinv_f32(float); __device__ __attribute__((pure)) float __ocml_erfcx_f32(float); __device__ __attribute__((pure)) float __ocml_erf_f32(float); __device__ __attribute__((pure)) float __ocml_erfinv_f32(float); __device__ __attribute__((pure)) float __ocml_exp10_f32(float); __device__ __attribute__((pure)) float __ocml_native_exp10_f32(float); __device__ __attribute__((pure)) float __ocml_exp2_f32(float); __device__ __attribute__((pure)) float __ocml_exp_f32(float); __device__ __attribute__((pure)) float __ocml_native_exp_f32(float); __device__ 
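// The __ocml_* symbols in this block are ROCm device-library (OCML) math
// entry points; HIP's float math wrappers (expf, cosf, ...) bottom out in
// them. A minimal device-side sketch; the kernel below is illustrative and
// not part of this header, and the __ocml_native_* variants generally trade
// accuracy for speed against their default counterparts:
//
//   __global__ void fast_exp(float* v, int n) {
//       int idx = blockIdx.x * blockDim.x + threadIdx.x;
//       if (idx < n) v[idx] = __ocml_native_exp_f32(v[idx]);
//   }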
__attribute__((pure)) float __ocml_expm1_f32(float); __device__ __attribute__((const)) float __ocml_fabs_f32(float); __device__ __attribute__((const)) float __ocml_fdim_f32(float, float); __device__ __attribute__((const)) float __ocml_floor_f32(float); __device__ __attribute__((const)) float __ocml_fma_f32(float, float, float); __device__ __attribute__((const)) float __ocml_fmax_f32(float, float); __device__ __attribute__((const)) float __ocml_fmin_f32(float, float); __device__ __attribute__((const)) __device__ float __ocml_fmod_f32(float, float); __device__ float __ocml_frexp_f32(float, __attribute__((address_space(5))) int*); __device__ __attribute__((const)) float __ocml_hypot_f32(float, float); __device__ __attribute__((const)) int __ocml_ilogb_f32(float); __device__ __attribute__((const)) int __ocml_isfinite_f32(float); __device__ __attribute__((const)) int __ocml_isinf_f32(float); __device__ __attribute__((const)) int __ocml_isnan_f32(float); __device__ float __ocml_j0_f32(float); __device__ float __ocml_j1_f32(float); __device__ __attribute__((const)) float __ocml_ldexp_f32(float, int); __device__ float __ocml_lgamma_f32(float); __device__ __attribute__((pure)) float __ocml_log10_f32(float); __device__ __attribute__((pure)) float __ocml_native_log10_f32(float); __device__ __attribute__((pure)) float __ocml_log1p_f32(float); __device__ __attribute__((pure)) float __ocml_log2_f32(float); __device__ __attribute__((pure)) float __ocml_native_log2_f32(float); __device__ __attribute__((const)) float __ocml_logb_f32(float); __device__ __attribute__((pure)) float __ocml_log_f32(float); __device__ __attribute__((pure)) float __ocml_native_log_f32(float); __device__ float __ocml_modf_f32(float, __attribute__((address_space(5))) float*); __device__ __attribute__((const)) float __ocml_nearbyint_f32(float); __device__ __attribute__((const)) float __ocml_nextafter_f32(float, float); __device__ __attribute__((const)) float __ocml_len3_f32(float, float, float); __device__ __attribute__((const)) float __ocml_len4_f32(float, float, float, float); __device__ __attribute__((pure)) float __ocml_ncdf_f32(float); __device__ __attribute__((pure)) float __ocml_ncdfinv_f32(float); __device__ __attribute__((pure)) float __ocml_pow_f32(float, float); __device__ __attribute__((pure)) float __ocml_pown_f32(float, int); __device__ __attribute__((pure)) float __ocml_rcbrt_f32(float); __device__ __attribute__((const)) float __ocml_remainder_f32(float, float); __device__ float __ocml_remquo_f32(float, float, __attribute__((address_space(5))) int*); __device__ __attribute__((const)) float __ocml_rhypot_f32(float, float); __device__ __attribute__((const)) float __ocml_rint_f32(float); __device__ __attribute__((const)) float __ocml_rlen3_f32(float, float, float); __device__ __attribute__((const)) float __ocml_rlen4_f32(float, float, float, float); __device__ __attribute__((const)) float __ocml_round_f32(float); __device__ __attribute__((pure)) float __ocml_rsqrt_f32(float); __device__ __attribute__((const)) float __ocml_scalb_f32(float, float); __device__ __attribute__((const)) float __ocml_scalbn_f32(float, int); __device__ __attribute__((const)) int __ocml_signbit_f32(float); __device__ float __ocml_sincos_f32(float, __attribute__((address_space(5))) float*); __device__ float __ocml_sincospi_f32(float, __attribute__((address_space(5))) float*); __device__ float __ocml_sin_f32(float); __device__ float __ocml_native_sin_f32(float); __device__ __attribute__((pure)) float __ocml_sinh_f32(float); __device__ float 
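// Several entry points in this block (__ocml_frexp_f32, __ocml_modf_f32,
// __ocml_sincos_f32, __ocml_remquo_f32) return a secondary result through an
// __attribute__((address_space(5))) pointer, that is, a pointer into the
// private (scratch) address space. A sketch of how a wrapper might call one
// of them; the cast assumes the temporary lives in private memory, which
// holds for ordinary kernel locals, and my_sincosf is an illustrative name:
//
//   __device__ inline float my_sincosf(float x, float* c) {
//       float tmp;
//       float s = __ocml_sincos_f32(
//           x, (__attribute__((address_space(5))) float*)&tmp);
//       *c = tmp;  // __ocml_sincos_f32 returns sin(x) and stores cos(x)
//       return s;
//   }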
__ocml_sinpi_f32(float); __device__ __attribute__((const)) float __ocml_sqrt_f32(float); __device__ __attribute__((const)) float __ocml_native_sqrt_f32(float); __device__ float __ocml_tan_f32(float); __device__ __attribute__((pure)) float __ocml_tanh_f32(float); __device__ float __ocml_tgamma_f32(float); __device__ __attribute__((const)) float __ocml_trunc_f32(float); __device__ float __ocml_y0_f32(float); __device__ float __ocml_y1_f32(float); // BEGIN INTRINSICS __device__ __attribute__((const)) float __ocml_add_rte_f32(float, float); __device__ __attribute__((const)) float __ocml_add_rtn_f32(float, float); __device__ __attribute__((const)) float __ocml_add_rtp_f32(float, float); __device__ __attribute__((const)) float __ocml_add_rtz_f32(float, float); __device__ __attribute__((const)) float __ocml_sub_rte_f32(float, float); __device__ __attribute__((const)) float __ocml_sub_rtn_f32(float, float); __device__ __attribute__((const)) float __ocml_sub_rtp_f32(float, float); __device__ __attribute__((const)) float __ocml_sub_rtz_f32(float, float); __device__ __attribute__((const)) float __ocml_mul_rte_f32(float, float); __device__ __attribute__((const)) float __ocml_mul_rtn_f32(float, float); __device__ __attribute__((const)) float __ocml_mul_rtp_f32(float, float); __device__ __attribute__((const)) float __ocml_mul_rtz_f32(float, float); __device__ __attribute__((const)) float __ocml_div_rte_f32(float, float); __device__ __attribute__((const)) float __ocml_div_rtn_f32(float, float); __device__ __attribute__((const)) float __ocml_div_rtp_f32(float, float); __device__ __attribute__((const)) float __ocml_div_rtz_f32(float, float); __device__ __attribute__((const)) float __ocml_sqrt_rte_f32(float); __device__ __attribute__((const)) float __ocml_sqrt_rtn_f32(float); __device__ __attribute__((const)) float __ocml_sqrt_rtp_f32(float); __device__ __attribute__((const)) float __ocml_sqrt_rtz_f32(float); __device__ __attribute__((const)) float __ocml_fma_rte_f32(float, float, float); __device__ __attribute__((const)) float __ocml_fma_rtn_f32(float, float, float); __device__ __attribute__((const)) float __ocml_fma_rtp_f32(float, float, float); __device__ __attribute__((const)) float __ocml_fma_rtz_f32(float, float, float); // END INTRINSICS // END FLOAT // BEGIN DOUBLE __device__ __attribute__((const)) double __ocml_acos_f64(double); __device__ __attribute__((pure)) double __ocml_acosh_f64(double); __device__ __attribute__((const)) double __ocml_asin_f64(double); __device__ __attribute__((pure)) double __ocml_asinh_f64(double); __device__ __attribute__((const)) double __ocml_atan2_f64(double, double); __device__ __attribute__((const)) double __ocml_atan_f64(double); __device__ __attribute__((pure)) double __ocml_atanh_f64(double); __device__ __attribute__((pure)) double __ocml_cbrt_f64(double); __device__ __attribute__((const)) double __ocml_ceil_f64(double); __device__ __attribute__((const)) double __ocml_copysign_f64(double, double); __device__ double __ocml_cos_f64(double); __device__ __attribute__((pure)) double __ocml_cosh_f64(double); __device__ double __ocml_cospi_f64(double); __device__ double __ocml_i0_f64(double); __device__ double __ocml_i1_f64(double); __device__ __attribute__((pure)) double __ocml_erfc_f64(double); __device__ __attribute__((pure)) double __ocml_erfcinv_f64(double); __device__ __attribute__((pure)) double __ocml_erfcx_f64(double); __device__ __attribute__((pure)) double __ocml_erf_f64(double); __device__ __attribute__((pure)) double __ocml_erfinv_f64(double); __device__ 
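// The _rte/_rtn/_rtp/_rtz suffixes on the float INTRINSICS block above (and
// on their f64 counterparts below) select the IEEE rounding mode: round to
// nearest even, toward negative infinity, toward positive infinity, and
// toward zero. They are what CUDA-style directed-rounding intrinsics can be
// lowered onto; an illustrative mapping, not the literal wrapper definitions:
//
//   __fadd_rn(a, b)  ->  __ocml_add_rte_f32(a, b)
//   __fadd_rd(a, b)  ->  __ocml_add_rtn_f32(a, b)
//   __fadd_ru(a, b)  ->  __ocml_add_rtp_f32(a, b)
//   __fadd_rz(a, b)  ->  __ocml_add_rtz_f32(a, b)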
__attribute__((pure)) double __ocml_exp10_f64(double); __device__ __attribute__((pure)) double __ocml_exp2_f64(double); __device__ __attribute__((pure)) double __ocml_exp_f64(double); __device__ __attribute__((pure)) double __ocml_expm1_f64(double); __device__ __attribute__((const)) double __ocml_fabs_f64(double); __device__ __attribute__((const)) double __ocml_fdim_f64(double, double); __device__ __attribute__((const)) double __ocml_floor_f64(double); __device__ __attribute__((const)) double __ocml_fma_f64(double, double, double); __device__ __attribute__((const)) double __ocml_fmax_f64(double, double); __device__ __attribute__((const)) double __ocml_fmin_f64(double, double); __device__ __attribute__((const)) double __ocml_fmod_f64(double, double); __device__ double __ocml_frexp_f64(double, __attribute__((address_space(5))) int*); __device__ __attribute__((const)) double __ocml_hypot_f64(double, double); __device__ __attribute__((const)) int __ocml_ilogb_f64(double); __device__ __attribute__((const)) int __ocml_isfinite_f64(double); __device__ __attribute__((const)) int __ocml_isinf_f64(double); __device__ __attribute__((const)) int __ocml_isnan_f64(double); __device__ double __ocml_j0_f64(double); __device__ double __ocml_j1_f64(double); __device__ __attribute__((const)) double __ocml_ldexp_f64(double, int); __device__ double __ocml_lgamma_f64(double); __device__ __attribute__((pure)) double __ocml_log10_f64(double); __device__ __attribute__((pure)) double __ocml_log1p_f64(double); __device__ __attribute__((pure)) double __ocml_log2_f64(double); __device__ __attribute__((const)) double __ocml_logb_f64(double); __device__ __attribute__((pure)) double __ocml_log_f64(double); __device__ double __ocml_modf_f64(double, __attribute__((address_space(5))) double*); __device__ __attribute__((const)) double __ocml_nearbyint_f64(double); __device__ __attribute__((const)) double __ocml_nextafter_f64(double, double); __device__ __attribute__((const)) double __ocml_len3_f64(double, double, double); __device__ __attribute__((const)) double __ocml_len4_f64(double, double, double, double); __device__ __attribute__((pure)) double __ocml_ncdf_f64(double); __device__ __attribute__((pure)) double __ocml_ncdfinv_f64(double); __device__ __attribute__((pure)) double __ocml_pow_f64(double, double); __device__ __attribute__((pure)) double __ocml_pown_f64(double, int); __device__ __attribute__((pure)) double __ocml_rcbrt_f64(double); __device__ __attribute__((const)) double __ocml_remainder_f64(double, double); __device__ double __ocml_remquo_f64( double, double, __attribute__((address_space(5))) int*); __device__ __attribute__((const)) double __ocml_rhypot_f64(double, double); __device__ __attribute__((const)) double __ocml_rint_f64(double); __device__ __attribute__((const)) double __ocml_rlen3_f64(double, double, double); __device__ __attribute__((const)) double __ocml_rlen4_f64(double, double, double, double); __device__ __attribute__((const)) double __ocml_round_f64(double); __device__ __attribute__((pure)) double __ocml_rsqrt_f64(double); __device__ __attribute__((const)) double __ocml_scalb_f64(double, double); __device__ __attribute__((const)) double __ocml_scalbn_f64(double, int); __device__ __attribute__((const)) int __ocml_signbit_f64(double); __device__ double __ocml_sincos_f64(double, __attribute__((address_space(5))) double*); __device__ double __ocml_sincospi_f64(double, __attribute__((address_space(5))) double*); __device__ double __ocml_sin_f64(double); __device__ __attribute__((pure)) double 
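// Note: the f64 declarations in this block mirror the f32 set above, except
// for the __ocml_native_* reduced-accuracy variants, which this header
// declares only for single precision (there is a __ocml_native_exp_f32
// above, but no __ocml_native_exp_f64 here).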
__ocml_sinh_f64(double); __device__ double __ocml_sinpi_f64(double); __device__ __attribute__((const)) double __ocml_sqrt_f64(double); __device__ double __ocml_tan_f64(double); __device__ __attribute__((pure)) double __ocml_tanh_f64(double); __device__ double __ocml_tgamma_f64(double); __device__ __attribute__((const)) double __ocml_trunc_f64(double); __device__ double __ocml_y0_f64(double); __device__ double __ocml_y1_f64(double); // BEGIN INTRINSICS __device__ __attribute__((const)) double __ocml_add_rte_f64(double, double); __device__ __attribute__((const)) double __ocml_add_rtn_f64(double, double); __device__ __attribute__((const)) double __ocml_add_rtp_f64(double, double); __device__ __attribute__((const)) double __ocml_add_rtz_f64(double, double); __device__ __attribute__((const)) double __ocml_sub_rte_f64(double, double); __device__ __attribute__((const)) double __ocml_sub_rtn_f64(double, double); __device__ __attribute__((const)) double __ocml_sub_rtp_f64(double, double); __device__ __attribute__((const)) double __ocml_sub_rtz_f64(double, double); __device__ __attribute__((const)) double __ocml_mul_rte_f64(double, double); __device__ __attribute__((const)) double __ocml_mul_rtn_f64(double, double); __device__ __attribute__((const)) double __ocml_mul_rtp_f64(double, double); __device__ __attribute__((const)) double __ocml_mul_rtz_f64(double, double); __device__ __attribute__((const)) double __ocml_div_rte_f64(double, double); __device__ __attribute__((const)) double __ocml_div_rtn_f64(double, double); __device__ __attribute__((const)) double __ocml_div_rtp_f64(double, double); __device__ __attribute__((const)) double __ocml_div_rtz_f64(double, double); __device__ __attribute__((const)) double __ocml_sqrt_rte_f64(double); __device__ __attribute__((const)) double __ocml_sqrt_rtn_f64(double); __device__ __attribute__((const)) double __ocml_sqrt_rtp_f64(double); __device__ __attribute__((const)) double __ocml_sqrt_rtz_f64(double); __device__ __attribute__((const)) double __ocml_fma_rte_f64(double, double, double); __device__ __attribute__((const)) double __ocml_fma_rtn_f64(double, double, double); __device__ __attribute__((const)) double __ocml_fma_rtp_f64(double, double, double); __device__ __attribute__((const)) double __ocml_fma_rtz_f64(double, double, double); // END INTRINSICS // END DOUBLE #endif // !__CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__ #if defined(__cplusplus) } // extern "C" #endif clr-rocm-5.7.1/hipamd/include/hip/amd_detail/ockl_image.h000066400000000000000000000244071450307266000232470ustar00rootroot00000000000000/* Copyright (c) 2015 - 2022 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #include extern "C" { #define ADDRESS_SPACE_CONSTANT __attribute__((address_space(4))) __device__ float4::Native_vec_ __ockl_image_load_1D(unsigned int ADDRESS_SPACE_CONSTANT*i, int c); __device__ float4::Native_vec_ __ockl_image_load_1Db(unsigned int ADDRESS_SPACE_CONSTANT*i, int c); __device__ float4::Native_vec_ __ockl_image_load_1Da(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_load_2D(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_load_2Da(unsigned int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_load_3D(unsigned int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_load_CM(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c, int f); __device__ float4::Native_vec_ __ockl_image_load_CMa(unsigned int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c, int f); __device__ float4::Native_vec_ __ockl_image_load_lod_1D(unsigned int ADDRESS_SPACE_CONSTANT*i, int c, int l); __device__ float4::Native_vec_ __ockl_image_load_lod_1Da(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c, int l); __device__ float4::Native_vec_ __ockl_image_load_lod_2D(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c, int l); __device__ float4::Native_vec_ __ockl_image_load_lod_2Da(unsigned int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c, int l); __device__ float4::Native_vec_ __ockl_image_load_lod_3D(unsigned int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c, int l); __device__ float4::Native_vec_ __ockl_image_load_lod_CM(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c, int f, int l); __device__ float4::Native_vec_ __ockl_image_load_lod_CMa(unsigned int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c, int f, int l); __device__ void __ockl_image_store_1D(unsigned int ADDRESS_SPACE_CONSTANT*i, int c, float4::Native_vec_ p); __device__ void __ockl_image_store_1Da(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c, float4::Native_vec_ p); __device__ void __ockl_image_store_2D(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c, float4::Native_vec_ p); __device__ void __ockl_image_store_2Da(unsigned int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c, float4::Native_vec_ p); __device__ void __ockl_image_store_3D(unsigned int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c, float4::Native_vec_ p); __device__ void __ockl_image_store_CM(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c, int f, float4::Native_vec_ p); __device__ void __ockl_image_store_CMa(unsigned int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c, int f, float4::Native_vec_ p); __device__ void __ockl_image_store_lod_1D(unsigned int ADDRESS_SPACE_CONSTANT*i, int c, int l, float4::Native_vec_ p); __device__ void __ockl_image_store_lod_1Da(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c, int l, float4::Native_vec_ p); __device__ void __ockl_image_store_lod_2D(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c, int l, float4::Native_vec_ p); __device__ void __ockl_image_store_lod_2Da(unsigned int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c, int l, float4::Native_vec_ p); __device__ void __ockl_image_store_lod_3D(unsigned 
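// The __ockl_image_load_* / __ockl_image_store_* entry points in this header
// take a raw image descriptor: the dwords behind a HIP texture or surface
// handle (the TEXTURE_OBJECT_PARAMETERS_INIT macro in
// texture_indirect_functions.h performs the same cast). Loads and stores need
// no sampler; only the __ockl_image_sample_* functions further down take one.
// A sketch of a texel copy; it assumes hipSurfaceObject_t handles created
// over writable arrays, with all error handling omitted:
//
//   __device__ void copy_texel(hipSurfaceObject_t dst, hipSurfaceObject_t src,
//                              int x, int y) {
//       auto* si = (unsigned int ADDRESS_SPACE_CONSTANT*)src;
//       auto* di = (unsigned int ADDRESS_SPACE_CONSTANT*)dst;
//       float4::Native_vec_ p = __ockl_image_load_2D(si, int2(x, y).data);
//       __ockl_image_store_2D(di, int2(x, y).data, p);
//   }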
int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c, int l, float4::Native_vec_ p); __device__ void __ockl_image_store_lod_CM(unsigned int ADDRESS_SPACE_CONSTANT*i, int2::Native_vec_ c, int f, int l, float4::Native_vec_ p); __device__ void __ockl_image_store_lod_CMa(unsigned int ADDRESS_SPACE_CONSTANT*i, int4::Native_vec_ c, int f, int l, float4::Native_vec_ p); __device__ float4::Native_vec_ __ockl_image_sample_1D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float c); __device__ float4::Native_vec_ __ockl_image_sample_1Da(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float2::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_sample_2D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float2::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_sample_2Da(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float4::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_sample_3D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float4::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_sample_CM(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float4::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_sample_CMa(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float4::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_sample_grad_1D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float c, float dx, float dy); __device__ float4::Native_vec_ __ockl_image_sample_grad_1Da(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float2::Native_vec_ c, float dx, float dy); __device__ float4::Native_vec_ __ockl_image_sample_grad_2D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float2::Native_vec_ c, float2::Native_vec_ dx, float2::Native_vec_ dy); __device__ float4::Native_vec_ __ockl_image_sample_grad_2Da(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float4::Native_vec_ c, float2::Native_vec_ dx, float2::Native_vec_ dy); __device__ float4::Native_vec_ __ockl_image_sample_grad_3D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float4::Native_vec_ c, float4::Native_vec_ dx, float4::Native_vec_ dy); __device__ float4::Native_vec_ __ockl_image_sample_lod_1D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float c, float l); __device__ float4::Native_vec_ __ockl_image_sample_lod_1Da(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float2::Native_vec_ c, float l); __device__ float4::Native_vec_ __ockl_image_sample_lod_2D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float2::Native_vec_ c, float l); __device__ float4::Native_vec_ __ockl_image_sample_lod_2Da(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float4::Native_vec_ c, float l); __device__ float4::Native_vec_ __ockl_image_sample_lod_3D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float4::Native_vec_ c, float l); __device__ float4::Native_vec_ __ockl_image_sample_lod_CM(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float4::Native_vec_ c, float l); __device__ float4::Native_vec_ __ockl_image_sample_lod_CMa(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int 
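// __ockl_image_gather4{r,g,b,a}_2D below fetch the 2x2 texel footprint
// around the sample point and return the chosen channel of all four texels
// packed into one float4. They back the `comp` argument of tex2Dgather in
// texture_fetch_functions.h: comp 0 selects gather4r, 1 selects gather4g,
// 2 selects gather4b, and 3 selects gather4a.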
ADDRESS_SPACE_CONSTANT*s, float4::Native_vec_ c, float l); __device__ float4::Native_vec_ __ockl_image_gather4r_2D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float2::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_gather4g_2D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float2::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_gather4b_2D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float2::Native_vec_ c); __device__ float4::Native_vec_ __ockl_image_gather4a_2D(unsigned int ADDRESS_SPACE_CONSTANT*i, unsigned int ADDRESS_SPACE_CONSTANT*s, float2::Native_vec_ c); __device__ int __ockl_image_channel_data_type_1D(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_data_type_1Da(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_data_type_1Db(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_data_type_2D(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_data_type_2Da(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_data_type_2Dad(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_data_type_2Dd(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_data_type_3D(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_data_type_CM(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_data_type_CMa(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_order_1D(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_order_1Da(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_order_1Db(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_order_2D(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_order_2Da(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_order_2Dad(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_order_2Dd(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_order_3D(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_order_CM(unsigned int ADDRESS_SPACE_CONSTANT* i); __device__ int __ockl_image_channel_order_CMa(unsigned int ADDRESS_SPACE_CONSTANT* i); };clr-rocm-5.7.1/hipamd/include/hip/amd_detail/program_state.hpp000066400000000000000000000061221450307266000243560ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #include #include #include #include #include #include #include #include struct ihipModuleSymbol_t; using hipFunction_t = ihipModuleSymbol_t*; namespace hip_impl { // This section contains internal APIs that // needs to be exported #ifdef __GNUC__ #pragma GCC visibility push (default) #endif struct kernarg_impl; class kernarg { public: kernarg(); kernarg(kernarg&&); ~kernarg(); std::uint8_t* data(); std::size_t size(); void reserve(std::size_t); void resize(std::size_t); private: kernarg_impl* impl; }; class kernargs_size_align; class program_state_impl; class program_state { public: program_state(); ~program_state(); program_state(const program_state&) = delete; hipFunction_t kernel_descriptor(std::uintptr_t, hsa_agent_t); kernargs_size_align get_kernargs_size_align(std::uintptr_t); hsa_executable_t load_executable(const char*, const size_t, hsa_executable_t, hsa_agent_t); hsa_executable_t load_executable_no_copy(const char*, const size_t, hsa_executable_t, hsa_agent_t); void* global_addr_by_name(const char* name); private: friend class agent_globals_impl; program_state_impl* impl; }; class kernargs_size_align { public: std::size_t size(std::size_t n) const; std::size_t alignment(std::size_t n) const; const void* getHandle() const {return handle;}; private: const void* handle; friend kernargs_size_align program_state::get_kernargs_size_align(std::uintptr_t); }; #ifdef __GNUC__ #pragma GCC visibility pop #endif inline __attribute__((visibility("hidden"))) program_state& get_program_state() { static program_state ps; return ps; } } // Namespace hip_impl. clr-rocm-5.7.1/hipamd/include/hip/amd_detail/texture_fetch_functions.h000066400000000000000000000425741450307266000261230ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #pragma once #if defined(__cplusplus) #include #include #include #if !defined(__HIPCC_RTC__) #include #endif // !defined(__HIPCC_RTC__) #define TEXTURE_PARAMETERS_INIT \ unsigned int ADDRESS_SPACE_CONSTANT* i = (unsigned int ADDRESS_SPACE_CONSTANT*)t.textureObject; \ unsigned int ADDRESS_SPACE_CONSTANT* s = i + HIP_SAMPLER_OBJECT_OFFSET_DWORD; template struct __hip_is_tex_surf_scalar_channel_type { static constexpr bool value = std::is_same::value || std::is_same::value || std::is_same::value || std::is_same::value || std::is_same::value || std::is_same::value || std::is_same::value; }; template struct __hip_is_tex_surf_channel_type { static constexpr bool value = __hip_is_tex_surf_scalar_channel_type::value; }; template< typename T, unsigned int rank> struct __hip_is_tex_surf_channel_type> { static constexpr bool value = __hip_is_tex_surf_scalar_channel_type::value && ((rank == 1) || (rank == 2) || (rank == 4)); }; template struct __hip_is_tex_normalized_channel_type { static constexpr bool value = std::is_same::value || std::is_same::value || std::is_same::value || std::is_same::value; }; template< typename T, unsigned int rank> struct __hip_is_tex_normalized_channel_type> { static constexpr bool value = __hip_is_tex_normalized_channel_type::value && ((rank == 1) || (rank == 2) || (rank == 4)); }; template < typename T, hipTextureReadMode readMode, typename Enable = void> struct __hip_tex_ret { static_assert(std::is_same::value, "Invalid channel type!"); }; /* * Map from device function return U to scalar texture type T */ template __forceinline__ __device__ typename std::enable_if< __hip_is_tex_surf_scalar_channel_type::value, const T>::type __hipMapFrom(const U &u) { if constexpr (sizeof(T) < sizeof(float)) { union { U u; int i; } d = { u }; return static_cast(d.i); } else { // sizeof(T) == sizeof(float) union { U u; T t; } d = { u }; return d.t; } } /* * Map from device function return U to vector texture type T */ template __forceinline__ __device__ typename std::enable_if< __hip_is_tex_surf_scalar_channel_type::value, const T>::type __hipMapFrom(const U &u) { if constexpr (sizeof(typename T::value_type) < sizeof(float)) { union { U u; int4 i4; } d = { u }; return __hipMapVector(d.i4); } else { // sizeof(typename T::value_type) == sizeof(float) union { U u; T t; } d = { u }; return d.t; } } /* * Map from scalar texture type T to device function input U */ template __forceinline__ __device__ typename std::enable_if< __hip_is_tex_surf_scalar_channel_type::value, const U>::type __hipMapTo(const T &t) { if constexpr (sizeof(T) < sizeof(float)) { union { U u; int i; } d = { 0 }; d.i = static_cast(t); return d.u; } else { // sizeof(T) == sizeof(float) union { U u; T t; } d = { 0 }; d.t = t; return d.u; } } /* * Map from vector texture type T to device function input U */ template __forceinline__ __device__ typename std::enable_if< __hip_is_tex_surf_scalar_channel_type::value, const U>::type __hipMapTo(const T &t) { if constexpr (sizeof(typename T::value_type) < sizeof(float)) { union { U u; int4 i4; } d = { 0 }; d.i4 = __hipMapVector(t); return d.u; } else { // sizeof(typename T::value_type) == sizeof(float) union { U u; T t; } d = { 0 }; d.t = t; return d.u; } } template < typename T, hipTextureReadMode readMode> using __hip_tex_ret_t = typename __hip_tex_ret::type; template struct __hip_tex_ret< T, hipReadModeElementType, typename std::enable_if<__hip_is_tex_surf_channel_type::value, bool>::type> { using type = T; }; template< typename T, unsigned int rank> struct __hip_tex_ret< 
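// __hip_tex_ret maps a channel type plus a read mode to the type a texture
// fetch returns: hipReadModeElementType yields the channel type itself,
// while hipReadModeNormalizedFloat (valid only for the 8- and 16-bit integer
// channel types) yields float, applied per component for vector channels.
// Illustrative checks, not present in the original header:
//
//   static_assert(std::is_same<
//       __hip_tex_ret_t<short2, hipReadModeElementType>, short2>::value, "");
//   static_assert(std::is_same<
//       __hip_tex_ret_t<uchar4, hipReadModeNormalizedFloat>, float4>::value, "");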
HIP_vector_type, hipReadModeElementType, typename std::enable_if<__hip_is_tex_surf_channel_type>::value, bool>::type> { using type = HIP_vector_type<__hip_tex_ret_t, rank>; }; template struct __hip_tex_ret< T, hipReadModeNormalizedFloat, typename std::enable_if<__hip_is_tex_normalized_channel_type::value, bool>::type> { using type = float; }; template< typename T, unsigned int rank> struct __hip_tex_ret< HIP_vector_type, hipReadModeNormalizedFloat, typename std::enable_if<__hip_is_tex_normalized_channel_type>::value, bool>::type> { using type = HIP_vector_type<__hip_tex_ret_t, rank>; }; template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t tex1Dfetch(texture t, int x) { TEXTURE_PARAMETERS_INIT; auto tmp = __ockl_image_load_1Db(i, x); return __hipMapFrom<__hip_tex_ret_t>(tmp); } template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t tex1D(texture t, float x) { TEXTURE_PARAMETERS_INIT; auto tmp = __ockl_image_sample_1D(i, s, x); return __hipMapFrom<__hip_tex_ret_t>(tmp); } template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t tex2D(texture t, float x, float y) { TEXTURE_PARAMETERS_INIT; auto tmp = __ockl_image_sample_2D(i, s, float2(x, y).data); return __hipMapFrom<__hip_tex_ret_t>(tmp); } template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t tex1DLayered(texture t, float x, int layer) { TEXTURE_PARAMETERS_INIT; auto tmp = __ockl_image_sample_1Da(i, s, float2(x, layer).data); return __hipMapFrom<__hip_tex_ret_t>(tmp); } template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t tex2DLayered(texture t, float x, float y, int layer) { TEXTURE_PARAMETERS_INIT; auto tmp = __ockl_image_sample_2Da(i, s, float4(x, y, layer, 0.0f).data); return __hipMapFrom<__hip_tex_ret_t>(tmp); } template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t tex3D(texture t, float x, float y, float z) { TEXTURE_PARAMETERS_INIT; auto tmp = __ockl_image_sample_3D(i, s, float4(x, y, z, 0.0f).data); return __hipMapFrom<__hip_tex_ret_t>(tmp); } template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t texCubemap(texture t, float x, float y, float z) { TEXTURE_PARAMETERS_INIT; auto tmp = __ockl_image_sample_CM(i, s, float4(x, y, z, 0.0f).data); return __hipMapFrom<__hip_tex_ret_t>(tmp); } template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t tex1DLod(texture t, float x, float level) { TEXTURE_PARAMETERS_INIT; auto tmp = __ockl_image_sample_lod_1D(i, s, x, level); return __hipMapFrom<__hip_tex_ret_t>(tmp); } template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t tex2DLod(texture t, float x, float y, float level) { TEXTURE_PARAMETERS_INIT; auto tmp = __ockl_image_sample_lod_2D(i, s, float2(x, y).data, level); return __hipMapFrom<__hip_tex_ret_t>(tmp); } template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t tex1DLayeredLod(texture t, float x, int layer, float level) { TEXTURE_PARAMETERS_INIT; auto tmp = __ockl_image_sample_lod_1Da(i, s, float2(x, layer).data, level); return __hipMapFrom<__hip_tex_ret_t>(tmp); } template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t tex2DLayeredLod(texture t, float x, float y, int layer, float level) { TEXTURE_PARAMETERS_INIT; auto tmp = __ockl_image_sample_lod_2Da(i, s, float4(x, y, layer, 0.0f).data, level); return __hipMapFrom<__hip_tex_ret_t>(tmp); } template static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t tex3DLod(texture t, float x, float y, float z, float 
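// A minimal usage sketch for the texture-reference overloads above. Binding
// the reference to memory on the host (for example with the deprecated
// hipBindTextureToArray API) is assumed to have happened already; names are
// illustrative:
//
//   texture<float, hipTextureType2D, hipReadModeElementType> texRef;
//
//   __global__ void sample_row(float* out, int width, float row) {
//       int x = blockIdx.x * blockDim.x + threadIdx.x;
//       if (x < width)  // +0.5f addresses the texel center
//           out[x] = tex2D(texRef, x + 0.5f, row + 0.5f);
//   }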
level) {
    TEXTURE_PARAMETERS_INIT;
    auto tmp = __ockl_image_sample_lod_3D(i, s, float4(x, y, z, 0.0f).data, level);
    return __hipMapFrom<__hip_tex_ret_t<T, readMode>>(tmp);
}

template <typename T, hipTextureReadMode readMode>
static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t<T, readMode>
texCubemapLod(texture<T, hipTextureTypeCubemap, readMode> t, float x, float y, float z,
              float level) {
    TEXTURE_PARAMETERS_INIT;
    auto tmp = __ockl_image_sample_lod_CM(i, s, float4(x, y, z, 0.0f).data, level);
    return __hipMapFrom<__hip_tex_ret_t<T, readMode>>(tmp);
}

template <typename T, hipTextureReadMode readMode>
static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t<T, readMode>
texCubemapLayered(texture<T, hipTextureTypeCubemapLayered, readMode> t, float x, float y, float z,
                  int layer) {
    TEXTURE_PARAMETERS_INIT;
    auto tmp = __ockl_image_sample_CMa(i, s, float4(x, y, z, layer).data);
    return __hipMapFrom<__hip_tex_ret_t<T, readMode>>(tmp);
}

template <typename T, hipTextureReadMode readMode>
static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t<T, readMode>
texCubemapLayeredLod(texture<T, hipTextureTypeCubemapLayered, readMode> t, float x, float y,
                     float z, int layer, float level) {
    TEXTURE_PARAMETERS_INIT;
    auto tmp = __ockl_image_sample_lod_CMa(i, s, float4(x, y, z, layer).data, level);
    return __hipMapFrom<__hip_tex_ret_t<T, readMode>>(tmp);
}

template <typename T, hipTextureReadMode readMode>
static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t<T, readMode>
texCubemapGrad(texture<T, hipTextureTypeCubemap, readMode> t, float x, float y, float z,
               float4 dPdx, float4 dPdy) {
    TEXTURE_PARAMETERS_INIT;
    // TODO missing in device libs.
    // auto tmp = __ockl_image_sample_grad_CM(i, s, float4(x, y, z, 0.0f).data,
    //                                        float4(dPdx.x, dPdx.y, dPdx.z, 0.0f).data,
    //                                        float4(dPdy.x, dPdy.y, dPdy.z, 0.0f).data);
    // return __hipMapFrom<__hip_tex_ret_t<T, readMode>>(tmp);
    return {};
}

template <typename T, hipTextureReadMode readMode>
static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t<T, readMode>
texCubemapLayeredGrad(texture<T, hipTextureTypeCubemapLayered, readMode> t, float x, float y,
                      float z, int layer, float4 dPdx, float4 dPdy) {
    TEXTURE_PARAMETERS_INIT;
    // TODO missing in device libs.
    // auto tmp = __ockl_image_sample_grad_CMa(i, s, float4(x, y, z, layer).data,
    //                                         float4(dPdx.x, dPdx.y, dPdx.z, 0.0f).data,
    //                                         float4(dPdy.x, dPdy.y, dPdy.z, 0.0f).data);
    // return __hipMapFrom<__hip_tex_ret_t<T, readMode>>(tmp);
    return {};
}

template <typename T, hipTextureReadMode readMode>
static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t<T, readMode>
tex1DGrad(texture<T, hipTextureType1D, readMode> t, float x, float dPdx, float dPdy) {
    TEXTURE_PARAMETERS_INIT;
    auto tmp = __ockl_image_sample_grad_1D(i, s, x, dPdx, dPdy);
    return __hipMapFrom<__hip_tex_ret_t<T, readMode>>(tmp);
}

template <typename T, hipTextureReadMode readMode>
static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t<T, readMode>
tex2DGrad(texture<T, hipTextureType2D, readMode> t, float x, float y, float2 dPdx, float2 dPdy) {
    TEXTURE_PARAMETERS_INIT;
    auto tmp = __ockl_image_sample_grad_2D(i, s, float2(x, y).data, float2(dPdx.x, dPdx.y).data,
                                           float2(dPdy.x, dPdy.y).data);
    return __hipMapFrom<__hip_tex_ret_t<T, readMode>>(tmp);
}

template <typename T, hipTextureReadMode readMode>
static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t<T, readMode>
tex1DLayeredGrad(texture<T, hipTextureType1DLayered, readMode> t, float x, int layer, float dPdx,
                 float dPdy) {
    TEXTURE_PARAMETERS_INIT;
    auto tmp = __ockl_image_sample_grad_1Da(i, s, float2(x, layer).data, dPdx, dPdy);
    return __hipMapFrom<__hip_tex_ret_t<T, readMode>>(tmp);
}

template <typename T, hipTextureReadMode readMode>
static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t<T, readMode>
tex2DLayeredGrad(texture<T, hipTextureType2DLayered, readMode> t, float x, float y, int layer,
                 float2 dPdx, float2 dPdy) {
    TEXTURE_PARAMETERS_INIT;
    auto tmp = __ockl_image_sample_grad_2Da(i, s, float4(x, y, layer, 0.0f).data,
                                            float2(dPdx.x, dPdx.y).data,
                                            float2(dPdy.x, dPdy.y).data);
    return __hipMapFrom<__hip_tex_ret_t<T, readMode>>(tmp);
}

template <typename T, hipTextureReadMode readMode>
static __forceinline__ __device__ __hip_img_chk__ __hip_tex_ret_t<T, readMode>
tex3DGrad(texture<T, hipTextureType3D, readMode> t, float x, float y, float z, float4 dPdx,
          float4 dPdy) {
    TEXTURE_PARAMETERS_INIT;
    auto tmp = __ockl_image_sample_grad_3D(i, s, float4(x, y, z, 0.0f).data,
                                           float4(dPdx.x, dPdx.y, dPdx.z, 0.0f).data,
                                           float4(dPdy.x, dPdy.y, dPdy.z, 0.0f).data);
    return
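// Note on the cubemap gradient overloads above: because
// __ockl_image_sample_grad_CM and __ockl_image_sample_grad_CMa are still
// missing from the device library (see the TODO comments), texCubemapGrad
// and texCubemapLayeredGrad ignore their inputs and return a
// value-initialized result, that is, all channels zero. The 1D/2D/3D
// gradient overloads are fully wired through their
// __ockl_image_sample_grad_* counterparts.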
__hipMapFrom<__hip_tex_ret_t>(tmp); } template < typename T, hipTextureReadMode readMode, typename Enable = void> struct __hip_tex2dgather_ret { static_assert(std::is_same::value, "Invalid channel type!"); }; template < typename T, hipTextureReadMode readMode> using __hip_tex2dgather_ret_t = typename __hip_tex2dgather_ret::type; template struct __hip_tex2dgather_ret< T, hipReadModeElementType, typename std::enable_if<__hip_is_tex_surf_channel_type::value, bool>::type> { using type = HIP_vector_type; }; template< typename T, unsigned int rank> struct __hip_tex2dgather_ret< HIP_vector_type, hipReadModeElementType, typename std::enable_if<__hip_is_tex_surf_channel_type>::value, bool>::type> { using type = HIP_vector_type; }; template struct __hip_tex2dgather_ret< T, hipReadModeNormalizedFloat, typename std::enable_if<__hip_is_tex_normalized_channel_type::value, bool>::type> { using type = float4; }; template static __forceinline__ __device__ __hip_img_chk__ __hip_tex2dgather_ret_t tex2Dgather(texture t, float x, float y, int comp=0) { TEXTURE_PARAMETERS_INIT; switch (comp) { case 1: { auto tmp = __ockl_image_gather4g_2D(i, s, float2(x, y).data); return __hipMapFrom<__hip_tex2dgather_ret_t>(tmp); } case 2: { auto tmp = __ockl_image_gather4b_2D(i, s, float2(x, y).data); return __hipMapFrom<__hip_tex2dgather_ret_t>(tmp); } case 3: { auto tmp = __ockl_image_gather4a_2D(i, s, float2(x, y).data); return __hipMapFrom<__hip_tex2dgather_ret_t>(tmp); } default: { auto tmp = __ockl_image_gather4r_2D(i, s, float2(x, y).data); return __hipMapFrom<__hip_tex2dgather_ret_t>(tmp); } } return {}; } #endif clr-rocm-5.7.1/hipamd/include/hip/amd_detail/texture_indirect_functions.h000066400000000000000000000440211450307266000266200ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #pragma once #if defined(__cplusplus) #include #include #include #include #if !defined(__HIPCC_RTC__) #include #endif // !defined(__HIPCC_RTC__) #define TEXTURE_OBJECT_PARAMETERS_INIT \ unsigned int ADDRESS_SPACE_CONSTANT* i = (unsigned int ADDRESS_SPACE_CONSTANT*)textureObject; \ unsigned int ADDRESS_SPACE_CONSTANT* s = i + HIP_SAMPLER_OBJECT_OFFSET_DWORD; template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T tex1Dfetch(hipTextureObject_t textureObject, int x) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_load_1Db(i, x); return __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void tex1Dfetch(T *ptr, hipTextureObject_t textureObject, int x) { *ptr = tex1Dfetch(textureObject, x); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T tex1D(hipTextureObject_t textureObject, float x) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_sample_1D(i, s, x); return __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void tex1D(T *ptr, hipTextureObject_t textureObject, float x) { *ptr = tex1D(textureObject, x); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T tex2D(hipTextureObject_t textureObject, float x, float y) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_sample_2D(i, s, float2(x, y).data); return __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void tex2D(T *ptr, hipTextureObject_t textureObject, float x, float y) { *ptr = tex2D(textureObject, x, y); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T tex3D(hipTextureObject_t textureObject, float x, float y, float z) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_sample_3D(i, s, float4(x, y, z, 0.0f).data); return __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void tex3D(T *ptr, hipTextureObject_t textureObject, float x, float y, float z) { *ptr = tex3D(textureObject, x, y, z); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T tex1DLayered(hipTextureObject_t textureObject, float x, int layer) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_sample_1Da(i, s, float2(x, layer).data); return __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void tex1DLayered(T *ptr, hipTextureObject_t textureObject, float x, int layer) { *ptr = tex1DLayered(textureObject, x, layer); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T tex2DLayered(hipTextureObject_t textureObject, float x, float y, int layer) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_sample_2Da(i, s, float4(x, y, layer, 0.0f).data); return __hipMapFrom(tmp); } template 
<typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ void tex2DLayered(T *ptr, hipTextureObject_t textureObject,
                                                    float x, float y, int layer) {
    *ptr = tex2DLayered<T>(textureObject, x, y, layer);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ T texCubemap(hipTextureObject_t textureObject, float x, float y,
                                               float z) {
    TEXTURE_OBJECT_PARAMETERS_INIT
    auto tmp = __ockl_image_sample_CM(i, s, float4(x, y, z, 0.0f).data);
    return __hipMapFrom<T>(tmp);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ void texCubemap(T *ptr, hipTextureObject_t textureObject,
                                                  float x, float y, float z) {
    *ptr = texCubemap<T>(textureObject, x, y, z);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ T texCubemapLayered(hipTextureObject_t textureObject, float x,
                                                      float y, float z, int layer) {
    TEXTURE_OBJECT_PARAMETERS_INIT
    auto tmp = __ockl_image_sample_CMa(i, s, float4(x, y, z, layer).data);
    return __hipMapFrom<T>(tmp);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ void texCubemapLayered(T *ptr, hipTextureObject_t textureObject,
                                                         float x, float y, float z, int layer) {
    *ptr = texCubemapLayered<T>(textureObject, x, y, z, layer);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ T tex2Dgather(hipTextureObject_t textureObject, float x, float y,
                                                int comp = 0) {
    TEXTURE_OBJECT_PARAMETERS_INIT
    // comp selects the gathered component (0 = r, 1 = g, 2 = b, 3 = a),
    // consistent with the texture-reference tex2Dgather overload.
    switch (comp) {
        case 1: {
            auto tmp = __ockl_image_gather4g_2D(i, s, float2(x, y).data);
            return __hipMapFrom<T>(tmp);
        }
        case 2: {
            auto tmp = __ockl_image_gather4b_2D(i, s, float2(x, y).data);
            return __hipMapFrom<T>(tmp);
        }
        case 3: {
            auto tmp = __ockl_image_gather4a_2D(i, s, float2(x, y).data);
            return __hipMapFrom<T>(tmp);
        }
        default: {
            auto tmp = __ockl_image_gather4r_2D(i, s, float2(x, y).data);
            return __hipMapFrom<T>(tmp);
        }
    }
    return {};
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ void tex2Dgather(T *ptr, hipTextureObject_t textureObject,
                                                   float x, float y, int comp = 0) {
    *ptr = tex2Dgather<T>(textureObject, x, y, comp);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ T tex1DLod(hipTextureObject_t textureObject, float x,
                                             float level) {
    TEXTURE_OBJECT_PARAMETERS_INIT
    auto tmp = __ockl_image_sample_lod_1D(i, s, x, level);
    return __hipMapFrom<T>(tmp);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ void tex1DLod(T *ptr, hipTextureObject_t textureObject, float x,
                                                float level) {
    *ptr = tex1DLod<T>(textureObject, x, level);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ T tex2DLod(hipTextureObject_t textureObject, float x, float y,
                                             float level) {
    TEXTURE_OBJECT_PARAMETERS_INIT
    auto tmp = __ockl_image_sample_lod_2D(i, s, float2(x, y).data, level);
    return __hipMapFrom<T>(tmp);
}
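// Illustrative device-side sketch (an editorial example, not upstream code):
// sampling a 2D texture object at an explicit mip level through the
// tex2DLod<T> overload above. The kernel name and the out/w/h/lod parameters
// are hypothetical.
//
//   __global__ void sampleLod(float* out, hipTextureObject_t tex, int w, int h, float lod) {
//       int ix = blockIdx.x * blockDim.x + threadIdx.x;
//       int iy = blockIdx.y * blockDim.y + threadIdx.y;
//       if (ix < w && iy < h)
//           out[iy * w + ix] = tex2DLod<float>(tex, ix + 0.5f, iy + 0.5f, lod);
//   }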
template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ void tex2DLod(T *ptr, hipTextureObject_t textureObject, float x,
                                                float y, float level) {
    *ptr = tex2DLod<T>(textureObject, x, y, level);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ T tex3DLod(hipTextureObject_t textureObject, float x, float y,
                                             float z, float level) {
    TEXTURE_OBJECT_PARAMETERS_INIT
    auto tmp = __ockl_image_sample_lod_3D(i, s, float4(x, y, z, 0.0f).data, level);
    return __hipMapFrom<T>(tmp);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ void tex3DLod(T *ptr, hipTextureObject_t textureObject, float x,
                                                float y, float z, float level) {
    *ptr = tex3DLod<T>(textureObject, x, y, z, level);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ T tex1DLayeredLod(hipTextureObject_t textureObject, float x,
                                                    int layer, float level) {
    TEXTURE_OBJECT_PARAMETERS_INIT
    auto tmp = __ockl_image_sample_lod_1Da(i, s, float2(x, layer).data, level);
    return __hipMapFrom<T>(tmp);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ void tex1DLayeredLod(T *ptr, hipTextureObject_t textureObject,
                                                       float x, int layer, float level) {
    *ptr = tex1DLayeredLod<T>(textureObject, x, layer, level);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ T tex2DLayeredLod(hipTextureObject_t textureObject, float x,
                                                    float y, int layer, float level) {
    TEXTURE_OBJECT_PARAMETERS_INIT
    auto tmp = __ockl_image_sample_lod_2Da(i, s, float4(x, y, layer, 0.0f).data, level);
    return __hipMapFrom<T>(tmp);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ void tex2DLayeredLod(T *ptr, hipTextureObject_t textureObject,
                                                       float x, float y, int layer, float level) {
    *ptr = tex2DLayeredLod<T>(textureObject, x, y, layer, level);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ T texCubemapLod(hipTextureObject_t textureObject, float x,
                                                  float y, float z, float level) {
    TEXTURE_OBJECT_PARAMETERS_INIT
    auto tmp = __ockl_image_sample_lod_CM(i, s, float4(x, y, z, 0.0f).data, level);
    return __hipMapFrom<T>(tmp);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ void texCubemapLod(T *ptr, hipTextureObject_t textureObject,
                                                     float x, float y, float z, float level) {
    *ptr = texCubemapLod<T>(textureObject, x, y, z, level);
}

template <typename T,
          typename std::enable_if<__hip_is_tex_surf_channel_type<T>::value>::type* = nullptr>
static __device__ __hip_img_chk__ T texCubemapGrad(hipTextureObject_t textureObject, float x,
                                                   float y, float z, float4 dPdx, float4 dPdy) {
    TEXTURE_OBJECT_PARAMETERS_INIT
    // TODO missing in device libs.
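    // Descriptive note (inferred from the TODO above): the cubemap gradient
    // sampler __ockl_image_sample_grad_CM is not yet exposed by the ROCm
    // device libraries, so this overload returns a zero-initialized T instead
    // of sampling; the intended call is preserved below for reference.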
// auto tmp = __ockl_image_sample_grad_CM(i, s, float4(x, y, z, 0.0f).data, float4(dPdx.x, dPdx.y, dPdx.z, 0.0f).data, float4(dPdy.x, dPdy.y, dPdy.z, 0.0f).data); // return __hipMapFrom(tmp); return {}; } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void texCubemapGrad(T *ptr, hipTextureObject_t textureObject, float x, float y, float z, float4 dPdx, float4 dPdy) { *ptr = texCubemapGrad(textureObject, x, y, z, dPdx, dPdy); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T texCubemapLayeredLod(hipTextureObject_t textureObject, float x, float y, float z, int layer, float level) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_sample_lod_CMa(i, s, float4(x, y, z, layer).data, level); return __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void texCubemapLayeredLod(T *ptr, hipTextureObject_t textureObject, float x, float y, float z, int layer, float level) { *ptr = texCubemapLayeredLod(textureObject, x, y, z, layer, level); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T tex1DGrad(hipTextureObject_t textureObject, float x, float dPdx, float dPdy) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_sample_grad_1D(i, s, x, dPdx, dPdy); return __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void tex1DGrad(T *ptr, hipTextureObject_t textureObject, float x, float dPdx, float dPdy) { *ptr = tex1DGrad(textureObject, x, dPdx, dPdy); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T tex2DGrad(hipTextureObject_t textureObject, float x, float y, float2 dPdx, float2 dPdy) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_sample_grad_2D(i, s, float2(x, y).data, float2(dPdx.x, dPdx.y).data, float2(dPdy.x, dPdy.y).data); return __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void tex2DGrad(T *ptr, hipTextureObject_t textureObject, float x, float y, float2 dPdx, float2 dPdy) { *ptr = tex2DGrad(textureObject, x, y, dPdx, dPdy); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T tex3DGrad(hipTextureObject_t textureObject, float x, float y, float z, float4 dPdx, float4 dPdy) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_sample_grad_3D(i, s, float4(x, y, z, 0.0f).data, float4(dPdx.x, dPdx.y, dPdx.z, 0.0f).data, float4(dPdy.x, dPdy.y, dPdy.z, 0.0f).data); return __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void tex3DGrad(T *ptr, hipTextureObject_t textureObject, float x, float y, float z, float4 dPdx, float4 dPdy) { *ptr = tex3DGrad(textureObject, x, y, z, dPdx, dPdy); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T tex1DLayeredGrad(hipTextureObject_t textureObject, float x, int layer, float dPdx, 
float dPdy) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_sample_grad_1Da(i, s, float2(x, layer).data, dPdx, dPdy); return __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void tex1DLayeredGrad(T *ptr, hipTextureObject_t textureObject, float x, int layer, float dPdx, float dPdy) { *ptr = tex1DLayeredGrad(textureObject, x, layer, dPdx, dPdy); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T tex2DLayeredGrad(hipTextureObject_t textureObject, float x, float y, int layer, float2 dPdx, float2 dPdy) { TEXTURE_OBJECT_PARAMETERS_INIT auto tmp = __ockl_image_sample_grad_2Da(i, s, float4(x, y, layer, 0.0f).data, float2(dPdx.x, dPdx.y).data, float2(dPdy.x, dPdy.y).data); return __hipMapFrom(tmp); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void tex2DLayeredGrad(T *ptr, hipTextureObject_t textureObject, float x, float y, int layer, float2 dPdx, float2 dPdy) { *ptr = tex2DLayeredGrad(textureObject, x, y, layer, dPdx, dPdy); } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ T texCubemapLayeredGrad(hipTextureObject_t textureObject, float x, float y, float z, int layer, float4 dPdx, float4 dPdy) { TEXTURE_OBJECT_PARAMETERS_INIT // TODO missing in device libs. // auto tmp = __ockl_image_sample_grad_CMa(i, s, float4(x, y, z, layer).data, float4(dPdx.x, dPdx.y, dPdx.z, 0.0f).data, float4(dPdy.x, dPdy.y, dPdy.z, 0.0f).data); // return __hipMapFrom(tmp); return {}; } template < typename T, typename std::enable_if<__hip_is_tex_surf_channel_type::value>::type* = nullptr> static __device__ __hip_img_chk__ void texCubemapLayeredGrad(T *ptr, hipTextureObject_t textureObject, float x, float y, float z, int layer, float4 dPdx, float4 dPdy) { *ptr = texCubemapLayeredGrad(textureObject, x, y, z, layer, dPdx, dPdy); } #endif clr-rocm-5.7.1/hipamd/include/hip/hcc_detail000077700000000000000000000000001450307266000227152amd_detailustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/include/hip/nvcc_detail000077700000000000000000000000001450307266000236222nvidia_detailustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/000077500000000000000000000000001450307266000215065ustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_channel_descriptor.h000066400000000000000000000023631450307266000270630ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef HIP_INCLUDE_HIP_NVIDIA_DETAIL_CHANNEL_DESCRIPTOR_H #define HIP_INCLUDE_HIP_NVIDIA_DETAIL_CHANNEL_DESCRIPTOR_H #include "channel_descriptor.h" #endif clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_hip_atomics.h000066400000000000000000000047621450307266000255210ustar00rootroot00000000000000/* Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_ATOMICS_H #define HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_ATOMICS_H __device__ inline float atomicMax(float* addr, float val) { int ret = __float_as_int(*addr); while (val > __int_as_float(ret)) { int old = ret; if ((ret = atomicCAS((int *)addr, old, __float_as_int(val))) == old) break; } return __int_as_float(ret); } __device__ inline double atomicMax(double* addr, double val) { unsigned long long ret = __double_as_longlong(*addr); while (val > __longlong_as_double(ret)) { unsigned long long old = ret; if ((ret = atomicCAS((unsigned long long *)addr, old, __double_as_longlong(val))) == old) break; } return __longlong_as_double(ret); } __device__ inline float atomicMin(float* addr, float val) { int ret = __float_as_int(*addr); while (val < __int_as_float(ret)) { int old = ret; if ((ret = atomicCAS((int *)addr, old, __float_as_int(val))) == old) break; } return __int_as_float(ret); } __device__ inline double atomicMin(double* addr, double val) { unsigned long long ret = __double_as_longlong(*addr); while (val < __longlong_as_double(ret)) { unsigned long long old = ret; if ((ret = atomicCAS((unsigned long long *)addr, old, __double_as_longlong(val))) == old) break; } return __longlong_as_double(ret); } #endif clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_hip_bf16.h000066400000000000000000000025321450307266000246110ustar00rootroot00000000000000/* Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_FP16_H #define HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_FP16_H #include typedef struct __nv_bfloat16 __hip_bfloat16; typedef struct __nv_bfloat162 __hip_bfloat162; #endif // HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_FP16_H clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_hip_complex.h000066400000000000000000000104321450307266000255200ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_COMPLEX_H #define HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_COMPLEX_H #include "cuComplex.h" typedef cuFloatComplex hipFloatComplex; __device__ __host__ static inline float hipCrealf(hipFloatComplex z) { return cuCrealf(z); } __device__ __host__ static inline float hipCimagf(hipFloatComplex z) { return cuCimagf(z); } __device__ __host__ static inline hipFloatComplex make_hipFloatComplex(float a, float b) { return make_cuFloatComplex(a, b); } __device__ __host__ static inline hipFloatComplex hipConjf(hipFloatComplex z) { return cuConjf(z); } __device__ __host__ static inline float hipCsqabsf(hipFloatComplex z) { return cuCabsf(z) * cuCabsf(z); } __device__ __host__ static inline hipFloatComplex hipCaddf(hipFloatComplex p, hipFloatComplex q) { return cuCaddf(p, q); } __device__ __host__ static inline hipFloatComplex hipCsubf(hipFloatComplex p, hipFloatComplex q) { return cuCsubf(p, q); } __device__ __host__ static inline hipFloatComplex hipCmulf(hipFloatComplex p, hipFloatComplex q) { return cuCmulf(p, q); } __device__ __host__ static inline hipFloatComplex hipCdivf(hipFloatComplex p, hipFloatComplex q) { return cuCdivf(p, q); } __device__ __host__ static inline float hipCabsf(hipFloatComplex z) { return cuCabsf(z); } typedef cuDoubleComplex hipDoubleComplex; __device__ __host__ static inline double hipCreal(hipDoubleComplex z) { return cuCreal(z); } __device__ __host__ static inline double hipCimag(hipDoubleComplex z) { return cuCimag(z); } __device__ __host__ static inline hipDoubleComplex make_hipDoubleComplex(double a, double b) { return make_cuDoubleComplex(a, b); } __device__ __host__ static inline hipDoubleComplex hipConj(hipDoubleComplex z) { return cuConj(z); } __device__ __host__ static inline double hipCsqabs(hipDoubleComplex z) { return cuCabs(z) * cuCabs(z); } __device__ __host__ static inline hipDoubleComplex hipCadd(hipDoubleComplex p, hipDoubleComplex q) { return cuCadd(p, q); } __device__ __host__ static inline hipDoubleComplex hipCsub(hipDoubleComplex p, hipDoubleComplex q) { return cuCsub(p, q); } __device__ __host__ static inline hipDoubleComplex hipCmul(hipDoubleComplex p, hipDoubleComplex q) { return cuCmul(p, q); } __device__ __host__ static inline hipDoubleComplex hipCdiv(hipDoubleComplex p, hipDoubleComplex q) { return cuCdiv(p, q); } __device__ __host__ static inline double hipCabs(hipDoubleComplex z) { return cuCabs(z); } typedef cuFloatComplex hipComplex; __device__ __host__ static inline hipComplex make_hipComplex(float x, float y) { return make_cuComplex(x, y); } __device__ __host__ static inline hipFloatComplex hipComplexDoubleToFloat(hipDoubleComplex z) { return cuComplexDoubleToFloat(z); } __device__ __host__ static inline hipDoubleComplex hipComplexFloatToDouble(hipFloatComplex z) { return cuComplexFloatToDouble(z); } __device__ __host__ static inline hipComplex hipCfmaf(hipComplex p, hipComplex q, hipComplex r) { return cuCfmaf(p, q, r); } __device__ __host__ static inline hipDoubleComplex hipCfma(hipDoubleComplex p, hipDoubleComplex q, hipDoubleComplex r) { return cuCfma(p, q, r); } #endif clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_hip_cooperative_groups.h000066400000000000000000000005751450307266000277770ustar00rootroot00000000000000#ifndef HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_COOPERATIVE_GROUPS_H #define HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_COOPERATIVE_GROUPS_H // Include CUDA headers #include #include // Include HIP wrapper headers around CUDA #include #include #endif // 
HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_COOPERATIVE_GROUPS_H clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_hip_gl_interop.h000066400000000000000000000042271450307266000262200ustar00rootroot00000000000000/* Copyright (c) 2015 - 2023 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef HIP_INCLUDE_NVIDIA_HIP_GL_INTEROP_H #define HIP_INCLUDE_NVIDIA_HIP_GL_INTEROP_H #include typedef enum cudaGLDeviceList hipGLDeviceList; #define hipGLDeviceListAll cudaGLDeviceListAll #define hipGLDeviceListCurrentFrame cudaGLDeviceListCurrentFrame #define hipGLDeviceListNextFrame cudaGLDeviceListNextFrame inline static hipError_t hipGLGetDevices(unsigned int* pHipDeviceCount, int* pHipDevices, unsigned int hipDeviceCount, hipGLDeviceList deviceList) { return hipCUDAErrorTohipError(cudaGLGetDevices(pHipDeviceCount, pHipDevices, hipDeviceCount, deviceList)); } inline static hipError_t hipGraphicsGLRegisterBuffer(hipGraphicsResource** resource, GLuint buffer, unsigned int flags) { return hipCUDAErrorTohipError(cudaGraphicsGLRegisterBuffer(resource, buffer, flags)); } inline static hipError_t hipGraphicsGLRegisterImage(hipGraphicsResource** resource, GLuint image, GLenum target, unsigned int flags) { return hipCUDAErrorTohipError(cudaGraphicsGLRegisterImage(resource, image, target, flags)); } #endifclr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_hip_math_constants.h000066400000000000000000000126421450307266000271030ustar00rootroot00000000000000/* Copyright (c) 2015 - 2023 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef NVIDIA_HIP_MATH_CONSTANTS_H #define NVIDIA_HIP_MATH_CONSTANTS_H #include // single precision constants #define HIP_INF_F CUDART_INF_F #define HIP_NAN_F CUDART_NAN_F #define HIP_MIN_DENORM_F CUDART_MIN_DENORM_F #define HIP_MAX_NORMAL_F CUDART_MAX_NORMAL_F #define HIP_NEG_ZERO_F CUDART_NEG_ZERO_F #define HIP_ZERO_F CUDART_ZERO_F #define HIP_ONE_F CUDART_ONE_F #define HIP_SQRT_HALF_F CUDART_SQRT_HALF_F #define HIP_SQRT_HALF_HI_F CUDART_SQRT_HALF_HI_F #define HIP_SQRT_HALF_LO_F CUDART_SQRT_HALF_LO_F #define HIP_SQRT_TWO_F CUDART_SQRT_TWO_F #define HIP_THIRD_F CUDART_THIRD_F #define HIP_PIO4_F CUDART_PIO4_F #define HIP_PIO2_F CUDART_PIO2_F #define HIP_3PIO4_F CUDART_3PIO4_F #define HIP_2_OVER_PI_F CUDART_2_OVER_PI_F #define HIP_SQRT_2_OVER_PI_F CUDART_SQRT_2_OVER_PI_F #define HIP_PI_F CUDART_PI_F #define HIP_L2E_F CUDART_L2E_F #define HIP_L2T_F CUDART_L2T_F #define HIP_LG2_F CUDART_LG2_F #define HIP_LGE_F CUDART_LGE_F #define HIP_LN2_F CUDART_LN2_F #define HIP_LNT_F CUDART_LNT_F #define HIP_LNPI_F CUDART_LNPI_F #define HIP_TWO_TO_M126_F CUDART_TWO_TO_M126_F #define HIP_TWO_TO_126_F CUDART_TWO_TO_126_F #define HIP_NORM_HUGE_F CUDART_NORM_HUGE_F #define HIP_TWO_TO_23_F CUDART_TWO_TO_23_F #define HIP_TWO_TO_24_F CUDART_TWO_TO_24_F #define HIP_TWO_TO_31_F CUDART_TWO_TO_31_F #define HIP_TWO_TO_32_F CUDART_TWO_TO_32_F #define HIP_REMQUO_BITS_F CUDART_REMQUO_BITS_F #define HIP_REMQUO_MASK_F CUDART_REMQUO_MASK_F #define HIP_TRIG_PLOSS_F CUDART_TRIG_PLOSS_F // double precision constants #define HIP_INF CUDART_INF #define HIP_NAN CUDART_NAN #define HIP_NEG_ZERO CUDART_NEG_ZERO #define HIP_MIN_DENORM CUDART_MIN_DENORM #define HIP_ZERO CUDART_ZERO #define HIP_ONE CUDART_ONE #define HIP_SQRT_TWO CUDART_SQRT_TWO #define HIP_SQRT_HALF CUDART_SQRT_HALF #define HIP_SQRT_HALF_HI CUDART_SQRT_HALF_HI #define HIP_SQRT_HALF_LO CUDART_SQRT_HALF_LO #define HIP_THIRD CUDART_THIRD #define HIP_TWOTHIRD CUDART_TWOTHIRD #define HIP_PIO4 CUDART_PIO4 #define HIP_PIO4_HI CUDART_PIO4_HI #define HIP_PIO4_LO CUDART_PIO4_LO #define HIP_PIO2 CUDART_PIO2 #define HIP_PIO2_HI CUDART_PIO2_HI #define HIP_PIO2_LO CUDART_PIO2_LO #define HIP_3PIO4 CUDART_3PIO4 #define HIP_2_OVER_PI CUDART_2_OVER_PI #define HIP_PI CUDART_PI #define HIP_PI_HI CUDART_PI_HI #define HIP_PI_LO CUDART_PI_LO #define HIP_SQRT_2PI CUDART_SQRT_2PI #define HIP_SQRT_2PI_HI CUDART_SQRT_2PI_HI #define HIP_SQRT_2PI_LO CUDART_SQRT_2PI_LO #define HIP_SQRT_PIO2 CUDART_SQRT_PIO2 #define HIP_SQRT_PIO2_HI CUDART_SQRT_PIO2_HI #define HIP_SQRT_PIO2_LO CUDART_SQRT_PIO2_LO #define HIP_SQRT_2OPI CUDART_SQRT_2OPI #define HIP_L2E CUDART_L2E #define HIP_L2E_HI CUDART_L2E_HI #define HIP_L2E_LO CUDART_L2E_LO #define HIP_L2T CUDART_L2T #define HIP_LG2 CUDART_LG2 #define HIP_LG2_HI CUDART_LG2_HI #define HIP_LG2_LO CUDART_LG2_LO #define HIP_LGE CUDART_LGE #define HIP_LGE_HI CUDART_LGE_HI #define HIP_LGE_LO CUDART_LGE_LO #define HIP_LN2 CUDART_LN2 #define HIP_LN2_HI CUDART_LN2_HI #define HIP_LN2_LO CUDART_LN2_LO #define HIP_LNT CUDART_LNT #define HIP_LNT_HI CUDART_LNT_HI #define HIP_LNT_LO CUDART_LNT_LO #define HIP_LNPI CUDART_LNPI #define HIP_LN2_X_1024 CUDART_LN2_X_1024 #define HIP_LN2_X_1025 CUDART_LN2_X_1025 #define HIP_LN2_X_1075 CUDART_LN2_X_1075 #define HIP_LG2_X_1024 CUDART_LG2_X_1024 #define HIP_LG2_X_1075 
CUDART_LG2_X_1075 #define HIP_TWO_TO_23 CUDART_TWO_TO_23 #define HIP_TWO_TO_52 CUDART_TWO_TO_52 #define HIP_TWO_TO_53 CUDART_TWO_TO_53 #define HIP_TWO_TO_54 CUDART_TWO_TO_54 #define HIP_TWO_TO_M54 CUDART_TWO_TO_M54 #define HIP_TWO_TO_M1022 CUDART_TWO_TO_M1022 #define HIP_TRIG_PLOSS CUDART_TRIG_PLOSS #define HIP_DBL2INT_CVT CUDART_DBL2INT_CVT #endif clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_hip_runtime.h000066400000000000000000000107411450307266000255370ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_RUNTIME_H #define HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_RUNTIME_H #include #include #define HIP_KERNEL_NAME(...) __VA_ARGS__ typedef int hipLaunchParm; #define hipLaunchKernelGGLInternal(kernelName, numBlocks, numThreads, memPerBlock, streamId, ...) \ do { \ kernelName<<>>(__VA_ARGS__); \ } while (0) #define hipLaunchKernelGGL(kernelName, ...) 
hipLaunchKernelGGLInternal((kernelName), __VA_ARGS__) #define hipReadModeElementType cudaReadModeElementType #ifdef __CUDA_ARCH__ // 32-bit Atomics: #define __HIP_ARCH_HAS_GLOBAL_INT32_ATOMICS__ (__CUDA_ARCH__ >= 110) #define __HIP_ARCH_HAS_GLOBAL_FLOAT_ATOMIC_EXCH__ (__CUDA_ARCH__ >= 110) #define __HIP_ARCH_HAS_SHARED_INT32_ATOMICS__ (__CUDA_ARCH__ >= 120) #define __HIP_ARCH_HAS_SHARED_FLOAT_ATOMIC_EXCH__ (__CUDA_ARCH__ >= 120) #define __HIP_ARCH_HAS_FLOAT_ATOMIC_ADD__ (__CUDA_ARCH__ >= 200) // 64-bit Atomics: #define __HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__ (__CUDA_ARCH__ >= 200) #define __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__ (__CUDA_ARCH__ >= 120) // Doubles #define __HIP_ARCH_HAS_DOUBLES__ (__CUDA_ARCH__ >= 120) // warp cross-lane operations: #define __HIP_ARCH_HAS_WARP_VOTE__ (__CUDA_ARCH__ >= 120) #define __HIP_ARCH_HAS_WARP_BALLOT__ (__CUDA_ARCH__ >= 200) #define __HIP_ARCH_HAS_WARP_SHUFFLE__ (__CUDA_ARCH__ >= 300) #define __HIP_ARCH_HAS_WARP_FUNNEL_SHIFT__ (__CUDA_ARCH__ >= 350) // sync #define __HIP_ARCH_HAS_THREAD_FENCE_SYSTEM__ (__CUDA_ARCH__ >= 200) #define __HIP_ARCH_HAS_SYNC_THREAD_EXT__ (__CUDA_ARCH__ >= 200) // misc #define __HIP_ARCH_HAS_SURFACE_FUNCS__ (__CUDA_ARCH__ >= 200) #define __HIP_ARCH_HAS_3DGRID__ (__CUDA_ARCH__ >= 200) #define __HIP_ARCH_HAS_DYNAMIC_PARALLEL__ (__CUDA_ARCH__ >= 350) #endif #ifdef __CUDACC__ #include "nvidia_hip_atomics.h" #include "nvidia_hip_unsafe_atomics.h" #define hipThreadIdx_x threadIdx.x #define hipThreadIdx_y threadIdx.y #define hipThreadIdx_z threadIdx.z #define hipBlockIdx_x blockIdx.x #define hipBlockIdx_y blockIdx.y #define hipBlockIdx_z blockIdx.z #define hipBlockDim_x blockDim.x #define hipBlockDim_y blockDim.y #define hipBlockDim_z blockDim.z #define hipGridDim_x gridDim.x #define hipGridDim_y gridDim.y #define hipGridDim_z gridDim.z #define HIP_SYMBOL(X) &X /** * Map HIP_DYNAMIC_SHARED to "extern __shared__" for compatibility with old HIP applications * To be removed in a future release. */ #define HIP_DYNAMIC_SHARED(type, var) extern __shared__ type var[]; #define HIP_DYNAMIC_SHARED_ATTRIBUTE #ifdef __HIP_DEVICE_COMPILE__ #define abort_() \ { asm("trap;"); } #undef assert #define assert(COND) \ { \ if (!COND) { \ abort_(); \ } \ } #endif #define __clock() clock() #define __clock64() clock64() #endif #endif clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_hip_runtime_api.h000066400000000000000000005251071450307266000263770ustar00rootroot00000000000000/* Copyright (c) 2015 - 2022 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_RUNTIME_API_H #define HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_RUNTIME_API_H #include #include #include #include #include #define CUDA_9000 9000 #define CUDA_10010 10010 #define CUDA_10020 10020 #define CUDA_11010 11010 #define CUDA_11020 11020 #define CUDA_11030 11030 #define CUDA_11040 11040 #define CUDA_11060 11060 #define CUDA_12000 12000 #ifdef __cplusplus extern "C" { #endif #ifdef __cplusplus #define __dparm(x) = x #else #define __dparm(x) #endif // Add Deprecated Support for CUDA Mapped HIP APIs #if defined(__DOXYGEN_ONLY__) || defined(HIP_ENABLE_DEPRECATED) #define __HIP_DEPRECATED #elif defined(_MSC_VER) #define __HIP_DEPRECATED __declspec(deprecated) #elif defined(__GNUC__) #define __HIP_DEPRECATED __attribute__((deprecated)) #else #define __HIP_DEPRECATED #endif // Add Deprecated Support for CUDA Mapped HIP APIs #if defined(__DOXYGEN_ONLY__) || defined(HIP_ENABLE_DEPRECATED) #define __HIP_DEPRECATED_MSG(msg) #elif defined(_MSC_VER) #define __HIP_DEPRECATED_MSG(msg) __declspec(deprecated(msg)) #elif defined(__GNUC__) #define __HIP_DEPRECATED_MSG(msg) __attribute__((deprecated(msg))) #else #define __HIP_DEPRECATED_MSG(msg) #endif // TODO -move to include/hip_runtime_api.h as a common implementation. /** * Memory copy types * */ typedef enum cudaMemcpyKind hipMemcpyKind; #define hipMemcpyHostToHost cudaMemcpyHostToHost #define hipMemcpyHostToDevice cudaMemcpyHostToDevice #define hipMemcpyDeviceToHost cudaMemcpyDeviceToHost #define hipMemcpyDeviceToDevice cudaMemcpyDeviceToDevice #define hipMemcpyDefault cudaMemcpyDefault typedef enum hipMemoryAdvise { hipMemAdviseSetReadMostly, hipMemAdviseUnsetReadMostly, hipMemAdviseSetPreferredLocation, hipMemAdviseUnsetPreferredLocation, hipMemAdviseSetAccessedBy, hipMemAdviseUnsetAccessedBy } hipMemoryAdvise; // hipDataType #define hipDataType cudaDataType #define HIP_R_16F CUDA_R_16F #define HIP_C_16F CUDA_C_16F #define HIP_R_16BF CUDA_R_16BF #define HIP_C_16BF CUDA_C_16BF #define HIP_R_32F CUDA_R_32F #define HIP_C_32F CUDA_C_32F #define HIP_R_64F CUDA_R_64F #define HIP_C_64F CUDA_C_64F #define HIP_R_4I CUDA_R_4I #define HIP_C_4I CUDA_C_4I #define HIP_R_4U CUDA_R_4U #define HIP_C_4U CUDA_C_4U #define HIP_R_8I CUDA_R_8I #define HIP_C_8I CUDA_C_8I #define HIP_R_8U CUDA_R_8U #define HIP_C_8U CUDA_C_8U #define HIP_R_16I CUDA_R_16I #define HIP_C_16I CUDA_C_16I #define HIP_R_16U CUDA_R_16U #define HIP_C_16U CUDA_C_16U #define HIP_R_32I CUDA_R_32I #define HIP_C_32I CUDA_C_32I #define HIP_R_32U CUDA_R_32U #define HIP_C_32U CUDA_C_32U #define HIP_R_64I CUDA_R_64I #define HIP_C_64I CUDA_C_64I #define HIP_R_64U CUDA_R_64U #define HIP_C_64U CUDA_C_64U // hip stream operation masks #define STREAM_OPS_WAIT_MASK_32 0xFFFFFFFF #define STREAM_OPS_WAIT_MASK_64 0xFFFFFFFFFFFFFFFF // stream operation flags #define hipStreamWaitValueGte CU_STREAM_WAIT_VALUE_GEQ #define hipStreamWaitValueEq CU_STREAM_WAIT_VALUE_EQ #define hipStreamWaitValueAnd CU_STREAM_WAIT_VALUE_AND #define hipStreamWaitValueNor CU_STREAM_WAIT_VALUE_NOR // hipLibraryPropertyType #define hipLibraryPropertyType libraryPropertyType #define HIP_LIBRARY_MAJOR_VERSION MAJOR_VERSION #define HIP_LIBRARY_MINOR_VERSION MINOR_VERSION #define HIP_LIBRARY_PATCH_LEVEL PATCH_LEVEL #define HIP_ARRAY_DESCRIPTOR 
CUDA_ARRAY_DESCRIPTOR #define HIP_ARRAY3D_DESCRIPTOR CUDA_ARRAY3D_DESCRIPTOR //hipArray_Format #define HIP_AD_FORMAT_UNSIGNED_INT8 CU_AD_FORMAT_UNSIGNED_INT8 #define HIP_AD_FORMAT_UNSIGNED_INT16 CU_AD_FORMAT_UNSIGNED_INT16 #define HIP_AD_FORMAT_UNSIGNED_INT32 CU_AD_FORMAT_UNSIGNED_INT32 #define HIP_AD_FORMAT_SIGNED_INT8 CU_AD_FORMAT_SIGNED_INT8 #define HIP_AD_FORMAT_SIGNED_INT16 CU_AD_FORMAT_SIGNED_INT16 #define HIP_AD_FORMAT_SIGNED_INT32 CU_AD_FORMAT_SIGNED_INT32 #define HIP_AD_FORMAT_HALF CU_AD_FORMAT_HALF #define HIP_AD_FORMAT_FLOAT CU_AD_FORMAT_FLOAT // hipArray_Format #define hipArray_Format CUarray_format inline static CUarray_format hipArray_FormatToCUarray_format( hipArray_Format format) { switch (format) { case HIP_AD_FORMAT_UNSIGNED_INT8: return CU_AD_FORMAT_UNSIGNED_INT8; case HIP_AD_FORMAT_UNSIGNED_INT16: return CU_AD_FORMAT_UNSIGNED_INT16; case HIP_AD_FORMAT_UNSIGNED_INT32: return CU_AD_FORMAT_UNSIGNED_INT32; case HIP_AD_FORMAT_SIGNED_INT8: return CU_AD_FORMAT_SIGNED_INT8; case HIP_AD_FORMAT_SIGNED_INT16: return CU_AD_FORMAT_SIGNED_INT16; case HIP_AD_FORMAT_SIGNED_INT32: return CU_AD_FORMAT_SIGNED_INT32; case HIP_AD_FORMAT_HALF: return CU_AD_FORMAT_HALF; case HIP_AD_FORMAT_FLOAT: return CU_AD_FORMAT_FLOAT; default: return CU_AD_FORMAT_UNSIGNED_INT8; } } #define HIP_TR_ADDRESS_MODE_WRAP CU_TR_ADDRESS_MODE_WRAP #define HIP_TR_ADDRESS_MODE_CLAMP CU_TR_ADDRESS_MODE_CLAMP #define HIP_TR_ADDRESS_MODE_MIRROR CU_TR_ADDRESS_MODE_MIRROR #define HIP_TR_ADDRESS_MODE_BORDER CU_TR_ADDRESS_MODE_BORDER // hipAddress_mode #define hipAddress_mode CUaddress_mode inline static CUaddress_mode hipAddress_modeToCUaddress_mode( hipAddress_mode mode) { switch (mode) { case HIP_TR_ADDRESS_MODE_WRAP: return CU_TR_ADDRESS_MODE_WRAP; case HIP_TR_ADDRESS_MODE_CLAMP: return CU_TR_ADDRESS_MODE_CLAMP; case HIP_TR_ADDRESS_MODE_MIRROR: return CU_TR_ADDRESS_MODE_MIRROR; case HIP_TR_ADDRESS_MODE_BORDER: return CU_TR_ADDRESS_MODE_BORDER; default: return CU_TR_ADDRESS_MODE_WRAP; } } #define HIP_TR_FILTER_MODE_POINT CU_TR_FILTER_MODE_POINT #define HIP_TR_FILTER_MODE_LINEAR CU_TR_FILTER_MODE_LINEAR // hipFilter_mode #define hipFilter_mode CUfilter_mode inline static CUfilter_mode hipFilter_mode_enumToCUfilter_mode( hipFilter_mode mode) { switch (mode) { case HIP_TR_FILTER_MODE_POINT: return CU_TR_FILTER_MODE_POINT; case HIP_TR_FILTER_MODE_LINEAR: return CU_TR_FILTER_MODE_LINEAR; default: return CU_TR_FILTER_MODE_POINT; } } //hipResourcetype #define HIP_RESOURCE_TYPE_ARRAY CU_RESOURCE_TYPE_ARRAY #define HIP_RESOURCE_TYPE_MIPMAPPED_ARRAY CU_RESOURCE_TYPE_MIPMAPPED_ARRAY #define HIP_RESOURCE_TYPE_LINEAR CU_RESOURCE_TYPE_LINEAR #define HIP_RESOURCE_TYPE_PITCH2D CU_RESOURCE_TYPE_PITCH2D // hipResourcetype #define hipResourcetype CUresourcetype inline static CUresourcetype hipResourcetype_enumToCUresourcetype( hipResourcetype resType) { switch (resType) { case HIP_RESOURCE_TYPE_ARRAY: return CU_RESOURCE_TYPE_ARRAY; case HIP_RESOURCE_TYPE_MIPMAPPED_ARRAY: return CU_RESOURCE_TYPE_MIPMAPPED_ARRAY; case HIP_RESOURCE_TYPE_LINEAR: return CU_RESOURCE_TYPE_LINEAR; case HIP_RESOURCE_TYPE_PITCH2D: return CU_RESOURCE_TYPE_PITCH2D; default: return CU_RESOURCE_TYPE_ARRAY; } } // hipStreamPerThread #define hipStreamPerThread ((cudaStream_t)2) #define hipTexRef CUtexref #define hiparray CUarray typedef CUmipmappedArray hipmipmappedArray; typedef cudaMipmappedArray_t hipMipmappedArray_t; #define HIP_TRSA_OVERRIDE_FORMAT CU_TRSA_OVERRIDE_FORMAT #define HIP_TRSF_READ_AS_INTEGER CU_TRSF_READ_AS_INTEGER #define HIP_TRSF_NORMALIZED_COORDINATES 
CU_TRSF_NORMALIZED_COORDINATES #define HIP_TRSF_SRGB CU_TRSF_SRGB // hipTextureAddressMode typedef enum cudaTextureAddressMode hipTextureAddressMode; #define hipAddressModeWrap cudaAddressModeWrap #define hipAddressModeClamp cudaAddressModeClamp #define hipAddressModeMirror cudaAddressModeMirror #define hipAddressModeBorder cudaAddressModeBorder // hipTextureFilterMode typedef enum cudaTextureFilterMode hipTextureFilterMode; #define hipFilterModePoint cudaFilterModePoint #define hipFilterModeLinear cudaFilterModeLinear // hipTextureReadMode typedef enum cudaTextureReadMode hipTextureReadMode; #define hipReadModeElementType cudaReadModeElementType #define hipReadModeNormalizedFloat cudaReadModeNormalizedFloat // hipChannelFormatKind typedef enum cudaChannelFormatKind hipChannelFormatKind; #define hipChannelFormatKindSigned cudaChannelFormatKindSigned #define hipChannelFormatKindUnsigned cudaChannelFormatKindUnsigned #define hipChannelFormatKindFloat cudaChannelFormatKindFloat #define hipChannelFormatKindNone cudaChannelFormatKindNone // hipMemRangeAttribute typedef enum cudaMemRangeAttribute hipMemRangeAttribute; #define hipMemRangeAttributeReadMostly cudaMemRangeAttributeReadMostly #define hipMemRangeAttributePreferredLocation cudaMemRangeAttributePreferredLocation #define hipMemRangeAttributeAccessedBy cudaMemRangeAttributeAccessedBy #define hipMemRangeAttributeLastPrefetchLocation cudaMemRangeAttributeLastPrefetchLocation #define hipSurfaceBoundaryMode cudaSurfaceBoundaryMode #define hipBoundaryModeZero cudaBoundaryModeZero #define hipBoundaryModeTrap cudaBoundaryModeTrap #define hipBoundaryModeClamp cudaBoundaryModeClamp // hipFuncCache #define hipFuncCachePreferNone cudaFuncCachePreferNone #define hipFuncCachePreferShared cudaFuncCachePreferShared #define hipFuncCachePreferL1 cudaFuncCachePreferL1 #define hipFuncCachePreferEqual cudaFuncCachePreferEqual // hipResourceType #define hipResourceType cudaResourceType #define hipResourceTypeArray cudaResourceTypeArray #define hipResourceTypeMipmappedArray cudaResourceTypeMipmappedArray #define hipResourceTypeLinear cudaResourceTypeLinear #define hipResourceTypePitch2D cudaResourceTypePitch2D // // hipErrorNoDevice. 
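// Illustrative sketch (an editorial example, not upstream code): because the
// macros above alias HIP resource types one-to-one onto CUDA's, a descriptor
// populated with HIP names is directly consumable by the underlying CUDA
// runtime. devPtr and numElems below are hypothetical.
//
//   hipResourceDesc resDesc;
//   memset(&resDesc, 0, sizeof(resDesc));
//   resDesc.resType = hipResourceTypeLinear;   // expands to cudaResourceTypeLinear
//   resDesc.res.linear.devPtr = devPtr;
//   resDesc.res.linear.desc = hipCreateChannelDesc<float>();
//   resDesc.res.linear.sizeInBytes = numElems * sizeof(float);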
// hipResourceViewFormat typedef enum cudaResourceViewFormat hipResourceViewFormat; #define hipResViewFormatNone cudaResViewFormatNone #define hipResViewFormatUnsignedChar1 cudaResViewFormatUnsignedChar1 #define hipResViewFormatUnsignedChar2 cudaResViewFormatUnsignedChar2 #define hipResViewFormatUnsignedChar4 cudaResViewFormatUnsignedChar4 #define hipResViewFormatSignedChar1 cudaResViewFormatSignedChar1 #define hipResViewFormatSignedChar2 cudaResViewFormatSignedChar2 #define hipResViewFormatSignedChar4 cudaResViewFormatSignedChar4 #define hipResViewFormatUnsignedShort1 cudaResViewFormatUnsignedShort1 #define hipResViewFormatUnsignedShort2 cudaResViewFormatUnsignedShort2 #define hipResViewFormatUnsignedShort4 cudaResViewFormatUnsignedShort4 #define hipResViewFormatSignedShort1 cudaResViewFormatSignedShort1 #define hipResViewFormatSignedShort2 cudaResViewFormatSignedShort2 #define hipResViewFormatSignedShort4 cudaResViewFormatSignedShort4 #define hipResViewFormatUnsignedInt1 cudaResViewFormatUnsignedInt1 #define hipResViewFormatUnsignedInt2 cudaResViewFormatUnsignedInt2 #define hipResViewFormatUnsignedInt4 cudaResViewFormatUnsignedInt4 #define hipResViewFormatSignedInt1 cudaResViewFormatSignedInt1 #define hipResViewFormatSignedInt2 cudaResViewFormatSignedInt2 #define hipResViewFormatSignedInt4 cudaResViewFormatSignedInt4 #define hipResViewFormatHalf1 cudaResViewFormatHalf1 #define hipResViewFormatHalf2 cudaResViewFormatHalf2 #define hipResViewFormatHalf4 cudaResViewFormatHalf4 #define hipResViewFormatFloat1 cudaResViewFormatFloat1 #define hipResViewFormatFloat2 cudaResViewFormatFloat2 #define hipResViewFormatFloat4 cudaResViewFormatFloat4 #define hipResViewFormatUnsignedBlockCompressed1 cudaResViewFormatUnsignedBlockCompressed1 #define hipResViewFormatUnsignedBlockCompressed2 cudaResViewFormatUnsignedBlockCompressed2 #define hipResViewFormatUnsignedBlockCompressed3 cudaResViewFormatUnsignedBlockCompressed3 #define hipResViewFormatUnsignedBlockCompressed4 cudaResViewFormatUnsignedBlockCompressed4 #define hipResViewFormatSignedBlockCompressed4 cudaResViewFormatSignedBlockCompressed4 #define hipResViewFormatUnsignedBlockCompressed5 cudaResViewFormatUnsignedBlockCompressed5 #define hipResViewFormatSignedBlockCompressed5 cudaResViewFormatSignedBlockCompressed5 #define hipResViewFormatUnsignedBlockCompressed6H cudaResViewFormatUnsignedBlockCompressed6H #define hipResViewFormatSignedBlockCompressed6H cudaResViewFormatSignedBlockCompressed6H #define hipResViewFormatUnsignedBlockCompressed7 cudaResViewFormatUnsignedBlockCompressed7 //! 
Flags that can be used with hipEventCreateWithFlags: #define hipEventDefault cudaEventDefault #define hipEventBlockingSync cudaEventBlockingSync #define hipEventDisableTiming cudaEventDisableTiming #define hipEventInterprocess cudaEventInterprocess #define hipEventReleaseToDevice 0 /* no-op on CUDA platform */ #define hipEventReleaseToSystem 0 /* no-op on CUDA platform */ #define hipHostMallocDefault cudaHostAllocDefault #define hipHostMallocPortable cudaHostAllocPortable #define hipHostMallocMapped cudaHostAllocMapped #define hipHostMallocWriteCombined cudaHostAllocWriteCombined #define hipHostMallocCoherent 0x0 #define hipHostMallocNonCoherent 0x0 #define hipMemAttachGlobal cudaMemAttachGlobal #define hipMemAttachHost cudaMemAttachHost #define hipMemAttachSingle cudaMemAttachSingle #define hipHostRegisterDefault cudaHostRegisterDefault #define hipHostRegisterPortable cudaHostRegisterPortable #define hipHostRegisterMapped cudaHostRegisterMapped #define hipHostRegisterIoMemory cudaHostRegisterIoMemory #define hipHostRegisterReadOnly cudaHostRegisterReadOnly #define HIP_LAUNCH_PARAM_BUFFER_POINTER CU_LAUNCH_PARAM_BUFFER_POINTER #define HIP_LAUNCH_PARAM_BUFFER_SIZE CU_LAUNCH_PARAM_BUFFER_SIZE #define HIP_LAUNCH_PARAM_END CU_LAUNCH_PARAM_END #define hipLimitPrintfFifoSize cudaLimitPrintfFifoSize #define hipLimitMallocHeapSize cudaLimitMallocHeapSize #define hipLimitStackSize cudaLimitStackSize #define hipIpcMemLazyEnablePeerAccess cudaIpcMemLazyEnablePeerAccess #define hipOccupancyDefault cudaOccupancyDefault #define hipOccupancyDisableCachingOverride cudaOccupancyDisableCachingOverride #define hipCooperativeLaunchMultiDeviceNoPreSync \ cudaCooperativeLaunchMultiDeviceNoPreSync #define hipCooperativeLaunchMultiDeviceNoPostSync \ cudaCooperativeLaunchMultiDeviceNoPostSync // enum CUjit_option redefines #define HIPRTC_JIT_MAX_REGISTERS CU_JIT_MAX_REGISTERS #define HIPRTC_JIT_THREADS_PER_BLOCK CU_JIT_THREADS_PER_BLOCK #define HIPRTC_JIT_WALL_TIME CU_JIT_WALL_TIME #define HIPRTC_JIT_INFO_LOG_BUFFER CU_JIT_INFO_LOG_BUFFER #define HIPRTC_JIT_INFO_LOG_BUFFER_SIZE_BYTES CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES #define HIPRTC_JIT_ERROR_LOG_BUFFER CU_JIT_ERROR_LOG_BUFFER #define HIPRTC_JIT_ERROR_LOG_BUFFER_SIZE_BYTES CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES #define HIPRTC_JIT_OPTIMIZATION_LEVEL CU_JIT_OPTIMIZATION_LEVEL #define HIPRTC_JIT_TARGET_FROM_HIPCONTEXT CU_JIT_TARGET_FROM_CUCONTEXT #define HIPRTC_JIT_TARGET CU_JIT_TARGET #define HIPRTC_JIT_FALLBACK_STRATEGY CU_JIT_FALLBACK_STRATEGY #define HIPRTC_JIT_GENERATE_DEBUG_INFO CU_JIT_GENERATE_DEBUG_INFO #define HIPRTC_JIT_LOG_VERBOSE CU_JIT_LOG_VERBOSE #define HIPRTC_JIT_GENERATE_LINE_INFO CU_JIT_GENERATE_LINE_INFO #define HIPRTC_JIT_CACHE_MODE CU_JIT_CACHE_MODE #define HIPRTC_JIT_NEW_SM3X_OPT CU_JIT_NEW_SM3X_OPT #define HIPRTC_JIT_FAST_COMPILE CU_JIT_FAST_COMPILE #define HIPRTC_JIT_NUM_OPTIONS CU_JIT_NUM_OPTIONS typedef cudaEvent_t hipEvent_t; typedef cudaStream_t hipStream_t; typedef cudaIpcEventHandle_t hipIpcEventHandle_t; typedef cudaIpcMemHandle_t hipIpcMemHandle_t; typedef enum cudaLimit hipLimit_t; typedef enum cudaFuncAttribute hipFuncAttribute; typedef enum cudaFuncCache hipFuncCache_t; typedef CUcontext hipCtx_t; typedef enum cudaSharedMemConfig hipSharedMemConfig; typedef CUfunc_cache hipFuncCache; typedef CUjit_option hipJitOption; typedef CUdevice hipDevice_t; typedef enum cudaDeviceP2PAttr hipDeviceP2PAttr; #define hipDevP2PAttrPerformanceRank cudaDevP2PAttrPerformanceRank #define hipDevP2PAttrAccessSupported cudaDevP2PAttrAccessSupported #define 
hipDevP2PAttrNativeAtomicSupported cudaDevP2PAttrNativeAtomicSupported #define hipDevP2PAttrHipArrayAccessSupported cudaDevP2PAttrCudaArrayAccessSupported #define hipFuncAttributeMaxDynamicSharedMemorySize cudaFuncAttributeMaxDynamicSharedMemorySize #define hipFuncAttributePreferredSharedMemoryCarveout cudaFuncAttributePreferredSharedMemoryCarveout typedef CUmodule hipModule_t; typedef CUfunction hipFunction_t; typedef CUdeviceptr hipDeviceptr_t; typedef struct cudaArray hipArray; typedef struct cudaArray* hipArray_t; typedef struct cudaArray* hipArray_const_t; typedef struct cudaFuncAttributes hipFuncAttributes; typedef struct cudaLaunchParams hipLaunchParams; typedef CUDA_LAUNCH_PARAMS hipFunctionLaunchParams; #define hipFunction_attribute CUfunction_attribute #define hipPointer_attribute CUpointer_attribute #define hip_Memcpy2D CUDA_MEMCPY2D #define HIP_MEMCPY3D CUDA_MEMCPY3D #define hipMemcpy3DParms cudaMemcpy3DParms #define hipArrayDefault cudaArrayDefault #define hipArrayLayered cudaArrayLayered #define hipArraySurfaceLoadStore cudaArraySurfaceLoadStore #define hipArrayCubemap cudaArrayCubemap #define hipArrayTextureGather cudaArrayTextureGather typedef cudaTextureObject_t hipTextureObject_t; typedef cudaSurfaceObject_t hipSurfaceObject_t; #define hipTextureType1D cudaTextureType1D #define hipTextureType1DLayered cudaTextureType1DLayered #define hipTextureType2D cudaTextureType2D #define hipTextureType2DLayered cudaTextureType2DLayered #define hipTextureType3D cudaTextureType3D #define hipDeviceScheduleAuto cudaDeviceScheduleAuto #define hipDeviceScheduleSpin cudaDeviceScheduleSpin #define hipDeviceScheduleYield cudaDeviceScheduleYield #define hipDeviceScheduleBlockingSync cudaDeviceScheduleBlockingSync #define hipDeviceScheduleMask cudaDeviceScheduleMask #define hipDeviceMapHost cudaDeviceMapHost #define hipDeviceLmemResizeToMax cudaDeviceLmemResizeToMax #define hipCpuDeviceId cudaCpuDeviceId #define hipInvalidDeviceId cudaInvalidDeviceId typedef struct cudaExtent hipExtent; typedef struct cudaPitchedPtr hipPitchedPtr; typedef struct cudaPos hipPos; #define make_hipExtent make_cudaExtent #define make_hipPos make_cudaPos #define make_hipPitchedPtr make_cudaPitchedPtr // Flags that can be used with hipStreamCreateWithFlags #define hipStreamDefault cudaStreamDefault #define hipStreamNonBlocking cudaStreamNonBlocking typedef cudaMemPool_t hipMemPool_t; typedef enum cudaMemPoolAttr hipMemPoolAttr; #define hipMemPoolReuseFollowEventDependencies cudaMemPoolReuseFollowEventDependencies #define hipMemPoolReuseAllowOpportunistic cudaMemPoolReuseAllowOpportunistic #define hipMemPoolReuseAllowInternalDependencies cudaMemPoolReuseAllowInternalDependencies #define hipMemPoolAttrReleaseThreshold cudaMemPoolAttrReleaseThreshold #define hipMemPoolAttrReservedMemCurrent cudaMemPoolAttrReservedMemCurrent #define hipMemPoolAttrReservedMemHigh cudaMemPoolAttrReservedMemHigh #define hipMemPoolAttrUsedMemCurrent cudaMemPoolAttrUsedMemCurrent #define hipMemPoolAttrUsedMemHigh cudaMemPoolAttrUsedMemHigh typedef struct cudaMemLocation hipMemLocation; typedef struct cudaMemPoolProps hipMemPoolProps; typedef struct cudaMemAccessDesc hipMemAccessDesc; typedef enum cudaMemAccessFlags hipMemAccessFlags; #define hipMemAccessFlagsProtNone cudaMemAccessFlagsProtNone #define hipMemAccessFlagsProtRead cudaMemAccessFlagsProtRead #define hipMemAccessFlagsProtReadWrite cudaMemAccessFlagsProtReadWrite typedef enum cudaMemAllocationHandleType hipMemAllocationHandleType; typedef struct cudaMemPoolPtrExportData 
typedef struct cudaChannelFormatDesc hipChannelFormatDesc;
typedef struct cudaResourceDesc hipResourceDesc;
typedef struct cudaTextureDesc hipTextureDesc;
typedef struct cudaResourceViewDesc hipResourceViewDesc;
typedef CUDA_RESOURCE_DESC HIP_RESOURCE_DESC;
typedef CUDA_TEXTURE_DESC HIP_TEXTURE_DESC;
typedef CUDA_RESOURCE_VIEW_DESC HIP_RESOURCE_VIEW_DESC;

// hipSharedMemConfig (shared memory bank size) mappings
#define hipSharedMemBankSizeDefault cudaSharedMemBankSizeDefault
#define hipSharedMemBankSizeFourByte cudaSharedMemBankSizeFourByte
#define hipSharedMemBankSizeEightByte cudaSharedMemBankSizeEightByte

// Function attributes
#define HIP_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK
#define HIP_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES
#define HIP_FUNC_ATTRIBUTE_CONST_SIZE_BYTES CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES
#define HIP_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES
#define HIP_FUNC_ATTRIBUTE_NUM_REGS CU_FUNC_ATTRIBUTE_NUM_REGS
#define HIP_FUNC_ATTRIBUTE_PTX_VERSION CU_FUNC_ATTRIBUTE_PTX_VERSION
#define HIP_FUNC_ATTRIBUTE_BINARY_VERSION CU_FUNC_ATTRIBUTE_BINARY_VERSION
#define HIP_FUNC_ATTRIBUTE_CACHE_MODE_CA CU_FUNC_ATTRIBUTE_CACHE_MODE_CA
#define HIP_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES
#define HIP_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT
#define HIP_FUNC_ATTRIBUTE_MAX CU_FUNC_ATTRIBUTE_MAX

// Pointer attributes
#define HIP_POINTER_ATTRIBUTE_CONTEXT CU_POINTER_ATTRIBUTE_CONTEXT
#define HIP_POINTER_ATTRIBUTE_MEMORY_TYPE CU_POINTER_ATTRIBUTE_MEMORY_TYPE
#define HIP_POINTER_ATTRIBUTE_DEVICE_POINTER CU_POINTER_ATTRIBUTE_DEVICE_POINTER
#define HIP_POINTER_ATTRIBUTE_HOST_POINTER CU_POINTER_ATTRIBUTE_HOST_POINTER
#define HIP_POINTER_ATTRIBUTE_P2P_TOKENS CU_POINTER_ATTRIBUTE_P2P_TOKENS
#define HIP_POINTER_ATTRIBUTE_SYNC_MEMOPS CU_POINTER_ATTRIBUTE_SYNC_MEMOPS
#define HIP_POINTER_ATTRIBUTE_BUFFER_ID CU_POINTER_ATTRIBUTE_BUFFER_ID
#define HIP_POINTER_ATTRIBUTE_IS_MANAGED CU_POINTER_ATTRIBUTE_IS_MANAGED
#define HIP_POINTER_ATTRIBUTE_DEVICE_ORDINAL CU_POINTER_ATTRIBUTE_DEVICE_ORDINAL
#define HIP_POINTER_ATTRIBUTE_IS_LEGACY_HIP_IPC_CAPABLE CU_POINTER_ATTRIBUTE_IS_LEGACY_CUDA_IPC_CAPABLE
#define HIP_POINTER_ATTRIBUTE_RANGE_START_ADDR CU_POINTER_ATTRIBUTE_RANGE_START_ADDR
#define HIP_POINTER_ATTRIBUTE_RANGE_SIZE CU_POINTER_ATTRIBUTE_RANGE_SIZE
#define HIP_POINTER_ATTRIBUTE_MAPPED CU_POINTER_ATTRIBUTE_MAPPED
#define HIP_POINTER_ATTRIBUTE_ALLOWED_HANDLE_TYPES CU_POINTER_ATTRIBUTE_ALLOWED_HANDLE_TYPES
#define HIP_POINTER_ATTRIBUTE_IS_GPU_DIRECT_RDMA_CAPABLE CU_POINTER_ATTRIBUTE_IS_GPU_DIRECT_RDMA_CAPABLE
#define HIP_POINTER_ATTRIBUTE_ACCESS_FLAGS CU_POINTER_ATTRIBUTE_ACCESS_FLAGS
#define HIP_POINTER_ATTRIBUTE_MEMPOOL_HANDLE CU_POINTER_ATTRIBUTE_MEMPOOL_HANDLE

typedef enum cudaGraphInstantiateFlags hipGraphInstantiateFlags;
#define hipGraphInstantiateFlagAutoFreeOnLaunch cudaGraphInstantiateFlagAutoFreeOnLaunch
#define hipGraphInstantiateFlagUpload cudaGraphInstantiateFlagUpload
#define hipGraphInstantiateFlagDeviceLaunch cudaGraphInstantiateFlagDeviceLaunch
#define hipGraphInstantiateFlagUseNodePriority cudaGraphInstantiateFlagUseNodePriority

#if CUDA_VERSION >= CUDA_9000
#define __shfl(...)      __shfl_sync(0xffffffff, __VA_ARGS__)
#define __shfl_up(...)   __shfl_up_sync(0xffffffff, __VA_ARGS__)
#define __shfl_down(...) __shfl_down_sync(0xffffffff, __VA_ARGS__)
#define __shfl_xor(...)  __shfl_xor_sync(0xffffffff, __VA_ARGS__)
#endif  // CUDA_VERSION >= CUDA_9000
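/*
 * Usage sketch (illustrative): on CUDA >= 9.0 the legacy warp shuffles above
 * expand to the *_sync forms with a full-warp mask, so
 *
 *   int v = __shfl_down(x, 1);   // becomes __shfl_down_sync(0xffffffff, x, 1)
 *
 * The fixed 0xffffffff mask assumes all 32 lanes of the warp are active at
 * the call site; divergent callers should use the explicit *_sync intrinsics.
 */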
inline static hipError_t hipCUDAErrorTohipError(cudaError_t cuError) {
    switch (cuError) {
        case cudaSuccess: return hipSuccess;
        case cudaErrorProfilerDisabled: return hipErrorProfilerDisabled;
        case cudaErrorProfilerNotInitialized: return hipErrorProfilerNotInitialized;
        case cudaErrorProfilerAlreadyStarted: return hipErrorProfilerAlreadyStarted;
        case cudaErrorProfilerAlreadyStopped: return hipErrorProfilerAlreadyStopped;
        case cudaErrorInsufficientDriver: return hipErrorInsufficientDriver;
        case cudaErrorUnsupportedLimit: return hipErrorUnsupportedLimit;
        case cudaErrorPeerAccessUnsupported: return hipErrorPeerAccessUnsupported;
        case cudaErrorInvalidGraphicsContext: return hipErrorInvalidGraphicsContext;
        case cudaErrorSharedObjectSymbolNotFound: return hipErrorSharedObjectSymbolNotFound;
        case cudaErrorSharedObjectInitFailed: return hipErrorSharedObjectInitFailed;
        case cudaErrorOperatingSystem: return hipErrorOperatingSystem;
        case cudaErrorIllegalState: return hipErrorIllegalState;
        case cudaErrorSetOnActiveProcess: return hipErrorSetOnActiveProcess;
        case cudaErrorIllegalAddress: return hipErrorIllegalAddress;
        case cudaErrorInvalidSymbol: return hipErrorInvalidSymbol;
        case cudaErrorMissingConfiguration: return hipErrorMissingConfiguration;
        case cudaErrorMemoryAllocation: return hipErrorOutOfMemory;
        case cudaErrorInitializationError: return hipErrorNotInitialized;
        case cudaErrorLaunchFailure: return hipErrorLaunchFailure;
        case cudaErrorCooperativeLaunchTooLarge: return hipErrorCooperativeLaunchTooLarge;
        case cudaErrorPriorLaunchFailure: return hipErrorPriorLaunchFailure;
        case cudaErrorLaunchOutOfResources: return hipErrorLaunchOutOfResources;
        case cudaErrorInvalidDeviceFunction: return hipErrorInvalidDeviceFunction;
        case cudaErrorInvalidConfiguration: return hipErrorInvalidConfiguration;
        case cudaErrorInvalidDevice: return hipErrorInvalidDevice;
        case cudaErrorInvalidValue: return hipErrorInvalidValue;
        case cudaErrorInvalidPitchValue: return hipErrorInvalidPitchValue;
        case cudaErrorInvalidDevicePointer: return hipErrorInvalidDevicePointer;
        case cudaErrorInvalidMemcpyDirection: return hipErrorInvalidMemcpyDirection;
        case cudaErrorInvalidResourceHandle: return hipErrorInvalidHandle;
        case cudaErrorNotReady: return hipErrorNotReady;
        case cudaErrorNoDevice: return hipErrorNoDevice;
        case cudaErrorPeerAccessAlreadyEnabled: return hipErrorPeerAccessAlreadyEnabled;
        case cudaErrorPeerAccessNotEnabled: return hipErrorPeerAccessNotEnabled;
        case cudaErrorContextIsDestroyed: return hipErrorContextIsDestroyed;
        case cudaErrorHostMemoryAlreadyRegistered: return hipErrorHostMemoryAlreadyRegistered;
        case cudaErrorHostMemoryNotRegistered: return hipErrorHostMemoryNotRegistered;
        case cudaErrorMapBufferObjectFailed: return hipErrorMapFailed;
        case cudaErrorAssert: return hipErrorAssert;
        case cudaErrorNotSupported: return hipErrorNotSupported;
        case cudaErrorCudartUnloading: return hipErrorDeinitialized;
        case cudaErrorInvalidKernelImage: return hipErrorInvalidImage;
        case cudaErrorUnmapBufferObjectFailed: return hipErrorUnmapFailed;
        case cudaErrorNoKernelImageForDevice: return hipErrorNoBinaryForGpu;
        case cudaErrorECCUncorrectable: return hipErrorECCNotCorrectable;
        case cudaErrorDeviceAlreadyInUse: return hipErrorContextAlreadyInUse;
        case cudaErrorInvalidPtx: return hipErrorInvalidKernelFile;
        case cudaErrorLaunchTimeout: return hipErrorLaunchTimeOut;
#if CUDA_VERSION >= CUDA_10010
        case cudaErrorInvalidSource: return hipErrorInvalidSource;
        case cudaErrorFileNotFound: return hipErrorFileNotFound;
        case cudaErrorSymbolNotFound: return hipErrorNotFound;
        case cudaErrorArrayIsMapped: return hipErrorArrayIsMapped;
        case cudaErrorNotMappedAsPointer: return hipErrorNotMappedAsPointer;
        case cudaErrorNotMappedAsArray: return hipErrorNotMappedAsArray;
        case cudaErrorNotMapped: return hipErrorNotMapped;
        case cudaErrorAlreadyAcquired: return hipErrorAlreadyAcquired;
        case cudaErrorAlreadyMapped: return hipErrorAlreadyMapped;
#endif
#if CUDA_VERSION >= CUDA_10020
        case cudaErrorDeviceUninitialized: return hipErrorInvalidContext;
#endif
        case cudaErrorStreamCaptureUnsupported: return hipErrorStreamCaptureUnsupported;
        case cudaErrorStreamCaptureInvalidated: return hipErrorStreamCaptureInvalidated;
        case cudaErrorStreamCaptureMerge: return hipErrorStreamCaptureMerge;
        case cudaErrorStreamCaptureUnmatched: return hipErrorStreamCaptureUnmatched;
        case cudaErrorStreamCaptureUnjoined: return hipErrorStreamCaptureUnjoined;
        case cudaErrorStreamCaptureIsolation: return hipErrorStreamCaptureIsolation;
        case cudaErrorStreamCaptureImplicit: return hipErrorStreamCaptureImplicit;
        case cudaErrorCapturedEvent: return hipErrorCapturedEvent;
        case cudaErrorStreamCaptureWrongThread: return hipErrorStreamCaptureWrongThread;
        case cudaErrorGraphExecUpdateFailure: return hipErrorGraphExecUpdateFailure;
        case cudaErrorUnknown:
        default:
            return hipErrorUnknown;  // Note - translated error.
    }
}

inline static hipError_t hipCUResultTohipError(CUresult cuError) {
    switch (cuError) {
        case CUDA_SUCCESS: return hipSuccess;
        case CUDA_ERROR_OUT_OF_MEMORY: return hipErrorOutOfMemory;
        case CUDA_ERROR_INVALID_VALUE: return hipErrorInvalidValue;
        case CUDA_ERROR_INVALID_DEVICE: return hipErrorInvalidDevice;
        case CUDA_ERROR_DEINITIALIZED: return hipErrorDeinitialized;
        case CUDA_ERROR_NO_DEVICE: return hipErrorNoDevice;
        case CUDA_ERROR_INVALID_CONTEXT: return hipErrorInvalidContext;
        case CUDA_ERROR_NOT_INITIALIZED: return hipErrorNotInitialized;
        case CUDA_ERROR_INVALID_HANDLE: return hipErrorInvalidHandle;
        case CUDA_ERROR_MAP_FAILED: return hipErrorMapFailed;
        case CUDA_ERROR_PROFILER_DISABLED: return hipErrorProfilerDisabled;
        case CUDA_ERROR_PROFILER_NOT_INITIALIZED: return hipErrorProfilerNotInitialized;
        case CUDA_ERROR_PROFILER_ALREADY_STARTED: return hipErrorProfilerAlreadyStarted;
        case CUDA_ERROR_PROFILER_ALREADY_STOPPED: return hipErrorProfilerAlreadyStopped;
        case CUDA_ERROR_INVALID_IMAGE: return hipErrorInvalidImage;
        case CUDA_ERROR_CONTEXT_ALREADY_CURRENT: return hipErrorContextAlreadyCurrent;
        case CUDA_ERROR_UNMAP_FAILED: return hipErrorUnmapFailed;
        case CUDA_ERROR_ARRAY_IS_MAPPED: return hipErrorArrayIsMapped;
        case CUDA_ERROR_ALREADY_MAPPED: return hipErrorAlreadyMapped;
        case CUDA_ERROR_NO_BINARY_FOR_GPU: return hipErrorNoBinaryForGpu;
        case CUDA_ERROR_ALREADY_ACQUIRED: return hipErrorAlreadyAcquired;
        case CUDA_ERROR_NOT_MAPPED: return hipErrorNotMapped;
        case CUDA_ERROR_NOT_MAPPED_AS_ARRAY: return hipErrorNotMappedAsArray;
        case CUDA_ERROR_NOT_MAPPED_AS_POINTER: return hipErrorNotMappedAsPointer;
        case CUDA_ERROR_ECC_UNCORRECTABLE: return hipErrorECCNotCorrectable;
        case CUDA_ERROR_UNSUPPORTED_LIMIT: return hipErrorUnsupportedLimit;
        case CUDA_ERROR_CONTEXT_ALREADY_IN_USE: return hipErrorContextAlreadyInUse;
        case CUDA_ERROR_PEER_ACCESS_UNSUPPORTED: return hipErrorPeerAccessUnsupported;
        case CUDA_ERROR_INVALID_PTX: return hipErrorInvalidKernelFile;
        case CUDA_ERROR_INVALID_GRAPHICS_CONTEXT: return hipErrorInvalidGraphicsContext;
        case CUDA_ERROR_INVALID_SOURCE: return hipErrorInvalidSource;
        case CUDA_ERROR_FILE_NOT_FOUND: return hipErrorFileNotFound;
        case CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND: return hipErrorSharedObjectSymbolNotFound;
        case CUDA_ERROR_SHARED_OBJECT_INIT_FAILED: return hipErrorSharedObjectInitFailed;
        case CUDA_ERROR_OPERATING_SYSTEM: return hipErrorOperatingSystem;
        case CUDA_ERROR_ILLEGAL_STATE: return hipErrorIllegalState;
        case CUDA_ERROR_NOT_FOUND: return hipErrorNotFound;
        case CUDA_ERROR_NOT_READY: return hipErrorNotReady;
        case CUDA_ERROR_ILLEGAL_ADDRESS: return hipErrorIllegalAddress;
        case CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: return hipErrorLaunchOutOfResources;
        case CUDA_ERROR_LAUNCH_TIMEOUT: return hipErrorLaunchTimeOut;
        case CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED: return hipErrorPeerAccessAlreadyEnabled;
        case CUDA_ERROR_PEER_ACCESS_NOT_ENABLED: return hipErrorPeerAccessNotEnabled;
        case CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE: return hipErrorSetOnActiveProcess;
        case CUDA_ERROR_CONTEXT_IS_DESTROYED: return hipErrorContextIsDestroyed;
        case CUDA_ERROR_ASSERT: return hipErrorAssert;
        case CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED: return hipErrorHostMemoryAlreadyRegistered;
        case CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED: return hipErrorHostMemoryNotRegistered;
        case CUDA_ERROR_LAUNCH_FAILED: return hipErrorLaunchFailure;
        case CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE: return hipErrorCooperativeLaunchTooLarge;
        case CUDA_ERROR_NOT_SUPPORTED: return hipErrorNotSupported;
        case CUDA_ERROR_STREAM_CAPTURE_UNSUPPORTED: return hipErrorStreamCaptureUnsupported;
        case CUDA_ERROR_STREAM_CAPTURE_INVALIDATED: return hipErrorStreamCaptureInvalidated;
        case CUDA_ERROR_STREAM_CAPTURE_MERGE: return hipErrorStreamCaptureMerge;
        case CUDA_ERROR_STREAM_CAPTURE_UNMATCHED: return hipErrorStreamCaptureUnmatched;
        case CUDA_ERROR_STREAM_CAPTURE_UNJOINED: return hipErrorStreamCaptureUnjoined;
        case CUDA_ERROR_STREAM_CAPTURE_ISOLATION: return hipErrorStreamCaptureIsolation;
        case CUDA_ERROR_STREAM_CAPTURE_IMPLICIT: return hipErrorStreamCaptureImplicit;
        case CUDA_ERROR_CAPTURED_EVENT: return hipErrorCapturedEvent;
        case CUDA_ERROR_STREAM_CAPTURE_WRONG_THREAD: return hipErrorStreamCaptureWrongThread;
        case CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE: return hipErrorGraphExecUpdateFailure;
        case CUDA_ERROR_UNKNOWN:
        default:
            return hipErrorUnknown;  // Note - translated error.
    }
}
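/*
 * Usage sketch (illustrative): the two converters above let both CUDA runtime
 * and CUDA driver results surface as hipError_t, e.g. a hypothetical wrapper:
 *
 *   static hipError_t myModuleLoad(hipModule_t* m, const char* fname) {
 *       return hipCUResultTohipError(cuModuleLoad(m, fname));
 *   }
 */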
inline static CUresult hipErrorToCUResult(hipError_t hError) {
    switch (hError) {
        case hipSuccess: return CUDA_SUCCESS;
        case hipErrorOutOfMemory: return CUDA_ERROR_OUT_OF_MEMORY;
        case hipErrorInvalidValue: return CUDA_ERROR_INVALID_VALUE;
        case hipErrorInvalidDevice: return CUDA_ERROR_INVALID_DEVICE;
        case hipErrorDeinitialized: return CUDA_ERROR_DEINITIALIZED;
        case hipErrorNoDevice: return CUDA_ERROR_NO_DEVICE;
        case hipErrorInvalidContext: return CUDA_ERROR_INVALID_CONTEXT;
        case hipErrorNotInitialized: return CUDA_ERROR_NOT_INITIALIZED;
        case hipErrorInvalidHandle: return CUDA_ERROR_INVALID_HANDLE;
        case hipErrorMapFailed: return CUDA_ERROR_MAP_FAILED;
        case hipErrorProfilerDisabled: return CUDA_ERROR_PROFILER_DISABLED;
        case hipErrorProfilerNotInitialized: return CUDA_ERROR_PROFILER_NOT_INITIALIZED;
        case hipErrorProfilerAlreadyStarted: return CUDA_ERROR_PROFILER_ALREADY_STARTED;
        case hipErrorProfilerAlreadyStopped: return CUDA_ERROR_PROFILER_ALREADY_STOPPED;
        case hipErrorInvalidImage: return CUDA_ERROR_INVALID_IMAGE;
        case hipErrorContextAlreadyCurrent: return CUDA_ERROR_CONTEXT_ALREADY_CURRENT;
        case hipErrorUnmapFailed: return CUDA_ERROR_UNMAP_FAILED;
        case hipErrorArrayIsMapped: return CUDA_ERROR_ARRAY_IS_MAPPED;
        case hipErrorAlreadyMapped: return CUDA_ERROR_ALREADY_MAPPED;
        case hipErrorNoBinaryForGpu: return CUDA_ERROR_NO_BINARY_FOR_GPU;
        case hipErrorAlreadyAcquired: return CUDA_ERROR_ALREADY_ACQUIRED;
        case hipErrorNotMapped: return CUDA_ERROR_NOT_MAPPED;
        case hipErrorNotMappedAsArray: return CUDA_ERROR_NOT_MAPPED_AS_ARRAY;
        case hipErrorNotMappedAsPointer: return CUDA_ERROR_NOT_MAPPED_AS_POINTER;
        case hipErrorECCNotCorrectable: return CUDA_ERROR_ECC_UNCORRECTABLE;
        case hipErrorUnsupportedLimit: return CUDA_ERROR_UNSUPPORTED_LIMIT;
        case hipErrorContextAlreadyInUse: return CUDA_ERROR_CONTEXT_ALREADY_IN_USE;
        case hipErrorPeerAccessUnsupported: return CUDA_ERROR_PEER_ACCESS_UNSUPPORTED;
        case hipErrorInvalidKernelFile: return CUDA_ERROR_INVALID_PTX;
        case hipErrorInvalidGraphicsContext: return CUDA_ERROR_INVALID_GRAPHICS_CONTEXT;
        case hipErrorInvalidSource: return CUDA_ERROR_INVALID_SOURCE;
        case hipErrorFileNotFound: return CUDA_ERROR_FILE_NOT_FOUND;
        case hipErrorSharedObjectSymbolNotFound: return CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND;
        case hipErrorSharedObjectInitFailed: return CUDA_ERROR_SHARED_OBJECT_INIT_FAILED;
        case hipErrorOperatingSystem: return CUDA_ERROR_OPERATING_SYSTEM;
        case hipErrorIllegalState: return CUDA_ERROR_ILLEGAL_STATE;
        case hipErrorNotFound: return CUDA_ERROR_NOT_FOUND;
        case hipErrorNotReady: return CUDA_ERROR_NOT_READY;
        case hipErrorIllegalAddress: return CUDA_ERROR_ILLEGAL_ADDRESS;
        case hipErrorLaunchOutOfResources: return CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES;
        case hipErrorLaunchTimeOut: return CUDA_ERROR_LAUNCH_TIMEOUT;
        case hipErrorPeerAccessAlreadyEnabled: return CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED;
        case hipErrorPeerAccessNotEnabled: return CUDA_ERROR_PEER_ACCESS_NOT_ENABLED;
        case hipErrorSetOnActiveProcess: return CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE;
        case hipErrorContextIsDestroyed: return CUDA_ERROR_CONTEXT_IS_DESTROYED;
        case hipErrorAssert: return CUDA_ERROR_ASSERT;
        case hipErrorHostMemoryAlreadyRegistered: return CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED;
        case hipErrorHostMemoryNotRegistered: return CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED;
        case hipErrorLaunchFailure: return CUDA_ERROR_LAUNCH_FAILED;
        case hipErrorCooperativeLaunchTooLarge: return CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE;
        case hipErrorNotSupported: return CUDA_ERROR_NOT_SUPPORTED;
        case hipErrorStreamCaptureUnsupported: return CUDA_ERROR_STREAM_CAPTURE_UNSUPPORTED;
        case hipErrorStreamCaptureInvalidated: return CUDA_ERROR_STREAM_CAPTURE_INVALIDATED;
        case hipErrorStreamCaptureMerge: return CUDA_ERROR_STREAM_CAPTURE_MERGE;
        case hipErrorStreamCaptureUnmatched: return CUDA_ERROR_STREAM_CAPTURE_UNMATCHED;
        case hipErrorStreamCaptureUnjoined: return CUDA_ERROR_STREAM_CAPTURE_UNJOINED;
        case hipErrorStreamCaptureIsolation: return CUDA_ERROR_STREAM_CAPTURE_ISOLATION;
        case hipErrorStreamCaptureImplicit: return CUDA_ERROR_STREAM_CAPTURE_IMPLICIT;
        case hipErrorCapturedEvent: return CUDA_ERROR_CAPTURED_EVENT;
        case hipErrorStreamCaptureWrongThread: return CUDA_ERROR_STREAM_CAPTURE_WRONG_THREAD;
        case hipErrorGraphExecUpdateFailure: return CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE;
        case hipErrorUnknown:
        default:
            return CUDA_ERROR_UNKNOWN;  // Note - translated error.
    }
}

inline static cudaError_t hipErrorToCudaError(hipError_t hError) {
    switch (hError) {
        case hipSuccess: return cudaSuccess;
        case hipErrorOutOfMemory: return cudaErrorMemoryAllocation;
        case hipErrorProfilerDisabled: return cudaErrorProfilerDisabled;
        case hipErrorProfilerNotInitialized: return cudaErrorProfilerNotInitialized;
        case hipErrorProfilerAlreadyStarted: return cudaErrorProfilerAlreadyStarted;
        case hipErrorProfilerAlreadyStopped: return cudaErrorProfilerAlreadyStopped;
        case hipErrorInvalidConfiguration: return cudaErrorInvalidConfiguration;
        case hipErrorLaunchOutOfResources: return cudaErrorLaunchOutOfResources;
        case hipErrorInvalidValue: return cudaErrorInvalidValue;
        case hipErrorInvalidPitchValue: return cudaErrorInvalidPitchValue;
        case hipErrorInvalidHandle: return cudaErrorInvalidResourceHandle;
        case hipErrorInvalidDevice: return cudaErrorInvalidDevice;
        case hipErrorInvalidMemcpyDirection: return cudaErrorInvalidMemcpyDirection;
        case hipErrorInvalidDevicePointer: return cudaErrorInvalidDevicePointer;
        case hipErrorNotInitialized: return cudaErrorInitializationError;
        case hipErrorNoDevice: return cudaErrorNoDevice;
        case hipErrorNotReady: return cudaErrorNotReady;
        case hipErrorPeerAccessNotEnabled: return cudaErrorPeerAccessNotEnabled;
        case hipErrorPeerAccessAlreadyEnabled: return cudaErrorPeerAccessAlreadyEnabled;
        case hipErrorHostMemoryAlreadyRegistered: return cudaErrorHostMemoryAlreadyRegistered;
        case hipErrorHostMemoryNotRegistered: return cudaErrorHostMemoryNotRegistered;
        case hipErrorDeinitialized: return cudaErrorCudartUnloading;
        case hipErrorInvalidSymbol: return cudaErrorInvalidSymbol;
        case hipErrorInsufficientDriver: return cudaErrorInsufficientDriver;
        case hipErrorMissingConfiguration: return cudaErrorMissingConfiguration;
        case hipErrorPriorLaunchFailure: return cudaErrorPriorLaunchFailure;
        case hipErrorInvalidDeviceFunction: return cudaErrorInvalidDeviceFunction;
        case hipErrorInvalidImage: return cudaErrorInvalidKernelImage;
        case hipErrorInvalidContext:
#if CUDA_VERSION >= CUDA_10020
            return cudaErrorDeviceUninitialized;
#else
            return cudaErrorUnknown;
#endif
        case hipErrorMapFailed: return cudaErrorMapBufferObjectFailed;
        case hipErrorUnmapFailed: return cudaErrorUnmapBufferObjectFailed;
        case hipErrorArrayIsMapped:
#if CUDA_VERSION >= CUDA_10010
            return cudaErrorArrayIsMapped;
#else
            return cudaErrorUnknown;
#endif
        case hipErrorAlreadyMapped:
#if CUDA_VERSION >= CUDA_10010
            return cudaErrorAlreadyMapped;
#else
            return cudaErrorUnknown;
#endif
        case hipErrorNoBinaryForGpu: return cudaErrorNoKernelImageForDevice;
        case hipErrorAlreadyAcquired:
#if CUDA_VERSION >= CUDA_10010
            return cudaErrorAlreadyAcquired;
#else
            return cudaErrorUnknown;
#endif
        case hipErrorNotMapped:
#if CUDA_VERSION >= CUDA_10010
            return cudaErrorNotMapped;
#else
            return cudaErrorUnknown;
#endif
        case hipErrorNotMappedAsArray:
#if CUDA_VERSION >= CUDA_10010
            return cudaErrorNotMappedAsArray;
#else
            return cudaErrorUnknown;
#endif
        case hipErrorNotMappedAsPointer:
#if CUDA_VERSION >= CUDA_10010
            return cudaErrorNotMappedAsPointer;
#else
            return cudaErrorUnknown;
#endif
        case hipErrorECCNotCorrectable: return cudaErrorECCUncorrectable;
        case hipErrorUnsupportedLimit: return cudaErrorUnsupportedLimit;
        case hipErrorContextAlreadyInUse: return cudaErrorDeviceAlreadyInUse;
        case hipErrorPeerAccessUnsupported: return cudaErrorPeerAccessUnsupported;
        case hipErrorInvalidKernelFile: return cudaErrorInvalidPtx;
        case hipErrorInvalidGraphicsContext: return cudaErrorInvalidGraphicsContext;
        case hipErrorInvalidSource:
#if CUDA_VERSION >= CUDA_10010
            return cudaErrorInvalidSource;
#else
            return cudaErrorUnknown;
#endif
        case hipErrorFileNotFound:
#if CUDA_VERSION >= CUDA_10010
            return cudaErrorFileNotFound;
#else
            return cudaErrorUnknown;
#endif
        case hipErrorSharedObjectSymbolNotFound: return cudaErrorSharedObjectSymbolNotFound;
        case hipErrorSharedObjectInitFailed: return cudaErrorSharedObjectInitFailed;
        case hipErrorOperatingSystem: return cudaErrorOperatingSystem;
        case hipErrorIllegalState: return cudaErrorIllegalState;
        case hipErrorNotFound:
#if CUDA_VERSION >= CUDA_10010
            return cudaErrorSymbolNotFound;
#else
            return cudaErrorUnknown;
#endif
        case hipErrorIllegalAddress: return cudaErrorIllegalAddress;
        case hipErrorLaunchTimeOut: return cudaErrorLaunchTimeout;
        case hipErrorSetOnActiveProcess: return cudaErrorSetOnActiveProcess;
        case hipErrorContextIsDestroyed: return cudaErrorContextIsDestroyed;
        case hipErrorAssert: return cudaErrorAssert;
        case hipErrorLaunchFailure: return cudaErrorLaunchFailure;
        case hipErrorCooperativeLaunchTooLarge: return cudaErrorCooperativeLaunchTooLarge;
        case hipErrorStreamCaptureUnsupported: return cudaErrorStreamCaptureUnsupported;
        case hipErrorStreamCaptureInvalidated: return cudaErrorStreamCaptureInvalidated;
        case hipErrorStreamCaptureMerge: return cudaErrorStreamCaptureMerge;
        case hipErrorStreamCaptureUnmatched: return cudaErrorStreamCaptureUnmatched;
        case hipErrorStreamCaptureUnjoined: return cudaErrorStreamCaptureUnjoined;
        case hipErrorStreamCaptureIsolation: return cudaErrorStreamCaptureIsolation;
        case hipErrorStreamCaptureImplicit: return cudaErrorStreamCaptureImplicit;
        case hipErrorCapturedEvent: return cudaErrorCapturedEvent;
        case hipErrorStreamCaptureWrongThread: return cudaErrorStreamCaptureWrongThread;
        case hipErrorGraphExecUpdateFailure: return cudaErrorGraphExecUpdateFailure;
        case hipErrorNotSupported: return cudaErrorNotSupported;
        // HSA: does not exist in CUDA
        case hipErrorRuntimeMemory:
        // HSA: does not exist in CUDA
        case hipErrorRuntimeOther:
        case hipErrorUnknown:
        case hipErrorTbd:
        default:
            return cudaErrorUnknown;  // Note - translated error.
    }
}
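/*
 * Note: these mappings are not a bijection. HIP-only codes such as
 * hipErrorRuntimeMemory, hipErrorRuntimeOther and hipErrorTbd collapse to
 * cudaErrorUnknown above, so a hip -> cuda -> hip conversion is not
 * guaranteed to round-trip to the original value.
 */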
inline static enum cudaMemcpyKind hipMemcpyKindToCudaMemcpyKind(hipMemcpyKind kind) {
    switch (kind) {
        case hipMemcpyHostToHost: return cudaMemcpyHostToHost;
        case hipMemcpyHostToDevice: return cudaMemcpyHostToDevice;
        case hipMemcpyDeviceToHost: return cudaMemcpyDeviceToHost;
        case hipMemcpyDeviceToDevice: return cudaMemcpyDeviceToDevice;
        case hipMemcpyDefault: return cudaMemcpyDefault;
        default: return (enum cudaMemcpyKind)-1;
    }
}

inline static enum cudaTextureAddressMode hipTextureAddressModeToCudaTextureAddressMode(
    hipTextureAddressMode kind) {
    switch (kind) {
        case hipAddressModeWrap: return cudaAddressModeWrap;
        case hipAddressModeClamp: return cudaAddressModeClamp;
        case hipAddressModeMirror: return cudaAddressModeMirror;
        case hipAddressModeBorder: return cudaAddressModeBorder;
        default: return (enum cudaTextureAddressMode)-1;
    }
}

inline static enum cudaMemRangeAttribute hipMemRangeAttributeToCudaMemRangeAttribute(
    hipMemRangeAttribute kind) {
    switch (kind) {
        case hipMemRangeAttributeReadMostly: return cudaMemRangeAttributeReadMostly;
        case hipMemRangeAttributePreferredLocation: return cudaMemRangeAttributePreferredLocation;
        case hipMemRangeAttributeAccessedBy: return cudaMemRangeAttributeAccessedBy;
        case hipMemRangeAttributeLastPrefetchLocation: return cudaMemRangeAttributeLastPrefetchLocation;
        default: return (enum cudaMemRangeAttribute)-1;
    }
}

inline static enum cudaMemoryAdvise hipMemoryAdviseTocudaMemoryAdvise(hipMemoryAdvise kind) {
    switch (kind) {
        case hipMemAdviseSetReadMostly: return cudaMemAdviseSetReadMostly;
        case hipMemAdviseUnsetReadMostly: return cudaMemAdviseUnsetReadMostly;
        case hipMemAdviseSetPreferredLocation: return cudaMemAdviseSetPreferredLocation;
        case hipMemAdviseUnsetPreferredLocation: return cudaMemAdviseUnsetPreferredLocation;
        case hipMemAdviseSetAccessedBy: return cudaMemAdviseSetAccessedBy;
        case hipMemAdviseUnsetAccessedBy: return cudaMemAdviseUnsetAccessedBy;
        default: return (enum cudaMemoryAdvise)-1;
    }
}

inline static enum cudaTextureFilterMode hipTextureFilterModeToCudaTextureFilterMode(
    hipTextureFilterMode kind) {
    switch (kind) {
        case hipFilterModePoint: return cudaFilterModePoint;
        case hipFilterModeLinear: return cudaFilterModeLinear;
        default: return (enum cudaTextureFilterMode)-1;
    }
}

inline static enum cudaTextureReadMode hipTextureReadModeToCudaTextureReadMode(hipTextureReadMode kind) {
    switch (kind) {
        case hipReadModeElementType: return cudaReadModeElementType;
        case hipReadModeNormalizedFloat: return cudaReadModeNormalizedFloat;
        default: return (enum cudaTextureReadMode)-1;
    }
}

inline static enum cudaChannelFormatKind hipChannelFormatKindToCudaChannelFormatKind(
    hipChannelFormatKind kind) {
    switch (kind) {
        case hipChannelFormatKindSigned: return cudaChannelFormatKindSigned;
        case hipChannelFormatKindUnsigned: return cudaChannelFormatKindUnsigned;
        case hipChannelFormatKindFloat: return cudaChannelFormatKindFloat;
        case hipChannelFormatKindNone: return cudaChannelFormatKindNone;
        default: return (enum cudaChannelFormatKind)-1;
    }
}

typedef enum cudaExternalMemoryHandleType hipExternalMemoryHandleType;
#define hipExternalMemoryHandleTypeOpaqueFd cudaExternalMemoryHandleTypeOpaqueFd
#define hipExternalMemoryHandleTypeOpaqueWin32 cudaExternalMemoryHandleTypeOpaqueWin32
#define hipExternalMemoryHandleTypeOpaqueWin32Kmt cudaExternalMemoryHandleTypeOpaqueWin32Kmt
#define hipExternalMemoryHandleTypeD3D12Heap cudaExternalMemoryHandleTypeD3D12Heap
#define hipExternalMemoryHandleTypeD3D12Resource cudaExternalMemoryHandleTypeD3D12Resource
#if CUDA_VERSION >= CUDA_10020
#define hipExternalMemoryHandleTypeD3D11Resource cudaExternalMemoryHandleTypeD3D11Resource
#define hipExternalMemoryHandleTypeD3D11ResourceKmt cudaExternalMemoryHandleTypeD3D11ResourceKmt
#define hipExternalMemoryHandleTypeNvSciBuf cudaExternalMemoryHandleTypeNvSciBuf
#endif
typedef struct cudaExternalMemoryHandleDesc hipExternalMemoryHandleDesc;
typedef struct cudaExternalMemoryBufferDesc hipExternalMemoryBufferDesc;
typedef cudaExternalMemory_t hipExternalMemory_t;

typedef enum cudaExternalSemaphoreHandleType hipExternalSemaphoreHandleType;
#define hipExternalSemaphoreHandleTypeOpaqueFd cudaExternalSemaphoreHandleTypeOpaqueFd
#define hipExternalSemaphoreHandleTypeOpaqueWin32 cudaExternalSemaphoreHandleTypeOpaqueWin32
#define hipExternalSemaphoreHandleTypeOpaqueWin32Kmt cudaExternalSemaphoreHandleTypeOpaqueWin32Kmt
#define hipExternalSemaphoreHandleTypeD3D12Fence cudaExternalSemaphoreHandleTypeD3D12Fence
#if CUDA_VERSION >= CUDA_10020
#define hipExternalSemaphoreHandleTypeD3D11Fence cudaExternalSemaphoreHandleTypeD3D11Fence
#define hipExternalSemaphoreHandleTypeNvSciSync cudaExternalSemaphoreHandleTypeNvSciSync
#define hipExternalSemaphoreHandleTypeKeyedMutex cudaExternalSemaphoreHandleTypeKeyedMutex
#define hipExternalSemaphoreHandleTypeKeyedMutexKmt cudaExternalSemaphoreHandleTypeKeyedMutexKmt
#endif
#if CUDA_VERSION >= CUDA_11020
#define hipExternalSemaphoreHandleTypeTimelineSemaphoreFd cudaExternalSemaphoreHandleTypeTimelineSemaphoreFd
#define hipExternalSemaphoreHandleTypeTimelineSemaphoreWin32 cudaExternalSemaphoreHandleTypeTimelineSemaphoreWin32
#endif
typedef struct cudaExternalSemaphoreHandleDesc hipExternalSemaphoreHandleDesc;
typedef cudaExternalSemaphore_t hipExternalSemaphore_t;
typedef struct cudaExternalSemaphoreSignalParams hipExternalSemaphoreSignalParams;
typedef struct cudaExternalSemaphoreWaitParams hipExternalSemaphoreWaitParams;

typedef struct cudaGraphicsResource hipGraphicsResource;
typedef cudaGraphicsResource_t hipGraphicsResource_t;
typedef enum cudaGraphicsRegisterFlags hipGraphicsRegisterFlags;
#define hipGraphicsRegisterFlagsNone cudaGraphicsRegisterFlagsNone
#define hipGraphicsRegisterFlagsReadOnly cudaGraphicsRegisterFlagsReadOnly
#define hipGraphicsRegisterFlagsWriteDiscard cudaGraphicsRegisterFlagsWriteDiscard
#define hipGraphicsRegisterFlagsSurfaceLoadStore cudaGraphicsRegisterFlagsSurfaceLoadStore
#define hipGraphicsRegisterFlagsTextureGather cudaGraphicsRegisterFlagsTextureGather

/**
 * graph types
 *
 */
typedef cudaGraph_t hipGraph_t;
typedef cudaGraphNode_t hipGraphNode_t;
typedef cudaGraphExec_t hipGraphExec_t;
typedef cudaUserObject_t hipUserObject_t;

typedef enum cudaGraphNodeType hipGraphNodeType;
#define hipGraphNodeTypeKernel cudaGraphNodeTypeKernel
#define hipGraphNodeTypeMemcpy cudaGraphNodeTypeMemcpy
#define hipGraphNodeTypeMemset cudaGraphNodeTypeMemset
#define hipGraphNodeTypeHost cudaGraphNodeTypeHost
#define hipGraphNodeTypeGraph cudaGraphNodeTypeGraph
#define hipGraphNodeTypeEmpty cudaGraphNodeTypeEmpty
#define hipGraphNodeTypeWaitEvent cudaGraphNodeTypeWaitEvent
#define hipGraphNodeTypeEventRecord cudaGraphNodeTypeEventRecord
#define hipGraphNodeTypeExtSemaphoreSignal cudaGraphNodeTypeExtSemaphoreSignal
#define hipGraphNodeTypeExtSemaphoreWait cudaGraphNodeTypeExtSemaphoreWait
#define hipGraphNodeTypeMemcpyFromSymbol cudaGraphNodeTypeMemcpyFromSymbol
#define hipGraphNodeTypeMemcpyToSymbol cudaGraphNodeTypeMemcpyToSymbol
#define hipGraphNodeTypeCount cudaGraphNodeTypeCount

typedef cudaHostFn_t hipHostFn_t;
typedef struct cudaHostNodeParams hipHostNodeParams;
typedef struct cudaKernelNodeParams hipKernelNodeParams;
typedef struct cudaMemsetParams hipMemsetParams;
#if CUDA_VERSION >= CUDA_11040
typedef struct cudaMemAllocNodeParams hipMemAllocNodeParams;
#endif

typedef enum cudaGraphExecUpdateResult hipGraphExecUpdateResult;
#define hipGraphExecUpdateSuccess cudaGraphExecUpdateSuccess
#define hipGraphExecUpdateError cudaGraphExecUpdateError
#define hipGraphExecUpdateErrorTopologyChanged cudaGraphExecUpdateErrorTopologyChanged
#define hipGraphExecUpdateErrorNodeTypeChanged cudaGraphExecUpdateErrorNodeTypeChanged
#define hipGraphExecUpdateErrorFunctionChanged cudaGraphExecUpdateErrorFunctionChanged
#define hipGraphExecUpdateErrorParametersChanged cudaGraphExecUpdateErrorParametersChanged
#define hipGraphExecUpdateErrorNotSupported cudaGraphExecUpdateErrorNotSupported
#define hipGraphExecUpdateErrorUnsupportedFunctionChange \
    cudaGraphExecUpdateErrorUnsupportedFunctionChange

typedef enum cudaStreamCaptureMode hipStreamCaptureMode;
#define hipStreamCaptureModeGlobal cudaStreamCaptureModeGlobal
#define hipStreamCaptureModeThreadLocal cudaStreamCaptureModeThreadLocal
#define hipStreamCaptureModeRelaxed cudaStreamCaptureModeRelaxed

typedef enum cudaStreamCaptureStatus hipStreamCaptureStatus;
#define hipStreamCaptureStatusNone cudaStreamCaptureStatusNone
#define hipStreamCaptureStatusActive cudaStreamCaptureStatusActive
#define hipStreamCaptureStatusInvalidated cudaStreamCaptureStatusInvalidated

typedef union cudaKernelNodeAttrValue hipKernelNodeAttrValue;
typedef enum cudaKernelNodeAttrID hipKernelNodeAttrID;
#define hipKernelNodeAttributeAccessPolicyWindow cudaKernelNodeAttributeAccessPolicyWindow
#define hipKernelNodeAttributeCooperative cudaKernelNodeAttributeCooperative

typedef enum cudaAccessProperty hipAccessProperty;
#define hipAccessPropertyNormal cudaAccessPropertyNormal
#define hipAccessPropertyStreaming cudaAccessPropertyStreaming
#define hipAccessPropertyPersisting cudaAccessPropertyPersisting
typedef struct cudaAccessPolicyWindow hipAccessPolicyWindow;

typedef enum cudaGraphMemAttributeType hipGraphMemAttributeType;
#define hipGraphMemAttrUsedMemCurrent cudaGraphMemAttrUsedMemCurrent
#define hipGraphMemAttrUsedMemHigh cudaGraphMemAttrUsedMemHigh
#define hipGraphMemAttrReservedMemCurrent cudaGraphMemAttrReservedMemCurrent
#define hipGraphMemAttrReservedMemHigh cudaGraphMemAttrReservedMemHigh

typedef enum cudaUserObjectFlags hipUserObjectFlags;
#define hipUserObjectNoDestructorSync cudaUserObjectNoDestructorSync
typedef enum cudaUserObjectRetainFlags hipUserObjectRetainFlags;
#define hipGraphUserObjectMove cudaGraphUserObjectMove

#if CUDA_VERSION >= CUDA_11030
typedef enum cudaStreamUpdateCaptureDependenciesFlags hipStreamUpdateCaptureDependenciesFlags;
#define hipStreamAddCaptureDependencies cudaStreamAddCaptureDependencies
#define hipStreamSetCaptureDependencies cudaStreamSetCaptureDependencies
#endif

#if CUDA_VERSION >= CUDA_11030
typedef enum cudaGraphDebugDotFlags hipGraphDebugDotFlags;
#define hipGraphDebugDotFlagsVerbose cudaGraphDebugDotFlagsVerbose
#define hipGraphDebugDotFlagsKernelNodeParams cudaGraphDebugDotFlagsKernelNodeParams
#define hipGraphDebugDotFlagsMemcpyNodeParams cudaGraphDebugDotFlagsMemcpyNodeParams
#define hipGraphDebugDotFlagsMemsetNodeParams cudaGraphDebugDotFlagsMemsetNodeParams
#define hipGraphDebugDotFlagsHostNodeParams cudaGraphDebugDotFlagsHostNodeParams
#define hipGraphDebugDotFlagsEventNodeParams cudaGraphDebugDotFlagsEventNodeParams
#define hipGraphDebugDotFlagsExtSemasSignalNodeParams cudaGraphDebugDotFlagsExtSemasSignalNodeParams
#define hipGraphDebugDotFlagsExtSemasWaitNodeParams cudaGraphDebugDotFlagsExtSemasWaitNodeParams
#define hipGraphDebugDotFlagsKernelNodeAttributes cudaGraphDebugDotFlagsKernelNodeAttributes
#define hipGraphDebugDotFlagsHandles cudaGraphDebugDotFlagsHandles
#endif

#if CUDA_VERSION >= CUDA_10020
#define hipMemAllocationGranularityMinimum CU_MEM_ALLOC_GRANULARITY_MINIMUM
#define hipMemAllocationGranularityRecommended CU_MEM_ALLOC_GRANULARITY_RECOMMENDED
typedef enum CUmemAllocationGranularity_flags_enum hipMemAllocationGranularity_flags;
typedef enum cudaMemLocationType hipMemLocationType;
#define hipMemLocationTypeInvalid cudaMemLocationTypeInvalid
#define hipMemLocationTypeDevice cudaMemLocationTypeDevice
#define hipMemHandleTypeNone cudaMemHandleTypeNone
#define hipMemHandleTypePosixFileDescriptor cudaMemHandleTypePosixFileDescriptor
#define hipMemHandleTypeWin32 cudaMemHandleTypeWin32
#define hipMemHandleTypeWin32Kmt cudaMemHandleTypeWin32Kmt
typedef enum cudaMemAllocationType hipMemAllocationType;
#define hipMemAllocationTypeInvalid cudaMemAllocationTypeInvalid
#define hipMemAllocationTypePinned cudaMemAllocationTypePinned
#define hipMemAllocationTypeMax cudaMemAllocationTypeMax
#define hipMemGenericAllocationHandle_t CUmemGenericAllocationHandle

// CUarrayMapInfo mappings
typedef CUarrayMapInfo hipArrayMapInfo;
typedef CUarraySparseSubresourceType hipArraySparseSubresourceType;
#define hipArraySparseSubresourceTypeSparseLevel CU_ARRAY_SPARSE_SUBRESOURCE_TYPE_SPARSE_LEVEL
#define hipArraySparseSubresourceTypeMiptail CU_ARRAY_SPARSE_SUBRESOURCE_TYPE_MIPTAIL
typedef CUmemOperationType hipMemOperationType;
#define hipMemOperationTypeMap CU_MEM_OPERATION_TYPE_MAP
#define hipMemOperationTypeUnmap CU_MEM_OPERATION_TYPE_UNMAP
typedef CUmemHandleType hipMemHandleType;
#define hipMemHandleTypeGeneric CU_MEM_HANDLE_TYPE_GENERIC
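/*
 * Usage sketch (illustrative; assumes the corresponding virtual-memory-
 * management wrappers, e.g. a hipMemCreate shim, are available on this
 * platform and that `device` is a valid ordinal):
 *
 *   hipMemAllocationProp prop;
 *   memset(&prop, 0, sizeof(prop));
 *   prop.type = hipMemAllocationTypePinned;
 *   prop.location.type = hipMemLocationTypeDevice;
 *   prop.location.id = device;
 *   // pass &prop to the VMM allocation entry point
 */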
// Explicitly declaring hipMemAllocationProp based on CUmemAllocationProp, but using CUDA
// runtime members instead, because hipMemAllocationType, hipMemAllocationHandleType and
// hipMemLocation are defined with CUDA runtime data types and are also used by hipMemPoolProps.
// Currently there is no built-in CUDA runtime structure corresponding to CUmemAllocationProp.
// This structure needs to be updated accordingly if CUDA updates CUmemAllocationProp.
typedef struct hipMemAllocationProp {
    /** Memory allocation type */
    hipMemAllocationType type;
    /** Requested handle type */
    hipMemAllocationHandleType requestedHandleTypes;
    /** Location of allocation */
    hipMemLocation location;
    /**
     * Windows-specific POBJECT_ATTRIBUTES required when
     * ::CU_MEM_HANDLE_TYPE_WIN32 is specified. This object attributes structure
     * includes security attributes that define
     * the scope of which exported allocations may be transferred to other
     * processes. In all other cases, this field is required to be zero.
     */
    void *win32HandleMetaData;
    struct {
        /**
         * Allocation hint for requesting compressible memory.
         * On devices that support Compute Data Compression, compressible
         * memory can be used to accelerate accesses to data with unstructured
         * sparsity and other compressible data patterns. Applications are
         * expected to query allocation property of the handle obtained with
         * ::cuMemCreate using ::cuMemGetAllocationPropertiesFromHandle to
         * validate if the obtained allocation is compressible or not. Note that
         * compressed memory may not be mappable on all devices.
         */
        unsigned char compressionType;
        /** RDMA capable */
        unsigned char gpuDirectRDMACapable;
        /** Bitmask indicating intended usage for this allocation */
        unsigned short usage;
        unsigned char reserved[4];
    } allocFlags;
} hipMemAllocationProp;
#endif

/**
 * Stream CallBack struct
 */
#define HIPRT_CB CUDART_CB
typedef void(HIPRT_CB* hipStreamCallback_t)(hipStream_t stream, hipError_t status, void* userData);

inline static hipError_t hipInit(unsigned int flags) { return hipCUResultTohipError(cuInit(flags)); }

inline static hipError_t hipDeviceReset() { return hipCUDAErrorTohipError(cudaDeviceReset()); }

inline static hipError_t hipGetLastError() { return hipCUDAErrorTohipError(cudaGetLastError()); }

inline static hipError_t hipPeekAtLastError() { return hipCUDAErrorTohipError(cudaPeekAtLastError()); }

inline static hipError_t hipMalloc(void** ptr, size_t size) {
    return hipCUDAErrorTohipError(cudaMalloc(ptr, size));
}

inline static hipError_t hipMallocPitch(void** ptr, size_t* pitch, size_t width, size_t height) {
    return hipCUDAErrorTohipError(cudaMallocPitch(ptr, pitch, width, height));
}

inline static hipError_t hipMemAllocPitch(hipDeviceptr_t* dptr, size_t* pitch, size_t widthInBytes,
                                          size_t height, unsigned int elementSizeBytes) {
    return hipCUResultTohipError(cuMemAllocPitch(dptr, pitch, widthInBytes, height, elementSizeBytes));
}

inline static hipError_t hipMalloc3D(hipPitchedPtr* pitchedDevPtr, hipExtent extent) {
    return hipCUDAErrorTohipError(cudaMalloc3D(pitchedDevPtr, extent));
}

inline static hipError_t hipFree(void* ptr) { return hipCUDAErrorTohipError(cudaFree(ptr)); }

__HIP_DEPRECATED_MSG("use hipHostMalloc instead")
inline static hipError_t hipMallocHost(void** ptr, size_t size) {
    return hipCUDAErrorTohipError(cudaMallocHost(ptr, size));
}

__HIP_DEPRECATED_MSG("use hipHostMalloc instead")
inline static hipError_t hipMemAllocHost(void** ptr, size_t size) {
    return hipCUResultTohipError(cuMemAllocHost(ptr, size));
}

__HIP_DEPRECATED_MSG("use hipHostMalloc instead")
inline static hipError_t hipHostAlloc(void** ptr, size_t size, unsigned int flags) {
    return hipCUDAErrorTohipError(cudaHostAlloc(ptr, size, flags));
}

inline static hipError_t hipHostMalloc(void** ptr, size_t size, unsigned int flags) {
    return hipCUDAErrorTohipError(cudaHostAlloc(ptr, size, flags));
}

inline static hipError_t hipMemAdvise(const void* dev_ptr, size_t count, hipMemoryAdvise advice,
                                      int device) {
    return hipCUDAErrorTohipError(
        cudaMemAdvise(dev_ptr, count, hipMemoryAdviseTocudaMemoryAdvise(advice), device));
}

inline static hipError_t hipMemPrefetchAsync(const void* dev_ptr, size_t count, int device,
                                             hipStream_t stream __dparm(0)) {
    return hipCUDAErrorTohipError(cudaMemPrefetchAsync(dev_ptr, count, device, stream));
}

inline static hipError_t hipMemRangeGetAttribute(void* data, size_t data_size,
                                                 hipMemRangeAttribute attribute,
                                                 const void* dev_ptr, size_t count) {
    return hipCUDAErrorTohipError(cudaMemRangeGetAttribute(
        data, data_size, hipMemRangeAttributeToCudaMemRangeAttribute(attribute), dev_ptr, count));
}

inline static hipError_t hipMemRangeGetAttributes(void** data, size_t* data_sizes,
                                                  hipMemRangeAttribute* attributes,
                                                  size_t num_attributes, const void* dev_ptr,
                                                  size_t count) {
    return hipCUDAErrorTohipError(
        cudaMemRangeGetAttributes(data, data_sizes, attributes, num_attributes, dev_ptr, count));
}

inline static hipError_t hipStreamAttachMemAsync(hipStream_t stream, hipDeviceptr_t* dev_ptr,
                                                 size_t length __dparm(0),
                                                 unsigned int flags __dparm(hipMemAttachSingle)) {
    return hipCUDAErrorTohipError(cudaStreamAttachMemAsync(stream, dev_ptr, length, flags));
}
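/*
 * Usage sketch (illustrative): the managed-memory helpers in this section
 * compose in the usual CUDA pattern, e.g. for `n` floats on device `device`
 * with stream `stream`:
 *
 *   float* p = NULL;
 *   hipMallocManaged((void**)&p, n * sizeof(float), hipMemAttachGlobal);
 *   hipMemAdvise(p, n * sizeof(float), hipMemAdviseSetReadMostly, device);
 *   hipMemPrefetchAsync(p, n * sizeof(float), device, stream);
 */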
inline static hipError_t hipMallocManaged(void** ptr, size_t size, unsigned int flags) {
    return hipCUDAErrorTohipError(cudaMallocManaged(ptr, size, flags));
}

inline static hipError_t hipMallocArray(hipArray** array, const hipChannelFormatDesc* desc,
                                        size_t width, size_t height __dparm(0),
                                        unsigned int flags __dparm(hipArrayDefault)) {
    return hipCUDAErrorTohipError(cudaMallocArray(array, desc, width, height, flags));
}

inline static hipError_t hipMalloc3DArray(hipArray** array, const hipChannelFormatDesc* desc,
                                          hipExtent extent, unsigned int flags) {
    return hipCUDAErrorTohipError(cudaMalloc3DArray(array, desc, extent, flags));
}

inline static hipError_t hipFreeArray(hipArray* array) {
    return hipCUDAErrorTohipError(cudaFreeArray(array));
}

inline static hipError_t hipMipmappedArrayCreate(hipmipmappedArray* pHandle,
                                                 HIP_ARRAY3D_DESCRIPTOR* pMipmappedArrayDesc,
                                                 unsigned int numMipmapLevels) {
    return hipCUResultTohipError(cuMipmappedArrayCreate(pHandle, pMipmappedArrayDesc, numMipmapLevels));
}

inline static hipError_t hipMipmappedArrayDestroy(hipmipmappedArray hMipmappedArray) {
    return hipCUResultTohipError(cuMipmappedArrayDestroy(hMipmappedArray));
}

inline static hipError_t hipMipmappedArrayGetLevel(hiparray* pLevelArray,
                                                   hipmipmappedArray hMipMappedArray,
                                                   unsigned int level) {
    return hipCUResultTohipError(cuMipmappedArrayGetLevel((CUarray*)pLevelArray, hMipMappedArray, level));
}

inline static hipError_t hipMallocMipmappedArray(hipMipmappedArray_t* pHandle,
                                                 const hipChannelFormatDesc* desc, hipExtent extent,
                                                 unsigned int numLevels,
                                                 unsigned int flags __dparm(0)) {
    return hipCUDAErrorTohipError(cudaMallocMipmappedArray(pHandle, desc, extent, numLevels, flags));
}

inline static hipError_t hipFreeMipmappedArray(hipMipmappedArray_t hMipmappedArray) {
    return hipCUDAErrorTohipError(cudaFreeMipmappedArray(hMipmappedArray));
}

inline static hipError_t hipGetMipmappedArrayLevel(hipArray_t* pLevelArray,
                                                   hipMipmappedArray_t hMipMappedArray,
                                                   unsigned int level) {
    return hipCUDAErrorTohipError(cudaGetMipmappedArrayLevel(pLevelArray, hMipMappedArray, level));
}

inline static hipError_t hipHostGetDevicePointer(void** devPtr, void* hostPtr, unsigned int flags) {
    return hipCUDAErrorTohipError(cudaHostGetDevicePointer(devPtr, hostPtr, flags));
}

inline static hipError_t hipHostGetFlags(unsigned int* flagsPtr, void* hostPtr) {
    return hipCUDAErrorTohipError(cudaHostGetFlags(flagsPtr, hostPtr));
}

inline static hipError_t hipHostRegister(void* ptr, size_t size, unsigned int flags) {
    return hipCUDAErrorTohipError(cudaHostRegister(ptr, size, flags));
}

inline static hipError_t hipHostUnregister(void* ptr) {
    return hipCUDAErrorTohipError(cudaHostUnregister(ptr));
}

__HIP_DEPRECATED_MSG("use hipHostFree instead")
inline static hipError_t hipFreeHost(void* ptr) { return hipCUDAErrorTohipError(cudaFreeHost(ptr)); }

inline static hipError_t hipHostFree(void* ptr) { return hipCUDAErrorTohipError(cudaFreeHost(ptr)); }

inline static hipError_t hipSetDevice(int device) { return hipCUDAErrorTohipError(cudaSetDevice(device)); }
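/*
 * Usage sketch (illustrative): hipChooseDevice below mirrors cudaChooseDevice,
 * selecting the device that best matches a partially filled property struct:
 *
 *   hipDeviceProp_t want;
 *   memset(&want, 0, sizeof(want));
 *   want.major = 7;              // prefer compute capability 7.x or newer
 *   int dev = 0;
 *   hipChooseDevice(&dev, &want);
 *   hipSetDevice(dev);
 */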
inline static hipError_t hipChooseDevice(int* device, const hipDeviceProp_t* prop) {
    if (prop == NULL) {
        return hipErrorInvalidValue;
    }
    struct cudaDeviceProp cdprop;
    memset(&cdprop, 0x0, sizeof(struct cudaDeviceProp));
    cdprop.major = prop->major;
    cdprop.minor = prop->minor;
    cdprop.totalGlobalMem = prop->totalGlobalMem;
    cdprop.sharedMemPerBlock = prop->sharedMemPerBlock;
    cdprop.regsPerBlock = prop->regsPerBlock;
    cdprop.warpSize = prop->warpSize;
    cdprop.maxThreadsPerBlock = prop->maxThreadsPerBlock;
    cdprop.clockRate = prop->clockRate;
    cdprop.totalConstMem = prop->totalConstMem;
    cdprop.multiProcessorCount = prop->multiProcessorCount;
    cdprop.l2CacheSize = prop->l2CacheSize;
    cdprop.maxThreadsPerMultiProcessor = prop->maxThreadsPerMultiProcessor;
    cdprop.computeMode = prop->computeMode;
    cdprop.canMapHostMemory = prop->canMapHostMemory;
    cdprop.memoryClockRate = prop->memoryClockRate;
    cdprop.memoryBusWidth = prop->memoryBusWidth;
    return hipCUDAErrorTohipError(cudaChooseDevice(device, &cdprop));
}

inline static hipError_t hipMemcpyHtoD(hipDeviceptr_t dst, void* src, size_t size) {
    return hipCUResultTohipError(cuMemcpyHtoD(dst, src, size));
}

inline static hipError_t hipMemcpyDtoH(void* dst, hipDeviceptr_t src, size_t size) {
    return hipCUResultTohipError(cuMemcpyDtoH(dst, src, size));
}

inline static hipError_t hipMemcpyDtoD(hipDeviceptr_t dst, hipDeviceptr_t src, size_t size) {
    return hipCUResultTohipError(cuMemcpyDtoD(dst, src, size));
}

inline static hipError_t hipMemcpyHtoDAsync(hipDeviceptr_t dst, void* src, size_t size,
                                            hipStream_t stream) {
    return hipCUResultTohipError(cuMemcpyHtoDAsync(dst, src, size, stream));
}

inline static hipError_t hipMemcpyDtoHAsync(void* dst, hipDeviceptr_t src, size_t size,
                                            hipStream_t stream) {
    return hipCUResultTohipError(cuMemcpyDtoHAsync(dst, src, size, stream));
}

inline static hipError_t hipMemcpyDtoDAsync(hipDeviceptr_t dst, hipDeviceptr_t src, size_t size,
                                            hipStream_t stream) {
    return hipCUResultTohipError(cuMemcpyDtoDAsync(dst, src, size, stream));
}

inline static hipError_t hipMemcpy(void* dst, const void* src, size_t sizeBytes,
                                   hipMemcpyKind copyKind) {
    return hipCUDAErrorTohipError(cudaMemcpy(dst, src, sizeBytes, copyKind));
}

inline static hipError_t hipMemcpyWithStream(void* dst, const void* src, size_t sizeBytes,
                                             hipMemcpyKind copyKind, hipStream_t stream) {
    cudaError_t error = cudaMemcpyAsync(dst, src, sizeBytes, copyKind, stream);
    if (error != cudaSuccess) return hipCUDAErrorTohipError(error);
    return hipCUDAErrorTohipError(cudaStreamSynchronize(stream));
}

inline static hipError_t hipMemcpyAsync(void* dst, const void* src, size_t sizeBytes,
                                        hipMemcpyKind copyKind, hipStream_t stream __dparm(0)) {
    return hipCUDAErrorTohipError(cudaMemcpyAsync(dst, src, sizeBytes, copyKind, stream));
}

inline static hipError_t hipMemcpyToSymbol(
    const void* symbol, const void* src, size_t sizeBytes, size_t offset __dparm(0),
    hipMemcpyKind copyType __dparm(hipMemcpyKindToCudaMemcpyKind(hipMemcpyHostToDevice))) {
    return hipCUDAErrorTohipError(cudaMemcpyToSymbol(symbol, src, sizeBytes, offset, copyType));
}

inline static hipError_t hipMemcpyToSymbolAsync(const void* symbol, const void* src,
                                                size_t sizeBytes, size_t offset,
                                                hipMemcpyKind copyType,
                                                hipStream_t stream __dparm(0)) {
    return hipCUDAErrorTohipError(
        cudaMemcpyToSymbolAsync(symbol, src, sizeBytes, offset, copyType, stream));
}

inline static hipError_t hipMemcpyFromSymbol(
    void* dst, const void* symbolName, size_t sizeBytes, size_t offset __dparm(0),
    hipMemcpyKind kind __dparm(hipMemcpyKindToCudaMemcpyKind(hipMemcpyDeviceToHost))) {
    return hipCUDAErrorTohipError(cudaMemcpyFromSymbol(dst, symbolName, sizeBytes, offset, kind));
}

inline static hipError_t hipMemcpyFromSymbolAsync(void* dst, const void* symbolName,
                                                  size_t sizeBytes, size_t offset,
                                                  hipMemcpyKind kind, hipStream_t stream __dparm(0)) {
    return hipCUDAErrorTohipError(
        cudaMemcpyFromSymbolAsync(dst, symbolName, sizeBytes, offset, kind, stream));
}
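/*
 * Usage sketch (illustrative; assumes a device-side symbol such as
 * `__device__ float table[64];` is in scope):
 *
 *   float host[64];
 *   hipMemcpyToSymbol(table, host, sizeof(host));    // host -> device symbol
 *   hipMemcpyFromSymbol(host, table, sizeof(host));  // device symbol -> host
 */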
inline static hipError_t hipGetSymbolAddress(void** devPtr, const void* symbolName) {
    return hipCUDAErrorTohipError(cudaGetSymbolAddress(devPtr, symbolName));
}

inline static hipError_t hipGetSymbolSize(size_t* size, const void* symbolName) {
    return hipCUDAErrorTohipError(cudaGetSymbolSize(size, symbolName));
}

inline static hipError_t hipMemcpy2D(void* dst, size_t dpitch, const void* src, size_t spitch,
                                     size_t width, size_t height, hipMemcpyKind kind) {
    return hipCUDAErrorTohipError(cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind));
}

inline static hipError_t hipMemcpyParam2D(const hip_Memcpy2D* pCopy) {
    return hipCUResultTohipError(cuMemcpy2D(pCopy));
}

inline static hipError_t hipMemcpyParam2DAsync(const hip_Memcpy2D* pCopy,
                                               hipStream_t stream __dparm(0)) {
    return hipCUResultTohipError(cuMemcpy2DAsync(pCopy, stream));
}

inline static hipError_t hipMemcpy3D(const struct hipMemcpy3DParms* p) {
    return hipCUDAErrorTohipError(cudaMemcpy3D(p));
}

inline static hipError_t hipMemcpy3DAsync(const struct hipMemcpy3DParms* p, hipStream_t stream) {
    return hipCUDAErrorTohipError(cudaMemcpy3DAsync(p, stream));
}

inline static hipError_t hipDrvMemcpy3D(const HIP_MEMCPY3D* pCopy) {
    return hipCUResultTohipError(cuMemcpy3D(pCopy));
}

inline static hipError_t hipDrvMemcpy3DAsync(const HIP_MEMCPY3D* pCopy, hipStream_t stream) {
    return hipCUResultTohipError(cuMemcpy3DAsync(pCopy, stream));
}

inline static hipError_t hipMemcpy2DAsync(void* dst, size_t dpitch, const void* src, size_t spitch,
                                          size_t width, size_t height, hipMemcpyKind kind,
                                          hipStream_t stream) {
    return hipCUDAErrorTohipError(
        cudaMemcpy2DAsync(dst, dpitch, src, spitch, width, height, kind, stream));
}

inline static hipError_t hipMemcpy2DFromArray(void* dst, size_t dpitch, hipArray* src,
                                              size_t wOffset, size_t hOffset, size_t width,
                                              size_t height, hipMemcpyKind kind) {
    return hipCUDAErrorTohipError(
        cudaMemcpy2DFromArray(dst, dpitch, src, wOffset, hOffset, width, height, kind));
}

inline static hipError_t hipMemcpy2DFromArrayAsync(void* dst, size_t dpitch, hipArray* src,
                                                   size_t wOffset, size_t hOffset, size_t width,
                                                   size_t height, hipMemcpyKind kind,
                                                   hipStream_t stream) {
    return hipCUDAErrorTohipError(
        cudaMemcpy2DFromArrayAsync(dst, dpitch, src, wOffset, hOffset, width, height, kind, stream));
}

inline static hipError_t hipMemcpy2DToArray(hipArray* dst, size_t wOffset, size_t hOffset,
                                            const void* src, size_t spitch, size_t width,
                                            size_t height, hipMemcpyKind kind) {
    return hipCUDAErrorTohipError(
        cudaMemcpy2DToArray(dst, wOffset, hOffset, src, spitch, width, height, kind));
}

inline static hipError_t hipMemcpy2DToArrayAsync(hipArray* dst, size_t wOffset, size_t hOffset,
                                                 const void* src, size_t spitch, size_t width,
                                                 size_t height, hipMemcpyKind kind,
                                                 hipStream_t stream) {
    return hipCUDAErrorTohipError(
        cudaMemcpy2DToArrayAsync(dst, wOffset, hOffset, src, spitch, width, height, kind, stream));
}

__HIP_DEPRECATED
inline static hipError_t hipMemcpyToArray(hipArray* dst, size_t wOffset, size_t hOffset,
                                          const void* src, size_t count, hipMemcpyKind kind) {
    return hipCUDAErrorTohipError(cudaMemcpyToArray(dst, wOffset, hOffset, src, count, kind));
}

__HIP_DEPRECATED
inline static hipError_t hipMemcpyFromArray(void* dst, hipArray_const_t srcArray, size_t wOffset,
                                            size_t hOffset, size_t count, hipMemcpyKind kind) {
    return hipCUDAErrorTohipError(cudaMemcpyFromArray(dst, srcArray, wOffset, hOffset, count, kind));
}

inline static hipError_t hipMemcpyAtoH(void* dst, hipArray* srcArray, size_t srcOffset,
                                       size_t count) {
    return hipCUResultTohipError(cuMemcpyAtoH(dst, (CUarray)srcArray, srcOffset, count));
}
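/*
 * Note: hipArray aliases struct cudaArray in this header, while the driver
 * API works on CUarray handles; hence the explicit (CUarray) casts when
 * forwarding to cuMemcpyAtoH above and cuMemcpyHtoA below.
 */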
inline static hipError_t hipMemcpyHtoA(hipArray* dstArray, size_t dstOffset, const void* srcHost,
                                       size_t count) {
    return hipCUResultTohipError(cuMemcpyHtoA((CUarray)dstArray, dstOffset, srcHost, count));
}

inline static hipError_t hipDeviceSynchronize() {
    return hipCUDAErrorTohipError(cudaDeviceSynchronize());
}

inline static hipError_t hipDeviceGetCacheConfig(hipFuncCache_t* pCacheConfig) {
    return hipCUDAErrorTohipError(cudaDeviceGetCacheConfig(pCacheConfig));
}

inline static hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value) {
    return hipCUDAErrorTohipError(cudaFuncSetAttribute(func, attr, value));
}

inline static hipError_t hipDeviceSetCacheConfig(hipFuncCache_t cacheConfig) {
    return hipCUDAErrorTohipError(cudaDeviceSetCacheConfig(cacheConfig));
}

inline static hipError_t hipFuncSetSharedMemConfig(const void* func, hipSharedMemConfig config) {
    return hipCUDAErrorTohipError(cudaFuncSetSharedMemConfig(func, config));
}

inline static const char* hipGetErrorString(hipError_t error) {
    return cudaGetErrorString(hipErrorToCudaError(error));
}

inline static const char* hipGetErrorName(hipError_t error) {
    return cudaGetErrorName(hipErrorToCudaError(error));
}

inline static hipError_t hipDrvGetErrorString(hipError_t error, const char** errorString) {
    // hipErrorToCUResult() maps unrecognized codes to CUDA_ERROR_UNKNOWN; in that
    // case fall back to interpreting the value itself as a raw CUresult.
    CUresult err = hipErrorToCUResult(error);
    if (err == CUDA_ERROR_UNKNOWN) {
        return hipCUResultTohipError(cuGetErrorString((CUresult)error, errorString));
    } else {
        return hipCUResultTohipError(cuGetErrorString(err, errorString));
    }
}

inline static hipError_t hipDrvGetErrorName(hipError_t error, const char** errorString) {
    // Same fallback as hipDrvGetErrorString() above.
    CUresult err = hipErrorToCUResult(error);
    if (err == CUDA_ERROR_UNKNOWN) {
        return hipCUResultTohipError(cuGetErrorName((CUresult)error, errorString));
    } else {
        return hipCUResultTohipError(cuGetErrorName(err, errorString));
    }
}

inline static hipError_t hipGetDeviceCount(int* count) {
    return hipCUDAErrorTohipError(cudaGetDeviceCount(count));
}

inline static hipError_t hipGetDevice(int* device) {
    return hipCUDAErrorTohipError(cudaGetDevice(device));
}

inline static hipError_t hipIpcCloseMemHandle(void* devPtr) {
    return hipCUDAErrorTohipError(cudaIpcCloseMemHandle(devPtr));
}

inline static hipError_t hipIpcGetEventHandle(hipIpcEventHandle_t* handle, hipEvent_t event) {
    return hipCUDAErrorTohipError(cudaIpcGetEventHandle(handle, event));
}

inline static hipError_t hipIpcGetMemHandle(hipIpcMemHandle_t* handle, void* devPtr) {
    return hipCUDAErrorTohipError(cudaIpcGetMemHandle(handle, devPtr));
}

inline static hipError_t hipIpcOpenEventHandle(hipEvent_t* event, hipIpcEventHandle_t handle) {
    return hipCUDAErrorTohipError(cudaIpcOpenEventHandle(event, handle));
}

inline static hipError_t hipIpcOpenMemHandle(void** devPtr, hipIpcMemHandle_t handle,
                                             unsigned int flags) {
    return hipCUDAErrorTohipError(cudaIpcOpenMemHandle(devPtr, handle, flags));
}

inline static hipError_t hipMemset(void* devPtr, int value, size_t count) {
    return hipCUDAErrorTohipError(cudaMemset(devPtr, value, count));
}

inline static hipError_t hipMemsetD32(hipDeviceptr_t devPtr, int value, size_t count) {
    return hipCUResultTohipError(cuMemsetD32(devPtr, value, count));
}

inline static hipError_t hipMemsetAsync(void* devPtr, int value, size_t count,
                                        hipStream_t stream __dparm(0)) {
    return hipCUDAErrorTohipError(cudaMemsetAsync(devPtr, value, count, stream));
}

inline static hipError_t hipMemsetD32Async(hipDeviceptr_t devPtr, int value, size_t count,
                                           hipStream_t stream __dparm(0)) {
    return hipCUResultTohipError(cuMemsetD32Async(devPtr, value, count, stream));
}
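/*
 * Usage sketch (illustrative): the driver-level memset variants differ only
 * in element width; for a buffer of n 32-bit words these are equivalent:
 *
 *   hipMemsetD32(devPtr, 0, n);      // n 4-byte elements
 *   hipMemsetD8(devPtr, 0, n * 4);   // same range, cleared byte-wise
 */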
inline static hipError_t hipMemsetD8(hipDeviceptr_t dest, unsigned char value, size_t sizeBytes) {
    return hipCUResultTohipError(cuMemsetD8(dest, value, sizeBytes));
}

inline static hipError_t hipMemsetD8Async(hipDeviceptr_t dest, unsigned char value,
                                          size_t sizeBytes, hipStream_t stream __dparm(0)) {
    return hipCUResultTohipError(cuMemsetD8Async(dest, value, sizeBytes, stream));
}

inline static hipError_t hipMemsetD16(hipDeviceptr_t dest, unsigned short value, size_t sizeBytes) {
    return hipCUResultTohipError(cuMemsetD16(dest, value, sizeBytes));
}

inline static hipError_t hipMemsetD16Async(hipDeviceptr_t dest, unsigned short value,
                                           size_t sizeBytes, hipStream_t stream __dparm(0)) {
    return hipCUResultTohipError(cuMemsetD16Async(dest, value, sizeBytes, stream));
}

inline static hipError_t hipMemset2D(void* dst, size_t pitch, int value, size_t width,
                                     size_t height) {
    return hipCUDAErrorTohipError(cudaMemset2D(dst, pitch, value, width, height));
}

inline static hipError_t hipMemset2DAsync(void* dst, size_t pitch, int value, size_t width,
                                          size_t height, hipStream_t stream __dparm(0)) {
    return hipCUDAErrorTohipError(cudaMemset2DAsync(dst, pitch, value, width, height, stream));
}

inline static hipError_t hipMemset3D(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent) {
    return hipCUDAErrorTohipError(cudaMemset3D(pitchedDevPtr, value, extent));
}

inline static hipError_t hipMemset3DAsync(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent,
                                          hipStream_t stream __dparm(0)) {
    return hipCUDAErrorTohipError(cudaMemset3DAsync(pitchedDevPtr, value, extent, stream));
}

inline static hipError_t hipGetDeviceProperties(hipDeviceProp_t* p_prop, int device) {
    if (p_prop == NULL) {
        return hipErrorInvalidValue;
    }
    struct cudaDeviceProp cdprop;
    cudaError_t cerror;
    cerror = cudaGetDeviceProperties(&cdprop, device);
    strncpy(p_prop->name, cdprop.name, 256);
    p_prop->totalGlobalMem = cdprop.totalGlobalMem;
    p_prop->sharedMemPerBlock = cdprop.sharedMemPerBlock;
    p_prop->regsPerBlock = cdprop.regsPerBlock;
    p_prop->warpSize = cdprop.warpSize;
    p_prop->maxThreadsPerBlock = cdprop.maxThreadsPerBlock;
    for (int i = 0; i < 3; i++) {
        p_prop->maxThreadsDim[i] = cdprop.maxThreadsDim[i];
        p_prop->maxGridSize[i] = cdprop.maxGridSize[i];
    }
    p_prop->clockRate = cdprop.clockRate;
    p_prop->memoryClockRate = cdprop.memoryClockRate;
    p_prop->memoryBusWidth = cdprop.memoryBusWidth;
    p_prop->totalConstMem = cdprop.totalConstMem;
    p_prop->major = cdprop.major;
    p_prop->minor = cdprop.minor;
    p_prop->multiProcessorCount = cdprop.multiProcessorCount;
    p_prop->l2CacheSize = cdprop.l2CacheSize;
    p_prop->maxThreadsPerMultiProcessor = cdprop.maxThreadsPerMultiProcessor;
    p_prop->computeMode = cdprop.computeMode;
    p_prop->clockInstructionRate = cdprop.clockRate;  // Same as clock-rate

    int ccVers = p_prop->major * 100 + p_prop->minor * 10;
    p_prop->arch.hasGlobalInt32Atomics = (ccVers >= 110);
    p_prop->arch.hasGlobalFloatAtomicExch = (ccVers >= 110);
    p_prop->arch.hasSharedInt32Atomics = (ccVers >= 120);
    p_prop->arch.hasSharedFloatAtomicExch = (ccVers >= 120);
    p_prop->arch.hasFloatAtomicAdd = (ccVers >= 200);
    p_prop->arch.hasGlobalInt64Atomics = (ccVers >= 120);
    p_prop->arch.hasSharedInt64Atomics = (ccVers >= 110);
    p_prop->arch.hasDoubles = (ccVers >= 130);
    p_prop->arch.hasWarpVote = (ccVers >= 120);
    p_prop->arch.hasWarpBallot = (ccVers >= 200);
    p_prop->arch.hasWarpShuffle = (ccVers >= 300);
    p_prop->arch.hasFunnelShift = (ccVers >= 350);
p_prop->arch.hasThreadFenceSystem = (ccVers >= 200); p_prop->arch.hasSyncThreadsExt = (ccVers >= 200); p_prop->arch.hasSurfaceFuncs = (ccVers >= 200); p_prop->arch.has3dGrid = (ccVers >= 200); p_prop->arch.hasDynamicParallelism = (ccVers >= 350); p_prop->concurrentKernels = cdprop.concurrentKernels; p_prop->pciDomainID = cdprop.pciDomainID; p_prop->pciBusID = cdprop.pciBusID; p_prop->pciDeviceID = cdprop.pciDeviceID; p_prop->maxSharedMemoryPerMultiProcessor = cdprop.sharedMemPerMultiprocessor; p_prop->isMultiGpuBoard = cdprop.isMultiGpuBoard; p_prop->canMapHostMemory = cdprop.canMapHostMemory; p_prop->gcnArch = 0; // Not a GCN arch p_prop->integrated = cdprop.integrated; p_prop->cooperativeLaunch = cdprop.cooperativeLaunch; p_prop->cooperativeMultiDeviceLaunch = cdprop.cooperativeMultiDeviceLaunch; p_prop->cooperativeMultiDeviceUnmatchedFunc = 0; p_prop->cooperativeMultiDeviceUnmatchedGridDim = 0; p_prop->cooperativeMultiDeviceUnmatchedBlockDim = 0; p_prop->cooperativeMultiDeviceUnmatchedSharedMem = 0; p_prop->maxTexture1D = cdprop.maxTexture1D; p_prop->maxTexture2D[0] = cdprop.maxTexture2D[0]; p_prop->maxTexture2D[1] = cdprop.maxTexture2D[1]; p_prop->maxTexture3D[0] = cdprop.maxTexture3D[0]; p_prop->maxTexture3D[1] = cdprop.maxTexture3D[1]; p_prop->maxTexture3D[2] = cdprop.maxTexture3D[2]; p_prop->memPitch = cdprop.memPitch; p_prop->textureAlignment = cdprop.textureAlignment; p_prop->texturePitchAlignment = cdprop.texturePitchAlignment; p_prop->kernelExecTimeoutEnabled = cdprop.kernelExecTimeoutEnabled; p_prop->ECCEnabled = cdprop.ECCEnabled; p_prop->tccDriver = cdprop.tccDriver; return hipCUDAErrorTohipError(cerror); } inline static hipError_t hipDeviceGetAttribute(int* pi, hipDeviceAttribute_t attr, int device) { enum cudaDeviceAttr cdattr; cudaError_t cerror; switch (attr) { case hipDeviceAttributeMaxThreadsPerBlock: cdattr = cudaDevAttrMaxThreadsPerBlock; break; case hipDeviceAttributeMaxBlockDimX: cdattr = cudaDevAttrMaxBlockDimX; break; case hipDeviceAttributeMaxBlockDimY: cdattr = cudaDevAttrMaxBlockDimY; break; case hipDeviceAttributeMaxBlockDimZ: cdattr = cudaDevAttrMaxBlockDimZ; break; case hipDeviceAttributeMaxGridDimX: cdattr = cudaDevAttrMaxGridDimX; break; case hipDeviceAttributeMaxGridDimY: cdattr = cudaDevAttrMaxGridDimY; break; case hipDeviceAttributeMaxGridDimZ: cdattr = cudaDevAttrMaxGridDimZ; break; case hipDeviceAttributeMaxSharedMemoryPerBlock: cdattr = cudaDevAttrMaxSharedMemoryPerBlock; break; case hipDeviceAttributeTotalConstantMemory: cdattr = cudaDevAttrTotalConstantMemory; break; case hipDeviceAttributeWarpSize: cdattr = cudaDevAttrWarpSize; break; case hipDeviceAttributeMaxRegistersPerBlock: cdattr = cudaDevAttrMaxRegistersPerBlock; break; case hipDeviceAttributeClockRate: cdattr = cudaDevAttrClockRate; break; case hipDeviceAttributeMemoryClockRate: cdattr = cudaDevAttrMemoryClockRate; break; case hipDeviceAttributeMemoryBusWidth: cdattr = cudaDevAttrGlobalMemoryBusWidth; break; case hipDeviceAttributeMultiprocessorCount: cdattr = cudaDevAttrMultiProcessorCount; break; case hipDeviceAttributeComputeMode: cdattr = cudaDevAttrComputeMode; break; case hipDeviceAttributeL2CacheSize: cdattr = cudaDevAttrL2CacheSize; break; case hipDeviceAttributeMaxThreadsPerMultiProcessor: cdattr = cudaDevAttrMaxThreadsPerMultiProcessor; break; case hipDeviceAttributeComputeCapabilityMajor: cdattr = cudaDevAttrComputeCapabilityMajor; break; case hipDeviceAttributeComputeCapabilityMinor: cdattr = cudaDevAttrComputeCapabilityMinor; break; case hipDeviceAttributeConcurrentKernels: 
cdattr = cudaDevAttrConcurrentKernels; break; case hipDeviceAttributePciBusId: cdattr = cudaDevAttrPciBusId; break; case hipDeviceAttributePciDeviceId: cdattr = cudaDevAttrPciDeviceId; break; case hipDeviceAttributeMaxSharedMemoryPerMultiprocessor: cdattr = cudaDevAttrMaxSharedMemoryPerMultiprocessor; break; case hipDeviceAttributeIsMultiGpuBoard: cdattr = cudaDevAttrIsMultiGpuBoard; break; case hipDeviceAttributeIntegrated: cdattr = cudaDevAttrIntegrated; break; case hipDeviceAttributeMaxTexture1DWidth: cdattr = cudaDevAttrMaxTexture1DWidth; break; case hipDeviceAttributeMaxTexture2DWidth: cdattr = cudaDevAttrMaxTexture2DWidth; break; case hipDeviceAttributeMaxTexture2DHeight: cdattr = cudaDevAttrMaxTexture2DHeight; break; case hipDeviceAttributeMaxTexture3DWidth: cdattr = cudaDevAttrMaxTexture3DWidth; break; case hipDeviceAttributeMaxTexture3DHeight: cdattr = cudaDevAttrMaxTexture3DHeight; break; case hipDeviceAttributeMaxTexture3DDepth: cdattr = cudaDevAttrMaxTexture3DDepth; break; case hipDeviceAttributeMaxPitch: cdattr = cudaDevAttrMaxPitch; break; case hipDeviceAttributeTextureAlignment: cdattr = cudaDevAttrTextureAlignment; break; case hipDeviceAttributeTexturePitchAlignment: cdattr = cudaDevAttrTexturePitchAlignment; break; case hipDeviceAttributeKernelExecTimeout: cdattr = cudaDevAttrKernelExecTimeout; break; case hipDeviceAttributeCanMapHostMemory: cdattr = cudaDevAttrCanMapHostMemory; break; case hipDeviceAttributeEccEnabled: cdattr = cudaDevAttrEccEnabled; break; case hipDeviceAttributeCooperativeLaunch: cdattr = cudaDevAttrCooperativeLaunch; break; case hipDeviceAttributeCooperativeMultiDeviceLaunch: cdattr = cudaDevAttrCooperativeMultiDeviceLaunch; break; case hipDeviceAttributeConcurrentManagedAccess: cdattr = cudaDevAttrConcurrentManagedAccess; break; case hipDeviceAttributeManagedMemory: cdattr = cudaDevAttrManagedMemory; break; case hipDeviceAttributePageableMemoryAccessUsesHostPageTables: cdattr = cudaDevAttrPageableMemoryAccessUsesHostPageTables; break; case hipDeviceAttributePageableMemoryAccess: cdattr = cudaDevAttrPageableMemoryAccess; break; case hipDeviceAttributeDirectManagedMemAccessFromHost: cdattr = cudaDevAttrDirectManagedMemAccessFromHost; break; case hipDeviceAttributeGlobalL1CacheSupported: cdattr = cudaDevAttrGlobalL1CacheSupported; break; case hipDeviceAttributeMaxBlocksPerMultiProcessor: cdattr = cudaDevAttrMaxBlocksPerMultiprocessor; break; case hipDeviceAttributeMultiGpuBoardGroupID: cdattr = cudaDevAttrMultiGpuBoardGroupID; break; case hipDeviceAttributeReservedSharedMemPerBlock: cdattr = cudaDevAttrReservedSharedMemoryPerBlock; break; case hipDeviceAttributeSingleToDoublePrecisionPerfRatio: cdattr = cudaDevAttrSingleToDoublePrecisionPerfRatio; break; case hipDeviceAttributeStreamPrioritiesSupported: cdattr = cudaDevAttrStreamPrioritiesSupported; break; case hipDeviceAttributeSurfaceAlignment: cdattr = cudaDevAttrSurfaceAlignment; break; case hipDeviceAttributeTccDriver: cdattr = cudaDevAttrTccDriver; break; case hipDeviceAttributeUnifiedAddressing: cdattr = cudaDevAttrUnifiedAddressing; break; #if CUDA_VERSION >= CUDA_11020 case hipDeviceAttributeMemoryPoolsSupported: cdattr = cudaDevAttrMemoryPoolsSupported; break; #endif // CUDA_VERSION >= CUDA_11020 case hipDeviceAttributeVirtualMemoryManagementSupported: return hipCUResultTohipError(cuDeviceGetAttribute(pi, CU_DEVICE_ATTRIBUTE_VIRTUAL_MEMORY_MANAGEMENT_SUPPORTED, device)); case hipDeviceAttributeAccessPolicyMaxWindowSize: cdattr = cudaDevAttrMaxAccessPolicyWindowSize; break; case 
hipDeviceAttributeAsyncEngineCount: cdattr = cudaDevAttrAsyncEngineCount; break; case hipDeviceAttributeCanUseHostPointerForRegisteredMem: cdattr = cudaDevAttrCanUseHostPointerForRegisteredMem; break; case hipDeviceAttributeComputePreemptionSupported: cdattr = cudaDevAttrComputePreemptionSupported; break; case hipDeviceAttributeHostNativeAtomicSupported: cdattr = cudaDevAttrHostNativeAtomicSupported; break; default: return hipCUDAErrorTohipError(cudaErrorInvalidValue); } cerror = cudaDeviceGetAttribute(pi, cdattr, device); return hipCUDAErrorTohipError(cerror); } #if CUDA_VERSION >= CUDA_10020 inline static CUmemAllocationProp hipMemAllocationPropToCUmemAllocationProp(const hipMemAllocationProp* prop) { CUmemAllocationProp cuProp; cuProp.type = (CUmemAllocationType)prop->type; cuProp.requestedHandleTypes = (CUmemAllocationHandleType)prop->requestedHandleTypes; cuProp.location.type = (CUmemLocationType)prop->location.type; cuProp.location.id = prop->location.id; cuProp.win32HandleMetaData = prop->win32HandleMetaData; cuProp.allocFlags.compressionType = prop->allocFlags.compressionType; cuProp.allocFlags.gpuDirectRDMACapable = prop->allocFlags.gpuDirectRDMACapable; cuProp.allocFlags.usage = prop->allocFlags.usage; cuProp.allocFlags.reserved[0] = prop->allocFlags.reserved[0]; cuProp.allocFlags.reserved[1] = prop->allocFlags.reserved[1]; cuProp.allocFlags.reserved[2] = prop->allocFlags.reserved[2]; cuProp.allocFlags.reserved[3] = prop->allocFlags.reserved[3]; return cuProp; } inline static CUmemLocation hipMemLocationToCUmemLocation(const hipMemLocation* loc) { CUmemLocation cuLoc; cuLoc.id = loc->id; cuLoc.type = (CUmemLocationType)loc->type; return cuLoc; } inline static CUmemAccessDesc hipMemAccessDescToCUmemAccessDesc(const hipMemAccessDesc* desc) { CUmemAccessDesc cuDesc; cuDesc.flags = (CUmemAccess_flags)desc->flags; cuDesc.location.id = (desc->location).id; cuDesc.location.type = (CUmemLocationType)((desc->location).type); return cuDesc; } inline static hipError_t hipMemGetAllocationGranularity(size_t* granularity, const hipMemAllocationProp* prop, hipMemAllocationGranularity_flags option) { CUmemAllocationProp cuProp = hipMemAllocationPropToCUmemAllocationProp(prop); return hipCUResultTohipError(cuMemGetAllocationGranularity(granularity, &cuProp, option)); } inline static hipError_t hipMemCreate(hipMemGenericAllocationHandle_t* handle, size_t size, const hipMemAllocationProp* prop, unsigned long long flags) { CUmemAllocationProp cuProp = hipMemAllocationPropToCUmemAllocationProp(prop); return hipCUResultTohipError(cuMemCreate(handle, size, &cuProp, flags)); } inline static hipError_t hipMemRelease(hipMemGenericAllocationHandle_t handle) { return hipCUResultTohipError(cuMemRelease(handle)); } inline static hipError_t hipMemAddressFree(hipDeviceptr_t ptr, size_t size) { return hipCUResultTohipError(cuMemAddressFree(ptr, size)); } inline static hipError_t hipMemAddressReserve(hipDeviceptr_t* ptr, size_t size, size_t alignment, hipDeviceptr_t addr, unsigned long long flags) { return hipCUResultTohipError(cuMemAddressReserve(ptr, size, alignment, addr, flags)); } inline static hipError_t hipMemExportToShareableHandle(void* shareableHandle, hipMemGenericAllocationHandle_t handle, hipMemAllocationHandleType handleType, unsigned long long flags) { return hipCUResultTohipError(cuMemExportToShareableHandle(shareableHandle, handle, (CUmemAllocationHandleType)handleType, flags)); } inline static hipError_t hipMemGetAccess(unsigned long long* flags, const hipMemLocation* location, hipDeviceptr_t 
ptr) { CUmemLocation loc = hipMemLocationToCUmemLocation(location); return hipCUResultTohipError(cuMemGetAccess(flags, &loc, ptr)); }
inline static hipError_t hipMemGetAllocationPropertiesFromHandle(hipMemAllocationProp* prop, hipMemGenericAllocationHandle_t handle) {
    CUmemAllocationProp cuProp;
    hipError_t err = hipCUResultTohipError(cuMemGetAllocationPropertiesFromHandle(&cuProp, handle));
    if (err != hipSuccess) return err;
    // Copy the queried CUDA properties back into the caller's struct so the query
    // result is actually returned rather than left in a local temporary.
    prop->type = (hipMemAllocationType)cuProp.type;
    prop->requestedHandleTypes = (hipMemAllocationHandleType)cuProp.requestedHandleTypes;
    prop->location.type = (hipMemLocationType)cuProp.location.type;
    prop->location.id = cuProp.location.id;
    prop->win32HandleMetaData = cuProp.win32HandleMetaData;
    prop->allocFlags.compressionType = cuProp.allocFlags.compressionType;
    prop->allocFlags.gpuDirectRDMACapable = cuProp.allocFlags.gpuDirectRDMACapable;
    prop->allocFlags.usage = cuProp.allocFlags.usage;
    return hipSuccess;
}
inline static hipError_t hipMemImportFromShareableHandle(hipMemGenericAllocationHandle_t* handle, void* osHandle, hipMemAllocationHandleType shHandleType) { return hipCUResultTohipError(cuMemImportFromShareableHandle(handle, osHandle, (CUmemAllocationHandleType)shHandleType)); }
inline static hipError_t hipMemMap(hipDeviceptr_t ptr, size_t size, size_t offset, hipMemGenericAllocationHandle_t handle, unsigned long long flags) { return hipCUResultTohipError(cuMemMap(ptr, size, offset, handle, flags)); }
inline static hipError_t hipMemMapArrayAsync(hipArrayMapInfo* mapInfoList, unsigned int count, hipStream_t stream) { return hipCUResultTohipError(cuMemMapArrayAsync(mapInfoList, count, stream)); }
inline static hipError_t hipMemRetainAllocationHandle(hipMemGenericAllocationHandle_t* handle, void* addr) { return hipCUResultTohipError(cuMemRetainAllocationHandle(handle, addr)); }
inline static hipError_t hipMemSetAccess(hipDeviceptr_t ptr, size_t size, const hipMemAccessDesc* desc, size_t count) { CUmemAccessDesc cuDesc = hipMemAccessDescToCUmemAccessDesc(desc); return hipCUResultTohipError(cuMemSetAccess(ptr, size, &cuDesc, count)); }
inline static hipError_t hipMemUnmap(hipDeviceptr_t ptr, size_t size) { return hipCUResultTohipError(cuMemUnmap(ptr, size)); }
#endif // CUDA_VERSION >= CUDA_10020
inline static hipError_t hipOccupancyMaxActiveBlocksPerMultiprocessor(int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize) { return hipCUDAErrorTohipError(cudaOccupancyMaxActiveBlocksPerMultiprocessor(numBlocks, func, blockSize, dynamicSMemSize)); }
inline static hipError_t hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize, unsigned int flags) { return hipCUDAErrorTohipError(cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(numBlocks, func, blockSize, dynamicSMemSize, flags)); }
inline static hipError_t hipModuleOccupancyMaxActiveBlocksPerMultiprocessor(int* numBlocks, hipFunction_t f, int blockSize, size_t dynamicSMemSize) { return hipCUResultTohipError(cuOccupancyMaxActiveBlocksPerMultiprocessor(numBlocks, f, blockSize, dynamicSMemSize)); }
inline static hipError_t hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(int* numBlocks, hipFunction_t f, int blockSize, size_t dynamicSMemSize, unsigned int flags) { return hipCUResultTohipError(cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(numBlocks, f, blockSize, dynamicSMemSize, flags)); }
//TODO - Match CUoccupancyB2DSize
inline static hipError_t hipModuleOccupancyMaxPotentialBlockSize(int* gridSize, int* blockSize, hipFunction_t f, size_t dynSharedMemPerBlk, int blockSizeLimit) { return hipCUResultTohipError(cuOccupancyMaxPotentialBlockSize(gridSize, blockSize, f, NULL, dynSharedMemPerBlk, blockSizeLimit)); }
//TODO - Match CUoccupancyB2DSize
inline static hipError_t hipModuleOccupancyMaxPotentialBlockSizeWithFlags(int* gridSize, int* blockSize, hipFunction_t f, size_t dynSharedMemPerBlk, int blockSizeLimit, unsigned int flags) { return hipCUResultTohipError(cuOccupancyMaxPotentialBlockSizeWithFlags(gridSize,
blockSize, f, NULL, dynSharedMemPerBlk, blockSizeLimit, flags)); }
inline static hipError_t hipPointerGetAttributes(hipPointerAttribute_t* attributes, const void* ptr) {
    struct cudaPointerAttributes cPA;
    hipError_t err = hipCUDAErrorTohipError(cudaPointerGetAttributes(&cPA, ptr));
    if (err == hipSuccess) {
#if (CUDART_VERSION >= 11000)
        auto memType = cPA.type;
#else
        unsigned memType = cPA.memoryType; // No auto because CUDA 10.2 doesn't force C++11
#endif
        switch (memType) {
            case cudaMemoryTypeDevice: attributes->type = hipMemoryTypeDevice; break;
            case cudaMemoryTypeHost: attributes->type = hipMemoryTypeHost; break;
            case cudaMemoryTypeManaged: attributes->type = hipMemoryTypeManaged; break;
            default: return hipErrorInvalidValue;
        }
        attributes->device = cPA.device;
        attributes->devicePointer = cPA.devicePointer;
        attributes->hostPointer = cPA.hostPointer;
        // Mirror the managed memory type into isManaged for callers that check the flag.
        attributes->isManaged = (memType == cudaMemoryTypeManaged);
        attributes->allocationFlags = 0;
    }
    return err;
}
inline static hipError_t hipPointerGetAttribute(void* data, hipPointer_attribute attribute, hipDeviceptr_t ptr) { return hipCUResultTohipError(cuPointerGetAttribute(data, attribute, ptr)); }
inline static hipError_t hipDrvPointerGetAttributes(unsigned int numAttributes, hipPointer_attribute* attributes, void** data, hipDeviceptr_t ptr) { return hipCUResultTohipError(cuPointerGetAttributes(numAttributes, attributes, data, ptr)); }
inline static hipError_t hipMemGetInfo(size_t* free, size_t* total) { return hipCUDAErrorTohipError(cudaMemGetInfo(free, total)); }
inline static hipError_t hipEventCreate(hipEvent_t* event) { return hipCUDAErrorTohipError(cudaEventCreate(event)); }
inline static hipError_t hipEventRecord(hipEvent_t event, hipStream_t stream __dparm(NULL)) { return hipCUDAErrorTohipError(cudaEventRecord(event, stream)); }
inline static hipError_t hipEventSynchronize(hipEvent_t event) { return hipCUDAErrorTohipError(cudaEventSynchronize(event)); }
inline static hipError_t hipEventElapsedTime(float* ms, hipEvent_t start, hipEvent_t stop) { return hipCUDAErrorTohipError(cudaEventElapsedTime(ms, start, stop)); }
inline static hipError_t hipEventDestroy(hipEvent_t event) { return hipCUDAErrorTohipError(cudaEventDestroy(event)); }
inline static hipError_t hipStreamCreateWithFlags(hipStream_t* stream, unsigned int flags) { return hipCUDAErrorTohipError(cudaStreamCreateWithFlags(stream, flags)); }
inline static hipError_t hipStreamCreateWithPriority(hipStream_t* stream, unsigned int flags, int priority) { return hipCUDAErrorTohipError(cudaStreamCreateWithPriority(stream, flags, priority)); }
inline static hipError_t hipDeviceGetStreamPriorityRange(int* leastPriority, int* greatestPriority) { return hipCUDAErrorTohipError(cudaDeviceGetStreamPriorityRange(leastPriority, greatestPriority)); }
inline static hipError_t hipStreamCreate(hipStream_t* stream) { return hipCUDAErrorTohipError(cudaStreamCreate(stream)); }
inline static hipError_t hipStreamSynchronize(hipStream_t stream) { return hipCUDAErrorTohipError(cudaStreamSynchronize(stream)); }
inline static hipError_t hipStreamDestroy(hipStream_t stream) { return hipCUDAErrorTohipError(cudaStreamDestroy(stream)); }
inline static hipError_t hipStreamGetFlags(hipStream_t stream, unsigned int *flags) { return hipCUDAErrorTohipError(cudaStreamGetFlags(stream, flags)); }
inline static hipError_t hipStreamGetPriority(hipStream_t stream, int *priority) { return hipCUDAErrorTohipError(cudaStreamGetPriority(stream, priority)); }
inline static hipError_t hipStreamWaitEvent(hipStream_t stream, hipEvent_t event, unsigned int flags) {
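    // Note (assumption about current CUDA releases): only cudaEventWaitDefault (0) and
    // cudaEventWaitExternal are meaningful here; 'flags' is forwarded to
    // cudaStreamWaitEvent unchanged, so any other bits are the caller's responsibility.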
return hipCUDAErrorTohipError(cudaStreamWaitEvent(stream, event, flags)); } inline static hipError_t hipStreamQuery(hipStream_t stream) { return hipCUDAErrorTohipError(cudaStreamQuery(stream)); } inline static hipError_t hipStreamAddCallback(hipStream_t stream, hipStreamCallback_t callback, void* userData, unsigned int flags) { return hipCUDAErrorTohipError( cudaStreamAddCallback(stream, (cudaStreamCallback_t)callback, userData, flags)); } inline static hipError_t hipStreamGetDevice(hipStream_t stream, hipDevice_t* device) { hipCtx_t context; auto err = hipCUResultTohipError(cuStreamGetCtx(stream, &context)); if (err != hipSuccess) return err; err = hipCUResultTohipError(cuCtxPushCurrent(context)); if (err != hipSuccess) return err; err = hipCUResultTohipError(cuCtxGetDevice(device)); if (err != hipSuccess) return err; return hipCUResultTohipError(cuCtxPopCurrent(&context)); } inline static hipError_t hipDriverGetVersion(int* driverVersion) { return hipCUDAErrorTohipError(cudaDriverGetVersion(driverVersion)); } inline static hipError_t hipRuntimeGetVersion(int* runtimeVersion) { return hipCUDAErrorTohipError(cudaRuntimeGetVersion(runtimeVersion)); } inline static hipError_t hipDeviceCanAccessPeer(int* canAccessPeer, int device, int peerDevice) { return hipCUDAErrorTohipError(cudaDeviceCanAccessPeer(canAccessPeer, device, peerDevice)); } inline static hipError_t hipDeviceDisablePeerAccess(int peerDevice) { return hipCUDAErrorTohipError(cudaDeviceDisablePeerAccess(peerDevice)); } inline static hipError_t hipDeviceEnablePeerAccess(int peerDevice, unsigned int flags) { return hipCUDAErrorTohipError(cudaDeviceEnablePeerAccess(peerDevice, flags)); } inline static hipError_t hipCtxDisablePeerAccess(hipCtx_t peerCtx) { return hipCUResultTohipError(cuCtxDisablePeerAccess(peerCtx)); } inline static hipError_t hipCtxEnablePeerAccess(hipCtx_t peerCtx, unsigned int flags) { return hipCUResultTohipError(cuCtxEnablePeerAccess(peerCtx, flags)); } inline static hipError_t hipDevicePrimaryCtxGetState(hipDevice_t dev, unsigned int* flags, int* active) { return hipCUResultTohipError(cuDevicePrimaryCtxGetState(dev, flags, active)); } inline static hipError_t hipDevicePrimaryCtxRelease(hipDevice_t dev) { return hipCUResultTohipError(cuDevicePrimaryCtxRelease(dev)); } inline static hipError_t hipDevicePrimaryCtxRetain(hipCtx_t* pctx, hipDevice_t dev) { return hipCUResultTohipError(cuDevicePrimaryCtxRetain(pctx, dev)); } inline static hipError_t hipDevicePrimaryCtxReset(hipDevice_t dev) { return hipCUResultTohipError(cuDevicePrimaryCtxReset(dev)); } inline static hipError_t hipDevicePrimaryCtxSetFlags(hipDevice_t dev, unsigned int flags) { return hipCUResultTohipError(cuDevicePrimaryCtxSetFlags(dev, flags)); } inline static hipError_t hipMemGetAddressRange(hipDeviceptr_t* pbase, size_t* psize, hipDeviceptr_t dptr) { return hipCUResultTohipError(cuMemGetAddressRange(pbase, psize, dptr)); } inline static hipError_t hipMemcpyPeer(void* dst, int dstDevice, const void* src, int srcDevice, size_t count) { return hipCUDAErrorTohipError(cudaMemcpyPeer(dst, dstDevice, src, srcDevice, count)); } inline static hipError_t hipMemcpyPeerAsync(void* dst, int dstDevice, const void* src, int srcDevice, size_t count, hipStream_t stream __dparm(0)) { return hipCUDAErrorTohipError( cudaMemcpyPeerAsync(dst, dstDevice, src, srcDevice, count, stream)); } // Profile APIs: inline static hipError_t hipProfilerStart() { return hipCUDAErrorTohipError(cudaProfilerStart()); } inline static hipError_t hipProfilerStop() { return 
hipCUDAErrorTohipError(cudaProfilerStop()); } inline static hipError_t hipGetDeviceFlags(unsigned int* flags) { return hipCUDAErrorTohipError(cudaGetDeviceFlags(flags)); } inline static hipError_t hipSetDeviceFlags(unsigned int flags) { return hipCUDAErrorTohipError(cudaSetDeviceFlags(flags)); } inline static hipError_t hipEventCreateWithFlags(hipEvent_t* event, unsigned int flags) { return hipCUDAErrorTohipError(cudaEventCreateWithFlags(event, flags)); } inline static hipError_t hipEventQuery(hipEvent_t event) { return hipCUDAErrorTohipError(cudaEventQuery(event)); } inline static hipError_t hipCtxCreate(hipCtx_t* ctx, unsigned int flags, hipDevice_t device) { return hipCUResultTohipError(cuCtxCreate(ctx, flags, device)); } inline static hipError_t hipCtxDestroy(hipCtx_t ctx) { return hipCUResultTohipError(cuCtxDestroy(ctx)); } inline static hipError_t hipCtxPopCurrent(hipCtx_t* ctx) { return hipCUResultTohipError(cuCtxPopCurrent(ctx)); } inline static hipError_t hipCtxPushCurrent(hipCtx_t ctx) { return hipCUResultTohipError(cuCtxPushCurrent(ctx)); } inline static hipError_t hipCtxSetCurrent(hipCtx_t ctx) { return hipCUResultTohipError(cuCtxSetCurrent(ctx)); } inline static hipError_t hipCtxGetCurrent(hipCtx_t* ctx) { return hipCUResultTohipError(cuCtxGetCurrent(ctx)); } inline static hipError_t hipCtxGetDevice(hipDevice_t* device) { return hipCUResultTohipError(cuCtxGetDevice(device)); } inline static hipError_t hipCtxGetApiVersion(hipCtx_t ctx, int* apiVersion) { return hipCUResultTohipError(cuCtxGetApiVersion(ctx, (unsigned int*)apiVersion)); } inline static hipError_t hipCtxGetCacheConfig(hipFuncCache* cacheConfig) { return hipCUResultTohipError(cuCtxGetCacheConfig(cacheConfig)); } inline static hipError_t hipCtxSetCacheConfig(hipFuncCache cacheConfig) { return hipCUResultTohipError(cuCtxSetCacheConfig(cacheConfig)); } inline static hipError_t hipCtxSetSharedMemConfig(hipSharedMemConfig config) { return hipCUResultTohipError(cuCtxSetSharedMemConfig((CUsharedconfig)config)); } inline static hipError_t hipCtxGetSharedMemConfig(hipSharedMemConfig* pConfig) { return hipCUResultTohipError(cuCtxGetSharedMemConfig((CUsharedconfig*)pConfig)); } inline static hipError_t hipCtxSynchronize(void) { return hipCUResultTohipError(cuCtxSynchronize()); } inline static hipError_t hipCtxGetFlags(unsigned int* flags) { return hipCUResultTohipError(cuCtxGetFlags(flags)); } inline static hipError_t hipCtxDetach(hipCtx_t ctx) { return hipCUResultTohipError(cuCtxDetach(ctx)); } inline static hipError_t hipDeviceGet(hipDevice_t* device, int ordinal) { return hipCUResultTohipError(cuDeviceGet(device, ordinal)); } inline static hipError_t hipDeviceComputeCapability(int* major, int* minor, hipDevice_t device) { return hipCUResultTohipError(cuDeviceComputeCapability(major, minor, device)); } inline static hipError_t hipDeviceGetName(char* name, int len, hipDevice_t device) { return hipCUResultTohipError(cuDeviceGetName(name, len, device)); } inline static hipError_t hipDeviceGetUuid(hipUUID* uuid, hipDevice_t device) { if (uuid == NULL) { return hipErrorInvalidValue; } struct CUuuid_st CUuid; hipError_t err = hipCUResultTohipError(cuDeviceGetUuid(&CUuid, device)); if (err == hipSuccess) { strncpy(uuid->bytes, CUuid.bytes, 16); } return err; } inline static hipError_t hipDeviceGetP2PAttribute(int* value, hipDeviceP2PAttr attr, int srcDevice, int dstDevice) { return hipCUDAErrorTohipError(cudaDeviceGetP2PAttribute(value, attr, srcDevice, dstDevice)); } inline static hipError_t hipDeviceGetPCIBusId(char* pciBusId, 
int len, hipDevice_t device) { return hipCUDAErrorTohipError(cudaDeviceGetPCIBusId(pciBusId, len, device)); } inline static hipError_t hipDeviceGetByPCIBusId(int* device, const char* pciBusId) { return hipCUDAErrorTohipError(cudaDeviceGetByPCIBusId(device, pciBusId)); } inline static hipError_t hipDeviceGetSharedMemConfig(hipSharedMemConfig* config) { return hipCUDAErrorTohipError(cudaDeviceGetSharedMemConfig(config)); } inline static hipError_t hipDeviceSetSharedMemConfig(hipSharedMemConfig config) { return hipCUDAErrorTohipError(cudaDeviceSetSharedMemConfig(config)); } inline static hipError_t hipDeviceGetLimit(size_t* pValue, hipLimit_t limit) { return hipCUDAErrorTohipError(cudaDeviceGetLimit(pValue, limit)); } inline static hipError_t hipDeviceSetLimit(hipLimit_t limit, size_t value) { return hipCUDAErrorTohipError(cudaDeviceSetLimit(limit, value)); } inline static hipError_t hipDeviceTotalMem(size_t* bytes, hipDevice_t device) { return hipCUResultTohipError(cuDeviceTotalMem(bytes, device)); } inline static hipError_t hipModuleLoad(hipModule_t* module, const char* fname) { return hipCUResultTohipError(cuModuleLoad(module, fname)); } inline static hipError_t hipModuleUnload(hipModule_t hmod) { return hipCUResultTohipError(cuModuleUnload(hmod)); } inline static hipError_t hipModuleGetFunction(hipFunction_t* function, hipModule_t module, const char* kname) { return hipCUResultTohipError(cuModuleGetFunction(function, module, kname)); } inline static hipError_t hipModuleGetTexRef(hipTexRef* pTexRef, hipModule_t hmod, const char* name){ return hipCUResultTohipError(cuModuleGetTexRef(pTexRef, hmod, name)); } inline static hipError_t hipFuncGetAttributes(hipFuncAttributes* attr, const void* func) { return hipCUDAErrorTohipError(cudaFuncGetAttributes(attr, func)); } inline static hipError_t hipFuncGetAttribute (int* value, hipFunction_attribute attrib, hipFunction_t hfunc) { return hipCUResultTohipError(cuFuncGetAttribute(value, attrib, hfunc)); } inline static hipError_t hipModuleGetGlobal(hipDeviceptr_t* dptr, size_t* bytes, hipModule_t hmod, const char* name) { return hipCUResultTohipError(cuModuleGetGlobal(dptr, bytes, hmod, name)); } inline static hipError_t hipModuleLoadData(hipModule_t* module, const void* image) { return hipCUResultTohipError(cuModuleLoadData(module, image)); } inline static hipError_t hipModuleLoadDataEx(hipModule_t* module, const void* image, unsigned int numOptions, hipJitOption* options, void** optionValues) { return hipCUResultTohipError( cuModuleLoadDataEx(module, image, numOptions, options, optionValues)); } inline static hipError_t hipLaunchKernel(const void* function_address, dim3 numBlocks, dim3 dimBlocks, void** args, size_t sharedMemBytes, hipStream_t stream) { return hipCUDAErrorTohipError( cudaLaunchKernel(function_address, numBlocks, dimBlocks, args, sharedMemBytes, stream)); } inline static hipError_t hipModuleLaunchKernel(hipFunction_t f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, hipStream_t stream, void** kernelParams, void** extra) { return hipCUResultTohipError(cuLaunchKernel(f, gridDimX, gridDimY, gridDimZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, stream, kernelParams, extra)); } inline static hipError_t hipFuncSetCacheConfig(const void* func, hipFuncCache_t cacheConfig) { return hipCUDAErrorTohipError(cudaFuncSetCacheConfig(func, cacheConfig)); } #if CUDA_VERSION < CUDA_12000 __HIP_DEPRECATED inline static 
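// The texture-reference APIs in this CUDA_VERSION < CUDA_12000 block were removed from
// CUDA 12.0; texture objects (hipCreateTextureObject below) are the replacement.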
hipError_t hipBindTexture(size_t* offset, struct textureReference* tex, const void* devPtr, const hipChannelFormatDesc* desc, size_t size __dparm(UINT_MAX)) { return hipCUDAErrorTohipError(cudaBindTexture(offset, tex, devPtr, desc, size)); } __HIP_DEPRECATED inline static hipError_t hipBindTexture2D( size_t* offset, struct textureReference* tex, const void* devPtr, const hipChannelFormatDesc* desc, size_t width, size_t height, size_t pitch) { return hipCUDAErrorTohipError(cudaBindTexture2D(offset, tex, devPtr, desc, width, height, pitch)); } #endif // CUDA_VERSION < CUDA_12000 inline static hipChannelFormatDesc hipCreateChannelDesc(int x, int y, int z, int w, hipChannelFormatKind f) { return cudaCreateChannelDesc(x, y, z, w, hipChannelFormatKindToCudaChannelFormatKind(f)); } inline static hipChannelFormatDesc hipCreateChannelDescHalf() { int e = (int)sizeof(unsigned short) * 8; return cudaCreateChannelDesc(e, 0, 0, 0, cudaChannelFormatKindFloat); } inline static hipChannelFormatDesc hipCreateChannelDescHalf1() { int e = (int)sizeof(unsigned short) * 8; return cudaCreateChannelDesc(e, 0, 0, 0, cudaChannelFormatKindFloat); } inline static hipChannelFormatDesc hipCreateChannelDescHalf2() { int e = (int)sizeof(unsigned short) * 8; return cudaCreateChannelDesc(e, e, 0, 0, cudaChannelFormatKindFloat); } inline static hipChannelFormatDesc hipCreateChannelDescHalf4() { int e = (int)sizeof(unsigned short) * 8; return cudaCreateChannelDesc(e, e, e, e, cudaChannelFormatKindFloat); } inline static hipError_t hipCreateTextureObject(hipTextureObject_t* pTexObject, const hipResourceDesc* pResDesc, const hipTextureDesc* pTexDesc, const hipResourceViewDesc* pResViewDesc) { return hipCUDAErrorTohipError( cudaCreateTextureObject(pTexObject, pResDesc, pTexDesc, pResViewDesc)); } inline static hipError_t hipDestroyTextureObject(hipTextureObject_t textureObject) { return hipCUDAErrorTohipError(cudaDestroyTextureObject(textureObject)); } inline static hipError_t hipCreateSurfaceObject(hipSurfaceObject_t* pSurfObject, const hipResourceDesc* pResDesc) { return hipCUDAErrorTohipError(cudaCreateSurfaceObject(pSurfObject, pResDesc)); } inline static hipError_t hipDestroySurfaceObject(hipSurfaceObject_t surfaceObject) { return hipCUDAErrorTohipError(cudaDestroySurfaceObject(surfaceObject)); } inline static hipError_t hipGetTextureObjectResourceDesc(hipResourceDesc* pResDesc, hipTextureObject_t textureObject) { return hipCUDAErrorTohipError(cudaGetTextureObjectResourceDesc( pResDesc, textureObject)); } #if CUDA_VERSION < CUDA_12000 __HIP_DEPRECATED inline static hipError_t hipGetTextureAlignmentOffset( size_t* offset, const struct textureReference* texref) { return hipCUDAErrorTohipError(cudaGetTextureAlignmentOffset(offset,texref)); } #endif inline static hipError_t hipGetChannelDesc(hipChannelFormatDesc* desc, hipArray_const_t array) { return hipCUDAErrorTohipError(cudaGetChannelDesc(desc,array)); } inline static hipError_t hipLaunchCooperativeKernel(const void* f, dim3 gridDim, dim3 blockDim, void** kernelParams, unsigned int sharedMemBytes, hipStream_t stream) { return hipCUDAErrorTohipError( cudaLaunchCooperativeKernel(f, gridDim, blockDim, kernelParams, sharedMemBytes, stream)); } inline static hipError_t hipModuleLaunchCooperativeKernel(hipFunction_t f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, hipStream_t stream, void** kernelParams) { return 
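    // Cooperative launches require every block of the grid to be resident on the device
    // at once. A minimal sizing sketch (names such as 'perSM' and 'numSM' are illustrative,
    // not part of this header):
    //   int perSM = 0;
    //   hipModuleOccupancyMaxActiveBlocksPerMultiprocessor(&perSM, f, blockDimX * blockDimY * blockDimZ, sharedMemBytes);
    //   // keep gridDimX * gridDimY * gridDimZ <= perSM * numSM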
hipCUResultTohipError(cuLaunchCooperativeKernel(f, gridDimX, gridDimY, gridDimZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, stream,kernelParams)); } inline static hipError_t hipLaunchCooperativeKernelMultiDevice(hipLaunchParams* launchParamsList, int numDevices, unsigned int flags) { return hipCUDAErrorTohipError(cudaLaunchCooperativeKernelMultiDevice(launchParamsList, numDevices, flags)); } inline static hipError_t hipModuleLaunchCooperativeKernelMultiDevice( hipFunctionLaunchParams* launchParamsList, unsigned int numDevices, unsigned int flags) { return hipCUResultTohipError(cuLaunchCooperativeKernelMultiDevice(launchParamsList, numDevices, flags)); } inline static hipError_t hipImportExternalSemaphore(hipExternalSemaphore_t* extSem_out, const hipExternalSemaphoreHandleDesc* semHandleDesc) { return hipCUDAErrorTohipError(cudaImportExternalSemaphore(extSem_out,(const struct cudaExternalSemaphoreHandleDesc*)semHandleDesc)); } inline static hipError_t hipSignalExternalSemaphoresAsync(const hipExternalSemaphore_t* extSemArray, const hipExternalSemaphoreSignalParams* paramsArray, unsigned int numExtSems, hipStream_t stream) { return hipCUDAErrorTohipError(cudaSignalExternalSemaphoresAsync(extSemArray, (const struct cudaExternalSemaphoreSignalParams*)paramsArray, numExtSems, stream)); } inline static hipError_t hipWaitExternalSemaphoresAsync(const hipExternalSemaphore_t* extSemArray, const hipExternalSemaphoreWaitParams* paramsArray, unsigned int numExtSems, hipStream_t stream) { return hipCUDAErrorTohipError(cudaWaitExternalSemaphoresAsync(extSemArray, (const struct cudaExternalSemaphoreWaitParams*)paramsArray, numExtSems, stream)); } inline static hipError_t hipDestroyExternalSemaphore(hipExternalSemaphore_t extSem) { return hipCUDAErrorTohipError(cudaDestroyExternalSemaphore(extSem)); } inline static hipError_t hipImportExternalMemory(hipExternalMemory_t* extMem_out, const hipExternalMemoryHandleDesc* memHandleDesc) { return hipCUDAErrorTohipError(cudaImportExternalMemory(extMem_out, (const struct cudaExternalMemoryHandleDesc*)memHandleDesc)); } inline static hipError_t hipExternalMemoryGetMappedBuffer(void **devPtr, hipExternalMemory_t extMem, const hipExternalMemoryBufferDesc *bufferDesc) { return hipCUDAErrorTohipError(cudaExternalMemoryGetMappedBuffer(devPtr, extMem, (const struct cudaExternalMemoryBufferDesc*)bufferDesc)); } inline static hipError_t hipDestroyExternalMemory(hipExternalMemory_t extMem) { return hipCUDAErrorTohipError(cudaDestroyExternalMemory(extMem)); } inline static hipError_t hipGraphicsMapResources(int count, hipGraphicsResource_t* resources, hipStream_t stream __dparm(0)) { return hipCUDAErrorTohipError(cudaGraphicsMapResources(count, resources, stream)); } inline static hipError_t hipGraphicsSubResourceGetMappedArray(hipArray_t* array, hipGraphicsResource_t resource, unsigned int arrayIndex, unsigned int mipLevel) { return hipCUDAErrorTohipError(cudaGraphicsSubResourceGetMappedArray(array, resource, arrayIndex, mipLevel)); } inline static hipError_t hipGraphicsResourceGetMappedPointer(void** devPtr, size_t* size, hipGraphicsResource_t resource) { return hipCUDAErrorTohipError(cudaGraphicsResourceGetMappedPointer(devPtr, size, resource)); } inline static hipError_t hipGraphicsUnmapResources(int count, hipGraphicsResource_t* resources, hipStream_t stream __dparm(0)) { return hipCUDAErrorTohipError(cudaGraphicsUnmapResources(count, resources, stream)); } inline static hipError_t hipGraphicsUnregisterResource(hipGraphicsResource_t resource) { return 
hipCUDAErrorTohipError(cudaGraphicsUnregisterResource(resource)); }
#if CUDA_VERSION >= CUDA_11020
// ========================== HIP Stream Ordered Memory Allocator =================================
inline static hipError_t hipDeviceGetDefaultMemPool(hipMemPool_t* mem_pool, int device) { return hipCUDAErrorTohipError(cudaDeviceGetDefaultMemPool(mem_pool, device)); }
inline static hipError_t hipDeviceSetMemPool(int device, hipMemPool_t mem_pool) { return hipCUDAErrorTohipError(cudaDeviceSetMemPool(device, mem_pool)); }
inline static hipError_t hipDeviceGetMemPool(hipMemPool_t* mem_pool, int device) { return hipCUDAErrorTohipError(cudaDeviceGetMemPool(mem_pool, device)); }
inline static hipError_t hipMallocAsync(void** dev_ptr, size_t size, hipStream_t stream) { return hipCUDAErrorTohipError(cudaMallocAsync(dev_ptr, size, stream)); }
inline static hipError_t hipFreeAsync(void* dev_ptr, hipStream_t stream) { return hipCUDAErrorTohipError(cudaFreeAsync(dev_ptr, stream)); }
inline static hipError_t hipMemPoolTrimTo(hipMemPool_t mem_pool, size_t min_bytes_to_hold) { return hipCUDAErrorTohipError(cudaMemPoolTrimTo(mem_pool, min_bytes_to_hold)); }
inline static hipError_t hipMemPoolSetAttribute(hipMemPool_t mem_pool, hipMemPoolAttr attr, void* value) { return hipCUDAErrorTohipError(cudaMemPoolSetAttribute(mem_pool, attr, value)); }
inline static hipError_t hipMemPoolGetAttribute(hipMemPool_t mem_pool, hipMemPoolAttr attr, void* value) { return hipCUDAErrorTohipError(cudaMemPoolGetAttribute(mem_pool, attr, value)); }
inline static hipError_t hipMemPoolSetAccess( hipMemPool_t mem_pool, const hipMemAccessDesc* desc_list, size_t count) { return hipCUDAErrorTohipError(cudaMemPoolSetAccess(mem_pool, desc_list, count)); }
inline static hipError_t hipMemPoolGetAccess( hipMemAccessFlags* flags, hipMemPool_t mem_pool, hipMemLocation* location) { return hipCUDAErrorTohipError(cudaMemPoolGetAccess(flags, mem_pool, location)); }
inline static hipError_t hipMemPoolCreate(hipMemPool_t* mem_pool, const hipMemPoolProps* pool_props) { return hipCUDAErrorTohipError(cudaMemPoolCreate(mem_pool, pool_props)); }
inline static hipError_t hipMemPoolDestroy(hipMemPool_t mem_pool) { return hipCUDAErrorTohipError(cudaMemPoolDestroy(mem_pool)); }
inline static hipError_t hipMallocFromPoolAsync( void** dev_ptr, size_t size, hipMemPool_t mem_pool, hipStream_t stream) { return hipCUDAErrorTohipError(cudaMallocFromPoolAsync(dev_ptr, size, mem_pool, stream)); }
inline static hipError_t hipMemPoolExportToShareableHandle( void* shared_handle, hipMemPool_t mem_pool, hipMemAllocationHandleType handle_type, unsigned int flags) { return hipCUDAErrorTohipError(cudaMemPoolExportToShareableHandle( shared_handle, mem_pool, handle_type, flags)); }
inline static hipError_t hipMemPoolImportFromShareableHandle( hipMemPool_t* mem_pool, void* shared_handle, hipMemAllocationHandleType handle_type, unsigned int flags) { return hipCUDAErrorTohipError(cudaMemPoolImportFromShareableHandle( mem_pool, shared_handle, handle_type, flags)); }
inline static hipError_t hipMemPoolExportPointer(hipMemPoolPtrExportData* export_data, void* ptr) { return hipCUDAErrorTohipError(cudaMemPoolExportPointer(export_data, ptr)); }
inline static hipError_t hipMemPoolImportPointer( void** ptr, hipMemPool_t mem_pool, hipMemPoolPtrExportData* export_data) { return hipCUDAErrorTohipError(cudaMemPoolImportPointer(ptr, mem_pool, export_data)); }
#endif // CUDA_VERSION >= CUDA_11020
#ifdef __cplusplus
}
#endif
#ifdef __CUDACC__
template <typename T>
inline static hipError_t hipOccupancyMaxActiveBlocksPerMultiprocessor(int* numBlocks, T func, int blockSize, size_t dynamicSMemSize) { return hipCUDAErrorTohipError(cudaOccupancyMaxActiveBlocksPerMultiprocessor(numBlocks, func, blockSize, dynamicSMemSize)); }
template <typename T>
inline static hipError_t hipOccupancyMaxPotentialBlockSize(int* minGridSize, int* blockSize, T func, size_t dynamicSMemSize = 0, int blockSizeLimit = 0) { return hipCUDAErrorTohipError(cudaOccupancyMaxPotentialBlockSize(minGridSize, blockSize, func, dynamicSMemSize, blockSizeLimit)); }
template <typename UnaryFunction, class T>
inline static hipError_t hipOccupancyMaxPotentialBlockSizeVariableSMemWithFlags(int* min_grid_size, int* block_size, T func, UnaryFunction block_size_to_dynamic_smem_size, int block_size_limit = 0, unsigned int flags = 0) { return hipCUDAErrorTohipError(cudaOccupancyMaxPotentialBlockSizeVariableSMemWithFlags(min_grid_size, block_size, func, block_size_to_dynamic_smem_size, block_size_limit, flags)); }
// cudaOccupancyMaxPotentialBlockSize takes no 'flags' argument, so forward to the WithFlags variant.
template <typename T>
inline static hipError_t hipOccupancyMaxPotentialBlockSizeWithFlags(int* minGridSize, int* blockSize, T func, size_t dynamicSMemSize = 0, int blockSizeLimit = 0, unsigned int flags = 0) { return hipCUDAErrorTohipError(cudaOccupancyMaxPotentialBlockSizeWithFlags(minGridSize, blockSize, func, dynamicSMemSize, blockSizeLimit, flags)); }
template <typename T>
inline static hipError_t hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags( int* numBlocks, T func, int blockSize, size_t dynamicSMemSize, unsigned int flags) { return hipCUDAErrorTohipError(cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(numBlocks, func, blockSize, dynamicSMemSize, flags)); }
#if CUDA_VERSION < CUDA_12000
template <class T, int dim, enum cudaTextureReadMode readMode>
inline static hipError_t hipBindTexture(size_t* offset, const struct texture<T, dim, readMode>& tex, const void* devPtr, size_t size = UINT_MAX) { return hipCUDAErrorTohipError(cudaBindTexture(offset, tex, devPtr, size)); }
template <class T, int dim, enum cudaTextureReadMode readMode>
inline static hipError_t hipBindTexture(size_t* offset, struct texture<T, dim, readMode>& tex, const void* devPtr, const hipChannelFormatDesc& desc, size_t size = UINT_MAX) { return hipCUDAErrorTohipError(cudaBindTexture(offset, tex, devPtr, desc, size)); }
template <class T, int dim, enum cudaTextureReadMode readMode>
__HIP_DEPRECATED inline static hipError_t hipUnbindTexture(struct texture<T, dim, readMode>* tex) { return hipCUDAErrorTohipError(cudaUnbindTexture(tex)); }
template <class T, int dim, enum cudaTextureReadMode readMode>
__HIP_DEPRECATED inline static hipError_t hipUnbindTexture(struct texture<T, dim, readMode>& tex) { return hipCUDAErrorTohipError(cudaUnbindTexture(tex)); }
template <class T, int dim, enum cudaTextureReadMode readMode>
__HIP_DEPRECATED inline static hipError_t hipBindTextureToArray( struct texture<T, dim, readMode>& tex, hipArray_const_t array, const hipChannelFormatDesc& desc) { return hipCUDAErrorTohipError(cudaBindTextureToArray(tex, array, desc)); }
template <class T, int dim, enum cudaTextureReadMode readMode>
__HIP_DEPRECATED inline static hipError_t hipBindTextureToArray( struct texture<T, dim, readMode>* tex, hipArray_const_t array, const hipChannelFormatDesc* desc) { return hipCUDAErrorTohipError(cudaBindTextureToArray(tex, array, desc)); }
template <class T, int dim, enum cudaTextureReadMode readMode>
__HIP_DEPRECATED inline static hipError_t hipBindTextureToArray( struct texture<T, dim, readMode>& tex, hipArray_const_t array) { return hipCUDAErrorTohipError(cudaBindTextureToArray(tex, array)); }
#endif // CUDA_VERSION < CUDA_12000
template <typename T>
inline static hipChannelFormatDesc hipCreateChannelDesc() { return cudaCreateChannelDesc<T>(); }
template <typename T>
inline static hipError_t hipLaunchCooperativeKernel(T f, dim3 gridDim, dim3 blockDim, void** kernelParams, unsigned int sharedMemBytes, hipStream_t stream) { return hipCUDAErrorTohipError( cudaLaunchCooperativeKernel(reinterpret_cast<const void*>(f), gridDim, blockDim, kernelParams, sharedMemBytes, stream)); }
inline static hipError_t hipTexObjectCreate(hipTextureObject_t*
pTexObject, const HIP_RESOURCE_DESC* pResDesc, const HIP_TEXTURE_DESC* pTexDesc, const HIP_RESOURCE_VIEW_DESC* pResViewDesc) { return hipCUResultTohipError(cuTexObjectCreate((CUtexObject*)pTexObject, pResDesc, pTexDesc, pResViewDesc)); } inline static hipError_t hipTexObjectDestroy(hipTextureObject_t texObject) { return hipCUResultTohipError(cuTexObjectDestroy((CUtexObject)texObject)); } inline static hipError_t hipTexObjectGetResourceDesc(HIP_RESOURCE_DESC* pResDesc, hipTextureObject_t texObject) { return hipCUResultTohipError(cuTexObjectGetResourceDesc(pResDesc, (CUtexObject)texObject)); } inline static hipError_t hipTexObjectGetResourceViewDesc(HIP_RESOURCE_VIEW_DESC* pResViewDesc, hipTextureObject_t texObject) { return hipCUResultTohipError(cuTexObjectGetResourceViewDesc(pResViewDesc, (CUtexObject)texObject)); } inline static hipError_t hipTexObjectGetTextureDesc(HIP_TEXTURE_DESC* pTexDesc, hipTextureObject_t texObject) { return hipCUResultTohipError(cuTexObjectGetTextureDesc(pTexDesc, (CUtexObject)texObject)); } __HIP_DEPRECATED inline static hipError_t hipTexRefSetAddressMode(hipTexRef hTexRef, int dim, hipAddress_mode am){ return hipCUResultTohipError(cuTexRefSetAddressMode(hTexRef,dim,am)); } __HIP_DEPRECATED inline static hipError_t hipTexRefSetFilterMode(hipTexRef hTexRef, hipFilter_mode fm){ return hipCUResultTohipError(cuTexRefSetFilterMode(hTexRef,fm)); } inline static hipError_t hipTexRefSetAddress(size_t *ByteOffset, hipTexRef hTexRef, hipDeviceptr_t dptr, size_t bytes){ return hipCUResultTohipError(cuTexRefSetAddress(ByteOffset,hTexRef,dptr,bytes)); } inline static hipError_t hipTexRefSetAddress2D(hipTexRef hTexRef, const CUDA_ARRAY_DESCRIPTOR *desc, hipDeviceptr_t dptr, size_t Pitch){ return hipCUResultTohipError(cuTexRefSetAddress2D(hTexRef,desc,dptr,Pitch)); } __HIP_DEPRECATED inline static hipError_t hipTexRefSetFormat(hipTexRef hTexRef, hipArray_Format fmt, int NumPackedComponents){ return hipCUResultTohipError(cuTexRefSetFormat(hTexRef,fmt,NumPackedComponents)); } __HIP_DEPRECATED inline static hipError_t hipTexRefSetFlags(hipTexRef hTexRef, unsigned int Flags){ return hipCUResultTohipError(cuTexRefSetFlags(hTexRef,Flags)); } __HIP_DEPRECATED inline static hipError_t hipTexRefSetArray(hipTexRef hTexRef, hiparray hArray, unsigned int Flags){ return hipCUResultTohipError(cuTexRefSetArray(hTexRef,hArray,Flags)); } inline static hipError_t hipArrayCreate(hiparray* pHandle, const HIP_ARRAY_DESCRIPTOR* pAllocateArray){ return hipCUResultTohipError(cuArrayCreate(pHandle, pAllocateArray)); } inline static hipError_t hipArrayDestroy(hiparray hArray){ return hipCUResultTohipError(cuArrayDestroy(hArray)); } inline static hipError_t hipArray3DCreate(hiparray* pHandle, const HIP_ARRAY3D_DESCRIPTOR* pAllocateArray){ return hipCUResultTohipError(cuArray3DCreate(pHandle, pAllocateArray)); } inline static hipError_t hipArrayGetInfo(hipChannelFormatDesc* desc, hipExtent* extent, unsigned int* flags, hipArray* array) { return hipCUDAErrorTohipError(cudaArrayGetInfo(desc, extent, flags, array)); } inline static hipError_t hipArrayGetDescriptor(HIP_ARRAY_DESCRIPTOR* pArrayDescriptor, hipArray* array) { return hipCUResultTohipError(cuArrayGetDescriptor(pArrayDescriptor, (CUarray)array)); } inline static hipError_t hipArray3DGetDescriptor(HIP_ARRAY3D_DESCRIPTOR* pArrayDescriptor, hipArray* array) { return hipCUResultTohipError(cuArray3DGetDescriptor(pArrayDescriptor, (CUarray)array)); } inline static hipError_t hipStreamBeginCapture(hipStream_t stream, hipStreamCaptureMode mode) { return 
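    // Typical capture flow (a minimal sketch; 's', 'graph' and 'exec' are assumed to be
    // declared by the caller):
    //   hipStreamBeginCapture(s, hipStreamCaptureModeGlobal);
    //   myKernel<<<grid, block, 0, s>>>(/* args */);   // recorded into the graph, not executed
    //   hipStreamEndCapture(s, &graph);
    //   hipGraphInstantiate(&exec, graph, NULL, NULL, 0);
    //   hipGraphLaunch(exec, s);                       // replay as often as needed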
hipCUDAErrorTohipError(cudaStreamBeginCapture(stream, mode)); } inline static hipError_t hipStreamEndCapture(hipStream_t stream, hipGraph_t* pGraph) { return hipCUDAErrorTohipError(cudaStreamEndCapture(stream, pGraph)); } inline static hipError_t hipGraphCreate(hipGraph_t* pGraph, unsigned int flags) { return hipCUDAErrorTohipError(cudaGraphCreate(pGraph, flags)); } inline static hipError_t hipGraphDestroy(hipGraph_t graph) { return hipCUDAErrorTohipError(cudaGraphDestroy(graph)); } inline static hipError_t hipGraphExecDestroy(hipGraphExec_t pGraphExec) { return hipCUDAErrorTohipError(cudaGraphExecDestroy(pGraphExec)); } inline static hipError_t hipGraphInstantiate(hipGraphExec_t* pGraphExec, hipGraph_t graph, hipGraphNode_t* pErrorNode, char* pLogBuffer, size_t bufferSize) { return hipCUDAErrorTohipError( cudaGraphInstantiate(pGraphExec, graph, pErrorNode, pLogBuffer, bufferSize)); } #if CUDA_VERSION >= CUDA_11040 inline static hipError_t hipGraphInstantiateWithFlags(hipGraphExec_t* pGraphExec, hipGraph_t graph, unsigned long long flags) { return hipCUDAErrorTohipError(cudaGraphInstantiateWithFlags(pGraphExec, graph, flags)); } inline hipError_t hipGraphAddMemAllocNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, hipMemAllocNodeParams* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphAddMemAllocNode( pGraphNode, graph, pDependencies, numDependencies, pNodeParams)); } inline hipError_t hipGraphMemAllocNodeGetParams(hipGraphNode_t node, hipMemAllocNodeParams* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphMemAllocNodeGetParams(node, pNodeParams)); } inline hipError_t hipGraphAddMemFreeNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, void* dev_ptr) { return hipCUDAErrorTohipError(cudaGraphAddMemFreeNode( pGraphNode, graph, pDependencies, numDependencies, dev_ptr)); } inline hipError_t hipGraphMemFreeNodeGetParams(hipGraphNode_t node, void* dev_ptr) { return hipCUDAErrorTohipError(cudaGraphMemFreeNodeGetParams(node, dev_ptr)); } #endif inline static hipError_t hipGraphLaunch(hipGraphExec_t graphExec, hipStream_t stream) { return hipCUDAErrorTohipError(cudaGraphLaunch(graphExec, stream)); } inline static hipError_t hipGraphAddKernelNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, const hipKernelNodeParams* pNodeParams) { return hipCUDAErrorTohipError( cudaGraphAddKernelNode(pGraphNode, graph, pDependencies, numDependencies, pNodeParams)); } inline static hipError_t hipGraphAddMemcpyNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, const hipMemcpy3DParms* pCopyParams) { return hipCUDAErrorTohipError( cudaGraphAddMemcpyNode(pGraphNode, graph, pDependencies, numDependencies, pCopyParams)); } #if CUDA_VERSION >= CUDA_11010 inline static hipError_t hipGraphAddMemcpyNode1D(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, void* dst, const void* src, size_t count, hipMemcpyKind kind) { return hipCUDAErrorTohipError( cudaGraphAddMemcpyNode1D(pGraphNode, graph, pDependencies, numDependencies, dst, src, count, kind)); } #endif inline static hipError_t hipGraphAddMemsetNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, const hipMemsetParams* pMemsetParams) { return hipCUDAErrorTohipError( cudaGraphAddMemsetNode(pGraphNode, graph, 
pDependencies, numDependencies, pMemsetParams)); }
inline static hipError_t hipGraphGetNodes(hipGraph_t graph, hipGraphNode_t* nodes, size_t* numNodes) { return hipCUDAErrorTohipError(cudaGraphGetNodes(graph, nodes, numNodes)); }
inline static hipError_t hipGraphGetRootNodes(hipGraph_t graph, hipGraphNode_t* pRootNodes, size_t* pNumRootNodes) { return hipCUDAErrorTohipError(cudaGraphGetRootNodes(graph, pRootNodes, pNumRootNodes)); }
inline static hipError_t hipGraphKernelNodeGetParams(hipGraphNode_t node, hipKernelNodeParams* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphKernelNodeGetParams(node, pNodeParams)); }
inline static hipError_t hipGraphKernelNodeSetParams(hipGraphNode_t node, const hipKernelNodeParams* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphKernelNodeSetParams(node, pNodeParams)); }
inline static hipError_t hipGraphKernelNodeSetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr, const hipKernelNodeAttrValue* value) { return hipCUDAErrorTohipError(cudaGraphKernelNodeSetAttribute(hNode, attr, value)); }
inline static hipError_t hipGraphKernelNodeGetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr, hipKernelNodeAttrValue* value) { return hipCUDAErrorTohipError(cudaGraphKernelNodeGetAttribute(hNode, attr, value)); }
inline static hipError_t hipGraphMemcpyNodeGetParams(hipGraphNode_t node, hipMemcpy3DParms* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphMemcpyNodeGetParams(node, pNodeParams)); }
inline static hipError_t hipGraphMemcpyNodeSetParams(hipGraphNode_t node, const hipMemcpy3DParms* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphMemcpyNodeSetParams(node, pNodeParams)); }
inline static hipError_t hipGraphMemsetNodeGetParams(hipGraphNode_t node, hipMemsetParams* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphMemsetNodeGetParams(node, pNodeParams)); }
inline static hipError_t hipGraphMemsetNodeSetParams(hipGraphNode_t node, const hipMemsetParams* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphMemsetNodeSetParams(node, pNodeParams)); }
inline static hipError_t hipThreadExchangeStreamCaptureMode(hipStreamCaptureMode* mode) { return hipCUDAErrorTohipError(cudaThreadExchangeStreamCaptureMode(mode)); }
inline static hipError_t hipGraphExecKernelNodeSetParams(hipGraphExec_t hGraphExec, hipGraphNode_t node, const hipKernelNodeParams* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphExecKernelNodeSetParams(hGraphExec, node, pNodeParams)); }
inline static hipError_t hipGraphAddDependencies(hipGraph_t graph, const hipGraphNode_t* from, const hipGraphNode_t* to, size_t numDependencies) { return hipCUDAErrorTohipError(cudaGraphAddDependencies(graph, from, to, numDependencies)); }
inline static hipError_t hipGraphAddEmptyNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies) { return hipCUDAErrorTohipError( cudaGraphAddEmptyNode(pGraphNode, graph, pDependencies, numDependencies)); }
inline static hipError_t hipStreamWriteValue32(hipStream_t stream, void* ptr, int32_t value, unsigned int flags) { if (value < 0) { printf("Warning! value is negative, CUDA accepts positive values\n"); } return hipCUResultTohipError(cuStreamWriteValue32(stream, reinterpret_cast<CUdeviceptr>(ptr), static_cast<cuuint32_t>(value), flags)); }
inline static hipError_t hipStreamWriteValue64(hipStream_t stream, void* ptr, int64_t value, unsigned int flags) { if (value < 0) { printf("Warning! value is negative, CUDA accepts positive values\n"); } return hipCUResultTohipError(cuStreamWriteValue64(stream, reinterpret_cast<CUdeviceptr>(ptr), static_cast<cuuint64_t>(value), flags)); }
inline static hipError_t hipStreamWaitValue32(hipStream_t stream, void* ptr, int32_t value, unsigned int flags, uint32_t mask __dparm(0xFFFFFFFF)) { if (value < 0) { printf("Warning! value is negative, CUDA accepts positive values\n"); } if (mask != STREAM_OPS_WAIT_MASK_32) { printf("Warning! mask will have no impact as CUDA ignores it.\n"); } return hipCUResultTohipError(cuStreamWaitValue32(stream, reinterpret_cast<CUdeviceptr>(ptr), static_cast<cuuint32_t>(value), flags)); }
inline static hipError_t hipStreamWaitValue64(hipStream_t stream, void* ptr, int64_t value, unsigned int flags, uint64_t mask __dparm(0xFFFFFFFFFFFFFFFF)) { if (value < 0) { printf("Warning! value is negative, CUDA accepts positive values\n"); } if (mask != STREAM_OPS_WAIT_MASK_64) { printf("Warning! mask will have no impact as CUDA ignores it.\n"); } return hipCUResultTohipError(cuStreamWaitValue64(stream, reinterpret_cast<CUdeviceptr>(ptr), static_cast<cuuint64_t>(value), flags)); }
inline static hipError_t hipGraphRemoveDependencies(hipGraph_t graph, const hipGraphNode_t* from, const hipGraphNode_t* to, size_t numDependencies) { return hipCUDAErrorTohipError(cudaGraphRemoveDependencies(graph, from, to, numDependencies)); }
inline static hipError_t hipGraphGetEdges(hipGraph_t graph, hipGraphNode_t* from, hipGraphNode_t* to, size_t* numEdges) { return hipCUDAErrorTohipError(cudaGraphGetEdges(graph, from, to, numEdges)); }
inline static hipError_t hipGraphNodeGetDependencies(hipGraphNode_t node, hipGraphNode_t* pDependencies, size_t* pNumDependencies) { return hipCUDAErrorTohipError( cudaGraphNodeGetDependencies(node, pDependencies, pNumDependencies)); }
inline static hipError_t hipGraphNodeGetDependentNodes(hipGraphNode_t node, hipGraphNode_t* pDependentNodes, size_t* pNumDependentNodes) { return hipCUDAErrorTohipError( cudaGraphNodeGetDependentNodes(node, pDependentNodes, pNumDependentNodes)); }
inline static hipError_t hipGraphNodeGetType(hipGraphNode_t node, hipGraphNodeType* pType) { return hipCUDAErrorTohipError(cudaGraphNodeGetType(node, pType)); }
inline static hipError_t hipGraphDestroyNode(hipGraphNode_t node) { return hipCUDAErrorTohipError(cudaGraphDestroyNode(node)); }
inline static hipError_t hipGraphClone(hipGraph_t* pGraphClone, hipGraph_t originalGraph) { return hipCUDAErrorTohipError(cudaGraphClone(pGraphClone, originalGraph)); }
inline static hipError_t hipGraphNodeFindInClone(hipGraphNode_t* pNode, hipGraphNode_t originalNode, hipGraph_t clonedGraph) { return hipCUDAErrorTohipError(cudaGraphNodeFindInClone(pNode, originalNode, clonedGraph)); }
inline static hipError_t hipGraphAddChildGraphNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, hipGraph_t childGraph) { return hipCUDAErrorTohipError( cudaGraphAddChildGraphNode(pGraphNode, graph, pDependencies, numDependencies, childGraph)); }
inline static hipError_t hipGraphChildGraphNodeGetGraph(hipGraphNode_t node, hipGraph_t* pGraph) { return hipCUDAErrorTohipError(cudaGraphChildGraphNodeGetGraph(node, pGraph)); }
#if CUDA_VERSION >= CUDA_11010
inline static hipError_t hipGraphExecChildGraphNodeSetParams(hipGraphExec_t hGraphExec, hipGraphNode_t node, hipGraph_t childGraph) { return hipCUDAErrorTohipError( cudaGraphExecChildGraphNodeSetParams(hGraphExec, node, childGraph)); }
#endif
inline static hipError_t hipStreamGetCaptureInfo(hipStream_t stream, hipStreamCaptureStatus*
pCaptureStatus, unsigned long long* pId) { return hipCUDAErrorTohipError(cudaStreamGetCaptureInfo(stream, pCaptureStatus, pId)); }
#if CUDA_VERSION >= CUDA_11030
inline static hipError_t hipStreamGetCaptureInfo_v2( hipStream_t stream, hipStreamCaptureStatus* captureStatus_out, unsigned long long* id_out __dparm(0), hipGraph_t* graph_out __dparm(0), const hipGraphNode_t** dependencies_out __dparm(0), size_t* numDependencies_out __dparm(0)) { return hipCUResultTohipError(cuStreamGetCaptureInfo_v2( stream, reinterpret_cast<CUstreamCaptureStatus*>(captureStatus_out), reinterpret_cast<cuuint64_t*>(id_out), graph_out, dependencies_out, numDependencies_out)); }
#endif
inline static hipError_t hipStreamIsCapturing(hipStream_t stream, hipStreamCaptureStatus* pCaptureStatus) { return hipCUDAErrorTohipError(cudaStreamIsCapturing(stream, pCaptureStatus)); }
#if CUDA_VERSION >= CUDA_11030
inline static hipError_t hipStreamUpdateCaptureDependencies(hipStream_t stream, hipGraphNode_t* dependencies, size_t numDependencies, unsigned int flags __dparm(0)) { return hipCUDAErrorTohipError(cudaStreamUpdateCaptureDependencies(stream, dependencies, numDependencies, flags)); }
#endif
#if CUDA_VERSION >= CUDA_11010
inline static hipError_t hipGraphAddEventRecordNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, hipEvent_t event) { return hipCUDAErrorTohipError( cudaGraphAddEventRecordNode(pGraphNode, graph, pDependencies, numDependencies, event)); }
inline static hipError_t hipGraphAddEventWaitNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, hipEvent_t event) { return hipCUDAErrorTohipError( cudaGraphAddEventWaitNode(pGraphNode, graph, pDependencies, numDependencies, event)); }
#endif
inline static hipError_t hipGraphAddHostNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, const hipHostNodeParams* pNodeParams) { return hipCUDAErrorTohipError( cudaGraphAddHostNode(pGraphNode, graph, pDependencies, numDependencies, pNodeParams)); }
#if CUDA_VERSION >= CUDA_11010
inline static hipError_t hipGraphAddMemcpyNodeFromSymbol(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, void* dst, const void* symbol, size_t count, size_t offset, hipMemcpyKind kind) { return hipCUDAErrorTohipError(cudaGraphAddMemcpyNodeFromSymbol( pGraphNode, graph, pDependencies, numDependencies, dst, symbol, count, offset, kind)); }
inline static hipError_t hipGraphAddMemcpyNodeToSymbol(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, const void* symbol, const void* src, size_t count, size_t offset, hipMemcpyKind kind) { return hipCUDAErrorTohipError(cudaGraphAddMemcpyNodeToSymbol( pGraphNode, graph, pDependencies, numDependencies, symbol, src, count, offset, kind)); }
inline static hipError_t hipGraphEventRecordNodeSetEvent(hipGraphNode_t node, hipEvent_t event) { return hipCUDAErrorTohipError(cudaGraphEventRecordNodeSetEvent(node, event)); }
inline static hipError_t hipGraphEventWaitNodeGetEvent(hipGraphNode_t node, hipEvent_t* event_out) { return hipCUDAErrorTohipError(cudaGraphEventWaitNodeGetEvent(node, event_out)); }
inline static hipError_t hipGraphEventWaitNodeSetEvent(hipGraphNode_t node, hipEvent_t event) { return hipCUDAErrorTohipError(cudaGraphEventWaitNodeSetEvent(node, event)); }
#endif
inline static hipError_t hipGraphExecHostNodeSetParams(hipGraphExec_t hGraphExec,
hipGraphNode_t node, const hipHostNodeParams* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphExecHostNodeSetParams(hGraphExec, node, pNodeParams)); } inline static hipError_t hipGraphExecMemcpyNodeSetParams(hipGraphExec_t hGraphExec, hipGraphNode_t node, hipMemcpy3DParms* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphExecMemcpyNodeSetParams(hGraphExec, node, pNodeParams)); } #if CUDA_VERSION >= CUDA_11010 inline static hipError_t hipGraphExecMemcpyNodeSetParams1D(hipGraphExec_t hGraphExec, hipGraphNode_t node, void* dst, const void* src, size_t count, hipMemcpyKind kind) { return hipCUDAErrorTohipError( cudaGraphExecMemcpyNodeSetParams1D(hGraphExec, node, dst, src, count, kind)); } inline static hipError_t hipGraphExecMemcpyNodeSetParamsFromSymbol(hipGraphExec_t hGraphExec, hipGraphNode_t node, void* dst, const void* symbol, size_t count, size_t offset, hipMemcpyKind kind) { return hipCUDAErrorTohipError(cudaGraphExecMemcpyNodeSetParamsFromSymbol( hGraphExec, node, dst, symbol, count, offset, kind)); } inline static hipError_t hipGraphExecMemcpyNodeSetParamsToSymbol( hipGraphExec_t hGraphExec, hipGraphNode_t node, const void* symbol, const void* src, size_t count, size_t offset, hipMemcpyKind kind) { return hipCUDAErrorTohipError(cudaGraphExecMemcpyNodeSetParamsToSymbol( hGraphExec, node, symbol, src, count, offset, kind)); } #endif inline static hipError_t hipGraphExecMemsetNodeSetParams(hipGraphExec_t hGraphExec, hipGraphNode_t node, const hipMemsetParams* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphExecMemsetNodeSetParams(hGraphExec, node, pNodeParams)); } inline static hipError_t hipGraphExecUpdate(hipGraphExec_t hGraphExec, hipGraph_t hGraph, hipGraphNode_t* hErrorNode_out, hipGraphExecUpdateResult* updateResult_out) { return hipCUDAErrorTohipError( cudaGraphExecUpdate(hGraphExec, hGraph, hErrorNode_out, updateResult_out)); } #if CUDA_VERSION >= CUDA_11010 inline static hipError_t hipGraphMemcpyNodeSetParamsFromSymbol(hipGraphNode_t node, void* dst, const void* symbol, size_t count, size_t offset, hipMemcpyKind kind) { return hipCUDAErrorTohipError( cudaGraphMemcpyNodeSetParamsFromSymbol(node, dst, symbol, count, offset, kind)); } inline static hipError_t hipGraphMemcpyNodeSetParamsToSymbol(hipGraphNode_t node, const void* symbol, const void* src, size_t count, size_t offset, hipMemcpyKind kind) { return hipCUDAErrorTohipError( cudaGraphMemcpyNodeSetParamsToSymbol(node, symbol, src, count, offset, kind)); } inline static hipError_t hipGraphEventRecordNodeGetEvent(hipGraphNode_t node, hipEvent_t* event_out) { return hipCUDAErrorTohipError(cudaGraphEventRecordNodeGetEvent(node, event_out)); } #endif inline static hipError_t hipGraphHostNodeGetParams(hipGraphNode_t node, hipHostNodeParams* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphHostNodeGetParams(node, pNodeParams)); } #if CUDA_VERSION >= CUDA_11010 inline static hipError_t hipGraphMemcpyNodeSetParams1D(hipGraphNode_t node, void* dst, const void* src, size_t count, hipMemcpyKind kind) { return hipCUDAErrorTohipError(cudaGraphMemcpyNodeSetParams1D(node, dst, src, count, kind)); } inline static hipError_t hipGraphExecEventRecordNodeSetEvent(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, hipEvent_t event) { return hipCUDAErrorTohipError(cudaGraphExecEventRecordNodeSetEvent(hGraphExec, hNode, event)); } inline static hipError_t hipGraphExecEventWaitNodeSetEvent(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, hipEvent_t event) { return 
hipCUDAErrorTohipError(cudaGraphExecEventWaitNodeSetEvent(hGraphExec, hNode, event)); } inline static hipError_t hipDeviceGetGraphMemAttribute(int device, hipGraphMemAttributeType attr, void* value) { return hipCUDAErrorTohipError(cudaDeviceGetGraphMemAttribute(device, attr, value)); } inline static hipError_t hipDeviceSetGraphMemAttribute(int device, hipGraphMemAttributeType attr, void* value) { return hipCUDAErrorTohipError(cudaDeviceSetGraphMemAttribute(device, attr, value)); } inline static hipError_t hipDeviceGraphMemTrim(int device) { return hipCUDAErrorTohipError(cudaDeviceGraphMemTrim(device)); } inline static hipError_t hipLaunchHostFunc(hipStream_t stream, hipHostFn_t fn, void* userData) { return hipCUDAErrorTohipError(cudaLaunchHostFunc(stream, fn, userData)); } inline static hipError_t hipUserObjectCreate(hipUserObject_t* object_out, void* ptr, hipHostFn_t destroy, unsigned int initialRefcount, unsigned int flags) { return hipCUDAErrorTohipError(cudaUserObjectCreate(object_out, ptr, destroy, initialRefcount, flags)); } inline static hipError_t hipUserObjectRelease(hipUserObject_t object, unsigned int count __dparm(1)) { return hipCUDAErrorTohipError(cudaUserObjectRelease(object, count)); } inline static hipError_t hipUserObjectRetain(hipUserObject_t object, unsigned int count __dparm(1)) { return hipCUDAErrorTohipError(cudaUserObjectRetain(object, count)); } inline static hipError_t hipGraphRetainUserObject(hipGraph_t graph, hipUserObject_t object, unsigned int count __dparm(1), unsigned int flags __dparm(0)) { return hipCUDAErrorTohipError(cudaGraphRetainUserObject(graph, object, count, flags)); } inline static hipError_t hipGraphReleaseUserObject(hipGraph_t graph, hipUserObject_t object, unsigned int count __dparm(1)) { return hipCUDAErrorTohipError(cudaGraphReleaseUserObject(graph, object, count)); } #endif inline static hipError_t hipGraphHostNodeSetParams(hipGraphNode_t node, const hipHostNodeParams* pNodeParams) { return hipCUDAErrorTohipError(cudaGraphHostNodeSetParams(node, pNodeParams)); } #if CUDA_VERSION >= CUDA_11030 inline static hipError_t hipGraphDebugDotPrint(hipGraph_t graph, const char* path, unsigned int flags) { return hipCUDAErrorTohipError(cudaGraphDebugDotPrint(graph, path, flags)); } #endif #if CUDA_VERSION >= CUDA_11000 inline static hipError_t hipGraphKernelNodeCopyAttributes(hipGraphNode_t hSrc, hipGraphNode_t hDst) { return hipCUDAErrorTohipError(cudaGraphKernelNodeCopyAttributes(hSrc, hDst)); } #endif #if CUDA_VERSION >= CUDA_11060 inline static hipError_t hipGraphNodeSetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, unsigned int isEnabled) { return hipCUDAErrorTohipError(cudaGraphNodeSetEnabled(hGraphExec, hNode, isEnabled)); } inline static hipError_t hipGraphNodeGetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, unsigned int* isEnabled) { return hipCUDAErrorTohipError(cudaGraphNodeGetEnabled(hGraphExec, hNode, isEnabled)); } #endif #if CUDA_VERSION >= CUDA_11010 inline static hipError_t hipGraphUpload(hipGraphExec_t graphExec, hipStream_t stream) { return hipCUDAErrorTohipError(cudaGraphUpload(graphExec, stream)); } #endif #endif //__CUDACC__ #endif // HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_RUNTIME_API_H clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_hip_texture_types.h000066400000000000000000000002301450307266000267700ustar00rootroot00000000000000#ifndef HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_TEXTURE_TYPES_H #define HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_TEXTURE_TYPES_H #include <texture_types.h> #endif
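// ---------------------------------------------------------------------------
// Editor's illustration (not part of the archived headers): a minimal usage
// sketch for the stream memory-op wrappers defined above. The kernels and the
// `signal` variable are hypothetical, and this assumes the device supports
// stream memory operations and that the hipStreamWaitValueEq flag is in scope:
//
//   int32_t* signal = nullptr;
//   hipMalloc(&signal, sizeof(int32_t));
//   hipMemset(signal, 0, sizeof(int32_t));
//   hipStream_t producer, consumer;
//   hipStreamCreate(&producer);
//   hipStreamCreate(&consumer);
//   // Block all later work on `consumer` until the word at `signal` == 1.
//   hipStreamWaitValue32(consumer, signal, 1, hipStreamWaitValueEq);
//   consumerKernel<<<1, 256, 0, consumer>>>();  // hypothetical kernel
//   producerKernel<<<1, 256, 0, producer>>>();  // hypothetical kernel
//   // Release the consumer stream once the producer's work is enqueued.
//   hipStreamWriteValue32(producer, signal, 1, 0);
//
// As the warnings in the wrappers note, on the CUDA path the value should be
// non-negative and the extra `mask` argument is ignored.
// ---------------------------------------------------------------------------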
clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_hip_unsafe_atomics.h000066400000000000000000000066471450307266000270640ustar00rootroot00000000000000/* Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_UNSAFE_ATOMICS_H #define HIP_INCLUDE_HIP_NVIDIA_DETAIL_HIP_UNSAFE_ATOMICS_H __device__ inline float unsafeAtomicAdd(float* addr, float value) { return atomicAdd(addr, value); } __device__ inline double unsafeAtomicAdd(double* addr, double value) { #if __CUDA_ARCH__ < 600 /* Emulate an FP64 atomic add with a compare-and-swap loop on devices without native support (pre-sm_60) */ unsigned long long *addr_cast = (unsigned long long*)addr; unsigned long long old_val = *addr_cast; unsigned long long expected; do { expected = old_val; old_val = atomicCAS(addr_cast, expected, __double_as_longlong(value + __longlong_as_double(expected))); } while (expected != old_val); /* retry if another thread updated the value meanwhile */ return __longlong_as_double(old_val); #else return atomicAdd(addr, value); #endif } __device__ inline float unsafeAtomicMax(float* addr, float value) { return atomicMax(addr, value); } __device__ inline double unsafeAtomicMax(double* addr, double val) { return atomicMax(addr, val); } __device__ inline float unsafeAtomicMin(float* addr, float value) { return atomicMin(addr, value); } __device__ inline double unsafeAtomicMin(double* addr, double val) { return atomicMin(addr, val); } __device__ inline float safeAtomicAdd(float* addr, float value) { return atomicAdd(addr, value); } __device__ inline double safeAtomicAdd(double* addr, double value) { #if __CUDA_ARCH__ < 600 /* Same CAS-loop emulation as unsafeAtomicAdd above */ unsigned long long *addr_cast = (unsigned long long*)addr; unsigned long long old_val = *addr_cast; unsigned long long expected; do { expected = old_val; old_val = atomicCAS(addr_cast, expected, __double_as_longlong(value + __longlong_as_double(expected))); } while (expected != old_val); return __longlong_as_double(old_val); #else return atomicAdd(addr, value); #endif } __device__ inline float safeAtomicMax(float* addr, float value) { return atomicMax(addr, value); } __device__ inline double safeAtomicMax(double* addr, double val) { return atomicMax(addr, val); } __device__ inline float safeAtomicMin(float* addr, float value) { return atomicMin(addr, value); } __device__ inline double safeAtomicMin(double* addr, double val) { return atomicMin(addr, val); } #endif clr-rocm-5.7.1/hipamd/include/hip/nvidia_detail/nvidia_hiprtc.h000066400000000000000000000144631450307266000245100ustar00rootroot00000000000000/* Copyright (c) 2021 - 2022 Advanced Micro Devices, Inc.
All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef HIPRTC_H #define HIPRTC_H #include <cuda.h> #include <nvrtc.h> #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <stdlib.h> #if !defined(_WIN32) #pragma GCC visibility push(default) #endif typedef enum hiprtcResult { HIPRTC_SUCCESS = 0, HIPRTC_ERROR_OUT_OF_MEMORY = 1, HIPRTC_ERROR_PROGRAM_CREATION_FAILURE = 2, HIPRTC_ERROR_INVALID_INPUT = 3, HIPRTC_ERROR_INVALID_PROGRAM = 4, HIPRTC_ERROR_INVALID_OPTION = 5, HIPRTC_ERROR_COMPILATION = 6, HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE = 7, HIPRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION = 8, HIPRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION = 9, HIPRTC_ERROR_NAME_EXPRESSION_NOT_VALID = 10, HIPRTC_ERROR_INTERNAL_ERROR = 11 } hiprtcResult; inline static nvrtcResult hiprtcResultTonvrtcResult(hiprtcResult result) { switch (result) { case HIPRTC_SUCCESS: return NVRTC_SUCCESS; case HIPRTC_ERROR_OUT_OF_MEMORY: return NVRTC_ERROR_OUT_OF_MEMORY; case HIPRTC_ERROR_PROGRAM_CREATION_FAILURE: return NVRTC_ERROR_PROGRAM_CREATION_FAILURE; case HIPRTC_ERROR_INVALID_INPUT: return NVRTC_ERROR_INVALID_INPUT; case HIPRTC_ERROR_INVALID_PROGRAM: return NVRTC_ERROR_INVALID_PROGRAM; case HIPRTC_ERROR_INVALID_OPTION: return NVRTC_ERROR_INVALID_OPTION; case HIPRTC_ERROR_COMPILATION: return NVRTC_ERROR_COMPILATION; case HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE: return NVRTC_ERROR_BUILTIN_OPERATION_FAILURE; case HIPRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION: return NVRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION; case HIPRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION: return NVRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION; case HIPRTC_ERROR_NAME_EXPRESSION_NOT_VALID: return NVRTC_ERROR_NAME_EXPRESSION_NOT_VALID; case HIPRTC_ERROR_INTERNAL_ERROR: return NVRTC_ERROR_INTERNAL_ERROR; default: return NVRTC_ERROR_INTERNAL_ERROR; } } inline static hiprtcResult nvrtcResultTohiprtcResult(nvrtcResult result) { switch (result) { case NVRTC_SUCCESS: return HIPRTC_SUCCESS; case NVRTC_ERROR_OUT_OF_MEMORY: return HIPRTC_ERROR_OUT_OF_MEMORY; case NVRTC_ERROR_PROGRAM_CREATION_FAILURE: return HIPRTC_ERROR_PROGRAM_CREATION_FAILURE; case NVRTC_ERROR_INVALID_INPUT: return HIPRTC_ERROR_INVALID_INPUT; case NVRTC_ERROR_INVALID_PROGRAM: return HIPRTC_ERROR_INVALID_PROGRAM; case NVRTC_ERROR_INVALID_OPTION: return HIPRTC_ERROR_INVALID_OPTION; case NVRTC_ERROR_COMPILATION: return HIPRTC_ERROR_COMPILATION; case NVRTC_ERROR_BUILTIN_OPERATION_FAILURE: return HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE; case NVRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION: return
HIPRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION; case NVRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION: return HIPRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION; case NVRTC_ERROR_NAME_EXPRESSION_NOT_VALID: return HIPRTC_ERROR_NAME_EXPRESSION_NOT_VALID; case NVRTC_ERROR_INTERNAL_ERROR: return HIPRTC_ERROR_INTERNAL_ERROR; default: return HIPRTC_ERROR_INTERNAL_ERROR; } } inline static const char* hiprtcGetErrorString(hiprtcResult result) { return nvrtcGetErrorString(hiprtcResultTonvrtcResult(result)); } inline static hiprtcResult hiprtcVersion(int* major, int* minor) { return nvrtcResultTohiprtcResult(nvrtcVersion(major, minor)); } typedef nvrtcProgram hiprtcProgram; inline static hiprtcResult hiprtcAddNameExpression(hiprtcProgram prog, const char* name_expression) { return nvrtcResultTohiprtcResult(nvrtcAddNameExpression(prog, name_expression)); } inline static hiprtcResult hiprtcCompileProgram(hiprtcProgram prog, int numOptions, const char** options) { return nvrtcResultTohiprtcResult(nvrtcCompileProgram(prog, numOptions, options)); } inline static hiprtcResult hiprtcCreateProgram(hiprtcProgram* prog, const char* src, const char* name, int numHeaders, const char** headers, const char** includeNames) { return nvrtcResultTohiprtcResult( nvrtcCreateProgram(prog, src, name, numHeaders, headers, includeNames)); } inline static hiprtcResult hiprtcDestroyProgram(hiprtcProgram* prog) { return nvrtcResultTohiprtcResult(nvrtcDestroyProgram(prog)); } inline static hiprtcResult hiprtcGetLoweredName(hiprtcProgram prog, const char* name_expression, const char** lowered_name) { return nvrtcResultTohiprtcResult(nvrtcGetLoweredName(prog, name_expression, lowered_name)); } inline static hiprtcResult hiprtcGetProgramLog(hiprtcProgram prog, char* log) { return nvrtcResultTohiprtcResult(nvrtcGetProgramLog(prog, log)); } inline static hiprtcResult hiprtcGetProgramLogSize(hiprtcProgram prog, size_t* logSizeRet) { return nvrtcResultTohiprtcResult(nvrtcGetProgramLogSize(prog, logSizeRet)); } inline static hiprtcResult hiprtcGetCode(hiprtcProgram prog, char* code) { return nvrtcResultTohiprtcResult(nvrtcGetPTX(prog, code)); } inline static hiprtcResult hiprtcGetCodeSize(hiprtcProgram prog, size_t* codeSizeRet) { return nvrtcResultTohiprtcResult(nvrtcGetPTXSize(prog, codeSizeRet)); } #if !defined(_WIN32) #pragma GCC visibility pop #endif #ifdef __cplusplus } #endif /* __cplusplus */ #endif // HIPRTC_H clr-rocm-5.7.1/hipamd/install.sh000077500000000000000000000074211450307266000165200ustar00rootroot00000000000000#!/bin/bash # Copyright (c) 2017 - 2021 Advanced Micro Devices, Inc. All Rights Reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. # Parse command-line options # Option strings SHORT=h LONG=help,opencl:,hip:,rocclr: # read the options OPTS=$(getopt --options $SHORT --long $LONG --name "$0" -- "$@") if [ $? != 0 ] ; then echo "Failed to parse options...exiting." >&2 ; exit 1 ; fi usage() { echo "Usage: $0 --hip <HIP_DIR> --opencl <OPENCL_DIR> --rocclr <ROCCLR_DIR>" ; exit 1; } [ $# -eq 0 ] && usage eval set -- "$OPTS" # extract options and their arguments into variables. while true ; do case "$1" in --hip ) HIP_DIR="$2" shift 2 ;; --rocclr ) ROCCLR_DIR="$2" shift 2 ;; --opencl ) OPENCL_DIR="$2" shift 2 ;; -h | --help ) usage shift ;; -- ) shift break ;; *) echo "Internal error!" exit 1 ;; esac done BUILD_ROOT="$( mktemp -d )" SRC_ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" WORKING_DIR=$PWD DASH_JAY="-j $(getconf _NPROCESSORS_ONLN)" OS_NAME="$(cat /etc/os-release | awk -F '=' '/^NAME/{print $2}' | awk '{print $1}' | tr -d '"')" [[ -z "$ROCM_PATH" ]] && ROCM_PATH=/opt/rocm err() { echo "${1-Died}." >&2 } die() { err "$1" exit 1 } pushd () { command pushd "$@" > /dev/null } popd () { command popd "$@" > /dev/null } function setupENV() { if [ "$OS_NAME" == "Ubuntu" ] then sudo apt-get update sudo apt-get install dpkg-dev rpm doxygen libelf-dev rename liburi-encode-perl \ libfile-basedir-perl libfile-copy-recursive-perl libfile-listing-perl elif [ "$OS_NAME" == "CentOS" ] then yum install dpkg-dev rpm-build doxygen elfutils-libelf-devel prename \ perl-URI-Encode perl-File-Listing perl-File-BaseDir fi } function buildHIP() { pushd $BUILD_ROOT HIP_BUILD_DIR="$BUILD_ROOT/hip_build" mkdir $HIP_BUILD_DIR pushd $HIP_BUILD_DIR cmake $SRC_ROOT -DHIP_COMMON_DIR="$HIP_DIR" -DAMD_OPENCL_PATH=$OPENCL_DIR -DROCCLR_PATH=$ROCCLR_DIR -DCMAKE_PREFIX_PATH="$ROCM_PATH" -DCMAKE_BUILD_TYPE=Release make $DASH_JAY make package if [ "$OS_NAME" == "Ubuntu" ] then cp hip-*.deb $WORKING_DIR sudo dpkg -i -B hip-dev*.deb hip-runtime-amd*.deb hip-sample*.deb hip-doc*.deb elif [ "$OS_NAME" == "CentOS" ] then cp hip-*.rpm $WORKING_DIR sudo rpm -ivh --replacefiles --force hip-devel*.rpm hip-runtime-amd*.rpm hip-sample*.rpm \ hip-doc*.rpm fi popd popd rm -rf $BUILD_ROOT } echo "Preparing build environment" setupENV || die "setupENV failed" echo "Building and installing HIP packages" buildHIP || die "buildHIP failed" echo "Finished building HIP packages" clr-rocm-5.7.1/hipamd/packaging/000077500000000000000000000000001450307266000164335ustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/packaging/CMakeLists.txt000066400000000000000000000355051450307266000212020ustar00rootroot00000000000000# Copyright (c) 2020 - 2022 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software.
# # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. cmake_minimum_required(VERSION 3.16.8) #set components for HIP set(CPACK_COMPONENTS_ALL binary dev doc runtime-nvidia) # ASAN Package requires only libraries and license file if(ENABLE_ASAN_PACKAGING) set(CPACK_COMPONENTS_ALL asan) endif() ###############Install Required files for all components######## #Enable Component Install set(CPACK_RPM_COMPONENT_INSTALL ON) set(CPACK_DEB_COMPONENT_INSTALL ON) ###Set License#### set(CPACK_RESOURCE_FILE_LICENSE ${hip_SOURCE_DIR}/LICENSE.txt) install(FILES ${CPACK_RESOURCE_FILE_LICENSE} DESTINATION ${CMAKE_INSTALL_DOCDIR} COMPONENT binary) # install license file in share/doc/hip-asan folder install(FILES ${CPACK_RESOURCE_FILE_LICENSE} DESTINATION ${CMAKE_INSTALL_DOCDIR}-asan COMPONENT asan) set(CPACK_RPM_PACKAGE_LICENSE "MIT") #Begin binary files install if(HIP_PLATFORM STREQUAL "amd" ) if(BUILD_SHARED_LIBS) install(PROGRAMS ${PROJECT_BINARY_DIR}/lib/libamdhip64.so DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) install(PROGRAMS ${PROJECT_BINARY_DIR}/lib/libamdhip64.so.${HIP_LIB_VERSION_MAJOR} DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) install(PROGRAMS ${PROJECT_BINARY_DIR}/lib/libamdhip64.so.${HIP_LIB_VERSION_STRING} DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) install(PROGRAMS ${PROJECT_BINARY_DIR}/lib/libhiprtc.so DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) install(PROGRAMS ${PROJECT_BINARY_DIR}/lib/libhiprtc.so.${HIP_LIB_VERSION_MAJOR} DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) install(PROGRAMS ${PROJECT_BINARY_DIR}/lib/libhiprtc.so.${HIP_LIB_VERSION_STRING} DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) install(PROGRAMS ${PROJECT_BINARY_DIR}/lib/libhiprtc-builtins.so DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) install(PROGRAMS ${PROJECT_BINARY_DIR}/lib/libhiprtc-builtins.so.${HIP_LIB_VERSION_MAJOR} DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) install(PROGRAMS ${PROJECT_BINARY_DIR}/lib/libhiprtc-builtins.so.${HIP_LIB_VERSION_STRING} DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) # Add libraries to asan package install(DIRECTORY ${PROJECT_BINARY_DIR}/lib/ DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan PATTERN ".hipInfo" EXCLUDE) else() install(PROGRAMS ${PROJECT_BINARY_DIR}/lib/libamdhip64.a DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) endif()#End BUILD_SHARED_LIBS #TODO: This does not belong in the BINARY package.
#Keeping it as-is for now install(FILES ${CMAKE_BINARY_DIR}/hipamd/.hipInfo DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) install ( EXPORT hip-targets FILE hip-targets.cmake NAMESPACE hip:: DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/hip COMPONENT binary) install(FILES ${CMAKE_BINARY_DIR}/hipamd/src/hip-lang-config.cmake ${CMAKE_BINARY_DIR}/hipamd/src/hip-lang-config-version.cmake DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/hip-lang COMPONENT binary) install ( EXPORT hip-lang-targets FILE hip-lang-targets.cmake NAMESPACE hip-lang:: DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/hip-lang COMPONENT binary) install(FILES ${CMAKE_BINARY_DIR}/hipamd/hiprtc-config.cmake ${CMAKE_BINARY_DIR}/hipamd/hiprtc-config-version.cmake DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/hiprtc COMPONENT binary) install ( EXPORT hiprtc-targets FILE hiprtc-targets.cmake NAMESPACE hiprtc:: DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/hiprtc COMPONENT binary) endif()#End HIP_PLATFORM = "amd" #End binary files install #Begin dev files install if(WIN32) install(DIRECTORY ${HIP_COMMON_DIR}/bin DESTINATION . COMPONENT dev USE_SOURCE_PERMISSIONS) else() install(DIRECTORY ${HIP_COMMON_DIR}/bin DESTINATION . COMPONENT dev USE_SOURCE_PERMISSIONS DIRECTORY_PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE PATTERN *.bat EXCLUDE) endif() install(DIRECTORY ${hip_SOURCE_DIR}/bin DESTINATION . COMPONENT dev USE_SOURCE_PERMISSIONS DIRECTORY_PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE) install(DIRECTORY ${HIP_COMMON_DIR}/include DESTINATION . COMPONENT dev) install(DIRECTORY ${hip_SOURCE_DIR}/include/hip/amd_detail DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/hip COMPONENT dev) install(DIRECTORY ${hip_SOURCE_DIR}/include/hip/nvidia_detail DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/hip COMPONENT dev) install(FILES ${CMAKE_BINARY_DIR}/hipamd/include/hip/amd_detail/hip_prof_str.h DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/hip/amd_detail COMPONENT dev) install(FILES ${CMAKE_BINARY_DIR}/hipamd/include/hip/hip_version.h DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/hip COMPONENT dev) if(WIN32) install(FILES ${CMAKE_BINARY_DIR}/hipamd/.hipVersion DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT dev) else() install(FILES ${CMAKE_BINARY_DIR}/hipamd/.hipVersion DESTINATION ${CMAKE_INSTALL_DATADIR}/hip RENAME version COMPONENT dev) endif() install(DIRECTORY ${HIP_COMMON_DIR}/cmake/ DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/hip COMPONENT dev) install(FILES ${CMAKE_BINARY_DIR}/hipamd/hip-config.cmake ${CMAKE_BINARY_DIR}/hipamd/hip-config-version.cmake DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/hip COMPONENT dev) install(FILES ${CMAKE_BINARY_DIR}/hipamd/hip-config-amd.cmake DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/hip COMPONENT dev) install(FILES ${CMAKE_BINARY_DIR}/hipamd/hip-config-nvidia.cmake DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/hip COMPONENT dev) #End dev files install #Begin doc files install find_program(DOXYGEN_EXE doxygen) if(DOXYGEN_EXE) if(EXISTS "${HIP_COMMON_DIR}/docs/doxygen-input/doxy.cfg") add_custom_target(build_doxygen ALL COMMAND HIP_PATH=${HIP_COMMON_DIR} doxygen ${HIP_COMMON_DIR}/docs/doxygen-input/doxy.cfg) elseif(EXISTS "${HIP_COMMON_DIR}/docs/.doxygen/Doxyfile") add_custom_target(build_doxygen ALL COMMAND HIP_PATH=${HIP_COMMON_DIR} doxygen ${HIP_COMMON_DIR}/docs/.doxygen/Doxyfile) else() message(FATAL_ERROR "Unable to find doxygen config file") endif() install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/RuntimeAPI/html DESTINATION
${CMAKE_INSTALL_DOCDIR}/RuntimeAPI COMPONENT doc) endif() #End doc files install ################################## # Packaging steps COMMON Variables ################################## set(CPACK_SET_DESTDIR TRUE) set(CPACK_PACKAGE_VENDOR "Advanced Micro Devices, Inc.") set(CPACK_PACKAGE_CONTACT "HIP Support ") set(CPACK_PACKAGE_DESCRIPTION_SUMMARY "HIP:Heterogeneous-computing Interface for Portability") set(CPACK_PACKAGE_VERSION_MAJOR ${HIP_VERSION_MAJOR}) set(CPACK_PACKAGE_VERSION_MINOR ${HIP_VERSION_MINOR}) set(CPACK_PACKAGE_VERSION_PATCH ${HIP_VERSION_PATCH}) set(CPACK_PACKAGE_VERSION ${HIP_VERSION_MAJOR}.${HIP_VERSION_MINOR}.${HIP_PACKAGING_VERSION_PATCH}) set(CPACK_GENERATOR "TGZ;DEB;RPM" CACHE STRING "Package types to build") set(CPACK_RPM_EXCLUDE_FROM_AUTO_FILELIST_ADDITION "/opt") if (CPACK_RPM_PACKAGE_RELEASE MATCHES "local" ) #If building locally, the default value will cause a build failure #DEBUG SYMBOL packaging requires SOURCE_DIR to be small set(CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX ${CPACK_INSTALL_PREFIX}) endif() # Even though the hip-runtime package has libraries, it was not in the package provides list, # since CPACK_RPM_PACKAGE_AUTOREQPROV was set to "no". # Use AUTOREQ (rather than AUTOREQPROV) so that the package will also provide the libraries set(CPACK_RPM_PACKAGE_AUTOREQ " no") set(CPACK_RPM_FILE_NAME "RPM-DEFAULT") set(CPACK_DEBIAN_FILE_NAME "DEB-DEFAULT") set(CPACK_SOURCE_GENERATOR "TGZ") #Begin Binary Packaging setting set(CPACK_BINARY_DEB "ON") set(CPACK_BINARY_RPM "ON") set(CPACK_DEBIAN_BINARY_PACKAGE_NAME "hip-runtime-amd") set(CPACK_RPM_BINARY_PACKAGE_NAME "hip-runtime-amd") set(CPACK_COMPONENT_BINARY_DESCRIPTION "HIP:Heterogeneous-computing Interface for Portability [RUNTIME - AMD]") if(FILE_REORG_BACKWARD_COMPATIBILITY) #This is used for softlinking hip-target files configure_file(hip-runtime-amd.postinst ${CMAKE_CURRENT_BINARY_DIR}/binary/postinst @ONLY) configure_file(hip-runtime-amd.prerm ${CMAKE_CURRENT_BINARY_DIR}/binary/prerm @ONLY) set(CPACK_DEBIAN_BINARY_PACKAGE_CONTROL_EXTRA "${CMAKE_CURRENT_BINARY_DIR}/binary/postinst;${CMAKE_CURRENT_BINARY_DIR}/binary/prerm") endif() set(CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS "hsa-rocr-dev (>= 1.3), rocminfo, comgr (>= 2.0), rocm-llvm, libc6, rocm-core, hipcc") set(CPACK_DEBIAN_BINARY_PACKAGE_PROVIDES "hip-rocclr (= ${CPACK_PACKAGE_VERSION})") set(CPACK_DEBIAN_BINARY_PACKAGE_REPLACES "hip-rocclr (= ${CPACK_PACKAGE_VERSION})") set(CPACK_RPM_BINARY_PACKAGE_ARCHITECTURE "${CMAKE_SYSTEM_PROCESSOR}") if(FILE_REORG_BACKWARD_COMPATIBILITY) set(CPACK_RPM_BINARY_POST_INSTALL_SCRIPT_FILE "${CMAKE_CURRENT_BINARY_DIR}/binary/postinst") set(CPACK_RPM_BINARY_PRE_UNINSTALL_SCRIPT_FILE "${CMAKE_CURRENT_BINARY_DIR}/binary/prerm") endif() string(REPLACE "-" "_" HIP_BASE_VERSION ${CPACK_PACKAGE_VERSION}) set(CPACK_RPM_BINARY_PACKAGE_REQUIRES "hsa-rocr-dev >= 1.3, rocminfo, comgr >= 2.0, rocm-llvm, rocm-core, hipcc") set(CPACK_RPM_BINARY_PACKAGE_PROVIDES "hip-rocclr = ${HIP_BASE_VERSION}") set(CPACK_RPM_BINARY_PACKAGE_OBSOLETES "hip-rocclr = ${HIP_BASE_VERSION}") #End Binary Packaging setting #Begin dev Packaging setting set(CPACK_DEV_DEB "ON") set(CPACK_DEV_RPM "ON") set(CPACK_DEBIAN_DEV_PACKAGE_NAME "hip-dev") set(CPACK_RPM_DEV_PACKAGE_NAME "hip-devel") set(CPACK_COMPONENT_DEV_DESCRIPTION "HIP: Heterogeneous-computing Interface for Portability [DEVELOPMENT]") configure_file(hip-devel.postinst ${CMAKE_CURRENT_BINARY_DIR}/dev/postinst @ONLY) configure_file(hip-devel.prerm ${CMAKE_CURRENT_BINARY_DIR}/dev/prerm @ONLY)
set(CPACK_DEBIAN_DEV_PACKAGE_CONTROL_EXTRA "${CMAKE_CURRENT_BINARY_DIR}/dev/postinst;${CMAKE_CURRENT_BINARY_DIR}/dev/prerm") set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS "perl (>= 5.0), liburi-encode-perl, libfile-basedir-perl, libfile-copy-recursive-perl, libfile-listing-perl, libfile-which-perl, libc6, file, rocm-core, hipcc") set(CPACK_DEBIAN_DEV_PACKAGE_PROVIDES "hip-base") set(CPACK_DEBIAN_DEV_PACKAGE_REPLACES "hip-base") set(CPACK_RPM_DEV_POST_INSTALL_SCRIPT_FILE "${CMAKE_CURRENT_BINARY_DIR}/dev/postinst") set(CPACK_RPM_DEV_PRE_UNINSTALL_SCRIPT_FILE "${CMAKE_CURRENT_BINARY_DIR}/dev/prerm") set(CPACK_RPM_DEV_PACKAGE_REQUIRES "perl >= 5.0, perl-File-Which, perl-File-Listing, perl-File-BaseDir, perl-URI-Encode, file, rocm-core, hipcc") set(CPACK_RPM_DEV_PACKAGE_PROVIDES "hip-base") set(CPACK_RPM_DEV_PACKAGE_OBSOLETES "hip-base") #End dev Packaging setting #Begin doc Packaging setting set(CPACK_DOC_DEB "ON") set(CPACK_DOC_RPM "ON") set(CPACK_DEBIAN_DOC_PACKAGE_NAME "hip-doc") set(CPACK_RPM_DOC_PACKAGE_NAME "hip-doc") set(CPACK_COMPONENT_DOC_DESCRIPTION "HIP: Heterogeneous-computing Interface for Portability [DOCUMENTATION]") set(CPACK_DEBIAN_DOC_PACKAGE_DEPENDS "hip-dev (= ${CPACK_PACKAGE_VERSION}-${CPACK_DEBIAN_PACKAGE_RELEASE}), rocm-core, hipcc") set(CPACK_DEBIAN_DOC_PACKAGE_PROVIDES "hip-doc") string(REPLACE "-" "_" HIP_BASE_VERSION ${CPACK_PACKAGE_VERSION}) set(CPACK_RPM_DOC_PACKAGE_REQUIRES "hip-devel = ${HIP_BASE_VERSION}-${CPACK_RPM_PACKAGE_RELEASE}, rocm-core, hipcc") #End doc Packaging setting #Begin runtime-nvidia Packaging setting set(CPACK_RUNTIME-NVIDIA_DEB "ON") set(CPACK_RUNTIME-NVIDIA_RPM "ON") set(CPACK_DEBIAN_RUNTIME-NVIDIA_PACKAGE_NAME "hip-runtime-nvidia") set(CPACK_RPM_RUNTIME-NVIDIA_PACKAGE_NAME "hip-runtime-nvidia") set(CPACK_COMPONENT_RUNTIME-NVIDIA_DESCRIPTION "HIP: Heterogeneous-computing Interface for Portability [RUNTIME-NVIDIA]") set(CPACK_DEBIAN_RUNTIME-NVIDIA_PACKAGE_DEPENDS "cuda (>= 7.5), rocm-core, hipcc") set(CPACK_DEBIAN_RUNTIME-NVIDIA_PACKAGE_PROVIDES "hip-nvcc") set(CPACK_DEBIAN_RUNTIME-NVIDIA_PACKAGE_REPLACES "hip-nvcc") set(CPACK_RPM_RUNTIME-NVIDIA_PACKAGE_PROVIDES "hip-nvcc") set(CPACK_RPM_RUNTIME-NVIDIA_PACKAGE_OBSOLETES "hip-nvcc") set(CPACK_RPM_RUNTIME-NVIDIA_PACKAGE_REQUIRES "cuda >= 7.5, rocm-core, hipcc") # Begin asan Packaging setting set(CPACK_ASAN_DEB "ON") set(CPACK_ASAN_RPM "ON") set(CPACK_DEBIAN_ASAN_PACKAGE_NAME "hip-runtime-amd-asan") set(CPACK_RPM_ASAN_PACKAGE_NAME "hip-runtime-amd-asan") set(CPACK_COMPONENT_ASAN_DESCRIPTION "HIP:Heterogeneous-computing Interface for Portability [AddressSanitizer libraries]") set(CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS "hsa-rocr-dev (>= 1.3), rocminfo, comgr-asan (>= 2.0), rocm-llvm, libc6, rocm-core-asan") set(CPACK_RPM_ASAN_PACKAGE_REQUIRES "hsa-rocr-dev >= 1.3, rocminfo, comgr-asan >= 2.0, rocm-llvm, rocm-core-asan") #End asan Packaging setting # Remove dependency on rocm-core if -DROCM_DEP_ROCMCORE=ON not given to cmake if(NOT ROCM_DEP_ROCMCORE) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_BINARY_PACKAGE_REQUIRES ${CPACK_RPM_BINARY_PACKAGE_REQUIRES}) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS ${CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS}) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DEV_PACKAGE_REQUIRES ${CPACK_RPM_DEV_PACKAGE_REQUIRES}) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DEV_PACKAGE_DEPENDS ${CPACK_DEBIAN_DEV_PACKAGE_DEPENDS}) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DOC_PACKAGE_REQUIRES ${CPACK_RPM_DOC_PACKAGE_REQUIRES}) string(REGEX REPLACE ",?
?rocm-core" "" CPACK_DEBIAN_DOC_PACKAGE_DEPENDS ${CPACK_DEBIAN_DOC_PACKAGE_DEPENDS}) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_RUNTIME-NVIDIA_PACKAGE_REQUIRES ${CPACK_RPM_RUNTIME-NVIDIA_PACKAGE_REQUIRES}) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_RUNTIME-NVIDIA_PACKAGE_DEPENDS ${CPACK_DEBIAN_RUNTIME-NVIDIA_PACKAGE_DEPENDS}) string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_RPM_ASAN_PACKAGE_REQUIRES ${CPACK_RPM_ASAN_PACKAGE_REQUIRES}) string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS ${CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS}) endif() include(CPack) clr-rocm-5.7.1/hipamd/packaging/convert_md_to_html.sh000077500000000000000000000052511450307266000226630ustar00rootroot00000000000000#!/bin/bash # Copyright (c) 2016 - 2021 Advanced Micro Devices, Inc. All Rights Reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. function die { echo "${1-Died}." >&2 exit 1 } function cleanup { rm -rf "$workdir" } # parse arguments hip_srcdir=$1 html_destdir=$2 [ "$hip_srcdir" != "" ] && [ "$html_destdir" != "" ] || die "Invalid arguments!"
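# Editor's illustration (not part of the original script): a hypothetical
# invocation that converts the markdown files of a HIP checkout into HTML.
# Both paths are examples; `grip` must be installed and on PATH:
#
#   ./convert_md_to_html.sh "$HOME/src/HIP" "$HOME/hip-html"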
# create temporary directory for grip settings workdir=`mktemp -d` trap cleanup EXIT # setup grip export GRIPURL=$hip_srcdir export GRIPHOME=$workdir echo "CACHE_DIRECTORY = '$html_destdir/asset'" > $workdir/settings.py mkdir -p $html_destdir $html_destdir/docs/markdown # convert all md files to html pushd $hip_srcdir for f in *.md docs/markdown/*.md; do grip --export --no-inline $f $html_destdir/${f%.*}.html; done popd # convert absolute links to relative links pushd $html_destdir for f in *.html; do sed -i "s?$GRIPURL/??g" $f; done for f in docs/markdown/*.html; do sed -i "s?$GRIPURL/?../../?g" $f; done popd # update document titles pushd $html_destdir for f in *.html; do sed -i "s?.md - Grip??g" $f; done for f in docs/markdown/*.html; do sed -i "s?.md - Grip??g" $f; done popd # replace .md with .html in links pushd $html_destdir for f in *.html; do sed -i "s?.md\"?.html\"?g" $f; done for f in *.html; do sed -i "s?.md#?.html#?g" $f; done for f in docs/markdown/*.html; do sed -i "s?.md\"?.html\"?g" $f; done for f in docs/markdown/*.html; do sed -i "s?.md#?.html#?g" $f; done popd # replace github.io links pushd $html_destdir sed -i "s?http://rocm-developer-tools.github.io/HIP?docs/RuntimeAPI/html/index.html?g" README.html sed -i "s?http://rocm-developer-tools.github.io/HIP?docs/RuntimeAPI/html/?g" RELEASE.html popd exit 0 clr-rocm-5.7.1/hipamd/packaging/hip-devel.postinst000077500000000000000000000031741450307266000221250ustar00rootroot00000000000000#!/bin/bash # Copyright (c) 2016 - 2021 Advanced Micro Devices, Inc. All Rights Reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. ROCMDIR=@ROCM_PATH@ HIPINCDIR=$ROCMDIR/@CMAKE_INSTALL_INCLUDEDIR@/hip CURRENTDIR=`pwd` # The following will be removed after upstream is updated cd $HIPINCDIR ln -r -s -f amd_detail hcc_detail ln -r -s -f nvidia_detail nvcc_detail cd $CURRENTDIR #FILE_REORG_BACKWARD_COMPATIBILITY HIPINCDIR=$ROCMDIR/hip/include/hip if [ -d $HIPINCDIR ]; then # The following will be removed after upstream is updated cd $HIPINCDIR ln -r -s -f amd_detail hcc_detail ln -r -s -f nvidia_detail nvcc_detail cd $CURRENTDIR fi clr-rocm-5.7.1/hipamd/packaging/hip-devel.prerm000077500000000000000000000031431450307266000213630ustar00rootroot00000000000000#!/bin/bash # Copyright (c) 2016 - 2021 Advanced Micro Devices, Inc. All rights reserved.
# # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. ROCMDIR=@ROCM_PATH@ CURRENTDIR=`pwd` HIPINCDIR=$ROCMDIR/@CMAKE_INSTALL_INCLUDEDIR@/hip ([ ! -d $HIPINCDIR ]) && exit 0 cd $HIPINCDIR rm hcc_detail rm nvcc_detail cd $CURRENTDIR #FILE_REORG_BACKWARD_COMPATIBILITY #backward compatibility code, to be removed later HIPDIR=$ROCMDIR/hip HIPINCDIR=$ROCMDIR/hip/include/hip ([ ! -d $HIPINCDIR ]) && exit 0 cd $HIPINCDIR rm -f hcc_detail rm -f nvcc_detail cd $CURRENTDIR ([ ! -d $HIPDIR ]) && exit 0 rmdir --ignore-fail-on-non-empty $HIPDIR clr-rocm-5.7.1/hipamd/packaging/hip-runtime-amd.postinst000077500000000000000000000036531450307266000232500ustar00rootroot00000000000000#!/bin/bash # Copyright (c) 2020 - 2022 Advanced Micro Devices, Inc. All rights reserved.
ROCMDIR=@ROCM_PATH@ ROCMCMAKEDIR=$ROCMDIR/@CMAKE_INSTALL_LIBDIR@/cmake HIPCMAKEDIR=$ROCMDIR/hip/lib/cmake CURRENTDIR=`pwd` mkdir -p $HIPCMAKEDIR/hip mkdir -p $HIPCMAKEDIR/hip-lang mkdir -p $HIPCMAKEDIR/hiprtc HIPTARGETFILES=$(ls -A $ROCMCMAKEDIR/hip | grep "^hip-targets") cd $HIPCMAKEDIR/hip for f in $HIPTARGETFILES do ln -s -r -f $ROCMCMAKEDIR/hip/$f $(basename $f) done cd $CURRENTDIR HIPLANGTARGETFILES=$(ls -A $ROCMCMAKEDIR/hip-lang | grep "^hip-lang-targets") cd $HIPCMAKEDIR/hip-lang for f in $HIPLANGTARGETFILES do ln -s -r -f $ROCMCMAKEDIR/hip-lang/$f $(basename $f) done cd $CURRENTDIR HIPRTCTARGETFILES=$(ls -A $ROCMCMAKEDIR/hiprtc | grep "^hiprtc-targets") cd $HIPCMAKEDIR/hiprtc for f in $HIPRTCTARGETFILES do ln -s -r -f $ROCMCMAKEDIR/hiprtc/$f $(basename $f) done cd $CURRENTDIR clr-rocm-5.7.1/hipamd/packaging/hip-runtime-amd.prerm000077500000000000000000000044161450307266000225120ustar00rootroot00000000000000#!/bin/bash # Copyright (c) 2016 - 2022 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. ROCMDIR=@ROCM_PATH@ HIPDIR=$ROCMDIR/hip HIPCMAKEDIR=$ROCMDIR/hip/lib/cmake/hip HIPLANGCMAKEDIR=$ROCMDIR/hip/lib/cmake/hip-lang HIPRTCCMAKEDIR=$ROCMDIR/hip/lib/cmake/hiprtc CURRENTDIR=`pwd` ([ ! -d $ROCMDIR ] || [ ! -d $HIPDIR ]) && exit 0 ([ ! -d $HIPCMAKEDIR ] ) && exit 0 # Remove soft-links to hip-target HIPTARGETFILES=$(ls -A $HIPCMAKEDIR | grep "^hip-targets") cd $HIPCMAKEDIR for f in $HIPTARGETFILES; do [ -e $f ] || continue rm $(basename $f) done cd $CURRENTDIR ([ ! -d $HIPLANGCMAKEDIR ] ) && exit 0 # Remove soft-links to hip-lang-target HIPLANGTARGETFILES=$(ls -A $HIPLANGCMAKEDIR | grep "^hip-lang-targets") cd $HIPLANGCMAKEDIR for f in $HIPLANGTARGETFILES; do [ -e $f ] || continue rm $(basename $f) done cd $CURRENTDIR ([ ! -d $HIPRTCCMAKEDIR ] ) && exit 0 # Remove soft-links to hiprtc-target HIPRTCTARGETFILES=$(ls -A $HIPRTCCMAKEDIR | grep "^hiprtc-targets") cd $HIPRTCCMAKEDIR for f in $HIPRTCTARGETFILES; do [ -e $f ] || continue rm $(basename $f) done cd $CURRENTDIR rmdir --ignore-fail-on-non-empty $HIPCMAKEDIR rmdir --ignore-fail-on-non-empty $HIPLANGCMAKEDIR rmdir --ignore-fail-on-non-empty $HIPRTCCMAKEDIR clr-rocm-5.7.1/hipamd/packaging/hip-runtime-nvidia.postinst000077500000000000000000000023431450307266000237560ustar00rootroot00000000000000#!/bin/bash # Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. 
# # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. ROCMDIR=@ROCM_PATH@ HIPDIR=$ROCMDIR/hip if [ -d $ROCMDIR ] ; then ln -s -f $ROCMDIR /opt/rocm fi clr-rocm-5.7.1/hipamd/packaging/hip-runtime-nvidia.prerm000077500000000000000000000022611450307266000232170ustar00rootroot00000000000000#!/bin/bash # Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. if [ -L "/opt/rocm" ] ; then unlink /opt/rocm fi clr-rocm-5.7.1/hipamd/src/000077500000000000000000000000001450307266000152765ustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/src/CMakeLists.txt000066400000000000000000000264041450307266000200440ustar00rootroot00000000000000# Copyright (c) 2020 - 2022 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. 
# # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. cmake_minimum_required(VERSION 3.5.1) include(GNUInstallDirs) set(VERSION_MAJOR_AMDHIP ${HIP_VERSION_MAJOR}) set(VERSION_MINOR_AMDHIP ${HIP_VERSION_MINOR}) if(ADDRESS_SANITIZER) set(ASAN_LINKER_FLAGS "-fsanitize=address") set(ASAN_COMPILER_FLAGS "-fno-omit-frame-pointer -fsanitize=address") if(NOT CMAKE_COMPILER_IS_GNUCC) if(BUILD_SHARED_LIBS) set(ASAN_COMPILER_FLAGS "${ASAN_COMPILER_FLAGS} -shared-libsan") set(ASAN_LINKER_FLAGS "${ASAN_LINKER_FLAGS} -shared-libsan") else() set(ASAN_LINKER_FLAGS "${ASAN_LINKER_FLAGS} -static-libsan") endif() endif() set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ASAN_COMPILER_FLAGS}") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ASAN_COMPILER_FLAGS}") set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${ASAN_LINKER_FLAGS} -s -Wl,--build-id=sha1") set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} ${ASAN_LINKER_FLAGS} -Wl,--build-id=sha1") endif() if(CMAKE_COMPILER_IS_GNUCC) set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Werror") set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-error=deprecated-declarations") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Werror") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=deprecated-declarations") endif() option(DISABLE_DIRECT_DISPATCH "Disable Direct Dispatch" OFF) option(BUILD_SHARED_LIBS "Build the shared library" ON) list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_LIST_DIR}/cmake") if(BUILD_SHARED_LIBS) add_library(amdhip64 SHARED) # Windows doesn't have a strip utility, so CMAKE_STRIP won't be set. 
if((CMAKE_BUILD_TYPE STREQUAL "Release") AND NOT ("${CMAKE_STRIP}" STREQUAL "")) add_custom_command(TARGET amdhip64 POST_BUILD COMMAND ${CMAKE_STRIP} $<TARGET_FILE:amdhip64>) endif() else() add_library(amdhip64 STATIC rocclr) endif() set_target_properties(amdhip64 PROPERTIES CXX_STANDARD 17 CXX_STANDARD_REQUIRED ON CXX_EXTENSIONS OFF POSITION_INDEPENDENT_CODE ON # Workaround for many places in the HIP project # having hardcoded references to build/lib/libamdhip64.so LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib ARCHIVE_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib) if(CMAKE_SIZEOF_VOID_P EQUAL 8) set_target_properties(amdhip64 PROPERTIES OUTPUT_NAME "amdhip64") else() set_target_properties(amdhip64 PROPERTIES OUTPUT_NAME "amdhip32") endif() # Disable versioning for Windows # as currently HIP_LIB_VERSION_STRING and HIP_LIB_VERSION_MAJOR # are not being populated if(NOT WIN32) if(BUILD_SHARED_LIBS) set_target_properties(amdhip64 PROPERTIES VERSION ${HIP_LIB_VERSION_STRING} SOVERSION ${HIP_LIB_VERSION_MAJOR}) endif() endif() target_sources(amdhip64 PRIVATE fixme.cpp hip_activity.cpp hip_code_object.cpp hip_context.cpp hip_device_runtime.cpp hip_device.cpp hip_error.cpp hip_event.cpp hip_event_ipc.cpp hip_fatbin.cpp hip_global.cpp hip_graph_internal.cpp hip_graph.cpp hip_hmm.cpp hip_intercept.cpp hip_memory.cpp hip_mempool.cpp hip_mempool_impl.cpp hip_module.cpp hip_peer.cpp hip_platform.cpp hip_profile.cpp hip_stream_ops.cpp hip_stream.cpp hip_surface.cpp hip_texture.cpp hip_gl.cpp hip_vm.cpp) if(WIN32) target_sources(amdhip64 PRIVATE hip_runtime.cpp) endif() if(BUILD_SHARED_LIBS) if(WIN32) target_sources(amdhip64 PRIVATE amdhip.def) else() target_link_libraries(amdhip64 PRIVATE "-Wl,--version-script=${CMAKE_CURRENT_LIST_DIR}/hip_hcc.map.in") set_target_properties(amdhip64 PROPERTIES LINK_DEPENDS "${CMAKE_CURRENT_LIST_DIR}/hip_hcc.map.in") endif() endif() if(WIN32) configure_file(hip_hcc_in.rc.in hip_hcc_info.rc @ONLY) target_sources(amdhip64 PRIVATE hip_hcc_info.rc) endif() target_include_directories(amdhip64 PRIVATE ${HIP_COMMON_INCLUDE_DIR} ${PROJECT_SOURCE_DIR}/include ${PROJECT_BINARY_DIR}/include) target_compile_definitions(amdhip64 PRIVATE __HIP_PLATFORM_AMD__) target_link_libraries(amdhip64 PRIVATE ${OPENGL_LIBRARIES}) target_link_libraries(amdhip64 PRIVATE ${CMAKE_DL_LIBS}) # Note in static case we cannot link against rocclr. # If we would, we'd also have to export rocclr and have hipcc pass it to the linker. if(BUILD_SHARED_LIBS) target_link_libraries(amdhip64 PRIVATE rocclr) else() target_compile_definitions(amdhip64 PRIVATE $<TARGET_PROPERTY:rocclr,INTERFACE_COMPILE_DEFINITIONS>) target_include_directories(amdhip64 PRIVATE $<TARGET_PROPERTY:rocclr,INTERFACE_INCLUDE_DIRECTORIES>) endif() if(DISABLE_DIRECT_DISPATCH) target_compile_definitions(amdhip64 PRIVATE DISABLE_DIRECT_DISPATCH) endif() # Short-Term solution for pre-compiled headers for online compilation # Enable pre compiled header if(__HIP_ENABLE_PCH) find_package(LLVM REQUIRED CONFIG PATHS ${ROCM_PATH}/llvm) # find_package(LLVM) returns the lib/cmake/llvm location. We require the root.
if(NOT DEFINED HIP_LLVM_ROOT) set(HIP_LLVM_ROOT "${LLVM_DIR}/../../..") endif() execute_process(COMMAND sh -c "${CMAKE_CURRENT_SOURCE_DIR}/hip_embed_pch.sh ${HIP_COMMON_INCLUDE_DIR} ${PROJECT_BINARY_DIR}/include ${PROJECT_SOURCE_DIR}/include ${HIP_LLVM_ROOT}" COMMAND_ECHO STDERR RESULT_VARIABLE EMBED_PCH_RC WORKING_DIRECTORY ${CMAKE_BINARY_DIR}) if (EMBED_PCH_RC AND NOT EMBED_PCH_RC EQUAL 0) message(FATAL_ERROR "Failed to embed PCH") endif() target_compile_definitions(amdhip64 PRIVATE __HIP_ENABLE_PCH) target_sources(amdhip64 PRIVATE ${CMAKE_BINARY_DIR}/hip_pch.o) endif() set(HIPRTC_OBJECTS) # Add hiprtc set(FIXME_DIR ${CMAKE_CURRENT_BINARY_DIR}/CMakeFiles/amdhip64.dir) add_subdirectory(hiprtc) if(NOT WIN32) if(BUILD_SHARED_LIBS) target_link_libraries(amdhip64 PRIVATE ${HIPRTC_OBJECTS}) target_compile_definitions(amdhip64 PRIVATE __HIP_ENABLE_RTC) add_dependencies(amdhip64 hiprtc-builtins) INSTALL(TARGETS hiprtc-builtins RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}) endif() endif() ############################# # Profiling API support ############################# # Generate profiling API macros/structures header option(USE_PROF_API "Enable roctracer integration" ON) # Enable profiling API if(USE_PROF_API) set(PROF_API_STR "${PROJECT_BINARY_DIR}/include/hip/amd_detail/hip_prof_str.h") set(PROF_API_STR_IN "${CMAKE_SOURCE_DIR}/hipamd/include/hip/amd_detail/hip_prof_str.h") set(PROF_API_HDR "${HIP_COMMON_INCLUDE_DIR}/hip/hip_runtime_api.h") set(PROF_API_SRC "${CMAKE_CURRENT_SOURCE_DIR}") set(PROF_API_GEN "${CMAKE_CURRENT_SOURCE_DIR}/hip_prof_gen.py") set(PROF_API_LOG "${PROJECT_BINARY_DIR}/hip_prof_gen.log.txt") find_package(Python3 COMPONENTS Interpreter REQUIRED) execute_process(COMMAND ${Python3_EXECUTABLE} -c "import CppHeaderParser" RESULT_VARIABLE CPP_HEADER_PARSER OUTPUT_QUIET) if(NOT ${CPP_HEADER_PARSER} EQUAL 0) message(FATAL_ERROR "\ The \"CppHeaderParser\" Python3 package is not installed. \ Please install it using the following command: \"pip3 install CppHeaderParser\".\ ") endif() add_custom_command(OUTPUT ${PROF_API_STR} COMMAND ${Python3_EXECUTABLE} ${PROF_API_GEN} -v -t --priv ${PROF_API_HDR} ${PROF_API_SRC} ${PROF_API_STR_IN} ${PROF_API_STR} DEPENDS ${PROF_API_STR_IN} ${PROF_API_HDR} ${PROF_API_GEN} COMMENT "Generating profiling primitives: ${PROF_API_STR}") add_custom_target(gen-prof-api-str-header ALL DEPENDS ${PROF_API_STR} SOURCES ${PROF_API_HDR}) set_target_properties(amdhip64 PROPERTIES PUBLIC_HEADER ${PROF_API_STR}) find_path(PROF_API_HEADER_DIR prof_protocol.h HINTS ${PROF_API_HEADER_PATH} PATHS ${ROCM_PATH}/roctracer PATH_SUFFIXES include/ext) if(NOT PROF_API_HEADER_DIR) message(WARNING "Profiling API header not found. Disabling roctracer integration. 
Use -DPROF_API_HEADER_PATH=") else() target_compile_definitions(amdhip64 PUBLIC USE_PROF_API=1) target_include_directories(amdhip64 PUBLIC ${PROF_API_HEADER_DIR}) message(STATUS "Profiling API: ${PROF_API_HEADER_DIR}") endif() add_dependencies(amdhip64 gen-prof-api-str-header) endif() add_custom_command(TARGET amdhip64 POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy ${PROJECT_BINARY_DIR}/.hipInfo ${PROJECT_BINARY_DIR}/lib/.hipInfo) add_custom_command(TARGET amdhip64 POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_directory ${PROJECT_SOURCE_DIR}/include ${PROJECT_BINARY_DIR}/include) add_library(host INTERFACE) target_link_libraries(host INTERFACE amdhip64) add_library(device INTERFACE) target_link_libraries(device INTERFACE host) # Current packaging assumes that HIP runtime will always be installed in ${ROCM_PATH}/lib # This is false to assume, because some distros like CentOS will use the lib64 directory instead of lib # Relying on CMake to choose the library directory for us will default in that case to lib64 # Hence there will be a mismatch between where HIP is installed and where CMake thinks it is INSTALL(TARGETS amdhip64 host device EXPORT hip-targets RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR} PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}) INSTALL(EXPORT hip-targets DESTINATION ${CONFIG_PACKAGE_INSTALL_DIR} NAMESPACE hip::) INSTALL(TARGETS amdhip64 host device EXPORT hip-lang-targets RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR} PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}) INSTALL(EXPORT hip-lang-targets DESTINATION ${CONFIG_LANG_PACKAGE_INSTALL_DIR} NAMESPACE hip-lang::) include(CMakePackageConfigHelpers) configure_package_config_file( ${HIP_COMMON_DIR}/hip-lang-config.cmake.in ${CMAKE_CURRENT_BINARY_DIR}/hip-lang-config.cmake INSTALL_DESTINATION ${CONFIG_LANG_PACKAGE_INSTALL_DIR} PATH_VARS LIB_INSTALL_DIR INCLUDE_INSTALL_DIR BIN_INSTALL_DIR) write_basic_package_version_file( ${CMAKE_CURRENT_BINARY_DIR}/hip-lang-config-version.cmake VERSION "${HIP_VERSION_MAJOR}.${HIP_VERSION_MINOR}.${HIP_VERSION_GITDATE}" COMPATIBILITY SameMajorVersion) install( FILES ${CMAKE_CURRENT_BINARY_DIR}/hip-lang-config.cmake ${CMAKE_CURRENT_BINARY_DIR}/hip-lang-config-version.cmake DESTINATION ${CONFIG_LANG_PACKAGE_INSTALL_DIR}/ ) clr-rocm-5.7.1/hipamd/src/amd_hsa_elf.hpp000066400000000000000000000131451450307266000202350ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once // This header file is partially copied from // https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/BinaryFormat/ELF.h // AMDGPU OS for HSA compatible compute kernels. enum { ELFOSABI_AMDGPU_HSA = 64, ELFOSABI_AMDGPU_PAL = 65, ELFOSABI_AMDGPU_MESA3D = 66 }; enum { ELFABIVERSION_AMDGPU_HSA_V2 = 0, ELFABIVERSION_AMDGPU_HSA_V3 = 1, ELFABIVERSION_AMDGPU_HSA_V4 = 2, ELFABIVERSION_AMDGPU_HSA_V5 = 3 }; // AMDGPU specific e_flags enum : unsigned { EF_AMDGPU_MACH = 0x0ff, // AMDGPU processors EF_AMDGPU_MACH_NONE = 0x000, EF_AMDGPU_MACH_R600_R600 = 0x001, EF_AMDGPU_MACH_R600_R630 = 0x002, EF_AMDGPU_MACH_R600_RS880 = 0x003, EF_AMDGPU_MACH_R600_RV670 = 0x004, EF_AMDGPU_MACH_R600_RV710 = 0x005, EF_AMDGPU_MACH_R600_RV730 = 0x006, EF_AMDGPU_MACH_R600_RV770 = 0x007, EF_AMDGPU_MACH_R600_CEDAR = 0x008, EF_AMDGPU_MACH_R600_CYPRESS = 0x009, EF_AMDGPU_MACH_R600_JUNIPER = 0x00a, EF_AMDGPU_MACH_R600_REDWOOD = 0x00b, EF_AMDGPU_MACH_R600_SUMO = 0x00c, EF_AMDGPU_MACH_R600_BARTS = 0x00d, EF_AMDGPU_MACH_R600_CAICOS = 0x00e, EF_AMDGPU_MACH_R600_CAYMAN = 0x00f, EF_AMDGPU_MACH_R600_TURKS = 0x010, EF_AMDGPU_MACH_R600_RESERVED_FIRST = 0x011, EF_AMDGPU_MACH_R600_RESERVED_LAST = 0x01f, EF_AMDGPU_MACH_R600_FIRST = EF_AMDGPU_MACH_R600_R600, EF_AMDGPU_MACH_R600_LAST = EF_AMDGPU_MACH_R600_TURKS, // AMDGCN-based processors. EF_AMDGPU_MACH_AMDGCN_GFX600 = 0x020, EF_AMDGPU_MACH_AMDGCN_GFX601 = 0x021, EF_AMDGPU_MACH_AMDGCN_GFX700 = 0x022, EF_AMDGPU_MACH_AMDGCN_GFX701 = 0x023, EF_AMDGPU_MACH_AMDGCN_GFX702 = 0x024, EF_AMDGPU_MACH_AMDGCN_GFX703 = 0x025, EF_AMDGPU_MACH_AMDGCN_GFX704 = 0x026, EF_AMDGPU_MACH_AMDGCN_RESERVED_0X27 = 0x027, EF_AMDGPU_MACH_AMDGCN_GFX801 = 0x028, EF_AMDGPU_MACH_AMDGCN_GFX802 = 0x029, EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a, EF_AMDGPU_MACH_AMDGCN_GFX810 = 0x02b, EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c, EF_AMDGPU_MACH_AMDGCN_GFX902 = 0x02d, EF_AMDGPU_MACH_AMDGCN_GFX904 = 0x02e, EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f, EF_AMDGPU_MACH_AMDGCN_GFX908 = 0x030, EF_AMDGPU_MACH_AMDGCN_GFX909 = 0x031, EF_AMDGPU_MACH_AMDGCN_GFX90C = 0x032, EF_AMDGPU_MACH_AMDGCN_GFX1010 = 0x033, EF_AMDGPU_MACH_AMDGCN_GFX1011 = 0x034, EF_AMDGPU_MACH_AMDGCN_GFX1012 = 0x035, EF_AMDGPU_MACH_AMDGCN_GFX1030 = 0x036, EF_AMDGPU_MACH_AMDGCN_GFX1031 = 0x037, EF_AMDGPU_MACH_AMDGCN_GFX1032 = 0x038, EF_AMDGPU_MACH_AMDGCN_GFX1033 = 0x039, EF_AMDGPU_MACH_AMDGCN_GFX602 = 0x03a, EF_AMDGPU_MACH_AMDGCN_GFX705 = 0x03b, EF_AMDGPU_MACH_AMDGCN_GFX805 = 0x03c, EF_AMDGPU_MACH_AMDGCN_GFX1035 = 0x03d, EF_AMDGPU_MACH_AMDGCN_GFX1034 = 0x03e, EF_AMDGPU_MACH_AMDGCN_GFX90A = 0x03f, EF_AMDGPU_MACH_AMDGCN_GFX940 = 0x040, EF_AMDGPU_MACH_AMDGCN_GFX941 = 0x041, EF_AMDGPU_MACH_AMDGCN_GFX942 = 0x042, EF_AMDGPU_MACH_AMDGCN_RESERVED_0X43 = 0x043, EF_AMDGPU_MACH_AMDGCN_GFX1100 = 0x044, EF_AMDGPU_MACH_AMDGCN_GFX1013 = 0x045, EF_AMDGPU_MACH_AMDGCN_GFX1103 = 0x046, EF_AMDGPU_MACH_AMDGCN_GFX1036 = 0x047, EF_AMDGPU_MACH_AMDGCN_GFX1101 = 0x048, EF_AMDGPU_MACH_AMDGCN_GFX1102 = 0x049, // First/last AMDGCN-based processors. EF_AMDGPU_MACH_AMDGCN_FIRST = EF_AMDGPU_MACH_AMDGCN_GFX600, EF_AMDGPU_MACH_AMDGCN_LAST = EF_AMDGPU_MACH_AMDGCN_GFX1102, // Indicates if the "xnack" target feature is enabled for all code contained // in the object. // // Only valid for ELFOSABI_AMDGPU_HSA and ELFABIVERSION_AMDGPU_HSA_V3. 
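// Worked example (illustrative, derived only from the constants in this file):
// a V4 ABI code object built for gfx90a with xnack enabled and sramecc
// disabled would carry
//   e_flags = EF_AMDGPU_MACH_AMDGCN_GFX90A | EF_AMDGPU_FEATURE_XNACK_ON_V4
//           | EF_AMDGPU_FEATURE_SRAMECC_OFF_V4
//           = 0x03f | 0x300 | 0x800 = 0xb3f.
// Note that the V3 feature bits below overlap the V4 bits further down, so
// e_flags must always be interpreted against EI_ABIVERSION in the ELF ident.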
EF_AMDGPU_FEATURE_XNACK_V3 = 0x100, // Indicates if the "sramecc" target feature is enabled for all code // contained in the object. // // Only valid for ELFOSABI_AMDGPU_HSA and ELFABIVERSION_AMDGPU_HSA_V3. EF_AMDGPU_FEATURE_SRAMECC_V3 = 0x200, // Only valid for ELFOSABI_AMDGPU_HSA and ELFABIVERSION_AMDGPU_HSA_V4. EF_AMDGPU_FEATURE_XNACK_V4 = 0x300, EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4 = 0x000, EF_AMDGPU_FEATURE_XNACK_ANY_V4 = 0x100, EF_AMDGPU_FEATURE_XNACK_OFF_V4 = 0x200, EF_AMDGPU_FEATURE_XNACK_ON_V4 = 0x300, // SRAMECC selection mask for EF_AMDGPU_FEATURE_SRAMECC_* values. // Only valid for ELFOSABI_AMDGPU_HSA and ELFABIVERSION_AMDGPU_HSA_V4. EF_AMDGPU_FEATURE_SRAMECC_V4 = 0xc00, EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4 = 0x000, EF_AMDGPU_FEATURE_SRAMECC_ANY_V4 = 0x400, EF_AMDGPU_FEATURE_SRAMECC_OFF_V4 = 0x800, EF_AMDGPU_FEATURE_SRAMECC_ON_V4 = 0xc00, }; clr-rocm-5.7.1/hipamd/src/amdhip.def000066400000000000000000000233751450307266000172320ustar00rootroot00000000000000EXPORTS hipChooseDevice hipCtxCreate hipCtxDestroy hipCtxDisablePeerAccess hipCtxEnablePeerAccess hipCtxGetApiVersion hipCtxGetCacheConfig hipCtxGetCurrent hipCtxGetDevice hipCtxGetFlags hipCtxGetSharedMemConfig hipCtxPopCurrent hipCtxPushCurrent hipCtxSetCacheConfig hipCtxSetCurrent hipCtxSetSharedMemConfig hipCtxSynchronize hipDeviceCanAccessPeer hipDeviceComputeCapability hipDeviceDisablePeerAccess hipDeviceEnablePeerAccess hipDeviceGet hipDeviceGetAttribute hipDeviceGetByPCIBusId hipDeviceGetCacheConfig hipDeviceGetStreamPriorityRange hipDeviceGetLimit hipDeviceGetName hipDeviceGetUuid hipDeviceGetPCIBusId hipDeviceGetSharedMemConfig hipDeviceGetP2PAttribute hipDevicePrimaryCtxGetState hipDevicePrimaryCtxRelease hipDevicePrimaryCtxReset hipDevicePrimaryCtxRetain hipDevicePrimaryCtxSetFlags hipDeviceReset hipDeviceSetCacheConfig hipDeviceSetSharedMemConfig hipDeviceSynchronize hipDeviceTotalMem hipDriverGetVersion hipEventCreate hipEventCreateWithFlags hipEventDestroy hipEventElapsedTime hipEventQuery hipEventRecord hipEventSynchronize hipExtGetLinkTypeAndHopCount hipExtLaunchMultiKernelMultiDevice hipExtMallocWithFlags hipExtModuleLaunchKernel hipExtLaunchKernel hipFree hipFreeArray hipFuncSetAttribute hipFuncSetCacheConfig hipFuncSetSharedMemConfig hipGetDevice hipGetDeviceCount hipGetDeviceProperties hipGetErrorName hipGetErrorString hipGetLastError hipMemAllocHost hipHostAlloc hipHostFree hipHostGetDevicePointer hipHostGetFlags hipHostMalloc hipHostRegister hipHostUnregister hipInit hipIpcCloseMemHandle hipIpcGetMemHandle hipIpcOpenMemHandle hipIpcGetEventHandle hipIpcOpenEventHandle hipMalloc hipMalloc3D hipMalloc3DArray hipMallocManaged hipDeviceGetDefaultMemPool hipDeviceSetMemPool hipDeviceGetMemPool hipMallocAsync hipFreeAsync hipMemPoolTrimTo hipMemPoolSetAttribute hipMemPoolGetAttribute hipMemPoolSetAccess hipMemPoolGetAccess hipMemPoolCreate hipMemPoolDestroy hipMallocFromPoolAsync hipMemPoolExportToShareableHandle hipMemPoolImportFromShareableHandle hipMemPoolExportPointer hipMemPoolImportPointer hipArrayCreate hipArray3DCreate hipArrayDestroy hipArrayGetInfo hipArrayGetDescriptor hipArray3DGetDescriptor hipMallocArray hipMemAdvise hipMemAllocPitch hipMallocPitch hipMemcpy hipMemcpyWithStream hipMemcpyParam2D hipMemcpy2D hipMemcpy2DAsync hipMemcpy2DToArray hipMemcpy2DToArrayAsync hipMemcpy3D hipMemcpy3DAsync hipDrvMemcpy3D hipDrvMemcpy3DAsync hipMemcpyAsync hipMemcpyDtoD hipMemcpyDtoDAsync hipMemcpyDtoH hipMemcpyDtoHAsync hipMemcpyFromSymbol hipMemcpyFromSymbolAsync hipMemcpyHtoD hipMemcpyHtoDAsync 
hipMemcpyPeer hipMemcpyPeerAsync hipMemcpyToArray hipMemcpyFromArray hipMemcpyToSymbol hipMemcpyToSymbolAsync hipMemGetAddressRange hipGetSymbolAddress hipGetSymbolSize hipMemGetInfo hipMemPrefetchAsync hipMemPtrGetInfo hipMemRangeGetAttribute hipMemRangeGetAttributes hipMemset hipMemsetAsync hipMemsetD8 hipMemsetD8Async hipMemsetD16 hipMemsetD16Async hipMemsetD32 hipMemsetD32Async hipMemset2D hipMemset2DAsync hipMemset3D hipMemset3DAsync hipModuleGetFunction hipModuleGetGlobal hipModuleGetTexRef hipModuleLaunchKernel hipModuleLaunchKernelExt hipModuleLaunchCooperativeKernel hipModuleLaunchCooperativeKernelMultiDevice hipLaunchCooperativeKernel hipLaunchCooperativeKernelMultiDevice hipHccModuleLaunchKernel hipModuleLoad hipModuleLoadData hipModuleLoadDataEx hipModuleUnload hipModuleOccupancyMaxPotentialBlockSize hipModuleOccupancyMaxPotentialBlockSizeWithFlags hipModuleOccupancyMaxActiveBlocksPerMultiprocessor hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags hipOccupancyMaxPotentialBlockSize hipOccupancyMaxActiveBlocksPerMultiprocessor hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags hipFuncGetAttribute hipFuncGetAttributes hipPeekAtLastError hipPointerGetAttributes hipProfilerStart hipProfilerStop hipRuntimeGetVersion hipGetDeviceFlags hipSetDevice hipSetDeviceFlags hipStreamAddCallback hipStreamAttachMemAsync hipStreamCreate hipStreamCreateWithFlags hipStreamCreateWithPriority hipStreamDestroy hipStreamGetDevice hipStreamGetFlags hipStreamQuery hipStreamSynchronize hipStreamWaitEvent __hipPopCallConfiguration __hipPushCallConfiguration __hipRegisterFatBinary __hipRegisterFunction __hipRegisterVar __hipRegisterSurface __hipRegisterTexture __hipRegisterManagedVar __hipUnregisterFatBinary hipConfigureCall hipSetupArgument hipLaunchByPtr hipLaunchKernel hipRegisterTracerCallback hipApiName hipKernelNameRef hipBindTexture hipBindTexture2D hipBindTextureToArray hipBindTextureToMipmappedArray hipGetTextureAlignmentOffset hipGetTextureReference hipUnbindTexture hipCreateChannelDesc hipCreateTextureObject hipDestroyTextureObject hipGetChannelDesc hipGetTextureObjectResourceDesc hipGetTextureObjectResourceViewDesc hipGetTextureObjectTextureDesc hipTexRefGetAddress hipTexRefGetAddressMode hipTexRefGetArray hipTexRefGetBorderColor hipTexRefGetFilterMode hipTexRefGetFlags hipTexRefGetFormat hipTexRefGetMaxAnisotropy hipTexRefGetMipmapFilterMode hipTexRefGetMipmapLevelBias hipTexRefGetMipmapLevelClamp hipTexRefGetMipmappedArray hipTexRefSetAddress hipTexRefSetAddress2D hipTexRefSetAddressMode hipTexRefSetArray hipTexRefSetBorderColor hipTexRefSetFilterMode hipTexRefSetFlags hipTexRefSetFormat hipTexRefSetMaxAnisotropy hipTexRefSetMipmapFilterMode hipTexRefSetMipmapLevelBias hipTexRefSetMipmapLevelClamp hipTexRefSetMipmappedArray hipProfilerStart hipProfilerStop hipCreateSurfaceObject hipDestroySurfaceObject hipGetCmdName hipMipmappedArrayCreate hipMallocMipmappedArray hipMipmappedArrayDestroy hipFreeMipmappedArray hipMipmappedArrayGetLevel hipGetMipmappedArrayLevel hipMallocHost hipFreeHost hipTexObjectCreate hipTexObjectDestroy hipTexObjectGetResourceDesc hipTexObjectGetResourceViewDesc hipTexObjectGetTextureDesc hipExtStreamCreateWithCUMask hipStreamGetPriority hipMemcpy2DFromArray hipMemcpy2DFromArrayAsync hipDrvMemcpy2DUnaligned hipMemcpyAtoH hipMemcpyHtoA hipMemcpyParam2DAsync __gnu_h2f_ieee __gnu_f2h_ieee hipExtStreamGetCUMask hipImportExternalMemory hipExternalMemoryGetMappedBuffer hipDestroyExternalMemory hipGraphCreate hipGraphDestroy hipGraphAddKernelNode 
hipGraphAddMemsetNode hipGraphAddMemcpyNode hipGraphAddMemcpyNode1D hipGraphInstantiate hipGraphLaunch hipStreamIsCapturing hipStreamBeginCapture hipStreamEndCapture hipGraphExecDestroy hipPointerGetAttribute hipDrvPointerGetAttributes hipImportExternalSemaphore hipSignalExternalSemaphoresAsync hipWaitExternalSemaphoresAsync hipDestroyExternalSemaphore hipGLGetDevices hipGraphicsGLRegisterBuffer hipGraphicsGLRegisterImage hipGraphicsMapResources hipGraphicsResourceGetMappedPointer hipGraphicsSubResourceGetMappedArray hipGraphicsUnmapResources hipGraphicsUnregisterResource hipGraphGetNodes hipGraphGetRootNodes hipGraphKernelNodeGetParams hipGraphKernelNodeSetParams hipGraphKernelNodeSetAttribute hipGraphKernelNodeGetAttribute hipGraphMemcpyNodeGetParams hipGraphMemcpyNodeSetParams hipGraphMemsetNodeGetParams hipGraphMemsetNodeSetParams hipGraphAddDependencies hipGraphExecKernelNodeSetParams hipGraphAddEmptyNode hipStreamGetCaptureInfo hipStreamGetCaptureInfo_v2 hipStreamUpdateCaptureDependencies hipGraphRemoveDependencies hipGraphGetEdges hipGraphNodeGetDependencies hipGraphNodeGetDependentNodes hipGraphNodeGetType hipGraphDestroyNode hipGraphClone hipGraphNodeFindInClone hipGraphAddChildGraphNode hipGraphChildGraphNodeGetGraph hipGraphExecChildGraphNodeSetParams hipGraphAddMemcpyNodeFromSymbol hipGraphMemcpyNodeSetParamsFromSymbol hipGraphExecMemcpyNodeSetParamsFromSymbol hipGraphAddMemcpyNodeToSymbol hipGraphMemcpyNodeSetParamsToSymbol hipGraphExecMemcpyNodeSetParamsToSymbol hipGraphExecMemcpyNodeSetParams hipGraphMemcpyNodeSetParams1D hipGraphExecMemcpyNodeSetParams1D hipGraphAddEventRecordNode hipGraphEventRecordNodeGetEvent hipGraphEventRecordNodeSetEvent hipGraphExecEventRecordNodeSetEvent hipGraphAddEventWaitNode hipGraphEventWaitNodeGetEvent hipGraphEventWaitNodeSetEvent hipGraphExecEventWaitNodeSetEvent hipGraphAddHostNode hipGraphHostNodeGetParams hipGraphHostNodeSetParams hipGraphExecHostNodeSetParams hipGraphExecUpdate hipGraphInstantiateWithFlags hipGraphExecMemsetNodeSetParams hipDeviceGetGraphMemAttribute hipDeviceSetGraphMemAttribute hipDeviceGraphMemTrim amd_dbgapi_get_build_name amd_dbgapi_get_git_hash amd_dbgapi_get_build_id hipThreadExchangeStreamCaptureMode hipMemAddressFree hipMemAddressReserve hipMemCreate hipMemExportToShareableHandle hipMemGetAccess hipMemGetAllocationGranularity hipMemGetAllocationPropertiesFromHandle hipMemImportFromShareableHandle hipMemMap hipMemMapArrayAsync hipMemRelease hipMemRetainAllocationHandle hipMemSetAccess hipMemUnmap hipMemcpy_spt hipMemcpyAsync_spt hipStreamSynchronize_spt hipMemcpyToSymbol_spt hipMemcpyFromSymbol_spt hipMemcpy2D_spt hipMemcpy2DToArray_spt hipMemcpy2DFromArray_spt hipMemcpy3D_spt hipMemset_spt hipMemset2D_spt hipMemset3D_spt hipStreamQuery_spt hipStreamGetFlags_spt hipStreamGetPriority_spt hipStreamWaitEvent_spt hipEventRecord_spt hipLaunchKernel_spt hipLaunchCooperativeKernel_spt hipStreamWriteValue32 hipStreamWriteValue64 hipStreamWaitValue32 hipStreamWaitValue64 hipDeviceSetLimit hipGetStreamDeviceId hipGraphLaunch_spt hipStreamBeginCapture_spt hipStreamEndCapture_spt hipStreamIsCapturing_spt hipStreamGetCaptureInfo_spt hipStreamGetCaptureInfo_v2_spt hipStreamAddCallback_spt hipMemsetAsync_spt hipMemset2DAsync_spt hipMemset3DAsync_spt hipMemcpy3DAsync_spt hipMemcpy2DAsync_spt hipMemcpyFromSymbolAsync_spt hipMemcpyToSymbolAsync_spt hipMemcpyFromArray_spt hipMemcpy2DToArray_spt hipMemcpy2DFromArrayAsync_spt hipMemcpy2DToArrayAsync_spt hipDrvGetErrorName hipDrvGetErrorString hipUserObjectCreate hipUserObjectRelease 
hipUserObjectRetain hipGraphRetainUserObject hipGraphReleaseUserObject hipLaunchHostFunc hipLaunchHostFunc_spt hipGraphDebugDotPrint hipGraphKernelNodeCopyAttributes hipGraphNodeGetEnabled hipGraphNodeSetEnabled hipGraphUpload hipGraphAddMemAllocNode hipGraphMemAllocNodeGetParams hipGraphAddMemFreeNode hipGraphMemFreeNodeGetParams clr-rocm-5.7.1/hipamd/src/cmake/000077500000000000000000000000001450307266000163565ustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/src/cmake/FindROCclr.cmake000066400000000000000000000035361450307266000213140ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. if(ROCCLR_FOUND) return() endif() find_path(ROCCLR_INCLUDE_DIR top.hpp HINTS ${ROCCLR_PATH} PATHS # gerrit repo name ${CMAKE_SOURCE_DIR}/vdi ${CMAKE_SOURCE_DIR}/../vdi ${CMAKE_SOURCE_DIR}/../../vdi # github repo name ${CMAKE_SOURCE_DIR}/ROCclr ${CMAKE_SOURCE_DIR}/../ROCclr ${CMAKE_SOURCE_DIR}/../../ROCclr # jenkins repo name ${CMAKE_SOURCE_DIR}/rocclr ${CMAKE_SOURCE_DIR}/../rocclr ${CMAKE_SOURCE_DIR}/../../rocclr PATH_SUFFIXES include) include(FindPackageHandleStandardArgs) find_package_handle_standard_args(ROCclr "\nROCclr not found" ROCCLR_INCLUDE_DIR) mark_as_advanced(ROCCLR_INCLUDE_DIR) list(APPEND CMAKE_MODULE_PATH "${ROCCLR_INCLUDE_DIR}/../cmake") include(ROCclr) clr-rocm-5.7.1/hipamd/src/fixme.cpp000066400000000000000000000032551450307266000171170ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "vdi_common.hpp" #ifdef _WIN32 #include #include #include #include #include #include #endif #include cl_icd_dispatch amd::ICDDispatchedObject::icdVendorDispatch_[] = {0}; amd::PlatformIDS amd::PlatformID::Platform = {amd::ICDDispatchedObject::icdVendorDispatch_}; RUNTIME_ENTRY(cl_int, clGetDeviceIDs, (cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id* devices, cl_uint* num_devices)) { return CL_SUCCESS; } RUNTIME_EXIT clr-rocm-5.7.1/hipamd/src/hip_activity.cpp000066400000000000000000000024111450307266000204740ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "platform/activity.hpp" #include extern "C" const char* hipGetCmdName(unsigned op) { return getOclCommandKindString(static_cast(op)); }clr-rocm-5.7.1/hipamd/src/hip_code_object.cpp000066400000000000000000000711711450307266000211110ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "hip_code_object.hpp" #include "amd_hsa_elf.hpp" #include #include #include "hip/hip_runtime_api.h" #include "hip/hip_runtime.h" #include "hip_internal.hpp" #include "platform/program.hpp" #include hipError_t ihipFree(void* ptr); // forward declaration of methods required for managed variables hipError_t ihipMallocManaged(void** ptr, size_t size, unsigned int align = 0); namespace { size_t constexpr strLiteralLength(char const* str) { return *str ? 1 + strLiteralLength(str + 1) : 0; } constexpr char const* CLANG_OFFLOAD_BUNDLER_MAGIC_STR = "__CLANG_OFFLOAD_BUNDLE__"; constexpr char const* OFFLOAD_KIND_HIP = "hip"; constexpr char const* OFFLOAD_KIND_HIPV4 = "hipv4"; constexpr char const* OFFLOAD_KIND_HCC = "hcc"; constexpr char const* AMDGCN_TARGET_TRIPLE = "amdgcn-amd-amdhsa-"; // ClangOFFLOADBundle info. static constexpr size_t bundle_magic_string_size = strLiteralLength(CLANG_OFFLOAD_BUNDLER_MAGIC_STR); // Clang Offload bundler description & Header. struct __ClangOffloadBundleInfo { uint64_t offset; uint64_t size; uint64_t bundleEntryIdSize; const char bundleEntryId[1]; }; struct __ClangOffloadBundleHeader { const char magic[bundle_magic_string_size - 1]; uint64_t numOfCodeObjects; __ClangOffloadBundleInfo desc[1]; }; } // namespace namespace hip { bool CodeObject::IsClangOffloadMagicBundle(const void* data) { std::string magic(reinterpret_cast(data), bundle_magic_string_size); return magic.compare(CLANG_OFFLOAD_BUNDLER_MAGIC_STR) ? false : true; } uint64_t CodeObject::ElfSize(const void* emi) { return amd::Elf::getElfSize(emi); } static bool getProcName(uint32_t EFlags, std::string& proc_name, bool& xnackSupported, bool& sramEccSupported) { switch (EFlags & EF_AMDGPU_MACH) { case EF_AMDGPU_MACH_AMDGCN_GFX700: xnackSupported = false; sramEccSupported = false; proc_name = "gfx700"; break; case EF_AMDGPU_MACH_AMDGCN_GFX701: xnackSupported = false; sramEccSupported = false; proc_name = "gfx701"; break; case EF_AMDGPU_MACH_AMDGCN_GFX702: xnackSupported = false; sramEccSupported = false; proc_name = "gfx702"; break; case EF_AMDGPU_MACH_AMDGCN_GFX703: xnackSupported = false; sramEccSupported = false; proc_name = "gfx703"; break; case EF_AMDGPU_MACH_AMDGCN_GFX704: xnackSupported = false; sramEccSupported = false; proc_name = "gfx704"; break; case EF_AMDGPU_MACH_AMDGCN_GFX705: xnackSupported = false; sramEccSupported = false; proc_name = "gfx705"; break; case EF_AMDGPU_MACH_AMDGCN_GFX801: xnackSupported = true; sramEccSupported = false; proc_name = "gfx801"; break; case EF_AMDGPU_MACH_AMDGCN_GFX802: xnackSupported = false; sramEccSupported = false; proc_name = "gfx802"; break; case EF_AMDGPU_MACH_AMDGCN_GFX803: xnackSupported = false; sramEccSupported = false; proc_name = "gfx803"; break; case EF_AMDGPU_MACH_AMDGCN_GFX805: xnackSupported = false; sramEccSupported = false; proc_name = "gfx805"; break; case EF_AMDGPU_MACH_AMDGCN_GFX810: xnackSupported = true; sramEccSupported = false; proc_name = "gfx810"; break; case EF_AMDGPU_MACH_AMDGCN_GFX900: xnackSupported = true; sramEccSupported = false; proc_name = "gfx900"; break; case EF_AMDGPU_MACH_AMDGCN_GFX902: xnackSupported = true; sramEccSupported = false; proc_name = "gfx902"; break; case EF_AMDGPU_MACH_AMDGCN_GFX904: xnackSupported = true; sramEccSupported = false; proc_name = "gfx904"; break; case EF_AMDGPU_MACH_AMDGCN_GFX906: xnackSupported = true; sramEccSupported = true; proc_name = "gfx906"; break; case EF_AMDGPU_MACH_AMDGCN_GFX908: xnackSupported = true; sramEccSupported = true; proc_name = "gfx908"; break; case 
EF_AMDGPU_MACH_AMDGCN_GFX909: xnackSupported = true; sramEccSupported = false; proc_name = "gfx909"; break; case EF_AMDGPU_MACH_AMDGCN_GFX90A: xnackSupported = true; sramEccSupported = true; proc_name = "gfx90a"; break; case EF_AMDGPU_MACH_AMDGCN_GFX90C: xnackSupported = true; sramEccSupported = false; proc_name = "gfx90c"; break; case EF_AMDGPU_MACH_AMDGCN_GFX940: xnackSupported = true; sramEccSupported = true; proc_name = "gfx940"; break; case EF_AMDGPU_MACH_AMDGCN_GFX941: xnackSupported = true; sramEccSupported = true; proc_name = "gfx941"; break; case EF_AMDGPU_MACH_AMDGCN_GFX942: xnackSupported = true; sramEccSupported = true; proc_name = "gfx942"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1010: xnackSupported = true; sramEccSupported = false; proc_name = "gfx1010"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1011: xnackSupported = true; sramEccSupported = false; proc_name = "gfx1011"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1012: xnackSupported = true; sramEccSupported = false; proc_name = "gfx1012"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1013: xnackSupported = true; sramEccSupported = false; proc_name = "gfx1013"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1030: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1030"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1031: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1031"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1032: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1032"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1033: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1033"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1034: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1034"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1035: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1035"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1036: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1036"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1100: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1100"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1101: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1101"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1102: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1102"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1103: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1103"; break; default: return false; } return true; } static bool getTripleTargetIDFromCodeObject(const void* code_object, std::string& target_id) { if (!code_object) return false; const Elf64_Ehdr* ehdr = reinterpret_cast(code_object); if (ehdr->e_machine != EM_AMDGPU) return false; if (ehdr->e_ident[EI_OSABI] != ELFOSABI_AMDGPU_HSA) return false; bool isXnackSupported{false}, isSramEccSupported{false}; std::string proc_name; if (!getProcName(ehdr->e_flags, proc_name, isXnackSupported, isSramEccSupported)) return false; target_id = std::string(AMDGCN_TARGET_TRIPLE) + '-' + proc_name; switch (ehdr->e_ident[EI_ABIVERSION]) { case ELFABIVERSION_AMDGPU_HSA_V2: { LogPrintfInfo("[Code Object V2, target id:%s]", target_id.c_str()); return false; } case ELFABIVERSION_AMDGPU_HSA_V3: { LogPrintfInfo("[Code Object V3, target id:%s]", target_id.c_str()); if (isSramEccSupported) { if (ehdr->e_flags & EF_AMDGPU_FEATURE_SRAMECC_V3) target_id += ":sramecc+"; else target_id += ":sramecc-"; } if (isXnackSupported) { if (ehdr->e_flags & EF_AMDGPU_FEATURE_XNACK_V3) target_id += ":xnack+"; else target_id += ":xnack-"; } break; } case ELFABIVERSION_AMDGPU_HSA_V4: 
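      // Illustrative decode (values computed from amd_hsa_elf.hpp, not taken
      // from a real binary): a gfx90a code object with e_flags == 0xb3f gives
      //   (e_flags & EF_AMDGPU_FEATURE_SRAMECC_V4) == EF_AMDGPU_FEATURE_SRAMECC_OFF_V4 -> ":sramecc-"
      //   (e_flags & EF_AMDGPU_FEATURE_XNACK_V4)   == EF_AMDGPU_FEATURE_XNACK_ON_V4    -> ":xnack+"
      // so the code below produces "amdgcn-amd-amdhsa--gfx90a:sramecc-:xnack+".
      // V4 and V5 objects share this feature-bit encoding, hence the shared case.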
case ELFABIVERSION_AMDGPU_HSA_V5: { if (ehdr->e_ident[EI_ABIVERSION] == ELFABIVERSION_AMDGPU_HSA_V4) { LogPrintfInfo("[Code Object V4, target id:%s]", target_id.c_str()); } else { LogPrintfInfo("[Code Object V5, target id:%s]", target_id.c_str()); } unsigned co_sram_value = (ehdr->e_flags) & EF_AMDGPU_FEATURE_SRAMECC_V4; if (co_sram_value == EF_AMDGPU_FEATURE_SRAMECC_OFF_V4) target_id += ":sramecc-"; else if (co_sram_value == EF_AMDGPU_FEATURE_SRAMECC_ON_V4) target_id += ":sramecc+"; unsigned co_xnack_value = (ehdr->e_flags) & EF_AMDGPU_FEATURE_XNACK_V4; if (co_xnack_value == EF_AMDGPU_FEATURE_XNACK_OFF_V4) target_id += ":xnack-"; else if (co_xnack_value == EF_AMDGPU_FEATURE_XNACK_ON_V4) target_id += ":xnack+"; break; } default: { return false; } } return true; } // Consumes the string 'consume_' from the start of the given input // eg: input = amdgcn-amd-amdhsa--gfx908 and consume_ is amdgcn-amd-amdhsa-- // input will become gfx908. static bool consume(std::string& input, std::string consume_) { if (input.substr(0, consume_.size()) != consume_) { return false; } input = input.substr(consume_.size()); return true; } // Trims the string up to the given character; used to get the gpu name. // example: if input is gfx908:sramecc+ and the trim char is ':', // the return value is gfx908 and input becomes :sramecc+. static std::string trimName(std::string& input, char trim) { auto pos_ = input.find(trim); auto res = input; if (pos_ == std::string::npos) { input = ""; } else { res = input.substr(0, pos_); input = input.substr(pos_); } return res; } static char getFeatureValue(std::string& input, std::string feature) { char res = ' '; if (consume(input, std::move(feature))) { res = input[0]; input = input.substr(1); } return res; } static bool getTargetIDValue(std::string& input, std::string& processor, char& sramecc_value, char& xnack_value) { processor = trimName(input, ':'); sramecc_value = getFeatureValue(input, std::string(":sramecc")); if (sramecc_value != ' ' && sramecc_value != '+' && sramecc_value != '-') return false; xnack_value = getFeatureValue(input, std::string(":xnack")); if (xnack_value != ' ' && xnack_value != '+' && xnack_value != '-') return false; return true; } static bool getTripleTargetID(std::string bundled_co_entry_id, const void* code_object, std::string& co_triple_target_id) { std::string offload_kind = trimName(bundled_co_entry_id, '-'); if (offload_kind != OFFLOAD_KIND_HIPV4 && offload_kind != OFFLOAD_KIND_HIP && offload_kind != OFFLOAD_KIND_HCC) return false; if (offload_kind != OFFLOAD_KIND_HIPV4) return getTripleTargetIDFromCodeObject(code_object, co_triple_target_id); // For code object V4 onwards the bundled code object entry ID correctly // specifies the target triple.
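  // For example (illustrative), a bundle entry id of
  //   "hipv4-amdgcn-amd-amdhsa--gfx90a:xnack+"
  // is split by trimName() above into the offload kind "hipv4", leaving
  // "-amdgcn-amd-amdhsa--gfx90a:xnack+"; the substr(1) below then strips the
  // leading '-' to give the triple target id "amdgcn-amd-amdhsa--gfx90a:xnack+".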
co_triple_target_id = bundled_co_entry_id.substr(1); return true; } static bool isCodeObjectCompatibleWithDevice(std::string co_triple_target_id, std::string agent_triple_target_id) { // Primitive Check if (co_triple_target_id == agent_triple_target_id) return true; // Parse code object triple target id if (!consume(co_triple_target_id, std::string(AMDGCN_TARGET_TRIPLE) + '-')) { return false; } std::string co_processor; char co_sram_ecc, co_xnack; if (!getTargetIDValue(co_triple_target_id, co_processor, co_sram_ecc, co_xnack)) { return false; } if (!co_triple_target_id.empty()) return false; // Parse agent isa triple target id if (!consume(agent_triple_target_id, std::string(AMDGCN_TARGET_TRIPLE) + '-')) { return false; } std::string agent_isa_processor; char isa_sram_ecc, isa_xnack; if (!getTargetIDValue(agent_triple_target_id, agent_isa_processor, isa_sram_ecc, isa_xnack)) { return false; } if (!agent_triple_target_id.empty()) return false; // Check for compatibility if (agent_isa_processor != co_processor) return false; if (co_sram_ecc != ' ') { if (co_sram_ecc != isa_sram_ecc) return false; } if (co_xnack != ' ') { if (co_xnack != isa_xnack) return false; } return true; } // This will be moved to COMGR eventually hipError_t CodeObject::ExtractCodeObjectFromFile( amd::Os::FileDesc fdesc, size_t fsize, const void** image, const std::vector& device_names, std::vector>& code_objs) { hipError_t hip_error = hipSuccess; if (fdesc < 0) { return hipErrorFileNotFound; } // Map the file to memory, with offset 0. // file will be unmapped in ModuleUnload // const void* image = nullptr; if (!amd::Os::MemoryMapFileDesc(fdesc, fsize, 0, image)) { return hipErrorInvalidValue; } // retrieve code_objs{binary_image, binary_size} for devices hip_error = extractCodeObjectFromFatBinary(*image, device_names, code_objs); return hip_error; } // This will be moved to COMGR eventually hipError_t CodeObject::ExtractCodeObjectFromMemory( const void* data, const std::vector& device_names, std::vector>& code_objs, std::string& uri) { // Get the URI from memory if (!amd::Os::GetURIFromMemory(data, 0, uri)) { return hipErrorInvalidValue; } return extractCodeObjectFromFatBinary(data, device_names, code_objs); } // This will be moved to COMGR eventually hipError_t CodeObject::extractCodeObjectFromFatBinary( const void* data, const std::vector& agent_triple_target_ids, std::vector>& code_objs) { std::string magic((const char*)data, bundle_magic_string_size); if (magic.compare(CLANG_OFFLOAD_BUNDLER_MAGIC_STR)) { return hipErrorInvalidKernelFile; } // Initialize Code objects code_objs.reserve(agent_triple_target_ids.size()); for (size_t i = 0; i < agent_triple_target_ids.size(); i++) { code_objs.push_back(std::make_pair(nullptr, 0)); } const auto obheader = reinterpret_cast(data); const auto* desc = &obheader->desc[0]; size_t num_code_objs = code_objs.size(); for (uint64_t i = 0; i < obheader->numOfCodeObjects; ++i, desc = reinterpret_cast( reinterpret_cast(&desc->bundleEntryId[0]) + desc->bundleEntryIdSize)) { const void* image = reinterpret_cast(reinterpret_cast(obheader) + desc->offset); const size_t image_size = desc->size; if (num_code_objs == 0) break; std::string bundleEntryId{desc->bundleEntryId, desc->bundleEntryIdSize}; std::string co_triple_target_id; if (!getTripleTargetID(bundleEntryId, image, co_triple_target_id)) continue; for (size_t dev = 0; dev < agent_triple_target_ids.size(); ++dev) { if (code_objs[dev].first) continue; if (isCodeObjectCompatibleWithDevice(co_triple_target_id, 
agent_triple_target_ids[dev])) { code_objs[dev] = std::make_pair(image, image_size); --num_code_objs; } } } if (num_code_objs == 0) { return hipSuccess; } else { LogPrintfError("%s", "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"); LogPrintfError("%s", " Devices:"); for (size_t i = 0; i < agent_triple_target_ids.size(); i++) { LogPrintfError(" %s - [%s]", agent_triple_target_ids[i].c_str(), ((code_objs[i].first) ? "Found" : "Not Found")); } const auto obheader = reinterpret_cast(data); const auto* desc = &obheader->desc[0]; LogPrintfError("%s", " Bundled Code Objects:"); for (uint64_t i = 0; i < obheader->numOfCodeObjects; ++i, desc = reinterpret_cast( reinterpret_cast(&desc->bundleEntryId[0]) + desc->bundleEntryIdSize)) { std::string bundleEntryId{desc->bundleEntryId, desc->bundleEntryIdSize}; const void* image = reinterpret_cast(reinterpret_cast(obheader) + desc->offset); std::string co_triple_target_id; bool valid_co = getTripleTargetID(bundleEntryId, image, co_triple_target_id); if (valid_co) { LogPrintfError(" %s - [code object targetID is %s]", bundleEntryId.c_str(), co_triple_target_id.c_str()); } else { LogPrintfError(" %s - [Unsupported]", bundleEntryId.c_str()); } } LogPrintfError("hipErrorNoBinaryForGpu: Unable to find code object for all current devices! - %d",hipErrorNoBinaryForGpu); return hipErrorNoBinaryForGpu; } } hipError_t DynCO::loadCodeObject(const char* fname, const void* image) { amd::ScopedLock lock(dclock_); // Number of devices = 1 in dynamic code object fb_info_ = new FatBinaryInfo(fname, image); std::vector devices = {g_devices[ihipGetDevice()]}; IHIP_RETURN_ONFAIL(fb_info_->ExtractFatBinary(devices)); // No Lazy loading for DynCO IHIP_RETURN_ONFAIL(fb_info_->BuildProgram(ihipGetDevice())); // Define Global variables IHIP_RETURN_ONFAIL(populateDynGlobalVars()); // Define Global functions IHIP_RETURN_ONFAIL(populateDynGlobalFuncs()); return hipSuccess; } // Dynamic Code Object DynCO::~DynCO() { amd::ScopedLock lock(dclock_); for (auto& elem : vars_) { if (elem.second->getVarKind() == Var::DVK_Managed) { hipError_t err = ihipFree(elem.second->getManagedVarPtr()); assert(err == hipSuccess); } delete elem.second; } vars_.clear(); for (auto& elem : functions_) { delete elem.second; } functions_.clear(); delete fb_info_; } hipError_t DynCO::getDeviceVar(DeviceVar** dvar, std::string var_name) { amd::ScopedLock lock(dclock_); CheckDeviceIdMatch(); auto it = vars_.find(var_name); if (it == vars_.end()) { LogPrintfError("Cannot find the Var: %s ", var_name.c_str()); return hipErrorNotFound; } hipError_t err = it->second->getDeviceVar(dvar, device_id_, module()); return err; } hipError_t DynCO::getDynFunc(hipFunction_t* hfunc, std::string func_name) { amd::ScopedLock lock(dclock_); CheckDeviceIdMatch(); if (hfunc == nullptr) { return hipErrorInvalidValue; } auto it = functions_.find(func_name); if (it == functions_.end()) { LogPrintfError("Cannot find the function: %s ", func_name.c_str()); return hipErrorNotFound; } /* See if this could be solved */ return it->second->getDynFunc(hfunc, module()); } hipError_t DynCO::initDynManagedVars(const std::string& managedVar) { amd::ScopedLock lock(dclock_); DeviceVar* dvar; void* pointer = nullptr; hipError_t status = hipSuccess; // To get size of the managed variable status = getDeviceVar(&dvar, managedVar + ".managed"); if (status != hipSuccess) { ClPrint(amd::LOG_ERROR, amd::LOG_API, "Status %d, failed to get .managed device variable:%s", status, managedVar.c_str()); return status; } // Allocate 
managed memory for these symbols status = ihipMallocManaged(&pointer, dvar->size()); guarantee(status == hipSuccess, "Status %d, failed to allocate managed memory", status); // update as managed variable and set the managed memory pointer and size auto it = vars_.find(managedVar); it->second->setManagedVarInfo(pointer, dvar->size()); // copy the initial value of the managed variable to the allocated managed memory hip::Stream* stream = hip::getNullStream(); if (stream != nullptr) { status = ihipMemcpy(pointer, reinterpret_cast<address>
(dvar->device_ptr()), dvar->size(), hipMemcpyDeviceToDevice, *stream); if (status != hipSuccess) { ClPrint(amd::LOG_ERROR, amd::LOG_API, "Status %d, failed to copy device ptr:%s", status, managedVar.c_str()); return status; } } else { ClPrint(amd::LOG_ERROR, amd::LOG_API, "Host Queue is NULL"); return hipErrorInvalidResourceHandle; } // Get device ptr to initialize with managed memory pointer status = getDeviceVar(&dvar, managedVar); if (status != hipSuccess) { ClPrint(amd::LOG_ERROR, amd::LOG_API, "Status %d, failed to get managed device variable:%s", status, managedVar.c_str()); return status; } // copy managed memory pointer to the managed device variable status = ihipMemcpy(reinterpret_cast<address>
(dvar->device_ptr()), &pointer, dvar->size(), hipMemcpyHostToDevice, *stream); if (status != hipSuccess) { ClPrint(amd::LOG_ERROR, amd::LOG_API, "Status %d, failed to copy device ptr:%s", status, managedVar.c_str()); return status; } return status; } hipError_t DynCO::populateDynGlobalVars() { amd::ScopedLock lock(dclock_); hipError_t err = hipSuccess; std::vector var_names; std::string managedVarExt = ".managed"; // For Dynamic Modules there is only one hipFatBinaryDevInfo_ device::Program* dev_program = fb_info_->GetProgram(ihipGetDevice()) ->getDeviceProgram(*hip::getCurrentDevice()->devices()[0]); if (!dev_program->getGlobalVarFromCodeObj(&var_names)) { LogPrintfError("Could not get Global vars from Code Obj for Module: 0x%x \n", module()); return hipErrorSharedObjectSymbolNotFound; } for (auto& elem : var_names) { vars_.insert( std::make_pair(elem, new Var(elem, Var::DeviceVarKind::DVK_Variable, 0, 0, 0, nullptr))); } for (auto& elem : var_names) { if (elem.find(managedVarExt) != std::string::npos) { std::string managedVar = elem; managedVar.erase(managedVar.length() - managedVarExt.length(), managedVarExt.length()); err = initDynManagedVars(managedVar); } } return err; } hipError_t DynCO::populateDynGlobalFuncs() { amd::ScopedLock lock(dclock_); std::vector func_names; device::Program* dev_program = fb_info_->GetProgram(ihipGetDevice()) ->getDeviceProgram(*hip::getCurrentDevice()->devices()[0]); // Get all the global func names from COMGR if (!dev_program->getGlobalFuncFromCodeObj(&func_names)) { LogPrintfError("Could not get Global Funcs from Code Obj for Module: 0x%x \n", module()); return hipErrorSharedObjectSymbolNotFound; } for (auto& elem : func_names) { functions_.insert(std::make_pair(elem, new Function(elem))); } return hipSuccess; } // Static Code Object StatCO::StatCO() {} StatCO::~StatCO() { amd::ScopedLock lock(sclock_); for (auto& elem : functions_) { delete elem.second; } functions_.clear(); for (auto& elem : vars_) { delete elem.second; } vars_.clear(); } hipError_t StatCO::digestFatBinary(const void* data, FatBinaryInfo*& programs) { amd::ScopedLock lock(sclock_); if (programs != nullptr) { return hipSuccess; } // Create a new fat binary object and extract the fat binary for all devices. 
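  // Illustrative call flow (a sketch based on this file; names outside it are
  // assumptions): the compiler-generated module-registration stub invokes
  // __hipRegisterFatBinary(data), which reaches StatCO::addFatBinary(data,
  // initialized) below; for an initialized runtime that in turn calls this
  // digestFatBinary(data, modules_[data]), so that one code object per device
  // in g_devices is extracted via FatBinaryInfo::ExtractFatBinary().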
programs = new FatBinaryInfo(nullptr, data); IHIP_RETURN_ONFAIL(programs->ExtractFatBinary(g_devices)); return hipSuccess; } FatBinaryInfo** StatCO::addFatBinary(const void* data, bool initialized) { amd::ScopedLock lock(sclock_); if (initialized) { hipError_t err = digestFatBinary(data, modules_[data]); assert(err == hipSuccess); } return &modules_[data]; } hipError_t StatCO::removeFatBinary(FatBinaryInfo** module) { amd::ScopedLock lock(sclock_); auto vit = vars_.begin(); while (vit != vars_.end()) { if (vit->second->moduleInfo() == module) { delete vit->second; vit = vars_.erase(vit); } else { ++vit; } } auto it = managedVars_.begin(); while (it != managedVars_.end()) { if ((*it)->moduleInfo() == module) { for (auto dev : g_devices) { DeviceVar* dvar = nullptr; IHIP_RETURN_ONFAIL((*it)->getStatDeviceVar(&dvar, dev->deviceId())); // free also deletes the device ptr hipError_t err = ihipFree(dvar->device_ptr()); assert(err == hipSuccess); } it = managedVars_.erase(it); } else { ++it; } } auto fit = functions_.begin(); while (fit != functions_.end()) { if (fit->second->moduleInfo() == module) { delete fit->second; fit = functions_.erase(fit); } else { ++fit; } } auto mit = modules_.begin(); while (mit != modules_.end()) { if (&mit->second == module) { delete mit->second; mit = modules_.erase(mit); } else { ++mit; } } return hipSuccess; } hipError_t StatCO::registerStatFunction(const void* hostFunction, Function* func) { amd::ScopedLock lock(sclock_); if (functions_.find(hostFunction) != functions_.end()) { DevLogPrintfError("hostFunctionPtr: 0x%x already exists", hostFunction); } functions_.insert(std::make_pair(hostFunction, func)); return hipSuccess; } const char* StatCO::getStatFuncName(const void* hostFunction) { amd::ScopedLock lock(sclock_); const auto it = functions_.find(hostFunction); if (it == functions_.end()) { return nullptr; } return it->second->name().c_str(); } hipError_t StatCO::getStatFunc(hipFunction_t* hfunc, const void* hostFunction, int deviceId) { amd::ScopedLock lock(sclock_); const auto it = functions_.find(hostFunction); if (it == functions_.end()) { return hipErrorInvalidSymbol; } return it->second->getStatFunc(hfunc, deviceId); } hipError_t StatCO::getStatFuncAttr(hipFuncAttributes* func_attr, const void* hostFunction, int deviceId) { amd::ScopedLock lock(sclock_); const auto it = functions_.find(hostFunction); if (it == functions_.end()) { return hipErrorInvalidSymbol; } return it->second->getStatFuncAttr(func_attr, deviceId); } hipError_t StatCO::registerStatGlobalVar(const void* hostVar, Var* var) { amd::ScopedLock lock(sclock_); if (vars_.find(hostVar) != vars_.end()) { return hipErrorInvalidSymbol; } vars_.insert(std::make_pair(hostVar, var)); return hipSuccess; } hipError_t StatCO::getStatGlobalVar(const void* hostVar, int deviceId, hipDeviceptr_t* dev_ptr, size_t* size_ptr) { amd::ScopedLock lock(sclock_); const auto it = vars_.find(hostVar); if (it == vars_.end()) { return hipErrorInvalidSymbol; } DeviceVar* dvar = nullptr; IHIP_RETURN_ONFAIL(it->second->getStatDeviceVar(&dvar, deviceId)); *dev_ptr = dvar->device_ptr(); *size_ptr = dvar->size(); return hipSuccess; } hipError_t StatCO::registerStatManagedVar(Var* var) { managedVars_.emplace_back(var); return hipSuccess; } hipError_t StatCO::initStatManagedVarDevicePtr(int deviceId) { amd::ScopedLock lock(sclock_); hipError_t err = hipSuccess; if (managedVarsDevicePtrInitalized_.find(deviceId) == managedVarsDevicePtrInitalized_.end() || !managedVarsDevicePtrInitalized_[deviceId]) { for (auto var : 
managedVars_) { DeviceVar* dvar = nullptr; IHIP_RETURN_ONFAIL(var->getStatDeviceVar(&dvar, deviceId)); hip::Stream* stream = g_devices.at(deviceId)->NullStream(); if (stream != nullptr) { err = ihipMemcpy(reinterpret_cast<address>
(dvar->device_ptr()), var->getManagedVarPtr(), dvar->size(), hipMemcpyHostToDevice, *stream); } else { ClPrint(amd::LOG_ERROR, amd::LOG_API, "Host Queue is NULL"); return hipErrorInvalidResourceHandle; } } managedVarsDevicePtrInitalized_[deviceId] = true; } return err; } }; // namespace hip clr-rocm-5.7.1/hipamd/src/hip_code_object.hpp000066400000000000000000000150601450307266000211110ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef HIP_CODE_OBJECT_HPP #define HIP_CODE_OBJECT_HPP #include "hip_global.hpp" #include #include #include "hip/hip_runtime.h" #include "hip/hip_runtime_api.h" #include "hip_internal.hpp" #include "device/device.hpp" #include "platform/program.hpp" //Forward Declaration for friend usage class PlatformState; namespace hip { //Code Object base class class CodeObject { public: virtual ~CodeObject() {} // Functions to add_dev_prog and build static hipError_t add_program(int deviceId, hipModule_t hmod, const void* binary_ptr, size_t binary_size); static hipError_t build_module(hipModule_t hmod, const std::vector& devices); // Given an file desc and file size, extracts to code object for corresponding devices, // return code_objs{binary_ptr, binary_size}, which could be used to determine foffset static hipError_t ExtractCodeObjectFromFile(amd::Os::FileDesc fdesc, size_t fsize, const void ** image, const std::vector& device_names, std::vector>& code_objs); // Given an ptr to memory, extracts to code object for corresponding devices, // returns code_objs{binary_ptr, binary_size} and uniform resource indicator static hipError_t ExtractCodeObjectFromMemory(const void* data, const std::vector& device_names, std::vector>& code_objs, std::string& uri); static uint64_t ElfSize(const void* emi); static bool IsClangOffloadMagicBundle(const void* data); protected: //Given an ptr to image or file, extracts to code object //for corresponding devices static hipError_t extractCodeObjectFromFatBinary(const void*, const std::vector&, std::vector>&); CodeObject() {} private: friend const std::vector& modules(); }; //Dynamic Code Object class DynCO : public CodeObject { amd::Monitor dclock_{"Guards Dynamic Code object", true}; public: DynCO() : device_id_(ihipGetDevice()), fb_info_(nullptr) {} virtual ~DynCO(); //LoadsCodeObject and its data hipError_t loadCodeObject(const char* fname, const void* image=nullptr); hipModule_t module() const { return fb_info_->Module(ihipGetDevice()); }; //Gets GlobalVar/Functions from a 
dynamically loaded code object hipError_t getDynFunc(hipFunction_t* hfunc, std::string func_name); hipError_t getDeviceVar(DeviceVar** dvar, std::string var_name); hipError_t getManagedVarPointer(std::string name, void** pointer, size_t* size_ptr) const { auto it = vars_.find(name); if (it != vars_.end() && it->second->getVarKind() == Var::DVK_Managed) { *pointer = it->second->getManagedVarPtr(); *size_ptr = it->second->getSize(); } return hipSuccess; } // Device ID Check to check if module is launched in the same device it was loaded. inline void CheckDeviceIdMatch() const { guarantee(device_id_ == ihipGetDevice(), "Device mismatch from where this module is loaded," "device_id: %d ihipGetDevice:%d \n", device_id_, ihipGetDevice()); } private: int device_id_; FatBinaryInfo* fb_info_; //Maps for vars/funcs, could be keyed in with std::string name std::unordered_map functions_; std::unordered_map vars_; //Populate Global Vars/Funcs from an code object(@ module_load) hipError_t populateDynGlobalFuncs(); hipError_t populateDynGlobalVars(); hipError_t initDynManagedVars(const std::string& managedVar); }; //Static Code Object class StatCO: public CodeObject { amd::Monitor sclock_{"Guards Static Code object", true}; public: StatCO(); virtual ~StatCO(); //Add/Remove/Digest Fat Binaries passed to us from "__hipRegisterFatBinary" FatBinaryInfo** addFatBinary(const void* data, bool initialized); hipError_t removeFatBinary(FatBinaryInfo** module); hipError_t digestFatBinary(const void* data, FatBinaryInfo*& programs); //Register vars/funcs given to use from __hipRegister[Var/Func/ManagedVar] hipError_t registerStatFunction(const void* hostFunction, Function* func); hipError_t registerStatGlobalVar(const void* hostVar, Var* var); hipError_t registerStatManagedVar(Var *var); //Retrive Vars/Funcs for a given hostSidePtr(const void*), unless stated otherwise. const char* getStatFuncName(const void* hostFunction); hipError_t getStatFunc(hipFunction_t* hfunc, const void* hostFunction, int deviceId); hipError_t getStatFuncAttr(hipFuncAttributes* func_attr, const void* hostFunction, int deviceId); hipError_t getStatGlobalVar(const void* hostVar, int deviceId, hipDeviceptr_t* dev_ptr, size_t* size_ptr); //Managed variable is a defined symbol in code object //pointer to the alocated managed memory has to be copied to the address of symbol hipError_t initStatManagedVarDevicePtr(int deviceId); private: friend class ::PlatformState; //Populated during __hipRegisterFatBinary std::unordered_map modules_; //Populated during __hipRegisterFuncs std::unordered_map functions_; //Populated during __hipRegisterVars std::unordered_map vars_; //Populated during __hipRegisterManagedVar std::vector managedVars_; std::unordered_map managedVarsDevicePtrInitalized_; }; }; // namespace hip #endif /* HIP_CODE_OBJECT_HPP */ clr-rocm-5.7.1/hipamd/src/hip_context.cpp000066400000000000000000000251461450307266000203360ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include "hip_internal.hpp" #include "hip_platform.hpp" #include "platform/runtime.hpp" #include "utils/flags.hpp" #include "utils/versions.hpp" std::vector g_devices; std::once_flag g_ihipInitialized; namespace hip { thread_local TlsAggregator tls; amd::Context* host_context = nullptr; //init() is only to be called from the HIP_INIT macro only once void init(bool* status) { amd::IS_HIP = true; GPU_NUM_MEM_DEPENDENCY = 0; #if DISABLE_DIRECT_DISPATCH constexpr bool kDirectDispatch = false; #else constexpr bool kDirectDispatch = IS_LINUX; #endif AMD_DIRECT_DISPATCH = flagIsDefault(AMD_DIRECT_DISPATCH) ? kDirectDispatch : AMD_DIRECT_DISPATCH; if (!amd::Runtime::init()) { *status = false; return; } ClPrint(amd::LOG_INFO, amd::LOG_INIT, "Direct Dispatch: %d", AMD_DIRECT_DISPATCH); const std::vector& devices = amd::Device::getDevices(CL_DEVICE_TYPE_GPU, false); for (unsigned int i=0; i device(1, devices[i]); amd::Context* context = new amd::Context(device, amd::Context::Info()); if (!context) { *status = false; return; } // Enable active wait on the device by default devices[i]->SetActiveWait(true); if (context && CL_SUCCESS != context->create(nullptr)) { context->release(); } else { auto device = new Device(context, i); if ((device == nullptr) || !device->Create()) { *status = false; return; } g_devices.push_back(device); } } amd::Context* hContext = new amd::Context(devices, amd::Context::Info()); if (!hContext) { *status = false; return; } if (CL_SUCCESS != hContext->create(nullptr)) { hContext->release(); } host_context = hContext; PlatformState::instance().init(); *status = true; return; } Device* getCurrentDevice() { return tls.device_; } void setCurrentDevice(unsigned int index) { assert(indexdevices()[0]->getPreferredNumaNode(); amd::Os::setPreferredNumaNode(preferredNumaNode); } hip::Stream* getStream(hipStream_t stream) { if (stream == nullptr) { return getNullStream(); } else { hip::Stream* hip_stream = reinterpret_cast(stream); if (!(hip_stream->Flags() & hipStreamNonBlocking)) { constexpr bool WaitNullStreamOnly = true; iHipWaitActiveStreams(hip_stream, WaitNullStreamOnly); } return hip_stream; } } // ================================================================================================ hip::Stream* getNullStream(amd::Context& ctx) { for (auto& it : g_devices) { if (it->asContext() == &ctx) { return it->NullStream(); } } // If it's a pure SVM allocation with system memory access, then it shouldn't matter which device // runtime selects by default if 
(hip::host_context == &ctx) { // Return current... return getNullStream(); } return nullptr; } // ================================================================================================ int getDeviceID(amd::Context& ctx) { for (auto& it : g_devices) { if (it->asContext() == &ctx) { return it->deviceId(); } } return -1; } // ================================================================================================ hip::Stream* getNullStream() { Device* device = getCurrentDevice(); return device ? device->NullStream() : nullptr; } }; using namespace hip; hipError_t hipInit(unsigned int flags) { HIP_INIT_API(hipInit, flags); if (flags != 0) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(hipSuccess); } hipError_t hipCtxCreate(hipCtx_t *ctx, unsigned int flags, hipDevice_t device) { HIP_INIT_API(hipCtxCreate, ctx, flags, device); if (static_cast(device) >= g_devices.size()) { HIP_RETURN(hipErrorInvalidValue); } *ctx = reinterpret_cast(g_devices[device]); // Increment ref count for device primary context g_devices[device]->retain(); tls.ctxt_stack_.push(g_devices[device]); HIP_RETURN(hipSuccess); } hipError_t hipCtxSetCurrent(hipCtx_t ctx) { HIP_INIT_API(hipCtxSetCurrent, ctx); if (ctx == nullptr) { if(!tls.ctxt_stack_.empty()) { tls.ctxt_stack_.pop(); } } else { hip::tls.device_ = reinterpret_cast(ctx); if(!tls.ctxt_stack_.empty()) { tls.ctxt_stack_.pop(); } tls.ctxt_stack_.push(hip::getCurrentDevice()); } HIP_RETURN(hipSuccess); } hipError_t hipCtxGetCurrent(hipCtx_t* ctx) { HIP_INIT_API(hipCtxGetCurrent, ctx); *ctx = reinterpret_cast(hip::getCurrentDevice()); HIP_RETURN(hipSuccess); } hipError_t hipCtxGetSharedMemConfig(hipSharedMemConfig* pConfig) { HIP_INIT_API(hipCtxGetSharedMemConfig, pConfig); *pConfig = hipSharedMemBankSizeFourByte; HIP_RETURN(hipSuccess); } hipError_t hipRuntimeGetVersion(int *runtimeVersion) { HIP_INIT_API_NO_RETURN(hipRuntimeGetVersion, runtimeVersion); if (!runtimeVersion) { HIP_RETURN(hipErrorInvalidValue); } // HIP_VERSION = HIP_VERSION_MAJOR*100 + HIP_MINOR_VERSION *runtimeVersion = HIP_VERSION; HIP_RETURN(hipSuccess); } hipError_t hipCtxDestroy(hipCtx_t ctx) { HIP_INIT_API(hipCtxDestroy, ctx); hip::Device* dev = reinterpret_cast(ctx); if (dev == nullptr) { HIP_RETURN(hipErrorInvalidValue); } // Need to remove the ctx of calling thread if its the top one if (!tls.ctxt_stack_.empty() && tls.ctxt_stack_.top() == dev) { tls.ctxt_stack_.pop(); } // Remove context from global context list for (unsigned int i = 0; i < g_devices.size(); i++) { if (g_devices[i] == dev) { // Decrement ref count for device primary context dev->release(); } } HIP_RETURN(hipSuccess); } hipError_t hipCtxPopCurrent(hipCtx_t* ctx) { HIP_INIT_API(hipCtxPopCurrent, ctx); hip::Device** dev = reinterpret_cast(ctx); if (!tls.ctxt_stack_.empty()) { if (dev != nullptr) { *dev = tls.ctxt_stack_.top(); } tls.ctxt_stack_.pop(); } else { DevLogError("Context Stack empty \n"); HIP_RETURN(hipErrorInvalidContext); } HIP_RETURN(hipSuccess); } hipError_t hipCtxPushCurrent(hipCtx_t ctx) { HIP_INIT_API(hipCtxPushCurrent, ctx); hip::Device* dev = reinterpret_cast(ctx); if (dev == nullptr) { HIP_RETURN(hipErrorInvalidContext); } hip::tls.device_ = dev; tls.ctxt_stack_.push(hip::getCurrentDevice()); HIP_RETURN(hipSuccess); } hipError_t hipDriverGetVersion(int* driverVersion) { HIP_INIT_API_NO_RETURN(hipDriverGetVersion, driverVersion); if (!driverVersion) { HIP_RETURN(hipErrorInvalidValue); } // HIP_VERSION = HIP_VERSION_MAJOR*100 + HIP_MINOR_VERSION *driverVersion = HIP_VERSION; 
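// [Editor's illustrative sketch - not part of the original source; kept inert with "#if 0".]
// Both hipRuntimeGetVersion and hipDriverGetVersion above report HIP_VERSION, so an
// application can call either; the exact packing of major/minor into the single
// integer is a HIP header implementation detail and should not be hard-coded.
#if 0
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
  int runtimeVersion = 0;
  int driverVersion = 0;
  if (hipRuntimeGetVersion(&runtimeVersion) == hipSuccess &&
      hipDriverGetVersion(&driverVersion) == hipSuccess) {
    // On this implementation both queries return the same HIP_VERSION value.
    std::printf("HIP runtime: %d, driver: %d\n", runtimeVersion, driverVersion);
  }
  return 0;
}
#endif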
HIP_RETURN(hipSuccess); } hipError_t hipCtxGetDevice(hipDevice_t* device) { HIP_INIT_API(hipCtxGetDevice, device); if (device != nullptr) { *device = hip::getCurrentDevice()->deviceId(); HIP_RETURN(hipSuccess); } else { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(hipErrorInvalidContext); } hipError_t hipCtxGetApiVersion(hipCtx_t ctx, int* apiVersion) { HIP_INIT_API(hipCtxGetApiVersion, apiVersion); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } hipError_t hipCtxGetCacheConfig(hipFuncCache_t* cacheConfig) { HIP_INIT_API(hipCtxGetCacheConfig, cacheConfig); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } hipError_t hipCtxSetCacheConfig(hipFuncCache_t cacheConfig) { HIP_INIT_API(hipCtxSetCacheConfig, cacheConfig); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } hipError_t hipCtxSetSharedMemConfig(hipSharedMemConfig config) { HIP_INIT_API(hipCtxSetSharedMemConfig, config); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } hipError_t hipCtxSynchronize(void) { HIP_INIT_API(hipCtxSynchronize, 1); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } hipError_t hipCtxGetFlags(unsigned int* flags) { HIP_INIT_API(hipCtxGetFlags, flags); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } hipError_t hipDevicePrimaryCtxGetState(hipDevice_t dev, unsigned int* flags, int* active) { HIP_INIT_API(hipDevicePrimaryCtxGetState, dev, flags, active); if (static_cast(dev) >= g_devices.size()) { HIP_RETURN(hipErrorInvalidDevice); } if (flags != nullptr) { *flags = 0; } if (active != nullptr) { *active = g_devices[dev]->GetActiveStatus() ? 1 : 0; } HIP_RETURN(hipSuccess); } hipError_t hipDevicePrimaryCtxRelease(hipDevice_t dev) { HIP_INIT_API(hipDevicePrimaryCtxRelease, dev); if (static_cast(dev) >= g_devices.size()) { HIP_RETURN(hipErrorInvalidDevice); } HIP_RETURN(hipSuccess); } hipError_t hipDevicePrimaryCtxRetain(hipCtx_t* pctx, hipDevice_t dev) { HIP_INIT_API(hipDevicePrimaryCtxRetain, pctx, dev); if (static_cast(dev) >= g_devices.size()) { HIP_RETURN(hipErrorInvalidDevice); } if (pctx == nullptr) { HIP_RETURN(hipErrorInvalidValue); } *pctx = reinterpret_cast(g_devices[dev]); HIP_RETURN(hipSuccess); } hipError_t hipDevicePrimaryCtxReset(hipDevice_t dev) { HIP_INIT_API(hipDevicePrimaryCtxReset, dev); HIP_RETURN(hipSuccess); } hipError_t hipDevicePrimaryCtxSetFlags(hipDevice_t dev, unsigned int flags) { HIP_INIT_API(hipDevicePrimaryCtxSetFlags, dev, flags); if (static_cast(dev) >= g_devices.size()) { HIP_RETURN(hipErrorInvalidDevice); } else { HIP_RETURN(hipErrorContextAlreadyInUse); } } clr-rocm-5.7.1/hipamd/src/hip_conversions.hpp000066400000000000000000000670261450307266000212320ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #include #include namespace hip { inline cl_channel_type getCLChannelType(const hipArray_Format hipFormat, const hipTextureReadMode hipReadMode) { if (hipReadMode == hipReadModeElementType) { switch (hipFormat) { case HIP_AD_FORMAT_UNSIGNED_INT8: return CL_UNSIGNED_INT8; case HIP_AD_FORMAT_SIGNED_INT8: return CL_SIGNED_INT8; case HIP_AD_FORMAT_UNSIGNED_INT16: return CL_UNSIGNED_INT16; case HIP_AD_FORMAT_SIGNED_INT16: return CL_SIGNED_INT16; case HIP_AD_FORMAT_UNSIGNED_INT32: return CL_UNSIGNED_INT32; case HIP_AD_FORMAT_SIGNED_INT32: return CL_SIGNED_INT32; case HIP_AD_FORMAT_HALF: return CL_HALF_FLOAT; case HIP_AD_FORMAT_FLOAT: return CL_FLOAT; } } else if (hipReadMode == hipReadModeNormalizedFloat) { switch (hipFormat) { case HIP_AD_FORMAT_UNSIGNED_INT8: return CL_UNORM_INT8; case HIP_AD_FORMAT_SIGNED_INT8: return CL_SNORM_INT8; case HIP_AD_FORMAT_UNSIGNED_INT16: return CL_UNORM_INT16; case HIP_AD_FORMAT_SIGNED_INT16: return CL_SNORM_INT16; case HIP_AD_FORMAT_UNSIGNED_INT32: return CL_UNSIGNED_INT32; case HIP_AD_FORMAT_SIGNED_INT32: return CL_SIGNED_INT32; case HIP_AD_FORMAT_HALF: return CL_HALF_FLOAT; case HIP_AD_FORMAT_FLOAT: return CL_FLOAT; } } //error scenario return {}; } inline cl_channel_order getCLChannelOrder(const unsigned int hipNumChannels, const int sRGB) { switch (hipNumChannels) { case 1: return CL_R; case 2: return CL_RG; case 4: return (sRGB == 1) ? CL_sRGBA : CL_RGBA; default: break; } //error scenario return {}; } inline cl_mem_object_type getCLMemObjectType(const unsigned int hipWidth, const unsigned int hipHeight, const unsigned int hipDepth, const unsigned int flags) { if ((flags & hipArrayLayered) == hipArrayLayered) { if ((hipWidth != 0) && (hipHeight == 0) && (hipDepth != 0)) { return CL_MEM_OBJECT_IMAGE1D_ARRAY; } else if ((hipWidth != 0) && (hipHeight != 0) && (hipDepth != 0)) { return CL_MEM_OBJECT_IMAGE2D_ARRAY; } } else { if ((hipWidth != 0) && (hipHeight == 0) && (hipDepth == 0)) { return CL_MEM_OBJECT_IMAGE1D; } else if ((hipWidth != 0) && (hipHeight != 0) && (hipDepth == 0)) { return CL_MEM_OBJECT_IMAGE2D; } else if ((hipWidth != 0) && (hipHeight != 0) && (hipDepth != 0)) { return CL_MEM_OBJECT_IMAGE3D; } } // error scenario. 
ShouldNotReachHere() return CL_MEM_OBJECT_ALLOCATION_FAILURE; } inline cl_addressing_mode getCLAddressingMode(const hipTextureAddressMode hipAddressMode) { switch (hipAddressMode) { case hipAddressModeWrap: return CL_ADDRESS_REPEAT; case hipAddressModeClamp: return CL_ADDRESS_CLAMP_TO_EDGE; case hipAddressModeMirror: return CL_ADDRESS_MIRRORED_REPEAT; case hipAddressModeBorder: return CL_ADDRESS_CLAMP; } //error scenario return {}; } inline cl_filter_mode getCLFilterMode(const hipTextureFilterMode hipFilterMode) { switch (hipFilterMode) { case hipFilterModePoint: return CL_FILTER_NEAREST; case hipFilterModeLinear: return CL_FILTER_LINEAR; } //error scenario return {}; } inline cl_mem_object_type getCLMemObjectType(const hipResourceType hipResType) { switch (hipResType) { case hipResourceTypeLinear: return CL_MEM_OBJECT_IMAGE1D_BUFFER; case hipResourceTypePitch2D: return CL_MEM_OBJECT_IMAGE2D; default: break; } //error scenario return {}; } inline hipArray_Format getCL2hipArrayFormat(const cl_channel_type type) { switch (type) { case CL_SNORM_INT8: case CL_SIGNED_INT8: return HIP_AD_FORMAT_SIGNED_INT8; case CL_UNSIGNED_INT16: return HIP_AD_FORMAT_UNSIGNED_INT16; case CL_SIGNED_INT16: return HIP_AD_FORMAT_SIGNED_INT16; case CL_SIGNED_INT32: return HIP_AD_FORMAT_SIGNED_INT32; case CL_UNSIGNED_INT32: return HIP_AD_FORMAT_UNSIGNED_INT32; case CL_FLOAT: return HIP_AD_FORMAT_FLOAT; case CL_UNSIGNED_INT8: case CL_UNORM_INT8: case CL_UNORM_INT_101010: default: return HIP_AD_FORMAT_UNSIGNED_INT8; } } inline size_t getElementSize(const hipArray_const_t array) { switch (array->Format) { case HIP_AD_FORMAT_UNSIGNED_INT8: case HIP_AD_FORMAT_SIGNED_INT8: return 1 * array->NumChannels; case HIP_AD_FORMAT_UNSIGNED_INT16: case HIP_AD_FORMAT_SIGNED_INT16: case HIP_AD_FORMAT_HALF: return 2 * array->NumChannels; case HIP_AD_FORMAT_UNSIGNED_INT32: case HIP_AD_FORMAT_SIGNED_INT32: case HIP_AD_FORMAT_FLOAT: return 4 * array->NumChannels; } //error scenario return {}; } inline hipChannelFormatDesc getChannelFormatDesc(int numChannels, hipArray_Format arrayFormat) { switch (arrayFormat) { case HIP_AD_FORMAT_UNSIGNED_INT8: switch (numChannels) { case 1: return {8, 0, 0, 0, hipChannelFormatKindUnsigned}; case 2: return {8, 8, 0, 0, hipChannelFormatKindUnsigned}; case 4: return {8, 8, 8, 8, hipChannelFormatKindUnsigned}; } case HIP_AD_FORMAT_SIGNED_INT8: switch (numChannels) { case 1: return {8, 0, 0, 0, hipChannelFormatKindSigned}; case 2: return {8, 8, 0, 0, hipChannelFormatKindSigned}; case 4: return {8, 8, 8, 8, hipChannelFormatKindSigned}; } case HIP_AD_FORMAT_UNSIGNED_INT16: switch (numChannels) { case 1: return {16, 0, 0, 0, hipChannelFormatKindUnsigned}; case 2: return {16, 16, 0, 0, hipChannelFormatKindUnsigned}; case 4: return {16, 16, 16, 16, hipChannelFormatKindUnsigned}; } case HIP_AD_FORMAT_SIGNED_INT16: switch (numChannels) { case 1: return {16, 0, 0, 0, hipChannelFormatKindSigned}; case 2: return {16, 16, 0, 0, hipChannelFormatKindSigned}; case 4: return {16, 16, 16, 16, hipChannelFormatKindSigned}; } case HIP_AD_FORMAT_UNSIGNED_INT32: switch (numChannels) { case 1: return {32, 0, 0, 0, hipChannelFormatKindUnsigned}; case 2: return {32, 32, 0, 0, hipChannelFormatKindUnsigned}; case 4: return {32, 32, 32, 32, hipChannelFormatKindUnsigned}; } case HIP_AD_FORMAT_SIGNED_INT32: switch (numChannels) { case 1: return {32, 0, 0, 0, hipChannelFormatKindSigned}; case 2: return {32, 32, 0, 0, hipChannelFormatKindSigned}; case 4: return {32, 32, 32, 32, hipChannelFormatKindSigned}; } case HIP_AD_FORMAT_HALF: 
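// [Editor's illustrative sketch - not part of the original source; kept inert with "#if 0".]
// getChannelFormatDesc reconstructs a hipChannelFormatDesc (per-channel bit widths
// x/y/z/w plus a kind) from a driver-style format and channel count. The public-API
// equivalent for a given element type is hipCreateChannelDesc<T>():
#if 0
#include <hip/hip_runtime.h>
#include <cassert>

int main() {
  // A 4-channel, 32-bit float descriptor, as used for float4 textures.
  hipChannelFormatDesc desc = hipCreateChannelDesc<float4>();
  assert(desc.x == 32 && desc.y == 32 && desc.z == 32 && desc.w == 32);
  assert(desc.f == hipChannelFormatKindFloat);
  return 0;
}
#endif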
switch (numChannels) { case 1: return {16, 0, 0, 0, hipChannelFormatKindFloat}; case 2: return {16, 16, 0, 0, hipChannelFormatKindFloat}; case 4: return {16, 16, 16, 16, hipChannelFormatKindFloat}; } case HIP_AD_FORMAT_FLOAT: switch (numChannels) { case 1: return {32, 0, 0, 0, hipChannelFormatKindFloat}; case 2: return {32, 32, 0, 0, hipChannelFormatKindFloat}; case 4: return {32, 32, 32, 32, hipChannelFormatKindFloat}; } } //error scenario return {}; } inline unsigned int getNumChannels(const hipChannelFormatDesc& desc) { return ((desc.x != 0) + (desc.y != 0) + (desc.z != 0) + (desc.w != 0)); } inline bool CheckArrayFormat(const hipChannelFormatDesc& desc) { if(desc.x == 0) { return false; } else { if(desc.y != 0 && desc.y != desc.x) { return false; } if(desc.z !=0 && desc.z != desc.x) { return false; } if(desc.w !=0 && desc.w != desc.x) { return false; } } // The bit channel description should not allow any channels after a zero channel if (desc.y == 0) { return !(desc.z > 0 || desc.w > 0); } else if (desc.z == 0) { return !(desc.w > 0); } return true; } inline hipArray_Format getArrayFormat(const hipChannelFormatDesc& desc) { switch (desc.f) { case hipChannelFormatKindUnsigned: switch (desc.x) { case 8: return HIP_AD_FORMAT_UNSIGNED_INT8; case 16: return HIP_AD_FORMAT_UNSIGNED_INT16; case 32: return HIP_AD_FORMAT_UNSIGNED_INT32; } case hipChannelFormatKindSigned: switch (desc.x) { case 8: return HIP_AD_FORMAT_SIGNED_INT8; case 16: return HIP_AD_FORMAT_SIGNED_INT16; case 32: return HIP_AD_FORMAT_SIGNED_INT32; } case hipChannelFormatKindFloat: switch (desc.x) { case 16: return HIP_AD_FORMAT_HALF; case 32: return HIP_AD_FORMAT_FLOAT; } default: break; } //error scenario return {}; } inline int getNumChannels(const hipResourceViewFormat hipFormat) { switch (hipFormat) { case hipResViewFormatUnsignedChar1: case hipResViewFormatSignedChar1: case hipResViewFormatUnsignedShort1: case hipResViewFormatSignedShort1: case hipResViewFormatUnsignedInt1: case hipResViewFormatSignedInt1: case hipResViewFormatHalf1: case hipResViewFormatFloat1: return 1; case hipResViewFormatUnsignedChar2: case hipResViewFormatSignedChar2: case hipResViewFormatUnsignedShort2: case hipResViewFormatSignedShort2: case hipResViewFormatUnsignedInt2: case hipResViewFormatSignedInt2: case hipResViewFormatHalf2: case hipResViewFormatFloat2: return 2; case hipResViewFormatUnsignedChar4: case hipResViewFormatSignedChar4: case hipResViewFormatUnsignedShort4: case hipResViewFormatSignedShort4: case hipResViewFormatUnsignedInt4: case hipResViewFormatSignedInt4: case hipResViewFormatHalf4: case hipResViewFormatFloat4: return 4; default: break; } //error scenario return {}; } inline hipArray_Format getArrayFormat(const hipResourceViewFormat hipFormat) { switch (hipFormat) { case hipResViewFormatUnsignedChar1: case hipResViewFormatUnsignedChar2: case hipResViewFormatUnsignedChar4: return HIP_AD_FORMAT_UNSIGNED_INT8; case hipResViewFormatSignedChar1: case hipResViewFormatSignedChar2: case hipResViewFormatSignedChar4: return HIP_AD_FORMAT_SIGNED_INT8; case hipResViewFormatUnsignedShort1: case hipResViewFormatUnsignedShort2: case hipResViewFormatUnsignedShort4: return HIP_AD_FORMAT_UNSIGNED_INT16; case hipResViewFormatSignedShort1: case hipResViewFormatSignedShort2: case hipResViewFormatSignedShort4: return HIP_AD_FORMAT_SIGNED_INT16; case hipResViewFormatUnsignedInt1: case hipResViewFormatUnsignedInt2: case hipResViewFormatUnsignedInt4: return HIP_AD_FORMAT_UNSIGNED_INT32; case hipResViewFormatSignedInt1: case hipResViewFormatSignedInt2: 
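// [Editor's illustrative sketch - not part of the original source; kept inert with "#if 0".]
// CheckArrayFormat above accepts a descriptor only if all non-zero channels share one
// bit width and no non-zero channel follows a zero one. A few hypothetical spot checks:
#if 0
#include <cassert>

int main() {
  assert(hip::CheckArrayFormat({32, 32, 0, 0, hipChannelFormatKindFloat}));   // 2 x 32-bit: valid
  assert(!hip::CheckArrayFormat({32, 16, 0, 0, hipChannelFormatKindFloat}));  // mixed widths: rejected
  assert(!hip::CheckArrayFormat({32, 0, 32, 0, hipChannelFormatKindFloat}));  // channel after a zero: rejected
  return 0;
}
#endif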
case hipResViewFormatSignedInt4: return HIP_AD_FORMAT_SIGNED_INT32; case hipResViewFormatHalf1: case hipResViewFormatHalf2: case hipResViewFormatHalf4: return HIP_AD_FORMAT_HALF; case hipResViewFormatFloat1: case hipResViewFormatFloat2: case hipResViewFormatFloat4: return HIP_AD_FORMAT_FLOAT; default: break; } //error scenario return {}; } inline hipResourceViewFormat getResourceViewFormat(const hipChannelFormatDesc& desc) { switch (desc.f) { case hipChannelFormatKindUnsigned: switch (getNumChannels(desc)) { case 1: switch (desc.x) { case 8: return hipResViewFormatUnsignedChar1; case 16: return hipResViewFormatUnsignedShort1; case 32: return hipResViewFormatUnsignedInt1; } case 2: switch (desc.x) { case 8: return hipResViewFormatUnsignedChar2; case 16: return hipResViewFormatUnsignedShort2; case 32: return hipResViewFormatUnsignedInt2; } case 4: switch (desc.x) { case 8: return hipResViewFormatUnsignedChar4; case 16: return hipResViewFormatUnsignedShort4; case 32: return hipResViewFormatUnsignedInt4; } } case hipChannelFormatKindSigned: switch (getNumChannels(desc)) { case 1: switch (desc.x) { case 8: return hipResViewFormatSignedChar1; case 16: return hipResViewFormatSignedShort1; case 32: return hipResViewFormatSignedInt1; } case 2: switch (desc.x) { case 8: return hipResViewFormatSignedChar2; case 16: return hipResViewFormatSignedShort2; case 32: return hipResViewFormatSignedInt2; } case 4: switch (desc.x) { case 8: return hipResViewFormatSignedChar4; case 16: return hipResViewFormatSignedShort4; case 32: return hipResViewFormatSignedInt4; } } case hipChannelFormatKindFloat: switch (getNumChannels(desc)) { case 1: switch (desc.x) { case 16: return hipResViewFormatHalf1; case 32: return hipResViewFormatFloat1; } case 2: switch (desc.x) { case 16: return hipResViewFormatHalf2; case 32: return hipResViewFormatFloat2; } case 4: switch (desc.x) { case 16: return hipResViewFormatHalf4; case 32: return hipResViewFormatFloat4; } } default: break; } //error scenario return {}; } inline hipTextureDesc getTextureDesc(const textureReference* texRef) { hipTextureDesc texDesc = {}; std::memcpy(texDesc.addressMode, texRef->addressMode, sizeof(texDesc.addressMode)); texDesc.filterMode = texRef->filterMode; texDesc.readMode = texRef->readMode; texDesc.sRGB = texRef->sRGB; texDesc.normalizedCoords = texRef->normalized; texDesc.maxAnisotropy = texRef->maxAnisotropy; texDesc.mipmapFilterMode = texRef->mipmapFilterMode; texDesc.mipmapLevelBias = texRef->mipmapLevelBias; texDesc.minMipmapLevelClamp = texRef->minMipmapLevelClamp; texDesc.maxMipmapLevelClamp = texRef->maxMipmapLevelClamp; return texDesc; } inline hipResourceViewDesc getResourceViewDesc(hipArray_const_t array, const hipResourceViewFormat format) { hipResourceViewDesc resViewDesc = {}; resViewDesc.format = format; resViewDesc.width = array->width; resViewDesc.height = array->height; resViewDesc.depth = array->depth; resViewDesc.firstMipmapLevel = 0; resViewDesc.lastMipmapLevel = 0; resViewDesc.firstLayer = 0; resViewDesc.lastLayer = 0; /* TODO add hipArray::numLayers */ return resViewDesc; } inline hipResourceViewDesc getResourceViewDesc(hipMipmappedArray_const_t array, const hipResourceViewFormat format) { hipResourceViewDesc resViewDesc = {}; resViewDesc.format = format; resViewDesc.width = array->width; resViewDesc.height = array->height; resViewDesc.depth = array->depth; resViewDesc.firstMipmapLevel = 0; resViewDesc.lastMipmapLevel = 0; /* TODO add hipMipmappedArray::numMipLevels */ resViewDesc.firstLayer = 0; resViewDesc.lastLayer = 0; /* 
TODO add hipArray::numLayers */ return resViewDesc; } inline std::pair<hipMemoryType, hipMemoryType> getMemoryType(const hipMemcpyKind kind) { switch (kind) { case hipMemcpyHostToHost: return {hipMemoryTypeHost, hipMemoryTypeHost}; case hipMemcpyHostToDevice: return {hipMemoryTypeHost, hipMemoryTypeDevice}; case hipMemcpyDeviceToHost: return {hipMemoryTypeDevice, hipMemoryTypeHost}; case hipMemcpyDeviceToDevice: return {hipMemoryTypeDevice, hipMemoryTypeDevice}; case hipMemcpyDefault: return {hipMemoryTypeUnified, hipMemoryTypeUnified}; } //error scenario return {}; } inline HIP_MEMCPY3D getDrvMemcpy3DDesc(const hip_Memcpy2D& desc2D) { HIP_MEMCPY3D desc3D = {}; desc3D.srcXInBytes = desc2D.srcXInBytes; desc3D.srcY = desc2D.srcY; desc3D.srcZ = 0; desc3D.srcLOD = 0; desc3D.srcMemoryType = desc2D.srcMemoryType; desc3D.srcHost = desc2D.srcHost; desc3D.srcDevice = desc2D.srcDevice; desc3D.srcArray = desc2D.srcArray; desc3D.srcPitch = desc2D.srcPitch; desc3D.srcHeight = 0; desc3D.dstXInBytes = desc2D.dstXInBytes; desc3D.dstY = desc2D.dstY; desc3D.dstZ = 0; desc3D.dstLOD = 0; desc3D.dstMemoryType = desc2D.dstMemoryType; desc3D.dstHost = desc2D.dstHost; desc3D.dstDevice = desc2D.dstDevice; desc3D.dstArray = desc2D.dstArray; desc3D.dstPitch = desc2D.dstPitch; desc3D.dstHeight = 0; desc3D.WidthInBytes = desc2D.WidthInBytes; desc3D.Height = desc2D.Height; desc3D.Depth = 1; return desc3D; } inline HIP_MEMCPY3D getDrvMemcpy3DDesc(const hipMemcpy3DParms& desc) { HIP_MEMCPY3D descDrv = {}; descDrv.WidthInBytes = desc.extent.width; descDrv.Height = desc.extent.height; descDrv.Depth = desc.extent.depth; descDrv.srcXInBytes = desc.srcPos.x; descDrv.srcY = desc.srcPos.y; descDrv.srcZ = desc.srcPos.z; descDrv.srcLOD = 0; descDrv.dstXInBytes = desc.dstPos.x; descDrv.dstY = desc.dstPos.y; descDrv.dstZ = desc.dstPos.z; descDrv.dstLOD = 0; if (desc.srcArray != nullptr) { descDrv.srcMemoryType = hipMemoryTypeArray; descDrv.srcArray = desc.srcArray; // When referring to array memory, hipPos::x is in elements. descDrv.srcXInBytes *= getElementSize(desc.srcArray); } if (desc.srcPtr.ptr != nullptr) { descDrv.srcMemoryType = std::get<0>(hip::getMemoryType(desc.kind)); descDrv.srcHost = desc.srcPtr.ptr; descDrv.srcDevice = desc.srcPtr.ptr; descDrv.srcPitch = desc.srcPtr.pitch; descDrv.srcHeight = desc.srcPtr.ysize; } if (desc.dstArray != nullptr) { descDrv.dstMemoryType = hipMemoryTypeArray; descDrv.dstArray = desc.dstArray; // When referring to array memory, hipPos::x is in elements. descDrv.dstXInBytes *= getElementSize(desc.dstArray); } if (desc.dstPtr.ptr != nullptr) { descDrv.dstMemoryType = std::get<1>(getMemoryType(desc.kind)); descDrv.dstHost = desc.dstPtr.ptr; descDrv.dstDevice = desc.dstPtr.ptr; descDrv.dstPitch = desc.dstPtr.pitch; descDrv.dstHeight = desc.dstPtr.ysize; } // If a HIP array is participating in the copy, the extent is defined in terms of that array's elements. if ((desc.srcArray != nullptr) && (desc.dstArray == nullptr)) { descDrv.WidthInBytes *= getElementSize(desc.srcArray); } else if ((desc.srcArray == nullptr) && (desc.dstArray != nullptr)) { descDrv.WidthInBytes *= getElementSize(desc.dstArray); } else if ((desc.srcArray != nullptr) && (desc.dstArray != nullptr)) { descDrv.WidthInBytes *= getElementSize(desc.dstArray); } return descDrv; } inline hipResourceType getResourceType(const HIPresourcetype resType) { // These two enums should be isomorphic. return static_cast<hipResourceType>(resType); } inline HIPresourcetype getResourceType(const hipResourceType resType) { // These two enums should be isomorphic.
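// [Editor's illustrative sketch - not part of the original source; kept inert with "#if 0".]
// Caller's view of the getDrvMemcpy3DDesc conversion above: when a HIP array
// participates in a copy, the extent and positions are given in array elements, and
// the conversion scales WidthInBytes and the X offsets by the array's element size.
#if 0
#include <hip/hip_runtime.h>

int main() {
  hipArray_t array = nullptr;
  hipChannelFormatDesc desc = hipCreateChannelDesc<float>();
  hipExtent extent = make_hipExtent(64, 64, 4);  // in elements, since an array is the destination
  (void)hipMalloc3DArray(&array, &desc, extent, hipArrayDefault);

  static float host[4][64][64] = {};
  hipMemcpy3DParms p = {};
  p.srcPtr = make_hipPitchedPtr(host, 64 * sizeof(float), 64, 64);
  p.dstArray = array;
  p.extent = extent;
  p.kind = hipMemcpyHostToDevice;
  (void)hipMemcpy3D(&p);  // lowered internally through getDrvMemcpy3DDesc
  (void)hipFreeArray(array);
  return 0;
}
#endif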
return static_cast(resType); } inline hipResourceDesc getResourceDesc(const HIP_RESOURCE_DESC& resDesc) { hipResourceDesc desc; desc.resType = getResourceType(resDesc.resType); switch (desc.resType) { case hipResourceTypeArray: desc.res.array.array = resDesc.res.array.hArray; break; case hipResourceTypeMipmappedArray: desc.res.mipmap.mipmap = resDesc.res.mipmap.hMipmappedArray; break; case hipResourceTypeLinear: desc.res.linear.devPtr = resDesc.res.linear.devPtr; desc.res.linear.desc = getChannelFormatDesc(resDesc.res.linear.numChannels, resDesc.res.linear.format); desc.res.linear.sizeInBytes = resDesc.res.linear.sizeInBytes; break; case hipResourceTypePitch2D: desc.res.pitch2D.devPtr = resDesc.res.pitch2D.devPtr; desc.res.pitch2D.desc = getChannelFormatDesc(resDesc.res.pitch2D.numChannels, resDesc.res.pitch2D.format); desc.res.pitch2D.width = resDesc.res.pitch2D.width; desc.res.pitch2D.height = resDesc.res.pitch2D.height; desc.res.pitch2D.pitchInBytes = resDesc.res.pitch2D.pitchInBytes; break; default: break; } return desc; } inline HIP_RESOURCE_DESC getResourceDesc(const hipResourceDesc& resDesc) { HIP_RESOURCE_DESC desc; desc.resType = getResourceType(resDesc.resType); switch (desc.resType) { case HIP_RESOURCE_TYPE_ARRAY: desc.res.array.hArray = resDesc.res.array.array; break; case HIP_RESOURCE_TYPE_MIPMAPPED_ARRAY: desc.res.mipmap.hMipmappedArray = resDesc.res.mipmap.mipmap; break; case HIP_RESOURCE_TYPE_LINEAR: desc.res.linear.devPtr = resDesc.res.linear.devPtr; desc.res.linear.numChannels = getNumChannels(resDesc.res.linear.desc); desc.res.linear.format = getArrayFormat(resDesc.res.linear.desc); desc.res.linear.sizeInBytes = resDesc.res.linear.sizeInBytes; break; case HIP_RESOURCE_TYPE_PITCH2D: desc.res.pitch2D.devPtr = resDesc.res.pitch2D.devPtr; desc.res.pitch2D.numChannels = getNumChannels(resDesc.res.pitch2D.desc); desc.res.pitch2D.format = getArrayFormat(resDesc.res.pitch2D.desc); desc.res.pitch2D.width = resDesc.res.pitch2D.width; desc.res.pitch2D.height = resDesc.res.pitch2D.height; desc.res.pitch2D.pitchInBytes = resDesc.res.pitch2D.pitchInBytes; break; default: break; } return desc; } inline hipTextureAddressMode getAddressMode(const HIPaddress_mode mode) { // These two enums should be isomorphic. return static_cast(mode); } inline HIPaddress_mode getAddressMode(const hipTextureAddressMode mode) { // These two enums should be isomorphic. return static_cast(mode); } inline hipTextureFilterMode getFilterMode(const HIPfilter_mode mode) { // These two enums should be isomorphic. return static_cast(mode); } inline HIPfilter_mode getFilterMode(const hipTextureFilterMode mode) { // These two enums should be isomorphic. 
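// [Editor's illustrative sketch - not part of the original source; kept inert with "#if 0".]
// The runtime-side hipResourceDesc that getResourceDesc above translates to and from
// the driver-API HIP_RESOURCE_DESC, shown here for a linear (buffer-backed) texture:
#if 0
#include <hip/hip_runtime.h>

int main() {
  float* devPtr = nullptr;
  (void)hipMalloc(&devPtr, 1024 * sizeof(float));

  hipResourceDesc resDesc = {};
  resDesc.resType = hipResourceTypeLinear;
  resDesc.res.linear.devPtr = devPtr;
  resDesc.res.linear.desc = hipCreateChannelDesc<float>();
  resDesc.res.linear.sizeInBytes = 1024 * sizeof(float);

  hipTextureDesc texDesc = {};
  texDesc.readMode = hipReadModeElementType;

  hipTextureObject_t tex = 0;
  if (hipCreateTextureObject(&tex, &resDesc, &texDesc, nullptr) == hipSuccess) {
    (void)hipDestroyTextureObject(tex);
  }
  (void)hipFree(devPtr);
  return 0;
}
#endif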
return static_cast(mode); } inline hipTextureReadMode getReadMode(const unsigned int flags) { if (flags & HIP_TRSF_READ_AS_INTEGER) { return hipReadModeElementType; } else { return hipReadModeNormalizedFloat; } } inline unsigned int getReadMode(const hipTextureReadMode mode) { if (mode == hipReadModeElementType) { return HIP_TRSF_READ_AS_INTEGER; } else { return 0; } } inline int getsRGB(const unsigned int flags) { if (flags & HIP_TRSF_SRGB) { return 1; } else { return 0; } } inline unsigned int getsRGB(const int sRGB) { if (sRGB == 1) { return HIP_TRSF_SRGB; } else { return 0; } } inline int getNormalizedCoords(const unsigned int flags) { if (flags & HIP_TRSF_NORMALIZED_COORDINATES) { return 1; } else { return 0; } } inline unsigned int getNormalizedCoords(const int normalizedCoords) { if (normalizedCoords == 1) { return HIP_TRSF_NORMALIZED_COORDINATES; } else { return 0; } } inline hipTextureDesc getTextureDesc(const HIP_TEXTURE_DESC& texDesc) { hipTextureDesc desc; desc.addressMode[0] = getAddressMode(texDesc.addressMode[0]); desc.addressMode[1] = getAddressMode(texDesc.addressMode[1]); desc.addressMode[2] = getAddressMode(texDesc.addressMode[2]); desc.filterMode = getFilterMode(texDesc.filterMode); desc.readMode = getReadMode(texDesc.flags); desc.sRGB = getsRGB(texDesc.flags); std::memcpy(desc.borderColor, texDesc.borderColor, sizeof(desc.borderColor)); desc.normalizedCoords = getNormalizedCoords(texDesc.flags); desc.maxAnisotropy = texDesc.maxAnisotropy; desc.mipmapFilterMode = getFilterMode(texDesc.mipmapFilterMode); desc.mipmapLevelBias = texDesc.mipmapLevelBias; desc.minMipmapLevelClamp = texDesc.minMipmapLevelClamp; desc.maxMipmapLevelClamp = texDesc.maxMipmapLevelClamp; return desc; } inline HIP_TEXTURE_DESC getTextureDesc(const hipTextureDesc& texDesc) { HIP_TEXTURE_DESC desc; desc.addressMode[0] = getAddressMode(texDesc.addressMode[0]); desc.addressMode[1] = getAddressMode(texDesc.addressMode[1]); desc.addressMode[2] = getAddressMode(texDesc.addressMode[2]); desc.filterMode = getFilterMode(texDesc.filterMode); desc.flags = 0; desc.flags |= getReadMode(texDesc.readMode); desc.flags |= getsRGB(texDesc.sRGB); desc.flags |= getNormalizedCoords(texDesc.normalizedCoords); desc.maxAnisotropy = texDesc.maxAnisotropy; desc.mipmapFilterMode = getFilterMode(texDesc.mipmapFilterMode); desc.mipmapLevelBias = texDesc.mipmapLevelBias; desc.minMipmapLevelClamp = texDesc.minMipmapLevelClamp; desc.maxMipmapLevelClamp = texDesc.maxMipmapLevelClamp; std::memcpy(desc.borderColor, texDesc.borderColor, sizeof(desc.borderColor)); return desc; } inline hipResourceViewFormat getResourceViewFormat(const HIPresourceViewFormat format) { // These two enums should be isomorphic. return static_cast(format); } inline HIPresourceViewFormat getResourceViewFormat(const hipResourceViewFormat format) { // These two enums should be isomorphic. 
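// [Editor's illustrative sketch - not part of the original source; kept inert with "#if 0".]
// The driver-API getTextureDesc above packs three independent booleans into a single
// flags word via getReadMode/getsRGB/getNormalizedCoords; condensed, the packing is:
#if 0
#include <hip/hip_runtime.h>

unsigned int packTextureFlags(const hipTextureDesc& td) {
  unsigned int flags = 0;
  if (td.readMode == hipReadModeElementType) flags |= HIP_TRSF_READ_AS_INTEGER;
  if (td.sRGB == 1) flags |= HIP_TRSF_SRGB;
  if (td.normalizedCoords == 1) flags |= HIP_TRSF_NORMALIZED_COORDINATES;
  return flags;
}
#endif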
return static_cast(format); } inline hipResourceViewDesc getResourceViewDesc(const HIP_RESOURCE_VIEW_DESC& resViewDesc) { hipResourceViewDesc desc; desc.format = getResourceViewFormat(resViewDesc.format); desc.width = resViewDesc.width; desc.height = resViewDesc.height; desc.depth = resViewDesc.depth; desc.firstMipmapLevel = resViewDesc.firstMipmapLevel; desc.lastMipmapLevel = resViewDesc.lastMipmapLevel; desc.firstLayer = resViewDesc.firstLayer; desc.lastLayer = resViewDesc.lastLayer; return desc; } inline HIP_RESOURCE_VIEW_DESC getResourceViewDesc(const hipResourceViewDesc& resViewDesc) { HIP_RESOURCE_VIEW_DESC desc; desc.format = getResourceViewFormat(resViewDesc.format); desc.width = resViewDesc.width; desc.height = resViewDesc.height; desc.depth = resViewDesc.depth; desc.firstMipmapLevel = resViewDesc.firstMipmapLevel; desc.lastMipmapLevel = resViewDesc.lastMipmapLevel; desc.firstLayer = resViewDesc.firstLayer; desc.lastLayer = resViewDesc.lastLayer; return desc; } inline size_t getElementSize(const hipChannelFormatDesc &desc) { return (desc.x / 8) * getNumChannels(desc); } }; clr-rocm-5.7.1/hipamd/src/hip_device.cpp000066400000000000000000000320671450307266000201110ustar00rootroot00000000000000/* Copyright (c) 2018 - 2022 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include "hip_internal.hpp" #include "hip_mempool_impl.hpp" namespace hip { // ================================================================================================ hip::Stream* Device::NullStream() { if (null_stream_ == nullptr) { null_stream_ = new Stream(this, Stream::Priority::Normal, 0, true); } if (null_stream_ == nullptr) { return nullptr; } // Wait for all active streams before executing commands on the default iHipWaitActiveStreams(null_stream_); return null_stream_; } // ================================================================================================ bool Device::Create() { // Create default memory pool default_mem_pool_ = new MemoryPool(this); if (default_mem_pool_ == nullptr) { return false; } // Create graph memory pool graph_mem_pool_ = new MemoryPool(this); if (graph_mem_pool_ == nullptr) { return false; } if (!HIP_MEM_POOL_USE_VM) { uint64_t max_size = std::numeric_limits::max(); // Use maximum value to hold memory, because current implementation doesn't support VM // Note: the call for the threshold is always successful auto error = graph_mem_pool_->SetAttribute(hipMemPoolAttrReleaseThreshold, &max_size); } // Current is default pool after device creation current_mem_pool_ = default_mem_pool_; return true; } // ================================================================================================ void Device::AddMemoryPool(MemoryPool* pool) { amd::ScopedLock lock(lock_); if (auto it = mem_pools_.find(pool); it == mem_pools_.end()) { mem_pools_.insert(pool); } } // ================================================================================================ void Device::RemoveMemoryPool(MemoryPool* pool) { amd::ScopedLock lock(lock_); if (auto it = mem_pools_.find(pool); it != mem_pools_.end()) { mem_pools_.erase(it); } } // ================================================================================================ bool Device::FreeMemory(amd::Memory* memory, Stream* stream) { amd::ScopedLock lock(lock_); // Search for memory in the entire list of pools for (auto it : mem_pools_) { if (it->FreeMemory(memory, stream)) { return true; } } return false; } // ================================================================================================ void Device::ReleaseFreedMemory(Stream* stream) { amd::ScopedLock lock(lock_); // Search for memory in the entire list of pools for (auto it : mem_pools_) { it->ReleaseFreedMemory(stream); } } // ================================================================================================ void Device::RemoveStreamFromPools(Stream* stream) { amd::ScopedLock lock(lock_); // Update all pools with the destroyed stream for (auto it : mem_pools_) { it->RemoveStream(stream); } } // ================================================================================================ void Device::Reset() { { amd::ScopedLock lock(lock_); auto it = mem_pools_.begin(); while (it != mem_pools_.end()) { auto current = it++; (*current)->ReleaseAllMemory(); delete *current; } mem_pools_.clear(); } flags_ = hipDeviceScheduleSpin; hip::Stream::destroyAllStreams(deviceId_); amd::MemObjMap::Purge(devices()[0]); Create(); } // ================================================================================================ Device::~Device() { if (default_mem_pool_ != nullptr) { default_mem_pool_->release(); } if (graph_mem_pool_ != nullptr) { graph_mem_pool_->release(); } if (null_stream_!= nullptr) { hip::Stream::Destroy(null_stream_); } } } void ihipDestroyDevice() { for (auto deviceHandle : 
g_devices) { delete deviceHandle; } } hipError_t ihipDeviceGet(hipDevice_t* device, int deviceId) { if (device == nullptr) { return hipErrorInvalidValue; } if (deviceId < 0 || static_cast(deviceId) >= g_devices.size()) { return hipErrorInvalidDevice; } *device = deviceId; return hipSuccess; } hipError_t hipDeviceGet(hipDevice_t* device, int deviceId) { HIP_INIT_API(hipDeviceGet, device, deviceId); HIP_RETURN(ihipDeviceGet(device, deviceId)); } hipError_t hipDeviceTotalMem (size_t *bytes, hipDevice_t device) { HIP_INIT_API(hipDeviceTotalMem, bytes, device); if (device < 0 || static_cast(device) >= g_devices.size()) { HIP_RETURN(hipErrorInvalidDevice); } if (bytes == nullptr) { HIP_RETURN(hipErrorInvalidValue); } auto* deviceHandle = g_devices[device]->devices()[0]; const auto& info = deviceHandle->info(); *bytes = info.globalMemSize_; HIP_RETURN(hipSuccess); } hipError_t hipDeviceComputeCapability(int *major, int *minor, hipDevice_t device) { HIP_INIT_API(hipDeviceComputeCapability, major, minor, device); if (device < 0 || static_cast(device) >= g_devices.size()) { HIP_RETURN(hipErrorInvalidDevice); } if (major == nullptr || minor == nullptr) { HIP_RETURN(hipErrorInvalidValue); } auto* deviceHandle = g_devices[device]->devices()[0]; const auto& isa = deviceHandle->isa(); *major = isa.versionMajor(); *minor = isa.versionMinor(); HIP_RETURN(hipSuccess); } hipError_t hipDeviceGetCount(int* count) { HIP_INIT_API(hipDeviceGetCount, count); HIP_RETURN(ihipDeviceGetCount(count)); } hipError_t ihipDeviceGetCount(int* count) { if (count == nullptr) { return hipErrorInvalidValue; } // Get all available devices *count = g_devices.size(); if (*count < 1) { return hipErrorNoDevice; } return hipSuccess; } hipError_t hipDeviceGetName(char *name, int len, hipDevice_t device) { HIP_INIT_API(hipDeviceGetName, (void*)name, len, device); if (device < 0 || static_cast(device) >= g_devices.size()) { HIP_RETURN(hipErrorInvalidDevice); } if (name == nullptr || len <= 0) { HIP_RETURN(hipErrorInvalidValue); } auto* deviceHandle = g_devices[device]->devices()[0]; const auto& info = deviceHandle->info(); const auto nameLen = ::strlen(info.boardName_); // Only copy partial name if size of `dest` is smaller than size of `src` including // trailing zero byte auto memcpySize = (len <= (nameLen + 1) ? 
(len - 1) : nameLen); ::memcpy(name, info.boardName_, memcpySize); name[memcpySize] = '\0'; HIP_RETURN(hipSuccess); } hipError_t hipDeviceGetUuid(hipUUID* uuid, hipDevice_t device) { HIP_INIT_API(hipDeviceGetUuid, reinterpret_cast(uuid), device); if (device < 0 || static_cast(device) >= g_devices.size()) { HIP_RETURN(hipErrorInvalidDevice); } if (uuid == nullptr) { HIP_RETURN(hipErrorInvalidValue); } auto* deviceHandle = g_devices[device]->devices()[0]; const auto& info = deviceHandle->info(); memcpy(uuid->bytes, info.uuid_, sizeof(info.uuid_)); HIP_RETURN(hipSuccess); } hipError_t ihipGetDeviceProperties(hipDeviceProp_t* props, hipDevice_t device) { if (props == nullptr) { return hipErrorInvalidValue; } if (unsigned(device) >= g_devices.size()) { return hipErrorInvalidDevice; } auto* deviceHandle = g_devices[device]->devices()[0]; constexpr auto int32_max = static_cast(std::numeric_limits::max()); constexpr auto uint16_max = static_cast(std::numeric_limits::max())+1; hipDeviceProp_t deviceProps = {0}; const auto& info = deviceHandle->info(); const auto& isa = deviceHandle->isa(); ::strncpy(deviceProps.name, info.boardName_, 128); deviceProps.totalGlobalMem = info.globalMemSize_; deviceProps.sharedMemPerBlock = info.localMemSizePerCU_; deviceProps.regsPerBlock = info.availableRegistersPerCU_; deviceProps.warpSize = info.wavefrontWidth_; deviceProps.maxThreadsPerBlock = info.maxWorkGroupSize_; deviceProps.maxThreadsDim[0] = info.maxWorkItemSizes_[0]; deviceProps.maxThreadsDim[1] = info.maxWorkItemSizes_[1]; deviceProps.maxThreadsDim[2] = info.maxWorkItemSizes_[2]; deviceProps.maxGridSize[0] = int32_max; deviceProps.maxGridSize[1] = uint16_max; deviceProps.maxGridSize[2] = uint16_max; deviceProps.clockRate = info.maxEngineClockFrequency_ * 1000; deviceProps.memoryClockRate = info.maxMemoryClockFrequency_ * 1000; deviceProps.memoryBusWidth = info.globalMemChannels_; deviceProps.totalConstMem = std::min(info.maxConstantBufferSize_, int32_max); deviceProps.major = isa.versionMajor(); deviceProps.minor = isa.versionMinor(); deviceProps.multiProcessorCount = info.maxComputeUnits_; deviceProps.l2CacheSize = info.l2CacheSize_; deviceProps.maxThreadsPerMultiProcessor = info.maxThreadsPerCU_; deviceProps.computeMode = 0; deviceProps.clockInstructionRate = info.timeStampFrequency_; deviceProps.arch.hasGlobalInt32Atomics = 1; deviceProps.arch.hasGlobalFloatAtomicExch = 1; deviceProps.arch.hasSharedInt32Atomics = 1; deviceProps.arch.hasSharedFloatAtomicExch = 1; deviceProps.arch.hasFloatAtomicAdd = 1; deviceProps.arch.hasGlobalInt64Atomics = 1; deviceProps.arch.hasSharedInt64Atomics = 1; deviceProps.arch.hasDoubles = 1; deviceProps.arch.hasWarpVote = 1; deviceProps.arch.hasWarpBallot = 1; deviceProps.arch.hasWarpShuffle = 1; deviceProps.arch.hasFunnelShift = 0; deviceProps.arch.hasThreadFenceSystem = 1; deviceProps.arch.hasSyncThreadsExt = 0; deviceProps.arch.hasSurfaceFuncs = 0; deviceProps.arch.has3dGrid = 1; deviceProps.arch.hasDynamicParallelism = 0; deviceProps.concurrentKernels = 1; deviceProps.pciDomainID = info.pciDomainID; deviceProps.pciBusID = info.deviceTopology_.pcie.bus; deviceProps.pciDeviceID = info.deviceTopology_.pcie.device; deviceProps.maxSharedMemoryPerMultiProcessor = info.localMemSizePerCU_; deviceProps.canMapHostMemory = 1; // FIXME: This should be removed, targets can have character names as well. 
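// [Editor's illustrative sketch - not part of the original source; kept inert with "#if 0".]
// Host-side view of the properties filled in here; gcnArchName (assigned just below
// from isa.targetId(), e.g. a string of the form "gfx90a:sramecc+:xnack-") is the
// preferred way to identify the target, rather than the numeric gcnArch that the
// FIXME above wants removed.
#if 0
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
  hipDeviceProp_t prop;
  if (hipGetDeviceProperties(&prop, 0) == hipSuccess) {
    std::printf("%s (%s), %d CUs\n", prop.name, prop.gcnArchName, prop.multiProcessorCount);
  }
  return 0;
}
#endif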
deviceProps.gcnArch = isa.versionMajor() * 100 + isa.versionMinor() * 10 + isa.versionStepping(); sprintf(deviceProps.gcnArchName, "%s", isa.targetId()); deviceProps.cooperativeLaunch = info.cooperativeGroups_; deviceProps.cooperativeMultiDeviceLaunch = info.cooperativeMultiDeviceGroups_; deviceProps.cooperativeMultiDeviceUnmatchedFunc = info.cooperativeMultiDeviceGroups_; deviceProps.cooperativeMultiDeviceUnmatchedGridDim = info.cooperativeMultiDeviceGroups_; deviceProps.cooperativeMultiDeviceUnmatchedBlockDim = info.cooperativeMultiDeviceGroups_; deviceProps.cooperativeMultiDeviceUnmatchedSharedMem = info.cooperativeMultiDeviceGroups_; deviceProps.maxTexture1DLinear = std::min(16 * info.imageMaxBufferSize_, int32_max); // Max pixel size is 16 bytes deviceProps.maxTexture1D = std::min(info.image1DMaxWidth_, int32_max); deviceProps.maxTexture2D[0] = std::min(info.image2DMaxWidth_, int32_max); deviceProps.maxTexture2D[1] = std::min(info.image2DMaxHeight_, int32_max); deviceProps.maxTexture3D[0] = std::min(info.image3DMaxWidth_, int32_max); deviceProps.maxTexture3D[1] = std::min(info.image3DMaxHeight_, int32_max); deviceProps.maxTexture3D[2] = std::min(info.image3DMaxDepth_, int32_max); deviceProps.hdpMemFlushCntl = info.hdpMemFlushCntl; deviceProps.hdpRegFlushCntl = info.hdpRegFlushCntl; deviceProps.memPitch = std::min(info.maxMemAllocSize_, int32_max); deviceProps.textureAlignment = info.imageBaseAddressAlignment_; deviceProps.texturePitchAlignment = info.imagePitchAlignment_; deviceProps.kernelExecTimeoutEnabled = 0; deviceProps.ECCEnabled = info.errorCorrectionSupport_ ? 1 : 0; deviceProps.isLargeBar = info.largeBar_ ? 1 : 0; deviceProps.asicRevision = info.asicRevision_; // HMM capabilities deviceProps.managedMemory = info.hmmSupported_; deviceProps.concurrentManagedAccess = info.hmmSupported_; deviceProps.directManagedMemAccessFromHost = info.hmmDirectHostAccess_; deviceProps.pageableMemoryAccess = info.hmmCpuMemoryAccessible_; deviceProps.pageableMemoryAccessUsesHostPageTables = info.hostUnifiedMemory_; *props = deviceProps; return hipSuccess; } hipError_t hipGetDeviceProperties(hipDeviceProp_t* props, hipDevice_t device) { HIP_INIT_API(hipGetDeviceProperties, props, device); HIP_RETURN(ihipGetDeviceProperties(props, device)); } clr-rocm-5.7.1/hipamd/src/hip_device_runtime.cpp000066400000000000000000000453501450307266000216530ustar00rootroot00000000000000/* Copyright (c) 2018 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include "hip_internal.hpp" hipError_t hipChooseDevice(int* device, const hipDeviceProp_t* properties) { HIP_INIT_API(hipChooseDevice, device, properties); if (device == nullptr || properties == nullptr) { HIP_RETURN(hipErrorInvalidValue); } *device = 0; cl_uint maxMatchedCount = 0; int count = 0; HIP_RETURN_ONFAIL(ihipDeviceGetCount(&count)); for (cl_int i = 0; i< count; ++i) { hipDeviceProp_t currentProp = {0}; cl_uint validPropCount = 0; cl_uint matchedCount = 0; hipError_t err = ihipGetDeviceProperties(¤tProp, i); if (properties->major != 0) { validPropCount++; if(currentProp.major >= properties->major) { matchedCount++; } } if (properties->minor != 0) { validPropCount++; if(currentProp.minor >= properties->minor) { matchedCount++; } } if(properties->totalGlobalMem != 0) { validPropCount++; if(currentProp.totalGlobalMem >= properties->totalGlobalMem) { matchedCount++; } } if(properties->sharedMemPerBlock != 0) { validPropCount++; if(currentProp.sharedMemPerBlock >= properties->sharedMemPerBlock) { matchedCount++; } } if(properties->maxThreadsPerBlock != 0) { validPropCount++; if(currentProp.maxThreadsPerBlock >= properties->maxThreadsPerBlock ) { matchedCount++; } } if(properties->totalConstMem != 0) { validPropCount++; if(currentProp.totalConstMem >= properties->totalConstMem ) { matchedCount++; } } if(properties->multiProcessorCount != 0) { validPropCount++; if(currentProp.multiProcessorCount >= properties->multiProcessorCount ) { matchedCount++; } } if(properties->maxThreadsPerMultiProcessor != 0) { validPropCount++; if(currentProp.maxThreadsPerMultiProcessor >= properties->maxThreadsPerMultiProcessor ) { matchedCount++; } } if(properties->memoryClockRate != 0) { validPropCount++; if(currentProp.memoryClockRate >= properties->memoryClockRate ) { matchedCount++; } } if(properties->memoryBusWidth != 0) { validPropCount++; if(currentProp.memoryBusWidth >= properties->memoryBusWidth ) { matchedCount++; } } if(properties->l2CacheSize != 0) { validPropCount++; if(currentProp.l2CacheSize >= properties->l2CacheSize ) { matchedCount++; } } if(properties->regsPerBlock != 0) { validPropCount++; if(currentProp.regsPerBlock >= properties->regsPerBlock ) { matchedCount++; } } if(properties->maxSharedMemoryPerMultiProcessor != 0) { validPropCount++; if(currentProp.maxSharedMemoryPerMultiProcessor >= properties->maxSharedMemoryPerMultiProcessor ) { matchedCount++; } } if(properties->warpSize != 0) { validPropCount++; if(currentProp.warpSize >= properties->warpSize ) { matchedCount++; } } if(validPropCount == matchedCount) { *device = matchedCount > maxMatchedCount ? i : *device; maxMatchedCount = std::max(matchedCount, maxMatchedCount); } } HIP_RETURN(hipSuccess); } hipError_t hipDeviceGetAttribute(int* pi, hipDeviceAttribute_t attr, int device) { HIP_INIT_API(hipDeviceGetAttribute, pi, attr, device); if (pi == nullptr) { HIP_RETURN(hipErrorInvalidValue); } int count = 0; HIP_RETURN_ONFAIL(ihipDeviceGetCount(&count)); if (device < 0 || device >= count) { HIP_RETURN(hipErrorInvalidDevice); } //FIXME: should we cache the props, or just select from deviceHandle->info_? 
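// [Editor's illustrative sketch - not part of the original source; kept inert with "#if 0".]
// Caller's view of the attribute switch below: each hipDeviceAttribute_t enum value
// is served from a single field of the hipDeviceProp_t queried here.
#if 0
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
  int maxThreads = 0;
  if (hipDeviceGetAttribute(&maxThreads, hipDeviceAttributeMaxThreadsPerBlock, 0) == hipSuccess) {
    std::printf("max threads per block: %d\n", maxThreads);
  }
  return 0;
}
#endif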
hipDeviceProp_t prop = {0}; HIP_RETURN_ONFAIL(ihipGetDeviceProperties(&prop, device)); constexpr auto int32_max = static_cast(std::numeric_limits::max()); switch (attr) { case hipDeviceAttributeMaxThreadsPerBlock: *pi = prop.maxThreadsPerBlock; break; case hipDeviceAttributeMaxBlockDimX: *pi = prop.maxThreadsDim[0]; break; case hipDeviceAttributeMaxBlockDimY: *pi = prop.maxThreadsDim[1]; break; case hipDeviceAttributeMaxBlockDimZ: *pi = prop.maxThreadsDim[2]; break; case hipDeviceAttributeMaxGridDimX: *pi = prop.maxGridSize[0]; break; case hipDeviceAttributeMaxGridDimY: *pi = prop.maxGridSize[1]; break; case hipDeviceAttributeMaxGridDimZ: *pi = prop.maxGridSize[2]; break; case hipDeviceAttributeMaxSharedMemoryPerBlock: *pi = prop.sharedMemPerBlock; break; case hipDeviceAttributeTotalConstantMemory: // size_t to int casting *pi = std::min(prop.totalConstMem, int32_max); break; case hipDeviceAttributeWarpSize: *pi = prop.warpSize; break; case hipDeviceAttributeMaxRegistersPerBlock: *pi = prop.regsPerBlock; break; case hipDeviceAttributeClockRate: *pi = prop.clockRate; break; case hipDeviceAttributeWallClockRate: *pi = g_devices[device]->devices()[0]->info().wallClockFrequency_; break; case hipDeviceAttributeMemoryClockRate: *pi = prop.memoryClockRate; break; case hipDeviceAttributeMemoryBusWidth: *pi = prop.memoryBusWidth; break; case hipDeviceAttributeMultiprocessorCount: *pi = prop.multiProcessorCount; break; case hipDeviceAttributeComputeMode: *pi = prop.computeMode; break; case hipDeviceAttributeL2CacheSize: *pi = prop.l2CacheSize; break; case hipDeviceAttributeMaxThreadsPerMultiProcessor: *pi = prop.maxThreadsPerMultiProcessor; break; case hipDeviceAttributeComputeCapabilityMajor: *pi = prop.major; break; case hipDeviceAttributeComputeCapabilityMinor: *pi = prop.minor; break; case hipDeviceAttributePciBusId: *pi = prop.pciBusID; break; case hipDeviceAttributeConcurrentKernels: *pi = prop.concurrentKernels; break; case hipDeviceAttributePciDeviceId: *pi = prop.pciDeviceID; break; case hipDeviceAttributeMaxSharedMemoryPerMultiprocessor: *pi = prop.maxSharedMemoryPerMultiProcessor; break; case hipDeviceAttributeIsMultiGpuBoard: *pi = prop.isMultiGpuBoard; break; case hipDeviceAttributeCooperativeLaunch: *pi = prop.cooperativeLaunch; break; case hipDeviceAttributeCooperativeMultiDeviceLaunch: *pi = prop.cooperativeMultiDeviceLaunch; break; case hipDeviceAttributeIntegrated: *pi = prop.integrated; break; case hipDeviceAttributeMaxTexture1DWidth: *pi = prop.maxTexture1D; break; case hipDeviceAttributeMaxTexture2DWidth: *pi = prop.maxTexture2D[0]; break; case hipDeviceAttributeMaxTexture2DHeight: *pi = prop.maxTexture2D[1]; break; case hipDeviceAttributeMaxTexture3DWidth: *pi = prop.maxTexture3D[0]; break; case hipDeviceAttributeMaxTexture3DHeight: *pi = prop.maxTexture3D[1]; break; case hipDeviceAttributeMaxTexture3DDepth: *pi = prop.maxTexture3D[2]; break; case hipDeviceAttributeHdpMemFlushCntl: *reinterpret_cast(pi) = prop.hdpMemFlushCntl; break; case hipDeviceAttributeHdpRegFlushCntl: *reinterpret_cast(pi) = prop.hdpRegFlushCntl; break; case hipDeviceAttributeMaxPitch: // size_t to int casting *pi = std::min(prop.memPitch, int32_max); break; case hipDeviceAttributeTextureAlignment: *pi = prop.textureAlignment; break; case hipDeviceAttributeTexturePitchAlignment: *pi = prop.texturePitchAlignment; break; case hipDeviceAttributeKernelExecTimeout: *pi = prop.kernelExecTimeoutEnabled; break; case hipDeviceAttributeCanMapHostMemory: *pi = prop.canMapHostMemory; break; case 
hipDeviceAttributeEccEnabled: *pi = prop.ECCEnabled; break; case hipDeviceAttributeCooperativeMultiDeviceUnmatchedFunc: *pi = prop.cooperativeMultiDeviceUnmatchedFunc; break; case hipDeviceAttributeCooperativeMultiDeviceUnmatchedGridDim: *pi = prop.cooperativeMultiDeviceUnmatchedGridDim; break; case hipDeviceAttributeCooperativeMultiDeviceUnmatchedBlockDim: *pi = prop.cooperativeMultiDeviceUnmatchedBlockDim; break; case hipDeviceAttributeCooperativeMultiDeviceUnmatchedSharedMem: *pi = prop.cooperativeMultiDeviceUnmatchedSharedMem; break; case hipDeviceAttributeAsicRevision: *pi = prop.asicRevision; break; case hipDeviceAttributeManagedMemory: *pi = prop.managedMemory; break; case hipDeviceAttributeDirectManagedMemAccessFromHost: *pi = prop.directManagedMemAccessFromHost; break; case hipDeviceAttributeConcurrentManagedAccess: *pi = prop.concurrentManagedAccess; break; case hipDeviceAttributePageableMemoryAccess: *pi = prop.pageableMemoryAccess; break; case hipDeviceAttributePageableMemoryAccessUsesHostPageTables: *pi = prop.pageableMemoryAccessUsesHostPageTables; break; case hipDeviceAttributeUnifiedAddressing: // HIP runtime always uses SVM for host memory allocations. // Note: Host registered memory isn't covered by this feature // and still requires hipMemHostGetDevicePointer() call *pi = true; break; case hipDeviceAttributeCanUseStreamWaitValue: // hipStreamWaitValue64() and hipStreamWaitValue32() support *pi = g_devices[device]->devices()[0]->info().aqlBarrierValue_; break; case hipDeviceAttributeImageSupport: *pi = static_cast(g_devices[device]->devices()[0]->info().imageSupport_); break; case hipDeviceAttributePhysicalMultiProcessorCount: *pi = g_devices[device]->devices()[0]->info().maxPhysicalComputeUnits_; break; case hipDeviceAttributeFineGrainSupport: *pi = static_cast(g_devices[device]->devices()[0]->isFineGrainSupported()); break; case hipDeviceAttributeMemoryPoolsSupported: *pi = HIP_MEM_POOL_SUPPORT; break; case hipDeviceAttributeVirtualMemoryManagementSupported: *pi = static_cast(g_devices[device]->devices()[0]->info().virtualMemoryManagement_); break; default: HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(hipSuccess); } hipError_t hipDeviceGetByPCIBusId(int* device, const char*pciBusIdstr) { HIP_INIT_API(hipDeviceGetByPCIBusId, device, pciBusIdstr); if (device == nullptr || pciBusIdstr == nullptr) { HIP_RETURN(hipErrorInvalidValue); } int pciBusID = -1; int pciDeviceID = -1; int pciDomainID = -1; bool found = false; if (sscanf (pciBusIdstr, "%04x:%02x:%02x", reinterpret_cast(&pciDomainID), reinterpret_cast(&pciBusID), reinterpret_cast(&pciDeviceID)) == 0x3) { int count = 0; HIP_RETURN_ONFAIL(ihipDeviceGetCount(&count)); for (cl_int i = 0; i < count; i++) { hipDevice_t dev; hipDeviceProp_t prop; HIP_RETURN_ONFAIL(ihipDeviceGet(&dev, i)); HIP_RETURN_ONFAIL(ihipGetDeviceProperties(&prop, dev)); if ((pciBusID == prop.pciBusID) && (pciDomainID == prop.pciDomainID) && (pciDeviceID == prop.pciDeviceID)) { *device = i; found = true; break; } } } if (!found) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(hipSuccess); } hipError_t hipDeviceGetCacheConfig ( hipFuncCache_t * cacheConfig ) { HIP_INIT_API(hipDeviceGetCacheConfig, cacheConfig); if(cacheConfig == nullptr) { HIP_RETURN(hipErrorInvalidValue); } *cacheConfig = hipFuncCache_t(); HIP_RETURN(hipSuccess); } hipError_t hipDeviceGetLimit ( size_t* pValue, hipLimit_t limit ) { HIP_INIT_API(hipDeviceGetLimit, pValue, limit); if (pValue == nullptr || limit >= hipLimitRange) { HIP_RETURN(hipErrorInvalidValue); } switch (limit) { 
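// [Editor's illustrative sketch - not part of the original source; kept inert with "#if 0".]
// Typical usage of the limit query handled by this switch:
#if 0
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
  size_t stackSize = 0;
  size_t heapSize = 0;
  (void)hipDeviceGetLimit(&stackSize, hipLimitStackSize);
  (void)hipDeviceGetLimit(&heapSize, hipLimitMallocHeapSize);
  std::printf("stack: %zu bytes, malloc heap: %zu bytes\n", stackSize, heapSize);
  return 0;
}
#endif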
case hipLimitMallocHeapSize: hipDeviceProp_t prop; HIP_RETURN_ONFAIL(ihipGetDeviceProperties(&prop, ihipGetDevice())); *pValue = prop.totalGlobalMem; break; case hipLimitStackSize: *pValue = hip::getCurrentDevice()->devices()[0]->StackSize(); break; default: LogPrintfError("UnsupportedLimit = %d is passed", limit); HIP_RETURN(hipErrorUnsupportedLimit); } HIP_RETURN(hipSuccess); } hipError_t hipDeviceGetPCIBusId ( char* pciBusId, int len, int device ) { HIP_INIT_API(hipDeviceGetPCIBusId, (void*)pciBusId, len, device); int count; HIP_RETURN_ONFAIL(ihipDeviceGetCount(&count)); if (device < 0 || device >= count) { HIP_RETURN(hipErrorInvalidDevice); } //pciBusId should be large enough to store 13 characters including the NULL-terminator. if (pciBusId == nullptr) { HIP_RETURN(hipErrorInvalidValue); } hipDeviceProp_t prop; HIP_RETURN_ONFAIL(ihipGetDeviceProperties(&prop, device)); snprintf (pciBusId, len, "%04x:%02x:%02x.0", prop.pciDomainID, prop.pciBusID, prop.pciDeviceID); HIP_RETURN(len <= 12 ? hipErrorInvalidValue : hipSuccess); } hipError_t hipDeviceGetSharedMemConfig ( hipSharedMemConfig * pConfig ) { HIP_INIT_API(hipDeviceGetSharedMemConfig, pConfig); if (pConfig == nullptr) { HIP_RETURN(hipErrorInvalidValue); } *pConfig = hipSharedMemBankSizeFourByte; HIP_RETURN(hipSuccess); } hipError_t hipDeviceReset ( void ) { HIP_INIT_API(hipDeviceReset); hip::getCurrentDevice()->Reset(); HIP_RETURN(hipSuccess); } hipError_t hipDeviceSetCacheConfig ( hipFuncCache_t cacheConfig ) { HIP_INIT_API(hipDeviceSetCacheConfig, cacheConfig); // No way to set cache config yet. HIP_RETURN(hipSuccess); } hipError_t hipDeviceSetLimit ( hipLimit_t limit, size_t value ) { HIP_INIT_API(hipDeviceSetLimit, limit, value); if (limit >= hipLimitRange) { HIP_RETURN(hipErrorInvalidValue); } switch(limit) { case hipLimitStackSize : // need to query device size and take action if (!hip::getCurrentDevice()->devices()[0]->UpdateStackSize(value)) { HIP_RETURN(hipErrorInvalidValue); } break; case hipLimitMallocHeapSize: if (!hip::getCurrentDevice()->devices()[0]->UpdateInitialHeapSize(value)) { HIP_RETURN(hipErrorInvalidValue); } break; default: LogPrintfError("UnsupportedLimit = %d is passed", limit); HIP_RETURN(hipErrorUnsupportedLimit); } HIP_RETURN(hipSuccess); } hipError_t hipDeviceSetSharedMemConfig ( hipSharedMemConfig config ) { HIP_INIT_API(hipDeviceSetSharedMemConfig, config); if (config != hipSharedMemBankSizeDefault && config != hipSharedMemBankSizeFourByte && config != hipSharedMemBankSizeEightByte) { HIP_RETURN(hipErrorInvalidValue); } // No way to set cache config yet. 
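// [Editor's illustrative sketch - not part of the original source; kept inert with "#if 0".]
// Hypothetical round trip through hipDeviceSetLimit/hipDeviceGetLimit above: the
// runtime validates the hipLimit_t enum and forwards the new value to the device,
// which may round or clamp it, so reading the limit back is the reliable check.
#if 0
#include <hip/hip_runtime.h>

int main() {
  if (hipDeviceSetLimit(hipLimitStackSize, 2048) == hipSuccess) {
    size_t readBack = 0;
    (void)hipDeviceGetLimit(&readBack, hipLimitStackSize);
    // readBack now reflects the stack size the device actually accepted.
  }
  return 0;
}
#endif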
HIP_RETURN(hipSuccess); } hipError_t hipDeviceSynchronize() { HIP_INIT_API(hipDeviceSynchronize); hip::Stream::SyncAllStreams(hip::getCurrentDevice()->deviceId()); HIP_RETURN(hipSuccess); } int ihipGetDevice() { hip::Device* device = hip::getCurrentDevice(); if (device == nullptr) { return -1; } return device->deviceId(); } hipError_t hipGetDevice ( int* deviceId ) { HIP_INIT_API(hipGetDevice, deviceId); if (deviceId != nullptr) { int dev = ihipGetDevice(); if (dev == -1) { HIP_RETURN(hipErrorNoDevice); } *deviceId = dev; HIP_RETURN(hipSuccess); } else { HIP_RETURN(hipErrorInvalidValue); } } hipError_t hipGetDeviceCount ( int* count ) { HIP_INIT_API_NO_RETURN(hipGetDeviceCount, count); HIP_RETURN(ihipDeviceGetCount(count)); } hipError_t hipGetDeviceFlags ( unsigned int* flags ) { HIP_INIT_API(hipGetDeviceFlags, flags); if (flags == nullptr) { HIP_RETURN(hipErrorInvalidValue); } *flags = hip::getCurrentDevice()->getFlags(); HIP_RETURN(hipSuccess); } hipError_t hipSetDevice ( int device ) { HIP_INIT_API_NO_RETURN(hipSetDevice, device); if (static_cast<size_t>(device) < g_devices.size()) { hip::setCurrentDevice(device); HIP_RETURN(hipSuccess); } else if (g_devices.empty()) { HIP_RETURN(hipErrorNoDevice); } HIP_RETURN(hipErrorInvalidDevice); } hipError_t hipSetDeviceFlags ( unsigned int flags ) { HIP_INIT_API(hipSetDeviceFlags, flags); if (g_devices.empty()) { HIP_RETURN(hipErrorNoDevice); } constexpr uint32_t supportedFlags = hipDeviceScheduleMask | hipDeviceMapHost | hipDeviceLmemResizeToMax; constexpr uint32_t mutualExclusiveFlags = hipDeviceScheduleSpin | hipDeviceScheduleYield | hipDeviceScheduleBlockingSync; // Only one scheduling flag is allowed at a time uint32_t scheduleFlag = flags & hipDeviceScheduleMask; if (((scheduleFlag & mutualExclusiveFlags) != hipDeviceScheduleSpin) && ((scheduleFlag & mutualExclusiveFlags) != hipDeviceScheduleYield) && ((scheduleFlag & mutualExclusiveFlags) != hipDeviceScheduleBlockingSync) && ((scheduleFlag & mutualExclusiveFlags) != hipDeviceScheduleAuto)) { HIP_RETURN(hipErrorInvalidValue); } if (flags & ~supportedFlags) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; switch (scheduleFlag) { case hipDeviceScheduleAuto: // Current behavior differs from the spec because of MT usage in the runtime if (hip::host_context->devices().size() >= std::thread::hardware_concurrency()) { device->SetActiveWait(false); break; } // Fall through for active wait... case hipDeviceScheduleSpin: case hipDeviceScheduleYield: // Both options map to yield, because of MT usage in the runtime device->SetActiveWait(true); break; case hipDeviceScheduleBlockingSync: device->SetActiveWait(false); break; default: break; } hip::getCurrentDevice()->setFlags(flags & hipDeviceScheduleMask); HIP_RETURN(hipSuccess); } hipError_t hipSetValidDevices ( int* device_arr, int len ) { HIP_INIT_API(hipSetValidDevices, device_arr, len); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } clr-rocm-5.7.1/hipamd/src/hip_embed_pch.sh000077500000000000000000000156771450307266000204170ustar00rootroot00000000000000#!/bin/bash # Copyright (c) 2020 - 2022 Advanced Micro Devices, Inc. All rights reserved.
# # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. printUsage() { echo echo "Usage: $(basename "$0") HIP_BUILD_INC_DIR HIP_INC_DIR HIP_AMD_INC_DIR LLVM_DIR [option] [RTC_LIB_OUTPUT]" echo echo "Options:" echo " -p, --generate_pch Generate pre-compiled header (default)" echo " -r, --generate_rtc Generate preprocessor expansion (hiprtc_header.o)" echo " -h, --help Prints this help" echo echo return 0 } if [ "$1" == "" ]; then printUsage exit 0 fi HIP_BUILD_INC_DIR="$1" HIP_INC_DIR="$2" HIP_AMD_INC_DIR="$3" LLVM_DIR="$4" # By default, generate pch TARGET="generatepch" while [ "$5" != "" ]; do case "$5" in -h | --help ) printUsage ; exit 0 ;; -p | --generate_pch ) TARGET="generatepch" ; break ;; -r | --generate_rtc ) TARGET="generatertc" ; break ;; *) echo " UNEXPECTED ERROR Parm : [$4] ">&2 ; exit 20 ;; esac shift 1 done # Allow hiprtc lib name to be set by argument 7 if [[ "$6" != "" ]]; then rtc_shared_lib_out="$6" else if [[ "$OSTYPE" == cygwin ]]; then rtc_shared_lib_out=hiprtc-builtins64.dll else rtc_shared_lib_out=libhiprtc-builtins.so fi fi if [[ "$OSTYPE" == cygwin || "$OSTYPE" == msys ]]; then isWindows=1 tmpdir=. else isWindows=0 tmpdir=/tmp fi # Expected first argument $1 to be output file name. create_hip_macro_file() { cat >$1 <$tmp/hip_pch.h <$tmp/hip_pch.mcin <$tmp/pch_wave32.cui && cat $tmp/hip_macros.h >> $tmp/pch_wave32.cui && $LLVM_DIR/bin/clang -cc1 -O3 -emit-pch -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -fcuda-is-device -std=c++17 -fgnuc-version=4.2.1 -o $tmp/hip_wave32.pch -x hip-cpp-output - <$tmp/pch_wave32.cui && # For other devices $LLVM_DIR/bin/clang -O3 --rocm-path=$HIP_INC_DIR/.. 
-std=c++17 -nogpulib -isystem $HIP_INC_DIR -isystem $HIP_BUILD_INC_DIR -isystem $HIP_AMD_INC_DIR --cuda-device-only -x hip $tmp/hip_pch.h -E >$tmp/pch_wave64.cui && cat $tmp/hip_macros.h >> $tmp/pch_wave64.cui && $LLVM_DIR/bin/clang -cc1 -O3 -emit-pch -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -fcuda-is-device -std=c++17 -fgnuc-version=4.2.1 -o $tmp/hip_wave64.pch -x hip-cpp-output - <$tmp/pch_wave64.cui && $LLVM_DIR/bin/llvm-mc -o hip_pch.o $tmp/hip_pch.mcin --filetype=obj && rm -rf $tmp } generate_rtc_header() { tmp=$tmpdir/hip_rtc.$$ mkdir -p $tmp local macroFile="$tmp/hip_macros.h" local headerFile="$tmp/hipRTC_header.h" local mcinFile="$tmp/hipRTC_header.mcin" create_hip_macro_file $macroFile cat >$headerFile < $mcinFile if [[ $isWindows -eq 0 ]]; then echo " .type __hipRTC_header,@object" >> $mcinFile echo " .type __hipRTC_header_size,@object" >> $mcinFile fi cat >>$mcinFile <> $tmp/hiprtc && $LLVM_DIR/bin/llvm-mc -o $tmp/hiprtc_header.o $tmp/hipRTC_header.mcin --filetype=obj && $LLVM_DIR/bin/clang $tmp/hiprtc_header.o -o $rtc_shared_lib_out -shared && $LLVM_DIR/bin/clang -O3 --rocm-path=$HIP_INC_DIR/.. -std=c++14 -nogpulib -nogpuinc -emit-llvm -c -o $tmp/tmp.bc --cuda-device-only -D__HIPCC_RTC__ --offload-arch=gfx906 -x hip-cpp-output $tmp/hiprtc && rm -rf $tmp } case $TARGET in (generatertc) generate_rtc_header ;; (generatepch) generate_pch ;; (*) die "Invalid target $TARGET" ;; esac clr-rocm-5.7.1/hipamd/src/hip_error.cpp000066400000000000000000000363321450307266000200020ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include "hip_internal.hpp" hipError_t hipGetLastError() { HIP_INIT_API(hipGetLastError); hipError_t err = hip::tls.last_error_; hip::tls.last_error_ = hipSuccess; return err; } hipError_t hipPeekAtLastError() { HIP_INIT_API(hipPeekAtLastError); hipError_t err = hip::tls.last_error_; HIP_RETURN(err); } const char *ihipGetErrorName(hipError_t hip_error) { switch (hip_error) { case hipSuccess: return "hipSuccess"; case hipErrorInvalidValue: return "hipErrorInvalidValue"; case hipErrorOutOfMemory: return "hipErrorOutOfMemory"; case hipErrorNotInitialized: return "hipErrorNotInitialized"; case hipErrorDeinitialized: return "hipErrorDeinitialized"; case hipErrorProfilerDisabled: return "hipErrorProfilerDisabled"; case hipErrorProfilerNotInitialized: return "hipErrorProfilerNotInitialized"; case hipErrorProfilerAlreadyStarted: return "hipErrorProfilerAlreadyStarted"; case hipErrorProfilerAlreadyStopped: return "hipErrorProfilerAlreadyStopped"; case hipErrorInvalidConfiguration: return "hipErrorInvalidConfiguration"; case hipErrorInvalidSymbol: return "hipErrorInvalidSymbol"; case hipErrorInvalidDevicePointer: return "hipErrorInvalidDevicePointer"; case hipErrorInvalidMemcpyDirection: return "hipErrorInvalidMemcpyDirection"; case hipErrorInsufficientDriver: return "hipErrorInsufficientDriver"; case hipErrorMissingConfiguration: return "hipErrorMissingConfiguration"; case hipErrorPriorLaunchFailure: return "hipErrorPriorLaunchFailure"; case hipErrorInvalidDeviceFunction: return "hipErrorInvalidDeviceFunction"; case hipErrorNoDevice: return "hipErrorNoDevice"; case hipErrorInvalidDevice: return "hipErrorInvalidDevice"; case hipErrorInvalidPitchValue: return "hipErrorInvalidPitchValue"; case hipErrorInvalidImage: return "hipErrorInvalidImage"; case hipErrorInvalidContext: return "hipErrorInvalidContext"; case hipErrorContextAlreadyCurrent: return "hipErrorContextAlreadyCurrent"; case hipErrorMapFailed: return "hipErrorMapFailed"; case hipErrorUnmapFailed: return "hipErrorUnmapFailed"; case hipErrorArrayIsMapped: return "hipErrorArrayIsMapped"; case hipErrorAlreadyMapped: return "hipErrorAlreadyMapped"; case hipErrorNoBinaryForGpu: return "hipErrorNoBinaryForGpu"; case hipErrorAlreadyAcquired: return "hipErrorAlreadyAcquired"; case hipErrorNotMapped: return "hipErrorNotMapped"; case hipErrorNotMappedAsArray: return "hipErrorNotMappedAsArray"; case hipErrorNotMappedAsPointer: return "hipErrorNotMappedAsPointer"; case hipErrorECCNotCorrectable: return "hipErrorECCNotCorrectable"; case hipErrorUnsupportedLimit: return "hipErrorUnsupportedLimit"; case hipErrorContextAlreadyInUse: return "hipErrorContextAlreadyInUse"; case hipErrorPeerAccessUnsupported: return "hipErrorPeerAccessUnsupported"; case hipErrorInvalidKernelFile: return "hipErrorInvalidKernelFile"; case hipErrorInvalidGraphicsContext: return "hipErrorInvalidGraphicsContext"; case hipErrorInvalidSource: return "hipErrorInvalidSource"; case hipErrorFileNotFound: return "hipErrorFileNotFound"; case hipErrorSharedObjectSymbolNotFound: return "hipErrorSharedObjectSymbolNotFound"; case hipErrorSharedObjectInitFailed: return "hipErrorSharedObjectInitFailed"; case hipErrorOperatingSystem: return "hipErrorOperatingSystem"; case hipErrorInvalidHandle: return "hipErrorInvalidHandle"; case hipErrorIllegalState: return "hipErrorIllegalState"; case hipErrorNotFound: return "hipErrorNotFound"; case hipErrorNotReady: return "hipErrorNotReady"; case hipErrorIllegalAddress: return "hipErrorIllegalAddress"; case hipErrorLaunchOutOfResources: return 
"hipErrorLaunchOutOfResources"; case hipErrorLaunchTimeOut: return "hipErrorLaunchTimeOut"; case hipErrorPeerAccessAlreadyEnabled: return "hipErrorPeerAccessAlreadyEnabled"; case hipErrorPeerAccessNotEnabled: return "hipErrorPeerAccessNotEnabled"; case hipErrorSetOnActiveProcess: return "hipErrorSetOnActiveProcess"; case hipErrorContextIsDestroyed: return "hipErrorContextIsDestroyed"; case hipErrorAssert: return "hipErrorAssert"; case hipErrorHostMemoryAlreadyRegistered: return "hipErrorHostMemoryAlreadyRegistered"; case hipErrorHostMemoryNotRegistered: return "hipErrorHostMemoryNotRegistered"; case hipErrorLaunchFailure: return "hipErrorLaunchFailure"; case hipErrorNotSupported: return "hipErrorNotSupported"; case hipErrorUnknown: return "hipErrorUnknown"; case hipErrorRuntimeMemory: return "hipErrorRuntimeMemory"; case hipErrorRuntimeOther: return "hipErrorRuntimeOther"; case hipErrorCooperativeLaunchTooLarge: return "hipErrorCooperativeLaunchTooLarge"; case hipErrorStreamCaptureUnsupported: return "hipErrorStreamCaptureUnsupported"; case hipErrorStreamCaptureInvalidated: return "hipErrorStreamCaptureInvalidated"; case hipErrorStreamCaptureMerge: return "hipErrorStreamCaptureMerge"; case hipErrorStreamCaptureUnmatched: return "hipErrorStreamCaptureUnmatched"; case hipErrorStreamCaptureUnjoined: return "hipErrorStreamCaptureUnjoined"; case hipErrorStreamCaptureIsolation: return "hipErrorStreamCaptureIsolation"; case hipErrorStreamCaptureImplicit: return "hipErrorStreamCaptureImplicit"; case hipErrorCapturedEvent: return "hipErrorCapturedEvent"; case hipErrorStreamCaptureWrongThread: return "hipErrorStreamCaptureWrongThread"; case hipErrorGraphExecUpdateFailure: return "hipErrorGraphExecUpdateFailure"; case hipErrorTbd: return "hipErrorTbd"; default: return "hipErrorUnknown"; }; } const char *ihipGetErrorString(hipError_t hip_error) { switch(hip_error) { case hipSuccess: return "no error"; case hipErrorInvalidValue: return "invalid argument"; case hipErrorOutOfMemory: return "out of memory"; case hipErrorNotInitialized: return "initialization error"; case hipErrorDeinitialized: return "driver shutting down"; case hipErrorProfilerDisabled: return "profiler disabled while using external profiling tool"; case hipErrorProfilerNotInitialized: return "profiler is not initialized"; case hipErrorProfilerAlreadyStarted: return "profiler already started"; case hipErrorProfilerAlreadyStopped: return "profiler already stopped"; case hipErrorInvalidConfiguration: return "invalid configuration argument"; case hipErrorInvalidPitchValue: return "invalid pitch argument"; case hipErrorInvalidSymbol: return "invalid device symbol"; case hipErrorInvalidDevicePointer: return "invalid device pointer"; case hipErrorInvalidMemcpyDirection: return "invalid copy direction for memcpy"; case hipErrorInsufficientDriver: return "driver version is insufficient for runtime version"; case hipErrorMissingConfiguration: return "__global__ function call is not configured"; case hipErrorPriorLaunchFailure: return "unspecified launch failure in prior launch"; case hipErrorInvalidDeviceFunction: return "invalid device function"; case hipErrorNoDevice: return "no ROCm-capable device is detected"; case hipErrorInvalidDevice: return "invalid device ordinal"; case hipErrorInvalidImage: return "device kernel image is invalid"; case hipErrorInvalidContext: return "invalid device context"; case hipErrorContextAlreadyCurrent: return "context is already current context"; case hipErrorMapFailed: return "mapping of buffer object failed"; 
case hipErrorUnmapFailed: return "unmapping of buffer object failed"; case hipErrorArrayIsMapped: return "array is mapped"; case hipErrorAlreadyMapped: return "resource already mapped"; case hipErrorNoBinaryForGpu: return "no kernel image is available for execution on the device"; case hipErrorAlreadyAcquired: return "resource already acquired"; case hipErrorNotMapped: return "resource not mapped"; case hipErrorNotMappedAsArray: return "resource not mapped as array"; case hipErrorNotMappedAsPointer: return "resource not mapped as pointer"; case hipErrorECCNotCorrectable: return "uncorrectable ECC error encountered"; case hipErrorUnsupportedLimit: return "limit is not supported on this architecture"; case hipErrorContextAlreadyInUse: return "exclusive-thread device already in use by a different thread"; case hipErrorPeerAccessUnsupported: return "peer access is not supported between these two devices"; case hipErrorInvalidKernelFile: return "invalid kernel file"; case hipErrorInvalidGraphicsContext: return "invalid OpenGL or DirectX context"; case hipErrorInvalidSource: return "device kernel image is invalid"; case hipErrorFileNotFound: return "file not found"; case hipErrorSharedObjectSymbolNotFound: return "shared object symbol not found"; case hipErrorSharedObjectInitFailed: return "shared object initialization failed"; case hipErrorOperatingSystem: return "OS call failed or operation not supported on this OS"; case hipErrorInvalidHandle: return "invalid resource handle"; case hipErrorIllegalState: return "the operation cannot be performed in the present state"; case hipErrorNotFound: return "named symbol not found"; case hipErrorNotReady: return "device not ready"; case hipErrorIllegalAddress: return "an illegal memory access was encountered"; case hipErrorLaunchOutOfResources: return "too many resources requested for launch"; case hipErrorLaunchTimeOut: return "the launch timed out and was terminated"; case hipErrorPeerAccessAlreadyEnabled: return "peer access is already enabled"; case hipErrorPeerAccessNotEnabled: return "peer access has not been enabled"; case hipErrorSetOnActiveProcess: return "cannot set while device is active in this process"; case hipErrorContextIsDestroyed: return "context is destroyed"; case hipErrorAssert: return "device-side assert triggered"; case hipErrorHostMemoryAlreadyRegistered: return "part or all of the requested memory range is already mapped"; case hipErrorHostMemoryNotRegistered: return "pointer does not correspond to a registered memory region"; case hipErrorLaunchFailure: return "unspecified launch failure"; case hipErrorCooperativeLaunchTooLarge: return "too many blocks in cooperative launch"; case hipErrorNotSupported: return "operation not supported"; case hipErrorStreamCaptureUnsupported: return "operation not permitted when stream is capturing"; case hipErrorStreamCaptureInvalidated: return "operation failed due to a previous error during capture"; case hipErrorStreamCaptureMerge: return "operation would result in a merge of separate capture sequences"; case hipErrorStreamCaptureUnmatched: return "capture was not ended in the same stream as it began"; case hipErrorStreamCaptureUnjoined: return "capturing stream has unjoined work"; case hipErrorStreamCaptureIsolation: return "dependency created on uncaptured work in another stream"; case hipErrorStreamCaptureImplicit: return "operation would make the legacy stream depend on a capturing blocking stream"; case hipErrorCapturedEvent: return "operation not permitted on an event last recorded in a 
capturing stream"; case hipErrorStreamCaptureWrongThread: return "attempt to terminate a thread-local capture sequence from another thread"; case hipErrorGraphExecUpdateFailure: return "the graph update was not performed because it included changes which violated constraints specific to instantiated graph update"; case hipErrorRuntimeMemory: return "runtime memory call returned error"; case hipErrorRuntimeOther: return "runtime call other than memory returned error"; case hipErrorUnknown: default: return "unknown error"; } } const char* hipGetErrorName(hipError_t hip_error) { return ihipGetErrorName(hip_error); } const char *hipGetErrorString(hipError_t hip_error) { return ihipGetErrorString(hip_error); } hipError_t hipDrvGetErrorName(hipError_t hip_error, const char** errStr) { if (errStr == nullptr) { return hipErrorInvalidValue; } *errStr = ihipGetErrorName(hip_error); if (hip_error == hipErrorUnknown || strcmp( *errStr, "hipErrorUnknown") != 0) { return hipSuccess; } else { return hipErrorInvalidValue; } } hipError_t hipDrvGetErrorString(hipError_t hip_error, const char** errStr) { if (errStr == nullptr) { return hipErrorInvalidValue; } *errStr = ihipGetErrorString(hip_error); if (hip_error == hipErrorUnknown || strcmp( *errStr, "unknown error") != 0) { return hipSuccess; } else { return hipErrorInvalidValue; } } clr-rocm-5.7.1/hipamd/src/hip_event.cpp000066400000000000000000000343661450307266000177770ustar00rootroot00000000000000/* Copyright (c) 2015 - 2022 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include "hip_event.hpp" #if !defined(_MSC_VER) #include #endif namespace hip { static amd::Monitor eventSetLock{"Guards global event set"}; static std::unordered_set eventSet; bool Event::ready(eventType type) { if (event_->status() != CL_COMPLETE) { event_->notifyCmdQueue(); } // Check HW status of the ROCcrl event. Note: not all ROCclr modes support HW status bool ready = CheckHwEvent(type); if (!ready) { ready = (event_->status() == CL_COMPLETE); } return ready; } bool EventDD::ready(eventType type) { // Check HW status of the ROCcrl event. Note: not all ROCclr modes support HW status bool ready = CheckHwEvent(type); // FIXME: Remove status check entirely if (!ready) { ready = (event_->status() == CL_COMPLETE); } return ready; } hipError_t Event::query() { amd::ScopedLock lock(lock_); // If event is not recorded, event_ is null, hence return hipSuccess if (event_ == nullptr) { return hipSuccess; } return ready(Query) ? 
hipSuccess : hipErrorNotReady; } hipError_t Event::synchronize() { amd::ScopedLock lock(lock_); // If event is not recorded, event_ is null, hence return hipSuccess if (event_ == nullptr) { return hipSuccess; } // Check HW status of the ROCcrl event. Note: not all ROCclr modes support HW status static constexpr bool kWaitCompletion = true; if (!g_devices[deviceId()]->devices()[0]->IsHwEventReady(*event_, kWaitCompletion)) { if (event_->HwEvent() != nullptr) { amd::Command* command = nullptr; hipError_t status = recordCommand(command, event_->command().queue(), flags); command->enqueue(); g_devices[deviceId()]->devices()[0]->IsHwEventReady(command->event(), kWaitCompletion); command->release(); } else { event_->awaitCompletion(); } } return hipSuccess; } bool Event::awaitEventCompletion() { return event_->awaitCompletion(); } bool EventDD::awaitEventCompletion() { return g_devices[deviceId()]->devices()[0]->IsHwEventReady(*event_, true); } hipError_t Event::elapsedTime(Event& eStop, float& ms) { amd::ScopedLock startLock(lock_); if (this == &eStop) { ms = 0.f; if (event_ == nullptr) { return hipErrorInvalidHandle; } if (flags & hipEventDisableTiming) { return hipErrorInvalidHandle; } if (!ready(ElapsedTime)) { return hipErrorNotReady; } return hipSuccess; } amd::ScopedLock stopLock(eStop.lock()); if (event_ == nullptr || eStop.event() == nullptr) { return hipErrorInvalidHandle; } if ((flags | eStop.flags) & hipEventDisableTiming) { return hipErrorInvalidHandle; } if (!ready(ElapsedTime) || !eStop.ready(ElapsedTime)) { return hipErrorNotReady; } if (event_ == eStop.event_) { // Events are the same, which indicates the stream is empty and likely // eventRecord is called on another stream. For such cases insert and measure a // marker. amd::Command* command = new amd::Marker(*event_->command().queue(), kMarkerDisableFlush); command->enqueue(); command->awaitCompletion(); ms = static_cast(static_cast(command->event().profilingInfo().end_) - time(false)) / 1000000.f; command->release(); } else { // Note: with direct dispatch eStop.ready() relies on HW event, but CPU status can be delayed. 
// Hence for now make sure CPU status is updated by calling awaitCompletion(); awaitEventCompletion(); eStop.awaitEventCompletion(); if (unrecorded_ && eStop.isUnRecorded()) { // Neither event has been recorded; only the start and end of the stop event are needed ms = static_cast<float>(eStop.time(false) - eStop.time(true)) / 1000000.f; } else { ms = static_cast<float>(eStop.time(false) - time(false)) / 1000000.f; } } return hipSuccess; } int64_t Event::time(bool getStartTs) const { assert(event_ != nullptr); if (getStartTs) { return static_cast<int64_t>(event_->profilingInfo().start_); } else { return static_cast<int64_t>(event_->profilingInfo().end_); } } int64_t EventDD::time(bool getStartTs) const { uint64_t start = 0, end = 0; assert(event_ != nullptr); g_devices[deviceId()]->devices()[0]->getHwEventTime(*event_, &start, &end); // FIXME: This is only needed if the command had to wait for CL_COMPLETE status if (start == 0 || end == 0) { return Event::time(getStartTs); } if (getStartTs) { return static_cast<int64_t>(start); } else { return static_cast<int64_t>(end); } } hipError_t Event::streamWaitCommand(amd::Command*& command, hip::Stream* stream) { amd::Command::EventWaitList eventWaitList; if (event_ != nullptr) { eventWaitList.push_back(event_); } command = new amd::Marker(*stream, kMarkerDisableFlush, eventWaitList); if (command == NULL) { return hipErrorOutOfMemory; } return hipSuccess; } hipError_t Event::enqueueStreamWaitCommand(hipStream_t stream, amd::Command* command) { command->enqueue(); return hipSuccess; } hipError_t Event::streamWait(hipStream_t stream, uint flags) { hip::Stream* hip_stream = hip::getStream(stream); // Access to the event_ object must be lock protected amd::ScopedLock lock(lock_); if ((event_ == nullptr) || (event_->command().queue() == hip_stream) || ready(StreamWait)) { return hipSuccess; } if (!event_->notifyCmdQueue()) { return hipErrorLaunchOutOfResources; } amd::Command* command; hipError_t status = streamWaitCommand(command, hip_stream); if (status != hipSuccess) { return status; } status = enqueueStreamWaitCommand(stream, command); if (status != hipSuccess) { return status; } command->release(); return hipSuccess; } hipError_t Event::recordCommand(amd::Command*& command, amd::HostQueue* stream, uint32_t ext_flags ) { if (command == nullptr) { int32_t releaseFlags = ((ext_flags == 0) ? flags : ext_flags) & (hipEventReleaseToDevice | hipEventReleaseToSystem | hipEventDisableSystemFence); if (releaseFlags & hipEventDisableSystemFence) { releaseFlags = amd::Device::kCacheStateIgnore; } else { releaseFlags = amd::Device::kCacheStateInvalid; } // Always submit an EventMarker. command = new hip::EventMarker(*stream, !kMarkerDisableFlush, true, releaseFlags); } return hipSuccess; } hipError_t Event::enqueueRecordCommand(hipStream_t stream, amd::Command* command, bool record) { command->enqueue(); if (event_ == &command->event()) return hipSuccess; if (event_ != nullptr) { event_->release(); } event_ = &command->event(); unrecorded_ = !record; return hipSuccess; } hipError_t Event::addMarker(hipStream_t stream, amd::Command* command, bool record) { hip::Stream* hip_stream = hip::getStream(stream); // Keep the lock always at the beginning of this to avoid a race.
SWDEV-277847 amd::ScopedLock lock(lock_); hipError_t status = recordCommand(command, hip_stream); if (status != hipSuccess) { return hipSuccess; } status = enqueueRecordCommand(stream, command, record); return status; } // ================================================================================================ bool isValid(hipEvent_t event) { // NULL event is always valid if (event == nullptr) { return true; } amd::ScopedLock lock(eventSetLock); if (eventSet.find(event) == eventSet.end()) { return false; } return true; } } // namespace hip // ================================================================================================ hipError_t ihipEventCreateWithFlags(hipEvent_t* event, unsigned flags) { unsigned supportedFlags = hipEventDefault | hipEventBlockingSync | hipEventDisableTiming | hipEventReleaseToDevice | hipEventReleaseToSystem | hipEventInterprocess | hipEventDisableSystemFence; const unsigned releaseFlags = (hipEventReleaseToDevice | hipEventReleaseToSystem | hipEventDisableSystemFence); // can't set any unsupported flags. // can set only one of the release flags. // if hipEventInterprocess flag is set, then hipEventDisableTiming flag also must be set const bool illegalFlags = (flags & ~supportedFlags) || ([](unsigned int num){ unsigned int bitcount; for (bitcount = 0; num; bitcount++) { num &= num - 1; } return bitcount; } (flags & releaseFlags) > 1) || ((flags & hipEventInterprocess) && !(flags & hipEventDisableTiming)); if (!illegalFlags) { hip::Event* e = nullptr; if (flags & hipEventInterprocess) { e = new hip::IPCEvent(); } else { if (AMD_DIRECT_DISPATCH) { e = new hip::EventDD(flags); } else { e = new hip::Event(flags); } } if (e == nullptr) { return hipErrorOutOfMemory; } *event = reinterpret_cast(e); amd::ScopedLock lock(hip::eventSetLock); hip::eventSet.insert(*event); } else { return hipErrorInvalidValue; } return hipSuccess; } hipError_t hipEventCreateWithFlags(hipEvent_t* event, unsigned flags) { HIP_INIT_API(hipEventCreateWithFlags, event, flags); if (event == nullptr) { return hipErrorInvalidValue; } HIP_RETURN(ihipEventCreateWithFlags(event, flags), *event); } hipError_t hipEventCreate(hipEvent_t* event) { HIP_INIT_API(hipEventCreate, event); if (event == nullptr) { return hipErrorInvalidValue; } HIP_RETURN(ihipEventCreateWithFlags(event, 0), *event); } hipError_t hipEventDestroy(hipEvent_t event) { HIP_INIT_API(hipEventDestroy, event); if (event == nullptr) { HIP_RETURN(hipErrorInvalidHandle); } amd::ScopedLock lock(hip::eventSetLock); if (hip::eventSet.erase(event) == 0 ) { return hipErrorContextIsDestroyed; } hip::Event* e = reinterpret_cast(event); // There is a possibility that stream destroy be called first hipStream_t s = e->GetCaptureStream(); if (hip::isValid(s)) { if (e->GetCaptureStream() != nullptr) { reinterpret_cast(e->GetCaptureStream())->EraseCaptureEvent(event); } } delete e; HIP_RETURN(hipSuccess); } hipError_t hipEventElapsedTime(float* ms, hipEvent_t start, hipEvent_t stop) { HIP_INIT_API(hipEventElapsedTime, ms, start, stop); if (ms == nullptr) { HIP_RETURN(hipErrorInvalidValue); } if (start == nullptr || stop == nullptr) { HIP_RETURN(hipErrorInvalidHandle); } hip::Event* eStart = reinterpret_cast(start); hip::Event* eStop = reinterpret_cast(stop); if (eStart->deviceId() != eStop->deviceId()) { HIP_RETURN(hipErrorInvalidHandle); } HIP_RETURN(eStart->elapsedTime(*eStop, *ms), "Elapsed Time = ", *ms); } hipError_t hipEventRecord_common(hipEvent_t event, hipStream_t stream) { ClPrint(amd::LOG_INFO, amd::LOG_API, "[hipGraph] 
current capture node EventRecord on stream : %p, Event %p", stream, event); hipError_t status = hipSuccess; if (event == nullptr) { return hipErrorInvalidHandle; } getStreamPerThread(stream); if (!hip::isValid(stream)) { return hipErrorContextIsDestroyed; } hip::Event* e = reinterpret_cast(event); hip::Stream* s = reinterpret_cast(stream); hip::Stream* hip_stream = hip::getStream(stream); e->SetCaptureStream(stream); if ((s != nullptr) && (s->GetCaptureStatus() == hipStreamCaptureStatusActive)) { s->SetCaptureEvent(event); std::vector lastCapturedNodes = s->GetLastCapturedNodes(); if (!lastCapturedNodes.empty()) { e->SetNodesPrevToRecorded(lastCapturedNodes); } } else { if (g_devices[e->deviceId()]->devices()[0] != &hip_stream->device()) { return hipErrorInvalidHandle; } status = e->addMarker(stream, nullptr, true); } return status; } hipError_t hipEventRecord(hipEvent_t event, hipStream_t stream) { HIP_INIT_API(hipEventRecord, event, stream); HIP_RETURN(hipEventRecord_common(event, stream)); } hipError_t hipEventRecord_spt(hipEvent_t event, hipStream_t stream) { HIP_INIT_API(hipEventRecord, event, stream); PER_THREAD_DEFAULT_STREAM(stream); HIP_RETURN(hipEventRecord_common(event, stream)); } hipError_t hipEventSynchronize(hipEvent_t event) { HIP_INIT_API(hipEventSynchronize, event); if (event == nullptr) { HIP_RETURN(hipErrorInvalidHandle); } hip::Event* e = reinterpret_cast(event); hip::Stream* s = reinterpret_cast(e->GetCaptureStream()); if ((s != nullptr) && (s->GetCaptureStatus() == hipStreamCaptureStatusActive)) { if (s->IsEventCaptured(event) == false) { return HIP_RETURN(hipErrorStreamCaptureUnsupported); } } if (hip::Stream::StreamCaptureOngoing(e->GetCaptureStream()) == true) { HIP_RETURN(hipErrorStreamCaptureUnsupported); } HIP_RETURN(e->synchronize()); } hipError_t ihipEventQuery(hipEvent_t event) { if (event == nullptr) { return hipErrorInvalidHandle; } hip::Event* e = reinterpret_cast(event); return e->query(); } hipError_t hipEventQuery(hipEvent_t event) { HIP_INIT_API(hipEventQuery, event); HIP_RETURN(ihipEventQuery(event)); } clr-rocm-5.7.1/hipamd/src/hip_event.hpp000066400000000000000000000176251450307266000200030ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef HIP_EVENT_H #define HIP_EVENT_H #include "hip_internal.hpp" #include "thread/monitor.hpp" // Internal structure for stream callback handler class StreamCallback { protected: void* userData_; public: StreamCallback(void* userData) : userData_(userData) {} virtual void CL_CALLBACK callback() = 0; }; class StreamAddCallback : public StreamCallback { hipStreamCallback_t callBack_; hipStream_t stream_; public: StreamAddCallback(hipStream_t stream, hipStreamCallback_t callback, void* userData) : StreamCallback(userData) { stream_ = stream; callBack_ = callback; } void CL_CALLBACK callback() { hipError_t status = hipSuccess; callBack_(stream_, status, userData_); } }; class LaunchHostFuncCallback : public StreamCallback { hipHostFn_t callBack_; public: LaunchHostFuncCallback(hipHostFn_t callback, void* userData) : StreamCallback(userData) { callBack_ = callback; } void CL_CALLBACK callback() { callBack_(userData_); } }; void CL_CALLBACK ihipStreamCallback(cl_event event, cl_int command_exec_status, void* user_data); namespace hip { #define IPC_SIGNALS_PER_EVENT 32 typedef struct ihipIpcEventShmem_s { std::atomic owners; std::atomic owners_device_id; std::atomic owners_process_id; std::atomic read_index; std::atomic write_index; uint32_t signal[IPC_SIGNALS_PER_EVENT]; } ihipIpcEventShmem_t; class EventMarker : public amd::Marker { public: EventMarker(amd::HostQueue& stream, bool disableFlush, bool markerTs = false, int32_t scope = amd::Device::kCacheStateInvalid) : amd::Marker(stream, disableFlush) { profilingInfo_.enabled_ = true; profilingInfo_.callback_ = nullptr; profilingInfo_.marker_ts_ = markerTs; profilingInfo_.clear(); setEventScope(scope); } }; enum eventType { Query, StreamWait, ElapsedTime }; class Event { /// capture stream where event is recorded hipStream_t captureStream_ = nullptr; /// Previous captured nodes before event record std::vector nodesPrevToRecorded_; protected: bool CheckHwEvent(eventType type) { bool ready; if (type == Query) { ready = g_devices[deviceId()]->devices()[0]->IsHwEventReadyForcedWait(*event_); } else { ready = g_devices[deviceId()]->devices()[0]->IsHwEventReady(*event_); } return ready; } public: Event(unsigned int flags) : flags(flags), lock_("hipEvent_t", true), event_(nullptr), unrecorded_(false), stream_(nullptr) { // No need to init event_ here as addMarker does that device_id_ = hip::getCurrentDevice()->deviceId(); // Created in current device ctx } virtual ~Event() { if (event_ != nullptr) { event_->release(); } } unsigned int flags; virtual hipError_t query(); virtual hipError_t synchronize(); hipError_t elapsedTime(Event& eStop, float& ms); virtual hipError_t streamWaitCommand(amd::Command*& command, hip::Stream* stream); virtual hipError_t enqueueStreamWaitCommand(hipStream_t stream, amd::Command* command); virtual hipError_t streamWait(hipStream_t stream, uint flags); virtual hipError_t recordCommand(amd::Command*& command, amd::HostQueue* stream, uint32_t flags = 0); virtual hipError_t enqueueRecordCommand(hipStream_t stream, amd::Command* command, bool record); hipError_t addMarker(hipStream_t stream, amd::Command* command, bool record); void BindCommand(amd::Command& command, bool record) { amd::ScopedLock lock(lock_); if (event_ != nullptr) { event_->release(); } event_ = &command.event(); unrecorded_ = !record; command.retain(); } bool isUnRecorded() const { return unrecorded_; } amd::Monitor& lock() { return lock_; } const int deviceId() const { return device_id_; } void setDeviceId(int id) { device_id_ = id; } amd::Event* 
event() { return event_; } /// Get capture stream where event is recorded hipStream_t GetCaptureStream() const { return captureStream_; } /// Set capture stream where event is recorded void SetCaptureStream(hipStream_t stream) { captureStream_ = stream; } /// Returns previous captured nodes before event record std::vector GetNodesPrevToRecorded() const { return nodesPrevToRecorded_; } /// Set last captured graph node before event record void SetNodesPrevToRecorded(std::vector& graphNode) { nodesPrevToRecorded_ = graphNode; } virtual hipError_t GetHandle(ihipIpcEventHandle_t* handle) { return hipErrorInvalidConfiguration; } virtual hipError_t OpenHandle(ihipIpcEventHandle_t* handle) { return hipErrorInvalidConfiguration; } virtual bool awaitEventCompletion(); virtual bool ready(eventType type); virtual int64_t time(bool getStartTs) const; protected: amd::Monitor lock_; hip::Stream* stream_; amd::Event* event_; int device_id_; //! Flag to indicate hipEventRecord has not been called. This is needed for //! hip*ModuleLaunchKernel API which takes start and stop events so no //! hipEventRecord is called. Cleanup needed once those APIs are deprecated. bool unrecorded_; }; class EventDD : public Event { public: EventDD(unsigned int flags) : Event(flags) {} virtual ~EventDD() {} virtual bool awaitEventCompletion(); virtual bool ready(eventType type); virtual int64_t time(bool getStartTs) const; }; class IPCEvent : public Event { // IPC Events struct ihipIpcEvent_t { std::string ipc_name_; int ipc_fd_; ihipIpcEventShmem_t* ipc_shmem_; ihipIpcEvent_t() : ipc_name_("dummy"), ipc_fd_(0), ipc_shmem_(nullptr) {} void setipcname(const char* name) { ipc_name_ = std::string(name); } }; ihipIpcEvent_t ipc_evt_; public: ~IPCEvent() { if (ipc_evt_.ipc_shmem_) { int owners = --ipc_evt_.ipc_shmem_->owners; // Make sure event is synchronized hipError_t status = synchronize(); status = ihipHostUnregister(&ipc_evt_.ipc_shmem_->signal); if (!amd::Os::MemoryUnmapFile(ipc_evt_.ipc_shmem_, sizeof(hip::ihipIpcEventShmem_t))) { // print hipErrorInvalidHandle; } } } IPCEvent() : Event(hipEventInterprocess) {} bool createIpcEventShmemIfNeeded(); hipError_t GetHandle(ihipIpcEventHandle_t* handle); hipError_t OpenHandle(ihipIpcEventHandle_t* handle); hipError_t synchronize(); hipError_t query(); hipError_t streamWaitCommand(amd::Command*& command, hip::Stream* stream); hipError_t enqueueStreamWaitCommand(hipStream_t stream, amd::Command* command); hipError_t streamWait(hipStream_t stream, uint flags); hipError_t recordCommand(amd::Command*& command, amd::HostQueue* queue, uint32_t flags = 0); hipError_t enqueueRecordCommand(hipStream_t stream, amd::Command* command, bool record); }; }; // namespace hip struct CallbackData { int previous_read_index; hip::ihipIpcEventShmem_t* shmem; }; #endif // HIP_EVEMT_H clr-rocm-5.7.1/hipamd/src/hip_event_ipc.cpp000066400000000000000000000206731450307266000206260ustar00rootroot00000000000000/* Copyright (c) 2015 - 2022 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include "hip_event.hpp" #if !defined(_MSC_VER) #include #else #include #endif // ================================================================================================ hipError_t ihipEventCreateWithFlags(hipEvent_t* event, unsigned flags); namespace hip { bool IPCEvent::createIpcEventShmemIfNeeded() { if (ipc_evt_.ipc_shmem_) { // ipc_shmem_ already created, no need to create it again return true; } char name_template[] = "/tmp/eventXXXXXX"; #if !defined(_MSC_VER) int temp_fd = mkstemp(name_template); #else _mktemp_s(name_template, sizeof(name_template)); #endif ipc_evt_.ipc_name_ = name_template; ipc_evt_.ipc_name_.replace(0, 5, "/hip_"); if (!amd::Os::MemoryMapFileTruncated( ipc_evt_.ipc_name_.c_str(), const_cast(reinterpret_cast(&(ipc_evt_.ipc_shmem_))), sizeof(hip::ihipIpcEventShmem_t))) { return false; } #if !defined(_MSC_VER) close(temp_fd); #endif ipc_evt_.ipc_shmem_->owners = 1; ipc_evt_.ipc_shmem_->read_index = -1; ipc_evt_.ipc_shmem_->write_index = 0; for (uint32_t sig_idx = 0; sig_idx < IPC_SIGNALS_PER_EVENT; ++sig_idx) { ipc_evt_.ipc_shmem_->signal[sig_idx] = 0; } // device sets 0 to this ptr when the ipc event is completed hipError_t status = ihipHostRegister(&ipc_evt_.ipc_shmem_->signal, sizeof(uint32_t) * IPC_SIGNALS_PER_EVENT, 0); if (status != hipSuccess) { return false; } return true; } hipError_t IPCEvent::query() { if (ipc_evt_.ipc_shmem_) { int prev_read_idx = ipc_evt_.ipc_shmem_->read_index; int offset = (prev_read_idx % IPC_SIGNALS_PER_EVENT); if (ipc_evt_.ipc_shmem_->read_index < prev_read_idx + IPC_SIGNALS_PER_EVENT && ipc_evt_.ipc_shmem_->signal[offset] != 0) { return hipErrorNotReady; } } return hipSuccess; } hipError_t IPCEvent::synchronize() { if (ipc_evt_.ipc_shmem_) { int prev_read_idx = ipc_evt_.ipc_shmem_->read_index; if (prev_read_idx >= 0) { int offset = (prev_read_idx % IPC_SIGNALS_PER_EVENT); while ((ipc_evt_.ipc_shmem_->read_index < prev_read_idx + IPC_SIGNALS_PER_EVENT) && (ipc_evt_.ipc_shmem_->signal[offset] != 0)) { amd::Os::sleep(1); } } } return hipSuccess; } hipError_t IPCEvent::streamWaitCommand(amd::Command*& command, hip::Stream* stream) { command = new amd::Marker(*stream, false); if (command == NULL) { return hipErrorOutOfMemory; } return hipSuccess; } hipError_t IPCEvent::enqueueStreamWaitCommand(hipStream_t stream, amd::Command* command) { auto t{new CallbackData{ipc_evt_.ipc_shmem_->read_index, ipc_evt_.ipc_shmem_}}; StreamCallback* cbo = new StreamAddCallback( stream, reinterpret_cast(WaitThenDecrementSignal), t); 
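// The marker's CL_COMPLETE transition fires ihipStreamCallback, which runs WaitThenDecrementSignal with the CallbackData built above; judging from the record path below, that callback waits on the shared-memory signal slot that the producer process clears once the recorded event completes.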
if (!command->setCallback(CL_COMPLETE, ihipStreamCallback, cbo)) { command->release(); return hipErrorInvalidHandle; } command->enqueue(); command->release(); command->awaitCompletion(); return hipSuccess; } hipError_t IPCEvent::streamWait(hipStream_t stream, uint flags) { hip::Stream* hip_stream = hip::getStream(stream); amd::ScopedLock lock(lock_); if(query() != hipSuccess) { amd::Command* command; hipError_t status = streamWaitCommand(command, hip_stream); if (status != hipSuccess) { return status; } status = enqueueStreamWaitCommand(stream, command); return status; } return hipSuccess; } hipError_t IPCEvent::recordCommand(amd::Command*& command, amd::HostQueue* stream, uint32_t flags) { bool unrecorded = isUnRecorded(); if (unrecorded) { command = new amd::Marker(*stream, kMarkerDisableFlush); } else { return Event::recordCommand(command, stream); } return hipSuccess; } hipError_t IPCEvent::enqueueRecordCommand(hipStream_t stream, amd::Command* command, bool record) { bool unrecorded = isUnRecorded(); if (unrecorded) { amd::Event& tEvent = command->event(); createIpcEventShmemIfNeeded(); int write_index = ipc_evt_.ipc_shmem_->write_index++; int offset = write_index % IPC_SIGNALS_PER_EVENT; while (ipc_evt_.ipc_shmem_->signal[offset] != 0) { amd::Os::sleep(1); } // Lock signal. ipc_evt_.ipc_shmem_->signal[offset] = 1; ipc_evt_.ipc_shmem_->owners_device_id = deviceId(); command->enqueue(); // device writes 0 to signal after the hipEventRecord command is completed // the signal value is checked by WaitThenDecrementSignal cb hipError_t status = ihipStreamOperation(stream, ROCCLR_COMMAND_STREAM_WRITE_VALUE, &(ipc_evt_.ipc_shmem_->signal[offset]), 0, 0, 0, sizeof(uint32_t)); if (status != hipSuccess) { return status; } // Update read index to indicate new signal. int expected = write_index - 1; while (!ipc_evt_.ipc_shmem_->read_index.compare_exchange_weak(expected, write_index)) { amd::Os::sleep(1); } } else { return Event::enqueueRecordCommand(stream, command, record); } return hipSuccess; } hipError_t IPCEvent::GetHandle(ihipIpcEventHandle_t* handle) { if (!createIpcEventShmemIfNeeded()) { return hipErrorInvalidValue; } ipc_evt_.ipc_shmem_->owners_device_id = deviceId(); ipc_evt_.ipc_shmem_->owners_process_id = amd::Os::getProcessId(); memset(handle->shmem_name, 0, HIP_IPC_HANDLE_SIZE); ipc_evt_.ipc_name_.copy(handle->shmem_name, std::string::npos); return hipSuccess; } hipError_t IPCEvent::OpenHandle(ihipIpcEventHandle_t* handle) { ipc_evt_.ipc_name_ = handle->shmem_name; if (!amd::Os::MemoryMapFileTruncated(ipc_evt_.ipc_name_.c_str(), (const void**)&(ipc_evt_.ipc_shmem_), sizeof(ihipIpcEventShmem_t))) { return hipErrorInvalidValue; } if (amd::Os::getProcessId() == ipc_evt_.ipc_shmem_->owners_process_id.load()) { // If this is in the same process, return error. 
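// Opening a handle in the exporting process itself is rejected (mirroring CUDA's IPC event semantics) instead of mapping the shared-memory block a second time.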
return hipErrorInvalidContext; } ipc_evt_.ipc_shmem_->owners += 1; // device sets 0 to this ptr when the ipc event is completed hipError_t status = hipSuccess; status = ihipHostRegister(&ipc_evt_.ipc_shmem_->signal, sizeof(uint32_t) * IPC_SIGNALS_PER_EVENT, 0); return status; } } // namespace hip // ================================================================================================ hipError_t hipIpcGetEventHandle(hipIpcEventHandle_t* handle, hipEvent_t event) { HIP_INIT_API(hipIpcGetEventHandle, handle, event); if (handle == nullptr || event == nullptr) { HIP_RETURN(hipErrorInvalidValue); } hip::Event* e = reinterpret_cast(event); HIP_RETURN(e->GetHandle(reinterpret_cast(handle))); } hipError_t hipIpcOpenEventHandle(hipEvent_t* event, hipIpcEventHandle_t handle) { HIP_INIT_API(hipIpcOpenEventHandle, event, handle); hipError_t hip_err = hipSuccess; if (event == nullptr) { HIP_RETURN(hipErrorInvalidValue); } hip_err = ihipEventCreateWithFlags(event, hipEventDisableTiming | hipEventInterprocess); if (hip_err != hipSuccess) { HIP_RETURN(hip_err); } hip::Event* e = reinterpret_cast(*event); ihipIpcEventHandle_t* iHandle = reinterpret_cast(&handle); HIP_RETURN(e->OpenHandle(iHandle)); } clr-rocm-5.7.1/hipamd/src/hip_fatbin.cpp000066400000000000000000000314321450307266000201100ustar00rootroot00000000000000#include "hip_fatbin.hpp" #include #include "hip_code_object.hpp" namespace hip { FatBinaryDeviceInfo::~FatBinaryDeviceInfo() { if (program_ != nullptr) { program_->unload(); program_->release(); program_ = nullptr; } } FatBinaryInfo::FatBinaryInfo(const char* fname, const void* image) : fdesc_(amd::Os::FDescInit()), fsize_(0), foffset_(0), image_(image), image_mapped_(false), uri_(std::string()) { if (fname != nullptr) { fname_ = std::string(fname); } else { fname_ = std::string(); } fatbin_dev_info_.resize(g_devices.size(), nullptr); } FatBinaryInfo::~FatBinaryInfo() { for (auto* fbd: fatbin_dev_info_) { if (fbd != nullptr) { delete fbd; } } if (fdesc_ > 0) { if (fsize_ && (HIP_USE_RUNTIME_UNBUNDLER || image_mapped_) && !amd::Os::MemoryUnmapFile(image_, fsize_)) { guarantee(false, "Cannot unmap file for fdesc: %d fsize: %d \n", fdesc_, fsize_); } if (!amd::Os::CloseFileHandle(fdesc_)) { guarantee(false, "Cannot close file for fdesc: %d \n", fdesc_); } } fname_ = std::string(); fdesc_ = amd::Os::FDescInit(); fsize_ = 0; image_ = nullptr; uri_ = std::string(); } hipError_t FatBinaryInfo::ExtractFatBinaryUsingCOMGR(const std::vector& devices) { amd_comgr_data_t data_object; amd_comgr_status_t comgr_status = AMD_COMGR_STATUS_SUCCESS; hipError_t hip_status = hipSuccess; amd_comgr_code_object_info_t* query_list_array = nullptr; // If image was passed as a pointer to our hipMod* api, we can try to extract the file name // if it was mapped by the app. Otherwise use the COMGR data API. if (fname_.size() == 0) { if (image_ == nullptr) { LogError("Both Filename and image cannot be null"); return hipErrorInvalidValue; } if(!amd::Os::FindFileNameFromAddress(image_, &fname_, &foffset_)) { fname_ = std::string(""); foffset_ = 0; } } // If file name & path are available (or it is passed to you), then get the file desc to use // COMGR file slice APIs. if (fname_.size() > 0) { // Get File Handle & size of the file. 
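// Three entry paths converge here: a file name from hipModuleLoad, an image pointer resolved back to its backing file above (when the OS supports it), or an embedded fat binary; having a real file descriptor lets COMGR use its file-slice API instead of mapping the whole image.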
if (!amd::Os::GetFileHandle(fname_.c_str(), &fdesc_, &fsize_)) return hipErrorFileNotFound; // If the file name exists but the file size is 0, the something wrong with the file or its path if (fsize_ == 0) return hipErrorInvalidValue; // If image_ is nullptr, then file path is passed via hipMod* APIs, so map the file. if (image_ == nullptr) { if(!amd::Os::MemoryMapFileDesc(fdesc_, fsize_, foffset_, &image_)) { LogError("Cannot map the file descriptor"); amd::Os::CloseFileHandle(fdesc_); return hipErrorInvalidValue; } image_mapped_ = true; } } // At this line, image should be a valid ptr. guarantee(image_ != nullptr, "Image cannot be nullptr, file:%s did not map for some reason", fname_.c_str()); do { // If the image ptr is not clang offload bundle then just directly point the image. if (!CodeObject::IsClangOffloadMagicBundle(image_)) { for (size_t dev_idx=0; dev_idx < devices.size(); ++dev_idx) { fatbin_dev_info_[devices[dev_idx]->deviceId()] = new FatBinaryDeviceInfo(image_, CodeObject::ElfSize(image_), 0); fatbin_dev_info_[devices[dev_idx]->deviceId()]->program_ = new amd::Program(*devices[dev_idx]->asContext()); if (fatbin_dev_info_[devices[dev_idx]->deviceId()]->program_ == nullptr) { hip_status = hipErrorOutOfMemory; break; } } break; } // Create a data object, if it fails return error if ((comgr_status = amd_comgr_create_data(AMD_COMGR_DATA_KIND_FATBIN, &data_object)) != AMD_COMGR_STATUS_SUCCESS) { LogPrintfError("Creating data object failed with status %d ", comgr_status); hip_status = hipErrorInvalidValue; break; } #if !defined(_WIN32) // Using the file descriptor and file size, map the data object. if (fdesc_ > 0) { guarantee(fsize_ > 0, "Cannot have a file size of 0, fdesc: %d fname: %s \n", fdesc_, fname_.c_str()); if ((comgr_status = amd_comgr_set_data_from_file_slice(data_object, fdesc_, foffset_, fsize_)) != AMD_COMGR_STATUS_SUCCESS) { LogPrintfError("Setting data from file slice failed with status %d ", comgr_status); hip_status = hipErrorInvalidValue; break; } } else #endif if (image_ != nullptr) { // Using the image ptr, map the data object. if ((comgr_status = amd_comgr_set_data(data_object, 4096, reinterpret_cast(image_))) != AMD_COMGR_STATUS_SUCCESS) { LogPrintfError("Setting data from file slice failed with status %d ", comgr_status); hip_status = hipErrorInvalidValue; break; } } else { guarantee(false, "Cannot have both fname_ and image_ as nullptr"); } // Find the unique number of ISAs needed for this COMGR query. std::unordered_map> unique_isa_names; for (size_t dev_idx = 0; dev_idx < devices.size(); ++dev_idx) { std::string device_name = devices[dev_idx]->devices()[0]->isa().isaName(); if (unique_isa_names.cend() == unique_isa_names.find(device_name)) { unique_isa_names.insert({device_name, std::make_pair(0,0)}); } } // Create a query list using COMGR info for unique ISAs. query_list_array = new amd_comgr_code_object_info_t[unique_isa_names.size()]; auto isa_it = unique_isa_names.begin(); for (size_t isa_idx = 0; isa_idx < unique_isa_names.size(); ++isa_idx) { std::advance(isa_it, isa_idx); query_list_array[isa_idx].isa = isa_it->first.c_str(); query_list_array[isa_idx].size = 0; query_list_array[isa_idx].offset = 0; } // Look up the code object info passing the query list. 
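// Each query entry goes in carrying only an ISA name; on success COMGR fills in the size and offset of the bundle entry matching that ISA, which the loop below copies back into unique_isa_names.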
if ((comgr_status = amd_comgr_lookup_code_object(data_object, query_list_array, unique_isa_names.size())) != AMD_COMGR_STATUS_SUCCESS) { LogPrintfError("Setting data from file slice failed with status %d ", comgr_status); hip_status = hipErrorInvalidValue; break; } for (size_t isa_idx = 0; isa_idx < unique_isa_names.size(); ++isa_idx) { auto unique_it = unique_isa_names.find(query_list_array[isa_idx].isa); guarantee(unique_isa_names.cend() != unique_it, "Cannot find unique isa "); unique_it->second = std::pair (static_cast(query_list_array[isa_idx].size), static_cast(query_list_array[isa_idx].offset)); } for (size_t dev_idx = 0; dev_idx < devices.size(); ++dev_idx) { std::string device_name = devices[dev_idx]->devices()[0]->isa().isaName(); auto dev_it = unique_isa_names.find(device_name); guarantee(unique_isa_names.cend() != dev_it, "Cannot find the device name in the unique device name"); fatbin_dev_info_[devices[dev_idx]->deviceId()] = new FatBinaryDeviceInfo(reinterpret_cast
(const_cast(image_)) + dev_it->second.second, dev_it->second.first, dev_it->second.second); fatbin_dev_info_[devices[dev_idx]->deviceId()]->program_ = new amd::Program(*devices[dev_idx]->asContext()); } } while(0); if (query_list_array) { delete[] query_list_array; } // Clean up file and memory resouces if hip_status failed for some reason. if (hip_status != hipSuccess && hip_status != hipErrorInvalidKernelFile) { if (image_mapped_) { if (!amd::Os::MemoryUnmapFile(image_, fsize_)) guarantee(false, "Cannot unmap the file"); image_ = nullptr; image_mapped_ = false; } if (fdesc_ > 0) { guarantee(fsize_ > 0, "Size has to greater than 0 too"); if (!amd::Os::CloseFileHandle(fdesc_)) guarantee(false, "Cannot close the file handle"); fdesc_ = 0; fsize_ = 0; } if ((comgr_status = amd_comgr_release_data(data_object)) != AMD_COMGR_STATUS_SUCCESS) { LogPrintfError("Releasing COMGR data failed with status %d ", comgr_status); return hipErrorInvalidValue; } } return hip_status; } hipError_t FatBinaryInfo::ExtractFatBinary(const std::vector& devices) { if (!HIP_USE_RUNTIME_UNBUNDLER) { return ExtractFatBinaryUsingCOMGR(devices); } hipError_t hip_error = hipSuccess; std::vector> code_objs; // Copy device names for Extract Code object File std::vector device_names; device_names.reserve(devices.size()); for (size_t dev_idx = 0; dev_idx < devices.size(); ++dev_idx) { device_names.push_back(devices[dev_idx]->devices()[0]->isa().isaName()); } // We are given file name, get the file desc and file size if (fname_.size() > 0) { // Get File Handle & size of the file. if (!amd::Os::GetFileHandle(fname_.c_str(), &fdesc_, &fsize_)) { return hipErrorFileNotFound; } if (fsize_ == 0) { return hipErrorInvalidImage; } // Extract the code object from file hip_error = CodeObject::ExtractCodeObjectFromFile(fdesc_, fsize_, &image_, device_names, code_objs); } else if (image_ != nullptr) { // We are directly given image pointer directly, try to extract file desc & file Size hip_error = CodeObject::ExtractCodeObjectFromMemory(image_, device_names, code_objs, uri_); } else { return hipErrorInvalidValue; } if (hip_error == hipErrorNoBinaryForGpu) { if (fname_.size() > 0) { LogPrintfError("hipErrorNoBinaryForGpu: Couldn't find binary for file: %s", fname_.c_str()); } else { LogPrintfError("hipErrorNoBinaryForGpu: Couldn't find binary for ptr: 0x%x", image_); } return hip_error; } if (hip_error == hipErrorInvalidKernelFile) { for (size_t dev_idx = 0; dev_idx < devices.size(); ++dev_idx) { // the image type is no CLANG_OFFLOAD_BUNDLER, image for current device directly passed fatbin_dev_info_[devices[dev_idx]->deviceId()] = new FatBinaryDeviceInfo(image_, CodeObject::ElfSize(image_), 0); } } else if(hip_error == hipSuccess) { for (size_t dev_idx = 0; dev_idx < devices.size(); ++dev_idx) { // Calculate the offset wrt binary_image and the original image size_t offset_l = (reinterpret_cast
(const_cast(code_objs[dev_idx].first)) - reinterpret_cast
(const_cast(image_))); fatbin_dev_info_[devices[dev_idx]->deviceId()] = new FatBinaryDeviceInfo(code_objs[dev_idx].first, code_objs[dev_idx].second, offset_l); } } for (size_t dev_idx = 0; dev_idx < devices.size(); ++dev_idx) { fatbin_dev_info_[devices[dev_idx]->deviceId()]->program_ = new amd::Program(*devices[dev_idx]->asContext()); if (fatbin_dev_info_[devices[dev_idx]->deviceId()]->program_ == NULL) { return hipErrorOutOfMemory; } } return hipSuccess; } hipError_t FatBinaryInfo::AddDevProgram(const int device_id) { // Device Id bounds Check DeviceIdCheck(device_id); FatBinaryDeviceInfo* fbd_info = fatbin_dev_info_[device_id]; if (fbd_info == nullptr) { return hipErrorInvalidKernelFile; } // If fat binary was already added, skip this step and return success if (fbd_info->add_dev_prog_ == false) { amd::Context* ctx = g_devices[device_id]->asContext(); if (CL_SUCCESS != fbd_info->program_->addDeviceProgram(*ctx->devices()[0], fbd_info->binary_image_, fbd_info->binary_size_, false, nullptr, nullptr, fdesc_, fbd_info->binary_offset_, uri_)) { return hipErrorInvalidKernelFile; } fbd_info->add_dev_prog_ = true; } return hipSuccess; } hipError_t FatBinaryInfo::BuildProgram(const int device_id) { // Device Id Check and Add DeviceProgram if not added so far DeviceIdCheck(device_id); IHIP_RETURN_ONFAIL(AddDevProgram(device_id)); // If Program was already built skip this step and return success FatBinaryDeviceInfo* fbd_info = fatbin_dev_info_[device_id]; if (fbd_info->prog_built_ == false) { if(CL_SUCCESS != fbd_info->program_->build(g_devices[device_id]->devices(), nullptr, nullptr, nullptr, kOptionChangeable, kNewDevProg)) { return hipErrorSharedObjectInitFailed; } fbd_info->prog_built_ = true; } if (!fbd_info->program_->load()) { return hipErrorSharedObjectInitFailed; } return hipSuccess; } } //namespace : hip clr-rocm-5.7.1/hipamd/src/hip_fatbin.hpp000066400000000000000000000055011450307266000201130ustar00rootroot00000000000000#ifndef HIP_FAT_BINARY_HPP #define HIP_FAT_BINARY_HPP #include "hip/hip_runtime.h" #include "hip/hip_runtime_api.h" #include "hip_internal.hpp" #include "platform/program.hpp" namespace hip { //Fat Binary Per Device info class FatBinaryDeviceInfo { public: FatBinaryDeviceInfo (const void* binary_image, size_t binary_size, size_t binary_offset) : binary_image_(binary_image), binary_size_(binary_size), binary_offset_(binary_offset), program_(nullptr), add_dev_prog_(false), prog_built_(false) {} ~FatBinaryDeviceInfo(); private: const void* binary_image_; // binary image ptr size_t binary_size_; // binary image size size_t binary_offset_; // image offset from original amd::Program* program_; // reinterpreted as hipModule_t friend class FatBinaryInfo; //Control Variables bool add_dev_prog_; bool prog_built_; }; // Fat Binary Info class FatBinaryInfo { public: FatBinaryInfo(const char* fname, const void* image); ~FatBinaryInfo(); // Loads Fat binary from file or image, unbundles COs for devices. 
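// --- Illustrative sketch added by the editor; not part of the original source. ---
// The extraction entry points declared just below feed a lazy, per-device build
// pipeline: ExtractFatBinary*() locates each device's code object inside the
// bundle, AddDevProgram() wraps it in an amd::Program, and BuildProgram()
// compiles and loads it on first use. A minimal, hypothetical caller, assuming a
// valid bundle pointer `image` and a device index `device_id`:
//
//   hip::FatBinaryInfo* fb_info = new hip::FatBinaryInfo("", image);
//   if (fb_info->ExtractFatBinary(g_devices) == hipSuccess &&
//       fb_info->BuildProgram(device_id) == hipSuccess) {  // invokes AddDevProgram() internally
//     hipModule_t hmod = fb_info->Module(device_id);       // ready for symbol lookup
//   }
//
// Error paths (e.g. hipErrorNoBinaryForGpu) are omitted for brevity.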
hipError_t ExtractFatBinaryUsingCOMGR(const std::vector<hip::Device*>& devices); hipError_t ExtractFatBinary(const std::vector<hip::Device*>& devices); hipError_t AddDevProgram(const int device_id); hipError_t BuildProgram(const int device_id); // Device Id bounds check inline void DeviceIdCheck(const int device_id) const { guarantee(device_id >= 0, "Invalid DeviceId less than 0"); guarantee(static_cast<size_t>(device_id) < fatbin_dev_info_.size(), "Invalid DeviceId, greater than no of fatbin device info!"); } // Getter Methods amd::Program* GetProgram(int device_id) { DeviceIdCheck(device_id); return fatbin_dev_info_[device_id]->program_; } hipModule_t Module(int device_id) const { DeviceIdCheck(device_id); return reinterpret_cast<hipModule_t>(as_cl(fatbin_dev_info_[device_id]->program_)); } hipError_t GetModule(int device_id, hipModule_t* hmod) const { DeviceIdCheck(device_id); *hmod = reinterpret_cast<hipModule_t>(as_cl(fatbin_dev_info_[device_id]->program_)); return hipSuccess; } private: std::string fname_; // File name amd::Os::FileDesc fdesc_; // File descriptor size_t fsize_; // Total file size size_t foffset_; // File Offset where the fat binary is present. // Even when file is passed image will be mmapped till ~destructor. const void* image_; // Image bool image_mapped_; // flag to detect if image is mapped // Only used for FBs where image is directly passed std::string uri_; // Uniform Resource Identifier // Per Device Info, like corresponding binary ptr, size. std::vector<FatBinaryDeviceInfo*> fatbin_dev_info_; }; }; /* namespace hip */ #endif /* HIP_FAT_BINARY_HPP */ clr-rocm-5.7.1/hipamd/src/hip_formatting.hpp000066400000000000000000000510431450307266000210240ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/ #include #include inline std::ostream& operator<<(std::ostream& os, const hipTextureFilterMode& s) { switch (s) { case hipFilterModePoint: os << "hipFilterModePoint"; break; case hipFilterModeLinear: os << "hipFilterModeLinear"; break; default: os << "hipFilterModePoint"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipTextureReadMode& s) { switch (s) { case hipReadModeElementType: os << "hipReadModeElementType"; break; case hipReadModeNormalizedFloat: os << "hipReadModeNormalizedFloat"; break; default: os << "hipReadModeElementType"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipTextureAddressMode& s) { switch (s) { case hipAddressModeWrap: os << "hipAddressModeWrap"; break; case hipAddressModeClamp: os << "hipAddressModeClamp"; break; case hipAddressModeMirror: os << "hipAddressModeMirror"; break; case hipAddressModeBorder: os << "hipAddressModeBorder"; break; default: os << "hipAddressModeWrap"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipMemcpyKind& s) { switch (s) { case hipMemcpyHostToHost: os << "hipMemcpyHostToHost"; break; case hipMemcpyHostToDevice: os << "hipMemcpyHostToDevice"; break; case hipMemcpyDeviceToHost: os << "hipMemcpyDeviceToHost"; break; case hipMemcpyDeviceToDevice: os << "hipMemcpyDeviceToDevice"; break; case hipMemcpyDefault: os << "hipMemcpyDefault"; break; default: os << "hipMemcpyDefault"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipChannelFormatKind& s) { switch (s) { case hipChannelFormatKindSigned: os << "hipChannelFormatKindSigned"; break; case hipChannelFormatKindUnsigned: os << "hipChannelFormatKindUnsigned"; break; case hipChannelFormatKindFloat: os << "hipChannelFormatKindFloat"; break; case hipChannelFormatKindNone: os << "hipChannelFormatKindNone"; break; default: os << "hipChannelFormatKindNone"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipArray_Format& s) { switch (s) { case HIP_AD_FORMAT_UNSIGNED_INT8: os << "HIP_AD_FORMAT_UNSIGNED_INT8"; break; case HIP_AD_FORMAT_UNSIGNED_INT16: os << "HIP_AD_FORMAT_UNSIGNED_INT16"; break; case HIP_AD_FORMAT_UNSIGNED_INT32: os << "HIP_AD_FORMAT_UNSIGNED_INT32"; break; case HIP_AD_FORMAT_SIGNED_INT8: os << "HIP_AD_FORMAT_SIGNED_INT8"; break; case HIP_AD_FORMAT_SIGNED_INT16: os << "HIP_AD_FORMAT_SIGNED_INT16"; break; case HIP_AD_FORMAT_SIGNED_INT32: os << "HIP_AD_FORMAT_SIGNED_INT32"; break; case HIP_AD_FORMAT_HALF: os << "HIP_AD_FORMAT_HALF"; break; case HIP_AD_FORMAT_FLOAT: os << "HIP_AD_FORMAT_FLOAT"; break; default: os << "HIP_AD_FORMAT_FLOAT"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipResourceViewFormat& s) { switch (s) { case hipResViewFormatNone: os << "hipResViewFormatNone"; break; case hipResViewFormatUnsignedChar1: os << "hipResViewFormatUnsignedChar1"; break; case hipResViewFormatUnsignedChar2: os << "hipResViewFormatUnsignedChar2"; break; case hipResViewFormatUnsignedChar4: os << "hipResViewFormatUnsignedChar4"; break; case hipResViewFormatSignedChar1: os << "hipResViewFormatSignedChar1"; break; case hipResViewFormatSignedChar2: os << "hipResViewFormatSignedChar2"; break; case hipResViewFormatSignedChar4: os << "hipResViewFormatSignedChar4"; break; case hipResViewFormatUnsignedShort1: os << "hipResViewFormatUnsignedShort1"; break; case hipResViewFormatUnsignedShort2: os << "hipResViewFormatUnsignedShort2"; break; case hipResViewFormatUnsignedShort4: os << "hipResViewFormatUnsignedShort4"; break; case
hipResViewFormatSignedShort1: os << "hipResViewFormatSignedShort1"; break; case hipResViewFormatSignedShort2: os << "hipResViewFormatSignedShort2"; break; case hipResViewFormatSignedShort4: os << "hipResViewFormatSignedShort4"; break; case hipResViewFormatUnsignedInt1: os << "hipResViewFormatUnsignedInt1"; break; case hipResViewFormatUnsignedInt2: os << "hipResViewFormatUnsignedInt2"; break; case hipResViewFormatUnsignedInt4: os << "hipResViewFormatUnsignedInt4"; break; case hipResViewFormatSignedInt1: os << "hipResViewFormatSignedInt1"; break; case hipResViewFormatSignedInt2: os << "hipResViewFormatSignedInt2"; break; case hipResViewFormatSignedInt4: os << "hipResViewFormatSignedInt4"; break; case hipResViewFormatHalf1: os << "hipResViewFormatHalf1"; break; case hipResViewFormatHalf2: os << "hipResViewFormatHalf2"; break; case hipResViewFormatHalf4: os << "hipResViewFormatHalf4"; break; case hipResViewFormatFloat1: os << "hipResViewFormatFloat1"; break; case hipResViewFormatFloat2: os << "hipResViewFormatFloat2"; break; case hipResViewFormatFloat4: os << "hipResViewFormatFloat4"; break; case hipResViewFormatUnsignedBlockCompressed1: os << "hipResViewFormatUnsignedBlockCompressed1"; break; case hipResViewFormatUnsignedBlockCompressed2: os << "hipResViewFormatUnsignedBlockCompressed2"; break; case hipResViewFormatUnsignedBlockCompressed3: os << "hipResViewFormatUnsignedBlockCompressed3"; break; case hipResViewFormatUnsignedBlockCompressed4: os << "hipResViewFormatUnsignedBlockCompressed4"; break; case hipResViewFormatSignedBlockCompressed4: os << "hipResViewFormatSignedBlockCompressed4"; break; case hipResViewFormatUnsignedBlockCompressed5: os << "hipResViewFormatUnsignedBlockCompressed5"; break; case hipResViewFormatSignedBlockCompressed5: os << "hipResViewFormatSignedBlockCompressed5"; break; case hipResViewFormatUnsignedBlockCompressed6H: os << "hipResViewFormatUnsignedBlockCompressed6H"; break; case hipResViewFormatSignedBlockCompressed6H: os << "hipResViewFormatSignedBlockCompressed6H"; break; case hipResViewFormatUnsignedBlockCompressed7: os << "hipResViewFormatUnsignedBlockCompressed7"; break; default: os << "hipResViewFormatNone"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipFunction_attribute& s) { switch (s) { case HIP_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK: os << "HIP_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK"; break; case HIP_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES: os << "HIP_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES"; break; case HIP_FUNC_ATTRIBUTE_CONST_SIZE_BYTES: os << "HIP_FUNC_ATTRIBUTE_CONST_SIZE_BYTES"; break; case HIP_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES: os << "HIP_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES"; break; case HIP_FUNC_ATTRIBUTE_NUM_REGS: os << "HIP_FUNC_ATTRIBUTE_NUM_REGS"; break; case HIP_FUNC_ATTRIBUTE_PTX_VERSION: os << "HIP_FUNC_ATTRIBUTE_PTX_VERSION"; break; case HIP_FUNC_ATTRIBUTE_BINARY_VERSION: os << "HIP_FUNC_ATTRIBUTE_BINARY_VERSION"; break; case HIP_FUNC_ATTRIBUTE_CACHE_MODE_CA: os << "HIP_FUNC_ATTRIBUTE_CACHE_MODE_CA"; break; case HIP_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES: os << "HIP_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES"; break; case HIP_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT: os << "HIP_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT"; break; case HIP_FUNC_ATTRIBUTE_MAX: os << "HIP_FUNC_ATTRIBUTE_MAX"; break; default: os << "HIP_FUNC_ATTRIBUTE_MAX"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hiprtcResult& s) { switch (s) { case HIPRTC_SUCCESS: os << "HIPRTC_SUCCESS"; break; case 
HIPRTC_ERROR_OUT_OF_MEMORY: os << "HIPRTC_ERROR_OUT_OF_MEMORY"; break; case HIPRTC_ERROR_PROGRAM_CREATION_FAILURE: os << "HIPRTC_ERROR_PROGRAM_CREATION_FAILURE"; break; case HIPRTC_ERROR_INVALID_INPUT: os << "HIPRTC_ERROR_INVALID_INPUT"; break; case HIPRTC_ERROR_INVALID_PROGRAM: os << "HIPRTC_ERROR_INVALID_PROGRAM"; break; case HIPRTC_ERROR_INVALID_OPTION: os << "HIPRTC_ERROR_INVALID_OPTION"; break; case HIPRTC_ERROR_COMPILATION: os << "HIPRTC_ERROR_COMPILATION"; break; case HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE: os << "HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE"; break; case HIPRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION: os << "HIPRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION"; break; case HIPRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION: os << "HIPRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION"; break; case HIPRTC_ERROR_NAME_EXPRESSION_NOT_VALID: os << "HIPRTC_ERROR_NAME_EXPRESSION_NOT_VALID"; break; case HIPRTC_ERROR_INTERNAL_ERROR: os << "HIPRTC_ERROR_INTERNAL_ERROR"; break; default: os << "HIPRTC_ERROR_INTERNAL_ERROR"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipJitOption& s) { switch (s) { case HIPRTC_JIT_MAX_REGISTERS: os << "HIPRTC_JIT_MAX_REGISTERS"; break; case HIPRTC_JIT_THREADS_PER_BLOCK: os << "HIPRTC_JIT_THREADS_PER_BLOCK"; break; case HIPRTC_JIT_WALL_TIME: os << "HIPRTC_JIT_WALL_TIME"; break; case HIPRTC_JIT_INFO_LOG_BUFFER: os << "HIPRTC_JIT_INFO_LOG_BUFFER"; break; case HIPRTC_JIT_INFO_LOG_BUFFER_SIZE_BYTES: os << "HIPRTC_JIT_INFO_LOG_BUFFER_SIZE_BYTES"; break; case HIPRTC_JIT_ERROR_LOG_BUFFER: os << "HIPRTC_JIT_ERROR_LOG_BUFFER"; break; case HIPRTC_JIT_ERROR_LOG_BUFFER_SIZE_BYTES: os << "HIPRTC_JIT_ERROR_LOG_BUFFER_SIZE_BYTES"; break; case HIPRTC_JIT_OPTIMIZATION_LEVEL: os << "HIPRTC_JIT_OPTIMIZATION_LEVEL"; break; case HIPRTC_JIT_TARGET_FROM_HIPCONTEXT: os << "HIPRTC_JIT_TARGET_FROM_HIPCONTEXT"; break; case HIPRTC_JIT_TARGET: os << "HIPRTC_JIT_TARGET"; break; case HIPRTC_JIT_FALLBACK_STRATEGY: os << "HIPRTC_JIT_FALLBACK_STRATEGY"; break; case HIPRTC_JIT_GENERATE_DEBUG_INFO: os << "HIPRTC_JIT_GENERATE_DEBUG_INFO"; break; case HIPRTC_JIT_CACHE_MODE: os << "HIPRTC_JIT_CACHE_MODE"; break; case HIPRTC_JIT_NEW_SM3X_OPT: os << "HIPRTC_JIT_NEW_SM3X_OPT"; break; case HIPRTC_JIT_FAST_COMPILE: os << "HIPRTC_JIT_FAST_COMPILE"; break; case HIPRTC_JIT_GLOBAL_SYMBOL_NAMES: os << "HIPRTC_JIT_GLOBAL_SYMBOL_NAMES"; break; case HIPRTC_JIT_GLOBAL_SYMBOL_ADDRESS: os << "HIPRTC_JIT_GLOBAL_SYMBOL_ADDRESS"; break; case HIPRTC_JIT_GLOBAL_SYMBOL_COUNT: os << "HIPRTC_JIT_GLOBAL_SYMBOL_COUNT"; break; case HIPRTC_JIT_LTO: os << "HIPRTC_JIT_LTO"; break; case HIPRTC_JIT_FTZ: os << "HIPRTC_JIT_FTZ"; break; case HIPRTC_JIT_PREC_DIV: os << "HIPRTC_JIT_PREC_DIV"; break; case HIPRTC_JIT_PREC_SQRT: os << "HIPRTC_JIT_PREC_SQRT"; break; case HIPRTC_JIT_FMA: os << "HIPRTC_JIT_FMA"; break; case HIPRTC_JIT_NUM_OPTIONS: os << "HIPRTC_JIT_NUM_OPTIONS"; break; default: os << "HIPRTC_JIT_MAX_REGISTERS"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipFuncCache_t& s) { switch (s) { case hipFuncCachePreferNone: os << "hipFuncCachePreferNone"; break; case hipFuncCachePreferShared: os << "hipFuncCachePreferShared"; break; case hipFuncCachePreferL1: os << "hipFuncCachePreferL1"; break; case hipFuncCachePreferEqual: os << "hipFuncCachePreferEqual"; break; default: os << "hipFuncCachePreferNone"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipSharedMemConfig& s) { switch (s) { case hipSharedMemBankSizeDefault: os <<
"hipSharedMemBankSizeDefault"; break; case hipSharedMemBankSizeFourByte: os << "hipSharedMemBankSizeFourByte"; break; case hipSharedMemBankSizeEightByte: os << "hipSharedMemBankSizeEightByte"; break; default: os << "hipSharedMemBankSizeDefault"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipDataType& s) { switch (s) { case HIP_R_16F: os << "HIP_R_16F"; break; case HIP_R_32F: os << "HIP_R_32F"; break; case HIP_R_64F: os << "HIP_R_64F"; break; case HIP_C_16F: os << "HIP_C_16F"; break; case HIP_C_32F: os << "HIP_C_32F"; break; case HIP_C_64F: os << "HIP_C_64F"; break; default: os << "HIP_R_16F"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hipLibraryPropertyType& s) { switch (s) { case HIP_LIBRARY_MAJOR_VERSION: os << "HIP_LIBRARY_MAJOR_VERSION"; break; case HIP_LIBRARY_MINOR_VERSION: os << "HIP_LIBRARY_MINOR_VERSION"; break; case HIP_LIBRARY_PATCH_LEVEL: os << "HIP_LIBRARY_PATCH_LEVEL"; break; default: os << "HIP_LIBRARY_MAJOR_VERSION"; }; return os; } inline std::ostream& operator<<(std::ostream& os, const hip_api_id_t& s) { os << hip_api_name(s); return os; } inline std::ostream& operator<<(std::ostream& os, const hip_api_id_t* s) { if (s) { os << *s; } else { os << "nullptr"; } return os; } inline std::ostream& operator<<(std::ostream& os, const hipTextureDesc& s) { os << '{' << '{' << s.addressMode[0] << ',' << s.addressMode[1] << ',' << s.addressMode[2] << '}' << ',' << s.filterMode << ',' << s.readMode << ',' << s.sRGB << ',' << '{' << s.borderColor[0] << ',' << s.borderColor[1] << ',' << s.borderColor[2] << ',' << s.borderColor[3] << '}' << ',' << s.normalizedCoords << ',' << s.mipmapFilterMode << ',' << s.mipmapLevelBias << ',' << s.minMipmapLevelClamp << ',' << s.maxMipmapLevelClamp << '}'; return os; } inline std::ostream& operator<<(std::ostream& os, const hipTextureDesc* s) { if (s) { os << *s; } else { os << "nullptr"; } return os; } inline std::ostream& operator<<(std::ostream& os, const dim3& s) { os << '{' << s.x << ',' << s.y << ',' << s.z << '}'; return os; } inline std::ostream& operator<<(std::ostream& os, const dim3* s) { if (s) { os << *s; } else { os << "nullptr"; } return os; } inline std::ostream& operator<<(std::ostream& os, const hipChannelFormatDesc& s) { os << '{' << s.x << ',' << s.y << ',' << s.z << ',' << s.w << ',' << s.f << '}'; return os; } inline std::ostream& operator<<(std::ostream& os, const hipChannelFormatDesc* s) { if (s) { os << *s; } else { os << "nullptr"; } return os; } inline std::ostream& operator<<(std::ostream& os, const hipMipmappedArray& s) { os << '{' << s.data << ',' << s.desc << ',' << s.width << ',' << s.height << ',' << s.depth << '}'; return os; } inline std::ostream& operator<<(std::ostream& os, const hipMipmappedArray* s) { if (s) { os << *s; } else { os << "nullptr"; } return os; } inline std::ostream& operator<<(std::ostream& os, const hipResourceDesc& s) { os << '{' << s.resType << ',' << '{'; switch (s.resType) { case hipResourceTypeLinear: os << s.res.linear.devPtr << ',' << s.res.linear.desc << ',' << s.res.linear.sizeInBytes; break; case hipResourceTypePitch2D: os << s.res.pitch2D.devPtr << ',' << s.res.pitch2D.desc << ',' << s.res.pitch2D.width << ',' << s.res.pitch2D.height << ',' << s.res.pitch2D.pitchInBytes; break; case hipResourceTypeArray: os << s.res.array.array; break; case hipResourceTypeMipmappedArray: os < #include #include "hip_conversions.hpp" namespace amd { static std::once_flag interopOnce; } // Sets up GL context association with amd context. 
// NOTE: Refer to Context setup code in OCLTestImp.cpp void setupGLInteropOnce() { amd::Context* amdContext = hip::getCurrentDevice()->asContext(); //current context will be read in amdContext->create cl_context_properties properties[] = {CL_CONTEXT_PLATFORM, (cl_context_properties)AMD_PLATFORM, ROCCLR_HIP_GL_CONTEXT_KHR, (cl_context_properties) nullptr, #ifdef _WIN32 ROCCLR_HIP_WGL_HDC_KHR, (cl_context_properties) nullptr, #else ROCCLR_HIP_GLX_DISPLAY_KHR, (cl_context_properties) nullptr, #endif 0}; amd::Context::Info info; if (CL_SUCCESS != amd::Context::checkProperties(properties, &info)) { LogError("Context setup failed \n"); return; } amdContext->setInfo(info); if (CL_SUCCESS != amdContext->create(properties)) { LogError("Context setup failed \n"); } } static inline hipError_t hipSetInteropObjects(int num_objects, void** mem_objects, std::vector& interopObjects) { if ((num_objects == 0 && mem_objects != nullptr) || (num_objects != 0 && mem_objects == nullptr)) { return hipErrorUnknown; } while (num_objects-- > 0) { void* obj = *mem_objects++; if (obj == nullptr) { return hipErrorInvalidResourceHandle; } amd::Memory* mem = reinterpret_cast(obj); if (mem->getInteropObj() == nullptr) { return hipErrorInvalidResourceHandle; } interopObjects.push_back(mem); } return hipSuccess; } // NOTE: This method cooresponds to OpenCL functionality in clGetGLContextInfoKHR() hipError_t hipGLGetDevices(unsigned int* pHipDeviceCount, int* pHipDevices, unsigned int hipDeviceCount, hipGLDeviceList deviceList) { HIP_INIT_API(hipGLGetDevices, pHipDeviceCount, pHipDevices, hipDeviceCount, deviceList); std::call_once(amd::interopOnce, setupGLInteropOnce); static const bool VALIDATE_ONLY = true; if (deviceList == hipGLDeviceListNextFrame) { LogError(" hipGLDeviceListNextFrame not supported yet.\n"); HIP_RETURN(hipErrorNotSupported); } if (pHipDeviceCount == nullptr || pHipDevices == nullptr || hipDeviceCount == 0) { LogError(" Invalid Argument \n"); HIP_RETURN(hipErrorInvalidValue); } hipDeviceCount = std::min(hipDeviceCount, static_cast(g_devices.size())); amd::Context::Info info = hip::getCurrentDevice()->asContext()->info(); if (!(info.flags_ & amd::Context::GLDeviceKhr)) { LogError("Failed : Invalid Shared Group Reference \n"); HIP_RETURN(hipErrorInvalidValue); } amd::GLFunctions* glenv = hip::getCurrentDevice()->asContext()->glenv(); if (glenv != nullptr) { #ifdef _WIN32 info.hCtx_ = glenv->wglGetCurrentContext_(); #else info.hCtx_ = glenv->glXGetCurrentContext_(); #endif hip::getCurrentDevice()->asContext()->setInfo(info); glenv->update(reinterpret_cast(info.hCtx_)); } *pHipDeviceCount = 0; switch (deviceList) { case hipGLDeviceListCurrentFrame: for (int i = 0; i < hipDeviceCount; ++i) { const std::vector& devices = g_devices[i]->devices(); if (devices.size() > 0 && devices[0]->bindExternalDevice(info.flags_, info.hDev_, info.hCtx_, VALIDATE_ONLY)) { pHipDevices[0] = i; *pHipDeviceCount = 1; break; } } break; case hipGLDeviceListAll: { int foundDeviceCount = 0; for (int i = 0; i < hipDeviceCount; ++i) { const std::vector& devices = g_devices[i]->devices(); if (devices.size() > 0 && devices[0]->bindExternalDevice(info.flags_, info.hDev_, info.hCtx_, VALIDATE_ONLY)) { pHipDevices[foundDeviceCount++] = i; break; } } *pHipDeviceCount = foundDeviceCount; } break; default: LogWarning("Invalid deviceList value"); HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(*pHipDeviceCount > 0 ? 
hipSuccess : hipErrorNoDevice); } static inline void clearGLErrors(const amd::Context& amdContext) { GLenum glErr, glLastErr = GL_NO_ERROR; while (1) { glErr = amdContext.glenv()->glGetError_(); if (glErr == GL_NO_ERROR || glErr == glLastErr) { break; } glLastErr = glErr; LogWarning("GL error"); } } static inline GLenum checkForGLError(const amd::Context& amdContext) { GLenum glRetErr = GL_NO_ERROR; GLenum glErr; while (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { glRetErr = glErr; // Just return the last GL error LogWarning("Check GL error"); } return glRetErr; } hipError_t hipGraphicsSubResourceGetMappedArray(hipArray_t* array, hipGraphicsResource_t resource, unsigned int arrayIndex, unsigned int mipLevel) { HIP_INIT_API(hipGraphicsSubResourceGetMappedArray, array, resource, arrayIndex, mipLevel); amd::Context& amdContext = *(hip::getCurrentDevice()->asContext()); if (array == nullptr || resource == nullptr) { LogError("invalid array/resource"); HIP_RETURN(hipErrorInvalidValue); } amd::Image* image = (reinterpret_cast(resource))->asImage(); if (image == nullptr) { LogError("invalid resource/image"); HIP_RETURN(hipErrorInvalidValue); } // arrayIndex higher than zero not implmented assert(arrayIndex == 0) ; amd::Image * view = image->createView(amdContext, image->getImageFormat(), nullptr, mipLevel, 0); hipArray* myarray = new hipArray(); myarray->data = as_cl (view); myarray->width = view->getWidth(); myarray->height = view->getHeight(); myarray->depth = view->getDepth(); const cl_mem_object_type image_type = hip::getCLMemObjectType(myarray->width, myarray->height, myarray->depth, hipArrayDefault); myarray->type = image_type; amd::Image::Format f = image->getImageFormat(); myarray->Format = hip::getCL2hipArrayFormat(f.image_channel_data_type); myarray->desc = hip::getChannelFormatDesc(f.getNumChannels(), myarray->Format); myarray->NumChannels = hip::getNumChannels(myarray->desc); myarray->isDrv = 0; myarray->textureType = 0; *array = myarray; { amd::ScopedLock lock(hip::hipArraySetLock); hip::hipArraySet.insert(*array); } HIP_RETURN(hipSuccess); } hipError_t hipGraphicsGLRegisterImage(hipGraphicsResource** resource, GLuint image, GLenum target, unsigned int flags) { HIP_INIT_API(hipGraphicsGLRegisterImage, resource, image, target, flags); if (!((flags == hipGraphicsRegisterFlagsNone) || (flags & hipGraphicsRegisterFlagsReadOnly) || (flags & hipGraphicsRegisterFlagsWriteDiscard) || (flags & hipGraphicsRegisterFlagsSurfaceLoadStore) || (flags & hipGraphicsRegisterFlagsTextureGather))) { LogError("invalid parameter \"flags\""); HIP_RETURN(hipErrorInvalidValue); } if (resource == nullptr) { LogError("invalid resource"); HIP_RETURN(hipErrorInvalidValue); } GLint miplevel = 0; amd::Context& amdContext = *(hip::getCurrentDevice()->asContext()); if (amdContext.glenv() == nullptr) { LogError("invalid context, gl interop not initialized"); HIP_RETURN(hipErrorInvalidValue); } amd::GLFunctions::SetIntEnv ie(amdContext.glenv()); if (!ie.isValid()) { LogWarning("\"amdContext\" is not created from GL context or share list \n"); HIP_RETURN(hipErrorUnknown); } amd::ImageGL* pImageGL = NULL; GLenum glErr; GLenum glTarget = 0; GLenum glInternalFormat; cl_image_format clImageFormat; uint dim = 1; cl_mem_object_type clType; cl_gl_object_type clGLType; GLsizei numSamples = 1; GLint gliTexWidth = 1; GLint gliTexHeight = 1; GLint gliTexDepth = 1; // Verify GL texture object clearGLErrors(amdContext); if ((GL_FALSE == amdContext.glenv()->glIsTexture_(image)) || (GL_NO_ERROR != (glErr = 
amdContext.glenv()->glGetError_()))) { LogWarning("\"texture\" is not a GL texture object"); HIP_RETURN(hipErrorUnknown); } bool isImage = true; // Check target value validity switch (target) { case GL_TEXTURE_BUFFER: glTarget = GL_TEXTURE_BUFFER; dim = 1; clType = CL_MEM_OBJECT_IMAGE1D_BUFFER; clGLType = CL_GL_OBJECT_TEXTURE_BUFFER; isImage = false; break; case GL_TEXTURE_1D: glTarget = GL_TEXTURE_1D; dim = 1; clType = CL_MEM_OBJECT_IMAGE1D; clGLType = CL_GL_OBJECT_TEXTURE1D; break; case GL_TEXTURE_CUBE_MAP_POSITIVE_X: case GL_TEXTURE_CUBE_MAP_NEGATIVE_X: case GL_TEXTURE_CUBE_MAP_POSITIVE_Y: case GL_TEXTURE_CUBE_MAP_NEGATIVE_Y: case GL_TEXTURE_CUBE_MAP_POSITIVE_Z: case GL_TEXTURE_CUBE_MAP_NEGATIVE_Z: glTarget = GL_TEXTURE_CUBE_MAP; dim = 2; clType = CL_MEM_OBJECT_IMAGE2D; clGLType = CL_GL_OBJECT_TEXTURE2D; break; case GL_TEXTURE_1D_ARRAY: glTarget = GL_TEXTURE_1D_ARRAY; dim = 2; clType = CL_MEM_OBJECT_IMAGE1D_ARRAY; clGLType = CL_GL_OBJECT_TEXTURE1D_ARRAY; break; case GL_TEXTURE_2D: glTarget = GL_TEXTURE_2D; dim = 2; clType = CL_MEM_OBJECT_IMAGE2D; clGLType = CL_GL_OBJECT_TEXTURE2D; break; case GL_TEXTURE_2D_MULTISAMPLE: glTarget = GL_TEXTURE_2D_MULTISAMPLE; dim = 2; clType = CL_MEM_OBJECT_IMAGE2D; clGLType = CL_GL_OBJECT_TEXTURE2D; break; case GL_TEXTURE_RECTANGLE_ARB: glTarget = GL_TEXTURE_RECTANGLE_ARB; dim = 2; clType = CL_MEM_OBJECT_IMAGE2D; clGLType = CL_GL_OBJECT_TEXTURE2D; break; case GL_TEXTURE_2D_ARRAY: glTarget = GL_TEXTURE_2D_ARRAY; dim = 3; clType = CL_MEM_OBJECT_IMAGE2D_ARRAY; clGLType = CL_GL_OBJECT_TEXTURE2D_ARRAY; break; case GL_TEXTURE_3D: glTarget = GL_TEXTURE_3D; dim = 3; clType = CL_MEM_OBJECT_IMAGE3D; clGLType = CL_GL_OBJECT_TEXTURE3D; break; default: // wrong value LogWarning("invalid \"target\" value"); HIP_RETURN(hipErrorInvalidValue); break; } amdContext.glenv()->glBindTexture_(glTarget, image); // Check if size is available - data store is created if (isImage) { // Check mipmap level for "texture" name GLint gliTexBaseLevel; GLint gliTexMaxLevel; clearGLErrors(amdContext); amdContext.glenv()->glGetTexParameteriv_(glTarget, GL_TEXTURE_BASE_LEVEL, &gliTexBaseLevel); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { LogWarning("Cannot get base mipmap level of a GL \"texture\" object"); HIP_RETURN(hipErrorInvalidValue); } clearGLErrors(amdContext); amdContext.glenv()->glGetTexParameteriv_(glTarget, GL_TEXTURE_MAX_LEVEL, &gliTexMaxLevel); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { LogWarning("Cannot get max mipmap level of a GL \"texture\" object"); HIP_RETURN(hipErrorInvalidValue); } if ((gliTexBaseLevel > miplevel) || (miplevel > gliTexMaxLevel)) { LogWarning("\"miplevel\" is not a valid mipmap level of the GL \"texture\" object"); HIP_RETURN(hipErrorInvalidValue); } // Get GL texture format and check if it's compatible with CL format clearGLErrors(amdContext); amdContext.glenv()->glGetTexLevelParameteriv_(target, miplevel, GL_TEXTURE_INTERNAL_FORMAT, (GLint*)&glInternalFormat); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { LogWarning("Cannot get internal format of \"miplevel\" of GL \"texture\" object"); HIP_RETURN(hipErrorInvalidValue); } amdContext.glenv()->glGetTexLevelParameteriv_(target, miplevel, GL_TEXTURE_SAMPLES, (GLint*)&numSamples); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { LogWarning("Cannot get numbers of samples of GL \"texture\" object"); HIP_RETURN(hipErrorInvalidValue); } if (numSamples > 1) { LogWarning("MSAA \"texture\" object is not suppoerted for the device"); 
HIP_RETURN(hipErrorInvalidValue); } // Now get CL format from GL format and bytes per pixel int iBytesPerPixel = 0; if (!amd::getCLFormatFromGL(amdContext, glInternalFormat, &clImageFormat, &iBytesPerPixel, 0)) { //clFlags)) { LogWarning("\"texture\" format does not map to an appropriate CL image format"); HIP_RETURN(hipErrorInvalidValue); } switch (dim) { case 3: clearGLErrors(amdContext); amdContext.glenv()->glGetTexLevelParameteriv_(target, miplevel, GL_TEXTURE_DEPTH, &gliTexDepth); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { LogWarning("Cannot get the depth of \"miplevel\" of GL \"texure\""); HIP_RETURN(hipErrorInvalidValue); } // Fall trough to process other dimensions... case 2: clearGLErrors(amdContext); amdContext.glenv()->glGetTexLevelParameteriv_(target, miplevel, GL_TEXTURE_HEIGHT, &gliTexHeight); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { LogWarning("Cannot get the height of \"miplevel\" of GL \"texure\""); HIP_RETURN(hipErrorInvalidValue); } // Fall trough to process other dimensions... case 1: clearGLErrors(amdContext); amdContext.glenv()->glGetTexLevelParameteriv_(target, miplevel, GL_TEXTURE_WIDTH, &gliTexWidth); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { LogWarning("Cannot get the width of \"miplevel\" of GL \"texure\""); HIP_RETURN(hipErrorInvalidValue); } break; default: LogWarning("invalid \"target\" value"); HIP_RETURN(hipErrorInvalidValue); } } else { GLint size; // In case target is GL_TEXTURE_BUFFER GLint backingBuffer; clearGLErrors(amdContext); amdContext.glenv()->glGetTexLevelParameteriv_( glTarget, 0, GL_TEXTURE_BUFFER_DATA_STORE_BINDING, &backingBuffer); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { LogWarning("Cannot get backing buffer for GL \"texture buffer\" object"); HIP_RETURN(hipErrorInvalidValue); } amdContext.glenv()->glBindBuffer_(glTarget, backingBuffer); // Get GL texture format and check if it's compatible with CL format clearGLErrors(amdContext); amdContext.glenv()->glGetIntegerv_(GL_TEXTURE_BUFFER_FORMAT_EXT, reinterpret_cast(&glInternalFormat)); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { LogWarning("Cannot get internal format of \"miplevel\" of GL \"texture\" object"); HIP_RETURN(hipErrorInvalidValue); } // Now get CL format from GL format and bytes per pixel int iBytesPerPixel = 0; if (!amd::getCLFormatFromGL(amdContext, glInternalFormat, &clImageFormat, &iBytesPerPixel, flags)) { LogWarning("\"texture\" format does not map to an appropriate CL image format"); HIP_RETURN(hipErrorInvalidValue); } clearGLErrors(amdContext); amdContext.glenv()->glGetBufferParameteriv_(glTarget, GL_BUFFER_SIZE, &size); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { LogWarning("Cannot get internal format of \"miplevel\" of GL \"texture\" object"); HIP_RETURN(hipErrorInvalidValue); } gliTexWidth = size / iBytesPerPixel; } size_t imageSize = (clType == CL_MEM_OBJECT_IMAGE1D_ARRAY) ? static_cast(gliTexHeight) : static_cast(gliTexDepth); if (!amd::Image::validateDimensions( amdContext.devices(), clType, static_cast(gliTexWidth), static_cast(gliTexHeight), static_cast(gliTexDepth), imageSize)) { LogWarning("The GL \"texture\" data store is not created or out of supported dimensions"); HIP_RETURN(hipErrorInvalidValue); } target = (glTarget == GL_TEXTURE_CUBE_MAP) ? 
target : 0; pImageGL = new (amdContext) amd::ImageGL(amdContext, clType, flags, clImageFormat, static_cast(gliTexWidth), static_cast(gliTexHeight), static_cast(gliTexDepth), glTarget, image, 0, glInternalFormat, clGLType, numSamples, target); if (!pImageGL) { LogWarning("Cannot create class ImageGL - out of memory?"); HIP_RETURN(hipErrorUnknown); } if (!pImageGL->create()) { pImageGL->release(); HIP_RETURN(hipErrorUnknown); } // Create interop object if (pImageGL->getInteropObj() == nullptr) { LogWarning("cannot create object of class BufferGL"); pImageGL->release(); HIP_RETURN(hipErrorUnknown); } // Fixme: If more than one device is present in the context, we choose the first device. // We should come up with a more elegant solution to handle this. assert(amdContext.devices().size() == 1); const amd::Device& dev = *(amdContext.devices()[0]); device::Memory* mem = pImageGL->getDeviceMemory(dev); if (nullptr == mem) { LogPrintfError("Can't allocate memory size - 0x%08X bytes!", pImageGL->getSize()); pImageGL->release(); HIP_RETURN(hipErrorUnknown); } mem->processGLResource(device::Memory::GLDecompressResource); *resource = reinterpret_cast(pImageGL); HIP_RETURN(hipSuccess); } hipError_t hipGraphicsGLRegisterBuffer(hipGraphicsResource** resource, GLuint buffer, unsigned int flags) { HIP_INIT_API(hipGraphicsGLRegisterBuffer, resource, buffer, flags); if (!((flags == hipGraphicsRegisterFlagsNone) || (flags & hipGraphicsRegisterFlagsReadOnly) || (flags & hipGraphicsRegisterFlagsWriteDiscard))) { LogError("invalid parameter \"flags\""); HIP_RETURN(hipErrorInvalidValue); } if (resource == nullptr) { LogError("invalid resource"); HIP_RETURN(hipErrorInvalidValue); } amd::BufferGL* pBufferGL = nullptr; GLenum glErr; GLenum glTarget = GL_ARRAY_BUFFER; GLint gliSize = 0; GLint gliMapped = 0; amd::Context& amdContext = *(hip::getCurrentDevice()->asContext()); if (amdContext.glenv() == nullptr) { LogError("invalid context, gl interop not initialized"); HIP_RETURN(hipErrorInvalidValue); } // Add this scope to bound the scoped lock { amd::GLFunctions::SetIntEnv ie(amdContext.glenv()); if (!ie.isValid()) { LogWarning("\"amdContext\" is not created from GL context or share list \n"); HIP_RETURN(hipErrorUnknown); } // Verify GL buffer object clearGLErrors(amdContext); if ((GL_FALSE == amdContext.glenv()->glIsBuffer_(buffer)) || (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_()))) { LogWarning("\"buffer\" is not a GL buffer object \n"); HIP_RETURN(hipErrorInvalidResourceHandle); } // Check if size is available - data store is created amdContext.glenv()->glBindBuffer_(glTarget, buffer); clearGLErrors(amdContext); amdContext.glenv()->glGetBufferParameteriv_(glTarget, GL_BUFFER_SIZE, &gliSize); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { LogWarning("cannot get the GL buffer size \n"); HIP_RETURN(hipErrorInvalidResourceHandle); } if (gliSize == 0) { LogWarning("the GL buffer's data store is not created \n"); HIP_RETURN(hipErrorInvalidResourceHandle); } } // Release scoped lock // Now create BufferGL object pBufferGL = new (amdContext) amd::BufferGL(amdContext, flags, gliSize, 0, buffer); if (!pBufferGL) { LogWarning("cannot create object of class BufferGL"); HIP_RETURN(hipErrorUnknown); } if (!pBufferGL->create()) { pBufferGL->release(); HIP_RETURN(hipErrorUnknown); } // Create interop object if (pBufferGL->getInteropObj() == nullptr) { LogWarning("cannot create object of class BufferGL"); HIP_RETURN(hipErrorUnknown); } // Fixme: If more than one device is present in the context, we 
choose the first device. // We should come up with a more elegant solution to handle this. assert(amdContext.devices().size() == 1); const auto it = amdContext.devices().cbegin(); const amd::Device& dev = *(*it); device::Memory* mem = pBufferGL->getDeviceMemory(dev); if (nullptr == mem) { LogPrintfError("Can't allocate memory size - 0x%08X bytes!", pBufferGL->getSize()); HIP_RETURN(hipErrorUnknown); } mem->processGLResource(device::Memory::GLDecompressResource); *resource = reinterpret_cast(pBufferGL); HIP_RETURN(hipSuccess); } hipError_t hipGraphicsMapResources(int count, hipGraphicsResource_t* resources, hipStream_t stream) { HIP_INIT_API(hipGraphicsMapResources, count, resources, stream); amd::Context* amdContext = hip::getCurrentDevice()->asContext(); if (!amdContext || !amdContext->glenv()) { HIP_RETURN(hipErrorUnknown); } clearGLErrors(*amdContext); amdContext->glenv()->glFinish_(); if (checkForGLError(*amdContext) != GL_NO_ERROR) { HIP_RETURN(hipErrorUnknown); } hip::Stream* hip_stream = hip::getStream(stream); if (nullptr == hip_stream) { HIP_RETURN(hipErrorUnknown); } if (!hip_stream->context().glenv() || !hip_stream->context().glenv()->isAssociated()) { LogWarning("\"amdContext\" is not created from GL context or share list"); HIP_RETURN(hipErrorUnknown); } std::vector memObjects; hipError_t err = hipSetInteropObjects(count, reinterpret_cast(resources), memObjects); if (err != hipSuccess) { HIP_RETURN(err); } amd::Command::EventWaitList nullWaitList; //! Now create command and enqueue amd::AcquireExtObjectsCommand* command = new amd::AcquireExtObjectsCommand( *hip_stream, nullWaitList, count, memObjects, CL_COMMAND_ACQUIRE_GL_OBJECTS); if (command == nullptr) { HIP_RETURN(hipErrorUnknown); } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; HIP_RETURN(hipErrorUnknown); } command->enqueue(); // *not_null(event) = as_cl(&command->event()); if (as_cl(&command->event()) == nullptr) { command->release(); } const auto it = amdContext->devices().cbegin(); amd::Device* curDev = *it; for (auto& mobj : memObjects) { device::Memory* mem = reinterpret_cast(mobj->getDeviceMemory(*curDev)); amd::MemObjMap::AddMemObj(reinterpret_cast(mem->virtualAddress()), mobj); mobj->retain(); } HIP_RETURN(hipSuccess); } hipError_t hipGraphicsResourceGetMappedPointer(void** devPtr, size_t* size, hipGraphicsResource_t resource) { HIP_INIT_API(hipGraphicsResourceGetMappedPointer, devPtr, size, resource); amd::Context* amdContext = hip::getCurrentDevice()->asContext(); if (!amdContext || !amdContext->glenv()) { HIP_RETURN(hipErrorUnknown); } // Fixme: If more than one device is present in the context, we choose the first device. // We should come up with a more elegant solution to handle this. assert(amdContext->devices().size() == 1); const auto it = amdContext->devices().cbegin(); amd::Device* curDev = *it; amd::Memory* amdMem = reinterpret_cast(resource); *size = amdMem->getSize(); // Interop resources don't have svm allocations they are added to // amd::MemObjMap using device virtual address during creation. 
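// --- Illustrative sketch added by the editor; not part of the original source. ---
// The virtual address resolved below is only meaningful between map and unmap
// calls. The expected caller sequence, with `resource` and `stream` being
// previously created (hypothetical) handles:
//
//   hipGraphicsMapResources(1, &resource, stream);    // acquire from GL
//   void* dev_ptr = nullptr; size_t bytes = 0;
//   hipGraphicsResourceGetMappedPointer(&dev_ptr, &bytes, resource);
//   // ... launch kernels that read/write dev_ptr on `stream` ...
//   hipGraphicsUnmapResources(1, &resource, stream);  // release back to GL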
device::Memory* mem = reinterpret_cast(amdMem->getDeviceMemory(*curDev)); *devPtr = reinterpret_cast(static_cast(mem->virtualAddress())); HIP_RETURN(hipSuccess); } hipError_t hipGraphicsUnmapResources(int count, hipGraphicsResource_t* resources, hipStream_t stream) { HIP_INIT_API(hipGraphicsUnmapResources, count, resources, stream); if (!hip::isValid(stream)) { HIP_RETURN(hipErrorContextIsDestroyed); } // Wait for the current host queue hip::getStream(stream)->finish(); hip::Stream* hip_stream = hip::getStream(stream); if (nullptr == hip_stream) { HIP_RETURN(hipErrorUnknown); } std::vector memObjects; hipError_t err = hipSetInteropObjects(count, reinterpret_cast(resources), memObjects); if (err != hipSuccess) { HIP_RETURN(err); } amd::Command::EventWaitList nullWaitList; // Now create command and enqueue amd::ReleaseExtObjectsCommand* command = new amd::ReleaseExtObjectsCommand( *hip_stream, nullWaitList, count, memObjects, CL_COMMAND_RELEASE_GL_OBJECTS); if (command == nullptr) { HIP_RETURN(hipErrorUnknown); } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; HIP_RETURN(hipErrorUnknown); } command->enqueue(); if (as_cl(&command->event()) == nullptr) { command->release(); } for (auto& mobj : memObjects) { mobj->release(); } HIP_RETURN(hipSuccess); } hipError_t hipGraphicsUnregisterResource(hipGraphicsResource_t resource) { HIP_INIT_API(hipGraphicsUnregisterResource, resource); if (resource == nullptr) { HIP_RETURN(hipErrorInvalidValue); } reinterpret_cast(resource)->release(); HIP_RETURN(hipSuccess); } clr-rocm-5.7.1/hipamd/src/hip_global.cpp000066400000000000000000000162241450307266000201070ustar00rootroot00000000000000#include "hip_global.hpp" #include "hip/hip_runtime.h" #include "hip_internal.hpp" #include "hip_code_object.hpp" #include "platform/program.hpp" #include const char* amd_dbgapi_get_build_name(void) { return HIP_VERSION_BUILD_NAME; } const char* amd_dbgapi_get_git_hash() { return HIP_VERSION_GITHASH; } size_t amd_dbgapi_get_build_id() { return HIP_VERSION_BUILD_ID; } #ifdef __HIP_ENABLE_PCH extern const char __hip_pch_wave32[]; extern const char __hip_pch_wave64[]; extern unsigned __hip_pch_wave32_size; extern unsigned __hip_pch_wave64_size; void __hipGetPCH(const char** pch, unsigned int *size) { hipDeviceProp_t deviceProp; int deviceId; hipError_t error = hipGetDevice(&deviceId); error = hipGetDeviceProperties(&deviceProp, deviceId); if (deviceProp.warpSize == 32) { *pch = __hip_pch_wave32; *size = __hip_pch_wave32_size; } else { *pch = __hip_pch_wave64; *size = __hip_pch_wave64_size; } } #endif namespace hip { //Device Vars DeviceVar::DeviceVar(std::string name, hipModule_t hmod, int deviceId) : shadowVptr(nullptr), name_(name), amd_mem_obj_(nullptr), device_ptr_(nullptr), size_(0) { amd::Program* program = as_amd(reinterpret_cast(hmod)); device::Program* dev_program = program->getDeviceProgram(*g_devices.at(deviceId)->devices()[0]); guarantee (dev_program != nullptr, "Cannot get Device Program for module: 0x%x \n", hmod); if(!dev_program->createGlobalVarObj(&amd_mem_obj_, &device_ptr_, &size_, name.c_str())) { guarantee(false, "Cannot create GlobalVar Obj for symbol: %s \n", name.c_str()); } // Handle size 0 symbols if (size_ != 0) { if (amd_mem_obj_ == nullptr || device_ptr_ == nullptr) { LogPrintfError("Cannot get memory for creating device Var: %s", name.c_str()); guarantee(false, "Cannot get memory for creating device var"); } amd::MemObjMap::AddMemObj(device_ptr_, amd_mem_obj_); } } DeviceVar::~DeviceVar() { if 
(amd_mem_obj_ != nullptr) { amd::MemObjMap::RemoveMemObj(device_ptr_); amd_mem_obj_->release(); } if (shadowVptr != nullptr) { textureReference* texRef = reinterpret_cast(shadowVptr); hipError_t err = ihipUnbindTexture(texRef); delete texRef; shadowVptr = nullptr; } device_ptr_ = nullptr; size_ = 0; } //Device Functions DeviceFunc::DeviceFunc(std::string name, hipModule_t hmod) : dflock_("function lock"), name_(name), kernel_(nullptr) { amd::Program* program = as_amd(reinterpret_cast(hmod)); const amd::Symbol *symbol = program->findSymbol(name.c_str()); guarantee(symbol != nullptr, "Cannot find Symbol with name: %s \n", name.c_str()); kernel_ = new amd::Kernel(*program, *symbol, name); guarantee(kernel_ != nullptr, "Cannot Create kernel with name: %s \n", name.c_str()); } DeviceFunc::~DeviceFunc() { if (kernel_ != nullptr) { kernel_->release(); } } //Abstract functions Function::Function(const std::string& name, FatBinaryInfo** modules) : name_(name), modules_(modules) { dFunc_.resize(g_devices.size()); } Function::~Function() { for (auto& elem : dFunc_) { delete elem; } name_ = ""; modules_ = nullptr; } hipError_t Function::getDynFunc(hipFunction_t* hfunc, hipModule_t hmod) { guarantee((dFunc_.size() == g_devices.size()), "dFunc Size mismatch"); if (dFunc_[ihipGetDevice()] == nullptr) { dFunc_[ihipGetDevice()] = new DeviceFunc(name_, hmod); } *hfunc = dFunc_[ihipGetDevice()]->asHipFunction(); return hipSuccess; } hipError_t Function::getStatFunc(hipFunction_t* hfunc, int deviceId) { guarantee(modules_ != nullptr, "Module not initialized"); hipModule_t hmod = nullptr; IHIP_RETURN_ONFAIL((*modules_)->BuildProgram(deviceId)); IHIP_RETURN_ONFAIL((*modules_)->GetModule(deviceId, &hmod)); if (dFunc_[deviceId] == nullptr) { dFunc_[deviceId] = new DeviceFunc(name_, hmod); } *hfunc = dFunc_[deviceId]->asHipFunction(); return hipSuccess; } hipError_t Function::getStatFuncAttr(hipFuncAttributes* func_attr, int deviceId) { guarantee((modules_ != nullptr), "Module not initialized"); hipModule_t hmod = nullptr; IHIP_RETURN_ONFAIL((*modules_)->BuildProgram(deviceId)); IHIP_RETURN_ONFAIL((*modules_)->GetModule(deviceId, &hmod)); if (dFunc_[deviceId] == nullptr) { dFunc_[deviceId] = new DeviceFunc(name_, hmod); } const std::vector& devices = amd::Device::getDevices(CL_DEVICE_TYPE_GPU, false); amd::Kernel* kernel = dFunc_[deviceId]->kernel(); const device::Kernel::WorkGroupInfo* wginfo = kernel->getDeviceKernel(*devices[deviceId])->workGroupInfo(); func_attr->sharedSizeBytes = static_cast(wginfo->localMemSize_); func_attr->binaryVersion = static_cast(kernel->signature().version()); func_attr->cacheModeCA = 0; func_attr->constSizeBytes = 0; func_attr->localSizeBytes = wginfo->privateMemSize_; func_attr->maxDynamicSharedSizeBytes = static_cast(wginfo->availableLDSSize_ - wginfo->localMemSize_); func_attr->maxThreadsPerBlock = static_cast(wginfo->size_); func_attr->numRegs = static_cast(wginfo->usedVGPRs_); func_attr->preferredShmemCarveout = 0; func_attr->ptxVersion = 30; return hipSuccess; } //Abstract Vars Var::Var(const std::string& name, DeviceVarKind dVarKind, size_t size, int type, int norm, FatBinaryInfo** modules) : name_(name), dVarKind_(dVarKind), size_(size), type_(type), norm_(norm), modules_(modules), managedVarPtr_(nullptr), align_(0) { dVar_.resize(g_devices.size()); } Var::Var(const std::string& name, DeviceVarKind dVarKind, void *pointer, size_t size, unsigned align, FatBinaryInfo** modules) : name_(name), dVarKind_(dVarKind), size_(size), modules_(modules), managedVarPtr_(pointer), 
align_(align), type_(0), norm_(0) { dVar_.resize(g_devices.size()); } Var::~Var() { for (auto& elem : dVar_) { delete elem; } modules_ = nullptr; } hipError_t Var::getDeviceVar(DeviceVar** dvar, int deviceId, hipModule_t hmod) { guarantee((deviceId >= 0), "Invalid DeviceId, less than zero"); guarantee((static_cast(deviceId) < g_devices.size()), "Invalid DeviceId, greater than no of code objects"); guarantee((dVar_.size() == g_devices.size()), "Device Var not initialized to size"); if (dVar_[deviceId] == nullptr) { dVar_[deviceId] = new DeviceVar(name_, hmod, deviceId); } *dvar = dVar_[deviceId]; return hipSuccess; } hipError_t Var::getStatDeviceVar(DeviceVar** dvar, int deviceId) { guarantee((deviceId >= 0) , "Invalid DeviceId, less than zero"); guarantee((static_cast(deviceId) < g_devices.size()), "Invalid DeviceId, greater than no of code objects"); if (dVar_[deviceId] == nullptr) { hipModule_t hmod = nullptr; IHIP_RETURN_ONFAIL((*modules_)->BuildProgram(deviceId)); IHIP_RETURN_ONFAIL((*modules_)->GetModule(deviceId, &hmod)); dVar_[deviceId] = new DeviceVar(name_, hmod, deviceId); } *dvar = dVar_[deviceId]; return hipSuccess; } }; //namespace: hip clr-rocm-5.7.1/hipamd/src/hip_global.hpp000066400000000000000000000101101450307266000201000ustar00rootroot00000000000000#ifndef HIP_GLOBAL_HPP #define HIP_GLOBAL_HPP #include #include #include "hip/hip_runtime_api.h" #include "hip/hip_runtime.h" #include "hip_internal.hpp" #include "hip_fatbin.hpp" #include "platform/program.hpp" namespace hip { //Forward Declaration class CodeObject; //Device Structures class DeviceVar { public: DeviceVar(std::string name, hipModule_t hmod, int deviceId); ~DeviceVar(); //Accessors for device ptr and size, populated during constructor. hipDeviceptr_t device_ptr() const { return device_ptr_; } size_t size() const { return size_; } std::string name() const { return name_; } void* shadowVptr; private: std::string name_; //Name of the var amd::Memory* amd_mem_obj_; //amd_mem_obj abstraction hipDeviceptr_t device_ptr_; //Device Pointer size_t size_; //Size of the var }; class DeviceFunc { public: DeviceFunc(std::string name, hipModule_t hmod); ~DeviceFunc(); amd::Monitor dflock_; //Converts DeviceFunc to hipFunction_t(used by app) and vice versa. hipFunction_t asHipFunction() { return reinterpret_cast(this); } static DeviceFunc* asFunction(hipFunction_t f) { return reinterpret_cast(f); } //Accessor for kernel_ and name_ populated during constructor. std::string name() const { return name_; } amd::Kernel* kernel() const { return kernel_; } private: std::string name_; //name of the func(not unique identifier) amd::Kernel* kernel_; //Kernel ptr referencing to ROCclr Symbol }; //Abstract Structures class Function { public: Function(const std::string& name, FatBinaryInfo** modules=nullptr); ~Function(); //Return DeviceFunc for this this dynamically loaded module hipError_t getDynFunc(hipFunction_t* hfunc, hipModule_t hmod); //Return Device Func & attr . Generate/build if not already done so. 
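// --- Illustrative sketch added by the editor; not part of the original source. ---
// Two lookup paths serve the two module kinds: getDynFunc() resolves kernels in
// modules the application loaded explicitly (hipModuleLoad*), while the static
// path below lazily builds the embedded fat binary for the requested device.
// A hypothetical runtime-side use of the static path:
//
//   hipFunction_t hfunc = nullptr;
//   hip::Function* fn = /* obtained from the runtime's symbol tables */;
//   if (fn->getStatFunc(&hfunc, device_id) == hipSuccess) {
//     // hfunc now wraps the per-device DeviceFunc / amd::Kernel pair.
//   }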
hipError_t getStatFunc(hipFunction_t *hfunc, int deviceId); hipError_t getStatFuncAttr(hipFuncAttributes* func_attr, int deviceId); void resize_dFunc(size_t size) { dFunc_.resize(size); } FatBinaryInfo** moduleInfo() { return modules_; } const std::string& name() const { return name_; } private: std::vector dFunc_; //DeviceFuncObj per Device std::string name_; //name of the func(not unique identifier) FatBinaryInfo** modules_; // static module where it is referenced }; class Var { public: //Types of variable enum DeviceVarKind { DVK_Variable = 0, DVK_Surface, DVK_Texture, DVK_Managed }; Var(const std::string& name, DeviceVarKind dVarKind, size_t size, int type, int norm, FatBinaryInfo** modules = nullptr); Var(const std::string& name, DeviceVarKind dVarKind, void *pointer, size_t size, unsigned align, FatBinaryInfo** modules = nullptr); ~Var(); //Return DeviceVar for this dynamically loaded module hipError_t getDeviceVar(DeviceVar** dvar, int deviceId, hipModule_t hmod); //Return DeviceVar for module Generate/build if not already done so. hipError_t getStatDeviceVar(DeviceVar** dvar, int deviceId); void resize_dVar(size_t size) { dVar_.resize(size); } FatBinaryInfo** moduleInfo() { return modules_; }; DeviceVarKind getVarKind() const { return dVarKind_; } size_t getSize() const { return size_; } void* getManagedVarPtr() { return managedVarPtr_; }; void setManagedVarInfo(void* pointer, size_t size) { managedVarPtr_ = pointer; size_ = size; dVarKind_ = DVK_Managed; } private: std::vector dVar_; // DeviceVarObj per Device std::string name_; // Variable name (not unique identifier) DeviceVarKind dVarKind_; // Variable kind size_t size_; // Size of the variable int type_; // Type(Textures/Surfaces only) int norm_; // Type(Textures/Surfaces only) FatBinaryInfo** modules_; // static module where it is referenced void *managedVarPtr_; // Managed memory pointer with size_ & align_ unsigned int align_; // Managed memory alignment }; }; //namespace: hip #endif /* HIP_GLOBAL_HPP */ clr-rocm-5.7.1/hipamd/src/hip_graph.cpp000066400000000000000000003047111450307266000177510ustar00rootroot00000000000000/* Copyright (c) 2021 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "top.hpp" #include "hip_graph_internal.hpp" #include "platform/command.hpp" #include "hip_conversions.hpp" #include "hip_platform.hpp" #include "hip_event.hpp" #include "hip_mempool_impl.hpp" std::vector g_captureStreams; amd::Monitor g_captureStreamsLock{"StreamCaptureGlobalList"}; amd::Monitor g_streamSetLock{"StreamCaptureset"}; std::unordered_set g_allCapturingStreams; inline hipError_t ihipGraphAddNode(hipGraphNode_t graphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, bool capture = true) { graph->AddNode(graphNode); std::unordered_set DuplicateDep; for (size_t i = 0; i < numDependencies; i++) { if ((!hipGraphNode::isNodeValid(pDependencies[i])) || (graph != pDependencies[i]->GetParentGraph())) { return hipErrorInvalidValue; } if (DuplicateDep.find(pDependencies[i]) != DuplicateDep.end()) { return hipErrorInvalidValue; } DuplicateDep.insert(pDependencies[i]); pDependencies[i]->AddEdge(graphNode); } if (capture == false) { { amd::ScopedLock lock(g_streamSetLock); for (auto stream : g_allCapturingStreams) { if (stream->GetCaptureGraph() == graph) { graph->AddManualNodeDuringCapture(graphNode); break; } } } } return hipSuccess; } hipError_t ihipGraphAddKernelNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, const hipKernelNodeParams* pNodeParams, bool capture = true) { if (pGraphNode == nullptr || graph == nullptr || (numDependencies > 0 && pDependencies == nullptr) || pNodeParams == nullptr || pNodeParams->func == nullptr) { return hipErrorInvalidValue; } if (!ihipGraph::isGraphValid(graph)) { return hipErrorInvalidValue; } // If neither 'kernelParams' or 'extra' are provided or if both are provided, return error if ((pNodeParams->kernelParams == nullptr && pNodeParams->extra == nullptr) || (pNodeParams->kernelParams != nullptr && pNodeParams->extra != nullptr)) { return hipErrorInvalidValue; } hipError_t status = hipGraphKernelNode::validateKernelParams(pNodeParams); if (hipSuccess != status) { return status; } size_t globalWorkSizeX = static_cast(pNodeParams->gridDim.x) * pNodeParams->blockDim.x; size_t globalWorkSizeY = static_cast(pNodeParams->gridDim.y) * pNodeParams->blockDim.y; size_t globalWorkSizeZ = static_cast(pNodeParams->gridDim.z) * pNodeParams->blockDim.z; if (globalWorkSizeX > std::numeric_limits::max() || globalWorkSizeY > std::numeric_limits::max() || globalWorkSizeZ > std::numeric_limits::max()) { return hipErrorInvalidConfiguration; } *pGraphNode = new hipGraphKernelNode(pNodeParams); status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies, capture); return status; } hipError_t ihipGraphAddMemcpyNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, const hipMemcpy3DParms* pCopyParams, bool capture = true) { if (pGraphNode == nullptr || graph == nullptr || (numDependencies > 0 && pDependencies == nullptr) || pCopyParams == nullptr) { return hipErrorInvalidValue; } hipError_t status = ihipMemcpy3D_validate(pCopyParams); if (status != hipSuccess) { return status; } *pGraphNode = new hipGraphMemcpyNode(pCopyParams); status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies, capture); return status; } hipError_t ihipGraphAddMemcpyNode1D(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, void* dst, const void* src, size_t count, hipMemcpyKind kind, bool capture = true) { if (pGraphNode == nullptr || graph == 
nullptr || (numDependencies > 0 && pDependencies == nullptr) || count ==0) { return hipErrorInvalidValue; } hipError_t status = hipGraphMemcpyNode1D::ValidateParams(dst, src, count, kind); if (status != hipSuccess) { return status; } *pGraphNode = new hipGraphMemcpyNode1D(dst, src, count, kind); status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies, capture); return status; } hipError_t ihipGraphAddMemsetNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, const hipMemsetParams* pMemsetParams, bool capture = true) { if (pGraphNode == nullptr || graph == nullptr || pMemsetParams == nullptr || (numDependencies > 0 && pDependencies == nullptr) || pMemsetParams->height == 0) { return hipErrorInvalidValue; } // The element size must be 1, 2, or 4 bytes if (pMemsetParams->elementSize != sizeof(int8_t) && pMemsetParams->elementSize != sizeof(int16_t) && pMemsetParams->elementSize != sizeof(int32_t)) { return hipErrorInvalidValue; } hipError_t status; status = ihipGraphMemsetParams_validate(pMemsetParams); if (status != hipSuccess) { return status; } if (pMemsetParams->height == 1) { status = ihipMemset_validate(pMemsetParams->dst, pMemsetParams->value, pMemsetParams->elementSize, pMemsetParams->width * pMemsetParams->elementSize); } else { if (pMemsetParams->pitch < (pMemsetParams->width * pMemsetParams->elementSize)) { return hipErrorInvalidValue; } auto sizeBytes = pMemsetParams->width * pMemsetParams->height * pMemsetParams->elementSize * 1; status = ihipMemset3D_validate( {pMemsetParams->dst, pMemsetParams->pitch, pMemsetParams->width, pMemsetParams->height}, pMemsetParams->value, {pMemsetParams->width, pMemsetParams->height, 1}, sizeBytes); } if (status != hipSuccess) { return status; } *pGraphNode = new hipGraphMemsetNode(pMemsetParams); status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies, capture); return status; } hipError_t capturehipLaunchKernel(hipStream_t& stream, const void*& hostFunction, dim3& gridDim, dim3& blockDim, void**& args, size_t& sharedMemBytes) { ClPrint(amd::LOG_INFO, amd::LOG_API, "[hipGraph] current capture node kernel launch on stream : %p", stream); if (!hip::isValid(stream)) { return hipErrorContextIsDestroyed; } hip::Stream* s = reinterpret_cast(stream); hipKernelNodeParams nodeParams; nodeParams.func = const_cast(hostFunction); nodeParams.blockDim = blockDim; nodeParams.extra = nullptr; nodeParams.gridDim = gridDim; nodeParams.kernelParams = args; nodeParams.sharedMemBytes = sharedMemBytes; hipGraphNode_t pGraphNode; hipError_t status = ihipGraphAddKernelNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(), s->GetLastCapturedNodes().size(), &nodeParams); if (status != hipSuccess) { return status; } s->SetLastCapturedNode(pGraphNode); return hipSuccess; } hipError_t ihipExtLaunchKernel(hipStream_t stream, hipFunction_t f, uint32_t globalWorkSizeX, uint32_t globalWorkSizeY, uint32_t globalWorkSizeZ, uint32_t localWorkSizeX, uint32_t localWorkSizeY, uint32_t localWorkSizeZ, size_t sharedMemBytes, void** kernelParams, void** extra, hipEvent_t startEvent, hipEvent_t stopEvent, uint32_t flags, bool capture = true) { if (!hip::isValid(stream)) { return hipErrorContextIsDestroyed; } hip::Stream* s = reinterpret_cast(stream); hipGraphNode_t pGraphNode; hipError_t status; if (startEvent != nullptr) { pGraphNode = new hipGraphEventRecordNode(startEvent); status = ihipGraphAddNode(pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(), 
                              s->GetLastCapturedNodes().size(), capture);
    if (status != hipSuccess) {
      return status;
    }
    s->SetLastCapturedNode(pGraphNode);
  }
  hipKernelNodeParams nodeParams;
  nodeParams.func = f;
  nodeParams.blockDim = dim3(localWorkSizeX, localWorkSizeY, localWorkSizeZ);
  nodeParams.extra = extra;
  nodeParams.gridDim = dim3(globalWorkSizeX / localWorkSizeX, globalWorkSizeY / localWorkSizeY,
                            globalWorkSizeZ / localWorkSizeZ);
  nodeParams.kernelParams = kernelParams;
  nodeParams.sharedMemBytes = sharedMemBytes;
  status =
      ihipGraphAddKernelNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &nodeParams);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  if (stopEvent != nullptr) {
    pGraphNode = new hipGraphEventRecordNode(stopEvent);
    status = ihipGraphAddNode(pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                              s->GetLastCapturedNodes().size());
    if (status != hipSuccess) {
      return status;
    }
    s->SetLastCapturedNode(pGraphNode);
  }
  return hipSuccess;
}

hipError_t capturehipExtModuleLaunchKernel(hipStream_t& stream, hipFunction_t& f,
                                           uint32_t& globalWorkSizeX, uint32_t& globalWorkSizeY,
                                           uint32_t& globalWorkSizeZ, uint32_t& localWorkSizeX,
                                           uint32_t& localWorkSizeY, uint32_t& localWorkSizeZ,
                                           size_t& sharedMemBytes, void**& kernelParams,
                                           void**& extra, hipEvent_t& startEvent,
                                           hipEvent_t& stopEvent, uint32_t& flags) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Ext Module launch kernel on stream : %p", stream);
  return ihipExtLaunchKernel(stream, f, globalWorkSizeX, globalWorkSizeY, globalWorkSizeZ,
                             localWorkSizeX, localWorkSizeY, localWorkSizeZ, sharedMemBytes,
                             kernelParams, extra, startEvent, stopEvent, flags);
}

hipError_t capturehipExtLaunchKernel(hipStream_t& stream, const void*& hostFunction, dim3& gridDim,
                                     dim3& blockDim, void**& args, size_t& sharedMemBytes,
                                     hipEvent_t& startEvent, hipEvent_t& stopEvent, int& flags) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Ext kernel launch on stream : %p", stream);
  return ihipExtLaunchKernel(
      stream, reinterpret_cast<hipFunction_t>(const_cast<void*>(hostFunction)),
      gridDim.x * blockDim.x, gridDim.y * blockDim.y, gridDim.z * blockDim.z, blockDim.x,
      blockDim.y, blockDim.z, sharedMemBytes, args, nullptr, startEvent, stopEvent, flags);
}

hipError_t capturehipModuleLaunchKernel(hipStream_t& stream, hipFunction_t& f, uint32_t& gridDimX,
                                        uint32_t& gridDimY, uint32_t& gridDimZ,
                                        uint32_t& blockDimX, uint32_t& blockDimY,
                                        uint32_t& blockDimZ, uint32_t& sharedMemBytes,
                                        void**& kernelParams, void**& extra) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node module launch kernel launch on stream : %p", stream);
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipKernelNodeParams nodeParams;
  nodeParams.func = f;
  nodeParams.blockDim = {blockDimX, blockDimY, blockDimZ};
  nodeParams.extra = extra;
  nodeParams.gridDim = {gridDimX, gridDimY, gridDimZ};
  nodeParams.kernelParams = kernelParams;
  nodeParams.sharedMemBytes = sharedMemBytes;
  hipGraphNode_t pGraphNode;
  hipError_t status =
      ihipGraphAddKernelNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &nodeParams);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemcpy3DAsync(hipStream_t& stream, const hipMemcpy3DParms*& p) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Memcpy3D on stream : %p", stream);
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode;
  hipError_t status =
      ihipGraphAddMemcpyNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), p);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemcpy2DAsync(hipStream_t& stream, void*& dst, size_t& dpitch,
                                   const void*& src, size_t& spitch, size_t& width, size_t& height,
                                   hipMemcpyKind& kind) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Memcpy2D on stream : %p", stream);
  if (dst == nullptr || src == nullptr) {
    return hipErrorInvalidValue;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode;
  hipMemcpy3DParms p = {};
  memset(&p, 0, sizeof(p));
  p.kind = kind;
  p.srcPtr.ptr = const_cast<void*>(src);
  p.srcPtr.pitch = spitch;
  p.srcArray = nullptr;  // Ignored.
  p.dstPtr.ptr = const_cast<void*>(dst);
  p.dstPtr.pitch = dpitch;
  p.dstArray = nullptr;  // Ignored.
  p.extent = {width, height, 1};
  hipError_t status =
      ihipGraphAddMemcpyNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &p);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemcpy2DFromArrayAsync(hipStream_t& stream, void*& dst, size_t& dpitch,
                                            hipArray_const_t& src, size_t& wOffsetSrc,
                                            size_t& hOffsetSrc, size_t& width, size_t& height,
                                            hipMemcpyKind& kind) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Memcpy2DFromArray on stream : %p", stream);
  if (src == nullptr || dst == nullptr) {
    return hipErrorInvalidValue;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode;
  hipMemcpy3DParms p = {};
  memset(&p, 0, sizeof(p));
  p.srcPos = {wOffsetSrc, hOffsetSrc, 0};
  p.kind = kind;
  p.srcPtr.ptr = nullptr;  // Ignored.
  p.srcArray = const_cast<hipArray*>(src);
  p.kind = kind;
  p.dstPtr.ptr = dst;
  p.dstArray = nullptr;  // Ignored.
  p.dstPtr.pitch = dpitch;
  p.extent = {width / hip::getElementSize(p.srcArray), height, 1};
  hipError_t status =
      ihipGraphAddMemcpyNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &p);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemcpyFromArrayAsync(hipStream_t& stream, void*& dst, hipArray_const_t& src,
                                          size_t& wOffsetSrc, size_t& hOffsetSrc, size_t& count,
                                          hipMemcpyKind& kind) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Memcpy2DFromArray on stream : %p", stream);
  if (src == nullptr || dst == nullptr) {
    return hipErrorInvalidValue;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode;
  hipMemcpy3DParms p = {};
  memset(&p, 0, sizeof(p));
  p.srcPos = {wOffsetSrc, hOffsetSrc, 0};
  p.kind = kind;
  p.srcPtr.ptr = nullptr;  // Ignored.
  p.srcArray = const_cast<hipArray*>(src);
  p.kind = kind;
  p.dstPtr.ptr = dst;
  p.dstArray = nullptr;  // Ignored.
  p.dstPtr.pitch = 0;
  const size_t arrayHeight = (src->height != 0) ? src->height : 1;
  const size_t widthInBytes = count / arrayHeight;
  const size_t height = (count / src->width) / hip::getElementSize(src);
  p.extent = {widthInBytes / hip::getElementSize(p.srcArray), height, 1};
  hipError_t status =
      ihipGraphAddMemcpyNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &p);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemcpy2DToArrayAsync(hipStream_t& stream, hipArray*& dst, size_t& wOffset,
                                          size_t& hOffset, const void*& src, size_t& spitch,
                                          size_t& width, size_t& height, hipMemcpyKind& kind) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Memcpy2DFromArray on stream : %p", stream);
  if (src == nullptr || dst == nullptr) {
    return hipErrorInvalidValue;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode;
  hipMemcpy3DParms p = {};
  memset(&p, 0, sizeof(p));
  p.dstPos = {wOffset, hOffset, 0};
  p.kind = kind;
  p.dstPtr.ptr = nullptr;  // Ignored.
  p.dstArray = dst;
  p.kind = kind;
  p.srcPtr.ptr = const_cast<void*>(src);
  p.srcArray = nullptr;  // Ignored.
  p.srcPtr.pitch = spitch;
  p.extent = {width / hip::getElementSize(p.dstArray), height, 1};
  hipError_t status =
      ihipGraphAddMemcpyNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &p);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemcpyToArrayAsync(hipStream_t& stream, hipArray_t& dst, size_t& wOffset,
                                        size_t& hOffset, const void*& src, size_t& count,
                                        hipMemcpyKind& kind) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Memcpy2DFromArray on stream : %p", stream);
  if (src == nullptr || dst == nullptr) {
    return hipErrorInvalidValue;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode;
  hipMemcpy3DParms p = {};
  memset(&p, 0, sizeof(p));
  p.dstPos = {wOffset, hOffset, 0};
  p.kind = kind;
  p.dstPtr.ptr = nullptr;  // Ignored.
  p.dstArray = dst;
  p.kind = kind;
  p.srcPtr.ptr = const_cast<void*>(src);
  p.srcArray = nullptr;  // Ignored.
  p.srcPtr.pitch = 0;
  const size_t arrayHeight = (dst->height != 0) ? dst->height : 1;
  const size_t widthInBytes = count / arrayHeight;
  const size_t height = (count / dst->width) / hip::getElementSize(dst);
  p.extent = {widthInBytes / hip::getElementSize(p.dstArray), height, 1};
  hipError_t status =
      ihipGraphAddMemcpyNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &p);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemcpyParam2DAsync(hipStream_t& stream, const hip_Memcpy2D*& pCopy) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node MemcpyParam2D on stream : %p", stream);
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode;
  hipMemcpy3DParms p = {};
  memset(&p, 0, sizeof(p));
  p.srcArray = pCopy->srcArray;
  p.srcPos = {pCopy->srcXInBytes, pCopy->srcY, 0};
  p.srcPtr.pitch = pCopy->srcPitch;
  if (pCopy->srcDevice != nullptr) {
    p.srcPtr.ptr = pCopy->srcDevice;
  }
  if (pCopy->srcHost != nullptr) {
    p.srcPtr.ptr = const_cast<void*>(pCopy->srcHost);
  }
  p.dstArray = pCopy->dstArray;
  p.dstPos = {pCopy->dstXInBytes, pCopy->dstY, 0};
  p.dstPtr.pitch = pCopy->dstPitch;
  if (pCopy->dstDevice != nullptr) {
    p.dstPtr.ptr = pCopy->dstDevice;
  }
  if (pCopy->dstHost != nullptr) {
    p.dstPtr.ptr = const_cast<void*>(pCopy->dstHost);
  }
  p.extent = {pCopy->WidthInBytes, pCopy->Height, 1};
  if (pCopy->srcMemoryType == hipMemoryTypeHost && pCopy->dstMemoryType == hipMemoryTypeDevice) {
    p.kind = hipMemcpyHostToDevice;
  } else if (pCopy->srcMemoryType == hipMemoryTypeDevice &&
             pCopy->dstMemoryType == hipMemoryTypeHost) {
    p.kind = hipMemcpyDeviceToHost;
  } else if (pCopy->srcMemoryType == hipMemoryTypeDevice &&
             pCopy->dstMemoryType == hipMemoryTypeDevice) {
    p.kind = hipMemcpyDeviceToDevice;
  }
  hipError_t status =
      ihipGraphAddMemcpyNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &p);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemcpyAtoHAsync(hipStream_t& stream, void*& dstHost, hipArray*& srcArray,
                                     size_t& srcOffset, size_t& ByteCount) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node MemcpyParam2D on stream : %p", stream);
  if (srcArray == nullptr || dstHost == nullptr) {
    return hipErrorInvalidValue;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode;
  hipMemcpy3DParms p = {};
  memset(&p, 0, sizeof(p));
  p.srcArray = srcArray;
  p.srcPos = {srcOffset, 0, 0};
  p.dstPtr.ptr = dstHost;
  p.extent = {ByteCount / hip::getElementSize(p.srcArray), 1, 1};
  hipError_t status =
      ihipGraphAddMemcpyNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &p);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemcpyHtoAAsync(hipStream_t& stream, hipArray*& dstArray, size_t& dstOffset,
                                     const void*& srcHost, size_t& ByteCount) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node MemcpyParam2D on stream : %p", stream);
  if (dstArray == nullptr || srcHost == nullptr) {
    return hipErrorInvalidValue;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode;
  hipMemcpy3DParms p = {};
  memset(&p, 0, sizeof(p));
  p.dstArray = dstArray;
  p.dstPos = {dstOffset, 0, 0};
  p.srcPtr.ptr = const_cast<void*>(srcHost);
  p.extent = {ByteCount / hip::getElementSize(p.dstArray), 1, 1};
  hipError_t status =
      ihipGraphAddMemcpyNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &p);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemcpy(hipStream_t stream, void* dst, const void* src, size_t sizeBytes,
                            hipMemcpyKind kind) {
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  std::vector<hipGraphNode_t> pDependencies = s->GetLastCapturedNodes();
  size_t numDependencies = s->GetLastCapturedNodes().size();
  hipGraph_t graph = s->GetCaptureGraph();
  hipError_t status = ihipMemcpy_validate(dst, src, sizeBytes, kind);
  if (status != hipSuccess) {
    return status;
  }
  hipGraphNode_t node = new hipGraphMemcpyNode1D(dst, src, sizeBytes, kind);
  status = ihipGraphAddNode(node, graph, pDependencies.data(), numDependencies);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(node);
  return hipSuccess;
}

hipError_t capturehipMemcpyAsync(hipStream_t& stream, void*& dst, const void*& src,
                                 size_t& sizeBytes, hipMemcpyKind& kind) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Memcpy1D on stream : %p", stream);
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  return capturehipMemcpy(stream, dst, src, sizeBytes, kind);
}

hipError_t capturehipMemcpyHtoDAsync(hipStream_t& stream, hipDeviceptr_t& dstDevice,
                                     void*& srcHost, size_t& ByteCount, hipMemcpyKind& kind) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node MemcpyHtoD on stream : %p", stream);
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  return capturehipMemcpy(stream, dstDevice, srcHost, ByteCount, kind);
}

hipError_t capturehipMemcpyDtoDAsync(hipStream_t& stream, hipDeviceptr_t& dstDevice,
                                     hipDeviceptr_t& srcDevice, size_t& ByteCount,
                                     hipMemcpyKind& kind) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node hipMemcpyDtoD on stream : %p", stream);
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  return capturehipMemcpy(stream, dstDevice, srcDevice, ByteCount, kind);
}

hipError_t capturehipMemcpyDtoHAsync(hipStream_t& stream, void*& dstHost,
                                     hipDeviceptr_t& srcDevice, size_t& ByteCount,
                                     hipMemcpyKind& kind) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node hipMemcpyDtoH on stream : %p", stream);
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  return capturehipMemcpy(stream, dstHost, srcDevice, ByteCount, kind);
}

hipError_t capturehipMemcpyFromSymbolAsync(hipStream_t& stream, void*& dst, const void*& symbol,
                                           size_t& sizeBytes, size_t& offset,
                                           hipMemcpyKind& kind) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node MemcpyFromSymbolNode on stream : %p", stream);
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  size_t sym_size = 0;
  hipDeviceptr_t device_ptr = nullptr;
  hipError_t status = ihipMemcpySymbol_validate(symbol, sizeBytes, offset, sym_size, device_ptr);
  if (status != hipSuccess) {
    HIP_RETURN(status);
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode =
      new hipGraphMemcpyNodeFromSymbol(dst, symbol, sizeBytes, offset, kind);
  status = ihipGraphAddNode(pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                            s->GetLastCapturedNodes().size());
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemcpyToSymbolAsync(hipStream_t& stream, const void*& symbol,
                                         const void*& src, size_t& sizeBytes, size_t& offset,
                                         hipMemcpyKind& kind) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node MemcpyToSymbolNode on stream : %p", stream);
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  size_t sym_size = 0;
  hipDeviceptr_t device_ptr = nullptr;
  hipError_t status = ihipMemcpySymbol_validate(symbol, sizeBytes, offset, sym_size, device_ptr);
  if (status != hipSuccess) {
    HIP_RETURN(status);
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode =
      new hipGraphMemcpyNodeToSymbol(symbol, src, sizeBytes, offset, kind);
  status = ihipGraphAddNode(pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                            s->GetLastCapturedNodes().size());
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemsetAsync(hipStream_t& stream, void*& dst, int& value, size_t& valueSize,
                                 size_t& sizeBytes) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Memset1D on stream : %p", stream);
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hipMemsetParams memsetParams = {0};
  memsetParams.dst = dst;
  memsetParams.value = value;
  memsetParams.elementSize = valueSize;
  memsetParams.width = sizeBytes / valueSize;
  memsetParams.height = 1;
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode;
  hipError_t status =
      ihipGraphAddMemsetNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &memsetParams);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemset2DAsync(hipStream_t& stream, void*& dst, size_t& pitch, int& value,
                                   size_t& width, size_t& height) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Memset2D on stream : %p", stream);
  hipMemsetParams memsetParams = {0};
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  memsetParams.dst = dst;
  memsetParams.value = value;
  memsetParams.width = width;
  memsetParams.height = height;
  memsetParams.pitch = pitch;
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode;
  hipError_t status =
      ihipGraphAddMemsetNode(&pGraphNode, s->GetCaptureGraph(), s->GetLastCapturedNodes().data(),
                             s->GetLastCapturedNodes().size(), &memsetParams);
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

hipError_t capturehipMemset3DAsync(hipStream_t& stream, hipPitchedPtr& pitchedDevPtr, int& value,
                                   hipExtent& extent) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node Memset3D on stream : %p", stream);
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  return hipSuccess;
}

hipError_t capturehipLaunchHostFunc(hipStream_t& stream, hipHostFn_t& fn, void*& userData) {
  ClPrint(amd::LOG_INFO, amd::LOG_API,
          "[hipGraph] current capture node host on stream : %p", stream);
  if (fn == nullptr) {
    return hipErrorInvalidValue;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hipHostNodeParams hostParams = {0};
  hostParams.fn = fn;
  hostParams.userData = userData;
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  hipGraphNode_t pGraphNode = new hipGraphHostNode(&hostParams);
  hipError_t status = ihipGraphAddNode(pGraphNode, s->GetCaptureGraph(),
                                       s->GetLastCapturedNodes().data(),
                                       s->GetLastCapturedNodes().size());
  if (status != hipSuccess) {
    return status;
  }
  s->SetLastCapturedNode(pGraphNode);
  return hipSuccess;
}

// ================================================================================================
hipError_t capturehipMallocAsync(hipStream_t stream, hipMemPool_t mem_pool, size_t size,
                                 void** dev_ptr) {
  auto s = reinterpret_cast<hip::Stream*>(stream);
  auto mpool = reinterpret_cast<hip::MemoryPool*>(mem_pool);
  hipMemAllocNodeParams node_params{};
  node_params.poolProps.allocType = hipMemAllocationTypePinned;
  node_params.poolProps.location.id = mpool->Device()->deviceId();
  node_params.poolProps.location.type = hipMemLocationTypeDevice;
  std::vector<hipMemAccessDesc> descs;
  for (const auto device : g_devices) {
    hipMemLocation location{hipMemLocationTypeDevice, device->deviceId()};
    hipMemAccessFlags flags{};
    mpool->GetAccess(device, &flags);
    descs.push_back({location, flags});
  }
  node_params.accessDescs = &descs[0];
  node_params.accessDescCount = descs.size();
  node_params.bytesize = size;
  auto mem_alloc_node = new hipGraphMemAllocNode(&node_params);
  auto status = ihipGraphAddNode(mem_alloc_node, s->GetCaptureGraph(),
                                 s->GetLastCapturedNodes().data(),
                                 s->GetLastCapturedNodes().size());
  if (status != hipSuccess) {
    return status;
  }
  // Without VM runtime executes the node during capture, so it can return a valid device pointer
  *dev_ptr = (HIP_MEM_POOL_USE_VM) ? mem_alloc_node->ReserveAddress() : mem_alloc_node->Execute(s);
  s->SetLastCapturedNode(mem_alloc_node);
  return hipSuccess;
}

// ================================================================================================
hipError_t capturehipFreeAsync(hipStream_t stream, void* dev_ptr) {
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  auto mem_free_node = new hipGraphMemFreeNode(dev_ptr);
  auto status = ihipGraphAddNode(mem_free_node, s->GetCaptureGraph(),
                                 s->GetLastCapturedNodes().data(),
                                 s->GetLastCapturedNodes().size());
  if (status != hipSuccess) {
    return status;
  }
  // Execute the node without VM support, so runtime can release memory into cache
  if (!HIP_MEM_POOL_USE_VM) {
    mem_free_node->Execute(s);
  }
  s->SetLastCapturedNode(mem_free_node);
  return hipSuccess;
}

// ================================================================================================
hipError_t hipStreamIsCapturing_common(hipStream_t stream,
                                       hipStreamCaptureStatus* pCaptureStatus) {
  if (pCaptureStatus == nullptr) {
    return hipErrorInvalidValue;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  if (hip::Stream::StreamCaptureBlocking() == true && stream == nullptr) {
    return hipErrorStreamCaptureImplicit;
  }
  if (stream == nullptr) {
    *pCaptureStatus = hipStreamCaptureStatusNone;
  } else {
    *pCaptureStatus = reinterpret_cast<hip::Stream*>(stream)->GetCaptureStatus();
  }
  return hipSuccess;
}

hipError_t hipStreamIsCapturing(hipStream_t stream, hipStreamCaptureStatus* pCaptureStatus) {
  HIP_INIT_API(hipStreamIsCapturing, stream, pCaptureStatus);
  HIP_RETURN(hipStreamIsCapturing_common(stream, pCaptureStatus));
}

hipError_t hipStreamIsCapturing_spt(hipStream_t stream, hipStreamCaptureStatus* pCaptureStatus) {
  HIP_INIT_API(hipStreamIsCapturing, stream, pCaptureStatus);
  PER_THREAD_DEFAULT_STREAM(stream);
  HIP_RETURN(hipStreamIsCapturing_common(stream, pCaptureStatus));
}

hipError_t hipThreadExchangeStreamCaptureMode(hipStreamCaptureMode* mode) {
  HIP_INIT_API(hipThreadExchangeStreamCaptureMode, mode);
  if (mode == nullptr || *mode < hipStreamCaptureModeGlobal ||
      *mode > hipStreamCaptureModeRelaxed) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  auto oldMode = hip::tls.stream_capture_mode_;
  hip::tls.stream_capture_mode_ = *mode;
  *mode = oldMode;
  HIP_RETURN_DURATION(hipSuccess);
}
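// Usage sketch (illustrative only, not part of the original file): the capture entry points
// below are normally driven through the public HIP API roughly as follows; `myKernel` and the
// launch geometry are hypothetical.
//
//   hipStream_t stream;
//   hipStreamCreate(&stream);
//   hipStreamBeginCapture(stream, hipStreamCaptureModeGlobal);
//   myKernel<<<dim3(32), dim3(256), 0, stream>>>();  // recorded as a kernel node, not executed
//   hipGraph_t graph = nullptr;
//   hipStreamEndCapture(stream, &graph);             // returns the captured graph
//   hipGraphExec_t graphExec = nullptr;
//   hipGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);
//   hipGraphLaunch(graphExec, stream);               // replays the captured work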
hipError_t hipStreamBeginCapture_common(hipStream_t stream, hipStreamCaptureMode mode) {
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  // capture cannot be initiated on legacy stream
  if (stream == nullptr) {
    return hipErrorStreamCaptureUnsupported;
  }
  if (mode < hipStreamCaptureModeGlobal || mode > hipStreamCaptureModeRelaxed) {
    return hipErrorInvalidValue;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  // It can be initiated if the stream is not already in capture mode
  if (s->GetCaptureStatus() == hipStreamCaptureStatusActive) {
    return hipErrorIllegalState;
  }
  s->SetCaptureGraph(new ihipGraph(s->GetDevice()));
  s->SetCaptureId();
  s->SetCaptureMode(mode);
  s->SetOriginStream();
  if (mode != hipStreamCaptureModeRelaxed) {
    hip::tls.capture_streams_.push_back(s);
  }
  if (mode == hipStreamCaptureModeGlobal) {
    amd::ScopedLock lock(g_captureStreamsLock);
    g_captureStreams.push_back(s);
  }
  {
    amd::ScopedLock lock(g_streamSetLock);
    g_allCapturingStreams.insert(s);
  }
  return hipSuccess;
}

hipError_t hipStreamBeginCapture(hipStream_t stream, hipStreamCaptureMode mode) {
  HIP_INIT_API(hipStreamBeginCapture, stream, mode);
  HIP_RETURN_DURATION(hipStreamBeginCapture_common(stream, mode));
}

hipError_t hipStreamBeginCapture_spt(hipStream_t stream, hipStreamCaptureMode mode) {
  HIP_INIT_API(hipStreamBeginCapture, stream, mode);
  PER_THREAD_DEFAULT_STREAM(stream);
  HIP_RETURN_DURATION(hipStreamBeginCapture_common(stream, mode));
}

hipError_t hipStreamEndCapture_common(hipStream_t stream, hipGraph_t* pGraph) {
  if (pGraph == nullptr) {
    return hipErrorInvalidValue;
  }
  if (stream == nullptr) {
    return hipErrorIllegalState;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  // Capture status must be active before endCapture can be initiated
  if (s->GetCaptureStatus() == hipStreamCaptureStatusNone) {
    return hipErrorIllegalState;
  }
  // Capture must be ended on the same stream in which it was initiated
  if (!s->IsOriginStream()) {
    return hipErrorStreamCaptureUnmatched;
  }
  // If mode is not hipStreamCaptureModeRelaxed, hipStreamEndCapture must be called on the stream
  // from the same thread
  const auto& it =
      std::find(hip::tls.capture_streams_.begin(), hip::tls.capture_streams_.end(), s);
  if (s->GetCaptureMode() != hipStreamCaptureModeRelaxed) {
    if (it == hip::tls.capture_streams_.end()) {
      return hipErrorStreamCaptureWrongThread;
    }
    hip::tls.capture_streams_.erase(it);
  }
  if (s->GetCaptureMode() == hipStreamCaptureModeGlobal) {
    amd::ScopedLock lock(g_captureStreamsLock);
    g_captureStreams.erase(std::find(g_captureStreams.begin(), g_captureStreams.end(), s));
  }
  // If capture was invalidated, due to a violation of the rules of stream capture
  if (s->GetCaptureStatus() == hipStreamCaptureStatusInvalidated) {
    *pGraph = nullptr;
    return hipErrorStreamCaptureInvalidated;
  }
  {
    amd::ScopedLock lock(g_streamSetLock);
    g_allCapturingStreams.erase(
        std::find(g_allCapturingStreams.begin(), g_allCapturingStreams.end(), s));
  }
  // check if all parallel streams have joined
  // Nodes that are removed from the dependency set via API hipStreamUpdateCaptureDependencies do
  // not result in hipErrorStreamCaptureUnjoined
  // add temporary node to check if all parallel streams have joined
  hipGraphNode_t pGraphNode;
  pGraphNode = new hipGraphEmptyNode();
  hipError_t status = ihipGraphAddNode(pGraphNode, s->GetCaptureGraph(),
                                       s->GetLastCapturedNodes().data(),
                                       s->GetLastCapturedNodes().size());
  if (s->GetCaptureGraph()->GetLeafNodeCount() > 1) {
    std::vector<hipGraphNode_t> leafNodes = s->GetCaptureGraph()->GetLeafNodes();
    std::unordered_set<hipGraphNode_t> nodes =
        s->GetCaptureGraph()->GetManualNodesDuringCapture();
    for (auto node : nodes) {
      const auto& fnode = std::find(leafNodes.begin(), leafNodes.end(), node);
      if (fnode != leafNodes.end()) {
        leafNodes.erase(fnode);
      }
    }
    const std::vector<hipGraphNode_t>& removedDepNodes = s->GetRemovedDependencies();
    bool foundInRemovedDep = false;
    for (auto leafNode : leafNodes) {
      for (auto node : removedDepNodes) {
        if (node == leafNode) {
          foundInRemovedDep = true;
        }
      }
    }
    // remove temporary node
    s->GetCaptureGraph()->RemoveNode(pGraphNode);
    s->GetCaptureGraph()->RemoveManualNodesDuringCapture();
    if (leafNodes.size() > 1 && foundInRemovedDep == false) {
      return hipErrorStreamCaptureUnjoined;
    }
  } else {
    // remove temporary node
    s->GetCaptureGraph()->RemoveNode(pGraphNode);
  }
  *pGraph = s->GetCaptureGraph();
  // end capture on all streams/events part of graph capture
  return s->EndCapture();
}

hipError_t hipStreamEndCapture(hipStream_t stream, hipGraph_t* pGraph) {
  HIP_INIT_API(hipStreamEndCapture, stream, pGraph);
  HIP_RETURN_DURATION(hipStreamEndCapture_common(stream, pGraph));
}

hipError_t hipStreamEndCapture_spt(hipStream_t stream, hipGraph_t* pGraph) {
  HIP_INIT_API(hipStreamEndCapture, stream, pGraph);
  PER_THREAD_DEFAULT_STREAM(stream);
  HIP_RETURN_DURATION(hipStreamEndCapture_common(stream, pGraph));
}

hipError_t hipGraphCreate(hipGraph_t* pGraph, unsigned int flags) {
  HIP_INIT_API(hipGraphCreate, pGraph, flags);
  if ((pGraph == nullptr) || (flags != 0)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *pGraph = new ihipGraph(hip::getCurrentDevice());
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphDestroy(hipGraph_t graph) {
  HIP_INIT_API(hipGraphDestroy, graph);
  if (graph == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  // if graph is not valid, it's destroyed already
  if (!ihipGraph::isGraphValid(graph)) {
    HIP_RETURN(hipErrorIllegalState);
  }
  delete graph;
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphAddKernelNode(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                 const hipGraphNode_t* pDependencies, size_t numDependencies,
                                 const hipKernelNodeParams* pNodeParams) {
  HIP_INIT_API(hipGraphAddKernelNode, pGraphNode, graph, pDependencies, numDependencies,
               pNodeParams);
  if (pGraphNode == nullptr || graph == nullptr || pNodeParams == nullptr ||
      (numDependencies > 0 && pDependencies == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN_DURATION(
      ihipGraphAddKernelNode(pGraphNode, graph, pDependencies, numDependencies, pNodeParams,
                             false));
}

hipError_t hipGraphAddMemcpyNode(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                 const hipGraphNode_t* pDependencies, size_t numDependencies,
                                 const hipMemcpy3DParms* pCopyParams) {
  HIP_INIT_API(hipGraphAddMemcpyNode, pGraphNode, graph, pDependencies, numDependencies,
               pCopyParams);
  HIP_RETURN_DURATION(
      ihipGraphAddMemcpyNode(pGraphNode, graph, pDependencies, numDependencies, pCopyParams,
                             false));
}

hipError_t hipGraphAddMemcpyNode1D(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                   const hipGraphNode_t* pDependencies, size_t numDependencies,
                                   void* dst, const void* src, size_t count, hipMemcpyKind kind) {
  HIP_INIT_API(hipGraphAddMemcpyNode1D, pGraphNode, graph, pDependencies, numDependencies, dst,
               src, count, kind);
  if (pGraphNode == nullptr || graph == nullptr ||
      (numDependencies > 0 && pDependencies == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN_DURATION(ihipGraphAddMemcpyNode1D(pGraphNode, graph, pDependencies, numDependencies,
                                               dst, src, count, kind, false));
}

hipError_t hipGraphMemcpyNodeSetParams1D(hipGraphNode_t node, void* dst, const void* src,
                                         size_t count, hipMemcpyKind kind) {
  HIP_INIT_API(hipGraphMemcpyNodeSetParams1D, node, dst, src, count, kind);
  if (!hipGraphNode::isNodeValid(node) || dst == nullptr || src == nullptr || count == 0 ||
      src == dst) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphMemcpyNode1D*>(node)->SetParams(dst, src, count, kind));
}

hipError_t hipGraphExecMemcpyNodeSetParams1D(hipGraphExec_t hGraphExec, hipGraphNode_t node,
                                             void* dst, const void* src, size_t count,
                                             hipMemcpyKind kind) {
  HIP_INIT_API(hipGraphExecMemcpyNodeSetParams1D, hGraphExec, node, dst, src, count, kind);
  if (hGraphExec == nullptr || !hipGraphNode::isNodeValid(node) || dst == nullptr ||
      src == nullptr || count == 0 || src == dst) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(node);
  if (clonedNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(
      reinterpret_cast<hipGraphMemcpyNode1D*>(clonedNode)->SetParams(dst, src, count, kind));
}

hipError_t hipGraphAddMemsetNode(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                 const hipGraphNode_t* pDependencies, size_t numDependencies,
                                 const hipMemsetParams* pMemsetParams) {
  HIP_INIT_API(hipGraphAddMemsetNode, pGraphNode, graph, pDependencies, numDependencies,
               pMemsetParams);
  if (pGraphNode == nullptr || graph == nullptr ||
      (numDependencies > 0 && pDependencies == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN_DURATION(ihipGraphAddMemsetNode(pGraphNode, graph, pDependencies, numDependencies,
                                             pMemsetParams, false));
}

hipError_t hipGraphAddEmptyNode(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                const hipGraphNode_t* pDependencies, size_t numDependencies) {
  HIP_INIT_API(hipGraphAddEmptyNode, pGraphNode, graph, pDependencies, numDependencies);
  if (pGraphNode == nullptr || graph == nullptr ||
      (numDependencies > 0 && pDependencies == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *pGraphNode = new hipGraphEmptyNode();
  hipError_t status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies, false);
  HIP_RETURN(status);
}

hipError_t hipGraphAddChildGraphNode(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                     const hipGraphNode_t* pDependencies, size_t numDependencies,
                                     hipGraph_t childGraph) {
  HIP_INIT_API(hipGraphAddChildGraphNode, pGraphNode, pDependencies, numDependencies, childGraph);
  if (pGraphNode == nullptr || graph == nullptr ||
      (numDependencies > 0 && pDependencies == nullptr) || childGraph == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *pGraphNode = new hipChildGraphNode(childGraph);
  hipError_t status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies, false);
  HIP_RETURN(status);
}

hipError_t ihipGraphInstantiate(hipGraphExec_t* pGraphExec, hipGraph_t graph,
                                uint64_t flags = 0) {
  if (pGraphExec == nullptr || graph == nullptr) {
    return hipErrorInvalidValue;
  }
  if (graph->IsGraphInstantiated() == true) {
    for (auto node : graph->GetNodes()) {
      if ((node->GetType() == hipGraphNodeTypeMemAlloc) ||
          (node->GetType() == hipGraphNodeTypeMemFree)) {
        return hipErrorNotSupported;
      }
    }
  }
  std::unordered_map<hipGraphNode_t, hipGraphNode_t> clonedNodes;
  hipGraph_t clonedGraph = graph->clone(clonedNodes);
  if (clonedGraph == nullptr) {
    return hipErrorInvalidValue;
  }
  std::vector<std::vector<hipGraphNode_t>> parallelLists;
  std::unordered_map<hipGraphNode_t, std::vector<hipGraphNode_t>> nodeWaitLists;
  std::unordered_set<hipUserObject*> graphExeUserObj;
  clonedGraph->GetRunList(parallelLists, nodeWaitLists);
  std::vector<hipGraphNode_t> graphNodes;
  if (false == clonedGraph->TopologicalOrder(graphNodes)) {
    return hipErrorInvalidValue;
  }
  clonedGraph->GetUserObjs(graphExeUserObj);
  *pGraphExec = new hipGraphExec(graphNodes, parallelLists, nodeWaitLists, clonedNodes,
                                 graphExeUserObj, flags);
  if (*pGraphExec != nullptr) {
    graph->SetGraphInstantiated(true);
    return (*pGraphExec)->Init();
  } else {
    return hipErrorOutOfMemory;
  }
}

hipError_t hipGraphInstantiate(hipGraphExec_t* pGraphExec, hipGraph_t graph,
                               hipGraphNode_t* pErrorNode, char* pLogBuffer, size_t bufferSize) {
  HIP_INIT_API(hipGraphInstantiate, pGraphExec, graph);
  HIP_RETURN_DURATION(ihipGraphInstantiate(pGraphExec, graph));
}

hipError_t hipGraphInstantiateWithFlags(hipGraphExec_t* pGraphExec, hipGraph_t graph,
                                        unsigned long long flags = 0) {
  HIP_INIT_API(hipGraphInstantiateWithFlags, pGraphExec, graph, flags);
  if (pGraphExec == nullptr || graph == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  // invalid flag check
  if (flags != 0 && flags != hipGraphInstantiateFlagAutoFreeOnLaunch) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN_DURATION(ihipGraphInstantiate(pGraphExec, graph, flags));
}

hipError_t hipGraphExecDestroy(hipGraphExec_t pGraphExec) {
  HIP_INIT_API(hipGraphExecDestroy, pGraphExec);
  if (pGraphExec == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  delete pGraphExec;
  HIP_RETURN(hipSuccess);
}

hipError_t ihipGraphLaunch(hipGraphExec_t graphExec, hipStream_t stream) {
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  return graphExec->Run(stream);
}

hipError_t hipGraphLaunch_common(hipGraphExec_t graphExec, hipStream_t stream) {
  if (graphExec == nullptr || !hipGraphExec::isGraphExecValid(graphExec)) {
    return hipErrorInvalidValue;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  return ihipGraphLaunch(graphExec, stream);
}

hipError_t hipGraphLaunch(hipGraphExec_t graphExec, hipStream_t stream) {
  HIP_INIT_API(hipGraphLaunch, graphExec, stream);
  HIP_RETURN_DURATION(hipGraphLaunch_common(graphExec, stream));
}

hipError_t hipGraphLaunch_spt(hipGraphExec_t graphExec, hipStream_t stream) {
  HIP_INIT_API(hipGraphLaunch, graphExec, stream);
  PER_THREAD_DEFAULT_STREAM(stream);
  HIP_RETURN_DURATION(hipGraphLaunch_common(graphExec, stream));
}

hipError_t hipGraphGetNodes(hipGraph_t graph, hipGraphNode_t* nodes, size_t* numNodes) {
  HIP_INIT_API(hipGraphGetNodes, graph, nodes, numNodes);
  if (graph == nullptr || numNodes == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  std::vector<hipGraphNode_t> graphNodes;
  if (false == graph->TopologicalOrder(graphNodes)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (nodes == nullptr) {
    *numNodes = graphNodes.size();
    HIP_RETURN(hipSuccess);
  } else if (*numNodes <= graphNodes.size()) {
    for (int i = 0; i < *numNodes; i++) {
      nodes[i] = graphNodes[i];
    }
  } else {
    for (int i = 0; i < graphNodes.size(); i++) {
      nodes[i] = graphNodes[i];
    }
    for (int i = graphNodes.size(); i < *numNodes; i++) {
      nodes[i] = nullptr;
    }
    *numNodes = graphNodes.size();
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphGetRootNodes(hipGraph_t graph, hipGraphNode_t* pRootNodes,
                                size_t* pNumRootNodes) {
  HIP_INIT_API(hipGraphGetRootNodes, graph, pRootNodes, pNumRootNodes);
  if (graph == nullptr || pNumRootNodes == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  const std::vector<hipGraphNode_t> nodes = graph->GetRootNodes();
  if (pRootNodes == nullptr) {
    *pNumRootNodes = nodes.size();
    HIP_RETURN(hipSuccess);
  } else if (*pNumRootNodes <= nodes.size()) {
    for (int i = 0; i < *pNumRootNodes; i++) {
      pRootNodes[i] = nodes[i];
    }
  } else {
    for (int i = 0; i < nodes.size(); i++) {
      pRootNodes[i] = nodes[i];
    }
    for (int i = nodes.size(); i < *pNumRootNodes; i++) {
      pRootNodes[i] = nullptr;
    }
    *pNumRootNodes = nodes.size();
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphKernelNodeGetParams(hipGraphNode_t node, hipKernelNodeParams* pNodeParams) {
  HIP_INIT_API(hipGraphKernelNodeGetParams, node, pNodeParams);
  if (!hipGraphNode::isNodeValid(node) || pNodeParams == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  reinterpret_cast<hipGraphKernelNode*>(node)->GetParams(pNodeParams);
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphKernelNodeSetParams(hipGraphNode_t node,
                                       const hipKernelNodeParams* pNodeParams) {
  HIP_INIT_API(hipGraphKernelNodeSetParams, node, pNodeParams);
  if (!hipGraphNode::isNodeValid(node) || pNodeParams == nullptr ||
      pNodeParams->func == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphKernelNode*>(node)->SetParams(pNodeParams));
}

hipError_t hipGraphMemcpyNodeGetParams(hipGraphNode_t node, hipMemcpy3DParms* pNodeParams) {
  HIP_INIT_API(hipGraphMemcpyNodeGetParams, node, pNodeParams);
  if (!hipGraphNode::isNodeValid(node) || pNodeParams == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  reinterpret_cast<hipGraphMemcpyNode*>(node)->GetParams(pNodeParams);
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphKernelNodeSetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr,
                                          const hipKernelNodeAttrValue* value) {
  HIP_INIT_API(hipGraphKernelNodeSetAttribute, hNode, attr, value);
  if (hNode == nullptr || value == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (attr != hipKernelNodeAttributeAccessPolicyWindow &&
      attr != hipKernelNodeAttributeCooperative) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphKernelNode*>(hNode)->SetAttrParams(attr, value));
}

hipError_t hipGraphKernelNodeGetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr,
                                          hipKernelNodeAttrValue* value) {
  HIP_INIT_API(hipGraphKernelNodeGetAttribute, hNode, attr, value);
  if (hNode == nullptr || value == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (attr != hipKernelNodeAttributeAccessPolicyWindow &&
      attr != hipKernelNodeAttributeCooperative) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphKernelNode*>(hNode)->GetAttrParams(attr, value));
}

hipError_t hipGraphMemcpyNodeSetParams(hipGraphNode_t node, const hipMemcpy3DParms* pNodeParams) {
  HIP_INIT_API(hipGraphMemcpyNodeSetParams, node, pNodeParams);
  if (!hipGraphNode::isNodeValid(node) || pNodeParams == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphMemcpyNode*>(node)->SetParams(pNodeParams));
}

hipError_t hipGraphExecMemcpyNodeSetParams(hipGraphExec_t hGraphExec, hipGraphNode_t node,
                                           hipMemcpy3DParms* pNodeParams) {
  HIP_INIT_API(hipGraphExecMemcpyNodeSetParams, hGraphExec, node, pNodeParams);
  if (hGraphExec == nullptr || !hipGraphNode::isNodeValid(node)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (ihipMemcpy3D_validate(pNodeParams) != hipSuccess) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  // Check if pNodeParams passed is an empty struct
  if (((pNodeParams->srcArray == 0) && (pNodeParams->srcPtr.ptr == nullptr)) ||
      ((pNodeParams->dstArray == 0) && (pNodeParams->dstPtr.ptr == nullptr))) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(node);
  if (clonedNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphMemcpyNode*>(clonedNode)->SetParams(pNodeParams));
}

hipError_t hipGraphMemsetNodeGetParams(hipGraphNode_t node, hipMemsetParams* pNodeParams) {
  HIP_INIT_API(hipGraphMemsetNodeGetParams, node, pNodeParams);
  if (!hipGraphNode::isNodeValid(node) || pNodeParams == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  reinterpret_cast<hipGraphMemsetNode*>(node)->GetParams(pNodeParams);
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphMemsetNodeSetParams(hipGraphNode_t node, const hipMemsetParams* pNodeParams) {
  HIP_INIT_API(hipGraphMemsetNodeSetParams, node, pNodeParams);
  if (!hipGraphNode::isNodeValid(node) || pNodeParams == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (pNodeParams->height > 1 &&
      pNodeParams->pitch < (pNodeParams->width * pNodeParams->elementSize)) {
    return hipErrorInvalidValue;
  }
  HIP_RETURN(reinterpret_cast<hipGraphMemsetNode*>(node)->SetParams(pNodeParams));
}

hipError_t hipGraphExecMemsetNodeSetParams(hipGraphExec_t hGraphExec, hipGraphNode_t node,
                                           const hipMemsetParams* pNodeParams) {
  HIP_INIT_API(hipGraphExecMemsetNodeSetParams, hGraphExec, node, pNodeParams);
  if (hGraphExec == nullptr || !hipGraphNode::isNodeValid(node) || pNodeParams == nullptr ||
      pNodeParams->dst == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (ihipGraphMemsetParams_validate(pNodeParams) != hipSuccess) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(node);
  if (clonedNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphMemsetNode*>(clonedNode)->SetParams(pNodeParams, true));
}

hipError_t hipGraphAddDependencies(hipGraph_t graph, const hipGraphNode_t* from,
                                   const hipGraphNode_t* to, size_t numDependencies) {
  HIP_INIT_API(hipGraphAddDependencies, graph, from, to, numDependencies);
  if (graph == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (numDependencies == 0) {
    HIP_RETURN(hipSuccess);
  } else if (from == nullptr || to == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  for (size_t i = 0; i < numDependencies; i++) {
    // When the same node is specified for both from and to
    if (from[i] == nullptr || to[i] == nullptr || from[i] == to[i] ||
        !hipGraphNode::isNodeValid(to[i]) || !hipGraphNode::isNodeValid(from[i]) ||
        // making sure the nodes belong to the graph
        to[i]->GetParentGraph() != graph || from[i]->GetParentGraph() != graph) {
      HIP_RETURN(hipErrorInvalidValue);
    }
  }
  for (size_t i = 0; i < numDependencies; i++) {
    // When the same edge added from->to return invalid value
    const std::vector<hipGraphNode_t>& edges = from[i]->GetEdges();
    for (auto edge : edges) {
      if (edge == to[i]) {
        HIP_RETURN(hipErrorInvalidValue);
      }
    }
    from[i]->AddEdge(to[i]);
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphExecKernelNodeSetParams(hipGraphExec_t hGraphExec, hipGraphNode_t node,
                                           const hipKernelNodeParams* pNodeParams) {
  HIP_INIT_API(hipGraphExecKernelNodeSetParams, hGraphExec, node, pNodeParams);
  if (hGraphExec == nullptr || !hipGraphNode::isNodeValid(node) || pNodeParams == nullptr ||
      pNodeParams->func == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(node);
  if (clonedNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphKernelNode*>(clonedNode)->SetParams(pNodeParams));
}

hipError_t hipGraphChildGraphNodeGetGraph(hipGraphNode_t node, hipGraph_t* pGraph) {
  HIP_INIT_API(hipGraphChildGraphNodeGetGraph, node, pGraph);
  if (!hipGraphNode::isNodeValid(node) || pGraph == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *pGraph = reinterpret_cast<hipChildGraphNode*>(node)->GetChildGraph();
  if (*pGraph == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphExecChildGraphNodeSetParams(hipGraphExec_t hGraphExec, hipGraphNode_t node,
                                               hipGraph_t childGraph) {
  HIP_INIT_API(hipGraphExecChildGraphNodeSetParams, hGraphExec, node, childGraph);
  if (hGraphExec == nullptr || !hipGraphNode::isNodeValid(node) || childGraph == nullptr ||
      !ihipGraph::isGraphValid(childGraph)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (childGraph == node->GetParentGraph()) {
    HIP_RETURN(hipErrorUnknown);
  }
  // Validate whether the topology of node and childGraph matches
  std::vector<hipGraphNode_t> childGraphNodes1;
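  // Topology check (comparison continues below): an executable graph's structure is frozen at
  // instantiation time, so the replacement child graph must match the original node for node,
  // with the same node count and the same node types in topological order. Only node parameters
  // may differ; a structural mismatch returns hipErrorUnknown instead of re-instantiating.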
node->TopologicalOrder(childGraphNodes1); std::vector childGraphNodes2; childGraph->TopologicalOrder(childGraphNodes2); if (childGraphNodes1.size() != childGraphNodes2.size()) { HIP_RETURN(hipErrorUnknown); } // Validate if the node insertion order matches else { for (std::vector::size_type i = 0; i != childGraphNodes1.size(); i++) { if (childGraphNodes1[i]->GetType() != childGraphNodes2[i]->GetType()) { HIP_RETURN(hipErrorUnknown); } } } hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(node); if (clonedNode == nullptr) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(reinterpret_cast(clonedNode)->SetParams(childGraph)); } hipError_t hipStreamGetCaptureInfo_common(hipStream_t stream, hipStreamCaptureStatus* pCaptureStatus, unsigned long long* pId) { if (pCaptureStatus == nullptr) { return hipErrorInvalidValue; } if (!hip::isValid(stream)) { return hipErrorContextIsDestroyed; } if (hip::Stream::StreamCaptureBlocking() == true && stream == nullptr) { return hipErrorStreamCaptureImplicit; } if (stream == nullptr) { *pCaptureStatus = hipStreamCaptureStatusNone; return hipSuccess; } hip::Stream* s = reinterpret_cast(stream); *pCaptureStatus = s->GetCaptureStatus(); if (*pCaptureStatus == hipStreamCaptureStatusActive && pId != nullptr) { *pId = s->GetCaptureID(); } return hipSuccess; } hipError_t hipStreamGetCaptureInfo(hipStream_t stream, hipStreamCaptureStatus* pCaptureStatus, unsigned long long* pId) { HIP_INIT_API(hipStreamGetCaptureInfo, stream, pCaptureStatus, pId); HIP_RETURN(hipStreamGetCaptureInfo_common(stream, pCaptureStatus, pId)); } hipError_t hipStreamGetCaptureInfo_spt(hipStream_t stream, hipStreamCaptureStatus* pCaptureStatus, unsigned long long* pId) { HIP_INIT_API(hipStreamGetCaptureInfo, stream, pCaptureStatus, pId); PER_THREAD_DEFAULT_STREAM(stream); HIP_RETURN(hipStreamGetCaptureInfo_common(stream, pCaptureStatus, pId)); } hipError_t hipStreamGetCaptureInfo_v2_common(hipStream_t stream, hipStreamCaptureStatus* captureStatus_out, unsigned long long* id_out, hipGraph_t* graph_out, const hipGraphNode_t** dependencies_out, size_t* numDependencies_out) { if (captureStatus_out == nullptr) { return hipErrorInvalidValue; } if (hip::Stream::StreamCaptureBlocking() == true && stream == nullptr) { return hipErrorStreamCaptureImplicit; } if (stream == nullptr) { *captureStatus_out = hipStreamCaptureStatusNone; return hipSuccess; } if (!hip::isValid(stream)) { return hipErrorContextIsDestroyed; } hip::Stream* s = reinterpret_cast(stream); *captureStatus_out = s->GetCaptureStatus(); if (*captureStatus_out == hipStreamCaptureStatusActive) { if (id_out != nullptr) { *id_out = s->GetCaptureID(); } if (graph_out != nullptr) { *graph_out = s->GetCaptureGraph(); } if (dependencies_out != nullptr) { *dependencies_out = s->GetLastCapturedNodes().data(); } if (numDependencies_out != nullptr) { *numDependencies_out = s->GetLastCapturedNodes().size(); } } return hipSuccess; } hipError_t hipStreamGetCaptureInfo_v2(hipStream_t stream, hipStreamCaptureStatus* captureStatus_out, unsigned long long* id_out, hipGraph_t* graph_out, const hipGraphNode_t** dependencies_out, size_t* numDependencies_out) { HIP_INIT_API(hipStreamGetCaptureInfo_v2, stream, captureStatus_out, id_out, graph_out, dependencies_out, numDependencies_out); HIP_RETURN(hipStreamGetCaptureInfo_v2_common(stream, captureStatus_out, id_out, graph_out, dependencies_out, numDependencies_out)); } hipError_t hipStreamGetCaptureInfo_v2_spt(hipStream_t stream, hipStreamCaptureStatus* captureStatus_out, unsigned long long* id_out, hipGraph_t* 
graph_out, const hipGraphNode_t** dependencies_out, size_t* numDependencies_out) { HIP_INIT_API(hipStreamGetCaptureInfo_v2, stream, captureStatus_out, id_out, graph_out, dependencies_out, numDependencies_out); PER_THREAD_DEFAULT_STREAM(stream); HIP_RETURN(hipStreamGetCaptureInfo_v2_common(stream, captureStatus_out, id_out, graph_out, dependencies_out, numDependencies_out)); } hipError_t hipStreamUpdateCaptureDependencies(hipStream_t stream, hipGraphNode_t* dependencies, size_t numDependencies, unsigned int flags) { HIP_INIT_API(hipStreamUpdateCaptureDependencies, stream, dependencies, numDependencies, flags); if (!hip::isValid(stream)) { HIP_RETURN(hipErrorContextIsDestroyed); } hip::Stream* s = reinterpret_cast(stream); if (s->GetCaptureStatus() == hipStreamCaptureStatusNone) { HIP_RETURN(hipErrorIllegalState); } if ((s->GetCaptureGraph()->GetNodeCount() < numDependencies) || (numDependencies > 0 && dependencies == nullptr) || (flags != 0 && flags != hipStreamAddCaptureDependencies && flags != hipStreamSetCaptureDependencies)) { HIP_RETURN(hipErrorInvalidValue); } std::vector depNodes; const std::vector& graphNodes = s->GetCaptureGraph()->GetNodes(); for (int i = 0; i < numDependencies; i++) { if ((dependencies[i] == nullptr) || std::find(std::begin(graphNodes), std::end(graphNodes), dependencies[i]) == std::end(graphNodes)) { HIP_RETURN(hipErrorInvalidValue); } depNodes.push_back(dependencies[i]); } if (flags == hipStreamAddCaptureDependencies) { s->AddCrossCapturedNode(depNodes); } else if (flags == hipStreamSetCaptureDependencies) { bool replace = true; s->AddCrossCapturedNode(depNodes, replace); } HIP_RETURN(hipSuccess); } hipError_t hipGraphRemoveDependencies(hipGraph_t graph, const hipGraphNode_t* from, const hipGraphNode_t* to, size_t numDependencies) { HIP_INIT_API(hipGraphRemoveDependencies, graph, from, to, numDependencies); if (graph == nullptr || (numDependencies > 0 && (from == nullptr || to == nullptr))) { HIP_RETURN(hipErrorInvalidValue); } for (size_t i = 0; i < numDependencies; i++) { if (to[i]->GetParentGraph() != graph || from[i]->GetParentGraph() != graph || from[i]->RemoveUpdateEdge(to[i]) == false) { HIP_RETURN(hipErrorInvalidValue); } } HIP_RETURN(hipSuccess); } hipError_t hipGraphGetEdges(hipGraph_t graph, hipGraphNode_t* from, hipGraphNode_t* to, size_t* numEdges) { HIP_INIT_API(hipGraphGetEdges, graph, from, to, numEdges); if (graph == nullptr || numEdges == nullptr || (from == nullptr && to != nullptr) || (to == nullptr && from != nullptr)) { HIP_RETURN(hipErrorInvalidValue); } const std::vector> edges = graph->GetEdges(); // returns only the number of edges in numEdges when from and to are null if (from == nullptr && to == nullptr) { *numEdges = edges.size(); HIP_RETURN(hipSuccess); } else if (*numEdges <= edges.size()) { for (int i = 0; i < *numEdges; i++) { from[i] = edges[i].first; to[i] = edges[i].second; } } else { for (int i = 0; i < edges.size(); i++) { from[i] = edges[i].first; to[i] = edges[i].second; } // If numEdges > actual number of edges, the remaining entries in from and to will be set to // NULL for (int i = edges.size(); i < *numEdges; i++) { from[i] = nullptr; to[i] = nullptr; } *numEdges = edges.size(); } HIP_RETURN(hipSuccess); } hipError_t hipGraphNodeGetDependencies(hipGraphNode_t node, hipGraphNode_t* pDependencies, size_t* pNumDependencies) { HIP_INIT_API(hipGraphNodeGetDependencies, node, pDependencies, pNumDependencies); if (!hipGraphNode::isNodeValid(node) || pNumDependencies == nullptr) { HIP_RETURN(hipErrorInvalidValue); } const 
std::vector& dependencies = node->GetDependencies(); if (pDependencies == NULL) { *pNumDependencies = dependencies.size(); HIP_RETURN(hipSuccess); } else if (*pNumDependencies <= dependencies.size()) { for (int i = 0; i < *pNumDependencies; i++) { pDependencies[i] = dependencies[i]; } } else { for (int i = 0; i < dependencies.size(); i++) { pDependencies[i] = dependencies[i]; } // pNumDependencies > actual number of dependencies, the remaining entries in pDependencies will // be set to NULL for (int i = dependencies.size(); i < *pNumDependencies; i++) { pDependencies[i] = nullptr; } *pNumDependencies = dependencies.size(); } HIP_RETURN(hipSuccess); } hipError_t hipGraphNodeGetDependentNodes(hipGraphNode_t node, hipGraphNode_t* pDependentNodes, size_t* pNumDependentNodes) { HIP_INIT_API(hipGraphNodeGetDependentNodes, node, pDependentNodes, pNumDependentNodes); if (!hipGraphNode::isNodeValid(node) || pNumDependentNodes == nullptr) { HIP_RETURN(hipErrorInvalidValue); } const std::vector& dependents = node->GetEdges(); if (pDependentNodes == NULL) { *pNumDependentNodes = dependents.size(); HIP_RETURN(hipSuccess); } else if (*pNumDependentNodes <= dependents.size()) { for (int i = 0; i < *pNumDependentNodes; i++) { pDependentNodes[i] = dependents[i]; } } else { for (int i = 0; i < dependents.size(); i++) { pDependentNodes[i] = dependents[i]; } // pNumDependentNodes > actual number of dependents, the remaining entries in pDependentNodes // will be set to NULL for (int i = dependents.size(); i < *pNumDependentNodes; i++) { pDependentNodes[i] = nullptr; } *pNumDependentNodes = dependents.size(); } HIP_RETURN(hipSuccess); } hipError_t hipGraphNodeGetType(hipGraphNode_t node, hipGraphNodeType* pType) { HIP_INIT_API(hipGraphNodeGetType, node, pType); if (!hipGraphNode::isNodeValid(node) || pType == nullptr) { HIP_RETURN(hipErrorInvalidValue); } *pType = node->GetType(); HIP_RETURN(hipSuccess); } hipError_t hipGraphDestroyNode(hipGraphNode_t node) { HIP_INIT_API(hipGraphDestroyNode, node); if (!hipGraphNode::isNodeValid(node)) { HIP_RETURN(hipErrorInvalidValue); } // First remove all the edges both incoming and outgoing from node. for(auto& edge : node->GetEdges()) { node->RemoveUpdateEdge(edge); } const std::vector& dependencies = node->GetDependencies(); for(auto& parent: dependencies) { parent->RemoveEdge(node); parent->SetOutDegree(parent->GetOutDegree() - 1); } // Remove the node from graph. 
hipError_t hipGraphClone(hipGraph_t* pGraphClone, hipGraph_t originalGraph) {
  HIP_INIT_API(hipGraphClone, pGraphClone, originalGraph);
  if (originalGraph == nullptr || pGraphClone == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (!ihipGraph::isGraphValid(originalGraph)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *pGraphClone = originalGraph->clone();
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphNodeFindInClone(hipGraphNode_t* pNode, hipGraphNode_t originalNode,
                                   hipGraph_t clonedGraph) {
  HIP_INIT_API(hipGraphNodeFindInClone, pNode, originalNode, clonedGraph);
  if (pNode == nullptr || originalNode == nullptr || clonedGraph == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (clonedGraph->getOriginalGraph() == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  for (auto node : clonedGraph->GetNodes()) {
    if (node->GetID() == originalNode->GetID()) {
      *pNode = node;
      HIP_RETURN(hipSuccess);
    }
  }
  HIP_RETURN(hipErrorInvalidValue);
}

hipError_t hipGraphAddMemcpyNodeFromSymbol(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                           const hipGraphNode_t* pDependencies,
                                           size_t numDependencies, void* dst, const void* symbol,
                                           size_t count, size_t offset, hipMemcpyKind kind) {
  HIP_INIT_API(hipGraphAddMemcpyNodeFromSymbol, pGraphNode, graph, pDependencies, numDependencies,
               dst, symbol, count, offset, kind);
  if (graph == nullptr || pGraphNode == nullptr || count == 0 ||
      (numDependencies > 0 && pDependencies == nullptr) || dst == nullptr ||
      !ihipGraph::isGraphValid(graph)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  size_t sym_size = 0;
  hipDeviceptr_t device_ptr = nullptr;
  hipError_t status = ihipMemcpySymbol_validate(symbol, count, offset, sym_size, device_ptr);
  if (status != hipSuccess) {
    HIP_RETURN(status);
  }
  *pGraphNode = new hipGraphMemcpyNodeFromSymbol(dst, symbol, count, offset, kind);
  status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies, false);
  HIP_RETURN(status);
}

hipError_t hipGraphMemcpyNodeSetParamsFromSymbol(hipGraphNode_t node, void* dst,
                                                 const void* symbol, size_t count, size_t offset,
                                                 hipMemcpyKind kind) {
  HIP_INIT_API(hipGraphMemcpyNodeSetParamsFromSymbol, node, dst, symbol, count, offset, kind);
  if (symbol == nullptr) {
    HIP_RETURN(hipErrorInvalidSymbol);
  }
  if (!hipGraphNode::isNodeValid(node) || dst == nullptr || count == 0 || symbol == dst) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphMemcpyNodeFromSymbol*>(node)->SetParams(dst, symbol, count,
                                                                              offset, kind));
}

hipError_t hipGraphExecMemcpyNodeSetParamsFromSymbol(hipGraphExec_t hGraphExec,
                                                     hipGraphNode_t node, void* dst,
                                                     const void* symbol, size_t count,
                                                     size_t offset, hipMemcpyKind kind) {
  HIP_INIT_API(hipGraphExecMemcpyNodeSetParamsFromSymbol, hGraphExec, node, dst, symbol, count,
               offset, kind);
  if (symbol == nullptr) {
    HIP_RETURN(hipErrorInvalidSymbol);
  }
  if (hGraphExec == nullptr || !hipGraphNode::isNodeValid(node) || dst == nullptr || count == 0 ||
      symbol == dst) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(node);
  if (clonedNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  constexpr bool kCheckDeviceIsSame = true;
  HIP_RETURN(reinterpret_cast<hipGraphMemcpyNodeFromSymbol*>(clonedNode)
                 ->SetParams(dst, symbol, count, offset, kind, kCheckDeviceIsSame));
}
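// A minimal sketch of adding a device-symbol-to-host copy node, matching the validation above
// (non-null dst, non-zero count, valid symbol). Illustrative only and kept out of the build;
// 'myDeviceSymbol' is a hypothetical __device__ variable.
#if 0
__device__ float myDeviceSymbol[256];
static void exampleAddFromSymbolNode(hipGraph_t graph, float* hostDst) {
  hipGraphNode_t node = nullptr;
  hipGraphAddMemcpyNodeFromSymbol(&node, graph, nullptr, 0 /*numDependencies*/, hostDst,
                                  HIP_SYMBOL(myDeviceSymbol), 256 * sizeof(float), 0 /*offset*/,
                                  hipMemcpyDeviceToHost);
}
#endif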
hipError_t hipGraphAddMemcpyNodeToSymbol(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                         const hipGraphNode_t* pDependencies,
                                         size_t numDependencies, const void* symbol,
                                         const void* src, size_t count, size_t offset,
                                         hipMemcpyKind kind) {
  HIP_INIT_API(hipGraphAddMemcpyNodeToSymbol, pGraphNode, graph, pDependencies, numDependencies,
               symbol, src, count, offset, kind);
  if (pGraphNode == nullptr || graph == nullptr || src == nullptr || count == 0 ||
      !ihipGraph::isGraphValid(graph) || (pDependencies == nullptr && numDependencies > 0)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  size_t sym_size = 0;
  hipDeviceptr_t device_ptr = nullptr;
  hipError_t status = ihipMemcpySymbol_validate(symbol, count, offset, sym_size, device_ptr);
  if (status != hipSuccess) {
    HIP_RETURN(status);
  }
  *pGraphNode = new hipGraphMemcpyNodeToSymbol(symbol, src, count, offset, kind);
  if (*pGraphNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies, false);
  HIP_RETURN(status);
}

hipError_t hipGraphMemcpyNodeSetParamsToSymbol(hipGraphNode_t node, const void* symbol,
                                               const void* src, size_t count, size_t offset,
                                               hipMemcpyKind kind) {
  HIP_INIT_API(hipGraphMemcpyNodeSetParamsToSymbol, node, symbol, src, count, offset, kind);
  if (symbol == nullptr) {
    HIP_RETURN(hipErrorInvalidSymbol);
  }
  if (!hipGraphNode::isNodeValid(node) || src == nullptr || count == 0 || symbol == src) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphMemcpyNodeToSymbol*>(node)->SetParams(symbol, src, count,
                                                                            offset, kind));
}

hipError_t hipGraphExecMemcpyNodeSetParamsToSymbol(hipGraphExec_t hGraphExec, hipGraphNode_t node,
                                                   const void* symbol, const void* src,
                                                   size_t count, size_t offset,
                                                   hipMemcpyKind kind) {
  HIP_INIT_API(hipGraphExecMemcpyNodeSetParamsToSymbol, hGraphExec, node, symbol, src, count,
               offset, kind);
  if (symbol == nullptr) {
    HIP_RETURN(hipErrorInvalidSymbol);
  }
  if (hGraphExec == nullptr || src == nullptr || !hipGraphNode::isNodeValid(node) || count == 0 ||
      src == symbol) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(node);
  if (clonedNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  constexpr bool kCheckDeviceIsSame = true;
  HIP_RETURN(reinterpret_cast<hipGraphMemcpyNodeToSymbol*>(clonedNode)
                 ->SetParams(symbol, src, count, offset, kind, kCheckDeviceIsSame));
}

hipError_t hipGraphAddEventRecordNode(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                      const hipGraphNode_t* pDependencies, size_t numDependencies,
                                      hipEvent_t event) {
  HIP_INIT_API(hipGraphAddEventRecordNode, pGraphNode, graph, pDependencies, numDependencies,
               event);
  if (pGraphNode == nullptr || graph == nullptr ||
      (numDependencies > 0 && pDependencies == nullptr) || event == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *pGraphNode = new hipGraphEventRecordNode(event);
  hipError_t status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies, false);
  HIP_RETURN(status);
}

hipError_t hipGraphEventRecordNodeGetEvent(hipGraphNode_t node, hipEvent_t* event_out) {
  HIP_INIT_API(hipGraphEventRecordNodeGetEvent, node, event_out);
  if (!hipGraphNode::isNodeValid(node) || event_out == nullptr ||
      node->GetType() != hipGraphNodeTypeEventRecord) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  reinterpret_cast<hipGraphEventRecordNode*>(node)->GetParams(event_out);
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphEventRecordNodeSetEvent(hipGraphNode_t node, hipEvent_t event) {
  HIP_INIT_API(hipGraphEventRecordNodeSetEvent, node, event);
  if (!hipGraphNode::isNodeValid(node) || event == nullptr ||
      node->GetType() != hipGraphNodeTypeEventRecord) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphEventRecordNode*>(node)->SetParams(event));
}
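// A minimal sketch pairing hipGraphAddEventRecordNode with the getter above. Illustrative only
// and kept out of the build; error checking omitted.
#if 0
static void exampleEventRecordNode(hipGraph_t graph) {
  hipEvent_t event = nullptr;
  hipEventCreate(&event);
  hipGraphNode_t record = nullptr;
  hipGraphAddEventRecordNode(&record, graph, nullptr, 0, event);
  hipEvent_t queried = nullptr;
  hipGraphEventRecordNodeGetEvent(record, &queried);  // queried now equals event
}
#endif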
hipError_t hipGraphExecEventRecordNodeSetEvent(hipGraphExec_t hGraphExec, hipGraphNode_t hNode,
                                               hipEvent_t event) {
  HIP_INIT_API(hipGraphExecEventRecordNodeSetEvent, hGraphExec, hNode, event);
  if (hGraphExec == nullptr || hNode == nullptr || event == nullptr ||
      hNode->GetType() != hipGraphNodeTypeEventRecord) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(hNode);
  if (clonedNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphEventRecordNode*>(clonedNode)->SetParams(event));
}

hipError_t hipGraphAddEventWaitNode(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                    const hipGraphNode_t* pDependencies, size_t numDependencies,
                                    hipEvent_t event) {
  HIP_INIT_API(hipGraphAddEventWaitNode, pGraphNode, graph, pDependencies, numDependencies, event);
  if (pGraphNode == nullptr || graph == nullptr ||
      (numDependencies > 0 && pDependencies == nullptr) || event == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *pGraphNode = new hipGraphEventWaitNode(event);
  hipError_t status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies, false);
  HIP_RETURN(status);
}

hipError_t hipGraphEventWaitNodeGetEvent(hipGraphNode_t node, hipEvent_t* event_out) {
  HIP_INIT_API(hipGraphEventWaitNodeGetEvent, node, event_out);
  if (!hipGraphNode::isNodeValid(node) || event_out == nullptr ||
      node->GetType() != hipGraphNodeTypeWaitEvent) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  reinterpret_cast<hipGraphEventWaitNode*>(node)->GetParams(event_out);
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphEventWaitNodeSetEvent(hipGraphNode_t node, hipEvent_t event) {
  HIP_INIT_API(hipGraphEventWaitNodeSetEvent, node, event);
  if (!hipGraphNode::isNodeValid(node) || event == nullptr ||
      node->GetType() != hipGraphNodeTypeWaitEvent) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphEventWaitNode*>(node)->SetParams(event));
}

hipError_t hipGraphExecEventWaitNodeSetEvent(hipGraphExec_t hGraphExec, hipGraphNode_t hNode,
                                             hipEvent_t event) {
  HIP_INIT_API(hipGraphExecEventWaitNodeSetEvent, hGraphExec, hNode, event);
  if (hGraphExec == nullptr || hNode == nullptr || event == nullptr ||
      (hNode->GetType() != hipGraphNodeTypeWaitEvent)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(hNode);
  if (clonedNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphEventWaitNode*>(clonedNode)->SetParams(event));
}

hipError_t hipGraphAddHostNode(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                               const hipGraphNode_t* pDependencies, size_t numDependencies,
                               const hipHostNodeParams* pNodeParams) {
  HIP_INIT_API(hipGraphAddHostNode, pGraphNode, graph, pDependencies, numDependencies,
               pNodeParams);
  if (pGraphNode == nullptr || graph == nullptr || pNodeParams == nullptr ||
      (numDependencies > 0 && pDependencies == nullptr) || pNodeParams->fn == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *pGraphNode = new hipGraphHostNode(pNodeParams);
  hipError_t status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies, false);
  HIP_RETURN(status);
}

hipError_t hipGraphHostNodeGetParams(hipGraphNode_t node, hipHostNodeParams* pNodeParams) {
  HIP_INIT_API(hipGraphHostNodeGetParams, node, pNodeParams);
  if (!hipGraphNode::isNodeValid(node) || pNodeParams == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  reinterpret_cast<hipGraphHostNode*>(node)->GetParams(pNodeParams);
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphHostNodeSetParams(hipGraphNode_t node, const hipHostNodeParams* pNodeParams) {
  HIP_INIT_API(hipGraphHostNodeSetParams, node, pNodeParams);
  if (pNodeParams == nullptr || pNodeParams->fn == nullptr || !hipGraphNode::isNodeValid(node)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphHostNode*>(node)->SetParams(pNodeParams));
}
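// A minimal sketch of a host callback node; note the validation above rejects a null
// hipHostNodeParams::fn. Illustrative only and kept out of the build.
#if 0
static void hostWork(void* userData) { /* runs on the host when the node executes */ }
static void exampleAddHostNode(hipGraph_t graph) {
  hipHostNodeParams params{};
  params.fn = hostWork;
  params.userData = nullptr;
  hipGraphNode_t node = nullptr;
  hipGraphAddHostNode(&node, graph, nullptr, 0, &params);
}
#endif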
hipError_t hipGraphExecHostNodeSetParams(hipGraphExec_t hGraphExec, hipGraphNode_t node,
                                         const hipHostNodeParams* pNodeParams) {
  HIP_INIT_API(hipGraphExecHostNodeSetParams, hGraphExec, node, pNodeParams);
  if (hGraphExec == nullptr || pNodeParams == nullptr || pNodeParams->fn == nullptr ||
      !hipGraphNode::isNodeValid(node)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(node);
  if (clonedNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphHostNode*>(clonedNode)->SetParams(pNodeParams));
}

hipError_t hipGraphExecUpdate(hipGraphExec_t hGraphExec, hipGraph_t hGraph,
                              hipGraphNode_t* hErrorNode_out,
                              hipGraphExecUpdateResult* updateResult_out) {
  HIP_INIT_API(hipGraphExecUpdate, hGraphExec, hGraph, hErrorNode_out, updateResult_out);
  // Parameter check.
  if (hGraphExec == nullptr || hGraph == nullptr || hErrorNode_out == nullptr ||
      updateResult_out == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  std::vector<Node> newGraphNodes;
  hGraph->TopologicalOrder(newGraphNodes);
  std::vector<Node>& oldGraphExecNodes = hGraphExec->GetNodes();
  if (newGraphNodes.size() != oldGraphExecNodes.size()) {
    *updateResult_out = hipGraphExecUpdateErrorTopologyChanged;
    HIP_RETURN(hipErrorGraphExecUpdateFailure);
  }
  for (std::vector<Node>::size_type i = 0; i != newGraphNodes.size(); i++) {
    if (newGraphNodes[i]->GetType() == oldGraphExecNodes[i]->GetType()) {
      hipError_t status = oldGraphExecNodes[i]->SetParams(newGraphNodes[i]);
      if (status != hipSuccess) {
        *hErrorNode_out = newGraphNodes[i];
        if (status == hipErrorInvalidDeviceFunction) {
          *updateResult_out = hipGraphExecUpdateErrorUnsupportedFunctionChange;
        } else if (status == hipErrorInvalidValue || status == hipErrorInvalidDevicePointer) {
          *updateResult_out = hipGraphExecUpdateErrorParametersChanged;
        } else {
          *updateResult_out = hipGraphExecUpdateErrorNotSupported;
        }
        HIP_RETURN(hipErrorGraphExecUpdateFailure);
      }
    } else {
      *hErrorNode_out = newGraphNodes[i];
      *updateResult_out = hipGraphExecUpdateErrorNodeTypeChanged;
      HIP_RETURN(hipErrorGraphExecUpdateFailure);
    }
  }
  *updateResult_out = hipGraphExecUpdateSuccess;
  HIP_RETURN(hipSuccess);
}
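// A minimal sketch of the whole-graph update path implemented above: if the topology is
// unchanged, node parameters of an instantiated graph can be swapped without re-instantiation.
// Illustrative only and kept out of the build.
#if 0
static bool exampleTryUpdate(hipGraphExec_t exec, hipGraph_t modifiedGraph) {
  hipGraphNode_t errorNode = nullptr;
  hipGraphExecUpdateResult result;
  if (hipGraphExecUpdate(exec, modifiedGraph, &errorNode, &result) != hipSuccess) {
    // 'result' explains the failure, e.g. hipGraphExecUpdateErrorTopologyChanged;
    // fall back to destroying and re-instantiating the executable graph.
    return false;
  }
  return true;
}
#endif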
// ================================================================================================
hipError_t hipGraphAddMemAllocNode(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                   const hipGraphNode_t* pDependencies, size_t numDependencies,
                                   hipMemAllocNodeParams* pNodeParams) {
  HIP_INIT_API(hipGraphAddMemAllocNode, pGraphNode, graph, pDependencies, numDependencies,
               pNodeParams);
  if (pGraphNode == nullptr || graph == nullptr ||
      (numDependencies > 0 && pDependencies == nullptr) || pNodeParams == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (pNodeParams->bytesize == 0 ||
      pNodeParams->poolProps.allocType != hipMemAllocationTypePinned ||
      pNodeParams->poolProps.location.type != hipMemLocationTypeDevice) {
    pNodeParams->dptr = nullptr;
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (pNodeParams->poolProps.location.type == hipMemLocationTypeDevice) {
    if (pNodeParams->poolProps.location.id < 0 ||
        pNodeParams->poolProps.location.id >= static_cast<int>(g_devices.size())) {
      HIP_RETURN(hipErrorInvalidValue);
    }
  }
  // Clear the pointer to allocated memory because it may contain stale/uninitialized data.
  pNodeParams->dptr = nullptr;
  auto mem_alloc_node = new hipGraphMemAllocNode(pNodeParams);
  *pGraphNode = mem_alloc_node;
  auto status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies);
  // The address must be provided at node creation time.
  pNodeParams->dptr =
      (HIP_MEM_POOL_USE_VM) ? mem_alloc_node->ReserveAddress() : mem_alloc_node->Execute();
  HIP_RETURN(status);
}

// ================================================================================================
hipError_t hipGraphMemAllocNodeGetParams(hipGraphNode_t node, hipMemAllocNodeParams* pNodeParams) {
  HIP_INIT_API(hipGraphMemAllocNodeGetParams, node, pNodeParams);
  if (node == nullptr || pNodeParams == nullptr || !hipGraphNode::isNodeValid(node) ||
      node->GetType() != hipGraphNodeTypeMemAlloc) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  reinterpret_cast<hipGraphMemAllocNode*>(node)->GetParams(pNodeParams);
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipGraphAddMemFreeNode(hipGraphNode_t* pGraphNode, hipGraph_t graph,
                                  const hipGraphNode_t* pDependencies, size_t numDependencies,
                                  void* dev_ptr) {
  HIP_INIT_API(hipGraphAddMemFreeNode, pGraphNode, graph, pDependencies, numDependencies, dev_ptr);
  if (pGraphNode == nullptr || graph == nullptr ||
      ((numDependencies > 0 && pDependencies == nullptr) ||
       (pDependencies != nullptr && numDependencies == 0)) ||
      dev_ptr == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  // Check that the memory passed in to be freed is valid.
  size_t offset = 0;
  auto memory = getMemoryObject(dev_ptr, offset);
  if (memory == nullptr) {
    if (HIP_MEM_POOL_USE_VM) {
      // When VM is on, the address must be valid and may point to a VA object.
      memory = amd::MemObjMap::FindVirtualMemObj(dev_ptr);
    }
    if (memory == nullptr) {
      HIP_RETURN(hipErrorInvalidValue);
    }
  }
  auto mem_free_node = new hipGraphMemFreeNode(dev_ptr);
  *pGraphNode = mem_free_node;
  auto status = ihipGraphAddNode(*pGraphNode, graph, pDependencies, numDependencies);
  HIP_RETURN(status);
}

// ================================================================================================
hipError_t hipGraphMemFreeNodeGetParams(hipGraphNode_t node, void* dev_ptr) {
  HIP_INIT_API(hipGraphMemFreeNodeGetParams, node, dev_ptr);
  if (node == nullptr || dev_ptr == nullptr || !hipGraphNode::isNodeValid(node) ||
      node->GetType() != hipGraphNodeTypeMemFree) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  reinterpret_cast<hipGraphMemFreeNode*>(node)->GetParams(reinterpret_cast<void**>(dev_ptr));
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipDeviceGetGraphMemAttribute(int device, hipGraphMemAttributeType attr, void* value) {
  HIP_INIT_API(hipDeviceGetGraphMemAttribute, device, attr, value);
  if ((static_cast<size_t>(device) >= g_devices.size()) || device < 0) {
    HIP_RETURN(hipErrorInvalidDevice);
  }
  if (value == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipError_t result = hipErrorInvalidValue;
  switch (attr) {
    case hipGraphMemAttrUsedMemCurrent:
      result = g_devices[device]->GetGraphMemoryPool()->GetAttribute(hipMemPoolAttrUsedMemCurrent,
                                                                     value);
      break;
    case hipGraphMemAttrUsedMemHigh:
      result = g_devices[device]->GetGraphMemoryPool()->GetAttribute(hipMemPoolAttrUsedMemHigh,
                                                                     value);
      break;
    case hipGraphMemAttrReservedMemCurrent:
      result = g_devices[device]->GetGraphMemoryPool()->GetAttribute(
          hipMemPoolAttrReservedMemCurrent, value);
      break;
    case hipGraphMemAttrReservedMemHigh:
      result = g_devices[device]->GetGraphMemoryPool()->GetAttribute(hipMemPoolAttrReservedMemHigh,
                                                                     value);
      break;
    default:
      break;
  }
  HIP_RETURN(result);
}
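// A minimal sketch of graph-owned memory: an alloc node publishes dptr at node creation time
// (see hipGraphAddMemAllocNode above), a dependent free node releases it, and pool usage can be
// inspected per device. Illustrative only and kept out of the build.
#if 0
static void exampleGraphMemory(hipGraph_t graph, int device) {
  hipMemAllocNodeParams allocParams{};
  allocParams.bytesize = 1 << 20;  // 1 MiB
  allocParams.poolProps.allocType = hipMemAllocationTypePinned;
  allocParams.poolProps.location.type = hipMemLocationTypeDevice;
  allocParams.poolProps.location.id = device;
  hipGraphNode_t allocNode = nullptr, freeNode = nullptr;
  hipGraphAddMemAllocNode(&allocNode, graph, nullptr, 0, &allocParams);
  // allocParams.dptr is now usable by nodes ordered between alloc and free.
  hipGraphAddMemFreeNode(&freeNode, graph, &allocNode, 1, allocParams.dptr);
  size_t used = 0;
  hipDeviceGetGraphMemAttribute(device, hipGraphMemAttrUsedMemCurrent, &used);
}
#endif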
// ================================================================================================
hipError_t hipDeviceSetGraphMemAttribute(int device, hipGraphMemAttributeType attr, void* value) {
  HIP_INIT_API(hipDeviceSetGraphMemAttribute, device, attr, value);
  if ((static_cast<size_t>(device) >= g_devices.size()) || device < 0) {
    HIP_RETURN(hipErrorInvalidDevice);
  }
  if (value == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipError_t result = hipErrorInvalidValue;
  switch (attr) {
    case hipGraphMemAttrUsedMemHigh:
      result = g_devices[device]->GetGraphMemoryPool()->SetAttribute(hipMemPoolAttrUsedMemHigh,
                                                                     value);
      break;
    case hipGraphMemAttrReservedMemHigh:
      result = g_devices[device]->GetGraphMemoryPool()->SetAttribute(hipMemPoolAttrReservedMemHigh,
                                                                     value);
      break;
    default:
      break;
  }
  HIP_RETURN(result);
}

// ================================================================================================
hipError_t hipDeviceGraphMemTrim(int device) {
  HIP_INIT_API(hipDeviceGraphMemTrim, device);
  if ((static_cast<size_t>(device) >= g_devices.size()) || device < 0) {
    HIP_RETURN(hipErrorInvalidDevice);
  }
  g_devices[device]->GetGraphMemoryPool()->TrimTo(0);
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipUserObjectCreate(hipUserObject_t* object_out, void* ptr, hipHostFn_t destroy,
                               unsigned int initialRefcount, unsigned int flags) {
  HIP_INIT_API(hipUserObjectCreate, object_out, ptr, destroy, initialRefcount, flags);
  if (object_out == nullptr || flags != hipUserObjectNoDestructorSync || initialRefcount == 0 ||
      destroy == nullptr || initialRefcount > INT_MAX) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *object_out = new hipUserObject(destroy, ptr, flags);
  //! Creating the object adds one reference.
  if (initialRefcount > 1) {
    (*object_out)->increaseRefCount(static_cast<unsigned int>(initialRefcount - 1));
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipUserObjectRelease(hipUserObject_t object, unsigned int count) {
  HIP_INIT_API(hipUserObjectRelease, object, count);
  if (object == nullptr || count == 0 || count > INT_MAX) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (object->referenceCount() < count || !hipUserObject::isUserObjvalid(object)) {
    HIP_RETURN(hipSuccess);
  }
  //! If all the references are gone, the object is no longer needed in the list.
  if (object->referenceCount() == count) {
    hipUserObject::removeUSerObj(object);
  }
  object->decreaseRefCount(count);
  HIP_RETURN(hipSuccess);
}

hipError_t hipUserObjectRetain(hipUserObject_t object, unsigned int count) {
  HIP_INIT_API(hipUserObjectRetain, object, count);
  if (object == nullptr || count == 0 || count > INT_MAX) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (!hipUserObject::isUserObjvalid(object)) {
    HIP_RETURN(hipSuccess);
  }
  object->increaseRefCount(count);
  HIP_RETURN(hipSuccess);
}
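// A minimal sketch of the user-object lifetime helpers above: create with one reference, then
// move that reference into a graph so the destructor fires when the graph is done with it.
// Illustrative only and kept out of the build; the heap payload is an assumption.
#if 0
static void freeState(void* data) { free(data); }
static void exampleUserObject(hipGraph_t graph) {
  hipUserObject_t obj = nullptr;
  hipUserObjectCreate(&obj, malloc(64), freeState, 1, hipUserObjectNoDestructorSync);
  hipGraphRetainUserObject(graph, obj, 1, hipGraphUserObjectMove);  // graph now owns the reference
}
#endif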
hipError_t hipGraphRetainUserObject(hipGraph_t graph, hipUserObject_t object, unsigned int count,
                                    unsigned int flags) {
  HIP_INIT_API(hipGraphRetainUserObject, graph, object, count, flags);
  hipError_t status = hipSuccess;
  if (graph == nullptr || object == nullptr || count == 0 || count > INT_MAX ||
      (flags != 0 && flags != hipGraphUserObjectMove)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (!hipUserObject::isUserObjvalid(object) && !graph->isUserObjGraphValid(object)) {
    HIP_RETURN(hipSuccess);
  }
  if (flags != hipGraphUserObjectMove) {
    status = hipUserObjectRetain(object, count);
    if (status != hipSuccess) {
      HIP_RETURN(status);
    }
  } else {
    //! If the flag is UserObjMove, delete the user object from the global list.
    hipUserObject::removeUSerObj(object);
  }
  graph->addUserObjGraph(object);
  HIP_RETURN(status);
}

hipError_t hipGraphReleaseUserObject(hipGraph_t graph, hipUserObject_t object,
                                     unsigned int count) {
  HIP_INIT_API(hipGraphReleaseUserObject, graph, object, count);
  if (graph == nullptr || object == nullptr || count == 0 || count > INT_MAX) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (!graph->isUserObjGraphValid(object) || object->referenceCount() < count) {
    HIP_RETURN(hipSuccess);
  }
  //! The object is being destroyed.
  unsigned int releaseCount =
      (object->referenceCount() < count) ? object->referenceCount() : count;
  if (object->referenceCount() == releaseCount) {
    graph->RemoveUserObjGraph(object);
  }
  hipError_t status = hipUserObjectRelease(object, count);
  HIP_RETURN(status);
}

hipError_t hipGraphKernelNodeCopyAttributes(hipGraphNode_t hSrc, hipGraphNode_t hDst) {
  HIP_INIT_API(hipGraphKernelNodeCopyAttributes, hSrc, hDst);
  if (hSrc == nullptr || hDst == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(reinterpret_cast<hipGraphKernelNode*>(hDst)->CopyAttr(
      reinterpret_cast<hipGraphKernelNode*>(hSrc)));
}

hipError_t ihipGraphDebugDotPrint(hipGraph_t graph, const char* path, unsigned int flags) {
  if (graph == nullptr || path == nullptr) {
    return hipErrorInvalidValue;
  }
  std::ofstream fout;
  fout.open(path, std::ios::out);
  if (fout.fail()) {
    ClPrint(amd::LOG_INFO, amd::LOG_API, "[hipGraph] Failed to open file : %s", path);
    return hipErrorOperatingSystem;
  }
  fout << "digraph dot {" << std::endl;
  graph->GenerateDOT(fout, (hipGraphDebugDotFlags)flags);
  fout << "}" << std::endl;
  fout.close();
  return hipSuccess;
}

hipError_t hipGraphDebugDotPrint(hipGraph_t graph, const char* path, unsigned int flags) {
  HIP_INIT_API(hipGraphDebugDotPrint, graph, path, flags);
  HIP_RETURN(ihipGraphDebugDotPrint(graph, path, flags));
}
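// A minimal sketch of dumping a graph for inspection with the DOT writer above; the output
// renders with Graphviz (e.g. `dot -Tpng graph.dot -o graph.png`). Illustrative only and kept
// out of the build; the path is an assumption.
#if 0
static void exampleDumpGraph(hipGraph_t graph) {
  hipGraphDebugDotPrint(graph, "/tmp/graph.dot", hipGraphDebugDotFlagsVerbose);
}
#endif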
hipError_t hipGraphNodeSetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode,
                                  unsigned int isEnabled) {
  HIP_INIT_API(hipGraphNodeSetEnabled, hGraphExec, hNode, isEnabled);
  if (hGraphExec == nullptr || hNode == nullptr ||
      !hipGraphExec::isGraphExecValid(hGraphExec) || !hipGraphNode::isNodeValid(hNode)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(hNode);
  if (clonedNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (!(hNode->GetType() == hipGraphNodeTypeKernel || hNode->GetType() == hipGraphNodeTypeMemcpy ||
        hNode->GetType() == hipGraphNodeTypeMemset)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  clonedNode->SetEnabled(isEnabled);
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphNodeGetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode,
                                  unsigned int* isEnabled) {
  HIP_INIT_API(hipGraphNodeGetEnabled, hGraphExec, hNode, isEnabled);
  if (hGraphExec == nullptr || hNode == nullptr || isEnabled == nullptr ||
      !hipGraphExec::isGraphExecValid(hGraphExec) || !hipGraphNode::isNodeValid(hNode)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipGraphNode_t clonedNode = hGraphExec->GetClonedNode(hNode);
  if (clonedNode == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (!(hNode->GetType() == hipGraphNodeTypeKernel || hNode->GetType() == hipGraphNodeTypeMemcpy ||
        hNode->GetType() == hipGraphNodeTypeMemset)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *isEnabled = clonedNode->GetEnabled();
  HIP_RETURN(hipSuccess);
}

hipError_t hipGraphUpload(hipGraphExec_t graphExec, hipStream_t stream) {
  HIP_INIT_API(hipGraphUpload, graphExec, stream);
  if (graphExec == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  // TODO: stream is known before launch, do preparatory work with graph optimizations.
  // Pre-allocate memory for memAlloc nodes, if any, when support is added with the mempool
  // feature.
  HIP_RETURN(hipSuccess);
}
clr-rocm-5.7.1/hipamd/src/hip_graph_capture.hpp
/* Copyright (c) 2021 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#pragma once

// Forward declarations of the stream-capture interception methods.
hipError_t capturehipLaunchKernel(hipStream_t& stream, const void*& hostFunction, dim3& gridDim,
                                  dim3& blockDim, void**& args, size_t& sharedMemBytes);
hipError_t capturehipExtModuleLaunchKernel(hipStream_t& stream, hipFunction_t& f,
                                           uint32_t& globalWorkSizeX, uint32_t& globalWorkSizeY,
                                           uint32_t& globalWorkSizeZ, uint32_t& localWorkSizeX,
                                           uint32_t& localWorkSizeY, uint32_t& localWorkSizeZ,
                                           size_t& sharedMemBytes, void**& kernelParams,
                                           void**& extra, hipEvent_t& startEvent,
                                           hipEvent_t& stopEvent, uint32_t& flags);
hipError_t capturehipExtLaunchKernel(hipStream_t& stream, const void*& hostFunction, dim3& gridDim,
                                     dim3& blockDim, void**& args, size_t& sharedMemBytes,
                                     hipEvent_t& startEvent, hipEvent_t& stopEvent, int& flags);
hipError_t capturehipModuleLaunchKernel(hipStream_t& stream, hipFunction_t& f, uint32_t& gridDimX,
                                        uint32_t& gridDimY, uint32_t& gridDimZ,
                                        uint32_t& blockDimX, uint32_t& blockDimY,
                                        uint32_t& blockDimZ, uint32_t& sharedMemBytes,
                                        void**& kernelParams, void**& extra);
hipError_t capturehipMemcpy2DAsync(hipStream_t& stream, void*& dst, size_t& dpitch,
                                   const void*& src, size_t& spitch, size_t& width,
                                   size_t& height, hipMemcpyKind& kind);
hipError_t capturehipMemcpyParam2DAsync(hipStream_t& stream, const hip_Memcpy2D*& pCopy);
hipError_t capturehipMemcpy2DFromArrayAsync(hipStream_t& stream, void*& dst, size_t& dpitch,
                                            hipArray_const_t& src, size_t& wOffsetSrc,
                                            size_t& hOffsetSrc, size_t& width, size_t& height,
                                            hipMemcpyKind& kind);
hipError_t capturehipMemcpyFromArrayAsync(hipStream_t& stream, void*& dst, hipArray_const_t& src,
                                          size_t& wOffsetSrc, size_t& hOffsetSrc, size_t& count,
                                          hipMemcpyKind& kind);
hipError_t capturehipMemcpy2DToArrayAsync(hipStream_t& stream, hipArray*& dst, size_t& wOffset,
                                          size_t& hOffset, const void*& src, size_t& spitch,
                                          size_t& width, size_t& height, hipMemcpyKind& kind);
hipError_t capturehipMemcpyToArrayAsync(hipStream_t& stream, hipArray_t& dst, size_t& wOffset,
                                        size_t& hOffset, const void*& src, size_t& count,
                                        hipMemcpyKind& kind);
hipError_t capturehipMemcpyAtoHAsync(hipStream_t& stream, void*& dstHost, hipArray*& srcArray,
                                     size_t& srcOffset, size_t& ByteCount);
hipError_t capturehipMemcpyHtoAAsync(hipStream_t& stream, hipArray*& dstArray, size_t& dstOffset,
                                     const void*& srcHost, size_t& ByteCount);
hipError_t capturehipMemcpy3DAsync(hipStream_t& stream, const hipMemcpy3DParms*& p);
hipError_t capturehipMemcpyAsync(hipStream_t& stream, void*& dst, const void*& src,
                                 size_t& sizeBytes, hipMemcpyKind& kind);
hipError_t capturehipMemcpyHtoDAsync(hipStream_t& stream, hipDeviceptr_t& dstDevice,
                                     void*& srcHost, size_t& ByteCount, hipMemcpyKind& kind);
hipError_t capturehipMemcpyDtoDAsync(hipStream_t& stream, hipDeviceptr_t& dstDevice,
                                     hipDeviceptr_t& srcDevice, size_t& ByteCount,
                                     hipMemcpyKind& kind);
hipError_t capturehipMemcpyDtoHAsync(hipStream_t& stream, void*& dstHost,
                                     hipDeviceptr_t& srcDevice, size_t& ByteCount,
                                     hipMemcpyKind& kind);
hipError_t capturehipMemcpyFromSymbolAsync(hipStream_t& stream, void*& dst, const void*& symbol,
                                           size_t& sizeBytes, size_t& offset,
                                           hipMemcpyKind& kind);
hipError_t capturehipMemcpyToSymbolAsync(hipStream_t& stream, const void*& symbol,
                                         const void*& src, size_t& sizeBytes, size_t& offset,
                                         hipMemcpyKind& kind);
hipError_t capturehipMemsetAsync(hipStream_t& stream, void*& dst, int& value, size_t& valueSize,
                                 size_t& sizeBytes);
hipError_t capturehipMemset2DAsync(hipStream_t& stream, void*& dst, size_t& pitch, int& value,
                                   size_t& width, size_t& height);
hipError_t capturehipMemset3DAsync(hipStream_t& stream, hipPitchedPtr& pitchedDevPtr, int& value,
                                   hipExtent& extent);
hipError_t capturehipLaunchHostFunc(hipStream_t& stream, hipHostFn_t& fn, void*& userData);
hipError_t capturehipMallocAsync(hipStream_t stream, hipMemPool_t mem_pool, size_t size,
                                 void** dev_ptr);
hipError_t capturehipFreeAsync(hipStream_t stream, void* dev_ptr);
clr-rocm-5.7.1/hipamd/src/hip_graph_helper.hpp
#include "hip_conversions.hpp"

hipError_t ihipMemcpy3D_validate(const hipMemcpy3DParms* p);
hipError_t ihipMemcpy_validate(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind);
hipError_t ihipMemcpyCommand(amd::Command*& command, void* dst, const void* src, size_t sizeBytes,
                             hipMemcpyKind kind, hip::Stream& stream, bool isAsync = false);
void ihipHtoHMemcpy(void* dst, const void* src, size_t sizeBytes, hip::Stream& stream);
bool IsHtoHMemcpy(void* dst, const void* src, hipMemcpyKind kind);
hipError_t ihipLaunchKernel_validate(hipFunction_t f, uint32_t globalWorkSizeX,
                                     uint32_t globalWorkSizeY, uint32_t globalWorkSizeZ,
                                     uint32_t blockDimX, uint32_t blockDimY, uint32_t blockDimZ,
                                     uint32_t sharedMemBytes, void** kernelParams, void** extra,
                                     int deviceId, uint32_t params);
hipError_t ihipMemset_validate(void* dst, int64_t value, size_t valueSize, size_t sizeBytes);
hipError_t ihipMemset3D_validate(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent,
                                 size_t sizeBytes);
hipError_t ihipLaunchKernelCommand(amd::Command*& command, hipFunction_t f,
                                   uint32_t globalWorkSizeX, uint32_t globalWorkSizeY,
                                   uint32_t globalWorkSizeZ, uint32_t blockDimX,
                                   uint32_t blockDimY, uint32_t blockDimZ,
                                   uint32_t sharedMemBytes, hip::Stream* stream,
                                   void** kernelParams, void** extra, hipEvent_t startEvent,
                                   hipEvent_t stopEvent, uint32_t flags, uint32_t params,
                                   uint32_t gridId, uint32_t numGrids, uint64_t prevGridSum,
                                   uint64_t allGridSum, uint32_t firstDevice);
hipError_t ihipMemcpy3DCommand(amd::Command*& command, const hipMemcpy3DParms* p,
                               hip::Stream* stream);
hipError_t ihipMemsetCommand(std::vector<amd::Command*>& commands, void* dst, int64_t value,
                             size_t valueSize, size_t sizeBytes, hip::Stream* stream);
hipError_t ihipMemset3DCommand(std::vector<amd::Command*>& commands, hipPitchedPtr pitchedDevPtr,
                               int value, hipExtent extent, hip::Stream* stream,
                               size_t elementSize = 1);
hipError_t ihipMemcpySymbol_validate(const void* symbol, size_t sizeBytes, size_t offset,
                                     size_t& sym_size, hipDeviceptr_t& device_ptr);
hipError_t ihipMemcpyAtoDValidate(hipArray* srcArray, void* dstDevice, amd::Coord3D& srcOrigin,
                                  amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion,
                                  size_t dstRowPitch, size_t dstSlicePitch,
                                  amd::Memory*& dstMemory, amd::Image*& srcImage,
                                  amd::BufferRect& srcRect, amd::BufferRect& dstRect);
hipError_t ihipMemcpyDtoAValidate(void* srcDevice, hipArray* dstArray, amd::Coord3D& srcOrigin,
                                  amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion,
                                  size_t srcRowPitch, size_t srcSlicePitch, amd::Image*& dstImage,
                                  amd::Memory*& srcMemory, amd::BufferRect& dstRect,
                                  amd::BufferRect& srcRect);
hipError_t ihipMemcpyDtoDValidate(void* srcDevice, void* dstDevice, amd::Coord3D& srcOrigin,
                                  amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion,
                                  size_t srcRowPitch, size_t srcSlicePitch, size_t dstRowPitch,
                                  size_t dstSlicePitch, amd::Memory*& srcMemory,
                                  amd::Memory*& dstMemory, amd::BufferRect& srcRect,
                                  amd::BufferRect& dstRect);
hipError_t ihipMemcpyDtoHValidate(void* srcDevice, void* dstHost, amd::Coord3D& srcOrigin,
                                  amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion,
                                  size_t srcRowPitch, size_t srcSlicePitch, size_t dstRowPitch,
                                  size_t dstSlicePitch, amd::Memory*& srcMemory,
                                  amd::BufferRect& srcRect, amd::BufferRect& dstRect);
hipError_t ihipMemcpyHtoDValidate(const void* srcHost, void* dstDevice, amd::Coord3D& srcOrigin,
                                  amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion,
                                  size_t srcRowPitch, size_t srcSlicePitch, size_t dstRowPitch,
                                  size_t dstSlicePitch, amd::Memory*& dstMemory,
                                  amd::BufferRect& srcRect, amd::BufferRect& dstRect);
hipError_t ihipMemcpyAtoAValidate(hipArray* srcArray, hipArray* dstArray, amd::Coord3D& srcOrigin,
                                  amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion,
                                  amd::Image*& srcImage, amd::Image*& dstImage);
hipError_t ihipMemcpyHtoAValidate(const void* srcHost, hipArray* dstArray,
                                  amd::Coord3D& srcOrigin, amd::Coord3D& dstOrigin,
                                  amd::Coord3D& copyRegion, size_t srcRowPitch,
                                  size_t srcSlicePitch, amd::Image*& dstImage,
                                  amd::BufferRect& srcRect);
hipError_t ihipMemcpyAtoHValidate(hipArray* srcArray, void* dstHost, amd::Coord3D& srcOrigin,
                                  amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion,
                                  size_t dstRowPitch, size_t dstSlicePitch, amd::Image*& srcImage,
                                  amd::BufferRect& dstRect);
hipError_t ihipGraphMemsetParams_validate(const hipMemsetParams* pNodeParams);
clr-rocm-5.7.1/hipamd/src/hip_graph_internal.cpp
/* Copyright (c) 2021 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#include "hip_graph_internal.hpp"
#include <queue>  // reconstructed: the header name was lost in extraction; TopologicalOrder below uses std::queue

#define CASE_STRING(X, C)  \
  case X:                  \
    case_string = #C;      \
    break;

const char* GetGraphNodeTypeString(uint32_t op) {
  const char* case_string;
  switch (static_cast<hipGraphNodeType>(op)) {
    CASE_STRING(hipGraphNodeTypeKernel, KernelNode)
    CASE_STRING(hipGraphNodeTypeMemcpy, MemcpyNode)
    CASE_STRING(hipGraphNodeTypeMemset, MemsetNode)
    CASE_STRING(hipGraphNodeTypeHost, HostNode)
    CASE_STRING(hipGraphNodeTypeGraph, GraphNode)
    CASE_STRING(hipGraphNodeTypeEmpty, EmptyNode)
    CASE_STRING(hipGraphNodeTypeWaitEvent, WaitEventNode)
    CASE_STRING(hipGraphNodeTypeEventRecord, EventRecordNode)
    CASE_STRING(hipGraphNodeTypeExtSemaphoreSignal, ExtSemaphoreSignalNode)
    CASE_STRING(hipGraphNodeTypeExtSemaphoreWait, ExtSemaphoreWaitNode)
    CASE_STRING(hipGraphNodeTypeMemAlloc, MemAllocNode)
    CASE_STRING(hipGraphNodeTypeMemFree, MemFreeNode)
    CASE_STRING(hipGraphNodeTypeMemcpyFromSymbol, MemcpyFromSymbolNode)
    CASE_STRING(hipGraphNodeTypeMemcpyToSymbol, MemcpyToSymbolNode)
    default:
      case_string = "Unknown node type";
  }
  return case_string;
}

int hipGraphNode::nextID = 0;
int ihipGraph::nextID = 0;
std::unordered_set<hipGraphNode*> hipGraphNode::nodeSet_;
amd::Monitor hipGraphNode::nodeSetLock_{"Guards global node set"};
std::unordered_set<ihipGraph*> ihipGraph::graphSet_;
amd::Monitor ihipGraph::graphSetLock_{"Guards global graph set"};
std::unordered_set<hipGraphExec*> hipGraphExec::graphExecSet_;
amd::Monitor hipGraphExec::graphExecSetLock_{"Guards global exec graph set"};
std::unordered_set<hipUserObject*> hipUserObject::ObjectSet_;
amd::Monitor hipUserObject::UserObjectLock_{"Guards global user object"};

hipError_t hipGraphMemcpyNode1D::ValidateParams(void* dst, const void* src, size_t count,
                                                hipMemcpyKind kind) {
  hipError_t status = ihipMemcpy_validate(dst, src, count, kind);
  if (status != hipSuccess) {
    return status;
  }
  size_t sOffsetOrig = 0;
  amd::Memory* origSrcMemory = getMemoryObject(src, sOffsetOrig);
  size_t dOffsetOrig = 0;
  amd::Memory* origDstMemory = getMemoryObject(dst, dOffsetOrig);
  size_t sOffset = 0;
  amd::Memory* srcMemory = getMemoryObject(src, sOffset);
  size_t dOffset = 0;
  amd::Memory* dstMemory = getMemoryObject(dst, dOffset);
  if ((srcMemory == nullptr) && (dstMemory != nullptr)) {  // host to device
    if (origDstMemory->getContext().devices()[0] != dstMemory->getContext().devices()[0]) {
      return hipErrorInvalidValue;
    }
    if ((kind != hipMemcpyHostToDevice) && (kind != hipMemcpyDefault)) {
      return hipErrorInvalidValue;
    }
  } else if ((srcMemory != nullptr) && (dstMemory == nullptr)) {  // device to host
    if (origSrcMemory->getContext().devices()[0] != srcMemory->getContext().devices()[0]) {
      return hipErrorInvalidValue;
    }
    if ((kind != hipMemcpyDeviceToHost) && (kind != hipMemcpyDefault)) {
      return hipErrorInvalidValue;
    }
  } else if ((srcMemory != nullptr) && (dstMemory != nullptr)) {
    if (origDstMemory->getContext().devices()[0] != dstMemory->getContext().devices()[0]) {
      return hipErrorInvalidValue;
    }
    if (origSrcMemory->getContext().devices()[0] != srcMemory->getContext().devices()[0]) {
      return hipErrorInvalidValue;
    }
  }
  return hipSuccess;
}
hipError_t hipGraphMemcpyNode::ValidateParams(const hipMemcpy3DParms* pNodeParams) {
  hipError_t status = ihipMemcpy3D_validate(pNodeParams);
  if (status != hipSuccess) {
    return status;
  }
  size_t offset = 0;
  const HIP_MEMCPY3D pCopy = hip::getDrvMemcpy3DDesc(*pNodeParams);
  // If {src/dst}MemoryType is hipMemoryTypeUnified, {src/dst}Device and {src/dst}Pitch specify
  // the (unified virtual address space) base address of the source data and the bytes per row to
  // apply. {src/dst}Array is ignored.
  hipMemoryType srcMemoryType = pCopy.srcMemoryType;
  if (srcMemoryType == hipMemoryTypeUnified) {
    amd::Memory* memObj = getMemoryObject(pCopy.srcDevice, offset);
    srcMemoryType = ((CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR) & memObj->getMemFlags())
        ? hipMemoryTypeHost
        : hipMemoryTypeDevice;
    if (srcMemoryType == hipMemoryTypeHost) {
      // {src/dst}Host may be uninitialized. Copy over {src/dst}Device into it if we detect
      // system memory.
      const_cast<HIP_MEMCPY3D*>(&pCopy)->srcHost = pCopy.srcDevice;
      const_cast<HIP_MEMCPY3D*>(&pCopy)->srcXInBytes += offset;
    }
  }
  offset = 0;
  hipMemoryType dstMemoryType = pCopy.dstMemoryType;
  if (dstMemoryType == hipMemoryTypeUnified) {
    amd::Memory* memObj = getMemoryObject(pCopy.dstDevice, offset);
    dstMemoryType = ((CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR) & memObj->getMemFlags())
        ? hipMemoryTypeHost
        : hipMemoryTypeDevice;
    if (dstMemoryType == hipMemoryTypeHost) {
      const_cast<HIP_MEMCPY3D*>(&pCopy)->dstHost = pCopy.dstDevice;
      const_cast<HIP_MEMCPY3D*>(&pCopy)->dstXInBytes += offset;
    }
  }
  offset = 0;
  // If {src/dst}MemoryType is hipMemoryTypeHost, check if the memory was prepinned.
  // In that case upgrade the copy type to hipMemoryTypeDevice to avoid extra pinning.
  if (srcMemoryType == hipMemoryTypeHost) {
    srcMemoryType =
        getMemoryObject(pCopy.srcHost, offset) ? hipMemoryTypeDevice : hipMemoryTypeHost;
    if (srcMemoryType == hipMemoryTypeDevice) {
      const_cast<HIP_MEMCPY3D*>(&pCopy)->srcDevice = const_cast<void*>(pCopy.srcHost);
    }
  }
  offset = 0;
  if (dstMemoryType == hipMemoryTypeHost) {
    dstMemoryType =
        getMemoryObject(pCopy.dstHost, offset) ? hipMemoryTypeDevice : hipMemoryTypeHost;
    if (dstMemoryType == hipMemoryTypeDevice) {
      const_cast<HIP_MEMCPY3D*>(&pCopy)->dstDevice = const_cast<void*>(pCopy.dstHost);
    }
  }
  amd::Coord3D srcOrigin = {pCopy.srcXInBytes, pCopy.srcY, pCopy.srcZ};
  amd::Coord3D dstOrigin = {pCopy.dstXInBytes, pCopy.dstY, pCopy.dstZ};
  amd::Coord3D copyRegion = {pCopy.WidthInBytes, pCopy.Height, pCopy.Depth};
  if ((srcMemoryType == hipMemoryTypeHost) && (dstMemoryType == hipMemoryTypeDevice)) {
    // Host to Device.
    amd::Memory* dstMemory;
    amd::BufferRect srcRect;
    amd::BufferRect dstRect;
    status = ihipMemcpyHtoDValidate(pCopy.srcHost, pCopy.dstDevice, srcOrigin, dstOrigin,
                                    copyRegion, pCopy.srcPitch, pCopy.srcPitch * pCopy.srcHeight,
                                    pCopy.dstPitch, pCopy.dstPitch * pCopy.dstHeight, dstMemory,
                                    srcRect, dstRect);
    if (status != hipSuccess) {
      return status;
    }
  } else if ((srcMemoryType == hipMemoryTypeDevice) && (dstMemoryType == hipMemoryTypeHost)) {
    // Device to Host.
    amd::Memory* srcMemory;
    amd::BufferRect srcRect;
    amd::BufferRect dstRect;
    status = ihipMemcpyDtoHValidate(pCopy.srcDevice, pCopy.dstHost, srcOrigin, dstOrigin,
                                    copyRegion, pCopy.srcPitch, pCopy.srcPitch * pCopy.srcHeight,
                                    pCopy.dstPitch, pCopy.dstPitch * pCopy.dstHeight, srcMemory,
                                    srcRect, dstRect);
    if (status != hipSuccess) {
      return status;
    }
  } else if ((srcMemoryType == hipMemoryTypeDevice) && (dstMemoryType == hipMemoryTypeDevice)) {
    // Device to Device.
    amd::Memory* srcMemory;
    amd::Memory* dstMemory;
    amd::BufferRect srcRect;
    amd::BufferRect dstRect;
    status = ihipMemcpyDtoDValidate(pCopy.srcDevice, pCopy.dstDevice, srcOrigin, dstOrigin,
                                    copyRegion, pCopy.srcPitch, pCopy.srcPitch * pCopy.srcHeight,
                                    pCopy.dstPitch, pCopy.dstPitch * pCopy.dstHeight, srcMemory,
                                    dstMemory, srcRect, dstRect);
    if (status != hipSuccess) {
      return status;
    }
  } else if ((srcMemoryType == hipMemoryTypeHost) && (dstMemoryType == hipMemoryTypeArray)) {
    amd::Image* dstImage;
    amd::BufferRect srcRect;
    status = ihipMemcpyHtoAValidate(pCopy.srcHost, pCopy.dstArray, srcOrigin, dstOrigin,
                                    copyRegion, pCopy.srcPitch, pCopy.srcPitch * pCopy.srcHeight,
                                    dstImage, srcRect);
    if (status != hipSuccess) {
      return status;
    }
  } else if ((srcMemoryType == hipMemoryTypeArray) && (dstMemoryType == hipMemoryTypeHost)) {
    // Image to Host.
    amd::Image* srcImage;
    amd::BufferRect dstRect;
    status = ihipMemcpyAtoHValidate(pCopy.srcArray, pCopy.dstHost, srcOrigin, dstOrigin,
                                    copyRegion, pCopy.dstPitch, pCopy.dstPitch * pCopy.dstHeight,
                                    srcImage, dstRect);
    if (status != hipSuccess) {
      return status;
    }
  } else if ((srcMemoryType == hipMemoryTypeDevice) && (dstMemoryType == hipMemoryTypeArray)) {
    // Device to Image.
    amd::Image* dstImage;
    amd::Memory* srcMemory;
    amd::BufferRect dstRect;
    amd::BufferRect srcRect;
    status = ihipMemcpyDtoAValidate(pCopy.srcDevice, pCopy.dstArray, srcOrigin, dstOrigin,
                                    copyRegion, pCopy.srcPitch, pCopy.srcPitch * pCopy.srcHeight,
                                    dstImage, srcMemory, dstRect, srcRect);
    if (status != hipSuccess) {
      return status;
    }
  } else if ((srcMemoryType == hipMemoryTypeArray) && (dstMemoryType == hipMemoryTypeDevice)) {
    // Image to Device.
    amd::BufferRect srcRect;
    amd::BufferRect dstRect;
    amd::Memory* dstMemory;
    amd::Image* srcImage;
    status = ihipMemcpyAtoDValidate(pCopy.srcArray, pCopy.dstDevice, srcOrigin, dstOrigin,
                                    copyRegion, pCopy.dstPitch, pCopy.dstPitch * pCopy.dstHeight,
                                    dstMemory, srcImage, srcRect, dstRect);
    if (status != hipSuccess) {
      return status;
    }
  } else if ((srcMemoryType == hipMemoryTypeArray) && (dstMemoryType == hipMemoryTypeArray)) {
    amd::Image* srcImage;
    amd::Image* dstImage;
    status = ihipMemcpyAtoAValidate(pCopy.srcArray, pCopy.dstArray, srcOrigin, dstOrigin,
                                    copyRegion, srcImage, dstImage);
    if (status != hipSuccess) {
      return status;
    }
  } else {
    return hipErrorInvalidValue;
  }
  return hipSuccess;
}

bool ihipGraph::isGraphValid(ihipGraph* pGraph) {
  amd::ScopedLock lock(graphSetLock_);
  if (graphSet_.find(pGraph) == graphSet_.end()) {
    return false;
  }
  return true;
}

void ihipGraph::AddNode(const Node& node) {
  vertices_.emplace_back(node);
  ClPrint(amd::LOG_INFO, amd::LOG_CODE, "[hipGraph] Add %s(%p)\n",
          GetGraphNodeTypeString(node->GetType()), node);
  node->SetParentGraph(this);
}

void ihipGraph::RemoveNode(const Node& node) {
  vertices_.erase(std::remove(vertices_.begin(), vertices_.end(), node), vertices_.end());
  delete node;
}

// Root nodes are all vertices with 0 in-degree.
std::vector<Node> ihipGraph::GetRootNodes() const {
  std::vector<Node> roots;
  for (auto entry : vertices_) {
    if (entry->GetInDegree() == 0) {
      roots.push_back(entry);
      ClPrint(amd::LOG_INFO, amd::LOG_CODE, "[hipGraph] root node: %s(%p)\n",
              GetGraphNodeTypeString(entry->GetType()), entry);
    }
  }
  ClPrint(amd::LOG_INFO, amd::LOG_CODE, "\n");
  return roots;
}

// Leaf nodes are all vertices with 0 out-degree.
std::vector<Node> ihipGraph::GetLeafNodes() const {
  std::vector<Node> leafNodes;
  for (auto entry : vertices_) {
    if (entry->GetOutDegree() == 0) {
      leafNodes.push_back(entry);
    }
  }
  return leafNodes;
}
size_t ihipGraph::GetLeafNodeCount() const {
  size_t numLeafNodes = 0;
  for (auto entry : vertices_) {
    if (entry->GetOutDegree() == 0) {
      numLeafNodes++;
    }
  }
  return numLeafNodes;
}

std::vector<std::pair<Node, Node>> ihipGraph::GetEdges() const {
  std::vector<std::pair<Node, Node>> edges;
  for (const auto& i : vertices_) {
    for (const auto& j : i->GetEdges()) {
      edges.push_back(std::make_pair(i, j));
    }
  }
  return edges;
}

void ihipGraph::GetRunListUtil(Node v, std::unordered_map<Node, bool>& visited,
                               std::vector<Node>& singleList,
                               std::vector<std::vector<Node>>& parallelLists,
                               std::unordered_map<Node, std::vector<Node>>& dependencies) {
  // Mark the current node as visited.
  visited[v] = true;
  singleList.push_back(v);
  // Recurse for all the vertices adjacent to this vertex.
  for (auto& adjNode : v->GetEdges()) {
    if (!visited[adjNode]) {
      // For the parallel list nodes add the parent as the dependency.
      if (singleList.empty()) {
        ClPrint(amd::LOG_INFO, amd::LOG_CODE,
                "[hipGraph] For %s(%p)- add parent as dependency %s(%p)\n",
                GetGraphNodeTypeString(adjNode->GetType()), adjNode,
                GetGraphNodeTypeString(v->GetType()), v);
        dependencies[adjNode].push_back(v);
      }
      GetRunListUtil(adjNode, visited, singleList, parallelLists, dependencies);
    } else {
      for (auto& list : parallelLists) {
        // Merge singleList when adjNode matches the first element of a list in the existing
        // lists.
        if (adjNode == list[0]) {
          for (auto k = singleList.rbegin(); k != singleList.rend(); ++k) {
            list.insert(list.begin(), *k);
          }
          singleList.erase(singleList.begin(), singleList.end());
        }
      }
      // If the list cannot be merged with an existing list, add it as a dependency.
      if (!singleList.empty()) {
        ClPrint(amd::LOG_INFO, amd::LOG_CODE, "[hipGraph] For %s(%p)- add dependency %s(%p)\n",
                GetGraphNodeTypeString(adjNode->GetType()), adjNode,
                GetGraphNodeTypeString(v->GetType()), v);
        dependencies[adjNode].push_back(v);
      }
    }
  }
  if (!singleList.empty()) {
    parallelLists.push_back(singleList);
    singleList.erase(singleList.begin(), singleList.end());
  }
}
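// Worked example of the traversal above, for intuition: for the diamond graph
//      A
//     / \
//    B   C
//     \ /
//      D
// the DFS first emits {A, B, D} as one run list. C is then visited from A with an empty
// singleList, so A is recorded as C's dependency; C's neighbor D is already mid-list (not a list
// head), so no merge happens and C is recorded as a dependency of D. The result is two parallel
// lists, {A, B, D} and {C}, with C waiting on A and D also waiting on C when commands are built.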
// The function to do the topological sort. It uses the recursive GetRunListUtil().
void ihipGraph::GetRunList(std::vector<std::vector<Node>>& parallelLists,
                           std::unordered_map<Node, std::vector<Node>>& dependencies) {
  std::vector<Node> singleList;
  // Mark all the vertices as not visited.
  std::unordered_map<Node, bool> visited;
  for (auto node : vertices_) visited[node] = false;
  // Call the recursive helper function for all vertices one by one.
  for (auto node : vertices_) {
    // If the node has an embedded child graph.
    node->GetRunList(parallelLists, dependencies);
    if (visited[node] == false) {
      GetRunListUtil(node, visited, singleList, parallelLists, dependencies);
    }
  }
  for (size_t i = 0; i < parallelLists.size(); i++) {
    for (size_t j = 0; j < parallelLists[i].size(); j++) {
      ClPrint(amd::LOG_INFO, amd::LOG_CODE, "[hipGraph] list %d - %s(%p)\n", i + 1,
              GetGraphNodeTypeString(parallelLists[i][j]->GetType()), parallelLists[i][j]);
    }
  }
}

bool ihipGraph::TopologicalOrder(std::vector<Node>& TopoOrder) {
  std::queue<Node> q;
  std::unordered_map<Node, size_t> inDegree;
  for (auto entry : vertices_) {
    if (entry->GetInDegree() == 0) {
      q.push(entry);
    }
    inDegree[entry] = entry->GetInDegree();
  }
  while (!q.empty()) {
    Node node = q.front();
    TopoOrder.push_back(node);
    q.pop();
    for (auto edge : node->GetEdges()) {
      inDegree[edge]--;
      if (inDegree[edge] == 0) {
        q.push(edge);
      }
    }
  }
  if (GetNodeCount() == TopoOrder.size()) {
    return true;
  }
  return false;
}

ihipGraph* ihipGraph::clone(std::unordered_map<Node, Node>& clonedNodes) const {
  ihipGraph* newGraph = new ihipGraph(device_, this);
  for (auto entry : vertices_) {
    hipGraphNode* node = entry->clone();
    node->SetParentGraph(newGraph);
    newGraph->vertices_.push_back(node);
    clonedNodes[entry] = node;
  }
  std::vector<Node> clonedEdges;
  std::vector<Node> clonedDependencies;
  for (auto node : vertices_) {
    const std::vector<Node>& edges = node->GetEdges();
    clonedEdges.clear();
    for (auto edge : edges) {
      clonedEdges.push_back(clonedNodes[edge]);
    }
    clonedNodes[node]->SetEdges(clonedEdges);
  }
  for (auto node : vertices_) {
    const std::vector<Node>& dependencies = node->GetDependencies();
    clonedDependencies.clear();
    for (auto dep : dependencies) {
      clonedDependencies.push_back(clonedNodes[dep]);
    }
    clonedNodes[node]->SetDependencies(clonedDependencies);
  }
  return newGraph;
}

ihipGraph* ihipGraph::clone() const {
  std::unordered_map<Node, Node> clonedNodes;
  return clone(clonedNodes);
}

bool hipGraphExec::isGraphExecValid(hipGraphExec* pGraphExec) {
  amd::ScopedLock lock(graphExecSetLock_);
  if (graphExecSet_.find(pGraphExec) == graphExecSet_.end()) {
    return false;
  }
  return true;
}

hipError_t hipGraphExec::CreateStreams(uint32_t num_streams) {
  parallel_streams_.reserve(num_streams);
  for (uint32_t i = 0; i < num_streams; ++i) {
    auto stream = new hip::Stream(hip::getCurrentDevice(), hip::Stream::Priority::Normal,
                                  hipStreamNonBlocking);
    if (stream == nullptr || !stream->Create()) {
      if (stream != nullptr) {
        hip::Stream::Destroy(stream);
      }
      ClPrint(amd::LOG_ERROR, amd::LOG_CODE, "[hipGraph] Failed to create parallel stream!\n");
      return hipErrorOutOfMemory;
    }
    parallel_streams_.push_back(stream);
  }
  return hipSuccess;
}

hipError_t hipGraphExec::Init() {
  hipError_t status;
  size_t min_num_streams = 1;
  for (auto& node : topoOrder_) {
    status = node->GetNumParallelStreams(min_num_streams);
    if (status != hipSuccess) {
      return status;
    }
  }
  status = CreateStreams(parallelLists_.size() - 1 + min_num_streams);
  return status;
}
hipError_t FillCommands(std::vector<std::vector<Node>>& parallelLists,
                        std::unordered_map<Node, std::vector<Node>>& nodeWaitLists,
                        std::vector<Node>& topoOrder, std::vector<amd::Command*>& rootCommands,
                        amd::Command*& endCommand, hip::Stream* stream) {
  hipError_t status;
  for (auto& node : topoOrder) {
    // TODO: clone commands from the next launch.
    status = node->CreateCommand(node->GetQueue());
    if (status != hipSuccess) return status;
    amd::Command::EventWaitList waitList;
    for (auto depNode : nodeWaitLists[node]) {
      for (auto command : depNode->GetCommands()) {
        waitList.push_back(command);
      }
    }
    node->UpdateEventWaitLists(waitList);
  }
  // rootCommand ensures the graph is started (all parallel branches) after all the previous work
  // is finished.
  bool first = true;
  for (auto& singleList : parallelLists) {
    if (first) {
      first = false;
      continue;
    }
    // Marker from the same queue as the list.
    amd::Command* rootCommand = new amd::Marker(*singleList[0]->GetQueue(), false, {});
    amd::Command::EventWaitList waitList;
    waitList.push_back(rootCommand);
    if (!singleList.empty()) {
      auto commands = singleList[0]->GetCommands();
      if (!commands.empty()) {
        commands[0]->updateEventWaitList(waitList);
        rootCommands.push_back(rootCommand);
      }
    }
  }
  // endCommand ensures the next enqueued commands start after the graph is finished (all parallel
  // branches).
  amd::Command::EventWaitList graphLastCmdWaitList;
  first = true;
  for (auto& singleList : parallelLists) {
    if (first) {
      first = false;
      continue;
    }
    if (!singleList.empty()) {
      auto commands = singleList.back()->GetCommands();
      if (!commands.empty()) {
        graphLastCmdWaitList.push_back(commands.back());
      }
    }
  }
  if (!graphLastCmdWaitList.empty()) {
    endCommand = new amd::Marker(*stream, false, graphLastCmdWaitList);
    if (endCommand == nullptr) {
      return hipErrorOutOfMemory;
    }
  }
  return hipSuccess;
}

void UpdateStream(std::vector<std::vector<Node>>& parallelLists, hip::Stream* stream,
                  hipGraphExec* ptr) {
  int i = 0;
  for (const auto& list : parallelLists) {
    // The first parallel list is launched on the same queue as the parent.
    if (i == 0) {
      for (auto& node : list) {
        node->SetStream(stream, ptr);
      }
    } else {
      // A new stream for each parallel branch.
      hip::Stream* parallel_stream = ptr->GetAvailableStreams();
      for (auto& node : list) {
        node->SetStream(parallel_stream, ptr);
      }
    }
    i++;
  }
}

hipError_t hipGraphExec::Run(hipStream_t stream) {
  hipError_t status;
  if (hip::getStream(stream) == nullptr) {
    return hipErrorInvalidResourceHandle;
  }
  auto hip_stream = (stream == nullptr) ? hip::getCurrentDevice()->NullStream()
                                        : reinterpret_cast<hip::Stream*>(stream);
  if (flags_ & hipGraphInstantiateFlagAutoFreeOnLaunch) {
    if (!topoOrder_.empty()) {
      topoOrder_[0]->GetParentGraph()->FreeAllMemory(hip_stream);
    }
  }
  // If this is a repeat launch, make sure a corresponding MemFreeNode exists for each MemAlloc
  // node.
  if (repeatLaunch_ == true) {
    for (auto& node : topoOrder_) {
      if (node->GetType() == hipGraphNodeTypeMemAlloc &&
          static_cast<hipGraphMemAllocNode*>(node)->IsActiveMem() == true) {
        return hipErrorInvalidValue;
      }
    }
  } else {
    repeatLaunch_ = true;
  }
  UpdateStream(parallelLists_, hip_stream, this);
  std::vector<amd::Command*> rootCommands;
  amd::Command* endCommand = nullptr;
  status = FillCommands(parallelLists_, nodeWaitLists_, topoOrder_, rootCommands, endCommand,
                        hip_stream);
  if (status != hipSuccess) {
    return status;
  }
  for (auto& cmd : rootCommands) {
    cmd->enqueue();
    cmd->release();
  }
  for (auto& node : topoOrder_) {
    node->EnqueueCommands(stream);
  }
  if (endCommand != nullptr) {
    endCommand->enqueue();
    endCommand->release();
  }
  ResetQueueIndex();
  return status;
}
clr-rocm-5.7.1/hipamd/src/hip_graph_internal.hpp
/* Copyright (c) 2021 - 2023 Advanced Micro Devices, Inc.
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#pragma once

// NOTE: the angle-bracket header names below were lost in extraction; this set is reconstructed
// from the standard-library facilities this header's code actually uses.
#include <algorithm>
#include <queue>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

#include "hip/hip_runtime.h"
#include "hip_internal.hpp"
#include "hip_graph_helper.hpp"
#include "hip_event.hpp"
#include "hip_platform.hpp"
#include "hip_mempool_impl.hpp"
#include "hip_vm.hpp"

typedef hipGraphNode* Node;

hipError_t FillCommands(std::vector<std::vector<Node>>& parallelLists,
                        std::unordered_map<Node, std::vector<Node>>& nodeWaitLists,
                        std::vector<Node>& topoOrder, std::vector<amd::Command*>& rootCommands,
                        amd::Command*& endCommand, hip::Stream* stream);
void UpdateStream(std::vector<std::vector<Node>>& parallelLists, hip::Stream* stream,
                  hipGraphExec* ptr);

struct hipUserObject : public amd::ReferenceCountedObject {
  typedef void (*UserCallbackDestructor)(void* data);
  static std::unordered_set<hipUserObject*> ObjectSet_;
  static amd::Monitor UserObjectLock_;

 public:
  hipUserObject(UserCallbackDestructor callback, void* data, unsigned int flags)
      : ReferenceCountedObject(), callback_(callback), data_(data), flags_(flags) {
    amd::ScopedLock lock(UserObjectLock_);
    ObjectSet_.insert(this);
  }
  virtual ~hipUserObject() {
    amd::ScopedLock lock(UserObjectLock_);
    if (callback_ != nullptr) {
      callback_(data_);
    }
    ObjectSet_.erase(this);
  }
  void increaseRefCount(const unsigned int refCount) {
    for (uint32_t i = 0; i < refCount; i++) {
      retain();
    }
  }
  void decreaseRefCount(const unsigned int refCount) {
    assert((refCount <= referenceCount()) && "count is bigger than refcount");
    for (uint32_t i = 0; i < refCount; i++) {
      release();
    }
  }
  static bool isUserObjvalid(hipUserObject* pUsertObj) {
    auto it = ObjectSet_.find(pUsertObj);
    if (it == ObjectSet_.end()) {
      return false;
    }
    return true;
  }
  static void removeUSerObj(hipUserObject* pUsertObj) {
    amd::ScopedLock lock(UserObjectLock_);
    auto it = ObjectSet_.find(pUsertObj);
    if (it != ObjectSet_.end()) {
      ObjectSet_.erase(it);
    }
  }

 private:
  UserCallbackDestructor callback_;
  void* data_;
  unsigned int flags_;
  //! Disable default operator=
  hipUserObject& operator=(const hipUserObject&) = delete;
  //! Disable copy constructor
  hipUserObject(const hipUserObject& obj) = delete;
};
Disable copy constructor hipUserObject(const hipUserObject& obj) = delete; }; struct hipGraphNodeDOTAttribute { protected: std::string style_; std::string shape_; std::string label_; hipGraphNodeDOTAttribute(std::string style, std::string shape, std::string label) { style_ = style; shape_ = shape; label_ = label; } hipGraphNodeDOTAttribute() { style_ = "solid"; shape_ = "rectangle"; label_ = ""; } hipGraphNodeDOTAttribute(const hipGraphNodeDOTAttribute& node) { style_ = node.style_; shape_ = node.shape_; label_ = node.label_; } void SetStyle(std::string style) { style_ = style; } void SetShape(std::string shape) { shape_ = shape; } virtual std::string GetShape(hipGraphDebugDotFlags flag) { return shape_; } void SetLabel(std::string label) { label_ = label; } virtual std::string GetLabel(hipGraphDebugDotFlags flag) { return label_; } virtual void PrintAttributes(std::ostream& out, hipGraphDebugDotFlags flag) { out << "["; out << "style"; out << "=\""; out << style_; out << "\""; out << "shape"; out << "=\""; out << GetShape(flag); out << "\""; out << "label"; out << "=\""; out << GetLabel(flag); out << "\""; out << "];"; } }; struct hipGraphNode : public hipGraphNodeDOTAttribute { protected: hip::Stream* stream_ = nullptr; unsigned int id_; hipGraphNodeType type_; std::vector commands_; std::vector edges_; std::vector dependencies_; bool visited_; // count of in coming edges size_t inDegree_; // count of outgoing edges size_t outDegree_; static int nextID; struct ihipGraph* parentGraph_; static std::unordered_set nodeSet_; static amd::Monitor nodeSetLock_; unsigned int isEnabled_; public: hipGraphNode(hipGraphNodeType type, std::string style = "", std::string shape = "", std::string label = "") : type_(type), visited_(false), inDegree_(0), outDegree_(0), id_(nextID++), parentGraph_(nullptr), isEnabled_(1), hipGraphNodeDOTAttribute(style, shape, label) { amd::ScopedLock lock(nodeSetLock_); nodeSet_.insert(this); } /// Copy Constructor hipGraphNode(const hipGraphNode& node) : hipGraphNodeDOTAttribute(node) { type_ = node.type_; inDegree_ = node.inDegree_; outDegree_ = node.outDegree_; visited_ = false; id_ = node.id_; parentGraph_ = nullptr; amd::ScopedLock lock(nodeSetLock_); nodeSet_.insert(this); isEnabled_ = node.isEnabled_; } virtual ~hipGraphNode() { for (auto node : edges_) { node->RemoveDependency(this); } for (auto node : dependencies_) { node->RemoveEdge(this); } amd::ScopedLock lock(nodeSetLock_); nodeSet_.erase(this); } // check node validity static bool isNodeValid(hipGraphNode* pGraphNode) { amd::ScopedLock lock(nodeSetLock_); if (pGraphNode == nullptr || nodeSet_.find(pGraphNode) == nodeSet_.end()) { return false; } return true; } hip::Stream* GetQueue() { return stream_; } virtual void SetStream(hip::Stream* stream, hipGraphExec* ptr = nullptr) { stream_ = stream; } /// Create amd::command for the graph node virtual hipError_t CreateCommand(hip::Stream* stream) { commands_.clear(); stream_ = stream; return hipSuccess; } /// Return node unique ID int GetID() const { return id_; } /// Returns command for graph node virtual std::vector& GetCommands() { return commands_; } /// Returns graph node type hipGraphNodeType GetType() const { return type_; } /// Clone graph node virtual hipGraphNode* clone() const = 0; /// Returns graph node indegree size_t GetInDegree() const { return inDegree_; } /// Updates indegree of the node void SetInDegree(size_t inDegree) { inDegree_ = inDegree; } /// Returns graph node outdegree size_t GetOutDegree() const { return outDegree_; } /// Updates 
outdegree of the node void SetOutDegree(size_t outDegree) { outDegree_ = outDegree; } /// Returns graph node dependencies const std::vector& GetDependencies() const { return dependencies_; } /// Update graph node dependecies void SetDependencies(std::vector& dependencies) { for (auto entry : dependencies) { dependencies_.push_back(entry); } } /// Add graph node dependency void AddDependency(const Node& node) { dependencies_.push_back(node); } /// Remove graph node dependency void RemoveDependency(const Node& node) { dependencies_.erase(std::remove(dependencies_.begin(), dependencies_.end(), node), dependencies_.end()); } void RemoveEdge(const Node& childNode) { edges_.erase(std::remove(edges_.begin(), edges_.end(), childNode), edges_.end()); } /// Return graph node children const std::vector& GetEdges() const { return edges_; } /// Updates graph node children void SetEdges(std::vector& edges) { for (auto entry : edges) { edges_.push_back(entry); } } /// Add edge, update parent node outdegree, child node indegree and dependency void AddEdge(const Node& childNode) { edges_.push_back(childNode); outDegree_++; childNode->SetInDegree(childNode->GetInDegree() + 1); childNode->AddDependency(this); } /// Remove edge, update parent node outdegree, child node indegree and dependency bool RemoveUpdateEdge(const Node& childNode) { // std::remove changes the end() hence saving it before hand for validation auto currEdgeEnd = edges_.end(); auto it = std::remove(edges_.begin(), edges_.end(), childNode); if (it == currEdgeEnd) { // Should come here if childNode is not present in the edge list return false; } edges_.erase(it, edges_.end()); outDegree_--; childNode->SetInDegree(childNode->GetInDegree() - 1); childNode->RemoveDependency(this); return true; } /// Get Runlist of the nodes embedded as part of the graphnode(e.g. ChildGraph) virtual void GetRunList(std::vector>& parallelList, std::unordered_map>& dependencies) {} /// Get topological sort of the nodes embedded as part of the graphnode(e.g. ChildGraph) virtual bool TopologicalOrder(std::vector& TopoOrder) { return true; } /// Update waitlist of the nodes embedded as part of the graphnode(e.g. ChildGraph) virtual void UpdateEventWaitLists(amd::Command::EventWaitList waitList) { for (auto command : commands_) { command->updateEventWaitList(waitList); } } virtual hipError_t GetNumParallelStreams(size_t &num) { return hipSuccess; } /// Enqueue commands part of the node virtual void EnqueueCommands(hipStream_t stream) { // If the node is disabled it becomes empty node. To maintain ordering just enqueue marker. // Node can be enabled/disabled only for kernel, memcpy and memset nodes. 
    if (!isEnabled_ && (type_ == hipGraphNodeTypeKernel || type_ == hipGraphNodeTypeMemcpy ||
                        type_ == hipGraphNodeTypeMemset)) {
      amd::Command::EventWaitList waitList;
      hip::Stream* hip_stream = hip::getStream(stream);
      amd::Command* command = new amd::Marker(*hip_stream, !kMarkerDisableFlush, waitList);
      command->enqueue();
      command->release();
      return;
    }
    for (auto& command : commands_) {
      command->enqueue();
      command->release();
    }
  }
  ihipGraph* GetParentGraph() { return parentGraph_; }
  virtual ihipGraph* GetChildGraph() { return nullptr; }
  void SetParentGraph(ihipGraph* graph) { parentGraph_ = graph; }
  virtual hipError_t SetParams(hipGraphNode* node) { return hipSuccess; }
  virtual void GenerateDOT(std::ostream& fout, hipGraphDebugDotFlags flag) {}
  virtual void GenerateDOTNode(size_t graphId, std::ostream& fout, hipGraphDebugDotFlags flag) {
    fout << "\n";
    std::string nodeName = "graph_" + std::to_string(graphId) + "_node_" + std::to_string(GetID());
    fout << "\"" << nodeName << "\"";
    PrintAttributes(fout, flag);
    fout << "\n";
  }
  virtual void GenerateDOTNodeEdges(size_t graphId, std::ostream& fout,
                                    hipGraphDebugDotFlags flag) {
    for (auto node : edges_) {
      std::string toNodeName =
          "graph_" + std::to_string(graphId) + "_node_" + std::to_string(node->GetID());
      std::string fromNodeName =
          "graph_" + std::to_string(graphId) + "_node_" + std::to_string(GetID());
      fout << "\"" << fromNodeName << "\" -> \"" << toNodeName << "\"" << std::endl;
    }
  }
  virtual std::string GetLabel(hipGraphDebugDotFlags flag) {
    return (std::to_string(id_) + "\n" + label_);
  }
  unsigned int GetEnabled() const { return isEnabled_; }
  void SetEnabled(unsigned int isEnabled) { isEnabled_ = isEnabled; }
};

struct ihipGraph {
  std::vector<Node> vertices_;
  const ihipGraph* pOriginalGraph_ = nullptr;
  static std::unordered_set<ihipGraph*> graphSet_;
  static amd::Monitor graphSetLock_;
  std::unordered_set<hipUserObject*> graphUserObj_;
  unsigned int id_;
  static int nextID;
  hip::Device* device_;        //!< HIP device object
  hip::MemoryPool* mem_pool_;  //!< Memory pool, associated with this graph
  std::unordered_set<hipGraphNode*> capturedNodes_;
  bool graphInstantiated_;

 public:
  ihipGraph(hip::Device* device, const ihipGraph* original = nullptr)
      : pOriginalGraph_(original), id_(nextID++), device_(device) {
    amd::ScopedLock lock(graphSetLock_);
    graphSet_.insert(this);
    mem_pool_ = device->GetGraphMemoryPool();
    mem_pool_->retain();
    graphInstantiated_ = false;
  }

  ~ihipGraph() {
    for (auto node : vertices_) {
      delete node;
    }
    amd::ScopedLock lock(graphSetLock_);
    graphSet_.erase(this);
    for (auto userobj : graphUserObj_) {
      userobj->release();
    }
    if (mem_pool_ != nullptr) {
      mem_pool_->release();
    }
  }

  void AddManualNodeDuringCapture(hipGraphNode* node) { capturedNodes_.insert(node); }
  std::unordered_set<hipGraphNode*> GetManualNodesDuringCapture() { return capturedNodes_; }
  void RemoveManualNodesDuringCapture() {
    capturedNodes_.erase(capturedNodes_.begin(), capturedNodes_.end());
  }
  /// Return graph unique ID
  int GetID() const { return id_; }
  // check graphs validity
  static bool isGraphValid(ihipGraph* pGraph);
  /// add node to the graph
  void AddNode(const Node& node);
  void RemoveNode(const Node& node);
  /// Returns root nodes, all vertices with 0 in-degrees
  std::vector<Node> GetRootNodes() const;
  /// Returns leaf nodes, all vertices with 0 out-degrees
  std::vector<Node> GetLeafNodes() const;
  /// Returns number of leaf nodes
  size_t GetLeafNodeCount() const;
  /// Returns total numbers of nodes in the graph
  size_t GetNodeCount() const { return vertices_.size(); }
  /// returns all the nodes in the graph
  const std::vector<Node>& GetNodes() const { return vertices_; }
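  // Illustrative sketch (not part of the original source): the queries above back the
  // public introspection API, which follows the usual two-call size/fill pattern.
  //
  //   size_t numNodes = 0;
  //   hipGraphGetNodes(graph, nullptr, &numNodes);       // first call: query the count
  //   std::vector<hipGraphNode_t> nodes(numNodes);
  //   hipGraphGetNodes(graph, nodes.data(), &numNodes);  // second call: fetch the handles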
/// returns all the edges in the graph std::vector> GetEdges() const; // returns the original graph ptr if cloned const ihipGraph* getOriginalGraph() const { return pOriginalGraph_; } // Add user obj resource to graph void addUserObjGraph(hipUserObject* pUserObj) { amd::ScopedLock lock(graphSetLock_); graphUserObj_.insert(pUserObj); } // Check user obj resource from graph is valid bool isUserObjGraphValid(hipUserObject* pUserObj) { if (graphUserObj_.find(pUserObj) == graphUserObj_.end()) { return false; } return true; } // Delete user obj resource from graph void RemoveUserObjGraph(hipUserObject* pUserObj) { graphUserObj_.erase(pUserObj); } void GetRunListUtil(Node v, std::unordered_map& visited, std::vector& singleList, std::vector>& parallelLists, std::unordered_map>& dependencies); void GetRunList(std::vector>& parallelLists, std::unordered_map>& dependencies); bool TopologicalOrder(std::vector& TopoOrder); void GetUserObjs(std::unordered_set& graphExeUserObjs) { for (auto userObj : graphUserObj_) { userObj->retain(); graphExeUserObjs.insert(userObj); } } ihipGraph* clone(std::unordered_map& clonedNodes) const; ihipGraph* clone() const; void GenerateDOT(std::ostream& fout, hipGraphDebugDotFlags flag) { fout << "subgraph cluster_" << GetID() << " {" << std::endl; fout << "label=\"graph_" << GetID() <<"\"graph[style=\"dashed\"];\n"; for (auto node : vertices_) { node->GenerateDOTNode(GetID(), fout, flag); } fout << "\n"; for (auto& node : vertices_) { node->GenerateDOTNodeEdges(GetID(), fout, flag); } fout << "}" << std::endl; for (auto node : vertices_) { node->GenerateDOT(fout, flag); } } void* AllocateMemory(size_t size, hip::Stream* stream, void* dptr) const { auto ptr = mem_pool_->AllocateMemory(size, stream, dptr); return ptr; } void* ReserveAddress(size_t size) const { void* startAddress = nullptr; void* ptr; for (auto& dev : g_devices) { const auto& dev_info = dev->devices()[0]->info(); ptr = dev->devices()[0]->virtualAlloc(startAddress, size, dev_info.virtualMemAllocGranularity_); // if addr==0 then runtime will use the first VA on other devices if (startAddress == nullptr) { startAddress = ptr; } else if (ptr != startAddress) { // if runtime cannot reserve the same VA on other devices, just fail for (auto& d : g_devices) { if (d == dev) { d->devices()[0]->virtualFree(ptr); return nullptr; } d->devices()[0]->virtualFree(startAddress); } } } return ptr; } void FreeAddress(void* ptr) const { for (auto& dev : g_devices) { dev->devices()[0]->virtualFree(ptr); } } void FreeMemory(void* dev_ptr, hip::Stream* stream) const { size_t offset = 0; auto memory = getMemoryObject(dev_ptr, offset); if (memory != nullptr) { auto device_id = memory->getUserData().deviceId; if (!g_devices[device_id]->FreeMemory(memory, stream)) { LogError("Memory didn't belong to any pool!"); } } } bool ProbeMemory(void* dev_ptr) const { size_t offset = 0; auto memory = getMemoryObject(dev_ptr, offset); if (memory != nullptr) { return mem_pool_->IsBusyMemory(memory); } return false; } void FreeAllMemory(hip::Stream* stream) { mem_pool_->FreeAllMemory(stream); } bool IsGraphInstantiated() const { return graphInstantiated_; } void SetGraphInstantiated(bool graphInstantiate) { graphInstantiated_ = graphInstantiate; } }; struct hipGraphExec { std::vector> parallelLists_; // Topological order of the graph doesn't include nodes embedded as part of the child graph std::vector topoOrder_; std::unordered_map> nodeWaitLists_; std::vector parallel_streams_; uint currentQueueIndex_; std::unordered_map clonedNodes_; 
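  // Illustrative sketch (not part of the original source): the flags_ member declared
  // below is populated from the public instantiation call, e.g.
  //
  //   hipGraphExec_t graphExec;
  //   hipGraphInstantiateWithFlags(&graphExec, graph,
  //                                hipGraphInstantiateFlagAutoFreeOnLaunch);
  //
  // With hipGraphInstantiateFlagAutoFreeOnLaunch set, Run() releases any still-active
  // graph-owned allocations before the nodes are enqueued again.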
amd::Command* lastEnqueuedCommand_; static std::unordered_set graphExecSet_; std::unordered_set graphExeUserObj_; static amd::Monitor graphExecSetLock_; uint64_t flags_ = 0; bool repeatLaunch_ = false; public: hipGraphExec(std::vector& topoOrder, std::vector>& lists, std::unordered_map>& nodeWaitLists, std::unordered_map& clonedNodes, std::unordered_set& userObjs, uint64_t flags = 0) : parallelLists_(lists), topoOrder_(topoOrder), nodeWaitLists_(nodeWaitLists), clonedNodes_(clonedNodes), lastEnqueuedCommand_(nullptr), graphExeUserObj_(userObjs), currentQueueIndex_(0), flags_(flags) { amd::ScopedLock lock(graphExecSetLock_); graphExecSet_.insert(this); } ~hipGraphExec() { // new commands are launched for every launch they are destroyed as and when command is // terminated after it complete execution for (auto stream : parallel_streams_) { if (stream != nullptr) { hip::Stream::Destroy(stream); } } for (auto it = clonedNodes_.begin(); it != clonedNodes_.end(); it++) delete it->second; amd::ScopedLock lock(graphExecSetLock_); for (auto userobj : graphExeUserObj_) { userobj->release(); } graphExecSet_.erase(this); } Node GetClonedNode(Node node) { Node clonedNode; if (clonedNodes_.find(node) == clonedNodes_.end()) { return nullptr; } else { clonedNode = clonedNodes_[node]; } return clonedNode; } // check executable graphs validity static bool isGraphExecValid(hipGraphExec* pGraphExec); std::vector& GetNodes() { return topoOrder_; } hip::Stream* GetAvailableStreams() { return parallel_streams_[currentQueueIndex_++]; } void ResetQueueIndex() { currentQueueIndex_ = 0; } hipError_t Init(); hipError_t CreateStreams(uint32_t num_streams); hipError_t Run(hipStream_t stream); }; struct hipChildGraphNode : public hipGraphNode { struct ihipGraph* childGraph_; std::vector childGraphNodeOrder_; std::vector> parallelLists_; std::unordered_map> nodeWaitLists_; amd::Command* lastEnqueuedCommand_; public: hipChildGraphNode(ihipGraph* g) : hipGraphNode(hipGraphNodeTypeGraph, "solid", "rectangle") { childGraph_ = g->clone(); lastEnqueuedCommand_ = nullptr; } ~hipChildGraphNode() { delete childGraph_; } hipChildGraphNode(const hipChildGraphNode& rhs) : hipGraphNode(rhs) { childGraph_ = rhs.childGraph_->clone(); } hipGraphNode* clone() const { return new hipChildGraphNode(static_cast(*this)); } ihipGraph* GetChildGraph() { return childGraph_; } hipError_t GetNumParallelStreams(size_t &num) { if (false == TopologicalOrder(childGraphNodeOrder_)) { return hipErrorInvalidValue; } for (auto& node : childGraphNodeOrder_) { if (hipSuccess != node->GetNumParallelStreams(num)) { return hipErrorInvalidValue; } } // returns total number of parallel queues required for child graph nodes to be launched // first parallel list will be launched on the same queue as parent num += (parallelLists_.size() - 1); return hipSuccess; } void SetStream(hip::Stream* stream, hipGraphExec* ptr = nullptr) { stream_ = stream; UpdateStream(parallelLists_, stream, ptr); } // For nodes that are dependent on the child graph node waitlist is the last node of the first // parallel list std::vector& GetCommands() { return parallelLists_[0].back()->GetCommands(); } // Create child graph node commands and set waitlists hipError_t CreateCommand(hip::Stream* stream) { hipError_t status = hipGraphNode::CreateCommand(stream); if (status != hipSuccess) { return status; } commands_.reserve(2); std::vector rootCommands; amd::Command* endCommand = nullptr; status = FillCommands(parallelLists_, nodeWaitLists_, childGraphNodeOrder_, rootCommands, endCommand, 
stream); for (auto& cmd : rootCommands) { commands_.push_back(cmd); } if (endCommand != nullptr) { commands_.push_back(endCommand); } return status; } // void UpdateEventWaitLists(amd::Command::EventWaitList waitList) { parallelLists_[0].front()->UpdateEventWaitLists(waitList); } void GetRunList(std::vector>& parallelList, std::unordered_map>& dependencies) { childGraph_->GetRunList(parallelLists_, nodeWaitLists_); } bool TopologicalOrder(std::vector& TopoOrder) { return childGraph_->TopologicalOrder(TopoOrder); } void EnqueueCommands(hipStream_t stream) { // enqueue child graph start command if (commands_.size() == 1) { commands_[0]->enqueue(); commands_[0]->release(); } // enqueue nodes in child graph in level order for (auto& node : childGraphNodeOrder_) { node->EnqueueCommands(stream); } // enqueue child graph end command if (commands_.size() == 2) { commands_[1]->enqueue(); commands_[1]->release(); } } hipError_t SetParams(const ihipGraph* childGraph) { const std::vector& newNodes = childGraph->GetNodes(); const std::vector& oldNodes = childGraph_->GetNodes(); for (std::vector::size_type i = 0; i != newNodes.size(); i++) { hipError_t status = oldNodes[i]->SetParams(newNodes[i]); if (status != hipSuccess) { return status; } } return hipSuccess; } hipError_t SetParams(hipGraphNode* node) { const hipChildGraphNode* childGraphNode = static_cast(node); return SetParams(childGraphNode->childGraph_); } std::string GetLabel(hipGraphDebugDotFlags flag) { return std::to_string(GetID()) + "\n" + "graph_" + std::to_string(childGraph_->GetID()); } virtual void GenerateDOT(std::ostream& fout, hipGraphDebugDotFlags flag) { childGraph_->GenerateDOT(fout, flag); } }; class hipGraphKernelNode : public hipGraphNode { hipKernelNodeParams kernelParams_; unsigned int numParams_; hipKernelNodeAttrValue kernelAttr_; unsigned int kernelAttrInUse_; public: void PrintAttributes(std::ostream& out, hipGraphDebugDotFlags flag) { out << "["; out << "style"; out << "=\""; out << style_; (flag == hipGraphDebugDotFlagsKernelNodeParams || flag == hipGraphDebugDotFlagsKernelNodeAttributes) ? 
out << "\n" : out << "\""; out << "shape"; out << "=\""; out << GetShape(flag); out << "\""; out << "label"; out << "=\""; out << GetLabel(flag); out << "\""; out << "];"; } std::string GetLabel(hipGraphDebugDotFlags flag) { hipFunction_t func = getFunc(kernelParams_, ihipGetDevice()); hip::DeviceFunc* function = hip::DeviceFunc::asFunction(func); std::string label; char buffer[500]; if (flag == hipGraphDebugDotFlagsVerbose) { sprintf(buffer, "{\n%s\n| {ID | %d | %s\\<\\<\\<(%u,%u,%u),(%u,%u,%u),%u\\>\\>\\>}\n| {{node " "handle | func handle} | {%p | %p}}\n| {accessPolicyWindow | {base_ptr | num_bytes | " "hitRatio | hitProp | missProp} | {%p | %zu | %f | %d | %d}}\n| {cooperative | " "%u}\n| {priority | 0}\n}", label_.c_str(), GetID(), function->name().c_str(), kernelParams_.gridDim.x, kernelParams_.gridDim.y, kernelParams_.gridDim.z, kernelParams_.blockDim.x, kernelParams_.blockDim.y, kernelParams_.blockDim.z, kernelParams_.sharedMemBytes, this, kernelParams_.func, kernelAttr_.accessPolicyWindow.base_ptr, kernelAttr_.accessPolicyWindow.num_bytes, kernelAttr_.accessPolicyWindow.hitRatio, kernelAttr_.accessPolicyWindow.hitProp, kernelAttr_.accessPolicyWindow.missProp, kernelAttr_.cooperative); label = buffer; } else if (flag == hipGraphDebugDotFlagsKernelNodeAttributes) { sprintf(buffer, "{\n%s\n| {ID | %d | %s}\n" "| {accessPolicyWindow | {base_ptr | num_bytes | " "hitRatio | hitProp | missProp} |\n| {%p | %zu | %f | %d | %d}}\n| {cooperative | " "%u}\n| {priority | 0}\n}", label_.c_str(), GetID(), function->name().c_str(), kernelAttr_.accessPolicyWindow.base_ptr, kernelAttr_.accessPolicyWindow.num_bytes, kernelAttr_.accessPolicyWindow.hitRatio, kernelAttr_.accessPolicyWindow.hitProp, kernelAttr_.accessPolicyWindow.missProp, kernelAttr_.cooperative); label = buffer; } else if (flag == hipGraphDebugDotFlagsKernelNodeParams) { sprintf(buffer, "%d\n%s\n\\<\\<\\<(%u,%u,%u),(%u,%u,%u),%u\\>\\>\\>", GetID(), function->name().c_str(), kernelParams_.gridDim.x, kernelParams_.gridDim.y, kernelParams_.gridDim.z, kernelParams_.blockDim.x, kernelParams_.blockDim.y, kernelParams_.blockDim.z, kernelParams_.sharedMemBytes); label = buffer; } else { label = std::to_string(GetID()) + "\n" + function->name() + "\n"; } return label; } std::string GetShape(hipGraphDebugDotFlags flag) { if (flag == hipGraphDebugDotFlagsKernelNodeParams || flag == hipGraphDebugDotFlagsVerbose) { return "record"; } else { return shape_; } } static hipFunction_t getFunc(const hipKernelNodeParams& params, unsigned int device) { hipFunction_t func = nullptr; hipError_t status = PlatformState::instance().getStatFunc(&func, params.func, device); if (status == hipErrorInvalidSymbol) { // capturehipExtModuleLaunchKernel() mixes host function with hipFunction_t, so we convert // here. 
If it's wrong, later functions will fail func = static_cast(params.func); ClPrint(amd::LOG_INFO, amd::LOG_CODE, "[hipGraph] capturehipExtModuleLaunchKernel() should be called", status); } else if (status != hipSuccess) { ClPrint(amd::LOG_ERROR, amd::LOG_CODE, "[hipGraph] getStatFunc() failed with err %d", status); } return func; } hipError_t copyParams(const hipKernelNodeParams* pNodeParams) { hipFunction_t func = getFunc(*pNodeParams, ihipGetDevice()); if (!func) { return hipErrorInvalidDeviceFunction; } hip::DeviceFunc* function = hip::DeviceFunc::asFunction(func); amd::Kernel* kernel = function->kernel(); const amd::KernelSignature& signature = kernel->signature(); numParams_ = signature.numParameters(); // Allocate/assign memory if params are passed part of 'kernelParams' if (pNodeParams->kernelParams != nullptr) { kernelParams_.kernelParams = (void**)malloc(numParams_ * sizeof(void*)); if (kernelParams_.kernelParams == nullptr) { return hipErrorOutOfMemory; } for (uint32_t i = 0; i < numParams_; ++i) { const amd::KernelParameterDescriptor& desc = signature.at(i); kernelParams_.kernelParams[i] = malloc(desc.size_); if (kernelParams_.kernelParams[i] == nullptr) { return hipErrorOutOfMemory; } ::memcpy(kernelParams_.kernelParams[i], (pNodeParams->kernelParams[i]), desc.size_); } } // Allocate/assign memory if params are passed as part of 'extra' else if (pNodeParams->extra != nullptr) { // 'extra' is a struct that contains the following info: { // HIP_LAUNCH_PARAM_BUFFER_POINTER, kernargs, // HIP_LAUNCH_PARAM_BUFFER_SIZE, &kernargs_size, // HIP_LAUNCH_PARAM_END } unsigned int numExtra = 5; kernelParams_.extra = (void**)malloc(numExtra * sizeof(void*)); if (kernelParams_.extra == nullptr) { return hipErrorOutOfMemory; } kernelParams_.extra[0] = pNodeParams->extra[0]; size_t kernargs_size = *((size_t*)pNodeParams->extra[3]); kernelParams_.extra[1] = malloc(kernargs_size); if (kernelParams_.extra[1] == nullptr) { return hipErrorOutOfMemory; } kernelParams_.extra[2] = pNodeParams->extra[2]; kernelParams_.extra[3] = malloc(sizeof(void*)); if (kernelParams_.extra[3] == nullptr) { return hipErrorOutOfMemory; } *((size_t*)kernelParams_.extra[3]) = kernargs_size; ::memcpy(kernelParams_.extra[1], (pNodeParams->extra[1]), kernargs_size); kernelParams_.extra[4] = pNodeParams->extra[4]; } return hipSuccess; } hipGraphKernelNode(const hipKernelNodeParams* pNodeParams) : hipGraphNode(hipGraphNodeTypeKernel, "bold", "octagon", "KERNEL") { kernelParams_ = *pNodeParams; if (copyParams(pNodeParams) != hipSuccess) { ClPrint(amd::LOG_ERROR, amd::LOG_CODE, "[hipGraph] Failed to copy params"); } memset(&kernelAttr_, 0, sizeof(kernelAttr_)); kernelAttrInUse_ = 0; } ~hipGraphKernelNode() { freeParams(); } void freeParams() { // Deallocate memory allocated for kernargs passed via 'kernelParams' if (kernelParams_.kernelParams != nullptr) { for (size_t i = 0; i < numParams_; ++i) { if (kernelParams_.kernelParams[i] != nullptr) { free(kernelParams_.kernelParams[i]); } kernelParams_.kernelParams[i] = nullptr; } free(kernelParams_.kernelParams); kernelParams_.kernelParams = nullptr; } // Deallocate memory allocated for kernargs passed via 'extra' else { free(kernelParams_.extra[1]); free(kernelParams_.extra[3]); memset(kernelParams_.extra, 0, 5 * sizeof(kernelParams_.extra[0])); // 5 items free(kernelParams_.extra); kernelParams_.extra = nullptr; } } hipGraphKernelNode(const hipGraphKernelNode& rhs) : hipGraphNode(rhs) { kernelParams_ = rhs.kernelParams_; hipError_t status = copyParams(&rhs.kernelParams_); if (status 
!= hipSuccess) { ClPrint(amd::LOG_ERROR, amd::LOG_CODE, "[hipGraph] Failed to allocate memory to copy params"); } memset(&kernelAttr_, 0, sizeof(kernelAttr_)); kernelAttrInUse_ = 0; status = CopyAttr(&rhs); if (status != hipSuccess) { ClPrint(amd::LOG_ERROR, amd::LOG_CODE, "[hipGraph] Failed to during copy attrs"); } } hipGraphNode* clone() const { return new hipGraphKernelNode(static_cast(*this)); } hipError_t CreateCommand(hip::Stream* stream) { hipFunction_t func = nullptr; hipError_t status = validateKernelParams(&kernelParams_, &func, stream ? hip::getDeviceID(stream->context()) : -1); if (hipSuccess != status) { return status; } status = hipGraphNode::CreateCommand(stream); if (status != hipSuccess) { return status; } commands_.reserve(1); amd::Command* command; status = ihipLaunchKernelCommand( command, func, kernelParams_.gridDim.x * kernelParams_.blockDim.x, kernelParams_.gridDim.y * kernelParams_.blockDim.y, kernelParams_.gridDim.z * kernelParams_.blockDim.z, kernelParams_.blockDim.x, kernelParams_.blockDim.y, kernelParams_.blockDim.z, kernelParams_.sharedMemBytes, stream, kernelParams_.kernelParams, kernelParams_.extra, nullptr, nullptr, 0, 0, 0, 0, 0, 0, 0); commands_.emplace_back(command); return status; } void GetParams(hipKernelNodeParams* params) { *params = kernelParams_; } hipError_t SetParams(const hipKernelNodeParams* params) { // updates kernel params hipError_t status = validateKernelParams(params); if (hipSuccess != status) { ClPrint(amd::LOG_ERROR, amd::LOG_CODE, "[hipGraph] Failed to validateKernelParams"); return status; } if ((kernelParams_.kernelParams && kernelParams_.kernelParams == params->kernelParams) || (kernelParams_.extra && kernelParams_.extra == params->extra)) { // params is copied from kernelParams_ and then updated, so just copy it back kernelParams_ = *params; return status; } freeParams(); kernelParams_ = *params; status = copyParams(params); if (status != hipSuccess) { ClPrint(amd::LOG_ERROR, amd::LOG_CODE, "[hipGraph] Failed to set params"); } return status; } hipError_t SetAttrParams(hipKernelNodeAttrID attr, const hipKernelNodeAttrValue* params) { constexpr int accessPolicyMaxWindowSize = 1024; // updates kernel attr params if (attr == hipKernelNodeAttributeAccessPolicyWindow) { if (params->accessPolicyWindow.hitRatio > 1 || params->accessPolicyWindow.hitRatio < 0) { return hipErrorInvalidValue; } if (params->accessPolicyWindow.missProp == hipAccessPropertyPersisting) { return hipErrorInvalidValue; } if (params->accessPolicyWindow.num_bytes > 0 && params->accessPolicyWindow.hitRatio == 0) { return hipErrorInvalidValue; } // need to check against accessPolicyMaxWindowSize from device // accessPolicyMaxWindowSize not implemented on the device side yet if (params->accessPolicyWindow.num_bytes >= accessPolicyMaxWindowSize) { return hipErrorInvalidValue; } kernelAttr_.accessPolicyWindow.base_ptr = params->accessPolicyWindow.base_ptr; kernelAttr_.accessPolicyWindow.hitProp = params->accessPolicyWindow.hitProp; kernelAttr_.accessPolicyWindow.hitRatio = params->accessPolicyWindow.hitRatio; kernelAttr_.accessPolicyWindow.missProp = params->accessPolicyWindow.missProp; kernelAttr_.accessPolicyWindow.num_bytes = params->accessPolicyWindow.num_bytes; } else if (attr == hipKernelNodeAttributeCooperative) { kernelAttr_.cooperative = params->cooperative; } kernelAttrInUse_ = attr; return hipSuccess; } hipError_t GetAttrParams(hipKernelNodeAttrID attr, hipKernelNodeAttrValue* params) { // Get kernel attr params if (kernelAttrInUse_ != 0 && kernelAttrInUse_ != 
attr) return hipErrorInvalidValue; if (attr == hipKernelNodeAttributeAccessPolicyWindow) { params->accessPolicyWindow.base_ptr = kernelAttr_.accessPolicyWindow.base_ptr; params->accessPolicyWindow.hitProp = kernelAttr_.accessPolicyWindow.hitProp; params->accessPolicyWindow.hitRatio = kernelAttr_.accessPolicyWindow.hitRatio; params->accessPolicyWindow.missProp = kernelAttr_.accessPolicyWindow.missProp; params->accessPolicyWindow.num_bytes = kernelAttr_.accessPolicyWindow.num_bytes; } else if (attr == hipKernelNodeAttributeCooperative) { params->cooperative = kernelAttr_.cooperative; } return hipSuccess; } hipError_t CopyAttr(const hipGraphKernelNode* srcNode) { if (kernelAttrInUse_ == 0 && srcNode->kernelAttrInUse_ == 0) { return hipSuccess; } if (kernelAttrInUse_ != 0 && srcNode->kernelAttrInUse_ != kernelAttrInUse_) { return hipErrorInvalidContext; } kernelAttrInUse_ = srcNode->kernelAttrInUse_; switch (srcNode->kernelAttrInUse_) { case hipKernelNodeAttributeAccessPolicyWindow: kernelAttr_.accessPolicyWindow.base_ptr = srcNode->kernelAttr_.accessPolicyWindow.base_ptr; kernelAttr_.accessPolicyWindow.hitProp = srcNode->kernelAttr_.accessPolicyWindow.hitProp; kernelAttr_.accessPolicyWindow.hitRatio = srcNode->kernelAttr_.accessPolicyWindow.hitRatio; kernelAttr_.accessPolicyWindow.missProp = srcNode->kernelAttr_.accessPolicyWindow.missProp; kernelAttr_.accessPolicyWindow.num_bytes = srcNode->kernelAttr_.accessPolicyWindow.num_bytes; break; case hipKernelNodeAttributeCooperative: kernelAttr_.cooperative = srcNode->kernelAttr_.cooperative; break; default: return hipErrorInvalidValue; } return hipSuccess; } hipError_t SetParams(hipGraphNode* node) { const hipGraphKernelNode* kernelNode = static_cast(node); return SetParams(&kernelNode->kernelParams_); } static hipError_t validateKernelParams(const hipKernelNodeParams* pNodeParams, hipFunction_t* ptrFunc = nullptr, int devId = -1) { devId = devId == -1 ? 
ihipGetDevice() : devId; hipFunction_t func = getFunc(*pNodeParams, devId); if (!func) { return hipErrorInvalidDeviceFunction; } size_t globalWorkSizeX = static_cast(pNodeParams->gridDim.x) * pNodeParams->blockDim.x; size_t globalWorkSizeY = static_cast(pNodeParams->gridDim.y) * pNodeParams->blockDim.y; size_t globalWorkSizeZ = static_cast(pNodeParams->gridDim.z) * pNodeParams->blockDim.z; hipError_t status = ihipLaunchKernel_validate( func, static_cast(globalWorkSizeX), static_cast(globalWorkSizeY), static_cast(globalWorkSizeZ), pNodeParams->blockDim.x, pNodeParams->blockDim.y, pNodeParams->blockDim.z, pNodeParams->sharedMemBytes, pNodeParams->kernelParams, pNodeParams->extra, devId, 0); if (status != hipSuccess) { return status; } if (ptrFunc) *ptrFunc = func; return hipSuccess; } }; class hipGraphMemcpyNode : public hipGraphNode { hipMemcpy3DParms copyParams_; public: hipGraphMemcpyNode(const hipMemcpy3DParms* pCopyParams) : hipGraphNode(hipGraphNodeTypeMemcpy, "solid", "trapezium", "MEMCPY") { copyParams_ = *pCopyParams; } ~hipGraphMemcpyNode() {} hipGraphMemcpyNode(const hipGraphMemcpyNode& rhs) : hipGraphNode(rhs) { copyParams_ = rhs.copyParams_; } hipGraphNode* clone() const { return new hipGraphMemcpyNode(static_cast(*this)); } hipError_t CreateCommand(hip::Stream* stream) { if (IsHtoHMemcpy(copyParams_.dstPtr.ptr, copyParams_.srcPtr.ptr, copyParams_.kind)) { return hipSuccess; } hipError_t status = hipGraphNode::CreateCommand(stream); if (status != hipSuccess) { return status; } commands_.reserve(1); amd::Command* command; status = ihipMemcpy3DCommand(command, ©Params_, stream); commands_.emplace_back(command); return status; } void EnqueueCommands(hipStream_t stream) override { if (isEnabled_ && IsHtoHMemcpy(copyParams_.dstPtr.ptr, copyParams_.srcPtr.ptr, copyParams_.kind)) { ihipHtoHMemcpy(copyParams_.dstPtr.ptr, copyParams_.srcPtr.ptr, copyParams_.extent.width * copyParams_.extent.height * copyParams_.extent.depth, *hip::getStream(stream)); return; } hipGraphNode::EnqueueCommands(stream); } void GetParams(hipMemcpy3DParms* params) { std::memcpy(params, ©Params_, sizeof(hipMemcpy3DParms)); } hipError_t SetParams(const hipMemcpy3DParms* params) { hipError_t status = ValidateParams(params); if (status != hipSuccess) { return status; } std::memcpy(©Params_, params, sizeof(hipMemcpy3DParms)); return hipSuccess; } hipError_t SetParams(hipGraphNode* node) { const hipGraphMemcpyNode* memcpyNode = static_cast(node); return SetParams(&memcpyNode->copyParams_); } // ToDo: use this when commands are cloned and command params are to be updated hipError_t ValidateParams(const hipMemcpy3DParms* pNodeParams); std::string GetLabel(hipGraphDebugDotFlags flag) { size_t offset = 0; const HIP_MEMCPY3D pCopy = hip::getDrvMemcpy3DDesc(copyParams_); hipMemoryType srcMemoryType = pCopy.srcMemoryType; if (srcMemoryType == hipMemoryTypeUnified) { srcMemoryType = getMemoryObject(pCopy.srcDevice, offset) ? hipMemoryTypeDevice : hipMemoryTypeHost; } offset = 0; hipMemoryType dstMemoryType = pCopy.dstMemoryType; if (dstMemoryType == hipMemoryTypeUnified) { dstMemoryType = getMemoryObject(pCopy.dstDevice, offset) ? hipMemoryTypeDevice : hipMemoryTypeHost; } // If {src/dst}MemoryType is hipMemoryTypeHost, check if the memory was prepinned. // In that case upgrade the copy type to hipMemoryTypeDevice to avoid extra pinning. offset = 0; if (srcMemoryType == hipMemoryTypeHost) { amd::Memory* mem = getMemoryObject(pCopy.srcHost, offset); srcMemoryType = mem ? 
hipMemoryTypeDevice : hipMemoryTypeHost; } if (dstMemoryType == hipMemoryTypeHost) { amd::Memory* mem = getMemoryObject(pCopy.dstHost, offset); dstMemoryType = mem ? hipMemoryTypeDevice : hipMemoryTypeHost; } std::string memcpyDirection; if ((srcMemoryType == hipMemoryTypeHost) && (dstMemoryType == hipMemoryTypeDevice)) { // Host to Device. memcpyDirection = "HtoD"; } else if ((srcMemoryType == hipMemoryTypeDevice) && (dstMemoryType == hipMemoryTypeHost)) { // Device to Host. memcpyDirection = "DtoH"; } else if ((srcMemoryType == hipMemoryTypeDevice) && (dstMemoryType == hipMemoryTypeDevice)) { // Device to Device. memcpyDirection = "DtoD"; } else if ((srcMemoryType == hipMemoryTypeHost) && (dstMemoryType == hipMemoryTypeArray)) { memcpyDirection = "HtoA"; } else if ((srcMemoryType == hipMemoryTypeArray) && (dstMemoryType == hipMemoryTypeHost)) { // Image to Host. memcpyDirection = "AtoH"; } else if ((srcMemoryType == hipMemoryTypeDevice) && (dstMemoryType == hipMemoryTypeArray)) { // Device to Image. memcpyDirection = "DtoA"; } else if ((srcMemoryType == hipMemoryTypeArray) && (dstMemoryType == hipMemoryTypeDevice)) { // Image to Device. memcpyDirection = "AtoD"; } else if ((srcMemoryType == hipMemoryTypeArray) && (dstMemoryType == hipMemoryTypeArray)) { memcpyDirection = "AtoA"; } std::string label; if (flag == hipGraphDebugDotFlagsMemcpyNodeParams || flag == hipGraphDebugDotFlagsVerbose) { char buffer[500]; sprintf( buffer, "{\n%s\n| {{ID | node handle} | {%u | %p}}\n| {kind | %s}\n| {{srcPtr | dstPtr} | " "{pitch " "| ptr | xsize | ysize | pitch | ptr | xsize | size} | {%zu | %p | %zu | %zu | %zu | %p " "| %zu " "| %zu}}\n| {{srcPos | {{x | %zu} | {y | %zu} | {z | %zu}}} | {dstPos | {{x | %zu} | {y " "| " "%zu} | {z | %zu}}} | {Extent | {{Width | %zu} | {Height | %zu} | {Depth | %zu}}}}\n}", label_.c_str(), GetID(), this, memcpyDirection.c_str(), copyParams_.srcPtr.pitch, copyParams_.srcPtr.ptr, copyParams_.srcPtr.xsize, copyParams_.srcPtr.ysize, copyParams_.dstPtr.pitch, copyParams_.dstPtr.ptr, copyParams_.dstPtr.xsize, copyParams_.dstPtr.ysize, copyParams_.srcPos.x, copyParams_.srcPos.y, copyParams_.srcPos.z, copyParams_.dstPos.x, copyParams_.dstPos.y, copyParams_.dstPos.z, copyParams_.extent.width, copyParams_.extent.height, copyParams_.extent.depth); label = buffer; } else { label = std::to_string(GetID()) + "\nMEMCPY\n(" + memcpyDirection + ")"; } return label; } std::string GetShape(hipGraphDebugDotFlags flag) { if (flag == hipGraphDebugDotFlagsMemcpyNodeParams || flag == hipGraphDebugDotFlagsVerbose) { return "record"; } else { return shape_; } } }; class hipGraphMemcpyNode1D : public hipGraphNode { protected: void* dst_; const void* src_; size_t count_; hipMemcpyKind kind_; public: hipGraphMemcpyNode1D(void* dst, const void* src, size_t count, hipMemcpyKind kind, hipGraphNodeType type = hipGraphNodeTypeMemcpy) : hipGraphNode(type, "solid", "trapezium", "MEMCPY"), dst_(dst), src_(src), count_(count), kind_(kind) {} ~hipGraphMemcpyNode1D() {} hipGraphNode* clone() const { return new hipGraphMemcpyNode1D(static_cast(*this)); } virtual hipError_t CreateCommand(hip::Stream* stream) { if (IsHtoHMemcpy(dst_, src_, kind_)) { return hipSuccess; } hipError_t status = hipGraphNode::CreateCommand(stream); if (status != hipSuccess) { return status; } commands_.reserve(1); amd::Command* command = nullptr; status = ihipMemcpyCommand(command, dst_, src_, count_, kind_, *stream); commands_.emplace_back(command); return status; } void EnqueueCommands(hipStream_t stream) { bool isH2H = 
IsHtoHMemcpy(dst_, src_, kind_); if (!isH2H) { if (commands_.empty()) return; // commands_ should have just 1 item assert(commands_.size() == 1 && "Invalid command size in hipGraphMemcpyNode1D"); } if (isEnabled_) { //HtoH if (isH2H) { ihipHtoHMemcpy(dst_, src_, count_, *hip::getStream(stream)); return; } amd::Command* command = commands_[0]; amd::HostQueue* cmdQueue = command->queue(); hip::Stream* hip_stream = hip::getStream(stream); if (cmdQueue == hip_stream) { command->enqueue(); command->release(); return; } amd::Command::EventWaitList waitList; amd::Command* depdentMarker = nullptr; amd::Command* cmd = hip_stream->getLastQueuedCommand(true); if (cmd != nullptr) { waitList.push_back(cmd); amd::Command* depdentMarker = new amd::Marker(*cmdQueue, true, waitList); if (depdentMarker != nullptr) { depdentMarker->enqueue(); // Make sure command synced with last command of queue depdentMarker->release(); } cmd->release(); } command->enqueue(); command->release(); cmd = cmdQueue->getLastQueuedCommand(true); // should be command if (cmd != nullptr) { waitList.clear(); waitList.push_back(cmd); amd::Command* depdentMarker = new amd::Marker(*hip_stream, true, waitList); if (depdentMarker != nullptr) { depdentMarker->enqueue(); // Make sure future commands of queue synced with command depdentMarker->release(); } cmd->release(); } } else { amd::Command::EventWaitList waitList; hip::Stream* hip_stream = hip::getStream(stream); amd::Command* command = new amd::Marker(*hip_stream, !kMarkerDisableFlush, waitList); command->enqueue(); command->release(); } } hipError_t SetParams(void* dst, const void* src, size_t count, hipMemcpyKind kind) { hipError_t status = ValidateParams(dst, src, count, kind); if (status != hipSuccess) { return status; } dst_ = dst; src_ = src; count_ = count; kind_ = kind; return hipSuccess; } hipError_t SetParams(hipGraphNode* node) { const hipGraphMemcpyNode1D* memcpy1DNode = static_cast(node); return SetParams(memcpy1DNode->dst_, memcpy1DNode->src_, memcpy1DNode->count_, memcpy1DNode->kind_); } static hipError_t ValidateParams(void* dst, const void* src, size_t count, hipMemcpyKind kind); std::string GetLabel(hipGraphDebugDotFlags flag) { size_t sOffsetOrig = 0; amd::Memory* origSrcMemory = getMemoryObject(src_, sOffsetOrig); size_t dOffsetOrig = 0; amd::Memory* origDstMemory = getMemoryObject(dst_, dOffsetOrig); size_t sOffset = 0; amd::Memory* srcMemory = getMemoryObject(src_, sOffset); size_t dOffset = 0; amd::Memory* dstMemory = getMemoryObject(dst_, dOffset); std::string memcpyDirection; if ((srcMemory == nullptr) && (dstMemory != nullptr)) { // host to device memcpyDirection = "HtoD"; } else if ((srcMemory != nullptr) && (dstMemory == nullptr)) { // device to host memcpyDirection = "DtoH"; } else if ((srcMemory != nullptr) && (dstMemory != nullptr)) { memcpyDirection = "DtoD"; } else { if (kind_ == hipMemcpyHostToDevice) { memcpyDirection = "HtoD"; } else if (kind_ == hipMemcpyDeviceToHost) { memcpyDirection = "DtoH"; } } std::string label; if (flag == hipGraphDebugDotFlagsMemcpyNodeParams || flag == hipGraphDebugDotFlagsVerbose) { char buffer[500]; sprintf(buffer, "{\n%s\n| {{ID | node handle} | {%u | %p}}\n| {kind | %s}\n| {{srcPtr | dstPtr} | " "{pitch " "| ptr | xsize | ysize | pitch | ptr | xsize | size} | {%zu | %p | %zu | %zu | %zu | %p " "| %zu " "| %zu}}\n| {{srcPos | {{x | %zu} | {y | %zu} | {z | %zu}}} | {dstPos | {{x | %zu} | {y " "| " "%zu} | {z | %zu}}} | {Extent | {{Width | %zu} | {Height | %zu} | {Depth | %zu}}}}\n}", label_.c_str(), GetID(), this, 
memcpyDirection.c_str(), (size_t)0, src_, (size_t)0, (size_t)0, (size_t)0, dst_, (size_t)0, (size_t)0, (size_t)0, (size_t)0, (size_t)0, (size_t)0, (size_t)0, (size_t)0, count_, (size_t)1, (size_t)1); label = buffer; } else { label = std::to_string(GetID()) + "\n" + label_ + "\n(" + memcpyDirection + "," + std::to_string(count_) + ")"; } return label; } std::string GetShape(hipGraphDebugDotFlags flag) { if (flag == hipGraphDebugDotFlagsMemcpyNodeParams || flag == hipGraphDebugDotFlagsVerbose) { return "record"; } else { return shape_; } } }; class hipGraphMemcpyNodeFromSymbol : public hipGraphMemcpyNode1D { const void* symbol_; size_t offset_; public: hipGraphMemcpyNodeFromSymbol(void* dst, const void* symbol, size_t count, size_t offset, hipMemcpyKind kind) : hipGraphMemcpyNode1D(dst, nullptr, count, kind, hipGraphNodeTypeMemcpy), symbol_(symbol), offset_(offset) {} ~hipGraphMemcpyNodeFromSymbol() {} hipGraphNode* clone() const { return new hipGraphMemcpyNodeFromSymbol( static_cast(*this)); } hipError_t CreateCommand(hip::Stream* stream) { hipError_t status = hipGraphNode::CreateCommand(stream); if (status != hipSuccess) { return status; } commands_.reserve(1); amd::Command* command = nullptr; size_t sym_size = 0; hipDeviceptr_t device_ptr = nullptr; status = ihipMemcpySymbol_validate(symbol_, count_, offset_, sym_size, device_ptr); if (status != hipSuccess) { return status; } status = ihipMemcpyCommand(command, dst_, device_ptr, count_, kind_, *stream); if (status != hipSuccess) { return status; } commands_.emplace_back(command); return status; } hipError_t SetParams(void* dst, const void* symbol, size_t count, size_t offset, hipMemcpyKind kind, bool isExec = false) { if (isExec) { size_t discardOffset = 0; amd::Memory *memObj = getMemoryObject(dst, discardOffset); if (memObj != nullptr) { amd::Memory *memObjOri = getMemoryObject(dst_, discardOffset); if (memObjOri != nullptr) { if (memObjOri->getUserData().deviceId != memObj->getUserData().deviceId) { return hipErrorInvalidValue; } } } } size_t sym_size = 0; hipDeviceptr_t device_ptr = nullptr; // check to see if dst is also a symbol (hip negative test case) hipError_t status = ihipMemcpySymbol_validate(dst, count, offset, sym_size, device_ptr); if (status == hipSuccess) { return hipErrorInvalidValue; } status = ihipMemcpySymbol_validate(symbol, count, offset, sym_size, device_ptr); if (status != hipSuccess) { return status; } size_t dOffset = 0; amd::Memory* dstMemory = getMemoryObject(dst, dOffset); if (dstMemory == nullptr && kind != hipMemcpyDeviceToHost && kind != hipMemcpyDefault) { return hipErrorInvalidMemcpyDirection; } else if (dstMemory != nullptr && dstMemory->getMemFlags() == 0 && kind != hipMemcpyDeviceToDevice && kind != hipMemcpyDefault) { return hipErrorInvalidMemcpyDirection; } else if (kind == hipMemcpyHostToHost || kind == hipMemcpyHostToDevice) { return hipErrorInvalidMemcpyDirection; } dst_ = dst; symbol_ = symbol; count_ = count; offset_ = offset; kind_ = kind; return hipSuccess; } hipError_t SetParams(hipGraphNode* node) { const hipGraphMemcpyNodeFromSymbol* memcpyNode = static_cast(node); return SetParams(memcpyNode->dst_, memcpyNode->symbol_, memcpyNode->count_, memcpyNode->offset_, memcpyNode->kind_); } }; class hipGraphMemcpyNodeToSymbol : public hipGraphMemcpyNode1D { const void* symbol_; size_t offset_; public: hipGraphMemcpyNodeToSymbol(const void* symbol, const void* src, size_t count, size_t offset, hipMemcpyKind kind) : hipGraphMemcpyNode1D(nullptr, src, count, kind, hipGraphNodeTypeMemcpy), 
symbol_(symbol), offset_(offset) {} ~hipGraphMemcpyNodeToSymbol() {} hipGraphNode* clone() const { return new hipGraphMemcpyNodeToSymbol(static_cast(*this)); } hipError_t CreateCommand(hip::Stream* stream) { hipError_t status = hipGraphNode::CreateCommand(stream); if (status != hipSuccess) { return status; } commands_.reserve(1); amd::Command* command = nullptr; size_t sym_size = 0; hipDeviceptr_t device_ptr = nullptr; status = ihipMemcpySymbol_validate(symbol_, count_, offset_, sym_size, device_ptr); if (status != hipSuccess) { return status; } status = ihipMemcpyCommand(command, device_ptr, src_, count_, kind_, *stream); if (status != hipSuccess) { return status; } commands_.emplace_back(command); return status; } hipError_t SetParams(const void* symbol, const void* src, size_t count, size_t offset, hipMemcpyKind kind, bool isExec = false) { if (isExec) { size_t discardOffset = 0; amd::Memory *memObj = getMemoryObject(src, discardOffset); if (memObj != nullptr) { amd::Memory *memObjOri = getMemoryObject(src_, discardOffset); if (memObjOri != nullptr) { if (memObjOri->getUserData().deviceId != memObj->getUserData().deviceId) { return hipErrorInvalidValue; } } } } size_t sym_size = 0; hipDeviceptr_t device_ptr = nullptr; // check to see if src is also a symbol (hip negative test case) hipError_t status = ihipMemcpySymbol_validate(src, count, offset, sym_size, device_ptr); if (status == hipSuccess) { return hipErrorInvalidValue; } status = ihipMemcpySymbol_validate(symbol, count, offset, sym_size, device_ptr); if (status != hipSuccess) { return status; } size_t dOffset = 0; amd::Memory* srcMemory = getMemoryObject(src, dOffset); if (srcMemory == nullptr && kind != hipMemcpyHostToDevice && kind != hipMemcpyDefault) { return hipErrorInvalidValue; } else if (srcMemory != nullptr && srcMemory->getMemFlags() == 0 && kind != hipMemcpyDeviceToDevice && kind != hipMemcpyDefault) { return hipErrorInvalidValue; } else if (kind == hipMemcpyHostToHost || kind == hipMemcpyDeviceToHost) { return hipErrorInvalidValue; } symbol_ = symbol; src_ = src; count_ = count; offset_ = offset; kind_ = kind; return hipSuccess; } hipError_t SetParams(hipGraphNode* node) { const hipGraphMemcpyNodeToSymbol* memcpyNode = static_cast(node); return SetParams(memcpyNode->src_, memcpyNode->symbol_, memcpyNode->count_, memcpyNode->offset_, memcpyNode->kind_); } }; class hipGraphMemsetNode : public hipGraphNode { hipMemsetParams memsetParams_; public: hipGraphMemsetNode(const hipMemsetParams* pMemsetParams) : hipGraphNode(hipGraphNodeTypeMemset, "solid", "invtrapezium", "MEMSET") { memsetParams_ = *pMemsetParams; size_t sizeBytes = 0; if (memsetParams_.height == 1) { sizeBytes = memsetParams_.width * memsetParams_.elementSize; } else { sizeBytes = memsetParams_.width * memsetParams_.height * memsetParams_.elementSize; } } ~hipGraphMemsetNode() { } // Copy constructor hipGraphMemsetNode(const hipGraphMemsetNode& memsetNode) : hipGraphNode(memsetNode) { memsetParams_ = memsetNode.memsetParams_; } hipGraphNode* clone() const { return new hipGraphMemsetNode(static_cast(*this)); } std::string GetLabel(hipGraphDebugDotFlags flag) { std::string label; if (flag == hipGraphDebugDotFlagsMemsetNodeParams || flag == hipGraphDebugDotFlagsVerbose) { char buffer[500]; sprintf(buffer, "{\n%s\n| {{ID | node handle | dptr | pitch | value | elementSize | width | " "height} | {%u | %p | %p | %zu | %u | %u | %zu | %zu}}}", label_.c_str(), GetID(), this, memsetParams_.dst, memsetParams_.pitch, memsetParams_.value, memsetParams_.elementSize, 
memsetParams_.width, memsetParams_.height); label = buffer; } else { size_t sizeBytes; if (memsetParams_.height == 1) { sizeBytes = memsetParams_.width * memsetParams_.elementSize; } else { sizeBytes = memsetParams_.width * memsetParams_.height * memsetParams_.elementSize; } label = std::to_string(GetID()) + "\n" + label_ + "\n(" + std::to_string(memsetParams_.value) + "," + std::to_string(sizeBytes) + ")"; } return label; } std::string GetShape(hipGraphDebugDotFlags flag) { if (flag == hipGraphDebugDotFlagsMemsetNodeParams || flag == hipGraphDebugDotFlagsVerbose) { return "record"; } else { return shape_; } } hipError_t CreateCommand(hip::Stream* stream) { hipError_t status = hipGraphNode::CreateCommand(stream); if (status != hipSuccess) { return status; } if (memsetParams_.height == 1) { size_t sizeBytes = memsetParams_.width * memsetParams_.elementSize; hipError_t status = ihipMemsetCommand(commands_, memsetParams_.dst, memsetParams_.value, memsetParams_.elementSize, sizeBytes, stream); } else { hipError_t status = ihipMemset3DCommand( commands_, {memsetParams_.dst, memsetParams_.pitch, memsetParams_.width * memsetParams_.elementSize, memsetParams_.height}, memsetParams_.value, {memsetParams_.width * memsetParams_.elementSize, memsetParams_.height, 1}, stream, memsetParams_.elementSize); } return status; } void GetParams(hipMemsetParams* params) { std::memcpy(params, &memsetParams_, sizeof(hipMemsetParams)); } hipError_t SetParams(const hipMemsetParams* params, bool isExec = false) { hipError_t hip_error = hipSuccess; hip_error = ihipGraphMemsetParams_validate(params); if (hip_error != hipSuccess) { return hip_error; } if (isExec) { size_t discardOffset = 0; amd::Memory *memObj = getMemoryObject(params->dst, discardOffset); if (memObj != nullptr) { amd::Memory *memObjOri = getMemoryObject(memsetParams_.dst, discardOffset); if (memObjOri != nullptr) { if (memObjOri->getUserData().deviceId != memObj->getUserData().deviceId) { return hipErrorInvalidValue; } } } } size_t sizeBytes; if (params->height == 1) { // 1D - for hipGraphMemsetNodeSetParams & hipGraphExecMemsetNodeSetParams, They return // invalid value if new width is more than actual allocation. size_t discardOffset = 0; amd::Memory *memObj = getMemoryObject(params->dst, discardOffset); if (memObj != nullptr) { if (params->width * params->elementSize > memObj->getSize()) { return hipErrorInvalidValue; } } sizeBytes = params->width * params->elementSize; hip_error = ihipMemset_validate(params->dst, params->value, params->elementSize, sizeBytes); } else { if (isExec) { // 2D - hipGraphExecMemsetNodeSetParams returns invalid value if new width or new height is // not same as what memset node is added with. if (memsetParams_.width * memsetParams_.elementSize != params->width * params->elementSize || memsetParams_.height != params->height) { return hipErrorInvalidValue; } } else { // 2D - hipGraphMemsetNodeSetParams returns invalid value if new width or new height is // greter than actual allocation. 
size_t discardOffset = 0; amd::Memory *memObj = getMemoryObject(params->dst, discardOffset); if (memObj != nullptr) { if (params->width * params->elementSize > memObj->getUserData().width_ || params->height > memObj->getUserData().height_) { return hipErrorInvalidValue; } } } sizeBytes = params->width * params->elementSize * params->height * 1; hip_error = ihipMemset3D_validate({params->dst, params->pitch, params->width * params->elementSize, params->height}, params->value, {params->width * params->elementSize, params->height, 1}, sizeBytes); } if (hip_error != hipSuccess) { return hip_error; } std::memcpy(&memsetParams_, params, sizeof(hipMemsetParams)); return hipSuccess; } hipError_t SetParams(hipGraphNode* node) { const hipGraphMemsetNode* memsetNode = static_cast(node); return SetParams(&memsetNode->memsetParams_); } }; class hipGraphEventRecordNode : public hipGraphNode { hipEvent_t event_; public: hipGraphEventRecordNode(hipEvent_t event) : hipGraphNode(hipGraphNodeTypeEventRecord, "solid", "rectangle", "EVENT_RECORD"), event_(event) {} ~hipGraphEventRecordNode() {} hipGraphNode* clone() const { return new hipGraphEventRecordNode(static_cast(*this)); } hipError_t CreateCommand(hip::Stream* stream) { hipError_t status = hipGraphNode::CreateCommand(stream); if (status != hipSuccess) { return status; } hip::Event* e = reinterpret_cast(event_); commands_.reserve(1); amd::Command* command = nullptr; status = e->recordCommand(command, stream); commands_.emplace_back(command); return status; } void EnqueueCommands(hipStream_t stream) { if (!commands_.empty()) { hip::Event* e = reinterpret_cast(event_); // command release during enqueueRecordCommand hipError_t status = e->enqueueRecordCommand(stream, commands_[0], true); if (status != hipSuccess) { ClPrint(amd::LOG_ERROR, amd::LOG_CODE, "[hipGraph] enqueue event record command failed for node %p - status %d\n", this, status); } } } void GetParams(hipEvent_t* event) const { *event = event_; } hipError_t SetParams(hipEvent_t event) { event_ = event; return hipSuccess; } hipError_t SetParams(hipGraphNode* node) { const hipGraphEventRecordNode* eventRecordNode = static_cast(node); return SetParams(eventRecordNode->event_); } }; class hipGraphEventWaitNode : public hipGraphNode { hipEvent_t event_; public: hipGraphEventWaitNode(hipEvent_t event) : hipGraphNode(hipGraphNodeTypeWaitEvent, "solid", "rectangle", "EVENT_WAIT"), event_(event) {} ~hipGraphEventWaitNode() {} hipGraphNode* clone() const { return new hipGraphEventWaitNode(static_cast(*this)); } hipError_t CreateCommand(hip::Stream* stream) { hipError_t status = hipGraphNode::CreateCommand(stream); if (status != hipSuccess) { return status; } hip::Event* e = reinterpret_cast(event_); commands_.reserve(1); amd::Command* command; status = e->streamWaitCommand(command, stream); commands_.emplace_back(command); return status; } void EnqueueCommands(hipStream_t stream) { if (!commands_.empty()) { hip::Event* e = reinterpret_cast(event_); hipError_t status = e->enqueueStreamWaitCommand(stream, commands_[0]); if (status != hipSuccess) { ClPrint(amd::LOG_ERROR, amd::LOG_CODE, "[hipGraph] enqueue stream wait command failed for node %p - status %d\n", this, status); } commands_[0]->release(); } } void GetParams(hipEvent_t* event) const { *event = event_; } hipError_t SetParams(hipEvent_t event) { event_ = event; return hipSuccess; } hipError_t SetParams(hipGraphNode* node) { const hipGraphEventWaitNode* eventWaitNode = static_cast(node); return SetParams(eventWaitNode->event_); } }; class 
hipGraphHostNode : public hipGraphNode { hipHostNodeParams NodeParams_; public: hipGraphHostNode(const hipHostNodeParams* NodeParams) : hipGraphNode(hipGraphNodeTypeHost, "solid", "rectangle", "HOST") { NodeParams_ = *NodeParams; } ~hipGraphHostNode() { } hipGraphHostNode(const hipGraphHostNode& hostNode) : hipGraphNode(hostNode) { NodeParams_ = hostNode.NodeParams_; } hipGraphNode* clone() const { return new hipGraphHostNode(static_cast(*this)); } hipError_t CreateCommand(hip::Stream* stream) { hipError_t status = hipGraphNode::CreateCommand(stream); if (status != hipSuccess) { return status; } amd::Command::EventWaitList waitList; commands_.reserve(1); amd::Command* command = new amd::Marker(*stream, !kMarkerDisableFlush, waitList); commands_.emplace_back(command); return hipSuccess; } static void Callback(cl_event event, cl_int command_exec_status, void* user_data) { hipHostNodeParams* NodeParams = reinterpret_cast(user_data); NodeParams->fn(NodeParams->userData); } void EnqueueCommands(hipStream_t stream) { if (!commands_.empty()) { if (!commands_[0]->setCallback(CL_COMPLETE, hipGraphHostNode::Callback, &NodeParams_)) { ClPrint(amd::LOG_ERROR, amd::LOG_CODE, "[hipGraph] Failed during setCallback"); } commands_[0]->enqueue(); // Add the new barrier to stall the stream, until the callback is done amd::Command::EventWaitList eventWaitList; eventWaitList.push_back(commands_[0]); amd::Command* block_command = new amd::Marker(*commands_[0]->queue(), !kMarkerDisableFlush, eventWaitList); if (block_command == nullptr) { ClPrint(amd::LOG_ERROR, amd::LOG_CODE, "[hipGraph] Failed during block command creation"); } block_command->enqueue(); block_command->release(); commands_[0]->release(); } } void GetParams(hipHostNodeParams* params) { std::memcpy(params, &NodeParams_, sizeof(hipHostNodeParams)); } hipError_t SetParams(const hipHostNodeParams* params) { std::memcpy(&NodeParams_, params, sizeof(hipHostNodeParams)); return hipSuccess; } hipError_t SetParams(hipGraphNode* node) { const hipGraphHostNode* hostNode = static_cast(node); return SetParams(&hostNode->NodeParams_); } }; class hipGraphEmptyNode : public hipGraphNode { public: hipGraphEmptyNode() : hipGraphNode(hipGraphNodeTypeEmpty, "solid", "rectangle", "EMPTY") {} ~hipGraphEmptyNode() {} hipGraphNode* clone() const { return new hipGraphEmptyNode(static_cast(*this)); } hipError_t CreateCommand(hip::Stream* stream) { hipError_t status = hipGraphNode::CreateCommand(stream); if (status != hipSuccess) { return status; } amd::Command::EventWaitList waitList; commands_.reserve(1); amd::Command* command = new amd::Marker(*stream, !kMarkerDisableFlush, waitList); commands_.emplace_back(command); return hipSuccess; } }; // ================================================================================================ class hipGraphMemAllocNode : public hipGraphNode { hipMemAllocNodeParams node_params_; // Node parameters for memory allocation amd::Memory* va_ = nullptr; // Memory object, which holds a virtual address // Derive the new class for VirtualMapCommand, // so runtime can allocate memory during the execution of command class VirtualMemAllocNode : public amd::VirtualMapCommand { public: VirtualMemAllocNode(amd::HostQueue& queue, const amd::Event::EventWaitList& eventWaitList, amd::Memory* va, size_t size, amd::Memory* memory, ihipGraph* graph) : VirtualMapCommand(queue, eventWaitList, va->getSvmPtr(), size, memory), va_(va), graph_(graph) {} virtual void submit(device::VirtualDevice& device) final { // Remove VA reference from the global 
mapping. Runtime has to keep a dummy reference for // validation logic during the capture or creation of the nodes if (amd::MemObjMap::FindMemObj(va_->getSvmPtr())) { amd::MemObjMap::RemoveMemObj(va_->getSvmPtr()); } // Allocate real memory for mapping const auto& dev_info = queue()->device().info(); auto aligned_size = amd::alignUp(size_, dev_info.virtualMemAllocGranularity_); auto dptr = graph_->AllocateMemory(aligned_size, static_cast(queue()), nullptr); if (dptr == nullptr) { setStatus(CL_INVALID_OPERATION); return; } size_t offset = 0; // Get memory object associated with the real allocation memory_ = getMemoryObject(dptr, offset); // Retain memory object because command release will release it memory_->retain(); size_ = aligned_size; // Save generic allocation info to match VM interfaces memory_->getUserData().data = new hip::MemMapAllocUserData(dptr, aligned_size, va_); // Execute the original mapping command VirtualMapCommand::submit(device); // Update the internal svm address to ptr memory()->setSvmPtr(va_->getSvmPtr()); // Can't destroy VA, because it's used in mapping even if the node will be destroyed va_->retain(); ClPrint(amd::LOG_INFO, amd::LOG_MEM_POOL, "Graph MemAlloc execute: %p, %p", va_->getSvmPtr(), memory()); } private: amd::Memory* va_; // Memory object with the new virtual address for mapping ihipGraph* graph_; // Graph which allocates/maps memory }; public: hipGraphMemAllocNode(const hipMemAllocNodeParams* node_params) : hipGraphNode(hipGraphNodeTypeMemAlloc, "solid", "rectangle", "MEM_ALLOC") { node_params_ = *node_params; } hipGraphMemAllocNode(const hipGraphMemAllocNode& rhs) : hipGraphNode(rhs) { node_params_ = rhs.node_params_; if (HIP_MEM_POOL_USE_VM) { assert(rhs.va_ != nullptr && "Graph MemAlloc runtime can't clone an invalid node!"); va_ = rhs.va_; va_->retain(); } } virtual ~hipGraphMemAllocNode() final { if (va_ != nullptr) { va_->release(); } } virtual hipGraphNode* clone() const final { return new hipGraphMemAllocNode(static_cast(*this)); } virtual hipError_t CreateCommand(hip::Stream* stream) final { auto error = hipGraphNode::CreateCommand(stream); if (!HIP_MEM_POOL_USE_VM) { auto ptr = Execute(stream_); } else { auto graph = GetParentGraph(); if (graph != nullptr) { assert(va_ != nullptr && "Runtime can't create a command for an invalid node!"); // Create command for memory mapping auto cmd = new VirtualMemAllocNode(*stream, amd::Event::EventWaitList{}, va_, node_params_.bytesize, nullptr, graph); commands_.push_back(cmd); size_t offset = 0; // Check if memory was already added after first reserve if (getMemoryObject(node_params_.dptr, offset) == nullptr) { // Map VA in the accessible space because the graph execution still has // pointer validation and must find a valid object // @note: Memory can be released outside of the graph and // runtime can't keep a valid mapping since it doesn't know if the graph will // be executed again amd::MemObjMap::AddMemObj(node_params_.dptr, va_); } ClPrint(amd::LOG_INFO, amd::LOG_MEM_POOL, "Graph MemAlloc create: %p", node_params_.dptr); } } return error; } void* ReserveAddress() { auto graph = GetParentGraph(); if (graph != nullptr) { node_params_.dptr = graph->ReserveAddress(node_params_.bytesize); if (node_params_.dptr != nullptr) { // Find VA and map in the accessible space so capture can find a valid object va_ = amd::MemObjMap::FindVirtualMemObj(node_params_.dptr); amd::MemObjMap::AddMemObj(node_params_.dptr, va_); } ClPrint(amd::LOG_INFO, amd::LOG_MEM_POOL, "Graph MemAlloc reserve VA: %p",
node_params_.dptr); } return node_params_.dptr; } void* Execute(hip::Stream* stream = nullptr) { auto graph = GetParentGraph(); if (graph != nullptr) { // The node creation requires to return a valid address, however FreeNode can't // free memory on creation because it doesn't have any execution point yet. Thus // the code below makes sure memory won't be recreated on the first execution of the graph if ((node_params_.dptr == nullptr) || !graph->ProbeMemory(node_params_.dptr)) { auto dptr = graph->AllocateMemory(node_params_.bytesize, stream, node_params_.dptr); if ((node_params_.dptr != nullptr) && (node_params_.dptr != dptr)) { LogPrintfError("Ptr mismatch in graph mem alloc %p != %p", node_params_.dptr, dptr); } node_params_.dptr = dptr; } } return node_params_.dptr; } bool IsActiveMem() { auto graph = GetParentGraph(); return graph->ProbeMemory(node_params_.dptr); } void GetParams(hipMemAllocNodeParams* params) const { std::memcpy(params, &node_params_, sizeof(hipMemAllocNodeParams)); } }; // ================================================================================================ class hipGraphMemFreeNode : public hipGraphNode { void* device_ptr_; // Device pointer of the freed memory // Derive the new class for VirtualMap command, since runtime has to free // real allocation after unmap is complete class VirtualMemFreeNode : public amd::VirtualMapCommand { public: VirtualMemFreeNode(ihipGraph* graph, int device_id, amd::HostQueue& queue, const amd::Event::EventWaitList& eventWaitList, void* ptr, size_t size, amd::Memory* memory) : VirtualMapCommand(queue, eventWaitList, ptr, size, memory) , graph_(graph), device_id_(device_id) {} virtual void submit(device::VirtualDevice& device) final { // Find memory object before unmap logic auto alloc = amd::MemObjMap::FindMemObj(ptr()); VirtualMapCommand::submit(device); // Restore the original address of the generic allocation auto ga = reinterpret_cast(alloc->getUserData().data); alloc->setSvmPtr(ga->ptr_); if (!AMD_DIRECT_DISPATCH) { // Update the current device, since hip event, used in mem pools, requires device hip::setCurrentDevice(device_id_); } // Free virtual address ga->va_->release(); alloc->getUserData().data = nullptr; // Release the allocation back to graph's pool graph_->FreeMemory(ga->ptr_, static_cast(queue())); amd::MemObjMap::AddMemObj(ptr(), ga->va_); delete ga; ClPrint(amd::LOG_INFO, amd::LOG_MEM_POOL, "Graph MemFree execute: %p, %p", ptr(), alloc); } private: ihipGraph* graph_; // Graph, which has the execution of this command int device_id_; // Device ID where this command is executed }; public: hipGraphMemFreeNode(void* dptr) : hipGraphNode(hipGraphNodeTypeMemFree, "solid", "rectangle", "MEM_FREE") , device_ptr_(dptr) {} hipGraphMemFreeNode(const hipGraphMemFreeNode& rhs) : hipGraphNode(rhs) { device_ptr_ = rhs.device_ptr_; } virtual hipGraphNode* clone() const final { return new hipGraphMemFreeNode(static_cast(*this)); } virtual hipError_t CreateCommand(hip::Stream* stream) final { auto error = hipGraphNode::CreateCommand(stream); if (!HIP_MEM_POOL_USE_VM) { Execute(stream_); } else { auto graph = GetParentGraph(); if (graph != nullptr) { const auto& dev_info = stream->device().info(); auto va = amd::MemObjMap::FindVirtualMemObj(device_ptr_); // Unmap virtual address from memory amd::Command* cmd = new VirtualMemFreeNode(graph, stream->DeviceId(), *stream, amd::Command::EventWaitList{}, device_ptr_, amd::alignUp(va->getSize(), dev_info.virtualMemAllocGranularity_), nullptr); commands_.push_back(cmd); 
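// Added descriptive comment (not in the original source): no device memory is released at
// node-creation time. The VirtualMemFreeNode pushed above defers all the work to submit():
// it unmaps the virtual address, restores the generic allocation's original address,
// releases the VA object, and only then returns the backing allocation to the graph's
// memory pool via graph_->FreeMemory().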
ClPrint(amd::LOG_INFO, amd::LOG_MEM_POOL, "Graph FreeMem create: %p", device_ptr_); } } return error; } void Execute(hip::Stream* stream) { auto graph = GetParentGraph(); if (graph != nullptr) { graph->FreeMemory(device_ptr_, stream); } } void GetParams(void** params) const { *params = device_ptr_; } };
clr-rocm-5.7.1/hipamd/src/hip_hcc.map.in
hip_4.2 { global: hipChooseDevice; hipCtxCreate; hipCtxDestroy; hipCtxDisablePeerAccess; hipCtxEnablePeerAccess; hipCtxGetApiVersion; hipCtxGetCacheConfig; hipCtxGetCurrent; hipCtxGetDevice; hipCtxGetFlags; hipCtxGetSharedMemConfig; hipCtxPopCurrent; hipCtxPushCurrent; hipCtxSetCacheConfig; hipCtxSetCurrent; hipCtxSetSharedMemConfig; hipCtxSynchronize; hipDeviceCanAccessPeer; hipDeviceComputeCapability; hipDeviceDisablePeerAccess; hipDeviceEnablePeerAccess; hipDeviceGet; hipDeviceGetAttribute; hipDeviceGetByPCIBusId; hipDeviceGetCacheConfig; hipDeviceGetStreamPriorityRange; hipDeviceGetLimit; hipDeviceGetName; hipDeviceGetPCIBusId; hipDeviceGetSharedMemConfig; hipDeviceGetP2PAttribute; hipDevicePrimaryCtxGetState; hipDevicePrimaryCtxRelease; hipDevicePrimaryCtxReset; hipDevicePrimaryCtxRetain; hipDevicePrimaryCtxSetFlags; hipDeviceReset; hipDeviceSetCacheConfig; hipDeviceSetSharedMemConfig; hipDeviceSynchronize; hipDeviceTotalMem; hipDriverGetVersion; hipEventCreate; hipEventCreateWithFlags; hipEventDestroy; hipEventElapsedTime; hipEventQuery; hipEventRecord; hipEventSynchronize; hipExtGetLinkTypeAndHopCount; hipExtLaunchMultiKernelMultiDevice; hipExtMallocWithFlags; hipExtModuleLaunchKernel; hipExtLaunchKernel; hipFree; hipFreeArray; hipFuncSetAttribute; hipFuncSetCacheConfig; hipFuncSetSharedMemConfig; hipGetDevice; hipGetDeviceCount; hipGetDeviceProperties; hipGetErrorName; hipGetErrorString; hipGetLastError; hipMemAdvise; hipMemAllocHost; hipHostAlloc; hipHostFree; hipHostGetDevicePointer; hipHostGetFlags; hipHostMalloc; hipHostRegister; hipHostUnregister; hipInit; hipIpcCloseMemHandle; hipIpcGetMemHandle; hipIpcOpenMemHandle; hipIpcGetEventHandle; hipIpcOpenEventHandle; hipMalloc; hipMalloc3D; hipMalloc3DArray; hipMallocManaged; hipArrayCreate; hipArray3DCreate; hipMallocArray; hipMallocPitch; hipMemAllocPitch; hipMemcpy; hipMemcpyWithStream; hipMemcpyParam2D; hipMemcpy2D; hipMemcpy2DAsync; hipMemcpy2DToArray; hipMemcpy3D; hipMemcpy3DAsync; hipDrvMemcpy3D; hipDrvMemcpy3DAsync; hipMemcpyAsync; hipMemcpyDtoD; hipMemcpyDtoDAsync; hipMemcpyDtoH; hipMemcpyDtoHAsync; hipMemcpyFromSymbol; hipMemcpyFromSymbolAsync; hipMemcpyHtoD; hipMemcpyHtoDAsync; hipMemcpyPeer; hipMemcpyPeerAsync; hipMemcpyToArray; hipMemcpyFromArray; hipMemcpyToSymbol; hipMemcpyToSymbolAsync; hipMemGetAddressRange; hipGetSymbolAddress; hipGetSymbolSize; hipMemGetInfo; hipMemPrefetchAsync; hipMemPtrGetInfo; hipMemRangeGetAttribute; hipMemRangeGetAttributes; hipMemset; hipMemsetAsync; hipMemsetD8; hipMemsetD8Async; hipMemsetD16; hipMemsetD16Async; hipMemsetD32; hipMemsetD32Async; hipMemset2D; hipMemset2DAsync; hipMemset3D; hipMemset3DAsync; hipModuleGetFunction; hipModuleGetGlobal; hipModuleGetTexRef; hipModuleLaunchKernel; hipModuleLaunchKernelExt; hipLaunchCooperativeKernel; hipLaunchCooperativeKernelMultiDevice; hipModuleLoad; hipModuleLoadData; hipModuleLoadDataEx; hipModuleUnload; hipModuleOccupancyMaxPotentialBlockSize; hipModuleOccupancyMaxPotentialBlockSizeWithFlags; hipModuleOccupancyMaxActiveBlocksPerMultiprocessor; hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags;
hipOccupancyMaxPotentialBlockSize; hipOccupancyMaxActiveBlocksPerMultiprocessor; hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags; hipFuncGetAttribute; hipFuncGetAttributes; hipPeekAtLastError; hipPointerSetAttribute; hipPointerGetAttributes; hipProfilerStart; hipProfilerStop; hipRuntimeGetVersion; hipGetDeviceFlags; hipSetDevice; hipSetDeviceFlags; hipStreamAddCallback; hipStreamAttachMemAsync; hipStreamCreate; hipStreamCreateWithFlags; hipStreamCreateWithPriority; hipStreamDestroy; hipStreamGetDevice; hipStreamGetFlags; hipStreamQuery; hipStreamSynchronize; hipStreamWaitEvent; __hipPopCallConfiguration; __hipPushCallConfiguration; __hipRegisterFatBinary; __hipRegisterFunction; __hipRegisterVar; __hipRegisterSurface; __hipRegisterTexture; __hipRegisterManagedVar; __hipUnregisterFatBinary; __gnu_h2f_ieee; __gnu_f2h_ieee; hipConfigureCall; hipSetupArgument; hipLaunchByPtr; hipLaunchKernel; hipApiName; hipKernelNameRef; hipKernelNameRefByPtr; hipGetStreamDeviceId; hipProfilerStart; hipProfilerStop; hiprtcCompileProgram; hiprtcCreateProgram; hiprtcDestroyProgram; hiprtcGetLoweredName; hiprtcGetProgramLog; hiprtcGetProgramLogSize; hiprtcGetCode; hiprtcGetCodeSize; hiprtcGetErrorString; hiprtcAddNameExpression; hiprtcVersion; hiprtcLinkCreate; hiprtcLinkAddFile; hiprtcLinkAddData; hiprtcLinkComplete; hiprtcLinkDestroy; hipBindTexture; hipBindTexture2D; hipBindTextureToArray; hipBindTextureToMipmappedArray; hipGetTextureAlignmentOffset; hipGetTextureReference; hipUnbindTexture; hipCreateChannelDesc; hipCreateTextureObject; hipDestroyTextureObject; hipGetChannelDesc; hipGetTextureObjectResourceDesc; hipGetTextureObjectResourceViewDesc; hipGetTextureObjectTextureDesc; hipTexRefGetAddress; hipTexRefGetAddressMode; hipTexRefGetArray; hipTexRefGetBorderColor; hipTexRefGetFilterMode; hipTexRefGetFlags; hipTexRefGetFormat; hipTexRefGetMaxAnisotropy; hipTexRefGetMipmapFilterMode; hipTexRefGetMipmapLevelBias; hipTexRefGetMipmapLevelClamp; hipTexRefGetMipmappedArray; hipTexRefSetAddress; hipTexRefSetAddress2D; hipTexRefSetAddressMode; hipTexRefSetArray; hipTexRefSetBorderColor; hipTexRefSetFilterMode; hipTexRefSetFlags; hipTexRefSetFormat; hipTexRefSetMaxAnisotropy; hipTexRefSetMipmapFilterMode; hipTexRefSetMipmapLevelBias; hipTexRefSetMipmapLevelClamp; hipTexRefSetMipmappedArray; hipMipmappedArrayCreate; hipMallocMipmappedArray; hipMipmappedArrayDestroy; hipFreeMipmappedArray; hipMipmappedArrayGetLevel; hipGetMipmappedArrayLevel; hipMallocHost; hipFreeHost; hipTexObjectCreate; hipTexObjectDestroy; hipTexObjectGetResourceDesc; hipTexObjectGetResourceViewDesc; hipTexObjectGetTextureDesc; hipGetCmdName*; hipExtStreamCreateWithCUMask; hipStreamGetPriority; hipMemcpy2DFromArray; hipMemcpy2DFromArrayAsync; hipMemcpyAtoH; hipMemcpyHtoA; hipMemcpyParam2DAsync; __hipGetPCH; hipExtStreamGetCUMask; extern "C++" { hipCreateSurfaceObject*; hipDestroySurfaceObject*; hipHccModuleLaunchKernel*; hipExtModuleLaunchKernel*; }; local: *; }; hip_4.3 { global: hipGraphCreate; hipGraphDestroy; hipGraphAddKernelNode; hipGraphAddMemsetNode; hipGraphAddMemcpyNode; hipGraphAddMemcpyNode1D; hipGraphInstantiate; hipGraphLaunch; hipStreamIsCapturing; hipStreamBeginCapture; hipStreamEndCapture; hipGraphExecDestroy; hipImportExternalSemaphore; hipSignalExternalSemaphoresAsync; hipWaitExternalSemaphoresAsync; hipDestroyExternalSemaphore; hipImportExternalMemory; hipExternalMemoryGetMappedBuffer; hipDestroyExternalMemory; hipMemcpy2DToArrayAsync; hipDrvMemcpy2DUnaligned; hipArrayDestroy; hipGLGetDevices; hipGraphicsGLRegisterBuffer; 
hipGraphicsMapResources; hipGraphicsResourceGetMappedPointer; hipGraphicsUnmapResources; hipGraphicsUnregisterResource; local: *; } hip_4.2; hip_4.4 { global: hipGraphGetNodes; hipGraphGetRootNodes; hipGraphKernelNodeGetParams; hipGraphKernelNodeSetParams; hipGraphMemcpyNodeGetParams; hipGraphMemcpyNodeSetParams; hipGraphMemsetNodeGetParams; hipGraphMemsetNodeSetParams; hipGraphAddDependencies; hipStreamWaitValue32; hipStreamWaitValue64; hipStreamWriteValue32; hipStreamWriteValue64; hipGraphExecKernelNodeSetParams; hipGraphAddEmptyNode; local: *; } hip_4.3; hip_4.5 { global: hipStreamUpdateCaptureDependencies; hipGraphRemoveDependencies; hipGraphGetEdges; hipGraphNodeGetDependencies; hipGraphNodeGetDependentNodes; hipGraphNodeGetType; hipGraphDestroyNode; hipGraphClone; hipGraphNodeFindInClone; hipGraphAddChildGraphNode; hipGraphChildGraphNodeGetGraph; hipGraphExecChildGraphNodeSetParams; hipGraphAddMemcpyNodeFromSymbol; hipGraphMemcpyNodeSetParamsFromSymbol; hipGraphExecMemcpyNodeSetParamsFromSymbol; hipGraphAddMemcpyNodeToSymbol; hipGraphMemcpyNodeSetParamsToSymbol; hipGraphExecMemcpyNodeSetParamsToSymbol; hipGraphExecMemcpyNodeSetParams; hipGraphMemcpyNodeSetParams1D; hipGraphExecMemcpyNodeSetParams1D; hipGraphAddEventRecordNode; hipGraphEventRecordNodeGetEvent; hipGraphEventRecordNodeSetEvent; hipGraphExecEventRecordNodeSetEvent; hipGraphAddEventWaitNode; hipGraphEventWaitNodeGetEvent; hipGraphEventWaitNodeSetEvent; hipGraphExecEventWaitNodeSetEvent; hipGraphAddHostNode; hipGraphHostNodeGetParams; hipGraphHostNodeSetParams; hipGraphExecHostNodeSetParams; hipGraphExecUpdate; hipGraphInstantiateWithFlags; hipGraphExecMemsetNodeSetParams; hipDeviceGetGraphMemAttribute; hipDeviceSetGraphMemAttribute; hipDeviceGraphMemTrim; amd_dbgapi_get_build_name; amd_dbgapi_get_git_hash; amd_dbgapi_get_build_id; hipStreamGetCaptureInfo; hipStreamGetCaptureInfo_v2; hipGraphicsGLRegisterImage; hipGraphicsSubResourceGetMappedArray; local: *; } hip_4.4; hip_5.0 { global: hipPointerGetAttribute; hipDrvPointerGetAttributes; hipThreadExchangeStreamCaptureMode; hipGraphKernelNodeSetAttribute; hipGraphKernelNodeGetAttribute; local: *; } hip_4.5; hip_5.1 { global: hipDeviceGetUuid; hipDeviceGetDefaultMemPool; hipDeviceSetMemPool; hipDeviceGetMemPool; hipMallocAsync; hipFreeAsync; hipMemPoolTrimTo; hipMemPoolSetAttribute; hipMemPoolGetAttribute; hipMemPoolSetAccess; hipMemPoolGetAccess; hipMemPoolCreate; hipMemPoolDestroy; hipMallocFromPoolAsync; hipMemPoolExportToShareableHandle; hipMemPoolImportFromShareableHandle; hipMemPoolExportPointer; hipMemPoolImportPointer; hipMemAddressFree; hipMemAddressReserve; hipMemCreate; hipMemExportToShareableHandle; hipMemGetAccess; hipMemGetAllocationGranularity; hipMemGetAllocationPropertiesFromHandle; hipMemImportFromShareableHandle; hipMemMap; hipMemMapArrayAsync; hipMemRelease; hipMemRetainAllocationHandle; hipMemSetAccess; hipMemUnmap; local: *; } hip_5.0; hip_5.2 { global: hipMemcpy_spt; hipMemcpyAsync_spt; hipStreamSynchronize_spt; hipMemcpyToSymbol_spt; hipMemcpyFromSymbol_spt; hipMemcpy2D_spt; hipMemcpy2DToArray_spt; hipMemcpy2DFromArray_spt; hipMemcpy3D_spt; hipMemset_spt; hipMemset2D_spt; hipMemset3D_spt; hipStreamQuery_spt; hipStreamGetFlags_spt; hipStreamGetPriority_spt; hipStreamWaitEvent_spt; hipEventRecord_spt; hipLaunchKernel_spt; hipLaunchCooperativeKernel_spt; local: *; } hip_5.1; hip_5.3 { global: hipDeviceSetLimit; hiprtcGetBitcode; hiprtcGetBitcodeSize; hipGraphLaunch_spt; hipStreamBeginCapture_spt; hipStreamEndCapture_spt; hipStreamIsCapturing_spt; 
hipStreamGetCaptureInfo_spt; hipStreamGetCaptureInfo_v2_spt; hipStreamAddCallback_spt; hipMemsetAsync_spt; hipMemset2DAsync_spt; hipMemset3DAsync_spt; hipMemcpy3DAsync_spt; hipMemcpy2DAsync_spt; hipMemcpyFromSymbolAsync_spt; hipMemcpyToSymbolAsync_spt; hipMemcpyFromArray_spt; hipMemcpy2DToArray_spt; hipMemcpy2DFromArrayAsync_spt; hipMemcpy2DToArrayAsync_spt; hipDrvGetErrorName; hipDrvGetErrorString; hipUserObjectCreate; hipUserObjectRelease; hipUserObjectRetain; hipGraphRetainUserObject; hipGraphReleaseUserObject; hipLaunchHostFunc; hipLaunchHostFunc_spt; hipRegisterTracerCallback; hipGraphDebugDotPrint; hipGraphKernelNodeCopyAttributes; hipGraphNodeGetEnabled; hipGraphNodeSetEnabled; hipGraphUpload; local: *; } hip_5.2; hip_5.5 { global: hipModuleLaunchCooperativeKernel; hipModuleLaunchCooperativeKernelMultiDevice; hipGraphAddMemAllocNode; hipGraphMemAllocNodeGetParams; hipGraphAddMemFreeNode; hipGraphMemFreeNodeGetParams; local: *; } hip_5.3; hip_5.6 { global: hipArrayGetInfo; hipArrayGetDescriptor; hipArray3DGetDescriptor; local: *; } hip_5.5;
clr-rocm-5.7.1/hipamd/src/hip_hcc.rc
#define STR(__macro__) #__macro__ #define XSTR(__macro__) STR(__macro__) #if defined(_DEBUG) #define DEBUG_ONLY(x) x #else #define DEBUG_ONLY(x) #endif #define VERSION_PREFIX_MAJOR 2 #define VERSION_PREFIX_MINOR 0 #define APSTUDIO_READONLY_SYMBOLS ///////////////////////////////////////////////////////////////////////////// // // Generated from the TEXTINCLUDE 2 resource. // #include "winresrc.h" #include "utils/versions.hpp" ///////////////////////////////////////////////////////////////////////////// #undef APSTUDIO_READONLY_SYMBOLS ///////////////////////////////////////////////////////////////////////////// // English (U.S.) resources #if !defined(AFX_RESOURCE_DLL) || defined(AFX_TARG_ENU) #ifdef _WIN32 LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US #pragma code_page(1252) #endif //_WIN32 ///////////////////////////////////////////////////////////////////////////// // // Version // VS_VERSION_INFO VERSIONINFO FILEVERSION 10,0,AMD_PLATFORM_BUILD_NUMBER,AMD_PLATFORM_REVISION_NUMBER PRODUCTVERSION 10,0,AMD_PLATFORM_BUILD_NUMBER,AMD_PLATFORM_REVISION_NUMBER FILEFLAGSMASK 0x3fL #ifdef _DEBUG FILEFLAGS 0x1L #else FILEFLAGS 0x0L #endif FILEOS 0x40004L FILETYPE 0x2L FILESUBTYPE 0x0L BEGIN BLOCK "StringFileInfo" BEGIN BLOCK "040904b0" BEGIN VALUE "Comments", " \0" VALUE "CompanyName", "Advanced Micro Devices Inc.\0" VALUE "FileDescription", AMD_PLATFORM_NAME " OpenCL " XSTR(VERSION_PREFIX_MAJOR) "." XSTR(VERSION_PREFIX_MINOR) " Runtime\0" VALUE "FileVersion", "10.0." XSTR(AMD_PLATFORM_BUILD_NUMBER) "." XSTR(AMD_PLATFORM_REVISION_NUMBER) VALUE "InternalName", "OpenCL" VALUE "LegalCopyright", "Copyright (C) 2011 Advanced Micro Devices Inc.\0" VALUE "OriginalFilename", "OpenCL.dll" VALUE "ProductName", "OpenCL " XSTR(VERSION_PREFIX_MAJOR) "." XSTR(VERSION_PREFIX_MINOR) " " AMD_PLATFORM_INFO "\0" VALUE "ProductVersion", "10.0." XSTR(AMD_PLATFORM_BUILD_NUMBER) "." XSTR(AMD_PLATFORM_REVISION_NUMBER) END END BLOCK "VarFileInfo" BEGIN VALUE "Translation", 0x409, 1200 END END #endif // English (U.S.) resources
/////////////////////////////////////////////////////////////////////////////
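The VERSIONINFO block above assembles its version strings with the classic two-step stringification idiom (STR/XSTR). A minimal illustration of why both macros are needed; the build number 1234 is a hypothetical value, not taken from the repository:

// #define STR(x)  #x        // stringizes its argument literally, without expansion
// #define XSTR(x) STR(x)    // extra expansion step: the argument is macro-expanded first
// #define AMD_PLATFORM_BUILD_NUMBER 1234   // hypothetical value for illustration
// STR(AMD_PLATFORM_BUILD_NUMBER)  -> "AMD_PLATFORM_BUILD_NUMBER"
// XSTR(AMD_PLATFORM_BUILD_NUMBER) -> "1234"
// so the "FileVersion" value concatenates to the literal "10.0.1234.<revision>".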
clr-rocm-5.7.1/hipamd/src/hip_hcc_in.rc.in
#define STR(__macro__) #__macro__ #define XSTR(__macro__) STR(__macro__) #if defined(_DEBUG) #define DEBUG_ONLY(x) x #else #define DEBUG_ONLY(x) #endif #define VERSION_PREFIX_MAJOR @VERSION_MAJOR_AMDHIP@ #define VERSION_PREFIX_MINOR @VERSION_MINOR_AMDHIP@ #define APSTUDIO_READONLY_SYMBOLS ///////////////////////////////////////////////////////////////////////////// // // Generated from the TEXTINCLUDE 2 resource. // #include "winresrc.h" #include "utils/versions.hpp" ///////////////////////////////////////////////////////////////////////////// #undef APSTUDIO_READONLY_SYMBOLS ///////////////////////////////////////////////////////////////////////////// // English (U.S.) resources #if !defined(AFX_RESOURCE_DLL) || defined(AFX_TARG_ENU) #ifdef _WIN32 LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US #pragma code_page(1252) #endif //_WIN32 ///////////////////////////////////////////////////////////////////////////// // // Version // VS_VERSION_INFO VERSIONINFO FILEVERSION 10, 0, AMD_PLATFORM_BUILD_NUMBER, AMD_PLATFORM_REVISION_NUMBER PRODUCTVERSION 10, 0, AMD_PLATFORM_BUILD_NUMBER, AMD_PLATFORM_REVISION_NUMBER FILEFLAGSMASK 0x3fL #ifdef _DEBUG FILEFLAGS 0x1L #else FILEFLAGS 0x0L #endif FILEOS 0x40004L FILETYPE 0x2L FILESUBTYPE 0x0L BEGIN BLOCK "StringFileInfo" BEGIN BLOCK "040904b0" BEGIN VALUE "Comments", " \0" VALUE "CompanyName", "Advanced Micro Devices Inc.\0" VALUE "FileDescription", AMD_PLATFORM_NAME " amdhip64 " XSTR(VERSION_PREFIX_MAJOR) "." XSTR(VERSION_PREFIX_MINOR) " Runtime\0" VALUE "FileVersion", "10.0." XSTR(AMD_PLATFORM_BUILD_NUMBER) "." XSTR(AMD_PLATFORM_REVISION_NUMBER) VALUE "InternalName", "amdhip64" VALUE "LegalCopyright", "Copyright (C) 2011 Advanced Micro Devices Inc.\0" VALUE "OriginalFilename", "amdhip64.dll" VALUE "ProductName", "amdhip64 " XSTR(VERSION_PREFIX_MAJOR) "." XSTR(VERSION_PREFIX_MINOR) " " AMD_PLATFORM_INFO "\0" VALUE "ProductVersion", "10.0." XSTR(AMD_PLATFORM_BUILD_NUMBER) "." XSTR(AMD_PLATFORM_REVISION_NUMBER) END END BLOCK "VarFileInfo" BEGIN VALUE "Translation", 0x409, 1200 END END #endif // English (U.S.) resources
/////////////////////////////////////////////////////////////////////////////
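Unlike hip_hcc.rc, which hardcodes VERSION_PREFIX_MAJOR/MINOR as 2.0, the .in template above leaves them as @VERSION_MAJOR_AMDHIP@ and @VERSION_MINOR_AMDHIP@ placeholders to be filled in at configure time (CMake's configure_file() is the usual mechanism; that is an assumption here, not stated in this archive). For a HIP 5.7 build the configured file would plausibly begin:

// #define VERSION_PREFIX_MAJOR 5   // substituted from @VERSION_MAJOR_AMDHIP@
// #define VERSION_PREFIX_MINOR 7   // substituted from @VERSION_MINOR_AMDHIP@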
clr-rocm-5.7.1/hipamd/src/hip_hmm.cpp
/* Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include "hip_internal.hpp" #include "hip_conversions.hpp" #include "platform/context.hpp" #include "platform/command.hpp" #include "platform/memory.hpp" // Forward declaration of a function hipError_t ihipMallocManaged(void** ptr, size_t size, unsigned int align = 0); // Make sure HIP defines match ROCclr to avoid double conversion static_assert(hipCpuDeviceId == amd::CpuDeviceId, "CPU device ID mismatch with ROCclr!"); static_assert(hipInvalidDeviceId == amd::InvalidDeviceId, "Invalid device ID mismatch with ROCclr!"); static_assert(static_cast(hipMemAdviseSetReadMostly) == amd::MemoryAdvice::SetReadMostly, "Enum mismatch with ROCclr!"); static_assert(static_cast(hipMemAdviseUnsetReadMostly) == amd::MemoryAdvice::UnsetReadMostly, "Enum mismatch with ROCclr!"); static_assert(static_cast(hipMemAdviseSetPreferredLocation) == amd::MemoryAdvice::SetPreferredLocation, "Enum mismatch with ROCclr!"); static_assert(static_cast(hipMemAdviseUnsetPreferredLocation) == amd::MemoryAdvice::UnsetPreferredLocation, "Enum mismatch with ROCclr!"); static_assert(static_cast(hipMemAdviseSetAccessedBy) == amd::MemoryAdvice::SetAccessedBy, "Enum mismatch with ROCclr!"); static_assert(static_cast(hipMemAdviseUnsetAccessedBy) == amd::MemoryAdvice::UnsetAccessedBy, "Enum mismatch with ROCclr!"); static_assert(static_cast(hipMemAdviseSetCoarseGrain) == amd::MemoryAdvice::SetCoarseGrain, "Enum mismatch with ROCclr!"); static_assert(static_cast(hipMemAdviseUnsetCoarseGrain) == amd::MemoryAdvice::UnsetCoarseGrain, "Enum mismatch with ROCclr!"); static_assert(static_cast(hipMemRangeAttributeReadMostly) == amd::MemRangeAttribute::ReadMostly, "Enum mismatch with ROCclr!"); static_assert(static_cast(hipMemRangeAttributePreferredLocation) == amd::MemRangeAttribute::PreferredLocation, "Enum mismatch with ROCclr!"); static_assert(static_cast(hipMemRangeAttributeAccessedBy) == amd::MemRangeAttribute::AccessedBy, "Enum mismatch with ROCclr!"); static_assert(static_cast(hipMemRangeAttributeLastPrefetchLocation) == amd::MemRangeAttribute::LastPrefetchLocation, "Enum mismatch with ROCclr!"); // ================================================================================================ hipError_t hipMallocManaged(void** dev_ptr, size_t size, unsigned int flags) { HIP_INIT_API(hipMallocManaged, dev_ptr, size, flags); if ((dev_ptr == nullptr) || (size == 0) || ((flags != hipMemAttachGlobal) && (flags != hipMemAttachHost))) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(ihipMallocManaged(dev_ptr, size), *dev_ptr); } // ================================================================================================ hipError_t hipMemPrefetchAsync(const void* dev_ptr, size_t count, int device, hipStream_t stream) { HIP_INIT_API(hipMemPrefetchAsync, dev_ptr, count, device, stream); if ((dev_ptr == nullptr) || (count == 0)) { HIP_RETURN(hipErrorInvalidValue); } if (!hip::isValid(stream)) { HIP_RETURN(hipErrorContextIsDestroyed); } size_t offset = 0; amd::Memory* memObj = getMemoryObject(dev_ptr, offset); if (memObj == nullptr || (memObj && count > (memObj->getSize() - offset))) { HIP_RETURN(hipErrorInvalidValue); } if (device != hipCpuDeviceId && (static_cast(device) >= g_devices.size())) { HIP_RETURN(hipErrorInvalidDevice); } hip::Stream* hip_stream = nullptr;
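// Added descriptive comment (not in the original source): the code below selects the
// prefetch target. hipCpuDeviceId routes the range back to host memory (cpu_access = true),
// while a GPU id selects that device; when no stream is supplied, the null stream of the
// current or target device is used to order the prefetch command.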
amd::Device* dev = nullptr; bool cpu_access = false; if ((memObj == nullptr) && (device != hipCpuDeviceId) && (!g_devices[device]->devices()[0]->info().hmmCpuMemoryAccessible_)) { HIP_RETURN(hipErrorNotSupported); } // Pick the specified stream or Null one from the provided device if (device == hipCpuDeviceId) { cpu_access = true; hip_stream = (stream == nullptr) ? hip::getCurrentDevice()->NullStream() : hip::getStream(stream); } else { dev = g_devices[device]->devices()[0]; hip_stream = (stream == nullptr) ? g_devices[device]->NullStream() : hip::getStream(stream); } if (hip_stream == nullptr) { HIP_RETURN(hipErrorInvalidValue); } amd::Command::EventWaitList waitList; amd::SvmPrefetchAsyncCommand* command = new amd::SvmPrefetchAsyncCommand(*hip_stream, waitList, dev_ptr, count, dev, cpu_access); if (command == nullptr) { return hipErrorOutOfMemory; } command->enqueue(); command->release(); HIP_RETURN(hipSuccess); } // ================================================================================================ hipError_t hipMemAdvise(const void* dev_ptr, size_t count, hipMemoryAdvise advice, int device) { HIP_INIT_API(hipMemAdvise, dev_ptr, count, advice, device); bool isAdviseReadMostly = (advice == hipMemAdviseSetReadMostly) || (advice == hipMemAdviseUnsetReadMostly); if (!isAdviseReadMostly && ((device != hipCpuDeviceId) && (static_cast(device) >= g_devices.size()))) { HIP_RETURN(hipErrorInvalidDevice); } if ((dev_ptr == nullptr) || (count == 0)) { HIP_RETURN(hipErrorInvalidValue); } size_t offset = 0; amd::Memory* memObj = getMemoryObject(dev_ptr, offset); if (memObj && count > (memObj->getSize() - offset)) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* dev = (device == hipCpuDeviceId || isAdviseReadMostly) ? g_devices[0]->devices()[0] : g_devices[device]->devices()[0]; bool use_cpu = (device == hipCpuDeviceId) ? 
true : false; // Set the allocation attributes in AMD HMM if (!dev->SetSvmAttributes(dev_ptr, count, static_cast(advice), use_cpu)) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(hipSuccess); } // ================================================================================================ hipError_t hipMemRangeGetAttribute(void* data, size_t data_size, hipMemRangeAttribute attribute, const void* dev_ptr, size_t count) { HIP_INIT_API(hipMemRangeGetAttribute, data, data_size, attribute, dev_ptr, count); if ((data == nullptr) || (data_size == 0) || (dev_ptr == nullptr) || (count == 0)) { HIP_RETURN(hipErrorInvalidValue); } // Shouldn't matter for which device the interface is called amd::Device* dev = g_devices[0]->devices()[0]; // Get the allocation attribute from AMD HMM if (!dev->GetSvmAttributes(&data, &data_size, reinterpret_cast(&attribute), 1, dev_ptr, count)) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(hipSuccess); } // ================================================================================================ hipError_t hipMemRangeGetAttributes(void** data, size_t* data_sizes, hipMemRangeAttribute* attributes, size_t num_attributes, const void* dev_ptr, size_t count) { HIP_INIT_API(hipMemRangeGetAttributes, data, data_sizes, attributes, num_attributes, dev_ptr, count); if ((data == nullptr) || (data_sizes == nullptr) || (attributes == nullptr) || (num_attributes == 0) || (dev_ptr == nullptr) || (count == 0)) { HIP_RETURN(hipErrorInvalidValue); } if (*data_sizes > 0) { for (int i = 0 ; i<*data_sizes ; i++) { if (!data[i]) { HIP_RETURN(hipErrorInvalidValue); } } } size_t offset = 0; amd::Memory* memObj = getMemoryObject(dev_ptr, offset); if (memObj) { if (!(memObj->getMemFlags() & (CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_ALLOC_HOST_PTR))) { HIP_RETURN(hipErrorInvalidValue); } } else { HIP_RETURN(hipErrorInvalidValue); } // Shouldn't matter for which device the interface is called amd::Device* dev = g_devices[0]->devices()[0]; // Get the allocation attributes from AMD HMM if (!dev->GetSvmAttributes(data, data_sizes, reinterpret_cast(attributes), num_attributes, dev_ptr, count)) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(hipSuccess); } // ================================================================================================ hipError_t hipStreamAttachMemAsync(hipStream_t stream, void* dev_ptr, size_t length, unsigned int flags) { HIP_INIT_API(hipStreamAttachMemAsync, stream, dev_ptr, length, flags); // stream can be null, length can be 0. if ((dev_ptr == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } if (!hip::isValid(stream)) { HIP_RETURN(hipErrorContextIsDestroyed); } if (flags != hipMemAttachGlobal && flags != hipMemAttachHost && flags != hipMemAttachSingle) { HIP_RETURN(hipErrorInvalidValue); } if (flags == hipMemAttachSingle && !stream) { HIP_RETURN(hipErrorInvalidValue); } // host-accessible region of system-allocated pageable memory. // This type of memory may only be specified if the device associated with the // stream reports a non-zero value for the device attribute hipDevAttrPageableMemoryAccess. hip::Stream* hip_stream = (stream == nullptr) ? 
hip::getCurrentDevice()->NullStream() : hip::getStream(stream); size_t offset = 0; amd::Memory* memObj = getMemoryObject(dev_ptr, offset); if (memObj == nullptr) { if (hip_stream->GetDevice()->devices()[0]->info().hmmCpuMemoryAccessible_ == 0) { HIP_RETURN(hipErrorInvalidValue); } if (length == 0) { HIP_RETURN(hipErrorInvalidValue); } } else { if (memObj->getMemFlags() & (CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_ALLOC_HOST_PTR)) { if (length != 0 && memObj->getSize() != length) { HIP_RETURN(hipErrorInvalidValue); } } } // Unclear what should be done for this interface in AMD HMM, since it's generic SVM alloc HIP_RETURN(hipSuccess); } // ================================================================================================ hipError_t ihipMallocManaged(void** ptr, size_t size, unsigned int align) { if (ptr == nullptr) { return hipErrorInvalidValue; } else if (size == 0) { *ptr = nullptr; return hipSuccess; } assert((hip::host_context != nullptr) && "Current host context must be valid"); amd::Context& ctx = *hip::host_context; const amd::Device& dev = *ctx.devices()[0]; // Allocate SVM fine grain buffer with the forced host pointer, avoiding explicit memory // allocation in the device driver *ptr = amd::SvmBuffer::malloc(ctx, CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_ALLOC_HOST_PTR, size, (align == 0) ? dev.info().memBaseAddrAlign_ : align); if (*ptr == nullptr) { return hipErrorMemoryAllocation; } size_t offset = 0; // this is ignored amd::Memory* memObj = getMemoryObject(*ptr, offset); if (memObj == nullptr) { return hipErrorMemoryAllocation; } // saves the current device id so that it can be accessed later memObj->getUserData().deviceId = hip::getCurrentDevice()->deviceId(); ClPrint(amd::LOG_INFO, amd::LOG_API, "ihipMallocManaged ptr=0x%zx", *ptr); return hipSuccess; }
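A minimal usage sketch of the managed-memory APIs implemented above (illustrative only, not part of the repository; error checking elided for brevity):

#include <hip/hip_runtime.h>

int main() {
  constexpr size_t kCount = 1024;
  float* data = nullptr;
  // Fine-grain, host-accessible allocation, backed by ihipMallocManaged() above.
  hipMallocManaged(reinterpret_cast<void**>(&data), kCount * sizeof(float), hipMemAttachGlobal);
  // Hint that device 0 is the preferred home of the range.
  hipMemAdvise(data, kCount * sizeof(float), hipMemAdviseSetPreferredLocation, 0 /*device*/);
  // Migrate the range to device 0 ahead of use; nullptr selects the null stream.
  hipMemPrefetchAsync(data, kCount * sizeof(float), 0 /*device*/, nullptr);
  hipDeviceSynchronize();
  hipFree(data);
  return 0;
}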
clr-rocm-5.7.1/hipamd/src/hip_intercept.cpp
/* Copyright (c) 2019 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "hip/hip_runtime.h" #include "hip_internal.hpp" #include "hip_platform.hpp" #include "hip_prof_api.h" // HIP API callback/activity extern const std::string& FunctionName(const hipFunction_t f); extern "C" { int hipGetStreamDeviceId(hipStream_t stream) { if (!hip::isValid(stream)) { return -1; } hip::Stream* s = reinterpret_cast(stream); return (s != nullptr) ? s->DeviceId() : ihipGetDevice(); } const char* hipKernelNameRef(const hipFunction_t function) { return (function != nullptr) ? FunctionName(function).c_str() : nullptr; } const char* hipKernelNameRefByPtr(const void* host_function, hipStream_t stream) { [](auto&&...) {}(stream); return (host_function != nullptr) ? PlatformState::instance().getStatFuncName(host_function) : nullptr; } void hipRegisterTracerCallback(int (*function)(activity_domain_t domain, uint32_t operation_id, void* data)) { activity_prof::report_activity.store(function, std::memory_order_relaxed); } const char* hipApiName(uint32_t id) { return hip_api_name(id); } } // extern "C"
clr-rocm-5.7.1/hipamd/src/hip_internal.hpp
/* Copyright (c) 2015 - 2022 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef HIP_SRC_HIP_INTERNAL_H #define HIP_SRC_HIP_INTERNAL_H #include "vdi_common.hpp" #include "hip_prof_api.h" #include "trace_helper.h" #include "utils/debug.hpp" #include "hip_formatting.hpp" #include "hip_graph_capture.hpp" #include #include #include #include #include #ifdef _WIN32 #include #else #include #endif #define KNRM "\x1B[0m" #define KRED "\x1B[31m" #define KGRN "\x1B[32m" #define KYEL "\x1B[33m" #define KBLU "\x1B[34m" #define KMAG "\x1B[35m" #define KCYN "\x1B[36m" #define KWHT "\x1B[37m"
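// Added note (not in the original source): the reserved sizes in the IPC handle structs
// declared just below are chosen via LP64_SWITCH so the structs occupy the same number of
// bytes on 32-bit and 64-bit builds. For ihipIpcMemHandle_t that works out to
// 32 + 2*sizeof(size_t) + sizeof(int) + reserved, i.e. 32+8+8+4+12 = 64 on LP64 and
// 32+4+4+4+20 = 64 on 32-bit. A sanity check could assert this, e.g.:
// static_assert(sizeof(ihipIpcMemHandle_t) == 64, "IPC mem handle must stay 64 bytes");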
/*! IHIP IPC MEMORY Structure */ #define IHIP_IPC_MEM_HANDLE_SIZE 32 #define IHIP_IPC_MEM_RESERVED_SIZE LP64_SWITCH(20,12) typedef struct ihipIpcMemHandle_st { char ipc_handle[IHIP_IPC_MEM_HANDLE_SIZE]; ///< ipc memory handle on ROCr size_t psize; size_t poffset; int owners_process_id; char reserved[IHIP_IPC_MEM_RESERVED_SIZE]; } ihipIpcMemHandle_t; #define IHIP_IPC_EVENT_HANDLE_SIZE 32 #define IHIP_IPC_EVENT_RESERVED_SIZE LP64_SWITCH(28,24) typedef struct ihipIpcEventHandle_st { //hsa_amd_ipc_signal_t ipc_handle; ///< ipc signal handle on ROCr //char ipc_handle[IHIP_IPC_EVENT_HANDLE_SIZE]; //char reserved[IHIP_IPC_EVENT_RESERVED_SIZE]; char shmem_name[IHIP_IPC_EVENT_HANDLE_SIZE]; } ihipIpcEventHandle_t; const char* ihipGetErrorName(hipError_t hip_error); extern std::once_flag g_ihipInitialized; #define HIP_INIT(noReturn) \ { \ bool status = true; \ std::call_once(g_ihipInitialized, hip::init, &status); \ if (!status && !noReturn) { \ HIP_RETURN(hipErrorInvalidDevice); \ } \ if (hip::tls.device_ == nullptr && g_devices.size() > 0) { \ hip::tls.device_ = g_devices[0]; \ amd::Os::setPreferredNumaNode(g_devices[0]->devices()[0]->getPreferredNumaNode()); \ } \ } #define HIP_INIT_VOID() \ { \ bool status = true; \ std::call_once(g_ihipInitialized, hip::init, &status); \ if (hip::tls.device_ == nullptr && g_devices.size() > 0) { \ hip::tls.device_ = g_devices[0]; \ amd::Os::setPreferredNumaNode(g_devices[0]->devices()[0]->getPreferredNumaNode()); \ } \ } #define HIP_API_PRINT(...) \ uint64_t startTimeUs=0; \ HIPPrintDuration(amd::LOG_INFO, amd::LOG_API, &startTimeUs, \ "%s %s ( %s ) %s", KGRN, \ __func__, ToString( __VA_ARGS__ ).c_str(), KNRM); #define HIP_ERROR_PRINT(err, ...) \ ClPrint(amd::LOG_INFO, amd::LOG_API, "%s: Returned %s : %s", \ __func__, ihipGetErrorName(err), ToString( __VA_ARGS__ ).c_str()); #define HIP_INIT_API_INTERNAL(noReturn, cid, ...) \ amd::Thread* thread = amd::Thread::current(); \ if (!VDI_CHECK_THREAD(thread)) { \ ClPrint(amd::LOG_NONE, amd::LOG_ALWAYS, "An internal error has occurred." \ " This may be due to insufficient memory."); \ if (!noReturn) { \ return hipErrorOutOfMemory; \ } \ } \ HIP_INIT(noReturn) \ HIP_API_PRINT(__VA_ARGS__) \ HIP_CB_SPAWNER_OBJECT(cid); // This macro should be called at the beginning of every HIP API. #define HIP_INIT_API(cid, ...) \ HIP_INIT_API_INTERNAL(0, cid, __VA_ARGS__) \ if (g_devices.size() == 0) { \ HIP_RETURN(hipErrorNoDevice); \ } #define HIP_INIT_API_NO_RETURN(cid, ...) \ HIP_INIT_API_INTERNAL(1, cid, __VA_ARGS__) #define HIP_RETURN_DURATION(ret, ...) \ hip::tls.last_error_ = ret; \ HIPPrintDuration(amd::LOG_INFO, amd::LOG_API, &startTimeUs, \ "%s: Returned %s : %s", \ __func__, ihipGetErrorName(hip::tls.last_error_), \ ToString( __VA_ARGS__ ).c_str()); \ return hip::tls.last_error_; #define HIP_RETURN(ret, ...) \ hip::tls.last_error_ = ret; \ HIP_ERROR_PRINT(hip::tls.last_error_, __VA_ARGS__) \ return hip::tls.last_error_; #define HIP_RETURN_ONFAIL(func) \ do { \ hipError_t herror = (func); \ if (herror != hipSuccess) { \ HIP_RETURN(herror); \ } \ } while (0); // Cannot be used in place of HIP_RETURN.
// Refrain from using for external HIP APIs #define IHIP_RETURN_ONFAIL(func) \ do { \ hipError_t herror = (func); \ if (herror != hipSuccess) { \ return herror; \ } \ } while (0); #define CHECK_STREAM_CAPTURE_SUPPORTED() \ if (hip::tls.stream_capture_mode_ == hipStreamCaptureModeThreadLocal) { \ if (hip::tls.capture_streams_.size() != 0) { \ HIP_RETURN(hipErrorStreamCaptureUnsupported); \ } \ } else if (hip::tls.stream_capture_mode_ == hipStreamCaptureModeGlobal) { \ if (hip::tls.capture_streams_.size() != 0) { \ HIP_RETURN(hipErrorStreamCaptureUnsupported); \ } \ if (g_captureStreams.size() != 0) { \ HIP_RETURN(hipErrorStreamCaptureUnsupported); \ } \ } // Sync APIs cannot be called when stream capture is active #define CHECK_STREAM_CAPTURING() \ if (!g_captureStreams.empty()) { \ return hipErrorStreamCaptureImplicit; \ } #define STREAM_CAPTURE(name, stream, ...) \ getStreamPerThread(stream); \ if (stream != nullptr && \ reinterpret_cast(stream)->GetCaptureStatus() == \ hipStreamCaptureStatusActive) { \ hipError_t status = capture##name(stream, ##__VA_ARGS__); \ return status; \ } #define PER_THREAD_DEFAULT_STREAM(stream) \ if (stream == nullptr) { \ stream = getPerThreadDefaultStream(); \ } namespace hc { class accelerator; class accelerator_view; }; struct ihipExec_t { dim3 gridDim_; dim3 blockDim_; size_t sharedMem_; hipStream_t hStream_; std::vector arguments_; }; class stream_per_thread { private: std::vector m_streams; public: stream_per_thread(); stream_per_thread(const stream_per_thread& ) = delete; void operator=(const stream_per_thread& ) = delete; ~stream_per_thread(); hipStream_t get(); }; namespace hip { class Device; class MemoryPool; class Stream : public amd::HostQueue { public: enum Priority : int { High = -1, Normal = 0, Low = 1 }; private: mutable amd::Monitor lock_; Device* device_; Priority priority_; unsigned int flags_; bool null_; const std::vector cuMask_; /// Stream capture related parameters /// Current capture status of the stream hipStreamCaptureStatus captureStatus_; /// Graph that is constructed with capture hipGraph_t pCaptureGraph_; /// Based on mode stream capture places restrictions on API calls that can be made within or /// concurrently hipStreamCaptureMode captureMode_{hipStreamCaptureModeGlobal}; bool originStream_; /// Origin stream has no parent. Parent stream for the derived captured streams with event /// dependencies hipStream_t parentStream_ = nullptr; /// Last graph node captured in the stream std::vector lastCapturedNodes_; /// dependencies removed via API hipStreamUpdateCaptureDependencies std::vector removedDependencies_; /// Derived streams/Parallel branches from the origin stream std::vector parallelCaptureStreams_; /// Capture events std::unordered_set captureEvents_; unsigned long long captureID_;
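// Added descriptive comment (not in the original source): the members above are the
// stream-capture bookkeeping for this stream. captureStatus_/captureMode_ gate which APIs
// are legal while a capture is active, pCaptureGraph_ accumulates the captured nodes,
// lastCapturedNodes_ tracks the current frontier used to wire dependency edges, and
// parentStream_/parallelCaptureStreams_ link streams that joined the capture via events.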
static inline CommandQueue::Priority convertToQueuePriority(Priority p) { return p == Priority::High ? amd::CommandQueue::Priority::High : p == Priority::Low ? amd::CommandQueue::Priority::Low : amd::CommandQueue::Priority::Normal; } public: Stream(Device* dev, Priority p = Priority::Normal, unsigned int f = 0, bool null_stream = false, const std::vector& cuMask = {}, hipStreamCaptureStatus captureStatus = hipStreamCaptureStatusNone); /// Creates the hip stream object, including AMD host queue bool Create(); /// Get device ID associated with the current stream int DeviceId() const; /// Get HIP device associated with the stream Device* GetDevice() const { return device_; } /// Get device ID associated with a stream static int DeviceId(const hipStream_t hStream); /// Returns if stream is null stream bool Null() const { return null_; } /// Returns the lock object for the current stream amd::Monitor& Lock() const { return lock_; } /// Returns the creation flags for the current stream unsigned int Flags() const { return flags_; } /// Returns the priority for the current stream Priority GetPriority() const { return priority_; } /// Returns the CU mask for the current stream const std::vector GetCUMask() const { return cuMask_; } /// Sync all streams static void SyncAllStreams(int deviceId); /// Check whether any blocking stream is running static bool StreamCaptureBlocking(); /// Destroy all streams on a given device static void destroyAllStreams(int deviceId); static void Destroy(hip::Stream* stream); /// Check Stream Capture status to make sure it is done static bool StreamCaptureOngoing(hipStream_t hStream); /// Returns capture status of the current stream hipStreamCaptureStatus GetCaptureStatus() const { return captureStatus_; } /// Returns capture mode of the current stream hipStreamCaptureMode GetCaptureMode() const { return captureMode_; } /// Returns if stream is origin stream bool IsOriginStream() const { return originStream_; } void SetOriginStream() { originStream_ = true; } /// Returns captured graph hipGraph_t GetCaptureGraph() const { return pCaptureGraph_; } /// Returns last captured graph node const std::vector& GetLastCapturedNodes() const { return lastCapturedNodes_; } /// Set last captured graph node void SetLastCapturedNode(hipGraphNode_t graphNode) { lastCapturedNodes_.clear(); lastCapturedNodes_.push_back(graphNode); } /// Returns updated dependencies removed const std::vector& GetRemovedDependencies() { return removedDependencies_; } /// Append captured node via the wait event cross stream void AddCrossCapturedNode(std::vector graphNodes, bool replace = false) { // replace dependencies as per flag hipStreamSetCaptureDependencies if (replace == true) { for (auto node : lastCapturedNodes_) { removedDependencies_.push_back(node); } lastCapturedNodes_.clear(); } for (auto node : graphNodes) { lastCapturedNodes_.push_back(node); } } /// Set graph that is being captured void SetCaptureGraph(hipGraph_t pGraph) { pCaptureGraph_ = pGraph; captureStatus_ = hipStreamCaptureStatusActive; } void SetCaptureId() { // ID is generated in Begin Capture i.e.
when capture status is active captureID_ = GenerateCaptureID(); } void SetCaptureId(unsigned long long captureId) { // ID is given from parent stream captureID_ = captureId; } /// reset capture parameters hipError_t EndCapture(); /// Set capture status void SetCaptureStatus(hipStreamCaptureStatus captureStatus) { captureStatus_ = captureStatus; } /// Set capture mode void SetCaptureMode(hipStreamCaptureMode captureMode) { captureMode_ = captureMode; } /// Set parent stream void SetParentStream(hipStream_t parentStream) { parentStream_ = parentStream; } /// Get parent stream hipStream_t GetParentStream() const { return parentStream_; } /// Generate ID for stream capture unique over the lifetime of the process static unsigned long long GenerateCaptureID() { static std::atomic uid(0); return ++uid; } /// Get Capture ID unsigned long long GetCaptureID() { return captureID_; } void SetCaptureEvent(hipEvent_t e) { amd::ScopedLock lock(lock_); captureEvents_.emplace(e); } bool IsEventCaptured(hipEvent_t e) { amd::ScopedLock lock(lock_); auto it = captureEvents_.find(e); if (it != captureEvents_.end()) { return true; } return false; } void EraseCaptureEvent(hipEvent_t e) { amd::ScopedLock lock(lock_); auto it = captureEvents_.find(e); if (it != captureEvents_.end()) { captureEvents_.erase(it); } } void SetParallelCaptureStream(hipStream_t s) { auto it = std::find(parallelCaptureStreams_.begin(), parallelCaptureStreams_.end(), s); if (it == parallelCaptureStreams_.end()) { parallelCaptureStreams_.push_back(s); } } void EraseParallelCaptureStream(hipStream_t s) { auto it = std::find(parallelCaptureStreams_.begin(), parallelCaptureStreams_.end(), s); if (it != parallelCaptureStreams_.end()) { parallelCaptureStreams_.erase(it); } } static bool existsActiveStreamForDevice(hip::Device* device); /// The stream should be destroyed via release() rather than delete private: ~Stream() {}; }; /// HIP Device class class Device { amd::Monitor lock_{"Device lock", true}; /// ROCclr context amd::Context* context_; /// Device's ID /// Store it here so we don't have to loop through the device list every time int deviceId_; /// ROCclr host queue for default streams Stream* null_stream_ = nullptr; /// Store device flags unsigned int flags_; /// Maintain list of user enabled peers std::list userEnabledPeers; /// True if this device is active bool isActive_; MemoryPool* default_mem_pool_; //!< Default memory pool for this device MemoryPool* current_mem_pool_; MemoryPool* graph_mem_pool_; //!< Memory pool, associated with graphs for this device std::set mem_pools_; public: Device(amd::Context* ctx, int devId): context_(ctx), deviceId_(devId), flags_(hipDeviceScheduleSpin), isActive_(false), default_mem_pool_(nullptr), current_mem_pool_(nullptr), graph_mem_pool_(nullptr) { assert(ctx != nullptr); } ~Device(); bool Create(); amd::Context* asContext() const { return context_; } int deviceId() const { return deviceId_; } void retain() const { context_->retain(); } void release() const { context_->release(); } const std::vector& devices() const { return context_->devices(); } hipError_t EnablePeerAccess(int peerDeviceId){ amd::ScopedLock lock(lock_); bool found = (std::find(userEnabledPeers.begin(), userEnabledPeers.end(), peerDeviceId) != userEnabledPeers.end()); if (found) { return hipErrorPeerAccessAlreadyEnabled; } userEnabledPeers.push_back(peerDeviceId); return hipSuccess; } hipError_t DisablePeerAccess(int peerDeviceId) { amd::ScopedLock lock(lock_); bool found = (std::find(userEnabledPeers.begin(), 
userEnabledPeers.end(), peerDeviceId) != userEnabledPeers.end()); if (found) { userEnabledPeers.remove(peerDeviceId); return hipSuccess; } else { return hipErrorPeerAccessNotEnabled; } } unsigned int getFlags() const { return flags_; } void setFlags(unsigned int flags) { flags_ = flags; } void Reset(); hip::Stream* NullStream(); Stream* GetNullStream(); bool GetActiveStatus() { amd::ScopedLock lock(lock_); if (isActive_) return true; if (Stream::existsActiveStreamForDevice(this)) { isActive_ = true; return true; } return false; } /// Set the current memory pool on the device void SetCurrentMemoryPool(MemoryPool* pool = nullptr) { current_mem_pool_ = (pool == nullptr) ? default_mem_pool_ : pool; } /// Get the current memory pool on the device MemoryPool* GetCurrentMemoryPool() const { return current_mem_pool_; } /// Get the default memory pool on the device MemoryPool* GetDefaultMemoryPool() const { return default_mem_pool_; } /// Get the graph memory pool on the device MemoryPool* GetGraphMemoryPool() const { return graph_mem_pool_; } /// Add memory pool to the device void AddMemoryPool(MemoryPool* pool); /// Remove memory pool from the device void RemoveMemoryPool(MemoryPool* pool); /// Free memory from the device bool FreeMemory(amd::Memory* memory, Stream* stream); /// Release freed memory from all pools on the current device void ReleaseFreedMemory(Stream* stream); /// Removes a destroyed stream from the safe list of memory pools void RemoveStreamFromPools(Stream* stream); }; /// Thread Local Storage Variables Aggregator Class class TlsAggregator { public: Device* device_; std::stack ctxt_stack_; hipError_t last_error_; std::vector capture_streams_; hipStreamCaptureMode stream_capture_mode_; std::stack exec_stack_; stream_per_thread stream_per_thread_obj_; TlsAggregator(): device_(nullptr), last_error_(hipSuccess), stream_capture_mode_(hipStreamCaptureModeGlobal) { } ~TlsAggregator() { } }; extern thread_local TlsAggregator tls; /// Device representing the host - for pinned memory extern amd::Context* host_context; extern void init(bool* status); extern Device* getCurrentDevice(); extern void setCurrentDevice(unsigned int index); /// Get ROCclr queue associated with hipStream /// Note: This follows the CUDA spec to sync with default streams /// and Blocking streams extern hip::Stream* getStream(hipStream_t stream); /// Get default stream associated with the ROCclr context extern hip::Stream* getNullStream(amd::Context&); /// Get default stream of the thread extern hip::Stream* getNullStream(); /// Get device ID associated with the ROCclr context int getDeviceID(amd::Context& ctx); /// Check if stream is valid extern bool isValid(hipStream_t& stream); extern bool isValid(hipEvent_t event); extern amd::Monitor hipArraySetLock; extern std::unordered_set hipArraySet; }; // namespace hip extern void WaitThenDecrementSignal(hipStream_t stream, hipError_t status, void* user_data); /// Wait all active streams on the blocking queue. 
The method enqueues a wait command and /// doesn't stall the current thread extern void iHipWaitActiveStreams(hip::Stream* blocking_stream, bool wait_null_stream = false); extern std::vector g_devices; extern hipError_t ihipDeviceGetCount(int* count); extern int ihipGetDevice(); extern hipError_t ihipMalloc(void** ptr, size_t sizeBytes, unsigned int flags); extern amd::Memory* getMemoryObject(const void* ptr, size_t& offset, size_t size = 0); extern amd::Memory* getMemoryObjectWithOffset(const void* ptr, const size_t size = 0); extern void getStreamPerThread(hipStream_t& stream); extern hipStream_t getPerThreadDefaultStream(); extern hipError_t ihipUnbindTexture(textureReference* texRef); extern hipError_t ihipHostRegister(void* hostPtr, size_t sizeBytes, unsigned int flags); extern hipError_t ihipHostUnregister(void* hostPtr); extern hipError_t ihipGetDeviceProperties(hipDeviceProp_t* props, hipDevice_t device); extern hipError_t ihipDeviceGet(hipDevice_t* device, int deviceId); extern hipError_t ihipStreamOperation(hipStream_t stream, cl_command_type cmdType, void* ptr, uint64_t value, uint64_t mask, unsigned int flags, size_t sizeBytes); hipError_t ihipMemcpy(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hip::Stream& stream, bool isHostAsync = false, bool isGPUAsync = true); constexpr bool kOptionChangeable = true; constexpr bool kNewDevProg = false; constexpr bool kMarkerDisableFlush = true; //!< Avoids command batch flush in ROCclr extern std::vector g_captureStreams; extern amd::Monitor g_captureStreamsLock; extern amd::Monitor g_streamSetLock; extern std::unordered_set g_allCapturingStreams; #endif // HIP_SRC_HIP_INTERNAL_H
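The macros and helpers declared above follow a fixed pattern in the .cpp files that implement the public API. A hedged sketch of that pattern (hipExampleQuery is a hypothetical entry point invented for illustration, not a real HIP API):

hipError_t hipExampleQuery(const void* ptr, size_t* size) {
  HIP_INIT_API(hipExampleQuery, ptr, size);         // runtime init, tracing, profiler callback
  if (ptr == nullptr || size == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);               // records hip::tls.last_error_ and logs
  }
  size_t offset = 0;
  amd::Memory* mem = getMemoryObject(ptr, offset);  // helper declared in hip_internal.hpp
  if (mem == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *size = mem->getSize() - offset;                  // remaining bytes from ptr to buffer end
  HIP_RETURN(hipSuccess);
}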
arena_mem_obj is null if HMM is disabled. memObj = (hip::getCurrentDevice()->asContext()->svmDevices()[0])->GetArenaMemObj( ptr, offset, size); } return memObj; } // ================================================================================================ amd::Memory* getMemoryObjectWithOffset(const void* ptr, const size_t size) { size_t offset = 0; amd::Memory* memObj = getMemoryObject(ptr, offset); if (memObj != nullptr) { if (size > (memObj->getSize() - offset)) { return nullptr; } memObj = new (memObj->getContext()) amd::Buffer(*memObj, memObj->getMemFlags(), offset, size); if (memObj == nullptr) { return nullptr; } if (!memObj->create(nullptr)) { memObj->release(); return nullptr; } } return memObj; } // ================================================================================================ hipError_t ihipFree(void *ptr) { if (ptr == nullptr) { return hipSuccess; } size_t offset = 0; amd::Memory* memory_object = getMemoryObject(ptr, offset); if (memory_object != nullptr) { // Wait on the device, associated with the current memory object during allocation auto device_id = memory_object->getUserData().deviceId; hip::Stream::SyncAllStreams(device_id); // Find out if memory belongs to any memory pool if (!g_devices[device_id]->FreeMemory(memory_object, nullptr)) { // External mem is not svm. if (memory_object->isInterop()) { amd::MemObjMap::RemoveMemObj(ptr); memory_object->release(); } else { amd::SvmBuffer::free(memory_object->getContext(), ptr); } } return hipSuccess; } return hipErrorInvalidValue; } // ================================================================================================ hipError_t hipImportExternalMemory( hipExternalMemory_t* extMem_out, const hipExternalMemoryHandleDesc* memHandleDesc) { HIP_INIT_API(hipImportExternalMemory, extMem_out, memHandleDesc); if (extMem_out == nullptr || memHandleDesc == nullptr || (memHandleDesc->flags != 0 && memHandleDesc->flags != hipExternalMemoryDedicated) || memHandleDesc->size == 0) { HIP_RETURN(hipErrorInvalidValue); } if ((memHandleDesc->type < hipExternalMemoryHandleTypeOpaqueFd) || (memHandleDesc->type > hipExternalMemoryHandleTypeD3D11ResourceKmt)) { HIP_RETURN(hipErrorInvalidValue); } amd::Context& amdContext = *hip::getCurrentDevice()->asContext(); #ifdef _WIN32 auto ext_buffer = new (amdContext) amd::ExternalBuffer(amdContext, memHandleDesc->size, memHandleDesc->handle.win32.handle, static_cast<amd::ExternalMemory::HandleType>(memHandleDesc->type)); #else auto ext_buffer = new (amdContext) amd::ExternalBuffer(amdContext, memHandleDesc->size, memHandleDesc->handle.fd, static_cast<amd::ExternalMemory::HandleType>(memHandleDesc->type)); #endif if (!ext_buffer) { HIP_RETURN(hipErrorOutOfMemory); } if (!ext_buffer->create()) { ext_buffer->release(); HIP_RETURN(hipErrorOutOfMemory); } *extMem_out = ext_buffer; HIP_RETURN(hipSuccess); }
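// Illustrative usage sketch of the import path above (assumptions: exportedFd and bufferSize are placeholders obtained from an exporting API such as Vulkan; this is not code from this file): hipExternalMemoryHandleDesc memDesc = {}; memDesc.type = hipExternalMemoryHandleTypeOpaqueFd; memDesc.handle.fd = exportedFd; memDesc.size = bufferSize; hipExternalMemory_t extMem = nullptr; hipImportExternalMemory(&extMem, &memDesc); hipExternalMemoryBufferDesc bufDesc = {}; bufDesc.offset = 0; bufDesc.size = bufferSize; void* devPtr = nullptr; hipExternalMemoryGetMappedBuffer(&devPtr, extMem, &bufDesc); /* use devPtr in kernels, then */ hipDestroyExternalMemory(extMem);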
// ================================================================================================ hipError_t hipExternalMemoryGetMappedBuffer( void **devPtr, hipExternalMemory_t extMem, const hipExternalMemoryBufferDesc *bufferDesc) { HIP_INIT_API(hipExternalMemoryGetMappedBuffer, devPtr, extMem, bufferDesc); if (devPtr == nullptr || extMem == nullptr || bufferDesc == nullptr || bufferDesc->flags != 0) { HIP_RETURN(hipErrorInvalidValue); } auto buf = reinterpret_cast<amd::ExternalBuffer*>(extMem); const device::Memory* devMem = buf->getDeviceMemory(*hip::getCurrentDevice()->devices()[0]); if (devMem == nullptr || ((bufferDesc->offset + bufferDesc->size) > devMem->size())) { HIP_RETURN(hipErrorInvalidValue); } *devPtr = reinterpret_cast<void*>(devMem->virtualAddress() + bufferDesc->offset); amd::MemObjMap::AddMemObj(*devPtr, buf); buf->retain(); HIP_RETURN(hipSuccess); } // ================================================================================================ hipError_t hipDestroyExternalMemory(hipExternalMemory_t extMem) { HIP_INIT_API(hipDestroyExternalMemory, extMem); if (extMem == nullptr) { HIP_RETURN(hipErrorInvalidValue); } reinterpret_cast<amd::ExternalBuffer*>(extMem)->release(); HIP_RETURN(hipSuccess); } // ================================================================================================ hipError_t hipImportExternalSemaphore(hipExternalSemaphore_t* extSem_out, const hipExternalSemaphoreHandleDesc* semHandleDesc) { HIP_INIT_API(hipImportExternalSemaphore, extSem_out, semHandleDesc); if (extSem_out == nullptr || semHandleDesc == nullptr) { HIP_RETURN(hipErrorInvalidValue); } if ((semHandleDesc->type < hipExternalSemaphoreHandleTypeOpaqueFd) || (semHandleDesc->type > hipExternalSemaphoreHandleTypeD3D12Fence)) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; #ifdef _WIN32 if (device->importExtSemaphore(extSem_out, semHandleDesc->handle.win32.handle, static_cast<amd::ExternalSemaphoreHandleType>(semHandleDesc->type))) { #else if (device->importExtSemaphore(extSem_out, semHandleDesc->handle.fd, static_cast<amd::ExternalSemaphoreHandleType>(semHandleDesc->type))) { #endif HIP_RETURN(hipSuccess); } HIP_RETURN(hipErrorNotSupported); } hipError_t hipSignalExternalSemaphoresAsync( const hipExternalSemaphore_t* extSemArray, const hipExternalSemaphoreSignalParams* paramsArray, unsigned int numExtSems, hipStream_t stream) { HIP_INIT_API(hipSignalExternalSemaphoresAsync, extSemArray, paramsArray, numExtSems, stream); if (extSemArray == nullptr || paramsArray == nullptr || !hip::isValid(stream)) { HIP_RETURN(hipErrorInvalidValue); } hip::Stream* hip_stream = hip::getStream(stream); if (hip_stream == nullptr) { HIP_RETURN(hipErrorInvalidValue); } for (unsigned int i = 0; i < numExtSems; i++) { if (extSemArray[i] != nullptr) { amd::ExternalSemaphoreCmd* command = new amd::ExternalSemaphoreCmd(*hip_stream, extSemArray[i], paramsArray[i].params.fence.value, amd::ExternalSemaphoreCmd::COMMAND_SIGNAL_EXTSEMAPHORE); if (command == nullptr) { return hipErrorOutOfMemory; } command->enqueue(); command->release(); } else { HIP_RETURN(hipErrorInvalidValue); } } HIP_RETURN(hipSuccess); } hipError_t hipWaitExternalSemaphoresAsync(const hipExternalSemaphore_t* extSemArray, const hipExternalSemaphoreWaitParams* paramsArray, unsigned int numExtSems, hipStream_t stream) { HIP_INIT_API(hipWaitExternalSemaphoresAsync, extSemArray, paramsArray, numExtSems, stream); if (extSemArray == nullptr || paramsArray == nullptr || !hip::isValid(stream)) { HIP_RETURN(hipErrorInvalidValue); } hip::Stream* hip_stream = hip::getStream(stream); if (hip_stream == nullptr) { HIP_RETURN(hipErrorInvalidValue); } for (unsigned int i = 0; i < numExtSems; i++) { if (extSemArray[i] != nullptr) { amd::ExternalSemaphoreCmd* command = new amd::ExternalSemaphoreCmd(*hip_stream, extSemArray[i], paramsArray[i].params.fence.value, amd::ExternalSemaphoreCmd::COMMAND_WAIT_EXTSEMAPHORE); if (command == nullptr) { return hipErrorOutOfMemory; } command->enqueue(); command->release(); } else { HIP_RETURN(hipErrorInvalidValue); } } HIP_RETURN(hipSuccess); } hipError_t hipDestroyExternalSemaphore(hipExternalSemaphore_t extSem) { HIP_INIT_API(hipDestroyExternalSemaphore, extSem); if (extSem == nullptr) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* device =
hip::getCurrentDevice()->devices()[0]; device->DestroyExtSemaphore(extSem); HIP_RETURN(hipSuccess); } // ================================================================================================ hipError_t ihipMalloc(void** ptr, size_t sizeBytes, unsigned int flags) { if (ptr == nullptr) { return hipErrorInvalidValue; } if (sizeBytes == 0) { *ptr = nullptr; return hipSuccess; } bool useHostDevice = (flags & CL_MEM_SVM_FINE_GRAIN_BUFFER) != 0; amd::Context* curDevContext = hip::getCurrentDevice()->asContext(); amd::Context* amdContext = useHostDevice ? hip::host_context : curDevContext; if (amdContext == nullptr) { return hipErrorOutOfMemory; } const auto& dev_info = amdContext->devices()[0]->info(); if ((!useHostDevice && (dev_info.maxMemAllocSize_ < sizeBytes)) || (useHostDevice && (dev_info.maxPhysicalMemAllocSize_ < sizeBytes))) { return hipErrorOutOfMemory; } *ptr = amd::SvmBuffer::malloc(*amdContext, flags, sizeBytes, dev_info.memBaseAddrAlign_, useHostDevice ? curDevContext->svmDevices()[0] : nullptr); if (*ptr == nullptr) { if (!useHostDevice) { size_t free = 0, total =0; hipError_t err = hipMemGetInfo(&free, &total); if (err == hipSuccess) { LogPrintfError("Allocation failed : Device memory : required :%zu | free :%zu | total :%zu \n", sizeBytes, free, total); } } else { LogPrintfError("Allocation failed : Pinned Memory, size :%zu \n", sizeBytes); } return hipErrorOutOfMemory; } size_t offset = 0; //this is ignored amd::Memory* memObj = getMemoryObject(*ptr, offset); //saves the current device id so that it can be accessed later memObj->getUserData().deviceId = hip::getCurrentDevice()->deviceId(); return hipSuccess; } bool IsHtoHMemcpyValid(void* dst, const void* src, hipMemcpyKind kind) { size_t sOffset = 0; amd::Memory* srcMemory = getMemoryObject(src, sOffset); size_t dOffset = 0; amd::Memory* dstMemory = getMemoryObject(dst, dOffset); if (src && dst && srcMemory == nullptr && dstMemory == nullptr) { if (kind != hipMemcpyHostToHost && kind != hipMemcpyDefault) { return false; } } return true; } hipError_t ihipMemcpy_validate(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind) { if (dst == nullptr || src == nullptr) { return hipErrorInvalidValue; } size_t sOffset = 0; amd::Memory* srcMemory = getMemoryObject(src, sOffset); size_t dOffset = 0; amd::Memory* dstMemory = getMemoryObject(dst, dOffset); // Return error if sizeBytes passed to memcpy is more than the actual size allocated if ((dstMemory && sizeBytes > (dstMemory->getSize() - dOffset)) || (srcMemory && sizeBytes > (srcMemory->getSize() - sOffset))) { return hipErrorInvalidValue; } //If src and dst ptr are null then kind must be either h2h or def. 
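// Illustrative note (behavioral example, not code from this file): with two unregistered host pointers, only hipMemcpyHostToHost or hipMemcpyDefault passes this validation, e.g. hipMemcpy(dstHostBuf, srcHostBuf, 64, hipMemcpyHostToHost) succeeds, while hipMemcpy(dstHostBuf, srcHostBuf, 64, hipMemcpyDeviceToHost) returns hipErrorInvalidValue.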
if (!IsHtoHMemcpyValid(dst, src, kind)) { return hipErrorInvalidValue; } return hipSuccess; } hipError_t ihipMemcpyCommand(amd::Command*& command, void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hip::Stream& stream, bool isAsync) { amd::Command::EventWaitList waitList; size_t sOffset = 0; amd::Memory* srcMemory = getMemoryObject(src, sOffset); size_t dOffset = 0; amd::Memory* dstMemory = getMemoryObject(dst, dOffset); amd::Device* queueDevice = &stream.device(); amd::CopyMetadata copyMetadata(isAsync, amd::CopyMetadata::CopyEnginePreference::SDMA); if ((srcMemory == nullptr) && (dstMemory != nullptr)) { hip::Stream* pStream = &stream; if (queueDevice != dstMemory->getContext().devices()[0]) { pStream = hip::getNullStream(dstMemory->getContext()); amd::Command* cmd = stream.getLastQueuedCommand(true); if (cmd != nullptr) { waitList.push_back(cmd); } } command = new amd::WriteMemoryCommand(*pStream, CL_COMMAND_WRITE_BUFFER, waitList, *dstMemory->asBuffer(), dOffset, sizeBytes, src, 0, 0, copyMetadata); } else if ((srcMemory != nullptr) && (dstMemory == nullptr)) { hip::Stream* pStream = &stream; if (queueDevice != srcMemory->getContext().devices()[0]) { pStream = hip::getNullStream(srcMemory->getContext()); amd::Command* cmd = stream.getLastQueuedCommand(true); if (cmd != nullptr) { waitList.push_back(cmd); } } command = new amd::ReadMemoryCommand(*pStream, CL_COMMAND_READ_BUFFER, waitList, *srcMemory->asBuffer(), sOffset, sizeBytes, dst, 0, 0, copyMetadata); } else if ((srcMemory != nullptr) && (dstMemory != nullptr)) { // Check if the two memory objects live on different devices and neither of them is a // host allocation (each belongs to a single-device context). In that case it's a P2P // transfer, because the app has requested access to another GPU if ((srcMemory->getContext().devices()[0] != dstMemory->getContext().devices()[0]) && ((srcMemory->getContext().devices().size() == 1) && (dstMemory->getContext().devices().size() == 1))) { command = new amd::CopyMemoryP2PCommand(stream, CL_COMMAND_COPY_BUFFER, waitList, *srcMemory->asBuffer(), *dstMemory->asBuffer(), sOffset, dOffset, sizeBytes); if (command == nullptr) { return hipErrorOutOfMemory; }
// Make sure runtime has valid memory for the command execution. P2P access // requires page table mapping on the current device to another GPU memory if (!static_cast<amd::CopyMemoryP2PCommand*>(command)->validateMemory()) { delete command; return hipErrorInvalidValue; } } else { hip::Stream* pStream = &stream; if ((srcMemory->getContext().devices()[0] == dstMemory->getContext().devices()[0]) && (queueDevice != srcMemory->getContext().devices()[0])) { copyMetadata.copyEnginePreference_ = amd::CopyMetadata::CopyEnginePreference::NONE; pStream = hip::getNullStream(srcMemory->getContext()); amd::Command* cmd = stream.getLastQueuedCommand(true); if (cmd != nullptr) { waitList.push_back(cmd); } } else if (srcMemory->getContext().devices()[0] != dstMemory->getContext().devices()[0]) { // Scenarios such as DtoH where dst is pinned memory if ((queueDevice != srcMemory->getContext().devices()[0]) && (dstMemory->getContext().devices().size() != 1)) { pStream = hip::getNullStream(srcMemory->getContext()); amd::Command* cmd = stream.getLastQueuedCommand(true); if (cmd != nullptr) { waitList.push_back(cmd); } // Scenarios such as HtoD where src is pinned memory } else if ((queueDevice != dstMemory->getContext().devices()[0]) && (srcMemory->getContext().devices().size() != 1)) { pStream = hip::getNullStream(dstMemory->getContext()); amd::Command* cmd = stream.getLastQueuedCommand(true); if (cmd != nullptr) { waitList.push_back(cmd); } } } command = new amd::CopyMemoryCommand(*pStream, CL_COMMAND_COPY_BUFFER, waitList, *srcMemory->asBuffer(), *dstMemory->asBuffer(), sOffset, dOffset, sizeBytes, copyMetadata); } } if (command == nullptr) { return hipErrorOutOfMemory; } if (waitList.size() > 0) { waitList[0]->release(); } return hipSuccess; } bool IsHtoHMemcpy(void* dst, const void* src, hipMemcpyKind kind) { size_t sOffset = 0; amd::Memory* srcMemory = getMemoryObject(src, sOffset); size_t dOffset = 0; amd::Memory* dstMemory = getMemoryObject(dst, dOffset); if (srcMemory == nullptr && dstMemory == nullptr) { if (kind == hipMemcpyHostToHost || kind == hipMemcpyDefault) { return true; } } return false; } void ihipHtoHMemcpy(void* dst, const void* src, size_t sizeBytes, hip::Stream& stream) { stream.finish(); memcpy(dst, src, sizeBytes); }
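// Illustrative summary of the synchronization behavior implemented below (a hedged reading of the code, not a spec): pure host-to-host copies finish the stream and then memcpy on the host; copies where exactly one side is unregistered host memory force isHostAsync = false, so the calling thread waits for completion; copies where both sides are device allocations on the same device may remain host-asynchronous.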
// ================================================================================================ hipError_t ihipMemcpy(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hip::Stream& stream, bool isHostAsync, bool isGPUAsync) { hipError_t status; if (sizeBytes == 0) { // Skip if nothing needs writing. return hipSuccess; } status = ihipMemcpy_validate(dst, src, sizeBytes, kind); if (status != hipSuccess) { return status; } if (src == dst && kind == hipMemcpyDefault) { return hipSuccess; } size_t sOffset = 0; amd::Memory* srcMemory = getMemoryObject(src, sOffset); size_t dOffset = 0; amd::Memory* dstMemory = getMemoryObject(dst, dOffset); if (srcMemory == nullptr && dstMemory == nullptr) { ihipHtoHMemcpy(dst, src, sizeBytes, stream); return hipSuccess; } else if (((srcMemory == nullptr) && (dstMemory != nullptr)) || ((srcMemory != nullptr) && (dstMemory == nullptr))) { isHostAsync = false; } else if (srcMemory->getContext().devices()[0] == dstMemory->getContext().devices()[0]) { hipMemoryType srcMemoryType = ((CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR) & srcMemory->getMemFlags())? hipMemoryTypeHost : hipMemoryTypeDevice; hipMemoryType dstMemoryType = ((CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR) & dstMemory->getMemFlags())? hipMemoryTypeHost : hipMemoryTypeDevice; // Device to Device copies do not need host-side synchronization. if ((srcMemoryType == hipMemoryTypeDevice) && (dstMemoryType == hipMemoryTypeDevice)) { isHostAsync = true; } } amd::Command* command = nullptr; status = ihipMemcpyCommand(command, dst, src, sizeBytes, kind, stream, isHostAsync); if (status != hipSuccess) { return status; } command->enqueue(); if (!isHostAsync) { command->awaitCompletion(); } else if (!isGPUAsync) { hip::Stream* pStream = hip::getNullStream(dstMemory->getContext()); amd::Command::EventWaitList waitList; waitList.push_back(command); amd::Command* dependentMarker = new amd::Marker(*pStream, false, waitList); if (dependentMarker != nullptr) { dependentMarker->enqueue(); dependentMarker->release(); } } else { amd::HostQueue* newQueue = command->queue(); if (newQueue != &stream) { amd::Command::EventWaitList waitList; amd::Command* cmd = newQueue->getLastQueuedCommand(true); if (cmd != nullptr) { waitList.push_back(cmd); amd::Command* dependentMarker = new amd::Marker(stream, true, waitList); if (dependentMarker != nullptr) { dependentMarker->enqueue(); dependentMarker->release(); } cmd->release(); } } } command->release(); return hipSuccess; } // ================================================================================================ hipError_t hipExtMallocWithFlags(void** ptr, size_t sizeBytes, unsigned int flags) { HIP_INIT_API(hipExtMallocWithFlags, ptr, sizeBytes, flags); unsigned int ihipFlags = 0; if (flags == hipDeviceMallocDefault) { ihipFlags = 0; } else if (flags == hipDeviceMallocFinegrained) { ihipFlags = CL_MEM_SVM_ATOMICS; } else if (flags == hipDeviceMallocUncached) { ihipFlags = CL_MEM_SVM_ATOMICS | ROCCLR_MEM_HSA_UNCACHED; } else if (flags == hipMallocSignalMemory) { ihipFlags = CL_MEM_SVM_ATOMICS | CL_MEM_SVM_FINE_GRAIN_BUFFER | ROCCLR_MEM_HSA_SIGNAL_MEMORY; if (sizeBytes != 8) { HIP_RETURN(hipErrorInvalidValue); } } else { HIP_RETURN(hipErrorInvalidValue); } hipError_t status = ihipMalloc(ptr, sizeBytes, ihipFlags); if ((status == hipSuccess) && ((*ptr) != nullptr)) { size_t offset = 0; // This is ignored amd::Memory* svmMem = getMemoryObject(*ptr, offset); // Save the HIP memory flags so that they can be accessed later svmMem->getUserData().flags = flags; } HIP_RETURN(status, (ptr != nullptr)? *ptr : nullptr); } hipError_t hipMalloc(void** ptr, size_t sizeBytes) { HIP_INIT_API(hipMalloc, ptr, sizeBytes); CHECK_STREAM_CAPTURE_SUPPORTED(); HIP_RETURN_DURATION(ihipMalloc(ptr, sizeBytes, 0), (ptr != nullptr)?
*ptr : nullptr); } hipError_t hipHostMalloc(void** ptr, size_t sizeBytes, unsigned int flags) { HIP_INIT_API(hipHostMalloc, ptr, sizeBytes, flags); CHECK_STREAM_CAPTURE_SUPPORTED(); if (ptr == nullptr) { HIP_RETURN(hipErrorInvalidValue); } *ptr = nullptr; const unsigned int coherentFlags = hipHostMallocCoherent | hipHostMallocNonCoherent; // can't have both Coherent and NonCoherent flags set at the same time if ((flags & coherentFlags) == coherentFlags) { LogPrintfError( "Cannot have both coherent and non-coherent flags " "at the same time, flags: %u coherent flags: %u \n", flags, coherentFlags); HIP_RETURN(hipErrorInvalidValue); } unsigned int ihipFlags = CL_MEM_SVM_FINE_GRAIN_BUFFER; if (flags == 0 || flags & (hipHostMallocCoherent | hipHostMallocMapped | hipHostMallocNumaUser) || (!(flags & hipHostMallocNonCoherent) && HIP_HOST_COHERENT)) { ihipFlags |= CL_MEM_SVM_ATOMICS; } if (flags & hipHostMallocNumaUser) { ihipFlags |= CL_MEM_FOLLOW_USER_NUMA_POLICY; } if (flags & hipHostMallocNonCoherent) { ihipFlags &= ~CL_MEM_SVM_ATOMICS; } hipError_t status = ihipMalloc(ptr, sizeBytes, ihipFlags); if ((status == hipSuccess) && ((*ptr) != nullptr)) { size_t offset = 0; // This is ignored amd::Memory* svmMem = getMemoryObject(*ptr, offset); // Save the HIP memory flags so that they can be accessed later svmMem->getUserData().flags = flags; } HIP_RETURN_DURATION(status, *ptr); } hipError_t hipFree(void* ptr) { HIP_INIT_API(hipFree, ptr); CHECK_STREAM_CAPTURE_SUPPORTED(); HIP_RETURN(ihipFree(ptr)); } hipError_t hipMemcpy_common(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream = nullptr) { CHECK_STREAM_CAPTURING(); hip::Stream* hip_stream = nullptr; if (stream != nullptr) { hip_stream = hip::getStream(stream); } else { hip_stream = hip::getNullStream(); } if (hip_stream == nullptr) { return hipErrorInvalidValue; } return ihipMemcpy(dst, src, sizeBytes, kind, *hip_stream); } hipError_t hipMemcpy(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind) { HIP_INIT_API(hipMemcpy, dst, src, sizeBytes, kind); HIP_RETURN_DURATION(hipMemcpy_common(dst, src, sizeBytes, kind)); } hipError_t hipMemcpy_spt(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind) { HIP_INIT_API(hipMemcpy, dst, src, sizeBytes, kind); HIP_RETURN_DURATION(hipMemcpy_common(dst, src, sizeBytes, kind, getPerThreadDefaultStream())); } hipError_t hipMemcpyWithStream(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpyWithStream, dst, src, sizeBytes, kind, stream); STREAM_CAPTURE(hipMemcpyAsync, stream, dst, src, sizeBytes, kind); if (!hip::isValid(stream)) { HIP_RETURN(hipErrorContextIsDestroyed); } hip::Stream* hip_stream = hip::getStream(stream); if (hip_stream == nullptr) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN_DURATION(ihipMemcpy(dst, src, sizeBytes, kind, *hip_stream, false)); } hipError_t hipMemPtrGetInfo(void *ptr, size_t *size) { HIP_INIT_API(hipMemPtrGetInfo, ptr, size); size_t offset = 0; amd::Memory* svmMem = getMemoryObject(ptr, offset); if (svmMem == nullptr) { HIP_RETURN(hipErrorInvalidValue); } *size = svmMem->getSize(); HIP_RETURN(hipSuccess); } hipError_t hipHostFree(void* ptr) { HIP_INIT_API(hipHostFree, ptr); CHECK_STREAM_CAPTURE_SUPPORTED(); size_t offset = 0; amd::Memory* memory_object = getMemoryObject(ptr, offset); if (memory_object != nullptr) { if (memory_object->getSvmPtr() == nullptr) { return hipErrorInvalidValue; } } HIP_RETURN(ihipFree(ptr)); } hipError_t ihipArrayDestroy(hipArray* array) 
{ if (array == nullptr) { return hipErrorInvalidValue; } { amd::ScopedLock lock(hip::hipArraySetLock); if (hip::hipArraySet.find(array) == hip::hipArraySet.end()) { return hipErrorContextIsDestroyed; } else { hip::hipArraySet.erase(array); } } cl_mem memObj = reinterpret_cast<cl_mem>(array->data); if (is_valid(memObj) == false) { return hipErrorInvalidValue; } auto image = as_amd(memObj); // Wait on the device, associated with the current memory object during allocation hip::Stream::SyncAllStreams(image->getUserData().deviceId); image->release(); delete array; return hipSuccess; } hipError_t hipFreeArray(hipArray* array) { HIP_INIT_API(hipFreeArray, array); CHECK_STREAM_CAPTURE_SUPPORTED(); HIP_RETURN(ihipArrayDestroy(array)); } hipError_t hipMemGetAddressRange(hipDeviceptr_t* pbase, size_t* psize, hipDeviceptr_t dptr) { HIP_INIT_API(hipMemGetAddressRange, pbase, psize, dptr); // Since we are using SVM buffer DevicePtr and HostPtr is the same void* ptr = dptr; size_t offset = 0; amd::Memory* svmMem = getMemoryObject(ptr, offset); if (svmMem == nullptr) { HIP_RETURN(hipErrorNotFound); } *pbase = svmMem->getSvmPtr(); *psize = svmMem->getSize(); HIP_RETURN(hipSuccess); } hipError_t hipMemGetInfo(size_t* free, size_t* total) { HIP_INIT_API(hipMemGetInfo, free, total); if (free == nullptr && total == nullptr) { HIP_RETURN(hipSuccess); } size_t freeMemory[2]; amd::Device* device = hip::getCurrentDevice()->devices()[0]; if (device == nullptr) { HIP_RETURN(hipErrorInvalidDevice); } if (!device->globalFreeMemory(freeMemory)) { HIP_RETURN(hipErrorInvalidValue); } if (free != nullptr) { *free = freeMemory[0] * Ki; } if (total != nullptr) { *total = device->info().globalMemSize_; } HIP_RETURN(hipSuccess); } hipError_t ihipMallocPitch(void** ptr, size_t* pitch, size_t width, size_t height, size_t depth) { amd::Device* device = hip::getCurrentDevice()->devices()[0]; if ((ptr == nullptr) || (pitch == nullptr)) { return hipErrorInvalidValue; } if ((width == 0) || (height == 0) || (depth == 0)) { *ptr = nullptr; return hipSuccess; } if (device && !device->info().imageSupport_) { LogPrintfError("Image is not supported on device %p \n", device); return hipErrorInvalidValue; } // avoid size_t overflow for pitch calculation if (width > (std::numeric_limits<size_t>::max() - device->info().imagePitchAlignment_)) { return hipErrorInvalidValue; } *pitch = amd::alignUp(width, device->info().imagePitchAlignment_); size_t sizeBytes = *pitch * height * depth; if (device->info().maxMemAllocSize_ < sizeBytes) { return hipErrorOutOfMemory; } *ptr = amd::SvmBuffer::malloc(*hip::getCurrentDevice()->asContext(), 0, sizeBytes, device->info().memBaseAddrAlign_); if (*ptr == nullptr) { return hipErrorOutOfMemory; } size_t offset = 0; // this is ignored amd::Memory* memObj = getMemoryObject(*ptr, offset); memObj->getUserData().pitch_ = *pitch; memObj->getUserData().width_ = width; memObj->getUserData().height_ = height; memObj->getUserData().depth_ = depth; // saves the current device id so that it can be accessed later memObj->getUserData().deviceId = hip::getCurrentDevice()->deviceId(); return hipSuccess; } hipError_t hipMallocPitch(void** ptr, size_t* pitch, size_t width, size_t height) { HIP_INIT_API(hipMallocPitch, ptr, pitch, width, height); CHECK_STREAM_CAPTURE_SUPPORTED(); HIP_RETURN(ihipMallocPitch(ptr, pitch, width, height, 1), (ptr != nullptr)? *ptr : nullptr); }
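// Illustrative usage sketch of the pitched allocation above (devPtr, widthPx and heightPx are placeholder names, not code from this file): float* devPtr; size_t pitch; hipMallocPitch(reinterpret_cast<void**>(&devPtr), &pitch, widthPx * sizeof(float), heightPx); row r then starts at reinterpret_cast<char*>(devPtr) + r * pitch, since the returned pitch is the byte width aligned up to imagePitchAlignment_.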
hipError_t hipMalloc3D(hipPitchedPtr* pitchedDevPtr, hipExtent extent) { HIP_INIT_API(hipMalloc3D, pitchedDevPtr, extent); CHECK_STREAM_CAPTURE_SUPPORTED(); size_t pitch = 0; if (pitchedDevPtr == nullptr) { HIP_RETURN(hipErrorInvalidValue); } hipError_t status = ihipMallocPitch(&pitchedDevPtr->ptr, &pitch, extent.width, extent.height, extent.depth); if (status == hipSuccess) { pitchedDevPtr->pitch = pitch; pitchedDevPtr->xsize = extent.width; pitchedDevPtr->ysize = extent.height; } HIP_RETURN(status, *pitchedDevPtr); } amd::Image* ihipImageCreate(const cl_channel_order channelOrder, const cl_channel_type channelType, const cl_mem_object_type imageType, const size_t imageWidth, const size_t imageHeight, const size_t imageDepth, const size_t imageArraySize, const size_t imageRowPitch, const size_t imageSlicePitch, const uint32_t numMipLevels, amd::Memory* buffer, hipError_t& status) { status = hipSuccess; const amd::Image::Format imageFormat({channelOrder, channelType}); if (!imageFormat.isValid()) { LogPrintfError("Invalid Image format for channel Order:%u Type:%u \n", channelOrder, channelType); status = hipErrorInvalidValue; return nullptr; } amd::Context& context = *hip::getCurrentDevice()->asContext(); if (!imageFormat.isSupported(context, imageType)) { LogPrintfError("Image type: %u not supported \n", imageType); status = hipErrorInvalidValue; return nullptr; } const std::vector<amd::Device*>& devices = context.devices(); if (!devices[0]->info().imageSupport_) { LogPrintfError("Device: 0x%x does not support image \n", devices[0]); status = hipErrorInvalidValue; return nullptr; } if (!amd::Image::validateDimensions(devices, imageType, imageWidth, imageHeight, imageDepth, imageArraySize)) { DevLogError("Image does not have valid dimensions \n"); status = hipErrorInvalidValue; return nullptr; } if (numMipLevels > 0) { size_t max_dim = std::max(std::max(imageWidth, imageHeight), imageDepth); size_t mip_levels = 0; for (mip_levels = 0; max_dim > 0; max_dim >>= 1, mip_levels++); // empty for loop if (mip_levels < numMipLevels) { LogPrintfError("Invalid Mip Levels: %d", numMipLevels); status = hipErrorInvalidValue; return nullptr; } } // TODO validate the image descriptor. amd::Image* image = nullptr; if (buffer != nullptr) { switch (imageType) { case CL_MEM_OBJECT_IMAGE1D_BUFFER: case CL_MEM_OBJECT_IMAGE2D: image = new (context) amd::Image(*buffer->asBuffer(), imageType, CL_MEM_READ_WRITE, imageFormat, imageWidth, (imageHeight == 0) ? 1 : imageHeight, (imageDepth == 0) ? 1 : imageDepth, imageRowPitch, imageSlicePitch); break; default: LogPrintfError("Cannot create image of imageType: 0x%x \n", imageType); } } else { switch (imageType) { case CL_MEM_OBJECT_IMAGE1D: case CL_MEM_OBJECT_IMAGE2D: case CL_MEM_OBJECT_IMAGE3D: image = new (context) amd::Image(context, imageType, CL_MEM_READ_WRITE, imageFormat, imageWidth, (imageHeight == 0) ? 1 : imageHeight, (imageDepth == 0) ?
1 : imageDepth, imageWidth * imageFormat.getElementSize(), /* row pitch */ imageWidth * imageHeight * imageFormat.getElementSize(), /* slice pitch */ numMipLevels); break; case CL_MEM_OBJECT_IMAGE1D_ARRAY: image = new (context) amd::Image(context, imageType, CL_MEM_READ_WRITE, imageFormat, imageWidth, imageArraySize, 1, /* image depth */ imageWidth * imageFormat.getElementSize(), imageWidth * imageHeight * imageFormat.getElementSize(), numMipLevels); break; case CL_MEM_OBJECT_IMAGE2D_ARRAY: image = new (context) amd::Image(context, imageType, CL_MEM_READ_WRITE, imageFormat, imageWidth, imageHeight, imageArraySize, imageWidth * imageFormat.getElementSize(), imageWidth * imageHeight * imageFormat.getElementSize(), numMipLevels); break; default: LogPrintfError("Cannot create image of imageType: 0x%x \n", imageType); } } if (image == nullptr) { status = hipErrorOutOfMemory; return nullptr; } if (!image->create(nullptr)) { LogPrintfError("Cannot create image: 0x%x \n", image); status = hipErrorOutOfMemory; delete image; return nullptr; } // Save device ID image was created on image->getUserData().deviceId = hip::getCurrentDevice()->deviceId(); return image; } hipError_t ihipArrayCreate(hipArray** array, const HIP_ARRAY3D_DESCRIPTOR* pAllocateArray, unsigned int numMipmapLevels) { if (array == nullptr) { return hipErrorInvalidValue; } // NumChannels specifies the number of packed components per HIP array element; it may be 1, 2, or 4. if ((pAllocateArray->NumChannels != 1) && (pAllocateArray->NumChannels != 2) && (pAllocateArray->NumChannels != 4)) { return hipErrorInvalidValue; } if (pAllocateArray->Flags & hipArrayCubemap) { return hipErrorInvalidValue; } const cl_channel_order channelOrder = hip::getCLChannelOrder(pAllocateArray->NumChannels, 0); const cl_channel_type channelType = hip::getCLChannelType(pAllocateArray->Format, hipReadModeElementType); const cl_mem_object_type imageType = hip::getCLMemObjectType(pAllocateArray->Width, pAllocateArray->Height, pAllocateArray->Depth, pAllocateArray->Flags); hipError_t status = hipSuccess; amd::Image* image = ihipImageCreate(channelOrder, channelType, imageType, pAllocateArray->Width, pAllocateArray->Height, pAllocateArray->Depth, // The number of layers is determined by the depth extent. pAllocateArray->Depth, /* array size */ 0, /* row pitch */ 0, /* slice pitch */ numMipmapLevels, nullptr, /* buffer */ status); if (image == nullptr) { return status; } cl_mem memObj = as_cl(image); *array = new hipArray{reinterpret_cast<void*>(memObj)}; // It is UB to call hipGet*() on an array created via hipArrayCreate()/hipArray3DCreate(). // This is due to hip not differentiating between runtime and driver types. // TODO change the hipArray struct in driver_types.h.
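// Illustrative note (hedged example, not code from this file): for layered arrays the Depth extent doubles as the layer count, e.g. a 2D layered array with 6 layers of 512x512 texels can be described as HIP_ARRAY3D_DESCRIPTOR d = {}; d.Width = 512; d.Height = 512; d.Depth = 6; d.Format = HIP_AD_FORMAT_FLOAT; d.NumChannels = 1; d.Flags = hipArrayLayered; before calling hipArray3DCreate(&array, &d).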
(*array)->desc = hip::getChannelFormatDesc(pAllocateArray->NumChannels, pAllocateArray->Format); (*array)->width = pAllocateArray->Width; (*array)->height = pAllocateArray->Height; (*array)->depth = pAllocateArray->Depth; (*array)->Format = pAllocateArray->Format; (*array)->NumChannels = pAllocateArray->NumChannels; (*array)->flags = pAllocateArray->Flags; { amd::ScopedLock lock(hip::hipArraySetLock); hip::hipArraySet.insert(*array); } return hipSuccess; } hipError_t hipArrayCreate(hipArray** array, const HIP_ARRAY_DESCRIPTOR* pAllocateArray) { HIP_INIT_API(hipArrayCreate, array, pAllocateArray); if (pAllocateArray == nullptr) { return hipErrorInvalidValue; } CHECK_STREAM_CAPTURE_SUPPORTED(); HIP_ARRAY3D_DESCRIPTOR desc = {pAllocateArray->Width, pAllocateArray->Height, 0, /* Depth */ pAllocateArray->Format, pAllocateArray->NumChannels, hipArrayDefault /* Flags */}; HIP_RETURN(ihipArrayCreate(array, &desc, 0)); } hipError_t hipMallocArray(hipArray** array, const hipChannelFormatDesc* desc, size_t width, size_t height, unsigned int flags) { HIP_INIT_API(hipMallocArray, array, desc, width, height, flags); if (array == nullptr || desc == nullptr) { return hipErrorInvalidValue; } CHECK_STREAM_CAPTURE_SUPPORTED(); HIP_ARRAY3D_DESCRIPTOR allocateArray = {width, height, 0, /* Depth */ hip::getArrayFormat(*desc), hip::getNumChannels(*desc), flags}; if(!hip::CheckArrayFormat(*desc)) { return hipErrorInvalidValue; } HIP_RETURN(ihipArrayCreate(array, &allocateArray, 0 /* numMipLevels */)); } hipError_t hipArray3DCreate(hipArray** array, const HIP_ARRAY3D_DESCRIPTOR* pAllocateArray) { HIP_INIT_API(hipArray3DCreate, array, pAllocateArray); CHECK_STREAM_CAPTURE_SUPPORTED(); if (pAllocateArray == nullptr) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(ihipArrayCreate(array, pAllocateArray, 0 /* numMipLevels */)); } hipError_t hipMalloc3DArray(hipArray_t* array, const hipChannelFormatDesc* desc, hipExtent extent, unsigned int flags) { HIP_INIT_API(hipMalloc3DArray, array, desc, extent, flags); if (array == nullptr || desc == nullptr) { return hipErrorInvalidValue; } CHECK_STREAM_CAPTURE_SUPPORTED(); HIP_ARRAY3D_DESCRIPTOR allocateArray = {extent.width, extent.height, extent.depth, hip::getArrayFormat(*desc), hip::getNumChannels(*desc), flags}; if(!hip::CheckArrayFormat(*desc)) { return hipErrorInvalidValue; } HIP_RETURN(ihipArrayCreate(array, &allocateArray, 0)); } hipError_t hipHostGetFlags(unsigned int* flagsPtr, void* hostPtr) { HIP_INIT_API(hipHostGetFlags, flagsPtr, hostPtr); if (flagsPtr == nullptr || hostPtr == nullptr) { HIP_RETURN(hipErrorInvalidValue); } size_t offset = 0; amd::Memory* svmMem = getMemoryObject(hostPtr, offset); if (svmMem == nullptr) { HIP_RETURN(hipErrorInvalidValue); } // To match with Nvidia behaviour validate that hostPtr passed was allocated using hipHostMalloc(), and not hipMalloc() if (!(svmMem->getMemFlags() & CL_MEM_SVM_FINE_GRAIN_BUFFER)) { HIP_RETURN(hipErrorInvalidValue); } // Retrieve HIP memory flags *flagsPtr = svmMem->getUserData().flags; HIP_RETURN(hipSuccess); } hipError_t ihipHostRegister(void* hostPtr, size_t sizeBytes, unsigned int flags) { if (hostPtr == nullptr || sizeBytes == 0 || flags > 15) { return hipErrorInvalidValue; } else { amd::Memory* mem = new (*hip::host_context) amd::Buffer(*hip::host_context, CL_MEM_USE_HOST_PTR | CL_MEM_SVM_ATOMICS, sizeBytes); constexpr bool sysMemAlloc = false; constexpr bool skipAlloc = false; constexpr bool forceAlloc = true; if (!mem->create(hostPtr, sysMemAlloc, skipAlloc, forceAlloc)) { mem->release(); 
LogPrintfError("Cannot create memory for size: %u with flags: %d \n", sizeBytes, flags); return hipErrorInvalidValue; } amd::MemObjMap::AddMemObj(hostPtr, mem); for (const auto& device : g_devices) { // Since the amd::Memory object is shared between all devices // it's fine to have multiple addresses mapped to it const device::Memory* devMem = mem->getDeviceMemory(*device->devices()[0]); void* vAddr = reinterpret_cast(devMem->virtualAddress()); if ((hostPtr != vAddr) && (amd::MemObjMap::FindMemObj(vAddr) == nullptr)) { amd::MemObjMap::AddMemObj(vAddr, mem); } } if (mem != nullptr) { mem->getUserData().deviceId = hip::getCurrentDevice()->deviceId(); // Save the HIP memory flags so that they can be accessed later mem->getUserData().flags = flags; } return hipSuccess; } } hipError_t hipHostRegister(void* hostPtr, size_t sizeBytes, unsigned int flags) { HIP_INIT_API(hipHostRegister, hostPtr, sizeBytes, flags); CHECK_STREAM_CAPTURE_SUPPORTED(); HIP_RETURN(ihipHostRegister(hostPtr, sizeBytes,flags)); } hipError_t ihipHostUnregister(void* hostPtr) { if (hostPtr == nullptr) { return hipErrorInvalidValue; } size_t offset = 0; amd::Memory* mem = getMemoryObject(hostPtr, offset); if (mem != nullptr) { // Wait on the device, associated with the current memory object during allocation hip::Stream::SyncAllStreams(mem->getUserData().deviceId); amd::MemObjMap::RemoveMemObj(hostPtr); for (const auto& device: g_devices) { const device::Memory* devMem = mem->getDeviceMemory(*device->devices()[0]); if (devMem != nullptr) { void* vAddr = reinterpret_cast(devMem->virtualAddress()); if ((vAddr != hostPtr) && amd::MemObjMap::FindMemObj(vAddr)) { amd::MemObjMap::RemoveMemObj(vAddr); } } } mem->release(); return hipSuccess; } LogPrintfError("Cannot unregister host_ptr: 0x%x \n", hostPtr); return hipErrorHostMemoryNotRegistered; } hipError_t hipHostUnregister(void* hostPtr) { HIP_INIT_API(hipHostUnregister, hostPtr); CHECK_STREAM_CAPTURE_SUPPORTED(); HIP_RETURN(ihipHostUnregister(hostPtr)); } // Deprecated function: hipError_t hipHostAlloc(void** ptr, size_t sizeBytes, unsigned int flags) { HIP_INIT_API(hipHostAlloc, ptr, sizeBytes, flags); CHECK_STREAM_CAPTURE_SUPPORTED(); HIP_RETURN(ihipMalloc(ptr, sizeBytes, flags), (ptr != nullptr)? *ptr : nullptr); }; inline hipError_t ihipMemcpySymbol_validate(const void* symbol, size_t sizeBytes, size_t offset, size_t &sym_size, hipDeviceptr_t &device_ptr) { HIP_RETURN_ONFAIL(PlatformState::instance().getStatGlobalVar(symbol, ihipGetDevice(), &device_ptr, &sym_size)); /* Size Check to make sure offset is correct */ if ((offset + sizeBytes) > sym_size) { LogPrintfError("Trying to access out of bounds, offset: %u sizeBytes: %u sym_size: %u \n", offset, sizeBytes, sym_size); HIP_RETURN(hipErrorInvalidValue); } device_ptr = reinterpret_cast
(device_ptr) + offset; return hipSuccess; } hipError_t hipMemcpyToSymbol_common(const void* symbol, const void* src, size_t sizeBytes, size_t offset, hipMemcpyKind kind, hipStream_t stream=nullptr) { CHECK_STREAM_CAPTURING(); if (kind != hipMemcpyHostToDevice && kind != hipMemcpyDeviceToDevice) { HIP_RETURN(hipErrorInvalidMemcpyDirection); } size_t sym_size = 0; hipDeviceptr_t device_ptr = nullptr; hipError_t status = ihipMemcpySymbol_validate(symbol, sizeBytes, offset, sym_size, device_ptr); if (status != hipSuccess) { return status; } /* Copy memory from source to destination address */ return hipMemcpy_common(device_ptr, src, sizeBytes, kind, stream); } hipError_t hipMemcpyToSymbol(const void* symbol, const void* src, size_t sizeBytes, size_t offset, hipMemcpyKind kind) { HIP_INIT_API(hipMemcpyToSymbol, symbol, src, sizeBytes, offset, kind); HIP_RETURN_DURATION(hipMemcpyToSymbol_common(symbol, src, sizeBytes, offset, kind)); } hipError_t hipMemcpyToSymbol_spt(const void* symbol, const void* src, size_t sizeBytes, size_t offset, hipMemcpyKind kind) { HIP_INIT_API(hipMemcpyToSymbol, symbol, src, sizeBytes, offset, kind); HIP_RETURN_DURATION(hipMemcpyToSymbol_common(symbol, src, sizeBytes, offset, kind, getPerThreadDefaultStream())); } hipError_t hipMemcpyFromSymbol_common(void* dst, const void* symbol, size_t sizeBytes, size_t offset, hipMemcpyKind kind, hipStream_t stream=nullptr) { CHECK_STREAM_CAPTURING(); if (kind != hipMemcpyDeviceToHost && kind != hipMemcpyDeviceToDevice) { HIP_RETURN(hipErrorInvalidMemcpyDirection); } size_t sym_size = 0; hipDeviceptr_t device_ptr = nullptr; hipError_t status = ihipMemcpySymbol_validate(symbol, sizeBytes, offset, sym_size, device_ptr); if (status != hipSuccess) { return status; } /* Copy memory from source to destination address */ return hipMemcpy_common(dst, device_ptr, sizeBytes, kind, stream); } hipError_t hipMemcpyFromSymbol(void* dst, const void* symbol, size_t sizeBytes, size_t offset, hipMemcpyKind kind) { HIP_INIT_API(hipMemcpyFromSymbol, symbol, dst, sizeBytes, offset, kind); HIP_RETURN_DURATION(hipMemcpyFromSymbol_common(dst, symbol, sizeBytes, offset, kind)); } hipError_t hipMemcpyFromSymbol_spt(void* dst, const void* symbol, size_t sizeBytes, size_t offset, hipMemcpyKind kind) { HIP_INIT_API(hipMemcpyFromSymbol, symbol, dst, sizeBytes, offset, kind); HIP_RETURN_DURATION(hipMemcpyFromSymbol_common(dst, symbol, sizeBytes, offset, kind, getPerThreadDefaultStream())); } hipError_t hipMemcpyToSymbolAsync_common(const void* symbol, const void* src, size_t sizeBytes, size_t offset, hipMemcpyKind kind, hipStream_t stream) { STREAM_CAPTURE(hipMemcpyToSymbolAsync, stream, symbol, src, sizeBytes, offset, kind); if (kind != hipMemcpyHostToDevice && kind != hipMemcpyDeviceToDevice) { return hipErrorInvalidMemcpyDirection; } size_t sym_size = 0; hipDeviceptr_t device_ptr = nullptr; hipError_t status = ihipMemcpySymbol_validate(symbol, sizeBytes, offset, sym_size, device_ptr); if (status != hipSuccess) { return status; } /* Copy memory from source to destination address */ return hipMemcpyAsync(device_ptr, src, sizeBytes, kind, stream); } hipError_t hipMemcpyToSymbolAsync(const void* symbol, const void* src, size_t sizeBytes, size_t offset, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpyToSymbolAsync, symbol, src, sizeBytes, offset, kind, stream); HIP_RETURN_DURATION(hipMemcpyToSymbolAsync_common(symbol, src, sizeBytes, offset, kind, stream)); } hipError_t hipMemcpyToSymbolAsync_spt(const void* symbol, const void* src, size_t 
sizeBytes, size_t offset, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpyToSymbolAsync, symbol, src, sizeBytes, offset, kind, stream); PER_THREAD_DEFAULT_STREAM(stream); HIP_RETURN_DURATION(hipMemcpyToSymbolAsync_common(symbol, src, sizeBytes, offset, kind, stream)); } hipError_t hipMemcpyFromSymbolAsync_common(void* dst, const void* symbol, size_t sizeBytes, size_t offset, hipMemcpyKind kind, hipStream_t stream) { STREAM_CAPTURE(hipMemcpyFromSymbolAsync, stream, dst, symbol, sizeBytes, offset, kind); if (kind != hipMemcpyDeviceToHost && kind != hipMemcpyDeviceToDevice) { return hipErrorInvalidMemcpyDirection; } size_t sym_size = 0; hipDeviceptr_t device_ptr = nullptr; hipError_t status = ihipMemcpySymbol_validate(symbol, sizeBytes, offset, sym_size, device_ptr); if (status != hipSuccess) { return status; } /* Copy memory from source to destination address */ return hipMemcpyAsync(dst, device_ptr, sizeBytes, kind, stream); } hipError_t hipMemcpyFromSymbolAsync(void* dst, const void* symbol, size_t sizeBytes, size_t offset, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpyFromSymbolAsync, dst, symbol, sizeBytes, offset, kind, stream); HIP_RETURN_DURATION(hipMemcpyFromSymbolAsync_common(dst, symbol, sizeBytes, offset, kind, stream)); } hipError_t hipMemcpyFromSymbolAsync_spt(void* dst, const void* symbol, size_t sizeBytes, size_t offset, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpyFromSymbolAsync, dst, symbol, sizeBytes, offset, kind, stream); PER_THREAD_DEFAULT_STREAM(stream); HIP_RETURN_DURATION(hipMemcpyFromSymbolAsync_common(dst, symbol, sizeBytes, offset, kind, stream)); } hipError_t hipMemcpyHtoD(hipDeviceptr_t dstDevice, void* srcHost, size_t ByteCount) { HIP_INIT_API(hipMemcpyHtoD, dstDevice, srcHost, ByteCount); CHECK_STREAM_CAPTURING(); hip::Stream* stream = hip::getStream(nullptr); if (stream == nullptr) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN_DURATION(ihipMemcpy(dstDevice, srcHost, ByteCount, hipMemcpyHostToDevice, *stream)); } hipError_t hipMemcpyDtoH(void* dstHost, hipDeviceptr_t srcDevice, size_t ByteCount) { HIP_INIT_API(hipMemcpyDtoH, dstHost, srcDevice, ByteCount); CHECK_STREAM_CAPTURING(); hip::Stream* stream = hip::getStream(nullptr); if (stream == nullptr) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN_DURATION(ihipMemcpy(dstHost, srcDevice, ByteCount, hipMemcpyDeviceToHost, *stream)); } hipError_t hipMemcpyDtoD(hipDeviceptr_t dstDevice, hipDeviceptr_t srcDevice, size_t ByteCount) { HIP_INIT_API(hipMemcpyDtoD, dstDevice, srcDevice, ByteCount); CHECK_STREAM_CAPTURING(); hip::Stream* stream = hip::getStream(nullptr); if (stream == nullptr) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN_DURATION(ihipMemcpy(dstDevice, srcDevice, ByteCount, hipMemcpyDeviceToDevice, *stream)); } hipError_t hipMemcpyAsync_common(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream) { STREAM_CAPTURE(hipMemcpyAsync, stream, dst, src, sizeBytes, kind); hip::Stream* hip_stream = hip::getStream(stream); if (hip_stream == nullptr) { return hipErrorInvalidValue; } return ihipMemcpy(dst, src, sizeBytes, kind, *hip_stream, true); } hipError_t hipMemcpyAsync(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpyAsync, dst, src, sizeBytes, kind, stream); HIP_RETURN_DURATION(hipMemcpyAsync_common(dst, src, sizeBytes, kind, stream)); } hipError_t hipMemcpyAsync_spt(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream) { 
HIP_INIT_API(hipMemcpyAsync, dst, src, sizeBytes, kind, stream); PER_THREAD_DEFAULT_STREAM(stream); HIP_RETURN_DURATION(hipMemcpyAsync_common(dst, src, sizeBytes, kind, stream)); } hipError_t hipMemcpyHtoDAsync(hipDeviceptr_t dstDevice, void* srcHost, size_t ByteCount, hipStream_t stream) { HIP_INIT_API(hipMemcpyHtoDAsync, dstDevice, srcHost, ByteCount, stream); hipMemcpyKind kind = hipMemcpyHostToDevice; STREAM_CAPTURE(hipMemcpyHtoDAsync, stream, dstDevice, srcHost, ByteCount, kind); hip::Stream* hip_stream = hip::getStream(stream); if (hip_stream == nullptr) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN_DURATION( ihipMemcpy(dstDevice, srcHost, ByteCount, kind, *hip_stream, true)); } hipError_t hipMemcpyDtoDAsync(hipDeviceptr_t dstDevice, hipDeviceptr_t srcDevice, size_t ByteCount, hipStream_t stream) { HIP_INIT_API(hipMemcpyDtoDAsync, dstDevice, srcDevice, ByteCount, stream); hipMemcpyKind kind = hipMemcpyDeviceToDevice; STREAM_CAPTURE(hipMemcpyDtoDAsync, stream, dstDevice, srcDevice, ByteCount, kind); hip::Stream* hip_stream = hip::getStream(stream); if (hip_stream == nullptr) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN_DURATION( ihipMemcpy(dstDevice, srcDevice, ByteCount, kind, *hip_stream, true)); } hipError_t hipMemcpyDtoHAsync(void* dstHost, hipDeviceptr_t srcDevice, size_t ByteCount, hipStream_t stream) { HIP_INIT_API(hipMemcpyDtoHAsync, dstHost, srcDevice, ByteCount, stream); hipMemcpyKind kind = hipMemcpyDeviceToHost; STREAM_CAPTURE(hipMemcpyDtoHAsync, stream, dstHost, srcDevice, ByteCount, kind); hip::Stream* hip_stream = hip::getStream(stream); if (hip_stream == nullptr) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN_DURATION( ihipMemcpy(dstHost, srcDevice, ByteCount, kind, *hip_stream, true)); } hipError_t ihipMemcpyAtoDValidate(hipArray* srcArray, void* dstDevice, amd::Coord3D& srcOrigin, amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion, size_t dstRowPitch, size_t dstSlicePitch, amd::Memory*& dstMemory, amd::Image*& srcImage, amd::BufferRect& srcRect, amd::BufferRect& dstRect) { size_t dstOffset = 0; dstMemory = getMemoryObject(dstDevice, dstOffset); if (srcArray == nullptr || (dstMemory == nullptr)) { return hipErrorInvalidValue; } cl_mem srcMemObj = reinterpret_cast<cl_mem>(srcArray->data); if (!is_valid(srcMemObj)) { return hipErrorInvalidValue; } srcImage = as_amd(srcMemObj)->asImage(); // HIP assumes the width is in bytes, but OCL assumes it's in pixels.
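// Illustrative example of the conversion performed below (not code from this file): for a 4-channel float image the element size is 16 bytes, so a HIP x-origin of 64 bytes becomes OCL pixel column 64 / 16 = 4, and the x extent of copyRegion is rescaled from bytes to pixels the same way.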
const size_t elementSize = srcImage->getImageFormat().getElementSize(); static_cast<size_t*>(srcOrigin)[0] /= elementSize; static_cast<size_t*>(copyRegion)[0] /= elementSize; if (!srcRect.create(static_cast<size_t*>(srcOrigin), static_cast<size_t*>(copyRegion), srcImage->getRowPitch(), srcImage->getSlicePitch())) { return hipErrorInvalidValue; } if (!dstRect.create(static_cast<size_t*>(dstOrigin), static_cast<size_t*>(copyRegion), dstRowPitch, dstSlicePitch)) { return hipErrorInvalidValue; } dstRect.start_ += dstOffset; dstRect.end_ += dstOffset; const size_t copySizeInBytes = copyRegion[0] * copyRegion[1] * copyRegion[2] * srcImage->getImageFormat().getElementSize(); if (!srcImage->validateRegion(srcOrigin, copyRegion) || !dstMemory->validateRegion(dstOrigin, {copySizeInBytes, 0, 0})) { return hipErrorInvalidValue; } return hipSuccess; } hipError_t ihipMemcpyAtoDCommand(amd::Command*& command, hipArray* srcArray, void* dstDevice, amd::Coord3D srcOrigin, amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t dstRowPitch, size_t dstSlicePitch, hip::Stream* stream) { amd::BufferRect srcRect; amd::BufferRect dstRect; amd::Memory* dstMemory; amd::Image* srcImage; hipError_t status = ihipMemcpyAtoDValidate(srcArray, dstDevice, srcOrigin, dstOrigin, copyRegion, dstRowPitch, dstSlicePitch, dstMemory, srcImage, srcRect, dstRect); if (status != hipSuccess) { return status; } amd::CopyMemoryCommand* cpyMemCmd = new amd::CopyMemoryCommand(*stream, CL_COMMAND_COPY_IMAGE_TO_BUFFER, amd::Command::EventWaitList{}, *srcImage, *dstMemory, srcOrigin, dstOrigin, copyRegion, srcRect, dstRect); if (cpyMemCmd == nullptr) { return hipErrorOutOfMemory; } if (!cpyMemCmd->validatePeerMemory()) { delete cpyMemCmd; return hipErrorInvalidValue; } command = cpyMemCmd; return hipSuccess; } hipError_t ihipMemcpyDtoAValidate(void* srcDevice, hipArray* dstArray, amd::Coord3D& srcOrigin, amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion, size_t srcRowPitch, size_t srcSlicePitch, amd::Image*& dstImage, amd::Memory*& srcMemory, amd::BufferRect& dstRect, amd::BufferRect& srcRect) { size_t srcOffset = 0; srcMemory = getMemoryObject(srcDevice, srcOffset); if ((srcMemory == nullptr) || dstArray == nullptr) { return hipErrorInvalidValue; } cl_mem dstMemObj = reinterpret_cast<cl_mem>(dstArray->data); if (!is_valid(dstMemObj)) { return hipErrorInvalidValue; } dstImage = as_amd(dstMemObj)->asImage(); // HIP assumes the width is in bytes, but OCL assumes it's in pixels.
const size_t elementSize = dstImage->getImageFormat().getElementSize(); static_cast<size_t*>(dstOrigin)[0] /= elementSize; static_cast<size_t*>(copyRegion)[0] /= elementSize; if (!srcRect.create(static_cast<size_t*>(srcOrigin), static_cast<size_t*>(copyRegion), srcRowPitch, srcSlicePitch)) { return hipErrorInvalidValue; } srcRect.start_ += srcOffset; srcRect.end_ += srcOffset; if (!dstRect.create(static_cast<size_t*>(dstOrigin), static_cast<size_t*>(copyRegion), dstImage->getRowPitch(), dstImage->getSlicePitch())) { return hipErrorInvalidValue; } const size_t copySizeInBytes = copyRegion[0] * copyRegion[1] * copyRegion[2] * dstImage->getImageFormat().getElementSize(); if (!srcMemory->validateRegion(srcOrigin, {copySizeInBytes, 0, 0}) || !dstImage->validateRegion(dstOrigin, copyRegion)) { return hipErrorInvalidValue; } return hipSuccess; } hipError_t ihipMemcpyDtoACommand(amd::Command*& command, void* srcDevice, hipArray* dstArray, amd::Coord3D srcOrigin, amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t srcRowPitch, size_t srcSlicePitch, hip::Stream* stream) { amd::Image* dstImage; amd::Memory* srcMemory; amd::BufferRect dstRect; amd::BufferRect srcRect; hipError_t status = ihipMemcpyDtoAValidate(srcDevice, dstArray, srcOrigin, dstOrigin, copyRegion, srcRowPitch, srcSlicePitch, dstImage, srcMemory, dstRect, srcRect); if (status != hipSuccess) { return status; } amd::CopyMemoryCommand* cpyMemCmd = new amd::CopyMemoryCommand(*stream, CL_COMMAND_COPY_BUFFER_TO_IMAGE, amd::Command::EventWaitList{}, *srcMemory, *dstImage, srcOrigin, dstOrigin, copyRegion, srcRect, dstRect); if (cpyMemCmd == nullptr) { return hipErrorOutOfMemory; } if (!cpyMemCmd->validatePeerMemory()) { delete cpyMemCmd; return hipErrorInvalidValue; } command = cpyMemCmd; return hipSuccess; } hipError_t ihipMemcpyDtoDValidate(void* srcDevice, void* dstDevice, amd::Coord3D& srcOrigin, amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion, size_t srcRowPitch, size_t srcSlicePitch, size_t dstRowPitch, size_t dstSlicePitch, amd::Memory*& srcMemory, amd::Memory*& dstMemory, amd::BufferRect& srcRect, amd::BufferRect& dstRect) { size_t srcOffset = 0; srcMemory = getMemoryObject(srcDevice, srcOffset); size_t dstOffset = 0; dstMemory = getMemoryObject(dstDevice, dstOffset); if ((srcMemory == nullptr) || (dstMemory == nullptr)) { return hipErrorInvalidValue; } if (!srcRect.create(static_cast<size_t*>(srcOrigin), static_cast<size_t*>(copyRegion), srcRowPitch, srcSlicePitch)) { return hipErrorInvalidValue; } srcRect.start_ += srcOffset; srcRect.end_ += srcOffset; amd::Coord3D srcStart(srcRect.start_, 0, 0); amd::Coord3D srcSize(srcRect.end_ - srcRect.start_, 1, 1); if (!srcMemory->validateRegion(srcStart, srcSize)) { return hipErrorInvalidValue; } if (!dstRect.create(static_cast<size_t*>(dstOrigin), static_cast<size_t*>(copyRegion), dstRowPitch, dstSlicePitch)) { return hipErrorInvalidValue; } dstRect.start_ += dstOffset; dstRect.end_ += dstOffset; amd::Coord3D dstStart(dstRect.start_, 0, 0); amd::Coord3D dstSize(dstRect.end_ - dstRect.start_, 1, 1); if (!dstMemory->validateRegion(dstStart, dstSize)) { return hipErrorInvalidValue; } return hipSuccess; } hipError_t ihipMemcpyDtoDCommand(amd::Command*& command, void* srcDevice, void* dstDevice, amd::Coord3D srcOrigin, amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t srcRowPitch, size_t srcSlicePitch, size_t dstRowPitch, size_t dstSlicePitch, hip::Stream* stream) { amd::Memory* srcMemory; amd::Memory* dstMemory; amd::BufferRect srcRect; amd::BufferRect dstRect; hipError_t status = ihipMemcpyDtoDValidate(srcDevice, dstDevice, srcOrigin, dstOrigin, copyRegion, srcRowPitch,
srcSlicePitch, dstRowPitch, dstSlicePitch, srcMemory, dstMemory, srcRect, dstRect); if (status != hipSuccess) { return status; } amd::Coord3D srcStart(srcRect.start_, 0, 0); amd::Coord3D dstStart(dstRect.start_, 0, 0); amd::CopyMemoryCommand* copyCommand = new amd::CopyMemoryCommand( *stream, CL_COMMAND_COPY_BUFFER_RECT, amd::Command::EventWaitList{}, *srcMemory, *dstMemory, srcStart, dstStart, copyRegion, srcRect, dstRect); if (copyCommand == nullptr) { return hipErrorOutOfMemory; } if (!copyCommand->validatePeerMemory()) { delete copyCommand; return hipErrorInvalidValue; } command = copyCommand; return hipSuccess; } hipError_t ihipMemcpyDtoHValidate(void* srcDevice, void* dstHost, amd::Coord3D& srcOrigin, amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion, size_t srcRowPitch, size_t srcSlicePitch, size_t dstRowPitch, size_t dstSlicePitch, amd::Memory*& srcMemory, amd::BufferRect& srcRect, amd::BufferRect& dstRect) { size_t srcOffset = 0; srcMemory = getMemoryObject(srcDevice, srcOffset); if ((srcMemory == nullptr) || (dstHost == nullptr)) { return hipErrorInvalidValue; } if (!srcRect.create(static_cast<size_t*>(srcOrigin), static_cast<size_t*>(copyRegion), srcRowPitch, srcSlicePitch)) { return hipErrorInvalidValue; } srcRect.start_ += srcOffset; srcRect.end_ += srcOffset; amd::Coord3D srcStart(srcRect.start_, 0, 0); amd::Coord3D srcSize(srcRect.end_ - srcRect.start_, 1, 1); if (!srcMemory->validateRegion(srcStart, srcSize)) { return hipErrorInvalidValue; } if (!dstRect.create(static_cast<size_t*>(dstOrigin), static_cast<size_t*>(copyRegion), dstRowPitch, dstSlicePitch)) { return hipErrorInvalidValue; } return hipSuccess; } hipError_t ihipMemcpyDtoHCommand(amd::Command*& command, void* srcDevice, void* dstHost, amd::Coord3D srcOrigin, amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t srcRowPitch, size_t srcSlicePitch, size_t dstRowPitch, size_t dstSlicePitch, hip::Stream* stream, bool isAsync = false) { amd::Memory* srcMemory; amd::BufferRect srcRect; amd::BufferRect dstRect; hipError_t status = ihipMemcpyDtoHValidate(srcDevice, dstHost, srcOrigin, dstOrigin, copyRegion, srcRowPitch, srcSlicePitch, dstRowPitch, dstSlicePitch, srcMemory, srcRect, dstRect); if (status != hipSuccess) { return status; } amd::Coord3D srcStart(srcRect.start_, 0, 0); amd::CopyMetadata copyMetadata(isAsync, amd::CopyMetadata::CopyEnginePreference::SDMA); amd::ReadMemoryCommand* readCommand = new amd::ReadMemoryCommand(*stream, CL_COMMAND_READ_BUFFER_RECT, amd::Command::EventWaitList{}, *srcMemory, srcStart, copyRegion, dstHost, srcRect, dstRect, copyMetadata); if (readCommand == nullptr) { return hipErrorOutOfMemory; } if (!readCommand->validatePeerMemory()) { delete readCommand; return hipErrorInvalidValue; } command = readCommand; return hipSuccess; } hipError_t ihipMemcpyHtoDValidate(const void* srcHost, void* dstDevice, amd::Coord3D& srcOrigin, amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion, size_t srcRowPitch, size_t srcSlicePitch, size_t dstRowPitch, size_t dstSlicePitch, amd::Memory*& dstMemory, amd::BufferRect& srcRect, amd::BufferRect& dstRect) { size_t dstOffset = 0; dstMemory = getMemoryObject(dstDevice, dstOffset); if ((srcHost == nullptr) || (dstMemory == nullptr)) { return hipErrorInvalidValue; } if (!srcRect.create(static_cast<size_t*>(srcOrigin), static_cast<size_t*>(copyRegion), srcRowPitch, srcSlicePitch)) { return hipErrorInvalidValue; } if (!dstRect.create(static_cast<size_t*>(dstOrigin), static_cast<size_t*>(copyRegion), dstRowPitch, dstSlicePitch)) { return hipErrorInvalidValue; } dstRect.start_ += dstOffset; dstRect.end_ += dstOffset; amd::Coord3D
dstStart(dstRect.start_, 0, 0); amd::Coord3D dstSize(dstRect.end_ - dstRect.start_, 1, 1); if (!dstMemory->validateRegion(dstStart, dstSize)) { return hipErrorInvalidValue; } return hipSuccess; } hipError_t ihipMemcpyHtoDCommand(amd::Command*& command, const void* srcHost, void* dstDevice, amd::Coord3D srcOrigin, amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t srcRowPitch, size_t srcSlicePitch, size_t dstRowPitch, size_t dstSlicePitch, hip::Stream* stream, bool isAsync = false) { amd::Memory* dstMemory; amd::BufferRect srcRect; amd::BufferRect dstRect; hipError_t status = ihipMemcpyHtoDValidate(srcHost, dstDevice, srcOrigin, dstOrigin, copyRegion, srcRowPitch, srcSlicePitch, dstRowPitch, dstSlicePitch, dstMemory, srcRect, dstRect); if (status != hipSuccess) { return status; } amd::Coord3D dstStart(dstRect.start_, 0, 0); amd::CopyMetadata copyMetadata(isAsync, amd::CopyMetadata::CopyEnginePreference::SDMA); amd::WriteMemoryCommand* writeCommand = new amd::WriteMemoryCommand( *stream, CL_COMMAND_WRITE_BUFFER_RECT, amd::Command::EventWaitList{}, *dstMemory, dstStart, copyRegion, srcHost, dstRect, srcRect, copyMetadata); if (writeCommand == nullptr) { return hipErrorOutOfMemory; } if (!writeCommand->validatePeerMemory()) { delete writeCommand; return hipErrorInvalidValue; } command = writeCommand; return hipSuccess; } hipError_t ihipMemcpyHtoH(const void* srcHost, void* dstHost, amd::Coord3D srcOrigin, amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t srcRowPitch, size_t srcSlicePitch, size_t dstRowPitch, size_t dstSlicePitch, hip::Stream* stream) { if ((srcHost == nullptr) || (dstHost == nullptr)) { return hipErrorInvalidValue; } amd::BufferRect srcRect; if (!srcRect.create(static_cast<size_t*>(srcOrigin), static_cast<size_t*>(copyRegion), srcRowPitch, srcSlicePitch)) { return hipErrorInvalidValue; } amd::BufferRect dstRect; if (!dstRect.create(static_cast<size_t*>(dstOrigin), static_cast<size_t*>(copyRegion), dstRowPitch, dstSlicePitch)) { return hipErrorInvalidValue; } if (stream) { stream->finish(); } for (size_t slice = 0; slice < copyRegion[2]; slice++) { for (size_t row = 0; row < copyRegion[1]; row++) { const void* srcRow = static_cast<const char*>(srcHost) + srcRect.start_ + row * srcRect.rowPitch_ + slice * srcRect.slicePitch_; void* dstRow = static_cast<char*>(dstHost) + dstRect.start_ + row * dstRect.rowPitch_ + slice * dstRect.slicePitch_; std::memcpy(dstRow, srcRow, copyRegion[0]); } } return hipSuccess; } hipError_t ihipMemcpyAtoAValidate(hipArray* srcArray, hipArray* dstArray, amd::Coord3D& srcOrigin, amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion, amd::Image*& srcImage, amd::Image*& dstImage) { if (dstArray == nullptr || srcArray == nullptr) { return hipErrorInvalidValue; } cl_mem srcMemObj = reinterpret_cast<cl_mem>(srcArray->data); cl_mem dstMemObj = reinterpret_cast<cl_mem>(dstArray->data); if (!is_valid(srcMemObj) || !is_valid(dstMemObj)) { return hipErrorInvalidValue; } srcImage = as_amd(srcMemObj)->asImage(); dstImage = as_amd(dstMemObj)->asImage(); // HIP assumes the width is in bytes, but OCL assumes it's in pixels. // Note that src and dst should have the same element size.
hipError_t ihipMemcpyAtoAValidate(hipArray* srcArray, hipArray* dstArray, amd::Coord3D& srcOrigin,
                                  amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion,
                                  amd::Image*& srcImage, amd::Image*& dstImage) {
  if (dstArray == nullptr || srcArray == nullptr) {
    return hipErrorInvalidValue;
  }
  cl_mem srcMemObj = reinterpret_cast<cl_mem>(srcArray->data);
  cl_mem dstMemObj = reinterpret_cast<cl_mem>(dstArray->data);
  if (!is_valid(srcMemObj) || !is_valid(dstMemObj)) {
    return hipErrorInvalidValue;
  }
  srcImage = as_amd(srcMemObj)->asImage();
  dstImage = as_amd(dstMemObj)->asImage();
  // HIP assumes the width is in bytes, but OCL assumes it's in pixels.
  // Note that src and dst should have the same element size.
  assert(srcImage->getImageFormat().getElementSize() ==
         dstImage->getImageFormat().getElementSize());
  const size_t elementSize = srcImage->getImageFormat().getElementSize();
  static_cast<size_t*>(srcOrigin)[0] /= elementSize;
  static_cast<size_t*>(dstOrigin)[0] /= elementSize;
  static_cast<size_t*>(copyRegion)[0] /= elementSize;
  if (!srcImage->validateRegion(srcOrigin, copyRegion) ||
      !dstImage->validateRegion(dstOrigin, copyRegion)) {
    return hipErrorInvalidValue;
  }
  return hipSuccess;
}

hipError_t ihipMemcpyAtoACommand(amd::Command*& command, hipArray* srcArray, hipArray* dstArray,
                                 amd::Coord3D srcOrigin, amd::Coord3D dstOrigin,
                                 amd::Coord3D copyRegion, hip::Stream* stream) {
  amd::Image* srcImage;
  amd::Image* dstImage;
  hipError_t status = ihipMemcpyAtoAValidate(srcArray, dstArray, srcOrigin, dstOrigin, copyRegion,
                                             srcImage, dstImage);
  if (status != hipSuccess) {
    return status;
  }
  amd::CopyMemoryCommand* cpyMemCmd =
      new amd::CopyMemoryCommand(*stream, CL_COMMAND_COPY_IMAGE, amd::Command::EventWaitList{},
                                 *srcImage, *dstImage, srcOrigin, dstOrigin, copyRegion);
  if (cpyMemCmd == nullptr) {
    return hipErrorOutOfMemory;
  }
  if (!cpyMemCmd->validatePeerMemory()) {
    delete cpyMemCmd;
    return hipErrorInvalidValue;
  }
  command = cpyMemCmd;
  return hipSuccess;
}
hipError_t ihipMemcpyHtoAValidate(const void* srcHost, hipArray* dstArray,
                                  amd::Coord3D& srcOrigin, amd::Coord3D& dstOrigin,
                                  amd::Coord3D& copyRegion, size_t srcRowPitch,
                                  size_t srcSlicePitch, amd::Image*& dstImage,
                                  amd::BufferRect& srcRect) {
  if ((srcHost == nullptr) || dstArray == nullptr) {
    return hipErrorInvalidValue;
  }
  cl_mem dstMemObj = reinterpret_cast<cl_mem>(dstArray->data);
  if (!is_valid(dstMemObj)) {
    return hipErrorInvalidValue;
  }
  if (!srcRect.create(static_cast<size_t*>(srcOrigin), static_cast<size_t*>(copyRegion),
                      srcRowPitch, srcSlicePitch)) {
    return hipErrorInvalidValue;
  }
  dstImage = as_amd(dstMemObj)->asImage();
  // HIP assumes the width is in bytes, but OCL assumes it's in pixels.
  const size_t elementSize = dstImage->getImageFormat().getElementSize();
  static_cast<size_t*>(dstOrigin)[0] /= elementSize;
  static_cast<size_t*>(copyRegion)[0] /= elementSize;
  if (!dstImage->validateRegion(dstOrigin, copyRegion)) {
    return hipErrorInvalidValue;
  }
  return hipSuccess;
}

hipError_t ihipMemcpyHtoACommand(amd::Command*& command, const void* srcHost, hipArray* dstArray,
                                 amd::Coord3D srcOrigin, amd::Coord3D dstOrigin,
                                 amd::Coord3D copyRegion, size_t srcRowPitch,
                                 size_t srcSlicePitch, hip::Stream* stream, bool isAsync = false) {
  amd::Image* dstImage;
  amd::BufferRect srcRect;
  hipError_t status = ihipMemcpyHtoAValidate(srcHost, dstArray, srcOrigin, dstOrigin, copyRegion,
                                             srcRowPitch, srcSlicePitch, dstImage, srcRect);
  if (status != hipSuccess) {
    return status;
  }
  amd::CopyMetadata copyMetadata(isAsync, amd::CopyMetadata::CopyEnginePreference::SDMA);
  amd::WriteMemoryCommand* writeMemCmd = new amd::WriteMemoryCommand(
      *stream, CL_COMMAND_WRITE_IMAGE, amd::Command::EventWaitList{}, *dstImage, dstOrigin,
      copyRegion, static_cast<const char*>(srcHost) + srcRect.start_, srcRowPitch, srcSlicePitch,
      copyMetadata);
  if (writeMemCmd == nullptr) {
    return hipErrorOutOfMemory;
  }
  if (!writeMemCmd->validatePeerMemory()) {
    delete writeMemCmd;
    return hipErrorInvalidValue;
  }
  command = writeMemCmd;
  return hipSuccess;
}

hipError_t ihipMemcpyAtoHValidate(hipArray* srcArray, void* dstHost, amd::Coord3D& srcOrigin,
                                  amd::Coord3D& dstOrigin, amd::Coord3D& copyRegion,
                                  size_t dstRowPitch, size_t dstSlicePitch, amd::Image*& srcImage,
                                  amd::BufferRect& dstRect) {
  if (srcArray == nullptr || (dstHost == nullptr)) {
    return hipErrorInvalidValue;
  }
  cl_mem srcMemObj = reinterpret_cast<cl_mem>(srcArray->data);
  if (!is_valid(srcMemObj)) {
    return hipErrorInvalidValue;
  }
  if (!dstRect.create(static_cast<size_t*>(dstOrigin), static_cast<size_t*>(copyRegion),
                      dstRowPitch, dstSlicePitch)) {
    return hipErrorInvalidValue;
  }
  srcImage = as_amd(srcMemObj)->asImage();
  // HIP assumes the width is in bytes, but OCL assumes it's in pixels.
  const size_t elementSize = srcImage->getImageFormat().getElementSize();
  static_cast<size_t*>(srcOrigin)[0] /= elementSize;
  static_cast<size_t*>(copyRegion)[0] /= elementSize;
  if (!srcImage->validateRegion(srcOrigin, copyRegion) ||
      !srcImage->isRowSliceValid(dstRowPitch, dstSlicePitch, copyRegion[0], copyRegion[1])) {
    return hipErrorInvalidValue;
  }
  return hipSuccess;
}
hipError_t ihipMemcpyAtoHCommand(amd::Command*& command, hipArray* srcArray, void* dstHost,
                                 amd::Coord3D srcOrigin, amd::Coord3D dstOrigin,
                                 amd::Coord3D copyRegion, size_t dstRowPitch,
                                 size_t dstSlicePitch, hip::Stream* stream, bool isAsync = false) {
  amd::Image* srcImage;
  amd::BufferRect dstRect;
  amd::CopyMetadata copyMetadata(isAsync, amd::CopyMetadata::CopyEnginePreference::SDMA);
  hipError_t status = ihipMemcpyAtoHValidate(srcArray, dstHost, srcOrigin, dstOrigin, copyRegion,
                                             dstRowPitch, dstSlicePitch, srcImage, dstRect);
  if (status != hipSuccess) {
    return status;
  }
  amd::ReadMemoryCommand* readMemCmd = new amd::ReadMemoryCommand(
      *stream, CL_COMMAND_READ_IMAGE, amd::Command::EventWaitList{}, *srcImage, srcOrigin,
      copyRegion, static_cast<char*>(dstHost) + dstRect.start_, dstRowPitch, dstSlicePitch,
      copyMetadata);
  if (readMemCmd == nullptr) {
    return hipErrorOutOfMemory;
  }
  if (!readMemCmd->validatePeerMemory()) {
    delete readMemCmd;
    return hipErrorInvalidValue;
  }
  command = readMemCmd;
  return hipSuccess;
}
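// Editor's note: the array (image) paths above divide the x origin and the width by the element
// size because HIP callers pass bytes while the OpenCL image commands expect texels. A hedged,
// purely illustrative arithmetic sketch (the constants are hypothetical):
#if 0
#include <cstddef>

constexpr size_t kElementSize = 16;                  // e.g. a float4 texel
constexpr size_t kXTexel = 64 / kElementSize;        // byte offset 64  -> texel x = 4
constexpr size_t kWidthTexels = 256 / kElementSize;  // 256 bytes wide  -> 16 texels
static_assert(kXTexel == 4 && kWidthTexels == 16, "bytes-to-texels conversion");
#endif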
hipError_t ihipGetMemcpyParam3DCommand(amd::Command*& command, const HIP_MEMCPY3D* pCopy,
                                       hip::Stream* stream) {
  size_t offset = 0;
  // If {src/dst}MemoryType is hipMemoryTypeUnified, {src/dst}Device and {src/dst}Pitch specify
  // the (unified virtual address space) base address of the source data and the bytes per row to
  // apply. {src/dst}Array is ignored.
  hipMemoryType srcMemoryType = pCopy->srcMemoryType;
  if (srcMemoryType == hipMemoryTypeUnified) {
    amd::Memory* memObj = getMemoryObject(pCopy->srcDevice, offset);
    if (memObj != nullptr) {
      srcMemoryType =
          ((CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR) & memObj->getMemFlags())
          ? hipMemoryTypeHost
          : hipMemoryTypeDevice;
    } else {
      srcMemoryType = hipMemoryTypeHost;
    }
    if (srcMemoryType == hipMemoryTypeHost) {
      // {src/dst}Host may be uninitialized. Copy {src/dst}Device over into it if we detect
      // system memory.
      const_cast<HIP_MEMCPY3D*>(pCopy)->srcHost = pCopy->srcDevice;
      const_cast<HIP_MEMCPY3D*>(pCopy)->srcXInBytes += offset;
    }
  }
  offset = 0;
  hipMemoryType dstMemoryType = pCopy->dstMemoryType;
  if (dstMemoryType == hipMemoryTypeUnified) {
    amd::Memory* memObj = getMemoryObject(pCopy->dstDevice, offset);
    if (memObj != nullptr) {
      dstMemoryType =
          ((CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR) & memObj->getMemFlags())
          ? hipMemoryTypeHost
          : hipMemoryTypeDevice;
    } else {
      dstMemoryType = hipMemoryTypeHost;
    }
    if (dstMemoryType == hipMemoryTypeHost) {
      const_cast<HIP_MEMCPY3D*>(pCopy)->dstHost = pCopy->dstDevice;
      const_cast<HIP_MEMCPY3D*>(pCopy)->dstXInBytes += offset;
    }
  }
  // If {src/dst}MemoryType is hipMemoryTypeHost, check if the memory was prepinned.
  // In that case upgrade the copy type to hipMemoryTypeDevice to avoid extra pinning.
  offset = 0;
  if (srcMemoryType == hipMemoryTypeHost) {
    srcMemoryType =
        getMemoryObject(pCopy->srcHost, offset) ? hipMemoryTypeDevice : hipMemoryTypeHost;
    if (srcMemoryType == hipMemoryTypeDevice) {
      const_cast<HIP_MEMCPY3D*>(pCopy)->srcDevice = const_cast<void*>(pCopy->srcHost);
    }
  }
  offset = 0;
  if (dstMemoryType == hipMemoryTypeHost) {
    dstMemoryType =
        getMemoryObject(pCopy->dstHost, offset) ? hipMemoryTypeDevice : hipMemoryTypeHost;
    if (dstMemoryType == hipMemoryTypeDevice) {
      const_cast<HIP_MEMCPY3D*>(pCopy)->dstDevice = const_cast<void*>(pCopy->dstHost);
    }
  }
  amd::Coord3D srcOrigin = {pCopy->srcXInBytes, pCopy->srcY, pCopy->srcZ};
  amd::Coord3D dstOrigin = {pCopy->dstXInBytes, pCopy->dstY, pCopy->dstZ};
  amd::Coord3D copyRegion = {pCopy->WidthInBytes, pCopy->Height, pCopy->Depth};
  if ((srcMemoryType == hipMemoryTypeHost) && (dstMemoryType == hipMemoryTypeDevice)) {
    // Host to Device.
    return ihipMemcpyHtoDCommand(command, pCopy->srcHost, pCopy->dstDevice, srcOrigin, dstOrigin,
                                 copyRegion, pCopy->srcPitch, pCopy->srcPitch * pCopy->srcHeight,
                                 pCopy->dstPitch, pCopy->dstPitch * pCopy->dstHeight, stream);
  } else if ((srcMemoryType == hipMemoryTypeDevice) && (dstMemoryType == hipMemoryTypeHost)) {
    // Device to Host.
    return ihipMemcpyDtoHCommand(command, pCopy->srcDevice, pCopy->dstHost, srcOrigin, dstOrigin,
                                 copyRegion, pCopy->srcPitch, pCopy->srcPitch * pCopy->srcHeight,
                                 pCopy->dstPitch, pCopy->dstPitch * pCopy->dstHeight, stream);
  } else if ((srcMemoryType == hipMemoryTypeDevice) && (dstMemoryType == hipMemoryTypeDevice)) {
    // Device to Device.
    return ihipMemcpyDtoDCommand(command, pCopy->srcDevice, pCopy->dstDevice, srcOrigin,
                                 dstOrigin, copyRegion, pCopy->srcPitch,
                                 pCopy->srcPitch * pCopy->srcHeight, pCopy->dstPitch,
                                 pCopy->dstPitch * pCopy->dstHeight, stream);
  } else if ((srcMemoryType == hipMemoryTypeHost) && (dstMemoryType == hipMemoryTypeArray)) {
    // Host to Image.
    return ihipMemcpyHtoACommand(command, pCopy->srcHost, pCopy->dstArray, srcOrigin, dstOrigin,
                                 copyRegion, pCopy->srcPitch, pCopy->srcPitch * pCopy->srcHeight,
                                 stream);
  } else if ((srcMemoryType == hipMemoryTypeArray) && (dstMemoryType == hipMemoryTypeHost)) {
    // Image to Host.
    return ihipMemcpyAtoHCommand(command, pCopy->srcArray, pCopy->dstHost, srcOrigin, dstOrigin,
                                 copyRegion, pCopy->dstPitch, pCopy->dstPitch * pCopy->dstHeight,
                                 stream);
  } else if ((srcMemoryType == hipMemoryTypeDevice) && (dstMemoryType == hipMemoryTypeArray)) {
    // Device to Image.
    return ihipMemcpyDtoACommand(command, pCopy->srcDevice, pCopy->dstArray, srcOrigin, dstOrigin,
                                 copyRegion, pCopy->srcPitch, pCopy->srcPitch * pCopy->srcHeight,
                                 stream);
  } else if ((srcMemoryType == hipMemoryTypeArray) && (dstMemoryType == hipMemoryTypeDevice)) {
    // Image to Device.
    return ihipMemcpyAtoDCommand(command, pCopy->srcArray, pCopy->dstDevice, srcOrigin, dstOrigin,
                                 copyRegion, pCopy->dstPitch, pCopy->dstPitch * pCopy->dstHeight,
                                 stream);
  } else if ((srcMemoryType == hipMemoryTypeArray) && (dstMemoryType == hipMemoryTypeArray)) {
    // Image to Image.
    return ihipMemcpyAtoACommand(command, pCopy->srcArray, pCopy->dstArray, srcOrigin, dstOrigin,
                                 copyRegion, stream);
  } else {
    ShouldNotReachHere();
  }
  return hipSuccess;
}
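// Editor's note: a hedged usage sketch of the dispatch above via the public driver API. A
// HIP_MEMCPY3D with a device source and a host destination routes to ihipMemcpyDtoHCommand.
// Buffer names and dimensions are hypothetical.
#if 0
#include <hip/hip_runtime.h>
#include <cstring>

void copyDeviceRectToHost(hipDeviceptr_t dSrc, size_t srcPitch, void* hDst, size_t dstPitch,
                          size_t widthBytes, size_t height) {
  HIP_MEMCPY3D p;
  std::memset(&p, 0, sizeof(p));
  p.srcMemoryType = hipMemoryTypeDevice;
  p.srcDevice = dSrc;
  p.srcPitch = srcPitch;
  p.srcHeight = height;
  p.dstMemoryType = hipMemoryTypeHost;
  p.dstHost = hDst;
  p.dstPitch = dstPitch;
  p.dstHeight = height;
  p.WidthInBytes = widthBytes;
  p.Height = height;
  p.Depth = 1;
  (void)hipDrvMemcpy3D(&p);  // lands in the Device-to-Host branch above
}
#endif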
inline hipError_t ihipMemcpyCmdEnqueue(amd::Command* command, bool isAsync = false) {
  hipError_t status = hipSuccess;
  if (command == nullptr) {
    return hipErrorOutOfMemory;
  }
  command->enqueue();
  if (!isAsync) {
    if (!command->awaitCompletion()) {
      status = hipErrorUnknown;
    }
  }
  command->release();
  return status;
}

hipError_t ihipMemcpyParam3D(const HIP_MEMCPY3D* pCopy, hipStream_t stream,
                             bool isAsync = false) {
  hipError_t status;
  size_t offset = 0;
  if (pCopy == nullptr) {
    return hipErrorInvalidValue;
  }
  if (!hip::isValid(stream)) {
    return hipErrorContextIsDestroyed;
  }
  if (pCopy->WidthInBytes == 0 || pCopy->Height == 0 || pCopy->Depth == 0) {
    LogPrintfInfo("Zero-size copy region, nothing to do. WidthInBytes: %zu, Height: %zu, "
                  "Depth: %zu", pCopy->WidthInBytes, pCopy->Height, pCopy->Depth);
    return hipSuccess;
  }
  // If {src/dst}MemoryType is hipMemoryTypeUnified, {src/dst}Device and {src/dst}Pitch specify
  // the (unified virtual address space) base address of the source data and the bytes per row to
  // apply. {src/dst}Array is ignored.
  hipMemoryType srcMemoryType = pCopy->srcMemoryType;
  if (srcMemoryType == hipMemoryTypeUnified) {
    amd::Memory* memObj = getMemoryObject(pCopy->srcDevice, offset);
    if (memObj != nullptr) {
      srcMemoryType =
          ((CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR) & memObj->getMemFlags())
          ? hipMemoryTypeHost
          : hipMemoryTypeDevice;
    } else {
      srcMemoryType = hipMemoryTypeHost;
    }
    if (srcMemoryType == hipMemoryTypeHost) {
      // {src/dst}Host may be uninitialized. Copy {src/dst}Device over into it if we detect
      // system memory.
      const_cast<HIP_MEMCPY3D*>(pCopy)->srcHost = pCopy->srcDevice;
      const_cast<HIP_MEMCPY3D*>(pCopy)->srcXInBytes += offset;
      // No need to detect the memory type again for hipMemoryTypeUnified.
      const_cast<HIP_MEMCPY3D*>(pCopy)->srcMemoryType = srcMemoryType;
    }
  }
  offset = 0;
  hipMemoryType dstMemoryType = pCopy->dstMemoryType;
  if (dstMemoryType == hipMemoryTypeUnified) {
    amd::Memory* memObj = getMemoryObject(pCopy->dstDevice, offset);
    if (memObj != nullptr) {
      dstMemoryType =
          ((CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR) & memObj->getMemFlags())
          ? hipMemoryTypeHost
          : hipMemoryTypeDevice;
    } else {
      dstMemoryType = hipMemoryTypeHost;
    }
    if (dstMemoryType == hipMemoryTypeHost) {
      const_cast<HIP_MEMCPY3D*>(pCopy)->dstHost = pCopy->dstDevice;
      const_cast<HIP_MEMCPY3D*>(pCopy)->dstXInBytes += offset;
      // No need to detect the memory type again for hipMemoryTypeUnified.
      const_cast<HIP_MEMCPY3D*>(pCopy)->dstMemoryType = dstMemoryType;
    }
  }
  // If {src/dst}MemoryType is hipMemoryTypeHost, check if the memory was prepinned.
  // In that case upgrade the copy type to hipMemoryTypeDevice to avoid extra pinning.
  offset = 0;
  if (srcMemoryType == hipMemoryTypeHost) {
    srcMemoryType =
        getMemoryObject(pCopy->srcHost, offset) ? hipMemoryTypeDevice : hipMemoryTypeHost;
  }
  if (dstMemoryType == hipMemoryTypeHost) {
    dstMemoryType =
        getMemoryObject(pCopy->dstHost, offset) ? hipMemoryTypeDevice : hipMemoryTypeHost;
  }
  if ((srcMemoryType == hipMemoryTypeHost) && (dstMemoryType == hipMemoryTypeHost)) {
    amd::Coord3D srcOrigin = {pCopy->srcXInBytes, pCopy->srcY, pCopy->srcZ};
    amd::Coord3D dstOrigin = {pCopy->dstXInBytes, pCopy->dstY, pCopy->dstZ};
    amd::Coord3D copyRegion = {pCopy->WidthInBytes, (pCopy->Height != 0) ? pCopy->Height : 1,
                               (pCopy->Depth != 0) ? pCopy->Depth : 1};
    // Host to Host.
    return ihipMemcpyHtoH(pCopy->srcHost, pCopy->dstHost, srcOrigin, dstOrigin, copyRegion,
                          pCopy->srcPitch, pCopy->srcPitch * pCopy->srcHeight, pCopy->dstPitch,
                          pCopy->dstPitch * pCopy->dstHeight, hip::getStream(stream));
  } else {
    amd::Command* command;
    hip::Stream* hip_stream = hip::getStream(stream);
    if (hip_stream == nullptr) {
      return hipErrorInvalidValue;
    }
    status = ihipGetMemcpyParam3DCommand(command, pCopy, hip_stream);
    if (status != hipSuccess) return status;
    // Transfers from device memory to pageable host memory, and transfers from any host memory
    // to any host memory, are synchronous with respect to the host. Device to Device copies do
    // not need host-side synchronization.
    if (dstMemoryType == hipMemoryTypeHost ||
        ((pCopy->srcMemoryType == hipMemoryTypeHost) &&
         (pCopy->dstMemoryType == hipMemoryTypeHost))) {
      isAsync = false;
    } else if ((pCopy->srcMemoryType == hipMemoryTypeDevice) &&
               (pCopy->dstMemoryType == hipMemoryTypeDevice)) {
      // Device to Device copies don't need to wait for host synchronization.
      isAsync = true;
    }
    return ihipMemcpyCmdEnqueue(command, isAsync);
  }
}
hipError_t ihipMemcpyParam2D(const hip_Memcpy2D* pCopy, hipStream_t stream,
                             bool isAsync = false) {
  HIP_MEMCPY3D desc = hip::getDrvMemcpy3DDesc(*pCopy);
  return ihipMemcpyParam3D(&desc, stream, isAsync);
}
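// Editor's note: ihipMemcpyParam2D simply promotes the 2D descriptor to 3D with Depth = 1. A
// hypothetical, hedged equivalent of what hip::getDrvMemcpy3DDesc() produces for it (field
// mapping only; not the runtime's actual implementation):
#if 0
HIP_MEMCPY3D to3D(const hip_Memcpy2D& c) {
  HIP_MEMCPY3D d = {};
  d.srcXInBytes = c.srcXInBytes;
  d.srcY = c.srcY;
  d.srcMemoryType = c.srcMemoryType;
  d.srcHost = c.srcHost;
  d.srcDevice = c.srcDevice;
  d.srcArray = c.srcArray;
  d.srcPitch = c.srcPitch;
  d.dstXInBytes = c.dstXInBytes;
  d.dstY = c.dstY;
  d.dstMemoryType = c.dstMemoryType;
  d.dstHost = c.dstHost;
  d.dstDevice = c.dstDevice;
  d.dstArray = c.dstArray;
  d.dstPitch = c.dstPitch;
  d.WidthInBytes = c.WidthInBytes;
  d.Height = c.Height;
  d.Depth = 1;  // a 2D copy is a single-slice 3D copy
  return d;
}
#endif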
hipError_t ihipMemcpy2D(void* dst, size_t dpitch, const void* src, size_t spitch, size_t width,
                        size_t height, hipMemcpyKind kind, hipStream_t stream,
                        bool isAsync = false) {
  hip_Memcpy2D desc = {};
  if ((width == 0) || (height == 0)) {
    return hipSuccess;
  }
  if ((width > dpitch) || (width > spitch)) {
    return hipErrorInvalidPitchValue;
  }
  desc.srcXInBytes = 0;
  desc.srcY = 0;
  desc.srcMemoryType = std::get<0>(hip::getMemoryType(kind));
  desc.srcHost = src;
  desc.srcDevice = const_cast<void*>(src);
  desc.srcArray = nullptr;  // Ignored.
  desc.srcPitch = spitch;
  desc.dstXInBytes = 0;
  desc.dstY = 0;
  desc.dstMemoryType = std::get<1>(hip::getMemoryType(kind));
  desc.dstHost = dst;
  desc.dstDevice = dst;
  desc.dstArray = nullptr;  // Ignored.
  desc.dstPitch = dpitch;
  desc.WidthInBytes = width;
  desc.Height = height;
  return ihipMemcpyParam2D(&desc, stream, isAsync);
}

hipError_t hipMemcpyParam2D(const hip_Memcpy2D* pCopy) {
  HIP_INIT_API(hipMemcpyParam2D, pCopy);
  CHECK_STREAM_CAPTURING();
  HIP_RETURN_DURATION(ihipMemcpyParam2D(pCopy, nullptr));
}

hipError_t hipMemcpy2DValidateParams(hipMemcpyKind kind, hipStream_t stream = nullptr) {
  if (kind < hipMemcpyHostToHost || kind > hipMemcpyDefault) {
    return hipErrorInvalidMemcpyDirection;
  }
  if (!hip::isValid(stream)) {
    return hipErrorInvalidValue;
  }
  return hipSuccess;
}

hipError_t hipMemcpy2DValidateBuffer(const void* buf, size_t pitch, size_t width) {
  if (buf == nullptr) {
    return hipErrorInvalidValue;
  }
  if (pitch == 0 || pitch < width) {
    return hipErrorInvalidPitchValue;
  }
  return hipSuccess;
}

hipError_t hipMemcpy2DValidateArray(hipArray_const_t arr, size_t wOffset, size_t hOffset,
                                    size_t width, size_t height) {
  if (arr == nullptr) {
    return hipErrorInvalidHandle;
  }
  int FormatSize = hip::getElementSize(arr);
  if ((width + wOffset) > (arr->width * FormatSize)) {
    return hipErrorInvalidValue;
  }
  if (arr->height == 0) {  // 1D hipArray
    if (height + hOffset > 1) {
      return hipErrorInvalidValue;
    }
  } else if ((height + hOffset) > (arr->height)) {  // 2D hipArray
    return hipErrorInvalidValue;
  }
  return hipSuccess;
}

hipError_t hipMemcpy2D_common(void* dst, size_t dpitch, const void* src, size_t spitch,
                              size_t width, size_t height, hipMemcpyKind kind,
                              hipStream_t stream = nullptr, bool isAsync = false) {
  hipError_t validateParams = hipSuccess, validateSrc = hipSuccess, validateDst = hipSuccess;
  if ((validateParams = hipMemcpy2DValidateParams(kind, stream)) != hipSuccess) {
    return validateParams;
  }
  if ((validateSrc = hipMemcpy2DValidateBuffer(src, spitch, width)) != hipSuccess) {
    return validateSrc;
  }
  if ((validateDst = hipMemcpy2DValidateBuffer(dst, dpitch, width)) != hipSuccess) {
    return validateDst;
  }
  return ihipMemcpy2D(dst, dpitch, src, spitch, width, height, kind, stream, isAsync);
}

hipError_t hipMemcpy2D(void* dst, size_t dpitch, const void* src, size_t spitch, size_t width,
                       size_t height, hipMemcpyKind kind) {
  HIP_INIT_API(hipMemcpy2D, dst, dpitch, src, spitch, width, height, kind);
  CHECK_STREAM_CAPTURING();
  HIP_RETURN_DURATION(hipMemcpy2D_common(dst, dpitch, src, spitch, width, height, kind));
}

hipError_t hipMemcpy2D_spt(void* dst, size_t dpitch, const void* src, size_t spitch, size_t width,
                           size_t height, hipMemcpyKind kind) {
  HIP_INIT_API(hipMemcpy2D, dst, dpitch, src, spitch, width, height, kind);
  CHECK_STREAM_CAPTURING();
  HIP_RETURN_DURATION(hipMemcpy2D_common(dst, dpitch, src, spitch, width, height, kind,
                                         getPerThreadDefaultStream()));
}

hipError_t hipMemcpy2DAsync(void* dst, size_t dpitch, const void* src, size_t spitch,
                            size_t width, size_t height, hipMemcpyKind kind, hipStream_t stream) {
  HIP_INIT_API(hipMemcpy2DAsync, dst, dpitch, src, spitch, width, height, kind, stream);
  STREAM_CAPTURE(hipMemcpy2DAsync, stream, dst, dpitch, src, spitch, width, height, kind);
  HIP_RETURN_DURATION(hipMemcpy2D_common(dst, dpitch, src, spitch, width, height, kind, stream,
                                         true));
}

hipError_t hipMemcpy2DAsync_spt(void* dst, size_t dpitch, const void* src, size_t spitch,
                                size_t width, size_t height, hipMemcpyKind kind,
                                hipStream_t stream) {
  HIP_INIT_API(hipMemcpy2DAsync, dst, dpitch, src, spitch, width, height, kind, stream);
  PER_THREAD_DEFAULT_STREAM(stream);
  STREAM_CAPTURE(hipMemcpy2DAsync, stream, dst, dpitch, src, spitch, width, height, kind);
  HIP_RETURN_DURATION(hipMemcpy2D_common(dst, dpitch, src, spitch, width, height, kind, stream,
                                         true));
}
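// Editor's note: a hedged usage sketch of the 2D entry points above. width must not exceed
// either pitch, which is why hipMallocPitch's returned pitch is passed straight through. All
// names and sizes are illustrative.
#if 0
#include <hip/hip_runtime.h>

void upload2D(const float* hostRows, size_t width, size_t height, hipStream_t stream) {
  void* dPtr = nullptr;
  size_t dPitch = 0;
  if (hipMallocPitch(&dPtr, &dPitch, width * sizeof(float), height) != hipSuccess) return;
  // spitch is the tightly packed host pitch; dpitch is the padded device pitch.
  (void)hipMemcpy2DAsync(dPtr, dPitch, hostRows, width * sizeof(float), width * sizeof(float),
                         height, hipMemcpyHostToDevice, stream);
  (void)hipStreamSynchronize(stream);
  (void)hipFree(dPtr);
}
#endif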
hipError_t ihipMemcpy2DToArray(hipArray_t dst, size_t wOffset, size_t hOffset, const void* src,
                               size_t spitch, size_t width, size_t height, hipMemcpyKind kind,
                               hipStream_t stream, bool isAsync = false) {
  if (dst == nullptr) {
    HIP_RETURN(hipErrorInvalidResourceHandle);
  }
  hip_Memcpy2D desc = {};
  desc.srcXInBytes = 0;
  desc.srcY = 0;
  desc.srcMemoryType = std::get<0>(hip::getMemoryType(kind));
  desc.srcHost = src;
  desc.srcDevice = const_cast<void*>(src);
  desc.srcArray = nullptr;
  desc.srcPitch = spitch;
  desc.dstXInBytes = wOffset;
  desc.dstY = hOffset;
  desc.dstMemoryType = hipMemoryTypeArray;
  desc.dstHost = nullptr;
  desc.dstDevice = nullptr;
  desc.dstArray = dst;
  desc.dstPitch = 0;  // Ignored.
  desc.WidthInBytes = width;
  desc.Height = height;
  return ihipMemcpyParam2D(&desc, stream, isAsync);
}

hipError_t hipMemcpy2DToArray_common(hipArray* dst, size_t wOffset, size_t hOffset,
                                     const void* src, size_t spitch, size_t width, size_t height,
                                     hipMemcpyKind kind, hipStream_t stream = nullptr,
                                     bool isAsync = false) {
  hipError_t validateParams = hipSuccess, validateSrc = hipSuccess, validateDst = hipSuccess;
  if ((validateParams = hipMemcpy2DValidateParams(kind, stream)) != hipSuccess) {
    return validateParams;
  }
  if ((validateSrc = hipMemcpy2DValidateBuffer(src, spitch, width)) != hipSuccess) {
    return validateSrc;
  }
  if ((validateDst = hipMemcpy2DValidateArray(dst, wOffset, hOffset, width, height)) !=
      hipSuccess) {
    return validateDst;
  }
  return ihipMemcpy2DToArray(dst, wOffset, hOffset, src, spitch, width, height, kind, stream,
                             isAsync);
}

hipError_t hipMemcpy2DToArray(hipArray* dst, size_t wOffset, size_t hOffset, const void* src,
                              size_t spitch, size_t width, size_t height, hipMemcpyKind kind) {
  HIP_INIT_API(hipMemcpy2DToArray, dst, wOffset, hOffset, src, spitch, width, height, kind);
  CHECK_STREAM_CAPTURING();
  HIP_RETURN_DURATION(hipMemcpy2DToArray_common(dst, wOffset, hOffset, src, spitch, width,
                                                height, kind));
}

hipError_t hipMemcpy2DToArray_spt(hipArray* dst, size_t wOffset, size_t hOffset, const void* src,
                                  size_t spitch, size_t width, size_t height, hipMemcpyKind kind) {
  HIP_INIT_API(hipMemcpy2DToArray, dst, wOffset, hOffset, src, spitch, width, height, kind);
  CHECK_STREAM_CAPTURING();
  HIP_RETURN_DURATION(hipMemcpy2DToArray_common(dst, wOffset, hOffset, src, spitch, width,
                                                height, kind, getPerThreadDefaultStream()));
}

hipError_t hipMemcpyToArray(hipArray* dst, size_t wOffset, size_t hOffset, const void* src,
                            size_t count, hipMemcpyKind kind) {
  HIP_INIT_API(hipMemcpyToArray, dst, wOffset, hOffset, src, count, kind);
  CHECK_STREAM_CAPTURING();
  if (dst == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  const size_t arrayHeight = (dst->height != 0) ? dst->height : 1;
  const size_t widthInBytes = count / arrayHeight;
  const size_t height = (count / dst->width) / hip::getElementSize(dst);
  HIP_RETURN_DURATION(ihipMemcpy2DToArray(dst, wOffset, hOffset, src, 0 /* spitch */,
                                          widthInBytes, height, kind, nullptr));
}
hipError_t ihipMemcpy2DFromArray(void* dst, size_t dpitch, hipArray_const_t src,
                                 size_t wOffsetSrc, size_t hOffsetSrc, size_t width,
                                 size_t height, hipMemcpyKind kind, hipStream_t stream,
                                 bool isAsync = false) {
  if (src == nullptr) {
    HIP_RETURN(hipErrorInvalidResourceHandle);
  }
  hip_Memcpy2D desc = {};
  desc.srcXInBytes = wOffsetSrc;
  desc.srcY = hOffsetSrc;
  desc.srcMemoryType = hipMemoryTypeArray;
  desc.srcHost = nullptr;
  desc.srcDevice = nullptr;
  desc.srcArray = const_cast<hipArray_t>(src);
  desc.srcPitch = 0;  // Ignored.
  desc.dstXInBytes = 0;
  desc.dstY = 0;
  desc.dstMemoryType = std::get<1>(hip::getMemoryType(kind));
  desc.dstHost = dst;
  desc.dstDevice = dst;
  desc.dstArray = nullptr;
  desc.dstPitch = dpitch;
  desc.WidthInBytes = width;
  desc.Height = height;
  return ihipMemcpyParam2D(&desc, stream, isAsync);
}

hipError_t hipMemcpyFromArray_common(void* dst, hipArray_const_t src, size_t wOffsetSrc,
                                     size_t hOffset, size_t count, hipMemcpyKind kind,
                                     hipStream_t stream) {
  CHECK_STREAM_CAPTURING();
  if (src == nullptr) {
    return hipErrorInvalidValue;
  }
  const size_t arrayHeight = (src->height != 0) ? src->height : 1;
  const size_t widthInBytes = count / arrayHeight;
  const size_t height = (count / src->width) / hip::getElementSize(src);
  return ihipMemcpy2DFromArray(dst, 0 /* dpitch */, src, wOffsetSrc, hOffset, widthInBytes,
                               height, kind, stream);
}

hipError_t hipMemcpyFromArray(void* dst, hipArray_const_t src, size_t wOffsetSrc, size_t hOffset,
                              size_t count, hipMemcpyKind kind) {
  HIP_INIT_API(hipMemcpyFromArray, dst, src, wOffsetSrc, hOffset, count, kind);
  HIP_RETURN_DURATION(hipMemcpyFromArray_common(dst, src, wOffsetSrc, hOffset, count, kind,
                                                nullptr));
}

hipError_t hipMemcpyFromArray_spt(void* dst, hipArray_const_t src, size_t wOffsetSrc,
                                  size_t hOffset, size_t count, hipMemcpyKind kind) {
  HIP_INIT_API(hipMemcpyFromArray, dst, src, wOffsetSrc, hOffset, count, kind);
  HIP_RETURN_DURATION(hipMemcpyFromArray_common(dst, src, wOffsetSrc, hOffset, count, kind,
                                                getPerThreadDefaultStream()));
}

hipError_t ihipMemcpyAtoD(hipArray* srcArray, void* dstDevice, amd::Coord3D srcOrigin,
                          amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t dstRowPitch,
                          size_t dstSlicePitch, hipStream_t stream, bool isAsync = false) {
  amd::Command* command;
  hip::Stream* hip_stream = hip::getStream(stream);
  if (hip_stream == nullptr) {
    return hipErrorInvalidValue;
  }
  hipError_t status = ihipMemcpyAtoDCommand(command, srcArray, dstDevice, srcOrigin, dstOrigin,
                                            copyRegion, dstRowPitch, dstSlicePitch, hip_stream);
  if (status != hipSuccess) return status;
  return ihipMemcpyCmdEnqueue(command, isAsync);
}

hipError_t ihipMemcpyDtoA(void* srcDevice, hipArray* dstArray, amd::Coord3D srcOrigin,
                          amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t srcRowPitch,
                          size_t srcSlicePitch, hipStream_t stream, bool isAsync = false) {
  amd::Command* command;
  hip::Stream* hip_stream = hip::getStream(stream);
  if (hip_stream == nullptr) {
    return hipErrorInvalidValue;
  }
  hipError_t status = ihipMemcpyDtoACommand(command, srcDevice, dstArray, srcOrigin, dstOrigin,
                                            copyRegion, srcRowPitch, srcSlicePitch, hip_stream);
  if (status != hipSuccess) return status;
  return ihipMemcpyCmdEnqueue(command, isAsync);
}

hipError_t ihipMemcpyDtoD(void* srcDevice, void* dstDevice, amd::Coord3D srcOrigin,
                          amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t srcRowPitch,
                          size_t srcSlicePitch, size_t dstRowPitch, size_t dstSlicePitch,
                          hipStream_t stream, bool isAsync = false) {
  amd::Command* command;
  hip::Stream* hip_stream = hip::getStream(stream);
  if (hip_stream == nullptr) {
    return hipErrorInvalidValue;
  }
  hipError_t status =
      ihipMemcpyDtoDCommand(command, srcDevice, dstDevice, srcOrigin, dstOrigin, copyRegion,
                            srcRowPitch, srcSlicePitch, dstRowPitch, dstSlicePitch, hip_stream);
  if (status != hipSuccess) return status;
  return ihipMemcpyCmdEnqueue(command, isAsync);
}
hipError_t ihipMemcpyDtoH(void* srcDevice, void* dstHost, amd::Coord3D srcOrigin,
                          amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t srcRowPitch,
                          size_t srcSlicePitch, size_t dstRowPitch, size_t dstSlicePitch,
                          hipStream_t stream, bool isAsync = false) {
  amd::Command* command;
  hip::Stream* hip_stream = hip::getStream(stream);
  if (hip_stream == nullptr) {
    return hipErrorInvalidValue;
  }
  hipError_t status =
      ihipMemcpyDtoHCommand(command, srcDevice, dstHost, srcOrigin, dstOrigin, copyRegion,
                            srcRowPitch, srcSlicePitch, dstRowPitch, dstSlicePitch, hip_stream,
                            isAsync);
  if (status != hipSuccess) return status;
  return ihipMemcpyCmdEnqueue(command, isAsync);
}

hipError_t ihipMemcpyHtoD(const void* srcHost, void* dstDevice, amd::Coord3D srcOrigin,
                          amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t srcRowPitch,
                          size_t srcSlicePitch, size_t dstRowPitch, size_t dstSlicePitch,
                          hipStream_t stream, bool isAsync = false) {
  amd::Command* command;
  hip::Stream* hip_stream = hip::getStream(stream);
  if (hip_stream == nullptr) {
    return hipErrorInvalidValue;
  }
  hipError_t status =
      ihipMemcpyHtoDCommand(command, srcHost, dstDevice, srcOrigin, dstOrigin, copyRegion,
                            srcRowPitch, srcSlicePitch, dstRowPitch, dstSlicePitch, hip_stream,
                            isAsync);
  if (status != hipSuccess) return status;
  return ihipMemcpyCmdEnqueue(command, isAsync);
}

hipError_t ihipMemcpyAtoA(hipArray* srcArray, hipArray* dstArray, amd::Coord3D srcOrigin,
                          amd::Coord3D dstOrigin, amd::Coord3D copyRegion, hipStream_t stream,
                          bool isAsync = false) {
  amd::Command* command;
  hip::Stream* hip_stream = hip::getStream(stream);
  if (hip_stream == nullptr) {
    return hipErrorInvalidValue;
  }
  hipError_t status = ihipMemcpyAtoACommand(command, srcArray, dstArray, srcOrigin, dstOrigin,
                                            copyRegion, hip_stream);
  if (status != hipSuccess) return status;
  return ihipMemcpyCmdEnqueue(command, isAsync);
}

hipError_t ihipMemcpyHtoA(const void* srcHost, hipArray* dstArray, amd::Coord3D srcOrigin,
                          amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t srcRowPitch,
                          size_t srcSlicePitch, hipStream_t stream, bool isAsync = false) {
  amd::Command* command;
  hip::Stream* hip_stream = hip::getStream(stream);
  if (hip_stream == nullptr) {
    return hipErrorInvalidValue;
  }
  hipError_t status = ihipMemcpyHtoACommand(command, srcHost, dstArray, srcOrigin, dstOrigin,
                                            copyRegion, srcRowPitch, srcSlicePitch, hip_stream,
                                            isAsync);
  if (status != hipSuccess) return status;
  return ihipMemcpyCmdEnqueue(command, isAsync);
}

hipError_t ihipMemcpyAtoH(hipArray* srcArray, void* dstHost, amd::Coord3D srcOrigin,
                          amd::Coord3D dstOrigin, amd::Coord3D copyRegion, size_t dstRowPitch,
                          size_t dstSlicePitch, hipStream_t stream, bool isAsync = false) {
  amd::Command* command;
  hip::Stream* hip_stream = hip::getStream(stream);
  if (hip_stream == nullptr) {
    return hipErrorInvalidValue;
  }
  hipError_t status = ihipMemcpyAtoHCommand(command, srcArray, dstHost, srcOrigin, dstOrigin,
                                            copyRegion, dstRowPitch, dstSlicePitch, hip_stream,
                                            isAsync);
  if (status != hipSuccess) return status;
  return ihipMemcpyCmdEnqueue(command, isAsync);
}

hipError_t hipMemcpyHtoA(hipArray* dstArray, size_t dstOffset, const void* srcHost,
                         size_t ByteCount) {
  HIP_INIT_API(hipMemcpyHtoA, dstArray, dstOffset, srcHost, ByteCount);
  CHECK_STREAM_CAPTURING();
  HIP_RETURN_DURATION(ihipMemcpyHtoA(srcHost, dstArray, {0, 0, 0}, {dstOffset, 0, 0},
                                     {ByteCount, 1, 1}, 0, 0, nullptr));
}

hipError_t hipMemcpyAtoH(void* dstHost, hipArray* srcArray, size_t srcOffset, size_t ByteCount) {
  HIP_INIT_API(hipMemcpyAtoH, dstHost, srcArray, srcOffset, ByteCount);
  CHECK_STREAM_CAPTURING();
  HIP_RETURN_DURATION(ihipMemcpyAtoH(srcArray, dstHost, {srcOffset, 0, 0}, {0, 0, 0},
                                     {ByteCount, 1, 1}, 0, 0, nullptr));
}
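// Editor's note: hipMemcpyHtoA/hipMemcpyAtoH above take a byte offset into the array and a byte
// count, which the runtime expresses as a {ByteCount, 1, 1} region. A hedged round-trip sketch
// (names are illustrative):
#if 0
#include <hip/hip_runtime.h>

void roundTrip(hipArray_t array, const unsigned char* src, unsigned char* dst, size_t bytes) {
  (void)hipMemcpyHtoA(array, 0 /* dstOffset in bytes */, src, bytes);
  (void)hipMemcpyAtoH(dst, array, 0 /* srcOffset in bytes */, bytes);
}
#endif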
hipError_t ihipMemcpy3D_validate(const hipMemcpy3DParms* p) {
  // Passing more than one non-zero source or destination will cause hipMemcpy3D() to return an
  // error.
  if (p == nullptr || ((p->srcArray != nullptr) && (p->srcPtr.ptr != nullptr)) ||
      ((p->dstArray != nullptr) && (p->dstPtr.ptr != nullptr))) {
    return hipErrorInvalidValue;
  }
  // The struct passed to hipMemcpy3D() must specify one of srcArray or srcPtr and one of
  // dstArray or dstPtr.
  if (((p->srcArray == nullptr) && (p->srcPtr.ptr == nullptr)) ||
      ((p->dstArray == nullptr) && (p->dstPtr.ptr == nullptr))) {
    return hipErrorInvalidValue;
  }
  // If the source and destination are both arrays, hipMemcpy3D() will return an error if they
  // do not have the same element size.
  if (((p->srcArray != nullptr) && (p->dstArray != nullptr)) &&
      (hip::getElementSize(p->srcArray) != hip::getElementSize(p->dstArray))) {
    return hipErrorInvalidValue;
  }
  // Pitch should not be less than width for both src and dst.
  if (p->srcPtr.pitch < p->srcPtr.xsize || p->dstPtr.pitch < p->dstPtr.xsize) {
    return hipErrorInvalidPitchValue;
  }
  // dst/src pitch must be less than the max pitch.
  auto* deviceHandle = g_devices[hip::getCurrentDevice()->deviceId()]->devices()[0];
  const auto& info = deviceHandle->info();
  constexpr auto int32_max = static_cast<size_t>(std::numeric_limits<int32_t>::max());
  auto maxPitch = std::min(info.maxMemAllocSize_, int32_max);
  // Also catches negative pitches that wrapped around to huge unsigned values.
  if (p->dstPtr.pitch >= maxPitch || p->srcPtr.pitch >= maxPitch) {
    return hipErrorInvalidValue;
  }
  if (p->dstArray == nullptr && p->srcArray == nullptr) {
    if ((p->extent.width + p->dstPos.x > p->dstPtr.pitch) ||
        (p->extent.width + p->srcPos.x > p->srcPtr.pitch)) {
      return hipErrorInvalidValue;
    }
    auto totalExtentBytes = p->extent.width * p->extent.height * p->extent.depth;
    // Get the memory object of each pitched pointer.
    size_t offset = 0;
    amd::Memory* srcPtrMemObj = getMemoryObject(p->srcPtr.ptr, offset);
    amd::Memory* dstPtrMemObj = getMemoryObject(p->dstPtr.ptr, offset);
    if (dstPtrMemObj != nullptr && (p->dstPtr.xsize != 0 && p->dstPtr.ysize != 0)) {
      // Use the memory object to get the 3D data.
      const auto& dstUsrData = dstPtrMemObj->getUserData();
      // dst ptr out-of-bound cases for linear memory.
      if (dstUsrData.pitch_ == 0 || dstUsrData.height_ == 0 || dstUsrData.depth_ == 0) {
        auto dstDepth = dstPtrMemObj->getSize() / (p->dstPtr.xsize * p->dstPtr.ysize);
        if ((p->dstPtr.xsize * (p->dstPtr.ysize - p->dstPos.y) * (dstDepth - p->dstPos.z)) <
            totalExtentBytes) {
          return hipErrorInvalidValue;
        }
        // Out-of-bound dst ptr for 3D memory.
      } else if ((dstUsrData.pitch_ * (dstUsrData.height_ - p->dstPos.y) *
                  (dstUsrData.depth_ - p->dstPos.z)) < totalExtentBytes) {
        return hipErrorInvalidValue;
      }
    }
    if (srcPtrMemObj != nullptr && (p->srcPtr.xsize != 0 && p->srcPtr.ysize != 0)) {
      const auto& srcUsrData = srcPtrMemObj->getUserData();
      // src ptr out-of-bound cases for linear memory.
      if (srcUsrData.pitch_ == 0 || srcUsrData.height_ == 0 || srcUsrData.depth_ == 0) {
        auto srcDepth = srcPtrMemObj->getSize() / (p->srcPtr.xsize * p->srcPtr.ysize);
        if ((p->srcPtr.xsize * (p->srcPtr.ysize - p->srcPos.y) * (srcDepth - p->srcPos.z)) <
            totalExtentBytes) {
          return hipErrorInvalidValue;
        }
        // Out-of-bound src ptr for 3D memory.
      } else if ((srcUsrData.pitch_ * (srcUsrData.height_ - p->srcPos.y) *
                  (srcUsrData.depth_ - p->srcPos.z)) < totalExtentBytes) {
        return hipErrorInvalidValue;
      }
    }
  }
  if (p->kind < hipMemcpyHostToHost || p->kind > hipMemcpyDefault) {
    return hipErrorInvalidMemcpyDirection;
  }
  // If src and dst are host pointers, kind must be hipMemcpyHostToHost or hipMemcpyDefault.
  if (!IsHtoHMemcpyValid(p->dstPtr.ptr, p->srcPtr.ptr, p->kind)) {
    return hipErrorInvalidValue;
  }
  return hipSuccess;
}

hipError_t ihipMemcpy3DCommand(amd::Command*& command, const hipMemcpy3DParms* p,
                               hip::Stream* stream) {
  const HIP_MEMCPY3D desc = hip::getDrvMemcpy3DDesc(*p);
  return ihipGetMemcpyParam3DCommand(command, &desc, stream);
}

hipError_t ihipMemcpy3D(const hipMemcpy3DParms* p, hipStream_t stream, bool isAsync = false) {
  hipError_t status = ihipMemcpy3D_validate(p);
  if (status != hipSuccess) {
    return status;
  }
  const HIP_MEMCPY3D desc = hip::getDrvMemcpy3DDesc(*p);
  return ihipMemcpyParam3D(&desc, stream, isAsync);
}

hipError_t hipMemcpy3D_common(const hipMemcpy3DParms* p, hipStream_t stream = nullptr) {
  CHECK_STREAM_CAPTURING();
  return ihipMemcpy3D(p, stream);
}

hipError_t hipMemcpy3D(const hipMemcpy3DParms* p) {
  HIP_INIT_API(hipMemcpy3D, p);
  HIP_RETURN_DURATION(hipMemcpy3D_common(p));
}

hipError_t hipMemcpy3D_spt(const hipMemcpy3DParms* p) {
  HIP_INIT_API(hipMemcpy3D, p);
  HIP_RETURN_DURATION(hipMemcpy3D_common(p, getPerThreadDefaultStream()));
}

hipError_t hipMemcpy3DAsync_common(const hipMemcpy3DParms* p, hipStream_t stream) {
  STREAM_CAPTURE(hipMemcpy3DAsync, stream, p);
  return ihipMemcpy3D(p, stream, true);
}

hipError_t hipMemcpy3DAsync(const hipMemcpy3DParms* p, hipStream_t stream) {
  HIP_INIT_API(hipMemcpy3DAsync, p, stream);
  HIP_RETURN_DURATION(hipMemcpy3DAsync_common(p, stream));
}

hipError_t hipMemcpy3DAsync_spt(const hipMemcpy3DParms* p, hipStream_t stream) {
  HIP_INIT_API(hipMemcpy3DAsync, p, stream);
  PER_THREAD_DEFAULT_STREAM(stream);
  HIP_RETURN_DURATION(hipMemcpy3DAsync_common(p, stream));
}

hipError_t hipDrvMemcpy3D(const HIP_MEMCPY3D* pCopy) {
  HIP_INIT_API(hipDrvMemcpy3D, pCopy);
  CHECK_STREAM_CAPTURING();
  HIP_RETURN_DURATION(ihipMemcpyParam3D(pCopy, nullptr));
}

hipError_t hipDrvMemcpy3DAsync(const HIP_MEMCPY3D* pCopy, hipStream_t stream) {
  HIP_INIT_API(hipDrvMemcpy3DAsync, pCopy, stream);
  HIP_RETURN_DURATION(ihipMemcpyParam3D(pCopy, stream, true));
}

hipError_t packFillMemoryCommand(amd::Command*& command, amd::Memory* memory, size_t offset,
                                 int64_t value, size_t valueSize, size_t sizeBytes,
                                 hip::Stream* stream) {
  if ((memory == nullptr) || (stream == nullptr)) {
    return hipErrorInvalidValue;
  }
  amd::Command::EventWaitList waitList;
  amd::Coord3D fillOffset(offset, 0, 0);
  amd::Coord3D fillSize(sizeBytes, 1, 1);
  // surface = [pitch, width, height]
  amd::Coord3D surface(sizeBytes, sizeBytes, 1);
  amd::FillMemoryCommand* fillMemCommand =
      new amd::FillMemoryCommand(*stream, CL_COMMAND_FILL_BUFFER, waitList, *memory->asBuffer(),
                                 &value, valueSize, fillOffset, fillSize, surface);
  if (fillMemCommand == nullptr) {
    return hipErrorOutOfMemory;
  }
  if (!fillMemCommand->validatePeerMemory()) {
    delete fillMemCommand;
    return hipErrorInvalidValue;
  }
  command = fillMemCommand;
  return hipSuccess;
}
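// Editor's note: packFillMemoryCommand fills sizeBytes with a valueSize-wide repeating pattern;
// a hipMemsetD32 of N words therefore becomes (value, valueSize = 4) over N * 4 bytes. A hedged,
// self-contained host-side illustration of the same pattern expansion (not runtime code):
#if 0
#include <cstddef>
#include <cstdint>

void patternFill32(uint32_t* dst, uint32_t value, size_t count) {
  for (size_t i = 0; i < count; ++i) {
    dst[i] = value;  // 4-byte pattern, repeated count times
  }
}
#endif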
hipError_t ihipMemset_validate(void* dst, int64_t value, size_t valueSize, size_t sizeBytes) {
  if (sizeBytes == 0) {
    // Skip if nothing needs filling.
    return hipSuccess;
  }
  if (dst == nullptr) {
    return hipErrorInvalidValue;
  }
  size_t offset = 0;
  amd::Memory* memory = getMemoryObject(dst, offset);
  if (memory == nullptr) {
    // dst is a plain host pointer, which is an error here.
    return hipErrorInvalidValue;
  }
  // Return an error if the requested sizeBytes exceeds the actual allocation.
  if (sizeBytes > (memory->getSize() - offset)) {
    return hipErrorInvalidValue;
  }
  return hipSuccess;
}

hipError_t ihipGraphMemsetParams_validate(const hipMemsetParams* pNodeParams) {
  if (pNodeParams == nullptr) {
    return hipErrorInvalidValue;
  }
  if (pNodeParams->width == 0) {
    return hipErrorInvalidValue;
  }
  if (pNodeParams->elementSize != 1 && pNodeParams->elementSize != 2 &&
      pNodeParams->elementSize != 4) {
    return hipErrorInvalidValue;
  }
  if (pNodeParams->height <= 0) {
    return hipErrorInvalidValue;
  }
  size_t discardOffset = 0;
  amd::Memory* memObj = getMemoryObject(pNodeParams->dst, discardOffset);
  if (memObj != nullptr) {
    if ((pNodeParams->pitch * pNodeParams->height) > memObj->getSize()) {
      return hipErrorInvalidValue;
    }
  }
  return hipSuccess;
}
hipError_t ihipMemsetCommand(std::vector<amd::Command*>& commands, void* dst, int64_t value,
                             size_t valueSize, size_t sizeBytes, hip::Stream* stream) {
  size_t offset = 0;
  amd::Memory* memory = getMemoryObject(dst, offset);
  amd::Command* command = nullptr;
  hipError_t hip_error =
      packFillMemoryCommand(command, memory, offset, value, valueSize, sizeBytes, stream);
  if (hip_error == hipSuccess) {
    commands.push_back(command);
  }
  return hip_error;
}

hipError_t ihipMemset(void* dst, int64_t value, size_t valueSize, size_t sizeBytes,
                      hipStream_t stream, bool isAsync = false) {
  hipError_t hip_error = hipSuccess;
  do {
    // Nothing to do, fill size is 0. Returns hipSuccess.
    if (sizeBytes == 0) {
      break;
    }
    // In case of validation failure stop processing. Returns hip_error.
    hip_error = ihipMemset_validate(dst, value, valueSize, sizeBytes);
    if (hip_error != hipSuccess) {
      break;
    }
    // Required to comply with the spec: hipMemset is asynchronous when the destination is
    // device memory and the pointer is not offset into an allocation.
    if (isAsync == false) {
      size_t offset = 0;
      amd::Memory* memObj = getMemoryObject(dst, offset);
      auto flags = memObj->getMemFlags();
      if ((memObj->getUserData().sync_mem_ops_) ||
          (offset == 0 &&
           !(flags &
             (CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS | CL_MEM_USE_HOST_PTR)))) {
        isAsync = true;
      }
    }
    std::vector<amd::Command*> commands;
    hip::Stream* hip_stream = hip::getStream(stream);
    hip_error = ihipMemsetCommand(commands, dst, value, valueSize, sizeBytes, hip_stream);
    if (hip_error != hipSuccess) {
      break;
    }
    for (auto command : commands) {
      command->enqueue();
      if (!isAsync) {
        command->awaitCompletion();
      }
      command->release();
    }
  } while (0);
  return hip_error;
}

hipError_t hipMemset_common(void* dst, int value, size_t sizeBytes,
                            hipStream_t stream = nullptr) {
  CHECK_STREAM_CAPTURING();
  return ihipMemset(dst, value, sizeof(int8_t), sizeBytes, stream);
}

hipError_t hipMemset_spt(void* dst, int value, size_t sizeBytes) {
  HIP_INIT_API(hipMemset, dst, value, sizeBytes);
  HIP_RETURN(hipMemset_common(dst, value, sizeBytes, getPerThreadDefaultStream()));
}

hipError_t hipMemset(void* dst, int value, size_t sizeBytes) {
  HIP_INIT_API(hipMemset, dst, value, sizeBytes);
  HIP_RETURN(hipMemset_common(dst, value, sizeBytes));
}

hipError_t hipMemsetAsync_common(void* dst, int value, size_t sizeBytes, hipStream_t stream) {
  size_t valueSize = sizeof(int8_t);
  STREAM_CAPTURE(hipMemsetAsync, stream, dst, value, valueSize, sizeBytes);
  return ihipMemset(dst, value, sizeof(int8_t), sizeBytes, stream, true);
}

hipError_t hipMemsetAsync(void* dst, int value, size_t sizeBytes, hipStream_t stream) {
  HIP_INIT_API(hipMemsetAsync, dst, value, sizeBytes, stream);
  HIP_RETURN(hipMemsetAsync_common(dst, value, sizeBytes, stream));
}

hipError_t hipMemsetAsync_spt(void* dst, int value, size_t sizeBytes, hipStream_t stream) {
  HIP_INIT_API(hipMemsetAsync, dst, value, sizeBytes, stream);
  PER_THREAD_DEFAULT_STREAM(stream);
  HIP_RETURN(hipMemsetAsync_common(dst, value, sizeBytes, stream));
}

hipError_t hipMemsetD8(hipDeviceptr_t dst, unsigned char value, size_t count) {
  HIP_INIT_API(hipMemsetD8, dst, value, count);
  CHECK_STREAM_CAPTURING();
  HIP_RETURN(ihipMemset(dst, value, sizeof(int8_t), count * sizeof(int8_t), nullptr));
}

hipError_t hipMemsetD8Async(hipDeviceptr_t dst, unsigned char value, size_t count,
                            hipStream_t stream) {
  HIP_INIT_API(hipMemsetD8Async, dst, value, count, stream);
  int iValue = value;
  size_t valueSize = sizeof(int8_t);
  size_t sizeBytes = count * sizeof(int8_t);
  STREAM_CAPTURE(hipMemsetAsync, stream, dst, iValue, valueSize, sizeBytes);
  HIP_RETURN(ihipMemset(dst, value, valueSize, sizeBytes, stream, true));
}
hipError_t hipMemsetD16(hipDeviceptr_t dst, unsigned short value, size_t count) {
  HIP_INIT_API(hipMemsetD16, dst, value, count);
  CHECK_STREAM_CAPTURING();
  HIP_RETURN(ihipMemset(dst, value, sizeof(int16_t), count * sizeof(int16_t), nullptr));
}

hipError_t hipMemsetD16Async(hipDeviceptr_t dst, unsigned short value, size_t count,
                             hipStream_t stream) {
  HIP_INIT_API(hipMemsetD16Async, dst, value, count, stream);
  int iValue = value;
  size_t valueSize = sizeof(int16_t);
  size_t sizeBytes = count * sizeof(int16_t);
  STREAM_CAPTURE(hipMemsetAsync, stream, dst, iValue, valueSize, sizeBytes);
  HIP_RETURN(ihipMemset(dst, value, valueSize, sizeBytes, stream, true));
}

hipError_t hipMemsetD32(hipDeviceptr_t dst, int value, size_t count) {
  HIP_INIT_API(hipMemsetD32, dst, value, count);
  CHECK_STREAM_CAPTURING();
  HIP_RETURN(ihipMemset(dst, value, sizeof(int32_t), count * sizeof(int32_t), nullptr));
}

hipError_t hipMemsetD32Async(hipDeviceptr_t dst, int value, size_t count, hipStream_t stream) {
  HIP_INIT_API(hipMemsetD32Async, dst, value, count, stream);
  int iValue = value;
  size_t valueSize = sizeof(int32_t);
  size_t sizeBytes = count * sizeof(int32_t);
  STREAM_CAPTURE(hipMemsetAsync, stream, dst, iValue, valueSize, sizeBytes);
  HIP_RETURN(ihipMemset(dst, value, valueSize, sizeBytes, stream, true));
}

hipError_t ihipMemset3D_validate(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent,
                                 size_t sizeBytes) {
  size_t offset = 0;
  amd::Memory* memory = getMemoryObject(pitchedDevPtr.ptr, offset, sizeBytes);
  if (memory == nullptr) {
    return hipErrorInvalidValue;
  }
  // Return an error if the requested sizeBytes exceeds the actual allocation.
  if (sizeBytes > (memory->getSize() - offset)) {
    return hipErrorInvalidValue;
  }
  if (pitchedDevPtr.pitch == memory->getUserData().pitch_) {
    if (extent.height > memory->getUserData().height_) {
      return hipErrorInvalidValue;
    }
  }
  return hipSuccess;
}
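// Editor's note: a hedged sketch of driving the 3D memset path above with a pitched allocation;
// note that extent.width is in bytes for hipMemset3D. Names and sizes are illustrative.
#if 0
#include <hip/hip_runtime.h>

void clearVolume(size_t widthBytes, size_t height, size_t depth) {
  hipPitchedPtr p = {};
  hipExtent extent = make_hipExtent(widthBytes, height, depth);
  if (hipMalloc3D(&p, extent) != hipSuccess) return;
  (void)hipMemset3D(p, 0, extent);  // validated by ihipMemset3D_validate above
  (void)hipFree(p.ptr);
}
#endif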
hipError_t ihipMemset3DCommand(std::vector<amd::Command*>& commands, hipPitchedPtr pitchedDevPtr,
                               int value, hipExtent extent, hip::Stream* stream,
                               size_t elementSize = 1) {
  size_t offset = 0;
  auto sizeBytes = extent.width * extent.height * extent.depth;
  amd::Memory* memory = getMemoryObject(pitchedDevPtr.ptr, offset);
  if (pitchedDevPtr.pitch == extent.width) {
    return ihipMemsetCommand(commands, pitchedDevPtr.ptr, value, elementSize,
                             static_cast<size_t>(sizeBytes), stream);
  }
  // Workaround for cases when pitch > row, until the fill kernel supports pitch:
  // fall back to filling one row at a time.
  amd::Coord3D origin(offset);
  amd::Coord3D region(extent.width, extent.height, extent.depth);
  amd::Coord3D surface(pitchedDevPtr.pitch, pitchedDevPtr.xsize, pitchedDevPtr.ysize);
  amd::BufferRect rect;
  if (pitchedDevPtr.pitch == 0 ||
      !rect.create(static_cast<size_t*>(origin),
                   static_cast<size_t*>(
                       amd::Coord3D{pitchedDevPtr.xsize, pitchedDevPtr.ysize, extent.depth}),
                   pitchedDevPtr.pitch, 0)) {
    return hipErrorInvalidValue;
  }
  amd::FillMemoryCommand* command = new amd::FillMemoryCommand(
      *stream, CL_COMMAND_FILL_BUFFER, amd::Command::EventWaitList{}, *memory->asBuffer(), &value,
      elementSize, origin, region, surface);
  if (command == nullptr) {
    return hipErrorOutOfMemory;
  }
  commands.push_back(command);
  return hipSuccess;
}

hipError_t ihipMemset3D(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent,
                        hipStream_t stream, bool isAsync = false) {
  auto sizeBytes = extent.width * extent.height * extent.depth;
  if (sizeBytes == 0) {
    // sizeBytes is zero, so return early: there is nothing to set.
    return hipSuccess;
  }
  hipError_t status = ihipMemset3D_validate(pitchedDevPtr, value, extent, sizeBytes);
  if (status != hipSuccess) {
    return status;
  }
  // Required to comply with the spec: hipMemset is asynchronous when the destination is
  // device memory and the pointer is not offset into an allocation.
  if (isAsync == false) {
    size_t offset = 0;
    amd::Memory* memObj = getMemoryObject(pitchedDevPtr.ptr, offset);
    auto flags = memObj->getMemFlags();
    if (offset == 0 &&
        !(flags & (CL_MEM_USE_HOST_PTR | CL_MEM_SVM_ATOMICS | CL_MEM_SVM_FINE_GRAIN_BUFFER))) {
      isAsync = true;
    }
  }
  hip::Stream* hip_stream = hip::getStream(stream);
  std::vector<amd::Command*> commands;
  status = ihipMemset3DCommand(commands, pitchedDevPtr, value, extent, hip_stream);
  if (status != hipSuccess) {
    return status;
  }
  for (auto& command : commands) {
    command->enqueue();
    if (!isAsync) {
      command->awaitCompletion();
    }
    command->release();
  }
  return hipSuccess;
}

hipError_t hipMemset2D_common(void* dst, size_t pitch, int value, size_t width, size_t height,
                              hipStream_t stream = nullptr) {
  CHECK_STREAM_CAPTURING();
  return ihipMemset3D({dst, pitch, width, height}, value, {width, height, 1}, stream);
}

hipError_t hipMemset2D_spt(void* dst, size_t pitch, int value, size_t width, size_t height) {
  HIP_INIT_API(hipMemset2D, dst, pitch, value, width, height);
  hipStream_t stream = getPerThreadDefaultStream();
  HIP_RETURN(hipMemset2D_common(dst, pitch, value, width, height, stream));
}

hipError_t hipMemset2D(void* dst, size_t pitch, int value, size_t width, size_t height) {
  HIP_INIT_API(hipMemset2D, dst, pitch, value, width, height);
  HIP_RETURN(hipMemset2D_common(dst, pitch, value, width, height));
}

hipError_t hipMemset2DAsync_common(void* dst, size_t pitch, int value, size_t width,
                                   size_t height, hipStream_t stream) {
  STREAM_CAPTURE(hipMemset2DAsync, stream, dst, pitch, value, width, height);
  return ihipMemset3D({dst, pitch, width, height}, value, {width, height, 1}, stream, true);
}

hipError_t hipMemset2DAsync(void* dst, size_t pitch, int value, size_t width, size_t height,
                            hipStream_t stream) {
  HIP_INIT_API(hipMemset2DAsync, dst, pitch, value, width, height, stream);
  HIP_RETURN(hipMemset2DAsync_common(dst, pitch, value, width, height, stream));
}

hipError_t hipMemset2DAsync_spt(void* dst, size_t pitch, int value, size_t width, size_t height,
                                hipStream_t stream) {
  HIP_INIT_API(hipMemset2DAsync, dst, pitch, value, width, height, stream);
  PER_THREAD_DEFAULT_STREAM(stream);
  HIP_RETURN(hipMemset2DAsync_common(dst, pitch, value, width, height, stream));
}
hipError_t hipMemset3D_common(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent,
                              hipStream_t stream = nullptr) {
  CHECK_STREAM_CAPTURING();
  return ihipMemset3D(pitchedDevPtr, value, extent, stream);
}

hipError_t hipMemset3D(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent) {
  HIP_INIT_API(hipMemset3D, pitchedDevPtr, value, extent);
  HIP_RETURN(hipMemset3D_common(pitchedDevPtr, value, extent));
}

hipError_t hipMemset3D_spt(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent) {
  HIP_INIT_API(hipMemset3D, pitchedDevPtr, value, extent);
  hipStream_t stream = getPerThreadDefaultStream();
  HIP_RETURN(hipMemset3D_common(pitchedDevPtr, value, extent, stream));
}

hipError_t hipMemset3DAsync_common(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent,
                                   hipStream_t stream) {
  STREAM_CAPTURE(hipMemset3DAsync, stream, pitchedDevPtr, value, extent);
  return ihipMemset3D(pitchedDevPtr, value, extent, stream, true);
}

hipError_t hipMemset3DAsync(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent,
                            hipStream_t stream) {
  HIP_INIT_API(hipMemset3DAsync, pitchedDevPtr, value, extent, stream);
  HIP_RETURN(hipMemset3DAsync_common(pitchedDevPtr, value, extent, stream));
}

hipError_t hipMemset3DAsync_spt(hipPitchedPtr pitchedDevPtr, int value, hipExtent extent,
                                hipStream_t stream) {
  HIP_INIT_API(hipMemset3DAsync, pitchedDevPtr, value, extent, stream);
  PER_THREAD_DEFAULT_STREAM(stream);
  HIP_RETURN(hipMemset3DAsync_common(pitchedDevPtr, value, extent, stream));
}

hipError_t hipMemAllocPitch(hipDeviceptr_t* dptr, size_t* pitch, size_t widthInBytes,
                            size_t height, unsigned int elementSizeBytes) {
  HIP_INIT_API(hipMemAllocPitch, dptr, pitch, widthInBytes, height, elementSizeBytes);
  CHECK_STREAM_CAPTURE_SUPPORTED();
  if (widthInBytes == 0 || height == 0) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (elementSizeBytes != 4 && elementSizeBytes != 8 && elementSizeBytes != 16) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(hipMallocPitch(dptr, pitch, widthInBytes, height));
}

hipError_t hipMemAllocHost(void** ptr, size_t size) {
  HIP_INIT_API(hipMemAllocHost, ptr, size);
  CHECK_STREAM_CAPTURE_SUPPORTED();
  HIP_RETURN_DURATION(hipHostMalloc(ptr, size, 0));
}

hipError_t hipIpcGetMemHandle(hipIpcMemHandle_t* handle, void* dev_ptr) {
  HIP_INIT_API(hipIpcGetMemHandle, handle, dev_ptr);
  amd::Device* device = nullptr;
  ihipIpcMemHandle_t* ihandle = nullptr;
  if ((handle == nullptr) || (dev_ptr == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  device = hip::getCurrentDevice()->devices()[0];
  ihandle = reinterpret_cast<ihipIpcMemHandle_t*>(handle);
  if (!device->IpcCreate(dev_ptr, &(ihandle->psize), &(ihandle->ipc_handle),
                         &(ihandle->poffset))) {
    LogPrintfError("IPC memory creation failed for memory: %p", dev_ptr);
    HIP_RETURN(hipErrorInvalidValue);
  }
  ihandle->owners_process_id = amd::Os::getProcessId();
  HIP_RETURN(hipSuccess);
}

hipError_t hipIpcOpenMemHandle(void** dev_ptr, hipIpcMemHandle_t handle, unsigned int flags) {
  HIP_INIT_API(hipIpcOpenMemHandle, dev_ptr, &handle, flags);
  amd::Device* device = nullptr;
  ihipIpcMemHandle_t* ihandle = nullptr;
  if (dev_ptr == nullptr || flags != hipIpcMemLazyEnablePeerAccess) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  /* Call the IPC attach from the Device class. */
  device = hip::getCurrentDevice()->devices()[0];
  ihandle = reinterpret_cast<ihipIpcMemHandle_t*>(&handle);
  if (ihandle->psize == 0) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (ihandle->owners_process_id == amd::Os::getProcessId()) {
    HIP_RETURN(hipErrorInvalidContext);
  }
  if (!device->IpcAttach(&(ihandle->ipc_handle), ihandle->psize, ihandle->poffset, flags,
                         dev_ptr)) {
    LogPrintfError("Cannot attach ipc_handle with ipc_size: %u ipc_offset: %u flags: %u",
                   ihandle->psize, ihandle->poffset, flags);
    HIP_RETURN(hipErrorInvalidDevicePointer);
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipIpcCloseMemHandle(void* dev_ptr) {
  HIP_INIT_API(hipIpcCloseMemHandle, dev_ptr);
  amd::Device* device = nullptr;
  hip::getNullStream()->finish();
  if (dev_ptr == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  /* Call IPC detach from the Device class. */
  device = hip::getCurrentDevice()->devices()[0];
  if (device == nullptr) {
    HIP_RETURN(hipErrorNoDevice);
  }
  /* Detach the memory. */
  if (!device->IpcDetach(dev_ptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipHostGetDevicePointer(void** devicePointer, void* hostPointer, unsigned flags) {
  HIP_INIT_API(hipHostGetDevicePointer, devicePointer, hostPointer, flags);
  if (devicePointer == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  size_t offset = 0;
  amd::Memory* memObj = getMemoryObject(hostPointer, offset);
  if (!memObj) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *devicePointer = reinterpret_cast<void*>(
      memObj->getDeviceMemory(*hip::getCurrentDevice()->devices()[0])->virtualAddress() + offset);
  HIP_RETURN(hipSuccess);
}
// ================================================================================================
hipError_t hipPointerGetAttributes(hipPointerAttribute_t* attributes, const void* ptr) {
  HIP_INIT_API(hipPointerGetAttributes, attributes, ptr);
  if (attributes == nullptr || ptr == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  size_t offset = 0;
  amd::Memory* memObj = getMemoryObject(ptr, offset);
  device::Memory* devMem = nullptr;
  memset(attributes, 0, sizeof(hipPointerAttribute_t));
  if (memObj != nullptr) {
    attributes->type =
        ((CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR) & memObj->getMemFlags())
        ? hipMemoryTypeHost
        : hipMemoryTypeDevice;
    if (attributes->type == hipMemoryTypeHost) {
      if (memObj->getHostMem() != nullptr) {
        attributes->hostPointer = static_cast<char*>(memObj->getHostMem()) + offset;
      } else {
        attributes->hostPointer = static_cast<char*>(memObj->getSvmPtr()) + offset;
      }
    }
    // The pointer that the attribute is retrieved for might not be on the current device.
    for (const auto& device : g_devices) {
      if (device->deviceId() == memObj->getUserData().deviceId) {
        devMem = memObj->getDeviceMemory(*device->devices()[0]);
        break;
      }
    }
    // getDeviceMemory can fail, hence validate the sanity of the mem obtained.
    if (nullptr == devMem) {
      DevLogPrintfError("getDeviceMemory for ptr failed : %p \n", ptr);
      HIP_RETURN(hipErrorMemoryAllocation);
    }
    attributes->devicePointer = reinterpret_cast<void*>(devMem->virtualAddress() + offset);
    constexpr uint32_t kManagedAlloc = (CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_ALLOC_HOST_PTR);
    attributes->isManaged = ((memObj->getMemFlags() & kManagedAlloc) == kManagedAlloc);
    attributes->allocationFlags = memObj->getUserData().flags;
    attributes->device = memObj->getUserData().deviceId;
    if (attributes->isManaged) {
      attributes->type = hipMemoryTypeManaged;
    }
    HIP_RETURN(hipSuccess);
  }
  LogPrintfError("Cannot get amd_mem_obj for ptr: %p \n", ptr);
  HIP_RETURN(hipErrorInvalidValue);
}

// ================================================================================================
hipError_t ihipPointerSetAttribute(const void* value, hipPointer_attribute attribute,
                                   hipDeviceptr_t ptr) {
  if (attribute != HIP_POINTER_ATTRIBUTE_SYNC_MEMOPS) {
    return hipErrorInvalidValue;
  }
  size_t offset = 0;
  amd::Memory* memObj = getMemoryObject(ptr, offset);
  if (memObj == nullptr) {
    return hipErrorInvalidDevicePointer;
  }
  memObj->getUserData().sync_mem_ops_ =
      static_cast<bool>(*(reinterpret_cast<const unsigned int*>(value)));
  return hipSuccess;
}

// ================================================================================================
hipError_t ihipPointerGetAttributes(void* data, hipPointer_attribute attribute,
                                    hipDeviceptr_t ptr) {
  size_t offset = 0;
  amd::Memory* memObj = getMemoryObject(ptr, offset);
  constexpr uint32_t kManagedAlloc = (CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_ALLOC_HOST_PTR);
  hipError_t status = hipSuccess;
  switch (attribute) {
    case HIP_POINTER_ATTRIBUTE_CONTEXT: {
      status = hipErrorNotSupported;
      break;
    }
    case HIP_POINTER_ATTRIBUTE_MEMORY_TYPE: {
      if (memObj) {
        // Checks for host type or device type.
        *reinterpret_cast<uint32_t*>(data) =
            ((CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR) & memObj->getMemFlags())
            ? hipMemoryTypeHost
            : hipMemoryTypeDevice;
      } else {
        // Checks for array type.
        cl_mem dstMemObj = reinterpret_cast<cl_mem>(static_cast<hipArray*>(ptr)->data);
        if (!is_valid(dstMemObj)) {
          *reinterpret_cast<uint32_t*>(data) = 0;
          return hipErrorInvalidValue;
        }
        amd::Image* dstImage = as_amd(dstMemObj)->asImage();
        if (dstImage) {
          *reinterpret_cast<uint32_t*>(data) = hipMemoryTypeArray;
        } else {
          *reinterpret_cast<uint32_t*>(data) = 0;
          return hipErrorInvalidValue;
        }
      }
      break;
    }
    case HIP_POINTER_ATTRIBUTE_DEVICE_POINTER: {
      if (memObj) {
        device::Memory* devMem = memObj->getDeviceMemory(*hip::getCurrentDevice()->devices()[0]);
        // getDeviceMemory can fail, hence validate the sanity of the mem obtained.
        if (nullptr == devMem) {
          DevLogPrintfError("getDeviceMemory for ptr failed : %p \n", ptr);
          return hipErrorMemoryAllocation;
        }
        *reinterpret_cast<void**>(data) =
            reinterpret_cast<void*>(devMem->virtualAddress() + offset);
      } else {
        *reinterpret_cast<void**>(data) = nullptr;
        return hipErrorInvalidValue;
      }
      break;
    }
    case HIP_POINTER_ATTRIBUTE_HOST_POINTER: {
      if (memObj) {
        if ((CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR) & memObj->getMemFlags()) {
          if (memObj->getHostMem() != nullptr) {
            // Registered memory.
            *reinterpret_cast<void**>(data) = static_cast<char*>(memObj->getHostMem()) + offset;
          } else {
            // Prepinned memory.
            *reinterpret_cast<void**>(data) = static_cast<char*>(memObj->getSvmPtr()) + offset;
          }
        } else {
          *reinterpret_cast<void**>(data) = nullptr;
          status = hipErrorInvalidValue;
        }
      } else {
        // Host memory.
        *reinterpret_cast<void**>(data) = nullptr;
        status = hipErrorInvalidValue;
      }
      break;
    }
    case HIP_POINTER_ATTRIBUTE_P2P_TOKENS: {
      // Currently not supported; deprecated in CUDA as well.
      status = hipErrorNotSupported;
      break;
    }
    case HIP_POINTER_ATTRIBUTE_SYNC_MEMOPS: {
      // This attribute is set via hipPointerSetAttribute; it defaults to true.
      *reinterpret_cast<bool*>(data) = true;
      break;
    }
    case HIP_POINTER_ATTRIBUTE_BUFFER_ID: {
      if (memObj) {
        *reinterpret_cast<uint64_t*>(data) = memObj->getUniqueId();
      } else {
        // The ptr passed must be allocated through a HIP memory allocation API.
        *reinterpret_cast<uint64_t*>(data) = 0;
        return hipErrorInvalidValue;
      }
      break;
    }
    case HIP_POINTER_ATTRIBUTE_IS_MANAGED: {
      if (memObj) {
        *reinterpret_cast<bool*>(data) =
            ((memObj->getMemFlags() & kManagedAlloc) == kManagedAlloc);
      } else {
        *reinterpret_cast<bool*>(data) = false;
        return hipErrorInvalidValue;
      }
      break;
    }
    case HIP_POINTER_ATTRIBUTE_DEVICE_ORDINAL: {
      if (memObj) {
        *reinterpret_cast<int*>(data) = memObj->getUserData().deviceId;
      } else {
        // For host memory, -2 is returned by default, similar to CUDA.
        *reinterpret_cast<int*>(data) = -2;
        status = hipErrorInvalidValue;
      }
      break;
    }
    case HIP_POINTER_ATTRIBUTE_IS_LEGACY_HIP_IPC_CAPABLE: {
      // TODO: Unclear what to be done for this attribute.
      status = hipErrorNotSupported;
      break;
    }
    case HIP_POINTER_ATTRIBUTE_RANGE_START_ADDR: {
      if (memObj) {
        if (memObj->getHostMem() != nullptr) {
          *reinterpret_cast<void**>(data) = static_cast<char*>(memObj->getHostMem());
        } else {
          device::Memory* devMem =
              memObj->getDeviceMemory(*hip::getCurrentDevice()->devices()[0]);
          // getDeviceMemory can fail, hence validate the sanity of the mem obtained.
          if (nullptr == devMem) {
            DevLogPrintfError("getDeviceMemory for ptr failed : %p \n", ptr);
            return hipErrorMemoryAllocation;
          }
          *reinterpret_cast<void**>(data) = reinterpret_cast<void*>(devMem->virtualAddress());
        }
      } else {
        // Input is a host memory pointer, invalid for device.
        *reinterpret_cast<void**>(data) = nullptr;
        status = hipErrorInvalidValue;
      }
      break;
    }
    case HIP_POINTER_ATTRIBUTE_RANGE_SIZE: {
      if (memObj) {
        *reinterpret_cast<size_t*>(data) = memObj->getSize();
      } else {
        *reinterpret_cast<size_t*>(data) = 0;
        status = hipErrorInvalidValue;
      }
      break;
    }
    case HIP_POINTER_ATTRIBUTE_MAPPED: {
      if (memObj) {
        *reinterpret_cast<bool*>(data) = true;
      } else {
        *reinterpret_cast<bool*>(data) = false;
        status = hipErrorInvalidValue;
      }
      break;
    }
    case HIP_POINTER_ATTRIBUTE_ALLOWED_HANDLE_TYPES: {
      // hipMemAllocationHandleType is not yet supported.
      LogPrintfWarning("attribute %d is not supported.", attribute);
      status = hipErrorNotSupported;
      break;
    }
    case HIP_POINTER_ATTRIBUTE_IS_GPU_DIRECT_RDMA_CAPABLE: {
      // GPUDirect RDMA API is not yet supported.
      LogPrintfWarning("attribute %d is not supported.", attribute);
      status = hipErrorNotSupported;
      break;
    }
    case HIP_POINTER_ATTRIBUTE_ACCESS_FLAGS: {
      if (memObj) {
        *reinterpret_cast<uint32_t*>(data) = memObj->getUserData().flags;
      } else {
        *reinterpret_cast<uint32_t*>(data) = 0;
      }
      break;
    }
    case HIP_POINTER_ATTRIBUTE_MEMPOOL_HANDLE: {
      // Allocations from a mempool are not yet supported.
      LogPrintfWarning("attribute %d is not supported.", attribute);
      status = hipErrorNotSupported;
      break;
    }
    default: {
      LogPrintfError("Invalid attribute: %d ", attribute);
      status = hipErrorInvalidValue;
      break;
    }
  }
  return status;
}
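// Editor's note: a hedged usage sketch of the runtime-level attribute query implemented above;
// for a managed allocation the isManaged upgrade reports hipMemoryTypeManaged. Illustrative only.
#if 0
#include <hip/hip_runtime.h>

bool pointerIsManaged(const void* p) {
  hipPointerAttribute_t attr;
  if (hipPointerGetAttributes(&attr, p) != hipSuccess) return false;
  return attr.type == hipMemoryTypeManaged;
}
#endif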
// ================================================================================================
hipError_t hipPointerSetAttribute(const void* value, hipPointer_attribute attribute,
                                  hipDeviceptr_t ptr) {
  HIP_INIT_API(hipPointerSetAttribute, value, attribute, ptr);
  if (ptr == nullptr || value == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(ihipPointerSetAttribute(value, attribute, ptr));
}

// ================================================================================================
hipError_t hipPointerGetAttribute(void* data, hipPointer_attribute attribute,
                                  hipDeviceptr_t ptr) {
  HIP_INIT_API(hipPointerGetAttribute, data, attribute, ptr);
  if (ptr == nullptr || data == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(ihipPointerGetAttributes(data, attribute, ptr));
}
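// Editor's note: a hedged sketch querying a single attribute through the driver-style API above;
// HIP_POINTER_ATTRIBUTE_MEMORY_TYPE writes a 32-bit hipMemoryType value. Illustrative only.
#if 0
#include <hip/hip_runtime.h>
#include <cstdint>

bool isDevicePointer(hipDeviceptr_t ptr) {
  uint32_t type = 0;
  if (hipPointerGetAttribute(&type, HIP_POINTER_ATTRIBUTE_MEMORY_TYPE, ptr) != hipSuccess) {
    return false;
  }
  return type == hipMemoryTypeDevice;
}
#endif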
numAttributes, hipPointer_attribute* attributes, void** data, hipDeviceptr_t ptr) { HIP_INIT_API(hipDrvPointerGetAttributes, numAttributes, attributes, data, ptr); if (numAttributes == 0 || attributes == nullptr || data == nullptr || ptr == nullptr) { HIP_RETURN(hipErrorInvalidValue); } // Ignore the status, hipDrvPointerGetAttributes always returns success // If the ptr is invalid, the queried attributes will be assigned default values for (int i = 0; i < numAttributes; ++i) { hipError_t status = ihipPointerGetAttributes(data[i], attributes[i], ptr); } HIP_RETURN(hipSuccess); } // ================================================================================================ hipError_t hipArrayDestroy(hipArray* array) { HIP_INIT_API(hipArrayDestroy, array); CHECK_STREAM_CAPTURE_SUPPORTED(); HIP_RETURN(ihipArrayDestroy(array)); } hipError_t ihipArray3DGetDescriptor(HIP_ARRAY3D_DESCRIPTOR* desc, hipArray* array) { { amd::ScopedLock lock(hip::hipArraySetLock); if (hip::hipArraySet.find(array) == hip::hipArraySet.end()) { return hipErrorInvalidHandle; } } desc->Width = array->width; desc->Height = array->height; desc->Depth = array->depth; desc->Format = array->Format; desc->NumChannels = array->NumChannels; desc->Flags = array->flags; return hipSuccess; } hipError_t hipArrayGetInfo(hipChannelFormatDesc* desc, hipExtent* extent, unsigned int* flags, hipArray* array) { HIP_INIT_API(hipArrayGetInfo, desc, extent, flags, array); CHECK_STREAM_CAPTURE_SUPPORTED(); if (array == nullptr) { HIP_RETURN(hipErrorInvalidHandle); } // If all output parameters are nullptr, then no need to proceed further if ((desc == nullptr) && (extent == nullptr) && (flags == nullptr)) { HIP_RETURN(hipSuccess); } HIP_ARRAY3D_DESCRIPTOR array3DDescriptor; hipError_t status = ihipArray3DGetDescriptor(&array3DDescriptor, array); // Fill each output parameter if (status == hipSuccess) { if (desc != nullptr) { *desc = hip::getChannelFormatDesc(array3DDescriptor.NumChannels, array3DDescriptor.Format); } if (extent != nullptr) { extent->width = array3DDescriptor.Width; extent->height = array3DDescriptor.Height; extent->depth = array3DDescriptor.Depth; } if (flags != nullptr) { *flags = array3DDescriptor.Flags; } } HIP_RETURN(status); } hipError_t hipArrayGetDescriptor(HIP_ARRAY_DESCRIPTOR* pArrayDescriptor, hipArray* array) { HIP_INIT_API(hipArrayGetDescriptor, pArrayDescriptor, array); CHECK_STREAM_CAPTURE_SUPPORTED(); if (array == nullptr) { HIP_RETURN(hipErrorInvalidHandle); } if (pArrayDescriptor == nullptr) { HIP_RETURN(hipErrorInvalidValue); } HIP_ARRAY3D_DESCRIPTOR array3DDescriptor; hipError_t status = ihipArray3DGetDescriptor(&array3DDescriptor, array); // Fill each output parameter if (status == hipSuccess) { pArrayDescriptor->Width = array3DDescriptor.Width; pArrayDescriptor->Height = array3DDescriptor.Height; pArrayDescriptor->Format = array3DDescriptor.Format; pArrayDescriptor->NumChannels = array3DDescriptor.NumChannels; } HIP_RETURN(status); } hipError_t hipArray3DGetDescriptor(HIP_ARRAY3D_DESCRIPTOR* pArrayDescriptor, hipArray* array) { HIP_INIT_API(hipArray3DGetDescriptor, pArrayDescriptor, array); CHECK_STREAM_CAPTURE_SUPPORTED(); if (array == nullptr) { HIP_RETURN(hipErrorInvalidHandle); } if (pArrayDescriptor == nullptr) { HIP_RETURN(hipErrorInvalidValue); } HIP_RETURN(ihipArray3DGetDescriptor(pArrayDescriptor, array)); } hipError_t hipMemcpyParam2DAsync(const hip_Memcpy2D* pCopy, hipStream_t stream) { HIP_INIT_API(hipMemcpyParam2DAsync, pCopy); STREAM_CAPTURE(hipMemcpyParam2DAsync, stream, pCopy); 
HIP_RETURN(ihipMemcpyParam2D(pCopy, stream, true)); } hipError_t ihipMemcpy2DArrayToArray(hipArray_t dst, size_t wOffsetDst, size_t hOffsetDst, hipArray_const_t src, size_t wOffsetSrc, size_t hOffsetSrc, size_t width, size_t height, hipMemcpyKind kind, hipStream_t stream, bool isAsync = false) { hip_Memcpy2D desc = {}; desc.srcXInBytes = wOffsetSrc; desc.srcY = hOffsetSrc; desc.srcMemoryType = hipMemoryTypeArray; desc.srcHost = nullptr; desc.srcDevice = nullptr; desc.srcArray = const_cast(src); desc.srcPitch = 0; // Ignored. desc.dstXInBytes = wOffsetDst; desc.dstY = hOffsetDst; desc.dstMemoryType = hipMemoryTypeArray; desc.dstHost = nullptr; desc.dstDevice = nullptr; desc.dstArray = dst; desc.dstPitch = 0; // Ignored. desc.WidthInBytes = width; desc.Height = height; return ihipMemcpyParam2D(&desc, stream, isAsync); } hipError_t hipMemcpy2DArrayToArray(hipArray_t dst, size_t wOffsetDst, size_t hOffsetDst, hipArray_const_t src, size_t wOffsetSrc, size_t hOffsetSrc, size_t width, size_t height, hipMemcpyKind kind) { HIP_INIT_API(hipMemcpy2DArrayToArray, dst, wOffsetDst, hOffsetDst, src, wOffsetSrc, hOffsetSrc, width, height, kind); CHECK_STREAM_CAPTURING(); hipError_t validateParam = hipSuccess, validateSrc = hipSuccess, validateDst = hipSuccess; if ((validateParam = hipMemcpy2DValidateParams(kind)) != hipSuccess) { HIP_RETURN(validateParam); } if ((validateSrc = hipMemcpy2DValidateArray(src, wOffsetSrc, hOffsetSrc, width, height)) != hipSuccess) { HIP_RETURN(validateSrc); } if ((validateDst = hipMemcpy2DValidateArray(dst, wOffsetDst, hOffsetDst, width, height)) != hipSuccess) { HIP_RETURN(validateDst); } HIP_RETURN_DURATION(ihipMemcpy2DArrayToArray(dst, wOffsetDst, hOffsetDst, src, wOffsetSrc, hOffsetSrc, width, height, kind, nullptr)); } hipError_t hipMemcpyArrayToArray(hipArray_t dst, size_t wOffsetDst, size_t hOffsetDst, hipArray_const_t src, size_t wOffsetSrc, size_t hOffsetSrc, size_t width, size_t height, hipMemcpyKind kind) { HIP_INIT_API(hipMemcpyArrayToArray, dst, wOffsetDst, hOffsetDst, src, wOffsetSrc, hOffsetSrc, width, height, kind); CHECK_STREAM_CAPTURING(); HIP_RETURN_DURATION(ihipMemcpy2DArrayToArray(dst, wOffsetDst, hOffsetDst, src, wOffsetSrc, hOffsetSrc, width, height, kind, nullptr)); } hipError_t hipMemcpy2DFromArray_common(void* dst, size_t dpitch, hipArray_const_t src, size_t wOffsetSrc, size_t hOffset, size_t width, size_t height, hipMemcpyKind kind, hipStream_t stream=nullptr, bool isAsync=false) { hipError_t validateParam = hipSuccess, validateSrc = hipSuccess, validateDst = hipSuccess; if ((validateParam = hipMemcpy2DValidateParams(kind,stream)) != hipSuccess) { return validateParam; } if ((validateSrc = hipMemcpy2DValidateArray(src,wOffsetSrc, hOffset, width, height)) != hipSuccess) { return validateSrc; } if ((validateDst = hipMemcpy2DValidateBuffer(dst,dpitch,width)) != hipSuccess) { return validateDst; } return ihipMemcpy2DFromArray(dst, dpitch, src, wOffsetSrc, hOffset, width, height, kind, stream, isAsync); } hipError_t hipMemcpy2DFromArray(void* dst, size_t dpitch,hipArray_const_t src, size_t wOffsetSrc, size_t hOffset, size_t width, size_t height, hipMemcpyKind kind) { HIP_INIT_API(hipMemcpy2DFromArray, dst, dpitch, src, wOffsetSrc, hOffset, width, height, kind); CHECK_STREAM_CAPTURING(); HIP_RETURN_DURATION(hipMemcpy2DFromArray_common(dst, dpitch, src, wOffsetSrc, hOffset, width, height, kind)); } hipError_t hipMemcpy2DFromArray_spt(void* dst, size_t dpitch, hipArray_const_t src, size_t wOffsetSrc, size_t hOffset, size_t width, size_t height, 
hipMemcpyKind kind) { HIP_INIT_API(hipMemcpy2DFromArray, dst, dpitch, src, wOffsetSrc, hOffset, width, height, kind); hipStream_t stream = getPerThreadDefaultStream(); CHECK_STREAM_CAPTURING(); HIP_RETURN_DURATION(hipMemcpy2DFromArray_common(dst, dpitch, src, wOffsetSrc, hOffset, width, height, kind, stream)); } hipError_t hipMemcpy2DFromArrayAsync(void* dst, size_t dpitch, hipArray_const_t src, size_t wOffsetSrc, size_t hOffsetSrc, size_t width, size_t height, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpy2DFromArrayAsync, dst, dpitch, src, wOffsetSrc, hOffsetSrc, width, height, kind, stream); STREAM_CAPTURE(hipMemcpy2DFromArrayAsync, stream, dst, dpitch, src, wOffsetSrc, hOffsetSrc, width, height, kind); HIP_RETURN_DURATION(hipMemcpy2DFromArray_common(dst, dpitch, src, wOffsetSrc, hOffsetSrc, width, height, kind, stream, true)); } hipError_t hipMemcpy2DFromArrayAsync_spt(void* dst, size_t dpitch, hipArray_const_t src, size_t wOffsetSrc, size_t hOffsetSrc, size_t width, size_t height, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpy2DFromArrayAsync, dst, dpitch, src, wOffsetSrc, hOffsetSrc, width, height, kind, stream); PER_THREAD_DEFAULT_STREAM(stream); STREAM_CAPTURE(hipMemcpy2DFromArrayAsync, stream, dst, dpitch, src, wOffsetSrc, hOffsetSrc, width, height, kind); HIP_RETURN_DURATION(hipMemcpy2DFromArray_common(dst, dpitch, src, wOffsetSrc, hOffsetSrc, width, height, kind, stream, true)); } hipError_t hipMemcpyFromArrayAsync(void* dst, hipArray_const_t src, size_t wOffsetSrc, size_t hOffsetSrc, size_t count, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpyFromArrayAsync, dst, src, wOffsetSrc, hOffsetSrc, count, kind, stream); STREAM_CAPTURE(hipMemcpyFromArrayAsync, stream, dst, src, wOffsetSrc, hOffsetSrc, count, kind); if (src == nullptr) { HIP_RETURN(hipErrorInvalidValue); } const size_t arrayHeight = (src->height != 0) ? 
src->height : 1; const size_t widthInBytes = count / arrayHeight; const size_t height = (count / src->width) / hip::getElementSize(src); HIP_RETURN_DURATION(ihipMemcpy2DFromArray(dst, 0 /* dpitch */, src, wOffsetSrc, hOffsetSrc, widthInBytes, height, kind, stream, true)); } hipError_t hipMemcpy2DToArrayAsync(hipArray* dst, size_t wOffset, size_t hOffset, const void* src, size_t spitch, size_t width, size_t height, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpy2DToArrayAsync, dst, wOffset, hOffset, src, spitch, width, height, kind, stream); STREAM_CAPTURE(hipMemcpy2DToArrayAsync, stream, dst, wOffset, hOffset, src, spitch, width, height, kind); HIP_RETURN_DURATION(hipMemcpy2DToArray_common(dst, wOffset, hOffset, src, spitch, width, height, kind, stream, true)); } hipError_t hipMemcpy2DToArrayAsync_spt(hipArray* dst, size_t wOffset, size_t hOffset, const void* src, size_t spitch, size_t width, size_t height, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpy2DToArrayAsync, dst, wOffset, hOffset, src, spitch, width, height, kind, stream); PER_THREAD_DEFAULT_STREAM(stream); STREAM_CAPTURE(hipMemcpy2DToArrayAsync, stream, dst, wOffset, hOffset, src, spitch, width, height, kind); HIP_RETURN_DURATION(hipMemcpy2DToArray_common(dst, wOffset, hOffset, src, spitch, width, height, kind, stream, true)); } hipError_t hipMemcpyToArrayAsync(hipArray_t dst, size_t wOffset, size_t hOffset, const void* src, size_t count, hipMemcpyKind kind, hipStream_t stream) { HIP_INIT_API(hipMemcpyToArrayAsync, dst, wOffset, hOffset, src, count, kind); STREAM_CAPTURE(hipMemcpyToArrayAsync, stream, dst, wOffset, hOffset, src, count, kind); if (dst == nullptr) { HIP_RETURN(hipErrorInvalidValue); } const size_t arrayHeight = (dst->height != 0) ? dst->height : 1; const size_t widthInBytes = count / arrayHeight; const size_t height = (count / dst->width) / hip::getElementSize(dst); HIP_RETURN_DURATION(ihipMemcpy2DToArray(dst, wOffset, hOffset, src, 0 /* spitch */, widthInBytes, height, kind, stream, true)); } hipError_t hipMemcpyAtoA(hipArray* dstArray, size_t dstOffset, hipArray* srcArray, size_t srcOffset, size_t ByteCount) { HIP_INIT_API(hipMemcpyAtoA, dstArray, dstOffset, srcArray, srcOffset, ByteCount); CHECK_STREAM_CAPTURING(); HIP_RETURN_DURATION(ihipMemcpyAtoA(srcArray, dstArray, {srcOffset, 0, 0}, {dstOffset, 0, 0}, {ByteCount, 1, 1}, nullptr)); } hipError_t hipMemcpyAtoD(hipDeviceptr_t dstDevice, hipArray* srcArray, size_t srcOffset, size_t ByteCount) { HIP_INIT_API(hipMemcpyAtoD, dstDevice, srcArray, srcOffset, ByteCount); HIP_RETURN_DURATION(ihipMemcpyAtoD(srcArray, dstDevice, {srcOffset, 0, 0}, {0, 0, 0}, {ByteCount, 1, 1}, 0, 0, nullptr)); } hipError_t hipMemcpyAtoHAsync(void* dstHost, hipArray* srcArray, size_t srcOffset, size_t ByteCount, hipStream_t stream) { HIP_INIT_API(hipMemcpyAtoHAsync, dstHost, srcArray, srcOffset, ByteCount, stream); STREAM_CAPTURE(hipMemcpyAtoHAsync, stream, dstHost, srcArray, srcOffset, ByteCount); HIP_RETURN_DURATION(ihipMemcpyAtoH(srcArray, dstHost, {srcOffset, 0, 0}, {0, 0, 0}, {ByteCount, 1, 1}, 0, 0, stream, true)); } hipError_t hipMemcpyDtoA(hipArray* dstArray, size_t dstOffset, hipDeviceptr_t srcDevice, size_t ByteCount) { HIP_INIT_API(hipMemcpyDtoA, dstArray, dstOffset, srcDevice, ByteCount); CHECK_STREAM_CAPTURING(); HIP_RETURN_DURATION(ihipMemcpyDtoA(srcDevice, dstArray, {0, 0, 0}, {dstOffset, 0, 0}, {ByteCount, 1, 1}, 0, 0, nullptr)); } hipError_t hipMemcpyHtoAAsync(hipArray* dstArray, size_t dstOffset, const void* srcHost, size_t ByteCount, 
hipStream_t stream) {
  HIP_INIT_API(hipMemcpyHtoAAsync, dstArray, dstOffset, srcHost, ByteCount, stream);
  STREAM_CAPTURE(hipMemcpyHtoAAsync, stream, dstArray, dstOffset, srcHost, ByteCount);
  HIP_RETURN_DURATION(ihipMemcpyHtoA(srcHost, dstArray, {0, 0, 0}, {dstOffset, 0, 0},
                                     {ByteCount, 1, 1}, 0, 0, stream, true));
}

hipError_t hipMallocHost(void** ptr, size_t size) {
  HIP_INIT_API(hipMallocHost, ptr, size);
  CHECK_STREAM_CAPTURE_SUPPORTED();
  HIP_RETURN_DURATION(ihipMalloc(ptr, size, CL_MEM_SVM_FINE_GRAIN_BUFFER),
                      (ptr != nullptr) ? *ptr : nullptr);
}

hipError_t hipFreeHost(void* ptr) {
  HIP_INIT_API(hipFreeHost, ptr);
  CHECK_STREAM_CAPTURE_SUPPORTED();
  HIP_RETURN(ihipFree(ptr));
}

hipError_t hipDrvMemcpy2DUnaligned(const hip_Memcpy2D* pCopy) {
  HIP_INIT_API(hipDrvMemcpy2DUnaligned, pCopy);
  HIP_MEMCPY3D desc = hip::getDrvMemcpy3DDesc(*pCopy);
  HIP_RETURN(ihipMemcpyParam3D(&desc, nullptr));
}

hipError_t ihipMipmapArrayCreate(hipMipmappedArray_t* mipmapped_array_pptr,
                                 HIP_ARRAY3D_DESCRIPTOR* mipmapped_array_desc_ptr,
                                 unsigned int num_mipmap_levels) {
  bool mipMapSupport = true;
  amd::Context& context = *hip::getCurrentDevice()->asContext();
  const std::vector<amd::Device*>& devices = context.devices();
  for (auto& dev : devices) {
    if (!dev->settings().checkExtension(ClKhrMipMapImage)) {
      mipMapSupport = false;
    }
  }
  if (mipMapSupport == false) {
    LogPrintfError("Mipmap not supported on one of the devices, Mip Level: %d", num_mipmap_levels);
    return hipErrorNotSupported;
  }
  const cl_channel_order channel_order =
      hip::getCLChannelOrder(mipmapped_array_desc_ptr->NumChannels, 0);
  const cl_channel_type channel_type =
      hip::getCLChannelType(mipmapped_array_desc_ptr->Format, hipReadModeElementType);
  const cl_mem_object_type image_type =
      hip::getCLMemObjectType(mipmapped_array_desc_ptr->Width, mipmapped_array_desc_ptr->Height,
                              mipmapped_array_desc_ptr->Depth, mipmapped_array_desc_ptr->Flags);
  hipError_t status = hipSuccess;
  // Create a new amd::Image with mipmap
  amd::Image* image = ihipImageCreate(
      channel_order, channel_type, image_type, mipmapped_array_desc_ptr->Width,
      mipmapped_array_desc_ptr->Height, mipmapped_array_desc_ptr->Depth,
      mipmapped_array_desc_ptr->Depth, 0 /* row pitch */, 0 /* slice pitch */, num_mipmap_levels,
      nullptr /* buffer */, status);
  if (image == nullptr) {
    return status;
  }
  cl_mem cl_mem_obj = as_cl(image);
  *mipmapped_array_pptr = new hipMipmappedArray();
  (*mipmapped_array_pptr)->data = reinterpret_cast<void*>(cl_mem_obj);
  (*mipmapped_array_pptr)->desc = hip::getChannelFormatDesc(mipmapped_array_desc_ptr->NumChannels,
                                                            mipmapped_array_desc_ptr->Format);
  (*mipmapped_array_pptr)->type = image_type;
  (*mipmapped_array_pptr)->width = mipmapped_array_desc_ptr->Width;
  (*mipmapped_array_pptr)->height = mipmapped_array_desc_ptr->Height;
  (*mipmapped_array_pptr)->depth = mipmapped_array_desc_ptr->Depth;
  (*mipmapped_array_pptr)->min_mipmap_level = 0;
  (*mipmapped_array_pptr)->max_mipmap_level = num_mipmap_levels;
  (*mipmapped_array_pptr)->flags = mipmapped_array_desc_ptr->Flags;
  (*mipmapped_array_pptr)->format = mipmapped_array_desc_ptr->Format;
  (*mipmapped_array_pptr)->num_channels = mipmapped_array_desc_ptr->NumChannels;
  return hipSuccess;
}

hipError_t ihipMipmappedArrayDestroy(hipMipmappedArray_t mipmapped_array_ptr) {
  if (mipmapped_array_ptr == nullptr) {
    return hipErrorInvalidValue;
  }
  cl_mem mem_obj = reinterpret_cast<cl_mem>(mipmapped_array_ptr->data);
  if (is_valid(mem_obj) == false) {
    return hipErrorInvalidValue;
  }
  auto image = as_amd(mem_obj);
  // Wait on the device, associated with the current memory object during allocation
  hip::Stream::SyncAllStreams(image->getUserData().deviceId);
  image->release();
  delete mipmapped_array_ptr;
  return hipSuccess;
}

hipError_t ihipMipmappedArrayGetLevel(hipArray_t* level_array_pptr,
                                      hipMipmappedArray_t mipmapped_array_ptr,
                                      unsigned int mip_level) {
  if (level_array_pptr == nullptr || mipmapped_array_ptr == nullptr) {
    return hipErrorInvalidValue;
  }
  // Convert the raw data to amd::Image
  cl_mem cl_mem_obj = reinterpret_cast<cl_mem>(mipmapped_array_ptr->data);
  if (is_valid(cl_mem_obj) == false) {
    return hipErrorInvalidValue;
  }
  amd::Image* image = as_amd(cl_mem_obj)->asImage();
  if (image == nullptr) {
    return hipErrorInvalidValue;
  }
  // Create a new hipArray and create an image view with the new mip level.
  (*level_array_pptr) = new hipArray();
  (*level_array_pptr)->data =
      as_cl(image->createView(image->getContext(), image->getImageFormat(), NULL, mip_level, 0));
  // Copy the new width, height & depth details from the view to the hipArray.
  cl_mem cl_mip_mem_obj = reinterpret_cast<cl_mem>((*level_array_pptr)->data);
  if (is_valid(cl_mip_mem_obj) == false) {
    return hipErrorInvalidValue;
  }
  // Fill the hip_array info from the newly created amd::Image's view
  amd::Image* mipmap_image = as_amd(cl_mip_mem_obj)->asImage();
  (*level_array_pptr)->width = mipmap_image->getWidth();
  (*level_array_pptr)->height = mipmap_image->getHeight();
  (*level_array_pptr)->depth = mipmap_image->getDepth();
  const cl_mem_object_type image_type =
      hip::getCLMemObjectType((*level_array_pptr)->width, (*level_array_pptr)->height,
                              (*level_array_pptr)->depth, mipmapped_array_ptr->flags);
  (*level_array_pptr)->type = image_type;
  (*level_array_pptr)->Format = mipmapped_array_ptr->format;
  (*level_array_pptr)->desc = mipmapped_array_ptr->desc;
  (*level_array_pptr)->NumChannels = hip::getNumChannels((*level_array_pptr)->desc);
  (*level_array_pptr)->isDrv = 0;
  (*level_array_pptr)->textureType = 0;
  amd::ScopedLock lock(hip::hipArraySetLock);
  hip::hipArraySet.insert(*level_array_pptr);
  return hipSuccess;
}

hipError_t hipMipmappedArrayCreate(hipMipmappedArray_t* mipmapped_array_pptr,
                                   HIP_ARRAY3D_DESCRIPTOR* mipmapped_array_desc_ptr,
                                   unsigned int num_mipmap_levels) {
  HIP_INIT_API(hipMipmappedArrayCreate, mipmapped_array_pptr, mipmapped_array_desc_ptr,
               num_mipmap_levels);
  CHECK_STREAM_CAPTURE_SUPPORTED();
  HIP_RETURN(ihipMipmapArrayCreate(mipmapped_array_pptr, mipmapped_array_desc_ptr,
                                   num_mipmap_levels));
}

hipError_t hipMipmappedArrayDestroy(hipMipmappedArray_t mipmapped_array_ptr) {
  HIP_INIT_API(hipMipmappedArrayDestroy, mipmapped_array_ptr);
  CHECK_STREAM_CAPTURE_SUPPORTED();
  HIP_RETURN(ihipMipmappedArrayDestroy(mipmapped_array_ptr));
}

hipError_t hipMipmappedArrayGetLevel(hipArray_t* level_array_pptr,
                                     hipMipmappedArray_t mipmapped_array_ptr,
                                     unsigned int mip_level) {
  HIP_INIT_API(hipMipmappedArrayGetLevel, level_array_pptr, mipmapped_array_ptr, mip_level);
  HIP_RETURN(ihipMipmappedArrayGetLevel(level_array_pptr, mipmapped_array_ptr, mip_level));
}
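// ================================================================================================
// Usage sketch (added for illustration, not part of the original source): how the driver-style
// mipmapped-array entry points above are typically driven. The extent, format, and level count
// are arbitrary assumptions; error handling is collapsed into asserts (assumes <cassert>).
static void mipmapped_array_usage_sketch() {
  HIP_ARRAY3D_DESCRIPTOR desc = {};
  desc.Width = 256;
  desc.Height = 256;
  desc.Depth = 0;  // a 2D mipmapped array
  desc.Format = HIP_AD_FORMAT_FLOAT;
  desc.NumChannels = 4;
  desc.Flags = 0;

  hipMipmappedArray_t mip_array = nullptr;
  assert(hipMipmappedArrayCreate(&mip_array, &desc, 8 /* levels */) == hipSuccess);

  hipArray_t level0 = nullptr;  // a view of mip level 0
  assert(hipMipmappedArrayGetLevel(&level0, mip_array, 0) == hipSuccess);

  assert(hipMipmappedArrayDestroy(mip_array) == hipSuccess);
}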
hipError_t hipMallocMipmappedArray(hipMipmappedArray_t* mipmappedArray,
                                   const hipChannelFormatDesc* desc, hipExtent extent,
                                   unsigned int numLevels, unsigned int flags) {
  HIP_INIT_API(hipMallocMipmappedArray, mipmappedArray, desc, extent, numLevels, flags);
  if (mipmappedArray == nullptr || desc == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  CHECK_STREAM_CAPTURE_SUPPORTED();
  HIP_ARRAY3D_DESCRIPTOR allocateArray = {extent.width,             extent.height, extent.depth,
                                          hip::getArrayFormat(*desc),
                                          hip::getNumChannels(*desc), flags};
  if (!hip::CheckArrayFormat(*desc)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(ihipMipmapArrayCreate(mipmappedArray, &allocateArray, numLevels));
}

hipError_t hipFreeMipmappedArray(hipMipmappedArray_t mipmappedArray) {
  HIP_INIT_API(hipFreeMipmappedArray, mipmappedArray);
  CHECK_STREAM_CAPTURE_SUPPORTED();
  HIP_RETURN(ihipMipmappedArrayDestroy(mipmappedArray));
}

hipError_t hipGetMipmappedArrayLevel(hipArray_t* levelArray,
                                     hipMipmappedArray_const_t mipmappedArray,
                                     unsigned int level) {
  HIP_INIT_API(hipGetMipmappedArrayLevel, levelArray, mipmappedArray, level);
  CHECK_STREAM_CAPTURE_SUPPORTED();
  HIP_RETURN(ihipMipmappedArrayGetLevel(levelArray,
                                        const_cast<hipMipmappedArray_t>(mipmappedArray), level));
}
clr-rocm-5.7.1/hipamd/src/hip_mempool.cpp000066400000000000000000000350501450307266000203150ustar00rootroot00000000000000/* Copyright (c) 2022 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or
substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */

#include "hip_mempool_impl.hpp"

/**
 * API interfaces
 */
extern hipError_t ihipFree(void* ptr);

// ================================================================================================
hipError_t hipDeviceGetDefaultMemPool(hipMemPool_t* mem_pool, int device) {
  HIP_INIT_API(hipDeviceGetDefaultMemPool, mem_pool, device);
  if (mem_pool == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (device < 0 || device >= g_devices.size()) {
    HIP_RETURN(hipErrorInvalidDevice);
  }
  *mem_pool = reinterpret_cast<hipMemPool_t>(g_devices[device]->GetDefaultMemoryPool());
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipDeviceSetMemPool(int device, hipMemPool_t mem_pool) {
  HIP_INIT_API(hipDeviceSetMemPool, device, mem_pool);
  if ((mem_pool == nullptr) || (device >= g_devices.size())) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  auto poolDevice = reinterpret_cast<hip::MemoryPool*>(mem_pool)->Device();
  if (poolDevice->deviceId() != device) {
    HIP_RETURN(hipErrorInvalidDevice);
  }
  g_devices[device]->SetCurrentMemoryPool(reinterpret_cast<hip::MemoryPool*>(mem_pool));
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipDeviceGetMemPool(hipMemPool_t* mem_pool, int device) {
  HIP_INIT_API(hipDeviceGetMemPool, mem_pool, device);
  if ((mem_pool == nullptr) || (device >= g_devices.size())) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *mem_pool = reinterpret_cast<hipMemPool_t>(g_devices[device]->GetCurrentMemoryPool());
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipMallocAsync(void** dev_ptr, size_t size, hipStream_t stream) {
  HIP_INIT_API(hipMallocAsync, dev_ptr, size, stream);
  if ((dev_ptr == nullptr) || (!hip::isValid(stream))) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (size == 0) {
    *dev_ptr = nullptr;
    HIP_RETURN(hipSuccess);
  }
  auto hip_stream = (stream == nullptr) ? hip::getCurrentDevice()->NullStream()
                                        : reinterpret_cast<hip::Stream*>(stream);
  auto device = hip_stream->GetDevice();
  auto mem_pool = device->GetCurrentMemoryPool();
  STREAM_CAPTURE(hipMallocAsync, stream, reinterpret_cast<hipMemPool_t>(mem_pool), size, dev_ptr);
  *dev_ptr = mem_pool->AllocateMemory(size, hip_stream);
  if (*dev_ptr == nullptr) {
    HIP_RETURN(hipErrorOutOfMemory);
  }
  HIP_RETURN(hipSuccess);
}
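// ================================================================================================
// Usage sketch (added for illustration, not part of the original source): stream-ordered
// allocation with the entry points above. The allocation and the free are both ordered on
// 'stream', so work enqueued between the two calls may safely use 'buf'; the size is an arbitrary
// assumption.
static void malloc_async_usage_sketch() {
  hipStream_t stream = nullptr;
  (void)hipStreamCreate(&stream);

  void* buf = nullptr;
  (void)hipMallocAsync(&buf, 1 << 20, stream);  // 1 MiB from the device's current pool

  // ... enqueue kernels or copies on 'stream' that use 'buf' ...

  (void)hipFreeAsync(buf, stream);  // returns the memory to the pool in stream order
  (void)hipStreamSynchronize(stream);
  (void)hipStreamDestroy(stream);
}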
// ================================================================================================
// @note: Runtime needs the new command for the MT path, since the app can execute hipFreeAsync()
// before the graph execution is done. Hence there could be a race condition between
// memory allocation in a graph, which occurs in a worker thread, and host execution of
// hipFreeAsync
class FreeAsyncCommand : public amd::Command {
 private:
  void* ptr_;  //!< Virtual address for asynchronous free

 public:
  FreeAsyncCommand(amd::HostQueue& queue, void* ptr)
      : amd::Command(queue, 1, amd::Event::nullWaitList), ptr_(ptr) {}

  virtual void submit(device::VirtualDevice& device) final {
    size_t offset = 0;
    auto memory = getMemoryObject(ptr_, offset);
    if (memory != nullptr) {
      auto id = memory->getUserData().deviceId;
      if (!AMD_DIRECT_DISPATCH) {
        // Required for HIP events
        hip::setCurrentDevice(id);
      }
      if (!g_devices[id]->FreeMemory(memory, static_cast<hip::Stream*>(queue()))) {
        // @note It's not the most optimal logic.
        // The current implementation has unconditional waits
        if (ihipFree(ptr_) != hipSuccess) {
          setStatus(CL_INVALID_OPERATION);
        }
      }
    }
  }
};

// ================================================================================================
hipError_t hipFreeAsync(void* dev_ptr, hipStream_t stream) {
  HIP_INIT_API(hipFreeAsync, dev_ptr, stream);
  if ((dev_ptr == nullptr) || (!hip::isValid(stream))) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  STREAM_CAPTURE(hipFreeAsync, stream, dev_ptr);
  auto hip_stream = (stream == nullptr) ? hip::getCurrentDevice()->NullStream()
                                        : reinterpret_cast<hip::Stream*>(stream);
  auto cmd = new FreeAsyncCommand(*hip_stream, dev_ptr);
  if (cmd == nullptr) {
    HIP_RETURN(hipErrorUnknown);
  }
  cmd->enqueue();
  cmd->release();
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipMemPoolTrimTo(hipMemPool_t mem_pool, size_t min_bytes_to_hold) {
  HIP_INIT_API(hipMemPoolTrimTo, mem_pool, min_bytes_to_hold);
  if (mem_pool == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hip::MemoryPool* hip_mem_pool = reinterpret_cast<hip::MemoryPool*>(mem_pool);
  hip_mem_pool->TrimTo(min_bytes_to_hold);
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipMemPoolSetAttribute(hipMemPool_t mem_pool, hipMemPoolAttr attr, void* value) {
  HIP_INIT_API(hipMemPoolSetAttribute, mem_pool, attr, value);
  if (mem_pool == nullptr || value == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  auto hip_mem_pool = reinterpret_cast<hip::MemoryPool*>(mem_pool);
  HIP_RETURN(hip_mem_pool->SetAttribute(attr, value));
}

// ================================================================================================
hipError_t hipMemPoolGetAttribute(hipMemPool_t mem_pool, hipMemPoolAttr attr, void* value) {
  HIP_INIT_API(hipMemPoolGetAttribute, mem_pool, attr, value);
  if (mem_pool == nullptr || value == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  auto hip_mem_pool = reinterpret_cast<hip::MemoryPool*>(mem_pool);
  HIP_RETURN(hip_mem_pool->GetAttribute(attr, value));
}
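// ================================================================================================
// Usage sketch (added for illustration, not part of the original source): tuning and querying the
// default pool through the attribute entry points above. Device 0 and the 64 MiB threshold are
// arbitrary assumptions.
static void pool_attribute_usage_sketch() {
  hipMemPool_t pool = nullptr;
  (void)hipDeviceGetDefaultMemPool(&pool, 0);

  uint64_t threshold = 64ull << 20;  // keep up to 64 MiB of freed memory cached in the pool
  (void)hipMemPoolSetAttribute(pool, hipMemPoolAttrReleaseThreshold, &threshold);

  uint64_t used = 0;  // currently used bytes, served from the pool's busy heap
  (void)hipMemPoolGetAttribute(pool, hipMemPoolAttrUsedMemCurrent, &used);
}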
// ================================================================================================
hipError_t hipMemPoolSetAccess(hipMemPool_t mem_pool, const hipMemAccessDesc* desc_list,
                               size_t count) {
  HIP_INIT_API(hipMemPoolSetAccess, mem_pool, desc_list, count);
  if ((mem_pool == nullptr) || (desc_list == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  auto hip_mem_pool = reinterpret_cast<hip::MemoryPool*>(mem_pool);
  for (size_t i = 0; i < count; ++i) {
    if (desc_list[i].location.type == hipMemLocationTypeDevice) {
      if (desc_list[i].location.id >= g_devices.size()) {
        HIP_RETURN(hipErrorInvalidValue);
      }
      if (desc_list[i].flags > hipMemAccessFlagsProtReadWrite) {
        HIP_RETURN(hipErrorInvalidValue);
      }
      auto device = g_devices[desc_list[i].location.id];
      hip_mem_pool->SetAccess(device, desc_list[i].flags);
    } else {
      HIP_RETURN(hipErrorInvalidValue);
    }
  }
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipMemPoolGetAccess(hipMemAccessFlags* flags, hipMemPool_t mem_pool,
                               hipMemLocation* location) {
  HIP_INIT_API(hipMemPoolGetAccess, flags, mem_pool, location);
  if ((mem_pool == nullptr) || (location == nullptr) || (flags == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  auto hip_mem_pool = reinterpret_cast<hip::MemoryPool*>(mem_pool);
  if (location->type == hipMemLocationTypeDevice) {
    if (location->id >= g_devices.size()) {
      HIP_RETURN(hipErrorInvalidValue);
    }
    auto device = g_devices[location->id];
    hip_mem_pool->GetAccess(device, flags);
  } else {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipMemPoolCreate(hipMemPool_t* mem_pool, const hipMemPoolProps* pool_props) {
  HIP_INIT_API(hipMemPoolCreate, mem_pool, pool_props);
  if (mem_pool == nullptr || pool_props == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  // validate hipMemAllocationType value
  if (pool_props->allocType != hipMemAllocationTypePinned) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  // Make sure the pool creation occurs on a valid device
  if ((pool_props->location.type != hipMemLocationTypeDevice) ||
      (pool_props->location.id >= g_devices.size())) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  auto device = g_devices[pool_props->location.id];
  auto pool = new hip::MemoryPool(device, pool_props->handleTypes != hipMemHandleTypeNone);
  if (pool == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *mem_pool = reinterpret_cast<hipMemPool_t>(pool);
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipMemPoolDestroy(hipMemPool_t mem_pool) {
  HIP_INIT_API(hipMemPoolDestroy, mem_pool);
  if (mem_pool == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hip::MemoryPool* hip_mem_pool = reinterpret_cast<hip::MemoryPool*>(mem_pool);
  hip_mem_pool->ReleaseFreedMemory();
  auto device = hip_mem_pool->Device();
  // Force the default pool if the current one is destroyed
  if (hip_mem_pool == device->GetCurrentMemoryPool()) {
    device->SetCurrentMemoryPool(device->GetDefaultMemoryPool());
  }
  hip_mem_pool->release();
  HIP_RETURN(hipSuccess);
}
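// ================================================================================================
// Usage sketch (added for illustration, not part of the original source): creating an explicit
// pool on device 0 and granting peer access from device 1 via the entry points above. A
// two-device system is an assumption.
static void pool_access_usage_sketch() {
  hipMemPoolProps props = {};
  props.allocType = hipMemAllocationTypePinned;
  props.location.type = hipMemLocationTypeDevice;
  props.location.id = 0;

  hipMemPool_t pool = nullptr;
  (void)hipMemPoolCreate(&pool, &props);

  hipMemAccessDesc access = {};
  access.location.type = hipMemLocationTypeDevice;
  access.location.id = 1;  // peer device
  access.flags = hipMemAccessFlagsProtReadWrite;
  (void)hipMemPoolSetAccess(pool, &access, 1);

  (void)hipMemPoolDestroy(pool);
}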
// ================================================================================================
hipError_t hipMallocFromPoolAsync(void** dev_ptr, size_t size, hipMemPool_t mem_pool,
                                  hipStream_t stream) {
  HIP_INIT_API(hipMallocFromPoolAsync, dev_ptr, size, mem_pool, stream);
  if ((dev_ptr == nullptr) || (mem_pool == nullptr) || (!hip::isValid(stream))) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (size == 0) {
    *dev_ptr = nullptr;
    HIP_RETURN(hipSuccess);
  }
  STREAM_CAPTURE(hipMallocAsync, stream, mem_pool, size, dev_ptr);
  auto mpool = reinterpret_cast<hip::MemoryPool*>(mem_pool);
  auto hip_stream = (stream == nullptr) ? hip::getCurrentDevice()->NullStream()
                                        : reinterpret_cast<hip::Stream*>(stream);
  *dev_ptr = mpool->AllocateMemory(size, hip_stream);
  if (*dev_ptr == nullptr) {
    HIP_RETURN(hipErrorOutOfMemory);
  }
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipMemPoolExportToShareableHandle(void* shared_handle, hipMemPool_t mem_pool,
                                             hipMemAllocationHandleType handle_type,
                                             unsigned int flags) {
  HIP_INIT_API(hipMemPoolExportToShareableHandle, shared_handle, mem_pool, handle_type, flags);
  if (mem_pool == nullptr || shared_handle == nullptr || flags == -1) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  auto mpool = reinterpret_cast<hip::MemoryPool*>(mem_pool);
  auto handle = mpool->Export();
  if (!handle) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *reinterpret_cast<amd::Os::FileDesc*>(shared_handle) = handle;
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipMemPoolImportFromShareableHandle(hipMemPool_t* mem_pool, void* shared_handle,
                                               hipMemAllocationHandleType handle_type,
                                               unsigned int flags) {
  HIP_INIT_API(hipMemPoolImportFromShareableHandle, mem_pool, shared_handle, handle_type, flags);
  if (mem_pool == nullptr || shared_handle == nullptr || flags == -1) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  auto device = g_devices[0];
  auto pool = new hip::MemoryPool(device);
  if (pool == nullptr) {
    HIP_RETURN(hipErrorOutOfMemory);
  }
  // Note: The interface casts the integer value of file handle under Linux into void*,
  // but compiler may not allow to cast it back. Hence, make a cast with a union...
  union {
    amd::Os::FileDesc desc;
    void* ptr;
  } handle;
  handle.ptr = shared_handle;
  if (!pool->Import(handle.desc)) {
    pool->release();
    HIP_RETURN(hipErrorOutOfMemory);
  }
  *mem_pool = reinterpret_cast<hipMemPool_t>(pool);
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipMemPoolExportPointer(hipMemPoolPtrExportData* export_data, void* ptr) {
  HIP_INIT_API(hipMemPoolExportPointer, export_data, ptr);
  if (export_data == nullptr || ptr == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  size_t offset = 0;
  auto memory = getMemoryObject(ptr, offset);
  if (memory != nullptr) {
    auto id = memory->getUserData().deviceId;
    // Note: export_data must point to 64 bytes of shared memory
    auto shared = reinterpret_cast<hip::SharedMemPointer*>(export_data);
    if (!g_devices[id]->devices()[0]->IpcCreate(ptr, &shared->size_, &shared->handle_[0],
                                                &shared->offset_)) {
      HIP_RETURN(hipErrorOutOfMemory);
    }
  } else {
    HIP_RETURN(hipErrorOutOfMemory);
  }
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipMemPoolImportPointer(void** ptr, hipMemPool_t mem_pool,
                                   hipMemPoolPtrExportData* export_data) {
  HIP_INIT_API(hipMemPoolImportPointer, ptr, mem_pool, export_data);
  if (mem_pool == nullptr || export_data == nullptr || ptr == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  auto mpool = reinterpret_cast<hip::MemoryPool*>(mem_pool);
  auto shared = reinterpret_cast<hip::SharedMemPointer*>(export_data);
  if (!mpool->Device()->devices()[0]->IpcAttach(&shared->handle_[0], shared->size_,
                                                shared->offset_, 0, ptr)) {
    HIP_RETURN(hipErrorOutOfMemory);
  }
  size_t offset = 0;
  auto memory = getMemoryObject(*ptr, offset);
  mpool->AddBusyMemory(memory);
  mpool->retain();
  HIP_RETURN(hipSuccess);
}
clr-rocm-5.7.1/hipamd/src/hip_mempool_impl.cpp000066400000000000000000000432521450307266000213410ustar00rootroot00000000000000/* Copyright (c) 2022-2023 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "hip_mempool_impl.hpp" #include "hip_vm.hpp" #include "platform/command.hpp" namespace hip { // ================================================================================================ void Heap::AddMemory(amd::Memory* memory, hip::Stream* stream) { allocations_.insert({memory, {stream, nullptr}}); total_size_ += memory->getSize(); max_total_size_ = std::max(max_total_size_, total_size_); } // ================================================================================================ void Heap::AddMemory(amd::Memory* memory, const MemoryTimestamp& ts) { allocations_.insert({memory, ts}); total_size_ += memory->getSize(); max_total_size_ = std::max(max_total_size_, total_size_); } // ================================================================================================ amd::Memory* Heap::FindMemory(size_t size, hip::Stream* stream, bool opportunistic, void* dptr) { amd::Memory* memory = nullptr; for (auto it = allocations_.begin(); it != allocations_.end();) { bool check_address = (dptr == nullptr); if (it->first->getSvmPtr() == dptr) { // If the search is done for the specified address then runtime must wait it->second.Wait(); check_address = true; } // Check if size can match and it's safe to use this resource. 
// Runtime can accept an allocation with 12.5% on the size threshold if ((it->first->getSize() >= size) && (it->first->getSize() <= (size / 8) * 9) && check_address && (it->second.IsSafeFind(stream, opportunistic))) { memory = it->first; total_size_ -= memory->getSize(); // Remove found allocation from the map it = allocations_.erase(it); break; } else { ++it; } } return memory; } // ================================================================================================ bool Heap::RemoveMemory(amd::Memory* memory, MemoryTimestamp* ts) { if (auto it = allocations_.find(memory); it != allocations_.end()) { if (ts != nullptr) { // Preserve timestamp info for possible reuse later *ts = it->second; } else { // Runtime will delete the timestamp object, hence make sure HIP event is released it->second.Wait(); it->second.SetEvent(nullptr); } total_size_ -= memory->getSize(); allocations_.erase(it); return true; } return false; } // ================================================================================================ std::unordered_map::iterator Heap::EraseAllocaton(std::unordered_map::iterator& it) { const device::Memory* dev_mem = it->first->getDeviceMemory(*device_->devices()[0]); amd::SvmBuffer::free(it->first->getContext(), reinterpret_cast(dev_mem->virtualAddress())); total_size_ -= it->first->getSize(); // Clear HIP event it->second.SetEvent(nullptr); // Remove the allocation from the map return allocations_.erase(it); } // ================================================================================================ bool Heap::ReleaseAllMemory(size_t min_bytes_to_hold, bool safe_release) { for (auto it = allocations_.begin(); it != allocations_.end();) { // Make sure the heap is smaller than the minimum value to hold if (total_size_ <= min_bytes_to_hold) { return true; } // Safe release forces unconditional wait for memory if (safe_release) { it->second.Wait(); } if (it->second.IsSafeRelease()) { it = EraseAllocaton(it); } else { ++it; } } return true; } // ================================================================================================ bool Heap::ReleaseAllMemory(hip::Stream* stream) { for (auto it = allocations_.begin(); it != allocations_.end();) { // Make sure the heap holds the minimum number of bytes if (total_size_ <= release_threshold_) { return true; } if (it->second.IsSafeRelease()) { it = EraseAllocaton(it); } else { ++it; } } return true; } // ================================================================================================ void Heap::RemoveStream(hip::Stream* stream) { for (auto it : allocations_) { it.second.safe_streams_.erase(stream); } } // ================================================================================================ void Heap::SetAccess(hip::Device* device, bool enable) { for (const auto& it : allocations_) { auto peer_device = device->asContext()->devices()[0]; device::Memory* mem = it.first->getDeviceMemory(*peer_device); if (mem != nullptr) { if (!mem->getAllowedPeerAccess() && enable) { // Enable p2p access for the specified device peer_device->allowPeerAccess(mem); mem->setAllowedPeerAccess(true); } else if (mem->getAllowedPeerAccess() && !enable) { mem->setAllowedPeerAccess(false); } } else { LogError("Couldn't find device memory for P2P access"); } } } // ================================================================================================ void* MemoryPool::AllocateMemory(size_t size, hip::Stream* stream, void* dptr) { amd::ScopedLock lock(lock_pool_ops_); void* dev_ptr = nullptr; 
amd::Memory* memory = free_heap_.FindMemory(size, stream, Opportunistic(), dptr); if (memory == nullptr) { amd::Context* context = device_->asContext(); const auto& dev_info = context->devices()[0]->info(); if (dev_info.maxMemAllocSize_ < size) { return nullptr; } cl_svm_mem_flags flags = (state_.interprocess_) ? ROCCLR_MEM_INTERPROCESS : 0; dev_ptr = amd::SvmBuffer::malloc(*context, flags, size, dev_info.memBaseAddrAlign_, nullptr); if (dev_ptr == nullptr) { size_t free = 0, total =0; hipError_t err = hipMemGetInfo(&free, &total); if (err == hipSuccess) { LogPrintfError("Allocation failed : Device memory : required :%zu | free :%zu | total :%zu \n", size, free, total); } return nullptr; } size_t offset = 0; memory = getMemoryObject(dev_ptr, offset); // Saves the current device id so that it can be accessed later memory->getUserData().deviceId = device_->deviceId(); // Update access for the new allocation from other devices for (const auto& it : access_map_) { auto vdi_device = it.first->asContext()->devices()[0]; device::Memory* mem = memory->getDeviceMemory(*vdi_device); if ((mem != nullptr) && (it.second != hipMemAccessFlagsProtNone)) { vdi_device->allowPeerAccess(mem); mem->setAllowedPeerAccess(true); } } } else { free_heap_.RemoveMemory(memory); const device::Memory* dev_mem = memory->getDeviceMemory(*device_->devices()[0]); dev_ptr = reinterpret_cast(dev_mem->virtualAddress()); } // Place the allocated memory into the busy heap busy_heap_.AddMemory(memory, stream); // Increment the reference counter on the pool retain(); ClPrint(amd::LOG_INFO, amd::LOG_MEM_POOL, "Pool AllocMem: %p, %p", memory->getSvmPtr(), memory); return dev_ptr; } // ================================================================================================ bool MemoryPool::FreeMemory(amd::Memory* memory, hip::Stream* stream) { amd::ScopedLock lock(lock_pool_ops_); MemoryTimestamp ts; // Remove memory object from the busy pool if (!busy_heap_.RemoveMemory(memory, &ts)) { // This pool doesn't contain memory return false; } ClPrint(amd::LOG_INFO, amd::LOG_MEM_POOL, "Pool FreeMem: %p, %p", memory->getSvmPtr(), memory); auto ga = reinterpret_cast(memory->getUserData().data); if (ga != nullptr) { if (stream == nullptr) { stream = g_devices[memory->getUserData().deviceId]->NullStream(); } // Unmap virtual address from memory auto cmd = new amd::VirtualMapCommand(*stream, amd::Command::EventWaitList{}, memory->getSvmPtr(), ga->size_, nullptr); cmd->enqueue(); cmd->release(); memory->setSvmPtr(ga->ptr_); // Free virtual address and destroy generic allocation object ga->va_->release(); delete ga; memory->getUserData().data = nullptr; } if (stream != nullptr) { // The stream of destruction is a safe stream, because the app must handle sync ts.AddSafeStream(stream); // Add a marker to the stream to trace availability of this memory Event* e = new hip::Event(0); if (e != nullptr) { if (hipSuccess == e->addMarker(reinterpret_cast(stream), nullptr, true)) { ts.SetEvent(e); } } } else { // Assume a safe release from hipFree() if stream is nullptr ts.SetEvent(nullptr); } free_heap_.AddMemory(memory, ts); // Decrement the reference counter on the pool release(); return true; } // ================================================================================================ void MemoryPool::ReleaseAllMemory() { constexpr bool kSafeRelease = true; free_heap_.ReleaseAllMemory(0, kSafeRelease); busy_heap_.ReleaseAllMemory(0, kSafeRelease); } // 
================================================================================================
void MemoryPool::ReleaseFreedMemory(hip::Stream* stream) {
  amd::ScopedLock lock(lock_pool_ops_);
  free_heap_.ReleaseAllMemory(stream);
}

// ================================================================================================
void MemoryPool::RemoveStream(hip::Stream* stream) {
  amd::ScopedLock lock(lock_pool_ops_);
  free_heap_.RemoveStream(stream);
}

// ================================================================================================
void MemoryPool::TrimTo(size_t min_bytes_to_hold) {
  amd::ScopedLock lock(lock_pool_ops_);
  free_heap_.ReleaseAllMemory(min_bytes_to_hold);
}

// ================================================================================================
hipError_t MemoryPool::SetAttribute(hipMemPoolAttr attr, void* value) {
  amd::ScopedLock lock(lock_pool_ops_);
  uint64_t reset;
  switch (attr) {
    case hipMemPoolReuseFollowEventDependencies:
      // Enable/disable HIP events tracking from the app's dependencies
      state_.event_dependencies_ = *reinterpret_cast<int32_t*>(value);
      break;
    case hipMemPoolReuseAllowOpportunistic:
      // Enable/disable HIP event check for freed memory
      state_.opportunistic_ = *reinterpret_cast<int32_t*>(value);
      break;
    case hipMemPoolReuseAllowInternalDependencies:
      // Enable/disable internal extra dependencies introduced in runtime
      state_.internal_dependencies_ = *reinterpret_cast<int32_t*>(value);
      break;
    case hipMemPoolAttrReleaseThreshold:
      free_heap_.SetReleaseThreshold(*reinterpret_cast<uint64_t*>(value));
      break;
    case hipMemPoolAttrReservedMemCurrent:
      // Should be GetAttribute only
      return hipErrorInvalidValue;
      break;
    case hipMemPoolAttrReservedMemHigh:
      reset = *reinterpret_cast<uint64_t*>(value);
      // Only 0 is accepted
      if (reset != 0) {
        return hipErrorInvalidValue;
      }
      free_heap_.SetMaxTotalSize(reset);
      break;
    case hipMemPoolAttrUsedMemCurrent:
      // Should be GetAttribute only
      return hipErrorInvalidValue;
      break;
    case hipMemPoolAttrUsedMemHigh:
      reset = *reinterpret_cast<uint64_t*>(value);
      // Only 0 is accepted
      if (reset != 0) {
        return hipErrorInvalidValue;
      }
      busy_heap_.SetMaxTotalSize(reset);
      break;
    default:
      return hipErrorInvalidValue;
  }
  return hipSuccess;
}

// ================================================================================================
hipError_t MemoryPool::GetAttribute(hipMemPoolAttr attr, void* value) {
  amd::ScopedLock lock(lock_pool_ops_);
  switch (attr) {
    case hipMemPoolReuseFollowEventDependencies:
      // Enable/disable HIP events tracking from the app's dependencies
      *reinterpret_cast<int32_t*>(value) = EventDependencies();
      break;
    case hipMemPoolReuseAllowOpportunistic:
      // Enable/disable HIP event check for freed memory
      *reinterpret_cast<int32_t*>(value) = Opportunistic();
      break;
    case hipMemPoolReuseAllowInternalDependencies:
      // Enable/disable internal extra dependencies introduced in runtime
      *reinterpret_cast<int32_t*>(value) = InternalDependencies();
      break;
    case hipMemPoolAttrReleaseThreshold:
      *reinterpret_cast<uint64_t*>(value) = free_heap_.GetReleaseThreshold();
      break;
    case hipMemPoolAttrReservedMemCurrent:
      // All memory allocated by the pool in the OS
      *reinterpret_cast<uint64_t*>(value) = busy_heap_.GetTotalSize() + free_heap_.GetTotalSize();
      break;
    case hipMemPoolAttrReservedMemHigh:
      // High watermark of all allocated memory in the OS, since the last reset
      *reinterpret_cast<uint64_t*>(value) =
          busy_heap_.GetTotalSize() + free_heap_.GetMaxTotalSize();
      break;
    case hipMemPoolAttrUsedMemCurrent:
      // Total memory currently used by the pool
      *reinterpret_cast<uint64_t*>(value) = busy_heap_.GetTotalSize();
      break;
    case hipMemPoolAttrUsedMemHigh:
      // High watermark of all used memory, since the last reset
      *reinterpret_cast<uint64_t*>(value) = busy_heap_.GetMaxTotalSize();
      break;
    default:
      return hipErrorInvalidValue;
  }
  return hipSuccess;
}
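// ================================================================================================
// Usage sketch (added for illustration, not part of the original source): the high-watermark
// attributes handled above are reset by writing 0; any other value is rejected with
// hipErrorInvalidValue.
static void watermark_reset_usage_sketch(hipMemPool_t pool) {
  uint64_t zero = 0;
  (void)hipMemPoolSetAttribute(pool, hipMemPoolAttrReservedMemHigh, &zero);
  (void)hipMemPoolSetAttribute(pool, hipMemPoolAttrUsedMemHigh, &zero);
}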
// ================================================================================================
void MemoryPool::SetAccess(hip::Device* device, hipMemAccessFlags flags) {
  amd::ScopedLock lock(lock_pool_ops_);
  // Check if the requested device is the pool device where memory was allocated
  if (device == device_) {
    return;
  }
  hipMemAccessFlags current_flags = hipMemAccessFlagsProtNone;
  // Check if access was enabled before
  if (access_map_.find(device) != access_map_.end()) {
    current_flags = access_map_[device];
  }
  if (current_flags != flags) {
    bool enable_access = false;
    // Save the access state in the device map
    access_map_[device] = flags;
    // Check if access is enabled
    if ((flags == hipMemAccessFlagsProtRead) || (flags == hipMemAccessFlagsProtReadWrite)) {
      enable_access = true;
    }
    // Update device access on both heaps
    busy_heap_.SetAccess(device, enable_access);
    free_heap_.SetAccess(device, enable_access);
  }
}

// ================================================================================================
void MemoryPool::GetAccess(hip::Device* device, hipMemAccessFlags* flags) {
  amd::ScopedLock lock(lock_pool_ops_);
  // The pool's own device always has full access to its allocations
  *flags = (device == device_) ? hipMemAccessFlagsProtReadWrite : hipMemAccessFlagsProtNone;
  // Check if access was enabled before
  if (access_map_.find(device) != access_map_.end()) {
    *flags = access_map_[device];
  }
}

// ================================================================================================
void MemoryPool::FreeAllMemory(hip::Stream* stream) {
  while (!busy_heap_.Allocations().empty()) {
    FreeMemory(busy_heap_.Allocations().begin()->first, stream);
  }
}

// ================================================================================================
amd::Os::FileDesc MemoryPool::Export() {
  amd::ScopedLock lock(lock_pool_ops_);
  if (shared_ != nullptr) {
    return shared_->handle_;
  }
  constexpr uint32_t kFileNameSize = 20;
  char file_name[kFileNameSize];
  // Generate a unique name from the mempool pointer
  // Note: Windows can accept an unnamed allocation
  snprintf(file_name, kFileNameSize, "%p", this);
  amd::Os::FileDesc handle{};
  shared_ = reinterpret_cast<SharedMemPool*>(
      amd::Os::CreateIpcMemory(file_name, sizeof(SharedMemPool), &handle));
  if (shared_ != nullptr) {
    shared_->handle_ = handle;
    shared_->state_ = state_.value_;
    shared_->access_size_ = 0;
    memset(shared_->access_, 0, sizeof(SharedAccess) * kMaxMgpuAccess);
    assert((access_map_.size() <= kMaxMgpuAccess) && "Can't support more GPU(s) in shared access");
    for (auto it : access_map_) {
      shared_->access_[shared_->access_size_] = SharedAccess{it.first->deviceId(), it.second};
      shared_->access_size_++;
    }
  }
  return handle;
}

// ================================================================================================
bool MemoryPool::Import(amd::Os::FileDesc handle) {
  amd::ScopedLock lock(lock_pool_ops_);
  bool result = false;
  auto shared = reinterpret_cast<SharedMemPool*>(
      amd::Os::OpenIpcMemory(nullptr, handle, sizeof(SharedMemPool)));
  if (shared != nullptr) {
    state_.value_ = shared->state_;
    for (uint32_t i = 0; i < shared->access_size_; ++i) {
      access_map_[g_devices[shared->access_[i].device_id_]] = shared->access_[i].flags_;
    }
    result = true;
  }
  return result;
}

}  // namespace hip
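// ================================================================================================
// Usage sketch (added for illustration, not part of the original source): the exporting side of
// the IPC path implemented by MemoryPool::Export() above, driven through the public API. Passing
// the descriptor to the peer process (e.g. over a Unix domain socket) is omitted, and POSIX file
// descriptor handles are an assumption (Linux).
static void pool_export_usage_sketch(hipMemPool_t pool) {
  int os_handle = -1;  // amd::Os::FileDesc is a plain file descriptor on Linux
  (void)hipMemPoolExportToShareableHandle(&os_handle, pool, hipMemHandleTypePosixFileDescriptor,
                                          0 /* flags */);
  // ... send 'os_handle' to the peer, which calls hipMemPoolImportFromShareableHandle() ...
}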
clr-rocm-5.7.1/hipamd/src/hip_mempool_impl.hpp000066400000000000000000000254411450307266000213460ustar00rootroot00000000000000/* Copyright (c) 2022 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or
substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */

#pragma once

#include
#include "hip_event.hpp"
#include "hip_internal.hpp"
#include
#include

namespace hip {

class Device;
class Stream;

struct SharedMemPointer {
  size_t offset_;
  size_t size_;
  char handle_[IHIP_IPC_MEM_HANDLE_SIZE];
};

struct MemoryTimestamp {
  MemoryTimestamp(hip::Stream* stream, hip::Event* event = nullptr) : event_(event) {
    if (stream != nullptr) {
      safe_streams_.insert(stream);
    }
  }
  MemoryTimestamp() : event_(nullptr) {}

  /// Adds a safe stream to the list of streams for possible reuse
  void AddSafeStream(hip::Stream* stream) {
    if (safe_streams_.find(stream) == safe_streams_.end()) {
      safe_streams_.insert(stream);
    }
  }
  /// Changes the last known valid event associated with memory
  void SetEvent(hip::Event* event) {
    delete event_;
    event_ = event;
  }
  /// Wait for memory to be available
  void Wait() {
    if (event_ != nullptr) {
      auto hip_error = event_->synchronize();
    }
  }
  /// Returns if the memory object is safe for reuse
  bool IsSafeFind(hip::Stream* stream = nullptr, bool opportunistic = true) {
    bool result = false;
    if (safe_streams_.find(stream) != safe_streams_.end()) {
      // A safe stream doesn't require TS validation
      result = true;
    } else if (opportunistic && (event_ != nullptr)) {
      // Check HIP event for a retired status
      result = (event_->query() == hipSuccess) ? true : false;
    } else if (event_ == nullptr) {
      // Event doesn't exist. It was a safe release with explicit wait
      return true;
    }
    return result;
  }
  /// Returns if the memory object is safe for release
  bool IsSafeRelease() {
    bool result = true;
    if (event_ != nullptr) {
      // Check HIP event for a retired status
      result = (event_->query() == hipSuccess) ? true : false;
    }
    return result;
  }

  std::unordered_set<hip::Stream*> safe_streams_;  //!< Safe streams for memory reuse
  hip::Event* event_;  //!< Last known HIP event, associated with the memory object
};
class Heap : public amd::EmbeddedObject {
 public:
  Heap(hip::Device* device)
      : total_size_(0), max_total_size_(0), release_threshold_(0), device_(device) {}
  ~Heap() {}

  /// Adds allocation into the heap on a specific stream
  void AddMemory(amd::Memory* memory, hip::Stream* stream);
  /// Adds allocation into the heap with a specific TS
  void AddMemory(amd::Memory* memory, const MemoryTimestamp& ts);
  /// Finds a memory object with the specified size
  amd::Memory* FindMemory(size_t size, hip::Stream* stream, bool opportunistic,
                          void* dptr = nullptr);
  /// Removes allocation from the map
  bool RemoveMemory(amd::Memory* memory, MemoryTimestamp* ts = nullptr);
  /// Releases all memory, until the threshold value is met
  bool ReleaseAllMemory(size_t min_bytes_to_hold = std::numeric_limits<size_t>::max(),
                        bool safe_release = false);
  /// Releases all memory, safe to the provided stream, until the threshold value is met
  bool ReleaseAllMemory(hip::Stream* stream);
  /// Removes the provided stream from the safe list
  void RemoveStream(hip::Stream* stream);
  /// Enables P2P access to the provided device
  void SetAccess(hip::Device* device, bool enable);
  /// Heap doesn't have any allocations
  bool IsEmpty() const { return (allocations_.size() == 0) ? true : false; }
  /// Sets the memory release threshold
  void SetReleaseThreshold(uint64_t value) { release_threshold_ = value; }
  /// Gets the memory release threshold
  uint64_t GetReleaseThreshold() const { return release_threshold_; }
  /// Gets the size of all allocations in the heap
  uint64_t GetTotalSize() const { return total_size_; }
  /// Gets the maximum size of all allocations in the heap
  uint64_t GetMaxTotalSize() const { return max_total_size_; }
  /// Sets the maximum total, allocated by the heap
  void SetMaxTotalSize(uint64_t value) { max_total_size_ = value; }
  /// Erases a single allocation from the heap's map
  std::unordered_map<amd::Memory*, MemoryTimestamp>::iterator EraseAllocaton(
      std::unordered_map<amd::Memory*, MemoryTimestamp>::iterator& it);
  /// Checks if memory belongs to this heap
  bool IsActiveMemory(amd::Memory* memory) const {
    return (allocations_.find(memory) != allocations_.end());
  }

  const auto& Allocations() { return allocations_; }

 private:
  Heap() = delete;
  Heap(const Heap&) = delete;
  Heap& operator=(const Heap&) = delete;

  std::unordered_map<amd::Memory*, MemoryTimestamp> allocations_;  //!< Map of allocations on a
                                                                   //!< specific stream
  uint64_t total_size_;          //!< Size of all allocations in the heap
  uint64_t max_total_size_;      //!< Maximum heap allocation size
  uint64_t release_threshold_;   //!< Threshold size in bytes for memory release from heap, default 0
  hip::Device* device_;          //!< HIP device the allocations will reside on
};

/// Allocates memory in the pool on the specified stream and places the allocation into busy_heap_
/// @note: the logic will also look in free_heap_ for possible reuse.
/// The hipMemPoolReuseAllowOpportunistic option validates whether the HIP event associated with
/// the memory is done, in which case reuse can be performed.
class MemoryPool : public amd::ReferenceCountedObject {
 public:
  struct SharedAccess {
    int device_id_;            //!< Device ID for access with a specified shared resource
    hipMemAccessFlags flags_;  //!< Flags which define access type
  };
  static constexpr uint32_t kMaxMgpuAccess = 32;
  struct SharedMemPool {
    amd::Os::FileDesc handle_;  //!< File descriptor for shared memory
    uint32_t state_;            //!< Memory pool state
    uint32_t access_size_;      //!< The number of entries in the access array
    SharedAccess access_[kMaxMgpuAccess];  //!< The list of devices for access
  };

  MemoryPool(hip::Device* device, bool interprocess = false)
      : busy_heap_(device),
        free_heap_(device),
        lock_pool_ops_("Pool operations", true),
        device_(device),
        shared_(nullptr) {
    device_->AddMemoryPool(this);
    state_.value_ = 0;
    state_.event_dependencies_ = 1;
    state_.opportunistic_ = 1;
    state_.internal_dependencies_ = 1;
    state_.interprocess_ = interprocess;
  }

  virtual ~MemoryPool() {
    if (!busy_heap_.IsEmpty()) {
      LogError("Shouldn't destroy a pool with busy allocations!");
    }
    ReleaseAllMemory();
    // Remove this memory pool from the list of all pools on the current device
    device_->RemoveMemoryPool(this);
    if (shared_ != nullptr) {
      // Note: the app is supposed to close the handle... A double close on Windows will crash
      amd::Os::CloseIpcMemory(0, shared_, sizeof(SharedMemPool));
    }
  }

  /// The same stream can reuse memory without HIP event validation
  void* AllocateMemory(size_t size, hip::Stream* stream, void* dptr = nullptr);
  /// Frees memory by placing the memory object with a HIP event into free_heap_
  bool FreeMemory(amd::Memory* memory, hip::Stream* stream);
  /// Checks if memory is active and belongs to the busy heap
  bool IsBusyMemory(amd::Memory* memory) const { return busy_heap_.IsActiveMemory(memory); }
  /// Releases all allocations from free_heap_. It can be called on Stream or Device synchronization
  /// @note The caller must make sure it's safe to release memory
  void ReleaseFreedMemory(hip::Stream* stream = nullptr);
  /// Removes a stream from tracking
  void RemoveStream(hip::Stream* stream);
  /// Releases all allocations in the MemoryPool
  void ReleaseAllMemory();
  /// Places the allocated memory into the busy heap
  void AddBusyMemory(amd::Memory* memory) { busy_heap_.AddMemory(memory, nullptr); }
  /// Trims the pool until it holds only min_bytes_to_hold
  void TrimTo(size_t min_bytes_to_hold);
  /// Returns the HIP device that owns this pool
  hip::Device* Device() const { return device_; }
  /// Sets memory pool control attributes
  hipError_t SetAttribute(hipMemPoolAttr attr, void* value);
  /// Gets memory pool control attributes
  hipError_t GetAttribute(hipMemPoolAttr attr, void* value);
  /// Sets memory pool access for a different device
  void SetAccess(hip::Device* device, hipMemAccessFlags flags);
  /// Gets memory pool access for a different device
  void GetAccess(hip::Device* device, hipMemAccessFlags* flags);
  /// Frees all busy memory
  void FreeAllMemory(hip::Stream* stream = nullptr);
  /// Exports the memory pool into an OS specific handle
  amd::Os::FileDesc Export();
  /// Imports a memory pool from an OS specific handle
  bool Import(amd::Os::FileDesc handle);

  /// Accessors for the pool state
  bool EventDependencies() const { return (state_.event_dependencies_) ? true : false; }
  bool Opportunistic() const { return (state_.opportunistic_) ? true : false; }
  bool InternalDependencies() const { return (state_.internal_dependencies_) ? true : false; }
true : false; } private: MemoryPool() = delete; MemoryPool(const MemoryPool&) = delete; MemoryPool& operator=(const MemoryPool&) = delete; Heap busy_heap_; //!< Heap of busy allocations Heap free_heap_; //!< Heap of freed allocations union { struct { uint32_t event_dependencies_ : 1; //!< Event dependencies tracking is enabled uint32_t opportunistic_ : 1; //!< HIP event check is enabled uint32_t internal_dependencies_ : 1; //!< Runtime adds internal events to handle memory //!< dependencies uint32_t interprocess_ : 1; //!< Memory pool can be used in interprocess communications }; uint32_t value_; } state_; amd::Monitor lock_pool_ops_; //!< Access to the pool must be lock protected std::map access_map_; //!< Map of access to the pool from devices hip::Device* device_; //!< Hip device the heap will reside SharedMemPool* shared_; //!< Pointer to shared memory for IPC }; } // Mamespace hip clr-rocm-5.7.1/hipamd/src/hip_module.cpp000066400000000000000000001101411450307266000201250ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include "hip_internal.hpp" #include "platform/program.hpp" #include "hip_event.hpp" #include "hip_platform.hpp" hipError_t ihipModuleLoadData(hipModule_t* module, const void* mmap_ptr, size_t mmap_size); extern hipError_t ihipLaunchKernel(const void* hostFunction, dim3 gridDim, dim3 blockDim, void** args, size_t sharedMemBytes, hipStream_t stream, hipEvent_t startEvent, hipEvent_t stopEvent, int flags); const std::string& FunctionName(const hipFunction_t f) { return hip::DeviceFunc::asFunction(f)->kernel()->name(); } static uint64_t ElfSize(const void* emi) { return amd::Elf::getElfSize(emi); } hipError_t hipModuleUnload(hipModule_t hmod) { HIP_INIT_API(hipModuleUnload, hmod); if (hmod == nullptr) { HIP_RETURN(hipErrorInvalidResourceHandle); } HIP_RETURN(PlatformState::instance().unloadModule(hmod)); } hipError_t hipModuleLoad(hipModule_t* module, const char* fname) { HIP_INIT_API(hipModuleLoad, module, fname); HIP_RETURN(PlatformState::instance().loadModule(module, fname)); } hipError_t hipModuleLoadData(hipModule_t* module, const void* image) { HIP_INIT_API(hipModuleLoadData, module, image); HIP_RETURN(PlatformState::instance().loadModule(module, 0, image)); } hipError_t hipModuleLoadDataEx(hipModule_t* module, const void* image, unsigned int numOptions, hipJitOption* options, void** optionsValues) { /* TODO: Pass options to Program */ HIP_INIT_API(hipModuleLoadDataEx, module, image); HIP_RETURN(PlatformState::instance().loadModule(module, 0, image)); } extern hipError_t __hipExtractCodeObjectFromFatBinary( const void* data, const std::vector& devices, std::vector>& code_objs); hipError_t hipModuleGetFunction(hipFunction_t* hfunc, hipModule_t hmod, const char* name) { HIP_INIT_API(hipModuleGetFunction, hfunc, hmod, name); if (hfunc == nullptr || name == nullptr || strlen(name) == 0) { HIP_RETURN(hipErrorInvalidValue); } if (hmod == nullptr) { HIP_RETURN(hipErrorInvalidResourceHandle); } if (hipSuccess != PlatformState::instance().getDynFunc(hfunc, hmod, name)) { LogPrintfError("Cannot find the function: %s for module: 0x%x \n", name, hmod); HIP_RETURN(hipErrorNotFound); } HIP_RETURN(hipSuccess); } hipError_t hipModuleGetGlobal(hipDeviceptr_t* dptr, size_t* bytes, hipModule_t hmod, const char* name) { HIP_INIT_API(hipModuleGetGlobal, dptr, bytes, hmod, name); if (dptr == nullptr || bytes == nullptr) { // If either is nullptr, ignore it HIP_RETURN(hipSuccess); } if ((dptr == nullptr && bytes == nullptr) || name == nullptr || strlen(name) == 0) { HIP_RETURN(hipErrorInvalidValue); } if (hmod == nullptr) { HIP_RETURN(hipErrorInvalidResourceHandle); } /* Get address and size for the global symbol */ if (hipSuccess != PlatformState::instance().getDynGlobalVar(name, hmod, dptr, bytes)) { LogPrintfError("Cannot find global Var: %s for module: 0x%x at device: %d \n", name, hmod, ihipGetDevice()); HIP_RETURN(hipErrorNotFound); } HIP_RETURN(hipSuccess); } hipError_t hipFuncGetAttribute(int* value, hipFunction_attribute attrib, hipFunction_t hfunc) { HIP_INIT_API(hipFuncGetAttribute, value, attrib, hfunc); if ((value == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } hip::DeviceFunc* function = hip::DeviceFunc::asFunction(hfunc); if (function == nullptr) { HIP_RETURN(hipErrorInvalidHandle); } amd::Kernel* kernel = function->kernel(); if (kernel == nullptr) { HIP_RETURN(hipErrorInvalidDeviceFunction); } const device::Kernel::WorkGroupInfo* wrkGrpInfo = kernel->getDeviceKernel(*(hip::getCurrentDevice()->devices()[0]))->workGroupInfo(); if (wrkGrpInfo == nullptr) { 
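    // No workgroup info means the kernel carries no usable code/metadata for the
    // current device, so none of the attributes below can be computed.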
    HIP_RETURN(hipErrorMissingConfiguration);
  }

  switch (attrib) {
    case HIP_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES:
      *value = static_cast<int>(wrkGrpInfo->localMemSize_);
      break;
    case HIP_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK:
      *value = static_cast<int>(wrkGrpInfo->size_);
      break;
    case HIP_FUNC_ATTRIBUTE_CONST_SIZE_BYTES:
      *value = 0;
      break;
    case HIP_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES:
      *value = static_cast<int>(wrkGrpInfo->privateMemSize_);
      break;
    case HIP_FUNC_ATTRIBUTE_NUM_REGS:
      *value = static_cast<int>(wrkGrpInfo->usedVGPRs_);
      break;
    case HIP_FUNC_ATTRIBUTE_PTX_VERSION:
      *value = 30;  // Defaults to 3.0, as HCC did
      break;
    case HIP_FUNC_ATTRIBUTE_BINARY_VERSION:
      *value = static_cast<int>(kernel->signature().version());
      break;
    case HIP_FUNC_ATTRIBUTE_CACHE_MODE_CA:
      *value = 0;
      break;
    case HIP_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES:
      *value = static_cast<int>(wrkGrpInfo->availableLDSSize_ - wrkGrpInfo->localMemSize_);
      break;
    case HIP_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT:
      *value = 0;
      break;
    default:
      HIP_RETURN(hipErrorInvalidValue);
  }

  HIP_RETURN(hipSuccess);
}

hipError_t hipFuncGetAttributes(hipFuncAttributes* attr, const void* func) {
  HIP_INIT_API(hipFuncGetAttributes, attr, func);

  HIP_RETURN_ONFAIL(PlatformState::instance().getStatFuncAttr(attr, func, ihipGetDevice()));

  HIP_RETURN(hipSuccess);
}

hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value) {
  HIP_INIT_API(hipFuncSetAttribute, func, attr, value);

  // No way to set a function attribute yet.

  HIP_RETURN(hipSuccess);
}

hipError_t hipFuncSetCacheConfig(const void* func, hipFuncCache_t cacheConfig) {
  HIP_INIT_API(hipFuncSetCacheConfig, cacheConfig);

  // No way to set the cache config yet.

  HIP_RETURN(hipSuccess);
}

hipError_t hipFuncSetSharedMemConfig(const void* func, hipSharedMemConfig config) {
  HIP_INIT_API(hipFuncSetSharedMemConfig, func, config);

  // No way to set the shared memory config yet.
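  // While the three setters here are currently no-ops, hipFuncGetAttribute above does
  // report real values. A minimal query sketch (illustrative; assumes a valid
  // hipFunction_t `func` obtained from hipModuleGetFunction):
  //
  //   int regs = 0, lds = 0;
  //   hipFuncGetAttribute(&regs, HIP_FUNC_ATTRIBUTE_NUM_REGS, func);
  //   hipFuncGetAttribute(&lds, HIP_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES, func);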
HIP_RETURN(hipSuccess); } hipError_t ihipLaunchKernel_validate(hipFunction_t f, uint32_t globalWorkSizeX, uint32_t globalWorkSizeY, uint32_t globalWorkSizeZ, uint32_t blockDimX, uint32_t blockDimY, uint32_t blockDimZ, uint32_t sharedMemBytes, void** kernelParams, void** extra, int deviceId, uint32_t params = 0) { if (f == nullptr) { LogPrintfError("%s", "Function passed is null"); return hipErrorInvalidImage; } if ((kernelParams != nullptr) && (extra != nullptr)) { LogPrintfError("%s", "Both, kernelParams and extra Params are provided, only one should be provided"); return hipErrorInvalidValue; } if (globalWorkSizeX == 0 || globalWorkSizeY == 0 || globalWorkSizeZ == 0 || blockDimX == 0 || blockDimY == 0 || blockDimZ == 0) { return hipErrorInvalidValue; } const amd::Device* device = g_devices[deviceId]->devices()[0]; const auto& info = device->info(); if (sharedMemBytes > info.localMemSizePerCU_) { //sharedMemPerBlock return hipErrorInvalidValue; } // Make sure dispatch doesn't exceed max workgroup size limit if (blockDimX * blockDimY * blockDimZ > info.maxWorkGroupSize_) { return hipErrorInvalidValue; } hip::DeviceFunc* function = hip::DeviceFunc::asFunction(f); amd::Kernel* kernel = function->kernel(); if (!kernel->getDeviceKernel(*device)) { return hipErrorInvalidDevice; } // Make sure the launch params are not larger than if specified launch_bounds // If it exceeds, then return a failure if (blockDimX * blockDimY * blockDimZ > kernel->getDeviceKernel(*device)->workGroupInfo()->size_) { LogPrintfError("Launch params (%u, %u, %u) are larger than launch bounds (%lu) for kernel %s", blockDimX, blockDimY, blockDimZ, kernel->getDeviceKernel(*device)->workGroupInfo()->size_, function->name().c_str()); return hipErrorLaunchFailure; } if (params & amd::NDRangeKernelCommand::CooperativeGroups) { if (!device->info().cooperativeGroups_) { return hipErrorLaunchFailure; } int num_blocks = 0; int max_blocks_per_grid = 0; int best_block_size = 0; int block_size = blockDimX * blockDimY * blockDimZ; hipError_t err = hip_impl::ihipOccupancyMaxActiveBlocksPerMultiprocessor( &num_blocks, &max_blocks_per_grid, &best_block_size, *device, f, block_size, sharedMemBytes, true); if (err != hipSuccess) { return err; } if (((globalWorkSizeX * globalWorkSizeY * globalWorkSizeZ) / block_size) > unsigned(max_blocks_per_grid)) { return hipErrorCooperativeLaunchTooLarge; } } if (params & amd::NDRangeKernelCommand::CooperativeMultiDeviceGroups) { if (!device->info().cooperativeMultiDeviceGroups_) { return hipErrorLaunchFailure; } } address kernargs = nullptr; // 'extra' is a struct that contains the following info: { // HIP_LAUNCH_PARAM_BUFFER_POINTER, kernargs, // HIP_LAUNCH_PARAM_BUFFER_SIZE, &kernargs_size, // HIP_LAUNCH_PARAM_END } if (extra != nullptr) { if (extra[0] != HIP_LAUNCH_PARAM_BUFFER_POINTER || extra[2] != HIP_LAUNCH_PARAM_BUFFER_SIZE || extra[4] != HIP_LAUNCH_PARAM_END) { return hipErrorInvalidValue; } kernargs = reinterpret_cast
(extra[1]); } const amd::KernelSignature& signature = kernel->signature(); for (size_t i = 0; i < signature.numParameters(); ++i) { const amd::KernelParameterDescriptor& desc = signature.at(i); if (kernelParams == nullptr) { assert(kernargs != nullptr); kernel->parameters().set(i, desc.size_, kernargs + desc.offset_, desc.type_ == T_POINTER /*svmBound*/); } else { assert(extra == nullptr); kernel->parameters().set(i, desc.size_, kernelParams[i], desc.type_ == T_POINTER /*svmBound*/); } } return hipSuccess; } hipError_t ihipLaunchKernelCommand(amd::Command*& command, hipFunction_t f, uint32_t globalWorkSizeX, uint32_t globalWorkSizeY, uint32_t globalWorkSizeZ, uint32_t blockDimX, uint32_t blockDimY, uint32_t blockDimZ, uint32_t sharedMemBytes, hip::Stream* stream, void** kernelParams, void** extra, hipEvent_t startEvent = nullptr, hipEvent_t stopEvent = nullptr, uint32_t flags = 0, uint32_t params = 0, uint32_t gridId = 0, uint32_t numGrids = 0, uint64_t prevGridSum = 0, uint64_t allGridSum = 0, uint32_t firstDevice = 0) { hip::DeviceFunc* function = hip::DeviceFunc::asFunction(f); amd::Kernel* kernel = function->kernel(); size_t globalWorkOffset[3] = {0}; size_t globalWorkSize[3] = {globalWorkSizeX, globalWorkSizeY, globalWorkSizeZ}; size_t localWorkSize[3] = {blockDimX, blockDimY, blockDimZ}; amd::NDRangeContainer ndrange(3, globalWorkOffset, globalWorkSize, localWorkSize); amd::Command::EventWaitList waitList; address kernargs = nullptr; bool profileNDRange = (startEvent != nullptr || stopEvent != nullptr); // Flag set to 1 signifies that kernel can be launched in anyorder if (flags & hipExtAnyOrderLaunch) { params |= amd::NDRangeKernelCommand::AnyOrderLaunch; } amd::NDRangeKernelCommand* kernelCommand = new amd::NDRangeKernelCommand( *stream, waitList, *kernel, ndrange, sharedMemBytes, params, gridId, numGrids, prevGridSum, allGridSum, firstDevice, profileNDRange); if (!kernelCommand) { return hipErrorOutOfMemory; } // Capture the kernel arguments if (CL_SUCCESS != kernelCommand->captureAndValidate()) { kernelCommand->release(); return hipErrorOutOfMemory; } command = kernelCommand; return hipSuccess; } hipError_t ihipModuleLaunchKernel(hipFunction_t f, uint32_t globalWorkSizeX, uint32_t globalWorkSizeY, uint32_t globalWorkSizeZ, uint32_t blockDimX, uint32_t blockDimY, uint32_t blockDimZ, uint32_t sharedMemBytes, hipStream_t hStream, void** kernelParams, void** extra, hipEvent_t startEvent, hipEvent_t stopEvent, uint32_t flags = 0, uint32_t params = 0, uint32_t gridId = 0, uint32_t numGrids = 0, uint64_t prevGridSum = 0, uint64_t allGridSum = 0, uint32_t firstDevice = 0) { int deviceId = hip::Stream::DeviceId(hStream); HIP_RETURN_ONFAIL(PlatformState::instance().initStatManagedVarDevicePtr(deviceId)); if (f == nullptr) { LogPrintfError("%s", "Function passed is null"); return hipErrorInvalidResourceHandle; } hip::DeviceFunc* function = hip::DeviceFunc::asFunction(f); amd::Kernel* kernel = function->kernel(); amd::ScopedLock lock(function->dflock_); hipError_t status = ihipLaunchKernel_validate( f, globalWorkSizeX, globalWorkSizeY, globalWorkSizeZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, kernelParams, extra, deviceId, params); if (status != hipSuccess) { return status; } // Make sure the app doesn't launch a workgroup bigger than the global size if (globalWorkSizeX < blockDimX) blockDimX = globalWorkSizeX; if (globalWorkSizeY < blockDimY) blockDimY = globalWorkSizeY; if (globalWorkSizeZ < blockDimZ) blockDimZ = globalWorkSizeZ; auto device = g_devices[deviceId]->devices()[0]; // 
Check if it's a uniform kernel and validate dimensions if (kernel->getDeviceKernel(*device)->getUniformWorkGroupSize()) { if (((globalWorkSizeX % blockDimX) != 0) || ((globalWorkSizeY % blockDimY) != 0) || ((globalWorkSizeZ % blockDimZ) != 0)) { return hipErrorInvalidValue; } } amd::Command* command = nullptr; hip::Stream* hip_stream = hip::getStream(hStream); status = ihipLaunchKernelCommand(command, f, globalWorkSizeX, globalWorkSizeY, globalWorkSizeZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, hip_stream, kernelParams, extra, startEvent, stopEvent, flags, params, gridId, numGrids, prevGridSum, allGridSum, firstDevice); if (status != hipSuccess) { return status; } if (startEvent != nullptr) { hip::Event* eStart = reinterpret_cast(startEvent); status = eStart->addMarker(hStream, nullptr, false); if (status != hipSuccess) { return status; } } if (stopEvent != nullptr) { hip::Event* eStop = reinterpret_cast(stopEvent); if (eStop->flags & hipEventDisableSystemFence) { command->setEventScope(amd::Device::kCacheStateIgnore); } else { command->setEventScope(amd::Device::kCacheStateSystem); } // Enqueue Dispatch and bind the stop event command->enqueue(); eStop->BindCommand(*command, false); } else { command->enqueue(); } if (command->status() == CL_INVALID_OPERATION) { command->release(); return hipErrorIllegalState; } command->release(); return hipSuccess; } hipError_t hipModuleLaunchKernel(hipFunction_t f, uint32_t gridDimX, uint32_t gridDimY, uint32_t gridDimZ, uint32_t blockDimX, uint32_t blockDimY, uint32_t blockDimZ, uint32_t sharedMemBytes, hipStream_t hStream, void** kernelParams, void** extra) { HIP_INIT_API(hipModuleLaunchKernel, f, gridDimX, gridDimY, gridDimZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, hStream, kernelParams, extra); if (!hip::isValid(hStream)) { HIP_RETURN(hipErrorInvalidValue); } STREAM_CAPTURE(hipModuleLaunchKernel, hStream, f, gridDimX, gridDimY, gridDimZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, kernelParams, extra); if (gridDimX > std::numeric_limits::max() || gridDimY > std::numeric_limits::max() || gridDimZ > std::numeric_limits::max()) { HIP_RETURN(hipErrorInvalidValue); } size_t globalWorkSizeX = static_cast(gridDimX) * blockDimX; size_t globalWorkSizeY = static_cast(gridDimY) * blockDimY; size_t globalWorkSizeZ = static_cast(gridDimZ) * blockDimZ; if (globalWorkSizeX > std::numeric_limits::max()) { HIP_RETURN(hipErrorInvalidConfiguration); } HIP_RETURN(ihipModuleLaunchKernel( f, static_cast(globalWorkSizeX), static_cast(globalWorkSizeY), static_cast(globalWorkSizeZ), blockDimX, blockDimY, blockDimZ, sharedMemBytes, hStream, kernelParams, extra, nullptr, nullptr)); } hipError_t hipExtModuleLaunchKernel(hipFunction_t f, uint32_t globalWorkSizeX, uint32_t globalWorkSizeY, uint32_t globalWorkSizeZ, uint32_t localWorkSizeX, uint32_t localWorkSizeY, uint32_t localWorkSizeZ, size_t sharedMemBytes, hipStream_t hStream, void** kernelParams, void** extra, hipEvent_t startEvent, hipEvent_t stopEvent, uint32_t flags) { HIP_INIT_API(hipExtModuleLaunchKernel, f, globalWorkSizeX, globalWorkSizeY, globalWorkSizeZ, localWorkSizeX, localWorkSizeY, localWorkSizeZ, sharedMemBytes, hStream, kernelParams, extra, startEvent, stopEvent, flags); if (!hip::isValid(hStream)) { HIP_RETURN(hipErrorInvalidValue); } STREAM_CAPTURE(hipExtModuleLaunchKernel, hStream, f, globalWorkSizeX, globalWorkSizeY, globalWorkSizeZ, localWorkSizeX, localWorkSizeY, localWorkSizeZ, sharedMemBytes, kernelParams, extra, startEvent, stopEvent, flags); 
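  // Illustrative call sketch (not part of this file): unlike hipModuleLaunchKernel,
  // this entry point takes *global* work sizes rather than grid dimensions, and can
  // bracket the dispatch with events for timing. Assumes `func`, `args` and a valid
  // `stream`:
  //
  //   hipEvent_t t0, t1;
  //   hipEventCreate(&t0); hipEventCreate(&t1);
  //   hipExtModuleLaunchKernel(func, 1024, 1, 1,  // 1024 work-items in total
  //                            256, 1, 1,         // i.e. a grid of 4 blocks of 256
  //                            0, stream, args, nullptr, t0, t1, 0);
  //   hipEventSynchronize(t1);
  //   float ms = 0.f;
  //   hipEventElapsedTime(&ms, t0, t1);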
HIP_RETURN(ihipModuleLaunchKernel(f, globalWorkSizeX, globalWorkSizeY, globalWorkSizeZ, localWorkSizeX, localWorkSizeY, localWorkSizeZ, sharedMemBytes, hStream, kernelParams, extra, startEvent, stopEvent, flags)); } hipError_t hipHccModuleLaunchKernel(hipFunction_t f, uint32_t globalWorkSizeX, uint32_t globalWorkSizeY, uint32_t globalWorkSizeZ, uint32_t blockDimX, uint32_t blockDimY, uint32_t blockDimZ, size_t sharedMemBytes, hipStream_t hStream, void** kernelParams, void** extra, hipEvent_t startEvent, hipEvent_t stopEvent) { HIP_INIT_API(hipHccModuleLaunchKernel, f, globalWorkSizeX, globalWorkSizeY, globalWorkSizeZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, hStream, kernelParams, extra, startEvent, stopEvent); HIP_RETURN(ihipModuleLaunchKernel(f, globalWorkSizeX, globalWorkSizeY, globalWorkSizeZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, hStream, kernelParams, extra, startEvent, stopEvent)); } hipError_t hipModuleLaunchKernelExt(hipFunction_t f, uint32_t globalWorkSizeX, uint32_t globalWorkSizeY, uint32_t globalWorkSizeZ, uint32_t blockDimX, uint32_t blockDimY, uint32_t blockDimZ, size_t sharedMemBytes, hipStream_t hStream, void** kernelParams, void** extra, hipEvent_t startEvent, hipEvent_t stopEvent) { HIP_INIT_API(hipModuleLaunchKernelExt, f, globalWorkSizeX, globalWorkSizeY, globalWorkSizeZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, hStream, kernelParams, extra, startEvent, stopEvent); HIP_RETURN(ihipModuleLaunchKernel(f, globalWorkSizeX, globalWorkSizeY, globalWorkSizeZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, hStream, kernelParams, extra, startEvent, stopEvent)); } hipError_t hipModuleLaunchCooperativeKernel(hipFunction_t f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, hipStream_t stream, void** kernelParams) { HIP_INIT_API(hipModuleLaunchCooperativeKernel, f, gridDimX, gridDimY, gridDimZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, stream, kernelParams); if (!hip::isValid(stream)) { HIP_RETURN(hipErrorInvalidValue); } size_t globalWorkSizeX = static_cast(gridDimX) * blockDimX; size_t globalWorkSizeY = static_cast(gridDimY) * blockDimY; size_t globalWorkSizeZ = static_cast(gridDimZ) * blockDimZ; if (globalWorkSizeX > std::numeric_limits::max() || globalWorkSizeY > std::numeric_limits::max() || globalWorkSizeZ > std::numeric_limits::max()) { HIP_RETURN(hipErrorInvalidConfiguration); } HIP_RETURN(ihipModuleLaunchKernel(f, static_cast(globalWorkSizeX), static_cast(globalWorkSizeY), static_cast(globalWorkSizeZ), blockDimX, blockDimY, blockDimZ, sharedMemBytes, stream, kernelParams, nullptr, nullptr, nullptr, 0, amd::NDRangeKernelCommand::CooperativeGroups)); } hipError_t ihipModuleLaunchCooperativeKernelMultiDevice(hipFunctionLaunchParams* launchParamsList, unsigned int numDevices, unsigned int flags, uint32_t extFlags) { int numActiveGPUs = 0; hipError_t result = hipSuccess; result = ihipDeviceGetCount(&numActiveGPUs); if ((numDevices == 0) || (numDevices > numActiveGPUs)) { return hipErrorInvalidValue; } if (flags > (hipCooperativeLaunchMultiDeviceNoPostSync + hipCooperativeLaunchMultiDeviceNoPreSync)) { return hipErrorInvalidValue; } uint64_t allGridSize = 0; std::vector mgpu_list(numDevices); for (int i = 0; i < numDevices; ++i) { uint32_t blockDims = 0; const hipFunctionLaunchParams& launch = launchParamsList[i]; blockDims = launch.blockDimX * launch.blockDimY * launch.blockDimZ; allGridSize += launch.gridDimX * 
launch.gridDimY * launch.gridDimZ * blockDims; // Make sure block dimensions are valid if (0 == blockDims) { return hipErrorInvalidConfiguration; } if (launch.hStream != nullptr) { // Validate devices to make sure it dosn't have duplicates hip::Stream* hip_stream = reinterpret_cast(launch.hStream); auto device = &hip_stream->vdev()->device(); for (int j = 0; j < numDevices; ++j) { if (mgpu_list[j] == device) { return hipErrorInvalidDevice; } } mgpu_list[i] = device; } else { return hipErrorInvalidResourceHandle; } } uint64_t prevGridSize = 0; uint32_t firstDevice = 0; // Sync the execution streams on all devices if ((flags & hipCooperativeLaunchMultiDeviceNoPreSync) == 0) { for (int i = 0; i < numDevices; ++i) { hip::Stream* hip_stream = reinterpret_cast(launchParamsList[i].hStream); hip_stream->finish(); } } for (int i = 0; i < numDevices; ++i) { const hipFunctionLaunchParams& launch = launchParamsList[i]; hip::Stream* hip_stream = reinterpret_cast(launch.hStream); if (i == 0) { // The order of devices in the launch may not match the order in the global array for (size_t dev = 0; dev < g_devices.size(); ++dev) { // Find the matching device if (&hip_stream->vdev()->device() == g_devices[dev]->devices()[0]) { // Save ROCclr index of the first device in the launch firstDevice = hip_stream->vdev()->device().index(); break; } } } size_t globalWorkSizeX = static_cast(launch.gridDimX) * launch.blockDimX; size_t globalWorkSizeY = static_cast(launch.gridDimY) * launch.blockDimY; size_t globalWorkSizeZ = static_cast(launch.gridDimZ) * launch.blockDimZ; if (globalWorkSizeX > std::numeric_limits::max() || globalWorkSizeY > std::numeric_limits::max() || globalWorkSizeZ > std::numeric_limits::max()) { return hipErrorInvalidConfiguration; } result = ihipModuleLaunchKernel( launch.function, static_cast(globalWorkSizeX), static_cast(globalWorkSizeY), static_cast(globalWorkSizeZ), launch.blockDimX, launch.blockDimY, launch.blockDimZ, launch.sharedMemBytes, launch.hStream, launch.kernelParams, nullptr, nullptr, nullptr, flags, extFlags, i, numDevices, prevGridSize, allGridSize, firstDevice); if (result != hipSuccess) { break; } prevGridSize += globalWorkSizeX * globalWorkSizeY * globalWorkSizeZ; } // Sync the execution streams on all devices if ((flags & hipCooperativeLaunchMultiDeviceNoPostSync) == 0) { for (int i = 0; i < numDevices; ++i) { hip::Stream* hip_stream = reinterpret_cast(launchParamsList[i].hStream); hip_stream->finish(); } } return result; } hipError_t hipModuleLaunchCooperativeKernelMultiDevice(hipFunctionLaunchParams* launchParamsList, unsigned int numDevices, unsigned int flags) { HIP_INIT_API(hipModuleLaunchCooperativeKernelMultiDevice, launchParamsList, numDevices, flags); if (launchParamsList == nullptr) { HIP_RETURN(hipErrorInvalidValue); } // Validate all streams passed by user for (int i = 0; i < numDevices; ++i) { if (!hip::isValid(launchParamsList[i].hStream)) { HIP_RETURN(hipErrorInvalidValue); } } HIP_RETURN(ihipModuleLaunchCooperativeKernelMultiDevice( launchParamsList, numDevices, flags, (amd::NDRangeKernelCommand::CooperativeGroups | amd::NDRangeKernelCommand::CooperativeMultiDeviceGroups))); } extern "C" hipError_t hipLaunchKernel_common(const void* hostFunction, dim3 gridDim, dim3 blockDim, void** args, size_t sharedMemBytes, hipStream_t stream) { STREAM_CAPTURE(hipLaunchKernel, stream, hostFunction, gridDim, blockDim, args, sharedMemBytes); return ihipLaunchKernel(hostFunction, gridDim, blockDim, args, sharedMemBytes, stream, nullptr, nullptr, 0); } extern "C" hipError_t 
hipLaunchKernel(const void* hostFunction, dim3 gridDim, dim3 blockDim, void** args, size_t sharedMemBytes, hipStream_t stream) { HIP_INIT_API(hipLaunchKernel, hostFunction, gridDim, blockDim, args, sharedMemBytes, stream); HIP_RETURN(hipLaunchKernel_common(hostFunction, gridDim, blockDim, args, sharedMemBytes, stream)); } extern "C" hipError_t hipLaunchKernel_spt(const void* hostFunction, dim3 gridDim, dim3 blockDim, void** args, size_t sharedMemBytes, hipStream_t stream) { HIP_INIT_API(hipLaunchKernel, hostFunction, gridDim, blockDim, args, sharedMemBytes, stream); PER_THREAD_DEFAULT_STREAM(stream); HIP_RETURN(hipLaunchKernel_common(hostFunction, gridDim, blockDim, args, sharedMemBytes, stream)); } extern "C" hipError_t hipExtLaunchKernel(const void* hostFunction, dim3 gridDim, dim3 blockDim, void** args, size_t sharedMemBytes, hipStream_t stream, hipEvent_t startEvent, hipEvent_t stopEvent, int flags) { HIP_INIT_API(hipExtLaunchKernel, hostFunction, gridDim, blockDim, args, sharedMemBytes, stream, startEvent, stopEvent, flags); if (!hip::isValid(stream) || !hip::isValid(startEvent) || !hip::isValid(stopEvent)) { HIP_RETURN(hipErrorInvalidValue); } STREAM_CAPTURE(hipExtLaunchKernel, stream, hostFunction, gridDim, blockDim, args, sharedMemBytes, startEvent, stopEvent, flags); HIP_RETURN(ihipLaunchKernel(hostFunction, gridDim, blockDim, args, sharedMemBytes, stream, startEvent, stopEvent, flags)); } hipError_t hipLaunchCooperativeKernel_common(const void* f, dim3 gridDim, dim3 blockDim, void** kernelParams, uint32_t sharedMemBytes, hipStream_t hStream) { if (!hip::isValid(hStream)) { return hipErrorInvalidValue; } hipFunction_t func = nullptr; int deviceId = hip::Stream::DeviceId(hStream); hipError_t getStatFuncError = PlatformState::instance().getStatFunc(&func, f, deviceId); if (getStatFuncError != hipSuccess) { return getStatFuncError; } const amd::Device* device = g_devices[deviceId]->devices()[0]; size_t globalWorkSizeX = static_cast(gridDim.x) * blockDim.x; size_t globalWorkSizeY = static_cast(gridDim.y) * blockDim.y; size_t globalWorkSizeZ = static_cast(gridDim.z) * blockDim.z; if (globalWorkSizeX > std::numeric_limits::max() || globalWorkSizeY > std::numeric_limits::max() || globalWorkSizeZ > std::numeric_limits::max() || (blockDim.x * blockDim.y * blockDim.z > device->info().maxWorkGroupSize_)) { return hipErrorInvalidConfiguration; } return ihipModuleLaunchKernel(func, static_cast(globalWorkSizeX), static_cast(globalWorkSizeY), static_cast(globalWorkSizeZ), blockDim.x, blockDim.y, blockDim.z, sharedMemBytes, hStream, kernelParams, nullptr, nullptr, nullptr, 0, amd::NDRangeKernelCommand::CooperativeGroups); } hipError_t hipLaunchCooperativeKernel(const void* f, dim3 gridDim, dim3 blockDim, void** kernelParams, uint32_t sharedMemBytes, hipStream_t hStream) { HIP_INIT_API(hipLaunchCooperativeKernel, f, gridDim, blockDim, sharedMemBytes, hStream); HIP_RETURN(hipLaunchCooperativeKernel_common(f, gridDim, blockDim, kernelParams, sharedMemBytes, hStream)); } hipError_t hipLaunchCooperativeKernel_spt(const void* f, dim3 gridDim, dim3 blockDim, void** kernelParams, uint32_t sharedMemBytes, hipStream_t hStream) { HIP_INIT_API(hipLaunchCooperativeKernel, f, gridDim, blockDim, sharedMemBytes, hStream); PER_THREAD_DEFAULT_STREAM(hStream); HIP_RETURN(hipLaunchCooperativeKernel_common(f, gridDim, blockDim, kernelParams, sharedMemBytes, hStream)); } hipError_t ihipLaunchCooperativeKernelMultiDevice(hipLaunchParams* launchParamsList, int numDevices, unsigned int flags, uint32_t extFlags) { if 
(launchParamsList == nullptr) { return hipErrorInvalidValue; } std::vector functionLaunchParamsList(numDevices); // Convert hipLaunchParams to hipFunctionLaunchParams for (int i = 0; i < numDevices; ++i) { hipLaunchParams& launch = launchParamsList[i]; // Validate stream passed by user if (!hip::isValid(launch.stream)) { return hipErrorInvalidValue; } hip::Stream* hip_stream = hip::getStream(launch.stream); hipFunction_t func = nullptr; // The order of devices in the launch may not match the order in the global array for (size_t dev = 0; dev < g_devices.size(); ++dev) { // Find the matching device and request the kernel function if (&hip_stream->vdev()->device() == g_devices[dev]->devices()[0]) { IHIP_RETURN_ONFAIL(PlatformState::instance().getStatFunc(&func, launch.func, dev)); break; } } if (func == nullptr) { return hipErrorInvalidDeviceFunction; } functionLaunchParamsList[i].function = func; functionLaunchParamsList[i].gridDimX = launch.gridDim.x; functionLaunchParamsList[i].gridDimY = launch.gridDim.y; functionLaunchParamsList[i].gridDimZ = launch.gridDim.z; functionLaunchParamsList[i].blockDimX = launch.blockDim.x; functionLaunchParamsList[i].blockDimY = launch.blockDim.y; functionLaunchParamsList[i].blockDimZ = launch.blockDim.z; functionLaunchParamsList[i].sharedMemBytes = launch.sharedMem; functionLaunchParamsList[i].hStream = launch.stream; functionLaunchParamsList[i].kernelParams = launch.args; } return ihipModuleLaunchCooperativeKernelMultiDevice(functionLaunchParamsList.data(), functionLaunchParamsList.size(), flags, extFlags); } hipError_t hipLaunchCooperativeKernelMultiDevice(hipLaunchParams* launchParamsList, int numDevices, unsigned int flags) { HIP_INIT_API(hipLaunchCooperativeKernelMultiDevice, launchParamsList, numDevices, flags); HIP_RETURN(ihipLaunchCooperativeKernelMultiDevice( launchParamsList, numDevices, flags, (amd::NDRangeKernelCommand::CooperativeGroups | amd::NDRangeKernelCommand::CooperativeMultiDeviceGroups))); } hipError_t hipExtLaunchMultiKernelMultiDevice(hipLaunchParams* launchParamsList, int numDevices, unsigned int flags) { HIP_INIT_API(hipExtLaunchMultiKernelMultiDevice, launchParamsList, numDevices, flags); HIP_RETURN(ihipLaunchCooperativeKernelMultiDevice(launchParamsList, numDevices, flags, 0)); } hipError_t hipModuleGetTexRef(textureReference** texRef, hipModule_t hmod, const char* name) { HIP_INIT_API(hipModuleGetTexRef, texRef, hmod, name); /* input args check */ if ((texRef == nullptr) || (name == nullptr) || (strlen(name) == 0)) { HIP_RETURN(hipErrorInvalidValue); } if (hmod == nullptr) { HIP_RETURN(hipErrorInvalidResourceHandle); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); HIP_RETURN(hipErrorNotSupported); } /* Get address and size for the global symbol */ if (hipSuccess != PlatformState::instance().getDynTexRef(name, hmod, texRef)) { LogPrintfError("Cannot get texRef for name: %s at module:0x%x \n", name, hmod); HIP_RETURN(hipErrorNotFound); } // Texture references created by HIP driver API // have the default read mode set to normalized float. 
// have format set to format float // set num of channels to 1 (*texRef)->readMode = hipReadModeNormalizedFloat; (*texRef)->format = HIP_AD_FORMAT_FLOAT; (*texRef)->numChannels = 1; hipError_t err = PlatformState::instance().registerTexRef(*texRef, hmod, std::string(name)); HIP_RETURN(err); } clr-rocm-5.7.1/hipamd/src/hip_peer.cpp000066400000000000000000000223271450307266000176030ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include "hip_internal.hpp" hipError_t hipDeviceCanAccessPeer(int* canAccessPeer, hipCtx_t thisCtx, hipCtx_t peerCtx) { HIP_INIT_API(NONE, canAccessPeer, thisCtx, peerCtx); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } hipError_t hipMemcpyPeer(void* dst, hipCtx_t dstCtx, const void* src, hipCtx_t srcCtx, size_t sizeBytes) { HIP_INIT_API(NONE, dst, dstCtx, src, srcCtx, sizeBytes); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } hipError_t hipMemcpyPeerAsync(void* dst, hipCtx_t dstDevice, const void* src, hipCtx_t srcDevice, size_t sizeBytes, hipStream_t stream) { HIP_INIT_API(NONE, dst, dstDevice, src, srcDevice, sizeBytes, stream); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } hipError_t canAccessPeer(int* canAccessPeer, int deviceId, int peerDeviceId){ amd::Device* device = nullptr; amd::Device* peer_device = nullptr; if (canAccessPeer == nullptr) { return hipErrorInvalidValue; } /* Peer cannot be self */ if (deviceId == peerDeviceId) { *canAccessPeer = 0; return hipSuccess; } /* Cannot exceed the max number of devices */ if (static_cast(deviceId) >= g_devices.size() || static_cast(peerDeviceId) >= g_devices.size()) { return hipErrorInvalidDevice; } device = g_devices[deviceId]->devices()[0]; peer_device = g_devices[peerDeviceId]->devices()[0]; *canAccessPeer = static_cast(std::find(device->p2pDevices_.begin(), device->p2pDevices_.end(), as_cl(peer_device)) != device->p2pDevices_.end()); return hipSuccess; } hipError_t findLinkInfo(int device1, int device2, std::vector* link_attrs) { amd::Device* amd_dev_obj1 = nullptr; amd::Device* amd_dev_obj2 = nullptr; const int numDevices = static_cast(g_devices.size()); if ((device1 < 0) || (device1 >= numDevices) || (device2 < 0) || (device2 >= numDevices)) { return hipErrorInvalidDevice; } amd_dev_obj1 = g_devices[device1]->devices()[0]; amd_dev_obj2 = g_devices[device2]->devices()[0]; if (!amd_dev_obj1->findLinkInfo(*amd_dev_obj2, link_attrs)) { return hipErrorInvalidHandle; } return hipSuccess; } hipError_t 
hipExtGetLinkTypeAndHopCount(int device1, int device2, uint32_t* linktype, uint32_t* hopcount) { HIP_INIT_API(hipExtGetLinkTypeAndHopCount, device1, device2, linktype, hopcount); if (linktype == nullptr || hopcount == nullptr || device1 == device2 || device1 < 0 || device2 < 0) { HIP_RETURN(hipErrorInvalidValue); } // Fill out the list of LinkAttributes std::vector link_attrs; link_attrs.push_back(std::make_pair(amd::Device::LinkAttribute::kLinkLinkType, 0)); link_attrs.push_back(std::make_pair(amd::Device::LinkAttribute::kLinkHopCount, 0)); HIP_RETURN_ONFAIL(findLinkInfo(device1, device2, &link_attrs)); *linktype = static_cast(link_attrs[0].second); *hopcount = static_cast(link_attrs[1].second); HIP_RETURN(hipSuccess); } hipError_t hipDeviceGetP2PAttribute(int* value, hipDeviceP2PAttr attr, int srcDevice, int dstDevice) { HIP_INIT_API(hipDeviceGetP2PAttribute, value, attr, srcDevice, dstDevice); if (value == nullptr) { HIP_RETURN(hipErrorInvalidValue); } if (srcDevice == dstDevice || srcDevice >= static_cast(g_devices.size()) || dstDevice >= static_cast(g_devices.size())) { HIP_RETURN(hipErrorInvalidDevice); } std::vector link_attrs; switch (attr) { case hipDevP2PAttrPerformanceRank : { link_attrs.push_back(std::make_pair(amd::Device::LinkAttribute::kLinkLinkType, 0)); break; } case hipDevP2PAttrAccessSupported : { HIP_RETURN_ONFAIL(canAccessPeer(value, srcDevice, dstDevice)); break; } case hipDevP2PAttrNativeAtomicSupported : { link_attrs.push_back(std::make_pair(amd::Device::LinkAttribute::kLinkAtomicSupport, 0)); break; } case hipDevP2PAttrHipArrayAccessSupported : { hipDeviceProp_t srcDeviceProp; hipDeviceProp_t dstDeviceProp; HIP_RETURN_ONFAIL(hipGetDeviceProperties(&srcDeviceProp, srcDevice)); HIP_RETURN_ONFAIL(hipGetDeviceProperties(&dstDeviceProp, dstDevice)); // Linear layout access is supported if P2P is enabled // Opaque Images are supported only on homogeneous systems // Might have more conditions to check, in future. 
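      // Illustrative peer-access sketch (not part of this file): typical client-side
      // use of the queries implemented in this file. Assumes two visible devices and
      // hypothetical buffers dst_on_0/src_on_1 of `nbytes` each:
      //
      //   int can = 0;
      //   hipDeviceCanAccessPeer(&can, 0 /*device*/, 1 /*peer*/);
      //   if (can) {
      //     hipSetDevice(0);
      //     hipDeviceEnablePeerAccess(1, 0 /*flags, must be 0*/);
      //     hipMemcpyPeer(dst_on_0, 0, src_on_1, 1, nbytes);
      //   }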
if (srcDeviceProp.gcnArch == dstDeviceProp.gcnArch) { HIP_RETURN_ONFAIL(canAccessPeer(value, srcDevice, dstDevice)); } else { *value = 0; } break; } default : { LogPrintfError("Invalid attribute attr: %d ", attr); HIP_RETURN(hipErrorInvalidValue); } } if ((attr != hipDevP2PAttrAccessSupported) && (attr != hipDevP2PAttrHipArrayAccessSupported)) { HIP_RETURN_ONFAIL(findLinkInfo(srcDevice, dstDevice, &link_attrs)); *value = static_cast(link_attrs[0].second); } HIP_RETURN(hipSuccess); } hipError_t hipDeviceCanAccessPeer(int* canAccess, int deviceId, int peerDeviceId) { HIP_INIT_API(hipDeviceCanAccessPeer, canAccess, deviceId, peerDeviceId); HIP_RETURN(canAccessPeer(canAccess, deviceId, peerDeviceId)); } hipError_t hipDeviceDisablePeerAccess(int peerDeviceId) { HIP_INIT_API(hipDeviceDisablePeerAccess, peerDeviceId); int deviceId = hip::getCurrentDevice()->deviceId(); int canAccess = 0; if ((hipSuccess != canAccessPeer(&canAccess, deviceId, peerDeviceId)) || (canAccess == 0)) { HIP_RETURN(hipErrorInvalidDevice); } amd::Device* device = g_devices[deviceId]->devices()[0]; amd::Device* peer_device = g_devices[peerDeviceId]->devices()[0]; peer_device->disableP2P(device); HIP_RETURN(hip::getCurrentDevice()->DisablePeerAccess(peerDeviceId)); } hipError_t hipDeviceEnablePeerAccess(int peerDeviceId, unsigned int flags) { HIP_INIT_API(hipDeviceEnablePeerAccess, peerDeviceId, flags); int deviceId = hip::getCurrentDevice()->deviceId(); int canAccess = 0; if (flags != 0) { HIP_RETURN(hipErrorInvalidValue); } if ((hipSuccess != canAccessPeer(&canAccess, deviceId, peerDeviceId)) || (canAccess == 0)) { HIP_RETURN(hipErrorInvalidDevice); } amd::Device* device = g_devices[deviceId]->asContext()->devices()[0]; amd::Device* peer_device = g_devices[peerDeviceId]->asContext()->devices()[0]; peer_device->enableP2P(device); HIP_RETURN(hip::getCurrentDevice()->EnablePeerAccess(peerDeviceId)); } hipError_t hipMemcpyPeer(void* dst, int dstDevice, const void* src, int srcDevice, size_t sizeBytes) { HIP_INIT_API(hipMemcpyPeer, dst, dstDevice, src, srcDevice, sizeBytes); if (srcDevice >= static_cast(g_devices.size()) || dstDevice >= static_cast(g_devices.size()) || srcDevice < 0 || dstDevice < 0) { HIP_RETURN(hipErrorInvalidDevice); } HIP_RETURN(ihipMemcpy(dst, src, sizeBytes, hipMemcpyDeviceToDevice, *hip::getNullStream(), true, false)); } hipError_t hipMemcpyPeerAsync(void* dst, int dstDevice, const void* src, int srcDevice, size_t sizeBytes, hipStream_t stream) { HIP_INIT_API(hipMemcpyPeerAsync, dst, dstDevice, src, srcDevice, sizeBytes, stream); if (srcDevice >= static_cast(g_devices.size()) || dstDevice >= static_cast(g_devices.size()) || srcDevice < 0 || dstDevice < 0) { HIP_RETURN(hipErrorInvalidDevice); } if (!hip::isValid(stream)) { return hipErrorContextIsDestroyed; } hip::Stream* hip_stream = hip::getStream(stream); if (hip_stream == nullptr) { return hipErrorInvalidValue; } HIP_RETURN(ihipMemcpy(dst, src, sizeBytes, hipMemcpyDeviceToDevice, *hip_stream, true, true)); } hipError_t hipCtxEnablePeerAccess(hipCtx_t peerCtx, unsigned int flags) { HIP_INIT_API(hipCtxEnablePeerAccess, peerCtx, flags); HIP_RETURN(hipSuccess); } hipError_t hipCtxDisablePeerAccess(hipCtx_t peerCtx) { HIP_INIT_API(hipCtxDisablePeerAccess, peerCtx); HIP_RETURN(hipSuccess); } clr-rocm-5.7.1/hipamd/src/hip_platform.cpp000066400000000000000000001052021450307266000204660ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE. */

#include
#include

#include "hip_platform.hpp"
#include "hip_internal.hpp"
#include "platform/program.hpp"
#include "platform/runtime.hpp"

#include

constexpr unsigned __hipFatMAGIC2 = 0x48495046;  // "HIPF"

PlatformState* PlatformState::platform_;  // Initialized as nullptr by default

// Forward declaration of methods required for __hipRegisterManagedVar
hipError_t ihipMallocManaged(void** ptr, size_t size, unsigned int align = 0);

struct __CudaFatBinaryWrapper {
  unsigned int magic;
  unsigned int version;
  void* binary;
  void* dummy1;
};

hipError_t hipModuleGetGlobal(hipDeviceptr_t* dptr, size_t* bytes, hipModule_t hmod,
                              const char* name);

hipError_t ihipCreateGlobalVarObj(const char* name, hipModule_t hmod, amd::Memory** amd_mem_obj,
                                  hipDeviceptr_t* dptr, size_t* bytes);

extern hipError_t ihipModuleLaunchKernel(
    hipFunction_t f, uint32_t gridDimX, uint32_t gridDimY, uint32_t gridDimZ,
    uint32_t blockDimX, uint32_t blockDimY, uint32_t blockDimZ, uint32_t sharedMemBytes,
    hipStream_t hStream, void** kernelParams, void** extra, hipEvent_t startEvent,
    hipEvent_t stopEvent, uint32_t flags = 0, uint32_t params = 0, uint32_t gridId = 0,
    uint32_t numGrids = 0, uint64_t prevGridSum = 0, uint64_t allGridSum = 0,
    uint32_t firstDevice = 0);

static bool isCompatibleCodeObject(const std::string& codeobj_target_id,
                                   const char* device_name) {
  // Workaround for device name mismatch.
  // Device name may contain feature strings delimited by '+', e.g.
  // gfx900+xnack. Currently HIP-Clang does not include feature strings
  // in code object target id in fat binary. Therefore drop the feature
  // strings from device name before comparing it with code object target id.
  std::string short_name(device_name);
  auto feature_loc = short_name.find('+');
  if (feature_loc != std::string::npos) {
    short_name.erase(feature_loc);
  }
  return codeobj_target_id == short_name;
}

extern "C" hip::FatBinaryInfo** __hipRegisterFatBinary(const void* data) {
  const __CudaFatBinaryWrapper* fbwrapper = reinterpret_cast<const __CudaFatBinaryWrapper*>(data);
  if (fbwrapper->magic != __hipFatMAGIC2 || fbwrapper->version != 1) {
    LogPrintfError("Cannot Register fat binary.
FatMagic: %u version: %u ", fbwrapper->magic, fbwrapper->version);
    return nullptr;
  }

  return PlatformState::instance().addFatBinary(fbwrapper->binary);
}

extern "C" void __hipRegisterFunction(hip::FatBinaryInfo** modules, const void* hostFunction,
                                      char* deviceFunction, const char* deviceName,
                                      unsigned int threadLimit, uint3* tid, uint3* bid,
                                      dim3* blockDim, dim3* gridDim, int* wSize) {
  static int enable_deferred_loading{[]() {
    char* var = getenv("HIP_ENABLE_DEFERRED_LOADING");
    return var ? atoi(var) : 1;
  }()};

  hipError_t hip_error = hipSuccess;
  hip::Function* func = new hip::Function(std::string(deviceName), modules);
  hip_error = PlatformState::instance().registerStatFunction(hostFunction, func);
  guarantee((hip_error == hipSuccess), "Cannot register Static function, error: %d \n", hip_error);

  if (!enable_deferred_loading) {
    HIP_INIT_VOID();
    hipFunction_t hfunc = nullptr;
    for (size_t dev_idx = 0; dev_idx < g_devices.size(); ++dev_idx) {
      hip_error = PlatformState::instance().getStatFunc(&hfunc, hostFunction, dev_idx);
      guarantee((hip_error == hipSuccess), "Cannot retrieve Static function, error: %d \n",
                hip_error);
    }
  }
}

// Registers a device-side global variable.
// For each global variable in device code, there is a corresponding shadow
// global variable in host code. The shadow host variable is used to keep
// track of the value of the device side global variable between kernel
// executions.
extern "C" void __hipRegisterVar(
    hip::FatBinaryInfo** modules,  // The device modules containing code object
    void* var,                     // The shadow variable in host code
    char* hostVar,                 // Variable name in host code
    char* deviceVar,               // Variable name in device code
    int ext,                       // Whether this variable is external
    size_t size,                   // Size of the variable
    int constant,                  // Whether this variable is constant
    int global)                    // Unknown, always 0
{
  hip::Var* var_ptr = new hip::Var(std::string(hostVar), hip::Var::DeviceVarKind::DVK_Variable,
                                   size, 0, 0, modules);
  hipError_t err = PlatformState::instance().registerStatGlobalVar(var, var_ptr);
  guarantee((err == hipSuccess), "Cannot register Static Global Var, error:%d \n", err);
}

extern "C" void __hipRegisterSurface(
    hip::FatBinaryInfo** modules,  // The device modules containing code object
    void* var,                     // The shadow variable in host code
    char* hostVar,                 // Variable name in host code
    char* deviceVar,               // Variable name in device code
    int type, int ext) {
  hip::Var* var_ptr = new hip::Var(std::string(hostVar), hip::Var::DeviceVarKind::DVK_Surface,
                                   sizeof(surfaceReference), 0, 0, modules);
  hipError_t err = PlatformState::instance().registerStatGlobalVar(var, var_ptr);
  guarantee((err == hipSuccess), "Cannot register Static Global Var, err:%d \n", err);
}

extern "C" void __hipRegisterManagedVar(
    void* hipModule,   // Pointer to hip module returned from __hipRegisterFatBinary
    void** pointer,    // Pointer to a chunk of managed memory with size \p size and alignment
                       // \p align. The HIP runtime allocates such managed memory and assigns it
                       // to \p pointer
    void* init_value,  // Initial value to be copied into \p pointer
    const char* name,  // Name of the variable in code object
    size_t size, unsigned align) {
  HIP_INIT_VOID();
  hipError_t status = ihipMallocManaged(pointer, size, align);
  if (status == hipSuccess) {
    hip::Stream* stream = hip::getNullStream();
    if (stream != nullptr) {
      status = ihipMemcpy(*pointer, init_value, size, hipMemcpyHostToDevice, *stream);
      guarantee((status == hipSuccess), "Error during memcpy to managed memory, error:%d \n!",
                status);
    } else {
      ClPrint(amd::LOG_ERROR, amd::LOG_API, "Host Queue is
NULL"); } } else { guarantee(false, "Error during allocation of managed memory!, error: %d \n", status); } hip::Var* var_ptr = new hip::Var(std::string(name), hip::Var::DeviceVarKind::DVK_Managed, pointer, size, align, reinterpret_cast(hipModule)); status = PlatformState::instance().registerStatManagedVar(var_ptr); guarantee((status == hipSuccess), "Cannot register Static Managed Var, error: %d \n", status); } extern "C" void __hipRegisterTexture( hip::FatBinaryInfo** modules, // The device modules containing code object void* var, // The shadow variable in host code char* hostVar, // Variable name in host code char* deviceVar, // Variable name in device code int type, int norm, int ext) { hip::Var* var_ptr = new hip::Var(std::string(hostVar), hip::Var::DeviceVarKind::DVK_Texture, sizeof(textureReference), 0, 0, modules); hipError_t err = PlatformState::instance().registerStatGlobalVar(var, var_ptr); guarantee((err == hipSuccess), "Cannot register Static Global Var, status: %d \n", err); } extern "C" void __hipUnregisterFatBinary(hip::FatBinaryInfo** modules) { hipError_t err = PlatformState::instance().removeFatBinary(modules); guarantee((err == hipSuccess), "Cannot Unregister Fat Binary, error:%d \n", err); } extern "C" hipError_t hipConfigureCall(dim3 gridDim, dim3 blockDim, size_t sharedMem, hipStream_t stream) { HIP_INIT_API(hipConfigureCall, gridDim, blockDim, sharedMem, stream); PlatformState::instance().configureCall(gridDim, blockDim, sharedMem, stream); HIP_RETURN(hipSuccess); } extern "C" hipError_t __hipPushCallConfiguration(dim3 gridDim, dim3 blockDim, size_t sharedMem, hipStream_t stream) { HIP_INIT_API(__hipPushCallConfiguration, gridDim, blockDim, sharedMem, stream); PlatformState::instance().configureCall(gridDim, blockDim, sharedMem, stream); HIP_RETURN(hipSuccess); } extern "C" hipError_t __hipPopCallConfiguration(dim3* gridDim, dim3* blockDim, size_t* sharedMem, hipStream_t* stream) { HIP_INIT_API(__hipPopCallConfiguration, gridDim, blockDim, sharedMem, stream); ihipExec_t exec; PlatformState::instance().popExec(exec); *gridDim = exec.gridDim_; *blockDim = exec.blockDim_; *sharedMem = exec.sharedMem_; *stream = exec.hStream_; HIP_RETURN(hipSuccess); } extern "C" hipError_t hipSetupArgument(const void* arg, size_t size, size_t offset) { HIP_INIT_API(hipSetupArgument, arg, size, offset); PlatformState::instance().setupArgument(arg, size, offset); HIP_RETURN(hipSuccess); } extern "C" hipError_t hipLaunchByPtr(const void* hostFunction) { HIP_INIT_API(hipLaunchByPtr, hostFunction); ihipExec_t exec; PlatformState::instance().popExec(exec); hip::Stream* stream = reinterpret_cast(exec.hStream_); int deviceId = (stream != nullptr) ? 
stream->DeviceId() : ihipGetDevice(); if (deviceId == -1) { LogPrintfError("Wrong DeviceId: %d \n", deviceId); HIP_RETURN(hipErrorNoDevice); } hipFunction_t func = nullptr; hipError_t hip_error = PlatformState::instance().getStatFunc(&func, hostFunction, deviceId); if ((hip_error != hipSuccess) || (func == nullptr)) { LogPrintfError("Could not retrieve hostFunction: 0x%x \n", hostFunction); HIP_RETURN(hipErrorInvalidDeviceFunction); } size_t size = exec.arguments_.size(); void* extra[] = {HIP_LAUNCH_PARAM_BUFFER_POINTER, &exec.arguments_[0], HIP_LAUNCH_PARAM_BUFFER_SIZE, &size, HIP_LAUNCH_PARAM_END}; HIP_RETURN(hipModuleLaunchKernel(func, exec.gridDim_.x, exec.gridDim_.y, exec.gridDim_.z, exec.blockDim_.x, exec.blockDim_.y, exec.blockDim_.z, exec.sharedMem_, exec.hStream_, nullptr, extra)); } hipError_t hipGetSymbolAddress(void** devPtr, const void* symbol) { HIP_INIT_API(hipGetSymbolAddress, devPtr, symbol); hipError_t hip_error = hipSuccess; if (devPtr == nullptr) { HIP_RETURN(hipErrorInvalidValue); } size_t sym_size = 0; HIP_RETURN_ONFAIL( PlatformState::instance().getStatGlobalVar(symbol, ihipGetDevice(), devPtr, &sym_size)); HIP_RETURN(hipSuccess, *devPtr); } hipError_t hipGetSymbolSize(size_t* sizePtr, const void* symbol) { HIP_INIT_API(hipGetSymbolSize, sizePtr, symbol); if (sizePtr == nullptr) { HIP_RETURN(hipErrorInvalidValue); } hipDeviceptr_t device_ptr = nullptr; HIP_RETURN_ONFAIL( PlatformState::instance().getStatGlobalVar(symbol, ihipGetDevice(), &device_ptr, sizePtr)); HIP_RETURN(hipSuccess, *sizePtr); } hipError_t ihipCreateGlobalVarObj(const char* name, hipModule_t hmod, amd::Memory** amd_mem_obj, hipDeviceptr_t* dptr, size_t* bytes) { /* Get Device Program pointer*/ amd::Program* program = as_amd(reinterpret_cast(hmod)); device::Program* dev_program = program->getDeviceProgram(*hip::getCurrentDevice()->devices()[0]); if (dev_program == nullptr) { LogPrintfError("Cannot get Device Function for module: 0x%x \n", hmod); HIP_RETURN(hipErrorInvalidDeviceFunction); } /* Find the global Symbols */ if (!dev_program->createGlobalVarObj(amd_mem_obj, dptr, bytes, name)) { LogPrintfError("Cannot create Global Var obj for symbol: %s \n", name); HIP_RETURN(hipErrorInvalidSymbol); } HIP_RETURN(hipSuccess); } namespace hip_impl { hipError_t ihipOccupancyMaxActiveBlocksPerMultiprocessor( int* maxBlocksPerCU, int* numBlocksPerGrid, int* bestBlockSize, const amd::Device& device, hipFunction_t func, int inputBlockSize, size_t dynamicSMemSize, bool bCalcPotentialBlkSz) { hip::DeviceFunc* function = hip::DeviceFunc::asFunction(func); const amd::Kernel& kernel = *function->kernel(); const device::Kernel::WorkGroupInfo* wrkGrpInfo = kernel.getDeviceKernel(device)->workGroupInfo(); if (bCalcPotentialBlkSz == false) { if (inputBlockSize <= 0) { return hipErrorInvalidValue; } *bestBlockSize = 0; // Make sure the requested block size is smaller than max supported if (inputBlockSize > int(device.info().maxWorkGroupSize_)) { *maxBlocksPerCU = 0; *numBlocksPerGrid = 0; return hipSuccess; } } else { if (inputBlockSize > int(device.info().maxWorkGroupSize_) || inputBlockSize <= 0) { // The user wrote the kernel to work with a workgroup size // bigger than this hardware can support. 
Or they do not care // about the size So just assume its maximum size is // constrained by hardware inputBlockSize = device.info().maxWorkGroupSize_; } } // Find wave occupancy per CU => simd_per_cu * GPR usage size_t MaxWavesPerSimd; if (device.isa().versionMajor() <= 9) { MaxWavesPerSimd = 8; // Limited by SPI 32 per CU, hence 8 per SIMD } else { MaxWavesPerSimd = 16; } size_t VgprWaves = MaxWavesPerSimd; uint32_t VgprGranularity = device.info().vgprAllocGranularity_; size_t maxVGPRs = device.info().vgprsPerSimd_; size_t wavefrontSize = wrkGrpInfo->wavefrontSize_; if (device.isa().versionMajor() >= 10) { if (wavefrontSize == 64) { maxVGPRs = maxVGPRs >> 1; VgprGranularity = VgprGranularity >> 1; } } if (wrkGrpInfo->usedVGPRs_ > 0) { VgprWaves = maxVGPRs / amd::alignUp(wrkGrpInfo->usedVGPRs_, VgprGranularity); } size_t GprWaves = VgprWaves; if (wrkGrpInfo->usedSGPRs_ > 0) { size_t maxSGPRs = device.info().sgprsPerSimd_; const size_t SgprWaves = maxSGPRs / amd::alignUp(wrkGrpInfo->usedSGPRs_, 16); GprWaves = std::min(VgprWaves, SgprWaves); } uint32_t simdPerCU = (device.isa().versionMajor() <= 9) ? device.info().simdPerCU_ : (wrkGrpInfo->isWGPMode_ ? 4 : 2); const size_t alu_occupancy = simdPerCU * std::min(MaxWavesPerSimd, GprWaves); const int alu_limited_threads = alu_occupancy * wrkGrpInfo->wavefrontSize_; int lds_occupancy_wgs = INT_MAX; const size_t total_used_lds = wrkGrpInfo->usedLDSSize_ + dynamicSMemSize; if (total_used_lds != 0) { lds_occupancy_wgs = static_cast(device.info().localMemSize_ / total_used_lds); } // Calculate how many blocks of inputBlockSize we can fit per CU // Need to align with hardware wavefront size. If they want 65 threads, but // waves are 64, then we need 128 threads per block. // So this calculates how many blocks we can fit. *maxBlocksPerCU = alu_limited_threads / amd::alignUp(inputBlockSize, wrkGrpInfo->wavefrontSize_); // Unless those blocks are further constrained by LDS size. *maxBlocksPerCU = std::min(*maxBlocksPerCU, lds_occupancy_wgs); // Some callers of this function want to return the block size, in threads, that // leads to the maximum occupancy. In that case, inputBlockSize is the maximum // workgroup size the user wants to allow, or that the hardware can allow. // It is either the number of threads that we are limited to due to occupancy, or // the maximum available block size for this kernel, which could have come from the // user. e.g., if the user indicates the maximum block size is 64 threads, but we // calculate that 128 threads can fit in each CU, we have to give up and return 64. *bestBlockSize = std::min(alu_limited_threads, amd::alignUp(inputBlockSize, wrkGrpInfo->wavefrontSize_)); // If the best block size is smaller than the block size used to fit the maximum, // then we need to make the grid bigger for full occupancy. const int bestBlocksPerCU = alu_limited_threads / (*bestBlockSize); // Unless those blocks are further constrained by LDS size. 
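  // (Each resident workgroup needs its static LDS usage plus the dynamic shared memory
  // request, so lds_occupancy_wgs caps how many workgroups can fit on a CU.)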
*numBlocksPerGrid = device.info().maxComputeUnits_ * std::min(bestBlocksPerCU, lds_occupancy_wgs); return hipSuccess; } } // namespace hip_impl extern "C" { hipError_t hipOccupancyMaxPotentialBlockSize(int* gridSize, int* blockSize, const void* f, size_t dynSharedMemPerBlk, int blockSizeLimit) { HIP_INIT_API(hipOccupancyMaxPotentialBlockSize, f, dynSharedMemPerBlk, blockSizeLimit); if ((gridSize == nullptr) || (blockSize == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } hipFunction_t func = nullptr; hipError_t hip_error = PlatformState::instance().getStatFunc(&func, f, ihipGetDevice()); if ((hip_error != hipSuccess) || (func == nullptr)) { HIP_RETURN(hipErrorInvalidDeviceFunction); } const amd::Device& device = *hip::getCurrentDevice()->devices()[0]; int max_blocks_per_grid = 0; int num_blocks = 0; int best_block_size = 0; hipError_t ret = hip_impl::ihipOccupancyMaxActiveBlocksPerMultiprocessor( &num_blocks, &max_blocks_per_grid, &best_block_size, device, func, blockSizeLimit, dynSharedMemPerBlk, true); if (ret == hipSuccess) { *blockSize = best_block_size; *gridSize = max_blocks_per_grid; } HIP_RETURN(ret); } hipError_t hipModuleOccupancyMaxPotentialBlockSize(int* gridSize, int* blockSize, hipFunction_t f, size_t dynSharedMemPerBlk, int blockSizeLimit) { HIP_INIT_API(hipModuleOccupancyMaxPotentialBlockSize, f, dynSharedMemPerBlk, blockSizeLimit); if ((gridSize == nullptr) || (blockSize == nullptr) || (f == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } const amd::Device& device = *hip::getCurrentDevice()->devices()[0]; int max_blocks_per_grid = 0; int num_blocks = 0; int best_block_size = 0; hipError_t ret = hip_impl::ihipOccupancyMaxActiveBlocksPerMultiprocessor( &num_blocks, &max_blocks_per_grid, &best_block_size, device, f, blockSizeLimit, dynSharedMemPerBlk, true); if (ret == hipSuccess) { *blockSize = best_block_size; *gridSize = max_blocks_per_grid; } HIP_RETURN(ret); } hipError_t hipModuleOccupancyMaxPotentialBlockSizeWithFlags(int* gridSize, int* blockSize, hipFunction_t f, size_t dynSharedMemPerBlk, int blockSizeLimit, unsigned int flags) { HIP_INIT_API(hipModuleOccupancyMaxPotentialBlockSizeWithFlags, f, dynSharedMemPerBlk, blockSizeLimit, flags); if ((gridSize == nullptr) || (blockSize == nullptr) || (f == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } if (flags != hipOccupancyDefault && flags != hipOccupancyDisableCachingOverride) { HIP_RETURN(hipErrorInvalidValue); } const amd::Device& device = *hip::getCurrentDevice()->devices()[0]; int max_blocks_per_grid = 0; int num_blocks = 0; int best_block_size = 0; hipError_t ret = hip_impl::ihipOccupancyMaxActiveBlocksPerMultiprocessor( &num_blocks, &max_blocks_per_grid, &best_block_size, device, f, blockSizeLimit, dynSharedMemPerBlk, true); if (ret == hipSuccess) { *blockSize = best_block_size; *gridSize = max_blocks_per_grid; } HIP_RETURN(ret); } hipError_t hipModuleOccupancyMaxActiveBlocksPerMultiprocessor(int* numBlocks, hipFunction_t f, int blockSize, size_t dynSharedMemPerBlk) { HIP_INIT_API(hipModuleOccupancyMaxActiveBlocksPerMultiprocessor, f, blockSize, dynSharedMemPerBlk); if (numBlocks == nullptr || (f == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } const amd::Device& device = *hip::getCurrentDevice()->devices()[0]; int num_blocks = 0; int max_blocks_per_grid = 0; int best_block_size = 0; hipError_t ret = hip_impl::ihipOccupancyMaxActiveBlocksPerMultiprocessor( &num_blocks, &max_blocks_per_grid, &best_block_size, device, f, blockSize, dynSharedMemPerBlk, false); *numBlocks = num_blocks; HIP_RETURN(ret); } hipError_t 
hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(
    int* numBlocks, hipFunction_t f, int blockSize, size_t dynSharedMemPerBlk,
    unsigned int flags) {
  HIP_INIT_API(hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags, f, blockSize,
               dynSharedMemPerBlk, flags);
  if (numBlocks == nullptr || (f == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (flags != hipOccupancyDefault && flags != hipOccupancyDisableCachingOverride) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  const amd::Device& device = *hip::getCurrentDevice()->devices()[0];
  int num_blocks = 0;
  int max_blocks_per_grid = 0;
  int best_block_size = 0;
  hipError_t ret = hip_impl::ihipOccupancyMaxActiveBlocksPerMultiprocessor(
      &num_blocks, &max_blocks_per_grid, &best_block_size, device, f, blockSize,
      dynSharedMemPerBlk, false);
  *numBlocks = num_blocks;
  HIP_RETURN(ret);
}

hipError_t hipOccupancyMaxActiveBlocksPerMultiprocessor(int* numBlocks, const void* f,
                                                        int blockSize, size_t dynamicSMemSize) {
  HIP_INIT_API(hipOccupancyMaxActiveBlocksPerMultiprocessor, f, blockSize, dynamicSMemSize);
  if (numBlocks == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipFunction_t func = nullptr;
  hipError_t hip_error = PlatformState::instance().getStatFunc(&func, f, ihipGetDevice());
  if ((hip_error != hipSuccess) || (func == nullptr)) {
    HIP_RETURN(hipErrorInvalidDeviceFunction);
  }
  const amd::Device& device = *hip::getCurrentDevice()->devices()[0];
  int num_blocks = 0;
  int max_blocks_per_grid = 0;
  int best_block_size = 0;
  hipError_t ret = hip_impl::ihipOccupancyMaxActiveBlocksPerMultiprocessor(
      &num_blocks, &max_blocks_per_grid, &best_block_size, device, func, blockSize,
      dynamicSMemSize, false);
  *numBlocks = num_blocks;
  HIP_RETURN(ret);
}

hipError_t hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(int* numBlocks, const void* f,
                                                                 int blockSize,
                                                                 size_t dynamicSMemSize,
                                                                 unsigned int flags) {
  HIP_INIT_API(hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags, f, blockSize,
               dynamicSMemSize, flags);
  if (numBlocks == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (flags != hipOccupancyDefault && flags != hipOccupancyDisableCachingOverride) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipFunction_t func = nullptr;
  hipError_t hip_error = PlatformState::instance().getStatFunc(&func, f, ihipGetDevice());
  if ((hip_error != hipSuccess) || (func == nullptr)) {
    HIP_RETURN(hipErrorInvalidDeviceFunction);
  }
  const amd::Device& device = *hip::getCurrentDevice()->devices()[0];
  int num_blocks = 0;
  int max_blocks_per_grid = 0;
  int best_block_size = 0;
  hipError_t ret = hip_impl::ihipOccupancyMaxActiveBlocksPerMultiprocessor(
      &num_blocks, &max_blocks_per_grid, &best_block_size, device, func, blockSize,
      dynamicSMemSize, false);
  *numBlocks = num_blocks;
  HIP_RETURN(ret);
}
}

hipError_t ihipLaunchKernel(const void* hostFunction, dim3 gridDim, dim3 blockDim, void** args,
                            size_t sharedMemBytes, hipStream_t stream, hipEvent_t startEvent,
                            hipEvent_t stopEvent, int flags) {
  hipFunction_t func = nullptr;
  int deviceId = hip::Stream::DeviceId(stream);
  hipError_t hip_error = PlatformState::instance().getStatFunc(&func, hostFunction, deviceId);
  if ((hip_error != hipSuccess) || (func == nullptr)) {
    if (hip_error == hipErrorSharedObjectInitFailed) {
      return hip_error;
    } else {
      return hipErrorInvalidDeviceFunction;
    }
  }
  size_t globalWorkSizeX = static_cast<size_t>(gridDim.x) * blockDim.x;
  size_t globalWorkSizeY = static_cast<size_t>(gridDim.y) * blockDim.y;
  size_t globalWorkSizeZ = static_cast<size_t>(gridDim.z) * blockDim.z;
  if (globalWorkSizeX > std::numeric_limits<uint32_t>::max() ||
      globalWorkSizeY > std::numeric_limits<uint32_t>::max()
      || globalWorkSizeZ > std::numeric_limits<uint32_t>::max()) {
    return hipErrorInvalidConfiguration;
  }
  return ihipModuleLaunchKernel(
      func, static_cast<uint32_t>(globalWorkSizeX), static_cast<uint32_t>(globalWorkSizeY),
      static_cast<uint32_t>(globalWorkSizeZ), blockDim.x, blockDim.y, blockDim.z, sharedMemBytes,
      stream, args, nullptr, startEvent, stopEvent, flags);
}

// conversion routines between float and half precision
static inline std::uint32_t f32_as_u32(float f) {
  union { float f; std::uint32_t u; } v;
  v.f = f;
  return v.u;
}

static inline float u32_as_f32(std::uint32_t u) {
  union { float f; std::uint32_t u; } v;
  v.u = u;
  return v.f;
}

static inline int clamp_int(int i, int l, int h) { return std::min(std::max(i, l), h); }

// half float, the f16 is in the low 16 bits of the input argument
static inline float __convert_half_to_float(std::uint32_t a) noexcept {
  std::uint32_t u = ((a << 13) + 0x70000000U) & 0x8fffe000U;
  std::uint32_t v =
      f32_as_u32(u32_as_f32(u) * u32_as_f32(0x77800000U) /*0x1.0p+112f*/) + 0x38000000U;
  u = (a & 0x7fff) != 0 ? v : u;
  return u32_as_f32(u) * u32_as_f32(0x07800000U) /*0x1.0p-112f*/;
}

// float half with nearest even rounding
// The lower 16 bits of the result is the bit pattern for the f16
static inline std::uint32_t __convert_float_to_half(float a) noexcept {
  std::uint32_t u = f32_as_u32(a);
  int e = static_cast<int>((u >> 23) & 0xff) - 127 + 15;
  std::uint32_t m = ((u >> 11) & 0xffe) | ((u & 0xfff) != 0);
  std::uint32_t i = 0x7c00 | (m != 0 ? 0x0200 : 0);
  std::uint32_t n = ((std::uint32_t)e << 12) | m;
  std::uint32_t s = (u >> 16) & 0x8000;
  int b = clamp_int(1 - e, 0, 13);
  std::uint32_t d = (0x1000 | m) >> b;
  d |= (d << b) != (0x1000 | m);
  std::uint32_t v = e < 1 ? d : n;
  v = (v >> 2) + (((v & 0x7) == 3) | ((v & 0x7) > 5));
  v = e > 30 ? 0x7c00 : v;
  v = e == 143 ? i : v;
  return s | v;
}

extern "C"
#if !defined(_MSC_VER)
__attribute__((weak))
#endif
float __gnu_h2f_ieee(unsigned short h) {
  return __convert_half_to_float((std::uint32_t)h);
}

extern "C"
#if !defined(_MSC_VER)
__attribute__((weak))
#endif
unsigned short __gnu_f2h_ieee(float f) {
  return (unsigned short)__convert_float_to_half(f);
}

void PlatformState::init() {
  amd::ScopedLock lock(lock_);
  if (initialized_ || g_devices.empty()) {
    return;
  }
  initialized_ = true;
  for (auto& it : statCO_.modules_) {
    hipError_t err = digestFatBinary(it.first, it.second);
    if (err == hipErrorNoBinaryForGpu) {
      HIP_ERROR_PRINT(err, "continue parsing remaining modules");
    } else if (err != hipSuccess) {
      HIP_ERROR_PRINT(err);
      return;
    }
  }
  for (auto& it : statCO_.vars_) {
    it.second->resize_dVar(g_devices.size());
  }
  for (auto& it : statCO_.functions_) {
    it.second->resize_dFunc(g_devices.size());
  }
}

hipError_t PlatformState::loadModule(hipModule_t* module, const char* fname, const void* image) {
  if (module == nullptr) {
    return hipErrorInvalidValue;
  }
  hip::DynCO* dynCo = new hip::DynCO();
  hipError_t hip_error = dynCo->loadCodeObject(fname, image);
  if (hip_error != hipSuccess) {
    delete dynCo;
    return hip_error;
  }
  *module = dynCo->module();
  assert(*module != nullptr);
  amd::ScopedLock lock(lock_);
  if (dynCO_map_.find(*module) != dynCO_map_.end()) {
    delete dynCo;
    return hipErrorAlreadyMapped;
  }
  dynCO_map_.insert(std::make_pair(*module, dynCo));
  return hipSuccess;
}

hipError_t PlatformState::unloadModule(hipModule_t hmod) {
  amd::ScopedLock lock(lock_);
  auto it = dynCO_map_.find(hmod);
  if (it == dynCO_map_.end()) {
    return hipErrorNotFound;
  }
  delete it->second;
  dynCO_map_.erase(hmod);
  auto tex_it = texRef_map_.begin();
  while (tex_it != texRef_map_.end()) {
    if (tex_it->second.first ==
hmod) { tex_it = texRef_map_.erase(tex_it); } else { ++tex_it; } } return hipSuccess; } hipError_t PlatformState::getDynFunc(hipFunction_t* hfunc, hipModule_t hmod, const char* func_name) { amd::ScopedLock lock(lock_); auto it = dynCO_map_.find(hmod); if (it == dynCO_map_.end()) { LogPrintfError("Cannot find the module: 0x%x", hmod); return hipErrorNotFound; } if (0 == strlen(func_name)) { return hipErrorNotFound; } return it->second->getDynFunc(hfunc, func_name); } hipError_t PlatformState::getDynGlobalVar(const char* hostVar, hipModule_t hmod, hipDeviceptr_t* dev_ptr, size_t* size_ptr) { amd::ScopedLock lock(lock_); if (hostVar == nullptr || dev_ptr == nullptr || size_ptr == nullptr) { return hipErrorInvalidValue; } auto it = dynCO_map_.find(hmod); if (it == dynCO_map_.end()) { LogPrintfError("Cannot find the module: 0x%x", hmod); return hipErrorNotFound; } *dev_ptr = nullptr; IHIP_RETURN_ONFAIL(it->second->getManagedVarPointer(hostVar, dev_ptr, size_ptr)); // if dev_ptr is nullptr, hostvar is not in managed variable list if (*dev_ptr == nullptr) { hip::DeviceVar* dvar = nullptr; IHIP_RETURN_ONFAIL(it->second->getDeviceVar(&dvar, hostVar)); *dev_ptr = dvar->device_ptr(); *size_ptr = dvar->size(); } return hipSuccess; } hipError_t PlatformState::registerTexRef(textureReference* texRef, hipModule_t hmod, std::string name) { amd::ScopedLock lock(lock_); texRef_map_.insert(std::make_pair(texRef, std::make_pair(hmod, name))); return hipSuccess; } hipError_t PlatformState::getDynTexGlobalVar(textureReference* texRef, hipDeviceptr_t* dev_ptr, size_t* size_ptr) { amd::ScopedLock lock(lock_); auto tex_it = texRef_map_.find(texRef); if (tex_it == texRef_map_.end()) { LogPrintfError("Cannot find the texRef Entry: 0x%x", texRef); return hipErrorNotFound; } auto it = dynCO_map_.find(tex_it->second.first); if (it == dynCO_map_.end()) { LogPrintfError("Cannot find the module: 0x%x", tex_it->second.first); return hipErrorNotFound; } hip::DeviceVar* dvar = nullptr; IHIP_RETURN_ONFAIL(it->second->getDeviceVar(&dvar, tex_it->second.second)); *dev_ptr = dvar->device_ptr(); *size_ptr = dvar->size(); return hipSuccess; } hipError_t PlatformState::getDynTexRef(const char* hostVar, hipModule_t hmod, textureReference** texRef) { amd::ScopedLock lock(lock_); auto it = dynCO_map_.find(hmod); if (it == dynCO_map_.end()) { LogPrintfError("Cannot find the module: 0x%x", hmod); return hipErrorNotFound; } hip::DeviceVar* dvar = nullptr; IHIP_RETURN_ONFAIL(it->second->getDeviceVar(&dvar, hostVar)); if (dvar->size() != sizeof(textureReference)) { return hipErrorNotFound; // Any better way to verify texture type? 
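    // The sizeof(textureReference) comparison above is a size-based heuristic:
    // a module variable is accepted as a texture reference only if its shadow
    // variable is exactly the size of textureReference.
    // Illustrative caller path (assumed usage, not part of this file):
    //   hipModule_t mod;
    //   hipModuleLoad(&mod, "kernels.hsaco");        // populates dynCO_map_
    //   textureReference* tex = nullptr;
    //   hipModuleGetTexRef(&tex, mod, "tex_name");   // ends up in getDynTexRef()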
  }
  dvar->shadowVptr = new texture<char>();
  *texRef = reinterpret_cast<textureReference*>(dvar->shadowVptr);
  return hipSuccess;
}

hipError_t PlatformState::digestFatBinary(const void* data, hip::FatBinaryInfo*& programs) {
  return statCO_.digestFatBinary(data, programs);
}

hip::FatBinaryInfo** PlatformState::addFatBinary(const void* data) {
  return statCO_.addFatBinary(data, initialized_);
}

hipError_t PlatformState::removeFatBinary(hip::FatBinaryInfo** module) {
  return statCO_.removeFatBinary(module);
}

hipError_t PlatformState::registerStatFunction(const void* hostFunction, hip::Function* func) {
  return statCO_.registerStatFunction(hostFunction, func);
}

hipError_t PlatformState::registerStatGlobalVar(const void* hostVar, hip::Var* var) {
  return statCO_.registerStatGlobalVar(hostVar, var);
}

hipError_t PlatformState::registerStatManagedVar(hip::Var* var) {
  return statCO_.registerStatManagedVar(var);
}

const char* PlatformState::getStatFuncName(const void* hostFunction) {
  return statCO_.getStatFuncName(hostFunction);
}

hipError_t PlatformState::getStatFunc(hipFunction_t* hfunc, const void* hostFunction,
                                      int deviceId) {
  return statCO_.getStatFunc(hfunc, hostFunction, deviceId);
}

hipError_t PlatformState::getStatFuncAttr(hipFuncAttributes* func_attr, const void* hostFunction,
                                          int deviceId) {
  if (func_attr == nullptr) {
    return hipErrorInvalidValue;
  }
  if (hostFunction == nullptr) {
    return hipErrorInvalidDeviceFunction;
  }
  return statCO_.getStatFuncAttr(func_attr, hostFunction, deviceId);
}

hipError_t PlatformState::getStatGlobalVar(const void* hostVar, int deviceId,
                                           hipDeviceptr_t* dev_ptr, size_t* size_ptr) {
  return statCO_.getStatGlobalVar(hostVar, deviceId, dev_ptr, size_ptr);
}

hipError_t PlatformState::initStatManagedVarDevicePtr(int deviceId) {
  return statCO_.initStatManagedVarDevicePtr(deviceId);
}

void PlatformState::setupArgument(const void* arg, size_t size, size_t offset) {
  auto& arguments = hip::tls.exec_stack_.top().arguments_;
  if (arguments.size() < offset + size) {
    arguments.resize(offset + size);
  }
  ::memcpy(&arguments[offset], arg, size);
}

void PlatformState::configureCall(dim3 gridDim, dim3 blockDim, size_t sharedMem,
                                  hipStream_t stream) {
  hip::tls.exec_stack_.push(ihipExec_t{gridDim, blockDim, sharedMem, stream});
}

void PlatformState::popExec(ihipExec_t& exec) {
  exec = std::move(hip::tls.exec_stack_.top());
  hip::tls.exec_stack_.pop();
}
clr-rocm-5.7.1/hipamd/src/hip_platform.hpp000066400000000000000000000100731450307266000204740ustar00rootroot00000000000000
/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE. */

#pragma once

#include "hip_internal.hpp"
#include "hip_fatbin.hpp"
#include "device/device.hpp"
#include "hip_code_object.hpp"

namespace hip_impl {
hipError_t ihipOccupancyMaxActiveBlocksPerMultiprocessor(
    int* maxBlocksPerCU, int* numBlocksPerGrid, int* bestBlockSize, const amd::Device& device,
    hipFunction_t func, int inputBlockSize, size_t dynamicSMemSize, bool bCalcPotentialBlkSz);
} /* namespace hip_impl*/

class PlatformState {
  amd::Monitor lock_{"Guards PlatformState globals", true};

  /* Singleton object */
  static PlatformState* platform_;

  PlatformState() {}
  ~PlatformState() {}

 public:
  void init();

  // Dynamic Code Objects functions
  hipError_t loadModule(hipModule_t* module, const char* fname, const void* image = nullptr);
  hipError_t unloadModule(hipModule_t hmod);

  hipError_t getDynFunc(hipFunction_t* hfunc, hipModule_t hmod, const char* func_name);
  hipError_t getDynGlobalVar(const char* hostVar, hipModule_t hmod, hipDeviceptr_t* dev_ptr,
                             size_t* size_ptr);
  hipError_t getDynTexRef(const char* hostVar, hipModule_t hmod, textureReference** texRef);

  hipError_t registerTexRef(textureReference* texRef, hipModule_t hmod, std::string name);
  hipError_t getDynTexGlobalVar(textureReference* texRef, hipDeviceptr_t* dev_ptr,
                                size_t* size_ptr);

  /* Singleton instance */
  static PlatformState& instance() {
    if (platform_ == nullptr) {
      // __hipRegisterFatBinary() will call this when app starts, thus
      // there is no multiple entry issue here.
      platform_ = new PlatformState();
    }
    return *platform_;
  }

  // Static Code Objects functions
  hip::FatBinaryInfo** addFatBinary(const void* data);
  hipError_t removeFatBinary(hip::FatBinaryInfo** module);
  hipError_t digestFatBinary(const void* data, hip::FatBinaryInfo*& programs);

  hipError_t registerStatFunction(const void* hostFunction, hip::Function* func);
  hipError_t registerStatGlobalVar(const void* hostVar, hip::Var* var);
  hipError_t registerStatManagedVar(hip::Var* var);

  const char* getStatFuncName(const void* hostFunction);
  hipError_t getStatFunc(hipFunction_t* hfunc, const void* hostFunction, int deviceId);
  hipError_t getStatFuncAttr(hipFuncAttributes* func_attr, const void* hostFunction, int deviceId);
  hipError_t getStatGlobalVar(const void* hostVar, int deviceId, hipDeviceptr_t* dev_ptr,
                              size_t* size_ptr);

  hipError_t initStatManagedVarDevicePtr(int deviceId);

  // Exec Functions
  void setupArgument(const void* arg, size_t size, size_t offset);
  void configureCall(dim3 gridDim, dim3 blockDim, size_t sharedMem, hipStream_t stream);
  void popExec(ihipExec_t& exec);

 private:
  // Dynamic Code Object map, key in module to get the corresponding object
  std::unordered_map<hipModule_t, hip::DynCO*> dynCO_map_;
  hip::StatCO statCO_;  // Static Code object var
  bool initialized_{false};
  std::unordered_map<textureReference*, std::pair<hipModule_t, std::string>> texRef_map_;
};
clr-rocm-5.7.1/hipamd/src/hip_prof_api.h000066400000000000000000000070351450307266000201130ustar00rootroot00000000000000
/* Copyright (c) 2019 - 2021 Advanced Micro Devices, Inc.
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE. */

#ifndef HIP_SRC_HIP_PROF_API_H
#define HIP_SRC_HIP_PROF_API_H

#include
#include
#include
#include
#include

#if USE_PROF_API
#include "hip/amd_detail/hip_prof_str.h"
#include "platform/prof_protocol.h"

struct hip_api_trace_data_t {
  hip_api_data_t api_data;
  uint64_t phase_enter_timestamp;
  uint64_t phase_data;

  void (*phase_enter)(hip_api_id_t operation_id, hip_api_trace_data_t* data);
  void (*phase_exit)(hip_api_id_t operation_id, hip_api_trace_data_t* data);
};

// HIP API callbacks spawner object macro
#define HIP_CB_SPAWNER_OBJECT(operation_id) \
  api_callbacks_spawner_t<HIP_API_ID_##operation_id> __api_tracer( \
      [=](auto& api_data) { INIT_CB_ARGS_DATA(operation_id, api_data); });

template <hip_api_id_t operation_id> class api_callbacks_spawner_t {
 public:
  template <typename Functor> api_callbacks_spawner_t(Functor init_cb_args_data) {
    static_assert(operation_id >= HIP_API_ID_FIRST && operation_id <= HIP_API_ID_LAST,
                  "invalid HIP_API operation id");

    if (auto function = activity_prof::report_activity.load(std::memory_order_relaxed);
        function &&
        (enabled_ = function(ACTIVITY_DOMAIN_HIP_API, operation_id, &trace_data_) == 0)) {
      activity_prof::correlation_id = trace_data_.api_data.correlation_id;

      if (trace_data_.phase_enter != nullptr) {
        init_cb_args_data(trace_data_.api_data);
        trace_data_.phase_enter(operation_id, &trace_data_);
      }
    }
  }

  ~api_callbacks_spawner_t() {
    if (enabled_) {
      if (trace_data_.phase_exit != nullptr) trace_data_.phase_exit(operation_id, &trace_data_);
      activity_prof::correlation_id = 0;
    }
  }

 private:
  bool enabled_{false};
  union {
    hip_api_trace_data_t trace_data_;
  };
};

template <> class api_callbacks_spawner_t<HIP_API_ID_NONE> {
 public:
  template <typename Functor> api_callbacks_spawner_t(Functor) {}
};

#else

#define HIP_CB_SPAWNER_OBJECT(x) \
  do { \
  } while (false)

class api_callbacks_table_t {
 public:
  bool set_activity(hip_api_id_t, activity_sync_callback_t, void*) { return false; }
  bool set_callback(hip_api_id_t, activity_rtapi_callback_t, void*) { return false; }
};

#endif

#endif  // HIP_SRC_HIP_PROF_API_H
clr-rocm-5.7.1/hipamd/src/hip_prof_gen.py000077500000000000000000000575571450307266000203310ustar00rootroot00000000000000
#!/usr/bin/python
# Copyright (c) 2019 - 2021 Advanced Micro Devices, Inc. All rights reserved.
# # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. import os, sys, re import CppHeaderParser import filecmp PROF_HEADER = "hip_prof_str.h" OUTPUT = PROF_HEADER REC_MAX_LEN = 1024 # Recursive sources processing recursive_mode = 0 # HIP_INIT_API macro patching hip_patch_mode = 0 # API matching types check types_check_mode = 0 # Private API check private_check_mode = 0 # Messages and errors controll verbose = 0 errexit = 0 inp_file = 'none' line_num = -1 # Verbose message def message(msg): if verbose: sys.stdout.write(msg + '\n') # Fatal error termination def error(msg): if line_num != -1: msg += ", file '" + inp_file + "', line (" + str(line_num) + ")" if errexit: msg = " Error: " + msg else: msg = " Warning: " + msg sys.stdout.write(msg + '\n') sys.stderr.write(sys.argv[0] + msg +'\n') def fatal(msg): error(msg) sys.exit(1) ############################################################# # Normalizing API name def filtr_api_name(name): name = re.sub(r'\s*$', r'', name); return name def filtr_api_decl(record): record = re.sub("\s__dparm\([^\)]*\)", r'', record); record = re.sub("\(void\*\)", r'', record); return record # Normalizing API arguments def filtr_api_args(args_str): args_str = re.sub(r'^\s*', r'', args_str); args_str = re.sub(r'\s*$', r'', args_str); args_str = re.sub(r'\s*,\s*', r',', args_str); args_str = re.sub(r'\s+', r' ', args_str); args_str = re.sub(r'\s*(\*+)\s*', r'\1 ', args_str); args_str = re.sub(r'(\benum|struct) ', '', args_str); return args_str # Normalizing types def norm_api_types(type_str): type_str = re.sub(r'uint32_t', r'unsigned int', type_str) type_str = re.sub(r'^unsigned$', r'unsigned int', type_str) return type_str # Creating a list of arguments [(type, name), ...] def list_api_args(args_str): args_str = filtr_api_args(args_str) args_list = [] if args_str != '': for arg_pair in args_str.split(','): if arg_pair == 'void': continue arg_pair = re.sub(r'\s*=\s*\S+$','', arg_pair); m = re.match("^(.*)\s(\S+)$", arg_pair); if m: arg_type = norm_api_types(m.group(1)) arg_name = m.group(2) args_list.append((arg_type, arg_name)) else: fatal("bad args: args_str: '" + args_str + "' arg_pair: '" + arg_pair + "'") return args_list; # Creating arguments string "type0, type1, ..." def filtr_api_types(args_str): args_list = list_api_args(args_str) types_str = '' for arg_tuple in args_list: types_str += arg_tuple[0] + ', ' return types_str # Creating options list [opt0, opt1, ...] 
def filtr_api_opts(args_str): args_list = list_api_args(args_str) opts_list = [] for arg_tuple in args_list: opts_list.append(arg_tuple[1]) return opts_list # Checking for pointer non-void arg type def pointer_ck(arg_type): ptr_type = '' m = re.match(r'(.*)\*$', arg_type) if m: ptr_type = m.group(1) n = re.match(r'(.*)\*\*$', arg_type) if not n: ptr_type = re.sub(r'const ', '', ptr_type) if ptr_type == 'void': ptr_type = '' return ptr_type ############################################################# # Parsing API header # hipError_t hipSetupArgument(const void* arg, size_t size, size_t offset); def parse_api(inp_file_p, out): global inp_file global line_num inp_file = inp_file_p beg_pattern = re.compile("^(hipError_t|const char\s*\*)\s+([^\(]+)\("); api_pattern = re.compile("^(hipError_t|const char\s*\*)\s+([^\(]+)\(([^\)]*)\)"); end_pattern = re.compile("Texture"); hidden_pattern = re.compile(r'__attribute__\(\(visibility\("hidden"\)\)\)') nms_open_pattern = re.compile(r'namespace hip_impl {') nms_close_pattern = re.compile(r'}') inp = open(inp_file, 'r') found = 0 hidden = 0 nms_level = 0; record = "" line_num = -1 for line in inp.readlines(): record += re.sub(r'^\s+', r' ', line[:-1]) line_num += 1 if len(record) > REC_MAX_LEN: fatal("bad record \"" + record + "\"") m = beg_pattern.match(line) if m: name = m.group(2) if hidden != 0: message("api: " + name + " - hidden") elif nms_level != 0: message("api: " + name + " - hip_impl") else: message("api: " + name) found = 1 if found != 0: record = re.sub("\s__dparm\([^\)]*\)", '', record); m = api_pattern.match(record) if m: found = 0 if end_pattern.search(record): continue api_name = filtr_api_name(m.group(2)) api_args = m.group(3) if not api_name in out: out[api_name] = api_args else: continue hidden = 0 if hidden_pattern.match(line): hidden = 1 if nms_open_pattern.match(line): nms_level += 1 if (nms_level > 0) and nms_close_pattern.match(line): nms_level -= 1 if nms_level < 0: fatal("nms level < 0") record = "" inp.close() line_num = -1 ############################################################# # Parsing API implementation # hipError_t hipSetupArgument(const void* arg, size_t size, size_t offset) { # HIP_INIT_API(hipSetupArgument, arg, size, offset); # inp_file - input implementation source file # api_map - input public API map [] => # out - output map [] => [opt0, opt1, ...] 
def parse_content(inp_file_p, api_map, out): global hip_patch_mode global types_check_mode global private_check_mode global inp_file global line_num inp_file = inp_file_p # API method begin pattern beg_pattern = re.compile("^(hipError_t|const char\s*\*)\s+[^\(]+\("); # API declaration pattern decl_pattern = re.compile("^(hipError_t|const char\s*\*)\s+([^\(]+)\(([^\)]*)\)\s*;"); # API definition pattern api_pattern = re.compile("^(hipError_t|const char\s*\*)\s+([^\(]+)\(([^\)]*)\)\s*{"); # API init macro pattern init_pattern = re.compile("(^\s*HIP_INIT_API[^\s]*\s*)\((([^,]+)(,.*|)|)(\);|,)\s*$"); # Open input file inp = open(inp_file, 'r') # API name api_name = "" # Valid public API found flag api_valid = 0 # API overload (parameters mismatch) api_overload = 0 # Input file patched content content = '' # Sub content for found API defiition sub_content = '' # Current record, accumulating several API definition related lines record = '' # Current input file line number line_num = -1 # API beginning found flag found = 0 # Reading input file for line in inp.readlines(): # Accumulating record record += re.sub(r'^\s+', r' ', line[:-1]) line_num += 1 if len(record) > REC_MAX_LEN: fatal("bad record \"" + record + "\"") break; # Looking for API begin if found == 0: record = re.sub(r'\s*extern\s+"C"\s+', r'', record); if beg_pattern.match(record): found = 1 record = filtr_api_decl(record) # Matching API declaration if found == 1: if decl_pattern.match(record): found = 0 # Matching API definition if found == 1: m = api_pattern.match(record) # Checking if complete API matched if m: found = 2 api_valid = 0 api_overload = 0 api_name = filtr_api_name(m.group(2)) # Checking if API name is in the API map if (private_check_mode == 0) or (api_name in api_map): if not api_name in api_map: api_map[api_name] = '' # Getting API arguments api_args = m.group(3) # Getting etalon arguments from the API map eta_args = api_map[api_name] if eta_args == '': eta_args = api_args api_map[api_name] = eta_args # Normalizing API arguments api_types = filtr_api_types(api_args) # Normalizing etalon arguments eta_types = filtr_api_types(eta_args) if (api_types == eta_types) or ((types_check_mode == 0) and (not api_name in out)): # API is already found and not is mismatched if (api_name in out): fatal("API redefined \"" + api_name + "\", record \"" + record + "\"") # Set valid public API found flag api_valid = 1 # Set output API map with API arguments list out[api_name] = filtr_api_opts(api_args) # Register missmatched API methods else: api_overload = 1 # Warning about mismatched API, possible non public overloaded version api_diff = '\t\t' + inp_file + " line(" + str(line_num) + ")\n\t\tapi: " + api_types + "\n\t\teta: " + eta_types message("\t" + api_name + ' args mismatch:\n' + api_diff + '\n') # API found action if found == 2: if hip_patch_mode != 0: # Looking for INIT macro m = init_pattern.match(line) if m: init_name = api_name if api_overload == 1: init_name = 'NONE' init_args = m.group(4) line = m.group(1) + '(' + init_name + init_args + m.group(5) + '\n' m = init_pattern.match(line) if m: found = 0 if api_valid == 1: message("\t" + api_name) # Ignore if it is initialized as NONE init_name = m.group(3) if init_name != 'NONE': # Check if init name matching API name # if init_name != api_name: # fatal("init name mismatch: '" + init_name + "' <> '" + api_name + "'") # Registering dummy API for non public API if the name in INIT is not NONE if api_valid == 0: # If init name is not in public API map then it is private API # 
# else it was not identified and will be checked on finish
            if not init_name in api_map:
              if init_name in out:
                fatal("API reinit \"" + api_name + "\", record \"" + record + "\"")
              out[init_name] = []
        elif re.search('}', line):
          found = 0
          # Expect INIT macro for valid public API
          # Removing and registering non-conformant APIs with missing HIP_INIT macro
          if api_valid == 1:
            if api_name in out:
              del out[api_name]
              del api_map[api_name]
              # Registering non-conformant APIs
              out['.' + api_name] = 1
            else:
              fatal("API is not in out \"" + api_name + "\", record \"" + record + "\"")

    if found != 1:
      record = ""
    content += line

  inp.close()
  line_num = -1

  if len(out) != 0:
    return content
  else:
    return ''

# src path walk
def parse_src(api_map, src_path, src_patt, out):
  global recursive_mode

  pattern = re.compile(src_patt)
  src_path = re.sub(r'\s', '', src_path)
  for src_dir in src_path.split(':'):
    message("Parsing " + src_dir + " for '" + src_patt + "'")
    for root, dirs, files in os.walk(src_dir):
      for fnm in files:
        if pattern.search(fnm):
          file = root + '/' + fnm
          message(file)
          content = parse_content(file, api_map, out);
          if (hip_patch_mode != 0) and (content != ''):
            f = open(file, 'w')
            f.write(content)
            f.close()
      if recursive_mode == 0:
        break

#############################################################
# Generating profiling primitives header
# api_map - public API map [] => [(type, name), ...]
# callback_ids - public API callback IDs list (name, callback_id)
# opts_map - opts map [] => [opt0, opt1, ...]
def generate_prof_header(f, api_map, callback_ids, opts_map):
  # Private API list
  priv_lst = []

  f.write('// Generated file. DO NOT EDIT.\n')
  f.write('//\n')
  f.write('// This file is automatically generated by the ' + os.path.basename(__file__) + ' script.\n')
  f.write('// If changes are required, run the script and commit the updated file.\n\n')
  f.write('#ifndef _HIP_PROF_STR_H\n');
  f.write('#define _HIP_PROF_STR_H\n');
  f.write('#define HIP_PROF_VER 1\n')

  # Check for non-public API
  for name in sorted(opts_map.keys()):
    if not name in api_map:
      opts_lst = opts_map[name]
      if len(opts_lst) != 0:
        fatal("bad dummy API \"" + name + "\", args: " + str(opts_lst))
      priv_lst.append(name)
      message("Private: " + name)

  # Generating the callbacks ID enumeration
  f.write('\n// HIP API callbacks ID enumeration\n')
  f.write('enum hip_api_id_t {\n')
  f.write('  HIP_API_ID_NONE = 0,\n')
  f.write('  HIP_API_ID_FIRST = 1,\n')
  cb_id_map = {}
  last_cb_id = 0
  for name, cb_id in callback_ids:
    if not name in api_map:
      f.write('  HIP_API_ID_RESERVED_' + str(cb_id) + ' = ' + str(cb_id) + ',\n')
    else:
      f.write('  HIP_API_ID_' + name + ' = ' + str(cb_id) + ',\n')
      cb_id_map[name] = cb_id
    if cb_id > last_cb_id:
      last_cb_id = cb_id
  for name in sorted(api_map.keys()):
    if not name in cb_id_map:
      last_cb_id += 1
      f.write('  HIP_API_ID_' + name + ' = ' + str(last_cb_id) + ',\n')
  f.write('  HIP_API_ID_LAST = ' + str(last_cb_id) + ',\n')
  f.write('\n')
  for name in sorted(priv_lst):
    f.write('  HIP_API_ID_' + name + ' = HIP_API_ID_NONE,\n')
  f.write('};\n')

  # Generating the method to return API name by ID
  f.write('\n// Return the HIP API string for a given callback ID\n')
  f.write('static inline const char* hip_api_name(const uint32_t id) {\n')
  f.write('  switch(id) {\n')
  for name in sorted(api_map.keys()):
    f.write('    case HIP_API_ID_' + name + ': return "' + name + '";\n')
  f.write('  };\n')
  f.write('  return "unknown";\n')
  f.write('};\n')

  # Generating the method for querying API ID by name
  f.write('\n')
  f.write('#include <string.h>\n');
  f.write('// Return the HIP API callback ID for a given name\n')
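  # Illustrative shape of the emitted lookup helper (abridged; the API name
  # shown is just an example entry):
  #   static inline uint32_t hipApiIdByName(const char* name) {
  #     if (strcmp("hipStreamCreate", name) == 0) return HIP_API_ID_hipStreamCreate;
  #     ...
  #     return HIP_API_ID_NONE;
  #   }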
f.write('static inline uint32_t hipApiIdByName(const char* name) {\n') for name in sorted(api_map.keys()): f.write(' if (strcmp("' + name + '", name) == 0) return HIP_API_ID_' + name + ';\n') f.write(' return HIP_API_ID_NONE;\n') f.write('}\n') # Generating the callbacks data structure f.write('\n// HIP API callbacks data structures\n') f.write( 'typedef struct hip_api_data_s {\n' + ' uint64_t correlation_id;\n' + ' uint32_t phase;\n' + ' union {\n' ) for name in sorted(api_map.keys()): args = api_map[name] if len(args) != 0: f.write(' struct {\n') for arg_tuple in args: arg_type = arg_tuple[0] ptr_type = pointer_ck(arg_type) arg_name = arg_tuple[1] # Checking for enum type if arg_type == "hipLimit_t": arg_type = 'enum ' + arg_type # Structuer field code f.write(' ' + arg_type + ' ' + arg_name + ';\n') if ptr_type != '': f.write(' ' + ptr_type + ' ' + arg_name + '__val;\n') f.write(' } ' + name + ';\n') f.write( ' } args;\n' + ' uint64_t *phase_data;\n' + '} hip_api_data_t;\n' ) # Generating the callbacks args data filling macros f.write('\n// HIP API callbacks args data filling macros\n') for name in sorted(api_map.keys()): args = api_map[name] f.write('// ' + name + str(args) + '\n') f.write('#define INIT_' + name + '_CB_ARGS_DATA(cb_data) { \\\n') if name in opts_map: opts_list = opts_map[name] if len(args) != len(opts_list): fatal("\"" + name + "\" API args and opts mismatch, args: " + str(args) + ", opts: " + str(opts_list)) # API args iterating: # type is args[][0] # name is args[][1] for ind in range(0, len(args)): arg_tuple = args[ind] arg_type = arg_tuple[0] ptr_type = pointer_ck(arg_type) fld_name = arg_tuple[1] opt_name = opts_list[ind] if arg_type == "const char*": f.write(' cb_data.args.' + name + '.' + fld_name + ' = (' + opt_name + ') ? strdup(' + opt_name + ') : NULL; \\\n') else: f.write(' cb_data.args.' + name + '.' + fld_name + ' = (' + arg_type + ')' + opt_name + '; \\\n') f.write('};\n') f.write('#define INIT_CB_ARGS_DATA(cb_id, cb_data) INIT_##cb_id##_CB_ARGS_DATA(cb_data)\n') # Generating macro for non-public API f.write('\n// Macros for non-public API primitives\n') for name in sorted(priv_lst): f.write('// ' + name + '()\n') f.write('#define INIT_'+ name + '_CB_ARGS_DATA(cb_data) {};\n') f.write('\n#define INIT_NONE_CB_ARGS_DATA(cb_data) {};\n') f.write('\n#if HIP_PROF_HIP_API_STRING\n') # Generating the method for the API args filling f.write('// HIP API args filling helper\n') f.write('static inline void hipApiArgsInit(hip_api_id_t id, hip_api_data_t* data) {\n') f.write(' switch (id) {\n') for name in sorted(api_map.keys()): args = api_map[name] f.write('// ' + name + str(args) + '\n') f.write(' case HIP_API_ID_' + name + ':\n') for ind in range(0, len(args)): arg_tuple = args[ind] arg_type = arg_tuple[0] ptr_type = pointer_ck(arg_type) fld_name = arg_tuple[1] var_name = 'data->args.' + name + '.' + fld_name if arg_type == "char*": f.write(' ' + var_name + ' = (' + var_name + ') ? 
strdup(' + var_name + ') : NULL;\n') else: if ptr_type != '': f.write(' if (' + var_name + ') ' + var_name + '__val = *(' + var_name + ');\n') f.write(' break;\n') f.write(' default: break;\n') f.write(' };\n') f.write('}\n') # Generating the method for the API string, name and parameters f.write('\n') f.write('#include \n'); f.write('#include \n'); f.write('// HIP API string method, method name and parameters\n') f.write('static inline const char* hipApiString(hip_api_id_t id, const hip_api_data_t* data) {\n') f.write(' std::ostringstream oss;\n') f.write(' switch (id) {\n') for name in sorted(api_map.keys()): args = api_map[name] f.write(' case HIP_API_ID_' + name + ':\n') f.write(' oss << "' + name + '(";\n') for ind in range(0, len(args)): arg_tuple = args[ind] arg_type = arg_tuple[0] ptr_type = pointer_ck(arg_type) arg_name = arg_tuple[1] var_name = 'data->args.' + name + '.' + arg_name delim = '' if ind == 0 else ', '; oss_stream = 'oss << "' + delim + arg_name + '=' line_shift = ' ' f.write(line_shift) if ptr_type != '': f.write('if (' + var_name + ' == NULL) ' + oss_stream + 'NULL";\n' + line_shift + 'else { ') if pointer_ck(ptr_type) != '': f.write(oss_stream + '"; roctracer::hip_support::detail::operator<<(oss, (void*)' + var_name + '__val' + '); }\n') else: f.write(oss_stream + '"; roctracer::hip_support::detail::operator<<(oss, ' + var_name + '__val' + '); }\n') else: f.write(oss_stream + '"; roctracer::hip_support::detail::operator<<(oss, ' + var_name + ');\n') f.write(' oss << ")";\n') f.write(' break;\n') f.write(' default: oss << "unknown";\n') f.write(' };\n') f.write(' return strdup(oss.str().c_str());\n') f.write('}\n') f.write('#endif // HIP_PROF_HIP_API_STRING\n') f.write('#endif // _HIP_PROF_STR_H\n'); ############################################################# # main while len(sys.argv) > 1: if not re.match(r'-', sys.argv[1]): break if (sys.argv[1] == '-v'): verbose = 1 sys.argv.pop(1) if (sys.argv[1] == '-r'): recursive_mode = 1 sys.argv.pop(1) if (sys.argv[1] == '-t'): types_check_mode = 1 sys.argv.pop(1) if (sys.argv[1] == '--priv'): private_check_mode = 1 sys.argv.pop(1) if (sys.argv[1] == '-e'): errexit = 1 sys.argv.pop(1) if (sys.argv[1] == '-p'): hip_patch_mode = 1 sys.argv.pop(1) # Usage if (len(sys.argv) < 4): fatal ("Usage: " + sys.argv[0] + " [-v] []\n" + " -v - verbose messages\n" + " -r - process source directory recursively\n" + " -t - API types matching check\n" + " --priv - private API check\n" + " -e - on error exit mode\n" + " -p - HIP_INIT_API macro patching mode\n" + "\n" + " Example:\n" + " $ " + sys.argv[0] + " -v -p -t --priv ../hip/include/hip/hip_runtime_api.h" + " ./src ./include/hip/amd_detail/hip_prof_str.h ./include/hip/amd_detail/hip_prof_str.h.new"); # API header file given as an argument src_pat = "\.cpp$" api_hfile = sys.argv[1] if not os.path.isfile(api_hfile): fatal("input file '" + api_hfile + "' not found") # Srcs directory given as an argument src_dir = sys.argv[2] if not os.path.isdir(src_dir): fatal("src directory " + src_dir + "' not found") # Current hip_prof_str include INPUT = sys.argv[3] if not os.path.isfile(INPUT): fatal("input file '" + INPUT + "' not found") if len(sys.argv) > 4: OUTPUT = sys.argv[4] # API declaration map api_map = { 'hipSetupArgument': '', 'hipMalloc3DArray': '', 'hipFuncGetAttribute': '', 'hipMemset3DAsync': '', 'hipKernelNameRef': '', 'hipStreamGetPriority': '', 'hipLaunchByPtr': '', 'hipFreeHost': '', 'hipGetErrorName': '', 'hipMemcpy3DAsync': '', 'hipMemcpyParam2DAsync': '', 'hipArray3DCreate': 
'', 'hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags': '', 'hipOccupancyMaxPotentialBlockSize': '', 'hipMallocManaged': '', 'hipOccupancyMaxActiveBlocksPerMultiprocessor': '', 'hipGetErrorString': '', 'hipMallocHost': '', 'hipModuleLoadDataEx': '', 'hipGetDeviceProperties': '', 'hipConfigureCall': '', 'hipHccModuleLaunchKernel': '', 'hipExtModuleLaunchKernel': '', } # API options map opts_map = {} # Parsing API header parse_api(api_hfile, api_map) # Parsing sources parse_src(api_map, src_dir, src_pat, opts_map) try: cppHeader = CppHeaderParser.CppHeader(INPUT) except CppHeaderParser.CppParseError as e: print(e) sys.exit(1) # Callback IDs api_callback_ids = [] for enum in cppHeader.enums: if enum['name'] == 'hip_api_id_t': for value in enum['values']: if value['name'] == 'HIP_API_ID_NONE' or value['name'] == 'HIP_API_ID_FIRST': continue if value['name'] == 'HIP_API_ID_LAST': break m = re.match(r'HIP_API_ID_(\S*)', value['name']) if m: api_callback_ids.append((m.group(1), value['value'])) break # Checking for non-conformant APIs with missing HIP_INIT macro for name in list(opts_map.keys()): m = re.match(r'\.(\S*)', name) if m: message("Init missing: " + m.group(1)) del opts_map[name] # Converting api map to map of lists # Checking for not found APIs not_found = 0 if len(opts_map) != 0: for name in api_map.keys(): args_str = api_map[name]; api_map[name] = list_api_args(args_str) if not name in opts_map: error("implementation not found: " + name) not_found += 1 if not_found != 0: error(str(not_found) + " API calls missing in interception layer") # The output subdirectory seems to exist or not depending on the # version of cmake. output_dir = os.path.dirname(OUTPUT) if not os.path.exists(output_dir): os.makedirs(output_dir) # Generating output header file with open(OUTPUT, 'w') as f: generate_prof_header(f, api_map, api_callback_ids, opts_map) if not filecmp.cmp(INPUT, OUTPUT): message("Warning: \"" + INPUT + "\" needs to be re-generated and checked-in with the current changes") # Successfull exit sys.exit(0) clr-rocm-5.7.1/hipamd/src/hip_profile.cpp000066400000000000000000000026341450307266000203070ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include "hip_internal.hpp" hipError_t hipProfilerStart() { HIP_INIT_API(hipProfilerStart); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } hipError_t hipProfilerStop() { HIP_INIT_API(hipProfilerStop); assert(0 && "Unimplemented"); HIP_RETURN(hipErrorNotSupported); } clr-rocm-5.7.1/hipamd/src/hip_runtime.cpp000066400000000000000000000041651450307266000203330ustar00rootroot00000000000000/* Copyright (c) 2008 - 2022 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "thread/thread.hpp" #include #include void ihipDestroyDevice(); #ifdef DEBUG static int reportHook(int reportType, char* message, int* returnValue) { if (returnValue) { *returnValue = 1; } std::cerr << message; ::exit(3); return TRUE; } #endif // DEBUG extern "C" BOOL WINAPI DllMain(HINSTANCE hinst, DWORD reason, LPVOID reserved) { switch (reason) { case DLL_PROCESS_ATTACH: #ifdef DEBUG if (!::getenv("AMD_OCL_ENABLE_MESSAGE_BOX")) { _CrtSetReportHook(reportHook); _set_error_mode(_OUT_TO_STDERR); } #endif // DEBUG break; case DLL_PROCESS_DETACH: { amd::Thread* thread = amd::Thread::current(); if (!(thread != nullptr || ((thread = new amd::HostThread()) != nullptr && thread == amd::Thread::current()))) { return true; } ihipDestroyDevice(); } break; case DLL_THREAD_DETACH: { amd::Thread* thread = amd::Thread::current(); delete thread; } break; default: break; } return true; } clr-rocm-5.7.1/hipamd/src/hip_stream.cpp000066400000000000000000000760751450307266000201540ustar00rootroot00000000000000/* Copyright (c) 2015 - 2022 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
 IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
 DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
 OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
 OR OTHER DEALINGS IN THE SOFTWARE. */

#include
#include "hip_internal.hpp"
#include "hip_event.hpp"
#include "thread/monitor.hpp"
#include "hip_prof_api.h"

static amd::Monitor streamSetLock{"Guards global stream set"};
static std::unordered_set<hip::Stream*> streamSet;

namespace hip {

// ================================================================================================
Stream::Stream(hip::Device* dev, Priority p, unsigned int f, bool null_stream,
               const std::vector<uint32_t>& cuMask, hipStreamCaptureStatus captureStatus)
    : amd::HostQueue(*dev->asContext(), *dev->devices()[0], 0, amd::CommandQueue::RealTimeDisabled,
                     convertToQueuePriority(p), cuMask),
      lock_("Stream Callback lock"),
      device_(dev),
      priority_(p),
      flags_(f),
      null_(null_stream),
      cuMask_(cuMask),
      captureStatus_(captureStatus),
      originStream_(false),
      captureID_(0) {
  amd::ScopedLock lock(streamSetLock);
  streamSet.insert(this);
}

// ================================================================================================
hipError_t Stream::EndCapture() {
  for (auto event : captureEvents_) {
    hip::Event* e = reinterpret_cast<hip::Event*>(event);
    e->SetCaptureStream(nullptr);
  }
  for (auto stream : parallelCaptureStreams_) {
    hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
    hipError_t err = s->EndCapture();
    assert(err == hipSuccess);
  }
  captureStatus_ = hipStreamCaptureStatusNone;
  pCaptureGraph_ = nullptr;
  originStream_ = false;
  parentStream_ = nullptr;
  lastCapturedNodes_.clear();
  parallelCaptureStreams_.clear();
  captureEvents_.clear();
  return hipSuccess;
}

// ================================================================================================
bool Stream::Create() {
  return create();
}

// ================================================================================================
void Stream::Destroy(hip::Stream* stream) {
  {
    amd::ScopedLock lock(streamSetLock);
    streamSet.erase(stream);
  }
  stream->release();
}

// ================================================================================================
bool isValid(hipStream_t& stream) {
  // NULL stream is always valid
  if (stream == nullptr) {
    return true;
  }

  if (hipStreamPerThread == stream) {
    getStreamPerThread(stream);
  }

  hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
  amd::ScopedLock lock(streamSetLock);
  if (streamSet.find(s) == streamSet.end()) {
    return false;
  }
  return true;
}

// ================================================================================================
int Stream::DeviceId() const {
  return device_->deviceId();
}

// ================================================================================================
int Stream::DeviceId(const hipStream_t hStream) {
  // Copying locally into non-const variable just to get const away
  hipStream_t inputStream = hStream;
  if (!hip::isValid(inputStream)) {
    // return invalid device id
    return -1;
  }
  hip::Stream* s = reinterpret_cast<hip::Stream*>(inputStream);
  int deviceId = (s != nullptr) ?
      s->DeviceId() : ihipGetDevice();
  assert(deviceId >= 0 && deviceId < static_cast<int>(g_devices.size()));
  return deviceId;
}

// ================================================================================================
void Stream::SyncAllStreams(int deviceId) {
  // Make a local copy to avoid stalls for GPU finish with multiple threads
  std::vector<Stream*> streams;
  streams.reserve(streamSet.size());
  {
    amd::ScopedLock lock(streamSetLock);
    for (auto it : streamSet) {
      if (it->DeviceId() == deviceId) {
        streams.push_back(it);
        it->retain();
      }
    }
  }
  for (auto it : streams) {
    it->finish();
    it->release();
  }
}

// ================================================================================================
bool Stream::StreamCaptureBlocking() {
  amd::ScopedLock lock(streamSetLock);
  for (auto& it : streamSet) {
    if (it->GetCaptureStatus() == hipStreamCaptureStatusActive &&
        it->Flags() != hipStreamNonBlocking) {
      return true;
    }
  }
  return false;
}

void Stream::destroyAllStreams(int deviceId) {
  std::vector<Stream*> toBeDeleted;
  {
    amd::ScopedLock lock(streamSetLock);
    for (auto& it : streamSet) {
      if (it->Null() == false && it->DeviceId() == deviceId) {
        toBeDeleted.push_back(it);
      }
    }
  }
  for (auto& it : toBeDeleted) {
    hip::Stream::Destroy(it);
  }
}

bool Stream::StreamCaptureOngoing(hipStream_t hStream) {
  hip::Stream* s = reinterpret_cast<hip::Stream*>(hStream);
  // If any local thread has an ongoing or concurrent capture sequence initiated
  // with hipStreamCaptureModeGlobal, it is prohibited from unsafe calls
  if (s != nullptr && s->GetCaptureMode() == hipStreamCaptureModeGlobal) {
    amd::ScopedLock lock(g_captureStreamsLock);
    return (g_captureStreams.empty() == true && hip::tls.capture_streams_.empty()) ? false : true;
  } else {
    amd::ScopedLock lock(g_streamSetLock);
    return (g_allCapturingStreams.find(s) == g_allCapturingStreams.end() ? false : true);
  }
}

bool Stream::existsActiveStreamForDevice(hip::Device* device) {
  amd::ScopedLock lock(streamSetLock);
  for (const auto& active_stream : streamSet) {
    if ((active_stream->GetDevice() == device) && active_stream->GetQueueStatus()) {
      return true;
    }
  }
  return false;
}

}; // hip namespace

// ================================================================================================
void iHipWaitActiveStreams(hip::Stream* blocking_stream, bool wait_null_stream) {
  amd::Command::EventWaitList eventWaitList(0);
  bool submitMarker = 0;
  {
    amd::ScopedLock lock(streamSetLock);

    for (const auto& active_stream : streamSet) {
      // If it's the current device
      if ((&active_stream->device() == &blocking_stream->device()) &&
          // Make sure it's a default stream
          ((active_stream->Flags() & hipStreamNonBlocking) == 0) &&
          // and it's not the current stream
          (active_stream != blocking_stream) &&
          // check for a wait on the null stream
          (active_stream->Null() == wait_null_stream)) {
        // Get the last valid command
        amd::Command* command = active_stream->getLastQueuedCommand(true);
        if (command != nullptr) {
          amd::Event& event = command->event();
          // Check HW status of the ROCcrl event.
          // Note: not all ROCclr modes support HW status
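          // Readiness is resolved in two steps below: a cheap HW signal query
          // first, then the ROCclr command status (CL_COMPLETE) as a fallback
          // for modes where the HW event state is not tracked.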
// Note: not all ROCclr modes support HW status bool ready = active_stream->device().IsHwEventReady(event); if (!ready) { ready = (command->status() == CL_COMPLETE); } submitMarker |= active_stream->vdev()->isFenceDirty(); // Check the current active status if (!ready) { command->notifyCmdQueue(); eventWaitList.push_back(command); } else { command->release(); } } // Nullstream, hence there is nothing else to wait if (wait_null_stream) { break; } } } } // Check if we have to wait anything if (eventWaitList.size() > 0 || submitMarker) { amd::Command* command = new amd::Marker(*blocking_stream, kMarkerDisableFlush, eventWaitList); if (command != nullptr) { command->enqueue(); command->release(); } //Reset the dirty flag for all streams now that the marker is submitted for (const auto& stream : streamSet) { amd::HostQueue* active_queue = stream->asHostQueue(); if (active_queue->vdev()->isFenceDirty()) { active_queue->vdev()->resetFenceDirty(); } } } // Release all active commands. It's safe after the marker was enqueued for (const auto& it : eventWaitList) { it->release(); } } // ================================================================================================ void CL_CALLBACK ihipStreamCallback(cl_event event, cl_int command_exec_status, void* user_data) { StreamCallback* cbo = reinterpret_cast(user_data); cbo->callback(); delete cbo; } // ================================================================================================ static hipError_t ihipStreamCreate(hipStream_t* stream, unsigned int flags, hip::Stream::Priority priority, const std::vector& cuMask = {}) { if (flags != hipStreamDefault && flags != hipStreamNonBlocking) { return hipErrorInvalidValue; } hip::Stream* hStream = new hip::Stream(hip::getCurrentDevice(), priority, flags, false, cuMask); if (hStream == nullptr) { return hipErrorOutOfMemory; } else if (!hStream->Create()) { hip::Stream::Destroy(hStream); return hipErrorOutOfMemory; } *stream = reinterpret_cast(hStream); return hipSuccess; } // ================================================================================================ stream_per_thread::stream_per_thread() { m_streams.resize(g_devices.size()); for (auto &stream : m_streams) { stream = nullptr; } } stream_per_thread::~stream_per_thread() { for (auto &stream:m_streams) { if (stream != nullptr && hip::isValid(stream)) { hip::Stream::Destroy(reinterpret_cast(stream)); stream = nullptr; } } } hipStream_t stream_per_thread::get() { hip::Device* device = hip::getCurrentDevice(); int currDev = device->deviceId(); // This is to make sure m_streams is not empty if (m_streams.empty()) { m_streams.resize(g_devices.size()); for (auto &stream : m_streams) { stream = nullptr; } } // There is a scenario where hipResetDevice destroys stream per thread // hence isValid check is required to make sure only valid stream is used if (m_streams[currDev] == nullptr || !hip::isValid(m_streams[currDev])) { hipError_t status = ihipStreamCreate(&m_streams[currDev], hipStreamDefault, hip::Stream::Priority::Normal); if (status != hipSuccess) { DevLogError("Stream creation failed\n"); } } return m_streams[currDev]; } // ================================================================================================ void getStreamPerThread(hipStream_t& stream) { if (stream == hipStreamPerThread) { stream = hip::tls.stream_per_thread_obj_.get(); } } // ================================================================================================ hipStream_t getPerThreadDefaultStream() { // Function to get per 
  // More about the usecases yet to come
  hipStream_t stream = hipStreamPerThread;
  getStreamPerThread(stream);
  return stream;
}

// ================================================================================================
hipError_t hipStreamCreateWithFlags(hipStream_t* stream, unsigned int flags) {
  HIP_INIT_API(hipStreamCreateWithFlags, stream, flags);

  if (stream == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }

  HIP_RETURN(ihipStreamCreate(stream, flags, hip::Stream::Priority::Normal), *stream);
}

// ================================================================================================
hipError_t hipStreamCreate(hipStream_t* stream) {
  HIP_INIT_API(hipStreamCreate, stream);

  if (stream == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }

  HIP_RETURN(ihipStreamCreate(stream, hipStreamDefault, hip::Stream::Priority::Normal), *stream);
}

// ================================================================================================
hipError_t hipStreamCreateWithPriority(hipStream_t* stream, unsigned int flags, int priority) {
  HIP_INIT_API(hipStreamCreateWithPriority, stream, flags, priority);

  if (stream == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }

  hip::Stream::Priority streamPriority;
  if (priority <= hip::Stream::Priority::High) {
    streamPriority = hip::Stream::Priority::High;
  } else if (priority >= hip::Stream::Priority::Low) {
    streamPriority = hip::Stream::Priority::Low;
  } else {
    streamPriority = hip::Stream::Priority::Normal;
  }

  HIP_RETURN(ihipStreamCreate(stream, flags, streamPriority), *stream);
}

// ================================================================================================
hipError_t hipDeviceGetStreamPriorityRange(int* leastPriority, int* greatestPriority) {
  HIP_INIT_API(hipDeviceGetStreamPriorityRange, leastPriority, greatestPriority);

  if (leastPriority != nullptr) {
    *leastPriority = hip::Stream::Priority::Low;
  }
  if (greatestPriority != nullptr) {
    *greatestPriority = hip::Stream::Priority::High;
  }
  HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipStreamGetFlags_common(hipStream_t stream, unsigned int* flags) {
  if ((flags != nullptr) && (stream != nullptr)) {
    if (!hip::isValid(stream)) {
      return hipErrorContextIsDestroyed;
    }
    *flags = reinterpret_cast<hip::Stream*>(stream)->Flags();
  } else {
    return hipErrorInvalidValue;
  }
  return hipSuccess;
}

// ================================================================================================
hipError_t hipStreamGetFlags(hipStream_t stream, unsigned int* flags) {
  HIP_INIT_API(hipStreamGetFlags, stream, flags);
  HIP_RETURN(hipStreamGetFlags_common(stream, flags));
}

// ================================================================================================
hipError_t hipStreamGetFlags_spt(hipStream_t stream, unsigned int* flags) {
  HIP_INIT_API(hipStreamGetFlags, stream, flags);
  PER_THREAD_DEFAULT_STREAM(stream);
  HIP_RETURN(hipStreamGetFlags_common(stream, flags));
}

// ================================================================================================
hipError_t hipStreamSynchronize_common(hipStream_t stream) {
  if (!hip::isValid(stream)) {
    HIP_RETURN(hipErrorContextIsDestroyed);
  }
  if (stream != nullptr) {
    // If still capturing return error
    if (hip::Stream::StreamCaptureOngoing(stream) == true) {
      HIP_RETURN(hipErrorStreamCaptureUnsupported);
    }
  }
  // Wait for the current host queue
  hip::getStream(stream)->finish();
  return hipSuccess;
}
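// Illustrative behavior sketch (not part of this file): hipStreamCreateWithPriority
// above clamps out-of-range priorities instead of rejecting them, e.g.:
//   int least = 0, greatest = 0;
//   hipDeviceGetStreamPriorityRange(&least, &greatest);
//   hipStream_t s = nullptr;
//   // a value beyond 'greatest' is clamped to Priority::High rather than failing
//   hipStreamCreateWithPriority(&s, hipStreamDefault, greatest - 100);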
================================================================================================ hipError_t hipStreamSynchronize(hipStream_t stream) { HIP_INIT_API(hipStreamSynchronize, stream); HIP_RETURN(hipStreamSynchronize_common(stream)); } // ================================================================================================ hipError_t hipStreamSynchronize_spt(hipStream_t stream) { HIP_INIT_API(hipStreamSynchronize, stream); PER_THREAD_DEFAULT_STREAM(stream); HIP_RETURN(hipStreamSynchronize_common(stream)); } // ================================================================================================ hipError_t hipStreamDestroy(hipStream_t stream) { HIP_INIT_API(hipStreamDestroy, stream); if (stream == nullptr) { HIP_RETURN(hipErrorInvalidHandle); } if (stream == hipStreamPerThread) { HIP_RETURN(hipErrorInvalidResourceHandle); } if (!hip::isValid(stream)) { HIP_RETURN(hipErrorContextIsDestroyed); } hip::Stream* s = reinterpret_cast(stream); if (s->GetCaptureStatus() != hipStreamCaptureStatusNone) { if (s->GetParentStream() != nullptr) { reinterpret_cast(s->GetParentStream())->EraseParallelCaptureStream(stream); } auto error = s->EndCapture(); } s->GetDevice()->RemoveStreamFromPools(s); { amd::ScopedLock lock(g_captureStreamsLock); const auto& g_it = std::find(g_captureStreams.begin(), g_captureStreams.end(), s); if (g_it != g_captureStreams.end()) { g_captureStreams.erase(g_it); } } const auto& l_it = std::find(hip::tls.capture_streams_.begin(), hip::tls.capture_streams_.end(), s); if (l_it != hip::tls.capture_streams_.end()) { hip::tls.capture_streams_.erase(l_it); } hip::Stream::Destroy(s); HIP_RETURN(hipSuccess); } // ================================================================================================ void WaitThenDecrementSignal(hipStream_t stream, hipError_t status, void* user_data) { CallbackData* data = reinterpret_cast(user_data); int offset = data->previous_read_index % IPC_SIGNALS_PER_EVENT; while (data->shmem->read_index < data->previous_read_index + IPC_SIGNALS_PER_EVENT && data->shmem->signal[offset] != 0) { amd::Os::sleep(1); } delete data; } // ================================================================================================ hipError_t hipStreamWaitEvent_common(hipStream_t stream, hipEvent_t event, unsigned int flags) { ClPrint(amd::LOG_INFO, amd::LOG_API, "[hipGraph] current capture node StreamWaitEvent on stream : %p, Event %p", stream, event); hipError_t status = hipSuccess; if (event == nullptr || !hip::isValid(stream)) { return hipErrorInvalidHandle; } hip::Stream* waitStream = reinterpret_cast(stream); hip::Event* e = reinterpret_cast(event); hip::Stream* eventStream = reinterpret_cast(e->GetCaptureStream()); if (eventStream != nullptr && eventStream->IsEventCaptured(event) == true) { if (waitStream == nullptr) { return hipErrorInvalidHandle; } if (!waitStream->IsOriginStream()) { waitStream->SetCaptureGraph((eventStream)->GetCaptureGraph()); waitStream->SetCaptureId((eventStream)->GetCaptureID()); waitStream->SetCaptureMode((eventStream)->GetCaptureMode()); waitStream->SetParentStream(reinterpret_cast(eventStream)); eventStream->SetParallelCaptureStream(stream); } waitStream->AddCrossCapturedNode(e->GetNodesPrevToRecorded()); } else { if (flags != 0) { return hipErrorInvalidValue; } if ((eventStream != nullptr) && (eventStream->GetCaptureStatus() == hipStreamCaptureStatusActive)) { // If stream is capturing but event is not recorded on event's stream. 
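// A wait on an event that belongs to another, still-active capture sequence cannot be
// expressed as a graph dependency, so it is rejected as capture isolation below.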
return hipErrorStreamCaptureIsolation; } status = e->streamWait(stream, flags); } return status; } // ================================================================================================ hipError_t hipStreamWaitEvent(hipStream_t stream, hipEvent_t event, unsigned int flags) { HIP_INIT_API(hipStreamWaitEvent, stream, event, flags); HIP_RETURN(hipStreamWaitEvent_common(stream, event, flags)); } // ================================================================================================ hipError_t hipStreamWaitEvent_spt(hipStream_t stream, hipEvent_t event, unsigned int flags) { HIP_INIT_API(hipStreamWaitEvent, stream, event, flags); PER_THREAD_DEFAULT_STREAM(stream); HIP_RETURN(hipStreamWaitEvent_common(stream, event, flags)); } // ================================================================================================ hipError_t hipStreamQuery_common(hipStream_t stream) { if (!hip::isValid(stream)) { return hipErrorContextIsDestroyed; } if (stream != nullptr) { // If still capturing return error if (hip::Stream::StreamCaptureOngoing(stream) == true) { HIP_RETURN(hipErrorStreamCaptureUnsupported); } } hip::Stream* hip_stream = hip::getStream(stream); amd::Command* command = hip_stream->getLastQueuedCommand(true); if (command == nullptr) { // Nothing was submitted to the queue return hipSuccess; } amd::Event& event = command->event(); if (command->type() != 0) { event.notifyCmdQueue(); } // Check HW status of the ROCcrl event. Note: not all ROCclr modes support HW status bool ready = command->queue()->device().IsHwEventReady(event); if (!ready) { ready = (command->status() == CL_COMPLETE); } hipError_t status = ready ? hipSuccess : hipErrorNotReady; command->release(); return status; } // ================================================================================================ hipError_t hipStreamQuery(hipStream_t stream) { HIP_INIT_API(hipStreamQuery, stream); HIP_RETURN(hipStreamQuery_common(stream)); } // ================================================================================================ hipError_t hipStreamQuery_spt(hipStream_t stream) { HIP_INIT_API(hipStreamQuery, stream); PER_THREAD_DEFAULT_STREAM(stream); HIP_RETURN(hipStreamQuery_common(stream)); } hipError_t streamCallback_common(hipStream_t stream, StreamCallback* cbo, void* userData) { if (!hip::isValid(stream)) { return hipErrorContextIsDestroyed; } hip::Stream* hip_stream = hip::getStream(stream); amd::Command* last_command = hip_stream->getLastQueuedCommand(true); amd::Command::EventWaitList eventWaitList; if (last_command != nullptr) { eventWaitList.push_back(last_command); } amd::Command* command = new amd::Marker(*hip_stream, !kMarkerDisableFlush, eventWaitList); if (command == nullptr) { return hipErrorInvalidValue; } if ((cbo == nullptr) || !command->setCallback(CL_COMPLETE, ihipStreamCallback, cbo)) { command->release(); if (last_command != nullptr) { last_command->release(); } return hipErrorInvalidHandle; } command->enqueue(); // @note: don't release the command here, because it will be released after HIP callback if (last_command != nullptr) { last_command->release(); } // Extra marker is required for HW event check, which is done before the callback is finished. 
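// Design note: streamCallback_common() therefore enqueues two markers: the first one carries
// the host callback, and the second one below makes later work on the stream wait for it.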
// Add the new barrier to stall the stream, until the callback is done
eventWaitList.clear();
eventWaitList.push_back(command);
amd::Command* block_command = new amd::Marker(*hip_stream, !kMarkerDisableFlush, eventWaitList);
if (block_command == nullptr) {
  return hipErrorInvalidValue;
}
block_command->enqueue();
block_command->release();
// Release the callback marker
command->release();
// Notify the command queue about a possible waiter for the callback
block_command->notifyCmdQueue();
return hipSuccess;
}

// ================================================================================================
hipError_t hipStreamAddCallback_common(hipStream_t stream, hipStreamCallback_t callback,
                                       void* userData, unsigned int flags) {
  // flags - Reserved for future use, must be 0
  if (callback == nullptr || flags != 0) {
    return hipErrorInvalidValue;
  }
  StreamCallback* cbo = new StreamAddCallback(stream, callback, userData);
  return streamCallback_common(stream, cbo, userData);
}
// ================================================================================================
hipError_t hipStreamAddCallback(hipStream_t stream, hipStreamCallback_t callback, void* userData,
                                unsigned int flags) {
  HIP_INIT_API(hipStreamAddCallback, stream, callback, userData, flags);
  HIP_RETURN(hipStreamAddCallback_common(stream, callback, userData, flags));
}
// ================================================================================================
hipError_t hipStreamAddCallback_spt(hipStream_t stream, hipStreamCallback_t callback,
                                    void* userData, unsigned int flags) {
  HIP_INIT_API(hipStreamAddCallback, stream, callback, userData, flags);
  PER_THREAD_DEFAULT_STREAM(stream);
  HIP_RETURN(hipStreamAddCallback_common(stream, callback, userData, flags));
}
// ================================================================================================
hipError_t hipLaunchHostFunc_common(hipStream_t stream, hipHostFn_t fn, void* userData) {
  STREAM_CAPTURE(hipLaunchHostFunc, stream, fn, userData);
  if (fn == nullptr) {
    return hipErrorInvalidValue;
  }
  StreamCallback* cbo = new LaunchHostFuncCallback(fn, userData);
  return streamCallback_common(stream, cbo, userData);
}
// ================================================================================================
hipError_t hipLaunchHostFunc_spt(hipStream_t stream, hipHostFn_t fn, void* userData) {
  HIP_INIT_API(hipLaunchHostFunc, stream, fn, userData);
  PER_THREAD_DEFAULT_STREAM(stream);
  HIP_RETURN(hipLaunchHostFunc_common(stream, fn, userData));
}
// ================================================================================================
hipError_t hipLaunchHostFunc(hipStream_t stream, hipHostFn_t fn, void* userData) {
  HIP_INIT_API(hipLaunchHostFunc, stream, fn, userData);
  if (stream == nullptr && (hip::Stream::StreamCaptureOngoing(stream) == true)) {
    HIP_RETURN(hipErrorStreamCaptureImplicit);
  }
  HIP_RETURN(hipLaunchHostFunc_common(stream, fn, userData));
}
// ================================================================================================
hipError_t hipExtStreamCreateWithCUMask(hipStream_t* stream, uint32_t cuMaskSize,
                                        const uint32_t* cuMask) {
  HIP_INIT_API(hipExtStreamCreateWithCUMask, stream, cuMaskSize, cuMask);
  if (stream == nullptr) {
    HIP_RETURN(hipErrorInvalidHandle);
  }
  if (cuMaskSize == 0 || cuMask == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  const std::vector<uint32_t> cuMaskv(cuMask, cuMask + cuMaskSize);
  HIP_RETURN(ihipStreamCreate(stream, hipStreamDefault, hip::Stream::Priority::Normal, cuMaskv),
             *stream);
}
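// ------------------------------------------------------------------------------------------------
// Illustrative usage sketch (not part of the original source): restricting a stream to the first
// 32 compute units with hipExtStreamCreateWithCUMask(). Each set bit enables one CU; devices with
// more CUs take additional 32-bit words.
#if 0  // example only, kept out of the build
static void exampleCuMaskedStream() {
  uint32_t cuMask = 0xFFFFFFFFu;  // enable CUs 0..31
  hipStream_t stream = nullptr;
  if (hipExtStreamCreateWithCUMask(&stream, 1 /*cuMaskSize in words*/, &cuMask) == hipSuccess) {
    (void)hipStreamDestroy(stream);
  }
}
#endif
//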
================================================================================================ hipError_t hipStreamGetPriority_common(hipStream_t stream, int* priority) { if ((priority != nullptr) && (stream == nullptr)) { *priority = 0; return hipSuccess; } if ((priority != nullptr) && (stream != nullptr)) { if (!hip::isValid(stream)) { return hipErrorContextIsDestroyed; } *priority = static_cast(reinterpret_cast(stream)->GetPriority()); } else { return hipErrorInvalidValue; } return hipSuccess; } // ================================================================================================ hipError_t hipStreamGetPriority(hipStream_t stream, int* priority) { HIP_INIT_API(hipStreamGetPriority, stream, priority); HIP_RETURN(hipStreamGetPriority_common(stream, priority)); } // ================================================================================================ hipError_t hipStreamGetPriority_spt(hipStream_t stream, int* priority) { HIP_INIT_API(hipStreamGetPriority, stream, priority); PER_THREAD_DEFAULT_STREAM(stream); HIP_RETURN(hipStreamGetPriority_common(stream, priority)); } // ================================================================================================ hipError_t hipExtStreamGetCUMask(hipStream_t stream, uint32_t cuMaskSize, uint32_t* cuMask) { HIP_INIT_API(hipExtStreamGetCUMask, stream, cuMaskSize, cuMask); if (cuMask == nullptr) { HIP_RETURN(hipErrorInvalidValue); } int deviceId = hip::getCurrentDevice()->deviceId(); auto* deviceHandle = g_devices[deviceId]->devices()[0]; const auto& info = deviceHandle->info(); // find the minimum cuMaskSize required to present the CU mask bit-array in a patch of 32 bits // and return error if the cuMaskSize argument is less than cuMaskSizeRequired uint32_t cuMaskSizeRequired = info.maxComputeUnits_ / 32 + ((info.maxComputeUnits_ % 32) ? 
1 : 0);
if (cuMaskSize < cuMaskSizeRequired) {
  HIP_RETURN(hipErrorInvalidValue);
}
// make a default CU mask bit-array where all CUs are active
// this default mask will be returned when there is no
// custom or global CU mask defined
std::vector<uint32_t> defaultCUMask;
uint32_t temp = 0;
uint32_t bit_index = 0;
for (uint32_t i = 0; i < info.maxComputeUnits_; i++) {
  temp |= 1UL << bit_index;
  if (bit_index >= 32) {
    defaultCUMask.push_back(temp);
    temp = 0;
    bit_index = 0;
    temp |= 1UL << bit_index;
  }
  bit_index += 1;
}
if (bit_index != 0) {
  defaultCUMask.push_back(temp);
}
// if the stream is null then either return globalCUMask_ (if it is defined)
// or return defaultCUMask
if (stream == nullptr || stream == hipStreamPerThread) {
  if (info.globalCUMask_.size() != 0) {
    std::copy(info.globalCUMask_.begin(), info.globalCUMask_.end(), cuMask);
  } else {
    std::copy(defaultCUMask.begin(), defaultCUMask.end(), cuMask);
  }
} else {
  // if the stream is not null then get the stream's CU mask and return one of the below cases
  // case1: if globalCUMask_ is defined then return the AND of globalCUMask_ and stream's CU mask
  // case2: if globalCUMask_ is not defined then return the AND of defaultCUMask and stream's CU mask
  // in both cases above if stream's CU mask is empty then either globalCUMask_ (for case1)
  // or defaultCUMask (for case2) will be returned
  std::vector<uint32_t> streamCUMask;
  streamCUMask = reinterpret_cast<hip::Stream*>(stream)->GetCUMask();
  std::vector<uint32_t> mask = {};
  if (info.globalCUMask_.size() != 0) {
    for (uint32_t i = 0; i < std::min(streamCUMask.size(), info.globalCUMask_.size()); i++) {
      mask.push_back(streamCUMask[i] & info.globalCUMask_[i]);
    }
  } else {
    for (uint32_t i = 0; i < std::min(streamCUMask.size(), defaultCUMask.size()); i++) {
      mask.push_back(streamCUMask[i] & defaultCUMask[i]);
    }
    // check to make sure after ANDing streamCUMask (custom-defined) with global CU mask,
    // we have a non-zero mask, otherwise just return either globalCUMask_ or defaultCUMask
    bool zeroCUMask = true;
    for (auto m : mask) {
      if (m != 0) {
        zeroCUMask = false;
        break;
      }
    }
    if (zeroCUMask) {
      mask = (info.globalCUMask_.size() != 0) ? info.globalCUMask_ : defaultCUMask;
    }
    std::copy(mask.begin(), mask.end(), cuMask);
  }
}
HIP_RETURN(hipSuccess);
}

// ================================================================================================
hipError_t hipStreamGetDevice(hipStream_t stream, hipDevice_t* device) {
  HIP_INIT_API(hipStreamGetDevice, stream, device);
  if (device == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (!hip::isValid(stream)) {
    HIP_RETURN(hipErrorContextIsDestroyed);
  }
  if (stream == nullptr) {  // handle null stream
    // null stream is associated with current device, return the device id associated with the
    // current device
    *device = hip::getCurrentDevice()->deviceId();
  } else {
    getStreamPerThread(stream);
    *device = reinterpret_cast<hip::Stream*>(stream)->DeviceId();
  }
  HIP_RETURN(hipSuccess);
}
clr-rocm-5.7.1/hipamd/src/hip_stream_ops.cpp000066400000000000000000000114251450307266000210210ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include "hip_internal.hpp" #include "platform/command_utils.hpp" hipError_t ihipStreamOperation(hipStream_t stream, cl_command_type cmdType, void* ptr, uint64_t value, uint64_t mask, unsigned int flags, size_t sizeBytes) { size_t offset = 0; unsigned int outFlags = 0; if (ptr == nullptr) { return hipErrorInvalidValue; } if (!hip::isValid(stream)) { return hipErrorContextIsDestroyed; } amd::Memory* memory = getMemoryObject(ptr, offset); if (!memory) { return hipErrorInvalidValue; } // NOTE: 'mask' is only used in Wait operation, 'sizeBytes' is only used in Write operation // 'flags' for now used only for Wait, but in future there will usecases for Write too. if (cmdType == ROCCLR_COMMAND_STREAM_WAIT_VALUE) { // Stream Wait on AQL barrier-value type packet is only supported on SignalMemory objects if (GPU_STREAMOPS_CP_WAIT && (!(memory->getMemFlags() & ROCCLR_MEM_HSA_SIGNAL_MEMORY))) { return hipErrorInvalidValue; } switch (flags) { case hipStreamWaitValueGte: outFlags = ROCCLR_STREAM_WAIT_VALUE_GTE; break; case hipStreamWaitValueEq: outFlags = ROCCLR_STREAM_WAIT_VALUE_EQ; break; case hipStreamWaitValueAnd: outFlags = ROCCLR_STREAM_WAIT_VALUE_AND; break; case hipStreamWaitValueNor: outFlags = ROCCLR_STREAM_WAIT_VALUE_NOR; break; default: return hipErrorInvalidValue; break; } } else if (cmdType != ROCCLR_COMMAND_STREAM_WRITE_VALUE) { return hipErrorInvalidValue; } hip::Stream* hip_stream = hip::getStream(stream); amd::Command::EventWaitList waitList; amd::StreamOperationCommand* command = new amd::StreamOperationCommand(*hip_stream, cmdType, waitList, *memory->asBuffer(), value, mask, outFlags, offset, sizeBytes); if (command == nullptr) { return hipErrorOutOfMemory; } command->enqueue(); command->release(); return hipSuccess; } hipError_t hipStreamWaitValue32(hipStream_t stream, void* ptr, uint32_t value, unsigned int flags, uint32_t mask) { HIP_INIT_API(hipStreamWaitValue32, stream, ptr, value, mask, flags); // NOTE: ptr corresponds to a HSA Signal memeory which is 64 bits. // 32 bit value and mask are converted to 64-bit values. 
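// Illustrative usage sketch (an assumption, not part of the original source); 'signal_ptr'
// stands for a pointer whose backing allocation is HSA signal memory, as required by the
// GPU_STREAMOPS_CP_WAIT path in ihipStreamOperation() above:
//   hipStreamWaitValue32(stream, signal_ptr, 1 /*value*/, hipStreamWaitValueEq, 0xFFFFFFFFu);
//   hipStreamWriteValue32(stream, signal_ptr, 2 /*value*/, 0 /*flags, currently unused*/);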
HIP_RETURN_DURATION(ihipStreamOperation( stream, ROCCLR_COMMAND_STREAM_WAIT_VALUE, ptr, value, mask, flags, sizeof(uint32_t))); } hipError_t hipStreamWaitValue64(hipStream_t stream, void* ptr, uint64_t value, unsigned int flags, uint64_t mask) { HIP_INIT_API(hipStreamWaitValue64, stream, ptr, value, mask, flags); HIP_RETURN_DURATION(ihipStreamOperation( stream, ROCCLR_COMMAND_STREAM_WAIT_VALUE, ptr, value, mask, flags, sizeof(uint64_t))); } hipError_t hipStreamWriteValue32(hipStream_t stream, void* ptr, uint32_t value, unsigned int flags) { HIP_INIT_API(hipStreamWriteValue32, stream, ptr, value, flags); HIP_RETURN_DURATION(ihipStreamOperation( stream, ROCCLR_COMMAND_STREAM_WRITE_VALUE, ptr, value, 0, // mask un-used set it to 0 0, // flags un-used for now set it to 0 sizeof(uint32_t))); } hipError_t hipStreamWriteValue64(hipStream_t stream, void* ptr, uint64_t value, unsigned int flags) { HIP_INIT_API(hipStreamWriteValue64, stream, ptr, value, flags); HIP_RETURN_DURATION(ihipStreamOperation( stream, ROCCLR_COMMAND_STREAM_WRITE_VALUE, ptr, value, 0, // mask un-used set it to 0 0, // flags un-used for now set it to 0 sizeof(uint64_t))); } clr-rocm-5.7.1/hipamd/src/hip_surface.cpp000066400000000000000000000071261450307266000203000ustar00rootroot00000000000000/* Copyright (c) 2015 - 2022 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include "hip_internal.hpp" #include hipError_t ihipFree(void* ptr); struct __hip_surface { uint32_t imageSRD[HIP_IMAGE_OBJECT_SIZE_DWORD]; amd::Image* image; hipResourceDesc resDesc; __hip_surface(amd::Image* image_, const hipResourceDesc& resDesc_) : image(image_), resDesc(resDesc_) { amd::Context& context = *hip::getCurrentDevice()->asContext(); amd::Device& device = *context.devices()[0]; device::Memory* imageMem = image->getDeviceMemory(device); std::memcpy(imageSRD, imageMem->cpuSrd(), sizeof(imageSRD)); } }; hipError_t ihipCreateSurfaceObject(hipSurfaceObject_t* pSurfObject, const hipResourceDesc* pResDesc) { amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); return hipErrorNotSupported; } // Validate input params if (pSurfObject == nullptr || pResDesc == nullptr) { return hipErrorInvalidValue; } // the type of resource must be a HIP array // hipResourceDesc::res::array::array must be set to a valid HIP array handle. 
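// Surfaces can only be backed by HIP arrays; linear, pitched and mipmapped resources are
// rejected by the validation below.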
if ((pResDesc->resType != hipResourceTypeArray) || (pResDesc->res.array.array == nullptr)) { return hipErrorInvalidValue; } amd::Image* image = nullptr; cl_mem memObj = reinterpret_cast(pResDesc->res.array.array->data); if (!is_valid(memObj)) { return hipErrorInvalidValue; } image = as_amd(memObj)->asImage(); void* surfObjectBuffer = nullptr; hipError_t err = ihipMalloc(&surfObjectBuffer, sizeof(__hip_surface), CL_MEM_SVM_FINE_GRAIN_BUFFER); if (surfObjectBuffer == nullptr || err != hipSuccess) { return hipErrorOutOfMemory; } *pSurfObject = new (surfObjectBuffer) __hip_surface{image, *pResDesc}; return hipSuccess; } hipError_t hipCreateSurfaceObject(hipSurfaceObject_t* pSurfObject, const hipResourceDesc* pResDesc) { HIP_INIT_API(hipCreateSurfaceObject, pSurfObject, pResDesc); HIP_RETURN(ihipCreateSurfaceObject(pSurfObject, pResDesc)); } hipError_t ihipDestroySurfaceObject(hipSurfaceObject_t surfaceObject) { if (surfaceObject == nullptr) { return hipSuccess; } return ihipFree(surfaceObject); } hipError_t hipDestroySurfaceObject(hipSurfaceObject_t surfaceObject) { HIP_INIT_API(hipDestroySurfaceObject, surfaceObject); HIP_RETURN(ihipDestroySurfaceObject(surfaceObject)); } clr-rocm-5.7.1/hipamd/src/hip_texture.cpp000066400000000000000000001636551450307266000203620ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include "hip_internal.hpp" #include "hip_platform.hpp" #include "hip_conversions.hpp" #include "platform/sampler.hpp" hipError_t ihipFree(void* ptr); struct __hip_texture { uint32_t imageSRD[HIP_IMAGE_OBJECT_SIZE_DWORD]; uint32_t samplerSRD[HIP_SAMPLER_OBJECT_SIZE_DWORD]; amd::Image* image; amd::Sampler* sampler; hipResourceDesc resDesc; hipTextureDesc texDesc; hipResourceViewDesc resViewDesc; __hip_texture(amd::Image* image_, amd::Sampler* sampler_, const hipResourceDesc& resDesc_, const hipTextureDesc& texDesc_, const hipResourceViewDesc& resViewDesc_) : image(image_), sampler(sampler_), resDesc(resDesc_), texDesc(texDesc_), resViewDesc(resViewDesc_) { amd::Context& context = *hip::getCurrentDevice()->asContext(); amd::Device& device = *context.devices()[0]; device::Memory* imageMem = image->getDeviceMemory(device); std::memcpy(imageSRD, imageMem->cpuSrd(), sizeof(imageSRD)); device::Sampler* samplerMem = sampler->getDeviceSampler(device); std::memcpy(samplerSRD, samplerMem->hwState(), sizeof(samplerSRD)); } }; amd::Image* ihipImageCreate(const cl_channel_order channelOrder, const cl_channel_type channelType, const cl_mem_object_type imageType, const size_t imageWidth, const size_t imageHeight, const size_t imageDepth, const size_t imageArraySize, const size_t imageRowPitch, const size_t imageSlicePitch, const uint32_t numMipLevels, amd::Memory* buffer, hipError_t& status); hipError_t ihipCreateTextureObject(hipTextureObject_t* pTexObject, const hipResourceDesc* pResDesc, const hipTextureDesc* pTexDesc, const hipResourceViewDesc* pResViewDesc) { amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); return hipErrorNotSupported; } // Validate input params if (pTexObject == nullptr || pResDesc == nullptr || pTexDesc == nullptr) { return hipErrorInvalidValue; } // pResViewDesc can only be specified if the type of resource is a HIP array or a HIP mipmapped array. if ((pResViewDesc != nullptr) && ((pResDesc->resType != hipResourceTypeArray) && (pResDesc->resType != hipResourceTypeMipmappedArray))) { return hipErrorInvalidValue; } // If hipResourceDesc::resType is set to hipResourceTypeArray, if (pResDesc->resType == hipResourceTypeArray) { // hipResourceDesc::res::array::array must be set to a valid HIP array handle. if (pResDesc->res.array.array == nullptr) { return hipErrorInvalidValue; } else if (pResDesc->res.array.array->depth > 0 && pTexDesc->filterMode == hipFilterModeLinear && !strncmp(info.name_, "gfx90a", strlen("gfx90a"))) { LogPrintfInfo("%s doesn't support 3D linear filter!", info.name_); return hipErrorNotSupported; } } // If hipResourceDesc::resType is set to hipResourceTypeMipmappedArray, // hipResourceDesc::res::mipmap::mipmap must be set to a valid HIP mipmapped array handle // and hipTextureDesc::normalizedCoords must be set to true. if ((pResDesc->resType == hipResourceTypeMipmappedArray) && ((pResDesc->res.mipmap.mipmap == nullptr) || (pTexDesc->normalizedCoords == 0))) { return hipErrorInvalidValue; } // If hipResourceDesc::resType is set to hipResourceTypeLinear, // hipResourceDesc::res::linear::devPtr must be set to a valid device pointer, that is aligned to hipDeviceProp::textureAlignment. // The total number of elements in the linear address range cannot exceed hipDeviceProp::maxTexture1DLinear. 
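// Illustrative bound (not from the original source): a 1 MiB linear range of float4 texels
// holds 1048576 / 16 = 65536 elements, and it is that element count which is checked against
// the device limit below.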
if ((pResDesc->resType == hipResourceTypeLinear) &&
    ((pResDesc->res.linear.devPtr == nullptr) ||
     (!amd::isMultipleOf(pResDesc->res.linear.devPtr, info.imageBaseAddressAlignment_)) ||
     ((pResDesc->res.linear.sizeInBytes / hip::getElementSize(pResDesc->res.linear.desc)) >=
      info.imageMaxBufferSize_))) {
  return hipErrorInvalidValue;
}
// If hipResourceDesc::resType is set to hipResourceTypePitch2D,
// hipResourceDesc::res::pitch2D::devPtr must be set to a valid device pointer, that is aligned
// to hipDeviceProp::textureAlignment.
// hipResourceDesc::res::pitch2D::width and hipResourceDesc::res::pitch2D::height specify the
// width and height of the array in elements, and cannot exceed
// hipDeviceProp::maxTexture2DLinear[0] and hipDeviceProp::maxTexture2DLinear[1] respectively.
// hipResourceDesc::res::pitch2D::pitchInBytes specifies the pitch between two rows in bytes and
// has to be aligned to hipDeviceProp::texturePitchAlignment.
// Pitch cannot exceed hipDeviceProp::maxTexture2DLinear[2].
if ((pResDesc->resType == hipResourceTypePitch2D) &&
    ((pResDesc->res.pitch2D.devPtr == nullptr) ||
     (!amd::isMultipleOf(pResDesc->res.pitch2D.devPtr, info.imageBaseAddressAlignment_)) ||
     (pResDesc->res.pitch2D.width >= info.image2DMaxWidth_) ||
     (pResDesc->res.pitch2D.height >= info.image2DMaxHeight_) ||
     (!amd::isMultipleOf(pResDesc->res.pitch2D.pitchInBytes, info.imagePitchAlignment_)))) {
  // TODO check pitch limits.
  return hipErrorInvalidValue;
}
// We don't program the max_aniso_ratio field in the HW sampler SRD.
if (pTexDesc->maxAnisotropy != 0) {
  return hipErrorNotSupported;
}
// We don't program the lod_bias field in the HW sampler SRD.
if (pTexDesc->mipmapLevelBias != 0) {
  LogError("mipmapLevelBias not supported!");
  return hipErrorNotSupported;
}
// We don't program the min_lod field in the HW sampler SRD.
if (pTexDesc->minMipmapLevelClamp != 0) {
  LogInfo("minMipmapLevelClamp ignored!");
}
// We don't program the max_lod field in the HW sampler SRD.
if (pTexDesc->maxMipmapLevelClamp != 0) {
  LogInfo("maxMipmapLevelClamp ignored!");
}
// TODO ROCclr assumes all dimensions have the same addressing mode.
cl_addressing_mode addressMode = CL_ADDRESS_NONE;
// If hipTextureDesc::normalizedCoords is set to zero,
// hipAddressModeWrap and hipAddressModeMirror won't be supported
// and will be switched to hipAddressModeClamp.
if ((pTexDesc->normalizedCoords == 0) &&
    ((pTexDesc->addressMode[0] == hipAddressModeWrap) ||
     (pTexDesc->addressMode[0] == hipAddressModeMirror))) {
  addressMode = hip::getCLAddressingMode(hipAddressModeClamp);
}
// hipTextureDesc::addressMode is ignored if hipResourceDesc::resType is hipResourceTypeLinear
else if (pResDesc->resType != hipResourceTypeLinear) {
  addressMode = hip::getCLAddressingMode(pTexDesc->addressMode[0]);
}
#ifndef CL_FILTER_NONE
#define CL_FILTER_NONE 0x1142
#endif
cl_filter_mode filterMode = CL_FILTER_NONE;
cl_filter_mode mipFilterMode = CL_FILTER_NONE;
#undef CL_FILTER_NONE
// hipTextureDesc::filterMode is ignored if hipResourceDesc::resType is hipResourceTypeLinear.
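// Filter selection: point sampling (CL_FILTER_NONE) is kept for linear resources; otherwise
// the requested filter mode, and for mipmapped arrays the mip filter mode, are translated.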
if (pResDesc->resType != hipResourceTypeLinear) { filterMode = hip::getCLFilterMode(pTexDesc->filterMode); } if (pResDesc->resType == hipResourceTypeMipmappedArray) { mipFilterMode = hip::getCLFilterMode(pTexDesc->mipmapFilterMode); } amd::Sampler* sampler = new amd::Sampler(*hip::getCurrentDevice()->asContext(), pTexDesc->normalizedCoords, addressMode, filterMode, mipFilterMode, pTexDesc->minMipmapLevelClamp, pTexDesc->maxMipmapLevelClamp); if (sampler == nullptr) { return hipErrorOutOfMemory; } if (!sampler->create()) { delete sampler; return hipErrorOutOfMemory; } amd::Image* image = nullptr; switch (pResDesc->resType) { case hipResourceTypeArray: { cl_mem memObj = reinterpret_cast(pResDesc->res.array.array->data); if (!is_valid(memObj)) { return hipErrorInvalidValue; } image = as_amd(memObj)->asImage(); hipTextureReadMode readMode = pTexDesc->readMode; // 32-bit integer format will not be promoted, regardless of whether or not // this hipTextureDesc::readMode is set hipReadModeNormalizedFloat is specified. if ((pResDesc->res.array.array->Format == HIP_AD_FORMAT_SIGNED_INT32) || (pResDesc->res.array.array->Format == HIP_AD_FORMAT_UNSIGNED_INT32)) { readMode = hipReadModeElementType; } // We need to create an image view if the user requested to use normalized pixel values, // due to already having the image created with a different format. if ((pResViewDesc != nullptr) || (readMode == hipReadModeNormalizedFloat) || (pTexDesc->sRGB == 1)) { // TODO ROCclr currently right now can only change the format of the image. const cl_channel_order channelOrder = (pResViewDesc != nullptr) ? hip::getCLChannelOrder(hip::getNumChannels(pResViewDesc->format), pTexDesc->sRGB) : hip::getCLChannelOrder(pResDesc->res.array.array->NumChannels, pTexDesc->sRGB); const cl_channel_type channelType = (pResViewDesc != nullptr) ? hip::getCLChannelType(hip::getArrayFormat(pResViewDesc->format), readMode) : hip::getCLChannelType(pResDesc->res.array.array->Format, readMode); const amd::Image::Format imageFormat(cl_image_format{channelOrder, channelType}); if (!imageFormat.isValid()) { return hipErrorInvalidValue; } image = image->createView(*hip::getCurrentDevice()->asContext(), imageFormat, nullptr); if (image == nullptr) { return hipErrorInvalidValue; } } else if (image->parent()) { image->retain(); // Because it will be released as a view in ihipDestroyTextureObject() } break; } case hipResourceTypeMipmappedArray: { cl_mem memObj = reinterpret_cast(pResDesc->res.array.array->data); if (!is_valid(memObj)) { return hipErrorInvalidValue; } image = as_amd(memObj)->asImage(); hipTextureReadMode readMode = pTexDesc->readMode; // 32-bit integer format will not be promoted, regardless of whether or not // this hipTextureDesc::readMode is set hipReadModeNormalizedFloat is specified. if ((pResDesc->res.array.array->Format == HIP_AD_FORMAT_SIGNED_INT32) || (pResDesc->res.array.array->Format == HIP_AD_FORMAT_UNSIGNED_INT32)) { readMode = hipReadModeElementType; } // We need to create an image view if the user requested to use normalized pixel values, // due to already having the image created with a different format. if ((pResViewDesc != nullptr) || (readMode == hipReadModeNormalizedFloat) || (pTexDesc->sRGB == 1)) { // TODO ROCclr currently right now can only change the format of the image. const cl_channel_order channelOrder = (pResViewDesc != nullptr) ? 
hip::getCLChannelOrder(hip::getNumChannels(pResViewDesc->format), pTexDesc->sRGB) : hip::getCLChannelOrder(pResDesc->res.mipmap.mipmap->num_channels, pTexDesc->sRGB); const cl_channel_type channelType = (pResViewDesc != nullptr) ? hip::getCLChannelType(hip::getArrayFormat(pResViewDesc->format), readMode) : hip::getCLChannelType(pResDesc->res.mipmap.mipmap->format, readMode); const amd::Image::Format imageFormat(cl_image_format{channelOrder, channelType}); if (!imageFormat.isValid()) { return hipErrorInvalidValue; } image = image->createView(*hip::getCurrentDevice()->asContext(), imageFormat, nullptr, 0, 0, true); if (image == nullptr) { return hipErrorInvalidValue; } } break; } case hipResourceTypeLinear: { const cl_channel_order channelOrder = hip::getCLChannelOrder(hip::getNumChannels(pResDesc->res.linear.desc), pTexDesc->sRGB); const cl_channel_type channelType = hip::getCLChannelType(hip::getArrayFormat(pResDesc->res.linear.desc), pTexDesc->readMode); const amd::Image::Format imageFormat({channelOrder, channelType}); const cl_mem_object_type imageType = hip::getCLMemObjectType(pResDesc->resType); const size_t imageSizeInBytes = pResDesc->res.linear.sizeInBytes; amd::Memory* buffer = getMemoryObjectWithOffset(pResDesc->res.linear.devPtr, imageSizeInBytes); hipError_t status = hipSuccess; image = ihipImageCreate(channelOrder, channelType, imageType, imageSizeInBytes / imageFormat.getElementSize(), /* imageWidth */ 0, /* imageHeight */ 0, /* imageDepth */ 0, /* imageArraySize */ 0, /* imageRowPitch */ 0, /* imageSlicePitch */ 0, /* numMipLevels */ buffer, status); buffer->release(); if (image == nullptr) { return status; } break; } case hipResourceTypePitch2D: { const cl_channel_order channelOrder = hip::getCLChannelOrder(hip::getNumChannels(pResDesc->res.pitch2D.desc), pTexDesc->sRGB); const cl_channel_type channelType = hip::getCLChannelType(hip::getArrayFormat(pResDesc->res.pitch2D.desc), pTexDesc->readMode); const amd::Image::Format imageFormat({channelOrder, channelType}); const cl_mem_object_type imageType = hip::getCLMemObjectType(pResDesc->resType); const size_t imageSizeInBytes = pResDesc->res.pitch2D.width * imageFormat.getElementSize() + pResDesc->res.pitch2D.pitchInBytes * (pResDesc->res.pitch2D.height - 1); amd::Memory* buffer = getMemoryObjectWithOffset(pResDesc->res.pitch2D.devPtr, imageSizeInBytes); hipError_t status = hipSuccess; image = ihipImageCreate(channelOrder, channelType, imageType, pResDesc->res.pitch2D.width, /* imageWidth */ pResDesc->res.pitch2D.height, /* imageHeight */ 0, /* imageDepth */ 0, /* imageArraySize */ pResDesc->res.pitch2D.pitchInBytes, /* imageRowPitch */ 0, /* imageSlicePitch */ 0, /* numMipLevels */ buffer, status); if (buffer != nullptr) { buffer->release(); } if (image == nullptr) { return status; } break; } } void *texObjectBuffer = nullptr; hipError_t err = ihipMalloc(&texObjectBuffer, sizeof(__hip_texture), CL_MEM_SVM_FINE_GRAIN_BUFFER); if (texObjectBuffer == nullptr || err != hipSuccess) { return hipErrorOutOfMemory; } *pTexObject = new (texObjectBuffer) __hip_texture{image, sampler, *pResDesc, *pTexDesc, (pResViewDesc != nullptr) ? 
*pResViewDesc : hipResourceViewDesc{}}; return hipSuccess; } hipError_t hipCreateTextureObject(hipTextureObject_t* pTexObject, const hipResourceDesc* pResDesc, const hipTextureDesc* pTexDesc, const hipResourceViewDesc* pResViewDesc) { HIP_INIT_API(hipCreateTextureObject, pTexObject, pResDesc, pTexDesc, pResViewDesc); HIP_RETURN(ihipCreateTextureObject(pTexObject, pResDesc, pTexDesc, pResViewDesc)); } hipError_t ihipDestroyTextureObject(hipTextureObject_t texObject) { if (texObject == nullptr) { return hipSuccess; } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); return hipErrorNotSupported; } const hipResourceType type = texObject->resDesc.resType; const bool isImageFromBuffer = (type == hipResourceTypeLinear) || (type == hipResourceTypePitch2D); const bool isImageView = ((type == hipResourceTypeArray) || (type == hipResourceTypeMipmappedArray)) && texObject->image->parent() != nullptr; // If the texture object was created from an array, then the array owns the image SRD. // Otherwise, if the texture object is a view, or was created from a buffer, then it owns the image SRD. if (isImageFromBuffer || isImageView) { texObject->image->release(); } // The texture object always owns the sampler SRD. texObject->sampler->release(); // TODO Should call ihipFree() to not polute the api trace. return ihipFree(texObject); } hipError_t ihipUnbindTexture(textureReference* texRef) { hipError_t hip_error = hipSuccess; do { if (texRef == nullptr) { hip_error = hipErrorInvalidValue; break; } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); HIP_RETURN(hipErrorNotSupported); } hip_error = ihipDestroyTextureObject(texRef->textureObject); if (hip_error != hipSuccess) { break; } const_cast(texRef)->textureObject = nullptr; } while (0); return hip_error; } hipError_t hipDestroyTextureObject(hipTextureObject_t texObject) { HIP_INIT_API(hipDestroyTextureObject, texObject); HIP_RETURN(ihipDestroyTextureObject(texObject)); } hipError_t ihipGetTextureObjectResourceDesc(hipResourceDesc* pResDesc, hipTextureObject_t texObject) { if ((pResDesc == nullptr) || (texObject == nullptr)) { return hipErrorInvalidValue; } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); return hipErrorNotSupported; } *pResDesc = texObject->resDesc; return hipSuccess; } hipError_t hipGetTextureObjectResourceDesc(hipResourceDesc* pResDesc, hipTextureObject_t texObject) { HIP_INIT_API(hipGetTextureObjectResourceDesc, pResDesc, texObject); HIP_RETURN(ihipGetTextureObjectResourceDesc(pResDesc, texObject)); } hipError_t hipGetTextureObjectResourceViewDesc(hipResourceViewDesc* pResViewDesc, hipTextureObject_t texObject) { HIP_INIT_API(hipGetTextureObjectResourceViewDesc, pResViewDesc, texObject); if ((pResViewDesc == nullptr) || (texObject == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); HIP_RETURN(hipErrorNotSupported); } *pResViewDesc = texObject->resViewDesc; HIP_RETURN(hipSuccess); } hipError_t 
hipGetTextureObjectTextureDesc(hipTextureDesc* pTexDesc, hipTextureObject_t texObject) {
  HIP_INIT_API(hipGetTextureObjectTextureDesc, pTexDesc, texObject);
  if ((pTexDesc == nullptr) || (texObject == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  *pTexDesc = texObject->texDesc;
  HIP_RETURN(hipSuccess);
}

inline hipError_t ihipGetTextureAlignmentOffset(size_t* offset, const void* devPtr) {
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    return hipErrorNotSupported;
  }
  const char* alignedDevPtr =
      amd::alignUp(static_cast<const char*>(devPtr), info.imageBaseAddressAlignment_);
  const size_t alignedOffset = alignedDevPtr - static_cast<const char*>(devPtr);
  // If the device memory pointer was returned from hipMalloc(),
  // the offset is guaranteed to be 0 and NULL may be passed as the offset parameter.
  if ((alignedOffset != 0) && (offset == nullptr)) {
    LogPrintfError("Texture object not aligned with offset %u \n", alignedOffset);
    return hipErrorInvalidValue;
  }
  if (offset != nullptr) {
    *offset = alignedOffset;
  }
  return hipSuccess;
}

hipError_t ihipBindTexture(size_t* offset, const textureReference* texref, const void* devPtr,
                           const hipChannelFormatDesc* desc, size_t size) {
  if ((texref == nullptr) || (devPtr == nullptr) || (desc == nullptr)) {
    return hipErrorInvalidValue;
  }
  // Any previous address or HIP array state associated with the texture reference is superseded
  // by this function. Any memory previously bound to hTexRef is unbound.
  // No need to check for errors.
  hipError_t err = ihipDestroyTextureObject(texref->textureObject);
  if (err != hipSuccess) {
    return err;
  }
  hipResourceDesc resDesc = {};
  resDesc.resType = hipResourceTypeLinear;
  resDesc.res.linear.devPtr = const_cast<void*>(devPtr);
  resDesc.res.linear.desc = *desc;
  resDesc.res.linear.sizeInBytes = size;
  err = ihipGetTextureAlignmentOffset(offset, devPtr);
  if (err != hipSuccess) {
    return err;
  }
  // Align the user ptr to HW requirements.
  resDesc.res.linear.devPtr = static_cast<char*>(const_cast<void*>(devPtr)) - *offset;
  hipTextureDesc texDesc = hip::getTextureDesc(texref);
  return ihipCreateTextureObject(const_cast<hipTextureObject_t*>(&texref->textureObject), &resDesc,
                                 &texDesc, nullptr);
}

hipError_t ihipBindTexture2D(size_t* offset, const textureReference* texref, const void* devPtr,
                             const hipChannelFormatDesc* desc, size_t width, size_t height,
                             size_t pitch) {
  if ((texref == nullptr) || (devPtr == nullptr) || (desc == nullptr)) {
    return hipErrorInvalidValue;
  }
  // Any previous address or HIP array state associated with the texture reference is superseded
  // by this function. Any memory previously bound to hTexRef is unbound.
  // No need to check for errors.
  hipError_t err = ihipDestroyTextureObject(texref->textureObject);
  if (err != hipSuccess) {
    return err;
  }
  hipResourceDesc resDesc = {};
  resDesc.resType = hipResourceTypePitch2D;
  resDesc.res.pitch2D.devPtr = const_cast<void*>(devPtr);
  resDesc.res.pitch2D.desc = *desc;
  resDesc.res.pitch2D.width = width;
  resDesc.res.pitch2D.height = height;
  resDesc.res.pitch2D.pitchInBytes = pitch;
  err = ihipGetTextureAlignmentOffset(offset, devPtr);
  if (err != hipSuccess) {
    return err;
  }
  // Align the user ptr to HW requirements.
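// The base pointer is rounded down below; the remainder was already reported to the caller
// through *offset and must be added back when addressing the texture.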
resDesc.res.pitch2D.devPtr = static_cast(const_cast(devPtr)) - *offset; hipTextureDesc texDesc = hip::getTextureDesc(texref); return ihipCreateTextureObject(const_cast(&texref->textureObject), &resDesc, &texDesc, nullptr); } hipError_t hipBindTexture2D(size_t* offset, const textureReference* texref, const void* devPtr, const hipChannelFormatDesc* desc, size_t width, size_t height, size_t pitch) { HIP_INIT_API(hipBindTexture2D, offset, texref, devPtr, desc, width, height, pitch); hipDeviceptr_t refDevPtr = nullptr; size_t refDevSize = 0; HIP_RETURN_ONFAIL(PlatformState::instance().getStatGlobalVar(texref, ihipGetDevice(), &refDevPtr, &refDevSize)); assert(refDevSize == sizeof(textureReference)); hipError_t err = ihipBindTexture2D(offset, texref, devPtr, desc, width, height, pitch); if (err != hipSuccess) { HIP_RETURN(err); } // Copy to device. hip::Stream* stream = hip::getNullStream(); HIP_RETURN(ihipMemcpy(refDevPtr, texref, refDevSize, hipMemcpyHostToDevice, *stream)); } hipError_t ihipBindTextureToArray(const textureReference* texref, hipArray_const_t array, const hipChannelFormatDesc* desc) { if ((texref == nullptr) || (array == nullptr) || (desc == nullptr)) { return hipErrorInvalidValue; } // Any previous address or HIP array state associated with the texture reference is superseded by this function. // Any memory previously bound to hTexRef is unbound. // No need to check for errors. hipError_t err = ihipDestroyTextureObject(texref->textureObject); if (err != hipSuccess) { return err; } hipResourceDesc resDesc = {}; resDesc.resType = hipResourceTypeArray; resDesc.res.array.array = const_cast(array); hipTextureDesc texDesc = hip::getTextureDesc(texref); hipResourceViewFormat format = hip::getResourceViewFormat(*desc); hipResourceViewDesc resViewDesc = hip::getResourceViewDesc(array, format); return ihipCreateTextureObject(const_cast(&texref->textureObject), &resDesc, &texDesc, &resViewDesc); } hipError_t hipBindTextureToArray(const textureReference* texref, hipArray_const_t array, const hipChannelFormatDesc* desc) { HIP_INIT_API(hipBindTextureToArray, texref, array, desc); hipDeviceptr_t refDevPtr = nullptr; size_t refDevSize = 0; HIP_RETURN_ONFAIL(PlatformState::instance().getStatGlobalVar(texref, ihipGetDevice(), &refDevPtr, &refDevSize)); assert(refDevSize == sizeof(textureReference)); hipError_t err = ihipBindTextureToArray(texref, array, desc); if (err != hipSuccess) { HIP_RETURN(err); } // Copy to device. hip::Stream* stream = hip::getNullStream(); HIP_RETURN(ihipMemcpy(refDevPtr, texref, refDevSize, hipMemcpyHostToDevice, *stream)); } hipError_t ihipBindTextureToMipmappedArray(const textureReference* texref, hipMipmappedArray_const_t mipmappedArray, const hipChannelFormatDesc* desc) { if ((texref == nullptr) || (mipmappedArray == nullptr) || (desc == nullptr)) { return hipErrorInvalidValue; } // Any previous address or HIP array state associated with the texture reference is superseded by this function. // Any memory previously bound to hTexRef is unbound. // No need to check for errors. 
hipError_t err = ihipDestroyTextureObject(texref->textureObject); if (err != hipSuccess) { return err; } hipResourceDesc resDesc = {}; resDesc.resType = hipResourceTypeMipmappedArray; resDesc.res.mipmap.mipmap = const_cast(mipmappedArray); hipTextureDesc texDesc = hip::getTextureDesc(texref); hipResourceViewFormat format = hip::getResourceViewFormat(*desc); hipResourceViewDesc resViewDesc = hip::getResourceViewDesc(mipmappedArray, format); return ihipCreateTextureObject(const_cast(&texref->textureObject), &resDesc, &texDesc, &resViewDesc); } hipError_t hipBindTextureToMipmappedArray(const textureReference* texref, hipMipmappedArray_const_t mipmappedArray, const hipChannelFormatDesc* desc) { HIP_INIT_API(hipBindTextureToMipmappedArray, texref, mipmappedArray, desc); hipDeviceptr_t refDevPtr = nullptr; size_t refDevSize = 0; HIP_RETURN_ONFAIL(PlatformState::instance().getStatGlobalVar(texref, ihipGetDevice(), &refDevPtr, &refDevSize)); assert(refDevSize == sizeof(textureReference)); hipError_t err = ihipBindTextureToMipmappedArray(texref, mipmappedArray, desc); if (err != hipSuccess) { HIP_RETURN(err); } // Copy to device. hip::Stream* stream = hip::getNullStream(); HIP_RETURN(ihipMemcpy(refDevPtr, texref, refDevSize, hipMemcpyHostToDevice, *stream)); } hipError_t hipUnbindTexture(const textureReference* texref) { HIP_INIT_API(hipUnbindTexture, texref); HIP_RETURN(ihipUnbindTexture(const_cast(texref))); } hipError_t hipBindTexture(size_t* offset, const textureReference* texref, const void* devPtr, const hipChannelFormatDesc* desc, size_t size) { HIP_INIT_API(hipBindTexture, offset, texref, devPtr, desc, size); hipDeviceptr_t refDevPtr = nullptr; size_t refDevSize = 0; HIP_RETURN_ONFAIL(PlatformState::instance().getStatGlobalVar(texref, ihipGetDevice(), &refDevPtr, &refDevSize)); assert(refDevSize == sizeof(textureReference)); hipError_t err = ihipBindTexture(offset, texref, devPtr, desc, size); if (err != hipSuccess) { HIP_RETURN(err); } // Copy to device. hip::Stream* stream = hip::getNullStream(); HIP_RETURN(ihipMemcpy(refDevPtr, texref, refDevSize, hipMemcpyHostToDevice, *stream)); } hipError_t hipGetChannelDesc(hipChannelFormatDesc* desc, hipArray_const_t array) { HIP_INIT_API(hipGetChannelDesc, desc, array); if ((desc == nullptr) || (array == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); HIP_RETURN(hipErrorNotSupported); } // It is UB to call hipGetChannelDesc() on an array created via hipArrayCreate()/hipArray3DCreate(). // This is due to hip not differentiating between runtime and driver types. *desc = array->desc; HIP_RETURN(hipSuccess); } hipError_t hipGetTextureAlignmentOffset(size_t* offset, const textureReference* texref) { HIP_INIT_API(hipGetTextureAlignmentOffset, offset, texref); if ((offset == nullptr) || (texref == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); HIP_RETURN(hipErrorNotSupported); } // TODO enforce alignment on devPtr. 
*offset = 0; HIP_RETURN(hipSuccess); } hipError_t hipGetTextureReference(const textureReference** texref, const void* symbol) { HIP_INIT_API(hipGetTextureReference, texref, symbol); if (texref == nullptr) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); HIP_RETURN(hipErrorNotSupported); } *texref = reinterpret_cast(symbol); HIP_RETURN(hipSuccess); } hipError_t hipTexRefSetFormat(textureReference* texRef, hipArray_Format fmt, int NumPackedComponents) { HIP_INIT_API(hipTexRefSetFormat, texRef, fmt, NumPackedComponents); if (texRef == nullptr) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); HIP_RETURN(hipErrorNotSupported); } texRef->format = fmt; texRef->numChannels = NumPackedComponents; HIP_RETURN(hipSuccess); } hipError_t hipTexRefSetFlags(textureReference* texRef, unsigned int Flags) { HIP_INIT_API(hipTexRefSetFlags, texRef, Flags); if (texRef == nullptr) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); HIP_RETURN(hipErrorNotSupported); } texRef->readMode = hipReadModeNormalizedFloat; texRef->normalized = 0; texRef->sRGB = 0; if (Flags & HIP_TRSF_READ_AS_INTEGER) { texRef->readMode = hipReadModeElementType; } if (Flags & HIP_TRSF_NORMALIZED_COORDINATES) { texRef->normalized = 1; } if (Flags & HIP_TRSF_SRGB) { texRef->sRGB = 1; } HIP_RETURN(hipSuccess); } hipError_t hipTexRefSetFilterMode(textureReference* texRef, hipTextureFilterMode fm) { HIP_INIT_API(hipTexRefSetFilterMode, texRef, fm); if (texRef == nullptr) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); HIP_RETURN(hipErrorNotSupported); } texRef->filterMode = fm; HIP_RETURN(hipSuccess); } hipError_t hipTexRefGetAddressMode(hipTextureAddressMode* pam, const textureReference* texRef, int dim) { // TODO overload operator<<(ostream&, textureReference&). HIP_INIT_API(hipTexRefGetAddressMode, pam, texRef, dim); if ((pam == nullptr) || (texRef == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); HIP_RETURN(hipErrorNotSupported); } // Currently, the only valid value for dim are 0 and 1. 
if ((dim != 0) && (dim != 1)) { LogPrintfError( "Currently only 2 dimensions (0,1) are valid," "dim : %d \n", dim); HIP_RETURN(hipErrorInvalidValue); } *pam = texRef->addressMode[dim]; HIP_RETURN(hipSuccess); } hipError_t hipTexRefSetAddressMode(textureReference* texRef, int dim, hipTextureAddressMode am) { HIP_INIT_API(hipTexRefSetAddressMode, texRef, dim, am); if (texRef == nullptr) { HIP_RETURN(hipErrorInvalidValue); } if ((dim < 0) || (dim > 2)) { LogPrintfError( "Currently only 3 dimensions (0,1,2) are valid," "dim : %d \n", dim); HIP_RETURN(hipErrorInvalidValue); } amd::Device* device = hip::getCurrentDevice()->devices()[0]; const device::Info& info = device->info(); if (!info.imageSupport_) { LogPrintfError("Texture not supported on the device %s", info.name_); HIP_RETURN(hipErrorNotSupported); } texRef->addressMode[dim] = am; HIP_RETURN(hipSuccess); } hipError_t hipTexRefGetArray(hipArray_t* pArray, const textureReference* texRef) { // TODO overload operator<<(ostream&, textureReference&). HIP_INIT_API(hipTexRefGetArray, pArray, texRef); if ((pArray == nullptr) || (texRef == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } hipResourceDesc resDesc = {}; // TODO use ihipGetTextureObjectResourceDesc() to not pollute the API trace. hipError_t error = ihipGetTextureObjectResourceDesc(&resDesc, texRef->textureObject); if (error != hipSuccess) { HIP_RETURN(error); } switch (resDesc.resType) { case hipResourceTypeLinear: case hipResourceTypePitch2D: case hipResourceTypeMipmappedArray: { HIP_RETURN(hipErrorInvalidValue); } case hipResourceTypeArray: *pArray = resDesc.res.array.array; break; } HIP_RETURN(hipSuccess); } hipError_t hipTexRefSetArray(textureReference* texRef, hipArray_const_t array, unsigned int flags) { HIP_INIT_API(hipTexRefSetArray, texRef, array, flags); if ((texRef == nullptr) || (array == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } if (flags != HIP_TRSA_OVERRIDE_FORMAT) { HIP_RETURN(hipErrorInvalidValue); } hipDeviceptr_t refDevPtr = nullptr; size_t refDevSize = 0; HIP_RETURN_ONFAIL(PlatformState::instance().getDynTexGlobalVar(texRef, &refDevPtr, &refDevSize)); assert(refDevSize == sizeof(textureReference)); // Any previous address or HIP array state associated with the texture reference is superseded by this function. // Any memory previously bound to hTexRef is unbound. // No need to check for errors. hipError_t err = ihipDestroyTextureObject(texRef->textureObject); if (err != hipSuccess) { HIP_RETURN(err); } hipResourceDesc resDesc = {}; resDesc.resType = hipResourceTypeArray; resDesc.res.array.array = const_cast(array); hipTextureDesc texDesc = hip::getTextureDesc(texRef); hipResourceViewFormat format = hip::getResourceViewFormat(hip::getChannelFormatDesc(texRef->numChannels, texRef->format)); hipResourceViewDesc resViewDesc = hip::getResourceViewDesc(array, format); err = ihipCreateTextureObject(&texRef->textureObject, &resDesc, &texDesc, &resViewDesc); if (err != hipSuccess) { HIP_RETURN(err); } // Copy to device. hip::Stream* stream = hip::getNullStream(); HIP_RETURN(ihipMemcpy(refDevPtr, texRef, refDevSize, hipMemcpyHostToDevice, *stream)); } hipError_t hipTexRefGetAddress(hipDeviceptr_t* dptr, const textureReference* texRef) { // TODO overload operator<<(ostream&, textureReference&). HIP_INIT_API(hipTexRefGetAddress, dptr, texRef); if ((dptr == nullptr) || (texRef == nullptr)) { HIP_RETURN(hipErrorInvalidValue); } hipResourceDesc resDesc = {}; // TODO use ihipGetTextureObjectResourceDesc() to not pollute the API trace. 
  hipError_t error = ihipGetTextureObjectResourceDesc(&resDesc, texRef->textureObject);
  if (error != hipSuccess) {
    LogPrintfError("hipGetTextureObjectResourceDesc failed with error code: %s \n",
                   ihipGetErrorName(error));
    HIP_RETURN(error);
  }
  switch (resDesc.resType) {
    // Need to verify.
    // If the texture reference is not bound to any device memory range,
    // return hipErrorInvalidValue.
    case hipResourceTypeArray:
    case hipResourceTypeMipmappedArray: {
      HIP_RETURN(hipErrorInvalidValue);
    }
    case hipResourceTypeLinear:
      *dptr = resDesc.res.linear.devPtr;
      break;
    case hipResourceTypePitch2D:
      *dptr = resDesc.res.pitch2D.devPtr;
      break;
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefSetAddress(size_t* ByteOffset, textureReference* texRef, hipDeviceptr_t dptr,
                               size_t bytes) {
  HIP_INIT_API(hipTexRefSetAddress, ByteOffset, texRef, dptr, bytes);
  if (texRef == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipDeviceptr_t refDevPtr = nullptr;
  size_t refDevSize = 0;
  HIP_RETURN_ONFAIL(PlatformState::instance().getDynTexGlobalVar(texRef, &refDevPtr, &refDevSize));
  assert(refDevSize == sizeof(textureReference));
  // Any previous address or HIP array state associated with the texture reference
  // is superseded by this function. Any memory previously bound to hTexRef is unbound.
  // No need to check for errors.
  hipError_t err = ihipDestroyTextureObject(texRef->textureObject);
  if (err != hipSuccess) {
    HIP_RETURN(err);
  }
  hipResourceDesc resDesc = {};
  resDesc.resType = hipResourceTypeLinear;
  resDesc.res.linear.devPtr = dptr;
  resDesc.res.linear.desc = hip::getChannelFormatDesc(texRef->numChannels, texRef->format);
  resDesc.res.linear.sizeInBytes = bytes;
  err = ihipGetTextureAlignmentOffset(ByteOffset, dptr);
  if (err != hipSuccess) {
    HIP_RETURN(err);
  }
  // Align the user ptr to HW requirements.
  resDesc.res.linear.devPtr = static_cast<char*>(dptr) - *ByteOffset;
  hipTextureDesc texDesc = hip::getTextureDesc(texRef);
  err = ihipCreateTextureObject(&texRef->textureObject, &resDesc, &texDesc, nullptr);
  if (err != hipSuccess) {
    HIP_RETURN(err);
  }
  // Copy to device.
  hip::Stream* stream = hip::getNullStream();
  HIP_RETURN(ihipMemcpy(refDevPtr, texRef, refDevSize, hipMemcpyHostToDevice, *stream));
}

hipError_t hipTexRefSetAddress2D(textureReference* texRef, const HIP_ARRAY_DESCRIPTOR* desc,
                                 hipDeviceptr_t dptr, size_t Pitch) {
  HIP_INIT_API(hipTexRefSetAddress2D, texRef, desc, dptr, Pitch);
  if ((texRef == nullptr) || (desc == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipDeviceptr_t refDevPtr = nullptr;
  size_t refDevSize = 0;
  HIP_RETURN_ONFAIL(PlatformState::instance().getDynTexGlobalVar(texRef, &refDevPtr, &refDevSize));
  assert(refDevSize == sizeof(textureReference));
  // Any previous address or HIP array state associated with the texture reference
  // is superseded by this function. Any memory previously bound to hTexRef is unbound.
  // No need to check for errors.
  hipError_t err = ihipDestroyTextureObject(texRef->textureObject);
  if (err != hipSuccess) {
    HIP_RETURN(err);
  }
  hipResourceDesc resDesc = {};
  resDesc.resType = hipResourceTypePitch2D;
  resDesc.res.pitch2D.devPtr = dptr;
  resDesc.res.pitch2D.desc = hip::getChannelFormatDesc(desc->NumChannels, desc->Format);
  resDesc.res.pitch2D.width = desc->Width;
  resDesc.res.pitch2D.height = desc->Height;
  resDesc.res.pitch2D.pitchInBytes = Pitch;
  hipTextureDesc texDesc = hip::getTextureDesc(texRef);
  err = ihipCreateTextureObject(&texRef->textureObject, &resDesc, &texDesc, nullptr);
  if (err != hipSuccess) {
    HIP_RETURN(err);
  }
  // Copy to device.
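  // The host-side textureReference now owns a fresh texture object; mirror it
  // into the device-side global backing this texref (refDevPtr) so kernels that
  // reference the texref observe the new binding.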
  hip::Stream* stream = hip::getNullStream();
  HIP_RETURN(ihipMemcpy(refDevPtr, texRef, refDevSize, hipMemcpyHostToDevice, *stream));
}

hipChannelFormatDesc hipCreateChannelDesc(int x, int y, int z, int w, hipChannelFormatKind f) {
  return {x, y, z, w, f};
}

hipError_t hipTexRefGetBorderColor(float* pBorderColor, const textureReference* texRef) {
  // TODO overload operator<<(ostream&, textureReference&).
  HIP_INIT_API(hipTexRefGetBorderColor, pBorderColor, texRef);
  if ((pBorderColor == nullptr) || (texRef == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  // TODO add textureReference::borderColor.
  assert(false && "textureReference::borderColor is missing in header");
  // std::memcpy(pBorderColor, texRef.borderColor, sizeof(texRef.borderColor));
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefGetFilterMode(hipTextureFilterMode* pfm, const textureReference* texRef) {
  // TODO overload operator<<(ostream&, textureReference&).
  HIP_INIT_API(hipTexRefGetFilterMode, pfm, texRef);
  if ((pfm == nullptr) || (texRef == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  *pfm = texRef->filterMode;
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefGetFlags(unsigned int* pFlags, const textureReference* texRef) {
  // TODO overload operator<<(ostream&, textureReference&).
  HIP_INIT_API(hipTexRefGetFlags, pFlags, texRef);
  if ((pFlags == nullptr) || (texRef == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  *pFlags = 0;
  if (texRef->readMode == hipReadModeElementType) {
    *pFlags |= HIP_TRSF_READ_AS_INTEGER;
  }
  if (texRef->normalized == 1) {
    *pFlags |= HIP_TRSF_NORMALIZED_COORDINATES;
  }
  if (texRef->sRGB == 1) {
    *pFlags |= HIP_TRSF_SRGB;
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefGetFormat(hipArray_Format* pFormat, int* pNumChannels,
                              const textureReference* texRef) {
  // TODO overload operator<<(ostream&, textureReference&).
  HIP_INIT_API(hipTexRefGetFormat, pFormat, pNumChannels, texRef);
  if ((pFormat == nullptr) || (pNumChannels == nullptr) || (texRef == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  *pFormat = texRef->format;
  *pNumChannels = texRef->numChannels;
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefGetMaxAnisotropy(int* pmaxAnsio, const textureReference* texRef) {
  // TODO overload operator<<(ostream&, textureReference&).
  HIP_INIT_API(hipTexRefGetMaxAnisotropy, pmaxAnsio, texRef);
  if ((pmaxAnsio == nullptr) || (texRef == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  *pmaxAnsio = texRef->maxAnisotropy;
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefGetMipmapFilterMode(hipTextureFilterMode* pfm,
                                        const textureReference* texRef) {
  // TODO overload operator<<(ostream&, textureReference&).
  HIP_INIT_API(hipTexRefGetMipmapFilterMode, pfm, texRef);
  if ((pfm == nullptr) || (texRef == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  *pfm = texRef->mipmapFilterMode;
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefGetMipmapLevelBias(float* pbias, const textureReference* texRef) {
  // TODO overload operator<<(ostream&, textureReference&).
  HIP_INIT_API(hipTexRefGetMipmapLevelBias, pbias, texRef);
  if ((pbias == nullptr) || (texRef == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  *pbias = texRef->mipmapLevelBias;
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefGetMipmapLevelClamp(float* pminMipmapLevelClamp, float* pmaxMipmapLevelClamp,
                                        const textureReference* texRef) {
  // TODO overload operator<<(ostream&, textureReference&).
  HIP_INIT_API(hipTexRefGetMipmapLevelClamp, pminMipmapLevelClamp, pmaxMipmapLevelClamp, texRef);
  if ((pminMipmapLevelClamp == nullptr) || (pmaxMipmapLevelClamp == nullptr) ||
      (texRef == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  *pminMipmapLevelClamp = texRef->minMipmapLevelClamp;
  *pmaxMipmapLevelClamp = texRef->maxMipmapLevelClamp;
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefGetMipmappedArray(hipMipmappedArray_t* pArray,
                                      const textureReference* texRef) {
  // TODO overload operator<<(ostream&, textureReference&).
  HIP_INIT_API(hipTexRefGetMipmappedArray, pArray, texRef);
  if ((pArray == nullptr) || (texRef == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  hipResourceDesc resDesc = {};
  // TODO use ihipGetTextureObjectResourceDesc() to not pollute the API trace.
  hipError_t error = ihipGetTextureObjectResourceDesc(&resDesc, texRef->textureObject);
  if (error != hipSuccess) {
    HIP_RETURN(error);
  }
  switch (resDesc.resType) {
    case hipResourceTypeLinear:
    case hipResourceTypePitch2D:
    case hipResourceTypeArray: {
      HIP_RETURN(hipErrorInvalidValue);
    }
    case hipResourceTypeMipmappedArray:
      *pArray = resDesc.res.mipmap.mipmap;
      break;
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefSetBorderColor(textureReference* texRef, float* pBorderColor) {
  HIP_INIT_API(hipTexRefSetBorderColor, texRef, pBorderColor);
  if ((texRef == nullptr) || (pBorderColor == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  // TODO add textureReference::borderColor.
  assert(false && "textureReference::borderColor is missing in header");
  // std::memcpy(texRef.borderColor, pBorderColor, sizeof(texRef.borderColor));
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefSetMaxAnisotropy(textureReference* texRef, unsigned int maxAniso) {
  HIP_INIT_API(hipTexRefSetMaxAnisotropy, texRef, maxAniso);
  if (texRef == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  texRef->maxAnisotropy = maxAniso;
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefSetMipmapFilterMode(textureReference* texRef, hipTextureFilterMode fm) {
  HIP_INIT_API(hipTexRefSetMipmapFilterMode, texRef, fm);
  if (texRef == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  texRef->mipmapFilterMode = fm;
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefSetMipmapLevelBias(textureReference* texRef, float bias) {
  HIP_INIT_API(hipTexRefSetMipmapLevelBias, texRef, bias);
  if (texRef == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  texRef->mipmapLevelBias = bias;
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefSetMipmapLevelClamp(textureReference* texRef, float minMipMapLevelClamp,
                                        float maxMipMapLevelClamp) {
  HIP_INIT_API(hipTexRefSetMipmapLevelClamp, texRef, minMipMapLevelClamp, maxMipMapLevelClamp);
  if (texRef == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  texRef->minMipmapLevelClamp = minMipMapLevelClamp;
  texRef->maxMipmapLevelClamp = maxMipMapLevelClamp;
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexRefSetMipmappedArray(textureReference* texRef,
                                      hipMipmappedArray* mipmappedArray, unsigned int Flags) {
  HIP_INIT_API(hipTexRefSetMipmappedArray, texRef, mipmappedArray, Flags);
  if ((texRef == nullptr) || (mipmappedArray == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  if (Flags != HIP_TRSA_OVERRIDE_FORMAT) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipDeviceptr_t refDevPtr = nullptr;
  size_t refDevSize = 0;
  HIP_RETURN_ONFAIL(PlatformState::instance().getDynTexGlobalVar(texRef, &refDevPtr, &refDevSize));
  assert(refDevSize == sizeof(textureReference));
  // Any previous address or HIP array state associated with the texture reference
  // is superseded by this function. Any memory previously bound to hTexRef is unbound.
  // No need to check for errors.
  hipError_t err = ihipDestroyTextureObject(texRef->textureObject);
  if (err != hipSuccess) {
    HIP_RETURN(err);
  }
  hipResourceDesc resDesc = {};
  resDesc.resType = hipResourceTypeMipmappedArray;
  resDesc.res.mipmap.mipmap = mipmappedArray;
  hipTextureDesc texDesc = hip::getTextureDesc(texRef);
  hipResourceViewFormat format =
      hip::getResourceViewFormat(hip::getChannelFormatDesc(texRef->numChannels, texRef->format));
  hipResourceViewDesc resViewDesc = hip::getResourceViewDesc(mipmappedArray, format);
  err = ihipCreateTextureObject(&texRef->textureObject, &resDesc, &texDesc, &resViewDesc);
  if (err != hipSuccess) {
    HIP_RETURN(err);
  }
  // Copy to device.
  hip::Stream* stream = hip::getNullStream();
  HIP_RETURN(ihipMemcpy(refDevPtr, texRef, refDevSize, hipMemcpyHostToDevice, *stream));
}

hipError_t hipTexObjectCreate(hipTextureObject_t* pTexObject, const HIP_RESOURCE_DESC* pResDesc,
                              const HIP_TEXTURE_DESC* pTexDesc,
                              const HIP_RESOURCE_VIEW_DESC* pResViewDesc) {
  HIP_INIT_API(hipTexObjectCreate, pTexObject, pResDesc, pTexDesc, pResViewDesc);
  if ((pTexObject == nullptr) || (pResDesc == nullptr) || (pTexDesc == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hipResourceDesc resDesc = hip::getResourceDesc(*pResDesc);
  hipTextureDesc texDesc = hip::getTextureDesc(*pTexDesc);
  if (pResViewDesc != nullptr) {
    hipResourceViewDesc resViewDesc = hip::getResourceViewDesc(*pResViewDesc);
    HIP_RETURN(ihipCreateTextureObject(pTexObject, &resDesc, &texDesc, &resViewDesc));
  } else {
    HIP_RETURN(ihipCreateTextureObject(pTexObject, &resDesc, &texDesc, nullptr));
  }
}

hipError_t hipTexObjectDestroy(hipTextureObject_t texObject) {
  HIP_INIT_API(hipTexObjectDestroy, texObject);
  HIP_RETURN(ihipDestroyTextureObject(texObject));
}

hipError_t hipTexObjectGetResourceDesc(HIP_RESOURCE_DESC* pResDesc, hipTextureObject_t texObject) {
  HIP_INIT_API(hipTexObjectGetResourceDesc, pResDesc, texObject);
  if ((pResDesc == nullptr) || (texObject == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  *pResDesc = hip::getResourceDesc(texObject->resDesc);
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexObjectGetResourceViewDesc(HIP_RESOURCE_VIEW_DESC* pResViewDesc,
                                           hipTextureObject_t texObject) {
  HIP_INIT_API(hipTexObjectGetResourceViewDesc, pResViewDesc, texObject);
  if ((pResViewDesc == nullptr) || (texObject == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  *pResViewDesc = hip::getResourceViewDesc(texObject->resViewDesc);
  HIP_RETURN(hipSuccess);
}

hipError_t hipTexObjectGetTextureDesc(HIP_TEXTURE_DESC* pTexDesc, hipTextureObject_t texObject) {
  HIP_INIT_API(hipTexObjectGetTextureDesc, pTexDesc, texObject);
  if ((pTexDesc == nullptr) || (texObject == nullptr)) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Device* device = hip::getCurrentDevice()->devices()[0];
  const device::Info& info = device->info();
  if (!info.imageSupport_) {
    LogPrintfError("Texture not supported on the device %s", info.name_);
    HIP_RETURN(hipErrorNotSupported);
  }
  *pTexDesc = hip::getTextureDesc(texObject->texDesc);
  HIP_RETURN(hipSuccess);
}
clr-rocm-5.7.1/hipamd/src/hip_vm.cpp000066400000000000000000000217151450307266000172720ustar00rootroot00000000000000
/* Copyright (c) 2015 - 2022 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy of this software
 and associated documentation files (the "Software"), to deal in the Software without
 restriction, including without limitation the rights to use, copy, modify, merge, publish,
 distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the
 Software is furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in all copies or
 substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
 BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
 DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */

#include <hip/hip_runtime.h>  // NOTE: the original include target was lost in extraction; hip_runtime.h is assumed here

#include "hip_internal.hpp"
#include "hip_vm.hpp"

hipError_t hipMemAddressFree(void* devPtr, size_t size) {
  HIP_INIT_API(hipMemAddressFree, devPtr, size);
  if (devPtr == nullptr || size == 0) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  for (auto& dev : g_devices) {
    dev->devices()[0]->virtualFree(devPtr);
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipMemAddressReserve(void** ptr, size_t size, size_t alignment, void* addr,
                                unsigned long long flags) {
  HIP_INIT_API(hipMemAddressReserve, ptr, size, alignment, addr, flags);
  if (ptr == nullptr || flags != 0) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *ptr = nullptr;
  void* startAddress = addr;
  for (auto& dev : g_devices) {
    *ptr = dev->devices()[0]->virtualAlloc(startAddress, size, alignment);
    // If addr == 0 we generate the VA on the first device and reuse it for the others.
    if (startAddress == nullptr) {
      startAddress = *ptr;
    } else if (*ptr != startAddress) {
      // If we cannot reserve the same VA on the remaining devices, release what was
      // reserved so far and fail.
      for (auto& d : g_devices) {
        if (d == dev) HIP_RETURN(hipErrorOutOfMemory);
        d->devices()[0]->virtualFree(startAddress);
      }
    }
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipMemCreate(hipMemGenericAllocationHandle_t* handle, size_t size,
                        const hipMemAllocationProp* prop, unsigned long long flags) {
  HIP_INIT_API(hipMemCreate, handle, size, prop, flags);
  if (handle == nullptr || size == 0 || flags != 0 || prop == nullptr ||
      prop->type != hipMemAllocationTypePinned ||
      prop->location.type != hipMemLocationTypeDevice ||
      prop->location.id >= g_devices.size()) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  // Currently only support non-IPC allocations
  if (prop->requestedHandleType != hipMemHandleTypeNone) {
    HIP_RETURN(hipErrorNotSupported);
  }
  const auto& dev_info = g_devices[prop->location.id]->devices()[0]->info();
  if (dev_info.maxPhysicalMemAllocSize_ < size) {
    HIP_RETURN(hipErrorOutOfMemory);
  }
  if (size % dev_info.memBaseAddrAlign_ != 0) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  amd::Context* amdContext = g_devices[prop->location.id]->asContext();
  void* ptr =
      amd::SvmBuffer::malloc(*amdContext, 0, size, dev_info.memBaseAddrAlign_, nullptr);
  if (ptr == nullptr) {
    size_t free = 0, total = 0;
    hipError_t err = hipMemGetInfo(&free, &total);
    if (err == hipSuccess) {
      LogPrintfError("Allocation failed : Device memory : required :%zu | free :%zu | total :%zu \n",
                     size, free, total);
    }
    HIP_RETURN(hipErrorOutOfMemory);
  }
  size_t offset = 0;  // this is ignored
  amd::Memory* memObj = getMemoryObject(ptr, offset);
  // Saves the current device id so that it can be accessed later
  memObj->getUserData().deviceId = prop->location.id;
  memObj->getUserData().data = new hip::GenericAllocation(ptr, size, *prop);
  *handle = reinterpret_cast<hipMemGenericAllocationHandle_t>(memObj->getUserData().data);
  HIP_RETURN(hipSuccess);
}

hipError_t hipMemExportToShareableHandle(void* shareableHandle,
                                         hipMemGenericAllocationHandle_t handle,
                                         hipMemAllocationHandleType handleType,
                                         unsigned long long flags) {
  HIP_INIT_API(hipMemExportToShareableHandle, shareableHandle, handle, handleType, flags);
  if (flags != 0 || handle == nullptr || shareableHandle == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(hipErrorNotSupported);
}

hipError_t hipMemGetAccess(unsigned long long* flags, const hipMemLocation* location, void* ptr) {
  HIP_INIT_API(hipMemGetAccess, flags, location, ptr);
  if (flags == nullptr || location == nullptr || ptr == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipMemGetAllocationGranularity(size_t* granularity, const hipMemAllocationProp* prop,
                                          hipMemAllocationGranularity_flags option) {
  HIP_INIT_API(hipMemGetAllocationGranularity, granularity, prop, option);
  if (granularity == nullptr || prop == nullptr || prop->type != hipMemAllocationTypePinned ||
      prop->location.type != hipMemLocationTypeDevice ||
      prop->location.id >= g_devices.size()) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  const auto& dev_info = g_devices[prop->location.id]->devices()[0]->info();
  *granularity = dev_info.virtualMemAllocGranularity_;
  HIP_RETURN(hipSuccess);
}

hipError_t hipMemGetAllocationPropertiesFromHandle(hipMemAllocationProp* prop,
                                                   hipMemGenericAllocationHandle_t handle) {
  HIP_INIT_API(hipMemGetAllocationPropertiesFromHandle, prop, handle);
  if (handle == nullptr || prop == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *prop = reinterpret_cast<hip::GenericAllocation*>(handle)->GetProperties();
  HIP_RETURN(hipSuccess);
}

hipError_t hipMemImportFromShareableHandle(hipMemGenericAllocationHandle_t* handle, void* osHandle,
                                           hipMemAllocationHandleType shHandleType) {
  HIP_INIT_API(hipMemImportFromShareableHandle, handle, osHandle, shHandleType);
  if (handle == nullptr || osHandle == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(hipErrorNotSupported);
}

hipError_t hipMemMap(void* ptr, size_t size, size_t offset, hipMemGenericAllocationHandle_t handle,
                     unsigned long long flags) {
  HIP_INIT_API(hipMemMap, ptr, size, offset, handle, flags);
  if (ptr == nullptr || handle == nullptr || size == 0 || offset != 0 || flags != 0) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  hip::GenericAllocation* ga = reinterpret_cast<hip::GenericAllocation*>(handle);
  auto& queue = *g_devices[ga->GetProperties().location.id]->NullStream();
  amd::Command* cmd = new amd::VirtualMapCommand(queue, amd::Command::EventWaitList{}, ptr, size,
                                                 &ga->asAmdMemory());
  cmd->enqueue();
  cmd->awaitCompletion();
  cmd->release();
  // Update the internal svm address to ptr
  ga->asAmdMemory().setSvmPtr(ptr);
  HIP_RETURN(hipSuccess);
}

hipError_t hipMemMapArrayAsync(hipArrayMapInfo* mapInfoList, unsigned int count,
                               hipStream_t stream) {
  HIP_INIT_API(hipMemMapArrayAsync, mapInfoList, count, stream);
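  // (Illustrative only.) The linear virtual-memory entry points implemented in this
  // file are typically driven by a client in the following order, a hedged sketch
  // assuming the standard HIP VMM enums; error checks omitted:
  //   hipMemAllocationProp prop = {};
  //   prop.type = hipMemAllocationTypePinned;
  //   prop.location.type = hipMemLocationTypeDevice;
  //   prop.location.id = 0;
  //   size_t gran = 0;
  //   hipMemGetAllocationGranularity(&gran, &prop, hipMemAllocationGranularityMinimum);
  //   void* va = nullptr;
  //   hipMemAddressReserve(&va, gran, 0, nullptr, 0);
  //   hipMemGenericAllocationHandle_t handle;
  //   hipMemCreate(&handle, gran, &prop, 0);   // size must be a multiple of the granularity
  //   hipMemMap(va, gran, 0, handle, 0);
  //   hipMemAccessDesc accessDesc = {};
  //   accessDesc.location = prop.location;
  //   accessDesc.flags = hipMemAccessFlagsProtReadWrite;
  //   hipMemSetAccess(va, gran, &accessDesc, 1);
  //   ... use va in kernels ...
  //   hipMemUnmap(va, gran);
  //   hipMemRelease(handle);
  //   hipMemAddressFree(va, gran);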
  if (mapInfoList == nullptr || count == 0) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(hipErrorNotSupported);
}

hipError_t hipMemRelease(hipMemGenericAllocationHandle_t handle) {
  HIP_INIT_API(hipMemRelease, handle);
  if (handle == nullptr) HIP_RETURN(hipErrorInvalidValue);
  hip::GenericAllocation* ga = reinterpret_cast<hip::GenericAllocation*>(handle);
  delete ga;
  HIP_RETURN(hipSuccess);
}

hipError_t hipMemRetainAllocationHandle(hipMemGenericAllocationHandle_t* handle, void* addr) {
  HIP_INIT_API(hipMemRetainAllocationHandle, handle, addr);
  if (handle == nullptr || addr == nullptr) HIP_RETURN(hipErrorInvalidValue);
  amd::Memory* mem = amd::MemObjMap::FindMemObj(addr);
  if (mem == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  *handle = reinterpret_cast<hipMemGenericAllocationHandle_t>(mem->getUserData().data);
  if (*handle == nullptr) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipMemSetAccess(void* ptr, size_t size, const hipMemAccessDesc* desc, size_t count) {
  HIP_INIT_API(hipMemSetAccess, ptr, size, desc, count);
  if (ptr == nullptr || size == 0 || desc == nullptr || count == 0) {
    HIP_RETURN(hipErrorInvalidValue);
  }
  HIP_RETURN(hipSuccess);
}

hipError_t hipMemUnmap(void* ptr, size_t size) {
  HIP_INIT_API(hipMemUnmap, ptr, size);
  if (ptr == nullptr) HIP_RETURN(hipErrorInvalidValue);
  amd::Memory* va = amd::MemObjMap::FindMemObj(ptr);
  auto& queue = *g_devices[va->getUserData().deviceId]->NullStream();
  amd::Command* cmd = new amd::VirtualMapCommand(queue, amd::Command::EventWaitList{}, ptr, size,
                                                 nullptr);
  cmd->enqueue();
  cmd->awaitCompletion();
  cmd->release();
  // Restore the original va of the generic allocation
  hip::GenericAllocation* ga = reinterpret_cast<hip::GenericAllocation*>(va->getUserData().data);
  va->setSvmPtr(ga->genericAddress());
  HIP_RETURN(hipSuccess);
}
clr-rocm-5.7.1/hipamd/src/hip_vm.hpp000066400000000000000000000043321450307266000172730ustar00rootroot00000000000000
/* Copyright (c) 2015 - 2023 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy of this software
 and associated documentation files (the "Software"), to deal in the Software without
 restriction, including without limitation the rights to use, copy, modify, merge, publish,
 distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the
 Software is furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in all copies or
 substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
 BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
 DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */

#ifndef HIP_SRC_HIP_VM_H
#define HIP_SRC_HIP_VM_H

#include <hip/hip_runtime.h>  // NOTE: assumed; the original include target was lost in extraction

#include "hip_internal.hpp"

hipError_t ihipFree(void* ptr);

namespace hip {

struct MemMapAllocUserData {
  void* ptr_;        // Original pointer of the allocation
  size_t size_;      // Aligned size of the allocation
  amd::Memory* va_;  // Memory object for the virtual address

  MemMapAllocUserData(void* ptr, size_t size, amd::Memory* va)
      : ptr_(ptr), size_(size), va_(va) {}
};

class GenericAllocation {
  void* ptr_;
  size_t size_;
  hipMemAllocationProp properties_;

 public:
  GenericAllocation(void* ptr, size_t size, const hipMemAllocationProp& prop)
      : ptr_(ptr), size_(size), properties_(prop) {}
  ~GenericAllocation() {
    // Best-effort free; a destructor cannot propagate the error status.
    hipError_t err = ihipFree(ptr_);
    (void)err;
  }

  const hipMemAllocationProp& GetProperties() const { return properties_; }
  hipMemGenericAllocationHandle_t asMemGenericAllocationHandle() {
    return reinterpret_cast<hipMemGenericAllocationHandle_t>(this);
  }
  amd::Memory& asAmdMemory() {
    size_t discardOffset;
    return *getMemoryObject(genericAddress(), discardOffset);
  }
  void* genericAddress() const { return ptr_; }
};
};  // namespace hip

#endif  // HIP_SRC_HIP_VM_H
clr-rocm-5.7.1/hipamd/src/hiprtc/000077500000000000000000000000001450307266000165675ustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/src/hiprtc/CMakeLists.txt000066400000000000000000000245111450307266000213320ustar00rootroot00000000000000
# Copyright (c) 2020 - 2022 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

# This project builds hiprtc.
# If this is ever split out into a separate library living in a different
# folder, please read this part:
# it depends on rocclr, so find_package(rocclr) will be required, and
# building the hip header requires the hip include folders with hip_version.h.

cmake_minimum_required(VERSION 3.16.1)

option(BUILD_SHARED_LIBS "Build the shared library" ON)

list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_LIST_DIR}/cmake")

if(BUILD_SHARED_LIBS)
  add_library(hiprtc SHARED)
  # Windows doesn't have a strip utility, so CMAKE_STRIP won't be set.
if((CMAKE_BUILD_TYPE STREQUAL "Release") AND NOT ("${CMAKE_STRIP}" STREQUAL "")) add_custom_command(TARGET hiprtc POST_BUILD COMMAND ${CMAKE_STRIP} $) endif() else() add_library(hiprtc STATIC $) endif() set_target_properties(hiprtc PROPERTIES CXX_STANDARD 17 CXX_STANDARD_REQUIRED ON CXX_EXTENSIONS OFF POSITION_INDEPENDENT_CODE ON LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib ARCHIVE_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib) if(WIN32) if(${HIP_LIB_VERSION_MAJOR} VERSION_GREATER 9) set(HIP_MAJOR_VERSION "${HIP_LIB_VERSION_MAJOR}") else() set(HIP_MAJOR_VERSION "0${HIP_LIB_VERSION_MAJOR}") endif() set(HIP_MINOR_VERSION "0${HIP_LIB_VERSION_MINOR}") endif() if(BUILD_SHARED_LIBS) if(WIN32) set_target_properties(hiprtc PROPERTIES OUTPUT_NAME "hiprtc${HIP_MAJOR_VERSION}${HIP_MINOR_VERSION}" ARCHIVE_OUTPUT_NAME "hiprtc") else() set_target_properties(hiprtc PROPERTIES VERSION ${HIP_LIB_VERSION_STRING} SOVERSION ${HIP_LIB_VERSION_MAJOR}) endif() endif() if(BUILD_SHARED_LIBS) target_sources(hiprtc PRIVATE hiprtc.cpp hiprtcComgrHelper.cpp hiprtcInternal.cpp) endif() set_target_properties(hiprtc PROPERTIES CXX_STANDARD 17 CXX_STANDARD_REQUIRED ON CXX_EXTENSIONS OFF POSITION_INDEPENDENT_CODE ON) target_include_directories(hiprtc PRIVATE $ $ $ PUBLIC $) if(BUILD_SHARED_LIBS) if(WIN32) target_sources(hiprtc PRIVATE hiprtc.def) else() target_link_libraries(hiprtc PRIVATE "-Wl,--version-script=${CMAKE_CURRENT_LIST_DIR}/hiprtc.map.in") set_target_properties(hiprtc PROPERTIES LINK_DEPENDS "${CMAKE_CURRENT_LIST_DIR}/hiprtc.map.in") endif() endif() target_link_libraries(hiprtc PRIVATE ${FIXME_DIR}/fixme.cpp${CMAKE_CXX_OUTPUT_EXTENSION}) add_dependencies(hiprtc amdhip64) if(WIN32) target_link_libraries(hiprtc PRIVATE Dbghelp.lib) endif() target_link_libraries(hiprtc PUBLIC ${CMAKE_DL_LIBS}) if(BUILD_SHARED_LIBS) target_link_libraries(hiprtc PRIVATE rocclr) else() target_compile_definitions(hiprtc PRIVATE $) target_include_directories(hiprtc PRIVATE $) endif() target_compile_definitions(hiprtc PUBLIC __HIP_PLATFORM_AMD__) add_to_config(_versionInfo HIP_PACKAGING_VERSION_PATCH) add_to_config(_versionInfo CPACK_DEBIAN_PACKAGE_RELEASE) add_to_config(_versionInfo CPACK_RPM_PACKAGE_RELEASE) add_to_config(_versionInfo HIP_VERSION_MAJOR) add_to_config(_versionInfo HIP_VERSION_MINOR) add_to_config(_versionInfo HIP_VERSION_PATCH) add_to_config(_versionInfo HIP_VERSION_GITHASH) # Enable preprocessed hiprtc-builtins library include(HIPRTC RESULT_VARIABLE HIPRTC_CMAKE) # Requires clang and llvm-mc to create this library. 
find_package(LLVM REQUIRED CONFIG PATHS ${ROCM_PATH}/llvm) find_package(Clang REQUIRED CONFIG PATHS ${ROCM_PATH}/llvm) set(HIPRTC_GEN_DIR "${CMAKE_CURRENT_BINARY_DIR}/hip_rtc_gen") set(HIPRTC_GEN_HEADER "${HIPRTC_GEN_DIR}/hipRTC_header.h") set(HIPRTC_GEN_MCIN "${HIPRTC_GEN_DIR}/hipRTC_header.mcin") set(HIPRTC_GEN_PREPROCESSED "${HIPRTC_GEN_DIR}/hipRTC") set(HIPRTC_GEN_OBJ "${HIPRTC_GEN_DIR}/hipRTC_header${CMAKE_CXX_OUTPUT_EXTENSION}") set(HIPRTC_WARP_FUNCS "${PROJECT_SOURCE_DIR}/include/hip/amd_detail/amd_warp_functions.h") set(HIPRTC_FP16_MATH_FWD "${PROJECT_SOURCE_DIR}/include/hip/amd_detail/hip_fp16_math_fwd.h") set(HIPRTC_FP16_FUNCS "${PROJECT_SOURCE_DIR}/include/hip/amd_detail/amd_hip_fp16.h") set(HIPRTC_COOP_GROUPS "${PROJECT_SOURCE_DIR}/include/hip/amd_detail/amd_hip_cooperative_groups.h") set(HIPRTC_COOP_GRPS_HELPER "${PROJECT_SOURCE_DIR}/include/hip/amd_detail/hip_cooperative_groups_helper.h") set(HIPRTC_UNSAFE_ATOMICS "${PROJECT_SOURCE_DIR}/include/hip/amd_detail/amd_hip_unsafe_atomics.h") # Generate required HIPRTC files. FILE(MAKE_DIRECTORY ${HIPRTC_GEN_DIR}) generate_hiprtc_header("${HIPRTC_GEN_HEADER}") generate_hiprtc_mcin("${HIPRTC_GEN_MCIN}" "${HIPRTC_GEN_PREPROCESSED}") # Generate HIPRTC Builtins Preprocessed Object. # Note: second command appends define macros at build time. # FIXME: --hip-version forced to 3.6 to use clang headers, until Windows versioning is fixed. add_custom_command( OUTPUT ${HIPRTC_GEN_PREPROCESSED} COMMAND $ -O3 --rocm-path=${PROJECT_SOURCE_DIR}/include/.. -std=c++17 -nogpulib --hip-version=3.6 -isystem ${HIP_COMMON_INCLUDE_DIR} -isystem ${PROJECT_SOURCE_DIR}/include -isystem ${PROJECT_BINARY_DIR}/include -isystem ${CMAKE_CURRENT_SOURCE_DIR}/include --cuda-device-only -D__HIPCC_RTC__ -x hip ${HIPRTC_GEN_HEADER} -E -o ${HIPRTC_GEN_PREPROCESSED} COMMAND ${CMAKE_COMMAND} -DHIPRTC_ADD_MACROS=1 -DHIPRTC_WARP_HEADER_FILE=${HIPRTC_WARP_FUNCS} -DHIPRTC_COOP_HEADER_FILE=${HIPRTC_COOP_GROUPS} -DHIPRTC_COOP_HELPER_FILE=${HIPRTC_COOP_GRPS_HELPER} -DHIPRTC_UNSAFE_ATOMICS_FILE=${HIPRTC_UNSAFE_ATOMICS} -DHIPRTC_FP16_MATH_FWD_FILE=${HIPRTC_FP16_MATH_FWD} -DHIPRTC_FP16_HEADER_FILE=${HIPRTC_FP16_FUNCS} -DHIPRTC_PREPROCESSED_FILE=${HIPRTC_GEN_PREPROCESSED} -P ${HIPRTC_CMAKE} DEPENDS clang ${HIPRTC_GEN_HEADER}) add_custom_command( OUTPUT ${HIPRTC_GEN_OBJ} COMMAND $ -o ${HIPRTC_GEN_OBJ} ${HIPRTC_GEN_MCIN} --filetype=obj DEPENDS llvm-mc ${HIPRTC_GEN_PREPROCESSED} ${HIPRTC_GEN_MCIN}) # Create hiprtc-builtins library. add_library(hiprtc-builtins ${HIPRTC_GEN_OBJ}) set_target_properties(hiprtc-builtins PROPERTIES CXX_STANDARD 14 CXX_STANDARD_REQUIRED ON CXX_EXTENSIONS OFF POSITION_INDEPENDENT_CODE ON LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib LINKER_LANGUAGE CXX VERSION ${HIP_LIB_VERSION_STRING}) # Windows and Linux have different naming conventions. if(WIN32) # Windows uses DEF file to determine which symbols to expose. target_sources(hiprtc-builtins PRIVATE hiprtc-builtins.def) set_target_properties(hiprtc-builtins PROPERTIES OUTPUT_NAME "hiprtc-builtins${HIP_MAJOR_VERSION}${HIP_MINOR_VERSION}" ARCHIVE_OUTPUT_NAME "hiprtc-builtins") # Since ${HIPRTC_GEN_OBJ} was manually generated with llvm-mc, /MT did not embed # libcmt.lib inside of the obj. So we need to manually set it as defaultlib. target_link_options(hiprtc-builtins PRIVATE "LINKER:/DEFAULTLIB:libcmt") else() # SOVERSION is only supported on Linux. 
set_target_properties(hiprtc-builtins PROPERTIES OUTPUT_NAME "hiprtc-builtins" SOVERSION ${HIP_LIB_VERSION_MAJOR}) endif() # Test the header file works with simple compilation. add_custom_command( OUTPUT ${HIPRTC_GEN_DIR}/tmp.bc COMMAND $ -O3 --rocm-path=${PROJECT_SOURCE_DIR}/include/.. -std=c++14 -nogpulib -nogpuinc -emit-llvm -c -isystem ${HIP_COMMON_INCLUDE_DIR} -isystem ${PROJECT_BINARY_DIR}/include -isystem ${CMAKE_CURRENT_SOURCE_DIR}/include --cuda-device-only -D__HIPCC_RTC__ --offload-arch=gfx906 -x hip-cpp-output ${HIPRTC_GEN_PREPROCESSED} -o ${HIPRTC_GEN_DIR}/tmp.bc DEPENDS clang ${HIPRTC_GEN_PREPROCESSED}) target_link_libraries(hiprtc PRIVATE ${HIPRTC_GEN_OBJ}) target_compile_definitions(hiprtc PRIVATE __HIP_ENABLE_RTC) if(NOT WIN32) target_sources(amdhip64 PRIVATE hiprtc.cpp hiprtcComgrHelper.cpp hiprtcInternal.cpp) endif() list(APPEND HIPRTC_OBJECTS ${HIPRTC_GEN_OBJ}) set(HIPRTC_OBJECTS ${HIPRTC_OBJECTS} PARENT_SCOPE) add_dependencies(hiprtc hiprtc-builtins) install(TARGETS hiprtc-builtins RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}) INSTALL(TARGETS hiprtc EXPORT hiprtc-targets RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR} PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}) INSTALL(EXPORT hiprtc-targets NAMESPACE hiprtc:: DESTINATION ${CONFIG_RTC_PACKAGE_INSTALL_DIR}) ############################# # hiprtc-config ############################# include(CMakePackageConfigHelpers) configure_package_config_file( ${CMAKE_CURRENT_SOURCE_DIR}/cmake/hiprtc-config.cmake.in ${PROJECT_BINARY_DIR}/hiprtc-config.cmake INSTALL_DESTINATION ${CONFIG_RTC_PACKAGE_INSTALL_DIR} PATH_VARS LIB_INSTALL_DIR INCLUDE_INSTALL_DIR BIN_INSTALL_DIR) write_basic_package_version_file( ${PROJECT_BINARY_DIR}/hiprtc-config-version.cmake VERSION "${HIP_VERSION_MAJOR}.${HIP_VERSION_MINOR}.${HIP_VERSION_GITDATE}" COMPATIBILITY SameMajorVersion) INSTALL(FILES ${HIP_COMMON_INCLUDE_DIR}/hip/hiprtc.h DESTINATION "include/hip") INSTALL(FILES ${HIP_COMMON_INCLUDE_DIR}/hip/hip_common.h DESTINATION "include/hip") INSTALL( FILES ${PROJECT_BINARY_DIR}/hiprtc-config.cmake ${PROJECT_BINARY_DIR}/hiprtc-config-version.cmake DESTINATION ${CONFIG_RTC_PACKAGE_INSTALL_DIR}) clr-rocm-5.7.1/hipamd/src/hiprtc/cmake/000077500000000000000000000000001450307266000176475ustar00rootroot00000000000000clr-rocm-5.7.1/hipamd/src/hiprtc/cmake/HIPRTC.cmake000066400000000000000000000123431450307266000216450ustar00rootroot00000000000000# Copyright (c) 2021 - 2023 Advanced Micro Devices, Inc. All Rights Reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. ############################################################################### # HIPRTC.cmake ############################################################################### # This file includes macros required to generate the hiprtc builtins library. function(get_hiprtc_macros HIPRTC_DEFINES) set(${HIPRTC_DEFINES} "#pragma clang diagnostic push\n\ #pragma clang diagnostic ignored \"-Wreserved-id-macro\"\n\ #pragma clang diagnostic ignored \"-Wc++98-compat-pedantic\"\n\ #define __device__ __attribute__((device))\n\ #define __host__ __attribute__((host))\n\ #define __global__ __attribute__((global))\n\ #define __constant__ __attribute__((constant))\n\ #define __shared__ __attribute__((shared))\n\ #define __align__(x) __attribute__((aligned(x)))\n\ #if !defined(__has_feature) || !__has_feature(cuda_noinline_keyword)\n\ #define __noinline__ __attribute__((noinline))\n\ #endif\n\ #define __forceinline__ inline __attribute__((always_inline))\n\ #define launch_bounds_impl0(requiredMaxThreadsPerBlock) \\\n\ __attribute__((amdgpu_flat_work_group_size(1, requiredMaxThreadsPerBlock)))\n\ #define launch_bounds_impl1(requiredMaxThreadsPerBlock, minBlocksPerMultiprocessor) \\\n\ __attribute__((amdgpu_flat_work_group_size(1, requiredMaxThreadsPerBlock), \\\n\ amdgpu_waves_per_eu(minBlocksPerMultiprocessor)))\n\ #define select_impl_(_1, _2, impl_, ...) impl_\n\ #define __launch_bounds__(...) \\\n\ select_impl_(__VA_ARGS__, launch_bounds_impl1, launch_bounds_impl0)(__VA_ARGS__) \n\ #pragma clang diagnostic pop\n\ #define HIP_INCLUDE_HIP_HIP_RUNTIME_H\n\ #pragma clang diagnostic push\n\ #pragma clang diagnostic ignored \"-Wreserved-macro-identifier\"\n\ #define _HIP_BFLOAT16_H_\n\ #pragma clang diagnostic pop\n\ #define HIP_INCLUDE_HIP_HIP_VECTOR_TYPES_H" PARENT_SCOPE) endfunction(get_hiprtc_macros) # To allow concatenating above macros during build time, call this file in script mode. 
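# For reference, the custom command in ../CMakeLists.txt drives this script mode
# roughly as follows (a sketch; the real invocation also passes every
# HIPRTC_*_FILE variable read below):
#   ${CMAKE_COMMAND} -DHIPRTC_ADD_MACROS=1
#                    -DHIPRTC_PREPROCESSED_FILE=<preprocessed header output>
#                    -DHIPRTC_WARP_HEADER_FILE=<amd_warp_functions.h> ...
#                    -P HIPRTC.cmake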
if(HIPRTC_ADD_MACROS) message(STATUS "Appending hiprtc macros to ${HIPRTC_PREPROCESSED_FILE}.") get_hiprtc_macros(HIPRTC_DEFINES) FILE(APPEND ${HIPRTC_PREPROCESSED_FILE} "${HIPRTC_DEFINES}") FILE(READ "${HIPRTC_WARP_HEADER_FILE}" HIPRTC_WARP_HEADER) FILE(APPEND ${HIPRTC_PREPROCESSED_FILE} "${HIPRTC_WARP_HEADER}") #pragma clang diagnostic push #pragma clang diagnostic ignored "-Wreserved-macro-identifier" FILE(READ "${HIPRTC_COOP_HELPER_FILE}" HIPRTC_COOP_HELPER) FILE(APPEND ${HIPRTC_PREPROCESSED_FILE} "${HIPRTC_COOP_HELPER}") FILE(READ "${HIPRTC_COOP_HEADER_FILE}" HIPRTC_COOP_HEADER) FILE(APPEND ${HIPRTC_PREPROCESSED_FILE} "${HIPRTC_COOP_HEADER}") FILE(READ "${HIPRTC_UNSAFE_ATOMICS_FILE}" HIPRTC_UNSAFE_ATOMICS) FILE(APPEND ${HIPRTC_PREPROCESSED_FILE} "${HIPRTC_UNSAFE_ATOMICS}") FILE(READ "${HIPRTC_FP16_MATH_FWD_FILE}" HIPRTC_FP16_MATH_FWD) FILE(APPEND ${HIPRTC_PREPROCESSED_FILE} "${HIPRTC_FP16_MATH_FWD}") FILE(READ "${HIPRTC_FP16_HEADER_FILE}" HIPRTC_FP16_HEADER) FILE(APPEND ${HIPRTC_PREPROCESSED_FILE} "${HIPRTC_FP16_HEADER}") #pragma clang diagnostic pop endif() macro(generate_hiprtc_header HiprtcHeader) FILE(WRITE ${HiprtcHeader} "#pragma push_macro(\"CHAR_BIT\")\n\ #pragma push_macro(\"INT_MAX\")\n\ #define CHAR_BIT __CHAR_BIT__\n\ #define INT_MAX __INTMAX_MAX__\n\ #include \"hip/hip_runtime.h\"\n\ #include \"hip/hip_bfloat16.h\"\n\ #pragma pop_macro(\"CHAR_BIT\")\n\ #pragma pop_macro(\"INT_MAX\")") endmacro(generate_hiprtc_header) macro(generate_hiprtc_mcin HiprtcMcin HiprtcPreprocessedInput) if(WIN32) set(HIPRTC_TYPE_LINUX_ONLY "") else() set(HIPRTC_TYPE_LINUX_ONLY " .type __hipRTC_header,@object\n" " .type __hipRTC_header_size,@object") endif() FILE(WRITE ${HiprtcMcin} "// Automatically generated script for HIPRTC.\n\ ${HIPRTC_TYPE_LINUX_ONLY}\n\ .section .hipRTC_header,\"a\"\n\ .globl __hipRTC_header\n\ .globl __hipRTC_header_size\n\ .p2align 3\n\ __hipRTC_header:\n\ .incbin \"${HiprtcPreprocessedInput}\"\n\ __hipRTC_header_size:\n\ .long __hipRTC_header_size - __hipRTC_header\n") endmacro(generate_hiprtc_mcin) clr-rocm-5.7.1/hipamd/src/hiprtc/cmake/hiprtc-config.cmake.in000066400000000000000000000033501450307266000240130ustar00rootroot00000000000000# Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. 
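# A consuming project is expected to use this package config roughly as follows
# (a sketch; the hiprtc:: namespace comes from the hiprtc-targets export):
#   find_package(hiprtc REQUIRED)
#   target_link_libraries(myApp PRIVATE hiprtc::hiprtc)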
cmake_minimum_required(VERSION 3.3)

@PACKAGE_INIT@

include(CMakeFindDependencyMacro)

set_and_check(hiprtc_INCLUDE_DIR "@PACKAGE_INCLUDE_INSTALL_DIR@")
set_and_check(hiprtc_LIB_INSTALL_DIR "@PACKAGE_LIB_INSTALL_DIR@")
set_and_check(hiprtc_BIN_INSTALL_DIR "@PACKAGE_BIN_INSTALL_DIR@")

# Windows-specific definitions:
if(WIN32)
  if(DEFINED ENV{HIP_PATH})
    file(TO_CMAKE_PATH "$ENV{HIP_PATH}" HIP_PATH)
  elseif(DEFINED ENV{HIP_DIR})
    file(TO_CMAKE_PATH "$ENV{HIP_DIR}" HIP_DIR)
  else()
    # Fall back to the HIP installation this package config was found in.
    set(HIP_PATH ${PACKAGE_PREFIX_DIR})
  endif()
endif()

include("${CMAKE_CURRENT_LIST_DIR}/hiprtc-targets.cmake")
clr-rocm-5.7.1/hipamd/src/hiprtc/hiprtc-builtins.def000066400000000000000000000000561450307266000223700ustar00rootroot00000000000000
EXPORTS
  __hipRTC_header
  __hipRTC_header_size
clr-rocm-5.7.1/hipamd/src/hiprtc/hiprtc.cpp000066400000000000000000000302141450307266000205640ustar00rootroot00000000000000
/* Copyright (c) 2022 - Present Advanced Micro Devices, Inc. All rights reserved.

 Permission is hereby granted, free of charge, to any person obtaining a copy of this software
 and associated documentation files (the "Software"), to deal in the Software without
 restriction, including without limitation the rights to use, copy, modify, merge, publish,
 distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the
 Software is furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in all copies or
 substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
 BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
 DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

#include <hip/hiprtc.h>  // NOTE: assumed; the original include target was lost in extraction

#include "hiprtcInternal.hpp"

namespace hiprtc {
thread_local TlsAggregator tls;
}

const char* hiprtcGetErrorString(hiprtcResult x) {
  switch (x) {
    case HIPRTC_SUCCESS:
      return "HIPRTC_SUCCESS";
    case HIPRTC_ERROR_OUT_OF_MEMORY:
      return "HIPRTC_ERROR_OUT_OF_MEMORY";
    case HIPRTC_ERROR_PROGRAM_CREATION_FAILURE:
      return "HIPRTC_ERROR_PROGRAM_CREATION_FAILURE";
    case HIPRTC_ERROR_INVALID_INPUT:
      return "HIPRTC_ERROR_INVALID_INPUT";
    case HIPRTC_ERROR_INVALID_PROGRAM:
      return "HIPRTC_ERROR_INVALID_PROGRAM";
    case HIPRTC_ERROR_INVALID_OPTION:
      return "HIPRTC_ERROR_INVALID_OPTION";
    case HIPRTC_ERROR_COMPILATION:
      return "HIPRTC_ERROR_COMPILATION";
    case HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE:
      return "HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE";
    case HIPRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION:
      return "HIPRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION";
    case HIPRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION:
      return "HIPRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION";
    case HIPRTC_ERROR_NAME_EXPRESSION_NOT_VALID:
      return "HIPRTC_ERROR_NAME_EXPRESSION_NOT_VALID";
    case HIPRTC_ERROR_INTERNAL_ERROR:
      return "HIPRTC_ERROR_INTERNAL_ERROR";
    case HIPRTC_ERROR_LINKING:
      return "HIPRTC_ERROR_LINKING";
    default:
      LogPrintfError("Invalid HIPRTC error code: %d \n", x);
      return nullptr;
  }
  return nullptr;
}

hiprtcResult hiprtcCreateProgram(hiprtcProgram* prog, const char* src, const char* name,
                                 int numHeaders, const char** headers, const char** headerNames) {
  HIPRTC_INIT_API(prog, src, name, numHeaders, headers, headerNames);
  if (prog == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_PROGRAM);
  }
  if (numHeaders < 0) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  if (numHeaders && (headers == nullptr || headerNames == nullptr)) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  std::string progName;
  if (name) {
    progName = name;
  }
  auto* rtcProgram = new hiprtc::RTCCompileProgram(progName);
  if (rtcProgram == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_PROGRAM_CREATION_FAILURE);
  }
  if (name == nullptr || strlen(name) == 0) {
    progName = "CompileSourceXXXXXX";
    hiprtc::helpers::GenerateUniqueFileName(progName);
  }
  if (!rtcProgram->addSource(std::string(src), progName)) {
    delete rtcProgram;
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  for (int i = 0; i < numHeaders; i++) {
    if (!rtcProgram->addHeader(std::string(headers[i]), std::string(headerNames[i]))) {
      delete rtcProgram;
      HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
    }
  }
  *prog = hiprtc::RTCCompileProgram::as_hiprtcProgram(rtcProgram);
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcCompileProgram(hiprtcProgram prog, int numOptions, const char** options) {
  HIPRTC_INIT_API(prog, numOptions, options);
  auto* rtcProgram = hiprtc::RTCCompileProgram::as_RTCCompileProgram(prog);
  bool fgpu_rdc = false;
  std::vector<std::string> opt;
  opt.reserve(numOptions);
  for (int i = 0; i < numOptions; i++) {
    if (std::string(options[i]) == std::string("-fgpu-rdc")) {
      fgpu_rdc = true;
    }
    opt.push_back(std::string(options[i]));
  }
  if (!rtcProgram->compile(opt, fgpu_rdc)) {
    HIPRTC_RETURN(HIPRTC_ERROR_COMPILATION);
  }
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcAddNameExpression(hiprtcProgram prog, const char* name_expression) {
  HIPRTC_INIT_API(prog, name_expression);
  if (name_expression == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  auto* rtcProgram = hiprtc::RTCCompileProgram::as_RTCCompileProgram(prog);
  std::string name = name_expression;
  if (!rtcProgram->trackMangledName(name)) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcGetLoweredName(hiprtcProgram prog, const char* name_expression,
                                  const char** loweredName) {
  HIPRTC_INIT_API(prog, name_expression, loweredName);
  if (name_expression == nullptr || loweredName == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  auto* rtcProgram = hiprtc::RTCCompileProgram::as_RTCCompileProgram(prog);
  if (!rtcProgram->getMangledName(name_expression, loweredName)) {
    HIPRTC_RETURN(HIPRTC_ERROR_NAME_EXPRESSION_NOT_VALID);
  }
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcDestroyProgram(hiprtcProgram* prog) {
  HIPRTC_INIT_API(prog);
  if (prog == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  auto* rtcProgram = hiprtc::RTCCompileProgram::as_RTCCompileProgram(*prog);
  delete rtcProgram;
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcGetCodeSize(hiprtcProgram prog, size_t* binarySizeRet) {
  HIPRTC_INIT_API(prog, binarySizeRet);
  if (binarySizeRet == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  auto* rtcProgram = hiprtc::RTCCompileProgram::as_RTCCompileProgram(prog);
  *binarySizeRet = rtcProgram->getExecSize();
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcGetCode(hiprtcProgram prog, char* binaryMem) {
  HIPRTC_INIT_API(prog, binaryMem);
  if (binaryMem == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  auto* rtcProgram = hiprtc::RTCCompileProgram::as_RTCCompileProgram(prog);
  auto binary = rtcProgram->getExec();
  ::memcpy(binaryMem, binary.data(), binary.size());
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcGetProgramLog(hiprtcProgram prog, char* dst) {
  HIPRTC_INIT_API(prog, dst);
  if (dst == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  auto* rtcProgram = hiprtc::RTCCompileProgram::as_RTCCompileProgram(prog);
  auto log = rtcProgram->getLog();
  ::memcpy(dst, log.data(), log.size());
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcGetProgramLogSize(hiprtcProgram prog, size_t* logSizeRet) {
  HIPRTC_INIT_API(prog, logSizeRet);
  if (logSizeRet == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  auto* rtcProgram = hiprtc::RTCCompileProgram::as_RTCCompileProgram(prog);
  *logSizeRet = rtcProgram->getLogSize();
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcVersion(int* major, int* minor) {
  HIPRTC_INIT_API(major, minor);
  if (major == nullptr || minor == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  // TODO: report the actual HIPRTC version; these values are currently hardcoded.
  *major = 9;
  *minor = 0;
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcGetBitcode(hiprtcProgram prog, char* bitcode) {
  HIPRTC_INIT_API(prog, bitcode);
  if (bitcode == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  auto* rtcProgram = hiprtc::RTCCompileProgram::as_RTCCompileProgram(prog);
  if (!rtcProgram->GetBitcode(bitcode)) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_PROGRAM);
  }
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcGetBitcodeSize(hiprtcProgram prog, size_t* bitcode_size) {
  HIPRTC_INIT_API(prog, bitcode_size);
  if (bitcode_size == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  auto* rtcProgram = hiprtc::RTCCompileProgram::as_RTCCompileProgram(prog);
  if (!rtcProgram->GetBitcodeSize(bitcode_size)) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_PROGRAM);
  }
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcLinkCreate(unsigned int num_options, hiprtcJIT_option* options_ptr,
                              void** options_vals_pptr, hiprtcLinkState* hip_link_state_ptr) {
  HIPRTC_INIT_API(num_options, options_ptr, options_vals_pptr, hip_link_state_ptr);
  if (hip_link_state_ptr == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  if (num_options != 0) {
    for (unsigned int i = 0; i < num_options; i++) {
      if (options_ptr == nullptr || options_vals_pptr == nullptr) {
        HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
      }
    }
  }
  std::string name("LinkerProgram");
  hiprtc::RTCLinkProgram* rtc_link_prog_ptr = new hiprtc::RTCLinkProgram(name);
  if (!rtc_link_prog_ptr->AddLinkerOptions(num_options, options_ptr, options_vals_pptr)) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_OPTION);
  }
  *hip_link_state_ptr = reinterpret_cast<hiprtcLinkState>(rtc_link_prog_ptr);
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcLinkAddFile(hiprtcLinkState hip_link_state, hiprtcJITInputType input_type,
                               const char* file_path, unsigned int num_options,
                               hiprtcJIT_option* options_ptr, void** option_values) {
  HIPRTC_INIT_API(hip_link_state, input_type, file_path, num_options, options_ptr, option_values);
  if (hip_link_state == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  if (input_type == HIPRTC_JIT_INPUT_CUBIN || input_type == HIPRTC_JIT_INPUT_PTX ||
      input_type == HIPRTC_JIT_INPUT_FATBINARY || input_type == HIPRTC_JIT_INPUT_OBJECT ||
      input_type == HIPRTC_JIT_INPUT_LIBRARY || input_type == HIPRTC_JIT_INPUT_NVVM) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  hiprtc::RTCLinkProgram* rtc_link_prog_ptr =
      reinterpret_cast<hiprtc::RTCLinkProgram*>(hip_link_state);
  if (!rtc_link_prog_ptr->AddLinkerFile(std::string(file_path), input_type)) {
    HIPRTC_RETURN(HIPRTC_ERROR_PROGRAM_CREATION_FAILURE);
  }
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcLinkAddData(hiprtcLinkState hip_link_state, hiprtcJITInputType input_type,
                               void* image, size_t image_size, const char* name,
                               unsigned int num_options, hiprtcJIT_option* options_ptr,
                               void** option_values) {
  HIPRTC_INIT_API(hip_link_state, image, image_size, name, num_options, options_ptr,
                  option_values);
  if (image == nullptr || image_size == 0) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  if (input_type == HIPRTC_JIT_INPUT_CUBIN || input_type == HIPRTC_JIT_INPUT_PTX ||
      input_type == HIPRTC_JIT_INPUT_FATBINARY || input_type == HIPRTC_JIT_INPUT_OBJECT ||
      input_type == HIPRTC_JIT_INPUT_LIBRARY || input_type == HIPRTC_JIT_INPUT_NVVM) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  std::string input_name;
  if (name) {
    input_name = name;
  }
  hiprtc::RTCLinkProgram* rtc_link_prog_ptr =
      reinterpret_cast<hiprtc::RTCLinkProgram*>(hip_link_state);
  if (!rtc_link_prog_ptr->AddLinkerData(image, image_size, input_name, input_type)) {
    HIPRTC_RETURN(HIPRTC_ERROR_PROGRAM_CREATION_FAILURE);
  }
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcLinkComplete(hiprtcLinkState hip_link_state, void** bin_out, size_t* size_out) {
  HIPRTC_INIT_API(hip_link_state, bin_out, size_out);
  if (bin_out == nullptr || size_out == nullptr) {
    HIPRTC_RETURN(HIPRTC_ERROR_INVALID_INPUT);
  }
  hiprtc::RTCLinkProgram* rtc_link_prog_ptr =
      reinterpret_cast<hiprtc::RTCLinkProgram*>(hip_link_state);
  if (!rtc_link_prog_ptr->LinkComplete(bin_out, size_out)) {
    HIPRTC_RETURN(HIPRTC_ERROR_LINKING);
  }
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}

hiprtcResult hiprtcLinkDestroy(hiprtcLinkState hip_link_state) {
  HIPRTC_INIT_API(hip_link_state);
  hiprtc::RTCLinkProgram* rtc_link_prog_ptr =
      reinterpret_cast<hiprtc::RTCLinkProgram*>(hip_link_state);
  delete rtc_link_prog_ptr;
  HIPRTC_RETURN(HIPRTC_SUCCESS);
}
clr-rocm-5.7.1/hipamd/src/hiprtc/hiprtc.def000066400000000000000000000005421450307266000205410ustar00rootroot00000000000000
EXPORTS
  hiprtcAddNameExpression
  hiprtcCompileProgram
  hiprtcCreateProgram
  hiprtcDestroyProgram
  hiprtcGetLoweredName
  hiprtcGetProgramLog
  hiprtcGetProgramLogSize
  hiprtcGetCode
  hiprtcGetCodeSize
  hiprtcVersion
  hiprtcGetErrorString
  hiprtcLinkCreate
  hiprtcLinkAddFile
  hiprtcLinkAddData
  hiprtcLinkComplete
  hiprtcLinkDestroy
  hiprtcGetBitcode
  hiprtcGetBitcodeSize
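; For reference, the entry points exported above are typically driven in this
; order by a client (a hedged usage sketch, not part of this file; error checks
; omitted, and the offload arch is a hypothetical example):
;   hiprtcProgram prog;
;   hiprtcCreateProgram(&prog, src, "prog.cu", 0, nullptr, nullptr);
;   const char* opts[] = {"--offload-arch=gfx906"};
;   hiprtcCompileProgram(prog, 1, opts);
;   size_t n = 0; hiprtcGetCodeSize(prog, &n);
;   std::vector<char> code(n); hiprtcGetCode(prog, code.data());
;   hiprtcDestroyProgram(&prog);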
clr-rocm-5.7.1/hipamd/src/hiprtc/hiprtc.map.in000066400000000000000000000007171450307266000211710ustar00rootroot00000000000000{ global: hiprtcCompileProgram; hiprtcCreateProgram; hiprtcDestroyProgram; hiprtcGetLoweredName; hiprtcGetProgramLog; hiprtcGetProgramLogSize; hiprtcGetCode; hiprtcGetCodeSize; hiprtcGetErrorString; hiprtcAddNameExpression; hiprtcVersion; hiprtcLinkCreate; hiprtcLinkAddFile; hiprtcLinkAddData; hiprtcLinkComplete; hiprtcLinkDestroy; hiprtcGetBitcode; hiprtcGetBitcodeSize; local: *; }; clr-rocm-5.7.1/hipamd/src/hiprtc/hiprtcComgrHelper.cpp000066400000000000000000000733341450307266000227260ustar00rootroot00000000000000/* Copyright (c) 2022 - Present Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "hiprtcComgrHelper.hpp" #if defined(_WIN32) #include #endif #include "../amd_hsa_elf.hpp" namespace hiprtc { namespace helpers { size_t constexpr strLiteralLength(char const* str) { return *str ? 
1 + strLiteralLength(str + 1) : 0; } constexpr char const* CLANG_OFFLOAD_BUNDLER_MAGIC_STR = "__CLANG_OFFLOAD_BUNDLE__"; constexpr char const* OFFLOAD_KIND_HIP = "hip"; constexpr char const* OFFLOAD_KIND_HIPV4 = "hipv4"; constexpr char const* OFFLOAD_KIND_HCC = "hcc"; constexpr char const* AMDGCN_TARGET_TRIPLE = "amdgcn-amd-amdhsa-"; static constexpr size_t bundle_magic_string_size = strLiteralLength(CLANG_OFFLOAD_BUNDLER_MAGIC_STR); struct __ClangOffloadBundleInfo { uint64_t offset; uint64_t size; uint64_t bundleEntryIdSize; const char bundleEntryId[1]; }; struct __ClangOffloadBundleHeader { const char magic[bundle_magic_string_size - 1]; uint64_t numOfCodeObjects; __ClangOffloadBundleInfo desc[1]; }; uint64_t ElfSize(const void* emi) { return amd::Elf::getElfSize(emi); } static bool getProcName(uint32_t EFlags, std::string& proc_name, bool& xnackSupported, bool& sramEccSupported) { switch (EFlags & EF_AMDGPU_MACH) { case EF_AMDGPU_MACH_AMDGCN_GFX700: xnackSupported = false; sramEccSupported = false; proc_name = "gfx700"; break; case EF_AMDGPU_MACH_AMDGCN_GFX701: xnackSupported = false; sramEccSupported = false; proc_name = "gfx701"; break; case EF_AMDGPU_MACH_AMDGCN_GFX702: xnackSupported = false; sramEccSupported = false; proc_name = "gfx702"; break; case EF_AMDGPU_MACH_AMDGCN_GFX703: xnackSupported = false; sramEccSupported = false; proc_name = "gfx703"; break; case EF_AMDGPU_MACH_AMDGCN_GFX704: xnackSupported = false; sramEccSupported = false; proc_name = "gfx704"; break; case EF_AMDGPU_MACH_AMDGCN_GFX705: xnackSupported = false; sramEccSupported = false; proc_name = "gfx705"; break; case EF_AMDGPU_MACH_AMDGCN_GFX801: xnackSupported = true; sramEccSupported = false; proc_name = "gfx801"; break; case EF_AMDGPU_MACH_AMDGCN_GFX802: xnackSupported = false; sramEccSupported = false; proc_name = "gfx802"; break; case EF_AMDGPU_MACH_AMDGCN_GFX803: xnackSupported = false; sramEccSupported = false; proc_name = "gfx803"; break; case EF_AMDGPU_MACH_AMDGCN_GFX805: xnackSupported = false; sramEccSupported = false; proc_name = "gfx805"; break; case EF_AMDGPU_MACH_AMDGCN_GFX810: xnackSupported = true; sramEccSupported = false; proc_name = "gfx810"; break; case EF_AMDGPU_MACH_AMDGCN_GFX900: xnackSupported = true; sramEccSupported = false; proc_name = "gfx900"; break; case EF_AMDGPU_MACH_AMDGCN_GFX902: xnackSupported = true; sramEccSupported = false; proc_name = "gfx902"; break; case EF_AMDGPU_MACH_AMDGCN_GFX904: xnackSupported = true; sramEccSupported = false; proc_name = "gfx904"; break; case EF_AMDGPU_MACH_AMDGCN_GFX906: xnackSupported = true; sramEccSupported = true; proc_name = "gfx906"; break; case EF_AMDGPU_MACH_AMDGCN_GFX908: xnackSupported = true; sramEccSupported = true; proc_name = "gfx908"; break; case EF_AMDGPU_MACH_AMDGCN_GFX909: xnackSupported = true; sramEccSupported = false; proc_name = "gfx909"; break; case EF_AMDGPU_MACH_AMDGCN_GFX90A: xnackSupported = true; sramEccSupported = true; proc_name = "gfx90a"; break; case EF_AMDGPU_MACH_AMDGCN_GFX90C: xnackSupported = true; sramEccSupported = false; proc_name = "gfx90c"; break; case EF_AMDGPU_MACH_AMDGCN_GFX940: xnackSupported = true; sramEccSupported = true; proc_name = "gfx940"; break; case EF_AMDGPU_MACH_AMDGCN_GFX941: xnackSupported = true; sramEccSupported = true; proc_name = "gfx941"; break; case EF_AMDGPU_MACH_AMDGCN_GFX942: xnackSupported = true; sramEccSupported = true; proc_name = "gfx942"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1010: xnackSupported = true; sramEccSupported = false; proc_name = "gfx1010"; break; case 
EF_AMDGPU_MACH_AMDGCN_GFX1011: xnackSupported = true; sramEccSupported = false; proc_name = "gfx1011"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1012: xnackSupported = true; sramEccSupported = false; proc_name = "gfx1012"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1013: xnackSupported = true; sramEccSupported = false; proc_name = "gfx1013"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1030: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1030"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1031: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1031"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1032: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1032"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1033: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1033"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1034: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1034"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1035: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1035"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1036: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1036"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1100: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1100"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1101: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1101"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1102: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1102"; break; case EF_AMDGPU_MACH_AMDGCN_GFX1103: xnackSupported = false; sramEccSupported = false; proc_name = "gfx1103"; break; default: return false; } return true; } static bool getTripleTargetIDFromCodeObject(const void* code_object, std::string& target_id) { if (!code_object) return false; const Elf64_Ehdr* ehdr = reinterpret_cast(code_object); if (ehdr->e_machine != EM_AMDGPU) return false; if (ehdr->e_ident[EI_OSABI] != ELFOSABI_AMDGPU_HSA) return false; bool isXnackSupported{false}, isSramEccSupported{false}; std::string proc_name; if (!getProcName(ehdr->e_flags, proc_name, isXnackSupported, isSramEccSupported)) return false; target_id = std::string(AMDGCN_TARGET_TRIPLE) + '-' + proc_name; switch (ehdr->e_ident[EI_ABIVERSION]) { case ELFABIVERSION_AMDGPU_HSA_V2: { LogPrintfInfo("[Code Object V2, target id:%s]", target_id.c_str()); return false; } case ELFABIVERSION_AMDGPU_HSA_V3: { LogPrintfInfo("[Code Object V3, target id:%s]", target_id.c_str()); if (isSramEccSupported) { if (ehdr->e_flags & EF_AMDGPU_FEATURE_SRAMECC_V3) target_id += ":sramecc+"; else target_id += ":sramecc-"; } if (isXnackSupported) { if (ehdr->e_flags & EF_AMDGPU_FEATURE_XNACK_V3) target_id += ":xnack+"; else target_id += ":xnack-"; } break; } case ELFABIVERSION_AMDGPU_HSA_V4: case ELFABIVERSION_AMDGPU_HSA_V5: { if (ehdr->e_ident[EI_ABIVERSION] & ELFABIVERSION_AMDGPU_HSA_V4) { LogPrintfInfo("[Code Object V4, target id:%s]", target_id.c_str()); } else { LogPrintfInfo("[Code Object V5, target id:%s]", target_id.c_str()); } unsigned co_sram_value = (ehdr->e_flags) & EF_AMDGPU_FEATURE_SRAMECC_V4; if (co_sram_value == EF_AMDGPU_FEATURE_SRAMECC_OFF_V4) target_id += ":sramecc-"; else if (co_sram_value == EF_AMDGPU_FEATURE_SRAMECC_ON_V4) target_id += ":sramecc+"; unsigned co_xnack_value = (ehdr->e_flags) & EF_AMDGPU_FEATURE_XNACK_V4; if (co_xnack_value == EF_AMDGPU_FEATURE_XNACK_OFF_V4) target_id += ":xnack-"; else if (co_xnack_value == EF_AMDGPU_FEATURE_XNACK_ON_V4) target_id += ":xnack+"; break; } default: { return false; } } return true; } // Consumes the 
string 'consume_' from the starting of the given input // eg: input = amdgcn-amd-amdhsa--gfx908 and consume_ is amdgcn-amd-amdhsa-- // input will become gfx908. static bool consume(std::string& input, std::string consume_) { if (input.substr(0, consume_.size()) != consume_) { return false; } input = input.substr(consume_.size()); return true; } // Trim String till character, will be used to get gpuname // example: input is gfx908:sram-ecc+ and trim char is : // input will become sram-ecc+. static std::string trimName(std::string& input, char trim) { auto pos_ = input.find(trim); auto res = input; if (pos_ == std::string::npos) { input = ""; } else { res = input.substr(0, pos_); input = input.substr(pos_); } return res; } static char getFeatureValue(std::string& input, std::string feature) { char res = ' '; if (consume(input, std::move(feature))) { res = input[0]; input = input.substr(1); } return res; } static bool getTargetIDValue(std::string& input, std::string& processor, char& sramecc_value, char& xnack_value) { processor = trimName(input, ':'); sramecc_value = getFeatureValue(input, std::string(":sramecc")); if (sramecc_value != ' ' && sramecc_value != '+' && sramecc_value != '-') return false; xnack_value = getFeatureValue(input, std::string(":xnack")); if (xnack_value != ' ' && xnack_value != '+' && xnack_value != '-') return false; return true; } static bool getTripleTargetID(std::string bundled_co_entry_id, const void* code_object, std::string& co_triple_target_id) { std::string offload_kind = trimName(bundled_co_entry_id, '-'); if (offload_kind != OFFLOAD_KIND_HIPV4 && offload_kind != OFFLOAD_KIND_HIP && offload_kind != OFFLOAD_KIND_HCC) return false; if (offload_kind != OFFLOAD_KIND_HIPV4) return getTripleTargetIDFromCodeObject(code_object, co_triple_target_id); // For code object V4 onwards the bundled code object entry ID correctly // specifies the target triple. 
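// For example (hypothetical values): an entry id of
//   "hipv4-amdgcn-amd-amdhsa--gfx90a:xnack+"
// has "hipv4" consumed by trimName() above, leaving
// "-amdgcn-amd-amdhsa--gfx90a:xnack+"; the substr(1) below then drops the
// leading '-' to yield the triple target id.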
co_triple_target_id = bundled_co_entry_id.substr(1); return true; } bool isCodeObjectCompatibleWithDevice(std::string co_triple_target_id, std::string agent_triple_target_id) { // Primitive Check if (co_triple_target_id == agent_triple_target_id) return true; // Parse code object triple target id if (!consume(co_triple_target_id, std::string(OFFLOAD_KIND_HIP) + "-" + std::string(AMDGCN_TARGET_TRIPLE))) { return false; } std::string co_processor; char co_sram_ecc, co_xnack; if (!getTargetIDValue(co_triple_target_id, co_processor, co_sram_ecc, co_xnack)) { return false; } if (!co_triple_target_id.empty()) return false; // Parse agent isa triple target id if (!consume(agent_triple_target_id, std::string(AMDGCN_TARGET_TRIPLE) + '-')) { return false; } std::string agent_isa_processor; char isa_sram_ecc, isa_xnack; if (!getTargetIDValue(agent_triple_target_id, agent_isa_processor, isa_sram_ecc, isa_xnack)) { return false; } if (!agent_triple_target_id.empty()) return false; // Check for compatibility if (agent_isa_processor != co_processor) return false; if (co_sram_ecc != ' ') { if (co_sram_ecc != isa_sram_ecc) return false; } if (co_xnack != ' ') { if (co_xnack != isa_xnack) return false; } return true; } bool UnbundleBitCode(const std::vector& bundled_llvm_bitcode, const std::string& isa, size_t& co_offset, size_t& co_size) { std::string magic(bundled_llvm_bitcode.begin(), bundled_llvm_bitcode.begin() + bundle_magic_string_size); if (magic.compare(CLANG_OFFLOAD_BUNDLER_MAGIC_STR)) { // Handle case where the whole file is unbundled return true; } std::string bundled_llvm_bitcode_s(bundled_llvm_bitcode.begin(), bundled_llvm_bitcode.begin() + bundled_llvm_bitcode.size()); const void* data = reinterpret_cast(bundled_llvm_bitcode_s.c_str()); const auto obheader = reinterpret_cast(data); const auto* desc = &obheader->desc[0]; for (uint64_t idx = 0; idx < obheader->numOfCodeObjects; ++idx, desc = reinterpret_cast( reinterpret_cast(&desc->bundleEntryId[0]) + desc->bundleEntryIdSize)) { const void* image = reinterpret_cast(reinterpret_cast(obheader) + desc->offset); const size_t image_size = desc->size; std::string bundleEntryId{desc->bundleEntryId, desc->bundleEntryIdSize}; // Check if the device id and code object id are compatible if (isCodeObjectCompatibleWithDevice(bundleEntryId, isa)) { co_offset = (reinterpret_cast(image) - reinterpret_cast(data)); co_size = image_size; break; } } return true; } bool addCodeObjData(amd_comgr_data_set_t& input, const std::vector& source, const std::string& name, const amd_comgr_data_kind_t type) { amd_comgr_data_t data; if (auto res = amd::Comgr::create_data(type, &data); res != AMD_COMGR_STATUS_SUCCESS) { return false; } if (auto res = amd::Comgr::set_data(data, source.size(), source.data()); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::release_data(data); return false; } if (auto res = amd::Comgr::set_data_name(data, name.c_str()); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::release_data(data); return false; } if (auto res = amd::Comgr::data_set_add(input, data); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::release_data(data); return false; } amd::Comgr::release_data(data); // Release from our end after setting the input return true; } bool extractBuildLog(amd_comgr_data_set_t dataSet, std::string& buildLog) { size_t count; if (auto res = amd::Comgr::action_data_count(dataSet, AMD_COMGR_DATA_KIND_LOG, &count); res != AMD_COMGR_STATUS_SUCCESS) { return false; } std::vector log; if (count > 0) { if (!extractByteCodeBinary(dataSet, AMD_COMGR_DATA_KIND_LOG, 
log)) return false; buildLog.insert(buildLog.end(), log.data(), log.data() + log.size()); } return true; } bool extractByteCodeBinary(const amd_comgr_data_set_t inDataSet, const amd_comgr_data_kind_t dataKind, std::vector& bin) { amd_comgr_data_t binaryData; if (auto res = amd::Comgr::action_data_get_data(inDataSet, dataKind, 0, &binaryData); res != AMD_COMGR_STATUS_SUCCESS) { return false; } size_t binarySize = 0; if (auto res = amd::Comgr::get_data(binaryData, &binarySize, NULL); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::release_data(binaryData); return false; } size_t bufSize = (dataKind == AMD_COMGR_DATA_KIND_LOG) ? binarySize + 1 : binarySize; char* binary = new char[bufSize]; if (binary == nullptr) { amd::Comgr::release_data(binaryData); return false; } if (auto res = amd::Comgr::get_data(binaryData, &binarySize, binary); res != AMD_COMGR_STATUS_SUCCESS) { delete[] binary; amd::Comgr::release_data(binaryData); return false; } if (dataKind == AMD_COMGR_DATA_KIND_LOG) { binary[binarySize] = '\0'; } amd::Comgr::release_data(binaryData); bin.reserve(binarySize); bin.assign(binary, binary + binarySize); delete[] binary; return true; } bool createAction(amd_comgr_action_info_t& action, std::vector& options, const std::string& isa, const amd_comgr_language_t lang) { if (auto res = amd::Comgr::create_action_info(&action); res != AMD_COMGR_STATUS_SUCCESS) { return false; } if (lang != AMD_COMGR_LANGUAGE_NONE) { if (auto res = amd::Comgr::action_info_set_language(action, lang); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::destroy_action_info(action); return false; } } if (auto res = amd::Comgr::action_info_set_isa_name(action, isa.c_str()); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::destroy_action_info(action); return false; } std::vector optionsArgv; optionsArgv.reserve(options.size()); for (auto& option : options) { optionsArgv.push_back(option.c_str()); } if (auto res = amd::Comgr::action_info_set_option_list(action, optionsArgv.data(), optionsArgv.size()); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::destroy_action_info(action); return res; } if (auto res = amd::Comgr::action_info_set_logging(action, true); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::destroy_action_info(action); return res; } return AMD_COMGR_STATUS_SUCCESS; } bool compileToBitCode(const amd_comgr_data_set_t compileInputs, const std::string& isa, std::vector& compileOptions, std::string& buildLog, std::vector& LLVMBitcode) { amd_comgr_language_t lang = AMD_COMGR_LANGUAGE_HIP; amd_comgr_action_info_t action; amd_comgr_data_set_t output; amd_comgr_data_set_t input = compileInputs; if (auto res = createAction(action, compileOptions, isa, lang); res != AMD_COMGR_STATUS_SUCCESS) { return false; } if (auto res = amd::Comgr::create_data_set(&output); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::destroy_action_info(action); return false; } if (auto res = amd::Comgr::do_action(AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC, action, input, output); res != AMD_COMGR_STATUS_SUCCESS) { extractBuildLog(output, buildLog); amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); return false; } if (!extractBuildLog(output, buildLog)) { amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); return false; } if (!extractByteCodeBinary(output, AMD_COMGR_DATA_KIND_BC, LLVMBitcode)) { amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); return false; } // Clean up amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); return 
true; } bool linkLLVMBitcode(const amd_comgr_data_set_t linkInputs, const std::string& isa, std::vector& linkOptions, std::string& buildLog, std::vector& LinkedLLVMBitcode) { amd_comgr_language_t lang = AMD_COMGR_LANGUAGE_HIP; amd_comgr_action_info_t action; if (auto res = createAction(action, linkOptions, isa, AMD_COMGR_LANGUAGE_HIP); res != AMD_COMGR_STATUS_SUCCESS) { return false; } amd_comgr_data_set_t output; if (auto res = amd::Comgr::create_data_set(&output); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::destroy_action_info(action); return false; } if (auto res = amd::Comgr::do_action(AMD_COMGR_ACTION_LINK_BC_TO_BC, action, linkInputs, output); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); return false; } if (!extractBuildLog(output, buildLog)) { amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); return false; } if (!extractByteCodeBinary(output, AMD_COMGR_DATA_KIND_BC, LinkedLLVMBitcode)) { amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); return false; } amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); return true; } bool createExecutable(const amd_comgr_data_set_t linkInputs, const std::string& isa, std::vector& exeOptions, std::string& buildLog, std::vector& executable) { amd_comgr_action_info_t action; if (auto res = createAction(action, exeOptions, isa); res != AMD_COMGR_STATUS_SUCCESS) { return false; } amd_comgr_data_set_t relocatableData; if (auto res = amd::Comgr::create_data_set(&relocatableData); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::destroy_action_info(action); return false; } if (auto res = amd::Comgr::do_action(AMD_COMGR_ACTION_CODEGEN_BC_TO_RELOCATABLE, action, linkInputs, relocatableData); res != AMD_COMGR_STATUS_SUCCESS) { extractBuildLog(relocatableData, buildLog); amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(relocatableData); return false; } if (!extractBuildLog(relocatableData, buildLog)) { amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(relocatableData); return false; } amd::Comgr::destroy_action_info(action); std::vector emptyOpt; if (auto res = createAction(action, emptyOpt, isa); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::destroy_data_set(relocatableData); return false; } amd_comgr_data_set_t output; if (auto res = amd::Comgr::create_data_set(&output); res != AMD_COMGR_STATUS_SUCCESS) { amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(relocatableData); return false; } if (auto res = amd::Comgr::do_action(AMD_COMGR_ACTION_LINK_RELOCATABLE_TO_EXECUTABLE, action, relocatableData, output); res != AMD_COMGR_STATUS_SUCCESS) { extractBuildLog(output, buildLog); amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); amd::Comgr::destroy_data_set(relocatableData); return false; } if (!extractBuildLog(output, buildLog)) { amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); amd::Comgr::destroy_data_set(relocatableData); return false; } if (!extractByteCodeBinary(output, AMD_COMGR_DATA_KIND_EXECUTABLE, executable)) { amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); amd::Comgr::destroy_data_set(relocatableData); return false; } amd::Comgr::destroy_action_info(action); amd::Comgr::destroy_data_set(output); amd::Comgr::destroy_data_set(relocatableData); return true; } void GenerateUniqueFileName(std::string& name) { #if !defined(_WIN32) char* name_template = 
const_cast<char*>(name.c_str());
  int temp_fd = mkstemp(name_template);
#else
  char* name_template = new char[name.length() + 1];
  strcpy_s(name_template, name.length() + 1, name.data());
  size_t sizeinchars = strnlen(name_template, 20) + 1;
  _mktemp_s(name_template, sizeinchars);
#endif
  name = name_template;
#if !defined(_WIN32)
  // mkstemp created the file; remove it, only the unique name is needed.
  unlink(name_template);
  close(temp_fd);
#else
  delete[] name_template;  // the temporary buffer was previously leaked
#endif
}

bool dumpIsaFromBC(const amd_comgr_data_set_t isaInputs, const std::string& isa,
                   std::vector<std::string>& exeOptions, std::string name,
                   std::string& buildLog) {
  amd_comgr_action_info_t action;
  if (auto res = createAction(action, exeOptions, isa); res != AMD_COMGR_STATUS_SUCCESS) {
    return false;
  }

  amd_comgr_data_set_t isaData;
  if (auto res = amd::Comgr::create_data_set(&isaData); res != AMD_COMGR_STATUS_SUCCESS) {
    amd::Comgr::destroy_action_info(action);
    return false;
  }

  if (auto res =
          amd::Comgr::do_action(AMD_COMGR_ACTION_CODEGEN_BC_TO_ASSEMBLY, action, isaInputs,
                                isaData);
      res != AMD_COMGR_STATUS_SUCCESS) {
    extractBuildLog(isaData, buildLog);
    amd::Comgr::destroy_action_info(action);
    amd::Comgr::destroy_data_set(isaData);
    return false;
  }

  std::vector<char> isaOutput;
  if (!extractByteCodeBinary(isaData, AMD_COMGR_DATA_KIND_SOURCE, isaOutput)) {
    amd::Comgr::destroy_action_info(action);
    amd::Comgr::destroy_data_set(isaData);
    return false;
  }

  if (name.size() == 0) {
    // Generate a unique name if the program name is not specified by the user
    name = std::string("hiprtcXXXXXX");
    GenerateUniqueFileName(name);
  }

  std::string isaName = isa;
#if defined(_WIN32)
  // Replace special characters that are not supported by the Windows FS.
  std::replace(isaName.begin(), isaName.end(), ':', '@');
#endif
  auto isaFileName = name + std::string("-hip-") + isaName + ".s";
  std::ofstream f(isaFileName.c_str(), std::ios::trunc | std::ios::binary);
  if (f.is_open()) {
    f.write(isaOutput.data(), isaOutput.size());
    f.close();
  } else {
    buildLog += "Warning: writing isa file failed.\n";
    amd::Comgr::destroy_action_info(action);
    amd::Comgr::destroy_data_set(isaData);
    return false;
  }

  amd::Comgr::destroy_action_info(action);
  amd::Comgr::destroy_data_set(isaData);
  return true;
}

bool demangleName(const std::string& mangledName, std::string& demangledName) {
  amd_comgr_data_t mangled_data;
  amd_comgr_data_t demangled_data;

  if (AMD_COMGR_STATUS_SUCCESS !=
      amd::Comgr::create_data(AMD_COMGR_DATA_KIND_BYTES, &mangled_data))
    return false;

  if (AMD_COMGR_STATUS_SUCCESS !=
      amd::Comgr::set_data(mangled_data, mangledName.size(), mangledName.c_str())) {
    amd::Comgr::release_data(mangled_data);
    return false;
  }

  if (AMD_COMGR_STATUS_SUCCESS != amd::Comgr::demangle_symbol_name(mangled_data, &demangled_data)) {
    amd::Comgr::release_data(mangled_data);
    return false;
  }

  size_t demangled_size = 0;
  if (AMD_COMGR_STATUS_SUCCESS != amd::Comgr::get_data(demangled_data, &demangled_size, NULL)) {
    amd::Comgr::release_data(mangled_data);
    amd::Comgr::release_data(demangled_data);
    return false;
  }

  demangledName.resize(demangled_size);
  if (AMD_COMGR_STATUS_SUCCESS !=
      amd::Comgr::get_data(demangled_data, &demangled_size,
                           const_cast<char*>(demangledName.data()))) {
    amd::Comgr::release_data(mangled_data);
    amd::Comgr::release_data(demangled_data);
    return false;
  }

  amd::Comgr::release_data(mangled_data);
  amd::Comgr::release_data(demangled_data);
  return true;
}

std::string handleMangledName(std::string loweredName) {
  if (loweredName.empty()) {
    return loweredName;
  }

  if (loweredName.find(".kd") != std::string::npos) {
    return {};
  }

  if (loweredName.find("void ") == 0) {
    loweredName.erase(0, strlen("void "));
  }

  auto dx{loweredName.find_first_of("(<")};
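  // `dx` is the first '(' or '<' in the demangled name. The code below keeps a
  // template argument list but strips the parameter list, e.g. (hypothetical
  // names) "kernel<int>(float*)" -> "kernel<int>" and "foo(int, int)" -> "foo".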
if (dx == std::string::npos) { return loweredName; } if (loweredName[dx] == '<') { uint32_t count = 1; do { ++dx; count += (loweredName[dx] == '<') ? 1 : ((loweredName[dx] == '>') ? -1 : 0); } while (count); loweredName.erase(++dx); } else { loweredName.erase(dx); } return loweredName; } bool fillMangledNames(std::vector& dataVec, std::vector& mangledNames, bool isBitcode) { amd_comgr_data_t dataObject; if (auto res = amd::Comgr::create_data( isBitcode ? AMD_COMGR_DATA_KIND_BC : AMD_COMGR_DATA_KIND_EXECUTABLE, &dataObject); res != AMD_COMGR_STATUS_SUCCESS) { return false; } if (auto res = amd::Comgr::set_data(dataObject, dataVec.size(), dataVec.data())) { amd::Comgr::release_data(dataObject); return false; } size_t Count; if (auto res = amd::Comgr::populate_mangled_names(dataObject, &Count)) { amd::Comgr::release_data(dataObject); return false; } for (size_t i = 0; i < Count; i++) { size_t Size; if (auto res = amd::Comgr::get_mangled_name(dataObject, i, &Size, NULL)) { amd::Comgr::release_data(dataObject); return false; } char* mName = new char[Size](); if (auto res = amd::Comgr::get_mangled_name(dataObject, i, &Size, mName)) { amd::Comgr::release_data(dataObject); return false; } mangledNames.push_back(std::string(mName)); delete [] mName; } amd::Comgr::release_data(dataObject); return true; } bool getDemangledNames(const std::vector& mangledNames, std::map& demangledNames) { for (auto& i : mangledNames) { std::string demangledName; if (!demangleName(i, demangledName)) return false; demangledName = handleMangledName(demangledName); demangledName.erase(std::remove_if(demangledName.begin(), demangledName.end(), [](unsigned char c) { return std::isspace(c); }), demangledName.end()); if (auto dres = demangledNames.find(demangledName); dres != demangledNames.end()) { dres->second = i; } } return true; } } // namespace helpers } // namespace hiprtc clr-rocm-5.7.1/hipamd/src/hiprtc/hiprtcComgrHelper.hpp000066400000000000000000000065041450307266000227260ustar00rootroot00000000000000/* Copyright (c) 2022 - Present Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #pragma once #include #include #include "vdi_common.hpp" #include "utils/debug.hpp" #include "device/comgrctx.hpp" namespace hiprtc { namespace helpers { bool UnbundleBitCode(const std::vector& bundled_bit_code, const std::string& isa, size_t& co_offset, size_t& co_size); bool addCodeObjData(amd_comgr_data_set_t& input, const std::vector& source, const std::string& name, const amd_comgr_data_kind_t type); bool extractBuildLog(amd_comgr_data_set_t dataSet, std::string& buildLog); bool extractByteCodeBinary(const amd_comgr_data_set_t inDataSet, const amd_comgr_data_kind_t dataKind, std::vector& bin); bool createAction(amd_comgr_action_info_t& action, std::vector& options, const std::string& isa, const amd_comgr_language_t lang = AMD_COMGR_LANGUAGE_NONE); bool compileToBitCode(const amd_comgr_data_set_t compileInputs, const std::string& isa, std::vector& compileOptions, std::string& buildLog, std::vector& LLVMBitcode); bool linkLLVMBitcode(const amd_comgr_data_set_t linkInputs, const std::string& isa, std::vector& linkOptions, std::string& buildLog, std::vector& LinkedLLVMBitcode); bool createExecutable(const amd_comgr_data_set_t linkInputs, const std::string& isa, std::vector& exeOptions, std::string& buildLog, std::vector& executable); bool dumpIsaFromBC(const amd_comgr_data_set_t isaInputs, const std::string& isa, std::vector& exeOptions, std::string name, std::string& buildLog); bool demangleName(const std::string& mangledName, std::string& demangledName); std::string handleMangledName(std::string loweredName); bool fillMangledNames(std::vector& executable, std::vector& mangledNames, bool isBitcode); bool getDemangledNames(const std::vector& mangledNames, std::map& demangledNames); void GenerateUniqueFileName(std::string& name); } // namespace helpers } // namespace hiprtc clr-rocm-5.7.1/hipamd/src/hiprtc/hiprtcInternal.cpp000066400000000000000000000571261450307266000222740ustar00rootroot00000000000000/* Copyright (c) 2022 - 2023 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "hiprtcInternal.hpp" #include #include #include #include #include "vdi_common.hpp" #include "utils/flags.hpp" namespace hiprtc { using namespace helpers; std::vector getLinkOptions(const LinkArguments& args) { std::vector res; auto irArgCount = args.linkerIRArgCount(); if (irArgCount > 0) { res.reserve(irArgCount); auto irArg = args.linkerIRArg(); for (size_t i = 0; i < irArgCount; i++) { res.emplace_back(std::string(irArg[i])); } } return res; } // RTC Program Member Functions RTCProgram::RTCProgram(std::string name) : name_(name) { constexpr bool kComgrVersioned = true; std::call_once(amd::Comgr::initialized, amd::Comgr::LoadLib, kComgrVersioned); if (amd::Comgr::create_data_set(&exec_input_) != AMD_COMGR_STATUS_SUCCESS) { crashWithMessage("Failed to allocate internal hiprtc structure"); } } bool RTCProgram::findIsa() { const char* libName; #ifdef _WIN32 libName = "amdhip64.dll"; #else libName = "libamdhip64.so"; #endif void* handle = amd::Os::loadLibrary(libName); if (!handle) { LogInfo("hip runtime failed to load using dlopen"); build_log_ += "hip runtime failed to load.\n" "Error: Please provide architecture for which code is to be " "generated.\n"; return false; } void* sym_hipGetDevice = amd::Os::getSymbol(handle, "hipGetDevice"); void* sym_hipGetDeviceProperties = amd::Os::getSymbol(handle, "hipGetDeviceProperties"); if (sym_hipGetDevice == nullptr || sym_hipGetDeviceProperties == nullptr) { LogInfo("ISA cannot be found to dlsym failure"); build_log_ += "ISA cannot be found from hip runtime.\n" "Error: Please provide architecture for which code is to be " "generated.\n"; return false; } hipError_t (*dyn_hipGetDevice)(int*) = reinterpret_cast(sym_hipGetDevice); hipError_t (*dyn_hipGetDeviceProperties)(hipDeviceProp_t*, int) = reinterpret_cast(sym_hipGetDeviceProperties); int device; hipError_t status = dyn_hipGetDevice(&device); if (status != hipSuccess) { return false; } hipDeviceProp_t props; status = dyn_hipGetDeviceProperties(&props, device); if (status != hipSuccess) { return false; } isa_ = "amdgcn-amd-amdhsa--"; isa_.append(props.gcnArchName); amd::Os::unloadLibrary(handle); return true; } // RTC Compile Program Member Functions RTCCompileProgram::RTCCompileProgram(std::string name_) : RTCProgram(name_), fgpu_rdc_(false) { if ((amd::Comgr::create_data_set(&compile_input_) != AMD_COMGR_STATUS_SUCCESS) || (amd::Comgr::create_data_set(&link_input_) != AMD_COMGR_STATUS_SUCCESS)) { crashWithMessage("Failed to allocate internal hiprtc structure"); } // Add internal header if (!addBuiltinHeader()) { crashWithMessage("Unable to add internal header"); } // Add compile options const std::string hipVerOpt{"--hip-version=" + std::to_string(HIP_VERSION_MAJOR) + '.' + std::to_string(HIP_VERSION_MINOR) + '.' 
+ std::to_string(HIP_VERSION_PATCH)}; const std::string hipVerMajor{"-DHIP_VERSION_MAJOR=" + std::to_string(HIP_VERSION_MAJOR)}; const std::string hipVerMinor{"-DHIP_VERSION_MINOR=" + std::to_string(HIP_VERSION_MINOR)}; const std::string hipVerPatch{"-DHIP_VERSION_PATCH=" + std::to_string(HIP_VERSION_PATCH)}; compile_options_.reserve(20); // count of options below compile_options_.push_back("-O3"); if (GPU_ENABLE_WGP_MODE) compile_options_.push_back("-mcumode"); compile_options_.push_back(hipVerOpt); compile_options_.push_back(hipVerMajor); compile_options_.push_back(hipVerMinor); compile_options_.push_back(hipVerPatch); compile_options_.push_back("-D__HIPCC_RTC__"); compile_options_.push_back("-include"); compile_options_.push_back("hiprtc_runtime.h"); compile_options_.push_back("-std=c++14"); compile_options_.push_back("-nogpuinc"); compile_options_.push_back("-Wno-gnu-line-marker"); compile_options_.push_back("-Wno-missing-prototypes"); #ifdef _WIN32 compile_options_.push_back("-target"); compile_options_.push_back("x86_64-pc-windows-msvc"); compile_options_.push_back("-fms-extensions"); compile_options_.push_back("-fms-compatibility"); #endif exe_options_.push_back("-O3"); } bool RTCCompileProgram::addSource(const std::string& source, const std::string& name) { if (source.size() == 0 || name.size() == 0) { LogError("Error in hiprtc: source or name is of size 0 in addSource"); return false; } source_code_ += source; source_name_ = name; return true; } // addSource_impl is a different function because we need to add source when we track mangled // objects bool RTCCompileProgram::addSource_impl() { std::vector vsource(source_code_.begin(), source_code_.end()); if (!addCodeObjData(compile_input_, vsource, source_name_, AMD_COMGR_DATA_KIND_SOURCE)) { return false; } return true; } bool RTCCompileProgram::addHeader(const std::string& source, const std::string& name) { if (source.size() == 0 || name.size() == 0) { LogError("Error in hiprtc: source or name is of size 0 in addHeader"); return false; } std::vector vsource(source.begin(), source.end()); if (!addCodeObjData(compile_input_, vsource, name, AMD_COMGR_DATA_KIND_INCLUDE)) { return false; } return true; } bool RTCCompileProgram::addBuiltinHeader() { std::vector source(__hipRTC_header, __hipRTC_header + __hipRTC_header_size); std::string name{"hiprtc_runtime.h"}; if (!addCodeObjData(compile_input_, source, name, AMD_COMGR_DATA_KIND_INCLUDE)) { return false; } return true; } bool RTCCompileProgram::findLLVMOptions(const std::vector& options, std::vector& llvm_options) { for (size_t i = 0; i < options.size(); ++i) { if (options[i] == "-mllvm") { if (options.size() == (i + 1)) { LogInfo( "-mllvm option passed by the app, it comes as a pair but there is no option after " "this"); return false; } llvm_options.push_back(options[i]); llvm_options.push_back(options[i + 1]); } } return true; } bool RTCCompileProgram::transformOptions(std::vector& compile_options) { auto getValueOf = [](const std::string& option) { std::string res; auto f = std::find(option.begin(), option.end(), '='); if (f != option.end()) res = std::string(f + 1, option.end()); return res; }; for (auto& i : compile_options) { if (i == "-hip-pch") { LogInfo( "-hip-pch is deprecated option, has no impact on execution of new hiprtc programs, it " "can be removed"); i.clear(); continue; } // Some rtc samples use --gpu-architecture if (i.rfind("--gpu-architecture=", 0) == 0) { LogInfo("--gpu-architecture is nvcc option, transforming it to --offload-arch option"); auto val = 
getValueOf(i); i = "--offload-arch=" + val; continue; } if (i == "--save-temps") { settings_.dumpISA = true; continue; } } // Removed consumed options compile_options.erase( std::remove(compile_options.begin(), compile_options.end(), std::string("")), compile_options.end()); if (auto res = std::find_if( compile_options.begin(), compile_options.end(), [](const std::string& str) { return str.find("--offload-arch=") != std::string::npos; }); res != compile_options.end()) { auto isaName = getValueOf(*res); isa_ = "amdgcn-amd-amdhsa--" + isaName; settings_.offloadArchProvided = true; return true; } // App has not provided the gpu archiecture, need to find it return findIsa(); } static inline uint getArchMajorVersion(std::string &isa) { if (const amd::Isa *isaIter = amd::Isa::findIsa(isa.data())) { return isaIter->versionMajor(); } return static_cast(-1); } amd::Monitor RTCProgram::lock_("HIPRTC Program", true); bool RTCCompileProgram::compile(const std::vector& options, bool fgpu_rdc) { if (!addSource_impl()) { LogError("Error in hiprtc: unable to add source code"); return false; } fgpu_rdc_ = fgpu_rdc; // Append compile options std::vector compileOpts(compile_options_); compileOpts.reserve(compile_options_.size() + options.size() + 2); compileOpts.insert(compileOpts.end(), options.begin(), options.end()); if (!fgpu_rdc_) { compileOpts.push_back("-Xclang"); compileOpts.push_back("-disable-llvm-passes"); } if (!transformOptions(compileOpts)) { LogError("Error in hiprtc: unable to transform options"); return false; } // Decide whether to enable wave64 compilation auto majorVer = getArchMajorVersion(isa_); if (majorVer <= 9 || !GPU_ENABLE_WAVE32_MODE) { if (majorVer > 9) { LogWarning("Wavefront size 64 is experimental for gfx10 and above. Warp " "functions may not work"); } compileOpts.push_back("-mwavefrontsize64"); link_options_.push_back("wavefrontsize64"); } if (!compileToBitCode(compile_input_, isa_, compileOpts, build_log_, LLVMBitcode_)) { LogError("Error in hiprtc: unable to compile source to bitcode"); return false; } if (fgpu_rdc_) { std::vector mangledNames; if (!fillMangledNames(LLVMBitcode_, mangledNames, true)) { LogError("Error in hiprtc: unable to fill mangled names"); return false; } if (!getDemangledNames(mangledNames, demangled_names_)) { LogError("Error in hiprtc: unable to get demangled names"); return false; } return true; } std::string linkFileName = "linked"; if (!addCodeObjData(link_input_, LLVMBitcode_, linkFileName, AMD_COMGR_DATA_KIND_BC)) { LogError("Error in hiprtc: unable to add linked code object"); return false; } std::vector LinkedLLVMBitcode; if (!linkLLVMBitcode(link_input_, isa_, link_options_, build_log_, LinkedLLVMBitcode)) { LogError("Error in hiprtc: unable to add device libs to linked bitcode"); return false; } std::string linkedFileName = "LLVMBitcode.bc"; if (!addCodeObjData(exec_input_, LinkedLLVMBitcode, linkedFileName, AMD_COMGR_DATA_KIND_BC)) { LogError("Error in hiprtc: unable to add device libs linked code object"); return false; } std::vector llvmOptions; // Find the -mllvm options passed by the app such as "-mllvm" "-amdgpu-early-inline-all=true" if (!findLLVMOptions(options, llvmOptions)) { LogError("Error in hiprtc: unable to match -mllvm options"); return false; } std::vector exeOpts(exe_options_); exeOpts.reserve(exeOpts.size() + llvmOptions.size() + 2); // Add these options by default for optimizations during BC to Relocatable phase. 
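// Note: -amdgpu-internalize-symbols lets the backend treat module symbols as
// internal (as the option name suggests) so that anything unreferenced by a
// kernel can be dropped during the BC-to-relocatable step; this is reasonable
// here because the non-RDC path links a single self-contained module.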
exeOpts.push_back("-mllvm"); exeOpts.push_back("-amdgpu-internalize-symbols"); // User provided -mllvm options are appended at the end since they can override the above // default options if necessary exeOpts.insert(exeOpts.end(), llvmOptions.begin(), llvmOptions.end()); if (settings_.dumpISA) { if (!dumpIsaFromBC(exec_input_, isa_, exeOpts, name_, build_log_)) { LogError("Error in hiprtc: unable to dump isa code"); return false; } } if (!createExecutable(exec_input_, isa_, exeOpts, build_log_, executable_)) { LogError("Error in hiprtc: unable to create executable"); return false; } std::vector mangledNames; if (!fillMangledNames(executable_, mangledNames, false)) { LogError("Error in hiprtc: unable to fill mangled names"); return false; } if (!getDemangledNames(mangledNames, demangled_names_)) { LogError("Error in hiprtc: unable to get demangled names"); return false; } return true; } void RTCCompileProgram::stripNamedExpression(std::string& strippedName) { if (strippedName.back() == ')') { strippedName.pop_back(); strippedName.erase(0, strippedName.find('(')); } if (strippedName.front() == '&') { strippedName.erase(0, 1); } // Removes the spaces from strippedName if present strippedName.erase(std::remove_if(strippedName.begin(), strippedName.end(), [](unsigned char c) { return std::isspace(c); }), strippedName.end()); } bool RTCCompileProgram::trackMangledName(std::string& name) { amd::ScopedLock lock(lock_); if (name.size() == 0) return false; std::string strippedNameNoSpace = name; stripNamedExpression(strippedNameNoSpace); stripped_names_.insert(std::pair(name, strippedNameNoSpace)); demangled_names_.insert(std::pair(strippedNameNoSpace, "")); const auto var{"__hiprtc_" + std::to_string(stripped_names_.size())}; const auto code{"\nextern \"C\" constexpr auto " + var + " = " + name + ";\n"}; source_code_ += code; return true; } bool RTCCompileProgram::getMangledName(const char* name_expression, const char** loweredName) { std::string strippedName = name_expression; stripNamedExpression(strippedName); if (auto dres = demangled_names_.find(strippedName); dres != demangled_names_.end()) { if (dres->second.size() != 0) { *loweredName = dres->second.c_str(); return true; } else return false; } return false; } bool RTCCompileProgram::GetBitcode(char* bitcode) { if (!fgpu_rdc_ || LLVMBitcode_.size() <= 0) { return false; } std::copy(LLVMBitcode_.begin(), LLVMBitcode_.end(), bitcode); return true; } bool RTCCompileProgram::GetBitcodeSize(size_t* bitcode_size) { if (!fgpu_rdc_ || LLVMBitcode_.size() <= 0) { return false; } *bitcode_size = LLVMBitcode_.size(); return true; } // RTC Link Program Member Functions RTCLinkProgram::RTCLinkProgram(std::string name) : RTCProgram(name) { if (amd::Comgr::create_data_set(&link_input_) != AMD_COMGR_STATUS_SUCCESS) { crashWithMessage("Failed to allocate internal hiprtc structure"); } } bool RTCLinkProgram::AddLinkerOptions(unsigned int num_options, hiprtcJIT_option* options_ptr, void** options_vals_ptr) { for (size_t opt_idx = 0; opt_idx < num_options; ++opt_idx) { if (options_vals_ptr[opt_idx] == nullptr) { LogError("Options value can not be nullptr"); return false; } switch (options_ptr[opt_idx]) { case HIPRTC_JIT_MAX_REGISTERS: link_args_.max_registers_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_THREADS_PER_BLOCK: link_args_.threads_per_block_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_WALL_TIME: link_args_.wall_time_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case 
HIPRTC_JIT_INFO_LOG_BUFFER: link_args_.info_log_ = (reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_INFO_LOG_BUFFER_SIZE_BYTES: link_args_.info_log_size_ = (reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_ERROR_LOG_BUFFER: link_args_.error_log_ = reinterpret_cast(options_vals_ptr[opt_idx]); break; case HIPRTC_JIT_ERROR_LOG_BUFFER_SIZE_BYTES: link_args_.error_log_size_ = (reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_OPTIMIZATION_LEVEL: link_args_.optimization_level_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_TARGET_FROM_HIPCONTEXT: link_args_.target_from_hip_context_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_TARGET: link_args_.jit_target_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_FALLBACK_STRATEGY: link_args_.fallback_strategy_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_GENERATE_DEBUG_INFO: link_args_.generate_debug_info_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_LOG_VERBOSE: link_args_.log_verbose_ = reinterpret_cast(options_vals_ptr[opt_idx]); break; case HIPRTC_JIT_GENERATE_LINE_INFO: link_args_.generate_line_info_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_CACHE_MODE: link_args_.cache_mode_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_NEW_SM3X_OPT: link_args_.sm3x_opt_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_FAST_COMPILE: link_args_.fast_compile_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_GLOBAL_SYMBOL_NAMES: link_args_.global_symbol_names_ = reinterpret_cast(options_vals_ptr[opt_idx]); break; case HIPRTC_JIT_GLOBAL_SYMBOL_ADDRESS: link_args_.global_symbol_addresses_ = reinterpret_cast(options_vals_ptr[opt_idx]); break; case HIPRTC_JIT_GLOBAL_SYMBOL_COUNT: link_args_.global_symbol_count_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_LTO: link_args_.lto_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_FTZ: link_args_.ftz_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_PREC_DIV: link_args_.prec_div_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_PREC_SQRT: link_args_.prec_sqrt_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_FMA: link_args_.fma_ = *(reinterpret_cast(options_vals_ptr[opt_idx])); break; case HIPRTC_JIT_IR_TO_ISA_OPT_EXT: link_args_.linker_ir2isa_args_ = reinterpret_cast(options_vals_ptr[opt_idx]); break; case HIPRTC_JIT_IR_TO_ISA_OPT_COUNT_EXT: link_args_.linker_ir2isa_args_count_ = reinterpret_cast(options_vals_ptr[opt_idx]); break; default: break; } } return true; } amd_comgr_data_kind_t RTCLinkProgram::GetCOMGRDataKind(hiprtcJITInputType input_type) { amd_comgr_data_kind_t data_kind = AMD_COMGR_DATA_KIND_UNDEF; // Map the hiprtc input type to comgr data kind switch (input_type) { case HIPRTC_JIT_INPUT_LLVM_BITCODE: data_kind = AMD_COMGR_DATA_KIND_BC; break; case HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE: data_kind = HIPRTC_USE_RUNTIME_UNBUNDLER ? 
AMD_COMGR_DATA_KIND_BC : AMD_COMGR_DATA_KIND_BC_BUNDLE; break; case HIPRTC_JIT_INPUT_LLVM_ARCHIVES_OF_BUNDLED_BITCODE: data_kind = AMD_COMGR_DATA_KIND_AR_BUNDLE; break; default: LogError("Cannot find the corresponding comgr data kind"); break; } return data_kind; } bool RTCLinkProgram::AddLinkerDataImpl(std::vector& link_data, hiprtcJITInputType input_type, std::string& link_file_name) { std::vector llvm_bitcode; // If this is bundled bitcode then unbundle this. if (HIPRTC_USE_RUNTIME_UNBUNDLER && input_type == HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE) { if (!findIsa()) { return false; } size_t co_offset = 0; size_t co_size = 0; if (!UnbundleBitCode(link_data, isa_, co_offset, co_size)) { LogError("Error in hiprtc: unable to unbundle the llvm bitcode"); return false; } llvm_bitcode.assign(link_data.begin() + co_offset, link_data.begin() + co_offset + co_size); } else { llvm_bitcode.assign(link_data.begin(), link_data.end()); } amd_comgr_data_kind_t data_kind; if ((data_kind = GetCOMGRDataKind(input_type)) == AMD_COMGR_DATA_KIND_UNDEF) { LogError("Cannot find the correct COMGR data kind"); return false; } if (!addCodeObjData(link_input_, llvm_bitcode, link_file_name, data_kind)) { LogError("Error in hiprtc: unable to add linked code object"); return false; } return true; } bool RTCLinkProgram::AddLinkerFile(std::string file_path, hiprtcJITInputType input_type) { std::ifstream file_stream{file_path, std::ios_base::in | std::ios_base::binary}; if (!file_stream.good()) { return false; } file_stream.seekg(0, std::ios::end); std::streampos file_size = file_stream.tellg(); file_stream.seekg(0, std::ios::beg); // Read the file contents std::vector link_file_info(file_size); file_stream.read(link_file_info.data(), file_size); file_stream.close(); std::string link_file_name("LinkerProgram"); return AddLinkerDataImpl(link_file_info, input_type, link_file_name); } bool RTCLinkProgram::AddLinkerData(void* image_ptr, size_t image_size, std::string link_file_name, hiprtcJITInputType input_type) { char* image_char_buf = reinterpret_cast(image_ptr); std::vector bundled_llvm_bitcode(image_char_buf, image_char_buf + image_size); return AddLinkerDataImpl(bundled_llvm_bitcode, input_type, link_file_name); } bool RTCLinkProgram::LinkComplete(void** bin_out, size_t* size_out) { if (!findIsa()) { return false; } std::vector linked_llvm_bitcode; std::vector linkopts; if (!linkLLVMBitcode(link_input_, isa_, linkopts, build_log_, linked_llvm_bitcode)) { LogError("Error in hiprtc: unable to add device libs to linked bitcode"); return false; } std::string linkedFileName = "LLVMBitcode.bc"; if (!addCodeObjData(exec_input_, linked_llvm_bitcode, linkedFileName, AMD_COMGR_DATA_KIND_BC)) { LogError("Error in hiprtc: unable to add linked bitcode"); return false; } std::vector exe_options = getLinkOptions(link_args_); exe_options.push_back("-O3"); LogPrintfInfo("Exe options forwarded to compiler: %s", [&]() { std::string ret; for (const auto& i : exe_options) { ret += i; ret += " "; } return ret; }() .c_str()); if (!createExecutable(exec_input_, isa_, exe_options, build_log_, executable_)) { LogError("Error in hiprtc: unable to create exectuable"); return false; } *size_out = executable_.size(); char* bin_out_c = new char[*size_out]; std::copy(executable_.begin(), executable_.end(), bin_out_c); *bin_out = reinterpret_cast(bin_out_c); return true; } } // namespace hiprtc clr-rocm-5.7.1/hipamd/src/hiprtc/hiprtcInternal.hpp000066400000000000000000000234661450307266000223010ustar00rootroot00000000000000/* Copyright (c) 2022 - 
Present Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #ifdef HIPRTC_USE_EXCEPTIONS #include #endif #include #include #include #include #include "top.hpp" #include "utils/debug.hpp" #include "utils/flags.hpp" #include "utils/macros.hpp" #ifdef __HIP_ENABLE_RTC extern "C" { extern const char __hipRTC_header[]; extern unsigned __hipRTC_header_size; } #endif #include "hiprtcComgrHelper.hpp" namespace hiprtc { namespace internal { template inline std::string ToString(T v) { std::ostringstream ss; ss << v; return ss.str(); } template inline std::string ToString(T* v) { std::ostringstream ss; if (v == nullptr) { ss << ""; } else { ss << v; } return ss.str(); }; inline std::string ToString() { return (""); } template inline std::string ToString(T first, Args... args) { return ToString(first) + ", " + ToString(args...); } } // namespace internal } // namespace hiprtc static amd::Monitor g_hiprtcInitlock{"hiprtcInit lock"}; #define HIPRTC_INIT_API_INTERNAL(...) \ amd::Thread* thread = amd::Thread::current(); \ if (!VDI_CHECK_THREAD(thread)) { \ ClPrint(amd::LOG_NONE, amd::LOG_ALWAYS, "An internal error has occurred." \ " This may be due to insufficient memory."); \ HIPRTC_RETURN(HIPRTC_ERROR_INTERNAL_ERROR); \ } \ amd::ScopedLock lock(g_hiprtcInitlock); \ if (!amd::Flag::init()) { \ HIPRTC_RETURN(HIPRTC_ERROR_INTERNAL_ERROR); \ } #define HIPRTC_INIT_API(...) 
\ HIPRTC_INIT_API_INTERNAL(0, __VA_ARGS__) \ ClPrint(amd::LOG_INFO, amd::LOG_API, "%s ( %s )", __func__, \ hiprtc::internal::ToString(__VA_ARGS__).c_str()); #define HIPRTC_RETURN(ret) \ hiprtc::tls.last_rtc_error_ = (ret); \ ClPrint(amd::LOG_INFO, amd::LOG_API, "%s: Returned %s", __func__, \ hiprtcGetErrorString(hiprtc::tls.last_rtc_error_)); \ return hiprtc::tls.last_rtc_error_; namespace hiprtc { static void crashWithMessage(std::string message) { #ifdef HIPRTC_USE_EXCEPTIONS throw std::runtime_error(message); #else guarantee(false, message.c_str()); #endif } struct Settings { bool dumpISA{false}; bool offloadArchProvided{false}; }; class RTCProgram { protected: // Lock and control variables static amd::Monitor lock_; static std::once_flag initialized_; RTCProgram(std::string name); ~RTCProgram() { amd::Comgr::destroy_data_set(exec_input_); } // Member Functions bool findIsa(); // Data Members std::string name_; std::string isa_; std::string build_log_; std::vector executable_; amd_comgr_data_set_t exec_input_; std::vector exe_options_; }; class RTCCompileProgram : public RTCProgram { // Private Data Members Settings settings_; std::string source_code_; std::string source_name_; std::map stripped_names_; std::map demangled_names_; std::vector compile_options_; std::vector link_options_; amd_comgr_data_set_t compile_input_; amd_comgr_data_set_t link_input_; bool fgpu_rdc_; std::vector LLVMBitcode_; // Private Member functions bool addSource_impl(); bool addBuiltinHeader(); bool transformOptions(std::vector& compile_options); bool findLLVMOptions(const std::vector& options, std::vector& llvm_options); RTCCompileProgram() = delete; RTCCompileProgram(RTCCompileProgram&) = delete; RTCCompileProgram& operator=(RTCCompileProgram&) = delete; public: RTCCompileProgram(std::string); ~RTCCompileProgram() { amd::Comgr::destroy_data_set(compile_input_); amd::Comgr::destroy_data_set(link_input_); } // Converters inline static hiprtcProgram as_hiprtcProgram(RTCCompileProgram* p) { return reinterpret_cast(p); } inline static RTCCompileProgram* as_RTCCompileProgram(hiprtcProgram& p) { return reinterpret_cast(p); } // Public Member Functions bool addSource(const std::string& source, const std::string& name); bool addHeader(const std::string& source, const std::string& name); bool compile(const std::vector& options, bool fgpu_rdc); bool getMangledName(const char* name_expression, const char** loweredName); bool trackMangledName(std::string& name); void stripNamedExpression(std::string& named_expression); bool GetBitcode(char* bitcode); bool GetBitcodeSize(size_t* bitcode_size); // Public Getter/Setters const std::vector& getExec() const { return executable_; } size_t getExecSize() const { return executable_.size(); } const std::string& getLog() const { return build_log_; } size_t getLogSize() const { return build_log_.size(); } }; // Linker Arguments passed via hipLinkCreate struct LinkArguments { unsigned int max_registers_; unsigned int threads_per_block_; float wall_time_; size_t info_log_size_; char* info_log_; size_t error_log_size_; char* error_log_; unsigned int optimization_level_; unsigned int target_from_hip_context_; unsigned int jit_target_; unsigned int fallback_strategy_; int generate_debug_info_; long log_verbose_; int generate_line_info_; unsigned int cache_mode_; bool sm3x_opt_; bool fast_compile_; const char** global_symbol_names_; void** global_symbol_addresses_; unsigned int global_symbol_count_; int lto_; int ftz_; int prec_div_; int prec_sqrt_; int fma_; const char** 
      linker_ir2isa_args_;
  size_t linker_ir2isa_args_count_;

  LinkArguments()
      : max_registers_{0},
        threads_per_block_{0},
        wall_time_{0.0f},
        info_log_size_{0},
        info_log_{nullptr},
        error_log_size_{0},
        error_log_{nullptr},
        optimization_level_{3},
        target_from_hip_context_{0},
        jit_target_{0},
        fallback_strategy_{0},
        generate_debug_info_{0},
        log_verbose_{0},
        generate_line_info_{0},
        cache_mode_{0},
        sm3x_opt_{false},
        fast_compile_{false},
        global_symbol_names_{nullptr},
        global_symbol_addresses_{nullptr},
        global_symbol_count_{0},
        lto_{0},
        ftz_{0},
        prec_div_{0},
        prec_sqrt_{0},
        fma_{0},
        linker_ir2isa_args_{nullptr},
        linker_ir2isa_args_count_{0} {}

  size_t linkerIRArgCount() const { return linker_ir2isa_args_count_; }
  const char** linkerIRArg() const { return linker_ir2isa_args_; }
};

class RTCLinkProgram : public RTCProgram {
  // Private Member Functions (forbid these function calls)
  RTCLinkProgram() = delete;
  RTCLinkProgram(RTCLinkProgram&) = delete;
  RTCLinkProgram& operator=(RTCLinkProgram&) = delete;

  amd_comgr_data_kind_t GetCOMGRDataKind(hiprtcJITInputType input_type);

  // Linker Arguments at hipLinkCreate
  LinkArguments link_args_;

  // Private Data Members
  amd_comgr_data_set_t link_input_;
  std::vector<std::string> link_options_;

  bool AddLinkerDataImpl(std::vector<char>& link_data, hiprtcJITInputType input_type,
                         std::string& link_file_name);

 public:
  RTCLinkProgram(std::string name);
  ~RTCLinkProgram() { amd::Comgr::destroy_data_set(link_input_); }

  // Public Member Functions
  bool AddLinkerOptions(unsigned int num_options, hiprtcJIT_option* options_ptr,
                        void** options_vals_ptr);
  bool AddLinkerFile(std::string file_path, hiprtcJITInputType input_type);
  bool AddLinkerData(void* image_ptr, size_t image_size, std::string link_file_name,
                     hiprtcJITInputType input_type);
  bool LinkComplete(void** bin_out, size_t* size_out);
};

// Thread Local Storage Variables Aggregator Class
class TlsAggregator {
 public:
  hiprtcResult last_rtc_error_;

  TlsAggregator() : last_rtc_error_(HIPRTC_SUCCESS) {}
  ~TlsAggregator() {}
};
extern thread_local TlsAggregator tls;
}  // namespace hiprtc
clr-rocm-5.7.1/hipamd/src/trace_helper.h000066400000000000000000000172721450307266000201130ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE. */

#pragma once

#include <hip/hip_runtime.h>
#include <iostream>
#include <sstream>
#include <string>

//---
// Helper functions to convert HIP function arguments into strings.
// Handles POD data types as well as enumerations (i.e. hipMemcpyKind).
// The implementation uses C++11 variadic templates and template specialization.
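// For example (an illustrative sketch, not from the original header), a HIP
// API tracer could stringify all arguments of a call with the variadic
// ToString defined at the end of this file:
//   ClPrint(amd::LOG_INFO, amd::LOG_API, "hipMemcpyAsync ( %s )",
//           ToString(dst, src, sizeBytes, hipMemcpyHostToDevice).c_str());
// which yields something like "0x7f20..., 0x7f30..., 4096,
// hipMemcpyHostToDevice" via the hipMemcpyKind specialization below.
// (dst, src and sizeBytes are hypothetical caller variables.)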
// The hipMemcpyKind example below is a good example that shows how to implement conversion for a
// new HIP type.

// Handy macro to convert an enumeration to a stringified version of same:
#define CASE_STR(x) \
  case x:           \
    return #x;

inline const char* ihipErrorString(hipError_t hip_error) {
  switch (hip_error) {
    CASE_STR(hipSuccess);
    CASE_STR(hipErrorOutOfMemory);
    CASE_STR(hipErrorNotInitialized);
    CASE_STR(hipErrorDeinitialized);
    CASE_STR(hipErrorProfilerDisabled);
    CASE_STR(hipErrorProfilerNotInitialized);
    CASE_STR(hipErrorProfilerAlreadyStarted);
    CASE_STR(hipErrorProfilerAlreadyStopped);
    CASE_STR(hipErrorInvalidImage);
    CASE_STR(hipErrorInvalidContext);
    CASE_STR(hipErrorContextAlreadyCurrent);
    CASE_STR(hipErrorMapFailed);
    CASE_STR(hipErrorUnmapFailed);
    CASE_STR(hipErrorArrayIsMapped);
    CASE_STR(hipErrorAlreadyMapped);
    CASE_STR(hipErrorNoBinaryForGpu);
    CASE_STR(hipErrorAlreadyAcquired);
    CASE_STR(hipErrorNotMapped);
    CASE_STR(hipErrorNotMappedAsArray);
    CASE_STR(hipErrorNotMappedAsPointer);
    CASE_STR(hipErrorECCNotCorrectable);
    CASE_STR(hipErrorUnsupportedLimit);
    CASE_STR(hipErrorContextAlreadyInUse);
    CASE_STR(hipErrorPeerAccessUnsupported);
    CASE_STR(hipErrorInvalidKernelFile);
    CASE_STR(hipErrorInvalidGraphicsContext);
    CASE_STR(hipErrorInvalidSource);
    CASE_STR(hipErrorFileNotFound);
    CASE_STR(hipErrorSharedObjectSymbolNotFound);
    CASE_STR(hipErrorSharedObjectInitFailed);
    CASE_STR(hipErrorOperatingSystem);
    CASE_STR(hipErrorSetOnActiveProcess);
    CASE_STR(hipErrorInvalidHandle);
    CASE_STR(hipErrorNotFound);
    CASE_STR(hipErrorIllegalAddress);
    CASE_STR(hipErrorMissingConfiguration);
    CASE_STR(hipErrorLaunchFailure);
    CASE_STR(hipErrorPriorLaunchFailure);
    CASE_STR(hipErrorLaunchTimeOut);
    CASE_STR(hipErrorLaunchOutOfResources);
    CASE_STR(hipErrorInvalidDeviceFunction);
    CASE_STR(hipErrorInvalidConfiguration);
    CASE_STR(hipErrorInvalidDevice);
    CASE_STR(hipErrorInvalidValue);
    CASE_STR(hipErrorInvalidPitchValue);
    CASE_STR(hipErrorInvalidDevicePointer);
    CASE_STR(hipErrorInvalidMemcpyDirection);
    CASE_STR(hipErrorUnknown);
    CASE_STR(hipErrorNotReady);
    CASE_STR(hipErrorNoDevice);
    CASE_STR(hipErrorPeerAccessAlreadyEnabled);
    CASE_STR(hipErrorPeerAccessNotEnabled);
    CASE_STR(hipErrorRuntimeMemory);
    CASE_STR(hipErrorRuntimeOther);
    CASE_STR(hipErrorHostMemoryAlreadyRegistered);
    CASE_STR(hipErrorHostMemoryNotRegistered);
    CASE_STR(hipErrorTbd);
    default:
      return "hipErrorUnknown";
  };
};

// Building block functions:
template <typename T>
inline std::string ToHexString(T v) {
  std::ostringstream ss;
  ss << "0x" << std::hex << v;
  return ss.str();
};

template <typename T>
inline std::string ToString(T* v) {
  std::ostringstream ss;
  if (v == NULL) {
    ss << "char array:";
  } else {
    ss << v;
  }
  return ss.str();
};

template <typename T>
inline std::string ToString(T** v) {
  std::ostringstream ss;
  if (v == NULL) {
    ss << "char array:";
  } else {
    ss << v;
  }
  return ss.str();
};

//---
// Template overloads for ToString to handle specific types
// This is the default which works for most types:
template <typename T>
inline std::string ToString(T v) {
  std::ostringstream ss;
  ss << v;
  return ss.str();
};

template <>
inline std::string ToString(hipFunction_t v) {
  std::ostringstream ss;
  ss << "0x" << std::hex << static_cast<void*>(v);
  return ss.str();
};

// hipEvent_t specialization. TODO - maybe add an event ID for debug?
template <>
inline std::string ToString(hipEvent_t v) {
  std::ostringstream ss;
  ss << "event:" << std::hex << static_cast<void*>(v);
  return ss.str();
};

// hipStream_t
template <>
inline std::string ToString(hipStream_t v) {
  std::ostringstream ss;
  if (v == NULL) {
    ss << "stream:";
  } else {
    ss << "stream:" << std::hex << static_cast<void*>(v);
  }
  return ss.str();
};

// hipCtx_t
template <>
inline std::string ToString(hipCtx_t v) {
  std::ostringstream ss;
  if (v == NULL) {
    ss << "context:";
  } else {
    ss << "context:" << std::hex << static_cast<void*>(v);
  }
  return ss.str();
};

// hipPitchedPtr
template <>
inline std::string ToString(hipPitchedPtr v) {
  std::ostringstream ss;
  ss << "pitchPtr:" << std::hex << static_cast<void*>(v.ptr);
  return ss.str();
};

// hipMemcpyKind specialization
template <>
inline std::string ToString(hipMemcpyKind v) {
  switch (v) {
    CASE_STR(hipMemcpyHostToHost);
    CASE_STR(hipMemcpyHostToDevice);
    CASE_STR(hipMemcpyDeviceToHost);
    CASE_STR(hipMemcpyDeviceToDevice);
    CASE_STR(hipMemcpyDefault);
    default:
      return ToHexString(v);
  };
};

template <>
inline std::string ToString(hipFuncCache_t v) {
  switch (v) {
    CASE_STR(hipFuncCachePreferNone);
    CASE_STR(hipFuncCachePreferShared);
    CASE_STR(hipFuncCachePreferL1);
    CASE_STR(hipFuncCachePreferEqual);
    default:
      return ToHexString(v);
  };
};

template <>
inline std::string ToString(hipSharedMemConfig v) {
  switch (v) {
    CASE_STR(hipSharedMemBankSizeDefault);
    CASE_STR(hipSharedMemBankSizeFourByte);
    CASE_STR(hipSharedMemBankSizeEightByte);
    default:
      return ToHexString(v);
  };
};

template <>
inline std::string ToString(hipError_t v) {
  return ihipErrorString(v);
};

// Catch empty arguments case
inline std::string ToString() { return (""); }

//---
// C++11 variadic template - peels off first argument, converts to string, and calls itself again to
// peel the next arg. Strings are automatically separated by comma+space.
template <typename T, typename... Args>
inline std::string ToString(T first, Args... args) {
  return ToString(first) + ", " + ToString(args...);
}
clr-rocm-5.7.1/opencl/000077500000000000000000000000001450307266000145255ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/.clang-format000066400000000000000000000004031450307266000170750ustar00rootroot00000000000000Language: Cpp
BasedOnStyle: Google
AlignEscapedNewlinesLeft: false
AlignOperands: false
ColumnLimit: 100
AlwaysBreakTemplateDeclarations: false
DerivePointerAlignment: false
IndentFunctionDeclarationAfterType: false
MaxEmptyLinesToKeep: 2
SortIncludes: false
clr-rocm-5.7.1/opencl/.gitattributes000066400000000000000000000011611450307266000174170ustar00rootroot00000000000000# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto

# Explicitly declare text files you want to always be normalized and converted
# to have LF line endings on checkout.
*.c text eol=lf *.cpp text eol=lf *.cc text eol=lf *.h text eol=lf *.hpp text eol=lf *.txt text eol=lf # Define files to support auto-remove trailing white space # Need to run the command below, before add modified file(s) to the staging area # git config filter.trimspace.clean 'sed -e "s/[[:space:]]*$//g"' *.cpp filter=trimspace *.c filter=trimspace *.h filter=trimspacecpp *.hpp filter=trimspace *.md filter=trimspace clr-rocm-5.7.1/opencl/.gitignore000066400000000000000000000001171450307266000165140ustar00rootroot00000000000000.* !.gitignore *.d *.o *.obj *.gch *.pch *.so *.dll *.a *.lib *.exe *.out buildclr-rocm-5.7.1/opencl/CMakeLists.txt000066400000000000000000000110501450307266000172620ustar00rootroot00000000000000cmake_minimum_required(VERSION 3.5.1) if (POLICY CMP0048) cmake_policy(SET CMP0048 NEW) set(PROJ_VERSION VERSION 1.5.0) endif() project(opencl) set(CMAKE_POSITION_INDEPENDENT_CODE ON) # Set default libdir to be "lib" for ROCm, distros will override this anyway: set(CMAKE_INSTALL_LIBDIR "lib" CACHE STRING "Library install directory") include(GNUInstallDirs) option(BUILD_TESTS "Enable building OpenCL tests" OFF) option(BUILD_ICD "Enable building OpenCL ICD Loader" ON) option(EMU_ENV "Enable building for emulation environment" OFF) # Disable file reorg backward compatibilty for ASAN build if(NOT ENABLE_ASAN_PACKAGING) option(FILE_REORG_BACKWARD_COMPATIBILITY "Enable File Reorganization backward compatibility" ON) endif() # Add flags to generate PDB files with full symbolic information if(MSVC) set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /Zi") set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} /DEBUG:FULL") endif() set(OPENCL_ICD_LOADER_HEADERS_DIR "${CMAKE_CURRENT_LIST_DIR}/khronos/headers/opencl2.2" CACHE PATH "") if(BUILD_ICD) add_subdirectory(khronos/icd) else() find_package(OpenCL REQUIRED) endif() add_subdirectory(amdocl) add_subdirectory(tools/clinfo) add_subdirectory(tools/cltrace) if(BUILD_TESTS) add_subdirectory(tests/ocltst) endif() ###--- Packaging ------------------------------------------------------------### # DEV package install(DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/khronos/headers/opencl2.2/CL" DESTINATION include COMPONENT DEV USE_SOURCE_PERMISSIONS PATTERN cl_d3d10.h EXCLUDE PATTERN cl_d3d11.h EXCLUDE PATTERN cl_dx9_media_sharing.h EXCLUDE PATTERN cl_egl.h EXCLUDE) ############################# # Packaging steps ############################# if(NOT WIN32) find_package(ROCM QUIET CONFIG PATHS /opt/rocm) if(ROCM_FOUND) include(ROCMSetupVersion) rocm_setup_version( VERSION "2.0.0" ) else() set(PROJECT_VERSION "2.0.0") endif() #set a name for icd file set(OPENCL_AMD_ICD_FILE "amdocl64.icd") if (DEFINED ROCM_PATCH_VERSION) # set unique name for ICD file for each jenkins build # Use ENV variable CPACK_RPM_PACKAGE_RELEASE, which is having build number set(PACKAGE_RELEASE_VERSION "") if(DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE}) set(PACKAGE_RELEASE_VERSION $ENV{CPACK_RPM_PACKAGE_RELEASE}) endif() if(PACKAGE_RELEASE_VERSION) # Replace "." to "_" in package version string. So file name will have .icd as extension string(REPLACE "." 
"_" PACKAGE_RELEASE_VERSION ${PACKAGE_RELEASE_VERSION}) else() # set a default value set(PACKAGE_RELEASE_VERSION "9999") endif() set(OPENCL_AMD_ICD_FILE "amdocl64_${ROCM_PATCH_VERSION}_${PACKAGE_RELEASE_VERSION}.icd") endif() if(BUILD_ICD) get_target_property(OPENCL_LIB_VERSION_MAJOR OpenCL SOVERSION) get_target_property(OPENCL_LIB_VERSION_STRING OpenCL VERSION) endif() #Set Package Version set(CPACK_PACKAGE_VERSION ${PROJECT_VERSION}) if(DEFINED ENV{ROCM_LIBPATCH_VERSION}) set(CPACK_PACKAGE_VERSION "${CPACK_PACKAGE_VERSION}.$ENV{ROCM_LIBPATCH_VERSION}") message("Using CPACK_PACKAGE_VERSION ${CPACK_PACKAGE_VERSION}") endif() set(CPACK_PACKAGING_INSTALL_PREFIX "/opt/rocm" CACHE PATH "Package Installation path for OpenCL") #ROCM_PATH is needed to create symlink of libraries if(NOT DEFINED ROCM_PATH) string(REPLACE "/opencl" "" ROCM_PATH ${CPACK_PACKAGING_INSTALL_PREFIX}) endif() message (STATUS "ROCM Installation path(ROCM_PATH): ${ROCM_PATH}") #Package: rocm-opencl,rocm-opencl-dev/devel,rocm-ocl-icd if(BUILD_ICD) set(BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR}/packages/rocm-ocl-icd) configure_file(packaging/rocm-ocl-icd.postinst ${BUILD_DIR}/postinst @ONLY) configure_file(packaging/rocm-ocl-icd.prerm ${BUILD_DIR}/prerm @ONLY) configure_file(packaging/rocm-ocl-icd.rpm_post ${BUILD_DIR}/rpm_post @ONLY) configure_file(packaging/rocm-ocl-icd.rpm_postun ${BUILD_DIR}/rpm_postun @ONLY) endif() add_subdirectory(packaging) #File reorg Backward compatibility function if(FILE_REORG_BACKWARD_COMPATIBILITY) # To enable/disable #error in wrapper header files if(NOT DEFINED ROCM_HEADER_WRAPPER_WERROR) if(DEFINED ENV{ROCM_HEADER_WRAPPER_WERROR}) set(ROCM_HEADER_WRAPPER_WERROR "$ENV{ROCM_HEADER_WRAPPER_WERROR}" CACHE STRING "Header wrapper warnings as errors.") else() set(ROCM_HEADER_WRAPPER_WERROR "OFF" CACHE STRING "Header wrapper warnings as errors.") endif() endif() if(ROCM_HEADER_WRAPPER_WERROR) set(deprecated_error 1) else() set(deprecated_error 0) endif() include(opencl-backward-compat.cmake) endif() endif() clr-rocm-5.7.1/opencl/LICENSE.txt000066400000000000000000000020701450307266000163470ustar00rootroot00000000000000Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
clr-rocm-5.7.1/opencl/README.md000066400000000000000000000066731450307266000160140ustar00rootroot00000000000000# OpenCL™ Compatible Runtime

- OpenCL 2.0 compatible language runtime
- Supports offline and in-process/in-memory compilation

## DISCLAIMER

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

© 2021 Advanced Micro Devices, Inc. All Rights Reserved.

## Getting the source code

Download the git projects using the following command:

```bash
git clone -b main https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime.git
```

## Repository branches

The repository maintains several branches. The branches that are of importance are:

- Main branch: This is the stable branch. It is up to date with the latest release branch; for example, if the latest ROCm release is rocm-4.1, the main branch will be based on that release.
- Develop branch: This is the default branch, on which new features are still under development and visible. While this may be of interest to many, it should be noted that this branch and the features under development might not be stable.
- Release branches: These are branches corresponding to each ROCm release, listed with release tags, such as rocm-4.0, rocm-4.1, etc.

## Setup OpenCL

Copy the amdocl64.icd file to /etc/OpenCL/vendors:

```bash
sudo cp api/opencl/config/amdocl64.icd /etc/OpenCL/vendors/
```

## Building

Follow these steps:

- Build ROCclr first. Follow the steps in the following link to build ROCclr: [ROCclr Readme](https://github.com/ROCm-Developer-Tools/ROCclr). In this step, $OPENCL_DIR and $ROCclr_DIR are defined.
- Build OpenCL. Run these commands:

```bash
cd "$OPENCL_DIR"
mkdir -p build; cd build
cmake -DUSE_COMGR_LIBRARY=ON -DCMAKE_PREFIX_PATH="$ROCclr_DIR/build;/opt/rocm/" ..
make -j$(nproc)
```

Note: For a release build, add "-DCMAKE_BUILD_TYPE=Release" to the cmake command line.
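
For example, a complete release configuration and build might look like this (an illustrative sketch; the paths follow the variables defined above, and `/opt/rocm/` is assumed to be the ROCm install location):

```bash
cd "$OPENCL_DIR"
mkdir -p build; cd build
cmake -DUSE_COMGR_LIBRARY=ON -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_PREFIX_PATH="$ROCclr_DIR/build;/opt/rocm/" ..
make -j$(nproc)
```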
--- OpenCL™ is registered Trademark of Apple clr-rocm-5.7.1/opencl/amdocl/000077500000000000000000000000001450307266000157645ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/amdocl/CMakeLists.txt000066400000000000000000000100351450307266000205230ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. cmake_minimum_required(VERSION 3.5) project(amdocl) option(BUILD_SHARED_LIBS "Build the shared library" ON) if(ADDRESS_SANITIZER) set(ASAN_LINKER_FLAGS "-fsanitize=address") set(ASAN_COMPILER_FLAGS "-fno-omit-frame-pointer -fsanitize=address") if(NOT CMAKE_COMPILER_IS_GNUCC) if(BUILD_SHARED_LIBS) set(ASAN_COMPILER_FLAGS "${ASAN_COMPILER_FLAGS} -shared-libsan") set(ASAN_LINKER_FLAGS "${ASAN_LINKER_FLAGS} -shared-libsan") else() set(ASAN_LINKER_FLAGS "${ASAN_LINKER_FLAGS} -static-libsan") endif() endif() set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ASAN_COMPILER_FLAGS}") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ASAN_COMPILER_FLAGS}") set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${ASAN_LINKER_FLAGS} -s -Wl,--build-id=sha1") set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} ${ASAN_LINKER_FLAGS} -Wl,--build-id=sha1") endif() list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_LIST_DIR}/cmake") if(BUILD_SHARED_LIBS) add_library(amdocl SHARED) # Windows doesn't have a strip utility, so CMAKE_STRIP won't be set. 
if((CMAKE_BUILD_TYPE STREQUAL "Release") AND NOT ("${CMAKE_STRIP}" STREQUAL "")) add_custom_command(TARGET amdocl POST_BUILD COMMAND ${CMAKE_STRIP} $) endif() else() add_library(amdocl STATIC) endif() set_target_properties(amdocl PROPERTIES CXX_STANDARD 17 CXX_STANDARD_REQUIRED ON CXX_EXTENSIONS OFF POSITION_INDEPENDENT_CODE ON) if(CMAKE_SIZEOF_VOID_P EQUAL 8) set_target_properties(amdocl PROPERTIES OUTPUT_NAME "amdocl64") else() set_target_properties(amdocl PROPERTIES OUTPUT_NAME "amdocl32") endif() target_sources(amdocl PRIVATE cl_command.cpp cl_context.cpp cl_counter.cpp cl_d3d9.cpp cl_d3d10.cpp cl_d3d11.cpp cl_device.cpp cl_event.cpp cl_execute.cpp cl_gl.cpp cl_icd.cpp cl_kernel_info_amd.cpp cl_memobj.cpp cl_p2p_amd.cpp cl_pipe.cpp cl_platform_amd.cpp cl_profile_amd.cpp cl_program.cpp cl_sampler.cpp cl_sdi_amd.cpp cl_svm.cpp cl_thread_trace_amd.cpp) if(WIN32) target_sources(amdocl PRIVATE cl_runtime.cpp) endif() if(BUILD_SHARED_LIBS) if(WIN32) target_sources(amdocl PRIVATE amdocl.def) else() # -Bsymbolic is required to make sure all AMD OpenCL runtime symbols are resolved internally # Otherwise ld might resolve them using symbols from the OpenCL ICD target_link_libraries(amdocl PRIVATE "-Wl,-Bsymbolic") target_link_libraries(amdocl PRIVATE "-Wl,--version-script=${CMAKE_CURRENT_LIST_DIR}/amdocl.map") set_target_properties(amdocl PROPERTIES LINK_DEPENDS "${CMAKE_CURRENT_LIST_DIR}/amdocl.map") endif() endif() if(WIN32) configure_file(amdocl.rc.in amdocl_info.rc @ONLY) target_sources(amdocl PRIVATE amdocl_info.rc) endif() target_link_libraries(amdocl PUBLIC rocclr) INSTALL(TARGETS amdocl COMPONENT MAIN RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}) clr-rocm-5.7.1/opencl/amdocl/amdocl.def000066400000000000000000000045721450307266000177130ustar00rootroot00000000000000EXPORTS clBuildProgram clCreateBuffer clCreateCommandQueue clCreateContext clCreateContextFromType clCreateFromGLBuffer clCreateFromGLRenderbuffer clCreateFromGLTexture2D clCreateFromGLTexture3D clCreateImage2D clCreateImage3D clCreateKernel clCreateKernelsInProgram clCreateProgramWithBinary clCreateProgramWithSource clCreateSampler clEnqueueAcquireGLObjects clEnqueueBarrier clEnqueueCopyBuffer clEnqueueCopyBufferToImage clEnqueueCopyImage clEnqueueCopyImageToBuffer clEnqueueMapBuffer clEnqueueMapImage clEnqueueMarker clEnqueueNDRangeKernel clEnqueueNativeKernel clEnqueueReadBuffer clEnqueueReadImage clEnqueueReleaseGLObjects clEnqueueTask clEnqueueUnmapMemObject clEnqueueWaitForEvents clEnqueueWriteBuffer clEnqueueWriteImage clFinish clFlush clGetCommandQueueInfo clGetContextInfo clGetDeviceIDs clGetDeviceInfo clGetEventInfo clGetEventProfilingInfo clGetExtensionFunctionAddress clGetGLObjectInfo clGetGLTextureInfo clGetImageInfo clGetKernelInfo clGetKernelWorkGroupInfo clGetMemObjectInfo clGetPlatformIDs clGetPlatformInfo clGetProgramBuildInfo clGetProgramInfo clGetSamplerInfo clGetSupportedImageFormats clReleaseCommandQueue clReleaseContext clReleaseEvent clReleaseKernel clReleaseMemObject clReleaseProgram clReleaseSampler clRetainCommandQueue clRetainContext clRetainEvent clRetainKernel clRetainMemObject clRetainProgram clRetainSampler clSetCommandQueueProperty clSetKernelArg clUnloadCompiler clWaitForEvents clIcdGetPlatformIDsKHR clCreateUserEvent clSetUserEventStatus clSetEventCallback clSetMemObjectDestructorCallback clCreateSubBuffer clEnqueueReadBufferRect clEnqueueWriteBufferRect clEnqueueCopyBufferRect clCompileProgram 
clCreateFromGLTexture clCreateImage clCreateProgramWithBuiltInKernels clCreateSubDevices clEnqueueBarrierWithWaitList clEnqueueFillBuffer clEnqueueFillImage clEnqueueMarkerWithWaitList clEnqueueMigrateMemObjects clGetExtensionFunctionAddressForPlatform clGetKernelArgInfo clLinkProgram clReleaseDevice clRetainDevice clUnloadPlatformCompiler clCreateCommandQueueWithProperties clCreateSamplerWithProperties clCreatePipe clGetPipeInfo clSVMAlloc clSVMFree clSetKernelArgSVMPointer clSetKernelExecInfo clEnqueueSVMFree clEnqueueSVMMemcpy clEnqueueSVMMemFill clEnqueueSVMMap clEnqueueSVMUnmap clCloneKernel clCreateProgramWithIL clEnqueueSVMMigrateMem clGetDeviceAndHostTimer clGetHostTimer clGetKernelSubGroupInfo clSetDefaultDeviceCommandQueue clCreateProgramWithIL clr-rocm-5.7.1/opencl/amdocl/amdocl.def.in000066400000000000000000000064511450307266000203160ustar00rootroot00000000000000EXPORTS clBuildProgram clCreateBuffer clCreateCommandQueue clCreateContext clCreateContextFromType clCreateFromGLBuffer clCreateFromGLRenderbuffer clCreateFromGLTexture2D clCreateFromGLTexture3D clCreateImage2D clCreateImage3D clCreateKernel clCreateKernelsInProgram clCreateProgramWithBinary clCreateProgramWithSource clCreateSampler clEnqueueAcquireGLObjects clEnqueueBarrier clEnqueueCopyBuffer clEnqueueCopyBufferToImage clEnqueueCopyImage clEnqueueCopyImageToBuffer clEnqueueMapBuffer clEnqueueMapImage clEnqueueMarker clEnqueueNDRangeKernel clEnqueueNativeKernel clEnqueueReadBuffer clEnqueueReadImage clEnqueueReleaseGLObjects clEnqueueTask clEnqueueUnmapMemObject clEnqueueWaitForEvents clEnqueueWriteBuffer clEnqueueWriteImage clFinish clFlush clGetCommandQueueInfo clGetContextInfo clGetDeviceIDs clGetDeviceInfo clGetEventInfo clGetEventProfilingInfo clGetExtensionFunctionAddress clGetGLObjectInfo clGetGLTextureInfo clGetImageInfo clGetKernelInfo clGetKernelWorkGroupInfo clGetMemObjectInfo clGetPlatformIDs clGetPlatformInfo clGetProgramBuildInfo clGetProgramInfo clGetSamplerInfo clGetSupportedImageFormats clReleaseCommandQueue clReleaseContext clReleaseEvent clReleaseKernel clReleaseMemObject clReleaseProgram clReleaseSampler clRetainCommandQueue clRetainContext clRetainEvent clRetainKernel clRetainMemObject clRetainProgram clRetainSampler clSetCommandQueueProperty clSetKernelArg clUnloadCompiler clWaitForEvents clIcdGetPlatformIDsKHR clCreateUserEvent clSetUserEventStatus clSetEventCallback clSetMemObjectDestructorCallback clCreateSubBuffer clEnqueueReadBufferRect clEnqueueWriteBufferRect clEnqueueCopyBufferRect #if (OPENCL_MAJOR > 1) || (OPENCL_MAJOR == 1 && OPENCL_MINOR >= 2) clCompileProgram clCreateFromGLTexture clCreateImage clCreateProgramWithBuiltInKernels clCreateSubDevices clEnqueueBarrierWithWaitList clEnqueueFillBuffer clEnqueueFillImage clEnqueueMarkerWithWaitList clEnqueueMigrateMemObjects clGetExtensionFunctionAddressForPlatform clGetKernelArgInfo clLinkProgram clReleaseDevice clRetainDevice clUnloadPlatformCompiler #endif #if (OPENCL_MAJOR >= 2) clCreateCommandQueueWithProperties clCreateSamplerWithProperties clCreatePipe clGetPipeInfo clSVMAlloc clSVMFree clSetKernelArgSVMPointer clSetKernelExecInfo clEnqueueSVMFree clEnqueueSVMMemcpy clEnqueueSVMMemFill clEnqueueSVMMap clEnqueueSVMUnmap #endif #if (OPENCL_MAJOR > 2) || (OPENCL_MAJOR == 2 && OPENCL_MINOR >= 1) clCloneKernel clCreateProgramWithIL clEnqueueSVMMigrateMem clGetDeviceAndHostTimer clGetHostTimer clGetKernelSubGroupInfo clSetDefaultDeviceCommandQueue #endif #if !defined(WITH_LIGHTNING_COMPILER) aclCompilerInit aclCompilerFini aclCompilerVersion 
aclVersionSize aclGetErrorString aclGetArchInfo aclGetDeviceInfo aclGetTargetInfo aclGetArchitecture aclGetFamily aclGetChip aclBinaryInit aclBinaryFini aclReadFromFile aclReadFromMem aclWriteToFile aclWriteToMem aclCreateFromBinary aclBinaryVersion aclInsertSection aclRemoveSection aclExtractSection aclInsertSymbol aclRemoveSymbol aclExtractSymbol aclDbgAddArgument aclDbgRemoveArgument aclQueryInfo aclCompile aclLink aclGetCompilerLog aclRetrieveType aclSetType aclConvertType aclDisassemble aclInsertKernelStatistics aclGetDeviceBinary aclDumpBinary #endif // !defined(WITH_LIGHTNING_COMPILER) #if (OPENCL_MAJOR > 2) || (OPENCL_MAJOR == 2 && OPENCL_MINOR >= 1) clCreateProgramWithIL #endif clr-rocm-5.7.1/opencl/amdocl/amdocl.map000066400000000000000000000071771450307266000177360ustar00rootroot00000000000000OPENCL_1.0 { global: clBuildProgram; clCreateBuffer; clCreateCommandQueue; clCreateContext; clCreateContextFromType; clCreateFromD3D10Buffer; clCreateFromGLBuffer; clCreateFromGLRenderbuffer; clCreateFromGLTexture2D; clCreateFromGLTexture3D; clCreateImage2D; clCreateImage3D; clCreateImageFromD3D10Resource; clCreateKernel; clCreateKernelsInProgram; clCreateProgramWithBinary; clCreateProgramWithSource; clCreateSampler; clEnqueueAcquireExternalObjects; clEnqueueAcquireGLObjects; clEnqueueBarrier; clEnqueueCopyBuffer; clEnqueueCopyBufferToImage; clEnqueueCopyImage; clEnqueueCopyImageToBuffer; clEnqueueMapBuffer; clEnqueueMapImage; clEnqueueMarker; clEnqueueNDRangeKernel; clEnqueueNativeKernel; clEnqueueReadBuffer; clEnqueueReadImage; clEnqueueReleaseExternalObjects; clEnqueueReleaseGLObjects; clEnqueueTask; clEnqueueUnmapMemObject; clEnqueueWaitForEvents; clEnqueueWriteBuffer; clEnqueueWriteImage; clFinish; clFlush; clGetCommandQueueInfo; clGetContextInfo; clGetDeviceIDs; clGetDeviceInfo; clGetEventInfo; clGetEventProfilingInfo; clGetExtensionFunctionAddress; clGetGLObjectInfo; clGetGLTextureInfo; clGetImageInfo; clGetKernelInfo; clGetKernelWorkGroupInfo; clGetMemObjectInfo; clGetPlatformIDs; clGetPlatformInfo; clGetProgramBuildInfo; clGetProgramInfo; clGetSamplerInfo; clGetSupportedImageFormats; clReleaseCommandQueue; clReleaseContext; clReleaseEvent; clReleaseKernel; clReleaseMemObject; clReleaseProgram; clReleaseSampler; clRetainCommandQueue; clRetainContext; clRetainEvent; clRetainKernel; clRetainMemObject; clRetainProgram; clRetainSampler; clSetCommandQueueProperty; clSetKernelArg; clUnloadCompiler; clWaitForEvents; clIcdGetPlatformIDsKHR; local: *; }; OPENCL_1.1 { global: clCreateUserEvent; clSetUserEventStatus; clSetEventCallback; clSetMemObjectDestructorCallback; clCreateSubBuffer; clEnqueueReadBufferRect; clEnqueueWriteBufferRect; clEnqueueCopyBufferRect; aclGetTargetInfo; aclCompilerInit; aclCompilerFini; aclReadFromMem; aclReadFromFile; aclBinaryInit; aclBinaryFini; aclWriteToMem; aclInsertSection; aclExtractSection; aclRemoveSection; aclQueryInfo; aclDbgAddArgument; aclExtractSymbol; aclInsertSymbol; aclRemoveSymbol; aclCompile; aclInsertKernelStatistics; aclDisassemble; } OPENCL_1.0; OPENCL_1.2 { global: clCompileProgram; clCreateFromGLTexture; clCreateImage; clCreateProgramWithBuiltInKernels; clCreateSubDevices; clEnqueueBarrierWithWaitList; clEnqueueFillBuffer; clEnqueueFillImage; clEnqueueMarkerWithWaitList; clEnqueueMigrateMemObjects; clGetExtensionFunctionAddressForPlatform; clGetKernelArgInfo; clLinkProgram; clReleaseDevice; clRetainDevice; clUnloadPlatformCompiler; } OPENCL_1.1; OPENCL_2.0 { global: clCreateCommandQueueWithProperties; clCreateSamplerWithProperties; 
clCreatePipe; clGetPipeInfo; clSVMAlloc; clSVMFree; clSetKernelArgSVMPointer; clSetKernelExecInfo; clEnqueueSVMFree; clEnqueueSVMMemcpy; clEnqueueSVMMemFill; clEnqueueSVMMap; clEnqueueSVMUnmap; } OPENCL_1.2; OPENCL_2.1 { global: clCloneKernel; clCreateProgramWithIL; clEnqueueSVMMigrateMem; clGetDeviceAndHostTimer; clGetHostTimer; clGetKernelSubGroupInfo; clSetDefaultDeviceCommandQueue; } OPENCL_2.0; clr-rocm-5.7.1/opencl/amdocl/amdocl.map.in000066400000000000000000000112561450307266000203340ustar00rootroot00000000000000OPENCL_1.0 { global: clBuildProgram; clCreateBuffer; clCreateCommandQueue; clCreateContext; clCreateContextFromType; clCreateFromD3D10Buffer; clCreateFromGLBuffer; clCreateFromGLRenderbuffer; clCreateFromGLTexture2D; clCreateFromGLTexture3D; clCreateImage2D; clCreateImage3D; clCreateImageFromD3D10Resource; clCreateKernel; clCreateKernelsInProgram; clCreateProgramWithBinary; clCreateProgramWithSource; clCreateSampler; clEnqueueAcquireExternalObjects; clEnqueueAcquireGLObjects; clEnqueueBarrier; clEnqueueCopyBuffer; clEnqueueCopyBufferToImage; clEnqueueCopyImage; clEnqueueCopyImageToBuffer; clEnqueueMapBuffer; clEnqueueMapImage; clEnqueueMarker; clEnqueueNDRangeKernel; clEnqueueNativeKernel; clEnqueueReadBuffer; clEnqueueReadImage; clEnqueueReleaseExternalObjects; clEnqueueReleaseGLObjects; clEnqueueTask; clEnqueueUnmapMemObject; clEnqueueWaitForEvents; clEnqueueWriteBuffer; clEnqueueWriteImage; clFinish; clFlush; clGetCommandQueueInfo; clGetContextInfo; clGetDeviceIDs; clGetDeviceInfo; clGetEventInfo; clGetEventProfilingInfo; clGetExtensionFunctionAddress; clGetGLObjectInfo; clGetGLTextureInfo; clGetImageInfo; clGetKernelInfo; clGetKernelWorkGroupInfo; clGetMemObjectInfo; clGetPlatformIDs; clGetPlatformInfo; clGetProgramBuildInfo; clGetProgramInfo; clGetSamplerInfo; clGetSupportedImageFormats; clReleaseCommandQueue; clReleaseContext; clReleaseEvent; clReleaseKernel; clReleaseMemObject; clReleaseProgram; clReleaseSampler; clRetainCommandQueue; clRetainContext; clRetainEvent; clRetainKernel; clRetainMemObject; clRetainProgram; clRetainSampler; clSetCommandQueueProperty; clSetKernelArg; clUnloadCompiler; clWaitForEvents; clIcdGetPlatformIDsKHR; local: *; }; #if (OPENCL_MAJOR > 1) || (OPENCL_MAJOR == 1 && OPENCL_MINOR >= 1) OPENCL_1.1 { global: clCreateUserEvent; clSetUserEventStatus; clSetEventCallback; clSetMemObjectDestructorCallback; clCreateSubBuffer; clEnqueueReadBufferRect; clEnqueueWriteBufferRect; clEnqueueCopyBufferRect; aclGetTargetInfo; aclCompilerInit; aclCompilerFini; aclReadFromMem; aclReadFromFile; aclBinaryInit; aclBinaryFini; aclWriteToMem; aclInsertSection; aclExtractSection; aclRemoveSection; aclQueryInfo; aclDbgAddArgument; aclExtractSymbol; aclInsertSymbol; aclRemoveSymbol; aclCompile; aclInsertKernelStatistics; aclDisassemble; } OPENCL_1.0; #endif #if (OPENCL_MAJOR > 1) || (OPENCL_MAJOR == 1 && OPENCL_MINOR >= 2) OPENCL_1.2 { global: clCompileProgram; clCreateFromGLTexture; clCreateImage; clCreateProgramWithBuiltInKernels; clCreateSubDevices; clEnqueueBarrierWithWaitList; clEnqueueFillBuffer; clEnqueueFillImage; clEnqueueMarkerWithWaitList; clEnqueueMigrateMemObjects; clGetExtensionFunctionAddressForPlatform; clGetKernelArgInfo; clLinkProgram; clReleaseDevice; clRetainDevice; clUnloadPlatformCompiler; } OPENCL_1.1; #endif #if (OPENCL_MAJOR >= 2) OPENCL_2.0 { global: clCreateCommandQueueWithProperties; clCreateSamplerWithProperties; clCreatePipe; clGetPipeInfo; clSVMAlloc; clSVMFree; clSetKernelArgSVMPointer; clSetKernelExecInfo; clEnqueueSVMFree; 
clEnqueueSVMMemcpy; clEnqueueSVMMemFill; clEnqueueSVMMap; clEnqueueSVMUnmap; } OPENCL_1.2; #endif #if (OPENCL_MAJOR > 2) || (OPENCL_MAJOR == 2 && OPENCL_MINOR >= 1) OPENCL_2.1 { global: clCloneKernel; clCreateProgramWithIL; clEnqueueSVMMigrateMem; clGetDeviceAndHostTimer; clGetHostTimer; clGetKernelSubGroupInfo; clSetDefaultDeviceCommandQueue; } OPENCL_2.0; #endif ACL_0.8 { global: aclCompilerInit; aclCompilerFini; aclCompilerVersion; aclVersionSize; aclGetErrorString; aclGetArchInfo; aclGetDeviceInfo; aclGetTargetInfo; aclGetArchitecture; aclGetFamily; aclGetChip; aclBinaryInit; aclBinaryFini; aclReadFromFile; aclReadFromMem; aclWriteToFile; aclWriteToMem; aclCreateFromBinary; aclBinaryVersion; aclInsertSection; aclRemoveSection; aclExtractSection; aclInsertSymbol; aclRemoveSymbol; aclExtractSymbol; aclDbgAddArgument; aclDbgRemoveArgument; aclQueryInfo; aclCompile; aclLink; aclGetCompilerLog; aclRetrieveType; aclSetType; aclConvertType; aclDisassemble; aclInsertKernelStatistics; aclGetDeviceBinary; aclDumpBinary; }; clr-rocm-5.7.1/opencl/amdocl/amdocl.rc000066400000000000000000000043721450307266000175570ustar00rootroot00000000000000#define STR(__macro__) #__macro__ #define XSTR(__macro__) STR(__macro__) #if defined(_DEBUG) #define DEBUG_ONLY(x) x #else #define DEBUG_ONLY(x) #endif #define VERSION_PREFIX_MAJOR 2 #define VERSION_PREFIX_MINOR 0 #define APSTUDIO_READONLY_SYMBOLS ///////////////////////////////////////////////////////////////////////////// // // Generated from the TEXTINCLUDE 2 resource. // #include "winresrc.h" #include "utils/versions.hpp" ///////////////////////////////////////////////////////////////////////////// #undef APSTUDIO_READONLY_SYMBOLS ///////////////////////////////////////////////////////////////////////////// // English (U.S.) resources #if !defined(AFX_RESOURCE_DLL) || defined(AFX_TARG_ENU) #ifdef _WIN32 LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US #pragma code_page(1252) #endif //_WIN32 ///////////////////////////////////////////////////////////////////////////// // // Version // VS_VERSION_INFO VERSIONINFO FILEVERSION 10,0,AMD_PLATFORM_BUILD_NUMBER,AMD_PLATFORM_REVISION_NUMBER PRODUCTVERSION 10,0,AMD_PLATFORM_BUILD_NUMBER,AMD_PLATFORM_REVISION_NUMBER FILEFLAGSMASK 0x3fL #ifdef _DEBUG FILEFLAGS 0x1L #else FILEFLAGS 0x0L #endif FILEOS 0x40004L FILETYPE 0x2L FILESUBTYPE 0x0L BEGIN BLOCK "StringFileInfo" BEGIN BLOCK "040904b0" BEGIN VALUE "Comments", " \0" VALUE "CompanyName", "Advanced Micro Devices Inc.\0" VALUE "FileDescription", AMD_PLATFORM_NAME " OpenCL " XSTR(VERSION_PREFIX_MAJOR) "." XSTR(VERSION_PREFIX_MINOR) " Runtime\0" VALUE "FileVersion", "10.0." XSTR(AMD_PLATFORM_BUILD_NUMBER) "." XSTR(AMD_PLATFORM_REVISION_NUMBER) VALUE "InternalName", "OpenCL" VALUE "LegalCopyright", "Copyright (c) 2011 - 2021 Advanced Micro Devices Inc.\0" VALUE "OriginalFilename", "OpenCL.dll" VALUE "ProductName", "OpenCL " XSTR(VERSION_PREFIX_MAJOR) "." XSTR(VERSION_PREFIX_MINOR) " " AMD_PLATFORM_INFO "\0" VALUE "ProductVersion", "10.0." XSTR(AMD_PLATFORM_BUILD_NUMBER) "." XSTR(AMD_PLATFORM_REVISION_NUMBER) END END BLOCK "VarFileInfo" BEGIN VALUE "Translation", 0x409, 1200 END END #endif // English (U.S.) 
resources ///////////////////////////////////////////////////////////////////////////// clr-rocm-5.7.1/opencl/amdocl/amdocl.rc.in000066400000000000000000000043721450307266000201640ustar00rootroot00000000000000#define STR(__macro__) #__macro__ #define XSTR(__macro__) STR(__macro__) #if defined(_DEBUG) #define DEBUG_ONLY(x) x #else #define DEBUG_ONLY(x) #endif #define VERSION_PREFIX_MAJOR 2 #define VERSION_PREFIX_MINOR 0 #define APSTUDIO_READONLY_SYMBOLS ///////////////////////////////////////////////////////////////////////////// // // Generated from the TEXTINCLUDE 2 resource. // #include "winresrc.h" #include "utils/versions.hpp" ///////////////////////////////////////////////////////////////////////////// #undef APSTUDIO_READONLY_SYMBOLS ///////////////////////////////////////////////////////////////////////////// // English (U.S.) resources #if !defined(AFX_RESOURCE_DLL) || defined(AFX_TARG_ENU) #ifdef _WIN32 LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US #pragma code_page(1252) #endif //_WIN32 ///////////////////////////////////////////////////////////////////////////// // // Version // VS_VERSION_INFO VERSIONINFO FILEVERSION 10,0,AMD_PLATFORM_BUILD_NUMBER,AMD_PLATFORM_REVISION_NUMBER PRODUCTVERSION 10,0,AMD_PLATFORM_BUILD_NUMBER,AMD_PLATFORM_REVISION_NUMBER FILEFLAGSMASK 0x3fL #ifdef _DEBUG FILEFLAGS 0x1L #else FILEFLAGS 0x0L #endif FILEOS 0x40004L FILETYPE 0x2L FILESUBTYPE 0x0L BEGIN BLOCK "StringFileInfo" BEGIN BLOCK "040904b0" BEGIN VALUE "Comments", " \0" VALUE "CompanyName", "Advanced Micro Devices Inc.\0" VALUE "FileDescription", AMD_PLATFORM_NAME " OpenCL " XSTR(VERSION_PREFIX_MAJOR) "." XSTR(VERSION_PREFIX_MINOR) " Runtime\0" VALUE "FileVersion", "10.0." XSTR(AMD_PLATFORM_BUILD_NUMBER) "." XSTR(AMD_PLATFORM_REVISION_NUMBER) VALUE "InternalName", "OpenCL" VALUE "LegalCopyright", "Copyright (c) 2011 - 2022 Advanced Micro Devices Inc.\0" VALUE "OriginalFilename", "OpenCL.dll" VALUE "ProductName", "OpenCL " XSTR(VERSION_PREFIX_MAJOR) "." XSTR(VERSION_PREFIX_MINOR) " " AMD_PLATFORM_INFO "\0" VALUE "ProductVersion", "10.0." XSTR(AMD_PLATFORM_BUILD_NUMBER) "." XSTR(AMD_PLATFORM_REVISION_NUMBER) END END BLOCK "VarFileInfo" BEGIN VALUE "Translation", 0x409, 1200 END END #endif // English (U.S.) resources ///////////////////////////////////////////////////////////////////////////// clr-rocm-5.7.1/opencl/amdocl/cl_agent_amd.h000066400000000000000000000155211450307266000205360ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef __OPENCL_CL_AGENT_AMD_H #define __OPENCL_CL_AGENT_AMD_H #include #include "cl_icd_amd.h" #define cl_amd_agent 1 #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ typedef const struct _cl_agent cl_agent; #define CL_AGENT_VERSION_1_0 100 /* Context Callbacks */ typedef void(CL_CALLBACK* acContextCreate_fn)(cl_agent* /* agent */, cl_context /* context */); typedef void(CL_CALLBACK* acContextFree_fn)(cl_agent* /* agent */, cl_context /* context */); /* Command Queue Callbacks */ typedef void(CL_CALLBACK* acCommandQueueCreate_fn)(cl_agent* /* agent */, cl_command_queue /* queue */); typedef void(CL_CALLBACK* acCommandQueueFree_fn)(cl_agent* /* agent */, cl_command_queue /* queue */); /* Event Callbacks */ typedef void(CL_CALLBACK* acEventCreate_fn)(cl_agent* /* agent */, cl_event /* event */, cl_command_type /* type */); typedef void(CL_CALLBACK* acEventFree_fn)(cl_agent* /* agent */, cl_event /* event */); typedef void(CL_CALLBACK* acEventStatusChanged_fn)(cl_agent* /* agent */, cl_event /* event */, cl_int /* execution_status */, cl_long /* epoch_time_stamp */); /* Memory Object Callbacks */ typedef void(CL_CALLBACK* acMemObjectCreate_fn)(cl_agent* /* agent */, cl_mem /* memobj */); typedef void(CL_CALLBACK* acMemObjectFree_fn)(cl_agent* /* agent */, cl_mem /* memobj */); typedef void(CL_CALLBACK* acMemObjectAcquired_fn)(cl_agent* /* agent */, cl_mem /* memobj */, cl_device_id /* device */, cl_long /* elapsed_time */); /* Sampler Callbacks */ typedef void(CL_CALLBACK* acSamplerCreate_fn)(cl_agent* /* agent */, cl_sampler /* sampler */); typedef void(CL_CALLBACK* acSamplerFree_fn)(cl_agent* /* agent */, cl_sampler /* sampler */); /* Program Callbacks */ typedef void(CL_CALLBACK* acProgramCreate_fn)(cl_agent* /* agent */, cl_program /* program */); typedef void(CL_CALLBACK* acProgramFree_fn)(cl_agent* /* agent */, cl_program /* program */); typedef void(CL_CALLBACK* acProgramBuild_fn)(cl_agent* /* agent */, cl_program /* program */); /* Kernel Callbacks */ typedef void(CL_CALLBACK* acKernelCreate_fn)(cl_agent* /* agent */, cl_kernel /* kernel */); typedef void(CL_CALLBACK* acKernelFree_fn)(cl_agent* /* agent */, cl_kernel /* kernel */); typedef void(CL_CALLBACK* acKernelSetArg_fn)(cl_agent* /* agent */, cl_kernel /* kernel */, cl_int /* arg_index */, size_t /* size */, const void* /* value_ptr */); typedef struct _cl_agent_callbacks { /* Context Callbacks */ acContextCreate_fn ContextCreate; acContextFree_fn ContextFree; /* Command Queue Callbacks */ acCommandQueueCreate_fn CommandQueueCreate; acCommandQueueFree_fn CommandQueueFree; /* Event Callbacks */ acEventCreate_fn EventCreate; acEventFree_fn EventFree; acEventStatusChanged_fn EventStatusChanged; /* Memory Object Callbacks */ acMemObjectCreate_fn MemObjectCreate; acMemObjectFree_fn MemObjectFree; acMemObjectAcquired_fn MemObjectAcquired; /* Sampler Callbacks */ acSamplerCreate_fn SamplerCreate; acSamplerFree_fn SamplerFree; /* Program Callbacks */ acProgramCreate_fn ProgramCreate; acProgramFree_fn ProgramFree; acProgramBuild_fn ProgramBuild; /* Kernel Callbacks */ acKernelCreate_fn KernelCreate; acKernelFree_fn KernelFree; acKernelSetArg_fn KernelSetArg; } cl_agent_callbacks; typedef cl_uint cl_agent_capability_action; #define CL_AGENT_ADD_CAPABILITIES 0x0 #define CL_AGENT_RELINQUISH_CAPABILITIES 0x1 typedef struct _cl_agent_capabilities { cl_bitfield canGenerateContextEvents : 1; cl_bitfield canGenerateCommandQueueEvents : 1; cl_bitfield canGenerateEventEvents : 1; cl_bitfield canGenerateMemObjectEvents : 1; cl_bitfield 
canGenerateSamplerEvents : 1; cl_bitfield canGenerateProgramEvents : 1; cl_bitfield canGenerateKernelEvents : 1; } cl_agent_capabilities; struct _cl_agent { cl_int(CL_API_CALL* GetVersionNumber)(cl_agent* /* agent */, cl_int* /* version_ret */); cl_int(CL_API_CALL* GetPlatform)(cl_agent* /* agent */, cl_platform_id* /* platform_id_ret */); cl_int(CL_API_CALL* GetTime)(cl_agent* /* agent */, cl_long* /* time_nanos */); cl_int(CL_API_CALL* SetCallbacks)(cl_agent* /* agent */, const cl_agent_callbacks* /* callbacks */, size_t /* size */); cl_int(CL_API_CALL* GetPotentialCapabilities)(cl_agent* /* agent */, cl_agent_capabilities* /* capabilities */); cl_int(CL_API_CALL* GetCapabilities)(cl_agent* /* agent */, cl_agent_capabilities* /* capabilities */); cl_int(CL_API_CALL* SetCapabilities)(cl_agent* /* agent */, const cl_agent_capabilities* /* capabilities */, cl_agent_capability_action /* action */); cl_int(CL_API_CALL* GetICDDispatchTable)(cl_agent* /* agent */, cl_icd_dispatch_table* /* table */, size_t /* size */); cl_int(CL_API_CALL* SetICDDispatchTable)(cl_agent* /* agent */, const cl_icd_dispatch_table* /* table */, size_t /* size */); /* add Kernel/Program helper functions, etc... */ }; extern cl_int CL_CALLBACK clAgent_OnLoad(cl_agent* /* agent */); extern void CL_CALLBACK clAgent_OnUnload(cl_agent* /* agent */); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* __OPENCL_CL_AGENT_AMD_H */ clr-rocm-5.7.1/opencl/amdocl/cl_command.cpp000066400000000000000000000371221450307266000205710ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "cl_common.hpp" #include "platform/object.hpp" #include "platform/context.hpp" #include "platform/command.hpp" #include "platform/agent.hpp" /*! \addtogroup API * @{ * * \addtogroup CL_Queues * * OpenCL objects such as memory objects, program and kernel objects are * created using a context. Operations on these objects are performed using * a command-queue. The command-queue can be used to queue a set of operations * (referred to as commands) in order. Having multiple command-queues allows * applications to queue multiple independent commands without requiring * synchronization. Note that this should work as long as these objects are * not being shared. Sharing of objects across multiple command-queues will * require the application to perform appropriate synchronization. * * @{ */ /*! \brief Create a command-queue on a specific device. * * \param context must be a valid OpenCL context. 
 *
 * \param device must be a device associated with context. It can either be
 * in the list of devices specified when context is created using
 * clCreateContext or have the same device type as the device type specified
 * when context is created using clCreateContextFromType.
 *
 * \param properties specifies a list of properties for the command-queue.
 *
 * \param errcode_ret will return an appropriate error code. If \a errcode_ret
 * is NULL, no error code is returned.
 *
 * \return A valid non-zero command-queue and \a errcode_ret is set to
 * CL_SUCCESS if the command-queue is created successfully or a NULL value
 * with one of the following error values returned in \a errcode_ret:
 * - CL_INVALID_CONTEXT if context is not a valid context.
 * - CL_INVALID_DEVICE if device is not a valid device or is not associated
 *   with context.
 * - CL_INVALID_VALUE if values specified in properties are not valid.
 * - CL_INVALID_QUEUE_PROPERTIES if values specified in properties are valid
 *   but are not supported by the device.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
 *   required by the runtime.
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY_RET(cl_command_queue, clCreateCommandQueueWithProperties,
                  (cl_context context, cl_device_id device,
                   const cl_queue_properties* queue_properties, cl_int* errcode_ret)) {
  if (!is_valid(context)) {
    *not_null(errcode_ret) = CL_INVALID_CONTEXT;
    return (cl_command_queue)0;
  }
  amd::Context& amdContext = *as_amd(context);
  amd::Device& amdDevice = *as_amd(device);

  if (!is_valid(device) || !amdContext.containsDevice(&amdDevice)) {
    *not_null(errcode_ret) = CL_INVALID_DEVICE;
    return (cl_command_queue)0;
  }

  cl_command_queue_properties properties = 0;
  const struct QueueProperty {
    cl_queue_properties name;
    union {
      cl_queue_properties raw;
      // FIXME_lmoriche: Check with Khronos. cl_queue_properties is an intptr,
      // but cl_command_queue_properties is a bitfield (truncate?).
      // cl_command_queue_properties properties;
      cl_uint size;
    } value;
  }* p = reinterpret_cast<const QueueProperty*>(queue_properties);

  uint queueSize = amdDevice.info().queueOnDevicePreferredSize_;
  uint queueRTCUs = amd::CommandQueue::RealTimeDisabled;
  amd::CommandQueue::Priority priority = amd::CommandQueue::Priority::Normal;

  if (p != NULL)
    while (p->name != 0) {
      switch (p->name) {
        case CL_QUEUE_PROPERTIES:
          // FIXME_lmoriche: See comment above.
          // properties = p->value.properties;
          properties = static_cast<cl_command_queue_properties>(p->value.raw);
          break;
        case CL_QUEUE_SIZE:
          queueSize = p->value.size;
          break;
#define CL_QUEUE_REAL_TIME_COMPUTE_UNITS_AMD 0x404f
        case CL_QUEUE_REAL_TIME_COMPUTE_UNITS_AMD:
          queueRTCUs = p->value.size;
          break;
#define CL_QUEUE_MEDIUM_PRIORITY_AMD 0x4050
        case CL_QUEUE_MEDIUM_PRIORITY_AMD:
          priority = amd::CommandQueue::Priority::Medium;
          if (p->value.size != 0) {
            queueRTCUs = p->value.size;
          }
          break;
        default:
          *not_null(errcode_ret) = CL_INVALID_QUEUE_PROPERTIES;
          LogWarning("invalid property name");
          return (cl_command_queue)0;
      }
      ++p;
    }

  if (queueSize > amdDevice.info().queueOnDeviceMaxSize_) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    return (cl_command_queue)0;
  }

  if ((queueRTCUs != amd::CommandQueue::RealTimeDisabled) &&
      ((queueRTCUs > amdDevice.info().numRTCUs_) || (queueRTCUs == 0) ||
       (queueRTCUs < amdDevice.info().granularityRTCUs_))) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    return (cl_command_queue)0;
  }

  amd::CommandQueue* queue = NULL;
  {
    amd::ScopedLock lock(amdContext.lock());

    // Check if the app creates a host queue
    if (!(properties & CL_QUEUE_ON_DEVICE)) {
      queue = new amd::HostQueue(amdContext, amdDevice, properties, queueRTCUs, priority);
    } else {
      // Is it a device default queue
      if (properties & CL_QUEUE_ON_DEVICE_DEFAULT) {
        queue = amdContext.defDeviceQueue(amdDevice);
        // If current context has one already then return it
        if (NULL != queue) {
          queue->retain();
          *not_null(errcode_ret) = CL_SUCCESS;
          return as_cl(queue);
        }
      }
      // Check if runtime can allocate a new device queue on this context
      if (amdContext.isDevQueuePossible(amdDevice)) {
        queue = new amd::DeviceQueue(amdContext, amdDevice, properties, queueSize);
      }
    }

    if (queue == NULL || !queue->create()) {
      *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY;
      delete queue;
      return (cl_command_queue)0;
    }
  }

  if (amd::Agent::shouldPostCommandQueueEvents()) {
    amd::Agent::postCommandQueueCreate(as_cl(queue->asCommandQueue()));
  }

  *not_null(errcode_ret) = CL_SUCCESS;
  return as_cl(queue);
}
RUNTIME_EXIT

RUNTIME_ENTRY_RET(cl_command_queue, clCreateCommandQueue,
                  (cl_context context, cl_device_id device,
                   cl_command_queue_properties properties, cl_int* errcode_ret)) {
  const cl_queue_properties cprops[] = {CL_QUEUE_PROPERTIES,
                                        static_cast<cl_queue_properties>(properties), 0};
  return clCreateCommandQueueWithProperties(context, device, properties ? cprops : NULL,
                                            errcode_ret);
}
RUNTIME_EXIT

/*! \brief Replaces the default command queue on the device
 *
 * \param context must be a valid OpenCL context.
 *
 * \param device must be a device associated with context.
 *
 * \param command_queue specifies the default command-queue.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function executed successfully.
 * - CL_INVALID_CONTEXT if \a context is not a valid context.
 * - CL_INVALID_DEVICE if \a device is not a valid device or is not
 *   associated with context.
 * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-
 *   queue for device.
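 *
 * A minimal usage sketch (illustrative only, not part of the original
 * documentation; assumes ctx and dev were obtained through the usual
 * platform/device queries). The queue must be a device queue, e.g. one
 * created with CL_QUEUE_ON_DEVICE, which also requires the out-of-order
 * execution flag:
 *
 *   cl_queue_properties props[] = {
 *       CL_QUEUE_PROPERTIES,
 *       CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE, 0};
 *   cl_int err;
 *   cl_command_queue devq =
 *       clCreateCommandQueueWithProperties(ctx, dev, props, &err);
 *   err = clSetDefaultDeviceCommandQueue(ctx, dev, devq);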
 */
RUNTIME_ENTRY(cl_int, clSetDefaultDeviceCommandQueue,
              (cl_context context, cl_device_id device, cl_command_queue command_queue)) {
  if (!is_valid(context)) {
    return CL_INVALID_CONTEXT;
  }
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }

  amd::Context* amdContext = as_amd(context);
  amd::Device* amdDevice = as_amd(device);

  if (!is_valid(device) || !amdContext->containsDevice(amdDevice)) {
    return CL_INVALID_DEVICE;
  }

  amd::DeviceQueue* deviceQueue = as_amd(command_queue)->asDeviceQueue();
  if ((deviceQueue == NULL) || (amdContext != &deviceQueue->context()) ||
      (amdDevice != &deviceQueue->device())) {
    return CL_INVALID_COMMAND_QUEUE;
  }

  {
    amd::ScopedLock lock(amdContext->lock());
    amdContext->setDefDeviceQueue(*amdDevice, deviceQueue);
  }

  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief Increment the \a command_queue reference count.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function is executed successfully.
 * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid
 *   command-queue.
 *
 * clCreateCommandQueue performs an implicit retain. This is very helpful for
 * 3rd party libraries, which typically get a command-queue passed to them
 * by the application. However, it is possible that the application may delete
 * the command-queue without informing the library. Allowing functions to
 * attach to (i.e. retain) and release a command-queue solves the problem of a
 * command-queue being used by a library no longer being valid.
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY(cl_int, clRetainCommandQueue, (cl_command_queue command_queue)) {
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  as_amd(command_queue)->retain();
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief Decrement the \a command_queue reference count.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function is executed successfully.
 * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid
 *   command-queue.
 *
 * After the command_queue reference count becomes zero and all commands queued
 * to \a command_queue have finished (e.g. kernel executions, memory object
 * updates etc.), the command-queue is deleted.
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY(cl_int, clReleaseCommandQueue, (cl_command_queue command_queue)) {
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  as_amd(command_queue)->release();
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief Query information about a command-queue.
 *
 * \param command_queue specifies the command-queue being queried.
 *
 * \param param_name specifies the information to query.
 *
 * \param param_value is a pointer to memory where the appropriate result
 * being queried is returned. If \a param_value is NULL, it is ignored.
 *
 * \param param_value_size is used to specify the size in bytes of memory
 * pointed to by \a param_value. This size must be >= size of return type.
 * If \a param_value is NULL, it is ignored.
 *
 * \param param_value_size_ret returns the actual size in bytes of data being
 * queried by \a param_value. If \a param_value_size_ret is NULL,
 * it is ignored.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function is executed successfully.
 * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid
 *   command-queue.
 * - CL_INVALID_VALUE if \a param_name is not one of the supported
 *   values or if size in bytes specified by \a param_value_size is < size of
 *   return type and \a param_value is not a NULL value.
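 *
 * For example (an illustrative sketch, not part of the original
 * documentation), querying the reference count of a queue:
 *
 *   cl_uint refCount = 0;
 *   cl_int err = clGetCommandQueueInfo(queue, CL_QUEUE_REFERENCE_COUNT,
 *                                      sizeof(refCount), &refCount, NULL);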
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY(cl_int, clGetCommandQueueInfo,
              (cl_command_queue command_queue, cl_command_queue_info param_name,
               size_t param_value_size, void* param_value, size_t* param_value_size_ret)) {
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }

  switch (param_name) {
    case CL_QUEUE_CONTEXT: {
      cl_context context = const_cast<cl_context>(as_cl(&as_amd(command_queue)->context()));
      return amd::clGetInfo(context, param_value_size, param_value, param_value_size_ret);
    }
    case CL_QUEUE_DEVICE: {
      cl_device_id device = const_cast<cl_device_id>(as_cl(&as_amd(command_queue)->device()));
      return amd::clGetInfo(device, param_value_size, param_value, param_value_size_ret);
    }
    case CL_QUEUE_PROPERTIES: {
      cl_command_queue_properties properties = as_amd(command_queue)->properties().value_;
      return amd::clGetInfo(properties, param_value_size, param_value, param_value_size_ret);
    }
    case CL_QUEUE_REFERENCE_COUNT: {
      cl_uint count = as_amd(command_queue)->referenceCount();
      return amd::clGetInfo(count, param_value_size, param_value, param_value_size_ret);
    }
    case CL_QUEUE_SIZE: {
      const amd::DeviceQueue* deviceQueue = as_amd(command_queue)->asDeviceQueue();
      if (NULL == deviceQueue) {
        return CL_INVALID_COMMAND_QUEUE;
      }
      cl_uint size = deviceQueue->size();
      return amd::clGetInfo(size, param_value_size, param_value, param_value_size_ret);
    }
    case CL_QUEUE_THREAD_HANDLE_AMD: {
      const amd::HostQueue* hostQueue = as_amd(command_queue)->asHostQueue();
      if (NULL == hostQueue) {
        return CL_INVALID_COMMAND_QUEUE;
      }
      const void* handle = hostQueue->thread().handle();
      return amd::clGetInfo(handle, param_value_size, param_value, param_value_size_ret);
    }
    case CL_QUEUE_DEVICE_DEFAULT: {
      const amd::Device& device = as_amd(command_queue)->device();
      amd::CommandQueue* defQueue = as_amd(command_queue)->context().defDeviceQueue(device);
      cl_command_queue queue = defQueue ? as_cl(defQueue) : NULL;
      return amd::clGetInfo(queue, param_value_size, param_value, param_value_size_ret);
    }
    default:
      break;
  }

  return CL_INVALID_VALUE;
}
RUNTIME_EXIT

/*! \brief Enable or disable the properties of a command-queue.
 *
 * \param command_queue specifies the command-queue being queried.
 *
 * \param properties specifies the new command-queue properties to be applied
 * to \a command_queue.
 *
 * \param enable determines whether the values specified by properties are
 * enabled (if enable is CL_TRUE) or disabled (if enable is CL_FALSE) for the
 * command-queue.
 *
 * \param old_properties returns the command-queue properties before they were
 * changed by clSetCommandQueueProperty. If \a old_properties is NULL,
 * it is ignored.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the command-queue properties are successfully updated.
 * - CL_INVALID_COMMAND_QUEUE if command_queue is not a valid command-queue.
 * - CL_INVALID_VALUE if the values specified in properties are not valid.
 * - CL_INVALID_QUEUE_PROPERTIES if values specified in properties are
 *   not supported by the device.
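 *
 * Sketch of toggling profiling on an existing queue (hypothetical \a queue;
 * note this entry point was deprecated in OpenCL 1.1):
 * \code
 * cl_command_queue_properties old_props = 0;
 * cl_int err = clSetCommandQueueProperty(queue, CL_QUEUE_PROFILING_ENABLE,
 *                                        CL_TRUE, &old_props);
 * \endcode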
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY(cl_int, clSetCommandQueueProperty,
              (cl_command_queue command_queue, cl_command_queue_properties properties,
               cl_bool enable, cl_command_queue_properties* old_properties)) {
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  *not_null(old_properties) = as_amd(command_queue)->properties().value_;

  if (properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) {
    clFinish(command_queue);
  }

  bool success;
  if (enable == CL_TRUE) {
    success = as_amd(command_queue)->properties().set(properties);
  } else {
    success = as_amd(command_queue)->properties().clear(properties);
  }
  return success ? CL_SUCCESS : CL_INVALID_QUEUE_PROPERTIES;
}
RUNTIME_EXIT

/*! @}
 * @}
 */
clr-rocm-5.7.1/opencl/amdocl/cl_common.hpp
/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE. */

#ifndef CL_COMMON_HPP_
#define CL_COMMON_HPP_

#ifdef _WIN32
#include
#include
#include
#endif

#include <tuple>

#include "top.hpp"
#include "vdi_common.hpp"

//! Helper function to check "properties" parameter in various functions
int checkContextProperties(const cl_context_properties* properties, bool* offlineDevices);

namespace amd {

template <typename T>
static inline cl_int clGetInfo(T& field, size_t param_value_size, void* param_value,
                               size_t* param_value_size_ret) {
  const void* valuePtr;
  size_t valueSize;
  std::tie(valuePtr, valueSize) =
      detail::ParamInfo<typename std::remove_const<T>::type>::get(field);

  *not_null(param_value_size_ret) = valueSize;

  cl_int ret = CL_SUCCESS;
  if (param_value != NULL && param_value_size < valueSize) {
    if ((param_value_size == 0) || !std::is_pointer<T>() ||
        !std::is_same<typename std::remove_const<typename std::remove_pointer<T>::type>::type,
                      char>()) {
      return CL_INVALID_VALUE;
    }
    // For char* and char[] params, we will at least fill up to
    // param_value_size, then return an error.
    valueSize = param_value_size;
    static_cast<char*>(param_value)[--valueSize] = '\0';
    ret = CL_INVALID_VALUE;
  }
  if (param_value != NULL) {
    ::memcpy(param_value, valuePtr, valueSize);
    if (param_value_size > valueSize) {
      ::memset(static_cast<char*>
(param_value) + valueSize, '\0', param_value_size - valueSize); } } return ret; } static inline cl_int clSetEventWaitList( Command::EventWaitList& eventWaitList, const amd::HostQueue& hostQueue, cl_uint num_events_in_wait_list, const cl_event* event_wait_list) { if ((num_events_in_wait_list == 0 && event_wait_list != NULL) || (num_events_in_wait_list != 0 && event_wait_list == NULL)) { return CL_INVALID_EVENT_WAIT_LIST; } while (num_events_in_wait_list-- > 0) { cl_event event = *event_wait_list++; Event* amdEvent = as_amd(event); if (!is_valid(event)) { return CL_INVALID_EVENT_WAIT_LIST; } if (&hostQueue.context() != &amdEvent->context()) { return CL_INVALID_CONTEXT; } if ((amdEvent->command().queue() != &hostQueue) && !amdEvent->notifyCmdQueue()) { return CL_INVALID_EVENT_WAIT_LIST; } eventWaitList.push_back(amdEvent); } return CL_SUCCESS; } //! Common function declarations for CL-external graphics API interop cl_int clEnqueueAcquireExtObjectsAMD(cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event, cl_command_type cmd_type); cl_int clEnqueueReleaseExtObjectsAMD(cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event, cl_command_type cmd_type); } // namespace amd extern "C" { #if defined(CL_VERSION_1_1) extern CL_API_ENTRY cl_int CL_API_CALL clSetCommandQueueProperty( cl_command_queue command_queue, cl_command_queue_properties properties, cl_bool enable, cl_command_queue_properties *old_properties) CL_API_SUFFIX__VERSION_1_0; #endif // CL_VERSION_1_1 extern CL_API_ENTRY cl_mem CL_API_CALL clConvertImageAMD( cl_context context, cl_mem image, const cl_image_format * image_format, cl_int * errcode_ret); extern CL_API_ENTRY cl_mem CL_API_CALL clCreateBufferFromImageAMD( cl_context context, cl_mem image, cl_int * errcode_ret); extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithAssemblyAMD( cl_context context, cl_uint count, const char ** strings, const size_t * lengths, cl_int * errcode_ret); } // extern "C" //! \endcond #endif /*CL_COMMON_HPP_*/ clr-rocm-5.7.1/opencl/amdocl/cl_context.cpp000066400000000000000000000524311450307266000206370ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "cl_common.hpp" #include "vdi_common.hpp" #include "platform/context.hpp" #include "device/device.hpp" #include "platform/runtime.hpp" #include "platform/agent.hpp" #ifdef _WIN32 #include "cl_d3d9_amd.hpp" #include "cl_d3d10_amd.hpp" #include "cl_d3d11_amd.hpp" #endif // _WIN32 #include "cl_kernel_info_amd.h" #include "cl_profile_amd.h" #include "cl_platform_amd.h" #include "cl_sdi_amd.h" #include "cl_thread_trace_amd.h" #include "cl_p2p_amd.h" #include #include #include "CL/cl_gl.h" /*! \addtogroup API * @{ * * \addtogroup CL_Contexts * @{ */ /*! \brief Create an OpenCL context. * * An OpenCL context is created with one or more devices. Contexts are used by * the OpenCL runtime for managing objects such as command-queues, memory, * program and kernel objects and for executing kernels on one or more devices * specified in the context. * * \param properties is reserved and must be zero. * * \param num_devices is the number of devices specified in the \a devices * argument. * * \param devices is a pointer to a list of unique devices returned by * clGetDevices. If more than one device is specified in devices, * a selection criteria may be applied to determine if the list of devices * specified can be used together to create a context. * * \param pfn_notify is a callback function that can be registered by the * application. This callback function will be used by the runtime to report * information on errors that occur in this context. This callback function * may be called asynchronously by the runtime. If \a pfn_notify is NULL, * no callback function is registered. * * \param user_data will be passed as the user_data argument when \a pfn_notify * is called. \a user_data can be NULL. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A valid non-zero context and errcode_ret is set to CL_SUCCESS * if the context is created successfully or NULL with the following * error values stored in \a errcode_ret: * - CL_INVALID_VALUE if \a properties is not zero. * - CL_INVALID_VALUE if \a devices is NULL. * - CL_INVALID_VALUE if \a num_devices is equal to zero. * - CL_INVALID_DEVICE if \a devices contains an invalid device. * - CL_INVALID_DEVICE_LIST if more than one device is specified in * \a devices and the list of devices specified cannot be used together * to create a context. * - CL_DEVICE_NOT_AVAILABLE if a device in \a devices is currently not * available even though the device was returned by clGetDevices. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY_RET(cl_context, clCreateContext, (const cl_context_properties* properties, cl_uint num_devices, const cl_device_id* devices, void(CL_CALLBACK* pfn_notify)(const char*, const void*, size_t, void*), void* user_data, cl_int* errcode_ret)) { cl_int errcode; amd::Context::Info info; errcode = amd::Context::checkProperties(properties, &info); if (CL_SUCCESS != errcode) { *not_null(errcode_ret) = errcode; return (cl_context)0; } if (num_devices == 0 || devices == NULL) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_context)0; } std::vector devices_; for (cl_uint i = 0; i < num_devices; ++i) { // FIXME_lmoriche: Set errcode_ret to CL_DEVICE_NOT_AVAILABLE if a // device in devices is no longer available. 
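    // Each incoming handle is validated before unwrapping; calling as_amd()
    // on a stale cl_device_id would otherwise be undefined behavior.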
cl_device_id device = devices[i]; if (!is_valid(device)) { *not_null(errcode_ret) = CL_INVALID_DEVICE; return (cl_context)0; } devices_.push_back(as_amd(device)); } amd::Context* context = new amd::Context(devices_, info); if (context == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_context)0; } if (CL_SUCCESS != (errcode = context->create(properties))) { context->release(); *not_null(errcode_ret) = errcode; return (cl_context)0; } if (amd::Agent::shouldPostContextEvents()) { amd::Agent::postContextCreate(as_cl(context)); } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(context); } RUNTIME_EXIT /*! \brief Create an OpenCL context from a device type that identifies the * specific device(s) to use. * * \param properties is reserved and must be zero. * * \param device_type is a bit-field that identifies the type of device. * * \param pfn_notify described in clCreateContext. * * \param user_data described in clCreateContext. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A valid non-zero context and errcode_ret is set to CL_SUCCESS * if the context is created successfully or NULL with the following error * values stored in errcode_ret: * - CL_INVALID_VALUE if \a properties is not zero. * - CL_INVALID_DEVICE_TYPE if \a device_type is not a valid value. * - CL_DEVICE_NOT_AVAILABLE if no devices that match \a device_type * are currently available. * - CL_DEVICE_NOT_FOUND if no devices that match \a device_type were found. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY_RET(cl_context, clCreateContextFromType, (const cl_context_properties* properties, cl_device_type device_type, void(CL_CALLBACK* pfn_notify)(const char*, const void*, size_t, void*), void* user_data, cl_int* errcode_ret)) { amd::Context::Info info; cl_int errcode = amd::Context::checkProperties(properties, &info); if (errcode != CL_SUCCESS) { *not_null(errcode_ret) = errcode; return (cl_context)0; } // Get the devices of the given type. cl_uint num_devices; bool offlineDevices = (info.flags_ & amd::Context::OfflineDevices) ? true : false; if (!amd::Device::getDeviceIDs(device_type, 0, NULL, &num_devices, offlineDevices)) { *not_null(errcode_ret) = CL_DEVICE_NOT_FOUND; return (cl_context)0; } assert(num_devices > 0 && "Should have returned an error!"); cl_device_id* devices = (cl_device_id*)alloca(num_devices * sizeof(cl_device_id)); if (!amd::Device::getDeviceIDs(device_type, num_devices, devices, NULL, offlineDevices)) { *not_null(errcode_ret) = CL_DEVICE_NOT_FOUND; return (cl_context)0; } // Create a new context with the devices cl_context context = clCreateContext(properties, num_devices, devices, pfn_notify, user_data, errcode_ret); return context; } RUNTIME_EXIT /*! \brief Increment the context reference count. * * \return One of the following values: * - CL_INVALID_CONTEXT if context is not a valid OpenCL context. * - CL_SUCCESS if the function is executed successfully. * * clCreateContext and clCreateContextFromType perform an implicit retain. * This is very helpful for 3rd party libraries, which typically get a context * passed to them by the application. * However, it is possible that the application may delete the context without * informing the library. Allowing functions to attach to (i.e. retain) and * release a context solves the problem of a context being used by a library * no longer being valid. 
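 *
 * Typical library-side pattern, sketched (hypothetical functions; \a ctx is
 * handed in by the application):
 * \code
 * void library_attach(cl_context ctx) {
 *   clRetainContext(ctx);   // keep the context alive while the library uses it
 * }
 * void library_detach(cl_context ctx) {
 *   clReleaseContext(ctx);  // balance the retain
 * }
 * \endcode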
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clRetainContext, (cl_context context)) { if (!is_valid(context)) { return CL_INVALID_CONTEXT; } as_amd(context)->retain(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Decrement the context reference count. * * \return One of the following values: * - CL_INVALID_CONTEXT if context is not a valid OpenCL context. * - CL_SUCCESS if the function is executed successfully. * * After the context reference count becomes zero and all the objects attached * to context (such as memory objects, command-queues) are released, * the context is deleted. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clReleaseContext, (cl_context context)) { if (!is_valid(context)) { return CL_INVALID_CONTEXT; } as_amd(context)->release(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Query information about a context. * * \param context specifies the OpenCL context being queried. * * \param param_name is an enum that specifies the information to query. * * \param param_value is a pointer to memory where the appropriate result being * queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size specifies the size in bytes of memory pointed to by * \a param_value. This size must be greater than or equal to the size of * return type. * * \param param_value_size_ret returns the actual size in bytes of data being * queried by \a param_value. If \a param_value_size_ret is NULL, * it is ignored. * * \return One of the following values: * - CL_INVALID_CONTEXT if context is not a valid context. * - CL_INVALID_VALUE if \a param_name is not one of the supported values * or if size in bytes specified by \a param_value_size is < size of return * type and \a param_value is not a NULL value. * - CL_SUCCESS if the function is executed successfully. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clGetContextInfo, (cl_context context, cl_context_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { if (!is_valid(context)) { return CL_INVALID_CONTEXT; } switch (param_name) { case CL_CONTEXT_REFERENCE_COUNT: { cl_uint count = as_amd(context)->referenceCount(); return amd::clGetInfo(count, param_value_size, param_value, param_value_size_ret); } case CL_CONTEXT_NUM_DEVICES: { cl_uint numDevices = (cl_uint)as_amd(context)->devices().size(); return amd::clGetInfo(numDevices, param_value_size, param_value, param_value_size_ret); } case CL_CONTEXT_DEVICES: { const std::vector& devices = as_amd(context)->devices(); size_t numDevices = devices.size(); size_t valueSize = numDevices * sizeof(cl_device_id*); if (param_value != NULL && param_value_size < valueSize) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = valueSize; if (param_value != NULL) { cl_device_id* device_list = (cl_device_id*)param_value; for (const auto& it : devices) { *device_list++ = const_cast(as_cl(it)); } } return CL_SUCCESS; } case CL_CONTEXT_PROPERTIES: { const amd::Context* amdContext = as_amd(context); size_t valueSize = amdContext->info().propertiesSize_; if (param_value != NULL && param_value_size < valueSize) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = valueSize; if ((param_value != NULL) && (valueSize != 0)) { ::memcpy(param_value, amdContext->properties(), valueSize); } return CL_SUCCESS; } #ifdef _WIN32 case CL_CONTEXT_D3D10_DEVICE_KHR: { // Not defined in the ext.spec, but tested in the conf.test // Guessing functionality from the test... 
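      // Returns the ID3D10Device* recorded at context creation, handed back
      // to the application as an intptr_t-sized handle.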
if (param_value != NULL && param_value_size < sizeof(void*)) { return CL_INVALID_VALUE; } const amd::Context* amdContext = as_amd(context); if (!(amdContext->info().flags_ & amd::Context::D3D10DeviceKhr)) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = sizeof(intptr_t); if (param_value != NULL) { *(intptr_t*)param_value = reinterpret_cast(amdContext->info().hDev_[amd::Context::D3D10DeviceKhrIdx]); } return CL_SUCCESS; } case CL_CONTEXT_D3D10_PREFER_SHARED_RESOURCES_KHR: { if (param_value != NULL && param_value_size < sizeof(cl_bool)) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = sizeof(cl_bool); if (param_value != NULL) { *(cl_bool*)param_value = CL_TRUE; } return CL_SUCCESS; } case CL_CONTEXT_D3D11_DEVICE_KHR: { // Not defined in the ext.spec, but tested in the conf.test // Guessing functionality from the test... if (param_value != NULL && param_value_size < sizeof(void*)) { return CL_INVALID_VALUE; } const amd::Context* amdContext = as_amd(context); if (!(amdContext->info().flags_ & amd::Context::D3D11DeviceKhr)) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = sizeof(intptr_t); if (param_value != NULL) { *(intptr_t*)param_value = reinterpret_cast(amdContext->info().hDev_[amd::Context::D3D11DeviceKhrIdx]); } return CL_SUCCESS; } case CL_CONTEXT_D3D11_PREFER_SHARED_RESOURCES_KHR: { if (param_value != NULL && param_value_size < sizeof(cl_bool)) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = sizeof(cl_bool); if (param_value != NULL) { *(cl_bool*)param_value = CL_TRUE; } return CL_SUCCESS; } case CL_CONTEXT_ADAPTER_D3D9_KHR: { if (param_value != NULL && param_value_size < sizeof(void*)) { return CL_INVALID_VALUE; } const amd::Context* amdContext = as_amd(context); if (!(amdContext->info().flags_ & amd::Context::D3D9DeviceKhr)) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = sizeof(intptr_t); if (param_value != NULL) { *(intptr_t*)param_value = reinterpret_cast(amdContext->info().hDev_[amd::Context::D3D9DeviceKhrIdx]); } return CL_SUCCESS; } case CL_CONTEXT_ADAPTER_D3D9EX_KHR: { if (param_value != NULL && param_value_size < sizeof(void*)) { return CL_INVALID_VALUE; } const amd::Context* amdContext = as_amd(context); if (!(amdContext->info().flags_ & amd::Context::D3D9DeviceEXKhr)) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = sizeof(intptr_t); if (param_value != NULL) { *(intptr_t*)param_value = reinterpret_cast(amdContext->info().hDev_[amd::Context::D3D9DeviceEXKhrIdx]); } return CL_SUCCESS; } case CL_CONTEXT_ADAPTER_DXVA_KHR: { if (param_value != NULL && param_value_size < sizeof(void*)) { return CL_INVALID_VALUE; } const amd::Context* amdContext = as_amd(context); if (!(amdContext->info().flags_ & amd::Context::D3D9DeviceVAKhr)) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = sizeof(intptr_t); if (param_value != NULL) { *(intptr_t*)param_value = reinterpret_cast(amdContext->info().hDev_[amd::Context::D3D9DeviceVAKhrIdx]); } return CL_SUCCESS; } #endif //_WIN32 default: break; } return CL_INVALID_VALUE; } RUNTIME_EXIT /*! \brief returns the address of the extension function named by * funcname for a given platform. The pointer returned should be cast * to a function pointer type matching the extension functions definition * defined in the appropriate extension specification and header file. * A return value of NULL indicates that the specified function does not * exist for the implementation or platform is not a valid platform. 
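 *
 * Sketch of a lookup (hedged: the result must still be NULL-checked, and the
 * extension confirmed via CL_PLATFORM_EXTENSIONS before relying on it):
 * \code
 * typedef cl_int (CL_API_CALL* PFN_clGetGLContextInfoKHR)(
 *     const cl_context_properties*, cl_gl_context_info, size_t, void*, size_t*);
 * PFN_clGetGLContextInfoKHR pfn = (PFN_clGetGLContextInfoKHR)
 *     clGetExtensionFunctionAddressForPlatform(platform, "clGetGLContextInfoKHR");
 * \endcode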
* A non-NULL return value for \a clGetExtensionFunctionAddressForPlatform * does not guarantee that an extension function is actually supported by * the platform. The application must also make a corresponding query using * \a clGetPlatformInfo(platform, CL_PLATFORM_EXTENSIONS, ... ) or * \a clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ... ) to determine if * an extension is supported by the OpenCL implementation. * * \version 1.2r07 */ CL_API_ENTRY void* CL_API_CALL clGetExtensionFunctionAddressForPlatform(cl_platform_id platform, const char* funcname) { if (platform != NULL && platform != AMD_PLATFORM) { return NULL; } return clGetExtensionFunctionAddress(funcname); } CL_API_ENTRY void* CL_API_CALL clGetExtensionFunctionAddress(const char* func_name) { #define CL_EXTENSION_ENTRYPOINT_CHECK(name) \ if (!strcmp(func_name, #name)) return reinterpret_cast(name); #define CL_EXTENSION_ENTRYPOINT_CHECK2(name1, name2) \ if (!strcmp(func_name, #name1)) return reinterpret_cast(name2); switch (func_name[2]) { case 'C': CL_EXTENSION_ENTRYPOINT_CHECK(clCreateEventFromGLsyncKHR); CL_EXTENSION_ENTRYPOINT_CHECK(clCreatePerfCounterAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clCreateThreadTraceAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clCreateFromGLBuffer); CL_EXTENSION_ENTRYPOINT_CHECK(clCreateFromGLTexture2D); CL_EXTENSION_ENTRYPOINT_CHECK(clCreateFromGLTexture3D); CL_EXTENSION_ENTRYPOINT_CHECK(clCreateFromGLRenderbuffer); #ifdef _WIN32 CL_EXTENSION_ENTRYPOINT_CHECK(clCreateFromD3D10BufferKHR); CL_EXTENSION_ENTRYPOINT_CHECK(clCreateFromD3D10Texture2DKHR); CL_EXTENSION_ENTRYPOINT_CHECK(clCreateFromD3D10Texture3DKHR); CL_EXTENSION_ENTRYPOINT_CHECK(clCreateFromDX9MediaSurfaceKHR); #endif //_WIN32 CL_EXTENSION_ENTRYPOINT_CHECK(clConvertImageAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clCreateBufferFromImageAMD); #if defined(cl_khr_il_program) || defined(CL_VERSION_2_1) CL_EXTENSION_ENTRYPOINT_CHECK2(clCreateProgramWithILKHR,clCreateProgramWithIL); #endif // defined(cl_khr_il_program) || defined(CL_VERSION_2_1) #if cl_amd_assembly_program CL_EXTENSION_ENTRYPOINT_CHECK(clCreateProgramWithAssemblyAMD); #endif // cl_amd_assembly_program break; case 'D': break; case 'E': CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueBeginPerfCounterAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueEndPerfCounterAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueAcquireGLObjects); CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueReleaseGLObjects); CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueBindThreadTraceBufferAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueThreadTraceCommandAMD); #ifdef _WIN32 CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueAcquireD3D10ObjectsKHR); CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueReleaseD3D10ObjectsKHR); CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueAcquireDX9MediaSurfacesKHR); CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueReleaseDX9MediaSurfacesKHR); #endif //_WIN32 CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueWaitSignalAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueWriteSignalAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueMakeBuffersResidentAMD); #if cl_amd_copy_buffer_p2p CL_EXTENSION_ENTRYPOINT_CHECK(clEnqueueCopyBufferP2PAMD); #endif // cl_amd_copy_buffer_p2p break; case 'G': CL_EXTENSION_ENTRYPOINT_CHECK(clGetKernelInfoAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clGetPerfCounterInfoAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clGetGLObjectInfo); CL_EXTENSION_ENTRYPOINT_CHECK(clGetGLTextureInfo); CL_EXTENSION_ENTRYPOINT_CHECK(clGetGLContextInfoKHR); CL_EXTENSION_ENTRYPOINT_CHECK(clGetThreadTraceInfoAMD); #ifdef _WIN32 CL_EXTENSION_ENTRYPOINT_CHECK(clGetDeviceIDsFromD3D10KHR); 
CL_EXTENSION_ENTRYPOINT_CHECK(clGetDeviceIDsFromDX9MediaAdapterKHR); CL_EXTENSION_ENTRYPOINT_CHECK(clGetPlaneFromImageAMD); #endif //_WIN32 #if defined(cl_khr_sub_groups) || defined(CL_VERSION_2_1) CL_EXTENSION_ENTRYPOINT_CHECK2(clGetKernelSubGroupInfoKHR,clGetKernelSubGroupInfo); #endif // defined(cl_khr_sub_groups) || defined(CL_VERSION_2_1) break; case 'I': CL_EXTENSION_ENTRYPOINT_CHECK(clIcdGetPlatformIDsKHR); break; case 'R': CL_EXTENSION_ENTRYPOINT_CHECK(clReleasePerfCounterAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clRetainPerfCounterAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clReleaseThreadTraceAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clRetainThreadTraceAMD); break; case 'S': CL_EXTENSION_ENTRYPOINT_CHECK(clSetThreadTraceParamAMD); CL_EXTENSION_ENTRYPOINT_CHECK(clSetDeviceClockModeAMD); break; case 'U': CL_EXTENSION_ENTRYPOINT_CHECK(clUnloadPlatformAMD); default: break; } return NULL; } RUNTIME_ENTRY(cl_int, clTerminateContextKHR, (cl_context context)) { return CL_INVALID_CONTEXT; } RUNTIME_EXIT /*! @} * @} */ clr-rocm-5.7.1/opencl/amdocl/cl_counter.cpp000066400000000000000000000073131450307266000206310ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "cl_common.hpp" #include #include "platform/object.hpp" #include "platform/context.hpp" #include "platform/command.hpp" #include "platform/counter.hpp" #ifdef cl_amd_atomic_counters /*! \addtogroup API * @{ * \addtogroup CL_Counters * * Counter objects ... * * @{ */ /*! \brief * * \version 1.1r18 */ RUNTIME_ENTRY_RET(cl_counter_amd, clCreateCounterAMD, (cl_context context, cl_counter_flags_amd flags, cl_uint value, cl_int* errcode_ret)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; return (cl_counter_amd)0; } RUNTIME_EXIT /*! \brief * * \version 1.1r18 */ RUNTIME_ENTRY(cl_int, clGetCounterInfoAMD, (cl_counter_amd counter, cl_counter_info_amd param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { return CL_INVALID_COUNTER_AMD; } RUNTIME_EXIT /*! \brief Increment the counter reference count. * * \return CL_SUCCESS if the function is executed successfully. It returns * CL_INVALID_COUNTER if \a counter is not a valid counter object. * * The OpenCL commands that return a counter perform an implicit retain. * * \version 1.1r18 */ RUNTIME_ENTRY(cl_int, clRetainCounterAMD, (cl_counter_amd counter)) { if (!is_valid(counter)) { return CL_INVALID_COUNTER_AMD; } as_amd(counter)->retain(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Decrement the counter reference count. 
* * \return CL_SUCCESS if the function is executed successfully. It returns * CL_INVALID_EVENT if \a counter is not a valid counter object. * * The counter object is deleted once the reference count becomes zero. * * \version 1.1r18 */ RUNTIME_ENTRY(cl_int, clReleaseCounterAMD, (cl_counter_amd counter)) { if (!is_valid(counter)) { return CL_INVALID_COUNTER_AMD; } as_amd(counter)->release(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief * * \version 1.1r18 */ RUNTIME_ENTRY(cl_int, clEnqueueReadCounterAMD, (cl_command_queue command_queue, cl_counter_amd counter, cl_bool blocking_read, cl_uint* value, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { return CL_INVALID_COUNTER_AMD; } RUNTIME_EXIT /*! \brief * * \version 1.1r18 */ RUNTIME_ENTRY(cl_int, clEnqueueWriteCounterAMD, (cl_command_queue command_queue, cl_counter_amd counter, cl_bool blocking_write, cl_uint value, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { return CL_INVALID_COUNTER_AMD; } RUNTIME_EXIT /*! @} * @} */ #endif // cl_amd_atomic_counters clr-rocm-5.7.1/opencl/amdocl/cl_d3d10.cpp000066400000000000000000000477721450307266000200020ustar00rootroot00000000000000/* Copyright (c) 2009 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifdef _WIN32 #include "top.hpp" #include "cl_common.hpp" #include "cl_d3d10_amd.hpp" #include "platform/command.hpp" #include #include /*! \addtogroup API * @{ * * \addtogroup CL_D3D10_Interops * * This section discusses OpenCL functions that allow applications to use Direct3D 10 * resources (buffers/textures) as OpenCL memory objects. This allows efficient sharing of * data between OpenCL and Direct3D 10. The OpenCL API can be used to execute kernels that * read and/or write memory objects that are also the Direct3D resources. * An OpenCL image object can be created from a D3D10 texture object. An * OpenCL buffer object can be created from a D3D10 buffer object (index/vertex). 
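 *
 * The usual interop flow, sketched with hypothetical handles (\a ctx, \a clq
 * and \a d3dBuf are created by the application beforehand):
 * \code
 * cl_int err;
 * cl_mem mem = clCreateFromD3D10BufferKHR(ctx, CL_MEM_READ_WRITE, d3dBuf, &err);
 * clEnqueueAcquireD3D10ObjectsKHR(clq, 1, &mem, 0, NULL, NULL);
 * // ... enqueue kernels that read or write mem ...
 * clEnqueueReleaseD3D10ObjectsKHR(clq, 1, &mem, 0, NULL, NULL);
 * \endcode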
 *
 * @}
 * \addtogroup clGetDeviceIDsFromD3D10KHR
 * @{
 */
RUNTIME_ENTRY(cl_int, clGetDeviceIDsFromD3D10KHR,
              (cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source,
               void* d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries,
               cl_device_id* devices, cl_uint* num_devices)) {
  cl_int errcode;
  ID3D10Device* d3d10_device = NULL;
  cl_device_id* gpu_devices;
  cl_uint num_gpu_devices = 0;
  bool create_d3d10Device = false;
  static const bool VALIDATE_ONLY = true;
  HMODULE d3d10Module = NULL;

  if (platform != NULL && platform != AMD_PLATFORM) {
    LogWarning("\"platform\" is not a valid AMD platform");
    return CL_INVALID_PLATFORM;
  }

  if (((num_entries > 0 || num_devices == NULL) && devices == NULL) ||
      (num_entries == 0 && devices != NULL)) {
    return CL_INVALID_VALUE;
  }

  // Get GPU devices
  errcode = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 0, NULL, &num_gpu_devices);
  if (errcode != CL_SUCCESS && errcode != CL_DEVICE_NOT_FOUND) {
    return CL_INVALID_VALUE;
  }

  if (!num_gpu_devices) {
    *not_null(num_devices) = 0;
    return CL_DEVICE_NOT_FOUND;
  }

  switch (d3d_device_source) {
    case CL_D3D10_DEVICE_KHR:
      d3d10_device = static_cast<ID3D10Device*>(d3d_object);
      break;
    case CL_D3D10_DXGI_ADAPTER_KHR: {
      typedef HRESULT(WINAPI* LPD3D10CREATEDEVICE)(IDXGIAdapter*, D3D10_DRIVER_TYPE, HMODULE,
                                                   UINT, UINT32, ID3D10Device**);
      static LPD3D10CREATEDEVICE dynamicD3D10CreateDevice = NULL;

      d3d10Module = LoadLibrary("D3D10.dll");
      if (d3d10Module == NULL) {
        return CL_INVALID_PLATFORM;
      }

      dynamicD3D10CreateDevice =
          (LPD3D10CREATEDEVICE)GetProcAddress(d3d10Module, "D3D10CreateDevice");

      IDXGIAdapter* dxgi_adapter = static_cast<IDXGIAdapter*>(d3d_object);
      HRESULT hr = dynamicD3D10CreateDevice(dxgi_adapter, D3D10_DRIVER_TYPE_HARDWARE, NULL, 0,
                                            D3D10_SDK_VERSION, &d3d10_device);
      if (SUCCEEDED(hr) && (NULL != d3d10_device)) {
        create_d3d10Device = true;
      } else {
        FreeLibrary(d3d10Module);
        return CL_INVALID_VALUE;
      }
    } break;
    default:
      LogWarning("\"d3d_device_source\" is invalid");
      return CL_INVALID_VALUE;
  }

  switch (d3d_device_set) {
    case CL_PREFERRED_DEVICES_FOR_D3D10_KHR:
    case CL_ALL_DEVICES_FOR_D3D10_KHR: {
      gpu_devices = (cl_device_id*)alloca(num_gpu_devices * sizeof(cl_device_id));

      errcode = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, num_gpu_devices, gpu_devices, NULL);
      if (errcode != CL_SUCCESS) {
        break;
      }

      void* external_device[amd::Context::DeviceFlagIdx::LastDeviceFlagIdx] = {};
      external_device[amd::Context::DeviceFlagIdx::D3D10DeviceKhrIdx] = d3d10_device;

      std::vector<amd::Device*> compatible_devices;
      for (cl_uint i = 0; i < num_gpu_devices; ++i) {
        cl_device_id device = gpu_devices[i];
        if (is_valid(device) &&
            as_amd(device)->bindExternalDevice(amd::Context::Flags::D3D10DeviceKhr,
                                               external_device, NULL, VALIDATE_ONLY)) {
          compatible_devices.push_back(as_amd(device));
        }
      }
      if (compatible_devices.size() == 0) {
        *not_null(num_devices) = 0;
        errcode = CL_DEVICE_NOT_FOUND;
        break;
      }

      auto it = compatible_devices.cbegin();
      cl_uint compatible_count = std::min(num_entries, (cl_uint)compatible_devices.size());

      while (compatible_count--) {
        *devices++ = as_cl(*it++);
        --num_entries;
      }
      while (num_entries--) {
        *devices++ = (cl_device_id)0;
      }

      *not_null(num_devices) = (cl_uint)compatible_devices.size();
    } break;
    default:
      LogWarning("\"d3d_device_set\" is invalid");
      errcode = CL_INVALID_VALUE;
  }

  if (create_d3d10Device) {
    d3d10_device->Release();
    FreeLibrary(d3d10Module);
  }
  return errcode;
}
RUNTIME_EXIT

/*! @}
 * \addtogroup clCreateFromD3D10BufferKHR
 * @{
 */

/*! \brief Creates an OpenCL buffer object from a Direct3D 10 resource.
 *
 * \param context is a valid OpenCL context.
 *
 * \param flags is a bit-field that is used to specify usage information.
 * Only CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY and CL_MEM_READ_WRITE values
 * can be used.
 *
 * \param pD3DResource is a valid pointer to a D3D10 resource of type ID3D10Buffer.
 *
 * \return valid non-zero OpenCL buffer object and \a errcode_ret is set
 * to CL_SUCCESS if the buffer object is created successfully. It returns a NULL
 * value with one of the following error values returned in \a errcode_ret:
 * - CL_INVALID_CONTEXT if \a context is not a valid context or if Direct3D 10
 *   interoperability has not been initialized between context and the ID3D10Device
 *   from which pD3DResource was created.
 * - CL_INVALID_VALUE if values specified in \a flags are not valid.
 * - CL_INVALID_D3D_RESOURCE if \a pD3DResource is not of type ID3D10Buffer.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 *   by the runtime.
 *
 * \version 1.0r33?
 */
RUNTIME_ENTRY_RET(cl_mem, clCreateFromD3D10BufferKHR,
                  (cl_context context, cl_mem_flags flags, ID3D10Buffer* pD3DResource,
                   cl_int* errcode_ret)) {
  cl_mem clMemObj = NULL;

  if (!is_valid(context)) {
    *not_null(errcode_ret) = CL_INVALID_CONTEXT;
    LogWarning("invalid parameter \"context\"");
    return clMemObj;
  }

  if (!flags) flags = CL_MEM_READ_WRITE;
  if (!(((flags & CL_MEM_READ_ONLY) == CL_MEM_READ_ONLY) ||
        ((flags & CL_MEM_WRITE_ONLY) == CL_MEM_WRITE_ONLY) ||
        ((flags & CL_MEM_READ_WRITE) == CL_MEM_READ_WRITE))) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    LogWarning("invalid parameter \"flags\"");
    return clMemObj;
  }

  if (!pD3DResource) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    LogWarning("parameter \"pD3DResource\" is a NULL pointer");
    return clMemObj;
  }

  return (
      amd::clCreateBufferFromD3D10ResourceAMD(*as_amd(context), flags, pD3DResource, errcode_ret));
}
RUNTIME_EXIT

/*! @}
 * \addtogroup clCreateImageFromD3D10Resource
 * @{
 */

/*! \brief Create an OpenCL 2D or 3D image object from a D3D10 resource.
 *
 * \param context is a valid OpenCL context.
 *
 * \param flags is a bit-field that is used to specify usage information.
 * Only CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY and CL_MEM_READ_WRITE values
 * can be used.
 *
 * \param pD3DResource is a valid pointer to a D3D10 resource of type
 * ID3D10Texture1D, ID3D10Texture2D, or ID3D10Texture3D.
 * If pD3DResource is of type ID3D10Texture1D then the created image object
 * will be a 1D mipmapped image object.
 * If pD3DResource is of type ID3D10Texture2D and was not created with flag
 * D3D10_RESOURCE_MISC_TEXTURECUBE then the created image object will be a
 * 2D mipmapped image object.
 * If pD3DResource is of type ID3D10Texture2D and was created with flag
 * D3D10_RESOURCE_MISC_TEXTURECUBE then the created image object will be
 * a cubemap mipmapped image object.
 * errcode_ret returns CL_INVALID_D3D_RESOURCE if an OpenCL memory object has
 * already been created from pD3DResource in context.
 * If pD3DResource is of type ID3D10Texture3D then the created image object will
 * be a 3D mipmapped image object.
 *
 * \return valid non-zero OpenCL image object and \a errcode_ret is set
 * to CL_SUCCESS if the image object is created successfully. It returns a NULL
 * value with one of the following error values returned in \a errcode_ret:
 * - CL_INVALID_CONTEXT if \a context is not a valid context or if Direct3D 10
 *   interoperability has not been initialized between context and the ID3D10Device
 *   from which pD3DResource was created.
 * - CL_INVALID_VALUE if values specified in \a flags are not valid.
* - CL_INVALID_D3D_RESOURCE if \a pD3DResource is not of type ID3D10Texture1D, * ID3D10Texture2D, or ID3D10Texture3D. * - CL_INVALID_D3D_RESOURCE if an OpenCL memory object has already been created * from \a pD3DResource in context. * - CL_INVALID_IMAGE_FORMAT if the Direct3D 10 texture format does not map * to an appropriate OpenCL image format. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r48? */ RUNTIME_ENTRY_RET(cl_mem, clCreateImageFromD3D10Resource, (cl_context context, cl_mem_flags flags, ID3D10Resource* pD3DResource, UINT subresource, int* errcode_ret, UINT dimension)) { cl_mem clMemObj = NULL; if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter \"context\""); return clMemObj; } if (!flags) flags = CL_MEM_READ_WRITE; if (!(((flags & CL_MEM_READ_ONLY) == CL_MEM_READ_ONLY) || ((flags & CL_MEM_WRITE_ONLY) == CL_MEM_WRITE_ONLY) || ((flags & CL_MEM_READ_WRITE) == CL_MEM_READ_WRITE))) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter \"flags\""); return clMemObj; } if (!pD3DResource) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("parameter \"pD3DResource\" is a NULL pointer"); return clMemObj; } // Verify context init'ed for interop ID3D10Device* pDev; pD3DResource->GetDevice(&pDev); if (pDev == NULL) { *not_null(errcode_ret) = CL_INVALID_D3D10_DEVICE_KHR; LogWarning("Cannot retrieve D3D10 device from D3D10 resource"); return (cl_mem)0; } pDev->Release(); if (!((*as_amd(context)).info().flags_ & amd::Context::D3D10DeviceKhr)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("\"amdContext\" is not created from D3D10 device"); return (cl_mem)0; } // Check for image support const std::vector& devices = as_amd(context)->devices(); bool supportPass = false; bool sizePass = false; for (const auto& it : devices) { if (it->info().imageSupport_) { supportPass = true; } } if (!supportPass) { *not_null(errcode_ret) = CL_INVALID_OPERATION; LogWarning("there are no devices in context to support images"); return (cl_mem)0; } switch (dimension) { #if 0 case 1: return(amd::clCreateImage1DFromD3D10ResourceAMD( *as_amd(context), flags, pD3DResource, subresource, errcode_ret)); #endif // 0 case 2: return (amd::clCreateImage2DFromD3D10ResourceAMD(*as_amd(context), flags, pD3DResource, subresource, errcode_ret)); case 3: return (amd::clCreateImage3DFromD3D10ResourceAMD(*as_amd(context), flags, pD3DResource, subresource, errcode_ret)); default: break; } *not_null(errcode_ret) = CL_INVALID_D3D10_RESOURCE_KHR; return (cl_mem)0; } RUNTIME_EXIT /*! @} * \addtogroup clCreateFromD3D10Texture2DKHR * @{ */ RUNTIME_ENTRY_RET(cl_mem, clCreateFromD3D10Texture2DKHR, (cl_context context, cl_mem_flags flags, ID3D10Texture2D* resource, UINT subresource, cl_int* errcode_ret)) { return clCreateImageFromD3D10Resource(context, flags, resource, subresource, errcode_ret, 2); } RUNTIME_EXIT /*! @} * \addtogroup clCreateFromD3D10Texture3DKHR * @{ */ RUNTIME_ENTRY_RET(cl_mem, clCreateFromD3D10Texture3DKHR, (cl_context context, cl_mem_flags flags, ID3D10Texture3D* resource, UINT subresource, cl_int* errcode_ret)) { return clCreateImageFromD3D10Resource(context, flags, resource, subresource, errcode_ret, 3); } RUNTIME_EXIT /*! 
@} * \addtogroup clEnqueueAcquireD3D10ObjectsKHR * @{ */ RUNTIME_ENTRY(cl_int, clEnqueueAcquireD3D10ObjectsKHR, (cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { return amd::clEnqueueAcquireExtObjectsAMD(command_queue, num_objects, mem_objects, num_events_in_wait_list, event_wait_list, event, CL_COMMAND_ACQUIRE_D3D10_OBJECTS_KHR); } RUNTIME_EXIT /*! @} * \addtogroup clEnqueueReleaseD3D10ObjectsKHR * @{ */ RUNTIME_ENTRY(cl_int, clEnqueueReleaseD3D10ObjectsKHR, (cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { return amd::clEnqueueReleaseExtObjectsAMD(command_queue, num_objects, mem_objects, num_events_in_wait_list, event_wait_list, event, CL_COMMAND_RELEASE_D3D10_OBJECTS_KHR); } RUNTIME_EXIT /*! @} * \addtogroup CL-D3D10 interop helper functions * @{ */ //******************************************************************* // // Internal implementation of CL API functions // //******************************************************************* // // clCreateBufferFromD3D10ResourceAMD // cl_mem amd::clCreateBufferFromD3D10ResourceAMD(Context& amdContext, cl_mem_flags flags, ID3D10Resource* pD3DResource, int* errcode_ret) { // Verify pD3DResource is a buffer D3D10_RESOURCE_DIMENSION rType; pD3DResource->GetType(&rType); if (rType != D3D10_RESOURCE_DIMENSION_BUFFER) { *not_null(errcode_ret) = CL_INVALID_D3D10_RESOURCE_KHR; return (cl_mem)0; } D3D10Object obj; int errcode = D3D10Object::initD3D10Object(amdContext, pD3DResource, 0, obj); if (CL_SUCCESS != errcode) { *not_null(errcode_ret) = errcode; return (cl_mem)0; } BufferD3D10* pBufferD3D10 = new (amdContext) BufferD3D10(amdContext, flags, obj); if (!pBufferD3D10) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_mem)0; } if (!pBufferD3D10->create()) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; pBufferD3D10->release(); return (cl_mem)0; } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(pBufferD3D10); } #if 0 // There is no support for 1D images in the base imagee code // // clCreateImage1DFromD3D10ResourceAMD // cl_mem amd::clCreateImage1DFromD3D10ResourceAMD( Context& amdContext, cl_mem_flags flags, ID3D10Resource* pD3DResource, UINT subresource, int* errcode_ret) { // Verify the resource is a 1D texture D3D10_RESOURCE_DIMENSION rType; pD3DResource->GetType(&rType); if(rType != D3D10_RESOURCE_DIMENSION_TEXTURE1D) { *not_null(errcode_ret) = CL_INVALID_D3D10_RESOURCE_KHR; return (cl_mem) 0; } D3D10Object obj; int errcode = D3D10Object::initD3D10Object(pD3DResource, subresource, obj); if(CL_SUCCESS != errcode) { *not_null(errcode_ret) = errcode; return (cl_mem) 0; } Image1DD3D10 *pImage1DD3D10 = new Image1DD3D10(amdContext, flags, obj); if(!pImage1DD3D10) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_mem) 0; } if (!pImage1DD3D10->create()) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; pImage1DD3D10->release(); return (cl_mem) 0; } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(pImage1DD3D10); } #endif // // clCreateImage2DFromD3D10ResourceAMD // cl_mem amd::clCreateImage2DFromD3D10ResourceAMD(Context& amdContext, cl_mem_flags flags, ID3D10Resource* pD3DResource, UINT subresource, int* errcode_ret) { // Verify the resource is a 2D texture D3D10_RESOURCE_DIMENSION rType; pD3DResource->GetType(&rType); if (rType != D3D10_RESOURCE_DIMENSION_TEXTURE2D) { 
*not_null(errcode_ret) = CL_INVALID_D3D10_RESOURCE_KHR; return (cl_mem)0; } D3D10Object obj; int errcode = D3D10Object::initD3D10Object(amdContext, pD3DResource, subresource, obj); if (CL_SUCCESS != errcode) { *not_null(errcode_ret) = errcode; return (cl_mem)0; } Image2DD3D10* pImage2DD3D10 = new (amdContext) Image2DD3D10(amdContext, flags, obj); if (!pImage2DD3D10) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_mem)0; } if (!pImage2DD3D10->create()) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; pImage2DD3D10->release(); return (cl_mem)0; } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(pImage2DD3D10); } // // clCreateImage2DFromD3D10ResourceAMD // cl_mem amd::clCreateImage3DFromD3D10ResourceAMD(Context& amdContext, cl_mem_flags flags, ID3D10Resource* pD3DResource, UINT subresource, int* errcode_ret) { // Verify the resource is a 2D texture D3D10_RESOURCE_DIMENSION rType; pD3DResource->GetType(&rType); if (rType != D3D10_RESOURCE_DIMENSION_TEXTURE3D) { *not_null(errcode_ret) = CL_INVALID_D3D10_RESOURCE_KHR; return (cl_mem)0; } D3D10Object obj; int errcode = D3D10Object::initD3D10Object(amdContext, pD3DResource, subresource, obj); if (CL_SUCCESS != errcode) { *not_null(errcode_ret) = errcode; return (cl_mem)0; } Image3DD3D10* pImage3DD3D10 = new (amdContext) Image3DD3D10(amdContext, flags, obj); if (!pImage3DD3D10) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_mem)0; } if (!pImage3DD3D10->create()) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; pImage3DD3D10->release(); return (cl_mem)0; } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(pImage3DD3D10); } // // Helper function SyncD3D10Objects // void amd::SyncD3D10Objects(std::vector& memObjects) { Memory*& mem = memObjects.front(); if (!mem) { LogWarning("\nNULL memory object\n"); return; } InteropObject* interop = mem->getInteropObj(); if (!interop) { LogWarning("\nNULL interop object\n"); return; } D3D10Object* d3d10Obj = interop->asD3D10Object(); if (!d3d10Obj) { LogWarning("\nNULL D3D10 object\n"); return; } ID3D10Query* query = d3d10Obj->getQuery(); if (!query) { LogWarning("\nNULL ID3D10Query\n"); return; } query->End(); BOOL data = FALSE; while (S_OK != query->GetData(&data, sizeof(BOOL), 0)) { } } #endif //_WIN32 clr-rocm-5.7.1/opencl/amdocl/cl_d3d10_amd.hpp000066400000000000000000000042241450307266000206110ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #pragma once #include "cl_common.hpp" #include "platform/context.hpp" #include "platform/memory.hpp" #include "platform/interop_d3d10.hpp" #include namespace amd { //! Functions for executing the D3D10 related stuff cl_mem clCreateBufferFromD3D10ResourceAMD( Context& amdContext, cl_mem_flags flags, ID3D10Resource* pD3DResource, int* errcode_ret); cl_mem clCreateImage1DFromD3D10ResourceAMD( Context& amdContext, cl_mem_flags flags, ID3D10Resource* pD3DResource, UINT subresource, int* errcode_ret); cl_mem clCreateImage2DFromD3D10ResourceAMD( Context& amdContext, cl_mem_flags flags, ID3D10Resource* pD3DResource, UINT subresource, int* errcode_ret); cl_mem clCreateImage3DFromD3D10ResourceAMD( Context& amdContext, cl_mem_flags flags, ID3D10Resource* pD3DResource, UINT subresource, int* errcode_ret); void SyncD3D10Objects(std::vector& memObjects); } //namespace amd clr-rocm-5.7.1/opencl/amdocl/cl_d3d11.cpp000066400000000000000000000521741450307266000177730ustar00rootroot00000000000000/* Copyright (c) 2009 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifdef _WIN32 #include "top.hpp" #include "cl_d3d11_amd.hpp" #include "platform/command.hpp" #include #include /*! \addtogroup API * @{ * * \addtogroup CL_D3D11_Interops * * This section discusses OpenCL functions that allow applications to use Direct3D 11 * resources (buffers/textures) as OpenCL memory objects. This allows efficient sharing of * data between OpenCL and Direct3D 11. The OpenCL API can be used to execute kernels that * read and/or write memory objects that are also the Direct3D resources. * An OpenCL image object can be created from a D3D11 texture object. An * OpenCL buffer object can be created from a D3D11 buffer object (index/vertex). 
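 *
 * The flow mirrors the D3D10 path above, sketched with hypothetical handles
 * (\a ctx, \a clq and \a d3dBuf are created by the application beforehand):
 * \code
 * cl_int err;
 * cl_mem mem = clCreateFromD3D11BufferKHR(ctx, CL_MEM_READ_WRITE, d3dBuf, &err);
 * clEnqueueAcquireD3D11ObjectsKHR(clq, 1, &mem, 0, NULL, NULL);
 * // ... enqueue kernels that read or write mem ...
 * clEnqueueReleaseD3D11ObjectsKHR(clq, 1, &mem, 0, NULL, NULL);
 * \endcode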
 *
 * @}
 * \addtogroup clGetDeviceIDsFromD3D11KHR
 * @{
 */
RUNTIME_ENTRY(cl_int, clGetDeviceIDsFromD3D11KHR,
              (cl_platform_id platform, cl_d3d11_device_source_khr d3d_device_source,
               void* d3d_object, cl_d3d11_device_set_khr d3d_device_set, cl_uint num_entries,
               cl_device_id* devices, cl_uint* num_devices)) {
  cl_int errcode;
  ID3D11Device* d3d11_device = NULL;
  cl_device_id* gpu_devices;
  cl_uint num_gpu_devices = 0;
  bool create_d3d11Device = false;
  static const bool VALIDATE_ONLY = true;
  HMODULE d3d11Module = NULL;

  if (platform != NULL && platform != AMD_PLATFORM) {
    LogWarning("\"platform\" is not a valid AMD platform");
    return CL_INVALID_PLATFORM;
  }

  if (((num_entries > 0 || num_devices == NULL) && devices == NULL) ||
      (num_entries == 0 && devices != NULL)) {
    return CL_INVALID_VALUE;
  }

  // Get GPU devices
  errcode = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 0, NULL, &num_gpu_devices);
  if (errcode != CL_SUCCESS && errcode != CL_DEVICE_NOT_FOUND) {
    return CL_INVALID_VALUE;
  }

  if (!num_gpu_devices) {
    *not_null(num_devices) = 0;
    return CL_DEVICE_NOT_FOUND;
  }

  switch (d3d_device_source) {
    case CL_D3D11_DEVICE_KHR:
      d3d11_device = static_cast<ID3D11Device*>(d3d_object);
      break;
    case CL_D3D11_DXGI_ADAPTER_KHR: {
      static PFN_D3D11_CREATE_DEVICE dynamicD3D11CreateDevice = NULL;

      d3d11Module = LoadLibrary("D3D11.dll");
      if (d3d11Module == NULL) {
        return CL_INVALID_PLATFORM;
      }

      dynamicD3D11CreateDevice =
          (PFN_D3D11_CREATE_DEVICE)GetProcAddress(d3d11Module, "D3D11CreateDevice");

      IDXGIAdapter* dxgi_adapter = static_cast<IDXGIAdapter*>(d3d_object);
      D3D_FEATURE_LEVEL requestedFeatureLevels[] = {D3D_FEATURE_LEVEL_10_0};
      D3D_FEATURE_LEVEL featureLevel = D3D_FEATURE_LEVEL_11_0;
      HRESULT hr = dynamicD3D11CreateDevice(dxgi_adapter, D3D_DRIVER_TYPE_UNKNOWN, NULL, 0,
                                            requestedFeatureLevels, 1, D3D11_SDK_VERSION,
                                            &d3d11_device, &featureLevel, NULL);
      if (SUCCEEDED(hr) && (NULL != d3d11_device)) {
        create_d3d11Device = true;
      } else {
        FreeLibrary(d3d11Module);
        return CL_INVALID_VALUE;
      }
    } break;
    default:
      LogWarning("\"d3d_device_source\" is invalid");
      return CL_INVALID_VALUE;
  }

  switch (d3d_device_set) {
    case CL_PREFERRED_DEVICES_FOR_D3D11_KHR:
    case CL_ALL_DEVICES_FOR_D3D11_KHR: {
      gpu_devices = (cl_device_id*)alloca(num_gpu_devices * sizeof(cl_device_id));

      errcode = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, num_gpu_devices, gpu_devices, NULL);
      if (errcode != CL_SUCCESS) {
        break;
      }

      std::vector<amd::Device*> compatible_devices;
      for (cl_uint i = 0; i < num_gpu_devices; ++i) {
        void* external_device[amd::Context::DeviceFlagIdx::LastDeviceFlagIdx] = {};
        external_device[amd::Context::DeviceFlagIdx::D3D11DeviceKhrIdx] = d3d11_device;

        cl_device_id device = gpu_devices[i];
        if (is_valid(device) &&
            as_amd(device)->bindExternalDevice(amd::Context::Flags::D3D11DeviceKhr,
                                               external_device, NULL, VALIDATE_ONLY)) {
          compatible_devices.push_back(as_amd(device));
        }
      }
      if (compatible_devices.size() == 0) {
        *not_null(num_devices) = 0;
        errcode = CL_DEVICE_NOT_FOUND;
        break;
      }

      auto it = compatible_devices.cbegin();
      cl_uint compatible_count = std::min(num_entries, (cl_uint)compatible_devices.size());

      while (compatible_count--) {
        *devices++ = as_cl(*it++);
        --num_entries;
      }
      while (num_entries--) {
        *devices++ = (cl_device_id)0;
      }

      *not_null(num_devices) = (cl_uint)compatible_devices.size();
    } break;
    default:
      LogWarning("\"d3d_device_set\" is invalid");
      errcode = CL_INVALID_VALUE;
  }

  if (create_d3d11Device) {
    d3d11_device->Release();
    FreeLibrary(d3d11Module);
  }
  return errcode;
}
RUNTIME_EXIT

/*! @}
 * \addtogroup clCreateFromD3D11BufferKHR
 * @{
 */
/*!
/*! @}
 * \addtogroup clCreateFromD3D11BufferKHR
 * @{
 */

/*! \brief Creates an OpenCL buffer object from a Direct3D 11 resource.
 *
 * \param context is a valid OpenCL context.
 *
 * \param flags is a bit-field that is used to specify usage information.
 * Only CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY and CL_MEM_READ_WRITE values
 * can be used.
 *
 * \param pD3DResource is a valid pointer to a D3D11 resource of type ID3D11Buffer.
 *
 * \return valid non-zero OpenCL buffer object and \a errcode_ret is set
 * to CL_SUCCESS if the buffer object is created successfully. It returns a NULL
 * value with one of the following error values returned in \a errcode_ret:
 * - CL_INVALID_CONTEXT if \a context is not a valid context or if Direct3D 11
 * interoperability has not been initialized between context and the ID3D11Device
 * from which pD3DResource was created.
 * - CL_INVALID_VALUE if values specified in \a clFlags are not valid.
 * - CL_INVALID_D3D_RESOURCE if \a pD3DResource is not of type ID3D11Buffer.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 * by the runtime.
 *
 * \version 1.0r33?
 */
RUNTIME_ENTRY_RET(cl_mem, clCreateFromD3D11BufferKHR,
                  (cl_context context, cl_mem_flags flags, ID3D11Buffer* pD3DResource,
                   cl_int* errcode_ret)) {
  cl_mem clMemObj = NULL;

  if (!is_valid(context)) {
    *not_null(errcode_ret) = CL_INVALID_CONTEXT;
    LogWarning("invalid parameter \"context\"");
    return clMemObj;
  }
  if (!flags) flags = CL_MEM_READ_WRITE;
  if (!(((flags & CL_MEM_READ_ONLY) == CL_MEM_READ_ONLY) ||
        ((flags & CL_MEM_WRITE_ONLY) == CL_MEM_WRITE_ONLY) ||
        ((flags & CL_MEM_READ_WRITE) == CL_MEM_READ_WRITE))) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    LogWarning("invalid parameter \"flags\"");
    return clMemObj;
  }
  if (!pD3DResource) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    LogWarning("parameter \"pD3DResource\" is a NULL pointer");
    return clMemObj;
  }

  return (
      amd::clCreateBufferFromD3D11ResourceAMD(*as_amd(context), flags, pD3DResource, errcode_ret));
}
RUNTIME_EXIT
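/*! Illustrative usage sketch (editor's addition, not part of the original source
 *  or the build): wrapping an existing ID3D11Buffer in an OpenCL buffer object.
 *  'context' is assumed to be a context created against the same ID3D11Device
 *  that created 'pD3DBuffer'.
 * \code
 * cl_int err = CL_SUCCESS;
 * cl_mem clBuf = clCreateFromD3D11BufferKHR(context, CL_MEM_READ_WRITE,
 *                                           pD3DBuffer, &err);
 * if (err == CL_SUCCESS) {
 *   // The buffer must still be acquired with clEnqueueAcquireD3D11ObjectsKHR
 *   // before OpenCL kernels may access it; see the acquire/release example below.
 * }
 * \endcode
 */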
/*! @}
 * \addtogroup clCreateImageFromD3D11Resource
 * @{
 */

/*! \brief Create an OpenCL 2D or 3D image object from a D3D11 resource.
 *
 * \param context is a valid OpenCL context.
 *
 * \param flags is a bit-field that is used to specify usage information.
 * Only CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY and CL_MEM_READ_WRITE values
 * can be used.
 *
 * \param pD3DResource is a valid pointer to a D3D11 resource of type
 * ID3D11Texture1D, ID3D11Texture2D, or ID3D11Texture3D.
 * If pD3DResource is of type ID3D11Texture1D then the created image object
 * will be a 1D mipmapped image object.
 * If pD3DResource is of type ID3D11Texture2D and was not created with flag
 * D3D11_RESOURCE_MISC_TEXTURECUBE then the created image object will be a
 * 2D mipmapped image object.
 * If pD3DResource is of type ID3D11Texture2D and was created with flag
 * D3D11_RESOURCE_MISC_TEXTURECUBE then the created image object will be
 * a cubemap mipmapped image object.
 * errcode_ret returns CL_INVALID_D3D_RESOURCE if an OpenCL memory object has
 * already been created from pD3DResource in context.
 * If pD3DResource is of type ID3D11Texture3D then the created image object will
 * be a 3D mipmapped image object.
 *
 * \return valid non-zero OpenCL image object and \a errcode_ret is set
 * to CL_SUCCESS if the image object is created successfully. It returns a NULL
 * value with one of the following error values returned in \a errcode_ret:
 * - CL_INVALID_CONTEXT if \a context is not a valid context or if Direct3D 11
 * interoperability has not been initialized between context and the ID3D11Device
 * from which pD3DResource was created.
 * - CL_INVALID_VALUE if values specified in \a flags are not valid.
 * - CL_INVALID_D3D_RESOURCE if \a pD3DResource is not of type ID3D11Texture1D,
 * ID3D11Texture2D, or ID3D11Texture3D.
 * - CL_INVALID_D3D_RESOURCE if an OpenCL memory object has already been created
 * from \a pD3DResource in context.
 * - CL_INVALID_IMAGE_FORMAT if the Direct3D 11 texture format does not map
 * to an appropriate OpenCL image format.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 * by the runtime.
 *
 * \version 1.0r48?
 */
RUNTIME_ENTRY_RET(cl_mem, clCreateImageFromD3D11Resource,
                  (cl_context context, cl_mem_flags flags, ID3D11Resource* pD3DResource,
                   UINT subresource, int* errcode_ret, UINT dimension)) {
  cl_mem clMemObj = NULL;

  if (!is_valid(context)) {
    *not_null(errcode_ret) = CL_INVALID_CONTEXT;
    LogWarning("invalid parameter \"context\"");
    return clMemObj;
  }
  if (!flags) flags = CL_MEM_READ_WRITE;
  if (!(((flags & CL_MEM_READ_ONLY) == CL_MEM_READ_ONLY) ||
        ((flags & CL_MEM_WRITE_ONLY) == CL_MEM_WRITE_ONLY) ||
        ((flags & CL_MEM_READ_WRITE) == CL_MEM_READ_WRITE))) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    LogWarning("invalid parameter \"flags\"");
    return clMemObj;
  }
  if (!pD3DResource) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    LogWarning("parameter \"pD3DResource\" is a NULL pointer");
    return clMemObj;
  }

  // Verify context init'ed for interop
  ID3D11Device* pDev;
  pD3DResource->GetDevice(&pDev);
  if (pDev == NULL) {
    *not_null(errcode_ret) = CL_INVALID_D3D11_DEVICE_KHR;
    LogWarning("Cannot retrieve D3D11 device from D3D11 resource");
    return (cl_mem)0;
  }
  pDev->Release();
  if (!((*as_amd(context)).info().flags_ & amd::Context::D3D11DeviceKhr)) {
    *not_null(errcode_ret) = CL_INVALID_CONTEXT;
    LogWarning("\"amdContext\" is not created from D3D11 device");
    return (cl_mem)0;
  }

  // Check for image support
  const std::vector<amd::Device*>& devices = as_amd(context)->devices();
  bool supportPass = false;
  bool sizePass = false;
  for (const auto& it : devices) {
    if (it->info().imageSupport_) {
      supportPass = true;
    }
  }
  if (!supportPass) {
    *not_null(errcode_ret) = CL_INVALID_OPERATION;
    LogWarning("there are no devices in context to support images");
    return (cl_mem)0;
  }

  switch (dimension) {
#if 0
    case 1:
        return(amd::clCreateImage1DFromD3D11ResourceAMD(
            *as_amd(context), flags, pD3DResource, subresource, errcode_ret));
#endif  // 0
    case 2:
      return (amd::clCreateImage2DFromD3D11ResourceAMD(*as_amd(context), flags, pD3DResource,
                                                       subresource, errcode_ret));
    case 3:
      return (amd::clCreateImage3DFromD3D11ResourceAMD(*as_amd(context), flags, pD3DResource,
                                                       subresource, errcode_ret));
    default:
      break;
  }
  *not_null(errcode_ret) = CL_INVALID_D3D11_RESOURCE_KHR;
  return (cl_mem)0;
}
RUNTIME_EXIT

/*! @}
 * \addtogroup clCreateFromD3D11Texture2DKHR
 * @{
 */
RUNTIME_ENTRY_RET(cl_mem, clCreateFromD3D11Texture2DKHR,
                  (cl_context context, cl_mem_flags flags, ID3D11Texture2D* resource,
                   UINT subresource, cl_int* errcode_ret)) {
  return clCreateImageFromD3D11Resource(context, flags, resource, subresource, errcode_ret, 2);
}
RUNTIME_EXIT

/*! @}
 * \addtogroup clCreateFromD3D11Texture3DKHR
 * @{
 */
RUNTIME_ENTRY_RET(cl_mem, clCreateFromD3D11Texture3DKHR,
                  (cl_context context, cl_mem_flags flags, ID3D11Texture3D* resource,
                   UINT subresource, cl_int* errcode_ret)) {
  return clCreateImageFromD3D11Resource(context, flags, resource, subresource, errcode_ret, 3);
}
RUNTIME_EXIT
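/*! Illustrative usage sketch (editor's addition, not part of the original source
 *  or the build): creating an OpenCL image from subresource 0 (the top mip level)
 *  of an existing ID3D11Texture2D. 'context' and 'pTex2D' are assumed to come
 *  from the application.
 * \code
 * cl_int err = CL_SUCCESS;
 * UINT subresource = 0;  // mip level 0, array slice 0
 * cl_mem clImage = clCreateFromD3D11Texture2DKHR(context, CL_MEM_READ_ONLY,
 *                                                pTex2D, subresource, &err);
 * \endcode
 */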
/*! @}
 * \addtogroup clEnqueueAcquireD3D11ObjectsKHR
 * @{
 */
RUNTIME_ENTRY(cl_int, clEnqueueAcquireD3D11ObjectsKHR,
              (cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects,
               cl_uint num_events_in_wait_list, const cl_event* event_wait_list,
               cl_event* event)) {
  return amd::clEnqueueAcquireExtObjectsAMD(command_queue, num_objects, mem_objects,
                                            num_events_in_wait_list, event_wait_list, event,
                                            CL_COMMAND_ACQUIRE_D3D11_OBJECTS_KHR);
}
RUNTIME_EXIT

/*! @}
 * \addtogroup clEnqueueReleaseD3D11ObjectsKHR
 * @{
 */
RUNTIME_ENTRY(cl_int, clEnqueueReleaseD3D11ObjectsKHR,
              (cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects,
               cl_uint num_events_in_wait_list, const cl_event* event_wait_list,
               cl_event* event)) {
  return amd::clEnqueueReleaseExtObjectsAMD(command_queue, num_objects, mem_objects,
                                            num_events_in_wait_list, event_wait_list, event,
                                            CL_COMMAND_RELEASE_D3D11_OBJECTS_KHR);
}
RUNTIME_EXIT

/*! @}
 * \addtogroup clGetPlaneFromImageAMD
 * @{
 */
RUNTIME_ENTRY_RET(cl_mem, clGetPlaneFromImageAMD,
                  (cl_context context, cl_mem mem, cl_uint plane, cl_int* errcode_ret)) {
  if (!is_valid(context)) {
    *not_null(errcode_ret) = CL_INVALID_CONTEXT;
    LogWarning("invalid parameter \"context\"");
    return 0;
  }
  if (mem == 0) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    return 0;
  }
  if (!is_valid(mem)) {
    *not_null(errcode_ret) = CL_INVALID_MEM_OBJECT;
    return 0;
  }
  amd::Memory* amdMem = as_amd(mem);
  amd::Context& amdContext = *as_amd(context);
  if (amdMem->getInteropObj() == NULL) {
    *not_null(errcode_ret) = CL_INVALID_MEM_OBJECT;
    return 0;
  }

  amd::Image2DD3D11* pImage = reinterpret_cast<amd::Image2DD3D11*>(amdMem);
  ID3D11Resource* pD3DResource = pImage->getD3D11Resource();

  // Verify the resource is a 2D texture
  D3D11_RESOURCE_DIMENSION rType;
  pD3DResource->GetType(&rType);
  if (rType != D3D11_RESOURCE_DIMENSION_TEXTURE2D) {
    *not_null(errcode_ret) = CL_INVALID_D3D11_RESOURCE_KHR;
    return (cl_mem)0;
  }

  amd::D3D11Object obj;
  int errcode = amd::D3D11Object::initD3D11Object(amdContext, pD3DResource, 0, obj, plane);
  if (CL_SUCCESS != errcode) {
    *not_null(errcode_ret) = errcode;
    return (cl_mem)0;
  }

  amd::Image2DD3D11* pImage2DD3D11 =
      new (amdContext) amd::Image2DD3D11(amdContext, pImage->getMemFlags(), obj);
  if (!pImage2DD3D11) {
    *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY;
    return (cl_mem)0;
  }
  if (!pImage2DD3D11->create()) {
    *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE;
    pImage2DD3D11->release();
    return (cl_mem)0;
  }
  *not_null(errcode_ret) = CL_SUCCESS;
  return as_cl(pImage2DD3D11);
}
RUNTIME_EXIT
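/*! Illustrative usage sketch (editor's addition, not part of the original source
 *  or the build): the acquire/run/release pattern required before and after
 *  OpenCL accesses a shared D3D11 object. 'queue', 'kernel', 'clImage' and 'gws'
 *  are assumed to exist.
 * \code
 * clEnqueueAcquireD3D11ObjectsKHR(queue, 1, &clImage, 0, NULL, NULL);
 * clEnqueueNDRangeKernel(queue, kernel, 2, NULL, gws, NULL, 0, NULL, NULL);
 * clEnqueueReleaseD3D11ObjectsKHR(queue, 1, &clImage, 0, NULL, NULL);
 * clFinish(queue);  // make the results visible to D3D11 again
 * \endcode
 */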
/*! @}
 * \addtogroup CL-D3D11 interop helper functions
 * @{
 */

//*******************************************************************
//
// Internal implementation of CL API functions
//
//*******************************************************************

//
// clCreateBufferFromD3D11ResourceAMD
//
cl_mem amd::clCreateBufferFromD3D11ResourceAMD(Context& amdContext, cl_mem_flags flags,
                                               ID3D11Resource* pD3DResource, int* errcode_ret) {
  // Verify pD3DResource is a buffer
  D3D11_RESOURCE_DIMENSION rType;
  pD3DResource->GetType(&rType);
  if (rType != D3D11_RESOURCE_DIMENSION_BUFFER) {
    *not_null(errcode_ret) = CL_INVALID_D3D11_RESOURCE_KHR;
    return (cl_mem)0;
  }

  D3D11Object obj;
  int errcode = D3D11Object::initD3D11Object(amdContext, pD3DResource, 0, obj);
  if (CL_SUCCESS != errcode) {
    *not_null(errcode_ret) = errcode;
    return (cl_mem)0;
  }

  BufferD3D11* pBufferD3D11 = new (amdContext) BufferD3D11(amdContext, flags, obj);
  if (!pBufferD3D11) {
    *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY;
    return (cl_mem)0;
  }
  if (!pBufferD3D11->create()) {
    *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE;
    pBufferD3D11->release();
    return (cl_mem)0;
  }
  *not_null(errcode_ret) = CL_SUCCESS;
  return as_cl(pBufferD3D11);
}

//
// clCreateImage2DFromD3D11ResourceAMD
//
cl_mem amd::clCreateImage2DFromD3D11ResourceAMD(Context& amdContext, cl_mem_flags flags,
                                                ID3D11Resource* pD3DResource, UINT subresource,
                                                int* errcode_ret) {
  // Verify the resource is a 2D texture
  D3D11_RESOURCE_DIMENSION rType;
  pD3DResource->GetType(&rType);
  if (rType != D3D11_RESOURCE_DIMENSION_TEXTURE2D) {
    *not_null(errcode_ret) = CL_INVALID_D3D11_RESOURCE_KHR;
    return (cl_mem)0;
  }

  D3D11Object obj;
  int errcode = D3D11Object::initD3D11Object(amdContext, pD3DResource, subresource, obj);
  if (CL_SUCCESS != errcode) {
    *not_null(errcode_ret) = errcode;
    return (cl_mem)0;
  }

  Image2DD3D11* pImage2DD3D11 = new (amdContext) Image2DD3D11(amdContext, flags, obj);
  if (!pImage2DD3D11) {
    *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY;
    return (cl_mem)0;
  }
  if (!pImage2DD3D11->create()) {
    *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE;
    pImage2DD3D11->release();
    return (cl_mem)0;
  }
  *not_null(errcode_ret) = CL_SUCCESS;
  return as_cl(pImage2DD3D11);
}

//
// clCreateImage3DFromD3D11ResourceAMD
//
cl_mem amd::clCreateImage3DFromD3D11ResourceAMD(Context& amdContext, cl_mem_flags flags,
                                                ID3D11Resource* pD3DResource, UINT subresource,
                                                int* errcode_ret) {
  // Verify the resource is a 3D texture
  D3D11_RESOURCE_DIMENSION rType;
  pD3DResource->GetType(&rType);
  if (rType != D3D11_RESOURCE_DIMENSION_TEXTURE3D) {
    *not_null(errcode_ret) = CL_INVALID_D3D11_RESOURCE_KHR;
    return (cl_mem)0;
  }

  D3D11Object obj;
  int errcode = D3D11Object::initD3D11Object(amdContext, pD3DResource, subresource, obj);
  if (CL_SUCCESS != errcode) {
    *not_null(errcode_ret) = errcode;
    return (cl_mem)0;
  }

  Image3DD3D11* pImage3DD3D11 = new (amdContext) Image3DD3D11(amdContext, flags, obj);
  if (!pImage3DD3D11) {
    *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY;
    return (cl_mem)0;
  }
  if (!pImage3DD3D11->create()) {
    *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE;
    pImage3DD3D11->release();
    return (cl_mem)0;
  }
  *not_null(errcode_ret) = CL_SUCCESS;
  return as_cl(pImage3DD3D11);
}

//
// Helper function SyncD3D11Objects
//
void amd::SyncD3D11Objects(std::vector<amd::Memory*>& memObjects) {
  Memory*& mem = memObjects.front();
  if (!mem) {
    LogWarning("\nNULL memory object\n");
    return;
  }
  InteropObject* interop = mem->getInteropObj();
  if (!interop) {
    LogWarning("\nNULL interop object\n");
    return;
  }
  D3D11Object* d3dObj = 
interop->asD3D11Object(); if (!d3dObj) { LogWarning("\nNULL D3D11 object\n"); return; } ID3D11Query* query = d3dObj->getQuery(); if (!query) { LogWarning("\nNULL ID3D11Query\n"); return; } ID3D11Device* d3dDev; query->GetDevice(&d3dDev); if (!d3dDev) { LogError("\nCannot get D3D11 device from D3D11 resource\n"); return; } ID3D11DeviceContext* pImmediateContext = NULL; d3dDev->GetImmediateContext(&pImmediateContext); if (!pImmediateContext) { LogError("\nCannot get D3D11 device context"); return; } pImmediateContext->Release(); // Flush D3D queues and make sure D3D stuff is finished { ScopedLock sl(d3dObj->getResLock()); pImmediateContext->End(query); BOOL data = FALSE; while ((S_OK != pImmediateContext->GetData(query, &data, sizeof(BOOL), 0)) || (data != TRUE)) { } } d3dDev->Release(); } #endif //_WIN32 clr-rocm-5.7.1/opencl/amdocl/cl_d3d11_amd.hpp000066400000000000000000000045161450307266000206160ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #include "cl_d3d10_amd.hpp" #include "platform/context.hpp" #include "platform/memory.hpp" #include "platform/interop_d3d11.hpp" #include extern CL_API_ENTRY cl_mem CL_API_CALL clGetPlaneFromImageAMD( cl_context /* context */, cl_mem /* mem */, cl_uint /* plane */, cl_int* /* errcode_ret */); namespace amd { //! Functions for executing the D3D11 related stuff cl_mem clCreateBufferFromD3D11ResourceAMD( Context& amdContext, cl_mem_flags flags, ID3D11Resource* pD3DResource, int* errcode_ret); cl_mem clCreateImage1DFromD3D11ResourceAMD( Context& amdContext, cl_mem_flags flags, ID3D11Resource* pD3DResource, UINT subresource, int* errcode_ret); cl_mem clCreateImage2DFromD3D11ResourceAMD( Context& amdContext, cl_mem_flags flags, ID3D11Resource* pD3DResource, UINT subresource, int* errcode_ret); cl_mem clCreateImage3DFromD3D11ResourceAMD( Context& amdContext, cl_mem_flags flags, ID3D11Resource* pD3DResource, UINT subresource, int* errcode_ret); void SyncD3D11Objects(std::vector& memObjects); } //namespace amd clr-rocm-5.7.1/opencl/amdocl/cl_d3d9.cpp000066400000000000000000000253301450307266000177140ustar00rootroot00000000000000/* Copyright (c) 2012 - 2021 Advanced Micro Devices, Inc. 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#ifdef _WIN32

#include "top.hpp"
#include "cl_d3d9_amd.hpp"
#include "platform/command.hpp"
#include
#include

RUNTIME_ENTRY(cl_int, clGetDeviceIDsFromDX9MediaAdapterKHR,
              (cl_platform_id platform, cl_uint num_media_adapters,
               cl_dx9_media_adapter_type_khr* media_adapters_type, void* media_adapters,
               cl_dx9_media_adapter_set_khr media_adapter_set, cl_uint num_entries,
               cl_device_id* devices, cl_uint* num_devices)) {
  cl_int errcode;
  // Accept an array of DX9 devices here as the spec mention of array of num_media_adapters size.
  IDirect3DDevice9Ex** d3d9_device = static_cast<IDirect3DDevice9Ex**>(media_adapters);
  cl_device_id* gpu_devices = NULL;
  cl_uint num_gpu_devices = 0;
  static const bool VALIDATE_ONLY = true;

  if (platform != NULL && platform != AMD_PLATFORM) {
    LogWarning("\"platform\" is not a valid AMD platform");
    return CL_INVALID_PLATFORM;
  }

  // check if input parameters are correct
  if ((num_media_adapters == 0) || (media_adapters_type == NULL) || (media_adapters == NULL) ||
      (media_adapter_set != CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR &&
       media_adapter_set != CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR) ||
      (num_entries == 0 && devices != NULL)) {
    return CL_INVALID_VALUE;
  }

  // Get GPU devices
  errcode = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 0, NULL, &num_gpu_devices);
  if (errcode != CL_SUCCESS && errcode != CL_DEVICE_NOT_FOUND) {
    return CL_INVALID_VALUE;
  }
  if (!num_gpu_devices) {
    *not_null(num_devices) = 0;
    return CL_DEVICE_NOT_FOUND;
  }

  switch (media_adapter_set) {
    case CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR:
    case CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR: {
      gpu_devices = new cl_device_id[num_gpu_devices];

      errcode = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, num_gpu_devices, gpu_devices, NULL);
      if (errcode != CL_SUCCESS) {
        break;
      }

      std::vector<amd::Device*> compatible_devices;
      for (cl_uint i = 0; i < num_gpu_devices; ++i) {
        cl_device_id device = gpu_devices[i];
        amd::Context::Flags context_flag;
        amd::Context::DeviceFlagIdx devIdx;
        switch (media_adapters_type[i]) {
          case CL_ADAPTER_D3D9_KHR:
            context_flag = amd::Context::Flags::D3D9DeviceKhr;
            devIdx = amd::Context::DeviceFlagIdx::D3D9DeviceKhrIdx;
            break;
          case CL_ADAPTER_D3D9EX_KHR:
            context_flag = amd::Context::Flags::D3D9DeviceEXKhr;
            devIdx = amd::Context::DeviceFlagIdx::D3D9DeviceEXKhrIdx;
            break;
          case CL_ADAPTER_DXVA_KHR:
            context_flag = amd::Context::Flags::D3D9DeviceVAKhr;
            devIdx = amd::Context::DeviceFlagIdx::D3D9DeviceVAKhrIdx;
            break;
        }

        for (cl_uint j = 0; j < num_media_adapters; ++j) {
          // Since there can be multiple DX9 adapters passed in the array we need to validate
          // interoperability with each.
          void* external_device[amd::Context::DeviceFlagIdx::LastDeviceFlagIdx] = {};
          external_device[devIdx] = d3d9_device[j];

          if (is_valid(device) && (media_adapters_type[j] == CL_ADAPTER_D3D9EX_KHR) &&
              as_amd(device)->bindExternalDevice(context_flag, external_device, NULL,
                                                 VALIDATE_ONLY)) {
            compatible_devices.push_back(as_amd(device));
          }
        }
      }
      if (compatible_devices.size() == 0) {
        *not_null(num_devices) = 0;
        errcode = CL_DEVICE_NOT_FOUND;
        break;
      }

      auto it = compatible_devices.cbegin();
      cl_uint compatible_count = std::min(num_entries, (cl_uint)compatible_devices.size());
      while (compatible_count--) {
        *devices++ = as_cl(*it++);
        --num_entries;
      }
      while (num_entries--) {
        *devices++ = (cl_device_id)0;
      }
      *not_null(num_devices) = (cl_uint)compatible_devices.size();
    } break;
    default:
      LogWarning("\"media_adapter_set\" is invalid");
      errcode = CL_INVALID_VALUE;
  }

  delete[] gpu_devices;
  return errcode;
}
RUNTIME_EXIT

RUNTIME_ENTRY_RET(cl_mem, clCreateFromDX9MediaSurfaceKHR,
                  (cl_context context, cl_mem_flags flags,
                   cl_dx9_media_adapter_type_khr adapter_type, void* surface_info, cl_uint plane,
                   cl_int* errcode_ret)) {
  cl_mem clMemObj = NULL;
  cl_dx9_surface_info_khr* cl_surf_info = NULL;

  if (!is_valid(context)) {
    *not_null(errcode_ret) = CL_INVALID_CONTEXT;
    LogWarning("invalid parameter \"context\"");
    return clMemObj;
  }
  if (!flags) flags = CL_MEM_READ_WRITE;
  if (!(((flags & CL_MEM_READ_ONLY) == CL_MEM_READ_ONLY) ||
        ((flags & CL_MEM_WRITE_ONLY) == CL_MEM_WRITE_ONLY) ||
        ((flags & CL_MEM_READ_WRITE) == CL_MEM_READ_WRITE))) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    LogWarning("invalid parameter \"flags\"");
    return clMemObj;
  }
  if ((adapter_type != CL_ADAPTER_D3D9_KHR) && (adapter_type != CL_ADAPTER_D3D9EX_KHR) &&
      (adapter_type != CL_ADAPTER_DXVA_KHR)) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    return clMemObj;
  }
  if (!surface_info) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    LogWarning("parameter \"surface_info\" is a NULL pointer");
    return clMemObj;
  }

  cl_surf_info = (cl_dx9_surface_info_khr*)surface_info;
  IDirect3DSurface9* pD3D9Resource = cl_surf_info->resource;
  HANDLE shared_handle = cl_surf_info->shared_handle;

  if (!pD3D9Resource) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    LogWarning("the \"resource\" field of \"surface_info\" is a NULL pointer");
    return clMemObj;
  }

  D3DSURFACE_DESC Desc;
  pD3D9Resource->GetDesc(&Desc);
  if ((Desc.Format != D3DFMT_NV_12) && (Desc.Format != D3DFMT_P010) &&
      (Desc.Format != D3DFMT_YV_12) && (plane != 0)) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    LogWarning("The plane has to be zero if the surface format is non-planar!");
    return clMemObj;
  }

  // Check for image support
  const std::vector<amd::Device*>& devices = as_amd(context)->devices();
  bool supportPass = false;
  bool sizePass = false;
  for (const auto& it : devices) {
    if (it->info().imageSupport_) {
      supportPass = true;
    }
  }
  if (!supportPass) {
    *not_null(errcode_ret) = CL_INVALID_OPERATION;
    LogWarning("there are no devices in context to support images");
    return (cl_mem)0;
  }

  // Verify the resource is a 2D image
  return amd::clCreateImage2DFromD3D9ResourceAMD(*as_amd(context), flags, adapter_type,
                                                 cl_surf_info, plane, errcode_ret);
}
RUNTIME_EXIT
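/*! Illustrative usage sketch (editor's addition, not part of the original source
 *  or the build): wrapping plane 0 of an IDirect3DSurface9 in an OpenCL image.
 *  'pSurf' is assumed to be an existing D3D9Ex surface and 'context' a context
 *  created for DX9 media sharing.
 * \code
 * cl_dx9_surface_info_khr surf_info;
 * surf_info.resource = pSurf;
 * surf_info.shared_handle = NULL;
 * cl_int err = CL_SUCCESS;
 * cl_mem clImage = clCreateFromDX9MediaSurfaceKHR(context, CL_MEM_READ_ONLY,
 *                                                 CL_ADAPTER_D3D9EX_KHR,
 *                                                 &surf_info, 0, &err);
 * \endcode
 */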
RUNTIME_ENTRY(cl_int, clEnqueueAcquireDX9MediaSurfacesKHR,
              (cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects,
               cl_uint num_events_in_wait_list, const cl_event* event_wait_list,
               cl_event* event)) {
  return amd::clEnqueueAcquireExtObjectsAMD(command_queue, num_objects, mem_objects,
                                            num_events_in_wait_list, event_wait_list, event,
                                            CL_COMMAND_ACQUIRE_DX9_MEDIA_SURFACES_KHR);
}
RUNTIME_EXIT

RUNTIME_ENTRY(cl_int, clEnqueueReleaseDX9MediaSurfacesKHR,
              (cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects,
               cl_uint num_events_in_wait_list, const cl_event* event_wait_list,
               cl_event* event)) {
  return amd::clEnqueueReleaseExtObjectsAMD(command_queue, num_objects, mem_objects,
                                            num_events_in_wait_list, event_wait_list, event,
                                            CL_COMMAND_RELEASE_DX9_MEDIA_SURFACES_KHR);
}
RUNTIME_EXIT

//
// clCreateImage2DFromD3D9ResourceAMD
//
cl_mem amd::clCreateImage2DFromD3D9ResourceAMD(amd::Context& amdContext, cl_mem_flags flags,
                                               cl_dx9_media_adapter_type_khr adapter_type,
                                               cl_dx9_surface_info_khr* surface_info,
                                               cl_uint plane, int* errcode_ret) {
  cl_dx9_surface_info_khr* cl_surf_info = reinterpret_cast<cl_dx9_surface_info_khr*>(surface_info);
  IDirect3DSurface9* pD3D9Resource = cl_surf_info->resource;
  HANDLE shared_handle = cl_surf_info->shared_handle;

  amd::D3D9Object obj;
  cl_int errcode =
      amd::D3D9Object::initD3D9Object(amdContext, adapter_type, surface_info, plane, obj);
  if (CL_SUCCESS != errcode) {
    *not_null(errcode_ret) = errcode;
    return (cl_mem)0;
  }

  amd::Image2DD3D9* pImage2DD3D9 = new (amdContext) amd::Image2DD3D9(amdContext, flags, obj);
  if (!pImage2DD3D9) {
    *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY;
    return (cl_mem)0;
  }
  if (!pImage2DD3D9->create()) {
    *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE;
    pImage2DD3D9->release();
    return (cl_mem)0;
  }
  *not_null(errcode_ret) = CL_SUCCESS;
  return as_cl(pImage2DD3D9);
}

//
// Helper function SyncD3D9Objects
//
void amd::SyncD3D9Objects(std::vector<amd::Memory*>& memObjects) {
  amd::Memory*& mem = memObjects.front();
  if (!mem) {
    LogWarning("\nNULL memory object\n");
    return;
  }
  amd::InteropObject* interop = mem->getInteropObj();
  if (!interop) {
    LogWarning("\nNULL interop object\n");
    return;
  }
  amd::D3D9Object* d3d9Obj = interop->asD3D9Object();
  if (!d3d9Obj) {
    LogWarning("\nNULL D3D9 object\n");
    return;
  }
  IDirect3DQuery9* query = d3d9Obj->getQuery();
  if (!query) {
    LogWarning("\nNULL IDirect3DQuery9\n");
    return;
  }
  amd::ScopedLock sl(d3d9Obj->getResLock());
  query->Issue(D3DISSUE_END);
  BOOL data = FALSE;
  while (S_OK != query->GetData(&data, sizeof(BOOL), D3DGETDATA_FLUSH)) {
  }
}

#endif  //_WIN32

clr-rocm-5.7.1/opencl/amdocl/cl_d3d9_amd.hpp

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
*/
/* $Revision$ on $Date$ */

#pragma once

#include "cl_common.hpp"
#include "platform/context.hpp"
#include "platform/memory.hpp"
#include "platform/interop_d3d9.hpp"
#include

namespace amd {
cl_mem clCreateImage2DFromD3D9ResourceAMD(
    Context& amdContext, cl_mem_flags flags, cl_dx9_media_adapter_type_khr adapter_type,
    cl_dx9_surface_info_khr* surface_info, cl_uint plane, int* errcode_ret);

void SyncD3D9Objects(std::vector<Memory*>& memObjects);
} //namespace amd

clr-rocm-5.7.1/opencl/amdocl/cl_device.cpp

/* Copyright (c) 2008 - 2022 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#include "cl_common.hpp"
#include "vdi_common.hpp"
#include "device/device.hpp"
#include "platform/runtime.hpp"
#include "utils/versions.hpp"
#include "os/os.hpp"
#include "cl_semaphore_amd.h"
#include "CL/cl_ext.h"

#include  // for alloca

/*! \addtogroup API
 * @{
 *
 * \addtogroup CL_PlatformInfo
 * @{
 */

/*! \brief Get the list of available platforms.
 *
 * \param num_entries is the number of cl_platform_id entries that can be added
 * to platforms. If \a platforms is not NULL, the \a num_entries must be greater
 * than zero.
 *
 * \param platforms returns a list of OpenCL platforms found. The cl_platform_id
 * values returned in \a platforms can be used to identify a specific OpenCL
 * platform. If \a platforms argument is NULL, this argument is ignored. The
 * number of OpenCL platforms returned is the minimum of the value specified by
 * \a num_entries or the number of OpenCL platforms available.
 *
 * \param num_platforms returns the number of OpenCL platforms available. If
 * \a num_platforms is NULL, this argument is ignored.
 *
 * \return CL_INVALID_VALUE if num_entries is equal to zero and platforms is not
 * NULL or if both num_platforms and platforms are NULL, and returns CL_SUCCESS
 * if the function is executed successfully.
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY(cl_int, clGetPlatformIDs,
              (cl_uint num_entries, cl_platform_id* platforms, cl_uint* num_platforms)) {
  if (!amd::Runtime::initialized()) {
    amd::Runtime::init();
  }

  if (((num_entries > 0 || num_platforms == NULL) && platforms == NULL) ||
      (num_entries == 0 && platforms != NULL)) {
    return CL_INVALID_VALUE;
  }

  if (num_platforms != NULL && platforms == NULL) {
    *num_platforms = 1;
    return CL_SUCCESS;
  }

  assert(platforms != NULL && "check the code above");
  *platforms = AMD_PLATFORM;
  *not_null(num_platforms) = 1;

  return CL_SUCCESS;
}
RUNTIME_EXIT
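/*! Illustrative usage sketch (editor's addition, not part of the original source
 *  or the build): the usual two-call pattern for enumerating platforms that the
 *  documentation above describes.
 * \code
 * cl_uint num_platforms = 0;
 * clGetPlatformIDs(0, NULL, &num_platforms);   // first query only the count
 * cl_platform_id platform = NULL;
 * if (num_platforms > 0) {
 *   clGetPlatformIDs(1, &platform, NULL);      // then fetch one platform ID
 * }
 * \endcode
 */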
/*! \brief Get specific information about the OpenCL platform.
 *
 * \param param_name is an enum that identifies the platform information being
 * queried.
 *
 * \param param_value is a pointer to memory location where appropriate values
 * for a given \a param_name will be returned. If \a param_value is NULL,
 * it is ignored.
 *
 * \param param_value_size specifies the size in bytes of memory pointed to by
 * \a param_value. This size in bytes must be >= size of return type.
 *
 * \param param_value_size_ret returns the actual size in bytes of data being
 * queried by param_value. If \a param_value_size_ret is NULL, it is ignored.
 *
 * \return One of the following values:
 * - CL_INVALID_VALUE if \a param_name is not one of the supported
 * values or if size in bytes specified by \a param_value_size is < size of
 * return type and \a param_value is not a NULL value.
 * - CL_SUCCESS if the function is executed successfully.
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY(cl_int, clGetPlatformInfo,
              (cl_platform_id platform, cl_platform_info param_name, size_t param_value_size,
               void* param_value, size_t* param_value_size_ret)) {
  if (platform != NULL && platform != AMD_PLATFORM) {
    return CL_INVALID_PLATFORM;
  }

  const char* value = NULL;
  switch (param_name) {
    case CL_PLATFORM_PROFILE:
      value = "FULL_PROFILE";
      break;
    case CL_PLATFORM_VERSION:
      value = "OpenCL " XSTR(OPENCL_MAJOR) "." XSTR(OPENCL_MINOR) " " AMD_PLATFORM_INFO;
      break;
    case CL_PLATFORM_NAME:
      value = AMD_PLATFORM_NAME;
      break;
    case CL_PLATFORM_VENDOR:
      value = "Advanced Micro Devices, Inc.";
      break;
    case CL_PLATFORM_EXTENSIONS:
      value = "cl_khr_icd "
#ifdef _WIN32
              "cl_khr_d3d10_sharing "
              "cl_khr_d3d11_sharing "
              "cl_khr_dx9_media_sharing "
#endif  //_WIN32
              "cl_amd_event_callback "
#if defined(WITH_COMPILER_LIB)
              "cl_amd_offline_devices "
#endif  // defined(WITH_COMPILER_LIB)
          ;
      break;
    case CL_PLATFORM_ICD_SUFFIX_KHR:
      value = "AMD";
      break;
    case CL_PLATFORM_HOST_TIMER_RESOLUTION: {
      cl_ulong resolution = (cl_ulong)amd::Os::timerResolutionNanos();
      return amd::clGetInfo(resolution, param_value_size, param_value, param_value_size_ret);
    }
    default:
      break;
  }

  if (value != NULL) {
    return amd::clGetInfo(value, param_value_size, param_value, param_value_size_ret);
  }

  return CL_INVALID_VALUE;
}
RUNTIME_EXIT

/*! @}
 * \addtogroup CL_Devices
 * @{
 */

/*! \brief Get the list of available devices.
 *
 * \param device_type is a bitfield that identifies the type of OpenCL device.
 * The \a device_type can be used to query specific OpenCL devices or all
 * OpenCL devices available.
 *
 * \param num_entries is the number of cl_device_id entries that can be added
 * to devices. If devices is not NULL, the \a num_entries must be greater than
 * zero.
 *
 * \param devices returns a list of OpenCL devices found. The cl_device_id
 * values returned in devices can be used to identify a specific OpenCL device.
 * If \a devices argument is NULL, this argument is ignored. The number of
 * OpenCL devices returned is the minimum of value specified by \a num_entries
 * or the number of OpenCL devices whose type matches device_type.
 *
 * \param num_devices returns the number of OpenCL devices available that match
 * device_type. If \a num_devices is NULL, this argument is ignored.
 *
 * \return One of the following values:
 * - CL_INVALID_DEVICE_TYPE if \a device_type is not a valid value.
 * - CL_INVALID_VALUE if \a num_entries is equal to zero and devices is
 * not NULL or if both \a num_devices and \a devices are NULL.
 * - CL_DEVICE_NOT_FOUND if no OpenCL devices that matched \a device_type
 * were found.
 * - CL_SUCCESS if the function is executed successfully.
* * The application can query specific capabilities of the OpenCL device(s) * returned by clGetDeviceIDs. This can be used by the application to * determine which device(s) to use. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clGetDeviceIDs, (cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id* devices, cl_uint* num_devices)) { if (platform != NULL && platform != AMD_PLATFORM) { return CL_INVALID_PLATFORM; } if (((num_entries > 0 || num_devices == NULL) && devices == NULL) || (num_entries == 0 && devices != NULL)) { return CL_INVALID_VALUE; } // Get all available devices if (!amd::Device::getDeviceIDs(device_type, num_entries, devices, num_devices, false)) { return CL_DEVICE_NOT_FOUND; } return CL_SUCCESS; } RUNTIME_EXIT /*! \fn clGetDeviceInfo * * \brief Get specific information about an OpenCL device. * * \param device is a device returned by clGetDeviceIDs. * * \param param_name is an enum that identifies the device information being * queried. * * \param param_value is a pointer to memory location where appropriate values * for a given \a param_name will be returned. If \a param_value is NULL, * it is ignored. * * \param param_value_size specifies the size in bytes of memory pointed to * by \a param_value. This size in bytes must be >= size of return type. * * \param param_value_size_ret returns the actual size in bytes of data being * queried by param_value. If \a param_value_size_ret is NULL, it is ignored. * * \return One of the following values: * - CL_INVALID_DEVICE if device is not valid. * - CL_INVALID_VALUE if param_name is not one of the supported values * or if size in bytes specified by \a param_value_size is < size of return * type and \a param_value is not a NULL value. * - CL_SUCCESS if the function is executed successfully. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clGetDeviceInfo, (cl_device_id device, cl_device_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { if (!is_valid(device)) { return CL_INVALID_DEVICE; } #define CASE(param_name, field_name) \ case param_name: \ return amd::clGetInfo(as_amd(device)->info().field_name, param_value_size, param_value, \ param_value_size_ret); switch (param_name) { case CL_DEVICE_TYPE: { // For cl_device_type, we need to mask out the default bit. 
cl_device_type device_type = as_amd(device)->type(); return amd::clGetInfo(device_type, param_value_size, param_value, param_value_size_ret); } CASE(CL_DEVICE_VENDOR_ID, vendorId_); CASE(CL_DEVICE_MAX_COMPUTE_UNITS, maxComputeUnits_); CASE(CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, maxWorkItemDimensions_); CASE(CL_DEVICE_MAX_WORK_GROUP_SIZE, preferredWorkGroupSize_); CASE(CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD, preferredWorkGroupSize_); CASE(CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD, maxWorkGroupSize_); CASE(CL_DEVICE_MAX_WORK_ITEM_SIZES, maxWorkItemSizes_); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, preferredVectorWidthChar_); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, preferredVectorWidthShort_); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, preferredVectorWidthInt_); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, preferredVectorWidthLong_); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, preferredVectorWidthFloat_); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, preferredVectorWidthDouble_); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, preferredVectorWidthDouble_); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, nativeVectorWidthChar_); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, nativeVectorWidthShort_); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, nativeVectorWidthInt_); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, nativeVectorWidthLong_); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, nativeVectorWidthFloat_); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, nativeVectorWidthDouble_); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, nativeVectorWidthDouble_); CASE(CL_DEVICE_MAX_CLOCK_FREQUENCY, maxEngineClockFrequency_); CASE(CL_DEVICE_ADDRESS_BITS, addressBits_); CASE(CL_DEVICE_MAX_READ_IMAGE_ARGS, maxReadImageArgs_); CASE(CL_DEVICE_MAX_WRITE_IMAGE_ARGS, maxWriteImageArgs_); CASE(CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS, maxReadWriteImageArgs_); CASE(CL_DEVICE_MAX_MEM_ALLOC_SIZE, maxMemAllocSize_); CASE(CL_DEVICE_IMAGE2D_MAX_WIDTH, image2DMaxWidth_); CASE(CL_DEVICE_IMAGE2D_MAX_HEIGHT, image2DMaxHeight_); CASE(CL_DEVICE_IMAGE3D_MAX_WIDTH, image3DMaxWidth_); CASE(CL_DEVICE_IMAGE3D_MAX_HEIGHT, image3DMaxHeight_); CASE(CL_DEVICE_IMAGE3D_MAX_DEPTH, image3DMaxDepth_); CASE(CL_DEVICE_IMAGE_SUPPORT, imageSupport_); CASE(CL_DEVICE_MAX_PARAMETER_SIZE, maxParameterSize_); CASE(CL_DEVICE_MAX_SAMPLERS, maxSamplers_); CASE(CL_DEVICE_MEM_BASE_ADDR_ALIGN, memBaseAddrAlign_); CASE(CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE, minDataTypeAlignSize_); CASE(CL_DEVICE_HALF_FP_CONFIG, halfFPConfig_); CASE(CL_DEVICE_SINGLE_FP_CONFIG, singleFPConfig_); CASE(CL_DEVICE_DOUBLE_FP_CONFIG, doubleFPConfig_); CASE(CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, globalMemCacheType_); CASE(CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, globalMemCacheLineSize_); CASE(CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, globalMemCacheSize_); CASE(CL_DEVICE_GLOBAL_MEM_SIZE, globalMemSize_); CASE(CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, maxConstantBufferSize_); CASE(CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD, preferredConstantBufferSize_); CASE(CL_DEVICE_MAX_CONSTANT_ARGS, maxConstantArgs_); CASE(CL_DEVICE_LOCAL_MEM_TYPE, localMemType_); CASE(CL_DEVICE_LOCAL_MEM_SIZE, localMemSize_); CASE(CL_DEVICE_ERROR_CORRECTION_SUPPORT, errorCorrectionSupport_); CASE(CL_DEVICE_HOST_UNIFIED_MEMORY, hostUnifiedMemory_); CASE(CL_DEVICE_PROFILING_TIMER_RESOLUTION, profilingTimerResolution_); CASE(CL_DEVICE_PROFILING_TIMER_OFFSET_AMD, profilingTimerOffset_); CASE(CL_DEVICE_ENDIAN_LITTLE, littleEndian_); CASE(CL_DEVICE_AVAILABLE, available_); CASE(CL_DEVICE_COMPILER_AVAILABLE, compilerAvailable_); CASE(CL_DEVICE_EXECUTION_CAPABILITIES, 
executionCapabilities_); CASE(CL_DEVICE_SVM_CAPABILITIES, svmCapabilities_); CASE(CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT, preferredPlatformAtomicAlignment_); CASE(CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT, preferredGlobalAtomicAlignment_); CASE(CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT, preferredLocalAtomicAlignment_); CASE(CL_DEVICE_QUEUE_ON_HOST_PROPERTIES, queueProperties_); CASE(CL_DEVICE_PLATFORM, platform_); CASE(CL_DEVICE_NAME, name_); CASE(CL_DEVICE_VENDOR, vendor_); CASE(CL_DRIVER_VERSION, driverVersion_); CASE(CL_DEVICE_PROFILE, profile_); CASE(CL_DEVICE_VERSION, version_); CASE(CL_DEVICE_OPENCL_C_VERSION, oclcVersion_); CASE(CL_DEVICE_EXTENSIONS, extensions_); CASE(CL_DEVICE_MAX_ATOMIC_COUNTERS_EXT, maxAtomicCounters_); CASE(CL_DEVICE_TOPOLOGY_AMD, deviceTopology_); CASE(CL_DEVICE_MAX_SEMAPHORE_SIZE_AMD, maxSemaphoreSize_); CASE(CL_DEVICE_BOARD_NAME_AMD, boardName_); CASE(CL_DEVICE_SPIR_VERSIONS, spirVersions_); CASE(CL_DEVICE_IL_VERSION, spirVersions_); CASE(CL_DEVICE_MAX_PIPE_ARGS, maxPipeArgs_); CASE(CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS, maxPipeActiveReservations_); CASE(CL_DEVICE_PIPE_MAX_PACKET_SIZE, maxPipePacketSize_); CASE(CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE, maxGlobalVariableSize_); CASE(CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE, globalVariablePreferredTotalSize_); CASE(CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES, queueOnDeviceProperties_); CASE(CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE, queueOnDevicePreferredSize_); CASE(CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE, queueOnDeviceMaxSize_); CASE(CL_DEVICE_MAX_ON_DEVICE_QUEUES, maxOnDeviceQueues_); CASE(CL_DEVICE_MAX_ON_DEVICE_EVENTS, maxOnDeviceEvents_); CASE(CL_DEVICE_LINKER_AVAILABLE, linkerAvailable_); CASE(CL_DEVICE_BUILT_IN_KERNELS, builtInKernels_); CASE(CL_DEVICE_IMAGE_MAX_BUFFER_SIZE, imageMaxBufferSize_); CASE(CL_DEVICE_IMAGE_MAX_ARRAY_SIZE, imageMaxArraySize_); case CL_DEVICE_PARENT_DEVICE: { cl_device_id parent = (cl_device_id)0; return amd::clGetInfo(parent, param_value_size, param_value, param_value_size_ret); } CASE(CL_DEVICE_PARTITION_MAX_SUB_DEVICES, maxComputeUnits_); case CL_DEVICE_PARTITION_PROPERTIES: { cl_device_partition_property cl_property = {}; return amd::clGetInfo(cl_property, param_value_size, param_value, param_value_size_ret); } case CL_DEVICE_PARTITION_AFFINITY_DOMAIN: { cl_device_affinity_domain deviceAffinity = {}; return amd::clGetInfo(deviceAffinity, param_value_size, param_value, param_value_size_ret); } case CL_DEVICE_PARTITION_TYPE: { cl_device_partition_property cl_property = {}; return amd::clGetInfo(cl_property, param_value_size, param_value, param_value_size_ret); } case CL_DEVICE_REFERENCE_COUNT: { cl_uint count = as_amd(device)->referenceCount(); return amd::clGetInfo(count, param_value_size, param_value, param_value_size_ret); } CASE(CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, preferredInteropUserSync_); CASE(CL_DEVICE_PRINTF_BUFFER_SIZE, printfBufferSize_); CASE(CL_DEVICE_IMAGE_PITCH_ALIGNMENT, imagePitchAlignment_); CASE(CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, imageBaseAddressAlignment_); default: break; } if (as_amd(device)->type() == CL_DEVICE_TYPE_GPU) { switch (param_name) { case CL_DEVICE_GLOBAL_FREE_MEMORY_AMD: { // Free memory should contain 2 values: // total free memory and the biggest free block size_t freeMemory[2]; if (!as_amd(device)->globalFreeMemory(freeMemory)) { return CL_INVALID_DEVICE; } if (param_value_size < sizeof(freeMemory)) { // Return just total free memory if the app provided space for one value return amd::clGetInfo(freeMemory[0], param_value_size, 
param_value, param_value_size_ret); } else { return amd::clGetInfo(freeMemory, param_value_size, param_value, param_value_size_ret); } } CASE(CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD, simdPerCU_); CASE(CL_DEVICE_SIMD_WIDTH_AMD, simdWidth_); CASE(CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD, simdInstructionWidth_); CASE(CL_DEVICE_WAVEFRONT_WIDTH_AMD, wavefrontWidth_); case CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD: { cl_uint globalMemChannels = as_amd(device)->info().vramBusBitWidth_ / 32; return amd::clGetInfo(globalMemChannels, param_value_size, param_value, param_value_size_ret); } CASE(CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD, globalMemChannelBanks_); CASE(CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD, globalMemChannelBankWidth_); CASE(CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD, localMemSizePerCU_); CASE(CL_DEVICE_LOCAL_MEM_BANKS_AMD, localMemBanks_); CASE(CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD, threadTraceEnable_); case CL_DEVICE_GFXIP_MAJOR_AMD: { cl_uint major = as_amd(device)->isa().versionMajor(); return amd::clGetInfo(major, param_value_size, param_value, param_value_size_ret); } case CL_DEVICE_GFXIP_MINOR_AMD: { cl_uint minor = as_amd(device)->isa().versionMinor(); return amd::clGetInfo(minor, param_value_size, param_value, param_value_size_ret); } CASE(CL_DEVICE_AVAILABLE_ASYNC_QUEUES_AMD, numAsyncQueues_); #define CL_DEVICE_MAX_REAL_TIME_COMPUTE_QUEUES_AMD 0x404D #define CL_DEVICE_MAX_REAL_TIME_COMPUTE_UNITS_AMD 0x404E #define CL_DEVICE_MAX_REAL_TIME_COMPUTE_UNITS_GRANULARITY_AMD 0x403A CASE(CL_DEVICE_MAX_REAL_TIME_COMPUTE_QUEUES_AMD, numRTQueues_); CASE(CL_DEVICE_MAX_REAL_TIME_COMPUTE_UNITS_AMD, numRTCUs_); CASE(CL_DEVICE_MAX_REAL_TIME_COMPUTE_UNITS_GRANULARITY_AMD, granularityRTCUs_); case CL_DEVICE_NUM_P2P_DEVICES_AMD: { cl_uint num_p2p_devices = as_amd(device)->p2pDevices_.size(); return amd::clGetInfo(num_p2p_devices, param_value_size, param_value, param_value_size_ret); } case CL_DEVICE_P2P_DEVICES_AMD: { uint valueSize = as_amd(device)->p2pDevices_.size() * sizeof(cl_device_id); if (param_value != NULL) { if ((param_value_size < valueSize) || (param_value_size == 0)) { return CL_INVALID_VALUE; } } else { return CL_INVALID_VALUE; } memcpy(param_value, as_amd(device)->p2pDevices_.data(), valueSize); *not_null(param_value_size_ret) = valueSize; if (param_value != NULL && param_value_size > valueSize) { ::memset(static_cast(param_value) + valueSize, '\0', param_value_size - valueSize); } return CL_SUCCESS; } CASE(CL_DEVICE_PCIE_ID_AMD, pcieDeviceId_); default: break; } } #undef CASE return CL_INVALID_VALUE; } RUNTIME_EXIT RUNTIME_ENTRY(cl_int, clCreateSubDevices, (cl_device_id in_device, const cl_device_partition_property* partition_properties, cl_uint num_entries, cl_device_id* out_devices, cl_uint* num_devices)) { if (!is_valid(in_device)) { return CL_INVALID_DEVICE; } if (partition_properties == NULL || *partition_properties == 0u) { return CL_INVALID_VALUE; } if ((num_devices == NULL && out_devices == NULL) || (num_entries == 0 && out_devices != NULL)) { return CL_INVALID_VALUE; } return CL_INVALID_VALUE; } RUNTIME_EXIT RUNTIME_ENTRY(cl_int, clRetainDevice, (cl_device_id device)) { if (!is_valid(device)) { return CL_INVALID_DEVICE; } as_amd(device)->retain(); return CL_SUCCESS; } RUNTIME_EXIT RUNTIME_ENTRY(cl_int, clReleaseDevice, (cl_device_id device)) { if (!is_valid(device)) { return CL_INVALID_DEVICE; } as_amd(device)->release(); return CL_SUCCESS; } RUNTIME_EXIT /*! 
@}
 * @}
 */

clr-rocm-5.7.1/opencl/amdocl/cl_event.cpp

/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#include "cl_common.hpp"
#include "platform/object.hpp"
#include "platform/context.hpp"
#include "platform/command.hpp"

/*! \addtogroup API
 * @{
 * \addtogroup CL_Events
 *
 * Event objects can be used to refer to a kernel execution command:
 * - clEnqueueNDRangeKernel
 * - clEnqueueTask
 * - clEnqueueNativeKernel
 *
 * or read, write, map and copy commands on memory objects:
 * - clEnqueue{Read|Write|Map}{Buffer|Image}
 * - clEnqueueCopy{Buffer|Image}
 * - clEnqueueCopyBufferToImage
 * - clEnqueueCopyImageToBuffer
 *
 * An event object can be used to track the execution status of a command.
 * The execution status of a command at any given point in time can be
 * CL_QUEUED (is currently in the command queue),
 * CL_RUNNING (device is currently executing this command),
 * CL_COMPLETE (command has successfully completed) or the appropriate error
 * code if the command was abnormally terminated (this may be caused by a bad
 * memory access etc.). The error code returned by a terminated command is
 * a negative integer value. A command is considered to be complete if its
 * execution status is CL_COMPLETE or is a negative integer value.
 *
 * If the execution of a command is terminated, the command-queue associated
 * with this terminated command, and the associated context (and all other
 * command-queues in this context) may no longer be available. The behavior of
 * OpenCL API calls that use this context (and command-queues associated with
 * this context) are now considered to be implementation-defined. The user
 * registered callback function specified when context is created can be used
 * to report appropriate error information.
 *
 * @{
 */

/*! \brief Wait on the host thread for commands identified by event objects in
 * event_list to complete.
 *
 * A command is considered complete if its execution status is CL_COMPLETE or
 * a negative value. The events specified in event_list act as synchronization
 * points.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function was executed successfully.
 * - CL_INVALID_VALUE if \a num_events is zero
 * - CL_INVALID_CONTEXT if events specified in \a event_list do not belong to
 * the same context
 * - CL_INVALID_EVENT if event objects specified in \a event_list are not valid
 * event objects.
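 *
 * Illustrative example (editor's addition, not from the original documentation):
 * waiting for two enqueued kernels before reading results on the host; 'queue',
 * 'k0', 'k1' and 'gws' are assumed to exist.
 * \code
 * cl_event evs[2];
 * clEnqueueNDRangeKernel(queue, k0, 1, NULL, gws, NULL, 0, NULL, &evs[0]);
 * clEnqueueNDRangeKernel(queue, k1, 1, NULL, gws, NULL, 0, NULL, &evs[1]);
 * cl_int err = clWaitForEvents(2, evs);  // blocks until both reach CL_COMPLETE
 * \endcode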
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clWaitForEvents, (cl_uint num_events, const cl_event* event_list)) { if (num_events == 0 || event_list == NULL) { return CL_INVALID_VALUE; } const amd::Context* prevContext = NULL; const amd::HostQueue* prevQueue = NULL; for (cl_uint i = 0; i < num_events; ++i) { cl_event event = event_list[i]; if (!is_valid(event)) { return CL_INVALID_EVENT; } // Make sure all the events are associated with the same context const amd::Context* context = &as_amd(event)->context(); if (prevContext != NULL && prevContext != context) { return CL_INVALID_CONTEXT; } prevContext = context; // Flush the command queues associated with event1...eventN amd::HostQueue* queue = as_amd(event)->command().queue(); if (queue != NULL && prevQueue != queue) { queue->flush(); } prevQueue = queue; } bool allSucceeded = true; while (num_events-- > 0) { allSucceeded &= as_amd(*event_list++)->awaitCompletion(); } return allSucceeded ? CL_SUCCESS : CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; } RUNTIME_EXIT /*! \brief Return information about the event object. * * \param event specifies the event object being queried. * * \param param_name specifies the information to query. * * \param param_value is a pointer to memory where the appropriate result being * queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size is used to specify the size in bytes of memory * pointed to by \a param_value. This size must be >= size of return type. * * \param param_value_size_ret returns the actual size in bytes of data copied * to \a param_value. If \a param_value_size_ret is NULL, it is ignored. * * Using clGetEventInfo to determine if a command identified by event has * finished execution (i.e. CL_EVENT_COMMAND_EXECUTION_STATUS returns * CL_COMPLETE) is not a synchronization point i.e. there are no guarantees * that the memory objects being modified by command associated with event will * be visible to other enqueued commands. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes * specified by \a param_value_size is < size of return type and * \a param_value is not NULL * - CL_INVALID_EVENT if \a event is a not a valid event object. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clGetEventInfo, (cl_event event, cl_event_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { if (!is_valid(event)) { return CL_INVALID_EVENT; } switch (param_name) { case CL_EVENT_CONTEXT: { amd::Context& amdCtx = const_cast(as_amd(event)->context()); cl_context context = as_cl(&amdCtx); return amd::clGetInfo(context, param_value_size, param_value, param_value_size_ret); } case CL_EVENT_COMMAND_QUEUE: { amd::Command& command = as_amd(event)->command(); cl_command_queue queue = command.queue() == NULL ? 
NULL : const_cast(as_cl(command.queue()->asCommandQueue())); return amd::clGetInfo(queue, param_value_size, param_value, param_value_size_ret); } case CL_EVENT_COMMAND_TYPE: { cl_command_type type = as_amd(event)->command().type(); return amd::clGetInfo(type, param_value_size, param_value, param_value_size_ret); } case CL_EVENT_COMMAND_EXECUTION_STATUS: { as_amd(event)->notifyCmdQueue(); cl_int status = as_amd(event)->command().status(); return amd::clGetInfo(status, param_value_size, param_value, param_value_size_ret); } case CL_EVENT_REFERENCE_COUNT: { cl_uint count = as_amd(event)->referenceCount(); return amd::clGetInfo(count, param_value_size, param_value, param_value_size_ret); } default: break; } return CL_INVALID_VALUE; } RUNTIME_EXIT /*! \brief Increment the event reference count. * * \return CL_SUCCESS if the function is executed successfully. It returns * CL_INVALID_EVENT if \a event is not a valid event object. * * The OpenCL commands that return an event perform an implicit retain. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clRetainEvent, (cl_event event)) { if (!is_valid(event)) { return CL_INVALID_EVENT; } as_amd(event)->retain(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Decrement the event reference count. * * \return CL_SUCCESS if the function is executed successfully. It returns * CL_INVALID_EVENT if \a event is not a valid event object. * * The event object is deleted once the reference count becomes zero, the * specific command identified by this event has completed (or terminated) and * there are no commands in the command-queues of a context that require a wait * for this event to complete. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clReleaseEvent, (cl_event event)) { if (!is_valid(event)) { return CL_INVALID_EVENT; } as_amd(event)->release(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Creates a user event object. * * User events allow applications to enqueue commands that wait on a user event * to finish before the command is executed by the device. * * \return a valid non-zero event object and errcode_ret is set to CL_SUCCESS * if the user event object is created successfully. Otherwise, it returns * a NULL value with one of the following error values returned in errcode_ret: * - CL_INVALID_CONTEXT if context is not a valid context. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host. * * The execution status of the user event object created is set to CL_SUBMITTED. * * \version 1.1r15 */ RUNTIME_ENTRY_RET(cl_event, clCreateUserEvent, (cl_context context, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; return (cl_event)0; } amd::Event* event = new amd::UserEvent(*as_amd(context)); if (event == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_event)0; } event->retain(); *not_null(errcode_ret) = CL_SUCCESS; return as_cl(event); } RUNTIME_EXIT /*! \brief Sets the execution status of a user event object. * * \a event is a user event object created using clCreateUserEvent. * \a execution_status specifies the new execution status to be set and can be * CL_COMPLETE or a negative integer value to indicate an error. * clSetUserEventStatus can only be called once to change the execution status * of event. * * \return CL_SUCCESS if the function was executed successfully. Otherwise, * it returns one of the following errors: * - CL_INVALID_EVENT if event is not a valid user event object. 
* - CL_INVALID_VALUE if the execution_status is not CL_COMPLETE or * a negative integer value. * - CL_INVALID_OPERATION if the execution_status for event has already been * changed by a previous call to clSetUserEventStatus. * * \version 1.1r15 */ RUNTIME_ENTRY(cl_int, clSetUserEventStatus, (cl_event event, cl_int execution_status)) { if (!is_valid(event)) { return CL_INVALID_EVENT; } if (execution_status > CL_COMPLETE) { return CL_INVALID_VALUE; } if (!as_amd(event)->setStatus(execution_status)) { return CL_INVALID_OPERATION; } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Registers a user callback function for a specific command execution * status. * * The registered callback function will be called when the execution status * of command associated with event changes to the execution status specified * by command_exec_status. * * Each call to clSetEventCallback registers the specified user callback * function on a callback stack associated with event. The order in which the * registered user callback functions are called is undefined. * * \a event is a valid event object. * \a command_exec_callback_type specifies the command execution status for * which the callback is registered. The command execution callback mask * values for which a callback can be registered are: CL_COMPLETE. * There is no guarantee that the callback functions registered for various * execution status values for an event will be called in the exact order * that the execution status of a command changes. * \a pfn_event_notify is the event callback function that can be registered * by the application. This callback function may be called asynchronously * by the OpenCL implementation. It is the application’s responsibility to * ensure that the callback function is thread-safe. The parameters to this * callback function are: * event is the event object for which the callback function is invoked. * event_command_exec_status represents the execution status of command * for which this callback function is invoked. If the callback is called * as the result of the command associated with event being abnormally * terminated, an appropriate error code for the error that caused the * termination will be passed to event_command_exec_status instead. * \a user_data is a pointer to user supplied data. user_data will be passed as * the user_data argument when pfn_notify is called. user_data can be NULL. * * All callbacks registered for an event object must be called. All enqueued * callbacks shall be called before the event object is destroyed. Callbacks * must return promptly. The behavior of calling expensive system routines, * OpenCL API calls to create contexts or command-queues, or blocking OpenCL * operations from the following list below, in a callback is undefined. * clFinish, clWaitForEvents, blocking calls to clEnqueueReadBuffer, * clEnqueueReadBufferRect, clEnqueueWriteBuffer, clEnqueueWriteBufferRect, * blocking calls to clEnqueueReadImage and clEnqueueWriteImage, blocking * calls to clEnqueueMapBuffer and clEnqueueMapImage, blocking calls to * clBuildProgram * * If an application needs to wait for completion of a routine from the above * list in a callback, please use the non-blocking form of the function, and * assign a completion callback to it to do the remainder of your work. * Note that when a callback (or other code) enqueues commands to a * command-queue, the commands are not required to begin execution until the * queue is flushed. 
In standard usage, blocking enqueue calls serve this role * by implicitly flushing the queue. Since blocking calls are not permitted in * callbacks, those callbacks that enqueue commands on a command queue should * either call clFlush on the queue before returning or arrange for clFlush * to be called later on another thread. * * \return CL_SUCCESS if the function is executed successfully. Otherwise, * it returns one of the following errors: * - CL_INVALID_EVENT if event is not a valid event object or is a user event * object created using clCreateUserEvent. * - CL_INVALID_VALUE if pfn_event_notify is NULL or if * command_exec_callback_type is not a valid command execution status. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host. * * \version 1.1r15 */ RUNTIME_ENTRY(cl_int, clSetEventCallback, (cl_event event, cl_int command_exec_callback_type, void(CL_CALLBACK* pfn_notify)(cl_event event, cl_int command_exec_status, void* user_data), void* user_data)) { if (!is_valid(event)) { return CL_INVALID_EVENT; } if (pfn_notify == NULL || command_exec_callback_type < CL_COMPLETE || command_exec_callback_type > CL_QUEUED) { return CL_INVALID_VALUE; } if (!as_amd(event)->setCallback(command_exec_callback_type, pfn_notify, user_data)) { return CL_OUT_OF_HOST_MEMORY; } as_amd(event)->notifyCmdQueue(); return CL_SUCCESS; } RUNTIME_EXIT /*! @} * @} */ clr-rocm-5.7.1/opencl/amdocl/cl_execute.cpp000066400000000000000000001327711450307266000206230ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "cl_common.hpp" #include "vdi_common.hpp" #include "platform/kernel.hpp" #include "platform/ndrange.hpp" #include "platform/command.hpp" #include "platform/program.hpp" #include "os/os.hpp" /*! \addtogroup API * @{ * * \addtogroup CL_Exec Executing Kernel Objects * * @{ */ /*! \brief Enqueue a command to execute a kernel on a device. * * \param command_queue is a valid command-queue. The kernel will be queued * for execution on the device associated with \a command_queue. * * \param kernel is a valid kernel object. The OpenCL context associated with * \a kernel and \a command-queue must be the same. * * \param work_dim is the number of dimensions used to specify the global * work-items and work-items in the work-group. \a work_dim must be greater * than zero and less than or equal to three. * * \param global_work_offset must currently be a NULL value. 
In a future * revision of OpenCL, \a global_work_offset can be used to specify an array * of \a work_dim unsigned values that describe the offset used to calculate * the global ID of a work-item instead of having the global IDs always start * at offset (0, 0, 0). * * \param global_work_size points to an array of \a work_dim unsigned values * that describe the number of global work-items in \a work_dim dimensions * that will execute the kernel function. The total number of global * work-items is computed as global_work_size[0] * ... * * global_work_size[work_dim - 1]. * * \param local_work_size points to an array of \a work_dim unsigned values * that describe the number of work-items that make up a work-group (also * referred to as the size of the work-group) that will execute the kernel * specified by kernel. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, then * this particular command does not wait on any event to complete. * If \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular kernel * execution instance. Event objects are unique and can be used to identify a * particular kernel execution instance later on. If \a event is NULL, no * event will be created for this kernel execution instance and therefore it * will not be possible for the application to query or queue a wait for this * particular kernel execution instance. * * The total number of work-items in a work-group is computed as * local_work_size[0] * ... * local_work_size[work_dim - 1]. * The total number of work-items in the work-group must be less than or equal * to the CL_DEVICE_MAX_WORK_GROUP_SIZE. The explicitly specified * \a local_work_size will be used to determine how to break the global work- * items specified by global_work_size into appropriate work-group instances. * If \a local_work_size is specified, the values specified in * \a global_work_size[0], ..., global_work_size[work_dim - 1] must be evenly * divisible by the corresponding values specified in \a local_work_size[0], * ..., local_work_size[work_dim - 1]. \a local_work_size can also be a NULL * value in which case the OpenCL implementation will determine how to * break the global work-items into appropriate work-groups. * * If \a local_work_size is NULL and no work-group size is specified when the * kernel is compiled, the OpenCL implementation will determine how to break * the global work-items specified by \a global_work_size into appropriate * work-group instances. The work-group size to be used for kernel can also be * specified in the program source using the * __attribute__((reqd_work_group_size(X, Y, Z))) qualifier. In this case the * size of the work-group specified by \a local_work_size must match the value * specified by the \a reqd_work_group_size attribute qualifier. * * These work-group instances are executed in parallel across multiple * compute units or concurrently on the same compute unit. Each work-item * is uniquely identified by a global identifier.
The global ID, which can be * read inside the kernel is computed using the value given by * \a global_work_size and \a global_work_offset. * * \return One of the following values: * * - CL_SUCCESS if the kernel execution was successfully queued * * - CL_INVALID_PROGRAM_EXECUTABLE if there is no successfully built program * executable available for device associated with \a command_queue. * * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue * * - CL_INVALID_KERNEL if \a kernel is not a valid kernel object. * * - CL_INVALID_CONTEXT if context associated with command_queue and kernel are * not the same or if the context associated with command_queue and events in * event_wait_list are not the same. * * - CL_INVALID_KERNEL_ARGS if the kernel argument values have not been * specified or are not valid for the device on which kernel will be * executed. * * - CL_INVALID_WORK_DIMENSION if \a work_dim is not a valid value * (i.e. a value between 1 and 3). * * - CL_INVALID_WORK_GROUP_SIZE if \a local_work_size is specified and the number * of work-items specified by \a global_work_size is not evenly divisible by * the size of the work-group given by \a local_work_size or does not match * the work-group size specified for kernel using the * __attribute__((reqd_work_group_size(X, Y, Z))) qualifier in program * source. * * - CL_INVALID_GLOBAL_OFFSET if \a global_work_offset is not NULL. * * - CL_OUT_OF_RESOURCES if there is a failure to queue the execution instance * of \a kernel on the command-queue because of insufficient resources * needed to execute the kernel. For example, the explicitly specified * \a local_work_size causes a failure to execute the kernel because * of insufficient resources such as registers or local memory. Another * example would be the number of read-only image args used in kernel exceed * the CL_DEVICE_MAX_READ_IMAGE_ARGS value for device or the number of * write-only image args used in kernel exceed the * CL_DEVICE_MAX_WRITE_IMAGE_ARGS value for device or the number of samplers * used in kernel exceed CL_DEVICE_MAX_SAMPLERS for device. * * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory * for image or buffer objects specified as arguments to kernel. * * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in * \a event_wait_list are not valid events. * * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the runtime.
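 *
 * A minimal usage sketch (illustrative only; \a queue is assumed to be a valid
 * command-queue and \a kernel a kernel whose arguments are already set):
 * \code
 * size_t globalWorkSize[2] = {1024, 1024};
 * size_t localWorkSize[2] = {16, 16};  // must evenly divide globalWorkSize
 * cl_event ndrEvent;
 * cl_int err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, globalWorkSize,
 *                                     localWorkSize, 0, NULL, &ndrEvent);
 * if (err == CL_SUCCESS) {
 *   clWaitForEvents(1, &ndrEvent);  // block until the kernel completes
 *   clReleaseEvent(ndrEvent);
 * }
 * \endcode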
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueNDRangeKernel, (cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t* global_work_offset, const size_t* global_work_size, const size_t* local_work_size, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { *not_null(event) = NULL; if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(kernel)) { return CL_INVALID_KERNEL; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; const amd::Kernel* amdKernel = as_amd(kernel); if (&hostQueue.context() != &amdKernel->program().context()) { return CL_INVALID_CONTEXT; } const amd::Device& device = hostQueue.device(); const device::Kernel* devKernel = amdKernel->getDeviceKernel(device); if (devKernel == NULL) { return CL_INVALID_PROGRAM_EXECUTABLE; } if (amdKernel->parameters().getSvmSystemPointersSupport() == FGS_YES && !(device.info().svmCapabilities_ & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM)) { // The user indicated that this kernel will access SVM system pointers, // but the device does not support them. return CL_INVALID_OPERATION; } if (work_dim < 1 || work_dim > 3) { return CL_INVALID_WORK_DIMENSION; } #if !defined(CL_VERSION_1_1) if (global_work_offset != NULL) { return CL_INVALID_GLOBAL_OFFSET; } #endif // CL_VERSION if (global_work_size == NULL) { return CL_INVALID_VALUE; } if (local_work_size == NULL) { static size_t zeroes[3] = {0, 0, 0}; local_work_size = zeroes; } else { size_t numWorkItems = 1; for (cl_uint dim = 0; dim < work_dim; ++dim) { if ((devKernel->workGroupInfo()->compileSize_[0] != 0) && (local_work_size[dim] != devKernel->workGroupInfo()->compileSize_[dim])) { return CL_INVALID_WORK_GROUP_SIZE; } // >32bits global work size is not supported. if ((global_work_size[dim] == 0) || (global_work_size[dim] > static_cast<size_t>(0xffffffff))) { return CL_INVALID_GLOBAL_WORK_SIZE; } numWorkItems *= local_work_size[dim]; } // Make sure local work size is valid if ((numWorkItems == 0) || (numWorkItems > devKernel->workGroupInfo()->size_)) { return CL_INVALID_WORK_GROUP_SIZE; } // Check if uniform was requested and validate dimensions if (devKernel->workGroupInfo()->uniformWorkGroupSize_) { for (cl_uint dim = 0; dim < work_dim; ++dim) { if ((global_work_size[dim] % local_work_size[dim]) != 0) { return CL_INVALID_WORK_GROUP_SIZE; } } } } // Check that all parameters have been defined. if (!amdKernel->parameters().check()) { return CL_INVALID_KERNEL_ARGS; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::NDRangeContainer ndrange((size_t)work_dim, global_work_offset, global_work_size, local_work_size); amd::NDRangeKernelCommand* command = new amd::NDRangeKernelCommand(hostQueue, eventWaitList, *as_amd(kernel), ndrange); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // ndrange is now owned by command. Do not delete it! // Make sure we have memory for the command execution cl_int result = command->captureAndValidate(); if (result != CL_SUCCESS) { delete command; return result; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueue a command to execute a kernel on a device. * The kernel is executed using a single work-item.
* * \param command_queue is a valid command-queue. The kernel will be queued * for execution on the device associated with \a command_queue. * * \param kernel is a valid kernel object. The OpenCL context associated with * \a kernel and \a command-queue must be the same. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, then * this particular command does not wait on any event to complete. * If \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular kernel * execution instance. Event objects are unique and can be used to identify a * particular kernel execution instance later on. If \a event is NULL, no event * will be created for this kernel execution instance and therefore it will not * be possible for the application to query or queue a wait for this particular * kernel execution instance. * * \return One of the following values: * - CL_SUCCESS if the kernel execution was successfully queued. * - CL_INVALID_PROGRAM_EXECUTABLE if there is no successfully built program * executable available for device associated with \a command_queue. * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_KERNEL if \a kernel is not a valid kernel object. * - CL_INVALID_KERNEL_ARGS if the kernel argument values have not been * specified or are not valid for the device on which kernel will be * executed. * - CL_INVALID_WORK_GROUP_SIZE if a work-group size is specified for * kernel using the __attribute__((reqd_work_group_size(X, Y, Z))) * qualifier in program source and is not (1, 1, 1). * - CL_OUT_OF_RESOURCES if there is a failure to queue the execution instance * of kernel on the command-queue because of insufficient resources needed * to execute the kernel. For example, the explicitly specified * \a local_work_size causes a failure to execute the kernel because * of insufficient resources such as registers or local memory. Another * example would be the number of read-only image args used in kernel exceed * the CL_DEVICE_MAX_READ_IMAGE_ARGS value for device or the number of * write-only image args used in kernel exceed the * CL_DEVICE_MAX_WRITE_IMAGE_ARGS value for device or the number of samplers * used in kernel exceed CL_DEVICE_MAX_SAMPLERS for device. * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory * for image or buffer objects specified as arguments to kernel. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime.
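 *
 * A minimal usage sketch (illustrative only; \a queue and \a kernel are
 * assumed valid). clEnqueueTask behaves like an NDRange enqueue with
 * work_dim = 1 and a global/local work size of 1:
 * \code
 * cl_event taskEvent;
 * cl_int err = clEnqueueTask(queue, kernel, 0, NULL, &taskEvent);
 * if (err == CL_SUCCESS) {
 *   clWaitForEvents(1, &taskEvent);
 *   clReleaseEvent(taskEvent);
 * }
 * \endcode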
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueTask, (cl_command_queue command_queue, cl_kernel kernel, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { static size_t const globalWorkSize[3] = {1, 0, 0}; static size_t const localWorkSize[3] = {1, 0, 0}; if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue* hostQueue = as_amd(command_queue)->asHostQueue(); if (NULL == hostQueue) { return CL_INVALID_COMMAND_QUEUE; } return hostQueue->dispatch_->clEnqueueNDRangeKernel( command_queue, kernel, 1, NULL, globalWorkSize, localWorkSize, num_events_in_wait_list, event_wait_list, event); } RUNTIME_EXIT /*! \brief Enqueue a command to execute a native C/C++ function not compiled * using the OpenCL compiler. * * \param command_queue is a valid command-queue. A native user function can * only be executed on a command-queue created on a device that has * CL_EXEC_NATIVE_KERNEL capability set in CL_DEVICE_EXECUTION_CAPABILITIES. * * \param user_func is a pointer to a host-callable user function. * * \param args is a pointer to the args list that \a user_func should be called * with. * * \param cb_args is the size in bytes of the args list that args points to. * The data pointed to by \a args and \a cb_args bytes in size will be copied * and a pointer to this copied region will be passed to \a user_func. The copy * needs to be done because the memory objects (cl_mem values) that args may * contain need to be modified and replaced by appropriate pointers to global * memory. When clEnqueueNativeKernel returns, the memory region pointed to by * args can be reused by the application. * * \param num_mem_objects is the number of buffer objects that are passed in * args. * * \param mem_list is a list of valid buffer objects, if \a num_mem_objects > 0 * * \param args_mem_loc is a pointer to appropriate locations that args points * to where memory object handles (cl_mem values) are stored. Before the user * function is executed, the memory object handles are replaced by pointers to * global memory. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list * * \param event_wait_list as described in clEnqueueNDRangeKernel. * * \param event returns an event object that identifies this particular kernel * execution instance. Event objects are unique and can be used to identify a * particular kernel execution instance later on. If \a event is NULL, no event * will be created for this kernel execution instance and therefore it will not * be possible for the application to query or queue a wait for this particular * kernel execution instance. * * \return One of the following values: * - CL_SUCCESS if the user function execution instance was successfully queued * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_VALUE if \a user_func is NULL, or if \a args is a NULL value * and \a num_mem_objects > 0 or if \a num_mem_objects > 0 and \a mem_list * is NULL. * - CL_INVALID_OPERATION if device cannot execute the native kernel. * - CL_INVALID_MEM_OBJECT if one or more memory objects specified in * \a mem_list are not valid or are not buffer objects. * - CL_OUT_OF_RESOURCES if there is a failure to queue the execution instance * of kernel on the command-queue because of insufficient resources needed * to execute the kernel. * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory * for buffer objects specified as arguments to \a kernel.
* - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueNativeKernel, (cl_command_queue command_queue, void(CL_CALLBACK* user_func)(void*), void* args, size_t cb_args, cl_uint num_mem_objects, const cl_mem* mem_list, const void** args_mem_loc, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { *not_null(event) = NULL; if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; const amd::Device& device = hostQueue.device(); if (!(device.info().executionCapabilities_ & CL_EXEC_NATIVE_KERNEL)) { return CL_INVALID_OPERATION; } if (user_func == NULL || (num_mem_objects > 0 && (mem_list == NULL || args_mem_loc == NULL)) || (num_mem_objects == 0 && (mem_list != NULL || args_mem_loc != NULL)) || (args == NULL && (cb_args > 0 || num_mem_objects > 0)) || (args != NULL && cb_args == 0)) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } for (size_t i = 0; i < num_mem_objects; ++i) { cl_mem obj = mem_list[i]; if (!is_valid(obj)) { return CL_INVALID_MEM_OBJECT; } } amd::NativeFnCommand* command = new amd::NativeFnCommand( hostQueue, eventWaitList, user_func, args, cb_args, num_mem_objects, mem_list, args_mem_loc); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! @} * * \addtogroup CL_Order Out of order Execution of Kernels and Memory Commands * * The OpenCL functions that are submitted to a command-queue are queued in * the order the calls are made but can be configured to execute in-order or * out-of-order. The properties argument in clCreateCommandQueue can be used * to specify the execution order. * * If the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property of a command-queue * is not set, the commands queued to a command-queue execute in order. * For example, if an application calls clEnqueueNDRangeKernel to execute * kernel A followed by a clEnqueueNDRangeKernel to execute kernel B, * the application can assume that kernel A finishes first and then kernel B * is executed. If the memory objects output by kernel A are inputs to kernel B * then kernel B will see the correct data in memory objects produced * by execution of kernel A. If the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE * property of a command-queue is set, then there is no guarantee that kernel A * will finish before kernel B starts execution. * * Applications can configure the commands queued to a command-queue to * execute out-of-order by setting the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE * property of the command-queue. This can be specified when the command-queue * is created or can be changed dynamically using clSetCommandQueueProperty. * In out-of-order execution mode there is no guarantee that the queued * commands will finish execution in the order they were queued.
As there is * no guarantee that kernels will be executed in order, i.e. based on when * the clEnqueueNDRangeKernel calls are made within a command-queue, it is * therefore possible that an earlier clEnqueueNDRangeKernel call to execute * kernel A identified by event A may execute and/or finish later than a * clEnqueueNDRangeKernel call to execute kernel B which was called by the * application at a later point in time. To guarantee a specific order of * execution of kernels, a wait on a particular event (in this case event A) * can be used. The wait for event A can be specified in the event_wait_list * argument to clEnqueueNDRangeKernel for kernel B. * * In addition, a wait for events or a barrier function can be queued to the * command-queue. The wait for events command ensures that previously queued * commands identified by the list of events to wait for have finished before * the next batch of commands is executed. The barrier ensures that all * previously queued commands in a command-queue have finished execution * before the next batch of commands is executed. * * Similarly, commands to read, write, copy or map memory objects that are * queued after clEnqueueNDRangeKernel, clEnqueueTask or clEnqueueNativeKernel * commands are not guaranteed to wait for kernels scheduled for execution * to have completed (if the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property * is set). To ensure correct ordering of commands, the event object returned * by clEnqueueNDRangeKernel, clEnqueueTask or clEnqueueNativeKernel can be * used to queue a wait for event or a barrier command can be queued that must * complete before reads or writes to the memory object(s) occur. * * @{ */ /*! \brief Enqueue a marker command to \a command_queue. * * The marker command returns an event which can be used to queue a wait on * this marker event i.e. wait for all commands queued before the marker * command to complete. * * \return One of the following values: * - CL_SUCCESS if the function is successfully executed * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue * - CL_INVALID_VALUE if \a event is a NULL value * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueMarker, (cl_command_queue command_queue, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue* hostQueue = as_amd(command_queue)->asHostQueue(); if (NULL == hostQueue) { return CL_INVALID_COMMAND_QUEUE; } amd::Command* command = new amd::Marker(*hostQueue, true); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief enqueues a marker command which waits for either a list of events * to complete, or if the list is empty it waits for all commands previously * enqueued in \a command_queue to complete before it completes. This command * returns an event which can be waited on, i.e. this event can be waited on * to ensure that all events either in the \a event_wait_list or all * previously enqueued commands, queued before this command to * \a command_queue, have completed. * * \param command_queue is a valid command-queue. * * \param num_events_in_wait_list specifies the number of events given * by \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed.
* If \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must * be greater than 0. The events specified in event_wait_list act as * synchronization points. The context associated with events in * \a event_wait_list and \a command_queue must be the same. The * memory associated with \a event_wait_list can be reused or freed after * the function returns. * If \a event_wait_list is NULL, then this particular command waits until * all previous enqueued commands to \a command_queue have completed. * * \param event returns an event object that identifies this particular * kernel execution instance. Event objects are unique and can be used to * identify this marker command later on. * * \return CL_SUCCESS if the function is successfully executed. * Otherwise, it returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid \a command-queue. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by * the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the OpenCL implementation on the host. * * \version 1.2r07 */ RUNTIME_ENTRY(cl_int, clEnqueueMarkerWithWaitList, (cl_command_queue command_queue, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue* hostQueue = as_amd(command_queue)->asHostQueue(); if (NULL == hostQueue) { return CL_INVALID_COMMAND_QUEUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, *hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::Command* command = new amd::Marker(*hostQueue, true, eventWaitList); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueue a wait for a specific event or a list of events to complete * before any future commands queued in the command-queue are executed. * * \param command_queue is a valid command-queue. * * \param num_events specifies the number of events given by \a event_list. * * \param event_list is the list of events. Each event in \a event_list must * be a valid event object returned by a previous call to: * - clEnqueueNDRangeKernel * - clEnqueueTask * - clEnqueueNativeKernel * - clEnqueue{Read|Write|Map}{Buffer|Image} * - clEnqueueCopy{Buffer|Image} * - clEnqueueCopyBufferToImage * - clEnqueueCopyImageToBuffer * - clEnqueueMarker. * The events specified in \a event_list act as synchronization points. * * \return One of the following values: * - CL_SUCCESS if the function was successfully executed. * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue * - CL_INVALID_VALUE if \a num_events is zero or \a event_list is NULL * - CL_INVALID_EVENT if event objects specified in \a event_list are not valid * events * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime.
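 *
 * A minimal usage sketch (illustrative only; \a queue, \a evA and \a evB are
 * assumed to be events returned by earlier enqueue calls on the same context):
 * \code
 * cl_event waitList[2] = {evA, evB};
 * // Commands enqueued after this call will not start executing until
 * // evA and evB have completed.
 * cl_int err = clEnqueueWaitForEvents(queue, 2, waitList);
 * \endcode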
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueWaitForEvents, (cl_command_queue command_queue, cl_uint num_events, const cl_event* event_list)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events, event_list); if (err != CL_SUCCESS) { return err; } amd::Command* command = new amd::Marker(hostQueue, false, eventWaitList); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } command->enqueue(); command->release(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueue a barrier operation. * * The clEnqueueBarrier command ensures that all queued commands in * \a command_queue have finished execution before the next batch of commands * can begin execution. clEnqueueBarrier is a synchronization point. * * \return One of the following values: * - CL_SUCCESS if the function was executed successfully * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueBarrier, (cl_command_queue command_queue)) { //! @todo: Unimplemented(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief enqueues a barrier command which waits for either a list of events * to complete, or if the list is empty it waits for all commands previously * enqueued in \a command_queue to complete before it completes. This command * blocks command execution, that is, any following commands enqueued after it * do not execute until it completes. This command returns an event which can * be waited on, i.e. this event can be waited on to ensure that all events * either in the \a event_wait_list or all previously enqueued commands, * queued before this command to command_queue, have completed. * * \param command_queue is a valid command-queue. * * \param num_events_in_wait_list specifies the number of events given * by \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. * If \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must * be greater than 0. The events specified in event_wait_list act as * synchronization points. The context associated with events in * \a event_wait_list and \a command_queue must be the same. The * memory associated with \a event_wait_list can be reused or freed after * the function returns. * If \a event_wait_list is NULL, then this particular command waits until * all previous enqueued commands to \a command_queue have completed. * * \param event returns an event object that identifies this particular * kernel execution instance. Event objects are unique and can be used to * identify this marker command later on. * * \return CL_SUCCESS if the function is successfully executed. * Otherwise, it returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid \a command-queue.
* - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by * the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the OpenCL implementation on the host. * * \version 1.2r07 */ RUNTIME_ENTRY(cl_int, clEnqueueBarrierWithWaitList, (cl_command_queue command_queue, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue* hostQueue = as_amd(command_queue)->asHostQueue(); if (NULL == hostQueue) { return CL_INVALID_COMMAND_QUEUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, *hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } //!@note: with the current runtime architecture and in-order execution //! barrier and marker should be the same operation amd::Command* command = new amd::Marker(*hostQueue, true, eventWaitList); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! @} * * \addtogroup CL_Profiling Profiling Operations on Memory Objects and Kernels * * Profiling of OpenCL functions that are enqueued as commands to a * command-queue. The specific functions being referred to are: * - clEnqueue{Read|Write|Map}Buffer, * - clEnqueue{Read|Write|Map}Image, * - clEnqueueCopy{Buffer|Image}, * - clEnqueueCopyImageToBuffer, * - clEnqueueCopyBufferToImage, * - clEnqueueNDRangeKernel, * - clEnqueueTask and * - clEnqueueNativeKernel. * These enqueued commands are identified by unique event objects. * * Event objects can be used to capture profiling information that measures * execution time of a command. Profiling of OpenCL commands can be enabled * either by using a command-queue created with CL_QUEUE_PROFILING_ENABLE * flag set in properties arguments to clCreateCommandQueue or by setting the * CL_QUEUE_PROFILING_ENABLE flag in properties arguments to * clSetCommandQueueProperty. * * @{ */ /*! \brief Return profiling information for the command associated with event. * * \param event specifies the event object. * * \param param_name specifies the profiling data to query. * * \param param_value is a pointer to memory where the appropriate result being * queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size is used to specify the size in bytes of memory * pointed to by \a param_value. This size must be >= size of return type * * \param param_value_size_ret returns the actual size in bytes of data copied * to \a param_value. If \a param_value_size_ret is NULL, it is ignored. * * The unsigned 64-bit values returned can be used to measure the time in * nanoseconds consumed by OpenCL commands. OpenCL devices are required to * correctly track time across changes in frequency and p-states. The * CL_DEVICE_PROFILING_TIMER_RESOLUTION specifies the resolution of the timer * i.e. the number of nanoseconds elapsed before the timer is incremented.
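 *
 * A typical elapsed-time computation (illustrative only; \a ev is assumed to
 * be a completed event from a queue created with CL_QUEUE_PROFILING_ENABLE):
 * \code
 * cl_ulong start = 0, end = 0;
 * clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
 *                         sizeof(start), &start, NULL);
 * clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
 *                         sizeof(end), &end, NULL);
 * cl_ulong elapsedNs = end - start;  // both values are in nanoseconds
 * \endcode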
* * \return One of the following values: * - CL_SUCCESS if the function is executed successfully and the profiling * information has been recorded * - CL_PROFILING_INFO_NOT_AVAILABLE if the profiling information is currently * not available (because the command identified by event has not completed) * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes * specified by param_value_size is < size of return type and \a param_value * is not NULL * - CL_INVALID_EVENT if \a event is a not a valid event object. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clGetEventProfilingInfo, (cl_event event, cl_profiling_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { if (!is_valid(event)) { return CL_INVALID_EVENT; } if (!as_amd(event)->profilingInfo().enabled_) { return CL_PROFILING_INFO_NOT_AVAILABLE; } if (param_value != NULL && param_value_size < sizeof(cl_ulong)) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = sizeof(cl_ulong); if (param_value != NULL) { cl_ulong value = 0; switch (param_name) { case CL_PROFILING_COMMAND_END: case CL_PROFILING_COMMAND_COMPLETE: value = as_amd(event)->profilingInfo().end_; break; case CL_PROFILING_COMMAND_START: value = as_amd(event)->profilingInfo().start_; break; case CL_PROFILING_COMMAND_SUBMIT: value = as_amd(event)->profilingInfo().submitted_; break; case CL_PROFILING_COMMAND_QUEUED: value = as_amd(event)->profilingInfo().queued_; break; default: return CL_INVALID_VALUE; } if (value == 0) { return CL_PROFILING_INFO_NOT_AVAILABLE; } *(cl_ulong*)param_value = value; } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Returns a reasonably synchronized pair of timestamps from the device * timer and the host timer as seen by device. * * \param device a device returned by clGetDeviceIDs. * * \param device_timestamp will be updated with the value of the current timer * in nanoseconds. The resolution of the timer is the same as the device * profiling timer returned by clGetDeviceInfo and the * CL_DEVICE_PROFILING_TIMER_RESOLUTION query. * * \param host_timestamp will be updated with the value of the current timer * in nanoseconds at the closest possible point in time to that at which * device_timer was returned. The resolution of the timer may be queried * via clGetPlatformInfo and the flag CL_PLATFORM_HOST_TIMER_RESOLUTION. * * Returns a reasonably synchronized pair of timestamps from the device * timer and the host timer as seen by device. Implementations may need * to execute this query with a high latency in order to provide reasonable * synchronization of the timestamps. The host timestamp and device timestamp * returned by this function and clGetHostTimer each have an implementation * defined timebase. The timestamps will always be in their respective timebases * regardless of which query function is used. The timestamp returned from * clGetEventProfilingInfo for an event on a device and a device timestamp * queried from the same device will always be in the same timebase. * * \return One of the following values: * - CL_SUCCESS if a time value in host_timestamp is provided * - CL_INVALID_DEVICE if device is not a valid OpenCL device. * - CL_INVALID_VALUE if host_timestamp is NULL. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the OpenCL implementation on the host. 
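 *
 * A minimal usage sketch (illustrative only; \a device is assumed to be a
 * valid device ID returned by clGetDeviceIDs):
 * \code
 * cl_ulong deviceTs = 0, hostTs = 0;
 * cl_int err = clGetDeviceAndHostTimer(device, &deviceTs, &hostTs);
 * // deviceTs and hostTs are sampled as close together in time as possible,
 * // so later host samples can be correlated with device timestamps.
 * \endcode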
* */ RUNTIME_ENTRY(cl_int, clGetDeviceAndHostTimer, (cl_device_id device, cl_ulong * device_timestamp, cl_ulong * host_timestamp)) { if (!is_valid(device)) { return CL_INVALID_DEVICE; } if (!device_timestamp || !host_timestamp) { return CL_INVALID_VALUE; } // The device timestamp and host timestamp use the same timebase. *device_timestamp = *host_timestamp = amd::Os::timeNanos(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Return the current value of the host clock as seen by device. * * \param device a device returned by clGetDeviceIDs. * * \param host_timestamp will be updated with the value of the current timer * in nanoseconds. The resolution of the timer may be queried via * clGetPlatformInfo and the flag CL_PLATFORM_HOST_TIMER_RESOLUTION. * * Return the current value of the host clock as seen by device. This value * is in the same timebase as the host_timestamp returned from * clGetDeviceAndHostTimer. The implementation will return with as low a * latency as possible to allow a correlation with a subsequent application * sampled time. The host timestamp and device timestamp returned by this * function and clGetDeviceAndHostTimer each have an implementation defined * timebase. The timestamps will always be in their respective timebases * regardless of which query function is used. The timestamp returned from * clGetEventProfilingInfo for an event on a device and a device timestamp * queried from the same device will always be in the same timebase. * * \return One of the following values: * * - CL_SUCCESS if a time value in host_timestamp is provided * - CL_INVALID_DEVICE if device is not a valid OpenCL device. * - CL_INVALID_VALUE if host_timestamp is NULL. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the OpenCL implementation on the host. * */ RUNTIME_ENTRY(cl_int, clGetHostTimer, (cl_device_id device, cl_ulong * host_timestamp)) { if (!is_valid(device)) { return CL_INVALID_DEVICE; } if (!host_timestamp) { return CL_INVALID_VALUE; } *host_timestamp = amd::Os::timeNanos(); return CL_SUCCESS; } RUNTIME_EXIT /*! @} * \addtogroup CL_FlushFinish Flush and Finish * @{ */ /*! \brief Issue all previously queued OpenCL commands in \a command_queue to * the device associated with command_queue. * * clFlush only guarantees that all queued commands to \a command_queue get * issued to the appropriate device. There is no guarantee that they will be * complete after clFlush returns. * * \return One of the following values: * - CL_SUCCESS if the function call was executed successfully * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * Any blocking commands queued in a command-queue such as * clEnqueueRead{Image|Buffer} with \a blocking_read set to CL_TRUE, * clEnqueueWrite{Image|Buffer} with \a blocking_write set to CL_TRUE, * clEnqueueMap{Buffer|Image} with \a blocking_map set to CL_TRUE or * clWaitForEvents perform an implicit flush of the command-queue. 
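 *
 * A minimal usage sketch contrasting the two calls (illustrative only;
 * \a queue is assumed to be a valid command-queue):
 * \code
 * clFlush(queue);   // commands are issued to the device, may still be running
 * clFinish(queue);  // blocks until every queued command has completed
 * \endcode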
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clFlush, (cl_command_queue command_queue)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue* hostQueue = as_amd(command_queue)->asHostQueue(); if (NULL == hostQueue) { return CL_INVALID_COMMAND_QUEUE; } amd::Command* command = new amd::Marker(*hostQueue, false); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } command->enqueue(); command->release(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Block until all previously queued OpenCL runtime commands in * \a command_queue are issued to the associated device and have completed. * * clFinish does not return until all queued commands in \a command_queue have * been processed and completed. clFinish is also a synchronization point. * * \return One of the following values: * - CL_SUCCESS if the function call was executed successfully. * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clFinish, (cl_command_queue command_queue)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue* hostQueue = as_amd(command_queue)->asHostQueue(); if (NULL == hostQueue) { return CL_INVALID_COMMAND_QUEUE; } hostQueue->finish(); return CL_SUCCESS; } RUNTIME_EXIT /*! @} * @} */ clr-rocm-5.7.1/opencl/amdocl/cl_gl.cpp000066400000000000000000002047361450307266000175640ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "top.hpp" #ifdef _WIN32 #include #include #include // This is necessary since there are common GL/D3D10 functions #include "cl_d3d9_amd.hpp" #include "cl_d3d10_amd.hpp" #include "cl_d3d11_amd.hpp" #endif //_WIN32 #include #include #include #include #include #include "cl_common.hpp" #include "platform/interop_gl.hpp" #include "device/device.hpp" /* The pixel internal format for DOPP texture defined in gl_enum.h */ #define GL_BGR8_ATI 0x8083 #define GL_BGRA8_ATI 0x8088 #include #include /*! \addtogroup API * @{ * * \addtogroup CL_GL_Interops * * This section discusses OpenCL functions that allow applications to * use OpenGL buffer/texture/render-buffer objects as OpenCL memory * objects. This allows efficient sharing of data between these OpenCL * and OpenGL. The OpenCL API can be used to execute kernels that read * and/or write memory objects that are also an OpenGL buffer object * or a texture. 
An OpenCL image object can be created from an OpenGL * texture or renderbuffer object. An OpenCL buffer object can be * created from an OpenGL buffer object. An OpenCL memory object can * be created from an OpenGL texture/buffer/render-buffer object or * the default system provided framebuffer if any only if the OpenCL * clContext has been created from a GL clContext. OpenGL contexts are * created using platform specific APIs (EGL, CGL, WGL, GLX are some * of the platform specific APIs that allow applications to create GL * contexts). The appropriate platform API (such as EGL, CGL, WGL, * GLX) will be extended to allow a CL clContext to be created from a * GL clContext. Creating an OpenCL memory object from the default * system provided framebuffer will also require an appropriate * extension to the platform API. Refer to the appropriate platform * API documentation to understand how to create a CL clContext from a * GL clContext and creating a CL memory object from the default * system provided framebuffer. * * @{ * * \addtogroup clCreateFromGLBuffer * * @{ */ /*! \brief Creates an OpenCL buffer object from an OpenGL buffer object. * * \param clContext is a valid OpenCL clContext created from an OpenGL clContext. * * \param clFlags is a bit-field that is used to specify usage information. Only * CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY and CL_MEM_READ_WRITE can be used. * * \param glBufferName is a GL buffer object name. The GL buffer * object must have a data store created though it does not need to * be initialized. The size of the data store will be used to * determine the size of the CL buffer object. * * \param pCpuMem is a pointer to the buffer data that may already be * allocated by the application. The size of the buffer that pCpuMem points * to must be >= \a size bytes. Passing in a pointer to an already allocated * buffer on the host and using it as a buffer object allows applications to * share data efficiently with kernels and the host. * * \param errcode_ret will return an appropriate error code. If errcode_ret * is NULL, no error code is returned. * * \return valid non-zero OpenCL buffer object and errcode_ret is set * to CL_SUCCESS if the buffer object is created successfully. It * returns a NULL value with one of the following error values * returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a clContext is not a valid clContext. * - CL_INVALID_VALUE if values specified in \a clFlags are not valid. * - CL_INVALID_GL_OBJECT if glBufferName is not a GL buffer object or is a * GL buffer object but does not have a data store created. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r29 */ RUNTIME_ENTRY_RET(cl_mem, clCreateFromGLBuffer, (cl_context context, cl_mem_flags flags, GLuint bufobj, cl_int* errcode_ret)) { cl_mem clMemObj = NULL; if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter \"context\""); return clMemObj; } if (!(((flags & CL_MEM_READ_ONLY) == CL_MEM_READ_ONLY) || ((flags & CL_MEM_WRITE_ONLY) == CL_MEM_WRITE_ONLY) || ((flags & CL_MEM_READ_WRITE) == CL_MEM_READ_WRITE))) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter \"flags\""); return clMemObj; } return (amd::clCreateFromGLBufferAMD(*as_amd(context), flags, bufobj, errcode_ret)); } RUNTIME_EXIT /*! 
\brief Creates the following: * - an OpenCL 2D image object from an OpenGL 2D texture object * or a single face of an OpenGL cubemap texture object, * - an OpenCL 2D image array object from an OpenGL 2D texture array object, * - an OpenCL 1D image object from an OpenGL 1D texture object, * - an OpenCL 1D image buffer object from an OpenGL texture buffer object, * - an OpenCL 1D image array object from an OpenGL 1D texture array object, * - an OpenCL 3D image object from an OpenGL 3D texture object. * * \param clContext is a valid OpenCL clContext created from an OpenGL clContext. * * \param clFlags is a bit-field that is used to specify usage information. * Only CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY and CL_MEM_READ_WRITE values * can be used. * * \param texture_target must be GL_TEXTURE_1D, GL_TEXTURE_1D_ARRAY, * GL_TEXTURE_BUFFER, GL_TEXTURE_2D_ARRAY, GL_TEXTURE_3D, * GL_TEXTURE_2D, GL_TEXTURE_CUBE_MAP_POSITIVE_X, * GL_TEXTURE_CUBE_MAP_POSITIVE_Y, GL_TEXTURE_CUBE_MAP_POSITIVE_Z, * GL_TEXTURE_CUBE_MAP_NEGATIVE_X, GL_TEXTURE_CUBE_MAP_NEGATIVE_Y, * GL_TEXTURE_CUBE_MAP_NEGATIVE_Z or GL_TEXTURE_RECTANGLE_ARB. * * \param miplevel is the mipmap level to be used. If \a texture_target * is GL_TEXTURE_BUFFER, \a miplevel must be 0. * * \param texture is a GL 1D, 2D, 3D, 1D array, 2D array, cubemap, * rectangle or buffer texture object. * The texture object must be a complete texture as per * OpenGL rules on texture completeness. The texture format and dimensions * defined by OpenGL for the specified miplevel of the texture will be * used to create the OpenCL image memory object. Only GL texture formats * that map to appropriate image channel order and data type can be used * to create the OpenCL image memory object. * * \param errcode_ret will return an appropriate error code. If \a * errcode_ret is NULL, no error code is returned. * * \return A valid non-zero OpenCL image object and \a errcode_ret is set to * CL_SUCCESS if the image object is created successfully. It returns a NULL value * with one of the following error values returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a clContext is not a valid clContext or was not * created from a GL clContext. * - CL_INVALID_VALUE if values specified in \a clFlags are not valid. * - CL_INVALID_MIP_LEVEL if \a miplevel is not a valid mip-level for \a texture. * - CL_INVALID_GL_OBJECT if \a texture is not an appropriate GL 2D texture, * cubemap or texture rectangle. * - CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if the OpenGL texture format does not * map to an appropriate OpenCL image format. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime.
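 *
 * A minimal usage sketch (illustrative only; \a context is assumed to have
 * been created from a GL context and \a glTex to be a complete GL_TEXTURE_2D
 * object name):
 * \code
 * cl_int err;
 * cl_mem image = clCreateFromGLTexture(context, CL_MEM_READ_WRITE,
 *                                      GL_TEXTURE_2D, 0, glTex, &err);
 * // miplevel 0; err receives CL_SUCCESS on success
 * \endcode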
* * \version 1.2r07 */ RUNTIME_ENTRY_RET(cl_mem, clCreateFromGLTexture, (cl_context context, cl_mem_flags flags, GLenum texture_target, GLint miplevel, GLuint texture, cl_int* errcode_ret)) { cl_mem clMemObj = NULL; if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter \"context\""); return clMemObj; } if (!(((flags & CL_MEM_READ_ONLY) == CL_MEM_READ_ONLY) || ((flags & CL_MEM_WRITE_ONLY) == CL_MEM_WRITE_ONLY) || ((flags & CL_MEM_READ_WRITE) == CL_MEM_READ_WRITE))) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter \"flags\""); return clMemObj; } const std::vector<amd::Device*>& devices = as_amd(context)->devices(); bool supportPass = false; bool sizePass = false; for (const auto& it : devices) { if (it->info().imageSupport_) { supportPass = true; } } if (!supportPass) { *not_null(errcode_ret) = CL_INVALID_OPERATION; LogWarning("there are no devices in context to support images"); return static_cast<cl_mem>(0); } return amd::clCreateFromGLTextureAMD(*as_amd(context), flags, texture_target, miplevel, texture, errcode_ret); } RUNTIME_EXIT /*! @} * \addtogroup clCreateFromGLTexture2D * @{ */ /*! \brief Create an OpenCL 2D image object from an OpenGL 2D texture object. * * \param clContext is a valid OpenCL clContext created from an OpenGL clContext. * * \param clFlags is a bit-field that is used to specify usage information. * Only CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY and CL_MEM_READ_WRITE values * can be used. * * \param target must be GL_TEXTURE_2D, GL_TEXTURE_CUBE_MAP_POSITIVE_X, * GL_TEXTURE_CUBE_MAP_POSITIVE_Y, GL_TEXTURE_CUBE_MAP_POSITIVE_Z, * GL_TEXTURE_CUBE_MAP_NEGATIVE_X, GL_TEXTURE_CUBE_MAP_NEGATIVE_Y, * GL_TEXTURE_CUBE_MAP_NEGATIVE_Z or GL_TEXTURE_RECTANGLE_ARB. * * \param miplevel is the mipmap level to be used. * * \param texture is a GL 2D texture, cubemap or texture rectangle * object name. The texture object must be a complete texture as per * OpenGL rules on texture completeness. The \a texture format and * dimensions specified using appropriate glTexImage2D call for \a * miplevel will be used to create the 2D image object. Only GL * texture formats that map to appropriate image channel order and * data type can be used to create the 2D image object. * * \param errcode_ret will return an appropriate error code. If \a * errcode_ret is NULL, no error code is returned. * * \return A valid non-zero OpenCL image object and \a errcode_ret is set to * CL_SUCCESS if the image object is created successfully. It returns a NULL value * with one of the following error values returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a clContext is not a valid clContext or was not * created from a GL clContext. * - CL_INVALID_VALUE if values specified in \a clFlags are not valid. * - CL_INVALID_MIP_LEVEL if \a miplevel is not a valid mip-level for \a texture. * - CL_INVALID_GL_OBJECT if \a texture is not an appropriate GL 2D texture, * cubemap or texture rectangle. * - CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if the OpenGL texture format does not * map to an appropriate OpenCL image format. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime.
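 *
 * Note that before kernels may access a memory object created from a GL
 * object, it must be acquired, and afterwards released. A minimal sketch
 * (illustrative only; \a queue and \a image are assumed valid):
 * \code
 * clEnqueueAcquireGLObjects(queue, 1, &image, 0, NULL, NULL);
 * // ... enqueue kernels that read or write the image ...
 * clEnqueueReleaseGLObjects(queue, 1, &image, 0, NULL, NULL);
 * \endcode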
* * \version 1.0r29 */ RUNTIME_ENTRY_RET(cl_mem, clCreateFromGLTexture2D, (cl_context context, cl_mem_flags flags, GLenum target, GLint miplevel, GLuint texture, cl_int* errcode_ret)) { cl_mem clMemObj = NULL; if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter \"context\""); return clMemObj; } if (!(((flags & CL_MEM_READ_ONLY) == CL_MEM_READ_ONLY) || ((flags & CL_MEM_WRITE_ONLY) == CL_MEM_WRITE_ONLY) || ((flags & CL_MEM_READ_WRITE) == CL_MEM_READ_WRITE))) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter \"flags\""); return clMemObj; } const std::vector<amd::Device*>& devices = as_amd(context)->devices(); bool supportPass = false; bool sizePass = false; for (const auto& it : devices) { if (it->info().imageSupport_) { supportPass = true; } } if (!supportPass) { *not_null(errcode_ret) = CL_INVALID_OPERATION; LogWarning("there are no devices in context to support images"); return static_cast<cl_mem>(0); } return amd::clCreateFromGLTextureAMD(*as_amd(context), flags, target, miplevel, texture, errcode_ret); } RUNTIME_EXIT /*! @} * \addtogroup clCreateFromGLTexture3D * @{ */ /*! \brief Create an OpenCL 3D image object from an OpenGL 3D texture object. * * \param clContext is a valid OpenCL clContext created from an OpenGL clContext. * * \param clFlags is a bit-field that is used to specify usage information. * Only CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY and CL_MEM_READ_WRITE values * can be used. * * \param target must be GL_TEXTURE_3D. * * \param miplevel is the mipmap level to be used. * * \param texture is a GL 3D texture object [name]. * The texture object must be a complete texture as per OpenGL rules on texture * completeness. The \a texture format and dimensions specified using appropriate * glTexImage3D call for \a miplevel will be used to create the 3D image object. * Only GL texture formats that map to appropriate image channel order and * data type can be used to create the 3D image object. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A valid non-zero OpenCL image object and \a errcode_ret is set to * CL_SUCCESS if the image object is created successfully. It returns a NULL value * with one of the following error values returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a clContext is not a valid clContext or was not * created from a GL clContext. * - CL_INVALID_VALUE if values specified in \a clFlags are not valid. * - CL_INVALID_MIP_LEVEL if \a miplevel is not a valid mip-level for \a texture. * - CL_INVALID_GL_OBJECT if \a texture is not an GL 3D texture. * - CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if the OpenGL texture format does not * map to an appropriate OpenCL image format. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime.
* * \version 1.0r29 */ RUNTIME_ENTRY_RET(cl_mem, clCreateFromGLTexture3D, (cl_context context, cl_mem_flags flags, GLenum target, GLint miplevel, GLuint texture, cl_int* errcode_ret)) { cl_mem clMemObj = NULL; if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter \"context\""); return clMemObj; } if (!(((flags & CL_MEM_READ_ONLY) == CL_MEM_READ_ONLY) || ((flags & CL_MEM_WRITE_ONLY) == CL_MEM_WRITE_ONLY) || ((flags & CL_MEM_READ_WRITE) == CL_MEM_READ_WRITE))) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter \"flags\""); return clMemObj; } const std::vector<amd::Device*>& devices = as_amd(context)->devices(); bool supportPass = false; bool sizePass = false; for (const auto& it : devices) { if (it->info().imageSupport_) { supportPass = true; } } if (!supportPass) { *not_null(errcode_ret) = CL_INVALID_OPERATION; LogWarning("there are no devices in context to support images"); return static_cast<cl_mem>(0); } return amd::clCreateFromGLTextureAMD(*as_amd(context), flags, target, miplevel, texture, errcode_ret); } RUNTIME_EXIT /*! @} * \addtogroup clCreateFromGLRenderbuffer * @{ */ /*! \brief Create an OpenCL 2D image object from an OpenGL renderbuffer object. * * \param clContext is a valid OpenCL clContext created from an OpenGL clContext. * * \param clFlags is a bit-field that is used to specify usage information. * Only CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY and CL_MEM_READ_WRITE values * can be used. * * \param renderbuffer is a GL renderbuffer object name. The renderbuffer * storage must be specified before the image object can be created. Only * GL renderbuffer formats that map to appropriate image channel order and * data type can be used to create the 2D image object. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A valid non-zero OpenCL image object and \a errcode_ret is set * to CL_SUCCESS if the image object is created successfully. It returns a * NULL value with one of the following error values returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a clContext is not a valid clContext or was not * created from a GL clContext. * - CL_INVALID_VALUE if values specified in \a clFlags are not valid. * - CL_INVALID_GL_OBJECT if \a renderbuffer is not a GL renderbuffer object. * - CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if the OpenGL renderbuffer format * does not map to an appropriate OpenCL image format. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r29 */ RUNTIME_ENTRY_RET(cl_mem, clCreateFromGLRenderbuffer, (cl_context context, cl_mem_flags flags, GLuint renderbuffer, cl_int* errcode_ret)) { cl_mem clMemObj = NULL; if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter \"context\""); return clMemObj; } if (!(((flags & CL_MEM_READ_ONLY) == CL_MEM_READ_ONLY) || ((flags & CL_MEM_WRITE_ONLY) == CL_MEM_WRITE_ONLY) || ((flags & CL_MEM_READ_WRITE) == CL_MEM_READ_WRITE))) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter \"flags\""); return clMemObj; } return (amd::clCreateFromGLRenderbufferAMD(*as_amd(context), flags, renderbuffer, errcode_ret)); } RUNTIME_EXIT /*! @} * \addtogroup clGetGLObjectInfo * @{ */ /*! \brief Query GL object type from a CL memory object. * * \param memobj [is a valid cl_mem object created from a GL object].
* * \param gl_object_type returns the type of GL object attached to memobj * and can be CL_GL_OBJECT_BUFFER, CL_GL_OBJECT_TEXTURE2D, * CL_GL_OBJECT_TEXTURE_RECTANGLE, CL_GL_OBJECT_TEXTURE3D, or * CL_GL_OBJECT_RENDERBUFFER. If \a gl_object_type is NULL, it is ignored. * * \param gl_object_name returns the GL object name used to create memobj. * If \a gl_object_name is NULL, it is ignored. * * \return One of the following values is returned: * - CL_SUCCESS if the call was executed successfully. * - CL_INVALID_MEM_OBJECT if \a memobj is not a valid OpenCL memory object. * - CL_INVALID_GL_OBJECT if there is no GL object associated with \a memobj. * * \version 1.0r29 */ RUNTIME_ENTRY(cl_int, clGetGLObjectInfo, (cl_mem memobj, cl_gl_object_type* gl_object_type, GLuint* gl_object_name)) { if (!is_valid(memobj)) { LogWarning("\"memobj\" is not a valid cl_mem object"); return CL_INVALID_MEM_OBJECT; } amd::InteropObject* interop = as_amd(memobj)->getInteropObj(); if (NULL == interop) { LogWarning("CL object \"memobj\" is not created from GL object"); return CL_INVALID_GL_OBJECT; } amd::GLObject* glObject = interop->asGLObject(); if (NULL == glObject) { LogWarning("CL object \"memobj\" is not created from GL object"); return CL_INVALID_GL_OBJECT; } cl_int result; cl_gl_object_type clGLType = glObject->getCLGLObjectType(); result = amd::clGetInfo(clGLType, sizeof(cl_gl_object_type), gl_object_type, NULL); GLuint glName = glObject->getGLName(); result |= amd::clGetInfo(glName, sizeof(GLuint), gl_object_name, NULL); return result; } RUNTIME_EXIT /*! @} * \addtogroup clGetGLTextureInfo * @{ */ /*! \brief Query additional information about the GL texture object associated * with \a memobj. * * \param memobj [is a valid cl_mem object created from a GL object]. * * \param param_name specifies what additional information about the GL * texture object associated with \a memobj to query: * - CL_GL_TEXTURE_TARGET (GLenum) to query the \a target argument specified * in clCreateFromGLTexture2D or clCreateFromGLTexture3D calls. * - CL_GL_MIPMAP_LEVEL (GLint) to query the \a miplevel argument specified * in clCreateFromGLTexture2D or clCreateFromGLTexture3D calls. * * \param param_value is a pointer to memory where the appropriate result * being queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size is used to specify the size in bytes of memory * pointed to by \a param_value. This size must be >= size of return type as * described for \a param_name argument (GLenum or GLint). * \a param_value_size_ret returns the actual size in bytes of data copied to * \a param_value. If \a param_value_size_ret is NULL, it is ignored. * * \return One of the following values is returned: * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_MEM_OBJECT if \a memobj is not a valid OpenCL memory object. * - CL_INVALID_GL_OBJECT if there is no GL texture object (2D or 3D texture) * associated with \a memobj. * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes * specified by \a param_value_size is < size of return type required by * \a param_name and \a param_value is not NULL, or if \a param_value and * \a param_value_size_ret are NULL.
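 *
 * A minimal query sketch (\a clImage is a hypothetical cl_mem previously
 * created from a GL texture; not part of this file):
 * \code
 * GLenum target = 0;
 * GLint miplevel = 0;
 * cl_int err = clGetGLTextureInfo(clImage, CL_GL_TEXTURE_TARGET,
 *                                 sizeof(target), &target, NULL);
 * err |= clGetGLTextureInfo(clImage, CL_GL_MIPMAP_LEVEL,
 *                           sizeof(miplevel), &miplevel, NULL);
 * \endcode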
* * \version 1.0r29 */ RUNTIME_ENTRY(cl_int, clGetGLTextureInfo, (cl_mem memobj, cl_gl_texture_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { if (!is_valid(memobj)) { LogWarning("\"memobj\" is not a valid cl_mem object"); return CL_INVALID_MEM_OBJECT; } amd::InteropObject* interop = as_amd(memobj)->getInteropObj(); if (NULL == interop) { LogWarning("CL object \"memobj\" is not created from GL object"); return CL_INVALID_GL_OBJECT; } amd::GLObject* glObject = interop->asGLObject(); if ((NULL == glObject) || (NULL != glObject->asBufferGL())) { LogWarning("CL object \"memobj\" is not created from GL texture"); return CL_INVALID_GL_OBJECT; } switch (param_name) { case CL_GL_TEXTURE_TARGET: { GLenum glTarget = glObject->getGLTarget(); if (glTarget == GL_TEXTURE_CUBE_MAP) { glTarget = glObject->getCubemapFace(); } return amd::clGetInfo(glTarget, param_value_size, param_value, param_value_size_ret); } case CL_GL_MIPMAP_LEVEL: { GLint mipLevel = glObject->getGLMipLevel(); return amd::clGetInfo(mipLevel, param_value_size, param_value, param_value_size_ret); } case CL_GL_NUM_SAMPLES: { GLsizei numSamples = glObject->getNumSamples(); return amd::clGetInfo(numSamples, param_value_size, param_value, param_value_size_ret); } default: LogWarning("Unknown param_name in clGetGLTextureInfoAMD"); break; } return CL_INVALID_VALUE; } RUNTIME_EXIT /*! @} * \addtogroup clEnqueueAcquireExtObjects * @{ */ /*! \brief Acquire OpenCL memory objects that have been created from external * objects (OpenGL, D3D). * * \param command_queue is a valid command-queue. * * \param num_objects is the number of memory objects to be acquired * in \a mem_objects. * * \param mem_objects is a pointer to a list of CL memory objects that refer * to a GL object (buffer/texture/renderbuffer objects or the framebuffer). * * \param event_wait_list specify [is a pointer to] events that need to * complete before this particular command can be executed. * If \a event_wait_list is NULL, then this particular command does not wait * on any event to complete. If \a event_wait_list is NULL, * \a num_events_in_wait_list must be 0. If \a event_wait_list is not NULL, * the list of events pointed to by \a event_wait_list must be valid and * \a num_events_in_wait_list must be greater than 0. The events specified in * \a event_wait_list act as synchronization points. * * \param num_events_in_wait_list specify the number of events in * \a event_wait_list. It must be 0 if \a event_wait_list is NULL. It must be * greater than 0 if \a event_wait_list is not NULL. * * \param event returns an event object that identifies this particular * command and can be used to query or queue a wait for this particular * command to complete. \a event can be NULL in which case it will not be * possible for the application to query the status of this command or queue a * wait for this command to complete. * * \return One of the following values is returned: * - CL_SUCCESS if the function is executed successfully. * - CL_SUCCESS if \a num_objects is 0 and \a mem_objects is NULL; the * function does nothing. * - CL_INVALID_VALUE if \a num_objects is zero and \a mem_objects is not a * NULL value or if \a num_objects > 0 and \a mem_objects is NULL. * - CL_INVALID_MEM_OBJECT if memory objects in \a mem_objects are not valid * OpenCL memory objects. * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. 
* - CL_INVALID_CONTEXT if clContext associated with \a command_queue was not * created from an OpenGL clContext. * - CL_INVALID_GL_OBJECT if memory objects in \a mem_objects have not been * created from a GL object(s). * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host. * * \version 1.0r29 */ RUNTIME_ENTRY(cl_int, clEnqueueAcquireGLObjects, (cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { return amd::clEnqueueAcquireExtObjectsAMD(command_queue, num_objects, mem_objects, num_events_in_wait_list, event_wait_list, event, CL_COMMAND_ACQUIRE_GL_OBJECTS); } RUNTIME_EXIT /*! @} * \addtogroup clEnqueueReleaseGLObjects * @{ */ /*! \brief Release OpenCL memory objects that have been created from OpenGL * objects. * * \param command_queue is a valid command-queue [which is associated with the * OpenCL clContext releasing the OpenGL objects]. * * \param num_objects is the number of memory objects to be released * in \a mem_objects. * * \param mem_objects is a pointer to a list of CL memory objects that refer * to a GL object (buffer/texture/renderbuffer objects or the framebuffer). * * \param event_wait_list specify [is a pointer to] events that need to * complete before this particular command can be executed. * If \a event_wait_list is NULL, then this particular command does not wait * on any event to complete. If \a event_wait_list is NULL, * \a num_events_in_wait_list must be 0. If \a event_wait_list is not NULL, * the list of events pointed to by \a event_wait_list must be valid and * \a num_events_in_wait_list must be greater than 0. The events specified in * \a event_wait_list act as synchronization points. * * \param num_events_in_wait_list specify the number of events in * \a event_wait_list. It must be 0 if \a event_wait_list is NULL. It must be * greater than 0 if \a event_wait_list is not NULL. * * \param event returns an event object that identifies this particular * command and can be used to query or queue a wait for this particular * command to complete. \a event can be NULL in which case it will not be * possible for the application to query the status of this command or queue a * wait for this command to complete. * * \return One of the following values is returned: * - CL_SUCCESS if the function is executed successfully. * - CL_SUCCESS if \a num_objects is 0 and \a mem_objects is NULL; the * function does nothing. * - CL_INVALID_VALUE if \a num_objects is zero and \a mem_objects is not a * NULL value or if \a num_objects > 0 and \a mem_objects is NULL. * - CL_INVALID_MEM_OBJECT if memory objects in \a mem_objects are not valid * OpenCL memory objects. * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if clContext associated with \a command_queue was not * created from an OpenGL clContext. * - CL_INVALID_GL_OBJECT if memory objects in \a mem_objects have not been * created from a GL object(s). 
* - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host. * * \version 1.0r29 */ RUNTIME_ENTRY(cl_int, clEnqueueReleaseGLObjects, (cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { return amd::clEnqueueReleaseExtObjectsAMD(command_queue, num_objects, mem_objects, num_events_in_wait_list, event_wait_list, event, CL_COMMAND_RELEASE_GL_OBJECTS); } RUNTIME_EXIT /*! @} * \addtogroup clCreateEventFromGLsyncKHR * @{ */ /*! \brief Creates an event object linked to an OpenGL sync object. * Completion of such an event object is equivalent to waiting for completion * of the fence command associated with the linked GL sync object. * * \param context is a valid OpenCL context created from an OpenGL context * or share group, using the cl_khr_gl_sharing extension. * * \param sync is the 'name' of a sync object in the GL share group associated * with context. * * \param errcode_ret Returns an appropriate error code as described below. * If errcode_ret is NULL, no error code is returned. * * \return a valid OpenCL event object and errcode_ret is set to CL_SUCCESS * if the event object is created successfully. Otherwise, it returns a NULL * value with one of the following error values returned in errcode_ret: * - CL_INVALID_CONTEXT if context is not a valid context or was not created * from a GL context. * - CL_INVALID_GL_OBJECT if sync is not the name of a sync object in the * GL share group associated with context. * * \version 1.1 */ RUNTIME_ENTRY_RET(cl_event, clCreateEventFromGLsyncKHR, (cl_context context, cl_GLsync clGLsync, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter \"context\""); return nullptr; } // create event of fence sync type amd::ClGlEvent* clglEvent = new amd::ClGlEvent(*as_amd(context)); if (clglEvent == nullptr) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; LogWarning("Memory allocation of clglEvent object failed"); return nullptr; } clglEvent->context().glenv()->glFlush_(); // initially set the status of fence as queued clglEvent->setStatus(CL_SUBMITTED); // store GLsync id of the fence in event in order to associate them together clglEvent->setData(clGLsync); amd::Event* evt = clglEvent; evt->retain(); *not_null(errcode_ret) = CL_SUCCESS; return as_cl(evt); } RUNTIME_EXIT /*! @} * \addtogroup clGetGLContextInfoKHR * @{ */ /*! \brief This function is defined in the CL extension cl_khr_gl_sharing and serves * the purpose of querying the current device and all devices that support * CL-GL interoperability. * * \param properties points to an attribute list, which is an array of * ordered <attribute, value> pairs terminated with zero. If an * attribute is not specified in the list, then its default value * (listed in table 4.attr) is used (it is said to be specified * implicitly). If \a properties is NULL or empty (points to a list * whose first value is zero), all attributes take on their default * values. * * \param param_name may accept one of the following enumerated values: * - CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR 0x2006 * - CL_DEVICES_FOR_GL_CONTEXT_KHR 0x2007.
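 *
 * A minimal query sketch (\a props is a hypothetical zero-terminated
 * properties list built by the application from its GL context handles;
 * not part of this file):
 * \code
 * cl_device_id dev = NULL;
 * cl_int err = clGetGLContextInfoKHR(props, CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR,
 *                                    sizeof(dev), &dev, NULL);
 * \endcode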
* * \param param_value_size is used to specify the size in bytes of memory * pointed to by \a param_value. This size must be >= size of return type as * described for \a param_name argument (a cl_device_id or an array of them). * * \param param_value is a pointer to memory where the appropriate result * being queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size_ret returns the actual size in bytes of data copied to * \a param_value. If \a param_value_size_ret is NULL, it is ignored. * * \return One of the following values is returned: * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_VALUE if \a param_name is not valid, or if the size in bytes * specified by \a param_value_size is < size of return type required by * \a param_name and \a param_value is not NULL. * - CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR if no GL context or share group * is specified in \a properties, or if no GPU device is available. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host. * * \version 1.0r47 */ RUNTIME_ENTRY(cl_int, clGetGLContextInfoKHR, (const cl_context_properties* properties, cl_gl_context_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { cl_int errcode; cl_device_id* gpu_devices; cl_uint num_gpu_devices = 0; amd::Context::Info info; static const bool VALIDATE_ONLY = true; errcode = amd::Context::checkProperties(properties, &info); if (CL_SUCCESS != errcode) { return errcode; } if (!(info.flags_ & amd::Context::GLDeviceKhr)) { // No GL context is specified return CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR; } // Get devices errcode = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 0, NULL, &num_gpu_devices); if (errcode != CL_SUCCESS && errcode != CL_DEVICE_NOT_FOUND) { return CL_INVALID_VALUE; } if (!num_gpu_devices) { return CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR; } switch (param_name) { case CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR: // Return the CL device currently associated with the specified OpenGL context.
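      // Implementation note: each GPU device is probed below with
      // bindExternalDevice() in validate-only mode against the GL context and
      // device handles captured from "properties"; the first device that
      // validates is returned through amd::clGetInfo().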
if (num_gpu_devices) { gpu_devices = (cl_device_id*)alloca(num_gpu_devices * sizeof(cl_device_id)); errcode = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, num_gpu_devices, gpu_devices, NULL); if (errcode != CL_SUCCESS) { return errcode; } for (cl_uint i = 0; i < num_gpu_devices; ++i) { cl_device_id device = gpu_devices[i]; if (is_valid(device) && as_amd(device)->bindExternalDevice(info.flags_, info.hDev_, info.hCtx_, VALIDATE_ONLY)) { return amd::clGetInfo(device, param_value_size, param_value, param_value_size_ret); } } *not_null(param_value_size_ret) = 0; } break; case CL_DEVICES_FOR_GL_CONTEXT_KHR: { // List of all CL devices that can be associated with the specified OpenGL context. cl_uint total_devices = num_gpu_devices; size_t size = total_devices * sizeof(cl_device_id); cl_device_id* devices = (cl_device_id*)alloca(size); errcode = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, total_devices, devices, NULL); if (errcode != CL_SUCCESS) { return errcode; } std::vector<amd::Device*> compatible_devices; for (cl_uint i = 0; i < total_devices; ++i) { cl_device_id device = devices[i]; if (is_valid(device) && as_amd(device)->bindExternalDevice(info.flags_, info.hDev_, info.hCtx_, VALIDATE_ONLY)) { compatible_devices.push_back(as_amd(device)); } } size_t deviceCount = compatible_devices.size(); size_t deviceCountSize = deviceCount * sizeof(cl_device_id); if (param_value != NULL && param_value_size < deviceCountSize) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = deviceCountSize; if (param_value != NULL) { cl_device_id* deviceList = (cl_device_id*)param_value; for (const auto& it : compatible_devices) { *deviceList++ = as_cl(it); } } return CL_SUCCESS; } break; default: LogWarning("\"param_name\" is not valid"); return CL_INVALID_VALUE; } return CL_SUCCESS; } RUNTIME_EXIT // // // namespace amd // // namespace amd { typedef struct { GLenum glBinding; GLenum glTarget; } TargetBindings_t; /*! @} * \addtogroup CL-GL interop helper functions * @{ */ //!
Function clearGLErrors() to clear all GL error bits, if any void clearGLErrors(const Context& amdContext) { GLenum glErr, glLastErr = GL_NO_ERROR; while (1) { glErr = amdContext.glenv()->glGetError_(); if (glErr == GL_NO_ERROR || glErr == glLastErr) { break; } glLastErr = glErr; LogWarning("GL error"); } } GLenum checkForGLError(const Context& amdContext) { GLenum glRetErr = GL_NO_ERROR; GLenum glErr; while (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { glRetErr = glErr; // Just return the last GL error LogWarning("Check GL error"); } return glRetErr; } static GLenum clChannelDataTypeToGlType(cl_channel_type channel_type) { // Pick // GL_BYTE, GL_UNSIGNED_BYTE, GL_SHORT, GL_UNSIGNED_SHORT, GL_INT, // GL_UNSIGNED_INT, GL_FLOAT, GL_2_BYTES, GL_3_BYTES, GL_4_BYTES // or GL_DOUBLE switch (channel_type) { case CL_SNORM_INT8: return GL_BYTE; case CL_SNORM_INT16: return GL_SHORT; case CL_UNORM_INT8: return GL_UNSIGNED_BYTE; case CL_UNORM_INT16: return GL_UNSIGNED_SHORT; case CL_SIGNED_INT8: return GL_BYTE; case CL_SIGNED_INT16: return GL_SHORT; case CL_SIGNED_INT32: return GL_INT; case CL_UNSIGNED_INT8: return GL_UNSIGNED_BYTE; case CL_UNSIGNED_INT16: return GL_UNSIGNED_SHORT; case CL_UNSIGNED_INT32: return GL_UNSIGNED_INT; case CL_FLOAT: return GL_FLOAT; case CL_UNORM_INT_101010: return GL_UNSIGNED_INT_10_10_10_2; case CL_HALF_FLOAT: case CL_UNORM_SHORT_565: case CL_UNORM_SHORT_555: default: guarantee(false, "Unexpected CL type."); return 0; } } static GLenum glInternalFormatToGlFormat(GLenum internalFormat) { switch (internalFormat) { // Base internal formats case GL_RGBA: case GL_BGRA: return internalFormat; // Sized internal formats case GL_RGBA8: case GL_RGBA16: case GL_RGBA16F: case GL_RGBA32F: return GL_RGBA; case GL_RGBA8I: case GL_RGBA8UI: case GL_RGBA16I: case GL_RGBA16UI: case GL_RGBA32I: case GL_RGBA32UI: return GL_RGBA_INTEGER; default: guarantee(false, "Unexpected GL internal format."); return 0; } } //******************************************************************* // // Internal implementation of CL API functions // //******************************************************************* // // clCreateFromGLBufferAMD // cl_mem clCreateFromGLBufferAMD(Context& amdContext, cl_mem_flags flags, GLuint bufobj, cl_int* errcode_ret) { BufferGL* pBufferGL = NULL; GLenum glErr; GLenum glTarget = GL_ARRAY_BUFFER; GLint gliSize = 0; GLint gliMapped = 0; // Verify context init'ed for interop if (!amdContext.glenv() || !amdContext.glenv()->isAssociated()) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("\"amdContext\" is not created from GL context or share list"); return (cl_mem)0; } // Add this scope to bound the scoped lock { GLFunctions::SetIntEnv ie(amdContext.glenv()); if (!ie.isValid()) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("\"amdContext\" is not created from GL context or share list"); return as_cl(0); } // Verify GL buffer object clearGLErrors(amdContext); if ((GL_FALSE == amdContext.glenv()->glIsBuffer_(bufobj)) || (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_()))) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("\"bufobj\" is not a GL buffer object"); return (cl_mem)0; } // It seems that CL spec is not concerned with GL_BUFFER_USAGE, so skip it // Check if size is available - data store is created amdContext.glenv()->glBindBuffer_(glTarget, bufobj); clearGLErrors(amdContext); amdContext.glenv()->glGetBufferParameteriv_(glTarget, GL_BUFFER_SIZE, &gliSize); if (GL_NO_ERROR != (glErr = 
amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("cannot get the GL buffer size"); return (cl_mem)0; } if (gliSize == 0) { //@todo - check why sometime the size is zero *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("the GL buffer's data store is not created"); return (cl_mem)0; } // Mapping will be done at acquire time (sync point) } // Release scoped lock // Now create BufferGL object pBufferGL = new (amdContext) BufferGL(amdContext, flags, gliSize, 0, bufobj); if (!pBufferGL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; LogWarning("cannot create object of class BufferGL"); return (cl_mem)0; } if (!pBufferGL->create()) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; pBufferGL->release(); return (cl_mem)0; } *not_null(errcode_ret) = CL_SUCCESS; // Create interop object if (pBufferGL->getInteropObj() == NULL) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("cannot create object of class BufferGL"); return (cl_mem)0; } // Fixme: If more than one device is present in the context, we choose the first device. // We should come up with a more elegant solution to handle this. assert(amdContext.devices().size() == 1); const auto it = amdContext.devices().cbegin(); const amd::Device& dev = *(*it); device::Memory* mem = pBufferGL->getDeviceMemory(dev); if (NULL == mem) { LogPrintfError("Can't allocate memory size - 0x%08X bytes!", pBufferGL->getSize()); *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; return (cl_mem)0; } mem->processGLResource(device::Memory::GLDecompressResource); return as_cl(pBufferGL); } cl_mem clCreateFromGLTextureAMD(Context& amdContext, cl_mem_flags clFlags, GLenum target, GLint miplevel, GLuint texture, int* errcode_ret) { ImageGL* pImageGL = NULL; GLenum glErr; GLenum glTarget = 0; GLenum glInternalFormat; cl_image_format clImageFormat; uint dim = 1; cl_mem_object_type clType; cl_gl_object_type clGLType; GLsizei numSamples = 1; GLint gliTexMaxLevel; bool wholeMipmap = false; // Verify context init'ed for interop if (!amdContext.glenv() || !amdContext.glenv()->isAssociated()) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("\"amdContext\" is not created from GL context or share list"); return static_cast<cl_mem>(0); } GLint gliTexWidth = 1; GLint gliTexHeight = 1; GLint gliTexDepth = 1; // Add this scope to bound the scoped lock { GLFunctions::SetIntEnv ie(amdContext.glenv()); if (!ie.isValid()) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("\"amdContext\" is not created from GL context or share list"); return as_cl(0); } // Verify GL texture object clearGLErrors(amdContext); if ((GL_FALSE == amdContext.glenv()->glIsTexture_(texture)) || (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_()))) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("\"texture\" is not a GL texture object"); return static_cast<cl_mem>(0); } bool image = true; // Check target value validity switch (target) { case GL_TEXTURE_BUFFER: glTarget = GL_TEXTURE_BUFFER; dim = 1; clType = CL_MEM_OBJECT_IMAGE1D_BUFFER; clGLType = CL_GL_OBJECT_TEXTURE_BUFFER; image = false; break; case GL_TEXTURE_1D: glTarget = GL_TEXTURE_1D; dim = 1; clType = CL_MEM_OBJECT_IMAGE1D; clGLType = CL_GL_OBJECT_TEXTURE1D; break; case GL_TEXTURE_CUBE_MAP_POSITIVE_X: case GL_TEXTURE_CUBE_MAP_NEGATIVE_X: case GL_TEXTURE_CUBE_MAP_POSITIVE_Y: case GL_TEXTURE_CUBE_MAP_NEGATIVE_Y: case GL_TEXTURE_CUBE_MAP_POSITIVE_Z: case GL_TEXTURE_CUBE_MAP_NEGATIVE_Z: glTarget = GL_TEXTURE_CUBE_MAP; dim = 2; clType = CL_MEM_OBJECT_IMAGE2D; clGLType
= CL_GL_OBJECT_TEXTURE2D; break; case GL_TEXTURE_1D_ARRAY: glTarget = GL_TEXTURE_1D_ARRAY; dim = 2; clType = CL_MEM_OBJECT_IMAGE1D_ARRAY; clGLType = CL_GL_OBJECT_TEXTURE1D_ARRAY; break; case GL_TEXTURE_2D: glTarget = GL_TEXTURE_2D; dim = 2; clType = CL_MEM_OBJECT_IMAGE2D; clGLType = CL_GL_OBJECT_TEXTURE2D; break; case GL_TEXTURE_2D_MULTISAMPLE: glTarget = GL_TEXTURE_2D_MULTISAMPLE; dim = 2; clType = CL_MEM_OBJECT_IMAGE2D; clGLType = CL_GL_OBJECT_TEXTURE2D; break; case GL_TEXTURE_RECTANGLE_ARB: glTarget = GL_TEXTURE_RECTANGLE_ARB; dim = 2; clType = CL_MEM_OBJECT_IMAGE2D; clGLType = CL_GL_OBJECT_TEXTURE2D; break; case GL_TEXTURE_2D_ARRAY: glTarget = GL_TEXTURE_2D_ARRAY; dim = 3; clType = CL_MEM_OBJECT_IMAGE2D_ARRAY; clGLType = CL_GL_OBJECT_TEXTURE2D_ARRAY; break; case GL_TEXTURE_3D: glTarget = GL_TEXTURE_3D; dim = 3; clType = CL_MEM_OBJECT_IMAGE3D; clGLType = CL_GL_OBJECT_TEXTURE3D; break; default: // wrong value *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid \"target\" value"); return static_cast<cl_mem>(0); break; } amdContext.glenv()->glBindTexture_(glTarget, texture); // Check if size is available - data store is created if (image) { // Check mipmap level for "texture" name GLint gliTexBaseLevel; clearGLErrors(amdContext); amdContext.glenv()->glGetTexParameteriv_(glTarget, GL_TEXTURE_BASE_LEVEL, &gliTexBaseLevel); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_MIP_LEVEL; LogWarning("Cannot get base mipmap level of a GL \"texture\" object"); return static_cast<cl_mem>(0); } clearGLErrors(amdContext); amdContext.glenv()->glGetTexParameteriv_(glTarget, GL_TEXTURE_MAX_LEVEL, &gliTexMaxLevel); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_MIP_LEVEL; LogWarning("Cannot get max mipmap level of a GL \"texture\" object"); return static_cast<cl_mem>(0); } wholeMipmap = miplevel < 0; miplevel = wholeMipmap ?
gliTexBaseLevel : miplevel; if ((gliTexBaseLevel > miplevel) || (miplevel > gliTexMaxLevel)) { *not_null(errcode_ret) = CL_INVALID_MIP_LEVEL; LogWarning("\"miplevel\" is not a valid mipmap level of the GL \"texture\" object"); return static_cast<cl_mem>(0); } // Get GL texture format and check if it's compatible with CL format clearGLErrors(amdContext); amdContext.glenv()->glGetTexLevelParameteriv_(target, miplevel, GL_TEXTURE_INTERNAL_FORMAT, (GLint*)&glInternalFormat); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("Cannot get internal format of \"miplevel\" of GL \"texture\" object"); return static_cast<cl_mem>(0); } amdContext.glenv()->glGetTexLevelParameteriv_(target, miplevel, GL_TEXTURE_SAMPLES, (GLint*)&numSamples); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("Cannot get numbers of samples of GL \"texture\" object"); return static_cast<cl_mem>(0); } if (numSamples > 1) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("MSAA \"texture\" object is not supported for the device"); return static_cast<cl_mem>(0); } // Now get CL format from GL format and bytes per pixel int iBytesPerPixel = 0; if (!getCLFormatFromGL(amdContext, glInternalFormat, &clImageFormat, &iBytesPerPixel, clFlags)) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("\"texture\" format does not map to an appropriate CL image format"); return static_cast<cl_mem>(0); } switch (dim) { case 3: clearGLErrors(amdContext); amdContext.glenv()->glGetTexLevelParameteriv_(target, miplevel, GL_TEXTURE_DEPTH, &gliTexDepth); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("Cannot get the depth of \"miplevel\" of GL \"texture\""); return static_cast<cl_mem>(0); } // Fall through to process other dimensions... case 2: clearGLErrors(amdContext); amdContext.glenv()->glGetTexLevelParameteriv_(target, miplevel, GL_TEXTURE_HEIGHT, &gliTexHeight); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("Cannot get the height of \"miplevel\" of GL \"texture\""); return static_cast<cl_mem>(0); } // Fall through to process other dimensions...
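            // The dimension cascade ends at case 1 below: 3D images query
            // depth, height and width; 2D images query height and width;
            // 1D images query width only.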
case 1: clearGLErrors(amdContext); amdContext.glenv()->glGetTexLevelParameteriv_(target, miplevel, GL_TEXTURE_WIDTH, &gliTexWidth); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("Cannot get the width of \"miplevel\" of GL \"texture\""); return static_cast<cl_mem>(0); } break; default: *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid \"target\" value"); return static_cast<cl_mem>(0); } } else { GLint size; // In case target is GL_TEXTURE_BUFFER GLint backingBuffer; clearGLErrors(amdContext); amdContext.glenv()->glGetTexLevelParameteriv_( glTarget, 0, GL_TEXTURE_BUFFER_DATA_STORE_BINDING, &backingBuffer); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("Cannot get backing buffer for GL \"texture buffer\" object"); return static_cast<cl_mem>(0); } amdContext.glenv()->glBindBuffer_(glTarget, backingBuffer); // Get GL texture format and check if it's compatible with CL format clearGLErrors(amdContext); amdContext.glenv()->glGetIntegerv_(GL_TEXTURE_BUFFER_FORMAT_EXT, reinterpret_cast<GLint*>(&glInternalFormat)); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("Cannot get internal format of \"miplevel\" of GL \"texture\" object"); return static_cast<cl_mem>(0); } // Now get CL format from GL format and bytes per pixel int iBytesPerPixel = 0; if (!getCLFormatFromGL(amdContext, glInternalFormat, &clImageFormat, &iBytesPerPixel, clFlags)) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("\"texture\" format does not map to an appropriate CL image format"); return static_cast<cl_mem>(0); } clearGLErrors(amdContext); amdContext.glenv()->glGetBufferParameteriv_(glTarget, GL_BUFFER_SIZE, &size); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("Cannot get internal format of \"miplevel\" of GL \"texture\" object"); return static_cast<cl_mem>(0); } gliTexWidth = size / iBytesPerPixel; } size_t imageSize = (clType == CL_MEM_OBJECT_IMAGE1D_ARRAY) ? static_cast<size_t>(gliTexHeight) : static_cast<size_t>(gliTexDepth); if (!amd::Image::validateDimensions( amdContext.devices(), clType, static_cast<size_t>(gliTexWidth), static_cast<size_t>(gliTexHeight), static_cast<size_t>(gliTexDepth), imageSize)) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("The GL \"texture\" data store is not created or out of supported dimensions"); return static_cast<cl_mem>(0); } // PBO and mapping will be done at "acquire" time (sync point) } // Release scoped lock target = (glTarget == GL_TEXTURE_CUBE_MAP) ?
target : 0; if (wholeMipmap) { pImageGL = new (amdContext) ImageGL(amdContext, clType, clFlags, clImageFormat, static_cast<size_t>(gliTexWidth), static_cast<size_t>(gliTexHeight), static_cast<size_t>(gliTexDepth), glTarget, texture, miplevel, glInternalFormat, clGLType, numSamples, gliTexMaxLevel, target); } else { pImageGL = new (amdContext) ImageGL(amdContext, clType, clFlags, clImageFormat, static_cast<size_t>(gliTexWidth), static_cast<size_t>(gliTexHeight), static_cast<size_t>(gliTexDepth), glTarget, texture, miplevel, glInternalFormat, clGLType, numSamples, target); } if (!pImageGL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; LogWarning("Cannot create class ImageGL - out of memory?"); return static_cast<cl_mem>(0); } if (!pImageGL->create()) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; pImageGL->release(); return static_cast<cl_mem>(0); } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(pImageGL); } // // clCreateFromGLRenderbufferAMD // cl_mem clCreateFromGLRenderbufferAMD(Context& amdContext, cl_mem_flags clFlags, GLuint renderbuffer, int* errcode_ret) { ImageGL* pImageGL = NULL; GLenum glErr; GLenum glTarget = GL_RENDERBUFFER; GLenum glInternalFormat; cl_image_format clImageFormat; // Verify context init'ed for interop if (!amdContext.glenv() || !amdContext.glenv()->isAssociated()) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("\"amdContext\" is not created from GL context or share list"); return (cl_mem)0; } GLint gliRbWidth; GLint gliRbHeight; // Add this scope to bound the scoped lock { GLFunctions::SetIntEnv ie(amdContext.glenv()); if (!ie.isValid()) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("\"amdContext\" is not created from GL context or share list"); return as_cl(0); } // Verify GL renderbuffer object clearGLErrors(amdContext); if ((GL_FALSE == amdContext.glenv()->glIsRenderbufferEXT_(renderbuffer)) || (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_()))) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("\"renderbuffer\" is not a GL renderbuffer object"); return (cl_mem)0; } amdContext.glenv()->glBindRenderbuffer_(glTarget, renderbuffer); // Get GL RB format and check if it's compatible with CL format clearGLErrors(amdContext); amdContext.glenv()->glGetRenderbufferParameterivEXT_(glTarget, GL_RENDERBUFFER_INTERNAL_FORMAT, (GLint*)&glInternalFormat); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("Cannot get internal format of GL \"renderbuffer\" object"); return (cl_mem)0; } // Now get CL format from GL format and bytes per pixel int iBytesPerPixel = 0; if (!getCLFormatFromGL(amdContext, glInternalFormat, &clImageFormat, &iBytesPerPixel, clFlags)) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("\"renderbuffer\" format does not map to an appropriate CL image format"); return (cl_mem)0; } // Check if size is available - data store is created clearGLErrors(amdContext); amdContext.glenv()->glGetRenderbufferParameterivEXT_(glTarget, GL_RENDERBUFFER_WIDTH, &gliRbWidth); if (GL_NO_ERROR != (glErr = amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("Cannot get the width of GL \"renderbuffer\""); return (cl_mem)0; } if (gliRbWidth == 0) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("The GL \"renderbuffer\" data store is not created"); return (cl_mem)0; } clearGLErrors(amdContext); amdContext.glenv()->glGetRenderbufferParameterivEXT_(glTarget, GL_RENDERBUFFER_HEIGHT, &gliRbHeight); if (GL_NO_ERROR != (glErr
= amdContext.glenv()->glGetError_())) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("Cannot get the height of GL \"renderbuffer\""); return (cl_mem)0; } if (gliRbHeight == 0) { *not_null(errcode_ret) = CL_INVALID_GL_OBJECT; LogWarning("The GL \"renderbuffer\" data store is not created"); return (cl_mem)0; } // PBO and mapping will be done at "acquire" time (sync point) } // Release scoped lock pImageGL = new (amdContext) ImageGL(amdContext, CL_MEM_OBJECT_IMAGE2D, clFlags, clImageFormat, (size_t)gliRbWidth, (size_t)gliRbHeight, 1, glTarget, renderbuffer, 0, glInternalFormat, CL_GL_OBJECT_RENDERBUFFER, 0); if (!pImageGL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; LogWarning("Cannot create class ImageGL from renderbuffer - out of memory?"); return (cl_mem)0; } if (!pImageGL->create()) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; pImageGL->release(); return (cl_mem)0; } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(pImageGL); } // // clEnqueueAcquireExtObjectsAMD // static cl_int clSetInteropObjects(cl_uint num_objects, const cl_mem* mem_objects, std::vector<amd::Memory*>& interopObjects) { if ((num_objects == 0 && mem_objects != NULL) || (num_objects != 0 && mem_objects == NULL)) { return CL_INVALID_VALUE; } while (num_objects-- > 0) { cl_mem obj = *mem_objects++; if (!is_valid(obj)) { return CL_INVALID_MEM_OBJECT; } amd::Memory* mem = as_amd(obj); if (mem->getInteropObj() == NULL) { return CL_INVALID_GL_OBJECT; } interopObjects.push_back(mem); } return CL_SUCCESS; } cl_int clEnqueueAcquireExtObjectsAMD(cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event, cl_command_type cmd_type) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (cmd_type == CL_COMMAND_ACQUIRE_GL_OBJECTS) { GLFunctions* gl_functions = hostQueue.context().glenv(); // Verify context init'ed for interop if (!gl_functions || !gl_functions->isAssociated()) { LogWarning("\"amdContext\" is not created from GL context or share list"); return CL_INVALID_CONTEXT; } // If the cl_khr_gl_event extension is supported, then the OpenCL implementation will ensure // that any such pending OpenGL operations are complete for an OpenGL context bound // to the same thread as the OpenCL context. if (hostQueue.device().settings().checkExtension(ClKhrGlEvent)) { gl_functions->WaitCurrentGlContext(hostQueue.context().info()); } } std::vector<amd::Memory*> memObjects; cl_int err = clSetInteropObjects(num_objects, mem_objects, memObjects); if (err != CL_SUCCESS) { return err; } amd::Command::EventWaitList eventWaitList; err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } #ifdef _WIN32 if ((hostQueue.context().info().flags_ & amd::Context::InteropUserSync) == 0) { //! Make sure D3D10 queues are flushed and all commands are finished //! before CL side would access interop objects if (cmd_type == CL_COMMAND_ACQUIRE_D3D10_OBJECTS_KHR) { amd::SyncD3D10Objects(memObjects); } //! Make sure D3D11 queues are flushed and all commands are finished //! before CL side would access interop objects if (cmd_type == CL_COMMAND_ACQUIRE_D3D11_OBJECTS_KHR) { amd::SyncD3D11Objects(memObjects); } //! Make sure D3D9 queues are flushed and all commands are finished //!
before CL side would access interop objects if (cmd_type == CL_COMMAND_ACQUIRE_DX9_MEDIA_SURFACES_KHR) { amd::SyncD3D9Objects(memObjects); } } #endif //_WIN32 //! Now create command and enqueue amd::AcquireExtObjectsCommand* command = new amd::AcquireExtObjectsCommand( hostQueue, eventWaitList, num_objects, memObjects, cmd_type); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } // // clEnqueueReleaseExtObjectsAMD // cl_int clEnqueueReleaseExtObjectsAMD(cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event, cl_command_type cmd_type) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; std::vector<amd::Memory*> memObjects; cl_int err = clSetInteropObjects(num_objects, mem_objects, memObjects); if (err != CL_SUCCESS) { return err; } amd::Command::EventWaitList eventWaitList; err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } //! Now create command and enqueue amd::ReleaseExtObjectsCommand* command = new amd::ReleaseExtObjectsCommand( hostQueue, eventWaitList, num_objects, memObjects, cmd_type); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); #ifdef _WIN32 if ((hostQueue.context().info().flags_ & amd::Context::InteropUserSync) == 0) { //! Make sure CL command queue is flushed and all commands are finished //! before D3D10 side would access interop resources if (cmd_type == CL_COMMAND_RELEASE_DX9_MEDIA_SURFACES_KHR || cmd_type == CL_COMMAND_RELEASE_D3D10_OBJECTS_KHR || cmd_type == CL_COMMAND_RELEASE_D3D11_OBJECTS_KHR) { command->awaitCompletion(); } } #endif //_WIN32 // If the cl_khr_gl_event extension is supported, then the OpenCL implementation will ensure // that any pending OpenCL operations are complete for an OpenGL context bound // to the same thread as the OpenCL context. if (cmd_type == CL_COMMAND_RELEASE_GL_OBJECTS) { GLFunctions* gl_functions = hostQueue.context().glenv(); // Verify context init'ed for interop if (!gl_functions || !gl_functions->isAssociated()) { LogWarning("\"amdContext\" is not created from GL context or share list"); return CL_INVALID_CONTEXT; } if (hostQueue.device().settings().checkExtension(ClKhrGlEvent) && gl_functions->IsCurrentGlContext(hostQueue.context().info())) { command->awaitCompletion(); } } *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } } // namespace amd clr-rocm-5.7.1/opencl/amdocl/cl_icd.cpp000066400000000000000000000305461450307266000177150ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "cl_common.hpp" #include "vdi_common.hpp" #ifdef _WIN32 #include <d3d10_1.h> #include "cl_d3d9_amd.hpp" #include "cl_d3d10_amd.hpp" #include "cl_d3d11_amd.hpp" #endif //_WIN32 #include <mutex> amd::PlatformIDS amd::PlatformID::Platform = //{ NULL }; {amd::ICDDispatchedObject::icdVendorDispatch_}; static cl_int CL_API_CALL icdGetPlatformInfo(cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret) { if (platform != reinterpret_cast<cl_platform_id>(&amd::PlatformID::Platform)) { return CL_INVALID_PLATFORM; } return clGetPlatformInfo(NULL, param_name, param_value_size, param_value, param_value_size_ret); } static cl_int CL_API_CALL icdGetDeviceIDs(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id* devices, cl_uint* num_devices) { return clGetDeviceIDs(NULL, device_type, num_entries, devices, num_devices); } static cl_int CL_API_CALL icdGetDeviceInfo(cl_device_id device, cl_device_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret) { if (param_name == CL_DEVICE_PLATFORM) { // Return the ICD platform instead of the default NULL platform.
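    // Applications compare the CL_DEVICE_PLATFORM value against the ids
    // returned by clGetPlatformIDs(), so the vendor's ICD platform id
    // (rather than the internal NULL platform) must be reported here.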
cl_platform_id platform = reinterpret_cast<cl_platform_id>(&amd::PlatformID::Platform); return amd::clGetInfo(platform, param_value_size, param_value, param_value_size_ret); } return clGetDeviceInfo(device, param_name, param_value_size, param_value, param_value_size_ret); } cl_icd_dispatch amd::ICDDispatchedObject::icdVendorDispatch_[] = { {NULL /* should not get called */, icdGetPlatformInfo, icdGetDeviceIDs, icdGetDeviceInfo, clCreateContext, clCreateContextFromType, clRetainContext, clReleaseContext, clGetContextInfo, clCreateCommandQueue, clRetainCommandQueue, clReleaseCommandQueue, clGetCommandQueueInfo, clSetCommandQueueProperty, clCreateBuffer, clCreateImage2D, clCreateImage3D, clRetainMemObject, clReleaseMemObject, clGetSupportedImageFormats, clGetMemObjectInfo, clGetImageInfo, clCreateSampler, clRetainSampler, clReleaseSampler, clGetSamplerInfo, clCreateProgramWithSource, clCreateProgramWithBinary, clRetainProgram, clReleaseProgram, clBuildProgram, clUnloadCompiler, clGetProgramInfo, clGetProgramBuildInfo, clCreateKernel, clCreateKernelsInProgram, clRetainKernel, clReleaseKernel, clSetKernelArg, clGetKernelInfo, clGetKernelWorkGroupInfo, clWaitForEvents, clGetEventInfo, clRetainEvent, clReleaseEvent, clGetEventProfilingInfo, clFlush, clFinish, clEnqueueReadBuffer, clEnqueueWriteBuffer, clEnqueueCopyBuffer, clEnqueueReadImage, clEnqueueWriteImage, clEnqueueCopyImage, clEnqueueCopyImageToBuffer, clEnqueueCopyBufferToImage, clEnqueueMapBuffer, clEnqueueMapImage, clEnqueueUnmapMemObject, clEnqueueNDRangeKernel, clEnqueueTask, clEnqueueNativeKernel, clEnqueueMarker, clEnqueueWaitForEvents, clEnqueueBarrier, clGetExtensionFunctionAddress, clCreateFromGLBuffer, clCreateFromGLTexture2D, clCreateFromGLTexture3D, clCreateFromGLRenderbuffer, clGetGLObjectInfo, clGetGLTextureInfo, clEnqueueAcquireGLObjects, clEnqueueReleaseGLObjects, clGetGLContextInfoKHR, WINDOWS_SWITCH(clGetDeviceIDsFromD3D10KHR, NULL), WINDOWS_SWITCH(clCreateFromD3D10BufferKHR, NULL), WINDOWS_SWITCH(clCreateFromD3D10Texture2DKHR, NULL), WINDOWS_SWITCH(clCreateFromD3D10Texture3DKHR, NULL), WINDOWS_SWITCH(clEnqueueAcquireD3D10ObjectsKHR, NULL), WINDOWS_SWITCH(clEnqueueReleaseD3D10ObjectsKHR, NULL), clSetEventCallback, clCreateSubBuffer, clSetMemObjectDestructorCallback, clCreateUserEvent, clSetUserEventStatus, clEnqueueReadBufferRect, clEnqueueWriteBufferRect, clEnqueueCopyBufferRect, NULL, NULL, NULL, clCreateEventFromGLsyncKHR, /* OpenCL 1.2*/ clCreateSubDevices, clRetainDevice, clReleaseDevice, clCreateImage, clCreateProgramWithBuiltInKernels, clCompileProgram, clLinkProgram, clUnloadPlatformCompiler, clGetKernelArgInfo, clEnqueueFillBuffer, clEnqueueFillImage, clEnqueueMigrateMemObjects, clEnqueueMarkerWithWaitList, clEnqueueBarrierWithWaitList, clGetExtensionFunctionAddressForPlatform, clCreateFromGLTexture, WINDOWS_SWITCH(clGetDeviceIDsFromD3D11KHR, NULL), WINDOWS_SWITCH(clCreateFromD3D11BufferKHR, NULL), WINDOWS_SWITCH(clCreateFromD3D11Texture2DKHR, NULL), WINDOWS_SWITCH(clCreateFromD3D11Texture3DKHR, NULL), WINDOWS_SWITCH(clCreateFromDX9MediaSurfaceKHR, NULL), WINDOWS_SWITCH(clEnqueueAcquireD3D11ObjectsKHR, NULL), WINDOWS_SWITCH(clEnqueueReleaseD3D11ObjectsKHR, NULL), WINDOWS_SWITCH(clGetDeviceIDsFromDX9MediaAdapterKHR, NULL), // KHRpfn_clGetDeviceIDsFromDX9MediaAdapterKHR // clGetDeviceIDsFromDX9MediaAdapterKHR; WINDOWS_SWITCH( clEnqueueAcquireDX9MediaSurfacesKHR, NULL), // KHRpfn_clEnqueueAcquireDX9MediaSurfacesKHR clEnqueueAcquireDX9MediaSurfacesKHR; WINDOWS_SWITCH( clEnqueueReleaseDX9MediaSurfacesKHR, NULL), //
KHRpfn_clEnqueueReleaseDX9MediaSurfacesKHR clEnqueueReleaseDX9MediaSurfacesKHR; NULL, NULL, NULL, NULL, clCreateCommandQueueWithProperties, clCreatePipe, clGetPipeInfo, clSVMAlloc, clSVMFree, clEnqueueSVMFree, clEnqueueSVMMemcpy, clEnqueueSVMMemFill, clEnqueueSVMMap, clEnqueueSVMUnmap, clCreateSamplerWithProperties, clSetKernelArgSVMPointer, clSetKernelExecInfo, clGetKernelSubGroupInfo, clCloneKernel, clCreateProgramWithIL, clEnqueueSVMMigrateMem, clGetDeviceAndHostTimer, clGetHostTimer, clGetKernelSubGroupInfo, clSetDefaultDeviceCommandQueue, clSetProgramReleaseCallback, clSetProgramSpecializationConstant }}; #if defined(_WIN32) #include <shlwapi.h> #pragma comment(lib, "shlwapi.lib") static bool ShouldLoadPlatform() { // Get the OpenCL ICD registry values HKEY platformsKey = NULL; if (RegOpenKeyExA(HKEY_LOCAL_MACHINE, "SOFTWARE\\Khronos\\OpenCL\\Vendors", 0, KEY_READ, &platformsKey) != ERROR_SUCCESS) return true; std::vector<std::string> registryValues; DWORD dwIndex = 0; while (true) { char cszLibraryName[1024] = {0}; DWORD dwLibraryNameSize = sizeof(cszLibraryName); DWORD dwLibraryNameType = 0; DWORD dwValue = 0; DWORD dwValueSize = sizeof(dwValue); if (RegEnumValueA(platformsKey, dwIndex++, cszLibraryName, &dwLibraryNameSize, NULL, &dwLibraryNameType, (LPBYTE)&dwValue, &dwValueSize) != ERROR_SUCCESS) break; // Require that the value be a DWORD and equal zero if (dwLibraryNameType != REG_DWORD || dwValue != 0) { continue; } registryValues.push_back(cszLibraryName); } RegCloseKey(platformsKey); HMODULE hm = NULL; if (!GetModuleHandleExA( GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS | GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, (LPCSTR)&ShouldLoadPlatform, &hm)) return true; char cszDllPath[1024] = {0}; if (!GetModuleFileNameA(hm, cszDllPath, sizeof(cszDllPath))) return true; // If we are loaded from the DriverStore, then there should be a registry // value matching our current module absolute path. if (std::find(registryValues.begin(), registryValues.end(), cszDllPath) == registryValues.end()) return true; LPSTR cszFileName; char buffer[1024] = {0}; if (!GetFullPathNameA(cszDllPath, sizeof(buffer), buffer, &cszFileName)) return true; // We found an absolute path in the registry that matched this DLL, now // check if there is also an entry with the same filename. if (std::find(registryValues.begin(), registryValues.end(), cszFileName) == registryValues.end()) return true; // Lastly, check if there is a DLL with the same name in the System folder. char cszSystemPath[1024] = {0}; #if defined(ATI_BITS_32) if (!GetSystemWow64DirectoryA(cszSystemPath, sizeof(cszSystemPath))) #endif // defined(ATI_BITS_32) if (!GetSystemDirectoryA(cszSystemPath, sizeof(cszSystemPath))) return true; std::string systemDllPath; systemDllPath.append(cszSystemPath).append("\\").append(cszFileName); if (!PathFileExistsA(systemDllPath.c_str())) { return true; } // If we get here, then all 3 conditions are true: // - An entry in the registry with an absolute path matches the current DLL // - An entry in the registry with a relative path matches the current DLL // - A DLL with the same name was found in the system directory // // We should not load this platform! return false; } #else #include <dlfcn.h> // If there is only one platform, load it.
// If there is more than one platform, only load platforms that have visible devices // If all platforms have no devices available, only load the PAL platform static bool ShouldLoadPlatform() { bool shouldLoad = true; if (!amd::Runtime::initialized()) { amd::Runtime::init(); } const int numDevices = amd::Device::numDevices(CL_DEVICE_TYPE_GPU, false); void *otherPlatform = nullptr; if (amd::IS_LEGACY) { otherPlatform = dlopen("libamdocl64.so", RTLD_LAZY); if (otherPlatform != nullptr) { // Present platform exists shouldLoad = numDevices > 0; } } else { otherPlatform = dlopen("libamdocl-orca64.so", RTLD_LAZY); if (otherPlatform != nullptr) { // Legacy platform exists // gcc4.8 doesn't support casting void* to a function pointer // Work around this by creating a typedef until we upgrade the compiler typedef void*(*clGetFunctionAddress_t)(const char *); typedef cl_int(*clIcdGetPlatformIDs_t)(cl_uint, cl_platform_id *, cl_uint *); clGetFunctionAddress_t legacyGetFunctionAddress = reinterpret_cast<clGetFunctionAddress_t>(dlsym(otherPlatform, "clGetExtensionFunctionAddress")); clIcdGetPlatformIDs_t legacyGetPlatformIDs = reinterpret_cast<clIcdGetPlatformIDs_t>(legacyGetFunctionAddress("clIcdGetPlatformIDsKHR")); cl_uint numLegacyPlatforms = 0; legacyGetPlatformIDs(0, nullptr, &numLegacyPlatforms); shouldLoad = (numDevices > 0) || (numLegacyPlatforms == 0); } } if (otherPlatform != nullptr) { dlclose(otherPlatform); } return shouldLoad; } #endif // defined(_WIN32) CL_API_ENTRY cl_int CL_API_CALL clIcdGetPlatformIDsKHR(cl_uint num_entries, cl_platform_id* platforms, cl_uint* num_platforms) { if (((num_entries > 0 || num_platforms == NULL) && platforms == NULL) || (num_entries == 0 && platforms != NULL)) { return CL_INVALID_VALUE; } static bool shouldLoad = true; static std::once_flag initOnce; std::call_once(initOnce, [](){ shouldLoad = ShouldLoadPlatform(); }); if (!shouldLoad) { *not_null(num_platforms) = 0; return CL_SUCCESS; } if (!amd::Runtime::initialized()) { amd::Runtime::init(); } if (num_platforms != NULL && platforms == NULL) { *num_platforms = 1; return CL_SUCCESS; } assert(platforms != NULL && "check the code above"); *platforms = reinterpret_cast<cl_platform_id>(&amd::PlatformID::Platform); *not_null(num_platforms) = 1; return CL_SUCCESS; } clr-rocm-5.7.1/opencl/amdocl/cl_icd_amd.h000066400000000000000000001131361450307266000202000ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2010 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
******************************************************************************/ #ifndef __OPENCL_CL_ICD_H #define __OPENCL_CL_ICD_H #include #include #define cl_khr_icd 1 #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ typedef cl_int(CL_API_CALL* clGetPlatformIDs_fn)( cl_uint /* num_entries */, cl_platform_id* /* platforms */, cl_uint* /* num_platforms */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetPlatformInfo_fn)( cl_platform_id /* platform */, cl_platform_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetDeviceIDs_fn)( cl_platform_id /* platform */, cl_device_type /* device_type */, cl_uint /* num_entries */, cl_device_id* /* devices */, cl_uint* /* num_devices */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetDeviceInfo_fn)( cl_device_id /* device */, cl_device_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_context(CL_API_CALL* clCreateContext_fn)( const cl_context_properties* /* properties */, cl_uint /* num_devices */, const cl_device_id* /* devices */, void(CL_CALLBACK* /* pfn_notify */)(const char*, const void*, size_t, void*), void* /* user_data */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_context(CL_API_CALL* clCreateContextFromType_fn)( const cl_context_properties* /* properties */, cl_device_type /* device_type */, void(CL_CALLBACK* /* pfn_notify*/)(const char*, const void*, size_t, void*), void* /* user_data */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clRetainContext_fn)(cl_context /* context */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clReleaseContext_fn)(cl_context /* context */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetContextInfo_fn)( cl_context /* context */, cl_context_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_command_queue(CL_API_CALL* clCreateCommandQueue_fn)( cl_context /* context */, cl_device_id /* device */, cl_command_queue_properties /* properties */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clRetainCommandQueue_fn)(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clReleaseCommandQueue_fn)(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetCommandQueueInfo_fn)( cl_command_queue /* command_queue */, cl_command_queue_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clSetCommandQueueProperty_fn)( cl_command_queue /* command_queue */, cl_command_queue_properties /* properties */, cl_bool /* enable */, cl_command_queue_properties* /* old_properties */) /*CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED*/; typedef cl_mem(CL_API_CALL* clCreateBuffer_fn)( cl_context /* context */, cl_mem_flags /* flags */, size_t /* size */, void* /* host_ptr */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_mem(CL_API_CALL* clCreateSubBuffer_fn)( cl_mem /* buffer */, cl_mem_flags /* flags */, cl_buffer_create_type /* buffer_create_type */, const void* /* buffer_create_info */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; typedef 
cl_mem(CL_API_CALL* clCreateImage2D_fn)( cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format* /* image_format */, size_t /* image_width */, size_t /* image_height */, size_t /* image_row_pitch */, void* /* host_ptr */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_mem(CL_API_CALL* clCreateImage3D_fn)( cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format* /* image_format */, size_t /* image_width */, size_t /* image_height */, size_t /* image_depth */, size_t /* image_row_pitch */, size_t /* image_slice_pitch */, void* /* host_ptr */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clRetainMemObject_fn)(cl_mem /* memobj */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clReleaseMemObject_fn)(cl_mem /* memobj */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetSupportedImageFormats_fn)( cl_context /* context */, cl_mem_flags /* flags */, cl_mem_object_type /* image_type */, cl_uint /* num_entries */, cl_image_format* /* image_formats */, cl_uint* /* num_image_formats */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetMemObjectInfo_fn)( cl_mem /* memobj */, cl_mem_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetImageInfo_fn)( cl_mem /* image */, cl_image_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clSetMemObjectDestructorCallback_fn)( cl_mem /* memobj */, void(CL_CALLBACK* /*pfn_notify*/)(cl_mem /* memobj */, void* /*user_data*/), void* /*user_data */) CL_API_SUFFIX__VERSION_1_1; /* Sampler APIs */ typedef cl_sampler(CL_API_CALL* clCreateSampler_fn)( cl_context /* context */, cl_bool /* normalized_coords */, cl_addressing_mode /* addressing_mode */, cl_filter_mode /* filter_mode */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clRetainSampler_fn)(cl_sampler /* sampler */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clReleaseSampler_fn)(cl_sampler /* sampler */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetSamplerInfo_fn)( cl_sampler /* sampler */, cl_sampler_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Program Object APIs */ typedef cl_program(CL_API_CALL* clCreateProgramWithSource_fn)( cl_context /* context */, cl_uint /* count */, const char** /* strings */, const size_t* /* lengths */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithIL(cl_context /* context */, const void * /* strings */, size_t /* lengths */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_2_0; typedef cl_program(CL_API_CALL* clCreateProgramWithILKHR_fn)( cl_context /* context */, const void* /* il */, size_t /* length */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; typedef cl_program(CL_API_CALL* clCreateProgramWithBinary_fn)( cl_context /* context */, cl_uint /* num_devices */, const cl_device_id* /* device_list */, const size_t* /* lengths */, const unsigned char** /* binaries */, cl_int* /* binary_status */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clRetainProgram_fn)(cl_program /* program */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* 
clReleaseProgram_fn)(cl_program /* program */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clBuildProgram_fn)( cl_program /* program */, cl_uint /* num_devices */, const cl_device_id* /* device_list */, const char* /* options */, void(CL_CALLBACK* /* pfn_notify */)(cl_program /* program */, void* /* user_data */), void* /* user_data */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clUnloadCompiler_fn)(void) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetProgramInfo_fn)( cl_program /* program */, cl_program_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetProgramBuildInfo_fn)( cl_program /* program */, cl_device_id /* device */, cl_program_build_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Kernel Object APIs */ typedef cl_kernel(CL_API_CALL* clCreateKernel_fn)( cl_program /* program */, const char* /* kernel_name */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clCreateKernelsInProgram_fn)( cl_program /* program */, cl_uint /* num_kernels */, cl_kernel* /* kernels */, cl_uint* /* num_kernels_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clRetainKernel_fn)(cl_kernel /* kernel */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clReleaseKernel_fn)(cl_kernel /* kernel */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clSetKernelArg_fn)(cl_kernel /* kernel */, cl_uint /* arg_index */, size_t /* arg_size */, const void* /* arg_value */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetKernelInfo_fn)( cl_kernel /* kernel */, cl_kernel_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetKernelWorkGroupInfo_fn)( cl_kernel /* kernel */, cl_device_id /* device */, cl_kernel_work_group_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Event Object APIs */ typedef cl_int(CL_API_CALL* clWaitForEvents_fn)( cl_uint /* num_events */, const cl_event* /* event_list */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetEventInfo_fn)( cl_event /* event */, cl_event_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_event(CL_API_CALL* clCreateUserEvent_fn)( cl_context /* context */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; typedef cl_int(CL_API_CALL* clRetainEvent_fn)(cl_event /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clReleaseEvent_fn)(cl_event /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clSetUserEventStatus_fn)( cl_event /* event */, cl_int /* execution_status */) CL_API_SUFFIX__VERSION_1_1; typedef cl_int(CL_API_CALL* clSetEventCallback_fn)( cl_event /* event */, cl_int /* command_exec_callback_type */, void(CL_CALLBACK* /* pfn_notify */)(cl_event, cl_int, void*), void* /* user_data */) CL_API_SUFFIX__VERSION_1_1; /* Profiling APIs */ typedef cl_int(CL_API_CALL* clGetEventProfilingInfo_fn)( cl_event /* event */, cl_profiling_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* 
Flush and Finish APIs */ typedef cl_int(CL_API_CALL* clFlush_fn)(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clFinish_fn)(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; /* Enqueued Commands APIs */ typedef cl_int(CL_API_CALL* clEnqueueReadBuffer_fn)( cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_read */, size_t /* offset */, size_t /* cb */, void* /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueReadBufferRect_fn)( cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_read */, const size_t* /* buffer_offset */, const size_t* /* host_offset */, const size_t* /* region */, size_t /* buffer_row_pitch */, size_t /* buffer_slice_pitch */, size_t /* host_row_pitch */, size_t /* host_slice_pitch */, void* /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_1; typedef cl_int(CL_API_CALL* clEnqueueWriteBuffer_fn)( cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_write */, size_t /* offset */, size_t /* cb */, const void* /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueWriteBufferRect_fn)( cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_read */, const size_t* /* buffer_offset */, const size_t* /* host_offset */, const size_t* /* region */, size_t /* buffer_row_pitch */, size_t /* buffer_slice_pitch */, size_t /* host_row_pitch */, size_t /* host_slice_pitch */, const void* /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_1; typedef cl_int(CL_API_CALL* clEnqueueCopyBuffer_fn)( cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_buffer */, size_t /* src_offset */, size_t /* dst_offset */, size_t /* cb */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueCopyBufferRect_fn)( cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_buffer */, const size_t* /* src_origin */, const size_t* /* dst_origin */, const size_t* /* region */, size_t /* src_row_pitch */, size_t /* src_slice_pitch */, size_t /* dst_row_pitch */, size_t /* dst_slice_pitch */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_1; typedef cl_int(CL_API_CALL* clEnqueueReadImage_fn)( cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_read */, const size_t* /* origin[3] */, const size_t* /* region[3] */, size_t /* row_pitch */, size_t /* slice_pitch */, void* /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueWriteImage_fn)( cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_write */, const size_t* /* origin[3] */, const size_t* /* region[3] */, size_t /* input_row_pitch */, size_t /* input_slice_pitch */, const void* /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event 
*/) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueCopyImage_fn)( cl_command_queue /* command_queue */, cl_mem /* src_image */, cl_mem /* dst_image */, const size_t* /* src_origin[3] */, const size_t* /* dst_origin[3] */, const size_t* /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueCopyImageToBuffer_fn)( cl_command_queue /* command_queue */, cl_mem /* src_image */, cl_mem /* dst_buffer */, const size_t* /* src_origin[3] */, const size_t* /* region[3] */, size_t /* dst_offset */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueCopyBufferToImage_fn)( cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_image */, size_t /* src_offset */, const size_t* /* dst_origin[3] */, const size_t* /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef void*(CL_API_CALL* clEnqueueMapBuffer_fn)( cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_map */, cl_map_flags /* map_flags */, size_t /* offset */, size_t /* cb */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */, cl_int* /* errcode_ret */)CL_API_SUFFIX__VERSION_1_0; typedef void*(CL_API_CALL* clEnqueueMapImage_fn)( cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_map */, cl_map_flags /* map_flags */, const size_t* /* origin[3] */, const size_t* /* region[3] */, size_t* /* image_row_pitch */, size_t* /* image_slice_pitch */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */, cl_int* /* errcode_ret */)CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueUnmapMemObject_fn)( cl_command_queue /* command_queue */, cl_mem /* memobj */, void* /* mapped_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueNDRangeKernel_fn)( cl_command_queue /* command_queue */, cl_kernel /* kernel */, cl_uint /* work_dim */, const size_t* /* global_work_offset */, const size_t* /* global_work_size */, const size_t* /* local_work_size */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueTask_fn)(cl_command_queue /* command_queue */, cl_kernel /* kernel */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueNativeKernel_fn)( cl_command_queue /* command_queue */, void(CL_CALLBACK* user_func)(void*), void* /* args */, size_t /* cb_args */, cl_uint /* num_mem_objects */, const cl_mem* /* mem_list */, const void** /* args_mem_loc */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueMarker_fn)(cl_command_queue /* command_queue */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueWaitForEvents_fn)( cl_command_queue /* command_queue */, cl_uint /* num_events */, const cl_event* /* event_list */) CL_API_SUFFIX__VERSION_1_0; typedef 
cl_int(CL_API_CALL* clEnqueueBarrier_fn)(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; typedef void*(CL_API_CALL* clGetExtensionFunctionAddress_fn)(const char* /* func_name */) CL_API_SUFFIX__VERSION_1_0; typedef cl_mem(CL_API_CALL* clCreateFromGLBuffer_fn)( cl_context /* context */, cl_mem_flags /* flags */, cl_GLuint /* bufobj */, int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_mem(CL_API_CALL* clCreateFromGLTexture2D_fn)( cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_mem(CL_API_CALL* clCreateFromGLTexture3D_fn)( cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_mem(CL_API_CALL* clCreateFromGLRenderbuffer_fn)( cl_context /* context */, cl_mem_flags /* flags */, cl_GLuint /* renderbuffer */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetGLObjectInfo_fn)( cl_mem /* memobj */, cl_gl_object_type* /* gl_object_type */, cl_GLuint* /* gl_object_name */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clGetGLTextureInfo_fn)( cl_mem /* memobj */, cl_gl_texture_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef cl_event(CL_API_CALL* clCreateEventFromGLsyncKHR_fn)( cl_context /* context */, cl_GLsync /* cl_GLsync */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; typedef cl_int(CL_API_CALL* clEnqueueAcquireGLObjects_fn)( cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem* /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clEnqueueReleaseGLObjects_fn)( cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem* /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_0; typedef cl_int(CL_API_CALL* clCreateSubDevices_fn)( cl_device_id /* in_device */, const cl_device_partition_property* /* properties */, cl_uint /* num_entries */, cl_device_id* /* out_devices */, cl_uint* /* num_devices */) CL_API_SUFFIX__VERSION_1_2; typedef cl_int(CL_API_CALL* clRetainDevice_fn)(cl_device_id /* device */) CL_API_SUFFIX__VERSION_1_2; typedef cl_int(CL_API_CALL* clReleaseDevice_fn)(cl_device_id /* device */) CL_API_SUFFIX__VERSION_1_2; typedef cl_mem(CL_API_CALL* clCreateImage_fn)(cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format* /* image_format*/, const cl_image_desc* /* image_desc*/, void* /* host_ptr */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; typedef cl_program(CL_API_CALL* clCreateProgramWithBuiltInKernels_fn)( cl_context /* context */, cl_uint /* num_devices */, const cl_device_id* /* device_list */, const char* /* kernel_names */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; typedef cl_int(CL_API_CALL* clCompileProgram_fn)( cl_program /* program */, cl_uint /* num_devices */, const cl_device_id* /* device_list */, const char* /* options */, cl_uint /* num_input_headers */, const cl_program* /* input_headers */, const char** /* header_include_names */, void(CL_CALLBACK* pfn_notify)(cl_program program, void* user_data), void* /* 
user_data */) CL_API_SUFFIX__VERSION_1_2; typedef cl_program(CL_API_CALL* clLinkProgram_fn)( cl_context /* context */, cl_uint /* num_devices */, const cl_device_id* /* device_list */, const char* /* options */, cl_uint /* num_input_programs */, const cl_program* /* input_programs */, void(CL_CALLBACK* pfn_notify)(cl_program program, void* user_data), void* /* user_data */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; typedef cl_int(CL_API_CALL* clUnloadPlatformCompiler_fn)(cl_platform_id /* platform */) CL_API_SUFFIX__VERSION_1_2; typedef cl_int(CL_API_CALL* clGetKernelArgInfo_fn)( cl_kernel /* kernel */, cl_uint /* arg_indx */, cl_kernel_arg_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_2; typedef cl_int(CL_API_CALL* clEnqueueFillBuffer_fn)( cl_command_queue /* command_queue */, cl_mem /* buffer */, const void* /* pattern */, size_t /* pattern_size */, size_t /* offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_2; typedef cl_int(CL_API_CALL* clEnqueueFillImage_fn)( cl_command_queue /* command_queue */, cl_mem /* image */, const void* /* fill_color */, const size_t* /* origin */, const size_t* /* region */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_2; typedef cl_int(CL_API_CALL* clEnqueueMigrateMemObjects_fn)( cl_command_queue /* command_queue */, cl_uint /* num_mem_objects */, const cl_mem* /* mem_objects */, cl_mem_migration_flags /* flags */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_2; typedef cl_int(CL_API_CALL* clEnqueueMarkerWithWaitList_fn)( cl_command_queue /* command_queue */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_2; typedef cl_int(CL_API_CALL* clEnqueueBarrierWithWaitList_fn)( cl_command_queue /* command_queue */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_1_2; typedef void*(CL_API_CALL* clGetExtensionFunctionAddressForPlatform_fn)( cl_platform_id /* platform */, const char* /* funcname */)CL_API_SUFFIX__VERSION_1_2; typedef cl_mem(CL_API_CALL* clCreateFromGLTexture_fn)( cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* texture_target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; typedef cl_command_queue(CL_API_CALL* clCreateCommandQueueWithProperties_fn)( cl_context /* context */, cl_device_id /* device */, const cl_queue_properties* /* properties */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; typedef cl_sampler(CL_API_CALL* clCreateSamplerWithProperties_fn)( cl_context /* context */, const cl_sampler_properties* /* properties */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; typedef void*(CL_API_CALL* clSVMAlloc_fn)(cl_context /* context */, cl_svm_mem_flags /* flags */, size_t /* size */, cl_uint /* alignment */)CL_API_SUFFIX__VERSION_2_0; typedef void(CL_API_CALL* clSVMFree_fn)(cl_context /* context */, void* /* svm_pointer */) CL_API_SUFFIX__VERSION_2_0; typedef cl_int(CL_API_CALL* clSetKernelArgSVMPointer_fn)( cl_kernel /* kernel */, cl_uint /* arg_index */, const void* /* arg_value */) CL_API_SUFFIX__VERSION_2_0; typedef 
cl_int(CL_API_CALL* clSetKernelExecInfo_fn)( cl_kernel /* kernel */, cl_kernel_exec_info /* param_name */, size_t /* param_value_size */, const void* /* param_value */) CL_API_SUFFIX__VERSION_2_0; typedef cl_int(CL_API_CALL* clEnqueueSVMFree_fn)( cl_command_queue /* command_queue */, cl_uint /* num_svm_pointers */, void* [] /* svm_pointers */, void(CL_CALLBACK* /* pfn_free_func */)(cl_command_queue /* queue */, cl_uint /* num_svm_pointers */, void* [] /* svm_pointers */, void* /* user_data */), void* /* user_data */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_2_0; typedef cl_int(CL_API_CALL* clEnqueueSVMMemcpy_fn)( cl_command_queue /* command_queue */, cl_bool /* blocking_copy */, void* /* dst_ptr */, const void* /* src_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_2_0; typedef cl_int(CL_API_CALL* clEnqueueSVMMemFill_fn)( cl_command_queue /* command_queue */, void* /* svm_ptr */, const void* /* pattern */, size_t /* pattern_size */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_2_0; typedef cl_int(CL_API_CALL* clEnqueueSVMMap_fn)( cl_command_queue /* command_queue */, cl_bool /* blocking_map */, cl_map_flags /* flags */, void* /* svm_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_2_0; typedef cl_int(CL_API_CALL* clEnqueueSVMUnmap_fn)(cl_command_queue /* command_queue */, void* /* svm_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */) CL_API_SUFFIX__VERSION_2_0; typedef cl_mem(CL_API_CALL* clCreatePipe_fn)(cl_context /* context */, cl_mem_flags /* flags */, cl_uint /* pipe_packet_size */, cl_uint /* pipe_max_packets */, const cl_pipe_properties* /* properties */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; typedef cl_int(CL_API_CALL* clGetPipeInfo_fn)( cl_mem /* pipe */, cl_pipe_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_2_0; typedef cl_int(CL_API_CALL* clGetKernelSubGroupInfoKHR_fn)( cl_kernel /* kernel */, cl_device_id /* device */, cl_kernel_sub_group_info /* param_name */, size_t /* input_value_size */, const void* /* input_value */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */) CL_API_SUFFIX__VERSION_2_0; typedef cl_int(CL_API_CALL* clSetDefaultDeviceCommandQueue_fn)( cl_context /* context */, cl_device_id /* device */, cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_2_1; typedef cl_kernel(CL_API_CALL* clCloneKernel_fn)( cl_kernel /* source_kernel */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_1; typedef cl_int (CL_API_CALL* clEnqueueSVMMigrateMem_fn)( cl_command_queue /* command_queue */, cl_uint /* num_svm_pointers */, const void ** /* svm_pointers */, const size_t * /* sizes */, cl_mem_migration_flags /* flags */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_1; typedef cl_int (CL_API_CALL* clGetDeviceAndHostTimer_fn)( cl_device_id /* device */, cl_ulong * /* device_timestamp */, cl_ulong * /* host_timestamp */) CL_API_SUFFIX__VERSION_2_1; typedef cl_int (CL_API_CALL* clGetHostTimer_fn)( 
cl_device_id /* device */, cl_ulong * /* host_timestamp */) CL_API_SUFFIX__VERSION_2_1; typedef cl_int (CL_API_CALL* clSetProgramSpecializationConstant_fn)( cl_program /* program */, cl_uint /* spec_id */, size_t /* spec_size */, const void* /* spec_value */) CL_API_SUFFIX__VERSION_2_2; typedef cl_int (CL_API_CALL* clSetProgramReleaseCallback_fn)( cl_program /* program */, void (CL_CALLBACK * /* pfn_notify */)(cl_program program, void * user_data), void * /* user_data */) CL_API_SUFFIX__VERSION_2_2; typedef struct _cl_icd_dispatch_table { /* OpenCL 1.0 */ clGetPlatformIDs_fn GetPlatformIDs; clGetPlatformInfo_fn GetPlatformInfo; clGetDeviceIDs_fn GetDeviceIDs; clGetDeviceInfo_fn GetDeviceInfo; clCreateContext_fn CreateContext; clCreateContextFromType_fn CreateContextFromType; clRetainContext_fn RetainContext; clReleaseContext_fn ReleaseContext; clGetContextInfo_fn GetContextInfo; clCreateCommandQueue_fn CreateCommandQueue; clRetainCommandQueue_fn RetainCommandQueue; clReleaseCommandQueue_fn ReleaseCommandQueue; clGetCommandQueueInfo_fn GetCommandQueueInfo; clSetCommandQueueProperty_fn SetCommandQueueProperty; clCreateBuffer_fn CreateBuffer; clCreateImage2D_fn CreateImage2D; clCreateImage3D_fn CreateImage3D; clRetainMemObject_fn RetainMemObject; clReleaseMemObject_fn ReleaseMemObject; clGetSupportedImageFormats_fn GetSupportedImageFormats; clGetMemObjectInfo_fn GetMemObjectInfo; clGetImageInfo_fn GetImageInfo; clCreateSampler_fn CreateSampler; clRetainSampler_fn RetainSampler; clReleaseSampler_fn ReleaseSampler; clGetSamplerInfo_fn GetSamplerInfo; clCreateProgramWithSource_fn CreateProgramWithSource; clCreateProgramWithBinary_fn CreateProgramWithBinary; clRetainProgram_fn RetainProgram; clReleaseProgram_fn ReleaseProgram; clBuildProgram_fn BuildProgram; clUnloadCompiler_fn UnloadCompiler; clGetProgramInfo_fn GetProgramInfo; clGetProgramBuildInfo_fn GetProgramBuildInfo; clCreateKernel_fn CreateKernel; clCreateKernelsInProgram_fn CreateKernelsInProgram; clRetainKernel_fn RetainKernel; clReleaseKernel_fn ReleaseKernel; clSetKernelArg_fn SetKernelArg; clGetKernelInfo_fn GetKernelInfo; clGetKernelWorkGroupInfo_fn GetKernelWorkGroupInfo; clWaitForEvents_fn WaitForEvents; clGetEventInfo_fn GetEventInfo; clRetainEvent_fn RetainEvent; clReleaseEvent_fn ReleaseEvent; clGetEventProfilingInfo_fn GetEventProfilingInfo; clFlush_fn Flush; clFinish_fn Finish; clEnqueueReadBuffer_fn EnqueueReadBuffer; clEnqueueWriteBuffer_fn EnqueueWriteBuffer; clEnqueueCopyBuffer_fn EnqueueCopyBuffer; clEnqueueReadImage_fn EnqueueReadImage; clEnqueueWriteImage_fn EnqueueWriteImage; clEnqueueCopyImage_fn EnqueueCopyImage; clEnqueueCopyImageToBuffer_fn EnqueueCopyImageToBuffer; clEnqueueCopyBufferToImage_fn EnqueueCopyBufferToImage; clEnqueueMapBuffer_fn EnqueueMapBuffer; clEnqueueMapImage_fn EnqueueMapImage; clEnqueueUnmapMemObject_fn EnqueueUnmapMemObject; clEnqueueNDRangeKernel_fn EnqueueNDRangeKernel; clEnqueueTask_fn EnqueueTask; clEnqueueNativeKernel_fn EnqueueNativeKernel; clEnqueueMarker_fn EnqueueMarker; clEnqueueWaitForEvents_fn EnqueueWaitForEvents; clEnqueueBarrier_fn EnqueueBarrier; clGetExtensionFunctionAddress_fn GetExtensionFunctionAddress; clCreateFromGLBuffer_fn CreateFromGLBuffer; clCreateFromGLTexture2D_fn CreateFromGLTexture2D; clCreateFromGLTexture3D_fn CreateFromGLTexture3D; clCreateFromGLRenderbuffer_fn CreateFromGLRenderbuffer; clGetGLObjectInfo_fn GetGLObjectInfo; clGetGLTextureInfo_fn GetGLTextureInfo; clEnqueueAcquireGLObjects_fn EnqueueAcquireGLObjects; clEnqueueReleaseGLObjects_fn 
EnqueueReleaseGLObjects; clGetGLContextInfoKHR_fn GetGLContextInfoKHR; void* _reservedForD3D10KHR[6]; /* OpenCL 1.1 */ clSetEventCallback_fn SetEventCallback; clCreateSubBuffer_fn CreateSubBuffer; clSetMemObjectDestructorCallback_fn SetMemObjectDestructorCallback; clCreateUserEvent_fn CreateUserEvent; clSetUserEventStatus_fn SetUserEventStatus; clEnqueueReadBufferRect_fn EnqueueReadBufferRect; clEnqueueWriteBufferRect_fn EnqueueWriteBufferRect; clEnqueueCopyBufferRect_fn EnqueueCopyBufferRect; void* _reservedForDeviceFissionEXT[3]; clCreateEventFromGLsyncKHR_fn CreateEventFromGLsyncKHR; /* OpenCL 1.2 */ clCreateSubDevices_fn CreateSubDevices; clRetainDevice_fn RetainDevice; clReleaseDevice_fn ReleaseDevice; clCreateImage_fn CreateImage; clCreateProgramWithBuiltInKernels_fn CreateProgramWithBuiltInKernels; clCompileProgram_fn CompileProgram; clLinkProgram_fn LinkProgram; clUnloadPlatformCompiler_fn UnloadPlatformCompiler; clGetKernelArgInfo_fn GetKernelArgInfo; clEnqueueFillBuffer_fn EnqueueFillBuffer; clEnqueueFillImage_fn EnqueueFillImage; clEnqueueMigrateMemObjects_fn EnqueueMigrateMemObjects; clEnqueueMarkerWithWaitList_fn EnqueueMarkerWithWaitList; clEnqueueBarrierWithWaitList_fn EnqueueBarrierWithWaitList; clGetExtensionFunctionAddressForPlatform_fn GetExtensionFunctionAddressForPlatform; clCreateFromGLTexture_fn CreateFromGLTexture; /* cl_khr_d3d11_sharing, cl_khr_dx9_media_sharing */ void* _reservedForD3DExtensions[10]; /* cl_khr_egl_image, cl_khr_egl_event */ void* _reservedForEGLExtensions[4]; /* OpenCL 2.0 */ clCreateCommandQueueWithProperties_fn CreateCommandQueueWithProperties; clCreatePipe_fn CreatePipe; clGetPipeInfo_fn GetPipeInfo; clSVMAlloc_fn SVMAlloc; clSVMFree_fn SVMFree; clEnqueueSVMFree_fn EnqueueSVMFree; clEnqueueSVMMemcpy_fn EnqueueSVMMemcpy; clEnqueueSVMMemFill_fn EnqueueSVMMemFill; clEnqueueSVMMap_fn EnqueueSVMMap; clEnqueueSVMUnmap_fn EnqueueSVMUnmap; clCreateSamplerWithProperties_fn CreateSamplerWithProperties; clSetKernelArgSVMPointer_fn SetKernelArgSVMPointer; clSetKernelExecInfo_fn SetKernelExecInfo; /* cl_khr_sub_groups */ clGetKernelSubGroupInfoKHR_fn GetKernelSubGroupInfoKHR; /* OpenCL 2.1 */ clCloneKernel_fn CloneKernel; clCreateProgramWithILKHR_fn CreateProgramWithILKHR; clEnqueueSVMMigrateMem_fn EnqueueSVMMigrateMem; clGetDeviceAndHostTimer_fn GetDeviceAndHostTimer; clGetHostTimer_fn GetHostTimer; clGetKernelSubGroupInfoKHR_fn GetKernelSubGroupInfo; clSetDefaultDeviceCommandQueue_fn SetDefaultDeviceCommandQueue; /* OpenCL 2.2 */ clSetProgramReleaseCallback_fn SetProgramReleaseCallback; clSetProgramSpecializationConstant_fn SetProgramSpecializationConstant; } cl_icd_dispatch_table; #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* __OPENCL_CL_ICD_H */ clr-rocm-5.7.1/opencl/amdocl/cl_kernel.h000066400000000000000000000074431450307266000201030ustar00rootroot00000000000000/* Copyright (c) 2012 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
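/* A sketch of how a Khronos-style ICD loader consumes a table like the
 * cl_icd_dispatch_table defined above: every OpenCL handle begins with a
 * pointer to its vendor's dispatch table, and each loader entry point simply
 * forwards through it. The _cl_platform_id layout shown here is an assumption
 * for illustration only; the real definition lives in the Khronos ICD loader
 * sources, not in this header. */
struct _cl_platform_id {
  cl_icd_dispatch_table* dispatch;  // by ICD convention, the first member of every handle
};

cl_int CL_API_CALL clGetPlatformInfo(cl_platform_id platform, cl_platform_info param_name,
                                     size_t param_value_size, void* param_value,
                                     size_t* param_value_size_ret) {
  if (platform == NULL) return CL_INVALID_PLATFORM;
  // Forward the call into the vendor library through its dispatch table.
  return platform->dispatch->GetPlatformInfo(platform, param_name, param_value_size,
                                             param_value, param_value_size_ret);
}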
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef CL_KERNEL_H_ #define CL_KERNEL_H_ struct clk_builtins_t; // This must be a multiple of sizeof(cl_ulong16) #define __CPU_SCRATCH_SIZE 128 #define CLK_PRIVATE_MEMORY_SIZE (16 * 1024) struct clk_thread_info_block_t { // Warning! The size of this struct needs to be a multiple // of 16 when compiling 64 bit struct clk_builtins_t const* builtins; void* local_mem_base; void* local_scratch; const void* table_base; size_t pad; uint work_dim; size_t global_offset[4]; /*dim0,dim1,dim2,invalid(dim<0||dim>2)*/ size_t global_size[4]; /*dim0,dim1,dim2,invalid(dim<0||dim>2)*/ size_t enqueued_local_size[4]; size_t local_size[4]; /*dim0,dim1,dim2,invalid(dim<0||dim>2)*/ size_t local_id[4]; /*dim0,dim1,dim2,invalid(dim<0||dim>2)*/ size_t group_id[4]; /*dim0,dim1,dim2,invalid(dim<0||dim>2)*/ }; typedef enum clk_value_type_t { T_VOID, T_CHAR, T_SHORT, T_INT, T_LONG, T_FLOAT, T_DOUBLE, T_POINTER, T_CHAR2, T_CHAR3, T_CHAR4, T_CHAR8, T_CHAR16, T_SHORT2, T_SHORT3, T_SHORT4, T_SHORT8, T_SHORT16, T_INT2, T_INT3, T_INT4, T_INT8, T_INT16, T_LONG2, T_LONG3, T_LONG4, T_LONG8, T_LONG16, T_FLOAT2, T_FLOAT3, T_FLOAT4, T_FLOAT8, T_FLOAT16, T_DOUBLE2, T_DOUBLE3, T_DOUBLE4, T_DOUBLE8, T_DOUBLE16, T_SAMPLER, T_SEMA, T_STRUCT, T_QUEUE, T_PAD } clk_value_type_t; typedef enum clk_address_space_t { A_PRIVATE, A_LOCAL, A_CONSTANT, A_GLOBAL, A_REGION } clk_address_space_t; // kernel arg access qualifier and type qualifier typedef enum clk_arg_qualifier_t { Q_NONE = 0, // for image type only, access qualifier Q_READ = 1, Q_WRITE = 2, // for pointer type only Q_CONST = 4, // pointee Q_RESTRICT = 8, Q_VOLATILE = 16, // pointee Q_PIPE = 32 // pipe } clk_arg_qualifier_t; #pragma pack(push, 4) struct clk_parameter_descriptor_t { clk_value_type_t type; clk_address_space_t space; uint qualifier; const char* name; }; #pragma pack(pop) //#define CLK_LOCAL_MEM_FENCE (1 << 0) //#define CLK_GLOBAL_MEM_FENCE (1 << 1) struct clk_builtins_t { /* Synchronization functions */ void (*barrier_ptr)(cl_mem_fence_flags flags); /* AMD Only builtins: FIXME_lmoriche (extension) */ void* reserved; int (*printf_ptr)(const char* format, ...); }; enum clk_natures_t { KN_HAS_BARRIER = 1 << 0, KN_WG_LEVEL = 1 << 1 }; #if defined(_MSC_VER) #pragma warning(push) #pragma warning(disable : 4200) #endif #if !defined(__OPENCL_VERSION__) || __OPENCL_VERSION__ >= 200 typedef struct clk_pipe_t { size_t read_idx; size_t write_idx; size_t end_idx; char padding[128 - 3 * sizeof(size_t)]; char packets[]; } clk_pipe_t; #endif #if defined(_MSC_VER) #pragma warning(pop) #endif #endif /*CL_KERNEL_H_*/ clr-rocm-5.7.1/opencl/amdocl/cl_kernel_info_amd.cpp000066400000000000000000000135211450307266000222640ustar00rootroot00000000000000/* Copyright (c) 2009 - 2021 Advanced Micro Devices, Inc. 
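/* An illustration of the clk_pipe_t layout declared in cl_kernel.h above: a
 * control block padded out to 128 bytes, followed by the packet ring in the
 * flexible array member. The helper below is only a sketch of the addressing
 * math the struct implies, assuming read_idx/write_idx are packet counters
 * wrapped into [0, end_idx); the actual device-side pipe implementation is
 * not part of this header. */
static char* pipe_packet_addr(clk_pipe_t* p, size_t idx, size_t packet_size) {
  // 'packets' starts immediately after the 128-byte header (see the padding field).
  return p->packets + (idx % p->end_idx) * packet_size;
}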
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "cl_common.hpp" #include "cl_kernel_info_amd.h" #include "platform/kernel.hpp" #include "platform/ndrange.hpp" #include "platform/command.hpp" /*! \addtogroup API * @{ * * \addtogroup AMD_Extensions * @{ * */ /*! \brief Retrieves the kernel information. * * \param kernel specifies the kernel object being queried. * * \param device identifies a specific device in the list of devices associated * with \a kernel. The list of devices is the list of devices in the OpenCL * context that is associated with \a kernel. If the list of devices associated * with kernel is a single device, \a device can be a NULL value. * * \param param_name specifies the information to query. * * \param param_value is a pointer to memory where the appropriate result * being queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size is used to specify the size in bytes of memory * pointed to by \a param_value. This size must be >= size of return type. * * \param param_value_size_ret returns the actual size in bytes of data copied * to \a param_value. If \a param_value_size_ret is NULL, it is ignored. 
* * \return One of the following values: * - CL_SUCCESS if the function is executed successfully * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes * specified by \a param_value_size is < size of return type and * \a param_value is not NULL * - CL_INVALID_KERNEL if \a kernel is a not a valid program object */ RUNTIME_ENTRY(cl_int, clGetKernelInfoAMD, (cl_kernel kernel, cl_device_id device, cl_kernel_info_amd param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { // Check if we have a valid device if (!is_valid(device)) { return CL_INVALID_DEVICE; } // Check if we have a valid performance counter if (!is_valid(kernel)) { return CL_INVALID_KERNEL; } // Find the kernel, associated with the specified device const device::Kernel* devKernel = as_amd(kernel)->getDeviceKernel(*as_amd(device)); // Make sure we found a valid kernel if (devKernel == NULL) { return CL_INVALID_KERNEL; } // Get the corresponded parameters switch (param_name) { case CL_KERNELINFO_SCRATCH_REGS: return amd::clGetInfo(devKernel->workGroupInfo()->scratchRegs_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_WAVEFRONT_PER_SIMD: return amd::clGetInfo(devKernel->workGroupInfo()->wavefrontPerSIMD_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_WAVEFRONT_SIZE: return amd::clGetInfo(devKernel->workGroupInfo()->wavefrontSize_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_AVAILABLE_GPRS: return amd::clGetInfo(devKernel->workGroupInfo()->availableGPRs_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_USED_GPRS: return amd::clGetInfo(devKernel->workGroupInfo()->usedGPRs_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_AVAILABLE_SGPRS: return amd::clGetInfo(devKernel->workGroupInfo()->availableSGPRs_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_USED_SGPRS: return amd::clGetInfo(devKernel->workGroupInfo()->usedSGPRs_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_AVAILABLE_VGPRS: return amd::clGetInfo(devKernel->workGroupInfo()->availableVGPRs_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_USED_VGPRS: return amd::clGetInfo(devKernel->workGroupInfo()->usedVGPRs_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_AVAILABLE_LDS_SIZE: return amd::clGetInfo(devKernel->workGroupInfo()->availableLDSSize_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_USED_LDS_SIZE: return amd::clGetInfo(devKernel->workGroupInfo()->usedLDSSize_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_AVAILABLE_STACK_SIZE: return amd::clGetInfo(devKernel->workGroupInfo()->availableStackSize_, param_value_size, param_value, param_value_size_ret); case CL_KERNELINFO_USED_STACK_SIZE: return amd::clGetInfo(devKernel->workGroupInfo()->usedStackSize_, param_value_size, param_value, param_value_size_ret); default: return CL_INVALID_VALUE; } return CL_SUCCESS; } RUNTIME_EXIT /*! @} * @} */ clr-rocm-5.7.1/opencl/amdocl/cl_kernel_info_amd.h000066400000000000000000000066101450307266000217320ustar00rootroot00000000000000/* Copyright (c) 2009 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef __CL_KERNEL_INFO_AMD_H #define __CL_KERNEL_INFO_AMD_H #include "CL/cl_platform.h" #ifdef __cplusplus extern "C" { #endif /*__cplusplus*/ typedef cl_uint cl_kernel_info_amd; /* cl_kernel_info */ enum KernelInfoAMD { CL_KERNELINFO_NONE = 0x0, CL_KERNELINFO_SCRATCH_REGS, CL_KERNELINFO_WAVEFRONT_PER_SIMD, CL_KERNELINFO_WAVEFRONT_SIZE, CL_KERNELINFO_AVAILABLE_GPRS, CL_KERNELINFO_USED_GPRS, CL_KERNELINFO_AVAILABLE_LDS_SIZE, CL_KERNELINFO_USED_LDS_SIZE, CL_KERNELINFO_AVAILABLE_STACK_SIZE, CL_KERNELINFO_USED_STACK_SIZE, CL_KERNELINFO_AVAILABLE_SGPRS, CL_KERNELINFO_USED_SGPRS, CL_KERNELINFO_AVAILABLE_VGPRS, CL_KERNELINFO_USED_VGPRS, CL_KERNELINFO_LAST }; /*! \brief Retrieves the kernel information. * * \param kernel specifies the kernel object being queried. * * \param device identifies a specific device in the list of devices associated * with \a kernel. The list of devices is the list of devices in the OpenCL * context that is associated with \a kernel. If the list of devices associated * with kernel is a single device, \a device can be a NULL value. * * \param param_name specifies the information to query. * * \param param_value is a pointer to memory where the appropriate result * being queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size is used to specify the size in bytes of memory * pointed to by \a param_value. This size must be >= size of return type. * * \param param_value_size_ret returns the actual size in bytes of data copied * to \a param_value. If \a param_value_size_ret is NULL, it is ignored. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes * specified by \a param_value_size is < size of return type and * \a param_value is not NULL * - CL_INVALID_KERNEL if \a kernel is a not a valid program object */ extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelInfoAMD( cl_kernel /* kernel */, cl_device_id /* device */, cl_kernel_info_amd /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */ ) CL_API_SUFFIX__VERSION_1_0; #ifdef __cplusplus } /*extern "C"*/ #endif /*__cplusplus*/ #endif /*__CL_KERNEL_INFO_AMD_H*/ clr-rocm-5.7.1/opencl/amdocl/cl_memobj.cpp000066400000000000000000005643351450307266000204370ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. 
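/* A short usage sketch for the clGetKernelInfoAMD extension declared above,
 * following the usual OpenCL query convention: ask for the value size first,
 * then fetch the value. The size_t result type below is an assumption for
 * illustration; callers should size their buffer from the returned
 * param_value_size_ret rather than guessing. */
#include <stdio.h>
static void printUsedVgprs(cl_kernel kernel, cl_device_id device) {
  size_t size = 0;
  if (clGetKernelInfoAMD(kernel, device, CL_KERNELINFO_USED_VGPRS, 0, NULL, &size) !=
      CL_SUCCESS) {
    return;
  }
  size_t vgprs = 0;  // counter type assumed; the size query above is authoritative
  if (size == sizeof(vgprs) &&
      clGetKernelInfoAMD(kernel, device, CL_KERNELINFO_USED_VGPRS, sizeof(vgprs), &vgprs,
                         NULL) == CL_SUCCESS) {
    printf("kernel uses %zu VGPRs\n", vgprs);
  }
}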
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "cl_common.hpp" #include "platform/context.hpp" #include "platform/command.hpp" #include "platform/memory.hpp" #include #ifdef _WIN32 #include #include "cl_d3d9_amd.hpp" #include "cl_d3d10_amd.hpp" #include "cl_d3d11_amd.hpp" #endif //_WIN32 #include /*! \addtogroup API * @{ * * \addtogroup CL_MemObjs * * Memory objects are categorized into two types: buffer objects, and image * objects. A buffer object stores a one-dimensional collection of elements * whereas an image object is used to store a two- or three- dimensional * texture, frame-buffer or image. * * Elements of a buffer object can be a scalar data type (such as an int, * float), vector data type, or a user-defined structure. An image object is * used to represent a buffer that can be used as a texture or a frame-buffer. * The elements of an image object are selected from a list of predefined * image formats. The minimum number of elements in a memory object is one. * * @{ * * \addtogroup CL_CreatingBuffer * * @{ */ /*! \brief Helper function to validate cl_mem_flags * * chkReadWrite: true: check the flag CL_MEM_KERNEL_READ_AND_WRITE * false: don't check the falg CL_MEM_KERNEL_READ_AND_WRITE * \return true of flags are valid, otherwise - false */ static bool validateFlags(cl_mem_flags flags, bool chkReadWrite = false) { // check flags for validity cl_bitfield temp = flags & (CL_MEM_READ_WRITE | CL_MEM_WRITE_ONLY | CL_MEM_READ_ONLY); if (chkReadWrite) { temp |= (flags & CL_MEM_KERNEL_READ_AND_WRITE); } if (temp && !(CL_MEM_READ_WRITE == temp || CL_MEM_WRITE_ONLY == temp || (chkReadWrite && (CL_MEM_KERNEL_READ_AND_WRITE == temp || (CL_MEM_KERNEL_READ_AND_WRITE | CL_MEM_READ_WRITE) == temp)) || CL_MEM_READ_ONLY == temp)) { return false; } if ((flags & (CL_MEM_USE_HOST_PTR | CL_MEM_ALLOC_HOST_PTR)) == (CL_MEM_USE_HOST_PTR | CL_MEM_ALLOC_HOST_PTR)) { return false; } if ((flags & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR)) == (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR)) { return false; } if ((flags & CL_MEM_EXTERNAL_PHYSICAL_AMD) && (flags & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR | CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_WRITE | CL_MEM_READ_ONLY))) { return false; } if ((flags & CL_MEM_BUS_ADDRESSABLE_AMD) && (flags & (CL_MEM_USE_HOST_PTR | CL_MEM_ALLOC_HOST_PTR))) { return false; } return true; } /*! 
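/* A few concrete flag combinations and how the validateFlags helper above
 * treats them; this self-test is an editorial sketch that restates the rules,
 * not additional API surface. Requires <cassert>. */
static void validateFlagsSelfTest() {
  assert(validateFlags(CL_MEM_READ_WRITE));                              // one access flag: OK
  assert(validateFlags(0));                                              // no flags: OK
  assert(validateFlags(CL_MEM_KERNEL_READ_AND_WRITE, true));             // OK when checked for
  assert(!validateFlags(CL_MEM_READ_ONLY | CL_MEM_WRITE_ONLY));          // access flags are exclusive
  assert(!validateFlags(CL_MEM_USE_HOST_PTR | CL_MEM_ALLOC_HOST_PTR));   // host-ptr flags conflict
  assert(!validateFlags(CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR));    // host-ptr flags conflict
  assert(!validateFlags(CL_MEM_BUS_ADDRESSABLE_AMD | CL_MEM_ALLOC_HOST_PTR));  // AMD bus-addressable
}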
\brief Helper function to validate cl_image_desc * * \return true of cl_image_desc parameters are valid, otherwise - false * * image_type describes the image type and must be either CL_MEM_OBJECT_IMAGE1D, * CL_MEM_OBJECT_IMAGE1D_BUFFER, CL_MEM_OBJECT_IMAGE1D_ARRAY, * CL_MEM_OBJECT_IMAGE2D, CL_MEM_OBJECT_IMAGE2D_ARRAY or CL_MEM_OBJECT_IMAGE3D. * * image_width is the width of the image in pixels. For a 2D image and * image array, the image width must be <= CL_DEVICE_IMAGE2D_MAX_WIDTH. * For a 3D image, the image width must be <= CL_DEVICE_IMAGE3D_MAX_WIDTH. * For a 1D image buffer, the image width must be <= CL_DEVICE_IMAGE_MAX_BUFFER_SIZE. * For a 1D image and 1D image array, the image width must be * <= CL_DEVICE_IMAGE2D_MAX_WIDTH. * * image_height is height of the image in pixels. This is only used if * the image is a 2D, 3D or 2D image array. For a 2D image or image array, * the image height must be <= CL_DEVICE_IMAGE2D_MAX_HEIGHT. For a 3D image, * the image height must be <= CL_DEVICE_IMAGE3D_MAX_HEIGHT. * * image_depth is the depth of the image in pixels. This is only used if * the image is a 3D image and must be a value > 1 and * <= CL_DEVICE_IMAGE3D_MAX_DEPTH. * * image_array_size is the number of images in the image array. This is only * used if the image is a 1D or 2D image array. The values for * image_array_size, if specified, must be between 1 and * CL_DEVICE_IMAGE_MAX_ARRAY_SIZE. * * image_row_pitch is the scan-line pitch in bytes. This must be 0 if * host_ptr is NULL and can be either 0 or >= image_width * size of element in * bytes if host_ptr is not NULL. If host_ptr is not NULL and image_row_pitch = 0, * image_row_pitch is calculated as image_width * size of element in bytes. * If image_row_pitch is not 0, it must be a multiple of the image element * size in bytes. * * image_slice_pitch is the size in bytes of each 2D slice in the 3D image or * the size in bytes of each image in a 1D or 2D image array. This must be 0 * if host_ptr is NULL. If host_ptr is not NULL, image_slice_pitch can be either * 0 or >= image_row_pitch * image_height for a 2D image array or 3D image and * can be either 0 or >= image_row_pitch for a 1D image array. If host_ptr is * not NULL and image_slice_pitch = 0, image_slice_pitch is calculated as * image_row_pitch * image_height for a 2D image array or 3D image and * image_row_pitch for a 1D image array. If image_slice_pitch is not 0, it must * be a multiple of the image_row_pitch. * * num_mip_levels and num_samples must be 0. * * buffer refers to a valid buffer memory object if image_type is * CL_MEM_OBJECT_IMAGE1D_BUFFER. Otherwise it must be NULL. For a 1D image * buffer object, the image pixels are taken from the buffer object’s * data store. When the contents of a buffer object’s data store are modified, * those changes are reflected in the contents of the 1D image buffer object * and vice-versa at corresponding sychronization points. The image_width * size of element in bytes must be <= size of buffer object data store. 
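/* The pitch-defaulting rules spelled out in the comment above, condensed into
 * a sketch. elem_size is the image element size in bytes; the function names
 * are ours, for illustration only (the real defaulting happens inside
 * validateImageDescriptor below). */
static size_t default_row_pitch(size_t row_pitch, size_t width, size_t elem_size) {
  // A non-zero caller pitch must already be a multiple of elem_size.
  return (row_pitch != 0) ? row_pitch : width * elem_size;
}
static size_t default_slice_pitch(size_t slice_pitch, size_t row_pitch, size_t height,
                                  cl_mem_object_type type) {
  if (slice_pitch != 0) return slice_pitch;  // must be a multiple of row_pitch
  // 1D image arrays advance one row per image; 3D images and 2D arrays advance a full plane.
  return (type == CL_MEM_OBJECT_IMAGE1D_ARRAY) ? row_pitch : row_pitch * height;
}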
*/ static bool validateImageDescriptor(const std::vector& devices, const amd::Image::Format imageFormat, const cl_image_desc* desc, void* hostPtr, size_t& imageRowPitch, size_t& imageSlicePitch) { if (desc == NULL) { return false; } // Check if any device supports mipmaps bool mipMapSupport = false; for (auto& dev : devices) { if (dev->settings().checkExtension(ClKhrMipMapImage)) { mipMapSupport = true; break; } } // Check if any device can accept mipmaps if ((desc->num_mip_levels != 0) && (!mipMapSupport || (hostPtr != NULL))) { return false; } if (desc->num_samples != 0) { return false; } amd::Buffer* buffer = NULL; size_t elemSize = imageFormat.getElementSize(); bool imageBuffer = false; if (desc->image_type == CL_MEM_OBJECT_IMAGE1D_BUFFER || (desc->mem_object != NULL && desc->image_type == CL_MEM_OBJECT_IMAGE2D)) { if (desc->mem_object == NULL) { return false; } buffer = as_amd(desc->mem_object)->asBuffer(); if (buffer == NULL) { return false; } if ((desc->image_width * desc->image_height * elemSize) > buffer->getSize()) { return false; } imageBuffer = true; } else if (desc->mem_object != NULL) { return false; } imageRowPitch = desc->image_row_pitch; imageSlicePitch = desc->image_slice_pitch; switch (desc->image_type) { case CL_MEM_OBJECT_IMAGE3D: case CL_MEM_OBJECT_IMAGE2D_ARRAY: case CL_MEM_OBJECT_IMAGE1D_ARRAY: // check slice pitch if (hostPtr == NULL) { if (imageSlicePitch != 0) { return false; } } // Fall through to process pitch... case CL_MEM_OBJECT_IMAGE2D: case CL_MEM_OBJECT_IMAGE1D: // check row pitch rules if (hostPtr == NULL && !imageBuffer) { if (imageRowPitch != 0) { return false; } } else if (imageRowPitch != 0) { if ((imageRowPitch < desc->image_width * elemSize) || ((imageRowPitch % elemSize) != 0)) { return false; } } if (imageRowPitch == 0) { if (desc->mem_object != nullptr) { imageRowPitch = amd::alignUp(desc->image_width, devices[0]->info().imagePitchAlignment_) * elemSize; } else { imageRowPitch = desc->image_width * elemSize; } } break; case CL_MEM_OBJECT_IMAGE1D_BUFFER: break; default: return false; break; } // Extra slice validation for three dimensional images if ((desc->image_type == CL_MEM_OBJECT_IMAGE3D) || (desc->image_type == CL_MEM_OBJECT_IMAGE2D_ARRAY)) { if (imageSlicePitch != 0) { if ((imageSlicePitch < (imageRowPitch * desc->image_height)) || ((imageSlicePitch % imageRowPitch) != 0)) { return false; } } if (imageSlicePitch == 0) { imageSlicePitch = imageRowPitch * desc->image_height; } } else if (desc->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) { if (imageSlicePitch != 0) { if ((imageSlicePitch % imageRowPitch) != 0) { return false; } } if (imageSlicePitch == 0) { imageSlicePitch = imageRowPitch; } } return true; } class ImageViewRef : public amd::EmbeddedObject { private: amd::Image* ref_; // Do not copy image view references. ImageViewRef& operator=(const ImageViewRef& sref); public: explicit ImageViewRef() : ref_(NULL) {} ~ImageViewRef() { if (ref_ != NULL) { ref_->release(); } } ImageViewRef& operator=(amd::Image* sref) { ref_ = sref; return *this; } amd::Image* operator()() const { return ref_; } }; /*! \brief Create a buffer object. * * \param context is a valid OpenCL context used to create the buffer object. * * \param flags is a bit-field that is used to specify allocation and usage * information such as the memory arena that should be used to allocate the * buffer object and how it will be used. * * \param size is the size in bytes of the buffer memory object to be * allocated. 
* * \param host_ptr is a pointer to the buffer data that may already be * allocated by the application. The size of the buffer that host_ptr points * to must be >= \a size bytes. Passing in a pointer to an already allocated * buffer on the host and using it as a buffer object allows applications to * share data efficiently with kernels and the host. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A valid non-zero buffer object and \a errcode_ret is set to * CL_SUCCESS if the buffer object is created successfully or a NULL value * with one of the following error values returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a context is not a valid context. * - CL_INVALID_VALUE if values specified in \a flags are not valid. * - CL_INVALID_BUFFER_SIZE if \a size is 0 or is greater than * CL_DEVICE_MAX_MEM_ALLOC_SIZE value. * - CL_INVALID_HOST_PTR if host_ptr is NULL and CL_MEM_USE_HOST_PTR or * CL_MEM_COPY_HOST_PTR are set in \a flags or if \a host_ptr is not NULL but * CL_MEM_COPY_HOST_PTR or CL_MEM_USE_HOST_PTR are not set in \a flags. * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory * for buffer object. * - CL_INVALID_OPERATION if the buffer object cannot be created for all * devices in \a context. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY_RET(cl_mem, clCreateBuffer, (cl_context context, cl_mem_flags flags, size_t size, void* host_ptr, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; return NULL; } // check flags for validity if (!validateFlags(flags)) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter \"flags\""); return (cl_mem)0; } // check size if (size == 0) { *not_null(errcode_ret) = CL_INVALID_BUFFER_SIZE; LogWarning("invalid parameter \"size = 0\""); return (cl_mem)0; } const std::vector<amd::Device*>& devices = as_amd(context)->devices(); bool sizePass = false; for (auto& dev : devices) { if ((dev->info().maxMemAllocSize_ >= size) || (flags & (CL_MEM_USE_HOST_PTR | CL_MEM_ALLOC_HOST_PTR))) { sizePass = true; break; } } if (!sizePass) { *not_null(errcode_ret) = CL_INVALID_BUFFER_SIZE; LogWarning("invalid parameter \"size\""); return (cl_mem)0; } // check host_ptr consistency if (host_ptr == NULL) { if (flags & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR | CL_MEM_EXTERNAL_PHYSICAL_AMD)) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter \"host_ptr\""); return (cl_mem)0; } } else { if (!(flags & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR | CL_MEM_EXTERNAL_PHYSICAL_AMD))) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter \"host_ptr\""); return (cl_mem)0; } if (flags & CL_MEM_EXTERNAL_PHYSICAL_AMD) { flags |= CL_MEM_WRITE_ONLY; cl_bus_address_amd* bus_address = reinterpret_cast<cl_bus_address_amd*>(host_ptr); if (bus_address->surface_bus_address == 0) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter \"surface bus address\""); return static_cast<cl_mem>(NULL); } if (bus_address->surface_bus_address & (amd::Os::pageSize() - 1)) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter \"surface bus address\""); return static_cast<cl_mem>(NULL); } if (bus_address->marker_bus_address == 0) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter \"marker bus address\""); return static_cast<cl_mem>(NULL); } if (bus_address->marker_bus_address &
(amd::Os::pageSize() - 1)) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter \"marker bus address\""); return static_cast<cl_mem>(NULL); } } } // check extensions flag consistency if ((flags & CL_MEM_USE_PERSISTENT_MEM_AMD) && (flags & (CL_MEM_USE_HOST_PTR | CL_MEM_ALLOC_HOST_PTR | CL_MEM_EXTERNAL_PHYSICAL_AMD | CL_MEM_BUS_ADDRESSABLE_AMD))) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("conflicting flags CL_MEM_USE_PERSISTENT_MEM_AMD and host memory specific flags"); return (cl_mem)0; } if ((flags & CL_MEM_EXTERNAL_PHYSICAL_AMD) || (flags & CL_MEM_BUS_ADDRESSABLE_AMD)) { size = (size + (amd::Os::pageSize() - 1)) & (~(amd::Os::pageSize() - 1)); } amd::Context& amdContext = *as_amd(context); amd::Memory* mem = NULL; // check if the ptr is in the SVM space; if so, return an SVM buffer amd::Memory* svmMem = amd::MemObjMap::FindMemObj(host_ptr); if ((NULL != svmMem) && (flags & CL_MEM_USE_HOST_PTR)) { size_t svmSize = svmMem->getSize(); size_t offset = static_cast<address>
(host_ptr) - static_cast<address>
(svmMem->getSvmPtr()); if (size + offset > svmSize) { LogWarning("invalid parameter \"size\""); return (cl_mem)0; } mem = new (amdContext) amd::Buffer(*svmMem, flags, offset, size); svmMem->setHostMem(host_ptr); } else { mem = new (amdContext) amd::Buffer(amdContext, flags, size); } if (mem == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_mem)0; } if (!mem->create(host_ptr)) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; mem->release(); return NULL; } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(mem); } RUNTIME_EXIT RUNTIME_ENTRY_RET(cl_mem, clCreateSubBuffer, (cl_mem mem, cl_mem_flags flags, cl_buffer_create_type buffer_create_type, const void* buffer_create_info, cl_int* errcode_ret)) { if (!is_valid(mem) || as_amd(mem)->asBuffer() == NULL) { *not_null(errcode_ret) = CL_INVALID_MEM_OBJECT; return NULL; } amd::Buffer& buffer = *as_amd(mem)->asBuffer(); // check flags for validity if (!validateFlags(flags) || (buffer_create_type != CL_BUFFER_CREATE_TYPE_REGION)) { *not_null(errcode_ret) = CL_INVALID_VALUE; return NULL; } if (buffer.getMemFlags() & (CL_MEM_EXTERNAL_PHYSICAL_AMD | CL_MEM_BUS_ADDRESSABLE_AMD)) { *not_null(errcode_ret) = CL_INVALID_VALUE; return NULL; } const cl_buffer_region* region = (const cl_buffer_region*)buffer_create_info; // Check sub buffer offset alignment bool alignmentPass = false; const std::vector<amd::Device*>& devices = buffer.getContext().devices(); for (auto& dev : devices) { cl_uint deviceAlignmentBytes = dev->info().memBaseAddrAlign_ >> 3; if (region->origin == amd::alignDown(region->origin, deviceAlignmentBytes)) { alignmentPass = true; } } // Return an error if the offset is misaligned on all devices if (!alignmentPass) { *not_null(errcode_ret) = CL_MISALIGNED_SUB_BUFFER_OFFSET; return NULL; } // check size if ((region->size == 0) || (region->origin + region->size) > buffer.getSize()) { *not_null(errcode_ret) = CL_INVALID_BUFFER_SIZE; return NULL; } amd::Memory* subBuffer = new (buffer.getContext()) amd::Buffer(buffer, (flags) ? flags : buffer.getMemFlags(), region->origin, region->size); if (subBuffer == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return NULL; } if (!subBuffer->create(NULL)) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; subBuffer->release(); return NULL; } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(subBuffer); } RUNTIME_EXIT /*! @} * \addtogroup CL_ReadWriteBuffer * @{ */ /*! \brief Enqueue a command to read from a buffer object to host memory. * * \param command_queue refers to the command-queue in which the read / write * command will be queued. \a command_queue and \a buffer must be created with * the same OpenCL context. * * \param buffer refers to a valid buffer object. * * \param blocking_read indicates if the read operation is blocking or * nonblocking. If \a blocking_read is CL_TRUE i.e. the read command is * blocking, clEnqueueReadBuffer does not return until the buffer data has been * read and copied into memory pointed to by ptr. * If \a blocking_read is CL_FALSE i.e. the read command is non-blocking, * clEnqueueReadBuffer queues a non-blocking read command and returns. The * contents of the buffer that ptr points to cannot be used until the read * command has completed. The \a event argument returns an event object which * can be used to query the execution status of the read command. When the read * command has completed, the contents of the buffer that ptr points to can be * used by the application. * * \param offset is the offset in bytes in the buffer object to read from or * write to.
* * \param cb is the size in bytes of data being read or written. * * \param ptr is the pointer to buffer in host memory where data is to be read * into or to be written from. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, * then this particular command does not wait on any event to complete. * If \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular read * command and can be used to query or queue a wait for this particular command * to complete. \a event can be NULL in which case it will not be possible for * the application to query the status of this command or queue a wait for this * command to complete. * * \return CL_SUCCESS if the function is executed successfully. Otherwise it * returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if the context associated with \a command_queue and * \a buffer are not the same. * - CL_INVALID_MEM_OBJECT if \a buffer is not a valid buffer object. * - CL_INVALID_VALUE if the region being read or written specified by (offset, * cb) is out of bounds or if \a ptr is a NULL value. * - CL_INVALID_OPERATION if \a clEnqueueReadBuffer is called on buffer which * has been created with CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and \a * num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. 
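*
* A minimal usage sketch (the command queue, the buffer object and its
* contents are assumed to be set up elsewhere; a blocking read copies \a cb
* bytes starting at \a offset into the host array):
*
* \code
* float results[256];
* cl_int err = clEnqueueReadBuffer(queue, buf, CL_TRUE,      // blocking read
*                                  0, sizeof(results), results,
*                                  0, NULL, NULL);           // no wait list
* \endcode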
* * \version 1.2r07 */ RUNTIME_ENTRY(cl_int, clEnqueueReadBuffer, (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t cb, void* ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(buffer)) { return CL_INVALID_MEM_OBJECT; } amd::Buffer* srcBuffer = as_amd(buffer)->asBuffer(); if (srcBuffer == NULL) { return CL_INVALID_MEM_OBJECT; } if (srcBuffer->getMemFlags() & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS)) { return CL_INVALID_OPERATION; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != srcBuffer->getContext()) { return CL_INVALID_CONTEXT; } if (ptr == NULL) { return CL_INVALID_VALUE; } amd::Coord3D srcOffset(offset, 0, 0); amd::Coord3D srcSize(cb, 1, 1); if (!srcBuffer->validateRegion(srcOffset, srcSize)) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::CopyMetadata copyMetadata(!blocking_read, amd::CopyMetadata::CopyEnginePreference::SDMA); amd::ReadMemoryCommand* command = new amd::ReadMemoryCommand( hostQueue, CL_COMMAND_READ_BUFFER, eventWaitList, *srcBuffer, srcOffset, srcSize, ptr, 0, 0, copyMetadata); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); if (blocking_read) { command->awaitCompletion(); } *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueue a command to write to a buffer object from host memory. * * \param command_queue refers to the command-queue in which the read / write * command will be queued. \a command_queue and \a buffer must be created with * the same OpenCL context. * * \param buffer refers to a valid buffer object. * * \param blocking_write indicates if the write operation is blocking or * non-blocking. If \a blocking_write is CL_TRUE, the OpenCL implementation * copies the data referred to by \a ptr and enqueues the write operation in * the command-queue. The memory pointed to by \a ptr can be reused by the * application after the clEnqueueWriteBuffer call returns. If * \a blocking_write is CL_FALSE, the OpenCL implementation will use \a ptr to * perform a nonblocking write. As the write is non-blocking the implementation * can return immediately. The memory pointed to by \a ptr cannot be reused by * the application after the call returns. The \a event argument returns an * event object which can be used to query the execution status of the write * command. When the write command has completed, the memory pointed to by * \a ptr can then be reused by the application * * \param offset is the offset in bytes in the buffer object to read from or * write to. * * \param cb is the size in bytes of data being read or written. * * \param ptr is the pointer to buffer in host memory where data is to be read * into or to be written from. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. 
If \a event_wait_list is NULL, * then this particular command does not wait on any event to complete. * If \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular write * command and can be used to query or queue a wait for this particular command * to complete. \a event can be NULL in which case it will not be possible for * the application to query the status of this command or queue a wait for this * command to complete. * * \return CL_SUCCESS if the function is executed successfully. Otherwise it * returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if the context associated with \a command_queue and * \a buffer are not the same. * - CL_INVALID_MEM_OBJECT if \a buffer is not a valid buffer object. * - CL_INVALID_VALUE if the region being read or written specified by (offset, * cb) is out of bounds or if \a ptr is a NULL value. * - CL_INVALID_OPERATION if \a clEnqueueWriteBuffer is called on buffer which * has been created with CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and \a * num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueWriteBuffer, (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t cb, const void* ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(buffer)) { return CL_INVALID_MEM_OBJECT; } amd::Buffer* dstBuffer = as_amd(buffer)->asBuffer(); if (dstBuffer == NULL) { return CL_INVALID_MEM_OBJECT; } if (dstBuffer->getMemFlags() & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS)) { return CL_INVALID_OPERATION; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != dstBuffer->getContext()) { return CL_INVALID_CONTEXT; } if (ptr == NULL) { return CL_INVALID_VALUE; } amd::Coord3D dstOffset(offset, 0, 0); amd::Coord3D dstSize(cb, 1, 1); if (!dstBuffer->validateRegion(dstOffset, dstSize)) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::CopyMetadata copyMetadata(!blocking_write, amd::CopyMetadata::CopyEnginePreference::SDMA); amd::WriteMemoryCommand* command = new amd::WriteMemoryCommand( hostQueue, CL_COMMAND_WRITE_BUFFER, eventWaitList, *dstBuffer, dstOffset, dstSize, ptr, 0, 0, copyMetadata); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); if (blocking_write) { command->awaitCompletion(); } *not_null(event) = 
as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueues a command to copy a buffer object to another buffer object. * * \param command_queue refers to the command-queue in which the copy command * will be queued. The OpenCL context associated with \a command_queue, * \a src_buffer and \a dst_buffer must be the same. * * \param src_buffer is the source buffer object. * * \param dst_buffer is the destination buffer object. * * \param src_offset refers to the offset where to begin reading data in * \a src_buffer. * * \param dst_offset refers to the offset where to begin copying data in * \a dst_buffer. * * \param cb refers to the size in bytes to copy. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, * then this particular command does not wait on any event to complete. * If \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular copy * command and can be used to query or queue a wait for this particular command * to complete. \a event can be NULL in which case it will not be possible for * the application to query the status of this command or queue a wait for * this command to complete. clEnqueueBarrier can be used instead. * * \return CL_SUCCESS if the function is executed successfully. Otherwise it * returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if the context associated with \a command_queue, * \a src_buffer and \a dst_buffer are not the same. * - CL_INVALID_MEM_OBJECT if \a src_buffer and \a dst_buffer are not valid * buffer objects. * - CL_INVALID_VALUE if \a src_offset, \a dst_offset, \a cb, \a src_offset + * \a cb or \a dst_offset + \a cb require accessing elements outside the * buffer memory objects. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events.
* - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueCopyBuffer, (cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, size_t src_offset, size_t dst_offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(src_buffer) || !is_valid(dst_buffer)) { return CL_INVALID_MEM_OBJECT; } amd::Buffer* srcBuffer = as_amd(src_buffer)->asBuffer(); amd::Buffer* dstBuffer = as_amd(dst_buffer)->asBuffer(); if (srcBuffer == NULL || dstBuffer == NULL) { return CL_INVALID_MEM_OBJECT; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != srcBuffer->getContext() || hostQueue.context() != dstBuffer->getContext()) { return CL_INVALID_CONTEXT; } amd::Coord3D srcOffset(src_offset, 0, 0); amd::Coord3D dstOffset(dst_offset, 0, 0); amd::Coord3D size(cb, 1, 1); if (!srcBuffer->validateRegion(srcOffset, size) || !dstBuffer->validateRegion(dstOffset, size)) { return CL_INVALID_VALUE; } if (srcBuffer == dstBuffer && ((src_offset <= dst_offset && dst_offset < src_offset + cb) || (dst_offset <= src_offset && src_offset < dst_offset + cb))) { return CL_MEM_COPY_OVERLAP; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::CopyMemoryCommand* command = new amd::CopyMemoryCommand(hostQueue, CL_COMMAND_COPY_BUFFER, eventWaitList, *srcBuffer, *dstBuffer, srcOffset, dstOffset, size); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief clEnqueueReadBufferRect enqueues commands to read a 2D or 3D rectangular * region from a buffer object to host memory. * * \param command_queue refers to the command-queue in which the read / write * command will be queued. command_queue and buffer must be created with the same * OpenCL context. buffer refers to a valid buffer object. * * \param blocking_read indicates if the read operations are blocking or * nonblocking. * If \a blocking_read is CL_TRUE i.e. the read command is blocking, * clEnqueueReadBufferRect does not return until the buffer data has been read * and copied into memory pointed to by ptr. * If blocking_read is CL_FALSE i.e. the read command is non-blocking, * clEnqueueReadBufferRect queues a non-blocking read command and returns. * The contents of the buffer that ptr points to cannot be used until * the read command has completed. The event argument returns an event object * which can be used to query the execution status of the read command. * When the read command has completed, the contents of the buffer that * ptr points to can be used by the application. * * \param buffer_origin defines the (x, y, z) offset in the memory region associated * with buffer. For a 2D rectangle region, the z value given by buffer_origin[2] * should be 0. The offset in bytes is computed as * buffer_origin[2] * buffer_slice_pitch + buffer_origin[1] * buffer_row_pitch + * buffer_origin[0].
* * \param host_origin defines the (x, y, z) offset in the memory region pointed to * by ptr. For a 2D rectangle region, the z value given by host_origin[2] * should be 0. The offset in bytes is computed as * host_origin[2] * host_slice_pitch + host_origin[1] * host_row_pitch + * host_origin[0]. * * \param region defines the (width, height, depth) in bytes of the 2D or 3D * rectangle being read or written. * For a 2D rectangle copy, the depth value given by region[2] should be 1. * * \param buffer_row_pitch is the length of each row in bytes to be used for * the memory region associated with buffer. If \a buffer_row_pitch is 0, * \a buffer_row_pitch is computed as region[0]. * * \param buffer_slice_pitch is the length of each 2D slice in bytes to be used * for the memory region associated with buffer. If \a buffer_slice_pitch is 0, * \a buffer_slice_pitch is computed as region[1] * \a buffer_row_pitch. * * \param host_row_pitch is the length of each row in bytes to be used for * the memory region pointed to by ptr. If \a host_row_pitch is 0, \a host_row_pitch * is computed as region[0]. * * \param host_slice_pitch is the length of each 2D slice in bytes to be used * for the memory region pointed to by ptr. If \a host_slice_pitch is 0, * \a host_slice_pitch is computed as region[1] * \a host_row_pitch. * * \param ptr is the pointer to buffer in host memory where data is to be read into * or to be written from. * * \param event_wait_list and \a num_events_in_wait_list specify events that * need to complete before this particular command can be executed. * If \a event_wait_list is NULL, then this particular command does not wait on any * event to complete. If \a event_wait_list is NULL, \a num_events_in_wait_list * must be 0. If \a event_wait_list is not NULL, the list of events pointed to * by \a event_wait_list must be valid and \a num_events_in_wait_list * must be greater than 0. The events specified in \a event_wait_list act as * synchronization points. The context associated with events in * \a event_wait_list and \a command_queue must be the same. * * \param event returns an event object that identifies this particular * read / write command and can be used to query or queue a wait for this * particular command to complete. event can be NULL in which case it will not * be possible for the application to query the status of this command or queue a * wait for this command to complete. * * \return CL_SUCCESS if the function is executed successfully. Otherwise, * it returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if the context associated with command_queue and * buffer are not the same or if the context associated with \a command_queue * and events in event_wait_list are not the same. * - CL_INVALID_MEM_OBJECT if buffer is not a valid buffer object. * - CL_INVALID_VALUE if the region being read or written specified by * (buffer_origin, region) is out of bounds. * - CL_INVALID_VALUE if ptr is a NULL value. * - CL_INVALID_OPERATION if \a clEnqueueReadBufferRect is called on buffer which * has been created with CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS. * - CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and * \a num_events_in_wait_list > 0, or event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events.
* - CL_MISALIGNED_SUB_BUFFER_OFFSET if buffer is a sub-buffer object and offset * specified when the sub-buffer object is created is not aligned to * CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated with queue. * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory * for data store associated with buffer. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host. * * \version 1.2r07 */ RUNTIME_ENTRY(cl_int, clEnqueueReadBufferRect, (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, const size_t* buffer_origin, const size_t* host_origin, const size_t* region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, void* ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { // Validate command queue if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } // Validate opencl buffer if (!is_valid(buffer)) { return CL_INVALID_MEM_OBJECT; } amd::Buffer* srcBuffer = as_amd(buffer)->asBuffer(); if (srcBuffer == NULL) { return CL_INVALID_MEM_OBJECT; } if (srcBuffer->getMemFlags() & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS)) { return CL_INVALID_OPERATION; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != srcBuffer->getContext()) { return CL_INVALID_CONTEXT; } // Make sure we have a valid system memory pointer if (ptr == NULL) { return CL_INVALID_VALUE; } // Create buffer rectangle info structure amd::BufferRect bufRect; amd::BufferRect hostRect; if (!bufRect.create(buffer_origin, region, buffer_row_pitch, buffer_slice_pitch) || !hostRect.create(host_origin, region, host_row_pitch, host_slice_pitch)) { return CL_INVALID_VALUE; } amd::Coord3D srcStart(bufRect.start_, 0, 0); amd::Coord3D srcEnd(bufRect.end_, 1, 1); if (!srcBuffer->validateRegion(srcStart, srcEnd)) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::Coord3D size(region[0], region[1], region[2]); amd::CopyMetadata copyMetadata(!blocking_read, amd::CopyMetadata::CopyEnginePreference::SDMA); amd::ReadMemoryCommand* command = new amd::ReadMemoryCommand(hostQueue, CL_COMMAND_READ_BUFFER_RECT, eventWaitList, *srcBuffer, srcStart, size, ptr, bufRect, hostRect, copyMetadata); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); if (blocking_read) { command->awaitCompletion(); } *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief clEnqueueWriteBufferRect enqueues commands to write a 2D or 3D * rectangular region to a buffer object from host memory. * * \param command_queue refers to the command-queue in which the read / write * command will be queued. command_queue and buffer must be created with the same * OpenCL context. buffer refers to a valid buffer object. * * \param blocking_write indicates if the write operations are blocking or * nonblocking.
* If \a blocking_write is CL_TRUE, the OpenCL implementation copies the data * referred to by ptr and enqueues the write operation in the command-queue. * The memory pointed to by ptr can be reused by the application after * the clEnqueueWriteBufferRect call returns. * If \a blocking_write is CL_FALSE, the OpenCL implementation will use ptr to * perform a nonblocking write. As the write is non-blocking the implementation * can return immediately. The memory pointed to by ptr cannot be reused by * the application after the call returns. The event argument returns * an event object which can be used to query the execution status of the write * command. When the write command has completed, the memory pointed to by ptr * can then be reused by the application. * * \param buffer_origin defines the (x, y, z) offset in the memory region associated * with buffer. For a 2D rectangle region, the z value given by buffer_origin[2] * should be 0. The offset in bytes is computed as * buffer_origin[2] * buffer_slice_pitch + buffer_origin[1] * buffer_row_pitch + * buffer_origin[0]. * * \param host_origin defines the (x, y, z) offset in the memory region pointed to * by ptr. For a 2D rectangle region, the z value given by host_origin[2] * should be 0. The offset in bytes is computed as * host_origin[2] * host_slice_pitch + host_origin[1] * host_row_pitch + * host_origin[0]. * * \param region defines the (width, height, depth) in bytes of the 2D or 3D * rectangle being read or written. * For a 2D rectangle copy, the depth value given by region[2] should be 1. * * \param buffer_row_pitch is the length of each row in bytes to be used for * the memory region associated with buffer. If \a buffer_row_pitch is 0, * \a buffer_row_pitch is computed as region[0]. * * \param buffer_slice_pitch is the length of each 2D slice in bytes to be used * for the memory region associated with buffer. If \a buffer_slice_pitch is 0, * \a buffer_slice_pitch is computed as region[1] * \a buffer_row_pitch. * * \param host_row_pitch is the length of each row in bytes to be used for * the memory region pointed to by ptr. If \a host_row_pitch is 0, \a host_row_pitch * is computed as region[0]. * * \param host_slice_pitch is the length of each 2D slice in bytes to be used * for the memory region pointed to by ptr. If \a host_slice_pitch is 0, * \a host_slice_pitch is computed as region[1] * \a host_row_pitch. * * \param ptr is the pointer to buffer in host memory where data is to be read into * or to be written from. * * \param event_wait_list and \a num_events_in_wait_list specify events that * need to complete before this particular command can be executed. * If \a event_wait_list is NULL, then this particular command does not wait on any * event to complete. If \a event_wait_list is NULL, \a num_events_in_wait_list * must be 0. If \a event_wait_list is not NULL, the list of events pointed to * by \a event_wait_list must be valid and \a num_events_in_wait_list * must be greater than 0. The events specified in \a event_wait_list act as * synchronization points. The context associated with events in * \a event_wait_list and \a command_queue must be the same. * * \param event returns an event object that identifies this particular * read / write command and can be used to query or queue a wait for this * particular command to complete. event can be NULL in which case it will not * be possible for the application to query the status of this command or queue a * wait for this command to complete.
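*
* As a sketch (hypothetical sizes; the command queue, buffer and host_data
* are assumed to exist), writing a 64 x 32 pixel region of 4-byte pixels from
* a tightly packed host array into a buffer laid out with a 1024-byte row
* pitch:
*
* \code
* size_t buffer_origin[3] = {0, 0, 0};
* size_t host_origin[3]   = {0, 0, 0};
* size_t region[3]        = {64 * 4, 32, 1};  // width in bytes, height, depth
* cl_int err = clEnqueueWriteBufferRect(queue, buf, CL_TRUE,
*                                       buffer_origin, host_origin, region,
*                                       1024, 0,   // buffer row/slice pitch
*                                       0, 0,      // host pitches: packed
*                                       host_data, 0, NULL, NULL);
* \endcode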
* * \return CL_SUCCESS if the function is executed successfully. Otherwise, * it returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if the context associated with command_queue and * buffer are not the same or if the context associated with \a command_queue * and events in event_wait_list are not the same. * - CL_INVALID_MEM_OBJECT if buffer is not a valid buffer object. * - CL_INVALID_VALUE if the region being read or written specified by * (buffer_origin, region) is out of bounds. * - CL_INVALID_VALUE if ptr is a NULL value. * - CL_INVALID_OPERATION if \a clEnqueueWriteBufferRect is called on buffer * which has been created with CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS. * - CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and * \a num_events_in_wait_list > 0, or event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_MISALIGNED_SUB_BUFFER_OFFSET if buffer is a sub-buffer object and offset * specified when the sub-buffer object is created is not aligned to * CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated with queue. * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory * for data store associated with buffer. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host. */ RUNTIME_ENTRY(cl_int, clEnqueueWriteBufferRect, (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, const size_t* buffer_origin, const size_t* host_origin, const size_t* region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, const void* ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(buffer)) { return CL_INVALID_MEM_OBJECT; } amd::Buffer* dstBuffer = as_amd(buffer)->asBuffer(); if (dstBuffer == NULL) { return CL_INVALID_MEM_OBJECT; } if (dstBuffer->getMemFlags() & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS)) { return CL_INVALID_OPERATION; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != dstBuffer->getContext()) { return CL_INVALID_CONTEXT; } if (ptr == NULL) { return CL_INVALID_VALUE; } // Create buffer rectangle info structure amd::BufferRect bufRect; amd::BufferRect hostRect; if (!bufRect.create(buffer_origin, region, buffer_row_pitch, buffer_slice_pitch) || !hostRect.create(host_origin, region, host_row_pitch, host_slice_pitch)) { return CL_INVALID_VALUE; } amd::Coord3D dstStart(bufRect.start_, 0, 0); amd::Coord3D dstEnd(bufRect.end_, 1, 1); if (!dstBuffer->validateRegion(dstStart, dstEnd)) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::Coord3D size(region[0], region[1], region[2]); amd::CopyMetadata copyMetadata(!blocking_write, amd::CopyMetadata::CopyEnginePreference::SDMA); amd::WriteMemoryCommand* command = new amd::WriteMemoryCommand(hostQueue,
CL_COMMAND_WRITE_BUFFER_RECT, eventWaitList, *dstBuffer, dstStart, size, ptr, bufRect, hostRect, copyMetadata); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); if (blocking_write) { command->awaitCompletion(); } *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueues a command to copy a 2D or 3D rectangular region from * the buffer object identified by \a src_buffer to a 2D or 3D region * in the buffer object identified by \a dst_buffer. * * \param command_queue refers to the command-queue in which the copy command * will be queued. The OpenCL context associated with command_queue, * \a src_buffer and \a dst_buffer must be the same. * * \param src_origin defines the (x, y, z) offset in the memory region * associated with \a src_buffer. For a 2D rectangle region, the z value given * by src_origin[2] should be 0. The offset in bytes is computed as * src_origin[2] * src_slice_pitch + src_origin[1] * src_row_pitch + src_origin[0]. * * \param dst_origin defines the (x, y, z) offset in the memory region * associated with \a dst_buffer. For a 2D rectangle region, the z value given * by dst_origin[2] should be 0. The offset in bytes is computed as * dst_origin[2] * dst_slice_pitch + dst_origin[1] * dst_row_pitch + dst_origin[0]. * * \param region defines the (width, height, depth) in bytes of the 2D or 3D * rectangle being copied. For a 2D rectangle, the depth value given by * region[2] should be 1. * * \param src_row_pitch is the length of each row in bytes to be used for * the memory region associated with src_buffer. If src_row_pitch is 0, * src_row_pitch is computed as region[0]. * * \param src_slice_pitch is the length of each 2D slice in bytes to be used * for the memory region associated with src_buffer. If src_slice_pitch is 0, * src_slice_pitch is computed as region[1] * src_row_pitch. * * \param dst_row_pitch is the length of each row in bytes to be used for * the memory region associated with dst_buffer. If dst_row_pitch is 0, * dst_row_pitch is computed as region[0]. * * \param dst_slice_pitch is the length of each 2D slice in bytes to be used * for the memory region associated with dst_buffer. If dst_slice_pitch is 0, * dst_slice_pitch is computed as region[1] * dst_row_pitch. * * \param event_wait_list and num_events_in_wait_list specify events that * need to complete before this particular command can be executed. * If event_wait_list is NULL, then this particular command does not wait on * any event to complete. If event_wait_list is NULL, num_events_in_wait_list * must be 0. If event_wait_list is not NULL, the list of events pointed to by * event_wait_list must be valid and num_events_in_wait_list must be greater * than 0. The events specified in event_wait_list act as synchronization * points. The context associated with events in event_wait_list and * command_queue must be the same. * * \param event returns an event object that identifies this particular copy * command and can be used to query or queue a wait for this particular * command to complete. event can be NULL in which case it will not be * possible for the application to query the status of this command or queue * a wait for this command to complete. clEnqueueBarrier can be used instead. * * \return CL_SUCCESS if the function is executed successfully.
Otherwise, * it returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if the context associated with command_queue, * \a src_buffer and \a dst_buffer are not the same or if the context * associated with \a command_queue and events in \a event_wait_list are not the same. * - CL_INVALID_MEM_OBJECT if \a src_buffer and \a dst_buffer are not valid * buffer objects. * - CL_INVALID_VALUE if (\a src_offset, \a region) or (\a dst_offset, * \a region) require accessing elements outside the \a src_buffer and * \a dst_buffer buffer objects respectively. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in * \a event_wait_list are not valid events. * - CL_MEM_COPY_OVERLAP if \a src_buffer and \a dst_buffer are the same * buffer object and the source and destination regions overlap. * - CL_MISALIGNED_SUB_BUFFER_OFFSET if \a src_buffer is a sub-buffer object * and offset specified when the sub-buffer object is created is * not aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device * associated with queue. * - CL_MISALIGNED_SUB_BUFFER_OFFSET if dst_buffer is a sub-buffer object * and offset specified when the sub-buffer object is created is not * aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated * with queue. * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate * memory for data store associated with src_buffer or dst_buffer. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources * required by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host * */ RUNTIME_ENTRY(cl_int, clEnqueueCopyBufferRect, (cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, const size_t* src_origin, const size_t* dst_origin, const size_t* region, size_t src_row_pitch, size_t src_slice_pitch, size_t dst_row_pitch, size_t dst_slice_pitch, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(src_buffer) || !is_valid(dst_buffer)) { return CL_INVALID_MEM_OBJECT; } amd::Buffer* srcBuffer = as_amd(src_buffer)->asBuffer(); amd::Buffer* dstBuffer = as_amd(dst_buffer)->asBuffer(); if (srcBuffer == NULL || dstBuffer == NULL) { return CL_INVALID_MEM_OBJECT; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != srcBuffer->getContext() || hostQueue.context() != dstBuffer->getContext()) { return CL_INVALID_CONTEXT; } // Create buffer rectangle info structure amd::BufferRect srcRect; amd::BufferRect dstRect; if (!srcRect.create(src_origin, region, src_row_pitch, src_slice_pitch) || !dstRect.create(dst_origin, region, dst_row_pitch, dst_slice_pitch)) { return CL_INVALID_VALUE; } amd::Coord3D srcStart(srcRect.start_, 0, 0); amd::Coord3D dstStart(dstRect.start_, 0, 0); amd::Coord3D srcEnd(srcRect.end_, 1, 1); amd::Coord3D dstEnd(dstRect.end_, 1, 1); if (!srcBuffer->validateRegion(srcStart, srcEnd) || !dstBuffer->validateRegion(dstStart, dstEnd)) { return CL_INVALID_VALUE; } // Check if regions overlap each other if ((srcBuffer == dstBuffer) && (std::abs(static_cast<long>(src_origin[0]) - static_cast<long>(dst_origin[0])) <
static_cast<long>(region[0])) && (std::abs(static_cast<long>(src_origin[1]) - static_cast<long>(dst_origin[1])) < static_cast<long>(region[1])) && (std::abs(static_cast<long>(src_origin[2]) - static_cast<long>(dst_origin[2])) < static_cast<long>(region[2]))) { return CL_MEM_COPY_OVERLAP; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::Coord3D size(region[0], region[1], region[2]); amd::CopyMemoryCommand* command = new amd::CopyMemoryCommand(hostQueue, CL_COMMAND_COPY_BUFFER_RECT, eventWaitList, *srcBuffer, *dstBuffer, srcStart, dstStart, size, srcRect, dstRect); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! @} * \addtogroup CL_MemoryCallback * @{ */ /*! \brief Registers a user callback function that will be called when the * memory object is deleted and its resources freed. * * Each call to clSetMemObjectDestructorCallback registers the specified user * callback function on a callback stack associated with memobj. The registered * user callback functions are called in the reverse order in which they were * registered. The user callback functions are called and then the memory * object’s resources are freed and the memory object is deleted. * This provides a mechanism for the application (and libraries) using memobj * to be notified when the memory referenced by host_ptr, specified when * the memory object is created and used as the storage bits for the memory * object, can be reused or freed. * * \a memobj is a valid memory object. * \a pfn_notify is the callback function that can be registered by the * application. This callback function may be called asynchronously by the * OpenCL implementation. It is the application’s responsibility to ensure * that the callback function is thread-safe. The parameters to this callback * function are: * - memobj is the memory object being deleted. * - user_data is a pointer to user supplied data. * \a pfn_notify must not be NULL. * \a user_data will be passed as the user_data argument when pfn_notify is * called. user_data can be NULL. * * \return CL_SUCCESS if the function is executed successfully. Otherwise it * returns one of the following errors: * - CL_INVALID_MEM_OBJECT if memobj is not a valid memory object. * - CL_INVALID_VALUE if \a pfn_notify is NULL. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host. * * NOTE: When the user callback function is called by the implementation, the * contents of the memory region pointed to by host_ptr (if the memory object is * created with CL_MEM_USE_HOST_PTR) are undefined. The callback function is * typically used by the application to either free or reuse the memory region * pointed to by host_ptr. The behavior of calling expensive system routines, * OpenCL API calls to create contexts or command-queues, or blocking OpenCL * operations in a callback is undefined.
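*
* A minimal registration sketch (the callback name and the freed pointer are
* hypothetical; the callback may run asynchronously and must be thread-safe):
*
* \code
* void CL_CALLBACK on_mem_destroy(cl_mem memobj, void* user_data) {
*   free(user_data);  // e.g. release the host_ptr backing a USE_HOST_PTR buffer
* }
* ...
* clSetMemObjectDestructorCallback(buf, on_mem_destroy, host_ptr);
* \endcode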
* * \version 1.1r17 */ RUNTIME_ENTRY(cl_int, clSetMemObjectDestructorCallback, (cl_mem memobj, void(CL_CALLBACK* pfn_notify)(cl_mem memobj, void* user_data), void* user_data)) { if (!is_valid(memobj)) { return CL_INVALID_MEM_OBJECT; } if (pfn_notify == NULL) { return CL_INVALID_VALUE; } if (!as_amd(memobj)->setDestructorCallback(pfn_notify, user_data)) { return CL_OUT_OF_HOST_MEMORY; } return CL_SUCCESS; } RUNTIME_EXIT /*! @} * \addtogroup CL_RetRelMemory * @{ */ /*! \brief Increment the \a memobj reference count. * * \return CL_SUCCESS if the function is executed successfully or * CL_INVALID_MEM_OBJECT if \a memobj is not a valid memory object. * * clCreateBuffer and clCreateImage{2D|3D} perform an implicit retain. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clRetainMemObject, (cl_mem memobj)) { if (!is_valid(memobj)) { return CL_INVALID_MEM_OBJECT; } as_amd(memobj)->retain(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Decrement the \a memobj reference count. * * After the \a memobj reference count becomes zero and commands queued for * execution on a command-queue(s) that use \a memobj have finished, the * memory object is deleted. * * \return CL_SUCCESS if the function is executed successfully or * CL_INVALID_MEM_OBJECT if \a memobj is not a valid memory object. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clReleaseMemObject, (cl_mem memobj)) { if (!is_valid(memobj)) { return CL_INVALID_MEM_OBJECT; } as_amd(memobj)->release(); return CL_SUCCESS; } RUNTIME_EXIT /*! @} * \addtogroup CL_CreatingImage * @{ */ /*! \brief Create a (1D or 2D) image object. * * \param context is a valid OpenCL context on which the image object is to be * created. * * \param flags is a bit-field that is used to specify allocation and usage * information about the image memory object being created. * * \param image_format is a pointer to a structure that describes format * properties of the image to be allocated. * * \param image_width is the width of the image in pixels. Must be greater * than or equal to 1. * * \param image_height is the height of the image in pixels. Must be greater * than or equal to 1. * * \param image_row_pitch is the scan-line pitch in bytes. This must be 0 if * \a host_ptr is NULL and can be either 0 or >= \a image_width * size of * element in bytes if \a host_ptr is not NULL. If \a host_ptr is not NULL and * \a image_row_pitch = 0, \a image_row_pitch is calculated as * \a image_width * size of element in bytes. * * \param host_ptr is a pointer to the image data that may already be allocated * by the application. The size of the buffer that \a host_ptr points to must * be >= \a image_row_pitch * \a image_height. The size of each element in * bytes must be a power of 2. Passing in a pointer to an already allocated * buffer on the host and using it as a memory object allows applications to * share data efficiently with kernels and the host. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A valid non-zero image object and errcode_ret is set to CL_SUCCESS * if the image object is created successfully. It returns a NULL value with * one of the following error values returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a context is not a valid context. * - CL_INVALID_VALUE if values specified in \a flags are not valid. * - CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if values specified in \a image_format * are not valid or if \a image_format is NULL.
* - CL_INVALID_IMAGE_SIZE if \a image_width or \a image_height are 0 or if * they exceed values specified in CL_DEVICE_IMAGE2D_MAX_WIDTH or * CL_DEVICE_IMAGE2D_MAX_HEIGHT respectively or if values specified by * \a image_row_pitch do not follow rules described in the argument * description above. * - CL_INVALID_HOST_PTR if \a host_ptr is NULL and CL_MEM_USE_HOST_PTR or * CL_MEM_COPY_HOST_PTR are set in \a flags or if \a host_ptr is not NULL * but CL_MEM_COPY_HOST_PTR or CL_MEM_USE_HOST_PTR are not set in \a flags. * - CL_IMAGE_FORMAT_NOT_SUPPORTED if the \a image_format is not supported. * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory * for image object. * - CL_INVALID_OPERATION if the image object as specified by the * \a image_format, \a flags and dimensions cannot be created for all devices * in context that support images or if there are no devices in context that * support images. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY_RET(cl_mem, clCreateImage2D, (cl_context context, cl_mem_flags flags, const cl_image_format* image_format, size_t image_width, size_t image_height, size_t image_row_pitch, void* host_ptr, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter \"context\""); return (cl_mem)0; } // check flags for validity if (!validateFlags(flags)) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter \"flags\""); return (cl_mem)0; } // check format if (image_format == NULL) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("invalid parameter \"image_format\""); return (cl_mem)0; } const amd::Image::Format imageFormat(*image_format); if (!imageFormat.isValid()) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("invalid parameter \"image_format\""); return (cl_mem)0; } amd::Context& amdContext = *as_amd(context); if (!imageFormat.isSupported(amdContext)) { *not_null(errcode_ret) = CL_IMAGE_FORMAT_NOT_SUPPORTED; LogWarning("invalid parameter \"image_format\""); return (cl_mem)0; } // check size parameters if (image_width == 0 || image_height == 0) { *not_null(errcode_ret) = CL_INVALID_IMAGE_SIZE; LogWarning("invalid parameter \"image_width\" or \"image_height\""); return (cl_mem)0; } const std::vector<amd::Device*>& devices = as_amd(context)->devices(); bool supportPass = false; bool sizePass = false; for (auto& dev : devices) { if (dev->info().imageSupport_) { supportPass = true; if (dev->info().image2DMaxWidth_ >= image_width && dev->info().image2DMaxHeight_ >= image_height) { sizePass = true; break; } } } if (!supportPass) { *not_null(errcode_ret) = CL_INVALID_OPERATION; LogWarning("there are no devices in context to support images"); return (cl_mem)0; } if (!sizePass) { *not_null(errcode_ret) = CL_INVALID_IMAGE_SIZE; LogWarning("invalid parameter \"image_width\" or \"image_height\""); return (cl_mem)0; } // check row pitch rules if (host_ptr == NULL) { if (image_row_pitch) { *not_null(errcode_ret) = CL_INVALID_IMAGE_SIZE; LogWarning("invalid parameter \"image_row_pitch\""); return (cl_mem)0; } } else if (image_row_pitch) { size_t elemSize = imageFormat.getElementSize(); if ((image_row_pitch < image_width * elemSize) || (image_row_pitch % elemSize)) { *not_null(errcode_ret) = CL_INVALID_IMAGE_SIZE; LogWarning("invalid parameter \"image_row_pitch\""); return (cl_mem)0; } } // check host_ptr consistency if (host_ptr == NULL) { if (flags &
(CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR)) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter \"host_ptr\""); return (cl_mem)0; } } else { if (!(flags & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR))) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter \"host_ptr\""); return (cl_mem)0; } } // CL_IMAGE_FORMAT_NOT_SUPPORTED ??? if (image_row_pitch == 0) { image_row_pitch = image_width * imageFormat.getElementSize(); } amd::Image* image = new (amdContext) amd::Image(amdContext, CL_MEM_OBJECT_IMAGE2D, flags, imageFormat, image_width, image_height, 1, image_row_pitch, 0); if (image == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; LogWarning("cannot allocate resources"); return (cl_mem)0; } // CL_MEM_OBJECT_ALLOCATION_FAILURE if (!image->create(host_ptr)) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; image->release(); return (cl_mem)0; } *not_null(errcode_ret) = CL_SUCCESS; return (cl_mem)as_cl(image); } RUNTIME_EXIT /*! \brief Create a 3D image object. * * \param context is a valid OpenCL context on which the image object is to be * created. * * \param flags is a bit-field that is used to specify allocation and usage * information about the image memory object being created. * * \param image_format is a pointer to a structure that describes format * properties of the image to be allocated. * * \param image_width is the width of the image in pixels. Must be greater * than or equal to 1. * * \param image_height is the height of the image in pixels. Must be greater * than or equal to 1. * * \param image_depth is the depth of the image in pixels. This must be a * value > 1. * * \param image_row_pitch is the scan-line pitch in bytes. This must be 0 if * \a host_ptr is NULL and can be either 0 or >= \a image_width * size of * element in bytes if \a host_ptr is not NULL. If \a host_ptr is not NULL and * \a image_row_pitch = 0, \a image_row_pitch is calculated as * \a image_width * size of element in bytes. * * \param image_slice_pitch is the size in bytes of each 2D slice in the 3D * image. This must be 0 if \a host_ptr is NULL and can be either 0 or >= * \a image_row_pitch * \a image_height if \a host_ptr is not NULL. * If \a host_ptr is not NULL and \a image_slice_pitch = 0, * \a image_slice_pitch is calculated as \a image_row_pitch * \a image_height. * * \param host_ptr is a pointer to the image data that may already be allocated * by the application. The size of the buffer that \a host_ptr points to must * be >= \a image_row_pitch * \a image_height * \a image_depth. The size of * each element in bytes must be a power of 2. Passing in a pointer to an * already allocated buffer on the host and using it as a memory object allows * applications to share data efficiently with kernels and the host. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return valid non-zero image object created and the \a errcode_ret is set to * CL_SUCCESS if the image object is created successfully. It returns a NULL * value with one of the following error values returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a context is not a valid context. * - CL_INVALID_VALUE if values specified in \a flags are not valid. * - CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if values specified in \a image_format * are not valid or if \a image_format is NULL. 
* - CL_INVALID_IMAGE_SIZE if \a image_width, \a image_height or \a image_depth * are 0 or if they exceed values specified in CL_DEVICE_IMAGE3D_MAX_WIDTH, * CL_DEVICE_IMAGE3D_MAX_HEIGHT or CL_DEVICE_IMAGE3D_MAX_DEPTH respectively * or if values specified by \a image_row_pitch and \a image_slice_pitch do * not follow rules described in the argument description above. * - CL_INVALID_HOST_PTR if \a host_ptr is NULL and CL_MEM_USE_HOST_PTR or * CL_MEM_COPY_HOST_PTR are set in \a flags or if \a host_ptr is not NULL but * CL_MEM_COPY_HOST_PTR or CL_MEM_USE_HOST_PTR are not set in \a flags. * - CL_IMAGE_FORMAT_NOT_SUPPORTED if the \a image_format is not supported. * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory * for image object. * - CL_INVALID_OPERATION if the image object as specified by the * \a image_format, \a flags and dimensions cannot be created for all devices * in context that support images, or if there are no devices in context that * support images. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY_RET(cl_mem, clCreateImage3D, (cl_context context, cl_mem_flags flags, const cl_image_format* image_format, size_t image_width, size_t image_height, size_t image_depth, size_t image_row_pitch, size_t image_slice_pitch, void* host_ptr, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter \"context\""); return (cl_mem)0; } // check flags for validity if (!validateFlags(flags)) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter \"flags\""); return (cl_mem)0; } // check format if (image_format == NULL) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("invalid parameter \"image_format\""); return (cl_mem)0; } amd::Image::Format imageFormat(*image_format); if (!imageFormat.isValid()) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("invalid parameter \"image_format\""); return (cl_mem)0; } amd::Context& amdContext = *as_amd(context); if (!imageFormat.isSupported(amdContext)) { *not_null(errcode_ret) = CL_IMAGE_FORMAT_NOT_SUPPORTED; LogWarning("invalid parameter \"image_format\""); return (cl_mem)0; } // check size parameters if (image_width == 0 || image_height == 0 || image_depth <= 1) { *not_null(errcode_ret) = CL_INVALID_IMAGE_SIZE; LogWarning("invalid size parameter(s)"); return (cl_mem)0; } const std::vector<amd::Device*>& devices = as_amd(context)->devices(); bool supportPass = false; bool sizePass = false; for (auto& dev : devices) { if (dev->info().imageSupport_) { supportPass = true; if ((dev->info().image3DMaxWidth_ >= image_width) && (dev->info().image3DMaxHeight_ >= image_height) && (dev->info().image3DMaxDepth_ >= image_depth)) { sizePass = true; break; } } } if (!supportPass) { *not_null(errcode_ret) = CL_INVALID_OPERATION; LogWarning("there are no devices in context to support images"); return (cl_mem)0; } if (!sizePass) { *not_null(errcode_ret) = CL_INVALID_IMAGE_SIZE; LogWarning("invalid size parameter(s)"); return (cl_mem)0; } // check row pitch rules if (host_ptr == NULL) { if (image_row_pitch) { *not_null(errcode_ret) = CL_INVALID_IMAGE_SIZE; LogWarning("invalid parameter \"image_row_pitch\""); return (cl_mem)0; } } else if (image_row_pitch) { size_t elemSize = imageFormat.getElementSize(); if ((image_row_pitch < image_width * elemSize) || (image_row_pitch % elemSize)) { *not_null(errcode_ret) = CL_INVALID_IMAGE_SIZE;
LogWarning("invalid parameter \"image_row_pitch\""); return (cl_mem)0; } } // check slice pitch if (host_ptr == NULL) { if (image_slice_pitch) { *not_null(errcode_ret) = CL_INVALID_IMAGE_SIZE; LogWarning("invalid parameter \"image_row_pitch\""); return (cl_mem)0; } } else if (image_slice_pitch) { size_t elemSize = imageFormat.getElementSize(); if ((image_slice_pitch < image_row_pitch * image_height) || (image_slice_pitch % image_row_pitch)) { *not_null(errcode_ret) = CL_INVALID_IMAGE_SIZE; LogWarning("invalid parameter \"image_row_pitch\""); return (cl_mem)0; } } // check host_ptr consistency if (host_ptr == NULL) { if (flags & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR)) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter \"host_ptr\""); return (cl_mem)0; } } else { if (!(flags & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR))) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter \"host_ptr\""); return (cl_mem)0; } } // CL_IMAGE_FORMAT_NOT_SUPPORTED ??? if (image_row_pitch == 0) { image_row_pitch = image_width * imageFormat.getElementSize(); } if (image_slice_pitch == 0) { image_slice_pitch = image_row_pitch * image_height; } amd::Image* image = new (amdContext) amd::Image(amdContext, CL_MEM_OBJECT_IMAGE3D, flags, imageFormat, image_width, image_height, image_depth, image_row_pitch, image_slice_pitch); if (image == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; LogWarning("cannot allocate resources"); return (cl_mem)0; } // CL_MEM_OBJECT_ALLOCATION_FAILURE if (!image->create(host_ptr)) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; image->release(); return (cl_mem)0; } *not_null(errcode_ret) = CL_SUCCESS; return (cl_mem)as_cl(image); } RUNTIME_EXIT /*! @} * \addtogroup CL_QueryImageFormat * @{ */ /*! \brief Get the list of supported image formats. * * \param context is a valid OpenCL context on which the image object(s) will * be created. * * \param flags is a bit-field that is used to specify allocation and usage * information about the image memory object being created. * * \param image_type describes the image type and must be either * CL_MEM_OBJECT_IMAGE1D, CL_MEM_OBJECT_IMAGE1D_BUFFER, CL_MEM_OBJECT_IMAGE2D, * CL_MEM_OBJECT_IMAGE3D, CL_MEM_OBJECT_IMAGE1D_ARRAY or * CL_MEM_OBJECT_IMAGE2D_ARRAY. * * \param num_entries specifies the number of entries that can be returned in * the memory location given by \a image_formats. * * \param image_formats is a pointer to a memory location where the list of * supported image formats are returned. Each entry describes a cl_image_format * structure supported by the runtime. If \a image_formats is NULL, it is * ignored. * * \param num_image_formats is the actual number of supported image formats for * a specific context and values specified by \a flags. If \a num_image_formats * is NULL, it is ignored. 
* * \return One of the following values: * - CL_SUCCESS if the function is executed successfully * - CL_INVALID_CONTEXT if \a context is not a valid context * - CL_INVALID_VALUE if \a flags or \a image_type are not valid, or if * \a num_entries is 0 and \a image_formats is not NULL * * \version 1.2r08 */ RUNTIME_ENTRY(cl_int, clGetSupportedImageFormats, (cl_context context, cl_mem_flags flags, cl_mem_object_type image_type, cl_uint num_entries, cl_image_format* image_formats, cl_uint* num_image_formats)) { if (!is_valid(context)) { LogWarning("invalid parameter \"context\""); return CL_INVALID_CONTEXT; } // check flags for validity if (!validateFlags(flags, true)) { LogWarning("invalid parameter \"flags\""); return CL_INVALID_VALUE; } // check image_type switch (image_type) { case CL_MEM_OBJECT_IMAGE1D_BUFFER: case CL_MEM_OBJECT_IMAGE1D: case CL_MEM_OBJECT_IMAGE1D_ARRAY: case CL_MEM_OBJECT_IMAGE2D: case CL_MEM_OBJECT_IMAGE2D_ARRAY: case CL_MEM_OBJECT_IMAGE3D: break; default: LogWarning("invalid parameter \"image_type\""); return CL_INVALID_VALUE; } if (num_entries == 0 && image_formats != NULL) { LogWarning("invalid parameter \"num_entries\""); return CL_INVALID_VALUE; } const amd::Context& amdContext = *as_amd(context); if (image_formats != NULL) { amd::Image::getSupportedFormats(amdContext, image_type, num_entries, image_formats, flags); } if (num_image_formats != NULL) { *num_image_formats = amd::Image::numSupportedFormats(amdContext, image_type, flags); } return CL_SUCCESS; } RUNTIME_EXIT /*! @} * \addtogroup CL_ReadWriteImage * @{ */ /*! \brief Enqueue a command to read from a 2D or 3D image object to host memory * * \param command_queue refers to the command-queue in which the read * command will be queued. \a command_queue and \a image must be created with * the same OpenCL context. * * \param image refers to a valid 2D or 3D image object. * * \param blocking_read indicates if the read is blocking or nonblocking. If * \a blocking_read is CL_TRUE i.e. the read command is blocking, * clEnqueueReadImage does not return until the buffer data has been read and * copied into memory pointed to by \a ptr. If \a blocking_read is CL_FALSE * i.e. the read command is non-blocking, clEnqueueReadImage queues a * non-blocking read command and returns. The contents of the buffer that * \a ptr points to cannot be used until the read command has completed. * The \a event argument returns an event object which can be used to query the * execution status of the read command. When the read command has completed, * the contents of the buffer that ptr points to can be used by the * application. * * \param origin defines the (x, y, z) offset in the image from where to read * or write. If image is a 2D image object, the z value given by origin[2] must * be 0. * * \param region defines the (width, height, depth) of the 2D or 3D rectangle * being read or written. If image is a 2D image object, the depth value given * by region[2] must be 1. * * \param row_pitch in clEnqueueReadImage is the length of each row in bytes. * This value must be greater than or equal to the element size in bytes * width. If \a row_pitch is set to 0, the appropriate row pitch is calculated * based on the size of each element in bytes multiplied by width. * * \param slice_pitch in clEnqueueReadImage and clEnqueueWriteImage is the size * in bytes of the 2D slice of the 3D region of a 3D image being read or * written respectively. This must be 0 if image is a 2D image. This value * must be greater than or equal to row_pitch * height.
If \a slice_pitch is * set to 0, the appropriate slice pitch is calculated based on the * \a row_pitch * \a height. * * \param ptr is the pointer to a buffer in host memory where image data is * to be read to. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, then this * particular command does not wait on any event to complete. If * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular read * command and can be used to query or queue a wait for this particular command * to complete. \a event can be NULL in which case it will not be possible for * the application to query the status of this command or queue a wait for this * command to complete. * * \return CL_SUCCESS if the function is executed successfully. Otherwise it * returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if the context associated with \a command_queue and * \a image are not the same. * - CL_INVALID_MEM_OBJECT if \a image is not a valid image object. * - CL_INVALID_VALUE if the region being read specified by \a origin and * \a region is out of bounds or if \a ptr is a NULL value. * - CL_INVALID_VALUE if \a image is a 2D image object and \a origin[2] is not * equal to 0 or \a region[2] is not equal to 1 or \a slice_pitch is not * equal to 0. * - CL_INVALID_OPERATION if \a clEnqueueReadImage is called on image which * has been created with CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_INVALID_VALUE if blocking_read is CL_FALSE and \a event is NULL. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime.
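 *
 * A minimal host-side usage sketch (illustrative only, not part of this
 * runtime; \a queue, \a image, \a width, \a height and a large-enough
 * \a pixels buffer are assumptions, created elsewhere):
 * \code
 * size_t origin[3] = {0, 0, 0};
 * size_t region[3] = {width, height, 1};  // depth must be 1 for a 2D image
 * // Blocking read: pixels holds the image data once the call returns.
 * cl_int err = clEnqueueReadImage(queue, image, CL_TRUE, origin, region,
 *                                 0, 0,  // let the runtime compute the pitches
 *                                 pixels, 0, NULL, NULL);
 * \endcode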
* * \version 1.2r07 */ RUNTIME_ENTRY(cl_int, clEnqueueReadImage, (cl_command_queue command_queue, cl_mem image, cl_bool blocking_read, const size_t* origin, const size_t* region, size_t row_pitch, size_t slice_pitch, void* ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(image)) { return CL_INVALID_MEM_OBJECT; } amd::Image* srcImage = as_amd(image)->asImage(); if (srcImage == NULL) { return CL_INVALID_MEM_OBJECT; } if (srcImage->getMemFlags() & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS)) { return CL_INVALID_OPERATION; } if (srcImage->getImageFormat().image_channel_order == CL_DEPTH_STENCIL) { return CL_INVALID_OPERATION; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != srcImage->getContext()) { return CL_INVALID_CONTEXT; } if (ptr == NULL) { return CL_INVALID_VALUE; } amd::Coord3D srcOrigin(origin[0], origin[1], origin[2]); amd::Coord3D srcRegion(region[0], region[1], region[2]); ImageViewRef mip; if (srcImage->getMipLevels() > 1) { // Create a view for the specified mip level mip = srcImage->createView(srcImage->getContext(), srcImage->getImageFormat(), NULL, origin[srcImage->getDims()]); if (mip() == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Reset the mip level value to 0, since a view was created if (srcImage->getDims() < 3) { srcOrigin.c[srcImage->getDims()] = 0; } srcImage = mip(); } if (!srcImage->validateRegion(srcOrigin, srcRegion) || !srcImage->isRowSliceValid(row_pitch, slice_pitch, region[0], region[1])) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::CopyMetadata copyMetadata(!blocking_read, amd::CopyMetadata::CopyEnginePreference::SDMA); amd::ReadMemoryCommand* command = new amd::ReadMemoryCommand(hostQueue, CL_COMMAND_READ_IMAGE, eventWaitList, *srcImage, srcOrigin, srcRegion, ptr, row_pitch, slice_pitch, copyMetadata); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); if (blocking_read) { command->awaitCompletion(); } *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueue a command to write to a 2D or 3D image object from host * memory * * \param command_queue refers to the command-queue in which the write * command will be queued. \a command_queue and \a image must be created with * the same OpenCL context. * * \param image refers to a valid 2D or 3D image object. * * \param blocking_write indicates if the write operation is blocking or * nonblocking. If blocking_write is CL_TRUE, the OpenCL implementation copies * the data referred to by \a ptr and enqueues the write command in the * command-queue. The memory pointed to by ptr can be reused by the application * after the clEnqueueWriteImage call returns. If blocking_write is CL_FALSE, * the OpenCL implementation will use ptr to perform a nonblocking write. As * the write is non-blocking the implementation can return immediately. The * memory pointed to by ptr cannot be reused by the application after the call * returns. 
The event argument returns an event object which can be used to * query the execution status of the write command. When the write command has * completed, the memory pointed to by ptr can then be reused by the * application. * * \param origin defines the (x, y, z) offset in the image from where to read * or write. If image is a 2D image object, the z value given by origin[2] must * be 0. * * \param region defines the (width, height, depth) of the 2D or 3D rectangle * being read or written. If image is a 2D image object, the depth value given * by region[2] must be 1. * * \param input_row_pitch is the length of each row in bytes. * This value must be greater than or equal to the element size in bytes * width. If \a input_row_pitch is set to 0, the appropriate row pitch is * calculated based on the size of each element in bytes multiplied by width. * * \param input_slice_pitch is the size * in bytes of the 2D slice of the 3D region of a 3D image being written. * This must be 0 if image is a 2D image. This value * must be greater than or equal to input_row_pitch * height. If * \a input_slice_pitch is set to 0, the appropriate slice pitch is calculated * based on the \a input_row_pitch * \a height. * * \param ptr is the pointer to a buffer in host memory where image data is * to be written from. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, then this * particular command does not wait on any event to complete. If * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular write * command and can be used to query or queue a wait for this particular command * to complete. \a event can be NULL in which case it will not be possible for * the application to query the status of this command or queue a wait for this * command to complete. * * \return CL_SUCCESS if the function is executed successfully. Otherwise it * returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if the context associated with \a command_queue and * \a image are not the same. * - CL_INVALID_MEM_OBJECT if \a image is not a valid image object. * - CL_INVALID_VALUE if the region being written specified by \a origin and * \a region is out of bounds or if \a ptr is a NULL value. * - CL_INVALID_VALUE if \a image is a 2D image object and \a origin[2] is not * equal to 0 or \a region[2] is not equal to 1 or \a input_slice_pitch is not * equal to 0. * - CL_INVALID_OPERATION if \a clEnqueueWriteImage is called on image which * has been created with CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_INVALID_VALUE if blocking_write is CL_FALSE and \a event is NULL. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime.
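 *
 * A minimal host-side usage sketch (illustrative only; \a queue, \a image and
 * the initialized \a pixels buffer are assumptions, created elsewhere):
 * \code
 * size_t origin[3] = {0, 0, 0};
 * size_t region[3] = {width, height, 1};  // depth must be 1 for a 2D image
 * // Blocking write: pixels may be reused as soon as the call returns.
 * cl_int err = clEnqueueWriteImage(queue, image, CL_TRUE, origin, region,
 *                                  0, 0,  // pitches computed by the runtime
 *                                  pixels, 0, NULL, NULL);
 * \endcode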
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueWriteImage, (cl_command_queue command_queue, cl_mem image, cl_bool blocking_write, const size_t* origin, const size_t* region, size_t input_row_pitch, size_t input_slice_pitch, const void* ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(image)) { return CL_INVALID_MEM_OBJECT; } amd::Image* dstImage = as_amd(image)->asImage(); if (dstImage == NULL) { return CL_INVALID_MEM_OBJECT; } if (dstImage->getMemFlags() & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS)) { return CL_INVALID_OPERATION; } if (dstImage->getImageFormat().image_channel_order == CL_DEPTH_STENCIL) { return CL_INVALID_OPERATION; } if (dstImage->getDims() == 2 && origin[2] != 0 && dstImage->getMipLevels() == 0) { return CL_INVALID_VALUE; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != dstImage->getContext()) { return CL_INVALID_CONTEXT; } if (ptr == NULL) { return CL_INVALID_VALUE; } amd::Coord3D dstOrigin(origin[0], origin[1], origin[2]); amd::Coord3D dstRegion(region[0], region[1], region[2]); ImageViewRef mip; if (dstImage->getMipLevels() > 1) { // Create a view for the specified mip level mip = dstImage->createView(dstImage->getContext(), dstImage->getImageFormat(), NULL, origin[dstImage->getDims()]); if (mip() == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Reset the mip level value to 0, since a view was created if (dstImage->getDims() < 3) { dstOrigin.c[dstImage->getDims()] = 0; } dstImage = mip(); } if (!dstImage->validateRegion(dstOrigin, dstRegion) || !dstImage->isRowSliceValid(input_row_pitch, input_slice_pitch, region[0], region[1])) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::CopyMetadata copyMetadata(!blocking_write, amd::CopyMetadata::CopyEnginePreference::SDMA); amd::WriteMemoryCommand* command = new amd::WriteMemoryCommand(hostQueue, CL_COMMAND_WRITE_IMAGE, eventWaitList, *dstImage, dstOrigin, dstRegion, ptr, input_row_pitch, input_slice_pitch, copyMetadata); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); if (blocking_write) { command->awaitCompletion(); } *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueue a command to copy image objects. * * \param command_queue refers to the command-queue in which the copy command * will be queued. The OpenCL context associated with \a command_queue, * \a src_image and \a dst_image must be the same. * * \param src_image is the source image object. * * \param dst_image is the destination image object. * * \param src_origin defines the starting (x, y, z) location in \a src_image * from where to start the data copy. If \a src_image is a 2D image object, * the z value given by \a src_origin[2] must be 0. * * \param dst_origin defines the starting (x, y, z) location in \a dst_image * from where to start the data copy. If \a dst_image is a 2D image object, * the z value given by \a dst_origin[2] must be 0. 
* * \param region defines the (width, height, depth) of the 2D or 3D rectangle * to copy. If \a src_image or \a dst_image is a 2D image object, the depth * value given by \a region[2] must be 1. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, then * this particular command does not wait on any event to complete. If * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. If * \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular copy * command and can be used to query or queue a wait for this particular * command to complete. \a event can be NULL in which case it will not be * possible for the application to query the status of this command or queue * a wait for this command to complete. clEnqueueBarrier can be used instead. * It is currently a requirement that the \a src_image and \a dst_image image * memory objects for clEnqueueCopyImage must have the exact same image format * (i.e. channel order and channel data type must match). * * \return CL_SUCCESS if the function is executed successfully. Otherwise it * returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if the context associated with \a command_queue, * \a src_image and \a dst_image are not the same. * - CL_INVALID_MEM_OBJECT if \a src_image and \a dst_image are not valid image * objects. * - CL_IMAGE_FORMAT_MISMATCH if src_image and dst_image do not use the same * image format. * - CL_INVALID_VALUE if the 2D or 3D rectangular region specified by * \a src_origin and \a src_origin + \a region refers to a region outside * \a src_image, or if the 2D or 3D rectangular region specified by * \a dst_origin and \a dst_origin + \a region refers to a region outside * \a dst_image. * - CL_INVALID_VALUE if \a src_image is a 2D image object and \a src_origin[2] * is not equal to 0 or \a region[2] is not equal to 1. * - CL_INVALID_VALUE if \a dst_image is a 2D image object and \a dst_origin[2] * is not equal to 0 or \a region[2] is not equal to 1. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime.
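 *
 * A minimal host-side usage sketch (illustrative only; \a queue, \a src_image
 * and \a dst_image are assumptions and must share a context and image format):
 * \code
 * size_t src_origin[3] = {0, 0, 0};
 * size_t dst_origin[3] = {0, 0, 0};
 * size_t region[3]     = {width, height, 1};  // depth must be 1 for 2D images
 * cl_int err = clEnqueueCopyImage(queue, src_image, dst_image,
 *                                 src_origin, dst_origin, region,
 *                                 0, NULL, NULL);
 * \endcode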
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueCopyImage, (cl_command_queue command_queue, cl_mem src_image, cl_mem dst_image, const size_t* src_origin, const size_t* dst_origin, const size_t* region, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(src_image) || !is_valid(dst_image)) { return CL_INVALID_MEM_OBJECT; } amd::Image* srcImage = as_amd(src_image)->asImage(); amd::Image* dstImage = as_amd(dst_image)->asImage(); if (srcImage == NULL || dstImage == NULL) { return CL_INVALID_MEM_OBJECT; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != srcImage->getContext() || hostQueue.context() != dstImage->getContext()) { return CL_INVALID_CONTEXT; } if (srcImage->getImageFormat() != dstImage->getImageFormat()) { return CL_IMAGE_FORMAT_MISMATCH; } if (srcImage->getImageFormat().image_channel_order == CL_DEPTH_STENCIL) { return CL_INVALID_OPERATION; } amd::Coord3D srcOrigin(src_origin[0], src_origin[1], src_origin[2]); amd::Coord3D dstOrigin(dst_origin[0], dst_origin[1], dst_origin[2]); amd::Coord3D copyRegion(region[0], region[1], region[2]); ImageViewRef srcMip; if (srcImage->getMipLevels() > 1) { // Create a view for the specified mip level srcMip = srcImage->createView(srcImage->getContext(), srcImage->getImageFormat(), NULL, src_origin[srcImage->getDims()]); if (srcMip() == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Reset the mip level value to 0, since a view was created if (srcImage->getDims() < 3) { srcOrigin.c[srcImage->getDims()] = 0; } srcImage = srcMip(); } if (!srcImage->validateRegion(srcOrigin, copyRegion)) { return CL_INVALID_VALUE; } ImageViewRef dstMip; if (dstImage->getMipLevels() > 1) { // Create a view for the specified mip level dstMip = dstImage->createView(dstImage->getContext(), dstImage->getImageFormat(), NULL, dst_origin[dstImage->getDims()]); if (dstMip() == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Reset the mip level value to 0, since a view was created if (dstImage->getDims() < 3) { dstOrigin.c[dstImage->getDims()] = 0; } dstImage = dstMip(); } if (!dstImage->validateRegion(dstOrigin, copyRegion)) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } if (src_image == dst_image) { if ((src_origin[0] <= dst_origin[0] && dst_origin[0] < src_origin[0] + region[0]) || (dst_origin[0] <= src_origin[0] && src_origin[0] < dst_origin[0] + region[0]) || (src_origin[1] <= dst_origin[1] && dst_origin[1] < src_origin[1] + region[1]) || (dst_origin[1] <= src_origin[1] && src_origin[1] < dst_origin[1] + region[1])) { return CL_MEM_COPY_OVERLAP; } if (srcImage->getDims() > 2) { if ((src_origin[2] <= dst_origin[2] && dst_origin[2] < src_origin[2] + region[2]) || (dst_origin[2] <= src_origin[2] && src_origin[2] < dst_origin[2] + region[2])) { return CL_MEM_COPY_OVERLAP; } } } amd::CopyMemoryCommand* command = new amd::CopyMemoryCommand(hostQueue, CL_COMMAND_COPY_IMAGE, eventWaitList, *srcImage, *dstImage, srcOrigin, dstOrigin, copyRegion); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); }
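// The copy command has been queued; if the caller did not request an event, our reference to it was released above.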
return CL_SUCCESS; } RUNTIME_EXIT /*! @} * \addtogroup CL_CopyingImageBuffer * @{ */ /*! \brief Enqueue a command to copy an image object to a buffer object. * * \param command_queue must be a valid command-queue. The OpenCL context * associated with \a command_queue, \a src_image and \a dst_buffer must be * the same. * * \param src_image is a valid image object. * * \param dst_buffer is a valid buffer object. * * \param src_origin defines the (x, y, z) offset in the image from where to * copy. If \a src_image is a 2D image object, the z value given by * \a src_origin[2] must be 0. * * \param region defines the (width, height, depth) of the 2D or 3D rectangle * to copy. If \a src_image is a 2D image object, the depth value given by * \a region[2] must be 1. * * \param dst_offset refers to the offset where to begin copying data in * \a dst_buffer. The size in bytes of the region to be copied referred to as * \a dst_cb is computed as width * height * depth * bytes/image element if * \a src_image is a 3D image object and is computed as * width * height * bytes/image element if \a src_image is a 2D image object. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, then this * particular command does not wait on any event to complete. If * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular copy * command and can be used to query or queue a wait for this particular * command to complete. \a event can be NULL in which case it will not be * possible for the application to query the status of this command or queue a * wait for this command to complete. clEnqueueBarrier can be used instead. * * \return CL_SUCCESS if the function is executed successfully. Otherwise it * returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if the context associated with \a command_queue, * \a src_image and \a dst_buffer are not the same. * - CL_INVALID_MEM_OBJECT if \a src_image is not a valid image object or * \a dst_buffer is not a valid buffer object. * - CL_INVALID_VALUE if the 2D or 3D rectangular region specified by * \a src_origin and \a src_origin + \a region refers to a region outside * \a src_image, or if the region specified by \a dst_offset and * \a dst_offset + \a dst_cb refers to a region outside \a dst_buffer. * - CL_INVALID_VALUE if \a src_image is a 2D image object and \a src_origin[2] * is not equal to 0 or \a region[2] is not equal to 1. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime.
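 *
 * A minimal host-side usage sketch (illustrative only; \a queue, \a src_image
 * and a large-enough \a dst_buffer are assumptions, created elsewhere):
 * \code
 * size_t src_origin[3] = {0, 0, 0};
 * size_t region[3]     = {width, height, 1};  // depth must be 1 for a 2D image
 * cl_int err = clEnqueueCopyImageToBuffer(queue, src_image, dst_buffer,
 *                                         src_origin, region,
 *                                         0,  // dst_offset in bytes
 *                                         0, NULL, NULL);
 * \endcode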
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueCopyImageToBuffer, (cl_command_queue command_queue, cl_mem src_image, cl_mem dst_buffer, const size_t* src_origin, const size_t* region, size_t dst_offset, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(src_image) || !is_valid(dst_buffer)) { return CL_INVALID_MEM_OBJECT; } amd::Image* srcImage = as_amd(src_image)->asImage(); amd::Buffer* dstBuffer = as_amd(dst_buffer)->asBuffer(); if (srcImage == NULL || dstBuffer == NULL) { return CL_INVALID_MEM_OBJECT; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != srcImage->getContext() || hostQueue.context() != dstBuffer->getContext()) { return CL_INVALID_CONTEXT; } if (srcImage->getImageFormat().image_channel_order == CL_DEPTH_STENCIL) { return CL_INVALID_OPERATION; } amd::Coord3D srcOrigin(src_origin[0], src_origin[1], src_origin[2]); amd::Coord3D dstOffset(dst_offset, 0, 0); amd::Coord3D srcRegion(region[0], region[1], region[2]); amd::Coord3D copySize( region[0] * region[1] * region[2] * srcImage->getImageFormat().getElementSize(), 0, 0); ImageViewRef mip; if (srcImage->getMipLevels() > 1) { // Create a view for the specified mip level mip = srcImage->createView(srcImage->getContext(), srcImage->getImageFormat(), NULL, src_origin[srcImage->getDims()]); if (mip() == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Reset the mip level value to 0, since a view was created if (srcImage->getDims() < 3) { srcOrigin.c[srcImage->getDims()] = 0; } srcImage = mip(); } if (!srcImage->validateRegion(srcOrigin, srcRegion) || !dstBuffer->validateRegion(dstOffset, copySize)) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::CopyMemoryCommand* command = new amd::CopyMemoryCommand(hostQueue, CL_COMMAND_COPY_IMAGE_TO_BUFFER, eventWaitList, *srcImage, *dstBuffer, srcOrigin, dstOffset, srcRegion); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueue a command to copy a buffer object to an image object. * * \param command_queue must be a valid command-queue. The OpenCL context * associated with \a command_queue, \a src_buffer and \a dst_image must be * the same. * * \param src_buffer is a valid buffer object. * * \param dst_image is a valid image object. * * \param src_offset refers to the offset where to begin copying data in * \a src_buffer. * * \param dst_origin defines the (x, y, z) offset in the image from where to * copy. If \a dst_image is a 2D image object, the z value given by * \a dst_origin[2] must be 0. * * \param region defines the (width, height, depth) of the 2D or 3D rectangle * to copy. If dst_image is a 2D image object, the depth value given by * \a region[2] must be 1. 
The size in bytes of the region to be copied from * \a src_buffer referred to as \a src_cb is computed as * width * height * depth * bytes/image element if \a dst_image is a 3D image * object and is computed as width * height * bytes/image element if * \a dst_image is a 2D image object. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, then * this particular command does not wait on any event to complete. If * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular copy * command and can be used to query or queue a wait for this particular command * to complete. \a event can be NULL in which case it will not be possible for * the application to query the status of this command or queue a wait for * this command to complete. clEnqueueBarrier can be used instead. * * \return CL_SUCCESS if the function is executed successfully. Otherwise it * returns one of the following errors: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if the context associated with \a command_queue, * \a src_buffer and \a dst_image are not the same. * - CL_INVALID_MEM_OBJECT if \a src_buffer is not a valid buffer object or * \a dst_image is not a valid image object. * - CL_INVALID_VALUE if the 2D or 3D rectangular region specified by * \a dst_origin and \a dst_origin + \a region refers to a region outside * \a dst_image, or if the region specified by \a src_offset and * \a src_offset + \a src_cb refers to a region outside \a src_buffer. * - CL_INVALID_VALUE if \a dst_image is a 2D image object and \a dst_origin[2] * is not equal to 0 or \a region[2] is not equal to 1. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in * \a event_wait_list are not valid events.
* - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueCopyBufferToImage, (cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_image, size_t src_offset, const size_t* dst_origin, const size_t* region, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(src_buffer) || !is_valid(dst_image)) { return CL_INVALID_MEM_OBJECT; } amd::Buffer* srcBuffer = as_amd(src_buffer)->asBuffer(); amd::Image* dstImage = as_amd(dst_image)->asImage(); if (srcBuffer == NULL || dstImage == NULL) { return CL_INVALID_MEM_OBJECT; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != srcBuffer->getContext() || hostQueue.context() != dstImage->getContext()) { return CL_INVALID_CONTEXT; } if (dstImage->getImageFormat().image_channel_order == CL_DEPTH_STENCIL) { return CL_INVALID_OPERATION; } amd::Coord3D dstOrigin(dst_origin[0], dst_origin[1], dst_origin[2]); amd::Coord3D srcOffset(src_offset, 0, 0); amd::Coord3D dstRegion(region[0], region[1], region[2]); amd::Coord3D copySize( region[0] * region[1] * region[2] * dstImage->getImageFormat().getElementSize(), 0, 0); ImageViewRef mip; if (dstImage->getMipLevels() > 1) { // Create a view for the specified mip level mip = dstImage->createView(dstImage->getContext(), dstImage->getImageFormat(), NULL, dst_origin[dstImage->getDims()]); if (mip() == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Reset the mip level value to 0, since a view was created if (dstImage->getDims() < 3) { dstOrigin.c[dstImage->getDims()] = 0; } dstImage = mip(); } if (!srcBuffer->validateRegion(srcOffset, copySize) || !dstImage->validateRegion(dstOrigin, dstRegion)) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::CopyMemoryCommand* command = new amd::CopyMemoryCommand(hostQueue, CL_COMMAND_COPY_BUFFER_TO_IMAGE, eventWaitList, *srcBuffer, *dstImage, srcOffset, dstOrigin, dstRegion); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! @} * \addtogroup CL_MapUnmap * @{ */ /*! \brief Enqueue a command to map a region of a buffer object into the * host address space. * * \param command_queue must be a valid command-queue. * * \param blocking_map indicates if the map operation is blocking or * non-blocking. If \a blocking_map is CL_TRUE, clEnqueueMapBuffer does not * return until the specified region in \a buffer can be mapped. If * \a blocking_map is CL_FALSE i.e. map operation is non-blocking, the pointer * to the mapped region returned by clEnqueueMapBuffer cannot be used until the * map command has completed. The event argument returns an event object which * can be used to query the execution status of the map command. When the map * command is completed, the application can access the contents of the mapped * region using the pointer returned by clEnqueueMapBuffer.
* * \param map_flags is a bit-field and can be set to CL_MAP_READ to indicate * that the region specified by (\a offset, \a cb) in the buffer object is * being mapped for reading, and/or CL_MAP_WRITE to indicate that the region * specified by (\a offset, \a cb) in the buffer object is being mapped for * writing. * * \param buffer is a valid buffer object. The OpenCL context associated with * \a command_queue and \a buffer must be the same. * * \param offset is the offset in bytes of the region in the buffer object * that is being mapped. * * \param cb is the size in bytes of the region in the buffer object that * is being mapped. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, then * this particular command does not wait on any event to complete. If * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. If * \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular * command and can be used to query or queue a wait for this particular * command to complete. \a event can be NULL in which case it will not be * possible for the application to query the status of this command or queue * a wait for this command to complete. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A pointer to the mapped region if buffer is a memory object * created with clCreateBuffer and the region specified by (offset, cb) * is a valid region in the buffer object and is successfully mapped into the * host address space. The \a errcode_ret is set to CL_SUCCESS. * A NULL pointer is returned otherwise with one of the following error values * returned in \a errcode_ret: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if context associated with \a command_queue and * \a buffer are not the same. * - CL_INVALID_MEM_OBJECT if \a buffer is not a valid buffer object. * - CL_INVALID_OPERATION if buffer has been created with * CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS and CL_MAP_READ * is set in map_flags or if buffer has been created with * CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS and CL_MAP_WRITE or * CL_MAP_WRITE_INVALIDATE_REGION is set in map_flags. * - CL_INVALID_VALUE if region being mapped given by (\a offset, \a cb) is out * of bounds or if values specified in \a map_flags are not valid. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in * \a event_wait_list are not valid events. * - CL_MEM_OBJECT_MAP_FAILURE if there is a failure to map the specified * region in the host address space. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * The pointer returned maps a region starting at \a offset and is at least * \a cb bytes in size. The result of a memory access outside this region is * undefined.
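 *
 * A minimal host-side usage sketch (illustrative only; \a queue, \a buffer and
 * \a size are assumptions, created elsewhere):
 * \code
 * cl_int err;
 * void* p = clEnqueueMapBuffer(queue, buffer, CL_TRUE, CL_MAP_WRITE,
 *                              0, size,  // map the whole buffer
 *                              0, NULL, NULL, &err);
 * if (err == CL_SUCCESS) {
 *   memset(p, 0, size);  // host-side write through the mapped pointer
 *   clEnqueueUnmapMemObject(queue, buffer, p, 0, NULL, NULL);
 * }
 * \endcode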
* * \version 1.2r07 */ RUNTIME_ENTRY_RET(void*, clEnqueueMapBuffer, (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_map, cl_map_flags map_flags, size_t offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event, cl_int* errcode_ret)) { if (!is_valid(command_queue)) { *not_null(errcode_ret) = CL_INVALID_COMMAND_QUEUE; return NULL; } if (!is_valid(buffer)) { *not_null(errcode_ret) = CL_INVALID_MEM_OBJECT; return NULL; } amd::Buffer* srcBuffer = as_amd(buffer)->asBuffer(); if (srcBuffer == NULL) { *not_null(errcode_ret) = CL_INVALID_MEM_OBJECT; return NULL; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { *not_null(errcode_ret) = CL_INVALID_COMMAND_QUEUE; return NULL; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != srcBuffer->getContext()) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; return NULL; } if ((srcBuffer->getMemFlags() & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS)) && (map_flags & CL_MAP_READ)) { *not_null(errcode_ret) = CL_INVALID_OPERATION; return NULL; } if ((srcBuffer->getMemFlags() & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS)) && (map_flags & (CL_MAP_WRITE | CL_MAP_WRITE_INVALIDATE_REGION))) { *not_null(errcode_ret) = CL_INVALID_OPERATION; return NULL; } if (srcBuffer->getMemFlags() & CL_MEM_EXTERNAL_PHYSICAL_AMD) { *not_null(errcode_ret) = CL_INVALID_OPERATION; return NULL; } amd::Coord3D srcOffset(offset); amd::Coord3D srcSize(cb); if (!srcBuffer->validateRegion(srcOffset, srcSize)) { *not_null(errcode_ret) = CL_INVALID_VALUE; return NULL; } // Wait for possible pending operations amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { *not_null(errcode_ret) = err; return (void*)0; } // Make sure we have memory for the command execution device::Memory* mem = srcBuffer->getDeviceMemory(hostQueue.device()); if (NULL == mem) { LogPrintfError("Can't allocate memory size - 0x%08X bytes!", srcBuffer->getSize()); *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; return NULL; } // Attempt to allocate the map target now (whether blocking or non-blocking) void* mapPtr = mem->allocMapTarget(srcOffset, srcSize, map_flags); if (NULL == mapPtr) { *not_null(errcode_ret) = CL_MAP_FAILURE; return NULL; } // Allocate a map command for the queue thread amd::MapMemoryCommand* command = new amd::MapMemoryCommand( hostQueue, CL_COMMAND_MAP_BUFFER, eventWaitList, *srcBuffer, map_flags, blocking_map ? true : false, srcOffset, srcSize, nullptr, nullptr, mapPtr); if (command == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return NULL; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; return NULL; } if (srcBuffer->getMemFlags() & CL_MEM_USE_PERSISTENT_MEM_AMD) { // [Windows VidMM restriction] // Runtime can't map persistent memory if it's still busy or // even wasn't submitted to HW from the worker thread yet hostQueue.finish(); } // Send the map command for processing command->enqueue(); // A blocking map has to wait for completion if (blocking_map) { command->awaitCompletion(); } // Save the command event if application has requested it *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } *not_null(errcode_ret) = CL_SUCCESS; srcBuffer->incMapCount(); return mapPtr; } RUNTIME_EXIT /*!
\brief Enqueue a command to map a region in an image object into * the host address space. * * \param command_queue must be a valid command-queue. * * \param image is a valid image object. The OpenCL context associated with * \a command_queue and \a image must be the same. * * \param blocking_map indicates if the map operation is blocking or * non-blocking. If \a blocking_map is CL_TRUE, clEnqueueMapImage does not * return until the specified region in image is mapped. If \a blocking_map is * CL_FALSE i.e. map operation is non-blocking, the pointer to the mapped * region returned by clEnqueueMapImage cannot be used until the map command * has completed. The event argument returns an event object which can be used * to query the execution status of the map command. When the map command is * completed, the application can access the contents of the mapped region * using the pointer returned by clEnqueueMapImage. * * \param map_flags is a bit-field and can be set to CL_MAP_READ to indicate * that the region specified by (\a origin, \a region) in the image object is * being mapped for reading, and/or CL_MAP_WRITE to indicate that the region * specified by (\a origin, \a region) in the image object is being mapped for * writing. * * \param origin defines the (x, y, z) offset in pixels in the image or (x, y) * offset and the image index in the image array. If image is a 2D image * object, origin[2] must be 0. If image is a 1D image or 1D image buffer * object, origin[1] and origin[2] must be 0. If image is a 1D image array * object, origin[2] must be 0. If image is a 1D image array object, origin[1] * describes the image index in the 1D image array. If image is a 2D image * array object, origin[2] describes the image index in the 2D image array. * * \param region defines the (width, height, depth) in pixels of the 1D, 2D or * 3D rectangle or the (width, height) in pixels of the 1D or 2D * rectangle and the image index of an image array. If image is a 2D image * object, region[2] must be 1. If image is a 1D image or 1D image buffer * object, region[1] and region[2] must be 1. If image is a 1D image array * object, region[1] and region[2] must be 1. If image is a 2D image array * object, region[2] must be 1. * * \param image_row_pitch returns the scan-line pitch in bytes for the mapped * region. This must be a non-NULL value. * * \param image_slice_pitch returns the size in bytes of each 2D slice for the * mapped region. For a 2D image this argument is ignored. For a 3D image this * must be a non-NULL value. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before * clEnqueueMapImage can be executed. If \a event_wait_list is NULL, then * clEnqueueMapImage does not wait on any event to complete. If * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. If * \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points.
* * \param event returns an event object that identifies this particular command * and can be used to query or queue a wait for this particular command to * complete. \a event can be NULL in which case it will not be possible for the * application to query the status of this command or queue a wait for this * command to complete. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A pointer to the mapped region if image is a memory object * created with clCreateImage{2D|3D}, and the 2D or 3D rectangle specified * by origin and region is a valid region in the image object and can be * mapped into the host address space. * The \a errcode_ret is set to CL_SUCCESS. A NULL pointer is returned * otherwise with one of the following error values returned in \a errcode_ret: * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue. * - CL_INVALID_CONTEXT if context associated with \a command_queue and * \a image are not the same. * - CL_INVALID_MEM_OBJECT if \a image is not a valid image object. * - CL_INVALID_VALUE if region being mapped given by * (\a origin, \a origin + \a region) is out of bounds or if values * specified in \a map_flags are not valid. * - CL_INVALID_VALUE if values in origin and region do not follow rules * described in the argument description for origin and region. * - CL_INVALID_VALUE if \a image is a 2D image object and \a origin[2] is not * equal to 0 or \a region[2] is not equal to 1. * - CL_INVALID_VALUE if \a image_row_pitch is NULL. * - CL_INVALID_VALUE if \a image is a 3D image object and \a image_slice_pitch * is NULL. * - CL_INVALID_IMAGE_FORMAT if image format (image channel order and data * type) for image are not supported by device associated with queue. * - CL_INVALID_OPERATION if image has been created with * CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS and CL_MAP_READ * is set in map_flags or if image has been created with * CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS and CL_MAP_WRITE or * CL_MAP_WRITE_INVALIDATE_REGION is set in map_flags. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_MEM_OBJECT_MAP_FAILURE if there is a failure to map the specified * region in the host address space. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * The pointer returned maps a 2D or 3D region starting at origin and is * at least (\a image_row_pitch * \a region[1] + \a region[0]) pixels in size * for a 2D image, and is at least (\a image_slice_pitch * \a region[2] + * \a image_row_pitch * \a region[1] + \a region[0]) pixels in size for a 3D * image. The result of a memory access outside this region is undefined.
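 *
 * A minimal host-side usage sketch for a 2D image (illustrative only; \a queue
 * and \a image are assumptions, created elsewhere):
 * \code
 * size_t origin[3] = {0, 0, 0};
 * size_t region[3] = {width, height, 1};
 * size_t row_pitch = 0;  // filled in by the runtime
 * cl_int err;
 * void* p = clEnqueueMapImage(queue, image, CL_TRUE, CL_MAP_READ,
 *                             origin, region, &row_pitch,
 *                             NULL,  // slice pitch ignored for 2D images
 *                             0, NULL, NULL, &err);
 * // Rows are row_pitch bytes apart; release with clEnqueueUnmapMemObject.
 * \endcode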
* * \version 1.2r07 */ RUNTIME_ENTRY_RET(void*, clEnqueueMapImage, (cl_command_queue command_queue, cl_mem image, cl_bool blocking_map, cl_map_flags map_flags, const size_t* origin, const size_t* region, size_t* image_row_pitch, size_t* image_slice_pitch, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event, cl_int* errcode_ret)) { if (!is_valid(command_queue)) { *not_null(errcode_ret) = CL_INVALID_COMMAND_QUEUE; return NULL; } if (!is_valid(image)) { *not_null(errcode_ret) = CL_INVALID_MEM_OBJECT; return NULL; } amd::Image* srcImage = as_amd(image)->asImage(); if (srcImage == NULL) { *not_null(errcode_ret) = CL_INVALID_MEM_OBJECT; return NULL; } if (srcImage->getImageFormat().image_channel_order == CL_DEPTH_STENCIL) { *not_null(errcode_ret) = CL_INVALID_OPERATION; return NULL; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { *not_null(errcode_ret) = CL_INVALID_COMMAND_QUEUE; return NULL; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != srcImage->getContext()) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; return NULL; } if ((srcImage->getMemFlags() & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS)) && (map_flags & CL_MAP_READ)) { *not_null(errcode_ret) = CL_INVALID_OPERATION; return NULL; } if ((srcImage->getMemFlags() & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS)) && (map_flags & (CL_MAP_WRITE | CL_MAP_WRITE_INVALIDATE_REGION))) { *not_null(errcode_ret) = CL_INVALID_OPERATION; return NULL; } if ((srcImage->getDims() == 1) && ((region[1] != 1) || (region[2] != 1))) { *not_null(errcode_ret) = CL_INVALID_VALUE; return NULL; } if ((srcImage->getDims() == 2) && (region[2] != 1)) { *not_null(errcode_ret) = CL_INVALID_VALUE; return NULL; } amd::Coord3D srcOrigin(origin[0], origin[1], origin[2]); amd::Coord3D srcRegion(region[0], region[1], region[2]); ImageViewRef mip; if (srcImage->getMipLevels() > 1) { // Create a view for the specified mip level mip = srcImage->createView(srcImage->getContext(), srcImage->getImageFormat(), hostQueue.vdev(), origin[srcImage->getDims()]); if (mip() == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return NULL; } // Reset the mip level value to 0, since a view was created if (srcImage->getDims() < 3) { srcOrigin.c[srcImage->getDims()] = 0; } srcImage->incMapCount(); srcImage = mip(); // Retain this view until unmap is done srcImage->retain(); } if (!srcImage->validateRegion(srcOrigin, srcRegion)) { *not_null(errcode_ret) = CL_INVALID_VALUE; return NULL; } // Wait for possible pending operations amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { *not_null(errcode_ret) = err; return (void*)0; } // Make sure we have memory for the command execution device::Memory* mem = srcImage->getDeviceMemory(hostQueue.device()); if (NULL == mem) { LogPrintfError("Can't allocate memory size - 0x%08X bytes!", srcImage->getSize()); *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; return NULL; } // Attempt to allocate the map target now (whether blocking or non-blocking) void* mapPtr = mem->allocMapTarget(srcOrigin, srcRegion, map_flags, image_row_pitch, image_slice_pitch); if (NULL == mapPtr) { *not_null(errcode_ret) = CL_MAP_FAILURE; return NULL; } // Allocate a map command for the queue thread amd::MapMemoryCommand* command = new amd::MapMemoryCommand( hostQueue, CL_COMMAND_MAP_IMAGE, eventWaitList, *srcImage, map_flags, blocking_map ?
true : false, srcOrigin, srcRegion, nullptr, nullptr, mapPtr); if (command == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return NULL; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; return NULL; } if (srcImage->getMemFlags() & CL_MEM_USE_PERSISTENT_MEM_AMD) { // [Windows VidMM restriction] // Runtime can't map persistent memory if it's still busy or // even wasn't submitted to HW from the worker thread yet hostQueue.finish(); } // Send the map command for processing command->enqueue(); // A blocking map has to wait for completion if (blocking_map) { command->awaitCompletion(); } // Save the command event if the application has requested it *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } *not_null(errcode_ret) = CL_SUCCESS; srcImage->incMapCount(); return mapPtr; } RUNTIME_EXIT /*! \brief Enqueue a command to unmap a previously mapped region of a memory * object. * * Reads or writes from the host using the pointer returned by * clEnqueueMapBuffer or clEnqueueMapImage are considered to be complete. * * \param command_queue must be a valid command-queue. * * \param memobj is a valid memory object. The OpenCL context associated with * \a command_queue and \a memobj must be the same. * * \param mapped_ptr is the host address returned by a previous call to * clEnqueueMapBuffer or clEnqueueMapImage for \a memobj. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before * clEnqueueUnmapMemObject can be executed. If \a event_wait_list is NULL, * then clEnqueueUnmapMemObject does not wait on any event to complete. If * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. If * \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. * * \param event returns an event object that identifies this particular command * and can be used to query or queue a wait for this particular command to * complete. \a event can be NULL in which case it will not be possible for the * application to query the status of this command or queue a wait for this * command to complete. clEnqueueBarrier can be used instead. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue * - CL_INVALID_MEM_OBJECT if \a memobj is not a valid memory object. * - CL_INVALID_VALUE if \a mapped_ptr is not a valid pointer returned by * clEnqueueMapBuffer or clEnqueueMapImage for \a memobj. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or if \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * - CL_INVALID_CONTEXT if context associated with \a command_queue and * \a memobj are not the same. * * clEnqueueMapBuffer and clEnqueueMapImage increment the mapped count of the * memory object. Multiple calls to clEnqueueMapBuffer or clEnqueueMapImage on * the same memory object will increment this mapped count by the appropriate number * of calls.
clEnqueueUnmapMemObject decrements the mapped count of the memory * object. clEnqueueMapBuffer and clEnqueueMapImage act as synchronization * points for a region of the memory object being mapped. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clEnqueueUnmapMemObject, (cl_command_queue command_queue, cl_mem memobj, void* mapped_ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(memobj)) { return CL_INVALID_MEM_OBJECT; } amd::Memory* amdMemory = as_amd(memobj); amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != amdMemory->getContext()) { return CL_INVALID_CONTEXT; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::UnmapMemoryCommand* command = new amd::UnmapMemoryCommand( hostQueue, CL_COMMAND_UNMAP_MEM_OBJECT, eventWaitList, *amdMemory, mapped_ptr); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } device::Memory* mem = amdMemory->getDeviceMemory(hostQueue.device()); bool blocking = false; if (mem->isPersistentMapped()) { blocking = true; } amdMemory->decMapCount(); command->enqueue(); if (blocking) { LogInfo("blocking wait in unmapping function"); command->awaitCompletion(); } *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! @} * \addtogroup CL_MemObjQuery * @{ */ /*! \brief Get information that is common to all memory objects (buffer and * image objects) * * \param memobj specifies the memory object being queried. * * \param param_name specifies the information to query. * * \param param_value is a pointer to memory where the appropriate result being * queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size is used to specify the size in bytes of memory * pointed to by \a param_value. This size must be >= size of return type. * * \param param_value_size_ret returns the actual size in bytes of data being * queried by \a param_value. If \a param_value_size_ret is NULL, it is * ignored. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes * specified by \a param_value_size is < size of return type. * - CL_INVALID_MEM_OBJECT if \a memobj is not a valid memory object.
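*
* A minimal query sketch (illustrative; \a buf is a hypothetical buffer handle
* and return codes are not checked):
* \code
* size_t size = 0;
* clGetMemObjectInfo(buf, CL_MEM_SIZE, sizeof(size), &size, NULL);
* cl_mem_object_type type = 0;
* clGetMemObjectInfo(buf, CL_MEM_TYPE, sizeof(type), &type, NULL);
* \endcode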
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clGetMemObjectInfo, (cl_mem memobj, cl_mem_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { if (!is_valid(memobj)) { return CL_INVALID_MEM_OBJECT; } switch (param_name) { case CL_MEM_TYPE: { cl_mem_object_type type = as_amd(memobj)->getType(); return amd::clGetInfo(type, param_value_size, param_value, param_value_size_ret); } case CL_MEM_FLAGS: { cl_mem_flags flags = as_amd(memobj)->getMemFlags(); return amd::clGetInfo(flags, param_value_size, param_value, param_value_size_ret); } case CL_MEM_SIZE: { size_t size = as_amd(memobj)->getSize(); return amd::clGetInfo(size, param_value_size, param_value, param_value_size_ret); } case CL_MEM_HOST_PTR: { amd::Memory* memory = as_amd(memobj); const void* hostPtr = (memory->getMemFlags() & CL_MEM_USE_HOST_PTR) ? memory->getHostMem() : NULL; return amd::clGetInfo(hostPtr, param_value_size, param_value, param_value_size_ret); } case CL_MEM_MAP_COUNT: { cl_uint count = as_amd(memobj)->mapCount(); return amd::clGetInfo(count, param_value_size, param_value, param_value_size_ret); } case CL_MEM_REFERENCE_COUNT: { cl_uint count = as_amd(memobj)->referenceCount(); return amd::clGetInfo(count, param_value_size, param_value, param_value_size_ret); } case CL_MEM_CONTEXT: { cl_context context = as_cl(&as_amd(memobj)->getContext()); return amd::clGetInfo(context, param_value_size, param_value, param_value_size_ret); } case CL_MEM_ASSOCIATED_MEMOBJECT: { amd::Memory* amdParent = as_amd(memobj)->parent(); if ((NULL != amdParent) && (NULL != amdParent->getSvmPtr()) && (NULL == amdParent->parent())) { amdParent = NULL; } cl_mem parent = as_cl(amdParent); return amd::clGetInfo(parent, param_value_size, param_value, param_value_size_ret); } case CL_MEM_OFFSET: { size_t mem_offset = as_amd(memobj)->getOrigin(); return amd::clGetInfo(mem_offset, param_value_size, param_value, param_value_size_ret); } case CL_MEM_USES_SVM_POINTER: { cl_bool usesSvmPointer = as_amd(memobj)->usesSvmPointer(); return amd::clGetInfo(usesSvmPointer, param_value_size, param_value, param_value_size_ret); } #ifdef _WIN32 case CL_MEM_D3D10_RESOURCE_KHR: { ID3D10Resource* pRes = NULL; // initialized so a missing D3D10 object can't return garbage amd::InteropObject* interop = ((amd::Memory*)as_amd(memobj))->getInteropObj(); if (interop) { amd::D3D10Object* d3d10obj = interop->asD3D10Object(); if (d3d10obj) { pRes = d3d10obj->getD3D10ResOrig(); if (!pRes) { pRes = d3d10obj->getD3D10Resource(); } } return amd::clGetInfo(pRes, param_value_size, param_value, param_value_size_ret); } break; } case CL_MEM_D3D11_RESOURCE_KHR: { ID3D11Resource* pRes = NULL; // initialized so a missing D3D11 object can't return garbage amd::InteropObject* interop = ((amd::Memory*)as_amd(memobj))->getInteropObj(); if (interop) { amd::D3D11Object* d3d11obj = interop->asD3D11Object(); if (d3d11obj) { pRes = d3d11obj->getD3D11ResOrig(); if (!pRes) { pRes = d3d11obj->getD3D11Resource(); } } return amd::clGetInfo(pRes, param_value_size, param_value, param_value_size_ret); } break; } case CL_MEM_DX9_MEDIA_SURFACE_INFO_KHR: { amd::InteropObject* interop = ((amd::Memory*)as_amd(memobj))->getInteropObj(); if (interop) { amd::D3D9Object* d3d9obj = interop->asD3D9Object(); if (d3d9obj) return amd::clGetInfo(d3d9obj->getSurfInfo(), param_value_size, param_value, param_value_size_ret); else return CL_INVALID_MEM_OBJECT; } else return CL_INVALID_MEM_OBJECT; break; } case CL_MEM_DX9_MEDIA_ADAPTER_TYPE_KHR: { cl_dx9_media_adapter_type_khr adapterType; amd::InteropObject* interop = ((amd::Memory*)as_amd(memobj))->getInteropObj(); if (interop) { amd::D3D9Object* d3d9obj =
interop->asD3D9Object(); if (!d3d9obj) { return CL_INVALID_MEM_OBJECT; } adapterType = d3d9obj->getAdapterType(); return amd::clGetInfo(adapterType, param_value_size, param_value, param_value_size_ret); } break; } #endif //_WIN32 default: break; } return CL_INVALID_VALUE; } RUNTIME_EXIT /*! \brief Get information specific to an image object. * * \param memobj specifies the image object being queried. * * \param param_name specifies the information to query. * * \param param_value is a pointer to memory where the appropriate result being * queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size is used to specify the size in bytes of memory * pointed to by \a param_value. This size must be >= size of return type. * * \param param_value_size_ret returns the actual size in bytes of data being * queried by \a param_value. If \a param_value_size_ret is NULL, it is * ignored. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes * specified by \a param_value_size is < size of return type and * \a param_value is not NULL. * - CL_INVALID_MEM_OBJECT if \a memobj is not a valid image object. * * \version 1.2r09 */ RUNTIME_ENTRY(cl_int, clGetImageInfo, (cl_mem memobj, cl_image_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { if (!is_valid(memobj)) { return CL_INVALID_MEM_OBJECT; } amd::Image* image = as_amd(memobj)->asImage(); if (image == NULL) { return CL_INVALID_MEM_OBJECT; } switch (param_name) { case CL_IMAGE_FORMAT: { cl_image_format format = image->getImageFormat(); return amd::clGetInfo(format, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_ELEMENT_SIZE: { size_t elementSize = image->getImageFormat().getElementSize(); return amd::clGetInfo(elementSize, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_ROW_PITCH: { size_t rowPitch = image->getRowPitch(); return amd::clGetInfo(rowPitch, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_SLICE_PITCH: { size_t slicePitch = image->getSlicePitch(); return amd::clGetInfo(slicePitch, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_WIDTH: { size_t width = image->getWidth(); return amd::clGetInfo(width, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_HEIGHT: { size_t height = image->getHeight(); if ((image->getType() == CL_MEM_OBJECT_IMAGE1D) || (image->getType() == CL_MEM_OBJECT_IMAGE1D_ARRAY) || (image->getType() == CL_MEM_OBJECT_IMAGE1D_BUFFER)) { height = 0; } return amd::clGetInfo(height, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_DEPTH: { size_t depth = image->getDepth(); if ((image->getType() == CL_MEM_OBJECT_IMAGE1D_BUFFER) || (image->getType() == CL_MEM_OBJECT_IMAGE1D_ARRAY) || (image->getType() == CL_MEM_OBJECT_IMAGE2D_ARRAY) || (image->getType() == CL_MEM_OBJECT_IMAGE1D) || (image->getType() == CL_MEM_OBJECT_IMAGE2D)) { depth = 0; } return amd::clGetInfo(depth, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_ARRAY_SIZE: { size_t arraySize = 0; if (image->getType() == CL_MEM_OBJECT_IMAGE1D_ARRAY) { arraySize = image->getHeight(); } else if (image->getType() == CL_MEM_OBJECT_IMAGE2D_ARRAY) { arraySize = image->getDepth(); } return amd::clGetInfo(arraySize, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_BUFFER: { cl_mem buffer = 0; amd::Memory* parent = image->parent(); while (parent && (parent->asBuffer() ==
NULL)) { parent = parent->parent(); } buffer = as_cl(parent); return amd::clGetInfo(buffer, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_NUM_MIP_LEVELS: { cl_uint numMipLevels = image->getMipLevels(); return amd::clGetInfo(numMipLevels, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_NUM_SAMPLES: { cl_uint numSamples = 0; return amd::clGetInfo(numSamples, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_BYTE_PITCH_AMD: { size_t bytePitch = image->getBytePitch(); return amd::clGetInfo(bytePitch, param_value_size, param_value, param_value_size_ret); } #ifdef _WIN32 case CL_IMAGE_D3D10_SUBRESOURCE_KHR: { amd::InteropObject* interop = ((amd::Memory*)as_amd(memobj))->getInteropObj(); if (!interop) { return CL_INVALID_MEM_OBJECT; } amd::D3D10Object* d3d10obj = interop->asD3D10Object(); if (!d3d10obj) { return CL_INVALID_MEM_OBJECT; } UINT subresource = d3d10obj->getSubresource(); return amd::clGetInfo(subresource, param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_D3D11_SUBRESOURCE_KHR: { amd::InteropObject* interop = ((amd::Memory*)as_amd(memobj))->getInteropObj(); if (!interop) { return CL_INVALID_MEM_OBJECT; } amd::D3D11Object* d3d11obj = interop->asD3D11Object(); if (!d3d11obj) { return CL_INVALID_MEM_OBJECT; } UINT subresource = d3d11obj->getSubresource(); return amd::clGetInfo(subresource, param_value_size, param_value, param_value_size_ret); } case CL_MEM_DX9_MEDIA_SURFACE_INFO_KHR: { amd::InteropObject* interop = ((amd::Memory*)as_amd(memobj))->getInteropObj(); if (!interop) { return CL_INVALID_MEM_OBJECT; } amd::D3D9Object* d3d9obj = interop->asD3D9Object(); if (!d3d9obj) { return CL_INVALID_MEM_OBJECT; } return amd::clGetInfo(d3d9obj->getSurfInfo(), param_value_size, param_value, param_value_size_ret); } case CL_IMAGE_DX9_MEDIA_PLANE_KHR: { amd::InteropObject* interop = ((amd::Memory*)as_amd(memobj))->getInteropObj(); if (!interop) { return CL_INVALID_MEM_OBJECT; } amd::D3D9Object* d3d9obj = interop->asD3D9Object(); if (!d3d9obj) { return CL_INVALID_MEM_OBJECT; } cl_uint plane = d3d9obj->getPlane(); return amd::clGetInfo(plane, param_value_size, param_value, param_value_size_ret); } #endif //_WIN32 default: break; } return CL_INVALID_VALUE; } RUNTIME_EXIT /*! \brief creates a 1D image, 1D image buffer, 1D image array, 2D image, * 2D image array and 3D image object * * \param context is a valid OpenCL context on which the image object is * to be created. * * \param flags is a bit-field that is used to specify allocation and usage * information about the image memory object being created and is described * in table 5.3. If value specified for flags is 0, the default is used which * is CL_MEM_READ_WRITE. * * \param image_format is a pointer to a structure that describes format * properties of the image to be allocated. Refer to section 5.3.1.1 for * a detailed description of the image format descriptor. * * \param image_desc is a pointer to a structure that describes type and * dimensions of the image to be allocated. Refer to section 5.3.1.2 for * a detailed description of the image descriptor. * * \param host_ptr is a pointer to the image data that may already be * allocated by the application. Refer to table below for a description of * how large the buffer that host_ptr points to must be. 
* CL_MEM_OBJECT_IMAGE1D >= image_row_pitch * CL_MEM_OBJECT_IMAGE1D_BUFFER >= image_row_pitch * CL_MEM_OBJECT_IMAGE2D >= image_row_pitch * image_height * CL_MEM_OBJECT_IMAGE3D >= image_slice_pitch * image_depth * CL_MEM_OBJECT_IMAGE1D_ARRAY >= image_slice_pitch * image_array_size * CL_MEM_OBJECT_IMAGE2D_ARRAY >= image_slice_pitch * image_array_size * For a 3D image or 2D image array, the image data specified by \a host_ptr * is stored as a linear sequence of adjacent 2D image slices or 2D images * respectively. Each 2D image is a linear sequence of adjacent scanlines. * Each scanline is a linear sequence of image elements. * For a 2D image, the image data specified by \a host_ptr is stored * as a linear sequence of adjacent scanlines. Each scanline is a linear * sequence of image elements. * For a 1D image array, the image data specified by \a host_ptr is stored * as a linear sequence of adjacent 1D images. Each 1D image * or 1D image buffer is a single scanline which is a linear sequence of * adjacent elements. * * \param errcode_ret will return an appropriate error code. * If \a errcode_ret is NULL, no error code is returned. * * \return a valid non-zero image object created and the \a errcode_ret is * set to CL_SUCCESS if the image object is created successfully. Otherwise, * it returns a NULL value with one of the following error values * returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a context is not a valid context. * - CL_INVALID_VALUE if values specified in \a flags are not valid. * - CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if values specified in \a image_format * are not valid or if \a image_format is NULL. * - CL_INVALID_IMAGE_DESCRIPTOR if values specified in \a image_desc are * not valid or if \a image_desc is NULL. * - CL_INVALID_HOST_PTR if \a host_ptr is NULL and * CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR are set in \a flags or * if \a host_ptr is not NULL, but CL_MEM_COPY_HOST_PTR or * CL_MEM_USE_HOST_PTR are not set in \a flags. * - CL_INVALID_VALUE if a 1D image buffer is being created and * the buffer object was created with CL_MEM_WRITE_ONLY and \a flags * specifies CL_MEM_READ_WRITE or CL_MEM_READ_ONLY, or if the buffer object * was created with CL_MEM_READ_ONLY and \a flags specifies * CL_MEM_READ_WRITE or CL_MEM_WRITE_ONLY, or if \a flags specifies * CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR or CL_MEM_COPY_HOST_PTR. * - CL_IMAGE_FORMAT_NOT_SUPPORTED if the image_format is not supported. * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory * for image object. * - CL_INVALID_OPERATION if there are no devices in \a context that support * images (i.e. CL_DEVICE_IMAGE_SUPPORT specified in table 4.3 is CL_FALSE). * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the OpenCL implementation on the host.
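*
* A minimal creation sketch (illustrative; \a ctx is a hypothetical context
* and the format and size are arbitrary):
* \code
* cl_image_format fmt = {CL_RGBA, CL_UNORM_INT8};
* cl_image_desc desc = {};
* desc.image_type = CL_MEM_OBJECT_IMAGE2D;
* desc.image_width = 256;
* desc.image_height = 256;
* cl_int err;
* cl_mem img = clCreateImage(ctx, CL_MEM_READ_WRITE, &fmt, &desc, NULL, &err);
* \endcode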
* * \version 1.2r07 */ RUNTIME_ENTRY_RET(cl_mem, clCreateImage, (cl_context context, cl_mem_flags flags, const cl_image_format* image_format, const cl_image_desc* image_desc, void* host_ptr, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter: context"); return (cl_mem)0; } // check flags for validity if (!validateFlags(flags)) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter: flags"); return (cl_mem)0; } // check format if (image_format == NULL) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("invalid parameter: image_format"); return (cl_mem)0; } const amd::Image::Format imageFormat(*image_format); if (!imageFormat.isValid()) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("invalid parameter: image_format"); return (cl_mem)0; } // check the descriptor before it is dereferenced below if (image_desc == NULL) { *not_null(errcode_ret) = CL_INVALID_IMAGE_DESCRIPTOR; LogWarning("invalid parameter: image_desc"); return (cl_mem)0; } amd::Context& amdContext = *as_amd(context); if (!imageFormat.isSupported(amdContext, image_desc->image_type)) { *not_null(errcode_ret) = CL_IMAGE_FORMAT_NOT_SUPPORTED; LogWarning("invalid parameter: image_format"); return (cl_mem)0; } // check host_ptr consistency if (host_ptr == NULL) { if (flags & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR)) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter: host_ptr"); return (cl_mem)0; } } else { if (!(flags & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR))) { *not_null(errcode_ret) = CL_INVALID_HOST_PTR; LogWarning("invalid parameter: host_ptr"); return (cl_mem)0; } } const std::vector<amd::Device*>& devices = as_amd(context)->devices(); bool supportPass = false; for (auto& dev : devices) { if (dev->info().imageSupport_) { supportPass = true; break; } } if (!supportPass) { *not_null(errcode_ret) = CL_INVALID_OPERATION; LogWarning("there are no devices in context to support images"); return (cl_mem)0; } if (!amd::Image::validateDimensions(devices, image_desc->image_type, image_desc->image_width, image_desc->image_height, image_desc->image_depth, image_desc->image_array_size)) { *not_null(errcode_ret) = CL_INVALID_IMAGE_SIZE; LogWarning("invalid parameter: image dimensions exceeding max"); return (cl_mem)0; } size_t imageRowPitch = 0; size_t imageSlicePitch = 0; if (!validateImageDescriptor(devices, imageFormat, image_desc, host_ptr, imageRowPitch, imageSlicePitch)) { *not_null(errcode_ret) = CL_INVALID_IMAGE_DESCRIPTOR; LogWarning("invalid parameter: image_desc"); return (cl_mem)0; } // Validate mip level if (image_desc->num_mip_levels != 0) { size_t maxDim = std::max(image_desc->image_width, image_desc->image_height); maxDim = std::max(maxDim, image_desc->image_depth); uint mipLevels; for (mipLevels = 0; maxDim > 0; maxDim >>= 1, mipLevels++) ; if (mipLevels < image_desc->num_mip_levels) { *not_null(errcode_ret) = CL_INVALID_MIP_LEVEL; LogWarning("Invalid mip level"); return (cl_mem)0; } } amd::Image* image = NULL; switch (image_desc->image_type) { case CL_MEM_OBJECT_IMAGE1D: image = new (amdContext) amd::Image(amdContext, CL_MEM_OBJECT_IMAGE1D, flags, imageFormat, image_desc->image_width, 1, 1, imageRowPitch, 0, image_desc->num_mip_levels); break; case CL_MEM_OBJECT_IMAGE2D: if (image_desc->mem_object != NULL) { amd::Buffer& buffer = *(as_amd(image_desc->mem_object)->asBuffer()); if (&amdContext != &buffer.getContext()) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter: context"); return (cl_mem)0; } // host_ptr is not supported, the buffer object is used instead.
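// The 2D image aliases the buffer's storage rather than owning a copy, so
// host-pointer based flags are rejected and the supplied row pitch must
// satisfy the strictest image pitch alignment among the context's devices,
// as enforced below.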
if ((flags & (CL_MEM_USE_HOST_PTR | CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR)) != 0) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter: flags"); return (cl_mem)0; } cl_uint pitchAlignment = 0; for (unsigned int i = 0; i < devices.size(); ++i) { if (pitchAlignment < devices[i]->info().imagePitchAlignment_) { pitchAlignment = devices[i]->info().imagePitchAlignment_; } } if ((pitchAlignment != 0) && ((imageRowPitch % pitchAlignment) != 0)) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("invalid parameter: image_row_pitch"); return (cl_mem)0; } image = new (amdContext) amd::Image( buffer, CL_MEM_OBJECT_IMAGE2D, (flags != 0) ? flags : buffer.getMemFlags(), imageFormat, image_desc->image_width, image_desc->image_height, 1, imageRowPitch, imageSlicePitch); } else { image = new (amdContext) amd::Image(amdContext, CL_MEM_OBJECT_IMAGE2D, flags, imageFormat, image_desc->image_width, image_desc->image_height, 1, imageRowPitch, 0, image_desc->num_mip_levels); } break; case CL_MEM_OBJECT_IMAGE3D: image = new (amdContext) amd::Image(amdContext, CL_MEM_OBJECT_IMAGE3D, flags, imageFormat, image_desc->image_width, image_desc->image_height, image_desc->image_depth, imageRowPitch, imageSlicePitch, image_desc->num_mip_levels); break; case CL_MEM_OBJECT_IMAGE1D_BUFFER: { amd::Buffer& buffer = *(as_amd(image_desc->mem_object)->asBuffer()); if (&amdContext != &buffer.getContext()) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter: context"); return (cl_mem)0; } // host_ptr is not supported, the buffer object is used instead. if ((flags & (CL_MEM_USE_HOST_PTR | CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR)) != 0) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter: flags"); return (cl_mem)0; } image = new (amdContext) amd::Image( buffer, CL_MEM_OBJECT_IMAGE1D_BUFFER, (flags != 0) ? flags : buffer.getMemFlags(), imageFormat, image_desc->image_width, 1, 1, imageRowPitch, imageSlicePitch); } break; case CL_MEM_OBJECT_IMAGE1D_ARRAY: image = new (amdContext) amd::Image(amdContext, CL_MEM_OBJECT_IMAGE1D_ARRAY, flags, imageFormat, image_desc->image_width, image_desc->image_array_size, 1, imageRowPitch, imageSlicePitch, image_desc->num_mip_levels); break; case CL_MEM_OBJECT_IMAGE2D_ARRAY: image = new (amdContext) amd::Image( amdContext, CL_MEM_OBJECT_IMAGE2D_ARRAY, flags, imageFormat, image_desc->image_width, image_desc->image_height, image_desc->image_array_size, imageRowPitch, imageSlicePitch, image_desc->num_mip_levels); break; default: { *not_null(errcode_ret) = CL_INVALID_IMAGE_DESCRIPTOR; LogWarning("invalid parameter: image_desc"); return reinterpret_cast<cl_mem>(image); } break; } if (image == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; LogWarning("cannot allocate resources"); return (cl_mem)0; } if (!image->create(host_ptr)) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; image->release(); return (cl_mem)0; } *not_null(errcode_ret) = CL_SUCCESS; return (cl_mem)as_cl(image); } RUNTIME_EXIT /*! \brief Enqueues a command to fill a buffer object with * a pattern of a given pattern size. * * \param command_queue refers to the command-queue in which * the fill command will be queued. The OpenCL context associated with * command_queue and buffer must be the same. * * \param buffer is a valid buffer object. * * \param pattern is a pointer to the data pattern of size pattern_size * in bytes. pattern will be used to fill a region in buffer starting * at offset and is size bytes in size.
The data pattern must be a scalar or * vector integer or floating-point data type supported by OpenCL * as described in sections 6.1.1 and 6.1.2. For example, if buffer is * to be filled with a pattern of float4 values, then pattern will be * a pointer to a cl_float4 value and pattern_size will be sizeof(cl_float4). * The maximum value of pattern_size is the size of the largest integer or * floating-point vector data type supported by the OpenCL device. * * \param offset is the location in bytes of the region being filled * in buffer and must be a multiple of pattern_size. size is the size * in bytes of region being filled in buffer and must be a multiple * of pattern_size. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, * then this particular command does not wait on any event to complete. * If \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. The context associated with events in * \a event_wait_list and \a command_queue must be the same. * The memory associated with \a event_wait_list can be reused or * freed after the function returns. * * \param event returns an event object that identifies this particular command * and can be used to query or queue a wait for this particular command to * complete. \a event can be NULL in which case it will not be possible for the * application to query the status of this command or queue a wait for this * command to complete. clEnqueueBarrierWithWaitList can be used instead. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_CONTEXT if context associated with \a command_queue and * \a buffer are not the same or if the \a context associated with * \a command_queue and \a events in \a event_wait_list are not the same. * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue * - CL_INVALID_MEM_OBJECT if \a buffer is not a valid memory object. * - CL_INVALID_VALUE if pattern is NULL or if pattern_size is 0 or if * \a pattern_size is not one of {1, 2, 4, 8, 16, 32, 64, 128}. * - CL_INVALID_VALUE if \a offset or \a offset + \a size require accessing * elements outside the \a buffer object. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or if \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host.
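*
* A minimal fill sketch (illustrative; \a queue, \a buf and \a nbytes are
* hypothetical, and \a nbytes is assumed to be a multiple of the pattern
* size):
* \code
* cl_float zero = 0.0f;
* clEnqueueFillBuffer(queue, buf, &zero, sizeof(zero), 0, nbytes,
*                     0, NULL, NULL);
* \endcode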
* * \version 1.2r07 */ RUNTIME_ENTRY(cl_int, clEnqueueFillBuffer, (cl_command_queue command_queue, cl_mem buffer, const void* pattern, size_t pattern_size, size_t offset, size_t size, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { amd::Buffer* fillBuffer; if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(buffer)) { return CL_INVALID_MEM_OBJECT; } fillBuffer = as_amd(buffer)->asBuffer(); if (fillBuffer == NULL) { return CL_INVALID_MEM_OBJECT; } if ((pattern == NULL) || (pattern_size == 0) || (pattern_size > amd::FillMemoryCommand::MaxFillPatterSize) || ((pattern_size & (pattern_size - 1)) != 0)) { return CL_INVALID_VALUE; } // Offset and size must be multiple of pattern_size if (!(amd::isMultipleOf(offset, pattern_size) && amd::isMultipleOf(size, pattern_size))) { return CL_INVALID_VALUE; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != fillBuffer->getContext()) { return CL_INVALID_CONTEXT; } amd::Coord3D fillOffset(offset, 0, 0); amd::Coord3D fillSize(size, 1, 1); // surface takes [pitch, width, height] amd::Coord3D surface(size, size, 1); if (!fillBuffer->validateRegion(fillOffset, fillSize)) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::FillMemoryCommand* command = new amd::FillMemoryCommand(hostQueue, CL_COMMAND_FILL_BUFFER, eventWaitList, *fillBuffer, pattern, pattern_size, fillOffset, fillSize, surface); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief enqueues a command to fill an image object with * a specified color. * * \param command_queue refers to the command-queue in which * the fill command will be queued. The OpenCL context associated with * command_queue and image must be the same. * * \param image is a valid image object. * * \param fill_color is the fill color. The fill color is a four * component RGBA floating-point color value if the image channel data type * is not an unnormalized signed and unsigned integer type, is a four * component signed integer value if the image channel data type is * an unnormalized signed integer type and is a four component unsigned * integer value if the image channel data type is an unnormalized * unsigned integer type. The fill color will be converted to * the appropriate image channel format and order associated with image * as described in sections 6.11.13 and 8.3. * * \param origin defines the (x, y, z) offset in pixels in the image * or (x, y) offset and the image index in the image array. If image is * a 2D image object, origin[2] must be 0. If image is a 1D image or 1D * image buffer object, origin[1] and origin[2] must be 0. If image is * a 1D image array object, origin[2] must be 0. If image is a 1D image array * object, origin[1] describes the image index in the 1D image array. * If image is a 2D image array object, origin[2] describes the image index * in the 2D image array.
* * \param region defines the (width, height, depth) in pixels of * the 1D, 2D or 3D rectangle or the (width, height) in pixels of * the 1D or 2D rectangle and the number of images of an image array. If image is * a 2D image object, region[2] must be 1. If image is a 1D image or * 1D image buffer object, region[1] and region[2] must be 1. If image is * a 1D image array object, region[2] must be 1. * If image is a 2D image array object, region[2] describes the number of * images in the 2D image array. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, * then this particular command does not wait on any event to complete. * If \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. The context associated with events in * \a event_wait_list and \a command_queue must be the same. * The memory associated with \a event_wait_list can be reused or * freed after the function returns. * * \param event returns an event object that identifies this particular command * and can be used to query or queue a wait for this particular command to * complete. \a event can be NULL in which case it will not be possible for * the application to query the status of this command or queue a wait for this * command to complete. clEnqueueBarrierWithWaitList can be used instead. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_CONTEXT if context associated with \a command_queue and * \a image are not the same or if the \a context associated with * \a command_queue and \a events in \a event_wait_list are not the same. * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue * - CL_INVALID_MEM_OBJECT if \a image is not a valid image object. * - CL_INVALID_VALUE if fill_color is NULL. * - CL_INVALID_VALUE if the region being filled as specified by origin and * region is out of bounds. * - CL_INVALID_VALUE if values in origin and region do not follow rules * described in the argument description for origin and region. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or if \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_INVALID_IMAGE_SIZE if image dimensions (image width, height, specified * or compute row and/or slice pitch) for image are not supported by device * associated with queue. * - CL_INVALID_IMAGE_FORMAT if image format (image channel order and data type) * for image are not supported by device associated with queue. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host.
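*
* A minimal fill sketch (illustrative; \a queue, \a img, \a width and
* \a height are hypothetical, and the image is assumed to be a 2D
* floating-point image):
* \code
* cl_float4 red = {{1.0f, 0.0f, 0.0f, 1.0f}};
* size_t origin[3] = {0, 0, 0};
* size_t region[3] = {width, height, 1};
* clEnqueueFillImage(queue, img, &red, origin, region, 0, NULL, NULL);
* \endcode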
* * \version 1.2r07 */ RUNTIME_ENTRY(cl_int, clEnqueueFillImage, (cl_command_queue command_queue, cl_mem image, const void* fill_color, const size_t* origin, const size_t* region, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { amd::Image* fillImage; if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(image)) { return CL_INVALID_MEM_OBJECT; } if (fill_color == NULL) { return CL_INVALID_VALUE; } fillImage = as_amd(image)->asImage(); if (fillImage == NULL) { return CL_INVALID_MEM_OBJECT; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != fillImage->getContext()) { return CL_INVALID_CONTEXT; } if (fillImage->getImageFormat().image_channel_order == CL_DEPTH_STENCIL) { return CL_INVALID_OPERATION; } amd::Coord3D fillOrigin(origin[0], origin[1], origin[2]); amd::Coord3D fillRegion(region[0], region[1], region[2]); // surface takes [pitch, width, height] amd::Coord3D surface(region[0], region[0], region[2]); ImageViewRef mip; if (fillImage->getMipLevels() > 1) { // Create a view for the specified mip level mip = fillImage->createView(fillImage->getContext(), fillImage->getImageFormat(), nullptr, origin[fillImage->getDims()]); if (mip() == nullptr) { return CL_OUT_OF_HOST_MEMORY; } // Reset the mip level value to 0, since a view was created if (fillImage->getDims() < 3) { fillOrigin.c[fillImage->getDims()] = 0; } fillImage = mip(); } if (!fillImage->validateRegion(fillOrigin, fillRegion)) { return CL_INVALID_VALUE; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::FillMemoryCommand* command = new amd::FillMemoryCommand( hostQueue, CL_COMMAND_FILL_IMAGE, eventWaitList, *fillImage, fill_color, sizeof(cl_float4), // @note color size is always 16 bytes value fillOrigin, fillRegion, surface); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueues a command to indicate which device a set of memory objects * should be associated with. Typically, memory objects are implicitly * migrated to a device for which enqueued commands, using the memory object, * are targeted. \a clEnqueueMigrateMemObjects allows this migration to be * explicitly performed ahead of the dependent commands. This allows a user to * preemptively change the association of a memory object, through regular * command queue scheduling, in order to prepare for another upcoming * command. This also permits an application to overlap the placement of * memory objects with other unrelated operations before these memory objects * are needed potentially hiding transfer latencies. Once the event, returned * from \a clEnqueueMigrateMemObjects, has been marked \a CL_COMPLETE * the memory objects specified in \a mem_objects have been successfully * migrated to the device associated with \a command_queue. The migrated memory * object shall remain resident on the device until another command is enqueued * that either implicitly or explicitly migrates it away. 
* \a clEnqueueMigrateMemObjects can also be used to direct the initial * placement of a memory object, after creation, possibly avoiding the initial * overhead of instantiating the object on the first enqueued command to use it. * The user is responsible for managing the event dependencies, associated with * this command, in order to avoid overlapping access to memory objects. * Improperly specified event dependencies passed to * \a clEnqueueMigrateMemObjects could result in undefined results. * * \param command_queue is a valid command-queue. The specified set of memory * objects in \a mem_objects will be migrated to the OpenCL device associated * with \a command_queue or to the host if \a CL_MIGRATE_MEM_OBJECT_HOST * has been specified. * * \param num_mem_objects is the number of memory objects specified in * \a mem_objects. \a mem_objects is a pointer to a list of memory objects. * * \param flags is a bit-field that is used to specify migration options. * The following table describes the possible values for flags. * cl_mem_migration flags Description * CL_MIGRATE_MEM_OBJECT_HOST This flag indicates that the specified set * of memory objects are to be migrated to the * host, regardless of the target command-queue. * CL_MIGRATE_MEM_OBJECT_ This flag indicates that the contents of the set * CONTENT_UNDEFINED of memory objects are undefined after migration. * The specified set of memory objects are migrated * to the device associated with \a command_queue * without incurring the cost of migrating their * contents. * * \param num_events_in_wait_list specifies the number of event objects in * \a event_wait_list. * * \param event_wait_list specifies events that need to complete before this * particular command can be executed. If \a event_wait_list is NULL, * then this particular command does not wait on any event to complete. * If \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. * If \a event_wait_list is not NULL, the list of events pointed to by * \a event_wait_list must be valid and \a num_events_in_wait_list must be * greater than 0. The events specified in \a event_wait_list act as * synchronization points. The context associated with events in * \a event_wait_list and \a command_queue must be the same. * The memory associated with \a event_wait_list can be reused or * freed after the function returns. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue * - CL_INVALID_CONTEXT if the context associated with \a command_queue * and memory objects in \a mem_objects are not the same or if the context * associated with \a command_queue and events in \a event_wait_list * are not the same. * - CL_INVALID_MEM_OBJECT if any of the memory objects in \a mem_objects * is not a valid memory object. * - CL_INVALID_VALUE if \a num_mem_objects is zero or * if \a mem_objects is NULL. * - CL_INVALID_VALUE if \a flags is not 0 and is not any of the values * described in the table above. * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and * \a num_events_in_wait_list > 0, or if \a event_wait_list is not NULL and * \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list * are not valid events. * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate * memory for the specified set of memory objects in \a mem_objects. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device.
* - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host. * * \version 1.2r15 */ RUNTIME_ENTRY(cl_int, clEnqueueMigrateMemObjects, (cl_command_queue command_queue, cl_uint num_mem_objects, const cl_mem* mem_objects, cl_mem_migration_flags flags, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if ((num_mem_objects == 0) || (mem_objects == NULL)) { return CL_INVALID_VALUE; } if (flags & ~(CL_MIGRATE_MEM_OBJECT_HOST | CL_MIGRATE_MEM_OBJECT_CONTENT_UNDEFINED)) { return CL_INVALID_VALUE; } std::vector<amd::Memory*> memObjects; for (uint i = 0; i < num_mem_objects; ++i) { if (!is_valid(mem_objects[i])) { return CL_INVALID_MEM_OBJECT; } amd::Memory* memory = as_amd(mem_objects[i]); if (hostQueue.context() != memory->getContext()) { return CL_INVALID_CONTEXT; } memObjects.push_back(memory); } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::MigrateMemObjectsCommand* command = new amd::MigrateMemObjectsCommand( hostQueue, CL_COMMAND_MIGRATE_MEM_OBJECTS, eventWaitList, memObjects, flags); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT RUNTIME_ENTRY_RET(cl_mem, clConvertImageAMD, (cl_context context, cl_mem image, const cl_image_format* image_format, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter: context"); return (cl_mem)0; } // check format if (image_format == NULL) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("invalid parameter: image_format"); return (cl_mem)0; } const amd::Image::Format imageFormat(*image_format); if (!imageFormat.isValid()) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("invalid parameter: image_format"); return (cl_mem)0; } amd::Context& amdContext = *as_amd(context); if (!imageFormat.isSupported(amdContext)) { *not_null(errcode_ret) = CL_IMAGE_FORMAT_NOT_SUPPORTED; LogWarning("invalid parameter: image_format"); return (cl_mem)0; } // validate the image object before dereferencing it if (!is_valid(image)) { *not_null(errcode_ret) = CL_INVALID_MEM_OBJECT; return (cl_mem)0; } amd::Image* amdImage = as_amd(image)->asImage(); if (amdImage == NULL) { *not_null(errcode_ret) = CL_INVALID_MEM_OBJECT; return (cl_mem)0; } amd::Image* converted_image = amdImage->createView(amdContext, imageFormat, NULL); if (converted_image == NULL) { *not_null(errcode_ret) = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; LogWarning("cannot allocate resources"); return (cl_mem)0; } *not_null(errcode_ret) = CL_SUCCESS; return (cl_mem)as_cl(converted_image); } RUNTIME_EXIT RUNTIME_ENTRY_RET(cl_mem, clCreateBufferFromImageAMD, (cl_context context, cl_mem image, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; LogWarning("invalid parameter: context"); return (cl_mem)0; } amd::Context& amdContext = *as_amd(context); const std::vector<amd::Device*>& devices = amdContext.devices(); bool supportPass = false; for (auto& dev : devices) { if (dev->info().bufferFromImageSupport_) { supportPass = true; break; } } if (!supportPass) { *not_null(errcode_ret) =
CL_INVALID_OPERATION; LogWarning("there are no devices in context to support buffer from image"); return (cl_mem)0; } if (!is_valid(image)) { *not_null(errcode_ret) = CL_INVALID_MEM_OBJECT; return NULL; } amd::Image* amdImage = as_amd(image)->asImage(); if (amdImage == NULL) { *not_null(errcode_ret) = CL_INVALID_MEM_OBJECT; return NULL; } amd::Memory* mem = new (amdContext) amd::Buffer(*amdImage, 0, 0, amdImage->getSize()); if (mem == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_mem)0; } if (!mem->create()) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; mem->release(); return NULL; } *not_null(errcode_ret) = CL_SUCCESS; return (cl_mem)as_cl(mem); } RUNTIME_EXIT /*! @} * @} * @} */ clr-rocm-5.7.1/opencl/amdocl/cl_p2p_amd.cpp000066400000000000000000000072161450307266000204760ustar00rootroot00000000000000/* Copyright (c) 2015 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/ #include "cl_common.hpp" #include #include "cl_p2p_amd.h" #include "platform/object.hpp" RUNTIME_ENTRY(cl_int, clEnqueueCopyBufferP2PAMD, (cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, size_t src_offset, size_t dst_offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(src_buffer) || !is_valid(dst_buffer)) { return CL_INVALID_MEM_OBJECT; } amd::Buffer* srcBuffer = as_amd(src_buffer)->asBuffer(); amd::Buffer* dstBuffer = as_amd(dst_buffer)->asBuffer(); if (srcBuffer == NULL || dstBuffer == NULL) { return CL_INVALID_MEM_OBJECT; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if ((hostQueue.context() != srcBuffer->getContext()) && (hostQueue.context() != dstBuffer->getContext())) { return CL_INVALID_CONTEXT; } amd::Coord3D srcOffset(src_offset, 0, 0); amd::Coord3D dstOffset(dst_offset, 0, 0); amd::Coord3D size(cb, 1, 1); if (!srcBuffer->validateRegion(srcOffset, size) || !dstBuffer->validateRegion(dstOffset, size)) { return CL_INVALID_VALUE; } if (srcBuffer == dstBuffer && ((src_offset <= dst_offset && dst_offset < src_offset + cb) || (dst_offset <= src_offset && src_offset < dst_offset + cb))) { return CL_MEM_COPY_OVERLAP; } amd::Command::EventWaitList eventWaitList; if ((num_events_in_wait_list == 0 && event_wait_list != NULL) || (num_events_in_wait_list != 0 && event_wait_list == NULL)) { return CL_INVALID_EVENT_WAIT_LIST; } while (num_events_in_wait_list-- > 0) { cl_event event = *event_wait_list++; amd::Event* amdEvent = as_amd(event); if (!is_valid(event)) { return CL_INVALID_EVENT_WAIT_LIST; } eventWaitList.push_back(amdEvent); } amd::CopyMemoryP2PCommand* command = new amd::CopyMemoryP2PCommand(hostQueue, CL_COMMAND_COPY_BUFFER, eventWaitList, *srcBuffer, *dstBuffer, srcOffset, dstOffset, size); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_MEM_OBJECT_ALLOCATION_FAILURE; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT clr-rocm-5.7.1/opencl/amdocl/cl_p2p_amd.h000066400000000000000000000031041450307266000201330ustar00rootroot00000000000000/* Copyright (c) 2017 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef __CL_P2P_AMD_H #define __CL_P2P_AMD_H #include "CL/cl_ext.h" #ifdef __cplusplus extern "C" { #endif /*__cplusplus*/ extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferP2PAMD( cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, size_t src_offset, size_t dst_offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_2; #ifdef __cplusplus } /*extern "C"*/ #endif /*__cplusplus*/ #endif clr-rocm-5.7.1/opencl/amdocl/cl_pipe.cpp000066400000000000000000000161221450307266000201050ustar00rootroot00000000000000/* Copyright (c) 2013 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "cl_common.hpp" #include "platform/memory.hpp" #include "platform/context.hpp" #include "platform/command.hpp" /*! \addtogroup API * @{ * * \addtogroup CL_Pipes * @{ */ /*! \brief creates a pipe object. * * \param context is a valid OpenCL context used to create the pipe object. * * \param flags is a bit-field that is used to specify allocation and usage * information such as the memory arena that should be used to allocate the pipe * object and how it will be used. Only CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY, * CL_MEM_READ_WRITE and CL_MEM_HOST_NO_ACCESS can be specified when creating a * pipe object. If value specified for flags is 0, the default is used which is * CL_MEM_READ_WRITE. * * \param pipe_packet_size is the size in bytes of a pipe packet. * * \param pipe_max_packets specifies the pipe capacity by specifying the maximum * number of packets the pipe can hold. * * \param properties specifies a list of properties for the pipe and their * corresponding values. Each property name is immediately followed by the * corresponding desired value. The list is terminated with 0. * * In OpenCL 2.0, properties must be NULL. * * \param errcode_ret will return an appropriate error code. * If \a errcode_ret is NULL, no error code is returned. * * \return a valid non-zero pipe object and \a errcode_ret is set to CL_SUCCESS * if the pipe object is created successfully. Otherwise, it returns a NULL * value with one of the following error values returned in errcode_ret: * - CL_INVALID_CONTEXT if context is not a valid context. * - CL_INVALID_VALUE if values specified in flags are not as defined above. * - CL_INVALID_VALUE if properties is not NULL. * - CL_INVALID_PIPE_SIZE if pipe_packet_size is 0 or the pipe_packet_size * exceeds CL_DEVICE_PIPE_MAX_PACKET_SIZE value for all devices in context * or if pipe_max_packets is 0. 
* - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory * for the pipe object. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the OpenCL implementation on the host. * * \version 2.0r19 */ RUNTIME_ENTRY_RET(cl_mem, clCreatePipe, (cl_context context, cl_mem_flags flags, cl_uint pipe_packet_size, cl_uint pipe_max_packets, const cl_pipe_properties* properties, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; return NULL; } // check flags for validity cl_bitfield temp = flags & (CL_MEM_READ_WRITE | CL_MEM_WRITE_ONLY | CL_MEM_READ_ONLY | CL_MEM_HOST_NO_ACCESS); if (temp && !(CL_MEM_READ_WRITE == temp || CL_MEM_WRITE_ONLY == temp || CL_MEM_READ_ONLY == temp || CL_MEM_HOST_NO_ACCESS == temp)) { *not_null(errcode_ret) = CL_INVALID_VALUE; LogWarning("invalid parameter \"flags\""); return (cl_mem)0; } size_t size = sizeof(struct clk_pipe_t) + pipe_packet_size * pipe_max_packets; const std::vector<amd::Device*>& devices = as_amd(context)->devices(); bool sizePass = false; for (const auto& it : devices) { if (it->info().maxMemAllocSize_ >= size) { sizePass = true; break; } } // check size if (pipe_packet_size == 0 || pipe_max_packets == 0 || !sizePass) { *not_null(errcode_ret) = CL_INVALID_PIPE_SIZE; LogWarning("invalid parameter \"size = 0 or size > CL_DEVICE_PIPE_MAX_PACKET_SIZE\""); return (cl_mem)0; } amd::Context& amdContext = *as_amd(context); amd::Memory* mem = new (amdContext) amd::Pipe(amdContext, flags, size, (size_t)pipe_packet_size, (size_t)pipe_max_packets); if (mem == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_mem)0; } if (!mem->create()) { *not_null(errcode_ret) = CL_MEM_OBJECT_ALLOCATION_FAILURE; mem->release(); return NULL; } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(mem); } RUNTIME_EXIT /*! \brief Get information specific to a pipe object created with clCreatePipe. * * \param param_name specifies the information to query. * * \param param_value is a pointer to memory where the appropriate result being * queried is returned. If param_value is NULL, it is ignored. * * \param param_value_size is used to specify the size in bytes of memory * pointed to by param_value. This size must be >= size of return type. * * \param param_value_size_ret returns the actual size in bytes of data being * queried by param_value. If param_value_size_ret is NULL, it is ignored. * * \return CL_SUCCESS if the function is executed successfully. Otherwise, it * returns one of the following errors: * - CL_INVALID_VALUE if param_name is not valid, or if size in bytes specified * by param_value_size is < size of return type and param_value is not NULL. * - CL_INVALID_MEM_OBJECT if pipe is a not a valid pipe object. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the OpenCL implementation on the host.
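*
* A minimal query sketch (illustrative; \a pipe is a hypothetical pipe handle
* and return codes are not checked):
* \code
* cl_uint packet_size = 0;
* clGetPipeInfo(pipe, CL_PIPE_PACKET_SIZE, sizeof(packet_size),
*               &packet_size, NULL);
* cl_uint max_packets = 0;
* clGetPipeInfo(pipe, CL_PIPE_MAX_PACKETS, sizeof(max_packets),
*               &max_packets, NULL);
* \endcode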
* * \version 2.0r19 */ RUNTIME_ENTRY(cl_int, clGetPipeInfo, (cl_mem memobj, cl_image_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { if (!is_valid(memobj)) { return CL_INVALID_MEM_OBJECT; } amd::Pipe* pipe = as_amd(memobj)->asPipe(); if (pipe == NULL) { return CL_INVALID_MEM_OBJECT; } switch (param_name) { case CL_PIPE_PACKET_SIZE: { cl_uint packetSize = pipe->getPacketSize(); return amd::clGetInfo(packetSize, param_value_size, param_value, param_value_size_ret); } case CL_PIPE_MAX_PACKETS: { cl_uint count = pipe->getMaxNumPackets(); return amd::clGetInfo(count, param_value_size, param_value, param_value_size_ret); } default: break; } return CL_INVALID_VALUE; } RUNTIME_EXIT /*! @} * @} */ clr-rocm-5.7.1/opencl/amdocl/cl_platform_amd.cpp000066400000000000000000000027011450307266000216130ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "cl_common.hpp" #include "vdi_common.hpp" #include "cl_platform_amd.h" #include /*! \addtogroup API * @{ * * \addtogroup AMD_Extensions * @{ * */ RUNTIME_ENTRY(cl_int, clUnloadPlatformAMD, (cl_platform_id platform)) { if (AMD_PLATFORM == platform) { amd::Runtime::tearDown(); } return CL_SUCCESS; } RUNTIME_EXIT /*! @} * @} */ clr-rocm-5.7.1/opencl/amdocl/cl_platform_amd.h000066400000000000000000000034011450307266000212560ustar00rootroot00000000000000/* Copyright (c) 2009 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ // AMD-specific platform management extensions #ifndef __CL_PLATFORM_AMD_H #define __CL_PLATFORM_AMD_H #include "CL/cl_platform.h" #ifdef __cplusplus extern "C" { #endif /*__cplusplus*/ /*! \brief Unloads the specified platform, handling all required cleanup. * * @todo This is still somewhat of a stub. It only works for the AMD * platform and just forces shutdown of all devices (to get PM4 * capture working). It should handle ICD unregistration as well. */ extern CL_API_ENTRY cl_int CL_API_CALL clUnloadPlatformAMD(cl_platform_id platform) CL_API_SUFFIX__VERSION_1_0; #ifdef __cplusplus } /*extern "C"*/ #endif /*__cplusplus*/ #endif /*__CL_AMD_PROFILE_H*/ clr-rocm-5.7.1/opencl/amdocl/cl_profile_amd.cpp000066400000000000000000000327761450307266000214460ustar00rootroot00000000000000/* Copyright (c) 2009 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "cl_common.hpp" #include "cl_profile_amd.h" #include "platform/context.hpp" #include "platform/command.hpp" #include "platform/perfctr.hpp" #include "device/device.hpp" #include /*! \addtogroup API * @{ * * \addtogroup AMD_Extensions * @{ * */ /*! \brief Creates a new HW performance counter * for the specified OpenCL context. * * \param device must be a valid OpenCL device. * * \param block_index index of the HW block to configure. * * \param counter_index index of the hardware counter * within the block to configure. * * \param event_index Event you wish to count with * the counter specified by block_index + counter_index * * \param perf_counter the created perfcounter object * * \param errcode_ret A non zero value if OpenCL failed to create PerfCounter * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_DEVICE if the specified context is invalid. 
* - CL_INVALID_OPERATION if we couldn't create the object * * \return Created perfcounter object */ RUNTIME_ENTRY_RET(cl_perfcounter_amd, clCreatePerfCounterAMD, (cl_device_id device, cl_perfcounter_property* properties, cl_int* errcode_ret)) { // Make sure we have a valid device object if (!is_valid(device)) { *not_null(errcode_ret) = CL_INVALID_DEVICE; return NULL; } // Make sure we have a valid pointer to the performance counter properties if (NULL == properties) { return NULL; } amd::PerfCounter::Properties perfProperties; size_t size = 0; while (properties[size] != CL_PERFCOUNTER_NONE) { if (properties[size] < CL_PERFCOUNTER_LAST) { perfProperties[properties[size]] = static_cast(properties[size + 1]); size += 2; } else { return NULL; } } // Create the device perf counter amd::PerfCounter* perfCounter = new amd::PerfCounter(*as_amd(device), perfProperties); if (perfCounter == NULL) { *not_null(errcode_ret) = CL_INVALID_OPERATION; return NULL; } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(perfCounter); } RUNTIME_EXIT /*! \brief Destroy a performance counter object. * * \param perf_counter the perfcounter object for release * * \return A non zero value if OpenCL failed to release PerfCounter * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_OPERATION if we failed to release the object */ RUNTIME_ENTRY(cl_int, clReleasePerfCounterAMD, (cl_perfcounter_amd perf_counter)) { if (!is_valid(perf_counter)) { return CL_INVALID_OPERATION; } as_amd(perf_counter)->release(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Increments the perfcounter object reference count. * * \param perf_counter the perfcounter object for retain * * \return A non zero value if OpenCL failed to retain PerfCounter * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_OPERATION if we failed to release the object */ RUNTIME_ENTRY(cl_int, clRetainPerfCounterAMD, (cl_perfcounter_amd perf_counter)) { if (!is_valid(perf_counter)) { return CL_INVALID_OPERATION; } as_amd(perf_counter)->retain(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueues the begin command for the specified counters. * * \param command_queue must be a valid OpenCL command queue. * * \param num_perf_counters the number of perfcounter objects in the array. * * \param perf_counters specifies an array of perfcounter objects. * * \param event_wait_list specify [is a pointer to] events that need to * complete before this particular command can be executed. * If \a event_wait_list is NULL, then this particular command does not wait * on any event to complete. If \a event_wait_list is NULL, * \a num_events_in_wait_list must be 0. If \a event_wait_list is not NULL, * the list of events pointed to by \a event_wait_list must be valid and * \a num_events_in_wait_list must be greater than 0. The events specified in * \a event_wait_list act as synchronization points. * * \param num_events_in_wait_list specify the number of events in * \a event_wait_list. It must be 0 if \a event_wait_list is NULL. It must be * greater than 0 if \a event_wait_list is not NULL. * * \param event returns an event object that identifies this particular * command and can be used to query or queue a wait for this particular * command to complete. \a event can be NULL in which case it will not be * possible for the application to query the status of this command or queue a * wait for this command to complete. * * \return A non zero value if OpenCL failed to release PerfCounter * - CL_SUCCESS if the function is executed successfully. 
* - CL_INVALID_OPERATION if we failed to enqueue the begin operation * - CL_INVALID_COMMAND_QUEUE if the queue is not a valid command queue */ RUNTIME_ENTRY(cl_int, clEnqueueBeginPerfCounterAMD, (cl_command_queue command_queue, cl_uint num_perf_counters, cl_perfcounter_amd* perf_counters, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if ((num_perf_counters == 0) || (perf_counters == NULL)) { return CL_INVALID_OPERATION; } amd::HostQueue* hostQueue = as_amd(command_queue)->asHostQueue(); if (NULL == hostQueue) { return CL_INVALID_COMMAND_QUEUE; } amd::PerfCounterCommand::PerfCounterList counters; // Place all counters into the list for (cl_uint i = 0; i < num_perf_counters; ++i) { amd::PerfCounter* amdPerf = as_amd(perf_counters[i]); if (&hostQueue->device() == &amdPerf->device()) { counters.push_back(amdPerf); } else { return CL_INVALID_DEVICE; } } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, *hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } // Create a new command for the performance counters amd::PerfCounterCommand* command = new amd::PerfCounterCommand( *hostQueue, eventWaitList, counters, amd::PerfCounterCommand::Begin); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Submit the command to the device command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Enqueues the end command for the specified counters. * * \param command_queue must be a valid OpenCL command queue. * * \param num_perf_counters the number of perfcounter objects in the array. * * \param perf_counters specifies an array of perfcounter objects. * * \param event_wait_list is a pointer to events that need to * complete before this particular command can be executed. * If \a event_wait_list is NULL, then this particular command does not wait * on any event to complete. If \a event_wait_list is NULL, * \a num_events_in_wait_list must be 0. If \a event_wait_list is not NULL, * the list of events pointed to by \a event_wait_list must be valid and * \a num_events_in_wait_list must be greater than 0. The events specified in * \a event_wait_list act as synchronization points. * * \param num_events_in_wait_list specifies the number of events in * \a event_wait_list. It must be 0 if \a event_wait_list is NULL. It must be * greater than 0 if \a event_wait_list is not NULL. * * \param event returns an event object that identifies this particular * command and can be used to query or queue a wait for this particular * command to complete. \a event can be NULL in which case it will not be * possible for the application to query the status of this command or queue a * wait for this command to complete. * * \return A non zero value if OpenCL failed to enqueue the end command * - CL_SUCCESS if the function is executed successfully.
* - CL_INVALID_OPERATION if we failed to enqueue the end operation */ RUNTIME_ENTRY(cl_int, clEnqueueEndPerfCounterAMD, (cl_command_queue command_queue, cl_uint num_perf_counters, cl_perfcounter_amd* perf_counters, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if ((num_perf_counters == 0) || (perf_counters == NULL)) { return CL_INVALID_OPERATION; } amd::HostQueue* hostQueue = as_amd(command_queue)->asHostQueue(); if (NULL == hostQueue) { return CL_INVALID_COMMAND_QUEUE; } amd::PerfCounterCommand::PerfCounterList counters; // Place all counters into the list for (cl_uint i = 0; i < num_perf_counters; ++i) { amd::PerfCounter* amdPerf = as_amd(perf_counters[i]); if (&hostQueue->device() == &amdPerf->device()) { counters.push_back(amdPerf); } else { return CL_INVALID_DEVICE; } } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, *hostQueue, num_events_in_wait_list, event_wait_list); if (err != CL_SUCCESS) { return err; } // Create a new command for the performance counters amd::PerfCounterCommand* command = new amd::PerfCounterCommand( *hostQueue, eventWaitList, counters, amd::PerfCounterCommand::End); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Submit the command to the device command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Retrieves the results from the counter objects. * * \param num_perf_counter the perfcounter object for the information query. * * \param perf_counters specifies an array of perfcounter objects. * * \param wait_event specifies the wait event, returned in * the clEnqueueEndPerfCounterAMD. * * \param wait true if OpenCL should wait for the perfcounter data. * * \param values must be a valid pointer to an array of 64-bit values * and the array size must be equal to num_perf_counters. * * \return * - CL_SUCCESS if the function is executed successfully. * - CL_PROFILING_INFO_NOT_AVAILABLE if event isn't finished. 
* - CL_INVALID_OPERATION if we failed to get the data */ RUNTIME_ENTRY(cl_int, clGetPerfCounterInfoAMD, (cl_perfcounter_amd perf_counter, cl_perfcounter_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { // Check if we have a valid performance counter if (!is_valid(perf_counter)) { return CL_INVALID_OPERATION; } // Find the kernel, associated with the specified device const device::PerfCounter* devCounter = as_amd(perf_counter)->getDeviceCounter(); // Make sure we found a valid performance counter if (devCounter == NULL) { return CL_INVALID_OPERATION; } // Get the corresponded parameters switch (param_name) { case CL_PERFCOUNTER_REFERENCE_COUNT: { cl_uint count = as_amd(perf_counter)->referenceCount(); // Return the reference counter return amd::clGetInfo(count, param_value_size, param_value, param_value_size_ret); } case CL_PERFCOUNTER_GPU_BLOCK_INDEX: case CL_PERFCOUNTER_GPU_COUNTER_INDEX: case CL_PERFCOUNTER_GPU_EVENT_INDEX: { cl_ulong data = devCounter->getInfo(param_name); // Return the device performance counter information return amd::clGetInfo(data, param_value_size, param_value, param_value_size_ret); } case CL_PERFCOUNTER_DATA: { cl_ulong data = devCounter->getInfo(param_name); if (static_cast(0xffffffffffffffffULL) == data) { return CL_PROFILING_INFO_NOT_AVAILABLE; } // Return the device performance counter result return amd::clGetInfo(data, param_value_size, param_value, param_value_size_ret); } default: return CL_INVALID_VALUE; } return CL_SUCCESS; } RUNTIME_EXIT RUNTIME_ENTRY(cl_int, clSetDeviceClockModeAMD, (cl_device_id device, cl_set_device_clock_mode_input_amd set_clock_mode_input, cl_set_device_clock_mode_output_amd* set_clock_mode_output)) { // Make sure we have a valid device object if (!is_valid(device)) { return CL_INVALID_DEVICE; } if (set_clock_mode_input.clock_mode >= CL_DEVICE_CLOCK_MODE_COUNT_AMD) { return CL_INVALID_VALUE; } amd::Device* amdDevice = as_amd(device); bool ret = amdDevice->SetClockMode(set_clock_mode_input, set_clock_mode_output); return (ret == true)? CL_SUCCESS : CL_INVALID_OPERATION; } RUNTIME_EXIT /*! @} * @} */ clr-rocm-5.7.1/opencl/amdocl/cl_profile_amd.h000066400000000000000000000202611450307266000210750ustar00rootroot00000000000000/* Copyright (c) 2009 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef __CL_PROFILE_AMD_H #define __CL_PROFILE_AMD_H #include "CL/cl_platform.h" #ifdef __cplusplus extern "C" { #endif /*__cplusplus*/ typedef struct _cl_perfcounter_amd* cl_perfcounter_amd; typedef cl_ulong cl_perfcounter_property; typedef cl_uint cl_perfcounter_info; /* cl_perfcounter_info */ enum PerfcounterInfo { CL_PERFCOUNTER_NONE = 0x0, CL_PERFCOUNTER_REFERENCE_COUNT = 0x1, CL_PERFCOUNTER_DATA = 0x2, CL_PERFCOUNTER_GPU_BLOCK_INDEX = 0x3, CL_PERFCOUNTER_GPU_COUNTER_INDEX = 0x4, CL_PERFCOUNTER_GPU_EVENT_INDEX = 0x5, CL_PERFCOUNTER_LAST }; /********************************* * Set device clock mode data *********************************/ enum cl_DeviceClockMode_AMD { CL_DEVICE_CLOCK_MODE_DEFAULT_AMD = 0x0, /* Device clocks and other power settings are restored to default */ CL_DEVICE_CLOCK_MODE_QUERY_AMD = 0x1, /* Queries the current device clock ratios. Leaves the clock mode of the device unchanged */ CL_DEVICE_CLOCK_MODE_PROFILING_AMD = 0x2, /* Scale down from peak ratio */ CL_DEVICE_CLOCK_MODE_MINIMUMMEMORY_AMD = 0x3, /* Memory clock is set to the lowest available level */ CL_DEVICE_CLOCK_MODE_MINIMUMENGINE_AMD = 0x4, /* Engine clock is set to the lowest available level */ CL_DEVICE_CLOCK_MODE_PEAK_AMD = 0x5, /* Clocks set to maximum when possible. Fan set to maximum. */ CL_DEVICE_CLOCK_MODE_QUERYPROFILING_AMD = 0x6, /* Queries the profiling device clock ratios. Leaves the clock mode of the device unchanged */ CL_DEVICE_CLOCK_MODE_QUERYPEAK_AMD = 0x7, /* Queries the peak device clock ratios. Leaves the clock mode of the device unchanged */ CL_DEVICE_CLOCK_MODE_COUNT_AMD = 0x8, /* Maximum count of device clock modes */ }; typedef struct _cl_set_device_clock_mode_input_amd { /* specifies the clock mode for an AMD GPU device */ cl_DeviceClockMode_AMD clock_mode; } cl_set_device_clock_mode_input_amd; typedef struct _cl_set_device_clock_mode_output_amd { /* Ratio of current mem clock to peak clock as obtained from DeviceProperties::maxGpuClock */ cl_float memory_clock_ratio_to_peak; /* Ratio of current gpu core clock to peak clock as obtained from DeviceProperties::maxGpuClock */ cl_float engine_clock_ratio_to_peak; } cl_set_device_clock_mode_output_amd; /*! \brief Creates a new HW performance counter * for the specified OpenCL context. * * \param device must be a valid OpenCL device. * * \param properties the list of properties of the hardware counter * * \param errcode_ret A non zero value if OpenCL failed to create the PerfCounter * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_CONTEXT if the specified context is invalid. * - CL_OUT_OF_RESOURCES if we couldn't create the object * * \return the created perfcounter object */ extern CL_API_ENTRY cl_perfcounter_amd CL_API_CALL clCreatePerfCounterAMD( cl_device_id /* device */, cl_perfcounter_property* /* properties */, cl_int* /* errcode_ret */ ) CL_API_SUFFIX__VERSION_1_0; /*! \brief Destroys a performance counter object. * * \param perf_counter the perfcounter object to release * * \return A non zero value if OpenCL failed to release the PerfCounter * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_OPERATION if we failed to release the object */ extern CL_API_ENTRY cl_int CL_API_CALL clReleasePerfCounterAMD(cl_perfcounter_amd /* perf_counter */ ) CL_API_SUFFIX__VERSION_1_0; /*! \brief Increments the perfcounter object reference count. * * \param perf_counter the perfcounter object to retain * * \return A non zero value if OpenCL failed to retain the PerfCounter * - CL_SUCCESS if the function is executed successfully.
* - CL_INVALID_OPERATION if we failed to release the object */ extern CL_API_ENTRY cl_int CL_API_CALL clRetainPerfCounterAMD(cl_perfcounter_amd /* perf_counter */ ) CL_API_SUFFIX__VERSION_1_0; /*! \brief Enqueues the begin command for the specified counters. * * \param command_queue must be a valid OpenCL command queue. * * \param num_perf_counters the number of perfcounter objects in the array. * * \param perf_counters specifies an array of perfcounter objects. * * \return A non zero value if OpenCL failed to release PerfCounter * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_OPERATION if we failed to enqueue the begin operation */ extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueBeginPerfCounterAMD( cl_command_queue /* command_queue */, cl_uint /* num_perf_counters */, cl_perfcounter_amd* /* perf_counters */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */ ) CL_API_SUFFIX__VERSION_1_0; /*! \brief Enqueues the end command for the specified counters. * * \param command_queue must be a valid OpenCL command queue. * * \param num_perf_counters the number of perfcounter objects in the array. * * \param perf_counters specifies an array of perfcounter objects. * * \param event the event object associated with the end operation. * * \return A non zero value if OpenCL failed to release PerfCounter * - CL_SUCCESS if the function is executed successfully. * - CL_INVALID_OPERATION if we failed to enqueue the end operation */ extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueEndPerfCounterAMD( cl_command_queue /* command_queue */, cl_uint /* num_perf_counters */, cl_perfcounter_amd* /* perf_counters */, cl_uint /* num_events_in_wait_list */, const cl_event* /* event_wait_list */, cl_event* /* event */ ) CL_API_SUFFIX__VERSION_1_0; /*! \brief Retrieves the results from the counter objects. * * \param perf_counter specifies a perfcounter objects for query. * * \param param_name specifies the information to query. * * \param param_value is a pointer to memory where the appropriate result * being queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size is used to specify the size in bytes of memory * pointed to by \a param_value. This size must be >= size of return type. * * \param param_value_size_ret returns the actual size in bytes of data copied * to \a param_value. If \a param_value_size_ret is NULL, it is ignored. * * \param values must be a valid pointer to an array of 64-bit values * and the array size must be equal to num_perf_counters. * * \return * - CL_SUCCESS if the function is executed successfully. * - CL_PROFILING_INFO_NOT_AVAILABLE if event isn't finished. * - CL_INVALID_OPERATION if we failed to get the data */ extern CL_API_ENTRY cl_int CL_API_CALL clGetPerfCounterInfoAMD( cl_perfcounter_amd /* perf_counter */, cl_perfcounter_info /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */ ) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetDeviceClockModeAMD( cl_device_id /* device*/, cl_set_device_clock_mode_input_amd /* Clock_Mode_Input */, cl_set_device_clock_mode_output_amd* /* Clock_Mode_Output */ ) CL_API_SUFFIX__VERSION_1_0; #ifdef __cplusplus } /*extern "C"*/ #endif /*__cplusplus*/ #endif /*__CL_PROFILE_AMD_H*/ clr-rocm-5.7.1/opencl/amdocl/cl_program.cpp000066400000000000000000002414051450307266000206230ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "cl_common.hpp" #include "vdi_common.hpp" #include "platform/context.hpp" #include "platform/program.hpp" #include "platform/kernel.hpp" #include "platform/sampler.hpp" #include "cl_semaphore_amd.h" #include static amd::Program* createProgram(cl_context context, cl_uint num_devices, const cl_device_id* device_list, cl_int* errcode_ret) { // Create the program amd::Program* program = new amd::Program(*as_amd(context)); if (program == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return NULL; } // Add programs for all devices in the context. if (device_list == NULL) { const std::vector& devices = as_amd(context)->devices(); for (const auto& it : devices) { if (program->addDeviceProgram(*it) == CL_OUT_OF_HOST_MEMORY) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; program->release(); return NULL; } } return program; } *not_null(errcode_ret) = CL_SUCCESS; for (cl_uint i = 0; i < num_devices; ++i) { cl_device_id device = device_list[i]; if (!is_valid(device) || !as_amd(context)->containsDevice(as_amd(device))) { *not_null(errcode_ret) = CL_INVALID_DEVICE; program->release(); return NULL; } cl_int status = program->addDeviceProgram(*as_amd(device)); if (status == CL_OUT_OF_HOST_MEMORY) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; program->release(); return NULL; } } return program; } /*! \addtogroup API * @{ * * \addtogroup CL_Programs * * An OpenCL program consists of a set of kernels that are identified as * functions declared with the __kernel qualifier in the program source. * OpenCL programs may also contain auxiliary functions and constant data that * can be used by __kernel functions. The program executable can be generated * online or offline by the OpenCL compiler for the appropriate * target device(s). * * @{ * * \addtogroup CL_CreatingPrograms * @{ */ /*! \brief Create a program object for a context, and loads the source code * specified by the text strings in the strings array into the program object. * * \param context must be a valid OpenCL context. * * \param count is the number of pointers in \a strings * * \param strings is an array of \a count pointers to optionally * null-terminated character strings that make up the source code. * * \param lengths is an array with the number of chars in each string (the * string length). If an element in lengths is zero, its accompanying string * is null-terminated. If lengths is NULL, all strings in the strings argument * are considered null-terminated. 
* * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A valid non-zero program object and errcode_ret is set to * \a CL_SUCCESS if the program object is created successfully. It returns a * NULL value with one of the following error values returned in * \a errcode_ret: * - CL_INVALID_CONTEXT if \a context is not a valid context. * - CL_INVALID_VALUE if \a count is zero or if \a strings or any entry in * \a strings is NULL. * - CL_COMPILER_NOT_AVAILABLE if a compiler is not available. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY_RET(cl_program, clCreateProgramWithSource, (cl_context context, cl_uint count, const char** strings, const size_t* lengths, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; return (cl_program)0; } if (count == 0 || strings == NULL) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_program)0; } std::string sourceCode; for (cl_uint i = 0; i < count; ++i) { if (strings[i] == NULL) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_program)0; } if (lengths && lengths[i] != 0) { sourceCode.append(strings[i], lengths[i]); } else { sourceCode.append(strings[i]); } } if (sourceCode.empty()) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_program)0; } // Create the program amd::Program* program = new amd::Program(*as_amd(context), sourceCode, amd::Program::OpenCL_C); if (program == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_program)0; } // Add programs for all devices in the context. const std::vector& devices = as_amd(context)->devices(); for (const auto& it : devices) { if (program->addDeviceProgram(*it) == CL_OUT_OF_HOST_MEMORY) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; program->release(); return (cl_program)0; } } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(program); } RUNTIME_EXIT /*! \brief Create a program object for a context, and loads the IL into the * program object. * * \param context must be a valid OpenCL context. * * \param string is a pointer to IL. * * \param length is the size in bytes of IL. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A valid non-zero program object and errcode_ret is set to * \a CL_SUCCESS if the program object is created successfully. It returns a * NULL value with one of the following error values returned in * \a errcode_ret: * - CL_INVALID_CONTEXT if \a context is not a valid context. * - CL_INVALID_VALUE if \a il is NULL or \a length is zero. * - CL_INVALID_VALUE if the \a length-byte memory pointed to by \a il does * not contain well-formed intermediate language input appropriate for the * deployment environment in which the OpenCL platform is running. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host. 
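 *
 * Illustrative usage (a minimal sketch; \c spirv_blob and \c spirv_size are
 * assumed to hold a well-formed SPIR-V module loaded by the caller):
 * \code
 * cl_int err;
 * cl_program program = clCreateProgramWithIL(context, spirv_blob,
 *                                            spirv_size, &err);
 * \endcode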
* * \version 1.0r33 */ RUNTIME_ENTRY_RET(cl_program, clCreateProgramWithIL, (cl_context context, const void* il, size_t length, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; return (cl_program)0; } if (length == 0 || il == NULL) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_program)0; } // Create the program amd::Program* program = new amd::Program(*as_amd(context), amd::Program::SPIRV); if (program == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_program)0; } // Add programs for all devices in the context. const std::vector& devices = as_amd(context)->devices(); for (const auto& it : devices) { if (program->addDeviceProgram(*it, il, length) == CL_OUT_OF_HOST_MEMORY) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; program->release(); return (cl_program)0; } } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(program); } RUNTIME_EXIT /*! \brief Create a program object for a context, and loads the binary images * into the program object. * * \param context must be a valid OpenCL context. * * \param device_list is a pointer to a list of devices that are in context. * \a device_list must be a non-NULL value. The binaries are loaded for devices * specified in this list. * * \param num_devices is the number of devices listed in \a device_list. * * \param device_list The devices associated with the program object. The * list of devices specified by \a device_list must be devices associated with * \a context. * * \param lengths is an array of the size in bytes of the program binaries to * be loaded for devices specified by \a device_list. * * \param binaries is an array of pointers to program binaries to be loaded * for devices specified by \a device_list. For each device given by * \a device_list[i], the pointer to the program binary for that device is * given by \a binaries[i] and the length of this corresponding binary is given * by \a lengths[i]. \a lengths[i] cannot be zero and \a binaries[i] cannot be * a NULL pointer. The program binaries specified by binaries contain the bits * that describe the program executable that will be run on the device(s) * associated with context. The program binary can consist of either or both: * - Device-specific executable(s) * - Implementation specific intermediate representation (IR) which will be * converted to the device-specific executable. * * \param binary_status returns whether the program binary for each device * specified in \a device_list was loaded successfully or not. It is an array * of \a num_devices entries and returns CL_SUCCESS in \a binary_status[i] if * binary was successfully loaded for device specified by \a device_list[i]; * otherwise returns CL_INVALID_VALUE if \a lengths[i] is zero or if * \a binaries[i] is a NULL value or CL_INVALID_BINARY in \a binary_status[i] * if program binary is not a valid binary for the specified device. * If \a binary_status is NULL, it is ignored. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A valid non-zero program object and \a errcode_ret is set to * CL_SUCCESS if the program object is created successfully. It returns a NULL * value with one of the following error values returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a context is not a valid context. * - CL_INVALID_VALUE if \a device_list is NULL or \a num_devices is zero. 
* - CL_INVALID_DEVICE if OpenCL devices listed in \a device_list are not in * the list of devices associated with \a context * - CL_INVALID_VALUE if \a lengths or \a binaries are NULL or if any entry * in \a lengths[i] is zero or \a binaries[i] is NULL. * - CL_INVALID_BINARY if an invalid program binary was encountered for any * device. \a binary_status will return specific status for each device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY_RET(cl_program, clCreateProgramWithBinary, (cl_context context, cl_uint num_devices, const cl_device_id* device_list, const size_t* lengths, const unsigned char** binaries, cl_int* binary_status, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; return (cl_program)0; } if (num_devices == 0 || device_list == NULL || binaries == NULL || lengths == NULL) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_program)0; } amd::Program* program = new amd::Program(*as_amd(context)); if (program == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_program)0; } *not_null(errcode_ret) = CL_SUCCESS; for (cl_uint i = 0; i < num_devices; ++i) { cl_device_id device = device_list[i]; if (!is_valid(device) || !as_amd(context)->containsDevice(as_amd(device))) { *not_null(errcode_ret) = CL_INVALID_DEVICE; program->release(); return (cl_program)0; } if (binaries[i] == NULL || lengths[i] == 0) { if (binary_status != NULL) { binary_status[i] = CL_INVALID_VALUE; } *not_null(errcode_ret) = CL_INVALID_VALUE; continue; } cl_int status = program->addDeviceProgram(*as_amd(device), binaries[i], lengths[i]); *not_null(errcode_ret) = status; if (status == CL_OUT_OF_HOST_MEMORY) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; program->release(); return (cl_program)0; } if (binary_status != NULL) { binary_status[i] = status; } } return as_cl(program); } RUNTIME_EXIT RUNTIME_ENTRY_RET(cl_program, clCreateProgramWithAssemblyAMD, (cl_context context, cl_uint count, const char** strings, const size_t* lengths, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; return (cl_program)0; } if (count == 0 || strings == NULL) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_program)0; } std::string assembly; for (cl_uint i = 0; i < count; ++i) { if (strings[i] == NULL) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_program)0; } if (lengths && lengths[i] != 0) { assembly.append(strings[i], lengths[i]); } else { assembly.append(strings[i]); } } if (assembly.empty()) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_program)0; } // Create the program amd::Program* program = new amd::Program(*as_amd(context), assembly, amd::Program::Assembly); if (program == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_program)0; } // Add programs for all devices in the context. const std::vector& devices = as_amd(context)->devices(); for (const auto& it : devices) { if (program->addDeviceProgram(*it) == CL_OUT_OF_HOST_MEMORY) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; program->release(); return (cl_program)0; } } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(program); } RUNTIME_EXIT /*! \brief Increment the program reference count. * * clCreateProgram does an implicit retain. * * \return CL_SUCCESS if the function is executed successfully. It returns * CL_INVALID_PROGRAM if \a program is not a valid program object. 
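 *
 * Illustrative usage (a minimal sketch): an extra retain taken here must be
 * balanced by a matching clReleaseProgram once the reference is no longer
 * needed:
 * \code
 * if (clRetainProgram(program) == CL_SUCCESS) {
 *   // ... share the program with another component ...
 *   clReleaseProgram(program);  // drop the extra reference
 * }
 * \endcode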
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clRetainProgram, (cl_program program)) { if (!is_valid(program)) { return CL_INVALID_PROGRAM; } as_amd(program)->retain(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Decrement the program reference count. * * The program object is deleted after all kernel objects associated with * \a program have been deleted and the program reference count becomes zero. * * \return CL_SUCCESS if the function is executed successfully. It returns * CL_INVALID_PROGRAM if \a program is not a valid program object. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clReleaseProgram, (cl_program program)) { if (!is_valid(program)) { return CL_INVALID_PROGRAM; } as_amd(program)->release(); return CL_SUCCESS; } RUNTIME_EXIT /*! @} * \addtogroup CL_Build * @{ */ /*! \brief Build (compile & link) a program executable from the program source * or binary for all the devices or a specific device(s) in the OpenCL context * associated with program. * * OpenCL allows program executables to be built using the sources or binaries. * * \param program is the program object. * * \param device_list is a pointer to a list of devices associated with * \a program. If \a device_list is a NULL value, the program executable is * built for all devices associated with \a program for which a source or * binary has been loaded. If \a device_list is a non-NULL value, the program * executable is built for devices specified in this list for which a source * or binary has been loaded. * * \param num_devices is the number of devices listed in \a device_list. * * \param options is a pointer to a string that describes the build options to * be used for building the program executable. * * \param pfn_notify is a function pointer to a notification routine. The * notification routine allows an application to register a callback function * which will be called when the program executable has been built * (successfully or unsuccessfully). If \a pfn_notify is not NULL, * clBuildProgram does not need to wait for the build to complete and can * return immediately. If \a pfn_notify is NULL, clBuildProgram does not * return until the build has completed. This callback function may be called * asynchronously by the OpenCL implementation. It is the application's * responsibility to ensure that the callback function is thread-safe. * * \param user_data will be passed as the argument when \a pfn_notify is * called. \a user_data can be NULL. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully * - CL_INVALID_PROGRAM if \a program is not a valid program object * - CL_INVALID_VALUE if \a device_list is NULL and \a num_devices is greater * than zero, or if \a device_list is not NULL and \a num_devices is zero, * - CL_INVALID_DEVICE if OpenCL devices listed in \a device_list are not in * the list of devices associated with \a program * - CL_INVALID_BINARY if \a program is created with clCreateWithProgramBinary * and devices listed in \a device_list do not have a valid program binary * loaded * - CL_INVALID_BUILD_OPTIONS if the build options specified by \a options are * invalid * - CL_INVALID_OPERATION if the build of a program executable for any of the * devices listed in \a device_list by a previous call to clBuildProgram for * \a program has not completed * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. 
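 *
 * Illustrative usage (a minimal sketch; \c device is assumed to be one of
 * the devices associated with \a program):
 * \code
 * cl_int err = clBuildProgram(program, 1, &device, "-cl-std=CL2.0",
 *                             NULL, NULL);  // NULL pfn_notify: call blocks
 * if (err != CL_SUCCESS) {
 *   size_t log_size = 0;
 *   clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
 *                         0, NULL, &log_size);
 *   std::vector<char> log(log_size);
 *   clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
 *                         log_size, log.data(), NULL);
 * }
 * \endcode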
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clBuildProgram, (cl_program program, cl_uint num_devices, const cl_device_id* device_list, const char* options, void(CL_CALLBACK* pfn_notify)(cl_program program, void* user_data), void* user_data)) { if (!is_valid(program)) { return CL_INVALID_PROGRAM; } if ((num_devices > 0 && device_list == NULL) || (num_devices == 0 && device_list != NULL)) { return CL_INVALID_VALUE; } if (pfn_notify == nullptr && user_data != nullptr) { return CL_INVALID_VALUE; } amd::Program* amdProgram = as_amd(program); if (device_list == NULL) { // build for all devices in the context. return amdProgram->build(amdProgram->context().devices(), options, pfn_notify, user_data); } std::vector<amd::Device*> devices(num_devices); for (cl_uint i = 0; i < num_devices; ++i) { amd::Device* device = as_amd(device_list[i]); if (!amdProgram->context().containsDevice(device)) { return CL_INVALID_DEVICE; } devices[i] = device; } return amdProgram->build(devices, options, pfn_notify, user_data); } RUNTIME_EXIT /*! \brief compiles a program's source for all the devices or a specific * device(s) in the OpenCL context associated with program. The pre-processor * runs before the program sources are compiled. * The compiled binary is built for all devices associated with program or * the list of devices specified. The compiled binary can be queried using * \a clGetProgramInfo(program, CL_PROGRAM_BINARIES, ...) and can be specified * to \a clCreateProgramWithBinary to create a new program object. * * \param program is the program object that is the compilation target. * * \param device_list is a pointer to a list of devices associated with program. * If device_list is a NULL value, the compile is performed for all devices * associated with program. If device_list is a non-NULL value, the compile is * performed for devices specified in this list. * * \param num_devices is the number of devices listed in \a device_list. * * \param options is a pointer to a null-terminated string of characters that * describes the compilation options to be used for building the program * executable. The list of supported options is as described in section 5.6.4. * * \param num_input_headers specifies the number of programs that describe * headers in the array referenced by input_headers. * * \param input_headers is an array of program embedded headers created with * \a clCreateProgramWithSource. * * \param header_include_names is an array that has a one to one correspondence * with input_headers. * Each entry in \a header_include_names specifies the include name used by * source in program that comes from an embedded header. The corresponding entry * in input_headers identifies the program object which contains the header * source to be used. The embedded headers are first searched before the headers * in the list of directories specified by the -I compile option (as described in * section 5.6.4.1). If multiple entries in header_include_names refer to the same * header name, the first one encountered will be used. * * \param pfn_notify is a function pointer to a notification routine. The * notification routine is a callback function that an application can register * and which will be called when the program executable has been built * (successfully or unsuccessfully). If pfn_notify is not NULL, * \a clCompileProgram does not need to wait for the compiler to complete and can * return immediately. If \a pfn_notify is NULL, \a clCompileProgram does not * return until the compiler has completed.
This callback function may be called * asynchronously by the OpenCL implementation. It is the application's * responsibility to ensure that the callback function is thread-safe. * * \param user_data will be passed as an argument when pfn_notify is called. * \a user_data can be NULL. * * \return CL_SUCCESS if the function is executed successfully. Otherwise, it * returns one of the following errors: * - CL_INVALID_PROGRAM if program is not a valid program object. * - CL_INVALID_VALUE if device_list is NULL and num_devices is greater than * zero, or if \a device_list is not NULL and \a num_devices is zero. * - CL_INVALID_VALUE if num_input_headers is zero and \a header_include_names * or input_headers are not NULL or if num_input_headers is not zero and * \a header_include_names or input_headers are NULL. * - CL_INVALID_VALUE if \a pfn_notify is NULL but \a user_data is not NULL. * - CL_INVALID_DEVICE if OpenCL devices listed in device_list are not in the * list of devices associated with program. * - CL_INVALID_COMPILER_OPTIONS if the compiler options specified by options * are invalid. * - CL_INVALID_OPERATION if the compilation or build of a program executable * for any of the devices listed in device_list by a previous call to * \a clCompileProgram or \a clBuildProgram for program has not completed. * - CL_COMPILER_NOT_AVAILABLE if a compiler is not available, i.e. * CL_DEVICE_COMPILER_AVAILABLE specified in table 4.3 is set to CL_FALSE. * - CL_COMPILE_PROGRAM_FAILURE if there is a failure to compile the program * source. This error will be returned if clCompileProgram does not return * until the compile has completed. * - CL_INVALID_OPERATION if there are kernel objects attached to program. * - CL_INVALID_OPERATION if program has no source i.e. it has not been created * with \a clCreateProgramWithSource. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the OpenCL implementation on the host. * * \version 1.2r07 */ RUNTIME_ENTRY(cl_int, clCompileProgram, (cl_program program, cl_uint num_devices, const cl_device_id* device_list, const char* options, cl_uint num_input_headers, const cl_program* input_headers, const char** header_include_names, void(CL_CALLBACK* pfn_notify)(cl_program program, void* user_data), void* user_data)) { if (!is_valid(program)) { return CL_INVALID_PROGRAM; } if ((num_devices > 0 && device_list == NULL) || (num_devices == 0 && device_list != NULL)) { return CL_INVALID_VALUE; } if ((num_input_headers > 0 && (input_headers == NULL || header_include_names == NULL)) || (num_input_headers == 0 && (input_headers != NULL || header_include_names != NULL))) { return CL_INVALID_VALUE; } if (pfn_notify == NULL && user_data != NULL) { return CL_INVALID_VALUE; } amd::Program* amdProgram = as_amd(program); if (amdProgram->referenceCount() > 1) { return CL_INVALID_OPERATION; } std::vector<const amd::Program*> headerPrograms(num_input_headers); for (cl_uint i = 0; i < num_input_headers; ++i) { if (!is_valid(input_headers[i])) { return CL_INVALID_OPERATION; } const amd::Program* headerProgram = as_amd(input_headers[i]); headerPrograms[i] = headerProgram; } if (device_list == NULL) { // compile for all devices in the context.
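    // With no explicit device list the embedded headers and compile options
    // are applied to every device attached to the program's context; when
    // pfn_notify is NULL the call below does not return until the compile
    // has completed (see the doc comment above).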
return amdProgram->compile(amdProgram->context().devices(), num_input_headers, headerPrograms, header_include_names, options, pfn_notify, user_data); } std::vector devices(num_devices); for (cl_uint i = 0; i < num_devices; ++i) { amd::Device* device = as_amd(device_list[i]); if (!amdProgram->context().containsDevice(device)) { return CL_INVALID_DEVICE; } devices[i] = device; } return amdProgram->compile(devices, num_input_headers, headerPrograms, header_include_names, options, pfn_notify, user_data); } RUNTIME_EXIT /*! \brief links a set of compiled program objects and libraries for all * the devices or a specific device(s) in the OpenCL context and creates * an executable. clLinkProgram creates a new program object which contains * this executable. The executable binary can be queried using * \a clGetProgramInfo(program, CL_PROGRAM_BINARIES, ...) and can be specified * to \a clCreateProgramWithBinary to create a new program object. * The devices associated with the returned program object will be the list * of devices specified by device_list or if device_list is NULL it will be * the list of devices associated with context. * * \param context must be a valid OpenCL context. * * \param device_list is a pointer to a list of devices that are in context. * If device_list is a NULL value, the link is performed for all devices * associated with context for which a compiled object is available. * If device_list is a non-NULL value, the compile is performed for devices * specified in this list for which a source has been loaded. * * \param num_devices is the number of devices listed in device_list. * * \param options is a pointer to a null-terminated string of characters * that describes the link options to be used for building the program * executable. The list of supported options is as described in section 5.6.5. * * \param num_input_programs specifies the number of programs in array * referenced by input_programs. * * \param input_programs is an array of program objects that are compiled * binaries or libraries that are to be linked to create the program executable. * For each device in device_list or if device_list is NULL the list of devices * associated with context, the following cases occur: * All programs specified by input_programs contain a compiled binary or * library for the device. In this case, a link is performed to generate * a program executable for this device. None of the programs contain * a compiled binary or library for that device. In this case, no link is * performed and there will be no program executable generated for this device. * All other cases will return a CL_INVALID_OPERATION error. * * \param pfn_notify is a function pointer to a notification routine. * The notification routine is a callback function that an application can * register and which will be called when the program executable has been built * (successfully or unsuccessfully). If \a pfn_notify is not NULL, * \a clLinkProgram does not need to wait for the linker to complete and can * return immediately. Once the linker has completed, the \a pfn_notify * callback function is called with a valid program object (if the link was * successful) or NULL (if the link encountered a failure). This callback * function may be called asynchronously by the OpenCL implementation. It is * the application's responsibility to ensure that the callback function is * thread-safe. If \a pfn_notify is NULL, \a clLinkProgram does not return * until the linker has completed. 
clLinkProgram returns a valid non-zero * program object (if the link was successful) or NULL (if the link * encountered a failure). * * \a user_data will be passed as an argument when \a pfn_notify is called. * user_data can be NULL. * * \return a valid non-zero program object and errcode_ret is set to CL_SUCCESS * if the link was successful in generating a program executable for at least * one device and the program object was created successfully. If \a pfn_notify * is not NULL, \a clLinkProgram returns a NULL program object and * \a errcode_ret is set to CL_SUCCESS if the function was executed * successfully. Otherwise, it returns one of the following errors: * - CL_INVALID_CONTEXT if context is not a valid context. * - CL_INVALID_VALUE if device_list is NULL and num_devices is greater than * zero, or if \a device_list is not NULL and \a num_devices is zero. * - CL_INVALID_VALUE if \a num_input_programs is zero and \a input_programs * is NULL or if \a num_input_programs is zero and \a input_programs is not * NULL or if \a num_input_programs is not zero and \a input_programs is NULL. * - CL_INVALID_PROGRAM if programs specified in \a input_programs are not * valid program objects. * - CL_INVALID_VALUE if \a pfn_notify is NULL but \a user_data is not NULL. * - CL_INVALID_DEVICE if OpenCL devices listed in \a device_list are not in * the list of devices associated with context * - CL_INVALID_LINKER_OPTIONS if the linker options specified by options are * invalid. * - CL_INVALID_OPERATION if the compilation or build of a program executable * for any of the devices listed in \a device_list by a previous call to * clCompileProgram or clBuildProgram for program has not completed. * - CL_INVALID_OPERATION if the rules for devices containing compiled binaries * or libraries as described in \a input_programs argument above are * not followed. * - CL_LINKER_NOT_AVAILABLE if a linker is not available i.e. * - CL_DEVICE_LINKER_AVAILABLE specified in table 4.3 is set to CL_FALSE. * - CL_LINK_PROGRAM_FAILURE if there is a failure to link the compiled * binaries and/or libraries. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the OpenCL implementation on the host. 
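 *
 * Illustrative usage (a minimal sketch; \c compiled is assumed to be a
 * program object already compiled with clCompileProgram):
 * \code
 * cl_int err;
 * cl_program exe = clLinkProgram(context, 0, NULL,  // all devices in context
 *                                NULL,              // default link options
 *                                1, &compiled,      // one compiled input
 *                                NULL, NULL,        // block until linked
 *                                &err);
 * \endcode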
* * \version 1.2r07 */ RUNTIME_ENTRY_RET(cl_program, clLinkProgram, (cl_context context, cl_uint num_devices, const cl_device_id* device_list, const char* options, cl_uint num_input_programs, const cl_program* input_programs, void(CL_CALLBACK* pfn_notify)(cl_program program, void* user_data), void* user_data, cl_int* errcode_ret)) { if (!is_valid(context)) { *not_null(errcode_ret) = CL_INVALID_CONTEXT; return (cl_program)0; } if ((num_devices > 0 && device_list == NULL) || (num_devices == 0 && device_list != NULL)) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_program)0; } if (num_input_programs == 0 || input_programs == NULL) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_program)0; } if (pfn_notify == NULL && user_data != NULL) { *not_null(errcode_ret) = CL_INVALID_VALUE; return (cl_program)0; } std::vector<amd::Program*> inputPrograms(num_input_programs); for (cl_uint i = 0; i < num_input_programs; ++i) { if (!is_valid(input_programs[i])) { *not_null(errcode_ret) = CL_INVALID_PROGRAM; return (cl_program)0; } amd::Program* inputProgram = as_amd(input_programs[i]); inputPrograms[i] = inputProgram; } amd::Program* program = createProgram(context, num_devices, device_list, errcode_ret); if (program == NULL) return (cl_program)0; *not_null(errcode_ret) = CL_SUCCESS; cl_int status; if (device_list == NULL) { // link for all devices in the context. status = program->link(as_amd(context)->devices(), num_input_programs, inputPrograms, options, pfn_notify, user_data); } else { std::vector<amd::Device*> devices(num_devices); for (cl_uint i = 0; i < num_devices; ++i) { amd::Device* device = as_amd(device_list[i]); if (!as_amd(context)->containsDevice(device)) { program->release(); *not_null(errcode_ret) = CL_INVALID_DEVICE; return (cl_program)0; } devices[i] = device; } status = program->link(devices, num_input_programs, inputPrograms, options, pfn_notify, user_data); } *not_null(errcode_ret) = status; if (status == CL_SUCCESS) { return as_cl(program); } program->release(); return (cl_program)0; } RUNTIME_EXIT /*! \brief creates a program object for a context, and loads the information * related to the built-in kernels into a program object. * * \param context must be a valid OpenCL context. * * \param num_devices is the number of devices listed in device_list. * * \param device_list is a pointer to a list of devices that are in context. * \a device_list must be a non-NULL value. The built-in kernels are loaded * for devices specified in this list. The devices associated with the * program object will be the list of devices specified by \a device_list. * The list of devices specified by \a device_list must be devices associated * with context. * * \param kernel_names is a semi-colon separated list of built-in kernel names. * * \return a valid non-zero program object and \a errcode_ret is set to * CL_SUCCESS if the program object is created successfully. Otherwise, it * returns a NULL value with one of the following error values returned * in errcode_ret: * - CL_INVALID_CONTEXT if context is not a valid context. * - CL_INVALID_VALUE if device_list is NULL or num_devices is zero. * - CL_INVALID_VALUE if kernel_names is NULL or kernel_names contains a kernel * name that is not supported by any of the devices in \a device_list. * - CL_INVALID_DEVICE if devices listed in device_list are not in the list * of devices associated with context. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 *   by the OpenCL implementation on the host.
 *
 * \version 1.2r07
 */
RUNTIME_ENTRY_RET(cl_program, clCreateProgramWithBuiltInKernels,
                  (cl_context context, cl_uint num_devices, const cl_device_id* device_list,
                   const char* kernel_names, cl_int* errcode_ret)) {
  //!@todo Add implementation
  amd::Program* program = NULL;
  Unimplemented();
  return as_cl(program);
}
RUNTIME_EXIT

/*! @}
 * \addtogroup CL_Unloading
 * @{
 */

/*! \brief Allows the implementation to release the resources allocated by
 * the OpenCL compiler for platform. This is a hint from the application
 * and does not guarantee that the compiler will not be used in the future
 * or that the compiler will actually be unloaded by the implementation.
 * Calls to \a clBuildProgram, \a clCompileProgram or \a clLinkProgram after
 * \a clUnloadPlatformCompiler will reload the compiler,
 * if necessary, to build the appropriate program executable.
 *
 * \return CL_SUCCESS if the function is executed successfully.
 * Otherwise, it returns one of the following errors:
 * - CL_INVALID_PLATFORM if platform is not a valid platform.
 *
 * \version 1.2r07
 */
RUNTIME_ENTRY(cl_int, clUnloadPlatformCompiler, (cl_platform_id platform)) {
  if (platform != NULL && platform != AMD_PLATFORM) {
    return CL_INVALID_PLATFORM;
  }
  //! @todo: Implement Compiler::unload()
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief Allows the runtime to release the resources allocated by the OpenCL
 * compiler.
 *
 * This is a hint from the application and does not guarantee that the compiler
 * will not be used in the future or that the compiler will actually be
 * unloaded by the implementation.
 *
 * Calls to clBuildProgram after clUnloadCompiler may reload the compiler,
 * if necessary, to build the appropriate program executable.
 *
 * \return This call currently always returns CL_SUCCESS
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY(cl_int, clUnloadCompiler, (void)) {
  //! @todo: Implement Compiler::unload()
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! @}
 * \addtogroup CL_ProgramQueries
 * @{
 */

/*! \brief Return information about the program object.
 *
 * \param program specifies the program object being queried.
 *
 * \param param_name specifies the information to query.
 *
 * \param param_value is a pointer to memory where the appropriate result
 * being queried is returned. If \a param_value is NULL, it is ignored.
 *
 * \param param_value_size is used to specify the size in bytes of memory
 * pointed to by \a param_value. This size must be >= size of return type.
 *
 * \param param_value_size_ret returns the actual size in bytes of data copied
 * to \a param_value. If \a param_value_size_ret is NULL, it is ignored.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function is executed successfully
 * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes
 *   specified by \a param_value_size is < size of return type and
 *   \a param_value is not NULL
 * - CL_INVALID_PROGRAM_EXECUTABLE if param_name is
 *   CL_PROGRAM_NUM_KERNELS or CL_PROGRAM_KERNEL_NAMES and a successful
 *   program executable has not been built for at least one device in the list
 *   of devices associated with program.
 * - CL_INVALID_PROGRAM if \a program is not a valid program object
 *
 * \version 1.2r07
 */
RUNTIME_ENTRY(cl_int, clGetProgramInfo,
              (cl_program program, cl_program_info param_name, size_t param_value_size,
               void* param_value, size_t* param_value_size_ret)) {
  if (!is_valid(program)) {
    return CL_INVALID_PROGRAM;
  }

  switch (param_name) {
    case CL_PROGRAM_REFERENCE_COUNT: {
      cl_uint count = as_amd(program)->referenceCount();
      return amd::clGetInfo(count, param_value_size, param_value, param_value_size_ret);
    }
    case CL_PROGRAM_CONTEXT: {
      cl_context context = const_cast<cl_context>(as_cl(&as_amd(program)->context()));
      return amd::clGetInfo(context, param_value_size, param_value, param_value_size_ret);
    }
    case CL_PROGRAM_NUM_DEVICES: {
      cl_uint numDevices = (cl_uint)as_amd(program)->deviceList().size();
      return amd::clGetInfo(numDevices, param_value_size, param_value, param_value_size_ret);
    }
    case CL_PROGRAM_DEVICES: {
      const amd::Program::devicelist_t& devices = as_amd(program)->deviceList();
      const size_t numDevices = devices.size();
      const size_t valueSize = numDevices * sizeof(cl_device_id);
      if (param_value != NULL && param_value_size < valueSize) {
        return CL_INVALID_VALUE;
      }
      *not_null(param_value_size_ret) = valueSize;
      if (param_value != NULL) {
        cl_device_id* device_list = (cl_device_id*)param_value;
        for (const auto& it : devices) {
          *device_list++ = const_cast<cl_device_id>(as_cl(it));
        }
        if (param_value_size > valueSize) {
          ::memset(static_cast<address>(param_value) + valueSize, '\0',
                   param_value_size - valueSize);
        }
      }
      return CL_SUCCESS;
    }
    case CL_PROGRAM_SOURCE: {
      const char* source = as_amd(program)->sourceCode().c_str();
      return amd::clGetInfo(source, param_value_size, param_value, param_value_size_ret);
    }
    case CL_PROGRAM_BINARY_SIZES: {
      amd::Program* amdProgram = as_amd(program);
      const amd::Program::devicelist_t& devices = amdProgram->deviceList();
      const size_t numBinaries = devices.size();
      const size_t valueSize = numBinaries * sizeof(size_t);
      if (param_value != NULL && param_value_size < valueSize) {
        return CL_INVALID_VALUE;
      }
      *not_null(param_value_size_ret) = valueSize;
      if (param_value != NULL) {
        size_t* binary_sizes = (size_t*)param_value;
        for (const auto& it : devices) {
          *binary_sizes++ = amdProgram->getDeviceProgram(*it)->binary().second;
        }
        if (param_value_size > valueSize) {
          ::memset(static_cast<address>(param_value) + valueSize, '\0',
                   param_value_size - valueSize);
        }
      }
      return CL_SUCCESS;
    }
    case CL_PROGRAM_BINARIES: {
      amd::Program* amdProgram = as_amd(program);
      const amd::Program::devicelist_t& devices = amdProgram->deviceList();
      const size_t numBinaries = devices.size();
      const size_t valueSize = numBinaries * sizeof(char*);
      if (param_value != NULL && param_value_size < valueSize) {
        return CL_INVALID_VALUE;
      }
      *not_null(param_value_size_ret) = valueSize;
      if (param_value != NULL) {
        char** binaries = (char**)param_value;
        for (const auto& it : devices) {
          const device::Program::binary_t& binary = amdProgram->getDeviceProgram(*it)->binary();
          // If an entry value in the array is NULL,
          // then the runtime should skip copying the program binary.
          if (*binaries != NULL) {
            ::memcpy(*binaries, binary.first, binary.second);
          }
          binaries++;
        }
        if (param_value_size > valueSize) {
          ::memset(static_cast<address>(param_value) + valueSize, '\0',
                   param_value_size - valueSize);
        }
      }
      return CL_SUCCESS;
    }
    case CL_PROGRAM_NUM_KERNELS: {
      if (as_amd(program)->symbolsPtr() == NULL) {
        return CL_INVALID_PROGRAM_EXECUTABLE;
      }
      size_t numKernels = as_amd(program)->symbols().size();
      return amd::clGetInfo(numKernels, param_value_size, param_value, param_value_size_ret);
    }
    case CL_PROGRAM_KERNEL_NAMES: {
      const char* kernelNames = as_amd(program)->kernelNames().c_str();
      return amd::clGetInfo(kernelNames, param_value_size, param_value, param_value_size_ret);
    }
    default:
      break;
  }
  return CL_INVALID_VALUE;
}
RUNTIME_EXIT

/*! \brief Return build information for each device in the program object.
 *
 * \param program specifies the program object being queried.
 *
 * \param device specifies the device for which build information is being
 * queried. device must be a valid device associated with \a program.
 *
 * \param param_name specifies the information to query.
 *
 * \param param_value is a pointer to memory where the appropriate result being
 * queried is returned. If \a param_value is NULL, it is ignored.
 *
 * \param param_value_size is used to specify the size in bytes of memory
 * pointed to by \a param_value. This size must be >= size of return type
 *
 * \param param_value_size_ret returns the actual size in bytes of data copied
 * to \a param_value. If \a param_value_size_ret is NULL, it is ignored.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function is executed successfully
 * - CL_INVALID_DEVICE if \a device is not in the list of devices associated
 *   with \a program
 * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes
 *   specified by \a param_value_size is < size of return type and
 *   \a param_value is not NULL
 * - CL_INVALID_PROGRAM if \a program is not a valid program object
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY(cl_int, clGetProgramBuildInfo,
              (cl_program program, cl_device_id device, cl_program_build_info param_name,
               size_t param_value_size, void* param_value, size_t* param_value_size_ret)) {
  if (!is_valid(program)) {
    return CL_INVALID_PROGRAM;
  }
  if (!is_valid(device)) {
    return CL_INVALID_DEVICE;
  }
  const device::Program* devProgram = as_amd(program)->getDeviceProgram(*as_amd(device));
  if (devProgram == NULL) {
    return CL_INVALID_DEVICE;
  }

  switch (param_name) {
    case CL_PROGRAM_BUILD_STATUS: {
      cl_build_status status = devProgram->buildStatus();
      return amd::clGetInfo(status, param_value_size, param_value, param_value_size_ret);
    }
    case CL_PROGRAM_BUILD_OPTIONS: {
      const std::string optionsStr = devProgram->lastBuildOptionsArg();
      const char* options = optionsStr.c_str();
      return amd::clGetInfo(options, param_value_size, param_value, param_value_size_ret);
    }
    case CL_PROGRAM_BUILD_LOG: {
      const std::string logstr = as_amd(program)->programLog() + devProgram->buildLog().c_str();
      const char* log = logstr.c_str();
      return amd::clGetInfo(log, param_value_size, param_value, param_value_size_ret);
    }
    case CL_PROGRAM_BINARY_TYPE: {
      const device::Program::type_t devProgramType = devProgram->type();
      cl_uint type;
      switch (devProgramType) {
        case device::Program::TYPE_NONE: {
          type = CL_PROGRAM_BINARY_TYPE_NONE;
          break;
        }
        case device::Program::TYPE_COMPILED: {
          type = CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT;
          break;
        }
        case device::Program::TYPE_LIBRARY: {
          type = CL_PROGRAM_BINARY_TYPE_LIBRARY;
          break;
        }
        case device::Program::TYPE_EXECUTABLE: {
          type = CL_PROGRAM_BINARY_TYPE_EXECUTABLE;
          break;
        }
        case device::Program::TYPE_INTERMEDIATE: {
          type = CL_PROGRAM_BINARY_TYPE_INTERMEDIATE;
          break;
        }
        default:
          return CL_INVALID_VALUE;
      }
      return amd::clGetInfo(type, param_value_size, param_value, param_value_size_ret);
    }
    case CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE: {
      size_t size = devProgram->globalVariableTotalSize();
      return amd::clGetInfo(size, param_value_size, param_value, param_value_size_ret);
    }
    default:
      break;
  }
  return CL_INVALID_VALUE;
}
RUNTIME_EXIT

/*! \brief Sets the value of a SPIR-V specialization constant.
 *
 * \param program must be a valid OpenCL program created from a SPIR-V module.
 *
 * \param spec_id identifies the SPIR-V specialization constant whose value will be set.
 *
 * \param spec_size specifies the size in bytes of the data pointed to by spec_value. This should
 * be 1 for boolean constants. For all other constant types this should match the size of the
 * specialization constant in the SPIR-V module.
 *
 * \param spec_value is a pointer to the memory location that contains the value of the
 * specialization constant. The data pointed to by \a spec_value are copied and can be safely
 * reused by the application after \a clSetProgramSpecializationConstant returns. This
 * specialization value will be used by subsequent calls to \a clBuildProgram until another call to
 * \a clSetProgramSpecializationConstant changes it. If a specialization constant is a boolean
 * constant, \a spec_value should be a pointer to a cl_uchar value. A value of zero will set the
 * specialization constant to false; any other value will set it to true.
 *
 * Calling this function multiple times for the same specialization constant shall cause the last
 * provided value to override any previously specified value. The values are used by a subsequent
 * \a clBuildProgram call for the program.
 *
 * An application is not required to provide values for every specialization constant contained in
 * the SPIR-V module. SPIR-V provides default values for all specialization constants.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function is executed successfully.
 * - CL_INVALID_PROGRAM if program is not a valid program object created from a SPIR-V module.
 * - CL_INVALID_SPEC_ID if spec_id is not a valid specialization constant ID
 * - CL_INVALID_VALUE if spec_size does not match the size of the specialization constant in the
 *   SPIR-V module, or if spec_value is NULL.
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 *   implementation on the device.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL
 *   implementation on the host.
 *
 * \version 2.2-3
 */
RUNTIME_ENTRY(cl_int, clSetProgramSpecializationConstant,
              (cl_program program, cl_uint spec_id, size_t spec_size, const void* spec_value)) {
  if (!is_valid(program)) {
    return CL_INVALID_PROGRAM;
  }
  return CL_INVALID_VALUE;
}
RUNTIME_EXIT

/*! \brief Registers a user callback function with a program object. Each call to
 * \a clSetProgramReleaseCallback registers the specified user callback function on a callback stack
 * associated with program. The registered user callback functions are called in the reverse order
 * in which they were registered. The user callback functions are called after destructors (if any)
 * for program scope global variables (if any) are called and before the program is released.
 * This provides a mechanism for the application (and libraries) to be notified when destructors
 * are complete.
 *
 * \param program is a valid program object
 *
 * \param pfn_notify is the callback function that can be registered by the application. This
 * callback function may be called asynchronously by the OpenCL implementation. It is the
 * application's responsibility to ensure that the callback function is thread safe. The parameters
 * to this callback function are:
 * - \a prog is the program object whose destructors are being called. When the user callback is
 *   called by the implementation, this program object is no longer valid. \a prog is only provided
 *   for reference purposes.
 * - \a user_data is a pointer to user supplied data. \a user_data will be passed as the
 *   \a user_data argument when pfn_notify is called. \a user_data can be NULL.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function is executed successfully.
 * - CL_INVALID_PROGRAM if program is not a valid program object.
 * - CL_INVALID_VALUE if pfn_notify is NULL.
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 *   implementation on the device.
 *
 * \version 2.2-3
 */
RUNTIME_ENTRY(cl_int, clSetProgramReleaseCallback,
              (cl_program program,
               void(CL_CALLBACK* pfn_notify)(cl_program program, void* user_data),
               void* user_data)) {
  if (!is_valid(program)) {
    return CL_INVALID_PROGRAM;
  }
  return CL_INVALID_VALUE;
}
RUNTIME_EXIT

/*! @}
 * @}
 *
 * \addtogroup CL_Kernels
 *
 * A kernel is a function declared in a program. A kernel is identified by the
 * __kernel qualifier applied to any function in a program. A kernel object
 * encapsulates the specific __kernel function declared in a program and
 * the argument values to be used when executing this __kernel function.
 *
 * @{
 *
 * \addtogroup CL_CreateKernel
 * @{
 */

/*! \brief Create a kernel object.
 *
 * \param program is a program object with a successfully built executable.
 *
 * \param kernel_name is a function name in the program declared with the
 * __kernel qualifier.
 *
 * \param errcode_ret will return an appropriate error code. If \a errcode_ret
 * is NULL, no error code is returned.
 *
 * \return A valid non-zero kernel object and \a errcode_ret is set to
 * CL_SUCCESS if the kernel object is created successfully. It returns a NULL
 * value with one of the following error values returned in \a errcode_ret:
 * - CL_INVALID_PROGRAM if \a program is not a valid program object
 * - CL_INVALID_PROGRAM_EXECUTABLE if there is no successfully built executable
 *   for \a program.
 * - CL_INVALID_KERNEL_NAME if \a kernel_name is not found in \a program.
 * - CL_INVALID_KERNEL_DEFINITION if the function definition for the __kernel
 *   function given by \a kernel_name, such as the number of arguments or the
 *   argument types, is not the same for all devices for which the program
 *   executable has been built.
 * - CL_INVALID_VALUE if \a kernel_name is NULL.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 *   by the runtime.
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY_RET(cl_kernel, clCreateKernel,
                  (cl_program program, const char* kernel_name, cl_int* errcode_ret)) {
  if (!is_valid(program)) {
    *not_null(errcode_ret) = CL_INVALID_PROGRAM;
    return (cl_kernel)0;
  }
  if (kernel_name == NULL) {
    *not_null(errcode_ret) = CL_INVALID_VALUE;
    return (cl_kernel)0;
  }
  /* FIXME_lmoriche, FIXME_spec: What are we supposed to do here?
* if (!as_amd(program)->containsOneSuccesfullyBuiltProgram()) * { * *NotNull(errcode) = CL_INVALID_PROGRAM_EXECUTABLE; * return (cl_kernel) 0; * } */ amd::Program* amd_program = as_amd(program); if (!amd_program->load()) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_kernel)0; } const amd::Symbol* symbol = amd_program->findSymbol(kernel_name); if (symbol == NULL) { *not_null(errcode_ret) = CL_INVALID_KERNEL_NAME; return (cl_kernel)0; } amd::Kernel* kernel = new amd::Kernel(*amd_program, *symbol, kernel_name); if (kernel == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_kernel)0; } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(kernel); } RUNTIME_EXIT /*! \brief Create kernel objects for all kernel functions in program. * * Kernel objects may not be created for any __kernel functions in program * that do not have the same function definition across all devices for which * a program executable has been successfully built. * * \param program is a program object with a successfully built executable. * * \param num_kernels is the size of memory pointed to by \a kernels specified * as the number of cl_kernel entries. * * \param kernels is the buffer where the kernel objects for kernels in * \a program will be returned. If \a kernels is NULL, it is ignored. * If \a kernels is not NULL, \a num_kernels must be greater than or equal * to the number of kernels in program. * * \param num_kernels_ret is the number of kernels in program. If * \a num_kernels_ret is NULL, it is ignored. * * \return One of the following values: * - CL_SUCCESS if the kernel objects were successfully allocated * - CL_INVALID_PROGRAM if \a program is not a valid program object * - CL_INVALID_PROGRAM_EXECUTABLE if there is no successfully built executable * for any device in \a program * - CL_INVALID_VALUE if \a kernels is not NULL and \a num_kernels is less * than the number of kernels in program * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * Kernel objects can only be created once you have a program object with a * valid program source or binary loaded into the program object and the * program executable has been successfully built for one or more devices * associated with \a program. No changes to the program executable are * allowed while there are kernel objects associated with a program object. * This means that calls to clBuildProgram return CL_INVALID_OPERATION if there * are kernel objects attached to a program object. The OpenCL context * associated with program will be the context associated with kernel. * Devices associated with a program object for which a valid program * executable has been built can be used to execute kernels declared in the * program object. 
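 *
 * A minimal sketch of the usual two-call pattern for this entry point
 * (variable names hypothetical, error checking omitted):
 * \code
 * cl_uint numKernels;
 * clCreateKernelsInProgram(program, 0, NULL, &numKernels);
 * std::vector<cl_kernel> kernels(numKernels);
 * clCreateKernelsInProgram(program, numKernels, kernels.data(), NULL);
 * \endcode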
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clCreateKernelsInProgram, (cl_program program, cl_uint num_kernels, cl_kernel* kernels, cl_uint* num_kernels_ret)) { if (!is_valid(program)) { return CL_INVALID_PROGRAM; } amd::Program* amd_program = as_amd(program); if (!amd_program->load()) { return CL_OUT_OF_HOST_MEMORY; } cl_uint numKernels = (cl_uint)amd_program->symbols().size(); if (kernels != NULL && num_kernels < numKernels) { return CL_INVALID_VALUE; } *not_null(num_kernels_ret) = numKernels; if (kernels == NULL) { return CL_SUCCESS; } const amd::Program::symbols_t& symbols = amd_program->symbols(); cl_kernel* result = kernels; for (const auto& it : symbols) { amd::Kernel* kernel = new amd::Kernel(*amd_program, it.second, it.first); if (kernel == NULL) { while (--result >= kernels) { as_amd(*result)->release(); } return CL_OUT_OF_HOST_MEMORY; } *result++ = as_cl(kernel); } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Increment the kernel reference count. * * \return CL_SUCCESS if the function is executed successfully. It returns * CL_INVALID_KERNEL if \a kernel is not a valid kernel object. * * clCreateKernel or clCreateKernelsInProgram do an implicit retain. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clRetainKernel, (cl_kernel kernel)) { if (!is_valid(kernel)) { return CL_INVALID_KERNEL; } as_amd(kernel)->retain(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Decrement the kernel reference count. * * \return CL_SUCCESS if the function is executed successfully. It returns * CL_INVALID_KERNEL if \a kernel is not a valid kernel object. * * The kernel object is deleted once the number of instances that are retained * to \a kernel become zero and after all queued execution instances of * \a kernel have finished. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clReleaseKernel, (cl_kernel kernel)) { if (!is_valid(kernel)) { return CL_INVALID_KERNEL; } as_amd(kernel)->release(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Makes a shallow copy of the kernel object, its arguments and any * information passed to the kernel object using \a clSetKernelExecInfo. If * the kernel object was ready to be enqueued before copying it, the clone of * the kernel object is ready to enqueue. * * \param source_kernel is a valid cl_kernel object that will be copied. * source_kernel will not be modified in any way by this function. * * \param errcode_ret will be assigned an appropriate error code. If * errcode_ret is NULL, no error code is returned. * * \return a valid non-zero kernel object and errcode_ret is set to * CL_SUCCESS if the kernel is successfully copied. Otherwise it returns a * NULL value with one of the following error values returned in errcode_ret: * - CL_INVALID_KERNEL if kernel is not a valid kernel object. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required * by the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources * required by the OpenCL implementation on the host. * * \version 2.1r01 */ RUNTIME_ENTRY_RET(cl_kernel, clCloneKernel, (cl_kernel source_kernel, cl_int* errcode_ret)) { if (!is_valid(source_kernel)) { *not_null(errcode_ret) = CL_INVALID_KERNEL; return (cl_kernel)0; } amd::Kernel* kernel = new amd::Kernel(*as_amd(source_kernel)); if (kernel == NULL) { *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY; return (cl_kernel)0; } *not_null(errcode_ret) = CL_SUCCESS; return as_cl(kernel); } RUNTIME_EXIT /*! @} * \addtogroup CL_SettingArgs * @{ */ /*! 
\brief Set the argument value for a specific argument of a kernel. * * \param kernel is a valid kernel object. * * \param arg_index is the argument index. Arguments to the kernel are referred * by indices that go from 0 for the leftmost argument to n - 1, where n is the * total number of arguments declared by a kernel. * * \param arg_value is a pointer to data that should be used as the argument * value for argument specified by \a arg_index. The argument data pointed to * by \a arg_value is copied and the \a arg_value pointer can therefore be * reused by the application after clSetKernelArg returns. If the argument is * a memory object (buffer or image), the \a arg_value entry will be a pointer * to the appropriate buffer or image object. The memory object must be created * with the context associated with the kernel object. If the argument is * declared with the __local qualifier, the \a arg_value entry must be NULL. * For all other kernel arguments, the \a arg_value entry must be a pointer to * the actual data to be used as argument value. The memory object specified * as argument value must be a buffer object if the argument is declared to be * a pointer of a built-in or user defined type with the __global or __constant * qualifier. If the argument is declared with the __constant qualifier, the * size in bytes of the memory object cannot exceed * CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE and the number of arguments declared * with the __constant qualifier cannot exceed CL_DEVICE_MAX_CONSTANT_ARGS. The * memory object specified as argument value must be a 2D image object if the * argument is declared to be of type image2d_t. The memory object specified as * argument value must be a 3D image object if argument is declared to be of * type image3d_t. If the argument is of type sampler_t, the arg_value entry * must be a pointer to the sampler object. * * \param arg_size specifies the size of the argument value. If the argument is * a memory object, the size is the size of the buffer or image object type. * For arguments declared with the __local qualifier, the size specified will * be the size in bytes of the buffer that must be allocated for the __local * argument. If the argument is of type sampler_t, the arg_size value must be * equal to sizeof(cl_sampler). For all other arguments, the size will be the * size of argument type. * * \return One of the following values: * - CL_SUCCESS if the function was executed successfully * - CL_INVALID_KERNEL if \a kernel is not a valid kernel object. * - CL_INVALID_ARG_INDEX if \a arg_index is not a valid argument index. * - CL_INVALID_ARG_VALUE if \a arg_value specified is NULL for an argument * that is not declared with the __local qualifier or vice-versa. * - CL_INVALID_MEM_OBJECT for an argument declared to be a memory object but * the specified \a arg_value is not a valid memory object. * - CL_INVALID_SAMPLER for an argument declared to be of type sampler_t but * the specified \a arg_value is not a valid sampler object. * - CL_INVALID_ARG_SIZE if \a arg_size does not match the size of the data * type for an argument that is not a memory object or if the argument is a * memory object and \a arg_size != sizeof(cl_mem) or if \a arg_size is zero * and the argument is declared with the __local qualifier or if the * argument is a sampler and arg_size != sizeof(cl_sampler). 
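 *
 * A minimal sketch covering the three common argument kinds (variable
 * names hypothetical):
 * \code
 * clSetKernelArg(kernel, 0, sizeof(cl_mem), &buffer);    // __global pointer
 * cl_int n = 256;
 * clSetKernelArg(kernel, 1, sizeof(cl_int), &n);         // by-value scalar
 * clSetKernelArg(kernel, 2, 256 * sizeof(float), NULL);  // __local allocation
 * \endcode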
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY(cl_int, clSetKernelArg,
              (cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void* arg_value)) {
  if (!is_valid(kernel)) {
    return CL_INVALID_KERNEL;
  }
  const amd::KernelSignature& signature = as_amd(kernel)->signature();
  if (arg_index >= signature.numParameters()) {
    return CL_INVALID_ARG_INDEX;
  }
  const amd::KernelParameterDescriptor& desc = signature.at(arg_index);
  const bool is_local = (desc.addressQualifier_ == CL_KERNEL_ARG_ADDRESS_LOCAL);
  if (((arg_value == NULL) && !is_local && (desc.type_ != T_POINTER)) ||
      ((arg_value != NULL) && is_local)) {
    as_amd(kernel)->parameters().reset(static_cast<size_t>(arg_index));
    return CL_INVALID_ARG_VALUE;
  }
  if (!is_local && (desc.type_ == T_POINTER) && (arg_value != NULL)) {
    cl_mem memObj = *static_cast<const cl_mem*>(arg_value);
    amd::RuntimeObject* pObject = as_amd(memObj);
    if (NULL != memObj && amd::RuntimeObject::ObjectTypeMemory != pObject->objectType()) {
      as_amd(kernel)->parameters().reset(static_cast<size_t>(arg_index));
      return CL_INVALID_MEM_OBJECT;
    }
  } else if ((desc.type_ == T_SAMPLER) &&
             !is_valid(*static_cast<const cl_sampler*>(arg_value))) {
    return CL_INVALID_SAMPLER;
  } else if (desc.type_ == T_QUEUE) {
    cl_command_queue queue = *static_cast<const cl_command_queue*>(arg_value);
    if (!is_valid(queue)) {
      as_amd(kernel)->parameters().reset(static_cast<size_t>(arg_index));
      return CL_INVALID_DEVICE_QUEUE;
    }
    if (NULL == as_amd(queue)->asDeviceQueue()) {
      as_amd(kernel)->parameters().reset(static_cast<size_t>(arg_index));
      return CL_INVALID_DEVICE_QUEUE;
    }
  }
  if ((!is_local && (arg_size != desc.size_)) || (is_local && (arg_size == 0))) {
    if (LP64_ONLY(true ||)((desc.type_ != T_POINTER) && (desc.type_ != T_SAMPLER)) ||
        (arg_size != sizeof(void*))) {
      as_amd(kernel)->parameters().reset(static_cast<size_t>(arg_index));
      return CL_INVALID_ARG_SIZE;
    }
  }
  as_amd(kernel)->parameters().set(static_cast<size_t>(arg_index), arg_size, arg_value);
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! @}
 * \addtogroup CL_KernelQuery
 * @{
 */

/*! \brief Return information about the kernel object.
 *
 * \param kernel specifies the kernel object being queried.
 *
 * \param param_name specifies the information to query.
 *
 * \param param_value is a pointer to memory where the appropriate result
 * being queried is returned. If \a param_value is NULL, it is ignored.
 *
 * \param param_value_size is used to specify the size in bytes of memory
 * pointed to by \a param_value. This size must be >= size of return type.
 *
 * \param param_value_size_ret returns the actual size in bytes of data copied
 * to \a param_value. If \a param_value_size_ret is NULL, it is ignored.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function is executed successfully
 * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes
 *   specified by \a param_value_size is < size of return type and
 *   \a param_value is not NULL
 * - CL_INVALID_KERNEL if \a kernel is not a valid kernel object.
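 *
 * A minimal sketch of the usual size-then-data query pattern (variable
 * names hypothetical):
 * \code
 * size_t size;
 * clGetKernelInfo(kernel, CL_KERNEL_FUNCTION_NAME, 0, NULL, &size);
 * std::vector<char> name(size);
 * clGetKernelInfo(kernel, CL_KERNEL_FUNCTION_NAME, size, name.data(), NULL);
 * \endcode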
 *
 * \version 1.0r33
 */
RUNTIME_ENTRY(cl_int, clGetKernelInfo,
              (cl_kernel kernel, cl_kernel_info param_name, size_t param_value_size,
               void* param_value, size_t* param_value_size_ret)) {
  // Check if we have a valid kernel
  if (!is_valid(kernel)) {
    return CL_INVALID_KERNEL;
  }
  const amd::Kernel* amdKernel = as_amd(kernel);

  // Get the corresponding parameter
  switch (param_name) {
    case CL_KERNEL_FUNCTION_NAME: {
      const char* name = amdKernel->name().c_str();
      // Return the kernel's name
      return amd::clGetInfo(name, param_value_size, param_value, param_value_size_ret);
    }
    case CL_KERNEL_NUM_ARGS: {
      cl_uint numParam = static_cast<cl_uint>(amdKernel->signature().numParameters());
      // Return the number of kernel's parameters
      return amd::clGetInfo(numParam, param_value_size, param_value, param_value_size_ret);
    }
    case CL_KERNEL_REFERENCE_COUNT: {
      cl_uint count = amdKernel->referenceCount();
      // Return the reference counter
      return amd::clGetInfo(count, param_value_size, param_value, param_value_size_ret);
    }
    case CL_KERNEL_CONTEXT: {
      cl_context context = const_cast<cl_context>(as_cl(&amdKernel->program().context()));
      // Return the context, associated with the program
      return amd::clGetInfo(context, param_value_size, param_value, param_value_size_ret);
    }
    case CL_KERNEL_PROGRAM: {
      cl_program program = const_cast<cl_program>(as_cl(&amdKernel->program()));
      // Return the program, associated with the kernel
      return amd::clGetInfo(program, param_value_size, param_value, param_value_size_ret);
    }
    case CL_KERNEL_ATTRIBUTES: {
      const char* name = amdKernel->signature().attributes().c_str();
      // Return the kernel attributes
      return amd::clGetInfo(name, param_value_size, param_value, param_value_size_ret);
    }
    default:
      return CL_INVALID_VALUE;
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief Returns information about the arguments of a kernel. Kernel
 * argument information is only available if the program object associated
 * with kernel is created with \a clCreateProgramWithSource and the program
 * executable is built with the -cl-kernel-arg-info option specified in
 * options argument to clBuildProgram or clCompileProgram.
 *
 * \param kernel specifies the kernel object being queried.
 *
 * \param param_name specifies the information to query.
 *
 * \param param_value is a pointer to memory where the appropriate result
 * being queried is returned. If \a param_value is NULL, it is ignored.
 *
 * \param param_value_size is used to specify the size in bytes of memory
 * pointed to by \a param_value. This size must be >= size of return type.
 *
 * \param param_value_size_ret returns the actual size in bytes of data copied
 * to \a param_value. If \a param_value_size_ret is NULL, it is ignored.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function is executed successfully
 * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes
 *   specified by \a param_value_size is < size of return type and
 *   \a param_value is not NULL
 * - CL_INVALID_KERNEL if \a kernel is not a valid kernel object.
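 *
 * A minimal sketch, assuming the program was built with
 * -cl-kernel-arg-info (variable names hypothetical):
 * \code
 * cl_kernel_arg_address_qualifier aq;
 * clGetKernelArgInfo(kernel, 0, CL_KERNEL_ARG_ADDRESS_QUALIFIER,
 *                    sizeof(aq), &aq, NULL);
 * \endcode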
* * \version 1.2r07 */ RUNTIME_ENTRY(cl_int, clGetKernelArgInfo, (cl_kernel kernel, cl_uint arg_indx, cl_kernel_arg_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { // Check if we have a valid kernel if (!is_valid(kernel)) { return CL_INVALID_KERNEL; } amd::Kernel* amdKernel = as_amd(kernel); const amd::KernelSignature& signature = amdKernel->signature(); if (arg_indx >= signature.numParameters()) { return CL_INVALID_ARG_INDEX; } const amd::KernelParameterDescriptor& desc = signature.at(arg_indx); // Get the corresponded parameters switch (param_name) { case CL_KERNEL_ARG_ADDRESS_QUALIFIER: { cl_kernel_arg_address_qualifier qualifier = desc.addressQualifier_; return amd::clGetInfo(qualifier, param_value_size, param_value, param_value_size_ret); } case CL_KERNEL_ARG_ACCESS_QUALIFIER: { cl_kernel_arg_access_qualifier qualifier = desc.accessQualifier_; return amd::clGetInfo(qualifier, param_value_size, param_value, param_value_size_ret); } case CL_KERNEL_ARG_TYPE_NAME: { const char* typeName = desc.typeName_.c_str(); // Return the argument's type name return amd::clGetInfo(typeName, param_value_size, param_value, param_value_size_ret); } case CL_KERNEL_ARG_TYPE_QUALIFIER: { cl_kernel_arg_type_qualifier qualifier = desc.typeQualifier_; return amd::clGetInfo(qualifier, param_value_size, param_value, param_value_size_ret); } case CL_KERNEL_ARG_NAME: { const char* name = desc.name_.c_str(); // Return the argument's name return amd::clGetInfo(name, param_value_size, param_value, param_value_size_ret); } default: return CL_INVALID_VALUE; } return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Return information about the kernel object that may be specific * to a device. * * \param kernel specifies the kernel object being queried. * * \param device identifies a specific device in the list of devices associated * with \a kernel. The list of devices is the list of devices in the OpenCL * context that is associated with \a kernel. If the list of devices associated * with kernel is a single device, \a device can be a NULL value. * * \param param_name specifies the information to query * * \param param_value is a pointer to memory where the appropriate result being * queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size is used to specify the size in bytes of memory * pointed to by \a param_value. This size must be >= size of return type. * * \param param_value_size_ret returns the actual size in bytes of data copied * to \a param_value. If \a param_value_size_ret is NULL, it is ignored. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully, * - CL_INVALID_DEVICE if \a device is not in the list of devices associated * with \a kernel or if \a device is NULL but there are more than one * devices in the associated with \a kernel * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes * specified by \a param_value_size is < size of return type and * \a param_value is not NULL * - CL_INVALID_KERNEL if \a kernel is a not a valid kernel object. 
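 *
 * A minimal sketch querying the maximum work-group size for one device
 * (variable names hypothetical):
 * \code
 * size_t wgSize;
 * clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
 *                          sizeof(wgSize), &wgSize, NULL);
 * \endcode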
 *
 * \version 1.2r15
 */
RUNTIME_ENTRY(cl_int, clGetKernelWorkGroupInfo,
              (cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name,
               size_t param_value_size, void* param_value, size_t* param_value_size_ret)) {
  // Check if we have a valid device
  if (!is_valid(device)) {
    return CL_INVALID_DEVICE;
  }
  // Check if we have a valid kernel
  if (!is_valid(kernel)) {
    return CL_INVALID_KERNEL;
  }
  const amd::Device& amdDevice = *as_amd(device);

  // Find the kernel associated with the specified device
  const device::Kernel* devKernel = as_amd(kernel)->getDeviceKernel(amdDevice);
  // Make sure we found a valid kernel
  if (devKernel == NULL) {
    return CL_INVALID_KERNEL;
  }

  // Get the corresponding parameter
  switch (param_name) {
    case CL_KERNEL_WORK_GROUP_SIZE: {
      // Return workgroup size
      return amd::clGetInfo(devKernel->workGroupInfo()->size_, param_value_size, param_value,
                            param_value_size_ret);
    }
    case CL_KERNEL_COMPILE_WORK_GROUP_SIZE: {
      // Return the compile workgroup size
      return amd::clGetInfo(devKernel->workGroupInfo()->compileSize_, param_value_size,
                            param_value, param_value_size_ret);
    }
    case CL_KERNEL_LOCAL_MEM_SIZE: {
      // Return the amount of used local memory
      const size_t align = amdDevice.info().minDataTypeAlignSize_;
      cl_ulong memSize = as_amd(kernel)->parameters().localMemSize(align) +
          amd::alignUp(devKernel->workGroupInfo()->localMemSize_, align);
      return amd::clGetInfo(memSize, param_value_size, param_value, param_value_size_ret);
    }
    case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: {
      // Return the preferred work-group size multiple
      return amd::clGetInfo(devKernel->workGroupInfo()->preferredSizeMultiple_, param_value_size,
                            param_value, param_value_size_ret);
    }
    case CL_KERNEL_PRIVATE_MEM_SIZE: {
      // Return the amount of private memory used by the kernel
      return amd::clGetInfo(devKernel->workGroupInfo()->privateMemSize_, param_value_size,
                            param_value, param_value_size_ret);
    }
    case CL_KERNEL_GLOBAL_WORK_SIZE: {
      return CL_INVALID_VALUE;
    }
    case CL_KERNEL_MAX_SEMAPHORE_SIZE_AMD: {
      return amd::clGetInfo(amdDevice.info().maxSemaphoreSize_, param_value_size, param_value,
                            param_value_size_ret);
    }
    default:
      return CL_INVALID_VALUE;
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief Returns information about the kernel object.
 *
 * \param kernel specifies the kernel object being queried.
 *
 * \param device identifies a specific device in the list of devices associated
 * with kernel. The list of devices is the list of devices in the OpenCL context
 * that is associated with kernel. If the list of devices associated with kernel
 * is a single device, device can be a NULL value.
 *
 * \param param_name specifies the information to query. The list of supported
 * param_name types and the information returned in param_value by
 * clGetKernelSubGroupInfo is described in the table below.
 *
 * \param input_value_size is used to specify the size in bytes of memory
 * pointed to by input_value. This size must be == size of input type as
 * described in the table below.
 *
 * \param input_value is a pointer to memory where the appropriate
 * parameterization of the query is passed from. If input_value is NULL, it is
 * ignored.
 *
 * \param param_value is a pointer to memory where the appropriate result being
 * queried is returned. If param_value is NULL, it is ignored.
 *
 * \param param_value_size is used to specify the size in bytes of memory
 * pointed to by param_value. This size must be >= size of return type as
 * described in the table below.
 *
 * \param param_value_size_ret returns the actual size in bytes of data copied
 * to param_value.
If param_value_size_ret is NULL, it is ignored. * * \return CL_SUCCESS if the function is executed successfully. * Otherwise, it returns one of the following errors: * * - CL_INVALID_DEVICE if device is not in the list of devices associated with * kernel or if device is NULL but there is more than one device associated * with kernel. * - CL_INVALID_VALUE if param_name is not valid, or if size in bytes specified * by param_value_size is < size of return type as described in the table * above and param_value is not NULL. * - CL_INVALID_VALUE if param_name is CL_KERNEL_SUB_GROUP_SIZE_FOR_NDRANGE and * the size in bytes specified by input_value_size is not valid or if * input_value is NULL. * - CL_INVALID_KERNEL if kernel is a not a valid kernel object. * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by * the OpenCL implementation on the device. * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the OpenCL implementation on the host. * * \version 2.0r12 */ RUNTIME_ENTRY(cl_int, clGetKernelSubGroupInfo, (cl_kernel kernel, cl_device_id device, cl_kernel_sub_group_info param_name, size_t input_value_size, const void* input_value, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { // Check if we have a valid device if (!is_valid(device)) { return CL_INVALID_DEVICE; } // Check if we have a valid kernel if (!is_valid(kernel)) { return CL_INVALID_KERNEL; } const amd::Device& amdDevice = *as_amd(device); // Find the kernel, associated with the specified device const device::Kernel* devKernel = as_amd(kernel)->getDeviceKernel(amdDevice); // Make sure we found a valid kernel if (devKernel == NULL) { return CL_INVALID_KERNEL; } // Get the corresponded parameters switch (param_name) { case CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE: case CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE: { // Infer the number of dimensions from 'input_value_size' size_t dims = input_value_size / sizeof(size_t); if (dims == 0 || dims > 3 || input_value_size != dims * sizeof(size_t)) { return CL_INVALID_VALUE; } // Get the linear workgroup size size_t workGroupSize = ((size_t*)input_value)[0]; for (size_t i = 1; i < dims; ++i) { workGroupSize *= ((size_t*)input_value)[i]; } // Get the subgroup size. GPU devices sub-groups are wavefronts. size_t subGroupSize = as_amd(device)->info().wavefrontWidth_; size_t numSubGroups = (workGroupSize + subGroupSize - 1) / subGroupSize; return amd::clGetInfo((param_name == CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR) ? 
subGroupSize : numSubGroups, param_value_size, param_value, param_value_size_ret); } case CL_KERNEL_COMPILE_NUM_SUB_GROUPS: { size_t numSubGroups = 0; return amd::clGetInfo(numSubGroups, param_value_size, param_value, param_value_size_ret); } case CL_KERNEL_MAX_NUM_SUB_GROUPS: { size_t waveSize = as_amd(device)->info().wavefrontWidth_; size_t numSubGroups = (devKernel->workGroupInfo()->size_ + waveSize - 1) / waveSize; return amd::clGetInfo(numSubGroups, param_value_size, param_value, param_value_size_ret); } case CL_KERNEL_LOCAL_SIZE_FOR_SUB_GROUP_COUNT: { if (input_value_size != sizeof(size_t)) { return CL_INVALID_VALUE; } size_t numSubGroups = ((size_t*)input_value)[0]; // Infer the number of dimensions from 'param_value_size' size_t dims = param_value_size / sizeof(size_t); if (dims == 0 || dims > 3 || param_value_size != dims * sizeof(size_t)) { return CL_INVALID_VALUE; } *not_null(param_value_size_ret) = param_value_size; size_t localSize; localSize = numSubGroups * as_amd(device)->info().wavefrontWidth_; if (localSize > devKernel->workGroupInfo()->size_) { ::memset(param_value, '\0', dims * sizeof(size_t)); return CL_SUCCESS; } switch (dims) { case 3: ((size_t*)param_value)[2] = 1; case 2: ((size_t*)param_value)[1] = 1; case 1: ((size_t*)param_value)[0] = localSize; } return CL_SUCCESS; } default: return CL_INVALID_VALUE; } return CL_SUCCESS; } RUNTIME_EXIT /*! @} * @} * @} */ clr-rocm-5.7.1/opencl/amdocl/cl_runtime.cpp000066400000000000000000000036631450307266000206410ustar00rootroot00000000000000/* Copyright (c) 2008 - 2022 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "thread/thread.hpp" #include "platform/runtime.hpp" #include #include #ifdef DEBUG static int reportHook(int reportType, char* message, int* returnValue) { if (returnValue) { *returnValue = 1; } std::cerr << message; ::exit(3); return TRUE; } #endif // DEBUG extern "C" BOOL WINAPI DllMain(HINSTANCE hinst, DWORD reason, LPVOID reserved) { switch (reason) { case DLL_PROCESS_ATTACH: #ifdef DEBUG if (!::getenv("AMD_OCL_ENABLE_MESSAGE_BOX")) { _CrtSetReportHook(reportHook); _set_error_mode(_OUT_TO_STDERR); } #endif // DEBUG break; case DLL_PROCESS_DETACH: amd::Runtime::setLibraryDetached(); break; case DLL_THREAD_DETACH: { amd::Thread* thread = amd::Thread::current(); delete thread; } break; default: break; } return true; } clr-rocm-5.7.1/opencl/amdocl/cl_sampler.cpp000066400000000000000000000320451450307266000206150ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "cl_common.hpp" #include "platform/context.hpp" #include "platform/sampler.hpp" /*! \addtogroup API * @{ * * \addtogroup CL_Samplers * * A sampler object describes how to sample an image when the image is read * in the kernel. The built-in functions to read from an image in a kernel * take a sampler as an argument. The sampler arguments to the image read * function can be sampler objects created using OpenCL functions and passed * as argument values to the kernel or can be samplers declared inside * a kernel. * * @{ */ /*! \brief Create a sampler object. * * \param context must be a valid OpenCL context. * * \param specifies a list of sampler property names and their corresponding * values. Each sampler property name is immediately followed by the * corresponding desired value. The list is terminated with 0. If a supported * property and its value is not specified in sampler_properties, its default * value will be used. sampler_properties can be NULL in which case the default * values for supported sampler properties will be used. * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A valid non-zero sampler object and \a errcode_ret is set to * CL_SUCCESS if the sampler object is created successfully. It returns a NULL * value with one of the following error values returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a context is not a valid context. * - CL_INVALID_VALUE if the property name in sampler_properties is not a * supported property name, if the value specified for a supported property * name is not valid, or if the same property name is specified more than * once * - CL_INVALID_OPERATION if images are not supported by any device associated * with context * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. 
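 *
 * A minimal sketch of a zero-terminated properties list (variable names
 * hypothetical):
 * \code
 * const cl_sampler_properties props[] = {
 *     CL_SAMPLER_NORMALIZED_COORDS, CL_TRUE,
 *     CL_SAMPLER_ADDRESSING_MODE,   CL_ADDRESS_CLAMP_TO_EDGE,
 *     CL_SAMPLER_FILTER_MODE,       CL_FILTER_LINEAR,
 *     0};
 * cl_int err;
 * cl_sampler sampler = clCreateSamplerWithProperties(context, props, &err);
 * \endcode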
 *
 * \version 2.0r19
 */
RUNTIME_ENTRY_RET(cl_sampler, clCreateSamplerWithProperties,
                  (cl_context context, const cl_sampler_properties* sampler_properties,
                   cl_int* errcode_ret)) {
  if (!is_valid(context)) {
    *not_null(errcode_ret) = CL_INVALID_CONTEXT;
    LogWarning("invalid parameter \"context\"");
    return (cl_sampler)0;
  }

  cl_bool normalizedCoords = CL_TRUE;
  cl_addressing_mode addressingMode = CL_ADDRESS_CLAMP;
  cl_filter_mode filterMode = CL_FILTER_NEAREST;
#ifndef CL_FILTER_NONE
#define CL_FILTER_NONE 0x1142
#endif
  cl_filter_mode mipFilterMode = CL_FILTER_NONE;
  float minLod = 0.f;
  float maxLod = CL_MAXFLOAT;

  const struct SamplerProperty {
    cl_sampler_properties name;
    union {
      cl_sampler_properties raw;
      cl_bool normalizedCoords;
      cl_addressing_mode addressingMode;
      cl_filter_mode filterMode;
      cl_float lod;
    } value;
  }* p = reinterpret_cast<const SamplerProperty*>(sampler_properties);

  if (p != NULL)
    while (p->name != 0) {
      switch (p->name) {
        case CL_SAMPLER_NORMALIZED_COORDS:
          normalizedCoords = p->value.normalizedCoords;
          break;
        case CL_SAMPLER_ADDRESSING_MODE:
          addressingMode = p->value.addressingMode;
          break;
        case CL_SAMPLER_FILTER_MODE:
          filterMode = p->value.filterMode;
          break;
        case CL_SAMPLER_MIP_FILTER_MODE:
          mipFilterMode = p->value.filterMode;
          break;
        case CL_SAMPLER_LOD_MIN:
          minLod = p->value.lod;
          break;
        case CL_SAMPLER_LOD_MAX:
          maxLod = p->value.lod;
          break;
        default:
          *not_null(errcode_ret) = CL_INVALID_VALUE;
          LogWarning("invalid property name");
          return (cl_sampler)0;
      }
      ++p;
    }

  // Check sampler validity
  // Check addressing mode
  switch (addressingMode) {
    case CL_ADDRESS_NONE:
    case CL_ADDRESS_CLAMP_TO_EDGE:
    case CL_ADDRESS_CLAMP:
      break;
    case CL_ADDRESS_REPEAT:
    case CL_ADDRESS_MIRRORED_REPEAT:
      if (!normalizedCoords) {
        // repeat modes cannot be used with unnormalized coordinates
        *not_null(errcode_ret) = CL_INVALID_VALUE;
        LogWarning("invalid combination for sampler");
        return (cl_sampler)0;
      }
      break;
    default:
      *not_null(errcode_ret) = CL_INVALID_VALUE;
      LogWarning("invalid addressing mode");
      return (cl_sampler)0;
  }

  // Check filter mode
  switch (filterMode) {
    case CL_FILTER_NEAREST:
    case CL_FILTER_LINEAR:
      break;
    default:
      *not_null(errcode_ret) = CL_INVALID_VALUE;
      LogWarning("invalid filter mode");
      return (cl_sampler)0;
  }
  switch (mipFilterMode) {
    case CL_FILTER_NONE:
    case CL_FILTER_NEAREST:
    case CL_FILTER_LINEAR:
      break;
    default:
      *not_null(errcode_ret) = CL_INVALID_VALUE;
      LogWarning("invalid mip filter mode");
      return (cl_sampler)0;
  }

  // Create instance of Sampler
  amd::Sampler* sampler =
      new amd::Sampler(*as_amd(context),
                       normalizedCoords == CL_TRUE,  // To get rid of VS warning C4800
                       addressingMode, filterMode, mipFilterMode, minLod, maxLod);
  if (!sampler) {
    *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY;
    LogWarning("not enough host memory");
    return (cl_sampler)0;
  }
  if (!sampler->create()) {
    delete sampler;
    *not_null(errcode_ret) = CL_OUT_OF_HOST_MEMORY;
    LogWarning("Runtime failed sampler creation!");
    return (cl_sampler)0;
  }
  *not_null(errcode_ret) = CL_SUCCESS;
  return as_cl(sampler);
}
RUNTIME_EXIT

/*! \brief Create a sampler object.
 *
 * \param context must be a valid OpenCL context.
 *
 * \param addressing_mode specifies how out of range image coordinates are
 * handled when reading from an image. This can be set to CL_ADDRESS_REPEAT,
 * CL_ADDRESS_CLAMP_TO_EDGE, CL_ADDRESS_CLAMP and CL_ADDRESS_NONE.
* * \param filter_mode specifies the type of filter that must be applied when * reading an image. This can be CL_FILTER_NEAREST or CL_FILTER_LINEAR. * * \param normalized_coords determines if the image coordinates specified are * normalized (if \a normalized_coords is not zero) or not (if * \a normalized_coords is zero). * * \param errcode_ret will return an appropriate error code. If \a errcode_ret * is NULL, no error code is returned. * * \return A valid non-zero sampler object and \a errcode_ret is set to * CL_SUCCESS if the sampler object is created successfully. It returns a NULL * value with one of the following error values returned in \a errcode_ret: * - CL_INVALID_CONTEXT if \a context is not a valid context. * - CL_INVALID_VALUE if \a addressing_mode, \a filter_mode or * \a normalized_coords or combination of these argument values are not * valid. * - CL_INVALID_OPERATION if images are not supported by any device associated * with context * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required * by the runtime. * * \version 1.0r33 */ RUNTIME_ENTRY_RET(cl_sampler, clCreateSampler, (cl_context context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int* errcode_ret)) { const cl_sampler_properties sprops[] = {CL_SAMPLER_NORMALIZED_COORDS, static_cast(normalized_coords), CL_SAMPLER_ADDRESSING_MODE, static_cast(addressing_mode), CL_SAMPLER_FILTER_MODE, static_cast(filter_mode), 0}; return clCreateSamplerWithProperties(context, sprops, errcode_ret); } RUNTIME_EXIT /*! \brief Increment the sampler reference count. * * clCreateSampler does an implicit retain. * * \return CL_SUCCESS if the function is executed successfully. It returns * CL_INVALID_SAMPLER if \a sampler is not a valid sampler object. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clRetainSampler, (cl_sampler sampler)) { if (!is_valid(sampler)) { return CL_INVALID_SAMPLER; } as_amd(sampler)->retain(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Decrement the sampler reference count. * * The sampler object is deleted after the reference count becomes zero and * commands queued for execution on a command-queue(s) that use sampler have * finished. * * \return CL_SUCCESS if the function is executed successfully. It returns * CL_INVALID_SAMPLER if \a sampler is not a valid sampler object. * * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clReleaseSampler, (cl_sampler sampler)) { if (!is_valid(sampler)) { return CL_INVALID_SAMPLER; } as_amd(sampler)->release(); return CL_SUCCESS; } RUNTIME_EXIT /*! \brief Return information about the sampler object. * * \param sampler specifies the sampler being queried. * * \param param_name specifies the information to query. * * \param param_value is a pointer to memory where the appropriate result * being queried is returned. If \a param_value is NULL, it is ignored. * * \param param_value_size is used to specify the size in bytes of memory * pointed to by \a param_value. This size must be >= size of return type. * * \param param_value_size_ret returns the actual size in bytes of data copied * to \a param_value. If \a param_value_size_ret is NULL, it is ignored. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully * - CL_INVALID_VALUE if \a param_name is not valid, or if size in bytes * specified by \a param_value_size is < size of return type and * \a param_value is not NULL * - CL_INVALID_SAMPLER if \a sampler is a not a valid sampler object. 
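 *
 * A minimal sketch querying the filter mode (variable names hypothetical):
 * \code
 * cl_filter_mode filter;
 * clGetSamplerInfo(sampler, CL_SAMPLER_FILTER_MODE, sizeof(filter),
 *                  &filter, NULL);
 * \endcode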
* * \version 1.0r33 */ RUNTIME_ENTRY(cl_int, clGetSamplerInfo, (cl_sampler sampler, cl_sampler_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)) { if (!is_valid(sampler)) { return CL_INVALID_SAMPLER; } switch (param_name) { case CL_SAMPLER_REFERENCE_COUNT: { cl_uint count = as_amd(sampler)->referenceCount(); return amd::clGetInfo(count, param_value_size, param_value, param_value_size_ret); } case CL_SAMPLER_CONTEXT: { cl_context context = as_cl(&as_amd(sampler)->context()); return amd::clGetInfo(context, param_value_size, param_value, param_value_size_ret); } case CL_SAMPLER_ADDRESSING_MODE: { cl_addressing_mode addressing = as_amd(sampler)->addressingMode(); return amd::clGetInfo(addressing, param_value_size, param_value, param_value_size_ret); } case CL_SAMPLER_FILTER_MODE: { cl_filter_mode filter = as_amd(sampler)->filterMode(); return amd::clGetInfo(filter, param_value_size, param_value, param_value_size_ret); } case CL_SAMPLER_NORMALIZED_COORDS: { cl_bool normalized = as_amd(sampler)->normalizedCoords(); return amd::clGetInfo(normalized, param_value_size, param_value, param_value_size_ret); } case CL_SAMPLER_MIP_FILTER_MODE: { cl_filter_mode mipFilter = as_amd(sampler)->mipFilter(); return amd::clGetInfo(mipFilter, param_value_size, param_value, param_value_size_ret); } case CL_SAMPLER_LOD_MIN: { cl_float minLod = as_amd(sampler)->minLod(); return amd::clGetInfo(minLod, param_value_size, param_value, param_value_size_ret); } case CL_SAMPLER_LOD_MAX: { cl_float maxLod = as_amd(sampler)->maxLod(); return amd::clGetInfo(maxLod, param_value_size, param_value, param_value_size_ret); } default: break; } return CL_INVALID_VALUE; } RUNTIME_EXIT /*! @} * @} */ clr-rocm-5.7.1/opencl/amdocl/cl_sdi_amd.cpp000066400000000000000000000150301450307266000205450ustar00rootroot00000000000000/* Copyright (c) 2012 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "cl_common.hpp" #include "cl_sdi_amd.h" #include "platform/context.hpp" #include "platform/command.hpp" #include "platform/memory.hpp" #include RUNTIME_ENTRY(cl_int, clEnqueueWaitSignalAMD, (cl_command_queue command_queue, cl_mem mem_object, cl_uint value, cl_uint num_events, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(mem_object)) { return CL_INVALID_MEM_OBJECT; } amd::Buffer* buffer = as_amd(mem_object)->asBuffer(); if (buffer == NULL) { return CL_INVALID_MEM_OBJECT; } if (!(buffer->getMemFlags() & CL_MEM_BUS_ADDRESSABLE_AMD)) { return CL_INVALID_MEM_OBJECT; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != buffer->getContext()) { return CL_INVALID_CONTEXT; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::SignalCommand* command = new amd::SignalCommand(hostQueue, CL_COMMAND_WAIT_SIGNAL_AMD, eventWaitList, *buffer, value); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_OUT_OF_RESOURCES; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT RUNTIME_ENTRY(cl_int, clEnqueueWriteSignalAMD, (cl_command_queue command_queue, cl_mem mem_object, cl_uint value, cl_ulong offset, cl_uint num_events, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (!is_valid(mem_object)) { return CL_INVALID_MEM_OBJECT; } amd::Buffer* buffer = as_amd(mem_object)->asBuffer(); if (buffer == NULL) { return CL_INVALID_MEM_OBJECT; } if (!(buffer->getMemFlags() & CL_MEM_EXTERNAL_PHYSICAL_AMD)) { return CL_INVALID_MEM_OBJECT; } if ((offset + sizeof(value)) > (buffer->getSize() + amd::Os::pageSize())) { return CL_INVALID_BUFFER_SIZE; } amd::HostQueue* queue = as_amd(command_queue)->asHostQueue(); if (NULL == queue) { return CL_INVALID_COMMAND_QUEUE; } amd::HostQueue& hostQueue = *queue; if (hostQueue.context() != buffer->getContext()) { return CL_INVALID_CONTEXT; } amd::Command::EventWaitList eventWaitList; cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events, event_wait_list); if (err != CL_SUCCESS) { return err; } amd::SignalCommand* command = new amd::SignalCommand(hostQueue, CL_COMMAND_WRITE_SIGNAL_AMD, eventWaitList, *buffer, value, offset); if (command == NULL) { return CL_OUT_OF_HOST_MEMORY; } // Make sure we have memory for the command execution if (!command->validateMemory()) { delete command; return CL_OUT_OF_RESOURCES; } command->enqueue(); *not_null(event) = as_cl(&command->event()); if (event == NULL) { command->release(); } return CL_SUCCESS; } RUNTIME_EXIT RUNTIME_ENTRY(cl_int, clEnqueueMakeBuffersResidentAMD, (cl_command_queue command_queue, cl_uint num_mem_objs, cl_mem* mem_objects, cl_bool blocking_make_resident, cl_bus_address_amd* bus_addresses, cl_uint num_events, const cl_event* event_wait_list, cl_event* event)) { if (!is_valid(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } if (mem_objects == 0) { return CL_INVALID_MEM_OBJECT; } if (bus_addresses == 0 || num_mem_objs == 0) { return CL_INVALID_VALUE; } memset(bus_addresses, 0, 
         sizeof(cl_bus_address_amd) * num_mem_objs);

  amd::HostQueue* queue = as_amd(command_queue)->asHostQueue();
  if (NULL == queue) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  amd::HostQueue& hostQueue = *queue;

  std::vector<amd::Memory*> memObjects;
  for (unsigned int i = 0; i < num_mem_objs; ++i) {
    if (!is_valid(mem_objects[i])) {
      return CL_INVALID_MEM_OBJECT;
    }
    amd::Buffer* buffer = as_amd(mem_objects[i])->asBuffer();
    if (buffer == NULL) {
      return CL_INVALID_MEM_OBJECT;
    }
    if (!(buffer->getMemFlags() & CL_MEM_BUS_ADDRESSABLE_AMD)) {
      return CL_INVALID_MEM_OBJECT;
    }
    if (hostQueue.context() != buffer->getContext()) {
      return CL_INVALID_CONTEXT;
    }
    memObjects.push_back(buffer);
  }

  amd::Command::EventWaitList eventWaitList;
  cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events, event_wait_list);
  if (err != CL_SUCCESS) {
    return err;
  }

  amd::MakeBuffersResidentCommand* command = new amd::MakeBuffersResidentCommand(
      hostQueue, CL_COMMAND_MAKE_BUFFERS_RESIDENT_AMD, eventWaitList, memObjects, bus_addresses);
  if (command == NULL) {
    return CL_OUT_OF_HOST_MEMORY;
  }

  // Make sure we have memory for the command execution
  if (!command->validateMemory()) {
    delete command;
    return CL_OUT_OF_RESOURCES;
  }

  command->enqueue();
  if (blocking_make_resident) {
    command->awaitCompletion();
  }

  *not_null(event) = as_cl(&command->event());
  if (event == NULL) {
    command->release();
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT
clr-rocm-5.7.1/opencl/amdocl/cl_sdi_amd.h000066400000000000000000000040701450307266000202140ustar00rootroot00000000000000/* Copyright (c) 2012 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
*/ #ifndef __CL_SDI_AMD_H #define __CL_SDI_AMD_H #include "CL/cl_ext.h" #ifdef __cplusplus extern "C" { #endif /*__cplusplus*/ extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWaitSignalAMD( cl_command_queue command_queue, cl_mem mem_object, cl_uint value, cl_uint num_events, const cl_event* event_wait_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteSignalAMD( cl_command_queue command_queue, cl_mem mem_object, cl_uint value, cl_ulong offset, cl_uint num_events, const cl_event* event_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMakeBuffersResidentAMD( cl_command_queue command_queue, cl_uint num_mem_objs, cl_mem* mem_objects, cl_bool blocking_make_resident, cl_bus_address_amd* bus_addresses, cl_uint num_events, const cl_event* event_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_2; #ifdef __cplusplus } /*extern "C"*/ #endif /*__cplusplus*/ #endif clr-rocm-5.7.1/opencl/amdocl/cl_semaphore_amd.h000066400000000000000000000032431450307266000214210ustar00rootroot00000000000000/* Copyright (c) 2012 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef __CL_SEMAPHORE_AMD_H #define __CL_SEMAPHORE_AMD_H /******************************************* * AMD Extension cl_amd_semaphore *******************************************/ #define cl_amd_semaphore 1 #if cl_amd_semaphore #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ /* cl_device_info */ #define CL_DEVICE_MAX_SEMAPHORES_AMD 0xF050 #define CL_DEVICE_MAX_SEMAPHORE_SIZE_AMD 0xF051 /* cl_kernel_work_group_info */ #define CL_KERNEL_MAX_SEMAPHORE_SIZE_AMD 0xF052 #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* cl_amd_semaphore */ #endif /* __CL_SEMAPHORE_AMD_H */ clr-rocm-5.7.1/opencl/amdocl/cl_svm.cpp000066400000000000000000001416621450307266000177650ustar00rootroot00000000000000/* Copyright (c) 2009 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#include "cl_common.hpp"
#include "platform/command.hpp"
#include "platform/kernel.hpp"
#include "platform/program.hpp"

/*! \brief Helper function to validate SVM allocation flags
 *
 * \return true if flags are valid, otherwise - false
 */
static bool validateSvmFlags(cl_svm_mem_flags flags) {
  if (!flags) {
    // coarse-grained allocation
    return true;
  }
  const cl_svm_mem_flags rwFlags = CL_MEM_READ_WRITE | CL_MEM_WRITE_ONLY | CL_MEM_READ_ONLY;
  const cl_svm_mem_flags setFlags =
      flags & (rwFlags | CL_MEM_SVM_ATOMICS | CL_MEM_SVM_FINE_GRAIN_BUFFER);

  if (flags != setFlags) {
    // invalid flags value
    return false;
  }
  if (amd::countBitsSet(flags & rwFlags) > 1) {
    // contradictory R/W flags
    return false;
  }
  if ((flags & CL_MEM_SVM_ATOMICS) && !(flags & CL_MEM_SVM_FINE_GRAIN_BUFFER)) {
    return false;
  }
  return true;
}

/*! \brief Helper function to validate cl_map_flags
 *
 * \return true if flags are valid, otherwise - false
 */
static bool validateMapFlags(cl_map_flags flags) {
  const cl_map_flags maxFlag = CL_MAP_WRITE_INVALIDATE_REGION;
  if (flags >= (maxFlag << 1)) {
    // at least one flag is out-of-range
    return false;
  } else if ((flags & CL_MAP_WRITE_INVALIDATE_REGION) && (flags & (CL_MAP_READ | CL_MAP_WRITE))) {
    // CL_MAP_READ or CL_MAP_WRITE and CL_MAP_WRITE_INVALIDATE_REGION are
    // mutually exclusive.
    return false;
  }
  return true;
}

/*! \addtogroup API
 * @{
 *
 * \addtogroup SVM
 * @{
 *
 */

/*! \brief Allocate a shared virtual memory buffer that can be shared by the
 * host and all devices in an OpenCL context.
 *
 * \param context is a valid OpenCL context used to create the SVM buffer.
 *
 * \param flags is a bit-field that is used to specify allocation and usage
 * information. If CL_MEM_SVM_FINE_GRAIN_BUFFER is not specified, the
 * buffer is created as a coarse grained SVM allocation. Similarly, if
 * CL_MEM_SVM_ATOMICS is not specified, the buffer is created without
 * support for SVM atomic operations.
 *
 * \param size is the size in bytes of the SVM buffer to be allocated.
 *
 * \param alignment is the minimum alignment in bytes that is required for the
 * newly created buffer's memory region. It must be a power of two up to the
 * largest data type supported by the OpenCL device. For the full profile, the
 * largest data type is long16. For the embedded profile, it is long16 if the
 * device supports 64-bit integers; otherwise it is int16. If alignment is 0, a
 * default alignment will be used that is equal to the size of largest data
 * type supported by the OpenCL implementation.
 *
 * \return A valid non-NULL shared virtual memory address if the SVM buffer
 * is successfully allocated. Otherwise, like malloc, it returns a NULL pointer
 * value. clSVMAlloc will fail if
 * - \a context is not a valid context.
 * - \a flags does not contain CL_MEM_SVM_FINE_GRAIN_BUFFER but does
 *   contain CL_MEM_SVM_ATOMICS.
 * - Values specified in \a flags do not follow rules for that particular type.
 * - CL_MEM_SVM_FINE_GRAIN_BUFFER or CL_MEM_SVM_ATOMICS is specified
 *   in \a flags and these are not supported by at least one device in
 *   \a context.
 * - The values specified in \a flags are not valid.
 * - \a size is 0 or > CL_DEVICE_MAX_MEM_ALLOC_SIZE value for any device in
 *   \a context.
 * - \a alignment is not a power of two or the OpenCL implementation cannot
 *   support the specified alignment for at least one device in \a context.
 * - There was a failure to allocate resources.
 *
 * \version 2.0r15
 */
RUNTIME_ENTRY_RET_NOERRCODE(void*, clSVMAlloc,
                            (cl_context context, cl_svm_mem_flags flags, size_t size,
                             unsigned int alignment)) {
  if (!is_valid(context)) {
    LogWarning("invalid parameter \"context\"");
    return NULL;
  }
  if (size == 0) {
    LogWarning("invalid parameter \"size = 0\"");
    return NULL;
  }
  if (!validateSvmFlags(flags)) {
    LogWarning("invalid parameter \"flags\"");
    return NULL;
  }
  if (!amd::isPowerOfTwo(alignment)) {
    LogWarning("invalid parameter \"alignment\"");
    return NULL;
  }

  const std::vector<amd::Device*>& devices = as_amd(context)->svmDevices();
  bool sizePass = false;
  cl_device_svm_capabilities combinedSvmCapabilities = 0;
  const cl_uint hostAddressBits = LP64_SWITCH(32, 64);
  cl_uint minContextAlignment = std::numeric_limits<cl_uint>::max();
  for (const auto& it : devices) {
    cl_device_svm_capabilities svmCapabilities = it->info().svmCapabilities_;
    if (svmCapabilities == 0) {
      continue;
    }
    combinedSvmCapabilities |= svmCapabilities;

    if (it->info().maxMemAllocSize_ >= size) {
      sizePass = true;
    }

    if (it->info().addressBits_ < hostAddressBits) {
      LogWarning("address mode mismatch between host and device");
      return NULL;
    }

    // maximum alignment for a device is given in bits.
    cl_uint baseAlignment = it->info().memBaseAddrAlign_ >> 3;
    if (alignment > baseAlignment) {
      LogWarning("invalid parameter \"alignment\"");
      return NULL;
    }
    minContextAlignment = std::min(minContextAlignment, baseAlignment);
  }

  if ((flags & CL_MEM_SVM_FINE_GRAIN_BUFFER) &&
      !(combinedSvmCapabilities & CL_DEVICE_SVM_FINE_GRAIN_BUFFER)) {
    LogWarning("No device in context supports SVM fine grained buffers");
    return NULL;
  }
  if ((flags & CL_MEM_SVM_ATOMICS) && !(combinedSvmCapabilities & CL_DEVICE_SVM_ATOMICS)) {
    LogWarning("No device in context supports SVM atomics");
    return NULL;
  }
  if (!sizePass) {
    LogWarning("invalid parameter \"size\"");
    return NULL;
  }

  // if alignment not specified, use largest data type alignment supported
  if (alignment == 0) {
    alignment = minContextAlignment;
    ClPrint(amd::LOG_INFO, amd::LOG_API, "Assumed alignment %d\n", alignment);
  }

  amd::Context& amdContext = *as_amd(context);
  return amd::SvmBuffer::malloc(amdContext, flags, size, alignment);
}
RUNTIME_EXIT

/*! \brief Free a shared virtual memory buffer allocated using clSVMAlloc.
 *
 * \param context is a valid OpenCL context used to create the SVM buffer.
 *
 * \param svm_pointer must be the value returned by a call to clSVMAlloc. If a
 * NULL pointer is passed in \a svm_pointer, no action occurs.
 *
 * \version 2.0r15
 */
RUNTIME_ENTRY_VOID(void, clSVMFree, (cl_context context, void* svm_pointer)) {
  if (!is_valid(context)) {
    LogWarning("invalid parameter \"context\"");
    return;
  }
  if (svm_pointer == NULL) {
    return;
  }
  amd::Context& amdContext = *as_amd(context);
  amd::SvmBuffer::free(amdContext, svm_pointer);
}
RUNTIME_EXIT

/*! \brief enqueues a command to free shared virtual memory allocated using
 * clSVMAlloc or a shared system memory pointer.
 *
 * \param command_queue is a valid host command-queue.
 *
 * \param num_svm_pointers specifies the number of elements in \a svm_pointers.
 *
 * \param svm_pointers is a list of shared virtual memory pointers to
 * be freed.
 * Each pointer in \a svm_pointers that was allocated using clSVMAlloc
 * must have been allocated from the same context from which \a command_queue
 * was created. The memory associated with \a svm_pointers can be reused or
 * freed after the function returns.
 *
 * \param pfn_free_func specifies the callback function to be called to free
 * the SVM pointers. \a pfn_free_func takes four arguments: \a queue which is
 * the command queue in which clEnqueueSVMFree was enqueued, the count and list
 * of SVM pointers to free and \a user_data which is a pointer to user
 * specified data. If \a pfn_free_func is NULL, all the pointers specified in
 * \a svm_pointers array must be allocated using clSVMAlloc. \a pfn_free_func
 * must however be a valid callback function if any SVM pointer to be freed is
 * a shared system memory pointer i.e. not allocated using clSVMAlloc.
 *
 * \param user_data will be passed as the user_data argument when
 * \a pfn_free_func is called. \a user_data can be NULL.
 *
 * \param event_wait_list specifies the events that need to complete before
 * this particular command can be executed. If \a event_wait_list is NULL, then
 * this particular command does not wait on any event to complete. If
 * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. If
 * \a event_wait_list is not NULL, the list of events pointed to by
 * \a event_wait_list must be valid and \a num_events_in_wait_list must be
 * greater than 0. The events specified in \a event_wait_list act as
 * synchronization points. The context associated with events in
 * \a event_wait_list and \a command_queue must be the same. The memory
 * associated with \a event_wait_list can be reused or freed after the function
 * returns.
 *
 * \param num_events_in_wait_list specifies the number of elements in
 * \a event_wait_list
 *
 * \param event returns an event object that identifies this particular command
 * and can be used to query or queue a wait for this particular command to
 * complete. \a event can be NULL in which case it will not be possible for the
 * application to query the status of this command or queue a wait for this
 * command to complete. If the \a event_wait_list and the \a event arguments
 * are not NULL, the \a event argument should not refer to an element of the
 * \a event_wait_list array.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function was executed successfully
 * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid host
 *   command-queue
 * - CL_INVALID_VALUE if \a num_svm_pointers is 0 or if \a svm_pointers is
 *   NULL or if any of the pointers specified in \a svm_pointers array is NULL
 * - CL_INVALID_CONTEXT if context associated with \a command_queue and
 *   events in \a event_wait_list are not the same
 * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and
 *   \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and
 *   \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list
 *   are not valid events.
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
 *   by the OpenCL implementation on the device
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 *   by the OpenCL implementation on the host.
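 *
 * A minimal usage sketch (\a ctx and \a queue are assumed to be a valid
 * context and host command-queue; error handling is omitted):
 * \code
 * void* ptrs[1] = { clSVMAlloc(ctx, CL_MEM_READ_WRITE, 4096, 0) };
 * // With a NULL pfn_free_func, the runtime frees pointers that came
 * // from clSVMAlloc.
 * clEnqueueSVMFree(queue, 1, ptrs, NULL, NULL, 0, NULL, NULL);
 * \endcode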
 *
 * \version 2.0r15
 */
RUNTIME_ENTRY(cl_int, clEnqueueSVMFree,
              (cl_command_queue command_queue, cl_uint num_svm_pointers, void* svm_pointers[],
               void(CL_CALLBACK* pfn_free_func)(cl_command_queue queue, cl_uint num_svm_pointers,
                                                void* svm_pointers[], void* user_data),
               void* user_data, cl_uint num_events_in_wait_list, const cl_event* event_wait_list,
               cl_event* event)) {
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  if (num_svm_pointers == 0) {
    LogWarning("invalid parameter \"num_svm_pointers = 0\"");
    return CL_INVALID_VALUE;
  }
  if (svm_pointers == NULL) {
    LogWarning("invalid parameter \"svm_pointers = NULL\"");
    return CL_INVALID_VALUE;
  }
  //!@todo why are NULL pointers disallowed here but not in clSVMFree?
  for (cl_uint i = 0; i < num_svm_pointers; i++) {
    if (svm_pointers[i] == NULL) {
      LogWarning("Null pointers are not allowed");
      return CL_INVALID_VALUE;
    }
  }
  //!@todo what if the callback is NULL but \a user_data is not?

  amd::HostQueue* queue = as_amd(command_queue)->asHostQueue();
  if (NULL == queue) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  amd::HostQueue& hostQueue = *queue;

  amd::Command::EventWaitList eventWaitList;
  cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list,
                                       event_wait_list);
  if (err != CL_SUCCESS) {
    return err;
  }

  amd::Command* command = new amd::SvmFreeMemoryCommand(hostQueue, eventWaitList, num_svm_pointers,
                                                        svm_pointers, pfn_free_func, user_data);
  if (command == NULL) {
    return CL_OUT_OF_HOST_MEMORY;
  }

  command->enqueue();

  *not_null(event) = as_cl(&command->event());
  if (event == NULL) {
    command->release();
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief enqueues a command to do a memcpy operation.
 *
 * \param command_queue refers to the host command-queue in which the read/
 * write commands will be queued.
 *
 * \param blocking_copy indicates if the copy operation is blocking or
 * non-blocking. If \a blocking_copy is CL_TRUE i.e. the copy command is
 * blocking, clEnqueueSVMMemcpy does not return until the buffer data has been
 * copied into memory pointed to by \a dst_ptr. If \a blocking_copy is CL_FALSE
 * i.e. the copy command is non-blocking, clEnqueueSVMMemcpy queues a
 * non-blocking copy command and returns. The contents of the buffer that
 * \a dst_ptr points to cannot be used until the copy command has completed.
 * The \a event argument returns an event object which can be used to query the
 * execution status of the copy command. When the copy command has completed,
 * the contents of the buffer that \a dst_ptr points to can be used by the
 * application.
 *
 * \param dst_ptr is the pointer to a memory region where data is copied to.
 *
 * \param src_ptr is the pointer to a memory region where data is copied from.
 * If \a dst_ptr and/or \a src_ptr are allocated using clSVMAlloc then they
 * must be allocated from the same context from which \a command_queue was
 * created. Otherwise the behavior is undefined.
 *
 * \param size is the size in bytes of data being copied.
 *
 * \param event_wait_list specifies the events that need to complete before
 * this particular command can be executed. If \a event_wait_list is NULL, then
 * this particular command does not wait on any event to complete. If
 * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. If
 * \a event_wait_list is not NULL, the list of events pointed to by
 * \a event_wait_list must be valid and \a num_events_in_wait_list must be
 * greater than 0. The events specified in \a event_wait_list act as
 * synchronization points.
 * The context associated with events in
 * \a event_wait_list and \a command_queue must be the same. The memory
 * associated with \a event_wait_list can be reused or freed after the function
 * returns.
 *
 * \param num_events_in_wait_list specifies the number of elements in
 * \a event_wait_list
 *
 * \param event returns an event object that identifies this particular command
 * and can be used to query or queue a wait for this particular command to
 * complete. \a event can be NULL in which case it will not be possible for the
 * application to query the status of this command or queue a wait for this
 * command to complete. If the \a event_wait_list and the \a event arguments
 * are not NULL, the \a event argument should not refer to an element of the
 * \a event_wait_list array.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function was executed successfully
 * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid host
 *   command-queue
 * - CL_INVALID_CONTEXT if the context associated with \a command_queue and
 *   events in \a event_wait_list are not the same
 * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and
 *   \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and
 *   \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list
 *   are not valid events.
 * - CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the operation is
 *   blocking and the execution status of any of the events in
 *   \a event_wait_list is a negative integer value.
 * - CL_INVALID_VALUE if \a dst_ptr or \a src_ptr are NULL.
 * - CL_INVALID_VALUE if \a size is 0.
 * - CL_MEM_COPY_OVERLAP if the values specified for \a dst_ptr, \a src_ptr
 *   and \a size result in an overlapping copy.
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
 *   by the OpenCL implementation on the device
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 *   by the OpenCL implementation on the host.
 *
 * \version 2.0r15
 */
RUNTIME_ENTRY(cl_int, clEnqueueSVMMemcpy,
              (cl_command_queue command_queue, cl_bool blocking_copy, void* dst_ptr,
               const void* src_ptr, size_t size, cl_uint num_events_in_wait_list,
               const cl_event* event_wait_list, cl_event* event)) {
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  if (dst_ptr == NULL || src_ptr == NULL) {
    return CL_INVALID_VALUE;
  }
  if (size == 0) {
    return CL_INVALID_VALUE;
  }
  char* dst = reinterpret_cast<char*>(dst_ptr);
  const char* src = reinterpret_cast<const char*>(src_ptr);
  if ((dst > src - size) && (dst < src + size)) {
    return CL_MEM_COPY_OVERLAP;
  }

  amd::HostQueue* queue = as_amd(command_queue)->asHostQueue();
  if (NULL == queue) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  amd::HostQueue& hostQueue = *queue;

  amd::Command::EventWaitList eventWaitList;
  cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list,
                                       event_wait_list);
  if (err != CL_SUCCESS) {
    return err;
  }

  amd::Command* command =
      new amd::SvmCopyMemoryCommand(hostQueue, eventWaitList, dst_ptr, src_ptr, size);
  if (command == NULL) {
    return CL_OUT_OF_HOST_MEMORY;
  }

  command->enqueue();
  if (blocking_copy) {
    command->awaitCompletion();
  }

  *not_null(event) = as_cl(&command->event());
  if (event == NULL) {
    command->release();
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief enqueues a command to fill a region in memory with a pattern of a
 * given pattern size.
 *
 * \param command_queue refers to the host command-queue in which the fill
 * command will be queued.
 * The OpenCL context associated with \a command_queue and the SVM pointer
 * referred to by \a svm_ptr must be the same.
 *
 * \param svm_ptr is a pointer to a memory region that will be filled with
 * \a pattern. It must be aligned to \a pattern_size bytes. If \a svm_ptr is
 * allocated using clSVMAlloc then it must be allocated from the same context
 * from which \a command_queue was created. Otherwise the behavior is
 * undefined.
 *
 * \param pattern is a pointer to the data pattern of size \a pattern_size in
 * bytes. \a pattern will be used to fill a region in buffer starting at
 * \a svm_ptr and is \a size bytes in size. The data pattern must be a scalar
 * or vector integer or floating-point data type supported by OpenCL. For
 * example, if region pointed to by \a svm_ptr is to be filled with a pattern
 * of float4 values, then \a pattern will be a pointer to a cl_float4 value
 * and \a pattern_size will be sizeof(cl_float4). The maximum value of
 * \a pattern_size is the size of the largest integer or floating-point vector
 * data type supported by the OpenCL device. The memory associated with
 * \a pattern can be reused or freed after the function returns.
 *
 * \param size is the size in bytes of region being filled starting with
 * \a svm_ptr and must be a multiple of \a pattern_size.
 *
 * \param event_wait_list specifies the events that need to complete before
 * this particular command can be executed. If \a event_wait_list is NULL, then
 * this particular command does not wait on any event to complete. If
 * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. If
 * \a event_wait_list is not NULL, the list of events pointed to by
 * \a event_wait_list must be valid and \a num_events_in_wait_list must be
 * greater than 0. The events specified in \a event_wait_list act as
 * synchronization points. The context associated with events in
 * \a event_wait_list and \a command_queue must be the same. The memory
 * associated with \a event_wait_list can be reused or freed after the function
 * returns.
 *
 * \param num_events_in_wait_list specifies the number of elements in
 * \a event_wait_list
 *
 * \param event returns an event object that identifies this particular command
 * and can be used to query or queue a wait for this particular command to
 * complete. \a event can be NULL in which case it will not be possible for the
 * application to query the status of this command or queue a wait for this
 * command to complete. clEnqueueBarrierWithWaitList can be used instead. If
 * the \a event_wait_list and the \a event arguments are not NULL, the \a event
 * argument should not refer to an element of the \a event_wait_list array.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function was executed successfully
 * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid host
 *   command-queue
 * - CL_INVALID_CONTEXT if context associated with \a command_queue and
 *   events in \a event_wait_list are not the same
 * - CL_INVALID_VALUE if \a svm_ptr is NULL.
 * - CL_INVALID_VALUE if \a svm_ptr is not aligned to \a pattern_size bytes.
 * - CL_INVALID_VALUE if \a pattern is NULL or if \a pattern_size is 0 or if
 *   \a pattern_size is not one of {1, 2, 4, 8, 16, 32, 64, 128}.
 * - CL_INVALID_VALUE if \a size is 0 or is not a multiple of \a pattern_size.
 * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and
 *   \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and
 *   \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list
 *   are not valid events.
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
 *   by the OpenCL implementation on the device
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 *   by the OpenCL implementation on the host.
 *
 * \version 2.0r15
 */
RUNTIME_ENTRY(cl_int, clEnqueueSVMMemFill,
              (cl_command_queue command_queue, void* svm_ptr, const void* pattern,
               size_t pattern_size, size_t size, cl_uint num_events_in_wait_list,
               const cl_event* event_wait_list, cl_event* event)) {
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  if (svm_ptr == NULL) {
    return CL_INVALID_VALUE;
  }
  char* dst = reinterpret_cast<char*>(svm_ptr);
  if (!amd::isMultipleOf(dst, pattern_size)) {
    return CL_INVALID_VALUE;
  }
  if (pattern == NULL) {
    return CL_INVALID_VALUE;
  }
  if (!amd::isPowerOfTwo(pattern_size) || pattern_size == 0 ||
      pattern_size > amd::FillMemoryCommand::MaxFillPatterSize) {
    return CL_INVALID_VALUE;
  }
  if (size == 0 || !amd::isMultipleOf(size, pattern_size)) {
    return CL_INVALID_VALUE;
  }

  amd::HostQueue* queue = as_amd(command_queue)->asHostQueue();
  if (NULL == queue) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  amd::HostQueue& hostQueue = *queue;

  amd::Command::EventWaitList eventWaitList;
  cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list,
                                       event_wait_list);
  if (err != CL_SUCCESS) {
    return err;
  }

  amd::Command* command = new amd::SvmFillMemoryCommand(hostQueue, eventWaitList, svm_ptr, pattern,
                                                        pattern_size, size);
  if (command == NULL) {
    return CL_OUT_OF_HOST_MEMORY;
  }

  command->enqueue();

  *not_null(event) = as_cl(&command->event());
  if (event == NULL) {
    command->release();
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief enqueues a command that will allow the host to update a region of a
 * SVM buffer
 *
 * \param command_queue is a valid host command-queue.
 *
 * \param blocking_map indicates if the map operation is blocking or
 * non-blocking. If \a blocking_map is CL_TRUE, clEnqueueSVMMap does not return
 * until the application can access the contents of the SVM region specified by
 * \a svm_ptr and \a size on the host. If blocking_map is CL_FALSE i.e. map
 * operation is non-blocking, the region specified by \a svm_ptr and \a size
 * cannot be used until the map command has completed. The \a event argument
 * returns an event object which can be used to query the execution status of
 * the map command. When the map command is completed, the application can
 * access the contents of the region specified by \a svm_ptr and \a size.
 *
 * \param map_flags is a valid cl_map_flags flag.
 *
 * \param svm_ptr is a pointer to a memory region that will be updated by the
 * host. If \a svm_ptr is allocated using clSVMAlloc then it must be allocated
 * from the same context from which \a command_queue was created. Otherwise
 * the behavior is undefined.
 *
 * \param size is the size in bytes of the memory region that will be updated
 * by the host.
 *
 * \param event_wait_list specifies the events that need to complete before
 * this particular command can be executed. If \a event_wait_list is NULL, then
 * this particular command does not wait on any event to complete. If
 * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. If
 * \a event_wait_list is not NULL, the list of events pointed to by
 * \a event_wait_list must be valid and \a num_events_in_wait_list must be
 * greater than 0. The events specified in \a event_wait_list act as
 * synchronization points. The context associated with events in
 * \a event_wait_list and \a command_queue must be the same.
 * The memory
 * associated with \a event_wait_list can be reused or freed after the function
 * returns.
 *
 * \param num_events_in_wait_list specifies the number of elements in
 * \a event_wait_list
 *
 * \param event returns an event object that identifies this particular command
 * and can be used to query or queue a wait for this particular command to
 * complete. \a event can be NULL in which case it will not be possible for the
 * application to query the status of this command or queue a wait for this
 * command to complete. clEnqueueBarrierWithWaitList can be used instead. If
 * the \a event_wait_list and the \a event arguments are not NULL, the \a event
 * argument should not refer to an element of the \a event_wait_list array.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function was executed successfully
 * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid host
 *   command-queue
 * - CL_INVALID_CONTEXT if context associated with \a command_queue and
 *   events in \a event_wait_list are not the same
 * - CL_INVALID_VALUE if \a svm_ptr is NULL.
 * - CL_INVALID_VALUE if \a size is 0 or if values specified in \a map_flags
 *   are not valid.
 * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and
 *   \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and
 *   \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list
 *   are not valid events.
 * - CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the operation is
 *   blocking and the execution status of any of the events in
 *   \a event_wait_list is a negative integer value.
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
 *   by the OpenCL implementation on the device
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 *   by the OpenCL implementation on the host.
 *
 * \version 2.0r15
 */
RUNTIME_ENTRY(cl_int, clEnqueueSVMMap,
              (cl_command_queue command_queue, cl_bool blocking_map, cl_map_flags map_flags,
               void* svm_ptr, size_t size, cl_uint num_events_in_wait_list,
               const cl_event* event_wait_list, cl_event* event)) {
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  if (svm_ptr == NULL) {
    return CL_INVALID_VALUE;
  }
  if (size == 0) {
    return CL_INVALID_VALUE;
  }
  if (!validateMapFlags(map_flags)) {
    return CL_INVALID_VALUE;
  }

  amd::HostQueue* queue = as_amd(command_queue)->asHostQueue();
  if (NULL == queue) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  amd::HostQueue& hostQueue = *queue;

  size_t offset = 0;
  amd::Memory* svmMem = NULL;
  if ((queue->device()).isFineGrainedSystem()) {
    // leave blank on purpose for FGS no op
  } else {
    svmMem = amd::MemObjMap::FindMemObj(svm_ptr);
    if (NULL != svmMem) {
      // make sure the context is the same as the context of creation of svm space
      if (hostQueue.context() != svmMem->getContext()) {
        LogWarning("different contexts");
        return CL_INVALID_CONTEXT;
      }
      offset = static_cast<char*>(svm_ptr) - static_cast<char*>(svmMem->getSvmPtr());
      if (offset + size > svmMem->getSize()) {
        LogWarning("wrong svm address ");
        return CL_INVALID_VALUE;
      }
      amd::Buffer* srcBuffer = svmMem->asBuffer();
      amd::Coord3D srcSize(size);
      amd::Coord3D srcOffset(offset);
      if (NULL != srcBuffer) {
        if (!srcBuffer->validateRegion(srcOffset, srcSize)) {
          return CL_INVALID_VALUE;
        }
      }

      // Make sure we have memory for the command execution
      device::Memory* mem = svmMem->getDeviceMemory(queue->device());
      if (NULL == mem) {
        LogPrintfError("Can't allocate memory size - 0x%08X bytes!", svmMem->getSize());
        return CL_MEM_OBJECT_ALLOCATION_FAILURE;
      }
      // Attempt to allocate the map target now (whether blocking or non-blocking)
      void* mapPtr = mem->allocMapTarget(srcOffset, srcSize, map_flags);
      if (NULL == mapPtr || mapPtr != svm_ptr) {
        return CL_OUT_OF_RESOURCES;
      }
    }
  }

  amd::Command::EventWaitList eventWaitList;
  cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list,
                                       event_wait_list);
  if (err != CL_SUCCESS) {
    return err;
  }

  amd::Command* command = new amd::SvmMapMemoryCommand(hostQueue, eventWaitList, svmMem, size,
                                                       offset, map_flags, svm_ptr);
  if (command == NULL) {
    return CL_OUT_OF_HOST_MEMORY;
  }

  command->enqueue();
  if (blocking_map) {
    command->awaitCompletion();
  }

  *not_null(event) = as_cl(&command->event());
  if (event == NULL) {
    command->release();
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief enqueues a command to indicate that the host has completed updating
 * a memory region which was specified in a previous call to clEnqueueSVMMap.
 *
 * \param command_queue is a valid host command-queue.
 *
 * \param svm_ptr is a pointer that was specified in a previous call to
 * clEnqueueSVMMap. If \a svm_ptr is allocated using clSVMAlloc then it must be
 * allocated from the same context from which \a command_queue was created.
 * Otherwise the behavior is undefined.
 *
 * \param event_wait_list specifies the events that need to complete before
 * this particular command can be executed. If \a event_wait_list is NULL, then
 * this particular command does not wait on any event to complete. If
 * \a event_wait_list is NULL, \a num_events_in_wait_list must be 0. If
 * \a event_wait_list is not NULL, the list of events pointed to by
 * \a event_wait_list must be valid and \a num_events_in_wait_list must be
 * greater than 0. The events specified in \a event_wait_list act as
 * synchronization points. The context associated with events in
 * \a event_wait_list and \a command_queue must be the same. The memory
 * associated with \a event_wait_list can be reused or freed after the function
 * returns.
 *
 * \param num_events_in_wait_list specifies the number of elements in
 * \a event_wait_list
 *
 * \param event returns an event object that identifies this particular command
 * and can be used to query or queue a wait for this particular command to
 * complete. \a event can be NULL in which case it will not be possible for the
 * application to query the status of this command or queue a wait for this
 * command to complete. clEnqueueBarrierWithWaitList can be used instead. If
 * the \a event_wait_list and the \a event arguments are not NULL, the \a event
 * argument should not refer to an element of the \a event_wait_list array.
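 *
 * A minimal map/update/unmap sketch for a coarse-grained SVM allocation
 * (\a queue, \a svmPtr and \a size are assumed to be valid; error handling
 * is omitted):
 * \code
 * clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, svmPtr, size, 0, NULL, NULL);
 * memset(svmPtr, 0, size);  // host updates the region while it is mapped
 * clEnqueueSVMUnmap(queue, svmPtr, 0, NULL, NULL);
 * \endcode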
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function was executed successfully
 * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid host
 *   command-queue
 * - CL_INVALID_CONTEXT if context associated with \a command_queue and
 *   events in \a event_wait_list are not the same
 * - CL_INVALID_VALUE if \a svm_ptr is NULL.
 * - CL_INVALID_EVENT_WAIT_LIST if \a event_wait_list is NULL and
 *   \a num_events_in_wait_list > 0, or \a event_wait_list is not NULL and
 *   \a num_events_in_wait_list is 0, or if event objects in \a event_wait_list
 *   are not valid events.
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
 *   by the OpenCL implementation on the device
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 *   by the OpenCL implementation on the host.
 *
 * \version 2.0r15
 */
RUNTIME_ENTRY(cl_int, clEnqueueSVMUnmap,
              (cl_command_queue command_queue, void* svm_ptr, cl_uint num_events_in_wait_list,
               const cl_event* event_wait_list, cl_event* event)) {
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  if (svm_ptr == NULL) {
    return CL_INVALID_VALUE;
  }

  amd::HostQueue* queue = as_amd(command_queue)->asHostQueue();
  if (NULL == queue) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  amd::HostQueue& hostQueue = *queue;

  amd::Memory* svmMem = NULL;
  if (!(queue->device()).isFineGrainedSystem()) {
    // check if the ptr is in the svm space
    svmMem = amd::MemObjMap::FindMemObj(svm_ptr);
    if (NULL != svmMem) {
      // Make sure we have memory for the command execution
      device::Memory* mem = svmMem->getDeviceMemory(queue->device());
      if (NULL == mem) {
        LogPrintfError("Can't allocate memory size - 0x%08X bytes!", svmMem->getSize());
        return CL_INVALID_VALUE;
      }
    }
  }

  amd::Command::EventWaitList eventWaitList;
  cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list,
                                       event_wait_list);
  if (err != CL_SUCCESS) {
    return err;
  }

  amd::Command* command = new amd::SvmUnmapMemoryCommand(hostQueue, eventWaitList, svmMem, svm_ptr);
  if (command == NULL) {
    return CL_OUT_OF_HOST_MEMORY;
  }

  command->enqueue();

  *not_null(event) = as_cl(&command->event());
  if (event == NULL) {
    command->release();
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief Set the argument value for a specific argument of a kernel to be
 * a SVM pointer.
 *
 * \param kernel is a valid kernel object.
 *
 * \param arg_index is the argument index. Arguments to the kernel are referred
 * by indices that go from 0 for the leftmost argument to n - 1, where n is the
 * total number of arguments declared by a kernel.
 *
 * \param arg_value is the SVM pointer that should be used as the argument
 * value for argument specified by \a arg_index. The SVM pointer pointed to by
 * \a arg_value is copied and the \a arg_value pointer can therefore be reused
 * by the application after clSetKernelArgSVMPointer returns. The SVM pointer
 * specified is the value used by all API calls that enqueue kernel
 * (clEnqueueNDRangeKernel) until the argument value is changed by a call to
 * clSetKernelArgSVMPointer for \a kernel. The SVM pointer can only be used for
 * arguments that are declared to be a pointer to global or constant memory.
 * The SVM pointer value must be aligned according to the argument's type. For
 * example, if the argument is declared to be global float4 *p, the SVM pointer
 * value passed for p must be at a minimum aligned to a float4.
 * The SVM pointer value specified as the argument value can be the pointer
 * returned by clSVMAlloc or can be a pointer + offset into the SVM region.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function was executed successfully
 * - CL_INVALID_KERNEL if \a kernel is not a valid kernel object
 * - CL_INVALID_ARG_INDEX if \a arg_index is not a valid argument index
 * - CL_INVALID_ARG_VALUE if \a arg_value is not a valid value
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
 *   by the OpenCL implementation on the device
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 *   by the OpenCL implementation on the host.
 *
 * \version 2.0r15
 */
RUNTIME_ENTRY(cl_int, clSetKernelArgSVMPointer,
              (cl_kernel kernel, cl_uint arg_index, const void* arg_value)) {
  if (!is_valid(kernel)) {
    return CL_INVALID_KERNEL;
  }

  const amd::KernelSignature& signature = as_amd(kernel)->signature();
  if (arg_index >= signature.numParameters()) {
    return CL_INVALID_ARG_INDEX;
  }

  const amd::KernelParameterDescriptor& desc = signature.at(arg_index);
  if (desc.type_ != T_POINTER ||
      !(desc.addressQualifier_ &
        (CL_KERNEL_ARG_ADDRESS_GLOBAL | CL_KERNEL_ARG_ADDRESS_CONSTANT))) {
    as_amd(kernel)->parameters().reset(static_cast<size_t>(arg_index));
    return CL_INVALID_ARG_VALUE;
  }

  //! @todo We need to check that the alignment of \a arg_value. For instance,
  // if the argument is of type 'global float4*', then \a arg_value must be
  // aligned to sizeof(float4*). Note that desc.size_ contains the size of the
  // pointer type itself and the size of the pointed type.

  // We do not perform additional pointer validations:
  // -verifying pointers returned by SVMAlloc would imply keeping track
  //  of every allocation range and then matching the pointer against that
  //  range. Note that even if the pointer would look correct, nothing
  //  prevents the user from using an offset within the kernel that would
  //  result on an invalid access.
  // -verifying system pointers (if supported) requires matching the pointer
  //  against the address space of the current process.

  as_amd(kernel)->parameters().set(static_cast<size_t>(arg_index), sizeof(arg_value), &arg_value,
                                   true);
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief Pass additional information other than argument values to a kernel.
 *
 * \param kernel is a valid kernel object.
 *
 * \param param_name specifies the information to be passed to \a kernel. It
 * must be a cl_kernel_exec_info value.
 *
 * \param param_value_size specifies the size in bytes of the memory pointed to
 * by \a param_value.
 *
 * \param param_value is a pointer to memory where the appropriate values
 * determined by \a param_name are specified.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function was executed successfully
 * - CL_INVALID_KERNEL if \a kernel is not a valid kernel object.
 * - CL_INVALID_VALUE if \a param_name is not valid, if \a param_value is
 *   NULL or if the size specified by \a param_value_size is not valid
 * - CL_INVALID_OPERATION if \a param_name is
 *   CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM and \a param_value = CL_TRUE
 *   but no devices in context associated with \a kernel support fine-grained
 *   system SVM allocations
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
 *   by the OpenCL implementation on the device
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 *   by the OpenCL implementation on the host.
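 *
 * A minimal usage sketch registering indirectly accessed SVM pointers
 * (\a kernel is assumed valid and \a ptrA / \a ptrB to come from clSVMAlloc;
 * error handling is omitted):
 * \code
 * void* svmPtrs[2] = { ptrA, ptrB };
 * clSetKernelExecInfo(kernel, CL_KERNEL_EXEC_INFO_SVM_PTRS,
 *                     sizeof(svmPtrs), svmPtrs);
 * \endcode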
 *
 * \version 2.0r15
 */
RUNTIME_ENTRY(cl_int, clSetKernelExecInfo,
              (cl_kernel kernel, cl_kernel_exec_info param_name, size_t param_value_size,
               const void* param_value)) {
  if (!is_valid(kernel)) {
    return CL_INVALID_KERNEL;
  }
  if (param_value == NULL) {
    return CL_INVALID_VALUE;
  }
  const amd::Kernel* amdKernel = as_amd(kernel);
  switch (param_name) {
    case CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM:
      if (param_value_size != sizeof(cl_bool)) {
        return CL_INVALID_VALUE;
      } else {
        const bool flag = *(static_cast<const cl_bool*>(param_value)) != 0;
        const amd::Context* amdContext = &amdKernel->program().context();
        bool foundFineGrainedSystemDevice = false;
        const std::vector<amd::Device*>& devices = amdContext->devices();
        for (const auto it : devices) {
          if (it->info().svmCapabilities_ & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM) {
            foundFineGrainedSystemDevice = true;
            break;
          }
        }
        if (flag && !foundFineGrainedSystemDevice) {
          return CL_INVALID_OPERATION;
        }
        amdKernel->parameters().setSvmSystemPointersSupport(flag ? FGS_YES : FGS_NO);
      }
      break;
    case CL_KERNEL_EXEC_INFO_SVM_PTRS:
      if (param_value_size == 0 || !amd::isMultipleOf(param_value_size, sizeof(void*))) {
        return CL_INVALID_VALUE;
      } else {
        size_t count = param_value_size / sizeof(void*);
        void* const* execInfoArray = reinterpret_cast<void* const*>(param_value);
        for (size_t i = 0; i < count; i++) {
          if (NULL == execInfoArray[i]) {
            return CL_INVALID_VALUE;
          }
        }
        amdKernel->parameters().addSvmPtr(execInfoArray, count);
      }
      break;
    case CL_KERNEL_EXEC_INFO_NEW_VCOP_AMD:
      if (param_value_size != sizeof(cl_bool)) {
        return CL_INVALID_VALUE;
      } else {
        const bool newVcopFlag = (*(reinterpret_cast<const cl_bool*>(param_value))) ? true : false;
        amdKernel->parameters().setExecNewVcop(newVcopFlag);
      }
      break;
    case CL_KERNEL_EXEC_INFO_PFPA_VCOP_AMD:
      if (param_value_size != sizeof(cl_bool)) {
        return CL_INVALID_VALUE;
      } else {
        const bool pfpaVcopFlag = (*(reinterpret_cast<const cl_bool*>(param_value))) ? true : false;
        amdKernel->parameters().setExecPfpaVcop(pfpaVcopFlag);
      }
      break;
    default:
      return CL_INVALID_VALUE;
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! \brief Enqueues a command to indicate which device a set of ranges of SVM
 * allocations should be associated with. Once the event returned by
 * \a clEnqueueSVMMigrateMem has become CL_COMPLETE, the ranges specified by
 * svm pointers and sizes have been successfully migrated to the device
 * associated with command queue.
 * The user is responsible for managing the event dependencies associated with
 * this command in order to avoid overlapping access to SVM allocations.
 * Improperly specified event dependencies passed to clEnqueueSVMMigrateMem
 * could result in undefined results.
 *
 * \param command_queue is a valid host command queue. The specified set of
 * allocation ranges will be migrated to the OpenCL device associated with
 * command_queue.
 *
 * \param num_svm_pointers is the number of pointers in the specified
 * svm_pointers array, and the number of sizes in the sizes array, if sizes
 * is not NULL.
 *
 * \param svm_pointers is a pointer to an array of pointers. Each pointer in
 * this array must be within an allocation produced by a call to clSVMAlloc.
 *
 * \param sizes is an array of sizes. The pair svm_pointers[i] and sizes[i]
 * together define the starting address and number of bytes in a range to be
 * migrated. sizes may be NULL indicating that every allocation containing
 * any svm_pointer[i] is to be migrated. Also, if sizes[i] is zero, then the
 * entire allocation containing svm_pointer[i] is migrated.
 *
 * \param flags is a bit-field that is used to specify migration options.
 * Table 5.12 describes the possible values for flags.
 *
 * \param num_events_in_wait_list specifies the number of event objects in
 * \a event_wait_list.
 *
 * \param event_wait_list specifies events that need to complete before this
 * particular command can be executed. If event_wait_list is NULL, then this
 * particular command does not wait on any event to complete. If
 * event_wait_list is NULL, num_events_in_wait_list must be 0. If
 * event_wait_list is not NULL, the list of events pointed to by
 * event_wait_list must be valid and num_events_in_wait_list must be greater
 * than 0. The events specified in event_wait_list act as synchronization
 * points. The context associated with events in event_wait_list and
 * command_queue must be the same. The memory associated with
 * event_wait_list can be reused or freed after the function returns.
 *
 * \param event returns an event object that identifies this particular
 * command and can be used to query or queue a wait for this particular
 * command to complete. event can be NULL in which case it will not be
 * possible for the application to query the status of this command or queue
 * another command that waits for this command to complete. If the
 * event_wait_list and the event arguments are not NULL, the event argument
 * should not refer to an element of the event_wait_list array.
 *
 * \return One of the following values:
 * - CL_SUCCESS if the function is executed successfully
 * - CL_INVALID_COMMAND_QUEUE if \a command_queue is not a valid command-queue
 * - CL_INVALID_VALUE if num_svm_pointers is zero or svm_pointers is NULL
 * - CL_INVALID_VALUE if sizes[i] is non-zero and the range [svm_pointers[i],
 *   svm_pointers[i]+sizes[i]) is not contained within an existing clSVMAlloc
 *   allocation
 * - CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and
 *   num_events_in_wait_list > 0, or event_wait_list is not NULL and
 *   num_events_in_wait_list is 0, or if event objects in event_wait_list are
 *   not valid events
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
 *   by the OpenCL implementation on the device.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required
 *   by the OpenCL implementation on the host.
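 *
 * A minimal usage sketch migrating one whole allocation to the device of
 * \a queue (\a ptr is assumed to come from clSVMAlloc; error handling is
 * omitted):
 * \code
 * const void* ptrs[1] = { ptr };
 * // sizes == NULL: migrate the entire allocation containing each pointer.
 * clEnqueueSVMMigrateMem(queue, 1, ptrs, NULL, 0, 0, NULL, NULL);
 * \endcode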
 *
 * \version 2.1r00
 */
RUNTIME_ENTRY(cl_int, clEnqueueSVMMigrateMem,
              (cl_command_queue command_queue, cl_uint num_svm_pointers,
               const void** svm_pointers, const size_t* size, cl_mem_migration_flags flags,
               cl_uint num_events_in_wait_list, const cl_event* event_wait_list,
               cl_event* event)) {
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  amd::HostQueue* queue = as_amd(command_queue)->asHostQueue();
  if (NULL == queue) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  amd::HostQueue& hostQueue = *queue;

  if (num_svm_pointers == 0) {
    LogWarning("invalid parameter \"num_svm_pointers = 0\"");
    return CL_INVALID_VALUE;
  }
  if (svm_pointers == NULL) {
    LogWarning("invalid parameter \"svm_pointers = NULL\"");
    return CL_INVALID_VALUE;
  }
  for (cl_uint i = 0; i < num_svm_pointers; i++) {
    if (svm_pointers[i] == NULL) {
      LogWarning("Null pointers are not allowed");
      return CL_INVALID_VALUE;
    }
  }
  if (flags & ~(CL_MIGRATE_MEM_OBJECT_HOST | CL_MIGRATE_MEM_OBJECT_CONTENT_UNDEFINED)) {
    LogWarning("Invalid flag is specified");
    return CL_INVALID_VALUE;
  }

  std::vector<amd::Memory*> memObjects;
  for (cl_uint i = 0; i < num_svm_pointers; i++) {
    const void* svm_ptr = svm_pointers[i];
    amd::Memory* svmMem = amd::MemObjMap::FindMemObj(svm_ptr);
    if (NULL != svmMem) {
      // make sure the context is the same as the context of creation of svm space
      if (hostQueue.context() != svmMem->getContext()) {
        LogWarning("different contexts");
        return CL_INVALID_CONTEXT;
      }
      // Make sure the specified size[i] is within a valid range
      // TODO: handle the size parameter properly
      size_t svm_size = (size == NULL) ? 0 : size[i];
      size_t offset = reinterpret_cast<const char*>(svm_ptr) -
                      reinterpret_cast<const char*>(svmMem->getSvmPtr());
      if ((offset + svm_size) > svmMem->getSize()) {
        LogWarning("wrong svm address ");
        return CL_INVALID_VALUE;
      }
      memObjects.push_back(svmMem);
    }
  }

  amd::Command::EventWaitList eventWaitList;
  cl_int err = amd::clSetEventWaitList(eventWaitList, hostQueue, num_events_in_wait_list,
                                       event_wait_list);
  if (err != CL_SUCCESS) {
    return err;
  }

  amd::MigrateMemObjectsCommand* command = new amd::MigrateMemObjectsCommand(
      hostQueue, CL_COMMAND_MIGRATE_MEM_OBJECTS, eventWaitList, memObjects, flags);

  if (command == NULL) {
    return CL_OUT_OF_HOST_MEMORY;
  }

  // Make sure we have memory for the command execution
  if (!command->validateMemory()) {
    delete command;
    return CL_MEM_OBJECT_ALLOCATION_FAILURE;
  }

  command->enqueue();

  *not_null(event) = as_cl(&command->event());
  if (event == NULL) {
    command->release();
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! @}
 * @} */
clr-rocm-5.7.1/opencl/amdocl/cl_thread_trace_amd.cpp000066400000000000000000000514471450307266000224260ustar00rootroot00000000000000/* Copyright (c) 2009 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#include "cl_common.hpp"
#include "cl_thread_trace_amd.h"
#include "platform/context.hpp"
#include "platform/command.hpp"
#include "platform/threadtrace.hpp"

#include <cstring>
#include <string>
#include <vector>

/*! \addtogroup API
 * @{
 *
 * \addtogroup AMD_Extensions
 * @{
 *
 */

/*! \brief Creates a new HW threadTrace
 *
 * \param device must be a valid OpenCL device.
 *
 * \param threadTrace the created cl_threadtrace_amd object
 *
 * \param errcode_ret A non-zero value if OpenCL failed to create cl_threadtrace_amd
 * - CL_SUCCESS if the function is executed successfully.
 * - CL_INVALID_DEVICE if the specified device is invalid.
 * - CL_INVALID_OPERATION if we couldn't create the object
 *
 * \return Created cl_threadtrace_amd object
 */
RUNTIME_ENTRY_RET(cl_threadtrace_amd, clCreateThreadTraceAMD,
                  (cl_device_id device, cl_int* errcode_ret)) {
  // Make sure we have a valid device object
  if (!is_valid(device)) {
    *not_null(errcode_ret) = CL_INVALID_DEVICE;
    return NULL;
  }

  // Create the device thread trace object
  amd::ThreadTrace* threadTrace = new amd::ThreadTrace(*as_amd(device));
  if (threadTrace == NULL) {
    *not_null(errcode_ret) = CL_INVALID_OPERATION;
    return NULL;
  }

  *not_null(errcode_ret) = CL_SUCCESS;
  return as_cl(threadTrace);
}
RUNTIME_EXIT

///*! \brief Destroy a threadTrace object.
///*! \brief Destroy a threadTrace object.
// *
// * \param threadTrace the cl_threadtrace_amd object for release
// *
// * \return A non zero value if OpenCL failed to release cl_threadtrace_amd
// *  - CL_SUCCESS if the function is executed successfully.
// *  - CL_INVALID_OPERATION if we failed to release the object
// */
RUNTIME_ENTRY(cl_int, clReleaseThreadTraceAMD, (cl_threadtrace_amd threadTrace)) {
  if (!is_valid(threadTrace)) {
    return CL_INVALID_OPERATION;
  }
  as_amd(threadTrace)->release();
  return CL_SUCCESS;
}
RUNTIME_EXIT

//
// *! \brief Increments the cl_threadtrace_amd object reference count.
// *
// * \param threadTrace the cl_threadtrace_amd object for retain
// *
// * \return A non zero value if OpenCL failed to retain cl_threadtrace_amd
// *  - CL_SUCCESS if the function is executed successfully.
// *  - CL_INVALID_OPERATION if we failed to retain the object
// */
RUNTIME_ENTRY(cl_int, clRetainThreadTraceAMD, (cl_threadtrace_amd threadTrace)) {
  if (!is_valid(threadTrace)) {
    return CL_INVALID_OPERATION;
  }
  as_amd(threadTrace)->retain();
  return CL_SUCCESS;
}
RUNTIME_EXIT

//
// *! \brief Sets the cl_threadtrace_amd object configuration parameter.
// *
// * \param thread_trace the cl_threadtrace_amd object to set configuration parameter
// *
// * \param config_param the cl_thread_trace_param
// *
// * \param param_value corresponding to configParam
// *
// * \return A non zero value if OpenCL failed to set threadTrace buffer parameter
// *  - CL_INVALID_VALUE if thread_trace is an invalid thread trace object.
// *  - CL_INVALID_VALUE if invalid config_param or param_value enum values are used.
// *  - CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and num_events_in_wait_list > 0, or
//      event_wait_list is not NULL and num_events_in_wait_list is 0,
// *    or if event objects in event_wait_list are not valid events.
// *  - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
//      implementation on the device.
// *  - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the
//      OpenCL implementation on the host.
// */
RUNTIME_ENTRY(cl_int, clSetThreadTraceParamAMD,
              (cl_threadtrace_amd thread_trace, cl_thread_trace_param config_param,
               cl_uint param_value)) {
  if (!is_valid(thread_trace)) {
    return CL_INVALID_OPERATION;
  }
  switch (config_param) {
    case CL_THREAD_TRACE_PARAM_TOKEN_MASK:
      if (param_value > CL_THREAD_TRACE_TOKEN_MASK_ALL_SI) {
        return CL_INVALID_VALUE;
      }
      as_amd(thread_trace)->setTokenMask(param_value);
      break;
    case CL_THREAD_TRACE_PARAM_REG_MASK:
      if (param_value > CL_THREAD_TRACE_REG_MASK_ALL_SI) {
        return CL_INVALID_VALUE;
      }
      as_amd(thread_trace)->setRegMask(param_value);
      break;
    case CL_THREAD_TRACE_PARAM_VM_ID_MASK:
      if (param_value > CL_THREAD_TRACE_VM_ID_MASK_SINGLE_DETAIL) {
        return CL_INVALID_VALUE;
      }
      as_amd(thread_trace)->setVmIdMask(param_value);
      break;
    case CL_THREAD_TRACE_PARAM_INSTRUCTION_MASK:
      if (param_value > CL_THREAD_TRACE_INST_MASK_IMMEDIATE_CI) {
        return CL_INVALID_VALUE;
      }
      as_amd(thread_trace)->setInstMask(param_value);
      break;
    case CL_THREAD_TRACE_PARAM_COMPUTE_UNIT_TARGET:
      as_amd(thread_trace)->setCU(param_value);
      break;
    case CL_THREAD_TRACE_PARAM_SHADER_ARRAY_TARGET:
      as_amd(thread_trace)->setSH(param_value);
      break;
    case CL_THREAD_TRACE_PARAM_SIMD_MASK:
      as_amd(thread_trace)->setSIMD(param_value);
      break;
    case CL_THREAD_TRACE_PARAM_USER_DATA:
      as_amd(thread_trace)->setUserData(param_value);
      break;
    case CL_THREAD_TRACE_PARAM_CAPTURE_MODE:
      if (param_value > CL_THREAD_TRACE_CAPTURE_SELECT_DETAIL) {
        return CL_INVALID_VALUE;
      }
      as_amd(thread_trace)->setCaptureMode(param_value);
      break;
    case CL_THREAD_TRACE_PARAM_IS_WRAPPED:
      as_amd(thread_trace)->setIsWrapped(true);
      break;
    case CL_THREAD_TRACE_PARAM_RANDOM_SEED:
      as_amd(thread_trace)->setRandomSeed(param_value);
      break;
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT
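// Usage sketch (illustrative): configuring a trace before buffers are bound.
// The mask values are examples only, not required settings; any value above
// the corresponding *_ALL_* limit is rejected with CL_INVALID_VALUE by the
// checks above. `tt` is a previously created cl_threadtrace_amd object.
//
//   clSetThreadTraceParamAMD(tt, CL_THREAD_TRACE_PARAM_TOKEN_MASK,
//                            CL_THREAD_TRACE_TOKEN_MASK_ALL_SI);
//   clSetThreadTraceParamAMD(tt, CL_THREAD_TRACE_PARAM_REG_MASK,
//                            CL_THREAD_TRACE_REG_MASK_SHDEC_SI);
//   clSetThreadTraceParamAMD(tt, CL_THREAD_TRACE_PARAM_COMPUTE_UNIT_TARGET, 0);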
/*! \brief Get specific information about the OpenCL Thread Trace.
 *
 * \param thread_trace_info_param is an enum that identifies the Thread Trace information being
 * queried.
 *
 * \param param_value is a pointer to memory location where appropriate values
 * for a given \a thread_trace_info_param will be returned. If \a param_value is NULL,
 * it is ignored.
 *
 * \param param_value_size specifies the size in bytes of memory pointed to by
 * \a param_value. This size in bytes must be >= size of return type.
 *
 * \param param_value_size_ret returns the actual size in bytes of data being
 * queried by param_value. If \a param_value_size_ret is NULL, it is ignored.
 *
 * \return One of the following values:
 * - CL_INVALID_OPERATION if the cl_threadtrace_amd object is not valid
 * - CL_INVALID_VALUE if \a param_name is not one of the supported
 *   values or if size in bytes specified by \a param_value_size is < size of
 *   return type and \a param_value is not a NULL value.
 * - CL_SUCCESS if the function is executed successfully.
 */
RUNTIME_ENTRY(cl_int, clGetThreadTraceInfoAMD,
              (cl_threadtrace_amd thread_trace /* threadTrace */,
               cl_threadtrace_info thread_trace_info_param, size_t param_value_size,
               void* param_value, size_t* param_value_size_ret)) {
  if (!is_valid(thread_trace)) {
    return CL_INVALID_OPERATION;
  }
  // Find the thread trace object, associated with the specified device
  const device::ThreadTrace* devThreadTrace = as_amd(thread_trace)->getDeviceThreadTrace();
  const size_t seNum = as_amd(thread_trace)->deviceSeNumThreadTrace();
  switch (thread_trace_info_param) {
    case CL_THREAD_TRACE_SE: {
      return amd::clGetInfo(seNum, param_value_size, param_value, param_value_size_ret);
    }
    case CL_THREAD_TRACE_BUFFERS_SIZE: {
      // Make sure we found a valid thread trace object
      if (devThreadTrace == NULL) {
        return CL_INVALID_OPERATION;
      }
      std::unique_ptr<uint[]> bufSize2Se(new uint[seNum]);
      if (bufSize2Se.get() == NULL) {
        return CL_OUT_OF_HOST_MEMORY;
      }
      if (!devThreadTrace->info(thread_trace_info_param, bufSize2Se.get(), seNum)) {
        return CL_INVALID_VALUE;
      }
      const size_t valueSize = seNum * sizeof(unsigned int);
      if (param_value != NULL && param_value_size < valueSize) {
        return CL_INVALID_VALUE;
      }
      *not_null(param_value_size_ret) = valueSize;
      if (param_value != NULL) {
        ::memcpy(param_value, bufSize2Se.get(), valueSize);
        if (param_value_size > valueSize) {
          ::memset(static_cast<char*>(param_value) + valueSize, '\0',
                   param_value_size - valueSize);
        }
      }
      return CL_SUCCESS;
    }
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT
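// Usage sketch (illustrative): the shader engine count drives the size of the
// CL_THREAD_TRACE_BUFFERS_SIZE query, which returns one cl_uint per engine.
// Error handling is omitted; `tt` is a previously created thread trace object.
//
//   size_t se_num = 0;
//   clGetThreadTraceInfoAMD(tt, CL_THREAD_TRACE_SE, sizeof(se_num), &se_num, NULL);
//   std::vector<cl_uint> buf_sizes(se_num);
//   clGetThreadTraceInfoAMD(tt, CL_THREAD_TRACE_BUFFERS_SIZE,
//                           se_num * sizeof(cl_uint), buf_sizes.data(), NULL);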
/*! \brief Enqueues the command for the specified thread trace object.
 *
 * \param command_queue must be a valid OpenCL command queue.
 *
 * \param thread_trace specifies the cl_threadtrace_amd object.
 *
 * \param event_wait_list is a pointer to events that need to
 * complete before this particular command can be executed.
 * If \a event_wait_list is NULL, then this particular command does not wait
 * on any event to complete. If \a event_wait_list is NULL,
 * \a num_events_in_wait_list must be 0. If \a event_wait_list is not NULL,
 * the list of events pointed to by \a event_wait_list must be valid and
 * \a num_events_in_wait_list must be greater than 0. The events specified in
 * \a event_wait_list act as synchronization points.
 *
 * \param num_events_in_wait_list specifies the number of events in
 * \a event_wait_list. It must be 0 if \a event_wait_list is NULL. It must be
 * greater than 0 if \a event_wait_list is not NULL.
 *
 * \param event returns an event object that identifies this particular
 * command and can be used to query or queue a wait for this particular
 * command to complete. \a event can be NULL, in which case it will not be
 * possible for the application to query the status of this command or queue a
 * wait for this command to complete.
 *
 * \return A non zero value if OpenCL failed to enqueue the thread trace command
 * - CL_INVALID_COMMAND_QUEUE if command_queue is not a valid command-queue.
 * - CL_INVALID_CONTEXT if the context associated with command_queue and events in event_wait_list
 *   are not the same.
 * - CL_INVALID_VALUE if thread_trace is an invalid thread trace object.
 * - CL_INVALID_VALUE if an invalid command name enum value, not described in
 *   cl_threadtrace_command_name_amd, is used.
 * - CL_INVALID_OPERATION if the command enqueue failed. It can happen in the following cases:
 *     o BEGIN_COMMAND is queued for a thread trace object for which memory objects were not
 *       bound.
 *     o END_COMMAND is queued for a thread trace object for which BEGIN_COMMAND was not queued.
 *     o PAUSE_COMMAND is queued for a thread trace object for which BEGIN_COMMAND was not
 *       queued.
 *     o RESUME_COMMAND is queued for a thread trace object for which PAUSE_COMMAND was not
 *       queued.
 * - CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and num_events_in_wait_list > 0, or
 *   event_wait_list is not NULL and num_events_in_wait_list is 0, or if event objects in
 *   event_wait_list are not valid events.
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 *   implementation on the device.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL
 *   implementation on the host.
 */
RUNTIME_ENTRY(cl_int, clEnqueueThreadTraceCommandAMD,
              (cl_command_queue command_queue, cl_threadtrace_amd thread_trace,
               cl_threadtrace_command_name_amd command_name, cl_uint num_events_in_wait_list,
               const cl_event* event_wait_list, cl_event* event)) {
  // Check if command queue is valid
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  // Check if thread trace is valid
  if (!is_valid(thread_trace)) {
    return CL_INVALID_OPERATION;
  }
  amd::ThreadTrace* amdThreadTrace = as_amd(thread_trace);
  amd::HostQueue* hostQueue = as_amd(command_queue)->asHostQueue();
  if (NULL == hostQueue) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  // Check that device associated with the command queue is the same as with thread trace
  if (&hostQueue->device() != &amdThreadTrace->device()) {
    return CL_INVALID_DEVICE;
  }
  amd::Command::EventWaitList eventWaitList;
  cl_int err = amd::clSetEventWaitList(eventWaitList, *hostQueue, num_events_in_wait_list,
                                       event_wait_list);
  if (err != CL_SUCCESS) {
    return err;
  }
  // Create a new command for the threadTraces
  amd::ThreadTraceCommand* command = NULL;
  switch (command_name) {
    case CL_THREAD_TRACE_BEGIN_COMMAND:
      if ((amdThreadTrace->getState() != amd::ThreadTrace::MemoryBound) &&
          (amdThreadTrace->getState() != amd::ThreadTrace::End)) {
        return CL_INVALID_OPERATION;
      }
      amdThreadTrace->setState(amd::ThreadTrace::Begin);
      command = new amd::ThreadTraceCommand(
          *hostQueue, eventWaitList,
          static_cast<const void*>(&amdThreadTrace->threadTraceConfig()), *amdThreadTrace,
          amd::ThreadTraceCommand::Begin, CL_COMMAND_THREAD_TRACE);
      break;
    case CL_THREAD_TRACE_END_COMMAND:
      if ((amdThreadTrace->getState() != amd::ThreadTrace::Begin) &&
          (amdThreadTrace->getState() != amd::ThreadTrace::Pause)) {
        return CL_INVALID_OPERATION;
      }
      amdThreadTrace->setState(amd::ThreadTrace::End);
      command = new amd::ThreadTraceCommand(*hostQueue, eventWaitList,
                                            &amdThreadTrace->threadTraceConfig(), *amdThreadTrace,
                                            amd::ThreadTraceCommand::End, CL_COMMAND_THREAD_TRACE);
      break;
    case CL_THREAD_TRACE_PAUSE_COMMAND:
      if (amdThreadTrace->getState() != amd::ThreadTrace::Begin) {
        return CL_INVALID_OPERATION;
      }
      amdThreadTrace->setState(amd::ThreadTrace::Pause);
      command = new amd::ThreadTraceCommand(
          *hostQueue, eventWaitList, &amdThreadTrace->threadTraceConfig(), *amdThreadTrace,
          amd::ThreadTraceCommand::Pause, CL_COMMAND_THREAD_TRACE);
      break;
    case CL_THREAD_TRACE_RESUME_COMMAND:
      if (amdThreadTrace->getState() != amd::ThreadTrace::Pause) {
        return CL_INVALID_OPERATION;
      }
      amdThreadTrace->setState(amd::ThreadTrace::Begin);
      command = new amd::ThreadTraceCommand(
          *hostQueue, eventWaitList, &amdThreadTrace->threadTraceConfig(), *amdThreadTrace,
          amd::ThreadTraceCommand::Resume, CL_COMMAND_THREAD_TRACE);
      break;
  }
  if (command == NULL) {
    return CL_OUT_OF_HOST_MEMORY;
  }
  // Submit the command to the device
  command->enqueue();
  *not_null(event) = as_cl(&command->event());
  if (event == NULL) {
    command->release();
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT

//
///*! \brief Enqueues the binding command to bind cl_threadtrace_amd to cl_mem objects for trace
///recording.
// *
// * \param command_queue must be a valid OpenCL command queue.
// *
// * \param thread_trace specifies the cl_threadtrace_amd object.
// *
// * \param mem_objects the cl_mem objects for trace recording
// *
// * \param mem_objects_num the number of cl_mem objects in the mem_objects
// *
// * \param buffer_size the size of each cl_mem object from mem_objects
// *
// * \param event_wait_list is a pointer to events that need to
// * complete before this particular command can be executed.
// * If \a event_wait_list is NULL, then this particular command does not wait
// * on any event to complete. If \a event_wait_list is NULL,
// * \a num_events_in_wait_list must be 0. If \a event_wait_list is not NULL,
// * the list of events pointed to by \a event_wait_list must be valid and
// * \a num_events_in_wait_list must be greater than 0. The events specified in
// * \a event_wait_list act as synchronization points.
// *
// * \param num_events_in_wait_list specifies the number of events in
// * \a event_wait_list. It must be 0 if \a event_wait_list is NULL. It must be
// * greater than 0 if \a event_wait_list is not NULL.
// *
// * \param event returns an event object that identifies this particular
// * command and can be used to query or queue a wait for this particular
// * command to complete. \a event can be NULL, in which case it will not be
// * possible for the application to query the status of this command or queue a
// * wait for this command to complete.
// *
// * \return A non zero value if OpenCL failed to set threadTrace buffer parameter
// *  - CL_INVALID_COMMAND_QUEUE if command_queue is not a valid command-queue.
// *  - CL_INVALID_CONTEXT if the context associated with command_queue and events in
//      event_wait_list are not the same.
// *  - CL_INVALID_VALUE if thread_trace is an invalid thread trace object.
// *  - CL_INVALID_VALUE if the buffer_size is negative or zero.
// *  - CL_INVALID_VALUE if the sub_buffers_num is less than 1.
// *  - CL_INVALID_OPERATION if the mem_objects_num is not equal to the number of Shader Engines of
//      the [GPU] device.
// *  - CL_INVALID_MEM_OBJECT if one of the memory objects in the mem_objects array is not a valid
//      memory object or mem_objects is NULL.
// *  - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory for the data store
//      associated with the memory objects of the mem_objects array.
// *  - CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and num_events_in_wait_list > 0, or
//      event_wait_list is not NULL and num_events_in_wait_list is 0, or if event objects in
//      event_wait_list are not valid events.
// *  - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
//      implementation on the device.
// *  - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the
//      OpenCL implementation on the host.
// */
RUNTIME_ENTRY(cl_int, clEnqueueBindThreadTraceBufferAMD,
              (cl_command_queue command_queue, cl_threadtrace_amd thread_trace, cl_mem* mem_objects,
               cl_uint mem_objects_num, cl_uint buffer_size, cl_uint num_events_in_wait_list,
               const cl_event* event_wait_list, cl_event* event)) {
  // Check if command queue is valid
  if (!is_valid(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  // Check if thread trace is valid
  if (!is_valid(thread_trace)) {
    return CL_INVALID_OPERATION;
  }
  // Check if input values are valid
  if ((mem_objects == NULL) || (buffer_size <= 0)) {
    return CL_INVALID_VALUE;
  }
  amd::ThreadTrace* amdThreadTrace = as_amd(thread_trace);
  // Check if the number of bound memory objects is the same as the number of SEs
  if (amdThreadTrace->deviceSeNumThreadTrace() != mem_objects_num) {
    return CL_INVALID_OPERATION;
  }
  // Check that the memory objects bound to the thread trace are valid
  for (size_t i = 0; i < mem_objects_num; ++i) {
    cl_mem obj = mem_objects[i];
    if (!is_valid(obj)) {
      return CL_INVALID_MEM_OBJECT;
    }
  }
  amd::HostQueue* hostQueue = as_amd(command_queue)->asHostQueue();
  if (NULL == hostQueue) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  // Check that device associated with the command queue is the same as with thread trace
  if (&hostQueue->device() != &amdThreadTrace->device()) {
    return CL_INVALID_DEVICE;
  }
  amd::Command::EventWaitList eventWaitList;
  cl_int err = amd::clSetEventWaitList(eventWaitList, *hostQueue, num_events_in_wait_list,
                                       event_wait_list);
  if (err != CL_SUCCESS) {
    return err;
  }
  amdThreadTrace->setState(amd::ThreadTrace::MemoryBound);
  // Create a new ThreadTraceMemObjectsCommand command
  amd::ThreadTraceMemObjectsCommand* command = new amd::ThreadTraceMemObjectsCommand(
      *hostQueue, eventWaitList, mem_objects_num, mem_objects, buffer_size, *amdThreadTrace,
      CL_COMMAND_THREAD_TRACE_MEM);
  if (command == NULL) {
    return CL_OUT_OF_HOST_MEMORY;
  }
  // Make sure we have memory for the command execution
  if (!command->validateMemory()) {
    delete command;
    return CL_OUT_OF_RESOURCES;
  }
  // Submit the command to the device
  command->enqueue();
  *not_null(event) = as_cl(&command->event());
  if (event == NULL) {
    command->release();
  }
  return CL_SUCCESS;
}
RUNTIME_EXIT

/*! @}
 *  @}
 */
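// End-to-end sketch (illustrative): one buffer per shader engine is created
// and bound, then the trace brackets the kernels of interest. Error handling
// is omitted; `buf_size` is a hypothetical per-engine buffer size and `se_num`
// comes from a CL_THREAD_TRACE_SE query. The BEGIN/END ordering follows the
// state checks in the command switch above.
//
//   std::vector<cl_mem> bufs(se_num);
//   for (size_t i = 0; i < se_num; ++i) {
//     bufs[i] = clCreateBuffer(context, CL_MEM_READ_WRITE, buf_size, NULL, &err);
//   }
//   clEnqueueBindThreadTraceBufferAMD(queue, tt, bufs.data(), (cl_uint)se_num,
//                                     buf_size, 0, NULL, NULL);
//   clEnqueueThreadTraceCommandAMD(queue, tt, CL_THREAD_TRACE_BEGIN_COMMAND, 0, NULL, NULL);
//   // ... enqueue the kernels to be traced ...
//   clEnqueueThreadTraceCommandAMD(queue, tt, CL_THREAD_TRACE_END_COMMAND, 0, NULL, NULL);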
clr-rocm-5.7.1/opencl/amdocl/cl_thread_trace_amd.h000066400000000000000000000415061450307266000220670ustar00rootroot00000000000000/* Copyright (c) 2012 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE. */

#ifndef __CL_THREAD_TRACE_AMD_H
#define __CL_THREAD_TRACE_AMD_H

#include "CL/cl_platform.h"

#ifdef __cplusplus
extern "C" {
#endif /*__cplusplus*/

typedef struct _cl_threadtrace_amd* cl_threadtrace_amd;
typedef cl_uint cl_thread_trace_param;
typedef cl_uint cl_threadtrace_info;

/* cl_command_type */
#define CL_COMMAND_THREAD_TRACE_MEM 0x4500
#define CL_COMMAND_THREAD_TRACE 0x4501

/* cl_threadtrace_command_name_amd enumeration */
typedef enum _cl_threadtrace_command_name_amd {
  CL_THREAD_TRACE_BEGIN_COMMAND,
  CL_THREAD_TRACE_END_COMMAND,
  CL_THREAD_TRACE_PAUSE_COMMAND,
  CL_THREAD_TRACE_RESUME_COMMAND
} cl_threadtrace_command_name_amd;

// Thread trace parameters
enum ThreadTraceParameter {
  CL_THREAD_TRACE_PARAM_TOKEN_MASK,
  CL_THREAD_TRACE_PARAM_REG_MASK,
  CL_THREAD_TRACE_PARAM_COMPUTE_UNIT_TARGET,
  CL_THREAD_TRACE_PARAM_SHADER_ARRAY_TARGET,
  CL_THREAD_TRACE_PARAM_SIMD_MASK,
  CL_THREAD_TRACE_PARAM_VM_ID_MASK,
  CL_THREAD_TRACE_PARAM_RANDOM_SEED,
  CL_THREAD_TRACE_PARAM_CAPTURE_MODE,
  CL_THREAD_TRACE_PARAM_INSTRUCTION_MASK,
  CL_THREAD_TRACE_PARAM_USER_DATA,
  CL_THREAD_TRACE_PARAM_IS_WRAPPED
};

// CL_THREAD_TRACE_PARAM_TOKEN_MASK data selects for SI
enum CL_THREAD_TRACE_TOKEN_MASK {
  // Time passed
  CL_THREAD_TRACE_TOKEN_MASK_TIME_SI = 0x00000001,
  // Resync the timestamp
  CL_THREAD_TRACE_TOKEN_MASK_TIMESTAMP_SI = 0x00000002,
  // A register write has occurred
  CL_THREAD_TRACE_TOKEN_MASK_REG_SI = 0x00000004,
  // A wavefront has started
  CL_THREAD_TRACE_TOKEN_MASK_WAVE_START_SI = 0x00000008,
  // Output space has been allocated for color/Z [Should be used for cl-gl]
  CL_THREAD_TRACE_TOKEN_MASK_WAVE_PS_ALLOC_SI = 0x00000010,
  // Output space has been allocated for vertex position [Should be used for cl-gl]
  CL_THREAD_TRACE_TOKEN_MASK_WAVE_VS_ALLOC_SI = 0x00000020,
  // Wavefront completion
  CL_THREAD_TRACE_TOKEN_MASK_WAVE_END_SI = 0x00000040,
  // An event has reached the top of a shader stage. In-order with WAVE_START
  CL_THREAD_TRACE_TOKEN_MASK_EVENT_SI = 0x00000080,
  // An event has reached the top of a compute shader stage. In-order with WAVE_START
  CL_THREAD_TRACE_TOKEN_MASK_EVENT_CS_SI = 0x00000100,
  // An event has reached the top of a shader stage for the second GFX pipe. In-order with
  // WAVE_START.
  // [Should be used for cl-gl]
  CL_THREAD_TRACE_TOKEN_MASK_EVENT_GFX_SI = 0x00000200,
  // The kernel has executed an instruction
  CL_THREAD_TRACE_TOKEN_MASK_INST_SI = 0x00000400,
  // The kernel has explicitly written the PC value
  CL_THREAD_TRACE_TOKEN_MASK_INST_PC_SI = 0x00000800,
  // The kernel has written user data into the thread trace buffer
  CL_THREAD_TRACE_TOKEN_MASK_INST_USERDATA_SI = 0x00001000,
  // Provides information about instruction scheduling
  CL_THREAD_TRACE_TOKEN_MASK_ISSUE_SI = 0x00002000,
  // The performance counter delta has been updated
  CL_THREAD_TRACE_TOKEN_MASK_PERF_SI = 0x00004000,
  // A miscellaneous event has been sent
  CL_THREAD_TRACE_TOKEN_MASK_MISC_SI = 0x00008000,
  // All possible tokens
  CL_THREAD_TRACE_TOKEN_MASK_ALL_SI = 0x0000ffff,
};

// CL_THREAD_TRACE_PARAM_REG_MASK data selects
enum CL_THREAD_TRACE_REG_MASK {
  // Event initiator
  CL_THREAD_TRACE_REG_MASK_EVENT_SI = 0x00000001,
  // Draw initiator [Should be used for cl-gl]
  CL_THREAD_TRACE_REG_MASK_DRAW_SI = 0x00000002,
  // Dispatch initiator
  CL_THREAD_TRACE_REG_MASK_DISPATCH_SI = 0x00000004,
  // User data from host
  CL_THREAD_TRACE_REG_MASK_USERDATA_SI = 0x00000008,
  // GFXDEC register (8-state) [Should be used for cl-gl]
  CL_THREAD_TRACE_REG_MASK_GFXDEC_SI = 0x00000020,
  // SHDEC register (many state)
  CL_THREAD_TRACE_REG_MASK_SHDEC_SI = 0x00000040,
  // Other registers
  CL_THREAD_TRACE_REG_MASK_OTHER_SI = 0x00000080,
  // All possible register types
  CL_THREAD_TRACE_REG_MASK_ALL_SI = 0x000000ff,
};

// CL_THREAD_TRACE_PARAM_VM_ID_MASK data selects
enum CL_THREAD_TRACE_VM_ID_MASK {
  // Capture only data from the VM_ID used to write {SQTT}_BASE
  CL_THREAD_TRACE_VM_ID_MASK_SINGLE = 0,
  // Capture all data from all VM_IDs
  CL_THREAD_TRACE_VM_ID_MASK_ALL = 1,
  // Capture all data but only get target (a.k.a. detail) data from VM_ID used to write {SQTT}_BASE
  CL_THREAD_TRACE_VM_ID_MASK_SINGLE_DETAIL = 2
};

// CL_THREAD_TRACE_PARAM_CAPTURE_MODE data
enum CL_THREAD_TRACE_CAPTURE_MODE {
  // Capture all data in the thread trace buffer
  CL_THREAD_TRACE_CAPTURE_ALL = 0,
  // Capture only data between THREAD_TRACE_START and THREAD_TRACE_STOP events
  CL_THREAD_TRACE_CAPTURE_SELECT = 1,
  // Capture data between THREAD_TRACE_START and THREAD_TRACE_STOP events,
  // and global/reference data at all times
  CL_THREAD_TRACE_CAPTURE_SELECT_DETAIL = 2
};

// CL_THREAD_TRACE_PARAM_INSTRUCTION_MASK data selects
enum CL_THREAD_TRACE_INSTRUCTION_MASK {
  // Generate {SQTT}_TOKEN_INST tokens for all instructions
  CL_THREAD_TRACE_INST_MASK_ALL,
  // Generate {SQTT}_TOKEN_INST tokens for stalled instructions only
  CL_THREAD_TRACE_INST_MASK_STALLED,
  // Generate {SQTT}_TOKEN_INST messages for stalled and other (no op/wait/set prio/etc)
  // instructions
  CL_THREAD_TRACE_INST_MASK_STALLED_AND_IMMEDIATE,
  // Generate {SQTT}_TOKEN_INST messages for immediate instructions only [Should be used only
  // for CI]
  CL_THREAD_TRACE_INST_MASK_IMMEDIATE_CI,
};

enum ThreadTraceInfo {
  CL_THREAD_TRACE_SE,
  CL_THREAD_TRACE_BUFFERS_FILLED,
  CL_THREAD_TRACE_BUFFERS_SIZE
};

/*! \brief Creates a new cl_threadtrace_amd object
 *
 * \param device must be a valid OpenCL device.
 *
 * \param errcode_ret A non zero value if OpenCL failed to create threadTrace
 *        -CL_INVALID_DEVICE if devices contains an invalid device.
 *        -CL_DEVICE_NOT_AVAILABLE if a device is currently not available even
 *         though the device was returned by clGetDeviceIDs.
 *        -CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the
 *         OpenCL implementation on the device.
 *        -CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the
 *         OpenCL implementation on the host.
 *
 * \return the created threadTrace object
 */
extern CL_API_ENTRY cl_threadtrace_amd CL_API_CALL
clCreateThreadTraceAMD(cl_device_id /* device */, cl_int* /* errcode_ret */
                       ) CL_API_SUFFIX__VERSION_1_0;

/*! \brief Destroys a cl_threadtrace_amd object.
 *
 * \param threadTrace the cl_threadtrace_amd object for release
 *
 * \return A non zero value if OpenCL failed to release threadTrace
 *        -CL_INVALID_VALUE if thread_trace is not a valid OpenCL thread trace object
 *         (cl_threadtrace_amd).
 *        -CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the
 *         OpenCL implementation on the device.
 *        -CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the
 *         OpenCL implementation on the host.
 */
extern CL_API_ENTRY cl_int CL_API_CALL
clReleaseThreadTraceAMD(cl_threadtrace_amd /* threadTrace */
                        ) CL_API_SUFFIX__VERSION_1_0;

/*! \brief Increments the cl_threadtrace_amd object reference count.
 *
 * \param threadTrace the cl_threadtrace_amd object for retain
 *
 * \return A non zero value if OpenCL failed to retain threadTrace
 *        -CL_INVALID_VALUE if thread_trace is not a valid thread trace object
 *         (cl_threadtrace_amd).
 *        -CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the
 *         OpenCL implementation on the device.
 *        -CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the
 *         OpenCL implementation on the host.
 */
extern CL_API_ENTRY cl_int CL_API_CALL
clRetainThreadTraceAMD(cl_threadtrace_amd /* threadTrace */
                       ) CL_API_SUFFIX__VERSION_1_0;

/*! \brief Sets the cl_threadtrace_amd object configuration parameter.
 *
 * \param thread_trace the cl_threadtrace_amd object to set configuration parameter
 *
 * \param config_param the cl_thread_trace_param
 *
 * \param param_value corresponding to configParam
 *
 * \return A non zero value if OpenCL failed to set threadTrace buffer parameter
 * - CL_INVALID_VALUE if thread_trace is an invalid thread trace object.
 * - CL_INVALID_VALUE if invalid config_param or param_value enum values are used.
 * - CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and num_events_in_wait_list > 0, or
 *   event_wait_list is not NULL and num_events_in_wait_list is 0,
 *   or if event objects in event_wait_list are not valid events.
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 *   implementation on the device.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL
 *   implementation on the host.
 */
extern CL_API_ENTRY cl_int CL_API_CALL
clSetThreadTraceParamAMD(cl_threadtrace_amd /*thread_trace*/,
                         cl_thread_trace_param /*config_param*/, cl_uint /*param_value*/
                         ) CL_API_SUFFIX__VERSION_1_0;

/* \brief Enqueues the binding command to bind cl_threadtrace_amd to cl_mem objects for trace
 * recording.
 *
 * \param command_queue must be a valid OpenCL command queue.
 *
 * \param thread_trace specifies the cl_threadtrace_amd object.
 *
 * \param mem_objects the cl_mem objects for trace recording
 *
 * \param mem_objects_num the number of cl_mem objects in the mem_objects
 *
 * \param buffer_size the size of each cl_mem object from mem_objects
 *
 * \param event_wait_list is a pointer to events that need to
 * complete before this particular command can be executed.
 * If \a event_wait_list is NULL, then this particular command does not wait
 * on any event to complete.
 * If \a event_wait_list is NULL,
 * \a num_events_in_wait_list must be 0. If \a event_wait_list is not NULL,
 * the list of events pointed to by \a event_wait_list must be valid and
 * \a num_events_in_wait_list must be greater than 0. The events specified in
 * \a event_wait_list act as synchronization points.
 *
 * \param num_events_in_wait_list specifies the number of events in
 * \a event_wait_list. It must be 0 if \a event_wait_list is NULL. It must be
 * greater than 0 if \a event_wait_list is not NULL.
 *
 * \param event returns an event object that identifies this particular
 * command and can be used to query or queue a wait for this particular
 * command to complete. \a event can be NULL, in which case it will not be
 * possible for the application to query the status of this command or queue a
 * wait for this command to complete.
 *
 * \return A non zero value if OpenCL failed to set threadTrace buffer parameter
 * - CL_INVALID_COMMAND_QUEUE if command_queue is not a valid command-queue.
 * - CL_INVALID_CONTEXT if the context associated with command_queue and events in event_wait_list
 *   are not the same.
 * - CL_INVALID_VALUE if thread_trace is an invalid thread trace object.
 * - CL_INVALID_VALUE if the buffer_size is negative or zero.
 * - CL_INVALID_VALUE if the sub_buffers_num is less than 1.
 * - CL_INVALID_OPERATION if the mem_objects_num is not equal to the number of Shader Engines of
 *   the [GPU] device.
 * - CL_INVALID_MEM_OBJECT if one of the memory objects in the mem_objects array is not a valid
 *   memory object or mem_objects is NULL.
 * - CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory for the data store
 *   associated with the memory objects of the mem_objects array.
 * - CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and num_events_in_wait_list > 0, or
 *   event_wait_list is not NULL and num_events_in_wait_list is 0, or if event objects in
 *   event_wait_list are not valid events.
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 *   implementation on the device.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the
 *   OpenCL implementation on the host.
 */
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueBindThreadTraceBufferAMD(cl_command_queue command_queue,
                                  cl_threadtrace_amd /*thread_trace*/, cl_mem* /*mem_objects*/,
                                  cl_uint /*mem_objects_num*/, cl_uint /*buffer_size*/,
                                  cl_uint /*num_events_in_wait_list*/,
                                  const cl_event* /*event_wait_list*/, cl_event* /*event*/
                                  ) CL_API_SUFFIX__VERSION_1_0;

/*! \brief Get specific information about the OpenCL Thread Trace.
 *
 * \param thread_trace_info_param is an enum that identifies the Thread Trace information being
 * queried.
 *
 * \param param_value is a pointer to memory location where appropriate values
 * for a given \a thread_trace_info_param will be returned. If \a param_value is NULL,
 * it is ignored.
 *
 * \param param_value_size specifies the size in bytes of memory pointed to by
 * \a param_value. This size in bytes must be >= size of return type.
 *
 * \param param_value_size_ret returns the actual size in bytes of data being
 * queried by param_value. If \a param_value_size_ret is NULL, it is ignored.
 *
 * \return One of the following values:
 * - CL_INVALID_OPERATION if the cl_threadtrace_amd object is not valid
 * - CL_INVALID_VALUE if \a param_name is not one of the supported
 *   values or if size in bytes specified by \a param_value_size is < size of
 *   return type and \a param_value is not a NULL value.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the
 *   OpenCL implementation on the host.
 * - CL_SUCCESS if the function is executed successfully.
 */
extern CL_API_ENTRY cl_int CL_API_CALL
clGetThreadTraceInfoAMD(cl_threadtrace_amd /* thread_trace */,
                        cl_threadtrace_info /*thread_trace_info_param*/,
                        size_t /*param_value_size*/, void* /*param_value*/,
                        size_t* /*param_value_size_ret*/
                        ) CL_API_SUFFIX__VERSION_1_0;

/*! \brief Enqueues the thread trace command for the specified thread trace object.
 *
 * \param command_queue must be a valid OpenCL command queue.
 *
 * \param thread_trace specifies the cl_threadtrace_amd object.
 *
 * \return A non zero value if OpenCL failed to enqueue the thread trace command
 * - CL_INVALID_COMMAND_QUEUE if command_queue is not a valid command-queue.
 * - CL_INVALID_CONTEXT if the context associated with command_queue and events in event_wait_list
 *   are not the same.
 * - CL_INVALID_VALUE if thread_trace is an invalid thread trace object.
 * - CL_INVALID_VALUE if an invalid command name enum value, not described in
 *   cl_threadtrace_command_name_amd, is used.
 * - CL_INVALID_OPERATION if the command enqueue failed. It can happen in the following cases:
 *     o BEGIN_COMMAND is queued for a thread trace object for which memory objects were not
 *       bound.
 *     o END_COMMAND is queued for a thread trace object for which BEGIN_COMMAND was not queued.
 *     o PAUSE_COMMAND is queued for a thread trace object for which BEGIN_COMMAND was not
 *       queued.
 *     o RESUME_COMMAND is queued for a thread trace object for which PAUSE_COMMAND was not
 *       queued.
 * - CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and num_events_in_wait_list > 0, or
 *   event_wait_list is not NULL and num_events_in_wait_list is 0, or if event objects in
 *   event_wait_list are not valid events.
 * - CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 *   implementation on the device.
 * - CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL
 *   implementation on the host.
 */
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueThreadTraceCommandAMD(cl_command_queue /*command_queue*/,
                               cl_threadtrace_amd /*thread_trace*/,
                               cl_threadtrace_command_name_amd /*command_name*/,
                               cl_uint /*num_events_in_wait_list*/,
                               const cl_event* /*event_wait_list*/, cl_event* /*event*/
                               ) CL_API_SUFFIX__VERSION_1_0;

#ifdef __cplusplus
} /*extern "C"*/
#endif /*__cplusplus*/

#endif /*__CL_THREAD_TRACE_AMD_H*/
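/* Usage sketch (illustrative): applications usually resolve these AMD
 * extension entry points at run time rather than linking them directly. The
 * typedef below is a hypothetical helper matching the clCreateThreadTraceAMD
 * declaration above; clGetExtensionFunctionAddressForPlatform is the standard
 * OpenCL 1.2+ lookup mechanism.
 *
 *   typedef cl_threadtrace_amd (CL_API_CALL* PFN_clCreateThreadTraceAMD)(cl_device_id, cl_int*);
 *   PFN_clCreateThreadTraceAMD pfnCreateThreadTrace =
 *       (PFN_clCreateThreadTraceAMD)clGetExtensionFunctionAddressForPlatform(
 *           platform, "clCreateThreadTraceAMD");
 */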
clr-rocm-5.7.1/opencl/amdocl/cmake/000077500000000000000000000000001450307266000170445ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/amdocl/cmake/FindROCclr.cmake000066400000000000000000000035361450307266000220020ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

if(ROCCLR_FOUND)
  return()
endif()

find_path(ROCCLR_INCLUDE_DIR top.hpp
  HINTS
    ${ROCCLR_PATH}
  PATHS
    # gerrit repo name
    ${CMAKE_SOURCE_DIR}/vdi
    ${CMAKE_SOURCE_DIR}/../vdi
    ${CMAKE_SOURCE_DIR}/../../vdi
    # github repo name
    ${CMAKE_SOURCE_DIR}/ROCclr
    ${CMAKE_SOURCE_DIR}/../ROCclr
    ${CMAKE_SOURCE_DIR}/../../ROCclr
    # jenkins repo name
    ${CMAKE_SOURCE_DIR}/rocclr
    ${CMAKE_SOURCE_DIR}/../rocclr
    ${CMAKE_SOURCE_DIR}/../../rocclr
  PATH_SUFFIXES
    include)

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(ROCclr "\nROCclr not found" ROCCLR_INCLUDE_DIR)
mark_as_advanced(ROCCLR_INCLUDE_DIR)

list(APPEND CMAKE_MODULE_PATH "${ROCCLR_INCLUDE_DIR}/../cmake")
include(ROCclr)
clr-rocm-5.7.1/opencl/amdocl/gl_functions.hpp000066400000000000000000000067731450307266000212020ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
*/

GLPREFIX(GLubyte*, glGetString, (GLenum name))

GLPREFIX(void, glBindBuffer, (GLenum target, GLuint buffer))
//GLPREFIX(void, glBindFramebufferEXT, (GLenum target, GLuint framebuffer))
GLPREFIX(void, glBindRenderbuffer, (GLenum target, GLuint renderbuffer))
GLPREFIX(void, glBindTexture, (GLenum target, GLuint texture))
GLPREFIX(void, glBufferData, (GLenum target, GLsizeiptr size, const GLvoid* data, GLenum usage))
GLPREFIX(GLenum, glCheckFramebufferStatusEXT, (GLenum target))
GLPREFIX(void, glDeleteBuffers, (GLsizei n, const GLuint* buffers))
GLPREFIX(void, glDrawPixels, (GLsizei width, GLsizei height, GLenum format, GLenum type, const GLvoid *pixels))
//GLPREFIX(void, glFramebufferRenderbufferEXT, (GLenum target, GLenum attachment, GLenum renderbuffertarget, GLuint renderbuffer))
GLPREFIX(void, glGenBuffers, (GLsizei n, GLuint* buffers))
//GLPREFIX(void, glGenFramebuffersEXT, (GLsizei n, GLuint* framebuffers))
//10
GLPREFIX(void, glGetBufferParameteriv, (GLenum target, GLenum pname, GLint* params))
GLPREFIX(GLenum, glGetError, (void))
GLPREFIX(void, glFinish, (void))
GLPREFIX(void, glFlush, (void))
GLPREFIX(GLenum, glClientWaitSync, (GLsync sync, GLbitfield flags, GLuint64 timeout))
GLPREFIX(void, glGetIntegerv, (GLenum pname, GLint *params))
GLPREFIX(void, glGetRenderbufferParameterivEXT, (GLenum target, GLenum pname, GLint* params))
//GLPREFIX(GLubyte*, glGetString, (GLenum name))
GLPREFIX(void, glGetTexImage, (GLenum target, GLint level, GLenum format, GLenum type, GLvoid *pixels))
GLPREFIX(void, glGetTexLevelParameteriv, (GLenum target, GLint level, GLenum pname, GLint *params))
GLPREFIX(void, glGetTexParameteriv, (GLenum target, GLenum pname, GLint *params))
GLPREFIX(GLboolean, glIsBuffer, (GLuint buffer))
GLPREFIX(GLboolean, glIsRenderbufferEXT, (GLuint renderbuffer))
GLPREFIX(GLboolean, glIsTexture, (GLuint texture))
//20
GLPREFIX(GLvoid*, glMapBuffer, (GLenum target, GLenum access))
GLPREFIX(void, glReadPixels, (GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, GLvoid *pixels))
GLPREFIX(void, glTexImage2D, (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLint border, GLenum format, GLenum type, const GLvoid *pixels))
GLPREFIX(void, glTexImage3D, (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLint border, GLenum format, GLenum type, const GLvoid *pixels))
GLPREFIX(GLboolean, glUnmapBuffer, (GLenum target))

#undef GLPREFIX
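// Usage sketch (illustrative): gl_functions.hpp is an X-macro list. A consumer
// defines GLPREFIX to stamp out one declaration, struct member, or loader line
// per GL entry point before including the header (which #undefs the macro at
// the end). The expansion shape below is hypothetical, not the runtime's own:
//
//   #define GLPREFIX(rtype, fname, dclargs) rtype (APIENTRY* fname) dclargs;
//   #include "gl_functions.hpp"  // one function-pointer member per entry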
clr-rocm-5.7.1/opencl/amdocl/glibc_functions.cpp000066400000000000000000000026561450307266000216500ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE. */

#if defined(__linux__)

#include <string.h>

#if defined(__cplusplus)
extern "C" {
#endif  // __cplusplus

#if defined(_LP64)
// Bind memcpy to the old GLIBC_2.2.5 version of the symbol. Builds that also
// pass "-Wl,--wrap=memcpy" on the link line (an assumption about the build
// system, not shown in this file) route memcpy calls through __wrap_memcpy,
// keeping the resulting binaries loadable on older glibc releases.
asm (".symver memcpy, memcpy@GLIBC_2.2.5");

void *__wrap_memcpy(void *dest, const void *src, size_t n) {
  return memcpy(dest, src, n);
}
#endif  // _LP64

#if defined(__cplusplus)
}
#endif  // __cplusplus

#endif  // __linux__
clr-rocm-5.7.1/opencl/config/000077500000000000000000000000001450307266000157725ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/config/amdocl32.icd000066400000000000000000000000171450307266000200550ustar00rootroot00000000000000libamdocl32.so
clr-rocm-5.7.1/opencl/config/amdocl64.icd000066400000000000000000000000171450307266000200620ustar00rootroot00000000000000libamdocl64.so
clr-rocm-5.7.1/opencl/configure000066400000000000000000000000001450307266000164270ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/000077500000000000000000000000001450307266000162105ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/000077500000000000000000000000001450307266000176235ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/EGL/000077500000000000000000000000001450307266000202325ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/EGL/egl.h000066400000000000000000000301011450307266000211500ustar00rootroot00000000000000/* -*- mode: c; tab-width: 8; -*- */
/* vi: set sw=4 ts=8: */
/* Reference version of egl.h for EGL 1.4.
 * $Revision: 9356 $ on $Date: 2009-10-21 02:52:25 -0700 (Wed, 21 Oct 2009) $
 */

/*
** Copyright (c) 2007-2009 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
** "Materials"), to deal in the Materials without restriction, including
** without limitation the rights to use, copy, modify, merge, publish,
** distribute, sublicense, and/or sell copies of the Materials, and to
** permit persons to whom the Materials are furnished to do so, subject to
** the following conditions:
**
** The above copyright notice and this permission notice shall be included
** in all copies or substantial portions of the Materials.
**
** THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
** EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
** MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
** IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
** CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
** TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
** MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
*/

#ifndef __egl_h_
#define __egl_h_

/* All platform-dependent types and macro boilerplate (such as EGLAPI
 * and EGLAPIENTRY) should go in eglplatform.h.
 */
#include <EGL/eglplatform.h>

#ifdef __cplusplus
extern "C" {
#endif

/* EGL Types */
/* EGLint is defined in eglplatform.h */
typedef unsigned int EGLBoolean;
typedef unsigned int EGLenum;
typedef void *EGLConfig;
typedef void *EGLContext;
typedef void *EGLDisplay;
typedef void *EGLSurface;
typedef void *EGLClientBuffer;

/* EGL Versioning */
#define EGL_VERSION_1_0 1
#define EGL_VERSION_1_1 1
#define EGL_VERSION_1_2 1
#define EGL_VERSION_1_3 1
#define EGL_VERSION_1_4 1

/* EGL Enumerants.
Bitmasks and other exceptional cases aside, most * enums are assigned unique values starting at 0x3000. */ /* EGL aliases */ #define EGL_FALSE 0 #define EGL_TRUE 1 /* Out-of-band handle values */ #define EGL_DEFAULT_DISPLAY ((EGLNativeDisplayType)0) #define EGL_NO_CONTEXT ((EGLContext)0) #define EGL_NO_DISPLAY ((EGLDisplay)0) #define EGL_NO_SURFACE ((EGLSurface)0) /* Out-of-band attribute value */ #define EGL_DONT_CARE ((EGLint)-1) /* Errors / GetError return values */ #define EGL_SUCCESS 0x3000 #define EGL_NOT_INITIALIZED 0x3001 #define EGL_BAD_ACCESS 0x3002 #define EGL_BAD_ALLOC 0x3003 #define EGL_BAD_ATTRIBUTE 0x3004 #define EGL_BAD_CONFIG 0x3005 #define EGL_BAD_CONTEXT 0x3006 #define EGL_BAD_CURRENT_SURFACE 0x3007 #define EGL_BAD_DISPLAY 0x3008 #define EGL_BAD_MATCH 0x3009 #define EGL_BAD_NATIVE_PIXMAP 0x300A #define EGL_BAD_NATIVE_WINDOW 0x300B #define EGL_BAD_PARAMETER 0x300C #define EGL_BAD_SURFACE 0x300D #define EGL_CONTEXT_LOST 0x300E /* EGL 1.1 - IMG_power_management */ /* Reserved 0x300F-0x301F for additional errors */ /* Config attributes */ #define EGL_BUFFER_SIZE 0x3020 #define EGL_ALPHA_SIZE 0x3021 #define EGL_BLUE_SIZE 0x3022 #define EGL_GREEN_SIZE 0x3023 #define EGL_RED_SIZE 0x3024 #define EGL_DEPTH_SIZE 0x3025 #define EGL_STENCIL_SIZE 0x3026 #define EGL_CONFIG_CAVEAT 0x3027 #define EGL_CONFIG_ID 0x3028 #define EGL_LEVEL 0x3029 #define EGL_MAX_PBUFFER_HEIGHT 0x302A #define EGL_MAX_PBUFFER_PIXELS 0x302B #define EGL_MAX_PBUFFER_WIDTH 0x302C #define EGL_NATIVE_RENDERABLE 0x302D #define EGL_NATIVE_VISUAL_ID 0x302E #define EGL_NATIVE_VISUAL_TYPE 0x302F #define EGL_SAMPLES 0x3031 #define EGL_SAMPLE_BUFFERS 0x3032 #define EGL_SURFACE_TYPE 0x3033 #define EGL_TRANSPARENT_TYPE 0x3034 #define EGL_TRANSPARENT_BLUE_VALUE 0x3035 #define EGL_TRANSPARENT_GREEN_VALUE 0x3036 #define EGL_TRANSPARENT_RED_VALUE 0x3037 #define EGL_NONE 0x3038 /* Attrib list terminator */ #define EGL_BIND_TO_TEXTURE_RGB 0x3039 #define EGL_BIND_TO_TEXTURE_RGBA 0x303A #define EGL_MIN_SWAP_INTERVAL 0x303B #define EGL_MAX_SWAP_INTERVAL 0x303C #define EGL_LUMINANCE_SIZE 0x303D #define EGL_ALPHA_MASK_SIZE 0x303E #define EGL_COLOR_BUFFER_TYPE 0x303F #define EGL_RENDERABLE_TYPE 0x3040 #define EGL_MATCH_NATIVE_PIXMAP 0x3041 /* Pseudo-attribute (not queryable) */ #define EGL_CONFORMANT 0x3042 /* Reserved 0x3041-0x304F for additional config attributes */ /* Config attribute values */ #define EGL_SLOW_CONFIG 0x3050 /* EGL_CONFIG_CAVEAT value */ #define EGL_NON_CONFORMANT_CONFIG 0x3051 /* EGL_CONFIG_CAVEAT value */ #define EGL_TRANSPARENT_RGB 0x3052 /* EGL_TRANSPARENT_TYPE value */ #define EGL_RGB_BUFFER 0x308E /* EGL_COLOR_BUFFER_TYPE value */ #define EGL_LUMINANCE_BUFFER 0x308F /* EGL_COLOR_BUFFER_TYPE value */ /* More config attribute values, for EGL_TEXTURE_FORMAT */ #define EGL_NO_TEXTURE 0x305C #define EGL_TEXTURE_RGB 0x305D #define EGL_TEXTURE_RGBA 0x305E #define EGL_TEXTURE_2D 0x305F /* Config attribute mask bits */ #define EGL_PBUFFER_BIT 0x0001 /* EGL_SURFACE_TYPE mask bits */ #define EGL_PIXMAP_BIT 0x0002 /* EGL_SURFACE_TYPE mask bits */ #define EGL_WINDOW_BIT 0x0004 /* EGL_SURFACE_TYPE mask bits */ #define EGL_VG_COLORSPACE_LINEAR_BIT 0x0020 /* EGL_SURFACE_TYPE mask bits */ #define EGL_VG_ALPHA_FORMAT_PRE_BIT 0x0040 /* EGL_SURFACE_TYPE mask bits */ #define EGL_MULTISAMPLE_RESOLVE_BOX_BIT 0x0200 /* EGL_SURFACE_TYPE mask bits */ #define EGL_SWAP_BEHAVIOR_PRESERVED_BIT 0x0400 /* EGL_SURFACE_TYPE mask bits */ #define EGL_OPENGL_ES_BIT 0x0001 /* EGL_RENDERABLE_TYPE mask bits */ #define EGL_OPENVG_BIT 0x0002 /* 
EGL_RENDERABLE_TYPE mask bits */ #define EGL_OPENGL_ES2_BIT 0x0004 /* EGL_RENDERABLE_TYPE mask bits */ #define EGL_OPENGL_BIT 0x0008 /* EGL_RENDERABLE_TYPE mask bits */ /* QueryString targets */ #define EGL_VENDOR 0x3053 #define EGL_VERSION 0x3054 #define EGL_EXTENSIONS 0x3055 #define EGL_CLIENT_APIS 0x308D /* QuerySurface / SurfaceAttrib / CreatePbufferSurface targets */ #define EGL_HEIGHT 0x3056 #define EGL_WIDTH 0x3057 #define EGL_LARGEST_PBUFFER 0x3058 #define EGL_TEXTURE_FORMAT 0x3080 #define EGL_TEXTURE_TARGET 0x3081 #define EGL_MIPMAP_TEXTURE 0x3082 #define EGL_MIPMAP_LEVEL 0x3083 #define EGL_RENDER_BUFFER 0x3086 #define EGL_VG_COLORSPACE 0x3087 #define EGL_VG_ALPHA_FORMAT 0x3088 #define EGL_HORIZONTAL_RESOLUTION 0x3090 #define EGL_VERTICAL_RESOLUTION 0x3091 #define EGL_PIXEL_ASPECT_RATIO 0x3092 #define EGL_SWAP_BEHAVIOR 0x3093 #define EGL_MULTISAMPLE_RESOLVE 0x3099 /* EGL_RENDER_BUFFER values / BindTexImage / ReleaseTexImage buffer targets */ #define EGL_BACK_BUFFER 0x3084 #define EGL_SINGLE_BUFFER 0x3085 /* OpenVG color spaces */ #define EGL_VG_COLORSPACE_sRGB 0x3089 /* EGL_VG_COLORSPACE value */ #define EGL_VG_COLORSPACE_LINEAR 0x308A /* EGL_VG_COLORSPACE value */ /* OpenVG alpha formats */ #define EGL_VG_ALPHA_FORMAT_NONPRE 0x308B /* EGL_ALPHA_FORMAT value */ #define EGL_VG_ALPHA_FORMAT_PRE 0x308C /* EGL_ALPHA_FORMAT value */ /* Constant scale factor by which fractional display resolutions & * aspect ratio are scaled when queried as integer values. */ #define EGL_DISPLAY_SCALING 10000 /* Unknown display resolution/aspect ratio */ #define EGL_UNKNOWN ((EGLint)-1) /* Back buffer swap behaviors */ #define EGL_BUFFER_PRESERVED 0x3094 /* EGL_SWAP_BEHAVIOR value */ #define EGL_BUFFER_DESTROYED 0x3095 /* EGL_SWAP_BEHAVIOR value */ /* CreatePbufferFromClientBuffer buffer types */ #define EGL_OPENVG_IMAGE 0x3096 /* QueryContext targets */ #define EGL_CONTEXT_CLIENT_TYPE 0x3097 /* CreateContext attributes */ #define EGL_CONTEXT_CLIENT_VERSION 0x3098 /* Multisample resolution behaviors */ #define EGL_MULTISAMPLE_RESOLVE_DEFAULT 0x309A /* EGL_MULTISAMPLE_RESOLVE value */ #define EGL_MULTISAMPLE_RESOLVE_BOX 0x309B /* EGL_MULTISAMPLE_RESOLVE value */ /* BindAPI/QueryAPI targets */ #define EGL_OPENGL_ES_API 0x30A0 #define EGL_OPENVG_API 0x30A1 #define EGL_OPENGL_API 0x30A2 /* GetCurrentSurface targets */ #define EGL_DRAW 0x3059 #define EGL_READ 0x305A /* WaitNative engines */ #define EGL_CORE_NATIVE_ENGINE 0x305B /* EGL 1.2 tokens renamed for consistency in EGL 1.3 */ #define EGL_COLORSPACE EGL_VG_COLORSPACE #define EGL_ALPHA_FORMAT EGL_VG_ALPHA_FORMAT #define EGL_COLORSPACE_sRGB EGL_VG_COLORSPACE_sRGB #define EGL_COLORSPACE_LINEAR EGL_VG_COLORSPACE_LINEAR #define EGL_ALPHA_FORMAT_NONPRE EGL_VG_ALPHA_FORMAT_NONPRE #define EGL_ALPHA_FORMAT_PRE EGL_VG_ALPHA_FORMAT_PRE /* EGL extensions must request enum blocks from the Khronos * API Registrar, who maintains the enumerant registry. Submit * a bug in Khronos Bugzilla against task "Registry". 
*/ /* EGL Functions */ EGLAPI EGLint EGLAPIENTRY eglGetError(void); EGLAPI EGLDisplay EGLAPIENTRY eglGetDisplay(EGLNativeDisplayType display_id); EGLAPI EGLBoolean EGLAPIENTRY eglInitialize(EGLDisplay dpy, EGLint *major, EGLint *minor); EGLAPI EGLBoolean EGLAPIENTRY eglTerminate(EGLDisplay dpy); EGLAPI const char * EGLAPIENTRY eglQueryString(EGLDisplay dpy, EGLint name); EGLAPI EGLBoolean EGLAPIENTRY eglGetConfigs(EGLDisplay dpy, EGLConfig *configs, EGLint config_size, EGLint *num_config); EGLAPI EGLBoolean EGLAPIENTRY eglChooseConfig(EGLDisplay dpy, const EGLint *attrib_list, EGLConfig *configs, EGLint config_size, EGLint *num_config); EGLAPI EGLBoolean EGLAPIENTRY eglGetConfigAttrib(EGLDisplay dpy, EGLConfig config, EGLint attribute, EGLint *value); EGLAPI EGLSurface EGLAPIENTRY eglCreateWindowSurface(EGLDisplay dpy, EGLConfig config, EGLNativeWindowType win, const EGLint *attrib_list); EGLAPI EGLSurface EGLAPIENTRY eglCreatePbufferSurface(EGLDisplay dpy, EGLConfig config, const EGLint *attrib_list); EGLAPI EGLSurface EGLAPIENTRY eglCreatePixmapSurface(EGLDisplay dpy, EGLConfig config, EGLNativePixmapType pixmap, const EGLint *attrib_list); EGLAPI EGLBoolean EGLAPIENTRY eglDestroySurface(EGLDisplay dpy, EGLSurface surface); EGLAPI EGLBoolean EGLAPIENTRY eglQuerySurface(EGLDisplay dpy, EGLSurface surface, EGLint attribute, EGLint *value); EGLAPI EGLBoolean EGLAPIENTRY eglBindAPI(EGLenum api); EGLAPI EGLenum EGLAPIENTRY eglQueryAPI(void); EGLAPI EGLBoolean EGLAPIENTRY eglWaitClient(void); EGLAPI EGLBoolean EGLAPIENTRY eglReleaseThread(void); EGLAPI EGLSurface EGLAPIENTRY eglCreatePbufferFromClientBuffer( EGLDisplay dpy, EGLenum buftype, EGLClientBuffer buffer, EGLConfig config, const EGLint *attrib_list); EGLAPI EGLBoolean EGLAPIENTRY eglSurfaceAttrib(EGLDisplay dpy, EGLSurface surface, EGLint attribute, EGLint value); EGLAPI EGLBoolean EGLAPIENTRY eglBindTexImage(EGLDisplay dpy, EGLSurface surface, EGLint buffer); EGLAPI EGLBoolean EGLAPIENTRY eglReleaseTexImage(EGLDisplay dpy, EGLSurface surface, EGLint buffer); EGLAPI EGLBoolean EGLAPIENTRY eglSwapInterval(EGLDisplay dpy, EGLint interval); EGLAPI EGLContext EGLAPIENTRY eglCreateContext(EGLDisplay dpy, EGLConfig config, EGLContext share_context, const EGLint *attrib_list); EGLAPI EGLBoolean EGLAPIENTRY eglDestroyContext(EGLDisplay dpy, EGLContext ctx); EGLAPI EGLBoolean EGLAPIENTRY eglMakeCurrent(EGLDisplay dpy, EGLSurface draw, EGLSurface read, EGLContext ctx); EGLAPI EGLContext EGLAPIENTRY eglGetCurrentContext(void); EGLAPI EGLSurface EGLAPIENTRY eglGetCurrentSurface(EGLint readdraw); EGLAPI EGLDisplay EGLAPIENTRY eglGetCurrentDisplay(void); EGLAPI EGLBoolean EGLAPIENTRY eglQueryContext(EGLDisplay dpy, EGLContext ctx, EGLint attribute, EGLint *value); EGLAPI EGLBoolean EGLAPIENTRY eglWaitGL(void); EGLAPI EGLBoolean EGLAPIENTRY eglWaitNative(EGLint engine); EGLAPI EGLBoolean EGLAPIENTRY eglSwapBuffers(EGLDisplay dpy, EGLSurface surface); EGLAPI EGLBoolean EGLAPIENTRY eglCopyBuffers(EGLDisplay dpy, EGLSurface surface, EGLNativePixmapType target); /* This is a generic function pointer type, whose name indicates it must * be cast to the proper type *and calling convention* before use. */ typedef void (*__eglMustCastToProperFunctionPointerType)(void); /* Now, define eglGetProcAddress using the generic function ptr. 
 * type */
EGLAPI __eglMustCastToProperFunctionPointerType EGLAPIENTRY
       eglGetProcAddress(const char *procname);

#ifdef __cplusplus
}
#endif

#endif /* __egl_h_ */
clr-rocm-5.7.1/opencl/khronos/headers/EGL/eglext.h000066400000000000000000000724661450307266000217060ustar00rootroot00000000000000#ifndef __eglext_h_
#define __eglext_h_ 1

#ifdef __cplusplus
extern "C" {
#endif

/*
** Copyright (c) 2013 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
** "Materials"), to deal in the Materials without restriction, including
** without limitation the rights to use, copy, modify, merge, publish,
** distribute, sublicense, and/or sell copies of the Materials, and to
** permit persons to whom the Materials are furnished to do so, subject to
** the following conditions:
**
** The above copyright notice and this permission notice shall be included
** in all copies or substantial portions of the Materials.
**
** THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
** EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
** MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
** IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
** CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
** TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
** MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
*/

/*
** This header is generated from the Khronos OpenGL / OpenGL ES XML
** API Registry. The current version of the Registry, generator scripts
** used to make the header, and the header can be found at
**   http://www.opengl.org/registry/
**
** Khronos $Revision: 24350 $ on $Date: 2013-12-04 12:46:23 -0800 (Wed, 04 Dec 2013) $
*/

#include <EGL/eglplatform.h>

#define EGL_EGLEXT_VERSION 20131204

/* Generated C header for:
 * API: egl
 * Versions considered: .*
 * Versions emitted: _nomatch_^
 * Default extensions included: egl
 * Additional extensions included: _nomatch_^
 * Extensions removed: _nomatch_^
 */

#ifndef EGL_KHR_cl_event
#define EGL_KHR_cl_event 1
#define EGL_CL_EVENT_HANDLE_KHR           0x309C
#define EGL_SYNC_CL_EVENT_KHR             0x30FE
#define EGL_SYNC_CL_EVENT_COMPLETE_KHR    0x30FF
#endif /* EGL_KHR_cl_event */

#ifndef EGL_KHR_cl_event2
#define EGL_KHR_cl_event2 1
typedef void *EGLSyncKHR;
typedef intptr_t EGLAttribKHR;
typedef EGLSyncKHR (EGLAPIENTRYP PFNEGLCREATESYNC64KHRPROC) (EGLDisplay dpy, EGLenum type, const EGLAttribKHR *attrib_list);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLSyncKHR EGLAPIENTRY eglCreateSync64KHR (EGLDisplay dpy, EGLenum type, const EGLAttribKHR *attrib_list);
#endif
#endif /* EGL_KHR_cl_event2 */

#ifndef EGL_KHR_client_get_all_proc_addresses
#define EGL_KHR_client_get_all_proc_addresses 1
#endif /* EGL_KHR_client_get_all_proc_addresses */

#ifndef EGL_KHR_config_attribs
#define EGL_KHR_config_attribs 1
#define EGL_CONFORMANT_KHR                0x3042
#define EGL_VG_COLORSPACE_LINEAR_BIT_KHR  0x0020
#define EGL_VG_ALPHA_FORMAT_PRE_BIT_KHR   0x0040
#endif /* EGL_KHR_config_attribs */

#ifndef EGL_KHR_create_context
#define EGL_KHR_create_context 1
#define EGL_CONTEXT_MAJOR_VERSION_KHR     0x3098
#define EGL_CONTEXT_MINOR_VERSION_KHR     0x30FB
#define EGL_CONTEXT_FLAGS_KHR             0x30FC
#define EGL_CONTEXT_OPENGL_PROFILE_MASK_KHR 0x30FD
#define EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_KHR 0x31BD
#define EGL_NO_RESET_NOTIFICATION_KHR     0x31BE
#define EGL_LOSE_CONTEXT_ON_RESET_KHR     0x31BF
#define EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR  0x00000001
#define
EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR 0x00000002 #define EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR 0x00000004 #define EGL_CONTEXT_OPENGL_CORE_PROFILE_BIT_KHR 0x00000001 #define EGL_CONTEXT_OPENGL_COMPATIBILITY_PROFILE_BIT_KHR 0x00000002 #define EGL_OPENGL_ES3_BIT_KHR 0x00000040 #endif /* EGL_KHR_create_context */ #ifndef EGL_KHR_fence_sync #define EGL_KHR_fence_sync 1 #ifdef KHRONOS_SUPPORT_INT64 #define EGL_SYNC_PRIOR_COMMANDS_COMPLETE_KHR 0x30F0 #define EGL_SYNC_CONDITION_KHR 0x30F8 #define EGL_SYNC_FENCE_KHR 0x30F9 #endif /* KHRONOS_SUPPORT_INT64 */ #endif /* EGL_KHR_fence_sync */ #ifndef EGL_KHR_get_all_proc_addresses #define EGL_KHR_get_all_proc_addresses 1 #endif /* EGL_KHR_get_all_proc_addresses */ #ifndef EGL_KHR_gl_renderbuffer_image #define EGL_KHR_gl_renderbuffer_image 1 #define EGL_GL_RENDERBUFFER_KHR 0x30B9 #endif /* EGL_KHR_gl_renderbuffer_image */ #ifndef EGL_KHR_gl_texture_2D_image #define EGL_KHR_gl_texture_2D_image 1 #define EGL_GL_TEXTURE_2D_KHR 0x30B1 #define EGL_GL_TEXTURE_LEVEL_KHR 0x30BC #endif /* EGL_KHR_gl_texture_2D_image */ #ifndef EGL_KHR_gl_texture_3D_image #define EGL_KHR_gl_texture_3D_image 1 #define EGL_GL_TEXTURE_3D_KHR 0x30B2 #define EGL_GL_TEXTURE_ZOFFSET_KHR 0x30BD #endif /* EGL_KHR_gl_texture_3D_image */ #ifndef EGL_KHR_gl_texture_cubemap_image #define EGL_KHR_gl_texture_cubemap_image 1 #define EGL_GL_TEXTURE_CUBE_MAP_POSITIVE_X_KHR 0x30B3 #define EGL_GL_TEXTURE_CUBE_MAP_NEGATIVE_X_KHR 0x30B4 #define EGL_GL_TEXTURE_CUBE_MAP_POSITIVE_Y_KHR 0x30B5 #define EGL_GL_TEXTURE_CUBE_MAP_NEGATIVE_Y_KHR 0x30B6 #define EGL_GL_TEXTURE_CUBE_MAP_POSITIVE_Z_KHR 0x30B7 #define EGL_GL_TEXTURE_CUBE_MAP_NEGATIVE_Z_KHR 0x30B8 #endif /* EGL_KHR_gl_texture_cubemap_image */ #ifndef EGL_KHR_image #define EGL_KHR_image 1 typedef void *EGLImageKHR; #define EGL_NATIVE_PIXMAP_KHR 0x30B0 #define EGL_NO_IMAGE_KHR ((EGLImageKHR)0) typedef EGLImageKHR (EGLAPIENTRYP PFNEGLCREATEIMAGEKHRPROC) (EGLDisplay dpy, EGLContext ctx, EGLenum target, EGLClientBuffer buffer, const EGLint *attrib_list); typedef EGLBoolean (EGLAPIENTRYP PFNEGLDESTROYIMAGEKHRPROC) (EGLDisplay dpy, EGLImageKHR image); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLImageKHR EGLAPIENTRY eglCreateImageKHR (EGLDisplay dpy, EGLContext ctx, EGLenum target, EGLClientBuffer buffer, const EGLint *attrib_list); EGLAPI EGLBoolean EGLAPIENTRY eglDestroyImageKHR (EGLDisplay dpy, EGLImageKHR image); #endif #endif /* EGL_KHR_image */ #ifndef EGL_KHR_image_base #define EGL_KHR_image_base 1 #define EGL_IMAGE_PRESERVED_KHR 0x30D2 #endif /* EGL_KHR_image_base */ #ifndef EGL_KHR_image_pixmap #define EGL_KHR_image_pixmap 1 #endif /* EGL_KHR_image_pixmap */ #ifndef EGL_KHR_lock_surface #define EGL_KHR_lock_surface 1 #define EGL_READ_SURFACE_BIT_KHR 0x0001 #define EGL_WRITE_SURFACE_BIT_KHR 0x0002 #define EGL_LOCK_SURFACE_BIT_KHR 0x0080 #define EGL_OPTIMAL_FORMAT_BIT_KHR 0x0100 #define EGL_MATCH_FORMAT_KHR 0x3043 #define EGL_FORMAT_RGB_565_EXACT_KHR 0x30C0 #define EGL_FORMAT_RGB_565_KHR 0x30C1 #define EGL_FORMAT_RGBA_8888_EXACT_KHR 0x30C2 #define EGL_FORMAT_RGBA_8888_KHR 0x30C3 #define EGL_MAP_PRESERVE_PIXELS_KHR 0x30C4 #define EGL_LOCK_USAGE_HINT_KHR 0x30C5 #define EGL_BITMAP_POINTER_KHR 0x30C6 #define EGL_BITMAP_PITCH_KHR 0x30C7 #define EGL_BITMAP_ORIGIN_KHR 0x30C8 #define EGL_BITMAP_PIXEL_RED_OFFSET_KHR 0x30C9 #define EGL_BITMAP_PIXEL_GREEN_OFFSET_KHR 0x30CA #define EGL_BITMAP_PIXEL_BLUE_OFFSET_KHR 0x30CB #define EGL_BITMAP_PIXEL_ALPHA_OFFSET_KHR 0x30CC #define EGL_BITMAP_PIXEL_LUMINANCE_OFFSET_KHR 0x30CD #define EGL_LOWER_LEFT_KHR 0x30CE 
#define EGL_UPPER_LEFT_KHR 0x30CF typedef EGLBoolean (EGLAPIENTRYP PFNEGLLOCKSURFACEKHRPROC) (EGLDisplay dpy, EGLSurface surface, const EGLint *attrib_list); typedef EGLBoolean (EGLAPIENTRYP PFNEGLUNLOCKSURFACEKHRPROC) (EGLDisplay dpy, EGLSurface surface); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLBoolean EGLAPIENTRY eglLockSurfaceKHR (EGLDisplay dpy, EGLSurface surface, const EGLint *attrib_list); EGLAPI EGLBoolean EGLAPIENTRY eglUnlockSurfaceKHR (EGLDisplay dpy, EGLSurface surface); #endif #endif /* EGL_KHR_lock_surface */ #ifndef EGL_KHR_lock_surface2 #define EGL_KHR_lock_surface2 1 #define EGL_BITMAP_PIXEL_SIZE_KHR 0x3110 #endif /* EGL_KHR_lock_surface2 */ #ifndef EGL_KHR_lock_surface3 #define EGL_KHR_lock_surface3 1 typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYSURFACE64KHRPROC) (EGLDisplay dpy, EGLSurface surface, EGLint attribute, EGLAttribKHR *value); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLBoolean EGLAPIENTRY eglQuerySurface64KHR (EGLDisplay dpy, EGLSurface surface, EGLint attribute, EGLAttribKHR *value); #endif #endif /* EGL_KHR_lock_surface3 */ #ifndef EGL_KHR_reusable_sync #define EGL_KHR_reusable_sync 1 typedef khronos_utime_nanoseconds_t EGLTimeKHR; #ifdef KHRONOS_SUPPORT_INT64 #define EGL_SYNC_STATUS_KHR 0x30F1 #define EGL_SIGNALED_KHR 0x30F2 #define EGL_UNSIGNALED_KHR 0x30F3 #define EGL_TIMEOUT_EXPIRED_KHR 0x30F5 #define EGL_CONDITION_SATISFIED_KHR 0x30F6 #define EGL_SYNC_TYPE_KHR 0x30F7 #define EGL_SYNC_REUSABLE_KHR 0x30FA #define EGL_SYNC_FLUSH_COMMANDS_BIT_KHR 0x0001 #define EGL_FOREVER_KHR 0xFFFFFFFFFFFFFFFFull #define EGL_NO_SYNC_KHR ((EGLSyncKHR)0) typedef EGLSyncKHR (EGLAPIENTRYP PFNEGLCREATESYNCKHRPROC) (EGLDisplay dpy, EGLenum type, const EGLint *attrib_list); typedef EGLBoolean (EGLAPIENTRYP PFNEGLDESTROYSYNCKHRPROC) (EGLDisplay dpy, EGLSyncKHR sync); typedef EGLint (EGLAPIENTRYP PFNEGLCLIENTWAITSYNCKHRPROC) (EGLDisplay dpy, EGLSyncKHR sync, EGLint flags, EGLTimeKHR timeout); typedef EGLBoolean (EGLAPIENTRYP PFNEGLSIGNALSYNCKHRPROC) (EGLDisplay dpy, EGLSyncKHR sync, EGLenum mode); typedef EGLBoolean (EGLAPIENTRYP PFNEGLGETSYNCATTRIBKHRPROC) (EGLDisplay dpy, EGLSyncKHR sync, EGLint attribute, EGLint *value); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLSyncKHR EGLAPIENTRY eglCreateSyncKHR (EGLDisplay dpy, EGLenum type, const EGLint *attrib_list); EGLAPI EGLBoolean EGLAPIENTRY eglDestroySyncKHR (EGLDisplay dpy, EGLSyncKHR sync); EGLAPI EGLint EGLAPIENTRY eglClientWaitSyncKHR (EGLDisplay dpy, EGLSyncKHR sync, EGLint flags, EGLTimeKHR timeout); EGLAPI EGLBoolean EGLAPIENTRY eglSignalSyncKHR (EGLDisplay dpy, EGLSyncKHR sync, EGLenum mode); EGLAPI EGLBoolean EGLAPIENTRY eglGetSyncAttribKHR (EGLDisplay dpy, EGLSyncKHR sync, EGLint attribute, EGLint *value); #endif #endif /* KHRONOS_SUPPORT_INT64 */ #endif /* EGL_KHR_reusable_sync */ #ifndef EGL_KHR_stream #define EGL_KHR_stream 1 typedef void *EGLStreamKHR; typedef khronos_uint64_t EGLuint64KHR; #ifdef KHRONOS_SUPPORT_INT64 #define EGL_NO_STREAM_KHR ((EGLStreamKHR)0) #define EGL_CONSUMER_LATENCY_USEC_KHR 0x3210 #define EGL_PRODUCER_FRAME_KHR 0x3212 #define EGL_CONSUMER_FRAME_KHR 0x3213 #define EGL_STREAM_STATE_KHR 0x3214 #define EGL_STREAM_STATE_CREATED_KHR 0x3215 #define EGL_STREAM_STATE_CONNECTING_KHR 0x3216 #define EGL_STREAM_STATE_EMPTY_KHR 0x3217 #define EGL_STREAM_STATE_NEW_FRAME_AVAILABLE_KHR 0x3218 #define EGL_STREAM_STATE_OLD_FRAME_AVAILABLE_KHR 0x3219 #define EGL_STREAM_STATE_DISCONNECTED_KHR 0x321A #define EGL_BAD_STREAM_KHR 0x321B #define EGL_BAD_STATE_KHR 0x321C typedef EGLStreamKHR (EGLAPIENTRYP 
PFNEGLCREATESTREAMKHRPROC) (EGLDisplay dpy, const EGLint *attrib_list); typedef EGLBoolean (EGLAPIENTRYP PFNEGLDESTROYSTREAMKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream); typedef EGLBoolean (EGLAPIENTRYP PFNEGLSTREAMATTRIBKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLint value); typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYSTREAMKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLint *value); typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYSTREAMU64KHRPROC) (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLuint64KHR *value); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLStreamKHR EGLAPIENTRY eglCreateStreamKHR (EGLDisplay dpy, const EGLint *attrib_list); EGLAPI EGLBoolean EGLAPIENTRY eglDestroyStreamKHR (EGLDisplay dpy, EGLStreamKHR stream); EGLAPI EGLBoolean EGLAPIENTRY eglStreamAttribKHR (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLint value); EGLAPI EGLBoolean EGLAPIENTRY eglQueryStreamKHR (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLint *value); EGLAPI EGLBoolean EGLAPIENTRY eglQueryStreamu64KHR (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLuint64KHR *value); #endif #endif /* KHRONOS_SUPPORT_INT64 */ #endif /* EGL_KHR_stream */ #ifndef EGL_KHR_stream_consumer_gltexture #define EGL_KHR_stream_consumer_gltexture 1 #ifdef EGL_KHR_stream #define EGL_CONSUMER_ACQUIRE_TIMEOUT_USEC_KHR 0x321E typedef EGLBoolean (EGLAPIENTRYP PFNEGLSTREAMCONSUMERGLTEXTUREEXTERNALKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream); typedef EGLBoolean (EGLAPIENTRYP PFNEGLSTREAMCONSUMERACQUIREKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream); typedef EGLBoolean (EGLAPIENTRYP PFNEGLSTREAMCONSUMERRELEASEKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLBoolean EGLAPIENTRY eglStreamConsumerGLTextureExternalKHR (EGLDisplay dpy, EGLStreamKHR stream); EGLAPI EGLBoolean EGLAPIENTRY eglStreamConsumerAcquireKHR (EGLDisplay dpy, EGLStreamKHR stream); EGLAPI EGLBoolean EGLAPIENTRY eglStreamConsumerReleaseKHR (EGLDisplay dpy, EGLStreamKHR stream); #endif #endif /* EGL_KHR_stream */ #endif /* EGL_KHR_stream_consumer_gltexture */ #ifndef EGL_KHR_stream_cross_process_fd #define EGL_KHR_stream_cross_process_fd 1 typedef int EGLNativeFileDescriptorKHR; #ifdef EGL_KHR_stream #define EGL_NO_FILE_DESCRIPTOR_KHR ((EGLNativeFileDescriptorKHR)(-1)) typedef EGLNativeFileDescriptorKHR (EGLAPIENTRYP PFNEGLGETSTREAMFILEDESCRIPTORKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream); typedef EGLStreamKHR (EGLAPIENTRYP PFNEGLCREATESTREAMFROMFILEDESCRIPTORKHRPROC) (EGLDisplay dpy, EGLNativeFileDescriptorKHR file_descriptor); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLNativeFileDescriptorKHR EGLAPIENTRY eglGetStreamFileDescriptorKHR (EGLDisplay dpy, EGLStreamKHR stream); EGLAPI EGLStreamKHR EGLAPIENTRY eglCreateStreamFromFileDescriptorKHR (EGLDisplay dpy, EGLNativeFileDescriptorKHR file_descriptor); #endif #endif /* EGL_KHR_stream */ #endif /* EGL_KHR_stream_cross_process_fd */ #ifndef EGL_KHR_stream_fifo #define EGL_KHR_stream_fifo 1 #ifdef EGL_KHR_stream #define EGL_STREAM_FIFO_LENGTH_KHR 0x31FC #define EGL_STREAM_TIME_NOW_KHR 0x31FD #define EGL_STREAM_TIME_CONSUMER_KHR 0x31FE #define EGL_STREAM_TIME_PRODUCER_KHR 0x31FF typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYSTREAMTIMEKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLTimeKHR *value); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLBoolean EGLAPIENTRY eglQueryStreamTimeKHR (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLTimeKHR *value); #endif #endif /* 
EGL_KHR_stream */ #endif /* EGL_KHR_stream_fifo */ #ifndef EGL_KHR_stream_producer_aldatalocator #define EGL_KHR_stream_producer_aldatalocator 1 #ifdef EGL_KHR_stream #endif /* EGL_KHR_stream */ #endif /* EGL_KHR_stream_producer_aldatalocator */ #ifndef EGL_KHR_stream_producer_eglsurface #define EGL_KHR_stream_producer_eglsurface 1 #ifdef EGL_KHR_stream #define EGL_STREAM_BIT_KHR 0x0800 typedef EGLSurface (EGLAPIENTRYP PFNEGLCREATESTREAMPRODUCERSURFACEKHRPROC) (EGLDisplay dpy, EGLConfig config, EGLStreamKHR stream, const EGLint *attrib_list); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLSurface EGLAPIENTRY eglCreateStreamProducerSurfaceKHR (EGLDisplay dpy, EGLConfig config, EGLStreamKHR stream, const EGLint *attrib_list); #endif #endif /* EGL_KHR_stream */ #endif /* EGL_KHR_stream_producer_eglsurface */ #ifndef EGL_KHR_surfaceless_context #define EGL_KHR_surfaceless_context 1 #endif /* EGL_KHR_surfaceless_context */ #ifndef EGL_KHR_vg_parent_image #define EGL_KHR_vg_parent_image 1 #define EGL_VG_PARENT_IMAGE_KHR 0x30BA #endif /* EGL_KHR_vg_parent_image */ #ifndef EGL_KHR_wait_sync #define EGL_KHR_wait_sync 1 typedef EGLint (EGLAPIENTRYP PFNEGLWAITSYNCKHRPROC) (EGLDisplay dpy, EGLSyncKHR sync, EGLint flags); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLint EGLAPIENTRY eglWaitSyncKHR (EGLDisplay dpy, EGLSyncKHR sync, EGLint flags); #endif #endif /* EGL_KHR_wait_sync */ #ifndef EGL_ANDROID_blob_cache #define EGL_ANDROID_blob_cache 1 typedef khronos_ssize_t EGLsizeiANDROID; typedef void (*EGLSetBlobFuncANDROID) (const void *key, EGLsizeiANDROID keySize, const void *value, EGLsizeiANDROID valueSize); typedef EGLsizeiANDROID (*EGLGetBlobFuncANDROID) (const void *key, EGLsizeiANDROID keySize, void *value, EGLsizeiANDROID valueSize); typedef void (EGLAPIENTRYP PFNEGLSETBLOBCACHEFUNCSANDROIDPROC) (EGLDisplay dpy, EGLSetBlobFuncANDROID set, EGLGetBlobFuncANDROID get); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI void EGLAPIENTRY eglSetBlobCacheFuncsANDROID (EGLDisplay dpy, EGLSetBlobFuncANDROID set, EGLGetBlobFuncANDROID get); #endif #endif /* EGL_ANDROID_blob_cache */ #ifndef EGL_ANDROID_framebuffer_target #define EGL_ANDROID_framebuffer_target 1 #define EGL_FRAMEBUFFER_TARGET_ANDROID 0x3147 #endif /* EGL_ANDROID_framebuffer_target */ #ifndef EGL_ANDROID_image_native_buffer #define EGL_ANDROID_image_native_buffer 1 #define EGL_NATIVE_BUFFER_ANDROID 0x3140 #endif /* EGL_ANDROID_image_native_buffer */ #ifndef EGL_ANDROID_native_fence_sync #define EGL_ANDROID_native_fence_sync 1 #define EGL_SYNC_NATIVE_FENCE_ANDROID 0x3144 #define EGL_SYNC_NATIVE_FENCE_FD_ANDROID 0x3145 #define EGL_SYNC_NATIVE_FENCE_SIGNALED_ANDROID 0x3146 #define EGL_NO_NATIVE_FENCE_FD_ANDROID -1 typedef EGLint (EGLAPIENTRYP PFNEGLDUPNATIVEFENCEFDANDROIDPROC) (EGLDisplay dpy, EGLSyncKHR sync); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLint EGLAPIENTRY eglDupNativeFenceFDANDROID (EGLDisplay dpy, EGLSyncKHR sync); #endif #endif /* EGL_ANDROID_native_fence_sync */ #ifndef EGL_ANDROID_recordable #define EGL_ANDROID_recordable 1 #define EGL_RECORDABLE_ANDROID 0x3142 #endif /* EGL_ANDROID_recordable */ #ifndef EGL_ANGLE_d3d_share_handle_client_buffer #define EGL_ANGLE_d3d_share_handle_client_buffer 1 #define EGL_D3D_TEXTURE_2D_SHARE_HANDLE_ANGLE 0x3200 #endif /* EGL_ANGLE_d3d_share_handle_client_buffer */ #ifndef EGL_ANGLE_query_surface_pointer #define EGL_ANGLE_query_surface_pointer 1 typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYSURFACEPOINTERANGLEPROC) (EGLDisplay dpy, EGLSurface surface, EGLint attribute, void **value); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI 
EGLBoolean EGLAPIENTRY eglQuerySurfacePointerANGLE (EGLDisplay dpy, EGLSurface surface, EGLint attribute, void **value); #endif #endif /* EGL_ANGLE_query_surface_pointer */ #ifndef EGL_ANGLE_surface_d3d_texture_2d_share_handle #define EGL_ANGLE_surface_d3d_texture_2d_share_handle 1 #endif /* EGL_ANGLE_surface_d3d_texture_2d_share_handle */ #ifndef EGL_ARM_pixmap_multisample_discard #define EGL_ARM_pixmap_multisample_discard 1 #define EGL_DISCARD_SAMPLES_ARM 0x3286 #endif /* EGL_ARM_pixmap_multisample_discard */ #ifndef EGL_EXT_buffer_age #define EGL_EXT_buffer_age 1 #define EGL_BUFFER_AGE_EXT 0x313D #endif /* EGL_EXT_buffer_age */ #ifndef EGL_EXT_client_extensions #define EGL_EXT_client_extensions 1 #endif /* EGL_EXT_client_extensions */ #ifndef EGL_EXT_create_context_robustness #define EGL_EXT_create_context_robustness 1 #define EGL_CONTEXT_OPENGL_ROBUST_ACCESS_EXT 0x30BF #define EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_EXT 0x3138 #define EGL_NO_RESET_NOTIFICATION_EXT 0x31BE #define EGL_LOSE_CONTEXT_ON_RESET_EXT 0x31BF #endif /* EGL_EXT_create_context_robustness */ #ifndef EGL_EXT_image_dma_buf_import #define EGL_EXT_image_dma_buf_import 1 #define EGL_LINUX_DMA_BUF_EXT 0x3270 #define EGL_LINUX_DRM_FOURCC_EXT 0x3271 #define EGL_DMA_BUF_PLANE0_FD_EXT 0x3272 #define EGL_DMA_BUF_PLANE0_OFFSET_EXT 0x3273 #define EGL_DMA_BUF_PLANE0_PITCH_EXT 0x3274 #define EGL_DMA_BUF_PLANE1_FD_EXT 0x3275 #define EGL_DMA_BUF_PLANE1_OFFSET_EXT 0x3276 #define EGL_DMA_BUF_PLANE1_PITCH_EXT 0x3277 #define EGL_DMA_BUF_PLANE2_FD_EXT 0x3278 #define EGL_DMA_BUF_PLANE2_OFFSET_EXT 0x3279 #define EGL_DMA_BUF_PLANE2_PITCH_EXT 0x327A #define EGL_YUV_COLOR_SPACE_HINT_EXT 0x327B #define EGL_SAMPLE_RANGE_HINT_EXT 0x327C #define EGL_YUV_CHROMA_HORIZONTAL_SITING_HINT_EXT 0x327D #define EGL_YUV_CHROMA_VERTICAL_SITING_HINT_EXT 0x327E #define EGL_ITU_REC601_EXT 0x327F #define EGL_ITU_REC709_EXT 0x3280 #define EGL_ITU_REC2020_EXT 0x3281 #define EGL_YUV_FULL_RANGE_EXT 0x3282 #define EGL_YUV_NARROW_RANGE_EXT 0x3283 #define EGL_YUV_CHROMA_SITING_0_EXT 0x3284 #define EGL_YUV_CHROMA_SITING_0_5_EXT 0x3285 #endif /* EGL_EXT_image_dma_buf_import */ #ifndef EGL_EXT_multiview_window #define EGL_EXT_multiview_window 1 #define EGL_MULTIVIEW_VIEW_COUNT_EXT 0x3134 #endif /* EGL_EXT_multiview_window */ #ifndef EGL_EXT_platform_base #define EGL_EXT_platform_base 1 typedef EGLDisplay (EGLAPIENTRYP PFNEGLGETPLATFORMDISPLAYEXTPROC) (EGLenum platform, void *native_display, const EGLint *attrib_list); typedef EGLSurface (EGLAPIENTRYP PFNEGLCREATEPLATFORMWINDOWSURFACEEXTPROC) (EGLDisplay dpy, EGLConfig config, void *native_window, const EGLint *attrib_list); typedef EGLSurface (EGLAPIENTRYP PFNEGLCREATEPLATFORMPIXMAPSURFACEEXTPROC) (EGLDisplay dpy, EGLConfig config, void *native_pixmap, const EGLint *attrib_list); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLDisplay EGLAPIENTRY eglGetPlatformDisplayEXT (EGLenum platform, void *native_display, const EGLint *attrib_list); EGLAPI EGLSurface EGLAPIENTRY eglCreatePlatformWindowSurfaceEXT (EGLDisplay dpy, EGLConfig config, void *native_window, const EGLint *attrib_list); EGLAPI EGLSurface EGLAPIENTRY eglCreatePlatformPixmapSurfaceEXT (EGLDisplay dpy, EGLConfig config, void *native_pixmap, const EGLint *attrib_list); #endif #endif /* EGL_EXT_platform_base */ #ifndef EGL_EXT_platform_wayland #define EGL_EXT_platform_wayland 1 #define EGL_PLATFORM_WAYLAND_EXT 0x31D8 #endif /* EGL_EXT_platform_wayland */ #ifndef EGL_EXT_platform_x11 #define EGL_EXT_platform_x11 1 #define EGL_PLATFORM_X11_EXT 0x31D5 #define 
EGL_PLATFORM_X11_SCREEN_EXT 0x31D6 #endif /* EGL_EXT_platform_x11 */ #ifndef EGL_EXT_swap_buffers_with_damage #define EGL_EXT_swap_buffers_with_damage 1 typedef EGLBoolean (EGLAPIENTRYP PFNEGLSWAPBUFFERSWITHDAMAGEEXTPROC) (EGLDisplay dpy, EGLSurface surface, EGLint *rects, EGLint n_rects); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLBoolean EGLAPIENTRY eglSwapBuffersWithDamageEXT (EGLDisplay dpy, EGLSurface surface, EGLint *rects, EGLint n_rects); #endif #endif /* EGL_EXT_swap_buffers_with_damage */ #ifndef EGL_HI_clientpixmap #define EGL_HI_clientpixmap 1 struct EGLClientPixmapHI { void *pData; EGLint iWidth; EGLint iHeight; EGLint iStride; }; #define EGL_CLIENT_PIXMAP_POINTER_HI 0x8F74 typedef EGLSurface (EGLAPIENTRYP PFNEGLCREATEPIXMAPSURFACEHIPROC) (EGLDisplay dpy, EGLConfig config, struct EGLClientPixmapHI *pixmap); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLSurface EGLAPIENTRY eglCreatePixmapSurfaceHI (EGLDisplay dpy, EGLConfig config, struct EGLClientPixmapHI *pixmap); #endif #endif /* EGL_HI_clientpixmap */ #ifndef EGL_HI_colorformats #define EGL_HI_colorformats 1 #define EGL_COLOR_FORMAT_HI 0x8F70 #define EGL_COLOR_RGB_HI 0x8F71 #define EGL_COLOR_RGBA_HI 0x8F72 #define EGL_COLOR_ARGB_HI 0x8F73 #endif /* EGL_HI_colorformats */ #ifndef EGL_IMG_context_priority #define EGL_IMG_context_priority 1 #define EGL_CONTEXT_PRIORITY_LEVEL_IMG 0x3100 #define EGL_CONTEXT_PRIORITY_HIGH_IMG 0x3101 #define EGL_CONTEXT_PRIORITY_MEDIUM_IMG 0x3102 #define EGL_CONTEXT_PRIORITY_LOW_IMG 0x3103 #endif /* EGL_IMG_context_priority */ #ifndef EGL_MESA_drm_image #define EGL_MESA_drm_image 1 #define EGL_DRM_BUFFER_FORMAT_MESA 0x31D0 #define EGL_DRM_BUFFER_USE_MESA 0x31D1 #define EGL_DRM_BUFFER_FORMAT_ARGB32_MESA 0x31D2 #define EGL_DRM_BUFFER_MESA 0x31D3 #define EGL_DRM_BUFFER_STRIDE_MESA 0x31D4 #define EGL_DRM_BUFFER_USE_SCANOUT_MESA 0x00000001 #define EGL_DRM_BUFFER_USE_SHARE_MESA 0x00000002 typedef EGLImageKHR (EGLAPIENTRYP PFNEGLCREATEDRMIMAGEMESAPROC) (EGLDisplay dpy, const EGLint *attrib_list); typedef EGLBoolean (EGLAPIENTRYP PFNEGLEXPORTDRMIMAGEMESAPROC) (EGLDisplay dpy, EGLImageKHR image, EGLint *name, EGLint *handle, EGLint *stride); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLImageKHR EGLAPIENTRY eglCreateDRMImageMESA (EGLDisplay dpy, const EGLint *attrib_list); EGLAPI EGLBoolean EGLAPIENTRY eglExportDRMImageMESA (EGLDisplay dpy, EGLImageKHR image, EGLint *name, EGLint *handle, EGLint *stride); #endif #endif /* EGL_MESA_drm_image */ #ifndef EGL_MESA_platform_gbm #define EGL_MESA_platform_gbm 1 #define EGL_PLATFORM_GBM_MESA 0x31D7 #endif /* EGL_MESA_platform_gbm */ #ifndef EGL_NV_3dvision_surface #define EGL_NV_3dvision_surface 1 #define EGL_AUTO_STEREO_NV 0x3136 #endif /* EGL_NV_3dvision_surface */ #ifndef EGL_NV_coverage_sample #define EGL_NV_coverage_sample 1 #define EGL_COVERAGE_BUFFERS_NV 0x30E0 #define EGL_COVERAGE_SAMPLES_NV 0x30E1 #endif /* EGL_NV_coverage_sample */ #ifndef EGL_NV_coverage_sample_resolve #define EGL_NV_coverage_sample_resolve 1 #define EGL_COVERAGE_SAMPLE_RESOLVE_NV 0x3131 #define EGL_COVERAGE_SAMPLE_RESOLVE_DEFAULT_NV 0x3132 #define EGL_COVERAGE_SAMPLE_RESOLVE_NONE_NV 0x3133 #endif /* EGL_NV_coverage_sample_resolve */ #ifndef EGL_NV_depth_nonlinear #define EGL_NV_depth_nonlinear 1 #define EGL_DEPTH_ENCODING_NV 0x30E2 #define EGL_DEPTH_ENCODING_NONE_NV 0 #define EGL_DEPTH_ENCODING_NONLINEAR_NV 0x30E3 #endif /* EGL_NV_depth_nonlinear */ #ifndef EGL_NV_native_query #define EGL_NV_native_query 1 typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYNATIVEDISPLAYNVPROC) (EGLDisplay dpy, 
EGLNativeDisplayType *display_id); typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYNATIVEWINDOWNVPROC) (EGLDisplay dpy, EGLSurface surf, EGLNativeWindowType *window); typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYNATIVEPIXMAPNVPROC) (EGLDisplay dpy, EGLSurface surf, EGLNativePixmapType *pixmap); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLBoolean EGLAPIENTRY eglQueryNativeDisplayNV (EGLDisplay dpy, EGLNativeDisplayType *display_id); EGLAPI EGLBoolean EGLAPIENTRY eglQueryNativeWindowNV (EGLDisplay dpy, EGLSurface surf, EGLNativeWindowType *window); EGLAPI EGLBoolean EGLAPIENTRY eglQueryNativePixmapNV (EGLDisplay dpy, EGLSurface surf, EGLNativePixmapType *pixmap); #endif #endif /* EGL_NV_native_query */ #ifndef EGL_NV_post_convert_rounding #define EGL_NV_post_convert_rounding 1 #endif /* EGL_NV_post_convert_rounding */ #ifndef EGL_NV_post_sub_buffer #define EGL_NV_post_sub_buffer 1 #define EGL_POST_SUB_BUFFER_SUPPORTED_NV 0x30BE typedef EGLBoolean (EGLAPIENTRYP PFNEGLPOSTSUBBUFFERNVPROC) (EGLDisplay dpy, EGLSurface surface, EGLint x, EGLint y, EGLint width, EGLint height); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLBoolean EGLAPIENTRY eglPostSubBufferNV (EGLDisplay dpy, EGLSurface surface, EGLint x, EGLint y, EGLint width, EGLint height); #endif #endif /* EGL_NV_post_sub_buffer */ #ifndef EGL_NV_stream_sync #define EGL_NV_stream_sync 1 #define EGL_SYNC_NEW_FRAME_NV 0x321F typedef EGLSyncKHR (EGLAPIENTRYP PFNEGLCREATESTREAMSYNCNVPROC) (EGLDisplay dpy, EGLStreamKHR stream, EGLenum type, const EGLint *attrib_list); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLSyncKHR EGLAPIENTRY eglCreateStreamSyncNV (EGLDisplay dpy, EGLStreamKHR stream, EGLenum type, const EGLint *attrib_list); #endif #endif /* EGL_NV_stream_sync */ #ifndef EGL_NV_sync #define EGL_NV_sync 1 typedef void *EGLSyncNV; typedef khronos_utime_nanoseconds_t EGLTimeNV; #ifdef KHRONOS_SUPPORT_INT64 #define EGL_SYNC_PRIOR_COMMANDS_COMPLETE_NV 0x30E6 #define EGL_SYNC_STATUS_NV 0x30E7 #define EGL_SIGNALED_NV 0x30E8 #define EGL_UNSIGNALED_NV 0x30E9 #define EGL_SYNC_FLUSH_COMMANDS_BIT_NV 0x0001 #define EGL_FOREVER_NV 0xFFFFFFFFFFFFFFFFull #define EGL_ALREADY_SIGNALED_NV 0x30EA #define EGL_TIMEOUT_EXPIRED_NV 0x30EB #define EGL_CONDITION_SATISFIED_NV 0x30EC #define EGL_SYNC_TYPE_NV 0x30ED #define EGL_SYNC_CONDITION_NV 0x30EE #define EGL_SYNC_FENCE_NV 0x30EF #define EGL_NO_SYNC_NV ((EGLSyncNV)0) typedef EGLSyncNV (EGLAPIENTRYP PFNEGLCREATEFENCESYNCNVPROC) (EGLDisplay dpy, EGLenum condition, const EGLint *attrib_list); typedef EGLBoolean (EGLAPIENTRYP PFNEGLDESTROYSYNCNVPROC) (EGLSyncNV sync); typedef EGLBoolean (EGLAPIENTRYP PFNEGLFENCENVPROC) (EGLSyncNV sync); typedef EGLint (EGLAPIENTRYP PFNEGLCLIENTWAITSYNCNVPROC) (EGLSyncNV sync, EGLint flags, EGLTimeNV timeout); typedef EGLBoolean (EGLAPIENTRYP PFNEGLSIGNALSYNCNVPROC) (EGLSyncNV sync, EGLenum mode); typedef EGLBoolean (EGLAPIENTRYP PFNEGLGETSYNCATTRIBNVPROC) (EGLSyncNV sync, EGLint attribute, EGLint *value); #ifdef EGL_EGLEXT_PROTOTYPES EGLAPI EGLSyncNV EGLAPIENTRY eglCreateFenceSyncNV (EGLDisplay dpy, EGLenum condition, const EGLint *attrib_list); EGLAPI EGLBoolean EGLAPIENTRY eglDestroySyncNV (EGLSyncNV sync); EGLAPI EGLBoolean EGLAPIENTRY eglFenceNV (EGLSyncNV sync); EGLAPI EGLint EGLAPIENTRY eglClientWaitSyncNV (EGLSyncNV sync, EGLint flags, EGLTimeNV timeout); EGLAPI EGLBoolean EGLAPIENTRY eglSignalSyncNV (EGLSyncNV sync, EGLenum mode); EGLAPI EGLBoolean EGLAPIENTRY eglGetSyncAttribNV (EGLSyncNV sync, EGLint attribute, EGLint *value); #endif #endif /* KHRONOS_SUPPORT_INT64 */ #endif /* EGL_NV_sync */ 
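Every extension block in this header follows the same shape: the PFN...PROC typedef is always declared, while the eglFooKHR prototype only appears when the application defines EGL_EGLEXT_PROTOTYPES. A minimal sketch of driving EGL_KHR_reusable_sync through the typedef alone, assuming a KHRONOS_SUPPORT_INT64 platform and that the extension is advertised by eglQueryString(dpy, EGL_EXTENSIONS); wait_for_sync is an illustrative helper, not part of the header:

#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Resolve a KHR sync entry point without defining EGL_EGLEXT_PROTOTYPES. */
static EGLBoolean wait_for_sync(EGLDisplay dpy, EGLSyncKHR sync)
{
    PFNEGLCLIENTWAITSYNCKHRPROC pfnClientWaitSyncKHR =
        (PFNEGLCLIENTWAITSYNCKHRPROC)eglGetProcAddress("eglClientWaitSyncKHR");
    if (pfnClientWaitSyncKHR == NULL)
        return EGL_FALSE;
    /* Flush pending commands and block until the sync object signals. */
    return pfnClientWaitSyncKHR(dpy, sync, EGL_SYNC_FLUSH_COMMANDS_BIT_KHR,
                                EGL_FOREVER_KHR) == EGL_CONDITION_SATISFIED_KHR;
}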
#ifndef EGL_NV_system_time
#define EGL_NV_system_time 1
typedef khronos_utime_nanoseconds_t EGLuint64NV;
#ifdef KHRONOS_SUPPORT_INT64
typedef EGLuint64NV (EGLAPIENTRYP PFNEGLGETSYSTEMTIMEFREQUENCYNVPROC) (void);
typedef EGLuint64NV (EGLAPIENTRYP PFNEGLGETSYSTEMTIMENVPROC) (void);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLuint64NV EGLAPIENTRY eglGetSystemTimeFrequencyNV (void);
EGLAPI EGLuint64NV EGLAPIENTRY eglGetSystemTimeNV (void);
#endif
#endif /* KHRONOS_SUPPORT_INT64 */
#endif /* EGL_NV_system_time */

#ifdef __cplusplus
}
#endif

#endif

clr-rocm-5.7.1/opencl/khronos/headers/EGL/eglplatform.h

#ifndef __eglplatform_h_
#define __eglplatform_h_

/*
** Copyright (c) 2007-2013 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
** "Materials"), to deal in the Materials without restriction, including
** without limitation the rights to use, copy, modify, merge, publish,
** distribute, sublicense, and/or sell copies of the Materials, and to
** permit persons to whom the Materials are furnished to do so, subject to
** the following conditions:
**
** The above copyright notice and this permission notice shall be included
** in all copies or substantial portions of the Materials.
**
** THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
** EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
** MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
** IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
** CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
** TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
** MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
*/

/* Platform-specific types and definitions for egl.h
 * $Revision: 23432 $ on $Date: 2013-10-09 00:57:24 -0700 (Wed, 09 Oct 2013) $
 *
 * Adopters may modify khrplatform.h and this file to suit their platform.
 * You are encouraged to submit all modifications to the Khronos group so that
 * they can be included in future versions of this file. Please submit changes
 * by sending them to the public Khronos Bugzilla (http://khronos.org/bugzilla)
 * by filing a bug against product "EGL" component "Registry".
 */

#include <KHR/khrplatform.h>

/* Macros used in EGL function prototype declarations.
 *
 * EGL functions should be prototyped as:
 *
 * EGLAPI return-type EGLAPIENTRY eglFunction(arguments);
 * typedef return-type (EGLAPIENTRYP PFNEGLFUNCTIONPROC) (arguments);
 *
 * KHRONOS_APICALL and KHRONOS_APIENTRY are defined in KHR/khrplatform.h
 */
#ifndef EGLAPI
#define EGLAPI KHRONOS_APICALL
#endif

#ifndef EGLAPIENTRY
#define EGLAPIENTRY KHRONOS_APIENTRY
#endif
#define EGLAPIENTRYP EGLAPIENTRY*

/* The types NativeDisplayType, NativeWindowType, and NativePixmapType
 * are aliases of window-system-dependent types, such as X Display * or
 * Windows Device Context. They must be defined in platform-specific
 * code below. The EGL-prefixed versions of Native*Type are the same
 * types, renamed in EGL 1.3 so all types in the API start with "EGL".
 *
 * Khronos STRONGLY RECOMMENDS that you use the default definitions
 * provided below, since these changes affect both binary and source
 * portability of applications using EGL running on different EGL
 * implementations.
 */
#if defined(_WIN32) || defined(__VC32__) && !defined(__CYGWIN__) && !defined(__SCITECH_SNAP__) /* Win32 and WinCE */
#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN 1
#endif
#include <windows.h>

typedef HDC     EGLNativeDisplayType;
typedef HBITMAP EGLNativePixmapType;
typedef HWND    EGLNativeWindowType;

#elif defined(__WINSCW__) || defined(__SYMBIAN32__) /* Symbian */

typedef int   EGLNativeDisplayType;
typedef void *EGLNativeWindowType;
typedef void *EGLNativePixmapType;

#elif defined(__ANDROID__) || defined(ANDROID)

#include <android/native_window.h>

struct egl_native_pixmap_t;

typedef struct ANativeWindow*       EGLNativeWindowType;
typedef struct egl_native_pixmap_t* EGLNativePixmapType;
typedef void*                       EGLNativeDisplayType;

#elif defined(__unix__)

/* X11 (tentative) */
#include <X11/Xlib.h>
#include <X11/Xutil.h>

typedef Display *EGLNativeDisplayType;
typedef Pixmap   EGLNativePixmapType;
typedef Window   EGLNativeWindowType;

#else
#error "Platform not recognized"
#endif

/* EGL 1.2 types, renamed for consistency in EGL 1.3 */
typedef EGLNativeDisplayType NativeDisplayType;
typedef EGLNativePixmapType  NativePixmapType;
typedef EGLNativeWindowType  NativeWindowType;

/* Define EGLint. This must be a signed integral type large enough to contain
 * all legal attribute names and values passed into and out of EGL, whether
 * their type is boolean, bitmask, enumerant (symbolic constant), integer,
 * handle, or other. While in general a 32-bit integer will suffice, if
 * handles are 64 bit types, then EGLint should be defined as a signed 64-bit
 * integer type.
 */
typedef khronos_int32_t EGLint;

#endif /* __eglplatform_h */
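The EGLint comment above is the contract behind every attrib_list parameter in these headers: attributes travel as a flat array of EGLint name/value pairs terminated by the single token EGL_NONE. A minimal sketch, assuming an EGL 1.4 implementation (the attribute choices and the config_attribs name are illustrative):

#include <EGL/egl.h>

/* Name/value pairs, terminated by EGL_NONE. */
static const EGLint config_attribs[] = {
    EGL_RED_SIZE,        8,
    EGL_GREEN_SIZE,      8,
    EGL_BLUE_SIZE,       8,
    EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
    EGL_NONE
};

/* Typical use with eglChooseConfig from egl.h:
 *   EGLConfig cfg; EGLint n;
 *   eglChooseConfig(dpy, config_attribs, &cfg, 1, &n);
 */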
clr-rocm-5.7.1/opencl/khronos/headers/GL/glext.h

#ifndef __glext_h_
#define __glext_h_

#ifdef __cplusplus
extern "C" {
#endif

/*
** Copyright (c) 2007-2009 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
** "Materials"), to deal in the Materials without restriction, including
** without limitation the rights to use, copy, modify, merge, publish,
** distribute, sublicense, and/or sell copies of the Materials, and to
** permit persons to whom the Materials are furnished to do so, subject to
** the following conditions:
**
** The above copyright notice and this permission notice shall be included
** in all copies or substantial portions of the Materials.
**
** THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
** EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
** MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
** IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
** CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
** TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
** MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
*/

/* Header file version number, required by OpenGL ABI for Linux */
/* glext.h last updated $Date: 2009-08-03 02:13:51 -0700 (Mon, 03 Aug 2009) $ */
/* Current version at http://www.opengl.org/registry/ */
#define GL_GLEXT_VERSION 54

/* Function declaration macros - to move into glplatform.h */

#if defined(_WIN32) && !defined(APIENTRY) && !defined(__CYGWIN__) && !defined(__SCITECH_SNAP__)
#define WIN32_LEAN_AND_MEAN 1
#include <windows.h>
#endif

#ifndef APIENTRY
#define APIENTRY
#endif
#ifndef APIENTRYP
#define APIENTRYP APIENTRY *
#endif
#ifndef GLAPI
#define GLAPI extern
#endif

/*************************************************************/

#ifndef GL_VERSION_1_2
#define GL_UNSIGNED_BYTE_3_3_2 0x8032
#define GL_UNSIGNED_SHORT_4_4_4_4 0x8033
#define GL_UNSIGNED_SHORT_5_5_5_1 0x8034
#define GL_UNSIGNED_INT_8_8_8_8 0x8035
#define GL_UNSIGNED_INT_10_10_10_2 0x8036
#define GL_TEXTURE_BINDING_3D 0x806A
#define GL_PACK_SKIP_IMAGES 0x806B
#define GL_PACK_IMAGE_HEIGHT 0x806C
#define GL_UNPACK_SKIP_IMAGES 0x806D
#define GL_UNPACK_IMAGE_HEIGHT 0x806E
#define GL_TEXTURE_3D 0x806F
#define GL_PROXY_TEXTURE_3D 0x8070
#define GL_TEXTURE_DEPTH 0x8071
#define GL_TEXTURE_WRAP_R 0x8072
#define GL_MAX_3D_TEXTURE_SIZE 0x8073
#define GL_UNSIGNED_BYTE_2_3_3_REV 0x8362
#define GL_UNSIGNED_SHORT_5_6_5 0x8363
#define GL_UNSIGNED_SHORT_5_6_5_REV 0x8364
#define GL_UNSIGNED_SHORT_4_4_4_4_REV 0x8365
#define GL_UNSIGNED_SHORT_1_5_5_5_REV 0x8366
#define GL_UNSIGNED_INT_8_8_8_8_REV 0x8367
#define GL_UNSIGNED_INT_2_10_10_10_REV 0x8368
#define GL_BGR 0x80E0
#define GL_BGRA 0x80E1
#define GL_MAX_ELEMENTS_VERTICES 0x80E8
#define GL_MAX_ELEMENTS_INDICES 0x80E9
#define GL_CLAMP_TO_EDGE 0x812F
#define GL_TEXTURE_MIN_LOD 0x813A
#define GL_TEXTURE_MAX_LOD 0x813B
#define GL_TEXTURE_BASE_LEVEL 0x813C
#define GL_TEXTURE_MAX_LEVEL 0x813D
#define GL_SMOOTH_POINT_SIZE_RANGE 0x0B12
#define GL_SMOOTH_POINT_SIZE_GRANULARITY 0x0B13
#define GL_SMOOTH_LINE_WIDTH_RANGE 0x0B22
#define GL_SMOOTH_LINE_WIDTH_GRANULARITY 0x0B23
#define GL_ALIASED_LINE_WIDTH_RANGE 0x846E
#endif

#ifndef GL_VERSION_1_2_DEPRECATED
#define GL_RESCALE_NORMAL 0x803A
#define GL_LIGHT_MODEL_COLOR_CONTROL 0x81F8
#define GL_SINGLE_COLOR 0x81F9
#define GL_SEPARATE_SPECULAR_COLOR 0x81FA
#define GL_ALIASED_POINT_SIZE_RANGE 0x846D
#endif

#ifndef GL_ARB_imaging
#define GL_CONSTANT_COLOR 0x8001
#define GL_ONE_MINUS_CONSTANT_COLOR 0x8002
#define GL_CONSTANT_ALPHA 0x8003
#define GL_ONE_MINUS_CONSTANT_ALPHA 0x8004
#define GL_BLEND_COLOR 0x8005
#define GL_FUNC_ADD 0x8006
#define GL_MIN 0x8007
#define GL_MAX 0x8008
#define GL_BLEND_EQUATION 0x8009
#define GL_FUNC_SUBTRACT 0x800A
#define GL_FUNC_REVERSE_SUBTRACT 0x800B
#endif

#ifndef GL_ARB_imaging_DEPRECATED
#define GL_CONVOLUTION_1D 0x8010
#define GL_CONVOLUTION_2D 0x8011
#define GL_SEPARABLE_2D 0x8012
#define GL_CONVOLUTION_BORDER_MODE 0x8013
#define GL_CONVOLUTION_FILTER_SCALE 0x8014
#define GL_CONVOLUTION_FILTER_BIAS 0x8015
#define GL_REDUCE 0x8016
#define GL_CONVOLUTION_FORMAT 0x8017
#define GL_CONVOLUTION_WIDTH 0x8018
#define GL_CONVOLUTION_HEIGHT 0x8019
#define GL_MAX_CONVOLUTION_WIDTH 0x801A
#define GL_MAX_CONVOLUTION_HEIGHT 0x801B
#define GL_POST_CONVOLUTION_RED_SCALE 0x801C
#define GL_POST_CONVOLUTION_GREEN_SCALE 0x801D
#define GL_POST_CONVOLUTION_BLUE_SCALE 0x801E
#define GL_POST_CONVOLUTION_ALPHA_SCALE 0x801F
#define GL_POST_CONVOLUTION_RED_BIAS 0x8020
#define GL_POST_CONVOLUTION_GREEN_BIAS 0x8021
#define GL_POST_CONVOLUTION_BLUE_BIAS 0x8022
#define GL_POST_CONVOLUTION_ALPHA_BIAS 0x8023
#define GL_HISTOGRAM 0x8024
#define
GL_PROXY_HISTOGRAM 0x8025 #define GL_HISTOGRAM_WIDTH 0x8026 #define GL_HISTOGRAM_FORMAT 0x8027 #define GL_HISTOGRAM_RED_SIZE 0x8028 #define GL_HISTOGRAM_GREEN_SIZE 0x8029 #define GL_HISTOGRAM_BLUE_SIZE 0x802A #define GL_HISTOGRAM_ALPHA_SIZE 0x802B #define GL_HISTOGRAM_LUMINANCE_SIZE 0x802C #define GL_HISTOGRAM_SINK 0x802D #define GL_MINMAX 0x802E #define GL_MINMAX_FORMAT 0x802F #define GL_MINMAX_SINK 0x8030 #define GL_TABLE_TOO_LARGE 0x8031 #define GL_COLOR_MATRIX 0x80B1 #define GL_COLOR_MATRIX_STACK_DEPTH 0x80B2 #define GL_MAX_COLOR_MATRIX_STACK_DEPTH 0x80B3 #define GL_POST_COLOR_MATRIX_RED_SCALE 0x80B4 #define GL_POST_COLOR_MATRIX_GREEN_SCALE 0x80B5 #define GL_POST_COLOR_MATRIX_BLUE_SCALE 0x80B6 #define GL_POST_COLOR_MATRIX_ALPHA_SCALE 0x80B7 #define GL_POST_COLOR_MATRIX_RED_BIAS 0x80B8 #define GL_POST_COLOR_MATRIX_GREEN_BIAS 0x80B9 #define GL_POST_COLOR_MATRIX_BLUE_BIAS 0x80BA #define GL_POST_COLOR_MATRIX_ALPHA_BIAS 0x80BB #define GL_COLOR_TABLE 0x80D0 #define GL_POST_CONVOLUTION_COLOR_TABLE 0x80D1 #define GL_POST_COLOR_MATRIX_COLOR_TABLE 0x80D2 #define GL_PROXY_COLOR_TABLE 0x80D3 #define GL_PROXY_POST_CONVOLUTION_COLOR_TABLE 0x80D4 #define GL_PROXY_POST_COLOR_MATRIX_COLOR_TABLE 0x80D5 #define GL_COLOR_TABLE_SCALE 0x80D6 #define GL_COLOR_TABLE_BIAS 0x80D7 #define GL_COLOR_TABLE_FORMAT 0x80D8 #define GL_COLOR_TABLE_WIDTH 0x80D9 #define GL_COLOR_TABLE_RED_SIZE 0x80DA #define GL_COLOR_TABLE_GREEN_SIZE 0x80DB #define GL_COLOR_TABLE_BLUE_SIZE 0x80DC #define GL_COLOR_TABLE_ALPHA_SIZE 0x80DD #define GL_COLOR_TABLE_LUMINANCE_SIZE 0x80DE #define GL_COLOR_TABLE_INTENSITY_SIZE 0x80DF #define GL_CONSTANT_BORDER 0x8151 #define GL_REPLICATE_BORDER 0x8153 #define GL_CONVOLUTION_BORDER_COLOR 0x8154 #endif #ifndef GL_VERSION_1_3 #define GL_TEXTURE0 0x84C0 #define GL_TEXTURE1 0x84C1 #define GL_TEXTURE2 0x84C2 #define GL_TEXTURE3 0x84C3 #define GL_TEXTURE4 0x84C4 #define GL_TEXTURE5 0x84C5 #define GL_TEXTURE6 0x84C6 #define GL_TEXTURE7 0x84C7 #define GL_TEXTURE8 0x84C8 #define GL_TEXTURE9 0x84C9 #define GL_TEXTURE10 0x84CA #define GL_TEXTURE11 0x84CB #define GL_TEXTURE12 0x84CC #define GL_TEXTURE13 0x84CD #define GL_TEXTURE14 0x84CE #define GL_TEXTURE15 0x84CF #define GL_TEXTURE16 0x84D0 #define GL_TEXTURE17 0x84D1 #define GL_TEXTURE18 0x84D2 #define GL_TEXTURE19 0x84D3 #define GL_TEXTURE20 0x84D4 #define GL_TEXTURE21 0x84D5 #define GL_TEXTURE22 0x84D6 #define GL_TEXTURE23 0x84D7 #define GL_TEXTURE24 0x84D8 #define GL_TEXTURE25 0x84D9 #define GL_TEXTURE26 0x84DA #define GL_TEXTURE27 0x84DB #define GL_TEXTURE28 0x84DC #define GL_TEXTURE29 0x84DD #define GL_TEXTURE30 0x84DE #define GL_TEXTURE31 0x84DF #define GL_ACTIVE_TEXTURE 0x84E0 #define GL_MULTISAMPLE 0x809D #define GL_SAMPLE_ALPHA_TO_COVERAGE 0x809E #define GL_SAMPLE_ALPHA_TO_ONE 0x809F #define GL_SAMPLE_COVERAGE 0x80A0 #define GL_SAMPLE_BUFFERS 0x80A8 #define GL_SAMPLES 0x80A9 #define GL_SAMPLE_COVERAGE_VALUE 0x80AA #define GL_SAMPLE_COVERAGE_INVERT 0x80AB #define GL_TEXTURE_CUBE_MAP 0x8513 #define GL_TEXTURE_BINDING_CUBE_MAP 0x8514 #define GL_TEXTURE_CUBE_MAP_POSITIVE_X 0x8515 #define GL_TEXTURE_CUBE_MAP_NEGATIVE_X 0x8516 #define GL_TEXTURE_CUBE_MAP_POSITIVE_Y 0x8517 #define GL_TEXTURE_CUBE_MAP_NEGATIVE_Y 0x8518 #define GL_TEXTURE_CUBE_MAP_POSITIVE_Z 0x8519 #define GL_TEXTURE_CUBE_MAP_NEGATIVE_Z 0x851A #define GL_PROXY_TEXTURE_CUBE_MAP 0x851B #define GL_MAX_CUBE_MAP_TEXTURE_SIZE 0x851C #define GL_COMPRESSED_RGB 0x84ED #define GL_COMPRESSED_RGBA 0x84EE #define GL_TEXTURE_COMPRESSION_HINT 0x84EF #define GL_TEXTURE_COMPRESSED_IMAGE_SIZE 0x86A0 #define 
GL_TEXTURE_COMPRESSED 0x86A1 #define GL_NUM_COMPRESSED_TEXTURE_FORMATS 0x86A2 #define GL_COMPRESSED_TEXTURE_FORMATS 0x86A3 #define GL_CLAMP_TO_BORDER 0x812D #endif #ifndef GL_VERSION_1_3_DEPRECATED #define GL_CLIENT_ACTIVE_TEXTURE 0x84E1 #define GL_MAX_TEXTURE_UNITS 0x84E2 #define GL_TRANSPOSE_MODELVIEW_MATRIX 0x84E3 #define GL_TRANSPOSE_PROJECTION_MATRIX 0x84E4 #define GL_TRANSPOSE_TEXTURE_MATRIX 0x84E5 #define GL_TRANSPOSE_COLOR_MATRIX 0x84E6 #define GL_MULTISAMPLE_BIT 0x20000000 #define GL_NORMAL_MAP 0x8511 #define GL_REFLECTION_MAP 0x8512 #define GL_COMPRESSED_ALPHA 0x84E9 #define GL_COMPRESSED_LUMINANCE 0x84EA #define GL_COMPRESSED_LUMINANCE_ALPHA 0x84EB #define GL_COMPRESSED_INTENSITY 0x84EC #define GL_COMBINE 0x8570 #define GL_COMBINE_RGB 0x8571 #define GL_COMBINE_ALPHA 0x8572 #define GL_SOURCE0_RGB 0x8580 #define GL_SOURCE1_RGB 0x8581 #define GL_SOURCE2_RGB 0x8582 #define GL_SOURCE0_ALPHA 0x8588 #define GL_SOURCE1_ALPHA 0x8589 #define GL_SOURCE2_ALPHA 0x858A #define GL_OPERAND0_RGB 0x8590 #define GL_OPERAND1_RGB 0x8591 #define GL_OPERAND2_RGB 0x8592 #define GL_OPERAND0_ALPHA 0x8598 #define GL_OPERAND1_ALPHA 0x8599 #define GL_OPERAND2_ALPHA 0x859A #define GL_RGB_SCALE 0x8573 #define GL_ADD_SIGNED 0x8574 #define GL_INTERPOLATE 0x8575 #define GL_SUBTRACT 0x84E7 #define GL_CONSTANT 0x8576 #define GL_PRIMARY_COLOR 0x8577 #define GL_PREVIOUS 0x8578 #define GL_DOT3_RGB 0x86AE #define GL_DOT3_RGBA 0x86AF #endif #ifndef GL_VERSION_1_4 #define GL_BLEND_DST_RGB 0x80C8 #define GL_BLEND_SRC_RGB 0x80C9 #define GL_BLEND_DST_ALPHA 0x80CA #define GL_BLEND_SRC_ALPHA 0x80CB #define GL_POINT_FADE_THRESHOLD_SIZE 0x8128 #define GL_DEPTH_COMPONENT16 0x81A5 #define GL_DEPTH_COMPONENT24 0x81A6 #define GL_DEPTH_COMPONENT32 0x81A7 #define GL_MIRRORED_REPEAT 0x8370 #define GL_MAX_TEXTURE_LOD_BIAS 0x84FD #define GL_TEXTURE_LOD_BIAS 0x8501 #define GL_INCR_WRAP 0x8507 #define GL_DECR_WRAP 0x8508 #define GL_TEXTURE_DEPTH_SIZE 0x884A #define GL_TEXTURE_COMPARE_MODE 0x884C #define GL_TEXTURE_COMPARE_FUNC 0x884D #endif #ifndef GL_VERSION_1_4_DEPRECATED #define GL_POINT_SIZE_MIN 0x8126 #define GL_POINT_SIZE_MAX 0x8127 #define GL_POINT_DISTANCE_ATTENUATION 0x8129 #define GL_GENERATE_MIPMAP 0x8191 #define GL_GENERATE_MIPMAP_HINT 0x8192 #define GL_FOG_COORDINATE_SOURCE 0x8450 #define GL_FOG_COORDINATE 0x8451 #define GL_FRAGMENT_DEPTH 0x8452 #define GL_CURRENT_FOG_COORDINATE 0x8453 #define GL_FOG_COORDINATE_ARRAY_TYPE 0x8454 #define GL_FOG_COORDINATE_ARRAY_STRIDE 0x8455 #define GL_FOG_COORDINATE_ARRAY_POINTER 0x8456 #define GL_FOG_COORDINATE_ARRAY 0x8457 #define GL_COLOR_SUM 0x8458 #define GL_CURRENT_SECONDARY_COLOR 0x8459 #define GL_SECONDARY_COLOR_ARRAY_SIZE 0x845A #define GL_SECONDARY_COLOR_ARRAY_TYPE 0x845B #define GL_SECONDARY_COLOR_ARRAY_STRIDE 0x845C #define GL_SECONDARY_COLOR_ARRAY_POINTER 0x845D #define GL_SECONDARY_COLOR_ARRAY 0x845E #define GL_TEXTURE_FILTER_CONTROL 0x8500 #define GL_DEPTH_TEXTURE_MODE 0x884B #define GL_COMPARE_R_TO_TEXTURE 0x884E #endif #ifndef GL_VERSION_1_5 #define GL_BUFFER_SIZE 0x8764 #define GL_BUFFER_USAGE 0x8765 #define GL_QUERY_COUNTER_BITS 0x8864 #define GL_CURRENT_QUERY 0x8865 #define GL_QUERY_RESULT 0x8866 #define GL_QUERY_RESULT_AVAILABLE 0x8867 #define GL_ARRAY_BUFFER 0x8892 #define GL_ELEMENT_ARRAY_BUFFER 0x8893 #define GL_ARRAY_BUFFER_BINDING 0x8894 #define GL_ELEMENT_ARRAY_BUFFER_BINDING 0x8895 #define GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING 0x889F #define GL_READ_ONLY 0x88B8 #define GL_WRITE_ONLY 0x88B9 #define GL_READ_WRITE 0x88BA #define GL_BUFFER_ACCESS 0x88BB #define 
GL_BUFFER_MAPPED 0x88BC #define GL_BUFFER_MAP_POINTER 0x88BD #define GL_STREAM_DRAW 0x88E0 #define GL_STREAM_READ 0x88E1 #define GL_STREAM_COPY 0x88E2 #define GL_STATIC_DRAW 0x88E4 #define GL_STATIC_READ 0x88E5 #define GL_STATIC_COPY 0x88E6 #define GL_DYNAMIC_DRAW 0x88E8 #define GL_DYNAMIC_READ 0x88E9 #define GL_DYNAMIC_COPY 0x88EA #define GL_SAMPLES_PASSED 0x8914 #endif #ifndef GL_VERSION_1_5_DEPRECATED #define GL_VERTEX_ARRAY_BUFFER_BINDING 0x8896 #define GL_NORMAL_ARRAY_BUFFER_BINDING 0x8897 #define GL_COLOR_ARRAY_BUFFER_BINDING 0x8898 #define GL_INDEX_ARRAY_BUFFER_BINDING 0x8899 #define GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING 0x889A #define GL_EDGE_FLAG_ARRAY_BUFFER_BINDING 0x889B #define GL_SECONDARY_COLOR_ARRAY_BUFFER_BINDING 0x889C #define GL_FOG_COORDINATE_ARRAY_BUFFER_BINDING 0x889D #define GL_WEIGHT_ARRAY_BUFFER_BINDING 0x889E #define GL_FOG_COORD_SRC 0x8450 #define GL_FOG_COORD 0x8451 #define GL_CURRENT_FOG_COORD 0x8453 #define GL_FOG_COORD_ARRAY_TYPE 0x8454 #define GL_FOG_COORD_ARRAY_STRIDE 0x8455 #define GL_FOG_COORD_ARRAY_POINTER 0x8456 #define GL_FOG_COORD_ARRAY 0x8457 #define GL_FOG_COORD_ARRAY_BUFFER_BINDING 0x889D #define GL_SRC0_RGB 0x8580 #define GL_SRC1_RGB 0x8581 #define GL_SRC2_RGB 0x8582 #define GL_SRC0_ALPHA 0x8588 #define GL_SRC1_ALPHA 0x8589 #define GL_SRC2_ALPHA 0x858A #endif #ifndef GL_VERSION_2_0 #define GL_BLEND_EQUATION_RGB 0x8009 #define GL_VERTEX_ATTRIB_ARRAY_ENABLED 0x8622 #define GL_VERTEX_ATTRIB_ARRAY_SIZE 0x8623 #define GL_VERTEX_ATTRIB_ARRAY_STRIDE 0x8624 #define GL_VERTEX_ATTRIB_ARRAY_TYPE 0x8625 #define GL_CURRENT_VERTEX_ATTRIB 0x8626 #define GL_VERTEX_PROGRAM_POINT_SIZE 0x8642 #define GL_VERTEX_ATTRIB_ARRAY_POINTER 0x8645 #define GL_STENCIL_BACK_FUNC 0x8800 #define GL_STENCIL_BACK_FAIL 0x8801 #define GL_STENCIL_BACK_PASS_DEPTH_FAIL 0x8802 #define GL_STENCIL_BACK_PASS_DEPTH_PASS 0x8803 #define GL_MAX_DRAW_BUFFERS 0x8824 #define GL_DRAW_BUFFER0 0x8825 #define GL_DRAW_BUFFER1 0x8826 #define GL_DRAW_BUFFER2 0x8827 #define GL_DRAW_BUFFER3 0x8828 #define GL_DRAW_BUFFER4 0x8829 #define GL_DRAW_BUFFER5 0x882A #define GL_DRAW_BUFFER6 0x882B #define GL_DRAW_BUFFER7 0x882C #define GL_DRAW_BUFFER8 0x882D #define GL_DRAW_BUFFER9 0x882E #define GL_DRAW_BUFFER10 0x882F #define GL_DRAW_BUFFER11 0x8830 #define GL_DRAW_BUFFER12 0x8831 #define GL_DRAW_BUFFER13 0x8832 #define GL_DRAW_BUFFER14 0x8833 #define GL_DRAW_BUFFER15 0x8834 #define GL_BLEND_EQUATION_ALPHA 0x883D #define GL_MAX_VERTEX_ATTRIBS 0x8869 #define GL_VERTEX_ATTRIB_ARRAY_NORMALIZED 0x886A #define GL_MAX_TEXTURE_IMAGE_UNITS 0x8872 #define GL_FRAGMENT_SHADER 0x8B30 #define GL_VERTEX_SHADER 0x8B31 #define GL_MAX_FRAGMENT_UNIFORM_COMPONENTS 0x8B49 #define GL_MAX_VERTEX_UNIFORM_COMPONENTS 0x8B4A #define GL_MAX_VARYING_FLOATS 0x8B4B #define GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS 0x8B4C #define GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS 0x8B4D #define GL_SHADER_TYPE 0x8B4F #define GL_FLOAT_VEC2 0x8B50 #define GL_FLOAT_VEC3 0x8B51 #define GL_FLOAT_VEC4 0x8B52 #define GL_INT_VEC2 0x8B53 #define GL_INT_VEC3 0x8B54 #define GL_INT_VEC4 0x8B55 #define GL_BOOL 0x8B56 #define GL_BOOL_VEC2 0x8B57 #define GL_BOOL_VEC3 0x8B58 #define GL_BOOL_VEC4 0x8B59 #define GL_FLOAT_MAT2 0x8B5A #define GL_FLOAT_MAT3 0x8B5B #define GL_FLOAT_MAT4 0x8B5C #define GL_SAMPLER_1D 0x8B5D #define GL_SAMPLER_2D 0x8B5E #define GL_SAMPLER_3D 0x8B5F #define GL_SAMPLER_CUBE 0x8B60 #define GL_SAMPLER_1D_SHADOW 0x8B61 #define GL_SAMPLER_2D_SHADOW 0x8B62 #define GL_DELETE_STATUS 0x8B80 #define GL_COMPILE_STATUS 0x8B81 #define GL_LINK_STATUS 0x8B82 #define 
GL_VALIDATE_STATUS 0x8B83 #define GL_INFO_LOG_LENGTH 0x8B84 #define GL_ATTACHED_SHADERS 0x8B85 #define GL_ACTIVE_UNIFORMS 0x8B86 #define GL_ACTIVE_UNIFORM_MAX_LENGTH 0x8B87 #define GL_SHADER_SOURCE_LENGTH 0x8B88 #define GL_ACTIVE_ATTRIBUTES 0x8B89 #define GL_ACTIVE_ATTRIBUTE_MAX_LENGTH 0x8B8A #define GL_FRAGMENT_SHADER_DERIVATIVE_HINT 0x8B8B #define GL_SHADING_LANGUAGE_VERSION 0x8B8C #define GL_CURRENT_PROGRAM 0x8B8D #define GL_POINT_SPRITE_COORD_ORIGIN 0x8CA0 #define GL_LOWER_LEFT 0x8CA1 #define GL_UPPER_LEFT 0x8CA2 #define GL_STENCIL_BACK_REF 0x8CA3 #define GL_STENCIL_BACK_VALUE_MASK 0x8CA4 #define GL_STENCIL_BACK_WRITEMASK 0x8CA5 #endif #ifndef GL_VERSION_2_0_DEPRECATED #define GL_VERTEX_PROGRAM_TWO_SIDE 0x8643 #define GL_POINT_SPRITE 0x8861 #define GL_COORD_REPLACE 0x8862 #define GL_MAX_TEXTURE_COORDS 0x8871 #endif #ifndef GL_VERSION_2_1 #define GL_PIXEL_PACK_BUFFER 0x88EB #define GL_PIXEL_UNPACK_BUFFER 0x88EC #define GL_PIXEL_PACK_BUFFER_BINDING 0x88ED #define GL_PIXEL_UNPACK_BUFFER_BINDING 0x88EF #define GL_FLOAT_MAT2x3 0x8B65 #define GL_FLOAT_MAT2x4 0x8B66 #define GL_FLOAT_MAT3x2 0x8B67 #define GL_FLOAT_MAT3x4 0x8B68 #define GL_FLOAT_MAT4x2 0x8B69 #define GL_FLOAT_MAT4x3 0x8B6A #define GL_SRGB 0x8C40 #define GL_SRGB8 0x8C41 #define GL_SRGB_ALPHA 0x8C42 #define GL_SRGB8_ALPHA8 0x8C43 #define GL_COMPRESSED_SRGB 0x8C48 #define GL_COMPRESSED_SRGB_ALPHA 0x8C49 #endif #ifndef GL_VERSION_2_1_DEPRECATED #define GL_CURRENT_RASTER_SECONDARY_COLOR 0x845F #define GL_SLUMINANCE_ALPHA 0x8C44 #define GL_SLUMINANCE8_ALPHA8 0x8C45 #define GL_SLUMINANCE 0x8C46 #define GL_SLUMINANCE8 0x8C47 #define GL_COMPRESSED_SLUMINANCE 0x8C4A #define GL_COMPRESSED_SLUMINANCE_ALPHA 0x8C4B #endif #ifndef GL_VERSION_3_0 #define GL_COMPARE_REF_TO_TEXTURE 0x884E #define GL_CLIP_DISTANCE0 0x3000 #define GL_CLIP_DISTANCE1 0x3001 #define GL_CLIP_DISTANCE2 0x3002 #define GL_CLIP_DISTANCE3 0x3003 #define GL_CLIP_DISTANCE4 0x3004 #define GL_CLIP_DISTANCE5 0x3005 #define GL_CLIP_DISTANCE6 0x3006 #define GL_CLIP_DISTANCE7 0x3007 #define GL_MAX_CLIP_DISTANCES 0x0D32 #define GL_MAJOR_VERSION 0x821B #define GL_MINOR_VERSION 0x821C #define GL_NUM_EXTENSIONS 0x821D #define GL_CONTEXT_FLAGS 0x821E #define GL_DEPTH_BUFFER 0x8223 #define GL_STENCIL_BUFFER 0x8224 #define GL_COMPRESSED_RED 0x8225 #define GL_COMPRESSED_RG 0x8226 #define GL_CONTEXT_FLAG_FORWARD_COMPATIBLE_BIT 0x0001 #define GL_RGBA32F 0x8814 #define GL_RGB32F 0x8815 #define GL_RGBA16F 0x881A #define GL_RGB16F 0x881B #define GL_VERTEX_ATTRIB_ARRAY_INTEGER 0x88FD #define GL_MAX_ARRAY_TEXTURE_LAYERS 0x88FF #define GL_MIN_PROGRAM_TEXEL_OFFSET 0x8904 #define GL_MAX_PROGRAM_TEXEL_OFFSET 0x8905 #define GL_CLAMP_READ_COLOR 0x891C #define GL_FIXED_ONLY 0x891D #define GL_MAX_VARYING_COMPONENTS 0x8B4B #define GL_TEXTURE_1D_ARRAY 0x8C18 #define GL_PROXY_TEXTURE_1D_ARRAY 0x8C19 #define GL_TEXTURE_2D_ARRAY 0x8C1A #define GL_PROXY_TEXTURE_2D_ARRAY 0x8C1B #define GL_TEXTURE_BINDING_1D_ARRAY 0x8C1C #define GL_TEXTURE_BINDING_2D_ARRAY 0x8C1D #define GL_R11F_G11F_B10F 0x8C3A #define GL_UNSIGNED_INT_10F_11F_11F_REV 0x8C3B #define GL_RGB9_E5 0x8C3D #define GL_UNSIGNED_INT_5_9_9_9_REV 0x8C3E #define GL_TEXTURE_SHARED_SIZE 0x8C3F #define GL_TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH 0x8C76 #define GL_TRANSFORM_FEEDBACK_BUFFER_MODE 0x8C7F #define GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS 0x8C80 #define GL_TRANSFORM_FEEDBACK_VARYINGS 0x8C83 #define GL_TRANSFORM_FEEDBACK_BUFFER_START 0x8C84 #define GL_TRANSFORM_FEEDBACK_BUFFER_SIZE 0x8C85 #define GL_PRIMITIVES_GENERATED 0x8C87 #define 
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN 0x8C88 #define GL_RASTERIZER_DISCARD 0x8C89 #define GL_MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS 0x8C8A #define GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS 0x8C8B #define GL_INTERLEAVED_ATTRIBS 0x8C8C #define GL_SEPARATE_ATTRIBS 0x8C8D #define GL_TRANSFORM_FEEDBACK_BUFFER 0x8C8E #define GL_TRANSFORM_FEEDBACK_BUFFER_BINDING 0x8C8F #define GL_RGBA32UI 0x8D70 #define GL_RGB32UI 0x8D71 #define GL_RGBA16UI 0x8D76 #define GL_RGB16UI 0x8D77 #define GL_RGBA8UI 0x8D7C #define GL_RGB8UI 0x8D7D #define GL_RGBA32I 0x8D82 #define GL_RGB32I 0x8D83 #define GL_RGBA16I 0x8D88 #define GL_RGB16I 0x8D89 #define GL_RGBA8I 0x8D8E #define GL_RGB8I 0x8D8F #define GL_RED_INTEGER 0x8D94 #define GL_GREEN_INTEGER 0x8D95 #define GL_BLUE_INTEGER 0x8D96 #define GL_RGB_INTEGER 0x8D98 #define GL_RGBA_INTEGER 0x8D99 #define GL_BGR_INTEGER 0x8D9A #define GL_BGRA_INTEGER 0x8D9B #define GL_SAMPLER_1D_ARRAY 0x8DC0 #define GL_SAMPLER_2D_ARRAY 0x8DC1 #define GL_SAMPLER_1D_ARRAY_SHADOW 0x8DC3 #define GL_SAMPLER_2D_ARRAY_SHADOW 0x8DC4 #define GL_SAMPLER_CUBE_SHADOW 0x8DC5 #define GL_UNSIGNED_INT_VEC2 0x8DC6 #define GL_UNSIGNED_INT_VEC3 0x8DC7 #define GL_UNSIGNED_INT_VEC4 0x8DC8 #define GL_INT_SAMPLER_1D 0x8DC9 #define GL_INT_SAMPLER_2D 0x8DCA #define GL_INT_SAMPLER_3D 0x8DCB #define GL_INT_SAMPLER_CUBE 0x8DCC #define GL_INT_SAMPLER_1D_ARRAY 0x8DCE #define GL_INT_SAMPLER_2D_ARRAY 0x8DCF #define GL_UNSIGNED_INT_SAMPLER_1D 0x8DD1 #define GL_UNSIGNED_INT_SAMPLER_2D 0x8DD2 #define GL_UNSIGNED_INT_SAMPLER_3D 0x8DD3 #define GL_UNSIGNED_INT_SAMPLER_CUBE 0x8DD4 #define GL_UNSIGNED_INT_SAMPLER_1D_ARRAY 0x8DD6 #define GL_UNSIGNED_INT_SAMPLER_2D_ARRAY 0x8DD7 #define GL_QUERY_WAIT 0x8E13 #define GL_QUERY_NO_WAIT 0x8E14 #define GL_QUERY_BY_REGION_WAIT 0x8E15 #define GL_QUERY_BY_REGION_NO_WAIT 0x8E16 #define GL_BUFFER_ACCESS_FLAGS 0x911F #define GL_BUFFER_MAP_LENGTH 0x9120 #define GL_BUFFER_MAP_OFFSET 0x9121 /* Reuse tokens from ARB_depth_buffer_float */ /* reuse GL_DEPTH_COMPONENT32F */ /* reuse GL_DEPTH32F_STENCIL8 */ /* reuse GL_FLOAT_32_UNSIGNED_INT_24_8_REV */ /* Reuse tokens from ARB_framebuffer_object */ /* reuse GL_INVALID_FRAMEBUFFER_OPERATION */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE */ /* reuse GL_FRAMEBUFFER_DEFAULT */ /* reuse GL_FRAMEBUFFER_UNDEFINED */ /* reuse GL_DEPTH_STENCIL_ATTACHMENT */ /* reuse GL_INDEX */ /* reuse GL_MAX_RENDERBUFFER_SIZE */ /* reuse GL_DEPTH_STENCIL */ /* reuse GL_UNSIGNED_INT_24_8 */ /* reuse GL_DEPTH24_STENCIL8 */ /* reuse GL_TEXTURE_STENCIL_SIZE */ /* reuse GL_TEXTURE_RED_TYPE */ /* reuse GL_TEXTURE_GREEN_TYPE */ /* reuse GL_TEXTURE_BLUE_TYPE */ /* reuse GL_TEXTURE_ALPHA_TYPE */ /* reuse GL_TEXTURE_DEPTH_TYPE */ /* reuse GL_UNSIGNED_NORMALIZED */ /* reuse GL_FRAMEBUFFER_BINDING */ /* reuse GL_DRAW_FRAMEBUFFER_BINDING */ /* reuse GL_RENDERBUFFER_BINDING */ /* reuse GL_READ_FRAMEBUFFER */ /* reuse GL_DRAW_FRAMEBUFFER */ /* reuse GL_READ_FRAMEBUFFER_BINDING */ /* reuse GL_RENDERBUFFER_SAMPLES */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LEVEL */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_CUBE_MAP_FACE */ /* reuse 
GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER */ /* reuse GL_FRAMEBUFFER_COMPLETE */ /* reuse GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT */ /* reuse GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT */ /* reuse GL_FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER */ /* reuse GL_FRAMEBUFFER_INCOMPLETE_READ_BUFFER */ /* reuse GL_FRAMEBUFFER_UNSUPPORTED */ /* reuse GL_MAX_COLOR_ATTACHMENTS */ /* reuse GL_COLOR_ATTACHMENT0 */ /* reuse GL_COLOR_ATTACHMENT1 */ /* reuse GL_COLOR_ATTACHMENT2 */ /* reuse GL_COLOR_ATTACHMENT3 */ /* reuse GL_COLOR_ATTACHMENT4 */ /* reuse GL_COLOR_ATTACHMENT5 */ /* reuse GL_COLOR_ATTACHMENT6 */ /* reuse GL_COLOR_ATTACHMENT7 */ /* reuse GL_COLOR_ATTACHMENT8 */ /* reuse GL_COLOR_ATTACHMENT9 */ /* reuse GL_COLOR_ATTACHMENT10 */ /* reuse GL_COLOR_ATTACHMENT11 */ /* reuse GL_COLOR_ATTACHMENT12 */ /* reuse GL_COLOR_ATTACHMENT13 */ /* reuse GL_COLOR_ATTACHMENT14 */ /* reuse GL_COLOR_ATTACHMENT15 */ /* reuse GL_DEPTH_ATTACHMENT */ /* reuse GL_STENCIL_ATTACHMENT */ /* reuse GL_FRAMEBUFFER */ /* reuse GL_RENDERBUFFER */ /* reuse GL_RENDERBUFFER_WIDTH */ /* reuse GL_RENDERBUFFER_HEIGHT */ /* reuse GL_RENDERBUFFER_INTERNAL_FORMAT */ /* reuse GL_STENCIL_INDEX1 */ /* reuse GL_STENCIL_INDEX4 */ /* reuse GL_STENCIL_INDEX8 */ /* reuse GL_STENCIL_INDEX16 */ /* reuse GL_RENDERBUFFER_RED_SIZE */ /* reuse GL_RENDERBUFFER_GREEN_SIZE */ /* reuse GL_RENDERBUFFER_BLUE_SIZE */ /* reuse GL_RENDERBUFFER_ALPHA_SIZE */ /* reuse GL_RENDERBUFFER_DEPTH_SIZE */ /* reuse GL_RENDERBUFFER_STENCIL_SIZE */ /* reuse GL_FRAMEBUFFER_INCOMPLETE_MULTISAMPLE */ /* reuse GL_MAX_SAMPLES */ /* Reuse tokens from ARB_framebuffer_sRGB */ /* reuse GL_FRAMEBUFFER_SRGB */ /* Reuse tokens from ARB_half_float_vertex */ /* reuse GL_HALF_FLOAT */ /* Reuse tokens from ARB_map_buffer_range */ /* reuse GL_MAP_READ_BIT */ /* reuse GL_MAP_WRITE_BIT */ /* reuse GL_MAP_INVALIDATE_RANGE_BIT */ /* reuse GL_MAP_INVALIDATE_BUFFER_BIT */ /* reuse GL_MAP_FLUSH_EXPLICIT_BIT */ /* reuse GL_MAP_UNSYNCHRONIZED_BIT */ /* Reuse tokens from ARB_texture_compression_rgtc */ /* reuse GL_COMPRESSED_RED_RGTC1 */ /* reuse GL_COMPRESSED_SIGNED_RED_RGTC1 */ /* reuse GL_COMPRESSED_RG_RGTC2 */ /* reuse GL_COMPRESSED_SIGNED_RG_RGTC2 */ /* Reuse tokens from ARB_texture_rg */ /* reuse GL_RG */ /* reuse GL_RG_INTEGER */ /* reuse GL_R8 */ /* reuse GL_R16 */ /* reuse GL_RG8 */ /* reuse GL_RG16 */ /* reuse GL_R16F */ /* reuse GL_R32F */ /* reuse GL_RG16F */ /* reuse GL_RG32F */ /* reuse GL_R8I */ /* reuse GL_R8UI */ /* reuse GL_R16I */ /* reuse GL_R16UI */ /* reuse GL_R32I */ /* reuse GL_R32UI */ /* reuse GL_RG8I */ /* reuse GL_RG8UI */ /* reuse GL_RG16I */ /* reuse GL_RG16UI */ /* reuse GL_RG32I */ /* reuse GL_RG32UI */ /* Reuse tokens from ARB_vertex_array_object */ /* reuse GL_VERTEX_ARRAY_BINDING */ #endif #ifndef GL_VERSION_3_0_DEPRECATED #define GL_CLAMP_VERTEX_COLOR 0x891A #define GL_CLAMP_FRAGMENT_COLOR 0x891B #define GL_ALPHA_INTEGER 0x8D97 /* Reuse tokens from ARB_framebuffer_object */ /* reuse GL_TEXTURE_LUMINANCE_TYPE */ /* reuse GL_TEXTURE_INTENSITY_TYPE */ #endif #ifndef GL_VERSION_3_1 #define GL_SAMPLER_2D_RECT 0x8B63 #define GL_SAMPLER_2D_RECT_SHADOW 0x8B64 #define GL_SAMPLER_BUFFER 0x8DC2 #define GL_INT_SAMPLER_2D_RECT 0x8DCD #define GL_INT_SAMPLER_BUFFER 0x8DD0 #define GL_UNSIGNED_INT_SAMPLER_2D_RECT 0x8DD5 #define GL_UNSIGNED_INT_SAMPLER_BUFFER 0x8DD8 #define GL_TEXTURE_BUFFER 0x8C2A #define GL_MAX_TEXTURE_BUFFER_SIZE 0x8C2B #define GL_TEXTURE_BINDING_BUFFER 0x8C2C #define GL_TEXTURE_BUFFER_DATA_STORE_BINDING 0x8C2D #define GL_TEXTURE_BUFFER_FORMAT 0x8C2E #define 
GL_TEXTURE_RECTANGLE 0x84F5 #define GL_TEXTURE_BINDING_RECTANGLE 0x84F6 #define GL_PROXY_TEXTURE_RECTANGLE 0x84F7 #define GL_MAX_RECTANGLE_TEXTURE_SIZE 0x84F8 #define GL_RED_SNORM 0x8F90 #define GL_RG_SNORM 0x8F91 #define GL_RGB_SNORM 0x8F92 #define GL_RGBA_SNORM 0x8F93 #define GL_R8_SNORM 0x8F94 #define GL_RG8_SNORM 0x8F95 #define GL_RGB8_SNORM 0x8F96 #define GL_RGBA8_SNORM 0x8F97 #define GL_R16_SNORM 0x8F98 #define GL_RG16_SNORM 0x8F99 #define GL_RGB16_SNORM 0x8F9A #define GL_RGBA16_SNORM 0x8F9B #define GL_SIGNED_NORMALIZED 0x8F9C #define GL_PRIMITIVE_RESTART 0x8F9D #define GL_PRIMITIVE_RESTART_INDEX 0x8F9E /* Reuse tokens from ARB_copy_buffer */ /* reuse GL_COPY_READ_BUFFER */ /* reuse GL_COPY_WRITE_BUFFER */ /* Would reuse tokens from ARB_draw_instanced, but it has none */ /* Reuse tokens from ARB_uniform_buffer_object */ /* reuse GL_UNIFORM_BUFFER */ /* reuse GL_UNIFORM_BUFFER_BINDING */ /* reuse GL_UNIFORM_BUFFER_START */ /* reuse GL_UNIFORM_BUFFER_SIZE */ /* reuse GL_MAX_VERTEX_UNIFORM_BLOCKS */ /* reuse GL_MAX_FRAGMENT_UNIFORM_BLOCKS */ /* reuse GL_MAX_COMBINED_UNIFORM_BLOCKS */ /* reuse GL_MAX_UNIFORM_BUFFER_BINDINGS */ /* reuse GL_MAX_UNIFORM_BLOCK_SIZE */ /* reuse GL_MAX_COMBINED_VERTEX_UNIFORM_COMPONENTS */ /* reuse GL_MAX_COMBINED_FRAGMENT_UNIFORM_COMPONENTS */ /* reuse GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT */ /* reuse GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH */ /* reuse GL_ACTIVE_UNIFORM_BLOCKS */ /* reuse GL_UNIFORM_TYPE */ /* reuse GL_UNIFORM_SIZE */ /* reuse GL_UNIFORM_NAME_LENGTH */ /* reuse GL_UNIFORM_BLOCK_INDEX */ /* reuse GL_UNIFORM_OFFSET */ /* reuse GL_UNIFORM_ARRAY_STRIDE */ /* reuse GL_UNIFORM_MATRIX_STRIDE */ /* reuse GL_UNIFORM_IS_ROW_MAJOR */ /* reuse GL_UNIFORM_BLOCK_BINDING */ /* reuse GL_UNIFORM_BLOCK_DATA_SIZE */ /* reuse GL_UNIFORM_BLOCK_NAME_LENGTH */ /* reuse GL_UNIFORM_BLOCK_ACTIVE_UNIFORMS */ /* reuse GL_UNIFORM_BLOCK_ACTIVE_UNIFORM_INDICES */ /* reuse GL_UNIFORM_BLOCK_REFERENCED_BY_VERTEX_SHADER */ /* reuse GL_UNIFORM_BLOCK_REFERENCED_BY_FRAGMENT_SHADER */ /* reuse GL_INVALID_INDEX */ #endif #ifndef GL_VERSION_3_2 #define GL_CONTEXT_CORE_PROFILE_BIT 0x00000001 #define GL_CONTEXT_COMPATIBILITY_PROFILE_BIT 0x00000002 #define GL_LINES_ADJACENCY 0x000A #define GL_LINE_STRIP_ADJACENCY 0x000B #define GL_TRIANGLES_ADJACENCY 0x000C #define GL_TRIANGLE_STRIP_ADJACENCY 0x000D #define GL_PROGRAM_POINT_SIZE 0x8642 #define GL_MAX_GEOMETRY_TEXTURE_IMAGE_UNITS 0x8C29 #define GL_FRAMEBUFFER_ATTACHMENT_LAYERED 0x8DA7 #define GL_FRAMEBUFFER_INCOMPLETE_LAYER_TARGETS 0x8DA8 #define GL_GEOMETRY_SHADER 0x8DD9 #define GL_GEOMETRY_VERTICES_OUT 0x8916 #define GL_GEOMETRY_INPUT_TYPE 0x8917 #define GL_GEOMETRY_OUTPUT_TYPE 0x8918 #define GL_MAX_GEOMETRY_UNIFORM_COMPONENTS 0x8DDF #define GL_MAX_GEOMETRY_OUTPUT_VERTICES 0x8DE0 #define GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS 0x8DE1 #define GL_MAX_VERTEX_OUTPUT_COMPONENTS 0x9122 #define GL_MAX_GEOMETRY_INPUT_COMPONENTS 0x9123 #define GL_MAX_GEOMETRY_OUTPUT_COMPONENTS 0x9124 #define GL_MAX_FRAGMENT_INPUT_COMPONENTS 0x9125 #define GL_CONTEXT_PROFILE_MASK 0x9126 /* reuse GL_MAX_VARYING_COMPONENTS */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER */ /* Reuse tokens from ARB_depth_clamp */ /* reuse GL_DEPTH_CLAMP */ /* Would reuse tokens from ARB_draw_elements_base_vertex, but it has none */ /* Would reuse tokens from ARB_fragment_coord_conventions, but it has none */ /* Reuse tokens from ARB_provoking_vertex */ /* reuse GL_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION */ /* reuse GL_FIRST_VERTEX_CONVENTION */ /* reuse GL_LAST_VERTEX_CONVENTION 
*/ /* reuse GL_PROVOKING_VERTEX */ /* Reuse tokens from ARB_seamless_cube_map */ /* reuse GL_TEXTURE_CUBE_MAP_SEAMLESS */ /* Reuse tokens from ARB_sync */ /* reuse GL_MAX_SERVER_WAIT_TIMEOUT */ /* reuse GL_OBJECT_TYPE */ /* reuse GL_SYNC_CONDITION */ /* reuse GL_SYNC_STATUS */ /* reuse GL_SYNC_FLAGS */ /* reuse GL_SYNC_FENCE */ /* reuse GL_SYNC_GPU_COMMANDS_COMPLETE */ /* reuse GL_UNSIGNALED */ /* reuse GL_SIGNALED */ /* reuse GL_ALREADY_SIGNALED */ /* reuse GL_TIMEOUT_EXPIRED */ /* reuse GL_CONDITION_SATISFIED */ /* reuse GL_WAIT_FAILED */ /* reuse GL_TIMEOUT_IGNORED */ /* reuse GL_SYNC_FLUSH_COMMANDS_BIT */ /* reuse GL_TIMEOUT_IGNORED */ /* Reuse tokens from ARB_texture_multisample */ /* reuse GL_SAMPLE_POSITION */ /* reuse GL_SAMPLE_MASK */ /* reuse GL_SAMPLE_MASK_VALUE */ /* reuse GL_MAX_SAMPLE_MASK_WORDS */ /* reuse GL_TEXTURE_2D_MULTISAMPLE */ /* reuse GL_PROXY_TEXTURE_2D_MULTISAMPLE */ /* reuse GL_TEXTURE_2D_MULTISAMPLE_ARRAY */ /* reuse GL_PROXY_TEXTURE_2D_MULTISAMPLE_ARRAY */ /* reuse GL_TEXTURE_BINDING_2D_MULTISAMPLE */ /* reuse GL_TEXTURE_BINDING_2D_MULTISAMPLE_ARRAY */ /* reuse GL_TEXTURE_SAMPLES */ /* reuse GL_TEXTURE_FIXED_SAMPLE_LOCATIONS */ /* reuse GL_SAMPLER_2D_MULTISAMPLE */ /* reuse GL_INT_SAMPLER_2D_MULTISAMPLE */ /* reuse GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE */ /* reuse GL_SAMPLER_2D_MULTISAMPLE_ARRAY */ /* reuse GL_INT_SAMPLER_2D_MULTISAMPLE_ARRAY */ /* reuse GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE_ARRAY */ /* reuse GL_MAX_COLOR_TEXTURE_SAMPLES */ /* reuse GL_MAX_DEPTH_TEXTURE_SAMPLES */ /* reuse GL_MAX_INTEGER_SAMPLES */ /* Don't need to reuse tokens from ARB_vertex_array_bgra since they're already in 1.2 core */ #endif #ifndef GL_ARB_multitexture #define GL_TEXTURE0_ARB 0x84C0 #define GL_TEXTURE1_ARB 0x84C1 #define GL_TEXTURE2_ARB 0x84C2 #define GL_TEXTURE3_ARB 0x84C3 #define GL_TEXTURE4_ARB 0x84C4 #define GL_TEXTURE5_ARB 0x84C5 #define GL_TEXTURE6_ARB 0x84C6 #define GL_TEXTURE7_ARB 0x84C7 #define GL_TEXTURE8_ARB 0x84C8 #define GL_TEXTURE9_ARB 0x84C9 #define GL_TEXTURE10_ARB 0x84CA #define GL_TEXTURE11_ARB 0x84CB #define GL_TEXTURE12_ARB 0x84CC #define GL_TEXTURE13_ARB 0x84CD #define GL_TEXTURE14_ARB 0x84CE #define GL_TEXTURE15_ARB 0x84CF #define GL_TEXTURE16_ARB 0x84D0 #define GL_TEXTURE17_ARB 0x84D1 #define GL_TEXTURE18_ARB 0x84D2 #define GL_TEXTURE19_ARB 0x84D3 #define GL_TEXTURE20_ARB 0x84D4 #define GL_TEXTURE21_ARB 0x84D5 #define GL_TEXTURE22_ARB 0x84D6 #define GL_TEXTURE23_ARB 0x84D7 #define GL_TEXTURE24_ARB 0x84D8 #define GL_TEXTURE25_ARB 0x84D9 #define GL_TEXTURE26_ARB 0x84DA #define GL_TEXTURE27_ARB 0x84DB #define GL_TEXTURE28_ARB 0x84DC #define GL_TEXTURE29_ARB 0x84DD #define GL_TEXTURE30_ARB 0x84DE #define GL_TEXTURE31_ARB 0x84DF #define GL_ACTIVE_TEXTURE_ARB 0x84E0 #define GL_CLIENT_ACTIVE_TEXTURE_ARB 0x84E1 #define GL_MAX_TEXTURE_UNITS_ARB 0x84E2 #endif #ifndef GL_ARB_transpose_matrix #define GL_TRANSPOSE_MODELVIEW_MATRIX_ARB 0x84E3 #define GL_TRANSPOSE_PROJECTION_MATRIX_ARB 0x84E4 #define GL_TRANSPOSE_TEXTURE_MATRIX_ARB 0x84E5 #define GL_TRANSPOSE_COLOR_MATRIX_ARB 0x84E6 #endif #ifndef GL_ARB_multisample #define GL_MULTISAMPLE_ARB 0x809D #define GL_SAMPLE_ALPHA_TO_COVERAGE_ARB 0x809E #define GL_SAMPLE_ALPHA_TO_ONE_ARB 0x809F #define GL_SAMPLE_COVERAGE_ARB 0x80A0 #define GL_SAMPLE_BUFFERS_ARB 0x80A8 #define GL_SAMPLES_ARB 0x80A9 #define GL_SAMPLE_COVERAGE_VALUE_ARB 0x80AA #define GL_SAMPLE_COVERAGE_INVERT_ARB 0x80AB #define GL_MULTISAMPLE_BIT_ARB 0x20000000 #endif #ifndef GL_ARB_texture_env_add #endif #ifndef GL_ARB_texture_cube_map #define 
GL_NORMAL_MAP_ARB 0x8511 #define GL_REFLECTION_MAP_ARB 0x8512 #define GL_TEXTURE_CUBE_MAP_ARB 0x8513 #define GL_TEXTURE_BINDING_CUBE_MAP_ARB 0x8514 #define GL_TEXTURE_CUBE_MAP_POSITIVE_X_ARB 0x8515 #define GL_TEXTURE_CUBE_MAP_NEGATIVE_X_ARB 0x8516 #define GL_TEXTURE_CUBE_MAP_POSITIVE_Y_ARB 0x8517 #define GL_TEXTURE_CUBE_MAP_NEGATIVE_Y_ARB 0x8518 #define GL_TEXTURE_CUBE_MAP_POSITIVE_Z_ARB 0x8519 #define GL_TEXTURE_CUBE_MAP_NEGATIVE_Z_ARB 0x851A #define GL_PROXY_TEXTURE_CUBE_MAP_ARB 0x851B #define GL_MAX_CUBE_MAP_TEXTURE_SIZE_ARB 0x851C #endif #ifndef GL_ARB_texture_compression #define GL_COMPRESSED_ALPHA_ARB 0x84E9 #define GL_COMPRESSED_LUMINANCE_ARB 0x84EA #define GL_COMPRESSED_LUMINANCE_ALPHA_ARB 0x84EB #define GL_COMPRESSED_INTENSITY_ARB 0x84EC #define GL_COMPRESSED_RGB_ARB 0x84ED #define GL_COMPRESSED_RGBA_ARB 0x84EE #define GL_TEXTURE_COMPRESSION_HINT_ARB 0x84EF #define GL_TEXTURE_COMPRESSED_IMAGE_SIZE_ARB 0x86A0 #define GL_TEXTURE_COMPRESSED_ARB 0x86A1 #define GL_NUM_COMPRESSED_TEXTURE_FORMATS_ARB 0x86A2 #define GL_COMPRESSED_TEXTURE_FORMATS_ARB 0x86A3 #endif #ifndef GL_ARB_texture_border_clamp #define GL_CLAMP_TO_BORDER_ARB 0x812D #endif #ifndef GL_ARB_point_parameters #define GL_POINT_SIZE_MIN_ARB 0x8126 #define GL_POINT_SIZE_MAX_ARB 0x8127 #define GL_POINT_FADE_THRESHOLD_SIZE_ARB 0x8128 #define GL_POINT_DISTANCE_ATTENUATION_ARB 0x8129 #endif #ifndef GL_ARB_vertex_blend #define GL_MAX_VERTEX_UNITS_ARB 0x86A4 #define GL_ACTIVE_VERTEX_UNITS_ARB 0x86A5 #define GL_WEIGHT_SUM_UNITY_ARB 0x86A6 #define GL_VERTEX_BLEND_ARB 0x86A7 #define GL_CURRENT_WEIGHT_ARB 0x86A8 #define GL_WEIGHT_ARRAY_TYPE_ARB 0x86A9 #define GL_WEIGHT_ARRAY_STRIDE_ARB 0x86AA #define GL_WEIGHT_ARRAY_SIZE_ARB 0x86AB #define GL_WEIGHT_ARRAY_POINTER_ARB 0x86AC #define GL_WEIGHT_ARRAY_ARB 0x86AD #define GL_MODELVIEW0_ARB 0x1700 #define GL_MODELVIEW1_ARB 0x850A #define GL_MODELVIEW2_ARB 0x8722 #define GL_MODELVIEW3_ARB 0x8723 #define GL_MODELVIEW4_ARB 0x8724 #define GL_MODELVIEW5_ARB 0x8725 #define GL_MODELVIEW6_ARB 0x8726 #define GL_MODELVIEW7_ARB 0x8727 #define GL_MODELVIEW8_ARB 0x8728 #define GL_MODELVIEW9_ARB 0x8729 #define GL_MODELVIEW10_ARB 0x872A #define GL_MODELVIEW11_ARB 0x872B #define GL_MODELVIEW12_ARB 0x872C #define GL_MODELVIEW13_ARB 0x872D #define GL_MODELVIEW14_ARB 0x872E #define GL_MODELVIEW15_ARB 0x872F #define GL_MODELVIEW16_ARB 0x8730 #define GL_MODELVIEW17_ARB 0x8731 #define GL_MODELVIEW18_ARB 0x8732 #define GL_MODELVIEW19_ARB 0x8733 #define GL_MODELVIEW20_ARB 0x8734 #define GL_MODELVIEW21_ARB 0x8735 #define GL_MODELVIEW22_ARB 0x8736 #define GL_MODELVIEW23_ARB 0x8737 #define GL_MODELVIEW24_ARB 0x8738 #define GL_MODELVIEW25_ARB 0x8739 #define GL_MODELVIEW26_ARB 0x873A #define GL_MODELVIEW27_ARB 0x873B #define GL_MODELVIEW28_ARB 0x873C #define GL_MODELVIEW29_ARB 0x873D #define GL_MODELVIEW30_ARB 0x873E #define GL_MODELVIEW31_ARB 0x873F #endif #ifndef GL_ARB_matrix_palette #define GL_MATRIX_PALETTE_ARB 0x8840 #define GL_MAX_MATRIX_PALETTE_STACK_DEPTH_ARB 0x8841 #define GL_MAX_PALETTE_MATRICES_ARB 0x8842 #define GL_CURRENT_PALETTE_MATRIX_ARB 0x8843 #define GL_MATRIX_INDEX_ARRAY_ARB 0x8844 #define GL_CURRENT_MATRIX_INDEX_ARB 0x8845 #define GL_MATRIX_INDEX_ARRAY_SIZE_ARB 0x8846 #define GL_MATRIX_INDEX_ARRAY_TYPE_ARB 0x8847 #define GL_MATRIX_INDEX_ARRAY_STRIDE_ARB 0x8848 #define GL_MATRIX_INDEX_ARRAY_POINTER_ARB 0x8849 #endif #ifndef GL_ARB_texture_env_combine #define GL_COMBINE_ARB 0x8570 #define GL_COMBINE_RGB_ARB 0x8571 #define GL_COMBINE_ALPHA_ARB 0x8572 #define GL_SOURCE0_RGB_ARB 0x8580 #define 
GL_SOURCE1_RGB_ARB 0x8581 #define GL_SOURCE2_RGB_ARB 0x8582 #define GL_SOURCE0_ALPHA_ARB 0x8588 #define GL_SOURCE1_ALPHA_ARB 0x8589 #define GL_SOURCE2_ALPHA_ARB 0x858A #define GL_OPERAND0_RGB_ARB 0x8590 #define GL_OPERAND1_RGB_ARB 0x8591 #define GL_OPERAND2_RGB_ARB 0x8592 #define GL_OPERAND0_ALPHA_ARB 0x8598 #define GL_OPERAND1_ALPHA_ARB 0x8599 #define GL_OPERAND2_ALPHA_ARB 0x859A #define GL_RGB_SCALE_ARB 0x8573 #define GL_ADD_SIGNED_ARB 0x8574 #define GL_INTERPOLATE_ARB 0x8575 #define GL_SUBTRACT_ARB 0x84E7 #define GL_CONSTANT_ARB 0x8576 #define GL_PRIMARY_COLOR_ARB 0x8577 #define GL_PREVIOUS_ARB 0x8578 #endif #ifndef GL_ARB_texture_env_crossbar #endif #ifndef GL_ARB_texture_env_dot3 #define GL_DOT3_RGB_ARB 0x86AE #define GL_DOT3_RGBA_ARB 0x86AF #endif #ifndef GL_ARB_texture_mirrored_repeat #define GL_MIRRORED_REPEAT_ARB 0x8370 #endif #ifndef GL_ARB_depth_texture #define GL_DEPTH_COMPONENT16_ARB 0x81A5 #define GL_DEPTH_COMPONENT24_ARB 0x81A6 #define GL_DEPTH_COMPONENT32_ARB 0x81A7 #define GL_TEXTURE_DEPTH_SIZE_ARB 0x884A #define GL_DEPTH_TEXTURE_MODE_ARB 0x884B #endif #ifndef GL_ARB_shadow #define GL_TEXTURE_COMPARE_MODE_ARB 0x884C #define GL_TEXTURE_COMPARE_FUNC_ARB 0x884D #define GL_COMPARE_R_TO_TEXTURE_ARB 0x884E #endif #ifndef GL_ARB_shadow_ambient #define GL_TEXTURE_COMPARE_FAIL_VALUE_ARB 0x80BF #endif #ifndef GL_ARB_window_pos #endif #ifndef GL_ARB_vertex_program #define GL_COLOR_SUM_ARB 0x8458 #define GL_VERTEX_PROGRAM_ARB 0x8620 #define GL_VERTEX_ATTRIB_ARRAY_ENABLED_ARB 0x8622 #define GL_VERTEX_ATTRIB_ARRAY_SIZE_ARB 0x8623 #define GL_VERTEX_ATTRIB_ARRAY_STRIDE_ARB 0x8624 #define GL_VERTEX_ATTRIB_ARRAY_TYPE_ARB 0x8625 #define GL_CURRENT_VERTEX_ATTRIB_ARB 0x8626 #define GL_PROGRAM_LENGTH_ARB 0x8627 #define GL_PROGRAM_STRING_ARB 0x8628 #define GL_MAX_PROGRAM_MATRIX_STACK_DEPTH_ARB 0x862E #define GL_MAX_PROGRAM_MATRICES_ARB 0x862F #define GL_CURRENT_MATRIX_STACK_DEPTH_ARB 0x8640 #define GL_CURRENT_MATRIX_ARB 0x8641 #define GL_VERTEX_PROGRAM_POINT_SIZE_ARB 0x8642 #define GL_VERTEX_PROGRAM_TWO_SIDE_ARB 0x8643 #define GL_VERTEX_ATTRIB_ARRAY_POINTER_ARB 0x8645 #define GL_PROGRAM_ERROR_POSITION_ARB 0x864B #define GL_PROGRAM_BINDING_ARB 0x8677 #define GL_MAX_VERTEX_ATTRIBS_ARB 0x8869 #define GL_VERTEX_ATTRIB_ARRAY_NORMALIZED_ARB 0x886A #define GL_PROGRAM_ERROR_STRING_ARB 0x8874 #define GL_PROGRAM_FORMAT_ASCII_ARB 0x8875 #define GL_PROGRAM_FORMAT_ARB 0x8876 #define GL_PROGRAM_INSTRUCTIONS_ARB 0x88A0 #define GL_MAX_PROGRAM_INSTRUCTIONS_ARB 0x88A1 #define GL_PROGRAM_NATIVE_INSTRUCTIONS_ARB 0x88A2 #define GL_MAX_PROGRAM_NATIVE_INSTRUCTIONS_ARB 0x88A3 #define GL_PROGRAM_TEMPORARIES_ARB 0x88A4 #define GL_MAX_PROGRAM_TEMPORARIES_ARB 0x88A5 #define GL_PROGRAM_NATIVE_TEMPORARIES_ARB 0x88A6 #define GL_MAX_PROGRAM_NATIVE_TEMPORARIES_ARB 0x88A7 #define GL_PROGRAM_PARAMETERS_ARB 0x88A8 #define GL_MAX_PROGRAM_PARAMETERS_ARB 0x88A9 #define GL_PROGRAM_NATIVE_PARAMETERS_ARB 0x88AA #define GL_MAX_PROGRAM_NATIVE_PARAMETERS_ARB 0x88AB #define GL_PROGRAM_ATTRIBS_ARB 0x88AC #define GL_MAX_PROGRAM_ATTRIBS_ARB 0x88AD #define GL_PROGRAM_NATIVE_ATTRIBS_ARB 0x88AE #define GL_MAX_PROGRAM_NATIVE_ATTRIBS_ARB 0x88AF #define GL_PROGRAM_ADDRESS_REGISTERS_ARB 0x88B0 #define GL_MAX_PROGRAM_ADDRESS_REGISTERS_ARB 0x88B1 #define GL_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB 0x88B2 #define GL_MAX_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB 0x88B3 #define GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB 0x88B4 #define GL_MAX_PROGRAM_ENV_PARAMETERS_ARB 0x88B5 #define GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB 0x88B6 #define GL_TRANSPOSE_CURRENT_MATRIX_ARB 0x88B7 
#define GL_MATRIX0_ARB 0x88C0 #define GL_MATRIX1_ARB 0x88C1 #define GL_MATRIX2_ARB 0x88C2 #define GL_MATRIX3_ARB 0x88C3 #define GL_MATRIX4_ARB 0x88C4 #define GL_MATRIX5_ARB 0x88C5 #define GL_MATRIX6_ARB 0x88C6 #define GL_MATRIX7_ARB 0x88C7 #define GL_MATRIX8_ARB 0x88C8 #define GL_MATRIX9_ARB 0x88C9 #define GL_MATRIX10_ARB 0x88CA #define GL_MATRIX11_ARB 0x88CB #define GL_MATRIX12_ARB 0x88CC #define GL_MATRIX13_ARB 0x88CD #define GL_MATRIX14_ARB 0x88CE #define GL_MATRIX15_ARB 0x88CF #define GL_MATRIX16_ARB 0x88D0 #define GL_MATRIX17_ARB 0x88D1 #define GL_MATRIX18_ARB 0x88D2 #define GL_MATRIX19_ARB 0x88D3 #define GL_MATRIX20_ARB 0x88D4 #define GL_MATRIX21_ARB 0x88D5 #define GL_MATRIX22_ARB 0x88D6 #define GL_MATRIX23_ARB 0x88D7 #define GL_MATRIX24_ARB 0x88D8 #define GL_MATRIX25_ARB 0x88D9 #define GL_MATRIX26_ARB 0x88DA #define GL_MATRIX27_ARB 0x88DB #define GL_MATRIX28_ARB 0x88DC #define GL_MATRIX29_ARB 0x88DD #define GL_MATRIX30_ARB 0x88DE #define GL_MATRIX31_ARB 0x88DF #endif #ifndef GL_ARB_fragment_program #define GL_FRAGMENT_PROGRAM_ARB 0x8804 #define GL_PROGRAM_ALU_INSTRUCTIONS_ARB 0x8805 #define GL_PROGRAM_TEX_INSTRUCTIONS_ARB 0x8806 #define GL_PROGRAM_TEX_INDIRECTIONS_ARB 0x8807 #define GL_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB 0x8808 #define GL_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB 0x8809 #define GL_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB 0x880A #define GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB 0x880B #define GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB 0x880C #define GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB 0x880D #define GL_MAX_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB 0x880E #define GL_MAX_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB 0x880F #define GL_MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB 0x8810 #define GL_MAX_TEXTURE_COORDS_ARB 0x8871 #define GL_MAX_TEXTURE_IMAGE_UNITS_ARB 0x8872 #endif #ifndef GL_ARB_vertex_buffer_object #define GL_BUFFER_SIZE_ARB 0x8764 #define GL_BUFFER_USAGE_ARB 0x8765 #define GL_ARRAY_BUFFER_ARB 0x8892 #define GL_ELEMENT_ARRAY_BUFFER_ARB 0x8893 #define GL_ARRAY_BUFFER_BINDING_ARB 0x8894 #define GL_ELEMENT_ARRAY_BUFFER_BINDING_ARB 0x8895 #define GL_VERTEX_ARRAY_BUFFER_BINDING_ARB 0x8896 #define GL_NORMAL_ARRAY_BUFFER_BINDING_ARB 0x8897 #define GL_COLOR_ARRAY_BUFFER_BINDING_ARB 0x8898 #define GL_INDEX_ARRAY_BUFFER_BINDING_ARB 0x8899 #define GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING_ARB 0x889A #define GL_EDGE_FLAG_ARRAY_BUFFER_BINDING_ARB 0x889B #define GL_SECONDARY_COLOR_ARRAY_BUFFER_BINDING_ARB 0x889C #define GL_FOG_COORDINATE_ARRAY_BUFFER_BINDING_ARB 0x889D #define GL_WEIGHT_ARRAY_BUFFER_BINDING_ARB 0x889E #define GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING_ARB 0x889F #define GL_READ_ONLY_ARB 0x88B8 #define GL_WRITE_ONLY_ARB 0x88B9 #define GL_READ_WRITE_ARB 0x88BA #define GL_BUFFER_ACCESS_ARB 0x88BB #define GL_BUFFER_MAPPED_ARB 0x88BC #define GL_BUFFER_MAP_POINTER_ARB 0x88BD #define GL_STREAM_DRAW_ARB 0x88E0 #define GL_STREAM_READ_ARB 0x88E1 #define GL_STREAM_COPY_ARB 0x88E2 #define GL_STATIC_DRAW_ARB 0x88E4 #define GL_STATIC_READ_ARB 0x88E5 #define GL_STATIC_COPY_ARB 0x88E6 #define GL_DYNAMIC_DRAW_ARB 0x88E8 #define GL_DYNAMIC_READ_ARB 0x88E9 #define GL_DYNAMIC_COPY_ARB 0x88EA #endif #ifndef GL_ARB_occlusion_query #define GL_QUERY_COUNTER_BITS_ARB 0x8864 #define GL_CURRENT_QUERY_ARB 0x8865 #define GL_QUERY_RESULT_ARB 0x8866 #define GL_QUERY_RESULT_AVAILABLE_ARB 0x8867 #define GL_SAMPLES_PASSED_ARB 0x8914 #endif #ifndef GL_ARB_shader_objects #define GL_PROGRAM_OBJECT_ARB 0x8B40 #define GL_SHADER_OBJECT_ARB 0x8B48 #define GL_OBJECT_TYPE_ARB 0x8B4E #define GL_OBJECT_SUBTYPE_ARB 0x8B4F #define GL_FLOAT_VEC2_ARB 
0x8B50 #define GL_FLOAT_VEC3_ARB 0x8B51 #define GL_FLOAT_VEC4_ARB 0x8B52 #define GL_INT_VEC2_ARB 0x8B53 #define GL_INT_VEC3_ARB 0x8B54 #define GL_INT_VEC4_ARB 0x8B55 #define GL_BOOL_ARB 0x8B56 #define GL_BOOL_VEC2_ARB 0x8B57 #define GL_BOOL_VEC3_ARB 0x8B58 #define GL_BOOL_VEC4_ARB 0x8B59 #define GL_FLOAT_MAT2_ARB 0x8B5A #define GL_FLOAT_MAT3_ARB 0x8B5B #define GL_FLOAT_MAT4_ARB 0x8B5C #define GL_SAMPLER_1D_ARB 0x8B5D #define GL_SAMPLER_2D_ARB 0x8B5E #define GL_SAMPLER_3D_ARB 0x8B5F #define GL_SAMPLER_CUBE_ARB 0x8B60 #define GL_SAMPLER_1D_SHADOW_ARB 0x8B61 #define GL_SAMPLER_2D_SHADOW_ARB 0x8B62 #define GL_SAMPLER_2D_RECT_ARB 0x8B63 #define GL_SAMPLER_2D_RECT_SHADOW_ARB 0x8B64 #define GL_OBJECT_DELETE_STATUS_ARB 0x8B80 #define GL_OBJECT_COMPILE_STATUS_ARB 0x8B81 #define GL_OBJECT_LINK_STATUS_ARB 0x8B82 #define GL_OBJECT_VALIDATE_STATUS_ARB 0x8B83 #define GL_OBJECT_INFO_LOG_LENGTH_ARB 0x8B84 #define GL_OBJECT_ATTACHED_OBJECTS_ARB 0x8B85 #define GL_OBJECT_ACTIVE_UNIFORMS_ARB 0x8B86 #define GL_OBJECT_ACTIVE_UNIFORM_MAX_LENGTH_ARB 0x8B87 #define GL_OBJECT_SHADER_SOURCE_LENGTH_ARB 0x8B88 #endif #ifndef GL_ARB_vertex_shader #define GL_VERTEX_SHADER_ARB 0x8B31 #define GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB 0x8B4A #define GL_MAX_VARYING_FLOATS_ARB 0x8B4B #define GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB 0x8B4C #define GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS_ARB 0x8B4D #define GL_OBJECT_ACTIVE_ATTRIBUTES_ARB 0x8B89 #define GL_OBJECT_ACTIVE_ATTRIBUTE_MAX_LENGTH_ARB 0x8B8A #endif #ifndef GL_ARB_fragment_shader #define GL_FRAGMENT_SHADER_ARB 0x8B30 #define GL_MAX_FRAGMENT_UNIFORM_COMPONENTS_ARB 0x8B49 #define GL_FRAGMENT_SHADER_DERIVATIVE_HINT_ARB 0x8B8B #endif #ifndef GL_ARB_shading_language_100 #define GL_SHADING_LANGUAGE_VERSION_ARB 0x8B8C #endif #ifndef GL_ARB_texture_non_power_of_two #endif #ifndef GL_ARB_point_sprite #define GL_POINT_SPRITE_ARB 0x8861 #define GL_COORD_REPLACE_ARB 0x8862 #endif #ifndef GL_ARB_fragment_program_shadow #endif #ifndef GL_ARB_draw_buffers #define GL_MAX_DRAW_BUFFERS_ARB 0x8824 #define GL_DRAW_BUFFER0_ARB 0x8825 #define GL_DRAW_BUFFER1_ARB 0x8826 #define GL_DRAW_BUFFER2_ARB 0x8827 #define GL_DRAW_BUFFER3_ARB 0x8828 #define GL_DRAW_BUFFER4_ARB 0x8829 #define GL_DRAW_BUFFER5_ARB 0x882A #define GL_DRAW_BUFFER6_ARB 0x882B #define GL_DRAW_BUFFER7_ARB 0x882C #define GL_DRAW_BUFFER8_ARB 0x882D #define GL_DRAW_BUFFER9_ARB 0x882E #define GL_DRAW_BUFFER10_ARB 0x882F #define GL_DRAW_BUFFER11_ARB 0x8830 #define GL_DRAW_BUFFER12_ARB 0x8831 #define GL_DRAW_BUFFER13_ARB 0x8832 #define GL_DRAW_BUFFER14_ARB 0x8833 #define GL_DRAW_BUFFER15_ARB 0x8834 #endif #ifndef GL_ARB_texture_rectangle #define GL_TEXTURE_RECTANGLE_ARB 0x84F5 #define GL_TEXTURE_BINDING_RECTANGLE_ARB 0x84F6 #define GL_PROXY_TEXTURE_RECTANGLE_ARB 0x84F7 #define GL_MAX_RECTANGLE_TEXTURE_SIZE_ARB 0x84F8 #endif #ifndef GL_ARB_color_buffer_float #define GL_RGBA_FLOAT_MODE_ARB 0x8820 #define GL_CLAMP_VERTEX_COLOR_ARB 0x891A #define GL_CLAMP_FRAGMENT_COLOR_ARB 0x891B #define GL_CLAMP_READ_COLOR_ARB 0x891C #define GL_FIXED_ONLY_ARB 0x891D #endif #ifndef GL_ARB_half_float_pixel #define GL_HALF_FLOAT_ARB 0x140B #endif #ifndef GL_ARB_texture_float #define GL_TEXTURE_RED_TYPE_ARB 0x8C10 #define GL_TEXTURE_GREEN_TYPE_ARB 0x8C11 #define GL_TEXTURE_BLUE_TYPE_ARB 0x8C12 #define GL_TEXTURE_ALPHA_TYPE_ARB 0x8C13 #define GL_TEXTURE_LUMINANCE_TYPE_ARB 0x8C14 #define GL_TEXTURE_INTENSITY_TYPE_ARB 0x8C15 #define GL_TEXTURE_DEPTH_TYPE_ARB 0x8C16 #define GL_UNSIGNED_NORMALIZED_ARB 0x8C17 #define GL_RGBA32F_ARB 0x8814 #define GL_RGB32F_ARB 0x8815 
#define GL_ALPHA32F_ARB 0x8816 #define GL_INTENSITY32F_ARB 0x8817 #define GL_LUMINANCE32F_ARB 0x8818 #define GL_LUMINANCE_ALPHA32F_ARB 0x8819 #define GL_RGBA16F_ARB 0x881A #define GL_RGB16F_ARB 0x881B #define GL_ALPHA16F_ARB 0x881C #define GL_INTENSITY16F_ARB 0x881D #define GL_LUMINANCE16F_ARB 0x881E #define GL_LUMINANCE_ALPHA16F_ARB 0x881F #endif #ifndef GL_ARB_pixel_buffer_object #define GL_PIXEL_PACK_BUFFER_ARB 0x88EB #define GL_PIXEL_UNPACK_BUFFER_ARB 0x88EC #define GL_PIXEL_PACK_BUFFER_BINDING_ARB 0x88ED #define GL_PIXEL_UNPACK_BUFFER_BINDING_ARB 0x88EF #endif #ifndef GL_ARB_depth_buffer_float #define GL_DEPTH_COMPONENT32F 0x8CAC #define GL_DEPTH32F_STENCIL8 0x8CAD #define GL_FLOAT_32_UNSIGNED_INT_24_8_REV 0x8DAD #endif #ifndef GL_ARB_draw_instanced #endif #ifndef GL_ARB_framebuffer_object #define GL_INVALID_FRAMEBUFFER_OPERATION 0x0506 #define GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING 0x8210 #define GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE 0x8211 #define GL_FRAMEBUFFER_ATTACHMENT_RED_SIZE 0x8212 #define GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE 0x8213 #define GL_FRAMEBUFFER_ATTACHMENT_BLUE_SIZE 0x8214 #define GL_FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE 0x8215 #define GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE 0x8216 #define GL_FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE 0x8217 #define GL_FRAMEBUFFER_DEFAULT 0x8218 #define GL_FRAMEBUFFER_UNDEFINED 0x8219 #define GL_DEPTH_STENCIL_ATTACHMENT 0x821A #define GL_MAX_RENDERBUFFER_SIZE 0x84E8 #define GL_DEPTH_STENCIL 0x84F9 #define GL_UNSIGNED_INT_24_8 0x84FA #define GL_DEPTH24_STENCIL8 0x88F0 #define GL_TEXTURE_STENCIL_SIZE 0x88F1 #define GL_TEXTURE_RED_TYPE 0x8C10 #define GL_TEXTURE_GREEN_TYPE 0x8C11 #define GL_TEXTURE_BLUE_TYPE 0x8C12 #define GL_TEXTURE_ALPHA_TYPE 0x8C13 #define GL_TEXTURE_DEPTH_TYPE 0x8C16 #define GL_UNSIGNED_NORMALIZED 0x8C17 #define GL_FRAMEBUFFER_BINDING 0x8CA6 #define GL_DRAW_FRAMEBUFFER_BINDING GL_FRAMEBUFFER_BINDING #define GL_RENDERBUFFER_BINDING 0x8CA7 #define GL_READ_FRAMEBUFFER 0x8CA8 #define GL_DRAW_FRAMEBUFFER 0x8CA9 #define GL_READ_FRAMEBUFFER_BINDING 0x8CAA #define GL_RENDERBUFFER_SAMPLES 0x8CAB #define GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE 0x8CD0 #define GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME 0x8CD1 #define GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LEVEL 0x8CD2 #define GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_CUBE_MAP_FACE 0x8CD3 #define GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER 0x8CD4 #define GL_FRAMEBUFFER_COMPLETE 0x8CD5 #define GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT 0x8CD6 #define GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT 0x8CD7 #define GL_FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER 0x8CDB #define GL_FRAMEBUFFER_INCOMPLETE_READ_BUFFER 0x8CDC #define GL_FRAMEBUFFER_UNSUPPORTED 0x8CDD #define GL_MAX_COLOR_ATTACHMENTS 0x8CDF #define GL_COLOR_ATTACHMENT0 0x8CE0 #define GL_COLOR_ATTACHMENT1 0x8CE1 #define GL_COLOR_ATTACHMENT2 0x8CE2 #define GL_COLOR_ATTACHMENT3 0x8CE3 #define GL_COLOR_ATTACHMENT4 0x8CE4 #define GL_COLOR_ATTACHMENT5 0x8CE5 #define GL_COLOR_ATTACHMENT6 0x8CE6 #define GL_COLOR_ATTACHMENT7 0x8CE7 #define GL_COLOR_ATTACHMENT8 0x8CE8 #define GL_COLOR_ATTACHMENT9 0x8CE9 #define GL_COLOR_ATTACHMENT10 0x8CEA #define GL_COLOR_ATTACHMENT11 0x8CEB #define GL_COLOR_ATTACHMENT12 0x8CEC #define GL_COLOR_ATTACHMENT13 0x8CED #define GL_COLOR_ATTACHMENT14 0x8CEE #define GL_COLOR_ATTACHMENT15 0x8CEF #define GL_DEPTH_ATTACHMENT 0x8D00 #define GL_STENCIL_ATTACHMENT 0x8D20 #define GL_FRAMEBUFFER 0x8D40 #define GL_RENDERBUFFER 0x8D41 #define GL_RENDERBUFFER_WIDTH 0x8D42 #define GL_RENDERBUFFER_HEIGHT 0x8D43 #define GL_RENDERBUFFER_INTERNAL_FORMAT 0x8D44 #define 
GL_STENCIL_INDEX1 0x8D46 #define GL_STENCIL_INDEX4 0x8D47 #define GL_STENCIL_INDEX8 0x8D48 #define GL_STENCIL_INDEX16 0x8D49 #define GL_RENDERBUFFER_RED_SIZE 0x8D50 #define GL_RENDERBUFFER_GREEN_SIZE 0x8D51 #define GL_RENDERBUFFER_BLUE_SIZE 0x8D52 #define GL_RENDERBUFFER_ALPHA_SIZE 0x8D53 #define GL_RENDERBUFFER_DEPTH_SIZE 0x8D54 #define GL_RENDERBUFFER_STENCIL_SIZE 0x8D55 #define GL_FRAMEBUFFER_INCOMPLETE_MULTISAMPLE 0x8D56 #define GL_MAX_SAMPLES 0x8D57 #endif #ifndef GL_ARB_framebuffer_object_DEPRECATED #define GL_INDEX 0x8222 #define GL_TEXTURE_LUMINANCE_TYPE 0x8C14 #define GL_TEXTURE_INTENSITY_TYPE 0x8C15 #endif #ifndef GL_ARB_framebuffer_sRGB #define GL_FRAMEBUFFER_SRGB 0x8DB9 #endif #ifndef GL_ARB_geometry_shader4 #define GL_LINES_ADJACENCY_ARB 0x000A #define GL_LINE_STRIP_ADJACENCY_ARB 0x000B #define GL_TRIANGLES_ADJACENCY_ARB 0x000C #define GL_TRIANGLE_STRIP_ADJACENCY_ARB 0x000D #define GL_PROGRAM_POINT_SIZE_ARB 0x8642 #define GL_MAX_GEOMETRY_TEXTURE_IMAGE_UNITS_ARB 0x8C29 #define GL_FRAMEBUFFER_ATTACHMENT_LAYERED_ARB 0x8DA7 #define GL_FRAMEBUFFER_INCOMPLETE_LAYER_TARGETS_ARB 0x8DA8 #define GL_FRAMEBUFFER_INCOMPLETE_LAYER_COUNT_ARB 0x8DA9 #define GL_GEOMETRY_SHADER_ARB 0x8DD9 #define GL_GEOMETRY_VERTICES_OUT_ARB 0x8DDA #define GL_GEOMETRY_INPUT_TYPE_ARB 0x8DDB #define GL_GEOMETRY_OUTPUT_TYPE_ARB 0x8DDC #define GL_MAX_GEOMETRY_VARYING_COMPONENTS_ARB 0x8DDD #define GL_MAX_VERTEX_VARYING_COMPONENTS_ARB 0x8DDE #define GL_MAX_GEOMETRY_UNIFORM_COMPONENTS_ARB 0x8DDF #define GL_MAX_GEOMETRY_OUTPUT_VERTICES_ARB 0x8DE0 #define GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS_ARB 0x8DE1 /* reuse GL_MAX_VARYING_COMPONENTS */ /* reuse GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER */ #endif #ifndef GL_ARB_half_float_vertex #define GL_HALF_FLOAT 0x140B #endif #ifndef GL_ARB_instanced_arrays #define GL_VERTEX_ATTRIB_ARRAY_DIVISOR_ARB 0x88FE #endif #ifndef GL_ARB_map_buffer_range #define GL_MAP_READ_BIT 0x0001 #define GL_MAP_WRITE_BIT 0x0002 #define GL_MAP_INVALIDATE_RANGE_BIT 0x0004 #define GL_MAP_INVALIDATE_BUFFER_BIT 0x0008 #define GL_MAP_FLUSH_EXPLICIT_BIT 0x0010 #define GL_MAP_UNSYNCHRONIZED_BIT 0x0020 #endif #ifndef GL_ARB_texture_buffer_object #define GL_TEXTURE_BUFFER_ARB 0x8C2A #define GL_MAX_TEXTURE_BUFFER_SIZE_ARB 0x8C2B #define GL_TEXTURE_BINDING_BUFFER_ARB 0x8C2C #define GL_TEXTURE_BUFFER_DATA_STORE_BINDING_ARB 0x8C2D #define GL_TEXTURE_BUFFER_FORMAT_ARB 0x8C2E #endif #ifndef GL_ARB_texture_compression_rgtc #define GL_COMPRESSED_RED_RGTC1 0x8DBB #define GL_COMPRESSED_SIGNED_RED_RGTC1 0x8DBC #define GL_COMPRESSED_RG_RGTC2 0x8DBD #define GL_COMPRESSED_SIGNED_RG_RGTC2 0x8DBE #endif #ifndef GL_ARB_texture_rg #define GL_RG 0x8227 #define GL_RG_INTEGER 0x8228 #define GL_R8 0x8229 #define GL_R16 0x822A #define GL_RG8 0x822B #define GL_RG16 0x822C #define GL_R16F 0x822D #define GL_R32F 0x822E #define GL_RG16F 0x822F #define GL_RG32F 0x8230 #define GL_R8I 0x8231 #define GL_R8UI 0x8232 #define GL_R16I 0x8233 #define GL_R16UI 0x8234 #define GL_R32I 0x8235 #define GL_R32UI 0x8236 #define GL_RG8I 0x8237 #define GL_RG8UI 0x8238 #define GL_RG16I 0x8239 #define GL_RG16UI 0x823A #define GL_RG32I 0x823B #define GL_RG32UI 0x823C #endif #ifndef GL_ARB_vertex_array_object #define GL_VERTEX_ARRAY_BINDING 0x85B5 #endif #ifndef GL_ARB_uniform_buffer_object #define GL_UNIFORM_BUFFER 0x8A11 #define GL_UNIFORM_BUFFER_BINDING 0x8A28 #define GL_UNIFORM_BUFFER_START 0x8A29 #define GL_UNIFORM_BUFFER_SIZE 0x8A2A #define GL_MAX_VERTEX_UNIFORM_BLOCKS 0x8A2B #define GL_MAX_GEOMETRY_UNIFORM_BLOCKS 0x8A2C #define 
GL_MAX_FRAGMENT_UNIFORM_BLOCKS 0x8A2D #define GL_MAX_COMBINED_UNIFORM_BLOCKS 0x8A2E #define GL_MAX_UNIFORM_BUFFER_BINDINGS 0x8A2F #define GL_MAX_UNIFORM_BLOCK_SIZE 0x8A30 #define GL_MAX_COMBINED_VERTEX_UNIFORM_COMPONENTS 0x8A31 #define GL_MAX_COMBINED_GEOMETRY_UNIFORM_COMPONENTS 0x8A32 #define GL_MAX_COMBINED_FRAGMENT_UNIFORM_COMPONENTS 0x8A33 #define GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT 0x8A34 #define GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH 0x8A35 #define GL_ACTIVE_UNIFORM_BLOCKS 0x8A36 #define GL_UNIFORM_TYPE 0x8A37 #define GL_UNIFORM_SIZE 0x8A38 #define GL_UNIFORM_NAME_LENGTH 0x8A39 #define GL_UNIFORM_BLOCK_INDEX 0x8A3A #define GL_UNIFORM_OFFSET 0x8A3B #define GL_UNIFORM_ARRAY_STRIDE 0x8A3C #define GL_UNIFORM_MATRIX_STRIDE 0x8A3D #define GL_UNIFORM_IS_ROW_MAJOR 0x8A3E #define GL_UNIFORM_BLOCK_BINDING 0x8A3F #define GL_UNIFORM_BLOCK_DATA_SIZE 0x8A40 #define GL_UNIFORM_BLOCK_NAME_LENGTH 0x8A41 #define GL_UNIFORM_BLOCK_ACTIVE_UNIFORMS 0x8A42 #define GL_UNIFORM_BLOCK_ACTIVE_UNIFORM_INDICES 0x8A43 #define GL_UNIFORM_BLOCK_REFERENCED_BY_VERTEX_SHADER 0x8A44 #define GL_UNIFORM_BLOCK_REFERENCED_BY_GEOMETRY_SHADER 0x8A45 #define GL_UNIFORM_BLOCK_REFERENCED_BY_FRAGMENT_SHADER 0x8A46 #define GL_INVALID_INDEX 0xFFFFFFFFu #endif #ifndef GL_ARB_compatibility /* ARB_compatibility just defines tokens from core 3.0 */ #endif #ifndef GL_ARB_copy_buffer #define GL_COPY_READ_BUFFER 0x8F36 #define GL_COPY_WRITE_BUFFER 0x8F37 #endif #ifndef GL_ARB_shader_texture_lod #endif #ifndef GL_ARB_depth_clamp #define GL_DEPTH_CLAMP 0x864F #endif #ifndef GL_ARB_draw_elements_base_vertex #endif #ifndef GL_ARB_fragment_coord_conventions #endif #ifndef GL_ARB_provoking_vertex #define GL_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION 0x8E4C #define GL_FIRST_VERTEX_CONVENTION 0x8E4D #define GL_LAST_VERTEX_CONVENTION 0x8E4E #define GL_PROVOKING_VERTEX 0x8E4F #endif #ifndef GL_ARB_seamless_cube_map #define GL_TEXTURE_CUBE_MAP_SEAMLESS 0x884F #endif #ifndef GL_ARB_sync #define GL_MAX_SERVER_WAIT_TIMEOUT 0x9111 #define GL_OBJECT_TYPE 0x9112 #define GL_SYNC_CONDITION 0x9113 #define GL_SYNC_STATUS 0x9114 #define GL_SYNC_FLAGS 0x9115 #define GL_SYNC_FENCE 0x9116 #define GL_SYNC_GPU_COMMANDS_COMPLETE 0x9117 #define GL_UNSIGNALED 0x9118 #define GL_SIGNALED 0x9119 #define GL_ALREADY_SIGNALED 0x911A #define GL_TIMEOUT_EXPIRED 0x911B #define GL_CONDITION_SATISFIED 0x911C #define GL_WAIT_FAILED 0x911D #define GL_SYNC_FLUSH_COMMANDS_BIT 0x00000001 #define GL_TIMEOUT_IGNORED 0xFFFFFFFFFFFFFFFFull #endif #ifndef GL_ARB_texture_multisample #define GL_SAMPLE_POSITION 0x8E50 #define GL_SAMPLE_MASK 0x8E51 #define GL_SAMPLE_MASK_VALUE 0x8E52 #define GL_MAX_SAMPLE_MASK_WORDS 0x8E59 #define GL_TEXTURE_2D_MULTISAMPLE 0x9100 #define GL_PROXY_TEXTURE_2D_MULTISAMPLE 0x9101 #define GL_TEXTURE_2D_MULTISAMPLE_ARRAY 0x9102 #define GL_PROXY_TEXTURE_2D_MULTISAMPLE_ARRAY 0x9103 #define GL_TEXTURE_BINDING_2D_MULTISAMPLE 0x9104 #define GL_TEXTURE_BINDING_2D_MULTISAMPLE_ARRAY 0x9105 #define GL_TEXTURE_SAMPLES 0x9106 #define GL_TEXTURE_FIXED_SAMPLE_LOCATIONS 0x9107 #define GL_SAMPLER_2D_MULTISAMPLE 0x9108 #define GL_INT_SAMPLER_2D_MULTISAMPLE 0x9109 #define GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE 0x910A #define GL_SAMPLER_2D_MULTISAMPLE_ARRAY 0x910B #define GL_INT_SAMPLER_2D_MULTISAMPLE_ARRAY 0x910C #define GL_UNSIGNED_INT_SAMPLER_2D_MULTISAMPLE_ARRAY 0x910D #define GL_MAX_COLOR_TEXTURE_SAMPLES 0x910E #define GL_MAX_DEPTH_TEXTURE_SAMPLES 0x910F #define GL_MAX_INTEGER_SAMPLES 0x9110 #endif #ifndef GL_ARB_vertex_array_bgra /* reuse GL_BGRA */ #endif #ifndef 
GL_ARB_draw_buffers_blend #endif #ifndef GL_ARB_sample_shading #define GL_SAMPLE_SHADING 0x8C36 #define GL_MIN_SAMPLE_SHADING_VALUE 0x8C37 #endif #ifndef GL_ARB_texture_cube_map_array #define GL_TEXTURE_CUBE_MAP_ARRAY 0x9009 #define GL_TEXTURE_BINDING_CUBE_MAP_ARRAY 0x900A #define GL_PROXY_TEXTURE_CUBE_MAP_ARRAY 0x900B #define GL_SAMPLER_CUBE_MAP_ARRAY 0x900C #define GL_SAMPLER_CUBE_MAP_ARRAY_SHADOW 0x900D #define GL_INT_SAMPLER_CUBE_MAP_ARRAY 0x900E #define GL_UNSIGNED_INT_SAMPLER_CUBE_MAP_ARRAY 0x900F #endif #ifndef GL_ARB_texture_gather #define GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET 0x8E5E #define GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET 0x8E5F #define GL_MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS 0x8F9F #endif #ifndef GL_ARB_texture_query_lod #endif #ifndef GL_EXT_abgr #define GL_ABGR_EXT 0x8000 #endif #ifndef GL_EXT_blend_color #define GL_CONSTANT_COLOR_EXT 0x8001 #define GL_ONE_MINUS_CONSTANT_COLOR_EXT 0x8002 #define GL_CONSTANT_ALPHA_EXT 0x8003 #define GL_ONE_MINUS_CONSTANT_ALPHA_EXT 0x8004 #define GL_BLEND_COLOR_EXT 0x8005 #endif #ifndef GL_EXT_polygon_offset #define GL_POLYGON_OFFSET_EXT 0x8037 #define GL_POLYGON_OFFSET_FACTOR_EXT 0x8038 #define GL_POLYGON_OFFSET_BIAS_EXT 0x8039 #endif #ifndef GL_EXT_texture #define GL_ALPHA4_EXT 0x803B #define GL_ALPHA8_EXT 0x803C #define GL_ALPHA12_EXT 0x803D #define GL_ALPHA16_EXT 0x803E #define GL_LUMINANCE4_EXT 0x803F #define GL_LUMINANCE8_EXT 0x8040 #define GL_LUMINANCE12_EXT 0x8041 #define GL_LUMINANCE16_EXT 0x8042 #define GL_LUMINANCE4_ALPHA4_EXT 0x8043 #define GL_LUMINANCE6_ALPHA2_EXT 0x8044 #define GL_LUMINANCE8_ALPHA8_EXT 0x8045 #define GL_LUMINANCE12_ALPHA4_EXT 0x8046 #define GL_LUMINANCE12_ALPHA12_EXT 0x8047 #define GL_LUMINANCE16_ALPHA16_EXT 0x8048 #define GL_INTENSITY_EXT 0x8049 #define GL_INTENSITY4_EXT 0x804A #define GL_INTENSITY8_EXT 0x804B #define GL_INTENSITY12_EXT 0x804C #define GL_INTENSITY16_EXT 0x804D #define GL_RGB2_EXT 0x804E #define GL_RGB4_EXT 0x804F #define GL_RGB5_EXT 0x8050 #define GL_RGB8_EXT 0x8051 #define GL_RGB10_EXT 0x8052 #define GL_RGB12_EXT 0x8053 #define GL_RGB16_EXT 0x8054 #define GL_RGBA2_EXT 0x8055 #define GL_RGBA4_EXT 0x8056 #define GL_RGB5_A1_EXT 0x8057 #define GL_RGBA8_EXT 0x8058 #define GL_RGB10_A2_EXT 0x8059 #define GL_RGBA12_EXT 0x805A #define GL_RGBA16_EXT 0x805B #define GL_TEXTURE_RED_SIZE_EXT 0x805C #define GL_TEXTURE_GREEN_SIZE_EXT 0x805D #define GL_TEXTURE_BLUE_SIZE_EXT 0x805E #define GL_TEXTURE_ALPHA_SIZE_EXT 0x805F #define GL_TEXTURE_LUMINANCE_SIZE_EXT 0x8060 #define GL_TEXTURE_INTENSITY_SIZE_EXT 0x8061 #define GL_REPLACE_EXT 0x8062 #define GL_PROXY_TEXTURE_1D_EXT 0x8063 #define GL_PROXY_TEXTURE_2D_EXT 0x8064 #define GL_TEXTURE_TOO_LARGE_EXT 0x8065 #endif #ifndef GL_EXT_texture3D #define GL_PACK_SKIP_IMAGES_EXT 0x806B #define GL_PACK_IMAGE_HEIGHT_EXT 0x806C #define GL_UNPACK_SKIP_IMAGES_EXT 0x806D #define GL_UNPACK_IMAGE_HEIGHT_EXT 0x806E #define GL_TEXTURE_3D_EXT 0x806F #define GL_PROXY_TEXTURE_3D_EXT 0x8070 #define GL_TEXTURE_DEPTH_EXT 0x8071 #define GL_TEXTURE_WRAP_R_EXT 0x8072 #define GL_MAX_3D_TEXTURE_SIZE_EXT 0x8073 #endif #ifndef GL_SGIS_texture_filter4 #define GL_FILTER4_SGIS 0x8146 #define GL_TEXTURE_FILTER4_SIZE_SGIS 0x8147 #endif #ifndef GL_EXT_subtexture #endif #ifndef GL_EXT_copy_texture #endif #ifndef GL_EXT_histogram #define GL_HISTOGRAM_EXT 0x8024 #define GL_PROXY_HISTOGRAM_EXT 0x8025 #define GL_HISTOGRAM_WIDTH_EXT 0x8026 #define GL_HISTOGRAM_FORMAT_EXT 0x8027 #define GL_HISTOGRAM_RED_SIZE_EXT 0x8028 #define GL_HISTOGRAM_GREEN_SIZE_EXT 0x8029 #define GL_HISTOGRAM_BLUE_SIZE_EXT 0x802A 
#define GL_HISTOGRAM_ALPHA_SIZE_EXT 0x802B #define GL_HISTOGRAM_LUMINANCE_SIZE_EXT 0x802C #define GL_HISTOGRAM_SINK_EXT 0x802D #define GL_MINMAX_EXT 0x802E #define GL_MINMAX_FORMAT_EXT 0x802F #define GL_MINMAX_SINK_EXT 0x8030 #define GL_TABLE_TOO_LARGE_EXT 0x8031 #endif #ifndef GL_EXT_convolution #define GL_CONVOLUTION_1D_EXT 0x8010 #define GL_CONVOLUTION_2D_EXT 0x8011 #define GL_SEPARABLE_2D_EXT 0x8012 #define GL_CONVOLUTION_BORDER_MODE_EXT 0x8013 #define GL_CONVOLUTION_FILTER_SCALE_EXT 0x8014 #define GL_CONVOLUTION_FILTER_BIAS_EXT 0x8015 #define GL_REDUCE_EXT 0x8016 #define GL_CONVOLUTION_FORMAT_EXT 0x8017 #define GL_CONVOLUTION_WIDTH_EXT 0x8018 #define GL_CONVOLUTION_HEIGHT_EXT 0x8019 #define GL_MAX_CONVOLUTION_WIDTH_EXT 0x801A #define GL_MAX_CONVOLUTION_HEIGHT_EXT 0x801B #define GL_POST_CONVOLUTION_RED_SCALE_EXT 0x801C #define GL_POST_CONVOLUTION_GREEN_SCALE_EXT 0x801D #define GL_POST_CONVOLUTION_BLUE_SCALE_EXT 0x801E #define GL_POST_CONVOLUTION_ALPHA_SCALE_EXT 0x801F #define GL_POST_CONVOLUTION_RED_BIAS_EXT 0x8020 #define GL_POST_CONVOLUTION_GREEN_BIAS_EXT 0x8021 #define GL_POST_CONVOLUTION_BLUE_BIAS_EXT 0x8022 #define GL_POST_CONVOLUTION_ALPHA_BIAS_EXT 0x8023 #endif #ifndef GL_SGI_color_matrix #define GL_COLOR_MATRIX_SGI 0x80B1 #define GL_COLOR_MATRIX_STACK_DEPTH_SGI 0x80B2 #define GL_MAX_COLOR_MATRIX_STACK_DEPTH_SGI 0x80B3 #define GL_POST_COLOR_MATRIX_RED_SCALE_SGI 0x80B4 #define GL_POST_COLOR_MATRIX_GREEN_SCALE_SGI 0x80B5 #define GL_POST_COLOR_MATRIX_BLUE_SCALE_SGI 0x80B6 #define GL_POST_COLOR_MATRIX_ALPHA_SCALE_SGI 0x80B7 #define GL_POST_COLOR_MATRIX_RED_BIAS_SGI 0x80B8 #define GL_POST_COLOR_MATRIX_GREEN_BIAS_SGI 0x80B9 #define GL_POST_COLOR_MATRIX_BLUE_BIAS_SGI 0x80BA #define GL_POST_COLOR_MATRIX_ALPHA_BIAS_SGI 0x80BB #endif #ifndef GL_SGI_color_table #define GL_COLOR_TABLE_SGI 0x80D0 #define GL_POST_CONVOLUTION_COLOR_TABLE_SGI 0x80D1 #define GL_POST_COLOR_MATRIX_COLOR_TABLE_SGI 0x80D2 #define GL_PROXY_COLOR_TABLE_SGI 0x80D3 #define GL_PROXY_POST_CONVOLUTION_COLOR_TABLE_SGI 0x80D4 #define GL_PROXY_POST_COLOR_MATRIX_COLOR_TABLE_SGI 0x80D5 #define GL_COLOR_TABLE_SCALE_SGI 0x80D6 #define GL_COLOR_TABLE_BIAS_SGI 0x80D7 #define GL_COLOR_TABLE_FORMAT_SGI 0x80D8 #define GL_COLOR_TABLE_WIDTH_SGI 0x80D9 #define GL_COLOR_TABLE_RED_SIZE_SGI 0x80DA #define GL_COLOR_TABLE_GREEN_SIZE_SGI 0x80DB #define GL_COLOR_TABLE_BLUE_SIZE_SGI 0x80DC #define GL_COLOR_TABLE_ALPHA_SIZE_SGI 0x80DD #define GL_COLOR_TABLE_LUMINANCE_SIZE_SGI 0x80DE #define GL_COLOR_TABLE_INTENSITY_SIZE_SGI 0x80DF #endif #ifndef GL_SGIS_pixel_texture #define GL_PIXEL_TEXTURE_SGIS 0x8353 #define GL_PIXEL_FRAGMENT_RGB_SOURCE_SGIS 0x8354 #define GL_PIXEL_FRAGMENT_ALPHA_SOURCE_SGIS 0x8355 #define GL_PIXEL_GROUP_COLOR_SGIS 0x8356 #endif #ifndef GL_SGIX_pixel_texture #define GL_PIXEL_TEX_GEN_SGIX 0x8139 #define GL_PIXEL_TEX_GEN_MODE_SGIX 0x832B #endif #ifndef GL_SGIS_texture4D #define GL_PACK_SKIP_VOLUMES_SGIS 0x8130 #define GL_PACK_IMAGE_DEPTH_SGIS 0x8131 #define GL_UNPACK_SKIP_VOLUMES_SGIS 0x8132 #define GL_UNPACK_IMAGE_DEPTH_SGIS 0x8133 #define GL_TEXTURE_4D_SGIS 0x8134 #define GL_PROXY_TEXTURE_4D_SGIS 0x8135 #define GL_TEXTURE_4DSIZE_SGIS 0x8136 #define GL_TEXTURE_WRAP_Q_SGIS 0x8137 #define GL_MAX_4D_TEXTURE_SIZE_SGIS 0x8138 #define GL_TEXTURE_4D_BINDING_SGIS 0x814F #endif #ifndef GL_SGI_texture_color_table #define GL_TEXTURE_COLOR_TABLE_SGI 0x80BC #define GL_PROXY_TEXTURE_COLOR_TABLE_SGI 0x80BD #endif #ifndef GL_EXT_cmyka #define GL_CMYK_EXT 0x800C #define GL_CMYKA_EXT 0x800D #define GL_PACK_CMYK_HINT_EXT 0x800E #define 
GL_UNPACK_CMYK_HINT_EXT 0x800F #endif #ifndef GL_EXT_texture_object #define GL_TEXTURE_PRIORITY_EXT 0x8066 #define GL_TEXTURE_RESIDENT_EXT 0x8067 #define GL_TEXTURE_1D_BINDING_EXT 0x8068 #define GL_TEXTURE_2D_BINDING_EXT 0x8069 #define GL_TEXTURE_3D_BINDING_EXT 0x806A #endif #ifndef GL_SGIS_detail_texture #define GL_DETAIL_TEXTURE_2D_SGIS 0x8095 #define GL_DETAIL_TEXTURE_2D_BINDING_SGIS 0x8096 #define GL_LINEAR_DETAIL_SGIS 0x8097 #define GL_LINEAR_DETAIL_ALPHA_SGIS 0x8098 #define GL_LINEAR_DETAIL_COLOR_SGIS 0x8099 #define GL_DETAIL_TEXTURE_LEVEL_SGIS 0x809A #define GL_DETAIL_TEXTURE_MODE_SGIS 0x809B #define GL_DETAIL_TEXTURE_FUNC_POINTS_SGIS 0x809C #endif #ifndef GL_SGIS_sharpen_texture #define GL_LINEAR_SHARPEN_SGIS 0x80AD #define GL_LINEAR_SHARPEN_ALPHA_SGIS 0x80AE #define GL_LINEAR_SHARPEN_COLOR_SGIS 0x80AF #define GL_SHARPEN_TEXTURE_FUNC_POINTS_SGIS 0x80B0 #endif #ifndef GL_EXT_packed_pixels #define GL_UNSIGNED_BYTE_3_3_2_EXT 0x8032 #define GL_UNSIGNED_SHORT_4_4_4_4_EXT 0x8033 #define GL_UNSIGNED_SHORT_5_5_5_1_EXT 0x8034 #define GL_UNSIGNED_INT_8_8_8_8_EXT 0x8035 #define GL_UNSIGNED_INT_10_10_10_2_EXT 0x8036 #endif #ifndef GL_SGIS_texture_lod #define GL_TEXTURE_MIN_LOD_SGIS 0x813A #define GL_TEXTURE_MAX_LOD_SGIS 0x813B #define GL_TEXTURE_BASE_LEVEL_SGIS 0x813C #define GL_TEXTURE_MAX_LEVEL_SGIS 0x813D #endif #ifndef GL_SGIS_multisample #define GL_MULTISAMPLE_SGIS 0x809D #define GL_SAMPLE_ALPHA_TO_MASK_SGIS 0x809E #define GL_SAMPLE_ALPHA_TO_ONE_SGIS 0x809F #define GL_SAMPLE_MASK_SGIS 0x80A0 #define GL_1PASS_SGIS 0x80A1 #define GL_2PASS_0_SGIS 0x80A2 #define GL_2PASS_1_SGIS 0x80A3 #define GL_4PASS_0_SGIS 0x80A4 #define GL_4PASS_1_SGIS 0x80A5 #define GL_4PASS_2_SGIS 0x80A6 #define GL_4PASS_3_SGIS 0x80A7 #define GL_SAMPLE_BUFFERS_SGIS 0x80A8 #define GL_SAMPLES_SGIS 0x80A9 #define GL_SAMPLE_MASK_VALUE_SGIS 0x80AA #define GL_SAMPLE_MASK_INVERT_SGIS 0x80AB #define GL_SAMPLE_PATTERN_SGIS 0x80AC #endif #ifndef GL_EXT_rescale_normal #define GL_RESCALE_NORMAL_EXT 0x803A #endif #ifndef GL_EXT_vertex_array #define GL_VERTEX_ARRAY_EXT 0x8074 #define GL_NORMAL_ARRAY_EXT 0x8075 #define GL_COLOR_ARRAY_EXT 0x8076 #define GL_INDEX_ARRAY_EXT 0x8077 #define GL_TEXTURE_COORD_ARRAY_EXT 0x8078 #define GL_EDGE_FLAG_ARRAY_EXT 0x8079 #define GL_VERTEX_ARRAY_SIZE_EXT 0x807A #define GL_VERTEX_ARRAY_TYPE_EXT 0x807B #define GL_VERTEX_ARRAY_STRIDE_EXT 0x807C #define GL_VERTEX_ARRAY_COUNT_EXT 0x807D #define GL_NORMAL_ARRAY_TYPE_EXT 0x807E #define GL_NORMAL_ARRAY_STRIDE_EXT 0x807F #define GL_NORMAL_ARRAY_COUNT_EXT 0x8080 #define GL_COLOR_ARRAY_SIZE_EXT 0x8081 #define GL_COLOR_ARRAY_TYPE_EXT 0x8082 #define GL_COLOR_ARRAY_STRIDE_EXT 0x8083 #define GL_COLOR_ARRAY_COUNT_EXT 0x8084 #define GL_INDEX_ARRAY_TYPE_EXT 0x8085 #define GL_INDEX_ARRAY_STRIDE_EXT 0x8086 #define GL_INDEX_ARRAY_COUNT_EXT 0x8087 #define GL_TEXTURE_COORD_ARRAY_SIZE_EXT 0x8088 #define GL_TEXTURE_COORD_ARRAY_TYPE_EXT 0x8089 #define GL_TEXTURE_COORD_ARRAY_STRIDE_EXT 0x808A #define GL_TEXTURE_COORD_ARRAY_COUNT_EXT 0x808B #define GL_EDGE_FLAG_ARRAY_STRIDE_EXT 0x808C #define GL_EDGE_FLAG_ARRAY_COUNT_EXT 0x808D #define GL_VERTEX_ARRAY_POINTER_EXT 0x808E #define GL_NORMAL_ARRAY_POINTER_EXT 0x808F #define GL_COLOR_ARRAY_POINTER_EXT 0x8090 #define GL_INDEX_ARRAY_POINTER_EXT 0x8091 #define GL_TEXTURE_COORD_ARRAY_POINTER_EXT 0x8092 #define GL_EDGE_FLAG_ARRAY_POINTER_EXT 0x8093 #endif #ifndef GL_EXT_misc_attribute #endif #ifndef GL_SGIS_generate_mipmap #define GL_GENERATE_MIPMAP_SGIS 0x8191 #define GL_GENERATE_MIPMAP_HINT_SGIS 0x8192 #endif #ifndef GL_SGIX_clipmap 
#define GL_LINEAR_CLIPMAP_LINEAR_SGIX 0x8170 #define GL_TEXTURE_CLIPMAP_CENTER_SGIX 0x8171 #define GL_TEXTURE_CLIPMAP_FRAME_SGIX 0x8172 #define GL_TEXTURE_CLIPMAP_OFFSET_SGIX 0x8173 #define GL_TEXTURE_CLIPMAP_VIRTUAL_DEPTH_SGIX 0x8174 #define GL_TEXTURE_CLIPMAP_LOD_OFFSET_SGIX 0x8175 #define GL_TEXTURE_CLIPMAP_DEPTH_SGIX 0x8176 #define GL_MAX_CLIPMAP_DEPTH_SGIX 0x8177 #define GL_MAX_CLIPMAP_VIRTUAL_DEPTH_SGIX 0x8178 #define GL_NEAREST_CLIPMAP_NEAREST_SGIX 0x844D #define GL_NEAREST_CLIPMAP_LINEAR_SGIX 0x844E #define GL_LINEAR_CLIPMAP_NEAREST_SGIX 0x844F #endif #ifndef GL_SGIX_shadow #define GL_TEXTURE_COMPARE_SGIX 0x819A #define GL_TEXTURE_COMPARE_OPERATOR_SGIX 0x819B #define GL_TEXTURE_LEQUAL_R_SGIX 0x819C #define GL_TEXTURE_GEQUAL_R_SGIX 0x819D #endif #ifndef GL_SGIS_texture_edge_clamp #define GL_CLAMP_TO_EDGE_SGIS 0x812F #endif #ifndef GL_SGIS_texture_border_clamp #define GL_CLAMP_TO_BORDER_SGIS 0x812D #endif #ifndef GL_EXT_blend_minmax #define GL_FUNC_ADD_EXT 0x8006 #define GL_MIN_EXT 0x8007 #define GL_MAX_EXT 0x8008 #define GL_BLEND_EQUATION_EXT 0x8009 #endif #ifndef GL_EXT_blend_subtract #define GL_FUNC_SUBTRACT_EXT 0x800A #define GL_FUNC_REVERSE_SUBTRACT_EXT 0x800B #endif #ifndef GL_EXT_blend_logic_op #endif #ifndef GL_SGIX_interlace #define GL_INTERLACE_SGIX 0x8094 #endif #ifndef GL_SGIX_pixel_tiles #define GL_PIXEL_TILE_BEST_ALIGNMENT_SGIX 0x813E #define GL_PIXEL_TILE_CACHE_INCREMENT_SGIX 0x813F #define GL_PIXEL_TILE_WIDTH_SGIX 0x8140 #define GL_PIXEL_TILE_HEIGHT_SGIX 0x8141 #define GL_PIXEL_TILE_GRID_WIDTH_SGIX 0x8142 #define GL_PIXEL_TILE_GRID_HEIGHT_SGIX 0x8143 #define GL_PIXEL_TILE_GRID_DEPTH_SGIX 0x8144 #define GL_PIXEL_TILE_CACHE_SIZE_SGIX 0x8145 #endif #ifndef GL_SGIS_texture_select #define GL_DUAL_ALPHA4_SGIS 0x8110 #define GL_DUAL_ALPHA8_SGIS 0x8111 #define GL_DUAL_ALPHA12_SGIS 0x8112 #define GL_DUAL_ALPHA16_SGIS 0x8113 #define GL_DUAL_LUMINANCE4_SGIS 0x8114 #define GL_DUAL_LUMINANCE8_SGIS 0x8115 #define GL_DUAL_LUMINANCE12_SGIS 0x8116 #define GL_DUAL_LUMINANCE16_SGIS 0x8117 #define GL_DUAL_INTENSITY4_SGIS 0x8118 #define GL_DUAL_INTENSITY8_SGIS 0x8119 #define GL_DUAL_INTENSITY12_SGIS 0x811A #define GL_DUAL_INTENSITY16_SGIS 0x811B #define GL_DUAL_LUMINANCE_ALPHA4_SGIS 0x811C #define GL_DUAL_LUMINANCE_ALPHA8_SGIS 0x811D #define GL_QUAD_ALPHA4_SGIS 0x811E #define GL_QUAD_ALPHA8_SGIS 0x811F #define GL_QUAD_LUMINANCE4_SGIS 0x8120 #define GL_QUAD_LUMINANCE8_SGIS 0x8121 #define GL_QUAD_INTENSITY4_SGIS 0x8122 #define GL_QUAD_INTENSITY8_SGIS 0x8123 #define GL_DUAL_TEXTURE_SELECT_SGIS 0x8124 #define GL_QUAD_TEXTURE_SELECT_SGIS 0x8125 #endif #ifndef GL_SGIX_sprite #define GL_SPRITE_SGIX 0x8148 #define GL_SPRITE_MODE_SGIX 0x8149 #define GL_SPRITE_AXIS_SGIX 0x814A #define GL_SPRITE_TRANSLATION_SGIX 0x814B #define GL_SPRITE_AXIAL_SGIX 0x814C #define GL_SPRITE_OBJECT_ALIGNED_SGIX 0x814D #define GL_SPRITE_EYE_ALIGNED_SGIX 0x814E #endif #ifndef GL_SGIX_texture_multi_buffer #define GL_TEXTURE_MULTI_BUFFER_HINT_SGIX 0x812E #endif #ifndef GL_EXT_point_parameters #define GL_POINT_SIZE_MIN_EXT 0x8126 #define GL_POINT_SIZE_MAX_EXT 0x8127 #define GL_POINT_FADE_THRESHOLD_SIZE_EXT 0x8128 #define GL_DISTANCE_ATTENUATION_EXT 0x8129 #endif #ifndef GL_SGIS_point_parameters #define GL_POINT_SIZE_MIN_SGIS 0x8126 #define GL_POINT_SIZE_MAX_SGIS 0x8127 #define GL_POINT_FADE_THRESHOLD_SIZE_SGIS 0x8128 #define GL_DISTANCE_ATTENUATION_SGIS 0x8129 #endif #ifndef GL_SGIX_instruments #define GL_INSTRUMENT_BUFFER_POINTER_SGIX 0x8180 #define GL_INSTRUMENT_MEASUREMENTS_SGIX 0x8181 #endif #ifndef 
GL_SGIX_texture_scale_bias #define GL_POST_TEXTURE_FILTER_BIAS_SGIX 0x8179 #define GL_POST_TEXTURE_FILTER_SCALE_SGIX 0x817A #define GL_POST_TEXTURE_FILTER_BIAS_RANGE_SGIX 0x817B #define GL_POST_TEXTURE_FILTER_SCALE_RANGE_SGIX 0x817C #endif #ifndef GL_SGIX_framezoom #define GL_FRAMEZOOM_SGIX 0x818B #define GL_FRAMEZOOM_FACTOR_SGIX 0x818C #define GL_MAX_FRAMEZOOM_FACTOR_SGIX 0x818D #endif #ifndef GL_SGIX_tag_sample_buffer #endif #ifndef GL_FfdMaskSGIX #define GL_TEXTURE_DEFORMATION_BIT_SGIX 0x00000001 #define GL_GEOMETRY_DEFORMATION_BIT_SGIX 0x00000002 #endif #ifndef GL_SGIX_polynomial_ffd #define GL_GEOMETRY_DEFORMATION_SGIX 0x8194 #define GL_TEXTURE_DEFORMATION_SGIX 0x8195 #define GL_DEFORMATIONS_MASK_SGIX 0x8196 #define GL_MAX_DEFORMATION_ORDER_SGIX 0x8197 #endif #ifndef GL_SGIX_reference_plane #define GL_REFERENCE_PLANE_SGIX 0x817D #define GL_REFERENCE_PLANE_EQUATION_SGIX 0x817E #endif #ifndef GL_SGIX_flush_raster #endif #ifndef GL_SGIX_depth_texture #define GL_DEPTH_COMPONENT16_SGIX 0x81A5 #define GL_DEPTH_COMPONENT24_SGIX 0x81A6 #define GL_DEPTH_COMPONENT32_SGIX 0x81A7 #endif #ifndef GL_SGIS_fog_function #define GL_FOG_FUNC_SGIS 0x812A #define GL_FOG_FUNC_POINTS_SGIS 0x812B #define GL_MAX_FOG_FUNC_POINTS_SGIS 0x812C #endif #ifndef GL_SGIX_fog_offset #define GL_FOG_OFFSET_SGIX 0x8198 #define GL_FOG_OFFSET_VALUE_SGIX 0x8199 #endif #ifndef GL_HP_image_transform #define GL_IMAGE_SCALE_X_HP 0x8155 #define GL_IMAGE_SCALE_Y_HP 0x8156 #define GL_IMAGE_TRANSLATE_X_HP 0x8157 #define GL_IMAGE_TRANSLATE_Y_HP 0x8158 #define GL_IMAGE_ROTATE_ANGLE_HP 0x8159 #define GL_IMAGE_ROTATE_ORIGIN_X_HP 0x815A #define GL_IMAGE_ROTATE_ORIGIN_Y_HP 0x815B #define GL_IMAGE_MAG_FILTER_HP 0x815C #define GL_IMAGE_MIN_FILTER_HP 0x815D #define GL_IMAGE_CUBIC_WEIGHT_HP 0x815E #define GL_CUBIC_HP 0x815F #define GL_AVERAGE_HP 0x8160 #define GL_IMAGE_TRANSFORM_2D_HP 0x8161 #define GL_POST_IMAGE_TRANSFORM_COLOR_TABLE_HP 0x8162 #define GL_PROXY_POST_IMAGE_TRANSFORM_COLOR_TABLE_HP 0x8163 #endif #ifndef GL_HP_convolution_border_modes #define GL_IGNORE_BORDER_HP 0x8150 #define GL_CONSTANT_BORDER_HP 0x8151 #define GL_REPLICATE_BORDER_HP 0x8153 #define GL_CONVOLUTION_BORDER_COLOR_HP 0x8154 #endif #ifndef GL_INGR_palette_buffer #endif #ifndef GL_SGIX_texture_add_env #define GL_TEXTURE_ENV_BIAS_SGIX 0x80BE #endif #ifndef GL_EXT_color_subtable #endif #ifndef GL_PGI_vertex_hints #define GL_VERTEX_DATA_HINT_PGI 0x1A22A #define GL_VERTEX_CONSISTENT_HINT_PGI 0x1A22B #define GL_MATERIAL_SIDE_HINT_PGI 0x1A22C #define GL_MAX_VERTEX_HINT_PGI 0x1A22D #define GL_COLOR3_BIT_PGI 0x00010000 #define GL_COLOR4_BIT_PGI 0x00020000 #define GL_EDGEFLAG_BIT_PGI 0x00040000 #define GL_INDEX_BIT_PGI 0x00080000 #define GL_MAT_AMBIENT_BIT_PGI 0x00100000 #define GL_MAT_AMBIENT_AND_DIFFUSE_BIT_PGI 0x00200000 #define GL_MAT_DIFFUSE_BIT_PGI 0x00400000 #define GL_MAT_EMISSION_BIT_PGI 0x00800000 #define GL_MAT_COLOR_INDEXES_BIT_PGI 0x01000000 #define GL_MAT_SHININESS_BIT_PGI 0x02000000 #define GL_MAT_SPECULAR_BIT_PGI 0x04000000 #define GL_NORMAL_BIT_PGI 0x08000000 #define GL_TEXCOORD1_BIT_PGI 0x10000000 #define GL_TEXCOORD2_BIT_PGI 0x20000000 #define GL_TEXCOORD3_BIT_PGI 0x40000000 #define GL_TEXCOORD4_BIT_PGI 0x80000000 #define GL_VERTEX23_BIT_PGI 0x00000004 #define GL_VERTEX4_BIT_PGI 0x00000008 #endif #ifndef GL_PGI_misc_hints #define GL_PREFER_DOUBLEBUFFER_HINT_PGI 0x1A1F8 #define GL_CONSERVE_MEMORY_HINT_PGI 0x1A1FD #define GL_RECLAIM_MEMORY_HINT_PGI 0x1A1FE #define GL_NATIVE_GRAPHICS_HANDLE_PGI 0x1A202 #define GL_NATIVE_GRAPHICS_BEGIN_HINT_PGI 0x1A203 #define 
GL_NATIVE_GRAPHICS_END_HINT_PGI 0x1A204 #define GL_ALWAYS_FAST_HINT_PGI 0x1A20C #define GL_ALWAYS_SOFT_HINT_PGI 0x1A20D #define GL_ALLOW_DRAW_OBJ_HINT_PGI 0x1A20E #define GL_ALLOW_DRAW_WIN_HINT_PGI 0x1A20F #define GL_ALLOW_DRAW_FRG_HINT_PGI 0x1A210 #define GL_ALLOW_DRAW_MEM_HINT_PGI 0x1A211 #define GL_STRICT_DEPTHFUNC_HINT_PGI 0x1A216 #define GL_STRICT_LIGHTING_HINT_PGI 0x1A217 #define GL_STRICT_SCISSOR_HINT_PGI 0x1A218 #define GL_FULL_STIPPLE_HINT_PGI 0x1A219 #define GL_CLIP_NEAR_HINT_PGI 0x1A220 #define GL_CLIP_FAR_HINT_PGI 0x1A221 #define GL_WIDE_LINE_HINT_PGI 0x1A222 #define GL_BACK_NORMALS_HINT_PGI 0x1A223 #endif #ifndef GL_EXT_paletted_texture #define GL_COLOR_INDEX1_EXT 0x80E2 #define GL_COLOR_INDEX2_EXT 0x80E3 #define GL_COLOR_INDEX4_EXT 0x80E4 #define GL_COLOR_INDEX8_EXT 0x80E5 #define GL_COLOR_INDEX12_EXT 0x80E6 #define GL_COLOR_INDEX16_EXT 0x80E7 #define GL_TEXTURE_INDEX_SIZE_EXT 0x80ED #endif #ifndef GL_EXT_clip_volume_hint #define GL_CLIP_VOLUME_CLIPPING_HINT_EXT 0x80F0 #endif #ifndef GL_SGIX_list_priority #define GL_LIST_PRIORITY_SGIX 0x8182 #endif #ifndef GL_SGIX_ir_instrument1 #define GL_IR_INSTRUMENT1_SGIX 0x817F #endif #ifndef GL_SGIX_calligraphic_fragment #define GL_CALLIGRAPHIC_FRAGMENT_SGIX 0x8183 #endif #ifndef GL_SGIX_texture_lod_bias #define GL_TEXTURE_LOD_BIAS_S_SGIX 0x818E #define GL_TEXTURE_LOD_BIAS_T_SGIX 0x818F #define GL_TEXTURE_LOD_BIAS_R_SGIX 0x8190 #endif #ifndef GL_SGIX_shadow_ambient #define GL_SHADOW_AMBIENT_SGIX 0x80BF #endif #ifndef GL_EXT_index_texture #endif #ifndef GL_EXT_index_material #define GL_INDEX_MATERIAL_EXT 0x81B8 #define GL_INDEX_MATERIAL_PARAMETER_EXT 0x81B9 #define GL_INDEX_MATERIAL_FACE_EXT 0x81BA #endif #ifndef GL_EXT_index_func #define GL_INDEX_TEST_EXT 0x81B5 #define GL_INDEX_TEST_FUNC_EXT 0x81B6 #define GL_INDEX_TEST_REF_EXT 0x81B7 #endif #ifndef GL_EXT_index_array_formats #define GL_IUI_V2F_EXT 0x81AD #define GL_IUI_V3F_EXT 0x81AE #define GL_IUI_N3F_V2F_EXT 0x81AF #define GL_IUI_N3F_V3F_EXT 0x81B0 #define GL_T2F_IUI_V2F_EXT 0x81B1 #define GL_T2F_IUI_V3F_EXT 0x81B2 #define GL_T2F_IUI_N3F_V2F_EXT 0x81B3 #define GL_T2F_IUI_N3F_V3F_EXT 0x81B4 #endif #ifndef GL_EXT_compiled_vertex_array #define GL_ARRAY_ELEMENT_LOCK_FIRST_EXT 0x81A8 #define GL_ARRAY_ELEMENT_LOCK_COUNT_EXT 0x81A9 #endif #ifndef GL_EXT_cull_vertex #define GL_CULL_VERTEX_EXT 0x81AA #define GL_CULL_VERTEX_EYE_POSITION_EXT 0x81AB #define GL_CULL_VERTEX_OBJECT_POSITION_EXT 0x81AC #endif #ifndef GL_SGIX_ycrcb #define GL_YCRCB_422_SGIX 0x81BB #define GL_YCRCB_444_SGIX 0x81BC #endif #ifndef GL_SGIX_fragment_lighting #define GL_FRAGMENT_LIGHTING_SGIX 0x8400 #define GL_FRAGMENT_COLOR_MATERIAL_SGIX 0x8401 #define GL_FRAGMENT_COLOR_MATERIAL_FACE_SGIX 0x8402 #define GL_FRAGMENT_COLOR_MATERIAL_PARAMETER_SGIX 0x8403 #define GL_MAX_FRAGMENT_LIGHTS_SGIX 0x8404 #define GL_MAX_ACTIVE_LIGHTS_SGIX 0x8405 #define GL_CURRENT_RASTER_NORMAL_SGIX 0x8406 #define GL_LIGHT_ENV_MODE_SGIX 0x8407 #define GL_FRAGMENT_LIGHT_MODEL_LOCAL_VIEWER_SGIX 0x8408 #define GL_FRAGMENT_LIGHT_MODEL_TWO_SIDE_SGIX 0x8409 #define GL_FRAGMENT_LIGHT_MODEL_AMBIENT_SGIX 0x840A #define GL_FRAGMENT_LIGHT_MODEL_NORMAL_INTERPOLATION_SGIX 0x840B #define GL_FRAGMENT_LIGHT0_SGIX 0x840C #define GL_FRAGMENT_LIGHT1_SGIX 0x840D #define GL_FRAGMENT_LIGHT2_SGIX 0x840E #define GL_FRAGMENT_LIGHT3_SGIX 0x840F #define GL_FRAGMENT_LIGHT4_SGIX 0x8410 #define GL_FRAGMENT_LIGHT5_SGIX 0x8411 #define GL_FRAGMENT_LIGHT6_SGIX 0x8412 #define GL_FRAGMENT_LIGHT7_SGIX 0x8413 #endif #ifndef GL_IBM_rasterpos_clip #define GL_RASTER_POSITION_UNCLIPPED_IBM 
0x19262 #endif #ifndef GL_HP_texture_lighting #define GL_TEXTURE_LIGHTING_MODE_HP 0x8167 #define GL_TEXTURE_POST_SPECULAR_HP 0x8168 #define GL_TEXTURE_PRE_SPECULAR_HP 0x8169 #endif #ifndef GL_EXT_draw_range_elements #define GL_MAX_ELEMENTS_VERTICES_EXT 0x80E8 #define GL_MAX_ELEMENTS_INDICES_EXT 0x80E9 #endif #ifndef GL_WIN_phong_shading #define GL_PHONG_WIN 0x80EA #define GL_PHONG_HINT_WIN 0x80EB #endif #ifndef GL_WIN_specular_fog #define GL_FOG_SPECULAR_TEXTURE_WIN 0x80EC #endif #ifndef GL_EXT_light_texture #define GL_FRAGMENT_MATERIAL_EXT 0x8349 #define GL_FRAGMENT_NORMAL_EXT 0x834A #define GL_FRAGMENT_COLOR_EXT 0x834C #define GL_ATTENUATION_EXT 0x834D #define GL_SHADOW_ATTENUATION_EXT 0x834E #define GL_TEXTURE_APPLICATION_MODE_EXT 0x834F #define GL_TEXTURE_LIGHT_EXT 0x8350 #define GL_TEXTURE_MATERIAL_FACE_EXT 0x8351 #define GL_TEXTURE_MATERIAL_PARAMETER_EXT 0x8352 /* reuse GL_FRAGMENT_DEPTH_EXT */ #endif #ifndef GL_SGIX_blend_alpha_minmax #define GL_ALPHA_MIN_SGIX 0x8320 #define GL_ALPHA_MAX_SGIX 0x8321 #endif #ifndef GL_SGIX_impact_pixel_texture #define GL_PIXEL_TEX_GEN_Q_CEILING_SGIX 0x8184 #define GL_PIXEL_TEX_GEN_Q_ROUND_SGIX 0x8185 #define GL_PIXEL_TEX_GEN_Q_FLOOR_SGIX 0x8186 #define GL_PIXEL_TEX_GEN_ALPHA_REPLACE_SGIX 0x8187 #define GL_PIXEL_TEX_GEN_ALPHA_NO_REPLACE_SGIX 0x8188 #define GL_PIXEL_TEX_GEN_ALPHA_LS_SGIX 0x8189 #define GL_PIXEL_TEX_GEN_ALPHA_MS_SGIX 0x818A #endif #ifndef GL_EXT_bgra #define GL_BGR_EXT 0x80E0 #define GL_BGRA_EXT 0x80E1 #endif #ifndef GL_SGIX_async #define GL_ASYNC_MARKER_SGIX 0x8329 #endif #ifndef GL_SGIX_async_pixel #define GL_ASYNC_TEX_IMAGE_SGIX 0x835C #define GL_ASYNC_DRAW_PIXELS_SGIX 0x835D #define GL_ASYNC_READ_PIXELS_SGIX 0x835E #define GL_MAX_ASYNC_TEX_IMAGE_SGIX 0x835F #define GL_MAX_ASYNC_DRAW_PIXELS_SGIX 0x8360 #define GL_MAX_ASYNC_READ_PIXELS_SGIX 0x8361 #endif #ifndef GL_SGIX_async_histogram #define GL_ASYNC_HISTOGRAM_SGIX 0x832C #define GL_MAX_ASYNC_HISTOGRAM_SGIX 0x832D #endif #ifndef GL_INTEL_texture_scissor #endif #ifndef GL_INTEL_parallel_arrays #define GL_PARALLEL_ARRAYS_INTEL 0x83F4 #define GL_VERTEX_ARRAY_PARALLEL_POINTERS_INTEL 0x83F5 #define GL_NORMAL_ARRAY_PARALLEL_POINTERS_INTEL 0x83F6 #define GL_COLOR_ARRAY_PARALLEL_POINTERS_INTEL 0x83F7 #define GL_TEXTURE_COORD_ARRAY_PARALLEL_POINTERS_INTEL 0x83F8 #endif #ifndef GL_HP_occlusion_test #define GL_OCCLUSION_TEST_HP 0x8165 #define GL_OCCLUSION_TEST_RESULT_HP 0x8166 #endif #ifndef GL_EXT_pixel_transform #define GL_PIXEL_TRANSFORM_2D_EXT 0x8330 #define GL_PIXEL_MAG_FILTER_EXT 0x8331 #define GL_PIXEL_MIN_FILTER_EXT 0x8332 #define GL_PIXEL_CUBIC_WEIGHT_EXT 0x8333 #define GL_CUBIC_EXT 0x8334 #define GL_AVERAGE_EXT 0x8335 #define GL_PIXEL_TRANSFORM_2D_STACK_DEPTH_EXT 0x8336 #define GL_MAX_PIXEL_TRANSFORM_2D_STACK_DEPTH_EXT 0x8337 #define GL_PIXEL_TRANSFORM_2D_MATRIX_EXT 0x8338 #endif #ifndef GL_EXT_pixel_transform_color_table #endif #ifndef GL_EXT_shared_texture_palette #define GL_SHARED_TEXTURE_PALETTE_EXT 0x81FB #endif #ifndef GL_EXT_separate_specular_color #define GL_LIGHT_MODEL_COLOR_CONTROL_EXT 0x81F8 #define GL_SINGLE_COLOR_EXT 0x81F9 #define GL_SEPARATE_SPECULAR_COLOR_EXT 0x81FA #endif #ifndef GL_EXT_secondary_color #define GL_COLOR_SUM_EXT 0x8458 #define GL_CURRENT_SECONDARY_COLOR_EXT 0x8459 #define GL_SECONDARY_COLOR_ARRAY_SIZE_EXT 0x845A #define GL_SECONDARY_COLOR_ARRAY_TYPE_EXT 0x845B #define GL_SECONDARY_COLOR_ARRAY_STRIDE_EXT 0x845C #define GL_SECONDARY_COLOR_ARRAY_POINTER_EXT 0x845D #define GL_SECONDARY_COLOR_ARRAY_EXT 0x845E #endif #ifndef GL_EXT_texture_perturb_normal 
#define GL_PERTURB_EXT 0x85AE
#define GL_TEXTURE_NORMAL_EXT 0x85AF
#endif

#ifndef GL_EXT_multi_draw_arrays
#endif

#ifndef GL_EXT_fog_coord
#define GL_FOG_COORDINATE_SOURCE_EXT 0x8450
#define GL_FOG_COORDINATE_EXT 0x8451
#define GL_FRAGMENT_DEPTH_EXT 0x8452
#define GL_CURRENT_FOG_COORDINATE_EXT 0x8453
#define GL_FOG_COORDINATE_ARRAY_TYPE_EXT 0x8454
#define GL_FOG_COORDINATE_ARRAY_STRIDE_EXT 0x8455
#define GL_FOG_COORDINATE_ARRAY_POINTER_EXT 0x8456
#define GL_FOG_COORDINATE_ARRAY_EXT 0x8457
#endif

#ifndef GL_REND_screen_coordinates
#define GL_SCREEN_COORDINATES_REND 0x8490
#define GL_INVERTED_SCREEN_W_REND 0x8491
#endif

#ifndef GL_EXT_coordinate_frame
#define GL_TANGENT_ARRAY_EXT 0x8439
#define GL_BINORMAL_ARRAY_EXT 0x843A
#define GL_CURRENT_TANGENT_EXT 0x843B
#define GL_CURRENT_BINORMAL_EXT 0x843C
#define GL_TANGENT_ARRAY_TYPE_EXT 0x843E
#define GL_TANGENT_ARRAY_STRIDE_EXT 0x843F
#define GL_BINORMAL_ARRAY_TYPE_EXT 0x8440
#define GL_BINORMAL_ARRAY_STRIDE_EXT 0x8441
#define GL_TANGENT_ARRAY_POINTER_EXT 0x8442
#define GL_BINORMAL_ARRAY_POINTER_EXT 0x8443
#define GL_MAP1_TANGENT_EXT 0x8444
#define GL_MAP2_TANGENT_EXT 0x8445
#define GL_MAP1_BINORMAL_EXT 0x8446
#define GL_MAP2_BINORMAL_EXT 0x8447
#endif

#ifndef GL_EXT_texture_env_combine
#define GL_COMBINE_EXT 0x8570
#define GL_COMBINE_RGB_EXT 0x8571
#define GL_COMBINE_ALPHA_EXT 0x8572
#define GL_RGB_SCALE_EXT 0x8573
#define GL_ADD_SIGNED_EXT 0x8574
#define GL_INTERPOLATE_EXT 0x8575
#define GL_CONSTANT_EXT 0x8576
#define GL_PRIMARY_COLOR_EXT 0x8577
#define GL_PREVIOUS_EXT 0x8578
#define GL_SOURCE0_RGB_EXT 0x8580
#define GL_SOURCE1_RGB_EXT 0x8581
#define GL_SOURCE2_RGB_EXT 0x8582
#define GL_SOURCE0_ALPHA_EXT 0x8588
#define GL_SOURCE1_ALPHA_EXT 0x8589
#define GL_SOURCE2_ALPHA_EXT 0x858A
#define GL_OPERAND0_RGB_EXT 0x8590
#define GL_OPERAND1_RGB_EXT 0x8591
#define GL_OPERAND2_RGB_EXT 0x8592
#define GL_OPERAND0_ALPHA_EXT 0x8598
#define GL_OPERAND1_ALPHA_EXT 0x8599
#define GL_OPERAND2_ALPHA_EXT 0x859A
#endif

#ifndef GL_APPLE_specular_vector
#define GL_LIGHT_MODEL_SPECULAR_VECTOR_APPLE 0x85B0
#endif

#ifndef GL_APPLE_transform_hint
#define GL_TRANSFORM_HINT_APPLE 0x85B1
#endif

#ifndef GL_SGIX_fog_scale
#define GL_FOG_SCALE_SGIX 0x81FC
#define GL_FOG_SCALE_VALUE_SGIX 0x81FD
#endif

#ifndef GL_SUNX_constant_data
#define GL_UNPACK_CONSTANT_DATA_SUNX 0x81D5
#define GL_TEXTURE_CONSTANT_DATA_SUNX 0x81D6
#endif

#ifndef GL_SUN_global_alpha
#define GL_GLOBAL_ALPHA_SUN 0x81D9
#define GL_GLOBAL_ALPHA_FACTOR_SUN 0x81DA
#endif

#ifndef GL_SUN_triangle_list
#define GL_RESTART_SUN 0x0001
#define GL_REPLACE_MIDDLE_SUN 0x0002
#define GL_REPLACE_OLDEST_SUN 0x0003
#define GL_TRIANGLE_LIST_SUN 0x81D7
#define GL_REPLACEMENT_CODE_SUN 0x81D8
#define GL_REPLACEMENT_CODE_ARRAY_SUN 0x85C0
#define GL_REPLACEMENT_CODE_ARRAY_TYPE_SUN 0x85C1
#define GL_REPLACEMENT_CODE_ARRAY_STRIDE_SUN 0x85C2
#define GL_REPLACEMENT_CODE_ARRAY_POINTER_SUN 0x85C3
#define GL_R1UI_V3F_SUN 0x85C4
#define GL_R1UI_C4UB_V3F_SUN 0x85C5
#define GL_R1UI_C3F_V3F_SUN 0x85C6
#define GL_R1UI_N3F_V3F_SUN 0x85C7
#define GL_R1UI_C4F_N3F_V3F_SUN 0x85C8
#define GL_R1UI_T2F_V3F_SUN 0x85C9
#define GL_R1UI_T2F_N3F_V3F_SUN 0x85CA
#define GL_R1UI_T2F_C4F_N3F_V3F_SUN 0x85CB
#endif

#ifndef GL_SUN_vertex
#endif

#ifndef GL_EXT_blend_func_separate
#define GL_BLEND_DST_RGB_EXT 0x80C8
#define GL_BLEND_SRC_RGB_EXT 0x80C9
#define GL_BLEND_DST_ALPHA_EXT 0x80CA
#define GL_BLEND_SRC_ALPHA_EXT 0x80CB
#endif

#ifndef GL_INGR_color_clamp
#define GL_RED_MIN_CLAMP_INGR 0x8560
#define GL_GREEN_MIN_CLAMP_INGR 0x8561
#define GL_BLUE_MIN_CLAMP_INGR 0x8562
#define GL_ALPHA_MIN_CLAMP_INGR 0x8563
#define GL_RED_MAX_CLAMP_INGR 0x8564
#define GL_GREEN_MAX_CLAMP_INGR 0x8565
#define GL_BLUE_MAX_CLAMP_INGR 0x8566
#define GL_ALPHA_MAX_CLAMP_INGR 0x8567
#endif

#ifndef GL_INGR_interlace_read
#define GL_INTERLACE_READ_INGR 0x8568
#endif

#ifndef GL_EXT_stencil_wrap
#define GL_INCR_WRAP_EXT 0x8507
#define GL_DECR_WRAP_EXT 0x8508
#endif

#ifndef GL_EXT_422_pixels
#define GL_422_EXT 0x80CC
#define GL_422_REV_EXT 0x80CD
#define GL_422_AVERAGE_EXT 0x80CE
#define GL_422_REV_AVERAGE_EXT 0x80CF
#endif

#ifndef GL_NV_texgen_reflection
#define GL_NORMAL_MAP_NV 0x8511
#define GL_REFLECTION_MAP_NV 0x8512
#endif

#ifndef GL_EXT_texture_cube_map
#define GL_NORMAL_MAP_EXT 0x8511
#define GL_REFLECTION_MAP_EXT 0x8512
#define GL_TEXTURE_CUBE_MAP_EXT 0x8513
#define GL_TEXTURE_BINDING_CUBE_MAP_EXT 0x8514
#define GL_TEXTURE_CUBE_MAP_POSITIVE_X_EXT 0x8515
#define GL_TEXTURE_CUBE_MAP_NEGATIVE_X_EXT 0x8516
#define GL_TEXTURE_CUBE_MAP_POSITIVE_Y_EXT 0x8517
#define GL_TEXTURE_CUBE_MAP_NEGATIVE_Y_EXT 0x8518
#define GL_TEXTURE_CUBE_MAP_POSITIVE_Z_EXT 0x8519
#define GL_TEXTURE_CUBE_MAP_NEGATIVE_Z_EXT 0x851A
#define GL_PROXY_TEXTURE_CUBE_MAP_EXT 0x851B
#define GL_MAX_CUBE_MAP_TEXTURE_SIZE_EXT 0x851C
#endif

#ifndef GL_SUN_convolution_border_modes
#define GL_WRAP_BORDER_SUN 0x81D4
#endif

#ifndef GL_EXT_texture_env_add
#endif

#ifndef GL_EXT_texture_lod_bias
#define GL_MAX_TEXTURE_LOD_BIAS_EXT 0x84FD
#define GL_TEXTURE_FILTER_CONTROL_EXT 0x8500
#define GL_TEXTURE_LOD_BIAS_EXT 0x8501
#endif

#ifndef GL_EXT_texture_filter_anisotropic
#define GL_TEXTURE_MAX_ANISOTROPY_EXT 0x84FE
#define GL_MAX_TEXTURE_MAX_ANISOTROPY_EXT 0x84FF
#endif

#ifndef GL_EXT_vertex_weighting
#define GL_MODELVIEW0_STACK_DEPTH_EXT GL_MODELVIEW_STACK_DEPTH
#define GL_MODELVIEW1_STACK_DEPTH_EXT 0x8502
#define GL_MODELVIEW0_MATRIX_EXT GL_MODELVIEW_MATRIX
#define GL_MODELVIEW1_MATRIX_EXT 0x8506
#define GL_VERTEX_WEIGHTING_EXT 0x8509
#define GL_MODELVIEW0_EXT GL_MODELVIEW
#define GL_MODELVIEW1_EXT 0x850A
#define GL_CURRENT_VERTEX_WEIGHT_EXT 0x850B
#define GL_VERTEX_WEIGHT_ARRAY_EXT 0x850C
#define GL_VERTEX_WEIGHT_ARRAY_SIZE_EXT 0x850D
#define GL_VERTEX_WEIGHT_ARRAY_TYPE_EXT 0x850E
#define GL_VERTEX_WEIGHT_ARRAY_STRIDE_EXT 0x850F
#define GL_VERTEX_WEIGHT_ARRAY_POINTER_EXT 0x8510
#endif

#ifndef GL_NV_light_max_exponent
#define GL_MAX_SHININESS_NV 0x8504
#define GL_MAX_SPOT_EXPONENT_NV 0x8505
#endif

#ifndef GL_NV_vertex_array_range
#define GL_VERTEX_ARRAY_RANGE_NV 0x851D
#define GL_VERTEX_ARRAY_RANGE_LENGTH_NV 0x851E
#define GL_VERTEX_ARRAY_RANGE_VALID_NV 0x851F
#define GL_MAX_VERTEX_ARRAY_RANGE_ELEMENT_NV 0x8520
#define GL_VERTEX_ARRAY_RANGE_POINTER_NV 0x8521
#endif

#ifndef GL_NV_register_combiners
#define GL_REGISTER_COMBINERS_NV 0x8522
#define GL_VARIABLE_A_NV 0x8523
#define GL_VARIABLE_B_NV 0x8524
#define GL_VARIABLE_C_NV 0x8525
#define GL_VARIABLE_D_NV 0x8526
#define GL_VARIABLE_E_NV 0x8527
#define GL_VARIABLE_F_NV 0x8528
#define GL_VARIABLE_G_NV 0x8529
#define GL_CONSTANT_COLOR0_NV 0x852A
#define GL_CONSTANT_COLOR1_NV 0x852B
#define GL_PRIMARY_COLOR_NV 0x852C
#define GL_SECONDARY_COLOR_NV 0x852D
#define GL_SPARE0_NV 0x852E
#define GL_SPARE1_NV 0x852F
#define GL_DISCARD_NV 0x8530
#define GL_E_TIMES_F_NV 0x8531
#define GL_SPARE0_PLUS_SECONDARY_COLOR_NV 0x8532
#define GL_UNSIGNED_IDENTITY_NV 0x8536
#define GL_UNSIGNED_INVERT_NV 0x8537
#define GL_EXPAND_NORMAL_NV 0x8538
#define GL_EXPAND_NEGATE_NV 0x8539
#define GL_HALF_BIAS_NORMAL_NV 0x853A
#define GL_HALF_BIAS_NEGATE_NV 0x853B
#define GL_SIGNED_IDENTITY_NV 0x853C
#define GL_SIGNED_NEGATE_NV 0x853D
#define GL_SCALE_BY_TWO_NV 0x853E
#define GL_SCALE_BY_FOUR_NV 0x853F
#define GL_SCALE_BY_ONE_HALF_NV 0x8540
#define GL_BIAS_BY_NEGATIVE_ONE_HALF_NV 0x8541
#define GL_COMBINER_INPUT_NV 0x8542
#define GL_COMBINER_MAPPING_NV 0x8543
#define GL_COMBINER_COMPONENT_USAGE_NV 0x8544
#define GL_COMBINER_AB_DOT_PRODUCT_NV 0x8545
#define GL_COMBINER_CD_DOT_PRODUCT_NV 0x8546
#define GL_COMBINER_MUX_SUM_NV 0x8547
#define GL_COMBINER_SCALE_NV 0x8548
#define GL_COMBINER_BIAS_NV 0x8549
#define GL_COMBINER_AB_OUTPUT_NV 0x854A
#define GL_COMBINER_CD_OUTPUT_NV 0x854B
#define GL_COMBINER_SUM_OUTPUT_NV 0x854C
#define GL_MAX_GENERAL_COMBINERS_NV 0x854D
#define GL_NUM_GENERAL_COMBINERS_NV 0x854E
#define GL_COLOR_SUM_CLAMP_NV 0x854F
#define GL_COMBINER0_NV 0x8550
#define GL_COMBINER1_NV 0x8551
#define GL_COMBINER2_NV 0x8552
#define GL_COMBINER3_NV 0x8553
#define GL_COMBINER4_NV 0x8554
#define GL_COMBINER5_NV 0x8555
#define GL_COMBINER6_NV 0x8556
#define GL_COMBINER7_NV 0x8557
/* reuse GL_TEXTURE0_ARB */
/* reuse GL_TEXTURE1_ARB */
/* reuse GL_ZERO */
/* reuse GL_NONE */
/* reuse GL_FOG */
#endif

#ifndef GL_NV_fog_distance
#define GL_FOG_DISTANCE_MODE_NV 0x855A
#define GL_EYE_RADIAL_NV 0x855B
#define GL_EYE_PLANE_ABSOLUTE_NV 0x855C
/* reuse GL_EYE_PLANE */
#endif

#ifndef GL_NV_texgen_emboss
#define GL_EMBOSS_LIGHT_NV 0x855D
#define GL_EMBOSS_CONSTANT_NV 0x855E
#define GL_EMBOSS_MAP_NV 0x855F
#endif

#ifndef GL_NV_blend_square
#endif

#ifndef GL_NV_texture_env_combine4
#define GL_COMBINE4_NV 0x8503
#define GL_SOURCE3_RGB_NV 0x8583
#define GL_SOURCE3_ALPHA_NV 0x858B
#define GL_OPERAND3_RGB_NV 0x8593
#define GL_OPERAND3_ALPHA_NV 0x859B
#endif

#ifndef GL_MESA_resize_buffers
#endif

#ifndef GL_MESA_window_pos
#endif

#ifndef GL_EXT_texture_compression_s3tc
#define GL_COMPRESSED_RGB_S3TC_DXT1_EXT 0x83F0
#define GL_COMPRESSED_RGBA_S3TC_DXT1_EXT 0x83F1
#define GL_COMPRESSED_RGBA_S3TC_DXT3_EXT 0x83F2
#define GL_COMPRESSED_RGBA_S3TC_DXT5_EXT 0x83F3
#endif
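/*
 * Usage sketch (illustrative, not part of the extension text): uploading one
 * pre-compressed DXT5 mip level with glCompressedTexImage2D, which is core
 * since OpenGL 1.3 and pairs with the GL_EXT_texture_compression_s3tc
 * formats above. DXT5 stores each 4x4 texel block in 16 bytes (DXT1 uses 8);
 * width, height and data are assumed to be supplied by the caller.
 *
 *   GLsizei blocks = ((width + 3) / 4) * ((height + 3) / 4);
 *   glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,
 *                          width, height, 0, blocks * 16, data);
 */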
#ifndef GL_IBM_cull_vertex
#define GL_CULL_VERTEX_IBM 103050
#endif

#ifndef GL_IBM_multimode_draw_arrays
#endif

#ifndef GL_IBM_vertex_array_lists
#define GL_VERTEX_ARRAY_LIST_IBM 103070
#define GL_NORMAL_ARRAY_LIST_IBM 103071
#define GL_COLOR_ARRAY_LIST_IBM 103072
#define GL_INDEX_ARRAY_LIST_IBM 103073
#define GL_TEXTURE_COORD_ARRAY_LIST_IBM 103074
#define GL_EDGE_FLAG_ARRAY_LIST_IBM 103075
#define GL_FOG_COORDINATE_ARRAY_LIST_IBM 103076
#define GL_SECONDARY_COLOR_ARRAY_LIST_IBM 103077
#define GL_VERTEX_ARRAY_LIST_STRIDE_IBM 103080
#define GL_NORMAL_ARRAY_LIST_STRIDE_IBM 103081
#define GL_COLOR_ARRAY_LIST_STRIDE_IBM 103082
#define GL_INDEX_ARRAY_LIST_STRIDE_IBM 103083
#define GL_TEXTURE_COORD_ARRAY_LIST_STRIDE_IBM 103084
#define GL_EDGE_FLAG_ARRAY_LIST_STRIDE_IBM 103085
#define GL_FOG_COORDINATE_ARRAY_LIST_STRIDE_IBM 103086
#define GL_SECONDARY_COLOR_ARRAY_LIST_STRIDE_IBM 103087
#endif

#ifndef GL_SGIX_subsample
#define GL_PACK_SUBSAMPLE_RATE_SGIX 0x85A0
#define GL_UNPACK_SUBSAMPLE_RATE_SGIX 0x85A1
#define GL_PIXEL_SUBSAMPLE_4444_SGIX 0x85A2
#define GL_PIXEL_SUBSAMPLE_2424_SGIX 0x85A3
#define GL_PIXEL_SUBSAMPLE_4242_SGIX 0x85A4
#endif

#ifndef GL_SGIX_ycrcb_subsample
#endif

#ifndef GL_SGIX_ycrcba
#define GL_YCRCB_SGIX 0x8318
#define GL_YCRCBA_SGIX 0x8319
#endif

#ifndef GL_SGI_depth_pass_instrument
#define GL_DEPTH_PASS_INSTRUMENT_SGIX 0x8310
#define GL_DEPTH_PASS_INSTRUMENT_COUNTERS_SGIX 0x8311
#define GL_DEPTH_PASS_INSTRUMENT_MAX_SGIX 0x8312
#endif

#ifndef GL_3DFX_texture_compression_FXT1
#define GL_COMPRESSED_RGB_FXT1_3DFX 0x86B0
#define GL_COMPRESSED_RGBA_FXT1_3DFX 0x86B1
#endif

#ifndef GL_3DFX_multisample
#define GL_MULTISAMPLE_3DFX 0x86B2
#define GL_SAMPLE_BUFFERS_3DFX 0x86B3
#define GL_SAMPLES_3DFX 0x86B4
#define GL_MULTISAMPLE_BIT_3DFX 0x20000000
#endif

#ifndef GL_3DFX_tbuffer
#endif

#ifndef GL_EXT_multisample
#define GL_MULTISAMPLE_EXT 0x809D
#define GL_SAMPLE_ALPHA_TO_MASK_EXT 0x809E
#define GL_SAMPLE_ALPHA_TO_ONE_EXT 0x809F
#define GL_SAMPLE_MASK_EXT 0x80A0
#define GL_1PASS_EXT 0x80A1
#define GL_2PASS_0_EXT 0x80A2
#define GL_2PASS_1_EXT 0x80A3
#define GL_4PASS_0_EXT 0x80A4
#define GL_4PASS_1_EXT 0x80A5
#define GL_4PASS_2_EXT 0x80A6
#define GL_4PASS_3_EXT 0x80A7
#define GL_SAMPLE_BUFFERS_EXT 0x80A8
#define GL_SAMPLES_EXT 0x80A9
#define GL_SAMPLE_MASK_VALUE_EXT 0x80AA
#define GL_SAMPLE_MASK_INVERT_EXT 0x80AB
#define GL_SAMPLE_PATTERN_EXT 0x80AC
#define GL_MULTISAMPLE_BIT_EXT 0x20000000
#endif

#ifndef GL_SGIX_vertex_preclip
#define GL_VERTEX_PRECLIP_SGIX 0x83EE
#define GL_VERTEX_PRECLIP_HINT_SGIX 0x83EF
#endif

#ifndef GL_SGIX_convolution_accuracy
#define GL_CONVOLUTION_HINT_SGIX 0x8316
#endif

#ifndef GL_SGIX_resample
#define GL_PACK_RESAMPLE_SGIX 0x842C
#define GL_UNPACK_RESAMPLE_SGIX 0x842D
#define GL_RESAMPLE_REPLICATE_SGIX 0x842E
#define GL_RESAMPLE_ZERO_FILL_SGIX 0x842F
#define GL_RESAMPLE_DECIMATE_SGIX 0x8430
#endif

#ifndef GL_SGIS_point_line_texgen
#define GL_EYE_DISTANCE_TO_POINT_SGIS 0x81F0
#define GL_OBJECT_DISTANCE_TO_POINT_SGIS 0x81F1
#define GL_EYE_DISTANCE_TO_LINE_SGIS 0x81F2
#define GL_OBJECT_DISTANCE_TO_LINE_SGIS 0x81F3
#define GL_EYE_POINT_SGIS 0x81F4
#define GL_OBJECT_POINT_SGIS 0x81F5
#define GL_EYE_LINE_SGIS 0x81F6
#define GL_OBJECT_LINE_SGIS 0x81F7
#endif

#ifndef GL_SGIS_texture_color_mask
#define GL_TEXTURE_COLOR_WRITEMASK_SGIS 0x81EF
#endif

#ifndef GL_EXT_texture_env_dot3
#define GL_DOT3_RGB_EXT 0x8740
#define GL_DOT3_RGBA_EXT 0x8741
#endif

#ifndef GL_ATI_texture_mirror_once
#define GL_MIRROR_CLAMP_ATI 0x8742
#define GL_MIRROR_CLAMP_TO_EDGE_ATI 0x8743
#endif

#ifndef GL_NV_fence
#define GL_ALL_COMPLETED_NV 0x84F2
#define GL_FENCE_STATUS_NV 0x84F3
#define GL_FENCE_CONDITION_NV 0x84F4
#endif

#ifndef GL_IBM_texture_mirrored_repeat
#define GL_MIRRORED_REPEAT_IBM 0x8370
#endif

#ifndef GL_NV_evaluators
#define GL_EVAL_2D_NV 0x86C0
#define GL_EVAL_TRIANGULAR_2D_NV 0x86C1
#define GL_MAP_TESSELLATION_NV 0x86C2
#define GL_MAP_ATTRIB_U_ORDER_NV 0x86C3
#define GL_MAP_ATTRIB_V_ORDER_NV 0x86C4
#define GL_EVAL_FRACTIONAL_TESSELLATION_NV 0x86C5
#define GL_EVAL_VERTEX_ATTRIB0_NV 0x86C6
#define GL_EVAL_VERTEX_ATTRIB1_NV 0x86C7
#define GL_EVAL_VERTEX_ATTRIB2_NV 0x86C8
#define GL_EVAL_VERTEX_ATTRIB3_NV 0x86C9
#define GL_EVAL_VERTEX_ATTRIB4_NV 0x86CA
#define GL_EVAL_VERTEX_ATTRIB5_NV 0x86CB
#define GL_EVAL_VERTEX_ATTRIB6_NV 0x86CC
#define GL_EVAL_VERTEX_ATTRIB7_NV 0x86CD
#define GL_EVAL_VERTEX_ATTRIB8_NV 0x86CE
#define GL_EVAL_VERTEX_ATTRIB9_NV 0x86CF
#define GL_EVAL_VERTEX_ATTRIB10_NV 0x86D0
#define GL_EVAL_VERTEX_ATTRIB11_NV 0x86D1
#define GL_EVAL_VERTEX_ATTRIB12_NV 0x86D2
#define GL_EVAL_VERTEX_ATTRIB13_NV 0x86D3
#define GL_EVAL_VERTEX_ATTRIB14_NV 0x86D4
#define GL_EVAL_VERTEX_ATTRIB15_NV 0x86D5
#define GL_MAX_MAP_TESSELLATION_NV 0x86D6
#define GL_MAX_RATIONAL_EVAL_ORDER_NV 0x86D7
#endif

#ifndef GL_NV_packed_depth_stencil
#define GL_DEPTH_STENCIL_NV 0x84F9
#define GL_UNSIGNED_INT_24_8_NV 0x84FA
#endif

#ifndef GL_NV_register_combiners2
#define GL_PER_STAGE_CONSTANTS_NV 0x8535
#endif

#ifndef GL_NV_texture_compression_vtc
#endif

#ifndef GL_NV_texture_rectangle
#define GL_TEXTURE_RECTANGLE_NV 0x84F5
#define GL_TEXTURE_BINDING_RECTANGLE_NV 0x84F6
#define GL_PROXY_TEXTURE_RECTANGLE_NV 0x84F7
#define GL_MAX_RECTANGLE_TEXTURE_SIZE_NV 0x84F8
#endif

#ifndef GL_NV_texture_shader
#define GL_OFFSET_TEXTURE_RECTANGLE_NV 0x864C
#define GL_OFFSET_TEXTURE_RECTANGLE_SCALE_NV 0x864D
#define GL_DOT_PRODUCT_TEXTURE_RECTANGLE_NV 0x864E
#define GL_RGBA_UNSIGNED_DOT_PRODUCT_MAPPING_NV 0x86D9
#define GL_UNSIGNED_INT_S8_S8_8_8_NV 0x86DA
#define GL_UNSIGNED_INT_8_8_S8_S8_REV_NV 0x86DB
#define GL_DSDT_MAG_INTENSITY_NV 0x86DC
#define GL_SHADER_CONSISTENT_NV 0x86DD
#define GL_TEXTURE_SHADER_NV 0x86DE
#define GL_SHADER_OPERATION_NV 0x86DF
#define GL_CULL_MODES_NV 0x86E0
#define GL_OFFSET_TEXTURE_MATRIX_NV 0x86E1
#define GL_OFFSET_TEXTURE_SCALE_NV 0x86E2
#define GL_OFFSET_TEXTURE_BIAS_NV 0x86E3
#define GL_OFFSET_TEXTURE_2D_MATRIX_NV GL_OFFSET_TEXTURE_MATRIX_NV
#define GL_OFFSET_TEXTURE_2D_SCALE_NV GL_OFFSET_TEXTURE_SCALE_NV
#define GL_OFFSET_TEXTURE_2D_BIAS_NV GL_OFFSET_TEXTURE_BIAS_NV
#define GL_PREVIOUS_TEXTURE_INPUT_NV 0x86E4
#define GL_CONST_EYE_NV 0x86E5
#define GL_PASS_THROUGH_NV 0x86E6
#define GL_CULL_FRAGMENT_NV 0x86E7
#define GL_OFFSET_TEXTURE_2D_NV 0x86E8
#define GL_DEPENDENT_AR_TEXTURE_2D_NV 0x86E9
#define GL_DEPENDENT_GB_TEXTURE_2D_NV 0x86EA
#define GL_DOT_PRODUCT_NV 0x86EC
#define GL_DOT_PRODUCT_DEPTH_REPLACE_NV 0x86ED
#define GL_DOT_PRODUCT_TEXTURE_2D_NV 0x86EE
#define GL_DOT_PRODUCT_TEXTURE_CUBE_MAP_NV 0x86F0
#define GL_DOT_PRODUCT_DIFFUSE_CUBE_MAP_NV 0x86F1
#define GL_DOT_PRODUCT_REFLECT_CUBE_MAP_NV 0x86F2
#define GL_DOT_PRODUCT_CONST_EYE_REFLECT_CUBE_MAP_NV 0x86F3
#define GL_HILO_NV 0x86F4
#define GL_DSDT_NV 0x86F5
#define GL_DSDT_MAG_NV 0x86F6
#define GL_DSDT_MAG_VIB_NV 0x86F7
#define GL_HILO16_NV 0x86F8
#define GL_SIGNED_HILO_NV 0x86F9
#define GL_SIGNED_HILO16_NV 0x86FA
#define GL_SIGNED_RGBA_NV 0x86FB
#define GL_SIGNED_RGBA8_NV 0x86FC
#define GL_SIGNED_RGB_NV 0x86FE
#define GL_SIGNED_RGB8_NV 0x86FF
#define GL_SIGNED_LUMINANCE_NV 0x8701
#define GL_SIGNED_LUMINANCE8_NV 0x8702
#define GL_SIGNED_LUMINANCE_ALPHA_NV 0x8703
#define GL_SIGNED_LUMINANCE8_ALPHA8_NV 0x8704
#define GL_SIGNED_ALPHA_NV 0x8705
#define GL_SIGNED_ALPHA8_NV 0x8706
#define GL_SIGNED_INTENSITY_NV 0x8707
#define GL_SIGNED_INTENSITY8_NV 0x8708
#define GL_DSDT8_NV 0x8709
#define GL_DSDT8_MAG8_NV 0x870A
#define GL_DSDT8_MAG8_INTENSITY8_NV 0x870B
#define GL_SIGNED_RGB_UNSIGNED_ALPHA_NV 0x870C
#define GL_SIGNED_RGB8_UNSIGNED_ALPHA8_NV 0x870D
#define GL_HI_SCALE_NV 0x870E
#define GL_LO_SCALE_NV 0x870F
#define GL_DS_SCALE_NV 0x8710
#define GL_DT_SCALE_NV 0x8711
#define GL_MAGNITUDE_SCALE_NV 0x8712
#define GL_VIBRANCE_SCALE_NV 0x8713
#define GL_HI_BIAS_NV 0x8714
#define GL_LO_BIAS_NV 0x8715
#define GL_DS_BIAS_NV 0x8716
#define GL_DT_BIAS_NV 0x8717
#define GL_MAGNITUDE_BIAS_NV 0x8718
#define GL_VIBRANCE_BIAS_NV 0x8719
#define GL_TEXTURE_BORDER_VALUES_NV 0x871A
#define GL_TEXTURE_HI_SIZE_NV 0x871B
#define GL_TEXTURE_LO_SIZE_NV 0x871C
#define GL_TEXTURE_DS_SIZE_NV 0x871D
#define GL_TEXTURE_DT_SIZE_NV 0x871E
#define GL_TEXTURE_MAG_SIZE_NV 0x871F
#endif

#ifndef GL_NV_texture_shader2
#define GL_DOT_PRODUCT_TEXTURE_3D_NV 0x86EF
#endif

#ifndef GL_NV_vertex_array_range2
#define GL_VERTEX_ARRAY_RANGE_WITHOUT_FLUSH_NV 0x8533
#endif

#ifndef GL_NV_vertex_program
#define GL_VERTEX_PROGRAM_NV 0x8620
#define GL_VERTEX_STATE_PROGRAM_NV 0x8621
#define GL_ATTRIB_ARRAY_SIZE_NV 0x8623
#define GL_ATTRIB_ARRAY_STRIDE_NV 0x8624
#define GL_ATTRIB_ARRAY_TYPE_NV 0x8625
#define GL_CURRENT_ATTRIB_NV 0x8626
#define GL_PROGRAM_LENGTH_NV 0x8627
#define GL_PROGRAM_STRING_NV 0x8628
#define GL_MODELVIEW_PROJECTION_NV 0x8629
#define GL_IDENTITY_NV 0x862A
#define GL_INVERSE_NV 0x862B
#define GL_TRANSPOSE_NV 0x862C
#define GL_INVERSE_TRANSPOSE_NV 0x862D
#define GL_MAX_TRACK_MATRIX_STACK_DEPTH_NV 0x862E
#define GL_MAX_TRACK_MATRICES_NV 0x862F
#define GL_MATRIX0_NV 0x8630
#define GL_MATRIX1_NV 0x8631
#define GL_MATRIX2_NV 0x8632
#define GL_MATRIX3_NV 0x8633
#define GL_MATRIX4_NV 0x8634
#define GL_MATRIX5_NV 0x8635
#define GL_MATRIX6_NV 0x8636
#define GL_MATRIX7_NV 0x8637
#define GL_CURRENT_MATRIX_STACK_DEPTH_NV 0x8640
#define GL_CURRENT_MATRIX_NV 0x8641
#define GL_VERTEX_PROGRAM_POINT_SIZE_NV 0x8642
#define GL_VERTEX_PROGRAM_TWO_SIDE_NV 0x8643
#define GL_PROGRAM_PARAMETER_NV 0x8644
#define GL_ATTRIB_ARRAY_POINTER_NV 0x8645
#define GL_PROGRAM_TARGET_NV 0x8646
#define GL_PROGRAM_RESIDENT_NV 0x8647
#define GL_TRACK_MATRIX_NV 0x8648
#define GL_TRACK_MATRIX_TRANSFORM_NV 0x8649
#define GL_VERTEX_PROGRAM_BINDING_NV 0x864A
#define GL_PROGRAM_ERROR_POSITION_NV 0x864B
#define GL_VERTEX_ATTRIB_ARRAY0_NV 0x8650
#define GL_VERTEX_ATTRIB_ARRAY1_NV 0x8651
#define GL_VERTEX_ATTRIB_ARRAY2_NV 0x8652
#define GL_VERTEX_ATTRIB_ARRAY3_NV 0x8653
#define GL_VERTEX_ATTRIB_ARRAY4_NV 0x8654
#define GL_VERTEX_ATTRIB_ARRAY5_NV 0x8655
#define GL_VERTEX_ATTRIB_ARRAY6_NV 0x8656
#define GL_VERTEX_ATTRIB_ARRAY7_NV 0x8657
#define GL_VERTEX_ATTRIB_ARRAY8_NV 0x8658
#define GL_VERTEX_ATTRIB_ARRAY9_NV 0x8659
#define GL_VERTEX_ATTRIB_ARRAY10_NV 0x865A
#define GL_VERTEX_ATTRIB_ARRAY11_NV 0x865B
#define GL_VERTEX_ATTRIB_ARRAY12_NV 0x865C
#define GL_VERTEX_ATTRIB_ARRAY13_NV 0x865D
#define GL_VERTEX_ATTRIB_ARRAY14_NV 0x865E
#define GL_VERTEX_ATTRIB_ARRAY15_NV 0x865F
#define GL_MAP1_VERTEX_ATTRIB0_4_NV 0x8660
#define GL_MAP1_VERTEX_ATTRIB1_4_NV 0x8661
#define GL_MAP1_VERTEX_ATTRIB2_4_NV 0x8662
#define GL_MAP1_VERTEX_ATTRIB3_4_NV 0x8663
#define GL_MAP1_VERTEX_ATTRIB4_4_NV 0x8664
#define GL_MAP1_VERTEX_ATTRIB5_4_NV 0x8665
#define GL_MAP1_VERTEX_ATTRIB6_4_NV 0x8666
#define GL_MAP1_VERTEX_ATTRIB7_4_NV 0x8667
#define GL_MAP1_VERTEX_ATTRIB8_4_NV 0x8668
#define GL_MAP1_VERTEX_ATTRIB9_4_NV 0x8669
#define GL_MAP1_VERTEX_ATTRIB10_4_NV 0x866A
#define GL_MAP1_VERTEX_ATTRIB11_4_NV 0x866B
#define GL_MAP1_VERTEX_ATTRIB12_4_NV 0x866C
#define GL_MAP1_VERTEX_ATTRIB13_4_NV 0x866D
#define GL_MAP1_VERTEX_ATTRIB14_4_NV 0x866E
#define GL_MAP1_VERTEX_ATTRIB15_4_NV 0x866F
#define GL_MAP2_VERTEX_ATTRIB0_4_NV 0x8670
#define GL_MAP2_VERTEX_ATTRIB1_4_NV 0x8671
#define GL_MAP2_VERTEX_ATTRIB2_4_NV 0x8672
#define GL_MAP2_VERTEX_ATTRIB3_4_NV 0x8673
#define GL_MAP2_VERTEX_ATTRIB4_4_NV 0x8674
#define GL_MAP2_VERTEX_ATTRIB5_4_NV 0x8675
#define GL_MAP2_VERTEX_ATTRIB6_4_NV 0x8676
#define GL_MAP2_VERTEX_ATTRIB7_4_NV 0x8677
#define GL_MAP2_VERTEX_ATTRIB8_4_NV 0x8678
#define GL_MAP2_VERTEX_ATTRIB9_4_NV 0x8679
#define GL_MAP2_VERTEX_ATTRIB10_4_NV 0x867A
#define GL_MAP2_VERTEX_ATTRIB11_4_NV 0x867B
#define GL_MAP2_VERTEX_ATTRIB12_4_NV 0x867C
#define GL_MAP2_VERTEX_ATTRIB13_4_NV 0x867D
#define GL_MAP2_VERTEX_ATTRIB14_4_NV 0x867E
#define GL_MAP2_VERTEX_ATTRIB15_4_NV 0x867F
#endif

#ifndef GL_SGIX_texture_coordinate_clamp
#define GL_TEXTURE_MAX_CLAMP_S_SGIX 0x8369
#define GL_TEXTURE_MAX_CLAMP_T_SGIX 0x836A
#define GL_TEXTURE_MAX_CLAMP_R_SGIX 0x836B
#endif

#ifndef GL_SGIX_scalebias_hint
#define GL_SCALEBIAS_HINT_SGIX 0x8322
#endif

#ifndef GL_OML_interlace
#define GL_INTERLACE_OML 0x8980
#define GL_INTERLACE_READ_OML 0x8981
#endif

#ifndef GL_OML_subsample
#define GL_FORMAT_SUBSAMPLE_24_24_OML 0x8982
#define GL_FORMAT_SUBSAMPLE_244_244_OML 0x8983
#endif

#ifndef GL_OML_resample
#define GL_PACK_RESAMPLE_OML 0x8984
#define GL_UNPACK_RESAMPLE_OML 0x8985
#define GL_RESAMPLE_REPLICATE_OML 0x8986
#define GL_RESAMPLE_ZERO_FILL_OML 0x8987
#define GL_RESAMPLE_AVERAGE_OML 0x8988
#define GL_RESAMPLE_DECIMATE_OML 0x8989
#endif

#ifndef GL_NV_copy_depth_to_color
#define GL_DEPTH_STENCIL_TO_RGBA_NV 0x886E
#define GL_DEPTH_STENCIL_TO_BGRA_NV 0x886F
#endif

#ifndef GL_ATI_envmap_bumpmap
#define GL_BUMP_ROT_MATRIX_ATI 0x8775
#define GL_BUMP_ROT_MATRIX_SIZE_ATI 0x8776
#define GL_BUMP_NUM_TEX_UNITS_ATI 0x8777
#define GL_BUMP_TEX_UNITS_ATI 0x8778
#define GL_DUDV_ATI 0x8779
#define GL_DU8DV8_ATI 0x877A
#define GL_BUMP_ENVMAP_ATI 0x877B
#define GL_BUMP_TARGET_ATI 0x877C
#endif

#ifndef GL_ATI_fragment_shader
#define GL_FRAGMENT_SHADER_ATI 0x8920
#define GL_REG_0_ATI 0x8921
#define GL_REG_1_ATI 0x8922
#define GL_REG_2_ATI 0x8923
#define GL_REG_3_ATI 0x8924
#define GL_REG_4_ATI 0x8925
#define GL_REG_5_ATI 0x8926
#define GL_REG_6_ATI 0x8927
#define GL_REG_7_ATI 0x8928
#define GL_REG_8_ATI 0x8929
#define GL_REG_9_ATI 0x892A
#define GL_REG_10_ATI 0x892B
#define GL_REG_11_ATI 0x892C
#define GL_REG_12_ATI 0x892D
#define GL_REG_13_ATI 0x892E
#define GL_REG_14_ATI 0x892F
#define GL_REG_15_ATI 0x8930
#define GL_REG_16_ATI 0x8931
#define GL_REG_17_ATI 0x8932
#define GL_REG_18_ATI 0x8933
#define GL_REG_19_ATI 0x8934
#define GL_REG_20_ATI 0x8935
#define GL_REG_21_ATI 0x8936
#define GL_REG_22_ATI 0x8937
#define GL_REG_23_ATI 0x8938
#define GL_REG_24_ATI 0x8939
#define GL_REG_25_ATI 0x893A
#define GL_REG_26_ATI 0x893B
#define GL_REG_27_ATI 0x893C
#define GL_REG_28_ATI 0x893D
#define GL_REG_29_ATI 0x893E
#define GL_REG_30_ATI 0x893F
#define GL_REG_31_ATI 0x8940
#define GL_CON_0_ATI 0x8941
#define GL_CON_1_ATI 0x8942
#define GL_CON_2_ATI 0x8943
#define GL_CON_3_ATI 0x8944
#define GL_CON_4_ATI 0x8945
#define GL_CON_5_ATI 0x8946
#define GL_CON_6_ATI 0x8947
#define GL_CON_7_ATI 0x8948
#define GL_CON_8_ATI 0x8949
#define GL_CON_9_ATI 0x894A
#define GL_CON_10_ATI 0x894B
#define GL_CON_11_ATI 0x894C
#define GL_CON_12_ATI 0x894D
#define GL_CON_13_ATI 0x894E
#define GL_CON_14_ATI 0x894F
#define GL_CON_15_ATI 0x8950
#define GL_CON_16_ATI 0x8951
#define GL_CON_17_ATI 0x8952
#define GL_CON_18_ATI 0x8953
#define GL_CON_19_ATI 0x8954
#define GL_CON_20_ATI 0x8955
#define GL_CON_21_ATI 0x8956
#define GL_CON_22_ATI 0x8957
#define GL_CON_23_ATI 0x8958
#define GL_CON_24_ATI 0x8959
#define GL_CON_25_ATI 0x895A
#define GL_CON_26_ATI 0x895B
#define GL_CON_27_ATI 0x895C
#define GL_CON_28_ATI 0x895D
#define GL_CON_29_ATI 0x895E
#define GL_CON_30_ATI 0x895F
#define GL_CON_31_ATI 0x8960
#define GL_MOV_ATI 0x8961
#define GL_ADD_ATI 0x8963
#define GL_MUL_ATI 0x8964
#define GL_SUB_ATI 0x8965
#define GL_DOT3_ATI 0x8966
#define GL_DOT4_ATI 0x8967
#define GL_MAD_ATI 0x8968
#define GL_LERP_ATI 0x8969
#define GL_CND_ATI 0x896A
#define GL_CND0_ATI 0x896B
#define GL_DOT2_ADD_ATI 0x896C
#define GL_SECONDARY_INTERPOLATOR_ATI 0x896D
#define GL_NUM_FRAGMENT_REGISTERS_ATI 0x896E
#define GL_NUM_FRAGMENT_CONSTANTS_ATI 0x896F
#define GL_NUM_PASSES_ATI 0x8970
#define GL_NUM_INSTRUCTIONS_PER_PASS_ATI 0x8971
#define GL_NUM_INSTRUCTIONS_TOTAL_ATI 0x8972
#define GL_NUM_INPUT_INTERPOLATOR_COMPONENTS_ATI 0x8973
#define GL_NUM_LOOPBACK_COMPONENTS_ATI 0x8974
#define GL_COLOR_ALPHA_PAIRING_ATI 0x8975
#define GL_SWIZZLE_STR_ATI 0x8976
#define GL_SWIZZLE_STQ_ATI 0x8977
#define GL_SWIZZLE_STR_DR_ATI 0x8978
#define GL_SWIZZLE_STQ_DQ_ATI 0x8979
#define GL_SWIZZLE_STRQ_ATI 0x897A
#define GL_SWIZZLE_STRQ_DQ_ATI 0x897B
#define GL_RED_BIT_ATI 0x00000001
#define GL_GREEN_BIT_ATI 0x00000002
#define GL_BLUE_BIT_ATI 0x00000004
#define GL_2X_BIT_ATI 0x00000001
#define GL_4X_BIT_ATI 0x00000002
#define GL_8X_BIT_ATI 0x00000004
#define GL_HALF_BIT_ATI 0x00000008
#define GL_QUARTER_BIT_ATI 0x00000010
#define GL_EIGHTH_BIT_ATI 0x00000020
#define GL_SATURATE_BIT_ATI 0x00000040
#define GL_COMP_BIT_ATI 0x00000002
#define GL_NEGATE_BIT_ATI 0x00000004
#define GL_BIAS_BIT_ATI 0x00000008
#endif

#ifndef GL_ATI_pn_triangles
#define GL_PN_TRIANGLES_ATI 0x87F0
#define GL_MAX_PN_TRIANGLES_TESSELATION_LEVEL_ATI 0x87F1
#define GL_PN_TRIANGLES_POINT_MODE_ATI 0x87F2
#define GL_PN_TRIANGLES_NORMAL_MODE_ATI 0x87F3
#define GL_PN_TRIANGLES_TESSELATION_LEVEL_ATI 0x87F4
#define GL_PN_TRIANGLES_POINT_MODE_LINEAR_ATI 0x87F5
#define GL_PN_TRIANGLES_POINT_MODE_CUBIC_ATI 0x87F6
#define GL_PN_TRIANGLES_NORMAL_MODE_LINEAR_ATI 0x87F7
#define GL_PN_TRIANGLES_NORMAL_MODE_QUADRATIC_ATI 0x87F8
#endif

#ifndef GL_ATI_vertex_array_object
#define GL_STATIC_ATI 0x8760
#define GL_DYNAMIC_ATI 0x8761
#define GL_PRESERVE_ATI 0x8762
#define GL_DISCARD_ATI 0x8763
#define GL_OBJECT_BUFFER_SIZE_ATI 0x8764
#define GL_OBJECT_BUFFER_USAGE_ATI 0x8765
#define GL_ARRAY_OBJECT_BUFFER_ATI 0x8766
#define GL_ARRAY_OBJECT_OFFSET_ATI 0x8767
#endif

#ifndef GL_EXT_vertex_shader
#define GL_VERTEX_SHADER_EXT 0x8780
#define GL_VERTEX_SHADER_BINDING_EXT 0x8781
#define GL_OP_INDEX_EXT 0x8782
#define GL_OP_NEGATE_EXT 0x8783
#define GL_OP_DOT3_EXT 0x8784
#define GL_OP_DOT4_EXT 0x8785
#define GL_OP_MUL_EXT 0x8786
#define GL_OP_ADD_EXT 0x8787
#define GL_OP_MADD_EXT 0x8788
#define GL_OP_FRAC_EXT 0x8789
#define GL_OP_MAX_EXT 0x878A
#define GL_OP_MIN_EXT 0x878B
#define GL_OP_SET_GE_EXT 0x878C
#define GL_OP_SET_LT_EXT 0x878D
#define GL_OP_CLAMP_EXT 0x878E
#define GL_OP_FLOOR_EXT 0x878F
#define GL_OP_ROUND_EXT 0x8790
#define GL_OP_EXP_BASE_2_EXT 0x8791
#define GL_OP_LOG_BASE_2_EXT 0x8792
#define GL_OP_POWER_EXT 0x8793
#define GL_OP_RECIP_EXT 0x8794
#define GL_OP_RECIP_SQRT_EXT 0x8795
#define GL_OP_SUB_EXT 0x8796
#define GL_OP_CROSS_PRODUCT_EXT 0x8797
#define GL_OP_MULTIPLY_MATRIX_EXT 0x8798
#define GL_OP_MOV_EXT 0x8799
#define GL_OUTPUT_VERTEX_EXT 0x879A
#define GL_OUTPUT_COLOR0_EXT 0x879B
#define GL_OUTPUT_COLOR1_EXT 0x879C
#define GL_OUTPUT_TEXTURE_COORD0_EXT 0x879D
#define GL_OUTPUT_TEXTURE_COORD1_EXT 0x879E
#define GL_OUTPUT_TEXTURE_COORD2_EXT 0x879F
#define GL_OUTPUT_TEXTURE_COORD3_EXT 0x87A0
#define GL_OUTPUT_TEXTURE_COORD4_EXT 0x87A1
#define GL_OUTPUT_TEXTURE_COORD5_EXT 0x87A2
#define GL_OUTPUT_TEXTURE_COORD6_EXT 0x87A3
#define GL_OUTPUT_TEXTURE_COORD7_EXT 0x87A4
#define GL_OUTPUT_TEXTURE_COORD8_EXT 0x87A5
#define GL_OUTPUT_TEXTURE_COORD9_EXT 0x87A6
#define GL_OUTPUT_TEXTURE_COORD10_EXT 0x87A7
#define GL_OUTPUT_TEXTURE_COORD11_EXT 0x87A8
#define GL_OUTPUT_TEXTURE_COORD12_EXT 0x87A9
#define GL_OUTPUT_TEXTURE_COORD13_EXT 0x87AA
#define GL_OUTPUT_TEXTURE_COORD14_EXT 0x87AB
#define GL_OUTPUT_TEXTURE_COORD15_EXT 0x87AC
#define GL_OUTPUT_TEXTURE_COORD16_EXT 0x87AD
#define GL_OUTPUT_TEXTURE_COORD17_EXT 0x87AE
#define GL_OUTPUT_TEXTURE_COORD18_EXT 0x87AF
#define GL_OUTPUT_TEXTURE_COORD19_EXT 0x87B0
#define GL_OUTPUT_TEXTURE_COORD20_EXT 0x87B1
#define GL_OUTPUT_TEXTURE_COORD21_EXT 0x87B2
#define GL_OUTPUT_TEXTURE_COORD22_EXT 0x87B3
#define GL_OUTPUT_TEXTURE_COORD23_EXT 0x87B4
#define GL_OUTPUT_TEXTURE_COORD24_EXT 0x87B5
#define GL_OUTPUT_TEXTURE_COORD25_EXT 0x87B6
#define GL_OUTPUT_TEXTURE_COORD26_EXT 0x87B7
#define GL_OUTPUT_TEXTURE_COORD27_EXT 0x87B8
#define GL_OUTPUT_TEXTURE_COORD28_EXT 0x87B9
#define GL_OUTPUT_TEXTURE_COORD29_EXT 0x87BA
#define GL_OUTPUT_TEXTURE_COORD30_EXT 0x87BB
#define GL_OUTPUT_TEXTURE_COORD31_EXT 0x87BC
#define GL_OUTPUT_FOG_EXT 0x87BD
#define GL_SCALAR_EXT 0x87BE
#define GL_VECTOR_EXT 0x87BF
#define GL_MATRIX_EXT 0x87C0
#define GL_VARIANT_EXT 0x87C1
#define GL_INVARIANT_EXT 0x87C2
#define GL_LOCAL_CONSTANT_EXT 0x87C3
#define GL_LOCAL_EXT 0x87C4
#define GL_MAX_VERTEX_SHADER_INSTRUCTIONS_EXT 0x87C5
#define GL_MAX_VERTEX_SHADER_VARIANTS_EXT 0x87C6
#define GL_MAX_VERTEX_SHADER_INVARIANTS_EXT 0x87C7
#define GL_MAX_VERTEX_SHADER_LOCAL_CONSTANTS_EXT 0x87C8
#define GL_MAX_VERTEX_SHADER_LOCALS_EXT 0x87C9
#define GL_MAX_OPTIMIZED_VERTEX_SHADER_INSTRUCTIONS_EXT 0x87CA
#define GL_MAX_OPTIMIZED_VERTEX_SHADER_VARIANTS_EXT 0x87CB
#define GL_MAX_OPTIMIZED_VERTEX_SHADER_LOCAL_CONSTANTS_EXT 0x87CC
#define GL_MAX_OPTIMIZED_VERTEX_SHADER_INVARIANTS_EXT 0x87CD
#define GL_MAX_OPTIMIZED_VERTEX_SHADER_LOCALS_EXT 0x87CE
#define GL_VERTEX_SHADER_INSTRUCTIONS_EXT 0x87CF
#define GL_VERTEX_SHADER_VARIANTS_EXT 0x87D0
#define GL_VERTEX_SHADER_INVARIANTS_EXT 0x87D1
#define GL_VERTEX_SHADER_LOCAL_CONSTANTS_EXT 0x87D2
#define GL_VERTEX_SHADER_LOCALS_EXT 0x87D3
#define GL_VERTEX_SHADER_OPTIMIZED_EXT 0x87D4
#define GL_X_EXT 0x87D5
#define GL_Y_EXT 0x87D6
#define GL_Z_EXT 0x87D7
#define GL_W_EXT 0x87D8
#define GL_NEGATIVE_X_EXT 0x87D9
#define GL_NEGATIVE_Y_EXT 0x87DA
#define GL_NEGATIVE_Z_EXT 0x87DB
#define GL_NEGATIVE_W_EXT 0x87DC
#define GL_ZERO_EXT 0x87DD
#define GL_ONE_EXT 0x87DE
#define GL_NEGATIVE_ONE_EXT 0x87DF
#define GL_NORMALIZED_RANGE_EXT 0x87E0
#define GL_FULL_RANGE_EXT 0x87E1
#define GL_CURRENT_VERTEX_EXT 0x87E2
#define GL_MVP_MATRIX_EXT 0x87E3
#define GL_VARIANT_VALUE_EXT 0x87E4
#define GL_VARIANT_DATATYPE_EXT 0x87E5
#define GL_VARIANT_ARRAY_STRIDE_EXT 0x87E6
#define GL_VARIANT_ARRAY_TYPE_EXT 0x87E7
#define GL_VARIANT_ARRAY_EXT 0x87E8
#define GL_VARIANT_ARRAY_POINTER_EXT 0x87E9
#define GL_INVARIANT_VALUE_EXT 0x87EA
#define GL_INVARIANT_DATATYPE_EXT 0x87EB
#define GL_LOCAL_CONSTANT_VALUE_EXT 0x87EC
#define GL_LOCAL_CONSTANT_DATATYPE_EXT 0x87ED
#endif

#ifndef GL_ATI_vertex_streams
#define GL_MAX_VERTEX_STREAMS_ATI 0x876B
#define GL_VERTEX_STREAM0_ATI 0x876C
#define GL_VERTEX_STREAM1_ATI 0x876D
#define GL_VERTEX_STREAM2_ATI 0x876E
#define GL_VERTEX_STREAM3_ATI 0x876F
#define GL_VERTEX_STREAM4_ATI 0x8770
#define GL_VERTEX_STREAM5_ATI 0x8771
#define GL_VERTEX_STREAM6_ATI 0x8772
#define GL_VERTEX_STREAM7_ATI 0x8773
#define GL_VERTEX_SOURCE_ATI 0x8774
#endif

#ifndef GL_ATI_element_array
#define GL_ELEMENT_ARRAY_ATI 0x8768
#define GL_ELEMENT_ARRAY_TYPE_ATI 0x8769
#define GL_ELEMENT_ARRAY_POINTER_ATI 0x876A
#endif

#ifndef GL_SUN_mesh_array
#define GL_QUAD_MESH_SUN 0x8614
#define GL_TRIANGLE_MESH_SUN 0x8615
#endif

#ifndef GL_SUN_slice_accum
#define GL_SLICE_ACCUM_SUN 0x85CC
#endif

#ifndef GL_NV_multisample_filter_hint
#define GL_MULTISAMPLE_FILTER_HINT_NV 0x8534
#endif

#ifndef GL_NV_depth_clamp
#define GL_DEPTH_CLAMP_NV 0x864F
#endif

#ifndef GL_NV_occlusion_query
#define GL_PIXEL_COUNTER_BITS_NV 0x8864
#define GL_CURRENT_OCCLUSION_QUERY_ID_NV 0x8865
#define GL_PIXEL_COUNT_NV 0x8866
#define GL_PIXEL_COUNT_AVAILABLE_NV 0x8867
#endif

#ifndef GL_NV_point_sprite
#define GL_POINT_SPRITE_NV 0x8861
#define GL_COORD_REPLACE_NV 0x8862
#define GL_POINT_SPRITE_R_MODE_NV 0x8863
#endif

#ifndef GL_NV_texture_shader3
#define GL_OFFSET_PROJECTIVE_TEXTURE_2D_NV 0x8850
#define GL_OFFSET_PROJECTIVE_TEXTURE_2D_SCALE_NV 0x8851
#define GL_OFFSET_PROJECTIVE_TEXTURE_RECTANGLE_NV 0x8852
#define GL_OFFSET_PROJECTIVE_TEXTURE_RECTANGLE_SCALE_NV 0x8853
#define GL_OFFSET_HILO_TEXTURE_2D_NV 0x8854
#define GL_OFFSET_HILO_TEXTURE_RECTANGLE_NV 0x8855
#define GL_OFFSET_HILO_PROJECTIVE_TEXTURE_2D_NV 0x8856
#define GL_OFFSET_HILO_PROJECTIVE_TEXTURE_RECTANGLE_NV 0x8857
#define GL_DEPENDENT_HILO_TEXTURE_2D_NV 0x8858
#define GL_DEPENDENT_RGB_TEXTURE_3D_NV 0x8859
#define GL_DEPENDENT_RGB_TEXTURE_CUBE_MAP_NV 0x885A
#define GL_DOT_PRODUCT_PASS_THROUGH_NV 0x885B
#define GL_DOT_PRODUCT_TEXTURE_1D_NV 0x885C
#define GL_DOT_PRODUCT_AFFINE_DEPTH_REPLACE_NV 0x885D
#define GL_HILO8_NV 0x885E
#define GL_SIGNED_HILO8_NV 0x885F
#define GL_FORCE_BLUE_TO_ONE_NV 0x8860
#endif

#ifndef GL_NV_vertex_program1_1
#endif

#ifndef GL_EXT_shadow_funcs
#endif

#ifndef GL_EXT_stencil_two_side
#define GL_STENCIL_TEST_TWO_SIDE_EXT 0x8910
#define GL_ACTIVE_STENCIL_FACE_EXT 0x8911
#endif
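/*
 * Usage sketch (illustrative): single-pass stencil shadow volumes with
 * GL_EXT_stencil_two_side. glActiveStencilFaceEXT, the entry point this
 * extension adds, selects which face subsequent stencil state applies to;
 * GL_INCR_WRAP_EXT and GL_DECR_WRAP_EXT come from GL_EXT_stencil_wrap above.
 *
 *   glEnable(GL_STENCIL_TEST_TWO_SIDE_EXT);
 *   glActiveStencilFaceEXT(GL_BACK);
 *   glStencilOp(GL_KEEP, GL_INCR_WRAP_EXT, GL_KEEP);   back faces: increment on depth fail
 *   glActiveStencilFaceEXT(GL_FRONT);
 *   glStencilOp(GL_KEEP, GL_DECR_WRAP_EXT, GL_KEEP);   front faces: decrement on depth fail
 */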
#ifndef GL_ATI_text_fragment_shader
#define GL_TEXT_FRAGMENT_SHADER_ATI 0x8200
#endif

#ifndef GL_APPLE_client_storage
#define GL_UNPACK_CLIENT_STORAGE_APPLE 0x85B2
#endif

#ifndef GL_APPLE_element_array
#define GL_ELEMENT_ARRAY_APPLE 0x8768
#define GL_ELEMENT_ARRAY_TYPE_APPLE 0x8769
#define GL_ELEMENT_ARRAY_POINTER_APPLE 0x876A
#endif

#ifndef GL_APPLE_fence
#define GL_DRAW_PIXELS_APPLE 0x8A0A
#define GL_FENCE_APPLE 0x8A0B
#endif

#ifndef GL_APPLE_vertex_array_object
#define GL_VERTEX_ARRAY_BINDING_APPLE 0x85B5
#endif

#ifndef GL_APPLE_vertex_array_range
#define GL_VERTEX_ARRAY_RANGE_APPLE 0x851D
#define GL_VERTEX_ARRAY_RANGE_LENGTH_APPLE 0x851E
#define GL_VERTEX_ARRAY_STORAGE_HINT_APPLE 0x851F
#define GL_VERTEX_ARRAY_RANGE_POINTER_APPLE 0x8521
#define GL_STORAGE_CACHED_APPLE 0x85BE
#define GL_STORAGE_SHARED_APPLE 0x85BF
#endif

#ifndef GL_APPLE_ycbcr_422
#define GL_YCBCR_422_APPLE 0x85B9
#define GL_UNSIGNED_SHORT_8_8_APPLE 0x85BA
#define GL_UNSIGNED_SHORT_8_8_REV_APPLE 0x85BB
#endif

#ifndef GL_S3_s3tc
#define GL_RGB_S3TC 0x83A0
#define GL_RGB4_S3TC 0x83A1
#define GL_RGBA_S3TC 0x83A2
#define GL_RGBA4_S3TC 0x83A3
#endif

#ifndef GL_ATI_draw_buffers
#define GL_MAX_DRAW_BUFFERS_ATI 0x8824
#define GL_DRAW_BUFFER0_ATI 0x8825
#define GL_DRAW_BUFFER1_ATI 0x8826
#define GL_DRAW_BUFFER2_ATI 0x8827
#define GL_DRAW_BUFFER3_ATI 0x8828
#define GL_DRAW_BUFFER4_ATI 0x8829
#define GL_DRAW_BUFFER5_ATI 0x882A
#define GL_DRAW_BUFFER6_ATI 0x882B
#define GL_DRAW_BUFFER7_ATI 0x882C
#define GL_DRAW_BUFFER8_ATI 0x882D
#define GL_DRAW_BUFFER9_ATI 0x882E
#define GL_DRAW_BUFFER10_ATI 0x882F
#define GL_DRAW_BUFFER11_ATI 0x8830
#define GL_DRAW_BUFFER12_ATI 0x8831
#define GL_DRAW_BUFFER13_ATI 0x8832
#define GL_DRAW_BUFFER14_ATI 0x8833
#define GL_DRAW_BUFFER15_ATI 0x8834
#endif

#ifndef GL_ATI_pixel_format_float
#define GL_TYPE_RGBA_FLOAT_ATI 0x8820
#define GL_COLOR_CLEAR_UNCLAMPED_VALUE_ATI 0x8835
#endif

#ifndef GL_ATI_texture_env_combine3
#define GL_MODULATE_ADD_ATI 0x8744
#define GL_MODULATE_SIGNED_ADD_ATI 0x8745
#define GL_MODULATE_SUBTRACT_ATI 0x8746
#endif

#ifndef GL_ATI_texture_float
#define GL_RGBA_FLOAT32_ATI 0x8814
#define GL_RGB_FLOAT32_ATI 0x8815
#define GL_ALPHA_FLOAT32_ATI 0x8816
#define GL_INTENSITY_FLOAT32_ATI 0x8817
#define GL_LUMINANCE_FLOAT32_ATI 0x8818
#define GL_LUMINANCE_ALPHA_FLOAT32_ATI 0x8819
#define GL_RGBA_FLOAT16_ATI 0x881A
#define GL_RGB_FLOAT16_ATI 0x881B
#define GL_ALPHA_FLOAT16_ATI 0x881C
#define GL_INTENSITY_FLOAT16_ATI 0x881D
#define GL_LUMINANCE_FLOAT16_ATI 0x881E
#define GL_LUMINANCE_ALPHA_FLOAT16_ATI 0x881F
#endif

#ifndef GL_NV_float_buffer
#define GL_FLOAT_R_NV 0x8880
#define GL_FLOAT_RG_NV 0x8881
#define GL_FLOAT_RGB_NV 0x8882
#define GL_FLOAT_RGBA_NV 0x8883
#define GL_FLOAT_R16_NV 0x8884
#define GL_FLOAT_R32_NV 0x8885
#define GL_FLOAT_RG16_NV 0x8886
#define GL_FLOAT_RG32_NV 0x8887
#define GL_FLOAT_RGB16_NV 0x8888
#define GL_FLOAT_RGB32_NV 0x8889
#define GL_FLOAT_RGBA16_NV 0x888A
#define GL_FLOAT_RGBA32_NV 0x888B
#define GL_TEXTURE_FLOAT_COMPONENTS_NV 0x888C
#define GL_FLOAT_CLEAR_COLOR_VALUE_NV 0x888D
#define GL_FLOAT_RGBA_MODE_NV 0x888E
#endif

#ifndef GL_NV_fragment_program
#define GL_MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV 0x8868
#define GL_FRAGMENT_PROGRAM_NV 0x8870
#define GL_MAX_TEXTURE_COORDS_NV 0x8871
#define GL_MAX_TEXTURE_IMAGE_UNITS_NV 0x8872
#define GL_FRAGMENT_PROGRAM_BINDING_NV 0x8873
#define GL_PROGRAM_ERROR_STRING_NV 0x8874
#endif

#ifndef GL_NV_half_float
#define GL_HALF_FLOAT_NV 0x140B
#endif

#ifndef GL_NV_pixel_data_range
#define GL_WRITE_PIXEL_DATA_RANGE_NV 0x8878
#define GL_READ_PIXEL_DATA_RANGE_NV 0x8879
#define GL_WRITE_PIXEL_DATA_RANGE_LENGTH_NV 0x887A
#define GL_READ_PIXEL_DATA_RANGE_LENGTH_NV 0x887B
#define GL_WRITE_PIXEL_DATA_RANGE_POINTER_NV 0x887C
#define GL_READ_PIXEL_DATA_RANGE_POINTER_NV 0x887D
#endif

#ifndef GL_NV_primitive_restart
#define GL_PRIMITIVE_RESTART_NV 0x8558
#define GL_PRIMITIVE_RESTART_INDEX_NV 0x8559
#endif

#ifndef GL_NV_texture_expand_normal
#define GL_TEXTURE_UNSIGNED_REMAP_MODE_NV 0x888F
#endif

#ifndef GL_NV_vertex_program2
#endif

#ifndef GL_ATI_map_object_buffer
#endif

#ifndef GL_ATI_separate_stencil
#define GL_STENCIL_BACK_FUNC_ATI 0x8800
#define GL_STENCIL_BACK_FAIL_ATI 0x8801
#define GL_STENCIL_BACK_PASS_DEPTH_FAIL_ATI 0x8802
#define GL_STENCIL_BACK_PASS_DEPTH_PASS_ATI 0x8803
#endif

#ifndef GL_ATI_vertex_attrib_array_object
#endif

#ifndef GL_OES_read_format
#define GL_IMPLEMENTATION_COLOR_READ_TYPE_OES 0x8B9A
#define GL_IMPLEMENTATION_COLOR_READ_FORMAT_OES 0x8B9B
#endif

#ifndef GL_EXT_depth_bounds_test
#define GL_DEPTH_BOUNDS_TEST_EXT 0x8890
#define GL_DEPTH_BOUNDS_EXT 0x8891
#endif

#ifndef GL_EXT_texture_mirror_clamp
#define GL_MIRROR_CLAMP_EXT 0x8742
#define GL_MIRROR_CLAMP_TO_EDGE_EXT 0x8743
#define GL_MIRROR_CLAMP_TO_BORDER_EXT 0x8912
#endif

#ifndef GL_EXT_blend_equation_separate
#define GL_BLEND_EQUATION_RGB_EXT 0x8009
#define GL_BLEND_EQUATION_ALPHA_EXT 0x883D
#endif

#ifndef GL_MESA_pack_invert
#define GL_PACK_INVERT_MESA 0x8758
#endif

#ifndef GL_MESA_ycbcr_texture
#define GL_UNSIGNED_SHORT_8_8_MESA 0x85BA
#define GL_UNSIGNED_SHORT_8_8_REV_MESA 0x85BB
#define GL_YCBCR_MESA 0x8757
#endif

#ifndef GL_EXT_pixel_buffer_object
#define GL_PIXEL_PACK_BUFFER_EXT 0x88EB
#define GL_PIXEL_UNPACK_BUFFER_EXT 0x88EC
#define GL_PIXEL_PACK_BUFFER_BINDING_EXT 0x88ED
#define GL_PIXEL_UNPACK_BUFFER_BINDING_EXT 0x88EF
#endif

#ifndef GL_NV_fragment_program_option
#endif

#ifndef GL_NV_fragment_program2
#define GL_MAX_PROGRAM_EXEC_INSTRUCTIONS_NV 0x88F4
#define GL_MAX_PROGRAM_CALL_DEPTH_NV 0x88F5
#define GL_MAX_PROGRAM_IF_DEPTH_NV 0x88F6
#define GL_MAX_PROGRAM_LOOP_DEPTH_NV 0x88F7
#define GL_MAX_PROGRAM_LOOP_COUNT_NV 0x88F8
#endif

#ifndef GL_NV_vertex_program2_option
/* reuse GL_MAX_PROGRAM_EXEC_INSTRUCTIONS_NV */
/* reuse GL_MAX_PROGRAM_CALL_DEPTH_NV */
#endif

#ifndef GL_NV_vertex_program3
/* reuse GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB */
#endif

#ifndef GL_EXT_framebuffer_object
#define GL_INVALID_FRAMEBUFFER_OPERATION_EXT 0x0506
#define GL_MAX_RENDERBUFFER_SIZE_EXT 0x84E8
#define GL_FRAMEBUFFER_BINDING_EXT 0x8CA6
#define GL_RENDERBUFFER_BINDING_EXT 0x8CA7
#define GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE_EXT 0x8CD0
#define GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME_EXT 0x8CD1
#define GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LEVEL_EXT 0x8CD2
#define GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_CUBE_MAP_FACE_EXT 0x8CD3
#define GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_3D_ZOFFSET_EXT 0x8CD4
#define GL_FRAMEBUFFER_COMPLETE_EXT 0x8CD5
#define GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT_EXT 0x8CD6
#define GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT_EXT 0x8CD7
#define GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS_EXT 0x8CD9
#define GL_FRAMEBUFFER_INCOMPLETE_FORMATS_EXT 0x8CDA
#define GL_FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER_EXT 0x8CDB
#define GL_FRAMEBUFFER_INCOMPLETE_READ_BUFFER_EXT 0x8CDC
#define GL_FRAMEBUFFER_UNSUPPORTED_EXT 0x8CDD
#define GL_MAX_COLOR_ATTACHMENTS_EXT 0x8CDF
#define GL_COLOR_ATTACHMENT0_EXT 0x8CE0
#define GL_COLOR_ATTACHMENT1_EXT 0x8CE1
#define GL_COLOR_ATTACHMENT2_EXT 0x8CE2
#define GL_COLOR_ATTACHMENT3_EXT 0x8CE3
#define GL_COLOR_ATTACHMENT4_EXT 0x8CE4
#define GL_COLOR_ATTACHMENT5_EXT 0x8CE5
#define GL_COLOR_ATTACHMENT6_EXT 0x8CE6
#define GL_COLOR_ATTACHMENT7_EXT 0x8CE7
#define GL_COLOR_ATTACHMENT8_EXT 0x8CE8
#define GL_COLOR_ATTACHMENT9_EXT 0x8CE9
#define GL_COLOR_ATTACHMENT10_EXT 0x8CEA
#define GL_COLOR_ATTACHMENT11_EXT 0x8CEB
#define GL_COLOR_ATTACHMENT12_EXT 0x8CEC
#define GL_COLOR_ATTACHMENT13_EXT 0x8CED
#define GL_COLOR_ATTACHMENT14_EXT 0x8CEE
#define GL_COLOR_ATTACHMENT15_EXT 0x8CEF
#define GL_DEPTH_ATTACHMENT_EXT 0x8D00
#define GL_STENCIL_ATTACHMENT_EXT 0x8D20
#define GL_FRAMEBUFFER_EXT 0x8D40
#define GL_RENDERBUFFER_EXT 0x8D41
#define GL_RENDERBUFFER_WIDTH_EXT 0x8D42
#define GL_RENDERBUFFER_HEIGHT_EXT 0x8D43
#define GL_RENDERBUFFER_INTERNAL_FORMAT_EXT 0x8D44
#define GL_STENCIL_INDEX1_EXT 0x8D46
#define GL_STENCIL_INDEX4_EXT 0x8D47
#define GL_STENCIL_INDEX8_EXT 0x8D48
#define GL_STENCIL_INDEX16_EXT 0x8D49
#define GL_RENDERBUFFER_RED_SIZE_EXT 0x8D50
#define GL_RENDERBUFFER_GREEN_SIZE_EXT 0x8D51
#define GL_RENDERBUFFER_BLUE_SIZE_EXT 0x8D52
#define GL_RENDERBUFFER_ALPHA_SIZE_EXT 0x8D53
#define GL_RENDERBUFFER_DEPTH_SIZE_EXT 0x8D54
#define GL_RENDERBUFFER_STENCIL_SIZE_EXT 0x8D55
#endif
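/*
 * Usage sketch (illustrative): render-to-texture with
 * GL_EXT_framebuffer_object. glGenFramebuffersEXT, glBindFramebufferEXT,
 * glFramebufferTexture2DEXT and glCheckFramebufferStatusEXT are entry points
 * of this extension; colorTex is assumed to be a previously created 2D
 * texture of matching size.
 *
 *   GLuint fbo;
 *   glGenFramebuffersEXT(1, &fbo);
 *   glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
 *   glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
 *                             GL_TEXTURE_2D, colorTex, 0);
 *   GLenum status = glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT);
 *   (status must be GL_FRAMEBUFFER_COMPLETE_EXT before rendering)
 */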
#ifndef GL_GREMEDY_string_marker
#endif

#ifndef GL_EXT_packed_depth_stencil
#define GL_DEPTH_STENCIL_EXT 0x84F9
#define GL_UNSIGNED_INT_24_8_EXT 0x84FA
#define GL_DEPTH24_STENCIL8_EXT 0x88F0
#define GL_TEXTURE_STENCIL_SIZE_EXT 0x88F1
#endif

#ifndef GL_EXT_stencil_clear_tag
#define GL_STENCIL_TAG_BITS_EXT 0x88F2
#define GL_STENCIL_CLEAR_TAG_VALUE_EXT 0x88F3
#endif

#ifndef GL_EXT_texture_sRGB
#define GL_SRGB_EXT 0x8C40
#define GL_SRGB8_EXT 0x8C41
#define GL_SRGB_ALPHA_EXT 0x8C42
#define GL_SRGB8_ALPHA8_EXT 0x8C43
#define GL_SLUMINANCE_ALPHA_EXT 0x8C44
#define GL_SLUMINANCE8_ALPHA8_EXT 0x8C45
#define GL_SLUMINANCE_EXT 0x8C46
#define GL_SLUMINANCE8_EXT 0x8C47
#define GL_COMPRESSED_SRGB_EXT 0x8C48
#define GL_COMPRESSED_SRGB_ALPHA_EXT 0x8C49
#define GL_COMPRESSED_SLUMINANCE_EXT 0x8C4A
#define GL_COMPRESSED_SLUMINANCE_ALPHA_EXT 0x8C4B
#define GL_COMPRESSED_SRGB_S3TC_DXT1_EXT 0x8C4C
#define GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT 0x8C4D
#define GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT 0x8C4E
#define GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT 0x8C4F
#endif

#ifndef GL_EXT_framebuffer_blit
#define GL_READ_FRAMEBUFFER_EXT 0x8CA8
#define GL_DRAW_FRAMEBUFFER_EXT 0x8CA9
#define GL_DRAW_FRAMEBUFFER_BINDING_EXT GL_FRAMEBUFFER_BINDING_EXT
#define GL_READ_FRAMEBUFFER_BINDING_EXT 0x8CAA
#endif

#ifndef GL_EXT_framebuffer_multisample
#define GL_RENDERBUFFER_SAMPLES_EXT 0x8CAB
#define GL_FRAMEBUFFER_INCOMPLETE_MULTISAMPLE_EXT 0x8D56
#define GL_MAX_SAMPLES_EXT 0x8D57
#endif

#ifndef GL_MESAX_texture_stack
#define GL_TEXTURE_1D_STACK_MESAX 0x8759
#define GL_TEXTURE_2D_STACK_MESAX 0x875A
#define GL_PROXY_TEXTURE_1D_STACK_MESAX 0x875B
#define GL_PROXY_TEXTURE_2D_STACK_MESAX 0x875C
#define GL_TEXTURE_1D_STACK_BINDING_MESAX 0x875D
#define GL_TEXTURE_2D_STACK_BINDING_MESAX 0x875E
#endif

#ifndef GL_EXT_timer_query
#define GL_TIME_ELAPSED_EXT 0x88BF
#endif

#ifndef GL_EXT_gpu_program_parameters
#endif

#ifndef GL_APPLE_flush_buffer_range
#define GL_BUFFER_SERIALIZED_MODIFY_APPLE 0x8A12
#define GL_BUFFER_FLUSHING_UNMAP_APPLE 0x8A13
#endif

#ifndef GL_NV_gpu_program4
#define GL_MIN_PROGRAM_TEXEL_OFFSET_NV 0x8904
#define GL_MAX_PROGRAM_TEXEL_OFFSET_NV 0x8905
#define GL_PROGRAM_ATTRIB_COMPONENTS_NV 0x8906
#define GL_PROGRAM_RESULT_COMPONENTS_NV 0x8907
#define GL_MAX_PROGRAM_ATTRIB_COMPONENTS_NV 0x8908
#define GL_MAX_PROGRAM_RESULT_COMPONENTS_NV 0x8909
#define GL_MAX_PROGRAM_GENERIC_ATTRIBS_NV 0x8DA5
#define GL_MAX_PROGRAM_GENERIC_RESULTS_NV 0x8DA6
#endif

#ifndef GL_NV_geometry_program4
#define GL_LINES_ADJACENCY_EXT 0x000A
#define GL_LINE_STRIP_ADJACENCY_EXT 0x000B
#define GL_TRIANGLES_ADJACENCY_EXT 0x000C
#define GL_TRIANGLE_STRIP_ADJACENCY_EXT 0x000D
#define GL_GEOMETRY_PROGRAM_NV 0x8C26
#define GL_MAX_PROGRAM_OUTPUT_VERTICES_NV 0x8C27
#define GL_MAX_PROGRAM_TOTAL_OUTPUT_COMPONENTS_NV 0x8C28
#define GL_GEOMETRY_VERTICES_OUT_EXT 0x8DDA
#define GL_GEOMETRY_INPUT_TYPE_EXT 0x8DDB
#define GL_GEOMETRY_OUTPUT_TYPE_EXT 0x8DDC
#define GL_MAX_GEOMETRY_TEXTURE_IMAGE_UNITS_EXT 0x8C29
#define GL_FRAMEBUFFER_ATTACHMENT_LAYERED_EXT 0x8DA7
#define GL_FRAMEBUFFER_INCOMPLETE_LAYER_TARGETS_EXT 0x8DA8
#define GL_FRAMEBUFFER_INCOMPLETE_LAYER_COUNT_EXT 0x8DA9
#define GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER_EXT 0x8CD4
#define GL_PROGRAM_POINT_SIZE_EXT 0x8642
#endif

#ifndef GL_EXT_geometry_shader4
#define GL_GEOMETRY_SHADER_EXT 0x8DD9
/* reuse GL_GEOMETRY_VERTICES_OUT_EXT */
/* reuse GL_GEOMETRY_INPUT_TYPE_EXT */
/* reuse GL_GEOMETRY_OUTPUT_TYPE_EXT */
/* reuse GL_MAX_GEOMETRY_TEXTURE_IMAGE_UNITS_EXT */
#define GL_MAX_GEOMETRY_VARYING_COMPONENTS_EXT 0x8DDD
#define GL_MAX_VERTEX_VARYING_COMPONENTS_EXT 0x8DDE
#define GL_MAX_VARYING_COMPONENTS_EXT 0x8B4B
#define GL_MAX_GEOMETRY_UNIFORM_COMPONENTS_EXT 0x8DDF
#define GL_MAX_GEOMETRY_OUTPUT_VERTICES_EXT 0x8DE0
#define GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS_EXT 0x8DE1
/* reuse GL_LINES_ADJACENCY_EXT */
/* reuse GL_LINE_STRIP_ADJACENCY_EXT */
/* reuse GL_TRIANGLES_ADJACENCY_EXT */
/* reuse GL_TRIANGLE_STRIP_ADJACENCY_EXT */
/* reuse GL_FRAMEBUFFER_INCOMPLETE_LAYER_TARGETS_EXT */
/* reuse GL_FRAMEBUFFER_INCOMPLETE_LAYER_COUNT_EXT */
/* reuse GL_FRAMEBUFFER_ATTACHMENT_LAYERED_EXT */
/* reuse GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER_EXT */
/* reuse GL_PROGRAM_POINT_SIZE_EXT */
#endif

#ifndef GL_NV_vertex_program4
#define GL_VERTEX_ATTRIB_ARRAY_INTEGER_NV 0x88FD
#endif

#ifndef GL_EXT_gpu_shader4
#define GL_SAMPLER_1D_ARRAY_EXT 0x8DC0
#define GL_SAMPLER_2D_ARRAY_EXT 0x8DC1
#define GL_SAMPLER_BUFFER_EXT 0x8DC2
#define GL_SAMPLER_1D_ARRAY_SHADOW_EXT 0x8DC3
#define GL_SAMPLER_2D_ARRAY_SHADOW_EXT 0x8DC4
#define GL_SAMPLER_CUBE_SHADOW_EXT 0x8DC5
#define GL_UNSIGNED_INT_VEC2_EXT 0x8DC6
#define GL_UNSIGNED_INT_VEC3_EXT 0x8DC7
#define GL_UNSIGNED_INT_VEC4_EXT 0x8DC8
#define GL_INT_SAMPLER_1D_EXT 0x8DC9
#define GL_INT_SAMPLER_2D_EXT 0x8DCA
#define GL_INT_SAMPLER_3D_EXT 0x8DCB
#define GL_INT_SAMPLER_CUBE_EXT 0x8DCC
#define GL_INT_SAMPLER_2D_RECT_EXT 0x8DCD
#define GL_INT_SAMPLER_1D_ARRAY_EXT 0x8DCE
#define GL_INT_SAMPLER_2D_ARRAY_EXT 0x8DCF
#define GL_INT_SAMPLER_BUFFER_EXT 0x8DD0
#define GL_UNSIGNED_INT_SAMPLER_1D_EXT 0x8DD1
#define GL_UNSIGNED_INT_SAMPLER_2D_EXT 0x8DD2
#define GL_UNSIGNED_INT_SAMPLER_3D_EXT 0x8DD3
#define GL_UNSIGNED_INT_SAMPLER_CUBE_EXT 0x8DD4
#define GL_UNSIGNED_INT_SAMPLER_2D_RECT_EXT 0x8DD5
#define GL_UNSIGNED_INT_SAMPLER_1D_ARRAY_EXT 0x8DD6
#define GL_UNSIGNED_INT_SAMPLER_2D_ARRAY_EXT 0x8DD7
#define GL_UNSIGNED_INT_SAMPLER_BUFFER_EXT 0x8DD8
#endif

#ifndef GL_EXT_draw_instanced
#endif

#ifndef GL_EXT_packed_float
#define GL_R11F_G11F_B10F_EXT 0x8C3A
#define GL_UNSIGNED_INT_10F_11F_11F_REV_EXT 0x8C3B
#define GL_RGBA_SIGNED_COMPONENTS_EXT 0x8C3C
#endif

#ifndef GL_EXT_texture_array
#define GL_TEXTURE_1D_ARRAY_EXT 0x8C18
#define GL_PROXY_TEXTURE_1D_ARRAY_EXT 0x8C19
#define GL_TEXTURE_2D_ARRAY_EXT 0x8C1A
#define GL_PROXY_TEXTURE_2D_ARRAY_EXT 0x8C1B
#define GL_TEXTURE_BINDING_1D_ARRAY_EXT 0x8C1C
#define GL_TEXTURE_BINDING_2D_ARRAY_EXT 0x8C1D
#define GL_MAX_ARRAY_TEXTURE_LAYERS_EXT 0x88FF
#define GL_COMPARE_REF_DEPTH_TO_TEXTURE_EXT 0x884E
/* reuse GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER_EXT */
#endif

#ifndef GL_EXT_texture_buffer_object
#define GL_TEXTURE_BUFFER_EXT 0x8C2A
#define GL_MAX_TEXTURE_BUFFER_SIZE_EXT 0x8C2B
#define GL_TEXTURE_BINDING_BUFFER_EXT 0x8C2C
#define GL_TEXTURE_BUFFER_DATA_STORE_BINDING_EXT 0x8C2D
#define GL_TEXTURE_BUFFER_FORMAT_EXT 0x8C2E
#endif

#ifndef GL_EXT_texture_compression_latc
#define GL_COMPRESSED_LUMINANCE_LATC1_EXT 0x8C70
#define GL_COMPRESSED_SIGNED_LUMINANCE_LATC1_EXT 0x8C71
#define GL_COMPRESSED_LUMINANCE_ALPHA_LATC2_EXT 0x8C72
#define GL_COMPRESSED_SIGNED_LUMINANCE_ALPHA_LATC2_EXT 0x8C73
#endif

#ifndef GL_EXT_texture_compression_rgtc
#define GL_COMPRESSED_RED_RGTC1_EXT 0x8DBB
#define GL_COMPRESSED_SIGNED_RED_RGTC1_EXT 0x8DBC
#define GL_COMPRESSED_RED_GREEN_RGTC2_EXT 0x8DBD
#define GL_COMPRESSED_SIGNED_RED_GREEN_RGTC2_EXT 0x8DBE
#endif

#ifndef GL_EXT_texture_shared_exponent
#define GL_RGB9_E5_EXT 0x8C3D
#define GL_UNSIGNED_INT_5_9_9_9_REV_EXT 0x8C3E
#define GL_TEXTURE_SHARED_SIZE_EXT 0x8C3F
#endif

#ifndef GL_NV_depth_buffer_float
#define GL_DEPTH_COMPONENT32F_NV 0x8DAB
#define GL_DEPTH32F_STENCIL8_NV 0x8DAC
#define GL_FLOAT_32_UNSIGNED_INT_24_8_REV_NV 0x8DAD
#define GL_DEPTH_BUFFER_FLOAT_MODE_NV 0x8DAF
#endif

#ifndef GL_NV_fragment_program4
#endif

#ifndef GL_NV_framebuffer_multisample_coverage
#define GL_RENDERBUFFER_COVERAGE_SAMPLES_NV 0x8CAB
#define GL_RENDERBUFFER_COLOR_SAMPLES_NV 0x8E10
#define GL_MAX_MULTISAMPLE_COVERAGE_MODES_NV 0x8E11
#define GL_MULTISAMPLE_COVERAGE_MODES_NV 0x8E12
#endif

#ifndef GL_EXT_framebuffer_sRGB
#define GL_FRAMEBUFFER_SRGB_EXT 0x8DB9
#define GL_FRAMEBUFFER_SRGB_CAPABLE_EXT 0x8DBA
#endif
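/*
 * Usage note (illustrative): with GL_EXT_framebuffer_sRGB, linear-to-sRGB
 * conversion on write is an ordinary server-side enable. An application
 * would first query GL_FRAMEBUFFER_SRGB_CAPABLE_EXT for the drawable, then:
 *
 *   glEnable(GL_FRAMEBUFFER_SRGB_EXT);
 */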
#ifndef GL_NV_geometry_shader4
#endif

#ifndef GL_NV_parameter_buffer_object
#define GL_MAX_PROGRAM_PARAMETER_BUFFER_BINDINGS_NV 0x8DA0
#define GL_MAX_PROGRAM_PARAMETER_BUFFER_SIZE_NV 0x8DA1
#define GL_VERTEX_PROGRAM_PARAMETER_BUFFER_NV 0x8DA2
#define GL_GEOMETRY_PROGRAM_PARAMETER_BUFFER_NV 0x8DA3
#define GL_FRAGMENT_PROGRAM_PARAMETER_BUFFER_NV 0x8DA4
#endif

#ifndef GL_EXT_draw_buffers2
#endif

#ifndef GL_NV_transform_feedback
#define GL_BACK_PRIMARY_COLOR_NV 0x8C77
#define GL_BACK_SECONDARY_COLOR_NV 0x8C78
#define GL_TEXTURE_COORD_NV 0x8C79
#define GL_CLIP_DISTANCE_NV 0x8C7A
#define GL_VERTEX_ID_NV 0x8C7B
#define GL_PRIMITIVE_ID_NV 0x8C7C
#define GL_GENERIC_ATTRIB_NV 0x8C7D
#define GL_TRANSFORM_FEEDBACK_ATTRIBS_NV 0x8C7E
#define GL_TRANSFORM_FEEDBACK_BUFFER_MODE_NV 0x8C7F
#define GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS_NV 0x8C80
#define GL_ACTIVE_VARYINGS_NV 0x8C81
#define GL_ACTIVE_VARYING_MAX_LENGTH_NV 0x8C82
#define GL_TRANSFORM_FEEDBACK_VARYINGS_NV 0x8C83
#define GL_TRANSFORM_FEEDBACK_BUFFER_START_NV 0x8C84
#define GL_TRANSFORM_FEEDBACK_BUFFER_SIZE_NV 0x8C85
#define GL_TRANSFORM_FEEDBACK_RECORD_NV 0x8C86
#define GL_PRIMITIVES_GENERATED_NV 0x8C87
#define GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN_NV 0x8C88
#define GL_RASTERIZER_DISCARD_NV 0x8C89
#define GL_MAX_TRANSFORM_FEEDBACK_INTERLEAVED_ATTRIBS_NV 0x8C8A
#define GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS_NV 0x8C8B
#define GL_INTERLEAVED_ATTRIBS_NV 0x8C8C
#define GL_SEPARATE_ATTRIBS_NV 0x8C8D
#define GL_TRANSFORM_FEEDBACK_BUFFER_NV 0x8C8E
#define GL_TRANSFORM_FEEDBACK_BUFFER_BINDING_NV 0x8C8F
#endif

#ifndef GL_EXT_bindable_uniform
#define GL_MAX_VERTEX_BINDABLE_UNIFORMS_EXT 0x8DE2
#define GL_MAX_FRAGMENT_BINDABLE_UNIFORMS_EXT 0x8DE3
#define GL_MAX_GEOMETRY_BINDABLE_UNIFORMS_EXT 0x8DE4
#define GL_MAX_BINDABLE_UNIFORM_SIZE_EXT 0x8DED
#define GL_UNIFORM_BUFFER_EXT 0x8DEE
#define GL_UNIFORM_BUFFER_BINDING_EXT 0x8DEF
#endif

#ifndef GL_EXT_texture_integer
#define GL_RGBA32UI_EXT 0x8D70
#define GL_RGB32UI_EXT 0x8D71
#define GL_ALPHA32UI_EXT 0x8D72
#define GL_INTENSITY32UI_EXT 0x8D73
#define GL_LUMINANCE32UI_EXT 0x8D74
#define GL_LUMINANCE_ALPHA32UI_EXT 0x8D75
#define GL_RGBA16UI_EXT 0x8D76
#define GL_RGB16UI_EXT 0x8D77
#define GL_ALPHA16UI_EXT 0x8D78
#define GL_INTENSITY16UI_EXT 0x8D79
#define GL_LUMINANCE16UI_EXT 0x8D7A
#define GL_LUMINANCE_ALPHA16UI_EXT 0x8D7B
#define GL_RGBA8UI_EXT 0x8D7C
#define GL_RGB8UI_EXT 0x8D7D
#define GL_ALPHA8UI_EXT 0x8D7E
#define GL_INTENSITY8UI_EXT 0x8D7F
#define GL_LUMINANCE8UI_EXT 0x8D80
#define GL_LUMINANCE_ALPHA8UI_EXT 0x8D81
#define GL_RGBA32I_EXT 0x8D82
#define GL_RGB32I_EXT 0x8D83
#define GL_ALPHA32I_EXT 0x8D84
#define GL_INTENSITY32I_EXT 0x8D85
#define GL_LUMINANCE32I_EXT 0x8D86
#define GL_LUMINANCE_ALPHA32I_EXT 0x8D87
#define GL_RGBA16I_EXT 0x8D88
#define GL_RGB16I_EXT 0x8D89
#define GL_ALPHA16I_EXT 0x8D8A
#define GL_INTENSITY16I_EXT 0x8D8B
#define GL_LUMINANCE16I_EXT 0x8D8C
#define GL_LUMINANCE_ALPHA16I_EXT 0x8D8D
#define GL_RGBA8I_EXT 0x8D8E
#define GL_RGB8I_EXT 0x8D8F
#define GL_ALPHA8I_EXT 0x8D90
#define GL_INTENSITY8I_EXT 0x8D91
#define GL_LUMINANCE8I_EXT 0x8D92
#define GL_LUMINANCE_ALPHA8I_EXT 0x8D93
#define GL_RED_INTEGER_EXT 0x8D94
#define GL_GREEN_INTEGER_EXT 0x8D95
#define GL_BLUE_INTEGER_EXT 0x8D96
#define GL_ALPHA_INTEGER_EXT 0x8D97
#define GL_RGB_INTEGER_EXT 0x8D98
#define GL_RGBA_INTEGER_EXT 0x8D99
#define GL_BGR_INTEGER_EXT 0x8D9A
#define GL_BGRA_INTEGER_EXT 0x8D9B
#define GL_LUMINANCE_INTEGER_EXT 0x8D9C
#define GL_LUMINANCE_ALPHA_INTEGER_EXT 0x8D9D
#define GL_RGBA_INTEGER_MODE_EXT 0x8D9E
#endif

#ifndef GL_GREMEDY_frame_terminator
#endif

#ifndef GL_NV_conditional_render
#define GL_QUERY_WAIT_NV 0x8E13
#define GL_QUERY_NO_WAIT_NV 0x8E14
#define GL_QUERY_BY_REGION_WAIT_NV 0x8E15
#define GL_QUERY_BY_REGION_NO_WAIT_NV 0x8E16
#endif

#ifndef GL_NV_present_video
#define GL_FRAME_NV 0x8E26
#define GL_FIELDS_NV 0x8E27
#define GL_CURRENT_TIME_NV 0x8E28
#define GL_NUM_FILL_STREAMS_NV 0x8E29
#define GL_PRESENT_TIME_NV 0x8E2A
#define GL_PRESENT_DURATION_NV 0x8E2B
#endif

#ifndef GL_EXT_transform_feedback
#define GL_TRANSFORM_FEEDBACK_BUFFER_EXT 0x8C8E
#define GL_TRANSFORM_FEEDBACK_BUFFER_START_EXT 0x8C84
#define GL_TRANSFORM_FEEDBACK_BUFFER_SIZE_EXT 0x8C85
#define GL_TRANSFORM_FEEDBACK_BUFFER_BINDING_EXT 0x8C8F
#define GL_INTERLEAVED_ATTRIBS_EXT 0x8C8C
#define GL_SEPARATE_ATTRIBS_EXT 0x8C8D
#define GL_PRIMITIVES_GENERATED_EXT 0x8C87
#define GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN_EXT 0x8C88
#define GL_RASTERIZER_DISCARD_EXT 0x8C89
#define GL_MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS_EXT 0x8C8A
#define GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS_EXT 0x8C8B
#define GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS_EXT 0x8C80
#define GL_TRANSFORM_FEEDBACK_VARYINGS_EXT 0x8C83
#define GL_TRANSFORM_FEEDBACK_BUFFER_MODE_EXT 0x8C7F
#define GL_TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH_EXT 0x8C76
#endif

#ifndef GL_EXT_direct_state_access
#define GL_PROGRAM_MATRIX_EXT 0x8E2D
#define GL_TRANSPOSE_PROGRAM_MATRIX_EXT 0x8E2E
#define GL_PROGRAM_MATRIX_STACK_DEPTH_EXT 0x8E2F
#endif

#ifndef GL_EXT_vertex_array_bgra
/* reuse GL_BGRA */
#endif

#ifndef GL_EXT_texture_swizzle
#define GL_TEXTURE_SWIZZLE_R_EXT 0x8E42
#define GL_TEXTURE_SWIZZLE_G_EXT 0x8E43
#define GL_TEXTURE_SWIZZLE_B_EXT 0x8E44
#define GL_TEXTURE_SWIZZLE_A_EXT 0x8E45
#define GL_TEXTURE_SWIZZLE_RGBA_EXT 0x8E46
#endif

#ifndef GL_NV_explicit_multisample
#define GL_SAMPLE_POSITION_NV 0x8E50
#define GL_SAMPLE_MASK_NV 0x8E51
#define GL_SAMPLE_MASK_VALUE_NV 0x8E52
#define GL_TEXTURE_BINDING_RENDERBUFFER_NV 0x8E53
#define GL_TEXTURE_RENDERBUFFER_DATA_STORE_BINDING_NV 0x8E54
#define GL_TEXTURE_RENDERBUFFER_NV 0x8E55
#define GL_SAMPLER_RENDERBUFFER_NV 0x8E56
#define GL_INT_SAMPLER_RENDERBUFFER_NV 0x8E57
#define GL_UNSIGNED_INT_SAMPLER_RENDERBUFFER_NV 0x8E58
#define GL_MAX_SAMPLE_MASK_WORDS_NV 0x8E59
#endif

#ifndef GL_NV_transform_feedback2
#define GL_TRANSFORM_FEEDBACK_NV 0x8E22
#define GL_TRANSFORM_FEEDBACK_BUFFER_PAUSED_NV 0x8E23
#define GL_TRANSFORM_FEEDBACK_BUFFER_ACTIVE_NV 0x8E24
#define GL_TRANSFORM_FEEDBACK_BINDING_NV 0x8E25
#endif

#ifndef GL_ATI_meminfo
#define GL_VBO_FREE_MEMORY_ATI 0x87FB
#define GL_TEXTURE_FREE_MEMORY_ATI 0x87FC
#define GL_RENDERBUFFER_FREE_MEMORY_ATI 0x87FD
#endif

#ifndef GL_AMD_performance_monitor
#define GL_COUNTER_TYPE_AMD 0x8BC0
#define GL_COUNTER_RANGE_AMD 0x8BC1
#define GL_UNSIGNED_INT64_AMD 0x8BC2
#define GL_PERCENTAGE_AMD 0x8BC3
#define GL_PERFMON_RESULT_AVAILABLE_AMD 0x8BC4
#define GL_PERFMON_RESULT_SIZE_AMD 0x8BC5
#define GL_PERFMON_RESULT_AMD 0x8BC6
#endif

#ifndef GL_AMD_texture_texture4
#endif

#ifndef GL_AMD_vertex_shader_tesselator
#define GL_SAMPLER_BUFFER_AMD 0x9001
#define GL_INT_SAMPLER_BUFFER_AMD 0x9002
#define GL_UNSIGNED_INT_SAMPLER_BUFFER_AMD 0x9003
#define GL_TESSELLATION_MODE_AMD 0x9004
#define GL_TESSELLATION_FACTOR_AMD 0x9005
#define GL_DISCRETE_AMD 0x9006
#define GL_CONTINUOUS_AMD 0x9007
#endif

#ifndef GL_EXT_provoking_vertex
#define GL_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION_EXT 0x8E4C
#define GL_FIRST_VERTEX_CONVENTION_EXT 0x8E4D
#define GL_LAST_VERTEX_CONVENTION_EXT 0x8E4E
#define GL_PROVOKING_VERTEX_EXT 0x8E4F
#endif

#ifndef GL_EXT_texture_snorm
#define GL_ALPHA_SNORM 0x9010
#define GL_LUMINANCE_SNORM 0x9011
#define GL_LUMINANCE_ALPHA_SNORM 0x9012
#define GL_INTENSITY_SNORM 0x9013
#define GL_ALPHA8_SNORM 0x9014
#define GL_LUMINANCE8_SNORM 0x9015
#define GL_LUMINANCE8_ALPHA8_SNORM 0x9016
#define GL_INTENSITY8_SNORM 0x9017
#define GL_ALPHA16_SNORM 0x9018
#define GL_LUMINANCE16_SNORM 0x9019
#define GL_LUMINANCE16_ALPHA16_SNORM 0x901A
#define GL_INTENSITY16_SNORM 0x901B
/* reuse GL_R_SNORM */
/* reuse GL_RG_SNORM */
/* reuse GL_RGB_SNORM */
/* reuse GL_RGBA_SNORM */
/* reuse GL_R8_SNORM */
/* reuse GL_RG8_SNORM */
/* reuse GL_RGB8_SNORM */
/* reuse GL_RGBA8_SNORM */
/* reuse GL_R16_SNORM */
/* reuse GL_RG16_SNORM */
/* reuse GL_RGB16_SNORM */
/* reuse GL_RGBA16_SNORM */
/* reuse GL_SIGNED_NORMALIZED */
#endif
#ifndef GL_AMD_draw_buffers_blend
#endif

#ifndef GL_APPLE_texture_range
#define GL_TEXTURE_RANGE_LENGTH_APPLE 0x85B7
#define GL_TEXTURE_RANGE_POINTER_APPLE 0x85B8
#define GL_TEXTURE_STORAGE_HINT_APPLE 0x85BC
#define GL_STORAGE_PRIVATE_APPLE 0x85BD
/* reuse GL_STORAGE_CACHED_APPLE */
/* reuse GL_STORAGE_SHARED_APPLE */
#endif

#ifndef GL_APPLE_float_pixels
#define GL_HALF_APPLE 0x140B
#define GL_RGBA_FLOAT32_APPLE 0x8814
#define GL_RGB_FLOAT32_APPLE 0x8815
#define GL_ALPHA_FLOAT32_APPLE 0x8816
#define GL_INTENSITY_FLOAT32_APPLE 0x8817
#define GL_LUMINANCE_FLOAT32_APPLE 0x8818
#define GL_LUMINANCE_ALPHA_FLOAT32_APPLE 0x8819
#define GL_RGBA_FLOAT16_APPLE 0x881A
#define GL_RGB_FLOAT16_APPLE 0x881B
#define GL_ALPHA_FLOAT16_APPLE 0x881C
#define GL_INTENSITY_FLOAT16_APPLE 0x881D
#define GL_LUMINANCE_FLOAT16_APPLE 0x881E
#define GL_LUMINANCE_ALPHA_FLOAT16_APPLE 0x881F
#define GL_COLOR_FLOAT_APPLE 0x8A0F
#endif

#ifndef GL_APPLE_vertex_program_evaluators
#define GL_VERTEX_ATTRIB_MAP1_APPLE 0x8A00
#define GL_VERTEX_ATTRIB_MAP2_APPLE 0x8A01
#define GL_VERTEX_ATTRIB_MAP1_SIZE_APPLE 0x8A02
#define GL_VERTEX_ATTRIB_MAP1_COEFF_APPLE 0x8A03
#define GL_VERTEX_ATTRIB_MAP1_ORDER_APPLE 0x8A04
#define GL_VERTEX_ATTRIB_MAP1_DOMAIN_APPLE 0x8A05
#define GL_VERTEX_ATTRIB_MAP2_SIZE_APPLE 0x8A06
#define GL_VERTEX_ATTRIB_MAP2_COEFF_APPLE 0x8A07
#define GL_VERTEX_ATTRIB_MAP2_ORDER_APPLE 0x8A08
#define GL_VERTEX_ATTRIB_MAP2_DOMAIN_APPLE 0x8A09
#endif

#ifndef GL_APPLE_aux_depth_stencil
#define GL_AUX_DEPTH_STENCIL_APPLE 0x8A14
#endif

#ifndef GL_APPLE_object_purgeable
#define GL_BUFFER_OBJECT_APPLE 0x85B3
#define GL_RELEASED_APPLE 0x8A19
#define GL_VOLATILE_APPLE 0x8A1A
#define GL_RETAINED_APPLE 0x8A1B
#define GL_UNDEFINED_APPLE 0x8A1C
#define GL_PURGEABLE_APPLE 0x8A1D
#endif

#ifndef GL_APPLE_row_bytes
#define GL_PACK_ROW_BYTES_APPLE 0x8A15
#define GL_UNPACK_ROW_BYTES_APPLE 0x8A16
#endif

/*************************************************************/

#include <stddef.h>
#ifndef GL_VERSION_2_0
/* GL type for program/shader text */
typedef char GLchar;
#endif

#ifndef GL_VERSION_1_5
/* GL types for handling large vertex buffer objects */
typedef ptrdiff_t GLintptr;
typedef ptrdiff_t GLsizeiptr;
#endif

#ifndef GL_ARB_vertex_buffer_object
/* GL types for handling large vertex buffer objects */
typedef ptrdiff_t GLintptrARB;
typedef ptrdiff_t GLsizeiptrARB;
#endif

#ifndef GL_ARB_shader_objects
/* GL types for program/shader text and shader object handles */
typedef char GLcharARB;
typedef unsigned int GLhandleARB;
#endif

/* GL type for "half" precision (s10e5) float data in host memory */
#ifndef GL_ARB_half_float_pixel
typedef unsigned short GLhalfARB;
#endif

#ifndef GL_NV_half_float
typedef unsigned short GLhalfNV;
#endif

#ifndef GLEXT_64_TYPES_DEFINED
/* This code block is duplicated in glxext.h, so must be protected */
#define GLEXT_64_TYPES_DEFINED
/* Define int32_t, int64_t, and uint64_t types for UST/MSC */
/* (as used in the GL_EXT_timer_query extension). */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <inttypes.h>
#elif defined(__sun__) || defined(__digital__)
#include <inttypes.h>
#if defined(__STDC__)
#if defined(__arch64__) || defined(_LP64)
typedef long int int64_t;
typedef unsigned long int uint64_t;
#else
typedef long long int int64_t;
typedef unsigned long long int uint64_t;
#endif /* __arch64__ */
#endif /* __STDC__ */
#elif defined( __VMS ) || defined(__sgi)
#include <inttypes.h>
#elif defined(__SCO__) || defined(__USLC__)
#include <stdint.h>
#elif defined(__UNIXOS2__) || defined(__SOL64__)
typedef long int int32_t;
typedef long long int int64_t;
typedef unsigned long long int uint64_t;
#elif defined(_WIN32) && defined(__GNUC__)
#include <stdint.h>
#elif defined(_WIN32)
typedef __int32 int32_t;
typedef __int64 int64_t;
typedef unsigned __int64 uint64_t;
#else
/* Fallback if nothing above works */
#include <inttypes.h>
#endif
#endif

#ifndef GL_EXT_timer_query
typedef int64_t GLint64EXT;
typedef uint64_t GLuint64EXT;
#endif

#ifndef GL_ARB_sync
typedef int64_t GLint64;
typedef uint64_t GLuint64;
typedef struct __GLsync *GLsync;
#endif

#ifndef GL_VERSION_1_2
#define GL_VERSION_1_2 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glBlendColor (GLclampf, GLclampf, GLclampf, GLclampf);
GLAPI void APIENTRY glBlendEquation (GLenum);
GLAPI void APIENTRY glDrawRangeElements (GLenum, GLuint, GLuint, GLsizei, GLenum, const GLvoid *);
GLAPI void APIENTRY glTexImage3D (GLenum, GLint, GLint, GLsizei, GLsizei, GLsizei, GLint, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glTexSubImage3D (GLenum, GLint, GLint, GLint, GLint, GLsizei, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glCopyTexSubImage3D (GLenum, GLint, GLint, GLint, GLint, GLint, GLint, GLsizei, GLsizei);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLBLENDCOLORPROC) (GLclampf red, GLclampf green, GLclampf blue, GLclampf alpha);
typedef void (APIENTRYP PFNGLBLENDEQUATIONPROC) (GLenum mode);
typedef void (APIENTRYP PFNGLDRAWRANGEELEMENTSPROC) (GLenum mode, GLuint start, GLuint end, GLsizei count, GLenum type, const GLvoid *indices);
typedef void (APIENTRYP PFNGLTEXIMAGE3DPROC) (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLCOPYTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLint x, GLint y, GLsizei width, GLsizei height);
#endif
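/*
 * Usage sketch (illustrative): on most platforms, entry points beyond
 * OpenGL 1.1 are resolved at run time through the PFNGL*PROC typedefs above
 * rather than linked directly, e.g. with glXGetProcAddress on GLX
 * (wglGetProcAddress on Windows):
 *
 *   PFNGLBLENDCOLORPROC p_glBlendColor = (PFNGLBLENDCOLORPROC)
 *       glXGetProcAddress((const GLubyte *) "glBlendColor");
 *   if (p_glBlendColor)
 *       p_glBlendColor(0.0f, 0.0f, 0.0f, 1.0f);
 */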
#ifndef GL_VERSION_1_2_DEPRECATED
#define GL_VERSION_1_2_DEPRECATED 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glColorTable (GLenum, GLenum, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glColorTableParameterfv (GLenum, GLenum, const GLfloat *);
GLAPI void APIENTRY glColorTableParameteriv (GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glCopyColorTable (GLenum, GLenum, GLint, GLint, GLsizei);
GLAPI void APIENTRY glGetColorTable (GLenum, GLenum, GLenum, GLvoid *);
GLAPI void APIENTRY glGetColorTableParameterfv (GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetColorTableParameteriv (GLenum, GLenum, GLint *);
GLAPI void APIENTRY glColorSubTable (GLenum, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glCopyColorSubTable (GLenum, GLsizei, GLint, GLint, GLsizei);
GLAPI void APIENTRY glConvolutionFilter1D (GLenum, GLenum, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glConvolutionFilter2D (GLenum, GLenum, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glConvolutionParameterf (GLenum, GLenum, GLfloat);
GLAPI void APIENTRY glConvolutionParameterfv (GLenum, GLenum, const GLfloat *);
GLAPI void APIENTRY glConvolutionParameteri (GLenum, GLenum, GLint);
GLAPI void APIENTRY glConvolutionParameteriv (GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glCopyConvolutionFilter1D (GLenum, GLenum, GLint, GLint, GLsizei);
GLAPI void APIENTRY glCopyConvolutionFilter2D (GLenum, GLenum, GLint, GLint, GLsizei, GLsizei);
GLAPI void APIENTRY glGetConvolutionFilter (GLenum, GLenum, GLenum, GLvoid *);
GLAPI void APIENTRY glGetConvolutionParameterfv (GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetConvolutionParameteriv (GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetSeparableFilter (GLenum, GLenum, GLenum, GLvoid *, GLvoid *, GLvoid *);
GLAPI void APIENTRY glSeparableFilter2D (GLenum, GLenum, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *, const GLvoid *);
GLAPI void APIENTRY glGetHistogram (GLenum, GLboolean, GLenum, GLenum, GLvoid *);
GLAPI void APIENTRY glGetHistogramParameterfv (GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetHistogramParameteriv (GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetMinmax (GLenum, GLboolean, GLenum, GLenum, GLvoid *);
GLAPI void APIENTRY glGetMinmaxParameterfv (GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetMinmaxParameteriv (GLenum, GLenum, GLint *);
GLAPI void APIENTRY glHistogram (GLenum, GLsizei, GLenum, GLboolean);
GLAPI void APIENTRY glMinmax (GLenum, GLenum, GLboolean);
GLAPI void APIENTRY glResetHistogram (GLenum);
GLAPI void APIENTRY glResetMinmax (GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLCOLORTABLEPROC) (GLenum target, GLenum internalformat, GLsizei width, GLenum format, GLenum type, const GLvoid *table);
typedef void (APIENTRYP PFNGLCOLORTABLEPARAMETERFVPROC) (GLenum target, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLCOLORTABLEPARAMETERIVPROC) (GLenum target, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLCOPYCOLORTABLEPROC) (GLenum target, GLenum internalformat, GLint x, GLint y, GLsizei width);
typedef void (APIENTRYP PFNGLGETCOLORTABLEPROC) (GLenum target, GLenum format, GLenum type, GLvoid *table);
typedef void (APIENTRYP PFNGLGETCOLORTABLEPARAMETERFVPROC) (GLenum target, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETCOLORTABLEPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLCOLORSUBTABLEPROC) (GLenum target, GLsizei start, GLsizei count, GLenum format, GLenum type, const GLvoid *data);
typedef void (APIENTRYP PFNGLCOPYCOLORSUBTABLEPROC) (GLenum target, GLsizei start, GLint x, GLint y, GLsizei width);
typedef void (APIENTRYP PFNGLCONVOLUTIONFILTER1DPROC) (GLenum target, GLenum internalformat, GLsizei width, GLenum format, GLenum type, const GLvoid *image);
typedef void (APIENTRYP PFNGLCONVOLUTIONFILTER2DPROC) (GLenum target, GLenum internalformat, GLsizei width, GLsizei height, GLenum format, GLenum type, const GLvoid *image);
typedef void (APIENTRYP PFNGLCONVOLUTIONPARAMETERFPROC) (GLenum target, GLenum pname, GLfloat params);
typedef void (APIENTRYP PFNGLCONVOLUTIONPARAMETERFVPROC) (GLenum target, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLCONVOLUTIONPARAMETERIPROC) (GLenum target, GLenum pname, GLint params);
typedef void (APIENTRYP PFNGLCONVOLUTIONPARAMETERIVPROC) (GLenum target, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLCOPYCONVOLUTIONFILTER1DPROC) (GLenum target, GLenum internalformat, GLint x, GLint y, GLsizei width);
typedef void (APIENTRYP PFNGLCOPYCONVOLUTIONFILTER2DPROC) (GLenum target, GLenum internalformat, GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (APIENTRYP PFNGLGETCONVOLUTIONFILTERPROC) (GLenum target, GLenum format, GLenum type, GLvoid *image);
typedef void (APIENTRYP PFNGLGETCONVOLUTIONPARAMETERFVPROC) (GLenum target, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETCONVOLUTIONPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETSEPARABLEFILTERPROC) (GLenum target, GLenum format, GLenum type, GLvoid *row, GLvoid *column, GLvoid *span);
typedef void (APIENTRYP PFNGLSEPARABLEFILTER2DPROC) (GLenum target, GLenum internalformat, GLsizei width, GLsizei height, GLenum format, GLenum type, const GLvoid *row, const GLvoid *column);
typedef void (APIENTRYP PFNGLGETHISTOGRAMPROC) (GLenum target, GLboolean reset, GLenum format, GLenum type, GLvoid *values);
typedef void (APIENTRYP PFNGLGETHISTOGRAMPARAMETERFVPROC) (GLenum target, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETHISTOGRAMPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETMINMAXPROC) (GLenum target, GLboolean reset, GLenum format, GLenum type, GLvoid *values);
typedef void (APIENTRYP PFNGLGETMINMAXPARAMETERFVPROC) (GLenum target, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETMINMAXPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLHISTOGRAMPROC) (GLenum target, GLsizei width, GLenum internalformat, GLboolean sink);
typedef void (APIENTRYP PFNGLMINMAXPROC) (GLenum target, GLenum internalformat, GLboolean sink);
typedef void (APIENTRYP PFNGLRESETHISTOGRAMPROC) (GLenum target);
typedef void (APIENTRYP PFNGLRESETMINMAXPROC) (GLenum target);
#endif

#ifndef GL_VERSION_1_3
#define GL_VERSION_1_3 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glActiveTexture (GLenum);
GLAPI void APIENTRY glSampleCoverage (GLclampf, GLboolean);
GLAPI void APIENTRY glCompressedTexImage3D (GLenum, GLint, GLenum, GLsizei, GLsizei, GLsizei, GLint, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedTexImage2D (GLenum, GLint, GLenum, GLsizei, GLsizei, GLint, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedTexImage1D (GLenum, GLint, GLenum, GLsizei, GLint, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedTexSubImage3D (GLenum, GLint, GLint, GLint, GLint, GLsizei, GLsizei, GLsizei, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedTexSubImage2D (GLenum, GLint, GLint, GLint, GLsizei, GLsizei, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedTexSubImage1D (GLenum, GLint, GLint, GLsizei, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glGetCompressedTexImage (GLenum, GLint, GLvoid *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLACTIVETEXTUREPROC) (GLenum texture);
typedef void (APIENTRYP PFNGLSAMPLECOVERAGEPROC) (GLclampf value, GLboolean invert);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXIMAGE3DPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLsizei imageSize, const GLvoid *data);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXIMAGE2DPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLsizei imageSize, const GLvoid *data);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXIMAGE1DPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLint border, GLsizei imageSize, const GLvoid *data);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLsizei imageSize, const GLvoid *data);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE2DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLsizei imageSize, const GLvoid *data);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE1DPROC) (GLenum target, GLint level, GLint xoffset, GLsizei width, GLenum format, GLsizei imageSize, const GLvoid *data);
typedef void (APIENTRYP PFNGLGETCOMPRESSEDTEXIMAGEPROC) (GLenum target, GLint level, GLvoid *img);
#endif

#ifndef GL_VERSION_1_3_DEPRECATED
#define GL_VERSION_1_3_DEPRECATED 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glClientActiveTexture (GLenum);
GLAPI void APIENTRY glMultiTexCoord1d (GLenum, GLdouble);
GLAPI void APIENTRY glMultiTexCoord1dv (GLenum, const GLdouble *);
GLAPI void APIENTRY glMultiTexCoord1f (GLenum, GLfloat);
GLAPI void APIENTRY glMultiTexCoord1fv (GLenum, const GLfloat *);
GLAPI void APIENTRY glMultiTexCoord1i (GLenum, GLint);
GLAPI void APIENTRY glMultiTexCoord1iv (GLenum, const GLint *);
GLAPI void APIENTRY glMultiTexCoord1s (GLenum, GLshort);
GLAPI void APIENTRY glMultiTexCoord1sv (GLenum, const GLshort *);
GLAPI void APIENTRY glMultiTexCoord2d (GLenum, GLdouble, GLdouble);
GLAPI void APIENTRY glMultiTexCoord2dv (GLenum, const GLdouble *);
GLAPI void APIENTRY glMultiTexCoord2f (GLenum, GLfloat, GLfloat);
GLAPI void APIENTRY glMultiTexCoord2fv (GLenum, const GLfloat *);
GLAPI void APIENTRY glMultiTexCoord2i (GLenum, GLint, GLint);
GLAPI void APIENTRY glMultiTexCoord2iv (GLenum, const GLint *);
GLAPI void APIENTRY glMultiTexCoord2s (GLenum, GLshort, GLshort);
GLAPI void APIENTRY glMultiTexCoord2sv (GLenum, const GLshort *);
GLAPI void APIENTRY glMultiTexCoord3d (GLenum, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glMultiTexCoord3dv (GLenum, const GLdouble *);
GLAPI void APIENTRY glMultiTexCoord3f (GLenum, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glMultiTexCoord3fv (GLenum, const GLfloat *);
GLAPI void APIENTRY glMultiTexCoord3i (GLenum, GLint, GLint, GLint);
GLAPI void APIENTRY glMultiTexCoord3iv (GLenum, const GLint *);
GLAPI void APIENTRY glMultiTexCoord3s (GLenum, GLshort, GLshort, GLshort);
GLAPI void APIENTRY glMultiTexCoord3sv (GLenum, const GLshort *);
GLAPI void APIENTRY glMultiTexCoord4d (GLenum, GLdouble, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glMultiTexCoord4dv (GLenum, const GLdouble *);
GLAPI void APIENTRY glMultiTexCoord4f (GLenum, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glMultiTexCoord4fv (GLenum, const GLfloat *);
GLAPI void APIENTRY glMultiTexCoord4i (GLenum, GLint, GLint, GLint, GLint);
GLAPI void APIENTRY glMultiTexCoord4iv (GLenum, const GLint *);
GLAPI void APIENTRY glMultiTexCoord4s (GLenum, GLshort, GLshort, GLshort, GLshort);
GLAPI void APIENTRY glMultiTexCoord4sv (GLenum, const GLshort *);
GLAPI void APIENTRY glLoadTransposeMatrixf (const GLfloat *);
GLAPI void APIENTRY glLoadTransposeMatrixd (const GLdouble *);
GLAPI void APIENTRY glMultTransposeMatrixf (const GLfloat *);
GLAPI void APIENTRY glMultTransposeMatrixd (const GLdouble *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef
void (APIENTRYP PFNGLCLIENTACTIVETEXTUREPROC) (GLenum texture); typedef void (APIENTRYP PFNGLMULTITEXCOORD1DPROC) (GLenum target, GLdouble s); typedef void (APIENTRYP PFNGLMULTITEXCOORD1DVPROC) (GLenum target, const GLdouble *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD1FPROC) (GLenum target, GLfloat s); typedef void (APIENTRYP PFNGLMULTITEXCOORD1FVPROC) (GLenum target, const GLfloat *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD1IPROC) (GLenum target, GLint s); typedef void (APIENTRYP PFNGLMULTITEXCOORD1IVPROC) (GLenum target, const GLint *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD1SPROC) (GLenum target, GLshort s); typedef void (APIENTRYP PFNGLMULTITEXCOORD1SVPROC) (GLenum target, const GLshort *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD2DPROC) (GLenum target, GLdouble s, GLdouble t); typedef void (APIENTRYP PFNGLMULTITEXCOORD2DVPROC) (GLenum target, const GLdouble *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD2FPROC) (GLenum target, GLfloat s, GLfloat t); typedef void (APIENTRYP PFNGLMULTITEXCOORD2FVPROC) (GLenum target, const GLfloat *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD2IPROC) (GLenum target, GLint s, GLint t); typedef void (APIENTRYP PFNGLMULTITEXCOORD2IVPROC) (GLenum target, const GLint *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD2SPROC) (GLenum target, GLshort s, GLshort t); typedef void (APIENTRYP PFNGLMULTITEXCOORD2SVPROC) (GLenum target, const GLshort *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD3DPROC) (GLenum target, GLdouble s, GLdouble t, GLdouble r); typedef void (APIENTRYP PFNGLMULTITEXCOORD3DVPROC) (GLenum target, const GLdouble *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD3FPROC) (GLenum target, GLfloat s, GLfloat t, GLfloat r); typedef void (APIENTRYP PFNGLMULTITEXCOORD3FVPROC) (GLenum target, const GLfloat *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD3IPROC) (GLenum target, GLint s, GLint t, GLint r); typedef void (APIENTRYP PFNGLMULTITEXCOORD3IVPROC) (GLenum target, const GLint *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD3SPROC) (GLenum target, GLshort s, GLshort t, GLshort r); typedef void (APIENTRYP PFNGLMULTITEXCOORD3SVPROC) (GLenum target, const GLshort *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD4DPROC) (GLenum target, GLdouble s, GLdouble t, GLdouble r, GLdouble q); typedef void (APIENTRYP PFNGLMULTITEXCOORD4DVPROC) (GLenum target, const GLdouble *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD4FPROC) (GLenum target, GLfloat s, GLfloat t, GLfloat r, GLfloat q); typedef void (APIENTRYP PFNGLMULTITEXCOORD4FVPROC) (GLenum target, const GLfloat *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD4IPROC) (GLenum target, GLint s, GLint t, GLint r, GLint q); typedef void (APIENTRYP PFNGLMULTITEXCOORD4IVPROC) (GLenum target, const GLint *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD4SPROC) (GLenum target, GLshort s, GLshort t, GLshort r, GLshort q); typedef void (APIENTRYP PFNGLMULTITEXCOORD4SVPROC) (GLenum target, const GLshort *v); typedef void (APIENTRYP PFNGLLOADTRANSPOSEMATRIXFPROC) (const GLfloat *m); typedef void (APIENTRYP PFNGLLOADTRANSPOSEMATRIXDPROC) (const GLdouble *m); typedef void (APIENTRYP PFNGLMULTTRANSPOSEMATRIXFPROC) (const GLfloat *m); typedef void (APIENTRYP PFNGLMULTTRANSPOSEMATRIXDPROC) (const GLdouble *m); #endif #ifndef GL_VERSION_1_4 #define GL_VERSION_1_4 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glBlendFuncSeparate (GLenum, GLenum, GLenum, GLenum); GLAPI void APIENTRY glMultiDrawArrays (GLenum, GLint *, GLsizei *, GLsizei); GLAPI void APIENTRY glMultiDrawElements (GLenum, const GLsizei *, GLenum, const 
GLvoid* *, GLsizei); GLAPI void APIENTRY glPointParameterf (GLenum, GLfloat); GLAPI void APIENTRY glPointParameterfv (GLenum, const GLfloat *); GLAPI void APIENTRY glPointParameteri (GLenum, GLint); GLAPI void APIENTRY glPointParameteriv (GLenum, const GLint *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLBLENDFUNCSEPARATEPROC) (GLenum sfactorRGB, GLenum dfactorRGB, GLenum sfactorAlpha, GLenum dfactorAlpha); typedef void (APIENTRYP PFNGLMULTIDRAWARRAYSPROC) (GLenum mode, GLint *first, GLsizei *count, GLsizei primcount); typedef void (APIENTRYP PFNGLMULTIDRAWELEMENTSPROC) (GLenum mode, const GLsizei *count, GLenum type, const GLvoid* *indices, GLsizei primcount); typedef void (APIENTRYP PFNGLPOINTPARAMETERFPROC) (GLenum pname, GLfloat param); typedef void (APIENTRYP PFNGLPOINTPARAMETERFVPROC) (GLenum pname, const GLfloat *params); typedef void (APIENTRYP PFNGLPOINTPARAMETERIPROC) (GLenum pname, GLint param); typedef void (APIENTRYP PFNGLPOINTPARAMETERIVPROC) (GLenum pname, const GLint *params); #endif #ifndef GL_VERSION_1_4_DEPRECATED #define GL_VERSION_1_4_DEPRECATED 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glFogCoordf (GLfloat); GLAPI void APIENTRY glFogCoordfv (const GLfloat *); GLAPI void APIENTRY glFogCoordd (GLdouble); GLAPI void APIENTRY glFogCoorddv (const GLdouble *); GLAPI void APIENTRY glFogCoordPointer (GLenum, GLsizei, const GLvoid *); GLAPI void APIENTRY glSecondaryColor3b (GLbyte, GLbyte, GLbyte); GLAPI void APIENTRY glSecondaryColor3bv (const GLbyte *); GLAPI void APIENTRY glSecondaryColor3d (GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glSecondaryColor3dv (const GLdouble *); GLAPI void APIENTRY glSecondaryColor3f (GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glSecondaryColor3fv (const GLfloat *); GLAPI void APIENTRY glSecondaryColor3i (GLint, GLint, GLint); GLAPI void APIENTRY glSecondaryColor3iv (const GLint *); GLAPI void APIENTRY glSecondaryColor3s (GLshort, GLshort, GLshort); GLAPI void APIENTRY glSecondaryColor3sv (const GLshort *); GLAPI void APIENTRY glSecondaryColor3ub (GLubyte, GLubyte, GLubyte); GLAPI void APIENTRY glSecondaryColor3ubv (const GLubyte *); GLAPI void APIENTRY glSecondaryColor3ui (GLuint, GLuint, GLuint); GLAPI void APIENTRY glSecondaryColor3uiv (const GLuint *); GLAPI void APIENTRY glSecondaryColor3us (GLushort, GLushort, GLushort); GLAPI void APIENTRY glSecondaryColor3usv (const GLushort *); GLAPI void APIENTRY glSecondaryColorPointer (GLint, GLenum, GLsizei, const GLvoid *); GLAPI void APIENTRY glWindowPos2d (GLdouble, GLdouble); GLAPI void APIENTRY glWindowPos2dv (const GLdouble *); GLAPI void APIENTRY glWindowPos2f (GLfloat, GLfloat); GLAPI void APIENTRY glWindowPos2fv (const GLfloat *); GLAPI void APIENTRY glWindowPos2i (GLint, GLint); GLAPI void APIENTRY glWindowPos2iv (const GLint *); GLAPI void APIENTRY glWindowPos2s (GLshort, GLshort); GLAPI void APIENTRY glWindowPos2sv (const GLshort *); GLAPI void APIENTRY glWindowPos3d (GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glWindowPos3dv (const GLdouble *); GLAPI void APIENTRY glWindowPos3f (GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glWindowPos3fv (const GLfloat *); GLAPI void APIENTRY glWindowPos3i (GLint, GLint, GLint); GLAPI void APIENTRY glWindowPos3iv (const GLint *); GLAPI void APIENTRY glWindowPos3s (GLshort, GLshort, GLshort); GLAPI void APIENTRY glWindowPos3sv (const GLshort *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLFOGCOORDFPROC) (GLfloat coord); typedef void (APIENTRYP PFNGLFOGCOORDFVPROC) (const GLfloat *coord); 
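/*
 * Editorial usage sketch (not part of the upstream glext.h): the PFNGL*PROC
 * typedefs in this header exist so that applications can fetch entry points
 * at run time instead of relying on the GL_GLEXT_PROTOTYPES link-time
 * prototypes. A minimal example, assuming a GLX platform where
 * glXGetProcAddress is available (Windows builds would use
 * wglGetProcAddress instead):
 *
 *   PFNGLFOGCOORDFPROC pglFogCoordf = (PFNGLFOGCOORDFPROC)
 *       glXGetProcAddress((const GLubyte *)"glFogCoordf");
 *   if (pglFogCoordf != NULL)
 *       pglFogCoordf(0.5f);
 *
 * A NULL result means the running implementation does not export the entry
 * point, so the pointer must be checked before first use.
 */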
typedef void (APIENTRYP PFNGLFOGCOORDDPROC) (GLdouble coord); typedef void (APIENTRYP PFNGLFOGCOORDDVPROC) (const GLdouble *coord); typedef void (APIENTRYP PFNGLFOGCOORDPOINTERPROC) (GLenum type, GLsizei stride, const GLvoid *pointer); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3BPROC) (GLbyte red, GLbyte green, GLbyte blue); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3BVPROC) (const GLbyte *v); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3DPROC) (GLdouble red, GLdouble green, GLdouble blue); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3DVPROC) (const GLdouble *v); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3FPROC) (GLfloat red, GLfloat green, GLfloat blue); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3FVPROC) (const GLfloat *v); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3IPROC) (GLint red, GLint green, GLint blue); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3IVPROC) (const GLint *v); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3SPROC) (GLshort red, GLshort green, GLshort blue); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3SVPROC) (const GLshort *v); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UBPROC) (GLubyte red, GLubyte green, GLubyte blue); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UBVPROC) (const GLubyte *v); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UIPROC) (GLuint red, GLuint green, GLuint blue); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UIVPROC) (const GLuint *v); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3USPROC) (GLushort red, GLushort green, GLushort blue); typedef void (APIENTRYP PFNGLSECONDARYCOLOR3USVPROC) (const GLushort *v); typedef void (APIENTRYP PFNGLSECONDARYCOLORPOINTERPROC) (GLint size, GLenum type, GLsizei stride, const GLvoid *pointer); typedef void (APIENTRYP PFNGLWINDOWPOS2DPROC) (GLdouble x, GLdouble y); typedef void (APIENTRYP PFNGLWINDOWPOS2DVPROC) (const GLdouble *v); typedef void (APIENTRYP PFNGLWINDOWPOS2FPROC) (GLfloat x, GLfloat y); typedef void (APIENTRYP PFNGLWINDOWPOS2FVPROC) (const GLfloat *v); typedef void (APIENTRYP PFNGLWINDOWPOS2IPROC) (GLint x, GLint y); typedef void (APIENTRYP PFNGLWINDOWPOS2IVPROC) (const GLint *v); typedef void (APIENTRYP PFNGLWINDOWPOS2SPROC) (GLshort x, GLshort y); typedef void (APIENTRYP PFNGLWINDOWPOS2SVPROC) (const GLshort *v); typedef void (APIENTRYP PFNGLWINDOWPOS3DPROC) (GLdouble x, GLdouble y, GLdouble z); typedef void (APIENTRYP PFNGLWINDOWPOS3DVPROC) (const GLdouble *v); typedef void (APIENTRYP PFNGLWINDOWPOS3FPROC) (GLfloat x, GLfloat y, GLfloat z); typedef void (APIENTRYP PFNGLWINDOWPOS3FVPROC) (const GLfloat *v); typedef void (APIENTRYP PFNGLWINDOWPOS3IPROC) (GLint x, GLint y, GLint z); typedef void (APIENTRYP PFNGLWINDOWPOS3IVPROC) (const GLint *v); typedef void (APIENTRYP PFNGLWINDOWPOS3SPROC) (GLshort x, GLshort y, GLshort z); typedef void (APIENTRYP PFNGLWINDOWPOS3SVPROC) (const GLshort *v); #endif #ifndef GL_VERSION_1_5 #define GL_VERSION_1_5 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glGenQueries (GLsizei, GLuint *); GLAPI void APIENTRY glDeleteQueries (GLsizei, const GLuint *); GLAPI GLboolean APIENTRY glIsQuery (GLuint); GLAPI void APIENTRY glBeginQuery (GLenum, GLuint); GLAPI void APIENTRY glEndQuery (GLenum); GLAPI void APIENTRY glGetQueryiv (GLenum, GLenum, GLint *); GLAPI void APIENTRY glGetQueryObjectiv (GLuint, GLenum, GLint *); GLAPI void APIENTRY glGetQueryObjectuiv (GLuint, GLenum, GLuint *); GLAPI void APIENTRY glBindBuffer (GLenum, GLuint); GLAPI void APIENTRY glDeleteBuffers (GLsizei, const GLuint *); GLAPI void APIENTRY glGenBuffers (GLsizei, GLuint *); GLAPI GLboolean APIENTRY 
glIsBuffer (GLuint); GLAPI void APIENTRY glBufferData (GLenum, GLsizeiptr, const GLvoid *, GLenum); GLAPI void APIENTRY glBufferSubData (GLenum, GLintptr, GLsizeiptr, const GLvoid *); GLAPI void APIENTRY glGetBufferSubData (GLenum, GLintptr, GLsizeiptr, GLvoid *); GLAPI GLvoid* APIENTRY glMapBuffer (GLenum, GLenum); GLAPI GLboolean APIENTRY glUnmapBuffer (GLenum); GLAPI void APIENTRY glGetBufferParameteriv (GLenum, GLenum, GLint *); GLAPI void APIENTRY glGetBufferPointerv (GLenum, GLenum, GLvoid* *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLGENQUERIESPROC) (GLsizei n, GLuint *ids); typedef void (APIENTRYP PFNGLDELETEQUERIESPROC) (GLsizei n, const GLuint *ids); typedef GLboolean (APIENTRYP PFNGLISQUERYPROC) (GLuint id); typedef void (APIENTRYP PFNGLBEGINQUERYPROC) (GLenum target, GLuint id); typedef void (APIENTRYP PFNGLENDQUERYPROC) (GLenum target); typedef void (APIENTRYP PFNGLGETQUERYIVPROC) (GLenum target, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETQUERYOBJECTIVPROC) (GLuint id, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETQUERYOBJECTUIVPROC) (GLuint id, GLenum pname, GLuint *params); typedef void (APIENTRYP PFNGLBINDBUFFERPROC) (GLenum target, GLuint buffer); typedef void (APIENTRYP PFNGLDELETEBUFFERSPROC) (GLsizei n, const GLuint *buffers); typedef void (APIENTRYP PFNGLGENBUFFERSPROC) (GLsizei n, GLuint *buffers); typedef GLboolean (APIENTRYP PFNGLISBUFFERPROC) (GLuint buffer); typedef void (APIENTRYP PFNGLBUFFERDATAPROC) (GLenum target, GLsizeiptr size, const GLvoid *data, GLenum usage); typedef void (APIENTRYP PFNGLBUFFERSUBDATAPROC) (GLenum target, GLintptr offset, GLsizeiptr size, const GLvoid *data); typedef void (APIENTRYP PFNGLGETBUFFERSUBDATAPROC) (GLenum target, GLintptr offset, GLsizeiptr size, GLvoid *data); typedef GLvoid* (APIENTRYP PFNGLMAPBUFFERPROC) (GLenum target, GLenum access); typedef GLboolean (APIENTRYP PFNGLUNMAPBUFFERPROC) (GLenum target); typedef void (APIENTRYP PFNGLGETBUFFERPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETBUFFERPOINTERVPROC) (GLenum target, GLenum pname, GLvoid* *params); #endif #ifndef GL_VERSION_2_0 #define GL_VERSION_2_0 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glBlendEquationSeparate (GLenum, GLenum); GLAPI void APIENTRY glDrawBuffers (GLsizei, const GLenum *); GLAPI void APIENTRY glStencilOpSeparate (GLenum, GLenum, GLenum, GLenum); GLAPI void APIENTRY glStencilFuncSeparate (GLenum, GLenum, GLint, GLuint); GLAPI void APIENTRY glStencilMaskSeparate (GLenum, GLuint); GLAPI void APIENTRY glAttachShader (GLuint, GLuint); GLAPI void APIENTRY glBindAttribLocation (GLuint, GLuint, const GLchar *); GLAPI void APIENTRY glCompileShader (GLuint); GLAPI GLuint APIENTRY glCreateProgram (void); GLAPI GLuint APIENTRY glCreateShader (GLenum); GLAPI void APIENTRY glDeleteProgram (GLuint); GLAPI void APIENTRY glDeleteShader (GLuint); GLAPI void APIENTRY glDetachShader (GLuint, GLuint); GLAPI void APIENTRY glDisableVertexAttribArray (GLuint); GLAPI void APIENTRY glEnableVertexAttribArray (GLuint); GLAPI void APIENTRY glGetActiveAttrib (GLuint, GLuint, GLsizei, GLsizei *, GLint *, GLenum *, GLchar *); GLAPI void APIENTRY glGetActiveUniform (GLuint, GLuint, GLsizei, GLsizei *, GLint *, GLenum *, GLchar *); GLAPI void APIENTRY glGetAttachedShaders (GLuint, GLsizei, GLsizei *, GLuint *); GLAPI GLint APIENTRY glGetAttribLocation (GLuint, const GLchar *); GLAPI void APIENTRY glGetProgramiv (GLuint, GLenum, GLint *); GLAPI void APIENTRY 
glGetProgramInfoLog (GLuint, GLsizei, GLsizei *, GLchar *); GLAPI void APIENTRY glGetShaderiv (GLuint, GLenum, GLint *); GLAPI void APIENTRY glGetShaderInfoLog (GLuint, GLsizei, GLsizei *, GLchar *); GLAPI void APIENTRY glGetShaderSource (GLuint, GLsizei, GLsizei *, GLchar *); GLAPI GLint APIENTRY glGetUniformLocation (GLuint, const GLchar *); GLAPI void APIENTRY glGetUniformfv (GLuint, GLint, GLfloat *); GLAPI void APIENTRY glGetUniformiv (GLuint, GLint, GLint *); GLAPI void APIENTRY glGetVertexAttribdv (GLuint, GLenum, GLdouble *); GLAPI void APIENTRY glGetVertexAttribfv (GLuint, GLenum, GLfloat *); GLAPI void APIENTRY glGetVertexAttribiv (GLuint, GLenum, GLint *); GLAPI void APIENTRY glGetVertexAttribPointerv (GLuint, GLenum, GLvoid* *); GLAPI GLboolean APIENTRY glIsProgram (GLuint); GLAPI GLboolean APIENTRY glIsShader (GLuint); GLAPI void APIENTRY glLinkProgram (GLuint); GLAPI void APIENTRY glShaderSource (GLuint, GLsizei, const GLchar* *, const GLint *); GLAPI void APIENTRY glUseProgram (GLuint); GLAPI void APIENTRY glUniform1f (GLint, GLfloat); GLAPI void APIENTRY glUniform2f (GLint, GLfloat, GLfloat); GLAPI void APIENTRY glUniform3f (GLint, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glUniform4f (GLint, GLfloat, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glUniform1i (GLint, GLint); GLAPI void APIENTRY glUniform2i (GLint, GLint, GLint); GLAPI void APIENTRY glUniform3i (GLint, GLint, GLint, GLint); GLAPI void APIENTRY glUniform4i (GLint, GLint, GLint, GLint, GLint); GLAPI void APIENTRY glUniform1fv (GLint, GLsizei, const GLfloat *); GLAPI void APIENTRY glUniform2fv (GLint, GLsizei, const GLfloat *); GLAPI void APIENTRY glUniform3fv (GLint, GLsizei, const GLfloat *); GLAPI void APIENTRY glUniform4fv (GLint, GLsizei, const GLfloat *); GLAPI void APIENTRY glUniform1iv (GLint, GLsizei, const GLint *); GLAPI void APIENTRY glUniform2iv (GLint, GLsizei, const GLint *); GLAPI void APIENTRY glUniform3iv (GLint, GLsizei, const GLint *); GLAPI void APIENTRY glUniform4iv (GLint, GLsizei, const GLint *); GLAPI void APIENTRY glUniformMatrix2fv (GLint, GLsizei, GLboolean, const GLfloat *); GLAPI void APIENTRY glUniformMatrix3fv (GLint, GLsizei, GLboolean, const GLfloat *); GLAPI void APIENTRY glUniformMatrix4fv (GLint, GLsizei, GLboolean, const GLfloat *); GLAPI void APIENTRY glValidateProgram (GLuint); GLAPI void APIENTRY glVertexAttrib1d (GLuint, GLdouble); GLAPI void APIENTRY glVertexAttrib1dv (GLuint, const GLdouble *); GLAPI void APIENTRY glVertexAttrib1f (GLuint, GLfloat); GLAPI void APIENTRY glVertexAttrib1fv (GLuint, const GLfloat *); GLAPI void APIENTRY glVertexAttrib1s (GLuint, GLshort); GLAPI void APIENTRY glVertexAttrib1sv (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib2d (GLuint, GLdouble, GLdouble); GLAPI void APIENTRY glVertexAttrib2dv (GLuint, const GLdouble *); GLAPI void APIENTRY glVertexAttrib2f (GLuint, GLfloat, GLfloat); GLAPI void APIENTRY glVertexAttrib2fv (GLuint, const GLfloat *); GLAPI void APIENTRY glVertexAttrib2s (GLuint, GLshort, GLshort); GLAPI void APIENTRY glVertexAttrib2sv (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib3d (GLuint, GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glVertexAttrib3dv (GLuint, const GLdouble *); GLAPI void APIENTRY glVertexAttrib3f (GLuint, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glVertexAttrib3fv (GLuint, const GLfloat *); GLAPI void APIENTRY glVertexAttrib3s (GLuint, GLshort, GLshort, GLshort); GLAPI void APIENTRY glVertexAttrib3sv (GLuint, const GLshort *); GLAPI void APIENTRY 
glVertexAttrib4Nbv (GLuint, const GLbyte *); GLAPI void APIENTRY glVertexAttrib4Niv (GLuint, const GLint *); GLAPI void APIENTRY glVertexAttrib4Nsv (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib4Nub (GLuint, GLubyte, GLubyte, GLubyte, GLubyte); GLAPI void APIENTRY glVertexAttrib4Nubv (GLuint, const GLubyte *); GLAPI void APIENTRY glVertexAttrib4Nuiv (GLuint, const GLuint *); GLAPI void APIENTRY glVertexAttrib4Nusv (GLuint, const GLushort *); GLAPI void APIENTRY glVertexAttrib4bv (GLuint, const GLbyte *); GLAPI void APIENTRY glVertexAttrib4d (GLuint, GLdouble, GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glVertexAttrib4dv (GLuint, const GLdouble *); GLAPI void APIENTRY glVertexAttrib4f (GLuint, GLfloat, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glVertexAttrib4fv (GLuint, const GLfloat *); GLAPI void APIENTRY glVertexAttrib4iv (GLuint, const GLint *); GLAPI void APIENTRY glVertexAttrib4s (GLuint, GLshort, GLshort, GLshort, GLshort); GLAPI void APIENTRY glVertexAttrib4sv (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib4ubv (GLuint, const GLubyte *); GLAPI void APIENTRY glVertexAttrib4uiv (GLuint, const GLuint *); GLAPI void APIENTRY glVertexAttrib4usv (GLuint, const GLushort *); GLAPI void APIENTRY glVertexAttribPointer (GLuint, GLint, GLenum, GLboolean, GLsizei, const GLvoid *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLBLENDEQUATIONSEPARATEPROC) (GLenum modeRGB, GLenum modeAlpha); typedef void (APIENTRYP PFNGLDRAWBUFFERSPROC) (GLsizei n, const GLenum *bufs); typedef void (APIENTRYP PFNGLSTENCILOPSEPARATEPROC) (GLenum face, GLenum sfail, GLenum dpfail, GLenum dppass); typedef void (APIENTRYP PFNGLSTENCILFUNCSEPARATEPROC) (GLenum frontfunc, GLenum backfunc, GLint ref, GLuint mask); typedef void (APIENTRYP PFNGLSTENCILMASKSEPARATEPROC) (GLenum face, GLuint mask); typedef void (APIENTRYP PFNGLATTACHSHADERPROC) (GLuint program, GLuint shader); typedef void (APIENTRYP PFNGLBINDATTRIBLOCATIONPROC) (GLuint program, GLuint index, const GLchar *name); typedef void (APIENTRYP PFNGLCOMPILESHADERPROC) (GLuint shader); typedef GLuint (APIENTRYP PFNGLCREATEPROGRAMPROC) (void); typedef GLuint (APIENTRYP PFNGLCREATESHADERPROC) (GLenum type); typedef void (APIENTRYP PFNGLDELETEPROGRAMPROC) (GLuint program); typedef void (APIENTRYP PFNGLDELETESHADERPROC) (GLuint shader); typedef void (APIENTRYP PFNGLDETACHSHADERPROC) (GLuint program, GLuint shader); typedef void (APIENTRYP PFNGLDISABLEVERTEXATTRIBARRAYPROC) (GLuint index); typedef void (APIENTRYP PFNGLENABLEVERTEXATTRIBARRAYPROC) (GLuint index); typedef void (APIENTRYP PFNGLGETACTIVEATTRIBPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLint *size, GLenum *type, GLchar *name); typedef void (APIENTRYP PFNGLGETACTIVEUNIFORMPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLint *size, GLenum *type, GLchar *name); typedef void (APIENTRYP PFNGLGETATTACHEDSHADERSPROC) (GLuint program, GLsizei maxCount, GLsizei *count, GLuint *obj); typedef GLint (APIENTRYP PFNGLGETATTRIBLOCATIONPROC) (GLuint program, const GLchar *name); typedef void (APIENTRYP PFNGLGETPROGRAMIVPROC) (GLuint program, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETPROGRAMINFOLOGPROC) (GLuint program, GLsizei bufSize, GLsizei *length, GLchar *infoLog); typedef void (APIENTRYP PFNGLGETSHADERIVPROC) (GLuint shader, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETSHADERINFOLOGPROC) (GLuint shader, GLsizei bufSize, GLsizei *length, GLchar *infoLog); typedef void (APIENTRYP 
PFNGLGETSHADERSOURCEPROC) (GLuint shader, GLsizei bufSize, GLsizei *length, GLchar *source); typedef GLint (APIENTRYP PFNGLGETUNIFORMLOCATIONPROC) (GLuint program, const GLchar *name); typedef void (APIENTRYP PFNGLGETUNIFORMFVPROC) (GLuint program, GLint location, GLfloat *params); typedef void (APIENTRYP PFNGLGETUNIFORMIVPROC) (GLuint program, GLint location, GLint *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBDVPROC) (GLuint index, GLenum pname, GLdouble *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBFVPROC) (GLuint index, GLenum pname, GLfloat *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBIVPROC) (GLuint index, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBPOINTERVPROC) (GLuint index, GLenum pname, GLvoid* *pointer); typedef GLboolean (APIENTRYP PFNGLISPROGRAMPROC) (GLuint program); typedef GLboolean (APIENTRYP PFNGLISSHADERPROC) (GLuint shader); typedef void (APIENTRYP PFNGLLINKPROGRAMPROC) (GLuint program); typedef void (APIENTRYP PFNGLSHADERSOURCEPROC) (GLuint shader, GLsizei count, const GLchar* *string, const GLint *length); typedef void (APIENTRYP PFNGLUSEPROGRAMPROC) (GLuint program); typedef void (APIENTRYP PFNGLUNIFORM1FPROC) (GLint location, GLfloat v0); typedef void (APIENTRYP PFNGLUNIFORM2FPROC) (GLint location, GLfloat v0, GLfloat v1); typedef void (APIENTRYP PFNGLUNIFORM3FPROC) (GLint location, GLfloat v0, GLfloat v1, GLfloat v2); typedef void (APIENTRYP PFNGLUNIFORM4FPROC) (GLint location, GLfloat v0, GLfloat v1, GLfloat v2, GLfloat v3); typedef void (APIENTRYP PFNGLUNIFORM1IPROC) (GLint location, GLint v0); typedef void (APIENTRYP PFNGLUNIFORM2IPROC) (GLint location, GLint v0, GLint v1); typedef void (APIENTRYP PFNGLUNIFORM3IPROC) (GLint location, GLint v0, GLint v1, GLint v2); typedef void (APIENTRYP PFNGLUNIFORM4IPROC) (GLint location, GLint v0, GLint v1, GLint v2, GLint v3); typedef void (APIENTRYP PFNGLUNIFORM1FVPROC) (GLint location, GLsizei count, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORM2FVPROC) (GLint location, GLsizei count, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORM3FVPROC) (GLint location, GLsizei count, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORM4FVPROC) (GLint location, GLsizei count, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORM1IVPROC) (GLint location, GLsizei count, const GLint *value); typedef void (APIENTRYP PFNGLUNIFORM2IVPROC) (GLint location, GLsizei count, const GLint *value); typedef void (APIENTRYP PFNGLUNIFORM3IVPROC) (GLint location, GLsizei count, const GLint *value); typedef void (APIENTRYP PFNGLUNIFORM4IVPROC) (GLint location, GLsizei count, const GLint *value); typedef void (APIENTRYP PFNGLUNIFORMMATRIX2FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORMMATRIX3FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORMMATRIX4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); typedef void (APIENTRYP PFNGLVALIDATEPROGRAMPROC) (GLuint program); typedef void (APIENTRYP PFNGLVERTEXATTRIB1DPROC) (GLuint index, GLdouble x); typedef void (APIENTRYP PFNGLVERTEXATTRIB1DVPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB1FPROC) (GLuint index, GLfloat x); typedef void (APIENTRYP PFNGLVERTEXATTRIB1FVPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB1SPROC) (GLuint index, GLshort x); typedef void (APIENTRYP 
PFNGLVERTEXATTRIB1SVPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB2DPROC) (GLuint index, GLdouble x, GLdouble y); typedef void (APIENTRYP PFNGLVERTEXATTRIB2DVPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB2FPROC) (GLuint index, GLfloat x, GLfloat y); typedef void (APIENTRYP PFNGLVERTEXATTRIB2FVPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB2SPROC) (GLuint index, GLshort x, GLshort y); typedef void (APIENTRYP PFNGLVERTEXATTRIB2SVPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB3DPROC) (GLuint index, GLdouble x, GLdouble y, GLdouble z); typedef void (APIENTRYP PFNGLVERTEXATTRIB3DVPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB3FPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z); typedef void (APIENTRYP PFNGLVERTEXATTRIB3FVPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB3SPROC) (GLuint index, GLshort x, GLshort y, GLshort z); typedef void (APIENTRYP PFNGLVERTEXATTRIB3SVPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NBVPROC) (GLuint index, const GLbyte *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NIVPROC) (GLuint index, const GLint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NSVPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NUBPROC) (GLuint index, GLubyte x, GLubyte y, GLubyte z, GLubyte w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NUBVPROC) (GLuint index, const GLubyte *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NUIVPROC) (GLuint index, const GLuint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NUSVPROC) (GLuint index, const GLushort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4BVPROC) (GLuint index, const GLbyte *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4DPROC) (GLuint index, GLdouble x, GLdouble y, GLdouble z, GLdouble w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4DVPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4FPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4FVPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4IVPROC) (GLuint index, const GLint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4SPROC) (GLuint index, GLshort x, GLshort y, GLshort z, GLshort w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4SVPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4UBVPROC) (GLuint index, const GLubyte *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4UIVPROC) (GLuint index, const GLuint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4USVPROC) (GLuint index, const GLushort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBPOINTERPROC) (GLuint index, GLint size, GLenum type, GLboolean normalized, GLsizei stride, const GLvoid *pointer); #endif #ifndef GL_VERSION_2_1 #define GL_VERSION_2_1 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glUniformMatrix2x3fv (GLint, GLsizei, GLboolean, const GLfloat *); GLAPI void APIENTRY glUniformMatrix3x2fv (GLint, GLsizei, GLboolean, const GLfloat *); GLAPI void APIENTRY glUniformMatrix2x4fv (GLint, GLsizei, GLboolean, const GLfloat *); GLAPI void APIENTRY glUniformMatrix4x2fv (GLint, GLsizei, GLboolean, const GLfloat *); GLAPI void APIENTRY glUniformMatrix3x4fv (GLint, GLsizei, GLboolean, const GLfloat *); GLAPI void APIENTRY glUniformMatrix4x3fv (GLint, GLsizei, GLboolean, const GLfloat *); #endif /* GL_GLEXT_PROTOTYPES */ typedef 
void (APIENTRYP PFNGLUNIFORMMATRIX2X3FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORMMATRIX3X2FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORMMATRIX2X4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORMMATRIX4X2FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORMMATRIX3X4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORMMATRIX4X3FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); #endif #ifndef GL_VERSION_3_0 #define GL_VERSION_3_0 1 /* OpenGL 3.0 also reuses entry points from these extensions: */ /* ARB_framebuffer_object */ /* ARB_map_buffer_range */ /* ARB_vertex_array_object */ #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glColorMaski (GLuint, GLboolean, GLboolean, GLboolean, GLboolean); GLAPI void APIENTRY glGetBooleani_v (GLenum, GLuint, GLboolean *); GLAPI void APIENTRY glGetIntegeri_v (GLenum, GLuint, GLint *); GLAPI void APIENTRY glEnablei (GLenum, GLuint); GLAPI void APIENTRY glDisablei (GLenum, GLuint); GLAPI GLboolean APIENTRY glIsEnabledi (GLenum, GLuint); GLAPI void APIENTRY glBeginTransformFeedback (GLenum); GLAPI void APIENTRY glEndTransformFeedback (void); GLAPI void APIENTRY glBindBufferRange (GLenum, GLuint, GLuint, GLintptr, GLsizeiptr); GLAPI void APIENTRY glBindBufferBase (GLenum, GLuint, GLuint); GLAPI void APIENTRY glTransformFeedbackVaryings (GLuint, GLsizei, const GLchar* *, GLenum); GLAPI void APIENTRY glGetTransformFeedbackVarying (GLuint, GLuint, GLsizei, GLsizei *, GLsizei *, GLenum *, GLchar *); GLAPI void APIENTRY glClampColor (GLenum, GLenum); GLAPI void APIENTRY glBeginConditionalRender (GLuint, GLenum); GLAPI void APIENTRY glEndConditionalRender (void); GLAPI void APIENTRY glVertexAttribIPointer (GLuint, GLint, GLenum, GLsizei, const GLvoid *); GLAPI void APIENTRY glGetVertexAttribIiv (GLuint, GLenum, GLint *); GLAPI void APIENTRY glGetVertexAttribIuiv (GLuint, GLenum, GLuint *); GLAPI void APIENTRY glVertexAttribI1i (GLuint, GLint); GLAPI void APIENTRY glVertexAttribI2i (GLuint, GLint, GLint); GLAPI void APIENTRY glVertexAttribI3i (GLuint, GLint, GLint, GLint); GLAPI void APIENTRY glVertexAttribI4i (GLuint, GLint, GLint, GLint, GLint); GLAPI void APIENTRY glVertexAttribI1ui (GLuint, GLuint); GLAPI void APIENTRY glVertexAttribI2ui (GLuint, GLuint, GLuint); GLAPI void APIENTRY glVertexAttribI3ui (GLuint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glVertexAttribI4ui (GLuint, GLuint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glVertexAttribI1iv (GLuint, const GLint *); GLAPI void APIENTRY glVertexAttribI2iv (GLuint, const GLint *); GLAPI void APIENTRY glVertexAttribI3iv (GLuint, const GLint *); GLAPI void APIENTRY glVertexAttribI4iv (GLuint, const GLint *); GLAPI void APIENTRY glVertexAttribI1uiv (GLuint, const GLuint *); GLAPI void APIENTRY glVertexAttribI2uiv (GLuint, const GLuint *); GLAPI void APIENTRY glVertexAttribI3uiv (GLuint, const GLuint *); GLAPI void APIENTRY glVertexAttribI4uiv (GLuint, const GLuint *); GLAPI void APIENTRY glVertexAttribI4bv (GLuint, const GLbyte *); GLAPI void APIENTRY glVertexAttribI4sv (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttribI4ubv (GLuint, const GLubyte *); GLAPI void APIENTRY glVertexAttribI4usv (GLuint, const GLushort *); 
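/*
 * Editorial note (not part of the upstream glext.h): the glVertexAttribI*
 * entry points introduced with OpenGL 3.0 pass signed and unsigned integer
 * attribute values through unconverted, for consumption by ivec/uvec shader
 * inputs, unlike glVertexAttrib4iv which converts to floating point. A
 * minimal run-time loading sketch, again assuming glXGetProcAddress:
 *
 *   PFNGLVERTEXATTRIBI4IVPROC pglVertexAttribI4iv = (PFNGLVERTEXATTRIBI4IVPROC)
 *       glXGetProcAddress((const GLubyte *)"glVertexAttribI4iv");
 *   GLint v[4] = { 1, 2, 3, 4 };
 *   if (pglVertexAttribI4iv != NULL)
 *       pglVertexAttribI4iv(0, v);
 */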
GLAPI void APIENTRY glGetUniformuiv (GLuint, GLint, GLuint *); GLAPI void APIENTRY glBindFragDataLocation (GLuint, GLuint, const GLchar *); GLAPI GLint APIENTRY glGetFragDataLocation (GLuint, const GLchar *); GLAPI void APIENTRY glUniform1ui (GLint, GLuint); GLAPI void APIENTRY glUniform2ui (GLint, GLuint, GLuint); GLAPI void APIENTRY glUniform3ui (GLint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glUniform4ui (GLint, GLuint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glUniform1uiv (GLint, GLsizei, const GLuint *); GLAPI void APIENTRY glUniform2uiv (GLint, GLsizei, const GLuint *); GLAPI void APIENTRY glUniform3uiv (GLint, GLsizei, const GLuint *); GLAPI void APIENTRY glUniform4uiv (GLint, GLsizei, const GLuint *); GLAPI void APIENTRY glTexParameterIiv (GLenum, GLenum, const GLint *); GLAPI void APIENTRY glTexParameterIuiv (GLenum, GLenum, const GLuint *); GLAPI void APIENTRY glGetTexParameterIiv (GLenum, GLenum, GLint *); GLAPI void APIENTRY glGetTexParameterIuiv (GLenum, GLenum, GLuint *); GLAPI void APIENTRY glClearBufferiv (GLenum, GLint, const GLint *); GLAPI void APIENTRY glClearBufferuiv (GLenum, GLint, const GLuint *); GLAPI void APIENTRY glClearBufferfv (GLenum, GLint, const GLfloat *); GLAPI void APIENTRY glClearBufferfi (GLenum, GLint, GLfloat, GLint); GLAPI const GLubyte * APIENTRY glGetStringi (GLenum, GLuint); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLCOLORMASKIPROC) (GLuint index, GLboolean r, GLboolean g, GLboolean b, GLboolean a); typedef void (APIENTRYP PFNGLGETBOOLEANI_VPROC) (GLenum target, GLuint index, GLboolean *data); typedef void (APIENTRYP PFNGLGETINTEGERI_VPROC) (GLenum target, GLuint index, GLint *data); typedef void (APIENTRYP PFNGLENABLEIPROC) (GLenum target, GLuint index); typedef void (APIENTRYP PFNGLDISABLEIPROC) (GLenum target, GLuint index); typedef GLboolean (APIENTRYP PFNGLISENABLEDIPROC) (GLenum target, GLuint index); typedef void (APIENTRYP PFNGLBEGINTRANSFORMFEEDBACKPROC) (GLenum primitiveMode); typedef void (APIENTRYP PFNGLENDTRANSFORMFEEDBACKPROC) (void); typedef void (APIENTRYP PFNGLBINDBUFFERRANGEPROC) (GLenum target, GLuint index, GLuint buffer, GLintptr offset, GLsizeiptr size); typedef void (APIENTRYP PFNGLBINDBUFFERBASEPROC) (GLenum target, GLuint index, GLuint buffer); typedef void (APIENTRYP PFNGLTRANSFORMFEEDBACKVARYINGSPROC) (GLuint program, GLsizei count, const GLchar* *varyings, GLenum bufferMode); typedef void (APIENTRYP PFNGLGETTRANSFORMFEEDBACKVARYINGPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLsizei *size, GLenum *type, GLchar *name); typedef void (APIENTRYP PFNGLCLAMPCOLORPROC) (GLenum target, GLenum clamp); typedef void (APIENTRYP PFNGLBEGINCONDITIONALRENDERPROC) (GLuint id, GLenum mode); typedef void (APIENTRYP PFNGLENDCONDITIONALRENDERPROC) (void); typedef void (APIENTRYP PFNGLVERTEXATTRIBIPOINTERPROC) (GLuint index, GLint size, GLenum type, GLsizei stride, const GLvoid *pointer); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBIIVPROC) (GLuint index, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBIUIVPROC) (GLuint index, GLenum pname, GLuint *params); typedef void (APIENTRYP PFNGLVERTEXATTRIBI1IPROC) (GLuint index, GLint x); typedef void (APIENTRYP PFNGLVERTEXATTRIBI2IPROC) (GLuint index, GLint x, GLint y); typedef void (APIENTRYP PFNGLVERTEXATTRIBI3IPROC) (GLuint index, GLint x, GLint y, GLint z); typedef void (APIENTRYP PFNGLVERTEXATTRIBI4IPROC) (GLuint index, GLint x, GLint y, GLint z, GLint w); typedef void (APIENTRYP PFNGLVERTEXATTRIBI1UIPROC) 
(GLuint index, GLuint x); typedef void (APIENTRYP PFNGLVERTEXATTRIBI2UIPROC) (GLuint index, GLuint x, GLuint y); typedef void (APIENTRYP PFNGLVERTEXATTRIBI3UIPROC) (GLuint index, GLuint x, GLuint y, GLuint z); typedef void (APIENTRYP PFNGLVERTEXATTRIBI4UIPROC) (GLuint index, GLuint x, GLuint y, GLuint z, GLuint w); typedef void (APIENTRYP PFNGLVERTEXATTRIBI1IVPROC) (GLuint index, const GLint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBI2IVPROC) (GLuint index, const GLint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBI3IVPROC) (GLuint index, const GLint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBI4IVPROC) (GLuint index, const GLint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBI1UIVPROC) (GLuint index, const GLuint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBI2UIVPROC) (GLuint index, const GLuint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBI3UIVPROC) (GLuint index, const GLuint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBI4UIVPROC) (GLuint index, const GLuint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBI4BVPROC) (GLuint index, const GLbyte *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBI4SVPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBI4UBVPROC) (GLuint index, const GLubyte *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBI4USVPROC) (GLuint index, const GLushort *v); typedef void (APIENTRYP PFNGLGETUNIFORMUIVPROC) (GLuint program, GLint location, GLuint *params); typedef void (APIENTRYP PFNGLBINDFRAGDATALOCATIONPROC) (GLuint program, GLuint color, const GLchar *name); typedef GLint (APIENTRYP PFNGLGETFRAGDATALOCATIONPROC) (GLuint program, const GLchar *name); typedef void (APIENTRYP PFNGLUNIFORM1UIPROC) (GLint location, GLuint v0); typedef void (APIENTRYP PFNGLUNIFORM2UIPROC) (GLint location, GLuint v0, GLuint v1); typedef void (APIENTRYP PFNGLUNIFORM3UIPROC) (GLint location, GLuint v0, GLuint v1, GLuint v2); typedef void (APIENTRYP PFNGLUNIFORM4UIPROC) (GLint location, GLuint v0, GLuint v1, GLuint v2, GLuint v3); typedef void (APIENTRYP PFNGLUNIFORM1UIVPROC) (GLint location, GLsizei count, const GLuint *value); typedef void (APIENTRYP PFNGLUNIFORM2UIVPROC) (GLint location, GLsizei count, const GLuint *value); typedef void (APIENTRYP PFNGLUNIFORM3UIVPROC) (GLint location, GLsizei count, const GLuint *value); typedef void (APIENTRYP PFNGLUNIFORM4UIVPROC) (GLint location, GLsizei count, const GLuint *value); typedef void (APIENTRYP PFNGLTEXPARAMETERIIVPROC) (GLenum target, GLenum pname, const GLint *params); typedef void (APIENTRYP PFNGLTEXPARAMETERIUIVPROC) (GLenum target, GLenum pname, const GLuint *params); typedef void (APIENTRYP PFNGLGETTEXPARAMETERIIVPROC) (GLenum target, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETTEXPARAMETERIUIVPROC) (GLenum target, GLenum pname, GLuint *params); typedef void (APIENTRYP PFNGLCLEARBUFFERIVPROC) (GLenum buffer, GLint drawbuffer, const GLint *value); typedef void (APIENTRYP PFNGLCLEARBUFFERUIVPROC) (GLenum buffer, GLint drawbuffer, const GLuint *value); typedef void (APIENTRYP PFNGLCLEARBUFFERFVPROC) (GLenum buffer, GLint drawbuffer, const GLfloat *value); typedef void (APIENTRYP PFNGLCLEARBUFFERFIPROC) (GLenum buffer, GLint drawbuffer, GLfloat depth, GLint stencil); typedef const GLubyte * (APIENTRYP PFNGLGETSTRINGIPROC) (GLenum name, GLuint index); #endif #ifndef GL_VERSION_3_1 #define GL_VERSION_3_1 1 /* OpenGL 3.1 also reuses entry points from these extensions: */ /* ARB_copy_buffer */ /* ARB_uniform_buffer_object */ #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glDrawArraysInstanced 
(GLenum, GLint, GLsizei, GLsizei); GLAPI void APIENTRY glDrawElementsInstanced (GLenum, GLsizei, GLenum, const GLvoid *, GLsizei); GLAPI void APIENTRY glTexBuffer (GLenum, GLenum, GLuint); GLAPI void APIENTRY glPrimitiveRestartIndex (GLuint); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLDRAWARRAYSINSTANCEDPROC) (GLenum mode, GLint first, GLsizei count, GLsizei primcount); typedef void (APIENTRYP PFNGLDRAWELEMENTSINSTANCEDPROC) (GLenum mode, GLsizei count, GLenum type, const GLvoid *indices, GLsizei primcount); typedef void (APIENTRYP PFNGLTEXBUFFERPROC) (GLenum target, GLenum internalformat, GLuint buffer); typedef void (APIENTRYP PFNGLPRIMITIVERESTARTINDEXPROC) (GLuint index); #endif #ifndef GL_VERSION_3_2 #define GL_VERSION_3_2 1 /* OpenGL 3.2 also reuses entry points from these extensions: */ /* ARB_draw_elements_base_vertex */ /* ARB_provoking_vertex */ /* ARB_sync */ /* ARB_texture_multisample */ #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glGetInteger64i_v (GLenum, GLuint, GLint64 *); GLAPI void APIENTRY glGetBufferParameteri64v (GLenum, GLenum, GLint64 *); GLAPI void APIENTRY glProgramParameteri (GLuint, GLenum, GLint); GLAPI void APIENTRY glFramebufferTexture (GLenum, GLenum, GLuint, GLint); GLAPI void APIENTRY glFramebufferTextureFace (GLenum, GLenum, GLuint, GLint, GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLGETINTEGER64I_VPROC) (GLenum target, GLuint index, GLint64 *data); typedef void (APIENTRYP PFNGLGETBUFFERPARAMETERI64VPROC) (GLenum target, GLenum pname, GLint64 *params); typedef void (APIENTRYP PFNGLPROGRAMPARAMETERIPROC) (GLuint program, GLenum pname, GLint value); typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTUREPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level); typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTUREFACEPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level, GLenum face); #endif #ifndef GL_ARB_multitexture #define GL_ARB_multitexture 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glActiveTextureARB (GLenum); GLAPI void APIENTRY glClientActiveTextureARB (GLenum); GLAPI void APIENTRY glMultiTexCoord1dARB (GLenum, GLdouble); GLAPI void APIENTRY glMultiTexCoord1dvARB (GLenum, const GLdouble *); GLAPI void APIENTRY glMultiTexCoord1fARB (GLenum, GLfloat); GLAPI void APIENTRY glMultiTexCoord1fvARB (GLenum, const GLfloat *); GLAPI void APIENTRY glMultiTexCoord1iARB (GLenum, GLint); GLAPI void APIENTRY glMultiTexCoord1ivARB (GLenum, const GLint *); GLAPI void APIENTRY glMultiTexCoord1sARB (GLenum, GLshort); GLAPI void APIENTRY glMultiTexCoord1svARB (GLenum, const GLshort *); GLAPI void APIENTRY glMultiTexCoord2dARB (GLenum, GLdouble, GLdouble); GLAPI void APIENTRY glMultiTexCoord2dvARB (GLenum, const GLdouble *); GLAPI void APIENTRY glMultiTexCoord2fARB (GLenum, GLfloat, GLfloat); GLAPI void APIENTRY glMultiTexCoord2fvARB (GLenum, const GLfloat *); GLAPI void APIENTRY glMultiTexCoord2iARB (GLenum, GLint, GLint); GLAPI void APIENTRY glMultiTexCoord2ivARB (GLenum, const GLint *); GLAPI void APIENTRY glMultiTexCoord2sARB (GLenum, GLshort, GLshort); GLAPI void APIENTRY glMultiTexCoord2svARB (GLenum, const GLshort *); GLAPI void APIENTRY glMultiTexCoord3dARB (GLenum, GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glMultiTexCoord3dvARB (GLenum, const GLdouble *); GLAPI void APIENTRY glMultiTexCoord3fARB (GLenum, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glMultiTexCoord3fvARB (GLenum, const GLfloat *); GLAPI void APIENTRY glMultiTexCoord3iARB (GLenum, GLint, GLint, GLint); GLAPI void 
APIENTRY glMultiTexCoord3ivARB (GLenum, const GLint *); GLAPI void APIENTRY glMultiTexCoord3sARB (GLenum, GLshort, GLshort, GLshort); GLAPI void APIENTRY glMultiTexCoord3svARB (GLenum, const GLshort *); GLAPI void APIENTRY glMultiTexCoord4dARB (GLenum, GLdouble, GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glMultiTexCoord4dvARB (GLenum, const GLdouble *); GLAPI void APIENTRY glMultiTexCoord4fARB (GLenum, GLfloat, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glMultiTexCoord4fvARB (GLenum, const GLfloat *); GLAPI void APIENTRY glMultiTexCoord4iARB (GLenum, GLint, GLint, GLint, GLint); GLAPI void APIENTRY glMultiTexCoord4ivARB (GLenum, const GLint *); GLAPI void APIENTRY glMultiTexCoord4sARB (GLenum, GLshort, GLshort, GLshort, GLshort); GLAPI void APIENTRY glMultiTexCoord4svARB (GLenum, const GLshort *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLACTIVETEXTUREARBPROC) (GLenum texture); typedef void (APIENTRYP PFNGLCLIENTACTIVETEXTUREARBPROC) (GLenum texture); typedef void (APIENTRYP PFNGLMULTITEXCOORD1DARBPROC) (GLenum target, GLdouble s); typedef void (APIENTRYP PFNGLMULTITEXCOORD1DVARBPROC) (GLenum target, const GLdouble *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD1FARBPROC) (GLenum target, GLfloat s); typedef void (APIENTRYP PFNGLMULTITEXCOORD1FVARBPROC) (GLenum target, const GLfloat *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD1IARBPROC) (GLenum target, GLint s); typedef void (APIENTRYP PFNGLMULTITEXCOORD1IVARBPROC) (GLenum target, const GLint *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD1SARBPROC) (GLenum target, GLshort s); typedef void (APIENTRYP PFNGLMULTITEXCOORD1SVARBPROC) (GLenum target, const GLshort *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD2DARBPROC) (GLenum target, GLdouble s, GLdouble t); typedef void (APIENTRYP PFNGLMULTITEXCOORD2DVARBPROC) (GLenum target, const GLdouble *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD2FARBPROC) (GLenum target, GLfloat s, GLfloat t); typedef void (APIENTRYP PFNGLMULTITEXCOORD2FVARBPROC) (GLenum target, const GLfloat *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD2IARBPROC) (GLenum target, GLint s, GLint t); typedef void (APIENTRYP PFNGLMULTITEXCOORD2IVARBPROC) (GLenum target, const GLint *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD2SARBPROC) (GLenum target, GLshort s, GLshort t); typedef void (APIENTRYP PFNGLMULTITEXCOORD2SVARBPROC) (GLenum target, const GLshort *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD3DARBPROC) (GLenum target, GLdouble s, GLdouble t, GLdouble r); typedef void (APIENTRYP PFNGLMULTITEXCOORD3DVARBPROC) (GLenum target, const GLdouble *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD3FARBPROC) (GLenum target, GLfloat s, GLfloat t, GLfloat r); typedef void (APIENTRYP PFNGLMULTITEXCOORD3FVARBPROC) (GLenum target, const GLfloat *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD3IARBPROC) (GLenum target, GLint s, GLint t, GLint r); typedef void (APIENTRYP PFNGLMULTITEXCOORD3IVARBPROC) (GLenum target, const GLint *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD3SARBPROC) (GLenum target, GLshort s, GLshort t, GLshort r); typedef void (APIENTRYP PFNGLMULTITEXCOORD3SVARBPROC) (GLenum target, const GLshort *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD4DARBPROC) (GLenum target, GLdouble s, GLdouble t, GLdouble r, GLdouble q); typedef void (APIENTRYP PFNGLMULTITEXCOORD4DVARBPROC) (GLenum target, const GLdouble *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD4FARBPROC) (GLenum target, GLfloat s, GLfloat t, GLfloat r, GLfloat q); typedef void (APIENTRYP PFNGLMULTITEXCOORD4FVARBPROC) (GLenum target, 
const GLfloat *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD4IARBPROC) (GLenum target, GLint s, GLint t, GLint r, GLint q); typedef void (APIENTRYP PFNGLMULTITEXCOORD4IVARBPROC) (GLenum target, const GLint *v); typedef void (APIENTRYP PFNGLMULTITEXCOORD4SARBPROC) (GLenum target, GLshort s, GLshort t, GLshort r, GLshort q); typedef void (APIENTRYP PFNGLMULTITEXCOORD4SVARBPROC) (GLenum target, const GLshort *v); #endif #ifndef GL_ARB_transpose_matrix #define GL_ARB_transpose_matrix 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glLoadTransposeMatrixfARB (const GLfloat *); GLAPI void APIENTRY glLoadTransposeMatrixdARB (const GLdouble *); GLAPI void APIENTRY glMultTransposeMatrixfARB (const GLfloat *); GLAPI void APIENTRY glMultTransposeMatrixdARB (const GLdouble *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLLOADTRANSPOSEMATRIXFARBPROC) (const GLfloat *m); typedef void (APIENTRYP PFNGLLOADTRANSPOSEMATRIXDARBPROC) (const GLdouble *m); typedef void (APIENTRYP PFNGLMULTTRANSPOSEMATRIXFARBPROC) (const GLfloat *m); typedef void (APIENTRYP PFNGLMULTTRANSPOSEMATRIXDARBPROC) (const GLdouble *m); #endif #ifndef GL_ARB_multisample #define GL_ARB_multisample 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glSampleCoverageARB (GLclampf, GLboolean); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLSAMPLECOVERAGEARBPROC) (GLclampf value, GLboolean invert); #endif #ifndef GL_ARB_texture_env_add #define GL_ARB_texture_env_add 1 #endif #ifndef GL_ARB_texture_cube_map #define GL_ARB_texture_cube_map 1 #endif #ifndef GL_ARB_texture_compression #define GL_ARB_texture_compression 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glCompressedTexImage3DARB (GLenum, GLint, GLenum, GLsizei, GLsizei, GLsizei, GLint, GLsizei, const GLvoid *); GLAPI void APIENTRY glCompressedTexImage2DARB (GLenum, GLint, GLenum, GLsizei, GLsizei, GLint, GLsizei, const GLvoid *); GLAPI void APIENTRY glCompressedTexImage1DARB (GLenum, GLint, GLenum, GLsizei, GLint, GLsizei, const GLvoid *); GLAPI void APIENTRY glCompressedTexSubImage3DARB (GLenum, GLint, GLint, GLint, GLint, GLsizei, GLsizei, GLsizei, GLenum, GLsizei, const GLvoid *); GLAPI void APIENTRY glCompressedTexSubImage2DARB (GLenum, GLint, GLint, GLint, GLsizei, GLsizei, GLenum, GLsizei, const GLvoid *); GLAPI void APIENTRY glCompressedTexSubImage1DARB (GLenum, GLint, GLint, GLsizei, GLenum, GLsizei, const GLvoid *); GLAPI void APIENTRY glGetCompressedTexImageARB (GLenum, GLint, GLvoid *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLCOMPRESSEDTEXIMAGE3DARBPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLsizei imageSize, const GLvoid *data); typedef void (APIENTRYP PFNGLCOMPRESSEDTEXIMAGE2DARBPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLsizei imageSize, const GLvoid *data); typedef void (APIENTRYP PFNGLCOMPRESSEDTEXIMAGE1DARBPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLint border, GLsizei imageSize, const GLvoid *data); typedef void (APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE3DARBPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLsizei imageSize, const GLvoid *data); typedef void (APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE2DARBPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLsizei imageSize, const GLvoid *data); typedef void 
(APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE1DARBPROC) (GLenum target, GLint level, GLint xoffset, GLsizei width, GLenum format, GLsizei imageSize, const GLvoid *data); typedef void (APIENTRYP PFNGLGETCOMPRESSEDTEXIMAGEARBPROC) (GLenum target, GLint level, GLvoid *img); #endif #ifndef GL_ARB_texture_border_clamp #define GL_ARB_texture_border_clamp 1 #endif #ifndef GL_ARB_point_parameters #define GL_ARB_point_parameters 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glPointParameterfARB (GLenum, GLfloat); GLAPI void APIENTRY glPointParameterfvARB (GLenum, const GLfloat *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLPOINTPARAMETERFARBPROC) (GLenum pname, GLfloat param); typedef void (APIENTRYP PFNGLPOINTPARAMETERFVARBPROC) (GLenum pname, const GLfloat *params); #endif #ifndef GL_ARB_vertex_blend #define GL_ARB_vertex_blend 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glWeightbvARB (GLint, const GLbyte *); GLAPI void APIENTRY glWeightsvARB (GLint, const GLshort *); GLAPI void APIENTRY glWeightivARB (GLint, const GLint *); GLAPI void APIENTRY glWeightfvARB (GLint, const GLfloat *); GLAPI void APIENTRY glWeightdvARB (GLint, const GLdouble *); GLAPI void APIENTRY glWeightubvARB (GLint, const GLubyte *); GLAPI void APIENTRY glWeightusvARB (GLint, const GLushort *); GLAPI void APIENTRY glWeightuivARB (GLint, const GLuint *); GLAPI void APIENTRY glWeightPointerARB (GLint, GLenum, GLsizei, const GLvoid *); GLAPI void APIENTRY glVertexBlendARB (GLint); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLWEIGHTBVARBPROC) (GLint size, const GLbyte *weights); typedef void (APIENTRYP PFNGLWEIGHTSVARBPROC) (GLint size, const GLshort *weights); typedef void (APIENTRYP PFNGLWEIGHTIVARBPROC) (GLint size, const GLint *weights); typedef void (APIENTRYP PFNGLWEIGHTFVARBPROC) (GLint size, const GLfloat *weights); typedef void (APIENTRYP PFNGLWEIGHTDVARBPROC) (GLint size, const GLdouble *weights); typedef void (APIENTRYP PFNGLWEIGHTUBVARBPROC) (GLint size, const GLubyte *weights); typedef void (APIENTRYP PFNGLWEIGHTUSVARBPROC) (GLint size, const GLushort *weights); typedef void (APIENTRYP PFNGLWEIGHTUIVARBPROC) (GLint size, const GLuint *weights); typedef void (APIENTRYP PFNGLWEIGHTPOINTERARBPROC) (GLint size, GLenum type, GLsizei stride, const GLvoid *pointer); typedef void (APIENTRYP PFNGLVERTEXBLENDARBPROC) (GLint count); #endif #ifndef GL_ARB_matrix_palette #define GL_ARB_matrix_palette 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glCurrentPaletteMatrixARB (GLint); GLAPI void APIENTRY glMatrixIndexubvARB (GLint, const GLubyte *); GLAPI void APIENTRY glMatrixIndexusvARB (GLint, const GLushort *); GLAPI void APIENTRY glMatrixIndexuivARB (GLint, const GLuint *); GLAPI void APIENTRY glMatrixIndexPointerARB (GLint, GLenum, GLsizei, const GLvoid *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLCURRENTPALETTEMATRIXARBPROC) (GLint index); typedef void (APIENTRYP PFNGLMATRIXINDEXUBVARBPROC) (GLint size, const GLubyte *indices); typedef void (APIENTRYP PFNGLMATRIXINDEXUSVARBPROC) (GLint size, const GLushort *indices); typedef void (APIENTRYP PFNGLMATRIXINDEXUIVARBPROC) (GLint size, const GLuint *indices); typedef void (APIENTRYP PFNGLMATRIXINDEXPOINTERARBPROC) (GLint size, GLenum type, GLsizei stride, const GLvoid *pointer); #endif #ifndef GL_ARB_texture_env_combine #define GL_ARB_texture_env_combine 1 #endif #ifndef GL_ARB_texture_env_crossbar #define GL_ARB_texture_env_crossbar 1 #endif #ifndef GL_ARB_texture_env_dot3 #define GL_ARB_texture_env_dot3 1 #endif #ifndef 
GL_ARB_texture_mirrored_repeat #define GL_ARB_texture_mirrored_repeat 1 #endif #ifndef GL_ARB_depth_texture #define GL_ARB_depth_texture 1 #endif #ifndef GL_ARB_shadow #define GL_ARB_shadow 1 #endif #ifndef GL_ARB_shadow_ambient #define GL_ARB_shadow_ambient 1 #endif #ifndef GL_ARB_window_pos #define GL_ARB_window_pos 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glWindowPos2dARB (GLdouble, GLdouble); GLAPI void APIENTRY glWindowPos2dvARB (const GLdouble *); GLAPI void APIENTRY glWindowPos2fARB (GLfloat, GLfloat); GLAPI void APIENTRY glWindowPos2fvARB (const GLfloat *); GLAPI void APIENTRY glWindowPos2iARB (GLint, GLint); GLAPI void APIENTRY glWindowPos2ivARB (const GLint *); GLAPI void APIENTRY glWindowPos2sARB (GLshort, GLshort); GLAPI void APIENTRY glWindowPos2svARB (const GLshort *); GLAPI void APIENTRY glWindowPos3dARB (GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glWindowPos3dvARB (const GLdouble *); GLAPI void APIENTRY glWindowPos3fARB (GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glWindowPos3fvARB (const GLfloat *); GLAPI void APIENTRY glWindowPos3iARB (GLint, GLint, GLint); GLAPI void APIENTRY glWindowPos3ivARB (const GLint *); GLAPI void APIENTRY glWindowPos3sARB (GLshort, GLshort, GLshort); GLAPI void APIENTRY glWindowPos3svARB (const GLshort *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLWINDOWPOS2DARBPROC) (GLdouble x, GLdouble y); typedef void (APIENTRYP PFNGLWINDOWPOS2DVARBPROC) (const GLdouble *v); typedef void (APIENTRYP PFNGLWINDOWPOS2FARBPROC) (GLfloat x, GLfloat y); typedef void (APIENTRYP PFNGLWINDOWPOS2FVARBPROC) (const GLfloat *v); typedef void (APIENTRYP PFNGLWINDOWPOS2IARBPROC) (GLint x, GLint y); typedef void (APIENTRYP PFNGLWINDOWPOS2IVARBPROC) (const GLint *v); typedef void (APIENTRYP PFNGLWINDOWPOS2SARBPROC) (GLshort x, GLshort y); typedef void (APIENTRYP PFNGLWINDOWPOS2SVARBPROC) (const GLshort *v); typedef void (APIENTRYP PFNGLWINDOWPOS3DARBPROC) (GLdouble x, GLdouble y, GLdouble z); typedef void (APIENTRYP PFNGLWINDOWPOS3DVARBPROC) (const GLdouble *v); typedef void (APIENTRYP PFNGLWINDOWPOS3FARBPROC) (GLfloat x, GLfloat y, GLfloat z); typedef void (APIENTRYP PFNGLWINDOWPOS3FVARBPROC) (const GLfloat *v); typedef void (APIENTRYP PFNGLWINDOWPOS3IARBPROC) (GLint x, GLint y, GLint z); typedef void (APIENTRYP PFNGLWINDOWPOS3IVARBPROC) (const GLint *v); typedef void (APIENTRYP PFNGLWINDOWPOS3SARBPROC) (GLshort x, GLshort y, GLshort z); typedef void (APIENTRYP PFNGLWINDOWPOS3SVARBPROC) (const GLshort *v); #endif #ifndef GL_ARB_vertex_program #define GL_ARB_vertex_program 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glVertexAttrib1dARB (GLuint, GLdouble); GLAPI void APIENTRY glVertexAttrib1dvARB (GLuint, const GLdouble *); GLAPI void APIENTRY glVertexAttrib1fARB (GLuint, GLfloat); GLAPI void APIENTRY glVertexAttrib1fvARB (GLuint, const GLfloat *); GLAPI void APIENTRY glVertexAttrib1sARB (GLuint, GLshort); GLAPI void APIENTRY glVertexAttrib1svARB (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib2dARB (GLuint, GLdouble, GLdouble); GLAPI void APIENTRY glVertexAttrib2dvARB (GLuint, const GLdouble *); GLAPI void APIENTRY glVertexAttrib2fARB (GLuint, GLfloat, GLfloat); GLAPI void APIENTRY glVertexAttrib2fvARB (GLuint, const GLfloat *); GLAPI void APIENTRY glVertexAttrib2sARB (GLuint, GLshort, GLshort); GLAPI void APIENTRY glVertexAttrib2svARB (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib3dARB (GLuint, GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glVertexAttrib3dvARB (GLuint, const GLdouble *); GLAPI void 
APIENTRY glVertexAttrib3fARB (GLuint, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glVertexAttrib3fvARB (GLuint, const GLfloat *); GLAPI void APIENTRY glVertexAttrib3sARB (GLuint, GLshort, GLshort, GLshort); GLAPI void APIENTRY glVertexAttrib3svARB (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib4NbvARB (GLuint, const GLbyte *); GLAPI void APIENTRY glVertexAttrib4NivARB (GLuint, const GLint *); GLAPI void APIENTRY glVertexAttrib4NsvARB (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib4NubARB (GLuint, GLubyte, GLubyte, GLubyte, GLubyte); GLAPI void APIENTRY glVertexAttrib4NubvARB (GLuint, const GLubyte *); GLAPI void APIENTRY glVertexAttrib4NuivARB (GLuint, const GLuint *); GLAPI void APIENTRY glVertexAttrib4NusvARB (GLuint, const GLushort *); GLAPI void APIENTRY glVertexAttrib4bvARB (GLuint, const GLbyte *); GLAPI void APIENTRY glVertexAttrib4dARB (GLuint, GLdouble, GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glVertexAttrib4dvARB (GLuint, const GLdouble *); GLAPI void APIENTRY glVertexAttrib4fARB (GLuint, GLfloat, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glVertexAttrib4fvARB (GLuint, const GLfloat *); GLAPI void APIENTRY glVertexAttrib4ivARB (GLuint, const GLint *); GLAPI void APIENTRY glVertexAttrib4sARB (GLuint, GLshort, GLshort, GLshort, GLshort); GLAPI void APIENTRY glVertexAttrib4svARB (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib4ubvARB (GLuint, const GLubyte *); GLAPI void APIENTRY glVertexAttrib4uivARB (GLuint, const GLuint *); GLAPI void APIENTRY glVertexAttrib4usvARB (GLuint, const GLushort *); GLAPI void APIENTRY glVertexAttribPointerARB (GLuint, GLint, GLenum, GLboolean, GLsizei, const GLvoid *); GLAPI void APIENTRY glEnableVertexAttribArrayARB (GLuint); GLAPI void APIENTRY glDisableVertexAttribArrayARB (GLuint); GLAPI void APIENTRY glProgramStringARB (GLenum, GLenum, GLsizei, const GLvoid *); GLAPI void APIENTRY glBindProgramARB (GLenum, GLuint); GLAPI void APIENTRY glDeleteProgramsARB (GLsizei, const GLuint *); GLAPI void APIENTRY glGenProgramsARB (GLsizei, GLuint *); GLAPI void APIENTRY glProgramEnvParameter4dARB (GLenum, GLuint, GLdouble, GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glProgramEnvParameter4dvARB (GLenum, GLuint, const GLdouble *); GLAPI void APIENTRY glProgramEnvParameter4fARB (GLenum, GLuint, GLfloat, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glProgramEnvParameter4fvARB (GLenum, GLuint, const GLfloat *); GLAPI void APIENTRY glProgramLocalParameter4dARB (GLenum, GLuint, GLdouble, GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glProgramLocalParameter4dvARB (GLenum, GLuint, const GLdouble *); GLAPI void APIENTRY glProgramLocalParameter4fARB (GLenum, GLuint, GLfloat, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glProgramLocalParameter4fvARB (GLenum, GLuint, const GLfloat *); GLAPI void APIENTRY glGetProgramEnvParameterdvARB (GLenum, GLuint, GLdouble *); GLAPI void APIENTRY glGetProgramEnvParameterfvARB (GLenum, GLuint, GLfloat *); GLAPI void APIENTRY glGetProgramLocalParameterdvARB (GLenum, GLuint, GLdouble *); GLAPI void APIENTRY glGetProgramLocalParameterfvARB (GLenum, GLuint, GLfloat *); GLAPI void APIENTRY glGetProgramivARB (GLenum, GLenum, GLint *); GLAPI void APIENTRY glGetProgramStringARB (GLenum, GLenum, GLvoid *); GLAPI void APIENTRY glGetVertexAttribdvARB (GLuint, GLenum, GLdouble *); GLAPI void APIENTRY glGetVertexAttribfvARB (GLuint, GLenum, GLfloat *); GLAPI void APIENTRY glGetVertexAttribivARB (GLuint, GLenum, GLint *); GLAPI void APIENTRY glGetVertexAttribPointervARB 
(GLuint, GLenum, GLvoid* *); GLAPI GLboolean APIENTRY glIsProgramARB (GLuint); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLVERTEXATTRIB1DARBPROC) (GLuint index, GLdouble x); typedef void (APIENTRYP PFNGLVERTEXATTRIB1DVARBPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB1FARBPROC) (GLuint index, GLfloat x); typedef void (APIENTRYP PFNGLVERTEXATTRIB1FVARBPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB1SARBPROC) (GLuint index, GLshort x); typedef void (APIENTRYP PFNGLVERTEXATTRIB1SVARBPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB2DARBPROC) (GLuint index, GLdouble x, GLdouble y); typedef void (APIENTRYP PFNGLVERTEXATTRIB2DVARBPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB2FARBPROC) (GLuint index, GLfloat x, GLfloat y); typedef void (APIENTRYP PFNGLVERTEXATTRIB2FVARBPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB2SARBPROC) (GLuint index, GLshort x, GLshort y); typedef void (APIENTRYP PFNGLVERTEXATTRIB2SVARBPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB3DARBPROC) (GLuint index, GLdouble x, GLdouble y, GLdouble z); typedef void (APIENTRYP PFNGLVERTEXATTRIB3DVARBPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB3FARBPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z); typedef void (APIENTRYP PFNGLVERTEXATTRIB3FVARBPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB3SARBPROC) (GLuint index, GLshort x, GLshort y, GLshort z); typedef void (APIENTRYP PFNGLVERTEXATTRIB3SVARBPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NBVARBPROC) (GLuint index, const GLbyte *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NIVARBPROC) (GLuint index, const GLint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NSVARBPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NUBARBPROC) (GLuint index, GLubyte x, GLubyte y, GLubyte z, GLubyte w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NUBVARBPROC) (GLuint index, const GLubyte *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NUIVARBPROC) (GLuint index, const GLuint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4NUSVARBPROC) (GLuint index, const GLushort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4BVARBPROC) (GLuint index, const GLbyte *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4DARBPROC) (GLuint index, GLdouble x, GLdouble y, GLdouble z, GLdouble w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4DVARBPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4FARBPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4FVARBPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4IVARBPROC) (GLuint index, const GLint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4SARBPROC) (GLuint index, GLshort x, GLshort y, GLshort z, GLshort w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4SVARBPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4UBVARBPROC) (GLuint index, const GLubyte *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4UIVARBPROC) (GLuint index, const GLuint *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4USVARBPROC) (GLuint index, const GLushort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBPOINTERARBPROC) (GLuint index, GLint size, GLenum type, GLboolean normalized, GLsizei stride, const GLvoid *pointer); 
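/*
 * Illustrative sketch, not part of the Khronos header: one way an
 * application might resolve and use the ARB_vertex_program entry points
 * typedef'd above. The loader call (glXGetProcAddress) and the `verts`
 * array are assumptions made for this example only; any platform loader
 * that returns extension function pointers works the same way.
 */
#if 0
static PFNGLVERTEXATTRIB4FARBPROC      pglVertexAttrib4fARB;
static PFNGLVERTEXATTRIBPOINTERARBPROC pglVertexAttribPointerARB;

static void setup_attribs(const GLfloat *verts)
{
    /* Resolve once after a context is current; real code should verify the
     * extension string and check the returned pointers for NULL. */
    pglVertexAttrib4fARB = (PFNGLVERTEXATTRIB4FARBPROC)
        glXGetProcAddress((const GLubyte *)"glVertexAttrib4fARB");
    pglVertexAttribPointerARB = (PFNGLVERTEXATTRIBPOINTERARBPROC)
        glXGetProcAddress((const GLubyte *)"glVertexAttribPointerARB");

    /* Constant value for generic attribute 1, per-vertex array for
     * attribute 0: three tightly packed floats, not normalized. */
    pglVertexAttrib4fARB(1, 0.0f, 0.0f, 1.0f, 1.0f);
    pglVertexAttribPointerARB(0, 3, GL_FLOAT, GL_FALSE, 0, verts);
}
#endif /* end of illustrative sketch */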
typedef void (APIENTRYP PFNGLENABLEVERTEXATTRIBARRAYARBPROC) (GLuint index); typedef void (APIENTRYP PFNGLDISABLEVERTEXATTRIBARRAYARBPROC) (GLuint index); typedef void (APIENTRYP PFNGLPROGRAMSTRINGARBPROC) (GLenum target, GLenum format, GLsizei len, const GLvoid *string); typedef void (APIENTRYP PFNGLBINDPROGRAMARBPROC) (GLenum target, GLuint program); typedef void (APIENTRYP PFNGLDELETEPROGRAMSARBPROC) (GLsizei n, const GLuint *programs); typedef void (APIENTRYP PFNGLGENPROGRAMSARBPROC) (GLsizei n, GLuint *programs); typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETER4DARBPROC) (GLenum target, GLuint index, GLdouble x, GLdouble y, GLdouble z, GLdouble w); typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETER4DVARBPROC) (GLenum target, GLuint index, const GLdouble *params); typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETER4FARBPROC) (GLenum target, GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w); typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETER4FVARBPROC) (GLenum target, GLuint index, const GLfloat *params); typedef void (APIENTRYP PFNGLPROGRAMLOCALPARAMETER4DARBPROC) (GLenum target, GLuint index, GLdouble x, GLdouble y, GLdouble z, GLdouble w); typedef void (APIENTRYP PFNGLPROGRAMLOCALPARAMETER4DVARBPROC) (GLenum target, GLuint index, const GLdouble *params); typedef void (APIENTRYP PFNGLPROGRAMLOCALPARAMETER4FARBPROC) (GLenum target, GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w); typedef void (APIENTRYP PFNGLPROGRAMLOCALPARAMETER4FVARBPROC) (GLenum target, GLuint index, const GLfloat *params); typedef void (APIENTRYP PFNGLGETPROGRAMENVPARAMETERDVARBPROC) (GLenum target, GLuint index, GLdouble *params); typedef void (APIENTRYP PFNGLGETPROGRAMENVPARAMETERFVARBPROC) (GLenum target, GLuint index, GLfloat *params); typedef void (APIENTRYP PFNGLGETPROGRAMLOCALPARAMETERDVARBPROC) (GLenum target, GLuint index, GLdouble *params); typedef void (APIENTRYP PFNGLGETPROGRAMLOCALPARAMETERFVARBPROC) (GLenum target, GLuint index, GLfloat *params); typedef void (APIENTRYP PFNGLGETPROGRAMIVARBPROC) (GLenum target, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETPROGRAMSTRINGARBPROC) (GLenum target, GLenum pname, GLvoid *string); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBDVARBPROC) (GLuint index, GLenum pname, GLdouble *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBFVARBPROC) (GLuint index, GLenum pname, GLfloat *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBIVARBPROC) (GLuint index, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBPOINTERVARBPROC) (GLuint index, GLenum pname, GLvoid* *pointer); typedef GLboolean (APIENTRYP PFNGLISPROGRAMARBPROC) (GLuint program); #endif #ifndef GL_ARB_fragment_program #define GL_ARB_fragment_program 1 /* All ARB_fragment_program entry points are shared with ARB_vertex_program. 
*/ #endif #ifndef GL_ARB_vertex_buffer_object #define GL_ARB_vertex_buffer_object 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glBindBufferARB (GLenum, GLuint); GLAPI void APIENTRY glDeleteBuffersARB (GLsizei, const GLuint *); GLAPI void APIENTRY glGenBuffersARB (GLsizei, GLuint *); GLAPI GLboolean APIENTRY glIsBufferARB (GLuint); GLAPI void APIENTRY glBufferDataARB (GLenum, GLsizeiptrARB, const GLvoid *, GLenum); GLAPI void APIENTRY glBufferSubDataARB (GLenum, GLintptrARB, GLsizeiptrARB, const GLvoid *); GLAPI void APIENTRY glGetBufferSubDataARB (GLenum, GLintptrARB, GLsizeiptrARB, GLvoid *); GLAPI GLvoid* APIENTRY glMapBufferARB (GLenum, GLenum); GLAPI GLboolean APIENTRY glUnmapBufferARB (GLenum); GLAPI void APIENTRY glGetBufferParameterivARB (GLenum, GLenum, GLint *); GLAPI void APIENTRY glGetBufferPointervARB (GLenum, GLenum, GLvoid* *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLBINDBUFFERARBPROC) (GLenum target, GLuint buffer); typedef void (APIENTRYP PFNGLDELETEBUFFERSARBPROC) (GLsizei n, const GLuint *buffers); typedef void (APIENTRYP PFNGLGENBUFFERSARBPROC) (GLsizei n, GLuint *buffers); typedef GLboolean (APIENTRYP PFNGLISBUFFERARBPROC) (GLuint buffer); typedef void (APIENTRYP PFNGLBUFFERDATAARBPROC) (GLenum target, GLsizeiptrARB size, const GLvoid *data, GLenum usage); typedef void (APIENTRYP PFNGLBUFFERSUBDATAARBPROC) (GLenum target, GLintptrARB offset, GLsizeiptrARB size, const GLvoid *data); typedef void (APIENTRYP PFNGLGETBUFFERSUBDATAARBPROC) (GLenum target, GLintptrARB offset, GLsizeiptrARB size, GLvoid *data); typedef GLvoid* (APIENTRYP PFNGLMAPBUFFERARBPROC) (GLenum target, GLenum access); typedef GLboolean (APIENTRYP PFNGLUNMAPBUFFERARBPROC) (GLenum target); typedef void (APIENTRYP PFNGLGETBUFFERPARAMETERIVARBPROC) (GLenum target, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETBUFFERPOINTERVARBPROC) (GLenum target, GLenum pname, GLvoid* *params); #endif #ifndef GL_ARB_occlusion_query #define GL_ARB_occlusion_query 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glGenQueriesARB (GLsizei, GLuint *); GLAPI void APIENTRY glDeleteQueriesARB (GLsizei, const GLuint *); GLAPI GLboolean APIENTRY glIsQueryARB (GLuint); GLAPI void APIENTRY glBeginQueryARB (GLenum, GLuint); GLAPI void APIENTRY glEndQueryARB (GLenum); GLAPI void APIENTRY glGetQueryivARB (GLenum, GLenum, GLint *); GLAPI void APIENTRY glGetQueryObjectivARB (GLuint, GLenum, GLint *); GLAPI void APIENTRY glGetQueryObjectuivARB (GLuint, GLenum, GLuint *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLGENQUERIESARBPROC) (GLsizei n, GLuint *ids); typedef void (APIENTRYP PFNGLDELETEQUERIESARBPROC) (GLsizei n, const GLuint *ids); typedef GLboolean (APIENTRYP PFNGLISQUERYARBPROC) (GLuint id); typedef void (APIENTRYP PFNGLBEGINQUERYARBPROC) (GLenum target, GLuint id); typedef void (APIENTRYP PFNGLENDQUERYARBPROC) (GLenum target); typedef void (APIENTRYP PFNGLGETQUERYIVARBPROC) (GLenum target, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETQUERYOBJECTIVARBPROC) (GLuint id, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETQUERYOBJECTUIVARBPROC) (GLuint id, GLenum pname, GLuint *params); #endif #ifndef GL_ARB_shader_objects #define GL_ARB_shader_objects 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glDeleteObjectARB (GLhandleARB); GLAPI GLhandleARB APIENTRY glGetHandleARB (GLenum); GLAPI void APIENTRY glDetachObjectARB (GLhandleARB, GLhandleARB); GLAPI GLhandleARB APIENTRY glCreateShaderObjectARB (GLenum); GLAPI void APIENTRY glShaderSourceARB 
(GLhandleARB, GLsizei, const GLcharARB* *, const GLint *); GLAPI void APIENTRY glCompileShaderARB (GLhandleARB); GLAPI GLhandleARB APIENTRY glCreateProgramObjectARB (void); GLAPI void APIENTRY glAttachObjectARB (GLhandleARB, GLhandleARB); GLAPI void APIENTRY glLinkProgramARB (GLhandleARB); GLAPI void APIENTRY glUseProgramObjectARB (GLhandleARB); GLAPI void APIENTRY glValidateProgramARB (GLhandleARB); GLAPI void APIENTRY glUniform1fARB (GLint, GLfloat); GLAPI void APIENTRY glUniform2fARB (GLint, GLfloat, GLfloat); GLAPI void APIENTRY glUniform3fARB (GLint, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glUniform4fARB (GLint, GLfloat, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glUniform1iARB (GLint, GLint); GLAPI void APIENTRY glUniform2iARB (GLint, GLint, GLint); GLAPI void APIENTRY glUniform3iARB (GLint, GLint, GLint, GLint); GLAPI void APIENTRY glUniform4iARB (GLint, GLint, GLint, GLint, GLint); GLAPI void APIENTRY glUniform1fvARB (GLint, GLsizei, const GLfloat *); GLAPI void APIENTRY glUniform2fvARB (GLint, GLsizei, const GLfloat *); GLAPI void APIENTRY glUniform3fvARB (GLint, GLsizei, const GLfloat *); GLAPI void APIENTRY glUniform4fvARB (GLint, GLsizei, const GLfloat *); GLAPI void APIENTRY glUniform1ivARB (GLint, GLsizei, const GLint *); GLAPI void APIENTRY glUniform2ivARB (GLint, GLsizei, const GLint *); GLAPI void APIENTRY glUniform3ivARB (GLint, GLsizei, const GLint *); GLAPI void APIENTRY glUniform4ivARB (GLint, GLsizei, const GLint *); GLAPI void APIENTRY glUniformMatrix2fvARB (GLint, GLsizei, GLboolean, const GLfloat *); GLAPI void APIENTRY glUniformMatrix3fvARB (GLint, GLsizei, GLboolean, const GLfloat *); GLAPI void APIENTRY glUniformMatrix4fvARB (GLint, GLsizei, GLboolean, const GLfloat *); GLAPI void APIENTRY glGetObjectParameterfvARB (GLhandleARB, GLenum, GLfloat *); GLAPI void APIENTRY glGetObjectParameterivARB (GLhandleARB, GLenum, GLint *); GLAPI void APIENTRY glGetInfoLogARB (GLhandleARB, GLsizei, GLsizei *, GLcharARB *); GLAPI void APIENTRY glGetAttachedObjectsARB (GLhandleARB, GLsizei, GLsizei *, GLhandleARB *); GLAPI GLint APIENTRY glGetUniformLocationARB (GLhandleARB, const GLcharARB *); GLAPI void APIENTRY glGetActiveUniformARB (GLhandleARB, GLuint, GLsizei, GLsizei *, GLint *, GLenum *, GLcharARB *); GLAPI void APIENTRY glGetUniformfvARB (GLhandleARB, GLint, GLfloat *); GLAPI void APIENTRY glGetUniformivARB (GLhandleARB, GLint, GLint *); GLAPI void APIENTRY glGetShaderSourceARB (GLhandleARB, GLsizei, GLsizei *, GLcharARB *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLDELETEOBJECTARBPROC) (GLhandleARB obj); typedef GLhandleARB (APIENTRYP PFNGLGETHANDLEARBPROC) (GLenum pname); typedef void (APIENTRYP PFNGLDETACHOBJECTARBPROC) (GLhandleARB containerObj, GLhandleARB attachedObj); typedef GLhandleARB (APIENTRYP PFNGLCREATESHADEROBJECTARBPROC) (GLenum shaderType); typedef void (APIENTRYP PFNGLSHADERSOURCEARBPROC) (GLhandleARB shaderObj, GLsizei count, const GLcharARB* *string, const GLint *length); typedef void (APIENTRYP PFNGLCOMPILESHADERARBPROC) (GLhandleARB shaderObj); typedef GLhandleARB (APIENTRYP PFNGLCREATEPROGRAMOBJECTARBPROC) (void); typedef void (APIENTRYP PFNGLATTACHOBJECTARBPROC) (GLhandleARB containerObj, GLhandleARB obj); typedef void (APIENTRYP PFNGLLINKPROGRAMARBPROC) (GLhandleARB programObj); typedef void (APIENTRYP PFNGLUSEPROGRAMOBJECTARBPROC) (GLhandleARB programObj); typedef void (APIENTRYP PFNGLVALIDATEPROGRAMARBPROC) (GLhandleARB programObj); typedef void (APIENTRYP PFNGLUNIFORM1FARBPROC) (GLint location, GLfloat v0); 
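/*
 * Illustrative sketch, not part of the Khronos header: the minimal
 * compile-and-link flow through the ARB_shader_objects entry points
 * typedef'd above. The loader call, the GL_VERTEX_SHADER_ARB token
 * (defined in the enum section of this header) and the `src` string are
 * assumptions made for this example; info-log checking via
 * glGetInfoLogARB is omitted for brevity but required in real code.
 */
#if 0
static GLhandleARB build_program(const GLcharARB *src)
{
    PFNGLCREATESHADEROBJECTARBPROC  pCreateShader  = (PFNGLCREATESHADEROBJECTARBPROC)
        glXGetProcAddress((const GLubyte *)"glCreateShaderObjectARB");
    PFNGLCREATEPROGRAMOBJECTARBPROC pCreateProgram = (PFNGLCREATEPROGRAMOBJECTARBPROC)
        glXGetProcAddress((const GLubyte *)"glCreateProgramObjectARB");
    PFNGLSHADERSOURCEARBPROC        pShaderSource  = (PFNGLSHADERSOURCEARBPROC)
        glXGetProcAddress((const GLubyte *)"glShaderSourceARB");
    PFNGLCOMPILESHADERARBPROC       pCompileShader = (PFNGLCOMPILESHADERARBPROC)
        glXGetProcAddress((const GLubyte *)"glCompileShaderARB");
    PFNGLATTACHOBJECTARBPROC        pAttachObject  = (PFNGLATTACHOBJECTARBPROC)
        glXGetProcAddress((const GLubyte *)"glAttachObjectARB");
    PFNGLLINKPROGRAMARBPROC         pLinkProgram   = (PFNGLLINKPROGRAMARBPROC)
        glXGetProcAddress((const GLubyte *)"glLinkProgramARB");

    GLhandleARB sh   = pCreateShader(GL_VERTEX_SHADER_ARB);
    GLhandleARB prog = pCreateProgram();

    pShaderSource(sh, 1, &src, (const GLint *)0); /* null lengths: src is NUL-terminated */
    pCompileShader(sh);
    pAttachObject(prog, sh);
    pLinkProgram(prog);
    return prog; /* caller activates it with glUseProgramObjectARB */
}
#endif /* end of illustrative sketch */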
typedef void (APIENTRYP PFNGLUNIFORM2FARBPROC) (GLint location, GLfloat v0, GLfloat v1); typedef void (APIENTRYP PFNGLUNIFORM3FARBPROC) (GLint location, GLfloat v0, GLfloat v1, GLfloat v2); typedef void (APIENTRYP PFNGLUNIFORM4FARBPROC) (GLint location, GLfloat v0, GLfloat v1, GLfloat v2, GLfloat v3); typedef void (APIENTRYP PFNGLUNIFORM1IARBPROC) (GLint location, GLint v0); typedef void (APIENTRYP PFNGLUNIFORM2IARBPROC) (GLint location, GLint v0, GLint v1); typedef void (APIENTRYP PFNGLUNIFORM3IARBPROC) (GLint location, GLint v0, GLint v1, GLint v2); typedef void (APIENTRYP PFNGLUNIFORM4IARBPROC) (GLint location, GLint v0, GLint v1, GLint v2, GLint v3); typedef void (APIENTRYP PFNGLUNIFORM1FVARBPROC) (GLint location, GLsizei count, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORM2FVARBPROC) (GLint location, GLsizei count, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORM3FVARBPROC) (GLint location, GLsizei count, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORM4FVARBPROC) (GLint location, GLsizei count, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORM1IVARBPROC) (GLint location, GLsizei count, const GLint *value); typedef void (APIENTRYP PFNGLUNIFORM2IVARBPROC) (GLint location, GLsizei count, const GLint *value); typedef void (APIENTRYP PFNGLUNIFORM3IVARBPROC) (GLint location, GLsizei count, const GLint *value); typedef void (APIENTRYP PFNGLUNIFORM4IVARBPROC) (GLint location, GLsizei count, const GLint *value); typedef void (APIENTRYP PFNGLUNIFORMMATRIX2FVARBPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORMMATRIX3FVARBPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); typedef void (APIENTRYP PFNGLUNIFORMMATRIX4FVARBPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value); typedef void (APIENTRYP PFNGLGETOBJECTPARAMETERFVARBPROC) (GLhandleARB obj, GLenum pname, GLfloat *params); typedef void (APIENTRYP PFNGLGETOBJECTPARAMETERIVARBPROC) (GLhandleARB obj, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETINFOLOGARBPROC) (GLhandleARB obj, GLsizei maxLength, GLsizei *length, GLcharARB *infoLog); typedef void (APIENTRYP PFNGLGETATTACHEDOBJECTSARBPROC) (GLhandleARB containerObj, GLsizei maxCount, GLsizei *count, GLhandleARB *obj); typedef GLint (APIENTRYP PFNGLGETUNIFORMLOCATIONARBPROC) (GLhandleARB programObj, const GLcharARB *name); typedef void (APIENTRYP PFNGLGETACTIVEUNIFORMARBPROC) (GLhandleARB programObj, GLuint index, GLsizei maxLength, GLsizei *length, GLint *size, GLenum *type, GLcharARB *name); typedef void (APIENTRYP PFNGLGETUNIFORMFVARBPROC) (GLhandleARB programObj, GLint location, GLfloat *params); typedef void (APIENTRYP PFNGLGETUNIFORMIVARBPROC) (GLhandleARB programObj, GLint location, GLint *params); typedef void (APIENTRYP PFNGLGETSHADERSOURCEARBPROC) (GLhandleARB obj, GLsizei maxLength, GLsizei *length, GLcharARB *source); #endif #ifndef GL_ARB_vertex_shader #define GL_ARB_vertex_shader 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glBindAttribLocationARB (GLhandleARB, GLuint, const GLcharARB *); GLAPI void APIENTRY glGetActiveAttribARB (GLhandleARB, GLuint, GLsizei, GLsizei *, GLint *, GLenum *, GLcharARB *); GLAPI GLint APIENTRY glGetAttribLocationARB (GLhandleARB, const GLcharARB *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLBINDATTRIBLOCATIONARBPROC) (GLhandleARB programObj, GLuint index, const GLcharARB *name); typedef void (APIENTRYP PFNGLGETACTIVEATTRIBARBPROC) 
(GLhandleARB programObj, GLuint index, GLsizei maxLength, GLsizei *length, GLint *size, GLenum *type, GLcharARB *name); typedef GLint (APIENTRYP PFNGLGETATTRIBLOCATIONARBPROC) (GLhandleARB programObj, const GLcharARB *name); #endif #ifndef GL_ARB_fragment_shader #define GL_ARB_fragment_shader 1 #endif #ifndef GL_ARB_shading_language_100 #define GL_ARB_shading_language_100 1 #endif #ifndef GL_ARB_texture_non_power_of_two #define GL_ARB_texture_non_power_of_two 1 #endif #ifndef GL_ARB_point_sprite #define GL_ARB_point_sprite 1 #endif #ifndef GL_ARB_fragment_program_shadow #define GL_ARB_fragment_program_shadow 1 #endif #ifndef GL_ARB_draw_buffers #define GL_ARB_draw_buffers 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glDrawBuffersARB (GLsizei, const GLenum *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLDRAWBUFFERSARBPROC) (GLsizei n, const GLenum *bufs); #endif #ifndef GL_ARB_texture_rectangle #define GL_ARB_texture_rectangle 1 #endif #ifndef GL_ARB_color_buffer_float #define GL_ARB_color_buffer_float 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glClampColorARB (GLenum, GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLCLAMPCOLORARBPROC) (GLenum target, GLenum clamp); #endif #ifndef GL_ARB_half_float_pixel #define GL_ARB_half_float_pixel 1 #endif #ifndef GL_ARB_texture_float #define GL_ARB_texture_float 1 #endif #ifndef GL_ARB_pixel_buffer_object #define GL_ARB_pixel_buffer_object 1 #endif #ifndef GL_ARB_depth_buffer_float #define GL_ARB_depth_buffer_float 1 #endif #ifndef GL_ARB_draw_instanced #define GL_ARB_draw_instanced 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glDrawArraysInstancedARB (GLenum, GLint, GLsizei, GLsizei); GLAPI void APIENTRY glDrawElementsInstancedARB (GLenum, GLsizei, GLenum, const GLvoid *, GLsizei); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLDRAWARRAYSINSTANCEDARBPROC) (GLenum mode, GLint first, GLsizei count, GLsizei primcount); typedef void (APIENTRYP PFNGLDRAWELEMENTSINSTANCEDARBPROC) (GLenum mode, GLsizei count, GLenum type, const GLvoid *indices, GLsizei primcount); #endif #ifndef GL_ARB_framebuffer_object #define GL_ARB_framebuffer_object 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI GLboolean APIENTRY glIsRenderbuffer (GLuint); GLAPI void APIENTRY glBindRenderbuffer (GLenum, GLuint); GLAPI void APIENTRY glDeleteRenderbuffers (GLsizei, const GLuint *); GLAPI void APIENTRY glGenRenderbuffers (GLsizei, GLuint *); GLAPI void APIENTRY glRenderbufferStorage (GLenum, GLenum, GLsizei, GLsizei); GLAPI void APIENTRY glGetRenderbufferParameteriv (GLenum, GLenum, GLint *); GLAPI GLboolean APIENTRY glIsFramebuffer (GLuint); GLAPI void APIENTRY glBindFramebuffer (GLenum, GLuint); GLAPI void APIENTRY glDeleteFramebuffers (GLsizei, const GLuint *); GLAPI void APIENTRY glGenFramebuffers (GLsizei, GLuint *); GLAPI GLenum APIENTRY glCheckFramebufferStatus (GLenum); GLAPI void APIENTRY glFramebufferTexture1D (GLenum, GLenum, GLenum, GLuint, GLint); GLAPI void APIENTRY glFramebufferTexture2D (GLenum, GLenum, GLenum, GLuint, GLint); GLAPI void APIENTRY glFramebufferTexture3D (GLenum, GLenum, GLenum, GLuint, GLint, GLint); GLAPI void APIENTRY glFramebufferRenderbuffer (GLenum, GLenum, GLenum, GLuint); GLAPI void APIENTRY glGetFramebufferAttachmentParameteriv (GLenum, GLenum, GLenum, GLint *); GLAPI void APIENTRY glGenerateMipmap (GLenum); GLAPI void APIENTRY glBlitFramebuffer (GLint, GLint, GLint, GLint, GLint, GLint, GLint, GLint, GLbitfield, GLenum); GLAPI void APIENTRY glRenderbufferStorageMultisample (GLenum, GLsizei, 
GLenum, GLsizei, GLsizei); GLAPI void APIENTRY glFramebufferTextureLayer (GLenum, GLenum, GLuint, GLint, GLint); #endif /* GL_GLEXT_PROTOTYPES */ typedef GLboolean (APIENTRYP PFNGLISRENDERBUFFERPROC) (GLuint renderbuffer); typedef void (APIENTRYP PFNGLBINDRENDERBUFFERPROC) (GLenum target, GLuint renderbuffer); typedef void (APIENTRYP PFNGLDELETERENDERBUFFERSPROC) (GLsizei n, const GLuint *renderbuffers); typedef void (APIENTRYP PFNGLGENRENDERBUFFERSPROC) (GLsizei n, GLuint *renderbuffers); typedef void (APIENTRYP PFNGLRENDERBUFFERSTORAGEPROC) (GLenum target, GLenum internalformat, GLsizei width, GLsizei height); typedef void (APIENTRYP PFNGLGETRENDERBUFFERPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params); typedef GLboolean (APIENTRYP PFNGLISFRAMEBUFFERPROC) (GLuint framebuffer); typedef void (APIENTRYP PFNGLBINDFRAMEBUFFERPROC) (GLenum target, GLuint framebuffer); typedef void (APIENTRYP PFNGLDELETEFRAMEBUFFERSPROC) (GLsizei n, const GLuint *framebuffers); typedef void (APIENTRYP PFNGLGENFRAMEBUFFERSPROC) (GLsizei n, GLuint *framebuffers); typedef GLenum (APIENTRYP PFNGLCHECKFRAMEBUFFERSTATUSPROC) (GLenum target); typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTURE1DPROC) (GLenum target, GLenum attachment, GLenum textarget, GLuint texture, GLint level); typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTURE2DPROC) (GLenum target, GLenum attachment, GLenum textarget, GLuint texture, GLint level); typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTURE3DPROC) (GLenum target, GLenum attachment, GLenum textarget, GLuint texture, GLint level, GLint zoffset); typedef void (APIENTRYP PFNGLFRAMEBUFFERRENDERBUFFERPROC) (GLenum target, GLenum attachment, GLenum renderbuffertarget, GLuint renderbuffer); typedef void (APIENTRYP PFNGLGETFRAMEBUFFERATTACHMENTPARAMETERIVPROC) (GLenum target, GLenum attachment, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGENERATEMIPMAPPROC) (GLenum target); typedef void (APIENTRYP PFNGLBLITFRAMEBUFFERPROC) (GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter); typedef void (APIENTRYP PFNGLRENDERBUFFERSTORAGEMULTISAMPLEPROC) (GLenum target, GLsizei samples, GLenum internalformat, GLsizei width, GLsizei height); typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTURELAYERPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level, GLint layer); #endif #ifndef GL_ARB_framebuffer_sRGB #define GL_ARB_framebuffer_sRGB 1 #endif #ifndef GL_ARB_geometry_shader4 #define GL_ARB_geometry_shader4 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glProgramParameteriARB (GLuint, GLenum, GLint); GLAPI void APIENTRY glFramebufferTextureARB (GLenum, GLenum, GLuint, GLint); GLAPI void APIENTRY glFramebufferTextureLayerARB (GLenum, GLenum, GLuint, GLint, GLint); GLAPI void APIENTRY glFramebufferTextureFaceARB (GLenum, GLenum, GLuint, GLint, GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLPROGRAMPARAMETERIARBPROC) (GLuint program, GLenum pname, GLint value); typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTUREARBPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level); typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTURELAYERARBPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level, GLint layer); typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTUREFACEARBPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level, GLenum face); #endif #ifndef GL_ARB_half_float_vertex #define GL_ARB_half_float_vertex 1 #endif #ifndef GL_ARB_instanced_arrays #define 
GL_ARB_instanced_arrays 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glVertexAttribDivisorARB (GLuint, GLuint); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLVERTEXATTRIBDIVISORARBPROC) (GLuint index, GLuint divisor); #endif #ifndef GL_ARB_map_buffer_range #define GL_ARB_map_buffer_range 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI GLvoid* APIENTRY glMapBufferRange (GLenum, GLintptr, GLsizeiptr, GLbitfield); GLAPI void APIENTRY glFlushMappedBufferRange (GLenum, GLintptr, GLsizeiptr); #endif /* GL_GLEXT_PROTOTYPES */ typedef GLvoid* (APIENTRYP PFNGLMAPBUFFERRANGEPROC) (GLenum target, GLintptr offset, GLsizeiptr length, GLbitfield access); typedef void (APIENTRYP PFNGLFLUSHMAPPEDBUFFERRANGEPROC) (GLenum target, GLintptr offset, GLsizeiptr length); #endif #ifndef GL_ARB_texture_buffer_object #define GL_ARB_texture_buffer_object 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glTexBufferARB (GLenum, GLenum, GLuint); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLTEXBUFFERARBPROC) (GLenum target, GLenum internalformat, GLuint buffer); #endif #ifndef GL_ARB_texture_compression_rgtc #define GL_ARB_texture_compression_rgtc 1 #endif #ifndef GL_ARB_texture_rg #define GL_ARB_texture_rg 1 #endif #ifndef GL_ARB_vertex_array_object #define GL_ARB_vertex_array_object 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glBindVertexArray (GLuint); GLAPI void APIENTRY glDeleteVertexArrays (GLsizei, const GLuint *); GLAPI void APIENTRY glGenVertexArrays (GLsizei, GLuint *); GLAPI GLboolean APIENTRY glIsVertexArray (GLuint); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLBINDVERTEXARRAYPROC) (GLuint array); typedef void (APIENTRYP PFNGLDELETEVERTEXARRAYSPROC) (GLsizei n, const GLuint *arrays); typedef void (APIENTRYP PFNGLGENVERTEXARRAYSPROC) (GLsizei n, GLuint *arrays); typedef GLboolean (APIENTRYP PFNGLISVERTEXARRAYPROC) (GLuint array); #endif #ifndef GL_ARB_uniform_buffer_object #define GL_ARB_uniform_buffer_object 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glGetUniformIndices (GLuint, GLsizei, const GLchar* *, GLuint *); GLAPI void APIENTRY glGetActiveUniformsiv (GLuint, GLsizei, const GLuint *, GLenum, GLint *); GLAPI void APIENTRY glGetActiveUniformName (GLuint, GLuint, GLsizei, GLsizei *, GLchar *); GLAPI GLuint APIENTRY glGetUniformBlockIndex (GLuint, const GLchar *); GLAPI void APIENTRY glGetActiveUniformBlockiv (GLuint, GLuint, GLenum, GLint *); GLAPI void APIENTRY glGetActiveUniformBlockName (GLuint, GLuint, GLsizei, GLsizei *, GLchar *); GLAPI void APIENTRY glUniformBlockBinding (GLuint, GLuint, GLuint); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLGETUNIFORMINDICESPROC) (GLuint program, GLsizei uniformCount, const GLchar* *uniformNames, GLuint *uniformIndices); typedef void (APIENTRYP PFNGLGETACTIVEUNIFORMSIVPROC) (GLuint program, GLsizei uniformCount, const GLuint *uniformIndices, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETACTIVEUNIFORMNAMEPROC) (GLuint program, GLuint uniformIndex, GLsizei bufSize, GLsizei *length, GLchar *uniformName); typedef GLuint (APIENTRYP PFNGLGETUNIFORMBLOCKINDEXPROC) (GLuint program, const GLchar *uniformBlockName); typedef void (APIENTRYP PFNGLGETACTIVEUNIFORMBLOCKIVPROC) (GLuint program, GLuint uniformBlockIndex, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETACTIVEUNIFORMBLOCKNAMEPROC) (GLuint program, GLuint uniformBlockIndex, GLsizei bufSize, GLsizei *length, GLchar *uniformBlockName); typedef void (APIENTRYP PFNGLUNIFORMBLOCKBINDINGPROC) (GLuint program, GLuint 
uniformBlockIndex, GLuint uniformBlockBinding); #endif #ifndef GL_ARB_compatibility #define GL_ARB_compatibility 1 #endif #ifndef GL_ARB_copy_buffer #define GL_ARB_copy_buffer 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glCopyBufferSubData (GLenum, GLenum, GLintptr, GLintptr, GLsizeiptr); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLCOPYBUFFERSUBDATAPROC) (GLenum readTarget, GLenum writeTarget, GLintptr readOffset, GLintptr writeOffset, GLsizeiptr size); #endif #ifndef GL_ARB_shader_texture_lod #define GL_ARB_shader_texture_lod 1 #endif #ifndef GL_ARB_depth_clamp #define GL_ARB_depth_clamp 1 #endif #ifndef GL_ARB_draw_elements_base_vertex #define GL_ARB_draw_elements_base_vertex 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glDrawElementsBaseVertex (GLenum, GLsizei, GLenum, const GLvoid *, GLint); GLAPI void APIENTRY glDrawRangeElementsBaseVertex (GLenum, GLuint, GLuint, GLsizei, GLenum, const GLvoid *, GLint); GLAPI void APIENTRY glDrawElementsInstancedBaseVertex (GLenum, GLsizei, GLenum, const GLvoid *, GLsizei, GLint); GLAPI void APIENTRY glMultiDrawElementsBaseVertex (GLenum, const GLsizei *, GLenum, const GLvoid* *, GLsizei, const GLint *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLDRAWELEMENTSBASEVERTEXPROC) (GLenum mode, GLsizei count, GLenum type, const GLvoid *indices, GLint basevertex); typedef void (APIENTRYP PFNGLDRAWRANGEELEMENTSBASEVERTEXPROC) (GLenum mode, GLuint start, GLuint end, GLsizei count, GLenum type, const GLvoid *indices, GLint basevertex); typedef void (APIENTRYP PFNGLDRAWELEMENTSINSTANCEDBASEVERTEXPROC) (GLenum mode, GLsizei count, GLenum type, const GLvoid *indices, GLsizei primcount, GLint basevertex); typedef void (APIENTRYP PFNGLMULTIDRAWELEMENTSBASEVERTEXPROC) (GLenum mode, const GLsizei *count, GLenum type, const GLvoid* *indices, GLsizei primcount, const GLint *basevertex); #endif #ifndef GL_ARB_fragment_coord_conventions #define GL_ARB_fragment_coord_conventions 1 #endif #ifndef GL_ARB_provoking_vertex #define GL_ARB_provoking_vertex 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glProvokingVertex (GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLPROVOKINGVERTEXPROC) (GLenum mode); #endif #ifndef GL_ARB_seamless_cube_map #define GL_ARB_seamless_cube_map 1 #endif #ifndef GL_ARB_sync #define GL_ARB_sync 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI GLsync APIENTRY glFenceSync (GLenum, GLbitfield); GLAPI GLboolean APIENTRY glIsSync (GLsync); GLAPI void APIENTRY glDeleteSync (GLsync); GLAPI GLenum APIENTRY glClientWaitSync (GLsync, GLbitfield, GLuint64); GLAPI void APIENTRY glWaitSync (GLsync, GLbitfield, GLuint64); GLAPI void APIENTRY glGetInteger64v (GLenum, GLint64 *); GLAPI void APIENTRY glGetSynciv (GLsync, GLenum, GLsizei, GLsizei *, GLint *); #endif /* GL_GLEXT_PROTOTYPES */ typedef GLsync (APIENTRYP PFNGLFENCESYNCPROC) (GLenum condition, GLbitfield flags); typedef GLboolean (APIENTRYP PFNGLISSYNCPROC) (GLsync sync); typedef void (APIENTRYP PFNGLDELETESYNCPROC) (GLsync sync); typedef GLenum (APIENTRYP PFNGLCLIENTWAITSYNCPROC) (GLsync sync, GLbitfield flags, GLuint64 timeout); typedef void (APIENTRYP PFNGLWAITSYNCPROC) (GLsync sync, GLbitfield flags, GLuint64 timeout); typedef void (APIENTRYP PFNGLGETINTEGER64VPROC) (GLenum pname, GLint64 *params); typedef void (APIENTRYP PFNGLGETSYNCIVPROC) (GLsync sync, GLenum pname, GLsizei bufSize, GLsizei *length, GLint *values); #endif #ifndef GL_ARB_texture_multisample #define GL_ARB_texture_multisample 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY 
glTexImage2DMultisample (GLenum, GLsizei, GLint, GLsizei, GLsizei, GLboolean); GLAPI void APIENTRY glTexImage3DMultisample (GLenum, GLsizei, GLint, GLsizei, GLsizei, GLsizei, GLboolean); GLAPI void APIENTRY glGetMultisamplefv (GLenum, GLuint, GLfloat *); GLAPI void APIENTRY glSampleMaski (GLuint, GLbitfield); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLTEXIMAGE2DMULTISAMPLEPROC) (GLenum target, GLsizei samples, GLint internalformat, GLsizei width, GLsizei height, GLboolean fixedsamplelocations); typedef void (APIENTRYP PFNGLTEXIMAGE3DMULTISAMPLEPROC) (GLenum target, GLsizei samples, GLint internalformat, GLsizei width, GLsizei height, GLsizei depth, GLboolean fixedsamplelocations); typedef void (APIENTRYP PFNGLGETMULTISAMPLEFVPROC) (GLenum pname, GLuint index, GLfloat *val); typedef void (APIENTRYP PFNGLSAMPLEMASKIPROC) (GLuint index, GLbitfield mask); #endif #ifndef GL_ARB_vertex_array_bgra #define GL_ARB_vertex_array_bgra 1 #endif #ifndef GL_ARB_draw_buffers_blend #define GL_ARB_draw_buffers_blend 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glBlendEquationi (GLuint, GLenum); GLAPI void APIENTRY glBlendEquationSeparatei (GLuint, GLenum, GLenum); GLAPI void APIENTRY glBlendFunci (GLuint, GLenum, GLenum); GLAPI void APIENTRY glBlendFuncSeparatei (GLuint, GLenum, GLenum, GLenum, GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLBLENDEQUATIONIPROC) (GLuint buf, GLenum mode); typedef void (APIENTRYP PFNGLBLENDEQUATIONSEPARATEIPROC) (GLuint buf, GLenum modeRGB, GLenum modeAlpha); typedef void (APIENTRYP PFNGLBLENDFUNCIPROC) (GLuint buf, GLenum src, GLenum dst); typedef void (APIENTRYP PFNGLBLENDFUNCSEPARATEIPROC) (GLuint buf, GLenum srcRGB, GLenum dstRGB, GLenum srcAlpha, GLenum dstAlpha); #endif #ifndef GL_ARB_sample_shading #define GL_ARB_sample_shading 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glMinSampleShading (GLclampf); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLMINSAMPLESHADINGPROC) (GLclampf value); #endif #ifndef GL_ARB_texture_cube_map_array #define GL_ARB_texture_cube_map_array 1 #endif #ifndef GL_ARB_texture_gather #define GL_ARB_texture_gather 1 #endif #ifndef GL_ARB_texture_query_lod #define GL_ARB_texture_query_lod 1 #endif #ifndef GL_EXT_abgr #define GL_EXT_abgr 1 #endif #ifndef GL_EXT_blend_color #define GL_EXT_blend_color 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glBlendColorEXT (GLclampf, GLclampf, GLclampf, GLclampf); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLBLENDCOLOREXTPROC) (GLclampf red, GLclampf green, GLclampf blue, GLclampf alpha); #endif #ifndef GL_EXT_polygon_offset #define GL_EXT_polygon_offset 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glPolygonOffsetEXT (GLfloat, GLfloat); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLPOLYGONOFFSETEXTPROC) (GLfloat factor, GLfloat bias); #endif #ifndef GL_EXT_texture #define GL_EXT_texture 1 #endif #ifndef GL_EXT_texture3D #define GL_EXT_texture3D 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glTexImage3DEXT (GLenum, GLint, GLenum, GLsizei, GLsizei, GLsizei, GLint, GLenum, GLenum, const GLvoid *); GLAPI void APIENTRY glTexSubImage3DEXT (GLenum, GLint, GLint, GLint, GLint, GLsizei, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLTEXIMAGE3DEXTPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLenum format, GLenum type, const GLvoid *pixels); typedef void (APIENTRYP 
PFNGLTEXSUBIMAGE3DEXTPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, const GLvoid *pixels); #endif #ifndef GL_SGIS_texture_filter4 #define GL_SGIS_texture_filter4 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glGetTexFilterFuncSGIS (GLenum, GLenum, GLfloat *); GLAPI void APIENTRY glTexFilterFuncSGIS (GLenum, GLenum, GLsizei, const GLfloat *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLGETTEXFILTERFUNCSGISPROC) (GLenum target, GLenum filter, GLfloat *weights); typedef void (APIENTRYP PFNGLTEXFILTERFUNCSGISPROC) (GLenum target, GLenum filter, GLsizei n, const GLfloat *weights); #endif #ifndef GL_EXT_subtexture #define GL_EXT_subtexture 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glTexSubImage1DEXT (GLenum, GLint, GLint, GLsizei, GLenum, GLenum, const GLvoid *); GLAPI void APIENTRY glTexSubImage2DEXT (GLenum, GLint, GLint, GLint, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLTEXSUBIMAGE1DEXTPROC) (GLenum target, GLint level, GLint xoffset, GLsizei width, GLenum format, GLenum type, const GLvoid *pixels); typedef void (APIENTRYP PFNGLTEXSUBIMAGE2DEXTPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLenum type, const GLvoid *pixels); #endif #ifndef GL_EXT_copy_texture #define GL_EXT_copy_texture 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glCopyTexImage1DEXT (GLenum, GLint, GLenum, GLint, GLint, GLsizei, GLint); GLAPI void APIENTRY glCopyTexImage2DEXT (GLenum, GLint, GLenum, GLint, GLint, GLsizei, GLsizei, GLint); GLAPI void APIENTRY glCopyTexSubImage1DEXT (GLenum, GLint, GLint, GLint, GLint, GLsizei); GLAPI void APIENTRY glCopyTexSubImage2DEXT (GLenum, GLint, GLint, GLint, GLint, GLint, GLsizei, GLsizei); GLAPI void APIENTRY glCopyTexSubImage3DEXT (GLenum, GLint, GLint, GLint, GLint, GLint, GLint, GLsizei, GLsizei); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLCOPYTEXIMAGE1DEXTPROC) (GLenum target, GLint level, GLenum internalformat, GLint x, GLint y, GLsizei width, GLint border); typedef void (APIENTRYP PFNGLCOPYTEXIMAGE2DEXTPROC) (GLenum target, GLint level, GLenum internalformat, GLint x, GLint y, GLsizei width, GLsizei height, GLint border); typedef void (APIENTRYP PFNGLCOPYTEXSUBIMAGE1DEXTPROC) (GLenum target, GLint level, GLint xoffset, GLint x, GLint y, GLsizei width); typedef void (APIENTRYP PFNGLCOPYTEXSUBIMAGE2DEXTPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint x, GLint y, GLsizei width, GLsizei height); typedef void (APIENTRYP PFNGLCOPYTEXSUBIMAGE3DEXTPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLint x, GLint y, GLsizei width, GLsizei height); #endif #ifndef GL_EXT_histogram #define GL_EXT_histogram 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glGetHistogramEXT (GLenum, GLboolean, GLenum, GLenum, GLvoid *); GLAPI void APIENTRY glGetHistogramParameterfvEXT (GLenum, GLenum, GLfloat *); GLAPI void APIENTRY glGetHistogramParameterivEXT (GLenum, GLenum, GLint *); GLAPI void APIENTRY glGetMinmaxEXT (GLenum, GLboolean, GLenum, GLenum, GLvoid *); GLAPI void APIENTRY glGetMinmaxParameterfvEXT (GLenum, GLenum, GLfloat *); GLAPI void APIENTRY glGetMinmaxParameterivEXT (GLenum, GLenum, GLint *); GLAPI void APIENTRY glHistogramEXT (GLenum, GLsizei, GLenum, GLboolean); GLAPI void APIENTRY glMinmaxEXT (GLenum, GLenum, GLboolean); GLAPI void APIENTRY 
glResetHistogramEXT (GLenum); GLAPI void APIENTRY glResetMinmaxEXT (GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLGETHISTOGRAMEXTPROC) (GLenum target, GLboolean reset, GLenum format, GLenum type, GLvoid *values); typedef void (APIENTRYP PFNGLGETHISTOGRAMPARAMETERFVEXTPROC) (GLenum target, GLenum pname, GLfloat *params); typedef void (APIENTRYP PFNGLGETHISTOGRAMPARAMETERIVEXTPROC) (GLenum target, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETMINMAXEXTPROC) (GLenum target, GLboolean reset, GLenum format, GLenum type, GLvoid *values); typedef void (APIENTRYP PFNGLGETMINMAXPARAMETERFVEXTPROC) (GLenum target, GLenum pname, GLfloat *params); typedef void (APIENTRYP PFNGLGETMINMAXPARAMETERIVEXTPROC) (GLenum target, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLHISTOGRAMEXTPROC) (GLenum target, GLsizei width, GLenum internalformat, GLboolean sink); typedef void (APIENTRYP PFNGLMINMAXEXTPROC) (GLenum target, GLenum internalformat, GLboolean sink); typedef void (APIENTRYP PFNGLRESETHISTOGRAMEXTPROC) (GLenum target); typedef void (APIENTRYP PFNGLRESETMINMAXEXTPROC) (GLenum target); #endif #ifndef GL_EXT_convolution #define GL_EXT_convolution 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glConvolutionFilter1DEXT (GLenum, GLenum, GLsizei, GLenum, GLenum, const GLvoid *); GLAPI void APIENTRY glConvolutionFilter2DEXT (GLenum, GLenum, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *); GLAPI void APIENTRY glConvolutionParameterfEXT (GLenum, GLenum, GLfloat); GLAPI void APIENTRY glConvolutionParameterfvEXT (GLenum, GLenum, const GLfloat *); GLAPI void APIENTRY glConvolutionParameteriEXT (GLenum, GLenum, GLint); GLAPI void APIENTRY glConvolutionParameterivEXT (GLenum, GLenum, const GLint *); GLAPI void APIENTRY glCopyConvolutionFilter1DEXT (GLenum, GLenum, GLint, GLint, GLsizei); GLAPI void APIENTRY glCopyConvolutionFilter2DEXT (GLenum, GLenum, GLint, GLint, GLsizei, GLsizei); GLAPI void APIENTRY glGetConvolutionFilterEXT (GLenum, GLenum, GLenum, GLvoid *); GLAPI void APIENTRY glGetConvolutionParameterfvEXT (GLenum, GLenum, GLfloat *); GLAPI void APIENTRY glGetConvolutionParameterivEXT (GLenum, GLenum, GLint *); GLAPI void APIENTRY glGetSeparableFilterEXT (GLenum, GLenum, GLenum, GLvoid *, GLvoid *, GLvoid *); GLAPI void APIENTRY glSeparableFilter2DEXT (GLenum, GLenum, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *, const GLvoid *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLCONVOLUTIONFILTER1DEXTPROC) (GLenum target, GLenum internalformat, GLsizei width, GLenum format, GLenum type, const GLvoid *image); typedef void (APIENTRYP PFNGLCONVOLUTIONFILTER2DEXTPROC) (GLenum target, GLenum internalformat, GLsizei width, GLsizei height, GLenum format, GLenum type, const GLvoid *image); typedef void (APIENTRYP PFNGLCONVOLUTIONPARAMETERFEXTPROC) (GLenum target, GLenum pname, GLfloat params); typedef void (APIENTRYP PFNGLCONVOLUTIONPARAMETERFVEXTPROC) (GLenum target, GLenum pname, const GLfloat *params); typedef void (APIENTRYP PFNGLCONVOLUTIONPARAMETERIEXTPROC) (GLenum target, GLenum pname, GLint params); typedef void (APIENTRYP PFNGLCONVOLUTIONPARAMETERIVEXTPROC) (GLenum target, GLenum pname, const GLint *params); typedef void (APIENTRYP PFNGLCOPYCONVOLUTIONFILTER1DEXTPROC) (GLenum target, GLenum internalformat, GLint x, GLint y, GLsizei width); typedef void (APIENTRYP PFNGLCOPYCONVOLUTIONFILTER2DEXTPROC) (GLenum target, GLenum internalformat, GLint x, GLint y, GLsizei width, GLsizei height); typedef void (APIENTRYP 
PFNGLGETCONVOLUTIONFILTEREXTPROC) (GLenum target, GLenum format, GLenum type, GLvoid *image); typedef void (APIENTRYP PFNGLGETCONVOLUTIONPARAMETERFVEXTPROC) (GLenum target, GLenum pname, GLfloat *params); typedef void (APIENTRYP PFNGLGETCONVOLUTIONPARAMETERIVEXTPROC) (GLenum target, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETSEPARABLEFILTEREXTPROC) (GLenum target, GLenum format, GLenum type, GLvoid *row, GLvoid *column, GLvoid *span); typedef void (APIENTRYP PFNGLSEPARABLEFILTER2DEXTPROC) (GLenum target, GLenum internalformat, GLsizei width, GLsizei height, GLenum format, GLenum type, const GLvoid *row, const GLvoid *column); #endif #ifndef GL_SGI_color_matrix #define GL_SGI_color_matrix 1 #endif #ifndef GL_SGI_color_table #define GL_SGI_color_table 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glColorTableSGI (GLenum, GLenum, GLsizei, GLenum, GLenum, const GLvoid *); GLAPI void APIENTRY glColorTableParameterfvSGI (GLenum, GLenum, const GLfloat *); GLAPI void APIENTRY glColorTableParameterivSGI (GLenum, GLenum, const GLint *); GLAPI void APIENTRY glCopyColorTableSGI (GLenum, GLenum, GLint, GLint, GLsizei); GLAPI void APIENTRY glGetColorTableSGI (GLenum, GLenum, GLenum, GLvoid *); GLAPI void APIENTRY glGetColorTableParameterfvSGI (GLenum, GLenum, GLfloat *); GLAPI void APIENTRY glGetColorTableParameterivSGI (GLenum, GLenum, GLint *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLCOLORTABLESGIPROC) (GLenum target, GLenum internalformat, GLsizei width, GLenum format, GLenum type, const GLvoid *table); typedef void (APIENTRYP PFNGLCOLORTABLEPARAMETERFVSGIPROC) (GLenum target, GLenum pname, const GLfloat *params); typedef void (APIENTRYP PFNGLCOLORTABLEPARAMETERIVSGIPROC) (GLenum target, GLenum pname, const GLint *params); typedef void (APIENTRYP PFNGLCOPYCOLORTABLESGIPROC) (GLenum target, GLenum internalformat, GLint x, GLint y, GLsizei width); typedef void (APIENTRYP PFNGLGETCOLORTABLESGIPROC) (GLenum target, GLenum format, GLenum type, GLvoid *table); typedef void (APIENTRYP PFNGLGETCOLORTABLEPARAMETERFVSGIPROC) (GLenum target, GLenum pname, GLfloat *params); typedef void (APIENTRYP PFNGLGETCOLORTABLEPARAMETERIVSGIPROC) (GLenum target, GLenum pname, GLint *params); #endif #ifndef GL_SGIX_pixel_texture #define GL_SGIX_pixel_texture 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glPixelTexGenSGIX (GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLPIXELTEXGENSGIXPROC) (GLenum mode); #endif #ifndef GL_SGIS_pixel_texture #define GL_SGIS_pixel_texture 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glPixelTexGenParameteriSGIS (GLenum, GLint); GLAPI void APIENTRY glPixelTexGenParameterivSGIS (GLenum, const GLint *); GLAPI void APIENTRY glPixelTexGenParameterfSGIS (GLenum, GLfloat); GLAPI void APIENTRY glPixelTexGenParameterfvSGIS (GLenum, const GLfloat *); GLAPI void APIENTRY glGetPixelTexGenParameterivSGIS (GLenum, GLint *); GLAPI void APIENTRY glGetPixelTexGenParameterfvSGIS (GLenum, GLfloat *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLPIXELTEXGENPARAMETERISGISPROC) (GLenum pname, GLint param); typedef void (APIENTRYP PFNGLPIXELTEXGENPARAMETERIVSGISPROC) (GLenum pname, const GLint *params); typedef void (APIENTRYP PFNGLPIXELTEXGENPARAMETERFSGISPROC) (GLenum pname, GLfloat param); typedef void (APIENTRYP PFNGLPIXELTEXGENPARAMETERFVSGISPROC) (GLenum pname, const GLfloat *params); typedef void (APIENTRYP PFNGLGETPIXELTEXGENPARAMETERIVSGISPROC) (GLenum pname, GLint *params); typedef void (APIENTRYP 
PFNGLGETPIXELTEXGENPARAMETERFVSGISPROC) (GLenum pname, GLfloat *params); #endif #ifndef GL_SGIS_texture4D #define GL_SGIS_texture4D 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glTexImage4DSGIS (GLenum, GLint, GLenum, GLsizei, GLsizei, GLsizei, GLsizei, GLint, GLenum, GLenum, const GLvoid *); GLAPI void APIENTRY glTexSubImage4DSGIS (GLenum, GLint, GLint, GLint, GLint, GLint, GLsizei, GLsizei, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLTEXIMAGE4DSGISPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLsizei size4d, GLint border, GLenum format, GLenum type, const GLvoid *pixels); typedef void (APIENTRYP PFNGLTEXSUBIMAGE4DSGISPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLint woffset, GLsizei width, GLsizei height, GLsizei depth, GLsizei size4d, GLenum format, GLenum type, const GLvoid *pixels); #endif #ifndef GL_SGI_texture_color_table #define GL_SGI_texture_color_table 1 #endif #ifndef GL_EXT_cmyka #define GL_EXT_cmyka 1 #endif #ifndef GL_EXT_texture_object #define GL_EXT_texture_object 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI GLboolean APIENTRY glAreTexturesResidentEXT (GLsizei, const GLuint *, GLboolean *); GLAPI void APIENTRY glBindTextureEXT (GLenum, GLuint); GLAPI void APIENTRY glDeleteTexturesEXT (GLsizei, const GLuint *); GLAPI void APIENTRY glGenTexturesEXT (GLsizei, GLuint *); GLAPI GLboolean APIENTRY glIsTextureEXT (GLuint); GLAPI void APIENTRY glPrioritizeTexturesEXT (GLsizei, const GLuint *, const GLclampf *); #endif /* GL_GLEXT_PROTOTYPES */ typedef GLboolean (APIENTRYP PFNGLARETEXTURESRESIDENTEXTPROC) (GLsizei n, const GLuint *textures, GLboolean *residences); typedef void (APIENTRYP PFNGLBINDTEXTUREEXTPROC) (GLenum target, GLuint texture); typedef void (APIENTRYP PFNGLDELETETEXTURESEXTPROC) (GLsizei n, const GLuint *textures); typedef void (APIENTRYP PFNGLGENTEXTURESEXTPROC) (GLsizei n, GLuint *textures); typedef GLboolean (APIENTRYP PFNGLISTEXTUREEXTPROC) (GLuint texture); typedef void (APIENTRYP PFNGLPRIORITIZETEXTURESEXTPROC) (GLsizei n, const GLuint *textures, const GLclampf *priorities); #endif #ifndef GL_SGIS_detail_texture #define GL_SGIS_detail_texture 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glDetailTexFuncSGIS (GLenum, GLsizei, const GLfloat *); GLAPI void APIENTRY glGetDetailTexFuncSGIS (GLenum, GLfloat *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLDETAILTEXFUNCSGISPROC) (GLenum target, GLsizei n, const GLfloat *points); typedef void (APIENTRYP PFNGLGETDETAILTEXFUNCSGISPROC) (GLenum target, GLfloat *points); #endif #ifndef GL_SGIS_sharpen_texture #define GL_SGIS_sharpen_texture 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glSharpenTexFuncSGIS (GLenum, GLsizei, const GLfloat *); GLAPI void APIENTRY glGetSharpenTexFuncSGIS (GLenum, GLfloat *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLSHARPENTEXFUNCSGISPROC) (GLenum target, GLsizei n, const GLfloat *points); typedef void (APIENTRYP PFNGLGETSHARPENTEXFUNCSGISPROC) (GLenum target, GLfloat *points); #endif #ifndef GL_EXT_packed_pixels #define GL_EXT_packed_pixels 1 #endif #ifndef GL_SGIS_texture_lod #define GL_SGIS_texture_lod 1 #endif #ifndef GL_SGIS_multisample #define GL_SGIS_multisample 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glSampleMaskSGIS (GLclampf, GLboolean); GLAPI void APIENTRY glSamplePatternSGIS (GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLSAMPLEMASKSGISPROC) 
(GLclampf value, GLboolean invert); typedef void (APIENTRYP PFNGLSAMPLEPATTERNSGISPROC) (GLenum pattern); #endif #ifndef GL_EXT_rescale_normal #define GL_EXT_rescale_normal 1 #endif #ifndef GL_EXT_vertex_array #define GL_EXT_vertex_array 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glArrayElementEXT (GLint); GLAPI void APIENTRY glColorPointerEXT (GLint, GLenum, GLsizei, GLsizei, const GLvoid *); GLAPI void APIENTRY glDrawArraysEXT (GLenum, GLint, GLsizei); GLAPI void APIENTRY glEdgeFlagPointerEXT (GLsizei, GLsizei, const GLboolean *); GLAPI void APIENTRY glGetPointervEXT (GLenum, GLvoid* *); GLAPI void APIENTRY glIndexPointerEXT (GLenum, GLsizei, GLsizei, const GLvoid *); GLAPI void APIENTRY glNormalPointerEXT (GLenum, GLsizei, GLsizei, const GLvoid *); GLAPI void APIENTRY glTexCoordPointerEXT (GLint, GLenum, GLsizei, GLsizei, const GLvoid *); GLAPI void APIENTRY glVertexPointerEXT (GLint, GLenum, GLsizei, GLsizei, const GLvoid *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLARRAYELEMENTEXTPROC) (GLint i); typedef void (APIENTRYP PFNGLCOLORPOINTEREXTPROC) (GLint size, GLenum type, GLsizei stride, GLsizei count, const GLvoid *pointer); typedef void (APIENTRYP PFNGLDRAWARRAYSEXTPROC) (GLenum mode, GLint first, GLsizei count); typedef void (APIENTRYP PFNGLEDGEFLAGPOINTEREXTPROC) (GLsizei stride, GLsizei count, const GLboolean *pointer); typedef void (APIENTRYP PFNGLGETPOINTERVEXTPROC) (GLenum pname, GLvoid* *params); typedef void (APIENTRYP PFNGLINDEXPOINTEREXTPROC) (GLenum type, GLsizei stride, GLsizei count, const GLvoid *pointer); typedef void (APIENTRYP PFNGLNORMALPOINTEREXTPROC) (GLenum type, GLsizei stride, GLsizei count, const GLvoid *pointer); typedef void (APIENTRYP PFNGLTEXCOORDPOINTEREXTPROC) (GLint size, GLenum type, GLsizei stride, GLsizei count, const GLvoid *pointer); typedef void (APIENTRYP PFNGLVERTEXPOINTEREXTPROC) (GLint size, GLenum type, GLsizei stride, GLsizei count, const GLvoid *pointer); #endif #ifndef GL_EXT_misc_attribute #define GL_EXT_misc_attribute 1 #endif #ifndef GL_SGIS_generate_mipmap #define GL_SGIS_generate_mipmap 1 #endif #ifndef GL_SGIX_clipmap #define GL_SGIX_clipmap 1 #endif #ifndef GL_SGIX_shadow #define GL_SGIX_shadow 1 #endif #ifndef GL_SGIS_texture_edge_clamp #define GL_SGIS_texture_edge_clamp 1 #endif #ifndef GL_SGIS_texture_border_clamp #define GL_SGIS_texture_border_clamp 1 #endif #ifndef GL_EXT_blend_minmax #define GL_EXT_blend_minmax 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glBlendEquationEXT (GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLBLENDEQUATIONEXTPROC) (GLenum mode); #endif #ifndef GL_EXT_blend_subtract #define GL_EXT_blend_subtract 1 #endif #ifndef GL_EXT_blend_logic_op #define GL_EXT_blend_logic_op 1 #endif #ifndef GL_SGIX_interlace #define GL_SGIX_interlace 1 #endif #ifndef GL_SGIX_pixel_tiles #define GL_SGIX_pixel_tiles 1 #endif #ifndef GL_SGIX_texture_select #define GL_SGIX_texture_select 1 #endif #ifndef GL_SGIX_sprite #define GL_SGIX_sprite 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glSpriteParameterfSGIX (GLenum, GLfloat); GLAPI void APIENTRY glSpriteParameterfvSGIX (GLenum, const GLfloat *); GLAPI void APIENTRY glSpriteParameteriSGIX (GLenum, GLint); GLAPI void APIENTRY glSpriteParameterivSGIX (GLenum, const GLint *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLSPRITEPARAMETERFSGIXPROC) (GLenum pname, GLfloat param); typedef void (APIENTRYP PFNGLSPRITEPARAMETERFVSGIXPROC) (GLenum pname, const GLfloat *params); typedef void (APIENTRYP 
#ifndef GL_SGIX_sprite
#define GL_SGIX_sprite 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glSpriteParameterfSGIX (GLenum, GLfloat);
GLAPI void APIENTRY glSpriteParameterfvSGIX (GLenum, const GLfloat *);
GLAPI void APIENTRY glSpriteParameteriSGIX (GLenum, GLint);
GLAPI void APIENTRY glSpriteParameterivSGIX (GLenum, const GLint *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLSPRITEPARAMETERFSGIXPROC) (GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLSPRITEPARAMETERFVSGIXPROC) (GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLSPRITEPARAMETERISGIXPROC) (GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLSPRITEPARAMETERIVSGIXPROC) (GLenum pname, const GLint *params);
#endif

#ifndef GL_SGIX_texture_multi_buffer
#define GL_SGIX_texture_multi_buffer 1
#endif

#ifndef GL_EXT_point_parameters
#define GL_EXT_point_parameters 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glPointParameterfEXT (GLenum, GLfloat);
GLAPI void APIENTRY glPointParameterfvEXT (GLenum, const GLfloat *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPOINTPARAMETERFEXTPROC) (GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLPOINTPARAMETERFVEXTPROC) (GLenum pname, const GLfloat *params);
#endif

#ifndef GL_SGIS_point_parameters
#define GL_SGIS_point_parameters 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glPointParameterfSGIS (GLenum, GLfloat);
GLAPI void APIENTRY glPointParameterfvSGIS (GLenum, const GLfloat *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPOINTPARAMETERFSGISPROC) (GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLPOINTPARAMETERFVSGISPROC) (GLenum pname, const GLfloat *params);
#endif

#ifndef GL_SGIX_instruments
#define GL_SGIX_instruments 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI GLint APIENTRY glGetInstrumentsSGIX (void);
GLAPI void APIENTRY glInstrumentsBufferSGIX (GLsizei, GLint *);
GLAPI GLint APIENTRY glPollInstrumentsSGIX (GLint *);
GLAPI void APIENTRY glReadInstrumentsSGIX (GLint);
GLAPI void APIENTRY glStartInstrumentsSGIX (void);
GLAPI void APIENTRY glStopInstrumentsSGIX (GLint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef GLint (APIENTRYP PFNGLGETINSTRUMENTSSGIXPROC) (void);
typedef void (APIENTRYP PFNGLINSTRUMENTSBUFFERSGIXPROC) (GLsizei size, GLint *buffer);
typedef GLint (APIENTRYP PFNGLPOLLINSTRUMENTSSGIXPROC) (GLint *marker_p);
typedef void (APIENTRYP PFNGLREADINSTRUMENTSSGIXPROC) (GLint marker);
typedef void (APIENTRYP PFNGLSTARTINSTRUMENTSSGIXPROC) (void);
typedef void (APIENTRYP PFNGLSTOPINSTRUMENTSSGIXPROC) (GLint marker);
#endif

#ifndef GL_SGIX_texture_scale_bias
#define GL_SGIX_texture_scale_bias 1
#endif

#ifndef GL_SGIX_framezoom
#define GL_SGIX_framezoom 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glFrameZoomSGIX (GLint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLFRAMEZOOMSGIXPROC) (GLint factor);
#endif

#ifndef GL_SGIX_tag_sample_buffer
#define GL_SGIX_tag_sample_buffer 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glTagSampleBufferSGIX (void);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLTAGSAMPLEBUFFERSGIXPROC) (void);
#endif
#ifndef GL_SGIX_polynomial_ffd
#define GL_SGIX_polynomial_ffd 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glDeformationMap3dSGIX (GLenum, GLdouble, GLdouble, GLint, GLint, GLdouble, GLdouble, GLint, GLint, GLdouble, GLdouble, GLint, GLint, const GLdouble *);
GLAPI void APIENTRY glDeformationMap3fSGIX (GLenum, GLfloat, GLfloat, GLint, GLint, GLfloat, GLfloat, GLint, GLint, GLfloat, GLfloat, GLint, GLint, const GLfloat *);
GLAPI void APIENTRY glDeformSGIX (GLbitfield);
GLAPI void APIENTRY glLoadIdentityDeformationMapSGIX (GLbitfield);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLDEFORMATIONMAP3DSGIXPROC) (GLenum target, GLdouble u1, GLdouble u2, GLint ustride, GLint uorder, GLdouble v1, GLdouble v2, GLint vstride, GLint vorder, GLdouble w1, GLdouble w2, GLint wstride, GLint worder, const GLdouble *points);
typedef void (APIENTRYP PFNGLDEFORMATIONMAP3FSGIXPROC) (GLenum target, GLfloat u1, GLfloat u2, GLint ustride, GLint uorder, GLfloat v1, GLfloat v2, GLint vstride, GLint vorder, GLfloat w1, GLfloat w2, GLint wstride, GLint worder, const GLfloat *points);
typedef void (APIENTRYP PFNGLDEFORMSGIXPROC) (GLbitfield mask);
typedef void (APIENTRYP PFNGLLOADIDENTITYDEFORMATIONMAPSGIXPROC) (GLbitfield mask);
#endif

#ifndef GL_SGIX_reference_plane
#define GL_SGIX_reference_plane 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glReferencePlaneSGIX (const GLdouble *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLREFERENCEPLANESGIXPROC) (const GLdouble *equation);
#endif

#ifndef GL_SGIX_flush_raster
#define GL_SGIX_flush_raster 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glFlushRasterSGIX (void);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLFLUSHRASTERSGIXPROC) (void);
#endif

#ifndef GL_SGIX_depth_texture
#define GL_SGIX_depth_texture 1
#endif

#ifndef GL_SGIS_fog_function
#define GL_SGIS_fog_function 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glFogFuncSGIS (GLsizei, const GLfloat *);
GLAPI void APIENTRY glGetFogFuncSGIS (GLfloat *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLFOGFUNCSGISPROC) (GLsizei n, const GLfloat *points);
typedef void (APIENTRYP PFNGLGETFOGFUNCSGISPROC) (GLfloat *points);
#endif

#ifndef GL_SGIX_fog_offset
#define GL_SGIX_fog_offset 1
#endif

#ifndef GL_HP_image_transform
#define GL_HP_image_transform 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glImageTransformParameteriHP (GLenum, GLenum, GLint);
GLAPI void APIENTRY glImageTransformParameterfHP (GLenum, GLenum, GLfloat);
GLAPI void APIENTRY glImageTransformParameterivHP (GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glImageTransformParameterfvHP (GLenum, GLenum, const GLfloat *);
GLAPI void APIENTRY glGetImageTransformParameterivHP (GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetImageTransformParameterfvHP (GLenum, GLenum, GLfloat *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLIMAGETRANSFORMPARAMETERIHPPROC) (GLenum target, GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLIMAGETRANSFORMPARAMETERFHPPROC) (GLenum target, GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLIMAGETRANSFORMPARAMETERIVHPPROC) (GLenum target, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLIMAGETRANSFORMPARAMETERFVHPPROC) (GLenum target, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLGETIMAGETRANSFORMPARAMETERIVHPPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETIMAGETRANSFORMPARAMETERFVHPPROC) (GLenum target, GLenum pname, GLfloat *params);
#endif

#ifndef GL_HP_convolution_border_modes
#define GL_HP_convolution_border_modes 1
#endif

#ifndef GL_SGIX_texture_add_env
#define GL_SGIX_texture_add_env 1
#endif

#ifndef GL_EXT_color_subtable
#define GL_EXT_color_subtable 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glColorSubTableEXT (GLenum, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glCopyColorSubTableEXT (GLenum, GLsizei, GLint, GLint, GLsizei);
#endif /* GL_GLEXT_PROTOTYPES */
/* typedef void (APIENTRYP PFNGLCOLORSUBTABLEEXTPROC) (GLenum target, GLsizei start, GLsizei count, GLenum format, GLenum type, const GLvoid *data); */
typedef void (APIENTRYP PFNGLCOPYCOLORSUBTABLEEXTPROC) (GLenum target, GLsizei start, GLint x, GLint y, GLsizei width);
#endif

#ifndef GL_PGI_vertex_hints
#define GL_PGI_vertex_hints 1
#endif
#ifndef GL_PGI_misc_hints
#define GL_PGI_misc_hints 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glHintPGI (GLenum, GLint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLHINTPGIPROC) (GLenum target, GLint mode);
#endif

#ifndef GL_EXT_paletted_texture
#define GL_EXT_paletted_texture 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glColorTableEXT (GLenum, GLenum, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glGetColorTableEXT (GLenum, GLenum, GLenum, GLvoid *);
GLAPI void APIENTRY glGetColorTableParameterivEXT (GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetColorTableParameterfvEXT (GLenum, GLenum, GLfloat *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLCOLORTABLEEXTPROC) (GLenum target, GLenum internalFormat, GLsizei width, GLenum format, GLenum type, const GLvoid *table);
typedef void (APIENTRYP PFNGLGETCOLORTABLEEXTPROC) (GLenum target, GLenum format, GLenum type, GLvoid *data);
typedef void (APIENTRYP PFNGLGETCOLORTABLEPARAMETERIVEXTPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETCOLORTABLEPARAMETERFVEXTPROC) (GLenum target, GLenum pname, GLfloat *params);
#endif

#ifndef GL_EXT_clip_volume_hint
#define GL_EXT_clip_volume_hint 1
#endif

#ifndef GL_SGIX_list_priority
#define GL_SGIX_list_priority 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glGetListParameterfvSGIX (GLuint, GLenum, GLfloat *);
GLAPI void APIENTRY glGetListParameterivSGIX (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glListParameterfSGIX (GLuint, GLenum, GLfloat);
GLAPI void APIENTRY glListParameterfvSGIX (GLuint, GLenum, const GLfloat *);
GLAPI void APIENTRY glListParameteriSGIX (GLuint, GLenum, GLint);
GLAPI void APIENTRY glListParameterivSGIX (GLuint, GLenum, const GLint *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLGETLISTPARAMETERFVSGIXPROC) (GLuint list, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETLISTPARAMETERIVSGIXPROC) (GLuint list, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLLISTPARAMETERFSGIXPROC) (GLuint list, GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLLISTPARAMETERFVSGIXPROC) (GLuint list, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLLISTPARAMETERISGIXPROC) (GLuint list, GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLLISTPARAMETERIVSGIXPROC) (GLuint list, GLenum pname, const GLint *params);
#endif

#ifndef GL_SGIX_ir_instrument1
#define GL_SGIX_ir_instrument1 1
#endif

#ifndef GL_SGIX_calligraphic_fragment
#define GL_SGIX_calligraphic_fragment 1
#endif

#ifndef GL_SGIX_texture_lod_bias
#define GL_SGIX_texture_lod_bias 1
#endif

#ifndef GL_SGIX_shadow_ambient
#define GL_SGIX_shadow_ambient 1
#endif

#ifndef GL_EXT_index_texture
#define GL_EXT_index_texture 1
#endif

#ifndef GL_EXT_index_material
#define GL_EXT_index_material 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glIndexMaterialEXT (GLenum, GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLINDEXMATERIALEXTPROC) (GLenum face, GLenum mode);
#endif

#ifndef GL_EXT_index_func
#define GL_EXT_index_func 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glIndexFuncEXT (GLenum, GLclampf);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLINDEXFUNCEXTPROC) (GLenum func, GLclampf ref);
#endif

#ifndef GL_EXT_index_array_formats
#define GL_EXT_index_array_formats 1
#endif
#ifndef GL_EXT_compiled_vertex_array
#define GL_EXT_compiled_vertex_array 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glLockArraysEXT (GLint, GLsizei);
GLAPI void APIENTRY glUnlockArraysEXT (void);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLLOCKARRAYSEXTPROC) (GLint first, GLsizei count);
typedef void (APIENTRYP PFNGLUNLOCKARRAYSEXTPROC) (void);
#endif

#ifndef GL_EXT_cull_vertex
#define GL_EXT_cull_vertex 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glCullParameterdvEXT (GLenum, GLdouble *);
GLAPI void APIENTRY glCullParameterfvEXT (GLenum, GLfloat *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLCULLPARAMETERDVEXTPROC) (GLenum pname, GLdouble *params);
typedef void (APIENTRYP PFNGLCULLPARAMETERFVEXTPROC) (GLenum pname, GLfloat *params);
#endif

#ifndef GL_SGIX_ycrcb
#define GL_SGIX_ycrcb 1
#endif
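/*
 * Illustrative note (not part of the original header): a minimal sketch of
 * EXT_compiled_vertex_array, which lets the driver cache the currently
 * enabled vertex arrays between the Lock/Unlock pair. The pointer names are
 * local illustrations, loaded as in the earlier sketch:
 */
#if 0
static PFNGLLOCKARRAYSEXTPROC   pglLockArraysEXT   = NULL;
static PFNGLUNLOCKARRAYSEXTPROC pglUnlockArraysEXT = NULL;

static void draw_mesh_twice(GLsizei vertex_count)
{
    /* Arrays must not change while locked; repeated draws can reuse them. */
    pglLockArraysEXT(0, vertex_count);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count); /* e.g. a second pass */
    pglUnlockArraysEXT();
}
#endif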
#ifndef GL_SGIX_fragment_lighting
#define GL_SGIX_fragment_lighting 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glFragmentColorMaterialSGIX (GLenum, GLenum);
GLAPI void APIENTRY glFragmentLightfSGIX (GLenum, GLenum, GLfloat);
GLAPI void APIENTRY glFragmentLightfvSGIX (GLenum, GLenum, const GLfloat *);
GLAPI void APIENTRY glFragmentLightiSGIX (GLenum, GLenum, GLint);
GLAPI void APIENTRY glFragmentLightivSGIX (GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glFragmentLightModelfSGIX (GLenum, GLfloat);
GLAPI void APIENTRY glFragmentLightModelfvSGIX (GLenum, const GLfloat *);
GLAPI void APIENTRY glFragmentLightModeliSGIX (GLenum, GLint);
GLAPI void APIENTRY glFragmentLightModelivSGIX (GLenum, const GLint *);
GLAPI void APIENTRY glFragmentMaterialfSGIX (GLenum, GLenum, GLfloat);
GLAPI void APIENTRY glFragmentMaterialfvSGIX (GLenum, GLenum, const GLfloat *);
GLAPI void APIENTRY glFragmentMaterialiSGIX (GLenum, GLenum, GLint);
GLAPI void APIENTRY glFragmentMaterialivSGIX (GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glGetFragmentLightfvSGIX (GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetFragmentLightivSGIX (GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetFragmentMaterialfvSGIX (GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetFragmentMaterialivSGIX (GLenum, GLenum, GLint *);
GLAPI void APIENTRY glLightEnviSGIX (GLenum, GLint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLFRAGMENTCOLORMATERIALSGIXPROC) (GLenum face, GLenum mode);
typedef void (APIENTRYP PFNGLFRAGMENTLIGHTFSGIXPROC) (GLenum light, GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLFRAGMENTLIGHTFVSGIXPROC) (GLenum light, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLFRAGMENTLIGHTISGIXPROC) (GLenum light, GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLFRAGMENTLIGHTIVSGIXPROC) (GLenum light, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLFRAGMENTLIGHTMODELFSGIXPROC) (GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLFRAGMENTLIGHTMODELFVSGIXPROC) (GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLFRAGMENTLIGHTMODELISGIXPROC) (GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLFRAGMENTLIGHTMODELIVSGIXPROC) (GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLFRAGMENTMATERIALFSGIXPROC) (GLenum face, GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLFRAGMENTMATERIALFVSGIXPROC) (GLenum face, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLFRAGMENTMATERIALISGIXPROC) (GLenum face, GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLFRAGMENTMATERIALIVSGIXPROC) (GLenum face, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLGETFRAGMENTLIGHTFVSGIXPROC) (GLenum light, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETFRAGMENTLIGHTIVSGIXPROC) (GLenum light, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETFRAGMENTMATERIALFVSGIXPROC) (GLenum face, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETFRAGMENTMATERIALIVSGIXPROC) (GLenum face, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLLIGHTENVISGIXPROC) (GLenum pname, GLint param);
#endif

#ifndef GL_IBM_rasterpos_clip
#define GL_IBM_rasterpos_clip 1
#endif

#ifndef GL_HP_texture_lighting
#define GL_HP_texture_lighting 1
#endif

#ifndef GL_EXT_draw_range_elements
#define GL_EXT_draw_range_elements 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glDrawRangeElementsEXT (GLenum, GLuint, GLuint, GLsizei, GLenum, const GLvoid *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLDRAWRANGEELEMENTSEXTPROC) (GLenum mode, GLuint start, GLuint end, GLsizei count, GLenum type, const GLvoid *indices);
#endif

#ifndef GL_WIN_phong_shading
#define GL_WIN_phong_shading 1
#endif

#ifndef GL_WIN_specular_fog
#define GL_WIN_specular_fog 1
#endif

#ifndef GL_EXT_light_texture
#define GL_EXT_light_texture 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glApplyTextureEXT (GLenum);
GLAPI void APIENTRY glTextureLightEXT (GLenum);
GLAPI void APIENTRY glTextureMaterialEXT (GLenum, GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLAPPLYTEXTUREEXTPROC) (GLenum mode);
typedef void (APIENTRYP PFNGLTEXTURELIGHTEXTPROC) (GLenum pname);
typedef void (APIENTRYP PFNGLTEXTUREMATERIALEXTPROC) (GLenum face, GLenum mode);
#endif

#ifndef GL_SGIX_blend_alpha_minmax
#define GL_SGIX_blend_alpha_minmax 1
#endif

#ifndef GL_EXT_bgra
#define GL_EXT_bgra 1
#endif

#ifndef GL_SGIX_async
#define GL_SGIX_async 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glAsyncMarkerSGIX (GLuint);
GLAPI GLint APIENTRY glFinishAsyncSGIX (GLuint *);
GLAPI GLint APIENTRY glPollAsyncSGIX (GLuint *);
GLAPI GLuint APIENTRY glGenAsyncMarkersSGIX (GLsizei);
GLAPI void APIENTRY glDeleteAsyncMarkersSGIX (GLuint, GLsizei);
GLAPI GLboolean APIENTRY glIsAsyncMarkerSGIX (GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLASYNCMARKERSGIXPROC) (GLuint marker);
typedef GLint (APIENTRYP PFNGLFINISHASYNCSGIXPROC) (GLuint *markerp);
typedef GLint (APIENTRYP PFNGLPOLLASYNCSGIXPROC) (GLuint *markerp);
typedef GLuint (APIENTRYP PFNGLGENASYNCMARKERSSGIXPROC) (GLsizei range);
typedef void (APIENTRYP PFNGLDELETEASYNCMARKERSSGIXPROC) (GLuint marker, GLsizei range);
typedef GLboolean (APIENTRYP PFNGLISASYNCMARKERSGIXPROC) (GLuint marker);
#endif

#ifndef GL_SGIX_async_pixel
#define GL_SGIX_async_pixel 1
#endif

#ifndef GL_SGIX_async_histogram
#define GL_SGIX_async_histogram 1
#endif

#ifndef GL_INTEL_parallel_arrays
#define GL_INTEL_parallel_arrays 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glVertexPointervINTEL (GLint, GLenum, const GLvoid* *);
GLAPI void APIENTRY glNormalPointervINTEL (GLenum, const GLvoid* *);
GLAPI void APIENTRY glColorPointervINTEL (GLint, GLenum, const GLvoid* *);
GLAPI void APIENTRY glTexCoordPointervINTEL (GLint, GLenum, const GLvoid* *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLVERTEXPOINTERVINTELPROC) (GLint size, GLenum type, const GLvoid* *pointer);
typedef void (APIENTRYP PFNGLNORMALPOINTERVINTELPROC) (GLenum type, const GLvoid* *pointer);
typedef void (APIENTRYP PFNGLCOLORPOINTERVINTELPROC) (GLint size, GLenum type, const GLvoid* *pointer);
typedef void (APIENTRYP PFNGLTEXCOORDPOINTERVINTELPROC) (GLint size, GLenum type, const GLvoid* *pointer);
#endif

#ifndef GL_HP_occlusion_test
#define GL_HP_occlusion_test 1
#endif
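/*
 * Illustrative note (not part of the original header): EXT_draw_range_elements
 * adds an indexed draw whose start/end hints tell the driver which vertex
 * range the indices reference. A minimal sketch; the pointer name and data
 * are local illustrations:
 */
#if 0
static PFNGLDRAWRANGEELEMENTSEXTPROC pglDrawRangeElementsEXT = NULL;

static void draw_quad_as_two_triangles(void)
{
    static const GLushort indices[6] = { 0, 1, 2, 2, 1, 3 };
    /* All indices fall in [0, 3], so only 4 vertices need to be fetched. */
    pglDrawRangeElementsEXT(GL_TRIANGLES, 0, 3, 6, GL_UNSIGNED_SHORT, indices);
}
#endif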
#ifndef GL_EXT_pixel_transform
#define GL_EXT_pixel_transform 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glPixelTransformParameteriEXT (GLenum, GLenum, GLint);
GLAPI void APIENTRY glPixelTransformParameterfEXT (GLenum, GLenum, GLfloat);
GLAPI void APIENTRY glPixelTransformParameterivEXT (GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glPixelTransformParameterfvEXT (GLenum, GLenum, const GLfloat *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPIXELTRANSFORMPARAMETERIEXTPROC) (GLenum target, GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLPIXELTRANSFORMPARAMETERFEXTPROC) (GLenum target, GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLPIXELTRANSFORMPARAMETERIVEXTPROC) (GLenum target, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLPIXELTRANSFORMPARAMETERFVEXTPROC) (GLenum target, GLenum pname, const GLfloat *params);
#endif

#ifndef GL_EXT_pixel_transform_color_table
#define GL_EXT_pixel_transform_color_table 1
#endif

#ifndef GL_EXT_shared_texture_palette
#define GL_EXT_shared_texture_palette 1
#endif

#ifndef GL_EXT_separate_specular_color
#define GL_EXT_separate_specular_color 1
#endif

#ifndef GL_EXT_secondary_color
#define GL_EXT_secondary_color 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glSecondaryColor3bEXT (GLbyte, GLbyte, GLbyte);
GLAPI void APIENTRY glSecondaryColor3bvEXT (const GLbyte *);
GLAPI void APIENTRY glSecondaryColor3dEXT (GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glSecondaryColor3dvEXT (const GLdouble *);
GLAPI void APIENTRY glSecondaryColor3fEXT (GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glSecondaryColor3fvEXT (const GLfloat *);
GLAPI void APIENTRY glSecondaryColor3iEXT (GLint, GLint, GLint);
GLAPI void APIENTRY glSecondaryColor3ivEXT (const GLint *);
GLAPI void APIENTRY glSecondaryColor3sEXT (GLshort, GLshort, GLshort);
GLAPI void APIENTRY glSecondaryColor3svEXT (const GLshort *);
GLAPI void APIENTRY glSecondaryColor3ubEXT (GLubyte, GLubyte, GLubyte);
GLAPI void APIENTRY glSecondaryColor3ubvEXT (const GLubyte *);
GLAPI void APIENTRY glSecondaryColor3uiEXT (GLuint, GLuint, GLuint);
GLAPI void APIENTRY glSecondaryColor3uivEXT (const GLuint *);
GLAPI void APIENTRY glSecondaryColor3usEXT (GLushort, GLushort, GLushort);
GLAPI void APIENTRY glSecondaryColor3usvEXT (const GLushort *);
GLAPI void APIENTRY glSecondaryColorPointerEXT (GLint, GLenum, GLsizei, const GLvoid *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3BEXTPROC) (GLbyte red, GLbyte green, GLbyte blue);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3BVEXTPROC) (const GLbyte *v);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3DEXTPROC) (GLdouble red, GLdouble green, GLdouble blue);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3DVEXTPROC) (const GLdouble *v);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3FEXTPROC) (GLfloat red, GLfloat green, GLfloat blue);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3FVEXTPROC) (const GLfloat *v);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3IEXTPROC) (GLint red, GLint green, GLint blue);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3IVEXTPROC) (const GLint *v);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3SEXTPROC) (GLshort red, GLshort green, GLshort blue);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3SVEXTPROC) (const GLshort *v);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UBEXTPROC) (GLubyte red, GLubyte green, GLubyte blue);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UBVEXTPROC) (const GLubyte *v);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UIEXTPROC) (GLuint red, GLuint green, GLuint blue);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UIVEXTPROC) (const GLuint *v);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3USEXTPROC) (GLushort red, GLushort green, GLushort blue);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3USVEXTPROC) (const GLushort *v);
typedef void (APIENTRYP PFNGLSECONDARYCOLORPOINTEREXTPROC) (GLint size, GLenum type, GLsizei stride, const GLvoid *pointer);
#endif

#ifndef GL_EXT_texture_perturb_normal
#define GL_EXT_texture_perturb_normal 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glTextureNormalEXT (GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLTEXTURENORMALEXTPROC) (GLenum mode);
#endif

#ifndef GL_EXT_multi_draw_arrays
#define GL_EXT_multi_draw_arrays 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glMultiDrawArraysEXT (GLenum, GLint *, GLsizei *, GLsizei);
GLAPI void APIENTRY glMultiDrawElementsEXT (GLenum, const GLsizei *, GLenum, const GLvoid* *, GLsizei);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLMULTIDRAWARRAYSEXTPROC) (GLenum mode, GLint *first, GLsizei *count, GLsizei primcount);
typedef void (APIENTRYP PFNGLMULTIDRAWELEMENTSEXTPROC) (GLenum mode, const GLsizei *count, GLenum type, const GLvoid* *indices, GLsizei primcount);
#endif

#ifndef GL_EXT_fog_coord
#define GL_EXT_fog_coord 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glFogCoordfEXT (GLfloat);
GLAPI void APIENTRY glFogCoordfvEXT (const GLfloat *);
GLAPI void APIENTRY glFogCoorddEXT (GLdouble);
GLAPI void APIENTRY glFogCoorddvEXT (const GLdouble *);
GLAPI void APIENTRY glFogCoordPointerEXT (GLenum, GLsizei, const GLvoid *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLFOGCOORDFEXTPROC) (GLfloat coord);
typedef void (APIENTRYP PFNGLFOGCOORDFVEXTPROC) (const GLfloat *coord);
typedef void (APIENTRYP PFNGLFOGCOORDDEXTPROC) (GLdouble coord);
typedef void (APIENTRYP PFNGLFOGCOORDDVEXTPROC) (const GLdouble *coord);
typedef void (APIENTRYP PFNGLFOGCOORDPOINTEREXTPROC) (GLenum type, GLsizei stride, const GLvoid *pointer);
#endif

#ifndef GL_REND_screen_coordinates
#define GL_REND_screen_coordinates 1
#endif
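/*
 * Illustrative note (not part of the original header): EXT_multi_draw_arrays
 * batches what would otherwise be a loop of glDrawArrays calls into one
 * submission. A minimal sketch; the pointer name and data are local
 * illustrations:
 */
#if 0
static PFNGLMULTIDRAWARRAYSEXTPROC pglMultiDrawArraysEXT = NULL;

static void draw_two_strips(void)
{
    /* Equivalent to glDrawArrays(GL_TRIANGLE_STRIP, first[i], count[i])
       for i = 0..1, but issued in a single call. */
    GLint   first[2] = { 0, 4 };
    GLsizei count[2] = { 4, 4 };
    pglMultiDrawArraysEXT(GL_TRIANGLE_STRIP, first, count, 2);
}
#endif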
#ifndef GL_EXT_coordinate_frame
#define GL_EXT_coordinate_frame 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glTangent3bEXT (GLbyte, GLbyte, GLbyte);
GLAPI void APIENTRY glTangent3bvEXT (const GLbyte *);
GLAPI void APIENTRY glTangent3dEXT (GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glTangent3dvEXT (const GLdouble *);
GLAPI void APIENTRY glTangent3fEXT (GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glTangent3fvEXT (const GLfloat *);
GLAPI void APIENTRY glTangent3iEXT (GLint, GLint, GLint);
GLAPI void APIENTRY glTangent3ivEXT (const GLint *);
GLAPI void APIENTRY glTangent3sEXT (GLshort, GLshort, GLshort);
GLAPI void APIENTRY glTangent3svEXT (const GLshort *);
GLAPI void APIENTRY glBinormal3bEXT (GLbyte, GLbyte, GLbyte);
GLAPI void APIENTRY glBinormal3bvEXT (const GLbyte *);
GLAPI void APIENTRY glBinormal3dEXT (GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glBinormal3dvEXT (const GLdouble *);
GLAPI void APIENTRY glBinormal3fEXT (GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glBinormal3fvEXT (const GLfloat *);
GLAPI void APIENTRY glBinormal3iEXT (GLint, GLint, GLint);
GLAPI void APIENTRY glBinormal3ivEXT (const GLint *);
GLAPI void APIENTRY glBinormal3sEXT (GLshort, GLshort, GLshort);
GLAPI void APIENTRY glBinormal3svEXT (const GLshort *);
GLAPI void APIENTRY glTangentPointerEXT (GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glBinormalPointerEXT (GLenum, GLsizei, const GLvoid *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLTANGENT3BEXTPROC) (GLbyte tx, GLbyte ty, GLbyte tz);
typedef void (APIENTRYP PFNGLTANGENT3BVEXTPROC) (const GLbyte *v);
typedef void (APIENTRYP PFNGLTANGENT3DEXTPROC) (GLdouble tx, GLdouble ty, GLdouble tz);
typedef void (APIENTRYP PFNGLTANGENT3DVEXTPROC) (const GLdouble *v);
typedef void (APIENTRYP PFNGLTANGENT3FEXTPROC) (GLfloat tx, GLfloat ty, GLfloat tz);
typedef void (APIENTRYP PFNGLTANGENT3FVEXTPROC) (const GLfloat *v);
typedef void (APIENTRYP PFNGLTANGENT3IEXTPROC) (GLint tx, GLint ty, GLint tz);
typedef void (APIENTRYP PFNGLTANGENT3IVEXTPROC) (const GLint *v);
typedef void (APIENTRYP PFNGLTANGENT3SEXTPROC) (GLshort tx, GLshort ty, GLshort tz);
typedef void (APIENTRYP PFNGLTANGENT3SVEXTPROC) (const GLshort *v);
typedef void (APIENTRYP PFNGLBINORMAL3BEXTPROC) (GLbyte bx, GLbyte by, GLbyte bz);
typedef void (APIENTRYP PFNGLBINORMAL3BVEXTPROC) (const GLbyte *v);
typedef void (APIENTRYP PFNGLBINORMAL3DEXTPROC) (GLdouble bx, GLdouble by, GLdouble bz);
typedef void (APIENTRYP PFNGLBINORMAL3DVEXTPROC) (const GLdouble *v);
typedef void (APIENTRYP PFNGLBINORMAL3FEXTPROC) (GLfloat bx, GLfloat by, GLfloat bz);
typedef void (APIENTRYP PFNGLBINORMAL3FVEXTPROC) (const GLfloat *v);
typedef void (APIENTRYP PFNGLBINORMAL3IEXTPROC) (GLint bx, GLint by, GLint bz);
typedef void (APIENTRYP PFNGLBINORMAL3IVEXTPROC) (const GLint *v);
typedef void (APIENTRYP PFNGLBINORMAL3SEXTPROC) (GLshort bx, GLshort by, GLshort bz);
typedef void (APIENTRYP PFNGLBINORMAL3SVEXTPROC) (const GLshort *v);
typedef void (APIENTRYP PFNGLTANGENTPOINTEREXTPROC) (GLenum type, GLsizei stride, const GLvoid *pointer);
typedef void (APIENTRYP PFNGLBINORMALPOINTEREXTPROC) (GLenum type, GLsizei stride, const GLvoid *pointer);
#endif

#ifndef GL_EXT_texture_env_combine
#define GL_EXT_texture_env_combine 1
#endif

#ifndef GL_APPLE_specular_vector
#define GL_APPLE_specular_vector 1
#endif

#ifndef GL_APPLE_transform_hint
#define GL_APPLE_transform_hint 1
#endif

#ifndef GL_SGIX_fog_scale
#define GL_SGIX_fog_scale 1
#endif

#ifndef GL_SUNX_constant_data
#define GL_SUNX_constant_data 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glFinishTextureSUNX (void);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLFINISHTEXTURESUNXPROC) (void);
#endif

#ifndef GL_SUN_global_alpha
#define GL_SUN_global_alpha 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glGlobalAlphaFactorbSUN (GLbyte);
GLAPI void APIENTRY glGlobalAlphaFactorsSUN (GLshort);
GLAPI void APIENTRY glGlobalAlphaFactoriSUN (GLint);
GLAPI void APIENTRY glGlobalAlphaFactorfSUN (GLfloat);
GLAPI void APIENTRY glGlobalAlphaFactordSUN (GLdouble);
GLAPI void APIENTRY glGlobalAlphaFactorubSUN (GLubyte);
GLAPI void APIENTRY glGlobalAlphaFactorusSUN (GLushort);
GLAPI void APIENTRY glGlobalAlphaFactoruiSUN (GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLGLOBALALPHAFACTORBSUNPROC) (GLbyte factor);
typedef void (APIENTRYP PFNGLGLOBALALPHAFACTORSSUNPROC) (GLshort factor);
typedef void (APIENTRYP PFNGLGLOBALALPHAFACTORISUNPROC) (GLint factor);
typedef void (APIENTRYP PFNGLGLOBALALPHAFACTORFSUNPROC) (GLfloat factor);
typedef void (APIENTRYP PFNGLGLOBALALPHAFACTORDSUNPROC) (GLdouble factor);
typedef void (APIENTRYP PFNGLGLOBALALPHAFACTORUBSUNPROC) (GLubyte factor);
typedef void (APIENTRYP PFNGLGLOBALALPHAFACTORUSSUNPROC) (GLushort factor);
typedef void (APIENTRYP PFNGLGLOBALALPHAFACTORUISUNPROC) (GLuint factor);
#endif
#ifndef GL_SUN_triangle_list
#define GL_SUN_triangle_list 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glReplacementCodeuiSUN (GLuint);
GLAPI void APIENTRY glReplacementCodeusSUN (GLushort);
GLAPI void APIENTRY glReplacementCodeubSUN (GLubyte);
GLAPI void APIENTRY glReplacementCodeuivSUN (const GLuint *);
GLAPI void APIENTRY glReplacementCodeusvSUN (const GLushort *);
GLAPI void APIENTRY glReplacementCodeubvSUN (const GLubyte *);
GLAPI void APIENTRY glReplacementCodePointerSUN (GLenum, GLsizei, const GLvoid* *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUISUNPROC) (GLuint code);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUSSUNPROC) (GLushort code);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUBSUNPROC) (GLubyte code);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUIVSUNPROC) (const GLuint *code);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUSVSUNPROC) (const GLushort *code);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUBVSUNPROC) (const GLubyte *code);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEPOINTERSUNPROC) (GLenum type, GLsizei stride, const GLvoid* *pointer);
#endif

#ifndef GL_SUN_vertex
#define GL_SUN_vertex 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glColor4ubVertex2fSUN (GLubyte, GLubyte, GLubyte, GLubyte, GLfloat, GLfloat);
GLAPI void APIENTRY glColor4ubVertex2fvSUN (const GLubyte *, const GLfloat *);
GLAPI void APIENTRY glColor4ubVertex3fSUN (GLubyte, GLubyte, GLubyte, GLubyte, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glColor4ubVertex3fvSUN (const GLubyte *, const GLfloat *);
GLAPI void APIENTRY glColor3fVertex3fSUN (GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glColor3fVertex3fvSUN (const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glNormal3fVertex3fSUN (GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glNormal3fVertex3fvSUN (const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glColor4fNormal3fVertex3fSUN (GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glColor4fNormal3fVertex3fvSUN (const GLfloat *, const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glTexCoord2fVertex3fSUN (GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glTexCoord2fVertex3fvSUN (const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glTexCoord4fVertex4fSUN (GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glTexCoord4fVertex4fvSUN (const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glTexCoord2fColor4ubVertex3fSUN (GLfloat, GLfloat, GLubyte, GLubyte, GLubyte, GLubyte, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glTexCoord2fColor4ubVertex3fvSUN (const GLfloat *, const GLubyte *, const GLfloat *);
GLAPI void APIENTRY glTexCoord2fColor3fVertex3fSUN (GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glTexCoord2fColor3fVertex3fvSUN (const GLfloat *, const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glTexCoord2fNormal3fVertex3fSUN (GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glTexCoord2fNormal3fVertex3fvSUN (const GLfloat *, const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glTexCoord2fColor4fNormal3fVertex3fSUN (GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glTexCoord2fColor4fNormal3fVertex3fvSUN (const GLfloat *, const GLfloat *, const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glTexCoord4fColor4fNormal3fVertex4fSUN (GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glTexCoord4fColor4fNormal3fVertex4fvSUN (const GLfloat *, const GLfloat *, const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glReplacementCodeuiVertex3fSUN (GLuint, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glReplacementCodeuiVertex3fvSUN (const GLuint *, const GLfloat *);
GLAPI void APIENTRY glReplacementCodeuiColor4ubVertex3fSUN (GLuint, GLubyte, GLubyte, GLubyte, GLubyte, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glReplacementCodeuiColor4ubVertex3fvSUN (const GLuint *, const GLubyte *, const GLfloat *);
GLAPI void APIENTRY glReplacementCodeuiColor3fVertex3fSUN (GLuint, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glReplacementCodeuiColor3fVertex3fvSUN (const GLuint *, const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glReplacementCodeuiNormal3fVertex3fSUN (GLuint, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glReplacementCodeuiNormal3fVertex3fvSUN (const GLuint *, const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glReplacementCodeuiColor4fNormal3fVertex3fSUN (GLuint, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glReplacementCodeuiColor4fNormal3fVertex3fvSUN (const GLuint *, const GLfloat *, const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glReplacementCodeuiTexCoord2fVertex3fSUN (GLuint, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glReplacementCodeuiTexCoord2fVertex3fvSUN (const GLuint *, const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glReplacementCodeuiTexCoord2fNormal3fVertex3fSUN (GLuint, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glReplacementCodeuiTexCoord2fNormal3fVertex3fvSUN (const GLuint *, const GLfloat *, const GLfloat *, const GLfloat *);
GLAPI void APIENTRY glReplacementCodeuiTexCoord2fColor4fNormal3fVertex3fSUN (GLuint, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glReplacementCodeuiTexCoord2fColor4fNormal3fVertex3fvSUN (const GLuint *, const GLfloat *, const GLfloat *, const GLfloat *, const GLfloat *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLCOLOR4UBVERTEX2FSUNPROC) (GLubyte r, GLubyte g, GLubyte b, GLubyte a, GLfloat x, GLfloat y);
typedef void (APIENTRYP PFNGLCOLOR4UBVERTEX2FVSUNPROC) (const GLubyte *c, const GLfloat *v);
typedef void (APIENTRYP PFNGLCOLOR4UBVERTEX3FSUNPROC) (GLubyte r, GLubyte g, GLubyte b, GLubyte a, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLCOLOR4UBVERTEX3FVSUNPROC) (const GLubyte *c, const GLfloat *v);
typedef void (APIENTRYP PFNGLCOLOR3FVERTEX3FSUNPROC) (GLfloat r, GLfloat g, GLfloat b, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLCOLOR3FVERTEX3FVSUNPROC) (const GLfloat *c, const GLfloat *v);
typedef void (APIENTRYP PFNGLNORMAL3FVERTEX3FSUNPROC) (GLfloat nx, GLfloat ny, GLfloat nz, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLNORMAL3FVERTEX3FVSUNPROC) (const GLfloat *n, const GLfloat *v);
typedef void (APIENTRYP PFNGLCOLOR4FNORMAL3FVERTEX3FSUNPROC) (GLfloat r, GLfloat g, GLfloat b, GLfloat a, GLfloat nx, GLfloat ny, GLfloat nz, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLCOLOR4FNORMAL3FVERTEX3FVSUNPROC) (const GLfloat *c, const GLfloat *n, const GLfloat *v);
typedef void (APIENTRYP PFNGLTEXCOORD2FVERTEX3FSUNPROC) (GLfloat s, GLfloat t, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLTEXCOORD2FVERTEX3FVSUNPROC) (const GLfloat *tc, const GLfloat *v);
typedef void (APIENTRYP PFNGLTEXCOORD4FVERTEX4FSUNPROC) (GLfloat s, GLfloat t, GLfloat p, GLfloat q, GLfloat x, GLfloat y, GLfloat z, GLfloat w);
typedef void (APIENTRYP PFNGLTEXCOORD4FVERTEX4FVSUNPROC) (const GLfloat *tc, const GLfloat *v);
typedef void (APIENTRYP PFNGLTEXCOORD2FCOLOR4UBVERTEX3FSUNPROC) (GLfloat s, GLfloat t, GLubyte r, GLubyte g, GLubyte b, GLubyte a, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLTEXCOORD2FCOLOR4UBVERTEX3FVSUNPROC) (const GLfloat *tc, const GLubyte *c, const GLfloat *v);
typedef void (APIENTRYP PFNGLTEXCOORD2FCOLOR3FVERTEX3FSUNPROC) (GLfloat s, GLfloat t, GLfloat r, GLfloat g, GLfloat b, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLTEXCOORD2FCOLOR3FVERTEX3FVSUNPROC) (const GLfloat *tc, const GLfloat *c, const GLfloat *v);
typedef void (APIENTRYP PFNGLTEXCOORD2FNORMAL3FVERTEX3FSUNPROC) (GLfloat s, GLfloat t, GLfloat nx, GLfloat ny, GLfloat nz, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLTEXCOORD2FNORMAL3FVERTEX3FVSUNPROC) (const GLfloat *tc, const GLfloat *n, const GLfloat *v);
typedef void (APIENTRYP PFNGLTEXCOORD2FCOLOR4FNORMAL3FVERTEX3FSUNPROC) (GLfloat s, GLfloat t, GLfloat r, GLfloat g, GLfloat b, GLfloat a, GLfloat nx, GLfloat ny, GLfloat nz, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLTEXCOORD2FCOLOR4FNORMAL3FVERTEX3FVSUNPROC) (const GLfloat *tc, const GLfloat *c, const GLfloat *n, const GLfloat *v);
typedef void (APIENTRYP PFNGLTEXCOORD4FCOLOR4FNORMAL3FVERTEX4FSUNPROC) (GLfloat s, GLfloat t, GLfloat p, GLfloat q, GLfloat r, GLfloat g, GLfloat b, GLfloat a, GLfloat nx, GLfloat ny, GLfloat nz, GLfloat x, GLfloat y, GLfloat z, GLfloat w);
typedef void (APIENTRYP PFNGLTEXCOORD4FCOLOR4FNORMAL3FVERTEX4FVSUNPROC) (const GLfloat *tc, const GLfloat *c, const GLfloat *n, const GLfloat *v);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUIVERTEX3FSUNPROC) (GLuint rc, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUIVERTEX3FVSUNPROC) (const GLuint *rc, const GLfloat *v);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUICOLOR4UBVERTEX3FSUNPROC) (GLuint rc, GLubyte r, GLubyte g, GLubyte b, GLubyte a, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUICOLOR4UBVERTEX3FVSUNPROC) (const GLuint *rc, const GLubyte *c, const GLfloat *v);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUICOLOR3FVERTEX3FSUNPROC) (GLuint rc, GLfloat r, GLfloat g, GLfloat b, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUICOLOR3FVERTEX3FVSUNPROC) (const GLuint *rc, const GLfloat *c, const GLfloat *v);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUINORMAL3FVERTEX3FSUNPROC) (GLuint rc, GLfloat nx, GLfloat ny, GLfloat nz, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUINORMAL3FVERTEX3FVSUNPROC) (const GLuint *rc, const GLfloat *n, const GLfloat *v);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUICOLOR4FNORMAL3FVERTEX3FSUNPROC) (GLuint rc, GLfloat r, GLfloat g, GLfloat b, GLfloat a, GLfloat nx, GLfloat ny, GLfloat nz, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUICOLOR4FNORMAL3FVERTEX3FVSUNPROC) (const GLuint *rc, const GLfloat *c, const GLfloat *n, const GLfloat *v);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUITEXCOORD2FVERTEX3FSUNPROC) (GLuint rc, GLfloat s, GLfloat t, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUITEXCOORD2FVERTEX3FVSUNPROC) (const GLuint *rc, const GLfloat *tc, const GLfloat *v);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUITEXCOORD2FNORMAL3FVERTEX3FSUNPROC) (GLuint rc, GLfloat s, GLfloat t, GLfloat nx, GLfloat ny, GLfloat nz, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUITEXCOORD2FNORMAL3FVERTEX3FVSUNPROC) (const GLuint *rc, const GLfloat *tc, const GLfloat *n, const GLfloat *v);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUITEXCOORD2FCOLOR4FNORMAL3FVERTEX3FSUNPROC) (GLuint rc, GLfloat s, GLfloat t, GLfloat r, GLfloat g, GLfloat b, GLfloat a, GLfloat nx, GLfloat ny, GLfloat nz, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLREPLACEMENTCODEUITEXCOORD2FCOLOR4FNORMAL3FVERTEX3FVSUNPROC) (const GLuint *rc, const GLfloat *tc, const GLfloat *c, const GLfloat *n, const GLfloat *v);
#endif

#ifndef GL_EXT_blend_func_separate
#define GL_EXT_blend_func_separate 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glBlendFuncSeparateEXT (GLenum, GLenum, GLenum, GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLBLENDFUNCSEPARATEEXTPROC) (GLenum sfactorRGB, GLenum dfactorRGB, GLenum sfactorAlpha, GLenum dfactorAlpha);
#endif

#ifndef GL_INGR_blend_func_separate
#define GL_INGR_blend_func_separate 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glBlendFuncSeparateINGR (GLenum, GLenum, GLenum, GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLBLENDFUNCSEPARATEINGRPROC) (GLenum sfactorRGB, GLenum dfactorRGB, GLenum sfactorAlpha, GLenum dfactorAlpha);
#endif

#ifndef GL_INGR_color_clamp
#define GL_INGR_color_clamp 1
#endif

#ifndef GL_INGR_interlace_read
#define GL_INGR_interlace_read 1
#endif

#ifndef GL_EXT_stencil_wrap
#define GL_EXT_stencil_wrap 1
#endif

#ifndef GL_EXT_422_pixels
#define GL_EXT_422_pixels 1
#endif

#ifndef GL_NV_texgen_reflection
#define GL_NV_texgen_reflection 1
#endif

#ifndef GL_SUN_convolution_border_modes
#define GL_SUN_convolution_border_modes 1
#endif

#ifndef GL_EXT_texture_env_add
#define GL_EXT_texture_env_add 1
#endif

#ifndef GL_EXT_texture_lod_bias
#define GL_EXT_texture_lod_bias 1
#endif

#ifndef GL_EXT_texture_filter_anisotropic
#define GL_EXT_texture_filter_anisotropic 1
#endif

#ifndef GL_EXT_vertex_weighting
#define GL_EXT_vertex_weighting 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glVertexWeightfEXT (GLfloat);
GLAPI void APIENTRY glVertexWeightfvEXT (const GLfloat *);
GLAPI void APIENTRY glVertexWeightPointerEXT (GLsizei, GLenum, GLsizei, const GLvoid *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLVERTEXWEIGHTFEXTPROC) (GLfloat weight);
typedef void (APIENTRYP PFNGLVERTEXWEIGHTFVEXTPROC) (const GLfloat *weight);
typedef void (APIENTRYP PFNGLVERTEXWEIGHTPOINTEREXTPROC) (GLsizei size, GLenum type, GLsizei stride, const GLvoid *pointer);
#endif

#ifndef GL_NV_light_max_exponent
#define GL_NV_light_max_exponent 1
#endif

#ifndef GL_NV_vertex_array_range
#define GL_NV_vertex_array_range 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glFlushVertexArrayRangeNV (void);
GLAPI void APIENTRY glVertexArrayRangeNV (GLsizei, const GLvoid *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLFLUSHVERTEXARRAYRANGENVPROC) (void);
typedef void (APIENTRYP PFNGLVERTEXARRAYRANGENVPROC) (GLsizei length, const GLvoid *pointer);
#endif
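/*
 * Illustrative note (not part of the original header): EXT_blend_func_separate
 * splits the blend factors for the RGB and alpha channels, e.g. for
 * compositing where destination alpha must accumulate differently from color.
 * A minimal sketch; the pointer name is a local illustration:
 */
#if 0
static PFNGLBLENDFUNCSEPARATEEXTPROC pglBlendFuncSeparateEXT = NULL;

static void enable_compositing_blend(void)
{
    glEnable(GL_BLEND);
    /* RGB: classic source-over; alpha: accumulate toward opaque. */
    pglBlendFuncSeparateEXT(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA,
                            GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
}
#endif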
#ifndef GL_NV_register_combiners
#define GL_NV_register_combiners 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glCombinerParameterfvNV (GLenum, const GLfloat *);
GLAPI void APIENTRY glCombinerParameterfNV (GLenum, GLfloat);
GLAPI void APIENTRY glCombinerParameterivNV (GLenum, const GLint *);
GLAPI void APIENTRY glCombinerParameteriNV (GLenum, GLint);
GLAPI void APIENTRY glCombinerInputNV (GLenum, GLenum, GLenum, GLenum, GLenum, GLenum);
GLAPI void APIENTRY glCombinerOutputNV (GLenum, GLenum, GLenum, GLenum, GLenum, GLenum, GLenum, GLboolean, GLboolean, GLboolean);
GLAPI void APIENTRY glFinalCombinerInputNV (GLenum, GLenum, GLenum, GLenum);
GLAPI void APIENTRY glGetCombinerInputParameterfvNV (GLenum, GLenum, GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetCombinerInputParameterivNV (GLenum, GLenum, GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetCombinerOutputParameterfvNV (GLenum, GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetCombinerOutputParameterivNV (GLenum, GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetFinalCombinerInputParameterfvNV (GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetFinalCombinerInputParameterivNV (GLenum, GLenum, GLint *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLCOMBINERPARAMETERFVNVPROC) (GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLCOMBINERPARAMETERFNVPROC) (GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLCOMBINERPARAMETERIVNVPROC) (GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLCOMBINERPARAMETERINVPROC) (GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLCOMBINERINPUTNVPROC) (GLenum stage, GLenum portion, GLenum variable, GLenum input, GLenum mapping, GLenum componentUsage);
typedef void (APIENTRYP PFNGLCOMBINEROUTPUTNVPROC) (GLenum stage, GLenum portion, GLenum abOutput, GLenum cdOutput, GLenum sumOutput, GLenum scale, GLenum bias, GLboolean abDotProduct, GLboolean cdDotProduct, GLboolean muxSum);
typedef void (APIENTRYP PFNGLFINALCOMBINERINPUTNVPROC) (GLenum variable, GLenum input, GLenum mapping, GLenum componentUsage);
typedef void (APIENTRYP PFNGLGETCOMBINERINPUTPARAMETERFVNVPROC) (GLenum stage, GLenum portion, GLenum variable, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETCOMBINERINPUTPARAMETERIVNVPROC) (GLenum stage, GLenum portion, GLenum variable, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETCOMBINEROUTPUTPARAMETERFVNVPROC) (GLenum stage, GLenum portion, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETCOMBINEROUTPUTPARAMETERIVNVPROC) (GLenum stage, GLenum portion, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETFINALCOMBINERINPUTPARAMETERFVNVPROC) (GLenum variable, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETFINALCOMBINERINPUTPARAMETERIVNVPROC) (GLenum variable, GLenum pname, GLint *params);
#endif

#ifndef GL_NV_fog_distance
#define GL_NV_fog_distance 1
#endif

#ifndef GL_NV_texgen_emboss
#define GL_NV_texgen_emboss 1
#endif

#ifndef GL_NV_blend_square
#define GL_NV_blend_square 1
#endif

#ifndef GL_NV_texture_env_combine4
#define GL_NV_texture_env_combine4 1
#endif

#ifndef GL_MESA_resize_buffers
#define GL_MESA_resize_buffers 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glResizeBuffersMESA (void);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLRESIZEBUFFERSMESAPROC) (void);
#endif
#ifndef GL_MESA_window_pos
#define GL_MESA_window_pos 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glWindowPos2dMESA (GLdouble, GLdouble);
GLAPI void APIENTRY glWindowPos2dvMESA (const GLdouble *);
GLAPI void APIENTRY glWindowPos2fMESA (GLfloat, GLfloat);
GLAPI void APIENTRY glWindowPos2fvMESA (const GLfloat *);
GLAPI void APIENTRY glWindowPos2iMESA (GLint, GLint);
GLAPI void APIENTRY glWindowPos2ivMESA (const GLint *);
GLAPI void APIENTRY glWindowPos2sMESA (GLshort, GLshort);
GLAPI void APIENTRY glWindowPos2svMESA (const GLshort *);
GLAPI void APIENTRY glWindowPos3dMESA (GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glWindowPos3dvMESA (const GLdouble *);
GLAPI void APIENTRY glWindowPos3fMESA (GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glWindowPos3fvMESA (const GLfloat *);
GLAPI void APIENTRY glWindowPos3iMESA (GLint, GLint, GLint);
GLAPI void APIENTRY glWindowPos3ivMESA (const GLint *);
GLAPI void APIENTRY glWindowPos3sMESA (GLshort, GLshort, GLshort);
GLAPI void APIENTRY glWindowPos3svMESA (const GLshort *);
GLAPI void APIENTRY glWindowPos4dMESA (GLdouble, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glWindowPos4dvMESA (const GLdouble *);
GLAPI void APIENTRY glWindowPos4fMESA (GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glWindowPos4fvMESA (const GLfloat *);
GLAPI void APIENTRY glWindowPos4iMESA (GLint, GLint, GLint, GLint);
GLAPI void APIENTRY glWindowPos4ivMESA (const GLint *);
GLAPI void APIENTRY glWindowPos4sMESA (GLshort, GLshort, GLshort, GLshort);
GLAPI void APIENTRY glWindowPos4svMESA (const GLshort *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLWINDOWPOS2DMESAPROC) (GLdouble x, GLdouble y);
typedef void (APIENTRYP PFNGLWINDOWPOS2DVMESAPROC) (const GLdouble *v);
typedef void (APIENTRYP PFNGLWINDOWPOS2FMESAPROC) (GLfloat x, GLfloat y);
typedef void (APIENTRYP PFNGLWINDOWPOS2FVMESAPROC) (const GLfloat *v);
typedef void (APIENTRYP PFNGLWINDOWPOS2IMESAPROC) (GLint x, GLint y);
typedef void (APIENTRYP PFNGLWINDOWPOS2IVMESAPROC) (const GLint *v);
typedef void (APIENTRYP PFNGLWINDOWPOS2SMESAPROC) (GLshort x, GLshort y);
typedef void (APIENTRYP PFNGLWINDOWPOS2SVMESAPROC) (const GLshort *v);
typedef void (APIENTRYP PFNGLWINDOWPOS3DMESAPROC) (GLdouble x, GLdouble y, GLdouble z);
typedef void (APIENTRYP PFNGLWINDOWPOS3DVMESAPROC) (const GLdouble *v);
typedef void (APIENTRYP PFNGLWINDOWPOS3FMESAPROC) (GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLWINDOWPOS3FVMESAPROC) (const GLfloat *v);
typedef void (APIENTRYP PFNGLWINDOWPOS3IMESAPROC) (GLint x, GLint y, GLint z);
typedef void (APIENTRYP PFNGLWINDOWPOS3IVMESAPROC) (const GLint *v);
typedef void (APIENTRYP PFNGLWINDOWPOS3SMESAPROC) (GLshort x, GLshort y, GLshort z);
typedef void (APIENTRYP PFNGLWINDOWPOS3SVMESAPROC) (const GLshort *v);
typedef void (APIENTRYP PFNGLWINDOWPOS4DMESAPROC) (GLdouble x, GLdouble y, GLdouble z, GLdouble w);
typedef void (APIENTRYP PFNGLWINDOWPOS4DVMESAPROC) (const GLdouble *v);
typedef void (APIENTRYP PFNGLWINDOWPOS4FMESAPROC) (GLfloat x, GLfloat y, GLfloat z, GLfloat w);
typedef void (APIENTRYP PFNGLWINDOWPOS4FVMESAPROC) (const GLfloat *v);
typedef void (APIENTRYP PFNGLWINDOWPOS4IMESAPROC) (GLint x, GLint y, GLint z, GLint w);
typedef void (APIENTRYP PFNGLWINDOWPOS4IVMESAPROC) (const GLint *v);
typedef void (APIENTRYP PFNGLWINDOWPOS4SMESAPROC) (GLshort x, GLshort y, GLshort z, GLshort w);
typedef void (APIENTRYP PFNGLWINDOWPOS4SVMESAPROC) (const GLshort *v);
#endif

#ifndef GL_IBM_cull_vertex
#define GL_IBM_cull_vertex 1
#endif
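/*
 * Illustrative note (not part of the original header): MESA_window_pos sets
 * the raster position directly in window coordinates, bypassing the
 * transformation pipeline; handy before glDrawPixels or glBitmap overlays.
 * The pointer name is a local illustration:
 */
#if 0
static PFNGLWINDOWPOS2IMESAPROC pglWindowPos2iMESA = NULL;

static void draw_overlay_at(GLint x, GLint y,
                            GLsizei w, GLsizei h, const GLubyte *rgba)
{
    pglWindowPos2iMESA(x, y); /* raster position in window coordinates */
    glDrawPixels(w, h, GL_RGBA, GL_UNSIGNED_BYTE, rgba);
}
#endif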
#ifndef GL_IBM_multimode_draw_arrays
#define GL_IBM_multimode_draw_arrays 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glMultiModeDrawArraysIBM (const GLenum *, const GLint *, const GLsizei *, GLsizei, GLint);
GLAPI void APIENTRY glMultiModeDrawElementsIBM (const GLenum *, const GLsizei *, GLenum, const GLvoid* const *, GLsizei, GLint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLMULTIMODEDRAWARRAYSIBMPROC) (const GLenum *mode, const GLint *first, const GLsizei *count, GLsizei primcount, GLint modestride);
typedef void (APIENTRYP PFNGLMULTIMODEDRAWELEMENTSIBMPROC) (const GLenum *mode, const GLsizei *count, GLenum type, const GLvoid* const *indices, GLsizei primcount, GLint modestride);
#endif

#ifndef GL_IBM_vertex_array_lists
#define GL_IBM_vertex_array_lists 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glColorPointerListIBM (GLint, GLenum, GLint, const GLvoid* *, GLint);
GLAPI void APIENTRY glSecondaryColorPointerListIBM (GLint, GLenum, GLint, const GLvoid* *, GLint);
GLAPI void APIENTRY glEdgeFlagPointerListIBM (GLint, const GLboolean* *, GLint);
GLAPI void APIENTRY glFogCoordPointerListIBM (GLenum, GLint, const GLvoid* *, GLint);
GLAPI void APIENTRY glIndexPointerListIBM (GLenum, GLint, const GLvoid* *, GLint);
GLAPI void APIENTRY glNormalPointerListIBM (GLenum, GLint, const GLvoid* *, GLint);
GLAPI void APIENTRY glTexCoordPointerListIBM (GLint, GLenum, GLint, const GLvoid* *, GLint);
GLAPI void APIENTRY glVertexPointerListIBM (GLint, GLenum, GLint, const GLvoid* *, GLint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLCOLORPOINTERLISTIBMPROC) (GLint size, GLenum type, GLint stride, const GLvoid* *pointer, GLint ptrstride);
typedef void (APIENTRYP PFNGLSECONDARYCOLORPOINTERLISTIBMPROC) (GLint size, GLenum type, GLint stride, const GLvoid* *pointer, GLint ptrstride);
typedef void (APIENTRYP PFNGLEDGEFLAGPOINTERLISTIBMPROC) (GLint stride, const GLboolean* *pointer, GLint ptrstride);
typedef void (APIENTRYP PFNGLFOGCOORDPOINTERLISTIBMPROC) (GLenum type, GLint stride, const GLvoid* *pointer, GLint ptrstride);
typedef void (APIENTRYP PFNGLINDEXPOINTERLISTIBMPROC) (GLenum type, GLint stride, const GLvoid* *pointer, GLint ptrstride);
typedef void (APIENTRYP PFNGLNORMALPOINTERLISTIBMPROC) (GLenum type, GLint stride, const GLvoid* *pointer, GLint ptrstride);
typedef void (APIENTRYP PFNGLTEXCOORDPOINTERLISTIBMPROC) (GLint size, GLenum type, GLint stride, const GLvoid* *pointer, GLint ptrstride);
typedef void (APIENTRYP PFNGLVERTEXPOINTERLISTIBMPROC) (GLint size, GLenum type, GLint stride, const GLvoid* *pointer, GLint ptrstride);
#endif

#ifndef GL_SGIX_subsample
#define GL_SGIX_subsample 1
#endif

#ifndef GL_SGIX_ycrcba
#define GL_SGIX_ycrcba 1
#endif

#ifndef GL_SGIX_ycrcb_subsample
#define GL_SGIX_ycrcb_subsample 1
#endif

#ifndef GL_SGIX_depth_pass_instrument
#define GL_SGIX_depth_pass_instrument 1
#endif

#ifndef GL_3DFX_texture_compression_FXT1
#define GL_3DFX_texture_compression_FXT1 1
#endif

#ifndef GL_3DFX_multisample
#define GL_3DFX_multisample 1
#endif

#ifndef GL_3DFX_tbuffer
#define GL_3DFX_tbuffer 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glTbufferMask3DFX (GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLTBUFFERMASK3DFXPROC) (GLuint mask);
#endif

#ifndef GL_EXT_multisample
#define GL_EXT_multisample 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glSampleMaskEXT (GLclampf, GLboolean);
GLAPI void APIENTRY glSamplePatternEXT (GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLSAMPLEMASKEXTPROC) (GLclampf value, GLboolean invert);
typedef void (APIENTRYP PFNGLSAMPLEPATTERNEXTPROC) (GLenum pattern);
#endif

#ifndef GL_SGIX_vertex_preclip
#define GL_SGIX_vertex_preclip 1
#endif

#ifndef GL_SGIX_convolution_accuracy
#define GL_SGIX_convolution_accuracy 1
#endif

#ifndef GL_SGIX_resample
#define GL_SGIX_resample 1
#endif

#ifndef GL_SGIS_point_line_texgen
#define GL_SGIS_point_line_texgen 1
#endif
#ifndef GL_SGIS_texture_color_mask
#define GL_SGIS_texture_color_mask 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glTextureColorMaskSGIS (GLboolean, GLboolean, GLboolean, GLboolean);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLTEXTURECOLORMASKSGISPROC) (GLboolean red, GLboolean green, GLboolean blue, GLboolean alpha);
#endif

#ifndef GL_SGIX_igloo_interface
#define GL_SGIX_igloo_interface 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glIglooInterfaceSGIX (GLenum, const GLvoid *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLIGLOOINTERFACESGIXPROC) (GLenum pname, const GLvoid *params);
#endif

#ifndef GL_EXT_texture_env_dot3
#define GL_EXT_texture_env_dot3 1
#endif

#ifndef GL_ATI_texture_mirror_once
#define GL_ATI_texture_mirror_once 1
#endif

#ifndef GL_NV_fence
#define GL_NV_fence 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glDeleteFencesNV (GLsizei, const GLuint *);
GLAPI void APIENTRY glGenFencesNV (GLsizei, GLuint *);
GLAPI GLboolean APIENTRY glIsFenceNV (GLuint);
GLAPI GLboolean APIENTRY glTestFenceNV (GLuint);
GLAPI void APIENTRY glGetFenceivNV (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glFinishFenceNV (GLuint);
GLAPI void APIENTRY glSetFenceNV (GLuint, GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLDELETEFENCESNVPROC) (GLsizei n, const GLuint *fences);
typedef void (APIENTRYP PFNGLGENFENCESNVPROC) (GLsizei n, GLuint *fences);
typedef GLboolean (APIENTRYP PFNGLISFENCENVPROC) (GLuint fence);
typedef GLboolean (APIENTRYP PFNGLTESTFENCENVPROC) (GLuint fence);
typedef void (APIENTRYP PFNGLGETFENCEIVNVPROC) (GLuint fence, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLFINISHFENCENVPROC) (GLuint fence);
typedef void (APIENTRYP PFNGLSETFENCENVPROC) (GLuint fence, GLenum condition);
#endif

#ifndef GL_NV_evaluators
#define GL_NV_evaluators 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glMapControlPointsNV (GLenum, GLuint, GLenum, GLsizei, GLsizei, GLint, GLint, GLboolean, const GLvoid *);
GLAPI void APIENTRY glMapParameterivNV (GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glMapParameterfvNV (GLenum, GLenum, const GLfloat *);
GLAPI void APIENTRY glGetMapControlPointsNV (GLenum, GLuint, GLenum, GLsizei, GLsizei, GLboolean, GLvoid *);
GLAPI void APIENTRY glGetMapParameterivNV (GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetMapParameterfvNV (GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetMapAttribParameterivNV (GLenum, GLuint, GLenum, GLint *);
GLAPI void APIENTRY glGetMapAttribParameterfvNV (GLenum, GLuint, GLenum, GLfloat *);
GLAPI void APIENTRY glEvalMapsNV (GLenum, GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLMAPCONTROLPOINTSNVPROC) (GLenum target, GLuint index, GLenum type, GLsizei ustride, GLsizei vstride, GLint uorder, GLint vorder, GLboolean packed, const GLvoid *points);
typedef void (APIENTRYP PFNGLMAPPARAMETERIVNVPROC) (GLenum target, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLMAPPARAMETERFVNVPROC) (GLenum target, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLGETMAPCONTROLPOINTSNVPROC) (GLenum target, GLuint index, GLenum type, GLsizei ustride, GLsizei vstride, GLboolean packed, GLvoid *points);
typedef void (APIENTRYP PFNGLGETMAPPARAMETERIVNVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETMAPPARAMETERFVNVPROC) (GLenum target, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETMAPATTRIBPARAMETERIVNVPROC) (GLenum target, GLuint index, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETMAPATTRIBPARAMETERFVNVPROC) (GLenum target, GLuint index, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLEVALMAPSNVPROC) (GLenum target, GLenum mode);
#endif
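/*
 * Illustrative note (not part of the original header): NV_fence inserts a
 * marker in the command stream that the application can poll or wait on.
 * A minimal sketch; the pointer names are local illustrations and
 * GL_ALL_COMPLETED_NV is assumed to come from the NV_fence enumerant
 * section earlier in this header:
 */
#if 0
static PFNGLGENFENCESNVPROC   pglGenFencesNV   = NULL;
static PFNGLSETFENCENVPROC    pglSetFenceNV    = NULL;
static PFNGLTESTFENCENVPROC   pglTestFenceNV   = NULL;
static PFNGLFINISHFENCENVPROC pglFinishFenceNV = NULL;

static void wait_for_submitted_work(void)
{
    GLuint fence;
    pglGenFencesNV(1, &fence);
    /* ... issue rendering commands ... */
    pglSetFenceNV(fence, GL_ALL_COMPLETED_NV);
    if (!pglTestFenceNV(fence))   /* non-blocking poll */
        pglFinishFenceNV(fence);  /* block until the fence signals */
}
#endif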
#ifndef GL_NV_packed_depth_stencil
#define GL_NV_packed_depth_stencil 1
#endif

#ifndef GL_NV_register_combiners2
#define GL_NV_register_combiners2 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glCombinerStageParameterfvNV (GLenum, GLenum, const GLfloat *);
GLAPI void APIENTRY glGetCombinerStageParameterfvNV (GLenum, GLenum, GLfloat *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLCOMBINERSTAGEPARAMETERFVNVPROC) (GLenum stage, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLGETCOMBINERSTAGEPARAMETERFVNVPROC) (GLenum stage, GLenum pname, GLfloat *params);
#endif

#ifndef GL_NV_texture_compression_vtc
#define GL_NV_texture_compression_vtc 1
#endif

#ifndef GL_NV_texture_rectangle
#define GL_NV_texture_rectangle 1
#endif

#ifndef GL_NV_texture_shader
#define GL_NV_texture_shader 1
#endif

#ifndef GL_NV_texture_shader2
#define GL_NV_texture_shader2 1
#endif

#ifndef GL_NV_vertex_array_range2
#define GL_NV_vertex_array_range2 1
#endif

#ifndef GL_NV_vertex_program
#define GL_NV_vertex_program 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI GLboolean APIENTRY glAreProgramsResidentNV (GLsizei, const GLuint *, GLboolean *);
GLAPI void APIENTRY glBindProgramNV (GLenum, GLuint);
GLAPI void APIENTRY glDeleteProgramsNV (GLsizei, const GLuint *);
GLAPI void APIENTRY glExecuteProgramNV (GLenum, GLuint, const GLfloat *);
GLAPI void APIENTRY glGenProgramsNV (GLsizei, GLuint *);
GLAPI void APIENTRY glGetProgramParameterdvNV (GLenum, GLuint, GLenum, GLdouble *);
GLAPI void APIENTRY glGetProgramParameterfvNV (GLenum, GLuint, GLenum, GLfloat *);
GLAPI void APIENTRY glGetProgramivNV (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glGetProgramStringNV (GLuint, GLenum, GLubyte *);
GLAPI void APIENTRY glGetTrackMatrixivNV (GLenum, GLuint, GLenum, GLint *);
GLAPI void APIENTRY glGetVertexAttribdvNV (GLuint, GLenum, GLdouble *);
GLAPI void APIENTRY glGetVertexAttribfvNV (GLuint, GLenum, GLfloat *);
GLAPI void APIENTRY glGetVertexAttribivNV (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glGetVertexAttribPointervNV (GLuint, GLenum, GLvoid* *);
GLAPI GLboolean APIENTRY glIsProgramNV (GLuint);
GLAPI void APIENTRY glLoadProgramNV (GLenum, GLuint, GLsizei, const GLubyte *);
GLAPI void APIENTRY glProgramParameter4dNV (GLenum, GLuint, GLdouble, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glProgramParameter4dvNV (GLenum, GLuint, const GLdouble *);
GLAPI void APIENTRY glProgramParameter4fNV (GLenum, GLuint, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glProgramParameter4fvNV (GLenum, GLuint, const GLfloat *);
GLAPI void APIENTRY glProgramParameters4dvNV (GLenum, GLuint, GLuint, const GLdouble *);
GLAPI void APIENTRY glProgramParameters4fvNV (GLenum, GLuint, GLuint, const GLfloat *);
GLAPI void APIENTRY glRequestResidentProgramsNV (GLsizei, const GLuint *);
GLAPI void APIENTRY glTrackMatrixNV (GLenum, GLuint, GLenum, GLenum);
GLAPI void APIENTRY glVertexAttribPointerNV (GLuint, GLint, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glVertexAttrib1dNV (GLuint, GLdouble);
GLAPI void APIENTRY glVertexAttrib1dvNV (GLuint, const GLdouble *);
GLAPI void APIENTRY glVertexAttrib1fNV (GLuint, GLfloat);
GLAPI void APIENTRY glVertexAttrib1fvNV (GLuint, const GLfloat *);
GLAPI void APIENTRY glVertexAttrib1sNV (GLuint, GLshort);
GLAPI void APIENTRY glVertexAttrib1svNV (GLuint, const GLshort *);
GLAPI void APIENTRY glVertexAttrib2dNV (GLuint, GLdouble, GLdouble);
GLAPI void APIENTRY glVertexAttrib2dvNV (GLuint, const GLdouble *);
GLAPI void APIENTRY glVertexAttrib2fNV (GLuint, GLfloat, GLfloat);
void APIENTRY glVertexAttrib2fvNV (GLuint, const GLfloat *); GLAPI void APIENTRY glVertexAttrib2sNV (GLuint, GLshort, GLshort); GLAPI void APIENTRY glVertexAttrib2svNV (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib3dNV (GLuint, GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glVertexAttrib3dvNV (GLuint, const GLdouble *); GLAPI void APIENTRY glVertexAttrib3fNV (GLuint, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glVertexAttrib3fvNV (GLuint, const GLfloat *); GLAPI void APIENTRY glVertexAttrib3sNV (GLuint, GLshort, GLshort, GLshort); GLAPI void APIENTRY glVertexAttrib3svNV (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib4dNV (GLuint, GLdouble, GLdouble, GLdouble, GLdouble); GLAPI void APIENTRY glVertexAttrib4dvNV (GLuint, const GLdouble *); GLAPI void APIENTRY glVertexAttrib4fNV (GLuint, GLfloat, GLfloat, GLfloat, GLfloat); GLAPI void APIENTRY glVertexAttrib4fvNV (GLuint, const GLfloat *); GLAPI void APIENTRY glVertexAttrib4sNV (GLuint, GLshort, GLshort, GLshort, GLshort); GLAPI void APIENTRY glVertexAttrib4svNV (GLuint, const GLshort *); GLAPI void APIENTRY glVertexAttrib4ubNV (GLuint, GLubyte, GLubyte, GLubyte, GLubyte); GLAPI void APIENTRY glVertexAttrib4ubvNV (GLuint, const GLubyte *); GLAPI void APIENTRY glVertexAttribs1dvNV (GLuint, GLsizei, const GLdouble *); GLAPI void APIENTRY glVertexAttribs1fvNV (GLuint, GLsizei, const GLfloat *); GLAPI void APIENTRY glVertexAttribs1svNV (GLuint, GLsizei, const GLshort *); GLAPI void APIENTRY glVertexAttribs2dvNV (GLuint, GLsizei, const GLdouble *); GLAPI void APIENTRY glVertexAttribs2fvNV (GLuint, GLsizei, const GLfloat *); GLAPI void APIENTRY glVertexAttribs2svNV (GLuint, GLsizei, const GLshort *); GLAPI void APIENTRY glVertexAttribs3dvNV (GLuint, GLsizei, const GLdouble *); GLAPI void APIENTRY glVertexAttribs3fvNV (GLuint, GLsizei, const GLfloat *); GLAPI void APIENTRY glVertexAttribs3svNV (GLuint, GLsizei, const GLshort *); GLAPI void APIENTRY glVertexAttribs4dvNV (GLuint, GLsizei, const GLdouble *); GLAPI void APIENTRY glVertexAttribs4fvNV (GLuint, GLsizei, const GLfloat *); GLAPI void APIENTRY glVertexAttribs4svNV (GLuint, GLsizei, const GLshort *); GLAPI void APIENTRY glVertexAttribs4ubvNV (GLuint, GLsizei, const GLubyte *); #endif /* GL_GLEXT_PROTOTYPES */ typedef GLboolean (APIENTRYP PFNGLAREPROGRAMSRESIDENTNVPROC) (GLsizei n, const GLuint *programs, GLboolean *residences); typedef void (APIENTRYP PFNGLBINDPROGRAMNVPROC) (GLenum target, GLuint id); typedef void (APIENTRYP PFNGLDELETEPROGRAMSNVPROC) (GLsizei n, const GLuint *programs); typedef void (APIENTRYP PFNGLEXECUTEPROGRAMNVPROC) (GLenum target, GLuint id, const GLfloat *params); typedef void (APIENTRYP PFNGLGENPROGRAMSNVPROC) (GLsizei n, GLuint *programs); typedef void (APIENTRYP PFNGLGETPROGRAMPARAMETERDVNVPROC) (GLenum target, GLuint index, GLenum pname, GLdouble *params); typedef void (APIENTRYP PFNGLGETPROGRAMPARAMETERFVNVPROC) (GLenum target, GLuint index, GLenum pname, GLfloat *params); typedef void (APIENTRYP PFNGLGETPROGRAMIVNVPROC) (GLuint id, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETPROGRAMSTRINGNVPROC) (GLuint id, GLenum pname, GLubyte *program); typedef void (APIENTRYP PFNGLGETTRACKMATRIXIVNVPROC) (GLenum target, GLuint address, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBDVNVPROC) (GLuint index, GLenum pname, GLdouble *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBFVNVPROC) (GLuint index, GLenum pname, GLfloat *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBIVNVPROC) 
(GLuint index, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETVERTEXATTRIBPOINTERVNVPROC) (GLuint index, GLenum pname, GLvoid* *pointer); typedef GLboolean (APIENTRYP PFNGLISPROGRAMNVPROC) (GLuint id); typedef void (APIENTRYP PFNGLLOADPROGRAMNVPROC) (GLenum target, GLuint id, GLsizei len, const GLubyte *program); typedef void (APIENTRYP PFNGLPROGRAMPARAMETER4DNVPROC) (GLenum target, GLuint index, GLdouble x, GLdouble y, GLdouble z, GLdouble w); typedef void (APIENTRYP PFNGLPROGRAMPARAMETER4DVNVPROC) (GLenum target, GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLPROGRAMPARAMETER4FNVPROC) (GLenum target, GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w); typedef void (APIENTRYP PFNGLPROGRAMPARAMETER4FVNVPROC) (GLenum target, GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLPROGRAMPARAMETERS4DVNVPROC) (GLenum target, GLuint index, GLuint count, const GLdouble *v); typedef void (APIENTRYP PFNGLPROGRAMPARAMETERS4FVNVPROC) (GLenum target, GLuint index, GLuint count, const GLfloat *v); typedef void (APIENTRYP PFNGLREQUESTRESIDENTPROGRAMSNVPROC) (GLsizei n, const GLuint *programs); typedef void (APIENTRYP PFNGLTRACKMATRIXNVPROC) (GLenum target, GLuint address, GLenum matrix, GLenum transform); typedef void (APIENTRYP PFNGLVERTEXATTRIBPOINTERNVPROC) (GLuint index, GLint fsize, GLenum type, GLsizei stride, const GLvoid *pointer); typedef void (APIENTRYP PFNGLVERTEXATTRIB1DNVPROC) (GLuint index, GLdouble x); typedef void (APIENTRYP PFNGLVERTEXATTRIB1DVNVPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB1FNVPROC) (GLuint index, GLfloat x); typedef void (APIENTRYP PFNGLVERTEXATTRIB1FVNVPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB1SNVPROC) (GLuint index, GLshort x); typedef void (APIENTRYP PFNGLVERTEXATTRIB1SVNVPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB2DNVPROC) (GLuint index, GLdouble x, GLdouble y); typedef void (APIENTRYP PFNGLVERTEXATTRIB2DVNVPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB2FNVPROC) (GLuint index, GLfloat x, GLfloat y); typedef void (APIENTRYP PFNGLVERTEXATTRIB2FVNVPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB2SNVPROC) (GLuint index, GLshort x, GLshort y); typedef void (APIENTRYP PFNGLVERTEXATTRIB2SVNVPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB3DNVPROC) (GLuint index, GLdouble x, GLdouble y, GLdouble z); typedef void (APIENTRYP PFNGLVERTEXATTRIB3DVNVPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB3FNVPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z); typedef void (APIENTRYP PFNGLVERTEXATTRIB3FVNVPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB3SNVPROC) (GLuint index, GLshort x, GLshort y, GLshort z); typedef void (APIENTRYP PFNGLVERTEXATTRIB3SVNVPROC) (GLuint index, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4DNVPROC) (GLuint index, GLdouble x, GLdouble y, GLdouble z, GLdouble w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4DVNVPROC) (GLuint index, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4FNVPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4FVNVPROC) (GLuint index, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4SNVPROC) (GLuint index, GLshort x, GLshort y, GLshort z, GLshort w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4SVNVPROC) (GLuint index, 
const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIB4UBNVPROC) (GLuint index, GLubyte x, GLubyte y, GLubyte z, GLubyte w); typedef void (APIENTRYP PFNGLVERTEXATTRIB4UBVNVPROC) (GLuint index, const GLubyte *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS1DVNVPROC) (GLuint index, GLsizei count, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS1FVNVPROC) (GLuint index, GLsizei count, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS1SVNVPROC) (GLuint index, GLsizei count, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS2DVNVPROC) (GLuint index, GLsizei count, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS2FVNVPROC) (GLuint index, GLsizei count, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS2SVNVPROC) (GLuint index, GLsizei count, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS3DVNVPROC) (GLuint index, GLsizei count, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS3FVNVPROC) (GLuint index, GLsizei count, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS3SVNVPROC) (GLuint index, GLsizei count, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS4DVNVPROC) (GLuint index, GLsizei count, const GLdouble *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS4FVNVPROC) (GLuint index, GLsizei count, const GLfloat *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS4SVNVPROC) (GLuint index, GLsizei count, const GLshort *v); typedef void (APIENTRYP PFNGLVERTEXATTRIBS4UBVNVPROC) (GLuint index, GLsizei count, const GLubyte *v); #endif #ifndef GL_SGIX_texture_coordinate_clamp #define GL_SGIX_texture_coordinate_clamp 1 #endif #ifndef GL_SGIX_scalebias_hint #define GL_SGIX_scalebias_hint 1 #endif #ifndef GL_OML_interlace #define GL_OML_interlace 1 #endif #ifndef GL_OML_subsample #define GL_OML_subsample 1 #endif #ifndef GL_OML_resample #define GL_OML_resample 1 #endif #ifndef GL_NV_copy_depth_to_color #define GL_NV_copy_depth_to_color 1 #endif #ifndef GL_ATI_envmap_bumpmap #define GL_ATI_envmap_bumpmap 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glTexBumpParameterivATI (GLenum, const GLint *); GLAPI void APIENTRY glTexBumpParameterfvATI (GLenum, const GLfloat *); GLAPI void APIENTRY glGetTexBumpParameterivATI (GLenum, GLint *); GLAPI void APIENTRY glGetTexBumpParameterfvATI (GLenum, GLfloat *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLTEXBUMPPARAMETERIVATIPROC) (GLenum pname, const GLint *param); typedef void (APIENTRYP PFNGLTEXBUMPPARAMETERFVATIPROC) (GLenum pname, const GLfloat *param); typedef void (APIENTRYP PFNGLGETTEXBUMPPARAMETERIVATIPROC) (GLenum pname, GLint *param); typedef void (APIENTRYP PFNGLGETTEXBUMPPARAMETERFVATIPROC) (GLenum pname, GLfloat *param); #endif #ifndef GL_ATI_fragment_shader #define GL_ATI_fragment_shader 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI GLuint APIENTRY glGenFragmentShadersATI (GLuint); GLAPI void APIENTRY glBindFragmentShaderATI (GLuint); GLAPI void APIENTRY glDeleteFragmentShaderATI (GLuint); GLAPI void APIENTRY glBeginFragmentShaderATI (void); GLAPI void APIENTRY glEndFragmentShaderATI (void); GLAPI void APIENTRY glPassTexCoordATI (GLuint, GLuint, GLenum); GLAPI void APIENTRY glSampleMapATI (GLuint, GLuint, GLenum); GLAPI void APIENTRY glColorFragmentOp1ATI (GLenum, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glColorFragmentOp2ATI (GLenum, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glColorFragmentOp3ATI (GLenum, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint, 
GLuint, GLuint, GLuint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glAlphaFragmentOp1ATI (GLenum, GLuint, GLuint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glAlphaFragmentOp2ATI (GLenum, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glAlphaFragmentOp3ATI (GLenum, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glSetFragmentShaderConstantATI (GLuint, const GLfloat *); #endif /* GL_GLEXT_PROTOTYPES */ typedef GLuint (APIENTRYP PFNGLGENFRAGMENTSHADERSATIPROC) (GLuint range); typedef void (APIENTRYP PFNGLBINDFRAGMENTSHADERATIPROC) (GLuint id); typedef void (APIENTRYP PFNGLDELETEFRAGMENTSHADERATIPROC) (GLuint id); typedef void (APIENTRYP PFNGLBEGINFRAGMENTSHADERATIPROC) (void); typedef void (APIENTRYP PFNGLENDFRAGMENTSHADERATIPROC) (void); typedef void (APIENTRYP PFNGLPASSTEXCOORDATIPROC) (GLuint dst, GLuint coord, GLenum swizzle); typedef void (APIENTRYP PFNGLSAMPLEMAPATIPROC) (GLuint dst, GLuint interp, GLenum swizzle); typedef void (APIENTRYP PFNGLCOLORFRAGMENTOP1ATIPROC) (GLenum op, GLuint dst, GLuint dstMask, GLuint dstMod, GLuint arg1, GLuint arg1Rep, GLuint arg1Mod); typedef void (APIENTRYP PFNGLCOLORFRAGMENTOP2ATIPROC) (GLenum op, GLuint dst, GLuint dstMask, GLuint dstMod, GLuint arg1, GLuint arg1Rep, GLuint arg1Mod, GLuint arg2, GLuint arg2Rep, GLuint arg2Mod); typedef void (APIENTRYP PFNGLCOLORFRAGMENTOP3ATIPROC) (GLenum op, GLuint dst, GLuint dstMask, GLuint dstMod, GLuint arg1, GLuint arg1Rep, GLuint arg1Mod, GLuint arg2, GLuint arg2Rep, GLuint arg2Mod, GLuint arg3, GLuint arg3Rep, GLuint arg3Mod); typedef void (APIENTRYP PFNGLALPHAFRAGMENTOP1ATIPROC) (GLenum op, GLuint dst, GLuint dstMod, GLuint arg1, GLuint arg1Rep, GLuint arg1Mod); typedef void (APIENTRYP PFNGLALPHAFRAGMENTOP2ATIPROC) (GLenum op, GLuint dst, GLuint dstMod, GLuint arg1, GLuint arg1Rep, GLuint arg1Mod, GLuint arg2, GLuint arg2Rep, GLuint arg2Mod); typedef void (APIENTRYP PFNGLALPHAFRAGMENTOP3ATIPROC) (GLenum op, GLuint dst, GLuint dstMod, GLuint arg1, GLuint arg1Rep, GLuint arg1Mod, GLuint arg2, GLuint arg2Rep, GLuint arg2Mod, GLuint arg3, GLuint arg3Rep, GLuint arg3Mod); typedef void (APIENTRYP PFNGLSETFRAGMENTSHADERCONSTANTATIPROC) (GLuint dst, const GLfloat *value); #endif #ifndef GL_ATI_pn_triangles #define GL_ATI_pn_triangles 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glPNTrianglesiATI (GLenum, GLint); GLAPI void APIENTRY glPNTrianglesfATI (GLenum, GLfloat); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLPNTRIANGLESIATIPROC) (GLenum pname, GLint param); typedef void (APIENTRYP PFNGLPNTRIANGLESFATIPROC) (GLenum pname, GLfloat param); #endif #ifndef GL_ATI_vertex_array_object #define GL_ATI_vertex_array_object 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI GLuint APIENTRY glNewObjectBufferATI (GLsizei, const GLvoid *, GLenum); GLAPI GLboolean APIENTRY glIsObjectBufferATI (GLuint); GLAPI void APIENTRY glUpdateObjectBufferATI (GLuint, GLuint, GLsizei, const GLvoid *, GLenum); GLAPI void APIENTRY glGetObjectBufferfvATI (GLuint, GLenum, GLfloat *); GLAPI void APIENTRY glGetObjectBufferivATI (GLuint, GLenum, GLint *); GLAPI void APIENTRY glFreeObjectBufferATI (GLuint); GLAPI void APIENTRY glArrayObjectATI (GLenum, GLint, GLenum, GLsizei, GLuint, GLuint); GLAPI void APIENTRY glGetArrayObjectfvATI (GLenum, GLenum, GLfloat *); GLAPI void APIENTRY glGetArrayObjectivATI (GLenum, GLenum, GLint *); GLAPI void APIENTRY glVariantArrayObjectATI (GLuint, GLenum, GLsizei, GLuint, GLuint); GLAPI void APIENTRY 
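/*
 * Usage sketch (illustrative only, not part of the canonical header):
 * GL_ATI_fragment_shader programs are built imperatively inside a
 * Begin/End bracket on a bound shader object; the GL_REG_0_ATI,
 * GL_SWIZZLE_STR_ATI and GL_MOV_ATI enums are defined elsewhere in
 * this header:
 *
 *   GLuint fs = glGenFragmentShadersATI(1);
 *   glBindFragmentShaderATI(fs);
 *   glBeginFragmentShaderATI();
 *   glSampleMapATI(GL_REG_0_ATI, GL_TEXTURE0_ARB, GL_SWIZZLE_STR_ATI);
 *   glColorFragmentOp1ATI(GL_MOV_ATI, GL_REG_0_ATI, GL_NONE, GL_NONE,
 *                         GL_REG_0_ATI, GL_NONE, GL_NONE);
 *   glEndFragmentShaderATI();
 */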
#ifndef GL_ATI_pn_triangles
#define GL_ATI_pn_triangles 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glPNTrianglesiATI (GLenum, GLint);
GLAPI void APIENTRY glPNTrianglesfATI (GLenum, GLfloat);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPNTRIANGLESIATIPROC) (GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLPNTRIANGLESFATIPROC) (GLenum pname, GLfloat param);
#endif

#ifndef GL_ATI_vertex_array_object
#define GL_ATI_vertex_array_object 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI GLuint APIENTRY glNewObjectBufferATI (GLsizei, const GLvoid *, GLenum);
GLAPI GLboolean APIENTRY glIsObjectBufferATI (GLuint);
GLAPI void APIENTRY glUpdateObjectBufferATI (GLuint, GLuint, GLsizei, const GLvoid *, GLenum);
GLAPI void APIENTRY glGetObjectBufferfvATI (GLuint, GLenum, GLfloat *);
GLAPI void APIENTRY glGetObjectBufferivATI (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glFreeObjectBufferATI (GLuint);
GLAPI void APIENTRY glArrayObjectATI (GLenum, GLint, GLenum, GLsizei, GLuint, GLuint);
GLAPI void APIENTRY glGetArrayObjectfvATI (GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetArrayObjectivATI (GLenum, GLenum, GLint *);
GLAPI void APIENTRY glVariantArrayObjectATI (GLuint, GLenum, GLsizei, GLuint, GLuint);
GLAPI void APIENTRY glGetVariantArrayObjectfvATI (GLuint, GLenum, GLfloat *);
GLAPI void APIENTRY glGetVariantArrayObjectivATI (GLuint, GLenum, GLint *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef GLuint (APIENTRYP PFNGLNEWOBJECTBUFFERATIPROC) (GLsizei size, const GLvoid *pointer, GLenum usage);
typedef GLboolean (APIENTRYP PFNGLISOBJECTBUFFERATIPROC) (GLuint buffer);
typedef void (APIENTRYP PFNGLUPDATEOBJECTBUFFERATIPROC) (GLuint buffer, GLuint offset, GLsizei size, const GLvoid *pointer, GLenum preserve);
typedef void (APIENTRYP PFNGLGETOBJECTBUFFERFVATIPROC) (GLuint buffer, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETOBJECTBUFFERIVATIPROC) (GLuint buffer, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLFREEOBJECTBUFFERATIPROC) (GLuint buffer);
typedef void (APIENTRYP PFNGLARRAYOBJECTATIPROC) (GLenum array, GLint size, GLenum type, GLsizei stride, GLuint buffer, GLuint offset);
typedef void (APIENTRYP PFNGLGETARRAYOBJECTFVATIPROC) (GLenum array, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETARRAYOBJECTIVATIPROC) (GLenum array, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLVARIANTARRAYOBJECTATIPROC) (GLuint id, GLenum type, GLsizei stride, GLuint buffer, GLuint offset);
typedef void (APIENTRYP PFNGLGETVARIANTARRAYOBJECTFVATIPROC) (GLuint id, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETVARIANTARRAYOBJECTIVATIPROC) (GLuint id, GLenum pname, GLint *params);
#endif

#ifndef GL_EXT_vertex_shader
#define GL_EXT_vertex_shader 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glBeginVertexShaderEXT (void);
GLAPI void APIENTRY glEndVertexShaderEXT (void);
GLAPI void APIENTRY glBindVertexShaderEXT (GLuint);
GLAPI GLuint APIENTRY glGenVertexShadersEXT (GLuint);
GLAPI void APIENTRY glDeleteVertexShaderEXT (GLuint);
GLAPI void APIENTRY glShaderOp1EXT (GLenum, GLuint, GLuint);
GLAPI void APIENTRY glShaderOp2EXT (GLenum, GLuint, GLuint, GLuint);
GLAPI void APIENTRY glShaderOp3EXT (GLenum, GLuint, GLuint, GLuint, GLuint);
GLAPI void APIENTRY glSwizzleEXT (GLuint, GLuint, GLenum, GLenum, GLenum, GLenum);
GLAPI void APIENTRY glWriteMaskEXT (GLuint, GLuint, GLenum, GLenum, GLenum, GLenum);
GLAPI void APIENTRY glInsertComponentEXT (GLuint, GLuint, GLuint);
GLAPI void APIENTRY glExtractComponentEXT (GLuint, GLuint, GLuint);
GLAPI GLuint APIENTRY glGenSymbolsEXT (GLenum, GLenum, GLenum, GLuint);
GLAPI void APIENTRY glSetInvariantEXT (GLuint, GLenum, const GLvoid *);
GLAPI void APIENTRY glSetLocalConstantEXT (GLuint, GLenum, const GLvoid *);
GLAPI void APIENTRY glVariantbvEXT (GLuint, const GLbyte *);
GLAPI void APIENTRY glVariantsvEXT (GLuint, const GLshort *);
GLAPI void APIENTRY glVariantivEXT (GLuint, const GLint *);
GLAPI void APIENTRY glVariantfvEXT (GLuint, const GLfloat *);
GLAPI void APIENTRY glVariantdvEXT (GLuint, const GLdouble *);
GLAPI void APIENTRY glVariantubvEXT (GLuint, const GLubyte *);
GLAPI void APIENTRY glVariantusvEXT (GLuint, const GLushort *);
GLAPI void APIENTRY glVariantuivEXT (GLuint, const GLuint *);
GLAPI void APIENTRY glVariantPointerEXT (GLuint, GLenum, GLuint, const GLvoid *);
GLAPI void APIENTRY glEnableVariantClientStateEXT (GLuint);
GLAPI void APIENTRY glDisableVariantClientStateEXT (GLuint);
GLAPI GLuint APIENTRY glBindLightParameterEXT (GLenum, GLenum);
GLAPI GLuint APIENTRY glBindMaterialParameterEXT (GLenum, GLenum);
GLAPI GLuint APIENTRY glBindTexGenParameterEXT (GLenum, GLenum, GLenum);
GLAPI GLuint APIENTRY glBindTextureUnitParameterEXT (GLenum, GLenum);
GLAPI GLuint APIENTRY glBindParameterEXT (GLenum);
GLAPI GLboolean APIENTRY glIsVariantEnabledEXT (GLuint, GLenum);
GLAPI void APIENTRY glGetVariantBooleanvEXT (GLuint, GLenum, GLboolean *);
GLAPI void APIENTRY glGetVariantIntegervEXT (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glGetVariantFloatvEXT (GLuint, GLenum, GLfloat *);
GLAPI void APIENTRY glGetVariantPointervEXT (GLuint, GLenum, GLvoid* *);
GLAPI void APIENTRY glGetInvariantBooleanvEXT (GLuint, GLenum, GLboolean *);
GLAPI void APIENTRY glGetInvariantIntegervEXT (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glGetInvariantFloatvEXT (GLuint, GLenum, GLfloat *);
GLAPI void APIENTRY glGetLocalConstantBooleanvEXT (GLuint, GLenum, GLboolean *);
GLAPI void APIENTRY glGetLocalConstantIntegervEXT (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glGetLocalConstantFloatvEXT (GLuint, GLenum, GLfloat *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLBEGINVERTEXSHADEREXTPROC) (void);
typedef void (APIENTRYP PFNGLENDVERTEXSHADEREXTPROC) (void);
typedef void (APIENTRYP PFNGLBINDVERTEXSHADEREXTPROC) (GLuint id);
typedef GLuint (APIENTRYP PFNGLGENVERTEXSHADERSEXTPROC) (GLuint range);
typedef void (APIENTRYP PFNGLDELETEVERTEXSHADEREXTPROC) (GLuint id);
typedef void (APIENTRYP PFNGLSHADEROP1EXTPROC) (GLenum op, GLuint res, GLuint arg1);
typedef void (APIENTRYP PFNGLSHADEROP2EXTPROC) (GLenum op, GLuint res, GLuint arg1, GLuint arg2);
typedef void (APIENTRYP PFNGLSHADEROP3EXTPROC) (GLenum op, GLuint res, GLuint arg1, GLuint arg2, GLuint arg3);
typedef void (APIENTRYP PFNGLSWIZZLEEXTPROC) (GLuint res, GLuint in, GLenum outX, GLenum outY, GLenum outZ, GLenum outW);
typedef void (APIENTRYP PFNGLWRITEMASKEXTPROC) (GLuint res, GLuint in, GLenum outX, GLenum outY, GLenum outZ, GLenum outW);
typedef void (APIENTRYP PFNGLINSERTCOMPONENTEXTPROC) (GLuint res, GLuint src, GLuint num);
typedef void (APIENTRYP PFNGLEXTRACTCOMPONENTEXTPROC) (GLuint res, GLuint src, GLuint num);
typedef GLuint (APIENTRYP PFNGLGENSYMBOLSEXTPROC) (GLenum datatype, GLenum storagetype, GLenum range, GLuint components);
typedef void (APIENTRYP PFNGLSETINVARIANTEXTPROC) (GLuint id, GLenum type, const GLvoid *addr);
typedef void (APIENTRYP PFNGLSETLOCALCONSTANTEXTPROC) (GLuint id, GLenum type, const GLvoid *addr);
typedef void (APIENTRYP PFNGLVARIANTBVEXTPROC) (GLuint id, const GLbyte *addr);
typedef void (APIENTRYP PFNGLVARIANTSVEXTPROC) (GLuint id, const GLshort *addr);
typedef void (APIENTRYP PFNGLVARIANTIVEXTPROC) (GLuint id, const GLint *addr);
typedef void (APIENTRYP PFNGLVARIANTFVEXTPROC) (GLuint id, const GLfloat *addr);
typedef void (APIENTRYP PFNGLVARIANTDVEXTPROC) (GLuint id, const GLdouble *addr);
typedef void (APIENTRYP PFNGLVARIANTUBVEXTPROC) (GLuint id, const GLubyte *addr);
typedef void (APIENTRYP PFNGLVARIANTUSVEXTPROC) (GLuint id, const GLushort *addr);
typedef void (APIENTRYP PFNGLVARIANTUIVEXTPROC) (GLuint id, const GLuint *addr);
typedef void (APIENTRYP PFNGLVARIANTPOINTEREXTPROC) (GLuint id, GLenum type, GLuint stride, const GLvoid *addr);
typedef void (APIENTRYP PFNGLENABLEVARIANTCLIENTSTATEEXTPROC) (GLuint id);
typedef void (APIENTRYP PFNGLDISABLEVARIANTCLIENTSTATEEXTPROC) (GLuint id);
typedef GLuint (APIENTRYP PFNGLBINDLIGHTPARAMETEREXTPROC) (GLenum light, GLenum value);
typedef GLuint (APIENTRYP PFNGLBINDMATERIALPARAMETEREXTPROC) (GLenum face, GLenum value);
typedef GLuint (APIENTRYP PFNGLBINDTEXGENPARAMETEREXTPROC) (GLenum unit, GLenum coord, GLenum value);
typedef GLuint (APIENTRYP PFNGLBINDTEXTUREUNITPARAMETEREXTPROC) (GLenum unit, GLenum value);
typedef GLuint (APIENTRYP PFNGLBINDPARAMETEREXTPROC) (GLenum value);
typedef GLboolean (APIENTRYP PFNGLISVARIANTENABLEDEXTPROC) (GLuint id, GLenum cap);
typedef void (APIENTRYP PFNGLGETVARIANTBOOLEANVEXTPROC) (GLuint id, GLenum value, GLboolean *data);
typedef void (APIENTRYP PFNGLGETVARIANTINTEGERVEXTPROC) (GLuint id, GLenum value, GLint *data);
typedef void (APIENTRYP PFNGLGETVARIANTFLOATVEXTPROC) (GLuint id, GLenum value, GLfloat *data);
typedef void (APIENTRYP PFNGLGETVARIANTPOINTERVEXTPROC) (GLuint id, GLenum value, GLvoid* *data);
typedef void (APIENTRYP PFNGLGETINVARIANTBOOLEANVEXTPROC) (GLuint id, GLenum value, GLboolean *data);
typedef void (APIENTRYP PFNGLGETINVARIANTINTEGERVEXTPROC) (GLuint id, GLenum value, GLint *data);
typedef void (APIENTRYP PFNGLGETINVARIANTFLOATVEXTPROC) (GLuint id, GLenum value, GLfloat *data);
typedef void (APIENTRYP PFNGLGETLOCALCONSTANTBOOLEANVEXTPROC) (GLuint id, GLenum value, GLboolean *data);
typedef void (APIENTRYP PFNGLGETLOCALCONSTANTINTEGERVEXTPROC) (GLuint id, GLenum value, GLint *data);
typedef void (APIENTRYP PFNGLGETLOCALCONSTANTFLOATVEXTPROC) (GLuint id, GLenum value, GLfloat *data);
#endif

#ifndef GL_ATI_vertex_streams
#define GL_ATI_vertex_streams 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glVertexStream1sATI (GLenum, GLshort);
GLAPI void APIENTRY glVertexStream1svATI (GLenum, const GLshort *);
GLAPI void APIENTRY glVertexStream1iATI (GLenum, GLint);
GLAPI void APIENTRY glVertexStream1ivATI (GLenum, const GLint *);
GLAPI void APIENTRY glVertexStream1fATI (GLenum, GLfloat);
GLAPI void APIENTRY glVertexStream1fvATI (GLenum, const GLfloat *);
GLAPI void APIENTRY glVertexStream1dATI (GLenum, GLdouble);
GLAPI void APIENTRY glVertexStream1dvATI (GLenum, const GLdouble *);
GLAPI void APIENTRY glVertexStream2sATI (GLenum, GLshort, GLshort);
GLAPI void APIENTRY glVertexStream2svATI (GLenum, const GLshort *);
GLAPI void APIENTRY glVertexStream2iATI (GLenum, GLint, GLint);
GLAPI void APIENTRY glVertexStream2ivATI (GLenum, const GLint *);
GLAPI void APIENTRY glVertexStream2fATI (GLenum, GLfloat, GLfloat);
GLAPI void APIENTRY glVertexStream2fvATI (GLenum, const GLfloat *);
GLAPI void APIENTRY glVertexStream2dATI (GLenum, GLdouble, GLdouble);
GLAPI void APIENTRY glVertexStream2dvATI (GLenum, const GLdouble *);
GLAPI void APIENTRY glVertexStream3sATI (GLenum, GLshort, GLshort, GLshort);
GLAPI void APIENTRY glVertexStream3svATI (GLenum, const GLshort *);
GLAPI void APIENTRY glVertexStream3iATI (GLenum, GLint, GLint, GLint);
GLAPI void APIENTRY glVertexStream3ivATI (GLenum, const GLint *);
GLAPI void APIENTRY glVertexStream3fATI (GLenum, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glVertexStream3fvATI (GLenum, const GLfloat *);
GLAPI void APIENTRY glVertexStream3dATI (GLenum, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glVertexStream3dvATI (GLenum, const GLdouble *);
GLAPI void APIENTRY glVertexStream4sATI (GLenum, GLshort, GLshort, GLshort, GLshort);
GLAPI void APIENTRY glVertexStream4svATI (GLenum, const GLshort *);
GLAPI void APIENTRY glVertexStream4iATI (GLenum, GLint, GLint, GLint, GLint);
GLAPI void APIENTRY glVertexStream4ivATI (GLenum, const GLint *);
GLAPI void APIENTRY glVertexStream4fATI (GLenum, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glVertexStream4fvATI (GLenum, const GLfloat *);
GLAPI void APIENTRY glVertexStream4dATI (GLenum, GLdouble, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glVertexStream4dvATI (GLenum, const GLdouble *);
GLAPI void APIENTRY glNormalStream3bATI (GLenum, GLbyte, GLbyte, GLbyte);
GLAPI void APIENTRY glNormalStream3bvATI (GLenum, const GLbyte *);
GLAPI void APIENTRY glNormalStream3sATI (GLenum, GLshort, GLshort, GLshort);
GLAPI void APIENTRY glNormalStream3svATI (GLenum, const GLshort *);
GLAPI void APIENTRY glNormalStream3iATI (GLenum, GLint, GLint, GLint);
GLAPI void APIENTRY glNormalStream3ivATI (GLenum, const GLint *);
GLAPI void APIENTRY glNormalStream3fATI (GLenum, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glNormalStream3fvATI (GLenum, const GLfloat *);
GLAPI void APIENTRY glNormalStream3dATI (GLenum, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glNormalStream3dvATI (GLenum, const GLdouble *);
GLAPI void APIENTRY glClientActiveVertexStreamATI (GLenum);
GLAPI void APIENTRY glVertexBlendEnviATI (GLenum, GLint);
GLAPI void APIENTRY glVertexBlendEnvfATI (GLenum, GLfloat);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLVERTEXSTREAM1SATIPROC) (GLenum stream, GLshort x);
typedef void (APIENTRYP PFNGLVERTEXSTREAM1SVATIPROC) (GLenum stream, const GLshort *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM1IATIPROC) (GLenum stream, GLint x);
typedef void (APIENTRYP PFNGLVERTEXSTREAM1IVATIPROC) (GLenum stream, const GLint *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM1FATIPROC) (GLenum stream, GLfloat x);
typedef void (APIENTRYP PFNGLVERTEXSTREAM1FVATIPROC) (GLenum stream, const GLfloat *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM1DATIPROC) (GLenum stream, GLdouble x);
typedef void (APIENTRYP PFNGLVERTEXSTREAM1DVATIPROC) (GLenum stream, const GLdouble *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM2SATIPROC) (GLenum stream, GLshort x, GLshort y);
typedef void (APIENTRYP PFNGLVERTEXSTREAM2SVATIPROC) (GLenum stream, const GLshort *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM2IATIPROC) (GLenum stream, GLint x, GLint y);
typedef void (APIENTRYP PFNGLVERTEXSTREAM2IVATIPROC) (GLenum stream, const GLint *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM2FATIPROC) (GLenum stream, GLfloat x, GLfloat y);
typedef void (APIENTRYP PFNGLVERTEXSTREAM2FVATIPROC) (GLenum stream, const GLfloat *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM2DATIPROC) (GLenum stream, GLdouble x, GLdouble y);
typedef void (APIENTRYP PFNGLVERTEXSTREAM2DVATIPROC) (GLenum stream, const GLdouble *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM3SATIPROC) (GLenum stream, GLshort x, GLshort y, GLshort z);
typedef void (APIENTRYP PFNGLVERTEXSTREAM3SVATIPROC) (GLenum stream, const GLshort *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM3IATIPROC) (GLenum stream, GLint x, GLint y, GLint z);
typedef void (APIENTRYP PFNGLVERTEXSTREAM3IVATIPROC) (GLenum stream, const GLint *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM3FATIPROC) (GLenum stream, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLVERTEXSTREAM3FVATIPROC) (GLenum stream, const GLfloat *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM3DATIPROC) (GLenum stream, GLdouble x, GLdouble y, GLdouble z);
typedef void (APIENTRYP PFNGLVERTEXSTREAM3DVATIPROC) (GLenum stream, const GLdouble *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM4SATIPROC) (GLenum stream, GLshort x, GLshort y, GLshort z, GLshort w);
typedef void (APIENTRYP PFNGLVERTEXSTREAM4SVATIPROC) (GLenum stream, const GLshort *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM4IATIPROC) (GLenum stream, GLint x, GLint y, GLint z, GLint w);
typedef void (APIENTRYP PFNGLVERTEXSTREAM4IVATIPROC) (GLenum stream, const GLint *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM4FATIPROC) (GLenum stream, GLfloat x, GLfloat y, GLfloat z, GLfloat w);
typedef void (APIENTRYP PFNGLVERTEXSTREAM4FVATIPROC) (GLenum stream, const GLfloat *coords);
typedef void (APIENTRYP PFNGLVERTEXSTREAM4DATIPROC) (GLenum stream, GLdouble x, GLdouble y, GLdouble z, GLdouble w);
typedef void (APIENTRYP PFNGLVERTEXSTREAM4DVATIPROC) (GLenum stream, const GLdouble *coords);
typedef void (APIENTRYP PFNGLNORMALSTREAM3BATIPROC) (GLenum stream, GLbyte nx, GLbyte ny, GLbyte nz);
typedef void (APIENTRYP PFNGLNORMALSTREAM3BVATIPROC) (GLenum stream, const GLbyte *coords);
typedef void (APIENTRYP PFNGLNORMALSTREAM3SATIPROC) (GLenum stream, GLshort nx, GLshort ny, GLshort nz);
typedef void (APIENTRYP PFNGLNORMALSTREAM3SVATIPROC) (GLenum stream, const GLshort *coords);
typedef void (APIENTRYP PFNGLNORMALSTREAM3IATIPROC) (GLenum stream, GLint nx, GLint ny, GLint nz);
typedef void (APIENTRYP PFNGLNORMALSTREAM3IVATIPROC) (GLenum stream, const GLint *coords);
typedef void (APIENTRYP PFNGLNORMALSTREAM3FATIPROC) (GLenum stream, GLfloat nx, GLfloat ny, GLfloat nz);
typedef void (APIENTRYP PFNGLNORMALSTREAM3FVATIPROC) (GLenum stream, const GLfloat *coords);
typedef void (APIENTRYP PFNGLNORMALSTREAM3DATIPROC) (GLenum stream, GLdouble nx, GLdouble ny, GLdouble nz);
typedef void (APIENTRYP PFNGLNORMALSTREAM3DVATIPROC) (GLenum stream, const GLdouble *coords);
typedef void (APIENTRYP PFNGLCLIENTACTIVEVERTEXSTREAMATIPROC) (GLenum stream);
typedef void (APIENTRYP PFNGLVERTEXBLENDENVIATIPROC) (GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLVERTEXBLENDENVFATIPROC) (GLenum pname, GLfloat param);
#endif

#ifndef GL_ATI_element_array
#define GL_ATI_element_array 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glElementPointerATI (GLenum, const GLvoid *);
GLAPI void APIENTRY glDrawElementArrayATI (GLenum, GLsizei);
GLAPI void APIENTRY glDrawRangeElementArrayATI (GLenum, GLuint, GLuint, GLsizei);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLELEMENTPOINTERATIPROC) (GLenum type, const GLvoid *pointer);
typedef void (APIENTRYP PFNGLDRAWELEMENTARRAYATIPROC) (GLenum mode, GLsizei count);
typedef void (APIENTRYP PFNGLDRAWRANGEELEMENTARRAYATIPROC) (GLenum mode, GLuint start, GLuint end, GLsizei count);
#endif

#ifndef GL_SUN_mesh_array
#define GL_SUN_mesh_array 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glDrawMeshArraysSUN (GLenum, GLint, GLsizei, GLsizei);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLDRAWMESHARRAYSSUNPROC) (GLenum mode, GLint first, GLsizei count, GLsizei width);
#endif

#ifndef GL_SUN_slice_accum
#define GL_SUN_slice_accum 1
#endif

#ifndef GL_NV_multisample_filter_hint
#define GL_NV_multisample_filter_hint 1
#endif

#ifndef GL_NV_depth_clamp
#define GL_NV_depth_clamp 1
#endif

#ifndef GL_NV_occlusion_query
#define GL_NV_occlusion_query 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glGenOcclusionQueriesNV (GLsizei, GLuint *);
GLAPI void APIENTRY glDeleteOcclusionQueriesNV (GLsizei, const GLuint *);
GLAPI GLboolean APIENTRY glIsOcclusionQueryNV (GLuint);
GLAPI void APIENTRY glBeginOcclusionQueryNV (GLuint);
GLAPI void APIENTRY glEndOcclusionQueryNV (void);
GLAPI void APIENTRY glGetOcclusionQueryivNV (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glGetOcclusionQueryuivNV (GLuint, GLenum, GLuint *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLGENOCCLUSIONQUERIESNVPROC) (GLsizei n, GLuint *ids);
typedef void (APIENTRYP PFNGLDELETEOCCLUSIONQUERIESNVPROC) (GLsizei n, const GLuint *ids);
typedef GLboolean (APIENTRYP PFNGLISOCCLUSIONQUERYNVPROC) (GLuint id);
typedef void (APIENTRYP PFNGLBEGINOCCLUSIONQUERYNVPROC) (GLuint id);
typedef void (APIENTRYP PFNGLENDOCCLUSIONQUERYNVPROC) (void);
typedef void (APIENTRYP PFNGLGETOCCLUSIONQUERYIVNVPROC) (GLuint id, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETOCCLUSIONQUERYUIVNVPROC) (GLuint id, GLenum pname, GLuint *params);
#endif
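/*
 * Usage sketch (illustrative only, not part of the canonical header): an
 * NV occlusion query brackets the geometry whose visible pixel count is
 * wanted, then reads the result once it is available; the
 * GL_PIXEL_COUNT_* enums are defined elsewhere in this header, and
 * draw_occludee() is a hypothetical draw call:
 *
 *   GLuint query, count, ready = GL_FALSE;
 *   glGenOcclusionQueriesNV(1, &query);
 *   glBeginOcclusionQueryNV(query);
 *   draw_occludee();
 *   glEndOcclusionQueryNV();
 *   while (!ready)
 *       glGetOcclusionQueryuivNV(query, GL_PIXEL_COUNT_AVAILABLE_NV, &ready);
 *   glGetOcclusionQueryuivNV(query, GL_PIXEL_COUNT_NV, &count);
 */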
#ifndef GL_NV_point_sprite
#define GL_NV_point_sprite 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glPointParameteriNV (GLenum, GLint);
GLAPI void APIENTRY glPointParameterivNV (GLenum, const GLint *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPOINTPARAMETERINVPROC) (GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLPOINTPARAMETERIVNVPROC) (GLenum pname, const GLint *params);
#endif

#ifndef GL_NV_texture_shader3
#define GL_NV_texture_shader3 1
#endif

#ifndef GL_NV_vertex_program1_1
#define GL_NV_vertex_program1_1 1
#endif

#ifndef GL_EXT_shadow_funcs
#define GL_EXT_shadow_funcs 1
#endif

#ifndef GL_EXT_stencil_two_side
#define GL_EXT_stencil_two_side 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glActiveStencilFaceEXT (GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLACTIVESTENCILFACEEXTPROC) (GLenum face);
#endif

#ifndef GL_ATI_text_fragment_shader
#define GL_ATI_text_fragment_shader 1
#endif

#ifndef GL_APPLE_client_storage
#define GL_APPLE_client_storage 1
#endif

#ifndef GL_APPLE_element_array
#define GL_APPLE_element_array 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glElementPointerAPPLE (GLenum, const GLvoid *);
GLAPI void APIENTRY glDrawElementArrayAPPLE (GLenum, GLint, GLsizei);
GLAPI void APIENTRY glDrawRangeElementArrayAPPLE (GLenum, GLuint, GLuint, GLint, GLsizei);
GLAPI void APIENTRY glMultiDrawElementArrayAPPLE (GLenum, const GLint *, const GLsizei *, GLsizei);
GLAPI void APIENTRY glMultiDrawRangeElementArrayAPPLE (GLenum, GLuint, GLuint, const GLint *, const GLsizei *, GLsizei);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLELEMENTPOINTERAPPLEPROC) (GLenum type, const GLvoid *pointer);
typedef void (APIENTRYP PFNGLDRAWELEMENTARRAYAPPLEPROC) (GLenum mode, GLint first, GLsizei count);
typedef void (APIENTRYP PFNGLDRAWRANGEELEMENTARRAYAPPLEPROC) (GLenum mode, GLuint start, GLuint end, GLint first, GLsizei count);
typedef void (APIENTRYP PFNGLMULTIDRAWELEMENTARRAYAPPLEPROC) (GLenum mode, const GLint *first, const GLsizei *count, GLsizei primcount);
typedef void (APIENTRYP PFNGLMULTIDRAWRANGEELEMENTARRAYAPPLEPROC) (GLenum mode, GLuint start, GLuint end, const GLint *first, const GLsizei *count, GLsizei primcount);
#endif

#ifndef GL_APPLE_fence
#define GL_APPLE_fence 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glGenFencesAPPLE (GLsizei, GLuint *);
GLAPI void APIENTRY glDeleteFencesAPPLE (GLsizei, const GLuint *);
GLAPI void APIENTRY glSetFenceAPPLE (GLuint);
GLAPI GLboolean APIENTRY glIsFenceAPPLE (GLuint);
GLAPI GLboolean APIENTRY glTestFenceAPPLE (GLuint);
GLAPI void APIENTRY glFinishFenceAPPLE (GLuint);
GLAPI GLboolean APIENTRY glTestObjectAPPLE (GLenum, GLuint);
GLAPI void APIENTRY glFinishObjectAPPLE (GLenum, GLint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLGENFENCESAPPLEPROC) (GLsizei n, GLuint *fences);
typedef void (APIENTRYP PFNGLDELETEFENCESAPPLEPROC) (GLsizei n, const GLuint *fences);
typedef void (APIENTRYP PFNGLSETFENCEAPPLEPROC) (GLuint fence);
typedef GLboolean (APIENTRYP PFNGLISFENCEAPPLEPROC) (GLuint fence);
typedef GLboolean (APIENTRYP PFNGLTESTFENCEAPPLEPROC) (GLuint fence);
typedef void (APIENTRYP PFNGLFINISHFENCEAPPLEPROC) (GLuint fence);
typedef GLboolean (APIENTRYP PFNGLTESTOBJECTAPPLEPROC) (GLenum object, GLuint name);
typedef void (APIENTRYP PFNGLFINISHOBJECTAPPLEPROC) (GLenum object, GLint name);
#endif

#ifndef GL_APPLE_vertex_array_object
#define GL_APPLE_vertex_array_object 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glBindVertexArrayAPPLE (GLuint);
GLAPI void APIENTRY glDeleteVertexArraysAPPLE (GLsizei, const GLuint *);
GLAPI void APIENTRY glGenVertexArraysAPPLE (GLsizei, GLuint *);
GLAPI GLboolean APIENTRY glIsVertexArrayAPPLE (GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLBINDVERTEXARRAYAPPLEPROC) (GLuint array);
typedef void (APIENTRYP PFNGLDELETEVERTEXARRAYSAPPLEPROC) (GLsizei n, const GLuint *arrays);
typedef void (APIENTRYP PFNGLGENVERTEXARRAYSAPPLEPROC) (GLsizei n, GLuint *arrays);
typedef GLboolean (APIENTRYP PFNGLISVERTEXARRAYAPPLEPROC) (GLuint array);
#endif

#ifndef GL_APPLE_vertex_array_range
#define GL_APPLE_vertex_array_range 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glVertexArrayRangeAPPLE (GLsizei, GLvoid *);
GLAPI void APIENTRY glFlushVertexArrayRangeAPPLE (GLsizei, GLvoid *);
GLAPI void APIENTRY glVertexArrayParameteriAPPLE (GLenum, GLint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLVERTEXARRAYRANGEAPPLEPROC) (GLsizei length, GLvoid *pointer);
typedef void (APIENTRYP PFNGLFLUSHVERTEXARRAYRANGEAPPLEPROC) (GLsizei length, GLvoid *pointer);
typedef void (APIENTRYP PFNGLVERTEXARRAYPARAMETERIAPPLEPROC) (GLenum pname, GLint param);
#endif

#ifndef GL_APPLE_ycbcr_422
#define GL_APPLE_ycbcr_422 1
#endif

#ifndef GL_S3_s3tc
#define GL_S3_s3tc 1
#endif

#ifndef GL_ATI_draw_buffers
#define GL_ATI_draw_buffers 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glDrawBuffersATI (GLsizei, const GLenum *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLDRAWBUFFERSATIPROC) (GLsizei n, const GLenum *bufs);
#endif

#ifndef GL_ATI_pixel_format_float
#define GL_ATI_pixel_format_float 1
/* This is really a WGL extension, but defines some associated GL enums.
 * ATI does not export "GL_ATI_pixel_format_float" in the GL_EXTENSIONS string.
 */
#endif

#ifndef GL_ATI_texture_env_combine3
#define GL_ATI_texture_env_combine3 1
#endif

#ifndef GL_ATI_texture_float
#define GL_ATI_texture_float 1
#endif

#ifndef GL_NV_float_buffer
#define GL_NV_float_buffer 1
#endif

#ifndef GL_NV_fragment_program
#define GL_NV_fragment_program 1
/* Some NV_fragment_program entry points are shared with ARB_vertex_program. */
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glProgramNamedParameter4fNV (GLuint, GLsizei, const GLubyte *, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glProgramNamedParameter4dNV (GLuint, GLsizei, const GLubyte *, GLdouble, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glProgramNamedParameter4fvNV (GLuint, GLsizei, const GLubyte *, const GLfloat *);
GLAPI void APIENTRY glProgramNamedParameter4dvNV (GLuint, GLsizei, const GLubyte *, const GLdouble *);
GLAPI void APIENTRY glGetProgramNamedParameterfvNV (GLuint, GLsizei, const GLubyte *, GLfloat *);
GLAPI void APIENTRY glGetProgramNamedParameterdvNV (GLuint, GLsizei, const GLubyte *, GLdouble *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPROGRAMNAMEDPARAMETER4FNVPROC) (GLuint id, GLsizei len, const GLubyte *name, GLfloat x, GLfloat y, GLfloat z, GLfloat w);
typedef void (APIENTRYP PFNGLPROGRAMNAMEDPARAMETER4DNVPROC) (GLuint id, GLsizei len, const GLubyte *name, GLdouble x, GLdouble y, GLdouble z, GLdouble w);
typedef void (APIENTRYP PFNGLPROGRAMNAMEDPARAMETER4FVNVPROC) (GLuint id, GLsizei len, const GLubyte *name, const GLfloat *v);
typedef void (APIENTRYP PFNGLPROGRAMNAMEDPARAMETER4DVNVPROC) (GLuint id, GLsizei len, const GLubyte *name, const GLdouble *v);
typedef void (APIENTRYP PFNGLGETPROGRAMNAMEDPARAMETERFVNVPROC) (GLuint id, GLsizei len, const GLubyte *name, GLfloat *params);
typedef void (APIENTRYP PFNGLGETPROGRAMNAMEDPARAMETERDVNVPROC) (GLuint id, GLsizei len, const GLubyte *name, GLdouble *params);
#endif
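/*
 * Usage sketch (illustrative only, not part of the canonical header):
 * unlike ARB program parameters, which are addressed by index,
 * NV_fragment_program constants can be set by name; the GLsizei argument
 * is the length of the name string, and progId is a hypothetical program
 * object id:
 *
 *   const GLubyte name[] = "scale";
 *   glProgramNamedParameter4fNV(progId, 5, name, 2.0f, 2.0f, 2.0f, 1.0f);
 */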
#ifndef GL_NV_half_float
#define GL_NV_half_float 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glVertex2hNV (GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glVertex2hvNV (const GLhalfNV *);
GLAPI void APIENTRY glVertex3hNV (GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glVertex3hvNV (const GLhalfNV *);
GLAPI void APIENTRY glVertex4hNV (GLhalfNV, GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glVertex4hvNV (const GLhalfNV *);
GLAPI void APIENTRY glNormal3hNV (GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glNormal3hvNV (const GLhalfNV *);
GLAPI void APIENTRY glColor3hNV (GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glColor3hvNV (const GLhalfNV *);
GLAPI void APIENTRY glColor4hNV (GLhalfNV, GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glColor4hvNV (const GLhalfNV *);
GLAPI void APIENTRY glTexCoord1hNV (GLhalfNV);
GLAPI void APIENTRY glTexCoord1hvNV (const GLhalfNV *);
GLAPI void APIENTRY glTexCoord2hNV (GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glTexCoord2hvNV (const GLhalfNV *);
GLAPI void APIENTRY glTexCoord3hNV (GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glTexCoord3hvNV (const GLhalfNV *);
GLAPI void APIENTRY glTexCoord4hNV (GLhalfNV, GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glTexCoord4hvNV (const GLhalfNV *);
GLAPI void APIENTRY glMultiTexCoord1hNV (GLenum, GLhalfNV);
GLAPI void APIENTRY glMultiTexCoord1hvNV (GLenum, const GLhalfNV *);
GLAPI void APIENTRY glMultiTexCoord2hNV (GLenum, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glMultiTexCoord2hvNV (GLenum, const GLhalfNV *);
GLAPI void APIENTRY glMultiTexCoord3hNV (GLenum, GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glMultiTexCoord3hvNV (GLenum, const GLhalfNV *);
GLAPI void APIENTRY glMultiTexCoord4hNV (GLenum, GLhalfNV, GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glMultiTexCoord4hvNV (GLenum, const GLhalfNV *);
GLAPI void APIENTRY glFogCoordhNV (GLhalfNV);
GLAPI void APIENTRY glFogCoordhvNV (const GLhalfNV *);
GLAPI void APIENTRY glSecondaryColor3hNV (GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glSecondaryColor3hvNV (const GLhalfNV *);
GLAPI void APIENTRY glVertexWeighthNV (GLhalfNV);
GLAPI void APIENTRY glVertexWeighthvNV (const GLhalfNV *);
GLAPI void APIENTRY glVertexAttrib1hNV (GLuint, GLhalfNV);
GLAPI void APIENTRY glVertexAttrib1hvNV (GLuint, const GLhalfNV *);
GLAPI void APIENTRY glVertexAttrib2hNV (GLuint, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glVertexAttrib2hvNV (GLuint, const GLhalfNV *);
GLAPI void APIENTRY glVertexAttrib3hNV (GLuint, GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glVertexAttrib3hvNV (GLuint, const GLhalfNV *);
GLAPI void APIENTRY glVertexAttrib4hNV (GLuint, GLhalfNV, GLhalfNV, GLhalfNV, GLhalfNV);
GLAPI void APIENTRY glVertexAttrib4hvNV (GLuint, const GLhalfNV *);
GLAPI void APIENTRY glVertexAttribs1hvNV (GLuint, GLsizei, const GLhalfNV *);
GLAPI void APIENTRY glVertexAttribs2hvNV (GLuint, GLsizei, const GLhalfNV *);
GLAPI void APIENTRY glVertexAttribs3hvNV (GLuint, GLsizei, const GLhalfNV *);
GLAPI void APIENTRY glVertexAttribs4hvNV (GLuint, GLsizei, const GLhalfNV *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLVERTEX2HNVPROC) (GLhalfNV x, GLhalfNV y);
typedef void (APIENTRYP PFNGLVERTEX2HVNVPROC) (const GLhalfNV *v);
typedef void (APIENTRYP PFNGLVERTEX3HNVPROC) (GLhalfNV x, GLhalfNV y, GLhalfNV z);
typedef void (APIENTRYP PFNGLVERTEX3HVNVPROC) (const GLhalfNV *v);
typedef void (APIENTRYP PFNGLVERTEX4HNVPROC) (GLhalfNV x, GLhalfNV y, GLhalfNV z, GLhalfNV w);
typedef void (APIENTRYP PFNGLVERTEX4HVNVPROC) (const GLhalfNV *v);
typedef void (APIENTRYP PFNGLNORMAL3HNVPROC) (GLhalfNV nx, GLhalfNV ny, GLhalfNV nz);
typedef void (APIENTRYP PFNGLNORMAL3HVNVPROC) (const GLhalfNV *v);
typedef void (APIENTRYP PFNGLCOLOR3HNVPROC) (GLhalfNV red, GLhalfNV green, GLhalfNV blue);
typedef void (APIENTRYP PFNGLCOLOR3HVNVPROC) (const GLhalfNV *v);
typedef void (APIENTRYP PFNGLCOLOR4HNVPROC) (GLhalfNV red, GLhalfNV green, GLhalfNV blue, GLhalfNV alpha);
typedef void (APIENTRYP PFNGLCOLOR4HVNVPROC) (const GLhalfNV *v);
typedef void (APIENTRYP PFNGLTEXCOORD1HNVPROC) (GLhalfNV s);
typedef void (APIENTRYP PFNGLTEXCOORD1HVNVPROC) (const GLhalfNV *v);
typedef void (APIENTRYP PFNGLTEXCOORD2HNVPROC) (GLhalfNV s, GLhalfNV t);
typedef void (APIENTRYP PFNGLTEXCOORD2HVNVPROC) (const GLhalfNV *v);
typedef void (APIENTRYP PFNGLTEXCOORD3HNVPROC) (GLhalfNV s, GLhalfNV t, GLhalfNV r);
typedef void (APIENTRYP PFNGLTEXCOORD3HVNVPROC) (const GLhalfNV *v);
typedef void (APIENTRYP PFNGLTEXCOORD4HNVPROC) (GLhalfNV s, GLhalfNV t, GLhalfNV r, GLhalfNV q);
typedef void (APIENTRYP PFNGLTEXCOORD4HVNVPROC) (const GLhalfNV *v);
typedef void (APIENTRYP PFNGLMULTITEXCOORD1HNVPROC) (GLenum target, GLhalfNV s);
typedef void (APIENTRYP PFNGLMULTITEXCOORD1HVNVPROC) (GLenum target, const GLhalfNV *v);
typedef void (APIENTRYP PFNGLMULTITEXCOORD2HNVPROC) (GLenum target, GLhalfNV s, GLhalfNV t);
typedef void (APIENTRYP PFNGLMULTITEXCOORD2HVNVPROC) (GLenum target, const GLhalfNV *v);
typedef void (APIENTRYP PFNGLMULTITEXCOORD3HNVPROC) (GLenum target, GLhalfNV s, GLhalfNV t, GLhalfNV r);
typedef void (APIENTRYP PFNGLMULTITEXCOORD3HVNVPROC) (GLenum target, const GLhalfNV *v);
typedef void (APIENTRYP PFNGLMULTITEXCOORD4HNVPROC) (GLenum target, GLhalfNV s, GLhalfNV t, GLhalfNV r, GLhalfNV q);
typedef void (APIENTRYP PFNGLMULTITEXCOORD4HVNVPROC) (GLenum target, const GLhalfNV *v);
typedef void (APIENTRYP PFNGLFOGCOORDHNVPROC) (GLhalfNV fog);
typedef void (APIENTRYP PFNGLFOGCOORDHVNVPROC) (const GLhalfNV *fog);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3HNVPROC) (GLhalfNV red, GLhalfNV green, GLhalfNV blue);
typedef void (APIENTRYP PFNGLSECONDARYCOLOR3HVNVPROC) (const GLhalfNV *v);
typedef void (APIENTRYP PFNGLVERTEXWEIGHTHNVPROC) (GLhalfNV weight);
typedef void (APIENTRYP PFNGLVERTEXWEIGHTHVNVPROC) (const GLhalfNV *weight);
typedef void (APIENTRYP PFNGLVERTEXATTRIB1HNVPROC) (GLuint index, GLhalfNV x);
typedef void (APIENTRYP PFNGLVERTEXATTRIB1HVNVPROC) (GLuint index, const GLhalfNV *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIB2HNVPROC) (GLuint index, GLhalfNV x, GLhalfNV y);
typedef void (APIENTRYP PFNGLVERTEXATTRIB2HVNVPROC) (GLuint index, const GLhalfNV *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIB3HNVPROC) (GLuint index, GLhalfNV x, GLhalfNV y, GLhalfNV z);
typedef void (APIENTRYP PFNGLVERTEXATTRIB3HVNVPROC) (GLuint index, const GLhalfNV *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIB4HNVPROC) (GLuint index, GLhalfNV x, GLhalfNV y, GLhalfNV z, GLhalfNV w);
typedef void (APIENTRYP PFNGLVERTEXATTRIB4HVNVPROC) (GLuint index, const GLhalfNV *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBS1HVNVPROC) (GLuint index, GLsizei n, const GLhalfNV *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBS2HVNVPROC) (GLuint index, GLsizei n, const GLhalfNV *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBS3HVNVPROC) (GLuint index, GLsizei n, const GLhalfNV *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBS4HVNVPROC) (GLuint index, GLsizei n, const GLhalfNV *v);
#endif

#ifndef GL_NV_pixel_data_range
#define GL_NV_pixel_data_range 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glPixelDataRangeNV (GLenum, GLsizei, GLvoid *);
GLAPI void APIENTRY glFlushPixelDataRangeNV (GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPIXELDATARANGENVPROC) (GLenum target, GLsizei length, GLvoid *pointer);
typedef void (APIENTRYP PFNGLFLUSHPIXELDATARANGENVPROC) (GLenum target);
#endif

#ifndef GL_NV_primitive_restart
#define GL_NV_primitive_restart 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glPrimitiveRestartNV (void);
GLAPI void APIENTRY glPrimitiveRestartIndexNV (GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPRIMITIVERESTARTNVPROC) (void);
typedef void (APIENTRYP PFNGLPRIMITIVERESTARTINDEXNVPROC) (GLuint index);
#endif

#ifndef GL_NV_texture_expand_normal
#define GL_NV_texture_expand_normal 1
#endif

#ifndef GL_NV_vertex_program2
#define GL_NV_vertex_program2 1
#endif

#ifndef GL_ATI_map_object_buffer
#define GL_ATI_map_object_buffer 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI GLvoid* APIENTRY glMapObjectBufferATI (GLuint);
GLAPI void APIENTRY glUnmapObjectBufferATI (GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef GLvoid* (APIENTRYP PFNGLMAPOBJECTBUFFERATIPROC) (GLuint buffer);
typedef void (APIENTRYP PFNGLUNMAPOBJECTBUFFERATIPROC) (GLuint buffer);
#endif

#ifndef GL_ATI_separate_stencil
#define GL_ATI_separate_stencil 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glStencilOpSeparateATI (GLenum, GLenum, GLenum, GLenum);
GLAPI void APIENTRY glStencilFuncSeparateATI (GLenum, GLenum, GLint, GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLSTENCILOPSEPARATEATIPROC) (GLenum face, GLenum sfail, GLenum dpfail, GLenum dppass);
typedef void (APIENTRYP PFNGLSTENCILFUNCSEPARATEATIPROC) (GLenum frontfunc, GLenum backfunc, GLint ref, GLuint mask);
#endif

#ifndef GL_ATI_vertex_attrib_array_object
#define GL_ATI_vertex_attrib_array_object 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glVertexAttribArrayObjectATI (GLuint, GLint, GLenum, GLboolean, GLsizei, GLuint, GLuint);
GLAPI void APIENTRY glGetVertexAttribArrayObjectfvATI (GLuint, GLenum, GLfloat *);
GLAPI void APIENTRY glGetVertexAttribArrayObjectivATI (GLuint, GLenum, GLint *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLVERTEXATTRIBARRAYOBJECTATIPROC) (GLuint index, GLint size, GLenum type, GLboolean normalized, GLsizei stride, GLuint buffer, GLuint offset);
typedef void (APIENTRYP PFNGLGETVERTEXATTRIBARRAYOBJECTFVATIPROC) (GLuint index, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETVERTEXATTRIBARRAYOBJECTIVATIPROC) (GLuint index, GLenum pname, GLint *params);
#endif

#ifndef GL_OES_read_format
#define GL_OES_read_format 1
#endif

#ifndef GL_EXT_depth_bounds_test
#define GL_EXT_depth_bounds_test 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glDepthBoundsEXT (GLclampd, GLclampd);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLDEPTHBOUNDSEXTPROC) (GLclampd zmin, GLclampd zmax);
#endif

#ifndef GL_EXT_texture_mirror_clamp
#define GL_EXT_texture_mirror_clamp 1
#endif

#ifndef GL_EXT_blend_equation_separate
#define GL_EXT_blend_equation_separate 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glBlendEquationSeparateEXT (GLenum, GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLBLENDEQUATIONSEPARATEEXTPROC) (GLenum modeRGB, GLenum modeAlpha);
#endif

#ifndef GL_MESA_pack_invert
#define GL_MESA_pack_invert 1
#endif

#ifndef GL_MESA_ycbcr_texture
#define GL_MESA_ycbcr_texture 1
#endif

#ifndef GL_EXT_pixel_buffer_object
#define GL_EXT_pixel_buffer_object 1
#endif

#ifndef GL_NV_fragment_program_option
#define GL_NV_fragment_program_option 1
#endif

#ifndef GL_NV_fragment_program2
#define GL_NV_fragment_program2 1
#endif

#ifndef GL_NV_vertex_program2_option
#define GL_NV_vertex_program2_option 1
#endif

#ifndef GL_NV_vertex_program3
#define GL_NV_vertex_program3 1
#endif

#ifndef GL_EXT_framebuffer_object
#define GL_EXT_framebuffer_object 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI GLboolean APIENTRY glIsRenderbufferEXT (GLuint);
GLAPI void APIENTRY glBindRenderbufferEXT (GLenum, GLuint);
GLAPI void APIENTRY glDeleteRenderbuffersEXT (GLsizei, const GLuint *);
GLAPI void APIENTRY glGenRenderbuffersEXT (GLsizei, GLuint *);
GLAPI void APIENTRY glRenderbufferStorageEXT (GLenum, GLenum, GLsizei, GLsizei);
GLAPI void APIENTRY glGetRenderbufferParameterivEXT (GLenum, GLenum, GLint *);
GLAPI GLboolean APIENTRY glIsFramebufferEXT (GLuint);
GLAPI void APIENTRY glBindFramebufferEXT (GLenum, GLuint);
GLAPI void APIENTRY glDeleteFramebuffersEXT (GLsizei, const GLuint *);
GLAPI void APIENTRY glGenFramebuffersEXT (GLsizei, GLuint *);
GLAPI GLenum APIENTRY glCheckFramebufferStatusEXT (GLenum);
GLAPI void APIENTRY glFramebufferTexture1DEXT (GLenum, GLenum, GLenum, GLuint, GLint);
GLAPI void APIENTRY glFramebufferTexture2DEXT (GLenum, GLenum, GLenum, GLuint, GLint);
GLAPI void APIENTRY glFramebufferTexture3DEXT (GLenum, GLenum, GLenum, GLuint, GLint, GLint);
GLAPI void APIENTRY glFramebufferRenderbufferEXT (GLenum, GLenum, GLenum, GLuint);
GLAPI void APIENTRY glGetFramebufferAttachmentParameterivEXT (GLenum, GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGenerateMipmapEXT (GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef GLboolean (APIENTRYP PFNGLISRENDERBUFFEREXTPROC) (GLuint renderbuffer);
typedef void (APIENTRYP PFNGLBINDRENDERBUFFEREXTPROC) (GLenum target, GLuint renderbuffer);
typedef void (APIENTRYP PFNGLDELETERENDERBUFFERSEXTPROC) (GLsizei n, const GLuint *renderbuffers);
typedef void (APIENTRYP PFNGLGENRENDERBUFFERSEXTPROC) (GLsizei n, GLuint *renderbuffers);
typedef void (APIENTRYP PFNGLRENDERBUFFERSTORAGEEXTPROC) (GLenum target, GLenum internalformat, GLsizei width, GLsizei height);
typedef void (APIENTRYP PFNGLGETRENDERBUFFERPARAMETERIVEXTPROC) (GLenum target, GLenum pname, GLint *params);
typedef GLboolean (APIENTRYP PFNGLISFRAMEBUFFEREXTPROC) (GLuint framebuffer);
typedef void (APIENTRYP PFNGLBINDFRAMEBUFFEREXTPROC) (GLenum target, GLuint framebuffer);
typedef void (APIENTRYP PFNGLDELETEFRAMEBUFFERSEXTPROC) (GLsizei n, const GLuint *framebuffers);
typedef void (APIENTRYP PFNGLGENFRAMEBUFFERSEXTPROC) (GLsizei n, GLuint *framebuffers);
typedef GLenum (APIENTRYP PFNGLCHECKFRAMEBUFFERSTATUSEXTPROC) (GLenum target);
typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTURE1DEXTPROC) (GLenum target, GLenum attachment, GLenum textarget, GLuint texture, GLint level);
typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTURE2DEXTPROC) (GLenum target, GLenum attachment, GLenum textarget, GLuint texture, GLint level);
typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTURE3DEXTPROC) (GLenum target, GLenum attachment, GLenum textarget, GLuint texture, GLint level, GLint zoffset);
typedef void (APIENTRYP PFNGLFRAMEBUFFERRENDERBUFFEREXTPROC) (GLenum target, GLenum attachment, GLenum renderbuffertarget, GLuint renderbuffer);
typedef void (APIENTRYP PFNGLGETFRAMEBUFFERATTACHMENTPARAMETERIVEXTPROC) (GLenum target, GLenum attachment, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGENERATEMIPMAPEXTPROC) (GLenum target);
#endif
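/*
 * Usage sketch (illustrative only, not part of the canonical header):
 * the typical EXT_framebuffer_object render-to-texture setup; "tex" is
 * assumed to be a complete 2D texture, and handle_incomplete_framebuffer()
 * is a hypothetical error handler:
 *
 *   GLuint fbo;
 *   glGenFramebuffersEXT(1, &fbo);
 *   glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
 *   glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
 *                             GL_TEXTURE_2D, tex, 0);
 *   if (glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) !=
 *       GL_FRAMEBUFFER_COMPLETE_EXT)
 *       handle_incomplete_framebuffer();
 */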
#ifndef GL_GREMEDY_string_marker
#define GL_GREMEDY_string_marker 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glStringMarkerGREMEDY (GLsizei, const GLvoid *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLSTRINGMARKERGREMEDYPROC) (GLsizei len, const GLvoid *string);
#endif

#ifndef GL_EXT_packed_depth_stencil
#define GL_EXT_packed_depth_stencil 1
#endif

#ifndef GL_EXT_stencil_clear_tag
#define GL_EXT_stencil_clear_tag 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glStencilClearTagEXT (GLsizei, GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLSTENCILCLEARTAGEXTPROC) (GLsizei stencilTagBits, GLuint stencilClearTag);
#endif

#ifndef GL_EXT_texture_sRGB
#define GL_EXT_texture_sRGB 1
#endif

#ifndef GL_EXT_framebuffer_blit
#define GL_EXT_framebuffer_blit 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glBlitFramebufferEXT (GLint, GLint, GLint, GLint, GLint, GLint, GLint, GLint, GLbitfield, GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLBLITFRAMEBUFFEREXTPROC) (GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter);
#endif

#ifndef GL_EXT_framebuffer_multisample
#define GL_EXT_framebuffer_multisample 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glRenderbufferStorageMultisampleEXT (GLenum, GLsizei, GLenum, GLsizei, GLsizei);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLRENDERBUFFERSTORAGEMULTISAMPLEEXTPROC) (GLenum target, GLsizei samples, GLenum internalformat, GLsizei width, GLsizei height);
#endif

#ifndef GL_MESAX_texture_stack
#define GL_MESAX_texture_stack 1
#endif

#ifndef GL_EXT_timer_query
#define GL_EXT_timer_query 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glGetQueryObjecti64vEXT (GLuint, GLenum, GLint64EXT *);
GLAPI void APIENTRY glGetQueryObjectui64vEXT (GLuint, GLenum, GLuint64EXT *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLGETQUERYOBJECTI64VEXTPROC) (GLuint id, GLenum pname, GLint64EXT *params);
typedef void (APIENTRYP PFNGLGETQUERYOBJECTUI64VEXTPROC) (GLuint id, GLenum pname, GLuint64EXT *params);
#endif
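/*
 * Usage sketch (illustrative only, not part of the canonical header):
 * EXT_timer_query reuses the core/ARB query object API (glBeginQuery and
 * glEndQuery, declared elsewhere in this header) with a
 * GL_TIME_ELAPSED_EXT target; the 64-bit getters above keep nanosecond
 * results from overflowing. draw_scene() is a hypothetical draw call:
 *
 *   GLuint64EXT ns;
 *   glBeginQuery(GL_TIME_ELAPSED_EXT, timerQuery);
 *   draw_scene();
 *   glEndQuery(GL_TIME_ELAPSED_EXT);
 *   glGetQueryObjectui64vEXT(timerQuery, GL_QUERY_RESULT, &ns);
 */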
GL_EXT_gpu_program_parameters #define GL_EXT_gpu_program_parameters 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glProgramEnvParameters4fvEXT (GLenum, GLuint, GLsizei, const GLfloat *); GLAPI void APIENTRY glProgramLocalParameters4fvEXT (GLenum, GLuint, GLsizei, const GLfloat *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETERS4FVEXTPROC) (GLenum target, GLuint index, GLsizei count, const GLfloat *params); typedef void (APIENTRYP PFNGLPROGRAMLOCALPARAMETERS4FVEXTPROC) (GLenum target, GLuint index, GLsizei count, const GLfloat *params); #endif #ifndef GL_APPLE_flush_buffer_range #define GL_APPLE_flush_buffer_range 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glBufferParameteriAPPLE (GLenum, GLenum, GLint); GLAPI void APIENTRY glFlushMappedBufferRangeAPPLE (GLenum, GLintptr, GLsizeiptr); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLBUFFERPARAMETERIAPPLEPROC) (GLenum target, GLenum pname, GLint param); typedef void (APIENTRYP PFNGLFLUSHMAPPEDBUFFERRANGEAPPLEPROC) (GLenum target, GLintptr offset, GLsizeiptr size); #endif #ifndef GL_NV_gpu_program4 #define GL_NV_gpu_program4 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glProgramLocalParameterI4iNV (GLenum, GLuint, GLint, GLint, GLint, GLint); GLAPI void APIENTRY glProgramLocalParameterI4ivNV (GLenum, GLuint, const GLint *); GLAPI void APIENTRY glProgramLocalParametersI4ivNV (GLenum, GLuint, GLsizei, const GLint *); GLAPI void APIENTRY glProgramLocalParameterI4uiNV (GLenum, GLuint, GLuint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glProgramLocalParameterI4uivNV (GLenum, GLuint, const GLuint *); GLAPI void APIENTRY glProgramLocalParametersI4uivNV (GLenum, GLuint, GLsizei, const GLuint *); GLAPI void APIENTRY glProgramEnvParameterI4iNV (GLenum, GLuint, GLint, GLint, GLint, GLint); GLAPI void APIENTRY glProgramEnvParameterI4ivNV (GLenum, GLuint, const GLint *); GLAPI void APIENTRY glProgramEnvParametersI4ivNV (GLenum, GLuint, GLsizei, const GLint *); GLAPI void APIENTRY glProgramEnvParameterI4uiNV (GLenum, GLuint, GLuint, GLuint, GLuint, GLuint); GLAPI void APIENTRY glProgramEnvParameterI4uivNV (GLenum, GLuint, const GLuint *); GLAPI void APIENTRY glProgramEnvParametersI4uivNV (GLenum, GLuint, GLsizei, const GLuint *); GLAPI void APIENTRY glGetProgramLocalParameterIivNV (GLenum, GLuint, GLint *); GLAPI void APIENTRY glGetProgramLocalParameterIuivNV (GLenum, GLuint, GLuint *); GLAPI void APIENTRY glGetProgramEnvParameterIivNV (GLenum, GLuint, GLint *); GLAPI void APIENTRY glGetProgramEnvParameterIuivNV (GLenum, GLuint, GLuint *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLPROGRAMLOCALPARAMETERI4INVPROC) (GLenum target, GLuint index, GLint x, GLint y, GLint z, GLint w); typedef void (APIENTRYP PFNGLPROGRAMLOCALPARAMETERI4IVNVPROC) (GLenum target, GLuint index, const GLint *params); typedef void (APIENTRYP PFNGLPROGRAMLOCALPARAMETERSI4IVNVPROC) (GLenum target, GLuint index, GLsizei count, const GLint *params); typedef void (APIENTRYP PFNGLPROGRAMLOCALPARAMETERI4UINVPROC) (GLenum target, GLuint index, GLuint x, GLuint y, GLuint z, GLuint w); typedef void (APIENTRYP PFNGLPROGRAMLOCALPARAMETERI4UIVNVPROC) (GLenum target, GLuint index, const GLuint *params); typedef void (APIENTRYP PFNGLPROGRAMLOCALPARAMETERSI4UIVNVPROC) (GLenum target, GLuint index, GLsizei count, const GLuint *params); typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETERI4INVPROC) (GLenum target, GLuint index, GLint x, GLint y, GLint z, GLint w); typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETERI4IVNVPROC) 
typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETERI4IVNVPROC) (GLenum target, GLuint index, const GLint *params);
typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETERSI4IVNVPROC) (GLenum target, GLuint index, GLsizei count, const GLint *params);
typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETERI4UINVPROC) (GLenum target, GLuint index, GLuint x, GLuint y, GLuint z, GLuint w);
typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETERI4UIVNVPROC) (GLenum target, GLuint index, const GLuint *params);
typedef void (APIENTRYP PFNGLPROGRAMENVPARAMETERSI4UIVNVPROC) (GLenum target, GLuint index, GLsizei count, const GLuint *params);
typedef void (APIENTRYP PFNGLGETPROGRAMLOCALPARAMETERIIVNVPROC) (GLenum target, GLuint index, GLint *params);
typedef void (APIENTRYP PFNGLGETPROGRAMLOCALPARAMETERIUIVNVPROC) (GLenum target, GLuint index, GLuint *params);
typedef void (APIENTRYP PFNGLGETPROGRAMENVPARAMETERIIVNVPROC) (GLenum target, GLuint index, GLint *params);
typedef void (APIENTRYP PFNGLGETPROGRAMENVPARAMETERIUIVNVPROC) (GLenum target, GLuint index, GLuint *params);
#endif

#ifndef GL_NV_geometry_program4
#define GL_NV_geometry_program4 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glProgramVertexLimitNV (GLenum, GLint);
GLAPI void APIENTRY glFramebufferTextureEXT (GLenum, GLenum, GLuint, GLint);
GLAPI void APIENTRY glFramebufferTextureLayerEXT (GLenum, GLenum, GLuint, GLint, GLint);
GLAPI void APIENTRY glFramebufferTextureFaceEXT (GLenum, GLenum, GLuint, GLint, GLenum);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPROGRAMVERTEXLIMITNVPROC) (GLenum target, GLint limit);
typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTUREEXTPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level);
typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTURELAYEREXTPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level, GLint layer);
typedef void (APIENTRYP PFNGLFRAMEBUFFERTEXTUREFACEEXTPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level, GLenum face);
#endif

#ifndef GL_EXT_geometry_shader4
#define GL_EXT_geometry_shader4 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glProgramParameteriEXT (GLuint, GLenum, GLint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPROGRAMPARAMETERIEXTPROC) (GLuint program, GLenum pname, GLint value);
#endif

#ifndef GL_NV_vertex_program4
#define GL_NV_vertex_program4 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glVertexAttribI1iEXT (GLuint, GLint);
GLAPI void APIENTRY glVertexAttribI2iEXT (GLuint, GLint, GLint);
GLAPI void APIENTRY glVertexAttribI3iEXT (GLuint, GLint, GLint, GLint);
GLAPI void APIENTRY glVertexAttribI4iEXT (GLuint, GLint, GLint, GLint, GLint);
GLAPI void APIENTRY glVertexAttribI1uiEXT (GLuint, GLuint);
GLAPI void APIENTRY glVertexAttribI2uiEXT (GLuint, GLuint, GLuint);
GLAPI void APIENTRY glVertexAttribI3uiEXT (GLuint, GLuint, GLuint, GLuint);
GLAPI void APIENTRY glVertexAttribI4uiEXT (GLuint, GLuint, GLuint, GLuint, GLuint);
GLAPI void APIENTRY glVertexAttribI1ivEXT (GLuint, const GLint *);
GLAPI void APIENTRY glVertexAttribI2ivEXT (GLuint, const GLint *);
GLAPI void APIENTRY glVertexAttribI3ivEXT (GLuint, const GLint *);
GLAPI void APIENTRY glVertexAttribI4ivEXT (GLuint, const GLint *);
GLAPI void APIENTRY glVertexAttribI1uivEXT (GLuint, const GLuint *);
GLAPI void APIENTRY glVertexAttribI2uivEXT (GLuint, const GLuint *);
GLAPI void APIENTRY glVertexAttribI3uivEXT (GLuint, const GLuint *);
GLAPI void APIENTRY glVertexAttribI4uivEXT (GLuint, const GLuint *);
GLAPI void APIENTRY glVertexAttribI4bvEXT (GLuint, const GLbyte *);
GLAPI void APIENTRY glVertexAttribI4svEXT (GLuint, const GLshort *);
GLAPI void APIENTRY glVertexAttribI4ubvEXT (GLuint, const GLubyte *);
GLAPI void APIENTRY glVertexAttribI4usvEXT (GLuint, const GLushort *);
GLAPI void APIENTRY glVertexAttribIPointerEXT (GLuint, GLint, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glGetVertexAttribIivEXT (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glGetVertexAttribIuivEXT (GLuint, GLenum, GLuint *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLVERTEXATTRIBI1IEXTPROC) (GLuint index, GLint x);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI2IEXTPROC) (GLuint index, GLint x, GLint y);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI3IEXTPROC) (GLuint index, GLint x, GLint y, GLint z);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI4IEXTPROC) (GLuint index, GLint x, GLint y, GLint z, GLint w);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI1UIEXTPROC) (GLuint index, GLuint x);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI2UIEXTPROC) (GLuint index, GLuint x, GLuint y);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI3UIEXTPROC) (GLuint index, GLuint x, GLuint y, GLuint z);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI4UIEXTPROC) (GLuint index, GLuint x, GLuint y, GLuint z, GLuint w);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI1IVEXTPROC) (GLuint index, const GLint *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI2IVEXTPROC) (GLuint index, const GLint *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI3IVEXTPROC) (GLuint index, const GLint *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI4IVEXTPROC) (GLuint index, const GLint *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI1UIVEXTPROC) (GLuint index, const GLuint *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI2UIVEXTPROC) (GLuint index, const GLuint *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI3UIVEXTPROC) (GLuint index, const GLuint *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI4UIVEXTPROC) (GLuint index, const GLuint *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI4BVEXTPROC) (GLuint index, const GLbyte *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI4SVEXTPROC) (GLuint index, const GLshort *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI4UBVEXTPROC) (GLuint index, const GLubyte *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBI4USVEXTPROC) (GLuint index, const GLushort *v);
typedef void (APIENTRYP PFNGLVERTEXATTRIBIPOINTEREXTPROC) (GLuint index, GLint size, GLenum type, GLsizei stride, const GLvoid *pointer);
typedef void (APIENTRYP PFNGLGETVERTEXATTRIBIIVEXTPROC) (GLuint index, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETVERTEXATTRIBIUIVEXTPROC) (GLuint index, GLenum pname, GLuint *params);
#endif

#ifndef GL_EXT_gpu_shader4
#define GL_EXT_gpu_shader4 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glGetUniformuivEXT (GLuint, GLint, GLuint *);
GLAPI void APIENTRY glBindFragDataLocationEXT (GLuint, GLuint, const GLchar *);
GLAPI GLint APIENTRY glGetFragDataLocationEXT (GLuint, const GLchar *);
GLAPI void APIENTRY glUniform1uiEXT (GLint, GLuint);
GLAPI void APIENTRY glUniform2uiEXT (GLint, GLuint, GLuint);
GLAPI void APIENTRY glUniform3uiEXT (GLint, GLuint, GLuint, GLuint);
GLAPI void APIENTRY glUniform4uiEXT (GLint, GLuint, GLuint, GLuint, GLuint);
GLAPI void APIENTRY glUniform1uivEXT (GLint, GLsizei, const GLuint *);
GLAPI void APIENTRY glUniform2uivEXT (GLint, GLsizei, const GLuint *);
GLAPI void APIENTRY glUniform3uivEXT (GLint, GLsizei, const GLuint *);
GLAPI void APIENTRY glUniform4uivEXT (GLint, GLsizei, const GLuint *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLGETUNIFORMUIVEXTPROC) (GLuint program, GLint location, GLuint *params);
typedef void (APIENTRYP PFNGLBINDFRAGDATALOCATIONEXTPROC) (GLuint program, GLuint color, const GLchar *name);
typedef GLint (APIENTRYP PFNGLGETFRAGDATALOCATIONEXTPROC) (GLuint program, const GLchar *name);
typedef void (APIENTRYP PFNGLUNIFORM1UIEXTPROC) (GLint location, GLuint v0);
typedef void (APIENTRYP PFNGLUNIFORM2UIEXTPROC) (GLint location, GLuint v0, GLuint v1);
typedef void (APIENTRYP PFNGLUNIFORM3UIEXTPROC) (GLint location, GLuint v0, GLuint v1, GLuint v2);
typedef void (APIENTRYP PFNGLUNIFORM4UIEXTPROC) (GLint location, GLuint v0, GLuint v1, GLuint v2, GLuint v3);
typedef void (APIENTRYP PFNGLUNIFORM1UIVEXTPROC) (GLint location, GLsizei count, const GLuint *value);
typedef void (APIENTRYP PFNGLUNIFORM2UIVEXTPROC) (GLint location, GLsizei count, const GLuint *value);
typedef void (APIENTRYP PFNGLUNIFORM3UIVEXTPROC) (GLint location, GLsizei count, const GLuint *value);
typedef void (APIENTRYP PFNGLUNIFORM4UIVEXTPROC) (GLint location, GLsizei count, const GLuint *value);
#endif
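
/*
 * Illustrative sketch for GL_EXT_gpu_shader4 (editor's addition, not part of
 * the Khronos header): fragment outputs are mapped to color numbers before
 * the program is linked, and unsigned-integer uniforms are set with the
 * glUniform*ui* entry points. `prog` and the pgl* pointers (resolved as in
 * the earlier loader sketch) are assumptions.
 */
#if 0
pglBindFragDataLocationEXT(prog, 0, "fragColor"); /* must precede glLinkProgram */
glLinkProgram(prog);
glUseProgram(prog);
/* Set an unsigned integer uniform declared as `uniform unsigned int mask;`. */
pglUniform1uiEXT(glGetUniformLocation(prog, "mask"), 0xFFu);
#endif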
#ifndef GL_EXT_draw_instanced
#define GL_EXT_draw_instanced 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glDrawArraysInstancedEXT (GLenum, GLint, GLsizei, GLsizei);
GLAPI void APIENTRY glDrawElementsInstancedEXT (GLenum, GLsizei, GLenum, const GLvoid *, GLsizei);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLDRAWARRAYSINSTANCEDEXTPROC) (GLenum mode, GLint start, GLsizei count, GLsizei primcount);
typedef void (APIENTRYP PFNGLDRAWELEMENTSINSTANCEDEXTPROC) (GLenum mode, GLsizei count, GLenum type, const GLvoid *indices, GLsizei primcount);
#endif

#ifndef GL_EXT_packed_float
#define GL_EXT_packed_float 1
#endif

#ifndef GL_EXT_texture_array
#define GL_EXT_texture_array 1
#endif

#ifndef GL_EXT_texture_buffer_object
#define GL_EXT_texture_buffer_object 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glTexBufferEXT (GLenum, GLenum, GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLTEXBUFFEREXTPROC) (GLenum target, GLenum internalformat, GLuint buffer);
#endif

#ifndef GL_EXT_texture_compression_latc
#define GL_EXT_texture_compression_latc 1
#endif

#ifndef GL_EXT_texture_compression_rgtc
#define GL_EXT_texture_compression_rgtc 1
#endif

#ifndef GL_EXT_texture_shared_exponent
#define GL_EXT_texture_shared_exponent 1
#endif

#ifndef GL_NV_depth_buffer_float
#define GL_NV_depth_buffer_float 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glDepthRangedNV (GLdouble, GLdouble);
GLAPI void APIENTRY glClearDepthdNV (GLdouble);
GLAPI void APIENTRY glDepthBoundsdNV (GLdouble, GLdouble);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLDEPTHRANGEDNVPROC) (GLdouble zNear, GLdouble zFar);
typedef void (APIENTRYP PFNGLCLEARDEPTHDNVPROC) (GLdouble depth);
typedef void (APIENTRYP PFNGLDEPTHBOUNDSDNVPROC) (GLdouble zmin, GLdouble zmax);
#endif

#ifndef GL_NV_fragment_program4
#define GL_NV_fragment_program4 1
#endif

#ifndef GL_NV_framebuffer_multisample_coverage
#define GL_NV_framebuffer_multisample_coverage 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glRenderbufferStorageMultisampleCoverageNV (GLenum, GLsizei, GLsizei, GLenum, GLsizei, GLsizei);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLRENDERBUFFERSTORAGEMULTISAMPLECOVERAGENVPROC) (GLenum target, GLsizei coverageSamples, GLsizei colorSamples, GLenum internalformat, GLsizei width, GLsizei height);
#endif

#ifndef GL_EXT_framebuffer_sRGB
#define GL_EXT_framebuffer_sRGB 1
#endif

#ifndef GL_NV_geometry_shader4
#define GL_NV_geometry_shader4 1
#endif
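
/*
 * Illustrative sketch for GL_EXT_draw_instanced (editor's addition): one
 * draw call submits `instanceCount` copies of the same geometry, and the
 * shader distinguishes copies via gl_InstanceID. `pglDrawArraysInstancedEXT`,
 * `vertexCount`, and `instanceCount` are assumed to be set up by the caller.
 */
#if 0
pglDrawArraysInstancedEXT(GL_TRIANGLES, 0, vertexCount, instanceCount);
#endif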
#ifndef GL_NV_parameter_buffer_object
#define GL_NV_parameter_buffer_object 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glProgramBufferParametersfvNV (GLenum, GLuint, GLuint, GLsizei, const GLfloat *);
GLAPI void APIENTRY glProgramBufferParametersIivNV (GLenum, GLuint, GLuint, GLsizei, const GLint *);
GLAPI void APIENTRY glProgramBufferParametersIuivNV (GLenum, GLuint, GLuint, GLsizei, const GLuint *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPROGRAMBUFFERPARAMETERSFVNVPROC) (GLenum target, GLuint buffer, GLuint index, GLsizei count, const GLfloat *params);
typedef void (APIENTRYP PFNGLPROGRAMBUFFERPARAMETERSIIVNVPROC) (GLenum target, GLuint buffer, GLuint index, GLsizei count, const GLint *params);
typedef void (APIENTRYP PFNGLPROGRAMBUFFERPARAMETERSIUIVNVPROC) (GLenum target, GLuint buffer, GLuint index, GLsizei count, const GLuint *params);
#endif

#ifndef GL_EXT_draw_buffers2
#define GL_EXT_draw_buffers2 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glColorMaskIndexedEXT (GLuint, GLboolean, GLboolean, GLboolean, GLboolean);
GLAPI void APIENTRY glGetBooleanIndexedvEXT (GLenum, GLuint, GLboolean *);
GLAPI void APIENTRY glGetIntegerIndexedvEXT (GLenum, GLuint, GLint *);
GLAPI void APIENTRY glEnableIndexedEXT (GLenum, GLuint);
GLAPI void APIENTRY glDisableIndexedEXT (GLenum, GLuint);
GLAPI GLboolean APIENTRY glIsEnabledIndexedEXT (GLenum, GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLCOLORMASKINDEXEDEXTPROC) (GLuint index, GLboolean r, GLboolean g, GLboolean b, GLboolean a);
typedef void (APIENTRYP PFNGLGETBOOLEANINDEXEDVEXTPROC) (GLenum target, GLuint index, GLboolean *data);
typedef void (APIENTRYP PFNGLGETINTEGERINDEXEDVEXTPROC) (GLenum target, GLuint index, GLint *data);
typedef void (APIENTRYP PFNGLENABLEINDEXEDEXTPROC) (GLenum target, GLuint index);
typedef void (APIENTRYP PFNGLDISABLEINDEXEDEXTPROC) (GLenum target, GLuint index);
typedef GLboolean (APIENTRYP PFNGLISENABLEDINDEXEDEXTPROC) (GLenum target, GLuint index);
#endif
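
/*
 * Illustrative sketch for GL_EXT_draw_buffers2 (editor's addition): the
 * indexed entry points give each draw buffer its own blend enable and color
 * mask. The pgl* pointers are assumed to have been resolved earlier.
 */
#if 0
pglEnableIndexedEXT(GL_BLEND, 0);  /* blend only on draw buffer 0 */
pglDisableIndexedEXT(GL_BLEND, 1);
pglColorMaskIndexedEXT(1, GL_TRUE, GL_TRUE, GL_TRUE, GL_FALSE); /* no alpha writes on buffer 1 */
#endif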
#ifndef GL_NV_transform_feedback
#define GL_NV_transform_feedback 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glBeginTransformFeedbackNV (GLenum);
GLAPI void APIENTRY glEndTransformFeedbackNV (void);
GLAPI void APIENTRY glTransformFeedbackAttribsNV (GLuint, const GLint *, GLenum);
GLAPI void APIENTRY glBindBufferRangeNV (GLenum, GLuint, GLuint, GLintptr, GLsizeiptr);
GLAPI void APIENTRY glBindBufferOffsetNV (GLenum, GLuint, GLuint, GLintptr);
GLAPI void APIENTRY glBindBufferBaseNV (GLenum, GLuint, GLuint);
GLAPI void APIENTRY glTransformFeedbackVaryingsNV (GLuint, GLsizei, const GLchar* *, GLenum);
GLAPI void APIENTRY glActiveVaryingNV (GLuint, const GLchar *);
GLAPI GLint APIENTRY glGetVaryingLocationNV (GLuint, const GLchar *);
GLAPI void APIENTRY glGetActiveVaryingNV (GLuint, GLuint, GLsizei, GLsizei *, GLsizei *, GLenum *, GLchar *);
GLAPI void APIENTRY glGetTransformFeedbackVaryingNV (GLuint, GLuint, GLint *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLBEGINTRANSFORMFEEDBACKNVPROC) (GLenum primitiveMode);
typedef void (APIENTRYP PFNGLENDTRANSFORMFEEDBACKNVPROC) (void);
typedef void (APIENTRYP PFNGLTRANSFORMFEEDBACKATTRIBSNVPROC) (GLuint count, const GLint *attribs, GLenum bufferMode);
typedef void (APIENTRYP PFNGLBINDBUFFERRANGENVPROC) (GLenum target, GLuint index, GLuint buffer, GLintptr offset, GLsizeiptr size);
typedef void (APIENTRYP PFNGLBINDBUFFEROFFSETNVPROC) (GLenum target, GLuint index, GLuint buffer, GLintptr offset);
typedef void (APIENTRYP PFNGLBINDBUFFERBASENVPROC) (GLenum target, GLuint index, GLuint buffer);
typedef void (APIENTRYP PFNGLTRANSFORMFEEDBACKVARYINGSNVPROC) (GLuint program, GLsizei count, const GLchar* *varyings, GLenum bufferMode);
typedef void (APIENTRYP PFNGLACTIVEVARYINGNVPROC) (GLuint program, const GLchar *name);
typedef GLint (APIENTRYP PFNGLGETVARYINGLOCATIONNVPROC) (GLuint program, const GLchar *name);
typedef void (APIENTRYP PFNGLGETACTIVEVARYINGNVPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLsizei *size, GLenum *type, GLchar *name);
typedef void (APIENTRYP PFNGLGETTRANSFORMFEEDBACKVARYINGNVPROC) (GLuint program, GLuint index, GLint *location);
#endif

#ifndef GL_EXT_bindable_uniform
#define GL_EXT_bindable_uniform 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glUniformBufferEXT (GLuint, GLint, GLuint);
GLAPI GLint APIENTRY glGetUniformBufferSizeEXT (GLuint, GLint);
GLAPI GLintptr APIENTRY glGetUniformOffsetEXT (GLuint, GLint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLUNIFORMBUFFEREXTPROC) (GLuint program, GLint location, GLuint buffer);
typedef GLint (APIENTRYP PFNGLGETUNIFORMBUFFERSIZEEXTPROC) (GLuint program, GLint location);
typedef GLintptr (APIENTRYP PFNGLGETUNIFORMOFFSETEXTPROC) (GLuint program, GLint location);
#endif

#ifndef GL_EXT_texture_integer
#define GL_EXT_texture_integer 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glTexParameterIivEXT (GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glTexParameterIuivEXT (GLenum, GLenum, const GLuint *);
GLAPI void APIENTRY glGetTexParameterIivEXT (GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetTexParameterIuivEXT (GLenum, GLenum, GLuint *);
GLAPI void APIENTRY glClearColorIiEXT (GLint, GLint, GLint, GLint);
GLAPI void APIENTRY glClearColorIuiEXT (GLuint, GLuint, GLuint, GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLTEXPARAMETERIIVEXTPROC) (GLenum target, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLTEXPARAMETERIUIVEXTPROC) (GLenum target, GLenum pname, const GLuint *params);
typedef void (APIENTRYP PFNGLGETTEXPARAMETERIIVEXTPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETTEXPARAMETERIUIVEXTPROC) (GLenum target, GLenum pname, GLuint *params);
typedef void (APIENTRYP PFNGLCLEARCOLORIIEXTPROC) (GLint red, GLint green, GLint blue, GLint alpha);
typedef void (APIENTRYP PFNGLCLEARCOLORIUIEXTPROC) (GLuint red, GLuint green, GLuint blue, GLuint alpha);
#endif

#ifndef GL_GREMEDY_frame_terminator
#define GL_GREMEDY_frame_terminator 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glFrameTerminatorGREMEDY (void);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLFRAMETERMINATORGREMEDYPROC) (void);
#endif

#ifndef GL_NV_conditional_render
#define GL_NV_conditional_render 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glBeginConditionalRenderNV (GLuint, GLenum);
GLAPI void APIENTRY glEndConditionalRenderNV (void);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLBEGINCONDITIONALRENDERNVPROC) (GLuint id, GLenum mode);
typedef void (APIENTRYP PFNGLENDCONDITIONALRENDERNVPROC) (void);
#endif
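
/*
 * Illustrative sketch for GL_NV_conditional_render (editor's addition):
 * rendering issued between Begin/End is discarded when the occlusion query
 * `query` returned zero passed samples. GL_QUERY_WAIT_NV comes from this
 * extension's enum section elsewhere in the header; `query` and
 * `drawOccludableGeometry` are hypothetical application-side names.
 */
#if 0
pglBeginConditionalRenderNV(query, GL_QUERY_WAIT_NV);
drawOccludableGeometry(); /* skipped by the GL if the query result was zero */
pglEndConditionalRenderNV();
#endif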
#ifndef GL_NV_present_video
#define GL_NV_present_video 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glPresentFrameKeyedNV (GLuint, GLuint64EXT, GLuint, GLuint, GLenum, GLenum, GLuint, GLuint, GLenum, GLuint, GLuint);
GLAPI void APIENTRY glPresentFrameDualFillNV (GLuint, GLuint64EXT, GLuint, GLuint, GLenum, GLenum, GLuint, GLenum, GLuint, GLenum, GLuint, GLenum, GLuint);
GLAPI void APIENTRY glGetVideoivNV (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glGetVideouivNV (GLuint, GLenum, GLuint *);
GLAPI void APIENTRY glGetVideoi64vNV (GLuint, GLenum, GLint64EXT *);
GLAPI void APIENTRY glGetVideoui64vNV (GLuint, GLenum, GLuint64EXT *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLPRESENTFRAMEKEYEDNVPROC) (GLuint video_slot, GLuint64EXT minPresentTime, GLuint beginPresentTimeId, GLuint presentDurationId, GLenum type, GLenum target0, GLuint fill0, GLuint key0, GLenum target1, GLuint fill1, GLuint key1);
typedef void (APIENTRYP PFNGLPRESENTFRAMEDUALFILLNVPROC) (GLuint video_slot, GLuint64EXT minPresentTime, GLuint beginPresentTimeId, GLuint presentDurationId, GLenum type, GLenum target0, GLuint fill0, GLenum target1, GLuint fill1, GLenum target2, GLuint fill2, GLenum target3, GLuint fill3);
typedef void (APIENTRYP PFNGLGETVIDEOIVNVPROC) (GLuint video_slot, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETVIDEOUIVNVPROC) (GLuint video_slot, GLenum pname, GLuint *params);
typedef void (APIENTRYP PFNGLGETVIDEOI64VNVPROC) (GLuint video_slot, GLenum pname, GLint64EXT *params);
typedef void (APIENTRYP PFNGLGETVIDEOUI64VNVPROC) (GLuint video_slot, GLenum pname, GLuint64EXT *params);
#endif

#ifndef GL_EXT_transform_feedback
#define GL_EXT_transform_feedback 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glBeginTransformFeedbackEXT (GLenum);
GLAPI void APIENTRY glEndTransformFeedbackEXT (void);
GLAPI void APIENTRY glBindBufferRangeEXT (GLenum, GLuint, GLuint, GLintptr, GLsizeiptr);
GLAPI void APIENTRY glBindBufferOffsetEXT (GLenum, GLuint, GLuint, GLintptr);
GLAPI void APIENTRY glBindBufferBaseEXT (GLenum, GLuint, GLuint);
GLAPI void APIENTRY glTransformFeedbackVaryingsEXT (GLuint, GLsizei, const GLchar* *, GLenum);
GLAPI void APIENTRY glGetTransformFeedbackVaryingEXT (GLuint, GLuint, GLsizei, GLsizei *, GLsizei *, GLenum *, GLchar *);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLBEGINTRANSFORMFEEDBACKEXTPROC) (GLenum primitiveMode);
typedef void (APIENTRYP PFNGLENDTRANSFORMFEEDBACKEXTPROC) (void);
typedef void (APIENTRYP PFNGLBINDBUFFERRANGEEXTPROC) (GLenum target, GLuint index, GLuint buffer, GLintptr offset, GLsizeiptr size);
typedef void (APIENTRYP PFNGLBINDBUFFEROFFSETEXTPROC) (GLenum target, GLuint index, GLuint buffer, GLintptr offset);
typedef void (APIENTRYP PFNGLBINDBUFFERBASEEXTPROC) (GLenum target, GLuint index, GLuint buffer);
typedef void (APIENTRYP PFNGLTRANSFORMFEEDBACKVARYINGSEXTPROC) (GLuint program, GLsizei count, const GLchar* *varyings, GLenum bufferMode);
typedef void (APIENTRYP PFNGLGETTRANSFORMFEEDBACKVARYINGEXTPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLsizei *size, GLenum *type, GLchar *name);
#endif
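
/*
 * Illustrative sketch for GL_EXT_transform_feedback (editor's addition):
 * varyings to capture are declared before linking, a buffer is bound to the
 * transform-feedback binding point, and primitives emitted between Begin/End
 * are recorded into it. The GL_TRANSFORM_FEEDBACK_BUFFER_EXT and
 * GL_INTERLEAVED_ATTRIBS_EXT enums come from this extension's enum section
 * elsewhere in the header; `prog`, `tfBuffer`, `pointCount`, and the pgl*
 * pointers are assumptions.
 */
#if 0
const GLchar *varyings[] = { "outPosition" };
pglTransformFeedbackVaryingsEXT(prog, 1, varyings, GL_INTERLEAVED_ATTRIBS_EXT);
glLinkProgram(prog); /* capture state takes effect at link time */
pglBindBufferBaseEXT(GL_TRANSFORM_FEEDBACK_BUFFER_EXT, 0, tfBuffer);
pglBeginTransformFeedbackEXT(GL_POINTS);
glDrawArrays(GL_POINTS, 0, pointCount);
pglEndTransformFeedbackEXT();
#endif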
#ifndef GL_EXT_direct_state_access
#define GL_EXT_direct_state_access 1
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glClientAttribDefaultEXT (GLbitfield);
GLAPI void APIENTRY glPushClientAttribDefaultEXT (GLbitfield);
GLAPI void APIENTRY glMatrixLoadfEXT (GLenum, const GLfloat *);
GLAPI void APIENTRY glMatrixLoaddEXT (GLenum, const GLdouble *);
GLAPI void APIENTRY glMatrixMultfEXT (GLenum, const GLfloat *);
GLAPI void APIENTRY glMatrixMultdEXT (GLenum, const GLdouble *);
GLAPI void APIENTRY glMatrixLoadIdentityEXT (GLenum);
GLAPI void APIENTRY glMatrixRotatefEXT (GLenum, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glMatrixRotatedEXT (GLenum, GLdouble, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glMatrixScalefEXT (GLenum, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glMatrixScaledEXT (GLenum, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glMatrixTranslatefEXT (GLenum, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glMatrixTranslatedEXT (GLenum, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glMatrixFrustumEXT (GLenum, GLdouble, GLdouble, GLdouble, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glMatrixOrthoEXT (GLenum, GLdouble, GLdouble, GLdouble, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glMatrixPopEXT (GLenum);
GLAPI void APIENTRY glMatrixPushEXT (GLenum);
GLAPI void APIENTRY glMatrixLoadTransposefEXT (GLenum, const GLfloat *);
GLAPI void APIENTRY glMatrixLoadTransposedEXT (GLenum, const GLdouble *);
GLAPI void APIENTRY glMatrixMultTransposefEXT (GLenum, const GLfloat *);
GLAPI void APIENTRY glMatrixMultTransposedEXT (GLenum, const GLdouble *);
GLAPI void APIENTRY glTextureParameterfEXT (GLuint, GLenum, GLenum, GLfloat);
GLAPI void APIENTRY glTextureParameterfvEXT (GLuint, GLenum, GLenum, const GLfloat *);
GLAPI void APIENTRY glTextureParameteriEXT (GLuint, GLenum, GLenum, GLint);
GLAPI void APIENTRY glTextureParameterivEXT (GLuint, GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glTextureImage1DEXT (GLuint, GLenum, GLint, GLenum, GLsizei, GLint, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glTextureImage2DEXT (GLuint, GLenum, GLint, GLenum, GLsizei, GLsizei, GLint, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glTextureSubImage1DEXT (GLuint, GLenum, GLint, GLint, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glTextureSubImage2DEXT (GLuint, GLenum, GLint, GLint, GLint, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glCopyTextureImage1DEXT (GLuint, GLenum, GLint, GLenum, GLint, GLint, GLsizei, GLint);
GLAPI void APIENTRY glCopyTextureImage2DEXT (GLuint, GLenum, GLint, GLenum, GLint, GLint, GLsizei, GLsizei, GLint);
GLAPI void APIENTRY glCopyTextureSubImage1DEXT (GLuint, GLenum, GLint, GLint, GLint, GLint, GLsizei);
GLAPI void APIENTRY glCopyTextureSubImage2DEXT (GLuint, GLenum, GLint, GLint, GLint, GLint, GLint, GLsizei, GLsizei);
GLAPI void APIENTRY glGetTextureImageEXT (GLuint, GLenum, GLint, GLenum, GLenum, GLvoid *);
GLAPI void APIENTRY glGetTextureParameterfvEXT (GLuint, GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetTextureParameterivEXT (GLuint, GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetTextureLevelParameterfvEXT (GLuint, GLenum, GLint, GLenum, GLfloat *);
GLAPI void APIENTRY glGetTextureLevelParameterivEXT (GLuint, GLenum, GLint, GLenum, GLint *);
GLAPI void APIENTRY glTextureImage3DEXT (GLuint, GLenum, GLint, GLenum, GLsizei, GLsizei, GLsizei, GLint, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glTextureSubImage3DEXT (GLuint, GLenum, GLint, GLint, GLint, GLint, GLsizei, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glCopyTextureSubImage3DEXT (GLuint, GLenum, GLint, GLint, GLint, GLint, GLint, GLint, GLsizei, GLsizei);
GLAPI void APIENTRY glMultiTexParameterfEXT (GLenum, GLenum, GLenum, GLfloat);
GLAPI void APIENTRY glMultiTexParameterfvEXT (GLenum, GLenum, GLenum, const GLfloat *);
GLAPI void APIENTRY glMultiTexParameteriEXT (GLenum, GLenum, GLenum, GLint);
GLAPI void APIENTRY glMultiTexParameterivEXT (GLenum, GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glMultiTexImage1DEXT (GLenum, GLenum, GLint, GLenum, GLsizei, GLint, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glMultiTexImage2DEXT (GLenum, GLenum, GLint, GLenum, GLsizei, GLsizei, GLint, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glMultiTexSubImage1DEXT (GLenum, GLenum, GLint, GLint, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glMultiTexSubImage2DEXT (GLenum, GLenum, GLint, GLint, GLint, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glCopyMultiTexImage1DEXT (GLenum, GLenum, GLint, GLenum, GLint, GLint, GLsizei, GLint);
GLAPI void APIENTRY glCopyMultiTexImage2DEXT (GLenum, GLenum, GLint, GLenum, GLint, GLint, GLsizei, GLsizei, GLint);
GLAPI void APIENTRY glCopyMultiTexSubImage1DEXT (GLenum, GLenum, GLint, GLint, GLint, GLint, GLsizei);
GLAPI void APIENTRY glCopyMultiTexSubImage2DEXT (GLenum, GLenum, GLint, GLint, GLint, GLint, GLint, GLsizei, GLsizei);
GLAPI void APIENTRY glGetMultiTexImageEXT (GLenum, GLenum, GLint, GLenum, GLenum, GLvoid *);
GLAPI void APIENTRY glGetMultiTexParameterfvEXT (GLenum, GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetMultiTexParameterivEXT (GLenum, GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetMultiTexLevelParameterfvEXT (GLenum, GLenum, GLint, GLenum, GLfloat *);
GLAPI void APIENTRY glGetMultiTexLevelParameterivEXT (GLenum, GLenum, GLint, GLenum, GLint *);
GLAPI void APIENTRY glMultiTexImage3DEXT (GLenum, GLenum, GLint, GLenum, GLsizei, GLsizei, GLsizei, GLint, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glMultiTexSubImage3DEXT (GLenum, GLenum, GLint, GLint, GLint, GLint, GLsizei, GLsizei, GLsizei, GLenum, GLenum, const GLvoid *);
GLAPI void APIENTRY glCopyMultiTexSubImage3DEXT (GLenum, GLenum, GLint, GLint, GLint, GLint, GLint, GLint, GLsizei, GLsizei);
GLAPI void APIENTRY glBindMultiTextureEXT (GLenum, GLenum, GLuint);
GLAPI void APIENTRY glEnableClientStateIndexedEXT (GLenum, GLuint);
GLAPI void APIENTRY glDisableClientStateIndexedEXT (GLenum, GLuint);
GLAPI void APIENTRY glMultiTexCoordPointerEXT (GLenum, GLint, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glMultiTexEnvfEXT (GLenum, GLenum, GLenum, GLfloat);
GLAPI void APIENTRY glMultiTexEnvfvEXT (GLenum, GLenum, GLenum, const GLfloat *);
GLAPI void APIENTRY glMultiTexEnviEXT (GLenum, GLenum, GLenum, GLint);
GLAPI void APIENTRY glMultiTexEnvivEXT (GLenum, GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glMultiTexGendEXT (GLenum, GLenum, GLenum, GLdouble);
GLAPI void APIENTRY glMultiTexGendvEXT (GLenum, GLenum, GLenum, const GLdouble *);
GLAPI void APIENTRY glMultiTexGenfEXT (GLenum, GLenum, GLenum, GLfloat);
GLAPI void APIENTRY glMultiTexGenfvEXT (GLenum, GLenum, GLenum, const GLfloat *);
GLAPI void APIENTRY glMultiTexGeniEXT (GLenum, GLenum, GLenum, GLint);
GLAPI void APIENTRY glMultiTexGenivEXT (GLenum, GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glGetMultiTexEnvfvEXT (GLenum, GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetMultiTexEnvivEXT (GLenum, GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetMultiTexGendvEXT (GLenum, GLenum, GLenum, GLdouble *);
GLAPI void APIENTRY glGetMultiTexGenfvEXT (GLenum, GLenum, GLenum, GLfloat *);
GLAPI void APIENTRY glGetMultiTexGenivEXT (GLenum, GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetFloatIndexedvEXT (GLenum, GLuint, GLfloat *);
GLAPI void APIENTRY glGetDoubleIndexedvEXT (GLenum, GLuint, GLdouble *);
GLAPI void APIENTRY glGetPointerIndexedvEXT (GLenum, GLuint, GLvoid* *);
GLAPI void APIENTRY glCompressedTextureImage3DEXT (GLuint, GLenum, GLint, GLenum, GLsizei, GLsizei, GLsizei, GLint, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedTextureImage2DEXT (GLuint, GLenum, GLint, GLenum, GLsizei, GLsizei, GLint, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedTextureImage1DEXT (GLuint, GLenum, GLint, GLenum, GLsizei, GLint, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedTextureSubImage3DEXT (GLuint, GLenum, GLint, GLint, GLint, GLint, GLsizei, GLsizei, GLsizei, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedTextureSubImage2DEXT (GLuint, GLenum, GLint, GLint, GLint, GLsizei, GLsizei, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedTextureSubImage1DEXT (GLuint, GLenum, GLint, GLint, GLsizei, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glGetCompressedTextureImageEXT (GLuint, GLenum, GLint, GLvoid *);
GLAPI void APIENTRY glCompressedMultiTexImage3DEXT (GLenum, GLenum, GLint, GLenum, GLsizei, GLsizei, GLsizei, GLint, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedMultiTexImage2DEXT (GLenum, GLenum, GLint, GLenum, GLsizei, GLsizei, GLint, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedMultiTexImage1DEXT (GLenum, GLenum, GLint, GLenum, GLsizei, GLint, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedMultiTexSubImage3DEXT (GLenum, GLenum, GLint, GLint, GLint, GLint, GLsizei, GLsizei, GLsizei, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedMultiTexSubImage2DEXT (GLenum, GLenum, GLint, GLint, GLint, GLsizei, GLsizei, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glCompressedMultiTexSubImage1DEXT (GLenum, GLenum, GLint, GLint, GLsizei, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glGetCompressedMultiTexImageEXT (GLenum, GLenum, GLint, GLvoid *);
GLAPI void APIENTRY glNamedProgramStringEXT (GLuint, GLenum, GLenum, GLsizei, const GLvoid *);
GLAPI void APIENTRY glNamedProgramLocalParameter4dEXT (GLuint, GLenum, GLuint, GLdouble, GLdouble, GLdouble, GLdouble);
GLAPI void APIENTRY glNamedProgramLocalParameter4dvEXT (GLuint, GLenum, GLuint, const GLdouble *);
GLAPI void APIENTRY glNamedProgramLocalParameter4fEXT (GLuint, GLenum, GLuint, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glNamedProgramLocalParameter4fvEXT (GLuint, GLenum, GLuint, const GLfloat *);
GLAPI void APIENTRY glGetNamedProgramLocalParameterdvEXT (GLuint, GLenum, GLuint, GLdouble *);
GLAPI void APIENTRY glGetNamedProgramLocalParameterfvEXT (GLuint, GLenum, GLuint, GLfloat *);
GLAPI void APIENTRY glGetNamedProgramivEXT (GLuint, GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetNamedProgramStringEXT (GLuint, GLenum, GLenum, GLvoid *);
GLAPI void APIENTRY glNamedProgramLocalParameters4fvEXT (GLuint, GLenum, GLuint, GLsizei, const GLfloat *);
GLAPI void APIENTRY glNamedProgramLocalParameterI4iEXT (GLuint, GLenum, GLuint, GLint, GLint, GLint, GLint);
GLAPI void APIENTRY glNamedProgramLocalParameterI4ivEXT (GLuint, GLenum, GLuint, const GLint *);
GLAPI void APIENTRY glNamedProgramLocalParametersI4ivEXT (GLuint, GLenum, GLuint, GLsizei, const GLint *);
GLAPI void APIENTRY glNamedProgramLocalParameterI4uiEXT (GLuint, GLenum, GLuint, GLuint, GLuint, GLuint, GLuint);
GLAPI void APIENTRY glNamedProgramLocalParameterI4uivEXT (GLuint, GLenum, GLuint, const GLuint *);
GLAPI void APIENTRY glNamedProgramLocalParametersI4uivEXT (GLuint, GLenum, GLuint, GLsizei, const GLuint *);
GLAPI void APIENTRY glGetNamedProgramLocalParameterIivEXT (GLuint, GLenum, GLuint, GLint *);
GLAPI void APIENTRY glGetNamedProgramLocalParameterIuivEXT (GLuint, GLenum, GLuint, GLuint *);
GLAPI void APIENTRY glTextureParameterIivEXT (GLuint, GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glTextureParameterIuivEXT (GLuint, GLenum, GLenum, const GLuint *);
GLAPI void APIENTRY glGetTextureParameterIivEXT (GLuint, GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetTextureParameterIuivEXT (GLuint, GLenum, GLenum, GLuint *);
GLAPI void APIENTRY glMultiTexParameterIivEXT (GLenum, GLenum, GLenum, const GLint *);
GLAPI void APIENTRY glMultiTexParameterIuivEXT (GLenum, GLenum, GLenum, const GLuint *);
GLAPI void APIENTRY glGetMultiTexParameterIivEXT (GLenum, GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGetMultiTexParameterIuivEXT (GLenum, GLenum, GLenum, GLuint *);
GLAPI void APIENTRY glProgramUniform1fEXT (GLuint, GLint, GLfloat);
GLAPI void APIENTRY glProgramUniform2fEXT (GLuint, GLint, GLfloat, GLfloat);
GLAPI void APIENTRY glProgramUniform3fEXT (GLuint, GLint, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glProgramUniform4fEXT (GLuint, GLint, GLfloat, GLfloat, GLfloat, GLfloat);
GLAPI void APIENTRY glProgramUniform1iEXT (GLuint, GLint, GLint);
GLAPI void APIENTRY glProgramUniform2iEXT (GLuint, GLint, GLint, GLint);
GLAPI void APIENTRY glProgramUniform3iEXT (GLuint, GLint, GLint, GLint, GLint);
GLAPI void APIENTRY glProgramUniform4iEXT (GLuint, GLint, GLint, GLint, GLint, GLint);
GLAPI void APIENTRY glProgramUniform1fvEXT (GLuint, GLint, GLsizei, const GLfloat *);
GLAPI void APIENTRY glProgramUniform2fvEXT (GLuint, GLint, GLsizei, const GLfloat *);
GLAPI void APIENTRY glProgramUniform3fvEXT (GLuint, GLint, GLsizei, const GLfloat *);
GLAPI void APIENTRY glProgramUniform4fvEXT (GLuint, GLint, GLsizei, const GLfloat *);
GLAPI void APIENTRY glProgramUniform1ivEXT (GLuint, GLint, GLsizei, const GLint *);
GLAPI void APIENTRY glProgramUniform2ivEXT (GLuint, GLint, GLsizei, const GLint *);
GLAPI void APIENTRY glProgramUniform3ivEXT (GLuint, GLint, GLsizei, const GLint *);
GLAPI void APIENTRY glProgramUniform4ivEXT (GLuint, GLint, GLsizei, const GLint *);
GLAPI void APIENTRY glProgramUniformMatrix2fvEXT (GLuint, GLint, GLsizei, GLboolean, const GLfloat *);
GLAPI void APIENTRY glProgramUniformMatrix3fvEXT (GLuint, GLint, GLsizei, GLboolean, const GLfloat *);
GLAPI void APIENTRY glProgramUniformMatrix4fvEXT (GLuint, GLint, GLsizei, GLboolean, const GLfloat *);
GLAPI void APIENTRY glProgramUniformMatrix2x3fvEXT (GLuint, GLint, GLsizei, GLboolean, const GLfloat *);
GLAPI void APIENTRY glProgramUniformMatrix3x2fvEXT (GLuint, GLint, GLsizei, GLboolean, const GLfloat *);
GLAPI void APIENTRY glProgramUniformMatrix2x4fvEXT (GLuint, GLint, GLsizei, GLboolean, const GLfloat *);
GLAPI void APIENTRY glProgramUniformMatrix4x2fvEXT (GLuint, GLint, GLsizei, GLboolean, const GLfloat *);
GLAPI void APIENTRY glProgramUniformMatrix3x4fvEXT (GLuint, GLint, GLsizei, GLboolean, const GLfloat *);
GLAPI void APIENTRY glProgramUniformMatrix4x3fvEXT (GLuint, GLint, GLsizei, GLboolean, const GLfloat *);
GLAPI void APIENTRY glProgramUniform1uiEXT (GLuint, GLint, GLuint);
GLAPI void APIENTRY glProgramUniform2uiEXT (GLuint, GLint, GLuint, GLuint);
GLAPI void APIENTRY glProgramUniform3uiEXT (GLuint, GLint, GLuint, GLuint, GLuint);
GLAPI void APIENTRY glProgramUniform4uiEXT (GLuint, GLint, GLuint, GLuint, GLuint, GLuint);
GLAPI void APIENTRY glProgramUniform1uivEXT (GLuint, GLint, GLsizei, const GLuint *);
GLAPI void APIENTRY glProgramUniform2uivEXT (GLuint, GLint, GLsizei, const GLuint *);
GLAPI void APIENTRY glProgramUniform3uivEXT (GLuint, GLint, GLsizei, const GLuint *);
GLAPI void APIENTRY glProgramUniform4uivEXT (GLuint, GLint, GLsizei, const GLuint *);
GLAPI void APIENTRY glNamedBufferDataEXT (GLuint, GLsizeiptr, const GLvoid *, GLenum);
GLAPI void APIENTRY glNamedBufferSubDataEXT (GLuint, GLintptr, GLsizeiptr, const GLvoid *);
GLAPI GLvoid* APIENTRY glMapNamedBufferEXT (GLuint, GLenum);
GLAPI GLboolean APIENTRY glUnmapNamedBufferEXT (GLuint);
GLAPI void APIENTRY glGetNamedBufferParameterivEXT (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glGetNamedBufferPointervEXT (GLuint, GLenum, GLvoid* *);
GLAPI void APIENTRY glGetNamedBufferSubDataEXT (GLuint, GLintptr, GLsizeiptr, GLvoid *);
GLAPI void APIENTRY glTextureBufferEXT (GLuint, GLenum, GLenum, GLuint);
GLAPI void APIENTRY glMultiTexBufferEXT (GLenum, GLenum, GLenum, GLuint);
GLAPI void APIENTRY glNamedRenderbufferStorageEXT (GLuint, GLenum, GLsizei, GLsizei);
GLAPI void APIENTRY glGetNamedRenderbufferParameterivEXT (GLuint, GLenum, GLint *);
GLAPI GLenum APIENTRY glCheckNamedFramebufferStatusEXT (GLuint, GLenum);
GLAPI void APIENTRY glNamedFramebufferTexture1DEXT (GLuint, GLenum, GLenum, GLuint, GLint);
GLAPI void APIENTRY glNamedFramebufferTexture2DEXT (GLuint, GLenum, GLenum, GLuint, GLint);
GLAPI void APIENTRY glNamedFramebufferTexture3DEXT (GLuint, GLenum, GLenum, GLuint, GLint, GLint);
GLAPI void APIENTRY glNamedFramebufferRenderbufferEXT (GLuint, GLenum, GLenum, GLuint);
GLAPI void APIENTRY glGetNamedFramebufferAttachmentParameterivEXT (GLuint, GLenum, GLenum, GLint *);
GLAPI void APIENTRY glGenerateTextureMipmapEXT (GLuint, GLenum);
GLAPI void APIENTRY glGenerateMultiTexMipmapEXT (GLenum, GLenum);
GLAPI void APIENTRY glFramebufferDrawBufferEXT (GLuint, GLenum);
GLAPI void APIENTRY glFramebufferDrawBuffersEXT (GLuint, GLsizei, const GLenum *);
GLAPI void APIENTRY glFramebufferReadBufferEXT (GLuint, GLenum);
GLAPI void APIENTRY glGetFramebufferParameterivEXT (GLuint, GLenum, GLint *);
GLAPI void APIENTRY glNamedRenderbufferStorageMultisampleEXT (GLuint, GLsizei, GLenum, GLsizei, GLsizei);
GLAPI void APIENTRY glNamedRenderbufferStorageMultisampleCoverageEXT (GLuint, GLsizei, GLsizei, GLenum, GLsizei, GLsizei);
GLAPI void APIENTRY glNamedFramebufferTextureEXT (GLuint, GLenum, GLuint, GLint);
GLAPI void APIENTRY glNamedFramebufferTextureLayerEXT (GLuint, GLenum, GLuint, GLint, GLint);
GLAPI void APIENTRY glNamedFramebufferTextureFaceEXT (GLuint, GLenum, GLuint, GLint, GLenum);
GLAPI void APIENTRY glTextureRenderbufferEXT (GLuint, GLenum, GLuint);
GLAPI void APIENTRY glMultiTexRenderbufferEXT (GLenum, GLenum, GLuint);
#endif /* GL_GLEXT_PROTOTYPES */
typedef void (APIENTRYP PFNGLCLIENTATTRIBDEFAULTEXTPROC) (GLbitfield mask);
typedef void (APIENTRYP PFNGLPUSHCLIENTATTRIBDEFAULTEXTPROC) (GLbitfield mask);
typedef void (APIENTRYP PFNGLMATRIXLOADFEXTPROC) (GLenum mode, const GLfloat *m);
typedef void (APIENTRYP PFNGLMATRIXLOADDEXTPROC) (GLenum mode, const GLdouble *m);
typedef void (APIENTRYP PFNGLMATRIXMULTFEXTPROC) (GLenum mode, const GLfloat *m);
typedef void (APIENTRYP PFNGLMATRIXMULTDEXTPROC) (GLenum mode, const GLdouble *m);
typedef void (APIENTRYP PFNGLMATRIXLOADIDENTITYEXTPROC) (GLenum mode);
typedef void (APIENTRYP PFNGLMATRIXROTATEFEXTPROC) (GLenum mode, GLfloat angle, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLMATRIXROTATEDEXTPROC) (GLenum mode, GLdouble angle, GLdouble x, GLdouble y, GLdouble z);
typedef void (APIENTRYP PFNGLMATRIXSCALEFEXTPROC) (GLenum mode, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLMATRIXSCALEDEXTPROC) (GLenum mode, GLdouble x, GLdouble y, GLdouble z);
typedef void (APIENTRYP PFNGLMATRIXTRANSLATEFEXTPROC) (GLenum mode, GLfloat x, GLfloat y, GLfloat z);
typedef void (APIENTRYP PFNGLMATRIXTRANSLATEDEXTPROC) (GLenum mode, GLdouble x, GLdouble y, GLdouble z);
typedef void (APIENTRYP PFNGLMATRIXFRUSTUMEXTPROC) (GLenum mode, GLdouble left, GLdouble right, GLdouble bottom, GLdouble top, GLdouble zNear, GLdouble zFar);
typedef void (APIENTRYP PFNGLMATRIXORTHOEXTPROC) (GLenum mode, GLdouble left, GLdouble right, GLdouble bottom, GLdouble top, GLdouble zNear, GLdouble zFar);
typedef void (APIENTRYP PFNGLMATRIXPOPEXTPROC) (GLenum mode);
typedef void (APIENTRYP PFNGLMATRIXPUSHEXTPROC) (GLenum mode);
typedef void (APIENTRYP PFNGLMATRIXLOADTRANSPOSEFEXTPROC) (GLenum mode, const GLfloat *m);
typedef void (APIENTRYP PFNGLMATRIXLOADTRANSPOSEDEXTPROC) (GLenum mode, const GLdouble *m);
typedef void (APIENTRYP PFNGLMATRIXMULTTRANSPOSEFEXTPROC) (GLenum mode, const GLfloat *m);
typedef void (APIENTRYP PFNGLMATRIXMULTTRANSPOSEDEXTPROC) (GLenum mode, const GLdouble *m);
typedef void (APIENTRYP PFNGLTEXTUREPARAMETERFEXTPROC) (GLuint texture, GLenum target, GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLTEXTUREPARAMETERFVEXTPROC) (GLuint texture, GLenum target, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLTEXTUREPARAMETERIEXTPROC) (GLuint texture, GLenum target, GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLTEXTUREPARAMETERIVEXTPROC) (GLuint texture, GLenum target, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLTEXTUREIMAGE1DEXTPROC) (GLuint texture, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLint border, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLTEXTUREIMAGE2DEXTPROC) (GLuint texture, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLTEXTURESUBIMAGE1DEXTPROC) (GLuint texture, GLenum target, GLint level, GLint xoffset, GLsizei width, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLTEXTURESUBIMAGE2DEXTPROC) (GLuint texture, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLCOPYTEXTUREIMAGE1DEXTPROC) (GLuint texture, GLenum target, GLint level, GLenum internalformat, GLint x, GLint y, GLsizei width, GLint border);
typedef void (APIENTRYP PFNGLCOPYTEXTUREIMAGE2DEXTPROC) (GLuint texture, GLenum target, GLint level, GLenum internalformat, GLint x, GLint y, GLsizei width, GLsizei height, GLint border);
typedef void (APIENTRYP PFNGLCOPYTEXTURESUBIMAGE1DEXTPROC) (GLuint texture, GLenum target, GLint level, GLint xoffset, GLint x, GLint y, GLsizei width);
typedef void (APIENTRYP PFNGLCOPYTEXTURESUBIMAGE2DEXTPROC) (GLuint texture, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (APIENTRYP PFNGLGETTEXTUREIMAGEEXTPROC) (GLuint texture, GLenum target, GLint level, GLenum format, GLenum type, GLvoid *pixels);
typedef void (APIENTRYP PFNGLGETTEXTUREPARAMETERFVEXTPROC) (GLuint texture, GLenum target, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETTEXTUREPARAMETERIVEXTPROC) (GLuint texture, GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETTEXTURELEVELPARAMETERFVEXTPROC) (GLuint texture, GLenum target, GLint level, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETTEXTURELEVELPARAMETERIVEXTPROC) (GLuint texture, GLenum target, GLint level, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLTEXTUREIMAGE3DEXTPROC) (GLuint texture, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLTEXTURESUBIMAGE3DEXTPROC) (GLuint texture, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLCOPYTEXTURESUBIMAGE3DEXTPROC) (GLuint texture, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (APIENTRYP PFNGLMULTITEXPARAMETERFEXTPROC) (GLenum texunit, GLenum target, GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLMULTITEXPARAMETERFVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLMULTITEXPARAMETERIEXTPROC) (GLenum texunit, GLenum target, GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLMULTITEXPARAMETERIVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLMULTITEXIMAGE1DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLint border, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLMULTITEXIMAGE2DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLMULTITEXSUBIMAGE1DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLint xoffset, GLsizei width, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLMULTITEXSUBIMAGE2DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLCOPYMULTITEXIMAGE1DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLenum internalformat, GLint x, GLint y, GLsizei width, GLint border);
typedef void (APIENTRYP PFNGLCOPYMULTITEXIMAGE2DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLenum internalformat, GLint x, GLint y, GLsizei width, GLsizei height, GLint border);
typedef void (APIENTRYP PFNGLCOPYMULTITEXSUBIMAGE1DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLint xoffset, GLint x, GLint y, GLsizei width);
typedef void (APIENTRYP PFNGLCOPYMULTITEXSUBIMAGE2DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (APIENTRYP PFNGLGETMULTITEXIMAGEEXTPROC) (GLenum texunit, GLenum target, GLint level, GLenum format, GLenum type, GLvoid *pixels);
typedef void (APIENTRYP PFNGLGETMULTITEXPARAMETERFVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETMULTITEXPARAMETERIVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETMULTITEXLEVELPARAMETERFVEXTPROC) (GLenum texunit, GLenum target, GLint level, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETMULTITEXLEVELPARAMETERIVEXTPROC) (GLenum texunit, GLenum target, GLint level, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLMULTITEXIMAGE3DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLMULTITEXSUBIMAGE3DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, const GLvoid *pixels);
typedef void (APIENTRYP PFNGLCOPYMULTITEXSUBIMAGE3DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (APIENTRYP PFNGLBINDMULTITEXTUREEXTPROC) (GLenum texunit, GLenum target, GLuint texture);
typedef void (APIENTRYP PFNGLENABLECLIENTSTATEINDEXEDEXTPROC) (GLenum array, GLuint index);
typedef void (APIENTRYP PFNGLDISABLECLIENTSTATEINDEXEDEXTPROC) (GLenum array, GLuint index);
typedef void (APIENTRYP PFNGLMULTITEXCOORDPOINTEREXTPROC) (GLenum texunit, GLint size, GLenum type, GLsizei stride, const GLvoid *pointer);
typedef void (APIENTRYP PFNGLMULTITEXENVFEXTPROC) (GLenum texunit, GLenum target, GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLMULTITEXENVFVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLMULTITEXENVIEXTPROC) (GLenum texunit, GLenum target, GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLMULTITEXENVIVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLMULTITEXGENDEXTPROC) (GLenum texunit, GLenum coord, GLenum pname, GLdouble param);
typedef void (APIENTRYP PFNGLMULTITEXGENDVEXTPROC) (GLenum texunit, GLenum coord, GLenum pname, const GLdouble *params);
typedef void (APIENTRYP PFNGLMULTITEXGENFEXTPROC) (GLenum texunit, GLenum coord, GLenum pname, GLfloat param);
typedef void (APIENTRYP PFNGLMULTITEXGENFVEXTPROC) (GLenum texunit, GLenum coord, GLenum pname, const GLfloat *params);
typedef void (APIENTRYP PFNGLMULTITEXGENIEXTPROC) (GLenum texunit, GLenum coord, GLenum pname, GLint param);
typedef void (APIENTRYP PFNGLMULTITEXGENIVEXTPROC) (GLenum texunit, GLenum coord, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLGETMULTITEXENVFVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETMULTITEXENVIVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETMULTITEXGENDVEXTPROC) (GLenum texunit, GLenum coord, GLenum pname, GLdouble *params);
typedef void (APIENTRYP PFNGLGETMULTITEXGENFVEXTPROC) (GLenum texunit, GLenum coord, GLenum pname, GLfloat *params);
typedef void (APIENTRYP PFNGLGETMULTITEXGENIVEXTPROC) (GLenum texunit, GLenum coord, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETFLOATINDEXEDVEXTPROC) (GLenum target, GLuint index, GLfloat *data);
typedef void (APIENTRYP PFNGLGETDOUBLEINDEXEDVEXTPROC) (GLenum target, GLuint index, GLdouble *data);
typedef void (APIENTRYP PFNGLGETPOINTERINDEXEDVEXTPROC) (GLenum target, GLuint index, GLvoid* *data);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXTUREIMAGE3DEXTPROC) (GLuint texture, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXTUREIMAGE2DEXTPROC) (GLuint texture, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXTUREIMAGE1DEXTPROC) (GLuint texture, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLint border, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXTURESUBIMAGE3DEXTPROC) (GLuint texture, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXTURESUBIMAGE2DEXTPROC) (GLuint texture, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLCOMPRESSEDTEXTURESUBIMAGE1DEXTPROC) (GLuint texture, GLenum target, GLint level, GLint xoffset, GLsizei width, GLenum format, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLGETCOMPRESSEDTEXTUREIMAGEEXTPROC) (GLuint texture, GLenum target, GLint lod, GLvoid *img);
typedef void (APIENTRYP PFNGLCOMPRESSEDMULTITEXIMAGE3DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLCOMPRESSEDMULTITEXIMAGE2DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLCOMPRESSEDMULTITEXIMAGE1DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLenum internalformat, GLsizei width, GLint border, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLCOMPRESSEDMULTITEXSUBIMAGE3DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLCOMPRESSEDMULTITEXSUBIMAGE2DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLCOMPRESSEDMULTITEXSUBIMAGE1DEXTPROC) (GLenum texunit, GLenum target, GLint level, GLint xoffset, GLsizei width, GLenum format, GLsizei imageSize, const GLvoid *bits);
typedef void (APIENTRYP PFNGLGETCOMPRESSEDMULTITEXIMAGEEXTPROC) (GLenum texunit, GLenum target, GLint lod, GLvoid *img);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMSTRINGEXTPROC) (GLuint program, GLenum target, GLenum format, GLsizei len, const GLvoid *string);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMLOCALPARAMETER4DEXTPROC) (GLuint program, GLenum target, GLuint index, GLdouble x, GLdouble y, GLdouble z, GLdouble w);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMLOCALPARAMETER4DVEXTPROC) (GLuint program, GLenum target, GLuint index, const GLdouble *params);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMLOCALPARAMETER4FEXTPROC) (GLuint program, GLenum target, GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMLOCALPARAMETER4FVEXTPROC) (GLuint program, GLenum target, GLuint index, const GLfloat *params);
typedef void (APIENTRYP PFNGLGETNAMEDPROGRAMLOCALPARAMETERDVEXTPROC) (GLuint program, GLenum target, GLuint index, GLdouble *params);
typedef void (APIENTRYP PFNGLGETNAMEDPROGRAMLOCALPARAMETERFVEXTPROC) (GLuint program, GLenum target, GLuint index, GLfloat *params);
typedef void (APIENTRYP PFNGLGETNAMEDPROGRAMIVEXTPROC) (GLuint program, GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETNAMEDPROGRAMSTRINGEXTPROC) (GLuint program, GLenum target, GLenum pname, GLvoid *string);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMLOCALPARAMETERS4FVEXTPROC) (GLuint program, GLenum target, GLuint index, GLsizei count, const GLfloat *params);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMLOCALPARAMETERI4IEXTPROC) (GLuint program, GLenum target, GLuint index, GLint x, GLint y, GLint z, GLint w);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMLOCALPARAMETERI4IVEXTPROC) (GLuint program, GLenum target, GLuint index, const GLint *params);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMLOCALPARAMETERSI4IVEXTPROC) (GLuint program, GLenum target, GLuint index, GLsizei count, const GLint *params);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMLOCALPARAMETERI4UIEXTPROC) (GLuint program, GLenum target, GLuint index, GLuint x, GLuint y, GLuint z, GLuint w);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMLOCALPARAMETERI4UIVEXTPROC) (GLuint program, GLenum target, GLuint index, const GLuint *params);
typedef void (APIENTRYP PFNGLNAMEDPROGRAMLOCALPARAMETERSI4UIVEXTPROC) (GLuint program, GLenum target, GLuint index, GLsizei count, const GLuint *params);
typedef void (APIENTRYP PFNGLGETNAMEDPROGRAMLOCALPARAMETERIIVEXTPROC) (GLuint program, GLenum target, GLuint index, GLint *params);
typedef void (APIENTRYP PFNGLGETNAMEDPROGRAMLOCALPARAMETERIUIVEXTPROC) (GLuint program, GLenum target, GLuint index, GLuint *params);
typedef void (APIENTRYP PFNGLTEXTUREPARAMETERIIVEXTPROC) (GLuint texture, GLenum target, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLTEXTUREPARAMETERIUIVEXTPROC) (GLuint texture, GLenum target, GLenum pname, const GLuint *params);
typedef void (APIENTRYP PFNGLGETTEXTUREPARAMETERIIVEXTPROC) (GLuint texture, GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETTEXTUREPARAMETERIUIVEXTPROC) (GLuint texture, GLenum target, GLenum pname, GLuint *params);
typedef void (APIENTRYP PFNGLMULTITEXPARAMETERIIVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, const GLint *params);
typedef void (APIENTRYP PFNGLMULTITEXPARAMETERIUIVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, const GLuint *params);
typedef void (APIENTRYP PFNGLGETMULTITEXPARAMETERIIVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETMULTITEXPARAMETERIUIVEXTPROC) (GLenum texunit, GLenum target, GLenum pname, GLuint *params);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM1FEXTPROC) (GLuint program, GLint location, GLfloat v0);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM2FEXTPROC) (GLuint program, GLint location, GLfloat v0, GLfloat v1);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM3FEXTPROC) (GLuint program, GLint location, GLfloat v0, GLfloat v1, GLfloat v2);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM4FEXTPROC) (GLuint program, GLint location, GLfloat v0, GLfloat v1, GLfloat v2, GLfloat v3);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM1IEXTPROC) (GLuint program, GLint location, GLint v0);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM2IEXTPROC) (GLuint program, GLint location, GLint v0, GLint v1);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM3IEXTPROC) (GLuint program, GLint location, GLint v0, GLint v1, GLint v2);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM4IEXTPROC) (GLuint program, GLint location, GLint v0, GLint v1, GLint v2, GLint v3);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM1FVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM2FVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM3FVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM4FVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM1IVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLint *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM2IVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLint *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM3IVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLint *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM4IVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLint *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORMMATRIX2FVEXTPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORMMATRIX3FVEXTPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORMMATRIX4FVEXTPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORMMATRIX2X3FVEXTPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORMMATRIX3X2FVEXTPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORMMATRIX2X4FVEXTPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORMMATRIX4X2FVEXTPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORMMATRIX3X4FVEXTPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORMMATRIX4X3FVEXTPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM1UIEXTPROC) (GLuint program, GLint location, GLuint v0);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM2UIEXTPROC) (GLuint program, GLint location, GLuint v0, GLuint v1);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM3UIEXTPROC) (GLuint program, GLint location, GLuint v0, GLuint v1, GLuint v2);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM4UIEXTPROC) (GLuint program, GLint location, GLuint v0, GLuint v1, GLuint v2, GLuint v3);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM1UIVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLuint *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM2UIVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLuint *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM3UIVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLuint *value);
typedef void (APIENTRYP PFNGLPROGRAMUNIFORM4UIVEXTPROC) (GLuint program, GLint location, GLsizei count, const GLuint *value);
typedef void (APIENTRYP PFNGLNAMEDBUFFERDATAEXTPROC) (GLuint buffer, GLsizeiptr size, const GLvoid *data, GLenum usage);
typedef void (APIENTRYP PFNGLNAMEDBUFFERSUBDATAEXTPROC) (GLuint buffer, GLintptr offset, GLsizeiptr size, const GLvoid *data);
typedef GLvoid* (APIENTRYP PFNGLMAPNAMEDBUFFEREXTPROC) (GLuint buffer, GLenum access);
typedef GLboolean (APIENTRYP PFNGLUNMAPNAMEDBUFFEREXTPROC) (GLuint buffer);
typedef void (APIENTRYP PFNGLGETNAMEDBUFFERPARAMETERIVEXTPROC) (GLuint buffer, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETNAMEDBUFFERPOINTERVEXTPROC) (GLuint buffer, GLenum pname, GLvoid* *params);
typedef void (APIENTRYP PFNGLGETNAMEDBUFFERSUBDATAEXTPROC) (GLuint buffer, GLintptr offset, GLsizeiptr size, GLvoid *data);
(APIENTRYP PFNGLTEXTUREBUFFEREXTPROC) (GLuint texture, GLenum target, GLenum internalformat, GLuint buffer); typedef void (APIENTRYP PFNGLMULTITEXBUFFEREXTPROC) (GLenum texunit, GLenum target, GLenum internalformat, GLuint buffer); typedef void (APIENTRYP PFNGLNAMEDRENDERBUFFERSTORAGEEXTPROC) (GLuint renderbuffer, GLenum internalformat, GLsizei width, GLsizei height); typedef void (APIENTRYP PFNGLGETNAMEDRENDERBUFFERPARAMETERIVEXTPROC) (GLuint renderbuffer, GLenum pname, GLint *params); typedef GLenum (APIENTRYP PFNGLCHECKNAMEDFRAMEBUFFERSTATUSEXTPROC) (GLuint framebuffer, GLenum target); typedef void (APIENTRYP PFNGLNAMEDFRAMEBUFFERTEXTURE1DEXTPROC) (GLuint framebuffer, GLenum attachment, GLenum textarget, GLuint texture, GLint level); typedef void (APIENTRYP PFNGLNAMEDFRAMEBUFFERTEXTURE2DEXTPROC) (GLuint framebuffer, GLenum attachment, GLenum textarget, GLuint texture, GLint level); typedef void (APIENTRYP PFNGLNAMEDFRAMEBUFFERTEXTURE3DEXTPROC) (GLuint framebuffer, GLenum attachment, GLenum textarget, GLuint texture, GLint level, GLint zoffset); typedef void (APIENTRYP PFNGLNAMEDFRAMEBUFFERRENDERBUFFEREXTPROC) (GLuint framebuffer, GLenum attachment, GLenum renderbuffertarget, GLuint renderbuffer); typedef void (APIENTRYP PFNGLGETNAMEDFRAMEBUFFERATTACHMENTPARAMETERIVEXTPROC) (GLuint framebuffer, GLenum attachment, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGENERATETEXTUREMIPMAPEXTPROC) (GLuint texture, GLenum target); typedef void (APIENTRYP PFNGLGENERATEMULTITEXMIPMAPEXTPROC) (GLenum texunit, GLenum target); typedef void (APIENTRYP PFNGLFRAMEBUFFERDRAWBUFFEREXTPROC) (GLuint framebuffer, GLenum mode); typedef void (APIENTRYP PFNGLFRAMEBUFFERDRAWBUFFERSEXTPROC) (GLuint framebuffer, GLsizei n, const GLenum *bufs); typedef void (APIENTRYP PFNGLFRAMEBUFFERREADBUFFEREXTPROC) (GLuint framebuffer, GLenum mode); typedef void (APIENTRYP PFNGLGETFRAMEBUFFERPARAMETERIVEXTPROC) (GLuint framebuffer, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLNAMEDRENDERBUFFERSTORAGEMULTISAMPLEEXTPROC) (GLuint renderbuffer, GLsizei samples, GLenum internalformat, GLsizei width, GLsizei height); typedef void (APIENTRYP PFNGLNAMEDRENDERBUFFERSTORAGEMULTISAMPLECOVERAGEEXTPROC) (GLuint renderbuffer, GLsizei coverageSamples, GLsizei colorSamples, GLenum internalformat, GLsizei width, GLsizei height); typedef void (APIENTRYP PFNGLNAMEDFRAMEBUFFERTEXTUREEXTPROC) (GLuint framebuffer, GLenum attachment, GLuint texture, GLint level); typedef void (APIENTRYP PFNGLNAMEDFRAMEBUFFERTEXTURELAYEREXTPROC) (GLuint framebuffer, GLenum attachment, GLuint texture, GLint level, GLint layer); typedef void (APIENTRYP PFNGLNAMEDFRAMEBUFFERTEXTUREFACEEXTPROC) (GLuint framebuffer, GLenum attachment, GLuint texture, GLint level, GLenum face); typedef void (APIENTRYP PFNGLTEXTURERENDERBUFFEREXTPROC) (GLuint texture, GLenum target, GLuint renderbuffer); typedef void (APIENTRYP PFNGLMULTITEXRENDERBUFFEREXTPROC) (GLenum texunit, GLenum target, GLuint renderbuffer); #endif #ifndef GL_EXT_vertex_array_bgra #define GL_EXT_vertex_array_bgra 1 #endif #ifndef GL_EXT_texture_swizzle #define GL_EXT_texture_swizzle 1 #endif #ifndef GL_NV_explicit_multisample #define GL_NV_explicit_multisample 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glGetMultisamplefvNV (GLenum, GLuint, GLfloat *); GLAPI void APIENTRY glSampleMaskIndexedNV (GLuint, GLbitfield); GLAPI void APIENTRY glTexRenderbufferNV (GLenum, GLuint); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLGETMULTISAMPLEFVNVPROC) (GLenum pname, GLuint index, 
GLfloat *val); typedef void (APIENTRYP PFNGLSAMPLEMASKINDEXEDNVPROC) (GLuint index, GLbitfield mask); typedef void (APIENTRYP PFNGLTEXRENDERBUFFERNVPROC) (GLenum target, GLuint renderbuffer); #endif #ifndef GL_NV_transform_feedback2 #define GL_NV_transform_feedback2 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glBindTransformFeedbackNV (GLenum, GLuint); GLAPI void APIENTRY glDeleteTransformFeedbacksNV (GLsizei, const GLuint *); GLAPI void APIENTRY glGenTransformFeedbacksNV (GLsizei, GLuint *); GLAPI GLboolean APIENTRY glIsTransformFeedbackNV (GLuint); GLAPI void APIENTRY glPauseTransformFeedbackNV (void); GLAPI void APIENTRY glResumeTransformFeedbackNV (void); GLAPI void APIENTRY glDrawTransformFeedbackNV (GLenum, GLuint); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLBINDTRANSFORMFEEDBACKNVPROC) (GLenum target, GLuint id); typedef void (APIENTRYP PFNGLDELETETRANSFORMFEEDBACKSNVPROC) (GLsizei n, const GLuint *ids); typedef void (APIENTRYP PFNGLGENTRANSFORMFEEDBACKSNVPROC) (GLsizei n, GLuint *ids); typedef GLboolean (APIENTRYP PFNGLISTRANSFORMFEEDBACKNVPROC) (GLuint id); typedef void (APIENTRYP PFNGLPAUSETRANSFORMFEEDBACKNVPROC) (void); typedef void (APIENTRYP PFNGLRESUMETRANSFORMFEEDBACKNVPROC) (void); typedef void (APIENTRYP PFNGLDRAWTRANSFORMFEEDBACKNVPROC) (GLenum mode, GLuint id); #endif #ifndef GL_ATI_meminfo #define GL_ATI_meminfo 1 #endif #ifndef GL_AMD_performance_monitor #define GL_AMD_performance_monitor 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glGetPerfMonitorGroupsAMD (GLint *, GLsizei, GLuint *); GLAPI void APIENTRY glGetPerfMonitorCountersAMD (GLuint, GLint *, GLint *, GLsizei, GLuint *); GLAPI void APIENTRY glGetPerfMonitorGroupStringAMD (GLuint, GLsizei, GLsizei *, GLchar *); GLAPI void APIENTRY glGetPerfMonitorCounterStringAMD (GLuint, GLuint, GLsizei, GLsizei *, GLchar *); GLAPI void APIENTRY glGetPerfMonitorCounterInfoAMD (GLuint, GLuint, GLenum, void *); GLAPI void APIENTRY glGenPerfMonitorsAMD (GLsizei, GLuint *); GLAPI void APIENTRY glDeletePerfMonitorsAMD (GLsizei, GLuint *); GLAPI void APIENTRY glSelectPerfMonitorCountersAMD (GLuint, GLboolean, GLuint, GLint, GLuint *); GLAPI void APIENTRY glBeginPerfMonitorAMD (GLuint); GLAPI void APIENTRY glEndPerfMonitorAMD (GLuint); GLAPI void APIENTRY glGetPerfMonitorCounterDataAMD (GLuint, GLenum, GLsizei, GLuint *, GLint *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLGETPERFMONITORGROUPSAMDPROC) (GLint *numGroups, GLsizei groupsSize, GLuint *groups); typedef void (APIENTRYP PFNGLGETPERFMONITORCOUNTERSAMDPROC) (GLuint group, GLint *numCounters, GLint *maxActiveCounters, GLsizei counterSize, GLuint *counters); typedef void (APIENTRYP PFNGLGETPERFMONITORGROUPSTRINGAMDPROC) (GLuint group, GLsizei bufSize, GLsizei *length, GLchar *groupString); typedef void (APIENTRYP PFNGLGETPERFMONITORCOUNTERSTRINGAMDPROC) (GLuint group, GLuint counter, GLsizei bufSize, GLsizei *length, GLchar *counterString); typedef void (APIENTRYP PFNGLGETPERFMONITORCOUNTERINFOAMDPROC) (GLuint group, GLuint counter, GLenum pname, void *data); typedef void (APIENTRYP PFNGLGENPERFMONITORSAMDPROC) (GLsizei n, GLuint *monitors); typedef void (APIENTRYP PFNGLDELETEPERFMONITORSAMDPROC) (GLsizei n, GLuint *monitors); typedef void (APIENTRYP PFNGLSELECTPERFMONITORCOUNTERSAMDPROC) (GLuint monitor, GLboolean enable, GLuint group, GLint numCounters, GLuint *counterList); typedef void (APIENTRYP PFNGLBEGINPERFMONITORAMDPROC) (GLuint monitor); typedef void (APIENTRYP PFNGLENDPERFMONITORAMDPROC) (GLuint monitor); typedef void 
(APIENTRYP PFNGLGETPERFMONITORCOUNTERDATAAMDPROC) (GLuint monitor, GLenum pname, GLsizei dataSize, GLuint *data, GLint *bytesWritten); #endif #ifndef GL_AMD_texture_texture4 #define GL_AMD_texture_texture4 1 #endif #ifndef GL_AMD_vertex_shader_tesselator #define GL_AMD_vertex_shader_tesselator 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glTessellationFactorAMD (GLfloat); GLAPI void APIENTRY glTessellationModeAMD (GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLTESSELLATIONFACTORAMDPROC) (GLfloat factor); typedef void (APIENTRYP PFNGLTESSELLATIONMODEAMDPROC) (GLenum mode); #endif #ifndef GL_EXT_provoking_vertex #define GL_EXT_provoking_vertex 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glProvokingVertexEXT (GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLPROVOKINGVERTEXEXTPROC) (GLenum mode); #endif #ifndef GL_EXT_texture_snorm #define GL_EXT_texture_snorm 1 #endif #ifndef GL_AMD_draw_buffers_blend #define GL_AMD_draw_buffers_blend 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glBlendFuncIndexedAMD (GLuint, GLenum, GLenum); GLAPI void APIENTRY glBlendFuncSeparateIndexedAMD (GLuint, GLenum, GLenum, GLenum, GLenum); GLAPI void APIENTRY glBlendEquationIndexedAMD (GLuint, GLenum); GLAPI void APIENTRY glBlendEquationSeparateIndexedAMD (GLuint, GLenum, GLenum); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLBLENDFUNCINDEXEDAMDPROC) (GLuint buf, GLenum src, GLenum dst); typedef void (APIENTRYP PFNGLBLENDFUNCSEPARATEINDEXEDAMDPROC) (GLuint buf, GLenum srcRGB, GLenum dstRGB, GLenum srcAlpha, GLenum dstAlpha); typedef void (APIENTRYP PFNGLBLENDEQUATIONINDEXEDAMDPROC) (GLuint buf, GLenum mode); typedef void (APIENTRYP PFNGLBLENDEQUATIONSEPARATEINDEXEDAMDPROC) (GLuint buf, GLenum modeRGB, GLenum modeAlpha); #endif #ifndef GL_APPLE_texture_range #define GL_APPLE_texture_range 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glTextureRangeAPPLE (GLenum, GLsizei, const GLvoid *); GLAPI void APIENTRY glGetTexParameterPointervAPPLE (GLenum, GLenum, GLvoid* *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLTEXTURERANGEAPPLEPROC) (GLenum target, GLsizei length, const GLvoid *pointer); typedef void (APIENTRYP PFNGLGETTEXPARAMETERPOINTERVAPPLEPROC) (GLenum target, GLenum pname, GLvoid* *params); #endif #ifndef GL_APPLE_float_pixels #define GL_APPLE_float_pixels 1 #endif #ifndef GL_APPLE_vertex_program_evaluators #define GL_APPLE_vertex_program_evaluators 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI void APIENTRY glEnableVertexAttribAPPLE (GLuint, GLenum); GLAPI void APIENTRY glDisableVertexAttribAPPLE (GLuint, GLenum); GLAPI GLboolean APIENTRY glIsVertexAttribEnabledAPPLE (GLuint, GLenum); GLAPI void APIENTRY glMapVertexAttrib1dAPPLE (GLuint, GLuint, GLdouble, GLdouble, GLint, GLint, const GLdouble *); GLAPI void APIENTRY glMapVertexAttrib1fAPPLE (GLuint, GLuint, GLfloat, GLfloat, GLint, GLint, const GLfloat *); GLAPI void APIENTRY glMapVertexAttrib2dAPPLE (GLuint, GLuint, GLdouble, GLdouble, GLint, GLint, GLdouble, GLdouble, GLint, GLint, const GLdouble *); GLAPI void APIENTRY glMapVertexAttrib2fAPPLE (GLuint, GLuint, GLfloat, GLfloat, GLint, GLint, GLfloat, GLfloat, GLint, GLint, const GLfloat *); #endif /* GL_GLEXT_PROTOTYPES */ typedef void (APIENTRYP PFNGLENABLEVERTEXATTRIBAPPLEPROC) (GLuint index, GLenum pname); typedef void (APIENTRYP PFNGLDISABLEVERTEXATTRIBAPPLEPROC) (GLuint index, GLenum pname); typedef GLboolean (APIENTRYP PFNGLISVERTEXATTRIBENABLEDAPPLEPROC) (GLuint index, GLenum pname); typedef void (APIENTRYP 
PFNGLMAPVERTEXATTRIB1DAPPLEPROC) (GLuint index, GLuint size, GLdouble u1, GLdouble u2, GLint stride, GLint order, const GLdouble *points); typedef void (APIENTRYP PFNGLMAPVERTEXATTRIB1FAPPLEPROC) (GLuint index, GLuint size, GLfloat u1, GLfloat u2, GLint stride, GLint order, const GLfloat *points); typedef void (APIENTRYP PFNGLMAPVERTEXATTRIB2DAPPLEPROC) (GLuint index, GLuint size, GLdouble u1, GLdouble u2, GLint ustride, GLint uorder, GLdouble v1, GLdouble v2, GLint vstride, GLint vorder, const GLdouble *points); typedef void (APIENTRYP PFNGLMAPVERTEXATTRIB2FAPPLEPROC) (GLuint index, GLuint size, GLfloat u1, GLfloat u2, GLint ustride, GLint uorder, GLfloat v1, GLfloat v2, GLint vstride, GLint vorder, const GLfloat *points); #endif #ifndef GL_APPLE_aux_depth_stencil #define GL_APPLE_aux_depth_stencil 1 #endif #ifndef GL_APPLE_object_purgeable #define GL_APPLE_object_purgeable 1 #ifdef GL_GLEXT_PROTOTYPES GLAPI GLenum APIENTRY glObjectPurgeableAPPLE (GLenum, GLuint, GLenum); GLAPI GLenum APIENTRY glObjectUnpurgeableAPPLE (GLenum, GLuint, GLenum); GLAPI void APIENTRY glGetObjectParameterivAPPLE (GLenum, GLuint, GLenum, GLint *); #endif /* GL_GLEXT_PROTOTYPES */ typedef GLenum (APIENTRYP PFNGLOBJECTPURGEABLEAPPLEPROC) (GLenum objectType, GLuint name, GLenum option); typedef GLenum (APIENTRYP PFNGLOBJECTUNPURGEABLEAPPLEPROC) (GLenum objectType, GLuint name, GLenum option); typedef void (APIENTRYP PFNGLGETOBJECTPARAMETERIVAPPLEPROC) (GLenum objectType, GLuint name, GLenum pname, GLint *params); #endif #ifndef GL_APPLE_row_bytes #define GL_APPLE_row_bytes 1 #endif #ifdef __cplusplus } #endif #endif clr-rocm-5.7.1/opencl/khronos/headers/KHR/000077500000000000000000000000001450307266000202475ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/KHR/khrplatform.h000066400000000000000000000234461450307266000227620ustar00rootroot00000000000000#ifndef __khrplatform_h_ #define __khrplatform_h_ /* ** Copyright (c) 2008-2009 The Khronos Group Inc. ** ** Permission is hereby granted, free of charge, to any person obtaining a ** copy of this software and/or associated documentation files (the ** "Materials"), to deal in the Materials without restriction, including ** without limitation the rights to use, copy, modify, merge, publish, ** distribute, sublicense, and/or sell copies of the Materials, and to ** permit persons to whom the Materials are furnished to do so, subject to ** the following conditions: ** ** The above copyright notice and this permission notice shall be included ** in all copies or substantial portions of the Materials. ** ** THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, ** EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF ** MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. ** IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY ** CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, ** TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE ** MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. */ /* Khronos platform-specific types and definitions. * * $Revision: 23298 $ on $Date: 2013-09-30 17:07:13 -0700 (Mon, 30 Sep 2013) $ * * Adopters may modify this file to suit their platform. Adopters are * encouraged to submit platform specific modifications to the Khronos * group so that they can be included in future versions of this file. 
* Please submit changes by sending them to the public Khronos Bugzilla * (http://khronos.org/bugzilla) by filing a bug against product * "Khronos (general)" component "Registry". * * A predefined template which fills in some of the bug fields can be * reached using http://tinyurl.com/khrplatform-h-bugreport, but you * must create a Bugzilla login first. * * * See the Implementer's Guidelines for information about where this file * should be located on your system and for more details of its use: * http://www.khronos.org/registry/implementers_guide.pdf * * This file should be included as * #include <KHR/khrplatform.h> * by Khronos client API header files that use its types and defines. * * The types in khrplatform.h should only be used to define API-specific types. * * Types defined in khrplatform.h: * khronos_int8_t signed 8 bit * khronos_uint8_t unsigned 8 bit * khronos_int16_t signed 16 bit * khronos_uint16_t unsigned 16 bit * khronos_int32_t signed 32 bit * khronos_uint32_t unsigned 32 bit * khronos_int64_t signed 64 bit * khronos_uint64_t unsigned 64 bit * khronos_intptr_t signed same number of bits as a pointer * khronos_uintptr_t unsigned same number of bits as a pointer * khronos_ssize_t signed size * khronos_usize_t unsigned size * khronos_float_t signed 32 bit floating point * khronos_time_ns_t unsigned 64 bit time in nanoseconds * khronos_utime_nanoseconds_t unsigned time interval or absolute time in * nanoseconds * khronos_stime_nanoseconds_t signed time interval in nanoseconds * khronos_boolean_enum_t enumerated boolean type. This should * only be used as a base type when a client API's boolean type is * an enum. Client APIs which use an integer or other type for * booleans cannot use this as the base type for their boolean. * * Tokens defined in khrplatform.h: * * KHRONOS_FALSE, KHRONOS_TRUE Enumerated boolean false/true values. * * KHRONOS_SUPPORT_INT64 is 1 if 64 bit integers are supported; otherwise 0. * KHRONOS_SUPPORT_FLOAT is 1 if floats are supported; otherwise 0. * * Calling convention macros defined in this file: * KHRONOS_APICALL * KHRONOS_APIENTRY * KHRONOS_APIATTRIBUTES * * These may be used in function prototypes as: * * KHRONOS_APICALL void KHRONOS_APIENTRY funcname( * int arg1, * int arg2) KHRONOS_APIATTRIBUTES; */ /*------------------------------------------------------------------------- * Definition of KHRONOS_APICALL *------------------------------------------------------------------------- * This precedes the return type of the function in the function prototype. */ #if defined(_WIN32) && !defined(__SCITECH_SNAP__) # define KHRONOS_APICALL __declspec(dllimport) #elif defined (__SYMBIAN32__) # define KHRONOS_APICALL IMPORT_C #else # define KHRONOS_APICALL #endif /*------------------------------------------------------------------------- * Definition of KHRONOS_APIENTRY *------------------------------------------------------------------------- * This follows the return type of the function and precedes the function * name in the function prototype. */ #if defined(_WIN32) && !defined(_WIN32_WCE) && !defined(__SCITECH_SNAP__) /* Win32 but not WinCE */ # define KHRONOS_APIENTRY __stdcall #else # define KHRONOS_APIENTRY #endif /*------------------------------------------------------------------------- * Definition of KHRONOS_APIATTRIBUTES *------------------------------------------------------------------------- * This follows the closing parenthesis of the function prototype arguments. 
*/ #if defined (__ARMCC_2__) #define KHRONOS_APIATTRIBUTES __softfp #else #define KHRONOS_APIATTRIBUTES #endif /*------------------------------------------------------------------------- * basic type definitions *-----------------------------------------------------------------------*/ #if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L) || defined(__GNUC__) || defined(__SCO__) || defined(__USLC__) /* * Using <stdint.h> */ #include <stdint.h> typedef int32_t khronos_int32_t; typedef uint32_t khronos_uint32_t; typedef int64_t khronos_int64_t; typedef uint64_t khronos_uint64_t; #define KHRONOS_SUPPORT_INT64 1 #define KHRONOS_SUPPORT_FLOAT 1 #elif defined(__VMS ) || defined(__sgi) /* * Using <inttypes.h> */ #include <inttypes.h> typedef int32_t khronos_int32_t; typedef uint32_t khronos_uint32_t; typedef int64_t khronos_int64_t; typedef uint64_t khronos_uint64_t; #define KHRONOS_SUPPORT_INT64 1 #define KHRONOS_SUPPORT_FLOAT 1 #elif defined(_WIN32) && !defined(__SCITECH_SNAP__) /* * Win32 */ typedef __int32 khronos_int32_t; typedef unsigned __int32 khronos_uint32_t; typedef __int64 khronos_int64_t; typedef unsigned __int64 khronos_uint64_t; #define KHRONOS_SUPPORT_INT64 1 #define KHRONOS_SUPPORT_FLOAT 1 #elif defined(__sun__) || defined(__digital__) /* * Sun or Digital */ typedef int khronos_int32_t; typedef unsigned int khronos_uint32_t; #if defined(__arch64__) || defined(_LP64) typedef long int khronos_int64_t; typedef unsigned long int khronos_uint64_t; #else typedef long long int khronos_int64_t; typedef unsigned long long int khronos_uint64_t; #endif /* __arch64__ */ #define KHRONOS_SUPPORT_INT64 1 #define KHRONOS_SUPPORT_FLOAT 1 #elif 0 /* * Hypothetical platform with no float or int64 support */ typedef int khronos_int32_t; typedef unsigned int khronos_uint32_t; #define KHRONOS_SUPPORT_INT64 0 #define KHRONOS_SUPPORT_FLOAT 0 #else /* * Generic fallback */ #include <stdint.h> typedef int32_t khronos_int32_t; typedef uint32_t khronos_uint32_t; typedef int64_t khronos_int64_t; typedef uint64_t khronos_uint64_t; #define KHRONOS_SUPPORT_INT64 1 #define KHRONOS_SUPPORT_FLOAT 1 #endif /* * Types that are (so far) the same on all platforms */ typedef signed char khronos_int8_t; typedef unsigned char khronos_uint8_t; typedef signed short int khronos_int16_t; typedef unsigned short int khronos_uint16_t; /* * Types that differ between LLP64 and LP64 architectures - in LLP64, * pointers are 64 bits, but 'long' is still 32 bits. Win64 appears * to be the only LLP64 architecture in current use. */ #ifdef _WIN64 typedef signed long long int khronos_intptr_t; typedef unsigned long long int khronos_uintptr_t; typedef signed long long int khronos_ssize_t; typedef unsigned long long int khronos_usize_t; #else typedef signed long int khronos_intptr_t; typedef unsigned long int khronos_uintptr_t; typedef signed long int khronos_ssize_t; typedef unsigned long int khronos_usize_t; #endif #if KHRONOS_SUPPORT_FLOAT /* * Float type */ typedef float khronos_float_t; #endif #if KHRONOS_SUPPORT_INT64 /* Time types * * These types can be used to represent a time interval in nanoseconds or * an absolute Unadjusted System Time. Unadjusted System Time is the number * of nanoseconds since some arbitrary system event (e.g. since the last * time the system booted). The Unadjusted System Time is an unsigned * 64 bit value that wraps back to 0 every 584 years. Time intervals * may be either signed or unsigned. 
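 *
 * As a quick sanity check of the 584-year figure above (a
 * back-of-the-envelope calculation added for clarity; it is not part
 * of the normative Khronos text):
 *
 *   2^64 ns / (10^9 ns/s * 86400 s/day * 365.25 day/yr) ~ 584.5 years
 *
 * i.e. the wrap interval is simply the full range of an unsigned
 * 64 bit nanosecond counter.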
*/ typedef khronos_uint64_t khronos_utime_nanoseconds_t; typedef khronos_int64_t khronos_stime_nanoseconds_t; #endif /* * Dummy value used to pad enum types to 32 bits. */ #ifndef KHRONOS_MAX_ENUM #define KHRONOS_MAX_ENUM 0x7FFFFFFF #endif /* * Enumerated boolean type * * Values other than zero should be considered to be true. Therefore * comparisons should not be made against KHRONOS_TRUE. */ typedef enum { KHRONOS_FALSE = 0, KHRONOS_TRUE = 1, KHRONOS_BOOLEAN_ENUM_FORCE_SIZE = KHRONOS_MAX_ENUM } khronos_boolean_enum_t; #endif /* __khrplatform_h_ */ clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/000077500000000000000000000000001450307266000213245ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/000077500000000000000000000000001450307266000216225ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/cl.h000066400000000000000000001705441450307266000224040ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ #ifndef __OPENCL_CL_H #define __OPENCL_CL_H #ifdef __APPLE__ #include <OpenCL/cl_platform.h> #else #include <CL/cl_platform.h> #endif #ifdef __cplusplus extern "C" { #endif /******************************************************************************/ typedef struct _cl_platform_id * cl_platform_id; typedef struct _cl_device_id * cl_device_id; typedef struct _cl_context * cl_context; typedef struct _cl_command_queue * cl_command_queue; typedef struct _cl_mem * cl_mem; typedef struct _cl_program * cl_program; typedef struct _cl_kernel * cl_kernel; typedef struct _cl_event * cl_event; typedef struct _cl_sampler * cl_sampler; typedef cl_uint cl_bool; /* WARNING! Unlike cl_ types in cl_platform.h, cl_bool is not guaranteed to be the same size as the bool in kernels. 
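   For example (an illustrative sketch only; queue, buf, size and host_ptr
   are hypothetical handles, not declared in this header), host code should
   traffic in cl_bool via the CL_TRUE/CL_FALSE tokens and must not assume it
   matches the size of a kernel-side bool:

     cl_bool blocking = CL_TRUE;   -- a 32-bit cl_uint, not a C/C++ bool
     cl_int err = clEnqueueReadBuffer(queue, buf, blocking, 0, size,
                                      host_ptr, 0, NULL, NULL);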
*/ typedef cl_ulong cl_bitfield; typedef cl_bitfield cl_device_type; typedef cl_uint cl_platform_info; typedef cl_uint cl_device_info; typedef cl_bitfield cl_device_fp_config; typedef cl_uint cl_device_mem_cache_type; typedef cl_uint cl_device_local_mem_type; typedef cl_bitfield cl_device_exec_capabilities; typedef cl_bitfield cl_command_queue_properties; typedef intptr_t cl_device_partition_property; typedef cl_bitfield cl_device_affinity_domain; typedef intptr_t cl_context_properties; typedef cl_uint cl_context_info; typedef cl_uint cl_command_queue_info; typedef cl_uint cl_channel_order; typedef cl_uint cl_channel_type; typedef cl_bitfield cl_mem_flags; typedef cl_uint cl_mem_object_type; typedef cl_uint cl_mem_info; typedef cl_bitfield cl_mem_migration_flags; typedef cl_uint cl_image_info; typedef cl_uint cl_buffer_create_type; typedef cl_uint cl_addressing_mode; typedef cl_uint cl_filter_mode; typedef cl_uint cl_sampler_info; typedef cl_bitfield cl_map_flags; typedef cl_uint cl_program_info; typedef cl_uint cl_program_build_info; typedef cl_uint cl_program_binary_type; typedef cl_int cl_build_status; typedef cl_uint cl_kernel_info; typedef cl_uint cl_kernel_arg_info; typedef cl_uint cl_kernel_arg_address_qualifier; typedef cl_uint cl_kernel_arg_access_qualifier; typedef cl_bitfield cl_kernel_arg_type_qualifier; typedef cl_uint cl_kernel_work_group_info; typedef cl_uint cl_event_info; typedef cl_uint cl_command_type; typedef cl_uint cl_profiling_info; typedef struct _cl_image_format { cl_channel_order image_channel_order; cl_channel_type image_channel_data_type; } cl_image_format; typedef struct _cl_image_desc { cl_mem_object_type image_type; size_t image_width; size_t image_height; size_t image_depth; size_t image_array_size; size_t image_row_pitch; size_t image_slice_pitch; cl_uint num_mip_levels; cl_uint num_samples; cl_mem buffer; } cl_image_desc; typedef struct _cl_buffer_region { size_t origin; size_t size; } cl_buffer_region; /******************************************************************************/ /* Error Codes */ #define CL_SUCCESS 0 #define CL_DEVICE_NOT_FOUND -1 #define CL_DEVICE_NOT_AVAILABLE -2 #define CL_COMPILER_NOT_AVAILABLE -3 #define CL_MEM_OBJECT_ALLOCATION_FAILURE -4 #define CL_OUT_OF_RESOURCES -5 #define CL_OUT_OF_HOST_MEMORY -6 #define CL_PROFILING_INFO_NOT_AVAILABLE -7 #define CL_MEM_COPY_OVERLAP -8 #define CL_IMAGE_FORMAT_MISMATCH -9 #define CL_IMAGE_FORMAT_NOT_SUPPORTED -10 #define CL_BUILD_PROGRAM_FAILURE -11 #define CL_MAP_FAILURE -12 #define CL_MISALIGNED_SUB_BUFFER_OFFSET -13 #define CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST -14 #define CL_COMPILE_PROGRAM_FAILURE -15 #define CL_LINKER_NOT_AVAILABLE -16 #define CL_LINK_PROGRAM_FAILURE -17 #define CL_DEVICE_PARTITION_FAILED -18 #define CL_KERNEL_ARG_INFO_NOT_AVAILABLE -19 #define CL_INVALID_VALUE -30 #define CL_INVALID_DEVICE_TYPE -31 #define CL_INVALID_PLATFORM -32 #define CL_INVALID_DEVICE -33 #define CL_INVALID_CONTEXT -34 #define CL_INVALID_QUEUE_PROPERTIES -35 #define CL_INVALID_COMMAND_QUEUE -36 #define CL_INVALID_HOST_PTR -37 #define CL_INVALID_MEM_OBJECT -38 #define CL_INVALID_IMAGE_FORMAT_DESCRIPTOR -39 #define CL_INVALID_IMAGE_SIZE -40 #define CL_INVALID_SAMPLER -41 #define CL_INVALID_BINARY -42 #define CL_INVALID_BUILD_OPTIONS -43 #define CL_INVALID_PROGRAM -44 #define CL_INVALID_PROGRAM_EXECUTABLE -45 #define CL_INVALID_KERNEL_NAME -46 #define CL_INVALID_KERNEL_DEFINITION -47 #define CL_INVALID_KERNEL -48 #define CL_INVALID_ARG_INDEX -49 #define CL_INVALID_ARG_VALUE -50 #define 
CL_INVALID_ARG_SIZE -51 #define CL_INVALID_KERNEL_ARGS -52 #define CL_INVALID_WORK_DIMENSION -53 #define CL_INVALID_WORK_GROUP_SIZE -54 #define CL_INVALID_WORK_ITEM_SIZE -55 #define CL_INVALID_GLOBAL_OFFSET -56 #define CL_INVALID_EVENT_WAIT_LIST -57 #define CL_INVALID_EVENT -58 #define CL_INVALID_OPERATION -59 #define CL_INVALID_GL_OBJECT -60 #define CL_INVALID_BUFFER_SIZE -61 #define CL_INVALID_MIP_LEVEL -62 #define CL_INVALID_GLOBAL_WORK_SIZE -63 #define CL_INVALID_PROPERTY -64 #define CL_INVALID_IMAGE_DESCRIPTOR -65 #define CL_INVALID_COMPILER_OPTIONS -66 #define CL_INVALID_LINKER_OPTIONS -67 #define CL_INVALID_DEVICE_PARTITION_COUNT -68 /* OpenCL Version */ #define CL_VERSION_1_0 1 #define CL_VERSION_1_1 1 #define CL_VERSION_1_2 1 /* cl_bool */ #define CL_FALSE 0 #define CL_TRUE 1 #define CL_BLOCKING CL_TRUE #define CL_NON_BLOCKING CL_FALSE /* cl_platform_info */ #define CL_PLATFORM_PROFILE 0x0900 #define CL_PLATFORM_VERSION 0x0901 #define CL_PLATFORM_NAME 0x0902 #define CL_PLATFORM_VENDOR 0x0903 #define CL_PLATFORM_EXTENSIONS 0x0904 /* cl_device_type - bitfield */ #define CL_DEVICE_TYPE_DEFAULT (1 << 0) #define CL_DEVICE_TYPE_CPU (1 << 1) #define CL_DEVICE_TYPE_GPU (1 << 2) #define CL_DEVICE_TYPE_ACCELERATOR (1 << 3) #define CL_DEVICE_TYPE_CUSTOM (1 << 4) #define CL_DEVICE_TYPE_ALL 0xFFFFFFFF /* cl_device_info */ #define CL_DEVICE_TYPE 0x1000 #define CL_DEVICE_VENDOR_ID 0x1001 #define CL_DEVICE_MAX_COMPUTE_UNITS 0x1002 #define CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS 0x1003 #define CL_DEVICE_MAX_WORK_GROUP_SIZE 0x1004 #define CL_DEVICE_MAX_WORK_ITEM_SIZES 0x1005 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR 0x1006 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT 0x1007 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT 0x1008 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG 0x1009 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT 0x100A #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 0x100B #define CL_DEVICE_MAX_CLOCK_FREQUENCY 0x100C #define CL_DEVICE_ADDRESS_BITS 0x100D #define CL_DEVICE_MAX_READ_IMAGE_ARGS 0x100E #define CL_DEVICE_MAX_WRITE_IMAGE_ARGS 0x100F #define CL_DEVICE_MAX_MEM_ALLOC_SIZE 0x1010 #define CL_DEVICE_IMAGE2D_MAX_WIDTH 0x1011 #define CL_DEVICE_IMAGE2D_MAX_HEIGHT 0x1012 #define CL_DEVICE_IMAGE3D_MAX_WIDTH 0x1013 #define CL_DEVICE_IMAGE3D_MAX_HEIGHT 0x1014 #define CL_DEVICE_IMAGE3D_MAX_DEPTH 0x1015 #define CL_DEVICE_IMAGE_SUPPORT 0x1016 #define CL_DEVICE_MAX_PARAMETER_SIZE 0x1017 #define CL_DEVICE_MAX_SAMPLERS 0x1018 #define CL_DEVICE_MEM_BASE_ADDR_ALIGN 0x1019 #define CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE 0x101A #define CL_DEVICE_SINGLE_FP_CONFIG 0x101B #define CL_DEVICE_GLOBAL_MEM_CACHE_TYPE 0x101C #define CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE 0x101D #define CL_DEVICE_GLOBAL_MEM_CACHE_SIZE 0x101E #define CL_DEVICE_GLOBAL_MEM_SIZE 0x101F #define CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE 0x1020 #define CL_DEVICE_MAX_CONSTANT_ARGS 0x1021 #define CL_DEVICE_LOCAL_MEM_TYPE 0x1022 #define CL_DEVICE_LOCAL_MEM_SIZE 0x1023 #define CL_DEVICE_ERROR_CORRECTION_SUPPORT 0x1024 #define CL_DEVICE_PROFILING_TIMER_RESOLUTION 0x1025 #define CL_DEVICE_ENDIAN_LITTLE 0x1026 #define CL_DEVICE_AVAILABLE 0x1027 #define CL_DEVICE_COMPILER_AVAILABLE 0x1028 #define CL_DEVICE_EXECUTION_CAPABILITIES 0x1029 #define CL_DEVICE_QUEUE_PROPERTIES 0x102A #define CL_DEVICE_NAME 0x102B #define CL_DEVICE_VENDOR 0x102C #define CL_DRIVER_VERSION 0x102D #define CL_DEVICE_PROFILE 0x102E #define CL_DEVICE_VERSION 0x102F #define CL_DEVICE_EXTENSIONS 0x1030 #define CL_DEVICE_PLATFORM 0x1031 #define CL_DEVICE_DOUBLE_FP_CONFIG 0x1032 #define 
CL_DEVICE_HALF_FP_CONFIG 0x1033 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF 0x1034 #define CL_DEVICE_HOST_UNIFIED_MEMORY 0x1035 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR 0x1036 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT 0x1037 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_INT 0x1038 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG 0x1039 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT 0x103A #define CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE 0x103B #define CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF 0x103C #define CL_DEVICE_OPENCL_C_VERSION 0x103D #define CL_DEVICE_LINKER_AVAILABLE 0x103E #define CL_DEVICE_BUILT_IN_KERNELS 0x103F #define CL_DEVICE_IMAGE_MAX_BUFFER_SIZE 0x1040 #define CL_DEVICE_IMAGE_MAX_ARRAY_SIZE 0x1041 #define CL_DEVICE_PARENT_DEVICE 0x1042 #define CL_DEVICE_PARTITION_MAX_SUB_DEVICES 0x1043 #define CL_DEVICE_PARTITION_PROPERTIES 0x1044 #define CL_DEVICE_PARTITION_AFFINITY_DOMAIN 0x1045 #define CL_DEVICE_PARTITION_TYPE 0x1046 #define CL_DEVICE_REFERENCE_COUNT 0x1047 #define CL_DEVICE_PREFERRED_INTEROP_USER_SYNC 0x1048 #define CL_DEVICE_PRINTF_BUFFER_SIZE 0x1049 #define CL_DEVICE_IMAGE_PITCH_ALIGNMENT 0x104A #define CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT 0x104B /* cl_device_fp_config - bitfield */ #define CL_FP_DENORM (1 << 0) #define CL_FP_INF_NAN (1 << 1) #define CL_FP_ROUND_TO_NEAREST (1 << 2) #define CL_FP_ROUND_TO_ZERO (1 << 3) #define CL_FP_ROUND_TO_INF (1 << 4) #define CL_FP_FMA (1 << 5) #define CL_FP_SOFT_FLOAT (1 << 6) #define CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT (1 << 7) /* cl_device_mem_cache_type */ #define CL_NONE 0x0 #define CL_READ_ONLY_CACHE 0x1 #define CL_READ_WRITE_CACHE 0x2 /* cl_device_local_mem_type */ #define CL_LOCAL 0x1 #define CL_GLOBAL 0x2 /* cl_device_exec_capabilities - bitfield */ #define CL_EXEC_KERNEL (1 << 0) #define CL_EXEC_NATIVE_KERNEL (1 << 1) /* cl_command_queue_properties - bitfield */ #define CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE (1 << 0) #define CL_QUEUE_PROFILING_ENABLE (1 << 1) /* cl_context_info */ #define CL_CONTEXT_REFERENCE_COUNT 0x1080 #define CL_CONTEXT_DEVICES 0x1081 #define CL_CONTEXT_PROPERTIES 0x1082 #define CL_CONTEXT_NUM_DEVICES 0x1083 /* cl_context_properties */ #define CL_CONTEXT_PLATFORM 0x1084 #define CL_CONTEXT_INTEROP_USER_SYNC 0x1085 /* cl_device_partition_property */ #define CL_DEVICE_PARTITION_EQUALLY 0x1086 #define CL_DEVICE_PARTITION_BY_COUNTS 0x1087 #define CL_DEVICE_PARTITION_BY_COUNTS_LIST_END 0x0 #define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN 0x1088 /* cl_device_affinity_domain */ #define CL_DEVICE_AFFINITY_DOMAIN_NUMA (1 << 0) #define CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE (1 << 1) #define CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE (1 << 2) #define CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE (1 << 3) #define CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE (1 << 4) #define CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE (1 << 5) /* cl_command_queue_info */ #define CL_QUEUE_CONTEXT 0x1090 #define CL_QUEUE_DEVICE 0x1091 #define CL_QUEUE_REFERENCE_COUNT 0x1092 #define CL_QUEUE_PROPERTIES 0x1093 /* cl_mem_flags - bitfield */ #define CL_MEM_READ_WRITE (1 << 0) #define CL_MEM_WRITE_ONLY (1 << 1) #define CL_MEM_READ_ONLY (1 << 2) #define CL_MEM_USE_HOST_PTR (1 << 3) #define CL_MEM_ALLOC_HOST_PTR (1 << 4) #define CL_MEM_COPY_HOST_PTR (1 << 5) /* reserved (1 << 6) */ #define CL_MEM_HOST_WRITE_ONLY (1 << 7) #define CL_MEM_HOST_READ_ONLY (1 << 8) #define CL_MEM_HOST_NO_ACCESS (1 << 9) /* cl_mem_migration_flags - bitfield */ #define CL_MIGRATE_MEM_OBJECT_HOST (1 << 0) #define CL_MIGRATE_MEM_OBJECT_CONTENT_UNDEFINED (1 << 1) /* cl_channel_order */ #define CL_R 0x10B0 #define CL_A 
0x10B1 #define CL_RG 0x10B2 #define CL_RA 0x10B3 #define CL_RGB 0x10B4 #define CL_RGBA 0x10B5 #define CL_BGRA 0x10B6 #define CL_ARGB 0x10B7 #define CL_INTENSITY 0x10B8 #define CL_LUMINANCE 0x10B9 #define CL_Rx 0x10BA #define CL_RGx 0x10BB #define CL_RGBx 0x10BC #define CL_DEPTH 0x10BD #define CL_DEPTH_STENCIL 0x10BE /* cl_channel_type */ #define CL_SNORM_INT8 0x10D0 #define CL_SNORM_INT16 0x10D1 #define CL_UNORM_INT8 0x10D2 #define CL_UNORM_INT16 0x10D3 #define CL_UNORM_SHORT_565 0x10D4 #define CL_UNORM_SHORT_555 0x10D5 #define CL_UNORM_INT_101010 0x10D6 #define CL_SIGNED_INT8 0x10D7 #define CL_SIGNED_INT16 0x10D8 #define CL_SIGNED_INT32 0x10D9 #define CL_UNSIGNED_INT8 0x10DA #define CL_UNSIGNED_INT16 0x10DB #define CL_UNSIGNED_INT32 0x10DC #define CL_HALF_FLOAT 0x10DD #define CL_FLOAT 0x10DE #define CL_UNORM_INT24 0x10DF /* cl_mem_object_type */ #define CL_MEM_OBJECT_BUFFER 0x10F0 #define CL_MEM_OBJECT_IMAGE2D 0x10F1 #define CL_MEM_OBJECT_IMAGE3D 0x10F2 #define CL_MEM_OBJECT_IMAGE2D_ARRAY 0x10F3 #define CL_MEM_OBJECT_IMAGE1D 0x10F4 #define CL_MEM_OBJECT_IMAGE1D_ARRAY 0x10F5 #define CL_MEM_OBJECT_IMAGE1D_BUFFER 0x10F6 /* cl_mem_info */ #define CL_MEM_TYPE 0x1100 #define CL_MEM_FLAGS 0x1101 #define CL_MEM_SIZE 0x1102 #define CL_MEM_HOST_PTR 0x1103 #define CL_MEM_MAP_COUNT 0x1104 #define CL_MEM_REFERENCE_COUNT 0x1105 #define CL_MEM_CONTEXT 0x1106 #define CL_MEM_ASSOCIATED_MEMOBJECT 0x1107 #define CL_MEM_OFFSET 0x1108 /* cl_image_info */ #define CL_IMAGE_FORMAT 0x1110 #define CL_IMAGE_ELEMENT_SIZE 0x1111 #define CL_IMAGE_ROW_PITCH 0x1112 #define CL_IMAGE_SLICE_PITCH 0x1113 #define CL_IMAGE_WIDTH 0x1114 #define CL_IMAGE_HEIGHT 0x1115 #define CL_IMAGE_DEPTH 0x1116 #define CL_IMAGE_ARRAY_SIZE 0x1117 #define CL_IMAGE_BUFFER 0x1118 #define CL_IMAGE_NUM_MIP_LEVELS 0x1119 #define CL_IMAGE_NUM_SAMPLES 0x111A /* cl_addressing_mode */ #define CL_ADDRESS_NONE 0x1130 #define CL_ADDRESS_CLAMP_TO_EDGE 0x1131 #define CL_ADDRESS_CLAMP 0x1132 #define CL_ADDRESS_REPEAT 0x1133 #define CL_ADDRESS_MIRRORED_REPEAT 0x1134 /* cl_filter_mode */ #define CL_FILTER_NEAREST 0x1140 #define CL_FILTER_LINEAR 0x1141 /* cl_sampler_info */ #define CL_SAMPLER_REFERENCE_COUNT 0x1150 #define CL_SAMPLER_CONTEXT 0x1151 #define CL_SAMPLER_NORMALIZED_COORDS 0x1152 #define CL_SAMPLER_ADDRESSING_MODE 0x1153 #define CL_SAMPLER_FILTER_MODE 0x1154 /* cl_map_flags - bitfield */ #define CL_MAP_READ (1 << 0) #define CL_MAP_WRITE (1 << 1) #define CL_MAP_WRITE_INVALIDATE_REGION (1 << 2) /* cl_program_info */ #define CL_PROGRAM_REFERENCE_COUNT 0x1160 #define CL_PROGRAM_CONTEXT 0x1161 #define CL_PROGRAM_NUM_DEVICES 0x1162 #define CL_PROGRAM_DEVICES 0x1163 #define CL_PROGRAM_SOURCE 0x1164 #define CL_PROGRAM_BINARY_SIZES 0x1165 #define CL_PROGRAM_BINARIES 0x1166 #define CL_PROGRAM_NUM_KERNELS 0x1167 #define CL_PROGRAM_KERNEL_NAMES 0x1168 /* cl_program_build_info */ #define CL_PROGRAM_BUILD_STATUS 0x1181 #define CL_PROGRAM_BUILD_OPTIONS 0x1182 #define CL_PROGRAM_BUILD_LOG 0x1183 #define CL_PROGRAM_BINARY_TYPE 0x1184 /* cl_program_binary_type */ #define CL_PROGRAM_BINARY_TYPE_NONE 0x0 #define CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT 0x1 #define CL_PROGRAM_BINARY_TYPE_LIBRARY 0x2 #define CL_PROGRAM_BINARY_TYPE_EXECUTABLE 0x4 /* cl_build_status */ #define CL_BUILD_SUCCESS 0 #define CL_BUILD_NONE -1 #define CL_BUILD_ERROR -2 #define CL_BUILD_IN_PROGRESS -3 /* cl_kernel_info */ #define CL_KERNEL_FUNCTION_NAME 0x1190 #define CL_KERNEL_NUM_ARGS 0x1191 #define CL_KERNEL_REFERENCE_COUNT 0x1192 #define CL_KERNEL_CONTEXT 0x1193 #define CL_KERNEL_PROGRAM 0x1194 
#define CL_KERNEL_ATTRIBUTES 0x1195 /* cl_kernel_arg_info */ #define CL_KERNEL_ARG_ADDRESS_QUALIFIER 0x1196 #define CL_KERNEL_ARG_ACCESS_QUALIFIER 0x1197 #define CL_KERNEL_ARG_TYPE_NAME 0x1198 #define CL_KERNEL_ARG_TYPE_QUALIFIER 0x1199 #define CL_KERNEL_ARG_NAME 0x119A /* cl_kernel_arg_address_qualifier */ #define CL_KERNEL_ARG_ADDRESS_GLOBAL 0x119B #define CL_KERNEL_ARG_ADDRESS_LOCAL 0x119C #define CL_KERNEL_ARG_ADDRESS_CONSTANT 0x119D #define CL_KERNEL_ARG_ADDRESS_PRIVATE 0x119E /* cl_kernel_arg_access_qualifier */ #define CL_KERNEL_ARG_ACCESS_READ_ONLY 0x11A0 #define CL_KERNEL_ARG_ACCESS_WRITE_ONLY 0x11A1 #define CL_KERNEL_ARG_ACCESS_READ_WRITE 0x11A2 #define CL_KERNEL_ARG_ACCESS_NONE 0x11A3 /* cl_kernel_arg_type_qualifier */ #define CL_KERNEL_ARG_TYPE_NONE 0 #define CL_KERNEL_ARG_TYPE_CONST (1 << 0) #define CL_KERNEL_ARG_TYPE_RESTRICT (1 << 1) #define CL_KERNEL_ARG_TYPE_VOLATILE (1 << 2) /* cl_kernel_work_group_info */ #define CL_KERNEL_WORK_GROUP_SIZE 0x11B0 #define CL_KERNEL_COMPILE_WORK_GROUP_SIZE 0x11B1 #define CL_KERNEL_LOCAL_MEM_SIZE 0x11B2 #define CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE 0x11B3 #define CL_KERNEL_PRIVATE_MEM_SIZE 0x11B4 #define CL_KERNEL_GLOBAL_WORK_SIZE 0x11B5 /* cl_event_info */ #define CL_EVENT_COMMAND_QUEUE 0x11D0 #define CL_EVENT_COMMAND_TYPE 0x11D1 #define CL_EVENT_REFERENCE_COUNT 0x11D2 #define CL_EVENT_COMMAND_EXECUTION_STATUS 0x11D3 #define CL_EVENT_CONTEXT 0x11D4 /* cl_command_type */ #define CL_COMMAND_NDRANGE_KERNEL 0x11F0 #define CL_COMMAND_TASK 0x11F1 #define CL_COMMAND_NATIVE_KERNEL 0x11F2 #define CL_COMMAND_READ_BUFFER 0x11F3 #define CL_COMMAND_WRITE_BUFFER 0x11F4 #define CL_COMMAND_COPY_BUFFER 0x11F5 #define CL_COMMAND_READ_IMAGE 0x11F6 #define CL_COMMAND_WRITE_IMAGE 0x11F7 #define CL_COMMAND_COPY_IMAGE 0x11F8 #define CL_COMMAND_COPY_IMAGE_TO_BUFFER 0x11F9 #define CL_COMMAND_COPY_BUFFER_TO_IMAGE 0x11FA #define CL_COMMAND_MAP_BUFFER 0x11FB #define CL_COMMAND_MAP_IMAGE 0x11FC #define CL_COMMAND_UNMAP_MEM_OBJECT 0x11FD #define CL_COMMAND_MARKER 0x11FE #define CL_COMMAND_ACQUIRE_GL_OBJECTS 0x11FF #define CL_COMMAND_RELEASE_GL_OBJECTS 0x1200 #define CL_COMMAND_READ_BUFFER_RECT 0x1201 #define CL_COMMAND_WRITE_BUFFER_RECT 0x1202 #define CL_COMMAND_COPY_BUFFER_RECT 0x1203 #define CL_COMMAND_USER 0x1204 #define CL_COMMAND_BARRIER 0x1205 #define CL_COMMAND_MIGRATE_MEM_OBJECTS 0x1206 #define CL_COMMAND_FILL_BUFFER 0x1207 #define CL_COMMAND_FILL_IMAGE 0x1208 /* command execution status */ #define CL_COMPLETE 0x0 #define CL_RUNNING 0x1 #define CL_SUBMITTED 0x2 #define CL_QUEUED 0x3 /* cl_buffer_create_type */ #define CL_BUFFER_CREATE_TYPE_REGION 0x1220 /* cl_profiling_info */ #define CL_PROFILING_COMMAND_QUEUED 0x1280 #define CL_PROFILING_COMMAND_SUBMIT 0x1281 #define CL_PROFILING_COMMAND_START 0x1282 #define CL_PROFILING_COMMAND_END 0x1283 /********************************************************************************************************/ /* Platform API */ extern CL_API_ENTRY cl_int CL_API_CALL clGetPlatformIDs(cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetPlatformInfo(cl_platform_id /* platform */, cl_platform_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Device APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDs(cl_platform_id /* platform */, cl_device_type /* device_type */, cl_uint /* num_entries */, cl_device_id * 
/* devices */, cl_uint * /* num_devices */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceInfo(cl_device_id /* device */, cl_device_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clCreateSubDevices(cl_device_id /* in_device */, const cl_device_partition_property * /* properties */, cl_uint /* num_devices */, cl_device_id * /* out_devices */, cl_uint * /* num_devices_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clRetainDevice(cl_device_id /* device */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseDevice(cl_device_id /* device */) CL_API_SUFFIX__VERSION_1_2; /* Context APIs */ extern CL_API_ENTRY cl_context CL_API_CALL clCreateContext(const cl_context_properties * /* properties */, cl_uint /* num_devices */, const cl_device_id * /* devices */, void (CL_CALLBACK * /* pfn_notify */)(const char *, const void *, size_t, void *), void * /* user_data */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_context CL_API_CALL clCreateContextFromType(const cl_context_properties * /* properties */, cl_device_type /* device_type */, void (CL_CALLBACK * /* pfn_notify*/ )(const char *, const void *, size_t, void *), void * /* user_data */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainContext(cl_context /* context */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseContext(cl_context /* context */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetContextInfo(cl_context /* context */, cl_context_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Command Queue APIs */ extern CL_API_ENTRY cl_command_queue CL_API_CALL clCreateCommandQueue(cl_context /* context */, cl_device_id /* device */, cl_command_queue_properties /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainCommandQueue(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseCommandQueue(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetCommandQueueInfo(cl_command_queue /* command_queue */, cl_command_queue_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Memory Object APIs */ extern CL_API_ENTRY cl_mem CL_API_CALL clCreateBuffer(cl_context /* context */, cl_mem_flags /* flags */, size_t /* size */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateSubBuffer(cl_mem /* buffer */, cl_mem_flags /* flags */, cl_buffer_create_type /* buffer_create_type */, const void * /* buffer_create_info */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateImage(cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format * /* image_format */, const cl_image_desc * /* image_desc */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clRetainMemObject(cl_mem /* memobj */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int 
CL_API_CALL clReleaseMemObject(cl_mem /* memobj */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetSupportedImageFormats(cl_context /* context */, cl_mem_flags /* flags */, cl_mem_object_type /* image_type */, cl_uint /* num_entries */, cl_image_format * /* image_formats */, cl_uint * /* num_image_formats */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetMemObjectInfo(cl_mem /* memobj */, cl_mem_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetImageInfo(cl_mem /* image */, cl_image_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetMemObjectDestructorCallback( cl_mem /* memobj */, void (CL_CALLBACK * /*pfn_notify*/)( cl_mem /* memobj */, void* /*user_data*/), void * /*user_data */ ) CL_API_SUFFIX__VERSION_1_1; /* Sampler APIs */ extern CL_API_ENTRY cl_sampler CL_API_CALL clCreateSampler(cl_context /* context */, cl_bool /* normalized_coords */, cl_addressing_mode /* addressing_mode */, cl_filter_mode /* filter_mode */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainSampler(cl_sampler /* sampler */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseSampler(cl_sampler /* sampler */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetSamplerInfo(cl_sampler /* sampler */, cl_sampler_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Program Object APIs */ extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithSource(cl_context /* context */, cl_uint /* count */, const char ** /* strings */, const size_t * /* lengths */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBinary(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const size_t * /* lengths */, const unsigned char ** /* binaries */, cl_int * /* binary_status */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBuiltInKernels(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* kernel_names */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clRetainProgram(cl_program /* program */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseProgram(cl_program /* program */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clBuildProgram(cl_program /* program */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, void (CL_CALLBACK * /* pfn_notify */)(cl_program /* program */, void * /* user_data */), void * /* user_data */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clCompileProgram(cl_program /* program */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, cl_uint /* num_input_headers */, const cl_program * /* input_headers */, const char ** /* header_include_names */, void (CL_CALLBACK * /* pfn_notify */)(cl_program /* program */, void * /* user_data */), void * /* user_data */) 
CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_program CL_API_CALL clLinkProgram(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, cl_uint /* num_input_programs */, const cl_program * /* input_programs */, void (CL_CALLBACK * /* pfn_notify */)(cl_program /* program */, void * /* user_data */), void * /* user_data */, cl_int * /* errcode_ret */ ) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clUnloadPlatformCompiler(cl_platform_id /* platform */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clGetProgramInfo(cl_program /* program */, cl_program_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetProgramBuildInfo(cl_program /* program */, cl_device_id /* device */, cl_program_build_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Kernel Object APIs */ extern CL_API_ENTRY cl_kernel CL_API_CALL clCreateKernel(cl_program /* program */, const char * /* kernel_name */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clCreateKernelsInProgram(cl_program /* program */, cl_uint /* num_kernels */, cl_kernel * /* kernels */, cl_uint * /* num_kernels_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainKernel(cl_kernel /* kernel */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseKernel(cl_kernel /* kernel */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelArg(cl_kernel /* kernel */, cl_uint /* arg_index */, size_t /* arg_size */, const void * /* arg_value */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelInfo(cl_kernel /* kernel */, cl_kernel_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelArgInfo(cl_kernel /* kernel */, cl_uint /* arg_indx */, cl_kernel_arg_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelWorkGroupInfo(cl_kernel /* kernel */, cl_device_id /* device */, cl_kernel_work_group_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Event Object APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clWaitForEvents(cl_uint /* num_events */, const cl_event * /* event_list */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetEventInfo(cl_event /* event */, cl_event_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_event CL_API_CALL clCreateUserEvent(cl_context /* context */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clRetainEvent(cl_event /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseEvent(cl_event /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetUserEventStatus(cl_event /* event */, cl_int /* execution_status */) CL_API_SUFFIX__VERSION_1_1; 
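/* A minimal user-event sketch (illustrative only; ctx, queue, buf, size and
 * ptr are hypothetical handles created elsewhere). A user event placed in a
 * command's wait list gates that command until the host marks the event
 * complete via clSetUserEventStatus:
 *
 *   cl_int err;
 *   cl_event gate = clCreateUserEvent(ctx, &err);
 *   clEnqueueReadBuffer(queue, buf, CL_FALSE, 0, size, ptr, 1, &gate, NULL);
 *   ...                                        host-side work
 *   clSetUserEventStatus(gate, CL_COMPLETE);   the read may now execute
 *   clReleaseEvent(gate);
 */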
extern CL_API_ENTRY cl_int CL_API_CALL clSetEventCallback( cl_event /* event */, cl_int /* command_exec_callback_type */, void (CL_CALLBACK * /* pfn_notify */)(cl_event, cl_int, void *), void * /* user_data */) CL_API_SUFFIX__VERSION_1_1; /* Profiling APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clGetEventProfilingInfo(cl_event /* event */, cl_profiling_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Flush and Finish APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clFlush(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clFinish(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; /* Enqueued Commands APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_read */, size_t /* offset */, size_t /* size */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBufferRect(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_read */, const size_t * /* buffer_offset */, const size_t * /* host_offset */, const size_t * /* region */, size_t /* buffer_row_pitch */, size_t /* buffer_slice_pitch */, size_t /* host_row_pitch */, size_t /* host_slice_pitch */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_write */, size_t /* offset */, size_t /* size */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBufferRect(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_write */, const size_t * /* buffer_offset */, const size_t * /* host_offset */, const size_t * /* region */, size_t /* buffer_row_pitch */, size_t /* buffer_slice_pitch */, size_t /* host_row_pitch */, size_t /* host_slice_pitch */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, const void * /* pattern */, size_t /* pattern_size */, size_t /* offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBuffer(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_buffer */, size_t /* src_offset */, size_t /* dst_offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferRect(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_buffer */, const size_t * /* src_origin */, const size_t * /* dst_origin */, const size_t * /* region */, size_t /* src_row_pitch */, size_t /* src_slice_pitch */, size_t /* dst_row_pitch */, size_t /* 
dst_slice_pitch */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_read */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t /* row_pitch */, size_t /* slice_pitch */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_write */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t /* input_row_pitch */, size_t /* input_slice_pitch */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillImage(cl_command_queue /* command_queue */, cl_mem /* image */, const void * /* fill_color */, const size_t * /* origin[3] */, const size_t * /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImage(cl_command_queue /* command_queue */, cl_mem /* src_image */, cl_mem /* dst_image */, const size_t * /* src_origin[3] */, const size_t * /* dst_origin[3] */, const size_t * /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImageToBuffer(cl_command_queue /* command_queue */, cl_mem /* src_image */, cl_mem /* dst_buffer */, const size_t * /* src_origin[3] */, const size_t * /* region[3] */, size_t /* dst_offset */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferToImage(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_image */, size_t /* src_offset */, const size_t * /* dst_origin[3] */, const size_t * /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY void * CL_API_CALL clEnqueueMapBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_map */, cl_map_flags /* map_flags */, size_t /* offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY void * CL_API_CALL clEnqueueMapImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_map */, cl_map_flags /* map_flags */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t * /* image_row_pitch */, size_t * /* image_slice_pitch */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueUnmapMemObject(cl_command_queue /* command_queue */, cl_mem /* memobj */, void * /* mapped_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) 
CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMigrateMemObjects(cl_command_queue /* command_queue */, cl_uint /* num_mem_objects */, const cl_mem * /* mem_objects */, cl_mem_migration_flags /* flags */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueNDRangeKernel(cl_command_queue /* command_queue */, cl_kernel /* kernel */, cl_uint /* work_dim */, const size_t * /* global_work_offset */, const size_t * /* global_work_size */, const size_t * /* local_work_size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueTask(cl_command_queue /* command_queue */, cl_kernel /* kernel */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueNativeKernel(cl_command_queue /* command_queue */, void (CL_CALLBACK * /*user_func*/)(void *), void * /* args */, size_t /* cb_args */, cl_uint /* num_mem_objects */, const cl_mem * /* mem_list */, const void ** /* args_mem_loc */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMarkerWithWaitList(cl_command_queue /* command_queue */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueBarrierWithWaitList(cl_command_queue /* command_queue */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; /* Extension function access * * Returns the extension function address for the given function name, * or NULL if a valid function can not be found. The client must * check to make sure the address is not NULL, before using or * calling the returned function address. 
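 *
 * A typical pattern looks like the sketch below (clIcdGetPlatformIDsKHR is
 * used purely as an example of an extension entry point; any extension
 * function name works the same way):
 *
 *   typedef cl_int (CL_API_CALL *PFN_clIcdGetPlatformIDsKHR)(
 *       cl_uint num_entries, cl_platform_id *platforms, cl_uint *num_platforms);
 *
 *   PFN_clIcdGetPlatformIDsKHR fn = (PFN_clIcdGetPlatformIDsKHR)
 *       clGetExtensionFunctionAddressForPlatform(platform,
 *                                                "clIcdGetPlatformIDsKHR");
 *   if (fn != NULL) {
 *       cl_uint n = 0;
 *       fn(0, NULL, &n);   /\* safe to call only after the NULL check \*/
 *   }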
 */
extern CL_API_ENTRY void * CL_API_CALL
clGetExtensionFunctionAddressForPlatform(cl_platform_id /* platform */,
                                         const char *   /* func_name */) CL_API_SUFFIX__VERSION_1_2;

/* Deprecated OpenCL 1.1 APIs */
extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL
clCreateImage2D(cl_context              /* context */,
                cl_mem_flags            /* flags */,
                const cl_image_format * /* image_format */,
                size_t                  /* image_width */,
                size_t                  /* image_height */,
                size_t                  /* image_row_pitch */,
                void *                  /* host_ptr */,
                cl_int *                /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL
clCreateImage3D(cl_context              /* context */,
                cl_mem_flags            /* flags */,
                const cl_image_format * /* image_format */,
                size_t                  /* image_width */,
                size_t                  /* image_height */,
                size_t                  /* image_depth */,
                size_t                  /* image_row_pitch */,
                size_t                  /* image_slice_pitch */,
                void *                  /* host_ptr */,
                cl_int *                /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL
clEnqueueMarker(cl_command_queue /* command_queue */,
                cl_event *       /* event */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL
clEnqueueWaitForEvents(cl_command_queue /* command_queue */,
                       cl_uint          /* num_events */,
                       const cl_event * /* event_list */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL
clEnqueueBarrier(cl_command_queue /* command_queue */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL
clUnloadCompiler(void) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED void * CL_API_CALL
clGetExtensionFunctionAddress(const char * /* func_name */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

#ifdef __cplusplus
}
#endif

#endif /* __OPENCL_CL_H */

/* clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/cl.hpp */

/* Modifications Copyright(C)[2021-2022] Advanced Micro Devices, Inc.
 * All rights reserved.
 */

/*******************************************************************************
 * Copyright (c) 2008-2015 The Khronos Group Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and/or associated documentation files (the
 * "Materials"), to deal in the Materials without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Materials, and to
 * permit persons to whom the Materials are furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Materials.
 *
 * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
 * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
 * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
 *    https://www.khronos.org/registry/
 *
 * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
 ******************************************************************************/

/*! \file
 *
 *   \brief C++ bindings for OpenCL 1.0 (rev 48), OpenCL 1.1 (rev 33) and
 *       OpenCL 1.2 (rev 15)
 *   \author Benedict R. Gaster, Laurent Morichetti and Lee Howes
 *
 *   Additions and fixes from:
 *       Brian Cole, March 3rd 2010 and April 2012
 *       Matt Gruenke, April 2012.
 *       Bruce Merry, February 2013.
 *       Tom Deakin and Simon McIntosh-Smith, July 2013
 *
 *   \version 1.2.9
 *   \date December 2015
 *
 *   Optional extension support
 *
 *         cl
 *         cl_ext_device_fission
 *              #define USE_CL_DEVICE_FISSION
 */

/*! \mainpage
 * \section intro Introduction
 * For many large applications C++ is the language of choice and so it seems
 * reasonable to define C++ bindings for OpenCL.
 *
 * The interface is contained within a single C++ header file \em cl.hpp and all
 * definitions are contained within the namespace \em cl. There is no additional
 * requirement to include \em cl.h; to use either the C++ or original C
 * bindings it is enough to simply include \em cl.hpp.
 *
 * The bindings themselves are lightweight and correspond closely to the
 * underlying C API. Using the C++ bindings introduces no additional execution
 * overhead.
 *
 * For detailed documentation on the bindings see:
 *
 * The OpenCL C++ Wrapper API 1.2 (revision 09)
 * http://www.khronos.org/registry/cl/specs/opencl-cplusplus-1.2.pdf
 *
 * \section example Example
 *
 * The following example shows a general use case for the C++
 * bindings, including support for the optional exception feature and
 * also the supplied vector and string classes, see following sections for
 * descriptions of these features.
 *
 * \code
 * #define __CL_ENABLE_EXCEPTIONS
 *
 * #if defined(__APPLE__) || defined(__MACOSX)
 * #include <OpenCL/cl.hpp>
 * #else
 * #include <CL/cl.hpp>
 * #endif
 * #include <cstdio>
 * #include <cstdlib>
 * #include <iostream>
 *
 *  const char * helloStr  = "__kernel void "
 *                           "hello(void) "
 *                           "{ "
 *                           "  "
 *                           "} ";
 *
 *  int
 *  main(void)
 *  {
 *     cl_int err = CL_SUCCESS;
 *     try {
 *
 *       std::vector<cl::Platform> platforms;
 *       cl::Platform::get(&platforms);
 *       if (platforms.size() == 0) {
 *           std::cout << "Platform size 0\n";
 *           return -1;
 *       }
 *
 *       cl_context_properties properties[] =
 *          { CL_CONTEXT_PLATFORM, (cl_context_properties)(platforms[0])(), 0};
 *       cl::Context context(CL_DEVICE_TYPE_CPU, properties);
 *
 *       std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
 *
 *       cl::Program::Sources source(1,
 *           std::make_pair(helloStr,strlen(helloStr)));
 *       cl::Program program_ = cl::Program(context, source);
 *       program_.build(devices);
 *
 *       cl::Kernel kernel(program_, "hello", &err);
 *
 *       cl::Event event;
 *       cl::CommandQueue queue(context, devices[0], 0, &err);
 *       queue.enqueueNDRangeKernel(
 *           kernel,
 *           cl::NullRange,
 *           cl::NDRange(4,4),
 *           cl::NullRange,
 *           NULL,
 *           &event);
 *
 *       event.wait();
 *     }
 *     catch (cl::Error err) {
 *        std::cerr
 *           << "ERROR: "
 *           << err.what()
 *           << "("
 *           << err.err()
 *           << ")"
 *           << std::endl;
 *     }
 *
 *    return EXIT_SUCCESS;
 *  }
 *
 * \endcode
 *
 */
#ifndef CL_HPP_
#define CL_HPP_

#ifdef _WIN32

#include <windows.h>

#if defined(USE_DX_INTEROP)
#include <CL/cl_d3d10.h>
#include <CL/cl_dx9_media_sharing.h>
#endif
#endif // _WIN32

#if defined(_MSC_VER)
#include <intrin.h>
#endif // _MSC_VER

//
#if defined(USE_CL_DEVICE_FISSION)
#include <CL/cl_ext.h>
#endif

#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/opencl.h>
#else
#include <CL/opencl.h>
#endif // !__APPLE__

#if (_MSC_VER >= 1700) || (__cplusplus >= 201103L)
#define CL_HPP_RVALUE_REFERENCES_SUPPORTED
#define CL_HPP_CPP11_ATOMICS_SUPPORTED
#include <atomic>
#endif

#if (__cplusplus >= 201103L)
#define CL_HPP_NOEXCEPT noexcept
#else
#define CL_HPP_NOEXCEPT
#endif

// To avoid accidentally taking ownership of core OpenCL types
// such as cl_kernel constructors are made explicit
// under OpenCL 1.2
#if defined(CL_VERSION_1_2) && !defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
#define __CL_EXPLICIT_CONSTRUCTORS explicit
#else // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
#define __CL_EXPLICIT_CONSTRUCTORS
#endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)

// Define deprecated prefixes and suffixes to ensure compilation
// in case they are not pre-defined
#if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)
#define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
#endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)
#if !defined(CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED)
#define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
#endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)

#if !defined(CL_CALLBACK)
#define CL_CALLBACK
#endif //CL_CALLBACK

#include <utility>
#include <limits>
#include <iterator>

#if defined(__CL_ENABLE_EXCEPTIONS)
#include <exception>
#endif // #if defined(__CL_ENABLE_EXCEPTIONS)

#if !defined(__NO_STD_VECTOR)
#include <vector>
#endif

#if !defined(__NO_STD_STRING)
#include <string>
#endif

#if defined(__ANDROID__) || defined(linux) || defined(__APPLE__) || defined(__MACOSX)
#include <alloca.h>
#endif // linux

#include <cstring>


/*! \namespace cl
 *
 * \brief The OpenCL C++ bindings are defined within this namespace.
* */ namespace cl { class Memory; /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2)) #define __INIT_CL_EXT_FCN_PTR(name) \ if(!pfn_##name) { \ pfn_##name = (PFN_##name) \ clGetExtensionFunctionAddress(#name); \ if(!pfn_##name) { \ } \ } #endif // #if defined(CL_VERSION_1_1) #if defined(CL_VERSION_1_2) #define __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, name) \ if(!pfn_##name) { \ pfn_##name = (PFN_##name) \ clGetExtensionFunctionAddressForPlatform(platform, #name); \ if(!pfn_##name) { \ } \ } #endif // #if defined(CL_VERSION_1_1) class Program; class Device; class Context; class CommandQueue; class Memory; class Buffer; #if defined(__CL_ENABLE_EXCEPTIONS) /*! \brief Exception class * * This may be thrown by API functions when __CL_ENABLE_EXCEPTIONS is defined. */ class Error : public std::exception { private: cl_int err_; const char * errStr_; public: /*! \brief Create a new CL error exception for a given error code * and corresponding message. * * \param err error code value. * * \param errStr a descriptive string that must remain in scope until * handling of the exception has concluded. If set, it * will be returned by what(). */ Error(cl_int err, const char * errStr = NULL) : err_(err), errStr_(errStr) {} ~Error() throw() {} /*! \brief Get error string associated with exception * * \return A memory pointer to the error message string. */ virtual const char * what() const throw () { if (errStr_ == NULL) { return "empty"; } else { return errStr_; } } /*! \brief Get error code associated with exception * * \return The error code. */ cl_int err(void) const { return err_; } }; #define __ERR_STR(x) #x #else #define __ERR_STR(x) NULL #endif // __CL_ENABLE_EXCEPTIONS namespace detail { #if defined(__CL_ENABLE_EXCEPTIONS) static inline cl_int errHandler ( cl_int err, const char * errStr = NULL) { if (err != CL_SUCCESS) { throw Error(err, errStr); } return err; } #else static inline cl_int errHandler (cl_int err, const char * errStr = NULL) { (void) errStr; // suppress unused variable warning return err; } #endif // __CL_ENABLE_EXCEPTIONS } //! 
\cond DOXYGEN_DETAIL #if !defined(__CL_USER_OVERRIDE_ERROR_STRINGS) #define __GET_DEVICE_INFO_ERR __ERR_STR(clGetDeviceInfo) #define __GET_PLATFORM_INFO_ERR __ERR_STR(clGetPlatformInfo) #define __GET_DEVICE_IDS_ERR __ERR_STR(clGetDeviceIDs) #define __GET_PLATFORM_IDS_ERR __ERR_STR(clGetPlatformIDs) #define __GET_CONTEXT_INFO_ERR __ERR_STR(clGetContextInfo) #define __GET_EVENT_INFO_ERR __ERR_STR(clGetEventInfo) #define __GET_EVENT_PROFILE_INFO_ERR __ERR_STR(clGetEventProfileInfo) #define __GET_MEM_OBJECT_INFO_ERR __ERR_STR(clGetMemObjectInfo) #define __GET_IMAGE_INFO_ERR __ERR_STR(clGetImageInfo) #define __GET_SAMPLER_INFO_ERR __ERR_STR(clGetSamplerInfo) #define __GET_KERNEL_INFO_ERR __ERR_STR(clGetKernelInfo) #if defined(CL_VERSION_1_2) #define __GET_KERNEL_ARG_INFO_ERR __ERR_STR(clGetKernelArgInfo) #endif // #if defined(CL_VERSION_1_2) #define __GET_KERNEL_WORK_GROUP_INFO_ERR __ERR_STR(clGetKernelWorkGroupInfo) #define __GET_PROGRAM_INFO_ERR __ERR_STR(clGetProgramInfo) #define __GET_PROGRAM_BUILD_INFO_ERR __ERR_STR(clGetProgramBuildInfo) #define __GET_COMMAND_QUEUE_INFO_ERR __ERR_STR(clGetCommandQueueInfo) #define __CREATE_CONTEXT_ERR __ERR_STR(clCreateContext) #define __CREATE_CONTEXT_FROM_TYPE_ERR __ERR_STR(clCreateContextFromType) #define __GET_SUPPORTED_IMAGE_FORMATS_ERR __ERR_STR(clGetSupportedImageFormats) #define __CREATE_BUFFER_ERR __ERR_STR(clCreateBuffer) #define __COPY_ERR __ERR_STR(cl::copy) #define __CREATE_SUBBUFFER_ERR __ERR_STR(clCreateSubBuffer) #define __CREATE_GL_BUFFER_ERR __ERR_STR(clCreateFromGLBuffer) #define __CREATE_GL_RENDER_BUFFER_ERR __ERR_STR(clCreateFromGLBuffer) #define __GET_GL_OBJECT_INFO_ERR __ERR_STR(clGetGLObjectInfo) #if defined(CL_VERSION_1_2) #define __CREATE_IMAGE_ERR __ERR_STR(clCreateImage) #define __CREATE_GL_TEXTURE_ERR __ERR_STR(clCreateFromGLTexture) #define __IMAGE_DIMENSION_ERR __ERR_STR(Incorrect image dimensions) #endif // #if defined(CL_VERSION_1_2) #define __CREATE_SAMPLER_ERR __ERR_STR(clCreateSampler) #define __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR __ERR_STR(clSetMemObjectDestructorCallback) #define __CREATE_USER_EVENT_ERR __ERR_STR(clCreateUserEvent) #define __SET_USER_EVENT_STATUS_ERR __ERR_STR(clSetUserEventStatus) #define __SET_EVENT_CALLBACK_ERR __ERR_STR(clSetEventCallback) #define __WAIT_FOR_EVENTS_ERR __ERR_STR(clWaitForEvents) #define __CREATE_KERNEL_ERR __ERR_STR(clCreateKernel) #define __SET_KERNEL_ARGS_ERR __ERR_STR(clSetKernelArg) #define __CREATE_PROGRAM_WITH_SOURCE_ERR __ERR_STR(clCreateProgramWithSource) #define __CREATE_PROGRAM_WITH_BINARY_ERR __ERR_STR(clCreateProgramWithBinary) #if defined(CL_VERSION_1_2) #define __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR __ERR_STR(clCreateProgramWithBuiltInKernels) #endif // #if defined(CL_VERSION_1_2) #define __BUILD_PROGRAM_ERR __ERR_STR(clBuildProgram) #if defined(CL_VERSION_1_2) #define __COMPILE_PROGRAM_ERR __ERR_STR(clCompileProgram) #define __LINK_PROGRAM_ERR __ERR_STR(clLinkProgram) #endif // #if defined(CL_VERSION_1_2) #define __CREATE_KERNELS_IN_PROGRAM_ERR __ERR_STR(clCreateKernelsInProgram) #define __CREATE_COMMAND_QUEUE_ERR __ERR_STR(clCreateCommandQueue) #define __SET_COMMAND_QUEUE_PROPERTY_ERR __ERR_STR(clSetCommandQueueProperty) #define __ENQUEUE_READ_BUFFER_ERR __ERR_STR(clEnqueueReadBuffer) #define __ENQUEUE_READ_BUFFER_RECT_ERR __ERR_STR(clEnqueueReadBufferRect) #define __ENQUEUE_WRITE_BUFFER_ERR __ERR_STR(clEnqueueWriteBuffer) #define __ENQUEUE_WRITE_BUFFER_RECT_ERR __ERR_STR(clEnqueueWriteBufferRect) #define __ENQEUE_COPY_BUFFER_ERR 
__ERR_STR(clEnqueueCopyBuffer) #define __ENQEUE_COPY_BUFFER_RECT_ERR __ERR_STR(clEnqueueCopyBufferRect) #define __ENQUEUE_FILL_BUFFER_ERR __ERR_STR(clEnqueueFillBuffer) #define __ENQUEUE_READ_IMAGE_ERR __ERR_STR(clEnqueueReadImage) #define __ENQUEUE_WRITE_IMAGE_ERR __ERR_STR(clEnqueueWriteImage) #define __ENQUEUE_COPY_IMAGE_ERR __ERR_STR(clEnqueueCopyImage) #define __ENQUEUE_FILL_IMAGE_ERR __ERR_STR(clEnqueueFillImage) #define __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR __ERR_STR(clEnqueueCopyImageToBuffer) #define __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR __ERR_STR(clEnqueueCopyBufferToImage) #define __ENQUEUE_MAP_BUFFER_ERR __ERR_STR(clEnqueueMapBuffer) #define __ENQUEUE_MAP_IMAGE_ERR __ERR_STR(clEnqueueMapImage) #define __ENQUEUE_UNMAP_MEM_OBJECT_ERR __ERR_STR(clEnqueueUnMapMemObject) #define __ENQUEUE_NDRANGE_KERNEL_ERR __ERR_STR(clEnqueueNDRangeKernel) #define __ENQUEUE_TASK_ERR __ERR_STR(clEnqueueTask) #define __ENQUEUE_NATIVE_KERNEL __ERR_STR(clEnqueueNativeKernel) #if defined(CL_VERSION_1_2) #define __ENQUEUE_MIGRATE_MEM_OBJECTS_ERR __ERR_STR(clEnqueueMigrateMemObjects) #endif // #if defined(CL_VERSION_1_2) #define __ENQUEUE_ACQUIRE_GL_ERR __ERR_STR(clEnqueueAcquireGLObjects) #define __ENQUEUE_RELEASE_GL_ERR __ERR_STR(clEnqueueReleaseGLObjects) #define __RETAIN_ERR __ERR_STR(Retain Object) #define __RELEASE_ERR __ERR_STR(Release Object) #define __FLUSH_ERR __ERR_STR(clFlush) #define __FINISH_ERR __ERR_STR(clFinish) #define __VECTOR_CAPACITY_ERR __ERR_STR(Vector capacity error) /** * CL 1.2 version that uses device fission. */ #if defined(CL_VERSION_1_2) #define __CREATE_SUB_DEVICES __ERR_STR(clCreateSubDevices) #else #define __CREATE_SUB_DEVICES __ERR_STR(clCreateSubDevicesEXT) #endif // #if defined(CL_VERSION_1_2) /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2)) #define __ENQUEUE_MARKER_ERR __ERR_STR(clEnqueueMarker) #define __ENQUEUE_WAIT_FOR_EVENTS_ERR __ERR_STR(clEnqueueWaitForEvents) #define __ENQUEUE_BARRIER_ERR __ERR_STR(clEnqueueBarrier) #define __UNLOAD_COMPILER_ERR __ERR_STR(clUnloadCompiler) #define __CREATE_GL_TEXTURE_2D_ERR __ERR_STR(clCreateFromGLTexture2D) #define __CREATE_GL_TEXTURE_3D_ERR __ERR_STR(clCreateFromGLTexture3D) #define __CREATE_IMAGE2D_ERR __ERR_STR(clCreateImage2D) #define __CREATE_IMAGE3D_ERR __ERR_STR(clCreateImage3D) #endif // #if defined(CL_VERSION_1_1) #endif // __CL_USER_OVERRIDE_ERROR_STRINGS //! \endcond /** * CL 1.2 marker and barrier commands */ #if defined(CL_VERSION_1_2) #define __ENQUEUE_MARKER_WAIT_LIST_ERR __ERR_STR(clEnqueueMarkerWithWaitList) #define __ENQUEUE_BARRIER_WAIT_LIST_ERR __ERR_STR(clEnqueueBarrierWithWaitList) #endif // #if defined(CL_VERSION_1_2) #if !defined(__USE_DEV_STRING) && !defined(__NO_STD_STRING) typedef std::string STRING_CLASS; #elif !defined(__USE_DEV_STRING) /*! \class string * \brief Simple string class, that provides a limited subset of std::string * functionality but avoids many of the issues that come with that class. * \note Deprecated. Please use std::string as default or * re-define the string class to match the std::string * interface by defining STRING_CLASS */ class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED string { private: ::size_t size_; char * str_; public: //! \brief Constructs an empty string, allocating no memory. string(void) : size_(0), str_(NULL) { } /*! \brief Constructs a string populated from an arbitrary value of * specified size. * * An extra '\0' is added, in case none was contained in str. 
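 *
 * For instance (a sketch; cl::string is only selected as STRING_CLASS
 * when __NO_STD_STRING is defined):
 * \code
 * cl::string s("ab\0cd", 5); // size() == 5; a terminating '\0' is appended
 * \endcode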
* * \param str the initial value of the string instance. Note that '\0' * characters receive no special treatment. If NULL, * the string is left empty, with a size of 0. * * \param size the number of characters to copy from str. */ string(const char * str, ::size_t size) : size_(size), str_(NULL) { if( size > 0 ) { str_ = new char[size_+1]; if (str_ != NULL) { memcpy(str_, str, size_ * sizeof(char)); str_[size_] = '\0'; } else { size_ = 0; } } } /*! \brief Constructs a string populated from a null-terminated value. * * \param str the null-terminated initial value of the string instance. * If NULL, the string is left empty, with a size of 0. */ string(const char * str) : size_(0), str_(NULL) { if( str ) { size_= ::strlen(str); } if( size_ > 0 ) { str_ = new char[size_ + 1]; if (str_ != NULL) { memcpy(str_, str, (size_ + 1) * sizeof(char)); } } } void resize( ::size_t n ) { if( size_ == n ) { return; } if (n == 0) { if( str_ ) { delete [] str_; } str_ = NULL; size_ = 0; } else { char *newString = new char[n + 1]; ::size_t copySize = n; if( size_ < n ) { copySize = size_; } size_ = n; if(str_) { memcpy(newString, str_, (copySize + 1) * sizeof(char)); } if( copySize < size_ ) { memset(newString + copySize, 0, size_ - copySize); } newString[size_] = '\0'; delete [] str_; str_ = newString; } } const char& operator[] ( ::size_t pos ) const { return str_[pos]; } char& operator[] ( ::size_t pos ) { return str_[pos]; } /*! \brief Copies the value of another string to this one. * * \param rhs the string to copy. * * \returns a reference to the modified instance. */ string& operator=(const string& rhs) { if (this == &rhs) { return *this; } if( str_ != NULL ) { delete [] str_; str_ = NULL; size_ = 0; } if (rhs.size_ == 0 || rhs.str_ == NULL) { str_ = NULL; size_ = 0; } else { str_ = new char[rhs.size_ + 1]; size_ = rhs.size_; if (str_ != NULL) { memcpy(str_, rhs.str_, (size_ + 1) * sizeof(char)); } else { size_ = 0; } } return *this; } /*! \brief Constructs a string by copying the value of another instance. * * \param rhs the string to copy. */ string(const string& rhs) : size_(0), str_(NULL) { *this = rhs; } //! \brief Destructor - frees memory used to hold the current value. ~string() { delete[] str_; str_ = NULL; } //! \brief Queries the length of the string, excluding any added '\0's. ::size_t size(void) const { return size_; } //! \brief Queries the length of the string, excluding any added '\0's. ::size_t length(void) const { return size(); } /*! \brief Returns a pointer to the private copy held by this instance, * or "" if empty/unset. */ const char * c_str(void) const { return (str_) ? str_ : "";} } CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; typedef cl::string STRING_CLASS; #endif // #elif !defined(__USE_DEV_STRING) #if !defined(__USE_DEV_VECTOR) && !defined(__NO_STD_VECTOR) #define VECTOR_CLASS std::vector #elif !defined(__USE_DEV_VECTOR) #define VECTOR_CLASS cl::vector #if !defined(__MAX_DEFAULT_VECTOR_SIZE) #define __MAX_DEFAULT_VECTOR_SIZE 10 #endif /*! \class vector * \brief Fixed sized vector implementation that mirroring * * \note Deprecated. Please use std::vector as default or * re-define the vector class to match the std::vector * interface by defining VECTOR_CLASS * \note Not recommended for use with custom objects as * current implementation will construct N elements * * std::vector functionality. * \brief Fixed sized vector compatible with std::vector. 
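 *
 * A short usage sketch (cl::vector is only selected as VECTOR_CLASS when
 * __NO_STD_VECTOR is defined; the capacity N is fixed at compile time):
 * \code
 * cl::vector<int, 4> v;
 * v.push_back(1);
 * v.push_back(2); // size() == 2, capacity() == 4
 * \endcode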
* * \note * This differs from std::vector<> not just in memory allocation, * but also in terms of when members are constructed, destroyed, * and assigned instead of being copy constructed. * * \param T type of element contained in the vector. * * \param N maximum size of the vector. */ template class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED vector { private: T data_[N]; unsigned int size_; public: //! \brief Constructs an empty vector with no memory allocated. vector() : size_(static_cast(0)) {} //! \brief Deallocates the vector's memory and destroys all of its elements. ~vector() { clear(); } //! \brief Returns the number of elements currently contained. unsigned int size(void) const { return size_; } /*! \brief Empties the vector of all elements. * \note * This does not deallocate memory but will invoke destructors * on contained elements. */ void clear() { while(!empty()) { pop_back(); } } /*! \brief Appends an element after the last valid element. * Calling this on a vector that has reached capacity will throw an * exception if exceptions are enabled. */ void push_back (const T& x) { if (size() < N) { new (&data_[size_]) T(x); size_++; } else { detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR); } } /*! \brief Removes the last valid element from the vector. * Calling this on an empty vector will throw an exception * if exceptions are enabled. */ void pop_back(void) { if (size_ != 0) { --size_; data_[size_].~T(); } else { detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR); } } /*! \brief Constructs with a value copied from another. * * \param vec the vector to copy. */ vector(const vector& vec) : size_(vec.size_) { if (size_ != 0) { assign(vec.begin(), vec.end()); } } /*! \brief Constructs with a specified number of initial elements. * * \param size number of initial elements. * * \param val value of initial elements. */ vector(unsigned int size, const T& val = T()) : size_(0) { for (unsigned int i = 0; i < size; i++) { push_back(val); } } /*! \brief Overwrites the current content with that copied from another * instance. * * \param rhs vector to copy. * * \returns a reference to this. */ vector& operator=(const vector& rhs) { if (this == &rhs) { return *this; } if (rhs.size_ != 0) { assign(rhs.begin(), rhs.end()); } else { clear(); } return *this; } /*! \brief Tests equality against another instance. * * \param vec the vector against which to compare. */ bool operator==(vector &vec) { if (size() != vec.size()) { return false; } for( unsigned int i = 0; i < size(); ++i ) { if( operator[](i) != vec[i] ) { return false; } } return true; } //! \brief Conversion operator to T*. operator T* () { return data_; } //! \brief Conversion operator to const T*. operator const T* () const { return data_; } //! \brief Tests whether this instance has any elements. bool empty (void) const { return size_==0; } //! \brief Returns the maximum number of elements this instance can hold. unsigned int max_size (void) const { return N; } //! \brief Returns the maximum number of elements this instance can hold. unsigned int capacity () const { return N; } //! \brief Resizes the vector to the given size void resize(unsigned int newSize, T fill = T()) { if (newSize > N) { detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR); } else { while (size_ < newSize) { new (&data_[size_]) T(fill); size_++; } while (size_ > newSize) { --size_; data_[size_].~T(); } } } /*! \brief Returns a reference to a given element. 
 *
 * \param index which element to access.
 *
 * \note
 * The caller is responsible for ensuring index is >= 0 and < size().
 */
T& operator[](int index) { return data_[index]; }

/*! \brief Returns a const reference to a given element.
 *
 * \param index which element to access.
 *
 * \note
 * The caller is responsible for ensuring index is >= 0 and < size().
 */
const T& operator[](int index) const { return data_[index]; }

/*! \brief Assigns elements of the vector based on a source iterator range.
 *
 * \param start Beginning iterator of source range.
 * \param end End iterator of source range.
 *
 * \note
 * Will throw an exception if exceptions are enabled and size exceeded.
 */
template<class I>
void assign(I start, I end)
{
    clear();
    while (start != end) {
        push_back(*start);
        start++;
    }
}

/*! \class iterator
 * \brief Const iterator class for vectors
 */
class iterator
{
private:
    const vector<T,N> *vec_;
    int index_;

    /**
     * Internal iterator constructor to capture reference
     * to the vector it iterates over rather than taking
     * the vector by copy.
     */
    iterator (const vector<T,N> &vec, int index) : vec_(&vec)
    {
        if( !vec.empty() ) {
            index_ = index;
        } else {
            index_ = -1;
        }
    }

public:
    iterator(void) : vec_(NULL), index_(-1) { }

    iterator(const iterator& rhs) : vec_(rhs.vec_), index_(rhs.index_) { }

    ~iterator(void) {}

    static iterator begin(const cl::vector<T,N> &vec)
    {
        iterator i(vec, 0);
        return i;
    }

    static iterator end(const cl::vector<T,N> &vec)
    {
        iterator i(vec, vec.size());
        return i;
    }

    bool operator==(iterator i) { return ((vec_ == i.vec_) && (index_ == i.index_)); }

    bool operator!=(iterator i) { return (!(*this == i)); }

    iterator& operator++() { ++index_; return *this; }

    iterator operator++(int) { iterator retVal(*this); ++index_; return retVal; }

    iterator& operator--() { --index_; return *this; }

    iterator operator--(int) { iterator retVal(*this); --index_; return retVal; }

    const T& operator *() const { return (*vec_)[index_]; }
};

iterator begin(void) { return iterator::begin(*this); }

iterator begin(void) const { return iterator::begin(*this); }

iterator end(void) { return iterator::end(*this); }

iterator end(void) const { return iterator::end(*this); }

T& front(void) { return data_[0]; }

T& back(void) { return data_[size_-1]; }

const T& front(void) const { return data_[0]; }

const T& back(void) const { return data_[size_-1]; }
} CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

#endif // #if !defined(__USE_DEV_VECTOR) && !defined(__NO_STD_VECTOR)

namespace detail {
#define __DEFAULT_NOT_INITIALIZED 1
#define __DEFAULT_BEING_INITIALIZED 2
#define __DEFAULT_INITIALIZED 4

/*
 * Compare and exchange primitives are needed for handling of defaults
 */
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
inline int compare_exchange(std::atomic<int> * dest, int exchange, int comparand)
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
inline int compare_exchange(volatile int * dest, int exchange, int comparand)
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
{
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
    std::atomic_compare_exchange_strong(dest, &comparand, exchange);
    return comparand;
#elif _MSC_VER
    return (int)(_InterlockedCompareExchange(
        (volatile long*)dest,
        (long)exchange,
        (long)comparand));
#else // !_MSC_VER && !CL_HPP_CPP11_ATOMICS_SUPPORTED
    return (__sync_val_compare_and_swap(
        dest,
        comparand,
        exchange));
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
}

inline void fence() {
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
    std::atomic_thread_fence(std::memory_order_seq_cst);
#elif _MSC_VER // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    _ReadWriteBarrier();
#else // !_MSC_VER && !CL_HPP_CPP11_ATOMICS_SUPPORTED
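    /* Fallback for GCC/Clang builds without C++11 <atomic>: the __sync
       builtin below issues a full (sequentially consistent) memory barrier. */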
__sync_synchronize(); #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED } } // namespace detail /*! \brief class used to interface between C++ and * OpenCL C calls that require arrays of size_t values, whose * size is known statically. */ template class size_t { private: ::size_t data_[N]; public: //! \brief Initialize size_t to all 0s size_t() { for( int i = 0; i < N; ++i ) { data_[i] = 0; } } ::size_t& operator[](int index) { return data_[index]; } const ::size_t& operator[](int index) const { return data_[index]; } //! \brief Conversion operator to T*. operator ::size_t* () { return data_; } //! \brief Conversion operator to const T*. operator const ::size_t* () const { return data_; } }; namespace detail { // Generic getInfoHelper. The final parameter is used to guide overload // resolution: the actual parameter passed is an int, which makes this // a worse conversion sequence than a specialization that declares the // parameter as an int. template inline cl_int getInfoHelper(Functor f, cl_uint name, T* param, long) { return f(name, sizeof(T), param, NULL); } // Specialized getInfoHelper for VECTOR_CLASS params template inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS* param, long) { ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } T* value = (T*) alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } param->assign(&value[0], &value[required/sizeof(T)]); return CL_SUCCESS; } /* Specialization for reference-counted types. This depends on the * existence of Wrapper::cl_type, and none of the other types having the * cl_type member. Note that simplify specifying the parameter as Wrapper * does not work, because when using a derived type (e.g. Context) the generic * template will provide a better match. 
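 *
 * For a VECTOR_CLASS of wrapper objects this overload must also bump the
 * reference count of every element it stores: assign() copies raw cl_*
 * handles into the wrappers, and each wrapper's destructor will call
 * release(), so a matching retain() per element keeps the counts balanced.
 * A rough sketch of the net effect (the query name is illustrative):
 * \code
 * VECTOR_CLASS<cl::Device> devs;
 * getInfoHelper(f, CL_CONTEXT_DEVICES, &devs, 0); // each cl_device_id retained once
 * \endcode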
*/ template inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS* param, int, typename T::cl_type = 0) { ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } typename T::cl_type * value = (typename T::cl_type *) alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } ::size_t elements = required / sizeof(typename T::cl_type); param->assign(&value[0], &value[elements]); for (::size_t i = 0; i < elements; i++) { if (value[i] != NULL) { err = (*param)[i].retain(); if (err != CL_SUCCESS) { return err; } } } return CL_SUCCESS; } // Specialized for getInfo template inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS* param, int) { cl_int err = f(name, param->size() * sizeof(char *), &(*param)[0], NULL); if (err != CL_SUCCESS) { return err; } return CL_SUCCESS; } // Specialized GetInfoHelper for STRING_CLASS params template inline cl_int getInfoHelper(Func f, cl_uint name, STRING_CLASS* param, long) { #if defined(__NO_STD_VECTOR) || defined(__NO_STD_STRING) ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } char* value = (char*)alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } *param = value; return CL_SUCCESS; #else ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } // std::string has a constant data member // a char vector does not VECTOR_CLASS value(required); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { param->assign(value.begin(), value.end()); } #endif return CL_SUCCESS; } // Specialized GetInfoHelper for cl::size_t params template inline cl_int getInfoHelper(Func f, cl_uint name, size_t* param, long) { ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } ::size_t* value = (::size_t*) alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } for(int i = 0; i < N; ++i) { (*param)[i] = value[i]; } return CL_SUCCESS; } template struct ReferenceHandler; /* Specialization for reference-counted types. This depends on the * existence of Wrapper::cl_type, and none of the other types having the * cl_type member. Note that simplify specifying the parameter as Wrapper * does not work, because when using a derived type (e.g. Context) the generic * template will provide a better match. 
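 *
 * Overload selection works through the trailing tag parameter: callers pass
 * the literal 0, an int, so an overload taking int is preferred over the
 * generic one taking long whenever the defaulted "typename T::cl_type = 0"
 * parameter is well-formed for T (otherwise substitution failure removes
 * the overload). In outline (illustrative only):
 * \code
 * cl_uint refs;  getInfoHelper(f, CL_MEM_REFERENCE_COUNT, &refs, 0); // generic (long)
 * cl::Context c; getInfoHelper(f, CL_MEM_CONTEXT, &c, 0);            // wrapper (int)
 * \endcode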
*/ template inline cl_int getInfoHelper(Func f, cl_uint name, T* param, int, typename T::cl_type = 0) { typename T::cl_type value; cl_int err = f(name, sizeof(value), &value, NULL); if (err != CL_SUCCESS) { return err; } *param = value; if (value != NULL) { err = param->retain(); if (err != CL_SUCCESS) { return err; } } return CL_SUCCESS; } #define __PARAM_NAME_INFO_1_0(F) \ F(cl_platform_info, CL_PLATFORM_PROFILE, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_VERSION, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_NAME, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_VENDOR, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_EXTENSIONS, STRING_CLASS) \ \ F(cl_device_info, CL_DEVICE_TYPE, cl_device_type) \ F(cl_device_info, CL_DEVICE_VENDOR_ID, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_COMPUTE_UNITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_GROUP_SIZE, ::size_t) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_SIZES, VECTOR_CLASS< ::size_t>) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_CLOCK_FREQUENCY, cl_uint) \ F(cl_device_info, CL_DEVICE_ADDRESS_BITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_READ_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_MEM_ALLOC_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_WIDTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_HEIGHT, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_WIDTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_HEIGHT, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_DEPTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_MAX_PARAMETER_SIZE, ::size_t) \ F(cl_device_info, CL_DEVICE_MAX_SAMPLERS, cl_uint) \ F(cl_device_info, CL_DEVICE_MEM_BASE_ADDR_ALIGN, cl_uint) \ F(cl_device_info, CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_SINGLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, cl_device_mem_cache_type) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, cl_uint)\ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_TYPE, cl_device_local_mem_type) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_ERROR_CORRECTION_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_PROFILING_TIMER_RESOLUTION, ::size_t) \ F(cl_device_info, CL_DEVICE_ENDIAN_LITTLE, cl_bool) \ F(cl_device_info, CL_DEVICE_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_COMPILER_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_EXECUTION_CAPABILITIES, cl_device_exec_capabilities) \ F(cl_device_info, CL_DEVICE_QUEUE_PROPERTIES, cl_command_queue_properties) \ F(cl_device_info, CL_DEVICE_PLATFORM, cl_platform_id) \ F(cl_device_info, CL_DEVICE_NAME, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_VENDOR, STRING_CLASS) \ F(cl_device_info, 
CL_DRIVER_VERSION, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_PROFILE, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_VERSION, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_EXTENSIONS, STRING_CLASS) \ \ F(cl_context_info, CL_CONTEXT_REFERENCE_COUNT, cl_uint) \ F(cl_context_info, CL_CONTEXT_DEVICES, VECTOR_CLASS) \ F(cl_context_info, CL_CONTEXT_PROPERTIES, VECTOR_CLASS) \ \ F(cl_event_info, CL_EVENT_COMMAND_QUEUE, cl::CommandQueue) \ F(cl_event_info, CL_EVENT_COMMAND_TYPE, cl_command_type) \ F(cl_event_info, CL_EVENT_REFERENCE_COUNT, cl_uint) \ F(cl_event_info, CL_EVENT_COMMAND_EXECUTION_STATUS, cl_int) \ \ F(cl_profiling_info, CL_PROFILING_COMMAND_QUEUED, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_SUBMIT, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_START, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_END, cl_ulong) \ \ F(cl_mem_info, CL_MEM_TYPE, cl_mem_object_type) \ F(cl_mem_info, CL_MEM_FLAGS, cl_mem_flags) \ F(cl_mem_info, CL_MEM_SIZE, ::size_t) \ F(cl_mem_info, CL_MEM_HOST_PTR, void*) \ F(cl_mem_info, CL_MEM_MAP_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_REFERENCE_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_CONTEXT, cl::Context) \ \ F(cl_image_info, CL_IMAGE_FORMAT, cl_image_format) \ F(cl_image_info, CL_IMAGE_ELEMENT_SIZE, ::size_t) \ F(cl_image_info, CL_IMAGE_ROW_PITCH, ::size_t) \ F(cl_image_info, CL_IMAGE_SLICE_PITCH, ::size_t) \ F(cl_image_info, CL_IMAGE_WIDTH, ::size_t) \ F(cl_image_info, CL_IMAGE_HEIGHT, ::size_t) \ F(cl_image_info, CL_IMAGE_DEPTH, ::size_t) \ \ F(cl_sampler_info, CL_SAMPLER_REFERENCE_COUNT, cl_uint) \ F(cl_sampler_info, CL_SAMPLER_CONTEXT, cl::Context) \ F(cl_sampler_info, CL_SAMPLER_NORMALIZED_COORDS, cl_bool) \ F(cl_sampler_info, CL_SAMPLER_ADDRESSING_MODE, cl_addressing_mode) \ F(cl_sampler_info, CL_SAMPLER_FILTER_MODE, cl_filter_mode) \ \ F(cl_program_info, CL_PROGRAM_REFERENCE_COUNT, cl_uint) \ F(cl_program_info, CL_PROGRAM_CONTEXT, cl::Context) \ F(cl_program_info, CL_PROGRAM_NUM_DEVICES, cl_uint) \ F(cl_program_info, CL_PROGRAM_DEVICES, VECTOR_CLASS) \ F(cl_program_info, CL_PROGRAM_SOURCE, STRING_CLASS) \ F(cl_program_info, CL_PROGRAM_BINARY_SIZES, VECTOR_CLASS< ::size_t>) \ F(cl_program_info, CL_PROGRAM_BINARIES, VECTOR_CLASS) \ \ F(cl_program_build_info, CL_PROGRAM_BUILD_STATUS, cl_build_status) \ F(cl_program_build_info, CL_PROGRAM_BUILD_OPTIONS, STRING_CLASS) \ F(cl_program_build_info, CL_PROGRAM_BUILD_LOG, STRING_CLASS) \ \ F(cl_kernel_info, CL_KERNEL_FUNCTION_NAME, STRING_CLASS) \ F(cl_kernel_info, CL_KERNEL_NUM_ARGS, cl_uint) \ F(cl_kernel_info, CL_KERNEL_REFERENCE_COUNT, cl_uint) \ F(cl_kernel_info, CL_KERNEL_CONTEXT, cl::Context) \ F(cl_kernel_info, CL_KERNEL_PROGRAM, cl::Program) \ \ F(cl_kernel_work_group_info, CL_KERNEL_WORK_GROUP_SIZE, ::size_t) \ F(cl_kernel_work_group_info, CL_KERNEL_COMPILE_WORK_GROUP_SIZE, cl::size_t<3>) \ F(cl_kernel_work_group_info, CL_KERNEL_LOCAL_MEM_SIZE, cl_ulong) \ \ F(cl_command_queue_info, CL_QUEUE_CONTEXT, cl::Context) \ F(cl_command_queue_info, CL_QUEUE_DEVICE, cl::Device) \ F(cl_command_queue_info, CL_QUEUE_REFERENCE_COUNT, cl_uint) \ F(cl_command_queue_info, CL_QUEUE_PROPERTIES, cl_command_queue_properties) #if defined(CL_VERSION_1_1) #define __PARAM_NAME_INFO_1_1(F) \ F(cl_context_info, CL_CONTEXT_NUM_DEVICES, cl_uint)\ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, cl_uint) \ 
F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_DOUBLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_HALF_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_HOST_UNIFIED_MEMORY, cl_bool) \ F(cl_device_info, CL_DEVICE_OPENCL_C_VERSION, STRING_CLASS) \ \ F(cl_mem_info, CL_MEM_ASSOCIATED_MEMOBJECT, cl::Memory) \ F(cl_mem_info, CL_MEM_OFFSET, ::size_t) \ \ F(cl_kernel_work_group_info, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, ::size_t) \ F(cl_kernel_work_group_info, CL_KERNEL_PRIVATE_MEM_SIZE, cl_ulong) \ \ F(cl_event_info, CL_EVENT_CONTEXT, cl::Context) #endif // CL_VERSION_1_1 #if defined(CL_VERSION_1_2) #define __PARAM_NAME_INFO_1_2(F) \ F(cl_image_info, CL_IMAGE_BUFFER, cl::Buffer) \ \ F(cl_program_info, CL_PROGRAM_NUM_KERNELS, ::size_t) \ F(cl_program_info, CL_PROGRAM_KERNEL_NAMES, STRING_CLASS) \ \ F(cl_program_build_info, CL_PROGRAM_BINARY_TYPE, cl_program_binary_type) \ \ F(cl_kernel_info, CL_KERNEL_ATTRIBUTES, STRING_CLASS) \ \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ADDRESS_QUALIFIER, cl_kernel_arg_address_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ACCESS_QUALIFIER, cl_kernel_arg_access_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_NAME, STRING_CLASS) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_NAME, STRING_CLASS) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_QUALIFIER, cl_kernel_arg_type_qualifier) \ \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE, cl_device_id) \ F(cl_device_info, CL_DEVICE_PARTITION_PROPERTIES, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPE, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, ::size_t) \ F(cl_device_info, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, cl_device_affinity_domain) \ F(cl_device_info, CL_DEVICE_BUILT_IN_KERNELS, STRING_CLASS) #endif // #if defined(CL_VERSION_1_2) #if defined(USE_CL_DEVICE_FISSION) #define __PARAM_NAME_DEVICE_FISSION(F) \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE_EXT, cl_device_id) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPES_EXT, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_AFFINITY_DOMAINS_EXT, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT_EXT , cl_uint) \ F(cl_device_info, CL_DEVICE_PARTITION_STYLE_EXT, VECTOR_CLASS) #endif // USE_CL_DEVICE_FISSION template struct param_traits {}; #define __CL_DECLARE_PARAM_TRAITS(token, param_name, T) \ struct token; \ template<> \ struct param_traits \ { \ enum { value = param_name }; \ typedef T param_type; \ }; __PARAM_NAME_INFO_1_0(__CL_DECLARE_PARAM_TRAITS) #if defined(CL_VERSION_1_1) __PARAM_NAME_INFO_1_1(__CL_DECLARE_PARAM_TRAITS) #endif // CL_VERSION_1_1 #if defined(CL_VERSION_1_2) __PARAM_NAME_INFO_1_2(__CL_DECLARE_PARAM_TRAITS) #endif // CL_VERSION_1_1 #if defined(USE_CL_DEVICE_FISSION) __PARAM_NAME_DEVICE_FISSION(__CL_DECLARE_PARAM_TRAITS); #endif // USE_CL_DEVICE_FISSION #ifdef CL_PLATFORM_ICD_SUFFIX_KHR __CL_DECLARE_PARAM_TRAITS(cl_platform_info, CL_PLATFORM_ICD_SUFFIX_KHR, STRING_CLASS) #endif #ifdef CL_DEVICE_PROFILING_TIMER_OFFSET_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PROFILING_TIMER_OFFSET_AMD, cl_ulong) #endif #ifdef CL_DEVICE_GLOBAL_FREE_MEMORY_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_FREE_MEMORY_AMD, VECTOR_CLASS< ::size_t>) #endif #ifdef 
CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD, cl_uint) #endif #ifdef CL_DEVICE_SIMD_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_WAVEFRONT_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_WAVEFRONT_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD, cl_uint) #endif #ifdef CL_DEVICE_LOCAL_MEM_BANKS_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_LOCAL_MEM_BANKS_AMD, cl_uint) #endif #ifdef CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD, ::size_t) #endif #ifdef CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD, ::size_t) #endif #ifdef CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD, ::size_t) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV, cl_uint) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV, cl_uint) #endif #ifdef CL_DEVICE_REGISTERS_PER_BLOCK_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_REGISTERS_PER_BLOCK_NV, cl_uint) #endif #ifdef CL_DEVICE_WARP_SIZE_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_WARP_SIZE_NV, cl_uint) #endif #ifdef CL_DEVICE_GPU_OVERLAP_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GPU_OVERLAP_NV, cl_bool) #endif #ifdef CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV, cl_bool) #endif #ifdef CL_DEVICE_INTEGRATED_MEMORY_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_INTEGRATED_MEMORY_NV, cl_bool) #endif // Convenience functions template inline cl_int getInfo(Func f, cl_uint name, T* param) { return getInfoHelper(f, name, param, 0); } template struct GetInfoFunctor0 { Func f_; const Arg0& arg0_; cl_int operator ()( cl_uint param, ::size_t size, void* value, ::size_t* size_ret) { return f_(arg0_, param, size, value, size_ret); } }; template struct GetInfoFunctor1 { Func f_; const Arg0& arg0_; const Arg1& arg1_; cl_int operator ()( cl_uint param, ::size_t size, void* value, ::size_t* size_ret) { return f_(arg0_, arg1_, param, size, value, size_ret); } }; template inline cl_int getInfo(Func f, const Arg0& arg0, cl_uint name, T* param) { GetInfoFunctor0 f0 = { f, arg0 }; return getInfoHelper(f0, name, param, 0); } template inline cl_int getInfo(Func f, const Arg0& arg0, const Arg1& arg1, cl_uint name, T* param) { GetInfoFunctor1 f0 = { f, arg0, arg1 }; return getInfoHelper(f0, name, param, 0); } template struct ReferenceHandler { }; #if defined(CL_VERSION_1_2) /** * 
OpenCL 1.2 devices do have retain/release. */ template <> struct ReferenceHandler { /** * Retain the device. * \param device A valid device created using createSubDevices * \return * CL_SUCCESS if the function executed successfully. * CL_INVALID_DEVICE if device was not a valid subdevice * CL_OUT_OF_RESOURCES * CL_OUT_OF_HOST_MEMORY */ static cl_int retain(cl_device_id device) { return ::clRetainDevice(device); } /** * Retain the device. * \param device A valid device created using createSubDevices * \return * CL_SUCCESS if the function executed successfully. * CL_INVALID_DEVICE if device was not a valid subdevice * CL_OUT_OF_RESOURCES * CL_OUT_OF_HOST_MEMORY */ static cl_int release(cl_device_id device) { return ::clReleaseDevice(device); } }; #else // #if defined(CL_VERSION_1_2) /** * OpenCL 1.1 devices do not have retain/release. */ template <> struct ReferenceHandler { // cl_device_id does not have retain(). static cl_int retain(cl_device_id) { return CL_SUCCESS; } // cl_device_id does not have release(). static cl_int release(cl_device_id) { return CL_SUCCESS; } }; #endif // #if defined(CL_VERSION_1_2) template <> struct ReferenceHandler { // cl_platform_id does not have retain(). static cl_int retain(cl_platform_id) { return CL_SUCCESS; } // cl_platform_id does not have release(). static cl_int release(cl_platform_id) { return CL_SUCCESS; } }; template <> struct ReferenceHandler { static cl_int retain(cl_context context) { return ::clRetainContext(context); } static cl_int release(cl_context context) { return ::clReleaseContext(context); } }; template <> struct ReferenceHandler { static cl_int retain(cl_command_queue queue) { return ::clRetainCommandQueue(queue); } static cl_int release(cl_command_queue queue) { return ::clReleaseCommandQueue(queue); } }; template <> struct ReferenceHandler { static cl_int retain(cl_mem memory) { return ::clRetainMemObject(memory); } static cl_int release(cl_mem memory) { return ::clReleaseMemObject(memory); } }; template <> struct ReferenceHandler { static cl_int retain(cl_sampler sampler) { return ::clRetainSampler(sampler); } static cl_int release(cl_sampler sampler) { return ::clReleaseSampler(sampler); } }; template <> struct ReferenceHandler { static cl_int retain(cl_program program) { return ::clRetainProgram(program); } static cl_int release(cl_program program) { return ::clReleaseProgram(program); } }; template <> struct ReferenceHandler { static cl_int retain(cl_kernel kernel) { return ::clRetainKernel(kernel); } static cl_int release(cl_kernel kernel) { return ::clReleaseKernel(kernel); } }; template <> struct ReferenceHandler { static cl_int retain(cl_event event) { return ::clRetainEvent(event); } static cl_int release(cl_event event) { return ::clReleaseEvent(event); } }; // Extracts version number with major in the upper 16 bits, minor in the lower 16 static cl_uint getVersion(const char *versionInfo) { int highVersion = 0; int lowVersion = 0; int index = 7; while(versionInfo[index] != '.' 
) { highVersion *= 10; highVersion += versionInfo[index]-'0'; ++index; } ++index; while(versionInfo[index] != ' ' && versionInfo[index] != '\0') { lowVersion *= 10; lowVersion += versionInfo[index]-'0'; ++index; } return (highVersion << 16) | lowVersion; } static cl_uint getPlatformVersion(cl_platform_id platform) { ::size_t size = 0; clGetPlatformInfo(platform, CL_PLATFORM_VERSION, 0, NULL, &size); char *versionInfo = (char *) alloca(size); clGetPlatformInfo(platform, CL_PLATFORM_VERSION, size, &versionInfo[0], &size); return getVersion(versionInfo); } static cl_uint getDevicePlatformVersion(cl_device_id device) { cl_platform_id platform; clGetDeviceInfo(device, CL_DEVICE_PLATFORM, sizeof(platform), &platform, NULL); return getPlatformVersion(platform); } #if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) static cl_uint getContextPlatformVersion(cl_context context) { // The platform cannot be queried directly, so we first have to grab a // device and obtain its context ::size_t size = 0; clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &size); if (size == 0) return 0; cl_device_id *devices = (cl_device_id *) alloca(size); clGetContextInfo(context, CL_CONTEXT_DEVICES, size, devices, NULL); return getDevicePlatformVersion(devices[0]); } #endif // #if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) template class Wrapper { public: typedef T cl_type; protected: cl_type object_; public: Wrapper() : object_(NULL) { } Wrapper(const cl_type &obj) : object_(obj) { } ~Wrapper() { if (object_ != NULL) { release(); } } Wrapper(const Wrapper& rhs) { object_ = rhs.object_; if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); } } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT { object_ = rhs.object_; rhs.object_ = NULL; } #endif Wrapper& operator = (const Wrapper& rhs) { if (this != &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs.object_; if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); } } return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) Wrapper& operator = (Wrapper&& rhs) { if (this != &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs.object_; rhs.object_ = NULL; } return *this; } #endif Wrapper& operator = (const cl_type &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs; return *this; } cl_type operator ()() const { return object_; } cl_type& operator ()() { return object_; } protected: template friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type); cl_int retain() const { return ReferenceHandler::retain(object_); } cl_int release() const { return ReferenceHandler::release(object_); } }; template <> class Wrapper { public: typedef cl_device_id cl_type; protected: cl_type object_; bool referenceCountable_; static bool isReferenceCountable(cl_device_id device) { bool retVal = false; if (device != NULL) { int version = getDevicePlatformVersion(device); if(version > ((1 << 16) + 1)) { retVal = true; } } return retVal; } public: Wrapper() : object_(NULL), referenceCountable_(false) { } Wrapper(const cl_type &obj) : object_(obj), referenceCountable_(false) { referenceCountable_ = isReferenceCountable(obj); } ~Wrapper() { if (object_ != NULL) { release(); } } Wrapper(const Wrapper& rhs) { object_ = rhs.object_; referenceCountable_ = isReferenceCountable(object_); if (object_ != NULL) { detail::errHandler(retain(), 
__RETAIN_ERR); } } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT { object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; rhs.object_ = NULL; rhs.referenceCountable_ = false; } #endif Wrapper& operator = (const Wrapper& rhs) { if (this != &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); } } return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) Wrapper& operator = (Wrapper&& rhs) { if (this != &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; rhs.object_ = NULL; rhs.referenceCountable_ = false; } return *this; } #endif Wrapper& operator = (const cl_type &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs; referenceCountable_ = isReferenceCountable(object_); return *this; } cl_type operator ()() const { return object_; } cl_type& operator ()() { return object_; } protected: template friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type); template friend inline cl_int getInfoHelper(Func, cl_uint, VECTOR_CLASS*, int, typename U::cl_type); cl_int retain() const { if( referenceCountable_ ) { return ReferenceHandler::retain(object_); } else { return CL_SUCCESS; } } cl_int release() const { if( referenceCountable_ ) { return ReferenceHandler::release(object_); } else { return CL_SUCCESS; } } }; } // namespace detail //! \endcond /*! \stuct ImageFormat * \brief Adds constructors and member functions for cl_image_format. * * \see cl_image_format */ struct ImageFormat : public cl_image_format { //! \brief Default constructor - performs no initialization. ImageFormat(){} //! \brief Initializing constructor. ImageFormat(cl_channel_order order, cl_channel_type type) { image_channel_order = order; image_channel_data_type = type; } //! \brief Assignment operator. ImageFormat& operator = (const ImageFormat& rhs) { if (this != &rhs) { this->image_channel_data_type = rhs.image_channel_data_type; this->image_channel_order = rhs.image_channel_order; } return *this; } }; /*! \brief Class interface for cl_device_id. * * \note Copies of these objects are inexpensive, since they don't 'own' * any underlying resources or data structures. * * \see cl_device_id */ class Device : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Device() : detail::Wrapper() { } /*! \brief Constructor from cl_device_id. * * This simply copies the device ID value, which is an inexpensive operation. */ __CL_EXPLICIT_CONSTRUCTORS Device(const cl_device_id &device) : detail::Wrapper(device) { } /*! \brief Returns the first device on the default context. * * \see Context::getDefault() */ static Device getDefault(cl_int * err = NULL); /*! \brief Assignment operator from cl_device_id. * * This simply copies the device ID value, which is an inexpensive operation. */ Device& operator = (const cl_device_id& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Device(const Device& dev) : detail::Wrapper(dev) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. 
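 *
 * Because cl_device_id is only reference counted from OpenCL 1.2 onwards
 * (the wrapper checks the platform version: getVersion("OpenCL 1.2 ...")
 * yields (1 << 16) | 2 = 65538, which exceeds the (1 << 16) + 1 threshold),
 * copies are cheap either way. A small usage sketch (error handling
 * omitted):
 * \code
 * cl::Device d = cl::Device::getDefault();
 * STRING_CLASS name = d.getInfo<CL_DEVICE_NAME>();
 * \endcode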
/*! \brief Class interface for cl_device_id.
 *
 *  \note Copies of these objects are inexpensive, since they don't 'own'
 *        any underlying resources or data structures.
 *
 *  \see cl_device_id
 */
class Device : public detail::Wrapper<cl_device_id>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Device() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_device_id.
     *
     *  This simply copies the device ID value, which is an inexpensive operation.
     */
    __CL_EXPLICIT_CONSTRUCTORS Device(const cl_device_id &device) : detail::Wrapper<cl_type>(device) { }

    /*! \brief Returns the first device on the default context.
     *
     *  \see Context::getDefault()
     */
    static Device getDefault(cl_int * err = NULL);

    /*! \brief Assignment operator from cl_device_id.
     *
     *  This simply copies the device ID value, which is an inexpensive operation.
     */
    Device& operator = (const cl_device_id& rhs) { detail::Wrapper<cl_type>::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Device(const Device& dev) : detail::Wrapper<cl_type>(dev) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Device& operator = (const Device &dev) { detail::Wrapper<cl_type>::operator=(dev); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Device(Device&& dev) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(dev)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Device& operator = (Device &&dev) { detail::Wrapper<cl_type>::operator=(std::move(dev)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    //! \brief Wrapper for clGetDeviceInfo().
    template <typename T>
    cl_int getInfo(cl_device_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetDeviceInfo, object_, name, param),
            __GET_DEVICE_INFO_ERR);
    }

    //! \brief Wrapper for clGetDeviceInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_device_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_device_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    /**
     * CL 1.2 version
     */
#if defined(CL_VERSION_1_2)
    //! \brief Wrapper for clCreateSubDevices().
    cl_int createSubDevices(
        const cl_device_partition_property * properties,
        VECTOR_CLASS<Device>* devices)
    {
        cl_uint n = 0;
        cl_int err = clCreateSubDevices(object_, properties, 0, NULL, &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_SUB_DEVICES);
        }

        cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id));
        err = clCreateSubDevices(object_, properties, n, ids, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_SUB_DEVICES);
        }

        devices->assign(&ids[0], &ids[n]);
        return CL_SUCCESS;
    }
#endif // #if defined(CL_VERSION_1_2)

    /**
     * CL 1.1 version that uses device fission.
     */
#if defined(CL_VERSION_1_1)
#if defined(USE_CL_DEVICE_FISSION)
    //! \brief Wrapper for clCreateSubDevicesEXT().
    cl_int createSubDevices(
        const cl_device_partition_property_ext * properties,
        VECTOR_CLASS<Device>* devices)
    {
        typedef CL_API_ENTRY cl_int ( CL_API_CALL * PFN_clCreateSubDevicesEXT)(
            cl_device_id /*in_device*/,
            const cl_device_partition_property_ext * /* properties */,
            cl_uint /*num_entries*/,
            cl_device_id * /*out_devices*/,
            cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1;

        static PFN_clCreateSubDevicesEXT pfn_clCreateSubDevicesEXT = NULL;
        __INIT_CL_EXT_FCN_PTR(clCreateSubDevicesEXT);

        cl_uint n = 0;
        cl_int err = pfn_clCreateSubDevicesEXT(object_, properties, 0, NULL, &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_SUB_DEVICES);
        }

        cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id));
        err = pfn_clCreateSubDevicesEXT(object_, properties, n, ids, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_SUB_DEVICES);
        }

        devices->assign(&ids[0], &ids[n]);
        return CL_SUCCESS;
    }
#endif // #if defined(USE_CL_DEVICE_FISSION)
#endif // #if defined(CL_VERSION_1_1)
};
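/*! Usage sketch (editorial addition, not part of the original header):
 *  \code
 *  cl::Device device = cl::Device::getDefault();
 *  STRING_CLASS name = device.getInfo<CL_DEVICE_NAME>();
 *  cl_ulong globalMem;
 *  device.getInfo(CL_DEVICE_GLOBAL_MEM_SIZE, &globalMem);
 *  \endcode
 */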
/*! \brief Class interface for cl_platform_id.
 *
 *  \note Copies of these objects are inexpensive, since they don't 'own'
 *        any underlying resources or data structures.
 *
 *  \see cl_platform_id
 */
class Platform : public detail::Wrapper<cl_platform_id>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Platform() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_platform_id.
     *
     *  This simply copies the platform ID value, which is an inexpensive operation.
     */
    __CL_EXPLICIT_CONSTRUCTORS Platform(const cl_platform_id &platform) : detail::Wrapper<cl_type>(platform) { }

    /*! \brief Assignment operator from cl_platform_id.
     *
     *  This simply copies the platform ID value, which is an inexpensive operation.
     */
    Platform& operator = (const cl_platform_id& rhs) { detail::Wrapper<cl_type>::operator=(rhs); return *this; }

    //! \brief Wrapper for clGetPlatformInfo().
    cl_int getInfo(cl_platform_info name, STRING_CLASS* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetPlatformInfo, object_, name, param),
            __GET_PLATFORM_INFO_ERR);
    }

    //! \brief Wrapper for clGetPlatformInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_platform_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_platform_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    /*! \brief Gets a list of devices for this platform.
     *
     *  Wraps clGetDeviceIDs().
     */
    cl_int getDevices(
        cl_device_type type,
        VECTOR_CLASS<Device>* devices) const
    {
        cl_uint n = 0;
        if( devices == NULL ) {
            return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR);
        }
        cl_int err = ::clGetDeviceIDs(object_, type, 0, NULL, &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_DEVICE_IDS_ERR);
        }

        cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id));
        err = ::clGetDeviceIDs(object_, type, n, ids, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_DEVICE_IDS_ERR);
        }

        devices->assign(&ids[0], &ids[n]);
        return CL_SUCCESS;
    }

#if defined(USE_DX_INTEROP)
    /*! \brief Get the list of available D3D10 devices.
     *
     *  \param d3d_device_source.
     *
     *  \param d3d_object.
     *
     *  \param d3d_device_set.
     *
     *  \param devices returns a vector of OpenCL D3D10 devices found. The cl::Device
     *  values returned in devices can be used to identify a specific OpenCL
     *  device. If \a devices argument is NULL, this argument is ignored.
     *
     *  \return One of the following values:
     *    - CL_SUCCESS if the function is executed successfully.
     *
     *  The application can query specific capabilities of the OpenCL device(s)
     *  returned by cl::getDevices. This can be used by the application to
     *  determine which device(s) to use.
     *
     *  \note In the case that exceptions are enabled and a return value
     *  other than CL_SUCCESS is generated, then a cl::Error exception is
     *  generated.
     */
    cl_int getDevices(
        cl_d3d10_device_source_khr d3d_device_source,
        void *                     d3d_object,
        cl_d3d10_device_set_khr    d3d_device_set,
        VECTOR_CLASS<Device>* devices) const
    {
        typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clGetDeviceIDsFromD3D10KHR)(
            cl_platform_id platform,
            cl_d3d10_device_source_khr d3d_device_source,
            void * d3d_object,
            cl_d3d10_device_set_khr d3d_device_set,
            cl_uint num_entries,
            cl_device_id * devices,
            cl_uint* num_devices);

        if( devices == NULL ) {
            return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR);
        }

        static PFN_clGetDeviceIDsFromD3D10KHR pfn_clGetDeviceIDsFromD3D10KHR = NULL;
        __INIT_CL_EXT_FCN_PTR_PLATFORM(object_, clGetDeviceIDsFromD3D10KHR);

        cl_uint n = 0;
        cl_int err = pfn_clGetDeviceIDsFromD3D10KHR(
            object_, d3d_device_source, d3d_object, d3d_device_set, 0, NULL, &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_DEVICE_IDS_ERR);
        }

        cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id));
        err = pfn_clGetDeviceIDsFromD3D10KHR(
            object_, d3d_device_source, d3d_object, d3d_device_set, n, ids, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_DEVICE_IDS_ERR);
        }

        devices->assign(&ids[0], &ids[n]);
        return CL_SUCCESS;
    }
#endif

    /*! \brief Gets a list of available platforms.
     *
     *  Wraps clGetPlatformIDs().
     */
    static cl_int get(VECTOR_CLASS<Platform>* platforms)
    {
        cl_uint n = 0;

        if( platforms == NULL ) {
            return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_PLATFORM_IDS_ERR);
        }

        cl_int err = ::clGetPlatformIDs(0, NULL, &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
        }

        cl_platform_id* ids = (cl_platform_id*) alloca(n * sizeof(cl_platform_id));
        err = ::clGetPlatformIDs(n, ids, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
        }

        platforms->assign(&ids[0], &ids[n]);
        return CL_SUCCESS;
    }

    /*! \brief Gets the first available platform.
     *
     *  Wraps clGetPlatformIDs(), returning the first result.
     */
    static cl_int get(Platform * platform)
    {
        cl_uint n = 0;

        if( platform == NULL ) {
            return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_PLATFORM_IDS_ERR);
        }

        cl_int err = ::clGetPlatformIDs(0, NULL, &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
        }

        cl_platform_id* ids = (cl_platform_id*) alloca(n * sizeof(cl_platform_id));
        err = ::clGetPlatformIDs(n, ids, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
        }

        *platform = ids[0];
        return CL_SUCCESS;
    }

    /*! \brief Gets the first available platform, returning it by value.
     *
     *  Wraps clGetPlatformIDs(), returning the first result.
     */
    static Platform get(cl_int * errResult = NULL)
    {
        Platform platform;
        cl_uint n = 0;
        cl_int err = ::clGetPlatformIDs(0, NULL, &n);
        if (err != CL_SUCCESS) {
            detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
            if (errResult != NULL) {
                *errResult = err;
            }
            return Platform();
        }

        cl_platform_id* ids = (cl_platform_id*) alloca(n * sizeof(cl_platform_id));
        err = ::clGetPlatformIDs(n, ids, NULL);
        if (err != CL_SUCCESS) {
            detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
            if (errResult != NULL) {
                *errResult = err;
            }
            return Platform();
        }

        return Platform(ids[0]);
    }

    static Platform getDefault(cl_int *errResult = NULL)
    {
        return get(errResult);
    }

#if defined(CL_VERSION_1_2)
    //! \brief Wrapper for clUnloadPlatformCompiler().
    cl_int unloadCompiler()
    {
        return ::clUnloadPlatformCompiler(object_);
    }
#endif // #if defined(CL_VERSION_1_2)
}; // class Platform

/**
 * Deprecated APIs for 1.2
 */
#if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2))
/**
 * Unload the OpenCL compiler.
 * \note Deprecated for OpenCL 1.2. Use Platform::unloadCompiler instead.
 */
inline CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int
UnloadCompiler() CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
inline cl_int
UnloadCompiler()
{
    return ::clUnloadCompiler();
}
#endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2))
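/*! Usage sketch (editorial addition, not part of the original header):
 *  \code
 *  VECTOR_CLASS<cl::Platform> platforms;
 *  cl::Platform::get(&platforms);
 *  for (::size_t i = 0; i < platforms.size(); ++i) {
 *      VECTOR_CLASS<cl::Device> devices;
 *      if (platforms[i].getDevices(CL_DEVICE_TYPE_GPU, &devices) == CL_SUCCESS) {
 *          // pick a platform/device pair here
 *      }
 *  }
 *  \endcode
 */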
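/*! \brief Class interface for cl_context.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_context as the original.  For details, see
 *        clRetainContext() and clReleaseContext().
 *
 *  \see cl_context
 */
class Context : public detail::Wrapper<cl_context>
{
private:
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
    static std::atomic<int> default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    static volatile int default_initialized_;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    static Context default_;
    static volatile cl_int default_error_;

public:
    /*! \brief Constructs a context including a list of specified devices.
     *
     *  Wraps clCreateContext().
     */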
    Context(
        const VECTOR_CLASS<Device>& devices,
        cl_context_properties* properties = NULL,
        void (CL_CALLBACK * notifyFptr)(
            const char *,
            const void *,
            ::size_t,
            void *) = NULL,
        void* data = NULL,
        cl_int* err = NULL)
    {
        cl_int error;

        ::size_t numDevices = devices.size();
        cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id));
        for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) {
            deviceIDs[deviceIndex] = (devices[deviceIndex])();
        }

        object_ = ::clCreateContext(
            properties, (cl_uint) numDevices,
            deviceIDs,
            notifyFptr, data, &error);

        detail::errHandler(error, __CREATE_CONTEXT_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Constructs a context for a single specified device. Wraps clCreateContext().
    Context(
        const Device& device,
        cl_context_properties* properties = NULL,
        void (CL_CALLBACK * notifyFptr)(
            const char *,
            const void *,
            ::size_t,
            void *) = NULL,
        void* data = NULL,
        cl_int* err = NULL)
    {
        cl_int error;

        cl_device_id deviceID = device();

        object_ = ::clCreateContext(
            properties, 1,
            &deviceID,
            notifyFptr, data, &error);

        detail::errHandler(error, __CREATE_CONTEXT_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    /*! \brief Constructs a context including all or a subset of devices of a specified type.
     *
     *  Wraps clCreateContextFromType().
     */
    Context(
        cl_device_type type,
        cl_context_properties* properties = NULL,
        void (CL_CALLBACK * notifyFptr)(
            const char *,
            const void *,
            ::size_t,
            void *) = NULL,
        void* data = NULL,
        cl_int* err = NULL)
    {
        cl_int error;

#if !defined(__APPLE__) && !defined(__MACOS)
        cl_context_properties prop[4] = {CL_CONTEXT_PLATFORM, 0, 0, 0 };

        if (properties == NULL) {
            // Get a valid platform ID as we cannot send in a blank one
            VECTOR_CLASS<Platform> platforms;
            error = Platform::get(&platforms);
            if (error != CL_SUCCESS) {
                detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR);
                if (err != NULL) {
                    *err = error;
                }
                return;
            }

            // Check the platforms we found for a device of our specified type
            cl_context_properties platform_id = 0;
            for (unsigned int i = 0; i < platforms.size(); i++) {

                VECTOR_CLASS<Device> devices;

#if defined(__CL_ENABLE_EXCEPTIONS)
                try {
#endif

                    error = platforms[i].getDevices(type, &devices);

#if defined(__CL_ENABLE_EXCEPTIONS)
                } catch (Error) {}
                // Catch if exceptions are enabled as we don't want to exit if first platform has no devices of type
                // We do error checking next anyway, and can throw there if needed
#endif

                // Only squash CL_SUCCESS and CL_DEVICE_NOT_FOUND
                if (error != CL_SUCCESS && error != CL_DEVICE_NOT_FOUND) {
                    detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR);
                    if (err != NULL) {
                        *err = error;
                    }
                }

                if (devices.size() > 0) {
                    platform_id = (cl_context_properties)platforms[i]();
                    break;
                }
            }

            if (platform_id == 0) {
                detail::errHandler(CL_DEVICE_NOT_FOUND, __CREATE_CONTEXT_FROM_TYPE_ERR);
                if (err != NULL) {
                    *err = CL_DEVICE_NOT_FOUND;
                }
                return;
            }

            prop[1] = platform_id;
            properties = &prop[0];
        }
#endif

        object_ = ::clCreateContextFromType(
            properties, type, notifyFptr, data, &error);

        detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Context(const Context& ctx) : detail::Wrapper<cl_type>(ctx) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Context& operator = (const Context &ctx) { detail::Wrapper<cl_type>::operator=(ctx); return *this; }
#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Context(Context&& ctx) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(ctx)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Context& operator = (Context &&ctx) { detail::Wrapper<cl_type>::operator=(std::move(ctx)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    /*! \brief Returns a singleton context including all devices of CL_DEVICE_TYPE_DEFAULT.
     *
     *  \note All calls to this function return the same cl_context as the first.
     */
    static Context getDefault(cl_int * err = NULL)
    {
        int state = detail::compare_exchange(
            &default_initialized_,
            __DEFAULT_BEING_INITIALIZED, __DEFAULT_NOT_INITIALIZED);

        if (state & __DEFAULT_INITIALIZED) {
            if (err != NULL) {
                *err = default_error_;
            }
            return default_;
        }

        if (state & __DEFAULT_BEING_INITIALIZED) {
            // Assume writes will propagate eventually...
            while(default_initialized_ != __DEFAULT_INITIALIZED) {
                detail::fence();
            }

            if (err != NULL) {
                *err = default_error_;
            }
            return default_;
        }

        cl_int error;
        default_ = Context(
            CL_DEVICE_TYPE_DEFAULT,
            NULL,
            NULL,
            NULL,
            &error);

        detail::fence();

        default_error_ = error;
        // Assume writes will propagate eventually...
        default_initialized_ = __DEFAULT_INITIALIZED;

        detail::fence();

        if (err != NULL) {
            *err = default_error_;
        }
        return default_;
    }

    //! \brief Default constructor - initializes to NULL.
    Context() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_context - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the cl_context
     *  into the new Context object.
     */
    __CL_EXPLICIT_CONSTRUCTORS Context(const cl_context& context) : detail::Wrapper<cl_type>(context) { }

    /*! \brief Assignment operator from cl_context - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseContext() on the value previously held by this instance.
     */
    Context& operator = (const cl_context& rhs) { detail::Wrapper<cl_type>::operator=(rhs); return *this; }

    //! \brief Wrapper for clGetContextInfo().
    template <typename T>
    cl_int getInfo(cl_context_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetContextInfo, object_, name, param),
            __GET_CONTEXT_INFO_ERR);
    }

    //! \brief Wrapper for clGetContextInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_context_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_context_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }
    /*! \brief Gets a list of supported image formats.
     *
     *  Wraps clGetSupportedImageFormats().
     */
    cl_int getSupportedImageFormats(
        cl_mem_flags flags,
        cl_mem_object_type type,
        VECTOR_CLASS<ImageFormat>* formats) const
    {
        cl_uint numEntries;

        if (!formats) {
            return CL_SUCCESS;
        }

        cl_int err = ::clGetSupportedImageFormats(
            object_,
            flags,
            type,
            0,
            NULL,
            &numEntries);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR);
        }

        if (numEntries > 0) {
            ImageFormat* value = (ImageFormat*) alloca(numEntries * sizeof(ImageFormat));
            err = ::clGetSupportedImageFormats(
                object_,
                flags,
                type,
                numEntries,
                (cl_image_format*)value,
                NULL);
            if (err != CL_SUCCESS) {
                return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR);
            }

            formats->assign(&value[0], &value[numEntries]);
        }
        else {
            formats->clear();
        }
        return CL_SUCCESS;
    }
};

inline Device Device::getDefault(cl_int * err)
{
    cl_int error;
    Device device;

    Context context = Context::getDefault(&error);
    detail::errHandler(error, __CREATE_CONTEXT_ERR);

    if (error != CL_SUCCESS) {
        if (err != NULL) {
            *err = error;
        }
    }
    else {
        device = context.getInfo<CL_CONTEXT_DEVICES>()[0];
        if (err != NULL) {
            *err = CL_SUCCESS;
        }
    }

    return device;
}

#ifdef _WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) std::atomic<int> Context::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) volatile int Context::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) Context Context::default_;
__declspec(selectany) volatile cl_int Context::default_error_ = CL_SUCCESS;
#else // !_WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) std::atomic<int> Context::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) volatile int Context::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) Context Context::default_;
__attribute__((weak)) volatile cl_int Context::default_error_ = CL_SUCCESS;
#endif // !_WIN32
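/*! Usage sketch (editorial addition, not part of the original header):
 *  \code
 *  cl_int err;
 *  cl::Context context(CL_DEVICE_TYPE_GPU, NULL, NULL, NULL, &err);
 *  VECTOR_CLASS<cl::ImageFormat> formats;
 *  context.getSupportedImageFormats(CL_MEM_READ_ONLY, CL_MEM_OBJECT_IMAGE2D, &formats);
 *  \endcode
 */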
/*! \brief Class interface for cl_event.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_event as the original.  For details, see
 *        clRetainEvent() and clReleaseEvent().
 *
 *  \see cl_event
 */
class Event : public detail::Wrapper<cl_event>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Event() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_event - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the cl_event
     *  into the new Event object.
     */
    __CL_EXPLICIT_CONSTRUCTORS Event(const cl_event& event) : detail::Wrapper<cl_type>(event) { }

    /*! \brief Assignment operator from cl_event - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseEvent() on the value previously held by this instance.
     */
    Event& operator = (const cl_event& rhs) { detail::Wrapper<cl_type>::operator=(rhs); return *this; }

    //! \brief Wrapper for clGetEventInfo().
    template <typename T>
    cl_int getInfo(cl_event_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetEventInfo, object_, name, param),
            __GET_EVENT_INFO_ERR);
    }

    //! \brief Wrapper for clGetEventInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_event_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_event_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    //! \brief Wrapper for clGetEventProfilingInfo().
    template <typename T>
    cl_int getProfilingInfo(cl_profiling_info name, T* param) const
    {
        return detail::errHandler(detail::getInfo(
            &::clGetEventProfilingInfo, object_, name, param),
            __GET_EVENT_PROFILE_INFO_ERR);
    }

    //! \brief Wrapper for clGetEventProfilingInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_profiling_info, name>::param_type
    getProfilingInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_profiling_info, name>::param_type param;
        cl_int result = getProfilingInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    /*! \brief Blocks the calling thread until this event completes.
     *
     *  Wraps clWaitForEvents().
     */
    cl_int wait() const
    {
        return detail::errHandler(
            ::clWaitForEvents(1, &object_),
            __WAIT_FOR_EVENTS_ERR);
    }

#if defined(CL_VERSION_1_1)
    /*! \brief Registers a user callback function for a specific command execution status.
     *
     *  Wraps clSetEventCallback().
     */
    cl_int setCallback(
        cl_int type,
        void (CL_CALLBACK * pfn_notify)(cl_event, cl_int, void *),
        void * user_data = NULL)
    {
        return detail::errHandler(
            ::clSetEventCallback(
                object_,
                type,
                pfn_notify,
                user_data),
            __SET_EVENT_CALLBACK_ERR);
    }
#endif

    /*! \brief Blocks the calling thread until every event specified is complete.
     *
     *  Wraps clWaitForEvents().
     */
    static cl_int waitForEvents(const VECTOR_CLASS<Event>& events)
    {
        return detail::errHandler(
            ::clWaitForEvents(
                (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL),
            __WAIT_FOR_EVENTS_ERR);
    }
};

#if defined(CL_VERSION_1_1)
/*! \brief Class interface for user events (a subset of cl_event's).
 *
 *  See Event for details about copy semantics, etc.
 */
class UserEvent : public Event
{
public:
    /*! \brief Constructs a user event on a given context.
     *
     *  Wraps clCreateUserEvent().
     */
    UserEvent(
        const Context& context,
        cl_int * err = NULL)
    {
        cl_int error;
        object_ = ::clCreateUserEvent(
            context(),
            &error);

        detail::errHandler(error, __CREATE_USER_EVENT_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    UserEvent() : Event() { }

    /*! \brief Sets the execution status of a user event object.
     *
     *  Wraps clSetUserEventStatus().
     */
    cl_int setStatus(cl_int status)
    {
        return detail::errHandler(
            ::clSetUserEventStatus(object_,status),
            __SET_USER_EVENT_STATUS_ERR);
    }
};
#endif

/*! \brief Blocks the calling thread until every event specified is complete.
 *
 *  Wraps clWaitForEvents().
 */
inline static cl_int
WaitForEvents(const VECTOR_CLASS<Event>& events)
{
    return detail::errHandler(
        ::clWaitForEvents(
            (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL),
        __WAIT_FOR_EVENTS_ERR);
}
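/*! Usage sketch (editorial addition, not part of the original header;
 *  assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE):
 *  \code
 *  cl::Event event;
 *  // ... enqueue work that signals 'event' ...
 *  event.wait();
 *  cl_ulong start = event.getProfilingInfo<CL_PROFILING_COMMAND_START>();
 *  cl_ulong end   = event.getProfilingInfo<CL_PROFILING_COMMAND_END>();
 *  \endcode
 */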
/*! \brief Class interface for cl_mem.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_mem as the original.  For details, see
 *        clRetainMemObject() and clReleaseMemObject().
 *
 *  \see cl_mem
 */
class Memory : public detail::Wrapper<cl_mem>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Memory() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the cl_mem
     *  into the new Memory object.
     */
    __CL_EXPLICIT_CONSTRUCTORS Memory(const cl_mem& memory) : detail::Wrapper<cl_type>(memory) { }

    /*! \brief Assignment operator from cl_mem - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseMemObject() on the value previously held by this instance.
     */
    Memory& operator = (const cl_mem& rhs) { detail::Wrapper<cl_type>::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Memory(const Memory& mem) : detail::Wrapper<cl_type>(mem) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Memory& operator = (const Memory &mem) { detail::Wrapper<cl_type>::operator=(mem); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Memory(Memory&& mem) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(mem)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Memory& operator = (Memory &&mem) { detail::Wrapper<cl_type>::operator=(std::move(mem)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    //! \brief Wrapper for clGetMemObjectInfo().
    template <typename T>
    cl_int getInfo(cl_mem_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetMemObjectInfo, object_, name, param),
            __GET_MEM_OBJECT_INFO_ERR);
    }

    //! \brief Wrapper for clGetMemObjectInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_mem_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_mem_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

#if defined(CL_VERSION_1_1)
    /*! \brief Registers a callback function to be called when the memory object
     *         is no longer needed.
     *
     *  Wraps clSetMemObjectDestructorCallback().
     *
     *  Repeated calls to this function, for a given cl_mem value, will append
     *  to the list of functions called (in reverse order) when memory object's
     *  resources are freed and the memory object is deleted.
     *
     *  \note
     *  The registered callbacks are associated with the underlying cl_mem
     *  value - not the Memory class instance.
     */
    cl_int setDestructorCallback(
        void (CL_CALLBACK * pfn_notify)(cl_mem, void *),
        void * user_data = NULL)
    {
        return detail::errHandler(
            ::clSetMemObjectDestructorCallback(
                object_,
                pfn_notify,
                user_data),
            __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR);
    }
#endif
};

// Pre-declare copy functions
class Buffer;
template< typename IteratorType >
cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer );
template< typename IteratorType >
cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator );
template< typename IteratorType >
cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer );
template< typename IteratorType >
cl_int copy( const CommandQueue &queue, const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator );
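/*! Usage sketch (editorial addition, not part of the original header;
 *  'onRelease' is a hypothetical callback name for illustration):
 *  \code
 *  void CL_CALLBACK onRelease(cl_mem mem, void* user_data)
 *  {
 *      // called when the underlying cl_mem is destroyed
 *  }
 *  // memory.setDestructorCallback(&onRelease, NULL);
 *  \endcode
 */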
/*! \brief Class interface for Buffer Memory Objects.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Buffer : public Memory
{
public:
    /*! \brief Constructs a Buffer in a specified context.
     *
     *  Wraps clCreateBuffer().
     *
     *  \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was
     *                  specified.  Note alignment & exclusivity requirements.
     */
    Buffer(
        const Context& context,
        cl_mem_flags flags,
        ::size_t size,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error);

        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    /*! \brief Constructs a Buffer in the default context.
     *
     *  Wraps clCreateBuffer().
     *
     *  \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was
     *                  specified.  Note alignment & exclusivity requirements.
     *
     *  \see Context::getDefault()
     */
    Buffer(
        cl_mem_flags flags,
        ::size_t size,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;

        Context context = Context::getDefault(err);

        object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error);

        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    /*!
     * \brief Construct a Buffer from a host container via iterators.
     * IteratorType must be random access.
     * If useHostPtr is specified iterators must represent contiguous data.
     */
    template< typename IteratorType >
    Buffer(
        IteratorType startIterator,
        IteratorType endIterator,
        bool readOnly,
        bool useHostPtr = false,
        cl_int* err = NULL)
    {
        typedef typename std::iterator_traits<IteratorType>::value_type DataType;
        cl_int error;

        cl_mem_flags flags = 0;
        if( readOnly ) {
            flags |= CL_MEM_READ_ONLY;
        }
        else {
            flags |= CL_MEM_READ_WRITE;
        }
        if( useHostPtr ) {
            flags |= CL_MEM_USE_HOST_PTR;
        }

        ::size_t size = sizeof(DataType)*(endIterator - startIterator);

        Context context = Context::getDefault(err);

        if( useHostPtr ) {
            object_ = ::clCreateBuffer(context(), flags, size, static_cast<DataType*>(&*startIterator), &error);
        } else {
            object_ = ::clCreateBuffer(context(), flags, size, 0, &error);
        }

        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }

        if( !useHostPtr ) {
            error = cl::copy(startIterator, endIterator, *this);
            detail::errHandler(error, __CREATE_BUFFER_ERR);
            if (err != NULL) {
                *err = error;
            }
        }
    }

    /*!
     * \brief Construct a Buffer from a host container via iterators using a specified context.
     * IteratorType must be random access.
     * If useHostPtr is specified iterators must represent contiguous data.
     */
    template< typename IteratorType >
    Buffer(const Context &context, IteratorType startIterator, IteratorType endIterator,
        bool readOnly, bool useHostPtr = false, cl_int* err = NULL);

    /*!
     * \brief Construct a Buffer from a host container via iterators using a specified queue.
     * If useHostPtr is specified iterators must represent contiguous data.
     */
    template< typename IteratorType >
    Buffer(const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator,
        bool readOnly, bool useHostPtr = false, cl_int* err = NULL);

    //! \brief Default constructor - initializes to NULL.
    Buffer() : Memory() { }

    //! \brief Constructor from cl_mem - takes ownership. See Memory for further details.
    __CL_EXPLICIT_CONSTRUCTORS Buffer(const cl_mem& buffer) : Memory(buffer) { }

    //! \brief Assignment from cl_mem - performs shallow copy. See Memory for further details.
    Buffer& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Buffer(const Buffer& buf) : Memory(buf) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Buffer& operator = (const Buffer &buf) { Memory::operator=(buf); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Buffer(Buffer&& buf) CL_HPP_NOEXCEPT : Memory(std::move(buf)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Buffer& operator = (Buffer &&buf) { Memory::operator=(std::move(buf)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

#if defined(CL_VERSION_1_1)
    /*! \brief Creates a new buffer object from this.
     *
     *  Wraps clCreateSubBuffer().
     */
    Buffer createSubBuffer(
        cl_mem_flags flags,
        cl_buffer_create_type buffer_create_type,
        const void * buffer_create_info,
        cl_int * err = NULL)
    {
        Buffer result;
        cl_int error;
        result.object_ = ::clCreateSubBuffer(
            object_,
            flags,
            buffer_create_type,
            buffer_create_info,
            &error);

        detail::errHandler(error, __CREATE_SUBBUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }

        return result;
    }
#endif
};

#if defined (USE_DX_INTEROP)
/*! \brief Class interface for creating OpenCL buffers from ID3D10Buffer's.
 *
 *  This is provided to facilitate interoperability with Direct3D.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class BufferD3D10 : public Buffer
{
public:
    typedef CL_API_ENTRY cl_mem (CL_API_CALL *PFN_clCreateFromD3D10BufferKHR)(
        cl_context context, cl_mem_flags flags, ID3D10Buffer* buffer,
        cl_int* errcode_ret);

    /*! \brief Constructs a BufferD3D10, in a specified context, from a
     *         given ID3D10Buffer.
     *
     *  Wraps clCreateFromD3D10BufferKHR().
     */
    BufferD3D10(
        const Context& context,
        cl_mem_flags flags,
        ID3D10Buffer* bufobj,
        cl_int * err = NULL)
    {
        static PFN_clCreateFromD3D10BufferKHR pfn_clCreateFromD3D10BufferKHR = NULL;

#if defined(CL_VERSION_1_2)
        VECTOR_CLASS<cl_context_properties> props = context.getInfo<CL_CONTEXT_PROPERTIES>();
        cl_platform_id platform = NULL;
        for( ::size_t i = 0; i < props.size(); ++i ) {
            if( props[i] == CL_CONTEXT_PLATFORM ) {
                platform = (cl_platform_id)props[i+1];
            }
        }
        __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clCreateFromD3D10BufferKHR);
#endif
#if defined(CL_VERSION_1_1)
        __INIT_CL_EXT_FCN_PTR(clCreateFromD3D10BufferKHR);
#endif

        cl_int error;
        object_ = pfn_clCreateFromD3D10BufferKHR(
            context(),
            flags,
            bufobj,
            &error);

        detail::errHandler(error, __CREATE_GL_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    BufferD3D10() : Buffer() { }

    //! \brief Constructor from cl_mem - takes ownership. See Memory for further details.
    __CL_EXPLICIT_CONSTRUCTORS BufferD3D10(const cl_mem& buffer) : Buffer(buffer) { }

    //! \brief Assignment from cl_mem - performs shallow copy. See Memory for further details.
    BufferD3D10& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    BufferD3D10(const BufferD3D10& buf) : Buffer(buf) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    BufferD3D10& operator = (const BufferD3D10 &buf) { Buffer::operator=(buf); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    BufferD3D10(BufferD3D10&& buf) CL_HPP_NOEXCEPT : Buffer(std::move(buf)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    BufferD3D10& operator = (BufferD3D10 &&buf) { Buffer::operator=(std::move(buf)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
#endif
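/*! Usage sketch (editorial addition, not part of the original header):
 *  \code
 *  std::vector<float> host(1024, 0.0f);
 *  // Creates a read-only device buffer in the default context; when
 *  // useHostPtr is false (the default) the range is copied via cl::copy().
 *  cl::Buffer input(host.begin(), host.end(), true);
 *  \endcode
 */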
/*! \brief Class interface for GL Buffer Memory Objects.
 *
 *  This is provided to facilitate interoperability with OpenGL.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class BufferGL : public Buffer
{
public:
    /*! \brief Constructs a BufferGL in a specified context, from a given
     *         GL buffer.
     *
     *  Wraps clCreateFromGLBuffer().
     */
    BufferGL(
        const Context& context,
        cl_mem_flags flags,
        cl_GLuint bufobj,
        cl_int * err = NULL)
    {
        cl_int error;
        object_ = ::clCreateFromGLBuffer(
            context(),
            flags,
            bufobj,
            &error);

        detail::errHandler(error, __CREATE_GL_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    BufferGL() : Buffer() { }

    //! \brief Constructor from cl_mem - takes ownership. See Memory for further details.
    __CL_EXPLICIT_CONSTRUCTORS BufferGL(const cl_mem& buffer) : Buffer(buffer) { }

    //! \brief Assignment from cl_mem - performs shallow copy. See Memory for further details.
    BufferGL& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    BufferGL(const BufferGL& buf) : Buffer(buf) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    BufferGL& operator = (const BufferGL &buf) { Buffer::operator=(buf); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    BufferGL(BufferGL&& buf) CL_HPP_NOEXCEPT : Buffer(std::move(buf)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    BufferGL& operator = (BufferGL &&buf) { Buffer::operator=(std::move(buf)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    //! \brief Wrapper for clGetGLObjectInfo().
    cl_int getObjectInfo(
        cl_gl_object_type *type,
        cl_GLuint * gl_object_name)
    {
        return detail::errHandler(
            ::clGetGLObjectInfo(object_,type,gl_object_name),
            __GET_GL_OBJECT_INFO_ERR);
    }
};

/*! \brief C++ base class for Image Memory objects.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Image : public Memory
{
protected:
    //! \brief Default constructor - initializes to NULL.
    Image() : Memory() { }

    //! \brief Constructor from cl_mem - takes ownership. See Memory for further details.
    __CL_EXPLICIT_CONSTRUCTORS Image(const cl_mem& image) : Memory(image) { }

    //! \brief Assignment from cl_mem - performs shallow copy. See Memory for further details.
    Image& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Image(const Image& img) : Memory(img) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Image& operator = (const Image &img) { Memory::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Image(Image&& img) CL_HPP_NOEXCEPT : Memory(std::move(img)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Image& operator = (Image &&img) { Memory::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

public:
    //! \brief Wrapper for clGetImageInfo().
    template <typename T>
    cl_int getImageInfo(cl_image_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetImageInfo, object_, name, param),
            __GET_IMAGE_INFO_ERR);
    }

    //! \brief Wrapper for clGetImageInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_image_info, name>::param_type
    getImageInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_image_info, name>::param_type param;
        cl_int result = getImageInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }
};
#if defined(CL_VERSION_1_2)
/*! \brief Class interface for 1D Image Memory objects.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Image1D : public Image
{
public:
    /*! \brief Constructs a 1D Image in a specified context.
     *
     *  Wraps clCreateImage().
     */
    Image1D(
        const Context& context,
        cl_mem_flags flags,
        ImageFormat format,
        ::size_t width,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        cl_image_desc desc =
        {
            CL_MEM_OBJECT_IMAGE1D,
            width,
            0, 0, 0, 0, 0, 0, 0, 0
        };
        object_ = ::clCreateImage(
            context(),
            flags,
            &format,
            &desc,
            host_ptr,
            &error);

        detail::errHandler(error, __CREATE_IMAGE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    Image1D() { }

    //! \brief Constructor from cl_mem - takes ownership. See Memory for further details.
    __CL_EXPLICIT_CONSTRUCTORS Image1D(const cl_mem& image1D) : Image(image1D) { }

    //! \brief Assignment from cl_mem - performs shallow copy. See Memory for further details.
    Image1D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Image1D(const Image1D& img) : Image(img) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Image1D& operator = (const Image1D &img) { Image::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Image1D(Image1D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Image1D& operator = (Image1D &&img) { Image::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
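/*! Usage sketch (editorial addition, not part of the original header):
 *  \code
 *  cl::ImageFormat format(CL_R, CL_FLOAT);
 *  cl::Image1D image(context, CL_MEM_READ_ONLY, format, 1024);
 *  \endcode
 */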
/*! \class Image1DBuffer
 * \brief Image interface for 1D buffer images.
 */
class Image1DBuffer : public Image
{
public:
    Image1DBuffer(
        const Context& context,
        cl_mem_flags flags,
        ImageFormat format,
        ::size_t width,
        const Buffer &buffer,
        cl_int* err = NULL)
    {
        cl_int error;
        cl_image_desc desc =
        {
            CL_MEM_OBJECT_IMAGE1D_BUFFER,
            width,
            0, 0, 0, 0, 0, 0, 0,
            buffer()
        };
        object_ = ::clCreateImage(
            context(),
            flags,
            &format,
            &desc,
            NULL,
            &error);

        detail::errHandler(error, __CREATE_IMAGE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    Image1DBuffer() { }

    __CL_EXPLICIT_CONSTRUCTORS Image1DBuffer(const cl_mem& image1D) : Image(image1D) { }

    Image1DBuffer& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Image1DBuffer(const Image1DBuffer& img) : Image(img) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Image1DBuffer& operator = (const Image1DBuffer &img) { Image::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Image1DBuffer(Image1DBuffer&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Image1DBuffer& operator = (Image1DBuffer &&img) { Image::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};

/*! \class Image1DArray
 * \brief Image interface for arrays of 1D images.
 */
class Image1DArray : public Image
{
public:
    Image1DArray(
        const Context& context,
        cl_mem_flags flags,
        ImageFormat format,
        ::size_t arraySize,
        ::size_t width,
        ::size_t rowPitch,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        cl_image_desc desc =
        {
            CL_MEM_OBJECT_IMAGE1D_ARRAY,
            width,
            0, 0,  // height, depth (unused)
            arraySize,
            rowPitch,
            0, 0, 0, 0
        };
        object_ = ::clCreateImage(
            context(),
            flags,
            &format,
            &desc,
            host_ptr,
            &error);

        detail::errHandler(error, __CREATE_IMAGE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    Image1DArray() { }

    __CL_EXPLICIT_CONSTRUCTORS Image1DArray(const cl_mem& imageArray) : Image(imageArray) { }

    Image1DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Image1DArray(const Image1DArray& img) : Image(img) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Image1DArray& operator = (const Image1DArray &img) { Image::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Image1DArray(Image1DArray&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Image1DArray& operator = (Image1DArray &&img) { Image::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
#endif // #if defined(CL_VERSION_1_2)
/*! \brief Class interface for 2D Image Memory objects.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Image2D : public Image
{
public:
    /*! \brief Constructs a 2D Image in a specified context.
     *
     *  Wraps clCreateImage().
     */
    Image2D(
        const Context& context,
        cl_mem_flags flags,
        ImageFormat format,
        ::size_t width,
        ::size_t height,
        ::size_t row_pitch = 0,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        bool useCreateImage;

#if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
        // Run-time decision based on the actual platform
        {
            cl_uint version = detail::getContextPlatformVersion(context());
            useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above
        }
#elif defined(CL_VERSION_1_2)
        useCreateImage = true;
#else
        useCreateImage = false;
#endif

#if defined(CL_VERSION_1_2)
        if (useCreateImage)
        {
            cl_image_desc desc =
            {
                CL_MEM_OBJECT_IMAGE2D,
                width,
                height,
                0, 0,  // depth, array size (unused)
                row_pitch,
                0, 0, 0, 0
            };
            object_ = ::clCreateImage(
                context(),
                flags,
                &format,
                &desc,
                host_ptr,
                &error);

            detail::errHandler(error, __CREATE_IMAGE_ERR);
            if (err != NULL) {
                *err = error;
            }
        }
#endif // #if defined(CL_VERSION_1_2)
#if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
        if (!useCreateImage)
        {
            object_ = ::clCreateImage2D(
                context(), flags, &format, width, height, row_pitch, host_ptr, &error);

            detail::errHandler(error, __CREATE_IMAGE2D_ERR);
            if (err != NULL) {
                *err = error;
            }
        }
#endif // #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
    }

    //! \brief Default constructor - initializes to NULL.
    Image2D() { }

    //! \brief Constructor from cl_mem - takes ownership. See Memory for further details.
    __CL_EXPLICIT_CONSTRUCTORS Image2D(const cl_mem& image2D) : Image(image2D) { }

    //! \brief Assignment from cl_mem - performs shallow copy. See Memory for further details.
    Image2D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Image2D(const Image2D& img) : Image(img) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Image2D& operator = (const Image2D &img) { Image::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Image2D(Image2D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Image2D& operator = (Image2D &&img) { Image::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};

#if !defined(CL_VERSION_1_2)
/*! \brief Class interface for GL 2D Image Memory objects.
 *
 *  This is provided to facilitate interoperability with OpenGL.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 *  \note Deprecated for OpenCL 1.2. Please use ImageGL instead.
 */
class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED Image2DGL CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED : public Image2D
{
public:
    /*! \brief Constructs an Image2DGL in a specified context, from a given
     *         GL Texture.
     *
     *  Wraps clCreateFromGLTexture2D().
     */
    Image2DGL(
        const Context& context,
        cl_mem_flags flags,
        cl_GLenum target,
        cl_GLint  miplevel,
        cl_GLuint texobj,
        cl_int * err = NULL)
    {
        cl_int error;
        object_ = ::clCreateFromGLTexture2D(
            context(),
            flags,
            target,
            miplevel,
            texobj,
            &error);

        detail::errHandler(error, __CREATE_GL_TEXTURE_2D_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    Image2DGL() : Image2D() { }

    //! \brief Constructor from cl_mem - takes ownership. See Memory for further details.
    __CL_EXPLICIT_CONSTRUCTORS Image2DGL(const cl_mem& image) : Image2D(image) { }

    //! \brief Assignment from cl_mem - performs shallow copy. See Memory for further details.
    Image2DGL& operator = (const cl_mem& rhs) { Image2D::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Image2DGL(const Image2DGL& img) : Image2D(img) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Image2DGL& operator = (const Image2DGL &img) { Image2D::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Image2DGL(Image2DGL&& img) CL_HPP_NOEXCEPT : Image2D(std::move(img)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Image2DGL& operator = (Image2DGL &&img) { Image2D::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
#endif // #if !defined(CL_VERSION_1_2)
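/*! Usage sketch (editorial addition, not part of the original header):
 *  \code
 *  cl::ImageFormat format(CL_RGBA, CL_UNORM_INT8);
 *  // Picks clCreateImage() or the deprecated clCreateImage2D() at run time,
 *  // depending on the platform version, as implemented above.
 *  cl::Image2D image(context, CL_MEM_READ_WRITE, format, 640, 480);
 *  \endcode
 */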
#if defined(CL_VERSION_1_2)
/*! \class Image2DArray
 * \brief Image interface for arrays of 2D images.
 */
class Image2DArray : public Image
{
public:
    Image2DArray(
        const Context& context,
        cl_mem_flags flags,
        ImageFormat format,
        ::size_t arraySize,
        ::size_t width,
        ::size_t height,
        ::size_t rowPitch,
        ::size_t slicePitch,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        cl_image_desc desc =
        {
            CL_MEM_OBJECT_IMAGE2D_ARRAY,
            width,
            height,
            0,  // depth (unused)
            arraySize,
            rowPitch,
            slicePitch,
            0, 0, 0
        };
        object_ = ::clCreateImage(
            context(),
            flags,
            &format,
            &desc,
            host_ptr,
            &error);

        detail::errHandler(error, __CREATE_IMAGE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    Image2DArray() { }

    __CL_EXPLICIT_CONSTRUCTORS Image2DArray(const cl_mem& imageArray) : Image(imageArray) { }

    Image2DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Image2DArray(const Image2DArray& img) : Image(img) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Image2DArray& operator = (const Image2DArray &img) { Image::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Image2DArray(Image2DArray&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Image2DArray& operator = (Image2DArray &&img) { Image::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
#endif // #if defined(CL_VERSION_1_2)
/*! \brief Class interface for 3D Image Memory objects.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Image3D : public Image
{
public:
    /*! \brief Constructs a 3D Image in a specified context.
     *
     *  Wraps clCreateImage().
     */
    Image3D(
        const Context& context,
        cl_mem_flags flags,
        ImageFormat format,
        ::size_t width,
        ::size_t height,
        ::size_t depth,
        ::size_t row_pitch = 0,
        ::size_t slice_pitch = 0,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        bool useCreateImage;

#if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
        // Run-time decision based on the actual platform
        {
            cl_uint version = detail::getContextPlatformVersion(context());
            useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above
        }
#elif defined(CL_VERSION_1_2)
        useCreateImage = true;
#else
        useCreateImage = false;
#endif

#if defined(CL_VERSION_1_2)
        if (useCreateImage)
        {
            cl_image_desc desc =
            {
                CL_MEM_OBJECT_IMAGE3D,
                width,
                height,
                depth,
                0,  // array size (unused)
                row_pitch,
                slice_pitch,
                0, 0, 0
            };
            object_ = ::clCreateImage(
                context(),
                flags,
                &format,
                &desc,
                host_ptr,
                &error);

            detail::errHandler(error, __CREATE_IMAGE_ERR);
            if (err != NULL) {
                *err = error;
            }
        }
#endif // #if defined(CL_VERSION_1_2)
#if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
        if (!useCreateImage)
        {
            object_ = ::clCreateImage3D(
                context(), flags, &format, width, height, depth, row_pitch,
                slice_pitch, host_ptr, &error);

            detail::errHandler(error, __CREATE_IMAGE3D_ERR);
            if (err != NULL) {
                *err = error;
            }
        }
#endif // #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
    }

    //! \brief Default constructor - initializes to NULL.
    Image3D() : Image() { }

    //! \brief Constructor from cl_mem - takes ownership. See Memory for further details.
    __CL_EXPLICIT_CONSTRUCTORS Image3D(const cl_mem& image3D) : Image(image3D) { }

    //! \brief Assignment from cl_mem - performs shallow copy. See Memory for further details.
    Image3D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Image3D(const Image3D& img) : Image(img) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Image3D& operator = (const Image3D &img) { Image::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Image3D(Image3D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Image3D& operator = (Image3D &&img) { Image::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};

#if !defined(CL_VERSION_1_2)
/*! \brief Class interface for GL 3D Image Memory objects.
 *
 *  This is provided to facilitate interoperability with OpenGL.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Image3DGL : public Image3D
{
public:
    /*! \brief Constructs an Image3DGL in a specified context, from a given
     *         GL Texture.
     *
     *  Wraps clCreateFromGLTexture3D().
     */
    Image3DGL(
        const Context& context,
        cl_mem_flags flags,
        cl_GLenum target,
        cl_GLint  miplevel,
        cl_GLuint texobj,
        cl_int * err = NULL)
    {
        cl_int error;
        object_ = ::clCreateFromGLTexture3D(
            context(),
            flags,
            target,
            miplevel,
            texobj,
            &error);

        detail::errHandler(error, __CREATE_GL_TEXTURE_3D_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    Image3DGL() : Image3D() { }

    //! \brief Constructor from cl_mem - takes ownership. See Memory for further details.
    __CL_EXPLICIT_CONSTRUCTORS Image3DGL(const cl_mem& image) : Image3D(image) { }

    //! \brief Assignment from cl_mem - performs shallow copy. See Memory for further details.
    Image3DGL& operator = (const cl_mem& rhs) { Image3D::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    Image3DGL(const Image3DGL& img) : Image3D(img) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    Image3DGL& operator = (const Image3DGL &img) { Image3D::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    Image3DGL(Image3DGL&& img) CL_HPP_NOEXCEPT : Image3D(std::move(img)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    Image3DGL& operator = (Image3DGL &&img) { Image3D::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
#endif // #if !defined(CL_VERSION_1_2)
#if defined(CL_VERSION_1_2)
/*! \class ImageGL
 * \brief General image interface for GL interop.
 * We abstract the 2D and 3D GL images into a single instance here
 * that wraps all GL sourced images on the grounds that setup information
 * was performed by OpenGL anyway.
 */
class ImageGL : public Image
{
public:
    ImageGL(
        const Context& context,
        cl_mem_flags flags,
        cl_GLenum target,
        cl_GLint  miplevel,
        cl_GLuint texobj,
        cl_int * err = NULL)
    {
        cl_int error;
        object_ = ::clCreateFromGLTexture(
            context(),
            flags,
            target,
            miplevel,
            texobj,
            &error);

        detail::errHandler(error, __CREATE_GL_TEXTURE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    ImageGL() : Image() { }

    __CL_EXPLICIT_CONSTRUCTORS ImageGL(const cl_mem& image) : Image(image) { }

    ImageGL& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; }

    //! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC.
    ImageGL(const ImageGL& img) : Image(img) {}

    //! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC.
    ImageGL& operator = (const ImageGL &img) { Image::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Move constructor to forward move to the superclass correctly. Required for MSVC.
    ImageGL(ImageGL&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {}

    //! \brief Move assignment to forward move to the superclass correctly. Required for MSVC.
    ImageGL& operator = (ImageGL &&img) { Image::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
#endif // #if defined(CL_VERSION_1_2)
    BufferRenderGL& operator = (const BufferRenderGL &rhs)
    {
#if defined(CL_VERSION_1_2)
        ImageGL::operator=(rhs);
#else // #if defined(CL_VERSION_1_2)
        Image2DGL::operator=(rhs);
#endif //#if defined(CL_VERSION_1_2)
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     *  Required for MSVC. */
#if defined(CL_VERSION_1_2)
    BufferRenderGL(BufferRenderGL&& buf) CL_HPP_NOEXCEPT : ImageGL(std::move(buf)) {}
#else // #if defined(CL_VERSION_1_2)
    BufferRenderGL(BufferRenderGL&& buf) CL_HPP_NOEXCEPT : Image2DGL(std::move(buf)) {}
#endif //#if defined(CL_VERSION_1_2)

    /*! \brief Move assignment to forward move to the superclass correctly.
     *  Required for MSVC. */
    BufferRenderGL& operator = (BufferRenderGL &&buf)
    {
#if defined(CL_VERSION_1_2)
        ImageGL::operator=(std::move(buf));
#else // #if defined(CL_VERSION_1_2)
        Image2DGL::operator=(std::move(buf));
#endif //#if defined(CL_VERSION_1_2)
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    //! \brief Wrapper for clGetGLObjectInfo().
    cl_int getObjectInfo(cl_gl_object_type *type, cl_GLuint * gl_object_name)
    {
        return detail::errHandler(
            ::clGetGLObjectInfo(object_, type, gl_object_name),
            __GET_GL_OBJECT_INFO_ERR);
    }
};

/*! \brief Class interface for cl_sampler.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *  to the same underlying cl_sampler as the original. For details, see
 *  clRetainSampler() and clReleaseSampler().
 *  \see cl_sampler */
class Sampler : public detail::Wrapper<cl_sampler>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Sampler() { }

    /*! \brief Constructs a Sampler in a specified context.
     *  Wraps clCreateSampler(). */
    Sampler(const Context& context, cl_bool normalized_coords,
        cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int* err = NULL)
    {
        cl_int error;
        object_ = ::clCreateSampler(context(), normalized_coords, addressing_mode, filter_mode, &error);
        detail::errHandler(error, __CREATE_SAMPLER_ERR);
        if (err != NULL) { *err = error; }
    }

    /*! \brief Constructor from cl_sampler - takes ownership.
     *  This effectively transfers ownership of a refcount on the cl_sampler
     *  into the new Sampler object. */
    __CL_EXPLICIT_CONSTRUCTORS Sampler(const cl_sampler& sampler) : detail::Wrapper<cl_type>(sampler) { }

    /*! \brief Assignment operator from cl_sampler - takes ownership.
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseSampler() on the value previously held by this instance. */
    Sampler& operator = (const cl_sampler& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     *  Required for MSVC. */
    Sampler(const Sampler& sam) : detail::Wrapper<cl_type>(sam) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     *  Required for MSVC. */
    Sampler& operator = (const Sampler &sam)
    {
        detail::Wrapper<cl_type>::operator=(sam);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     *  Required for MSVC. */
    Sampler(Sampler&& sam) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(sam)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     *  Required for MSVC. */
    Sampler& operator = (Sampler &&sam)
    {
        detail::Wrapper<cl_type>::operator=(std::move(sam));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    //! \brief Wrapper for clGetSamplerInfo().
    template <typename T>
    cl_int getInfo(cl_sampler_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetSamplerInfo, object_, name, param),
            __GET_SAMPLER_INFO_ERR);
    }

    //! \brief Wrapper for clGetSamplerInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_sampler_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_sampler_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) { *err = result; }
        return param;
    }
};
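/*! A minimal usage sketch for Sampler (illustrative only; the error-handling
 *  style and the chosen sampler settings are assumptions, not requirements):
 *  \code
 *  cl_int err;
 *  cl::Context context = cl::Context::getDefault(&err);
 *  // Normalized coordinates, clamp-to-edge addressing, linear filtering.
 *  cl::Sampler sampler(context, CL_TRUE, CL_ADDRESS_CLAMP_TO_EDGE,
 *                      CL_FILTER_LINEAR, &err);
 *  cl_bool normalized = sampler.getInfo<CL_SAMPLER_NORMALIZED_COORDS>();
 *  \endcode
 */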
class Program;
class CommandQueue;
class Kernel;

//! \brief Class interface for specifying NDRange values.
class NDRange
{
private:
    size_t<3> sizes_;
    cl_uint dimensions_;

public:
    //! \brief Default constructor - resulting range has zero dimensions.
    NDRange() : dimensions_(0) { }

    //! \brief Constructs one-dimensional range.
    NDRange(::size_t size0) : dimensions_(1)
    {
        sizes_[0] = size0;
    }

    //! \brief Constructs two-dimensional range.
    NDRange(::size_t size0, ::size_t size1) : dimensions_(2)
    {
        sizes_[0] = size0;
        sizes_[1] = size1;
    }

    //! \brief Constructs three-dimensional range.
    NDRange(::size_t size0, ::size_t size1, ::size_t size2) : dimensions_(3)
    {
        sizes_[0] = size0;
        sizes_[1] = size1;
        sizes_[2] = size2;
    }

    /*! \brief Conversion operator to const ::size_t *.
     *  \returns a pointer to the size of the first dimension. */
    operator const ::size_t*() const
    {
        return (const ::size_t*) sizes_;
    }

    //! \brief Queries the number of dimensions in the range.
    ::size_t dimensions() const { return dimensions_; }
};

//! \brief A zero-dimensional range.
static const NDRange NullRange;

//! \brief Local address wrapper for use with Kernel::setArg
struct LocalSpaceArg
{
    ::size_t size_;
};

namespace detail {

template <typename T>
struct KernelArgumentHandler
{
    static ::size_t size(const T&) { return sizeof(T); }
    static const T* ptr(const T& value) { return &value; }
};

template <>
struct KernelArgumentHandler<LocalSpaceArg>
{
    static ::size_t size(const LocalSpaceArg& value) { return value.size_; }
    static const void* ptr(const LocalSpaceArg&) { return NULL; }
};

}
//! \endcond

/*! __local
 *  \brief Helper function for generating LocalSpaceArg objects.
 *  Deprecated. Replaced with Local. */
inline CL_EXT_PREFIX__VERSION_1_1_DEPRECATED LocalSpaceArg
__local(::size_t size) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
inline LocalSpaceArg
__local(::size_t size)
{
    LocalSpaceArg ret = { size };
    return ret;
}

/*! Local
 *  \brief Helper function for generating LocalSpaceArg objects. */
inline LocalSpaceArg
Local(::size_t size)
{
    LocalSpaceArg ret = { size };
    return ret;
}

//class KernelFunctor;
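/*! A minimal sketch of NDRange and Local (all sizes here are illustrative;
 *  `kernel` is a hypothetical, previously created cl::Kernel):
 *  \code
 *  cl::NDRange global(1024, 768);       // 2-D global work size
 *  cl::NDRange local(16, 16);           // 2-D work-group size
 *  cl::NDRange offset = cl::NullRange;  // no global offset
 *  // Reserve 256 bytes of __local memory for kernel argument 2:
 *  // kernel.setArg(2, cl::Local(256));
 *  \endcode
 */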
/*! \brief Class interface for cl_kernel.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *  to the same underlying cl_kernel as the original. For details, see
 *  clRetainKernel() and clReleaseKernel().
 *  \see cl_kernel */
class Kernel : public detail::Wrapper<cl_kernel>
{
public:
    inline Kernel(const Program& program, const char* name, cl_int* err = NULL);

    //! \brief Default constructor - initializes to NULL.
    Kernel() { }

    /*! \brief Constructor from cl_kernel - takes ownership.
     *  This effectively transfers ownership of a refcount on the cl_kernel
     *  into the new Kernel object. */
    __CL_EXPLICIT_CONSTRUCTORS Kernel(const cl_kernel& kernel) : detail::Wrapper<cl_type>(kernel) { }

    /*! \brief Assignment operator from cl_kernel - takes ownership.
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseKernel() on the value previously held by this instance. */
    Kernel& operator = (const cl_kernel& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     *  Required for MSVC. */
    Kernel(const Kernel& kernel) : detail::Wrapper<cl_type>(kernel) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     *  Required for MSVC. */
    Kernel& operator = (const Kernel &kernel)
    {
        detail::Wrapper<cl_type>::operator=(kernel);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     *  Required for MSVC. */
    Kernel(Kernel&& kernel) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(kernel)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     *  Required for MSVC. */
    Kernel& operator = (Kernel &&kernel)
    {
        detail::Wrapper<cl_type>::operator=(std::move(kernel));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    template <typename T>
    cl_int getInfo(cl_kernel_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetKernelInfo, object_, name, param),
            __GET_KERNEL_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_kernel_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) { *err = result; }
        return param;
    }

#if defined(CL_VERSION_1_2)
    template <typename T>
    cl_int getArgInfo(cl_uint argIndex, cl_kernel_arg_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetKernelArgInfo, object_, argIndex, name, param),
            __GET_KERNEL_ARG_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_kernel_arg_info, name>::param_type
    getArgInfo(cl_uint argIndex, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_arg_info, name>::param_type param;
        cl_int result = getArgInfo(argIndex, name, &param);
        if (err != NULL) { *err = result; }
        return param;
    }
#endif // #if defined(CL_VERSION_1_2)

    template <typename T>
    cl_int getWorkGroupInfo(const Device& device, cl_kernel_work_group_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetKernelWorkGroupInfo, object_, device(), name, param),
            __GET_KERNEL_WORK_GROUP_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_kernel_work_group_info, name>::param_type
    getWorkGroupInfo(const Device& device, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_work_group_info, name>::param_type param;
        cl_int result = getWorkGroupInfo(device, name, &param);
        if (err != NULL) { *err = result; }
        return param;
    }

    template <typename T>
    cl_int setArg(cl_uint index, const T &value)
    {
        return detail::errHandler(
            ::clSetKernelArg(
                object_,
                index,
                detail::KernelArgumentHandler<T>::size(value),
                detail::KernelArgumentHandler<T>::ptr(value)),
            __SET_KERNEL_ARGS_ERR);
    }

    cl_int setArg(cl_uint index, ::size_t size, const void* argPtr)
    {
        return detail::errHandler(
            ::clSetKernelArg(object_, index, size, argPtr),
            __SET_KERNEL_ARGS_ERR);
    }
};
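/*! A minimal sketch of Kernel argument setup and introspection. The names
 *  `program`, `device`, `srcBuffer`, `dstBuffer`, and the kernel name "vadd"
 *  are assumptions for illustration:
 *  \code
 *  cl_int err;
 *  cl::Kernel k(program, "vadd", &err);
 *  k.setArg(0, srcBuffer);           // cl::Buffer, via KernelArgumentHandler
 *  k.setArg(1, dstBuffer);
 *  k.setArg(2, (cl_uint)1024);       // plain POD argument
 *  ::size_t wg = k.getWorkGroupInfo<CL_KERNEL_WORK_GROUP_SIZE>(device);
 *  \endcode
 */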
/*! \class Program
 *  \brief Program interface that implements cl_program. */
class Program : public detail::Wrapper<cl_program>
{
public:
    typedef VECTOR_CLASS<std::pair<const void*, ::size_t> > Binaries;
    typedef VECTOR_CLASS<std::pair<const char*, ::size_t> > Sources;

    Program(const STRING_CLASS& source, bool build = false, cl_int* err = NULL)
    {
        cl_int error;
        const char * strings = source.c_str();
        const ::size_t length = source.size();
        Context context = Context::getDefault(err);

        object_ = ::clCreateProgramWithSource(context(), (cl_uint)1, &strings, &length, &error);
        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);

        if (error == CL_SUCCESS && build) {
            error = ::clBuildProgram(object_, 0, NULL, "", NULL, NULL);
            detail::errHandler(error, __BUILD_PROGRAM_ERR);
        }
        if (err != NULL) { *err = error; }
    }

    Program(const Context& context, const STRING_CLASS& source, bool build = false, cl_int* err = NULL)
    {
        cl_int error;
        const char * strings = source.c_str();
        const ::size_t length = source.size();

        object_ = ::clCreateProgramWithSource(context(), (cl_uint)1, &strings, &length, &error);
        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);

        if (error == CL_SUCCESS && build) {
            error = ::clBuildProgram(object_, 0, NULL, "", NULL, NULL);
            detail::errHandler(error, __BUILD_PROGRAM_ERR);
        }
        if (err != NULL) { *err = error; }
    }

    Program(const Context& context, const Sources& sources, cl_int* err = NULL)
    {
        cl_int error;
        const ::size_t n = (::size_t)sources.size();
        ::size_t* lengths = (::size_t*) alloca(n * sizeof(::size_t));
        const char** strings = (const char**) alloca(n * sizeof(const char*));

        for (::size_t i = 0; i < n; ++i) {
            strings[i] = sources[(int)i].first;
            lengths[i] = sources[(int)i].second;
        }

        object_ = ::clCreateProgramWithSource(context(), (cl_uint)n, strings, lengths, &error);
        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);
        if (err != NULL) { *err = error; }
    }

    /**
     * Construct a program object from a list of devices and a per-device list of binaries.
     * \param context A valid OpenCL context in which to construct the program.
     * \param devices A vector of OpenCL device objects for which the program will be created.
     * \param binaries A vector of pairs of a pointer to a binary object and its length.
     * \param binaryStatus An optional vector that on completion will be resized to
     *   match the size of binaries and filled with values to specify if each binary
     *   was successfully loaded.
     *   Set to CL_SUCCESS if the binary was successfully loaded.
     *   Set to CL_INVALID_VALUE if the length is 0 or the binary pointer is NULL.
     *   Set to CL_INVALID_BINARY if the binary provided is not valid for the matching device.
     * \param err if non-NULL will be set to CL_SUCCESS on successful operation or one of the following errors:
     *   CL_INVALID_CONTEXT if context is not a valid context.
     *   CL_INVALID_VALUE if the length of devices is zero; or if the length of binaries does not match
     *   the length of devices; or if any entry in binaries is NULL or has length 0.
     *   CL_INVALID_DEVICE if OpenCL devices listed in devices are not in the list of devices associated with context.
     *   CL_INVALID_BINARY if an invalid program binary was encountered for any device.
     *   binaryStatus will return specific status for each device.
     *   CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the
     *   OpenCL implementation on the host.
     */
    Program(
        const Context& context,
        const VECTOR_CLASS<Device>& devices,
        const Binaries& binaries,
        VECTOR_CLASS<cl_int>* binaryStatus = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        const ::size_t numDevices = devices.size();

        // Catch size mismatch early and return
        if(binaries.size() != numDevices) {
            error = CL_INVALID_VALUE;
            detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR);
            if (err != NULL) { *err = error; }
            return;
        }

        ::size_t* lengths = (::size_t*) alloca(numDevices * sizeof(::size_t));
        const unsigned char** images = (const unsigned char**) alloca(numDevices * sizeof(const unsigned char**));

        for (::size_t i = 0; i < numDevices; ++i) {
            images[i] = (const unsigned char*)binaries[i].first;
            lengths[i] = binaries[(int)i].second;
        }

        cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id));
        for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) {
            deviceIDs[deviceIndex] = (devices[deviceIndex])();
        }

        if(binaryStatus) {
            binaryStatus->resize(numDevices);
        }

        object_ = ::clCreateProgramWithBinary(
            context(), (cl_uint) devices.size(),
            deviceIDs,
            lengths, images, (binaryStatus != NULL && numDevices > 0)
               ? &binaryStatus->front()
               : NULL, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR);
        if (err != NULL) { *err = error; }
    }

#if defined(CL_VERSION_1_2)
    /**
     * Create program using builtin kernels.
     * \param kernelNames Semi-colon separated list of builtin kernel names
     */
    Program(
        const Context& context,
        const VECTOR_CLASS<Device>& devices,
        const STRING_CLASS& kernelNames,
        cl_int* err = NULL)
    {
        cl_int error;

        ::size_t numDevices = devices.size();
        cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id));
        for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) {
            deviceIDs[deviceIndex] = (devices[deviceIndex])();
        }

        object_ = ::clCreateProgramWithBuiltInKernels(
            context(),
            (cl_uint) devices.size(),
            deviceIDs,
            kernelNames.c_str(),
            &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR);
        if (err != NULL) { *err = error; }
    }
#endif // #if defined(CL_VERSION_1_2)

    Program() { }

    __CL_EXPLICIT_CONSTRUCTORS Program(const cl_program& program) : detail::Wrapper<cl_type>(program) { }

    Program& operator = (const cl_program& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     *  Required for MSVC. */
    Program(const Program& program) : detail::Wrapper<cl_type>(program) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     *  Required for MSVC. */
    Program& operator = (const Program &program)
    {
        detail::Wrapper<cl_type>::operator=(program);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     *  Required for MSVC. */
    Program(Program&& program) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(program)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     *  Required for MSVC. */
    Program& operator = (Program &&program)
    {
        detail::Wrapper<cl_type>::operator=(std::move(program));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    cl_int build(
        const VECTOR_CLASS<Device>& devices,
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        ::size_t numDevices = devices.size();
        cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id));
        for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) {
            deviceIDs[deviceIndex] = (devices[deviceIndex])();
        }

        return detail::errHandler(
            ::clBuildProgram(
                object_,
                (cl_uint) devices.size(),
                deviceIDs,
                options,
                notifyFptr,
                data),
            __BUILD_PROGRAM_ERR);
    }

    cl_int build(
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        return detail::errHandler(
            ::clBuildProgram(object_, 0, NULL, options, notifyFptr, data),
            __BUILD_PROGRAM_ERR);
    }

#if defined(CL_VERSION_1_2)
    cl_int compile(
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        return detail::errHandler(
            ::clCompileProgram(object_, 0, NULL, options, 0, NULL, NULL, notifyFptr, data),
            __COMPILE_PROGRAM_ERR);
    }
#endif

    template <typename T>
    cl_int getInfo(cl_program_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetProgramInfo, object_, name, param),
            __GET_PROGRAM_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_program_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_program_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) { *err = result; }
        return param;
    }

    template <typename T>
    cl_int getBuildInfo(const Device& device, cl_program_build_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetProgramBuildInfo, object_, device(), name, param),
            __GET_PROGRAM_BUILD_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_program_build_info, name>::param_type
    getBuildInfo(const Device& device, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_program_build_info, name>::param_type param;
        cl_int result = getBuildInfo(device, name, &param);
        if (err != NULL) { *err = result; }
        return param;
    }

    cl_int createKernels(VECTOR_CLASS<Kernel>* kernels)
    {
        cl_uint numKernels;
        cl_int err = ::clCreateKernelsInProgram(object_, 0, NULL, &numKernels);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR);
        }

        Kernel* value = (Kernel*) alloca(numKernels * sizeof(Kernel));
        err = ::clCreateKernelsInProgram(object_, numKernels, (cl_kernel*) value, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR);
        }

        kernels->assign(&value[0], &value[numKernels]);
        return CL_SUCCESS;
    }
};
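/*! A minimal build sketch (the kernel source, the build options string, and
 *  the `context`/`device` objects are assumptions for illustration):
 *  \code
 *  cl_int err;
 *  std::string src = "__kernel void vadd(__global float* a) { }";
 *  cl::Program program(context, src, false, &err);
 *  if (program.build("-cl-std=CL1.2") != CL_SUCCESS) {
 *      std::string log = program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(device);
 *      // report the build log ...
 *  }
 *  \endcode
 */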
#if defined(CL_VERSION_1_2)
inline Program linkProgram(
    Program input1,
    Program input2,
    const char* options = NULL,
    void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
    void* data = NULL,
    cl_int* err = NULL)
{
    cl_int error_local = CL_SUCCESS;

    cl_program programs[2] = { input1(), input2() };

    Context ctx = input1.getInfo<CL_PROGRAM_CONTEXT>(&error_local);
    if(error_local!=CL_SUCCESS) {
        detail::errHandler(error_local, __LINK_PROGRAM_ERR);
    }

    cl_program prog = ::clLinkProgram(
        ctx(), 0, NULL, options, 2, programs, notifyFptr, data, &error_local);

    detail::errHandler(error_local,__COMPILE_PROGRAM_ERR);
    if (err != NULL) { *err = error_local; }

    return Program(prog);
}

inline Program linkProgram(
    VECTOR_CLASS<Program> inputPrograms,
    const char* options = NULL,
    void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
    void* data = NULL,
    cl_int* err = NULL)
{
    cl_int error_local = CL_SUCCESS;

    cl_program * programs = (cl_program*) alloca(inputPrograms.size() * sizeof(cl_program));

    if (programs != NULL) {
        for (unsigned int i = 0; i < inputPrograms.size(); i++) {
          programs[i] = inputPrograms[i]();
        }
    }

    Context ctx;
    if(inputPrograms.size() > 0) {
        ctx = inputPrograms[0].getInfo<CL_PROGRAM_CONTEXT>(&error_local);
        if(error_local!=CL_SUCCESS) {
            detail::errHandler(error_local, __LINK_PROGRAM_ERR);
        }
    }

    cl_program prog = ::clLinkProgram(
        ctx(), 0, NULL, options, (cl_uint)inputPrograms.size(), programs,
        notifyFptr, data, &error_local);

    detail::errHandler(error_local,__COMPILE_PROGRAM_ERR);
    if (err != NULL) { *err = error_local; }

    return Program(prog);
}
#endif

template<>
inline VECTOR_CLASS<char *> cl::Program::getInfo<CL_PROGRAM_BINARIES>(cl_int* err) const
{
    VECTOR_CLASS< ::size_t> sizes = getInfo<CL_PROGRAM_BINARY_SIZES>();
    VECTOR_CLASS<char *> binaries;
    for (VECTOR_CLASS< ::size_t>::iterator s = sizes.begin(); s != sizes.end(); ++s)
    {
        char *ptr = NULL;
        if (*s != 0) ptr = new char[*s];
        binaries.push_back(ptr);
    }

    cl_int result = getInfo(CL_PROGRAM_BINARIES, &binaries);
    if (err != NULL) { *err = result; }
    return binaries;
}

inline Kernel::Kernel(const Program& program, const char* name, cl_int* err)
{
    cl_int error;

    object_ = ::clCreateKernel(program(), name, &error);
    detail::errHandler(error, __CREATE_KERNEL_ERR);
    if (err != NULL) { *err = error; }
}
/*! \class CommandQueue
 *  \brief CommandQueue interface for cl_command_queue. */
class CommandQueue : public detail::Wrapper<cl_command_queue>
{
private:
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
    static std::atomic<int> default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    static volatile int default_initialized_;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    static CommandQueue default_;
    static volatile cl_int default_error_;

public:
    CommandQueue(cl_command_queue_properties properties, cl_int* err = NULL)
    {
        cl_int error;

        Context context = Context::getDefault(&error);
        detail::errHandler(error, __CREATE_CONTEXT_ERR);

        if (error != CL_SUCCESS) {
            if (err != NULL) { *err = error; }
        }
        else {
            Device device = context.getInfo<CL_CONTEXT_DEVICES>()[0];

            object_ = ::clCreateCommandQueue(context(), device(), properties, &error);

            detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);
            if (err != NULL) { *err = error; }
        }
    }

    /*!
     * \brief Constructs a CommandQueue for an implementation defined device in the given context
     */
    explicit CommandQueue(
        const Context& context,
        cl_command_queue_properties properties = 0,
        cl_int* err = NULL)
    {
        cl_int error;
        VECTOR_CLASS<Device> devices;
        error = context.getInfo(CL_CONTEXT_DEVICES, &devices);

        detail::errHandler(error, __CREATE_CONTEXT_ERR);

        if (error != CL_SUCCESS) {
            if (err != NULL) { *err = error; }
            return;
        }

        object_ = ::clCreateCommandQueue(context(), devices[0](), properties, &error);

        detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);
        if (err != NULL) { *err = error; }
    }

    CommandQueue(
        const Context& context,
        const Device& device,
        cl_command_queue_properties properties = 0,
        cl_int* err = NULL)
    {
        cl_int error;
        object_ = ::clCreateCommandQueue(context(), device(), properties, &error);

        detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);
        if (err != NULL) { *err = error; }
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     *  Required for MSVC. */
    CommandQueue(const CommandQueue& queue) : detail::Wrapper<cl_type>(queue) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     *  Required for MSVC. */
    CommandQueue& operator = (const CommandQueue &queue)
    {
        detail::Wrapper<cl_type>::operator=(queue);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     *  Required for MSVC. */
    CommandQueue(CommandQueue&& queue) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(queue)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     *  Required for MSVC. */
    CommandQueue& operator = (CommandQueue &&queue)
    {
        detail::Wrapper<cl_type>::operator=(std::move(queue));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    static CommandQueue getDefault(cl_int * err = NULL)
    {
        int state = detail::compare_exchange(
            &default_initialized_,
            __DEFAULT_BEING_INITIALIZED, __DEFAULT_NOT_INITIALIZED);

        if (state & __DEFAULT_INITIALIZED) {
            if (err != NULL) { *err = default_error_; }
            return default_;
        }

        if (state & __DEFAULT_BEING_INITIALIZED) {
            // Assume writes will propagate eventually...
            while(default_initialized_ != __DEFAULT_INITIALIZED) {
                detail::fence();
            }
            if (err != NULL) { *err = default_error_; }
            return default_;
        }

        cl_int error;

        Context context = Context::getDefault(&error);
        detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);

        if (error != CL_SUCCESS) {
            if (err != NULL) { *err = error; }
        }
        else {
            Device device = context.getInfo<CL_CONTEXT_DEVICES>()[0];

            default_ = CommandQueue(context, device, 0, &error);

            detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);
            if (err != NULL) { *err = error; }
        }

        detail::fence();

        default_error_ = error;
        // Assume writes will propagate eventually...
        default_initialized_ = __DEFAULT_INITIALIZED;

        detail::fence();

        if (err != NULL) { *err = default_error_; }
        return default_;
    }

    CommandQueue() { }

    __CL_EXPLICIT_CONSTRUCTORS CommandQueue(const cl_command_queue& commandQueue) : detail::Wrapper<cl_type>(commandQueue) { }

    CommandQueue& operator = (const cl_command_queue& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    template <typename T>
    cl_int getInfo(cl_command_queue_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetCommandQueueInfo, object_, name, param),
            __GET_COMMAND_QUEUE_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_command_queue_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_command_queue_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) { *err = result; }
        return param;
    }

    cl_int enqueueReadBuffer(
        const Buffer& buffer, cl_bool blocking, ::size_t offset, ::size_t size, void* ptr,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadBuffer(
                object_, buffer(), blocking, offset, size, ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_READ_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueWriteBuffer(
        const Buffer& buffer, cl_bool blocking, ::size_t offset, ::size_t size, const void* ptr,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueWriteBuffer(
                object_, buffer(), blocking, offset, size, ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_WRITE_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
    cl_int enqueueCopyBuffer(
        const Buffer& src, const Buffer& dst,
        ::size_t src_offset, ::size_t dst_offset, ::size_t size,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyBuffer(
                object_, src(), dst(), src_offset, dst_offset, size,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQEUE_COPY_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueReadBufferRect(
        const Buffer& buffer, cl_bool blocking,
        const size_t<3>& buffer_offset, const size_t<3>& host_offset, const size_t<3>& region,
        ::size_t buffer_row_pitch, ::size_t buffer_slice_pitch,
        ::size_t host_row_pitch, ::size_t host_slice_pitch,
        void *ptr,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadBufferRect(
                object_, buffer(), blocking,
                (const ::size_t *)buffer_offset,
                (const ::size_t *)host_offset,
                (const ::size_t *)region,
                buffer_row_pitch, buffer_slice_pitch,
                host_row_pitch, host_slice_pitch,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_READ_BUFFER_RECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueWriteBufferRect(
        const Buffer& buffer, cl_bool blocking,
        const size_t<3>& buffer_offset, const size_t<3>& host_offset, const size_t<3>& region,
        ::size_t buffer_row_pitch, ::size_t buffer_slice_pitch,
        ::size_t host_row_pitch, ::size_t host_slice_pitch,
        const void *ptr,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueWriteBufferRect(
                object_, buffer(), blocking,
                (const ::size_t *)buffer_offset,
                (const ::size_t *)host_offset,
                (const ::size_t *)region,
                buffer_row_pitch, buffer_slice_pitch,
                host_row_pitch, host_slice_pitch,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_WRITE_BUFFER_RECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyBufferRect(
        const Buffer& src, const Buffer& dst,
        const size_t<3>& src_origin, const size_t<3>& dst_origin, const size_t<3>& region,
        ::size_t src_row_pitch, ::size_t src_slice_pitch,
        ::size_t dst_row_pitch, ::size_t dst_slice_pitch,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyBufferRect(
                object_, src(), dst(),
                (const ::size_t *)src_origin,
                (const ::size_t *)dst_origin,
                (const ::size_t *)region,
                src_row_pitch, src_slice_pitch,
                dst_row_pitch, dst_slice_pitch,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQEUE_COPY_BUFFER_RECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#if defined(CL_VERSION_1_2)
    /**
     * Enqueue a command to fill a buffer object with a pattern
     * of a given size. The pattern is specified as a vector.
     * \tparam PatternType The datatype of the pattern field.
     *     The pattern type must be an accepted OpenCL data type.
     */
    template<typename PatternType>
    cl_int enqueueFillBuffer(
        const Buffer& buffer, PatternType pattern, ::size_t offset, ::size_t size,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillBuffer(
                object_, buffer(),
                static_cast<void*>(&pattern),
                sizeof(PatternType),
                offset, size,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_FILL_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif // #if defined(CL_VERSION_1_2)

    cl_int enqueueReadImage(
        const Image& image, cl_bool blocking,
        const size_t<3>& origin, const size_t<3>& region,
        ::size_t row_pitch, ::size_t slice_pitch, void* ptr,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadImage(
                object_, image(), blocking,
                (const ::size_t *) origin,
                (const ::size_t *) region,
                row_pitch, slice_pitch, ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_READ_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueWriteImage(
        const Image& image, cl_bool blocking,
        const size_t<3>& origin, const size_t<3>& region,
        ::size_t row_pitch, ::size_t slice_pitch, const void* ptr,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueWriteImage(
                object_, image(), blocking,
                (const ::size_t *) origin,
                (const ::size_t *) region,
                row_pitch, slice_pitch, ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_WRITE_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyImage(
        const Image& src, const Image& dst,
        const size_t<3>& src_origin, const size_t<3>& dst_origin, const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyImage(
                object_, src(), dst(),
                (const ::size_t *) src_origin,
                (const ::size_t *) dst_origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_COPY_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

#if defined(CL_VERSION_1_2)
    /**
     * Enqueue a command to fill an image object with a specified color.
     * \param fillColor is the color to use to fill the image.
     *     This is a four component RGBA floating-point color value if
     *     the image channel data type is not an unnormalized signed or
     *     unsigned data type.
     */
    cl_int enqueueFillImage(
        const Image& image, cl_float4 fillColor,
        const size_t<3>& origin, const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillImage(
                object_, image(),
                static_cast<void*>(&fillColor),
                (const ::size_t *) origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_FILL_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
    /**
     * Enqueue a command to fill an image object with a specified color.
     * \param fillColor is the color to use to fill the image.
     *     This is a four component RGBA signed integer color value if
     *     the image channel data type is an unnormalized signed integer type.
     */
    cl_int enqueueFillImage(
        const Image& image, cl_int4 fillColor,
        const size_t<3>& origin, const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillImage(
                object_, image(),
                static_cast<void*>(&fillColor),
                (const ::size_t *) origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_FILL_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    /**
     * Enqueue a command to fill an image object with a specified color.
     * \param fillColor is the color to use to fill the image.
     *     This is a four component RGBA unsigned integer color value if
     *     the image channel data type is an unnormalized unsigned integer type.
     */
    cl_int enqueueFillImage(
        const Image& image, cl_uint4 fillColor,
        const size_t<3>& origin, const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillImage(
                object_, image(),
                static_cast<void*>(&fillColor),
                (const ::size_t *) origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_FILL_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif // #if defined(CL_VERSION_1_2)

    cl_int enqueueCopyImageToBuffer(
        const Image& src, const Buffer& dst,
        const size_t<3>& src_origin, const size_t<3>& region, ::size_t dst_offset,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyImageToBuffer(
                object_, src(), dst(),
                (const ::size_t *) src_origin,
                (const ::size_t *) region,
                dst_offset,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyBufferToImage(
        const Buffer& src, const Image& dst,
        ::size_t src_offset, const size_t<3>& dst_origin, const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyBufferToImage(
                object_, src(), dst(),
                src_offset,
                (const ::size_t *) dst_origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    void* enqueueMapBuffer(
        const Buffer& buffer, cl_bool blocking, cl_map_flags flags,
        ::size_t offset, ::size_t size,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL,
        cl_int* err = NULL) const
    {
        cl_event tmp;
        cl_int error;
        void * result = ::clEnqueueMapBuffer(
            object_, buffer(), blocking, flags, offset, size,
            (events != NULL) ? (cl_uint) events->size() : 0,
            (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
            (event != NULL) ? &tmp : NULL,
            &error);

        detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
        if (err != NULL) { *err = error; }
        if (event != NULL && error == CL_SUCCESS)
            *event = tmp;

        return result;
    }
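    /*! A minimal map/unmap round trip (illustrative sketch; `queue`, `buf`,
     *  and the element count N are assumptions, and production code should
     *  also check the returned error code):
     *  \code
     *  cl_int err;
     *  const ::size_t N = 1024;
     *  float* p = (float*) queue.enqueueMapBuffer(
     *      buf, CL_TRUE, CL_MAP_WRITE, 0, N * sizeof(float),
     *      NULL, NULL, &err);              // CL_TRUE: blocking map
     *  for (::size_t i = 0; p != NULL && i < N; ++i)
     *      p[i] = 0.0f;                    // fill through the mapped pointer
     *  queue.enqueueUnmapMemObject(buf, p);
     *  \endcode
     */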
    void* enqueueMapImage(
        const Image& buffer, cl_bool blocking, cl_map_flags flags,
        const size_t<3>& origin, const size_t<3>& region,
        ::size_t * row_pitch, ::size_t * slice_pitch,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL,
        cl_int* err = NULL) const
    {
        cl_event tmp;
        cl_int error;
        void * result = ::clEnqueueMapImage(
            object_, buffer(), blocking, flags,
            (const ::size_t *) origin,
            (const ::size_t *) region,
            row_pitch, slice_pitch,
            (events != NULL) ? (cl_uint) events->size() : 0,
            (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
            (event != NULL) ? &tmp : NULL,
            &error);

        detail::errHandler(error, __ENQUEUE_MAP_IMAGE_ERR);
        if (err != NULL) { *err = error; }
        if (event != NULL && error == CL_SUCCESS)
            *event = tmp;

        return result;
    }

    cl_int enqueueUnmapMemObject(
        const Memory& memory, void* mapped_ptr,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueUnmapMemObject(
                object_, memory(), mapped_ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_UNMAP_MEM_OBJECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

#if defined(CL_VERSION_1_2)
    /**
     * Enqueues a marker command which waits for either a list of events to complete,
     * or all previously enqueued commands to complete.
     *
     * Enqueues a marker command which waits for either a list of events to complete,
     * or if the list is empty it waits for all commands previously enqueued in command_queue
     * to complete before it completes. This command returns an event which can be waited on,
     * i.e. this event can be waited on to ensure that all events either in the event_wait_list
     * or all previously enqueued commands, queued before this command to command_queue,
     * have completed.
     */
    cl_int enqueueMarkerWithWaitList(
        const VECTOR_CLASS<Event> *events = 0, Event *event = 0) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueMarkerWithWaitList(
                object_,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_MARKER_WAIT_LIST_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    /**
     * A synchronization point that enqueues a barrier operation.
     *
     * Enqueues a barrier command which waits for either a list of events to complete,
     * or if the list is empty it waits for all commands previously enqueued in command_queue
     * to complete before it completes. This command blocks command execution, that is, any
     * following commands enqueued after it do not execute until it completes. This command
     * returns an event which can be waited on, i.e. this event can be waited on to ensure that
     * all events either in the event_wait_list or all previously enqueued commands, queued
     * before this command to command_queue, have completed.
     */
    cl_int enqueueBarrierWithWaitList(
        const VECTOR_CLASS<Event> *events = 0, Event *event = 0) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueBarrierWithWaitList(
                object_,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_BARRIER_WAIT_LIST_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
    /**
     * Enqueues a command to indicate with which device a set of memory objects
     * should be associated.
     */
    cl_int enqueueMigrateMemObjects(
        const VECTOR_CLASS<Memory> &memObjects,
        cl_mem_migration_flags flags,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;

        cl_mem* localMemObjects = static_cast<cl_mem*>(alloca(memObjects.size() * sizeof(cl_mem)));
        for( int i = 0; i < (int)memObjects.size(); ++i ) {
            localMemObjects[i] = memObjects[i]();
        }

        cl_int err = detail::errHandler(
            ::clEnqueueMigrateMemObjects(
                object_,
                (cl_uint)memObjects.size(),
                static_cast<const cl_mem*>(localMemObjects),
                flags,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_UNMAP_MEM_OBJECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif // #if defined(CL_VERSION_1_2)

    cl_int enqueueNDRangeKernel(
        const Kernel& kernel,
        const NDRange& offset,
        const NDRange& global,
        const NDRange& local = NullRange,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueNDRangeKernel(
                object_, kernel(), (cl_uint) global.dimensions(),
                offset.dimensions() != 0 ? (const ::size_t*) offset : NULL,
                (const ::size_t*) global,
                local.dimensions() != 0 ? (const ::size_t*) local : NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_NDRANGE_KERNEL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueTask(
        const Kernel& kernel,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueTask(
                object_, kernel(),
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_TASK_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueNativeKernel(
        void (CL_CALLBACK *userFptr)(void *),
        std::pair<void*, ::size_t> args,
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<const void*>* mem_locs = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_mem * mems = (mem_objects != NULL && mem_objects->size() > 0)
            ? (cl_mem*) alloca(mem_objects->size() * sizeof(cl_mem))
            : NULL;

        if (mems != NULL) {
            for (unsigned int i = 0; i < mem_objects->size(); i++) {
                mems[i] = ((*mem_objects)[i])();
            }
        }

        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueNativeKernel(
                object_, userFptr, args.first, args.second,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                mems,
                (mem_locs != NULL && mem_locs->size() > 0) ? (const void **) &mem_locs->front() : NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_NATIVE_KERNEL);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
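    /*! A minimal kernel dispatch sketch (illustrative; `kernel` is assumed to
     *  be a valid cl::Kernel with its arguments already set, and the sizes
     *  are arbitrary examples):
     *  \code
     *  cl::Event done;
     *  cl_int err = queue.enqueueNDRangeKernel(
     *      kernel,
     *      cl::NullRange,        // no global offset
     *      cl::NDRange(1024),    // global work size
     *      cl::NDRange(64),      // work-group size
     *      NULL,                 // no wait list
     *      &done);
     *  if (err == CL_SUCCESS)
     *      done.wait();          // block until the kernel completes
     *  \endcode
     */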
    /**
     * Deprecated APIs for 1.2
     */
#if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2))
    CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
    cl_int enqueueMarker(Event* event = NULL) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueMarker(object_, (event != NULL) ? &tmp : NULL),
            __ENQUEUE_MARKER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
    cl_int enqueueWaitForEvents(const VECTOR_CLASS<Event>& events) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
    {
        return detail::errHandler(
            ::clEnqueueWaitForEvents(
                object_,
                (cl_uint) events.size(),
                events.size() > 0 ? (const cl_event*) &events.front() : NULL),
            __ENQUEUE_WAIT_FOR_EVENTS_ERR);
    }
#endif // #if defined(CL_VERSION_1_1)

    cl_int enqueueAcquireGLObjects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueAcquireGLObjects(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_ACQUIRE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueReleaseGLObjects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReleaseGLObjects(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_RELEASE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

#if defined (USE_DX_INTEROP)
    typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueAcquireD3D10ObjectsKHR)(
        cl_command_queue command_queue, cl_uint num_objects,
        const cl_mem* mem_objects, cl_uint num_events_in_wait_list,
        const cl_event* event_wait_list, cl_event* event);
    typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueReleaseD3D10ObjectsKHR)(
        cl_command_queue command_queue, cl_uint num_objects,
        const cl_mem* mem_objects, cl_uint num_events_in_wait_list,
        const cl_event* event_wait_list, cl_event* event);

    cl_int enqueueAcquireD3D10Objects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        static PFN_clEnqueueAcquireD3D10ObjectsKHR pfn_clEnqueueAcquireD3D10ObjectsKHR = NULL;
#if defined(CL_VERSION_1_2)
        cl_context context = getInfo<CL_QUEUE_CONTEXT>();
        cl::Device device(getInfo<CL_QUEUE_DEVICE>());
        cl_platform_id platform = device.getInfo<CL_DEVICE_PLATFORM>();
        __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clEnqueueAcquireD3D10ObjectsKHR);
#endif
#if defined(CL_VERSION_1_1)
        __INIT_CL_EXT_FCN_PTR(clEnqueueAcquireD3D10ObjectsKHR);
#endif

        cl_event tmp;
        cl_int err = detail::errHandler(
            pfn_clEnqueueAcquireD3D10ObjectsKHR(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_ACQUIRE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
    cl_int enqueueReleaseD3D10Objects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        static PFN_clEnqueueReleaseD3D10ObjectsKHR pfn_clEnqueueReleaseD3D10ObjectsKHR = NULL;
#if defined(CL_VERSION_1_2)
        cl_context context = getInfo<CL_QUEUE_CONTEXT>();
        cl::Device device(getInfo<CL_QUEUE_DEVICE>());
        cl_platform_id platform = device.getInfo<CL_DEVICE_PLATFORM>();
        __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clEnqueueReleaseD3D10ObjectsKHR);
#endif // #if defined(CL_VERSION_1_2)
#if defined(CL_VERSION_1_1)
        __INIT_CL_EXT_FCN_PTR(clEnqueueReleaseD3D10ObjectsKHR);
#endif // #if defined(CL_VERSION_1_1)

        cl_event tmp;
        cl_int err = detail::errHandler(
            pfn_clEnqueueReleaseD3D10ObjectsKHR(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_RELEASE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif

    /**
     * Deprecated APIs for 1.2
     */
#if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2))
    CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
    cl_int enqueueBarrier() const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
    {
        return detail::errHandler(::clEnqueueBarrier(object_), __ENQUEUE_BARRIER_ERR);
    }
#endif // #if defined(CL_VERSION_1_1)

    cl_int flush() const
    {
        return detail::errHandler(::clFlush(object_), __FLUSH_ERR);
    }

    cl_int finish() const
    {
        return detail::errHandler(::clFinish(object_), __FINISH_ERR);
    }
};

#ifdef _WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) std::atomic<int> CommandQueue::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) volatile int CommandQueue::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) CommandQueue CommandQueue::default_;
__declspec(selectany) volatile cl_int CommandQueue::default_error_ = CL_SUCCESS;
#else // !_WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) std::atomic<int> CommandQueue::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) volatile int CommandQueue::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) CommandQueue CommandQueue::default_;
__attribute__((weak)) volatile cl_int CommandQueue::default_error_ = CL_SUCCESS;
#endif // !_WIN32
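/*! The default queue above is created lazily on an implementation-chosen
 *  device of the default context; a minimal sketch of how it is typically
 *  used (illustrative only):
 *  \code
 *  cl_int err;
 *  cl::CommandQueue q = cl::CommandQueue::getDefault(&err);
 *  if (err == CL_SUCCESS) {
 *      // ... enqueue work on q ...
 *      q.finish();   // block until all enqueued commands complete
 *  }
 *  \endcode
 */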
template< typename IteratorType >
Buffer::Buffer(
    const Context &context,
    IteratorType startIterator,
    IteratorType endIterator,
    bool readOnly,
    bool useHostPtr,
    cl_int* err)
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    cl_mem_flags flags = 0;
    if( readOnly ) {
        flags |= CL_MEM_READ_ONLY;
    }
    else {
        flags |= CL_MEM_READ_WRITE;
    }
    if( useHostPtr ) {
        flags |= CL_MEM_USE_HOST_PTR;
    }

    ::size_t size = sizeof(DataType)*(endIterator - startIterator);

    if( useHostPtr ) {
        object_ = ::clCreateBuffer(context(), flags, size, static_cast<DataType*>(&*startIterator), &error);
    } else {
        object_ = ::clCreateBuffer(context(), flags, size, 0, &error);
    }

    detail::errHandler(error, __CREATE_BUFFER_ERR);
    if (err != NULL) { *err = error; }

    if( !useHostPtr ) {
        CommandQueue queue(context, 0, &error);
        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) { *err = error; }

        error = cl::copy(queue, startIterator, endIterator, *this);
        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) { *err = error; }
    }
}

template< typename IteratorType >
Buffer::Buffer(
    const CommandQueue &queue,
    IteratorType startIterator,
    IteratorType endIterator,
    bool readOnly,
    bool useHostPtr,
    cl_int* err)
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    cl_mem_flags flags = 0;
    if (readOnly) {
        flags |= CL_MEM_READ_ONLY;
    }
    else {
        flags |= CL_MEM_READ_WRITE;
    }
    if (useHostPtr) {
        flags |= CL_MEM_USE_HOST_PTR;
    }

    ::size_t size = sizeof(DataType)*(endIterator - startIterator);

    Context context = queue.getInfo<CL_QUEUE_CONTEXT>();

    if (useHostPtr) {
        object_ = ::clCreateBuffer(context(), flags, size, static_cast<DataType*>(&*startIterator), &error);
    }
    else {
        object_ = ::clCreateBuffer(context(), flags, size, 0, &error);
    }

    detail::errHandler(error, __CREATE_BUFFER_ERR);
    if (err != NULL) { *err = error; }

    if (!useHostPtr) {
        error = cl::copy(queue, startIterator, endIterator, *this);
        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) { *err = error; }
    }
}

inline cl_int enqueueReadBuffer(
    const Buffer& buffer, cl_bool blocking, ::size_t offset, ::size_t size, void* ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueReadBuffer(buffer, blocking, offset, size, ptr, events, event);
}

inline cl_int enqueueWriteBuffer(
    const Buffer& buffer, cl_bool blocking, ::size_t offset, ::size_t size, const void* ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueWriteBuffer(buffer, blocking, offset, size, ptr, events, event);
}

inline void* enqueueMapBuffer(
    const Buffer& buffer, cl_bool blocking, cl_map_flags flags,
    ::size_t offset, ::size_t size,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL,
    cl_int* err = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
    if (err != NULL) { *err = error; }

    void * result = ::clEnqueueMapBuffer(
        queue(), buffer(), blocking, flags, offset, size,
        (events != NULL) ? (cl_uint) events->size() : 0,
        (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
        (cl_event*) event,
        &error);

    detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
    if (err != NULL) { *err = error; }
    return result;
}

inline cl_int enqueueUnmapMemObject(
    const Memory& memory, void* mapped_ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
    if (error != CL_SUCCESS) {
        return error;
    }

    cl_event tmp;
    cl_int err = detail::errHandler(
        ::clEnqueueUnmapMemObject(
            queue(), memory(), mapped_ptr,
            (events != NULL) ? (cl_uint) events->size() : 0,
            (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
            (event != NULL) ? &tmp : NULL),
        __ENQUEUE_UNMAP_MEM_OBJECT_ERR);

    if (event != NULL && err == CL_SUCCESS)
        *event = tmp;

    return err;
}
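/*! A minimal sketch of the iterator-based Buffer constructors defined above
 *  (std::vector and the `context` object are assumptions for illustration):
 *  \code
 *  std::vector<float> host(1024, 1.0f);
 *  cl_int err;
 *  cl::Buffer devBuf(context, host.begin(), host.end(),
 *                    true,   // readOnly
 *                    false,  // useHostPtr: copy rather than CL_MEM_USE_HOST_PTR
 *                    &err);
 *  \endcode
 */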
inline cl_int enqueueCopyBuffer(
    const Buffer& src, const Buffer& dst,
    ::size_t src_offset, ::size_t dst_offset, ::size_t size,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueCopyBuffer(src, dst, src_offset, dst_offset, size, events, event);
}

/**
 * Blocking copy operation between iterators and a buffer.
 * Host to Device.
 * Uses default command queue.
 */
template< typename IteratorType >
inline cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer )
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS)
        return error;

    return cl::copy(queue, startIterator, endIterator, buffer);
}

/**
 * Blocking copy operation between iterators and a buffer.
 * Device to Host.
 * Uses default command queue.
 */
template< typename IteratorType >
inline cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator )
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS)
        return error;

    return cl::copy(queue, buffer, startIterator, endIterator);
}

/**
 * Blocking copy operation between iterators and a buffer.
 * Host to Device.
 * Uses specified queue.
 */
template< typename IteratorType >
inline cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer )
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    ::size_t length = endIterator-startIterator;
    ::size_t byteLength = length*sizeof(DataType);

    DataType *pointer =
        static_cast<DataType*>(queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_WRITE, 0, byteLength, 0, 0, &error));
    // if exceptions enabled, enqueueMapBuffer will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
#if defined(_MSC_VER)
    std::copy(
        startIterator,
        endIterator,
        stdext::checked_array_iterator<DataType*>(pointer, length));
#else
    std::copy(startIterator, endIterator, pointer);
#endif
    Event endEvent;
    error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent);
    // if exceptions enabled, enqueueUnmapMemObject will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    endEvent.wait();
    return CL_SUCCESS;
}
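/*! A minimal round trip with the copy() helpers (the container, sizes, and
 *  the `context`/`queue` objects are illustrative assumptions):
 *  \code
 *  std::vector<int> data(256, 7), back(256);
 *  cl::Buffer buf(context, CL_MEM_READ_WRITE, 256 * sizeof(int));
 *  cl::copy(queue, data.begin(), data.end(), buf);   // host -> device
 *  cl::copy(queue, buf, back.begin(), back.end());   // device -> host
 *  \endcode
 */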
/**
 * Blocking copy operation between iterators and a buffer.
 * Device to Host.
 * Uses specified queue.
 */
template< typename IteratorType >
inline cl_int copy( const CommandQueue &queue, const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator )
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    ::size_t length = endIterator-startIterator;
    ::size_t byteLength = length*sizeof(DataType);

    DataType *pointer =
        static_cast<DataType*>(queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_READ, 0, byteLength, 0, 0, &error));
    // if exceptions enabled, enqueueMapBuffer will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    std::copy(pointer, pointer + length, startIterator);
    Event endEvent;
    error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent);
    // if exceptions enabled, enqueueUnmapMemObject will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    endEvent.wait();
    return CL_SUCCESS;
}

#if defined(CL_VERSION_1_1)
inline cl_int enqueueReadBufferRect(
    const Buffer& buffer, cl_bool blocking,
    const size_t<3>& buffer_offset, const size_t<3>& host_offset, const size_t<3>& region,
    ::size_t buffer_row_pitch, ::size_t buffer_slice_pitch,
    ::size_t host_row_pitch, ::size_t host_slice_pitch,
    void *ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueReadBufferRect(
        buffer, blocking, buffer_offset, host_offset, region,
        buffer_row_pitch, buffer_slice_pitch,
        host_row_pitch, host_slice_pitch,
        ptr, events, event);
}

inline cl_int enqueueWriteBufferRect(
    const Buffer& buffer, cl_bool blocking,
    const size_t<3>& buffer_offset, const size_t<3>& host_offset, const size_t<3>& region,
    ::size_t buffer_row_pitch, ::size_t buffer_slice_pitch,
    ::size_t host_row_pitch, ::size_t host_slice_pitch,
    const void *ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueWriteBufferRect(
        buffer, blocking, buffer_offset, host_offset, region,
        buffer_row_pitch, buffer_slice_pitch,
        host_row_pitch, host_slice_pitch,
        ptr, events, event);
}

inline cl_int enqueueCopyBufferRect(
    const Buffer& src, const Buffer& dst,
    const size_t<3>& src_origin, const size_t<3>& dst_origin, const size_t<3>& region,
    ::size_t src_row_pitch, ::size_t src_slice_pitch,
    ::size_t dst_row_pitch, ::size_t dst_slice_pitch,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueCopyBufferRect(
        src, dst, src_origin, dst_origin, region,
        src_row_pitch, src_slice_pitch,
        dst_row_pitch, dst_slice_pitch,
        events, event);
}
#endif

inline cl_int enqueueReadImage(
    const Image& image, cl_bool blocking,
    const size_t<3>& origin, const size_t<3>& region,
    ::size_t row_pitch, ::size_t slice_pitch, void* ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueReadImage(
        image, blocking, origin, region, row_pitch, slice_pitch, ptr, events, event);
}

inline cl_int enqueueWriteImage(
    const Image& image, cl_bool blocking,
    const size_t<3>& origin, const size_t<3>& region,
    ::size_t row_pitch, ::size_t slice_pitch, const void* ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueWriteImage(
        image, blocking, origin, region, row_pitch, slice_pitch, ptr, events, event);
}
#if defined(CL_VERSION_1_1)
inline cl_int enqueueReadBufferRect(
    const Buffer& buffer,
    cl_bool blocking,
    const size_t<3>& buffer_offset,
    const size_t<3>& host_offset,
    const size_t<3>& region,
    ::size_t buffer_row_pitch,
    ::size_t buffer_slice_pitch,
    ::size_t host_row_pitch,
    ::size_t host_slice_pitch,
    void *ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueReadBufferRect(
        buffer,
        blocking,
        buffer_offset,
        host_offset,
        region,
        buffer_row_pitch,
        buffer_slice_pitch,
        host_row_pitch,
        host_slice_pitch,
        ptr,
        events,
        event);
}

inline cl_int enqueueWriteBufferRect(
    const Buffer& buffer,
    cl_bool blocking,
    const size_t<3>& buffer_offset,
    const size_t<3>& host_offset,
    const size_t<3>& region,
    ::size_t buffer_row_pitch,
    ::size_t buffer_slice_pitch,
    ::size_t host_row_pitch,
    ::size_t host_slice_pitch,
    const void *ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueWriteBufferRect(
        buffer,
        blocking,
        buffer_offset,
        host_offset,
        region,
        buffer_row_pitch,
        buffer_slice_pitch,
        host_row_pitch,
        host_slice_pitch,
        ptr,
        events,
        event);
}

inline cl_int enqueueCopyBufferRect(
    const Buffer& src,
    const Buffer& dst,
    const size_t<3>& src_origin,
    const size_t<3>& dst_origin,
    const size_t<3>& region,
    ::size_t src_row_pitch,
    ::size_t src_slice_pitch,
    ::size_t dst_row_pitch,
    ::size_t dst_slice_pitch,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueCopyBufferRect(
        src,
        dst,
        src_origin,
        dst_origin,
        region,
        src_row_pitch,
        src_slice_pitch,
        dst_row_pitch,
        dst_slice_pitch,
        events,
        event);
}
#endif

inline cl_int enqueueReadImage(
    const Image& image,
    cl_bool blocking,
    const size_t<3>& origin,
    const size_t<3>& region,
    ::size_t row_pitch,
    ::size_t slice_pitch,
    void* ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueReadImage(
        image,
        blocking,
        origin,
        region,
        row_pitch,
        slice_pitch,
        ptr,
        events,
        event);
}

inline cl_int enqueueWriteImage(
    const Image& image,
    cl_bool blocking,
    const size_t<3>& origin,
    const size_t<3>& region,
    ::size_t row_pitch,
    ::size_t slice_pitch,
    const void* ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueWriteImage(
        image,
        blocking,
        origin,
        region,
        row_pitch,
        slice_pitch,
        ptr,
        events,
        event);
}

inline cl_int enqueueCopyImage(
    const Image& src,
    const Image& dst,
    const size_t<3>& src_origin,
    const size_t<3>& dst_origin,
    const size_t<3>& region,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueCopyImage(
        src,
        dst,
        src_origin,
        dst_origin,
        region,
        events,
        event);
}

inline cl_int enqueueCopyImageToBuffer(
    const Image& src,
    const Buffer& dst,
    const size_t<3>& src_origin,
    const size_t<3>& region,
    ::size_t dst_offset,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueCopyImageToBuffer(
        src,
        dst,
        src_origin,
        region,
        dst_offset,
        events,
        event);
}

inline cl_int enqueueCopyBufferToImage(
    const Buffer& src,
    const Image& dst,
    ::size_t src_offset,
    const size_t<3>& dst_origin,
    const size_t<3>& region,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueCopyBufferToImage(
        src,
        dst,
        src_offset,
        dst_origin,
        region,
        events,
        event);
}

inline cl_int flush(void)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.flush();
}

inline cl_int finish(void)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.finish();
}
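/**
 * Illustrative use of EnqueueArgs with the kernel functor interface that
 * the following section implements. A minimal sketch: the kernel name
 * "vadd", the buffers a, b and c, and the NDRange size are assumptions,
 * and make_kernel is the public wrapper built on top of these helpers.
 *
 * \code
 * // Given a built cl::Program "program" containing a kernel "vadd" and
 * // three cl::Buffer objects a, b and c:
 * cl::make_kernel<cl::Buffer, cl::Buffer, cl::Buffer> vadd(program, "vadd");
 *
 * // Offset defaults to NullRange; local size is left to the runtime.
 * cl::Event e = vadd(cl::EnqueueArgs(cl::NDRange(1024)), a, b, c);
 * e.wait();
 * \endcode
 */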
// Kernel Functor support
// New interface as of September 2011
// Requires the C++11 std::tr1::function (note do not support TR1)
// Visual Studio 2010 and GCC 4.2

struct EnqueueArgs
{
    CommandQueue queue_;
    const NDRange offset_;
    const NDRange global_;
    const NDRange local_;
    VECTOR_CLASS<Event> events_;

    EnqueueArgs(NDRange global) :
      queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange)
    { }

    EnqueueArgs(NDRange global, NDRange local) :
      queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local)
    { }

    EnqueueArgs(NDRange offset, NDRange global, NDRange local) :
      queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local)
    { }

    EnqueueArgs(Event e, NDRange global) :
      queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange)
    {
        events_.push_back(e);
    }

    EnqueueArgs(Event e, NDRange global, NDRange local) :
      queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(Event e, NDRange offset, NDRange global, NDRange local) :
      queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange global) :
      queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange), events_(events)
    { }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange global, NDRange local) :
      queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local), events_(events)
    { }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange offset, NDRange global, NDRange local) :
      queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local), events_(events)
    { }

    EnqueueArgs(CommandQueue &queue, NDRange global) :
      queue_(queue), offset_(NullRange), global_(global), local_(NullRange)
    { }

    EnqueueArgs(CommandQueue &queue, NDRange global, NDRange local) :
      queue_(queue), offset_(NullRange), global_(global), local_(local)
    { }

    EnqueueArgs(CommandQueue &queue, NDRange offset, NDRange global, NDRange local) :
      queue_(queue), offset_(offset), global_(global), local_(local)
    { }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange global) :
      queue_(queue), offset_(NullRange), global_(global), local_(NullRange)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange global, NDRange local) :
      queue_(queue), offset_(NullRange), global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange offset, NDRange global, NDRange local) :
      queue_(queue), offset_(offset), global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange global) :
      queue_(queue), offset_(NullRange), global_(global), local_(NullRange), events_(events)
    { }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange global, NDRange local) :
      queue_(queue), offset_(NullRange), global_(global), local_(local), events_(events)
    { }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange offset, NDRange global, NDRange local) :
      queue_(queue), offset_(offset), global_(global), local_(local), events_(events)
    { }
};

namespace detail
{

class NullType {};

template<int index, typename T0>
struct SetArg
{
    static void set (Kernel kernel, T0 arg)
    {
        kernel.setArg(index, arg);
    }
};

template<int index>
struct SetArg<index, NullType>
{
    static void set (Kernel, NullType)
    { }
};

template <
    typename T0, typename T1, typename T2, typename T3,
    typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11,
    typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19,
    typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27,
    typename T28, typename T29, typename T30, typename T31
>
class KernelFunctorGlobal
{
private:
    Kernel kernel_;

public:
    KernelFunctorGlobal(
        Kernel kernel) :
            kernel_(kernel)
    {}

    KernelFunctorGlobal(
        const Program& program,
        const STRING_CLASS name,
        cl_int * err = NULL) :
            kernel_(program, name.c_str(), err)
    {}

    Event operator() (
        const EnqueueArgs& args,
        T0 t0,
        T1 t1 = NullType(), T2 t2 = NullType(), T3 t3 = NullType(),
        T4 t4 = NullType(), T5 t5 = NullType(), T6 t6 = NullType(),
        T7 t7 = NullType(), T8 t8 = NullType(), T9 t9 = NullType(),
        T10 t10 = NullType(), T11 t11 = NullType(), T12 t12 = NullType(),
        T13 t13 = NullType(), T14 t14 = NullType(), T15 t15 = NullType(),
        T16 t16 = NullType(), T17 t17 = NullType(), T18 t18 = NullType(),
        T19 t19 = NullType(), T20 t20 = NullType(), T21 t21 = NullType(),
        T22 t22 = NullType(), T23 t23 = NullType(), T24 t24 = NullType(),
        T25 t25 = NullType(), T26 t26 = NullType(), T27 t27 = NullType(),
        T28 t28 = NullType(), T29 t29 = NullType(), T30 t30 = NullType(),
        T31 t31 = NullType()
        )
    {
        Event event;
        SetArg<0, T0>::set(kernel_, t0);
        SetArg<1, T1>::set(kernel_, t1);
        SetArg<2, T2>::set(kernel_, t2);
        SetArg<3, T3>::set(kernel_, t3);
        SetArg<4, T4>::set(kernel_, t4);
        SetArg<5, T5>::set(kernel_, t5);
        SetArg<6, T6>::set(kernel_, t6);
        SetArg<7, T7>::set(kernel_, t7);
        SetArg<8, T8>::set(kernel_, t8);
        SetArg<9, T9>::set(kernel_, t9);
        SetArg<10, T10>::set(kernel_, t10);
        SetArg<11, T11>::set(kernel_, t11);
        SetArg<12, T12>::set(kernel_, t12);
        SetArg<13, T13>::set(kernel_, t13);
        SetArg<14, T14>::set(kernel_, t14);
        SetArg<15, T15>::set(kernel_, t15);
        SetArg<16,
T16>::set(kernel_, t16); SetArg<17, T17>::set(kernel_, t17); SetArg<18, T18>::set(kernel_, t18); SetArg<19, T19>::set(kernel_, t19); SetArg<20, T20>::set(kernel_, t20); SetArg<21, T21>::set(kernel_, t21); SetArg<22, T22>::set(kernel_, t22); SetArg<23, T23>::set(kernel_, t23); SetArg<24, T24>::set(kernel_, t24); SetArg<25, T25>::set(kernel_, t25); SetArg<26, T26>::set(kernel_, t26); SetArg<27, T27>::set(kernel_, t27); SetArg<28, T28>::set(kernel_, t28); SetArg<29, T29>::set(kernel_, t29); SetArg<30, T30>::set(kernel_, t30); SetArg<31, T31>::set(kernel_, t31); args.queue_.enqueueNDRangeKernel( kernel_, args.offset_, args.global_, args.local_, &args.events_, &event); return event; } }; //------------------------------------------------------------------------------------------------------ template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25, typename T26, typename T27, typename T28, typename T29, typename T30, typename T31> struct functionImplementation_ { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 32)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29, T30 arg30, T31 arg31) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28, arg29, arg30, arg31); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25, typename T26, typename T27, typename T28, typename T29, typename T30> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 31)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29, T30 arg30) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28, arg29, arg30); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25, typename T26, typename T27, typename T28, typename T29> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 30)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28, arg29); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25, typename T26, typename T27, typename T28> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 29)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25, typename T26, typename T27> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 28)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26, T27 arg27) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25, typename T26> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 27)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 26)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 25)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 24)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 23)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 22)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 21)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 20)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 19)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. 
Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 18)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 17)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 16)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! 
\brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 15)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 14)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! 
\brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 13)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 12)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! 
\brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 11)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 10)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 9)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 8)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 7)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 6)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4> struct functionImplementation_ < T0, T1, T2, T3, T4, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 5)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4); } }; template< typename T0, typename T1, typename T2, typename T3> struct functionImplementation_ < T0, T1, T2, T3, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 4)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3); } }; template< typename T0, typename T1, typename T2> struct functionImplementation_ < T0, T1, T2, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 3)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2) { return functor_( enqueueArgs, arg0, arg1, arg2); } }; template< typename T0, typename T1> struct functionImplementation_ < T0, T1, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 2)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1) { return functor_( enqueueArgs, arg0, arg1); } }; template< typename T0> struct functionImplementation_ < T0, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 1)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0) { return functor_( enqueueArgs, arg0); } }; } // namespace detail //---------------------------------------------------------------------------------------------- template < typename T0, typename T1 = detail::NullType, typename T2 = detail::NullType, typename T3 = detail::NullType, typename T4 = detail::NullType, typename T5 = detail::NullType, typename T6 = detail::NullType, typename T7 = detail::NullType, typename T8 = detail::NullType, typename T9 = detail::NullType, typename T10 = detail::NullType, typename T11 = detail::NullType, typename T12 = detail::NullType, typename T13 = detail::NullType, typename T14 = detail::NullType, typename T15 = detail::NullType, typename T16 = detail::NullType, typename T17 = detail::NullType, typename T18 = detail::NullType, typename T19 = detail::NullType, typename T20 = detail::NullType, typename T21 = detail::NullType, typename T22 = detail::NullType, typename T23 = detail::NullType, typename T24 = detail::NullType, typename T25 = detail::NullType, typename T26 = detail::NullType, typename T27 = detail::NullType, typename T28 = detail::NullType, typename T29 = detail::NullType, typename T30 = detail::NullType, typename T31 = detail::NullType > struct make_kernel : public detail::functionImplementation_< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 > { public: typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 > FunctorType; make_kernel( const Program& program, const STRING_CLASS name, cl_int * err = NULL) : detail::functionImplementation_< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 >( FunctorType(program, name, err)) {} make_kernel( const Kernel kernel) : 
detail::functionImplementation_<
            T0, T1, T2, T3, T4, T5, T6, T7, T8, T9,
            T10, T11, T12, T13, T14, T15, T16, T17, T18, T19,
            T20, T21, T22, T23, T24, T25, T26, T27, T28, T29,
            T30, T31>(
            FunctorType(kernel))
    {}
};

//----------------------------------------------------------------------------------------------------------------------

#undef __ERR_STR
#if !defined(__CL_USER_OVERRIDE_ERROR_STRINGS)
#undef __GET_DEVICE_INFO_ERR
#undef __GET_PLATFORM_INFO_ERR
#undef __GET_DEVICE_IDS_ERR
#undef __GET_CONTEXT_INFO_ERR
#undef __GET_EVENT_INFO_ERR
#undef __GET_EVENT_PROFILE_INFO_ERR
#undef __GET_MEM_OBJECT_INFO_ERR
#undef __GET_IMAGE_INFO_ERR
#undef __GET_SAMPLER_INFO_ERR
#undef __GET_KERNEL_INFO_ERR
#undef __GET_KERNEL_ARG_INFO_ERR
#undef __GET_KERNEL_WORK_GROUP_INFO_ERR
#undef __GET_PROGRAM_INFO_ERR
#undef __GET_PROGRAM_BUILD_INFO_ERR
#undef __GET_COMMAND_QUEUE_INFO_ERR
#undef __CREATE_CONTEXT_ERR
#undef __CREATE_CONTEXT_FROM_TYPE_ERR
#undef __GET_SUPPORTED_IMAGE_FORMATS_ERR
#undef __CREATE_BUFFER_ERR
#undef __CREATE_SUBBUFFER_ERR
#undef __CREATE_IMAGE2D_ERR
#undef __CREATE_IMAGE3D_ERR
#undef __CREATE_SAMPLER_ERR
#undef __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR
#undef __CREATE_USER_EVENT_ERR
#undef __SET_USER_EVENT_STATUS_ERR
#undef __SET_EVENT_CALLBACK_ERR
#undef __SET_PRINTF_CALLBACK_ERR
#undef __WAIT_FOR_EVENTS_ERR
#undef __CREATE_KERNEL_ERR
#undef __SET_KERNEL_ARGS_ERR
#undef __CREATE_PROGRAM_WITH_SOURCE_ERR
#undef __CREATE_PROGRAM_WITH_BINARY_ERR
#undef __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR
#undef __BUILD_PROGRAM_ERR
#undef __CREATE_KERNELS_IN_PROGRAM_ERR
#undef __CREATE_COMMAND_QUEUE_ERR
#undef __SET_COMMAND_QUEUE_PROPERTY_ERR
#undef __ENQUEUE_READ_BUFFER_ERR
#undef __ENQUEUE_WRITE_BUFFER_ERR
#undef __ENQUEUE_READ_BUFFER_RECT_ERR
#undef __ENQUEUE_WRITE_BUFFER_RECT_ERR
#undef __ENQEUE_COPY_BUFFER_ERR
#undef __ENQEUE_COPY_BUFFER_RECT_ERR
#undef __ENQUEUE_READ_IMAGE_ERR
#undef __ENQUEUE_WRITE_IMAGE_ERR
#undef __ENQUEUE_COPY_IMAGE_ERR
#undef __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR
#undef __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR
#undef __ENQUEUE_MAP_BUFFER_ERR
#undef __ENQUEUE_MAP_IMAGE_ERR
#undef __ENQUEUE_UNMAP_MEM_OBJECT_ERR
#undef __ENQUEUE_NDRANGE_KERNEL_ERR
#undef __ENQUEUE_TASK_ERR
#undef __ENQUEUE_NATIVE_KERNEL
#undef __CL_EXPLICIT_CONSTRUCTORS
#undef __UNLOAD_COMPILER_ERR
#endif //__CL_USER_OVERRIDE_ERROR_STRINGS

#undef __CL_FUNCTION_TYPE

// Extensions
/**
 * Deprecated APIs for 1.2
 */
#if defined(CL_VERSION_1_1)
#undef __INIT_CL_EXT_FCN_PTR
#endif // #if defined(CL_VERSION_1_1)
#undef __CREATE_SUB_DEVICES

#if defined(USE_CL_DEVICE_FISSION)
#undef __PARAM_NAME_DEVICE_FISSION
#endif // USE_CL_DEVICE_FISSION

#undef __DEFAULT_NOT_INITIALIZED
#undef __DEFAULT_BEING_INITIALIZED
#undef __DEFAULT_INITIALIZED

#undef CL_HPP_RVALUE_REFERENCES_SUPPORTED
#undef CL_HPP_NOEXCEPT

} // namespace cl

#endif // CL_HPP_
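/*
 * Editorial usage sketch (not part of the original cl.hpp): how the
 * make_kernel/functionImplementation_ machinery above is used. The kernel
 * name "vadd" and all variable names are hypothetical; the pattern (build a
 * Program, wrap a kernel in make_kernel, invoke it with EnqueueArgs) follows
 * the interface declared in this header.
 */
#if 0  /* illustrative only -- needs a live context, queue and built program */
static void example_make_kernel(cl::CommandQueue &queue, cl::Program &program,
                                cl::Buffer &a, cl::Buffer &b, cl::Buffer &c,
                                int n)
{
    cl_int err = CL_SUCCESS;
    // Instantiates detail::functionImplementation_<Buffer, Buffer, Buffer,
    // int, NullType, ...> via the four-argument specialization above.
    cl::make_kernel<cl::Buffer, cl::Buffer, cl::Buffer, int>
        vadd(program, "vadd", &err);
    // operator() forwards the arguments through KernelFunctorGlobal and
    // returns the event associated with the enqueued kernel.
    cl::Event e = vadd(cl::EnqueueArgs(queue, cl::NDRange(static_cast<size_t>(n))),
                       a, b, c, n);
    e.wait();
}
#endif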
/* ===== File: clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/cl_d3d10.h ===== */
/**********************************************************************************
 * Copyright (c) 2008-2015 The Khronos Group Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and/or associated documentation files (the
 * "Materials"), to deal in the Materials without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Materials, and to
 * permit persons to whom the Materials are furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Materials.
 *
 * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
 * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
 * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
 *    https://www.khronos.org/registry/
 *
 * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
 **********************************************************************************/

/* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */

#ifndef __OPENCL_CL_D3D10_H
#define __OPENCL_CL_D3D10_H

#include <d3d10.h>
#include <CL/cl.h>
#include <CL/cl_platform.h>

#ifdef __cplusplus
extern "C" {
#endif

/******************************************************************************
 * cl_khr_d3d10_sharing                                                       */
#define cl_khr_d3d10_sharing 1

typedef cl_uint cl_d3d10_device_source_khr;
typedef cl_uint cl_d3d10_device_set_khr;

/******************************************************************************/

/* Error Codes */
#define CL_INVALID_D3D10_DEVICE_KHR                  -1002
#define CL_INVALID_D3D10_RESOURCE_KHR                -1003
#define CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR       -1004
#define CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR           -1005

/* cl_d3d10_device_source_khr */
#define CL_D3D10_DEVICE_KHR                          0x4010
#define CL_D3D10_DXGI_ADAPTER_KHR                    0x4011

/* cl_d3d10_device_set_khr */
#define CL_PREFERRED_DEVICES_FOR_D3D10_KHR           0x4012
#define CL_ALL_DEVICES_FOR_D3D10_KHR                 0x4013

/* cl_context_info */
#define CL_CONTEXT_D3D10_DEVICE_KHR                  0x4014
#define CL_CONTEXT_D3D10_PREFER_SHARED_RESOURCES_KHR 0x402C

/* cl_mem_info */
#define CL_MEM_D3D10_RESOURCE_KHR                    0x4015

/* cl_image_info */
#define CL_IMAGE_D3D10_SUBRESOURCE_KHR               0x4016

/* cl_command_type */
#define CL_COMMAND_ACQUIRE_D3D10_OBJECTS_KHR         0x4017
#define CL_COMMAND_RELEASE_D3D10_OBJECTS_KHR         0x4018

/******************************************************************************/

typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromD3D10KHR_fn)(
    cl_platform_id             platform,
    cl_d3d10_device_source_khr d3d_device_source,
    void *                     d3d_object,
    cl_d3d10_device_set_khr    d3d_device_set,
    cl_uint                    num_entries,
    cl_device_id *             devices,
    cl_uint *                  num_devices) CL_API_SUFFIX__VERSION_1_0;

typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10BufferKHR_fn)(
    cl_context     context,
    cl_mem_flags   flags,
    ID3D10Buffer * resource,
    cl_int *       errcode_ret) CL_API_SUFFIX__VERSION_1_0;

typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10Texture2DKHR_fn)(
    cl_context        context,
    cl_mem_flags      flags,
    ID3D10Texture2D * resource,
    UINT              subresource,
    cl_int *          errcode_ret) CL_API_SUFFIX__VERSION_1_0;

typedef CL_API_ENTRY cl_mem (CL_API_CALL
*clCreateFromD3D10Texture3DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Texture3D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireD3D10ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseD3D10ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_D3D10_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/cl_d3d11.h000066400000000000000000000117741450307266000232770ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. 
 **********************************************************************************/

/* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */

#ifndef __OPENCL_CL_D3D11_H
#define __OPENCL_CL_D3D11_H

#include <d3d11.h>
#include <CL/cl.h>
#include <CL/cl_platform.h>

#ifdef __cplusplus
extern "C" {
#endif

/******************************************************************************
 * cl_khr_d3d11_sharing                                                       */
#define cl_khr_d3d11_sharing 1

typedef cl_uint cl_d3d11_device_source_khr;
typedef cl_uint cl_d3d11_device_set_khr;

/******************************************************************************/

/* Error Codes */
#define CL_INVALID_D3D11_DEVICE_KHR                  -1006
#define CL_INVALID_D3D11_RESOURCE_KHR                -1007
#define CL_D3D11_RESOURCE_ALREADY_ACQUIRED_KHR       -1008
#define CL_D3D11_RESOURCE_NOT_ACQUIRED_KHR           -1009

/* cl_d3d11_device_source */
#define CL_D3D11_DEVICE_KHR                          0x4019
#define CL_D3D11_DXGI_ADAPTER_KHR                    0x401A

/* cl_d3d11_device_set */
#define CL_PREFERRED_DEVICES_FOR_D3D11_KHR           0x401B
#define CL_ALL_DEVICES_FOR_D3D11_KHR                 0x401C

/* cl_context_info */
#define CL_CONTEXT_D3D11_DEVICE_KHR                  0x401D
#define CL_CONTEXT_D3D11_PREFER_SHARED_RESOURCES_KHR 0x402D

/* cl_mem_info */
#define CL_MEM_D3D11_RESOURCE_KHR                    0x401E

/* cl_image_info */
#define CL_IMAGE_D3D11_SUBRESOURCE_KHR               0x401F

/* cl_command_type */
#define CL_COMMAND_ACQUIRE_D3D11_OBJECTS_KHR         0x4020
#define CL_COMMAND_RELEASE_D3D11_OBJECTS_KHR         0x4021

/******************************************************************************/

typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromD3D11KHR_fn)(
    cl_platform_id             platform,
    cl_d3d11_device_source_khr d3d_device_source,
    void *                     d3d_object,
    cl_d3d11_device_set_khr    d3d_device_set,
    cl_uint                    num_entries,
    cl_device_id *             devices,
    cl_uint *                  num_devices) CL_API_SUFFIX__VERSION_1_2;

typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11BufferKHR_fn)(
    cl_context     context,
    cl_mem_flags   flags,
    ID3D11Buffer * resource,
    cl_int *       errcode_ret) CL_API_SUFFIX__VERSION_1_2;

typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11Texture2DKHR_fn)(
    cl_context        context,
    cl_mem_flags      flags,
    ID3D11Texture2D * resource,
    UINT              subresource,
    cl_int *          errcode_ret) CL_API_SUFFIX__VERSION_1_2;

typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11Texture3DKHR_fn)(
    cl_context        context,
    cl_mem_flags      flags,
    ID3D11Texture3D * resource,
    UINT              subresource,
    cl_int *          errcode_ret) CL_API_SUFFIX__VERSION_1_2;

typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireD3D11ObjectsKHR_fn)(
    cl_command_queue command_queue,
    cl_uint          num_objects,
    const cl_mem *   mem_objects,
    cl_uint          num_events_in_wait_list,
    const cl_event * event_wait_list,
    cl_event *       event) CL_API_SUFFIX__VERSION_1_2;

typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseD3D11ObjectsKHR_fn)(
    cl_command_queue command_queue,
    cl_uint          num_objects,
    const cl_mem *   mem_objects,
    cl_uint          num_events_in_wait_list,
    const cl_event * event_wait_list,
    cl_event *       event) CL_API_SUFFIX__VERSION_1_2;

#ifdef __cplusplus
}
#endif

#endif  /* __OPENCL_CL_D3D11_H */
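/*
 * Editorial usage sketch (not part of the original header): the *_fn typedefs
 * above are meant to be resolved at run time through the ICD loader. The
 * function below and its parameters are hypothetical; it assumes the standard
 * OpenCL 1.2 clGetExtensionFunctionAddressForPlatform entry point from CL/cl.h.
 */
#if 0  /* illustrative only */
static cl_int example_query_d3d11_devices(cl_platform_id platform,
                                          ID3D11Device *d3d_device,
                                          cl_device_id *device_out)
{
    /* Fetch the extension entry point, then call it through the typedef. */
    clGetDeviceIDsFromD3D11KHR_fn pfn = (clGetDeviceIDsFromD3D11KHR_fn)
        clGetExtensionFunctionAddressForPlatform(platform,
                                                 "clGetDeviceIDsFromD3D11KHR");
    if (pfn == NULL)
        return CL_INVALID_PLATFORM;

    cl_uint num_devices = 0;
    return pfn(platform, CL_D3D11_DEVICE_KHR, d3d_device,
               CL_PREFERRED_DEVICES_FOR_D3D11_KHR, 1, device_out, &num_devices);
}
#endif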
/* ===== File: clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/cl_dx9_media_sharing.h ===== */
/**********************************************************************************
 * Copyright (c) 2008-2015 The Khronos Group Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and/or associated documentation files (the
 * "Materials"), to deal in the Materials without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Materials, and to
 * permit persons to whom the Materials are furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Materials.
 *
 * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
 * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
 * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
 *    https://www.khronos.org/registry/
 *
 * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
 **********************************************************************************/

/* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */

#ifndef __OPENCL_CL_DX9_MEDIA_SHARING_H
#define __OPENCL_CL_DX9_MEDIA_SHARING_H

#include <CL/cl.h>
#include <CL/cl_platform.h>

#ifdef __cplusplus
extern "C" {
#endif

/******************************************************************************
 * cl_khr_dx9_media_sharing                                                   */
#define cl_khr_dx9_media_sharing 1

typedef cl_uint cl_dx9_media_adapter_type_khr;
typedef cl_uint cl_dx9_media_adapter_set_khr;

#if defined(_WIN32)
#include <d3d9.h>
typedef struct _cl_dx9_surface_info_khr
{
    IDirect3DSurface9 *resource;
    HANDLE shared_handle;
} cl_dx9_surface_info_khr;
#endif

/******************************************************************************/

/* Error Codes */
#define CL_INVALID_DX9_MEDIA_ADAPTER_KHR                -1010
#define CL_INVALID_DX9_MEDIA_SURFACE_KHR                -1011
#define CL_DX9_MEDIA_SURFACE_ALREADY_ACQUIRED_KHR       -1012
#define CL_DX9_MEDIA_SURFACE_NOT_ACQUIRED_KHR           -1013

/* cl_dx9_media_adapter_type_khr */
#define CL_ADAPTER_D3D9_KHR                             0x2020
#define CL_ADAPTER_D3D9EX_KHR                           0x2021
#define CL_ADAPTER_DXVA_KHR                             0x2022

/* cl_dx9_media_adapter_set_khr */
#define CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR  0x2023
#define CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR        0x2024

/* cl_context_info */
#define CL_CONTEXT_ADAPTER_D3D9_KHR                     0x2025
#define CL_CONTEXT_ADAPTER_D3D9EX_KHR                   0x2026
#define CL_CONTEXT_ADAPTER_DXVA_KHR                     0x2027

/* cl_mem_info */
#define CL_MEM_DX9_MEDIA_ADAPTER_TYPE_KHR               0x2028
#define CL_MEM_DX9_MEDIA_SURFACE_INFO_KHR               0x2029

/* cl_image_info */
#define CL_IMAGE_DX9_MEDIA_PLANE_KHR                    0x202A

/* cl_command_type */
#define CL_COMMAND_ACQUIRE_DX9_MEDIA_SURFACES_KHR       0x202B
#define CL_COMMAND_RELEASE_DX9_MEDIA_SURFACES_KHR       0x202C

/******************************************************************************/

typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromDX9MediaAdapterKHR_fn)(
    cl_platform_id                  platform,
    cl_uint                         num_media_adapters,
    cl_dx9_media_adapter_type_khr * media_adapter_type,
    void *                          media_adapters,
    cl_dx9_media_adapter_set_khr    media_adapter_set,
    cl_uint                         num_entries,
    cl_device_id *                  devices,
    cl_uint *                       num_devices) CL_API_SUFFIX__VERSION_1_2;

typedef CL_API_ENTRY cl_mem (CL_API_CALL
*clCreateFromDX9MediaSurfaceKHR_fn)( cl_context context, cl_mem_flags flags, cl_dx9_media_adapter_type_khr adapter_type, void * surface_info, cl_uint plane, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireDX9MediaSurfacesKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseDX9MediaSurfacesKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_DX9_MEDIA_SHARING_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/cl_ext.h000066400000000000000000000716131450307266000232610ustar00rootroot00000000000000/* Modifications Copyright (C) 2010-2021 Advanced Micro Devices, Inc. */ /******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ /* $Revision: 11928 $ on $Date: 2010-07-13 09:04:56 -0700 (Tue, 13 Jul 2010) $ */ /* cl_ext.h contains OpenCL extensions which don't have external */ /* (OpenGL, D3D) dependencies. */ #ifndef __CL_EXT_H #define __CL_EXT_H #ifdef __cplusplus extern "C" { #endif #ifdef __APPLE__ #include #include #else #include #endif /* cl_khr_fp16 extension - no extension #define since it has no functions */ #define CL_DEVICE_HALF_FP_CONFIG 0x1033 /* Memory object destruction * * Apple extension for use to manage externally allocated buffers used with cl_mem objects with CL_MEM_USE_HOST_PTR * * Registers a user callback function that will be called when the memory object is deleted and its resources * freed. Each call to clSetMemObjectCallbackFn registers the specified user callback function on a callback * stack associated with memobj. The registered user callback functions are called in the reverse order in * which they were registered. 
The user callback functions are called and then the memory object is deleted * and its resources freed. This provides a mechanism for the application (and libraries) using memobj to be * notified when the memory referenced by host_ptr, specified when the memory object is created and used as * the storage bits for the memory object, can be reused or freed. * * The application may not call CL api's with the cl_mem object passed to the pfn_notify. * * Please check for the "cl_APPLE_SetMemObjectDestructor" extension using clGetDeviceInfo(CL_DEVICE_EXTENSIONS) * before using. */ #define cl_APPLE_SetMemObjectDestructor 1 cl_int CL_API_ENTRY clSetMemObjectDestructorAPPLE( cl_mem /* memobj */, void (* /*pfn_notify*/)( cl_mem /* memobj */, void* /*user_data*/), void * /*user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /* Context Logging Functions * * The next three convenience functions are intended to be used as the pfn_notify parameter to clCreateContext(). * Please check for the "cl_APPLE_ContextLoggingFunctions" extension using clGetDeviceInfo(CL_DEVICE_EXTENSIONS) * before using. * * clLogMessagesToSystemLog fowards on all log messages to the Apple System Logger */ #define cl_APPLE_ContextLoggingFunctions 1 extern void CL_API_ENTRY clLogMessagesToSystemLogAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /* clLogMessagesToStdout sends all log messages to the file descriptor stdout */ extern void CL_API_ENTRY clLogMessagesToStdoutAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /* clLogMessagesToStderr sends all log messages to the file descriptor stderr */ extern void CL_API_ENTRY clLogMessagesToStderrAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /************************ * cl_khr_icd extension * ************************/ #define cl_khr_icd 1 /* cl_platform_info */ #define CL_PLATFORM_ICD_SUFFIX_KHR 0x0920 /* Additional Error Codes */ #define CL_PLATFORM_NOT_FOUND_KHR -1001 extern CL_API_ENTRY cl_int CL_API_CALL clIcdGetPlatformIDsKHR(cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */); typedef CL_API_ENTRY cl_int (CL_API_CALL *clIcdGetPlatformIDsKHR_fn)( cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */); /* Extension: cl_khr_image2D_buffer * * This extension allows a 2D image to be created from a cl_mem buffer without a copy. * The type associated with a 2D image created from a buffer in an OpenCL program is image2d_t. * Both the sampler and sampler-less read_image built-in functions are supported for 2D images * and 2D images created from a buffer. Similarly, the write_image built-ins are also supported * for 2D images created from a buffer. * * When the 2D image from buffer is created, the client must specify the width, * height, image format (i.e. channel order and channel data type) and optionally the row pitch * * The pitch specified must be a multiple of CL_DEVICE_IMAGE_PITCH_ALIGNMENT pixels. * The base address of the buffer must be aligned to CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT pixels. 
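 *
 * Editorial sketch (not part of the original comment), assuming the standard
 * OpenCL 1.2 clCreateImage entry point; buffer, width, height and row_pitch
 * are hypothetical:
 *
 *     cl_image_format fmt  = { CL_RGBA, CL_UNORM_INT8 };
 *     cl_image_desc   desc;
 *     memset(&desc, 0, sizeof(desc));
 *     desc.image_type      = CL_MEM_OBJECT_IMAGE2D;
 *     desc.image_width     = width;
 *     desc.image_height    = height;
 *     desc.image_row_pitch = row_pitch;  -- multiple of CL_DEVICE_IMAGE_PITCH_ALIGNMENT pixels
 *     desc.buffer          = buffer;     -- the existing cl_mem buffer; no copy is made
 *     cl_mem image2d = clCreateImage(context, CL_MEM_READ_WRITE,
 *                                    &fmt, &desc, NULL, &err);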
*/ /************************************* * cl_khr_initalize_memory extension * *************************************/ #define CL_CONTEXT_MEMORY_INITIALIZE_KHR 0x200E /************************************** * cl_khr_terminate_context extension * **************************************/ #define CL_DEVICE_TERMINATE_CAPABILITY_KHR 0x200F #define CL_CONTEXT_TERMINATE_KHR 0x2010 #define cl_khr_terminate_context 1 extern CL_API_ENTRY cl_int CL_API_CALL clTerminateContextKHR(cl_context /* context */) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clTerminateContextKHR_fn)(cl_context /* context */) CL_EXT_SUFFIX__VERSION_1_2; /* * Extension: cl_khr_spir * * This extension adds support to create an OpenCL program object from a * Standard Portable Intermediate Representation (SPIR) instance */ #define CL_DEVICE_SPIR_VERSIONS 0x40E0 #define CL_PROGRAM_BINARY_TYPE_INTERMEDIATE 0x40E1 /****************************************** * cl_nv_device_attribute_query extension * ******************************************/ /* cl_nv_device_attribute_query extension - no extension #define since it has no functions */ #define CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV 0x4000 #define CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV 0x4001 #define CL_DEVICE_REGISTERS_PER_BLOCK_NV 0x4002 #define CL_DEVICE_WARP_SIZE_NV 0x4003 #define CL_DEVICE_GPU_OVERLAP_NV 0x4004 #define CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV 0x4005 #define CL_DEVICE_INTEGRATED_MEMORY_NV 0x4006 /********************************* * cl_amd_device_memory_flags * *********************************/ #define cl_amd_device_memory_flags 1 #define CL_MEM_USE_PERSISTENT_MEM_AMD (1 << 6) // Alloc from GPU's CPU visible heap /* cl_device_info */ #define CL_DEVICE_MAX_ATOMIC_COUNTERS_EXT 0x4032 /********************************* * cl_amd_device_attribute_query * *********************************/ #define CL_DEVICE_PROFILING_TIMER_OFFSET_AMD 0x4036 #define CL_DEVICE_TOPOLOGY_AMD 0x4037 #define CL_DEVICE_BOARD_NAME_AMD 0x4038 #define CL_DEVICE_GLOBAL_FREE_MEMORY_AMD 0x4039 #define CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD 0x4040 #define CL_DEVICE_SIMD_WIDTH_AMD 0x4041 #define CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD 0x4042 #define CL_DEVICE_WAVEFRONT_WIDTH_AMD 0x4043 #define CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD 0x4044 #define CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD 0x4045 #define CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD 0x4046 #define CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD 0x4047 #define CL_DEVICE_LOCAL_MEM_BANKS_AMD 0x4048 #define CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD 0x4049 #define CL_DEVICE_GFXIP_MAJOR_AMD 0x404A #define CL_DEVICE_GFXIP_MINOR_AMD 0x404B #define CL_DEVICE_AVAILABLE_ASYNC_QUEUES_AMD 0x404C #define CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD 0x4030 #define CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD 0x4031 #define CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD 0x4033 #define CL_DEVICE_PCIE_ID_AMD 0x4034 typedef union { struct { cl_uint type; cl_uint data[5]; } raw; struct { cl_uint type; cl_uchar unused[17]; cl_uchar bus; cl_uchar device; cl_uchar function; } pcie; } cl_device_topology_amd; #define CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD 1 /************************** * cl_amd_offline_devices * **************************/ #define CL_CONTEXT_OFFLINE_DEVICES_AMD 0x403F /******************************** * cl_amd_bus_addressable_memory * ********************************/ /* cl_mem flag - bitfield */ #define CL_MEM_BUS_ADDRESSABLE_AMD (1<<30) #define CL_MEM_EXTERNAL_PHYSICAL_AMD (1<<31) #define CL_COMMAND_WAIT_SIGNAL_AMD 0x4080 #define CL_COMMAND_WRITE_SIGNAL_AMD 0x4081 #define 
CL_COMMAND_MAKE_BUFFERS_RESIDENT_AMD 0x4082 typedef struct _cl_bus_address_amd { cl_ulong surface_bus_address; cl_ulong marker_bus_address; } cl_bus_address_amd; typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueWaitSignalAMD_fn)( cl_command_queue /*command_queue*/, cl_mem /*mem_object*/, cl_uint /*value*/, cl_uint /*num_events*/, const cl_event * /*event_wait_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueWriteSignalAMD_fn)( cl_command_queue /*command_queue*/, cl_mem /*mem_object*/, cl_uint /*value*/, cl_ulong /*offset*/, cl_uint /*num_events*/, const cl_event * /*event_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueMakeBuffersResidentAMD_fn)( cl_command_queue /*command_queue*/, cl_uint /*num_mem_objs*/, cl_mem * /*mem_objects*/, cl_bool /*blocking_make_resident*/, cl_bus_address_amd * /*bus_addresses*/, cl_uint /*num_events*/, const cl_event * /*event_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2; /************************* * cl_amd_copy_buffer_p2p * **************************/ #define CL_DEVICE_NUM_P2P_DEVICES_AMD 0x4088 #define CL_DEVICE_P2P_DEVICES_AMD 0x4089 #define cl_amd_copy_buffer_p2p 1 typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueCopyBufferP2PAMD_fn)(cl_command_queue /*command_queue*/, cl_mem /*src_buffer*/, cl_mem /*dst_buffer*/, size_t /*src_offset*/, size_t /*dst_offset*/, size_t /*cb*/, cl_uint /*num_events_in_wait_list*/, const cl_event* /*event_wait_list*/, cl_event* /*event*/) CL_EXT_SUFFIX__VERSION_1_2; /*********************************** * cl_amd_assembly_program extension * ***********************************/ #define cl_amd_assembly_program 1 typedef CL_API_ENTRY cl_program (CL_API_CALL * clCreateProgramWithAssemblyAMD_fn) ( cl_context /* context */, cl_uint /* count */, const char** /* strings */, const size_t* /* lengths */, cl_int* /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_2; // /************************** * cl_amd_command_queue_info * **************************/ #define CL_QUEUE_THREAD_HANDLE_AMD 0x403E /* cl_kernel_exec_info for DVR DOPP texture support */ #define CL_KERNEL_EXEC_INFO_NEW_VCOP_AMD 0x4120 #define CL_KERNEL_EXEC_INFO_PFPA_VCOP_AMD 0x4121 // /********************************* * cl_arm_printf extension *********************************/ #define CL_PRINTF_CALLBACK_ARM 0x40B0 #define CL_PRINTF_BUFFERSIZE_ARM 0x40B1 #ifdef CL_VERSION_1_1 /*********************************** * cl_ext_device_fission extension * ***********************************/ #define cl_ext_device_fission 1 extern CL_API_ENTRY cl_int CL_API_CALL clReleaseDeviceEXT( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL *clReleaseDeviceEXT_fn)( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clRetainDeviceEXT( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL *clRetainDeviceEXT_fn)( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef cl_ulong cl_device_partition_property_ext; extern CL_API_ENTRY cl_int CL_API_CALL clCreateSubDevicesEXT( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int ( CL_API_CALL * clCreateSubDevicesEXT_fn)( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint 
/*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1; /* cl_device_partition_property_ext */ #define CL_DEVICE_PARTITION_EQUALLY_EXT 0x4050 #define CL_DEVICE_PARTITION_BY_COUNTS_EXT 0x4051 #define CL_DEVICE_PARTITION_BY_NAMES_EXT 0x4052 #define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN_EXT 0x4053 /* clDeviceGetInfo selectors */ #define CL_DEVICE_PARENT_DEVICE_EXT 0x4054 #define CL_DEVICE_PARTITION_TYPES_EXT 0x4055 #define CL_DEVICE_AFFINITY_DOMAINS_EXT 0x4056 #define CL_DEVICE_REFERENCE_COUNT_EXT 0x4057 #define CL_DEVICE_PARTITION_STYLE_EXT 0x4058 /* clGetImageInfo enum */ #define CL_IMAGE_BYTE_PITCH_AMD 0x4059 /* error codes */ #define CL_DEVICE_PARTITION_FAILED_EXT -1057 #define CL_INVALID_PARTITION_COUNT_EXT -1058 #define CL_INVALID_PARTITION_NAME_EXT -1059 /* CL_AFFINITY_DOMAINs */ #define CL_AFFINITY_DOMAIN_L1_CACHE_EXT 0x1 #define CL_AFFINITY_DOMAIN_L2_CACHE_EXT 0x2 #define CL_AFFINITY_DOMAIN_L3_CACHE_EXT 0x3 #define CL_AFFINITY_DOMAIN_L4_CACHE_EXT 0x4 #define CL_AFFINITY_DOMAIN_NUMA_EXT 0x10 #define CL_AFFINITY_DOMAIN_NEXT_FISSIONABLE_EXT 0x100 /* cl_device_partition_property_ext list terminators */ #define CL_PROPERTIES_LIST_END_EXT ((cl_device_partition_property_ext) 0) #define CL_PARTITION_BY_COUNTS_LIST_END_EXT ((cl_device_partition_property_ext) 0) #define CL_PARTITION_BY_NAMES_LIST_END_EXT ((cl_device_partition_property_ext) 0 - 1) /********************************* * cl_qcom_ext_host_ptr extension *********************************/ #define CL_MEM_EXT_HOST_PTR_QCOM (1 << 29) #define CL_DEVICE_EXT_MEM_PADDING_IN_BYTES_QCOM 0x40A0 #define CL_DEVICE_PAGE_SIZE_QCOM 0x40A1 #define CL_IMAGE_ROW_ALIGNMENT_QCOM 0x40A2 #define CL_IMAGE_SLICE_ALIGNMENT_QCOM 0x40A3 #define CL_MEM_HOST_UNCACHED_QCOM 0x40A4 #define CL_MEM_HOST_WRITEBACK_QCOM 0x40A5 #define CL_MEM_HOST_WRITETHROUGH_QCOM 0x40A6 #define CL_MEM_HOST_WRITE_COMBINING_QCOM 0x40A7 typedef cl_uint cl_image_pitch_info_qcom; extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceImageInfoQCOM(cl_device_id device, size_t image_width, size_t image_height, const cl_image_format *image_format, cl_image_pitch_info_qcom param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); typedef struct _cl_mem_ext_host_ptr { /* Type of external memory allocation. */ /* Legal values will be defined in layered extensions. */ cl_uint allocation_type; /* Host cache policy for this external memory allocation. */ cl_uint host_cache_policy; } cl_mem_ext_host_ptr; /********************************* * cl_qcom_ion_host_ptr extension *********************************/ #define CL_MEM_ION_HOST_PTR_QCOM 0x40A8 typedef struct _cl_mem_ion_host_ptr { /* Type of external memory allocation. */ /* Must be CL_MEM_ION_HOST_PTR_QCOM for ION allocations. 
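
   Editorial sketch (not part of the original comment) of how the surrounding
   struct is typically filled; fd, mapped_ptr, size and context are
   hypothetical values from an ION allocation:

     cl_mem_ion_host_ptr ion;
     ion.ext_host_ptr.allocation_type   = CL_MEM_ION_HOST_PTR_QCOM;
     ion.ext_host_ptr.host_cache_policy = CL_MEM_HOST_UNCACHED_QCOM;
     ion.ion_filedesc = fd;
     ion.ion_hostptr  = mapped_ptr;
     cl_mem buf = clCreateBuffer(context,
                                 CL_MEM_USE_HOST_PTR | CL_MEM_EXT_HOST_PTR_QCOM,
                                 size, &ion, &errcode);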
*/ cl_mem_ext_host_ptr ext_host_ptr; /* ION file descriptor */ int ion_filedesc; /* Host pointer to the ION allocated memory */ void* ion_hostptr; } cl_mem_ion_host_ptr; #endif /* CL_VERSION_1_1 */ #if defined(CL_VERSION_1_2) /****************************************** * cl_img_yuv_image extension * ******************************************/ /* Image formats used in clCreateImage */ #define CL_NV21_IMG 0x40D0 #define CL_YV12_IMG 0x40D1 /****************************************** * cl_img_cached_allocations extension * ******************************************/ /* Flag values used by clCreteBuffer */ #define CL_MEM_USE_UNCACHED_CPU_MEMORY_IMG (1 << 26) #define CL_MEM_USE_CACHED_CPU_MEMORY_IMG (1 << 27) /****************************************** * cl_img_use_gralloc_ptr extension * ******************************************/ /* Flag values used by clCreteBuffer */ #define CL_MEM_USE_GRALLOC_PTR_IMG (1 << 28) /* To be used by clGetEventInfo: */ #define CL_COMMAND_ACQUIRE_GRALLOC_OBJECTS_IMG 0x40D2 #define CL_COMMAND_RELEASE_GRALLOC_OBJECTS_IMG 0x40D3 /* Error code from clEnqueueReleaseGrallocObjectsIMG */ #define CL_GRALLOC_RESOURCE_NOT_ACQUIRED_IMG 0x40D4 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireGrallocObjectsIMG(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseGrallocObjectsIMG(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; #endif /* CL_VERSION_1_2 */ /********************************** * cl_arm_import_memory extension * **********************************/ #ifdef CL_VERSION_1_0 typedef intptr_t cl_import_properties_arm; /* Default and valid proporties name for cl_arm_import_memory */ #define CL_IMPORT_TYPE_ARM 0x40B2 /* Host process memory type default value for CL_IMPORT_TYPE_ARM property */ #define CL_IMPORT_TYPE_HOST_ARM 0x40B3 /* DMA BUF memory type value for CL_IMPORT_TYPE_ARM property */ #define CL_IMPORT_TYPE_DMA_BUF_ARM 0x40B4 /* Secure DMA BUF memory type value for CL_IMPORT_TYPE_ARM property */ #define CL_IMPORT_TYPE_SECURE_ARM 0x40B5 /* This extension adds a new function that allows for direct memory import into * OpenCL via the clImportMemoryARM function. * * Memory imported through this interface will be mapped into the device's page * tables directly, providing zero copy access. It will never fall back to copy * operations and aliased buffers. * * Types of memory supported for import are specified as additional extension * strings. * * This extension produces cl_mem allocations which are compatible with all other * users of cl_mem in the standard API. * * This extension maps pages with the same properties as the normal buffer creation * function clCreateBuffer. 
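 *
 * Editorial sketch (not part of the original comment); host_ptr and size are
 * hypothetical, and the zero-terminated property list requests the default
 * host-process import type:
 *
 *     cl_import_properties_arm props[] =
 *         { CL_IMPORT_TYPE_ARM, CL_IMPORT_TYPE_HOST_ARM, 0 };
 *     cl_mem buf = clImportMemoryARM(context, CL_MEM_READ_WRITE, props,
 *                                    host_ptr, size, &errcode);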
*/ extern CL_API_ENTRY cl_mem CL_API_CALL clImportMemoryARM( cl_context context, cl_mem_flags flags, const cl_import_properties_arm *properties, void *memory, size_t size, cl_int *errcode_ret) CL_EXT_SUFFIX__VERSION_1_0; #endif /* CL_VERSION_1_0 */ /****************************************** * cl_arm_shared_virtual_memory extension * ******************************************/ #ifdef CL_VERSION_1_2 /* Used by clGetDeviceInfo */ #define CL_DEVICE_SVM_CAPABILITIES_ARM 0x40B6 /* Used by clGetMemObjectInfo */ #define CL_MEM_USES_SVM_POINTER_ARM 0x40B7 /* Used by clSetKernelExecInfoARM: */ #define CL_KERNEL_EXEC_INFO_SVM_PTRS_ARM 0x40B8 #define CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM_ARM 0x40B9 /* To be used by clGetEventInfo: */ #define CL_COMMAND_SVM_FREE_ARM 0x40BA #define CL_COMMAND_SVM_MEMCPY_ARM 0x40BB #define CL_COMMAND_SVM_MEMFILL_ARM 0x40BC #define CL_COMMAND_SVM_MAP_ARM 0x40BD #define CL_COMMAND_SVM_UNMAP_ARM 0x40BE /* Flag values returned by clGetDeviceInfo with CL_DEVICE_SVM_CAPABILITIES_ARM as the param_name. */ #define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER_ARM (1 << 0) #define CL_DEVICE_SVM_FINE_GRAIN_BUFFER_ARM (1 << 1) #define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM_ARM (1 << 2) #define CL_DEVICE_SVM_ATOMICS_ARM (1 << 3) /* Flag values used by clSVMAllocARM: */ #define CL_MEM_SVM_FINE_GRAIN_BUFFER_ARM (1 << 10) #define CL_MEM_SVM_ATOMICS_ARM (1 << 11) typedef cl_bitfield cl_svm_mem_flags_arm; typedef cl_uint cl_kernel_exec_info_arm; typedef cl_bitfield cl_device_svm_capabilities_arm; extern CL_API_ENTRY void * CL_API_CALL clSVMAllocARM(cl_context /* context */, cl_svm_mem_flags_arm /* flags */, size_t /* size */, cl_uint /* alignment */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY void CL_API_CALL clSVMFreeARM(cl_context /* context */, void * /* svm_pointer */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMFreeARM(cl_command_queue /* command_queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void (CL_CALLBACK * /*pfn_free_func*/)(cl_command_queue /* queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void * /* user_data */), void * /* user_data */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemcpyARM(cl_command_queue /* command_queue */, cl_bool /* blocking_copy */, void * /* dst_ptr */, const void * /* src_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemFillARM(cl_command_queue /* command_queue */, void * /* svm_ptr */, const void * /* pattern */, size_t /* pattern_size */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMapARM(cl_command_queue /* command_queue */, cl_bool /* blocking_map */, cl_map_flags /* flags */, void * /* svm_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMUnmapARM(cl_command_queue /* command_queue */, void * /* svm_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int 
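/*
 * Editorial sketch (not part of the original header) of the coarse-grained
 * ARM SVM flow declared in this section; context, queue, kernel and size are
 * hypothetical:
 *
 *     void *p = clSVMAllocARM(context, CL_MEM_READ_WRITE, size, 0);
 *     clEnqueueSVMMapARM(queue, CL_TRUE, CL_MAP_WRITE, p, size, 0, NULL, NULL);
 *     ...... fill p from the host ......
 *     clEnqueueSVMUnmapARM(queue, p, 0, NULL, NULL);
 *     clSetKernelArgSVMPointerARM(kernel, 0, p);   -- declared immediately below
 */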
CL_API_CALL clSetKernelArgSVMPointerARM(cl_kernel /* kernel */, cl_uint /* arg_index */, const void * /* arg_value */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelExecInfoARM(cl_kernel /* kernel */, cl_kernel_exec_info_arm /* param_name */, size_t /* param_value_size */, const void * /* param_value */) CL_EXT_SUFFIX__VERSION_1_2; #endif /* CL_VERSION_1_2 */ #ifdef __cplusplus } #endif #endif /* __CL_EXT_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/cl_gl.h000066400000000000000000000166371450307266000230700ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. 
**********************************************************************************/ #ifndef __OPENCL_CL_GL_H #define __OPENCL_CL_GL_H #ifdef __APPLE__ #include #else #include #endif #ifdef __cplusplus extern "C" { #endif typedef cl_uint cl_gl_object_type; typedef cl_uint cl_gl_texture_info; typedef cl_uint cl_gl_platform_info; typedef struct __GLsync *cl_GLsync; /* cl_gl_object_type = 0x2000 - 0x200F enum values are currently taken */ #define CL_GL_OBJECT_BUFFER 0x2000 #define CL_GL_OBJECT_TEXTURE2D 0x2001 #define CL_GL_OBJECT_TEXTURE3D 0x2002 #define CL_GL_OBJECT_RENDERBUFFER 0x2003 #define CL_GL_OBJECT_TEXTURE2D_ARRAY 0x200E #define CL_GL_OBJECT_TEXTURE1D 0x200F #define CL_GL_OBJECT_TEXTURE1D_ARRAY 0x2010 #define CL_GL_OBJECT_TEXTURE_BUFFER 0x2011 /* cl_gl_texture_info */ #define CL_GL_TEXTURE_TARGET 0x2004 #define CL_GL_MIPMAP_LEVEL 0x2005 #define CL_GL_NUM_SAMPLES 0x2012 extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLBuffer(cl_context /* context */, cl_mem_flags /* flags */, cl_GLuint /* bufobj */, int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLTexture(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLRenderbuffer(cl_context /* context */, cl_mem_flags /* flags */, cl_GLuint /* renderbuffer */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetGLObjectInfo(cl_mem /* memobj */, cl_gl_object_type * /* gl_object_type */, cl_GLuint * /* gl_object_name */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetGLTextureInfo(cl_mem /* memobj */, cl_gl_texture_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireGLObjects(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseGLObjects(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; /* Deprecated OpenCL 1.1 APIs */ extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateFromGLTexture2D(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateFromGLTexture3D(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; /* cl_khr_gl_sharing extension */ #define cl_khr_gl_sharing 1 typedef cl_uint cl_gl_context_info; /* Additional Error Codes */ #define CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR -1000 /* cl_gl_context_info */ #define CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR 0x2006 #define CL_DEVICES_FOR_GL_CONTEXT_KHR 0x2007 /* Additional cl_context_properties */ #define CL_GL_CONTEXT_KHR 0x2008 #define 
CL_EGL_DISPLAY_KHR 0x2009 #define CL_GLX_DISPLAY_KHR 0x200A #define CL_WGL_HDC_KHR 0x200B #define CL_CGL_SHAREGROUP_KHR 0x200C extern CL_API_ENTRY cl_int CL_API_CALL clGetGLContextInfoKHR(const cl_context_properties * /* properties */, cl_gl_context_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetGLContextInfoKHR_fn)( const cl_context_properties * properties, cl_gl_context_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret); #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_GL_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/cl_gl_ext.h000066400000000000000000000054651450307266000237450ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ /* cl_gl_ext.h contains vendor (non-KHR) OpenCL extensions which have */ /* OpenGL dependencies. */ #ifndef __OPENCL_CL_GL_EXT_H #define __OPENCL_CL_GL_EXT_H #ifdef __cplusplus extern "C" { #endif #ifdef __APPLE__ #include #else #include #endif /* * For each extension, follow this template * cl_VEN_extname extension */ /* #define cl_VEN_extname 1 * ... define new types, if any * ... define new tokens, if any * ... define new APIs, if any * * If you need GLtypes here, mirror them with a cl_GLtype, rather than including a GL header * This allows us to avoid having to decide whether to include GL headers or GLES here. 
clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/cl_gl_ext.h000066400000000000000000000054651450307266000237430ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ /* cl_gl_ext.h contains vendor (non-KHR) OpenCL extensions which have */ /* OpenGL dependencies. */ #ifndef __OPENCL_CL_GL_EXT_H #define __OPENCL_CL_GL_EXT_H #ifdef __cplusplus extern "C" { #endif #ifdef __APPLE__ #include <OpenCL/cl_gl.h> #else #include <CL/cl_gl.h> #endif /* * For each extension, follow this template * cl_VEN_extname extension */ /* #define cl_VEN_extname 1 * ... define new types, if any * ... define new tokens, if any * ... define new APIs, if any * * If you need GLtypes here, mirror them with a cl_GLtype, rather than including a GL header * This allows us to avoid having to decide whether to include GL headers or GLES here. */ /* * cl_khr_gl_event extension * See section 9.9 in the OpenCL 1.1 spec for more information */ #define CL_COMMAND_GL_FENCE_SYNC_OBJECT_KHR 0x200D extern CL_API_ENTRY cl_event CL_API_CALL clCreateEventFromGLsyncKHR(cl_context /* context */, cl_GLsync /* cl_GLsync */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1;
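/*
 * Illustrative usage sketch -- not part of the Khronos header. With
 * cl_khr_gl_event, a GL fence can gate a CL acquire without stalling the
 * host on glFinish(): wrap the GLsync in a cl_event and pass it in the wait
 * list. Like other extension entry points, clCreateEventFromGLsyncKHR should
 * be resolved via clGetExtensionFunctionAddressForPlatform in portable code.
 * `ctx`, `queue` and `buf` are placeholders; error handling is omitted.
 *
 *     GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
 *     cl_int err;
 *     cl_event ev = clCreateEventFromGLsyncKHR(ctx, fence, &err);
 *     clEnqueueAcquireGLObjects(queue, 1, &buf, 1, &ev, NULL);
 *     clReleaseEvent(ev);
 *     glDeleteSync(fence);
 */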
#ifdef __cplusplus } #endif #endif /* __OPENCL_CL_GL_EXT_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/cl_platform.h000066400000000000000000001265531450307266000243070ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11803 $ on $Date: 2010-06-25 10:02:12 -0700 (Fri, 25 Jun 2010) $ */ #ifndef __CL_PLATFORM_H #define __CL_PLATFORM_H #ifdef __APPLE__ /* Contains #defines for AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER below */ #include <AvailabilityMacros.h> #endif #ifdef __cplusplus extern "C" { #endif #if defined(_WIN32) #define CL_API_ENTRY #define CL_API_CALL __stdcall #define CL_CALLBACK __stdcall #else #define CL_API_ENTRY #define CL_API_CALL #define CL_CALLBACK #endif #ifdef __APPLE__ #define CL_EXTENSION_WEAK_LINK __attribute__((weak_import)) #define CL_API_SUFFIX__VERSION_1_0 AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_0 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER #define CL_API_SUFFIX__VERSION_1_1 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define GCL_API_SUFFIX__VERSION_1_1 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_1 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER_BUT_DEPRECATED_IN_MAC_OS_X_VERSION_10_7 #ifdef AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define GCL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_2 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER_BUT_DEPRECATED_IN_MAC_OS_X_VERSION_10_8 #else #warning This path should never happen outside of internal operating system development. AvailabilityMacros do not function correctly here!
#define CL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define GCL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_2 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #endif #else #define CL_EXTENSION_WEAK_LINK #define CL_API_SUFFIX__VERSION_1_0 #define CL_EXT_SUFFIX__VERSION_1_0 #define CL_API_SUFFIX__VERSION_1_1 #define CL_EXT_SUFFIX__VERSION_1_1 #define CL_API_SUFFIX__VERSION_1_2 #define CL_EXT_SUFFIX__VERSION_1_2 #ifdef __GNUC__ #ifdef CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #endif #elif defined(_WIN32) #ifdef CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED __declspec(deprecated) #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED __declspec(deprecated) #endif #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #endif #endif #if (defined (_WIN32) && defined(_MSC_VER)) /* scalar types */ typedef signed __int8 cl_char; typedef unsigned __int8 cl_uchar; typedef signed __int16 cl_short; typedef unsigned __int16 cl_ushort; typedef signed __int32 cl_int; typedef unsigned __int32 cl_uint; typedef signed __int64 cl_long; typedef unsigned __int64 cl_ulong; typedef unsigned __int16 cl_half; typedef float cl_float; typedef double cl_double; /* Macro names and corresponding values defined by OpenCL */ #define CL_CHAR_BIT 8 #define CL_SCHAR_MAX 127 #define CL_SCHAR_MIN (-127-1) #define CL_CHAR_MAX CL_SCHAR_MAX #define CL_CHAR_MIN CL_SCHAR_MIN #define CL_UCHAR_MAX 255 #define CL_SHRT_MAX 32767 #define CL_SHRT_MIN (-32767-1) #define CL_USHRT_MAX 65535 #define CL_INT_MAX 2147483647 #define CL_INT_MIN (-2147483647-1) #define CL_UINT_MAX 0xffffffffU #define CL_LONG_MAX ((cl_long) 0x7FFFFFFFFFFFFFFFLL) #define CL_LONG_MIN ((cl_long) -0x7FFFFFFFFFFFFFFFLL - 1LL) #define CL_ULONG_MAX ((cl_ulong) 0xFFFFFFFFFFFFFFFFULL) #define CL_FLT_DIG 6 #define CL_FLT_MANT_DIG 24 #define CL_FLT_MAX_10_EXP +38 #define CL_FLT_MAX_EXP +128 #define CL_FLT_MIN_10_EXP -37 #define CL_FLT_MIN_EXP -125 #define CL_FLT_RADIX 2 #define CL_FLT_MAX 340282346638528859811704183484516925440.0f #define CL_FLT_MIN 1.175494350822287507969e-38f #define CL_FLT_EPSILON 1.1920928955078125e-7f #define CL_HALF_DIG 3 #define CL_HALF_MANT_DIG 11 #define CL_HALF_MAX_10_EXP +4 #define CL_HALF_MAX_EXP +16 #define CL_HALF_MIN_10_EXP -4 #define CL_HALF_MIN_EXP -13 #define CL_HALF_RADIX 2 #define CL_HALF_MAX 65504.0f #define CL_HALF_MIN 6.103515625e-05f #define CL_HALF_EPSILON 9.765625e-04f #define 
CL_DBL_DIG 15 #define CL_DBL_MANT_DIG 53 #define CL_DBL_MAX_10_EXP +308 #define CL_DBL_MAX_EXP +1024 #define CL_DBL_MIN_10_EXP -307 #define CL_DBL_MIN_EXP -1021 #define CL_DBL_RADIX 2 #define CL_DBL_MAX 1.7976931348623158e+308 #define CL_DBL_MIN 2.225073858507201383090e-308 #define CL_DBL_EPSILON 2.220446049250313080847e-16 #define CL_M_E 2.7182818284590452354 #define CL_M_LOG2E 1.4426950408889634074 #define CL_M_LOG10E 0.43429448190325182765 #define CL_M_LN2 0.69314718055994530942 #define CL_M_LN10 2.30258509299404568402 #define CL_M_PI 3.14159265358979323846 #define CL_M_PI_2 1.57079632679489661923 #define CL_M_PI_4 0.78539816339744830962 #define CL_M_1_PI 0.31830988618379067154 #define CL_M_2_PI 0.63661977236758134308 #define CL_M_2_SQRTPI 1.12837916709551257390 #define CL_M_SQRT2 1.41421356237309504880 #define CL_M_SQRT1_2 0.70710678118654752440 #define CL_M_E_F 2.718281828f #define CL_M_LOG2E_F 1.442695041f #define CL_M_LOG10E_F 0.434294482f #define CL_M_LN2_F 0.693147181f #define CL_M_LN10_F 2.302585093f #define CL_M_PI_F 3.141592654f #define CL_M_PI_2_F 1.570796327f #define CL_M_PI_4_F 0.785398163f #define CL_M_1_PI_F 0.318309886f #define CL_M_2_PI_F 0.636619772f #define CL_M_2_SQRTPI_F 1.128379167f #define CL_M_SQRT2_F 1.414213562f #define CL_M_SQRT1_2_F 0.707106781f #define CL_NAN (CL_INFINITY - CL_INFINITY) #define CL_HUGE_VALF ((cl_float) 1e50) #define CL_HUGE_VAL ((cl_double) 1e500) #define CL_MAXFLOAT CL_FLT_MAX #define CL_INFINITY CL_HUGE_VALF #else #include <stdint.h> /* scalar types */ typedef int8_t cl_char; typedef uint8_t cl_uchar; typedef int16_t cl_short __attribute__((aligned(2))); typedef uint16_t cl_ushort __attribute__((aligned(2))); typedef int32_t cl_int __attribute__((aligned(4))); typedef uint32_t cl_uint __attribute__((aligned(4))); typedef int64_t cl_long __attribute__((aligned(8))); typedef uint64_t cl_ulong __attribute__((aligned(8))); typedef uint16_t cl_half __attribute__((aligned(2))); typedef float cl_float __attribute__((aligned(4))); typedef double cl_double __attribute__((aligned(8))); /* Macro names and corresponding values defined by OpenCL */ #define CL_CHAR_BIT 8 #define CL_SCHAR_MAX 127 #define CL_SCHAR_MIN (-127-1) #define CL_CHAR_MAX CL_SCHAR_MAX #define CL_CHAR_MIN CL_SCHAR_MIN #define CL_UCHAR_MAX 255 #define CL_SHRT_MAX 32767 #define CL_SHRT_MIN (-32767-1) #define CL_USHRT_MAX 65535 #define CL_INT_MAX 2147483647 #define CL_INT_MIN (-2147483647-1) #define CL_UINT_MAX 0xffffffffU #define CL_LONG_MAX ((cl_long) 0x7FFFFFFFFFFFFFFFLL) #define CL_LONG_MIN ((cl_long) -0x7FFFFFFFFFFFFFFFLL - 1LL) #define CL_ULONG_MAX ((cl_ulong) 0xFFFFFFFFFFFFFFFFULL) #define CL_FLT_DIG 6 #define CL_FLT_MANT_DIG 24 #define CL_FLT_MAX_10_EXP +38 #define CL_FLT_MAX_EXP +128 #define CL_FLT_MIN_10_EXP -37 #define CL_FLT_MIN_EXP -125 #define CL_FLT_RADIX 2 #define CL_FLT_MAX 340282346638528859811704183484516925440.0f #define CL_FLT_MIN 1.175494350822287507969e-38f #define CL_FLT_EPSILON 1.1920928955078125e-7f #define CL_HALF_DIG 3 #define CL_HALF_MANT_DIG 11 #define CL_HALF_MAX_10_EXP +4 #define CL_HALF_MAX_EXP +16 #define CL_HALF_MIN_10_EXP -4 #define CL_HALF_MIN_EXP -13 #define CL_HALF_RADIX 2 #define CL_HALF_MAX 65504.0f #define CL_HALF_MIN 6.103515625e-05f #define CL_HALF_EPSILON 9.765625e-04f #define CL_DBL_DIG 15 #define CL_DBL_MANT_DIG 53 #define CL_DBL_MAX_10_EXP +308 #define CL_DBL_MAX_EXP +1024 #define CL_DBL_MIN_10_EXP -307 #define CL_DBL_MIN_EXP -1021 #define CL_DBL_RADIX 2 #define CL_DBL_MAX 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0 #define CL_DBL_MIN 2.225073858507201383090e-308 #define CL_DBL_EPSILON 2.220446049250313080847e-16 #define CL_M_E 2.7182818284590452354 #define CL_M_LOG2E 1.4426950408889634074 #define CL_M_LOG10E 0.43429448190325182765 #define CL_M_LN2 0.69314718055994530942 #define CL_M_LN10 2.30258509299404568402 #define CL_M_PI 3.14159265358979323846 #define CL_M_PI_2 1.57079632679489661923 #define CL_M_PI_4 0.78539816339744830962 #define CL_M_1_PI 0.31830988618379067154 #define CL_M_2_PI 0.63661977236758134308 #define CL_M_2_SQRTPI 1.12837916709551257390 #define CL_M_SQRT2 1.41421356237309504880 #define CL_M_SQRT1_2 0.70710678118654752440 #define CL_M_E_F 2.718281828f #define CL_M_LOG2E_F 1.442695041f #define CL_M_LOG10E_F 0.434294482f #define CL_M_LN2_F 0.693147181f #define CL_M_LN10_F 2.302585093f #define CL_M_PI_F 3.141592654f #define CL_M_PI_2_F 1.570796327f #define CL_M_PI_4_F 0.785398163f #define CL_M_1_PI_F 0.318309886f #define CL_M_2_PI_F 0.636619772f #define CL_M_2_SQRTPI_F 1.128379167f #define CL_M_SQRT2_F 1.414213562f #define CL_M_SQRT1_2_F 0.707106781f #if defined( __GNUC__ ) #define CL_HUGE_VALF __builtin_huge_valf() #define CL_HUGE_VAL __builtin_huge_val() #define CL_NAN __builtin_nanf( "" ) #else #define CL_HUGE_VALF ((cl_float) 1e50) #define CL_HUGE_VAL ((cl_double) 1e500) float nanf( const char * ); #define CL_NAN nanf( "" ) #endif #define CL_MAXFLOAT CL_FLT_MAX #define CL_INFINITY CL_HUGE_VALF #endif
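/*
 * Illustrative usage sketch -- not part of the Khronos header. The cl_*
 * scalar types above give host code fixed sizes that match OpenCL C kernel
 * arguments on every ABI: a kernel argument declared `long` in OpenCL C is
 * always 64 bits, whereas a host `long` is 32 bits on Windows (LLP64) and 64
 * bits on most Unix ABIs (LP64). Always pass the cl_ type; `kernel` is a
 * placeholder.
 *
 *     cl_long n = 42;          // matches OpenCL C `long` on every host ABI
 *     cl_float scale = 2.0f;   // matches OpenCL C `float`
 *     clSetKernelArg(kernel, 0, sizeof(cl_long), &n);
 *     clSetKernelArg(kernel, 1, sizeof(cl_float), &scale);
 */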
#include <stddef.h> /* Mirror types to GL types. Mirror types allow us to avoid deciding which headers to load based on whether we are using GL or GLES here. */ typedef unsigned int cl_GLuint; typedef int cl_GLint; typedef unsigned int cl_GLenum; /* * Vector types * * Note: OpenCL requires that all types be naturally aligned. * This means that vector types must be naturally aligned. * For example, a vector of four floats must be aligned to * a 16 byte boundary (calculated as 4 * the natural 4-byte * alignment of the float). The alignment qualifiers here * will only function properly if your compiler supports them * and if you don't actively work to defeat them. For example, * in order for a cl_float4 to be 16 byte aligned in a struct, * the start of the struct must itself be 16-byte aligned. * * Maintaining proper alignment is the user's responsibility. */
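/*
 * Illustrative sketch -- not part of the Khronos header. What the alignment
 * note above means in practice: a cl_float4 member is only 16-byte aligned
 * if the object containing it starts on a 16-byte boundary. The compiler
 * guarantees that for globals and locals, but heap allocations may need an
 * aligned allocator.
 *
 *     typedef struct { cl_float4 pos; cl_float radius; } sphere_t;
 *     static sphere_t s;   // OK: static storage is aligned by the compiler
 *     // Risky on some platforms: plain malloc() may return storage with
 *     // less than 16-byte alignment; posix_memalign() or _aligned_malloc()
 *     // are safer for arrays of sphere_t.
 */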
/* Define basic vector types */ #if defined( __VEC__ ) #include <altivec.h> /* may be omitted depending on compiler. AltiVec spec provides no way to detect whether the header is required. */ typedef vector unsigned char __cl_uchar16; typedef vector signed char __cl_char16; typedef vector unsigned short __cl_ushort8; typedef vector signed short __cl_short8; typedef vector unsigned int __cl_uint4; typedef vector signed int __cl_int4; typedef vector float __cl_float4; #define __CL_UCHAR16__ 1 #define __CL_CHAR16__ 1 #define __CL_USHORT8__ 1 #define __CL_SHORT8__ 1 #define __CL_UINT4__ 1 #define __CL_INT4__ 1 #define __CL_FLOAT4__ 1 #endif #if defined( __SSE__ ) #if defined( __MINGW64__ ) #include <intrin.h> #else #include <xmmintrin.h> #endif #if defined( __GNUC__ ) typedef float __cl_float4 __attribute__((vector_size(16))); #else typedef __m128 __cl_float4; #endif #define __CL_FLOAT4__ 1 #endif #if defined( __SSE2__ ) #if defined( __MINGW64__ ) #include <intrin.h> #else #include <emmintrin.h> #endif #if defined( __GNUC__ ) typedef cl_uchar __cl_uchar16 __attribute__((vector_size(16))); typedef cl_char __cl_char16 __attribute__((vector_size(16))); typedef cl_ushort __cl_ushort8 __attribute__((vector_size(16))); typedef cl_short __cl_short8 __attribute__((vector_size(16))); typedef cl_uint __cl_uint4 __attribute__((vector_size(16))); typedef cl_int __cl_int4 __attribute__((vector_size(16))); typedef cl_ulong __cl_ulong2 __attribute__((vector_size(16))); typedef cl_long __cl_long2 __attribute__((vector_size(16))); typedef cl_double __cl_double2 __attribute__((vector_size(16))); #else typedef __m128i __cl_uchar16; typedef __m128i __cl_char16; typedef __m128i __cl_ushort8; typedef __m128i __cl_short8; typedef __m128i __cl_uint4; typedef __m128i __cl_int4; typedef __m128i __cl_ulong2; typedef __m128i __cl_long2; typedef __m128d __cl_double2; #endif #define __CL_UCHAR16__ 1 #define __CL_CHAR16__ 1 #define __CL_USHORT8__ 1 #define __CL_SHORT8__ 1 #define __CL_INT4__ 1 #define __CL_UINT4__ 1 #define __CL_ULONG2__ 1 #define __CL_LONG2__ 1 #define __CL_DOUBLE2__ 1 #endif #if defined( __MMX__ ) #include <mmintrin.h> #if defined( __GNUC__ ) typedef cl_uchar __cl_uchar8 __attribute__((vector_size(8))); typedef cl_char __cl_char8 __attribute__((vector_size(8))); typedef cl_ushort __cl_ushort4 __attribute__((vector_size(8))); typedef cl_short __cl_short4 __attribute__((vector_size(8))); typedef cl_uint __cl_uint2 __attribute__((vector_size(8))); typedef cl_int __cl_int2 __attribute__((vector_size(8))); typedef cl_ulong __cl_ulong1 __attribute__((vector_size(8))); typedef cl_long __cl_long1 __attribute__((vector_size(8))); typedef cl_float __cl_float2 __attribute__((vector_size(8))); #else typedef __m64 __cl_uchar8; typedef __m64 __cl_char8; typedef __m64 __cl_ushort4; typedef __m64 __cl_short4; typedef __m64 __cl_uint2; typedef __m64 __cl_int2; typedef __m64 __cl_ulong1; typedef __m64 __cl_long1; typedef __m64 __cl_float2; #endif #define __CL_UCHAR8__ 1 #define __CL_CHAR8__ 1 #define __CL_USHORT4__ 1 #define __CL_SHORT4__ 1 #define __CL_INT2__ 1 #define __CL_UINT2__ 1 #define __CL_ULONG1__ 1 #define __CL_LONG1__ 1 #define __CL_FLOAT2__ 1 #endif #if defined( __AVX__ ) #if defined( __MINGW64__ ) #include <intrin.h> #else #include <immintrin.h> #endif #if defined( __GNUC__ ) typedef cl_float __cl_float8 __attribute__((vector_size(32))); typedef cl_double __cl_double4 __attribute__((vector_size(32))); #else typedef __m256 __cl_float8; typedef __m256d __cl_double4; #endif #define __CL_FLOAT8__ 1 #define __CL_DOUBLE4__ 1 #endif
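/*
 * Illustrative sketch -- not part of the Khronos header. The __cl_float4 and
 * related native types detected above surface as the .v2/.v4/.v8 members of
 * the cl_ vector unions defined later in this file, letting host code use
 * SIMD intrinsics when available and fall back to the scalar .s[] array
 * otherwise (on GCC a cast between __cl_float4 and __m128 may be needed):
 *
 *     cl_float4 a, b, r;
 *     // ... fill a and b ...
 *     #if defined( __CL_FLOAT4__ ) && defined( __SSE__ )
 *         r.v4 = _mm_add_ps(a.v4, b.v4);   // one SSE addition
 *     #else
 *         for (int i = 0; i < 4; ++i) r.s[i] = a.s[i] + b.s[i];
 *     #endif
 */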
/* Define capabilities for anonymous struct members. */ #if !defined(__cplusplus) && defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L #define __CL_HAS_ANON_STRUCT__ 1 #define __CL_ANON_STRUCT__ #elif defined( __GNUC__) && ! defined( __STRICT_ANSI__ ) #define __CL_HAS_ANON_STRUCT__ 1 #define __CL_ANON_STRUCT__ __extension__ #elif defined( _WIN32) && defined(_MSC_VER) #if _MSC_VER >= 1500 /* Microsoft Developer Studio 2008 supports anonymous structs, but * complains by default. */ #define __CL_HAS_ANON_STRUCT__ 1 #define __CL_ANON_STRUCT__ /* Disable warning C4201: nonstandard extension used : nameless * struct/union */ #pragma warning( push ) #pragma warning( disable : 4201 ) #endif #else #define __CL_HAS_ANON_STRUCT__ 0 #define __CL_ANON_STRUCT__ #endif /* Define alignment keys */ #if defined( __GNUC__ ) #define CL_ALIGNED(_x) __attribute__ ((aligned(_x))) #elif defined( _WIN32) && (_MSC_VER) /* Alignment keys neutered on windows because MSVC can't swallow function arguments with alignment requirements */ /* http://msdn.microsoft.com/en-us/library/373ak2y1%28VS.71%29.aspx */ /* #include <crtdefs.h> */ /* #define CL_ALIGNED(_x) _CRT_ALIGN(_x) */ #define CL_ALIGNED(_x) #else #warning Need to implement some method to align data here #define CL_ALIGNED(_x) #endif /* Indicate whether .xyzw, .s0123 and .hi.lo are supported */ #if __CL_HAS_ANON_STRUCT__ /* .xyzw and .s0123...{f|F} are supported */ #define CL_HAS_NAMED_VECTOR_FIELDS 1 /* .hi and .lo are supported */ #define CL_HAS_HI_LO_VECTOR_FIELDS 1 #endif /* Define cl_vector types */ /* ---- cl_charn ---- */ typedef union { cl_char CL_ALIGNED(2) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_char lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2; #endif }cl_char2; typedef union { cl_char CL_ALIGNED(4) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_char2 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[2]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4; #endif }cl_char4; /* cl_char3 is identical in size, alignment and behavior to cl_char4. See section 6.1.5. 
*/ typedef cl_char4 cl_char3; typedef union { cl_char CL_ALIGNED(8) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_char4 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[4]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4[2]; #endif #if defined( __CL_CHAR8__ ) __cl_char8 v8; #endif }cl_char8; typedef union { cl_char CL_ALIGNED(16) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_char8 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[8]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4[4]; #endif #if defined( __CL_CHAR8__ ) __cl_char8 v8[2]; #endif #if defined( __CL_CHAR16__ ) __cl_char16 v16; #endif }cl_char16; /* ---- cl_ucharn ---- */ typedef union { cl_uchar CL_ALIGNED(2) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_uchar lo, hi; }; #endif #if defined( __cl_uchar2__) __cl_uchar2 v2; #endif }cl_uchar2; typedef union { cl_uchar CL_ALIGNED(4) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_uchar2 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[2]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4; #endif }cl_uchar4; /* cl_uchar3 is identical in size, alignment and behavior to cl_uchar4. See section 6.1.5. 
*/ typedef cl_uchar4 cl_uchar3; typedef union { cl_uchar CL_ALIGNED(8) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_uchar4 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[4]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4[2]; #endif #if defined( __CL_UCHAR8__ ) __cl_uchar8 v8; #endif }cl_uchar8; typedef union { cl_uchar CL_ALIGNED(16) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_uchar8 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[8]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4[4]; #endif #if defined( __CL_UCHAR8__ ) __cl_uchar8 v8[2]; #endif #if defined( __CL_UCHAR16__ ) __cl_uchar16 v16; #endif }cl_uchar16; /* ---- cl_shortn ---- */ typedef union { cl_short CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_short lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2; #endif }cl_short2; typedef union { cl_short CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_short2 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[2]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4; #endif }cl_short4; /* cl_short3 is identical in size, alignment and behavior to cl_short4. See section 6.1.5. 
*/ typedef cl_short4 cl_short3; typedef union { cl_short CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_short4 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[4]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4[2]; #endif #if defined( __CL_SHORT8__ ) __cl_short8 v8; #endif }cl_short8; typedef union { cl_short CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_short8 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[8]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4[4]; #endif #if defined( __CL_SHORT8__ ) __cl_short8 v8[2]; #endif #if defined( __CL_SHORT16__ ) __cl_short16 v16; #endif }cl_short16; /* ---- cl_ushortn ---- */ typedef union { cl_ushort CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_ushort lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2; #endif }cl_ushort2; typedef union { cl_ushort CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_ushort2 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[2]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4; #endif }cl_ushort4; /* cl_ushort3 is identical in size, alignment and behavior to cl_ushort4. See section 6.1.5. 
*/ typedef cl_ushort4 cl_ushort3; typedef union { cl_ushort CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_ushort4 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[4]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4[2]; #endif #if defined( __CL_USHORT8__ ) __cl_ushort8 v8; #endif }cl_ushort8; typedef union { cl_ushort CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_ushort8 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[8]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4[4]; #endif #if defined( __CL_USHORT8__ ) __cl_ushort8 v8[2]; #endif #if defined( __CL_USHORT16__ ) __cl_ushort16 v16; #endif }cl_ushort16; /* ---- cl_halfn ---- */ typedef union { cl_half CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_half lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2; #endif }cl_half2; typedef union { cl_half CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_half2 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[2]; #endif #if defined( __CL_HALF4__) __cl_half4 v4; #endif }cl_half4; /* cl_half3 is identical in size, alignment and behavior to cl_half4. See section 6.1.5. */ typedef cl_half4 cl_half3; typedef union { cl_half CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_half4 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[4]; #endif #if defined( __CL_HALF4__) __cl_half4 v4[2]; #endif #if defined( __CL_HALF8__ ) __cl_half8 v8; #endif }cl_half8; typedef union { cl_half CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_half8 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[8]; #endif #if defined( __CL_HALF4__) __cl_half4 v4[4]; #endif #if defined( __CL_HALF8__ ) __cl_half8 v8[2]; #endif #if defined( __CL_HALF16__ ) __cl_half16 v16; #endif }cl_half16; /* ---- cl_intn ---- */ typedef union { cl_int CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_int lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2; #endif }cl_int2; typedef union { cl_int CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_int2 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[2]; #endif #if defined( __CL_INT4__) __cl_int4 v4; #endif }cl_int4; /* cl_int3 is identical in size, alignment and behavior to cl_int4. 
See section 6.1.5. */ typedef cl_int4 cl_int3; typedef union { cl_int CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_int4 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[4]; #endif #if defined( __CL_INT4__) __cl_int4 v4[2]; #endif #if defined( __CL_INT8__ ) __cl_int8 v8; #endif }cl_int8; typedef union { cl_int CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_int8 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[8]; #endif #if defined( __CL_INT4__) __cl_int4 v4[4]; #endif #if defined( __CL_INT8__ ) __cl_int8 v8[2]; #endif #if defined( __CL_INT16__ ) __cl_int16 v16; #endif }cl_int16; /* ---- cl_uintn ---- */ typedef union { cl_uint CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_uint lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2; #endif }cl_uint2; typedef union { cl_uint CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_uint2 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[2]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4; #endif }cl_uint4; /* cl_uint3 is identical in size, alignment and behavior to cl_uint4. See section 6.1.5. */ typedef cl_uint4 cl_uint3; typedef union { cl_uint CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_uint4 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[4]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4[2]; #endif #if defined( __CL_UINT8__ ) __cl_uint8 v8; #endif }cl_uint8; typedef union { cl_uint CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_uint8 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[8]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4[4]; #endif #if defined( __CL_UINT8__ ) __cl_uint8 v8[2]; #endif #if defined( __CL_UINT16__ ) __cl_uint16 v16; #endif }cl_uint16; /* ---- cl_longn ---- */ typedef union { cl_long CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_long lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2; #endif }cl_long2; typedef union { cl_long CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_long2 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[2]; #endif #if defined( __CL_LONG4__) __cl_long4 v4; #endif }cl_long4; /* cl_long3 is identical in size, alignment and behavior to cl_long4. See section 6.1.5. 
*/ typedef cl_long4 cl_long3; typedef union { cl_long CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_long4 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[4]; #endif #if defined( __CL_LONG4__) __cl_long4 v4[2]; #endif #if defined( __CL_LONG8__ ) __cl_long8 v8; #endif }cl_long8; typedef union { cl_long CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_long8 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[8]; #endif #if defined( __CL_LONG4__) __cl_long4 v4[4]; #endif #if defined( __CL_LONG8__ ) __cl_long8 v8[2]; #endif #if defined( __CL_LONG16__ ) __cl_long16 v16; #endif }cl_long16; /* ---- cl_ulongn ---- */ typedef union { cl_ulong CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_ulong lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2; #endif }cl_ulong2; typedef union { cl_ulong CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_ulong2 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[2]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4; #endif }cl_ulong4; /* cl_ulong3 is identical in size, alignment and behavior to cl_ulong4. See section 6.1.5. 
*/ typedef cl_ulong4 cl_ulong3; typedef union { cl_ulong CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_ulong4 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[4]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4[2]; #endif #if defined( __CL_ULONG8__ ) __cl_ulong8 v8; #endif }cl_ulong8; typedef union { cl_ulong CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_ulong8 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[8]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4[4]; #endif #if defined( __CL_ULONG8__ ) __cl_ulong8 v8[2]; #endif #if defined( __CL_ULONG16__ ) __cl_ulong16 v16; #endif }cl_ulong16; /* --- cl_floatn ---- */ typedef union { cl_float CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_float lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2; #endif }cl_float2; typedef union { cl_float CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_float2 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[2]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4; #endif }cl_float4; /* cl_float3 is identical in size, alignment and behavior to cl_float4. See section 6.1.5. 
*/ typedef cl_float4 cl_float3; typedef union { cl_float CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_float4 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[4]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4[2]; #endif #if defined( __CL_FLOAT8__ ) __cl_float8 v8; #endif }cl_float8; typedef union { cl_float CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_float8 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[8]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4[4]; #endif #if defined( __CL_FLOAT8__ ) __cl_float8 v8[2]; #endif #if defined( __CL_FLOAT16__ ) __cl_float16 v16; #endif }cl_float16; /* --- cl_doublen ---- */ typedef union { cl_double CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_double lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2; #endif }cl_double2; typedef union { cl_double CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_double2 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[2]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4; #endif }cl_double4; /* cl_double3 is identical in size, alignment and behavior to cl_double4. See section 6.1.5. */ typedef cl_double4 cl_double3; typedef union { cl_double CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_double4 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[4]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4[2]; #endif #if defined( __CL_DOUBLE8__ ) __cl_double8 v8; #endif }cl_double8; typedef union { cl_double CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_double8 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[8]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4[4]; #endif #if defined( __CL_DOUBLE8__ ) __cl_double8 v8[2]; #endif #if defined( __CL_DOUBLE16__ ) __cl_double16 v16; #endif }cl_double16; /* Macro to facilitate debugging * Usage: * Place CL_PROGRAM_STRING_DEBUG_INFO on the line before the first line of your source. * The first line ends with: CL_PROGRAM_STRING_DEBUG_INFO \" * Each line thereafter of OpenCL C source must end with: \n\ * The last line ends in "; * * Example: * * const char *my_program = CL_PROGRAM_STRING_DEBUG_INFO "\ * kernel void foo( int a, float * b ) \n\ * { \n\ * // my comment \n\ * *b[ get_global_id(0)] = a; \n\ * } \n\ * "; * * This should correctly set up the line, (column) and file information for your source * string so you can do source level debugging. 
*/ #define __CL_STRINGIFY( _x ) # _x #define _CL_STRINGIFY( _x ) __CL_STRINGIFY( _x ) #define CL_PROGRAM_STRING_DEBUG_INFO "#line " _CL_STRINGIFY(__LINE__) " \"" __FILE__ "\" \n\n" #ifdef __cplusplus } #endif #undef __CL_HAS_ANON_STRUCT__ #undef __CL_ANON_STRUCT__ #if defined( _WIN32) && defined(_MSC_VER) #if _MSC_VER >=1500 #pragma warning( pop ) #endif #endif #endif /* __CL_PLATFORM_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl1.2/CL/opencl.h000066400000000000000000000037111450307266000232550ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_H #define __OPENCL_H #ifdef __cplusplus extern "C" { #endif #ifdef __APPLE__ #include <OpenCL/cl.h> #include <OpenCL/cl_gl.h> #include <OpenCL/cl_gl_ext.h> #include <OpenCL/cl_ext.h> #else #include <CL/cl.h> #include <CL/cl_gl.h> #include <CL/cl_gl_ext.h> #include <CL/cl_ext.h> #endif #ifdef __cplusplus } #endif #endif /* __OPENCL_H */
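/*
 * Illustrative usage sketch -- not part of the Khronos header. opencl.h is
 * the umbrella header: including it pulls in the core API plus the GL interop
 * and extension headers above, which is all a minimal host program needs to
 * start enumerating the platform:
 *
 *     #include <CL/opencl.h>
 *
 *     cl_uint n = 0;
 *     clGetPlatformIDs(0, NULL, &n);                 // count platforms
 *     cl_platform_id plat;
 *     if (n > 0) clGetPlatformIDs(1, &plat, NULL);   // take the first one
 *     char name[256];
 *     clGetPlatformInfo(plat, CL_PLATFORM_NAME, sizeof(name), name, NULL);
 */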
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/000077500000000000000000000000001450307266000213235ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/000077500000000000000000000000001450307266000216215ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/cl.h000066400000000000000000002130701450307266000223730ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ #ifndef __OPENCL_CL_H #define __OPENCL_CL_H #ifdef __APPLE__ #include <OpenCL/cl_platform.h> #else #include <CL/cl_platform.h> #endif #ifdef __cplusplus extern "C" { #endif /******************************************************************************/ typedef struct _cl_platform_id * cl_platform_id; typedef struct _cl_device_id * cl_device_id; typedef struct _cl_context * cl_context; typedef struct _cl_command_queue * cl_command_queue; typedef struct _cl_mem * cl_mem; typedef struct _cl_program * cl_program; typedef struct _cl_kernel * cl_kernel; typedef struct _cl_event * cl_event; typedef struct _cl_sampler * cl_sampler; typedef cl_uint cl_bool; /* WARNING! Unlike cl_ types in cl_platform.h, cl_bool is not guaranteed to be the same size as the bool in kernels. */ typedef cl_ulong cl_bitfield; typedef cl_bitfield cl_device_type; typedef cl_uint cl_platform_info; typedef cl_uint cl_device_info; typedef cl_bitfield cl_device_fp_config; typedef cl_uint cl_device_mem_cache_type; typedef cl_uint cl_device_local_mem_type; typedef cl_bitfield cl_device_exec_capabilities; typedef cl_bitfield cl_device_svm_capabilities; typedef cl_bitfield cl_command_queue_properties; typedef intptr_t cl_device_partition_property; typedef cl_bitfield cl_device_affinity_domain; typedef intptr_t cl_context_properties; typedef cl_uint cl_context_info; typedef cl_bitfield cl_queue_properties; typedef cl_uint cl_command_queue_info; typedef cl_uint cl_channel_order; typedef cl_uint cl_channel_type; typedef cl_bitfield cl_mem_flags; typedef cl_bitfield cl_svm_mem_flags; typedef cl_uint cl_mem_object_type; typedef cl_uint cl_mem_info; typedef cl_bitfield cl_mem_migration_flags; typedef cl_uint cl_image_info; typedef cl_uint cl_buffer_create_type; typedef cl_uint cl_addressing_mode; typedef cl_uint cl_filter_mode; typedef cl_uint cl_sampler_info; typedef cl_bitfield cl_map_flags; typedef intptr_t cl_pipe_properties; typedef cl_uint cl_pipe_info; typedef cl_uint cl_program_info; typedef cl_uint cl_program_build_info; typedef cl_uint cl_program_binary_type; typedef cl_int cl_build_status; typedef cl_uint cl_kernel_info; typedef cl_uint cl_kernel_arg_info; typedef cl_uint cl_kernel_arg_address_qualifier; typedef cl_uint cl_kernel_arg_access_qualifier; typedef cl_bitfield cl_kernel_arg_type_qualifier; typedef cl_uint cl_kernel_work_group_info; typedef cl_uint cl_event_info; typedef cl_uint cl_command_type; typedef cl_uint cl_profiling_info; typedef cl_bitfield cl_sampler_properties; typedef cl_uint cl_kernel_exec_info; typedef struct _cl_image_format { cl_channel_order image_channel_order; cl_channel_type image_channel_data_type; } cl_image_format; typedef struct _cl_image_desc { cl_mem_object_type image_type; size_t image_width; size_t image_height; size_t image_depth; size_t 
image_array_size; size_t image_row_pitch; size_t image_slice_pitch; cl_uint num_mip_levels; cl_uint num_samples; #ifdef __GNUC__ __extension__ /* Prevents warnings about anonymous union in -pedantic builds */ #endif union { cl_mem buffer; cl_mem mem_object; }; } cl_image_desc; typedef struct _cl_buffer_region { size_t origin; size_t size; } cl_buffer_region; /******************************************************************************/ /* Error Codes */ #define CL_SUCCESS 0 #define CL_DEVICE_NOT_FOUND -1 #define CL_DEVICE_NOT_AVAILABLE -2 #define CL_COMPILER_NOT_AVAILABLE -3 #define CL_MEM_OBJECT_ALLOCATION_FAILURE -4 #define CL_OUT_OF_RESOURCES -5 #define CL_OUT_OF_HOST_MEMORY -6 #define CL_PROFILING_INFO_NOT_AVAILABLE -7 #define CL_MEM_COPY_OVERLAP -8 #define CL_IMAGE_FORMAT_MISMATCH -9 #define CL_IMAGE_FORMAT_NOT_SUPPORTED -10 #define CL_BUILD_PROGRAM_FAILURE -11 #define CL_MAP_FAILURE -12 #define CL_MISALIGNED_SUB_BUFFER_OFFSET -13 #define CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST -14 #define CL_COMPILE_PROGRAM_FAILURE -15 #define CL_LINKER_NOT_AVAILABLE -16 #define CL_LINK_PROGRAM_FAILURE -17 #define CL_DEVICE_PARTITION_FAILED -18 #define CL_KERNEL_ARG_INFO_NOT_AVAILABLE -19 #define CL_INVALID_VALUE -30 #define CL_INVALID_DEVICE_TYPE -31 #define CL_INVALID_PLATFORM -32 #define CL_INVALID_DEVICE -33 #define CL_INVALID_CONTEXT -34 #define CL_INVALID_QUEUE_PROPERTIES -35 #define CL_INVALID_COMMAND_QUEUE -36 #define CL_INVALID_HOST_PTR -37 #define CL_INVALID_MEM_OBJECT -38 #define CL_INVALID_IMAGE_FORMAT_DESCRIPTOR -39 #define CL_INVALID_IMAGE_SIZE -40 #define CL_INVALID_SAMPLER -41 #define CL_INVALID_BINARY -42 #define CL_INVALID_BUILD_OPTIONS -43 #define CL_INVALID_PROGRAM -44 #define CL_INVALID_PROGRAM_EXECUTABLE -45 #define CL_INVALID_KERNEL_NAME -46 #define CL_INVALID_KERNEL_DEFINITION -47 #define CL_INVALID_KERNEL -48 #define CL_INVALID_ARG_INDEX -49 #define CL_INVALID_ARG_VALUE -50 #define CL_INVALID_ARG_SIZE -51 #define CL_INVALID_KERNEL_ARGS -52 #define CL_INVALID_WORK_DIMENSION -53 #define CL_INVALID_WORK_GROUP_SIZE -54 #define CL_INVALID_WORK_ITEM_SIZE -55 #define CL_INVALID_GLOBAL_OFFSET -56 #define CL_INVALID_EVENT_WAIT_LIST -57 #define CL_INVALID_EVENT -58 #define CL_INVALID_OPERATION -59 #define CL_INVALID_GL_OBJECT -60 #define CL_INVALID_BUFFER_SIZE -61 #define CL_INVALID_MIP_LEVEL -62 #define CL_INVALID_GLOBAL_WORK_SIZE -63 #define CL_INVALID_PROPERTY -64 #define CL_INVALID_IMAGE_DESCRIPTOR -65 #define CL_INVALID_COMPILER_OPTIONS -66 #define CL_INVALID_LINKER_OPTIONS -67 #define CL_INVALID_DEVICE_PARTITION_COUNT -68 #define CL_INVALID_PIPE_SIZE -69 #define CL_INVALID_DEVICE_QUEUE -70 /* OpenCL Version */ #define CL_VERSION_1_0 1 #define CL_VERSION_1_1 1 #define CL_VERSION_1_2 1 #define CL_VERSION_2_0 1 /* cl_bool */ #define CL_FALSE 0 #define CL_TRUE 1 #define CL_BLOCKING CL_TRUE #define CL_NON_BLOCKING CL_FALSE /* cl_platform_info */ #define CL_PLATFORM_PROFILE 0x0900 #define CL_PLATFORM_VERSION 0x0901 #define CL_PLATFORM_NAME 0x0902 #define CL_PLATFORM_VENDOR 0x0903 #define CL_PLATFORM_EXTENSIONS 0x0904 /* cl_device_type - bitfield */ #define CL_DEVICE_TYPE_DEFAULT (1 << 0) #define CL_DEVICE_TYPE_CPU (1 << 1) #define CL_DEVICE_TYPE_GPU (1 << 2) #define CL_DEVICE_TYPE_ACCELERATOR (1 << 3) #define CL_DEVICE_TYPE_CUSTOM (1 << 4) #define CL_DEVICE_TYPE_ALL 0xFFFFFFFF /* cl_device_info */ #define CL_DEVICE_TYPE 0x1000 #define CL_DEVICE_VENDOR_ID 0x1001 #define CL_DEVICE_MAX_COMPUTE_UNITS 0x1002 #define CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS 0x1003 #define 
CL_DEVICE_MAX_WORK_GROUP_SIZE 0x1004 #define CL_DEVICE_MAX_WORK_ITEM_SIZES 0x1005 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR 0x1006 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT 0x1007 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT 0x1008 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG 0x1009 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT 0x100A #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 0x100B #define CL_DEVICE_MAX_CLOCK_FREQUENCY 0x100C #define CL_DEVICE_ADDRESS_BITS 0x100D #define CL_DEVICE_MAX_READ_IMAGE_ARGS 0x100E #define CL_DEVICE_MAX_WRITE_IMAGE_ARGS 0x100F #define CL_DEVICE_MAX_MEM_ALLOC_SIZE 0x1010 #define CL_DEVICE_IMAGE2D_MAX_WIDTH 0x1011 #define CL_DEVICE_IMAGE2D_MAX_HEIGHT 0x1012 #define CL_DEVICE_IMAGE3D_MAX_WIDTH 0x1013 #define CL_DEVICE_IMAGE3D_MAX_HEIGHT 0x1014 #define CL_DEVICE_IMAGE3D_MAX_DEPTH 0x1015 #define CL_DEVICE_IMAGE_SUPPORT 0x1016 #define CL_DEVICE_MAX_PARAMETER_SIZE 0x1017 #define CL_DEVICE_MAX_SAMPLERS 0x1018 #define CL_DEVICE_MEM_BASE_ADDR_ALIGN 0x1019 #define CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE 0x101A #define CL_DEVICE_SINGLE_FP_CONFIG 0x101B #define CL_DEVICE_GLOBAL_MEM_CACHE_TYPE 0x101C #define CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE 0x101D #define CL_DEVICE_GLOBAL_MEM_CACHE_SIZE 0x101E #define CL_DEVICE_GLOBAL_MEM_SIZE 0x101F #define CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE 0x1020 #define CL_DEVICE_MAX_CONSTANT_ARGS 0x1021 #define CL_DEVICE_LOCAL_MEM_TYPE 0x1022 #define CL_DEVICE_LOCAL_MEM_SIZE 0x1023 #define CL_DEVICE_ERROR_CORRECTION_SUPPORT 0x1024 #define CL_DEVICE_PROFILING_TIMER_RESOLUTION 0x1025 #define CL_DEVICE_ENDIAN_LITTLE 0x1026 #define CL_DEVICE_AVAILABLE 0x1027 #define CL_DEVICE_COMPILER_AVAILABLE 0x1028 #define CL_DEVICE_EXECUTION_CAPABILITIES 0x1029 #define CL_DEVICE_QUEUE_PROPERTIES 0x102A /* deprecated */ #define CL_DEVICE_QUEUE_ON_HOST_PROPERTIES 0x102A #define CL_DEVICE_NAME 0x102B #define CL_DEVICE_VENDOR 0x102C #define CL_DRIVER_VERSION 0x102D #define CL_DEVICE_PROFILE 0x102E #define CL_DEVICE_VERSION 0x102F #define CL_DEVICE_EXTENSIONS 0x1030 #define CL_DEVICE_PLATFORM 0x1031 #define CL_DEVICE_DOUBLE_FP_CONFIG 0x1032 #define CL_DEVICE_HALF_FP_CONFIG 0x1033 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF 0x1034 #define CL_DEVICE_HOST_UNIFIED_MEMORY 0x1035 /* deprecated */ #define CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR 0x1036 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT 0x1037 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_INT 0x1038 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG 0x1039 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT 0x103A #define CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE 0x103B #define CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF 0x103C #define CL_DEVICE_OPENCL_C_VERSION 0x103D #define CL_DEVICE_LINKER_AVAILABLE 0x103E #define CL_DEVICE_BUILT_IN_KERNELS 0x103F #define CL_DEVICE_IMAGE_MAX_BUFFER_SIZE 0x1040 #define CL_DEVICE_IMAGE_MAX_ARRAY_SIZE 0x1041 #define CL_DEVICE_PARENT_DEVICE 0x1042 #define CL_DEVICE_PARTITION_MAX_SUB_DEVICES 0x1043 #define CL_DEVICE_PARTITION_PROPERTIES 0x1044 #define CL_DEVICE_PARTITION_AFFINITY_DOMAIN 0x1045 #define CL_DEVICE_PARTITION_TYPE 0x1046 #define CL_DEVICE_REFERENCE_COUNT 0x1047 #define CL_DEVICE_PREFERRED_INTEROP_USER_SYNC 0x1048 #define CL_DEVICE_PRINTF_BUFFER_SIZE 0x1049 #define CL_DEVICE_IMAGE_PITCH_ALIGNMENT 0x104A #define CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT 0x104B #define CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS 0x104C #define CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE 0x104D #define CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES 0x104E #define CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE 0x104F #define CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE 
0x1050 #define CL_DEVICE_MAX_ON_DEVICE_QUEUES 0x1051 #define CL_DEVICE_MAX_ON_DEVICE_EVENTS 0x1052 #define CL_DEVICE_SVM_CAPABILITIES 0x1053 #define CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE 0x1054 #define CL_DEVICE_MAX_PIPE_ARGS 0x1055 #define CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS 0x1056 #define CL_DEVICE_PIPE_MAX_PACKET_SIZE 0x1057 #define CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT 0x1058 #define CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT 0x1059 #define CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT 0x105A /* cl_device_fp_config - bitfield */ #define CL_FP_DENORM (1 << 0) #define CL_FP_INF_NAN (1 << 1) #define CL_FP_ROUND_TO_NEAREST (1 << 2) #define CL_FP_ROUND_TO_ZERO (1 << 3) #define CL_FP_ROUND_TO_INF (1 << 4) #define CL_FP_FMA (1 << 5) #define CL_FP_SOFT_FLOAT (1 << 6) #define CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT (1 << 7) /* cl_device_mem_cache_type */ #define CL_NONE 0x0 #define CL_READ_ONLY_CACHE 0x1 #define CL_READ_WRITE_CACHE 0x2 /* cl_device_local_mem_type */ #define CL_LOCAL 0x1 #define CL_GLOBAL 0x2 /* cl_device_exec_capabilities - bitfield */ #define CL_EXEC_KERNEL (1 << 0) #define CL_EXEC_NATIVE_KERNEL (1 << 1) /* cl_command_queue_properties - bitfield */ #define CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE (1 << 0) #define CL_QUEUE_PROFILING_ENABLE (1 << 1) #define CL_QUEUE_ON_DEVICE (1 << 2) #define CL_QUEUE_ON_DEVICE_DEFAULT (1 << 3) /* cl_context_info */ #define CL_CONTEXT_REFERENCE_COUNT 0x1080 #define CL_CONTEXT_DEVICES 0x1081 #define CL_CONTEXT_PROPERTIES 0x1082 #define CL_CONTEXT_NUM_DEVICES 0x1083 /* cl_context_properties */ #define CL_CONTEXT_PLATFORM 0x1084 #define CL_CONTEXT_INTEROP_USER_SYNC 0x1085 /* cl_device_partition_property */ #define CL_DEVICE_PARTITION_EQUALLY 0x1086 #define CL_DEVICE_PARTITION_BY_COUNTS 0x1087 #define CL_DEVICE_PARTITION_BY_COUNTS_LIST_END 0x0 #define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN 0x1088 /* cl_device_affinity_domain */ #define CL_DEVICE_AFFINITY_DOMAIN_NUMA (1 << 0) #define CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE (1 << 1) #define CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE (1 << 2) #define CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE (1 << 3) #define CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE (1 << 4) #define CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE (1 << 5) /* cl_device_svm_capabilities */ #define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER (1 << 0) #define CL_DEVICE_SVM_FINE_GRAIN_BUFFER (1 << 1) #define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM (1 << 2) #define CL_DEVICE_SVM_ATOMICS (1 << 3) /* cl_command_queue_info */ #define CL_QUEUE_CONTEXT 0x1090 #define CL_QUEUE_DEVICE 0x1091 #define CL_QUEUE_REFERENCE_COUNT 0x1092 #define CL_QUEUE_PROPERTIES 0x1093 #define CL_QUEUE_SIZE 0x1094 /* cl_mem_flags and cl_svm_mem_flags - bitfield */ #define CL_MEM_READ_WRITE (1 << 0) #define CL_MEM_WRITE_ONLY (1 << 1) #define CL_MEM_READ_ONLY (1 << 2) #define CL_MEM_USE_HOST_PTR (1 << 3) #define CL_MEM_ALLOC_HOST_PTR (1 << 4) #define CL_MEM_COPY_HOST_PTR (1 << 5) /* reserved (1 << 6) */ #define CL_MEM_HOST_WRITE_ONLY (1 << 7) #define CL_MEM_HOST_READ_ONLY (1 << 8) #define CL_MEM_HOST_NO_ACCESS (1 << 9) #define CL_MEM_SVM_FINE_GRAIN_BUFFER (1 << 10) /* used by cl_svm_mem_flags only */ #define CL_MEM_SVM_ATOMICS (1 << 11) /* used by cl_svm_mem_flags only */ #define CL_MEM_KERNEL_READ_AND_WRITE (1 << 12) /* cl_mem_migration_flags - bitfield */ #define CL_MIGRATE_MEM_OBJECT_HOST (1 << 0) #define CL_MIGRATE_MEM_OBJECT_CONTENT_UNDEFINED (1 << 1) /* cl_channel_order */ #define CL_R 0x10B0 #define CL_A 0x10B1 #define CL_RG 0x10B2 #define CL_RA 0x10B3 #define CL_RGB 0x10B4 #define CL_RGBA 0x10B5 
#define CL_BGRA 0x10B6 #define CL_ARGB 0x10B7 #define CL_INTENSITY 0x10B8 #define CL_LUMINANCE 0x10B9 #define CL_Rx 0x10BA #define CL_RGx 0x10BB #define CL_RGBx 0x10BC #define CL_DEPTH 0x10BD #define CL_DEPTH_STENCIL 0x10BE #define CL_sRGB 0x10BF #define CL_sRGBx 0x10C0 #define CL_sRGBA 0x10C1 #define CL_sBGRA 0x10C2 #define CL_ABGR 0x10C3 /* cl_channel_type */ #define CL_SNORM_INT8 0x10D0 #define CL_SNORM_INT16 0x10D1 #define CL_UNORM_INT8 0x10D2 #define CL_UNORM_INT16 0x10D3 #define CL_UNORM_SHORT_565 0x10D4 #define CL_UNORM_SHORT_555 0x10D5 #define CL_UNORM_INT_101010 0x10D6 #define CL_SIGNED_INT8 0x10D7 #define CL_SIGNED_INT16 0x10D8 #define CL_SIGNED_INT32 0x10D9 #define CL_UNSIGNED_INT8 0x10DA #define CL_UNSIGNED_INT16 0x10DB #define CL_UNSIGNED_INT32 0x10DC #define CL_HALF_FLOAT 0x10DD #define CL_FLOAT 0x10DE #define CL_UNORM_INT24 0x10DF /* cl_mem_object_type */ #define CL_MEM_OBJECT_BUFFER 0x10F0 #define CL_MEM_OBJECT_IMAGE2D 0x10F1 #define CL_MEM_OBJECT_IMAGE3D 0x10F2 #define CL_MEM_OBJECT_IMAGE2D_ARRAY 0x10F3 #define CL_MEM_OBJECT_IMAGE1D 0x10F4 #define CL_MEM_OBJECT_IMAGE1D_ARRAY 0x10F5 #define CL_MEM_OBJECT_IMAGE1D_BUFFER 0x10F6 #define CL_MEM_OBJECT_PIPE 0x10F7 /* cl_mem_info */ #define CL_MEM_TYPE 0x1100 #define CL_MEM_FLAGS 0x1101 #define CL_MEM_SIZE 0x1102 #define CL_MEM_HOST_PTR 0x1103 #define CL_MEM_MAP_COUNT 0x1104 #define CL_MEM_REFERENCE_COUNT 0x1105 #define CL_MEM_CONTEXT 0x1106 #define CL_MEM_ASSOCIATED_MEMOBJECT 0x1107 #define CL_MEM_OFFSET 0x1108 #define CL_MEM_USES_SVM_POINTER 0x1109 /* cl_image_info */ #define CL_IMAGE_FORMAT 0x1110 #define CL_IMAGE_ELEMENT_SIZE 0x1111 #define CL_IMAGE_ROW_PITCH 0x1112 #define CL_IMAGE_SLICE_PITCH 0x1113 #define CL_IMAGE_WIDTH 0x1114 #define CL_IMAGE_HEIGHT 0x1115 #define CL_IMAGE_DEPTH 0x1116 #define CL_IMAGE_ARRAY_SIZE 0x1117 #define CL_IMAGE_BUFFER 0x1118 #define CL_IMAGE_NUM_MIP_LEVELS 0x1119 #define CL_IMAGE_NUM_SAMPLES 0x111A /* cl_pipe_info */ #define CL_PIPE_PACKET_SIZE 0x1120 #define CL_PIPE_MAX_PACKETS 0x1121 /* cl_addressing_mode */ #define CL_ADDRESS_NONE 0x1130 #define CL_ADDRESS_CLAMP_TO_EDGE 0x1131 #define CL_ADDRESS_CLAMP 0x1132 #define CL_ADDRESS_REPEAT 0x1133 #define CL_ADDRESS_MIRRORED_REPEAT 0x1134 /* cl_filter_mode */ #define CL_FILTER_NEAREST 0x1140 #define CL_FILTER_LINEAR 0x1141 /* cl_sampler_info */ #define CL_SAMPLER_REFERENCE_COUNT 0x1150 #define CL_SAMPLER_CONTEXT 0x1151 #define CL_SAMPLER_NORMALIZED_COORDS 0x1152 #define CL_SAMPLER_ADDRESSING_MODE 0x1153 #define CL_SAMPLER_FILTER_MODE 0x1154 #define CL_SAMPLER_MIP_FILTER_MODE 0x1155 #define CL_SAMPLER_LOD_MIN 0x1156 #define CL_SAMPLER_LOD_MAX 0x1157 /* cl_map_flags - bitfield */ #define CL_MAP_READ (1 << 0) #define CL_MAP_WRITE (1 << 1) #define CL_MAP_WRITE_INVALIDATE_REGION (1 << 2) /* cl_program_info */ #define CL_PROGRAM_REFERENCE_COUNT 0x1160 #define CL_PROGRAM_CONTEXT 0x1161 #define CL_PROGRAM_NUM_DEVICES 0x1162 #define CL_PROGRAM_DEVICES 0x1163 #define CL_PROGRAM_SOURCE 0x1164 #define CL_PROGRAM_BINARY_SIZES 0x1165 #define CL_PROGRAM_BINARIES 0x1166 #define CL_PROGRAM_NUM_KERNELS 0x1167 #define CL_PROGRAM_KERNEL_NAMES 0x1168 /* cl_program_build_info */ #define CL_PROGRAM_BUILD_STATUS 0x1181 #define CL_PROGRAM_BUILD_OPTIONS 0x1182 #define CL_PROGRAM_BUILD_LOG 0x1183 #define CL_PROGRAM_BINARY_TYPE 0x1184 #define CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE 0x1185 /* cl_program_binary_type */ #define CL_PROGRAM_BINARY_TYPE_NONE 0x0 #define CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT 0x1 #define CL_PROGRAM_BINARY_TYPE_LIBRARY 0x2 #define 
CL_PROGRAM_BINARY_TYPE_EXECUTABLE 0x4 /* cl_build_status */ #define CL_BUILD_SUCCESS 0 #define CL_BUILD_NONE -1 #define CL_BUILD_ERROR -2 #define CL_BUILD_IN_PROGRESS -3 /* cl_kernel_info */ #define CL_KERNEL_FUNCTION_NAME 0x1190 #define CL_KERNEL_NUM_ARGS 0x1191 #define CL_KERNEL_REFERENCE_COUNT 0x1192 #define CL_KERNEL_CONTEXT 0x1193 #define CL_KERNEL_PROGRAM 0x1194 #define CL_KERNEL_ATTRIBUTES 0x1195 /* cl_kernel_arg_info */ #define CL_KERNEL_ARG_ADDRESS_QUALIFIER 0x1196 #define CL_KERNEL_ARG_ACCESS_QUALIFIER 0x1197 #define CL_KERNEL_ARG_TYPE_NAME 0x1198 #define CL_KERNEL_ARG_TYPE_QUALIFIER 0x1199 #define CL_KERNEL_ARG_NAME 0x119A /* cl_kernel_arg_address_qualifier */ #define CL_KERNEL_ARG_ADDRESS_GLOBAL 0x119B #define CL_KERNEL_ARG_ADDRESS_LOCAL 0x119C #define CL_KERNEL_ARG_ADDRESS_CONSTANT 0x119D #define CL_KERNEL_ARG_ADDRESS_PRIVATE 0x119E /* cl_kernel_arg_access_qualifier */ #define CL_KERNEL_ARG_ACCESS_READ_ONLY 0x11A0 #define CL_KERNEL_ARG_ACCESS_WRITE_ONLY 0x11A1 #define CL_KERNEL_ARG_ACCESS_READ_WRITE 0x11A2 #define CL_KERNEL_ARG_ACCESS_NONE 0x11A3 /* cl_kernel_arg_type_qualifier */ #define CL_KERNEL_ARG_TYPE_NONE 0 #define CL_KERNEL_ARG_TYPE_CONST (1 << 0) #define CL_KERNEL_ARG_TYPE_RESTRICT (1 << 1) #define CL_KERNEL_ARG_TYPE_VOLATILE (1 << 2) #define CL_KERNEL_ARG_TYPE_PIPE (1 << 3) /* cl_kernel_work_group_info */ #define CL_KERNEL_WORK_GROUP_SIZE 0x11B0 #define CL_KERNEL_COMPILE_WORK_GROUP_SIZE 0x11B1 #define CL_KERNEL_LOCAL_MEM_SIZE 0x11B2 #define CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE 0x11B3 #define CL_KERNEL_PRIVATE_MEM_SIZE 0x11B4 #define CL_KERNEL_GLOBAL_WORK_SIZE 0x11B5 /* cl_kernel_exec_info */ #define CL_KERNEL_EXEC_INFO_SVM_PTRS 0x11B6 #define CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM 0x11B7 /* cl_event_info */ #define CL_EVENT_COMMAND_QUEUE 0x11D0 #define CL_EVENT_COMMAND_TYPE 0x11D1 #define CL_EVENT_REFERENCE_COUNT 0x11D2 #define CL_EVENT_COMMAND_EXECUTION_STATUS 0x11D3 #define CL_EVENT_CONTEXT 0x11D4 /* cl_command_type */ #define CL_COMMAND_NDRANGE_KERNEL 0x11F0 #define CL_COMMAND_TASK 0x11F1 #define CL_COMMAND_NATIVE_KERNEL 0x11F2 #define CL_COMMAND_READ_BUFFER 0x11F3 #define CL_COMMAND_WRITE_BUFFER 0x11F4 #define CL_COMMAND_COPY_BUFFER 0x11F5 #define CL_COMMAND_READ_IMAGE 0x11F6 #define CL_COMMAND_WRITE_IMAGE 0x11F7 #define CL_COMMAND_COPY_IMAGE 0x11F8 #define CL_COMMAND_COPY_IMAGE_TO_BUFFER 0x11F9 #define CL_COMMAND_COPY_BUFFER_TO_IMAGE 0x11FA #define CL_COMMAND_MAP_BUFFER 0x11FB #define CL_COMMAND_MAP_IMAGE 0x11FC #define CL_COMMAND_UNMAP_MEM_OBJECT 0x11FD #define CL_COMMAND_MARKER 0x11FE #define CL_COMMAND_ACQUIRE_GL_OBJECTS 0x11FF #define CL_COMMAND_RELEASE_GL_OBJECTS 0x1200 #define CL_COMMAND_READ_BUFFER_RECT 0x1201 #define CL_COMMAND_WRITE_BUFFER_RECT 0x1202 #define CL_COMMAND_COPY_BUFFER_RECT 0x1203 #define CL_COMMAND_USER 0x1204 #define CL_COMMAND_BARRIER 0x1205 #define CL_COMMAND_MIGRATE_MEM_OBJECTS 0x1206 #define CL_COMMAND_FILL_BUFFER 0x1207 #define CL_COMMAND_FILL_IMAGE 0x1208 #define CL_COMMAND_SVM_FREE 0x1209 #define CL_COMMAND_SVM_MEMCPY 0x120A #define CL_COMMAND_SVM_MEMFILL 0x120B #define CL_COMMAND_SVM_MAP 0x120C #define CL_COMMAND_SVM_UNMAP 0x120D /* command execution status */ #define CL_COMPLETE 0x0 #define CL_RUNNING 0x1 #define CL_SUBMITTED 0x2 #define CL_QUEUED 0x3 /* cl_buffer_create_type */ #define CL_BUFFER_CREATE_TYPE_REGION 0x1220 /* cl_profiling_info */ #define CL_PROFILING_COMMAND_QUEUED 0x1280 #define CL_PROFILING_COMMAND_SUBMIT 0x1281 #define CL_PROFILING_COMMAND_START 0x1282 #define CL_PROFILING_COMMAND_END 0x1283 #define 
CL_PROFILING_COMMAND_COMPLETE 0x1284 /********************************************************************************************************/ /* Platform API */ extern CL_API_ENTRY cl_int CL_API_CALL clGetPlatformIDs(cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetPlatformInfo(cl_platform_id /* platform */, cl_platform_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Device APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDs(cl_platform_id /* platform */, cl_device_type /* device_type */, cl_uint /* num_entries */, cl_device_id * /* devices */, cl_uint * /* num_devices */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceInfo(cl_device_id /* device */, cl_device_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clCreateSubDevices(cl_device_id /* in_device */, const cl_device_partition_property * /* properties */, cl_uint /* num_devices */, cl_device_id * /* out_devices */, cl_uint * /* num_devices_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clRetainDevice(cl_device_id /* device */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseDevice(cl_device_id /* device */) CL_API_SUFFIX__VERSION_1_2; /* Context APIs */ extern CL_API_ENTRY cl_context CL_API_CALL clCreateContext(const cl_context_properties * /* properties */, cl_uint /* num_devices */, const cl_device_id * /* devices */, void (CL_CALLBACK * /* pfn_notify */)(const char *, const void *, size_t, void *), void * /* user_data */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_context CL_API_CALL clCreateContextFromType(const cl_context_properties * /* properties */, cl_device_type /* device_type */, void (CL_CALLBACK * /* pfn_notify*/ )(const char *, const void *, size_t, void *), void * /* user_data */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainContext(cl_context /* context */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseContext(cl_context /* context */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetContextInfo(cl_context /* context */, cl_context_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Command Queue APIs */ extern CL_API_ENTRY cl_command_queue CL_API_CALL clCreateCommandQueueWithProperties(cl_context /* context */, cl_device_id /* device */, const cl_queue_properties * /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainCommandQueue(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseCommandQueue(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetCommandQueueInfo(cl_command_queue /* command_queue */, cl_command_queue_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Memory Object APIs */ extern CL_API_ENTRY cl_mem CL_API_CALL clCreateBuffer(cl_context /* 
context */, cl_mem_flags /* flags */, size_t /* size */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateSubBuffer(cl_mem /* buffer */, cl_mem_flags /* flags */, cl_buffer_create_type /* buffer_create_type */, const void * /* buffer_create_info */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateImage(cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format * /* image_format */, const cl_image_desc * /* image_desc */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_mem CL_API_CALL clCreatePipe(cl_context /* context */, cl_mem_flags /* flags */, cl_uint /* pipe_packet_size */, cl_uint /* pipe_max_packets */, const cl_pipe_properties * /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainMemObject(cl_mem /* memobj */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseMemObject(cl_mem /* memobj */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetSupportedImageFormats(cl_context /* context */, cl_mem_flags /* flags */, cl_mem_object_type /* image_type */, cl_uint /* num_entries */, cl_image_format * /* image_formats */, cl_uint * /* num_image_formats */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetMemObjectInfo(cl_mem /* memobj */, cl_mem_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetImageInfo(cl_mem /* image */, cl_image_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetPipeInfo(cl_mem /* pipe */, cl_pipe_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetMemObjectDestructorCallback(cl_mem /* memobj */, void (CL_CALLBACK * /*pfn_notify*/)( cl_mem /* memobj */, void* /*user_data*/), void * /*user_data */ ) CL_API_SUFFIX__VERSION_1_1; /* SVM Allocation APIs */ extern CL_API_ENTRY void * CL_API_CALL clSVMAlloc(cl_context /* context */, cl_svm_mem_flags /* flags */, size_t /* size */, cl_uint /* alignment */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY void CL_API_CALL clSVMFree(cl_context /* context */, void * /* svm_pointer */) CL_API_SUFFIX__VERSION_2_0; /* Sampler APIs */ extern CL_API_ENTRY cl_sampler CL_API_CALL clCreateSamplerWithProperties(cl_context /* context */, const cl_sampler_properties * /* normalized_coords */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainSampler(cl_sampler /* sampler */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseSampler(cl_sampler /* sampler */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetSamplerInfo(cl_sampler /* sampler */, cl_sampler_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Program Object APIs */ extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithSource(cl_context /* context */, cl_uint /* count */, const char ** /* strings */, const size_t * /* lengths */, cl_int * 
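/* Example (editor's illustration, not part of the Khronos header): creating a
 * buffer and a coarse-grained SVM allocation with the memory APIs declared
 * above. "ctx" stands for a valid cl_context created elsewhere; error
 * handling is abbreviated.
 *
 *     cl_int err = CL_SUCCESS;
 *     float host_data[256] = {0};
 *     cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
 *                                 sizeof(host_data), host_data, &err);
 *     if (err == CL_SUCCESS) {
 *         void * svm = clSVMAlloc(ctx, CL_MEM_READ_WRITE, 4096, 0);
 *         if (svm) clSVMFree(ctx, svm);
 *         clReleaseMemObject(buf);
 *     }
 */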
/* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBinary(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const size_t * /* lengths */, const unsigned char ** /* binaries */, cl_int * /* binary_status */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBuiltInKernels(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* kernel_names */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clRetainProgram(cl_program /* program */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseProgram(cl_program /* program */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clBuildProgram(cl_program /* program */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, void (CL_CALLBACK * /* pfn_notify */)(cl_program /* program */, void * /* user_data */), void * /* user_data */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clCompileProgram(cl_program /* program */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, cl_uint /* num_input_headers */, const cl_program * /* input_headers */, const char ** /* header_include_names */, void (CL_CALLBACK * /* pfn_notify */)(cl_program /* program */, void * /* user_data */), void * /* user_data */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_program CL_API_CALL clLinkProgram(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, cl_uint /* num_input_programs */, const cl_program * /* input_programs */, void (CL_CALLBACK * /* pfn_notify */)(cl_program /* program */, void * /* user_data */), void * /* user_data */, cl_int * /* errcode_ret */ ) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clUnloadPlatformCompiler(cl_platform_id /* platform */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clGetProgramInfo(cl_program /* program */, cl_program_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetProgramBuildInfo(cl_program /* program */, cl_device_id /* device */, cl_program_build_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Kernel Object APIs */ extern CL_API_ENTRY cl_kernel CL_API_CALL clCreateKernel(cl_program /* program */, const char * /* kernel_name */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clCreateKernelsInProgram(cl_program /* program */, cl_uint /* num_kernels */, cl_kernel * /* kernels */, cl_uint * /* num_kernels_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainKernel(cl_kernel /* kernel */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseKernel(cl_kernel /* kernel */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelArg(cl_kernel /* kernel */, cl_uint /* arg_index */, size_t /* arg_size */, const void * /* arg_value */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelArgSVMPointer(cl_kernel /* kernel */, cl_uint /* 
arg_index */, const void * /* arg_value */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelExecInfo(cl_kernel /* kernel */, cl_kernel_exec_info /* param_name */, size_t /* param_value_size */, const void * /* param_value */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelInfo(cl_kernel /* kernel */, cl_kernel_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelArgInfo(cl_kernel /* kernel */, cl_uint /* arg_indx */, cl_kernel_arg_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelWorkGroupInfo(cl_kernel /* kernel */, cl_device_id /* device */, cl_kernel_work_group_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Event Object APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clWaitForEvents(cl_uint /* num_events */, const cl_event * /* event_list */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetEventInfo(cl_event /* event */, cl_event_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_event CL_API_CALL clCreateUserEvent(cl_context /* context */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clRetainEvent(cl_event /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseEvent(cl_event /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetUserEventStatus(cl_event /* event */, cl_int /* execution_status */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clSetEventCallback( cl_event /* event */, cl_int /* command_exec_callback_type */, void (CL_CALLBACK * /* pfn_notify */)(cl_event, cl_int, void *), void * /* user_data */) CL_API_SUFFIX__VERSION_1_1; /* Profiling APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clGetEventProfilingInfo(cl_event /* event */, cl_profiling_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Flush and Finish APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clFlush(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clFinish(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; /* Enqueued Commands APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_read */, size_t /* offset */, size_t /* size */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBufferRect(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_read */, const size_t * /* buffer_offset */, const size_t * /* host_offset */, const size_t * /* region */, size_t /* buffer_row_pitch */, size_t /* buffer_slice_pitch */, size_t /* host_row_pitch */, size_t /* host_slice_pitch */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event 
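/* Example (editor's illustration): a blocking buffer read that returns an
 * event, followed by a profiling query. Assumes "queue" was created with
 * CL_QUEUE_PROFILING_ENABLE and "buf" holds at least 256 bytes; both are
 * placeholders, not names from this header.
 *
 *     char out[256];
 *     cl_event ev;
 *     cl_int err = clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, sizeof(out),
 *                                      out, 0, NULL, &ev);
 *     if (err == CL_SUCCESS) {
 *         cl_ulong start = 0, end = 0;
 *         clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
 *                                 sizeof(start), &start, NULL);
 *         clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
 *                                 sizeof(end), &end, NULL);
 *         clReleaseEvent(ev);
 *     }
 */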
* /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_write */, size_t /* offset */, size_t /* size */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBufferRect(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_write */, const size_t * /* buffer_offset */, const size_t * /* host_offset */, const size_t * /* region */, size_t /* buffer_row_pitch */, size_t /* buffer_slice_pitch */, size_t /* host_row_pitch */, size_t /* host_slice_pitch */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, const void * /* pattern */, size_t /* pattern_size */, size_t /* offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBuffer(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_buffer */, size_t /* src_offset */, size_t /* dst_offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferRect(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_buffer */, const size_t * /* src_origin */, const size_t * /* dst_origin */, const size_t * /* region */, size_t /* src_row_pitch */, size_t /* src_slice_pitch */, size_t /* dst_row_pitch */, size_t /* dst_slice_pitch */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_read */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t /* row_pitch */, size_t /* slice_pitch */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_write */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t /* input_row_pitch */, size_t /* input_slice_pitch */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillImage(cl_command_queue /* command_queue */, cl_mem /* image */, const void * /* fill_color */, const size_t * /* origin[3] */, const size_t * /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImage(cl_command_queue /* command_queue */, cl_mem /* src_image */, cl_mem /* dst_image */, const size_t * /* src_origin[3] */, const size_t * /* dst_origin[3] */, const size_t * /* region[3] */, cl_uint /* 
num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImageToBuffer(cl_command_queue /* command_queue */, cl_mem /* src_image */, cl_mem /* dst_buffer */, const size_t * /* src_origin[3] */, const size_t * /* region[3] */, size_t /* dst_offset */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferToImage(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_image */, size_t /* src_offset */, const size_t * /* dst_origin[3] */, const size_t * /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY void * CL_API_CALL clEnqueueMapBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_map */, cl_map_flags /* map_flags */, size_t /* offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY void * CL_API_CALL clEnqueueMapImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_map */, cl_map_flags /* map_flags */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t * /* image_row_pitch */, size_t * /* image_slice_pitch */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueUnmapMemObject(cl_command_queue /* command_queue */, cl_mem /* memobj */, void * /* mapped_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMigrateMemObjects(cl_command_queue /* command_queue */, cl_uint /* num_mem_objects */, const cl_mem * /* mem_objects */, cl_mem_migration_flags /* flags */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueNDRangeKernel(cl_command_queue /* command_queue */, cl_kernel /* kernel */, cl_uint /* work_dim */, const size_t * /* global_work_offset */, const size_t * /* global_work_size */, const size_t * /* local_work_size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueNativeKernel(cl_command_queue /* command_queue */, void (CL_CALLBACK * /*user_func*/)(void *), void * /* args */, size_t /* cb_args */, cl_uint /* num_mem_objects */, const cl_mem * /* mem_list */, const void ** /* args_mem_loc */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMarkerWithWaitList(cl_command_queue /* command_queue */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueBarrierWithWaitList(cl_command_queue /* command_queue */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, 
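/* Example (editor's illustration): launching a 1-D kernel and ordering later
 * commands after it with a barrier. "queue" and "kernel" are assumed to be
 * valid objects created elsewhere; the global work size is a placeholder.
 *
 *     size_t gws[1] = { 1024 };
 *     cl_event done;
 *     cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, gws, NULL,
 *                                         0, NULL, &done);
 *     if (err == CL_SUCCESS) {
 *         err = clEnqueueBarrierWithWaitList(queue, 1, &done, NULL);
 *         clReleaseEvent(done);
 *     }
 */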
cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMFree(cl_command_queue /* command_queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void (CL_CALLBACK * /*pfn_free_func*/)(cl_command_queue /* queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void * /* user_data */), void * /* user_data */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemcpy(cl_command_queue /* command_queue */, cl_bool /* blocking_copy */, void * /* dst_ptr */, const void * /* src_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemFill(cl_command_queue /* command_queue */, void * /* svm_ptr */, const void * /* pattern */, size_t /* pattern_size */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMap(cl_command_queue /* command_queue */, cl_bool /* blocking_map */, cl_map_flags /* flags */, void * /* svm_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMUnmap(cl_command_queue /* command_queue */, void * /* svm_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; /* Extension function access * * Returns the extension function address for the given function name, * or NULL if a valid function can not be found. The client must * check to make sure the address is not NULL, before using or * calling the returned function address. 
*/ extern CL_API_ENTRY void * CL_API_CALL clGetExtensionFunctionAddressForPlatform(cl_platform_id /* platform */, const char * /* func_name */) CL_API_SUFFIX__VERSION_1_2; /* Deprecated OpenCL 1.1 APIs */ extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateImage2D(cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format * /* image_format */, size_t /* image_width */, size_t /* image_height */, size_t /* image_row_pitch */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateImage3D(cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format * /* image_format */, size_t /* image_width */, size_t /* image_height */, size_t /* image_depth */, size_t /* image_row_pitch */, size_t /* image_slice_pitch */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clEnqueueMarker(cl_command_queue /* command_queue */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clEnqueueWaitForEvents(cl_command_queue /* command_queue */, cl_uint /* num_events */, const cl_event * /* event_list */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clEnqueueBarrier(cl_command_queue /* command_queue */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clUnloadCompiler(void) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED void * CL_API_CALL clGetExtensionFunctionAddress(const char * /* func_name */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; /* Deprecated OpenCL 2.0 APIs */ extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_command_queue CL_API_CALL clCreateCommandQueue(cl_context /* context */, cl_device_id /* device */, cl_command_queue_properties /* properties */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_sampler CL_API_CALL clCreateSampler(cl_context /* context */, cl_bool /* normalized_coords */, cl_addressing_mode /* addressing_mode */, cl_filter_mode /* filter_mode */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_int CL_API_CALL clEnqueueTask(cl_command_queue /* command_queue */, cl_kernel /* kernel */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/cl.hpp000066400000000000000000011076001450307266000227350ustar00rootroot00000000000000/* Modifications Copyright(C)[2021-2022] Advanced Micro Devices, Inc. * All rights reserved. * */ /******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. 
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and/or associated documentation files (the
 * "Materials"), to deal in the Materials without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Materials, and to
 * permit persons to whom the Materials are furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Materials.
 *
 * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
 * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
 * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
 *    https://www.khronos.org/registry/
 *
 * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
 ******************************************************************************/

/*! \file
 *
 *   \brief C++ bindings for OpenCL 1.0 (rev 48), OpenCL 1.1 (rev 33) and
 *       OpenCL 1.2 (rev 15)
 *   \author Benedict R. Gaster, Laurent Morichetti and Lee Howes
 *
 *   Additions and fixes from:
 *       Brian Cole, March 3rd 2010 and April 2012
 *       Matt Gruenke, April 2012.
 *       Bruce Merry, February 2013.
 *       Tom Deakin and Simon McIntosh-Smith, July 2013
 *
 *   \version 1.2.9
 *   \date December 2015
 *
 *   Optional extension support
 *
 *         cl
 *         cl_ext_device_fission
 *              #define USE_CL_DEVICE_FISSION
 */

/*! \mainpage
 * \section intro Introduction
 * For many large applications C++ is the language of choice and so it seems
 * reasonable to define C++ bindings for OpenCL.
 *
 *
 * The interface is contained within a single C++ header file \em cl.hpp and all
 * definitions are contained within the namespace \em cl. There is no additional
 * requirement to include \em cl.h; to use either the C++ or original C
 * bindings it is enough to simply include \em cl.hpp.
 *
 * The bindings themselves are lightweight and correspond closely to the
 * underlying C API. Using the C++ bindings introduces no additional execution
 * overhead.
 *
 * For detailed documentation on the bindings see:
 *
 * The OpenCL C++ Wrapper API 1.2 (revision 09)
 *  http://www.khronos.org/registry/cl/specs/opencl-cplusplus-1.2.pdf
 *
 * \section example Example
 *
 * The following example shows a general use case for the C++
 * bindings, including support for the optional exception feature and
 * also the supplied vector and string classes, see following sections for
 * descriptions of these features.
 *
 * \code
 * #define __CL_ENABLE_EXCEPTIONS
 *
 * #if defined(__APPLE__) || defined(__MACOSX)
 * #include <OpenCL/cl.hpp>
 * #else
 * #include <CL/cl.hpp>
 * #endif
 * #include <cstdio>
 * #include <cstdlib>
 * #include <iostream>
 *
 *  const char * helloStr  = "__kernel void "
 *                           "hello(void) "
 *                           "{ "
 *                           " "
 *                           "} ";
 *
 *  int
 *  main(void)
 *  {
 *     cl_int err = CL_SUCCESS;
 *     try {
 *
 *       std::vector<cl::Platform> platforms;
 *       cl::Platform::get(&platforms);
 *       if (platforms.size() == 0) {
 *           std::cout << "Platform size 0\n";
 *           return -1;
 *       }
 *
 *       cl_context_properties properties[] =
 *          { CL_CONTEXT_PLATFORM, (cl_context_properties)(platforms[0])(), 0};
 *       cl::Context context(CL_DEVICE_TYPE_CPU, properties);
 *
 *       std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
 *
 *       cl::Program::Sources source(1,
 *           std::make_pair(helloStr,strlen(helloStr)));
 *       cl::Program program_ = cl::Program(context, source);
 *       program_.build(devices);
 *
 *       cl::Kernel kernel(program_, "hello", &err);
 *
 *       cl::Event event;
 *       cl::CommandQueue queue(context, devices[0], 0, &err);
 *       queue.enqueueNDRangeKernel(
 *           kernel,
 *           cl::NullRange,
 *           cl::NDRange(4,4),
 *           cl::NullRange,
 *           NULL,
 *           &event);
 *
 *       event.wait();
 *     }
 *     catch (cl::Error err) {
 *        std::cerr
 *           << "ERROR: "
 *           << err.what()
 *           << "("
 *           << err.err()
 *           << ")"
 *           << std::endl;
 *     }
 *
 *    return EXIT_SUCCESS;
 *  }
 *
 * \endcode
 *
 */

#ifndef CL_HPP_
#define CL_HPP_

#ifdef _WIN32

#include <malloc.h>

#if defined(USE_DX_INTEROP)
#include <CL/cl_d3d10.h>
#include <CL/cl_dx9_media_sharing.h>
#endif
#endif // _WIN32

#if defined(_MSC_VER)
#include <intrin.h>
#endif // _MSC_VER

//
#if defined(USE_CL_DEVICE_FISSION)
#include <CL/cl_ext.h>
#endif

#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/opencl.h>
#else
#include <CL/opencl.h>
#endif // !__APPLE__

#if (_MSC_VER >= 1700) || (__cplusplus >= 201103L)
#define CL_HPP_RVALUE_REFERENCES_SUPPORTED
#define CL_HPP_CPP11_ATOMICS_SUPPORTED
#include <atomic>
#endif

#if (__cplusplus >= 201103L)
#define CL_HPP_NOEXCEPT noexcept
#else
#define CL_HPP_NOEXCEPT
#endif

// To avoid accidentally taking ownership of core OpenCL types
// such as cl_kernel constructors are made explicit
// under OpenCL 1.2
#if defined(CL_VERSION_1_2) && !defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
#define __CL_EXPLICIT_CONSTRUCTORS explicit
#else // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
#define __CL_EXPLICIT_CONSTRUCTORS
#endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)

// Define deprecated prefixes and suffixes to ensure compilation
// in case they are not pre-defined
#if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)
#define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
#endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)
#if !defined(CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED)
#define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
#endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)

#if !defined(CL_CALLBACK)
#define CL_CALLBACK
#endif //CL_CALLBACK

#include <utility>
#include <limits>
#include <iterator>

#if defined(__CL_ENABLE_EXCEPTIONS)
#include <exception>
#endif // #if defined(__CL_ENABLE_EXCEPTIONS)

#if !defined(__NO_STD_VECTOR)
#include <vector>
#endif

#if !defined(__NO_STD_STRING)
#include <string>
#endif

#if defined(__ANDROID__) || defined(linux) || defined(__APPLE__) || defined(__MACOSX)
#include <alloca.h>
#endif // linux

#include <cstring>


/*! \namespace cl
 *
 * \brief The OpenCL C++ bindings are defined within this namespace.
* */ namespace cl { class Memory; /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2)) #define __INIT_CL_EXT_FCN_PTR(name) \ if(!pfn_##name) { \ pfn_##name = (PFN_##name) \ clGetExtensionFunctionAddress(#name); \ if(!pfn_##name) { \ } \ } #endif // #if defined(CL_VERSION_1_1) #if defined(CL_VERSION_1_2) #define __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, name) \ if(!pfn_##name) { \ pfn_##name = (PFN_##name) \ clGetExtensionFunctionAddressForPlatform(platform, #name); \ if(!pfn_##name) { \ } \ } #endif // #if defined(CL_VERSION_1_1) class Program; class Device; class Context; class CommandQueue; class Memory; class Buffer; #if defined(__CL_ENABLE_EXCEPTIONS) /*! \brief Exception class * * This may be thrown by API functions when __CL_ENABLE_EXCEPTIONS is defined. */ class Error : public std::exception { private: cl_int err_; const char * errStr_; public: /*! \brief Create a new CL error exception for a given error code * and corresponding message. * * \param err error code value. * * \param errStr a descriptive string that must remain in scope until * handling of the exception has concluded. If set, it * will be returned by what(). */ Error(cl_int err, const char * errStr = NULL) : err_(err), errStr_(errStr) {} ~Error() throw() {} /*! \brief Get error string associated with exception * * \return A memory pointer to the error message string. */ virtual const char * what() const throw () { if (errStr_ == NULL) { return "empty"; } else { return errStr_; } } /*! \brief Get error code associated with exception * * \return The error code. */ cl_int err(void) const { return err_; } }; #define __ERR_STR(x) #x #else #define __ERR_STR(x) NULL #endif // __CL_ENABLE_EXCEPTIONS namespace detail { #if defined(__CL_ENABLE_EXCEPTIONS) static inline cl_int errHandler ( cl_int err, const char * errStr = NULL) { if (err != CL_SUCCESS) { throw Error(err, errStr); } return err; } #else static inline cl_int errHandler (cl_int err, const char * errStr = NULL) { (void) errStr; // suppress unused variable warning return err; } #endif // __CL_ENABLE_EXCEPTIONS } //! 
\cond DOXYGEN_DETAIL #if !defined(__CL_USER_OVERRIDE_ERROR_STRINGS) #define __GET_DEVICE_INFO_ERR __ERR_STR(clGetDeviceInfo) #define __GET_PLATFORM_INFO_ERR __ERR_STR(clGetPlatformInfo) #define __GET_DEVICE_IDS_ERR __ERR_STR(clGetDeviceIDs) #define __GET_PLATFORM_IDS_ERR __ERR_STR(clGetPlatformIDs) #define __GET_CONTEXT_INFO_ERR __ERR_STR(clGetContextInfo) #define __GET_EVENT_INFO_ERR __ERR_STR(clGetEventInfo) #define __GET_EVENT_PROFILE_INFO_ERR __ERR_STR(clGetEventProfileInfo) #define __GET_MEM_OBJECT_INFO_ERR __ERR_STR(clGetMemObjectInfo) #define __GET_IMAGE_INFO_ERR __ERR_STR(clGetImageInfo) #define __GET_SAMPLER_INFO_ERR __ERR_STR(clGetSamplerInfo) #define __GET_KERNEL_INFO_ERR __ERR_STR(clGetKernelInfo) #if defined(CL_VERSION_1_2) #define __GET_KERNEL_ARG_INFO_ERR __ERR_STR(clGetKernelArgInfo) #endif // #if defined(CL_VERSION_1_2) #define __GET_KERNEL_WORK_GROUP_INFO_ERR __ERR_STR(clGetKernelWorkGroupInfo) #define __GET_PROGRAM_INFO_ERR __ERR_STR(clGetProgramInfo) #define __GET_PROGRAM_BUILD_INFO_ERR __ERR_STR(clGetProgramBuildInfo) #define __GET_COMMAND_QUEUE_INFO_ERR __ERR_STR(clGetCommandQueueInfo) #define __CREATE_CONTEXT_ERR __ERR_STR(clCreateContext) #define __CREATE_CONTEXT_FROM_TYPE_ERR __ERR_STR(clCreateContextFromType) #define __GET_SUPPORTED_IMAGE_FORMATS_ERR __ERR_STR(clGetSupportedImageFormats) #define __CREATE_BUFFER_ERR __ERR_STR(clCreateBuffer) #define __COPY_ERR __ERR_STR(cl::copy) #define __CREATE_SUBBUFFER_ERR __ERR_STR(clCreateSubBuffer) #define __CREATE_GL_BUFFER_ERR __ERR_STR(clCreateFromGLBuffer) #define __CREATE_GL_RENDER_BUFFER_ERR __ERR_STR(clCreateFromGLBuffer) #define __GET_GL_OBJECT_INFO_ERR __ERR_STR(clGetGLObjectInfo) #if defined(CL_VERSION_1_2) #define __CREATE_IMAGE_ERR __ERR_STR(clCreateImage) #define __CREATE_GL_TEXTURE_ERR __ERR_STR(clCreateFromGLTexture) #define __IMAGE_DIMENSION_ERR __ERR_STR(Incorrect image dimensions) #endif // #if defined(CL_VERSION_1_2) #define __CREATE_SAMPLER_ERR __ERR_STR(clCreateSampler) #define __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR __ERR_STR(clSetMemObjectDestructorCallback) #define __CREATE_USER_EVENT_ERR __ERR_STR(clCreateUserEvent) #define __SET_USER_EVENT_STATUS_ERR __ERR_STR(clSetUserEventStatus) #define __SET_EVENT_CALLBACK_ERR __ERR_STR(clSetEventCallback) #define __WAIT_FOR_EVENTS_ERR __ERR_STR(clWaitForEvents) #define __CREATE_KERNEL_ERR __ERR_STR(clCreateKernel) #define __SET_KERNEL_ARGS_ERR __ERR_STR(clSetKernelArg) #define __CREATE_PROGRAM_WITH_SOURCE_ERR __ERR_STR(clCreateProgramWithSource) #define __CREATE_PROGRAM_WITH_BINARY_ERR __ERR_STR(clCreateProgramWithBinary) #if defined(CL_VERSION_1_2) #define __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR __ERR_STR(clCreateProgramWithBuiltInKernels) #endif // #if defined(CL_VERSION_1_2) #define __BUILD_PROGRAM_ERR __ERR_STR(clBuildProgram) #if defined(CL_VERSION_1_2) #define __COMPILE_PROGRAM_ERR __ERR_STR(clCompileProgram) #define __LINK_PROGRAM_ERR __ERR_STR(clLinkProgram) #endif // #if defined(CL_VERSION_1_2) #define __CREATE_KERNELS_IN_PROGRAM_ERR __ERR_STR(clCreateKernelsInProgram) #define __CREATE_COMMAND_QUEUE_ERR __ERR_STR(clCreateCommandQueue) #define __SET_COMMAND_QUEUE_PROPERTY_ERR __ERR_STR(clSetCommandQueueProperty) #define __ENQUEUE_READ_BUFFER_ERR __ERR_STR(clEnqueueReadBuffer) #define __ENQUEUE_READ_BUFFER_RECT_ERR __ERR_STR(clEnqueueReadBufferRect) #define __ENQUEUE_WRITE_BUFFER_ERR __ERR_STR(clEnqueueWriteBuffer) #define __ENQUEUE_WRITE_BUFFER_RECT_ERR __ERR_STR(clEnqueueWriteBufferRect) #define __ENQEUE_COPY_BUFFER_ERR 
__ERR_STR(clEnqueueCopyBuffer) #define __ENQEUE_COPY_BUFFER_RECT_ERR __ERR_STR(clEnqueueCopyBufferRect) #define __ENQUEUE_FILL_BUFFER_ERR __ERR_STR(clEnqueueFillBuffer) #define __ENQUEUE_READ_IMAGE_ERR __ERR_STR(clEnqueueReadImage) #define __ENQUEUE_WRITE_IMAGE_ERR __ERR_STR(clEnqueueWriteImage) #define __ENQUEUE_COPY_IMAGE_ERR __ERR_STR(clEnqueueCopyImage) #define __ENQUEUE_FILL_IMAGE_ERR __ERR_STR(clEnqueueFillImage) #define __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR __ERR_STR(clEnqueueCopyImageToBuffer) #define __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR __ERR_STR(clEnqueueCopyBufferToImage) #define __ENQUEUE_MAP_BUFFER_ERR __ERR_STR(clEnqueueMapBuffer) #define __ENQUEUE_MAP_IMAGE_ERR __ERR_STR(clEnqueueMapImage) #define __ENQUEUE_UNMAP_MEM_OBJECT_ERR __ERR_STR(clEnqueueUnMapMemObject) #define __ENQUEUE_NDRANGE_KERNEL_ERR __ERR_STR(clEnqueueNDRangeKernel) #define __ENQUEUE_TASK_ERR __ERR_STR(clEnqueueTask) #define __ENQUEUE_NATIVE_KERNEL __ERR_STR(clEnqueueNativeKernel) #if defined(CL_VERSION_1_2) #define __ENQUEUE_MIGRATE_MEM_OBJECTS_ERR __ERR_STR(clEnqueueMigrateMemObjects) #endif // #if defined(CL_VERSION_1_2) #define __ENQUEUE_ACQUIRE_GL_ERR __ERR_STR(clEnqueueAcquireGLObjects) #define __ENQUEUE_RELEASE_GL_ERR __ERR_STR(clEnqueueReleaseGLObjects) #define __RETAIN_ERR __ERR_STR(Retain Object) #define __RELEASE_ERR __ERR_STR(Release Object) #define __FLUSH_ERR __ERR_STR(clFlush) #define __FINISH_ERR __ERR_STR(clFinish) #define __VECTOR_CAPACITY_ERR __ERR_STR(Vector capacity error) /** * CL 1.2 version that uses device fission. */ #if defined(CL_VERSION_1_2) #define __CREATE_SUB_DEVICES __ERR_STR(clCreateSubDevices) #else #define __CREATE_SUB_DEVICES __ERR_STR(clCreateSubDevicesEXT) #endif // #if defined(CL_VERSION_1_2) /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2)) #define __ENQUEUE_MARKER_ERR __ERR_STR(clEnqueueMarker) #define __ENQUEUE_WAIT_FOR_EVENTS_ERR __ERR_STR(clEnqueueWaitForEvents) #define __ENQUEUE_BARRIER_ERR __ERR_STR(clEnqueueBarrier) #define __UNLOAD_COMPILER_ERR __ERR_STR(clUnloadCompiler) #define __CREATE_GL_TEXTURE_2D_ERR __ERR_STR(clCreateFromGLTexture2D) #define __CREATE_GL_TEXTURE_3D_ERR __ERR_STR(clCreateFromGLTexture3D) #define __CREATE_IMAGE2D_ERR __ERR_STR(clCreateImage2D) #define __CREATE_IMAGE3D_ERR __ERR_STR(clCreateImage3D) #endif // #if defined(CL_VERSION_1_1) #endif // __CL_USER_OVERRIDE_ERROR_STRINGS //! \endcond /** * CL 1.2 marker and barrier commands */ #if defined(CL_VERSION_1_2) #define __ENQUEUE_MARKER_WAIT_LIST_ERR __ERR_STR(clEnqueueMarkerWithWaitList) #define __ENQUEUE_BARRIER_WAIT_LIST_ERR __ERR_STR(clEnqueueBarrierWithWaitList) #endif // #if defined(CL_VERSION_1_2) #if !defined(__USE_DEV_STRING) && !defined(__NO_STD_STRING) typedef std::string STRING_CLASS; #elif !defined(__USE_DEV_STRING) /*! \class string * \brief Simple string class, that provides a limited subset of std::string * functionality but avoids many of the issues that come with that class. * \note Deprecated. Please use std::string as default or * re-define the string class to match the std::string * interface by defining STRING_CLASS */ class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED string { private: ::size_t size_; char * str_; public: //! \brief Constructs an empty string, allocating no memory. string(void) : size_(0), str_(NULL) { } /*! \brief Constructs a string populated from an arbitrary value of * specified size. * * An extra '\0' is added, in case none was contained in str. 
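 *
 * (Editor's note, illustrative: cl::string s("abc", 3) stores a
 * NUL-terminated private copy with s.size() == 3; the class is deprecated
 * in favour of std::string.)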
 *
 * \param str the initial value of the string instance. Note that '\0'
 *            characters receive no special treatment. If NULL,
 *            the string is left empty, with a size of 0.
 *
 * \param size the number of characters to copy from str.
 */
string(const char * str, ::size_t size) :
    size_(size),
    str_(NULL)
{
    if( size > 0 ) {
        str_ = new char[size_+1];
        if (str_ != NULL) {
            memcpy(str_, str, size_ * sizeof(char));
            str_[size_] = '\0';
        }
        else {
            size_ = 0;
        }
    }
}

/*! \brief Constructs a string populated from a null-terminated value.
 *
 * \param str the null-terminated initial value of the string instance.
 *            If NULL, the string is left empty, with a size of 0.
 */
string(const char * str) :
    size_(0),
    str_(NULL)
{
    if( str ) {
        size_= ::strlen(str);
    }
    if( size_ > 0 ) {
        str_ = new char[size_ + 1];
        if (str_ != NULL) {
            memcpy(str_, str, (size_ + 1) * sizeof(char));
        }
    }
}

void resize( ::size_t n )
{
    if( size_ == n ) {
        return;
    }
    if (n == 0) {
        if( str_ ) {
            delete [] str_;
        }
        str_ = NULL;
        size_ = 0;
    }
    else {
        char *newString = new char[n + 1];
        ::size_t copySize = n;
        if( size_ < n ) {
            copySize = size_;
        }
        size_ = n;

        if(str_) {
            memcpy(newString, str_, (copySize + 1) * sizeof(char));
        }
        if( copySize < size_ ) {
            memset(newString + copySize, 0, size_ - copySize);
        }
        newString[size_] = '\0';

        delete [] str_;
        str_ = newString;
    }
}

const char& operator[] ( ::size_t pos ) const
{
    return str_[pos];
}

char& operator[] ( ::size_t pos )
{
    return str_[pos];
}

/*! \brief Copies the value of another string to this one.
 *
 * \param rhs the string to copy.
 *
 * \returns a reference to the modified instance.
 */
string& operator=(const string& rhs)
{
    if (this == &rhs) {
        return *this;
    }

    if( str_ != NULL ) {
        delete [] str_;
        str_ = NULL;
        size_ = 0;
    }

    if (rhs.size_ == 0 || rhs.str_ == NULL) {
        str_ = NULL;
        size_ = 0;
    }
    else {
        str_ = new char[rhs.size_ + 1];
        size_ = rhs.size_;

        if (str_ != NULL) {
            memcpy(str_, rhs.str_, (size_ + 1) * sizeof(char));
        }
        else {
            size_ = 0;
        }
    }

    return *this;
}

/*! \brief Constructs a string by copying the value of another instance.
 *
 * \param rhs the string to copy.
 */
string(const string& rhs) :
    size_(0),
    str_(NULL)
{
    *this = rhs;
}

//! \brief Destructor - frees memory used to hold the current value.
~string()
{
    delete[] str_;
    str_ = NULL;
}

//! \brief Queries the length of the string, excluding any added '\0's.
::size_t size(void) const   { return size_; }

//! \brief Queries the length of the string, excluding any added '\0's.
::size_t length(void) const { return size(); }

/*! \brief Returns a pointer to the private copy held by this instance,
 *  or "" if empty/unset.
 */
const char * c_str(void) const { return (str_) ? str_ : "";}
} CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
typedef cl::string STRING_CLASS;
#endif // #elif !defined(__USE_DEV_STRING)

#if !defined(__USE_DEV_VECTOR) && !defined(__NO_STD_VECTOR)
#define VECTOR_CLASS std::vector
#elif !defined(__USE_DEV_VECTOR)
#define VECTOR_CLASS cl::vector

#if !defined(__MAX_DEFAULT_VECTOR_SIZE)
#define __MAX_DEFAULT_VECTOR_SIZE 10
#endif

/*! \class vector
 * \brief Fixed sized vector implementation that mirrors std::vector
 * functionality.
 *
 * \note Deprecated. Please use std::vector as default or
 *       re-define the vector class to match the std::vector
 *       interface by defining VECTOR_CLASS
 *
 * \note Not recommended for use with custom objects as
 * current implementation will construct N elements
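 *
 * (Editor's note, illustrative: cl::vector<int, 4> v; v.push_back(1); gives
 * a fixed-capacity vector of at most four elements that lives entirely
 * inside the object and never allocates.)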
 *
 * \note
 * This differs from std::vector<> not just in memory allocation,
 * but also in terms of when members are constructed, destroyed,
 * and assigned instead of being copy constructed.
 *
 * \param T type of element contained in the vector.
 *
 * \param N maximum size of the vector.
 */
template <typename T, unsigned int N = __MAX_DEFAULT_VECTOR_SIZE>
class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED vector
{
private:
    T data_[N];
    unsigned int size_;

public:
    //! \brief Constructs an empty vector with no memory allocated.
    vector() :
        size_(static_cast<unsigned int>(0))
    {}

    //! \brief Deallocates the vector's memory and destroys all of its elements.
    ~vector()
    {
        clear();
    }

    //! \brief Returns the number of elements currently contained.
    unsigned int size(void) const
    {
        return size_;
    }

    /*! \brief Empties the vector of all elements.
     * \note
     * This does not deallocate memory but will invoke destructors
     * on contained elements.
     */
    void clear()
    {
        while(!empty()) {
            pop_back();
        }
    }

    /*! \brief Appends an element after the last valid element.
     * Calling this on a vector that has reached capacity will throw an
     * exception if exceptions are enabled.
     */
    void push_back (const T& x)
    {
        if (size() < N) {
            new (&data_[size_]) T(x);
            size_++;
        } else {
            detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR);
        }
    }

    /*! \brief Removes the last valid element from the vector.
     * Calling this on an empty vector will throw an exception
     * if exceptions are enabled.
     */
    void pop_back(void)
    {
        if (size_ != 0) {
            --size_;
            data_[size_].~T();
        } else {
            detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR);
        }
    }

    /*! \brief Constructs with a value copied from another.
     *
     * \param vec the vector to copy.
     */
    vector(const vector<T, N>& vec) :
        size_(vec.size_)
    {
        if (size_ != 0) {
            assign(vec.begin(), vec.end());
        }
    }

    /*! \brief Constructs with a specified number of initial elements.
     *
     * \param size number of initial elements.
     *
     * \param val value of initial elements.
     */
    vector(unsigned int size, const T& val = T()) :
        size_(0)
    {
        for (unsigned int i = 0; i < size; i++) {
            push_back(val);
        }
    }

    /*! \brief Overwrites the current content with that copied from another
     *         instance.
     *
     * \param rhs vector to copy.
     *
     * \returns a reference to this.
     */
    vector<T, N>& operator=(const vector<T, N>& rhs)
    {
        if (this == &rhs) {
            return *this;
        }

        if (rhs.size_ != 0) {
            assign(rhs.begin(), rhs.end());
        } else {
            clear();
        }

        return *this;
    }

    /*! \brief Tests equality against another instance.
     *
     * \param vec the vector against which to compare.
     */
    bool operator==(vector<T,N> &vec)
    {
        if (size() != vec.size()) {
            return false;
        }

        for( unsigned int i = 0; i < size(); ++i ) {
            if( operator[](i) != vec[i] ) {
                return false;
            }
        }
        return true;
    }

    //! \brief Conversion operator to T*.
    operator T* ()             { return data_; }

    //! \brief Conversion operator to const T*.
    operator const T* () const { return data_; }

    //! \brief Tests whether this instance has any elements.
    bool empty (void) const
    {
        return size_==0;
    }

    //! \brief Returns the maximum number of elements this instance can hold.
    unsigned int max_size (void) const
    {
        return N;
    }

    //! \brief Returns the maximum number of elements this instance can hold.
    unsigned int capacity () const
    {
        return N;
    }

    //! \brief Resizes the vector to the given size
    void resize(unsigned int newSize, T fill = T())
    {
        if (newSize > N)
        {
            detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR);
        }
        else
        {
            while (size_ < newSize)
            {
                new (&data_[size_]) T(fill);
                size_++;
            }
            while (size_ > newSize)
            {
                --size_;
                data_[size_].~T();
            }
        }
    }

    /*! \brief Returns a reference to a given element.
     *
     * \param index which element to access.
     *
     * \note
     * The caller is responsible for ensuring index is >= 0 and < size().
     */
    T& operator[](int index)
    {
        return data_[index];
    }

    /*! \brief Returns a const reference to a given element.
     *
     * \param index which element to access.
     *
     * \note
     * The caller is responsible for ensuring index is >= 0 and < size().
     */
    const T& operator[](int index) const
    {
        return data_[index];
    }

    /*! \brief Assigns elements of the vector based on a source iterator range.
     *
     * \param start Beginning iterator of source range
     * \param end End iterator of source range
     *
     * \note
     * Will throw an exception if exceptions are enabled and size exceeded.
     */
    template<class I>
    void assign(I start, I end)
    {
        clear();
        while(start != end) {
            push_back(*start);
            start++;
        }
    }

    /*! \class iterator
     * \brief Const iterator class for vectors
     */
    class iterator
    {
    private:
        const vector<T,N> *vec_;
        int index_;

        /**
         * Internal iterator constructor to capture reference
         * to the vector it iterates over rather than taking
         * the vector by copy.
         */
        iterator (const vector<T,N> &vec, int index) :
            vec_(&vec)
        {
            if( !vec.empty() ) {
                index_ = index;
            } else {
                index_ = -1;
            }
        }

    public:
        iterator(void) :
            index_(-1),
            vec_(NULL)
        {
        }

        iterator(const iterator& rhs) :
            vec_(rhs.vec_),
            index_(rhs.index_)
        {
        }

        ~iterator(void) {}

        static iterator begin(const cl::vector<T,N> &vec)
        {
            iterator i(vec, 0);
            return i;
        }

        static iterator end(const cl::vector<T,N> &vec)
        {
            iterator i(vec, vec.size());
            return i;
        }

        bool operator==(iterator i)
        {
            return ((vec_ == i.vec_) &&
                    (index_ == i.index_));
        }

        bool operator!=(iterator i)
        {
            return (!(*this==i));
        }

        iterator& operator++()
        {
            ++index_;
            return *this;
        }

        iterator operator++(int)
        {
            iterator retVal(*this);
            ++index_;
            return retVal;
        }

        iterator& operator--()
        {
            --index_;
            return *this;
        }

        iterator operator--(int)
        {
            iterator retVal(*this);
            --index_;
            return retVal;
        }

        const T& operator *() const
        {
            return (*vec_)[index_];
        }
    };

    iterator begin(void)
    {
        return iterator::begin(*this);
    }

    iterator begin(void) const
    {
        return iterator::begin(*this);
    }

    iterator end(void)
    {
        return iterator::end(*this);
    }

    iterator end(void) const
    {
        return iterator::end(*this);
    }

    T& front(void)
    {
        return data_[0];
    }

    T& back(void)
    {
        return data_[size_-1];
    }

    const T& front(void) const
    {
        return data_[0];
    }

    const T& back(void) const
    {
        return data_[size_-1];
    }
} CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

#endif // #if !defined(__USE_DEV_VECTOR) && !defined(__NO_STD_VECTOR)

namespace detail {
#define __DEFAULT_NOT_INITIALIZED 1
#define __DEFAULT_BEING_INITIALIZED 2
#define __DEFAULT_INITIALIZED 4

    /*
     * Compare and exchange primitives are needed for handling of defaults
     */

#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
    inline int compare_exchange(std::atomic<int> * dest, int exchange, int comparand)
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    inline int compare_exchange(volatile int * dest, int exchange, int comparand)
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    {
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
        std::atomic_compare_exchange_strong(dest, &comparand, exchange);
        return comparand;
#elif _MSC_VER
        return (int)(_InterlockedCompareExchange(
            (volatile long*)dest,
            (long)exchange,
            (long)comparand));
#else // !_MSC_VER && !CL_HPP_CPP11_ATOMICS_SUPPORTED
        return (__sync_val_compare_and_swap(
            dest,
            comparand,
            exchange));
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    }

    inline void fence()
    {
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
        std::atomic_thread_fence(std::memory_order_seq_cst);
#elif _MSC_VER // !CL_HPP_CPP11_ATOMICS_SUPPORTED
        _ReadWriteBarrier();
#else // !_MSC_VER && !CL_HPP_CPP11_ATOMICS_SUPPORTED
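        /* Editor's note: on toolchains with neither C++11 <atomic> nor the
         * MSVC intrinsics, the GCC/Clang builtin below emits a full memory
         * barrier; it plays the same role as the fences in the branches
         * above. */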
        __sync_synchronize();
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    }
} // namespace detail

/*! \brief class used to interface between C++ and
 *  OpenCL C calls that require arrays of size_t values, whose
 *  size is known statically.
 */
template <int N>
class size_t
{
private:
    ::size_t data_[N];

public:
    //! \brief Initialize size_t to all 0s
    size_t()
    {
        for( int i = 0; i < N; ++i ) {
            data_[i] = 0;
        }
    }

    ::size_t& operator[](int index)
    {
        return data_[index];
    }

    const ::size_t& operator[](int index) const
    {
        return data_[index];
    }

    //! \brief Conversion operator to T*.
    operator ::size_t* ()             { return data_; }

    //! \brief Conversion operator to const T*.
    operator const ::size_t* () const { return data_; }
};

namespace detail {

// Generic getInfoHelper. The final parameter is used to guide overload
// resolution: the actual parameter passed is an int, which makes this
// a worse conversion sequence than a specialization that declares the
// parameter as an int.
template<typename Functor, typename T>
inline cl_int getInfoHelper(Functor f, cl_uint name, T* param, long)
{
    return f(name, sizeof(T), param, NULL);
}

// Specialized getInfoHelper for VECTOR_CLASS params
template <typename Func, typename T>
inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS<T>* param, long)
{
    ::size_t required;
    cl_int err = f(name, 0, NULL, &required);
    if (err != CL_SUCCESS) {
        return err;
    }

    T* value = (T*) alloca(required);
    err = f(name, required, value, NULL);
    if (err != CL_SUCCESS) {
        return err;
    }

    param->assign(&value[0], &value[required/sizeof(T)]);
    return CL_SUCCESS;
}

/* Specialization for reference-counted types. This depends on the
 * existence of Wrapper<T>::cl_type, and none of the other types having the
 * cl_type member. Note that simply specifying the parameter as Wrapper<T>
 * does not work, because when using a derived type (e.g. Context) the generic
 * template will provide a better match.
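 *
 * (Editor's note, illustrative: callers pass a literal 0 as the trailing tag
 * argument; 0 converts exactly to int, so for wrapper types that expose
 * cl_type the int-tagged specialization wins, while every other type falls
 * back to the long-tagged generic overload above.)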
*/ template inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS* param, int, typename T::cl_type = 0) { ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } typename T::cl_type * value = (typename T::cl_type *) alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } ::size_t elements = required / sizeof(typename T::cl_type); param->assign(&value[0], &value[elements]); for (::size_t i = 0; i < elements; i++) { if (value[i] != NULL) { err = (*param)[i].retain(); if (err != CL_SUCCESS) { return err; } } } return CL_SUCCESS; } // Specialized for getInfo template inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS* param, int) { cl_int err = f(name, param->size() * sizeof(char *), &(*param)[0], NULL); if (err != CL_SUCCESS) { return err; } return CL_SUCCESS; } // Specialized GetInfoHelper for STRING_CLASS params template inline cl_int getInfoHelper(Func f, cl_uint name, STRING_CLASS* param, long) { #if defined(__NO_STD_VECTOR) || defined(__NO_STD_STRING) ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } char* value = (char*)alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } *param = value; return CL_SUCCESS; #else ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } // std::string has a constant data member // a char vector does not VECTOR_CLASS value(required); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { param->assign(value.begin(), value.end()); } #endif return CL_SUCCESS; } // Specialized GetInfoHelper for cl::size_t params template inline cl_int getInfoHelper(Func f, cl_uint name, size_t* param, long) { ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } ::size_t* value = (::size_t*) alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } for(int i = 0; i < N; ++i) { (*param)[i] = value[i]; } return CL_SUCCESS; } template struct ReferenceHandler; /* Specialization for reference-counted types. This depends on the * existence of Wrapper::cl_type, and none of the other types having the * cl_type member. Note that simplify specifying the parameter as Wrapper * does not work, because when using a derived type (e.g. Context) the generic * template will provide a better match. 
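 *
 * Illustrative sketch of the dispatch (call sites pass the literal 0, an int):
 *
 *   cl_int r = getInfoHelper(f, name, &param, 0);
 *
 * For a wrapper type that defines cl_type (e.g. cl::Context), the overload
 * taking int plus the defaulted T::cl_type argument is an exact match and
 * wins; for all other element types that overload is removed by substitution
 * failure and the generic overload taking long is selected through the
 * int -> long conversion.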
*/ template inline cl_int getInfoHelper(Func f, cl_uint name, T* param, int, typename T::cl_type = 0) { typename T::cl_type value; cl_int err = f(name, sizeof(value), &value, NULL); if (err != CL_SUCCESS) { return err; } *param = value; if (value != NULL) { err = param->retain(); if (err != CL_SUCCESS) { return err; } } return CL_SUCCESS; } #define __PARAM_NAME_INFO_1_0(F) \ F(cl_platform_info, CL_PLATFORM_PROFILE, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_VERSION, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_NAME, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_VENDOR, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_EXTENSIONS, STRING_CLASS) \ \ F(cl_device_info, CL_DEVICE_TYPE, cl_device_type) \ F(cl_device_info, CL_DEVICE_VENDOR_ID, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_COMPUTE_UNITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_GROUP_SIZE, ::size_t) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_SIZES, VECTOR_CLASS< ::size_t>) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_CLOCK_FREQUENCY, cl_uint) \ F(cl_device_info, CL_DEVICE_ADDRESS_BITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_READ_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_MEM_ALLOC_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_WIDTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_HEIGHT, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_WIDTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_HEIGHT, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_DEPTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_MAX_PARAMETER_SIZE, ::size_t) \ F(cl_device_info, CL_DEVICE_MAX_SAMPLERS, cl_uint) \ F(cl_device_info, CL_DEVICE_MEM_BASE_ADDR_ALIGN, cl_uint) \ F(cl_device_info, CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_SINGLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, cl_device_mem_cache_type) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, cl_uint)\ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_TYPE, cl_device_local_mem_type) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_ERROR_CORRECTION_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_PROFILING_TIMER_RESOLUTION, ::size_t) \ F(cl_device_info, CL_DEVICE_ENDIAN_LITTLE, cl_bool) \ F(cl_device_info, CL_DEVICE_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_COMPILER_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_EXECUTION_CAPABILITIES, cl_device_exec_capabilities) \ F(cl_device_info, CL_DEVICE_QUEUE_PROPERTIES, cl_command_queue_properties) \ F(cl_device_info, CL_DEVICE_PLATFORM, cl_platform_id) \ F(cl_device_info, CL_DEVICE_NAME, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_VENDOR, STRING_CLASS) \ F(cl_device_info, 
CL_DRIVER_VERSION, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_PROFILE, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_VERSION, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_EXTENSIONS, STRING_CLASS) \ \ F(cl_context_info, CL_CONTEXT_REFERENCE_COUNT, cl_uint) \ F(cl_context_info, CL_CONTEXT_DEVICES, VECTOR_CLASS) \ F(cl_context_info, CL_CONTEXT_PROPERTIES, VECTOR_CLASS) \ \ F(cl_event_info, CL_EVENT_COMMAND_QUEUE, cl::CommandQueue) \ F(cl_event_info, CL_EVENT_COMMAND_TYPE, cl_command_type) \ F(cl_event_info, CL_EVENT_REFERENCE_COUNT, cl_uint) \ F(cl_event_info, CL_EVENT_COMMAND_EXECUTION_STATUS, cl_int) \ \ F(cl_profiling_info, CL_PROFILING_COMMAND_QUEUED, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_SUBMIT, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_START, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_END, cl_ulong) \ \ F(cl_mem_info, CL_MEM_TYPE, cl_mem_object_type) \ F(cl_mem_info, CL_MEM_FLAGS, cl_mem_flags) \ F(cl_mem_info, CL_MEM_SIZE, ::size_t) \ F(cl_mem_info, CL_MEM_HOST_PTR, void*) \ F(cl_mem_info, CL_MEM_MAP_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_REFERENCE_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_CONTEXT, cl::Context) \ \ F(cl_image_info, CL_IMAGE_FORMAT, cl_image_format) \ F(cl_image_info, CL_IMAGE_ELEMENT_SIZE, ::size_t) \ F(cl_image_info, CL_IMAGE_ROW_PITCH, ::size_t) \ F(cl_image_info, CL_IMAGE_SLICE_PITCH, ::size_t) \ F(cl_image_info, CL_IMAGE_WIDTH, ::size_t) \ F(cl_image_info, CL_IMAGE_HEIGHT, ::size_t) \ F(cl_image_info, CL_IMAGE_DEPTH, ::size_t) \ \ F(cl_sampler_info, CL_SAMPLER_REFERENCE_COUNT, cl_uint) \ F(cl_sampler_info, CL_SAMPLER_CONTEXT, cl::Context) \ F(cl_sampler_info, CL_SAMPLER_NORMALIZED_COORDS, cl_bool) \ F(cl_sampler_info, CL_SAMPLER_ADDRESSING_MODE, cl_addressing_mode) \ F(cl_sampler_info, CL_SAMPLER_FILTER_MODE, cl_filter_mode) \ \ F(cl_program_info, CL_PROGRAM_REFERENCE_COUNT, cl_uint) \ F(cl_program_info, CL_PROGRAM_CONTEXT, cl::Context) \ F(cl_program_info, CL_PROGRAM_NUM_DEVICES, cl_uint) \ F(cl_program_info, CL_PROGRAM_DEVICES, VECTOR_CLASS) \ F(cl_program_info, CL_PROGRAM_SOURCE, STRING_CLASS) \ F(cl_program_info, CL_PROGRAM_BINARY_SIZES, VECTOR_CLASS< ::size_t>) \ F(cl_program_info, CL_PROGRAM_BINARIES, VECTOR_CLASS) \ \ F(cl_program_build_info, CL_PROGRAM_BUILD_STATUS, cl_build_status) \ F(cl_program_build_info, CL_PROGRAM_BUILD_OPTIONS, STRING_CLASS) \ F(cl_program_build_info, CL_PROGRAM_BUILD_LOG, STRING_CLASS) \ \ F(cl_kernel_info, CL_KERNEL_FUNCTION_NAME, STRING_CLASS) \ F(cl_kernel_info, CL_KERNEL_NUM_ARGS, cl_uint) \ F(cl_kernel_info, CL_KERNEL_REFERENCE_COUNT, cl_uint) \ F(cl_kernel_info, CL_KERNEL_CONTEXT, cl::Context) \ F(cl_kernel_info, CL_KERNEL_PROGRAM, cl::Program) \ \ F(cl_kernel_work_group_info, CL_KERNEL_WORK_GROUP_SIZE, ::size_t) \ F(cl_kernel_work_group_info, CL_KERNEL_COMPILE_WORK_GROUP_SIZE, cl::size_t<3>) \ F(cl_kernel_work_group_info, CL_KERNEL_LOCAL_MEM_SIZE, cl_ulong) \ \ F(cl_command_queue_info, CL_QUEUE_CONTEXT, cl::Context) \ F(cl_command_queue_info, CL_QUEUE_DEVICE, cl::Device) \ F(cl_command_queue_info, CL_QUEUE_REFERENCE_COUNT, cl_uint) \ F(cl_command_queue_info, CL_QUEUE_PROPERTIES, cl_command_queue_properties) #if defined(CL_VERSION_1_1) #define __PARAM_NAME_INFO_1_1(F) \ F(cl_context_info, CL_CONTEXT_NUM_DEVICES, cl_uint)\ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, cl_uint) \ 
F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_DOUBLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_HALF_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_HOST_UNIFIED_MEMORY, cl_bool) \ F(cl_device_info, CL_DEVICE_OPENCL_C_VERSION, STRING_CLASS) \ \ F(cl_mem_info, CL_MEM_ASSOCIATED_MEMOBJECT, cl::Memory) \ F(cl_mem_info, CL_MEM_OFFSET, ::size_t) \ \ F(cl_kernel_work_group_info, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, ::size_t) \ F(cl_kernel_work_group_info, CL_KERNEL_PRIVATE_MEM_SIZE, cl_ulong) \ \ F(cl_event_info, CL_EVENT_CONTEXT, cl::Context) #endif // CL_VERSION_1_1 #if defined(CL_VERSION_1_2) #define __PARAM_NAME_INFO_1_2(F) \ F(cl_image_info, CL_IMAGE_BUFFER, cl::Buffer) \ \ F(cl_program_info, CL_PROGRAM_NUM_KERNELS, ::size_t) \ F(cl_program_info, CL_PROGRAM_KERNEL_NAMES, STRING_CLASS) \ \ F(cl_program_build_info, CL_PROGRAM_BINARY_TYPE, cl_program_binary_type) \ \ F(cl_kernel_info, CL_KERNEL_ATTRIBUTES, STRING_CLASS) \ \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ADDRESS_QUALIFIER, cl_kernel_arg_address_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ACCESS_QUALIFIER, cl_kernel_arg_access_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_NAME, STRING_CLASS) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_NAME, STRING_CLASS) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_QUALIFIER, cl_kernel_arg_type_qualifier) \ \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE, cl_device_id) \ F(cl_device_info, CL_DEVICE_PARTITION_PROPERTIES, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPE, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, ::size_t) \ F(cl_device_info, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, cl_device_affinity_domain) \ F(cl_device_info, CL_DEVICE_BUILT_IN_KERNELS, STRING_CLASS) #endif // #if defined(CL_VERSION_1_2) #if defined(USE_CL_DEVICE_FISSION) #define __PARAM_NAME_DEVICE_FISSION(F) \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE_EXT, cl_device_id) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPES_EXT, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_AFFINITY_DOMAINS_EXT, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT_EXT , cl_uint) \ F(cl_device_info, CL_DEVICE_PARTITION_STYLE_EXT, VECTOR_CLASS) #endif // USE_CL_DEVICE_FISSION template struct param_traits {}; #define __CL_DECLARE_PARAM_TRAITS(token, param_name, T) \ struct token; \ template<> \ struct param_traits \ { \ enum { value = param_name }; \ typedef T param_type; \ }; __PARAM_NAME_INFO_1_0(__CL_DECLARE_PARAM_TRAITS) #if defined(CL_VERSION_1_1) __PARAM_NAME_INFO_1_1(__CL_DECLARE_PARAM_TRAITS) #endif // CL_VERSION_1_1 #if defined(CL_VERSION_1_2) __PARAM_NAME_INFO_1_2(__CL_DECLARE_PARAM_TRAITS) #endif // CL_VERSION_1_1 #if defined(USE_CL_DEVICE_FISSION) __PARAM_NAME_DEVICE_FISSION(__CL_DECLARE_PARAM_TRAITS); #endif // USE_CL_DEVICE_FISSION #ifdef CL_PLATFORM_ICD_SUFFIX_KHR __CL_DECLARE_PARAM_TRAITS(cl_platform_info, CL_PLATFORM_ICD_SUFFIX_KHR, STRING_CLASS) #endif #ifdef CL_DEVICE_PROFILING_TIMER_OFFSET_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PROFILING_TIMER_OFFSET_AMD, cl_ulong) #endif #ifdef CL_DEVICE_GLOBAL_FREE_MEMORY_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_FREE_MEMORY_AMD, VECTOR_CLASS< ::size_t>) #endif #ifdef 
CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD, cl_uint) #endif #ifdef CL_DEVICE_SIMD_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_WAVEFRONT_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_WAVEFRONT_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD, cl_uint) #endif #ifdef CL_DEVICE_LOCAL_MEM_BANKS_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_LOCAL_MEM_BANKS_AMD, cl_uint) #endif #ifdef CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD, ::size_t) #endif #ifdef CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD, ::size_t) #endif #ifdef CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD, ::size_t) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV, cl_uint) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV, cl_uint) #endif #ifdef CL_DEVICE_REGISTERS_PER_BLOCK_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_REGISTERS_PER_BLOCK_NV, cl_uint) #endif #ifdef CL_DEVICE_WARP_SIZE_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_WARP_SIZE_NV, cl_uint) #endif #ifdef CL_DEVICE_GPU_OVERLAP_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GPU_OVERLAP_NV, cl_bool) #endif #ifdef CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV, cl_bool) #endif #ifdef CL_DEVICE_INTEGRATED_MEMORY_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_INTEGRATED_MEMORY_NV, cl_bool) #endif // Convenience functions template inline cl_int getInfo(Func f, cl_uint name, T* param) { return getInfoHelper(f, name, param, 0); } template struct GetInfoFunctor0 { Func f_; const Arg0& arg0_; cl_int operator ()( cl_uint param, ::size_t size, void* value, ::size_t* size_ret) { return f_(arg0_, param, size, value, size_ret); } }; template struct GetInfoFunctor1 { Func f_; const Arg0& arg0_; const Arg1& arg1_; cl_int operator ()( cl_uint param, ::size_t size, void* value, ::size_t* size_ret) { return f_(arg0_, arg1_, param, size, value, size_ret); } }; template inline cl_int getInfo(Func f, const Arg0& arg0, cl_uint name, T* param) { GetInfoFunctor0 f0 = { f, arg0 }; return getInfoHelper(f0, name, param, 0); } template inline cl_int getInfo(Func f, const Arg0& arg0, const Arg1& arg1, cl_uint name, T* param) { GetInfoFunctor1 f0 = { f, arg0, arg1 }; return getInfoHelper(f0, name, param, 0); } template struct ReferenceHandler { }; #if defined(CL_VERSION_1_2) /** * 
OpenCL 1.2 devices do have retain/release.
 */
template <>
struct ReferenceHandler<cl_device_id>
{
    /**
     * Retain the device.
     * \param device A valid device created using createSubDevices
     * \return
     *   CL_SUCCESS if the function executed successfully.
     *   CL_INVALID_DEVICE if device was not a valid subdevice
     *   CL_OUT_OF_RESOURCES
     *   CL_OUT_OF_HOST_MEMORY
     */
    static cl_int retain(cl_device_id device)
    { return ::clRetainDevice(device); }
    /**
     * Release the device.
     * \param device A valid device created using createSubDevices
     * \return
     *   CL_SUCCESS if the function executed successfully.
     *   CL_INVALID_DEVICE if device was not a valid subdevice
     *   CL_OUT_OF_RESOURCES
     *   CL_OUT_OF_HOST_MEMORY
     */
    static cl_int release(cl_device_id device)
    { return ::clReleaseDevice(device); }
};
#else // #if defined(CL_VERSION_1_2)
/**
 * OpenCL 1.1 devices do not have retain/release.
 */
template <>
struct ReferenceHandler<cl_device_id>
{
    // cl_device_id does not have retain().
    static cl_int retain(cl_device_id)
    { return CL_SUCCESS; }
    // cl_device_id does not have release().
    static cl_int release(cl_device_id)
    { return CL_SUCCESS; }
};
#endif // #if defined(CL_VERSION_1_2)

template <>
struct ReferenceHandler<cl_platform_id>
{
    // cl_platform_id does not have retain().
    static cl_int retain(cl_platform_id)
    { return CL_SUCCESS; }
    // cl_platform_id does not have release().
    static cl_int release(cl_platform_id)
    { return CL_SUCCESS; }
};

template <>
struct ReferenceHandler<cl_context>
{
    static cl_int retain(cl_context context)
    { return ::clRetainContext(context); }
    static cl_int release(cl_context context)
    { return ::clReleaseContext(context); }
};

template <>
struct ReferenceHandler<cl_command_queue>
{
    static cl_int retain(cl_command_queue queue)
    { return ::clRetainCommandQueue(queue); }
    static cl_int release(cl_command_queue queue)
    { return ::clReleaseCommandQueue(queue); }
};

template <>
struct ReferenceHandler<cl_mem>
{
    static cl_int retain(cl_mem memory)
    { return ::clRetainMemObject(memory); }
    static cl_int release(cl_mem memory)
    { return ::clReleaseMemObject(memory); }
};

template <>
struct ReferenceHandler<cl_sampler>
{
    static cl_int retain(cl_sampler sampler)
    { return ::clRetainSampler(sampler); }
    static cl_int release(cl_sampler sampler)
    { return ::clReleaseSampler(sampler); }
};

template <>
struct ReferenceHandler<cl_program>
{
    static cl_int retain(cl_program program)
    { return ::clRetainProgram(program); }
    static cl_int release(cl_program program)
    { return ::clReleaseProgram(program); }
};

template <>
struct ReferenceHandler<cl_kernel>
{
    static cl_int retain(cl_kernel kernel)
    { return ::clRetainKernel(kernel); }
    static cl_int release(cl_kernel kernel)
    { return ::clReleaseKernel(kernel); }
};

template <>
struct ReferenceHandler<cl_event>
{
    static cl_int retain(cl_event event)
    { return ::clRetainEvent(event); }
    static cl_int release(cl_event event)
    { return ::clReleaseEvent(event); }
};

// Extracts version number with major in the upper 16 bits, minor in the lower 16
static cl_uint getVersion(const char *versionInfo)
{
    int highVersion = 0;
    int lowVersion = 0;
    // Skip the "OpenCL " prefix (7 characters); the string has the form
    // "OpenCL <major>.<minor> <vendor-specific information>".
    int index = 7;
    while (versionInfo[index] != '.') {
        highVersion *= 10;
        highVersion += versionInfo[index] - '0';
        ++index;
    }
    ++index;
    while (versionInfo[index] != ' ' && versionInfo[index] != '\0') {
        lowVersion *= 10;
        lowVersion += versionInfo[index] - '0';
        ++index;
    }
    return (highVersion << 16) | lowVersion;
}

static cl_uint getPlatformVersion(cl_platform_id platform)
{
    ::size_t size = 0;
    clGetPlatformInfo(platform, CL_PLATFORM_VERSION, 0, NULL, &size);
    char *versionInfo = (char *) alloca(size);
    clGetPlatformInfo(platform, CL_PLATFORM_VERSION, size, &versionInfo[0], &size);
    return getVersion(versionInfo);
}

static cl_uint getDevicePlatformVersion(cl_device_id device)
{
    cl_platform_id platform;
    clGetDeviceInfo(device, CL_DEVICE_PLATFORM, sizeof(platform), &platform, NULL);
    return getPlatformVersion(platform);
}

#if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
static cl_uint getContextPlatformVersion(cl_context context)
{
    // The platform cannot be queried directly, so we first have to grab a
    // device from the context and query that device's platform version.
    ::size_t size = 0;
    clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &size);
    if (size == 0)
        return 0;
    cl_device_id *devices = (cl_device_id *) alloca(size);
    clGetContextInfo(context, CL_CONTEXT_DEVICES, size, devices, NULL);
    return getDevicePlatformVersion(devices[0]);
}
#endif // #if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)

template <typename T>
class Wrapper
{
public:
    typedef T cl_type;

protected:
    cl_type object_;

public:
    Wrapper() : object_(NULL) { }

    Wrapper(const cl_type &obj) : object_(obj) { }

    ~Wrapper()
    {
        if (object_ != NULL) { release(); }
    }

    Wrapper(const Wrapper& rhs)
    {
        object_ = rhs.object_;
        if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); }
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT
    {
        object_ = rhs.object_;
        rhs.object_ = NULL;
    }
#endif

    Wrapper& operator = (const Wrapper& rhs)
    {
        if (this != &rhs) {
            if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
            object_ = rhs.object_;
            if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); }
        }
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    Wrapper& operator = (Wrapper&& rhs)
    {
        if (this != &rhs) {
            if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
            object_ = rhs.object_;
            rhs.object_ = NULL;
        }
        return *this;
    }
#endif

    Wrapper& operator = (const cl_type &rhs)
    {
        if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
        object_ = rhs;
        return *this;
    }

    cl_type operator ()() const { return object_; }

    cl_type& operator ()() { return object_; }

protected:
    template<typename Func, typename U>
    friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type);

    cl_int retain() const
    { return ReferenceHandler<cl_type>::retain(object_); }

    cl_int release() const
    { return ReferenceHandler<cl_type>::release(object_); }
};
template <>
class Wrapper<cl_device_id>
{
public:
    typedef cl_device_id cl_type;

protected:
    cl_type object_;
    bool referenceCountable_;

    static bool isReferenceCountable(cl_device_id device)
    {
        bool retVal = false;
        if (device != NULL) {
            int version = getDevicePlatformVersion(device);
            // Device reference counting exists from OpenCL 1.2 onwards.
            if (version > ((1 << 16) + 1)) {
                retVal = true;
            }
        }
        return retVal;
    }

public:
    Wrapper() : object_(NULL), referenceCountable_(false) { }

    Wrapper(const cl_type &obj) : object_(obj), referenceCountable_(false)
    {
        referenceCountable_ = isReferenceCountable(obj);
    }

    ~Wrapper()
    {
        if (object_ != NULL) { release(); }
    }

    Wrapper(const Wrapper& rhs)
    {
        object_ = rhs.object_;
        referenceCountable_ = isReferenceCountable(object_);
        if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); }
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT
    {
        object_ = rhs.object_;
        referenceCountable_ = rhs.referenceCountable_;
        rhs.object_ = NULL;
        rhs.referenceCountable_ = false;
    }
#endif

    Wrapper& operator = (const Wrapper& rhs)
    {
        if (this != &rhs) {
            if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
            object_ = rhs.object_;
            referenceCountable_ = rhs.referenceCountable_;
            if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); }
        }
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    Wrapper& operator = (Wrapper&& rhs)
    {
        if (this != &rhs) {
            if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
            object_ = rhs.object_;
            referenceCountable_ = rhs.referenceCountable_;
            rhs.object_ = NULL;
            rhs.referenceCountable_ = false;
        }
        return *this;
    }
#endif

    Wrapper& operator = (const cl_type &rhs)
    {
        if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
        object_ = rhs;
        referenceCountable_ = isReferenceCountable(object_);
        return *this;
    }

    cl_type operator ()() const { return object_; }

    cl_type& operator ()() { return object_; }

protected:
    template<typename Func, typename U>
    friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type);

    template<typename Func, typename U>
    friend inline cl_int getInfoHelper(Func, cl_uint, VECTOR_CLASS<U>*, int, typename U::cl_type);

    cl_int retain() const
    {
        if (referenceCountable_) {
            return ReferenceHandler<cl_type>::retain(object_);
        }
        else {
            return CL_SUCCESS;
        }
    }

    cl_int release() const
    {
        if (referenceCountable_) {
            return ReferenceHandler<cl_type>::release(object_);
        }
        else {
            return CL_SUCCESS;
        }
    }
};

} // namespace detail
//! \endcond

/*! \struct ImageFormat
 *  \brief Adds constructors and member functions for cl_image_format.
 *
 *  \see cl_image_format
 */
struct ImageFormat : public cl_image_format
{
    //! \brief Default constructor - performs no initialization.
    ImageFormat(){}

    //! \brief Initializing constructor.
    ImageFormat(cl_channel_order order, cl_channel_type type)
    {
        image_channel_order = order;
        image_channel_data_type = type;
    }

    //! \brief Assignment operator.
    ImageFormat& operator = (const ImageFormat& rhs)
    {
        if (this != &rhs) {
            this->image_channel_data_type = rhs.image_channel_data_type;
            this->image_channel_order = rhs.image_channel_order;
        }
        return *this;
    }
};

/*! \brief Class interface for cl_device_id.
 *
 *  \note Copies of these objects are inexpensive, since they don't 'own'
 *        any underlying resources or data structures.
 *
 *  \see cl_device_id
 */
class Device : public detail::Wrapper<cl_device_id>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Device() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_device_id.
     *
     *  This simply copies the device ID value, which is an inexpensive operation.
     */
    __CL_EXPLICIT_CONSTRUCTORS Device(const cl_device_id &device) : detail::Wrapper<cl_type>(device) { }

    /*! \brief Returns the first device on the default context.
     *
     *  \see Context::getDefault()
     */
    static Device getDefault(cl_int * err = NULL);

    /*! \brief Assignment operator from cl_device_id.
     *
     *  This simply copies the device ID value, which is an inexpensive operation.
     */
    Device& operator = (const cl_device_id& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    Device(const Device& dev) : detail::Wrapper<cl_type>(dev) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    Device& operator = (const Device &dev)
    {
        detail::Wrapper<cl_type>::operator=(dev);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     *  Required for MSVC.
     */
    Device(Device&& dev) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(dev)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     *  Required for MSVC.
     */
    Device& operator = (Device &&dev)
    {
        detail::Wrapper<cl_type>::operator=(std::move(dev));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    //! \brief Wrapper for clGetDeviceInfo().
    template <typename T>
    cl_int getInfo(cl_device_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetDeviceInfo, object_, name, param),
            __GET_DEVICE_INFO_ERR);
    }

    //! \brief Wrapper for clGetDeviceInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_device_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_device_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    /**
     * CL 1.2 version
     */
#if defined(CL_VERSION_1_2)
    //! \brief Wrapper for clCreateSubDevices().
    cl_int createSubDevices(
        const cl_device_partition_property * properties,
        VECTOR_CLASS<Device>* devices)
    {
        cl_uint n = 0;
        cl_int err = clCreateSubDevices(object_, properties, 0, NULL, &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_SUB_DEVICES);
        }

        cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id));
        err = clCreateSubDevices(object_, properties, n, ids, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_SUB_DEVICES);
        }

        devices->assign(&ids[0], &ids[n]);
        return CL_SUCCESS;
    }
#endif // #if defined(CL_VERSION_1_2)

    /**
     * CL 1.1 version that uses device fission.
     */
#if defined(CL_VERSION_1_1)
#if defined(USE_CL_DEVICE_FISSION)
    cl_int createSubDevices(
        const cl_device_partition_property_ext * properties,
        VECTOR_CLASS<Device>* devices)
    {
        typedef CL_API_ENTRY cl_int (CL_API_CALL * PFN_clCreateSubDevicesEXT)(
            cl_device_id /*in_device*/,
            const cl_device_partition_property_ext * /* properties */,
            cl_uint /*num_entries*/,
            cl_device_id * /*out_devices*/,
            cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1;

        static PFN_clCreateSubDevicesEXT pfn_clCreateSubDevicesEXT = NULL;
        __INIT_CL_EXT_FCN_PTR(clCreateSubDevicesEXT);

        cl_uint n = 0;
        cl_int err = pfn_clCreateSubDevicesEXT(object_, properties, 0, NULL, &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_SUB_DEVICES);
        }

        cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id));
        err = pfn_clCreateSubDevicesEXT(object_, properties, n, ids, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_SUB_DEVICES);
        }

        devices->assign(&ids[0], &ids[n]);
        return CL_SUCCESS;
    }
#endif // #if defined(USE_CL_DEVICE_FISSION)
#endif // #if defined(CL_VERSION_1_1)
};

/*! \brief Class interface for cl_platform_id.
 *
 *  \note Copies of these objects are inexpensive, since they don't 'own'
 *        any underlying resources or data structures.
 *
 *  \see cl_platform_id
 */
class Platform : public detail::Wrapper<cl_platform_id>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Platform() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_platform_id.
     *
     *  This simply copies the platform ID value, which is an inexpensive operation.
     */
    __CL_EXPLICIT_CONSTRUCTORS Platform(const cl_platform_id &platform) : detail::Wrapper<cl_type>(platform) { }

    /*! \brief Assignment operator from cl_platform_id.
     *
     *  This simply copies the platform ID value, which is an inexpensive operation.
     */
    Platform& operator = (const cl_platform_id& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    //! \brief Wrapper for clGetPlatformInfo().
    cl_int getInfo(cl_platform_info name, STRING_CLASS* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetPlatformInfo, object_, name, param),
            __GET_PLATFORM_INFO_ERR);
    }

    //! \brief Wrapper for clGetPlatformInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_platform_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_platform_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    /*! \brief Gets a list of devices for this platform.
     *
     *  Wraps clGetDeviceIDs().
     */
    cl_int getDevices(
        cl_device_type type,
        VECTOR_CLASS<Device>* devices) const
    {
        cl_uint n = 0;
        if (devices == NULL) {
            return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR);
        }
        cl_int err = ::clGetDeviceIDs(object_, type, 0, NULL, &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_DEVICE_IDS_ERR);
        }

        cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id));
        err = ::clGetDeviceIDs(object_, type, n, ids, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_DEVICE_IDS_ERR);
        }

        devices->assign(&ids[0], &ids[n]);
        return CL_SUCCESS;
    }

#if defined(USE_DX_INTEROP)
    /*! \brief Gets the list of available D3D10 devices.
     *
     *  \param d3d_device_source.
     *
     *  \param d3d_object.
     *
     *  \param d3d_device_set.
     *
     *  \param devices returns a vector of OpenCL D3D10 devices found. The cl::Device
     *  values returned in devices can be used to identify a specific OpenCL
     *  device. If \a devices argument is NULL, this argument is ignored.
     *
     *  \return One of the following values:
     *    - CL_SUCCESS if the function is executed successfully.
     *
     *  The application can query specific capabilities of the OpenCL device(s)
     *  returned by cl::getDevices. This can be used by the application to
     *  determine which device(s) to use.
     *
     *  \note In the case that exceptions are enabled and a return value
     *  other than CL_SUCCESS is generated, then a cl::Error exception is
     *  generated.
     */
    cl_int getDevices(
        cl_d3d10_device_source_khr d3d_device_source,
        void *                     d3d_object,
        cl_d3d10_device_set_khr    d3d_device_set,
        VECTOR_CLASS<Device>* devices) const
    {
        typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clGetDeviceIDsFromD3D10KHR)(
            cl_platform_id platform,
            cl_d3d10_device_source_khr d3d_device_source,
            void * d3d_object,
            cl_d3d10_device_set_khr d3d_device_set,
            cl_uint num_entries,
            cl_device_id * devices,
            cl_uint* num_devices);

        if (devices == NULL) {
            return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR);
        }

        static PFN_clGetDeviceIDsFromD3D10KHR pfn_clGetDeviceIDsFromD3D10KHR = NULL;
        __INIT_CL_EXT_FCN_PTR_PLATFORM(object_, clGetDeviceIDsFromD3D10KHR);

        cl_uint n = 0;
        cl_int err = pfn_clGetDeviceIDsFromD3D10KHR(
            object_,
            d3d_device_source,
            d3d_object,
            d3d_device_set,
            0,
            NULL,
            &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_DEVICE_IDS_ERR);
        }

        cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id));
        err = pfn_clGetDeviceIDsFromD3D10KHR(
            object_,
            d3d_device_source,
            d3d_object,
            d3d_device_set,
            n,
            ids,
            NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_DEVICE_IDS_ERR);
        }

        devices->assign(&ids[0], &ids[n]);
        return CL_SUCCESS;
    }
#endif

    /*! \brief Gets a list of available platforms.
     *
     *  Wraps clGetPlatformIDs().
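     *
     *  A minimal usage sketch (illustrative only; error handling shortened):
     *  \code
     *  VECTOR_CLASS<cl::Platform> platforms;
     *  if (cl::Platform::get(&platforms) == CL_SUCCESS) {
     *      for (::size_t i = 0; i < platforms.size(); ++i) {
     *          STRING_CLASS name = platforms[i].getInfo<CL_PLATFORM_NAME>();
     *      }
     *  }
     *  \endcode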
     */
    static cl_int get(
        VECTOR_CLASS<Platform>* platforms)
    {
        cl_uint n = 0;

        if (platforms == NULL) {
            return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_PLATFORM_IDS_ERR);
        }

        cl_int err = ::clGetPlatformIDs(0, NULL, &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
        }

        cl_platform_id* ids = (cl_platform_id*) alloca(
            n * sizeof(cl_platform_id));
        err = ::clGetPlatformIDs(n, ids, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
        }

        platforms->assign(&ids[0], &ids[n]);
        return CL_SUCCESS;
    }

    /*! \brief Gets the first available platform.
     *
     *  Wraps clGetPlatformIDs(), returning the first result.
     */
    static cl_int get(
        Platform * platform)
    {
        cl_uint n = 0;

        if (platform == NULL) {
            return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_PLATFORM_IDS_ERR);
        }

        cl_int err = ::clGetPlatformIDs(0, NULL, &n);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
        }

        cl_platform_id* ids = (cl_platform_id*) alloca(
            n * sizeof(cl_platform_id));
        err = ::clGetPlatformIDs(n, ids, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
        }

        *platform = ids[0];
        return CL_SUCCESS;
    }

    /*! \brief Gets the first available platform, returning it by value.
     *
     *  Wraps clGetPlatformIDs(), returning the first result.
     */
    static Platform get(
        cl_int * errResult = NULL)
    {
        Platform platform;
        cl_uint n = 0;
        cl_int err = ::clGetPlatformIDs(0, NULL, &n);
        if (err != CL_SUCCESS) {
            detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
            if (errResult != NULL) {
                *errResult = err;
            }
            return Platform();
        }

        cl_platform_id* ids = (cl_platform_id*) alloca(
            n * sizeof(cl_platform_id));
        err = ::clGetPlatformIDs(n, ids, NULL);
        if (err != CL_SUCCESS) {
            detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
            if (errResult != NULL) {
                *errResult = err;
            }
            return Platform();
        }

        return Platform(ids[0]);
    }

    static Platform getDefault(
        cl_int *errResult = NULL)
    {
        return get(errResult);
    }

#if defined(CL_VERSION_1_2)
    //! \brief Wrapper for clUnloadPlatformCompiler().
    cl_int unloadCompiler()
    {
        return ::clUnloadPlatformCompiler(object_);
    }
#endif // #if defined(CL_VERSION_1_2)
}; // class Platform

/**
 * Deprecated APIs for 1.2
 */
#if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2))
/**
 * Unload the OpenCL compiler.
 * \note Deprecated for OpenCL 1.2. Use Platform::unloadCompiler instead.
 */
inline CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int
UnloadCompiler() CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
inline cl_int
UnloadCompiler()
{
    return ::clUnloadCompiler();
}
#endif // #if defined(CL_VERSION_1_1)

/*! \brief Class interface for cl_context.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_context as the original. For details, see
 *        clRetainContext() and clReleaseContext().
 *
 *  \see cl_context
 */
class Context : public detail::Wrapper<cl_context>
{
private:
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
    static std::atomic<int> default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    static volatile int default_initialized_;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    static Context default_;
    static volatile cl_int default_error_;

public:
    /*! \brief Constructs a context including a list of specified devices.
     *
     *  Wraps clCreateContext().
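     *
     *  A minimal usage sketch (illustrative only; assumes a previously
     *  obtained cl::Platform named platform):
     *  \code
     *  VECTOR_CLASS<cl::Device> devices;
     *  platform.getDevices(CL_DEVICE_TYPE_GPU, &devices);
     *  cl_int err;
     *  cl::Context context(devices, NULL, NULL, NULL, &err);
     *  \endcode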
     */
    Context(
        const VECTOR_CLASS<Device>& devices,
        cl_context_properties* properties = NULL,
        void (CL_CALLBACK * notifyFptr)(
            const char *,
            const void *,
            ::size_t,
            void *) = NULL,
        void* data = NULL,
        cl_int* err = NULL)
    {
        cl_int error;

        ::size_t numDevices = devices.size();
        cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id));
        for (::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex) {
            deviceIDs[deviceIndex] = (devices[deviceIndex])();
        }

        object_ = ::clCreateContext(
            properties, (cl_uint) numDevices,
            deviceIDs,
            notifyFptr, data, &error);

        detail::errHandler(error, __CREATE_CONTEXT_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    Context(
        const Device& device,
        cl_context_properties* properties = NULL,
        void (CL_CALLBACK * notifyFptr)(
            const char *,
            const void *,
            ::size_t,
            void *) = NULL,
        void* data = NULL,
        cl_int* err = NULL)
    {
        cl_int error;

        cl_device_id deviceID = device();

        object_ = ::clCreateContext(
            properties, 1,
            &deviceID,
            notifyFptr, data, &error);

        detail::errHandler(error, __CREATE_CONTEXT_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    /*! \brief Constructs a context including all or a subset of devices of a specified type.
     *
     *  Wraps clCreateContextFromType().
     */
    Context(
        cl_device_type type,
        cl_context_properties* properties = NULL,
        void (CL_CALLBACK * notifyFptr)(
            const char *,
            const void *,
            ::size_t,
            void *) = NULL,
        void* data = NULL,
        cl_int* err = NULL)
    {
        cl_int error;

#if !defined(__APPLE__) && !defined(__MACOS)
        cl_context_properties prop[4] = {CL_CONTEXT_PLATFORM, 0, 0, 0 };

        if (properties == NULL) {
            // Get a valid platform ID as we cannot send in a blank one
            VECTOR_CLASS<Platform> platforms;
            error = Platform::get(&platforms);
            if (error != CL_SUCCESS) {
                detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR);
                if (err != NULL) {
                    *err = error;
                }
                return;
            }

            // Check the platforms we found for a device of our specified type
            cl_context_properties platform_id = 0;
            for (unsigned int i = 0; i < platforms.size(); i++) {
                VECTOR_CLASS<Device> devices;

#if defined(__CL_ENABLE_EXCEPTIONS)
                try {
#endif
                    error = platforms[i].getDevices(type, &devices);
#if defined(__CL_ENABLE_EXCEPTIONS)
                } catch (Error) {}
                // Catch if exceptions are enabled as we don't want to exit if first platform has no devices of type
                // We do error checking next anyway, and can throw there if needed
#endif

                // Only squash CL_SUCCESS and CL_DEVICE_NOT_FOUND
                if (error != CL_SUCCESS && error != CL_DEVICE_NOT_FOUND) {
                    detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR);
                    if (err != NULL) {
                        *err = error;
                    }
                }

                if (devices.size() > 0) {
                    platform_id = (cl_context_properties)platforms[i]();
                    break;
                }
            }

            if (platform_id == 0) {
                detail::errHandler(CL_DEVICE_NOT_FOUND, __CREATE_CONTEXT_FROM_TYPE_ERR);
                if (err != NULL) {
                    *err = CL_DEVICE_NOT_FOUND;
                }
                return;
            }

            prop[1] = platform_id;
            properties = &prop[0];
        }
#endif

        object_ = ::clCreateContextFromType(
            properties, type, notifyFptr, data, &error);

        detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    Context(const Context& ctx) : detail::Wrapper<cl_type>(ctx) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    Context& operator = (const Context &ctx)
    {
        detail::Wrapper<cl_type>::operator=(ctx);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     *  Required for MSVC.
     */
    Context(Context&& ctx) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(ctx)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     *  Required for MSVC.
     */
    Context& operator = (Context &&ctx)
    {
        detail::Wrapper<cl_type>::operator=(std::move(ctx));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    /*! \brief Returns a singleton context including all devices of CL_DEVICE_TYPE_DEFAULT.
     *
     *  \note All calls to this function return the same cl_context as the first.
     */
    static Context getDefault(cl_int * err = NULL)
    {
        int state = detail::compare_exchange(
            &default_initialized_,
            __DEFAULT_BEING_INITIALIZED, __DEFAULT_NOT_INITIALIZED);

        if (state & __DEFAULT_INITIALIZED) {
            if (err != NULL) {
                *err = default_error_;
            }
            return default_;
        }

        if (state & __DEFAULT_BEING_INITIALIZED) {
            // Assume writes will propagate eventually...
            while (default_initialized_ != __DEFAULT_INITIALIZED) {
                detail::fence();
            }

            if (err != NULL) {
                *err = default_error_;
            }
            return default_;
        }

        cl_int error;
        default_ = Context(
            CL_DEVICE_TYPE_DEFAULT,
            NULL,
            NULL,
            NULL,
            &error);

        detail::fence();

        default_error_ = error;
        // Assume writes will propagate eventually...
        default_initialized_ = __DEFAULT_INITIALIZED;

        detail::fence();

        if (err != NULL) {
            *err = default_error_;
        }
        return default_;
    }

    //! \brief Default constructor - initializes to NULL.
    Context() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_context - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the cl_context
     *  into the new Context object.
     */
    __CL_EXPLICIT_CONSTRUCTORS Context(const cl_context& context) : detail::Wrapper<cl_type>(context) { }

    /*! \brief Assignment operator from cl_context - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseContext() on the value previously held by this instance.
     */
    Context& operator = (const cl_context& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    //! \brief Wrapper for clGetContextInfo().
    template <typename T>
    cl_int getInfo(cl_context_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetContextInfo, object_, name, param),
            __GET_CONTEXT_INFO_ERR);
    }

    //! \brief Wrapper for clGetContextInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_context_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_context_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    /*! \brief Gets a list of supported image formats.
     *
     *  Wraps clGetSupportedImageFormats().
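     *
     *  A minimal usage sketch (illustrative only; assumes a valid context):
     *  \code
     *  VECTOR_CLASS<cl::ImageFormat> formats;
     *  cl_int err = context.getSupportedImageFormats(
     *      CL_MEM_READ_ONLY, CL_MEM_OBJECT_IMAGE2D, &formats);
     *  \endcode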
     */
    cl_int getSupportedImageFormats(
        cl_mem_flags flags,
        cl_mem_object_type type,
        VECTOR_CLASS<ImageFormat>* formats) const
    {
        cl_uint numEntries;

        if (!formats) {
            return CL_SUCCESS;
        }

        cl_int err = ::clGetSupportedImageFormats(
            object_,
            flags,
            type,
            0,
            NULL,
            &numEntries);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR);
        }

        if (numEntries > 0) {
            ImageFormat* value = (ImageFormat*) alloca(numEntries * sizeof(ImageFormat));
            err = ::clGetSupportedImageFormats(
                object_,
                flags,
                type,
                numEntries,
                (cl_image_format*)value,
                NULL);
            if (err != CL_SUCCESS) {
                return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR);
            }

            formats->assign(&value[0], &value[numEntries]);
        }
        else {
            formats->clear();
        }
        return CL_SUCCESS;
    }
};

inline Device Device::getDefault(cl_int * err)
{
    cl_int error;
    Device device;

    Context context = Context::getDefault(&error);
    detail::errHandler(error, __CREATE_CONTEXT_ERR);

    if (error != CL_SUCCESS) {
        if (err != NULL) {
            *err = error;
        }
    }
    else {
        device = context.getInfo<CL_CONTEXT_DEVICES>()[0];
        if (err != NULL) {
            *err = CL_SUCCESS;
        }
    }

    return device;
}

#ifdef _WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) std::atomic<int> Context::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) volatile int Context::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) Context Context::default_;
__declspec(selectany) volatile cl_int Context::default_error_ = CL_SUCCESS;
#else // !_WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) std::atomic<int> Context::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) volatile int Context::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) Context Context::default_;
__attribute__((weak)) volatile cl_int Context::default_error_ = CL_SUCCESS;
#endif // !_WIN32

/*! \brief Class interface for cl_event.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_event as the original. For details, see
 *        clRetainEvent() and clReleaseEvent().
 *
 *  \see cl_event
 */
class Event : public detail::Wrapper<cl_event>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Event() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_event - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the cl_event
     *  into the new Event object.
     */
    __CL_EXPLICIT_CONSTRUCTORS Event(const cl_event& event) : detail::Wrapper<cl_type>(event) { }

    /*! \brief Assignment operator from cl_event - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseEvent() on the value previously held by this instance.
     */
    Event& operator = (const cl_event& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    //! \brief Wrapper for clGetEventInfo().
    template <typename T>
    cl_int getInfo(cl_event_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetEventInfo, object_, name, param),
            __GET_EVENT_INFO_ERR);
    }

    //! \brief Wrapper for clGetEventInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_event_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_event_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    //! \brief Wrapper for clGetEventProfilingInfo().
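    //!
    //! A typical timing sketch (illustrative only; the queue must have been
    //! created with CL_QUEUE_PROFILING_ENABLE and the event must be complete):
    //! \code
    //! cl_ulong start = event.getProfilingInfo<CL_PROFILING_COMMAND_START>();
    //! cl_ulong end   = event.getProfilingInfo<CL_PROFILING_COMMAND_END>();
    //! cl_ulong elapsedNs = end - start; // both values are in nanoseconds
    //! \endcode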
    template <typename T>
    cl_int getProfilingInfo(cl_profiling_info name, T* param) const
    {
        return detail::errHandler(detail::getInfo(
            &::clGetEventProfilingInfo, object_, name, param),
            __GET_EVENT_PROFILE_INFO_ERR);
    }

    //! \brief Wrapper for clGetEventProfilingInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_profiling_info, name>::param_type
    getProfilingInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_profiling_info, name>::param_type param;
        cl_int result = getProfilingInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    /*! \brief Blocks the calling thread until this event completes.
     *
     *  Wraps clWaitForEvents().
     */
    cl_int wait() const
    {
        return detail::errHandler(
            ::clWaitForEvents(1, &object_),
            __WAIT_FOR_EVENTS_ERR);
    }

#if defined(CL_VERSION_1_1)
    /*! \brief Registers a user callback function for a specific command execution status.
     *
     *  Wraps clSetEventCallback().
     */
    cl_int setCallback(
        cl_int type,
        void (CL_CALLBACK * pfn_notify)(cl_event, cl_int, void *),
        void * user_data = NULL)
    {
        return detail::errHandler(
            ::clSetEventCallback(
                object_,
                type,
                pfn_notify,
                user_data),
            __SET_EVENT_CALLBACK_ERR);
    }
#endif

    /*! \brief Blocks the calling thread until every event specified is complete.
     *
     *  Wraps clWaitForEvents().
     */
    static cl_int
    waitForEvents(const VECTOR_CLASS<Event>& events)
    {
        return detail::errHandler(
            ::clWaitForEvents(
                (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL),
            __WAIT_FOR_EVENTS_ERR);
    }
};

#if defined(CL_VERSION_1_1)
/*! \brief Class interface for user events (a subset of cl_event's).
 *
 *  See Event for details about copy semantics, etc.
 */
class UserEvent : public Event
{
public:
    /*! \brief Constructs a user event on a given context.
     *
     *  Wraps clCreateUserEvent().
     */
    UserEvent(
        const Context& context,
        cl_int * err = NULL)
    {
        cl_int error;
        object_ = ::clCreateUserEvent(
            context(),
            &error);

        detail::errHandler(error, __CREATE_USER_EVENT_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    UserEvent() : Event() { }

    /*! \brief Sets the execution status of a user event object.
     *
     *  Wraps clSetUserEventStatus().
     */
    cl_int setStatus(cl_int status)
    {
        return detail::errHandler(
            ::clSetUserEventStatus(object_, status),
            __SET_USER_EVENT_STATUS_ERR);
    }
};
#endif

/*! \brief Blocks the calling thread until every event specified is complete.
 *
 *  Wraps clWaitForEvents().
 */
inline static cl_int
WaitForEvents(const VECTOR_CLASS<Event>& events)
{
    return detail::errHandler(
        ::clWaitForEvents(
            (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL),
        __WAIT_FOR_EVENTS_ERR);
}

/*! \brief Class interface for cl_mem.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_mem as the original. For details, see
 *        clRetainMemObject() and clReleaseMemObject().
 *
 *  \see cl_mem
 */
class Memory : public detail::Wrapper<cl_mem>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Memory() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the cl_mem
     *  into the new Memory object.
     */
    __CL_EXPLICIT_CONSTRUCTORS Memory(const cl_mem& memory) : detail::Wrapper<cl_type>(memory) { }

    /*! \brief Assignment operator from cl_mem - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseMemObject() on the value previously held by this instance.
     */
    Memory& operator = (const cl_mem& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    Memory(const Memory& mem) : detail::Wrapper<cl_type>(mem) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    Memory& operator = (const Memory &mem)
    {
        detail::Wrapper<cl_type>::operator=(mem);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     *  Required for MSVC.
     */
    Memory(Memory&& mem) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(mem)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     *  Required for MSVC.
     */
    Memory& operator = (Memory &&mem)
    {
        detail::Wrapper<cl_type>::operator=(std::move(mem));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    //! \brief Wrapper for clGetMemObjectInfo().
    template <typename T>
    cl_int getInfo(cl_mem_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetMemObjectInfo, object_, name, param),
            __GET_MEM_OBJECT_INFO_ERR);
    }

    //! \brief Wrapper for clGetMemObjectInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_mem_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_mem_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

#if defined(CL_VERSION_1_1)
    /*! \brief Registers a callback function to be called when the memory object
     *         is no longer needed.
     *
     *  Wraps clSetMemObjectDestructorCallback().
     *
     *  Repeated calls to this function, for a given cl_mem value, will append
     *  to the list of functions called (in reverse order) when memory object's
     *  resources are freed and the memory object is deleted.
     *
     *  \note
     *  The registered callbacks are associated with the underlying cl_mem
     *  value - not the Memory class instance.
     */
    cl_int setDestructorCallback(
        void (CL_CALLBACK * pfn_notify)(cl_mem, void *),
        void * user_data = NULL)
    {
        return detail::errHandler(
            ::clSetMemObjectDestructorCallback(
                object_,
                pfn_notify,
                user_data),
            __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR);
    }
#endif
};

// Pre-declare copy functions
class Buffer;
template< typename IteratorType >
cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer );
template< typename IteratorType >
cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator );
template< typename IteratorType >
cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer );
template< typename IteratorType >
cl_int copy( const CommandQueue &queue, const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator );

/*! \brief Class interface for Buffer Memory Objects.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Buffer : public Memory
{
public:
    /*! \brief Constructs a Buffer in a specified context.
     *
     *  Wraps clCreateBuffer().
     *
     *  \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was
     *  specified. Note alignment & exclusivity requirements.
     */
    Buffer(
        const Context& context,
        cl_mem_flags flags,
        ::size_t size,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error);

        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    /*! \brief Constructs a Buffer in the default context.
     *
     *  Wraps clCreateBuffer().
     *
     *  \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was
     *  specified.
     *  Note alignment & exclusivity requirements.
     *
     *  \see Context::getDefault()
     */
    Buffer(
        cl_mem_flags flags,
        ::size_t size,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;

        Context context = Context::getDefault(err);

        object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error);

        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    /*!
     * \brief Construct a Buffer from a host container via iterators.
     * IteratorType must be random access.
     * If useHostPtr is specified iterators must represent contiguous data.
     */
    template< typename IteratorType >
    Buffer(
        IteratorType startIterator,
        IteratorType endIterator,
        bool readOnly,
        bool useHostPtr = false,
        cl_int* err = NULL)
    {
        typedef typename std::iterator_traits<IteratorType>::value_type DataType;
        cl_int error;

        cl_mem_flags flags = 0;
        if (readOnly) {
            flags |= CL_MEM_READ_ONLY;
        }
        else {
            flags |= CL_MEM_READ_WRITE;
        }
        if (useHostPtr) {
            flags |= CL_MEM_USE_HOST_PTR;
        }

        ::size_t size = sizeof(DataType)*(endIterator - startIterator);

        Context context = Context::getDefault(err);

        if (useHostPtr) {
            object_ = ::clCreateBuffer(context(), flags, size, static_cast<DataType*>(&*startIterator), &error);
        }
        else {
            object_ = ::clCreateBuffer(context(), flags, size, 0, &error);
        }

        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }

        if (!useHostPtr) {
            error = cl::copy(startIterator, endIterator, *this);
            detail::errHandler(error, __CREATE_BUFFER_ERR);
            if (err != NULL) {
                *err = error;
            }
        }
    }

    /*!
     * \brief Construct a Buffer from a host container via iterators using a specified context.
     * IteratorType must be random access.
     * If useHostPtr is specified iterators must represent contiguous data.
     */
    template< typename IteratorType >
    Buffer(const Context &context, IteratorType startIterator, IteratorType endIterator,
           bool readOnly, bool useHostPtr = false, cl_int* err = NULL);

    /*!
     * \brief Construct a Buffer from a host container via iterators using a specified queue.
     * If useHostPtr is specified iterators must represent contiguous data.
     */
    template< typename IteratorType >
    Buffer(const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator,
           bool readOnly, bool useHostPtr = false, cl_int* err = NULL);

    //! \brief Default constructor - initializes to NULL.
    Buffer() : Memory() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     *  See Memory for further details.
     */
    __CL_EXPLICIT_CONSTRUCTORS Buffer(const cl_mem& buffer) : Memory(buffer) { }

    /*! \brief Assignment from cl_mem - performs shallow copy.
     *
     *  See Memory for further details.
     */
    Buffer& operator = (const cl_mem& rhs)
    {
        Memory::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    Buffer(const Buffer& buf) : Memory(buf) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    Buffer& operator = (const Buffer &buf)
    {
        Memory::operator=(buf);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     *  Required for MSVC.
     */
    Buffer(Buffer&& buf) CL_HPP_NOEXCEPT : Memory(std::move(buf)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     *  Required for MSVC.
     */
    Buffer& operator = (Buffer &&buf)
    {
        Memory::operator=(std::move(buf));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

#if defined(CL_VERSION_1_1)
    /*! \brief Creates a new buffer object from this.
     *
     *  Wraps clCreateSubBuffer().
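     *
     *  A minimal usage sketch (illustrative only; buffer is an assumed existing
     *  cl::Buffer, and origin must satisfy the device's
     *  CL_DEVICE_MEM_BASE_ADDR_ALIGN requirement):
     *  \code
     *  cl_buffer_region region;
     *  region.origin = 0;    // byte offset into the parent buffer
     *  region.size   = 256;  // size of the sub-buffer in bytes
     *  cl_int err;
     *  cl::Buffer sub = buffer.createSubBuffer(
     *      CL_MEM_READ_WRITE, CL_BUFFER_CREATE_TYPE_REGION, &region, &err);
     *  \endcode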
*/ Buffer createSubBuffer( cl_mem_flags flags, cl_buffer_create_type buffer_create_type, const void * buffer_create_info, cl_int * err = NULL) { Buffer result; cl_int error; result.object_ = ::clCreateSubBuffer( object_, flags, buffer_create_type, buffer_create_info, &error); detail::errHandler(error, __CREATE_SUBBUFFER_ERR); if (err != NULL) { *err = error; } return result; } #endif }; #if defined (USE_DX_INTEROP) /*! \brief Class interface for creating OpenCL buffers from ID3D10Buffer's. * * This is provided to facilitate interoperability with Direct3D. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferD3D10 : public Buffer { public: typedef CL_API_ENTRY cl_mem (CL_API_CALL *PFN_clCreateFromD3D10BufferKHR)( cl_context context, cl_mem_flags flags, ID3D10Buffer* buffer, cl_int* errcode_ret); /*! \brief Constructs a BufferD3D10, in a specified context, from a * given ID3D10Buffer. * * Wraps clCreateFromD3D10BufferKHR(). */ BufferD3D10( const Context& context, cl_mem_flags flags, ID3D10Buffer* bufobj, cl_int * err = NULL) { static PFN_clCreateFromD3D10BufferKHR pfn_clCreateFromD3D10BufferKHR = NULL; #if defined(CL_VERSION_1_2) vector props = context.getInfo(); cl_platform platform = -1; for( int i = 0; i < props.size(); ++i ) { if( props[i] == CL_CONTEXT_PLATFORM ) { platform = props[i+1]; } } __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clCreateFromD3D10BufferKHR); #endif #if defined(CL_VERSION_1_1) __INIT_CL_EXT_FCN_PTR(clCreateFromD3D10BufferKHR); #endif cl_int error; object_ = pfn_clCreateFromD3D10BufferKHR( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferD3D10() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS BufferD3D10(const cl_mem& buffer) : Buffer(buffer) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferD3D10& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferD3D10(const BufferD3D10& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferD3D10& operator = (const BufferD3D10 &buf) { Buffer::operator=(buf); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferD3D10(BufferD3D10&& buf) CL_HPP_NOEXCEPT : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferD3D10& operator = (BufferD3D10 &&buf) { Buffer::operator=(std::move(buf)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif /*! \brief Class interface for GL Buffer Memory Objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferGL : public Buffer { public: /*! \brief Constructs a BufferGL in a specified context, from a given * GL buffer. * * Wraps clCreateFromGLBuffer(). 
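 *
 *  Example (editor's illustrative sketch, not part of the original header;
 *  assumes a GL buffer object name "vbo" and a Context created with
 *  CL_GL_CONTEXT_KHR sharing properties):
 *  \code
 *  cl_int err;
 *  cl::BufferGL clVbo(context, CL_MEM_WRITE_ONLY, vbo, &err);
 *  // The GL object must be acquired (clEnqueueAcquireGLObjects) before
 *  // OpenCL work is enqueued on it, and released afterwards.
 *  \endcode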
*/ BufferGL( const Context& context, cl_mem_flags flags, cl_GLuint bufobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLBuffer( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferGL() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS BufferGL(const cl_mem& buffer) : Buffer(buffer) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferGL& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL(const BufferGL& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (const BufferGL &buf) { Buffer::operator=(buf); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferGL(BufferGL&& buf) CL_HPP_NOEXCEPT : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (BufferGL &&buf) { Buffer::operator=(std::move(buf)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) //! \brief Wrapper for clGetGLObjectInfo(). cl_int getObjectInfo( cl_gl_object_type *type, cl_GLuint * gl_object_name) { return detail::errHandler( ::clGetGLObjectInfo(object_,type,gl_object_name), __GET_GL_OBJECT_INFO_ERR); } }; /*! \brief C++ base class for Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image : public Memory { protected: //! \brief Default constructor - initializes to NULL. Image() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image(const cl_mem& image) : Memory(image) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image(const Image& img) : Memory(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image& operator = (const Image &img) { Memory::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image(Image&& img) CL_HPP_NOEXCEPT : Memory(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image& operator = (Image &&img) { Memory::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) public: //! \brief Wrapper for clGetImageInfo(). template cl_int getImageInfo(cl_image_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetImageInfo, object_, name, param), __GET_IMAGE_INFO_ERR); } //! \brief Wrapper for clGetImageInfo() that returns by value. 
template typename detail::param_traits::param_type getImageInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_image_info, name>::param_type param; cl_int result = getImageInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } }; #if defined(CL_VERSION_1_2) /*! \brief Class interface for 1D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image1D : public Image { public: /*! \brief Constructs a 1D Image in a specified context. * * Wraps clCreateImage(). */ Image1D( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t width, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D, width, 0, 0, 0, 0, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image1D() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image1D(const cl_mem& image1D) : Image(image1D) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image1D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1D(const Image1D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1D& operator = (const Image1D &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1D(Image1D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1D& operator = (Image1D &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; /*! \class Image1DBuffer * \brief Image interface for 1D buffer images. */ class Image1DBuffer : public Image { public: Image1DBuffer( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t width, const Buffer &buffer, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_BUFFER, width, 0, 0, 0, 0, 0, 0, 0, buffer() }; object_ = ::clCreateImage( context(), flags, &format, &desc, NULL, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DBuffer() { } __CL_EXPLICIT_CONSTRUCTORS Image1DBuffer(const cl_mem& image1D) : Image(image1D) { } Image1DBuffer& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer(const Image1DBuffer& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer& operator = (const Image1DBuffer &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1DBuffer(Image1DBuffer&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
*/ Image1DBuffer& operator = (Image1DBuffer &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; /*! \class Image1DArray * \brief Image interface for arrays of 1D images. */ class Image1DArray : public Image { public: Image1DArray( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t arraySize, ::size_t width, ::size_t rowPitch, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_ARRAY, width, 0, 0, // height, depth (unused) arraySize, rowPitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DArray() { } __CL_EXPLICIT_CONSTRUCTORS Image1DArray(const cl_mem& imageArray) : Image(imageArray) { } Image1DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray(const Image1DArray& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (const Image1DArray &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1DArray(Image1DArray&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (Image1DArray &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif // #if defined(CL_VERSION_1_2) /*! \brief Class interface for 2D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image2D : public Image { public: /*! \brief Constructs a 1D Image in a specified context. * * Wraps clCreateImage(). */ Image2D( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t width, ::size_t height, ::size_t row_pitch = 0, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; bool useCreateImage; #if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) // Run-time decision based on the actual platform { cl_uint version = detail::getContextPlatformVersion(context()); useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above } #elif defined(CL_VERSION_1_2) useCreateImage = true; #else useCreateImage = false; #endif #if defined(CL_VERSION_1_2) if (useCreateImage) { cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D, width, height, 0, 0, // depth, array size (unused) row_pitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } #endif // #if defined(CL_VERSION_1_2) #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) if (!useCreateImage) { object_ = ::clCreateImage2D( context(), flags,&format, width, height, row_pitch, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE2D_ERR); if (err != NULL) { *err = error; } } #endif // #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) } //! \brief Default constructor - initializes to NULL. Image2D() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. 
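 *
 *  Example (editor's illustrative sketch, not part of the original header;
 *  "rawImage" is assumed to be a cl_mem image whose reference the caller
 *  owns and wishes to hand over):
 *  \code
 *  cl::Image2D img(rawImage);  // img now owns the reference; no clRetainMemObject()
 *  \endcode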
*/ __CL_EXPLICIT_CONSTRUCTORS Image2D(const cl_mem& image2D) : Image(image2D) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image2D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2D(const Image2D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2D& operator = (const Image2D &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2D(Image2D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2D& operator = (Image2D &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #if !defined(CL_VERSION_1_2) /*! \brief Class interface for GL 2D Image Memory objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory * \note Deprecated for OpenCL 1.2. Please use ImageGL instead. */ class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED Image2DGL CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED : public Image2D { public: /*! \brief Constructs an Image2DGL in a specified context, from a given * GL Texture. * * Wraps clCreateFromGLTexture2D(). */ Image2DGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture2D( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_2D_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image2DGL() : Image2D() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image2DGL(const cl_mem& image) : Image2D(image) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image2DGL& operator = (const cl_mem& rhs) { Image2D::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2DGL(const Image2DGL& img) : Image2D(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2DGL& operator = (const Image2DGL &img) { Image2D::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2DGL(Image2DGL&& img) CL_HPP_NOEXCEPT : Image2D(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2DGL& operator = (Image2DGL &&img) { Image2D::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif // #if !defined(CL_VERSION_1_2) #if defined(CL_VERSION_1_2) /*! \class Image2DArray * \brief Image interface for arrays of 2D images. 
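 *
 *  Example (editor's illustrative sketch, not part of the original header;
 *  "context" is assumed to exist):
 *  \code
 *  cl::ImageFormat fmt(CL_RGBA, CL_FLOAT);
 *  cl_int err;
 *  cl::Image2DArray slices(context, CL_MEM_READ_ONLY, fmt,
 *                          16,        // array size (number of 2D slices)
 *                          640, 480,  // width, height of each slice
 *                          0, 0,      // let the runtime pick the pitches
 *                          NULL, &err);
 *  \endcode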
*/ class Image2DArray : public Image { public: Image2DArray( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t arraySize, ::size_t width, ::size_t height, ::size_t rowPitch, ::size_t slicePitch, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D_ARRAY, width, height, 0, // depth (unused) arraySize, rowPitch, slicePitch, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image2DArray() { } __CL_EXPLICIT_CONSTRUCTORS Image2DArray(const cl_mem& imageArray) : Image(imageArray) { } Image2DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2DArray(const Image2DArray& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2DArray& operator = (const Image2DArray &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2DArray(Image2DArray&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2DArray& operator = (Image2DArray &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif // #if defined(CL_VERSION_1_2) /*! \brief Class interface for 3D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image3D : public Image { public: /*! \brief Constructs a 3D Image in a specified context. * * Wraps clCreateImage(). */ Image3D( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t width, ::size_t height, ::size_t depth, ::size_t row_pitch = 0, ::size_t slice_pitch = 0, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; bool useCreateImage; #if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) // Run-time decision based on the actual platform { cl_uint version = detail::getContextPlatformVersion(context()); useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above } #elif defined(CL_VERSION_1_2) useCreateImage = true; #else useCreateImage = false; #endif #if defined(CL_VERSION_1_2) if (useCreateImage) { cl_image_desc desc = { CL_MEM_OBJECT_IMAGE3D, width, height, depth, 0, // array size (unused) row_pitch, slice_pitch, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } #endif // #if defined(CL_VERSION_1_2) #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) if (!useCreateImage) { object_ = ::clCreateImage3D( context(), flags, &format, width, height, depth, row_pitch, slice_pitch, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE3D_ERR); if (err != NULL) { *err = error; } } #endif // #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) } //! \brief Default constructor - initializes to NULL. Image3D() : Image() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image3D(const cl_mem& image3D) : Image(image3D) { } /*! \brief Assignment from cl_mem - performs shallow copy. 
* * See Memory for further details. */ Image3D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image3D(const Image3D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image3D& operator = (const Image3D &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image3D(Image3D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image3D& operator = (Image3D &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #if !defined(CL_VERSION_1_2) /*! \brief Class interface for GL 3D Image Memory objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image3DGL : public Image3D { public: /*! \brief Constructs an Image3DGL in a specified context, from a given * GL Texture. * * Wraps clCreateFromGLTexture3D(). */ Image3DGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture3D( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_3D_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image3DGL() : Image3D() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image3DGL(const cl_mem& image) : Image3D(image) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image3DGL& operator = (const cl_mem& rhs) { Image3D::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image3DGL(const Image3DGL& img) : Image3D(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image3DGL& operator = (const Image3DGL &img) { Image3D::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image3DGL(Image3DGL&& img) CL_HPP_NOEXCEPT : Image3D(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image3DGL& operator = (Image3DGL &&img) { Image3D::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif // #if !defined(CL_VERSION_1_2) #if defined(CL_VERSION_1_2) /*! \class ImageGL * \brief general image interface for GL interop. * We abstract the 2D and 3D GL images into a single instance here * that wraps all GL sourced images on the grounds that setup information * was performed by OpenCL anyway. 
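 *
 *  Example (editor's illustrative sketch, not part of the original header;
 *  assumes a GL texture name "tex" bound to GL_TEXTURE_2D and a context
 *  created with GL-sharing properties):
 *  \code
 *  cl_int err;
 *  cl::ImageGL img(context, CL_MEM_READ_ONLY, GL_TEXTURE_2D, 0, tex, &err);
 *  \endcode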
*/ class ImageGL : public Image { public: ImageGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_ERR); if (err != NULL) { *err = error; } } ImageGL() : Image() { } __CL_EXPLICIT_CONSTRUCTORS ImageGL(const cl_mem& image) : Image(image) { } ImageGL& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ ImageGL(const ImageGL& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ ImageGL& operator = (const ImageGL &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ ImageGL(ImageGL&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ ImageGL& operator = (ImageGL &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif // #if defined(CL_VERSION_1_2) /*! \brief Class interface for GL Render Buffer Memory Objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferRenderGL : #if defined(CL_VERSION_1_2) public ImageGL #else // #if defined(CL_VERSION_1_2) public Image2DGL #endif //#if defined(CL_VERSION_1_2) { public: /*! \brief Constructs a BufferRenderGL in a specified context, from a given * GL Renderbuffer. * * Wraps clCreateFromGLRenderbuffer(). */ BufferRenderGL( const Context& context, cl_mem_flags flags, cl_GLuint bufobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLRenderbuffer( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_RENDER_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. #if defined(CL_VERSION_1_2) BufferRenderGL() : ImageGL() {}; #else // #if defined(CL_VERSION_1_2) BufferRenderGL() : Image2DGL() {}; #endif //#if defined(CL_VERSION_1_2) /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ #if defined(CL_VERSION_1_2) __CL_EXPLICIT_CONSTRUCTORS BufferRenderGL(const cl_mem& buffer) : ImageGL(buffer) { } #else // #if defined(CL_VERSION_1_2) __CL_EXPLICIT_CONSTRUCTORS BufferRenderGL(const cl_mem& buffer) : Image2DGL(buffer) { } #endif //#if defined(CL_VERSION_1_2) /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferRenderGL& operator = (const cl_mem& rhs) { #if defined(CL_VERSION_1_2) ImageGL::operator=(rhs); #else // #if defined(CL_VERSION_1_2) Image2DGL::operator=(rhs); #endif //#if defined(CL_VERSION_1_2) return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ #if defined(CL_VERSION_1_2) BufferRenderGL(const BufferRenderGL& buf) : ImageGL(buf) {} #else // #if defined(CL_VERSION_1_2) BufferRenderGL(const BufferRenderGL& buf) : Image2DGL(buf) {} #endif //#if defined(CL_VERSION_1_2) /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. 
*/ BufferRenderGL& operator = (const BufferRenderGL &rhs) { #if defined(CL_VERSION_1_2) ImageGL::operator=(rhs); #else // #if defined(CL_VERSION_1_2) Image2DGL::operator=(rhs); #endif //#if defined(CL_VERSION_1_2) return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ #if defined(CL_VERSION_1_2) BufferRenderGL(BufferRenderGL&& buf) CL_HPP_NOEXCEPT : ImageGL(std::move(buf)) {} #else // #if defined(CL_VERSION_1_2) BufferRenderGL(BufferRenderGL&& buf) CL_HPP_NOEXCEPT : Image2DGL(std::move(buf)) {} #endif //#if defined(CL_VERSION_1_2) /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferRenderGL& operator = (BufferRenderGL &&buf) { #if defined(CL_VERSION_1_2) ImageGL::operator=(std::move(buf)); #else // #if defined(CL_VERSION_1_2) Image2DGL::operator=(std::move(buf)); #endif //#if defined(CL_VERSION_1_2) return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) //! \brief Wrapper for clGetGLObjectInfo(). cl_int getObjectInfo( cl_gl_object_type *type, cl_GLuint * gl_object_name) { return detail::errHandler( ::clGetGLObjectInfo(object_, type, gl_object_name), __GET_GL_OBJECT_INFO_ERR); } }; /*! \brief Class interface for cl_sampler. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_sampler as the original. For details, see * clRetainSampler() and clReleaseSampler(). * * \see cl_sampler */ class Sampler : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Sampler() { } /*! \brief Constructs a Sampler in a specified context. * * Wraps clCreateSampler(). */ Sampler( const Context& context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int* err = NULL) { cl_int error; object_ = ::clCreateSampler( context(), normalized_coords, addressing_mode, filter_mode, &error); detail::errHandler(error, __CREATE_SAMPLER_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructor from cl_sampler - takes ownership. * * This effectively transfers ownership of a refcount on the cl_sampler * into the new Sampler object. */ __CL_EXPLICIT_CONSTRUCTORS Sampler(const cl_sampler& sampler) : detail::Wrapper(sampler) { } /*! \brief Assignment operator from cl_sampler - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseSampler() on the value previously held by this instance. */ Sampler& operator = (const cl_sampler& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Sampler(const Sampler& sam) : detail::Wrapper(sam) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Sampler& operator = (const Sampler &sam) { detail::Wrapper::operator=(sam); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Sampler(Sampler&& sam) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(sam)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Sampler& operator = (Sampler &&sam) { detail::Wrapper::operator=(std::move(sam)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) //! \brief Wrapper for clGetSamplerInfo(). 
template <typename T>
    cl_int getInfo(cl_sampler_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetSamplerInfo, object_, name, param),
            __GET_SAMPLER_INFO_ERR);
    }

    //! \brief Wrapper for clGetSamplerInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_sampler_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_sampler_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }
};

class Program;
class CommandQueue;
class Kernel;

//! \brief Class interface for specifying NDRange values.
class NDRange
{
private:
    size_t<3> sizes_;
    cl_uint dimensions_;

public:
    //! \brief Default constructor - resulting range has zero dimensions.
    NDRange()
        : dimensions_(0)
    { }

    //! \brief Constructs one-dimensional range.
    NDRange(::size_t size0)
        : dimensions_(1)
    {
        sizes_[0] = size0;
    }

    //! \brief Constructs two-dimensional range.
    NDRange(::size_t size0, ::size_t size1)
        : dimensions_(2)
    {
        sizes_[0] = size0;
        sizes_[1] = size1;
    }

    //! \brief Constructs three-dimensional range.
    NDRange(::size_t size0, ::size_t size1, ::size_t size2)
        : dimensions_(3)
    {
        sizes_[0] = size0;
        sizes_[1] = size1;
        sizes_[2] = size2;
    }

    /*! \brief Conversion operator to const ::size_t *.
     *
     *  \returns a pointer to the size of the first dimension.
     */
    operator const ::size_t*() const {
        return (const ::size_t*) sizes_;
    }

    //! \brief Queries the number of dimensions in the range.
    ::size_t dimensions() const { return dimensions_; }
};

//! \brief A zero-dimensional range.
static const NDRange NullRange;

//! \brief Local address wrapper for use with Kernel::setArg
struct LocalSpaceArg
{
    ::size_t size_;
};

namespace detail {

template <typename T>
struct KernelArgumentHandler
{
    static ::size_t size(const T&) { return sizeof(T); }
    static const T* ptr(const T& value) { return &value; }
};

template <>
struct KernelArgumentHandler<LocalSpaceArg>
{
    static ::size_t size(const LocalSpaceArg& value) { return value.size_; }
    static const void* ptr(const LocalSpaceArg&) { return NULL; }
};

}
//! \endcond

/*! __local
 * \brief Helper function for generating LocalSpaceArg objects.
 * Deprecated. Replaced with Local.
 */
inline CL_EXT_PREFIX__VERSION_1_1_DEPRECATED LocalSpaceArg
__local(::size_t size) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
inline LocalSpaceArg
__local(::size_t size)
{
    LocalSpaceArg ret = { size };
    return ret;
}

/*! Local
 * \brief Helper function for generating LocalSpaceArg objects.
 */
inline LocalSpaceArg
Local(::size_t size)
{
    LocalSpaceArg ret = { size };
    return ret;
}

//class KernelFunctor;

/*! \brief Class interface for cl_kernel.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_kernel as the original.  For details, see
 *        clRetainKernel() and clReleaseKernel().
 *
 *  \see cl_kernel
 */
class Kernel : public detail::Wrapper<cl_kernel>
{
public:
    inline Kernel(const Program& program, const char* name, cl_int* err = NULL);

    //! \brief Default constructor - initializes to NULL.
    Kernel() { }

    /*! \brief Constructor from cl_kernel - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the cl_kernel
     *  into the new Kernel object.
     */
    __CL_EXPLICIT_CONSTRUCTORS Kernel(const cl_kernel& kernel) : detail::Wrapper<cl_type>(kernel) { }

    /*! \brief Assignment operator from cl_kernel - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseKernel() on the value previously held by this instance.
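     *
     *  Example (editor's illustrative sketch, not part of the original header;
     *  "rawKernel" is assumed to be a cl_kernel obtained from the C API, with
     *  a reference count the caller owns):
     *  \code
     *  cl::Kernel k;
     *  k = rawKernel;   // k now owns the reference; no extra clRetainKernel()
     *  // clReleaseKernel() is issued automatically when k is destroyed
     *  \endcode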
*/ Kernel& operator = (const cl_kernel& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Kernel(const Kernel& kernel) : detail::Wrapper(kernel) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Kernel& operator = (const Kernel &kernel) { detail::Wrapper::operator=(kernel); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Kernel(Kernel&& kernel) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(kernel)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Kernel& operator = (Kernel &&kernel) { detail::Wrapper::operator=(std::move(kernel)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) template cl_int getInfo(cl_kernel_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetKernelInfo, object_, name, param), __GET_KERNEL_INFO_ERR); } template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_kernel_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } #if defined(CL_VERSION_1_2) template cl_int getArgInfo(cl_uint argIndex, cl_kernel_arg_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetKernelArgInfo, object_, argIndex, name, param), __GET_KERNEL_ARG_INFO_ERR); } template typename detail::param_traits::param_type getArgInfo(cl_uint argIndex, cl_int* err = NULL) const { typename detail::param_traits< detail::cl_kernel_arg_info, name>::param_type param; cl_int result = getArgInfo(argIndex, name, ¶m); if (err != NULL) { *err = result; } return param; } #endif // #if defined(CL_VERSION_1_2) template cl_int getWorkGroupInfo( const Device& device, cl_kernel_work_group_info name, T* param) const { return detail::errHandler( detail::getInfo( &::clGetKernelWorkGroupInfo, object_, device(), name, param), __GET_KERNEL_WORK_GROUP_INFO_ERR); } template typename detail::param_traits::param_type getWorkGroupInfo(const Device& device, cl_int* err = NULL) const { typename detail::param_traits< detail::cl_kernel_work_group_info, name>::param_type param; cl_int result = getWorkGroupInfo(device, name, ¶m); if (err != NULL) { *err = result; } return param; } template cl_int setArg(cl_uint index, const T &value) { return detail::errHandler( ::clSetKernelArg( object_, index, detail::KernelArgumentHandler::size(value), detail::KernelArgumentHandler::ptr(value)), __SET_KERNEL_ARGS_ERR); } cl_int setArg(cl_uint index, ::size_t size, const void* argPtr) { return detail::errHandler( ::clSetKernelArg(object_, index, size, argPtr), __SET_KERNEL_ARGS_ERR); } }; /*! \class Program * \brief Program interface that implements cl_program. 
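 *
 *  Example (editor's illustrative sketch, not part of the original header;
 *  "context", "device" and an OpenCL C source string "src" are assumed
 *  to exist):
 *  \code
 *  cl_int err;
 *  cl::Program program(context, src, false, &err);
 *  if (program.build("-cl-std=CL1.2") != CL_SUCCESS) {
 *      STRING_CLASS log = program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(device);
 *      // inspect the build log before giving up
 *  }
 *  \endcode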
*/
class Program : public detail::Wrapper<cl_program>
{
public:
    typedef VECTOR_CLASS<std::pair<const void*, ::size_t> > Binaries;
    typedef VECTOR_CLASS<std::pair<const char*, ::size_t> > Sources;

    Program(
        const STRING_CLASS& source,
        bool build = false,
        cl_int* err = NULL)
    {
        cl_int error;

        const char * strings = source.c_str();
        const ::size_t length = source.size();

        Context context = Context::getDefault(err);

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)1, &strings, &length, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);

        if (error == CL_SUCCESS && build) {
            error = ::clBuildProgram(
                object_,
                0,
                NULL,
                "",
                NULL,
                NULL);

            detail::errHandler(error, __BUILD_PROGRAM_ERR);
        }

        if (err != NULL) {
            *err = error;
        }
    }

    Program(
        const Context& context,
        const STRING_CLASS& source,
        bool build = false,
        cl_int* err = NULL)
    {
        cl_int error;

        const char * strings = source.c_str();
        const ::size_t length = source.size();

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)1, &strings, &length, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);

        if (error == CL_SUCCESS && build) {
            error = ::clBuildProgram(
                object_,
                0,
                NULL,
                "",
                NULL,
                NULL);

            detail::errHandler(error, __BUILD_PROGRAM_ERR);
        }

        if (err != NULL) {
            *err = error;
        }
    }

    Program(
        const Context& context,
        const Sources& sources,
        cl_int* err = NULL)
    {
        cl_int error;

        const ::size_t n = (::size_t)sources.size();
        ::size_t* lengths = (::size_t*) alloca(n * sizeof(::size_t));
        const char** strings = (const char**) alloca(n * sizeof(const char*));

        for (::size_t i = 0; i < n; ++i) {
            strings[i] = sources[(int)i].first;
            lengths[i] = sources[(int)i].second;
        }

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)n, strings, lengths, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    /**
     * Construct a program object from a list of devices and a per-device list of binaries.
     * \param context A valid OpenCL context in which to construct the program.
     * \param devices A vector of OpenCL device objects for which the program will be created.
     * \param binaries A vector of pairs of a pointer to a binary object and its length.
     * \param binaryStatus An optional vector that on completion will be resized to
     *   match the size of binaries and filled with values to specify if each binary
     *   was successfully loaded.
     *   Set to CL_SUCCESS if the binary was successfully loaded.
     *   Set to CL_INVALID_VALUE if the length is 0 or the binary pointer is NULL.
     *   Set to CL_INVALID_BINARY if the binary provided is not valid for the matching device.
     * \param err if non-NULL will be set to CL_SUCCESS on successful operation or one of the following errors:
     *   CL_INVALID_CONTEXT if context is not a valid context.
     *   CL_INVALID_VALUE if the length of devices is zero; or if the length of binaries does not match the length of devices;
     *     or if any entry in binaries is NULL or has length 0.
     *   CL_INVALID_DEVICE if OpenCL devices listed in devices are not in the list of devices associated with context.
     *   CL_INVALID_BINARY if an invalid program binary was encountered for any device. binaryStatus will return specific status for each device.
     *   CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
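     *
     * Example (editor's illustrative sketch, not part of the original header;
     * "blob"/"blobSize" are assumed to hold a binary previously queried via
     * CL_PROGRAM_BINARIES for the single device in "devices"):
     * \code
     * cl::Program::Binaries bins;
     * bins.push_back(std::make_pair((const void*)blob, (::size_t)blobSize));
     * VECTOR_CLASS<cl_int> status;
     * cl_int err;
     * cl::Program prog(context, devices, bins, &status, &err);
     * \endcode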
*/ Program( const Context& context, const VECTOR_CLASS& devices, const Binaries& binaries, VECTOR_CLASS* binaryStatus = NULL, cl_int* err = NULL) { cl_int error; const ::size_t numDevices = devices.size(); // Catch size mismatch early and return if(binaries.size() != numDevices) { error = CL_INVALID_VALUE; detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR); if (err != NULL) { *err = error; } return; } ::size_t* lengths = (::size_t*) alloca(numDevices * sizeof(::size_t)); const unsigned char** images = (const unsigned char**) alloca(numDevices * sizeof(const unsigned char**)); for (::size_t i = 0; i < numDevices; ++i) { images[i] = (const unsigned char*)binaries[i].first; lengths[i] = binaries[(int)i].second; } cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id)); for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } if(binaryStatus) { binaryStatus->resize(numDevices); } object_ = ::clCreateProgramWithBinary( context(), (cl_uint) devices.size(), deviceIDs, lengths, images, (binaryStatus != NULL && numDevices > 0) ? &binaryStatus->front() : NULL, &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR); if (err != NULL) { *err = error; } } #if defined(CL_VERSION_1_2) /** * Create program using builtin kernels. * \param kernelNames Semi-colon separated list of builtin kernel names */ Program( const Context& context, const VECTOR_CLASS& devices, const STRING_CLASS& kernelNames, cl_int* err = NULL) { cl_int error; ::size_t numDevices = devices.size(); cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id)); for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } object_ = ::clCreateProgramWithBuiltInKernels( context(), (cl_uint) devices.size(), deviceIDs, kernelNames.c_str(), &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR); if (err != NULL) { *err = error; } } #endif // #if defined(CL_VERSION_1_2) Program() { } __CL_EXPLICIT_CONSTRUCTORS Program(const cl_program& program) : detail::Wrapper(program) { } Program& operator = (const cl_program& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Program(const Program& program) : detail::Wrapper(program) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Program& operator = (const Program &program) { detail::Wrapper::operator=(program); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Program(Program&& program) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(program)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
*/ Program& operator = (Program &&program) { detail::Wrapper::operator=(std::move(program)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) cl_int build( const VECTOR_CLASS& devices, const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL) const { ::size_t numDevices = devices.size(); cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id)); for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } return detail::errHandler( ::clBuildProgram( object_, (cl_uint) devices.size(), deviceIDs, options, notifyFptr, data), __BUILD_PROGRAM_ERR); } cl_int build( const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL) const { return detail::errHandler( ::clBuildProgram( object_, 0, NULL, options, notifyFptr, data), __BUILD_PROGRAM_ERR); } #if defined(CL_VERSION_1_2) cl_int compile( const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL) const { return detail::errHandler( ::clCompileProgram( object_, 0, NULL, options, 0, NULL, NULL, notifyFptr, data), __COMPILE_PROGRAM_ERR); } #endif template cl_int getInfo(cl_program_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetProgramInfo, object_, name, param), __GET_PROGRAM_INFO_ERR); } template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_program_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } template cl_int getBuildInfo( const Device& device, cl_program_build_info name, T* param) const { return detail::errHandler( detail::getInfo( &::clGetProgramBuildInfo, object_, device(), name, param), __GET_PROGRAM_BUILD_INFO_ERR); } template typename detail::param_traits::param_type getBuildInfo(const Device& device, cl_int* err = NULL) const { typename detail::param_traits< detail::cl_program_build_info, name>::param_type param; cl_int result = getBuildInfo(device, name, ¶m); if (err != NULL) { *err = result; } return param; } cl_int createKernels(VECTOR_CLASS* kernels) { cl_uint numKernels; cl_int err = ::clCreateKernelsInProgram(object_, 0, NULL, &numKernels); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR); } Kernel* value = (Kernel*) alloca(numKernels * sizeof(Kernel)); err = ::clCreateKernelsInProgram( object_, numKernels, (cl_kernel*) value, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR); } kernels->assign(&value[0], &value[numKernels]); return CL_SUCCESS; } }; #if defined(CL_VERSION_1_2) inline Program linkProgram( Program input1, Program input2, const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error_local = CL_SUCCESS; cl_program programs[2] = { input1(), input2() }; Context ctx = input1.getInfo(&error_local); if(error_local!=CL_SUCCESS) { detail::errHandler(error_local, __LINK_PROGRAM_ERR); } cl_program prog = ::clLinkProgram( ctx(), 0, NULL, options, 2, programs, notifyFptr, data, &error_local); detail::errHandler(error_local,__COMPILE_PROGRAM_ERR); if (err != NULL) { *err = error_local; } return Program(prog); } inline Program linkProgram( VECTOR_CLASS inputPrograms, const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, 
void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error_local = CL_SUCCESS; cl_program * programs = (cl_program*) alloca(inputPrograms.size() * sizeof(cl_program)); if (programs != NULL) { for (unsigned int i = 0; i < inputPrograms.size(); i++) { programs[i] = inputPrograms[i](); } } Context ctx; if(inputPrograms.size() > 0) { ctx = inputPrograms[0].getInfo(&error_local); if(error_local!=CL_SUCCESS) { detail::errHandler(error_local, __LINK_PROGRAM_ERR); } } cl_program prog = ::clLinkProgram( ctx(), 0, NULL, options, (cl_uint)inputPrograms.size(), programs, notifyFptr, data, &error_local); detail::errHandler(error_local,__COMPILE_PROGRAM_ERR); if (err != NULL) { *err = error_local; } return Program(prog); } #endif template<> inline VECTOR_CLASS cl::Program::getInfo(cl_int* err) const { VECTOR_CLASS< ::size_t> sizes = getInfo(); VECTOR_CLASS binaries; for (VECTOR_CLASS< ::size_t>::iterator s = sizes.begin(); s != sizes.end(); ++s) { char *ptr = NULL; if (*s != 0) ptr = new char[*s]; binaries.push_back(ptr); } cl_int result = getInfo(CL_PROGRAM_BINARIES, &binaries); if (err != NULL) { *err = result; } return binaries; } inline Kernel::Kernel(const Program& program, const char* name, cl_int* err) { cl_int error; object_ = ::clCreateKernel(program(), name, &error); detail::errHandler(error, __CREATE_KERNEL_ERR); if (err != NULL) { *err = error; } } /*! \class CommandQueue * \brief CommandQueue interface for cl_command_queue. */ class CommandQueue : public detail::Wrapper { private: #ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED static std::atomic default_initialized_; #else // !CL_HPP_CPP11_ATOMICS_SUPPORTED static volatile int default_initialized_; #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED static CommandQueue default_; static volatile cl_int default_error_; public: CommandQueue( cl_command_queue_properties properties, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } } else { Device device = context.getInfo()[0]; object_ = ::clCreateCommandQueue( context(), device(), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } } } /*! * \brief Constructs a CommandQueue for an implementation defined device in the given context */ explicit CommandQueue( const Context& context, cl_command_queue_properties properties = 0, cl_int* err = NULL) { cl_int error; VECTOR_CLASS devices; error = context.getInfo(CL_CONTEXT_DEVICES, &devices); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } return; } object_ = ::clCreateCommandQueue(context(), devices[0](), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } } CommandQueue( const Context& context, const Device& device, cl_command_queue_properties properties = 0, cl_int* err = NULL) { cl_int error; object_ = ::clCreateCommandQueue( context(), device(), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ CommandQueue(const CommandQueue& queue) : detail::Wrapper(queue) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. 
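 *
 *  For context, a typical queue round-trip (editor's illustrative sketch,
 *  not part of the original header; "context", "device", "buf" and a host
 *  array "data" are assumed to exist):
 *  \code
 *  cl_int err;
 *  cl::CommandQueue q(context, device, 0, &err);
 *  q.enqueueWriteBuffer(buf, CL_TRUE, 0, sizeof(data), data);
 *  q.enqueueReadBuffer(buf, CL_TRUE, 0, sizeof(data), data);
 *  \endcode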
*/ CommandQueue& operator = (const CommandQueue &queue) { detail::Wrapper::operator=(queue); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ CommandQueue(CommandQueue&& queue) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(queue)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ CommandQueue& operator = (CommandQueue &&queue) { detail::Wrapper::operator=(std::move(queue)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) static CommandQueue getDefault(cl_int * err = NULL) { int state = detail::compare_exchange( &default_initialized_, __DEFAULT_BEING_INITIALIZED, __DEFAULT_NOT_INITIALIZED); if (state & __DEFAULT_INITIALIZED) { if (err != NULL) { *err = default_error_; } return default_; } if (state & __DEFAULT_BEING_INITIALIZED) { // Assume writes will propagate eventually... while(default_initialized_ != __DEFAULT_INITIALIZED) { detail::fence(); } if (err != NULL) { *err = default_error_; } return default_; } cl_int error; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } } else { Device device = context.getInfo()[0]; default_ = CommandQueue(context, device, 0, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } } detail::fence(); default_error_ = error; // Assume writes will propagate eventually... default_initialized_ = __DEFAULT_INITIALIZED; detail::fence(); if (err != NULL) { *err = default_error_; } return default_; } CommandQueue() { } __CL_EXPLICIT_CONSTRUCTORS CommandQueue(const cl_command_queue& commandQueue) : detail::Wrapper(commandQueue) { } CommandQueue& operator = (const cl_command_queue& rhs) { detail::Wrapper::operator=(rhs); return *this; } template cl_int getInfo(cl_command_queue_info name, T* param) const { return detail::errHandler( detail::getInfo( &::clGetCommandQueueInfo, object_, name, param), __GET_COMMAND_QUEUE_INFO_ERR); } template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_command_queue_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } cl_int enqueueReadBuffer( const Buffer& buffer, cl_bool blocking, ::size_t offset, ::size_t size, void* ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReadBuffer( object_, buffer(), blocking, offset, size, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_READ_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueWriteBuffer( const Buffer& buffer, cl_bool blocking, ::size_t offset, ::size_t size, const void* ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueWriteBuffer( object_, buffer(), blocking, offset, size, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? 
&tmp : NULL), __ENQUEUE_WRITE_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyBuffer( const Buffer& src, const Buffer& dst, ::size_t src_offset, ::size_t dst_offset, ::size_t size, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyBuffer( object_, src(), dst(), src_offset, dst_offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQEUE_COPY_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueReadBufferRect( const Buffer& buffer, cl_bool blocking, const size_t<3>& buffer_offset, const size_t<3>& host_offset, const size_t<3>& region, ::size_t buffer_row_pitch, ::size_t buffer_slice_pitch, ::size_t host_row_pitch, ::size_t host_slice_pitch, void *ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReadBufferRect( object_, buffer(), blocking, (const ::size_t *)buffer_offset, (const ::size_t *)host_offset, (const ::size_t *)region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_READ_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueWriteBufferRect( const Buffer& buffer, cl_bool blocking, const size_t<3>& buffer_offset, const size_t<3>& host_offset, const size_t<3>& region, ::size_t buffer_row_pitch, ::size_t buffer_slice_pitch, ::size_t host_row_pitch, ::size_t host_slice_pitch, const void *ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueWriteBufferRect( object_, buffer(), blocking, (const ::size_t *)buffer_offset, (const ::size_t *)host_offset, (const ::size_t *)region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_WRITE_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyBufferRect( const Buffer& src, const Buffer& dst, const size_t<3>& src_origin, const size_t<3>& dst_origin, const size_t<3>& region, ::size_t src_row_pitch, ::size_t src_slice_pitch, ::size_t dst_row_pitch, ::size_t dst_slice_pitch, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyBufferRect( object_, src(), dst(), (const ::size_t *)src_origin, (const ::size_t *)dst_origin, (const ::size_t *)region, src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQEUE_COPY_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if defined(CL_VERSION_1_2) /** * Enqueue a command to fill a buffer object with a pattern * of a given size. The pattern is specified a as vector. * \tparam PatternType The datatype of the pattern field. * The pattern type must be an accepted OpenCL data type. 
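 *
 * Example (editor's illustrative sketch, not part of the original header;
 * zero-fills the first 1024 bytes of an existing buffer "buf" on an
 * existing queue "queue"):
 * \code
 * cl_uint zero = 0;
 * queue.enqueueFillBuffer(buf, zero, 0, 1024);
 * \endcode
 * Both the offset and the size must be multiples of sizeof(PatternType).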
#if defined(CL_VERSION_1_2)
    /**
     * Enqueue a command to fill a buffer object with a pattern
     * of a given size. The pattern is specified as a vector.
     * \tparam PatternType The datatype of the pattern field.
     *     The pattern type must be an accepted OpenCL data type.
     */
    template<typename PatternType>
    cl_int enqueueFillBuffer(
        const Buffer& buffer, PatternType pattern,
        ::size_t offset, ::size_t size,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillBuffer(
                object_, buffer(),
                static_cast<void*>(&pattern), sizeof(PatternType),
                offset, size,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_FILL_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif // #if defined(CL_VERSION_1_2)
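    /*
     * Illustrative sketch (not part of the original header): zero-filling the
     * first 1024 ints of a buffer. The pattern size is taken from the type of
     * the pattern argument (here cl_int), so offset and size must be multiples
     * of it. The names q and buf are hypothetical.
     *
     * \code
     * cl_int pattern = 0;
     * q.enqueueFillBuffer(buf, pattern, 0, 1024 * sizeof(cl_int));
     * \endcode
     */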
    cl_int enqueueReadImage(
        const Image& image, cl_bool blocking,
        const size_t<3>& origin, const size_t<3>& region,
        ::size_t row_pitch, ::size_t slice_pitch, void* ptr,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadImage(
                object_, image(), blocking,
                (const ::size_t *) origin,
                (const ::size_t *) region,
                row_pitch, slice_pitch, ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_READ_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueWriteImage(
        const Image& image, cl_bool blocking,
        const size_t<3>& origin, const size_t<3>& region,
        ::size_t row_pitch, ::size_t slice_pitch, const void* ptr,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueWriteImage(
                object_, image(), blocking,
                (const ::size_t *) origin,
                (const ::size_t *) region,
                row_pitch, slice_pitch, ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_WRITE_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyImage(
        const Image& src, const Image& dst,
        const size_t<3>& src_origin, const size_t<3>& dst_origin, const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyImage(
                object_, src(), dst(),
                (const ::size_t *) src_origin,
                (const ::size_t *) dst_origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_COPY_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

#if defined(CL_VERSION_1_2)
    /**
     * Enqueue a command to fill an image object with a specified color.
     * \param fillColor is the color to use to fill the image.
     *     This is a four component RGBA floating-point color value if
     *     the image channel data type is not an unnormalized signed or
     *     unsigned data type.
     */
    cl_int enqueueFillImage(
        const Image& image, cl_float4 fillColor,
        const size_t<3>& origin, const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillImage(
                object_, image(),
                static_cast<void*>(&fillColor),
                (const ::size_t *) origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_FILL_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    /**
     * Enqueue a command to fill an image object with a specified color.
     * \param fillColor is the color to use to fill the image.
     *     This is a four component RGBA signed integer color value if
     *     the image channel data type is an unnormalized signed integer
     *     type.
     */
    cl_int enqueueFillImage(
        const Image& image, cl_int4 fillColor,
        const size_t<3>& origin, const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillImage(
                object_, image(),
                static_cast<void*>(&fillColor),
                (const ::size_t *) origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_FILL_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    /**
     * Enqueue a command to fill an image object with a specified color.
     * \param fillColor is the color to use to fill the image.
     *     This is a four component RGBA unsigned integer color value if
     *     the image channel data type is an unnormalized unsigned integer
     *     type.
     */
    cl_int enqueueFillImage(
        const Image& image, cl_uint4 fillColor,
        const size_t<3>& origin, const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillImage(
                object_, image(),
                static_cast<void*>(&fillColor),
                (const ::size_t *) origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_FILL_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif // #if defined(CL_VERSION_1_2)
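    /*
     * Illustrative sketch (not part of the original header): clearing an RGBA
     * float image to opaque black. The overload is selected by the fill
     * color's type (cl_float4 / cl_int4 / cl_uint4). The names q and img are
     * hypothetical.
     *
     * \code
     * cl_float4 black = {{ 0.0f, 0.0f, 0.0f, 1.0f }};
     * cl::size_t<3> origin;            // (0,0,0)
     * cl::size_t<3> region;
     * region[0] = 512; region[1] = 512; region[2] = 1;
     * q.enqueueFillImage(img, black, origin, region);
     * \endcode
     */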
    cl_int enqueueCopyImageToBuffer(
        const Image& src, const Buffer& dst,
        const size_t<3>& src_origin, const size_t<3>& region, ::size_t dst_offset,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyImageToBuffer(
                object_, src(), dst(),
                (const ::size_t *) src_origin,
                (const ::size_t *) region,
                dst_offset,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyBufferToImage(
        const Buffer& src, const Image& dst,
        ::size_t src_offset, const size_t<3>& dst_origin, const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyBufferToImage(
                object_, src(), dst(),
                src_offset,
                (const ::size_t *) dst_origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    void* enqueueMapBuffer(
        const Buffer& buffer, cl_bool blocking, cl_map_flags flags,
        ::size_t offset, ::size_t size,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL,
        cl_int* err = NULL) const
    {
        cl_event tmp;
        cl_int error;
        void * result = ::clEnqueueMapBuffer(
            object_, buffer(), blocking, flags, offset, size,
            (events != NULL) ? (cl_uint) events->size() : 0,
            (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
            (event != NULL) ? &tmp : NULL,
            &error);

        detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
        if (event != NULL && error == CL_SUCCESS)
            *event = tmp;

        return result;
    }

    void* enqueueMapImage(
        const Image& buffer, cl_bool blocking, cl_map_flags flags,
        const size_t<3>& origin, const size_t<3>& region,
        ::size_t * row_pitch, ::size_t * slice_pitch,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL,
        cl_int* err = NULL) const
    {
        cl_event tmp;
        cl_int error;
        void * result = ::clEnqueueMapImage(
            object_, buffer(), blocking, flags,
            (const ::size_t *) origin,
            (const ::size_t *) region,
            row_pitch, slice_pitch,
            (events != NULL) ? (cl_uint) events->size() : 0,
            (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
            (event != NULL) ? &tmp : NULL,
            &error);

        detail::errHandler(error, __ENQUEUE_MAP_IMAGE_ERR);
        if (err != NULL) {
            *err = error;
        }
        if (event != NULL && error == CL_SUCCESS)
            *event = tmp;

        return result;
    }

    cl_int enqueueUnmapMemObject(
        const Memory& memory, void* mapped_ptr,
        const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueUnmapMemObject(
                object_, memory(), mapped_ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_UNMAP_MEM_OBJECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
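    /*
     * Illustrative sketch (not part of the original header): the usual
     * map / modify / unmap pattern. A blocking map returns a host pointer
     * into the buffer; unmapping hands the data back to the device. The
     * names q and buf are hypothetical.
     *
     * \code
     * cl_int err;
     * float* p = static_cast<float*>(
     *     q.enqueueMapBuffer(buf, CL_TRUE, CL_MAP_WRITE, 0, 256 * sizeof(float),
     *                        NULL, NULL, &err));
     * if (err == CL_SUCCESS) {
     *     p[0] = 3.14f;                       // touch the mapped region
     *     q.enqueueUnmapMemObject(buf, p);
     *     q.finish();
     * }
     * \endcode
     */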
#if defined(CL_VERSION_1_2)
    /**
     * Enqueues a marker command which waits for either a list of events to complete,
     * or all previously enqueued commands to complete.
     *
     * Enqueues a marker command which waits for either a list of events to complete,
     * or if the list is empty it waits for all commands previously enqueued in command_queue
     * to complete before it completes. This command returns an event which can be waited on,
     * i.e. this event can be waited on to ensure that all events either in the event_wait_list
     * or all previously enqueued commands, queued before this command to command_queue,
     * have completed.
     */
    cl_int enqueueMarkerWithWaitList(
        const VECTOR_CLASS<Event> *events = 0,
        Event *event = 0) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueMarkerWithWaitList(
                object_,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_MARKER_WAIT_LIST_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    /**
     * A synchronization point that enqueues a barrier operation.
     *
     * Enqueues a barrier command which waits for either a list of events to complete,
     * or if the list is empty it waits for all commands previously enqueued in command_queue
     * to complete before it completes. This command blocks command execution, that is, any
     * following commands enqueued after it do not execute until it completes. This command
     * returns an event which can be waited on, i.e. this event can be waited on to ensure that
     * all events either in the event_wait_list or all previously enqueued commands, queued
     * before this command to command_queue, have completed.
     */
    cl_int enqueueBarrierWithWaitList(
        const VECTOR_CLASS<Event> *events = 0,
        Event *event = 0) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueBarrierWithWaitList(
                object_,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_BARRIER_WAIT_LIST_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    /**
     * Enqueues a command to indicate with which device a set of memory objects
     * should be associated.
     */
    cl_int enqueueMigrateMemObjects(
        const VECTOR_CLASS<Memory> &memObjects,
        cl_mem_migration_flags flags,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL
        ) const
    {
        cl_event tmp;

        cl_mem* localMemObjects = static_cast<cl_mem*>(alloca(memObjects.size() * sizeof(cl_mem)));
        for( int i = 0; i < (int)memObjects.size(); ++i ) {
            localMemObjects[i] = memObjects[i]();
        }

        cl_int err = detail::errHandler(
            ::clEnqueueMigrateMemObjects(
                object_,
                (cl_uint)memObjects.size(),
                static_cast<const cl_mem*>(localMemObjects),
                flags,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_UNMAP_MEM_OBJECT_ERR); // note: failures are reported with the unmap error string

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif // #if defined(CL_VERSION_1_2)
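    /*
     * Illustrative sketch (not part of the original header): using a marker
     * to obtain an event covering all previously enqueued work, then gating
     * later commands on it via a barrier. The name q is hypothetical.
     *
     * \code
     * cl::Event marker;
     * q.enqueueMarkerWithWaitList(NULL, &marker);  // event for all prior commands
     * VECTOR_CLASS<cl::Event> deps;
     * deps.push_back(marker);
     * q.enqueueBarrierWithWaitList(&deps);         // later commands wait on marker
     * \endcode
     */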
    cl_int enqueueNDRangeKernel(
        const Kernel& kernel,
        const NDRange& offset,
        const NDRange& global,
        const NDRange& local = NullRange,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueNDRangeKernel(
                object_, kernel(), (cl_uint) global.dimensions(),
                offset.dimensions() != 0 ? (const ::size_t*) offset : NULL,
                (const ::size_t*) global,
                local.dimensions() != 0 ? (const ::size_t*) local : NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_NDRANGE_KERNEL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueTask(
        const Kernel& kernel,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueTask(
                object_, kernel(),
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_TASK_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueNativeKernel(
        void (CL_CALLBACK *userFptr)(void *),
        std::pair<void*, ::size_t> args,
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<const void*>* mem_locs = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_mem * mems = (mem_objects != NULL && mem_objects->size() > 0)
            ? (cl_mem*) alloca(mem_objects->size() * sizeof(cl_mem))
            : NULL;

        if (mems != NULL) {
            for (unsigned int i = 0; i < mem_objects->size(); i++) {
                mems[i] = ((*mem_objects)[i])();
            }
        }

        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueNativeKernel(
                object_, userFptr, args.first, args.second,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                mems,
                (mem_locs != NULL && mem_locs->size() > 0) ? (const void **) &mem_locs->front() : NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_NATIVE_KERNEL);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    /**
     * Deprecated APIs for 1.2
     */
#if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2))
    CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
    cl_int enqueueMarker(Event* event = NULL) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueMarker(
                object_,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_MARKER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
    cl_int enqueueWaitForEvents(const VECTOR_CLASS<Event>& events) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
    {
        return detail::errHandler(
            ::clEnqueueWaitForEvents(
                object_,
                (cl_uint) events.size(),
                events.size() > 0 ? (const cl_event*) &events.front() : NULL),
            __ENQUEUE_WAIT_FOR_EVENTS_ERR);
    }
#endif // #if defined(CL_VERSION_1_1)
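    /*
     * Illustrative sketch (not part of the original header): launching a
     * kernel over a 1D range with an explicit work-group size. The names
     * q and kernel are hypothetical and assumed valid.
     *
     * \code
     * cl::Event done;
     * q.enqueueNDRangeKernel(kernel,
     *                        cl::NullRange,      // no offset
     *                        cl::NDRange(4096),  // global size
     *                        cl::NDRange(64),    // local (work-group) size
     *                        NULL, &done);
     * done.wait();
     * \endcode
     */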
    cl_int enqueueAcquireGLObjects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueAcquireGLObjects(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_ACQUIRE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueReleaseGLObjects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReleaseGLObjects(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_RELEASE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

#if defined (USE_DX_INTEROP)
    typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueAcquireD3D10ObjectsKHR)(
        cl_command_queue command_queue, cl_uint num_objects,
        const cl_mem* mem_objects, cl_uint num_events_in_wait_list,
        const cl_event* event_wait_list, cl_event* event);
    typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueReleaseD3D10ObjectsKHR)(
        cl_command_queue command_queue, cl_uint num_objects,
        const cl_mem* mem_objects, cl_uint num_events_in_wait_list,
        const cl_event* event_wait_list, cl_event* event);

    cl_int enqueueAcquireD3D10Objects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        static PFN_clEnqueueAcquireD3D10ObjectsKHR pfn_clEnqueueAcquireD3D10ObjectsKHR = NULL;
#if defined(CL_VERSION_1_2)
        cl_context context = getInfo<CL_QUEUE_CONTEXT>();
        cl::Device device(getInfo<CL_QUEUE_DEVICE>());
        cl_platform_id platform = device.getInfo<CL_DEVICE_PLATFORM>();
        __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clEnqueueAcquireD3D10ObjectsKHR);
#endif
#if defined(CL_VERSION_1_1)
        __INIT_CL_EXT_FCN_PTR(clEnqueueAcquireD3D10ObjectsKHR);
#endif

        cl_event tmp;
        cl_int err = detail::errHandler(
            pfn_clEnqueueAcquireD3D10ObjectsKHR(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                // Guard against taking the front of an empty wait list,
                // matching the other enqueue methods in this class.
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_ACQUIRE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueReleaseD3D10Objects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        static PFN_clEnqueueReleaseD3D10ObjectsKHR pfn_clEnqueueReleaseD3D10ObjectsKHR = NULL;
#if defined(CL_VERSION_1_2)
        cl_context context = getInfo<CL_QUEUE_CONTEXT>();
        cl::Device device(getInfo<CL_QUEUE_DEVICE>());
        cl_platform_id platform = device.getInfo<CL_DEVICE_PLATFORM>();
        __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clEnqueueReleaseD3D10ObjectsKHR);
#endif // #if defined(CL_VERSION_1_2)
#if defined(CL_VERSION_1_1)
        __INIT_CL_EXT_FCN_PTR(clEnqueueReleaseD3D10ObjectsKHR);
#endif // #if defined(CL_VERSION_1_1)

        cl_event tmp;
        cl_int err = detail::errHandler(
            pfn_clEnqueueReleaseD3D10ObjectsKHR(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_RELEASE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif

    /**
     * Deprecated APIs for 1.2
     */
#if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2))
    CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
    cl_int enqueueBarrier() const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
    {
        return detail::errHandler(
            ::clEnqueueBarrier(object_),
            __ENQUEUE_BARRIER_ERR);
    }
#endif // #if defined(CL_VERSION_1_1)

    cl_int flush() const
    {
        return detail::errHandler(::clFlush(object_), __FLUSH_ERR);
    }

    cl_int finish() const
    {
        return detail::errHandler(::clFinish(object_), __FINISH_ERR);
    }
};

#ifdef _WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) std::atomic<int> CommandQueue::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) volatile int CommandQueue::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) CommandQueue CommandQueue::default_;
__declspec(selectany) volatile cl_int CommandQueue::default_error_ = CL_SUCCESS;
#else // !_WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) std::atomic<int> CommandQueue::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) volatile int CommandQueue::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) CommandQueue CommandQueue::default_;
__attribute__((weak)) volatile cl_int CommandQueue::default_error_ = CL_SUCCESS;
#endif // !_WIN32
template< typename IteratorType >
Buffer::Buffer(
    const Context &context,
    IteratorType startIterator,
    IteratorType endIterator,
    bool readOnly,
    bool useHostPtr,
    cl_int* err)
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    cl_mem_flags flags = 0;
    if( readOnly ) {
        flags |= CL_MEM_READ_ONLY;
    }
    else {
        flags |= CL_MEM_READ_WRITE;
    }
    if( useHostPtr ) {
        flags |= CL_MEM_USE_HOST_PTR;
    }

    ::size_t size = sizeof(DataType)*(endIterator - startIterator);

    if( useHostPtr ) {
        object_ = ::clCreateBuffer(context(), flags, size, static_cast<DataType*>(&*startIterator), &error);
    } else {
        object_ = ::clCreateBuffer(context(), flags, size, 0, &error);
    }

    detail::errHandler(error, __CREATE_BUFFER_ERR);
    if (err != NULL) {
        *err = error;
    }

    if( !useHostPtr ) {
        CommandQueue queue(context, 0, &error);
        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }

        error = cl::copy(queue, startIterator, endIterator, *this);
        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }
}

template< typename IteratorType >
Buffer::Buffer(
    const CommandQueue &queue,
    IteratorType startIterator,
    IteratorType endIterator,
    bool readOnly,
    bool useHostPtr,
    cl_int* err)
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    cl_mem_flags flags = 0;
    if (readOnly) {
        flags |= CL_MEM_READ_ONLY;
    }
    else {
        flags |= CL_MEM_READ_WRITE;
    }
    if (useHostPtr) {
        flags |= CL_MEM_USE_HOST_PTR;
    }

    ::size_t size = sizeof(DataType)*(endIterator - startIterator);

    Context context = queue.getInfo<CL_QUEUE_CONTEXT>();

    if (useHostPtr) {
        object_ = ::clCreateBuffer(context(), flags, size, static_cast<DataType*>(&*startIterator), &error);
    } else {
        object_ = ::clCreateBuffer(context(), flags, size, 0, &error);
    }

    detail::errHandler(error, __CREATE_BUFFER_ERR);
    if (err != NULL) {
        *err = error;
    }

    if (!useHostPtr) {
        error = cl::copy(queue, startIterator, endIterator, *this);
        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }
}

inline cl_int enqueueReadBuffer(
    const Buffer& buffer, cl_bool blocking,
    ::size_t offset, ::size_t size, void* ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueReadBuffer(buffer, blocking, offset, size, ptr, events, event);
}

inline cl_int enqueueWriteBuffer(
    const Buffer& buffer, cl_bool blocking,
    ::size_t offset, ::size_t size, const void* ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueWriteBuffer(buffer, blocking, offset, size, ptr, events, event);
}
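/*
 * Illustrative sketch (not part of the original header): building a device
 * buffer directly from a host iterator range; when useHostPtr is false the
 * constructor copies the range onto the device via cl::copy. The name ctx
 * is hypothetical.
 *
 * \code
 * std::vector<int> data(1024, 42);
 * cl_int err;
 * cl::Buffer buf(ctx, data.begin(), data.end(),
 *                true,   // readOnly
 *                false,  // useHostPtr
 *                &err);
 * \endcode
 */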
inline void* enqueueMapBuffer(
    const Buffer& buffer, cl_bool blocking, cl_map_flags flags,
    ::size_t offset, ::size_t size,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL,
    cl_int* err = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
    if (err != NULL) {
        *err = error;
    }

    cl_event tmp;
    void * result = ::clEnqueueMapBuffer(
        queue(), buffer(), blocking, flags, offset, size,
        (events != NULL) ? (cl_uint) events->size() : 0,
        (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
        (event != NULL) ? &tmp : NULL,
        &error);

    detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
    if (err != NULL) {
        *err = error;
    }
    // Hand the raw event handle to the wrapper the same way the member
    // function does, so the Event object takes ownership of it.
    if (event != NULL && error == CL_SUCCESS)
        *event = tmp;

    return result;
}

inline cl_int enqueueUnmapMemObject(
    const Memory& memory, void* mapped_ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
    if (error != CL_SUCCESS) {
        return error;
    }

    cl_event tmp;
    cl_int err = detail::errHandler(
        ::clEnqueueUnmapMemObject(
            queue(), memory(), mapped_ptr,
            (events != NULL) ? (cl_uint) events->size() : 0,
            (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
            (event != NULL) ? &tmp : NULL),
        __ENQUEUE_UNMAP_MEM_OBJECT_ERR);

    if (event != NULL && err == CL_SUCCESS)
        *event = tmp;

    return err;
}

inline cl_int enqueueCopyBuffer(
    const Buffer& src, const Buffer& dst,
    ::size_t src_offset, ::size_t dst_offset, ::size_t size,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueCopyBuffer(src, dst, src_offset, dst_offset, size, events, event);
}

/**
 * Blocking copy operation between iterators and a buffer.
 * Host to Device.
 * Uses default command queue.
 */
template< typename IteratorType >
inline cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer )
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS)
        return error;

    return cl::copy(queue, startIterator, endIterator, buffer);
}

/**
 * Blocking copy operation between iterators and a buffer.
 * Device to Host.
 * Uses default command queue.
 */
template< typename IteratorType >
inline cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator )
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS)
        return error;

    return cl::copy(queue, buffer, startIterator, endIterator);
}

/**
 * Blocking copy operation between iterators and a buffer.
 * Host to Device.
 * Uses specified queue.
 */
template< typename IteratorType >
inline cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer )
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    ::size_t length = endIterator-startIterator;
    ::size_t byteLength = length*sizeof(DataType);

    DataType *pointer =
        static_cast<DataType*>(queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_WRITE, 0, byteLength, 0, 0, &error));
    // if exceptions enabled, enqueueMapBuffer will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
#if defined(_MSC_VER)
    std::copy(
        startIterator,
        endIterator,
        stdext::checked_array_iterator<DataType*>(
            pointer, length));
#else
    std::copy(startIterator, endIterator, pointer);
#endif
    Event endEvent;
    error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent);
    // if exceptions enabled, enqueueUnmapMemObject will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    endEvent.wait();
    return CL_SUCCESS;
}
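/*
 * Illustrative sketch (not part of the original header): a cl::copy round
 * trip through the default queue; copy() maps the buffer, runs std::copy,
 * then unmaps and waits. The name buf is hypothetical and assumed to be at
 * least as large as the iterator range.
 *
 * \code
 * std::vector<float> v(512, 0.5f), back(512);
 * cl::copy(v.begin(), v.end(), buf);        // host -> device
 * cl::copy(buf, back.begin(), back.end());  // device -> host
 * \endcode
 */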
/**
 * Blocking copy operation between iterators and a buffer.
 * Device to Host.
 * Uses specified queue.
 */
template< typename IteratorType >
inline cl_int copy( const CommandQueue &queue, const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator )
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    ::size_t length = endIterator-startIterator;
    ::size_t byteLength = length*sizeof(DataType);

    DataType *pointer =
        static_cast<DataType*>(queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_READ, 0, byteLength, 0, 0, &error));
    // if exceptions enabled, enqueueMapBuffer will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    std::copy(pointer, pointer + length, startIterator);
    Event endEvent;
    error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent);
    // if exceptions enabled, enqueueUnmapMemObject will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    endEvent.wait();
    return CL_SUCCESS;
}

#if defined(CL_VERSION_1_1)
inline cl_int enqueueReadBufferRect(
    const Buffer& buffer, cl_bool blocking,
    const size_t<3>& buffer_offset, const size_t<3>& host_offset, const size_t<3>& region,
    ::size_t buffer_row_pitch, ::size_t buffer_slice_pitch,
    ::size_t host_row_pitch, ::size_t host_slice_pitch,
    void *ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueReadBufferRect(
        buffer, blocking,
        buffer_offset, host_offset, region,
        buffer_row_pitch, buffer_slice_pitch,
        host_row_pitch, host_slice_pitch,
        ptr, events, event);
}

inline cl_int enqueueWriteBufferRect(
    const Buffer& buffer, cl_bool blocking,
    const size_t<3>& buffer_offset, const size_t<3>& host_offset, const size_t<3>& region,
    ::size_t buffer_row_pitch, ::size_t buffer_slice_pitch,
    ::size_t host_row_pitch, ::size_t host_slice_pitch,
    const void *ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueWriteBufferRect(
        buffer, blocking,
        buffer_offset, host_offset, region,
        buffer_row_pitch, buffer_slice_pitch,
        host_row_pitch, host_slice_pitch,
        ptr, events, event);
}

inline cl_int enqueueCopyBufferRect(
    const Buffer& src, const Buffer& dst,
    const size_t<3>& src_origin, const size_t<3>& dst_origin, const size_t<3>& region,
    ::size_t src_row_pitch, ::size_t src_slice_pitch,
    ::size_t dst_row_pitch, ::size_t dst_slice_pitch,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueCopyBufferRect(
        src, dst,
        src_origin, dst_origin, region,
        src_row_pitch, src_slice_pitch,
        dst_row_pitch, dst_slice_pitch,
        events, event);
}
#endif

inline cl_int enqueueReadImage(
    const Image& image, cl_bool blocking,
    const size_t<3>& origin, const size_t<3>& region,
    ::size_t row_pitch, ::size_t slice_pitch, void* ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueReadImage(
        image, blocking, origin, region,
        row_pitch, slice_pitch, ptr, events, event);
}

inline cl_int enqueueWriteImage(
    const Image& image, cl_bool blocking,
    const size_t<3>& origin, const size_t<3>& region,
    ::size_t row_pitch, ::size_t slice_pitch, const void* ptr,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueWriteImage(
        image, blocking, origin, region,
        row_pitch, slice_pitch, ptr, events, event);
}
inline cl_int enqueueCopyImage(
    const Image& src, const Image& dst,
    const size_t<3>& src_origin, const size_t<3>& dst_origin, const size_t<3>& region,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueCopyImage(
        src, dst, src_origin, dst_origin, region, events, event);
}

inline cl_int enqueueCopyImageToBuffer(
    const Image& src, const Buffer& dst,
    const size_t<3>& src_origin, const size_t<3>& region, ::size_t dst_offset,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueCopyImageToBuffer(
        src, dst, src_origin, region, dst_offset, events, event);
}

inline cl_int enqueueCopyBufferToImage(
    const Buffer& src, const Image& dst,
    ::size_t src_offset, const size_t<3>& dst_origin, const size_t<3>& region,
    const VECTOR_CLASS<Event>* events = NULL, Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueCopyBufferToImage(
        src, dst, src_offset, dst_origin, region, events, event);
}

inline cl_int flush(void)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.flush();
}

inline cl_int finish(void)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.finish();
}

// Kernel Functor support
// New interface as of September 2011
// Requires C++11 std::function support (note: TR1's std::tr1::function is not supported)
// Visual Studio 2010 and GCC 4.2

struct EnqueueArgs
{
    CommandQueue queue_;
    const NDRange offset_;
    const NDRange global_;
    const NDRange local_;
    VECTOR_CLASS<Event> events_;

    EnqueueArgs(NDRange global) :
        queue_(CommandQueue::getDefault()), offset_(NullRange),
        global_(global), local_(NullRange) { }

    EnqueueArgs(NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(NullRange),
        global_(global), local_(local) { }

    EnqueueArgs(NDRange offset, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(offset),
        global_(global), local_(local) { }

    EnqueueArgs(Event e, NDRange global) :
        queue_(CommandQueue::getDefault()), offset_(NullRange),
        global_(global), local_(NullRange)
    {
        events_.push_back(e);
    }

    EnqueueArgs(Event e, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(NullRange),
        global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(Event e, NDRange offset, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(offset),
        global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange global) :
        queue_(CommandQueue::getDefault()), offset_(NullRange),
        global_(global), local_(NullRange), events_(events) { }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(NullRange),
        global_(global), local_(local), events_(events) { }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange offset, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(offset),
        global_(global), local_(local), events_(events) { }

    EnqueueArgs(CommandQueue &queue, NDRange global) :
        queue_(queue), offset_(NullRange),
        global_(global), local_(NullRange) { }

    EnqueueArgs(CommandQueue &queue, NDRange global, NDRange local) :
        queue_(queue), offset_(NullRange),
        global_(global), local_(local) { }

    EnqueueArgs(CommandQueue &queue, NDRange offset, NDRange global, NDRange local) :
        queue_(queue), offset_(offset),
        global_(global), local_(local) { }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange global) :
        queue_(queue), offset_(NullRange),
        global_(global), local_(NullRange)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange global, NDRange local) :
        queue_(queue), offset_(NullRange),
        global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange offset, NDRange global, NDRange local) :
        queue_(queue), offset_(offset),
        global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange global) :
        queue_(queue), offset_(NullRange),
        global_(global), local_(NullRange), events_(events) { }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange global, NDRange local) :
        queue_(queue), offset_(NullRange),
        global_(global), local_(local), events_(events) { }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange offset, NDRange global, NDRange local) :
        queue_(queue), offset_(offset),
        global_(global), local_(local), events_(events) { }
};
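/*
 * Illustrative sketch (not part of the original header): EnqueueArgs bundles
 * the queue, ranges and event dependencies consumed by the kernel-functor
 * interface below. The names q and dep are hypothetical.
 *
 * \code
 * cl::EnqueueArgs simple(cl::NDRange(1024));                       // default queue
 * cl::EnqueueArgs grouped(q, cl::NDRange(1024), cl::NDRange(64));  // explicit queue/local
 * cl::EnqueueArgs ordered(q, dep, cl::NDRange(1024));              // waits on event dep
 * \endcode
 */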
namespace detail {

class NullType {};

template<int index, typename T0>
struct SetArg
{
    static void set (Kernel kernel, T0 arg)
    {
        kernel.setArg(index, arg);
    }
};

template<int index>
struct SetArg<index, NullType>
{
    static void set (Kernel, NullType)
    {
    }
};

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27, typename T28, typename T29, typename T30, typename T31>
class KernelFunctorGlobal
{
private:
    Kernel kernel_;

public:
    KernelFunctorGlobal(
        Kernel kernel) :
            kernel_(kernel)
    {}

    KernelFunctorGlobal(
        const Program& program,
        const STRING_CLASS name,
        cl_int * err = NULL) :
            kernel_(program, name.c_str(), err)
    {}

    Event operator() (
        const EnqueueArgs& args,
        T0 t0, T1 t1 = NullType(), T2 t2 = NullType(), T3 t3 = NullType(),
        T4 t4 = NullType(), T5 t5 = NullType(), T6 t6 = NullType(), T7 t7 = NullType(),
        T8 t8 = NullType(), T9 t9 = NullType(), T10 t10 = NullType(), T11 t11 = NullType(),
        T12 t12 = NullType(), T13 t13 = NullType(), T14 t14 = NullType(), T15 t15 = NullType(),
        T16 t16 = NullType(), T17 t17 = NullType(), T18 t18 = NullType(), T19 t19 = NullType(),
        T20 t20 = NullType(), T21 t21 = NullType(), T22 t22 = NullType(), T23 t23 = NullType(),
        T24 t24 = NullType(), T25 t25 = NullType(), T26 t26 = NullType(), T27 t27 = NullType(),
        T28 t28 = NullType(), T29 t29 = NullType(), T30 t30 = NullType(), T31 t31 = NullType())
    {
        Event event;
        SetArg<0, T0>::set(kernel_, t0);     SetArg<1, T1>::set(kernel_, t1);
        SetArg<2, T2>::set(kernel_, t2);     SetArg<3, T3>::set(kernel_, t3);
        SetArg<4, T4>::set(kernel_, t4);     SetArg<5, T5>::set(kernel_, t5);
        SetArg<6, T6>::set(kernel_, t6);     SetArg<7, T7>::set(kernel_, t7);
        SetArg<8, T8>::set(kernel_, t8);     SetArg<9, T9>::set(kernel_, t9);
        SetArg<10, T10>::set(kernel_, t10);  SetArg<11, T11>::set(kernel_, t11);
        SetArg<12, T12>::set(kernel_, t12);  SetArg<13, T13>::set(kernel_, t13);
        SetArg<14, T14>::set(kernel_, t14);  SetArg<15, T15>::set(kernel_, t15);
        SetArg<16, T16>::set(kernel_, t16);  SetArg<17, T17>::set(kernel_, t17);
        SetArg<18, T18>::set(kernel_, t18);  SetArg<19, T19>::set(kernel_, t19);
        SetArg<20, T20>::set(kernel_, t20);  SetArg<21, T21>::set(kernel_, t21);
        SetArg<22, T22>::set(kernel_, t22);  SetArg<23, T23>::set(kernel_, t23);
        SetArg<24, T24>::set(kernel_, t24);  SetArg<25, T25>::set(kernel_, t25);
        SetArg<26, T26>::set(kernel_, t26);  SetArg<27, T27>::set(kernel_, t27);
        SetArg<28, T28>::set(kernel_, t28);  SetArg<29, T29>::set(kernel_, t29);
        SetArg<30, T30>::set(kernel_, t30);  SetArg<31, T31>::set(kernel_, t31);

        args.queue_.enqueueNDRangeKernel(
            kernel_, args.offset_, args.global_, args.local_,
            &args.events_, &event);

        return event;
    }
};
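/*
 * Illustrative sketch (not part of the original header): the functor
 * machinery here is normally consumed through a make_kernel-style wrapper
 * (assumed to be defined later in this header); under that assumption, a
 * kernel taking (buffer, int) can be invoked like a C++ function. The names
 * program and buf are hypothetical.
 *
 * \code
 * cl::make_kernel<cl::Buffer, int> scale(program, "scale");
 * cl::Event e = scale(cl::EnqueueArgs(cl::NDRange(1024)), buf, 2);
 * \endcode
 */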
//------------------------------------------------------------------------------------------------------

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27, typename T28, typename T29, typename T30, typename T31>
struct functionImplementation_
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 32))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29, T30 arg30, T31 arg31)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
            arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23,
            arg24, arg25, arg26, arg27, arg28, arg29, arg30, arg31);
    }
};
template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27, typename T28, typename T29, typename T30>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30,
    NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30,
        NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 31))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29, T30 arg30)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
            arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23,
            arg24, arg25, arg26, arg27, arg28, arg29, arg30);
    }
};

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27, typename T28, typename T29>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29,
    NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29,
        NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 30))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
            arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23,
            arg24, arg25, arg26, arg27, arg28, arg29);
    }
};

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27, typename T28>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28,
    NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28,
        NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 29))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
            arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23,
            arg24, arg25, arg26, arg27, arg28);
    }
};

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27,
    NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27,
        NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 28))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26, T27 arg27)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
            arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23,
            arg24, arg25, arg26, arg27);
    }
};

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26,
    NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26,
        NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 27))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
            arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23,
            arg24, arg25, arg26);
    }
};

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25,
    NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25,
        NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 26))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
            arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23,
            arg24, arg25);
    }
};

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 25))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
            arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23,
            arg24);
    }
};

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 24))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
            arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23);
    }
};

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 23))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
            arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22);
    }
};

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 22))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
            arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21);
    }
};
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 21)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 20)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 19)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. 
Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 18)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 17)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 16)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! 
\brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 15)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 14)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! 
\brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 13)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 12)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! 
\brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 11)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 10)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 9)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 8)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 7)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 6)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4> struct functionImplementation_ < T0, T1, T2, T3, T4, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 5)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4); } }; template< typename T0, typename T1, typename T2, typename T3> struct functionImplementation_ < T0, T1, T2, T3, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 4)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3); } }; template< typename T0, typename T1, typename T2> struct functionImplementation_ < T0, T1, T2, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 3)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2) { return functor_( enqueueArgs, arg0, arg1, arg2); } }; template< typename T0, typename T1> struct functionImplementation_ < T0, T1, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 2)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1) { return functor_( enqueueArgs, arg0, arg1); } }; template< typename T0> struct functionImplementation_ < T0, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 1)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0) { return functor_( enqueueArgs, arg0); } }; } // namespace detail //---------------------------------------------------------------------------------------------- template < typename T0, typename T1 = detail::NullType, typename T2 = detail::NullType, typename T3 = detail::NullType, typename T4 = detail::NullType, typename T5 = detail::NullType, typename T6 = detail::NullType, typename T7 = detail::NullType, typename T8 = detail::NullType, typename T9 = detail::NullType, typename T10 = detail::NullType, typename T11 = detail::NullType, typename T12 = detail::NullType, typename T13 = detail::NullType, typename T14 = detail::NullType, typename T15 = detail::NullType, typename T16 = detail::NullType, typename T17 = detail::NullType, typename T18 = detail::NullType, typename T19 = detail::NullType, typename T20 = detail::NullType, typename T21 = detail::NullType, typename T22 = detail::NullType, typename T23 = detail::NullType, typename T24 = detail::NullType, typename T25 = detail::NullType, typename T26 = detail::NullType, typename T27 = detail::NullType, typename T28 = detail::NullType, typename T29 = detail::NullType, typename T30 = detail::NullType, typename T31 = detail::NullType > struct make_kernel : public detail::functionImplementation_< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 > { public: typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 > FunctorType; make_kernel( const Program& program, const STRING_CLASS name, cl_int * err = NULL) : detail::functionImplementation_< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 >( FunctorType(program, name, err)) {} make_kernel( const Kernel kernel) : 
detail::functionImplementation_<
        T0,  T1,  T2,  T3,  T4,  T5,  T6,  T7,
        T8,  T9,  T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23,
        T24, T25, T26, T27, T28, T29, T30, T31
    >(
        FunctorType(kernel))
    {}
};

//----------------------------------------------------------------------------------------------------------------------

#undef __ERR_STR
#if !defined(__CL_USER_OVERRIDE_ERROR_STRINGS)
#undef __GET_DEVICE_INFO_ERR
#undef __GET_PLATFORM_INFO_ERR
#undef __GET_DEVICE_IDS_ERR
#undef __GET_CONTEXT_INFO_ERR
#undef __GET_EVENT_INFO_ERR
#undef __GET_EVENT_PROFILE_INFO_ERR
#undef __GET_MEM_OBJECT_INFO_ERR
#undef __GET_IMAGE_INFO_ERR
#undef __GET_SAMPLER_INFO_ERR
#undef __GET_KERNEL_INFO_ERR
#undef __GET_KERNEL_ARG_INFO_ERR
#undef __GET_KERNEL_WORK_GROUP_INFO_ERR
#undef __GET_PROGRAM_INFO_ERR
#undef __GET_PROGRAM_BUILD_INFO_ERR
#undef __GET_COMMAND_QUEUE_INFO_ERR
#undef __CREATE_CONTEXT_ERR
#undef __CREATE_CONTEXT_FROM_TYPE_ERR
#undef __GET_SUPPORTED_IMAGE_FORMATS_ERR
#undef __CREATE_BUFFER_ERR
#undef __CREATE_SUBBUFFER_ERR
#undef __CREATE_IMAGE2D_ERR
#undef __CREATE_IMAGE3D_ERR
#undef __CREATE_SAMPLER_ERR
#undef __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR
#undef __CREATE_USER_EVENT_ERR
#undef __SET_USER_EVENT_STATUS_ERR
#undef __SET_EVENT_CALLBACK_ERR
#undef __SET_PRINTF_CALLBACK_ERR
#undef __WAIT_FOR_EVENTS_ERR
#undef __CREATE_KERNEL_ERR
#undef __SET_KERNEL_ARGS_ERR
#undef __CREATE_PROGRAM_WITH_SOURCE_ERR
#undef __CREATE_PROGRAM_WITH_BINARY_ERR
#undef __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR
#undef __BUILD_PROGRAM_ERR
#undef __CREATE_KERNELS_IN_PROGRAM_ERR
#undef __CREATE_COMMAND_QUEUE_ERR
#undef __SET_COMMAND_QUEUE_PROPERTY_ERR
#undef __ENQUEUE_READ_BUFFER_ERR
#undef __ENQUEUE_WRITE_BUFFER_ERR
#undef __ENQUEUE_READ_BUFFER_RECT_ERR
#undef __ENQUEUE_WRITE_BUFFER_RECT_ERR
#undef __ENQEUE_COPY_BUFFER_ERR
#undef __ENQEUE_COPY_BUFFER_RECT_ERR
#undef __ENQUEUE_READ_IMAGE_ERR
#undef __ENQUEUE_WRITE_IMAGE_ERR
#undef __ENQUEUE_COPY_IMAGE_ERR
#undef __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR
#undef __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR
#undef __ENQUEUE_MAP_BUFFER_ERR
#undef __ENQUEUE_MAP_IMAGE_ERR
#undef __ENQUEUE_UNMAP_MEM_OBJECT_ERR
#undef __ENQUEUE_NDRANGE_KERNEL_ERR
#undef __ENQUEUE_TASK_ERR
#undef __ENQUEUE_NATIVE_KERNEL
#undef __CL_EXPLICIT_CONSTRUCTORS
#undef __UNLOAD_COMPILER_ERR
#endif //__CL_USER_OVERRIDE_ERROR_STRINGS

#undef __CL_FUNCTION_TYPE

// Extensions
/**
 * Deprecated APIs for 1.2
 */
#if defined(CL_VERSION_1_1)
#undef __INIT_CL_EXT_FCN_PTR
#endif // #if defined(CL_VERSION_1_1)
#undef __CREATE_SUB_DEVICES

#if defined(USE_CL_DEVICE_FISSION)
#undef __PARAM_NAME_DEVICE_FISSION
#endif // USE_CL_DEVICE_FISSION

#undef __DEFAULT_NOT_INITIALIZED
#undef __DEFAULT_BEING_INITIALIZED
#undef __DEFAULT_INITIALIZED

#undef CL_HPP_RVALUE_REFERENCES_SUPPORTED
#undef CL_HPP_NOEXCEPT

} // namespace cl

#endif // CL_HPP_
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/cl2.hpp
/* Modifications Copyright(C)[2021-2022] Advanced Micro Devices, Inc.
 * All rights reserved.
 * */
/*******************************************************************************
 * Copyright (c) 2008-2016 The Khronos Group Inc.
* * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ /*! \file * * \brief C++ bindings for OpenCL 1.0 (rev 48), OpenCL 1.1 (rev 33), * OpenCL 1.2 (rev 15) and OpenCL 2.0 (rev 29) * \author Lee Howes and Bruce Merry * * Derived from the OpenCL 1.x C++ bindings written by * Benedict R. Gaster, Laurent Morichetti and Lee Howes * With additions and fixes from: * Brian Cole, March 3rd 2010 and April 2012 * Matt Gruenke, April 2012. * Bruce Merry, February 2013. * Tom Deakin and Simon McIntosh-Smith, July 2013 * James Price, 2015- * * \version 2.0.10 * \date 2016-07-20 * * Optional extension support * * cl_ext_device_fission * #define CL_HPP_USE_CL_DEVICE_FISSION * cl_khr_d3d10_sharing * #define CL_HPP_USE_DX_INTEROP * cl_khr_sub_groups * #define CL_HPP_USE_CL_SUB_GROUPS_KHR * cl_khr_image2d_from_buffer * #define CL_HPP_USE_CL_IMAGE2D_FROM_BUFFER_KHR * * Doxygen documentation for this header is available here: * * http://khronosgroup.github.io/OpenCL-CLHPP/ * * The latest version of this header can be found on the GitHub releases page: * * https://github.com/KhronosGroup/OpenCL-CLHPP/releases * * Bugs and patches can be submitted to the GitHub repository: * * https://github.com/KhronosGroup/OpenCL-CLHPP */ /*! \mainpage * \section intro Introduction * For many large applications C++ is the language of choice and so it seems * reasonable to define C++ bindings for OpenCL. * * The interface is contained with a single C++ header file \em cl2.hpp and all * definitions are contained within the namespace \em cl. There is no additional * requirement to include \em cl.h and to use either the C++ or original C * bindings; it is enough to simply include \em cl2.hpp. * * The bindings themselves are lightweight and correspond closely to the * underlying C API. Using the C++ bindings introduces no additional execution * overhead. * * There are numerous compatibility, portability and memory management * fixes in the new header as well as additional OpenCL 2.0 features. * As a result the header is not directly backward compatible and for this * reason we release it as cl2.hpp rather than a new version of cl.hpp. 
 *
 *
 * \section compatibility Compatibility
 * Due to the evolution of the underlying OpenCL API the 2.0 C++ bindings
 * include an updated approach to defining supported feature versions
 * and the range of valid underlying OpenCL runtime versions supported.
 *
 * The combination of preprocessor macros CL_HPP_TARGET_OPENCL_VERSION and
 * CL_HPP_MINIMUM_OPENCL_VERSION controls this range. These are three-digit
 * decimal values representing OpenCL runtime versions. The default for
 * the target is 200, representing OpenCL 2.0, and the minimum is also
 * defined as 200. These settings would use 2.0 API calls only.
 * If backward compatibility with a 1.2 runtime is required, the minimum
 * version may be set to 120.
 *
 * Note that this is a compile-time setting, and so affects linking against
 * a particular SDK version rather than the versioning of the loaded runtime.
 *
 * The earlier versions of the header included basic vector and string
 * classes based loosely on STL versions. These were difficult to
 * maintain and very rarely used. For the 2.0 header we now assume
 * the presence of the standard library unless requested otherwise.
 * We use std::array, std::vector, std::shared_ptr and std::string
 * throughout to safely manage memory and reduce the chance of a
 * recurrence of earlier memory management bugs.
 *
 * These classes are used through typedefs in the cl namespace:
 * cl::array, cl::vector, cl::pointer and cl::string.
 * In addition, cl::allocate_pointer forwards to std::allocate_shared
 * by default.
 * In all cases these standard library classes can be replaced with
 * custom interface-compatible versions using the CL_HPP_NO_STD_ARRAY,
 * CL_HPP_NO_STD_VECTOR, CL_HPP_NO_STD_UNIQUE_PTR and
 * CL_HPP_NO_STD_STRING macros.
 *
 * The OpenCL 1.x versions of the C++ bindings included a size_t wrapper
 * class to interface with kernel enqueue. This caused unpleasant interactions
 * with the standard size_t declaration and led to namespacing bugs.
 * In the 2.0 version we have replaced this with a std::array-based interface.
 * However, the old behaviour can be regained for backward compatibility
 * using the CL_HPP_ENABLE_SIZE_T_COMPATIBILITY macro.
 *
 * Finally, the program construction interface used a clumsy vector-of-pairs
 * design in the earlier versions. We have replaced that with a cleaner
 * vector-of-vectors and vector-of-strings design. However, for backward
 * compatibility old behaviour can be regained with the
 * CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY macro.
 *
 * In OpenCL 2.0, OpenCL C is not entirely backward compatible with
 * earlier versions. As a result a flag must be passed to the OpenCL C
 * compiler to request OpenCL 2.0 compilation of kernels, with 1.2 as
 * the default in the absence of the flag.
 * In some cases the C++ bindings automatically compile code for ease.
 * For those cases the compilation defaults to OpenCL C 2.0.
 * If this is not wanted, the CL_HPP_CL_1_2_DEFAULT_BUILD macro may
 * be specified to assume 1.2 compilation.
 * If more fine-grained decisions on a per-kernel basis are required
 * then explicit build operations that take the flag should be used.
 *
 *
 * \section parameterization Parameters
 * This header may be parameterized by a set of preprocessor macros.
 *
 * - CL_HPP_TARGET_OPENCL_VERSION
 *
 *   Defines the target OpenCL runtime version to build the header
 *   against. Defaults to 200, representing OpenCL 2.0.
 *
 * - CL_HPP_NO_STD_STRING
 *
 *   Do not use the standard library string class. cl::string is not
 *   defined and may be defined by the user before cl2.hpp is
 *   included.
 *
 * - CL_HPP_NO_STD_VECTOR
 *
 *   Do not use the standard library vector class. cl::vector is not
 *   defined and may be defined by the user before cl2.hpp is
 *   included.
 *
 * - CL_HPP_NO_STD_ARRAY
 *
 *   Do not use the standard library array class. cl::array is not
 *   defined and may be defined by the user before cl2.hpp is
 *   included.
 *
 * - CL_HPP_NO_STD_UNIQUE_PTR
 *
 *   Do not use the standard library unique_ptr class. cl::pointer and
 *   the cl::allocate_pointer functions are not defined and may be
 *   defined by the user before cl2.hpp is included.
 *
 * - CL_HPP_ENABLE_DEVICE_FISSION
 *
 *   Enables device fission for OpenCL 1.2 platforms.
 *
 * - CL_HPP_ENABLE_EXCEPTIONS
 *
 *   Enable exceptions for use in the C++ bindings header. This is the
 *   preferred error handling mechanism but is not required.
 *
 * - CL_HPP_ENABLE_SIZE_T_COMPATIBILITY
 *
 *   Backward compatibility option to support cl.hpp-style size_t
 *   class. Replaces the updated std::array derived version and
 *   removal of size_t from the namespace. Note that in this case the
 *   new size_t class is placed in the cl::compatibility namespace and
 *   thus requires an additional using declaration for direct backward
 *   compatibility.
 *
 * - CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY
 *
 *   Enable older vector of pairs interface for construction of
 *   programs.
 *
 * - CL_HPP_CL_1_2_DEFAULT_BUILD
 *
 *   Default to OpenCL C 1.2 compilation rather than OpenCL C 2.0; this
 *   applies to use of cl::Program construction and other program
 *   build variants.
 *
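 * As a brief illustration of the parameters above, a project that needs to
 * run against both 1.2 and 2.0 runtimes with exceptions enabled might
 * configure the header as follows (a minimal sketch, using only the macros
 * documented in this section):
 *
 * \code
    #define CL_HPP_ENABLE_EXCEPTIONS
    #define CL_HPP_MINIMUM_OPENCL_VERSION 120
    #define CL_HPP_TARGET_OPENCL_VERSION 200
    #include <CL/cl2.hpp>
 * \endcode
 *
 * Being compile-time settings, these must appear before cl2.hpp is
 * included; they can equally be passed as -D options on the compiler
 * command line.
 *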
 *
 * \section example Example
 *
 * The following example shows a general use case for the C++
 * bindings, including support for the optional exception feature and
 * also the supplied vector and string classes; see the following sections
 * for descriptions of these features.
 *
 * \code
    #define CL_HPP_ENABLE_EXCEPTIONS
    #define CL_HPP_TARGET_OPENCL_VERSION 200

    #include <CL/cl2.hpp>
    #include <iostream>
    #include <vector>
    #include <memory>
    #include <algorithm>

    const int numElements = 32;

    int main(void)
    {
        // Filter for a 2.0 platform and set it as the default
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);
        cl::Platform plat;
        for (auto &p : platforms) {
            std::string platver = p.getInfo<CL_PLATFORM_VERSION>();
            if (platver.find("OpenCL 2.") != std::string::npos) {
                plat = p;
            }
        }
        if (plat() == 0) {
            std::cout << "No OpenCL 2.0 platform found.";
            return -1;
        }

        cl::Platform newP = cl::Platform::setDefault(plat);
        if (newP != plat) {
            std::cout << "Error setting default platform.";
            return -1;
        }

        // Use C++11 raw string literals for kernel source code
        std::string kernel1{R"CLC(
            global int globalA;
            kernel void updateGlobal()
            {
                globalA = 75;
            }
        )CLC"};
        std::string kernel2{R"CLC(
            typedef struct { global int *bar; } Foo;
            kernel void vectorAdd(global const Foo* aNum, global const int *inputA,
                                  global const int *inputB, global int *output,
                                  int val, write_only pipe int outPipe, queue_t childQueue)
            {
                output[get_global_id(0)] = inputA[get_global_id(0)] + inputB[get_global_id(0)] + val + *(aNum->bar);
                write_pipe(outPipe, &val);
                queue_t default_queue = get_default_queue();
                ndrange_t ndrange = ndrange_1D(get_global_size(0)/2, get_global_size(0)/2);

                // Have a child kernel write into third quarter of output
                enqueue_kernel(default_queue, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
                    ^{
                        output[get_global_size(0)*2 + get_global_id(0)] =
                            inputA[get_global_size(0)*2 + get_global_id(0)] + inputB[get_global_size(0)*2 + get_global_id(0)] + globalA;
                    });

                // Have a child kernel write into last quarter of output
                enqueue_kernel(childQueue, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
                    ^{
                        output[get_global_size(0)*3 + get_global_id(0)] =
                            inputA[get_global_size(0)*3 + get_global_id(0)] + inputB[get_global_size(0)*3 + get_global_id(0)] + globalA + 2;
                    });
            }
        )CLC"};

        // New simpler string interface style
        std::vector<std::string> programStrings {kernel1, kernel2};

        cl::Program vectorAddProgram(programStrings);
        try {
            vectorAddProgram.build("-cl-std=CL2.0");
        }
        catch (...) {
            // Print build info for all devices
            cl_int buildErr = CL_SUCCESS;
            auto buildInfo = vectorAddProgram.getBuildInfo<CL_PROGRAM_BUILD_LOG>(&buildErr);
            for (auto &pair : buildInfo) {
                std::cerr << pair.second << std::endl << std::endl;
            }

            return 1;
        }

        typedef struct { int *bar; } Foo;

        // Get and run kernel that initializes the program-scope global
        // A test for kernels that take no arguments
        auto program2Kernel =
            cl::KernelFunctor<>(vectorAddProgram, "updateGlobal");
        program2Kernel(
            cl::EnqueueArgs(
                cl::NDRange(1)));

        //////////////////
        // SVM allocations

        auto anSVMInt = cl::allocate_svm<int, cl::SVMTraitCoarse<>>();
        *anSVMInt = 5;
        cl::SVMAllocator<int, cl::SVMTraitCoarse<cl::SVMTraitReadOnly<>>> svmAllocReadOnly;
        auto fooPointer = cl::allocate_pointer<Foo>(svmAllocReadOnly);
        fooPointer->bar = anSVMInt.get();
        cl::SVMAllocator<int, cl::SVMTraitCoarse<>> svmAlloc;
        std::vector<int, cl::SVMAllocator<int, cl::SVMTraitCoarse<>>> inputA(numElements, 1, svmAlloc);
        cl::coarse_svm_vector<int> inputB(numElements, 2, svmAlloc);

        //
        //////////////

        // Traditional cl_mem allocations
        std::vector<int> output(numElements, 0xdeadbeef);
        cl::Buffer outputBuffer(begin(output), end(output), false);
        cl::Pipe aPipe(sizeof(cl_int), numElements / 2);

        // Default command queue, also passed in as a parameter
        cl::DeviceCommandQueue defaultDeviceQueue = cl::DeviceCommandQueue::makeDefault(
            cl::Context::getDefault(), cl::Device::getDefault());

        auto vectorAddKernel =
            cl::KernelFunctor<
                decltype(fooPointer)&,
                int*,
                cl::coarse_svm_vector<int>&,
                cl::Buffer,
                int,
                cl::Pipe&,
                cl::DeviceCommandQueue
                >(vectorAddProgram, "vectorAdd");

        // Ensure that the additional SVM pointer is available to the kernel
        // This one was not passed as a parameter
        vectorAddKernel.setSVMPointers(anSVMInt);

        // Hand control of coarse allocations to runtime
        cl::enqueueUnmapSVM(anSVMInt);
        cl::enqueueUnmapSVM(fooPointer);
        cl::unmapSVM(inputA);
        cl::unmapSVM(inputB);

        cl_int error;
        vectorAddKernel(
            cl::EnqueueArgs(
                cl::NDRange(numElements/2),
                cl::NDRange(numElements/2)),
            fooPointer,
            inputA.data(),
            inputB,
            outputBuffer,
            3,
            aPipe,
            defaultDeviceQueue,
            error
            );

        cl::copy(outputBuffer, begin(output), end(output));

        // Map a coarse SVM vector back for host access
        cl::mapSVM(inputA);

        cl::Device d = cl::Device::getDefault();

        std::cout << "Output:\n";
        for (int i = 1; i < numElements; ++i) {
            std::cout << "\t" << output[i] << "\n";
        }
        std::cout << "\n\n";

        return 0;
    }
 *
 * \endcode
 *
 */
#ifndef CL_HPP_
#define CL_HPP_

/* Handle deprecated preprocessor definitions. In each case, we only check for
 * the old name if the new name is not defined, so that user code can define
 * both and hence work with either version of the bindings.
 */
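/* As a concrete illustration of that mapping: a legacy project built with
 *
 *     #define __CL_ENABLE_EXCEPTIONS
 *
 * and without the new-style name will, via the checks below, behave as if
 * CL_HPP_ENABLE_EXCEPTIONS had been defined, after a deprecation pragma
 * message is emitted.
 */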
/* Detect which version to target */
#if !defined(CL_HPP_TARGET_OPENCL_VERSION)
# pragma message("cl2.hpp: CL_HPP_TARGET_OPENCL_VERSION is not defined. It will default to 200 (OpenCL 2.0)")
# define CL_HPP_TARGET_OPENCL_VERSION 200
#endif
#if CL_HPP_TARGET_OPENCL_VERSION != 100 && CL_HPP_TARGET_OPENCL_VERSION != 110 && CL_HPP_TARGET_OPENCL_VERSION != 120 && CL_HPP_TARGET_OPENCL_VERSION != 200
# pragma message("cl2.hpp: CL_HPP_TARGET_OPENCL_VERSION is not a valid value (100, 110, 120 or 200). It will be set to 200")
# undef CL_HPP_TARGET_OPENCL_VERSION
# define CL_HPP_TARGET_OPENCL_VERSION 200
#endif

#if !defined(CL_HPP_MINIMUM_OPENCL_VERSION)
# define CL_HPP_MINIMUM_OPENCL_VERSION 200
#endif
#if CL_HPP_MINIMUM_OPENCL_VERSION != 100 && CL_HPP_MINIMUM_OPENCL_VERSION != 110 && CL_HPP_MINIMUM_OPENCL_VERSION != 120 && CL_HPP_MINIMUM_OPENCL_VERSION != 200
# pragma message("cl2.hpp: CL_HPP_MINIMUM_OPENCL_VERSION is not a valid value (100, 110, 120 or 200). It will be set to 100")
# undef CL_HPP_MINIMUM_OPENCL_VERSION
# define CL_HPP_MINIMUM_OPENCL_VERSION 100
#endif
#if CL_HPP_MINIMUM_OPENCL_VERSION > CL_HPP_TARGET_OPENCL_VERSION
# error "CL_HPP_MINIMUM_OPENCL_VERSION must not be greater than CL_HPP_TARGET_OPENCL_VERSION"
#endif

#if CL_HPP_MINIMUM_OPENCL_VERSION <= 100 && !defined(CL_USE_DEPRECATED_OPENCL_1_0_APIS)
# define CL_USE_DEPRECATED_OPENCL_1_0_APIS
#endif
#if CL_HPP_MINIMUM_OPENCL_VERSION <= 110 && !defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
# define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#endif
#if CL_HPP_MINIMUM_OPENCL_VERSION <= 120 && !defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS)
# define CL_USE_DEPRECATED_OPENCL_1_2_APIS
#endif
#if CL_HPP_MINIMUM_OPENCL_VERSION <= 200 && !defined(CL_USE_DEPRECATED_OPENCL_2_0_APIS)
# define CL_USE_DEPRECATED_OPENCL_2_0_APIS
#endif

#ifdef _WIN32

#include <malloc.h>

#if defined(CL_HPP_USE_DX_INTEROP)
#include <CL/cl_d3d10.h>
#include <CL/cl_dx9_media_sharing.h>
#endif
#endif // _WIN32

#if defined(_MSC_VER)
#include <intrin.h>
#endif // _MSC_VER

// Check for a valid C++ version
// Need to do both tests here because for some reason __cplusplus is not
// updated in visual studio
#if (!defined(_MSC_VER) && __cplusplus < 201103L) || (defined(_MSC_VER) && _MSC_VER < 1700)
#error Visual studio 2013 or another C++11-supporting compiler required
#endif

#if defined(CL_HPP_USE_CL_DEVICE_FISSION) || defined(CL_HPP_USE_CL_SUB_GROUPS_KHR)
#include <CL/cl_ext.h>
#endif

#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/opencl.h>
#else
#include <CL/opencl.h>
#endif // !__APPLE__

#if (__cplusplus >= 201103L)
#define CL_HPP_NOEXCEPT_ noexcept
#else
#define CL_HPP_NOEXCEPT_
#endif

#if defined(_MSC_VER)
# define CL_HPP_DEFINE_STATIC_MEMBER_ __declspec(selectany)
#else
# define CL_HPP_DEFINE_STATIC_MEMBER_ __attribute__((weak))
#endif // !_MSC_VER

// Define deprecated prefixes and suffixes to ensure compilation
// in case they are not pre-defined
#if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)
#define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
#endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)
#if !defined(CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED)
#define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
#endif // #if !defined(CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED)

#if !defined(CL_EXT_PREFIX__VERSION_1_2_DEPRECATED)
#define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED
#endif // #if !defined(CL_EXT_PREFIX__VERSION_1_2_DEPRECATED)
#if !defined(CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED)
#define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED
#endif // #if !defined(CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED)

#if !defined(CL_CALLBACK)
#define CL_CALLBACK
#endif // CL_CALLBACK

#include <utility>
#include <limits>
#include <iterator>
#include <mutex>
#include <cstring>
#include <functional>

// Define a size_type to represent a correctly resolved size_t
#if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY)
namespace cl {
    using size_type = ::size_t;
} // namespace cl
#else // #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY)
namespace cl {
    using size_type = size_t;
} // namespace cl
#endif // #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY)

#if defined(CL_HPP_ENABLE_EXCEPTIONS)
#include <exception>
#endif // #if defined(CL_HPP_ENABLE_EXCEPTIONS)

#if !defined(CL_HPP_NO_STD_VECTOR)
#include <vector>
namespace cl {
    template < class T, class Alloc = std::allocator<T> >
    using vector = std::vector<T, Alloc>;
} // namespace cl
#endif // #if !defined(CL_HPP_NO_STD_VECTOR)

#if !defined(CL_HPP_NO_STD_STRING)
#include <string>
namespace cl {
    using string = std::string;
} // namespace cl
#endif // #if !defined(CL_HPP_NO_STD_STRING)
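/* Illustrative opt-out (MyVector is a hypothetical placeholder, not a real
 * type): a project can suppress the std::vector alias above and supply its
 * own cl::vector, provided the alias is declared before this header is
 * included:
 *
 * \code
    #define CL_HPP_NO_STD_VECTOR
    #include <MyVector.hpp> // hypothetical replacement container
    namespace cl {
        template<class T, class Alloc = std::allocator<T>>
        using vector = MyVector<T, Alloc>;
    }
    #include <CL/cl2.hpp>
 * \endcode
 */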
#if CL_HPP_TARGET_OPENCL_VERSION >= 200
#if !defined(CL_HPP_NO_STD_UNIQUE_PTR)
#include <memory>
namespace cl {
    // Replace unique_ptr and allocate_pointer for internal use
    // to allow user to replace them
    template<class T, class D>
    using pointer = std::unique_ptr<T, D>;
} // namespace cl
#endif
#endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200

#if !defined(CL_HPP_NO_STD_ARRAY)
#include <array>
namespace cl {
    template < class T, size_type N >
    using array = std::array<T, N>;
} // namespace cl
#endif // #if !defined(CL_HPP_NO_STD_ARRAY)

// Define size_type appropriately to allow backward-compatibility
// use of the old size_t interface class
#if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY)
namespace cl {
    namespace compatibility {
        /*! \brief class used to interface between C++ and
         *  OpenCL C calls that require arrays of size_t values, whose
         *  size is known statically.
         */
        template <int N>
        class size_t
        {
        private:
            size_type data_[N];

        public:
            //! \brief Initialize size_t to all 0s
            size_t()
            {
                for (int i = 0; i < N; ++i) {
                    data_[i] = 0;
                }
            }

            size_t(const array<size_type, N> &rhs)
            {
                for (int i = 0; i < N; ++i) {
                    data_[i] = rhs[i];
                }
            }

            size_type& operator[](int index)
            {
                return data_[index];
            }

            const size_type& operator[](int index) const
            {
                return data_[index];
            }

            //! \brief Conversion operator to T*.
            operator size_type* ()             { return data_; }

            //! \brief Conversion operator to const T*.
            operator const size_type* () const { return data_; }

            operator array<size_type, N>() const
            {
                array<size_type, N> ret;

                for (int i = 0; i < N; ++i) {
                    ret[i] = data_[i];
                }
                return ret;
            }
        };
    } // namespace compatibility

    template<int N>
    using size_t = compatibility::size_t<N>;
} // namespace cl
#endif // #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY)

// Helper alias to avoid confusing the macros
namespace cl {
    namespace detail {
        using size_t_array = array<size_type, 3>;
    } // namespace detail
} // namespace cl

/*! \namespace cl
 *
 * \brief The OpenCL C++ bindings are defined within this namespace.
 *
 */
namespace cl {
    class Memory;

#define CL_HPP_INIT_CL_EXT_FCN_PTR_(name) \
    if (!pfn_##name) { \
        pfn_##name = (PFN_##name) \
            clGetExtensionFunctionAddress(#name); \
        if (!pfn_##name) { \
        } \
    }

#define CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, name) \
    if (!pfn_##name) { \
        pfn_##name = (PFN_##name) \
            clGetExtensionFunctionAddressForPlatform(platform, #name); \
        if (!pfn_##name) { \
        } \
    }

    class Program;
    class Device;
    class Context;
    class CommandQueue;
    class DeviceCommandQueue;
    class Memory;
    class Buffer;
    class Pipe;
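/* Illustrative use of the compatibility wrapper above (width and height are
 * placeholders): with CL_HPP_ENABLE_SIZE_T_COMPATIBILITY defined, code
 * written against the original cl.hpp size_t<N> interface still compiles:
 *
 * \code
    cl::size_t<3> region;
    region[0] = width;
    region[1] = height;
    region[2] = 1;
 * \endcode
 */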
#if defined(CL_HPP_ENABLE_EXCEPTIONS)
    /*! \brief Exception class
     *
     *  This may be thrown by API functions when CL_HPP_ENABLE_EXCEPTIONS is defined.
     */
    class Error : public std::exception
    {
    private:
        cl_int err_;
        const char * errStr_;

    public:
        /*! \brief Create a new CL error exception for a given error code
         *  and corresponding message.
         *
         *  \param err error code value.
         *
         *  \param errStr a descriptive string that must remain in scope until
         *                handling of the exception has concluded. If set, it
         *                will be returned by what().
         */
        Error(cl_int err, const char * errStr = NULL) : err_(err), errStr_(errStr)
        {}

        ~Error() throw() {}

        /*! \brief Get error string associated with exception
         *
         *  \return A memory pointer to the error message string.
         */
        virtual const char * what() const throw ()
        {
            if (errStr_ == NULL) {
                return "empty";
            }
            else {
                return errStr_;
            }
        }

        /*! \brief Get error code associated with exception
         *
         *  \return The error code.
         */
        cl_int err(void) const { return err_; }
    };
#define CL_HPP_ERR_STR_(x) #x
#else
#define CL_HPP_ERR_STR_(x) NULL
#endif // CL_HPP_ENABLE_EXCEPTIONS

namespace detail
{
#if defined(CL_HPP_ENABLE_EXCEPTIONS)
static inline cl_int errHandler (
    cl_int err,
    const char * errStr = NULL)
{
    if (err != CL_SUCCESS) {
        throw Error(err, errStr);
    }
    return err;
}
#else
static inline cl_int errHandler (cl_int err, const char * errStr = NULL)
{
    (void) errStr; // suppress unused variable warning
    return err;
}
#endif // CL_HPP_ENABLE_EXCEPTIONS
}

//! \cond DOXYGEN_DETAIL
#if !defined(CL_HPP_USER_OVERRIDE_ERROR_STRINGS)
#define __GET_DEVICE_INFO_ERR               CL_HPP_ERR_STR_(clGetDeviceInfo)
#define __GET_PLATFORM_INFO_ERR             CL_HPP_ERR_STR_(clGetPlatformInfo)
#define __GET_DEVICE_IDS_ERR                CL_HPP_ERR_STR_(clGetDeviceIDs)
#define __GET_PLATFORM_IDS_ERR              CL_HPP_ERR_STR_(clGetPlatformIDs)
#define __GET_CONTEXT_INFO_ERR              CL_HPP_ERR_STR_(clGetContextInfo)
#define __GET_EVENT_INFO_ERR                CL_HPP_ERR_STR_(clGetEventInfo)
#define __GET_EVENT_PROFILE_INFO_ERR        CL_HPP_ERR_STR_(clGetEventProfilingInfo)
#define __GET_MEM_OBJECT_INFO_ERR           CL_HPP_ERR_STR_(clGetMemObjectInfo)
#define __GET_IMAGE_INFO_ERR                CL_HPP_ERR_STR_(clGetImageInfo)
#define __GET_SAMPLER_INFO_ERR              CL_HPP_ERR_STR_(clGetSamplerInfo)
#define __GET_KERNEL_INFO_ERR               CL_HPP_ERR_STR_(clGetKernelInfo)
#if CL_HPP_TARGET_OPENCL_VERSION >= 120
#define __GET_KERNEL_ARG_INFO_ERR           CL_HPP_ERR_STR_(clGetKernelArgInfo)
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120
#define __GET_KERNEL_WORK_GROUP_INFO_ERR    CL_HPP_ERR_STR_(clGetKernelWorkGroupInfo)
#define __GET_PROGRAM_INFO_ERR              CL_HPP_ERR_STR_(clGetProgramInfo)
#define __GET_PROGRAM_BUILD_INFO_ERR        CL_HPP_ERR_STR_(clGetProgramBuildInfo)
#define __GET_COMMAND_QUEUE_INFO_ERR        CL_HPP_ERR_STR_(clGetCommandQueueInfo)

#define __CREATE_CONTEXT_ERR                CL_HPP_ERR_STR_(clCreateContext)
#define __CREATE_CONTEXT_FROM_TYPE_ERR      CL_HPP_ERR_STR_(clCreateContextFromType)
#define __GET_SUPPORTED_IMAGE_FORMATS_ERR   CL_HPP_ERR_STR_(clGetSupportedImageFormats)

#define __CREATE_BUFFER_ERR                 CL_HPP_ERR_STR_(clCreateBuffer)
#define __COPY_ERR                          CL_HPP_ERR_STR_(cl::copy)
#define __CREATE_SUBBUFFER_ERR              CL_HPP_ERR_STR_(clCreateSubBuffer)
#define __CREATE_GL_BUFFER_ERR              CL_HPP_ERR_STR_(clCreateFromGLBuffer)
#define __CREATE_GL_RENDER_BUFFER_ERR       CL_HPP_ERR_STR_(clCreateFromGLBuffer)
#define __GET_GL_OBJECT_INFO_ERR            CL_HPP_ERR_STR_(clGetGLObjectInfo)
#if CL_HPP_TARGET_OPENCL_VERSION >= 120
#define __CREATE_IMAGE_ERR                  CL_HPP_ERR_STR_(clCreateImage)
#define __CREATE_GL_TEXTURE_ERR             CL_HPP_ERR_STR_(clCreateFromGLTexture)
#define __IMAGE_DIMENSION_ERR               CL_HPP_ERR_STR_(Incorrect image dimensions)
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120
#define __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR CL_HPP_ERR_STR_(clSetMemObjectDestructorCallback)

#define __CREATE_USER_EVENT_ERR             CL_HPP_ERR_STR_(clCreateUserEvent)
#define __SET_USER_EVENT_STATUS_ERR         CL_HPP_ERR_STR_(clSetUserEventStatus)
#define __SET_EVENT_CALLBACK_ERR            CL_HPP_ERR_STR_(clSetEventCallback)
#define __WAIT_FOR_EVENTS_ERR               CL_HPP_ERR_STR_(clWaitForEvents)

#define __CREATE_KERNEL_ERR                 CL_HPP_ERR_STR_(clCreateKernel)
#define __SET_KERNEL_ARGS_ERR               CL_HPP_ERR_STR_(clSetKernelArg)
#define __CREATE_PROGRAM_WITH_SOURCE_ERR    CL_HPP_ERR_STR_(clCreateProgramWithSource)
#define __CREATE_PROGRAM_WITH_BINARY_ERR    CL_HPP_ERR_STR_(clCreateProgramWithBinary)
#if CL_HPP_TARGET_OPENCL_VERSION >= 120
#define __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR CL_HPP_ERR_STR_(clCreateProgramWithBuiltInKernels)
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120
#define __BUILD_PROGRAM_ERR                 CL_HPP_ERR_STR_(clBuildProgram)
#if CL_HPP_TARGET_OPENCL_VERSION >= 120
#define __COMPILE_PROGRAM_ERR               CL_HPP_ERR_STR_(clCompileProgram)
#define __LINK_PROGRAM_ERR                  CL_HPP_ERR_STR_(clLinkProgram)
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120
#define __CREATE_KERNELS_IN_PROGRAM_ERR     CL_HPP_ERR_STR_(clCreateKernelsInProgram)

#if CL_HPP_TARGET_OPENCL_VERSION >= 200
#define __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR CL_HPP_ERR_STR_(clCreateCommandQueueWithProperties)
#define __CREATE_SAMPLER_WITH_PROPERTIES_ERR       CL_HPP_ERR_STR_(clCreateSamplerWithProperties)
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 200
#define __SET_COMMAND_QUEUE_PROPERTY_ERR    CL_HPP_ERR_STR_(clSetCommandQueueProperty)
#define __ENQUEUE_READ_BUFFER_ERR           CL_HPP_ERR_STR_(clEnqueueReadBuffer)
#define __ENQUEUE_READ_BUFFER_RECT_ERR      CL_HPP_ERR_STR_(clEnqueueReadBufferRect)
#define __ENQUEUE_WRITE_BUFFER_ERR          CL_HPP_ERR_STR_(clEnqueueWriteBuffer)
#define __ENQUEUE_WRITE_BUFFER_RECT_ERR     CL_HPP_ERR_STR_(clEnqueueWriteBufferRect)
#define __ENQEUE_COPY_BUFFER_ERR            CL_HPP_ERR_STR_(clEnqueueCopyBuffer)
#define __ENQEUE_COPY_BUFFER_RECT_ERR       CL_HPP_ERR_STR_(clEnqueueCopyBufferRect)
#define __ENQUEUE_FILL_BUFFER_ERR           CL_HPP_ERR_STR_(clEnqueueFillBuffer)
#define __ENQUEUE_READ_IMAGE_ERR            CL_HPP_ERR_STR_(clEnqueueReadImage)
#define __ENQUEUE_WRITE_IMAGE_ERR           CL_HPP_ERR_STR_(clEnqueueWriteImage)
#define __ENQUEUE_COPY_IMAGE_ERR            CL_HPP_ERR_STR_(clEnqueueCopyImage)
#define __ENQUEUE_FILL_IMAGE_ERR            CL_HPP_ERR_STR_(clEnqueueFillImage)
#define __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR  CL_HPP_ERR_STR_(clEnqueueCopyImageToBuffer)
#define __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR  CL_HPP_ERR_STR_(clEnqueueCopyBufferToImage)
#define __ENQUEUE_MAP_BUFFER_ERR            CL_HPP_ERR_STR_(clEnqueueMapBuffer)
#define __ENQUEUE_MAP_IMAGE_ERR             CL_HPP_ERR_STR_(clEnqueueMapImage)
#define __ENQUEUE_UNMAP_MEM_OBJECT_ERR      CL_HPP_ERR_STR_(clEnqueueUnmapMemObject)
#define __ENQUEUE_NDRANGE_KERNEL_ERR        CL_HPP_ERR_STR_(clEnqueueNDRangeKernel)
#define __ENQUEUE_NATIVE_KERNEL             CL_HPP_ERR_STR_(clEnqueueNativeKernel)
#if CL_HPP_TARGET_OPENCL_VERSION >= 120
#define __ENQUEUE_MIGRATE_MEM_OBJECTS_ERR   CL_HPP_ERR_STR_(clEnqueueMigrateMemObjects)
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120

#define __ENQUEUE_ACQUIRE_GL_ERR            CL_HPP_ERR_STR_(clEnqueueAcquireGLObjects)
#define __ENQUEUE_RELEASE_GL_ERR            CL_HPP_ERR_STR_(clEnqueueReleaseGLObjects)

#define __CREATE_PIPE_ERR                   CL_HPP_ERR_STR_(clCreatePipe)
#define __GET_PIPE_INFO_ERR                 CL_HPP_ERR_STR_(clGetPipeInfo)

#define __RETAIN_ERR                        CL_HPP_ERR_STR_(Retain Object)
#define __RELEASE_ERR                       CL_HPP_ERR_STR_(Release Object)
#define __FLUSH_ERR                         CL_HPP_ERR_STR_(clFlush)
#define __FINISH_ERR                        CL_HPP_ERR_STR_(clFinish)
#define __VECTOR_CAPACITY_ERR               CL_HPP_ERR_STR_(Vector capacity error)
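/* Illustrative override (the message text is a placeholder): a build that
 * wants its own diagnostics can define CL_HPP_USER_OVERRIDE_ERROR_STRINGS and
 * supply every __*_ERR macro its translation units end up using; one entry is
 * shown here:
 *
 * \code
    #define CL_HPP_USER_OVERRIDE_ERROR_STRINGS
    #define __CREATE_BUFFER_ERR "buffer creation failed (myapp)"
    // ... remaining __*_ERR strings used by the application ...
    #include <CL/cl2.hpp>
 * \endcode
 */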
/**
 * CL 1.2 version that uses device fission.
 */
#if CL_HPP_TARGET_OPENCL_VERSION >= 120
#define __CREATE_SUB_DEVICES_ERR            CL_HPP_ERR_STR_(clCreateSubDevices)
#else
#define __CREATE_SUB_DEVICES_ERR            CL_HPP_ERR_STR_(clCreateSubDevicesEXT)
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120

/**
 * Deprecated APIs for 1.2
 */
#if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
#define __ENQUEUE_MARKER_ERR                CL_HPP_ERR_STR_(clEnqueueMarker)
#define __ENQUEUE_WAIT_FOR_EVENTS_ERR       CL_HPP_ERR_STR_(clEnqueueWaitForEvents)
#define __ENQUEUE_BARRIER_ERR               CL_HPP_ERR_STR_(clEnqueueBarrier)
#define __UNLOAD_COMPILER_ERR               CL_HPP_ERR_STR_(clUnloadCompiler)
#define __CREATE_GL_TEXTURE_2D_ERR          CL_HPP_ERR_STR_(clCreateFromGLTexture2D)
#define __CREATE_GL_TEXTURE_3D_ERR          CL_HPP_ERR_STR_(clCreateFromGLTexture3D)
#define __CREATE_IMAGE2D_ERR                CL_HPP_ERR_STR_(clCreateImage2D)
#define __CREATE_IMAGE3D_ERR                CL_HPP_ERR_STR_(clCreateImage3D)
#endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)

/**
 * Deprecated APIs for 2.0
 */
#if defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS)
#define __CREATE_COMMAND_QUEUE_ERR          CL_HPP_ERR_STR_(clCreateCommandQueue)
#define __ENQUEUE_TASK_ERR                  CL_HPP_ERR_STR_(clEnqueueTask)
#define __CREATE_SAMPLER_ERR                CL_HPP_ERR_STR_(clCreateSampler)
#endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS)

/**
 * CL 1.2 marker and barrier commands
 */
#if CL_HPP_TARGET_OPENCL_VERSION >= 120
#define __ENQUEUE_MARKER_WAIT_LIST_ERR      CL_HPP_ERR_STR_(clEnqueueMarkerWithWaitList)
#define __ENQUEUE_BARRIER_WAIT_LIST_ERR     CL_HPP_ERR_STR_(clEnqueueBarrierWithWaitList)
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120

#endif // CL_HPP_USER_OVERRIDE_ERROR_STRINGS
//! \endcond

namespace detail {

// Generic getInfoHelper. The final parameter is used to guide overload
// resolution: the actual parameter passed is an int, which makes this
// a worse conversion sequence than a specialization that declares the
// parameter as an int.
template<typename Functor, typename T>
inline cl_int getInfoHelper(Functor f, cl_uint name, T* param, long)
{
    return f(name, sizeof(T), param, NULL);
}

// Specialized for getInfo<CL_PROGRAM_BINARIES>
// Assumes that the output vector was correctly resized on the way in
template <typename Func>
inline cl_int getInfoHelper(Func f, cl_uint name, vector<vector<unsigned char>>* param, int)
{
    if (name != CL_PROGRAM_BINARIES) {
        return CL_INVALID_VALUE;
    }
    if (param) {
        // Create array of pointers, calculate total size and pass pointer array in
        size_type numBinaries = param->size();
        vector<unsigned char*> binariesPointers(numBinaries);

        for (size_type i = 0; i < numBinaries; ++i)
        {
            binariesPointers[i] = (*param)[i].data();
        }

        cl_int err = f(name, numBinaries * sizeof(unsigned char*), binariesPointers.data(), NULL);

        if (err != CL_SUCCESS) {
            return err;
        }
    }

    return CL_SUCCESS;
}

// Specialized getInfoHelper for vector params
template <typename Func, typename T>
inline cl_int getInfoHelper(Func f, cl_uint name, vector<T>* param, long)
{
    size_type required;
    cl_int err = f(name, 0, NULL, &required);
    if (err != CL_SUCCESS) {
        return err;
    }
    const size_type elements = required / sizeof(T);

    // Temporary to avoid changing param on an error
    vector<T> localData(elements);
    err = f(name, required, localData.data(), NULL);
    if (err != CL_SUCCESS) {
        return err;
    }
    if (param) {
        *param = std::move(localData);
    }

    return CL_SUCCESS;
}

/* Specialization for reference-counted types. This depends on the
 * existence of Wrapper<T>::cl_type, and none of the other types having the
 * cl_type member. Note that simply specifying the parameter as Wrapper<T>
 * does not work, because when using a derived type (e.g. Context) the generic
 * template will provide a better match.
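 *
 * Illustrative call path (assumes a valid default context; not from the
 * original docs): a query such as
 *
 *     cl::Context ctx = cl::Context::getDefault();
 *     cl::vector<cl::Device> devs = ctx.getInfo<CL_CONTEXT_DEVICES>();
 *
 * resolves to the vector-of-wrapper specialization below, which retains each
 * returned cl_device_id as it wraps it in a cl::Device.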
*/ template inline cl_int getInfoHelper( Func f, cl_uint name, vector* param, int, typename T::cl_type = 0) { size_type required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } const size_type elements = required / sizeof(typename T::cl_type); vector value(elements); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { // Assign to convert CL type to T for each element param->resize(elements); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < elements; i++) { (*param)[i] = T(value[i], true); } } return CL_SUCCESS; } // Specialized GetInfoHelper for string params template inline cl_int getInfoHelper(Func f, cl_uint name, string* param, long) { size_type required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } // std::string has a constant data member // a char vector does not if (required > 0) { vector value(required); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { param->assign(begin(value), prev(end(value))); } } else if (param) { param->assign(""); } return CL_SUCCESS; } // Specialized GetInfoHelper for clsize_t params template inline cl_int getInfoHelper(Func f, cl_uint name, array* param, long) { size_type required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } size_type elements = required / sizeof(size_type); vector value(elements, 0); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } // Bound the copy with N to prevent overruns // if passed N > than the amount copied if (elements > N) { elements = N; } for (size_type i = 0; i < elements; ++i) { (*param)[i] = value[i]; } return CL_SUCCESS; } template struct ReferenceHandler; /* Specialization for reference-counted types. This depends on the * existence of Wrapper::cl_type, and none of the other types having the * cl_type member. Note that simplify specifying the parameter as Wrapper * does not work, because when using a derived type (e.g. Context) the generic * template will provide a better match. 
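 *
 * Illustrative call path (program is assumed to be a valid cl::Program; not
 * from the original docs): a scalar wrapper query such as
 *
 *     cl::Context progCtx = program.getInfo<CL_PROGRAM_CONTEXT>();
 *
 * lands in the single-wrapper specialization below, which retains the
 * returned cl_context before handing it to the caller.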
*/ template inline cl_int getInfoHelper(Func f, cl_uint name, T* param, int, typename T::cl_type = 0) { typename T::cl_type value; cl_int err = f(name, sizeof(value), &value, NULL); if (err != CL_SUCCESS) { return err; } *param = value; if (value != NULL) { err = param->retain(); if (err != CL_SUCCESS) { return err; } } return CL_SUCCESS; } #define CL_HPP_PARAM_NAME_INFO_1_0_(F) \ F(cl_platform_info, CL_PLATFORM_PROFILE, string) \ F(cl_platform_info, CL_PLATFORM_VERSION, string) \ F(cl_platform_info, CL_PLATFORM_NAME, string) \ F(cl_platform_info, CL_PLATFORM_VENDOR, string) \ F(cl_platform_info, CL_PLATFORM_EXTENSIONS, string) \ \ F(cl_device_info, CL_DEVICE_TYPE, cl_device_type) \ F(cl_device_info, CL_DEVICE_VENDOR_ID, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_COMPUTE_UNITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_GROUP_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_SIZES, cl::vector) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_CLOCK_FREQUENCY, cl_uint) \ F(cl_device_info, CL_DEVICE_ADDRESS_BITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_READ_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_MEM_ALLOC_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_WIDTH, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_HEIGHT, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_WIDTH, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_HEIGHT, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_DEPTH, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_MAX_PARAMETER_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_MAX_SAMPLERS, cl_uint) \ F(cl_device_info, CL_DEVICE_MEM_BASE_ADDR_ALIGN, cl_uint) \ F(cl_device_info, CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_SINGLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, cl_device_mem_cache_type) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, cl_uint)\ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_TYPE, cl_device_local_mem_type) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_ERROR_CORRECTION_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_PROFILING_TIMER_RESOLUTION, size_type) \ F(cl_device_info, CL_DEVICE_ENDIAN_LITTLE, cl_bool) \ F(cl_device_info, CL_DEVICE_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_COMPILER_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_EXECUTION_CAPABILITIES, cl_device_exec_capabilities) \ F(cl_device_info, CL_DEVICE_PLATFORM, cl_platform_id) \ F(cl_device_info, CL_DEVICE_NAME, string) \ F(cl_device_info, CL_DEVICE_VENDOR, string) \ F(cl_device_info, CL_DRIVER_VERSION, string) \ F(cl_device_info, CL_DEVICE_PROFILE, string) \ F(cl_device_info, CL_DEVICE_VERSION, string) \ 
F(cl_device_info, CL_DEVICE_EXTENSIONS, string) \ \ F(cl_context_info, CL_CONTEXT_REFERENCE_COUNT, cl_uint) \ F(cl_context_info, CL_CONTEXT_DEVICES, cl::vector) \ F(cl_context_info, CL_CONTEXT_PROPERTIES, cl::vector) \ \ F(cl_event_info, CL_EVENT_COMMAND_QUEUE, cl::CommandQueue) \ F(cl_event_info, CL_EVENT_COMMAND_TYPE, cl_command_type) \ F(cl_event_info, CL_EVENT_REFERENCE_COUNT, cl_uint) \ F(cl_event_info, CL_EVENT_COMMAND_EXECUTION_STATUS, cl_int) \ \ F(cl_profiling_info, CL_PROFILING_COMMAND_QUEUED, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_SUBMIT, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_START, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_END, cl_ulong) \ \ F(cl_mem_info, CL_MEM_TYPE, cl_mem_object_type) \ F(cl_mem_info, CL_MEM_FLAGS, cl_mem_flags) \ F(cl_mem_info, CL_MEM_SIZE, size_type) \ F(cl_mem_info, CL_MEM_HOST_PTR, void*) \ F(cl_mem_info, CL_MEM_MAP_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_REFERENCE_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_CONTEXT, cl::Context) \ \ F(cl_image_info, CL_IMAGE_FORMAT, cl_image_format) \ F(cl_image_info, CL_IMAGE_ELEMENT_SIZE, size_type) \ F(cl_image_info, CL_IMAGE_ROW_PITCH, size_type) \ F(cl_image_info, CL_IMAGE_SLICE_PITCH, size_type) \ F(cl_image_info, CL_IMAGE_WIDTH, size_type) \ F(cl_image_info, CL_IMAGE_HEIGHT, size_type) \ F(cl_image_info, CL_IMAGE_DEPTH, size_type) \ \ F(cl_sampler_info, CL_SAMPLER_REFERENCE_COUNT, cl_uint) \ F(cl_sampler_info, CL_SAMPLER_CONTEXT, cl::Context) \ F(cl_sampler_info, CL_SAMPLER_NORMALIZED_COORDS, cl_bool) \ F(cl_sampler_info, CL_SAMPLER_ADDRESSING_MODE, cl_addressing_mode) \ F(cl_sampler_info, CL_SAMPLER_FILTER_MODE, cl_filter_mode) \ \ F(cl_program_info, CL_PROGRAM_REFERENCE_COUNT, cl_uint) \ F(cl_program_info, CL_PROGRAM_CONTEXT, cl::Context) \ F(cl_program_info, CL_PROGRAM_NUM_DEVICES, cl_uint) \ F(cl_program_info, CL_PROGRAM_DEVICES, cl::vector) \ F(cl_program_info, CL_PROGRAM_SOURCE, string) \ F(cl_program_info, CL_PROGRAM_BINARY_SIZES, cl::vector) \ F(cl_program_info, CL_PROGRAM_BINARIES, cl::vector>) \ \ F(cl_program_build_info, CL_PROGRAM_BUILD_STATUS, cl_build_status) \ F(cl_program_build_info, CL_PROGRAM_BUILD_OPTIONS, string) \ F(cl_program_build_info, CL_PROGRAM_BUILD_LOG, string) \ \ F(cl_kernel_info, CL_KERNEL_FUNCTION_NAME, string) \ F(cl_kernel_info, CL_KERNEL_NUM_ARGS, cl_uint) \ F(cl_kernel_info, CL_KERNEL_REFERENCE_COUNT, cl_uint) \ F(cl_kernel_info, CL_KERNEL_CONTEXT, cl::Context) \ F(cl_kernel_info, CL_KERNEL_PROGRAM, cl::Program) \ \ F(cl_kernel_work_group_info, CL_KERNEL_WORK_GROUP_SIZE, size_type) \ F(cl_kernel_work_group_info, CL_KERNEL_COMPILE_WORK_GROUP_SIZE, cl::detail::size_t_array) \ F(cl_kernel_work_group_info, CL_KERNEL_LOCAL_MEM_SIZE, cl_ulong) \ \ F(cl_command_queue_info, CL_QUEUE_CONTEXT, cl::Context) \ F(cl_command_queue_info, CL_QUEUE_DEVICE, cl::Device) \ F(cl_command_queue_info, CL_QUEUE_REFERENCE_COUNT, cl_uint) \ F(cl_command_queue_info, CL_QUEUE_PROPERTIES, cl_command_queue_properties) #define CL_HPP_PARAM_NAME_INFO_1_1_(F) \ F(cl_context_info, CL_CONTEXT_NUM_DEVICES, cl_uint)\ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, cl_uint) \ 
F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_DOUBLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_HALF_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_OPENCL_C_VERSION, string) \ \ F(cl_mem_info, CL_MEM_ASSOCIATED_MEMOBJECT, cl::Memory) \ F(cl_mem_info, CL_MEM_OFFSET, size_type) \ \ F(cl_kernel_work_group_info, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, size_type) \ F(cl_kernel_work_group_info, CL_KERNEL_PRIVATE_MEM_SIZE, cl_ulong) \ \ F(cl_event_info, CL_EVENT_CONTEXT, cl::Context) #define CL_HPP_PARAM_NAME_INFO_1_2_(F) \ F(cl_program_info, CL_PROGRAM_NUM_KERNELS, size_type) \ F(cl_program_info, CL_PROGRAM_KERNEL_NAMES, string) \ \ F(cl_program_build_info, CL_PROGRAM_BINARY_TYPE, cl_program_binary_type) \ \ F(cl_kernel_info, CL_KERNEL_ATTRIBUTES, string) \ \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ADDRESS_QUALIFIER, cl_kernel_arg_address_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ACCESS_QUALIFIER, cl_kernel_arg_access_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_NAME, string) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_NAME, string) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_QUALIFIER, cl_kernel_arg_type_qualifier) \ \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE, cl::Device) \ F(cl_device_info, CL_DEVICE_PARTITION_PROPERTIES, cl::vector) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPE, cl::vector) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, size_type) \ F(cl_device_info, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, cl_device_affinity_domain) \ F(cl_device_info, CL_DEVICE_BUILT_IN_KERNELS, string) \ \ F(cl_image_info, CL_IMAGE_ARRAY_SIZE, size_type) \ F(cl_image_info, CL_IMAGE_NUM_MIP_LEVELS, cl_uint) \ F(cl_image_info, CL_IMAGE_NUM_SAMPLES, cl_uint) #define CL_HPP_PARAM_NAME_INFO_2_0_(F) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_HOST_PROPERTIES, cl_command_queue_properties) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES, cl_command_queue_properties) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_ON_DEVICE_QUEUES, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_ON_DEVICE_EVENTS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_PIPE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS, cl_uint) \ F(cl_device_info, CL_DEVICE_PIPE_MAX_PACKET_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_SVM_CAPABILITIES, cl_device_svm_capabilities) \ F(cl_device_info, CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS, cl_uint) \ F(cl_command_queue_info, CL_QUEUE_SIZE, cl_uint) \ F(cl_mem_info, CL_MEM_USES_SVM_POINTER, cl_bool) \ F(cl_program_build_info, CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE, size_type) \ F(cl_pipe_info, CL_PIPE_PACKET_SIZE, cl_uint) \ F(cl_pipe_info, CL_PIPE_MAX_PACKETS, cl_uint) #define CL_HPP_PARAM_NAME_DEVICE_FISSION_(F) \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE_EXT, cl_device_id) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPES_EXT, cl::vector) \ F(cl_device_info, CL_DEVICE_AFFINITY_DOMAINS_EXT, cl::vector) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT_EXT , 
cl_uint) \ F(cl_device_info, CL_DEVICE_PARTITION_STYLE_EXT, cl::vector) template struct param_traits {}; #define CL_HPP_DECLARE_PARAM_TRAITS_(token, param_name, T) \ struct token; \ template<> \ struct param_traits \ { \ enum { value = param_name }; \ typedef T param_type; \ }; CL_HPP_PARAM_NAME_INFO_1_0_(CL_HPP_DECLARE_PARAM_TRAITS_) #if CL_HPP_TARGET_OPENCL_VERSION >= 110 CL_HPP_PARAM_NAME_INFO_1_1_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 #if CL_HPP_TARGET_OPENCL_VERSION >= 120 CL_HPP_PARAM_NAME_INFO_1_2_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 #if CL_HPP_TARGET_OPENCL_VERSION >= 200 CL_HPP_PARAM_NAME_INFO_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 // Flags deprecated in OpenCL 2.0 #define CL_HPP_PARAM_NAME_INFO_1_0_DEPRECATED_IN_2_0_(F) \ F(cl_device_info, CL_DEVICE_QUEUE_PROPERTIES, cl_command_queue_properties) #define CL_HPP_PARAM_NAME_INFO_1_1_DEPRECATED_IN_2_0_(F) \ F(cl_device_info, CL_DEVICE_HOST_UNIFIED_MEMORY, cl_bool) #define CL_HPP_PARAM_NAME_INFO_1_2_DEPRECATED_IN_2_0_(F) \ F(cl_image_info, CL_IMAGE_BUFFER, cl::Buffer) // Include deprecated query flags based on versions // Only include deprecated 1.0 flags if 2.0 not active as there is an enum clash #if CL_HPP_TARGET_OPENCL_VERSION > 100 && CL_HPP_MINIMUM_OPENCL_VERSION < 200 && CL_HPP_TARGET_OPENCL_VERSION < 200 CL_HPP_PARAM_NAME_INFO_1_0_DEPRECATED_IN_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 110 #if CL_HPP_TARGET_OPENCL_VERSION > 110 && CL_HPP_MINIMUM_OPENCL_VERSION < 200 CL_HPP_PARAM_NAME_INFO_1_1_DEPRECATED_IN_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 120 #if CL_HPP_TARGET_OPENCL_VERSION > 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 200 CL_HPP_PARAM_NAME_INFO_1_2_DEPRECATED_IN_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 200 #if defined(CL_HPP_USE_CL_DEVICE_FISSION) CL_HPP_PARAM_NAME_DEVICE_FISSION_(CL_HPP_DECLARE_PARAM_TRAITS_); #endif // CL_HPP_USE_CL_DEVICE_FISSION #ifdef CL_PLATFORM_ICD_SUFFIX_KHR CL_HPP_DECLARE_PARAM_TRAITS_(cl_platform_info, CL_PLATFORM_ICD_SUFFIX_KHR, string) #endif #ifdef CL_DEVICE_PROFILING_TIMER_OFFSET_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_PROFILING_TIMER_OFFSET_AMD, cl_ulong) #endif #ifdef CL_DEVICE_GLOBAL_FREE_MEMORY_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_FREE_MEMORY_AMD, vector) #endif #ifdef CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD, cl_uint) #endif #ifdef CL_DEVICE_SIMD_WIDTH_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_SIMD_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_WAVEFRONT_WIDTH_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_WAVEFRONT_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, 
CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD, cl_uint) #endif #ifdef CL_DEVICE_LOCAL_MEM_BANKS_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_LOCAL_MEM_BANKS_AMD, cl_uint) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV, cl_uint) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV, cl_uint) #endif #ifdef CL_DEVICE_REGISTERS_PER_BLOCK_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_REGISTERS_PER_BLOCK_NV, cl_uint) #endif #ifdef CL_DEVICE_WARP_SIZE_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_WARP_SIZE_NV, cl_uint) #endif #ifdef CL_DEVICE_GPU_OVERLAP_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GPU_OVERLAP_NV, cl_bool) #endif #ifdef CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV, cl_bool) #endif #ifdef CL_DEVICE_INTEGRATED_MEMORY_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_INTEGRATED_MEMORY_NV, cl_bool) #endif // Convenience functions template inline cl_int getInfo(Func f, cl_uint name, T* param) { return getInfoHelper(f, name, param, 0); } template struct GetInfoFunctor0 { Func f_; const Arg0& arg0_; cl_int operator ()( cl_uint param, size_type size, void* value, size_type* size_ret) { return f_(arg0_, param, size, value, size_ret); } }; template struct GetInfoFunctor1 { Func f_; const Arg0& arg0_; const Arg1& arg1_; cl_int operator ()( cl_uint param, size_type size, void* value, size_type* size_ret) { return f_(arg0_, arg1_, param, size, value, size_ret); } }; template inline cl_int getInfo(Func f, const Arg0& arg0, cl_uint name, T* param) { GetInfoFunctor0 f0 = { f, arg0 }; return getInfoHelper(f0, name, param, 0); } template inline cl_int getInfo(Func f, const Arg0& arg0, const Arg1& arg1, cl_uint name, T* param) { GetInfoFunctor1 f0 = { f, arg0, arg1 }; return getInfoHelper(f0, name, param, 0); } template struct ReferenceHandler { }; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * OpenCL 1.2 devices do have retain/release. */ template <> struct ReferenceHandler { /** * Retain the device. * \param device A valid device created using createSubDevices * \return * CL_SUCCESS if the function executed successfully. * CL_INVALID_DEVICE if device was not a valid subdevice * CL_OUT_OF_RESOURCES * CL_OUT_OF_HOST_MEMORY */ static cl_int retain(cl_device_id device) { return ::clRetainDevice(device); } /** * Retain the device. * \param device A valid device created using createSubDevices * \return * CL_SUCCESS if the function executed successfully. * CL_INVALID_DEVICE if device was not a valid subdevice * CL_OUT_OF_RESOURCES * CL_OUT_OF_HOST_MEMORY */ static cl_int release(cl_device_id device) { return ::clReleaseDevice(device); } }; #else // CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * OpenCL 1.1 devices do not have retain/release. */ template <> struct ReferenceHandler { // cl_device_id does not have retain(). static cl_int retain(cl_device_id) { return CL_SUCCESS; } // cl_device_id does not have release(). static cl_int release(cl_device_id) { return CL_SUCCESS; } }; #endif // ! (CL_HPP_TARGET_OPENCL_VERSION >= 120) template <> struct ReferenceHandler { // cl_platform_id does not have retain(). static cl_int retain(cl_platform_id) { return CL_SUCCESS; } // cl_platform_id does not have release(). 
static cl_int release(cl_platform_id) { return CL_SUCCESS; } }; template <> struct ReferenceHandler { static cl_int retain(cl_context context) { return ::clRetainContext(context); } static cl_int release(cl_context context) { return ::clReleaseContext(context); } }; template <> struct ReferenceHandler { static cl_int retain(cl_command_queue queue) { return ::clRetainCommandQueue(queue); } static cl_int release(cl_command_queue queue) { return ::clReleaseCommandQueue(queue); } }; template <> struct ReferenceHandler { static cl_int retain(cl_mem memory) { return ::clRetainMemObject(memory); } static cl_int release(cl_mem memory) { return ::clReleaseMemObject(memory); } }; template <> struct ReferenceHandler { static cl_int retain(cl_sampler sampler) { return ::clRetainSampler(sampler); } static cl_int release(cl_sampler sampler) { return ::clReleaseSampler(sampler); } }; template <> struct ReferenceHandler { static cl_int retain(cl_program program) { return ::clRetainProgram(program); } static cl_int release(cl_program program) { return ::clReleaseProgram(program); } }; template <> struct ReferenceHandler { static cl_int retain(cl_kernel kernel) { return ::clRetainKernel(kernel); } static cl_int release(cl_kernel kernel) { return ::clReleaseKernel(kernel); } }; template <> struct ReferenceHandler { static cl_int retain(cl_event event) { return ::clRetainEvent(event); } static cl_int release(cl_event event) { return ::clReleaseEvent(event); } }; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 // Extracts version number with major in the upper 16 bits, minor in the lower 16 static cl_uint getVersion(const vector &versionInfo) { int highVersion = 0; int lowVersion = 0; int index = 7; while(versionInfo[index] != '.' ) { highVersion *= 10; highVersion += versionInfo[index]-'0'; ++index; } ++index; while(versionInfo[index] != ' ' && versionInfo[index] != '\0') { lowVersion *= 10; lowVersion += versionInfo[index]-'0'; ++index; } return (highVersion << 16) | lowVersion; } static cl_uint getPlatformVersion(cl_platform_id platform) { size_type size = 0; clGetPlatformInfo(platform, CL_PLATFORM_VERSION, 0, NULL, &size); vector versionInfo(size); clGetPlatformInfo(platform, CL_PLATFORM_VERSION, size, versionInfo.data(), &size); return getVersion(versionInfo); } static cl_uint getDevicePlatformVersion(cl_device_id device) { cl_platform_id platform; clGetDeviceInfo(device, CL_DEVICE_PLATFORM, sizeof(platform), &platform, NULL); return getPlatformVersion(platform); } static cl_uint getContextPlatformVersion(cl_context context) { // The platform cannot be queried directly, so we first have to grab a // device and obtain its context size_type size = 0; clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &size); if (size == 0) return 0; vector devices(size/sizeof(cl_device_id)); clGetContextInfo(context, CL_CONTEXT_DEVICES, size, devices.data(), NULL); return getDevicePlatformVersion(devices[0]); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 template class Wrapper { public: typedef T cl_type; protected: cl_type object_; public: Wrapper() : object_(NULL) { } Wrapper(const cl_type &obj, bool retainObject) : object_(obj) { if (retainObject) { detail::errHandler(retain(), __RETAIN_ERR); } } ~Wrapper() { if (object_ != NULL) { release(); } } Wrapper(const Wrapper& rhs) { object_ = rhs.object_; detail::errHandler(retain(), __RETAIN_ERR); } Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT_ { object_ = rhs.object_; rhs.object_ = NULL; } Wrapper& operator 
= (const Wrapper& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; detail::errHandler(retain(), __RETAIN_ERR); } return *this; } Wrapper& operator = (Wrapper&& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; rhs.object_ = NULL; } return *this; } Wrapper& operator = (const cl_type &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs; return *this; } const cl_type& operator ()() const { return object_; } cl_type& operator ()() { return object_; } const cl_type get() const { return object_; } cl_type get() { return object_; } protected: template friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type); cl_int retain() const { if (object_ != nullptr) { return ReferenceHandler::retain(object_); } else { return CL_SUCCESS; } } cl_int release() const { if (object_ != nullptr) { return ReferenceHandler::release(object_); } else { return CL_SUCCESS; } } }; template <> class Wrapper { public: typedef cl_device_id cl_type; protected: cl_type object_; bool referenceCountable_; static bool isReferenceCountable(cl_device_id device) { bool retVal = false; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_MINIMUM_OPENCL_VERSION < 120 if (device != NULL) { int version = getDevicePlatformVersion(device); if(version > ((1 << 16) + 1)) { retVal = true; } } #else // CL_HPP_MINIMUM_OPENCL_VERSION < 120 retVal = true; #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 120 #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 return retVal; } public: Wrapper() : object_(NULL), referenceCountable_(false) { } Wrapper(const cl_type &obj, bool retainObject) : object_(obj), referenceCountable_(false) { referenceCountable_ = isReferenceCountable(obj); if (retainObject) { detail::errHandler(retain(), __RETAIN_ERR); } } ~Wrapper() { release(); } Wrapper(const Wrapper& rhs) { object_ = rhs.object_; referenceCountable_ = isReferenceCountable(object_); detail::errHandler(retain(), __RETAIN_ERR); } Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT_ { object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; rhs.object_ = NULL; rhs.referenceCountable_ = false; } Wrapper& operator = (const Wrapper& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; detail::errHandler(retain(), __RETAIN_ERR); } return *this; } Wrapper& operator = (Wrapper&& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; rhs.object_ = NULL; rhs.referenceCountable_ = false; } return *this; } Wrapper& operator = (const cl_type &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs; referenceCountable_ = isReferenceCountable(object_); return *this; } const cl_type& operator ()() const { return object_; } cl_type& operator ()() { return object_; } const cl_type get() const { return object_; } cl_type get() { return object_; } protected: template friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type); template friend inline cl_int getInfoHelper(Func, cl_uint, vector*, int, typename U::cl_type); cl_int retain() const { if( object_ != nullptr && referenceCountable_ ) { return ReferenceHandler::retain(object_); } else { return CL_SUCCESS; } } cl_int release() const { if (object_ != nullptr && referenceCountable_) { return ReferenceHandler::release(object_); } else { return CL_SUCCESS; } } }; template inline bool operator==(const 
Wrapper &lhs, const Wrapper &rhs) { return lhs() == rhs(); } template inline bool operator!=(const Wrapper &lhs, const Wrapper &rhs) { return !operator==(lhs, rhs); } } // namespace detail //! \endcond using BuildLogType = vector::param_type>>; #if defined(CL_HPP_ENABLE_EXCEPTIONS) /** * Exception class for build errors to carry build info */ class BuildError : public Error { private: BuildLogType buildLogs; public: BuildError(cl_int err, const char * errStr, const BuildLogType &vec) : Error(err, errStr), buildLogs(vec) { } BuildLogType getBuildLog() const { return buildLogs; } }; namespace detail { static inline cl_int buildErrHandler( cl_int err, const char * errStr, const BuildLogType &buildLogs) { if (err != CL_SUCCESS) { throw BuildError(err, errStr, buildLogs); } return err; } } // namespace detail #else namespace detail { static inline cl_int buildErrHandler( cl_int err, const char * errStr, const BuildLogType &buildLogs) { (void)buildLogs; // suppress unused variable warning (void)errStr; return err; } } // namespace detail #endif // #if defined(CL_HPP_ENABLE_EXCEPTIONS) /*! \stuct ImageFormat * \brief Adds constructors and member functions for cl_image_format. * * \see cl_image_format */ struct ImageFormat : public cl_image_format { //! \brief Default constructor - performs no initialization. ImageFormat(){} //! \brief Initializing constructor. ImageFormat(cl_channel_order order, cl_channel_type type) { image_channel_order = order; image_channel_data_type = type; } //! \brief Assignment operator. ImageFormat& operator = (const ImageFormat& rhs) { if (this != &rhs) { this->image_channel_data_type = rhs.image_channel_data_type; this->image_channel_order = rhs.image_channel_order; } return *this; } }; /*! \brief Class interface for cl_device_id. * * \note Copies of these objects are inexpensive, since they don't 'own' * any underlying resources or data structures. * * \see cl_device_id */ class Device : public detail::Wrapper { private: static std::once_flag default_initialized_; static Device default_; static cl_int default_error_; /*! \brief Create the default context. * * This sets @c default_ and @c default_error_. It does not throw * @c cl::Error. */ static void makeDefault(); /*! \brief Create the default platform from a provided platform. * * This sets @c default_. It does not throw * @c cl::Error. */ static void makeDefaultProvided(const Device &p) { default_ = p; } public: #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Reset the default. * * This sets @c default_ to an empty value to support cleanup in * the unit test framework. * This function is not thread safe. */ static void unitTestClearDefault() { default_ = Device(); } #endif // #ifdef CL_HPP_UNIT_TEST_ENABLE //! \brief Default constructor - initializes to NULL. Device() : detail::Wrapper() { } /*! \brief Constructor from cl_device_id. * * This simply copies the device ID value, which is an inexpensive operation. */ explicit Device(const cl_device_id &device, bool retainObject = false) : detail::Wrapper(device, retainObject) { } /*! \brief Returns the first device on the default context. * * \see Context::getDefault() */ static Device getDefault( cl_int *errResult = NULL) { std::call_once(default_initialized_, makeDefault); detail::errHandler(default_error_); if (errResult != NULL) { *errResult = default_error_; } return default_; } /** * Modify the default device to be used by * subsequent operations. * Will only set the default if no default was previously created. * @return updated default device. 
* Should be compared to the passed value to ensure that it was updated. */ static Device setDefault(const Device &default_device) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_device)); detail::errHandler(default_error_); return default_; } /*! \brief Assignment operator from cl_device_id. * * This simply copies the device ID value, which is an inexpensive operation. */ Device& operator = (const cl_device_id& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Device(const Device& dev) : detail::Wrapper(dev) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Device& operator = (const Device &dev) { detail::Wrapper::operator=(dev); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Device(Device&& dev) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(dev)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Device& operator = (Device &&dev) { detail::Wrapper::operator=(std::move(dev)); return *this; } //! \brief Wrapper for clGetDeviceInfo(). template cl_int getInfo(cl_device_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetDeviceInfo, object_, name, param), __GET_DEVICE_INFO_ERR); } //! \brief Wrapper for clGetDeviceInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_device_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /** * CL 1.2 version */ #if CL_HPP_TARGET_OPENCL_VERSION >= 120 //! \brief Wrapper for clCreateSubDevices(). cl_int createSubDevices( const cl_device_partition_property * properties, vector* devices) { cl_uint n = 0; cl_int err = clCreateSubDevices(object_, properties, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } vector ids(n); err = clCreateSubDevices(object_, properties, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } // Cannot trivially assign because we need to capture intermediates // with safe construction if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { // We do not need to retain because this device is being created // by the runtime (*devices)[i] = Device(ids[i], false); } } return CL_SUCCESS; } #elif defined(CL_HPP_USE_CL_DEVICE_FISSION) /** * CL 1.1 version that uses device fission extension. 
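 *
 * Illustrative use (an equal split into two sub-devices; the constants come
 * from cl_ext.h and the device is assumed to support the fission extension):
 *
 *     const cl_device_partition_property_ext props[] = {
 *         CL_DEVICE_PARTITION_EQUALLY_EXT, 2, CL_PROPERTIES_LIST_END_EXT };
 *     cl::vector<cl::Device> subDevices;
 *     device.createSubDevices(props, &subDevices);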
*/ cl_int createSubDevices( const cl_device_partition_property_ext * properties, vector* devices) { typedef CL_API_ENTRY cl_int ( CL_API_CALL * PFN_clCreateSubDevicesEXT)( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1; static PFN_clCreateSubDevicesEXT pfn_clCreateSubDevicesEXT = NULL; CL_HPP_INIT_CL_EXT_FCN_PTR_(clCreateSubDevicesEXT); cl_uint n = 0; cl_int err = pfn_clCreateSubDevicesEXT(object_, properties, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } vector ids(n); err = pfn_clCreateSubDevicesEXT(object_, properties, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } // Cannot trivially assign because we need to capture intermediates // with safe construction if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { // We do not need to retain because this device is being created // by the runtime (*devices)[i] = Device(ids[i], false); } } return CL_SUCCESS; } #endif // defined(CL_HPP_USE_CL_DEVICE_FISSION) }; CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag Device::default_initialized_; CL_HPP_DEFINE_STATIC_MEMBER_ Device Device::default_; CL_HPP_DEFINE_STATIC_MEMBER_ cl_int Device::default_error_ = CL_SUCCESS; /*! \brief Class interface for cl_platform_id. * * \note Copies of these objects are inexpensive, since they don't 'own' * any underlying resources or data structures. * * \see cl_platform_id */ class Platform : public detail::Wrapper { private: static std::once_flag default_initialized_; static Platform default_; static cl_int default_error_; /*! \brief Create the default context. * * This sets @c default_ and @c default_error_. It does not throw * @c cl::Error. */ static void makeDefault() { /* Throwing an exception from a call_once invocation does not do * what we wish, so we catch it and save the error. */ #if defined(CL_HPP_ENABLE_EXCEPTIONS) try #endif { // If default wasn't passed ,generate one // Otherwise set it cl_uint n = 0; cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { default_error_ = err; return; } if (n == 0) { default_error_ = CL_INVALID_PLATFORM; return; } vector ids(n); err = ::clGetPlatformIDs(n, ids.data(), NULL); if (err != CL_SUCCESS) { default_error_ = err; return; } default_ = Platform(ids[0]); } #if defined(CL_HPP_ENABLE_EXCEPTIONS) catch (cl::Error &e) { default_error_ = e.err(); } #endif } /*! \brief Create the default platform from a provided platform. * * This sets @c default_. It does not throw * @c cl::Error. */ static void makeDefaultProvided(const Platform &p) { default_ = p; } public: #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Reset the default. * * This sets @c default_ to an empty value to support cleanup in * the unit test framework. * This function is not thread safe. */ static void unitTestClearDefault() { default_ = Platform(); } #endif // #ifdef CL_HPP_UNIT_TEST_ENABLE //! \brief Default constructor - initializes to NULL. Platform() : detail::Wrapper() { } /*! \brief Constructor from cl_platform_id. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * This simply copies the platform ID value, which is an inexpensive operation. 
*/ explicit Platform(const cl_platform_id &platform, bool retainObject = false) : detail::Wrapper(platform, retainObject) { } /*! \brief Assignment operator from cl_platform_id. * * This simply copies the platform ID value, which is an inexpensive operation. */ Platform& operator = (const cl_platform_id& rhs) { detail::Wrapper::operator=(rhs); return *this; } static Platform getDefault( cl_int *errResult = NULL) { std::call_once(default_initialized_, makeDefault); detail::errHandler(default_error_); if (errResult != NULL) { *errResult = default_error_; } return default_; } /** * Modify the default platform to be used by * subsequent operations. * Will only set the default if no default was previously created. * @return updated default platform. * Should be compared to the passed value to ensure that it was updated. */ static Platform setDefault(const Platform &default_platform) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_platform)); detail::errHandler(default_error_); return default_; } //! \brief Wrapper for clGetPlatformInfo(). cl_int getInfo(cl_platform_info name, string* param) const { return detail::errHandler( detail::getInfo(&::clGetPlatformInfo, object_, name, param), __GET_PLATFORM_INFO_ERR); } //! \brief Wrapper for clGetPlatformInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_platform_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Gets a list of devices for this platform. * * Wraps clGetDeviceIDs(). */ cl_int getDevices( cl_device_type type, vector* devices) const { cl_uint n = 0; if( devices == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR); } cl_int err = ::clGetDeviceIDs(object_, type, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } vector ids(n); err = ::clGetDeviceIDs(object_, type, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } // Cannot trivially assign because we need to capture intermediates // with safe construction // We must retain things we obtain from the API to avoid releasing // API-owned objects. if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { (*devices)[i] = Device(ids[i], true); } } return CL_SUCCESS; } #if defined(CL_HPP_USE_DX_INTEROP) /*! \brief Get the list of available D3D10 devices. * * \param d3d_device_source. * * \param d3d_object. * * \param d3d_device_set. * * \param devices returns a vector of OpenCL D3D10 devices found. The cl::Device * values returned in devices can be used to identify a specific OpenCL * device. If \a devices argument is NULL, this argument is ignored. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully. * * The application can query specific capabilities of the OpenCL device(s) * returned by cl::getDevices. This can be used by the application to * determine which device(s) to use. * * \note In the case that exceptions are enabled and a return value * other than CL_SUCCESS is generated, then cl::Error exception is * generated. 
*/ cl_int getDevices( cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, vector* devices) const { typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clGetDeviceIDsFromD3D10KHR)( cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint* num_devices); if( devices == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR); } static PFN_clGetDeviceIDsFromD3D10KHR pfn_clGetDeviceIDsFromD3D10KHR = NULL; CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(object_, clGetDeviceIDsFromD3D10KHR); cl_uint n = 0; cl_int err = pfn_clGetDeviceIDsFromD3D10KHR( object_, d3d_device_source, d3d_object, d3d_device_set, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } vector ids(n); err = pfn_clGetDeviceIDsFromD3D10KHR( object_, d3d_device_source, d3d_object, d3d_device_set, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } // Cannot trivially assign because we need to capture intermediates // with safe construction // We must retain things we obtain from the API to avoid releasing // API-owned objects. if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { (*devices)[i] = Device(ids[i], true); } } return CL_SUCCESS; } #endif /*! \brief Gets a list of available platforms. * * Wraps clGetPlatformIDs(). */ static cl_int get( vector* platforms) { cl_uint n = 0; if( platforms == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_PLATFORM_IDS_ERR); } cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } vector ids(n); err = ::clGetPlatformIDs(n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } if (platforms) { platforms->resize(ids.size()); // Platforms don't reference count for (size_type i = 0; i < ids.size(); i++) { (*platforms)[i] = Platform(ids[i]); } } return CL_SUCCESS; } /*! \brief Gets the first available platform. * * Wraps clGetPlatformIDs(), returning the first result. */ static cl_int get( Platform * platform) { cl_int err; Platform default_platform = Platform::getDefault(&err); if (platform) { *platform = default_platform; } return err; } /*! \brief Gets the first available platform, returning it by value. * * \return Returns a valid platform if one is available. * If no platform is available will return a null platform. * Throws an exception if no platforms are available * or an error condition occurs. * Wraps clGetPlatformIDs(), returning the first result. */ static Platform get( cl_int * errResult = NULL) { cl_int err; Platform default_platform = Platform::getDefault(&err); if (errResult) { *errResult = err; } return default_platform; } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 //! \brief Wrapper for clUnloadCompiler(). 
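//!
//! An illustrative sketch (assumes an OpenCL 1.2 platform):
//! \code
//! cl::Platform plat = cl::Platform::getDefault();
//! plat.unloadCompiler();  // hints that compiler resources may be released
//! \endcode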
cl_int unloadCompiler() { return ::clUnloadPlatformCompiler(object_); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 }; // class Platform CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag Platform::default_initialized_; CL_HPP_DEFINE_STATIC_MEMBER_ Platform Platform::default_; CL_HPP_DEFINE_STATIC_MEMBER_ cl_int Platform::default_error_ = CL_SUCCESS; /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /** * Unload the OpenCL compiler. * \note Deprecated for OpenCL 1.2. Use Platform::unloadCompiler instead. */ inline CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int UnloadCompiler() CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; inline cl_int UnloadCompiler() { return ::clUnloadCompiler(); } #endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /*! \brief Class interface for cl_context. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_context as the original. For details, see * clRetainContext() and clReleaseContext(). * * \see cl_context */ class Context : public detail::Wrapper { private: static std::once_flag default_initialized_; static Context default_; static cl_int default_error_; /*! \brief Create the default context from the default device type in the default platform. * * This sets @c default_ and @c default_error_. It does not throw * @c cl::Error. */ static void makeDefault() { /* Throwing an exception from a call_once invocation does not do * what we wish, so we catch it and save the error. */ #if defined(CL_HPP_ENABLE_EXCEPTIONS) try #endif { #if !defined(__APPLE__) && !defined(__MACOS) const Platform &p = Platform::getDefault(); cl_platform_id defaultPlatform = p(); cl_context_properties properties[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties)defaultPlatform, 0 }; #else // #if !defined(__APPLE__) && !defined(__MACOS) cl_context_properties *properties = nullptr; #endif // #if !defined(__APPLE__) && !defined(__MACOS) default_ = Context( CL_DEVICE_TYPE_DEFAULT, properties, NULL, NULL, &default_error_); } #if defined(CL_HPP_ENABLE_EXCEPTIONS) catch (cl::Error &e) { default_error_ = e.err(); } #endif } /*! \brief Create the default context from a provided Context. * * This sets @c default_. It does not throw * @c cl::Error. */ static void makeDefaultProvided(const Context &c) { default_ = c; } public: #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Reset the default. * * This sets @c default_ to an empty value to support cleanup in * the unit test framework. * This function is not thread safe. */ static void unitTestClearDefault() { default_ = Context(); } #endif // #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Constructs a context including a list of specified devices. * * Wraps clCreateContext(). 
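 *
 * An illustrative sketch: building a context over the GPU devices of the
 * default platform (assumes at least one GPU device is present).
 * \code
 * std::vector<cl::Device> devices;
 * cl::Platform::getDefault().getDevices(CL_DEVICE_TYPE_GPU, &devices);
 * cl::Context ctx(devices);
 * \endcode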
*/ Context( const vector& devices, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, size_type, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; size_type numDevices = devices.size(); vector deviceIDs(numDevices); for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } object_ = ::clCreateContext( properties, (cl_uint) numDevices, deviceIDs.data(), notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (err != NULL) { *err = error; } } Context( const Device& device, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, size_type, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; cl_device_id deviceID = device(); object_ = ::clCreateContext( properties, 1, &deviceID, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a context including all or a subset of devices of a specified type. * * Wraps clCreateContextFromType(). */ Context( cl_device_type type, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, size_type, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; #if !defined(__APPLE__) && !defined(__MACOS) cl_context_properties prop[4] = {CL_CONTEXT_PLATFORM, 0, 0, 0 }; if (properties == NULL) { // Get a valid platform ID as we cannot send in a blank one vector platforms; error = Platform::get(&platforms); if (error != CL_SUCCESS) { detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } return; } // Check the platforms we found for a device of our specified type cl_context_properties platform_id = 0; for (unsigned int i = 0; i < platforms.size(); i++) { vector devices; #if defined(CL_HPP_ENABLE_EXCEPTIONS) try { #endif error = platforms[i].getDevices(type, &devices); #if defined(CL_HPP_ENABLE_EXCEPTIONS) } catch (Error) {} // Catch if exceptions are enabled as we don't want to exit if first platform has no devices of type // We do error checking next anyway, and can throw there if needed #endif // Only squash CL_SUCCESS and CL_DEVICE_NOT_FOUND if (error != CL_SUCCESS && error != CL_DEVICE_NOT_FOUND) { detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } } if (devices.size() > 0) { platform_id = (cl_context_properties)platforms[i](); break; } } if (platform_id == 0) { detail::errHandler(CL_DEVICE_NOT_FOUND, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = CL_DEVICE_NOT_FOUND; } return; } prop[1] = platform_id; properties = &prop[0]; } #endif object_ = ::clCreateContextFromType( properties, type, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Context(const Context& ctx) : detail::Wrapper(ctx) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Context& operator = (const Context &ctx) { detail::Wrapper::operator=(ctx); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Context(Context&& ctx) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(ctx)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
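 *
 * A small illustrative sketch of the move support this enables:
 * \code
 * cl::Context a(CL_DEVICE_TYPE_DEFAULT);
 * cl::Context b = std::move(a);  // b now owns the refcount; a is null
 * \endcode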
*/ Context& operator = (Context &&ctx) { detail::Wrapper::operator=(std::move(ctx)); return *this; } /*! \brief Returns a singleton context including all devices of CL_DEVICE_TYPE_DEFAULT. * * \note All calls to this function return the same cl_context as the first. */ static Context getDefault(cl_int * err = NULL) { std::call_once(default_initialized_, makeDefault); detail::errHandler(default_error_); if (err != NULL) { *err = default_error_; } return default_; } /** * Modify the default context to be used by * subsequent operations. * Will only set the default if no default was previously created. * @return updated default context. * Should be compared to the passed value to ensure that it was updated. */ static Context setDefault(const Context &default_context) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_context)); detail::errHandler(default_error_); return default_; } //! \brief Default constructor - initializes to NULL. Context() : detail::Wrapper() { } /*! \brief Constructor from cl_context - takes ownership. * * This effectively transfers ownership of a refcount on the cl_context * into the new Context object. */ explicit Context(const cl_context& context, bool retainObject = false) : detail::Wrapper(context, retainObject) { } /*! \brief Assignment operator from cl_context - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseContext() on the value previously held by this instance. */ Context& operator = (const cl_context& rhs) { detail::Wrapper::operator=(rhs); return *this; } //! \brief Wrapper for clGetContextInfo(). template cl_int getInfo(cl_context_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetContextInfo, object_, name, param), __GET_CONTEXT_INFO_ERR); } //! \brief Wrapper for clGetContextInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_context_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Gets a list of supported image formats. * * Wraps clGetSupportedImageFormats(). */ cl_int getSupportedImageFormats( cl_mem_flags flags, cl_mem_object_type type, vector* formats) const { cl_uint numEntries; if (!formats) { return CL_SUCCESS; } cl_int err = ::clGetSupportedImageFormats( object_, flags, type, 0, NULL, &numEntries); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR); } if (numEntries > 0) { vector value(numEntries); err = ::clGetSupportedImageFormats( object_, flags, type, numEntries, (cl_image_format*)value.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR); } formats->assign(begin(value), end(value)); } else { // If no values are being returned, ensure an empty vector comes back formats->clear(); } return CL_SUCCESS; } }; inline void Device::makeDefault() { /* Throwing an exception from a call_once invocation does not do * what we wish, so we catch it and save the error. 
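 *
 * An illustrative aside for Context::getSupportedImageFormats defined
 * above (assumes a valid context named ctx):
 *
 *     std::vector<cl::ImageFormat> formats;
 *     ctx.getSupportedImageFormats(CL_MEM_READ_ONLY,
 *                                  CL_MEM_OBJECT_IMAGE2D, &formats);
 *
 * Each returned entry pairs a channel order with a channel data type.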
 */
#if defined(CL_HPP_ENABLE_EXCEPTIONS)
    try
#endif
    {
        cl_int error = 0;
        Context context = Context::getDefault(&error);
        detail::errHandler(error, __CREATE_CONTEXT_ERR);

        if (error != CL_SUCCESS) {
            default_error_ = error;
        }
        else {
            default_ = context.getInfo<CL_CONTEXT_DEVICES>()[0];
            default_error_ = CL_SUCCESS;
        }
    }
#if defined(CL_HPP_ENABLE_EXCEPTIONS)
    catch (cl::Error &e) {
        default_error_ = e.err();
    }
#endif
}

CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag Context::default_initialized_;
CL_HPP_DEFINE_STATIC_MEMBER_ Context Context::default_;
CL_HPP_DEFINE_STATIC_MEMBER_ cl_int Context::default_error_ = CL_SUCCESS;

/*! \brief Class interface for cl_event.
 *
 * \note Copies of these objects are shallow, meaning that the copy will refer
 *       to the same underlying cl_event as the original.  For details, see
 *       clRetainEvent() and clReleaseEvent().
 *
 * \see cl_event
 */
class Event : public detail::Wrapper<cl_event>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Event() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_event - takes ownership.
     *
     * \param retainObject will cause the constructor to retain its cl object.
     *                     Defaults to false to maintain compatibility with
     *                     earlier versions.
     * This effectively transfers ownership of a refcount on the cl_event
     * into the new Event object.
     */
    explicit Event(const cl_event& event, bool retainObject = false) :
        detail::Wrapper<cl_type>(event, retainObject) { }

    /*! \brief Assignment operator from cl_event - takes ownership.
     *
     * This effectively transfers ownership of a refcount on the rhs and calls
     * clReleaseEvent() on the value previously held by this instance.
     */
    Event& operator = (const cl_event& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    //! \brief Wrapper for clGetEventInfo().
    template <typename T>
    cl_int getInfo(cl_event_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetEventInfo, object_, name, param),
            __GET_EVENT_INFO_ERR);
    }

    //! \brief Wrapper for clGetEventInfo() that returns by value.
    template <cl_event_info name> typename
    detail::param_traits<detail::cl_event_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_event_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    //! \brief Wrapper for clGetEventProfilingInfo().
    template <typename T>
    cl_int getProfilingInfo(cl_profiling_info name, T* param) const
    {
        return detail::errHandler(detail::getInfo(
            &::clGetEventProfilingInfo, object_, name, param),
            __GET_EVENT_PROFILE_INFO_ERR);
    }

    //! \brief Wrapper for clGetEventProfilingInfo() that returns by value.
    template <cl_profiling_info name> typename
    detail::param_traits<detail::cl_profiling_info, name>::param_type
    getProfilingInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_profiling_info, name>::param_type param;
        cl_int result = getProfilingInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    /*! \brief Blocks the calling thread until this event completes.
     *
     * Wraps clWaitForEvents().
     */
    cl_int wait() const
    {
        return detail::errHandler(
            ::clWaitForEvents(1, &object_),
            __WAIT_FOR_EVENTS_ERR);
    }

#if CL_HPP_TARGET_OPENCL_VERSION >= 110
    /*! \brief Registers a user callback function for a specific command execution status.
     *
     * Wraps clSetEventCallback().
     */
    cl_int setCallback(
        cl_int type,
        void (CL_CALLBACK * pfn_notify)(cl_event, cl_int, void *),
        void * user_data = NULL)
    {
        return detail::errHandler(
            ::clSetEventCallback(
                object_,
                type,
                pfn_notify,
                user_data),
            __SET_EVENT_CALLBACK_ERR);
    }
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 110

    /*! \brief Blocks the calling thread until every event specified is complete.
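     *
     * A hedged usage sketch; queue and kernel are hypothetical, and the
     * queue is assumed to have been created with profiling enabled:
     * \code
     * cl::Event ev;
     * queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(1024),
     *                            cl::NullRange, nullptr, &ev);
     * cl::Event::waitForEvents({ev});
     * cl_ulong start = ev.getProfilingInfo<CL_PROFILING_COMMAND_START>();
     * cl_ulong end   = ev.getProfilingInfo<CL_PROFILING_COMMAND_END>();
     * \endcode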
* * Wraps clWaitForEvents(). */ static cl_int waitForEvents(const vector& events) { return detail::errHandler( ::clWaitForEvents( (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL), __WAIT_FOR_EVENTS_ERR); } }; #if CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Class interface for user events (a subset of cl_event's). * * See Event for details about copy semantics, etc. */ class UserEvent : public Event { public: /*! \brief Constructs a user event on a given context. * * Wraps clCreateUserEvent(). */ UserEvent( const Context& context, cl_int * err = NULL) { cl_int error; object_ = ::clCreateUserEvent( context(), &error); detail::errHandler(error, __CREATE_USER_EVENT_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. UserEvent() : Event() { } /*! \brief Sets the execution status of a user event object. * * Wraps clSetUserEventStatus(). */ cl_int setStatus(cl_int status) { return detail::errHandler( ::clSetUserEventStatus(object_,status), __SET_USER_EVENT_STATUS_ERR); } }; #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Blocks the calling thread until every event specified is complete. * * Wraps clWaitForEvents(). */ inline static cl_int WaitForEvents(const vector& events) { return detail::errHandler( ::clWaitForEvents( (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL), __WAIT_FOR_EVENTS_ERR); } /*! \brief Class interface for cl_mem. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_mem as the original. For details, see * clRetainMemObject() and clReleaseMemObject(). * * \see cl_mem */ class Memory : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Memory() : detail::Wrapper() { } /*! \brief Constructor from cl_mem - takes ownership. * * Optionally transfer ownership of a refcount on the cl_mem * into the new Memory object. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * * See Memory for further details. */ explicit Memory(const cl_mem& memory, bool retainObject) : detail::Wrapper(memory, retainObject) { } /*! \brief Assignment operator from cl_mem - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseMemObject() on the value previously held by this instance. */ Memory& operator = (const cl_mem& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Memory(const Memory& mem) : detail::Wrapper(mem) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Memory& operator = (const Memory &mem) { detail::Wrapper::operator=(mem); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Memory(Memory&& mem) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(mem)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Memory& operator = (Memory &&mem) { detail::Wrapper::operator=(std::move(mem)); return *this; } //! \brief Wrapper for clGetMemObjectInfo(). template cl_int getInfo(cl_mem_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetMemObjectInfo, object_, name, param), __GET_MEM_OBJECT_INFO_ERR); } //! \brief Wrapper for clGetMemObjectInfo() that returns by value. 
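//!
//! Illustrative query (assumes a valid cl::Buffer named buf):
//! \code
//! size_t bytes = buf.getInfo<CL_MEM_SIZE>();
//! \endcode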
template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_mem_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } #if CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Registers a callback function to be called when the memory object * is no longer needed. * * Wraps clSetMemObjectDestructorCallback(). * * Repeated calls to this function, for a given cl_mem value, will append * to the list of functions called (in reverse order) when memory object's * resources are freed and the memory object is deleted. * * \note * The registered callbacks are associated with the underlying cl_mem * value - not the Memory class instance. */ cl_int setDestructorCallback( void (CL_CALLBACK * pfn_notify)(cl_mem, void *), void * user_data = NULL) { return detail::errHandler( ::clSetMemObjectDestructorCallback( object_, pfn_notify, user_data), __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 }; // Pre-declare copy functions class Buffer; template< typename IteratorType > cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ); template< typename IteratorType > cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ); template< typename IteratorType > cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ); template< typename IteratorType > cl_int copy( const CommandQueue &queue, const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ); #if CL_HPP_TARGET_OPENCL_VERSION >= 200 namespace detail { class SVMTraitNull { public: static cl_svm_mem_flags getSVMMemFlags() { return 0; } }; } // namespace detail template class SVMTraitReadWrite { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_READ_WRITE | Trait::getSVMMemFlags(); } }; template class SVMTraitReadOnly { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_READ_ONLY | Trait::getSVMMemFlags(); } }; template class SVMTraitWriteOnly { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_WRITE_ONLY | Trait::getSVMMemFlags(); } }; template> class SVMTraitCoarse { public: static cl_svm_mem_flags getSVMMemFlags() { return Trait::getSVMMemFlags(); } }; template> class SVMTraitFine { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_SVM_FINE_GRAIN_BUFFER | Trait::getSVMMemFlags(); } }; template> class SVMTraitAtomic { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS | Trait::getSVMMemFlags(); } }; // Pre-declare SVM map function template inline cl_int enqueueMapSVM( T* ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL); /** * STL-like allocator class for managing SVM objects provided for convenience. * * Note that while this behaves like an allocator for the purposes of constructing vectors and similar objects, * care must be taken when using with smart pointers. * The allocator should not be used to construct a unique_ptr if we are using coarse-grained SVM mode because * the coarse-grained management behaviour would behave incorrectly with respect to reference counting. * * Instead the allocator embeds a Deleter which may be used with unique_ptr and is used * with the allocate_shared and allocate_ptr supplied operations. 
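 *
 * A hedged sketch of typical use with the coarse-grained trait; the
 * coarse_svm_vector alias defined further below wraps the same pattern:
 * \code
 * cl::SVMAllocator<int, cl::SVMTraitCoarse<>> svmAlloc;
 * std::vector<int, cl::SVMAllocator<int, cl::SVMTraitCoarse<>>>
 *     v(1024, 0, svmAlloc);
 * \endcode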
 */
template<typename T, class SVMTrait>
class SVMAllocator {
private:
    Context context_;

public:
    typedef T value_type;
    typedef value_type* pointer;
    typedef const value_type* const_pointer;
    typedef value_type& reference;
    typedef const value_type& const_reference;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;

    template<typename U>
    struct rebind
    {
        typedef SVMAllocator<U, SVMTrait> other;
    };

    template<typename U, typename V>
    friend class SVMAllocator;

    SVMAllocator() :
        context_(Context::getDefault())
    {
    }

    explicit SVMAllocator(cl::Context context) :
        context_(context)
    {
    }

    SVMAllocator(const SVMAllocator &other) :
        context_(other.context_)
    {
    }

    template<typename U>
    SVMAllocator(const SVMAllocator<U, SVMTrait> &other) :
        context_(other.context_)
    {
    }

    ~SVMAllocator()
    {
    }

    pointer address(reference r) CL_HPP_NOEXCEPT_
    {
        return std::addressof(r);
    }

    const_pointer address(const_reference r) CL_HPP_NOEXCEPT_
    {
        return std::addressof(r);
    }

    /**
     * Allocate an SVM pointer.
     *
     * If the allocator is coarse-grained, this will take ownership to allow
     * containers to correctly construct data in place.
     */
    pointer allocate(
        size_type size,
        typename cl::SVMAllocator<void, SVMTrait>::const_pointer = 0)
    {
        // Allocate memory with default alignment matching the size of the type
        void* voidPointer =
            clSVMAlloc(
                context_(),
                SVMTrait::getSVMMemFlags(),
                size*sizeof(T),
                0);
        pointer retValue = reinterpret_cast<pointer>(
            voidPointer);
#if defined(CL_HPP_ENABLE_EXCEPTIONS)
        if (!retValue) {
            std::bad_alloc excep;
            throw excep;
        }
#endif // #if defined(CL_HPP_ENABLE_EXCEPTIONS)

        // If allocation was coarse-grained then map it
        if (!(SVMTrait::getSVMMemFlags() & CL_MEM_SVM_FINE_GRAIN_BUFFER)) {
            cl_int err = enqueueMapSVM(retValue, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, size*sizeof(T));
            if (err != CL_SUCCESS) {
                std::bad_alloc excep;
                throw excep;
            }
        }

        // If exceptions disabled, return null pointer from allocator
        return retValue;
    }

    void deallocate(pointer p, size_type)
    {
        clSVMFree(context_(), p);
    }

    /**
     * Return the maximum possible allocation size.
     * This is the minimum of the maximum sizes of all devices in the context.
     */
    size_type max_size() const CL_HPP_NOEXCEPT_
    {
        size_type maxSize = std::numeric_limits<size_type>::max() / sizeof(T);

        for (const Device &d : context_.getInfo<CL_CONTEXT_DEVICES>()) {
            maxSize = std::min(
                maxSize,
                static_cast<size_type>(d.getInfo<CL_DEVICE_MAX_MEM_ALLOC_SIZE>()));
        }

        return maxSize;
    }

    template< class U, class... Args >
    void construct(U* p, Args&&... args)
    {
        new(p)T(args...);
    }

    template< class U >
    void destroy(U* p)
    {
        p->~U();
    }

    /**
     * Returns true if the contexts match.
     */
    inline bool operator==(SVMAllocator const& rhs)
    {
        return (context_==rhs.context_);
    }

    inline bool operator!=(SVMAllocator const& a)
    {
        return !operator==(a);
    }
}; // class SVMAllocator

template<class SVMTrait>
class SVMAllocator<void, SVMTrait> {
public:
    typedef void value_type;
    typedef value_type* pointer;
    typedef const value_type* const_pointer;

    template<typename U>
    struct rebind
    {
        typedef SVMAllocator<U, SVMTrait> other;
    };

    template<typename U, typename V>
    friend class SVMAllocator;
};

#if !defined(CL_HPP_NO_STD_UNIQUE_PTR)
namespace detail
{
    template<class Alloc>
    class Deleter {
    private:
        Alloc alloc_;
        size_type copies_;

    public:
        typedef typename std::allocator_traits<Alloc>::pointer pointer;

        Deleter(const Alloc &alloc, size_type copies) : alloc_{ alloc }, copies_{ copies }
        {
        }

        void operator()(pointer ptr) const {
            Alloc tmpAlloc{ alloc_ };
            std::allocator_traits<Alloc>::destroy(tmpAlloc, std::addressof(*ptr));
            std::allocator_traits<Alloc>::deallocate(tmpAlloc, ptr, copies_);
        }
    };
} // namespace detail

/**
 * Allocation operation compatible with std::allocate_ptr.
 * Creates a unique_ptr<T> by default.
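 *
 * An illustrative call (assumes OpenCL 2.0 SVM support on the default
 * context):
 * \code
 * cl::SVMAllocator<int, cl::SVMTraitCoarse<>> alloc;
 * auto p = cl::allocate_pointer<int>(alloc, 42);  // cl::pointer<int, ...>
 * \endcode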
* This requirement is to ensure that the control block is not * allocated in memory inaccessible to the host. */ template cl::pointer> allocate_pointer(const Alloc &alloc_, Args&&... args) { Alloc alloc(alloc_); static const size_type copies = 1; // Ensure that creation of the management block and the // object are dealt with separately such that we only provide a deleter T* tmp = std::allocator_traits::allocate(alloc, copies); if (!tmp) { std::bad_alloc excep; throw excep; } try { std::allocator_traits::construct( alloc, std::addressof(*tmp), std::forward(args)...); return cl::pointer>(tmp, detail::Deleter{alloc, copies}); } catch (std::bad_alloc b) { std::allocator_traits::deallocate(alloc, tmp, copies); throw; } } template< class T, class SVMTrait, class... Args > cl::pointer>> allocate_svm(Args... args) { SVMAllocator alloc; return cl::allocate_pointer(alloc, args...); } template< class T, class SVMTrait, class... Args > cl::pointer>> allocate_svm(const cl::Context &c, Args... args) { SVMAllocator alloc(c); return cl::allocate_pointer(alloc, args...); } #endif // #if !defined(CL_HPP_NO_STD_UNIQUE_PTR) /*! \brief Vector alias to simplify contruction of coarse-grained SVM containers. * */ template < class T > using coarse_svm_vector = vector>>; /*! \brief Vector alias to simplify contruction of fine-grained SVM containers. * */ template < class T > using fine_svm_vector = vector>>; /*! \brief Vector alias to simplify contruction of fine-grained SVM containers that support platform atomics. * */ template < class T > using atomic_svm_vector = vector>>; #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Class interface for Buffer Memory Objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Buffer : public Memory { public: /*! \brief Constructs a Buffer in a specified context. * * Wraps clCreateBuffer(). * * \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was * specified. Note alignment & exclusivity requirements. */ Buffer( const Context& context, cl_mem_flags flags, size_type size, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a Buffer in the default context. * * Wraps clCreateBuffer(). * * \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was * specified. Note alignment & exclusivity requirements. * * \see Context::getDefault() */ Buffer( cl_mem_flags flags, size_type size, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(err); object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } /*! * \brief Construct a Buffer from a host container via iterators. * IteratorType must be random access. * If useHostPtr is specified iterators must represent contiguous data. 
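 *
 * An illustrative sketch (assumes a default context is available):
 * uploading a host vector into a new read-only device buffer through
 * this constructor.
 * \code
 * std::vector<float> host(256, 1.0f);
 * cl::Buffer buf(host.begin(), host.end(), true);  // readOnly = true
 * \endcode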
*/ template< typename IteratorType > Buffer( IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if( readOnly ) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if( useHostPtr ) { flags |= CL_MEM_USE_HOST_PTR; } size_type size = sizeof(DataType)*(endIterator - startIterator); Context context = Context::getDefault(err); if( useHostPtr ) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if( !useHostPtr ) { error = cl::copy(startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } /*! * \brief Construct a Buffer from a host container via iterators using a specified context. * IteratorType must be random access. * If useHostPtr is specified iterators must represent contiguous data. */ template< typename IteratorType > Buffer(const Context &context, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL); /*! * \brief Construct a Buffer from a host container via iterators using a specified queue. * If useHostPtr is specified iterators must be random access. */ template< typename IteratorType > Buffer(const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL); //! \brief Default constructor - initializes to NULL. Buffer() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with earlier versions. * * See Memory for further details. */ explicit Buffer(const cl_mem& buffer, bool retainObject = false) : Memory(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Buffer& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Buffer(const Buffer& buf) : Memory(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Buffer& operator = (const Buffer &buf) { Memory::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Buffer(Buffer&& buf) CL_HPP_NOEXCEPT_ : Memory(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Buffer& operator = (Buffer &&buf) { Memory::operator=(std::move(buf)); return *this; } #if CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Creates a new buffer object from this. * * Wraps clCreateSubBuffer(). */ Buffer createSubBuffer( cl_mem_flags flags, cl_buffer_create_type buffer_create_type, const void * buffer_create_info, cl_int * err = NULL) { Buffer result; cl_int error; result.object_ = ::clCreateSubBuffer( object_, flags, buffer_create_type, buffer_create_info, &error); detail::errHandler(error, __CREATE_SUBBUFFER_ERR); if (err != NULL) { *err = error; } return result; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 }; #if defined (CL_HPP_USE_DX_INTEROP) /*! 
\brief Class interface for creating OpenCL buffers from ID3D10Buffer's. * * This is provided to facilitate interoperability with Direct3D. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferD3D10 : public Buffer { public: /*! \brief Constructs a BufferD3D10, in a specified context, from a * given ID3D10Buffer. * * Wraps clCreateFromD3D10BufferKHR(). */ BufferD3D10( const Context& context, cl_mem_flags flags, ID3D10Buffer* bufobj, cl_int * err = NULL) : pfn_clCreateFromD3D10BufferKHR(nullptr) { typedef CL_API_ENTRY cl_mem (CL_API_CALL *PFN_clCreateFromD3D10BufferKHR)( cl_context context, cl_mem_flags flags, ID3D10Buffer* buffer, cl_int* errcode_ret); PFN_clCreateFromD3D10BufferKHR pfn_clCreateFromD3D10BufferKHR; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 vector props = context.getInfo(); cl_platform platform = -1; for( int i = 0; i < props.size(); ++i ) { if( props[i] == CL_CONTEXT_PLATFORM ) { platform = props[i+1]; } } CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, clCreateFromD3D10BufferKHR); #elif CL_HPP_TARGET_OPENCL_VERSION >= 110 CL_HPP_INIT_CL_EXT_FCN_PTR_(clCreateFromD3D10BufferKHR); #endif cl_int error; object_ = pfn_clCreateFromD3D10BufferKHR( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferD3D10() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit BufferD3D10(const cl_mem& buffer, bool retainObject = false) : Buffer(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferD3D10& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferD3D10(const BufferD3D10& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferD3D10& operator = (const BufferD3D10 &buf) { Buffer::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferD3D10(BufferD3D10&& buf) CL_HPP_NOEXCEPT_ : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferD3D10& operator = (BufferD3D10 &&buf) { Buffer::operator=(std::move(buf)); return *this; } }; #endif /*! \brief Class interface for GL Buffer Memory Objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferGL : public Buffer { public: /*! \brief Constructs a BufferGL in a specified context, from a given * GL buffer. * * Wraps clCreateFromGLBuffer(). */ BufferGL( const Context& context, cl_mem_flags flags, cl_GLuint bufobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLBuffer( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferGL() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. 
* See Memory for further details. */ explicit BufferGL(const cl_mem& buffer, bool retainObject = false) : Buffer(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferGL& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL(const BufferGL& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (const BufferGL &buf) { Buffer::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferGL(BufferGL&& buf) CL_HPP_NOEXCEPT_ : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (BufferGL &&buf) { Buffer::operator=(std::move(buf)); return *this; } //! \brief Wrapper for clGetGLObjectInfo(). cl_int getObjectInfo( cl_gl_object_type *type, cl_GLuint * gl_object_name) { return detail::errHandler( ::clGetGLObjectInfo(object_,type,gl_object_name), __GET_GL_OBJECT_INFO_ERR); } }; /*! \brief Class interface for GL Render Buffer Memory Objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferRenderGL : public Buffer { public: /*! \brief Constructs a BufferRenderGL in a specified context, from a given * GL Renderbuffer. * * Wraps clCreateFromGLRenderbuffer(). */ BufferRenderGL( const Context& context, cl_mem_flags flags, cl_GLuint bufobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLRenderbuffer( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_RENDER_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferRenderGL() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit BufferRenderGL(const cl_mem& buffer, bool retainObject = false) : Buffer(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferRenderGL& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferRenderGL(const BufferRenderGL& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferRenderGL& operator = (const BufferRenderGL &buf) { Buffer::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferRenderGL(BufferRenderGL&& buf) CL_HPP_NOEXCEPT_ : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferRenderGL& operator = (BufferRenderGL &&buf) { Buffer::operator=(std::move(buf)); return *this; } //! \brief Wrapper for clGetGLObjectInfo(). cl_int getObjectInfo( cl_gl_object_type *type, cl_GLuint * gl_object_name) { return detail::errHandler( ::clGetGLObjectInfo(object_,type,gl_object_name), __GET_GL_OBJECT_INFO_ERR); } }; /*! \brief C++ base class for Image Memory objects. 
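 *
 * An illustrative aside for the GL interop buffer wrappers above; glBuf
 * is a hypothetical, already-created GL buffer object, and context is
 * assumed to have been created with GL sharing enabled:
 * \code
 * cl::BufferGL clBuf(context, CL_MEM_READ_WRITE, glBuf);
 * cl_gl_object_type type;
 * cl_GLuint name;
 * clBuf.getObjectInfo(&type, &name);
 * \endcode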
* * See Memory for details about copy semantics, etc. * * \see Memory */ class Image : public Memory { protected: //! \brief Default constructor - initializes to NULL. Image() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image(const cl_mem& image, bool retainObject = false) : Memory(image, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image(const Image& img) : Memory(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image& operator = (const Image &img) { Memory::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image(Image&& img) CL_HPP_NOEXCEPT_ : Memory(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image& operator = (Image &&img) { Memory::operator=(std::move(img)); return *this; } public: //! \brief Wrapper for clGetImageInfo(). template cl_int getImageInfo(cl_image_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetImageInfo, object_, name, param), __GET_IMAGE_INFO_ERR); } //! \brief Wrapper for clGetImageInfo() that returns by value. template typename detail::param_traits::param_type getImageInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_image_info, name>::param_type param; cl_int result = getImageInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } }; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \brief Class interface for 1D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image1D : public Image { public: /*! \brief Constructs a 1D Image in a specified context. * * Wraps clCreateImage(). */ Image1D( const Context& context, cl_mem_flags flags, ImageFormat format, size_type width, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D, width, 0, 0, 0, 0, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image1D() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image1D(const cl_mem& image1D, bool retainObject = false) : Image(image1D, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image1D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1D(const Image1D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1D& operator = (const Image1D &img) { Image::operator=(img); return *this; } /*! 
\brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1D(Image1D&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1D& operator = (Image1D &&img) { Image::operator=(std::move(img)); return *this; } }; /*! \class Image1DBuffer * \brief Image interface for 1D buffer images. */ class Image1DBuffer : public Image { public: Image1DBuffer( const Context& context, cl_mem_flags flags, ImageFormat format, size_type width, const Buffer &buffer, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_BUFFER, width, 0, 0, 0, 0, 0, 0, 0, buffer() }; object_ = ::clCreateImage( context(), flags, &format, &desc, NULL, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DBuffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image1DBuffer(const cl_mem& image1D, bool retainObject = false) : Image(image1D, retainObject) { } Image1DBuffer& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer(const Image1DBuffer& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer& operator = (const Image1DBuffer &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1DBuffer(Image1DBuffer&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1DBuffer& operator = (Image1DBuffer &&img) { Image::operator=(std::move(img)); return *this; } }; /*! \class Image1DArray * \brief Image interface for arrays of 1D images. */ class Image1DArray : public Image { public: Image1DArray( const Context& context, cl_mem_flags flags, ImageFormat format, size_type arraySize, size_type width, size_type rowPitch, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_ARRAY, width, 0, 0, // height, depth (unused) arraySize, rowPitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DArray() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image1DArray(const cl_mem& imageArray, bool retainObject = false) : Image(imageArray, retainObject) { } Image1DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray(const Image1DArray& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (const Image1DArray &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. 
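 *
 * An illustrative creation sketch for the 1D image classes above (assumes
 * a valid context named ctx; CL_RGBA / CL_UNORM_INT8 is a commonly
 * supported format):
 * \code
 * cl::ImageFormat fmt(CL_RGBA, CL_UNORM_INT8);
 * cl::Image1D img(ctx, CL_MEM_READ_ONLY, fmt, 1024);
 * \endcode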
*/ Image1DArray(Image1DArray&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (Image1DArray &&img) { Image::operator=(std::move(img)); return *this; } }; #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \brief Class interface for 2D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image2D : public Image { public: /*! \brief Constructs a 2D Image in a specified context. * * Wraps clCreateImage(). */ Image2D( const Context& context, cl_mem_flags flags, ImageFormat format, size_type width, size_type height, size_type row_pitch = 0, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; bool useCreateImage; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 // Run-time decision based on the actual platform { cl_uint version = detail::getContextPlatformVersion(context()); useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above } #elif CL_HPP_TARGET_OPENCL_VERSION >= 120 useCreateImage = true; #else useCreateImage = false; #endif #if CL_HPP_TARGET_OPENCL_VERSION >= 120 if (useCreateImage) { cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D, width, height, 0, 0, // depth, array size (unused) row_pitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_MINIMUM_OPENCL_VERSION < 120 if (!useCreateImage) { object_ = ::clCreateImage2D( context(), flags,&format, width, height, row_pitch, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE2D_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 120 } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 || defined(CL_HPP_USE_CL_IMAGE2D_FROM_BUFFER_KHR) /*! \brief Constructs a 2D Image from a buffer. * \note This will share storage with the underlying buffer. * * Wraps clCreateImage(). */ Image2D( const Context& context, ImageFormat format, const Buffer &sourceBuffer, size_type width, size_type height, size_type row_pitch = 0, cl_int* err = nullptr) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D, width, height, 0, 0, // depth, array size (unused) row_pitch, 0, 0, 0, // Use buffer as input to image sourceBuffer() }; object_ = ::clCreateImage( context(), 0, // flags inherited from buffer &format, &desc, nullptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != nullptr) { *err = error; } } #endif //#if CL_HPP_TARGET_OPENCL_VERSION >= 200 || defined(CL_HPP_USE_CL_IMAGE2D_FROM_BUFFER_KHR) #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Constructs a 2D Image from an image. * \note This will share storage with the underlying image but may * reinterpret the channel order and type. * * The image will be created matching with a descriptor matching the source. * * \param order is the channel order to reinterpret the image data as. * The channel order may differ as described in the OpenCL * 2.0 API specification. * * Wraps clCreateImage(). 
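 *
 * A general Image2D creation sketch for reference (illustrative only,
 * assumes a valid context named ctx):
 * \code
 * cl::ImageFormat fmt(CL_RGBA, CL_UNORM_INT8);
 * cl::Image2D img(ctx, CL_MEM_READ_WRITE, fmt, 640, 480);
 * \endcode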
*/ Image2D( const Context& context, cl_channel_order order, const Image &sourceImage, cl_int* err = nullptr) { cl_int error; // Descriptor fields have to match source image size_type sourceWidth = sourceImage.getImageInfo(); size_type sourceHeight = sourceImage.getImageInfo(); size_type sourceRowPitch = sourceImage.getImageInfo(); cl_uint sourceNumMIPLevels = sourceImage.getImageInfo(); cl_uint sourceNumSamples = sourceImage.getImageInfo(); cl_image_format sourceFormat = sourceImage.getImageInfo(); // Update only the channel order. // Channel format inherited from source. sourceFormat.image_channel_order = order; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D, sourceWidth, sourceHeight, 0, 0, // depth (unused), array size (unused) sourceRowPitch, 0, // slice pitch (unused) sourceNumMIPLevels, sourceNumSamples, // Use buffer as input to image sourceImage() }; object_ = ::clCreateImage( context(), 0, // flags should be inherited from mem_object &sourceFormat, &desc, nullptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != nullptr) { *err = error; } } #endif //#if CL_HPP_TARGET_OPENCL_VERSION >= 200 //! \brief Default constructor - initializes to NULL. Image2D() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image2D(const cl_mem& image2D, bool retainObject = false) : Image(image2D, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image2D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2D(const Image2D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2D& operator = (const Image2D &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2D(Image2D&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2D& operator = (Image2D &&img) { Image::operator=(std::move(img)); return *this; } }; #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /*! \brief Class interface for GL 2D Image Memory objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory * \note Deprecated for OpenCL 1.2. Please use ImageGL instead. */ class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED Image2DGL : public Image2D { public: /*! \brief Constructs an Image2DGL in a specified context, from a given * GL Texture. * * Wraps clCreateFromGLTexture2D(). */ Image2DGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture2D( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_2D_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image2DGL() : Image2D() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. 
*/ explicit Image2DGL(const cl_mem& image, bool retainObject = false) : Image2D(image, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. *c * See Memory for further details. */ Image2DGL& operator = (const cl_mem& rhs) { Image2D::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2DGL(const Image2DGL& img) : Image2D(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2DGL& operator = (const Image2DGL &img) { Image2D::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2DGL(Image2DGL&& img) CL_HPP_NOEXCEPT_ : Image2D(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2DGL& operator = (Image2DGL &&img) { Image2D::operator=(std::move(img)); return *this; } } CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; #endif // CL_USE_DEPRECATED_OPENCL_1_1_APIS #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \class Image2DArray * \brief Image interface for arrays of 2D images. */ class Image2DArray : public Image { public: Image2DArray( const Context& context, cl_mem_flags flags, ImageFormat format, size_type arraySize, size_type width, size_type height, size_type rowPitch, size_type slicePitch, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D_ARRAY, width, height, 0, // depth (unused) arraySize, rowPitch, slicePitch, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image2DArray() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image2DArray(const cl_mem& imageArray, bool retainObject = false) : Image(imageArray, retainObject) { } Image2DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2DArray(const Image2DArray& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2DArray& operator = (const Image2DArray &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2DArray(Image2DArray&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2DArray& operator = (Image2DArray &&img) { Image::operator=(std::move(img)); return *this; } }; #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \brief Class interface for 3D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image3D : public Image { public: /*! \brief Constructs a 3D Image in a specified context. * * Wraps clCreateImage(). 
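 *
 * An illustrative sketch (assumes a valid context named ctx):
 * \code
 * cl::ImageFormat fmt(CL_RGBA, CL_FLOAT);
 * cl::Image3D vol(ctx, CL_MEM_READ_ONLY, fmt, 64, 64, 64);
 * \endcode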
*/ Image3D( const Context& context, cl_mem_flags flags, ImageFormat format, size_type width, size_type height, size_type depth, size_type row_pitch = 0, size_type slice_pitch = 0, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; bool useCreateImage; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 // Run-time decision based on the actual platform { cl_uint version = detail::getContextPlatformVersion(context()); useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above } #elif CL_HPP_TARGET_OPENCL_VERSION >= 120 useCreateImage = true; #else useCreateImage = false; #endif #if CL_HPP_TARGET_OPENCL_VERSION >= 120 if (useCreateImage) { cl_image_desc desc = { CL_MEM_OBJECT_IMAGE3D, width, height, depth, 0, // array size (unused) row_pitch, slice_pitch, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_MINIMUM_OPENCL_VERSION < 120 if (!useCreateImage) { object_ = ::clCreateImage3D( context(), flags, &format, width, height, depth, row_pitch, slice_pitch, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE3D_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 120 } //! \brief Default constructor - initializes to NULL. Image3D() : Image() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image3D(const cl_mem& image3D, bool retainObject = false) : Image(image3D, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image3D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image3D(const Image3D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image3D& operator = (const Image3D &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image3D(Image3D&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image3D& operator = (Image3D &&img) { Image::operator=(std::move(img)); return *this; } }; #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /*! \brief Class interface for GL 3D Image Memory objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image3DGL : public Image3D { public: /*! \brief Constructs an Image3DGL in a specified context, from a given * GL Texture. * * Wraps clCreateFromGLTexture3D(). */ Image3DGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture3D( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_3D_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image3DGL() : Image3D() { } /*! \brief Constructor from cl_mem - takes ownership. 
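 *
 *  For example (tex3d is a hypothetical cl_mem obtained from
 *  clCreateFromGLTexture3D elsewhere; passing retainObject = false
 *  transfers that existing reference to the wrapper):
 *  \code
 *  cl::Image3DGL img(tex3d, false);
 *  \endcode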
* * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image3DGL(const cl_mem& image, bool retainObject = false) : Image3D(image, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image3DGL& operator = (const cl_mem& rhs) { Image3D::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image3DGL(const Image3DGL& img) : Image3D(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image3DGL& operator = (const Image3DGL &img) { Image3D::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image3DGL(Image3DGL&& img) CL_HPP_NOEXCEPT_ : Image3D(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image3DGL& operator = (Image3DGL &&img) { Image3D::operator=(std::move(img)); return *this; } }; #endif // CL_USE_DEPRECATED_OPENCL_1_1_APIS #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \class ImageGL * \brief general image interface for GL interop. * We abstract the 2D and 3D GL images into a single instance here * that wraps all GL sourced images on the grounds that setup information * was performed by OpenCL anyway. */ class ImageGL : public Image { public: ImageGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_ERR); if (err != NULL) { *err = error; } } ImageGL() : Image() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit ImageGL(const cl_mem& image, bool retainObject = false) : Image(image, retainObject) { } ImageGL& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ ImageGL(const ImageGL& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ ImageGL& operator = (const ImageGL &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ ImageGL(ImageGL&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ ImageGL& operator = (ImageGL &&img) { Image::operator=(std::move(img)); return *this; } }; #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Class interface for Pipe Memory Objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Pipe : public Memory { public: /*! \brief Constructs a Pipe in a specified context. * * Wraps clCreatePipe(). * @param context Context in which to create the pipe. * @param flags Bitfield. Only CL_MEM_READ_WRITE and CL_MEM_HOST_NO_ACCESS are valid. * @param packet_size Size in bytes of a single packet of the pipe. 
* @param max_packets Number of packets that may be stored in the pipe. * */ Pipe( const Context& context, cl_uint packet_size, cl_uint max_packets, cl_int* err = NULL) { cl_int error; cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS; object_ = ::clCreatePipe(context(), flags, packet_size, max_packets, nullptr, &error); detail::errHandler(error, __CREATE_PIPE_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a Pipe in a the default context. * * Wraps clCreatePipe(). * @param flags Bitfield. Only CL_MEM_READ_WRITE and CL_MEM_HOST_NO_ACCESS are valid. * @param packet_size Size in bytes of a single packet of the pipe. * @param max_packets Number of packets that may be stored in the pipe. * */ Pipe( cl_uint packet_size, cl_uint max_packets, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(err); cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS; object_ = ::clCreatePipe(context(), flags, packet_size, max_packets, nullptr, &error); detail::errHandler(error, __CREATE_PIPE_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Pipe() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with earlier versions. * * See Memory for further details. */ explicit Pipe(const cl_mem& pipe, bool retainObject = false) : Memory(pipe, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Pipe& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Pipe(const Pipe& pipe) : Memory(pipe) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Pipe& operator = (const Pipe &pipe) { Memory::operator=(pipe); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Pipe(Pipe&& pipe) CL_HPP_NOEXCEPT_ : Memory(std::move(pipe)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Pipe& operator = (Pipe &&pipe) { Memory::operator=(std::move(pipe)); return *this; } //! \brief Wrapper for clGetMemObjectInfo(). template cl_int getInfo(cl_pipe_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetPipeInfo, object_, name, param), __GET_PIPE_INFO_ERR); } //! \brief Wrapper for clGetMemObjectInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_pipe_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } }; // class Pipe #endif // CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Class interface for cl_sampler. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_sampler as the original. For details, see * clRetainSampler() and clReleaseSampler(). * * \see cl_sampler */ class Sampler : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Sampler() { } /*! \brief Constructs a Sampler in a specified context. * * Wraps clCreateSampler(). 
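 *
 *  A minimal usage sketch (ctx assumed to be a valid cl::Context):
 *  \code
 *  cl_int err;
 *  cl::Sampler linearClamp(
 *      ctx, CL_FALSE, CL_ADDRESS_CLAMP_TO_EDGE, CL_FILTER_LINEAR, &err);
 *  \endcode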
*/ Sampler( const Context& context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int* err = NULL) { cl_int error; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_sampler_properties sampler_properties[] = { CL_SAMPLER_NORMALIZED_COORDS, normalized_coords, CL_SAMPLER_ADDRESSING_MODE, addressing_mode, CL_SAMPLER_FILTER_MODE, filter_mode, 0 }; object_ = ::clCreateSamplerWithProperties( context(), sampler_properties, &error); detail::errHandler(error, __CREATE_SAMPLER_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateSampler( context(), normalized_coords, addressing_mode, filter_mode, &error); detail::errHandler(error, __CREATE_SAMPLER_ERR); if (err != NULL) { *err = error; } #endif } /*! \brief Constructor from cl_sampler - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * This effectively transfers ownership of a refcount on the cl_sampler * into the new Sampler object. */ explicit Sampler(const cl_sampler& sampler, bool retainObject = false) : detail::Wrapper(sampler, retainObject) { } /*! \brief Assignment operator from cl_sampler - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseSampler() on the value previously held by this instance. */ Sampler& operator = (const cl_sampler& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Sampler(const Sampler& sam) : detail::Wrapper(sam) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Sampler& operator = (const Sampler &sam) { detail::Wrapper::operator=(sam); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Sampler(Sampler&& sam) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(sam)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Sampler& operator = (Sampler &&sam) { detail::Wrapper::operator=(std::move(sam)); return *this; } //! \brief Wrapper for clGetSamplerInfo(). template cl_int getInfo(cl_sampler_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetSamplerInfo, object_, name, param), __GET_SAMPLER_INFO_ERR); } //! \brief Wrapper for clGetSamplerInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_sampler_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } }; class Program; class CommandQueue; class DeviceCommandQueue; class Kernel; //! \brief Class interface for specifying NDRange values. class NDRange { private: size_type sizes_[3]; cl_uint dimensions_; public: //! \brief Default constructor - resulting range has zero dimensions. NDRange() : dimensions_(0) { sizes_[0] = 0; sizes_[1] = 0; sizes_[2] = 0; } //! \brief Constructs one-dimensional range. NDRange(size_type size0) : dimensions_(1) { sizes_[0] = size0; sizes_[1] = 1; sizes_[2] = 1; } //! \brief Constructs two-dimensional range. NDRange(size_type size0, size_type size1) : dimensions_(2) { sizes_[0] = size0; sizes_[1] = size1; sizes_[2] = 1; } //! \brief Constructs three-dimensional range. 
NDRange(size_type size0, size_type size1, size_type size2) : dimensions_(3) { sizes_[0] = size0; sizes_[1] = size1; sizes_[2] = size2; } /*! \brief Conversion operator to const size_type *. * * \returns a pointer to the size of the first dimension. */ operator const size_type*() const { return sizes_; } //! \brief Queries the number of dimensions in the range. size_type dimensions() const { return dimensions_; } //! \brief Returns the size of the object in bytes based on the // runtime number of dimensions size_type size() const { return dimensions_*sizeof(size_type); } size_type* get() { return sizes_; } const size_type* get() const { return sizes_; } }; //! \brief A zero-dimensional range. static const NDRange NullRange; //! \brief Local address wrapper for use with Kernel::setArg struct LocalSpaceArg { size_type size_; }; namespace detail { template struct KernelArgumentHandler; // Enable for objects that are not subclasses of memory // Pointers, constants etc template struct KernelArgumentHandler::value>::type> { static size_type size(const T&) { return sizeof(T); } static const T* ptr(const T& value) { return &value; } }; // Enable for subclasses of memory where we want to get a reference to the cl_mem out // and pass that in for safety template struct KernelArgumentHandler::value>::type> { static size_type size(const T&) { return sizeof(cl_mem); } static const cl_mem* ptr(const T& value) { return &(value()); } }; // Specialization for DeviceCommandQueue defined later template <> struct KernelArgumentHandler { static size_type size(const LocalSpaceArg& value) { return value.size_; } static const void* ptr(const LocalSpaceArg&) { return NULL; } }; } //! \endcond /*! Local * \brief Helper function for generating LocalSpaceArg objects. */ inline LocalSpaceArg Local(size_type size) { LocalSpaceArg ret = { size }; return ret; } /*! \brief Class interface for cl_kernel. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_kernel as the original. For details, see * clRetainKernel() and clReleaseKernel(). * * \see cl_kernel */ class Kernel : public detail::Wrapper { public: inline Kernel(const Program& program, const char* name, cl_int* err = NULL); //! \brief Default constructor - initializes to NULL. Kernel() { } /*! \brief Constructor from cl_kernel - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * This effectively transfers ownership of a refcount on the cl_kernel * into the new Kernel object. */ explicit Kernel(const cl_kernel& kernel, bool retainObject = false) : detail::Wrapper(kernel, retainObject) { } /*! \brief Assignment operator from cl_kernel - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseKernel() on the value previously held by this instance. */ Kernel& operator = (const cl_kernel& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Kernel(const Kernel& kernel) : detail::Wrapper(kernel) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Kernel& operator = (const Kernel &kernel) { detail::Wrapper::operator=(kernel); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. 
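 *
 *  (A typical construction-and-argument sketch for this class; the kernel
 *  name "scale" is hypothetical, and program and buf are assumed valid:)
 *  \code
 *  cl::Kernel k(program, "scale");
 *  k.setArg(0, buf);                            // cl::Buffer argument
 *  k.setArg(1, 2.0f);                           // POD argument
 *  k.setArg(2, cl::Local(64 * sizeof(float)));  // local memory allocation
 *  \endcode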
*/ Kernel(Kernel&& kernel) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(kernel)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Kernel& operator = (Kernel &&kernel) { detail::Wrapper::operator=(std::move(kernel)); return *this; } template cl_int getInfo(cl_kernel_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetKernelInfo, object_, name, param), __GET_KERNEL_INFO_ERR); } template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_kernel_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 template cl_int getArgInfo(cl_uint argIndex, cl_kernel_arg_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetKernelArgInfo, object_, argIndex, name, param), __GET_KERNEL_ARG_INFO_ERR); } template typename detail::param_traits::param_type getArgInfo(cl_uint argIndex, cl_int* err = NULL) const { typename detail::param_traits< detail::cl_kernel_arg_info, name>::param_type param; cl_int result = getArgInfo(argIndex, name, ¶m); if (err != NULL) { *err = result; } return param; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 template cl_int getWorkGroupInfo( const Device& device, cl_kernel_work_group_info name, T* param) const { return detail::errHandler( detail::getInfo( &::clGetKernelWorkGroupInfo, object_, device(), name, param), __GET_KERNEL_WORK_GROUP_INFO_ERR); } template typename detail::param_traits::param_type getWorkGroupInfo(const Device& device, cl_int* err = NULL) const { typename detail::param_traits< detail::cl_kernel_work_group_info, name>::param_type param; cl_int result = getWorkGroupInfo(device, name, ¶m); if (err != NULL) { *err = result; } return param; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if defined(CL_HPP_USE_CL_SUB_GROUPS_KHR) cl_int getSubGroupInfo(const cl::Device &dev, cl_kernel_sub_group_info name, const cl::NDRange &range, size_type* param) const { typedef clGetKernelSubGroupInfoKHR_fn PFN_clGetKernelSubGroupInfoKHR; static PFN_clGetKernelSubGroupInfoKHR pfn_clGetKernelSubGroupInfoKHR = NULL; CL_HPP_INIT_CL_EXT_FCN_PTR_(clGetKernelSubGroupInfoKHR); return detail::errHandler( pfn_clGetKernelSubGroupInfoKHR(object_, dev(), name, range.size(), range.get(), sizeof(size_type), param, nullptr), __GET_KERNEL_ARG_INFO_ERR); } template size_type getSubGroupInfo(const cl::Device &dev, const cl::NDRange &range, cl_int* err = NULL) const { size_type param; cl_int result = getSubGroupInfo(dev, name, range, ¶m); if (err != NULL) { *err = result; } return param; } #endif // #if defined(CL_HPP_USE_CL_SUB_GROUPS_KHR) #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief setArg overload taking a shared_ptr type */ template cl_int setArg(cl_uint index, const cl::pointer &argPtr) { return detail::errHandler( ::clSetKernelArgSVMPointer(object_, index, argPtr.get()), __SET_KERNEL_ARGS_ERR); } /*! \brief setArg overload taking a vector type. */ template cl_int setArg(cl_uint index, const cl::vector &argPtr) { return detail::errHandler( ::clSetKernelArgSVMPointer(object_, index, argPtr.data()), __SET_KERNEL_ARGS_ERR); } /*! 
\brief setArg overload taking a pointer type */ template typename std::enable_if::value, cl_int>::type setArg(cl_uint index, const T argPtr) { return detail::errHandler( ::clSetKernelArgSVMPointer(object_, index, argPtr), __SET_KERNEL_ARGS_ERR); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief setArg overload taking a POD type */ template typename std::enable_if::value, cl_int>::type setArg(cl_uint index, const T &value) { return detail::errHandler( ::clSetKernelArg( object_, index, detail::KernelArgumentHandler::size(value), detail::KernelArgumentHandler::ptr(value)), __SET_KERNEL_ARGS_ERR); } cl_int setArg(cl_uint index, size_type size, const void* argPtr) { return detail::errHandler( ::clSetKernelArg(object_, index, size, argPtr), __SET_KERNEL_ARGS_ERR); } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! * Specify a vector of SVM pointers that the kernel may access in * addition to its arguments. */ cl_int setSVMPointers(const vector &pointerList) { return detail::errHandler( ::clSetKernelExecInfo( object_, CL_KERNEL_EXEC_INFO_SVM_PTRS, sizeof(void*)*pointerList.size(), pointerList.data())); } /*! * Specify a std::array of SVM pointers that the kernel may access in * addition to its arguments. */ template cl_int setSVMPointers(const std::array &pointerList) { return detail::errHandler( ::clSetKernelExecInfo( object_, CL_KERNEL_EXEC_INFO_SVM_PTRS, sizeof(void*)*pointerList.size(), pointerList.data())); } /*! \brief Enable fine-grained system SVM. * * \note It is only possible to enable fine-grained system SVM if all devices * in the context associated with kernel support it. * * \param svmEnabled True if fine-grained system SVM is requested. False otherwise. * \return CL_SUCCESS if the function was executed succesfully. CL_INVALID_OPERATION * if no devices in the context support fine-grained system SVM. * * \see clSetKernelExecInfo */ cl_int enableFineGrainedSystemSVM(bool svmEnabled) { cl_bool svmEnabled_ = svmEnabled ? CL_TRUE : CL_FALSE; return detail::errHandler( ::clSetKernelExecInfo( object_, CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM, sizeof(cl_bool), &svmEnabled_ ) ); } template void setSVMPointersHelper(std::array &pointerList, const pointer &t0, Ts... ts) { pointerList[index] = static_cast(t0.get()); setSVMPointersHelper(ts...); } template typename std::enable_if::value, void>::type setSVMPointersHelper(std::array &pointerList, T0 t0, Ts... ts) { pointerList[index] = static_cast(t0); setSVMPointersHelper(ts...); } template void setSVMPointersHelper(std::array &pointerList, const pointer &t0) { pointerList[index] = static_cast(t0.get()); } template typename std::enable_if::value, void>::type setSVMPointersHelper(std::array &pointerList, T0 t0) { pointerList[index] = static_cast(t0); } template cl_int setSVMPointers(const T0 &t0, Ts... ts) { std::array pointerList; setSVMPointersHelper<0, 1 + sizeof...(Ts)>(pointerList, t0, ts...); return detail::errHandler( ::clSetKernelExecInfo( object_, CL_KERNEL_EXEC_INFO_SVM_PTRS, sizeof(void*)*(1 + sizeof...(Ts)), pointerList.data())); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 }; /*! \class Program * \brief Program interface that implements cl_program. 
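 *
 *  A minimal build-and-use sketch (the kernel source is illustrative):
 *  \code
 *  cl_int err;
 *  cl::Program prog("__kernel void noop() { }", true, &err);
 *  if (err == CL_SUCCESS) {
 *      cl::Kernel k(prog, "noop", &err);
 *  }
 *  \endcode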
*/ class Program : public detail::Wrapper { public: #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) typedef vector> Binaries; typedef vector Sources; #else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) typedef vector > Binaries; typedef vector > Sources; #endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) Program( const string& source, bool build = false, cl_int* err = NULL) { cl_int error; const char * strings = source.c_str(); const size_type length = source.size(); Context context = Context::getDefault(err); object_ = ::clCreateProgramWithSource( context(), (cl_uint)1, &strings, &length, &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR); if (error == CL_SUCCESS && build) { error = ::clBuildProgram( object_, 0, NULL, #if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD) "-cl-std=CL2.0", #else "", #endif // #if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD) NULL, NULL); detail::buildErrHandler(error, __BUILD_PROGRAM_ERR, getBuildInfo()); } if (err != NULL) { *err = error; } } Program( const Context& context, const string& source, bool build = false, cl_int* err = NULL) { cl_int error; const char * strings = source.c_str(); const size_type length = source.size(); object_ = ::clCreateProgramWithSource( context(), (cl_uint)1, &strings, &length, &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR); if (error == CL_SUCCESS && build) { error = ::clBuildProgram( object_, 0, NULL, #if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD) "-cl-std=CL2.0", #else "", #endif // #if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD) NULL, NULL); detail::buildErrHandler(error, __BUILD_PROGRAM_ERR, getBuildInfo()); } if (err != NULL) { *err = error; } } /** * Create a program from a vector of source strings and the default context. * Does not compile or link the program. */ Program( const Sources& sources, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(err); const size_type n = (size_type)sources.size(); vector lengths(n); vector strings(n); for (size_type i = 0; i < n; ++i) { #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) strings[i] = sources[(int)i].data(); lengths[i] = sources[(int)i].length(); #else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) strings[i] = sources[(int)i].first; lengths[i] = sources[(int)i].second; #endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) } object_ = ::clCreateProgramWithSource( context(), (cl_uint)n, strings.data(), lengths.data(), &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR); if (err != NULL) { *err = error; } } /** * Create a program from a vector of source strings and a provided context. * Does not compile or link the program. 
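 *
 *  For example, with the default Sources typedef (the two source
 *  fragments are illustrative):
 *  \code
 *  cl::Program::Sources sources;
 *  sources.push_back("__kernel void a() { }");
 *  sources.push_back("__kernel void b() { }");
 *  cl::Program prog(context, sources);
 *  prog.build();  // compilation is an explicit, separate step
 *  \endcode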
*/ Program( const Context& context, const Sources& sources, cl_int* err = NULL) { cl_int error; const size_type n = (size_type)sources.size(); vector lengths(n); vector strings(n); for (size_type i = 0; i < n; ++i) { #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) strings[i] = sources[(int)i].data(); lengths[i] = sources[(int)i].length(); #else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) strings[i] = sources[(int)i].first; lengths[i] = sources[(int)i].second; #endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) } object_ = ::clCreateProgramWithSource( context(), (cl_uint)n, strings.data(), lengths.data(), &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR); if (err != NULL) { *err = error; } } /** * Construct a program object from a list of devices and a per-device list of binaries. * \param context A valid OpenCL context in which to construct the program. * \param devices A vector of OpenCL device objects for which the program will be created. * \param binaries A vector of pairs of a pointer to a binary object and its length. * \param binaryStatus An optional vector that on completion will be resized to * match the size of binaries and filled with values to specify if each binary * was successfully loaded. * Set to CL_SUCCESS if the binary was successfully loaded. * Set to CL_INVALID_VALUE if the length is 0 or the binary pointer is NULL. * Set to CL_INVALID_BINARY if the binary provided is not valid for the matching device. * \param err if non-NULL will be set to CL_SUCCESS on successful operation or one of the following errors: * CL_INVALID_CONTEXT if context is not a valid context. * CL_INVALID_VALUE if the length of devices is zero; or if the length of binaries does not match the length of devices; * or if any entry in binaries is NULL or has length 0. * CL_INVALID_DEVICE if OpenCL devices listed in devices are not in the list of devices associated with context. * CL_INVALID_BINARY if an invalid program binary was encountered for any device. binaryStatus will return specific status for each device. * CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host. 
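 *
 *  A round-trip sketch for previously built binaries (built, devices,
 *  and err are assumed to exist):
 *  \code
 *  cl::Program::Binaries bins = built.getInfo<CL_PROGRAM_BINARIES>();
 *  cl::vector<cl_int> status;
 *  cl::Program restored(context, devices, bins, &status, &err);
 *  \endcode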
*/ Program( const Context& context, const vector& devices, const Binaries& binaries, vector* binaryStatus = NULL, cl_int* err = NULL) { cl_int error; const size_type numDevices = devices.size(); // Catch size mismatch early and return if(binaries.size() != numDevices) { error = CL_INVALID_VALUE; detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR); if (err != NULL) { *err = error; } return; } vector lengths(numDevices); vector images(numDevices); #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) for (size_type i = 0; i < numDevices; ++i) { images[i] = binaries[i].data(); lengths[i] = binaries[(int)i].size(); } #else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) for (size_type i = 0; i < numDevices; ++i) { images[i] = (const unsigned char*)binaries[i].first; lengths[i] = binaries[(int)i].second; } #endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) vector deviceIDs(numDevices); for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } if(binaryStatus) { binaryStatus->resize(numDevices); } object_ = ::clCreateProgramWithBinary( context(), (cl_uint) devices.size(), deviceIDs.data(), lengths.data(), images.data(), (binaryStatus != NULL && numDevices > 0) ? &binaryStatus->front() : NULL, &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR); if (err != NULL) { *err = error; } } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Create program using builtin kernels. * \param kernelNames Semi-colon separated list of builtin kernel names */ Program( const Context& context, const vector& devices, const string& kernelNames, cl_int* err = NULL) { cl_int error; size_type numDevices = devices.size(); vector deviceIDs(numDevices); for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } object_ = ::clCreateProgramWithBuiltInKernels( context(), (cl_uint) devices.size(), deviceIDs.data(), kernelNames.c_str(), &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 Program() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. */ explicit Program(const cl_program& program, bool retainObject = false) : detail::Wrapper(program, retainObject) { } Program& operator = (const cl_program& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Program(const Program& program) : detail::Wrapper(program) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Program& operator = (const Program &program) { detail::Wrapper::operator=(program); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Program(Program&& program) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(program)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
*/ Program& operator = (Program &&program) { detail::Wrapper::operator=(std::move(program)); return *this; } cl_int build( const vector& devices, const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL) const { size_type numDevices = devices.size(); vector deviceIDs(numDevices); for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } cl_int buildError = ::clBuildProgram( object_, (cl_uint) devices.size(), deviceIDs.data(), options, notifyFptr, data); return detail::buildErrHandler(buildError, __BUILD_PROGRAM_ERR, getBuildInfo()); } cl_int build( const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL) const { cl_int buildError = ::clBuildProgram( object_, 0, NULL, options, notifyFptr, data); return detail::buildErrHandler(buildError, __BUILD_PROGRAM_ERR, getBuildInfo()); } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_int compile( const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL) const { cl_int error = ::clCompileProgram( object_, 0, NULL, options, 0, NULL, NULL, notifyFptr, data); return detail::buildErrHandler(error, __COMPILE_PROGRAM_ERR, getBuildInfo()); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 template cl_int getInfo(cl_program_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetProgramInfo, object_, name, param), __GET_PROGRAM_INFO_ERR); } template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_program_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } template cl_int getBuildInfo( const Device& device, cl_program_build_info name, T* param) const { return detail::errHandler( detail::getInfo( &::clGetProgramBuildInfo, object_, device(), name, param), __GET_PROGRAM_BUILD_INFO_ERR); } template typename detail::param_traits::param_type getBuildInfo(const Device& device, cl_int* err = NULL) const { typename detail::param_traits< detail::cl_program_build_info, name>::param_type param; cl_int result = getBuildInfo(device, name, ¶m); if (err != NULL) { *err = result; } return param; } /** * Build info function that returns a vector of device/info pairs for the specified * info type and for all devices in the program. * On an error reading the info for any device, an empty vector of info will be returned. 
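 *
 *  For example, dumping the build log for every device (a sketch; prog
 *  is assumed to be a built cl::Program):
 *  \code
 *  cl_int berr;
 *  auto logs = prog.getBuildInfo<CL_PROGRAM_BUILD_LOG>(&berr);
 *  for (const auto &devLog : logs) {
 *      std::cerr << devLog.second << std::endl;
 *  }
 *  \endcode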
*/ template vector::param_type>> getBuildInfo(cl_int *err = NULL) const { cl_int result = CL_SUCCESS; auto devs = getInfo(&result); vector::param_type>> devInfo; // If there was an initial error from getInfo return the error if (result != CL_SUCCESS) { if (err != NULL) { *err = result; } return devInfo; } for (const cl::Device &d : devs) { typename detail::param_traits< detail::cl_program_build_info, name>::param_type param; result = getBuildInfo(d, name, ¶m); devInfo.push_back( std::pair::param_type> (d, param)); if (result != CL_SUCCESS) { // On error, leave the loop and return the error code break; } } if (err != NULL) { *err = result; } if (result != CL_SUCCESS) { devInfo.clear(); } return devInfo; } cl_int createKernels(vector* kernels) { cl_uint numKernels; cl_int err = ::clCreateKernelsInProgram(object_, 0, NULL, &numKernels); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR); } vector value(numKernels); err = ::clCreateKernelsInProgram( object_, numKernels, value.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR); } if (kernels) { kernels->resize(value.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < value.size(); i++) { // We do not need to retain because this kernel is being created // by the runtime (*kernels)[i] = Kernel(value[i], false); } } return CL_SUCCESS; } }; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 inline Program linkProgram( Program input1, Program input2, const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error_local = CL_SUCCESS; cl_program programs[2] = { input1(), input2() }; Context ctx = input1.getInfo(&error_local); if(error_local!=CL_SUCCESS) { detail::errHandler(error_local, __LINK_PROGRAM_ERR); } cl_program prog = ::clLinkProgram( ctx(), 0, NULL, options, 2, programs, notifyFptr, data, &error_local); detail::errHandler(error_local,__COMPILE_PROGRAM_ERR); if (err != NULL) { *err = error_local; } return Program(prog); } inline Program linkProgram( vector inputPrograms, const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error_local = CL_SUCCESS; vector programs(inputPrograms.size()); for (unsigned int i = 0; i < inputPrograms.size(); i++) { programs[i] = inputPrograms[i](); } Context ctx; if(inputPrograms.size() > 0) { ctx = inputPrograms[0].getInfo(&error_local); if(error_local!=CL_SUCCESS) { detail::errHandler(error_local, __LINK_PROGRAM_ERR); } } cl_program prog = ::clLinkProgram( ctx(), 0, NULL, options, (cl_uint)inputPrograms.size(), programs.data(), notifyFptr, data, &error_local); detail::errHandler(error_local,__COMPILE_PROGRAM_ERR); if (err != NULL) { *err = error_local; } return Program(prog, false); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 // Template specialization for CL_PROGRAM_BINARIES template <> inline cl_int cl::Program::getInfo(cl_program_info name, vector>* param) const { if (name != CL_PROGRAM_BINARIES) { return CL_INVALID_VALUE; } if (param) { // Resize the parameter array appropriately for each allocation // and pass down to the helper vector sizes = getInfo(); size_type numBinaries = sizes.size(); // Resize the parameter array and constituent arrays param->resize(numBinaries); for (size_type i = 0; i < numBinaries; ++i) { (*param)[i].resize(sizes[i]); } return detail::errHandler( 
detail::getInfo(&::clGetProgramInfo, object_, name, param), __GET_PROGRAM_INFO_ERR); } return CL_SUCCESS; } template<> inline vector> cl::Program::getInfo(cl_int* err) const { vector> binariesVectors; cl_int result = getInfo(CL_PROGRAM_BINARIES, &binariesVectors); if (err != NULL) { *err = result; } return binariesVectors; } inline Kernel::Kernel(const Program& program, const char* name, cl_int* err) { cl_int error; object_ = ::clCreateKernel(program(), name, &error); detail::errHandler(error, __CREATE_KERNEL_ERR); if (err != NULL) { *err = error; } } enum class QueueProperties : cl_command_queue_properties { None = 0, Profiling = CL_QUEUE_PROFILING_ENABLE, OutOfOrder = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, }; inline QueueProperties operator|(QueueProperties lhs, QueueProperties rhs) { return static_cast(static_cast(lhs) | static_cast(rhs)); } /*! \class CommandQueue * \brief CommandQueue interface for cl_command_queue. */ class CommandQueue : public detail::Wrapper { private: static std::once_flag default_initialized_; static CommandQueue default_; static cl_int default_error_; /*! \brief Create the default command queue returned by @ref getDefault. * * It sets default_error_ to indicate success or failure. It does not throw * @c cl::Error. */ static void makeDefault() { /* We don't want to throw an error from this function, so we have to * catch and set the error flag. */ #if defined(CL_HPP_ENABLE_EXCEPTIONS) try #endif { int error; Context context = Context::getDefault(&error); if (error != CL_SUCCESS) { default_error_ = error; } else { Device device = Device::getDefault(); default_ = CommandQueue(context, device, 0, &default_error_); } } #if defined(CL_HPP_ENABLE_EXCEPTIONS) catch (cl::Error &e) { default_error_ = e.err(); } #endif } /*! \brief Create the default command queue. * * This sets @c default_. It does not throw * @c cl::Error. */ static void makeDefaultProvided(const CommandQueue &c) { default_ = c; } public: #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Reset the default. * * This sets @c default_ to an empty value to support cleanup in * the unit test framework. * This function is not thread safe. */ static void unitTestClearDefault() { default_ = CommandQueue(); } #endif // #ifdef CL_HPP_UNIT_TEST_ENABLE /*! * \brief Constructs a CommandQueue based on passed properties. * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ CommandQueue( cl_command_queue_properties properties, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } } else { Device device = context.getInfo()[0]; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; if ((properties & CL_QUEUE_ON_DEVICE) == 0) { object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); } else { error = CL_INVALID_QUEUE_PROPERTIES; } detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } } /*! * \brief Constructs a CommandQueue based on passed properties. * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. 
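 *
 *  For example, an out-of-order queue with profiling enabled on the
 *  default context and device:
 *  \code
 *  cl_int err;
 *  cl::CommandQueue q(
 *      cl::QueueProperties::Profiling | cl::QueueProperties::OutOfOrder,
 *      &err);
 *  \endcode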
*/ CommandQueue( QueueProperties properties, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } } else { Device device = context.getInfo()[0]; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, static_cast(properties), 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), static_cast(properties), &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } } /*! * \brief Constructs a CommandQueue for an implementation defined device in the given context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ explicit CommandQueue( const Context& context, cl_command_queue_properties properties = 0, cl_int* err = NULL) { cl_int error; vector devices; error = context.getInfo(CL_CONTEXT_DEVICES, &devices); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } return; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; if ((properties & CL_QUEUE_ON_DEVICE) == 0) { object_ = ::clCreateCommandQueueWithProperties( context(), devices[0](), queue_properties, &error); } else { error = CL_INVALID_QUEUE_PROPERTIES; } detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), devices[0](), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } /*! * \brief Constructs a CommandQueue for an implementation defined device in the given context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ explicit CommandQueue( const Context& context, QueueProperties properties, cl_int* err = NULL) { cl_int error; vector devices; error = context.getInfo(CL_CONTEXT_DEVICES, &devices); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } return; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, static_cast(properties), 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), devices[0](), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), devices[0](), static_cast(properties), &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } /*! * \brief Constructs a CommandQueue for a passed device and context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. 
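 *
 *  For example (ctx and dev assumed valid):
 *  \code
 *  cl_int err;
 *  cl::CommandQueue q(ctx, dev, CL_QUEUE_PROFILING_ENABLE, &err);
 *  \endcode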
*/ CommandQueue( const Context& context, const Device& device, cl_command_queue_properties properties = 0, cl_int* err = NULL) { cl_int error; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } /*! * \brief Constructs a CommandQueue for a passed device and context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ CommandQueue( const Context& context, const Device& device, QueueProperties properties, cl_int* err = NULL) { cl_int error; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, static_cast(properties), 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), static_cast(properties), &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } static CommandQueue getDefault(cl_int * err = NULL) { std::call_once(default_initialized_, makeDefault); #if CL_HPP_TARGET_OPENCL_VERSION >= 200 detail::errHandler(default_error_, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); #else // CL_HPP_TARGET_OPENCL_VERSION >= 200 detail::errHandler(default_error_, __CREATE_COMMAND_QUEUE_ERR); #endif // CL_HPP_TARGET_OPENCL_VERSION >= 200 if (err != NULL) { *err = default_error_; } return default_; } /** * Modify the default command queue to be used by * subsequent operations. * Will only set the default if no default was previously created. * @return updated default command queue. * Should be compared to the passed value to ensure that it was updated. */ static CommandQueue setDefault(const CommandQueue &default_queue) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_queue)); detail::errHandler(default_error_); return default_; } CommandQueue() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. */ explicit CommandQueue(const cl_command_queue& commandQueue, bool retainObject = false) : detail::Wrapper(commandQueue, retainObject) { } CommandQueue& operator = (const cl_command_queue& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ CommandQueue(const CommandQueue& queue) : detail::Wrapper(queue) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ CommandQueue& operator = (const CommandQueue &queue) { detail::Wrapper::operator=(queue); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ CommandQueue(CommandQueue&& queue) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(queue)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
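 *
 *  (A typical transfer sketch using the enqueue methods defined below;
 *  q, buf, and the element count are illustrative:)
 *  \code
 *  std::vector<float> host(1024);
 *  q.enqueueWriteBuffer(buf, CL_TRUE, 0, host.size() * sizeof(float), host.data());
 *  q.enqueueReadBuffer(buf, CL_TRUE, 0, host.size() * sizeof(float), host.data());
 *  \endcode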
*/ CommandQueue& operator = (CommandQueue &&queue) { detail::Wrapper::operator=(std::move(queue)); return *this; } template cl_int getInfo(cl_command_queue_info name, T* param) const { return detail::errHandler( detail::getInfo( &::clGetCommandQueueInfo, object_, name, param), __GET_COMMAND_QUEUE_INFO_ERR); } template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_command_queue_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } cl_int enqueueReadBuffer( const Buffer& buffer, cl_bool blocking, size_type offset, size_type size, void* ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReadBuffer( object_, buffer(), blocking, offset, size, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_READ_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueWriteBuffer( const Buffer& buffer, cl_bool blocking, size_type offset, size_type size, const void* ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueWriteBuffer( object_, buffer(), blocking, offset, size, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_WRITE_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyBuffer( const Buffer& src, const Buffer& dst, size_type src_offset, size_type dst_offset, size_type size, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyBuffer( object_, src(), dst(), src_offset, dst_offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQEUE_COPY_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueReadBufferRect( const Buffer& buffer, cl_bool blocking, const array& buffer_offset, const array& host_offset, const array& region, size_type buffer_row_pitch, size_type buffer_slice_pitch, size_type host_row_pitch, size_type host_slice_pitch, void *ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReadBufferRect( object_, buffer(), blocking, buffer_offset.data(), host_offset.data(), region.data(), buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? 
&tmp : NULL), __ENQUEUE_READ_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueWriteBufferRect( const Buffer& buffer, cl_bool blocking, const array& buffer_offset, const array& host_offset, const array& region, size_type buffer_row_pitch, size_type buffer_slice_pitch, size_type host_row_pitch, size_type host_slice_pitch, const void *ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueWriteBufferRect( object_, buffer(), blocking, buffer_offset.data(), host_offset.data(), region.data(), buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_WRITE_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyBufferRect( const Buffer& src, const Buffer& dst, const array& src_origin, const array& dst_origin, const array& region, size_type src_row_pitch, size_type src_slice_pitch, size_type dst_row_pitch, size_type dst_slice_pitch, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyBufferRect( object_, src(), dst(), src_origin.data(), dst_origin.data(), region.data(), src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQEUE_COPY_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Enqueue a command to fill a buffer object with a pattern * of a given size. The pattern is specified as a vector type. * \tparam PatternType The datatype of the pattern field. * The pattern type must be an accepted OpenCL data type. * \tparam offset Is the offset in bytes into the buffer at * which to start filling. This must be a multiple of * the pattern size. * \tparam size Is the size in bytes of the region to fill. * This must be a multiple of the pattern size. */ template cl_int enqueueFillBuffer( const Buffer& buffer, PatternType pattern, size_type offset, size_type size, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillBuffer( object_, buffer(), static_cast(&pattern), sizeof(PatternType), offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_int enqueueReadImage( const Image& image, cl_bool blocking, const array& origin, const array& region, size_type row_pitch, size_type slice_pitch, void* ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReadImage( object_, image(), blocking, origin.data(), region.data(), row_pitch, slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? 
&tmp : NULL), __ENQUEUE_READ_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueWriteImage( const Image& image, cl_bool blocking, const array& origin, const array& region, size_type row_pitch, size_type slice_pitch, const void* ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueWriteImage( object_, image(), blocking, origin.data(), region.data(), row_pitch, slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_WRITE_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyImage( const Image& src, const Image& dst, const array& src_origin, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyImage( object_, src(), dst(), src_origin.data(), dst_origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA floating-point color value if * the image channel data type is not an unnormalized signed or * unsigned data type. */ cl_int enqueueFillImage( const Image& image, cl_float4 fillColor, const array& origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA signed integer color value if * the image channel data type is an unnormalized signed integer * type. */ cl_int enqueueFillImage( const Image& image, cl_int4 fillColor, const array& origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA unsigned integer color value if * the image channel data type is an unnormalized unsigned integer * type. 
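 *
 *  For example, clearing an RGBA8 unsigned-integer image to opaque black
 *  (queue, image, width, and height are illustrative):
 *  \code
 *  cl_uint4 color = {{0, 0, 0, 255}};
 *  queue.enqueueFillImage(image, color, {0, 0, 0}, {width, height, 1});
 *  \endcode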
*/ cl_int enqueueFillImage( const Image& image, cl_uint4 fillColor, const array& origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_int enqueueCopyImageToBuffer( const Image& src, const Buffer& dst, const array& src_origin, const array& region, size_type dst_offset, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyImageToBuffer( object_, src(), dst(), src_origin.data(), region.data(), dst_offset, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyBufferToImage( const Buffer& src, const Image& dst, size_type src_offset, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyBufferToImage( object_, src(), dst(), src_offset, dst_origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } void* enqueueMapBuffer( const Buffer& buffer, cl_bool blocking, cl_map_flags flags, size_type offset, size_type size, const vector* events = NULL, Event* event = NULL, cl_int* err = NULL) const { cl_event tmp; cl_int error; void * result = ::clEnqueueMapBuffer( object_, buffer(), blocking, flags, offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL, &error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } if (event != NULL && error == CL_SUCCESS) *event = tmp; return result; } void* enqueueMapImage( const Image& buffer, cl_bool blocking, cl_map_flags flags, const array& origin, const array& region, size_type * row_pitch, size_type * slice_pitch, const vector* events = NULL, Event* event = NULL, cl_int* err = NULL) const { cl_event tmp; cl_int error; void * result = ::clEnqueueMapImage( object_, buffer(), blocking, flags, origin.data(), region.data(), row_pitch, slice_pitch, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL, &error); detail::errHandler(error, __ENQUEUE_MAP_IMAGE_ERR); if (err != NULL) { *err = error; } if (event != NULL && error == CL_SUCCESS) *event = tmp; return result; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues a command that will allow the host to update a region of a coarse-grained SVM buffer. * This variant takes a raw SVM pointer. 
*/ template cl_int enqueueMapSVM( T* ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler(::clEnqueueSVMMap( object_, blocking, flags, static_cast(ptr), size, (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MAP_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will allow the host to update a region of a coarse-grained SVM buffer. * This variant takes a cl::pointer instance. */ template cl_int enqueueMapSVM( cl::pointer &ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler(::clEnqueueSVMMap( object_, blocking, flags, static_cast(ptr.get()), size, (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MAP_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will allow the host to update a region of a coarse-grained SVM buffer. * This variant takes a cl::vector instance. */ template cl_int enqueueMapSVM( cl::vector &container, cl_bool blocking, cl_map_flags flags, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler(::clEnqueueSVMMap( object_, blocking, flags, static_cast(container.data()), container.size(), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MAP_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_int enqueueUnmapMemObject( const Memory& memory, void* mapped_ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueUnmapMemObject( object_, memory(), mapped_ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues a command that will release a coarse-grained SVM buffer back to the OpenCL runtime. * This variant takes a raw SVM pointer. */ template cl_int enqueueUnmapSVM( T* ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueSVMUnmap( object_, static_cast(ptr), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will release a coarse-grained SVM buffer back to the OpenCL runtime. * This variant takes a cl::pointer instance. */ template cl_int enqueueUnmapSVM( cl::pointer &ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueSVMUnmap( object_, static_cast(ptr.get()), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? 
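/**
 * Usage sketch for the map/unmap pattern (illustrative only; assumes a
 * default context and <cstring> for std::memset).
 * \code{.cpp}
 * cl::CommandQueue q = cl::CommandQueue::getDefault();
 * cl::Buffer buf(CL_MEM_READ_WRITE, 256);  // allocated in the default context
 * cl_int err = CL_SUCCESS;
 * void* p = q.enqueueMapBuffer(buf, CL_TRUE, CL_MAP_WRITE, 0, 256,
 *                              nullptr, nullptr, &err);
 * if (err == CL_SUCCESS) {
 *     std::memset(p, 0, 256);              // host writes while the region is mapped
 *     q.enqueueUnmapMemObject(buf, p);     // hand the region back to the device
 *     q.finish();
 * }
 * \endcode
 * enqueueMapSVM/enqueueUnmapSVM (OpenCL 2.0+) follow the same protocol for
 * coarse-grained SVM allocations.
 */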
(cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will release a coarse-grained SVM buffer back to the OpenCL runtime. * This variant takes a cl::vector instance. */ template cl_int enqueueUnmapSVM( cl::vector &container, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueSVMUnmap( object_, static_cast(container.data()), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Enqueues a marker command which waits for either a list of events to complete, * or all previously enqueued commands to complete. * * Enqueues a marker command which waits for either a list of events to complete, * or if the list is empty it waits for all commands previously enqueued in command_queue * to complete before it completes. This command returns an event which can be waited on, * i.e. this event can be waited on to insure that all events either in the event_wait_list * or all previously enqueued commands, queued before this command to command_queue, * have completed. */ cl_int enqueueMarkerWithWaitList( const vector *events = 0, Event *event = 0) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueMarkerWithWaitList( object_, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MARKER_WAIT_LIST_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * A synchronization point that enqueues a barrier operation. * * Enqueues a barrier command which waits for either a list of events to complete, * or if the list is empty it waits for all commands previously enqueued in command_queue * to complete before it completes. This command blocks command execution, that is, any * following commands enqueued after it do not execute until it completes. This command * returns an event which can be waited on, i.e. this event can be waited on to insure that * all events either in the event_wait_list or all previously enqueued commands, queued * before this command to command_queue, have completed. */ cl_int enqueueBarrierWithWaitList( const vector *events = 0, Event *event = 0) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueBarrierWithWaitList( object_, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_BARRIER_WAIT_LIST_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command to indicate with which device a set of memory objects * should be associated. */ cl_int enqueueMigrateMemObjects( const vector &memObjects, cl_mem_migration_flags flags, const vector* events = NULL, Event* event = NULL ) const { cl_event tmp; vector localMemObjects(memObjects.size()); for( int i = 0; i < (int)memObjects.size(); ++i ) { localMemObjects[i] = memObjects[i](); } cl_int err = detail::errHandler( ::clEnqueueMigrateMemObjects( object_, (cl_uint)memObjects.size(), localMemObjects.data(), flags, (events != NULL) ? 
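/**
 * Usage sketch for markers and barriers (OpenCL 1.2+); `q` is a hypothetical
 * cl::CommandQueue, typically out-of-order for these calls to matter.
 * \code{.cpp}
 * cl::Event marker;
 * q.enqueueMarkerWithWaitList(nullptr, &marker); // signals when prior work completes
 * q.enqueueBarrierWithWaitList();                // later commands wait at this point
 * marker.wait();                                 // host-side synchronization
 * \endcode
 * The marker does not block execution of subsequent commands; the barrier
 * does, which is the key behavioral difference described above.
 */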
(cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_int enqueueNDRangeKernel( const Kernel& kernel, const NDRange& offset, const NDRange& global, const NDRange& local = NullRange, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueNDRangeKernel( object_, kernel(), (cl_uint) global.dimensions(), offset.dimensions() != 0 ? (const size_type*) offset : NULL, (const size_type*) global, local.dimensions() != 0 ? (const size_type*) local : NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_NDRANGE_KERNEL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS) CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_int enqueueTask( const Kernel& kernel, const vector* events = NULL, Event* event = NULL) const CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueTask( object_, kernel(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_TASK_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS) cl_int enqueueNativeKernel( void (CL_CALLBACK *userFptr)(void *), std::pair args, const vector* mem_objects = NULL, const vector* mem_locs = NULL, const vector* events = NULL, Event* event = NULL) const { size_type elements = 0; if (mem_objects != NULL) { elements = mem_objects->size(); } vector mems(elements); for (unsigned int i = 0; i < elements; i++) { mems[i] = ((*mem_objects)[i])(); } cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueNativeKernel( object_, userFptr, args.first, args.second, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, mems.data(), (mem_locs != NULL && mem_locs->size() > 0) ? (const void **) &mem_locs->front() : NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_NATIVE_KERNEL); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueMarker(Event* event = NULL) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueMarker( object_, (event != NULL) ? &tmp : NULL), __ENQUEUE_MARKER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueWaitForEvents(const vector& events) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { return detail::errHandler( ::clEnqueueWaitForEvents( object_, (cl_uint) events.size(), events.size() > 0 ? (const cl_event*) &events.front() : NULL), __ENQUEUE_WAIT_FOR_EVENTS_ERR); } #endif // defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) cl_int enqueueAcquireGLObjects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueAcquireGLObjects( object_, (mem_objects != NULL) ? 
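/**
 * Usage sketch for enqueueNDRangeKernel; `program`, the buffers, and the
 * kernel name "vadd" are hypothetical.
 * \code{.cpp}
 * cl_int err = CL_SUCCESS;
 * cl::Kernel k(program, "vadd", &err);
 * k.setArg(0, bufA); k.setArg(1, bufB); k.setArg(2, bufC);
 * cl::Event done;
 * q.enqueueNDRangeKernel(k, cl::NullRange,       // no global offset
 *                        cl::NDRange(1024),      // global work size
 *                        cl::NDRange(64),        // work-group size (or NullRange)
 *                        nullptr, &done);
 * done.wait();
 * \endcode
 */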
(cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_ACQUIRE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueReleaseGLObjects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReleaseGLObjects( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_RELEASE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if defined (CL_HPP_USE_DX_INTEROP) typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueAcquireD3D10ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event); typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueReleaseD3D10ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event); cl_int enqueueAcquireD3D10Objects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { static PFN_clEnqueueAcquireD3D10ObjectsKHR pfn_clEnqueueAcquireD3D10ObjectsKHR = NULL; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_context context = getInfo(); cl::Device device(getInfo()); cl_platform_id platform = device.getInfo(); CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, clEnqueueAcquireD3D10ObjectsKHR); #endif #if CL_HPP_TARGET_OPENCL_VERSION >= 110 CL_HPP_INIT_CL_EXT_FCN_PTR_(clEnqueueAcquireD3D10ObjectsKHR); #endif cl_event tmp; cl_int err = detail::errHandler( pfn_clEnqueueAcquireD3D10ObjectsKHR( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_ACQUIRE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueReleaseD3D10Objects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { static PFN_clEnqueueReleaseD3D10ObjectsKHR pfn_clEnqueueReleaseD3D10ObjectsKHR = NULL; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_context context = getInfo(); cl::Device device(getInfo()); cl_platform_id platform = device.getInfo(); CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, clEnqueueReleaseD3D10ObjectsKHR); #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_TARGET_OPENCL_VERSION >= 110 CL_HPP_INIT_CL_EXT_FCN_PTR_(clEnqueueReleaseD3D10ObjectsKHR); #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 cl_event tmp; cl_int err = detail::errHandler( pfn_clEnqueueReleaseD3D10ObjectsKHR( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? 
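/**
 * Usage sketch for the GL acquire/release protocol; assumes the context was
 * created with GL sharing properties and `glTex` is a hypothetical
 * cl::ImageGL wrapping a GL texture.
 * \code{.cpp}
 * std::vector<cl::Memory> shared{ glTex };
 * glFinish();                          // GL must be done with the objects first
 * q.enqueueAcquireGLObjects(&shared);
 * // ... enqueue CL commands that read/write the shared objects ...
 * q.enqueueReleaseGLObjects(&shared);
 * q.finish();                          // results visible to GL afterwards
 * \endcode
 */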
(cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_RELEASE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueBarrier() const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { return detail::errHandler( ::clEnqueueBarrier(object_), __ENQUEUE_BARRIER_ERR); } #endif // CL_USE_DEPRECATED_OPENCL_1_1_APIS cl_int flush() const { return detail::errHandler(::clFlush(object_), __FLUSH_ERR); } cl_int finish() const { return detail::errHandler(::clFinish(object_), __FINISH_ERR); } }; // CommandQueue CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag CommandQueue::default_initialized_; CL_HPP_DEFINE_STATIC_MEMBER_ CommandQueue CommandQueue::default_; CL_HPP_DEFINE_STATIC_MEMBER_ cl_int CommandQueue::default_error_ = CL_SUCCESS; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 enum class DeviceQueueProperties : cl_command_queue_properties { None = 0, Profiling = CL_QUEUE_PROFILING_ENABLE, }; inline DeviceQueueProperties operator|(DeviceQueueProperties lhs, DeviceQueueProperties rhs) { return static_cast(static_cast(lhs) | static_cast(rhs)); } /*! \class DeviceCommandQueue * \brief DeviceCommandQueue interface for device cl_command_queues. */ class DeviceCommandQueue : public detail::Wrapper { public: /*! * Trivial empty constructor to create a null queue. */ DeviceCommandQueue() { } /*! * Default construct device command queue on default context and device */ DeviceCommandQueue(DeviceQueueProperties properties, cl_int* err = NULL) { cl_int error; cl::Context context = cl::Context::getDefault(); cl::Device device = cl::Device::getDefault(); cl_command_queue_properties mergedProperties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | static_cast(properties); cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, mergedProperties, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } } /*! * Create a device command queue for a specified device in the passed context. */ DeviceCommandQueue( const Context& context, const Device& device, DeviceQueueProperties properties = DeviceQueueProperties::None, cl_int* err = NULL) { cl_int error; cl_command_queue_properties mergedProperties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | static_cast(properties); cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, mergedProperties, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } } /*! * Create a device command queue for a specified device in the passed context. */ DeviceCommandQueue( const Context& context, const Device& device, cl_uint queueSize, DeviceQueueProperties properties = DeviceQueueProperties::None, cl_int* err = NULL) { cl_int error; cl_command_queue_properties mergedProperties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | static_cast(properties); cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, mergedProperties, CL_QUEUE_SIZE, queueSize, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } } /*! 
\brief Constructor from cl_command_queue - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. */ explicit DeviceCommandQueue(const cl_command_queue& commandQueue, bool retainObject = false) : detail::Wrapper(commandQueue, retainObject) { } DeviceCommandQueue& operator = (const cl_command_queue& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ DeviceCommandQueue(const DeviceCommandQueue& queue) : detail::Wrapper(queue) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ DeviceCommandQueue& operator = (const DeviceCommandQueue &queue) { detail::Wrapper::operator=(queue); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ DeviceCommandQueue(DeviceCommandQueue&& queue) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(queue)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ DeviceCommandQueue& operator = (DeviceCommandQueue &&queue) { detail::Wrapper::operator=(std::move(queue)); return *this; } template cl_int getInfo(cl_command_queue_info name, T* param) const { return detail::errHandler( detail::getInfo( &::clGetCommandQueueInfo, object_, name, param), __GET_COMMAND_QUEUE_INFO_ERR); } template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_command_queue_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! * Create a new default device command queue for the default device, * in the default context and of the default size. * If there is already a default queue for the specified device this * function will return the pre-existing queue. */ static DeviceCommandQueue makeDefault( cl_int *err = nullptr) { cl_int error; cl::Context context = cl::Context::getDefault(); cl::Device device = cl::Device::getDefault(); cl_command_queue_properties properties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT; cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; DeviceCommandQueue deviceQueue( ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error)); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } return deviceQueue; } /*! * Create a new default device command queue for the specified device * and of the default size. * If there is already a default queue for the specified device this * function will return the pre-existing queue. */ static DeviceCommandQueue makeDefault( const Context &context, const Device &device, cl_int *err = nullptr) { cl_int error; cl_command_queue_properties properties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT; cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; DeviceCommandQueue deviceQueue( ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error)); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } return deviceQueue; } /*! * Create a new default device command queue for the specified device * and of the requested size in bytes. 
* If there is already a default queue for the specified device this * function will return the pre-existing queue. */ static DeviceCommandQueue makeDefault( const Context &context, const Device &device, cl_uint queueSize, cl_int *err = nullptr) { cl_int error; cl_command_queue_properties properties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT; cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, CL_QUEUE_SIZE, queueSize, 0 }; DeviceCommandQueue deviceQueue( ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error)); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } return deviceQueue; } }; // DeviceCommandQueue namespace detail { // Specialization for device command queue template <> struct KernelArgumentHandler { static size_type size(const cl::DeviceCommandQueue&) { return sizeof(cl_command_queue); } static const cl_command_queue* ptr(const cl::DeviceCommandQueue& value) { return &(value()); } }; } // namespace detail #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 template< typename IteratorType > Buffer::Buffer( const Context &context, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr, cl_int* err) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if( readOnly ) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if( useHostPtr ) { flags |= CL_MEM_USE_HOST_PTR; } size_type size = sizeof(DataType)*(endIterator - startIterator); if( useHostPtr ) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if( !useHostPtr ) { CommandQueue queue(context, 0, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } error = cl::copy(queue, startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } template< typename IteratorType > Buffer::Buffer( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr, cl_int* err) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if (readOnly) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if (useHostPtr) { flags |= CL_MEM_USE_HOST_PTR; } size_type size = sizeof(DataType)*(endIterator - startIterator); Context context = queue.getInfo(); if (useHostPtr) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if (!useHostPtr) { error = cl::copy(queue, startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } inline cl_int enqueueReadBuffer( const Buffer& buffer, cl_bool blocking, size_type offset, size_type size, void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueReadBuffer(buffer, blocking, offset, size, ptr, events, event); } inline cl_int enqueueWriteBuffer( const Buffer& buffer, cl_bool blocking, size_type 
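/**
 * Usage sketches (illustrative, hypothetical `ctx`/`dev`): an on-device
 * default queue (OpenCL 2.0+) via the makeDefault overload above, and the
 * iterator-based Buffer constructor defined above.
 * \code{.cpp}
 * cl_int err = CL_SUCCESS;
 * cl::DeviceCommandQueue dq =
 *     cl::DeviceCommandQueue::makeDefault(ctx, dev, 16 * 1024, &err);
 *
 * std::vector<float> data(256, 1.0f);
 * // readOnly = true, useHostPtr = false: contents are copied to the device
 * cl::Buffer buf(ctx, data.begin(), data.end(), true, false, &err);
 * \endcode
 */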
offset, size_type size, const void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueWriteBuffer(buffer, blocking, offset, size, ptr, events, event); } inline void* enqueueMapBuffer( const Buffer& buffer, cl_bool blocking, cl_map_flags flags, size_type offset, size_type size, const vector* events = NULL, Event* event = NULL, cl_int* err = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } void * result = ::clEnqueueMapBuffer( queue(), buffer(), blocking, flags, offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (cl_event*) event, &error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } return result; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues to the default queue a command that will allow the host to * update a region of a coarse-grained SVM buffer. * This variant takes a raw SVM pointer. */ template inline cl_int enqueueMapSVM( T* ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events, Event* event) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); } return queue.enqueueMapSVM( ptr, blocking, flags, size, events, event); } /** * Enqueues to the default queue a command that will allow the host to * update a region of a coarse-grained SVM buffer. * This variant takes a cl::pointer instance. */ template inline cl_int enqueueMapSVM( cl::pointer ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); } return queue.enqueueMapSVM( ptr, blocking, flags, size, events, event); } /** * Enqueues to the default queue a command that will allow the host to * update a region of a coarse-grained SVM buffer. * This variant takes a cl::vector instance. */ template inline cl_int enqueueMapSVM( cl::vector container, cl_bool blocking, cl_map_flags flags, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); } return queue.enqueueMapSVM( container, blocking, flags, events, event); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 inline cl_int enqueueUnmapMemObject( const Memory& memory, void* mapped_ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (error != CL_SUCCESS) { return error; } cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueUnmapMemObject( queue(), memory(), mapped_ptr, (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? 
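/**
 * Usage sketch: the free functions defined here implicitly target
 * cl::CommandQueue::getDefault(), so simple programs can skip explicit queue
 * management (buffer and data names are hypothetical).
 * \code{.cpp}
 * std::vector<float> host(64, 0.0f);
 * cl::Buffer buf(CL_MEM_READ_WRITE, 64 * sizeof(float));
 * cl_int err = cl::enqueueWriteBuffer(buf, CL_TRUE, 0,
 *                                     64 * sizeof(float), host.data());
 * \endcode
 */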
&tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues to the default queue a command that will release a coarse-grained * SVM buffer back to the OpenCL runtime. * This variant takes a raw SVM pointer. */ template inline cl_int enqueueUnmapSVM( T* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } return detail::errHandler(queue.enqueueUnmapSVM(ptr, events, event), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } /** * Enqueues to the default queue a command that will release a coarse-grained * SVM buffer back to the OpenCL runtime. * This variant takes a cl::pointer instance. */ template inline cl_int enqueueUnmapSVM( cl::pointer &ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } return detail::errHandler(queue.enqueueUnmapSVM(ptr, events, event), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } /** * Enqueues to the default queue a command that will release a coarse-grained * SVM buffer back to the OpenCL runtime. * This variant takes a cl::vector instance. */ template inline cl_int enqueueUnmapSVM( cl::vector &container, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } return detail::errHandler(queue.enqueueUnmapSVM(container, events, event), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 inline cl_int enqueueCopyBuffer( const Buffer& src, const Buffer& dst, size_type src_offset, size_type dst_offset, size_type size, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyBuffer(src, dst, src_offset, dst_offset, size, events, event); } /** * Blocking copy operation between iterators and a buffer. * Host to Device. * Uses default command queue. */ template< typename IteratorType > inline cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) return error; return cl::copy(queue, startIterator, endIterator, buffer); } /** * Blocking copy operation between iterators and a buffer. * Device to Host. * Uses default command queue. */ template< typename IteratorType > inline cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) return error; return cl::copy(queue, buffer, startIterator, endIterator); } /** * Blocking copy operation between iterators and a buffer. * Host to Device. * Uses specified queue. 
*/ template< typename IteratorType > inline cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ) { typedef typename std::iterator_traits::value_type DataType; cl_int error; size_type length = endIterator-startIterator; size_type byteLength = length*sizeof(DataType); DataType *pointer = static_cast(queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_WRITE, 0, byteLength, 0, 0, &error)); // if exceptions enabled, enqueueMapBuffer will throw if( error != CL_SUCCESS ) { return error; } #if defined(_MSC_VER) std::copy( startIterator, endIterator, stdext::checked_array_iterator( pointer, length)); #else std::copy(startIterator, endIterator, pointer); #endif Event endEvent; error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent); // if exceptions enabled, enqueueUnmapMemObject will throw if( error != CL_SUCCESS ) { return error; } endEvent.wait(); return CL_SUCCESS; } /** * Blocking copy operation between iterators and a buffer. * Device to Host. * Uses specified queue. */ template< typename IteratorType > inline cl_int copy( const CommandQueue &queue, const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ) { typedef typename std::iterator_traits::value_type DataType; cl_int error; size_type length = endIterator-startIterator; size_type byteLength = length*sizeof(DataType); DataType *pointer = static_cast(queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_READ, 0, byteLength, 0, 0, &error)); // if exceptions enabled, enqueueMapBuffer will throw if( error != CL_SUCCESS ) { return error; } std::copy(pointer, pointer + length, startIterator); Event endEvent; error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent); // if exceptions enabled, enqueueUnmapMemObject will throw if( error != CL_SUCCESS ) { return error; } endEvent.wait(); return CL_SUCCESS; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Blocking SVM map operation - performs a blocking map underneath. */ template inline cl_int mapSVM(cl::vector &container) { return enqueueMapSVM(container, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE); } /** * Blocking SVM map operation - performs a blocking map underneath. 
*/ template inline cl_int unmapSVM(cl::vector &container) { return enqueueUnmapSVM(container); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if CL_HPP_TARGET_OPENCL_VERSION >= 110 inline cl_int enqueueReadBufferRect( const Buffer& buffer, cl_bool blocking, const array& buffer_offset, const array& host_offset, const array& region, size_type buffer_row_pitch, size_type buffer_slice_pitch, size_type host_row_pitch, size_type host_slice_pitch, void *ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueReadBufferRect( buffer, blocking, buffer_offset, host_offset, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, events, event); } inline cl_int enqueueWriteBufferRect( const Buffer& buffer, cl_bool blocking, const array& buffer_offset, const array& host_offset, const array& region, size_type buffer_row_pitch, size_type buffer_slice_pitch, size_type host_row_pitch, size_type host_slice_pitch, const void *ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueWriteBufferRect( buffer, blocking, buffer_offset, host_offset, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, events, event); } inline cl_int enqueueCopyBufferRect( const Buffer& src, const Buffer& dst, const array& src_origin, const array& dst_origin, const array& region, size_type src_row_pitch, size_type src_slice_pitch, size_type dst_row_pitch, size_type dst_slice_pitch, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyBufferRect( src, dst, src_origin, dst_origin, region, src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch, events, event); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 inline cl_int enqueueReadImage( const Image& image, cl_bool blocking, const array& origin, const array& region, size_type row_pitch, size_type slice_pitch, void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueReadImage( image, blocking, origin, region, row_pitch, slice_pitch, ptr, events, event); } inline cl_int enqueueWriteImage( const Image& image, cl_bool blocking, const array& origin, const array& region, size_type row_pitch, size_type slice_pitch, const void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueWriteImage( image, blocking, origin, region, row_pitch, slice_pitch, ptr, events, event); } inline cl_int enqueueCopyImage( const Image& src, const Image& dst, const array& src_origin, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyImage( src, dst, src_origin, dst_origin, region, events, event); } inline cl_int enqueueCopyImageToBuffer( const Image& src, const Buffer& dst, const array& src_origin, const array& region, size_type dst_offset, const vector* events = NULL, Event* event = NULL) { cl_int 
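/**
 * Usage sketch for the blocking cl::copy helpers, which map the buffer and
 * copy through the mapped pointer exactly as implemented above
 * (illustrative fragment).
 * \code{.cpp}
 * std::vector<int> src(128, 7), dst(128);
 * cl::Buffer buf(CL_MEM_READ_WRITE, 128 * sizeof(int));
 * cl::copy(src.begin(), src.end(), buf);   // host -> device, default queue
 * cl::copy(buf, dst.begin(), dst.end());   // device -> host
 * \endcode
 */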
error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyImageToBuffer( src, dst, src_origin, region, dst_offset, events, event); } inline cl_int enqueueCopyBufferToImage( const Buffer& src, const Image& dst, size_type src_offset, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyBufferToImage( src, dst, src_offset, dst_origin, region, events, event); } inline cl_int flush(void) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.flush(); } inline cl_int finish(void) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.finish(); } class EnqueueArgs { private: CommandQueue queue_; const NDRange offset_; const NDRange global_; const NDRange local_; vector events_; template friend class KernelFunctor; public: EnqueueArgs(NDRange global) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange) { } EnqueueArgs(NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local) { } EnqueueArgs(NDRange offset, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local) { } EnqueueArgs(Event e, NDRange global) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange) { events_.push_back(e); } EnqueueArgs(Event e, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(Event e, NDRange offset, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(const vector &events, NDRange global) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange), events_(events) { } EnqueueArgs(const vector &events, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local), events_(events) { } EnqueueArgs(const vector &events, NDRange offset, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local), events_(events) { } EnqueueArgs(CommandQueue &queue, NDRange global) : queue_(queue), offset_(NullRange), global_(global), local_(NullRange) { } EnqueueArgs(CommandQueue &queue, NDRange global, NDRange local) : queue_(queue), offset_(NullRange), global_(global), local_(local) { } EnqueueArgs(CommandQueue &queue, NDRange offset, NDRange global, NDRange local) : queue_(queue), offset_(offset), global_(global), local_(local) { } EnqueueArgs(CommandQueue &queue, Event e, NDRange global) : queue_(queue), offset_(NullRange), global_(global), local_(NullRange) { events_.push_back(e); } EnqueueArgs(CommandQueue &queue, Event e, NDRange global, NDRange local) : queue_(queue), offset_(NullRange), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(CommandQueue &queue, Event e, NDRange offset, NDRange global, NDRange local) : queue_(queue), offset_(offset), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(CommandQueue &queue, const vector &events, NDRange global) : 
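/**
 * Usage sketch for EnqueueArgs, which bundles queue, ranges and dependencies
 * for KernelFunctor launches (`q` and `priorEvent` are hypothetical).
 * \code{.cpp}
 * cl::EnqueueArgs simple(q, cl::NDRange(1024), cl::NDRange(64));
 * cl::EnqueueArgs withDep(q, priorEvent,
 *                         cl::NDRange(0),      // offset
 *                         cl::NDRange(1024),   // global
 *                         cl::NDRange(64));    // local
 * \endcode
 */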
queue_(queue), offset_(NullRange), global_(global), local_(NullRange), events_(events) { } EnqueueArgs(CommandQueue &queue, const vector &events, NDRange global, NDRange local) : queue_(queue), offset_(NullRange), global_(global), local_(local), events_(events) { } EnqueueArgs(CommandQueue &queue, const vector &events, NDRange offset, NDRange global, NDRange local) : queue_(queue), offset_(offset), global_(global), local_(local), events_(events) { } }; //---------------------------------------------------------------------------------------------- /** * Type safe kernel functor. * */ template class KernelFunctor { private: Kernel kernel_; template void setArgs(T0&& t0, T1s&&... t1s) { kernel_.setArg(index, t0); setArgs(std::forward(t1s)...); } template void setArgs(T0&& t0) { kernel_.setArg(index, t0); } template void setArgs() { } public: KernelFunctor(Kernel kernel) : kernel_(kernel) {} KernelFunctor( const Program& program, const string name, cl_int * err = NULL) : kernel_(program, name.c_str(), err) {} //! \brief Return type of the functor typedef Event result_type; /** * Enqueue kernel. * @param args Launch parameters of the kernel. * @param t0... List of kernel arguments based on the template type of the functor. */ Event operator() ( const EnqueueArgs& args, Ts... ts) { Event event; setArgs<0>(std::forward(ts)...); args.queue_.enqueueNDRangeKernel( kernel_, args.offset_, args.global_, args.local_, &args.events_, &event); return event; } /** * Enqueue kernel with support for error code. * @param args Launch parameters of the kernel. * @param t0... List of kernel arguments based on the template type of the functor. * @param error Out parameter returning the error code from the execution. */ Event operator() ( const EnqueueArgs& args, Ts... ts, cl_int &error) { Event event; setArgs<0>(std::forward(ts)...); error = args.queue_.enqueueNDRangeKernel( kernel_, args.offset_, args.global_, args.local_, &args.events_, &event); return event; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_int setSVMPointers(const vector &pointerList) { return kernel_.setSVMPointers(pointerList); } template cl_int setSVMPointers(const T0 &t0, T1s... ts) { return kernel_.setSVMPointers(t0, ts...); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 Kernel getKernel() { return kernel_; } }; namespace compatibility { /** * Backward compatibility class to ensure that cl.hpp code works with cl2.hpp. * Please use KernelFunctor directly. */ template struct make_kernel { typedef KernelFunctor FunctorType; FunctorType functor_; make_kernel( const Program& program, const string name, cl_int * err = NULL) : functor_(FunctorType(program, name, err)) {} make_kernel( const Kernel kernel) : functor_(FunctorType(kernel)) {} //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, Ts...); Event operator()( const EnqueueArgs& enqueueArgs, Ts... 
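/**
 * Usage sketch for KernelFunctor, the type-safe launch wrapper defined above
 * (`program`, the "vadd" kernel and the buffers are hypothetical).
 * \code{.cpp}
 * auto vadd = cl::KernelFunctor<cl::Buffer, cl::Buffer, cl::Buffer>(program, "vadd");
 * cl::Event e = vadd(cl::EnqueueArgs(q, cl::NDRange(1024)), bufA, bufB, bufC);
 * e.wait();
 * \endcode
 * cl::compatibility::make_kernel wraps the same functor for code written
 * against the older cl.hpp interface.
 */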
args) { return functor_( enqueueArgs, args...); } }; } // namespace compatibility //---------------------------------------------------------------------------------------------------------------------- #undef CL_HPP_ERR_STR_ #if !defined(CL_HPP_USER_OVERRIDE_ERROR_STRINGS) #undef __GET_DEVICE_INFO_ERR #undef __GET_PLATFORM_INFO_ERR #undef __GET_DEVICE_IDS_ERR #undef __GET_CONTEXT_INFO_ERR #undef __GET_EVENT_INFO_ERR #undef __GET_EVENT_PROFILE_INFO_ERR #undef __GET_MEM_OBJECT_INFO_ERR #undef __GET_IMAGE_INFO_ERR #undef __GET_SAMPLER_INFO_ERR #undef __GET_KERNEL_INFO_ERR #undef __GET_KERNEL_ARG_INFO_ERR #undef __GET_KERNEL_WORK_GROUP_INFO_ERR #undef __GET_PROGRAM_INFO_ERR #undef __GET_PROGRAM_BUILD_INFO_ERR #undef __GET_COMMAND_QUEUE_INFO_ERR #undef __CREATE_CONTEXT_ERR #undef __CREATE_CONTEXT_FROM_TYPE_ERR #undef __GET_SUPPORTED_IMAGE_FORMATS_ERR #undef __CREATE_BUFFER_ERR #undef __CREATE_SUBBUFFER_ERR #undef __CREATE_IMAGE2D_ERR #undef __CREATE_IMAGE3D_ERR #undef __CREATE_SAMPLER_ERR #undef __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR #undef __CREATE_USER_EVENT_ERR #undef __SET_USER_EVENT_STATUS_ERR #undef __SET_EVENT_CALLBACK_ERR #undef __SET_PRINTF_CALLBACK_ERR #undef __WAIT_FOR_EVENTS_ERR #undef __CREATE_KERNEL_ERR #undef __SET_KERNEL_ARGS_ERR #undef __CREATE_PROGRAM_WITH_SOURCE_ERR #undef __CREATE_PROGRAM_WITH_BINARY_ERR #undef __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR #undef __BUILD_PROGRAM_ERR #undef __CREATE_KERNELS_IN_PROGRAM_ERR #undef __CREATE_COMMAND_QUEUE_ERR #undef __SET_COMMAND_QUEUE_PROPERTY_ERR #undef __ENQUEUE_READ_BUFFER_ERR #undef __ENQUEUE_WRITE_BUFFER_ERR #undef __ENQUEUE_READ_BUFFER_RECT_ERR #undef __ENQUEUE_WRITE_BUFFER_RECT_ERR #undef __ENQEUE_COPY_BUFFER_ERR #undef __ENQEUE_COPY_BUFFER_RECT_ERR #undef __ENQUEUE_READ_IMAGE_ERR #undef __ENQUEUE_WRITE_IMAGE_ERR #undef __ENQUEUE_COPY_IMAGE_ERR #undef __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR #undef __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR #undef __ENQUEUE_MAP_BUFFER_ERR #undef __ENQUEUE_MAP_IMAGE_ERR #undef __ENQUEUE_UNMAP_MEM_OBJECT_ERR #undef __ENQUEUE_NDRANGE_KERNEL_ERR #undef __ENQUEUE_TASK_ERR #undef __ENQUEUE_NATIVE_KERNEL #undef __UNLOAD_COMPILER_ERR #undef __CREATE_SUB_DEVICES_ERR #undef __CREATE_PIPE_ERR #undef __GET_PIPE_INFO_ERR #endif //CL_HPP_USER_OVERRIDE_ERROR_STRINGS // Extensions #undef CL_HPP_INIT_CL_EXT_FCN_PTR_ #undef CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_ #if defined(CL_HPP_USE_CL_DEVICE_FISSION) #undef CL_HPP_PARAM_NAME_DEVICE_FISSION_ #endif // CL_HPP_USE_CL_DEVICE_FISSION #undef CL_HPP_NOEXCEPT_ #undef CL_HPP_DEFINE_STATIC_MEMBER_ } // namespace cl #endif // CL_HPP_ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/cl_d3d10.h000066400000000000000000000120021450307266000232560ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. 
THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_D3D10_H #define __OPENCL_CL_D3D10_H #include <d3d10.h> #include <CL/cl.h> #include <CL/cl_platform.h> #ifdef __cplusplus extern "C" { #endif /****************************************************************************** * cl_khr_d3d10_sharing */ #define cl_khr_d3d10_sharing 1 typedef cl_uint cl_d3d10_device_source_khr; typedef cl_uint cl_d3d10_device_set_khr; /******************************************************************************/ /* Error Codes */ #define CL_INVALID_D3D10_DEVICE_KHR -1002 #define CL_INVALID_D3D10_RESOURCE_KHR -1003 #define CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR -1004 #define CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR -1005 /* cl_d3d10_device_source_nv */ #define CL_D3D10_DEVICE_KHR 0x4010 #define CL_D3D10_DXGI_ADAPTER_KHR 0x4011 /* cl_d3d10_device_set_nv */ #define CL_PREFERRED_DEVICES_FOR_D3D10_KHR 0x4012 #define CL_ALL_DEVICES_FOR_D3D10_KHR 0x4013 /* cl_context_info */ #define CL_CONTEXT_D3D10_DEVICE_KHR 0x4014 #define CL_CONTEXT_D3D10_PREFER_SHARED_RESOURCES_KHR 0x402C /* cl_mem_info */ #define CL_MEM_D3D10_RESOURCE_KHR 0x4015 /* cl_image_info */ #define CL_IMAGE_D3D10_SUBRESOURCE_KHR 0x4016 /* cl_command_type */ #define CL_COMMAND_ACQUIRE_D3D10_OBJECTS_KHR 0x4017 #define CL_COMMAND_RELEASE_D3D10_OBJECTS_KHR 0x4018 /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromD3D10KHR_fn)( cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10BufferKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Buffer * resource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10Texture2DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Texture2D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10Texture3DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Texture3D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireD3D10ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseD3D10ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event)
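/*
 * Usage sketch: cl_khr_d3d10_sharing entry points are not exported by the
 * OpenCL library; they are fetched at runtime. `platform` and `d3dDevice`
 * (an ID3D10Device*) are hypothetical.
 *
 *   clGetDeviceIDsFromD3D10KHR_fn pfn = (clGetDeviceIDsFromD3D10KHR_fn)
 *       clGetExtensionFunctionAddressForPlatform(platform,
 *                                                "clGetDeviceIDsFromD3D10KHR");
 *   cl_device_id dev; cl_uint n = 0;
 *   if (pfn) pfn(platform, CL_D3D10_DEVICE_KHR, d3dDevice,
 *                CL_PREFERRED_DEVICES_FOR_D3D10_KHR, 1, &dev, &n);
 */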
CL_API_SUFFIX__VERSION_1_0; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_D3D10_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/cl_d3d11.h000066400000000000000000000117741450307266000232760ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_D3D11_H #define __OPENCL_CL_D3D11_H #include #include #include #ifdef __cplusplus extern "C" { #endif /****************************************************************************** * cl_khr_d3d11_sharing */ #define cl_khr_d3d11_sharing 1 typedef cl_uint cl_d3d11_device_source_khr; typedef cl_uint cl_d3d11_device_set_khr; /******************************************************************************/ /* Error Codes */ #define CL_INVALID_D3D11_DEVICE_KHR -1006 #define CL_INVALID_D3D11_RESOURCE_KHR -1007 #define CL_D3D11_RESOURCE_ALREADY_ACQUIRED_KHR -1008 #define CL_D3D11_RESOURCE_NOT_ACQUIRED_KHR -1009 /* cl_d3d11_device_source */ #define CL_D3D11_DEVICE_KHR 0x4019 #define CL_D3D11_DXGI_ADAPTER_KHR 0x401A /* cl_d3d11_device_set */ #define CL_PREFERRED_DEVICES_FOR_D3D11_KHR 0x401B #define CL_ALL_DEVICES_FOR_D3D11_KHR 0x401C /* cl_context_info */ #define CL_CONTEXT_D3D11_DEVICE_KHR 0x401D #define CL_CONTEXT_D3D11_PREFER_SHARED_RESOURCES_KHR 0x402D /* cl_mem_info */ #define CL_MEM_D3D11_RESOURCE_KHR 0x401E /* cl_image_info */ #define CL_IMAGE_D3D11_SUBRESOURCE_KHR 0x401F /* cl_command_type */ #define CL_COMMAND_ACQUIRE_D3D11_OBJECTS_KHR 0x4020 #define CL_COMMAND_RELEASE_D3D11_OBJECTS_KHR 0x4021 /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromD3D11KHR_fn)( cl_platform_id platform, cl_d3d11_device_source_khr d3d_device_source, void * d3d_object, cl_d3d11_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11BufferKHR_fn)( cl_context context, cl_mem_flags 
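/*
 * Usage sketch (hypothetical `platform`, `context`, `d3d11Buf`): as with the
 * D3D10 extension above, resolve the entry point at runtime before use.
 *
 *   clCreateFromD3D11BufferKHR_fn pfnCreate = (clCreateFromD3D11BufferKHR_fn)
 *       clGetExtensionFunctionAddressForPlatform(platform,
 *                                                "clCreateFromD3D11BufferKHR");
 *   cl_int err = CL_SUCCESS;
 *   cl_mem mem = pfnCreate ? pfnCreate(context, CL_MEM_READ_WRITE,
 *                                      d3d11Buf, &err) : NULL;
 */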
flags, ID3D11Buffer * resource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11Texture2DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D11Texture2D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11Texture3DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D11Texture3D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireD3D11ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseD3D11ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_D3D11_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/cl_dx9_media_sharing.h000066400000000000000000000124551450307266000260350ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. 
**********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_DX9_MEDIA_SHARING_H #define __OPENCL_CL_DX9_MEDIA_SHARING_H #include #include #ifdef __cplusplus extern "C" { #endif /******************************************************************************/ /* cl_khr_dx9_media_sharing */ #define cl_khr_dx9_media_sharing 1 typedef cl_uint cl_dx9_media_adapter_type_khr; typedef cl_uint cl_dx9_media_adapter_set_khr; #if defined(_WIN32) #include typedef struct _cl_dx9_surface_info_khr { IDirect3DSurface9 *resource; HANDLE shared_handle; } cl_dx9_surface_info_khr; #endif /******************************************************************************/ /* Error Codes */ #define CL_INVALID_DX9_MEDIA_ADAPTER_KHR -1010 #define CL_INVALID_DX9_MEDIA_SURFACE_KHR -1011 #define CL_DX9_MEDIA_SURFACE_ALREADY_ACQUIRED_KHR -1012 #define CL_DX9_MEDIA_SURFACE_NOT_ACQUIRED_KHR -1013 /* cl_media_adapter_type_khr */ #define CL_ADAPTER_D3D9_KHR 0x2020 #define CL_ADAPTER_D3D9EX_KHR 0x2021 #define CL_ADAPTER_DXVA_KHR 0x2022 /* cl_media_adapter_set_khr */ #define CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR 0x2023 #define CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR 0x2024 /* cl_context_info */ #define CL_CONTEXT_ADAPTER_D3D9_KHR 0x2025 #define CL_CONTEXT_ADAPTER_D3D9EX_KHR 0x2026 #define CL_CONTEXT_ADAPTER_DXVA_KHR 0x2027 /* cl_mem_info */ #define CL_MEM_DX9_MEDIA_ADAPTER_TYPE_KHR 0x2028 #define CL_MEM_DX9_MEDIA_SURFACE_INFO_KHR 0x2029 /* cl_image_info */ #define CL_IMAGE_DX9_MEDIA_PLANE_KHR 0x202A /* cl_command_type */ #define CL_COMMAND_ACQUIRE_DX9_MEDIA_SURFACES_KHR 0x202B #define CL_COMMAND_RELEASE_DX9_MEDIA_SURFACES_KHR 0x202C /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromDX9MediaAdapterKHR_fn)( cl_platform_id platform, cl_uint num_media_adapters, cl_dx9_media_adapter_type_khr * media_adapter_type, void * media_adapters, cl_dx9_media_adapter_set_khr media_adapter_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromDX9MediaSurfaceKHR_fn)( cl_context context, cl_mem_flags flags, cl_dx9_media_adapter_type_khr adapter_type, void * surface_info, cl_uint plane, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireDX9MediaSurfacesKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseDX9MediaSurfacesKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_DX9_MEDIA_SHARING_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/cl_egl.h000066400000000000000000000123561450307266000232260ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. 
* * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ #ifndef __OPENCL_CL_EGL_H #define __OPENCL_CL_EGL_H #ifdef __APPLE__ #else #include #endif #ifdef __cplusplus extern "C" { #endif /* Command type for events created with clEnqueueAcquireEGLObjectsKHR */ #define CL_COMMAND_EGL_FENCE_SYNC_OBJECT_KHR 0x202F #define CL_COMMAND_ACQUIRE_EGL_OBJECTS_KHR 0x202D #define CL_COMMAND_RELEASE_EGL_OBJECTS_KHR 0x202E /* Error type for clCreateFromEGLImageKHR */ #define CL_INVALID_EGL_OBJECT_KHR -1093 #define CL_EGL_RESOURCE_NOT_ACQUIRED_KHR -1092 /* CLeglImageKHR is an opaque handle to an EGLImage */ typedef void* CLeglImageKHR; /* CLeglDisplayKHR is an opaque handle to an EGLDisplay */ typedef void* CLeglDisplayKHR; /* CLeglSyncKHR is an opaque handle to an EGLSync object */ typedef void* CLeglSyncKHR; /* properties passed to clCreateFromEGLImageKHR */ typedef intptr_t cl_egl_image_properties_khr; #define cl_khr_egl_image 1 extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromEGLImageKHR(cl_context /* context */, CLeglDisplayKHR /* egldisplay */, CLeglImageKHR /* eglimage */, cl_mem_flags /* flags */, const cl_egl_image_properties_khr * /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromEGLImageKHR_fn)( cl_context context, CLeglDisplayKHR egldisplay, CLeglImageKHR eglimage, cl_mem_flags flags, const cl_egl_image_properties_khr * properties, cl_int * errcode_ret); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireEGLObjectsKHR(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireEGLObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseEGLObjectsKHR(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* 
event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseEGLObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event); #define cl_khr_egl_event 1 extern CL_API_ENTRY cl_event CL_API_CALL clCreateEventFromEGLSyncKHR(cl_context /* context */, CLeglSyncKHR /* sync */, CLeglDisplayKHR /* display */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_event (CL_API_CALL *clCreateEventFromEGLSyncKHR_fn)( cl_context context, CLeglSyncKHR sync, CLeglDisplayKHR display, cl_int * errcode_ret); #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_EGL_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/cl_ext.h000066400000000000000000000767461450307266000232740ustar00rootroot00000000000000/* Modifications Copyright (C) 2010-2021 Advanced Micro Devices, Inc. */ /******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ /* $Revision: 11928 $ on $Date: 2010-07-13 09:04:56 -0700 (Tue, 13 Jul 2010) $ */ /* cl_ext.h contains OpenCL extensions which don't have external */ /* (OpenGL, D3D) dependencies. */ #ifndef __CL_EXT_H #define __CL_EXT_H #ifdef __cplusplus extern "C" { #endif #ifdef __APPLE__ #include #include #else #include #endif /* cl_khr_fp16 extension - no extension #define since it has no functions */ #define CL_DEVICE_HALF_FP_CONFIG 0x1033 /* Memory object destruction * * Apple extension for use to manage externally allocated buffers used with cl_mem objects with CL_MEM_USE_HOST_PTR * * Registers a user callback function that will be called when the memory object is deleted and its resources * freed. Each call to clSetMemObjectCallbackFn registers the specified user callback function on a callback * stack associated with memobj. The registered user callback functions are called in the reverse order in * which they were registered. The user callback functions are called and then the memory object is deleted * and its resources freed. 
This provides a mechanism for the application (and libraries) using memobj to be * notified when the memory referenced by host_ptr, specified when the memory object is created and used as * the storage bits for the memory object, can be reused or freed. * * The application may not call CL api's with the cl_mem object passed to the pfn_notify. * * Please check for the "cl_APPLE_SetMemObjectDestructor" extension using clGetDeviceInfo(CL_DEVICE_EXTENSIONS) * before using. */ #define cl_APPLE_SetMemObjectDestructor 1 cl_int CL_API_ENTRY clSetMemObjectDestructorAPPLE( cl_mem /* memobj */, void (* /*pfn_notify*/)( cl_mem /* memobj */, void* /*user_data*/), void * /*user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /* Context Logging Functions * * The next three convenience functions are intended to be used as the pfn_notify parameter to clCreateContext(). * Please check for the "cl_APPLE_ContextLoggingFunctions" extension using clGetDeviceInfo(CL_DEVICE_EXTENSIONS) * before using. * * clLogMessagesToSystemLog fowards on all log messages to the Apple System Logger */ #define cl_APPLE_ContextLoggingFunctions 1 extern void CL_API_ENTRY clLogMessagesToSystemLogAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /* clLogMessagesToStdout sends all log messages to the file descriptor stdout */ extern void CL_API_ENTRY clLogMessagesToStdoutAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /* clLogMessagesToStderr sends all log messages to the file descriptor stderr */ extern void CL_API_ENTRY clLogMessagesToStderrAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /************************ * cl_khr_icd extension * ************************/ #define cl_khr_icd 1 /* cl_platform_info */ #define CL_PLATFORM_ICD_SUFFIX_KHR 0x0920 /* Additional Error Codes */ #define CL_PLATFORM_NOT_FOUND_KHR -1001 extern CL_API_ENTRY cl_int CL_API_CALL clIcdGetPlatformIDsKHR(cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */); typedef CL_API_ENTRY cl_int (CL_API_CALL *clIcdGetPlatformIDsKHR_fn)( cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */); /* Extension: cl_khr_image2D_buffer * * This extension allows a 2D image to be created from a cl_mem buffer without a copy. * The type associated with a 2D image created from a buffer in an OpenCL program is image2d_t. * Both the sampler and sampler-less read_image built-in functions are supported for 2D images * and 2D images created from a buffer. Similarly, the write_image built-ins are also supported * for 2D images created from a buffer. * * When the 2D image from buffer is created, the client must specify the width, * height, image format (i.e. channel order and channel data type) and optionally the row pitch * * The pitch specified must be a multiple of CL_DEVICE_IMAGE_PITCH_ALIGNMENT pixels. * The base address of the buffer must be aligned to CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT pixels. 
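 *
 * A minimal illustrative sketch of creating such an image (non-normative;
 * assumes an existing context ctx, a suitably aligned buffer buf, and an
 * RGBA8 format -- the names and the 640x480 size are placeholders):
 *
 *     cl_image_format fmt  = { CL_RGBA, CL_UNORM_INT8 };
 *     cl_image_desc   desc = { 0 };
 *     desc.image_type      = CL_MEM_OBJECT_IMAGE2D;
 *     desc.image_width     = 640;
 *     desc.image_height    = 480;
 *     desc.image_row_pitch = 0;    // 0 = let the runtime derive the pitch
 *     desc.buffer          = buf;  // the cl_mem buffer to view as an image
 *     cl_int err;
 *     cl_mem img = clCreateImage(ctx, CL_MEM_READ_WRITE, &fmt, &desc, NULL, &err);
 *
 * The pitch and base-address alignment requirements above still apply to buf.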
*/ /************************************* * cl_khr_initalize_memory extension * *************************************/ #define CL_CONTEXT_MEMORY_INITIALIZE_KHR 0x2030 /************************************** * cl_khr_terminate_context extension * **************************************/ #define CL_DEVICE_TERMINATE_CAPABILITY_KHR 0x2031 #define CL_CONTEXT_TERMINATE_KHR 0x2032 #define cl_khr_terminate_context 1 extern CL_API_ENTRY cl_int CL_API_CALL clTerminateContextKHR(cl_context /* context */) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clTerminateContextKHR_fn)(cl_context /* context */) CL_EXT_SUFFIX__VERSION_1_2; /* * Extension: cl_khr_spir * * This extension adds support to create an OpenCL program object from a * Standard Portable Intermediate Representation (SPIR) instance */ #define CL_DEVICE_SPIR_VERSIONS 0x40E0 #define CL_PROGRAM_BINARY_TYPE_INTERMEDIATE 0x40E1 #ifdef CL_VERSION_2_0 /********************************* * cl_khr_il_program extension *********************************/ #define cl_khr_il_program 1 extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithILKHR(cl_context /* context */, const void * /* strings */, size_t /* lengths */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY cl_program ( CL_API_CALL * clCreateProgramWithILKHR_fn)(cl_context /* context */, const void * /* strings */, size_t /* lengths */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_2_0; #endif /* CL_VERSION_2_0 */ /****************************************** * cl_nv_device_attribute_query extension * ******************************************/ /* cl_nv_device_attribute_query extension - no extension #define since it has no functions */ #define CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV 0x4000 #define CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV 0x4001 #define CL_DEVICE_REGISTERS_PER_BLOCK_NV 0x4002 #define CL_DEVICE_WARP_SIZE_NV 0x4003 #define CL_DEVICE_GPU_OVERLAP_NV 0x4004 #define CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV 0x4005 #define CL_DEVICE_INTEGRATED_MEMORY_NV 0x4006 /********************************* * cl_amd_device_memory_flags * *********************************/ #define cl_amd_device_memory_flags 1 #define CL_MEM_USE_PERSISTENT_MEM_AMD (1 << 6) // Alloc from GPU's CPU visible heap /* cl_device_info */ #define CL_DEVICE_MAX_ATOMIC_COUNTERS_EXT 0x4032 /********************************* * cl_amd_device_attribute_query * *********************************/ #define CL_DEVICE_PROFILING_TIMER_OFFSET_AMD 0x4036 #define CL_DEVICE_TOPOLOGY_AMD 0x4037 #define CL_DEVICE_BOARD_NAME_AMD 0x4038 #define CL_DEVICE_GLOBAL_FREE_MEMORY_AMD 0x4039 #define CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD 0x4040 #define CL_DEVICE_SIMD_WIDTH_AMD 0x4041 #define CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD 0x4042 #define CL_DEVICE_WAVEFRONT_WIDTH_AMD 0x4043 #define CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD 0x4044 #define CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD 0x4045 #define CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD 0x4046 #define CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD 0x4047 #define CL_DEVICE_LOCAL_MEM_BANKS_AMD 0x4048 #define CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD 0x4049 #define CL_DEVICE_GFXIP_MAJOR_AMD 0x404A #define CL_DEVICE_GFXIP_MINOR_AMD 0x404B #define CL_DEVICE_AVAILABLE_ASYNC_QUEUES_AMD 0x404C #define CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD 0x4030 #define CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD 0x4031 #define CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD 0x4033 #define CL_DEVICE_PCIE_ID_AMD 0x4034 typedef union { struct { cl_uint type; cl_uint data[5]; } raw; struct { cl_uint type; 
cl_uchar unused[17]; cl_uchar bus; cl_uchar device; cl_uchar function; } pcie; } cl_device_topology_amd; #define CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD 1 /************************** * cl_amd_offline_devices * **************************/ #define CL_CONTEXT_OFFLINE_DEVICES_AMD 0x403F /******************************** * cl_amd_bus_addressable_memory * ********************************/ /* cl_mem flag - bitfield */ #define CL_MEM_BUS_ADDRESSABLE_AMD (1<<30) #define CL_MEM_EXTERNAL_PHYSICAL_AMD (1<<31) #define CL_COMMAND_WAIT_SIGNAL_AMD 0x4080 #define CL_COMMAND_WRITE_SIGNAL_AMD 0x4081 #define CL_COMMAND_MAKE_BUFFERS_RESIDENT_AMD 0x4082 typedef struct _cl_bus_address_amd { cl_ulong surface_bus_address; cl_ulong marker_bus_address; } cl_bus_address_amd; typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueWaitSignalAMD_fn)( cl_command_queue /*command_queue*/, cl_mem /*mem_object*/, cl_uint /*value*/, cl_uint /*num_events*/, const cl_event * /*event_wait_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueWriteSignalAMD_fn)( cl_command_queue /*command_queue*/, cl_mem /*mem_object*/, cl_uint /*value*/, cl_ulong /*offset*/, cl_uint /*num_events*/, const cl_event * /*event_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueMakeBuffersResidentAMD_fn)( cl_command_queue /*command_queue*/, cl_uint /*num_mem_objs*/, cl_mem * /*mem_objects*/, cl_bool /*blocking_make_resident*/, cl_bus_address_amd * /*bus_addresses*/, cl_uint /*num_events*/, const cl_event * /*event_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2; /************************* * cl_amd_copy_buffer_p2p * **************************/ #define CL_DEVICE_NUM_P2P_DEVICES_AMD 0x4088 #define CL_DEVICE_P2P_DEVICES_AMD 0x4089 #define cl_amd_copy_buffer_p2p 1 typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueCopyBufferP2PAMD_fn)(cl_command_queue /*command_queue*/, cl_mem /*src_buffer*/, cl_mem /*dst_buffer*/, size_t /*src_offset*/, size_t /*dst_offset*/, size_t /*cb*/, cl_uint /*num_events_in_wait_list*/, const cl_event* /*event_wait_list*/, cl_event* /*event*/) CL_EXT_SUFFIX__VERSION_1_2; /*********************************** * cl_amd_assembly_program extension * ***********************************/ #define cl_amd_assembly_program 1 typedef CL_API_ENTRY cl_program (CL_API_CALL * clCreateProgramWithAssemblyAMD_fn) ( cl_context /* context */, cl_uint /* count */, const char** /* strings */, const size_t* /* lengths */, cl_int* /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_2; #ifdef CL_VERSION_2_0 /******************************** * cl_amd_planar_yuv * ********************************/ /* cl_mem flag - bitfield */ #define CL_YUV_IMAGE_Y_PLANE_AMD 0x0 #define CL_YUV_IMAGE_UV_PLANE_AMD 0x1 typedef CL_API_ENTRY cl_mem (CL_API_CALL * clGetPlaneFromImageAMD_fn)(cl_context /*context*/, cl_mem /*mem*/, cl_uint /*plane*/, cl_int * /*errcode_ret*/) CL_EXT_SUFFIX__VERSION_2_0; #endif // /************************** * cl_amd_command_queue_info * **************************/ #define CL_QUEUE_THREAD_HANDLE_AMD 0x403E /* cl_kernel_exec_info for DVR DOPP texture support */ #define CL_KERNEL_EXEC_INFO_NEW_VCOP_AMD 0x4120 #define CL_KERNEL_EXEC_INFO_PFPA_VCOP_AMD 0x4121 // /********************************* * cl_arm_printf extension *********************************/ #define CL_PRINTF_CALLBACK_ARM 0x40B0 #define CL_PRINTF_BUFFERSIZE_ARM 0x40B1 #ifdef CL_VERSION_1_1 /*********************************** * cl_ext_device_fission extension * 
***********************************/ #define cl_ext_device_fission 1 extern CL_API_ENTRY cl_int CL_API_CALL clReleaseDeviceEXT( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL *clReleaseDeviceEXT_fn)( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clRetainDeviceEXT( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL *clRetainDeviceEXT_fn)( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef cl_ulong cl_device_partition_property_ext; extern CL_API_ENTRY cl_int CL_API_CALL clCreateSubDevicesEXT( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int ( CL_API_CALL * clCreateSubDevicesEXT_fn)( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1; /* cl_device_partition_property_ext */ #define CL_DEVICE_PARTITION_EQUALLY_EXT 0x4050 #define CL_DEVICE_PARTITION_BY_COUNTS_EXT 0x4051 #define CL_DEVICE_PARTITION_BY_NAMES_EXT 0x4052 #define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN_EXT 0x4053 /* clDeviceGetInfo selectors */ #define CL_DEVICE_PARENT_DEVICE_EXT 0x4054 #define CL_DEVICE_PARTITION_TYPES_EXT 0x4055 #define CL_DEVICE_AFFINITY_DOMAINS_EXT 0x4056 #define CL_DEVICE_REFERENCE_COUNT_EXT 0x4057 #define CL_DEVICE_PARTITION_STYLE_EXT 0x4058 /* clGetImageInfo enum */ #define CL_IMAGE_BYTE_PITCH_AMD 0x4059 /* error codes */ #define CL_DEVICE_PARTITION_FAILED_EXT -1057 #define CL_INVALID_PARTITION_COUNT_EXT -1058 #define CL_INVALID_PARTITION_NAME_EXT -1059 /* CL_AFFINITY_DOMAINs */ #define CL_AFFINITY_DOMAIN_L1_CACHE_EXT 0x1 #define CL_AFFINITY_DOMAIN_L2_CACHE_EXT 0x2 #define CL_AFFINITY_DOMAIN_L3_CACHE_EXT 0x3 #define CL_AFFINITY_DOMAIN_L4_CACHE_EXT 0x4 #define CL_AFFINITY_DOMAIN_NUMA_EXT 0x10 #define CL_AFFINITY_DOMAIN_NEXT_FISSIONABLE_EXT 0x100 /* cl_device_partition_property_ext list terminators */ #define CL_PROPERTIES_LIST_END_EXT ((cl_device_partition_property_ext) 0) #define CL_PARTITION_BY_COUNTS_LIST_END_EXT ((cl_device_partition_property_ext) 0) #define CL_PARTITION_BY_NAMES_LIST_END_EXT ((cl_device_partition_property_ext) 0 - 1) /********************************* * cl_qcom_ext_host_ptr extension *********************************/ #define CL_MEM_EXT_HOST_PTR_QCOM (1 << 29) #define CL_DEVICE_EXT_MEM_PADDING_IN_BYTES_QCOM 0x40A0 #define CL_DEVICE_PAGE_SIZE_QCOM 0x40A1 #define CL_IMAGE_ROW_ALIGNMENT_QCOM 0x40A2 #define CL_IMAGE_SLICE_ALIGNMENT_QCOM 0x40A3 #define CL_MEM_HOST_UNCACHED_QCOM 0x40A4 #define CL_MEM_HOST_WRITEBACK_QCOM 0x40A5 #define CL_MEM_HOST_WRITETHROUGH_QCOM 0x40A6 #define CL_MEM_HOST_WRITE_COMBINING_QCOM 0x40A7 typedef cl_uint cl_image_pitch_info_qcom; extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceImageInfoQCOM(cl_device_id device, size_t image_width, size_t image_height, const cl_image_format *image_format, cl_image_pitch_info_qcom param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); typedef struct _cl_mem_ext_host_ptr { /* Type of external memory allocation. */ /* Legal values will be defined in layered extensions. */ cl_uint allocation_type; /* Host cache policy for this external memory allocation. 
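Presumably one of the CL_MEM_HOST_*_QCOM cache-policy values defined
above, e.g. CL_MEM_HOST_WRITEBACK_QCOM.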
*/ cl_uint host_cache_policy; } cl_mem_ext_host_ptr; /********************************* * cl_qcom_ion_host_ptr extension *********************************/ #define CL_MEM_ION_HOST_PTR_QCOM 0x40A8 typedef struct _cl_mem_ion_host_ptr { /* Type of external memory allocation. */ /* Must be CL_MEM_ION_HOST_PTR_QCOM for ION allocations. */ cl_mem_ext_host_ptr ext_host_ptr; /* ION file descriptor */ int ion_filedesc; /* Host pointer to the ION allocated memory */ void* ion_hostptr; } cl_mem_ion_host_ptr; #endif /* CL_VERSION_1_1 */ #if defined(CL_VERSION_1_2) /****************************************** * cl_img_yuv_image extension * ******************************************/ /* Image formats used in clCreateImage */ #define CL_NV21_IMG 0x40D0 #define CL_YV12_IMG 0x40D1 /****************************************** * cl_img_cached_allocations extension * ******************************************/ /* Flag values used by clCreteBuffer */ #define CL_MEM_USE_UNCACHED_CPU_MEMORY_IMG (1 << 26) #define CL_MEM_USE_CACHED_CPU_MEMORY_IMG (1 << 27) /****************************************** * cl_img_use_gralloc_ptr extension * ******************************************/ /* Flag values used by clCreteBuffer */ #define CL_MEM_USE_GRALLOC_PTR_IMG (1 << 28) /* To be used by clGetEventInfo: */ #define CL_COMMAND_ACQUIRE_GRALLOC_OBJECTS_IMG 0x40D2 #define CL_COMMAND_RELEASE_GRALLOC_OBJECTS_IMG 0x40D3 /* Error code from clEnqueueReleaseGrallocObjectsIMG */ #define CL_GRALLOC_RESOURCE_NOT_ACQUIRED_IMG 0x40D4 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireGrallocObjectsIMG(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseGrallocObjectsIMG(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; #endif /* CL_VERSION_1_2 */ #ifdef CL_VERSION_2_0 /********************************* * cl_khr_subgroups extension *********************************/ #define cl_khr_subgroups 1 typedef cl_uint cl_kernel_sub_group_info; /* cl_kernel_sub_group_info */ #define CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR 0x2033 #define CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE_KHR 0x2034 extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelSubGroupInfoKHR(cl_kernel /* in_kernel */, cl_device_id /*in_device*/, cl_kernel_sub_group_info /* param_name */, size_t /*input_value_size*/, const void * /*input_value*/, size_t /*param_value_size*/, void* /*param_value*/, size_t* /*param_value_size_ret*/ ) CL_EXT_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY cl_int ( CL_API_CALL * clGetKernelSubGroupInfoKHR_fn)(cl_kernel /* in_kernel */, cl_device_id /*in_device*/, cl_kernel_sub_group_info /* param_name */, size_t /*input_value_size*/, const void * /*input_value*/, size_t /*param_value_size*/, void* /*param_value*/, size_t* /*param_value_size_ret*/ ) CL_EXT_SUFFIX__VERSION_2_0; #endif /* CL_VERSION_2_0 */ /****************************************** * cl_arm_shared_virtual_memory extension * ******************************************/ #ifdef CL_VERSION_1_2 /* Used by clGetDeviceInfo */ #define CL_DEVICE_SVM_CAPABILITIES_ARM 0x40B6 /* Used by clGetMemObjectInfo */ #define CL_MEM_USES_SVM_POINTER_ARM 0x40B7 /* Used by clSetKernelExecInfoARM: */ #define 
CL_KERNEL_EXEC_INFO_SVM_PTRS_ARM 0x40B8 #define CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM_ARM 0x40B9 /* To be used by clGetEventInfo: */ #define CL_COMMAND_SVM_FREE_ARM 0x40BA #define CL_COMMAND_SVM_MEMCPY_ARM 0x40BB #define CL_COMMAND_SVM_MEMFILL_ARM 0x40BC #define CL_COMMAND_SVM_MAP_ARM 0x40BD #define CL_COMMAND_SVM_UNMAP_ARM 0x40BE /* Flag values returned by clGetDeviceInfo with CL_DEVICE_SVM_CAPABILITIES_ARM as the param_name. */ #define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER_ARM (1 << 0) #define CL_DEVICE_SVM_FINE_GRAIN_BUFFER_ARM (1 << 1) #define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM_ARM (1 << 2) #define CL_DEVICE_SVM_ATOMICS_ARM (1 << 3) /* Flag values used by clSVMAllocARM: */ #define CL_MEM_SVM_FINE_GRAIN_BUFFER_ARM (1 << 10) #define CL_MEM_SVM_ATOMICS_ARM (1 << 11) typedef cl_bitfield cl_svm_mem_flags_arm; typedef cl_uint cl_kernel_exec_info_arm; typedef cl_bitfield cl_device_svm_capabilities_arm; extern CL_API_ENTRY void * CL_API_CALL clSVMAllocARM(cl_context /* context */, cl_svm_mem_flags_arm /* flags */, size_t /* size */, cl_uint /* alignment */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY void CL_API_CALL clSVMFreeARM(cl_context /* context */, void * /* svm_pointer */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMFreeARM(cl_command_queue /* command_queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void (CL_CALLBACK * /*pfn_free_func*/)(cl_command_queue /* queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void * /* user_data */), void * /* user_data */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemcpyARM(cl_command_queue /* command_queue */, cl_bool /* blocking_copy */, void * /* dst_ptr */, const void * /* src_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemFillARM(cl_command_queue /* command_queue */, void * /* svm_ptr */, const void * /* pattern */, size_t /* pattern_size */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMapARM(cl_command_queue /* command_queue */, cl_bool /* blocking_map */, cl_map_flags /* flags */, void * /* svm_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMUnmapARM(cl_command_queue /* command_queue */, void * /* svm_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelArgSVMPointerARM(cl_kernel /* kernel */, cl_uint /* arg_index */, const void * /* arg_value */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelExecInfoARM(cl_kernel /* kernel */, cl_kernel_exec_info_arm /* param_name */, size_t /* param_value_size */, const void * /* param_value */) CL_EXT_SUFFIX__VERSION_1_2; #endif /* CL_VERSION_1_2 */ /********************************** * cl_arm_import_memory extension * **********************************/ #ifdef CL_VERSION_1_0 typedef intptr_t cl_import_properties_arm; /* Default and valid proporties name 
for cl_arm_import_memory */ #define CL_IMPORT_TYPE_ARM 0x40B2 /* Host process memory type default value for CL_IMPORT_TYPE_ARM property */ #define CL_IMPORT_TYPE_HOST_ARM 0x40B3 /* DMA BUF memory type value for CL_IMPORT_TYPE_ARM property */ #define CL_IMPORT_TYPE_DMA_BUF_ARM 0x40B4 /* Secure DMA BUF memory type value for CL_IMPORT_TYPE_ARM property */ #define CL_IMPORT_TYPE_SECURE_ARM 0x40B5 /* This extension adds a new function that allows for direct memory import into * OpenCL via the clImportMemoryARM function. * * Memory imported through this interface will be mapped into the device's page * tables directly, providing zero copy access. It will never fall back to copy * operations and aliased buffers. * * Types of memory supported for import are specified as additional extension * strings. * * This extension produces cl_mem allocations which are compatible with all other * users of cl_mem in the standard API. * * This extension maps pages with the same properties as the normal buffer creation * function clCreateBuffer. */ extern CL_API_ENTRY cl_mem CL_API_CALL clImportMemoryARM( cl_context context, cl_mem_flags flags, const cl_import_properties_arm *properties, void *memory, size_t size, cl_int *errcode_ret) CL_EXT_SUFFIX__VERSION_1_0; #endif /* CL_VERSION_1_0 */ #ifdef __cplusplus } #endif #endif /* __CL_EXT_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/cl_gl.h000066400000000000000000000166371450307266000230670ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. 
**********************************************************************************/ #ifndef __OPENCL_CL_GL_H #define __OPENCL_CL_GL_H #ifdef __APPLE__ #include #else #include #endif #ifdef __cplusplus extern "C" { #endif typedef cl_uint cl_gl_object_type; typedef cl_uint cl_gl_texture_info; typedef cl_uint cl_gl_platform_info; typedef struct __GLsync *cl_GLsync; /* cl_gl_object_type = 0x2000 - 0x200F enum values are currently taken */ #define CL_GL_OBJECT_BUFFER 0x2000 #define CL_GL_OBJECT_TEXTURE2D 0x2001 #define CL_GL_OBJECT_TEXTURE3D 0x2002 #define CL_GL_OBJECT_RENDERBUFFER 0x2003 #define CL_GL_OBJECT_TEXTURE2D_ARRAY 0x200E #define CL_GL_OBJECT_TEXTURE1D 0x200F #define CL_GL_OBJECT_TEXTURE1D_ARRAY 0x2010 #define CL_GL_OBJECT_TEXTURE_BUFFER 0x2011 /* cl_gl_texture_info */ #define CL_GL_TEXTURE_TARGET 0x2004 #define CL_GL_MIPMAP_LEVEL 0x2005 #define CL_GL_NUM_SAMPLES 0x2012 extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLBuffer(cl_context /* context */, cl_mem_flags /* flags */, cl_GLuint /* bufobj */, int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLTexture(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLRenderbuffer(cl_context /* context */, cl_mem_flags /* flags */, cl_GLuint /* renderbuffer */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetGLObjectInfo(cl_mem /* memobj */, cl_gl_object_type * /* gl_object_type */, cl_GLuint * /* gl_object_name */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetGLTextureInfo(cl_mem /* memobj */, cl_gl_texture_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireGLObjects(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseGLObjects(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; /* Deprecated OpenCL 1.1 APIs */ extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateFromGLTexture2D(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateFromGLTexture3D(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; /* cl_khr_gl_sharing extension */ #define cl_khr_gl_sharing 1 typedef cl_uint cl_gl_context_info; /* Additional Error Codes */ #define CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR -1000 /* cl_gl_context_info */ #define CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR 0x2006 #define CL_DEVICES_FOR_GL_CONTEXT_KHR 0x2007 /* Additional cl_context_properties */ #define CL_GL_CONTEXT_KHR 0x2008 #define 
CL_EGL_DISPLAY_KHR 0x2009 #define CL_GLX_DISPLAY_KHR 0x200A #define CL_WGL_HDC_KHR 0x200B #define CL_CGL_SHAREGROUP_KHR 0x200C extern CL_API_ENTRY cl_int CL_API_CALL clGetGLContextInfoKHR(const cl_context_properties * /* properties */, cl_gl_context_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetGLContextInfoKHR_fn)( const cl_context_properties * properties, cl_gl_context_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret); #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_GL_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/cl_gl_ext.h000066400000000000000000000054651450307266000237440ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ /* cl_gl_ext.h contains vendor (non-KHR) OpenCL extensions which have */ /* OpenGL dependencies. */ #ifndef __OPENCL_CL_GL_EXT_H #define __OPENCL_CL_GL_EXT_H #ifdef __cplusplus extern "C" { #endif #ifdef __APPLE__ #include #else #include #endif /* * For each extension, follow this template * cl_VEN_extname extension */ /* #define cl_VEN_extname 1 * ... define new types, if any * ... define new tokens, if any * ... define new APIs, if any * * If you need GLtypes here, mirror them with a cl_GLtype, rather than including a GL header * This allows us to avoid having to decide whether to include GL headers or GLES here. 
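 *
 * As a purely hypothetical sketch of the template (cl_VEN_gl_frobnicate,
 * cl_VEN_gl_handle and the 0x41FF token are invented placeholders, not part
 * of any shipping extension):
 *
 *     #define cl_VEN_gl_frobnicate 1
 *     typedef cl_GLuint cl_VEN_gl_handle;         // mirrored GL type
 *     #define CL_DEVICE_GL_FROBNICATE_VEN 0x41FF  // new token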
*/ /* * cl_khr_gl_event extension * See section 9.9 in the OpenCL 1.1 spec for more information */ #define CL_COMMAND_GL_FENCE_SYNC_OBJECT_KHR 0x200D extern CL_API_ENTRY cl_event CL_API_CALL clCreateEventFromGLsyncKHR(cl_context /* context */, cl_GLsync /* cl_GLsync */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_GL_EXT_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/cl_platform.h000066400000000000000000001307101450307266000242760ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11803 $ on $Date: 2010-06-25 10:02:12 -0700 (Fri, 25 Jun 2010) $ */ #ifndef __CL_PLATFORM_H #define __CL_PLATFORM_H #ifdef __APPLE__ /* Contains #defines for AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER below */ #include #endif #ifdef __cplusplus extern "C" { #endif #if defined(_WIN32) #define CL_API_ENTRY #define CL_API_CALL __stdcall #define CL_CALLBACK __stdcall #else #define CL_API_ENTRY #define CL_API_CALL #define CL_CALLBACK #endif /* * Deprecation flags refer to the last version of the header in which the * feature was not deprecated. * * E.g. VERSION_1_1_DEPRECATED means the feature is present in 1.1 without * deprecation but is deprecated in versions later than 1.1. 
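 *
 * In practice these prefix/suffix macros bracket declarations; cl_gl.h in
 * this package, for instance, declares clCreateFromGLTexture2D between
 * CL_EXT_PREFIX__VERSION_1_1_DEPRECATED and
 * CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED, so callers get a compiler
 * deprecation warning unless CL_USE_DEPRECATED_OPENCL_1_1_APIS is defined
 * before the headers are included.
 */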
*/ #ifdef __APPLE__ #define CL_EXTENSION_WEAK_LINK __attribute__((weak_import)) #define CL_API_SUFFIX__VERSION_1_0 AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_0 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER #define CL_API_SUFFIX__VERSION_1_1 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define GCL_API_SUFFIX__VERSION_1_1 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_1 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER_BUT_DEPRECATED_IN_MAC_OS_X_VERSION_10_7 #ifdef AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define GCL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_2 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER_BUT_DEPRECATED_IN_MAC_OS_X_VERSION_10_8 #else #warning This path should never happen outside of internal operating system development. AvailabilityMacros do not function correctly here! #define CL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define GCL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_2 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #endif #else #define CL_EXTENSION_WEAK_LINK #define CL_API_SUFFIX__VERSION_1_0 #define CL_EXT_SUFFIX__VERSION_1_0 #define CL_API_SUFFIX__VERSION_1_1 #define CL_EXT_SUFFIX__VERSION_1_1 #define CL_API_SUFFIX__VERSION_1_2 #define CL_EXT_SUFFIX__VERSION_1_2 #define CL_API_SUFFIX__VERSION_2_0 #define CL_EXT_SUFFIX__VERSION_2_0 #ifdef __GNUC__ #ifdef CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_2_APIS #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #endif #elif defined(_WIN32) #ifdef CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED __declspec(deprecated) #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED __declspec(deprecated) #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_2_APIS #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define 
CL_EXT_PREFIX__VERSION_1_2_DEPRECATED __declspec(deprecated) #endif #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #endif #endif #if (defined (_WIN32) && defined(_MSC_VER)) /* scalar types */ typedef signed __int8 cl_char; typedef unsigned __int8 cl_uchar; typedef signed __int16 cl_short; typedef unsigned __int16 cl_ushort; typedef signed __int32 cl_int; typedef unsigned __int32 cl_uint; typedef signed __int64 cl_long; typedef unsigned __int64 cl_ulong; typedef unsigned __int16 cl_half; typedef float cl_float; typedef double cl_double; /* Macro names and corresponding values defined by OpenCL */ #define CL_CHAR_BIT 8 #define CL_SCHAR_MAX 127 #define CL_SCHAR_MIN (-127-1) #define CL_CHAR_MAX CL_SCHAR_MAX #define CL_CHAR_MIN CL_SCHAR_MIN #define CL_UCHAR_MAX 255 #define CL_SHRT_MAX 32767 #define CL_SHRT_MIN (-32767-1) #define CL_USHRT_MAX 65535 #define CL_INT_MAX 2147483647 #define CL_INT_MIN (-2147483647-1) #define CL_UINT_MAX 0xffffffffU #define CL_LONG_MAX ((cl_long) 0x7FFFFFFFFFFFFFFFLL) #define CL_LONG_MIN ((cl_long) -0x7FFFFFFFFFFFFFFFLL - 1LL) #define CL_ULONG_MAX ((cl_ulong) 0xFFFFFFFFFFFFFFFFULL) #define CL_FLT_DIG 6 #define CL_FLT_MANT_DIG 24 #define CL_FLT_MAX_10_EXP +38 #define CL_FLT_MAX_EXP +128 #define CL_FLT_MIN_10_EXP -37 #define CL_FLT_MIN_EXP -125 #define CL_FLT_RADIX 2 #define CL_FLT_MAX 340282346638528859811704183484516925440.0f #define CL_FLT_MIN 1.175494350822287507969e-38f #define CL_FLT_EPSILON 1.1920928955078125e-7f #define CL_HALF_DIG 3 #define CL_HALF_MANT_DIG 11 #define CL_HALF_MAX_10_EXP +4 #define CL_HALF_MAX_EXP +16 #define CL_HALF_MIN_10_EXP -4 #define CL_HALF_MIN_EXP -13 #define CL_HALF_RADIX 2 #define CL_HALF_MAX 65504.0f #define CL_HALF_MIN 6.103515625e-05f #define CL_HALF_EPSILON 9.765625e-04f #define CL_DBL_DIG 15 #define CL_DBL_MANT_DIG 53 #define CL_DBL_MAX_10_EXP +308 #define CL_DBL_MAX_EXP +1024 #define CL_DBL_MIN_10_EXP -307 #define CL_DBL_MIN_EXP -1021 #define CL_DBL_RADIX 2 #define CL_DBL_MAX 1.7976931348623158e+308 #define CL_DBL_MIN 2.225073858507201383090e-308 #define CL_DBL_EPSILON 2.220446049250313080847e-16 #define CL_M_E 2.7182818284590452354 #define CL_M_LOG2E 1.4426950408889634074 #define CL_M_LOG10E 0.43429448190325182765 #define CL_M_LN2 0.69314718055994530942 #define CL_M_LN10 2.30258509299404568402 #define CL_M_PI 3.14159265358979323846 #define CL_M_PI_2 1.57079632679489661923 #define CL_M_PI_4 0.78539816339744830962 #define CL_M_1_PI 0.31830988618379067154 #define CL_M_2_PI 0.63661977236758134308 #define CL_M_2_SQRTPI 1.12837916709551257390 #define CL_M_SQRT2 1.41421356237309504880 #define CL_M_SQRT1_2 0.70710678118654752440 #define CL_M_E_F 2.718281828f #define CL_M_LOG2E_F 1.442695041f #define CL_M_LOG10E_F 0.434294482f #define CL_M_LN2_F 0.693147181f #define CL_M_LN10_F 2.302585093f #define CL_M_PI_F 3.141592654f #define CL_M_PI_2_F 1.570796327f #define CL_M_PI_4_F 0.785398163f #define CL_M_1_PI_F 0.318309886f #define CL_M_2_PI_F 0.636619772f #define CL_M_2_SQRTPI_F 1.128379167f #define CL_M_SQRT2_F 1.414213562f #define CL_M_SQRT1_2_F 0.707106781f #define CL_NAN (CL_INFINITY - CL_INFINITY) #define CL_HUGE_VALF ((cl_float) 1e50) #define CL_HUGE_VAL ((cl_double) 1e500) #define CL_MAXFLOAT CL_FLT_MAX #define CL_INFINITY CL_HUGE_VALF #else #include /* scalar types */ typedef int8_t cl_char; typedef 
uint8_t cl_uchar; typedef int16_t cl_short __attribute__((aligned(2))); typedef uint16_t cl_ushort __attribute__((aligned(2))); typedef int32_t cl_int __attribute__((aligned(4))); typedef uint32_t cl_uint __attribute__((aligned(4))); typedef int64_t cl_long __attribute__((aligned(8))); typedef uint64_t cl_ulong __attribute__((aligned(8))); typedef uint16_t cl_half __attribute__((aligned(2))); typedef float cl_float __attribute__((aligned(4))); typedef double cl_double __attribute__((aligned(8))); /* Macro names and corresponding values defined by OpenCL */ #define CL_CHAR_BIT 8 #define CL_SCHAR_MAX 127 #define CL_SCHAR_MIN (-127-1) #define CL_CHAR_MAX CL_SCHAR_MAX #define CL_CHAR_MIN CL_SCHAR_MIN #define CL_UCHAR_MAX 255 #define CL_SHRT_MAX 32767 #define CL_SHRT_MIN (-32767-1) #define CL_USHRT_MAX 65535 #define CL_INT_MAX 2147483647 #define CL_INT_MIN (-2147483647-1) #define CL_UINT_MAX 0xffffffffU #define CL_LONG_MAX ((cl_long) 0x7FFFFFFFFFFFFFFFLL) #define CL_LONG_MIN ((cl_long) -0x7FFFFFFFFFFFFFFFLL - 1LL) #define CL_ULONG_MAX ((cl_ulong) 0xFFFFFFFFFFFFFFFFULL) #define CL_FLT_DIG 6 #define CL_FLT_MANT_DIG 24 #define CL_FLT_MAX_10_EXP +38 #define CL_FLT_MAX_EXP +128 #define CL_FLT_MIN_10_EXP -37 #define CL_FLT_MIN_EXP -125 #define CL_FLT_RADIX 2 #define CL_FLT_MAX 340282346638528859811704183484516925440.0f #define CL_FLT_MIN 1.175494350822287507969e-38f #define CL_FLT_EPSILON 1.1920928955078125e-7f #define CL_HALF_DIG 3 #define CL_HALF_MANT_DIG 11 #define CL_HALF_MAX_10_EXP +4 #define CL_HALF_MAX_EXP +16 #define CL_HALF_MIN_10_EXP -4 #define CL_HALF_MIN_EXP -13 #define CL_HALF_RADIX 2 #define CL_HALF_MAX 65504.0f #define CL_HALF_MIN 6.103515625e-05f #define CL_HALF_EPSILON 9.765625e-04f #define CL_DBL_DIG 15 #define CL_DBL_MANT_DIG 53 #define CL_DBL_MAX_10_EXP +308 #define CL_DBL_MAX_EXP +1024 #define CL_DBL_MIN_10_EXP -307 #define CL_DBL_MIN_EXP -1021 #define CL_DBL_RADIX 2 #define CL_DBL_MAX 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0 #define CL_DBL_MIN 2.225073858507201383090e-308 #define CL_DBL_EPSILON 2.220446049250313080847e-16 #define CL_M_E 2.7182818284590452354 #define CL_M_LOG2E 1.4426950408889634074 #define CL_M_LOG10E 0.43429448190325182765 #define CL_M_LN2 0.69314718055994530942 #define CL_M_LN10 2.30258509299404568402 #define CL_M_PI 3.14159265358979323846 #define CL_M_PI_2 1.57079632679489661923 #define CL_M_PI_4 0.78539816339744830962 #define CL_M_1_PI 0.31830988618379067154 #define CL_M_2_PI 0.63661977236758134308 #define CL_M_2_SQRTPI 1.12837916709551257390 #define CL_M_SQRT2 1.41421356237309504880 #define CL_M_SQRT1_2 0.70710678118654752440 #define CL_M_E_F 2.718281828f #define CL_M_LOG2E_F 1.442695041f #define CL_M_LOG10E_F 0.434294482f #define CL_M_LN2_F 0.693147181f #define CL_M_LN10_F 2.302585093f #define CL_M_PI_F 3.141592654f #define CL_M_PI_2_F 1.570796327f #define CL_M_PI_4_F 0.785398163f #define CL_M_1_PI_F 0.318309886f #define CL_M_2_PI_F 0.636619772f #define CL_M_2_SQRTPI_F 1.128379167f #define CL_M_SQRT2_F 1.414213562f #define CL_M_SQRT1_2_F 0.707106781f #if defined( __GNUC__ ) #define CL_HUGE_VALF __builtin_huge_valf() #define CL_HUGE_VAL __builtin_huge_val() #define CL_NAN __builtin_nanf( "" ) #else #define CL_HUGE_VALF ((cl_float) 1e50) #define CL_HUGE_VAL ((cl_double) 1e500) float nanf( 
const char * );
    #define CL_NAN              nanf( "" )
#endif
#define CL_MAXFLOAT     CL_FLT_MAX
#define CL_INFINITY     CL_HUGE_VALF

#endif

#include <stddef.h>

/* Mirror types to GL types. Mirror types allow us to avoid deciding which
   headers to load based on whether we are using GL or GLES here. */
typedef unsigned int cl_GLuint;
typedef int          cl_GLint;
typedef unsigned int cl_GLenum;

/*
 * Vector types
 *
 *  Note:   OpenCL requires that all types be naturally aligned.
 *          This means that vector types must be naturally aligned.
 *          For example, a vector of four floats must be aligned to
 *          a 16 byte boundary (calculated as 4 * the natural 4-byte
 *          alignment of the float). The alignment qualifiers here
 *          will only function properly if your compiler supports them
 *          and if you don't actively work to defeat them. For example,
 *          in order for a cl_float4 to be 16 byte aligned in a struct,
 *          the start of the struct must itself be 16-byte aligned.
 *
 *          Maintaining proper alignment is the user's responsibility.
 */

/* Define basic vector types */
#if defined( __VEC__ )
    #include <altivec.h>   /* may be omitted depending on compiler. AltiVec spec provides no way to detect whether the header is required. */
    typedef vector unsigned char     __cl_uchar16;
    typedef vector signed char       __cl_char16;
    typedef vector unsigned short    __cl_ushort8;
    typedef vector signed short      __cl_short8;
    typedef vector unsigned int      __cl_uint4;
    typedef vector signed int        __cl_int4;
    typedef vector float             __cl_float4;
    #define __CL_UCHAR16__  1
    #define __CL_CHAR16__   1
    #define __CL_USHORT8__  1
    #define __CL_SHORT8__   1
    #define __CL_UINT4__    1
    #define __CL_INT4__     1
    #define __CL_FLOAT4__   1
#endif

#if defined( __SSE__ )
    #if defined( __MINGW64__ )
        #include <intrin.h>
    #else
        #include <xmmintrin.h>
    #endif
    #if defined( __GNUC__ )
        typedef float __cl_float4   __attribute__((vector_size(16)));
    #else
        typedef __m128 __cl_float4;
    #endif
    #define __CL_FLOAT4__   1
#endif

#if defined( __SSE2__ )
    #if defined( __MINGW64__ )
        #include <intrin.h>
    #else
        #include <emmintrin.h>
    #endif
    #if defined( __GNUC__ )
        typedef cl_uchar    __cl_uchar16    __attribute__((vector_size(16)));
        typedef cl_char     __cl_char16     __attribute__((vector_size(16)));
        typedef cl_ushort   __cl_ushort8    __attribute__((vector_size(16)));
        typedef cl_short    __cl_short8     __attribute__((vector_size(16)));
        typedef cl_uint     __cl_uint4      __attribute__((vector_size(16)));
        typedef cl_int      __cl_int4       __attribute__((vector_size(16)));
        typedef cl_ulong    __cl_ulong2     __attribute__((vector_size(16)));
        typedef cl_long     __cl_long2      __attribute__((vector_size(16)));
        typedef cl_double   __cl_double2    __attribute__((vector_size(16)));
    #else
        typedef __m128i __cl_uchar16;
        typedef __m128i __cl_char16;
        typedef __m128i __cl_ushort8;
        typedef __m128i __cl_short8;
        typedef __m128i __cl_uint4;
        typedef __m128i __cl_int4;
        typedef __m128i __cl_ulong2;
        typedef __m128i __cl_long2;
        typedef __m128d __cl_double2;
    #endif
    #define __CL_UCHAR16__  1
    #define __CL_CHAR16__   1
    #define __CL_USHORT8__  1
    #define __CL_SHORT8__   1
    #define __CL_INT4__     1
    #define __CL_UINT4__    1
    #define __CL_ULONG2__   1
    #define __CL_LONG2__    1
    #define __CL_DOUBLE2__  1
#endif

#if defined( __MMX__ )
    #include <mmintrin.h>
    #if defined( __GNUC__ )
        typedef cl_uchar    __cl_uchar8     __attribute__((vector_size(8)));
        typedef cl_char     __cl_char8      __attribute__((vector_size(8)));
        typedef cl_ushort   __cl_ushort4    __attribute__((vector_size(8)));
        typedef cl_short    __cl_short4     __attribute__((vector_size(8)));
        typedef cl_uint     __cl_uint2      __attribute__((vector_size(8)));
        typedef cl_int      __cl_int2       __attribute__((vector_size(8)));
        typedef cl_ulong    __cl_ulong1     __attribute__((vector_size(8)));
        typedef cl_long     __cl_long1      __attribute__((vector_size(8)));
typedef cl_float __cl_float2 __attribute__((vector_size(8)));
#else
typedef __m64 __cl_uchar8;
typedef __m64 __cl_char8;
typedef __m64 __cl_ushort4;
typedef __m64 __cl_short4;
typedef __m64 __cl_uint2;
typedef __m64 __cl_int2;
typedef __m64 __cl_ulong1;
typedef __m64 __cl_long1;
typedef __m64 __cl_float2;
#endif
#define __CL_UCHAR8__ 1
#define __CL_CHAR8__ 1
#define __CL_USHORT4__ 1
#define __CL_SHORT4__ 1
#define __CL_INT2__ 1
#define __CL_UINT2__ 1
#define __CL_ULONG1__ 1
#define __CL_LONG1__ 1
#define __CL_FLOAT2__ 1
#endif
#if defined( __AVX__ )
#if defined( __MINGW64__ )
#include <intrin.h>
#else
#include <immintrin.h>
#endif
#if defined( __GNUC__ )
typedef cl_float __cl_float8 __attribute__((vector_size(32)));
typedef cl_double __cl_double4 __attribute__((vector_size(32)));
#else
typedef __m256 __cl_float8;
typedef __m256d __cl_double4;
#endif
#define __CL_FLOAT8__ 1
#define __CL_DOUBLE4__ 1
#endif
/* Define capabilities for anonymous struct members. */
#if !defined(__cplusplus) && defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define __CL_HAS_ANON_STRUCT__ 1
#define __CL_ANON_STRUCT__
#elif defined( __GNUC__) && ! defined( __STRICT_ANSI__ )
#define __CL_HAS_ANON_STRUCT__ 1
#define __CL_ANON_STRUCT__ __extension__
#elif defined( _WIN32) && defined(_MSC_VER)
#if _MSC_VER >= 1500
/* Microsoft Developer Studio 2008 supports anonymous structs, but
 * complains by default. */
#define __CL_HAS_ANON_STRUCT__ 1
#define __CL_ANON_STRUCT__
/* Disable warning C4201: nonstandard extension used : nameless
 * struct/union */
#pragma warning( push )
#pragma warning( disable : 4201 )
#endif
#else
#define __CL_HAS_ANON_STRUCT__ 0
#define __CL_ANON_STRUCT__
#endif
/* Define alignment keys */
#if defined( __GNUC__ )
#define CL_ALIGNED(_x) __attribute__ ((aligned(_x)))
#elif defined( _WIN32) && (_MSC_VER)
/* Alignment keys neutered on windows because MSVC can't swallow function
 * arguments with alignment requirements */
/* http://msdn.microsoft.com/en-us/library/373ak2y1%28VS.71%29.aspx */
/* #include <crtdefs.h> */
/* #define CL_ALIGNED(_x) _CRT_ALIGN(_x) */
#define CL_ALIGNED(_x)
#else
#warning Need to implement some method to align data here
#define CL_ALIGNED(_x)
#endif
/* Indicate whether .xyzw, .s0123 and .hi.lo are supported */
#if __CL_HAS_ANON_STRUCT__
/* .xyzw and .s0123...{f|F} are supported */
#define CL_HAS_NAMED_VECTOR_FIELDS 1
/* .hi and .lo are supported */
#define CL_HAS_HI_LO_VECTOR_FIELDS 1
#endif
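/* Editor's note: the example below is an illustrative sketch, not part of the
 * upstream Khronos header. It shows CL_ALIGNED applied to host data, per the
 * vector-types alignment note above; the struct name is hypothetical. On MSVC
 * CL_ALIGNED deliberately expands to nothing (see above), so the request is
 * best-effort there, and the containing object must itself start on a 16-byte
 * boundary for the member alignment to hold. */
typedef struct example_aligned_quad {
    cl_float CL_ALIGNED(16) s[4]; /* requests 16-byte alignment, matching a cl_float4 */
} example_aligned_quad;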
/* Define cl_vector types */
/* ---- cl_charn ---- */
typedef union {
    cl_char CL_ALIGNED(2) s[2];
#if __CL_HAS_ANON_STRUCT__
    __CL_ANON_STRUCT__ struct{ cl_char x, y; };
    __CL_ANON_STRUCT__ struct{ cl_char s0, s1; };
    __CL_ANON_STRUCT__ struct{ cl_char lo, hi; };
#endif
#if defined( __CL_CHAR2__)
    __cl_char2 v2;
#endif
}cl_char2;
typedef union {
    cl_char CL_ALIGNED(4) s[4];
#if __CL_HAS_ANON_STRUCT__
    __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w; };
    __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3; };
    __CL_ANON_STRUCT__ struct{ cl_char2 lo, hi; };
#endif
#if defined( __CL_CHAR2__)
    __cl_char2 v2[2];
#endif
#if defined( __CL_CHAR4__)
    __cl_char4 v4;
#endif
}cl_char4;
/* cl_char3 is identical in size, alignment and behavior to cl_char4.
   See section 6.1.5.
*/
typedef cl_char4 cl_char3;
typedef union {
    cl_char CL_ALIGNED(8) s[8];
#if __CL_HAS_ANON_STRUCT__
    __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w; };
    __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3, s4, s5, s6, s7; };
    __CL_ANON_STRUCT__ struct{ cl_char4 lo, hi; };
#endif
#if defined( __CL_CHAR2__)
    __cl_char2 v2[4];
#endif
#if defined( __CL_CHAR4__)
    __cl_char4 v4[2];
#endif
#if defined( __CL_CHAR8__ )
    __cl_char8 v8;
#endif
}cl_char8;
typedef union {
    cl_char CL_ALIGNED(16) s[16];
#if __CL_HAS_ANON_STRUCT__
    __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; };
    __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; };
    __CL_ANON_STRUCT__ struct{ cl_char8 lo, hi; };
#endif
#if defined( __CL_CHAR2__)
    __cl_char2 v2[8];
#endif
#if defined( __CL_CHAR4__)
    __cl_char4 v4[4];
#endif
#if defined( __CL_CHAR8__ )
    __cl_char8 v8[2];
#endif
#if defined( __CL_CHAR16__ )
    __cl_char16 v16;
#endif
}cl_char16;
/* ---- cl_ucharn ---- */
typedef union {
    cl_uchar CL_ALIGNED(2) s[2];
#if __CL_HAS_ANON_STRUCT__
    __CL_ANON_STRUCT__ struct{ cl_uchar x, y; };
    __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1; };
    __CL_ANON_STRUCT__ struct{ cl_uchar lo, hi; };
#endif
#if defined( __cl_uchar2__)
    __cl_uchar2 v2;
#endif
}cl_uchar2;
typedef union {
    cl_uchar CL_ALIGNED(4) s[4];
#if __CL_HAS_ANON_STRUCT__
    __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w; };
    __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3; };
    __CL_ANON_STRUCT__ struct{ cl_uchar2 lo, hi; };
#endif
#if defined( __CL_UCHAR2__)
    __cl_uchar2 v2[2];
#endif
#if defined( __CL_UCHAR4__)
    __cl_uchar4 v4;
#endif
}cl_uchar4;
/* cl_uchar3 is identical in size, alignment and behavior to cl_uchar4.
   See section 6.1.5.
*/ typedef cl_uchar4 cl_uchar3; typedef union { cl_uchar CL_ALIGNED(8) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_uchar4 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[4]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4[2]; #endif #if defined( __CL_UCHAR8__ ) __cl_uchar8 v8; #endif }cl_uchar8; typedef union { cl_uchar CL_ALIGNED(16) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_uchar8 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[8]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4[4]; #endif #if defined( __CL_UCHAR8__ ) __cl_uchar8 v8[2]; #endif #if defined( __CL_UCHAR16__ ) __cl_uchar16 v16; #endif }cl_uchar16; /* ---- cl_shortn ---- */ typedef union { cl_short CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_short lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2; #endif }cl_short2; typedef union { cl_short CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_short2 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[2]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4; #endif }cl_short4; /* cl_short3 is identical in size, alignment and behavior to cl_short4. See section 6.1.5. 
*/ typedef cl_short4 cl_short3; typedef union { cl_short CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_short4 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[4]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4[2]; #endif #if defined( __CL_SHORT8__ ) __cl_short8 v8; #endif }cl_short8; typedef union { cl_short CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_short8 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[8]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4[4]; #endif #if defined( __CL_SHORT8__ ) __cl_short8 v8[2]; #endif #if defined( __CL_SHORT16__ ) __cl_short16 v16; #endif }cl_short16; /* ---- cl_ushortn ---- */ typedef union { cl_ushort CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_ushort lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2; #endif }cl_ushort2; typedef union { cl_ushort CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_ushort2 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[2]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4; #endif }cl_ushort4; /* cl_ushort3 is identical in size, alignment and behavior to cl_ushort4. See section 6.1.5. 
*/ typedef cl_ushort4 cl_ushort3; typedef union { cl_ushort CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_ushort4 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[4]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4[2]; #endif #if defined( __CL_USHORT8__ ) __cl_ushort8 v8; #endif }cl_ushort8; typedef union { cl_ushort CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_ushort8 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[8]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4[4]; #endif #if defined( __CL_USHORT8__ ) __cl_ushort8 v8[2]; #endif #if defined( __CL_USHORT16__ ) __cl_ushort16 v16; #endif }cl_ushort16; /* ---- cl_halfn ---- */ typedef union { cl_half CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_half lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2; #endif }cl_half2; typedef union { cl_half CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_half2 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[2]; #endif #if defined( __CL_HALF4__) __cl_half4 v4; #endif }cl_half4; /* cl_half3 is identical in size, alignment and behavior to cl_half4. See section 6.1.5. */ typedef cl_half4 cl_half3; typedef union { cl_half CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_half4 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[4]; #endif #if defined( __CL_HALF4__) __cl_half4 v4[2]; #endif #if defined( __CL_HALF8__ ) __cl_half8 v8; #endif }cl_half8; typedef union { cl_half CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_half8 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[8]; #endif #if defined( __CL_HALF4__) __cl_half4 v4[4]; #endif #if defined( __CL_HALF8__ ) __cl_half8 v8[2]; #endif #if defined( __CL_HALF16__ ) __cl_half16 v16; #endif }cl_half16; /* ---- cl_intn ---- */ typedef union { cl_int CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_int lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2; #endif }cl_int2; typedef union { cl_int CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_int2 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[2]; #endif #if defined( __CL_INT4__) __cl_int4 v4; #endif }cl_int4; /* cl_int3 is identical in size, alignment and behavior to cl_int4. 
See section 6.1.5. */ typedef cl_int4 cl_int3; typedef union { cl_int CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_int4 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[4]; #endif #if defined( __CL_INT4__) __cl_int4 v4[2]; #endif #if defined( __CL_INT8__ ) __cl_int8 v8; #endif }cl_int8; typedef union { cl_int CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_int8 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[8]; #endif #if defined( __CL_INT4__) __cl_int4 v4[4]; #endif #if defined( __CL_INT8__ ) __cl_int8 v8[2]; #endif #if defined( __CL_INT16__ ) __cl_int16 v16; #endif }cl_int16; /* ---- cl_uintn ---- */ typedef union { cl_uint CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_uint lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2; #endif }cl_uint2; typedef union { cl_uint CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_uint2 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[2]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4; #endif }cl_uint4; /* cl_uint3 is identical in size, alignment and behavior to cl_uint4. See section 6.1.5. */ typedef cl_uint4 cl_uint3; typedef union { cl_uint CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_uint4 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[4]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4[2]; #endif #if defined( __CL_UINT8__ ) __cl_uint8 v8; #endif }cl_uint8; typedef union { cl_uint CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_uint8 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[8]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4[4]; #endif #if defined( __CL_UINT8__ ) __cl_uint8 v8[2]; #endif #if defined( __CL_UINT16__ ) __cl_uint16 v16; #endif }cl_uint16; /* ---- cl_longn ---- */ typedef union { cl_long CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_long lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2; #endif }cl_long2; typedef union { cl_long CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_long2 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[2]; #endif #if defined( __CL_LONG4__) __cl_long4 v4; #endif }cl_long4; /* cl_long3 is identical in size, alignment and behavior to cl_long4. See section 6.1.5. 
*/ typedef cl_long4 cl_long3; typedef union { cl_long CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_long4 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[4]; #endif #if defined( __CL_LONG4__) __cl_long4 v4[2]; #endif #if defined( __CL_LONG8__ ) __cl_long8 v8; #endif }cl_long8; typedef union { cl_long CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_long8 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[8]; #endif #if defined( __CL_LONG4__) __cl_long4 v4[4]; #endif #if defined( __CL_LONG8__ ) __cl_long8 v8[2]; #endif #if defined( __CL_LONG16__ ) __cl_long16 v16; #endif }cl_long16; /* ---- cl_ulongn ---- */ typedef union { cl_ulong CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_ulong lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2; #endif }cl_ulong2; typedef union { cl_ulong CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_ulong2 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[2]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4; #endif }cl_ulong4; /* cl_ulong3 is identical in size, alignment and behavior to cl_ulong4. See section 6.1.5. 
*/ typedef cl_ulong4 cl_ulong3; typedef union { cl_ulong CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_ulong4 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[4]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4[2]; #endif #if defined( __CL_ULONG8__ ) __cl_ulong8 v8; #endif }cl_ulong8; typedef union { cl_ulong CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_ulong8 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[8]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4[4]; #endif #if defined( __CL_ULONG8__ ) __cl_ulong8 v8[2]; #endif #if defined( __CL_ULONG16__ ) __cl_ulong16 v16; #endif }cl_ulong16; /* --- cl_floatn ---- */ typedef union { cl_float CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_float lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2; #endif }cl_float2; typedef union { cl_float CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_float2 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[2]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4; #endif }cl_float4; /* cl_float3 is identical in size, alignment and behavior to cl_float4. See section 6.1.5. 
*/ typedef cl_float4 cl_float3; typedef union { cl_float CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_float4 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[4]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4[2]; #endif #if defined( __CL_FLOAT8__ ) __cl_float8 v8; #endif }cl_float8; typedef union { cl_float CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_float8 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[8]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4[4]; #endif #if defined( __CL_FLOAT8__ ) __cl_float8 v8[2]; #endif #if defined( __CL_FLOAT16__ ) __cl_float16 v16; #endif }cl_float16; /* --- cl_doublen ---- */ typedef union { cl_double CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_double lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2; #endif }cl_double2; typedef union { cl_double CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_double2 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[2]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4; #endif }cl_double4; /* cl_double3 is identical in size, alignment and behavior to cl_double4. See section 6.1.5. */ typedef cl_double4 cl_double3; typedef union { cl_double CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_double4 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[4]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4[2]; #endif #if defined( __CL_DOUBLE8__ ) __cl_double8 v8; #endif }cl_double8; typedef union { cl_double CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_double8 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[8]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4[4]; #endif #if defined( __CL_DOUBLE8__ ) __cl_double8 v8[2]; #endif #if defined( __CL_DOUBLE16__ ) __cl_double16 v16; #endif }cl_double16; /* Macro to facilitate debugging * Usage: * Place CL_PROGRAM_STRING_DEBUG_INFO on the line before the first line of your source. * The first line ends with: CL_PROGRAM_STRING_DEBUG_INFO \" * Each line thereafter of OpenCL C source must end with: \n\ * The last line ends in "; * * Example: * * const char *my_program = CL_PROGRAM_STRING_DEBUG_INFO "\ * kernel void foo( int a, float * b ) \n\ * { \n\ * // my comment \n\ * *b[ get_global_id(0)] = a; \n\ * } \n\ * "; * * This should correctly set up the line, (column) and file information for your source * string so you can do source level debugging. 
 */
#define __CL_STRINGIFY( _x ) # _x
#define _CL_STRINGIFY( _x ) __CL_STRINGIFY( _x )
#define CL_PROGRAM_STRING_DEBUG_INFO "#line " _CL_STRINGIFY(__LINE__) " \"" __FILE__ "\" \n\n"
#ifdef __cplusplus
}
#endif
#undef __CL_HAS_ANON_STRUCT__
#undef __CL_ANON_STRUCT__
#if defined( _WIN32) && defined(_MSC_VER)
#if _MSC_VER >= 1500
#pragma warning( pop )
#endif
#endif
#endif /* __CL_PLATFORM_H */
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.0/CL/opencl.h000066400000000000000000000037111450307266000232540ustar00rootroot00000000000000/*******************************************************************************
 * Copyright (c) 2008-2015 The Khronos Group Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and/or associated documentation files (the
 * "Materials"), to deal in the Materials without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Materials, and to
 * permit persons to whom the Materials are furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Materials.
 *
 * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
 * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
 * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
 *    https://www.khronos.org/registry/
 *
 * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
 ******************************************************************************/
/* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */
#ifndef __OPENCL_H
#define __OPENCL_H
#ifdef __cplusplus
extern "C" {
#endif
#ifdef __APPLE__
#include <OpenCL/cl.h>
#include <OpenCL/cl_gl.h>
#include <OpenCL/cl_gl_ext.h>
#include <OpenCL/cl_ext.h>
#else
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <CL/cl_gl_ext.h>
#include <CL/cl_ext.h>
#endif
#ifdef __cplusplus
}
#endif
#endif /* __OPENCL_H */
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/000077500000000000000000000000001450307266000213245ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/000077500000000000000000000000001450307266000216225ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/cl.h000066400000000000000000002215671450307266000224050ustar00rootroot00000000000000/*******************************************************************************
 * Copyright (c) 2008-2015 The Khronos Group Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and/or associated documentation files (the
 * "Materials"), to deal in the Materials without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Materials, and to
 * permit persons to whom the Materials are furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Materials.
 *
 * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
 * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
 * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
 *    https://www.khronos.org/registry/
 *
 * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
 ******************************************************************************/
#ifndef __OPENCL_CL_H
#define __OPENCL_CL_H
#ifdef __APPLE__
#include <OpenCL/cl_platform.h>
#else
#include <CL/cl_platform.h>
#endif
#ifdef __cplusplus
extern "C" {
#endif
/******************************************************************************/
typedef struct _cl_platform_id * cl_platform_id;
typedef struct _cl_device_id * cl_device_id;
typedef struct _cl_context * cl_context;
typedef struct _cl_command_queue * cl_command_queue;
typedef struct _cl_mem * cl_mem;
typedef struct _cl_program * cl_program;
typedef struct _cl_kernel * cl_kernel;
typedef struct _cl_event * cl_event;
typedef struct _cl_sampler * cl_sampler;
typedef cl_uint cl_bool; /* WARNING! Unlike cl_ types in cl_platform.h, cl_bool is not guaranteed to be the same size as the bool in kernels. */
typedef cl_ulong cl_bitfield;
typedef cl_bitfield cl_device_type;
typedef cl_uint cl_platform_info;
typedef cl_uint cl_device_info;
typedef cl_bitfield cl_device_fp_config;
typedef cl_uint cl_device_mem_cache_type;
typedef cl_uint cl_device_local_mem_type;
typedef cl_bitfield cl_device_exec_capabilities;
typedef cl_bitfield cl_device_svm_capabilities;
typedef cl_bitfield cl_command_queue_properties;
typedef intptr_t cl_device_partition_property;
typedef cl_bitfield cl_device_affinity_domain;
typedef intptr_t cl_context_properties;
typedef cl_uint cl_context_info;
typedef cl_bitfield cl_queue_properties;
typedef cl_uint cl_command_queue_info;
typedef cl_uint cl_channel_order;
typedef cl_uint cl_channel_type;
typedef cl_bitfield cl_mem_flags;
typedef cl_bitfield cl_svm_mem_flags;
typedef cl_uint cl_mem_object_type;
typedef cl_uint cl_mem_info;
typedef cl_bitfield cl_mem_migration_flags;
typedef cl_uint cl_image_info;
typedef cl_uint cl_buffer_create_type;
typedef cl_uint cl_addressing_mode;
typedef cl_uint cl_filter_mode;
typedef cl_uint cl_sampler_info;
typedef cl_bitfield cl_map_flags;
typedef intptr_t cl_pipe_properties;
typedef cl_uint cl_pipe_info;
typedef cl_uint cl_program_info;
typedef cl_uint cl_program_build_info;
typedef cl_uint cl_program_binary_type;
typedef cl_int cl_build_status;
typedef cl_uint cl_kernel_info;
typedef cl_uint cl_kernel_arg_info;
typedef cl_uint cl_kernel_arg_address_qualifier;
typedef cl_uint cl_kernel_arg_access_qualifier;
typedef cl_bitfield cl_kernel_arg_type_qualifier;
typedef cl_uint cl_kernel_work_group_info;
typedef cl_uint cl_kernel_sub_group_info;
typedef cl_uint cl_event_info;
typedef cl_uint cl_command_type;
typedef cl_uint cl_profiling_info;
typedef cl_bitfield cl_sampler_properties;
typedef cl_uint cl_kernel_exec_info;
typedef struct _cl_image_format {
    cl_channel_order image_channel_order;
    cl_channel_type image_channel_data_type;
} cl_image_format;
typedef struct _cl_image_desc {
    cl_mem_object_type image_type;
    size_t image_width;
    size_t image_height;
    size_t image_depth;
    size_t image_array_size;
    size_t image_row_pitch;
    size_t image_slice_pitch;
    cl_uint num_mip_levels;
    cl_uint num_samples;
#ifdef __GNUC__
    __extension__ /* Prevents warnings about anonymous union in -pedantic builds */
#endif
    union {
        cl_mem buffer;
        cl_mem mem_object;
    };
} cl_image_desc;
typedef struct _cl_buffer_region {
    size_t origin;
    size_t size;
} cl_buffer_region;
/******************************************************************************/
/* Error Codes */
#define CL_SUCCESS 0
#define CL_DEVICE_NOT_FOUND -1
#define CL_DEVICE_NOT_AVAILABLE -2
#define CL_COMPILER_NOT_AVAILABLE -3
#define CL_MEM_OBJECT_ALLOCATION_FAILURE -4
#define CL_OUT_OF_RESOURCES -5
#define CL_OUT_OF_HOST_MEMORY -6
#define CL_PROFILING_INFO_NOT_AVAILABLE -7
#define CL_MEM_COPY_OVERLAP -8
#define CL_IMAGE_FORMAT_MISMATCH -9
#define CL_IMAGE_FORMAT_NOT_SUPPORTED -10
#define CL_BUILD_PROGRAM_FAILURE -11
#define CL_MAP_FAILURE -12
#define CL_MISALIGNED_SUB_BUFFER_OFFSET -13
#define CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST -14
#define CL_COMPILE_PROGRAM_FAILURE -15
#define CL_LINKER_NOT_AVAILABLE -16
#define CL_LINK_PROGRAM_FAILURE -17
#define CL_DEVICE_PARTITION_FAILED -18
#define CL_KERNEL_ARG_INFO_NOT_AVAILABLE -19
#define CL_INVALID_VALUE -30
#define CL_INVALID_DEVICE_TYPE -31
#define CL_INVALID_PLATFORM -32
#define CL_INVALID_DEVICE -33
#define CL_INVALID_CONTEXT -34
#define CL_INVALID_QUEUE_PROPERTIES -35
#define CL_INVALID_COMMAND_QUEUE -36
#define CL_INVALID_HOST_PTR -37
#define CL_INVALID_MEM_OBJECT -38
#define CL_INVALID_IMAGE_FORMAT_DESCRIPTOR -39
#define CL_INVALID_IMAGE_SIZE -40
#define CL_INVALID_SAMPLER -41
#define CL_INVALID_BINARY -42
#define CL_INVALID_BUILD_OPTIONS -43
#define CL_INVALID_PROGRAM -44
#define CL_INVALID_PROGRAM_EXECUTABLE -45
#define CL_INVALID_KERNEL_NAME -46
#define CL_INVALID_KERNEL_DEFINITION -47
#define CL_INVALID_KERNEL -48
#define CL_INVALID_ARG_INDEX -49
#define CL_INVALID_ARG_VALUE -50
#define CL_INVALID_ARG_SIZE -51
#define CL_INVALID_KERNEL_ARGS -52
#define CL_INVALID_WORK_DIMENSION -53
#define CL_INVALID_WORK_GROUP_SIZE -54
#define CL_INVALID_WORK_ITEM_SIZE -55
#define CL_INVALID_GLOBAL_OFFSET -56
#define CL_INVALID_EVENT_WAIT_LIST -57
#define CL_INVALID_EVENT -58
#define CL_INVALID_OPERATION -59
#define CL_INVALID_GL_OBJECT -60
#define CL_INVALID_BUFFER_SIZE -61
#define CL_INVALID_MIP_LEVEL -62
#define CL_INVALID_GLOBAL_WORK_SIZE -63
#define CL_INVALID_PROPERTY -64
#define CL_INVALID_IMAGE_DESCRIPTOR -65
#define CL_INVALID_COMPILER_OPTIONS -66
#define CL_INVALID_LINKER_OPTIONS -67
#define CL_INVALID_DEVICE_PARTITION_COUNT -68
#define CL_INVALID_PIPE_SIZE -69
#define CL_INVALID_DEVICE_QUEUE -70
/* OpenCL Version */
#define CL_VERSION_1_0 1
#define CL_VERSION_1_1 1
#define CL_VERSION_1_2 1
#define CL_VERSION_2_0 1
#define CL_VERSION_2_1 1
/* cl_bool */
#define CL_FALSE 0
#define CL_TRUE 1
#define CL_BLOCKING CL_TRUE
#define CL_NON_BLOCKING CL_FALSE
/* cl_platform_info */
#define CL_PLATFORM_PROFILE 0x0900
#define CL_PLATFORM_VERSION 0x0901
#define CL_PLATFORM_NAME 0x0902
#define CL_PLATFORM_VENDOR 0x0903
#define CL_PLATFORM_EXTENSIONS 0x0904
#define CL_PLATFORM_HOST_TIMER_RESOLUTION 0x0905
/* cl_device_type - bitfield */
#define CL_DEVICE_TYPE_DEFAULT (1 << 0)
#define CL_DEVICE_TYPE_CPU (1 << 1)
#define CL_DEVICE_TYPE_GPU (1 << 2)
#define CL_DEVICE_TYPE_ACCELERATOR (1 << 3)
#define CL_DEVICE_TYPE_CUSTOM (1 << 4)
#define CL_DEVICE_TYPE_ALL 0xFFFFFFFF
/* cl_device_info */
#define CL_DEVICE_TYPE 0x1000
#define CL_DEVICE_VENDOR_ID 0x1001
#define CL_DEVICE_MAX_COMPUTE_UNITS 0x1002 #define CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS 0x1003 #define CL_DEVICE_MAX_WORK_GROUP_SIZE 0x1004 #define CL_DEVICE_MAX_WORK_ITEM_SIZES 0x1005 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR 0x1006 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT 0x1007 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT 0x1008 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG 0x1009 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT 0x100A #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 0x100B #define CL_DEVICE_MAX_CLOCK_FREQUENCY 0x100C #define CL_DEVICE_ADDRESS_BITS 0x100D #define CL_DEVICE_MAX_READ_IMAGE_ARGS 0x100E #define CL_DEVICE_MAX_WRITE_IMAGE_ARGS 0x100F #define CL_DEVICE_MAX_MEM_ALLOC_SIZE 0x1010 #define CL_DEVICE_IMAGE2D_MAX_WIDTH 0x1011 #define CL_DEVICE_IMAGE2D_MAX_HEIGHT 0x1012 #define CL_DEVICE_IMAGE3D_MAX_WIDTH 0x1013 #define CL_DEVICE_IMAGE3D_MAX_HEIGHT 0x1014 #define CL_DEVICE_IMAGE3D_MAX_DEPTH 0x1015 #define CL_DEVICE_IMAGE_SUPPORT 0x1016 #define CL_DEVICE_MAX_PARAMETER_SIZE 0x1017 #define CL_DEVICE_MAX_SAMPLERS 0x1018 #define CL_DEVICE_MEM_BASE_ADDR_ALIGN 0x1019 #define CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE 0x101A #define CL_DEVICE_SINGLE_FP_CONFIG 0x101B #define CL_DEVICE_GLOBAL_MEM_CACHE_TYPE 0x101C #define CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE 0x101D #define CL_DEVICE_GLOBAL_MEM_CACHE_SIZE 0x101E #define CL_DEVICE_GLOBAL_MEM_SIZE 0x101F #define CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE 0x1020 #define CL_DEVICE_MAX_CONSTANT_ARGS 0x1021 #define CL_DEVICE_LOCAL_MEM_TYPE 0x1022 #define CL_DEVICE_LOCAL_MEM_SIZE 0x1023 #define CL_DEVICE_ERROR_CORRECTION_SUPPORT 0x1024 #define CL_DEVICE_PROFILING_TIMER_RESOLUTION 0x1025 #define CL_DEVICE_ENDIAN_LITTLE 0x1026 #define CL_DEVICE_AVAILABLE 0x1027 #define CL_DEVICE_COMPILER_AVAILABLE 0x1028 #define CL_DEVICE_EXECUTION_CAPABILITIES 0x1029 #define CL_DEVICE_QUEUE_PROPERTIES 0x102A /* deprecated */ #define CL_DEVICE_QUEUE_ON_HOST_PROPERTIES 0x102A #define CL_DEVICE_NAME 0x102B #define CL_DEVICE_VENDOR 0x102C #define CL_DRIVER_VERSION 0x102D #define CL_DEVICE_PROFILE 0x102E #define CL_DEVICE_VERSION 0x102F #define CL_DEVICE_EXTENSIONS 0x1030 #define CL_DEVICE_PLATFORM 0x1031 #define CL_DEVICE_DOUBLE_FP_CONFIG 0x1032 #define CL_DEVICE_HALF_FP_CONFIG 0x1033 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF 0x1034 #define CL_DEVICE_HOST_UNIFIED_MEMORY 0x1035 /* deprecated */ #define CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR 0x1036 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT 0x1037 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_INT 0x1038 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG 0x1039 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT 0x103A #define CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE 0x103B #define CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF 0x103C #define CL_DEVICE_OPENCL_C_VERSION 0x103D #define CL_DEVICE_LINKER_AVAILABLE 0x103E #define CL_DEVICE_BUILT_IN_KERNELS 0x103F #define CL_DEVICE_IMAGE_MAX_BUFFER_SIZE 0x1040 #define CL_DEVICE_IMAGE_MAX_ARRAY_SIZE 0x1041 #define CL_DEVICE_PARENT_DEVICE 0x1042 #define CL_DEVICE_PARTITION_MAX_SUB_DEVICES 0x1043 #define CL_DEVICE_PARTITION_PROPERTIES 0x1044 #define CL_DEVICE_PARTITION_AFFINITY_DOMAIN 0x1045 #define CL_DEVICE_PARTITION_TYPE 0x1046 #define CL_DEVICE_REFERENCE_COUNT 0x1047 #define CL_DEVICE_PREFERRED_INTEROP_USER_SYNC 0x1048 #define CL_DEVICE_PRINTF_BUFFER_SIZE 0x1049 #define CL_DEVICE_IMAGE_PITCH_ALIGNMENT 0x104A #define CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT 0x104B #define CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS 0x104C #define CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE 0x104D #define CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES 0x104E 
#define CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE 0x104F #define CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE 0x1050 #define CL_DEVICE_MAX_ON_DEVICE_QUEUES 0x1051 #define CL_DEVICE_MAX_ON_DEVICE_EVENTS 0x1052 #define CL_DEVICE_SVM_CAPABILITIES 0x1053 #define CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE 0x1054 #define CL_DEVICE_MAX_PIPE_ARGS 0x1055 #define CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS 0x1056 #define CL_DEVICE_PIPE_MAX_PACKET_SIZE 0x1057 #define CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT 0x1058 #define CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT 0x1059 #define CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT 0x105A #define CL_DEVICE_IL_VERSION 0x105B #define CL_DEVICE_MAX_NUM_SUB_GROUPS 0x105C #define CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS 0x105D /* cl_device_fp_config - bitfield */ #define CL_FP_DENORM (1 << 0) #define CL_FP_INF_NAN (1 << 1) #define CL_FP_ROUND_TO_NEAREST (1 << 2) #define CL_FP_ROUND_TO_ZERO (1 << 3) #define CL_FP_ROUND_TO_INF (1 << 4) #define CL_FP_FMA (1 << 5) #define CL_FP_SOFT_FLOAT (1 << 6) #define CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT (1 << 7) /* cl_device_mem_cache_type */ #define CL_NONE 0x0 #define CL_READ_ONLY_CACHE 0x1 #define CL_READ_WRITE_CACHE 0x2 /* cl_device_local_mem_type */ #define CL_LOCAL 0x1 #define CL_GLOBAL 0x2 /* cl_device_exec_capabilities - bitfield */ #define CL_EXEC_KERNEL (1 << 0) #define CL_EXEC_NATIVE_KERNEL (1 << 1) /* cl_command_queue_properties - bitfield */ #define CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE (1 << 0) #define CL_QUEUE_PROFILING_ENABLE (1 << 1) #define CL_QUEUE_ON_DEVICE (1 << 2) #define CL_QUEUE_ON_DEVICE_DEFAULT (1 << 3) /* cl_context_info */ #define CL_CONTEXT_REFERENCE_COUNT 0x1080 #define CL_CONTEXT_DEVICES 0x1081 #define CL_CONTEXT_PROPERTIES 0x1082 #define CL_CONTEXT_NUM_DEVICES 0x1083 /* cl_context_properties */ #define CL_CONTEXT_PLATFORM 0x1084 #define CL_CONTEXT_INTEROP_USER_SYNC 0x1085 /* cl_device_partition_property */ #define CL_DEVICE_PARTITION_EQUALLY 0x1086 #define CL_DEVICE_PARTITION_BY_COUNTS 0x1087 #define CL_DEVICE_PARTITION_BY_COUNTS_LIST_END 0x0 #define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN 0x1088 /* cl_device_affinity_domain */ #define CL_DEVICE_AFFINITY_DOMAIN_NUMA (1 << 0) #define CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE (1 << 1) #define CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE (1 << 2) #define CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE (1 << 3) #define CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE (1 << 4) #define CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE (1 << 5) /* cl_device_svm_capabilities */ #define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER (1 << 0) #define CL_DEVICE_SVM_FINE_GRAIN_BUFFER (1 << 1) #define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM (1 << 2) #define CL_DEVICE_SVM_ATOMICS (1 << 3) /* cl_command_queue_info */ #define CL_QUEUE_CONTEXT 0x1090 #define CL_QUEUE_DEVICE 0x1091 #define CL_QUEUE_REFERENCE_COUNT 0x1092 #define CL_QUEUE_PROPERTIES 0x1093 #define CL_QUEUE_SIZE 0x1094 #define CL_QUEUE_DEVICE_DEFAULT 0x1095 /* cl_mem_flags and cl_svm_mem_flags - bitfield */ #define CL_MEM_READ_WRITE (1 << 0) #define CL_MEM_WRITE_ONLY (1 << 1) #define CL_MEM_READ_ONLY (1 << 2) #define CL_MEM_USE_HOST_PTR (1 << 3) #define CL_MEM_ALLOC_HOST_PTR (1 << 4) #define CL_MEM_COPY_HOST_PTR (1 << 5) /* reserved (1 << 6) */ #define CL_MEM_HOST_WRITE_ONLY (1 << 7) #define CL_MEM_HOST_READ_ONLY (1 << 8) #define CL_MEM_HOST_NO_ACCESS (1 << 9) #define CL_MEM_SVM_FINE_GRAIN_BUFFER (1 << 10) /* used by cl_svm_mem_flags only */ #define CL_MEM_SVM_ATOMICS (1 << 11) /* used by cl_svm_mem_flags only */ #define CL_MEM_KERNEL_READ_AND_WRITE (1 << 12) /* 
cl_mem_migration_flags - bitfield */ #define CL_MIGRATE_MEM_OBJECT_HOST (1 << 0) #define CL_MIGRATE_MEM_OBJECT_CONTENT_UNDEFINED (1 << 1) /* cl_channel_order */ #define CL_R 0x10B0 #define CL_A 0x10B1 #define CL_RG 0x10B2 #define CL_RA 0x10B3 #define CL_RGB 0x10B4 #define CL_RGBA 0x10B5 #define CL_BGRA 0x10B6 #define CL_ARGB 0x10B7 #define CL_INTENSITY 0x10B8 #define CL_LUMINANCE 0x10B9 #define CL_Rx 0x10BA #define CL_RGx 0x10BB #define CL_RGBx 0x10BC #define CL_DEPTH 0x10BD #define CL_DEPTH_STENCIL 0x10BE #define CL_sRGB 0x10BF #define CL_sRGBx 0x10C0 #define CL_sRGBA 0x10C1 #define CL_sBGRA 0x10C2 #define CL_ABGR 0x10C3 /* cl_channel_type */ #define CL_SNORM_INT8 0x10D0 #define CL_SNORM_INT16 0x10D1 #define CL_UNORM_INT8 0x10D2 #define CL_UNORM_INT16 0x10D3 #define CL_UNORM_SHORT_565 0x10D4 #define CL_UNORM_SHORT_555 0x10D5 #define CL_UNORM_INT_101010 0x10D6 #define CL_SIGNED_INT8 0x10D7 #define CL_SIGNED_INT16 0x10D8 #define CL_SIGNED_INT32 0x10D9 #define CL_UNSIGNED_INT8 0x10DA #define CL_UNSIGNED_INT16 0x10DB #define CL_UNSIGNED_INT32 0x10DC #define CL_HALF_FLOAT 0x10DD #define CL_FLOAT 0x10DE #define CL_UNORM_INT24 0x10DF #define CL_UNORM_INT_101010_2 0x10E0 /* cl_mem_object_type */ #define CL_MEM_OBJECT_BUFFER 0x10F0 #define CL_MEM_OBJECT_IMAGE2D 0x10F1 #define CL_MEM_OBJECT_IMAGE3D 0x10F2 #define CL_MEM_OBJECT_IMAGE2D_ARRAY 0x10F3 #define CL_MEM_OBJECT_IMAGE1D 0x10F4 #define CL_MEM_OBJECT_IMAGE1D_ARRAY 0x10F5 #define CL_MEM_OBJECT_IMAGE1D_BUFFER 0x10F6 #define CL_MEM_OBJECT_PIPE 0x10F7 /* cl_mem_info */ #define CL_MEM_TYPE 0x1100 #define CL_MEM_FLAGS 0x1101 #define CL_MEM_SIZE 0x1102 #define CL_MEM_HOST_PTR 0x1103 #define CL_MEM_MAP_COUNT 0x1104 #define CL_MEM_REFERENCE_COUNT 0x1105 #define CL_MEM_CONTEXT 0x1106 #define CL_MEM_ASSOCIATED_MEMOBJECT 0x1107 #define CL_MEM_OFFSET 0x1108 #define CL_MEM_USES_SVM_POINTER 0x1109 /* cl_image_info */ #define CL_IMAGE_FORMAT 0x1110 #define CL_IMAGE_ELEMENT_SIZE 0x1111 #define CL_IMAGE_ROW_PITCH 0x1112 #define CL_IMAGE_SLICE_PITCH 0x1113 #define CL_IMAGE_WIDTH 0x1114 #define CL_IMAGE_HEIGHT 0x1115 #define CL_IMAGE_DEPTH 0x1116 #define CL_IMAGE_ARRAY_SIZE 0x1117 #define CL_IMAGE_BUFFER 0x1118 #define CL_IMAGE_NUM_MIP_LEVELS 0x1119 #define CL_IMAGE_NUM_SAMPLES 0x111A /* cl_pipe_info */ #define CL_PIPE_PACKET_SIZE 0x1120 #define CL_PIPE_MAX_PACKETS 0x1121 /* cl_addressing_mode */ #define CL_ADDRESS_NONE 0x1130 #define CL_ADDRESS_CLAMP_TO_EDGE 0x1131 #define CL_ADDRESS_CLAMP 0x1132 #define CL_ADDRESS_REPEAT 0x1133 #define CL_ADDRESS_MIRRORED_REPEAT 0x1134 /* cl_filter_mode */ #define CL_FILTER_NEAREST 0x1140 #define CL_FILTER_LINEAR 0x1141 /* cl_sampler_info */ #define CL_SAMPLER_REFERENCE_COUNT 0x1150 #define CL_SAMPLER_CONTEXT 0x1151 #define CL_SAMPLER_NORMALIZED_COORDS 0x1152 #define CL_SAMPLER_ADDRESSING_MODE 0x1153 #define CL_SAMPLER_FILTER_MODE 0x1154 #define CL_SAMPLER_MIP_FILTER_MODE 0x1155 #define CL_SAMPLER_LOD_MIN 0x1156 #define CL_SAMPLER_LOD_MAX 0x1157 /* cl_map_flags - bitfield */ #define CL_MAP_READ (1 << 0) #define CL_MAP_WRITE (1 << 1) #define CL_MAP_WRITE_INVALIDATE_REGION (1 << 2) /* cl_program_info */ #define CL_PROGRAM_REFERENCE_COUNT 0x1160 #define CL_PROGRAM_CONTEXT 0x1161 #define CL_PROGRAM_NUM_DEVICES 0x1162 #define CL_PROGRAM_DEVICES 0x1163 #define CL_PROGRAM_SOURCE 0x1164 #define CL_PROGRAM_BINARY_SIZES 0x1165 #define CL_PROGRAM_BINARIES 0x1166 #define CL_PROGRAM_NUM_KERNELS 0x1167 #define CL_PROGRAM_KERNEL_NAMES 0x1168 #define CL_PROGRAM_IL 0x1169 /* cl_program_build_info */ #define CL_PROGRAM_BUILD_STATUS 0x1181 
#define CL_PROGRAM_BUILD_OPTIONS 0x1182 #define CL_PROGRAM_BUILD_LOG 0x1183 #define CL_PROGRAM_BINARY_TYPE 0x1184 #define CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE 0x1185 /* cl_program_binary_type */ #define CL_PROGRAM_BINARY_TYPE_NONE 0x0 #define CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT 0x1 #define CL_PROGRAM_BINARY_TYPE_LIBRARY 0x2 #define CL_PROGRAM_BINARY_TYPE_EXECUTABLE 0x4 /* cl_build_status */ #define CL_BUILD_SUCCESS 0 #define CL_BUILD_NONE -1 #define CL_BUILD_ERROR -2 #define CL_BUILD_IN_PROGRESS -3 /* cl_kernel_info */ #define CL_KERNEL_FUNCTION_NAME 0x1190 #define CL_KERNEL_NUM_ARGS 0x1191 #define CL_KERNEL_REFERENCE_COUNT 0x1192 #define CL_KERNEL_CONTEXT 0x1193 #define CL_KERNEL_PROGRAM 0x1194 #define CL_KERNEL_ATTRIBUTES 0x1195 #define CL_KERNEL_MAX_NUM_SUB_GROUPS 0x11B9 #define CL_KERNEL_COMPILE_NUM_SUB_GROUPS 0x11BA /* cl_kernel_arg_info */ #define CL_KERNEL_ARG_ADDRESS_QUALIFIER 0x1196 #define CL_KERNEL_ARG_ACCESS_QUALIFIER 0x1197 #define CL_KERNEL_ARG_TYPE_NAME 0x1198 #define CL_KERNEL_ARG_TYPE_QUALIFIER 0x1199 #define CL_KERNEL_ARG_NAME 0x119A /* cl_kernel_arg_address_qualifier */ #define CL_KERNEL_ARG_ADDRESS_GLOBAL 0x119B #define CL_KERNEL_ARG_ADDRESS_LOCAL 0x119C #define CL_KERNEL_ARG_ADDRESS_CONSTANT 0x119D #define CL_KERNEL_ARG_ADDRESS_PRIVATE 0x119E /* cl_kernel_arg_access_qualifier */ #define CL_KERNEL_ARG_ACCESS_READ_ONLY 0x11A0 #define CL_KERNEL_ARG_ACCESS_WRITE_ONLY 0x11A1 #define CL_KERNEL_ARG_ACCESS_READ_WRITE 0x11A2 #define CL_KERNEL_ARG_ACCESS_NONE 0x11A3 /* cl_kernel_arg_type_qualifier */ #define CL_KERNEL_ARG_TYPE_NONE 0 #define CL_KERNEL_ARG_TYPE_CONST (1 << 0) #define CL_KERNEL_ARG_TYPE_RESTRICT (1 << 1) #define CL_KERNEL_ARG_TYPE_VOLATILE (1 << 2) #define CL_KERNEL_ARG_TYPE_PIPE (1 << 3) /* cl_kernel_work_group_info */ #define CL_KERNEL_WORK_GROUP_SIZE 0x11B0 #define CL_KERNEL_COMPILE_WORK_GROUP_SIZE 0x11B1 #define CL_KERNEL_LOCAL_MEM_SIZE 0x11B2 #define CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE 0x11B3 #define CL_KERNEL_PRIVATE_MEM_SIZE 0x11B4 #define CL_KERNEL_GLOBAL_WORK_SIZE 0x11B5 /* cl_kernel_sub_group_info */ #define CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE 0x2033 #define CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE 0x2034 #define CL_KERNEL_LOCAL_SIZE_FOR_SUB_GROUP_COUNT 0x11B8 /* cl_kernel_exec_info */ #define CL_KERNEL_EXEC_INFO_SVM_PTRS 0x11B6 #define CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM 0x11B7 /* cl_event_info */ #define CL_EVENT_COMMAND_QUEUE 0x11D0 #define CL_EVENT_COMMAND_TYPE 0x11D1 #define CL_EVENT_REFERENCE_COUNT 0x11D2 #define CL_EVENT_COMMAND_EXECUTION_STATUS 0x11D3 #define CL_EVENT_CONTEXT 0x11D4 /* cl_command_type */ #define CL_COMMAND_NDRANGE_KERNEL 0x11F0 #define CL_COMMAND_TASK 0x11F1 #define CL_COMMAND_NATIVE_KERNEL 0x11F2 #define CL_COMMAND_READ_BUFFER 0x11F3 #define CL_COMMAND_WRITE_BUFFER 0x11F4 #define CL_COMMAND_COPY_BUFFER 0x11F5 #define CL_COMMAND_READ_IMAGE 0x11F6 #define CL_COMMAND_WRITE_IMAGE 0x11F7 #define CL_COMMAND_COPY_IMAGE 0x11F8 #define CL_COMMAND_COPY_IMAGE_TO_BUFFER 0x11F9 #define CL_COMMAND_COPY_BUFFER_TO_IMAGE 0x11FA #define CL_COMMAND_MAP_BUFFER 0x11FB #define CL_COMMAND_MAP_IMAGE 0x11FC #define CL_COMMAND_UNMAP_MEM_OBJECT 0x11FD #define CL_COMMAND_MARKER 0x11FE #define CL_COMMAND_ACQUIRE_GL_OBJECTS 0x11FF #define CL_COMMAND_RELEASE_GL_OBJECTS 0x1200 #define CL_COMMAND_READ_BUFFER_RECT 0x1201 #define CL_COMMAND_WRITE_BUFFER_RECT 0x1202 #define CL_COMMAND_COPY_BUFFER_RECT 0x1203 #define CL_COMMAND_USER 0x1204 #define CL_COMMAND_BARRIER 0x1205 #define CL_COMMAND_MIGRATE_MEM_OBJECTS 0x1206 #define 
CL_COMMAND_FILL_BUFFER 0x1207
#define CL_COMMAND_FILL_IMAGE 0x1208
#define CL_COMMAND_SVM_FREE 0x1209
#define CL_COMMAND_SVM_MEMCPY 0x120A
#define CL_COMMAND_SVM_MEMFILL 0x120B
#define CL_COMMAND_SVM_MAP 0x120C
#define CL_COMMAND_SVM_UNMAP 0x120D
#define CL_COMMAND_SVM_MIGRATE_MEM 0x120E
/* command execution status */
#define CL_COMPLETE 0x0
#define CL_RUNNING 0x1
#define CL_SUBMITTED 0x2
#define CL_QUEUED 0x3
/* cl_buffer_create_type */
#define CL_BUFFER_CREATE_TYPE_REGION 0x1220
/* cl_profiling_info */
#define CL_PROFILING_COMMAND_QUEUED 0x1280
#define CL_PROFILING_COMMAND_SUBMIT 0x1281
#define CL_PROFILING_COMMAND_START 0x1282
#define CL_PROFILING_COMMAND_END 0x1283
#define CL_PROFILING_COMMAND_COMPLETE 0x1284
/********************************************************************************************************/
/* Platform API */
extern CL_API_ENTRY cl_int CL_API_CALL
clGetPlatformIDs(cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetPlatformInfo(cl_platform_id /* platform */, cl_platform_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0;
/* Device APIs */
extern CL_API_ENTRY cl_int CL_API_CALL
clGetDeviceIDs(cl_platform_id /* platform */, cl_device_type /* device_type */, cl_uint /* num_entries */, cl_device_id * /* devices */, cl_uint * /* num_devices */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetDeviceInfo(cl_device_id /* device */, cl_device_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clCreateSubDevices(cl_device_id /* in_device */, const cl_device_partition_property * /* properties */, cl_uint /* num_devices */, cl_device_id * /* out_devices */, cl_uint * /* num_devices_ret */) CL_API_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clRetainDevice(cl_device_id /* device */) CL_API_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clReleaseDevice(cl_device_id /* device */) CL_API_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clSetDefaultDeviceCommandQueue(cl_context /* context */, cl_device_id /* device */, cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_2_1;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetDeviceAndHostTimer(cl_device_id /* device */, cl_ulong* /* device_timestamp */, cl_ulong* /* host_timestamp */) CL_API_SUFFIX__VERSION_2_1;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetHostTimer(cl_device_id /* device */, cl_ulong * /* host_timestamp */) CL_API_SUFFIX__VERSION_2_1;
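/* Editor's note: an illustrative sketch, not part of the upstream Khronos
 * header. It shows the usual two-call discovery pattern for the platform and
 * device APIs declared above: query the count first, then fetch the handles.
 * The function name is hypothetical; error handling is reduced to early
 * returns. */
static cl_int example_pick_first_gpu(cl_platform_id *platform_ret, cl_device_id *device_ret)
{
    cl_uint num_platforms = 0;
    cl_platform_id platform;
    cl_int err = clGetPlatformIDs(0, NULL, &num_platforms); /* first call: count only */
    if (err != CL_SUCCESS)
        return err;
    if (num_platforms == 0)
        return CL_DEVICE_NOT_FOUND;
    err = clGetPlatformIDs(1, &platform, NULL); /* second call: fetch one handle */
    if (err != CL_SUCCESS)
        return err;
    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, device_ret, NULL);
    if (err == CL_SUCCESS)
        *platform_ret = platform;
    return err;
}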
/* Context APIs */
extern CL_API_ENTRY cl_context CL_API_CALL
clCreateContext(const cl_context_properties * /* properties */, cl_uint /* num_devices */, const cl_device_id * /* devices */, void (CL_CALLBACK * /* pfn_notify */)(const char *, const void *, size_t, void *), void * /* user_data */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_context CL_API_CALL
clCreateContextFromType(const cl_context_properties * /* properties */, cl_device_type /* device_type */, void (CL_CALLBACK * /* pfn_notify*/ )(const char *, const void *, size_t, void *), void * /* user_data */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clRetainContext(cl_context /* context */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clReleaseContext(cl_context /* context */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetContextInfo(cl_context /* context */, cl_context_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0;
/* Command Queue APIs */
extern CL_API_ENTRY cl_command_queue CL_API_CALL
clCreateCommandQueueWithProperties(cl_context /* context */, cl_device_id /* device */, const cl_queue_properties * /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clRetainCommandQueue(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clReleaseCommandQueue(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetCommandQueueInfo(cl_command_queue /* command_queue */, cl_command_queue_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0;
/* Memory Object APIs */
extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateBuffer(cl_context /* context */, cl_mem_flags /* flags */, size_t /* size */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateSubBuffer(cl_mem /* buffer */, cl_mem_flags /* flags */, cl_buffer_create_type /* buffer_create_type */, const void * /* buffer_create_info */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1;
extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateImage(cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format * /* image_format */, const cl_image_desc * /* image_desc */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_mem CL_API_CALL
clCreatePipe(cl_context /* context */, cl_mem_flags /* flags */, cl_uint /* pipe_packet_size */, cl_uint /* pipe_max_packets */, const cl_pipe_properties * /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clRetainMemObject(cl_mem /* memobj */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clReleaseMemObject(cl_mem /* memobj */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetSupportedImageFormats(cl_context /* context */, cl_mem_flags /* flags */, cl_mem_object_type /* image_type */, cl_uint /* num_entries */, cl_image_format * /* image_formats */, cl_uint * /* num_image_formats */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetMemObjectInfo(cl_mem /* memobj */, cl_mem_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetImageInfo(cl_mem /* image */, cl_image_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetPipeInfo(cl_mem /* pipe */, cl_pipe_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_2_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clSetMemObjectDestructorCallback(cl_mem /* memobj */, void (CL_CALLBACK * /*pfn_notify*/)( cl_mem /* memobj */, void* /*user_data*/), void * /*user_data */ ) CL_API_SUFFIX__VERSION_1_1;
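/* Editor's note: an illustrative sketch, not part of the upstream Khronos
 * header. It fills in cl_image_format and cl_image_desc (declared earlier in
 * this file) for a 2D RGBA8 image and creates it with clCreateImage. The
 * function name and dimensions are hypothetical; a row pitch of 0 lets the
 * runtime choose. */
static cl_mem example_create_rgba8_image2d(cl_context context, size_t width, size_t height, cl_int *errcode_ret)
{
    cl_image_format format;
    cl_image_desc desc;
    format.image_channel_order = CL_RGBA;
    format.image_channel_data_type = CL_UNORM_INT8; /* 8-bit normalized channels */
    desc.image_type = CL_MEM_OBJECT_IMAGE2D;
    desc.image_width = width;
    desc.image_height = height;
    desc.image_depth = 0;      /* unused for 2D images */
    desc.image_array_size = 0; /* unused for non-array images */
    desc.image_row_pitch = 0;  /* 0: runtime computes the pitch */
    desc.image_slice_pitch = 0;
    desc.num_mip_levels = 0;
    desc.num_samples = 0;
    desc.mem_object = NULL;    /* not backed by a buffer */
    return clCreateImage(context, CL_MEM_READ_WRITE, &format, &desc, NULL, errcode_ret);
}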
/* SVM Allocation APIs */
extern CL_API_ENTRY void * CL_API_CALL
clSVMAlloc(cl_context /* context */, cl_svm_mem_flags /* flags */, size_t /* size */, cl_uint /* alignment */) CL_API_SUFFIX__VERSION_2_0;
extern CL_API_ENTRY void CL_API_CALL
clSVMFree(cl_context /* context */, void * /* svm_pointer */) CL_API_SUFFIX__VERSION_2_0;
/* Sampler APIs */
extern CL_API_ENTRY cl_sampler CL_API_CALL
clCreateSamplerWithProperties(cl_context /* context */, const cl_sampler_properties * /* sampler_properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clRetainSampler(cl_sampler /* sampler */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clReleaseSampler(cl_sampler /* sampler */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetSamplerInfo(cl_sampler /* sampler */, cl_sampler_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0;
/* Program Object APIs */
extern CL_API_ENTRY cl_program CL_API_CALL
clCreateProgramWithSource(cl_context /* context */, cl_uint /* count */, const char ** /* strings */, const size_t * /* lengths */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_program CL_API_CALL
clCreateProgramWithBinary(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const size_t * /* lengths */, const unsigned char ** /* binaries */, cl_int * /* binary_status */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_program CL_API_CALL
clCreateProgramWithBuiltInKernels(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* kernel_names */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_program CL_API_CALL
clCreateProgramWithIL(cl_context /* context */, const void* /* il */, size_t /* length */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_2_1;
extern CL_API_ENTRY cl_int CL_API_CALL
clRetainProgram(cl_program /* program */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clReleaseProgram(cl_program /* program */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clBuildProgram(cl_program /* program */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, void (CL_CALLBACK * /* pfn_notify */)(cl_program /* program */, void * /* user_data */), void * /* user_data */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clCompileProgram(cl_program /* program */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, cl_uint /* num_input_headers */, const cl_program * /* input_headers */, const char ** /* header_include_names */, void (CL_CALLBACK * /* pfn_notify */)(cl_program /* program */, void * /* user_data */), void * /* user_data */) CL_API_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_program CL_API_CALL
clLinkProgram(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, cl_uint /* num_input_programs */, const cl_program * /* input_programs */, void (CL_CALLBACK * /* pfn_notify */)(cl_program /* program */, void * /* user_data */), void * /* user_data */, cl_int * /* errcode_ret */ ) CL_API_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clUnloadPlatformCompiler(cl_platform_id /* platform */) CL_API_SUFFIX__VERSION_1_2;
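/* Editor's note: an illustrative sketch, not part of the upstream Khronos
 * header. It shows the usual create-then-build sequence for the program APIs
 * above. The function name is hypothetical; passing a NULL device list to
 * clBuildProgram builds for every device in the context, and "-cl-std=CL2.0"
 * is one example of a build-option string. */
static cl_program example_build_program(cl_context context, const char *source, cl_int *errcode_ret)
{
    cl_program program = clCreateProgramWithSource(context, 1, &source, NULL, errcode_ret);
    if (*errcode_ret != CL_SUCCESS)
        return NULL;
    *errcode_ret = clBuildProgram(program, 0, NULL, "-cl-std=CL2.0", NULL, NULL);
    if (*errcode_ret != CL_SUCCESS) {
        /* clGetProgramBuildInfo (declared below) can retrieve the build log here. */
        clReleaseProgram(program);
        return NULL;
    }
    return program;
}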
CL_API_CALL clGetProgramInfo(cl_program /* program */, cl_program_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetProgramBuildInfo(cl_program /* program */, cl_device_id /* device */, cl_program_build_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Kernel Object APIs */ extern CL_API_ENTRY cl_kernel CL_API_CALL clCreateKernel(cl_program /* program */, const char * /* kernel_name */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clCreateKernelsInProgram(cl_program /* program */, cl_uint /* num_kernels */, cl_kernel * /* kernels */, cl_uint * /* num_kernels_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_kernel CL_API_CALL clCloneKernel(cl_kernel /* source_kernel */, cl_int* /* errcode_ret */) CL_API_SUFFIX__VERSION_2_1; extern CL_API_ENTRY cl_int CL_API_CALL clRetainKernel(cl_kernel /* kernel */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseKernel(cl_kernel /* kernel */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelArg(cl_kernel /* kernel */, cl_uint /* arg_index */, size_t /* arg_size */, const void * /* arg_value */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelArgSVMPointer(cl_kernel /* kernel */, cl_uint /* arg_index */, const void * /* arg_value */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelExecInfo(cl_kernel /* kernel */, cl_kernel_exec_info /* param_name */, size_t /* param_value_size */, const void * /* param_value */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelInfo(cl_kernel /* kernel */, cl_kernel_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelArgInfo(cl_kernel /* kernel */, cl_uint /* arg_indx */, cl_kernel_arg_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelWorkGroupInfo(cl_kernel /* kernel */, cl_device_id /* device */, cl_kernel_work_group_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelSubGroupInfo(cl_kernel /* kernel */, cl_device_id /* device */, cl_kernel_sub_group_info /* param_name */, size_t /* input_value_size */, const void* /*input_value */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */ ) CL_API_SUFFIX__VERSION_2_1; /* Event Object APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clWaitForEvents(cl_uint /* num_events */, const cl_event * /* event_list */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetEventInfo(cl_event /* event */, cl_event_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_event CL_API_CALL clCreateUserEvent(cl_context /* context */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clRetainEvent(cl_event /* event */) 
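/* Illustrative sketch, not part of the Khronos header: creating a kernel
 * from a built program and binding its arguments with the Kernel Object APIs
 * above. `prog` is assumed to be a successfully built cl_program whose
 * kernel "scale" takes one buffer argument; `buf` is an assumed cl_mem.
 *
 * \code
 * cl_int err = CL_SUCCESS;
 * cl_kernel k = clCreateKernel(prog, "scale", &err);
 * if (err == CL_SUCCESS) {
 *     err = clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
 *     // an SVM allocation would instead be bound with
 *     // clSetKernelArgSVMPointer(k, 0, svm_ptr);
 *     clReleaseKernel(k);
 * }
 * \endcode
 */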
CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseEvent(cl_event /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetUserEventStatus(cl_event /* event */, cl_int /* execution_status */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clSetEventCallback( cl_event /* event */, cl_int /* command_exec_callback_type */, void (CL_CALLBACK * /* pfn_notify */)(cl_event, cl_int, void *), void * /* user_data */) CL_API_SUFFIX__VERSION_1_1; /* Profiling APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clGetEventProfilingInfo(cl_event /* event */, cl_profiling_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Flush and Finish APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clFlush(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clFinish(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; /* Enqueued Commands APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_read */, size_t /* offset */, size_t /* size */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBufferRect(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_read */, const size_t * /* buffer_offset */, const size_t * /* host_offset */, const size_t * /* region */, size_t /* buffer_row_pitch */, size_t /* buffer_slice_pitch */, size_t /* host_row_pitch */, size_t /* host_slice_pitch */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_write */, size_t /* offset */, size_t /* size */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBufferRect(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_write */, const size_t * /* buffer_offset */, const size_t * /* host_offset */, const size_t * /* region */, size_t /* buffer_row_pitch */, size_t /* buffer_slice_pitch */, size_t /* host_row_pitch */, size_t /* host_slice_pitch */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, const void * /* pattern */, size_t /* pattern_size */, size_t /* offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBuffer(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_buffer */, size_t /* src_offset */, size_t /* dst_offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL 
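/* Illustrative sketch, not part of the Khronos header: a blocking
 * host-to-device-and-back round trip with the buffer enqueue APIs above.
 * `queue` and `buf` (a cl_mem of at least sizeof(host) bytes) are assumed.
 *
 * \code
 * float host[16] = { 0.0f };
 * cl_int err = clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, sizeof(host),
 *                                   host, 0, NULL, NULL);
 * if (err == CL_SUCCESS) {
 *     err = clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, sizeof(host),
 *                               host, 0, NULL, NULL);
 * }
 * \endcode
 */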
clEnqueueCopyBufferRect(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_buffer */, const size_t * /* src_origin */, const size_t * /* dst_origin */, const size_t * /* region */, size_t /* src_row_pitch */, size_t /* src_slice_pitch */, size_t /* dst_row_pitch */, size_t /* dst_slice_pitch */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_read */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t /* row_pitch */, size_t /* slice_pitch */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_write */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t /* input_row_pitch */, size_t /* input_slice_pitch */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillImage(cl_command_queue /* command_queue */, cl_mem /* image */, const void * /* fill_color */, const size_t * /* origin[3] */, const size_t * /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImage(cl_command_queue /* command_queue */, cl_mem /* src_image */, cl_mem /* dst_image */, const size_t * /* src_origin[3] */, const size_t * /* dst_origin[3] */, const size_t * /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImageToBuffer(cl_command_queue /* command_queue */, cl_mem /* src_image */, cl_mem /* dst_buffer */, const size_t * /* src_origin[3] */, const size_t * /* region[3] */, size_t /* dst_offset */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferToImage(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_image */, size_t /* src_offset */, const size_t * /* dst_origin[3] */, const size_t * /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY void * CL_API_CALL clEnqueueMapBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_map */, cl_map_flags /* map_flags */, size_t /* offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY void * CL_API_CALL clEnqueueMapImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_map */, cl_map_flags /* map_flags */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t * /* image_row_pitch */, size_t * /* image_slice_pitch */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */, cl_int * 
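/* Illustrative sketch, not part of the Khronos header: mapping a buffer into
 * host memory with clEnqueueMapBuffer (declared above) and releasing the
 * mapping with clEnqueueUnmapMemObject (declared below). `queue` and `buf`
 * (a cl_mem of at least 64 bytes) are assumed.
 *
 * \code
 * cl_int err = CL_SUCCESS;
 * void * p = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
 *                               0, 64, 0, NULL, NULL, &err);
 * if (err == CL_SUCCESS) {
 *     // ... write through p ...
 *     clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);
 * }
 * \endcode
 */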
/* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueUnmapMemObject(cl_command_queue /* command_queue */, cl_mem /* memobj */, void * /* mapped_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMigrateMemObjects(cl_command_queue /* command_queue */, cl_uint /* num_mem_objects */, const cl_mem * /* mem_objects */, cl_mem_migration_flags /* flags */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueNDRangeKernel(cl_command_queue /* command_queue */, cl_kernel /* kernel */, cl_uint /* work_dim */, const size_t * /* global_work_offset */, const size_t * /* global_work_size */, const size_t * /* local_work_size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueNativeKernel(cl_command_queue /* command_queue */, void (CL_CALLBACK * /*user_func*/)(void *), void * /* args */, size_t /* cb_args */, cl_uint /* num_mem_objects */, const cl_mem * /* mem_list */, const void ** /* args_mem_loc */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMarkerWithWaitList(cl_command_queue /* command_queue */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueBarrierWithWaitList(cl_command_queue /* command_queue */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMFree(cl_command_queue /* command_queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void (CL_CALLBACK * /*pfn_free_func*/)(cl_command_queue /* queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void * /* user_data */), void * /* user_data */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemcpy(cl_command_queue /* command_queue */, cl_bool /* blocking_copy */, void * /* dst_ptr */, const void * /* src_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemFill(cl_command_queue /* command_queue */, void * /* svm_ptr */, const void * /* pattern */, size_t /* pattern_size */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMap(cl_command_queue /* command_queue */, cl_bool /* blocking_map */, cl_map_flags /* flags */, void * /* svm_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMUnmap(cl_command_queue /* command_queue */, void * /* svm_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* 
event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMigrateMem(cl_command_queue /* command_queue */, cl_uint /* num_svm_pointers */, const void ** /* svm_pointers */, const size_t * /* sizes */, cl_mem_migration_flags /* flags */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_1; /* Extension function access * * Returns the extension function address for the given function name, * or NULL if a valid function can not be found. The client must * check to make sure the address is not NULL, before using or * calling the returned function address. */ extern CL_API_ENTRY void * CL_API_CALL clGetExtensionFunctionAddressForPlatform(cl_platform_id /* platform */, const char * /* func_name */) CL_API_SUFFIX__VERSION_1_2; /* Deprecated OpenCL 1.1 APIs */ extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateImage2D(cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format * /* image_format */, size_t /* image_width */, size_t /* image_height */, size_t /* image_row_pitch */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateImage3D(cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format * /* image_format */, size_t /* image_width */, size_t /* image_height */, size_t /* image_depth */, size_t /* image_row_pitch */, size_t /* image_slice_pitch */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clEnqueueMarker(cl_command_queue /* command_queue */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clEnqueueWaitForEvents(cl_command_queue /* command_queue */, cl_uint /* num_events */, const cl_event * /* event_list */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clEnqueueBarrier(cl_command_queue /* command_queue */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clUnloadCompiler(void) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED void * CL_API_CALL clGetExtensionFunctionAddress(const char * /* func_name */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; /* Deprecated OpenCL 2.0 APIs */ extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_command_queue CL_API_CALL clCreateCommandQueue(cl_context /* context */, cl_device_id /* device */, cl_command_queue_properties /* properties */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_sampler CL_API_CALL clCreateSampler(cl_context /* context */, cl_bool /* normalized_coords */, cl_addressing_mode /* addressing_mode */, cl_filter_mode /* filter_mode */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_int CL_API_CALL clEnqueueTask(cl_command_queue /* command_queue */, cl_kernel /* kernel */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED; #ifdef __cplusplus } #endif #endif /* 
__OPENCL_CL_H */
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/cl.hpp000066400000000000000000011076001450307266000227360ustar00rootroot00000000000000
/* Modifications Copyright(C)[2021-2022] Advanced Micro Devices, Inc.
 * All rights reserved.
 */
/*******************************************************************************
 * Copyright (c) 2008-2015 The Khronos Group Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and/or associated documentation files (the
 * "Materials"), to deal in the Materials without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Materials, and to
 * permit persons to whom the Materials are furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Materials.
 *
 * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
 * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
 * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
 *    https://www.khronos.org/registry/
 *
 * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
 ******************************************************************************/

/*! \file
 *
 *   \brief C++ bindings for OpenCL 1.0 (rev 48), OpenCL 1.1 (rev 33) and
 *       OpenCL 1.2 (rev 15)
 *   \author Benedict R. Gaster, Laurent Morichetti and Lee Howes
 *
 *   Additions and fixes from:
 *       Brian Cole, March 3rd 2010 and April 2012
 *       Matt Gruenke, April 2012.
 *       Bruce Merry, February 2013.
 *       Tom Deakin and Simon McIntosh-Smith, July 2013
 *
 *   \version 1.2.9
 *   \date December 2015
 *
 *   Optional extension support
 *
 *         cl
 *         cl_ext_device_fission
 *         #define USE_CL_DEVICE_FISSION
 */

/*! \mainpage
 * \section intro Introduction
 * For many large applications C++ is the language of choice and so it seems
 * reasonable to define C++ bindings for OpenCL.
 *
 *
 * The interface is contained within a single C++ header file \em cl.hpp and all
 * definitions are contained within the namespace \em cl. There is no additional
 * requirement to include \em cl.h and to use either the C++ or original C
 * bindings it is enough to simply include \em cl.hpp.
 *
 * The bindings themselves are lightweight and correspond closely to the
 * underlying C API. Using the C++ bindings introduces no additional execution
 * overhead.
 *
 * For detailed documentation on the bindings see:
 *
 * The OpenCL C++ Wrapper API 1.2 (revision 09)
 *  http://www.khronos.org/registry/cl/specs/opencl-cplusplus-1.2.pdf
 *
 * \section example Example
 *
 * The following example shows a general use case for the C++
 * bindings, including support for the optional exception feature and
 * also the supplied vector and string classes, see following sections for
 * descriptions of these features.
 *
 * \code
 * #define __CL_ENABLE_EXCEPTIONS
 *
 * #if defined(__APPLE__) || defined(__MACOSX)
 * #include <OpenCL/cl.hpp>
 * #else
 * #include <CL/cl.hpp>
 * #endif
 * #include <cstdio>
 * #include <cstdlib>
 * #include <iostream>
 *
 *  const char * helloStr  = "__kernel void "
 *                           "hello(void) "
 *                           "{ "
 *                           "  "
 *                           "} ";
 *
 *  int
 *  main(void)
 *  {
 *     cl_int err = CL_SUCCESS;
 *     try {
 *
 *       std::vector<cl::Platform> platforms;
 *       cl::Platform::get(&platforms);
 *       if (platforms.size() == 0) {
 *           std::cout << "Platform size 0\n";
 *           return -1;
 *       }
 *
 *       cl_context_properties properties[] =
 *          { CL_CONTEXT_PLATFORM, (cl_context_properties)(platforms[0])(), 0};
 *       cl::Context context(CL_DEVICE_TYPE_CPU, properties);
 *
 *       std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
 *
 *       cl::Program::Sources source(1,
 *           std::make_pair(helloStr,strlen(helloStr)));
 *       cl::Program program_ = cl::Program(context, source);
 *       program_.build(devices);
 *
 *       cl::Kernel kernel(program_, "hello", &err);
 *
 *       cl::Event event;
 *       cl::CommandQueue queue(context, devices[0], 0, &err);
 *       queue.enqueueNDRangeKernel(
 *           kernel,
 *           cl::NullRange,
 *           cl::NDRange(4,4),
 *           cl::NullRange,
 *           NULL,
 *           &event);
 *
 *       event.wait();
 *     }
 *     catch (cl::Error err) {
 *        std::cerr
 *           << "ERROR: "
 *           << err.what()
 *           << "("
 *           << err.err()
 *           << ")"
 *           << std::endl;
 *     }
 *
 *    return EXIT_SUCCESS;
 *  }
 *
 * \endcode
 *
 */
#ifndef CL_HPP_
#define CL_HPP_

#ifdef _WIN32
#include <malloc.h>

#if defined(USE_DX_INTEROP)
#include <CL/cl_d3d10.h>
#include <CL/cl_dx9_media_sharing.h>
#endif
#endif // _WIN32

#if defined(_MSC_VER)
#include <intrin.h>
#endif // _MSC_VER

//
#if defined(USE_CL_DEVICE_FISSION)
#include <CL/cl_ext.h>
#endif

#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/opencl.h>
#else
#include <CL/opencl.h>
#endif // !__APPLE__

#if (_MSC_VER >= 1700) || (__cplusplus >= 201103L)
#define CL_HPP_RVALUE_REFERENCES_SUPPORTED
#define CL_HPP_CPP11_ATOMICS_SUPPORTED
#include <atomic>
#endif

#if (__cplusplus >= 201103L)
#define CL_HPP_NOEXCEPT noexcept
#else
#define CL_HPP_NOEXCEPT
#endif

// To avoid accidentally taking ownership of core OpenCL types
// such as cl_kernel constructors are made explicit
// under OpenCL 1.2
#if defined(CL_VERSION_1_2) && !defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
#define __CL_EXPLICIT_CONSTRUCTORS explicit
#else // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
#define __CL_EXPLICIT_CONSTRUCTORS
#endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)

// Define deprecated prefixes and suffixes to ensure compilation
// in case they are not pre-defined
#if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)
#define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
#endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)
#if !defined(CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED)
#define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
#endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)

#if !defined(CL_CALLBACK)
#define CL_CALLBACK
#endif //CL_CALLBACK

#include <utility>
#include <limits>
#include <iterator>

#if defined(__CL_ENABLE_EXCEPTIONS)
#include <exception>
#endif // #if defined(__CL_ENABLE_EXCEPTIONS)

#if !defined(__NO_STD_VECTOR)
#include <vector>
#endif

#if !defined(__NO_STD_STRING)
#include <string>
#endif

#if defined(__ANDROID__) || defined(linux) || defined(__APPLE__) || defined(__MACOSX)
#include <alloca.h>
#endif // linux

#include <cstring>

/*! \namespace cl
 *
 * \brief The OpenCL C++ bindings are defined within this namespace.
* */ namespace cl { class Memory; /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2)) #define __INIT_CL_EXT_FCN_PTR(name) \ if(!pfn_##name) { \ pfn_##name = (PFN_##name) \ clGetExtensionFunctionAddress(#name); \ if(!pfn_##name) { \ } \ } #endif // #if defined(CL_VERSION_1_1) #if defined(CL_VERSION_1_2) #define __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, name) \ if(!pfn_##name) { \ pfn_##name = (PFN_##name) \ clGetExtensionFunctionAddressForPlatform(platform, #name); \ if(!pfn_##name) { \ } \ } #endif // #if defined(CL_VERSION_1_1) class Program; class Device; class Context; class CommandQueue; class Memory; class Buffer; #if defined(__CL_ENABLE_EXCEPTIONS) /*! \brief Exception class * * This may be thrown by API functions when __CL_ENABLE_EXCEPTIONS is defined. */ class Error : public std::exception { private: cl_int err_; const char * errStr_; public: /*! \brief Create a new CL error exception for a given error code * and corresponding message. * * \param err error code value. * * \param errStr a descriptive string that must remain in scope until * handling of the exception has concluded. If set, it * will be returned by what(). */ Error(cl_int err, const char * errStr = NULL) : err_(err), errStr_(errStr) {} ~Error() throw() {} /*! \brief Get error string associated with exception * * \return A memory pointer to the error message string. */ virtual const char * what() const throw () { if (errStr_ == NULL) { return "empty"; } else { return errStr_; } } /*! \brief Get error code associated with exception * * \return The error code. */ cl_int err(void) const { return err_; } }; #define __ERR_STR(x) #x #else #define __ERR_STR(x) NULL #endif // __CL_ENABLE_EXCEPTIONS namespace detail { #if defined(__CL_ENABLE_EXCEPTIONS) static inline cl_int errHandler ( cl_int err, const char * errStr = NULL) { if (err != CL_SUCCESS) { throw Error(err, errStr); } return err; } #else static inline cl_int errHandler (cl_int err, const char * errStr = NULL) { (void) errStr; // suppress unused variable warning return err; } #endif // __CL_ENABLE_EXCEPTIONS } //! 
\cond DOXYGEN_DETAIL #if !defined(__CL_USER_OVERRIDE_ERROR_STRINGS) #define __GET_DEVICE_INFO_ERR __ERR_STR(clGetDeviceInfo) #define __GET_PLATFORM_INFO_ERR __ERR_STR(clGetPlatformInfo) #define __GET_DEVICE_IDS_ERR __ERR_STR(clGetDeviceIDs) #define __GET_PLATFORM_IDS_ERR __ERR_STR(clGetPlatformIDs) #define __GET_CONTEXT_INFO_ERR __ERR_STR(clGetContextInfo) #define __GET_EVENT_INFO_ERR __ERR_STR(clGetEventInfo) #define __GET_EVENT_PROFILE_INFO_ERR __ERR_STR(clGetEventProfileInfo) #define __GET_MEM_OBJECT_INFO_ERR __ERR_STR(clGetMemObjectInfo) #define __GET_IMAGE_INFO_ERR __ERR_STR(clGetImageInfo) #define __GET_SAMPLER_INFO_ERR __ERR_STR(clGetSamplerInfo) #define __GET_KERNEL_INFO_ERR __ERR_STR(clGetKernelInfo) #if defined(CL_VERSION_1_2) #define __GET_KERNEL_ARG_INFO_ERR __ERR_STR(clGetKernelArgInfo) #endif // #if defined(CL_VERSION_1_2) #define __GET_KERNEL_WORK_GROUP_INFO_ERR __ERR_STR(clGetKernelWorkGroupInfo) #define __GET_PROGRAM_INFO_ERR __ERR_STR(clGetProgramInfo) #define __GET_PROGRAM_BUILD_INFO_ERR __ERR_STR(clGetProgramBuildInfo) #define __GET_COMMAND_QUEUE_INFO_ERR __ERR_STR(clGetCommandQueueInfo) #define __CREATE_CONTEXT_ERR __ERR_STR(clCreateContext) #define __CREATE_CONTEXT_FROM_TYPE_ERR __ERR_STR(clCreateContextFromType) #define __GET_SUPPORTED_IMAGE_FORMATS_ERR __ERR_STR(clGetSupportedImageFormats) #define __CREATE_BUFFER_ERR __ERR_STR(clCreateBuffer) #define __COPY_ERR __ERR_STR(cl::copy) #define __CREATE_SUBBUFFER_ERR __ERR_STR(clCreateSubBuffer) #define __CREATE_GL_BUFFER_ERR __ERR_STR(clCreateFromGLBuffer) #define __CREATE_GL_RENDER_BUFFER_ERR __ERR_STR(clCreateFromGLBuffer) #define __GET_GL_OBJECT_INFO_ERR __ERR_STR(clGetGLObjectInfo) #if defined(CL_VERSION_1_2) #define __CREATE_IMAGE_ERR __ERR_STR(clCreateImage) #define __CREATE_GL_TEXTURE_ERR __ERR_STR(clCreateFromGLTexture) #define __IMAGE_DIMENSION_ERR __ERR_STR(Incorrect image dimensions) #endif // #if defined(CL_VERSION_1_2) #define __CREATE_SAMPLER_ERR __ERR_STR(clCreateSampler) #define __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR __ERR_STR(clSetMemObjectDestructorCallback) #define __CREATE_USER_EVENT_ERR __ERR_STR(clCreateUserEvent) #define __SET_USER_EVENT_STATUS_ERR __ERR_STR(clSetUserEventStatus) #define __SET_EVENT_CALLBACK_ERR __ERR_STR(clSetEventCallback) #define __WAIT_FOR_EVENTS_ERR __ERR_STR(clWaitForEvents) #define __CREATE_KERNEL_ERR __ERR_STR(clCreateKernel) #define __SET_KERNEL_ARGS_ERR __ERR_STR(clSetKernelArg) #define __CREATE_PROGRAM_WITH_SOURCE_ERR __ERR_STR(clCreateProgramWithSource) #define __CREATE_PROGRAM_WITH_BINARY_ERR __ERR_STR(clCreateProgramWithBinary) #if defined(CL_VERSION_1_2) #define __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR __ERR_STR(clCreateProgramWithBuiltInKernels) #endif // #if defined(CL_VERSION_1_2) #define __BUILD_PROGRAM_ERR __ERR_STR(clBuildProgram) #if defined(CL_VERSION_1_2) #define __COMPILE_PROGRAM_ERR __ERR_STR(clCompileProgram) #define __LINK_PROGRAM_ERR __ERR_STR(clLinkProgram) #endif // #if defined(CL_VERSION_1_2) #define __CREATE_KERNELS_IN_PROGRAM_ERR __ERR_STR(clCreateKernelsInProgram) #define __CREATE_COMMAND_QUEUE_ERR __ERR_STR(clCreateCommandQueue) #define __SET_COMMAND_QUEUE_PROPERTY_ERR __ERR_STR(clSetCommandQueueProperty) #define __ENQUEUE_READ_BUFFER_ERR __ERR_STR(clEnqueueReadBuffer) #define __ENQUEUE_READ_BUFFER_RECT_ERR __ERR_STR(clEnqueueReadBufferRect) #define __ENQUEUE_WRITE_BUFFER_ERR __ERR_STR(clEnqueueWriteBuffer) #define __ENQUEUE_WRITE_BUFFER_RECT_ERR __ERR_STR(clEnqueueWriteBufferRect) #define __ENQEUE_COPY_BUFFER_ERR 
__ERR_STR(clEnqueueCopyBuffer) #define __ENQEUE_COPY_BUFFER_RECT_ERR __ERR_STR(clEnqueueCopyBufferRect) #define __ENQUEUE_FILL_BUFFER_ERR __ERR_STR(clEnqueueFillBuffer) #define __ENQUEUE_READ_IMAGE_ERR __ERR_STR(clEnqueueReadImage) #define __ENQUEUE_WRITE_IMAGE_ERR __ERR_STR(clEnqueueWriteImage) #define __ENQUEUE_COPY_IMAGE_ERR __ERR_STR(clEnqueueCopyImage) #define __ENQUEUE_FILL_IMAGE_ERR __ERR_STR(clEnqueueFillImage) #define __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR __ERR_STR(clEnqueueCopyImageToBuffer) #define __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR __ERR_STR(clEnqueueCopyBufferToImage) #define __ENQUEUE_MAP_BUFFER_ERR __ERR_STR(clEnqueueMapBuffer) #define __ENQUEUE_MAP_IMAGE_ERR __ERR_STR(clEnqueueMapImage) #define __ENQUEUE_UNMAP_MEM_OBJECT_ERR __ERR_STR(clEnqueueUnMapMemObject) #define __ENQUEUE_NDRANGE_KERNEL_ERR __ERR_STR(clEnqueueNDRangeKernel) #define __ENQUEUE_TASK_ERR __ERR_STR(clEnqueueTask) #define __ENQUEUE_NATIVE_KERNEL __ERR_STR(clEnqueueNativeKernel) #if defined(CL_VERSION_1_2) #define __ENQUEUE_MIGRATE_MEM_OBJECTS_ERR __ERR_STR(clEnqueueMigrateMemObjects) #endif // #if defined(CL_VERSION_1_2) #define __ENQUEUE_ACQUIRE_GL_ERR __ERR_STR(clEnqueueAcquireGLObjects) #define __ENQUEUE_RELEASE_GL_ERR __ERR_STR(clEnqueueReleaseGLObjects) #define __RETAIN_ERR __ERR_STR(Retain Object) #define __RELEASE_ERR __ERR_STR(Release Object) #define __FLUSH_ERR __ERR_STR(clFlush) #define __FINISH_ERR __ERR_STR(clFinish) #define __VECTOR_CAPACITY_ERR __ERR_STR(Vector capacity error) /** * CL 1.2 version that uses device fission. */ #if defined(CL_VERSION_1_2) #define __CREATE_SUB_DEVICES __ERR_STR(clCreateSubDevices) #else #define __CREATE_SUB_DEVICES __ERR_STR(clCreateSubDevicesEXT) #endif // #if defined(CL_VERSION_1_2) /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2)) #define __ENQUEUE_MARKER_ERR __ERR_STR(clEnqueueMarker) #define __ENQUEUE_WAIT_FOR_EVENTS_ERR __ERR_STR(clEnqueueWaitForEvents) #define __ENQUEUE_BARRIER_ERR __ERR_STR(clEnqueueBarrier) #define __UNLOAD_COMPILER_ERR __ERR_STR(clUnloadCompiler) #define __CREATE_GL_TEXTURE_2D_ERR __ERR_STR(clCreateFromGLTexture2D) #define __CREATE_GL_TEXTURE_3D_ERR __ERR_STR(clCreateFromGLTexture3D) #define __CREATE_IMAGE2D_ERR __ERR_STR(clCreateImage2D) #define __CREATE_IMAGE3D_ERR __ERR_STR(clCreateImage3D) #endif // #if defined(CL_VERSION_1_1) #endif // __CL_USER_OVERRIDE_ERROR_STRINGS //! \endcond /** * CL 1.2 marker and barrier commands */ #if defined(CL_VERSION_1_2) #define __ENQUEUE_MARKER_WAIT_LIST_ERR __ERR_STR(clEnqueueMarkerWithWaitList) #define __ENQUEUE_BARRIER_WAIT_LIST_ERR __ERR_STR(clEnqueueBarrierWithWaitList) #endif // #if defined(CL_VERSION_1_2) #if !defined(__USE_DEV_STRING) && !defined(__NO_STD_STRING) typedef std::string STRING_CLASS; #elif !defined(__USE_DEV_STRING) /*! \class string * \brief Simple string class, that provides a limited subset of std::string * functionality but avoids many of the issues that come with that class. * \note Deprecated. Please use std::string as default or * re-define the string class to match the std::string * interface by defining STRING_CLASS */ class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED string { private: ::size_t size_; char * str_; public: //! \brief Constructs an empty string, allocating no memory. string(void) : size_(0), str_(NULL) { } /*! \brief Constructs a string populated from an arbitrary value of * specified size. * * An extra '\0' is added, in case none was contained in str. 
* * \param str the initial value of the string instance. Note that '\0' * characters receive no special treatment. If NULL, * the string is left empty, with a size of 0. * * \param size the number of characters to copy from str. */ string(const char * str, ::size_t size) : size_(size), str_(NULL) { if( size > 0 ) { str_ = new char[size_+1]; if (str_ != NULL) { memcpy(str_, str, size_ * sizeof(char)); str_[size_] = '\0'; } else { size_ = 0; } } } /*! \brief Constructs a string populated from a null-terminated value. * * \param str the null-terminated initial value of the string instance. * If NULL, the string is left empty, with a size of 0. */ string(const char * str) : size_(0), str_(NULL) { if( str ) { size_= ::strlen(str); } if( size_ > 0 ) { str_ = new char[size_ + 1]; if (str_ != NULL) { memcpy(str_, str, (size_ + 1) * sizeof(char)); } } } void resize( ::size_t n ) { if( size_ == n ) { return; } if (n == 0) { if( str_ ) { delete [] str_; } str_ = NULL; size_ = 0; } else { char *newString = new char[n + 1]; ::size_t copySize = n; if( size_ < n ) { copySize = size_; } size_ = n; if(str_) { memcpy(newString, str_, (copySize + 1) * sizeof(char)); } if( copySize < size_ ) { memset(newString + copySize, 0, size_ - copySize); } newString[size_] = '\0'; delete [] str_; str_ = newString; } } const char& operator[] ( ::size_t pos ) const { return str_[pos]; } char& operator[] ( ::size_t pos ) { return str_[pos]; } /*! \brief Copies the value of another string to this one. * * \param rhs the string to copy. * * \returns a reference to the modified instance. */ string& operator=(const string& rhs) { if (this == &rhs) { return *this; } if( str_ != NULL ) { delete [] str_; str_ = NULL; size_ = 0; } if (rhs.size_ == 0 || rhs.str_ == NULL) { str_ = NULL; size_ = 0; } else { str_ = new char[rhs.size_ + 1]; size_ = rhs.size_; if (str_ != NULL) { memcpy(str_, rhs.str_, (size_ + 1) * sizeof(char)); } else { size_ = 0; } } return *this; } /*! \brief Constructs a string by copying the value of another instance. * * \param rhs the string to copy. */ string(const string& rhs) : size_(0), str_(NULL) { *this = rhs; } //! \brief Destructor - frees memory used to hold the current value. ~string() { delete[] str_; str_ = NULL; } //! \brief Queries the length of the string, excluding any added '\0's. ::size_t size(void) const { return size_; } //! \brief Queries the length of the string, excluding any added '\0's. ::size_t length(void) const { return size(); } /*! \brief Returns a pointer to the private copy held by this instance, * or "" if empty/unset. */ const char * c_str(void) const { return (str_) ? str_ : "";} } CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; typedef cl::string STRING_CLASS; #endif // #elif !defined(__USE_DEV_STRING) #if !defined(__USE_DEV_VECTOR) && !defined(__NO_STD_VECTOR) #define VECTOR_CLASS std::vector #elif !defined(__USE_DEV_VECTOR) #define VECTOR_CLASS cl::vector #if !defined(__MAX_DEFAULT_VECTOR_SIZE) #define __MAX_DEFAULT_VECTOR_SIZE 10 #endif /*! \class vector * \brief Fixed sized vector implementation that mirroring * * \note Deprecated. Please use std::vector as default or * re-define the vector class to match the std::vector * interface by defining VECTOR_CLASS * \note Not recommended for use with custom objects as * current implementation will construct N elements * * std::vector functionality. * \brief Fixed sized vector compatible with std::vector. 
* * \note * This differs from std::vector<> not just in memory allocation, * but also in terms of when members are constructed, destroyed, * and assigned instead of being copy constructed. * * \param T type of element contained in the vector. * * \param N maximum size of the vector. */ template class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED vector { private: T data_[N]; unsigned int size_; public: //! \brief Constructs an empty vector with no memory allocated. vector() : size_(static_cast(0)) {} //! \brief Deallocates the vector's memory and destroys all of its elements. ~vector() { clear(); } //! \brief Returns the number of elements currently contained. unsigned int size(void) const { return size_; } /*! \brief Empties the vector of all elements. * \note * This does not deallocate memory but will invoke destructors * on contained elements. */ void clear() { while(!empty()) { pop_back(); } } /*! \brief Appends an element after the last valid element. * Calling this on a vector that has reached capacity will throw an * exception if exceptions are enabled. */ void push_back (const T& x) { if (size() < N) { new (&data_[size_]) T(x); size_++; } else { detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR); } } /*! \brief Removes the last valid element from the vector. * Calling this on an empty vector will throw an exception * if exceptions are enabled. */ void pop_back(void) { if (size_ != 0) { --size_; data_[size_].~T(); } else { detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR); } } /*! \brief Constructs with a value copied from another. * * \param vec the vector to copy. */ vector(const vector& vec) : size_(vec.size_) { if (size_ != 0) { assign(vec.begin(), vec.end()); } } /*! \brief Constructs with a specified number of initial elements. * * \param size number of initial elements. * * \param val value of initial elements. */ vector(unsigned int size, const T& val = T()) : size_(0) { for (unsigned int i = 0; i < size; i++) { push_back(val); } } /*! \brief Overwrites the current content with that copied from another * instance. * * \param rhs vector to copy. * * \returns a reference to this. */ vector& operator=(const vector& rhs) { if (this == &rhs) { return *this; } if (rhs.size_ != 0) { assign(rhs.begin(), rhs.end()); } else { clear(); } return *this; } /*! \brief Tests equality against another instance. * * \param vec the vector against which to compare. */ bool operator==(vector &vec) { if (size() != vec.size()) { return false; } for( unsigned int i = 0; i < size(); ++i ) { if( operator[](i) != vec[i] ) { return false; } } return true; } //! \brief Conversion operator to T*. operator T* () { return data_; } //! \brief Conversion operator to const T*. operator const T* () const { return data_; } //! \brief Tests whether this instance has any elements. bool empty (void) const { return size_==0; } //! \brief Returns the maximum number of elements this instance can hold. unsigned int max_size (void) const { return N; } //! \brief Returns the maximum number of elements this instance can hold. unsigned int capacity () const { return N; } //! \brief Resizes the vector to the given size void resize(unsigned int newSize, T fill = T()) { if (newSize > N) { detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR); } else { while (size_ < newSize) { new (&data_[size_]) T(fill); size_++; } while (size_ > newSize) { --size_; data_[size_].~T(); } } } /*! \brief Returns a reference to a given element. 
 *
 * \param index which element to access.
 *
 * \note
 * The caller is responsible for ensuring index is >= 0 and < size().
 */
T& operator[](int index)
{
    return data_[index];
}

/*! \brief Returns a const reference to a given element.
 *
 * \param index which element to access.
 *
 * \note
 * The caller is responsible for ensuring index is >= 0 and < size().
 */
const T& operator[](int index) const
{
    return data_[index];
}

/*! \brief Assigns elements of the vector based on a source iterator range.
 *
 * \param start Beginning iterator of source range
 * \param end End iterator of source range
 *
 * \note
 * Will throw an exception if exceptions are enabled and size exceeded.
 */
template<class I>
void assign(I start, I end)
{
    clear();
    while(start != end) {
        push_back(*start);
        start++;
    }
}

/*! \class iterator
 * \brief Const iterator class for vectors
 */
class iterator
{
private:
    const vector<T,N> *vec_;
    int index_;

    /**
     * Internal iterator constructor to capture reference
     * to the vector it iterates over rather than taking
     * the vector by copy.
     */
    iterator (const vector<T,N> &vec, int index) : vec_(&vec)
    {
        if( !vec.empty() ) {
            index_ = index;
        } else {
            index_ = -1;
        }
    }

public:
    iterator(void) : vec_(NULL), index_(-1)
    {
    }

    iterator(const iterator& rhs) : vec_(rhs.vec_), index_(rhs.index_)
    {
    }

    ~iterator(void) {}

    static iterator begin(const cl::vector<T,N> &vec)
    {
        iterator i(vec, 0);
        return i;
    }

    static iterator end(const cl::vector<T,N> &vec)
    {
        iterator i(vec, vec.size());
        return i;
    }

    bool operator==(iterator i)
    {
        return ((vec_ == i.vec_) && (index_ == i.index_));
    }

    bool operator!=(iterator i)
    {
        return (!(*this==i));
    }

    iterator& operator++()
    {
        ++index_;
        return *this;
    }

    iterator operator++(int)
    {
        iterator retVal(*this);
        ++index_;
        return retVal;
    }

    iterator& operator--()
    {
        --index_;
        return *this;
    }

    iterator operator--(int)
    {
        iterator retVal(*this);
        --index_;
        return retVal;
    }

    const T& operator *() const
    {
        return (*vec_)[index_];
    }
};

iterator begin(void)
{
    return iterator::begin(*this);
}

iterator begin(void) const
{
    return iterator::begin(*this);
}

iterator end(void)
{
    return iterator::end(*this);
}

iterator end(void) const
{
    return iterator::end(*this);
}

T& front(void)
{
    return data_[0];
}

T& back(void)
{
    // Fixed: the last valid element is data_[size_-1]; returning
    // data_[size_] referenced one past the live range.
    return data_[size_-1];
}

const T& front(void) const
{
    return data_[0];
}

const T& back(void) const
{
    return data_[size_-1];
}
} CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

#endif // #if !defined(__USE_DEV_VECTOR) && !defined(__NO_STD_VECTOR)

namespace detail {
#define __DEFAULT_NOT_INITIALIZED 1
#define __DEFAULT_BEING_INITIALIZED 2
#define __DEFAULT_INITIALIZED 4

/*
 * Compare and exchange primitives are needed for handling of defaults
 */
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
inline int compare_exchange(std::atomic<int> * dest, int exchange, int comparand)
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
inline int compare_exchange(volatile int * dest, int exchange, int comparand)
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
{
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
    std::atomic_compare_exchange_strong(dest, &comparand, exchange);
    return comparand;
#elif _MSC_VER
    return (int)(_InterlockedCompareExchange(
        (volatile long*)dest,
        (long)exchange,
        (long)comparand));
#else // !_MSC_VER && !CL_HPP_CPP11_ATOMICS_SUPPORTED
    return (__sync_val_compare_and_swap(
        dest,
        comparand,
        exchange));
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
}

inline void fence()
{
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
    std::atomic_thread_fence(std::memory_order_seq_cst);
#elif _MSC_VER // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    _ReadWriteBarrier();
#else // !_MSC_VER && !CL_HPP_CPP11_ATOMICS_SUPPORTED
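/* Illustrative sketch, not part of the Khronos header: basic use of the
 * deprecated fixed-capacity cl::vector defined above. It is only compiled
 * when __NO_STD_VECTOR is defined (so VECTOR_CLASS is cl::vector); the
 * element values here are made up for the example.
 *
 * \code
 * cl::vector<int, 4> v;          // capacity N = 4, storage is in-object
 * v.push_back(1);
 * v.push_back(2);
 * for (unsigned int i = 0; i < v.size(); ++i) {
 *     int x = v[i];              // bounds are the caller's responsibility
 *     (void)x;
 * }
 * v.pop_back();                  // destroys the last element in place
 * \endcode
 */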
__sync_synchronize(); #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED } } // namespace detail /*! \brief class used to interface between C++ and * OpenCL C calls that require arrays of size_t values, whose * size is known statically. */ template class size_t { private: ::size_t data_[N]; public: //! \brief Initialize size_t to all 0s size_t() { for( int i = 0; i < N; ++i ) { data_[i] = 0; } } ::size_t& operator[](int index) { return data_[index]; } const ::size_t& operator[](int index) const { return data_[index]; } //! \brief Conversion operator to T*. operator ::size_t* () { return data_; } //! \brief Conversion operator to const T*. operator const ::size_t* () const { return data_; } }; namespace detail { // Generic getInfoHelper. The final parameter is used to guide overload // resolution: the actual parameter passed is an int, which makes this // a worse conversion sequence than a specialization that declares the // parameter as an int. template inline cl_int getInfoHelper(Functor f, cl_uint name, T* param, long) { return f(name, sizeof(T), param, NULL); } // Specialized getInfoHelper for VECTOR_CLASS params template inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS* param, long) { ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } T* value = (T*) alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } param->assign(&value[0], &value[required/sizeof(T)]); return CL_SUCCESS; } /* Specialization for reference-counted types. This depends on the * existence of Wrapper::cl_type, and none of the other types having the * cl_type member. Note that simplify specifying the parameter as Wrapper * does not work, because when using a derived type (e.g. Context) the generic * template will provide a better match. 
*/ template inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS* param, int, typename T::cl_type = 0) { ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } typename T::cl_type * value = (typename T::cl_type *) alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } ::size_t elements = required / sizeof(typename T::cl_type); param->assign(&value[0], &value[elements]); for (::size_t i = 0; i < elements; i++) { if (value[i] != NULL) { err = (*param)[i].retain(); if (err != CL_SUCCESS) { return err; } } } return CL_SUCCESS; } // Specialized for getInfo template inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS* param, int) { cl_int err = f(name, param->size() * sizeof(char *), &(*param)[0], NULL); if (err != CL_SUCCESS) { return err; } return CL_SUCCESS; } // Specialized GetInfoHelper for STRING_CLASS params template inline cl_int getInfoHelper(Func f, cl_uint name, STRING_CLASS* param, long) { #if defined(__NO_STD_VECTOR) || defined(__NO_STD_STRING) ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } char* value = (char*)alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } *param = value; return CL_SUCCESS; #else ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } // std::string has a constant data member // a char vector does not VECTOR_CLASS value(required); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { param->assign(value.begin(), value.end()); } #endif return CL_SUCCESS; } // Specialized GetInfoHelper for cl::size_t params template inline cl_int getInfoHelper(Func f, cl_uint name, size_t* param, long) { ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } ::size_t* value = (::size_t*) alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } for(int i = 0; i < N; ++i) { (*param)[i] = value[i]; } return CL_SUCCESS; } template struct ReferenceHandler; /* Specialization for reference-counted types. This depends on the * existence of Wrapper::cl_type, and none of the other types having the * cl_type member. Note that simplify specifying the parameter as Wrapper * does not work, because when using a derived type (e.g. Context) the generic * template will provide a better match. 
*/ template inline cl_int getInfoHelper(Func f, cl_uint name, T* param, int, typename T::cl_type = 0) { typename T::cl_type value; cl_int err = f(name, sizeof(value), &value, NULL); if (err != CL_SUCCESS) { return err; } *param = value; if (value != NULL) { err = param->retain(); if (err != CL_SUCCESS) { return err; } } return CL_SUCCESS; } #define __PARAM_NAME_INFO_1_0(F) \ F(cl_platform_info, CL_PLATFORM_PROFILE, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_VERSION, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_NAME, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_VENDOR, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_EXTENSIONS, STRING_CLASS) \ \ F(cl_device_info, CL_DEVICE_TYPE, cl_device_type) \ F(cl_device_info, CL_DEVICE_VENDOR_ID, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_COMPUTE_UNITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_GROUP_SIZE, ::size_t) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_SIZES, VECTOR_CLASS< ::size_t>) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_CLOCK_FREQUENCY, cl_uint) \ F(cl_device_info, CL_DEVICE_ADDRESS_BITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_READ_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_MEM_ALLOC_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_WIDTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_HEIGHT, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_WIDTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_HEIGHT, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_DEPTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_MAX_PARAMETER_SIZE, ::size_t) \ F(cl_device_info, CL_DEVICE_MAX_SAMPLERS, cl_uint) \ F(cl_device_info, CL_DEVICE_MEM_BASE_ADDR_ALIGN, cl_uint) \ F(cl_device_info, CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_SINGLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, cl_device_mem_cache_type) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, cl_uint)\ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_TYPE, cl_device_local_mem_type) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_ERROR_CORRECTION_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_PROFILING_TIMER_RESOLUTION, ::size_t) \ F(cl_device_info, CL_DEVICE_ENDIAN_LITTLE, cl_bool) \ F(cl_device_info, CL_DEVICE_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_COMPILER_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_EXECUTION_CAPABILITIES, cl_device_exec_capabilities) \ F(cl_device_info, CL_DEVICE_QUEUE_PROPERTIES, cl_command_queue_properties) \ F(cl_device_info, CL_DEVICE_PLATFORM, cl_platform_id) \ F(cl_device_info, CL_DEVICE_NAME, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_VENDOR, STRING_CLASS) \ F(cl_device_info, 
CL_DRIVER_VERSION, STRING_CLASS) \
    F(cl_device_info, CL_DEVICE_PROFILE, STRING_CLASS) \
    F(cl_device_info, CL_DEVICE_VERSION, STRING_CLASS) \
    F(cl_device_info, CL_DEVICE_EXTENSIONS, STRING_CLASS) \
    \
    F(cl_context_info, CL_CONTEXT_REFERENCE_COUNT, cl_uint) \
    F(cl_context_info, CL_CONTEXT_DEVICES, VECTOR_CLASS<Device>) \
    F(cl_context_info, CL_CONTEXT_PROPERTIES, VECTOR_CLASS<cl_context_properties>) \
    \
    F(cl_event_info, CL_EVENT_COMMAND_QUEUE, cl::CommandQueue) \
    F(cl_event_info, CL_EVENT_COMMAND_TYPE, cl_command_type) \
    F(cl_event_info, CL_EVENT_REFERENCE_COUNT, cl_uint) \
    F(cl_event_info, CL_EVENT_COMMAND_EXECUTION_STATUS, cl_int) \
    \
    F(cl_profiling_info, CL_PROFILING_COMMAND_QUEUED, cl_ulong) \
    F(cl_profiling_info, CL_PROFILING_COMMAND_SUBMIT, cl_ulong) \
    F(cl_profiling_info, CL_PROFILING_COMMAND_START, cl_ulong) \
    F(cl_profiling_info, CL_PROFILING_COMMAND_END, cl_ulong) \
    \
    F(cl_mem_info, CL_MEM_TYPE, cl_mem_object_type) \
    F(cl_mem_info, CL_MEM_FLAGS, cl_mem_flags) \
    F(cl_mem_info, CL_MEM_SIZE, ::size_t) \
    F(cl_mem_info, CL_MEM_HOST_PTR, void*) \
    F(cl_mem_info, CL_MEM_MAP_COUNT, cl_uint) \
    F(cl_mem_info, CL_MEM_REFERENCE_COUNT, cl_uint) \
    F(cl_mem_info, CL_MEM_CONTEXT, cl::Context) \
    \
    F(cl_image_info, CL_IMAGE_FORMAT, cl_image_format) \
    F(cl_image_info, CL_IMAGE_ELEMENT_SIZE, ::size_t) \
    F(cl_image_info, CL_IMAGE_ROW_PITCH, ::size_t) \
    F(cl_image_info, CL_IMAGE_SLICE_PITCH, ::size_t) \
    F(cl_image_info, CL_IMAGE_WIDTH, ::size_t) \
    F(cl_image_info, CL_IMAGE_HEIGHT, ::size_t) \
    F(cl_image_info, CL_IMAGE_DEPTH, ::size_t) \
    \
    F(cl_sampler_info, CL_SAMPLER_REFERENCE_COUNT, cl_uint) \
    F(cl_sampler_info, CL_SAMPLER_CONTEXT, cl::Context) \
    F(cl_sampler_info, CL_SAMPLER_NORMALIZED_COORDS, cl_bool) \
    F(cl_sampler_info, CL_SAMPLER_ADDRESSING_MODE, cl_addressing_mode) \
    F(cl_sampler_info, CL_SAMPLER_FILTER_MODE, cl_filter_mode) \
    \
    F(cl_program_info, CL_PROGRAM_REFERENCE_COUNT, cl_uint) \
    F(cl_program_info, CL_PROGRAM_CONTEXT, cl::Context) \
    F(cl_program_info, CL_PROGRAM_NUM_DEVICES, cl_uint) \
    F(cl_program_info, CL_PROGRAM_DEVICES, VECTOR_CLASS<Device>) \
    F(cl_program_info, CL_PROGRAM_SOURCE, STRING_CLASS) \
    F(cl_program_info, CL_PROGRAM_BINARY_SIZES, VECTOR_CLASS< ::size_t>) \
    F(cl_program_info, CL_PROGRAM_BINARIES, VECTOR_CLASS<char *>) \
    \
    F(cl_program_build_info, CL_PROGRAM_BUILD_STATUS, cl_build_status) \
    F(cl_program_build_info, CL_PROGRAM_BUILD_OPTIONS, STRING_CLASS) \
    F(cl_program_build_info, CL_PROGRAM_BUILD_LOG, STRING_CLASS) \
    \
    F(cl_kernel_info, CL_KERNEL_FUNCTION_NAME, STRING_CLASS) \
    F(cl_kernel_info, CL_KERNEL_NUM_ARGS, cl_uint) \
    F(cl_kernel_info, CL_KERNEL_REFERENCE_COUNT, cl_uint) \
    F(cl_kernel_info, CL_KERNEL_CONTEXT, cl::Context) \
    F(cl_kernel_info, CL_KERNEL_PROGRAM, cl::Program) \
    \
    F(cl_kernel_work_group_info, CL_KERNEL_WORK_GROUP_SIZE, ::size_t) \
    F(cl_kernel_work_group_info, CL_KERNEL_COMPILE_WORK_GROUP_SIZE, cl::size_t<3>) \
    F(cl_kernel_work_group_info, CL_KERNEL_LOCAL_MEM_SIZE, cl_ulong) \
    \
    F(cl_command_queue_info, CL_QUEUE_CONTEXT, cl::Context) \
    F(cl_command_queue_info, CL_QUEUE_DEVICE, cl::Device) \
    F(cl_command_queue_info, CL_QUEUE_REFERENCE_COUNT, cl_uint) \
    F(cl_command_queue_info, CL_QUEUE_PROPERTIES, cl_command_queue_properties)

#if defined(CL_VERSION_1_1)
#define __PARAM_NAME_INFO_1_1(F) \
    F(cl_context_info, CL_CONTEXT_NUM_DEVICES, cl_uint) \
    F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, cl_uint) \
    F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, cl_uint) \
    F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, cl_uint) \
    F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, cl_uint) \
    F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, cl_uint) \
    F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, cl_uint) \
    F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, cl_uint) \
    F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, cl_uint) \
    F(cl_device_info, CL_DEVICE_DOUBLE_FP_CONFIG, cl_device_fp_config) \
    F(cl_device_info, CL_DEVICE_HALF_FP_CONFIG, cl_device_fp_config) \
    F(cl_device_info, CL_DEVICE_HOST_UNIFIED_MEMORY, cl_bool) \
    F(cl_device_info, CL_DEVICE_OPENCL_C_VERSION, STRING_CLASS) \
    \
    F(cl_mem_info, CL_MEM_ASSOCIATED_MEMOBJECT, cl::Memory) \
    F(cl_mem_info, CL_MEM_OFFSET, ::size_t) \
    \
    F(cl_kernel_work_group_info, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, ::size_t) \
    F(cl_kernel_work_group_info, CL_KERNEL_PRIVATE_MEM_SIZE, cl_ulong) \
    \
    F(cl_event_info, CL_EVENT_CONTEXT, cl::Context)
#endif // CL_VERSION_1_1

#if defined(CL_VERSION_1_2)
#define __PARAM_NAME_INFO_1_2(F) \
    F(cl_image_info, CL_IMAGE_BUFFER, cl::Buffer) \
    \
    F(cl_program_info, CL_PROGRAM_NUM_KERNELS, ::size_t) \
    F(cl_program_info, CL_PROGRAM_KERNEL_NAMES, STRING_CLASS) \
    \
    F(cl_program_build_info, CL_PROGRAM_BINARY_TYPE, cl_program_binary_type) \
    \
    F(cl_kernel_info, CL_KERNEL_ATTRIBUTES, STRING_CLASS) \
    \
    F(cl_kernel_arg_info, CL_KERNEL_ARG_ADDRESS_QUALIFIER, cl_kernel_arg_address_qualifier) \
    F(cl_kernel_arg_info, CL_KERNEL_ARG_ACCESS_QUALIFIER, cl_kernel_arg_access_qualifier) \
    F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_NAME, STRING_CLASS) \
    F(cl_kernel_arg_info, CL_KERNEL_ARG_NAME, STRING_CLASS) \
    F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_QUALIFIER, cl_kernel_arg_type_qualifier) \
    \
    F(cl_device_info, CL_DEVICE_PARENT_DEVICE, cl_device_id) \
    F(cl_device_info, CL_DEVICE_PARTITION_PROPERTIES, VECTOR_CLASS<cl_device_partition_property>) \
    F(cl_device_info, CL_DEVICE_PARTITION_TYPE, VECTOR_CLASS<cl_device_partition_property>) \
    F(cl_device_info, CL_DEVICE_REFERENCE_COUNT, cl_uint) \
    F(cl_device_info, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, ::size_t) \
    F(cl_device_info, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, cl_device_affinity_domain) \
    F(cl_device_info, CL_DEVICE_BUILT_IN_KERNELS, STRING_CLASS)
#endif // #if defined(CL_VERSION_1_2)

#if defined(USE_CL_DEVICE_FISSION)
#define __PARAM_NAME_DEVICE_FISSION(F) \
    F(cl_device_info, CL_DEVICE_PARENT_DEVICE_EXT, cl_device_id) \
    F(cl_device_info, CL_DEVICE_PARTITION_TYPES_EXT, VECTOR_CLASS<cl_device_partition_property_ext>) \
    F(cl_device_info, CL_DEVICE_AFFINITY_DOMAINS_EXT, VECTOR_CLASS<cl_device_partition_property_ext>) \
    F(cl_device_info, CL_DEVICE_REFERENCE_COUNT_EXT , cl_uint) \
    F(cl_device_info, CL_DEVICE_PARTITION_STYLE_EXT, VECTOR_CLASS<cl_device_partition_property_ext>)
#endif // USE_CL_DEVICE_FISSION

template <typename enum_type, cl_int Name>
struct param_traits {};

#define __CL_DECLARE_PARAM_TRAITS(token, param_name, T) \
struct token;                                        \
template<>                                           \
struct param_traits<detail:: token,param_name>       \
{                                                    \
    enum { value = param_name };                     \
    typedef T param_type;                            \
};

__PARAM_NAME_INFO_1_0(__CL_DECLARE_PARAM_TRAITS)
#if defined(CL_VERSION_1_1)
__PARAM_NAME_INFO_1_1(__CL_DECLARE_PARAM_TRAITS)
#endif // CL_VERSION_1_1
#if defined(CL_VERSION_1_2)
__PARAM_NAME_INFO_1_2(__CL_DECLARE_PARAM_TRAITS)
#endif // CL_VERSION_1_2

#if defined(USE_CL_DEVICE_FISSION)
__PARAM_NAME_DEVICE_FISSION(__CL_DECLARE_PARAM_TRAITS);
#endif // USE_CL_DEVICE_FISSION

#ifdef CL_PLATFORM_ICD_SUFFIX_KHR
__CL_DECLARE_PARAM_TRAITS(cl_platform_info, CL_PLATFORM_ICD_SUFFIX_KHR, STRING_CLASS)
#endif

#ifdef CL_DEVICE_PROFILING_TIMER_OFFSET_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PROFILING_TIMER_OFFSET_AMD, cl_ulong)
#endif

#ifdef CL_DEVICE_GLOBAL_FREE_MEMORY_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_FREE_MEMORY_AMD, VECTOR_CLASS< ::size_t>)
#endif
#ifdef CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_SIMD_WIDTH_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_WIDTH_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_WAVEFRONT_WIDTH_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_WAVEFRONT_WIDTH_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_LOCAL_MEM_BANKS_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_LOCAL_MEM_BANKS_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD, ::size_t)
#endif
#ifdef CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD, ::size_t)
#endif
#ifdef CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD, ::size_t)
#endif

#ifdef CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV, cl_uint)
#endif
#ifdef CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV, cl_uint)
#endif
#ifdef CL_DEVICE_REGISTERS_PER_BLOCK_NV
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_REGISTERS_PER_BLOCK_NV, cl_uint)
#endif
#ifdef CL_DEVICE_WARP_SIZE_NV
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_WARP_SIZE_NV, cl_uint)
#endif
#ifdef CL_DEVICE_GPU_OVERLAP_NV
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GPU_OVERLAP_NV, cl_bool)
#endif
#ifdef CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV, cl_bool)
#endif
#ifdef CL_DEVICE_INTEGRATED_MEMORY_NV
__CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_INTEGRATED_MEMORY_NV, cl_bool)
#endif

// Convenience functions

template <typename Func, typename T>
inline cl_int
getInfo(Func f, cl_uint name, T* param)
{
    return getInfoHelper(f, name, param, 0);
}

template <typename Func, typename Arg0>
struct GetInfoFunctor0
{
    Func f_; const Arg0& arg0_;
    cl_int operator ()(
        cl_uint param, ::size_t size, void* value, ::size_t* size_ret)
    { return f_(arg0_, param, size, value, size_ret); }
};

template <typename Func, typename Arg0, typename Arg1>
struct GetInfoFunctor1
{
    Func f_; const Arg0& arg0_; const Arg1& arg1_;
    cl_int operator ()(
        cl_uint param, ::size_t size, void* value, ::size_t* size_ret)
    { return f_(arg0_, arg1_, param, size, value, size_ret); }
};

template <typename Func, typename Arg0, typename T>
inline cl_int
getInfo(Func f, const Arg0& arg0, cl_uint name, T* param)
{
    GetInfoFunctor0<Func, Arg0> f0 = { f, arg0 };
    return getInfoHelper(f0, name, param, 0);
}

template <typename Func, typename Arg0, typename Arg1, typename T>
inline cl_int
getInfo(Func f, const Arg0& arg0, const Arg1& arg1, cl_uint name, T* param)
{
    GetInfoFunctor1<Func, Arg0, Arg1> f0 = { f, arg0, arg1 };
    return getInfoHelper(f0, name, param, 0);
}

template<typename T>
struct ReferenceHandler
{ };

#if defined(CL_VERSION_1_2)
/**
 * OpenCL 1.2 devices do have retain/release.
 */
template <>
struct ReferenceHandler<cl_device_id>
{
    /**
     * Retain the device.
     * \param device A valid device created using createSubDevices
     * \return
     *   CL_SUCCESS if the function executed successfully.
     *   CL_INVALID_DEVICE if device was not a valid subdevice
     *   CL_OUT_OF_RESOURCES
     *   CL_OUT_OF_HOST_MEMORY
     */
    static cl_int retain(cl_device_id device)
    { return ::clRetainDevice(device); }
    /**
     * Release the device.
     * \param device A valid device created using createSubDevices
     * \return
     *   CL_SUCCESS if the function executed successfully.
     *   CL_INVALID_DEVICE if device was not a valid subdevice
     *   CL_OUT_OF_RESOURCES
     *   CL_OUT_OF_HOST_MEMORY
     */
    static cl_int release(cl_device_id device)
    { return ::clReleaseDevice(device); }
};
#else // #if defined(CL_VERSION_1_2)
/**
 * OpenCL 1.1 devices do not have retain/release.
 */
template <>
struct ReferenceHandler<cl_device_id>
{
    // cl_device_id does not have retain().
    static cl_int retain(cl_device_id)
    { return CL_SUCCESS; }
    // cl_device_id does not have release().
    static cl_int release(cl_device_id)
    { return CL_SUCCESS; }
};
#endif // #if defined(CL_VERSION_1_2)

template <>
struct ReferenceHandler<cl_platform_id>
{
    // cl_platform_id does not have retain().
    static cl_int retain(cl_platform_id)
    { return CL_SUCCESS; }
    // cl_platform_id does not have release().
    static cl_int release(cl_platform_id)
    { return CL_SUCCESS; }
};

template <>
struct ReferenceHandler<cl_context>
{
    static cl_int retain(cl_context context)
    { return ::clRetainContext(context); }
    static cl_int release(cl_context context)
    { return ::clReleaseContext(context); }
};

template <>
struct ReferenceHandler<cl_command_queue>
{
    static cl_int retain(cl_command_queue queue)
    { return ::clRetainCommandQueue(queue); }
    static cl_int release(cl_command_queue queue)
    { return ::clReleaseCommandQueue(queue); }
};

template <>
struct ReferenceHandler<cl_mem>
{
    static cl_int retain(cl_mem memory)
    { return ::clRetainMemObject(memory); }
    static cl_int release(cl_mem memory)
    { return ::clReleaseMemObject(memory); }
};

template <>
struct ReferenceHandler<cl_sampler>
{
    static cl_int retain(cl_sampler sampler)
    { return ::clRetainSampler(sampler); }
    static cl_int release(cl_sampler sampler)
    { return ::clReleaseSampler(sampler); }
};

template <>
struct ReferenceHandler<cl_program>
{
    static cl_int retain(cl_program program)
    { return ::clRetainProgram(program); }
    static cl_int release(cl_program program)
    { return ::clReleaseProgram(program); }
};

template <>
struct ReferenceHandler<cl_kernel>
{
    static cl_int retain(cl_kernel kernel)
    { return ::clRetainKernel(kernel); }
    static cl_int release(cl_kernel kernel)
    { return ::clReleaseKernel(kernel); }
};

template <>
struct ReferenceHandler<cl_event>
{
    static cl_int retain(cl_event event)
    { return ::clRetainEvent(event); }
    static cl_int release(cl_event event)
    { return ::clReleaseEvent(event); }
};

// Extracts version number with major in the upper 16 bits, minor in the lower 16
static cl_uint getVersion(const char *versionInfo)
{
    int highVersion = 0;
    int lowVersion = 0;
    int index = 7;
    while(versionInfo[index] != '.' ) {
        highVersion *= 10;
        highVersion += versionInfo[index]-'0';
        ++index;
    }
    ++index;
    while(versionInfo[index] != ' ' && versionInfo[index] != '\0') {
        lowVersion *= 10;
        lowVersion += versionInfo[index]-'0';
        ++index;
    }
    return (highVersion << 16) | lowVersion;
}

static cl_uint getPlatformVersion(cl_platform_id platform)
{
    ::size_t size = 0;
    clGetPlatformInfo(platform, CL_PLATFORM_VERSION, 0, NULL, &size);
    char *versionInfo = (char *) alloca(size);
    clGetPlatformInfo(platform, CL_PLATFORM_VERSION, size, &versionInfo[0], &size);
    return getVersion(versionInfo);
}

static cl_uint getDevicePlatformVersion(cl_device_id device)
{
    cl_platform_id platform;
    clGetDeviceInfo(device, CL_DEVICE_PLATFORM, sizeof(platform), &platform, NULL);
    return getPlatformVersion(platform);
}

#if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
static cl_uint getContextPlatformVersion(cl_context context)
{
    // The platform cannot be queried directly, so we first have to grab a
    // device and obtain its platform
    ::size_t size = 0;
    clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &size);
    if (size == 0)
        return 0;
    cl_device_id *devices = (cl_device_id *) alloca(size);
    clGetContextInfo(context, CL_CONTEXT_DEVICES, size, devices, NULL);
    return getDevicePlatformVersion(devices[0]);
}
#endif // #if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)

template <typename T>
class Wrapper
{
public:
    typedef T cl_type;

protected:
    cl_type object_;

public:
    Wrapper() : object_(NULL) { }

    Wrapper(const cl_type &obj) : object_(obj) { }

    ~Wrapper()
    {
        if (object_ != NULL) { release(); }
    }

    Wrapper(const Wrapper<cl_type>& rhs)
    {
        object_ = rhs.object_;
        if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); }
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    Wrapper(Wrapper<cl_type>&& rhs) CL_HPP_NOEXCEPT
    {
        object_ = rhs.object_;
        rhs.object_ = NULL;
    }
#endif

    Wrapper<cl_type>& operator = (const Wrapper<cl_type>& rhs)
    {
        if (this != &rhs) {
            if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
            object_ = rhs.object_;
            if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); }
        }
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    Wrapper<cl_type>& operator = (Wrapper<cl_type>&& rhs)
    {
        if (this != &rhs) {
            if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
            object_ = rhs.object_;
            rhs.object_ = NULL;
        }
        return *this;
    }
#endif

    Wrapper<cl_type>& operator = (const cl_type &rhs)
    {
        if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
        object_ = rhs;
        return *this;
    }

    cl_type operator ()() const { return object_; }

    cl_type& operator ()() { return object_; }

protected:
    template<typename Func, typename U>
    friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type);

    cl_int retain() const
    {
        return ReferenceHandler<cl_type>::retain(object_);
    }

    cl_int release() const
    {
        return ReferenceHandler<cl_type>::release(object_);
    }
};

template <>
class Wrapper<cl_device_id>
{
public:
    typedef cl_device_id cl_type;

protected:
    cl_type object_;
    bool referenceCountable_;

    static bool isReferenceCountable(cl_device_id device)
    {
        bool retVal = false;
        if (device != NULL) {
            int version = getDevicePlatformVersion(device);
            if(version > ((1 << 16) + 1)) {
                retVal = true;
            }
        }
        return retVal;
    }

public:
    Wrapper() : object_(NULL), referenceCountable_(false)
    { }

    Wrapper(const cl_type &obj) : object_(obj), referenceCountable_(false)
    {
        referenceCountable_ = isReferenceCountable(obj);
    }

    ~Wrapper()
    {
        if (object_ != NULL) { release(); }
    }

    Wrapper(const Wrapper<cl_type>& rhs)
    {
        object_ = rhs.object_;
        referenceCountable_ = isReferenceCountable(object_);
        if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); }
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    Wrapper(Wrapper<cl_type>&& rhs) CL_HPP_NOEXCEPT
    {
        object_ = rhs.object_;
        referenceCountable_ = rhs.referenceCountable_;
        rhs.object_ = NULL;
        rhs.referenceCountable_ = false;
    }
#endif

    Wrapper<cl_type>& operator = (const Wrapper<cl_type>& rhs)
    {
        if (this != &rhs) {
            if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
            object_ = rhs.object_;
            referenceCountable_ = rhs.referenceCountable_;
            if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); }
        }
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    Wrapper<cl_type>& operator = (Wrapper<cl_type>&& rhs)
    {
        if (this != &rhs) {
            if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
            object_ = rhs.object_;
            referenceCountable_ = rhs.referenceCountable_;
            rhs.object_ = NULL;
            rhs.referenceCountable_ = false;
        }
        return *this;
    }
#endif

    Wrapper<cl_type>& operator = (const cl_type &rhs)
    {
        if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); }
        object_ = rhs;
        referenceCountable_ = isReferenceCountable(object_);
        return *this;
    }

    cl_type operator ()() const { return object_; }

    cl_type& operator ()() { return object_; }

protected:
    template<typename Func, typename U>
    friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type);

    template<typename Func, typename U>
    friend inline cl_int getInfoHelper(Func, cl_uint, VECTOR_CLASS<U>*, int, typename U::cl_type);

    cl_int retain() const
    {
        if( referenceCountable_ ) {
            return ReferenceHandler<cl_type>::retain(object_);
        }
        else {
            return CL_SUCCESS;
        }
    }

    cl_int release() const
    {
        if( referenceCountable_ ) {
            return ReferenceHandler<cl_type>::release(object_);
        }
        else {
            return CL_SUCCESS;
        }
    }
};

} // namespace detail
//! \endcond

/*! \struct ImageFormat
 *  \brief Adds constructors and member functions for cl_image_format.
 *
 *  \see cl_image_format
 */
struct ImageFormat : public cl_image_format
{
    //! \brief Default constructor - performs no initialization.
    ImageFormat(){}

    //! \brief Initializing constructor.
    ImageFormat(cl_channel_order order, cl_channel_type type)
    {
        image_channel_order = order;
        image_channel_data_type = type;
    }

    //! \brief Assignment operator.
    ImageFormat& operator = (const ImageFormat& rhs)
    {
        if (this != &rhs) {
            this->image_channel_data_type = rhs.image_channel_data_type;
            this->image_channel_order     = rhs.image_channel_order;
        }
        return *this;
    }
};

/*! \brief Class interface for cl_device_id.
 *
 *  \note Copies of these objects are inexpensive, since they don't 'own'
 *        any underlying resources or data structures.
 *
 *  \see cl_device_id
 */
class Device : public detail::Wrapper<cl_device_id>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Device() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_device_id.
     *
     *  This simply copies the device ID value, which is an inexpensive operation.
     */
    __CL_EXPLICIT_CONSTRUCTORS Device(const cl_device_id &device) : detail::Wrapper<cl_type>(device) { }

    /*! \brief Returns the first device on the default context.
     *
     *  \see Context::getDefault()
     */
    static Device getDefault(cl_int * err = NULL);

    /*! \brief Assignment operator from cl_device_id.
     *
     *  This simply copies the device ID value, which is an inexpensive operation.
     */
    Device& operator = (const cl_device_id& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Device(const Device& dev) : detail::Wrapper<cl_type>(dev) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     * Required for MSVC.
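     *
     * An illustrative usage sketch for this class as a whole (added note,
     * not part of the original header):
     * \code
     * cl_int err;
     * cl::Device dev = cl::Device::getDefault(&err);
     * STRING_CLASS name = dev.getInfo<CL_DEVICE_NAME>();
     * \endcode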
*/ Device& operator = (const Device &dev) { detail::Wrapper::operator=(dev); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Device(Device&& dev) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(dev)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Device& operator = (Device &&dev) { detail::Wrapper::operator=(std::move(dev)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) //! \brief Wrapper for clGetDeviceInfo(). template cl_int getInfo(cl_device_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetDeviceInfo, object_, name, param), __GET_DEVICE_INFO_ERR); } //! \brief Wrapper for clGetDeviceInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_device_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /** * CL 1.2 version */ #if defined(CL_VERSION_1_2) //! \brief Wrapper for clCreateSubDevicesEXT(). cl_int createSubDevices( const cl_device_partition_property * properties, VECTOR_CLASS* devices) { cl_uint n = 0; cl_int err = clCreateSubDevices(object_, properties, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES); } cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id)); err = clCreateSubDevices(object_, properties, n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES); } devices->assign(&ids[0], &ids[n]); return CL_SUCCESS; } #endif // #if defined(CL_VERSION_1_2) /** * CL 1.1 version that uses device fission. */ #if defined(CL_VERSION_1_1) #if defined(USE_CL_DEVICE_FISSION) cl_int createSubDevices( const cl_device_partition_property_ext * properties, VECTOR_CLASS* devices) { typedef CL_API_ENTRY cl_int ( CL_API_CALL * PFN_clCreateSubDevicesEXT)( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1; static PFN_clCreateSubDevicesEXT pfn_clCreateSubDevicesEXT = NULL; __INIT_CL_EXT_FCN_PTR(clCreateSubDevicesEXT); cl_uint n = 0; cl_int err = pfn_clCreateSubDevicesEXT(object_, properties, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES); } cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id)); err = pfn_clCreateSubDevicesEXT(object_, properties, n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES); } devices->assign(&ids[0], &ids[n]); return CL_SUCCESS; } #endif // #if defined(USE_CL_DEVICE_FISSION) #endif // #if defined(CL_VERSION_1_1) }; /*! \brief Class interface for cl_platform_id. * * \note Copies of these objects are inexpensive, since they don't 'own' * any underlying resources or data structures. * * \see cl_platform_id */ class Platform : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Platform() : detail::Wrapper() { } /*! \brief Constructor from cl_platform_id. * * This simply copies the platform ID value, which is an inexpensive operation. */ __CL_EXPLICIT_CONSTRUCTORS Platform(const cl_platform_id &platform) : detail::Wrapper(platform) { } /*! \brief Assignment operator from cl_platform_id. 
* * This simply copies the platform ID value, which is an inexpensive operation. */ Platform& operator = (const cl_platform_id& rhs) { detail::Wrapper::operator=(rhs); return *this; } //! \brief Wrapper for clGetPlatformInfo(). cl_int getInfo(cl_platform_info name, STRING_CLASS* param) const { return detail::errHandler( detail::getInfo(&::clGetPlatformInfo, object_, name, param), __GET_PLATFORM_INFO_ERR); } //! \brief Wrapper for clGetPlatformInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_platform_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Gets a list of devices for this platform. * * Wraps clGetDeviceIDs(). */ cl_int getDevices( cl_device_type type, VECTOR_CLASS* devices) const { cl_uint n = 0; if( devices == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR); } cl_int err = ::clGetDeviceIDs(object_, type, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id)); err = ::clGetDeviceIDs(object_, type, n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } devices->assign(&ids[0], &ids[n]); return CL_SUCCESS; } #if defined(USE_DX_INTEROP) /*! \brief Get the list of available D3D10 devices. * * \param d3d_device_source. * * \param d3d_object. * * \param d3d_device_set. * * \param devices returns a vector of OpenCL D3D10 devices found. The cl::Device * values returned in devices can be used to identify a specific OpenCL * device. If \a devices argument is NULL, this argument is ignored. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully. * * The application can query specific capabilities of the OpenCL device(s) * returned by cl::getDevices. This can be used by the application to * determine which device(s) to use. * * \note In the case that exceptions are enabled and a return value * other than CL_SUCCESS is generated, then cl::Error exception is * generated. */ cl_int getDevices( cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, VECTOR_CLASS* devices) const { typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clGetDeviceIDsFromD3D10KHR)( cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint* num_devices); if( devices == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR); } static PFN_clGetDeviceIDsFromD3D10KHR pfn_clGetDeviceIDsFromD3D10KHR = NULL; __INIT_CL_EXT_FCN_PTR_PLATFORM(object_, clGetDeviceIDsFromD3D10KHR); cl_uint n = 0; cl_int err = pfn_clGetDeviceIDsFromD3D10KHR( object_, d3d_device_source, d3d_object, d3d_device_set, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id)); err = pfn_clGetDeviceIDsFromD3D10KHR( object_, d3d_device_source, d3d_object, d3d_device_set, n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } devices->assign(&ids[0], &ids[n]); return CL_SUCCESS; } #endif /*! \brief Gets a list of available platforms. * * Wraps clGetPlatformIDs(). 
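 *
 * An illustrative usage sketch (added note, not part of the original
 * header):
 * \code
 * VECTOR_CLASS<cl::Platform> platforms;
 * cl_int err = cl::Platform::get(&platforms);
 * if (err == CL_SUCCESS && !platforms.empty()) {
 *     STRING_CLASS name = platforms[0].getInfo<CL_PLATFORM_NAME>();
 * }
 * \endcode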
*/ static cl_int get( VECTOR_CLASS* platforms) { cl_uint n = 0; if( platforms == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_PLATFORM_IDS_ERR); } cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } cl_platform_id* ids = (cl_platform_id*) alloca( n * sizeof(cl_platform_id)); err = ::clGetPlatformIDs(n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } platforms->assign(&ids[0], &ids[n]); return CL_SUCCESS; } /*! \brief Gets the first available platform. * * Wraps clGetPlatformIDs(), returning the first result. */ static cl_int get( Platform * platform) { cl_uint n = 0; if( platform == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_PLATFORM_IDS_ERR); } cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } cl_platform_id* ids = (cl_platform_id*) alloca( n * sizeof(cl_platform_id)); err = ::clGetPlatformIDs(n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } *platform = ids[0]; return CL_SUCCESS; } /*! \brief Gets the first available platform, returning it by value. * * Wraps clGetPlatformIDs(), returning the first result. */ static Platform get( cl_int * errResult = NULL) { Platform platform; cl_uint n = 0; cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { detail::errHandler(err, __GET_PLATFORM_IDS_ERR); if (errResult != NULL) { *errResult = err; } return Platform(); } cl_platform_id* ids = (cl_platform_id*) alloca( n * sizeof(cl_platform_id)); err = ::clGetPlatformIDs(n, ids, NULL); if (err != CL_SUCCESS) { detail::errHandler(err, __GET_PLATFORM_IDS_ERR); if (errResult != NULL) { *errResult = err; } return Platform(); } return Platform(ids[0]); } static Platform getDefault( cl_int *errResult = NULL ) { return get(errResult); } #if defined(CL_VERSION_1_2) //! \brief Wrapper for clUnloadCompiler(). cl_int unloadCompiler() { return ::clUnloadPlatformCompiler(object_); } #endif // #if defined(CL_VERSION_1_2) }; // class Platform /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2)) /** * Unload the OpenCL compiler. * \note Deprecated for OpenCL 1.2. Use Platform::unloadCompiler instead. */ inline CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int UnloadCompiler() CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; inline cl_int UnloadCompiler() { return ::clUnloadCompiler(); } #endif // #if defined(CL_VERSION_1_1) /*! \brief Class interface for cl_context. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_context as the original. For details, see * clRetainContext() and clReleaseContext(). * * \see cl_context */ class Context : public detail::Wrapper { private: #ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED static std::atomic default_initialized_; #else // !CL_HPP_CPP11_ATOMICS_SUPPORTED static volatile int default_initialized_; #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED static Context default_; static volatile cl_int default_error_; public: /*! \brief Constructs a context including a list of specified devices. * * Wraps clCreateContext(). 
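 *
 * An illustrative sketch (added note, not part of the original header;
 * assumes `platform` is a cl::Platform obtained via Platform::get()):
 * \code
 * VECTOR_CLASS<cl::Device> devices;
 * platform.getDevices(CL_DEVICE_TYPE_GPU, &devices);
 * cl_int err;
 * cl::Context ctx(devices, NULL, NULL, NULL, &err);
 * \endcode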
*/ Context( const VECTOR_CLASS& devices, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, ::size_t, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; ::size_t numDevices = devices.size(); cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id)); for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } object_ = ::clCreateContext( properties, (cl_uint) numDevices, deviceIDs, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (err != NULL) { *err = error; } } Context( const Device& device, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, ::size_t, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; cl_device_id deviceID = device(); object_ = ::clCreateContext( properties, 1, &deviceID, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a context including all or a subset of devices of a specified type. * * Wraps clCreateContextFromType(). */ Context( cl_device_type type, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, ::size_t, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; #if !defined(__APPLE__) && !defined(__MACOS) cl_context_properties prop[4] = {CL_CONTEXT_PLATFORM, 0, 0, 0 }; if (properties == NULL) { // Get a valid platform ID as we cannot send in a blank one VECTOR_CLASS platforms; error = Platform::get(&platforms); if (error != CL_SUCCESS) { detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } return; } // Check the platforms we found for a device of our specified type cl_context_properties platform_id = 0; for (unsigned int i = 0; i < platforms.size(); i++) { VECTOR_CLASS devices; #if defined(__CL_ENABLE_EXCEPTIONS) try { #endif error = platforms[i].getDevices(type, &devices); #if defined(__CL_ENABLE_EXCEPTIONS) } catch (Error) {} // Catch if exceptions are enabled as we don't want to exit if first platform has no devices of type // We do error checking next anyway, and can throw there if needed #endif // Only squash CL_SUCCESS and CL_DEVICE_NOT_FOUND if (error != CL_SUCCESS && error != CL_DEVICE_NOT_FOUND) { detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } } if (devices.size() > 0) { platform_id = (cl_context_properties)platforms[i](); break; } } if (platform_id == 0) { detail::errHandler(CL_DEVICE_NOT_FOUND, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = CL_DEVICE_NOT_FOUND; } return; } prop[1] = platform_id; properties = &prop[0]; } #endif object_ = ::clCreateContextFromType( properties, type, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Context(const Context& ctx) : detail::Wrapper(ctx) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Context& operator = (const Context &ctx) { detail::Wrapper::operator=(ctx); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. 
*/ Context(Context&& ctx) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(ctx)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Context& operator = (Context &&ctx) { detail::Wrapper::operator=(std::move(ctx)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Returns a singleton context including all devices of CL_DEVICE_TYPE_DEFAULT. * * \note All calls to this function return the same cl_context as the first. */ static Context getDefault(cl_int * err = NULL) { int state = detail::compare_exchange( &default_initialized_, __DEFAULT_BEING_INITIALIZED, __DEFAULT_NOT_INITIALIZED); if (state & __DEFAULT_INITIALIZED) { if (err != NULL) { *err = default_error_; } return default_; } if (state & __DEFAULT_BEING_INITIALIZED) { // Assume writes will propagate eventually... while(default_initialized_ != __DEFAULT_INITIALIZED) { detail::fence(); } if (err != NULL) { *err = default_error_; } return default_; } cl_int error; default_ = Context( CL_DEVICE_TYPE_DEFAULT, NULL, NULL, NULL, &error); detail::fence(); default_error_ = error; // Assume writes will propagate eventually... default_initialized_ = __DEFAULT_INITIALIZED; detail::fence(); if (err != NULL) { *err = default_error_; } return default_; } //! \brief Default constructor - initializes to NULL. Context() : detail::Wrapper() { } /*! \brief Constructor from cl_context - takes ownership. * * This effectively transfers ownership of a refcount on the cl_context * into the new Context object. */ __CL_EXPLICIT_CONSTRUCTORS Context(const cl_context& context) : detail::Wrapper(context) { } /*! \brief Assignment operator from cl_context - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseContext() on the value previously held by this instance. */ Context& operator = (const cl_context& rhs) { detail::Wrapper::operator=(rhs); return *this; } //! \brief Wrapper for clGetContextInfo(). template cl_int getInfo(cl_context_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetContextInfo, object_, name, param), __GET_CONTEXT_INFO_ERR); } //! \brief Wrapper for clGetContextInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_context_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Gets a list of supported image formats. * * Wraps clGetSupportedImageFormats(). 
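 *
 * An illustrative sketch (added note, not part of the original header;
 * assumes `ctx` is a valid cl::Context):
 * \code
 * VECTOR_CLASS<cl::ImageFormat> formats;
 * cl_int err = ctx.getSupportedImageFormats(
 *     CL_MEM_READ_ONLY, CL_MEM_OBJECT_IMAGE2D, &formats);
 * \endcode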
*/ cl_int getSupportedImageFormats( cl_mem_flags flags, cl_mem_object_type type, VECTOR_CLASS* formats) const { cl_uint numEntries; if (!formats) { return CL_SUCCESS; } cl_int err = ::clGetSupportedImageFormats( object_, flags, type, 0, NULL, &numEntries); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR); } if (numEntries > 0) { ImageFormat* value = (ImageFormat*) alloca(numEntries * sizeof(ImageFormat)); err = ::clGetSupportedImageFormats( object_, flags, type, numEntries, (cl_image_format*)value, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR); } formats->assign(&value[0], &value[numEntries]); } else { formats->clear(); } return CL_SUCCESS; } }; inline Device Device::getDefault(cl_int * err) { cl_int error; Device device; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } } else { device = context.getInfo()[0]; if (err != NULL) { *err = CL_SUCCESS; } } return device; } #ifdef _WIN32 #ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED __declspec(selectany) std::atomic Context::default_initialized_; #else // !CL_HPP_CPP11_ATOMICS_SUPPORTED __declspec(selectany) volatile int Context::default_initialized_ = __DEFAULT_NOT_INITIALIZED; #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED __declspec(selectany) Context Context::default_; __declspec(selectany) volatile cl_int Context::default_error_ = CL_SUCCESS; #else // !_WIN32 #ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED __attribute__((weak)) std::atomic Context::default_initialized_; #else // !CL_HPP_CPP11_ATOMICS_SUPPORTED __attribute__((weak)) volatile int Context::default_initialized_ = __DEFAULT_NOT_INITIALIZED; #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED __attribute__((weak)) Context Context::default_; __attribute__((weak)) volatile cl_int Context::default_error_ = CL_SUCCESS; #endif // !_WIN32 /*! \brief Class interface for cl_event. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_event as the original. For details, see * clRetainEvent() and clReleaseEvent(). * * \see cl_event */ class Event : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Event() : detail::Wrapper() { } /*! \brief Constructor from cl_event - takes ownership. * * This effectively transfers ownership of a refcount on the cl_event * into the new Event object. */ __CL_EXPLICIT_CONSTRUCTORS Event(const cl_event& event) : detail::Wrapper(event) { } /*! \brief Assignment operator from cl_event - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseEvent() on the value previously held by this instance. */ Event& operator = (const cl_event& rhs) { detail::Wrapper::operator=(rhs); return *this; } //! \brief Wrapper for clGetEventInfo(). template cl_int getInfo(cl_event_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetEventInfo, object_, name, param), __GET_EVENT_INFO_ERR); } //! \brief Wrapper for clGetEventInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_event_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } //! \brief Wrapper for clGetEventProfilingInfo(). 
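//! An illustrative timing sketch (added note, not part of the original
//! header; assumes `ev` is an event from a queue created with
//! CL_QUEUE_PROFILING_ENABLE):
//! \code
//! ev.wait();
//! cl_ulong t0 = ev.getProfilingInfo<CL_PROFILING_COMMAND_START>();
//! cl_ulong t1 = ev.getProfilingInfo<CL_PROFILING_COMMAND_END>();
//! cl_ulong elapsedNs = t1 - t0;  // device timestamps are in nanoseconds
//! \endcode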
template cl_int getProfilingInfo(cl_profiling_info name, T* param) const { return detail::errHandler(detail::getInfo( &::clGetEventProfilingInfo, object_, name, param), __GET_EVENT_PROFILE_INFO_ERR); } //! \brief Wrapper for clGetEventProfilingInfo() that returns by value. template typename detail::param_traits::param_type getProfilingInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_profiling_info, name>::param_type param; cl_int result = getProfilingInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Blocks the calling thread until this event completes. * * Wraps clWaitForEvents(). */ cl_int wait() const { return detail::errHandler( ::clWaitForEvents(1, &object_), __WAIT_FOR_EVENTS_ERR); } #if defined(CL_VERSION_1_1) /*! \brief Registers a user callback function for a specific command execution status. * * Wraps clSetEventCallback(). */ cl_int setCallback( cl_int type, void (CL_CALLBACK * pfn_notify)(cl_event, cl_int, void *), void * user_data = NULL) { return detail::errHandler( ::clSetEventCallback( object_, type, pfn_notify, user_data), __SET_EVENT_CALLBACK_ERR); } #endif /*! \brief Blocks the calling thread until every event specified is complete. * * Wraps clWaitForEvents(). */ static cl_int waitForEvents(const VECTOR_CLASS& events) { return detail::errHandler( ::clWaitForEvents( (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL), __WAIT_FOR_EVENTS_ERR); } }; #if defined(CL_VERSION_1_1) /*! \brief Class interface for user events (a subset of cl_event's). * * See Event for details about copy semantics, etc. */ class UserEvent : public Event { public: /*! \brief Constructs a user event on a given context. * * Wraps clCreateUserEvent(). */ UserEvent( const Context& context, cl_int * err = NULL) { cl_int error; object_ = ::clCreateUserEvent( context(), &error); detail::errHandler(error, __CREATE_USER_EVENT_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. UserEvent() : Event() { } /*! \brief Sets the execution status of a user event object. * * Wraps clSetUserEventStatus(). */ cl_int setStatus(cl_int status) { return detail::errHandler( ::clSetUserEventStatus(object_,status), __SET_USER_EVENT_STATUS_ERR); } }; #endif /*! \brief Blocks the calling thread until every event specified is complete. * * Wraps clWaitForEvents(). */ inline static cl_int WaitForEvents(const VECTOR_CLASS& events) { return detail::errHandler( ::clWaitForEvents( (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL), __WAIT_FOR_EVENTS_ERR); } /*! \brief Class interface for cl_mem. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_mem as the original. For details, see * clRetainMemObject() and clReleaseMemObject(). * * \see cl_mem */ class Memory : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Memory() : detail::Wrapper() { } /*! \brief Constructor from cl_mem - takes ownership. * * This effectively transfers ownership of a refcount on the cl_mem * into the new Memory object. */ __CL_EXPLICIT_CONSTRUCTORS Memory(const cl_mem& memory) : detail::Wrapper(memory) { } /*! \brief Assignment operator from cl_mem - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseMemObject() on the value previously held by this instance. */ Memory& operator = (const cl_mem& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! 
\brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Memory(const Memory& mem) : detail::Wrapper(mem) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Memory& operator = (const Memory &mem) { detail::Wrapper::operator=(mem); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Memory(Memory&& mem) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(mem)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Memory& operator = (Memory &&mem) { detail::Wrapper::operator=(std::move(mem)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) //! \brief Wrapper for clGetMemObjectInfo(). template cl_int getInfo(cl_mem_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetMemObjectInfo, object_, name, param), __GET_MEM_OBJECT_INFO_ERR); } //! \brief Wrapper for clGetMemObjectInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_mem_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } #if defined(CL_VERSION_1_1) /*! \brief Registers a callback function to be called when the memory object * is no longer needed. * * Wraps clSetMemObjectDestructorCallback(). * * Repeated calls to this function, for a given cl_mem value, will append * to the list of functions called (in reverse order) when memory object's * resources are freed and the memory object is deleted. * * \note * The registered callbacks are associated with the underlying cl_mem * value - not the Memory class instance. */ cl_int setDestructorCallback( void (CL_CALLBACK * pfn_notify)(cl_mem, void *), void * user_data = NULL) { return detail::errHandler( ::clSetMemObjectDestructorCallback( object_, pfn_notify, user_data), __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR); } #endif }; // Pre-declare copy functions class Buffer; template< typename IteratorType > cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ); template< typename IteratorType > cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ); template< typename IteratorType > cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ); template< typename IteratorType > cl_int copy( const CommandQueue &queue, const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ); /*! \brief Class interface for Buffer Memory Objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Buffer : public Memory { public: /*! \brief Constructs a Buffer in a specified context. * * Wraps clCreateBuffer(). * * \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was * specified. Note alignment & exclusivity requirements. */ Buffer( const Context& context, cl_mem_flags flags, ::size_t size, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a Buffer in the default context. * * Wraps clCreateBuffer(). * * \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was * specified. 
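 *
 * An illustrative sketch of this constructor (added note, not part of the
 * original header; `data` is a hypothetical host array):
 * \code
 * float data[1024];
 * cl_int err;
 * cl::Buffer buf(CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
 *                sizeof(data), data, &err);
 * \endcode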
Note alignment & exclusivity requirements. * * \see Context::getDefault() */ Buffer( cl_mem_flags flags, ::size_t size, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(err); object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } /*! * \brief Construct a Buffer from a host container via iterators. * IteratorType must be random access. * If useHostPtr is specified iterators must represent contiguous data. */ template< typename IteratorType > Buffer( IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if( readOnly ) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if( useHostPtr ) { flags |= CL_MEM_USE_HOST_PTR; } ::size_t size = sizeof(DataType)*(endIterator - startIterator); Context context = Context::getDefault(err); if( useHostPtr ) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if( !useHostPtr ) { error = cl::copy(startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } /*! * \brief Construct a Buffer from a host container via iterators using a specified context. * IteratorType must be random access. * If useHostPtr is specified iterators must represent contiguous data. */ template< typename IteratorType > Buffer(const Context &context, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL); /*! * \brief Construct a Buffer from a host container via iterators using a specified queue. * If useHostPtr is specified iterators must represent contiguous data. */ template< typename IteratorType > Buffer(const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL); //! \brief Default constructor - initializes to NULL. Buffer() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Buffer(const cl_mem& buffer) : Memory(buffer) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Buffer& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Buffer(const Buffer& buf) : Memory(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Buffer& operator = (const Buffer &buf) { Memory::operator=(buf); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Buffer(Buffer&& buf) CL_HPP_NOEXCEPT : Memory(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Buffer& operator = (Buffer &&buf) { Memory::operator=(std::move(buf)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) #if defined(CL_VERSION_1_1) /*! \brief Creates a new buffer object from this. * * Wraps clCreateSubBuffer(). 
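 *
 * An illustrative sketch (added note, not part of the original header;
 * `buf` is an existing cl::Buffer of at least 256 bytes):
 * \code
 * cl_buffer_region region = { 0, 256 };  // origin and size, in bytes
 * cl_int err;
 * cl::Buffer sub = buf.createSubBuffer(
 *     CL_MEM_READ_WRITE, CL_BUFFER_CREATE_TYPE_REGION, &region, &err);
 * \endcode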
*/ Buffer createSubBuffer( cl_mem_flags flags, cl_buffer_create_type buffer_create_type, const void * buffer_create_info, cl_int * err = NULL) { Buffer result; cl_int error; result.object_ = ::clCreateSubBuffer( object_, flags, buffer_create_type, buffer_create_info, &error); detail::errHandler(error, __CREATE_SUBBUFFER_ERR); if (err != NULL) { *err = error; } return result; } #endif }; #if defined (USE_DX_INTEROP) /*! \brief Class interface for creating OpenCL buffers from ID3D10Buffer's. * * This is provided to facilitate interoperability with Direct3D. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferD3D10 : public Buffer { public: typedef CL_API_ENTRY cl_mem (CL_API_CALL *PFN_clCreateFromD3D10BufferKHR)( cl_context context, cl_mem_flags flags, ID3D10Buffer* buffer, cl_int* errcode_ret); /*! \brief Constructs a BufferD3D10, in a specified context, from a * given ID3D10Buffer. * * Wraps clCreateFromD3D10BufferKHR(). */ BufferD3D10( const Context& context, cl_mem_flags flags, ID3D10Buffer* bufobj, cl_int * err = NULL) { static PFN_clCreateFromD3D10BufferKHR pfn_clCreateFromD3D10BufferKHR = NULL; #if defined(CL_VERSION_1_2) vector props = context.getInfo(); cl_platform platform = -1; for( int i = 0; i < props.size(); ++i ) { if( props[i] == CL_CONTEXT_PLATFORM ) { platform = props[i+1]; } } __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clCreateFromD3D10BufferKHR); #endif #if defined(CL_VERSION_1_1) __INIT_CL_EXT_FCN_PTR(clCreateFromD3D10BufferKHR); #endif cl_int error; object_ = pfn_clCreateFromD3D10BufferKHR( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferD3D10() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS BufferD3D10(const cl_mem& buffer) : Buffer(buffer) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferD3D10& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferD3D10(const BufferD3D10& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferD3D10& operator = (const BufferD3D10 &buf) { Buffer::operator=(buf); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferD3D10(BufferD3D10&& buf) CL_HPP_NOEXCEPT : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferD3D10& operator = (BufferD3D10 &&buf) { Buffer::operator=(std::move(buf)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif /*! \brief Class interface for GL Buffer Memory Objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferGL : public Buffer { public: /*! \brief Constructs a BufferGL in a specified context, from a given * GL buffer. * * Wraps clCreateFromGLBuffer(). 
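 *
 * An illustrative sketch (added note, not part of the original header;
 * `glBuf` is a hypothetical GL buffer name created with OpenGL, and `ctx`
 * is a context created with GL-sharing properties):
 * \code
 * cl_int err;
 * cl::BufferGL clBuf(ctx, CL_MEM_READ_WRITE, glBuf, &err);
 * \endcode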
*/ BufferGL( const Context& context, cl_mem_flags flags, cl_GLuint bufobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLBuffer( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferGL() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS BufferGL(const cl_mem& buffer) : Buffer(buffer) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferGL& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL(const BufferGL& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (const BufferGL &buf) { Buffer::operator=(buf); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferGL(BufferGL&& buf) CL_HPP_NOEXCEPT : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (BufferGL &&buf) { Buffer::operator=(std::move(buf)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) //! \brief Wrapper for clGetGLObjectInfo(). cl_int getObjectInfo( cl_gl_object_type *type, cl_GLuint * gl_object_name) { return detail::errHandler( ::clGetGLObjectInfo(object_,type,gl_object_name), __GET_GL_OBJECT_INFO_ERR); } }; /*! \brief C++ base class for Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image : public Memory { protected: //! \brief Default constructor - initializes to NULL. Image() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image(const cl_mem& image) : Memory(image) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image(const Image& img) : Memory(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image& operator = (const Image &img) { Memory::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image(Image&& img) CL_HPP_NOEXCEPT : Memory(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image& operator = (Image &&img) { Memory::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) public: //! \brief Wrapper for clGetImageInfo(). template cl_int getImageInfo(cl_image_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetImageInfo, object_, name, param), __GET_IMAGE_INFO_ERR); } //! \brief Wrapper for clGetImageInfo() that returns by value. 
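//! An illustrative sketch (added note, not part of the original header;
//! `img` is any constructed image object):
//! \code
//! ::size_t w = img.getImageInfo<CL_IMAGE_WIDTH>();
//! ::size_t h = img.getImageInfo<CL_IMAGE_HEIGHT>();
//! \endcode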
template typename detail::param_traits::param_type getImageInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_image_info, name>::param_type param; cl_int result = getImageInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } }; #if defined(CL_VERSION_1_2) /*! \brief Class interface for 1D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image1D : public Image { public: /*! \brief Constructs a 1D Image in a specified context. * * Wraps clCreateImage(). */ Image1D( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t width, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D, width, 0, 0, 0, 0, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image1D() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image1D(const cl_mem& image1D) : Image(image1D) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image1D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1D(const Image1D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1D& operator = (const Image1D &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1D(Image1D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1D& operator = (Image1D &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; /*! \class Image1DBuffer * \brief Image interface for 1D buffer images. */ class Image1DBuffer : public Image { public: Image1DBuffer( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t width, const Buffer &buffer, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_BUFFER, width, 0, 0, 0, 0, 0, 0, 0, buffer() }; object_ = ::clCreateImage( context(), flags, &format, &desc, NULL, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DBuffer() { } __CL_EXPLICIT_CONSTRUCTORS Image1DBuffer(const cl_mem& image1D) : Image(image1D) { } Image1DBuffer& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer(const Image1DBuffer& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer& operator = (const Image1DBuffer &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1DBuffer(Image1DBuffer&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
*/ Image1DBuffer& operator = (Image1DBuffer &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; /*! \class Image1DArray * \brief Image interface for arrays of 1D images. */ class Image1DArray : public Image { public: Image1DArray( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t arraySize, ::size_t width, ::size_t rowPitch, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_ARRAY, width, 0, 0, // height, depth (unused) arraySize, rowPitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DArray() { } __CL_EXPLICIT_CONSTRUCTORS Image1DArray(const cl_mem& imageArray) : Image(imageArray) { } Image1DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray(const Image1DArray& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (const Image1DArray &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1DArray(Image1DArray&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (Image1DArray &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif // #if defined(CL_VERSION_1_2) /*! \brief Class interface for 2D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image2D : public Image { public: /*! \brief Constructs a 1D Image in a specified context. * * Wraps clCreateImage(). */ Image2D( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t width, ::size_t height, ::size_t row_pitch = 0, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; bool useCreateImage; #if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) // Run-time decision based on the actual platform { cl_uint version = detail::getContextPlatformVersion(context()); useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above } #elif defined(CL_VERSION_1_2) useCreateImage = true; #else useCreateImage = false; #endif #if defined(CL_VERSION_1_2) if (useCreateImage) { cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D, width, height, 0, 0, // depth, array size (unused) row_pitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } #endif // #if defined(CL_VERSION_1_2) #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) if (!useCreateImage) { object_ = ::clCreateImage2D( context(), flags,&format, width, height, row_pitch, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE2D_ERR); if (err != NULL) { *err = error; } } #endif // #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) } //! \brief Default constructor - initializes to NULL. Image2D() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. 
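 *
 * An illustrative sketch of the sized constructor above (added note, not
 * part of the original header; assumes `ctx` is a valid cl::Context):
 * \code
 * cl::ImageFormat fmt(CL_RGBA, CL_UNORM_INT8);
 * cl_int err;
 * cl::Image2D img(ctx, CL_MEM_READ_ONLY, fmt, 640, 480, 0, NULL, &err);
 * \endcode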
    __CL_EXPLICIT_CONSTRUCTORS Image2D(const cl_mem& image2D) : Image(image2D) { }

    /*! \brief Assignment from cl_mem - performs shallow copy.
     *
     *  See Memory for further details.
     */
    Image2D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; }

    /*! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC. */
    Image2D(const Image2D& img) : Image(img) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC. */
    Image2D& operator = (const Image2D &img) { Image::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly. Required for MSVC. */
    Image2D(Image2D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {}

    /*! \brief Move assignment to forward move to the superclass correctly. Required for MSVC. */
    Image2D& operator = (Image2D &&img) { Image::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};

#if !defined(CL_VERSION_1_2)
/*! \brief Class interface for GL 2D Image Memory objects.
 *
 *  This is provided to facilitate interoperability with OpenGL.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 *  \note Deprecated for OpenCL 1.2. Please use ImageGL instead.
 */
class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED Image2DGL CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED : public Image2D
{
public:
    /*! \brief Constructs an Image2DGL in a specified context, from a given
     *  GL Texture.
     *
     *  Wraps clCreateFromGLTexture2D().
     */
    Image2DGL(
        const Context& context,
        cl_mem_flags flags,
        cl_GLenum target,
        cl_GLint miplevel,
        cl_GLuint texobj,
        cl_int * err = NULL)
    {
        cl_int error;
        object_ = ::clCreateFromGLTexture2D(
            context(), flags, target, miplevel, texobj, &error);
        detail::errHandler(error, __CREATE_GL_TEXTURE_2D_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    Image2DGL() : Image2D() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     *  See Memory for further details.
     */
    __CL_EXPLICIT_CONSTRUCTORS Image2DGL(const cl_mem& image) : Image2D(image) { }

    /*! \brief Assignment from cl_mem - performs shallow copy.
     *
     *  See Memory for further details.
     */
    Image2DGL& operator = (const cl_mem& rhs) { Image2D::operator=(rhs); return *this; }

    /*! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC. */
    Image2DGL(const Image2DGL& img) : Image2D(img) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC. */
    Image2DGL& operator = (const Image2DGL &img) { Image2D::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly. Required for MSVC. */
    Image2DGL(Image2DGL&& img) CL_HPP_NOEXCEPT : Image2D(std::move(img)) {}

    /*! \brief Move assignment to forward move to the superclass correctly. Required for MSVC. */
    Image2DGL& operator = (Image2DGL &&img) { Image2D::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
#endif // #if !defined(CL_VERSION_1_2)
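/*! \brief Usage sketch (editor's addition, not part of the original header).
 *
 *  A hedged example of creating a 2D image; \c ctx and \c pixels (a
 *  640x480 RGBA host buffer) are assumed to exist in the caller's code.
 *  Note that the Image2D constructor above selects clCreateImage() or the
 *  deprecated clCreateImage2D() based on the platform version it detects.
 *
 *  \code
 *  cl_int err;
 *  cl::ImageFormat fmt(CL_RGBA, CL_UNORM_INT8);
 *  cl::Image2D img(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
 *                  fmt, 640, 480, 0, pixels, &err);
 *  \endcode
 */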
#if defined(CL_VERSION_1_2)
/*! \class Image2DArray
 * \brief Image interface for arrays of 2D images.
 */
class Image2DArray : public Image
{
public:
    Image2DArray(
        const Context& context,
        cl_mem_flags flags,
        ImageFormat format,
        ::size_t arraySize,
        ::size_t width,
        ::size_t height,
        ::size_t rowPitch,
        ::size_t slicePitch,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        cl_image_desc desc = {
            CL_MEM_OBJECT_IMAGE2D_ARRAY,
            width,
            height,
            0,          // depth (unused)
            arraySize,
            rowPitch,
            slicePitch,
            0, 0, 0
        };
        object_ = ::clCreateImage(
            context(), flags, &format, &desc, host_ptr, &error);
        detail::errHandler(error, __CREATE_IMAGE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    Image2DArray() { }

    __CL_EXPLICIT_CONSTRUCTORS Image2DArray(const cl_mem& imageArray) : Image(imageArray) { }

    Image2DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; }

    /*! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC. */
    Image2DArray(const Image2DArray& img) : Image(img) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC. */
    Image2DArray& operator = (const Image2DArray &img) { Image::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly. Required for MSVC. */
    Image2DArray(Image2DArray&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {}

    /*! \brief Move assignment to forward move to the superclass correctly. Required for MSVC. */
    Image2DArray& operator = (Image2DArray &&img) { Image::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
#endif // #if defined(CL_VERSION_1_2)

/*! \brief Class interface for 3D Image Memory objects.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Image3D : public Image
{
public:
    /*! \brief Constructs a 3D Image in a specified context.
     *
     *  Wraps clCreateImage().
     */
    Image3D(
        const Context& context,
        cl_mem_flags flags,
        ImageFormat format,
        ::size_t width,
        ::size_t height,
        ::size_t depth,
        ::size_t row_pitch = 0,
        ::size_t slice_pitch = 0,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        bool useCreateImage;

#if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
        // Run-time decision based on the actual platform
        {
            cl_uint version = detail::getContextPlatformVersion(context());
            useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above
        }
#elif defined(CL_VERSION_1_2)
        useCreateImage = true;
#else
        useCreateImage = false;
#endif

#if defined(CL_VERSION_1_2)
        if (useCreateImage)
        {
            cl_image_desc desc = {
                CL_MEM_OBJECT_IMAGE3D,
                width,
                height,
                depth,
                0,      // array size (unused)
                row_pitch,
                slice_pitch,
                0, 0, 0
            };
            object_ = ::clCreateImage(
                context(), flags, &format, &desc, host_ptr, &error);
            detail::errHandler(error, __CREATE_IMAGE_ERR);
            if (err != NULL) {
                *err = error;
            }
        }
#endif // #if defined(CL_VERSION_1_2)
#if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
        if (!useCreateImage)
        {
            object_ = ::clCreateImage3D(
                context(), flags, &format, width, height, depth, row_pitch,
                slice_pitch, host_ptr, &error);
            detail::errHandler(error, __CREATE_IMAGE3D_ERR);
            if (err != NULL) {
                *err = error;
            }
        }
#endif // #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
    }

    //! \brief Default constructor - initializes to NULL.
    Image3D() : Image() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     *  See Memory for further details.
     */
    __CL_EXPLICIT_CONSTRUCTORS Image3D(const cl_mem& image3D) : Image(image3D) { }

    /*! \brief Assignment from cl_mem - performs shallow copy.
     *
     *  See Memory for further details.
     */
    Image3D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; }

    /*! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC. */
    Image3D(const Image3D& img) : Image(img) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC. */
    Image3D& operator = (const Image3D &img) { Image::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly. Required for MSVC. */
    Image3D(Image3D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {}

    /*! \brief Move assignment to forward move to the superclass correctly. Required for MSVC. */
    Image3D& operator = (Image3D &&img) { Image::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};

#if !defined(CL_VERSION_1_2)
/*! \brief Class interface for GL 3D Image Memory objects.
 *
 *  This is provided to facilitate interoperability with OpenGL.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Image3DGL : public Image3D
{
public:
    /*! \brief Constructs an Image3DGL in a specified context, from a given
     *  GL Texture.
     *
     *  Wraps clCreateFromGLTexture3D().
     */
    Image3DGL(
        const Context& context,
        cl_mem_flags flags,
        cl_GLenum target,
        cl_GLint miplevel,
        cl_GLuint texobj,
        cl_int * err = NULL)
    {
        cl_int error;
        object_ = ::clCreateFromGLTexture3D(
            context(), flags, target, miplevel, texobj, &error);
        detail::errHandler(error, __CREATE_GL_TEXTURE_3D_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    Image3DGL() : Image3D() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     *  See Memory for further details.
     */
    __CL_EXPLICIT_CONSTRUCTORS Image3DGL(const cl_mem& image) : Image3D(image) { }

    /*! \brief Assignment from cl_mem - performs shallow copy.
     *
     *  See Memory for further details.
     */
    Image3DGL& operator = (const cl_mem& rhs) { Image3D::operator=(rhs); return *this; }

    /*! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC. */
    Image3DGL(const Image3DGL& img) : Image3D(img) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC. */
    Image3DGL& operator = (const Image3DGL &img) { Image3D::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly. Required for MSVC. */
    Image3DGL(Image3DGL&& img) CL_HPP_NOEXCEPT : Image3D(std::move(img)) {}

    /*! \brief Move assignment to forward move to the superclass correctly. Required for MSVC. */
    Image3DGL& operator = (Image3DGL &&img) { Image3D::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
#endif // #if !defined(CL_VERSION_1_2)
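/*! \brief Usage sketch (editor's addition, not part of the original header).
 *
 *  A hedged example of creating a 3D image volume; \c ctx and \c voxels
 *  (a 64x64x64 single-channel float host buffer) are assumed to exist in
 *  the caller's code. Like Image2D, the Image3D constructor above falls
 *  back to the deprecated clCreateImage3D() on pre-1.2 platforms.
 *
 *  \code
 *  cl_int err;
 *  cl::ImageFormat fmt(CL_R, CL_FLOAT);
 *  cl::Image3D vol(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
 *                  fmt, 64, 64, 64, 0, 0, voxels, &err);
 *  \endcode
 */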
#if defined(CL_VERSION_1_2)
/*! \class ImageGL
 * \brief general image interface for GL interop.
 * We abstract the 2D and 3D GL images into a single instance here
 * that wraps all GL sourced images on the grounds that setup information
 * was performed by OpenCL anyway.
 */
class ImageGL : public Image
{
public:
    ImageGL(
        const Context& context,
        cl_mem_flags flags,
        cl_GLenum target,
        cl_GLint miplevel,
        cl_GLuint texobj,
        cl_int * err = NULL)
    {
        cl_int error;
        object_ = ::clCreateFromGLTexture(
            context(), flags, target, miplevel, texobj, &error);
        detail::errHandler(error, __CREATE_GL_TEXTURE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    ImageGL() : Image() { }

    __CL_EXPLICIT_CONSTRUCTORS ImageGL(const cl_mem& image) : Image(image) { }

    ImageGL& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; }

    /*! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC. */
    ImageGL(const ImageGL& img) : Image(img) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC. */
    ImageGL& operator = (const ImageGL &img) { Image::operator=(img); return *this; }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly. Required for MSVC. */
    ImageGL(ImageGL&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {}

    /*! \brief Move assignment to forward move to the superclass correctly. Required for MSVC. */
    ImageGL& operator = (ImageGL &&img) { Image::operator=(std::move(img)); return *this; }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
#endif // #if defined(CL_VERSION_1_2)

/*! \brief Class interface for GL Render Buffer Memory Objects.
 *
 *  This is provided to facilitate interoperability with OpenGL.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class BufferRenderGL :
#if defined(CL_VERSION_1_2)
    public ImageGL
#else // #if defined(CL_VERSION_1_2)
    public Image2DGL
#endif //#if defined(CL_VERSION_1_2)
{
public:
    /*! \brief Constructs a BufferRenderGL in a specified context, from a given
     *  GL Renderbuffer.
     *
     *  Wraps clCreateFromGLRenderbuffer().
     */
    BufferRenderGL(
        const Context& context,
        cl_mem_flags flags,
        cl_GLuint bufobj,
        cl_int * err = NULL)
    {
        cl_int error;
        object_ = ::clCreateFromGLRenderbuffer(
            context(), flags, bufobj, &error);
        detail::errHandler(error, __CREATE_GL_RENDER_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
#if defined(CL_VERSION_1_2)
    BufferRenderGL() : ImageGL() {};
#else // #if defined(CL_VERSION_1_2)
    BufferRenderGL() : Image2DGL() {};
#endif //#if defined(CL_VERSION_1_2)

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     *  See Memory for further details.
     */
#if defined(CL_VERSION_1_2)
    __CL_EXPLICIT_CONSTRUCTORS BufferRenderGL(const cl_mem& buffer) : ImageGL(buffer) { }
#else // #if defined(CL_VERSION_1_2)
    __CL_EXPLICIT_CONSTRUCTORS BufferRenderGL(const cl_mem& buffer) : Image2DGL(buffer) { }
#endif //#if defined(CL_VERSION_1_2)

    /*! \brief Assignment from cl_mem - performs shallow copy.
     *
     *  See Memory for further details.
     */
    BufferRenderGL& operator = (const cl_mem& rhs)
    {
#if defined(CL_VERSION_1_2)
        ImageGL::operator=(rhs);
#else // #if defined(CL_VERSION_1_2)
        Image2DGL::operator=(rhs);
#endif //#if defined(CL_VERSION_1_2)
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC. */
#if defined(CL_VERSION_1_2)
    BufferRenderGL(const BufferRenderGL& buf) : ImageGL(buf) {}
#else // #if defined(CL_VERSION_1_2)
    BufferRenderGL(const BufferRenderGL& buf) : Image2DGL(buf) {}
#endif //#if defined(CL_VERSION_1_2)

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    BufferRenderGL& operator = (const BufferRenderGL &rhs)
    {
#if defined(CL_VERSION_1_2)
        ImageGL::operator=(rhs);
#else // #if defined(CL_VERSION_1_2)
        Image2DGL::operator=(rhs);
#endif //#if defined(CL_VERSION_1_2)
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly. Required for MSVC. */
#if defined(CL_VERSION_1_2)
    BufferRenderGL(BufferRenderGL&& buf) CL_HPP_NOEXCEPT : ImageGL(std::move(buf)) {}
#else // #if defined(CL_VERSION_1_2)
    BufferRenderGL(BufferRenderGL&& buf) CL_HPP_NOEXCEPT : Image2DGL(std::move(buf)) {}
#endif //#if defined(CL_VERSION_1_2)

    /*! \brief Move assignment to forward move to the superclass correctly. Required for MSVC. */
    BufferRenderGL& operator = (BufferRenderGL &&buf)
    {
#if defined(CL_VERSION_1_2)
        ImageGL::operator=(std::move(buf));
#else // #if defined(CL_VERSION_1_2)
        Image2DGL::operator=(std::move(buf));
#endif //#if defined(CL_VERSION_1_2)
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    //! \brief Wrapper for clGetGLObjectInfo().
    cl_int getObjectInfo(
        cl_gl_object_type *type,
        cl_GLuint * gl_object_name)
    {
        return detail::errHandler(
            ::clGetGLObjectInfo(object_, type, gl_object_name),
            __GET_GL_OBJECT_INFO_ERR);
    }
};

/*! \brief Class interface for cl_sampler.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_sampler as the original. For details, see
 *        clRetainSampler() and clReleaseSampler().
 *
 *  \see cl_sampler
 */
class Sampler : public detail::Wrapper<cl_sampler>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Sampler() { }

    /*! \brief Constructs a Sampler in a specified context.
     *
     *  Wraps clCreateSampler().
     */
    Sampler(
        const Context& context,
        cl_bool normalized_coords,
        cl_addressing_mode addressing_mode,
        cl_filter_mode filter_mode,
        cl_int* err = NULL)
    {
        cl_int error;
        object_ = ::clCreateSampler(
            context(), normalized_coords, addressing_mode, filter_mode, &error);
        detail::errHandler(error, __CREATE_SAMPLER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    /*! \brief Constructor from cl_sampler - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the cl_sampler
     *  into the new Sampler object.
     */
    __CL_EXPLICIT_CONSTRUCTORS Sampler(const cl_sampler& sampler) : detail::Wrapper<cl_type>(sampler) { }

    /*! \brief Assignment operator from cl_sampler - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseSampler() on the value previously held by this instance.
     */
    Sampler& operator = (const cl_sampler& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC. */
    Sampler(const Sampler& sam) : detail::Wrapper<cl_type>(sam) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC. */
    Sampler& operator = (const Sampler &sam)
    {
        detail::Wrapper<cl_type>::operator=(sam);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly. Required for MSVC. */
    Sampler(Sampler&& sam) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(sam)) {}

    /*! \brief Move assignment to forward move to the superclass correctly. Required for MSVC. */
    Sampler& operator = (Sampler &&sam)
    {
        detail::Wrapper<cl_type>::operator=(std::move(sam));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    //! \brief Wrapper for clGetSamplerInfo().
    template <typename T>
    cl_int getInfo(cl_sampler_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetSamplerInfo, object_, name, param),
            __GET_SAMPLER_INFO_ERR);
    }

    //! \brief Wrapper for clGetSamplerInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_sampler_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_sampler_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }
};

class Program;
class CommandQueue;
class Kernel;

//! \brief Class interface for specifying NDRange values.
class NDRange
{
private:
    size_t<3> sizes_;
    cl_uint dimensions_;

public:
    //! \brief Default constructor - resulting range has zero dimensions.
    NDRange() : dimensions_(0) { }

    //! \brief Constructs one-dimensional range.
    NDRange(::size_t size0) : dimensions_(1)
    {
        sizes_[0] = size0;
    }

    //! \brief Constructs two-dimensional range.
    NDRange(::size_t size0, ::size_t size1) : dimensions_(2)
    {
        sizes_[0] = size0;
        sizes_[1] = size1;
    }

    //! \brief Constructs three-dimensional range.
    NDRange(::size_t size0, ::size_t size1, ::size_t size2) : dimensions_(3)
    {
        sizes_[0] = size0;
        sizes_[1] = size1;
        sizes_[2] = size2;
    }

    /*! \brief Conversion operator to const ::size_t *.
     *
     *  \returns a pointer to the size of the first dimension.
     */
    operator const ::size_t*() const
    {
        return (const ::size_t*) sizes_;
    }

    //! \brief Queries the number of dimensions in the range.
    ::size_t dimensions() const { return dimensions_; }
};

//! \brief A zero-dimensional range.
static const NDRange NullRange;

//! \brief Local address wrapper for use with Kernel::setArg
struct LocalSpaceArg
{
    ::size_t size_;
};

namespace detail {

template <typename T>
struct KernelArgumentHandler
{
    static ::size_t size(const T&) { return sizeof(T); }
    static const T* ptr(const T& value) { return &value; }
};

template <>
struct KernelArgumentHandler<LocalSpaceArg>
{
    static ::size_t size(const LocalSpaceArg& value) { return value.size_; }
    static const void* ptr(const LocalSpaceArg&) { return NULL; }
};

}
//! \endcond

/*! __local
 * \brief Helper function for generating LocalSpaceArg objects.
 * Deprecated. Replaced with Local.
 */
inline CL_EXT_PREFIX__VERSION_1_1_DEPRECATED LocalSpaceArg
__local(::size_t size) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
inline LocalSpaceArg
__local(::size_t size)
{
    LocalSpaceArg ret = { size };
    return ret;
}

/*! Local
 * \brief Helper function for generating LocalSpaceArg objects.
 */
inline LocalSpaceArg
Local(::size_t size)
{
    LocalSpaceArg ret = { size };
    return ret;
}

//class KernelFunctor;
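/*! \brief Usage sketch (editor's addition, not part of the original header).
 *
 *  NDRange carries up to three dimensions and converts to the raw
 *  ::size_t* the C API expects; Local() sizes a __local kernel argument in
 *  bytes. The kernel object \c k used below is assumed to exist already.
 *
 *  \code
 *  cl::NDRange global(1024, 768);   // 2D global work size
 *  cl::NDRange local(16, 16);       // 2D work-group size
 *  k.setArg(2, cl::Local(16 * 16 * sizeof(float)));  // __local scratch tile
 *  \endcode
 */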
/*! \brief Class interface for cl_kernel.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_kernel as the original. For details, see
 *        clRetainKernel() and clReleaseKernel().
 *
 *  \see cl_kernel
 */
class Kernel : public detail::Wrapper<cl_kernel>
{
public:
    inline Kernel(const Program& program, const char* name, cl_int* err = NULL);

    //! \brief Default constructor - initializes to NULL.
    Kernel() { }

    /*! \brief Constructor from cl_kernel - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the cl_kernel
     *  into the new Kernel object.
     */
    __CL_EXPLICIT_CONSTRUCTORS Kernel(const cl_kernel& kernel) : detail::Wrapper<cl_type>(kernel) { }

    /*! \brief Assignment operator from cl_kernel - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseKernel() on the value previously held by this instance.
     */
    Kernel& operator = (const cl_kernel& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC. */
    Kernel(const Kernel& kernel) : detail::Wrapper<cl_type>(kernel) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC. */
    Kernel& operator = (const Kernel &kernel)
    {
        detail::Wrapper<cl_type>::operator=(kernel);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly. Required for MSVC. */
    Kernel(Kernel&& kernel) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(kernel)) {}

    /*! \brief Move assignment to forward move to the superclass correctly. Required for MSVC. */
    Kernel& operator = (Kernel &&kernel)
    {
        detail::Wrapper<cl_type>::operator=(std::move(kernel));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    template <typename T>
    cl_int getInfo(cl_kernel_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetKernelInfo, object_, name, param),
            __GET_KERNEL_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_kernel_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

#if defined(CL_VERSION_1_2)
    template <typename T>
    cl_int getArgInfo(cl_uint argIndex, cl_kernel_arg_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetKernelArgInfo, object_, argIndex, name, param),
            __GET_KERNEL_ARG_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_kernel_arg_info, name>::param_type
    getArgInfo(cl_uint argIndex, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_arg_info, name>::param_type param;
        cl_int result = getArgInfo(argIndex, name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }
#endif // #if defined(CL_VERSION_1_2)

    template <typename T>
    cl_int getWorkGroupInfo(
        const Device& device, cl_kernel_work_group_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(
                &::clGetKernelWorkGroupInfo, object_, device(), name, param),
                __GET_KERNEL_WORK_GROUP_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_kernel_work_group_info, name>::param_type
    getWorkGroupInfo(const Device& device, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_work_group_info, name>::param_type param;
        cl_int result = getWorkGroupInfo(device, name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    template <typename T>
    cl_int setArg(cl_uint index, const T &value)
    {
        return detail::errHandler(
            ::clSetKernelArg(
                object_,
                index,
                detail::KernelArgumentHandler<T>::size(value),
                detail::KernelArgumentHandler<T>::ptr(value)),
            __SET_KERNEL_ARGS_ERR);
    }

    cl_int setArg(cl_uint index, ::size_t size, const void* argPtr)
    {
        return detail::errHandler(
            ::clSetKernelArg(object_, index, size, argPtr),
            __SET_KERNEL_ARGS_ERR);
    }
};
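/*! \brief Usage sketch (editor's addition, not part of the original header).
 *
 *  setArg() resolves argument sizes through detail::KernelArgumentHandler,
 *  so memory objects, POD values and LocalSpaceArg can all be passed
 *  directly. \c program and \c buf are assumed to exist in the caller's
 *  code; the kernel name "scale" is hypothetical.
 *
 *  \code
 *  cl_int err;
 *  cl::Kernel k(program, "scale", &err);
 *  k.setArg(0, buf);     // cl::Buffer argument
 *  k.setArg(1, 2.0f);    // POD argument, sizeof(float) inferred
 *  \endcode
 */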
/*! \class Program
 * \brief Program interface that implements cl_program.
 */
class Program : public detail::Wrapper<cl_program>
{
public:
    typedef VECTOR_CLASS<std::pair<const void*, ::size_t> > Binaries;
    typedef VECTOR_CLASS<std::pair<const char*, ::size_t> > Sources;

    Program(
        const STRING_CLASS& source,
        bool build = false,
        cl_int* err = NULL)
    {
        cl_int error;

        const char * strings = source.c_str();
        const ::size_t length  = source.size();

        Context context = Context::getDefault(err);

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)1, &strings, &length, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);

        if (error == CL_SUCCESS && build) {
            error = ::clBuildProgram(
                object_,
                0,
                NULL,
                "",
                NULL,
                NULL);

            detail::errHandler(error, __BUILD_PROGRAM_ERR);
        }

        if (err != NULL) {
            *err = error;
        }
    }

    Program(
        const Context& context,
        const STRING_CLASS& source,
        bool build = false,
        cl_int* err = NULL)
    {
        cl_int error;

        const char * strings = source.c_str();
        const ::size_t length  = source.size();

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)1, &strings, &length, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);

        if (error == CL_SUCCESS && build) {
            error = ::clBuildProgram(
                object_,
                0,
                NULL,
                "",
                NULL,
                NULL);

            detail::errHandler(error, __BUILD_PROGRAM_ERR);
        }

        if (err != NULL) {
            *err = error;
        }
    }

    Program(
        const Context& context,
        const Sources& sources,
        cl_int* err = NULL)
    {
        cl_int error;

        const ::size_t n = (::size_t)sources.size();
        ::size_t* lengths = (::size_t*) alloca(n * sizeof(::size_t));
        const char** strings = (const char**) alloca(n * sizeof(const char*));

        for (::size_t i = 0; i < n; ++i) {
            strings[i] = sources[(int)i].first;
            lengths[i] = sources[(int)i].second;
        }

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)n, strings, lengths, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }
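    /*! \brief Usage sketch (editor's addition, not part of the original header).
     *
     *  Building from source with an explicit options string; \c ctx and
     *  \c src (a std::string holding OpenCL C source) are assumed to exist
     *  in the caller's code.
     *
     *  \code
     *  cl_int err;
     *  cl::Program prog(ctx, src, false, &err);
     *  if (err == CL_SUCCESS) {
     *      err = prog.build("-cl-std=CL1.2");
     *  }
     *  \endcode
     */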
    /**
     * Construct a program object from a list of devices and a per-device list of binaries.
     * \param context A valid OpenCL context in which to construct the program.
     * \param devices A vector of OpenCL device objects for which the program will be created.
     * \param binaries A vector of pairs of a pointer to a binary object and its length.
     * \param binaryStatus An optional vector that on completion will be resized to
     *   match the size of binaries and filled with values to specify if each binary
     *   was successfully loaded.
     *   Set to CL_SUCCESS if the binary was successfully loaded.
     *   Set to CL_INVALID_VALUE if the length is 0 or the binary pointer is NULL.
     *   Set to CL_INVALID_BINARY if the binary provided is not valid for the matching device.
     * \param err if non-NULL will be set to CL_SUCCESS on successful operation or one of the following errors:
     *   CL_INVALID_CONTEXT if context is not a valid context.
     *   CL_INVALID_VALUE if the length of devices is zero; or if the length of binaries does not match the length of devices;
     *     or if any entry in binaries is NULL or has length 0.
     *   CL_INVALID_DEVICE if OpenCL devices listed in devices are not in the list of devices associated with context.
     *   CL_INVALID_BINARY if an invalid program binary was encountered for any device. binaryStatus will return specific status for each device.
     *   CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
     */
    Program(
        const Context& context,
        const VECTOR_CLASS<Device>& devices,
        const Binaries& binaries,
        VECTOR_CLASS<cl_int>* binaryStatus = NULL,
        cl_int* err = NULL)
    {
        cl_int error;

        const ::size_t numDevices = devices.size();

        // Catch size mismatch early and return
        if(binaries.size() != numDevices) {
            error = CL_INVALID_VALUE;
            detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR);
            if (err != NULL) {
                *err = error;
            }
            return;
        }

        ::size_t* lengths = (::size_t*) alloca(numDevices * sizeof(::size_t));
        const unsigned char** images = (const unsigned char**) alloca(numDevices * sizeof(const unsigned char**));

        for (::size_t i = 0; i < numDevices; ++i) {
            images[i] = (const unsigned char*)binaries[i].first;
            lengths[i] = binaries[(int)i].second;
        }

        cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id));
        for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) {
            deviceIDs[deviceIndex] = (devices[deviceIndex])();
        }

        if(binaryStatus) {
            binaryStatus->resize(numDevices);
        }

        object_ = ::clCreateProgramWithBinary(
            context(), (cl_uint) devices.size(),
            deviceIDs,
            lengths, images, (binaryStatus != NULL && numDevices > 0)
               ? &binaryStatus->front()
               : NULL, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

#if defined(CL_VERSION_1_2)
    /**
     * Create program using builtin kernels.
     * \param kernelNames Semi-colon separated list of builtin kernel names
     */
    Program(
        const Context& context,
        const VECTOR_CLASS<Device>& devices,
        const STRING_CLASS& kernelNames,
        cl_int* err = NULL)
    {
        cl_int error;

        ::size_t numDevices = devices.size();
        cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id));
        for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) {
            deviceIDs[deviceIndex] = (devices[deviceIndex])();
        }

        object_ = ::clCreateProgramWithBuiltInKernels(
            context(),
            (cl_uint) devices.size(),
            deviceIDs,
            kernelNames.c_str(),
            &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR);
        if (err != NULL) {
            *err = error;
        }
    }
#endif // #if defined(CL_VERSION_1_2)

    Program() { }

    __CL_EXPLICIT_CONSTRUCTORS Program(const cl_program& program) : detail::Wrapper<cl_type>(program) { }

    Program& operator = (const cl_program& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC. */
    Program(const Program& program) : detail::Wrapper<cl_type>(program) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly. Required for MSVC. */
    Program& operator = (const Program &program)
    {
        detail::Wrapper<cl_type>::operator=(program);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly. Required for MSVC. */
    Program(Program&& program) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(program)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     *  Required for MSVC.
     */
    Program& operator = (Program &&program)
    {
        detail::Wrapper<cl_type>::operator=(std::move(program));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    cl_int build(
        const VECTOR_CLASS<Device>& devices,
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        ::size_t numDevices = devices.size();
        cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id));
        for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) {
            deviceIDs[deviceIndex] = (devices[deviceIndex])();
        }

        return detail::errHandler(
            ::clBuildProgram(
                object_,
                (cl_uint) devices.size(),
                deviceIDs,
                options,
                notifyFptr,
                data),
                __BUILD_PROGRAM_ERR);
    }

    cl_int build(
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        return detail::errHandler(
            ::clBuildProgram(
                object_,
                0,
                NULL,
                options,
                notifyFptr,
                data),
                __BUILD_PROGRAM_ERR);
    }

#if defined(CL_VERSION_1_2)
    cl_int compile(
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        return detail::errHandler(
            ::clCompileProgram(
                object_,
                0,
                NULL,
                options,
                0,
                NULL,
                NULL,
                notifyFptr,
                data),
                __COMPILE_PROGRAM_ERR);
    }
#endif

    template <typename T>
    cl_int getInfo(cl_program_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetProgramInfo, object_, name, param),
            __GET_PROGRAM_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_program_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_program_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    template <typename T>
    cl_int getBuildInfo(
        const Device& device, cl_program_build_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(
                &::clGetProgramBuildInfo, object_, device(), name, param),
                __GET_PROGRAM_BUILD_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_program_build_info, name>::param_type
    getBuildInfo(const Device& device, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_program_build_info, name>::param_type param;
        cl_int result = getBuildInfo(device, name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    cl_int createKernels(VECTOR_CLASS<Kernel>* kernels)
    {
        cl_uint numKernels;
        cl_int err = ::clCreateKernelsInProgram(object_, 0, NULL, &numKernels);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR);
        }

        Kernel* value = (Kernel*) alloca(numKernels * sizeof(Kernel));
        err = ::clCreateKernelsInProgram(
            object_, numKernels, (cl_kernel*) value, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR);
        }

        kernels->assign(&value[0], &value[numKernels]);
        return CL_SUCCESS;
    }
};

#if defined(CL_VERSION_1_2)
inline Program linkProgram(
    Program input1,
    Program input2,
    const char* options = NULL,
    void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
    void* data = NULL,
    cl_int* err = NULL)
{
    cl_int error_local = CL_SUCCESS;

    cl_program programs[2] = { input1(), input2() };

    Context ctx = input1.getInfo<CL_PROGRAM_CONTEXT>(&error_local);
    if(error_local!=CL_SUCCESS) {
        detail::errHandler(error_local, __LINK_PROGRAM_ERR);
    }

    cl_program prog = ::clLinkProgram(
        ctx(),
        0,
        NULL,
        options,
        2,
        programs,
        notifyFptr,
        data,
        &error_local);

    detail::errHandler(error_local,__COMPILE_PROGRAM_ERR);
    if (err != NULL) {
        *err = error_local;
    }

    return Program(prog);
}
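/*! \brief Usage sketch (editor's addition, not part of the original header).
 *
 *  Retrieving the build log when compilation fails; \c prog and \c dev are
 *  assumed to exist in the caller's code.
 *
 *  \code
 *  if (prog.build("-Werror") != CL_SUCCESS) {
 *      STRING_CLASS log = prog.getBuildInfo<CL_PROGRAM_BUILD_LOG>(dev);
 *      // inspect or print the log here
 *  }
 *  \endcode
 */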
inline Program linkProgram(
    VECTOR_CLASS<Program> inputPrograms,
    const char* options = NULL,
    void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
    void* data = NULL,
    cl_int* err = NULL)
{
    cl_int error_local = CL_SUCCESS;

    cl_program * programs = (cl_program*) alloca(inputPrograms.size() * sizeof(cl_program));

    if (programs != NULL) {
        for (unsigned int i = 0; i < inputPrograms.size(); i++) {
          programs[i] = inputPrograms[i]();
        }
    }

    Context ctx;
    if(inputPrograms.size() > 0) {
        ctx = inputPrograms[0].getInfo<CL_PROGRAM_CONTEXT>(&error_local);
        if(error_local!=CL_SUCCESS) {
            detail::errHandler(error_local, __LINK_PROGRAM_ERR);
        }
    }

    cl_program prog = ::clLinkProgram(
        ctx(),
        0,
        NULL,
        options,
        (cl_uint)inputPrograms.size(),
        programs,
        notifyFptr,
        data,
        &error_local);

    detail::errHandler(error_local,__COMPILE_PROGRAM_ERR);
    if (err != NULL) {
        *err = error_local;
    }

    return Program(prog);
}
#endif

template<>
inline VECTOR_CLASS<char *> cl::Program::getInfo<CL_PROGRAM_BINARIES>(cl_int* err) const
{
    VECTOR_CLASS< ::size_t> sizes = getInfo<CL_PROGRAM_BINARY_SIZES>();
    VECTOR_CLASS<char *> binaries;
    for (VECTOR_CLASS< ::size_t>::iterator s = sizes.begin(); s != sizes.end(); ++s)
    {
        char *ptr = NULL;
        if (*s != 0)
            ptr = new char[*s];
        binaries.push_back(ptr);
    }

    cl_int result = getInfo(CL_PROGRAM_BINARIES, &binaries);
    if (err != NULL) {
        *err = result;
    }
    return binaries;
}

inline Kernel::Kernel(const Program& program, const char* name, cl_int* err)
{
    cl_int error;

    object_ = ::clCreateKernel(program(), name, &error);
    detail::errHandler(error, __CREATE_KERNEL_ERR);

    if (err != NULL) {
        *err = error;
    }
}

/*! \class CommandQueue
 * \brief CommandQueue interface for cl_command_queue.
 */
class CommandQueue : public detail::Wrapper<cl_command_queue>
{
private:
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
    static std::atomic<int> default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    static volatile int default_initialized_;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
    static CommandQueue default_;
    static volatile cl_int default_error_;

public:
    CommandQueue(
        cl_command_queue_properties properties,
        cl_int* err = NULL)
    {
        cl_int error;

        Context context = Context::getDefault(&error);
        detail::errHandler(error, __CREATE_CONTEXT_ERR);

        if (error != CL_SUCCESS) {
            if (err != NULL) {
                *err = error;
            }
        }
        else {
            Device device = context.getInfo<CL_CONTEXT_DEVICES>()[0];

            object_ = ::clCreateCommandQueue(
                context(), device(), properties, &error);

            detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);
            if (err != NULL) {
                *err = error;
            }
        }
    }

    /*!
     * \brief Constructs a CommandQueue for an implementation defined device in the given context
     */
    explicit CommandQueue(
        const Context& context,
        cl_command_queue_properties properties = 0,
        cl_int* err = NULL)
    {
        cl_int error;
        VECTOR_CLASS<Device> devices;
        error = context.getInfo(CL_CONTEXT_DEVICES, &devices);

        detail::errHandler(error, __CREATE_CONTEXT_ERR);

        if (error != CL_SUCCESS)
        {
            if (err != NULL) {
                *err = error;
            }
            return;
        }

        object_ = ::clCreateCommandQueue(context(), devices[0](), properties, &error);

        detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);

        if (err != NULL) {
            *err = error;
        }
    }

    CommandQueue(
        const Context& context,
        const Device& device,
        cl_command_queue_properties properties = 0,
        cl_int* err = NULL)
    {
        cl_int error;
        object_ = ::clCreateCommandQueue(
            context(), device(), properties, &error);

        detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly. Required for MSVC. */
    CommandQueue(const CommandQueue& queue) : detail::Wrapper<cl_type>(queue) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    CommandQueue& operator = (const CommandQueue &queue)
    {
        detail::Wrapper<cl_type>::operator=(queue);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly. Required for MSVC. */
    CommandQueue(CommandQueue&& queue) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(queue)) {}

    /*! \brief Move assignment to forward move to the superclass correctly. Required for MSVC. */
    CommandQueue& operator = (CommandQueue &&queue)
    {
        detail::Wrapper<cl_type>::operator=(std::move(queue));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    static CommandQueue getDefault(cl_int * err = NULL)
    {
        int state = detail::compare_exchange(
            &default_initialized_,
            __DEFAULT_BEING_INITIALIZED, __DEFAULT_NOT_INITIALIZED);

        if (state & __DEFAULT_INITIALIZED) {
            if (err != NULL) {
                *err = default_error_;
            }
            return default_;
        }

        if (state & __DEFAULT_BEING_INITIALIZED) {
            // Assume writes will propagate eventually...
            while(default_initialized_ != __DEFAULT_INITIALIZED) {
                detail::fence();
            }

            if (err != NULL) {
                *err = default_error_;
            }
            return default_;
        }

        cl_int error;

        Context context = Context::getDefault(&error);
        detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);

        if (error != CL_SUCCESS) {
            if (err != NULL) {
                *err = error;
            }
        }
        else {
            Device device = context.getInfo<CL_CONTEXT_DEVICES>()[0];

            default_ = CommandQueue(context, device, 0, &error);

            detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);
            if (err != NULL) {
                *err = error;
            }
        }

        detail::fence();

        default_error_ = error;
        // Assume writes will propagate eventually...
        default_initialized_ = __DEFAULT_INITIALIZED;

        detail::fence();

        if (err != NULL) {
            *err = default_error_;
        }
        return default_;
    }

    CommandQueue() { }

    __CL_EXPLICIT_CONSTRUCTORS CommandQueue(const cl_command_queue& commandQueue) : detail::Wrapper<cl_type>(commandQueue) { }

    CommandQueue& operator = (const cl_command_queue& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    template <typename T>
    cl_int getInfo(cl_command_queue_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(
                &::clGetCommandQueueInfo, object_, name, param),
                __GET_COMMAND_QUEUE_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_command_queue_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_command_queue_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    cl_int enqueueReadBuffer(
        const Buffer& buffer,
        cl_bool blocking,
        ::size_t offset,
        ::size_t size,
        void* ptr,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadBuffer(
                object_, buffer(), blocking, offset, size,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_READ_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
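    /*! \brief Usage sketch (editor's addition, not part of the original header).
     *
     *  A blocking read of a whole buffer back to the host; \c queue, \c buf
     *  and a 1024-element float array \c host are assumed to exist in the
     *  caller's code.
     *
     *  \code
     *  cl_int err = queue.enqueueReadBuffer(buf, CL_TRUE, 0,
     *                                       1024 * sizeof(float), host);
     *  \endcode
     */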
    cl_int enqueueWriteBuffer(
        const Buffer& buffer,
        cl_bool blocking,
        ::size_t offset,
        ::size_t size,
        const void* ptr,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueWriteBuffer(
                object_, buffer(), blocking, offset, size,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
                __ENQUEUE_WRITE_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyBuffer(
        const Buffer& src,
        const Buffer& dst,
        ::size_t src_offset,
        ::size_t dst_offset,
        ::size_t size,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyBuffer(
                object_, src(), dst(), src_offset, dst_offset, size,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQEUE_COPY_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueReadBufferRect(
        const Buffer& buffer,
        cl_bool blocking,
        const size_t<3>& buffer_offset,
        const size_t<3>& host_offset,
        const size_t<3>& region,
        ::size_t buffer_row_pitch,
        ::size_t buffer_slice_pitch,
        ::size_t host_row_pitch,
        ::size_t host_slice_pitch,
        void *ptr,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadBufferRect(
                object_,
                buffer(),
                blocking,
                (const ::size_t *)buffer_offset,
                (const ::size_t *)host_offset,
                (const ::size_t *)region,
                buffer_row_pitch,
                buffer_slice_pitch,
                host_row_pitch,
                host_slice_pitch,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
                __ENQUEUE_READ_BUFFER_RECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueWriteBufferRect(
        const Buffer& buffer,
        cl_bool blocking,
        const size_t<3>& buffer_offset,
        const size_t<3>& host_offset,
        const size_t<3>& region,
        ::size_t buffer_row_pitch,
        ::size_t buffer_slice_pitch,
        ::size_t host_row_pitch,
        ::size_t host_slice_pitch,
        const void *ptr,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueWriteBufferRect(
                object_,
                buffer(),
                blocking,
                (const ::size_t *)buffer_offset,
                (const ::size_t *)host_offset,
                (const ::size_t *)region,
                buffer_row_pitch,
                buffer_slice_pitch,
                host_row_pitch,
                host_slice_pitch,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
                __ENQUEUE_WRITE_BUFFER_RECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyBufferRect(
        const Buffer& src,
        const Buffer& dst,
        const size_t<3>& src_origin,
        const size_t<3>& dst_origin,
        const size_t<3>& region,
        ::size_t src_row_pitch,
        ::size_t src_slice_pitch,
        ::size_t dst_row_pitch,
        ::size_t dst_slice_pitch,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyBufferRect(
                object_,
                src(),
                dst(),
                (const ::size_t *)src_origin,
                (const ::size_t *)dst_origin,
                (const ::size_t *)region,
                src_row_pitch,
                src_slice_pitch,
                dst_row_pitch,
                dst_slice_pitch,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQEUE_COPY_BUFFER_RECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

#if defined(CL_VERSION_1_2)
    /**
     * Enqueue a command to fill a buffer object with a pattern
     * of a given size. The pattern is specified as a vector.
     * \tparam PatternType The datatype of the pattern field.
     *     The pattern type must be an accepted OpenCL data type.
     */
    template <typename PatternType>
    cl_int enqueueFillBuffer(
        const Buffer& buffer,
        PatternType pattern,
        ::size_t offset,
        ::size_t size,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillBuffer(
                object_,
                buffer(),
                static_cast<void*>(&pattern),
                sizeof(PatternType),
                offset,
                size,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
                __ENQUEUE_FILL_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif // #if defined(CL_VERSION_1_2)

    cl_int enqueueReadImage(
        const Image& image,
        cl_bool blocking,
        const size_t<3>& origin,
        const size_t<3>& region,
        ::size_t row_pitch,
        ::size_t slice_pitch,
        void* ptr,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadImage(
                object_, image(), blocking, (const ::size_t *) origin,
                (const ::size_t *) region, row_pitch, slice_pitch, ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_READ_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueWriteImage(
        const Image& image,
        cl_bool blocking,
        const size_t<3>& origin,
        const size_t<3>& region,
        ::size_t row_pitch,
        ::size_t slice_pitch,
        const void* ptr,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueWriteImage(
                object_, image(), blocking, (const ::size_t *) origin,
                (const ::size_t *) region, row_pitch, slice_pitch, ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_WRITE_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyImage(
        const Image& src,
        const Image& dst,
        const size_t<3>& src_origin,
        const size_t<3>& dst_origin,
        const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyImage(
                object_, src(), dst(), (const ::size_t *) src_origin,
                (const ::size_t *)dst_origin, (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_COPY_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
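    /*! \brief Usage sketch (editor's addition, not part of the original header).
     *
     *  Zero-filling a buffer with a float pattern (OpenCL 1.2 and later);
     *  \c queue and \c buf are assumed to exist in the caller's code. The
     *  pattern size is inferred from the PatternType template parameter.
     *
     *  \code
     *  cl_int err = queue.enqueueFillBuffer(buf, 0.0f, 0,
     *                                       1024 * sizeof(float));
     *  \endcode
     */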
#if defined(CL_VERSION_1_2)
    /**
     * Enqueue a command to fill an image object with a specified color.
     * \param fillColor is the color to use to fill the image.
     *     This is a four component RGBA floating-point color value if
     *     the image channel data type is not an unnormalized signed or
     *     unsigned data type.
     */
    cl_int enqueueFillImage(
        const Image& image,
        cl_float4 fillColor,
        const size_t<3>& origin,
        const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillImage(
                object_,
                image(),
                static_cast<void*>(&fillColor),
                (const ::size_t *) origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
                __ENQUEUE_FILL_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    /**
     * Enqueue a command to fill an image object with a specified color.
     * \param fillColor is the color to use to fill the image.
     *     This is a four component RGBA signed integer color value if
     *     the image channel data type is an unnormalized signed integer
     *     type.
     */
    cl_int enqueueFillImage(
        const Image& image,
        cl_int4 fillColor,
        const size_t<3>& origin,
        const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillImage(
                object_,
                image(),
                static_cast<void*>(&fillColor),
                (const ::size_t *) origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
                __ENQUEUE_FILL_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    /**
     * Enqueue a command to fill an image object with a specified color.
     * \param fillColor is the color to use to fill the image.
     *     This is a four component RGBA unsigned integer color value if
     *     the image channel data type is an unnormalized unsigned integer
     *     type.
     */
    cl_int enqueueFillImage(
        const Image& image,
        cl_uint4 fillColor,
        const size_t<3>& origin,
        const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillImage(
                object_,
                image(),
                static_cast<void*>(&fillColor),
                (const ::size_t *) origin,
                (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
                __ENQUEUE_FILL_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif // #if defined(CL_VERSION_1_2)

    cl_int enqueueCopyImageToBuffer(
        const Image& src,
        const Buffer& dst,
        const size_t<3>& src_origin,
        const size_t<3>& region,
        ::size_t dst_offset,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyImageToBuffer(
                object_, src(), dst(), (const ::size_t *) src_origin,
                (const ::size_t *) region, dst_offset,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyBufferToImage(
        const Buffer& src,
        const Image& dst,
        ::size_t src_offset,
        const size_t<3>& dst_origin,
        const size_t<3>& region,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyBufferToImage(
                object_, src(), dst(), src_offset,
                (const ::size_t *) dst_origin, (const ::size_t *) region,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
    void* enqueueMapBuffer(
        const Buffer& buffer,
        cl_bool blocking,
        cl_map_flags flags,
        ::size_t offset,
        ::size_t size,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL,
        cl_int* err = NULL) const
    {
        cl_event tmp;
        cl_int error;
        void * result = ::clEnqueueMapBuffer(
            object_, buffer(), blocking, flags, offset, size,
            (events != NULL) ? (cl_uint) events->size() : 0,
            (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
            (event != NULL) ? &tmp : NULL,
            &error);

        detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
        if (event != NULL && error == CL_SUCCESS)
            *event = tmp;

        return result;
    }

    void* enqueueMapImage(
        const Image& buffer,
        cl_bool blocking,
        cl_map_flags flags,
        const size_t<3>& origin,
        const size_t<3>& region,
        ::size_t * row_pitch,
        ::size_t * slice_pitch,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL,
        cl_int* err = NULL) const
    {
        cl_event tmp;
        cl_int error;
        void * result = ::clEnqueueMapImage(
            object_, buffer(), blocking, flags,
            (const ::size_t *) origin, (const ::size_t *) region,
            row_pitch, slice_pitch,
            (events != NULL) ? (cl_uint) events->size() : 0,
            (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
            (event != NULL) ? &tmp : NULL,
            &error);

        detail::errHandler(error, __ENQUEUE_MAP_IMAGE_ERR);
        if (err != NULL) {
              *err = error;
        }
        if (event != NULL && error == CL_SUCCESS)
            *event = tmp;
        return result;
    }

    cl_int enqueueUnmapMemObject(
        const Memory& memory,
        void* mapped_ptr,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueUnmapMemObject(
                object_, memory(), mapped_ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_UNMAP_MEM_OBJECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

#if defined(CL_VERSION_1_2)
    /**
     * Enqueues a marker command which waits for either a list of events to complete,
     * or all previously enqueued commands to complete.
     *
     * Enqueues a marker command which waits for either a list of events to complete,
     * or if the list is empty it waits for all commands previously enqueued in command_queue
     * to complete before it completes. This command returns an event which can be waited on,
     * i.e. this event can be waited on to ensure that all events either in the event_wait_list
     * or all previously enqueued commands, queued before this command to command_queue,
     * have completed.
     */
    cl_int enqueueMarkerWithWaitList(
        const VECTOR_CLASS<Event> *events = 0,
        Event *event = 0) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueMarkerWithWaitList(
                object_,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_MARKER_WAIT_LIST_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    /**
     * A synchronization point that enqueues a barrier operation.
     *
     * Enqueues a barrier command which waits for either a list of events to complete,
     * or if the list is empty it waits for all commands previously enqueued in command_queue
     * to complete before it completes. This command blocks command execution, that is, any
     * following commands enqueued after it do not execute until it completes. This command
     * returns an event which can be waited on, i.e. this event can be waited on to ensure that
     * all events either in the event_wait_list or all previously enqueued commands, queued
     * before this command to command_queue, have completed.
     */
    cl_int enqueueBarrierWithWaitList(
        const VECTOR_CLASS<Event> *events = 0,
        Event *event = 0) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueBarrierWithWaitList(
                object_,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_BARRIER_WAIT_LIST_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    /**
     * Enqueues a command to indicate with which device a set of memory objects
     * should be associated.
     */
    cl_int enqueueMigrateMemObjects(
        const VECTOR_CLASS<Memory> &memObjects,
        cl_mem_migration_flags flags,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL
        ) const
    {
        cl_event tmp;

        cl_mem* localMemObjects = static_cast<cl_mem*>(alloca(memObjects.size() * sizeof(cl_mem)));
        for( int i = 0; i < (int)memObjects.size(); ++i ) {
            localMemObjects[i] = memObjects[i]();
        }

        cl_int err = detail::errHandler(
            ::clEnqueueMigrateMemObjects(
                object_,
                (cl_uint)memObjects.size(),
                static_cast<const cl_mem*>(localMemObjects),
                flags,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_UNMAP_MEM_OBJECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif // #if defined(CL_VERSION_1_2)

    cl_int enqueueNDRangeKernel(
        const Kernel& kernel,
        const NDRange& offset,
        const NDRange& global,
        const NDRange& local = NullRange,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueNDRangeKernel(
                object_, kernel(), (cl_uint) global.dimensions(),
                offset.dimensions() != 0 ? (const ::size_t*) offset : NULL,
                (const ::size_t*) global,
                local.dimensions() != 0 ? (const ::size_t*) local : NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_NDRANGE_KERNEL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueTask(
        const Kernel& kernel,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueTask(
                object_, kernel(),
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_TASK_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
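    /*! \brief Usage sketch (editor's addition, not part of the original header).
     *
     *  Launching a 1D kernel and waiting on the returned event; \c queue
     *  and the kernel \c k are assumed to exist in the caller's code. A
     *  NullRange offset starts the index space at zero, and a NullRange
     *  local size lets the implementation choose the work-group size.
     *
     *  \code
     *  cl::Event done;
     *  cl_int err = queue.enqueueNDRangeKernel(
     *      k, cl::NullRange, cl::NDRange(1024), cl::NDRange(64),
     *      NULL, &done);
     *  if (err == CL_SUCCESS) {
     *      done.wait();
     *  }
     *  \endcode
     */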
    /**
     * Deprecated APIs for 1.2
     */
#if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2))
    CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
    cl_int enqueueMarker(Event* event = NULL) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueMarker(
                object_,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_MARKER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
    cl_int enqueueWaitForEvents(const VECTOR_CLASS<Event>& events) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
    {
        return detail::errHandler(
            ::clEnqueueWaitForEvents(
                object_,
                (cl_uint) events.size(),
                events.size() > 0 ? (const cl_event*) &events.front() : NULL),
            __ENQUEUE_WAIT_FOR_EVENTS_ERR);
    }
#endif // #if defined(CL_VERSION_1_1)

    cl_int enqueueAcquireGLObjects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueAcquireGLObjects(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_ACQUIRE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueReleaseGLObjects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReleaseGLObjects(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_RELEASE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
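    /*
     * Illustrative sketch (editor's addition): the usual acquire/use/release
     * pattern for shared GL objects. `glBuf` is assumed to be a cl::Memory
     * created from a GL buffer (e.g. via clCreateFromGLBuffer) in a context
     * with GL sharing enabled.
     *
     * \code
     * VECTOR_CLASS<cl::Memory> objs;
     * objs.push_back(glBuf);
     * glFinish();                              // GL must be done with the objects
     * queue.enqueueAcquireGLObjects(&objs);
     * // ... enqueue kernels that use objs ...
     * queue.enqueueReleaseGLObjects(&objs);
     * queue.finish();                          // CL must be done before GL reuses them
     * \endcode
     */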
#if defined (USE_DX_INTEROP)
typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueAcquireD3D10ObjectsKHR)(
    cl_command_queue command_queue, cl_uint num_objects,
    const cl_mem* mem_objects, cl_uint num_events_in_wait_list,
    const cl_event* event_wait_list, cl_event* event);
typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueReleaseD3D10ObjectsKHR)(
    cl_command_queue command_queue, cl_uint num_objects,
    const cl_mem* mem_objects, cl_uint num_events_in_wait_list,
    const cl_event* event_wait_list, cl_event* event);

    cl_int enqueueAcquireD3D10Objects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        static PFN_clEnqueueAcquireD3D10ObjectsKHR pfn_clEnqueueAcquireD3D10ObjectsKHR = NULL;
#if defined(CL_VERSION_1_2)
        cl_context context = getInfo<CL_QUEUE_CONTEXT>();
        cl::Device device(getInfo<CL_QUEUE_DEVICE>());
        cl_platform_id platform = device.getInfo<CL_DEVICE_PLATFORM>();
        __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clEnqueueAcquireD3D10ObjectsKHR);
#endif
#if defined(CL_VERSION_1_1)
        __INIT_CL_EXT_FCN_PTR(clEnqueueAcquireD3D10ObjectsKHR);
#endif

        cl_event tmp;
        cl_int err = detail::errHandler(
            pfn_clEnqueueAcquireD3D10ObjectsKHR(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_ACQUIRE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueReleaseD3D10Objects(
        const VECTOR_CLASS<Memory>* mem_objects = NULL,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        static PFN_clEnqueueReleaseD3D10ObjectsKHR pfn_clEnqueueReleaseD3D10ObjectsKHR = NULL;
#if defined(CL_VERSION_1_2)
        cl_context context = getInfo<CL_QUEUE_CONTEXT>();
        cl::Device device(getInfo<CL_QUEUE_DEVICE>());
        cl_platform_id platform = device.getInfo<CL_DEVICE_PLATFORM>();
        __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clEnqueueReleaseD3D10ObjectsKHR);
#endif // #if defined(CL_VERSION_1_2)
#if defined(CL_VERSION_1_1)
        __INIT_CL_EXT_FCN_PTR(clEnqueueReleaseD3D10ObjectsKHR);
#endif // #if defined(CL_VERSION_1_1)

        cl_event tmp;
        cl_int err = detail::errHandler(
            pfn_clEnqueueReleaseD3D10ObjectsKHR(
                object_,
                (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0,
                (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_RELEASE_GL_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif

    /**
     * Deprecated APIs for 1.2
     */
#if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2))
    CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
    cl_int enqueueBarrier() const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
    {
        return detail::errHandler(
            ::clEnqueueBarrier(object_),
            __ENQUEUE_BARRIER_ERR);
    }
#endif // #if defined(CL_VERSION_1_1)

    cl_int flush() const
    {
        return detail::errHandler(::clFlush(object_), __FLUSH_ERR);
    }

    cl_int finish() const
    {
        return detail::errHandler(::clFinish(object_), __FINISH_ERR);
    }
};

#ifdef _WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) std::atomic<int> CommandQueue::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) volatile int CommandQueue::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) CommandQueue CommandQueue::default_;
__declspec(selectany) volatile cl_int CommandQueue::default_error_ = CL_SUCCESS;
#else // !_WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) std::atomic<int> CommandQueue::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) volatile int CommandQueue::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) CommandQueue CommandQueue::default_;
__attribute__((weak)) volatile cl_int CommandQueue::default_error_ = CL_SUCCESS;
#endif // !_WIN32
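/*
 * Note (editor's addition): the definitions above back the lazily created
 * per-process default queue used by CommandQueue::getDefault() and by the
 * free enqueue*() and copy() helpers that follow. A minimal sketch:
 *
 * \code
 * cl_int err;
 * cl::CommandQueue q = cl::CommandQueue::getDefault(&err);
 * if (err == CL_SUCCESS) {
 *     q.finish();
 * }
 * \endcode
 */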
template< typename IteratorType >
Buffer::Buffer(
    const Context &context,
    IteratorType startIterator,
    IteratorType endIterator,
    bool readOnly,
    bool useHostPtr,
    cl_int* err)
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    cl_mem_flags flags = 0;
    if( readOnly ) {
        flags |= CL_MEM_READ_ONLY;
    }
    else {
        flags |= CL_MEM_READ_WRITE;
    }
    if( useHostPtr ) {
        flags |= CL_MEM_USE_HOST_PTR;
    }

    ::size_t size = sizeof(DataType)*(endIterator - startIterator);

    if( useHostPtr ) {
        object_ = ::clCreateBuffer(context(), flags, size, static_cast<DataType*>(&*startIterator), &error);
    } else {
        object_ = ::clCreateBuffer(context(), flags, size, 0, &error);
    }

    detail::errHandler(error, __CREATE_BUFFER_ERR);
    if (err != NULL) {
        *err = error;
    }

    if( !useHostPtr ) {
        CommandQueue queue(context, 0, &error);
        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }

        error = cl::copy(queue, startIterator, endIterator, *this);
        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }
}

template< typename IteratorType >
Buffer::Buffer(
    const CommandQueue &queue,
    IteratorType startIterator,
    IteratorType endIterator,
    bool readOnly,
    bool useHostPtr,
    cl_int* err)
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    cl_mem_flags flags = 0;
    if (readOnly) {
        flags |= CL_MEM_READ_ONLY;
    }
    else {
        flags |= CL_MEM_READ_WRITE;
    }
    if (useHostPtr) {
        flags |= CL_MEM_USE_HOST_PTR;
    }

    ::size_t size = sizeof(DataType)*(endIterator - startIterator);

    Context context = queue.getInfo<CL_QUEUE_CONTEXT>();

    if (useHostPtr) {
        object_ = ::clCreateBuffer(context(), flags, size, static_cast<DataType*>(&*startIterator), &error);
    } else {
        object_ = ::clCreateBuffer(context(), flags, size, 0, &error);
    }

    detail::errHandler(error, __CREATE_BUFFER_ERR);
    if (err != NULL) {
        *err = error;
    }

    if (!useHostPtr) {
        error = cl::copy(queue, startIterator, endIterator, *this);
        detail::errHandler(error, __CREATE_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }
}
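/*
 * Illustrative sketch (editor's addition): building a device buffer straight
 * from host iterators with the constructor above. `context` is assumed to
 * exist; with useHostPtr == false the data is copied through a temporary
 * queue created on the context.
 *
 * \code
 * std::vector<float> data(1024, 1.0f);
 * cl_int err;
 * cl::Buffer buf(context, data.begin(), data.end(),
 *                false,   // readOnly
 *                false,   // useHostPtr
 *                &err);
 * \endcode
 */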
inline cl_int enqueueReadBuffer(
    const Buffer& buffer,
    cl_bool blocking,
    ::size_t offset,
    ::size_t size,
    void* ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueReadBuffer(buffer, blocking, offset, size, ptr, events, event);
}

inline cl_int enqueueWriteBuffer(
    const Buffer& buffer,
    cl_bool blocking,
    ::size_t offset,
    ::size_t size,
    const void* ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueWriteBuffer(buffer, blocking, offset, size, ptr, events, event);
}

inline void* enqueueMapBuffer(
    const Buffer& buffer,
    cl_bool blocking,
    cl_map_flags flags,
    ::size_t offset,
    ::size_t size,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL,
    cl_int* err = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
    if (err != NULL) {
        *err = error;
    }

    void * result = ::clEnqueueMapBuffer(
        queue(), buffer(), blocking, flags, offset, size,
        (events != NULL) ? (cl_uint) events->size() : 0,
        (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
        (cl_event*) event,
        &error);

    detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
    if (err != NULL) {
        *err = error;
    }
    return result;
}

inline cl_int enqueueUnmapMemObject(
    const Memory& memory,
    void* mapped_ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR);
    if (error != CL_SUCCESS) {
        return error;
    }

    cl_event tmp;
    cl_int err = detail::errHandler(
        ::clEnqueueUnmapMemObject(
            queue(), memory(), mapped_ptr,
            (events != NULL) ? (cl_uint) events->size() : 0,
            (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
            (event != NULL) ? &tmp : NULL),
        __ENQUEUE_UNMAP_MEM_OBJECT_ERR);

    if (event != NULL && err == CL_SUCCESS)
        *event = tmp;

    return err;
}

inline cl_int enqueueCopyBuffer(
    const Buffer& src,
    const Buffer& dst,
    ::size_t src_offset,
    ::size_t dst_offset,
    ::size_t size,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueCopyBuffer(src, dst, src_offset, dst_offset, size, events, event);
}

/**
 * Blocking copy operation between iterators and a buffer.
 * Host to Device.
 * Uses default command queue.
 */
template< typename IteratorType >
inline cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer )
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS)
        return error;

    return cl::copy(queue, startIterator, endIterator, buffer);
}

/**
 * Blocking copy operation between iterators and a buffer.
 * Device to Host.
 * Uses default command queue.
 */
template< typename IteratorType >
inline cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator )
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS)
        return error;

    return cl::copy(queue, buffer, startIterator, endIterator);
}

/**
 * Blocking copy operation between iterators and a buffer.
 * Host to Device.
 * Uses specified queue.
 */
template< typename IteratorType >
inline cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer )
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    ::size_t length = endIterator-startIterator;
    ::size_t byteLength = length*sizeof(DataType);

    DataType *pointer =
        static_cast<DataType*>(queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_WRITE, 0, byteLength, 0, 0, &error));
    // if exceptions enabled, enqueueMapBuffer will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
#if defined(_MSC_VER)
    std::copy(
        startIterator,
        endIterator,
        stdext::checked_array_iterator<DataType*>(
            pointer, length));
#else
    std::copy(startIterator, endIterator, pointer);
#endif
    Event endEvent;
    error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent);
    // if exceptions enabled, enqueueUnmapMemObject will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    endEvent.wait();
    return CL_SUCCESS;
}
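/*
 * Illustrative sketch (editor's addition): round-tripping a std::vector
 * through a buffer with the blocking copy helpers. `queue` and `buf` are
 * assumed to exist, with `buf` at least 256 * sizeof(int) bytes.
 *
 * \code
 * std::vector<int> host(256, 42);
 * cl::copy(queue, host.begin(), host.end(), buf);   // host -> device
 * cl::copy(queue, buf, host.begin(), host.end());   // device -> host
 * \endcode
 */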
/**
 * Blocking copy operation between iterators and a buffer.
 * Device to Host.
 * Uses specified queue.
 */
template< typename IteratorType >
inline cl_int copy( const CommandQueue &queue, const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator )
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    ::size_t length = endIterator-startIterator;
    ::size_t byteLength = length*sizeof(DataType);

    DataType *pointer =
        static_cast<DataType*>(queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_READ, 0, byteLength, 0, 0, &error));
    // if exceptions enabled, enqueueMapBuffer will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    std::copy(pointer, pointer + length, startIterator);
    Event endEvent;
    error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent);
    // if exceptions enabled, enqueueUnmapMemObject will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    endEvent.wait();
    return CL_SUCCESS;
}

#if defined(CL_VERSION_1_1)
inline cl_int enqueueReadBufferRect(
    const Buffer& buffer,
    cl_bool blocking,
    const size_t<3>& buffer_offset,
    const size_t<3>& host_offset,
    const size_t<3>& region,
    ::size_t buffer_row_pitch,
    ::size_t buffer_slice_pitch,
    ::size_t host_row_pitch,
    ::size_t host_slice_pitch,
    void *ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueReadBufferRect(
        buffer, blocking, buffer_offset, host_offset, region,
        buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch,
        ptr, events, event);
}

inline cl_int enqueueWriteBufferRect(
    const Buffer& buffer,
    cl_bool blocking,
    const size_t<3>& buffer_offset,
    const size_t<3>& host_offset,
    const size_t<3>& region,
    ::size_t buffer_row_pitch,
    ::size_t buffer_slice_pitch,
    ::size_t host_row_pitch,
    ::size_t host_slice_pitch,
    const void *ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueWriteBufferRect(
        buffer, blocking, buffer_offset, host_offset, region,
        buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch,
        ptr, events, event);
}

inline cl_int enqueueCopyBufferRect(
    const Buffer& src,
    const Buffer& dst,
    const size_t<3>& src_origin,
    const size_t<3>& dst_origin,
    const size_t<3>& region,
    ::size_t src_row_pitch,
    ::size_t src_slice_pitch,
    ::size_t dst_row_pitch,
    ::size_t dst_slice_pitch,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueCopyBufferRect(
        src, dst, src_origin, dst_origin, region,
        src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch,
        events, event);
}
#endif

inline cl_int enqueueReadImage(
    const Image& image,
    cl_bool blocking,
    const size_t<3>& origin,
    const size_t<3>& region,
    ::size_t row_pitch,
    ::size_t slice_pitch,
    void* ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueReadImage(
        image, blocking, origin, region, row_pitch, slice_pitch, ptr, events, event);
}
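/*
 * Illustrative sketch (editor's addition, all names hypothetical): reading an
 * h-row sub-rectangle of rowBytes bytes per row from a pitched buffer into
 * tightly packed host memory. This assumes size_t<3> elements default to
 * zero, as its default constructor does in this header.
 *
 * \code
 * cl::size_t<3> buffer_origin;                 // byte offset x, row y, slice 0
 * buffer_origin[0] = xBytes; buffer_origin[1] = y;
 * cl::size_t<3> host_origin;                   // all zeros
 * cl::size_t<3> region;
 * region[0] = rowBytes; region[1] = h; region[2] = 1;
 * cl_int err = cl::enqueueReadBufferRect(
 *     buf, CL_TRUE, buffer_origin, host_origin, region,
 *     bufferRowPitch, 0, rowBytes, 0, hostPtr);
 * \endcode
 */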
inline cl_int enqueueWriteImage(
    const Image& image,
    cl_bool blocking,
    const size_t<3>& origin,
    const size_t<3>& region,
    ::size_t row_pitch,
    ::size_t slice_pitch,
    const void* ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueWriteImage(
        image, blocking, origin, region, row_pitch, slice_pitch, ptr, events, event);
}

inline cl_int enqueueCopyImage(
    const Image& src,
    const Image& dst,
    const size_t<3>& src_origin,
    const size_t<3>& dst_origin,
    const size_t<3>& region,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueCopyImage(src, dst, src_origin, dst_origin, region, events, event);
}

inline cl_int enqueueCopyImageToBuffer(
    const Image& src,
    const Buffer& dst,
    const size_t<3>& src_origin,
    const size_t<3>& region,
    ::size_t dst_offset,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueCopyImageToBuffer(src, dst, src_origin, region, dst_offset, events, event);
}

inline cl_int enqueueCopyBufferToImage(
    const Buffer& src,
    const Image& dst,
    ::size_t src_offset,
    const size_t<3>& dst_origin,
    const size_t<3>& region,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.enqueueCopyBufferToImage(src, dst, src_offset, dst_origin, region, events, event);
}

inline cl_int flush(void)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.flush();
}

inline cl_int finish(void)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);

    if (error != CL_SUCCESS) {
        return error;
    }

    return queue.finish();
}

// Kernel Functor support
// New interface as of September 2011
// Requires the C++11 std::tr1::function (note: does not support TR1)
// Visual Studio 2010 and GCC 4.2

struct EnqueueArgs
{
    CommandQueue queue_;
    const NDRange offset_;
    const NDRange global_;
    const NDRange local_;
    VECTOR_CLASS<Event> events_;

    EnqueueArgs(NDRange global) :
        queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange)
    { }

    EnqueueArgs(NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local)
    { }

    EnqueueArgs(NDRange offset, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local)
    { }

    EnqueueArgs(Event e, NDRange global) :
        queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange)
    {
        events_.push_back(e);
    }

    EnqueueArgs(Event e, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(Event e, NDRange offset, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange global) :
        queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange), events_(events)
    { }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local), events_(events)
    { }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange offset, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local), events_(events)
    { }

    EnqueueArgs(CommandQueue &queue, NDRange global) :
        queue_(queue), offset_(NullRange), global_(global), local_(NullRange)
    { }

    EnqueueArgs(CommandQueue &queue, NDRange global, NDRange local) :
        queue_(queue), offset_(NullRange), global_(global), local_(local)
    { }

    EnqueueArgs(CommandQueue &queue, NDRange offset, NDRange global, NDRange local) :
        queue_(queue), offset_(offset), global_(global), local_(local)
    { }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange global) :
        queue_(queue), offset_(NullRange), global_(global), local_(NullRange)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange global, NDRange local) :
        queue_(queue), offset_(NullRange), global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange offset, NDRange global, NDRange local) :
        queue_(queue), offset_(offset), global_(global), local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange global) :
        queue_(queue), offset_(NullRange), global_(global), local_(NullRange), events_(events)
    { }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange global, NDRange local) :
        queue_(queue), offset_(NullRange), global_(global), local_(local), events_(events)
    { }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange offset, NDRange global, NDRange local) :
        queue_(queue), offset_(offset), global_(global), local_(local), events_(events)
    { }
};
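/*
 * Illustrative sketch (editor's addition): EnqueueArgs bundles the launch
 * configuration consumed by the kernel functors below. `queue`, `dep` and
 * `n` are assumed to exist in the caller.
 *
 * \code
 * cl::EnqueueArgs simple(cl::NDRange(n));                        // default queue
 * cl::EnqueueArgs tiled(queue, cl::NDRange(n), cl::NDRange(64)); // explicit local size
 * cl::EnqueueArgs after(queue, dep, cl::NDRange(n));             // waits on event dep
 * \endcode
 */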
namespace detail {

class NullType {};

template<int index, typename T0>
struct SetArg
{
    static void set (Kernel kernel, T0 arg)
    {
        kernel.setArg(index, arg);
    }
};

template<int index>
struct SetArg<index, NullType>
{
    static void set (Kernel, NullType)
    {
    }
};

template <
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27, typename T28, typename T29, typename T30, typename T31
>
class KernelFunctorGlobal
{
private:
    Kernel kernel_;

public:
    KernelFunctorGlobal(
        Kernel kernel) :
        kernel_(kernel)
    {}

    KernelFunctorGlobal(
        const Program& program,
        const STRING_CLASS name,
        cl_int * err = NULL) :
        kernel_(program, name.c_str(), err)
    {}

    Event operator() (
        const EnqueueArgs& args,
        T0 t0,
        T1 t1 = NullType(), T2 t2 = NullType(), T3 t3 = NullType(), T4 t4 = NullType(),
        T5 t5 = NullType(), T6 t6 = NullType(), T7 t7 = NullType(), T8 t8 = NullType(),
        T9 t9 = NullType(), T10 t10 = NullType(), T11 t11 = NullType(), T12 t12 = NullType(),
        T13 t13 = NullType(), T14 t14 = NullType(), T15 t15 = NullType(), T16 t16 = NullType(),
        T17 t17 = NullType(), T18 t18 = NullType(), T19 t19 = NullType(), T20 t20 = NullType(),
        T21 t21 = NullType(), T22 t22 = NullType(), T23 t23 = NullType(), T24 t24 = NullType(),
        T25 t25 = NullType(), T26 t26 = NullType(), T27 t27 = NullType(), T28 t28 = NullType(),
        T29 t29 = NullType(), T30 t30 = NullType(), T31 t31 = NullType())
    {
        Event event;
        SetArg<0, T0>::set(kernel_, t0);    SetArg<1, T1>::set(kernel_, t1);
        SetArg<2, T2>::set(kernel_, t2);    SetArg<3, T3>::set(kernel_, t3);
        SetArg<4, T4>::set(kernel_, t4);    SetArg<5, T5>::set(kernel_, t5);
        SetArg<6, T6>::set(kernel_, t6);    SetArg<7, T7>::set(kernel_, t7);
        SetArg<8, T8>::set(kernel_, t8);    SetArg<9, T9>::set(kernel_, t9);
        SetArg<10, T10>::set(kernel_, t10); SetArg<11, T11>::set(kernel_, t11);
        SetArg<12, T12>::set(kernel_, t12); SetArg<13, T13>::set(kernel_, t13);
        SetArg<14, T14>::set(kernel_, t14); SetArg<15, T15>::set(kernel_, t15);
        SetArg<16, T16>::set(kernel_, t16); SetArg<17, T17>::set(kernel_, t17);
        SetArg<18, T18>::set(kernel_, t18); SetArg<19, T19>::set(kernel_, t19);
        SetArg<20, T20>::set(kernel_, t20); SetArg<21, T21>::set(kernel_, t21);
        SetArg<22, T22>::set(kernel_, t22); SetArg<23, T23>::set(kernel_, t23);
        SetArg<24, T24>::set(kernel_, t24); SetArg<25, T25>::set(kernel_, t25);
        SetArg<26, T26>::set(kernel_, t26); SetArg<27, T27>::set(kernel_, t27);
        SetArg<28, T28>::set(kernel_, t28); SetArg<29, T29>::set(kernel_, t29);
        SetArg<30, T30>::set(kernel_, t30); SetArg<31, T31>::set(kernel_, t31);

        args.queue_.enqueueNDRangeKernel(
            kernel_,
            args.offset_,
            args.global_,
            args.local_,
            &args.events_,
            &event);

        return event;
    }
};
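/*
 * Illustrative sketch (editor's addition): KernelFunctorGlobal is the engine
 * behind the typed kernel-functor interface (e.g. the make_kernel wrapper
 * defined later in this header). Assuming such a wrapper and existing
 * `program`, `queue`, `aBuf`, `bBuf`, `cBuf` and `n`:
 *
 * \code
 * cl::make_kernel<cl::Buffer&, cl::Buffer&, cl::Buffer&> vadd(program, "vadd");
 * cl::Event e = vadd(cl::EnqueueArgs(queue, cl::NDRange(n)), aBuf, bBuf, cBuf);
 * \endcode
 */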
//------------------------------------------------------------------------------------------------------

template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27, typename T28, typename T29, typename T30, typename T31>
struct functionImplementation_
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 32))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29, T30 arg30, T31 arg31)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28, arg29, arg30, arg31);
    }
};
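/*
 * Note (editor's addition): the partial specializations that follow trim the
 * 32-slot template down to the actual argument count. Unused slots default to
 * detail::NullType, which selects the matching specialization, and
 * SetArg<index, NullType> turns the corresponding setArg call into a no-op.
 * For example, a two-argument functor instantiates
 * functionImplementation_<T0, T1, NullType, ..., NullType> with 30 NullTypes.
 */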
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27, typename T28, typename T29, typename T30>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 31))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29, T30 arg30)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28, arg29, arg30);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27, typename T28, typename T29>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 30))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28, arg29);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27, typename T28>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 29))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 28))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26, T27 arg27)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26,
    NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26,
        NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 27))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25, T26 arg26)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24, T25,
    NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25,
        NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 26))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24, T25);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24, T25 arg25)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23,
    typename T24>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23, T24,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 25))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23, T24);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23,
        T24 arg24)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22, T23,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 24))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21, T22,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 23))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21, arg22);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20, typename T21>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20, T21,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 22))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20, arg21);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19, typename T20>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19, T20,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 21))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19, arg20);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18, T19,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 20))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18, T19 arg19)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18, arg19);
    }
};

template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17, T18,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 19))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17, T18 arg18)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17, arg18);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16, T17,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 18))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16, T17 arg17)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16, arg17);
    }
};
template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15,
    typename T16>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    T16,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 17))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
    typedef Event type_(
        const EnqueueArgs&,
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        T16);

    Event operator()(
        const EnqueueArgs& enqueueArgs,
        T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7,
        T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15,
        T16 arg16)
    {
        return functor_(
            enqueueArgs,
            arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15,
            arg16);
    }
};

template<
    typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15>
struct functionImplementation_<
    T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
    NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType>
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15,
        NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
#if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 16))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
#endif
    }
\brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 15)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 14)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! 
\brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 13)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 12)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! 
\brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 11)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 10)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 9)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 8)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 7)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 6)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4> struct functionImplementation_ < T0, T1, T2, T3, T4, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 5)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4); } }; template< typename T0, typename T1, typename T2, typename T3> struct functionImplementation_ < T0, T1, T2, T3, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 4)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3); } }; template< typename T0, typename T1, typename T2> struct functionImplementation_ < T0, T1, T2, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 3)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2) { return functor_( enqueueArgs, arg0, arg1, arg2); } }; template< typename T0, typename T1> struct functionImplementation_ < T0, T1, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 2)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1) { return functor_( enqueueArgs, arg0, arg1); } }; template< typename T0> struct functionImplementation_ < T0, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 1)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0) { return functor_( enqueueArgs, arg0); } }; } // namespace detail //---------------------------------------------------------------------------------------------- template < typename T0, typename T1 = detail::NullType, typename T2 = detail::NullType, typename T3 = detail::NullType, typename T4 = detail::NullType, typename T5 = detail::NullType, typename T6 = detail::NullType, typename T7 = detail::NullType, typename T8 = detail::NullType, typename T9 = detail::NullType, typename T10 = detail::NullType, typename T11 = detail::NullType, typename T12 = detail::NullType, typename T13 = detail::NullType, typename T14 = detail::NullType, typename T15 = detail::NullType, typename T16 = detail::NullType, typename T17 = detail::NullType, typename T18 = detail::NullType, typename T19 = detail::NullType, typename T20 = detail::NullType, typename T21 = detail::NullType, typename T22 = detail::NullType, typename T23 = detail::NullType, typename T24 = detail::NullType, typename T25 = detail::NullType, typename T26 = detail::NullType, typename T27 = detail::NullType, typename T28 = detail::NullType, typename T29 = detail::NullType, typename T30 = detail::NullType, typename T31 = detail::NullType > struct make_kernel : public detail::functionImplementation_< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 > { public: typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 > FunctorType; make_kernel( const Program& program, const STRING_CLASS name, cl_int * err = NULL) : detail::functionImplementation_< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 >( FunctorType(program, name, err)) {} make_kernel( const Kernel kernel) : 
detail::functionImplementation_< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 >( FunctorType(kernel)) {} }; //---------------------------------------------------------------------------------------------------------------------- #undef __ERR_STR #if !defined(__CL_USER_OVERRIDE_ERROR_STRINGS) #undef __GET_DEVICE_INFO_ERR #undef __GET_PLATFORM_INFO_ERR #undef __GET_DEVICE_IDS_ERR #undef __GET_CONTEXT_INFO_ERR #undef __GET_EVENT_INFO_ERR #undef __GET_EVENT_PROFILE_INFO_ERR #undef __GET_MEM_OBJECT_INFO_ERR #undef __GET_IMAGE_INFO_ERR #undef __GET_SAMPLER_INFO_ERR #undef __GET_KERNEL_INFO_ERR #undef __GET_KERNEL_ARG_INFO_ERR #undef __GET_KERNEL_WORK_GROUP_INFO_ERR #undef __GET_PROGRAM_INFO_ERR #undef __GET_PROGRAM_BUILD_INFO_ERR #undef __GET_COMMAND_QUEUE_INFO_ERR #undef __CREATE_CONTEXT_ERR #undef __CREATE_CONTEXT_FROM_TYPE_ERR #undef __GET_SUPPORTED_IMAGE_FORMATS_ERR #undef __CREATE_BUFFER_ERR #undef __CREATE_SUBBUFFER_ERR #undef __CREATE_IMAGE2D_ERR #undef __CREATE_IMAGE3D_ERR #undef __CREATE_SAMPLER_ERR #undef __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR #undef __CREATE_USER_EVENT_ERR #undef __SET_USER_EVENT_STATUS_ERR #undef __SET_EVENT_CALLBACK_ERR #undef __SET_PRINTF_CALLBACK_ERR #undef __WAIT_FOR_EVENTS_ERR #undef __CREATE_KERNEL_ERR #undef __SET_KERNEL_ARGS_ERR #undef __CREATE_PROGRAM_WITH_SOURCE_ERR #undef __CREATE_PROGRAM_WITH_BINARY_ERR #undef __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR #undef __BUILD_PROGRAM_ERR #undef __CREATE_KERNELS_IN_PROGRAM_ERR #undef __CREATE_COMMAND_QUEUE_ERR #undef __SET_COMMAND_QUEUE_PROPERTY_ERR #undef __ENQUEUE_READ_BUFFER_ERR #undef __ENQUEUE_WRITE_BUFFER_ERR #undef __ENQUEUE_READ_BUFFER_RECT_ERR #undef __ENQUEUE_WRITE_BUFFER_RECT_ERR #undef __ENQEUE_COPY_BUFFER_ERR #undef __ENQEUE_COPY_BUFFER_RECT_ERR #undef __ENQUEUE_READ_IMAGE_ERR #undef __ENQUEUE_WRITE_IMAGE_ERR #undef __ENQUEUE_COPY_IMAGE_ERR #undef __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR #undef __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR #undef __ENQUEUE_MAP_BUFFER_ERR #undef __ENQUEUE_MAP_IMAGE_ERR #undef __ENQUEUE_UNMAP_MEM_OBJECT_ERR #undef __ENQUEUE_NDRANGE_KERNEL_ERR #undef __ENQUEUE_TASK_ERR #undef __ENQUEUE_NATIVE_KERNEL #undef __CL_EXPLICIT_CONSTRUCTORS #undef __UNLOAD_COMPILER_ERR #endif //__CL_USER_OVERRIDE_ERROR_STRINGS #undef __CL_FUNCTION_TYPE // Extensions /** * Deprecated APIs for 1.2 */ #if defined(CL_VERSION_1_1) #undef __INIT_CL_EXT_FCN_PTR #endif // #if defined(CL_VERSION_1_1) #undef __CREATE_SUB_DEVICES #if defined(USE_CL_DEVICE_FISSION) #undef __PARAM_NAME_DEVICE_FISSION #endif // USE_CL_DEVICE_FISSION #undef __DEFAULT_NOT_INITIALIZED #undef __DEFAULT_BEING_INITIALIZED #undef __DEFAULT_INITIALIZED #undef CL_HPP_RVALUE_REFERENCES_SUPPORTED #undef CL_HPP_NOEXCEPT } // namespace cl #endif // CL_HPP_ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/cl2.hpp000066400000000000000000011116741450307266000230260ustar00rootroot00000000000000/* Modifications Copyright(C)[2021-2022] Advanced Micro Devices, Inc. * All rights reserved. * */ /******************************************************************************* * Copyright (c) 2008-2016 The Khronos Group Inc. 
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and/or associated documentation files (the
 * "Materials"), to deal in the Materials without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Materials, and to
 * permit persons to whom the Materials are furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Materials.
 *
 * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
 * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
 * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
 *    https://www.khronos.org/registry/
 *
 * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
 ******************************************************************************/

/*! \file
 *
 *   \brief C++ bindings for OpenCL 1.0 (rev 48), OpenCL 1.1 (rev 33),
 *       OpenCL 1.2 (rev 15) and OpenCL 2.0 (rev 29)
 *   \author Lee Howes and Bruce Merry
 *
 *   Derived from the OpenCL 1.x C++ bindings written by
 *   Benedict R. Gaster, Laurent Morichetti and Lee Howes
 *   With additions and fixes from:
 *       Brian Cole, March 3rd 2010 and April 2012
 *       Matt Gruenke, April 2012.
 *       Bruce Merry, February 2013.
 *       Tom Deakin and Simon McIntosh-Smith, July 2013
 *       James Price, 2015-
 *
 *   \version 2.0.10
 *   \date 2016-07-20
 *
 *   Optional extension support
 *
 *         cl_ext_device_fission
 *         #define CL_HPP_USE_CL_DEVICE_FISSION
 *         cl_khr_d3d10_sharing
 *         #define CL_HPP_USE_DX_INTEROP
 *         cl_khr_sub_groups
 *         #define CL_HPP_USE_CL_SUB_GROUPS_KHR
 *         cl_khr_image2d_from_buffer
 *         #define CL_HPP_USE_CL_IMAGE2D_FROM_BUFFER_KHR
 *
 *   Doxygen documentation for this header is available here:
 *
 *       http://khronosgroup.github.io/OpenCL-CLHPP/
 *
 *   The latest version of this header can be found on the GitHub releases page:
 *
 *       https://github.com/KhronosGroup/OpenCL-CLHPP/releases
 *
 *   Bugs and patches can be submitted to the GitHub repository:
 *
 *       https://github.com/KhronosGroup/OpenCL-CLHPP
 */

/*! \mainpage
 * \section intro Introduction
 * For many large applications C++ is the language of choice and so it seems
 * reasonable to define C++ bindings for OpenCL.
 *
 * The interface is contained within a single C++ header file \em cl2.hpp and all
 * definitions are contained within the namespace \em cl. There is no additional
 * requirement to include \em cl.h, and you may use either the C++ or the
 * original C bindings; it is enough to simply include \em cl2.hpp.
 *
 * The bindings themselves are lightweight and correspond closely to the
 * underlying C API. Using the C++ bindings introduces no additional execution
 * overhead.
 *
 * There are numerous compatibility, portability and memory management
 * fixes in the new header as well as additional OpenCL 2.0 features.
 * As a result the header is not directly backward compatible and for this
 * reason we release it as cl2.hpp rather than a new version of cl.hpp.
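 *
 * For example (an illustrative sketch added for clarity, not part of the
 * normative text; it relies only on the cl::Platform wrapper defined later
 * in this header), a minimal consumer needs nothing beyond the single
 * include:
 *
 * \code
 * #include <vector>
 * #include <CL/cl2.hpp>
 *
 * int main() {
 *     // Thin wrapper over clGetPlatformIDs; fills the vector with
 *     // one cl::Platform per available platform
 *     std::vector<cl::Platform> platforms;
 *     cl::Platform::get(&platforms);
 *     return platforms.empty() ? 1 : 0;
 * }
 * \endcode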
 *
 *
 * \section compatibility Compatibility
 * Due to the evolution of the underlying OpenCL API the 2.0 C++ bindings
 * include an updated approach to defining supported feature versions
 * and the range of valid underlying OpenCL runtime versions supported.
 *
 * The combination of preprocessor macros CL_HPP_TARGET_OPENCL_VERSION and
 * CL_HPP_MINIMUM_OPENCL_VERSION control this range. These are three digit
 * decimal values representing OpenCL runtime versions. The default for
 * the target is 200, representing OpenCL 2.0, and the minimum is also
 * defined as 200. These settings would use 2.0 API calls only.
 * If backward compatibility with a 1.2 runtime is required, the minimum
 * version may be set to 120.
 *
 * Note that this is a compile-time setting, and so affects linking against
 * a particular SDK version rather than the versioning of the loaded runtime.
 *
 * The earlier versions of the header included basic vector and string
 * classes based loosely on STL versions. These were difficult to
 * maintain and very rarely used. For the 2.0 header we now assume
 * the presence of the standard library unless requested otherwise.
 * We use std::array, std::vector, std::shared_ptr and std::string
 * throughout to safely manage memory and reduce the chance of a
 * recurrence of earlier memory management bugs.
 *
 * These classes are used through typedefs in the cl namespace:
 * cl::array, cl::vector, cl::pointer and cl::string.
 * In addition cl::allocate_pointer forwards to std::allocate_shared
 * by default.
 * In all cases these standard library classes can be replaced with
 * custom interface-compatible versions using the CL_HPP_NO_STD_ARRAY,
 * CL_HPP_NO_STD_VECTOR, CL_HPP_NO_STD_UNIQUE_PTR and
 * CL_HPP_NO_STD_STRING macros.
 *
 * The OpenCL 1.x versions of the C++ bindings included a size_t wrapper
 * class to interface with kernel enqueue. This caused unpleasant interactions
 * with the standard size_t declaration and led to namespacing bugs.
 * In the 2.0 version we have replaced this with a std::array-based interface.
 * However, the old behaviour can be regained for backward compatibility
 * using the CL_HPP_ENABLE_SIZE_T_COMPATIBILITY macro.
 *
 * Finally, the program construction interface used a clumsy vector-of-pairs
 * design in the earlier versions. We have replaced that with a cleaner
 * vector-of-vectors and vector-of-strings design. However, for backward
 * compatibility old behaviour can be regained with the
 * CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY macro.
 *
 * In OpenCL 2.0 OpenCL C is not entirely backward compatible with
 * earlier versions. As a result a flag must be passed to the OpenCL C
 * compiler to request OpenCL 2.0 compilation of kernels, with 1.2 as
 * the default in the absence of the flag.
 * In some cases the C++ bindings automatically compile code for ease.
 * For those cases the compilation defaults to OpenCL C 2.0.
 * If this is not wanted, the CL_HPP_CL_1_2_DEFAULT_BUILD macro may
 * be specified to assume 1.2 compilation.
 * If more fine-grained decisions on a per-kernel basis are required
 * then explicit build operations that take the flag should be used.
 *
 *
 * \section parameterization Parameters
 * This header may be parameterized by a set of preprocessor macros.
 *
 * - CL_HPP_TARGET_OPENCL_VERSION
 *
 *   Defines the target OpenCL runtime version to build the header
 *   against. Defaults to 200, representing OpenCL 2.0.
 *
 * - CL_HPP_NO_STD_STRING
 *
 *   Do not use the standard library string class.
 *   cl::string is not
 *   defined and may be defined by the user before cl2.hpp is
 *   included.
 *
 * - CL_HPP_NO_STD_VECTOR
 *
 *   Do not use the standard library vector class. cl::vector is not
 *   defined and may be defined by the user before cl2.hpp is
 *   included.
 *
 * - CL_HPP_NO_STD_ARRAY
 *
 *   Do not use the standard library array class. cl::array is not
 *   defined and may be defined by the user before cl2.hpp is
 *   included.
 *
 * - CL_HPP_NO_STD_UNIQUE_PTR
 *
 *   Do not use the standard library unique_ptr class. cl::pointer and
 *   the cl::allocate_pointer functions are not defined and may be
 *   defined by the user before cl2.hpp is included.
 *
 * - CL_HPP_ENABLE_DEVICE_FISSION
 *
 *   Enables device fission for OpenCL 1.2 platforms.
 *
 * - CL_HPP_ENABLE_EXCEPTIONS
 *
 *   Enable exceptions for use in the C++ bindings header. This is the
 *   preferred error handling mechanism but is not required.
 *
 * - CL_HPP_ENABLE_SIZE_T_COMPATIBILITY
 *
 *   Backward compatibility option to support cl.hpp-style size_t
 *   class. Replaces the updated std::array derived version and
 *   removal of size_t from the namespace. Note that in this case the
 *   new size_t class is placed in the cl::compatibility namespace and
 *   thus requires an additional using declaration for direct backward
 *   compatibility.
 *
 * - CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY
 *
 *   Enable older vector of pairs interface for construction of
 *   programs.
 *
 * - CL_HPP_CL_1_2_DEFAULT_BUILD
 *
 *   Default to OpenCL C 1.2 compilation rather than OpenCL C 2.0;
 *   applies to use of cl::Program construction and other program
 *   build variants.
 *
 *
 * \section example Example
 *
 * The following example shows a general use case for the C++
 * bindings, including support for the optional exception feature and
 * also the supplied vector and string classes; see the following sections
 * for descriptions of these features.
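 *
 * As a brief sketch first (using only the parameterization macros
 * documented above), a translation unit that must also run against a
 * 1.2 runtime would define, before the include:
 *
 * \code
 * #define CL_HPP_MINIMUM_OPENCL_VERSION 120
 * #define CL_HPP_TARGET_OPENCL_VERSION 200
 * #include <CL/cl2.hpp>
 * \endcode
 *
 * The full example then follows: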
 *
 * \code
    #define CL_HPP_ENABLE_EXCEPTIONS
    #define CL_HPP_TARGET_OPENCL_VERSION 200

    #include <CL/cl2.hpp>
    #include <iostream>
    #include <vector>
    #include <memory>
    #include <algorithm>

    const int numElements = 32;

    int main(void)
    {
        // Filter for a 2.0 platform and set it as the default
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);
        cl::Platform plat;
        for (auto &p : platforms) {
            std::string platver = p.getInfo<CL_PLATFORM_VERSION>();
            if (platver.find("OpenCL 2.") != std::string::npos) {
                plat = p;
            }
        }
        if (plat() == 0) {
            std::cout << "No OpenCL 2.0 platform found.";
            return -1;
        }

        cl::Platform newP = cl::Platform::setDefault(plat);
        if (newP != plat) {
            std::cout << "Error setting default platform.";
            return -1;
        }

        // Use C++11 raw string literals for kernel source code
        std::string kernel1{R"CLC(
            global int globalA;
            kernel void updateGlobal()
            {
              globalA = 75;
            }
        )CLC"};
        std::string kernel2{R"CLC(
            typedef struct { global int *bar; } Foo;
            kernel void vectorAdd(global const Foo* aNum, global const int *inputA, global const int *inputB,
                                  global int *output, int val, write_only pipe int outPipe, queue_t childQueue)
            {
              output[get_global_id(0)] = inputA[get_global_id(0)] + inputB[get_global_id(0)] + val + *(aNum->bar);
              write_pipe(outPipe, &val);
              queue_t default_queue = get_default_queue();
              ndrange_t ndrange = ndrange_1D(get_global_size(0)/2, get_global_size(0)/2);

              // Have a child kernel write into third quarter of output
              enqueue_kernel(default_queue, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
                ^{
                    output[get_global_size(0)*2 + get_global_id(0)] =
                      inputA[get_global_size(0)*2 + get_global_id(0)] + inputB[get_global_size(0)*2 + get_global_id(0)] + globalA;
                });

              // Have a child kernel write into last quarter of output
              enqueue_kernel(childQueue, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
                ^{
                    output[get_global_size(0)*3 + get_global_id(0)] =
                      inputA[get_global_size(0)*3 + get_global_id(0)] + inputB[get_global_size(0)*3 + get_global_id(0)] + globalA + 2;
                });
            }
        )CLC"};

        // New simpler string interface style
        std::vector<std::string> programStrings {kernel1, kernel2};

        cl::Program vectorAddProgram(programStrings);
        try {
            vectorAddProgram.build("-cl-std=CL2.0");
        }
        catch (...) {
            // Print build info for all devices
            cl_int buildErr = CL_SUCCESS;
            auto buildInfo = vectorAddProgram.getBuildInfo<CL_PROGRAM_BUILD_LOG>(&buildErr);
            for (auto &pair : buildInfo) {
                std::cerr << pair.second << std::endl << std::endl;
            }
            return 1;
        }

        typedef struct { int *bar; } Foo;

        // Get and run kernel that initializes the program-scope global
        // A test for kernels that take no arguments
        auto program2Kernel =
            cl::KernelFunctor<>(vectorAddProgram, "updateGlobal");
        program2Kernel(
            cl::EnqueueArgs(
            cl::NDRange(1)));

        //////////////////
        // SVM allocations

        auto anSVMInt = cl::allocate_svm<int, cl::SVMTraitCoarse<>>();
        *anSVMInt = 5;
        cl::SVMAllocator<int, cl::SVMTraitCoarse<cl::SVMTraitReadOnly<>>> svmAllocReadOnly;
        auto fooPointer = cl::allocate_pointer<Foo>(svmAllocReadOnly);
        fooPointer->bar = anSVMInt.get();
        cl::SVMAllocator<int, cl::SVMTraitCoarse<>> svmAlloc;
        std::vector<int, cl::SVMAllocator<int, cl::SVMTraitCoarse<>>> inputA(numElements, 1, svmAlloc);
        cl::coarse_svm_vector<int> inputB(numElements, 2, svmAlloc);

        //
        //////////////

        // Traditional cl_mem allocations
        std::vector<int> output(numElements, 0xdeadbeef);
        cl::Buffer outputBuffer(begin(output), end(output), false);
        cl::Pipe aPipe(sizeof(cl_int), numElements / 2);

        // Default command queue, also passed in as a parameter
        cl::DeviceCommandQueue defaultDeviceQueue = cl::DeviceCommandQueue::makeDefault(
            cl::Context::getDefault(), cl::Device::getDefault());

        auto vectorAddKernel =
            cl::KernelFunctor<
                decltype(fooPointer)&,
                int*,
                cl::coarse_svm_vector<int>&,
                cl::Buffer,
                int,
                cl::Pipe&,
                cl::DeviceCommandQueue
                >(vectorAddProgram, "vectorAdd");

        // Ensure that the additional SVM pointer is available to the kernel
        // This one was not passed as a parameter
        vectorAddKernel.setSVMPointers(anSVMInt);

        // Hand control of coarse allocations to runtime
        cl::enqueueUnmapSVM(anSVMInt);
        cl::enqueueUnmapSVM(fooPointer);
        cl::unmapSVM(inputB);

        cl_int error;
        vectorAddKernel(
            cl::EnqueueArgs(
                cl::NDRange(numElements/2),
                cl::NDRange(numElements/2)),
            fooPointer,
            inputA.data(),
            inputB,
            outputBuffer,
            3,
            aPipe,
            defaultDeviceQueue,
            error
            );

        cl::copy(outputBuffer, begin(output), end(output));

        cl::Device d = cl::Device::getDefault();

        std::cout << "Output:\n";
        for (int i = 1; i < numElements; ++i) {
            std::cout << "\t" << output[i] << "\n";
        }
        std::cout << "\n\n";

        return 0;
    }
 *
 * \endcode
 *
 */
#ifndef CL_HPP_
#define CL_HPP_

/* Handle deprecated preprocessor definitions. In each case, we only check for
 * the old name if the new name is not defined, so that user code can define
 * both and hence work with either version of the bindings.
 */
#if !defined(CL_HPP_USE_DX_INTEROP) && defined(USE_DX_INTEROP)
# pragma message("cl2.hpp: USE_DX_INTEROP is deprecated. Define CL_HPP_USE_DX_INTEROP instead")
# define CL_HPP_USE_DX_INTEROP
#endif
#if !defined(CL_HPP_USE_CL_DEVICE_FISSION) && defined(USE_CL_DEVICE_FISSION)
# pragma message("cl2.hpp: USE_CL_DEVICE_FISSION is deprecated. Define CL_HPP_USE_CL_DEVICE_FISSION instead")
# define CL_HPP_USE_CL_DEVICE_FISSION
#endif
#if !defined(CL_HPP_ENABLE_EXCEPTIONS) && defined(__CL_ENABLE_EXCEPTIONS)
# pragma message("cl2.hpp: __CL_ENABLE_EXCEPTIONS is deprecated. Define CL_HPP_ENABLE_EXCEPTIONS instead")
# define CL_HPP_ENABLE_EXCEPTIONS
#endif
#if !defined(CL_HPP_NO_STD_VECTOR) && defined(__NO_STD_VECTOR)
# pragma message("cl2.hpp: __NO_STD_VECTOR is deprecated. Define CL_HPP_NO_STD_VECTOR instead")
# define CL_HPP_NO_STD_VECTOR
#endif
#if !defined(CL_HPP_NO_STD_STRING) && defined(__NO_STD_STRING)
# pragma message("cl2.hpp: __NO_STD_STRING is deprecated.
Define CL_HPP_NO_STD_STRING instead") # define CL_HPP_NO_STD_STRING #endif #if defined(VECTOR_CLASS) # pragma message("cl2.hpp: VECTOR_CLASS is deprecated. Alias cl::vector instead") #endif #if defined(STRING_CLASS) # pragma message("cl2.hpp: STRING_CLASS is deprecated. Alias cl::string instead.") #endif #if !defined(CL_HPP_USER_OVERRIDE_ERROR_STRINGS) && defined(__CL_USER_OVERRIDE_ERROR_STRINGS) # pragma message("cl2.hpp: __CL_USER_OVERRIDE_ERROR_STRINGS is deprecated. Define CL_HPP_USER_OVERRIDE_ERROR_STRINGS instead") # define CL_HPP_USER_OVERRIDE_ERROR_STRINGS #endif /* Warn about features that are no longer supported */ #if defined(__USE_DEV_VECTOR) # pragma message("cl2.hpp: __USE_DEV_VECTOR is no longer supported. Expect compilation errors") #endif #if defined(__USE_DEV_STRING) # pragma message("cl2.hpp: __USE_DEV_STRING is no longer supported. Expect compilation errors") #endif /* Detect which version to target */ #if !defined(CL_HPP_TARGET_OPENCL_VERSION) # pragma message("cl2.hpp: CL_HPP_TARGET_OPENCL_VERSION is not defined. It will default to 200 (OpenCL 2.0)") # define CL_HPP_TARGET_OPENCL_VERSION 200 #endif #if CL_HPP_TARGET_OPENCL_VERSION != 100 && CL_HPP_TARGET_OPENCL_VERSION != 110 && CL_HPP_TARGET_OPENCL_VERSION != 120 && CL_HPP_TARGET_OPENCL_VERSION != 200 # pragma message("cl2.hpp: CL_HPP_TARGET_OPENCL_VERSION is not a valid value (100, 110, 120 or 200). It will be set to 200") # undef CL_HPP_TARGET_OPENCL_VERSION # define CL_HPP_TARGET_OPENCL_VERSION 200 #endif #if !defined(CL_HPP_MINIMUM_OPENCL_VERSION) # define CL_HPP_MINIMUM_OPENCL_VERSION 200 #endif #if CL_HPP_MINIMUM_OPENCL_VERSION != 100 && CL_HPP_MINIMUM_OPENCL_VERSION != 110 && CL_HPP_MINIMUM_OPENCL_VERSION != 120 && CL_HPP_MINIMUM_OPENCL_VERSION != 200 # pragma message("cl2.hpp: CL_HPP_MINIMUM_OPENCL_VERSION is not a valid value (100, 110, 120 or 200). 
It will be set to 100") # undef CL_HPP_MINIMUM_OPENCL_VERSION # define CL_HPP_MINIMUM_OPENCL_VERSION 100 #endif #if CL_HPP_MINIMUM_OPENCL_VERSION > CL_HPP_TARGET_OPENCL_VERSION # error "CL_HPP_MINIMUM_OPENCL_VERSION must not be greater than CL_HPP_TARGET_OPENCL_VERSION" #endif #if CL_HPP_MINIMUM_OPENCL_VERSION <= 100 && !defined(CL_USE_DEPRECATED_OPENCL_1_0_APIS) # define CL_USE_DEPRECATED_OPENCL_1_0_APIS #endif #if CL_HPP_MINIMUM_OPENCL_VERSION <= 110 && !defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) # define CL_USE_DEPRECATED_OPENCL_1_1_APIS #endif #if CL_HPP_MINIMUM_OPENCL_VERSION <= 120 && !defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS) # define CL_USE_DEPRECATED_OPENCL_1_2_APIS #endif #if CL_HPP_MINIMUM_OPENCL_VERSION <= 200 && !defined(CL_USE_DEPRECATED_OPENCL_2_0_APIS) # define CL_USE_DEPRECATED_OPENCL_2_0_APIS #endif #ifdef _WIN32 #include #if defined(CL_HPP_USE_DX_INTEROP) #include #include #endif #endif // _WIN32 #if defined(_MSC_VER) #include #endif // _MSC_VER // Check for a valid C++ version // Need to do both tests here because for some reason __cplusplus is not // updated in visual studio #if (!defined(_MSC_VER) && __cplusplus < 201103L) || (defined(_MSC_VER) && _MSC_VER < 1700) #error Visual studio 2013 or another C++11-supporting compiler required #endif // #if defined(CL_HPP_USE_CL_DEVICE_FISSION) || defined(CL_HPP_USE_CL_SUB_GROUPS_KHR) #include #endif #if defined(__APPLE__) || defined(__MACOSX) #include #else #include #endif // !__APPLE__ #if (__cplusplus >= 201103L) #define CL_HPP_NOEXCEPT_ noexcept #else #define CL_HPP_NOEXCEPT_ #endif #if defined(_MSC_VER) # define CL_HPP_DEFINE_STATIC_MEMBER_ __declspec(selectany) #else # define CL_HPP_DEFINE_STATIC_MEMBER_ __attribute__((weak)) #endif // !_MSC_VER // Define deprecated prefixes and suffixes to ensure compilation // in case they are not pre-defined #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED) #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED) #if !defined(CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED) #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED) #if !defined(CL_EXT_PREFIX__VERSION_1_2_DEPRECATED) #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #endif // #if !defined(CL_EXT_PREFIX__VERSION_1_2_DEPRECATED) #if !defined(CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED) #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #endif // #if !defined(CL_EXT_PREFIX__VERSION_1_2_DEPRECATED) #if !defined(CL_CALLBACK) #define CL_CALLBACK #endif //CL_CALLBACK #include #include #include #include #include #include // Define a size_type to represent a correctly resolved size_t #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY) namespace cl { using size_type = ::size_t; } // namespace cl #else // #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY) namespace cl { using size_type = size_t; } // namespace cl #endif // #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY) #if defined(CL_HPP_ENABLE_EXCEPTIONS) #include #endif // #if defined(CL_HPP_ENABLE_EXCEPTIONS) #if !defined(CL_HPP_NO_STD_VECTOR) #include namespace cl { template < class T, class Alloc = std::allocator > using vector = std::vector; } // namespace cl #endif // #if !defined(CL_HPP_NO_STD_VECTOR) #if !defined(CL_HPP_NO_STD_STRING) #include namespace cl { using string = std::string; } // namespace cl #endif // #if !defined(CL_HPP_NO_STD_STRING) #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if !defined(CL_HPP_NO_STD_UNIQUE_PTR) #include namespace cl { // Replace unique_ptr and allocate_pointer for 
internal use // to allow user to replace them template using pointer = std::unique_ptr; } // namespace cl #endif #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if !defined(CL_HPP_NO_STD_ARRAY) #include namespace cl { template < class T, size_type N > using array = std::array; } // namespace cl #endif // #if !defined(CL_HPP_NO_STD_ARRAY) // Define size_type appropriately to allow backward-compatibility // use of the old size_t interface class #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY) namespace cl { namespace compatibility { /*! \brief class used to interface between C++ and * OpenCL C calls that require arrays of size_t values, whose * size is known statically. */ template class size_t { private: size_type data_[N]; public: //! \brief Initialize size_t to all 0s size_t() { for (int i = 0; i < N; ++i) { data_[i] = 0; } } size_t(const array &rhs) { for (int i = 0; i < N; ++i) { data_[i] = rhs[i]; } } size_type& operator[](int index) { return data_[index]; } const size_type& operator[](int index) const { return data_[index]; } //! \brief Conversion operator to T*. operator size_type* () { return data_; } //! \brief Conversion operator to const T*. operator const size_type* () const { return data_; } operator array() const { array ret; for (int i = 0; i < N; ++i) { ret[i] = data_[i]; } return ret; } }; } // namespace compatibility template using size_t = compatibility::size_t; } // namespace cl #endif // #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY) // Helper alias to avoid confusing the macros namespace cl { namespace detail { using size_t_array = array; } // namespace detail } // namespace cl /*! \namespace cl * * \brief The OpenCL C++ bindings are defined within this namespace. * */ namespace cl { class Memory; #define CL_HPP_INIT_CL_EXT_FCN_PTR_(name) \ if (!pfn_##name) { \ pfn_##name = (PFN_##name) \ clGetExtensionFunctionAddress(#name); \ if (!pfn_##name) { \ } \ } #define CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, name) \ if (!pfn_##name) { \ pfn_##name = (PFN_##name) \ clGetExtensionFunctionAddressForPlatform(platform, #name); \ if (!pfn_##name) { \ } \ } class Program; class Device; class Context; class CommandQueue; class DeviceCommandQueue; class Memory; class Buffer; class Pipe; #if defined(CL_HPP_ENABLE_EXCEPTIONS) /*! \brief Exception class * * This may be thrown by API functions when CL_HPP_ENABLE_EXCEPTIONS is defined. */ class Error : public std::exception { private: cl_int err_; const char * errStr_; public: /*! \brief Create a new CL error exception for a given error code * and corresponding message. * * \param err error code value. * * \param errStr a descriptive string that must remain in scope until * handling of the exception has concluded. If set, it * will be returned by what(). */ Error(cl_int err, const char * errStr = NULL) : err_(err), errStr_(errStr) {} ~Error() throw() {} /*! \brief Get error string associated with exception * * \return A memory pointer to the error message string. */ virtual const char * what() const throw () { if (errStr_ == NULL) { return "empty"; } else { return errStr_; } } /*! \brief Get error code associated with exception * * \return The error code. 
*/ cl_int err(void) const { return err_; } }; #define CL_HPP_ERR_STR_(x) #x #else #define CL_HPP_ERR_STR_(x) NULL #endif // CL_HPP_ENABLE_EXCEPTIONS namespace detail { #if defined(CL_HPP_ENABLE_EXCEPTIONS) static inline cl_int errHandler ( cl_int err, const char * errStr = NULL) { if (err != CL_SUCCESS) { throw Error(err, errStr); } return err; } #else static inline cl_int errHandler (cl_int err, const char * errStr = NULL) { (void) errStr; // suppress unused variable warning return err; } #endif // CL_HPP_ENABLE_EXCEPTIONS } //! \cond DOXYGEN_DETAIL #if !defined(CL_HPP_USER_OVERRIDE_ERROR_STRINGS) #define __GET_DEVICE_INFO_ERR CL_HPP_ERR_STR_(clGetDeviceInfo) #define __GET_PLATFORM_INFO_ERR CL_HPP_ERR_STR_(clGetPlatformInfo) #define __GET_DEVICE_IDS_ERR CL_HPP_ERR_STR_(clGetDeviceIDs) #define __GET_PLATFORM_IDS_ERR CL_HPP_ERR_STR_(clGetPlatformIDs) #define __GET_CONTEXT_INFO_ERR CL_HPP_ERR_STR_(clGetContextInfo) #define __GET_EVENT_INFO_ERR CL_HPP_ERR_STR_(clGetEventInfo) #define __GET_EVENT_PROFILE_INFO_ERR CL_HPP_ERR_STR_(clGetEventProfileInfo) #define __GET_MEM_OBJECT_INFO_ERR CL_HPP_ERR_STR_(clGetMemObjectInfo) #define __GET_IMAGE_INFO_ERR CL_HPP_ERR_STR_(clGetImageInfo) #define __GET_SAMPLER_INFO_ERR CL_HPP_ERR_STR_(clGetSamplerInfo) #define __GET_KERNEL_INFO_ERR CL_HPP_ERR_STR_(clGetKernelInfo) #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __GET_KERNEL_ARG_INFO_ERR CL_HPP_ERR_STR_(clGetKernelArgInfo) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __GET_KERNEL_WORK_GROUP_INFO_ERR CL_HPP_ERR_STR_(clGetKernelWorkGroupInfo) #define __GET_PROGRAM_INFO_ERR CL_HPP_ERR_STR_(clGetProgramInfo) #define __GET_PROGRAM_BUILD_INFO_ERR CL_HPP_ERR_STR_(clGetProgramBuildInfo) #define __GET_COMMAND_QUEUE_INFO_ERR CL_HPP_ERR_STR_(clGetCommandQueueInfo) #define __CREATE_CONTEXT_ERR CL_HPP_ERR_STR_(clCreateContext) #define __CREATE_CONTEXT_FROM_TYPE_ERR CL_HPP_ERR_STR_(clCreateContextFromType) #define __GET_SUPPORTED_IMAGE_FORMATS_ERR CL_HPP_ERR_STR_(clGetSupportedImageFormats) #define __CREATE_BUFFER_ERR CL_HPP_ERR_STR_(clCreateBuffer) #define __COPY_ERR CL_HPP_ERR_STR_(cl::copy) #define __CREATE_SUBBUFFER_ERR CL_HPP_ERR_STR_(clCreateSubBuffer) #define __CREATE_GL_BUFFER_ERR CL_HPP_ERR_STR_(clCreateFromGLBuffer) #define __CREATE_GL_RENDER_BUFFER_ERR CL_HPP_ERR_STR_(clCreateFromGLBuffer) #define __GET_GL_OBJECT_INFO_ERR CL_HPP_ERR_STR_(clGetGLObjectInfo) #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __CREATE_IMAGE_ERR CL_HPP_ERR_STR_(clCreateImage) #define __CREATE_GL_TEXTURE_ERR CL_HPP_ERR_STR_(clCreateFromGLTexture) #define __IMAGE_DIMENSION_ERR CL_HPP_ERR_STR_(Incorrect image dimensions) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR CL_HPP_ERR_STR_(clSetMemObjectDestructorCallback) #define __CREATE_USER_EVENT_ERR CL_HPP_ERR_STR_(clCreateUserEvent) #define __SET_USER_EVENT_STATUS_ERR CL_HPP_ERR_STR_(clSetUserEventStatus) #define __SET_EVENT_CALLBACK_ERR CL_HPP_ERR_STR_(clSetEventCallback) #define __WAIT_FOR_EVENTS_ERR CL_HPP_ERR_STR_(clWaitForEvents) #define __CREATE_KERNEL_ERR CL_HPP_ERR_STR_(clCreateKernel) #define __SET_KERNEL_ARGS_ERR CL_HPP_ERR_STR_(clSetKernelArg) #define __CREATE_PROGRAM_WITH_SOURCE_ERR CL_HPP_ERR_STR_(clCreateProgramWithSource) #define __CREATE_PROGRAM_WITH_BINARY_ERR CL_HPP_ERR_STR_(clCreateProgramWithBinary) #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR CL_HPP_ERR_STR_(clCreateProgramWithBuiltInKernels) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #define 
__BUILD_PROGRAM_ERR CL_HPP_ERR_STR_(clBuildProgram) #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __COMPILE_PROGRAM_ERR CL_HPP_ERR_STR_(clCompileProgram) #define __LINK_PROGRAM_ERR CL_HPP_ERR_STR_(clLinkProgram) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __CREATE_KERNELS_IN_PROGRAM_ERR CL_HPP_ERR_STR_(clCreateKernelsInProgram) #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #define __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR CL_HPP_ERR_STR_(clCreateCommandQueueWithProperties) #define __CREATE_SAMPLER_WITH_PROPERTIES_ERR CL_HPP_ERR_STR_(clCreateSamplerWithProperties) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 200 #define __SET_COMMAND_QUEUE_PROPERTY_ERR CL_HPP_ERR_STR_(clSetCommandQueueProperty) #define __ENQUEUE_READ_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueReadBuffer) #define __ENQUEUE_READ_BUFFER_RECT_ERR CL_HPP_ERR_STR_(clEnqueueReadBufferRect) #define __ENQUEUE_WRITE_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueWriteBuffer) #define __ENQUEUE_WRITE_BUFFER_RECT_ERR CL_HPP_ERR_STR_(clEnqueueWriteBufferRect) #define __ENQEUE_COPY_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueCopyBuffer) #define __ENQEUE_COPY_BUFFER_RECT_ERR CL_HPP_ERR_STR_(clEnqueueCopyBufferRect) #define __ENQUEUE_FILL_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueFillBuffer) #define __ENQUEUE_READ_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueReadImage) #define __ENQUEUE_WRITE_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueWriteImage) #define __ENQUEUE_COPY_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueCopyImage) #define __ENQUEUE_FILL_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueFillImage) #define __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueCopyImageToBuffer) #define __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueCopyBufferToImage) #define __ENQUEUE_MAP_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueMapBuffer) #define __ENQUEUE_MAP_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueMapImage) #define __ENQUEUE_UNMAP_MEM_OBJECT_ERR CL_HPP_ERR_STR_(clEnqueueUnMapMemObject) #define __ENQUEUE_NDRANGE_KERNEL_ERR CL_HPP_ERR_STR_(clEnqueueNDRangeKernel) #define __ENQUEUE_NATIVE_KERNEL CL_HPP_ERR_STR_(clEnqueueNativeKernel) #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __ENQUEUE_MIGRATE_MEM_OBJECTS_ERR CL_HPP_ERR_STR_(clEnqueueMigrateMemObjects) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __ENQUEUE_ACQUIRE_GL_ERR CL_HPP_ERR_STR_(clEnqueueAcquireGLObjects) #define __ENQUEUE_RELEASE_GL_ERR CL_HPP_ERR_STR_(clEnqueueReleaseGLObjects) #define __CREATE_PIPE_ERR CL_HPP_ERR_STR_(clCreatePipe) #define __GET_PIPE_INFO_ERR CL_HPP_ERR_STR_(clGetPipeInfo) #define __RETAIN_ERR CL_HPP_ERR_STR_(Retain Object) #define __RELEASE_ERR CL_HPP_ERR_STR_(Release Object) #define __FLUSH_ERR CL_HPP_ERR_STR_(clFlush) #define __FINISH_ERR CL_HPP_ERR_STR_(clFinish) #define __VECTOR_CAPACITY_ERR CL_HPP_ERR_STR_(Vector capacity error) /** * CL 1.2 version that uses device fission. 
*/ #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __CREATE_SUB_DEVICES_ERR CL_HPP_ERR_STR_(clCreateSubDevices) #else #define __CREATE_SUB_DEVICES_ERR CL_HPP_ERR_STR_(clCreateSubDevicesEXT) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) #define __ENQUEUE_MARKER_ERR CL_HPP_ERR_STR_(clEnqueueMarker) #define __ENQUEUE_WAIT_FOR_EVENTS_ERR CL_HPP_ERR_STR_(clEnqueueWaitForEvents) #define __ENQUEUE_BARRIER_ERR CL_HPP_ERR_STR_(clEnqueueBarrier) #define __UNLOAD_COMPILER_ERR CL_HPP_ERR_STR_(clUnloadCompiler) #define __CREATE_GL_TEXTURE_2D_ERR CL_HPP_ERR_STR_(clCreateFromGLTexture2D) #define __CREATE_GL_TEXTURE_3D_ERR CL_HPP_ERR_STR_(clCreateFromGLTexture3D) #define __CREATE_IMAGE2D_ERR CL_HPP_ERR_STR_(clCreateImage2D) #define __CREATE_IMAGE3D_ERR CL_HPP_ERR_STR_(clCreateImage3D) #endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /** * Deprecated APIs for 2.0 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS) #define __CREATE_COMMAND_QUEUE_ERR CL_HPP_ERR_STR_(clCreateCommandQueue) #define __ENQUEUE_TASK_ERR CL_HPP_ERR_STR_(clEnqueueTask) #define __CREATE_SAMPLER_ERR CL_HPP_ERR_STR_(clCreateSampler) #endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /** * CL 1.2 marker and barrier commands */ #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __ENQUEUE_MARKER_WAIT_LIST_ERR CL_HPP_ERR_STR_(clEnqueueMarkerWithWaitList) #define __ENQUEUE_BARRIER_WAIT_LIST_ERR CL_HPP_ERR_STR_(clEnqueueBarrierWithWaitList) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #endif // CL_HPP_USER_OVERRIDE_ERROR_STRINGS //! \endcond namespace detail { // Generic getInfoHelper. The final parameter is used to guide overload // resolution: the actual parameter passed is an int, which makes this // a worse conversion sequence than a specialization that declares the // parameter as an int. template inline cl_int getInfoHelper(Functor f, cl_uint name, T* param, long) { return f(name, sizeof(T), param, NULL); } // Specialized for getInfo // Assumes that the output vector was correctly resized on the way in template inline cl_int getInfoHelper(Func f, cl_uint name, vector>* param, int) { if (name != CL_PROGRAM_BINARIES) { return CL_INVALID_VALUE; } if (param) { // Create array of pointers, calculate total size and pass pointer array in size_type numBinaries = param->size(); vector binariesPointers(numBinaries); for (size_type i = 0; i < numBinaries; ++i) { binariesPointers[i] = (*param)[i].data(); } cl_int err = f(name, numBinaries * sizeof(unsigned char*), binariesPointers.data(), NULL); if (err != CL_SUCCESS) { return err; } } return CL_SUCCESS; } // Specialized getInfoHelper for vector params template inline cl_int getInfoHelper(Func f, cl_uint name, vector* param, long) { size_type required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } const size_type elements = required / sizeof(T); // Temporary to avoid changing param on an error vector localData(elements); err = f(name, required, localData.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { *param = std::move(localData); } return CL_SUCCESS; } /* Specialization for reference-counted types. This depends on the * existence of Wrapper::cl_type, and none of the other types having the * cl_type member. Note that simplify specifying the parameter as Wrapper * does not work, because when using a derived type (e.g. Context) the generic * template will provide a better match. 
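 *
 * Overload-resolution sketch (illustrative): every call site passes a
 * literal 0 for the final parameter, so an overload declaring it as int
 * is an exact match and beats the generic overload declaring it as long:
 * \code
 * getInfoHelper(f, name, param, 0); // picks the cl_type-enabled overload
 *                                   // when T::cl_type exists, otherwise
 *                                   // falls back to the long overload
 * \endcode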
*/ template inline cl_int getInfoHelper( Func f, cl_uint name, vector* param, int, typename T::cl_type = 0) { size_type required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } const size_type elements = required / sizeof(typename T::cl_type); vector value(elements); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { // Assign to convert CL type to T for each element param->resize(elements); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < elements; i++) { (*param)[i] = T(value[i], true); } } return CL_SUCCESS; } // Specialized GetInfoHelper for string params template inline cl_int getInfoHelper(Func f, cl_uint name, string* param, long) { size_type required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } // std::string has a constant data member // a char vector does not if (required > 0) { vector value(required); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { param->assign(begin(value), prev(end(value))); } } else if (param) { param->assign(""); } return CL_SUCCESS; } // Specialized GetInfoHelper for clsize_t params template inline cl_int getInfoHelper(Func f, cl_uint name, array* param, long) { size_type required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } size_type elements = required / sizeof(size_type); vector value(elements, 0); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } // Bound the copy with N to prevent overruns // if passed N > than the amount copied if (elements > N) { elements = N; } for (size_type i = 0; i < elements; ++i) { (*param)[i] = value[i]; } return CL_SUCCESS; } template struct ReferenceHandler; /* Specialization for reference-counted types. This depends on the * existence of Wrapper::cl_type, and none of the other types having the * cl_type member. Note that simplify specifying the parameter as Wrapper * does not work, because when using a derived type (e.g. Context) the generic * template will provide a better match. 
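 *
 * (Both forms take a reference on each returned handle: the vector form
 * above constructs every element with retainObject = true, while the
 * scalar form below calls retain() explicitly after the assignment.)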
*/ template inline cl_int getInfoHelper(Func f, cl_uint name, T* param, int, typename T::cl_type = 0) { typename T::cl_type value; cl_int err = f(name, sizeof(value), &value, NULL); if (err != CL_SUCCESS) { return err; } *param = value; if (value != NULL) { err = param->retain(); if (err != CL_SUCCESS) { return err; } } return CL_SUCCESS; } #define CL_HPP_PARAM_NAME_INFO_1_0_(F) \ F(cl_platform_info, CL_PLATFORM_PROFILE, string) \ F(cl_platform_info, CL_PLATFORM_VERSION, string) \ F(cl_platform_info, CL_PLATFORM_NAME, string) \ F(cl_platform_info, CL_PLATFORM_VENDOR, string) \ F(cl_platform_info, CL_PLATFORM_EXTENSIONS, string) \ \ F(cl_device_info, CL_DEVICE_TYPE, cl_device_type) \ F(cl_device_info, CL_DEVICE_VENDOR_ID, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_COMPUTE_UNITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_GROUP_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_SIZES, cl::vector) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_CLOCK_FREQUENCY, cl_uint) \ F(cl_device_info, CL_DEVICE_ADDRESS_BITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_READ_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_MEM_ALLOC_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_WIDTH, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_HEIGHT, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_WIDTH, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_HEIGHT, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_DEPTH, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_MAX_PARAMETER_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_MAX_SAMPLERS, cl_uint) \ F(cl_device_info, CL_DEVICE_MEM_BASE_ADDR_ALIGN, cl_uint) \ F(cl_device_info, CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_SINGLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, cl_device_mem_cache_type) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, cl_uint)\ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_TYPE, cl_device_local_mem_type) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_ERROR_CORRECTION_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_PROFILING_TIMER_RESOLUTION, size_type) \ F(cl_device_info, CL_DEVICE_ENDIAN_LITTLE, cl_bool) \ F(cl_device_info, CL_DEVICE_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_COMPILER_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_EXECUTION_CAPABILITIES, cl_device_exec_capabilities) \ F(cl_device_info, CL_DEVICE_PLATFORM, cl_platform_id) \ F(cl_device_info, CL_DEVICE_NAME, string) \ F(cl_device_info, CL_DEVICE_VENDOR, string) \ F(cl_device_info, CL_DRIVER_VERSION, string) \ F(cl_device_info, CL_DEVICE_PROFILE, string) \ F(cl_device_info, CL_DEVICE_VERSION, string) \ 
F(cl_device_info, CL_DEVICE_EXTENSIONS, string) \ \ F(cl_context_info, CL_CONTEXT_REFERENCE_COUNT, cl_uint) \ F(cl_context_info, CL_CONTEXT_DEVICES, cl::vector) \ F(cl_context_info, CL_CONTEXT_PROPERTIES, cl::vector) \ \ F(cl_event_info, CL_EVENT_COMMAND_QUEUE, cl::CommandQueue) \ F(cl_event_info, CL_EVENT_COMMAND_TYPE, cl_command_type) \ F(cl_event_info, CL_EVENT_REFERENCE_COUNT, cl_uint) \ F(cl_event_info, CL_EVENT_COMMAND_EXECUTION_STATUS, cl_int) \ \ F(cl_profiling_info, CL_PROFILING_COMMAND_QUEUED, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_SUBMIT, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_START, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_END, cl_ulong) \ \ F(cl_mem_info, CL_MEM_TYPE, cl_mem_object_type) \ F(cl_mem_info, CL_MEM_FLAGS, cl_mem_flags) \ F(cl_mem_info, CL_MEM_SIZE, size_type) \ F(cl_mem_info, CL_MEM_HOST_PTR, void*) \ F(cl_mem_info, CL_MEM_MAP_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_REFERENCE_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_CONTEXT, cl::Context) \ \ F(cl_image_info, CL_IMAGE_FORMAT, cl_image_format) \ F(cl_image_info, CL_IMAGE_ELEMENT_SIZE, size_type) \ F(cl_image_info, CL_IMAGE_ROW_PITCH, size_type) \ F(cl_image_info, CL_IMAGE_SLICE_PITCH, size_type) \ F(cl_image_info, CL_IMAGE_WIDTH, size_type) \ F(cl_image_info, CL_IMAGE_HEIGHT, size_type) \ F(cl_image_info, CL_IMAGE_DEPTH, size_type) \ \ F(cl_sampler_info, CL_SAMPLER_REFERENCE_COUNT, cl_uint) \ F(cl_sampler_info, CL_SAMPLER_CONTEXT, cl::Context) \ F(cl_sampler_info, CL_SAMPLER_NORMALIZED_COORDS, cl_bool) \ F(cl_sampler_info, CL_SAMPLER_ADDRESSING_MODE, cl_addressing_mode) \ F(cl_sampler_info, CL_SAMPLER_FILTER_MODE, cl_filter_mode) \ \ F(cl_program_info, CL_PROGRAM_REFERENCE_COUNT, cl_uint) \ F(cl_program_info, CL_PROGRAM_CONTEXT, cl::Context) \ F(cl_program_info, CL_PROGRAM_NUM_DEVICES, cl_uint) \ F(cl_program_info, CL_PROGRAM_DEVICES, cl::vector) \ F(cl_program_info, CL_PROGRAM_SOURCE, string) \ F(cl_program_info, CL_PROGRAM_BINARY_SIZES, cl::vector) \ F(cl_program_info, CL_PROGRAM_BINARIES, cl::vector>) \ \ F(cl_program_build_info, CL_PROGRAM_BUILD_STATUS, cl_build_status) \ F(cl_program_build_info, CL_PROGRAM_BUILD_OPTIONS, string) \ F(cl_program_build_info, CL_PROGRAM_BUILD_LOG, string) \ \ F(cl_kernel_info, CL_KERNEL_FUNCTION_NAME, string) \ F(cl_kernel_info, CL_KERNEL_NUM_ARGS, cl_uint) \ F(cl_kernel_info, CL_KERNEL_REFERENCE_COUNT, cl_uint) \ F(cl_kernel_info, CL_KERNEL_CONTEXT, cl::Context) \ F(cl_kernel_info, CL_KERNEL_PROGRAM, cl::Program) \ \ F(cl_kernel_work_group_info, CL_KERNEL_WORK_GROUP_SIZE, size_type) \ F(cl_kernel_work_group_info, CL_KERNEL_COMPILE_WORK_GROUP_SIZE, cl::detail::size_t_array) \ F(cl_kernel_work_group_info, CL_KERNEL_LOCAL_MEM_SIZE, cl_ulong) \ \ F(cl_command_queue_info, CL_QUEUE_CONTEXT, cl::Context) \ F(cl_command_queue_info, CL_QUEUE_DEVICE, cl::Device) \ F(cl_command_queue_info, CL_QUEUE_REFERENCE_COUNT, cl_uint) \ F(cl_command_queue_info, CL_QUEUE_PROPERTIES, cl_command_queue_properties) #define CL_HPP_PARAM_NAME_INFO_1_1_(F) \ F(cl_context_info, CL_CONTEXT_NUM_DEVICES, cl_uint)\ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, cl_uint) \ 
F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_DOUBLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_HALF_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_OPENCL_C_VERSION, string) \ \ F(cl_mem_info, CL_MEM_ASSOCIATED_MEMOBJECT, cl::Memory) \ F(cl_mem_info, CL_MEM_OFFSET, size_type) \ \ F(cl_kernel_work_group_info, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, size_type) \ F(cl_kernel_work_group_info, CL_KERNEL_PRIVATE_MEM_SIZE, cl_ulong) \ \ F(cl_event_info, CL_EVENT_CONTEXT, cl::Context) #define CL_HPP_PARAM_NAME_INFO_1_2_(F) \ F(cl_program_info, CL_PROGRAM_NUM_KERNELS, size_type) \ F(cl_program_info, CL_PROGRAM_KERNEL_NAMES, string) \ \ F(cl_program_build_info, CL_PROGRAM_BINARY_TYPE, cl_program_binary_type) \ \ F(cl_kernel_info, CL_KERNEL_ATTRIBUTES, string) \ \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ADDRESS_QUALIFIER, cl_kernel_arg_address_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ACCESS_QUALIFIER, cl_kernel_arg_access_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_NAME, string) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_NAME, string) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_QUALIFIER, cl_kernel_arg_type_qualifier) \ \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE, cl::Device) \ F(cl_device_info, CL_DEVICE_PARTITION_PROPERTIES, cl::vector) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPE, cl::vector) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, size_type) \ F(cl_device_info, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, cl_device_affinity_domain) \ F(cl_device_info, CL_DEVICE_BUILT_IN_KERNELS, string) \ \ F(cl_image_info, CL_IMAGE_ARRAY_SIZE, size_type) \ F(cl_image_info, CL_IMAGE_NUM_MIP_LEVELS, cl_uint) \ F(cl_image_info, CL_IMAGE_NUM_SAMPLES, cl_uint) #define CL_HPP_PARAM_NAME_INFO_2_0_(F) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_HOST_PROPERTIES, cl_command_queue_properties) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES, cl_command_queue_properties) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_ON_DEVICE_QUEUES, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_ON_DEVICE_EVENTS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_PIPE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS, cl_uint) \ F(cl_device_info, CL_DEVICE_PIPE_MAX_PACKET_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_SVM_CAPABILITIES, cl_device_svm_capabilities) \ F(cl_device_info, CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS, cl_uint) \ F(cl_command_queue_info, CL_QUEUE_SIZE, cl_uint) \ F(cl_mem_info, CL_MEM_USES_SVM_POINTER, cl_bool) \ F(cl_program_build_info, CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE, size_type) \ F(cl_pipe_info, CL_PIPE_PACKET_SIZE, cl_uint) \ F(cl_pipe_info, CL_PIPE_MAX_PACKETS, cl_uint) #define CL_HPP_PARAM_NAME_DEVICE_FISSION_(F) \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE_EXT, cl_device_id) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPES_EXT, cl::vector) \ F(cl_device_info, CL_DEVICE_AFFINITY_DOMAINS_EXT, cl::vector) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT_EXT , 
cl_uint) \
    F(cl_device_info, CL_DEVICE_PARTITION_STYLE_EXT, cl::vector<cl_device_partition_property_ext>)

template <typename enum_type, cl_int Name>
struct param_traits {};

#define CL_HPP_DECLARE_PARAM_TRAITS_(token, param_name, T) \
    struct token;                                          \
    template<>                                             \
    struct param_traits<detail:: token, param_name>        \
    {                                                      \
        enum { value = param_name };                       \
        typedef T param_type;                              \
    };

CL_HPP_PARAM_NAME_INFO_1_0_(CL_HPP_DECLARE_PARAM_TRAITS_)
#if CL_HPP_TARGET_OPENCL_VERSION >= 110
CL_HPP_PARAM_NAME_INFO_1_1_(CL_HPP_DECLARE_PARAM_TRAITS_)
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 110
#if CL_HPP_TARGET_OPENCL_VERSION >= 120
CL_HPP_PARAM_NAME_INFO_1_2_(CL_HPP_DECLARE_PARAM_TRAITS_)
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120
#if CL_HPP_TARGET_OPENCL_VERSION >= 200
CL_HPP_PARAM_NAME_INFO_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_)
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 200

// Flags deprecated in OpenCL 2.0
#define CL_HPP_PARAM_NAME_INFO_1_0_DEPRECATED_IN_2_0_(F) \
    F(cl_device_info, CL_DEVICE_QUEUE_PROPERTIES, cl_command_queue_properties)

#define CL_HPP_PARAM_NAME_INFO_1_1_DEPRECATED_IN_2_0_(F) \
    F(cl_device_info, CL_DEVICE_HOST_UNIFIED_MEMORY, cl_bool)

#define CL_HPP_PARAM_NAME_INFO_1_2_DEPRECATED_IN_2_0_(F) \
    F(cl_image_info, CL_IMAGE_BUFFER, cl::Buffer)

// Include deprecated query flags based on versions
// Only include deprecated 1.0 flags if 2.0 not active as there is an enum clash
#if CL_HPP_TARGET_OPENCL_VERSION > 100 && CL_HPP_MINIMUM_OPENCL_VERSION < 200 && CL_HPP_TARGET_OPENCL_VERSION < 200
CL_HPP_PARAM_NAME_INFO_1_0_DEPRECATED_IN_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_)
#endif // CL_HPP_TARGET_OPENCL_VERSION > 100 && CL_HPP_MINIMUM_OPENCL_VERSION < 200 && CL_HPP_TARGET_OPENCL_VERSION < 200
#if CL_HPP_TARGET_OPENCL_VERSION > 110 && CL_HPP_MINIMUM_OPENCL_VERSION < 200
CL_HPP_PARAM_NAME_INFO_1_1_DEPRECATED_IN_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_)
#endif // CL_HPP_TARGET_OPENCL_VERSION > 110 && CL_HPP_MINIMUM_OPENCL_VERSION < 200
#if CL_HPP_TARGET_OPENCL_VERSION > 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 200
CL_HPP_PARAM_NAME_INFO_1_2_DEPRECATED_IN_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_)
#endif // CL_HPP_TARGET_OPENCL_VERSION > 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 200

#if defined(CL_HPP_USE_CL_DEVICE_FISSION)
CL_HPP_PARAM_NAME_DEVICE_FISSION_(CL_HPP_DECLARE_PARAM_TRAITS_);
#endif // CL_HPP_USE_CL_DEVICE_FISSION

#ifdef CL_PLATFORM_ICD_SUFFIX_KHR
CL_HPP_DECLARE_PARAM_TRAITS_(cl_platform_info, CL_PLATFORM_ICD_SUFFIX_KHR, string)
#endif

#ifdef CL_DEVICE_PROFILING_TIMER_OFFSET_AMD
CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_PROFILING_TIMER_OFFSET_AMD, cl_ulong)
#endif
#ifdef CL_DEVICE_GLOBAL_FREE_MEMORY_AMD
CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_FREE_MEMORY_AMD, vector<size_type>)
#endif
#ifdef CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD
CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_SIMD_WIDTH_AMD
CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_SIMD_WIDTH_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD
CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_WAVEFRONT_WIDTH_AMD
CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_WAVEFRONT_WIDTH_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD
CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD
CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD
CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD, cl_uint)
#endif
#ifdef CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD
CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info,
CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD, cl_uint) #endif #ifdef CL_DEVICE_LOCAL_MEM_BANKS_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_LOCAL_MEM_BANKS_AMD, cl_uint) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV, cl_uint) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV, cl_uint) #endif #ifdef CL_DEVICE_REGISTERS_PER_BLOCK_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_REGISTERS_PER_BLOCK_NV, cl_uint) #endif #ifdef CL_DEVICE_WARP_SIZE_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_WARP_SIZE_NV, cl_uint) #endif #ifdef CL_DEVICE_GPU_OVERLAP_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GPU_OVERLAP_NV, cl_bool) #endif #ifdef CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV, cl_bool) #endif #ifdef CL_DEVICE_INTEGRATED_MEMORY_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_INTEGRATED_MEMORY_NV, cl_bool) #endif // Convenience functions template inline cl_int getInfo(Func f, cl_uint name, T* param) { return getInfoHelper(f, name, param, 0); } template struct GetInfoFunctor0 { Func f_; const Arg0& arg0_; cl_int operator ()( cl_uint param, size_type size, void* value, size_type* size_ret) { return f_(arg0_, param, size, value, size_ret); } }; template struct GetInfoFunctor1 { Func f_; const Arg0& arg0_; const Arg1& arg1_; cl_int operator ()( cl_uint param, size_type size, void* value, size_type* size_ret) { return f_(arg0_, arg1_, param, size, value, size_ret); } }; template inline cl_int getInfo(Func f, const Arg0& arg0, cl_uint name, T* param) { GetInfoFunctor0 f0 = { f, arg0 }; return getInfoHelper(f0, name, param, 0); } template inline cl_int getInfo(Func f, const Arg0& arg0, const Arg1& arg1, cl_uint name, T* param) { GetInfoFunctor1 f0 = { f, arg0, arg1 }; return getInfoHelper(f0, name, param, 0); } template struct ReferenceHandler { }; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * OpenCL 1.2 devices do have retain/release. */ template <> struct ReferenceHandler { /** * Retain the device. * \param device A valid device created using createSubDevices * \return * CL_SUCCESS if the function executed successfully. * CL_INVALID_DEVICE if device was not a valid subdevice * CL_OUT_OF_RESOURCES * CL_OUT_OF_HOST_MEMORY */ static cl_int retain(cl_device_id device) { return ::clRetainDevice(device); } /** * Retain the device. * \param device A valid device created using createSubDevices * \return * CL_SUCCESS if the function executed successfully. * CL_INVALID_DEVICE if device was not a valid subdevice * CL_OUT_OF_RESOURCES * CL_OUT_OF_HOST_MEMORY */ static cl_int release(cl_device_id device) { return ::clReleaseDevice(device); } }; #else // CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * OpenCL 1.1 devices do not have retain/release. */ template <> struct ReferenceHandler { // cl_device_id does not have retain(). static cl_int retain(cl_device_id) { return CL_SUCCESS; } // cl_device_id does not have release(). static cl_int release(cl_device_id) { return CL_SUCCESS; } }; #endif // ! (CL_HPP_TARGET_OPENCL_VERSION >= 120) template <> struct ReferenceHandler { // cl_platform_id does not have retain(). static cl_int retain(cl_platform_id) { return CL_SUCCESS; } // cl_platform_id does not have release(). 
static cl_int release(cl_platform_id) { return CL_SUCCESS; } }; template <> struct ReferenceHandler { static cl_int retain(cl_context context) { return ::clRetainContext(context); } static cl_int release(cl_context context) { return ::clReleaseContext(context); } }; template <> struct ReferenceHandler { static cl_int retain(cl_command_queue queue) { return ::clRetainCommandQueue(queue); } static cl_int release(cl_command_queue queue) { return ::clReleaseCommandQueue(queue); } }; template <> struct ReferenceHandler { static cl_int retain(cl_mem memory) { return ::clRetainMemObject(memory); } static cl_int release(cl_mem memory) { return ::clReleaseMemObject(memory); } }; template <> struct ReferenceHandler { static cl_int retain(cl_sampler sampler) { return ::clRetainSampler(sampler); } static cl_int release(cl_sampler sampler) { return ::clReleaseSampler(sampler); } }; template <> struct ReferenceHandler { static cl_int retain(cl_program program) { return ::clRetainProgram(program); } static cl_int release(cl_program program) { return ::clReleaseProgram(program); } }; template <> struct ReferenceHandler { static cl_int retain(cl_kernel kernel) { return ::clRetainKernel(kernel); } static cl_int release(cl_kernel kernel) { return ::clReleaseKernel(kernel); } }; template <> struct ReferenceHandler { static cl_int retain(cl_event event) { return ::clRetainEvent(event); } static cl_int release(cl_event event) { return ::clReleaseEvent(event); } }; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 // Extracts version number with major in the upper 16 bits, minor in the lower 16 static cl_uint getVersion(const vector &versionInfo) { int highVersion = 0; int lowVersion = 0; int index = 7; while(versionInfo[index] != '.' ) { highVersion *= 10; highVersion += versionInfo[index]-'0'; ++index; } ++index; while(versionInfo[index] != ' ' && versionInfo[index] != '\0') { lowVersion *= 10; lowVersion += versionInfo[index]-'0'; ++index; } return (highVersion << 16) | lowVersion; } static cl_uint getPlatformVersion(cl_platform_id platform) { size_type size = 0; clGetPlatformInfo(platform, CL_PLATFORM_VERSION, 0, NULL, &size); vector versionInfo(size); clGetPlatformInfo(platform, CL_PLATFORM_VERSION, size, versionInfo.data(), &size); return getVersion(versionInfo); } static cl_uint getDevicePlatformVersion(cl_device_id device) { cl_platform_id platform; clGetDeviceInfo(device, CL_DEVICE_PLATFORM, sizeof(platform), &platform, NULL); return getPlatformVersion(platform); } static cl_uint getContextPlatformVersion(cl_context context) { // The platform cannot be queried directly, so we first have to grab a // device and obtain its context size_type size = 0; clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &size); if (size == 0) return 0; vector devices(size/sizeof(cl_device_id)); clGetContextInfo(context, CL_CONTEXT_DEVICES, size, devices.data(), NULL); return getDevicePlatformVersion(devices[0]); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 template class Wrapper { public: typedef T cl_type; protected: cl_type object_; public: Wrapper() : object_(NULL) { } Wrapper(const cl_type &obj, bool retainObject) : object_(obj) { if (retainObject) { detail::errHandler(retain(), __RETAIN_ERR); } } ~Wrapper() { if (object_ != NULL) { release(); } } Wrapper(const Wrapper& rhs) { object_ = rhs.object_; detail::errHandler(retain(), __RETAIN_ERR); } Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT_ { object_ = rhs.object_; rhs.object_ = NULL; } Wrapper& operator 
= (const Wrapper& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; detail::errHandler(retain(), __RETAIN_ERR); } return *this; } Wrapper& operator = (Wrapper&& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; rhs.object_ = NULL; } return *this; } Wrapper& operator = (const cl_type &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs; return *this; } const cl_type& operator ()() const { return object_; } cl_type& operator ()() { return object_; } const cl_type get() const { return object_; } cl_type get() { return object_; } protected: template friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type); cl_int retain() const { if (object_ != nullptr) { return ReferenceHandler::retain(object_); } else { return CL_SUCCESS; } } cl_int release() const { if (object_ != nullptr) { return ReferenceHandler::release(object_); } else { return CL_SUCCESS; } } }; template <> class Wrapper { public: typedef cl_device_id cl_type; protected: cl_type object_; bool referenceCountable_; static bool isReferenceCountable(cl_device_id device) { bool retVal = false; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_MINIMUM_OPENCL_VERSION < 120 if (device != NULL) { int version = getDevicePlatformVersion(device); if(version > ((1 << 16) + 1)) { retVal = true; } } #else // CL_HPP_MINIMUM_OPENCL_VERSION < 120 retVal = true; #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 120 #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 return retVal; } public: Wrapper() : object_(NULL), referenceCountable_(false) { } Wrapper(const cl_type &obj, bool retainObject) : object_(obj), referenceCountable_(false) { referenceCountable_ = isReferenceCountable(obj); if (retainObject) { detail::errHandler(retain(), __RETAIN_ERR); } } ~Wrapper() { release(); } Wrapper(const Wrapper& rhs) { object_ = rhs.object_; referenceCountable_ = isReferenceCountable(object_); detail::errHandler(retain(), __RETAIN_ERR); } Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT_ { object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; rhs.object_ = NULL; rhs.referenceCountable_ = false; } Wrapper& operator = (const Wrapper& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; detail::errHandler(retain(), __RETAIN_ERR); } return *this; } Wrapper& operator = (Wrapper&& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; rhs.object_ = NULL; rhs.referenceCountable_ = false; } return *this; } Wrapper& operator = (const cl_type &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs; referenceCountable_ = isReferenceCountable(object_); return *this; } const cl_type& operator ()() const { return object_; } cl_type& operator ()() { return object_; } const cl_type get() const { return object_; } cl_type get() { return object_; } protected: template friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type); template friend inline cl_int getInfoHelper(Func, cl_uint, vector*, int, typename U::cl_type); cl_int retain() const { if( object_ != nullptr && referenceCountable_ ) { return ReferenceHandler::retain(object_); } else { return CL_SUCCESS; } } cl_int release() const { if (object_ != nullptr && referenceCountable_) { return ReferenceHandler::release(object_); } else { return CL_SUCCESS; } } }; template inline bool operator==(const 
Wrapper<T> &lhs, const Wrapper<T> &rhs)
{
    return lhs() == rhs();
}

template <typename T>
inline bool operator!=(const Wrapper<T> &lhs, const Wrapper<T> &rhs)
{
    return !operator==(lhs, rhs);
}

} // namespace detail

//! \endcond


using BuildLogType = vector<std::pair<cl::Device, detail::param_traits<detail::cl_program_build_info, CL_PROGRAM_BUILD_LOG>::param_type>>;
#if defined(CL_HPP_ENABLE_EXCEPTIONS)
/**
 * Exception class for build errors to carry build info
 */
class BuildError : public Error
{
private:
    BuildLogType buildLogs;
public:
    BuildError(cl_int err, const char * errStr, const BuildLogType &vec) :
        Error(err, errStr), buildLogs(vec)
    {
    }

    BuildLogType getBuildLog() const
    {
        return buildLogs;
    }
};
namespace detail {
    static inline cl_int buildErrHandler(
        cl_int err,
        const char * errStr,
        const BuildLogType &buildLogs)
    {
        if (err != CL_SUCCESS) {
            throw BuildError(err, errStr, buildLogs);
        }
        return err;
    }
} // namespace detail

#else
namespace detail {
    static inline cl_int buildErrHandler(
        cl_int err,
        const char * errStr,
        const BuildLogType &buildLogs)
    {
        (void)buildLogs; // suppress unused variable warning
        (void)errStr;
        return err;
    }
} // namespace detail
#endif // #if defined(CL_HPP_ENABLE_EXCEPTIONS)


/*! \struct ImageFormat
 *  \brief Adds constructors and member functions for cl_image_format.
 *
 *  \see cl_image_format
 */
struct ImageFormat : public cl_image_format
{
    //! \brief Default constructor - performs no initialization.
    ImageFormat(){}

    //! \brief Initializing constructor.
    ImageFormat(cl_channel_order order, cl_channel_type type)
    {
        image_channel_order = order;
        image_channel_data_type = type;
    }

    //! \brief Assignment operator.
    ImageFormat& operator = (const ImageFormat& rhs)
    {
        if (this != &rhs) {
            this->image_channel_data_type = rhs.image_channel_data_type;
            this->image_channel_order     = rhs.image_channel_order;
        }
        return *this;
    }
};

/*! \brief Class interface for cl_device_id.
 *
 *  \note Copies of these objects are inexpensive, since they don't 'own'
 *        any underlying resources or data structures.
 *
 *  \see cl_device_id
 */
class Device : public detail::Wrapper<cl_device_id>
{
private:
    static std::once_flag default_initialized_;
    static Device default_;
    static cl_int default_error_;

    /*! \brief Create the default device.
     *
     * This sets @c default_ and @c default_error_. It does not throw
     * @c cl::Error.
     */
    static void makeDefault();

    /*! \brief Create the default device from a provided device.
     *
     * This sets @c default_. It does not throw
     * @c cl::Error.
     */
    static void makeDefaultProvided(const Device &p) {
        default_ = p;
    }

public:
#ifdef CL_HPP_UNIT_TEST_ENABLE
    /*! \brief Reset the default.
     *
     * This sets @c default_ to an empty value to support cleanup in
     * the unit test framework.
     * This function is not thread safe.
     */
    static void unitTestClearDefault() {
        default_ = Device();
    }
#endif // #ifdef CL_HPP_UNIT_TEST_ENABLE

    //! \brief Default constructor - initializes to NULL.
    Device() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_device_id.
     *
     *  This simply copies the device ID value, which is an inexpensive operation.
     */
    explicit Device(const cl_device_id &device, bool retainObject = false) :
        detail::Wrapper<cl_type>(device, retainObject) { }

    /*! \brief Returns the first device on the default context.
     *
     *  \see Context::getDefault()
     */
    static Device getDefault(
        cl_int *errResult = NULL)
    {
        std::call_once(default_initialized_, makeDefault);
        detail::errHandler(default_error_);
        if (errResult != NULL) {
            *errResult = default_error_;
        }
        return default_;
    }

    /**
     * Modify the default device to be used by
     * subsequent operations.
     * Will only set the default if no default was previously created.
     * @return updated default device.
* Should be compared to the passed value to ensure that it was updated. */ static Device setDefault(const Device &default_device) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_device)); detail::errHandler(default_error_); return default_; } /*! \brief Assignment operator from cl_device_id. * * This simply copies the device ID value, which is an inexpensive operation. */ Device& operator = (const cl_device_id& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Device(const Device& dev) : detail::Wrapper(dev) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Device& operator = (const Device &dev) { detail::Wrapper::operator=(dev); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Device(Device&& dev) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(dev)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Device& operator = (Device &&dev) { detail::Wrapper::operator=(std::move(dev)); return *this; } //! \brief Wrapper for clGetDeviceInfo(). template cl_int getInfo(cl_device_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetDeviceInfo, object_, name, param), __GET_DEVICE_INFO_ERR); } //! \brief Wrapper for clGetDeviceInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_device_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /** * CL 1.2 version */ #if CL_HPP_TARGET_OPENCL_VERSION >= 120 //! \brief Wrapper for clCreateSubDevices(). cl_int createSubDevices( const cl_device_partition_property * properties, vector* devices) { cl_uint n = 0; cl_int err = clCreateSubDevices(object_, properties, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } vector ids(n); err = clCreateSubDevices(object_, properties, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } // Cannot trivially assign because we need to capture intermediates // with safe construction if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { // We do not need to retain because this device is being created // by the runtime (*devices)[i] = Device(ids[i], false); } } return CL_SUCCESS; } #elif defined(CL_HPP_USE_CL_DEVICE_FISSION) /** * CL 1.1 version that uses device fission extension. 
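 *
 * Either variant is called the same way; a minimal sketch (illustrative,
 * assuming an existing cl::Device named device) against the CL 1.2
 * overload above, splitting the device into sub-devices of four compute
 * units each:
 * \code
 * cl_device_partition_property props[] =
 *     { CL_DEVICE_PARTITION_EQUALLY, 4, 0 };
 * cl::vector<cl::Device> subDevices;
 * cl_int err = device.createSubDevices(props, &subDevices);
 * \endcode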
*/ cl_int createSubDevices( const cl_device_partition_property_ext * properties, vector* devices) { typedef CL_API_ENTRY cl_int ( CL_API_CALL * PFN_clCreateSubDevicesEXT)( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1; static PFN_clCreateSubDevicesEXT pfn_clCreateSubDevicesEXT = NULL; CL_HPP_INIT_CL_EXT_FCN_PTR_(clCreateSubDevicesEXT); cl_uint n = 0; cl_int err = pfn_clCreateSubDevicesEXT(object_, properties, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } vector ids(n); err = pfn_clCreateSubDevicesEXT(object_, properties, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } // Cannot trivially assign because we need to capture intermediates // with safe construction if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { // We do not need to retain because this device is being created // by the runtime (*devices)[i] = Device(ids[i], false); } } return CL_SUCCESS; } #endif // defined(CL_HPP_USE_CL_DEVICE_FISSION) }; CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag Device::default_initialized_; CL_HPP_DEFINE_STATIC_MEMBER_ Device Device::default_; CL_HPP_DEFINE_STATIC_MEMBER_ cl_int Device::default_error_ = CL_SUCCESS; /*! \brief Class interface for cl_platform_id. * * \note Copies of these objects are inexpensive, since they don't 'own' * any underlying resources or data structures. * * \see cl_platform_id */ class Platform : public detail::Wrapper { private: static std::once_flag default_initialized_; static Platform default_; static cl_int default_error_; /*! \brief Create the default context. * * This sets @c default_ and @c default_error_. It does not throw * @c cl::Error. */ static void makeDefault() { /* Throwing an exception from a call_once invocation does not do * what we wish, so we catch it and save the error. */ #if defined(CL_HPP_ENABLE_EXCEPTIONS) try #endif { // If default wasn't passed ,generate one // Otherwise set it cl_uint n = 0; cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { default_error_ = err; return; } if (n == 0) { default_error_ = CL_INVALID_PLATFORM; return; } vector ids(n); err = ::clGetPlatformIDs(n, ids.data(), NULL); if (err != CL_SUCCESS) { default_error_ = err; return; } default_ = Platform(ids[0]); } #if defined(CL_HPP_ENABLE_EXCEPTIONS) catch (cl::Error &e) { default_error_ = e.err(); } #endif } /*! \brief Create the default platform from a provided platform. * * This sets @c default_. It does not throw * @c cl::Error. */ static void makeDefaultProvided(const Platform &p) { default_ = p; } public: #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Reset the default. * * This sets @c default_ to an empty value to support cleanup in * the unit test framework. * This function is not thread safe. */ static void unitTestClearDefault() { default_ = Platform(); } #endif // #ifdef CL_HPP_UNIT_TEST_ENABLE //! \brief Default constructor - initializes to NULL. Platform() : detail::Wrapper() { } /*! \brief Constructor from cl_platform_id. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * This simply copies the platform ID value, which is an inexpensive operation. 
*/ explicit Platform(const cl_platform_id &platform, bool retainObject = false) : detail::Wrapper(platform, retainObject) { } /*! \brief Assignment operator from cl_platform_id. * * This simply copies the platform ID value, which is an inexpensive operation. */ Platform& operator = (const cl_platform_id& rhs) { detail::Wrapper::operator=(rhs); return *this; } static Platform getDefault( cl_int *errResult = NULL) { std::call_once(default_initialized_, makeDefault); detail::errHandler(default_error_); if (errResult != NULL) { *errResult = default_error_; } return default_; } /** * Modify the default platform to be used by * subsequent operations. * Will only set the default if no default was previously created. * @return updated default platform. * Should be compared to the passed value to ensure that it was updated. */ static Platform setDefault(const Platform &default_platform) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_platform)); detail::errHandler(default_error_); return default_; } //! \brief Wrapper for clGetPlatformInfo(). cl_int getInfo(cl_platform_info name, string* param) const { return detail::errHandler( detail::getInfo(&::clGetPlatformInfo, object_, name, param), __GET_PLATFORM_INFO_ERR); } //! \brief Wrapper for clGetPlatformInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_platform_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Gets a list of devices for this platform. * * Wraps clGetDeviceIDs(). */ cl_int getDevices( cl_device_type type, vector* devices) const { cl_uint n = 0; if( devices == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR); } cl_int err = ::clGetDeviceIDs(object_, type, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } vector ids(n); err = ::clGetDeviceIDs(object_, type, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } // Cannot trivially assign because we need to capture intermediates // with safe construction // We must retain things we obtain from the API to avoid releasing // API-owned objects. if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { (*devices)[i] = Device(ids[i], true); } } return CL_SUCCESS; } #if defined(CL_HPP_USE_DX_INTEROP) /*! \brief Get the list of available D3D10 devices. * * \param d3d_device_source. * * \param d3d_object. * * \param d3d_device_set. * * \param devices returns a vector of OpenCL D3D10 devices found. The cl::Device * values returned in devices can be used to identify a specific OpenCL * device. If \a devices argument is NULL, this argument is ignored. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully. * * The application can query specific capabilities of the OpenCL device(s) * returned by cl::getDevices. This can be used by the application to * determine which device(s) to use. * * \note In the case that exceptions are enabled and a return value * other than CL_SUCCESS is generated, then cl::Error exception is * generated. 
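 *
 * A minimal sketch (illustrative; assumes CL_HPP_USE_DX_INTEROP is
 * defined, a valid ID3D10Device* d3dDevice, and a platform exposing
 * cl_khr_d3d10_sharing):
 * \code
 * cl::vector<cl::Device> devices;
 * cl_int err = platform.getDevices(
 *     CL_D3D10_DEVICE_KHR, d3dDevice,
 *     CL_PREFERRED_DEVICES_FOR_D3D10_KHR, &devices);
 * \endcode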
*/ cl_int getDevices( cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, vector* devices) const { typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clGetDeviceIDsFromD3D10KHR)( cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint* num_devices); if( devices == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR); } static PFN_clGetDeviceIDsFromD3D10KHR pfn_clGetDeviceIDsFromD3D10KHR = NULL; CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(object_, clGetDeviceIDsFromD3D10KHR); cl_uint n = 0; cl_int err = pfn_clGetDeviceIDsFromD3D10KHR( object_, d3d_device_source, d3d_object, d3d_device_set, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } vector ids(n); err = pfn_clGetDeviceIDsFromD3D10KHR( object_, d3d_device_source, d3d_object, d3d_device_set, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } // Cannot trivially assign because we need to capture intermediates // with safe construction // We must retain things we obtain from the API to avoid releasing // API-owned objects. if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { (*devices)[i] = Device(ids[i], true); } } return CL_SUCCESS; } #endif /*! \brief Gets a list of available platforms. * * Wraps clGetPlatformIDs(). */ static cl_int get( vector* platforms) { cl_uint n = 0; if( platforms == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_PLATFORM_IDS_ERR); } cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } vector ids(n); err = ::clGetPlatformIDs(n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } if (platforms) { platforms->resize(ids.size()); // Platforms don't reference count for (size_type i = 0; i < ids.size(); i++) { (*platforms)[i] = Platform(ids[i]); } } return CL_SUCCESS; } /*! \brief Gets the first available platform. * * Wraps clGetPlatformIDs(), returning the first result. */ static cl_int get( Platform * platform) { cl_int err; Platform default_platform = Platform::getDefault(&err); if (platform) { *platform = default_platform; } return err; } /*! \brief Gets the first available platform, returning it by value. * * \return Returns a valid platform if one is available. * If no platform is available will return a null platform. * Throws an exception if no platforms are available * or an error condition occurs. * Wraps clGetPlatformIDs(), returning the first result. */ static Platform get( cl_int * errResult = NULL) { cl_int err; Platform default_platform = Platform::getDefault(&err); if (errResult) { *errResult = err; } return default_platform; } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 //! \brief Wrapper for clUnloadCompiler(). 
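//! Despite the 1.1-era name, this wraps clUnloadPlatformCompiler() for
//! this platform (see the body below); usage is a single call
//! (illustrative):
//! \code
//! platform.unloadCompiler(); // hint that compiler resources may be freed
//! \endcode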
cl_int unloadCompiler() { return ::clUnloadPlatformCompiler(object_); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 }; // class Platform CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag Platform::default_initialized_; CL_HPP_DEFINE_STATIC_MEMBER_ Platform Platform::default_; CL_HPP_DEFINE_STATIC_MEMBER_ cl_int Platform::default_error_ = CL_SUCCESS; /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /** * Unload the OpenCL compiler. * \note Deprecated for OpenCL 1.2. Use Platform::unloadCompiler instead. */ inline CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int UnloadCompiler() CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; inline cl_int UnloadCompiler() { return ::clUnloadCompiler(); } #endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /*! \brief Class interface for cl_context. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_context as the original. For details, see * clRetainContext() and clReleaseContext(). * * \see cl_context */ class Context : public detail::Wrapper { private: static std::once_flag default_initialized_; static Context default_; static cl_int default_error_; /*! \brief Create the default context from the default device type in the default platform. * * This sets @c default_ and @c default_error_. It does not throw * @c cl::Error. */ static void makeDefault() { /* Throwing an exception from a call_once invocation does not do * what we wish, so we catch it and save the error. */ #if defined(CL_HPP_ENABLE_EXCEPTIONS) try #endif { #if !defined(__APPLE__) && !defined(__MACOS) const Platform &p = Platform::getDefault(); cl_platform_id defaultPlatform = p(); cl_context_properties properties[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties)defaultPlatform, 0 }; #else // #if !defined(__APPLE__) && !defined(__MACOS) cl_context_properties *properties = nullptr; #endif // #if !defined(__APPLE__) && !defined(__MACOS) default_ = Context( CL_DEVICE_TYPE_DEFAULT, properties, NULL, NULL, &default_error_); } #if defined(CL_HPP_ENABLE_EXCEPTIONS) catch (cl::Error &e) { default_error_ = e.err(); } #endif } /*! \brief Create the default context from a provided Context. * * This sets @c default_. It does not throw * @c cl::Error. */ static void makeDefaultProvided(const Context &c) { default_ = c; } public: #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Reset the default. * * This sets @c default_ to an empty value to support cleanup in * the unit test framework. * This function is not thread safe. */ static void unitTestClearDefault() { default_ = Context(); } #endif // #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Constructs a context including a list of specified devices. * * Wraps clCreateContext(). 
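 *
 * A typical construction (illustrative; error handling elided):
 * \code
 * cl::vector<cl::Device> devices;
 * cl::Platform::getDefault().getDevices(CL_DEVICE_TYPE_GPU, &devices);
 * cl_int err = CL_SUCCESS;
 * cl::Context context(devices, NULL, NULL, NULL, &err);
 * \endcode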
*/ Context( const vector& devices, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, size_type, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; size_type numDevices = devices.size(); vector deviceIDs(numDevices); for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } object_ = ::clCreateContext( properties, (cl_uint) numDevices, deviceIDs.data(), notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (err != NULL) { *err = error; } } Context( const Device& device, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, size_type, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; cl_device_id deviceID = device(); object_ = ::clCreateContext( properties, 1, &deviceID, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a context including all or a subset of devices of a specified type. * * Wraps clCreateContextFromType(). */ Context( cl_device_type type, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, size_type, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; #if !defined(__APPLE__) && !defined(__MACOS) cl_context_properties prop[4] = {CL_CONTEXT_PLATFORM, 0, 0, 0 }; if (properties == NULL) { // Get a valid platform ID as we cannot send in a blank one vector platforms; error = Platform::get(&platforms); if (error != CL_SUCCESS) { detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } return; } // Check the platforms we found for a device of our specified type cl_context_properties platform_id = 0; for (unsigned int i = 0; i < platforms.size(); i++) { vector devices; #if defined(CL_HPP_ENABLE_EXCEPTIONS) try { #endif error = platforms[i].getDevices(type, &devices); #if defined(CL_HPP_ENABLE_EXCEPTIONS) } catch (Error) {} // Catch if exceptions are enabled as we don't want to exit if first platform has no devices of type // We do error checking next anyway, and can throw there if needed #endif // Only squash CL_SUCCESS and CL_DEVICE_NOT_FOUND if (error != CL_SUCCESS && error != CL_DEVICE_NOT_FOUND) { detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } } if (devices.size() > 0) { platform_id = (cl_context_properties)platforms[i](); break; } } if (platform_id == 0) { detail::errHandler(CL_DEVICE_NOT_FOUND, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = CL_DEVICE_NOT_FOUND; } return; } prop[1] = platform_id; properties = &prop[0]; } #endif object_ = ::clCreateContextFromType( properties, type, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Context(const Context& ctx) : detail::Wrapper(ctx) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Context& operator = (const Context &ctx) { detail::Wrapper::operator=(ctx); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Context(Context&& ctx) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(ctx)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
*/ Context& operator = (Context &&ctx) { detail::Wrapper::operator=(std::move(ctx)); return *this; } /*! \brief Returns a singleton context including all devices of CL_DEVICE_TYPE_DEFAULT. * * \note All calls to this function return the same cl_context as the first. */ static Context getDefault(cl_int * err = NULL) { std::call_once(default_initialized_, makeDefault); detail::errHandler(default_error_); if (err != NULL) { *err = default_error_; } return default_; } /** * Modify the default context to be used by * subsequent operations. * Will only set the default if no default was previously created. * @return updated default context. * Should be compared to the passed value to ensure that it was updated. */ static Context setDefault(const Context &default_context) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_context)); detail::errHandler(default_error_); return default_; } //! \brief Default constructor - initializes to NULL. Context() : detail::Wrapper() { } /*! \brief Constructor from cl_context - takes ownership. * * This effectively transfers ownership of a refcount on the cl_context * into the new Context object. */ explicit Context(const cl_context& context, bool retainObject = false) : detail::Wrapper(context, retainObject) { } /*! \brief Assignment operator from cl_context - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseContext() on the value previously held by this instance. */ Context& operator = (const cl_context& rhs) { detail::Wrapper::operator=(rhs); return *this; } //! \brief Wrapper for clGetContextInfo(). template cl_int getInfo(cl_context_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetContextInfo, object_, name, param), __GET_CONTEXT_INFO_ERR); } //! \brief Wrapper for clGetContextInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_context_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Gets a list of supported image formats. * * Wraps clGetSupportedImageFormats(). */ cl_int getSupportedImageFormats( cl_mem_flags flags, cl_mem_object_type type, vector* formats) const { cl_uint numEntries; if (!formats) { return CL_SUCCESS; } cl_int err = ::clGetSupportedImageFormats( object_, flags, type, 0, NULL, &numEntries); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR); } if (numEntries > 0) { vector value(numEntries); err = ::clGetSupportedImageFormats( object_, flags, type, numEntries, (cl_image_format*)value.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR); } formats->assign(begin(value), end(value)); } else { // If no values are being returned, ensure an empty vector comes back formats->clear(); } return CL_SUCCESS; } }; inline void Device::makeDefault() { /* Throwing an exception from a call_once invocation does not do * what we wish, so we catch it and save the error. 
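 * (std::call_once leaves its flag unset when the callable throws, so an
 * escaping exception would make every later getDefault() call re-run the
 * initialization instead of reporting the stored error.)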
*/ #if defined(CL_HPP_ENABLE_EXCEPTIONS) try #endif { cl_int error = 0; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { default_error_ = error; } else { default_ = context.getInfo()[0]; default_error_ = CL_SUCCESS; } } #if defined(CL_HPP_ENABLE_EXCEPTIONS) catch (cl::Error &e) { default_error_ = e.err(); } #endif } CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag Context::default_initialized_; CL_HPP_DEFINE_STATIC_MEMBER_ Context Context::default_; CL_HPP_DEFINE_STATIC_MEMBER_ cl_int Context::default_error_ = CL_SUCCESS; /*! \brief Class interface for cl_event. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_event as the original. For details, see * clRetainEvent() and clReleaseEvent(). * * \see cl_event */ class Event : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Event() : detail::Wrapper() { } /*! \brief Constructor from cl_event - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * This effectively transfers ownership of a refcount on the cl_event * into the new Event object. */ explicit Event(const cl_event& event, bool retainObject = false) : detail::Wrapper(event, retainObject) { } /*! \brief Assignment operator from cl_event - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseEvent() on the value previously held by this instance. */ Event& operator = (const cl_event& rhs) { detail::Wrapper::operator=(rhs); return *this; } //! \brief Wrapper for clGetEventInfo(). template cl_int getInfo(cl_event_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetEventInfo, object_, name, param), __GET_EVENT_INFO_ERR); } //! \brief Wrapper for clGetEventInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_event_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } //! \brief Wrapper for clGetEventProfilingInfo(). template cl_int getProfilingInfo(cl_profiling_info name, T* param) const { return detail::errHandler(detail::getInfo( &::clGetEventProfilingInfo, object_, name, param), __GET_EVENT_PROFILE_INFO_ERR); } //! \brief Wrapper for clGetEventProfilingInfo() that returns by value. template typename detail::param_traits::param_type getProfilingInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_profiling_info, name>::param_type param; cl_int result = getProfilingInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Blocks the calling thread until this event completes. * * Wraps clWaitForEvents(). */ cl_int wait() const { return detail::errHandler( ::clWaitForEvents(1, &object_), __WAIT_FOR_EVENTS_ERR); } #if CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Registers a user callback function for a specific command execution status. * * Wraps clSetEventCallback(). */ cl_int setCallback( cl_int type, void (CL_CALLBACK * pfn_notify)(cl_event, cl_int, void *), void * user_data = NULL) { return detail::errHandler( ::clSetEventCallback( object_, type, pfn_notify, user_data), __SET_EVENT_CALLBACK_ERR); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Blocks the calling thread until every event specified is complete. 
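 *
 * A short sketch (illustrative; queue, buffer, hostPtr and size are
 * assumed to exist):
 * \code
 * cl::Event ev;
 * queue.enqueueWriteBuffer(buffer, CL_FALSE, 0, size, hostPtr, NULL, &ev);
 * cl::Event::waitForEvents(cl::vector<cl::Event>{ ev });
 * \endcode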
* * Wraps clWaitForEvents(). */ static cl_int waitForEvents(const vector& events) { return detail::errHandler( ::clWaitForEvents( (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL), __WAIT_FOR_EVENTS_ERR); } }; #if CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Class interface for user events (a subset of cl_event's). * * See Event for details about copy semantics, etc. */ class UserEvent : public Event { public: /*! \brief Constructs a user event on a given context. * * Wraps clCreateUserEvent(). */ UserEvent( const Context& context, cl_int * err = NULL) { cl_int error; object_ = ::clCreateUserEvent( context(), &error); detail::errHandler(error, __CREATE_USER_EVENT_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. UserEvent() : Event() { } /*! \brief Sets the execution status of a user event object. * * Wraps clSetUserEventStatus(). */ cl_int setStatus(cl_int status) { return detail::errHandler( ::clSetUserEventStatus(object_,status), __SET_USER_EVENT_STATUS_ERR); } }; #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Blocks the calling thread until every event specified is complete. * * Wraps clWaitForEvents(). */ inline static cl_int WaitForEvents(const vector& events) { return detail::errHandler( ::clWaitForEvents( (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL), __WAIT_FOR_EVENTS_ERR); } /*! \brief Class interface for cl_mem. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_mem as the original. For details, see * clRetainMemObject() and clReleaseMemObject(). * * \see cl_mem */ class Memory : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Memory() : detail::Wrapper() { } /*! \brief Constructor from cl_mem - takes ownership. * * Optionally transfer ownership of a refcount on the cl_mem * into the new Memory object. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * * See Memory for further details. */ explicit Memory(const cl_mem& memory, bool retainObject) : detail::Wrapper(memory, retainObject) { } /*! \brief Assignment operator from cl_mem - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseMemObject() on the value previously held by this instance. */ Memory& operator = (const cl_mem& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Memory(const Memory& mem) : detail::Wrapper(mem) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Memory& operator = (const Memory &mem) { detail::Wrapper::operator=(mem); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Memory(Memory&& mem) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(mem)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Memory& operator = (Memory &&mem) { detail::Wrapper::operator=(std::move(mem)); return *this; } //! \brief Wrapper for clGetMemObjectInfo(). template cl_int getInfo(cl_mem_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetMemObjectInfo, object_, name, param), __GET_MEM_OBJECT_INFO_ERR); } //! \brief Wrapper for clGetMemObjectInfo() that returns by value. 
template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_mem_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } #if CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Registers a callback function to be called when the memory object * is no longer needed. * * Wraps clSetMemObjectDestructorCallback(). * * Repeated calls to this function, for a given cl_mem value, will append * to the list of functions called (in reverse order) when memory object's * resources are freed and the memory object is deleted. * * \note * The registered callbacks are associated with the underlying cl_mem * value - not the Memory class instance. */ cl_int setDestructorCallback( void (CL_CALLBACK * pfn_notify)(cl_mem, void *), void * user_data = NULL) { return detail::errHandler( ::clSetMemObjectDestructorCallback( object_, pfn_notify, user_data), __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 }; // Pre-declare copy functions class Buffer; template< typename IteratorType > cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ); template< typename IteratorType > cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ); template< typename IteratorType > cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ); template< typename IteratorType > cl_int copy( const CommandQueue &queue, const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ); #if CL_HPP_TARGET_OPENCL_VERSION >= 200 namespace detail { class SVMTraitNull { public: static cl_svm_mem_flags getSVMMemFlags() { return 0; } }; } // namespace detail template class SVMTraitReadWrite { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_READ_WRITE | Trait::getSVMMemFlags(); } }; template class SVMTraitReadOnly { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_READ_ONLY | Trait::getSVMMemFlags(); } }; template class SVMTraitWriteOnly { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_WRITE_ONLY | Trait::getSVMMemFlags(); } }; template> class SVMTraitCoarse { public: static cl_svm_mem_flags getSVMMemFlags() { return Trait::getSVMMemFlags(); } }; template> class SVMTraitFine { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_SVM_FINE_GRAIN_BUFFER | Trait::getSVMMemFlags(); } }; template> class SVMTraitAtomic { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS | Trait::getSVMMemFlags(); } }; // Pre-declare SVM map function template inline cl_int enqueueMapSVM( T* ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL); /** * STL-like allocator class for managing SVM objects provided for convenience. * * Note that while this behaves like an allocator for the purposes of constructing vectors and similar objects, * care must be taken when using with smart pointers. * The allocator should not be used to construct a unique_ptr if we are using coarse-grained SVM mode because * the coarse-grained management behaviour would behave incorrectly with respect to reference counting. * * Instead the allocator embeds a Deleter which may be used with unique_ptr and is used * with the allocate_shared and allocate_ptr supplied operations. 
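 *
 * A minimal usage sketch (illustrative only; assumes the default context is an
 * OpenCL 2.0 context with SVM support):
 * \code{.cpp}
 * cl::SVMAllocator<int, cl::SVMTraitCoarse<>> svmAlloc;
 * cl::vector<int, cl::SVMAllocator<int, cl::SVMTraitCoarse<>>> v(1024, 0, svmAlloc);
 * v[0] = 42; // host access is legal while the coarse allocation is mapped
 * auto p = cl::allocate_svm<int, cl::SVMTraitCoarse<>>(7); // unique_ptr-style, declared below
 * \endcode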
 */
template<typename T, class SVMTrait>
class SVMAllocator {
private:
    Context context_;

public:
    typedef T value_type;
    typedef value_type* pointer;
    typedef const value_type* const_pointer;
    typedef value_type& reference;
    typedef const value_type& const_reference;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;

    template<typename U>
    struct rebind
    {
        typedef SVMAllocator<U, SVMTrait> other;
    };

    template<typename U, typename V>
    friend class SVMAllocator;

    SVMAllocator() :
        context_(Context::getDefault())
    {
    }

    explicit SVMAllocator(cl::Context context) :
        context_(context)
    {
    }

    SVMAllocator(const SVMAllocator &other) :
        context_(other.context_)
    {
    }

    template<typename U>
    SVMAllocator(const SVMAllocator<U, SVMTrait> &other) :
        context_(other.context_)
    {
    }

    ~SVMAllocator()
    {
    }

    pointer address(reference r) CL_HPP_NOEXCEPT_
    {
        return std::addressof(r);
    }

    const_pointer address(const_reference r) CL_HPP_NOEXCEPT_
    {
        return std::addressof(r);
    }

    /**
     * Allocate an SVM pointer.
     *
     * If the allocator is coarse-grained, this will take ownership to allow
     * containers to correctly construct data in place.
     */
    pointer allocate(
        size_type size,
        typename cl::SVMAllocator<void, SVMTrait>::const_pointer = 0)
    {
        // Allocate memory with default alignment matching the size of the type
        void* voidPointer =
            clSVMAlloc(
                context_(),
                SVMTrait::getSVMMemFlags(),
                size*sizeof(T),
                0);
        pointer retValue = reinterpret_cast<pointer>(voidPointer);
#if defined(CL_HPP_ENABLE_EXCEPTIONS)
        if (!retValue) {
            std::bad_alloc excep;
            throw excep;
        }
#endif // #if defined(CL_HPP_ENABLE_EXCEPTIONS)

        // If allocation was coarse-grained then map it
        if (!(SVMTrait::getSVMMemFlags() & CL_MEM_SVM_FINE_GRAIN_BUFFER)) {
            cl_int err = enqueueMapSVM(retValue, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, size*sizeof(T));
            if (err != CL_SUCCESS) {
                std::bad_alloc excep;
                throw excep;
            }
        }

        // If exceptions disabled, return null pointer from allocator
        return retValue;
    }

    void deallocate(pointer p, size_type)
    {
        clSVMFree(context_(), p);
    }

    /**
     * Return the maximum possible allocation size.
     * This is the minimum of the maximum sizes of all devices in the context.
     */
    size_type max_size() const CL_HPP_NOEXCEPT_
    {
        size_type maxSize = std::numeric_limits<size_type>::max() / sizeof(T);

        for (const Device &d : context_.getInfo<CL_CONTEXT_DEVICES>()) {
            maxSize = std::min(
                maxSize,
                static_cast<size_type>(d.getInfo<CL_DEVICE_MAX_MEM_ALLOC_SIZE>()));
        }

        return maxSize;
    }

    template< class U, class... Args >
    void construct(U* p, Args&&... args)
    {
        new(p)T(args...);
    }

    template< class U >
    void destroy(U* p)
    {
        p->~U();
    }

    /**
     * Returns true if the contexts match.
     */
    inline bool operator==(SVMAllocator const& rhs)
    {
        return (context_==rhs.context_);
    }

    inline bool operator!=(SVMAllocator const& a)
    {
        return !operator==(a);
    }
}; // class SVMAllocator

template<class SVMTrait>
class SVMAllocator<void, SVMTrait> {
public:
    typedef void value_type;
    typedef value_type* pointer;
    typedef const value_type* const_pointer;

    template<typename U>
    struct rebind
    {
        typedef SVMAllocator<U, SVMTrait> other;
    };

    template<typename U, typename V>
    friend class SVMAllocator;
};

#if !defined(CL_HPP_NO_STD_UNIQUE_PTR)
namespace detail
{
    template<class Alloc>
    class Deleter {
    private:
        Alloc alloc_;
        size_type copies_;

    public:
        typedef typename std::allocator_traits<Alloc>::pointer pointer;

        Deleter(const Alloc &alloc, size_type copies) : alloc_{ alloc }, copies_{ copies }
        {
        }

        void operator()(pointer ptr) const {
            Alloc tmpAlloc{ alloc_ };
            std::allocator_traits<Alloc>::destroy(tmpAlloc, std::addressof(*ptr));
            std::allocator_traits<Alloc>::deallocate(tmpAlloc, ptr, copies_);
        }
    };
} // namespace detail

/**
 * Allocation operation compatible with std::allocate_ptr.
 * Creates a unique_ptr<T> by default.
* This requirement is to ensure that the control block is not * allocated in memory inaccessible to the host. */ template cl::pointer> allocate_pointer(const Alloc &alloc_, Args&&... args) { Alloc alloc(alloc_); static const size_type copies = 1; // Ensure that creation of the management block and the // object are dealt with separately such that we only provide a deleter T* tmp = std::allocator_traits::allocate(alloc, copies); if (!tmp) { std::bad_alloc excep; throw excep; } try { std::allocator_traits::construct( alloc, std::addressof(*tmp), std::forward(args)...); return cl::pointer>(tmp, detail::Deleter{alloc, copies}); } catch (std::bad_alloc b) { std::allocator_traits::deallocate(alloc, tmp, copies); throw; } } template< class T, class SVMTrait, class... Args > cl::pointer>> allocate_svm(Args... args) { SVMAllocator alloc; return cl::allocate_pointer(alloc, args...); } template< class T, class SVMTrait, class... Args > cl::pointer>> allocate_svm(const cl::Context &c, Args... args) { SVMAllocator alloc(c); return cl::allocate_pointer(alloc, args...); } #endif // #if !defined(CL_HPP_NO_STD_UNIQUE_PTR) /*! \brief Vector alias to simplify contruction of coarse-grained SVM containers. * */ template < class T > using coarse_svm_vector = vector>>; /*! \brief Vector alias to simplify contruction of fine-grained SVM containers. * */ template < class T > using fine_svm_vector = vector>>; /*! \brief Vector alias to simplify contruction of fine-grained SVM containers that support platform atomics. * */ template < class T > using atomic_svm_vector = vector>>; #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Class interface for Buffer Memory Objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Buffer : public Memory { public: /*! \brief Constructs a Buffer in a specified context. * * Wraps clCreateBuffer(). * * \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was * specified. Note alignment & exclusivity requirements. */ Buffer( const Context& context, cl_mem_flags flags, size_type size, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a Buffer in the default context. * * Wraps clCreateBuffer(). * * \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was * specified. Note alignment & exclusivity requirements. * * \see Context::getDefault() */ Buffer( cl_mem_flags flags, size_type size, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(err); object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } /*! * \brief Construct a Buffer from a host container via iterators. * IteratorType must be random access. * If useHostPtr is specified iterators must represent contiguous data. 
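 *
 * An illustrative sketch of the iterator constructor (assumes a default
 * context and command queue are available):
 * \code{.cpp}
 * std::vector<float> host(256, 1.0f);
 * cl_int err = CL_SUCCESS;
 * cl::Buffer buf(host.begin(), host.end(), true, false, &err); // readOnly=true, useHostPtr=false
 * \endcode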
*/ template< typename IteratorType > Buffer( IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if( readOnly ) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if( useHostPtr ) { flags |= CL_MEM_USE_HOST_PTR; } size_type size = sizeof(DataType)*(endIterator - startIterator); Context context = Context::getDefault(err); if( useHostPtr ) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if( !useHostPtr ) { error = cl::copy(startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } /*! * \brief Construct a Buffer from a host container via iterators using a specified context. * IteratorType must be random access. * If useHostPtr is specified iterators must represent contiguous data. */ template< typename IteratorType > Buffer(const Context &context, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL); /*! * \brief Construct a Buffer from a host container via iterators using a specified queue. * If useHostPtr is specified iterators must be random access. */ template< typename IteratorType > Buffer(const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL); //! \brief Default constructor - initializes to NULL. Buffer() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with earlier versions. * * See Memory for further details. */ explicit Buffer(const cl_mem& buffer, bool retainObject = false) : Memory(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Buffer& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Buffer(const Buffer& buf) : Memory(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Buffer& operator = (const Buffer &buf) { Memory::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Buffer(Buffer&& buf) CL_HPP_NOEXCEPT_ : Memory(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Buffer& operator = (Buffer &&buf) { Memory::operator=(std::move(buf)); return *this; } #if CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Creates a new buffer object from this. * * Wraps clCreateSubBuffer(). */ Buffer createSubBuffer( cl_mem_flags flags, cl_buffer_create_type buffer_create_type, const void * buffer_create_info, cl_int * err = NULL) { Buffer result; cl_int error; result.object_ = ::clCreateSubBuffer( object_, flags, buffer_create_type, buffer_create_info, &error); detail::errHandler(error, __CREATE_SUBBUFFER_ERR); if (err != NULL) { *err = error; } return result; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 }; #if defined (CL_HPP_USE_DX_INTEROP) /*! 
\brief Class interface for creating OpenCL buffers from ID3D10Buffer's. * * This is provided to facilitate interoperability with Direct3D. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferD3D10 : public Buffer { public: /*! \brief Constructs a BufferD3D10, in a specified context, from a * given ID3D10Buffer. * * Wraps clCreateFromD3D10BufferKHR(). */ BufferD3D10( const Context& context, cl_mem_flags flags, ID3D10Buffer* bufobj, cl_int * err = NULL) : pfn_clCreateFromD3D10BufferKHR(nullptr) { typedef CL_API_ENTRY cl_mem (CL_API_CALL *PFN_clCreateFromD3D10BufferKHR)( cl_context context, cl_mem_flags flags, ID3D10Buffer* buffer, cl_int* errcode_ret); PFN_clCreateFromD3D10BufferKHR pfn_clCreateFromD3D10BufferKHR; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 vector props = context.getInfo(); cl_platform platform = -1; for( int i = 0; i < props.size(); ++i ) { if( props[i] == CL_CONTEXT_PLATFORM ) { platform = props[i+1]; } } CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, clCreateFromD3D10BufferKHR); #elif CL_HPP_TARGET_OPENCL_VERSION >= 110 CL_HPP_INIT_CL_EXT_FCN_PTR_(clCreateFromD3D10BufferKHR); #endif cl_int error; object_ = pfn_clCreateFromD3D10BufferKHR( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferD3D10() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit BufferD3D10(const cl_mem& buffer, bool retainObject = false) : Buffer(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferD3D10& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferD3D10(const BufferD3D10& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferD3D10& operator = (const BufferD3D10 &buf) { Buffer::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferD3D10(BufferD3D10&& buf) CL_HPP_NOEXCEPT_ : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferD3D10& operator = (BufferD3D10 &&buf) { Buffer::operator=(std::move(buf)); return *this; } }; #endif /*! \brief Class interface for GL Buffer Memory Objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferGL : public Buffer { public: /*! \brief Constructs a BufferGL in a specified context, from a given * GL buffer. * * Wraps clCreateFromGLBuffer(). */ BufferGL( const Context& context, cl_mem_flags flags, cl_GLuint bufobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLBuffer( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferGL() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. 
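 *
 * As an illustrative aside (shown with a plain Buffer because the same
 * wrapper semantics apply to every Memory subclass):
 * \code{.cpp}
 * cl::Buffer a(CL_MEM_READ_WRITE, 64); // 'a' holds one reference
 * cl_mem raw = a();                    // raw handle, no ownership transferred
 * cl::Buffer b(raw, true);             // retainObject=true: 'b' adds its own reference
 * \endcode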
* See Memory for further details. */ explicit BufferGL(const cl_mem& buffer, bool retainObject = false) : Buffer(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferGL& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL(const BufferGL& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (const BufferGL &buf) { Buffer::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferGL(BufferGL&& buf) CL_HPP_NOEXCEPT_ : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (BufferGL &&buf) { Buffer::operator=(std::move(buf)); return *this; } //! \brief Wrapper for clGetGLObjectInfo(). cl_int getObjectInfo( cl_gl_object_type *type, cl_GLuint * gl_object_name) { return detail::errHandler( ::clGetGLObjectInfo(object_,type,gl_object_name), __GET_GL_OBJECT_INFO_ERR); } }; /*! \brief Class interface for GL Render Buffer Memory Objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferRenderGL : public Buffer { public: /*! \brief Constructs a BufferRenderGL in a specified context, from a given * GL Renderbuffer. * * Wraps clCreateFromGLRenderbuffer(). */ BufferRenderGL( const Context& context, cl_mem_flags flags, cl_GLuint bufobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLRenderbuffer( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_RENDER_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferRenderGL() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit BufferRenderGL(const cl_mem& buffer, bool retainObject = false) : Buffer(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferRenderGL& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferRenderGL(const BufferRenderGL& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferRenderGL& operator = (const BufferRenderGL &buf) { Buffer::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferRenderGL(BufferRenderGL&& buf) CL_HPP_NOEXCEPT_ : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferRenderGL& operator = (BufferRenderGL &&buf) { Buffer::operator=(std::move(buf)); return *this; } //! \brief Wrapper for clGetGLObjectInfo(). cl_int getObjectInfo( cl_gl_object_type *type, cl_GLuint * gl_object_name) { return detail::errHandler( ::clGetGLObjectInfo(object_,type,gl_object_name), __GET_GL_OBJECT_INFO_ERR); } }; /*! \brief C++ base class for Image Memory objects. 
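 *
 * A hedged sketch of querying a concrete subclass through the base-class
 * accessors (assumes the default context's device supports images):
 * \code{.cpp}
 * cl::Image2D img(cl::Context::getDefault(), CL_MEM_READ_ONLY,
 *                 cl::ImageFormat(CL_R, CL_FLOAT), 64, 64);
 * size_t w = img.getImageInfo<CL_IMAGE_WIDTH>();  // 64
 * size_t h = img.getImageInfo<CL_IMAGE_HEIGHT>(); // 64
 * \endcode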
* * See Memory for details about copy semantics, etc. * * \see Memory */ class Image : public Memory { protected: //! \brief Default constructor - initializes to NULL. Image() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image(const cl_mem& image, bool retainObject = false) : Memory(image, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image(const Image& img) : Memory(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image& operator = (const Image &img) { Memory::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image(Image&& img) CL_HPP_NOEXCEPT_ : Memory(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image& operator = (Image &&img) { Memory::operator=(std::move(img)); return *this; } public: //! \brief Wrapper for clGetImageInfo(). template cl_int getImageInfo(cl_image_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetImageInfo, object_, name, param), __GET_IMAGE_INFO_ERR); } //! \brief Wrapper for clGetImageInfo() that returns by value. template typename detail::param_traits::param_type getImageInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_image_info, name>::param_type param; cl_int result = getImageInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } }; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \brief Class interface for 1D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image1D : public Image { public: /*! \brief Constructs a 1D Image in a specified context. * * Wraps clCreateImage(). */ Image1D( const Context& context, cl_mem_flags flags, ImageFormat format, size_type width, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D, width, 0, 0, 0, 0, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image1D() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image1D(const cl_mem& image1D, bool retainObject = false) : Image(image1D, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image1D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1D(const Image1D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1D& operator = (const Image1D &img) { Image::operator=(img); return *this; } /*! 
\brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1D(Image1D&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1D& operator = (Image1D &&img) { Image::operator=(std::move(img)); return *this; } }; /*! \class Image1DBuffer * \brief Image interface for 1D buffer images. */ class Image1DBuffer : public Image { public: Image1DBuffer( const Context& context, cl_mem_flags flags, ImageFormat format, size_type width, const Buffer &buffer, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_BUFFER, width, 0, 0, 0, 0, 0, 0, 0, buffer() }; object_ = ::clCreateImage( context(), flags, &format, &desc, NULL, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DBuffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image1DBuffer(const cl_mem& image1D, bool retainObject = false) : Image(image1D, retainObject) { } Image1DBuffer& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer(const Image1DBuffer& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer& operator = (const Image1DBuffer &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1DBuffer(Image1DBuffer&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1DBuffer& operator = (Image1DBuffer &&img) { Image::operator=(std::move(img)); return *this; } }; /*! \class Image1DArray * \brief Image interface for arrays of 1D images. */ class Image1DArray : public Image { public: Image1DArray( const Context& context, cl_mem_flags flags, ImageFormat format, size_type arraySize, size_type width, size_type rowPitch, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_ARRAY, width, 0, 0, // height, depth (unused) arraySize, rowPitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DArray() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image1DArray(const cl_mem& imageArray, bool retainObject = false) : Image(imageArray, retainObject) { } Image1DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray(const Image1DArray& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (const Image1DArray &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. 
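 *
 * Illustrative only: moving transfers the underlying cl_mem with no extra
 * retain/release pair (sizes below are arbitrary):
 * \code{.cpp}
 * cl::Image1DArray a(cl::Context::getDefault(), CL_MEM_READ_ONLY,
 *                    cl::ImageFormat(CL_R, CL_FLOAT),
 *                    4,   // arraySize
 *                    64,  // width
 *                    0);  // rowPitch (0 = tightly packed)
 * cl::Image1DArray b(std::move(a)); // 'b' now owns the handle; 'a' is NULL
 * \endcode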
*/ Image1DArray(Image1DArray&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (Image1DArray &&img) { Image::operator=(std::move(img)); return *this; } }; #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \brief Class interface for 2D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image2D : public Image { public: /*! \brief Constructs a 2D Image in a specified context. * * Wraps clCreateImage(). */ Image2D( const Context& context, cl_mem_flags flags, ImageFormat format, size_type width, size_type height, size_type row_pitch = 0, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; bool useCreateImage; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 // Run-time decision based on the actual platform { cl_uint version = detail::getContextPlatformVersion(context()); useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above } #elif CL_HPP_TARGET_OPENCL_VERSION >= 120 useCreateImage = true; #else useCreateImage = false; #endif #if CL_HPP_TARGET_OPENCL_VERSION >= 120 if (useCreateImage) { cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D, width, height, 0, 0, // depth, array size (unused) row_pitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_MINIMUM_OPENCL_VERSION < 120 if (!useCreateImage) { object_ = ::clCreateImage2D( context(), flags,&format, width, height, row_pitch, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE2D_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 120 } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 || defined(CL_HPP_USE_CL_IMAGE2D_FROM_BUFFER_KHR) /*! \brief Constructs a 2D Image from a buffer. * \note This will share storage with the underlying buffer. * * Wraps clCreateImage(). */ Image2D( const Context& context, ImageFormat format, const Buffer &sourceBuffer, size_type width, size_type height, size_type row_pitch = 0, cl_int* err = nullptr) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D, width, height, 0, 0, // depth, array size (unused) row_pitch, 0, 0, 0, // Use buffer as input to image sourceBuffer() }; object_ = ::clCreateImage( context(), 0, // flags inherited from buffer &format, &desc, nullptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != nullptr) { *err = error; } } #endif //#if CL_HPP_TARGET_OPENCL_VERSION >= 200 || defined(CL_HPP_USE_CL_IMAGE2D_FROM_BUFFER_KHR) #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Constructs a 2D Image from an image. * \note This will share storage with the underlying image but may * reinterpret the channel order and type. * * The image will be created matching with a descriptor matching the source. * * \param order is the channel order to reinterpret the image data as. * The channel order may differ as described in the OpenCL * 2.0 API specification. * * Wraps clCreateImage(). 
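 *
 * A hedged sketch of the channel-order reinterpretation (requires an
 * OpenCL 2.0 platform; names are illustrative):
 * \code{.cpp}
 * cl::Context ctx = cl::Context::getDefault();
 * cl::Image2D rgba(ctx, CL_MEM_READ_WRITE,
 *                  cl::ImageFormat(CL_RGBA, CL_UNORM_INT8), 64, 64);
 * cl_int err = CL_SUCCESS;
 * cl::Image2D bgra(ctx, CL_BGRA, rgba, &err); // same storage, channels reinterpreted
 * \endcode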
*/ Image2D( const Context& context, cl_channel_order order, const Image &sourceImage, cl_int* err = nullptr) { cl_int error; // Descriptor fields have to match source image size_type sourceWidth = sourceImage.getImageInfo(); size_type sourceHeight = sourceImage.getImageInfo(); size_type sourceRowPitch = sourceImage.getImageInfo(); cl_uint sourceNumMIPLevels = sourceImage.getImageInfo(); cl_uint sourceNumSamples = sourceImage.getImageInfo(); cl_image_format sourceFormat = sourceImage.getImageInfo(); // Update only the channel order. // Channel format inherited from source. sourceFormat.image_channel_order = order; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D, sourceWidth, sourceHeight, 0, 0, // depth (unused), array size (unused) sourceRowPitch, 0, // slice pitch (unused) sourceNumMIPLevels, sourceNumSamples, // Use buffer as input to image sourceImage() }; object_ = ::clCreateImage( context(), 0, // flags should be inherited from mem_object &sourceFormat, &desc, nullptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != nullptr) { *err = error; } } #endif //#if CL_HPP_TARGET_OPENCL_VERSION >= 200 //! \brief Default constructor - initializes to NULL. Image2D() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image2D(const cl_mem& image2D, bool retainObject = false) : Image(image2D, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image2D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2D(const Image2D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2D& operator = (const Image2D &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2D(Image2D&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2D& operator = (Image2D &&img) { Image::operator=(std::move(img)); return *this; } }; #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /*! \brief Class interface for GL 2D Image Memory objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory * \note Deprecated for OpenCL 1.2. Please use ImageGL instead. */ class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED Image2DGL : public Image2D { public: /*! \brief Constructs an Image2DGL in a specified context, from a given * GL Texture. * * Wraps clCreateFromGLTexture2D(). */ Image2DGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture2D( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_2D_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image2DGL() : Image2D() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. 
 */
    explicit Image2DGL(const cl_mem& image, bool retainObject = false) :
        Image2D(image, retainObject) { }

    /*! \brief Assignment from cl_mem - performs shallow copy.
     *
     * See Memory for further details.
     */
    Image2DGL& operator = (const cl_mem& rhs)
    {
        Image2D::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Image2DGL(const Image2DGL& img) : Image2D(img) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Image2DGL& operator = (const Image2DGL &img)
    {
        Image2D::operator=(img);
        return *this;
    }

    /*! \brief Move constructor to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Image2DGL(Image2DGL&& img) CL_HPP_NOEXCEPT_ : Image2D(std::move(img)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Image2DGL& operator = (Image2DGL &&img)
    {
        Image2D::operator=(std::move(img));
        return *this;
    }

} CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
#endif // CL_USE_DEPRECATED_OPENCL_1_1_APIS

#if CL_HPP_TARGET_OPENCL_VERSION >= 120
/*! \class Image2DArray
 * \brief Image interface for arrays of 2D images.
 */
class Image2DArray : public Image
{
public:
    Image2DArray(
        const Context& context,
        cl_mem_flags flags,
        ImageFormat format,
        size_type arraySize,
        size_type width,
        size_type height,
        size_type rowPitch,
        size_type slicePitch,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        cl_image_desc desc =
        {
            CL_MEM_OBJECT_IMAGE2D_ARRAY,
            width, height,
            0,       // depth (unused)
            arraySize,
            rowPitch, slicePitch,
            0, 0, 0
        };
        object_ = ::clCreateImage(
            context(),
            flags,
            &format,
            &desc,
            host_ptr,
            &error);

        detail::errHandler(error, __CREATE_IMAGE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    Image2DArray() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     * \param retainObject will cause the constructor to retain its cl object.
     *                     Defaults to false to maintain compatibility with
     *                     earlier versions.
     * See Memory for further details.
     */
    explicit Image2DArray(const cl_mem& imageArray, bool retainObject = false) : Image(imageArray, retainObject) { }

    Image2DArray& operator = (const cl_mem& rhs)
    {
        Image::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Image2DArray(const Image2DArray& img) : Image(img) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Image2DArray& operator = (const Image2DArray &img)
    {
        Image::operator=(img);
        return *this;
    }

    /*! \brief Move constructor to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Image2DArray(Image2DArray&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Image2DArray& operator = (Image2DArray &&img)
    {
        Image::operator=(std::move(img));
        return *this;
    }
};
#endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 120

/*! \brief Class interface for 3D Image Memory objects.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Image3D : public Image
{
public:
    /*! \brief Constructs a 3D Image in a specified context.
     *
     *  Wraps clCreateImage().
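     *
     *  For illustration only (not part of the original documentation):
     *  \code{.cpp}
     *  cl::Image3D vol(cl::Context::getDefault(), CL_MEM_READ_ONLY,
     *                  cl::ImageFormat(CL_R, CL_FLOAT),
     *                  32, 32, 32); // width, height, depth; pitches default to 0
     *  \endcode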
*/ Image3D( const Context& context, cl_mem_flags flags, ImageFormat format, size_type width, size_type height, size_type depth, size_type row_pitch = 0, size_type slice_pitch = 0, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; bool useCreateImage; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 // Run-time decision based on the actual platform { cl_uint version = detail::getContextPlatformVersion(context()); useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above } #elif CL_HPP_TARGET_OPENCL_VERSION >= 120 useCreateImage = true; #else useCreateImage = false; #endif #if CL_HPP_TARGET_OPENCL_VERSION >= 120 if (useCreateImage) { cl_image_desc desc = { CL_MEM_OBJECT_IMAGE3D, width, height, depth, 0, // array size (unused) row_pitch, slice_pitch, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_MINIMUM_OPENCL_VERSION < 120 if (!useCreateImage) { object_ = ::clCreateImage3D( context(), flags, &format, width, height, depth, row_pitch, slice_pitch, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE3D_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 120 } //! \brief Default constructor - initializes to NULL. Image3D() : Image() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image3D(const cl_mem& image3D, bool retainObject = false) : Image(image3D, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image3D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image3D(const Image3D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image3D& operator = (const Image3D &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image3D(Image3D&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image3D& operator = (Image3D &&img) { Image::operator=(std::move(img)); return *this; } }; #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /*! \brief Class interface for GL 3D Image Memory objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image3DGL : public Image3D { public: /*! \brief Constructs an Image3DGL in a specified context, from a given * GL Texture. * * Wraps clCreateFromGLTexture3D(). */ Image3DGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture3D( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_3D_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image3DGL() : Image3D() { } /*! \brief Constructor from cl_mem - takes ownership. 
* * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image3DGL(const cl_mem& image, bool retainObject = false) : Image3D(image, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image3DGL& operator = (const cl_mem& rhs) { Image3D::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image3DGL(const Image3DGL& img) : Image3D(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image3DGL& operator = (const Image3DGL &img) { Image3D::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image3DGL(Image3DGL&& img) CL_HPP_NOEXCEPT_ : Image3D(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image3DGL& operator = (Image3DGL &&img) { Image3D::operator=(std::move(img)); return *this; } }; #endif // CL_USE_DEPRECATED_OPENCL_1_1_APIS #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \class ImageGL * \brief general image interface for GL interop. * We abstract the 2D and 3D GL images into a single instance here * that wraps all GL sourced images on the grounds that setup information * was performed by OpenCL anyway. */ class ImageGL : public Image { public: ImageGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_ERR); if (err != NULL) { *err = error; } } ImageGL() : Image() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit ImageGL(const cl_mem& image, bool retainObject = false) : Image(image, retainObject) { } ImageGL& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ ImageGL(const ImageGL& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ ImageGL& operator = (const ImageGL &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ ImageGL(ImageGL&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ ImageGL& operator = (ImageGL &&img) { Image::operator=(std::move(img)); return *this; } }; #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Class interface for Pipe Memory Objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Pipe : public Memory { public: /*! \brief Constructs a Pipe in a specified context. * * Wraps clCreatePipe(). * @param context Context in which to create the pipe. * @param flags Bitfield. Only CL_MEM_READ_WRITE and CL_MEM_HOST_NO_ACCESS are valid. * @param packet_size Size in bytes of a single packet of the pipe. 
* @param max_packets Number of packets that may be stored in the pipe. * */ Pipe( const Context& context, cl_uint packet_size, cl_uint max_packets, cl_int* err = NULL) { cl_int error; cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS; object_ = ::clCreatePipe(context(), flags, packet_size, max_packets, nullptr, &error); detail::errHandler(error, __CREATE_PIPE_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a Pipe in a the default context. * * Wraps clCreatePipe(). * @param flags Bitfield. Only CL_MEM_READ_WRITE and CL_MEM_HOST_NO_ACCESS are valid. * @param packet_size Size in bytes of a single packet of the pipe. * @param max_packets Number of packets that may be stored in the pipe. * */ Pipe( cl_uint packet_size, cl_uint max_packets, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(err); cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS; object_ = ::clCreatePipe(context(), flags, packet_size, max_packets, nullptr, &error); detail::errHandler(error, __CREATE_PIPE_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Pipe() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with earlier versions. * * See Memory for further details. */ explicit Pipe(const cl_mem& pipe, bool retainObject = false) : Memory(pipe, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Pipe& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Pipe(const Pipe& pipe) : Memory(pipe) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Pipe& operator = (const Pipe &pipe) { Memory::operator=(pipe); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Pipe(Pipe&& pipe) CL_HPP_NOEXCEPT_ : Memory(std::move(pipe)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Pipe& operator = (Pipe &&pipe) { Memory::operator=(std::move(pipe)); return *this; } //! \brief Wrapper for clGetMemObjectInfo(). template cl_int getInfo(cl_pipe_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetPipeInfo, object_, name, param), __GET_PIPE_INFO_ERR); } //! \brief Wrapper for clGetMemObjectInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_pipe_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } }; // class Pipe #endif // CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Class interface for cl_sampler. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_sampler as the original. For details, see * clRetainSampler() and clReleaseSampler(). * * \see cl_sampler */ class Sampler : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Sampler() { } /*! \brief Constructs a Sampler in a specified context. * * Wraps clCreateSampler(). 
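 *
 * Illustrative sketch (assumes a default context):
 * \code{.cpp}
 * cl_int err = CL_SUCCESS;
 * cl::Sampler smp(cl::Context::getDefault(),
 *                 CL_FALSE,                 // unnormalized coordinates
 *                 CL_ADDRESS_CLAMP_TO_EDGE,
 *                 CL_FILTER_NEAREST,
 *                 &err);
 * \endcode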
*/ Sampler( const Context& context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int* err = NULL) { cl_int error; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_sampler_properties sampler_properties[] = { CL_SAMPLER_NORMALIZED_COORDS, normalized_coords, CL_SAMPLER_ADDRESSING_MODE, addressing_mode, CL_SAMPLER_FILTER_MODE, filter_mode, 0 }; object_ = ::clCreateSamplerWithProperties( context(), sampler_properties, &error); detail::errHandler(error, __CREATE_SAMPLER_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateSampler( context(), normalized_coords, addressing_mode, filter_mode, &error); detail::errHandler(error, __CREATE_SAMPLER_ERR); if (err != NULL) { *err = error; } #endif } /*! \brief Constructor from cl_sampler - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * This effectively transfers ownership of a refcount on the cl_sampler * into the new Sampler object. */ explicit Sampler(const cl_sampler& sampler, bool retainObject = false) : detail::Wrapper(sampler, retainObject) { } /*! \brief Assignment operator from cl_sampler - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseSampler() on the value previously held by this instance. */ Sampler& operator = (const cl_sampler& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Sampler(const Sampler& sam) : detail::Wrapper(sam) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Sampler& operator = (const Sampler &sam) { detail::Wrapper::operator=(sam); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Sampler(Sampler&& sam) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(sam)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Sampler& operator = (Sampler &&sam) { detail::Wrapper::operator=(std::move(sam)); return *this; } //! \brief Wrapper for clGetSamplerInfo(). template cl_int getInfo(cl_sampler_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetSamplerInfo, object_, name, param), __GET_SAMPLER_INFO_ERR); } //! \brief Wrapper for clGetSamplerInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_sampler_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } }; class Program; class CommandQueue; class DeviceCommandQueue; class Kernel; //! \brief Class interface for specifying NDRange values. class NDRange { private: size_type sizes_[3]; cl_uint dimensions_; public: //! \brief Default constructor - resulting range has zero dimensions. NDRange() : dimensions_(0) { sizes_[0] = 0; sizes_[1] = 0; sizes_[2] = 0; } //! \brief Constructs one-dimensional range. NDRange(size_type size0) : dimensions_(1) { sizes_[0] = size0; sizes_[1] = 1; sizes_[2] = 1; } //! \brief Constructs two-dimensional range. NDRange(size_type size0, size_type size1) : dimensions_(2) { sizes_[0] = size0; sizes_[1] = size1; sizes_[2] = 1; } //! \brief Constructs three-dimensional range. 
NDRange(size_type size0, size_type size1, size_type size2) : dimensions_(3) { sizes_[0] = size0; sizes_[1] = size1; sizes_[2] = size2; } /*! \brief Conversion operator to const size_type *. * * \returns a pointer to the size of the first dimension. */ operator const size_type*() const { return sizes_; } //! \brief Queries the number of dimensions in the range. size_type dimensions() const { return dimensions_; } //! \brief Returns the size of the object in bytes based on the // runtime number of dimensions size_type size() const { return dimensions_*sizeof(size_type); } size_type* get() { return sizes_; } const size_type* get() const { return sizes_; } }; //! \brief A zero-dimensional range. static const NDRange NullRange; //! \brief Local address wrapper for use with Kernel::setArg struct LocalSpaceArg { size_type size_; }; namespace detail { template struct KernelArgumentHandler; // Enable for objects that are not subclasses of memory // Pointers, constants etc template struct KernelArgumentHandler::value>::type> { static size_type size(const T&) { return sizeof(T); } static const T* ptr(const T& value) { return &value; } }; // Enable for subclasses of memory where we want to get a reference to the cl_mem out // and pass that in for safety template struct KernelArgumentHandler::value>::type> { static size_type size(const T&) { return sizeof(cl_mem); } static const cl_mem* ptr(const T& value) { return &(value()); } }; // Specialization for DeviceCommandQueue defined later template <> struct KernelArgumentHandler { static size_type size(const LocalSpaceArg& value) { return value.size_; } static const void* ptr(const LocalSpaceArg&) { return NULL; } }; } //! \endcond /*! Local * \brief Helper function for generating LocalSpaceArg objects. */ inline LocalSpaceArg Local(size_type size) { LocalSpaceArg ret = { size }; return ret; } /*! \brief Class interface for cl_kernel. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_kernel as the original. For details, see * clRetainKernel() and clReleaseKernel(). * * \see cl_kernel */ class Kernel : public detail::Wrapper { public: inline Kernel(const Program& program, const char* name, cl_int* err = NULL); //! \brief Default constructor - initializes to NULL. Kernel() { } /*! \brief Constructor from cl_kernel - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * This effectively transfers ownership of a refcount on the cl_kernel * into the new Kernel object. */ explicit Kernel(const cl_kernel& kernel, bool retainObject = false) : detail::Wrapper(kernel, retainObject) { } /*! \brief Assignment operator from cl_kernel - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseKernel() on the value previously held by this instance. */ Kernel& operator = (const cl_kernel& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Kernel(const Kernel& kernel) : detail::Wrapper(kernel) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Kernel& operator = (const Kernel &kernel) { detail::Wrapper::operator=(kernel); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. 
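 *
 * Putting NDRange, Local and setArg together, a hedged usage sketch
 * ('program', 'buf' and 'queue' are assumed to exist; "vadd" is a
 * hypothetical kernel name):
 * \code{.cpp}
 * cl::Kernel k(program, "vadd");
 * k.setArg(0, buf);                          // cl::Buffer argument
 * k.setArg(1, cl::Local(256 * sizeof(int))); // local-memory allocation size
 * queue.enqueueNDRangeKernel(k, cl::NullRange,
 *                            cl::NDRange(1024), cl::NDRange(64));
 * \endcode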
/*! \brief Class interface for cl_kernel.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_kernel as the original.  For details, see
 *        clRetainKernel() and clReleaseKernel().
 *
 *  \see cl_kernel
 */
class Kernel : public detail::Wrapper<cl_kernel>
{
public:
    inline Kernel(const Program& program, const char* name, cl_int* err = NULL);

    //! \brief Default constructor - initializes to NULL.
    Kernel() { }

    /*! \brief Constructor from cl_kernel - takes ownership.
     *
     * \param retainObject will cause the constructor to retain its cl object.
     *                     Defaults to false to maintain compatibility with
     *                     earlier versions.
     * This effectively transfers ownership of a refcount on the cl_kernel
     * into the new Kernel object.
     */
    explicit Kernel(const cl_kernel& kernel, bool retainObject = false) :
        detail::Wrapper<cl_type>(kernel, retainObject) { }

    /*! \brief Assignment operator from cl_kernel - takes ownership.
     *
     * This effectively transfers ownership of a refcount on the rhs and calls
     * clReleaseKernel() on the value previously held by this instance.
     */
    Kernel& operator = (const cl_kernel& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Kernel(const Kernel& kernel) : detail::Wrapper<cl_type>(kernel) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Kernel& operator = (const Kernel &kernel)
    {
        detail::Wrapper<cl_type>::operator=(kernel);
        return *this;
    }

    /*! \brief Move constructor to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Kernel(Kernel&& kernel) CL_HPP_NOEXCEPT_ : detail::Wrapper<cl_type>(std::move(kernel)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Kernel& operator = (Kernel &&kernel)
    {
        detail::Wrapper<cl_type>::operator=(std::move(kernel));
        return *this;
    }

    template <typename T>
    cl_int getInfo(cl_kernel_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetKernelInfo, object_, name, param),
            __GET_KERNEL_INFO_ERR);
    }

    template <cl_kernel_info name> typename
    detail::param_traits<detail::cl_kernel_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

#if CL_HPP_TARGET_OPENCL_VERSION >= 120
    template <typename T>
    cl_int getArgInfo(cl_uint argIndex, cl_kernel_arg_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetKernelArgInfo, object_, argIndex, name, param),
            __GET_KERNEL_ARG_INFO_ERR);
    }

    template <cl_kernel_arg_info name> typename
    detail::param_traits<detail::cl_kernel_arg_info, name>::param_type
    getArgInfo(cl_uint argIndex, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_arg_info, name>::param_type param;
        cl_int result = getArgInfo(argIndex, name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120

    template <typename T>
    cl_int getWorkGroupInfo(
        const Device& device, cl_kernel_work_group_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(
                &::clGetKernelWorkGroupInfo, object_, device(), name, param),
                __GET_KERNEL_WORK_GROUP_INFO_ERR);
    }

    template <cl_kernel_work_group_info name> typename
    detail::param_traits<detail::cl_kernel_work_group_info, name>::param_type
    getWorkGroupInfo(const Device& device, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_work_group_info, name>::param_type param;
        cl_int result = getWorkGroupInfo(device, name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

#if CL_HPP_TARGET_OPENCL_VERSION >= 200
#if defined(CL_HPP_USE_CL_SUB_GROUPS_KHR)
    cl_int getSubGroupInfo(const cl::Device &dev, cl_kernel_sub_group_info name, const cl::NDRange &range, size_type* param) const
    {
        typedef clGetKernelSubGroupInfoKHR_fn PFN_clGetKernelSubGroupInfoKHR;
        static PFN_clGetKernelSubGroupInfoKHR pfn_clGetKernelSubGroupInfoKHR = NULL;
        CL_HPP_INIT_CL_EXT_FCN_PTR_(clGetKernelSubGroupInfoKHR);

        return detail::errHandler(
            pfn_clGetKernelSubGroupInfoKHR(object_, dev(), name, range.size(), range.get(), sizeof(size_type), param, nullptr),
            __GET_KERNEL_ARG_INFO_ERR);
    }

    template <cl_kernel_sub_group_info name>
    size_type getSubGroupInfo(const cl::Device &dev, const cl::NDRange &range, cl_int* err = NULL) const
    {
        size_type param;
        cl_int result = getSubGroupInfo(dev, name, range, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }
#endif // #if defined(CL_HPP_USE_CL_SUB_GROUPS_KHR)
#endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200

#if CL_HPP_TARGET_OPENCL_VERSION >= 200
    /*! \brief setArg overload taking a shared_ptr type
     */
    template<typename T, class D>
    cl_int setArg(cl_uint index, const cl::pointer<T, D> &argPtr)
    {
        return detail::errHandler(
            ::clSetKernelArgSVMPointer(object_, index, argPtr.get()),
            __SET_KERNEL_ARGS_ERR);
    }

    /*! \brief setArg overload taking a vector type.
     */
    template<typename T, class Alloc>
    cl_int setArg(cl_uint index, const cl::vector<T, Alloc> &argPtr)
    {
        return detail::errHandler(
            ::clSetKernelArgSVMPointer(object_, index, argPtr.data()),
            __SET_KERNEL_ARGS_ERR);
    }
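    /*! A sketch of passing shared virtual memory through the SVM-aware
     *  setArg overloads above. It assumes an OpenCL 2.0 context whose device
     *  supports coarse-grained buffer SVM; the kernel object is an
     *  assumption.
     *
     *  \code
     *  cl::SVMAllocator<int, cl::SVMTraitCoarse<>> svmAlloc;
     *  cl::vector<int, cl::SVMAllocator<int, cl::SVMTraitCoarse<>>> input(1024, 0, svmAlloc);
     *  kernel.setArg(0, input);  // dispatches to clSetKernelArgSVMPointer
     *  \endcode
     */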
    /*! \brief setArg overload taking a pointer type
     */
    template<typename T>
    typename std::enable_if<std::is_pointer<T>::value, cl_int>::type
    setArg(cl_uint index, const T argPtr)
    {
        return detail::errHandler(
            ::clSetKernelArgSVMPointer(object_, index, argPtr),
            __SET_KERNEL_ARGS_ERR);
    }
#endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200

    /*! \brief setArg overload taking a POD type
     */
    template <typename T>
    typename std::enable_if<!std::is_pointer<T>::value, cl_int>::type
    setArg(cl_uint index, const T &value)
    {
        return detail::errHandler(
            ::clSetKernelArg(
                object_,
                index,
                detail::KernelArgumentHandler<T>::size(value),
                detail::KernelArgumentHandler<T>::ptr(value)),
            __SET_KERNEL_ARGS_ERR);
    }

    cl_int setArg(cl_uint index, size_type size, const void* argPtr)
    {
        return detail::errHandler(
            ::clSetKernelArg(object_, index, size, argPtr),
            __SET_KERNEL_ARGS_ERR);
    }

#if CL_HPP_TARGET_OPENCL_VERSION >= 200
    /*!
     * Specify a vector of SVM pointers that the kernel may access in
     * addition to its arguments.
     */
    cl_int setSVMPointers(const vector<void*> &pointerList)
    {
        return detail::errHandler(
            ::clSetKernelExecInfo(
                object_,
                CL_KERNEL_EXEC_INFO_SVM_PTRS,
                sizeof(void*)*pointerList.size(),
                pointerList.data()));
    }

    /*!
     * Specify a std::array of SVM pointers that the kernel may access in
     * addition to its arguments.
     */
    template<int ArrayLength>
    cl_int setSVMPointers(const std::array<void*, ArrayLength> &pointerList)
    {
        return detail::errHandler(
            ::clSetKernelExecInfo(
                object_,
                CL_KERNEL_EXEC_INFO_SVM_PTRS,
                sizeof(void*)*pointerList.size(),
                pointerList.data()));
    }

    /*! \brief Enable fine-grained system SVM.
     *
     * \note It is only possible to enable fine-grained system SVM if all devices
     *       in the context associated with kernel support it.
     *
     * \param svmEnabled True if fine-grained system SVM is requested. False otherwise.
     * \return CL_SUCCESS if the function was executed successfully. CL_INVALID_OPERATION
     *         if no devices in the context support fine-grained system SVM.
     *
     * \see clSetKernelExecInfo
     */
    cl_int enableFineGrainedSystemSVM(bool svmEnabled)
    {
        cl_bool svmEnabled_ = svmEnabled ? CL_TRUE : CL_FALSE;
        return detail::errHandler(
            ::clSetKernelExecInfo(
                object_,
                CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM,
                sizeof(cl_bool),
                &svmEnabled_
                )
            );
    }

    template<int index, int ArrayLength, class D, typename T0, typename... Ts>
    void setSVMPointersHelper(std::array<void*, ArrayLength> &pointerList, const pointer<T0, D> &t0, Ts... ts)
    {
        pointerList[index] = static_cast<void*>(t0.get());
        setSVMPointersHelper<index + 1, ArrayLength>(pointerList, ts...);
    }

    template<int index, int ArrayLength, typename T0, typename... Ts>
    typename std::enable_if<std::is_pointer<T0>::value, void>::type
    setSVMPointersHelper(std::array<void*, ArrayLength> &pointerList, T0 t0, Ts... ts)
    {
        pointerList[index] = static_cast<void*>(t0);
        setSVMPointersHelper<index + 1, ArrayLength>(pointerList, ts...);
    }

    template<int index, int ArrayLength, typename T0, class D>
    void setSVMPointersHelper(std::array<void*, ArrayLength> &pointerList, const pointer<T0, D> &t0)
    {
        pointerList[index] = static_cast<void*>(t0.get());
    }

    template<int index, int ArrayLength, typename T0>
    typename std::enable_if<std::is_pointer<T0>::value, void>::type
    setSVMPointersHelper(std::array<void*, ArrayLength> &pointerList, T0 t0)
    {
        pointerList[index] = static_cast<void*>(t0);
    }

    template<typename T0, typename... Ts>
    cl_int setSVMPointers(const T0 &t0, Ts... ts)
    {
        std::array<void*, 1 + sizeof...(Ts)> pointerList;

        setSVMPointersHelper<0, 1 + sizeof...(Ts)>(pointerList, t0, ts...);
        return detail::errHandler(
            ::clSetKernelExecInfo(
                object_,
                CL_KERNEL_EXEC_INFO_SVM_PTRS,
                sizeof(void*)*(1 + sizeof...(Ts)),
                pointerList.data()));
    }
#endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200
};
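/*! A brief, illustrative sketch of typical Kernel usage with the members
 *  defined above; the program, buffers, and device objects are assumptions
 *  and the kernel name is hypothetical.
 *
 *  \code
 *  cl_int err = CL_SUCCESS;
 *  cl::Kernel vecAdd(program, "vector_add", &err);
 *  vecAdd.setArg(0, aBuf);  // cl::Buffer is passed as its underlying cl_mem
 *  vecAdd.setArg(1, bBuf);
 *  vecAdd.setArg(2, cl::Local(256 * sizeof(float)));  // local scratch space
 *  size_t wgSize = vecAdd.getWorkGroupInfo<CL_KERNEL_WORK_GROUP_SIZE>(device);
 *  \endcode
 */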
/*! \class Program
 * \brief Program interface that implements cl_program.
 */
class Program : public detail::Wrapper<cl_program>
{
public:
#if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY)
    typedef vector<vector<unsigned char>> Binaries;
    typedef vector<string> Sources;
#else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY)
    typedef vector<std::pair<const void*, size_type> > Binaries;
    typedef vector<std::pair<const char*, size_type> > Sources;
#endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY)

    Program(
        const string& source,
        bool build = false,
        cl_int* err = NULL)
    {
        cl_int error;

        const char * strings = source.c_str();
        const size_type length = source.size();

        Context context = Context::getDefault(err);

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)1, &strings, &length, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);

        if (error == CL_SUCCESS && build) {

            error = ::clBuildProgram(
                object_,
                0,
                NULL,
#if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD)
                "-cl-std=CL2.0",
#else
                "",
#endif // #if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD)
                NULL,
                NULL);

            detail::buildErrHandler(error, __BUILD_PROGRAM_ERR, getBuildInfo<CL_PROGRAM_BUILD_LOG>());
        }

        if (err != NULL) {
            *err = error;
        }
    }

    Program(
        const Context& context,
        const string& source,
        bool build = false,
        cl_int* err = NULL)
    {
        cl_int error;

        const char * strings = source.c_str();
        const size_type length = source.size();

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)1, &strings, &length, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);

        if (error == CL_SUCCESS && build) {
            error = ::clBuildProgram(
                object_,
                0,
                NULL,
#if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD)
                "-cl-std=CL2.0",
#else
                "",
#endif // #if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD)
                NULL,
                NULL);

            detail::buildErrHandler(error, __BUILD_PROGRAM_ERR, getBuildInfo<CL_PROGRAM_BUILD_LOG>());
        }

        if (err != NULL) {
            *err = error;
        }
    }

    /**
     * Create a program from a vector of source strings and the default context.
     * Does not compile or link the program.
     */
    Program(
        const Sources& sources,
        cl_int* err = NULL)
    {
        cl_int error;
        Context context = Context::getDefault(err);

        const size_type n = (size_type)sources.size();

        vector<size_type> lengths(n);
        vector<const char*> strings(n);

        for (size_type i = 0; i < n; ++i) {
#if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY)
            strings[i] = sources[(int)i].data();
            lengths[i] = sources[(int)i].length();
#else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY)
            strings[i] = sources[(int)i].first;
            lengths[i] = sources[(int)i].second;
#endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY)
        }

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)n, strings.data(), lengths.data(), &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    /**
     * Create a program from a vector of source strings and a provided context.
     * Does not compile or link the program.
*/ Program( const Context& context, const Sources& sources, cl_int* err = NULL) { cl_int error; const size_type n = (size_type)sources.size(); vector lengths(n); vector strings(n); for (size_type i = 0; i < n; ++i) { #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) strings[i] = sources[(int)i].data(); lengths[i] = sources[(int)i].length(); #else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) strings[i] = sources[(int)i].first; lengths[i] = sources[(int)i].second; #endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) } object_ = ::clCreateProgramWithSource( context(), (cl_uint)n, strings.data(), lengths.data(), &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR); if (err != NULL) { *err = error; } } /** * Construct a program object from a list of devices and a per-device list of binaries. * \param context A valid OpenCL context in which to construct the program. * \param devices A vector of OpenCL device objects for which the program will be created. * \param binaries A vector of pairs of a pointer to a binary object and its length. * \param binaryStatus An optional vector that on completion will be resized to * match the size of binaries and filled with values to specify if each binary * was successfully loaded. * Set to CL_SUCCESS if the binary was successfully loaded. * Set to CL_INVALID_VALUE if the length is 0 or the binary pointer is NULL. * Set to CL_INVALID_BINARY if the binary provided is not valid for the matching device. * \param err if non-NULL will be set to CL_SUCCESS on successful operation or one of the following errors: * CL_INVALID_CONTEXT if context is not a valid context. * CL_INVALID_VALUE if the length of devices is zero; or if the length of binaries does not match the length of devices; * or if any entry in binaries is NULL or has length 0. * CL_INVALID_DEVICE if OpenCL devices listed in devices are not in the list of devices associated with context. * CL_INVALID_BINARY if an invalid program binary was encountered for any device. binaryStatus will return specific status for each device. * CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host. 
*/ Program( const Context& context, const vector& devices, const Binaries& binaries, vector* binaryStatus = NULL, cl_int* err = NULL) { cl_int error; const size_type numDevices = devices.size(); // Catch size mismatch early and return if(binaries.size() != numDevices) { error = CL_INVALID_VALUE; detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR); if (err != NULL) { *err = error; } return; } vector lengths(numDevices); vector images(numDevices); #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) for (size_type i = 0; i < numDevices; ++i) { images[i] = binaries[i].data(); lengths[i] = binaries[(int)i].size(); } #else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) for (size_type i = 0; i < numDevices; ++i) { images[i] = (const unsigned char*)binaries[i].first; lengths[i] = binaries[(int)i].second; } #endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) vector deviceIDs(numDevices); for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } if(binaryStatus) { binaryStatus->resize(numDevices); } object_ = ::clCreateProgramWithBinary( context(), (cl_uint) devices.size(), deviceIDs.data(), lengths.data(), images.data(), (binaryStatus != NULL && numDevices > 0) ? &binaryStatus->front() : NULL, &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR); if (err != NULL) { *err = error; } } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Create program using builtin kernels. * \param kernelNames Semi-colon separated list of builtin kernel names */ Program( const Context& context, const vector& devices, const string& kernelNames, cl_int* err = NULL) { cl_int error; size_type numDevices = devices.size(); vector deviceIDs(numDevices); for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } object_ = ::clCreateProgramWithBuiltInKernels( context(), (cl_uint) devices.size(), deviceIDs.data(), kernelNames.c_str(), &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 Program() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. */ explicit Program(const cl_program& program, bool retainObject = false) : detail::Wrapper(program, retainObject) { } Program& operator = (const cl_program& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Program(const Program& program) : detail::Wrapper(program) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Program& operator = (const Program &program) { detail::Wrapper::operator=(program); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Program(Program&& program) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(program)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
     */
    Program& operator = (Program &&program)
    {
        detail::Wrapper<cl_type>::operator=(std::move(program));
        return *this;
    }

    cl_int build(
        const vector<Device>& devices,
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        size_type numDevices = devices.size();
        vector<cl_device_id> deviceIDs(numDevices);

        for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) {
            deviceIDs[deviceIndex] = (devices[deviceIndex])();
        }

        cl_int buildError = ::clBuildProgram(
            object_,
            (cl_uint) devices.size(),
            deviceIDs.data(),
            options,
            notifyFptr,
            data);

        return detail::buildErrHandler(buildError, __BUILD_PROGRAM_ERR, getBuildInfo<CL_PROGRAM_BUILD_LOG>());
    }

    cl_int build(
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        cl_int buildError = ::clBuildProgram(
            object_,
            0,
            NULL,
            options,
            notifyFptr,
            data);

        return detail::buildErrHandler(buildError, __BUILD_PROGRAM_ERR, getBuildInfo<CL_PROGRAM_BUILD_LOG>());
    }

#if CL_HPP_TARGET_OPENCL_VERSION >= 120
    cl_int compile(
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        cl_int error = ::clCompileProgram(
            object_,
            0,
            NULL,
            options,
            0,
            NULL,
            NULL,
            notifyFptr,
            data);

        return detail::buildErrHandler(error, __COMPILE_PROGRAM_ERR, getBuildInfo<CL_PROGRAM_BUILD_LOG>());
    }
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120

    template <typename T>
    cl_int getInfo(cl_program_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetProgramInfo, object_, name, param),
            __GET_PROGRAM_INFO_ERR);
    }

    template <cl_program_info name> typename
    detail::param_traits<detail::cl_program_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_program_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    template <typename T>
    cl_int getBuildInfo(
        const Device& device, cl_program_build_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(
                &::clGetProgramBuildInfo, object_, device(), name, param),
                __GET_PROGRAM_BUILD_INFO_ERR);
    }

    template <cl_program_build_info name> typename
    detail::param_traits<detail::cl_program_build_info, name>::param_type
    getBuildInfo(const Device& device, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_program_build_info, name>::param_type param;
        cl_int result = getBuildInfo(device, name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }
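    /*! An illustrative sketch (not normative) of building a program with the
     *  members above and dumping the build log on failure via the
     *  device/log-pair overload of getBuildInfo() defined just below.
     *  kernelSource is an assumed string holding OpenCL C source.
     *
     *  \code
     *  cl::Program program(context, kernelSource);
     *  if (program.build("-cl-std=CL2.0") != CL_SUCCESS) {
     *      for (auto &devLog : program.getBuildInfo<CL_PROGRAM_BUILD_LOG>()) {
     *          // devLog.first is the cl::Device, devLog.second the log text
     *          std::cerr << devLog.second << "\n";
     *      }
     *  }
     *  \endcode
     */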
    /**
     * Build info function that returns a vector of device/info pairs for the specified
     * info type and for all devices in the program.
     * On an error reading the info for any device, an empty vector of info will be returned.
     */
    template <cl_program_build_info name>
    vector<std::pair<cl::Device, typename detail::param_traits<detail::cl_program_build_info, name>::param_type>>
    getBuildInfo(cl_int *err = NULL) const
    {
        cl_int result = CL_SUCCESS;

        auto devs = getInfo<CL_PROGRAM_DEVICES>(&result);
        vector<std::pair<cl::Device, typename detail::param_traits<detail::cl_program_build_info, name>::param_type>>
            devInfo;

        // If there was an initial error from getInfo return the error
        if (result != CL_SUCCESS) {
            if (err != NULL) {
                *err = result;
            }
            return devInfo;
        }

        for (const cl::Device &d : devs) {
            typename detail::param_traits<
                detail::cl_program_build_info, name>::param_type param;
            result = getBuildInfo(d, name, &param);
            devInfo.push_back(
                std::pair<cl::Device, typename detail::param_traits<detail::cl_program_build_info, name>::param_type>
                (d, param));
            if (result != CL_SUCCESS) {
                // On error, leave the loop and return the error code
                break;
            }
        }
        if (err != NULL) {
            *err = result;
        }
        if (result != CL_SUCCESS) {
            devInfo.clear();
        }
        return devInfo;
    }

    cl_int createKernels(vector<Kernel>* kernels)
    {
        cl_uint numKernels;
        cl_int err = ::clCreateKernelsInProgram(object_, 0, NULL, &numKernels);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR);
        }

        vector<cl_kernel> value(numKernels);

        err = ::clCreateKernelsInProgram(
            object_, numKernels, value.data(), NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR);
        }

        if (kernels) {
            kernels->resize(value.size());

            // Assign to param, constructing with retain behaviour
            // to correctly capture each underlying CL object
            for (size_type i = 0; i < value.size(); i++) {
                // We do not need to retain because this kernel is being created
                // by the runtime
                (*kernels)[i] = Kernel(value[i], false);
            }
        }
        return CL_SUCCESS;
    }
};

#if CL_HPP_TARGET_OPENCL_VERSION >= 120
inline Program linkProgram(
    Program input1,
    Program input2,
    const char* options = NULL,
    void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
    void* data = NULL,
    cl_int* err = NULL)
{
    cl_int error_local = CL_SUCCESS;

    cl_program programs[2] = { input1(), input2() };

    Context ctx = input1.getInfo<CL_PROGRAM_CONTEXT>(&error_local);
    if(error_local!=CL_SUCCESS) {
        detail::errHandler(error_local, __LINK_PROGRAM_ERR);
    }

    cl_program prog = ::clLinkProgram(
        ctx(),
        0,
        NULL,
        options,
        2,
        programs,
        notifyFptr,
        data,
        &error_local);

    detail::errHandler(error_local,__COMPILE_PROGRAM_ERR);
    if (err != NULL) {
        *err = error_local;
    }

    return Program(prog);
}

inline Program linkProgram(
    vector<Program> inputPrograms,
    const char* options = NULL,
    void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
    void* data = NULL,
    cl_int* err = NULL)
{
    cl_int error_local = CL_SUCCESS;

    vector<cl_program> programs(inputPrograms.size());

    for (unsigned int i = 0; i < inputPrograms.size(); i++) {
        programs[i] = inputPrograms[i]();
    }

    Context ctx;
    if(inputPrograms.size() > 0) {
        ctx = inputPrograms[0].getInfo<CL_PROGRAM_CONTEXT>(&error_local);
        if(error_local!=CL_SUCCESS) {
            detail::errHandler(error_local, __LINK_PROGRAM_ERR);
        }
    }
    cl_program prog = ::clLinkProgram(
        ctx(),
        0,
        NULL,
        options,
        (cl_uint)inputPrograms.size(),
        programs.data(),
        notifyFptr,
        data,
        &error_local);

    detail::errHandler(error_local,__COMPILE_PROGRAM_ERR);
    if (err != NULL) {
        *err = error_local;
    }

    return Program(prog, false);
}
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120

// Template specialization for CL_PROGRAM_BINARIES
template <>
inline cl_int cl::Program::getInfo(cl_program_info name, vector<vector<unsigned char>>* param) const
{
    if (name != CL_PROGRAM_BINARIES) {
        return CL_INVALID_VALUE;
    }
    if (param) {
        // Resize the parameter array appropriately for each allocation
        // and pass down to the helper

        vector<size_type> sizes = getInfo<CL_PROGRAM_BINARY_SIZES>();
        size_type numBinaries = sizes.size();

        // Resize the parameter array and constituent arrays
        param->resize(numBinaries);
        for (size_type i = 0; i < numBinaries; ++i) {
            (*param)[i].resize(sizes[i]);
        }

        return detail::errHandler(
detail::getInfo(&::clGetProgramInfo, object_, name, param), __GET_PROGRAM_INFO_ERR); } return CL_SUCCESS; } template<> inline vector> cl::Program::getInfo(cl_int* err) const { vector> binariesVectors; cl_int result = getInfo(CL_PROGRAM_BINARIES, &binariesVectors); if (err != NULL) { *err = result; } return binariesVectors; } inline Kernel::Kernel(const Program& program, const char* name, cl_int* err) { cl_int error; object_ = ::clCreateKernel(program(), name, &error); detail::errHandler(error, __CREATE_KERNEL_ERR); if (err != NULL) { *err = error; } } enum class QueueProperties : cl_command_queue_properties { None = 0, Profiling = CL_QUEUE_PROFILING_ENABLE, OutOfOrder = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, }; inline QueueProperties operator|(QueueProperties lhs, QueueProperties rhs) { return static_cast(static_cast(lhs) | static_cast(rhs)); } /*! \class CommandQueue * \brief CommandQueue interface for cl_command_queue. */ class CommandQueue : public detail::Wrapper { private: static std::once_flag default_initialized_; static CommandQueue default_; static cl_int default_error_; /*! \brief Create the default command queue returned by @ref getDefault. * * It sets default_error_ to indicate success or failure. It does not throw * @c cl::Error. */ static void makeDefault() { /* We don't want to throw an error from this function, so we have to * catch and set the error flag. */ #if defined(CL_HPP_ENABLE_EXCEPTIONS) try #endif { int error; Context context = Context::getDefault(&error); if (error != CL_SUCCESS) { default_error_ = error; } else { Device device = Device::getDefault(); default_ = CommandQueue(context, device, 0, &default_error_); } } #if defined(CL_HPP_ENABLE_EXCEPTIONS) catch (cl::Error &e) { default_error_ = e.err(); } #endif } /*! \brief Create the default command queue. * * This sets @c default_. It does not throw * @c cl::Error. */ static void makeDefaultProvided(const CommandQueue &c) { default_ = c; } public: #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Reset the default. * * This sets @c default_ to an empty value to support cleanup in * the unit test framework. * This function is not thread safe. */ static void unitTestClearDefault() { default_ = CommandQueue(); } #endif // #ifdef CL_HPP_UNIT_TEST_ENABLE /*! * \brief Constructs a CommandQueue based on passed properties. * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ CommandQueue( cl_command_queue_properties properties, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } } else { Device device = context.getInfo()[0]; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; if ((properties & CL_QUEUE_ON_DEVICE) == 0) { object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); } else { error = CL_INVALID_QUEUE_PROPERTIES; } detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } } /*! * \brief Constructs a CommandQueue based on passed properties. * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. 
*/ CommandQueue( QueueProperties properties, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } } else { Device device = context.getInfo()[0]; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, static_cast(properties), 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), static_cast(properties), &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } } /*! * \brief Constructs a CommandQueue for an implementation defined device in the given context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ explicit CommandQueue( const Context& context, cl_command_queue_properties properties = 0, cl_int* err = NULL) { cl_int error; vector devices; error = context.getInfo(CL_CONTEXT_DEVICES, &devices); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } return; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; if ((properties & CL_QUEUE_ON_DEVICE) == 0) { object_ = ::clCreateCommandQueueWithProperties( context(), devices[0](), queue_properties, &error); } else { error = CL_INVALID_QUEUE_PROPERTIES; } detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), devices[0](), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } /*! * \brief Constructs a CommandQueue for an implementation defined device in the given context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ explicit CommandQueue( const Context& context, QueueProperties properties, cl_int* err = NULL) { cl_int error; vector devices; error = context.getInfo(CL_CONTEXT_DEVICES, &devices); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } return; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, static_cast(properties), 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), devices[0](), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), devices[0](), static_cast(properties), &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } /*! * \brief Constructs a CommandQueue for a passed device and context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. 
*/ CommandQueue( const Context& context, const Device& device, cl_command_queue_properties properties = 0, cl_int* err = NULL) { cl_int error; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } /*! * \brief Constructs a CommandQueue for a passed device and context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ CommandQueue( const Context& context, const Device& device, QueueProperties properties, cl_int* err = NULL) { cl_int error; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, static_cast(properties), 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), static_cast(properties), &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } static CommandQueue getDefault(cl_int * err = NULL) { std::call_once(default_initialized_, makeDefault); #if CL_HPP_TARGET_OPENCL_VERSION >= 200 detail::errHandler(default_error_, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); #else // CL_HPP_TARGET_OPENCL_VERSION >= 200 detail::errHandler(default_error_, __CREATE_COMMAND_QUEUE_ERR); #endif // CL_HPP_TARGET_OPENCL_VERSION >= 200 if (err != NULL) { *err = default_error_; } return default_; } /** * Modify the default command queue to be used by * subsequent operations. * Will only set the default if no default was previously created. * @return updated default command queue. * Should be compared to the passed value to ensure that it was updated. */ static CommandQueue setDefault(const CommandQueue &default_queue) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_queue)); detail::errHandler(default_error_); return default_; } CommandQueue() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. */ explicit CommandQueue(const cl_command_queue& commandQueue, bool retainObject = false) : detail::Wrapper(commandQueue, retainObject) { } CommandQueue& operator = (const cl_command_queue& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ CommandQueue(const CommandQueue& queue) : detail::Wrapper(queue) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ CommandQueue& operator = (const CommandQueue &queue) { detail::Wrapper::operator=(queue); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ CommandQueue(CommandQueue&& queue) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(queue)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
     */
    CommandQueue& operator = (CommandQueue &&queue)
    {
        detail::Wrapper<cl_type>::operator=(std::move(queue));
        return *this;
    }

    template <typename T>
    cl_int getInfo(cl_command_queue_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(
                &::clGetCommandQueueInfo, object_, name, param),
                __GET_COMMAND_QUEUE_INFO_ERR);
    }

    template <cl_command_queue_info name> typename
    detail::param_traits<detail::cl_command_queue_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_command_queue_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    cl_int enqueueReadBuffer(
        const Buffer& buffer,
        cl_bool blocking,
        size_type offset,
        size_type size,
        void* ptr,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadBuffer(
                object_, buffer(), blocking, offset, size,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_READ_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueWriteBuffer(
        const Buffer& buffer,
        cl_bool blocking,
        size_type offset,
        size_type size,
        const void* ptr,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueWriteBuffer(
                object_, buffer(), blocking, offset, size,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_WRITE_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyBuffer(
        const Buffer& src,
        const Buffer& dst,
        size_type src_offset,
        size_type dst_offset,
        size_type size,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyBuffer(
                object_, src(), dst(), src_offset, dst_offset, size,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQEUE_COPY_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
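    /*! A minimal sketch of round-tripping data with the blocking read/write
     *  members above; the context, queue, and host vector are assumptions.
     *
     *  \code
     *  std::vector<float> host(1024, 0.0f);
     *  cl::Buffer dev(context, CL_MEM_READ_WRITE, host.size() * sizeof(float));
     *  queue.enqueueWriteBuffer(dev, CL_TRUE, 0, host.size() * sizeof(float), host.data());
     *  // ... enqueue kernels that update dev ...
     *  queue.enqueueReadBuffer(dev, CL_TRUE, 0, host.size() * sizeof(float), host.data());
     *  \endcode
     */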
    cl_int enqueueReadBufferRect(
        const Buffer& buffer,
        cl_bool blocking,
        const array<size_type, 3>& buffer_offset,
        const array<size_type, 3>& host_offset,
        const array<size_type, 3>& region,
        size_type buffer_row_pitch,
        size_type buffer_slice_pitch,
        size_type host_row_pitch,
        size_type host_slice_pitch,
        void *ptr,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadBufferRect(
                object_,
                buffer(),
                blocking,
                buffer_offset.data(),
                host_offset.data(),
                region.data(),
                buffer_row_pitch,
                buffer_slice_pitch,
                host_row_pitch,
                host_slice_pitch,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_READ_BUFFER_RECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueWriteBufferRect(
        const Buffer& buffer,
        cl_bool blocking,
        const array<size_type, 3>& buffer_offset,
        const array<size_type, 3>& host_offset,
        const array<size_type, 3>& region,
        size_type buffer_row_pitch,
        size_type buffer_slice_pitch,
        size_type host_row_pitch,
        size_type host_slice_pitch,
        const void *ptr,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueWriteBufferRect(
                object_,
                buffer(),
                blocking,
                buffer_offset.data(),
                host_offset.data(),
                region.data(),
                buffer_row_pitch,
                buffer_slice_pitch,
                host_row_pitch,
                host_slice_pitch,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_WRITE_BUFFER_RECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyBufferRect(
        const Buffer& src,
        const Buffer& dst,
        const array<size_type, 3>& src_origin,
        const array<size_type, 3>& dst_origin,
        const array<size_type, 3>& region,
        size_type src_row_pitch,
        size_type src_slice_pitch,
        size_type dst_row_pitch,
        size_type dst_slice_pitch,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyBufferRect(
                object_,
                src(),
                dst(),
                src_origin.data(),
                dst_origin.data(),
                region.data(),
                src_row_pitch,
                src_slice_pitch,
                dst_row_pitch,
                dst_slice_pitch,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQEUE_COPY_BUFFER_RECT_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

#if CL_HPP_TARGET_OPENCL_VERSION >= 120
    /**
     * Enqueue a command to fill a buffer object with a pattern
     * of a given size. The pattern is specified as a vector type.
     * \tparam PatternType The datatype of the pattern field.
     *     The pattern type must be an accepted OpenCL data type.
     * \param offset Is the offset in bytes into the buffer at
     *     which to start filling. This must be a multiple of
     *     the pattern size.
     * \param size Is the size in bytes of the region to fill.
     *     This must be a multiple of the pattern size.
     */
    template<typename PatternType>
    cl_int enqueueFillBuffer(
        const Buffer& buffer,
        PatternType pattern,
        size_type offset,
        size_type size,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueFillBuffer(
                object_,
                buffer(),
                static_cast<void*>(&pattern),
                sizeof(PatternType),
                offset,
                size,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_FILL_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120

    cl_int enqueueReadImage(
        const Image& image,
        cl_bool blocking,
        const array<size_type, 3>& origin,
        const array<size_type, 3>& region,
        size_type row_pitch,
        size_type slice_pitch,
        void* ptr,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadImage(
                object_,
                image(),
                blocking,
                origin.data(),
                region.data(),
                row_pitch,
                slice_pitch,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ?
&tmp : NULL), __ENQUEUE_READ_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueWriteImage( const Image& image, cl_bool blocking, const array& origin, const array& region, size_type row_pitch, size_type slice_pitch, const void* ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueWriteImage( object_, image(), blocking, origin.data(), region.data(), row_pitch, slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_WRITE_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyImage( const Image& src, const Image& dst, const array& src_origin, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyImage( object_, src(), dst(), src_origin.data(), dst_origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA floating-point color value if * the image channel data type is not an unnormalized signed or * unsigned data type. */ cl_int enqueueFillImage( const Image& image, cl_float4 fillColor, const array& origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA signed integer color value if * the image channel data type is an unnormalized signed integer * type. */ cl_int enqueueFillImage( const Image& image, cl_int4 fillColor, const array& origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA unsigned integer color value if * the image channel data type is an unnormalized unsigned integer * type. 
*/ cl_int enqueueFillImage( const Image& image, cl_uint4 fillColor, const array& origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_int enqueueCopyImageToBuffer( const Image& src, const Buffer& dst, const array& src_origin, const array& region, size_type dst_offset, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyImageToBuffer( object_, src(), dst(), src_origin.data(), region.data(), dst_offset, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyBufferToImage( const Buffer& src, const Image& dst, size_type src_offset, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyBufferToImage( object_, src(), dst(), src_offset, dst_origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } void* enqueueMapBuffer( const Buffer& buffer, cl_bool blocking, cl_map_flags flags, size_type offset, size_type size, const vector* events = NULL, Event* event = NULL, cl_int* err = NULL) const { cl_event tmp; cl_int error; void * result = ::clEnqueueMapBuffer( object_, buffer(), blocking, flags, offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL, &error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } if (event != NULL && error == CL_SUCCESS) *event = tmp; return result; } void* enqueueMapImage( const Image& buffer, cl_bool blocking, cl_map_flags flags, const array& origin, const array& region, size_type * row_pitch, size_type * slice_pitch, const vector* events = NULL, Event* event = NULL, cl_int* err = NULL) const { cl_event tmp; cl_int error; void * result = ::clEnqueueMapImage( object_, buffer(), blocking, flags, origin.data(), region.data(), row_pitch, slice_pitch, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL, &error); detail::errHandler(error, __ENQUEUE_MAP_IMAGE_ERR); if (err != NULL) { *err = error; } if (event != NULL && error == CL_SUCCESS) *event = tmp; return result; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues a command that will allow the host to update a region of a coarse-grained SVM buffer. * This variant takes a raw SVM pointer. 
*/ template cl_int enqueueMapSVM( T* ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler(::clEnqueueSVMMap( object_, blocking, flags, static_cast(ptr), size, (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MAP_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will allow the host to update a region of a coarse-grained SVM buffer. * This variant takes a cl::pointer instance. */ template cl_int enqueueMapSVM( cl::pointer &ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler(::clEnqueueSVMMap( object_, blocking, flags, static_cast(ptr.get()), size, (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MAP_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will allow the host to update a region of a coarse-grained SVM buffer. * This variant takes a cl::vector instance. */ template cl_int enqueueMapSVM( cl::vector &container, cl_bool blocking, cl_map_flags flags, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler(::clEnqueueSVMMap( object_, blocking, flags, static_cast(container.data()), container.size(), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MAP_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_int enqueueUnmapMemObject( const Memory& memory, void* mapped_ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueUnmapMemObject( object_, memory(), mapped_ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues a command that will release a coarse-grained SVM buffer back to the OpenCL runtime. * This variant takes a raw SVM pointer. */ template cl_int enqueueUnmapSVM( T* ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueSVMUnmap( object_, static_cast(ptr), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will release a coarse-grained SVM buffer back to the OpenCL runtime. * This variant takes a cl::pointer instance. */ template cl_int enqueueUnmapSVM( cl::pointer &ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueSVMUnmap( object_, static_cast(ptr.get()), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? 
(cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will release a coarse-grained SVM buffer back to the OpenCL runtime. * This variant takes a cl::vector instance. */ template cl_int enqueueUnmapSVM( cl::vector &container, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueSVMUnmap( object_, static_cast(container.data()), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Enqueues a marker command which waits for either a list of events to complete, * or all previously enqueued commands to complete. * * Enqueues a marker command which waits for either a list of events to complete, * or if the list is empty it waits for all commands previously enqueued in command_queue * to complete before it completes. This command returns an event which can be waited on, * i.e. this event can be waited on to insure that all events either in the event_wait_list * or all previously enqueued commands, queued before this command to command_queue, * have completed. */ cl_int enqueueMarkerWithWaitList( const vector *events = 0, Event *event = 0) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueMarkerWithWaitList( object_, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MARKER_WAIT_LIST_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * A synchronization point that enqueues a barrier operation. * * Enqueues a barrier command which waits for either a list of events to complete, * or if the list is empty it waits for all commands previously enqueued in command_queue * to complete before it completes. This command blocks command execution, that is, any * following commands enqueued after it do not execute until it completes. This command * returns an event which can be waited on, i.e. this event can be waited on to insure that * all events either in the event_wait_list or all previously enqueued commands, queued * before this command to command_queue, have completed. */ cl_int enqueueBarrierWithWaitList( const vector *events = 0, Event *event = 0) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueBarrierWithWaitList( object_, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_BARRIER_WAIT_LIST_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command to indicate with which device a set of memory objects * should be associated. */ cl_int enqueueMigrateMemObjects( const vector &memObjects, cl_mem_migration_flags flags, const vector* events = NULL, Event* event = NULL ) const { cl_event tmp; vector localMemObjects(memObjects.size()); for( int i = 0; i < (int)memObjects.size(); ++i ) { localMemObjects[i] = memObjects[i](); } cl_int err = detail::errHandler( ::clEnqueueMigrateMemObjects( object_, (cl_uint)memObjects.size(), localMemObjects.data(), flags, (events != NULL) ? 
(cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_int enqueueNDRangeKernel( const Kernel& kernel, const NDRange& offset, const NDRange& global, const NDRange& local = NullRange, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueNDRangeKernel( object_, kernel(), (cl_uint) global.dimensions(), offset.dimensions() != 0 ? (const size_type*) offset : NULL, (const size_type*) global, local.dimensions() != 0 ? (const size_type*) local : NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_NDRANGE_KERNEL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS) CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_int enqueueTask( const Kernel& kernel, const vector* events = NULL, Event* event = NULL) const CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueTask( object_, kernel(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_TASK_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS) cl_int enqueueNativeKernel( void (CL_CALLBACK *userFptr)(void *), std::pair args, const vector* mem_objects = NULL, const vector* mem_locs = NULL, const vector* events = NULL, Event* event = NULL) const { size_type elements = 0; if (mem_objects != NULL) { elements = mem_objects->size(); } vector mems(elements); for (unsigned int i = 0; i < elements; i++) { mems[i] = ((*mem_objects)[i])(); } cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueNativeKernel( object_, userFptr, args.first, args.second, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, mems.data(), (mem_locs != NULL && mem_locs->size() > 0) ? (const void **) &mem_locs->front() : NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_NATIVE_KERNEL); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueMarker(Event* event = NULL) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueMarker( object_, (event != NULL) ? &tmp : NULL), __ENQUEUE_MARKER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueWaitForEvents(const vector& events) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { return detail::errHandler( ::clEnqueueWaitForEvents( object_, (cl_uint) events.size(), events.size() > 0 ? (const cl_event*) &events.front() : NULL), __ENQUEUE_WAIT_FOR_EVENTS_ERR); } #endif // defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) cl_int enqueueAcquireGLObjects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueAcquireGLObjects( object_, (mem_objects != NULL) ? 
(cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_ACQUIRE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueReleaseGLObjects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReleaseGLObjects( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_RELEASE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if defined (CL_HPP_USE_DX_INTEROP) typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueAcquireD3D10ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event); typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueReleaseD3D10ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event); cl_int enqueueAcquireD3D10Objects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { static PFN_clEnqueueAcquireD3D10ObjectsKHR pfn_clEnqueueAcquireD3D10ObjectsKHR = NULL; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_context context = getInfo(); cl::Device device(getInfo()); cl_platform_id platform = device.getInfo(); CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, clEnqueueAcquireD3D10ObjectsKHR); #endif #if CL_HPP_TARGET_OPENCL_VERSION >= 110 CL_HPP_INIT_CL_EXT_FCN_PTR_(clEnqueueAcquireD3D10ObjectsKHR); #endif cl_event tmp; cl_int err = detail::errHandler( pfn_clEnqueueAcquireD3D10ObjectsKHR( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_ACQUIRE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueReleaseD3D10Objects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { static PFN_clEnqueueReleaseD3D10ObjectsKHR pfn_clEnqueueReleaseD3D10ObjectsKHR = NULL; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_context context = getInfo(); cl::Device device(getInfo()); cl_platform_id platform = device.getInfo(); CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, clEnqueueReleaseD3D10ObjectsKHR); #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_TARGET_OPENCL_VERSION >= 110 CL_HPP_INIT_CL_EXT_FCN_PTR_(clEnqueueReleaseD3D10ObjectsKHR); #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 cl_event tmp; cl_int err = detail::errHandler( pfn_clEnqueueReleaseD3D10ObjectsKHR( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? 
(cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_RELEASE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueBarrier() const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { return detail::errHandler( ::clEnqueueBarrier(object_), __ENQUEUE_BARRIER_ERR); } #endif // CL_USE_DEPRECATED_OPENCL_1_1_APIS cl_int flush() const { return detail::errHandler(::clFlush(object_), __FLUSH_ERR); } cl_int finish() const { return detail::errHandler(::clFinish(object_), __FINISH_ERR); } }; // CommandQueue CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag CommandQueue::default_initialized_; CL_HPP_DEFINE_STATIC_MEMBER_ CommandQueue CommandQueue::default_; CL_HPP_DEFINE_STATIC_MEMBER_ cl_int CommandQueue::default_error_ = CL_SUCCESS; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 enum class DeviceQueueProperties : cl_command_queue_properties { None = 0, Profiling = CL_QUEUE_PROFILING_ENABLE, }; inline DeviceQueueProperties operator|(DeviceQueueProperties lhs, DeviceQueueProperties rhs) { return static_cast(static_cast(lhs) | static_cast(rhs)); } /*! \class DeviceCommandQueue * \brief DeviceCommandQueue interface for device cl_command_queues. */ class DeviceCommandQueue : public detail::Wrapper { public: /*! * Trivial empty constructor to create a null queue. */ DeviceCommandQueue() { } /*! * Default construct device command queue on default context and device */ DeviceCommandQueue(DeviceQueueProperties properties, cl_int* err = NULL) { cl_int error; cl::Context context = cl::Context::getDefault(); cl::Device device = cl::Device::getDefault(); cl_command_queue_properties mergedProperties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | static_cast(properties); cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, mergedProperties, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } } /*! * Create a device command queue for a specified device in the passed context. */ DeviceCommandQueue( const Context& context, const Device& device, DeviceQueueProperties properties = DeviceQueueProperties::None, cl_int* err = NULL) { cl_int error; cl_command_queue_properties mergedProperties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | static_cast(properties); cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, mergedProperties, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } } /*! * Create a device command queue for a specified device in the passed context. */ DeviceCommandQueue( const Context& context, const Device& device, cl_uint queueSize, DeviceQueueProperties properties = DeviceQueueProperties::None, cl_int* err = NULL) { cl_int error; cl_command_queue_properties mergedProperties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | static_cast(properties); cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, mergedProperties, CL_QUEUE_SIZE, queueSize, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } } /*! 
\brief Constructor from cl_command_queue - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. */ explicit DeviceCommandQueue(const cl_command_queue& commandQueue, bool retainObject = false) : detail::Wrapper(commandQueue, retainObject) { } DeviceCommandQueue& operator = (const cl_command_queue& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ DeviceCommandQueue(const DeviceCommandQueue& queue) : detail::Wrapper(queue) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ DeviceCommandQueue& operator = (const DeviceCommandQueue &queue) { detail::Wrapper::operator=(queue); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ DeviceCommandQueue(DeviceCommandQueue&& queue) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(queue)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ DeviceCommandQueue& operator = (DeviceCommandQueue &&queue) { detail::Wrapper::operator=(std::move(queue)); return *this; } template cl_int getInfo(cl_command_queue_info name, T* param) const { return detail::errHandler( detail::getInfo( &::clGetCommandQueueInfo, object_, name, param), __GET_COMMAND_QUEUE_INFO_ERR); } template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_command_queue_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! * Create a new default device command queue for the default device, * in the default context and of the default size. * If there is already a default queue for the specified device this * function will return the pre-existing queue. */ static DeviceCommandQueue makeDefault( cl_int *err = nullptr) { cl_int error; cl::Context context = cl::Context::getDefault(); cl::Device device = cl::Device::getDefault(); cl_command_queue_properties properties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT; cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; DeviceCommandQueue deviceQueue( ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error)); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } return deviceQueue; } /*! * Create a new default device command queue for the specified device * and of the default size. * If there is already a default queue for the specified device this * function will return the pre-existing queue. */ static DeviceCommandQueue makeDefault( const Context &context, const Device &device, cl_int *err = nullptr) { cl_int error; cl_command_queue_properties properties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT; cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; DeviceCommandQueue deviceQueue( ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error)); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } return deviceQueue; } /*! * Create a new default device command queue for the specified device * and of the requested size in bytes. 
* If there is already a default queue for the specified device this * function will return the pre-existing queue. */ static DeviceCommandQueue makeDefault( const Context &context, const Device &device, cl_uint queueSize, cl_int *err = nullptr) { cl_int error; cl_command_queue_properties properties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT; cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, CL_QUEUE_SIZE, queueSize, 0 }; DeviceCommandQueue deviceQueue( ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error)); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } return deviceQueue; } }; // DeviceCommandQueue namespace detail { // Specialization for device command queue template <> struct KernelArgumentHandler { static size_type size(const cl::DeviceCommandQueue&) { return sizeof(cl_command_queue); } static const cl_command_queue* ptr(const cl::DeviceCommandQueue& value) { return &(value()); } }; } // namespace detail #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 template< typename IteratorType > Buffer::Buffer( const Context &context, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr, cl_int* err) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if( readOnly ) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if( useHostPtr ) { flags |= CL_MEM_USE_HOST_PTR; } size_type size = sizeof(DataType)*(endIterator - startIterator); if( useHostPtr ) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if( !useHostPtr ) { CommandQueue queue(context, 0, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } error = cl::copy(queue, startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } template< typename IteratorType > Buffer::Buffer( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr, cl_int* err) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if (readOnly) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if (useHostPtr) { flags |= CL_MEM_USE_HOST_PTR; } size_type size = sizeof(DataType)*(endIterator - startIterator); Context context = queue.getInfo(); if (useHostPtr) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if (!useHostPtr) { error = cl::copy(queue, startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } inline cl_int enqueueReadBuffer( const Buffer& buffer, cl_bool blocking, size_type offset, size_type size, void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueReadBuffer(buffer, blocking, offset, size, ptr, events, event); } inline cl_int enqueueWriteBuffer( const Buffer& buffer, cl_bool blocking, size_type 
offset, size_type size, const void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueWriteBuffer(buffer, blocking, offset, size, ptr, events, event); } inline void* enqueueMapBuffer( const Buffer& buffer, cl_bool blocking, cl_map_flags flags, size_type offset, size_type size, const vector* events = NULL, Event* event = NULL, cl_int* err = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } void * result = ::clEnqueueMapBuffer( queue(), buffer(), blocking, flags, offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (cl_event*) event, &error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } return result; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues to the default queue a command that will allow the host to * update a region of a coarse-grained SVM buffer. * This variant takes a raw SVM pointer. */ template inline cl_int enqueueMapSVM( T* ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events, Event* event) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); } return queue.enqueueMapSVM( ptr, blocking, flags, size, events, event); } /** * Enqueues to the default queue a command that will allow the host to * update a region of a coarse-grained SVM buffer. * This variant takes a cl::pointer instance. */ template inline cl_int enqueueMapSVM( cl::pointer ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); } return queue.enqueueMapSVM( ptr, blocking, flags, size, events, event); } /** * Enqueues to the default queue a command that will allow the host to * update a region of a coarse-grained SVM buffer. * This variant takes a cl::vector instance. */ template inline cl_int enqueueMapSVM( cl::vector container, cl_bool blocking, cl_map_flags flags, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); } return queue.enqueueMapSVM( container, blocking, flags, events, event); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 inline cl_int enqueueUnmapMemObject( const Memory& memory, void* mapped_ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (error != CL_SUCCESS) { return error; } cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueUnmapMemObject( queue(), memory(), mapped_ptr, (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? 
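    /* Illustrative map/unmap round trip with the free functions above, on the
     * default queue (a sketch, not part of the original header; "buf", "src"
     * and "bytes" are assumed):
     *
     *   cl_int err;
     *   void* p = cl::enqueueMapBuffer(buf, CL_TRUE, CL_MAP_WRITE,
     *                                  0, bytes, NULL, NULL, &err);
     *   if (err == CL_SUCCESS) {
     *       memcpy(p, src, bytes);              // host-side update
     *       cl::enqueueUnmapMemObject(buf, p);  // hand ownership back
     *       cl::finish();
     *   }
     */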
&tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues to the default queue a command that will release a coarse-grained * SVM buffer back to the OpenCL runtime. * This variant takes a raw SVM pointer. */ template inline cl_int enqueueUnmapSVM( T* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } return detail::errHandler(queue.enqueueUnmapSVM(ptr, events, event), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } /** * Enqueues to the default queue a command that will release a coarse-grained * SVM buffer back to the OpenCL runtime. * This variant takes a cl::pointer instance. */ template inline cl_int enqueueUnmapSVM( cl::pointer &ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } return detail::errHandler(queue.enqueueUnmapSVM(ptr, events, event), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } /** * Enqueues to the default queue a command that will release a coarse-grained * SVM buffer back to the OpenCL runtime. * This variant takes a cl::vector instance. */ template inline cl_int enqueueUnmapSVM( cl::vector &container, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } return detail::errHandler(queue.enqueueUnmapSVM(container, events, event), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 inline cl_int enqueueCopyBuffer( const Buffer& src, const Buffer& dst, size_type src_offset, size_type dst_offset, size_type size, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyBuffer(src, dst, src_offset, dst_offset, size, events, event); } /** * Blocking copy operation between iterators and a buffer. * Host to Device. * Uses default command queue. */ template< typename IteratorType > inline cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) return error; return cl::copy(queue, startIterator, endIterator, buffer); } /** * Blocking copy operation between iterators and a buffer. * Device to Host. * Uses default command queue. */ template< typename IteratorType > inline cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) return error; return cl::copy(queue, buffer, startIterator, endIterator); } /** * Blocking copy operation between iterators and a buffer. * Host to Device. * Uses specified queue. 
 */
template< typename IteratorType >
inline cl_int copy(
    const CommandQueue &queue,
    IteratorType startIterator,
    IteratorType endIterator,
    cl::Buffer &buffer )
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    size_type length = endIterator-startIterator;
    size_type byteLength = length*sizeof(DataType);

    DataType *pointer = static_cast<DataType*>(
        queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_WRITE, 0, byteLength, 0, 0, &error));
    // if exceptions enabled, enqueueMapBuffer will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
#if defined(_MSC_VER)
    std::copy(
        startIterator,
        endIterator,
        stdext::checked_array_iterator<DataType*>(
            pointer, length));
#else
    std::copy(startIterator, endIterator, pointer);
#endif
    Event endEvent;
    error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent);
    // if exceptions enabled, enqueueUnmapMemObject will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    endEvent.wait();
    return CL_SUCCESS;
}

/**
 * Blocking copy operation between iterators and a buffer.
 * Device to Host.
 * Uses specified queue.
 */
template< typename IteratorType >
inline cl_int copy(
    const CommandQueue &queue,
    const cl::Buffer &buffer,
    IteratorType startIterator,
    IteratorType endIterator )
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    size_type length = endIterator-startIterator;
    size_type byteLength = length*sizeof(DataType);

    DataType *pointer = static_cast<DataType*>(
        queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_READ, 0, byteLength, 0, 0, &error));
    // if exceptions enabled, enqueueMapBuffer will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    std::copy(pointer, pointer + length, startIterator);
    Event endEvent;
    error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent);
    // if exceptions enabled, enqueueUnmapMemObject will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    endEvent.wait();
    return CL_SUCCESS;
}

#if CL_HPP_TARGET_OPENCL_VERSION >= 200
/**
 * Blocking SVM map operation - performs a blocking map underneath.
 */
template<typename T, class Alloc>
inline cl_int mapSVM(cl::vector<T, Alloc> &container)
{
    return enqueueMapSVM(container, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE);
}

/**
 * Blocking SVM unmap operation - performs a blocking unmap underneath.
*/ template inline cl_int unmapSVM(cl::vector &container) { return enqueueUnmapSVM(container); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if CL_HPP_TARGET_OPENCL_VERSION >= 110 inline cl_int enqueueReadBufferRect( const Buffer& buffer, cl_bool blocking, const array& buffer_offset, const array& host_offset, const array& region, size_type buffer_row_pitch, size_type buffer_slice_pitch, size_type host_row_pitch, size_type host_slice_pitch, void *ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueReadBufferRect( buffer, blocking, buffer_offset, host_offset, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, events, event); } inline cl_int enqueueWriteBufferRect( const Buffer& buffer, cl_bool blocking, const array& buffer_offset, const array& host_offset, const array& region, size_type buffer_row_pitch, size_type buffer_slice_pitch, size_type host_row_pitch, size_type host_slice_pitch, const void *ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueWriteBufferRect( buffer, blocking, buffer_offset, host_offset, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, events, event); } inline cl_int enqueueCopyBufferRect( const Buffer& src, const Buffer& dst, const array& src_origin, const array& dst_origin, const array& region, size_type src_row_pitch, size_type src_slice_pitch, size_type dst_row_pitch, size_type dst_slice_pitch, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyBufferRect( src, dst, src_origin, dst_origin, region, src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch, events, event); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 inline cl_int enqueueReadImage( const Image& image, cl_bool blocking, const array& origin, const array& region, size_type row_pitch, size_type slice_pitch, void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueReadImage( image, blocking, origin, region, row_pitch, slice_pitch, ptr, events, event); } inline cl_int enqueueWriteImage( const Image& image, cl_bool blocking, const array& origin, const array& region, size_type row_pitch, size_type slice_pitch, const void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueWriteImage( image, blocking, origin, region, row_pitch, slice_pitch, ptr, events, event); } inline cl_int enqueueCopyImage( const Image& src, const Image& dst, const array& src_origin, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyImage( src, dst, src_origin, dst_origin, region, events, event); } inline cl_int enqueueCopyImageToBuffer( const Image& src, const Buffer& dst, const array& src_origin, const array& region, size_type dst_offset, const vector* events = NULL, Event* event = NULL) { cl_int 
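    /* Illustrative sketch of the blocking cl::copy helpers defined above (not
     * part of the original header; "context" and "queue" are assumed).
     * copy() maps the buffer, runs std::copy on the host, then unmaps and
     * waits on the unmap event:
     *
     *   std::vector<int> in(256, 7), out(256);
     *   cl::Buffer buf(context, CL_MEM_READ_WRITE, 256 * sizeof(int));
     *   cl::copy(queue, in.begin(), in.end(), buf);    // host -> device
     *   cl::copy(queue, buf, out.begin(), out.end());  // device -> host
     */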
error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyImageToBuffer( src, dst, src_origin, region, dst_offset, events, event); } inline cl_int enqueueCopyBufferToImage( const Buffer& src, const Image& dst, size_type src_offset, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyBufferToImage( src, dst, src_offset, dst_origin, region, events, event); } inline cl_int flush(void) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.flush(); } inline cl_int finish(void) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.finish(); } class EnqueueArgs { private: CommandQueue queue_; const NDRange offset_; const NDRange global_; const NDRange local_; vector events_; template friend class KernelFunctor; public: EnqueueArgs(NDRange global) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange) { } EnqueueArgs(NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local) { } EnqueueArgs(NDRange offset, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local) { } EnqueueArgs(Event e, NDRange global) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange) { events_.push_back(e); } EnqueueArgs(Event e, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(Event e, NDRange offset, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(const vector &events, NDRange global) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange), events_(events) { } EnqueueArgs(const vector &events, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local), events_(events) { } EnqueueArgs(const vector &events, NDRange offset, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local), events_(events) { } EnqueueArgs(CommandQueue &queue, NDRange global) : queue_(queue), offset_(NullRange), global_(global), local_(NullRange) { } EnqueueArgs(CommandQueue &queue, NDRange global, NDRange local) : queue_(queue), offset_(NullRange), global_(global), local_(local) { } EnqueueArgs(CommandQueue &queue, NDRange offset, NDRange global, NDRange local) : queue_(queue), offset_(offset), global_(global), local_(local) { } EnqueueArgs(CommandQueue &queue, Event e, NDRange global) : queue_(queue), offset_(NullRange), global_(global), local_(NullRange) { events_.push_back(e); } EnqueueArgs(CommandQueue &queue, Event e, NDRange global, NDRange local) : queue_(queue), offset_(NullRange), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(CommandQueue &queue, Event e, NDRange offset, NDRange global, NDRange local) : queue_(queue), offset_(offset), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(CommandQueue &queue, const vector &events, NDRange global) : 
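    /* Illustrative EnqueueArgs construction (a sketch, not part of the
     * original header; "queue" is assumed). EnqueueArgs bundles the queue,
     * offset, global/local NDRanges and an optional wait list for use with
     * KernelFunctor below:
     *
     *   cl::EnqueueArgs simple(cl::NDRange(4096));   // default queue
     *   cl::EnqueueArgs tuned(queue,
     *                         cl::NDRange(4096),     // global
     *                         cl::NDRange(256));     // local
     */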
queue_(queue), offset_(NullRange), global_(global), local_(NullRange), events_(events) { } EnqueueArgs(CommandQueue &queue, const vector &events, NDRange global, NDRange local) : queue_(queue), offset_(NullRange), global_(global), local_(local), events_(events) { } EnqueueArgs(CommandQueue &queue, const vector &events, NDRange offset, NDRange global, NDRange local) : queue_(queue), offset_(offset), global_(global), local_(local), events_(events) { } }; //---------------------------------------------------------------------------------------------- /** * Type safe kernel functor. * */ template class KernelFunctor { private: Kernel kernel_; template void setArgs(T0&& t0, T1s&&... t1s) { kernel_.setArg(index, t0); setArgs(std::forward(t1s)...); } template void setArgs(T0&& t0) { kernel_.setArg(index, t0); } template void setArgs() { } public: KernelFunctor(Kernel kernel) : kernel_(kernel) {} KernelFunctor( const Program& program, const string name, cl_int * err = NULL) : kernel_(program, name.c_str(), err) {} //! \brief Return type of the functor typedef Event result_type; /** * Enqueue kernel. * @param args Launch parameters of the kernel. * @param t0... List of kernel arguments based on the template type of the functor. */ Event operator() ( const EnqueueArgs& args, Ts... ts) { Event event; setArgs<0>(std::forward(ts)...); args.queue_.enqueueNDRangeKernel( kernel_, args.offset_, args.global_, args.local_, &args.events_, &event); return event; } /** * Enqueue kernel with support for error code. * @param args Launch parameters of the kernel. * @param t0... List of kernel arguments based on the template type of the functor. * @param error Out parameter returning the error code from the execution. */ Event operator() ( const EnqueueArgs& args, Ts... ts, cl_int &error) { Event event; setArgs<0>(std::forward(ts)...); error = args.queue_.enqueueNDRangeKernel( kernel_, args.offset_, args.global_, args.local_, &args.events_, &event); return event; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_int setSVMPointers(const vector &pointerList) { return kernel_.setSVMPointers(pointerList); } template cl_int setSVMPointers(const T0 &t0, T1s... ts) { return kernel_.setSVMPointers(t0, ts...); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 Kernel getKernel() { return kernel_; } }; namespace compatibility { /** * Backward compatibility class to ensure that cl.hpp code works with cl2.hpp. * Please use KernelFunctor directly. */ template struct make_kernel { typedef KernelFunctor FunctorType; FunctorType functor_; make_kernel( const Program& program, const string name, cl_int * err = NULL) : functor_(FunctorType(program, name, err)) {} make_kernel( const Kernel kernel) : functor_(FunctorType(kernel)) {} //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, Ts...); Event operator()( const EnqueueArgs& enqueueArgs, Ts... 
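    /* Illustrative KernelFunctor usage (a sketch, not part of the original
     * header; "program", "queue" and the buffers are assumed, and the kernel
     * name "vadd" is hypothetical). The template arguments must match the
     * kernel's parameter list in order:
     *
     *   auto vadd = cl::KernelFunctor<cl::Buffer, cl::Buffer, cl::Buffer>(
     *       program, "vadd");
     *   cl::Event e = vadd(cl::EnqueueArgs(queue, cl::NDRange(1024)),
     *                      bufA, bufB, bufC);
     *   e.wait();
     */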
args) { return functor_( enqueueArgs, args...); } }; } // namespace compatibility //---------------------------------------------------------------------------------------------------------------------- #undef CL_HPP_ERR_STR_ #if !defined(CL_HPP_USER_OVERRIDE_ERROR_STRINGS) #undef __GET_DEVICE_INFO_ERR #undef __GET_PLATFORM_INFO_ERR #undef __GET_DEVICE_IDS_ERR #undef __GET_CONTEXT_INFO_ERR #undef __GET_EVENT_INFO_ERR #undef __GET_EVENT_PROFILE_INFO_ERR #undef __GET_MEM_OBJECT_INFO_ERR #undef __GET_IMAGE_INFO_ERR #undef __GET_SAMPLER_INFO_ERR #undef __GET_KERNEL_INFO_ERR #undef __GET_KERNEL_ARG_INFO_ERR #undef __GET_KERNEL_WORK_GROUP_INFO_ERR #undef __GET_PROGRAM_INFO_ERR #undef __GET_PROGRAM_BUILD_INFO_ERR #undef __GET_COMMAND_QUEUE_INFO_ERR #undef __CREATE_CONTEXT_ERR #undef __CREATE_CONTEXT_FROM_TYPE_ERR #undef __GET_SUPPORTED_IMAGE_FORMATS_ERR #undef __CREATE_BUFFER_ERR #undef __CREATE_SUBBUFFER_ERR #undef __CREATE_IMAGE2D_ERR #undef __CREATE_IMAGE3D_ERR #undef __CREATE_SAMPLER_ERR #undef __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR #undef __CREATE_USER_EVENT_ERR #undef __SET_USER_EVENT_STATUS_ERR #undef __SET_EVENT_CALLBACK_ERR #undef __SET_PRINTF_CALLBACK_ERR #undef __WAIT_FOR_EVENTS_ERR #undef __CREATE_KERNEL_ERR #undef __SET_KERNEL_ARGS_ERR #undef __CREATE_PROGRAM_WITH_SOURCE_ERR #undef __CREATE_PROGRAM_WITH_BINARY_ERR #undef __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR #undef __BUILD_PROGRAM_ERR #undef __CREATE_KERNELS_IN_PROGRAM_ERR #undef __CREATE_COMMAND_QUEUE_ERR #undef __SET_COMMAND_QUEUE_PROPERTY_ERR #undef __ENQUEUE_READ_BUFFER_ERR #undef __ENQUEUE_WRITE_BUFFER_ERR #undef __ENQUEUE_READ_BUFFER_RECT_ERR #undef __ENQUEUE_WRITE_BUFFER_RECT_ERR #undef __ENQEUE_COPY_BUFFER_ERR #undef __ENQEUE_COPY_BUFFER_RECT_ERR #undef __ENQUEUE_READ_IMAGE_ERR #undef __ENQUEUE_WRITE_IMAGE_ERR #undef __ENQUEUE_COPY_IMAGE_ERR #undef __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR #undef __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR #undef __ENQUEUE_MAP_BUFFER_ERR #undef __ENQUEUE_MAP_IMAGE_ERR #undef __ENQUEUE_UNMAP_MEM_OBJECT_ERR #undef __ENQUEUE_NDRANGE_KERNEL_ERR #undef __ENQUEUE_TASK_ERR #undef __ENQUEUE_NATIVE_KERNEL #undef __UNLOAD_COMPILER_ERR #undef __CREATE_SUB_DEVICES_ERR #undef __CREATE_PIPE_ERR #undef __GET_PIPE_INFO_ERR #endif //CL_HPP_USER_OVERRIDE_ERROR_STRINGS // Extensions #undef CL_HPP_INIT_CL_EXT_FCN_PTR_ #undef CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_ #if defined(CL_HPP_USE_CL_DEVICE_FISSION) #undef CL_HPP_PARAM_NAME_DEVICE_FISSION_ #endif // CL_HPP_USE_CL_DEVICE_FISSION #undef CL_HPP_NOEXCEPT_ #undef CL_HPP_DEFINE_STATIC_MEMBER_ } // namespace cl #endif // CL_HPP_ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/cl_d3d10.h000066400000000000000000000120021450307266000232570ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. 
THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_D3D10_H #define __OPENCL_CL_D3D10_H #include #include #include #ifdef __cplusplus extern "C" { #endif /****************************************************************************** * cl_khr_d3d10_sharing */ #define cl_khr_d3d10_sharing 1 typedef cl_uint cl_d3d10_device_source_khr; typedef cl_uint cl_d3d10_device_set_khr; /******************************************************************************/ /* Error Codes */ #define CL_INVALID_D3D10_DEVICE_KHR -1002 #define CL_INVALID_D3D10_RESOURCE_KHR -1003 #define CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR -1004 #define CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR -1005 /* cl_d3d10_device_source_nv */ #define CL_D3D10_DEVICE_KHR 0x4010 #define CL_D3D10_DXGI_ADAPTER_KHR 0x4011 /* cl_d3d10_device_set_nv */ #define CL_PREFERRED_DEVICES_FOR_D3D10_KHR 0x4012 #define CL_ALL_DEVICES_FOR_D3D10_KHR 0x4013 /* cl_context_info */ #define CL_CONTEXT_D3D10_DEVICE_KHR 0x4014 #define CL_CONTEXT_D3D10_PREFER_SHARED_RESOURCES_KHR 0x402C /* cl_mem_info */ #define CL_MEM_D3D10_RESOURCE_KHR 0x4015 /* cl_image_info */ #define CL_IMAGE_D3D10_SUBRESOURCE_KHR 0x4016 /* cl_command_type */ #define CL_COMMAND_ACQUIRE_D3D10_OBJECTS_KHR 0x4017 #define CL_COMMAND_RELEASE_D3D10_OBJECTS_KHR 0x4018 /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromD3D10KHR_fn)( cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10BufferKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Buffer * resource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10Texture2DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Texture2D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10Texture3DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Texture3D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireD3D10ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseD3D10ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) 
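/* Illustrative sketch (not part of the original header): the *_fn typedefs
 * above are intended for runtime lookup of the extension entry points; the
 * "d3d10_device" pointer is assumed:
 *
 *   clGetDeviceIDsFromD3D10KHR_fn pfn = (clGetDeviceIDsFromD3D10KHR_fn)
 *       clGetExtensionFunctionAddressForPlatform(
 *           platform, "clGetDeviceIDsFromD3D10KHR");
 *   cl_uint n = 0;
 *   if (pfn != NULL)
 *       pfn(platform, CL_D3D10_DEVICE_KHR, d3d10_device,
 *           CL_PREFERRED_DEVICES_FOR_D3D10_KHR, 0, NULL, &n);
 */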
CL_API_SUFFIX__VERSION_1_0; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_D3D10_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/cl_d3d11.h000066400000000000000000000117741450307266000232770ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_D3D11_H #define __OPENCL_CL_D3D11_H #include #include #include #ifdef __cplusplus extern "C" { #endif /****************************************************************************** * cl_khr_d3d11_sharing */ #define cl_khr_d3d11_sharing 1 typedef cl_uint cl_d3d11_device_source_khr; typedef cl_uint cl_d3d11_device_set_khr; /******************************************************************************/ /* Error Codes */ #define CL_INVALID_D3D11_DEVICE_KHR -1006 #define CL_INVALID_D3D11_RESOURCE_KHR -1007 #define CL_D3D11_RESOURCE_ALREADY_ACQUIRED_KHR -1008 #define CL_D3D11_RESOURCE_NOT_ACQUIRED_KHR -1009 /* cl_d3d11_device_source */ #define CL_D3D11_DEVICE_KHR 0x4019 #define CL_D3D11_DXGI_ADAPTER_KHR 0x401A /* cl_d3d11_device_set */ #define CL_PREFERRED_DEVICES_FOR_D3D11_KHR 0x401B #define CL_ALL_DEVICES_FOR_D3D11_KHR 0x401C /* cl_context_info */ #define CL_CONTEXT_D3D11_DEVICE_KHR 0x401D #define CL_CONTEXT_D3D11_PREFER_SHARED_RESOURCES_KHR 0x402D /* cl_mem_info */ #define CL_MEM_D3D11_RESOURCE_KHR 0x401E /* cl_image_info */ #define CL_IMAGE_D3D11_SUBRESOURCE_KHR 0x401F /* cl_command_type */ #define CL_COMMAND_ACQUIRE_D3D11_OBJECTS_KHR 0x4020 #define CL_COMMAND_RELEASE_D3D11_OBJECTS_KHR 0x4021 /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromD3D11KHR_fn)( cl_platform_id platform, cl_d3d11_device_source_khr d3d_device_source, void * d3d_object, cl_d3d11_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11BufferKHR_fn)( cl_context context, cl_mem_flags 
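/* Illustrative acquire/release pattern for D3D11 sharing (a sketch, not part
 * of the original header; "pfnAcquire"/"pfnRelease" are assumed to have been
 * resolved via clGetExtensionFunctionAddressForPlatform, and "shared_mem" is
 * a cl_mem previously created from a D3D11 resource):
 *
 *   pfnAcquire(queue, 1, &shared_mem, 0, NULL, NULL);  // D3D11 -> OpenCL
 *   // ... enqueue kernels that read or write the shared resource ...
 *   pfnRelease(queue, 1, &shared_mem, 0, NULL, NULL);  // OpenCL -> D3D11
 *   clFinish(queue);
 */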
flags, ID3D11Buffer * resource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11Texture2DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D11Texture2D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11Texture3DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D11Texture3D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireD3D11ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseD3D11ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_D3D11_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/cl_dx9_media_sharing.h000066400000000000000000000124551450307266000260360ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. 
**********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_DX9_MEDIA_SHARING_H #define __OPENCL_CL_DX9_MEDIA_SHARING_H #include #include #ifdef __cplusplus extern "C" { #endif /******************************************************************************/ /* cl_khr_dx9_media_sharing */ #define cl_khr_dx9_media_sharing 1 typedef cl_uint cl_dx9_media_adapter_type_khr; typedef cl_uint cl_dx9_media_adapter_set_khr; #if defined(_WIN32) #include typedef struct _cl_dx9_surface_info_khr { IDirect3DSurface9 *resource; HANDLE shared_handle; } cl_dx9_surface_info_khr; #endif /******************************************************************************/ /* Error Codes */ #define CL_INVALID_DX9_MEDIA_ADAPTER_KHR -1010 #define CL_INVALID_DX9_MEDIA_SURFACE_KHR -1011 #define CL_DX9_MEDIA_SURFACE_ALREADY_ACQUIRED_KHR -1012 #define CL_DX9_MEDIA_SURFACE_NOT_ACQUIRED_KHR -1013 /* cl_media_adapter_type_khr */ #define CL_ADAPTER_D3D9_KHR 0x2020 #define CL_ADAPTER_D3D9EX_KHR 0x2021 #define CL_ADAPTER_DXVA_KHR 0x2022 /* cl_media_adapter_set_khr */ #define CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR 0x2023 #define CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR 0x2024 /* cl_context_info */ #define CL_CONTEXT_ADAPTER_D3D9_KHR 0x2025 #define CL_CONTEXT_ADAPTER_D3D9EX_KHR 0x2026 #define CL_CONTEXT_ADAPTER_DXVA_KHR 0x2027 /* cl_mem_info */ #define CL_MEM_DX9_MEDIA_ADAPTER_TYPE_KHR 0x2028 #define CL_MEM_DX9_MEDIA_SURFACE_INFO_KHR 0x2029 /* cl_image_info */ #define CL_IMAGE_DX9_MEDIA_PLANE_KHR 0x202A /* cl_command_type */ #define CL_COMMAND_ACQUIRE_DX9_MEDIA_SURFACES_KHR 0x202B #define CL_COMMAND_RELEASE_DX9_MEDIA_SURFACES_KHR 0x202C /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromDX9MediaAdapterKHR_fn)( cl_platform_id platform, cl_uint num_media_adapters, cl_dx9_media_adapter_type_khr * media_adapter_type, void * media_adapters, cl_dx9_media_adapter_set_khr media_adapter_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromDX9MediaSurfaceKHR_fn)( cl_context context, cl_mem_flags flags, cl_dx9_media_adapter_type_khr adapter_type, void * surface_info, cl_uint plane, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireDX9MediaSurfacesKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseDX9MediaSurfacesKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_DX9_MEDIA_SHARING_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/cl_egl.h000066400000000000000000000123561450307266000232270ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. 
* * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ #ifndef __OPENCL_CL_EGL_H #define __OPENCL_CL_EGL_H #ifdef __APPLE__ #else #include #endif #ifdef __cplusplus extern "C" { #endif /* Command type for events created with clEnqueueAcquireEGLObjectsKHR */ #define CL_COMMAND_EGL_FENCE_SYNC_OBJECT_KHR 0x202F #define CL_COMMAND_ACQUIRE_EGL_OBJECTS_KHR 0x202D #define CL_COMMAND_RELEASE_EGL_OBJECTS_KHR 0x202E /* Error type for clCreateFromEGLImageKHR */ #define CL_INVALID_EGL_OBJECT_KHR -1093 #define CL_EGL_RESOURCE_NOT_ACQUIRED_KHR -1092 /* CLeglImageKHR is an opaque handle to an EGLImage */ typedef void* CLeglImageKHR; /* CLeglDisplayKHR is an opaque handle to an EGLDisplay */ typedef void* CLeglDisplayKHR; /* CLeglSyncKHR is an opaque handle to an EGLSync object */ typedef void* CLeglSyncKHR; /* properties passed to clCreateFromEGLImageKHR */ typedef intptr_t cl_egl_image_properties_khr; #define cl_khr_egl_image 1 extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromEGLImageKHR(cl_context /* context */, CLeglDisplayKHR /* egldisplay */, CLeglImageKHR /* eglimage */, cl_mem_flags /* flags */, const cl_egl_image_properties_khr * /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromEGLImageKHR_fn)( cl_context context, CLeglDisplayKHR egldisplay, CLeglImageKHR eglimage, cl_mem_flags flags, const cl_egl_image_properties_khr * properties, cl_int * errcode_ret); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireEGLObjectsKHR(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireEGLObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseEGLObjectsKHR(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* 
event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseEGLObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event); #define cl_khr_egl_event 1 extern CL_API_ENTRY cl_event CL_API_CALL clCreateEventFromEGLSyncKHR(cl_context /* context */, CLeglSyncKHR /* sync */, CLeglDisplayKHR /* display */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_event (CL_API_CALL *clCreateEventFromEGLSyncKHR_fn)( cl_context context, CLeglSyncKHR sync, CLeglDisplayKHR display, cl_int * errcode_ret); #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_EGL_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/cl_ext.h000066400000000000000000000770401450307266000232610ustar00rootroot00000000000000/* Modifications Copyright (C) 2010-2021 Advanced Micro Devices, Inc. */ /******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ /* $Revision: 11928 $ on $Date: 2010-07-13 09:04:56 -0700 (Tue, 13 Jul 2010) $ */ /* cl_ext.h contains OpenCL extensions which don't have external */ /* (OpenGL, D3D) dependencies. */ #ifndef __CL_EXT_H #define __CL_EXT_H #ifdef __cplusplus extern "C" { #endif #ifdef __APPLE__ #include #include #else #include #endif /* cl_khr_fp16 extension - no extension #define since it has no functions */ #define CL_DEVICE_HALF_FP_CONFIG 0x1033 /* Memory object destruction * * Apple extension for use to manage externally allocated buffers used with cl_mem objects with CL_MEM_USE_HOST_PTR * * Registers a user callback function that will be called when the memory object is deleted and its resources * freed. Each call to clSetMemObjectCallbackFn registers the specified user callback function on a callback * stack associated with memobj. The registered user callback functions are called in the reverse order in * which they were registered. The user callback functions are called and then the memory object is deleted * and its resources freed. 
This provides a mechanism for the application (and libraries) using memobj to be * notified when the memory referenced by host_ptr, specified when the memory object is created and used as * the storage bits for the memory object, can be reused or freed. * * The application may not call CL api's with the cl_mem object passed to the pfn_notify. * * Please check for the "cl_APPLE_SetMemObjectDestructor" extension using clGetDeviceInfo(CL_DEVICE_EXTENSIONS) * before using. */ #define cl_APPLE_SetMemObjectDestructor 1 cl_int CL_API_ENTRY clSetMemObjectDestructorAPPLE( cl_mem /* memobj */, void (* /*pfn_notify*/)( cl_mem /* memobj */, void* /*user_data*/), void * /*user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /* Context Logging Functions * * The next three convenience functions are intended to be used as the pfn_notify parameter to clCreateContext(). * Please check for the "cl_APPLE_ContextLoggingFunctions" extension using clGetDeviceInfo(CL_DEVICE_EXTENSIONS) * before using. * * clLogMessagesToSystemLog fowards on all log messages to the Apple System Logger */ #define cl_APPLE_ContextLoggingFunctions 1 extern void CL_API_ENTRY clLogMessagesToSystemLogAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /* clLogMessagesToStdout sends all log messages to the file descriptor stdout */ extern void CL_API_ENTRY clLogMessagesToStdoutAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /* clLogMessagesToStderr sends all log messages to the file descriptor stderr */ extern void CL_API_ENTRY clLogMessagesToStderrAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /************************ * cl_khr_icd extension * ************************/ #define cl_khr_icd 1 /* cl_platform_info */ #define CL_PLATFORM_ICD_SUFFIX_KHR 0x0920 /* Additional Error Codes */ #define CL_PLATFORM_NOT_FOUND_KHR -1001 extern CL_API_ENTRY cl_int CL_API_CALL clIcdGetPlatformIDsKHR(cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */); typedef CL_API_ENTRY cl_int (CL_API_CALL *clIcdGetPlatformIDsKHR_fn)( cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */); /* Extension: cl_khr_image2D_buffer * * This extension allows a 2D image to be created from a cl_mem buffer without a copy. * The type associated with a 2D image created from a buffer in an OpenCL program is image2d_t. * Both the sampler and sampler-less read_image built-in functions are supported for 2D images * and 2D images created from a buffer. Similarly, the write_image built-ins are also supported * for 2D images created from a buffer. * * When the 2D image from buffer is created, the client must specify the width, * height, image format (i.e. channel order and channel data type) and optionally the row pitch * * The pitch specified must be a multiple of CL_DEVICE_IMAGE_PITCH_ALIGNMENT pixels. * The base address of the buffer must be aligned to CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT pixels. 
*/ /************************************* * cl_khr_initalize_memory extension * *************************************/ #define CL_CONTEXT_MEMORY_INITIALIZE_KHR 0x2030 /************************************** * cl_khr_terminate_context extension * **************************************/ #define CL_DEVICE_TERMINATE_CAPABILITY_KHR 0x2031 #define CL_CONTEXT_TERMINATE_KHR 0x2032 #define cl_khr_terminate_context 1 extern CL_API_ENTRY cl_int CL_API_CALL clTerminateContextKHR(cl_context /* context */) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clTerminateContextKHR_fn)(cl_context /* context */) CL_EXT_SUFFIX__VERSION_1_2; /* * Extension: cl_khr_spir * * This extension adds support to create an OpenCL program object from a * Standard Portable Intermediate Representation (SPIR) instance */ #define CL_DEVICE_SPIR_VERSIONS 0x40E0 #define CL_PROGRAM_BINARY_TYPE_INTERMEDIATE 0x40E1 /****************************************** * cl_nv_device_attribute_query extension * ******************************************/ /* cl_nv_device_attribute_query extension - no extension #define since it has no functions */ #define CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV 0x4000 #define CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV 0x4001 #define CL_DEVICE_REGISTERS_PER_BLOCK_NV 0x4002 #define CL_DEVICE_WARP_SIZE_NV 0x4003 #define CL_DEVICE_GPU_OVERLAP_NV 0x4004 #define CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV 0x4005 #define CL_DEVICE_INTEGRATED_MEMORY_NV 0x4006 /********************************* * cl_amd_device_memory_flags * *********************************/ #define cl_amd_device_memory_flags 1 #define CL_MEM_USE_PERSISTENT_MEM_AMD (1 << 6) // Alloc from GPU's CPU visible heap /* cl_device_info */ #define CL_DEVICE_MAX_ATOMIC_COUNTERS_EXT 0x4032 /********************************* * cl_amd_device_attribute_query * *********************************/ #define CL_DEVICE_PROFILING_TIMER_OFFSET_AMD 0x4036 #define CL_DEVICE_TOPOLOGY_AMD 0x4037 #define CL_DEVICE_BOARD_NAME_AMD 0x4038 #define CL_DEVICE_GLOBAL_FREE_MEMORY_AMD 0x4039 #define CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD 0x4040 #define CL_DEVICE_SIMD_WIDTH_AMD 0x4041 #define CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD 0x4042 #define CL_DEVICE_WAVEFRONT_WIDTH_AMD 0x4043 #define CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD 0x4044 #define CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD 0x4045 #define CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD 0x4046 #define CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD 0x4047 #define CL_DEVICE_LOCAL_MEM_BANKS_AMD 0x4048 #define CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD 0x4049 #define CL_DEVICE_GFXIP_MAJOR_AMD 0x404A #define CL_DEVICE_GFXIP_MINOR_AMD 0x404B #define CL_DEVICE_AVAILABLE_ASYNC_QUEUES_AMD 0x404C #define CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD 0x4030 #define CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD 0x4031 #define CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD 0x4033 #define CL_DEVICE_PCIE_ID_AMD 0x4034 typedef union { struct { cl_uint type; cl_uint data[5]; } raw; struct { cl_uint type; cl_uchar unused[17]; cl_uchar bus; cl_uchar device; cl_uchar function; } pcie; } cl_device_topology_amd; #define CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD 1 /************************** * cl_amd_offline_devices * **************************/ #define CL_CONTEXT_OFFLINE_DEVICES_AMD 0x403F /******************************** * cl_amd_bus_addressable_memory * ********************************/ /* cl_mem flag - bitfield */ #define CL_MEM_BUS_ADDRESSABLE_AMD (1<<30) #define CL_MEM_EXTERNAL_PHYSICAL_AMD (1<<31) #define CL_COMMAND_WAIT_SIGNAL_AMD 0x4080 #define CL_COMMAND_WRITE_SIGNAL_AMD 0x4081 #define 
typedef struct _cl_bus_address_amd
{
    cl_ulong surface_bus_address;
    cl_ulong marker_bus_address;
} cl_bus_address_amd;

typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueWaitSignalAMD_fn)( cl_command_queue /*command_queue*/, cl_mem /*mem_object*/, cl_uint /*value*/, cl_uint /*num_events*/, const cl_event * /*event_wait_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2;

typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueWriteSignalAMD_fn)( cl_command_queue /*command_queue*/, cl_mem /*mem_object*/, cl_uint /*value*/, cl_ulong /*offset*/, cl_uint /*num_events*/, const cl_event * /*event_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2;

typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueMakeBuffersResidentAMD_fn)( cl_command_queue /*command_queue*/, cl_uint /*num_mem_objs*/, cl_mem * /*mem_objects*/, cl_bool /*blocking_make_resident*/, cl_bus_address_amd * /*bus_addresses*/, cl_uint /*num_events*/, const cl_event * /*event_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2;

/**************************
 * cl_amd_copy_buffer_p2p *
 **************************/
#define CL_DEVICE_NUM_P2P_DEVICES_AMD               0x4088
#define CL_DEVICE_P2P_DEVICES_AMD                   0x4089

#define cl_amd_copy_buffer_p2p 1

typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueCopyBufferP2PAMD_fn)(cl_command_queue /*command_queue*/, cl_mem /*src_buffer*/, cl_mem /*dst_buffer*/, size_t /*src_offset*/, size_t /*dst_offset*/, size_t /*cb*/, cl_uint /*num_events_in_wait_list*/, const cl_event* /*event_wait_list*/, cl_event* /*event*/) CL_EXT_SUFFIX__VERSION_1_2;

/*************************************
 * cl_amd_assembly_program extension *
 *************************************/
#define cl_amd_assembly_program 1

typedef CL_API_ENTRY cl_program (CL_API_CALL * clCreateProgramWithAssemblyAMD_fn)( cl_context /* context */, cl_uint /* count */, const char** /* strings */, const size_t* /* lengths */, cl_int* /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_2;

#ifdef CL_VERSION_2_0
/**********************
 * cl_amd_planar_yuv  *
 **********************/
/* cl_mem flag - bitfield */
#define CL_YUV_IMAGE_Y_PLANE_AMD                    0x0
#define CL_YUV_IMAGE_UV_PLANE_AMD                   0x1

typedef CL_API_ENTRY cl_mem (CL_API_CALL * clGetPlaneFromImageAMD_fn)(cl_context /*context*/, cl_mem /*mem*/, cl_uint /*plane*/, cl_int * /*errcode_ret*/) CL_EXT_SUFFIX__VERSION_2_0;
#endif

//
/*****************************
 * cl_amd_command_queue_info *
 *****************************/
#define CL_QUEUE_THREAD_HANDLE_AMD                  0x403E

/* cl_kernel_exec_info for DVR DOPP texture support */
#define CL_KERNEL_EXEC_INFO_NEW_VCOP_AMD            0x4120
#define CL_KERNEL_EXEC_INFO_PFPA_VCOP_AMD           0x4121

//
/*********************************
 * cl_arm_printf extension
 *********************************/
#define CL_PRINTF_CALLBACK_ARM                      0x40B0
#define CL_PRINTF_BUFFERSIZE_ARM                    0x40B1

#ifdef CL_VERSION_1_1
/***********************************
 * cl_ext_device_fission extension *
 ***********************************/
#define cl_ext_device_fission 1

extern CL_API_ENTRY cl_int CL_API_CALL clReleaseDeviceEXT( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1;

typedef CL_API_ENTRY cl_int (CL_API_CALL *clReleaseDeviceEXT_fn)( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1;

extern CL_API_ENTRY cl_int CL_API_CALL clRetainDeviceEXT( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1;

typedef CL_API_ENTRY cl_int (CL_API_CALL *clRetainDeviceEXT_fn)( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1;

typedef cl_ulong cl_device_partition_property_ext;
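/* A minimal sketch (editorial addition): vendor entry points such as
 * clEnqueueCopyBufferP2PAMD are not exported directly, so the _fn pointer
 * typedefs above are meant to hold addresses resolved at run time. The name
 * platform is a hypothetical valid cl_platform_id.
 */
#if 0   /* illustrative only */
static clEnqueueCopyBufferP2PAMD_fn get_copy_p2p(cl_platform_id platform)
{
    /* Returns NULL when the extension is absent; callers must check. */
    return (clEnqueueCopyBufferP2PAMD_fn)
        clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueCopyBufferP2PAMD");
}
#endif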
extern CL_API_ENTRY cl_int CL_API_CALL clCreateSubDevicesEXT( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1;

typedef CL_API_ENTRY cl_int ( CL_API_CALL * clCreateSubDevicesEXT_fn)( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1;

/* cl_device_partition_property_ext */
#define CL_DEVICE_PARTITION_EQUALLY_EXT             0x4050
#define CL_DEVICE_PARTITION_BY_COUNTS_EXT           0x4051
#define CL_DEVICE_PARTITION_BY_NAMES_EXT            0x4052
#define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN_EXT  0x4053

/* clDeviceGetInfo selectors */
#define CL_DEVICE_PARENT_DEVICE_EXT                 0x4054
#define CL_DEVICE_PARTITION_TYPES_EXT               0x4055
#define CL_DEVICE_AFFINITY_DOMAINS_EXT              0x4056
#define CL_DEVICE_REFERENCE_COUNT_EXT               0x4057
#define CL_DEVICE_PARTITION_STYLE_EXT               0x4058

/* clGetImageInfo enum */
#define CL_IMAGE_BYTE_PITCH_AMD                     0x4059

/* error codes */
#define CL_DEVICE_PARTITION_FAILED_EXT              -1057
#define CL_INVALID_PARTITION_COUNT_EXT              -1058
#define CL_INVALID_PARTITION_NAME_EXT               -1059

/* CL_AFFINITY_DOMAINs */
#define CL_AFFINITY_DOMAIN_L1_CACHE_EXT             0x1
#define CL_AFFINITY_DOMAIN_L2_CACHE_EXT             0x2
#define CL_AFFINITY_DOMAIN_L3_CACHE_EXT             0x3
#define CL_AFFINITY_DOMAIN_L4_CACHE_EXT             0x4
#define CL_AFFINITY_DOMAIN_NUMA_EXT                 0x10
#define CL_AFFINITY_DOMAIN_NEXT_FISSIONABLE_EXT     0x100

/* cl_device_partition_property_ext list terminators */
#define CL_PROPERTIES_LIST_END_EXT                  ((cl_device_partition_property_ext) 0)
#define CL_PARTITION_BY_COUNTS_LIST_END_EXT         ((cl_device_partition_property_ext) 0)
#define CL_PARTITION_BY_NAMES_LIST_END_EXT          ((cl_device_partition_property_ext) 0 - 1)

/*********************************
 * cl_qcom_ext_host_ptr extension
 *********************************/
#define CL_MEM_EXT_HOST_PTR_QCOM                    (1 << 29)

#define CL_DEVICE_EXT_MEM_PADDING_IN_BYTES_QCOM     0x40A0
#define CL_DEVICE_PAGE_SIZE_QCOM                    0x40A1
#define CL_IMAGE_ROW_ALIGNMENT_QCOM                 0x40A2
#define CL_IMAGE_SLICE_ALIGNMENT_QCOM               0x40A3
#define CL_MEM_HOST_UNCACHED_QCOM                   0x40A4
#define CL_MEM_HOST_WRITEBACK_QCOM                  0x40A5
#define CL_MEM_HOST_WRITETHROUGH_QCOM               0x40A6
#define CL_MEM_HOST_WRITE_COMBINING_QCOM            0x40A7

typedef cl_uint cl_image_pitch_info_qcom;

extern CL_API_ENTRY cl_int CL_API_CALL
clGetDeviceImageInfoQCOM(cl_device_id device, size_t image_width, size_t image_height, const cl_image_format *image_format, cl_image_pitch_info_qcom param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret);

typedef struct _cl_mem_ext_host_ptr
{
    /* Type of external memory allocation. */
    /* Legal values will be defined in layered extensions. */
    cl_uint  allocation_type;

    /* Host cache policy for this external memory allocation. */
    cl_uint  host_cache_policy;
} cl_mem_ext_host_ptr;

/*********************************
 * cl_qcom_ion_host_ptr extension
 *********************************/
#define CL_MEM_ION_HOST_PTR_QCOM                    0x40A8
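/* A minimal sketch (editorial addition): partitioning a device into
 * two-compute-unit sub-devices with the fission extension declared above.
 * The name dev is hypothetical; the call assumes the device reports
 * CL_DEVICE_PARTITION_EQUALLY_EXT among its partition types.
 */
#if 0   /* illustrative only */
static cl_uint split_in_pairs(cl_device_id dev, cl_device_id out[8])
{
    cl_device_partition_property_ext props[] = {
        CL_DEVICE_PARTITION_EQUALLY_EXT, 2,   /* 2 compute units per sub-device */
        CL_PROPERTIES_LIST_END_EXT            /* terminator required by the list format */
    };
    cl_uint n = 0;
    if (clCreateSubDevicesEXT(dev, props, 8, out, &n) != CL_SUCCESS)
        n = 0;
    return n;
}
#endif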
typedef struct _cl_mem_ion_host_ptr
{
    /* Type of external memory allocation. */
    /* Must be CL_MEM_ION_HOST_PTR_QCOM for ION allocations. */
    cl_mem_ext_host_ptr  ext_host_ptr;

    /* ION file descriptor */
    int                  ion_filedesc;

    /* Host pointer to the ION allocated memory */
    void*                ion_hostptr;
} cl_mem_ion_host_ptr;

#endif /* CL_VERSION_1_1 */

#if defined(CL_VERSION_1_2)
/******************************
 * cl_img_yuv_image extension *
 ******************************/
/* Image formats used in clCreateImage */
#define CL_NV21_IMG                                 0x40D0
#define CL_YV12_IMG                                 0x40D1

/***************************************
 * cl_img_cached_allocations extension *
 ***************************************/
/* Flag values used by clCreateBuffer */
#define CL_MEM_USE_UNCACHED_CPU_MEMORY_IMG          (1 << 26)
#define CL_MEM_USE_CACHED_CPU_MEMORY_IMG            (1 << 27)

/**************************************
 * cl_img_use_gralloc_ptr extension   *
 **************************************/
/* Flag values used by clCreateBuffer */
#define CL_MEM_USE_GRALLOC_PTR_IMG                  (1 << 28)

/* To be used by clGetEventInfo: */
#define CL_COMMAND_ACQUIRE_GRALLOC_OBJECTS_IMG      0x40D2
#define CL_COMMAND_RELEASE_GRALLOC_OBJECTS_IMG      0x40D3

/* Error code from clEnqueueReleaseGrallocObjectsIMG */
#define CL_GRALLOC_RESOURCE_NOT_ACQUIRED_IMG        0x40D4

extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueAcquireGrallocObjectsIMG(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2;

extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueReleaseGrallocObjectsIMG(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2;
#endif /* CL_VERSION_1_2 */

#ifdef CL_VERSION_2_0
/******************************
 * cl_khr_subgroups extension *
 ******************************/
#define cl_khr_subgroups 1

/* cl_kernel_sub_group_info is declared in CL.h. */
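/* A minimal sketch (editorial addition): querying the sub-group size a kernel
 * would use for a proposed local work size, via the clGetKernelSubGroupInfoKHR
 * entry point declared just below. The names kernel and dev are hypothetical
 * valid objects, and the extension is assumed to be present.
 */
#if 0   /* illustrative only */
static size_t max_subgroup_size_for(cl_kernel kernel, cl_device_id dev,
                                    size_t local_size)
{
    size_t subgroup_max = 0;
    /* input_value is the proposed NDRange local size; output is in work-items */
    clGetKernelSubGroupInfoKHR(kernel, dev,
                               CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR,
                               sizeof(local_size), &local_size,
                               sizeof(subgroup_max), &subgroup_max, NULL);
    return subgroup_max;
}
#endif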
/* cl_kernel_sub_group_info */
#define CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR    0x2033
#define CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE_KHR       0x2034

extern CL_API_ENTRY cl_int CL_API_CALL
clGetKernelSubGroupInfoKHR(cl_kernel /* in_kernel */, cl_device_id /*in_device*/, cl_kernel_sub_group_info /* param_name */, size_t /*input_value_size*/, const void * /*input_value*/, size_t /*param_value_size*/, void* /*param_value*/, size_t* /*param_value_size_ret*/ ) CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED;

typedef CL_API_ENTRY cl_int ( CL_API_CALL * clGetKernelSubGroupInfoKHR_fn)(cl_kernel /* in_kernel */, cl_device_id /*in_device*/, cl_kernel_sub_group_info /* param_name */, size_t /*input_value_size*/, const void * /*input_value*/, size_t /*param_value_size*/, void* /*param_value*/, size_t* /*param_value_size_ret*/ ) CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED;
#endif /* CL_VERSION_2_0 */

#ifdef CL_VERSION_2_1
/***********************************
 * cl_khr_priority_hints extension *
 ***********************************/
#define cl_khr_priority_hints 1

typedef cl_uint cl_queue_priority_khr;

/* cl_command_queue_properties */
#define CL_QUEUE_PRIORITY_KHR                       0x1096

/* cl_queue_priority_khr */
#define CL_QUEUE_PRIORITY_HIGH_KHR                  (1<<0)
#define CL_QUEUE_PRIORITY_MED_KHR                   (1<<1)
#define CL_QUEUE_PRIORITY_LOW_KHR                   (1<<2)
#endif /* CL_VERSION_2_1 */

#ifdef CL_VERSION_2_1
/***********************************
 * cl_khr_throttle_hints extension *
 ***********************************/
#define cl_khr_throttle_hints 1

typedef cl_uint cl_queue_throttle_khr;

/* cl_command_queue_properties */
#define CL_QUEUE_THROTTLE_KHR                       0x1097

/* cl_queue_throttle_khr */
#define CL_QUEUE_THROTTLE_HIGH_KHR                  (1<<0)
#define CL_QUEUE_THROTTLE_MED_KHR                   (1<<1)
#define CL_QUEUE_THROTTLE_LOW_KHR                   (1<<2)
#endif /* CL_VERSION_2_1 */

/**********************************
 * cl_arm_import_memory extension *
 **********************************/
#ifdef CL_VERSION_1_0
typedef intptr_t cl_import_properties_arm;

/* Default and valid property names for cl_arm_import_memory */
#define CL_IMPORT_TYPE_ARM                          0x40B2

/* Host process memory type default value for CL_IMPORT_TYPE_ARM property */
#define CL_IMPORT_TYPE_HOST_ARM                     0x40B3

/* DMA BUF memory type value for CL_IMPORT_TYPE_ARM property */
#define CL_IMPORT_TYPE_DMA_BUF_ARM                  0x40B4

/* Secure DMA BUF memory type value for CL_IMPORT_TYPE_ARM property */
#define CL_IMPORT_TYPE_SECURE_ARM                   0x40B5

/* This extension adds a new function that allows for direct memory import into
 * OpenCL via the clImportMemoryARM function.
 *
 * Memory imported through this interface will be mapped into the device's page
 * tables directly, providing zero copy access. It will never fall back to copy
 * operations and aliased buffers.
 *
 * Types of memory supported for import are specified as additional extension
 * strings.
 *
 * This extension produces cl_mem allocations which are compatible with all other
 * users of cl_mem in the standard API.
 *
 * This extension maps pages with the same properties as the normal buffer creation
 * function clCreateBuffer.
*/ extern CL_API_ENTRY cl_mem CL_API_CALL clImportMemoryARM( cl_context context, cl_mem_flags flags, const cl_import_properties_arm *properties, void *memory, size_t size, cl_int *errcode_ret) CL_EXT_SUFFIX__VERSION_1_0; #endif /* CL_VERSION_1_0 */ /****************************************** * cl_arm_shared_virtual_memory extension * ******************************************/ #ifdef CL_VERSION_1_2 /* Used by clGetDeviceInfo */ #define CL_DEVICE_SVM_CAPABILITIES_ARM 0x40B6 /* Used by clGetMemObjectInfo */ #define CL_MEM_USES_SVM_POINTER_ARM 0x40B7 /* Used by clSetKernelExecInfoARM: */ #define CL_KERNEL_EXEC_INFO_SVM_PTRS_ARM 0x40B8 #define CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM_ARM 0x40B9 /* To be used by clGetEventInfo: */ #define CL_COMMAND_SVM_FREE_ARM 0x40BA #define CL_COMMAND_SVM_MEMCPY_ARM 0x40BB #define CL_COMMAND_SVM_MEMFILL_ARM 0x40BC #define CL_COMMAND_SVM_MAP_ARM 0x40BD #define CL_COMMAND_SVM_UNMAP_ARM 0x40BE /* Flag values returned by clGetDeviceInfo with CL_DEVICE_SVM_CAPABILITIES_ARM as the param_name. */ #define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER_ARM (1 << 0) #define CL_DEVICE_SVM_FINE_GRAIN_BUFFER_ARM (1 << 1) #define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM_ARM (1 << 2) #define CL_DEVICE_SVM_ATOMICS_ARM (1 << 3) /* Flag values used by clSVMAllocARM: */ #define CL_MEM_SVM_FINE_GRAIN_BUFFER_ARM (1 << 10) #define CL_MEM_SVM_ATOMICS_ARM (1 << 11) typedef cl_bitfield cl_svm_mem_flags_arm; typedef cl_uint cl_kernel_exec_info_arm; typedef cl_bitfield cl_device_svm_capabilities_arm; extern CL_API_ENTRY void * CL_API_CALL clSVMAllocARM(cl_context /* context */, cl_svm_mem_flags_arm /* flags */, size_t /* size */, cl_uint /* alignment */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY void CL_API_CALL clSVMFreeARM(cl_context /* context */, void * /* svm_pointer */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMFreeARM(cl_command_queue /* command_queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void (CL_CALLBACK * /*pfn_free_func*/)(cl_command_queue /* queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void * /* user_data */), void * /* user_data */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemcpyARM(cl_command_queue /* command_queue */, cl_bool /* blocking_copy */, void * /* dst_ptr */, const void * /* src_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemFillARM(cl_command_queue /* command_queue */, void * /* svm_ptr */, const void * /* pattern */, size_t /* pattern_size */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMapARM(cl_command_queue /* command_queue */, cl_bool /* blocking_map */, cl_map_flags /* flags */, void * /* svm_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMUnmapARM(cl_command_queue /* command_queue */, void * /* svm_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int 
CL_API_CALL clSetKernelArgSVMPointerARM(cl_kernel /* kernel */, cl_uint /* arg_index */, const void * /* arg_value */) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelExecInfoARM(cl_kernel /* kernel */, cl_kernel_exec_info_arm /* param_name */, size_t /* param_value_size */, const void * /* param_value */) CL_EXT_SUFFIX__VERSION_1_2; #endif /* CL_VERSION_1_2 */ #ifdef __cplusplus } #endif #endif /* __CL_EXT_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/cl_gl.h000066400000000000000000000166371450307266000230700ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. 
**********************************************************************************/

#ifndef __OPENCL_CL_GL_H
#define __OPENCL_CL_GL_H

#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

#ifdef __cplusplus
extern "C" {
#endif

typedef cl_uint     cl_gl_object_type;
typedef cl_uint     cl_gl_texture_info;
typedef cl_uint     cl_gl_platform_info;
typedef struct __GLsync *cl_GLsync;

/* cl_gl_object_type = 0x2000 - 0x200F enum values are currently taken */
#define CL_GL_OBJECT_BUFFER                     0x2000
#define CL_GL_OBJECT_TEXTURE2D                  0x2001
#define CL_GL_OBJECT_TEXTURE3D                  0x2002
#define CL_GL_OBJECT_RENDERBUFFER               0x2003
#define CL_GL_OBJECT_TEXTURE2D_ARRAY            0x200E
#define CL_GL_OBJECT_TEXTURE1D                  0x200F
#define CL_GL_OBJECT_TEXTURE1D_ARRAY            0x2010
#define CL_GL_OBJECT_TEXTURE_BUFFER             0x2011

/* cl_gl_texture_info */
#define CL_GL_TEXTURE_TARGET                    0x2004
#define CL_GL_MIPMAP_LEVEL                      0x2005
#define CL_GL_NUM_SAMPLES                       0x2012

extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromGLBuffer(cl_context /* context */, cl_mem_flags /* flags */, cl_GLuint /* bufobj */, int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0;

extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromGLTexture(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2;

extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromGLRenderbuffer(cl_context /* context */, cl_mem_flags /* flags */, cl_GLuint /* renderbuffer */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0;

extern CL_API_ENTRY cl_int CL_API_CALL
clGetGLObjectInfo(cl_mem /* memobj */, cl_gl_object_type * /* gl_object_type */, cl_GLuint * /* gl_object_name */) CL_API_SUFFIX__VERSION_1_0;

extern CL_API_ENTRY cl_int CL_API_CALL
clGetGLTextureInfo(cl_mem /* memobj */, cl_gl_texture_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0;

extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueAcquireGLObjects(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0;

extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueReleaseGLObjects(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0;


/* Deprecated OpenCL 1.1 APIs */
extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL
clCreateFromGLTexture2D(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL
clCreateFromGLTexture3D(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

/* cl_khr_gl_sharing extension */
#define cl_khr_gl_sharing 1

typedef cl_uint     cl_gl_context_info;

/* Additional Error Codes */
#define CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR  -1000

/* cl_gl_context_info */
#define CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR    0x2006
#define CL_DEVICES_FOR_GL_CONTEXT_KHR           0x2007

/* Additional cl_context_properties */
#define CL_GL_CONTEXT_KHR                       0x2008
#define
CL_EGL_DISPLAY_KHR 0x2009 #define CL_GLX_DISPLAY_KHR 0x200A #define CL_WGL_HDC_KHR 0x200B #define CL_CGL_SHAREGROUP_KHR 0x200C extern CL_API_ENTRY cl_int CL_API_CALL clGetGLContextInfoKHR(const cl_context_properties * /* properties */, cl_gl_context_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetGLContextInfoKHR_fn)( const cl_context_properties * properties, cl_gl_context_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret); #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_GL_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/cl_gl_ext.h000066400000000000000000000054651450307266000237450ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ /* cl_gl_ext.h contains vendor (non-KHR) OpenCL extensions which have */ /* OpenGL dependencies. */ #ifndef __OPENCL_CL_GL_EXT_H #define __OPENCL_CL_GL_EXT_H #ifdef __cplusplus extern "C" { #endif #ifdef __APPLE__ #include #else #include #endif /* * For each extension, follow this template * cl_VEN_extname extension */ /* #define cl_VEN_extname 1 * ... define new types, if any * ... define new tokens, if any * ... define new APIs, if any * * If you need GLtypes here, mirror them with a cl_GLtype, rather than including a GL header * This allows us to avoid having to decide whether to include GL headers or GLES here. 
*/ /* * cl_khr_gl_event extension * See section 9.9 in the OpenCL 1.1 spec for more information */ #define CL_COMMAND_GL_FENCE_SYNC_OBJECT_KHR 0x200D extern CL_API_ENTRY cl_event CL_API_CALL clCreateEventFromGLsyncKHR(cl_context /* context */, cl_GLsync /* cl_GLsync */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_GL_EXT_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/cl_platform.h000066400000000000000000001324451450307266000243060ustar00rootroot00000000000000/********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11803 $ on $Date: 2010-06-25 10:02:12 -0700 (Fri, 25 Jun 2010) $ */ #ifndef __CL_PLATFORM_H #define __CL_PLATFORM_H #ifdef __APPLE__ /* Contains #defines for AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER below */ #include #endif #ifdef __cplusplus extern "C" { #endif #if defined(_WIN32) #define CL_API_ENTRY #define CL_API_CALL __stdcall #define CL_CALLBACK __stdcall #else #define CL_API_ENTRY #define CL_API_CALL #define CL_CALLBACK #endif /* * Deprecation flags refer to the last version of the header in which the * feature was not deprecated. * * E.g. VERSION_1_1_DEPRECATED means the feature is present in 1.1 without * deprecation but is deprecated in versions later than 1.1. 
*/ #ifdef __APPLE__ #define CL_EXTENSION_WEAK_LINK __attribute__((weak_import)) #define CL_API_SUFFIX__VERSION_1_0 AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_0 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER #define CL_API_SUFFIX__VERSION_1_1 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define GCL_API_SUFFIX__VERSION_1_1 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_1 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER_BUT_DEPRECATED_IN_MAC_OS_X_VERSION_10_7 #ifdef AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define GCL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_2 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER_BUT_DEPRECATED_IN_MAC_OS_X_VERSION_10_8 #else #warning This path should never happen outside of internal operating system development. AvailabilityMacros do not function correctly here! #define CL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define GCL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_2 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #endif #else #define CL_EXTENSION_WEAK_LINK #define CL_API_SUFFIX__VERSION_1_0 #define CL_EXT_SUFFIX__VERSION_1_0 #define CL_API_SUFFIX__VERSION_1_1 #define CL_EXT_SUFFIX__VERSION_1_1 #define CL_API_SUFFIX__VERSION_1_2 #define CL_EXT_SUFFIX__VERSION_1_2 #define CL_API_SUFFIX__VERSION_2_0 #define CL_EXT_SUFFIX__VERSION_2_0 #define CL_API_SUFFIX__VERSION_2_1 #define CL_EXT_SUFFIX__VERSION_2_1 #ifdef __GNUC__ #ifdef CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_2_APIS #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_2_0_APIS #define CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_2_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_2_0_DEPRECATED #endif #elif defined(_WIN32) #ifdef CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED __declspec(deprecated) #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define 
CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED __declspec(deprecated) #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_2_APIS #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED __declspec(deprecated) #endif #ifdef CL_USE_DEPRECATED_OPENCL_2_0_APIS #define CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_2_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_2_0_DEPRECATED __declspec(deprecated) #endif #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #define CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_2_0_DEPRECATED #endif #endif #if (defined (_WIN32) && defined(_MSC_VER)) /* scalar types */ typedef signed __int8 cl_char; typedef unsigned __int8 cl_uchar; typedef signed __int16 cl_short; typedef unsigned __int16 cl_ushort; typedef signed __int32 cl_int; typedef unsigned __int32 cl_uint; typedef signed __int64 cl_long; typedef unsigned __int64 cl_ulong; typedef unsigned __int16 cl_half; typedef float cl_float; typedef double cl_double; /* Macro names and corresponding values defined by OpenCL */ #define CL_CHAR_BIT 8 #define CL_SCHAR_MAX 127 #define CL_SCHAR_MIN (-127-1) #define CL_CHAR_MAX CL_SCHAR_MAX #define CL_CHAR_MIN CL_SCHAR_MIN #define CL_UCHAR_MAX 255 #define CL_SHRT_MAX 32767 #define CL_SHRT_MIN (-32767-1) #define CL_USHRT_MAX 65535 #define CL_INT_MAX 2147483647 #define CL_INT_MIN (-2147483647-1) #define CL_UINT_MAX 0xffffffffU #define CL_LONG_MAX ((cl_long) 0x7FFFFFFFFFFFFFFFLL) #define CL_LONG_MIN ((cl_long) -0x7FFFFFFFFFFFFFFFLL - 1LL) #define CL_ULONG_MAX ((cl_ulong) 0xFFFFFFFFFFFFFFFFULL) #define CL_FLT_DIG 6 #define CL_FLT_MANT_DIG 24 #define CL_FLT_MAX_10_EXP +38 #define CL_FLT_MAX_EXP +128 #define CL_FLT_MIN_10_EXP -37 #define CL_FLT_MIN_EXP -125 #define CL_FLT_RADIX 2 #define CL_FLT_MAX 340282346638528859811704183484516925440.0f #define CL_FLT_MIN 1.175494350822287507969e-38f #define CL_FLT_EPSILON 1.1920928955078125e-7f #define CL_HALF_DIG 3 #define CL_HALF_MANT_DIG 11 #define CL_HALF_MAX_10_EXP +4 #define CL_HALF_MAX_EXP +16 #define CL_HALF_MIN_10_EXP -4 #define CL_HALF_MIN_EXP -13 #define CL_HALF_RADIX 2 #define CL_HALF_MAX 65504.0f #define CL_HALF_MIN 6.103515625e-05f #define CL_HALF_EPSILON 9.765625e-04f #define CL_DBL_DIG 15 #define CL_DBL_MANT_DIG 53 #define CL_DBL_MAX_10_EXP +308 #define CL_DBL_MAX_EXP +1024 #define CL_DBL_MIN_10_EXP -307 #define CL_DBL_MIN_EXP -1021 #define CL_DBL_RADIX 2 #define CL_DBL_MAX 1.7976931348623158e+308 #define CL_DBL_MIN 2.225073858507201383090e-308 #define CL_DBL_EPSILON 2.220446049250313080847e-16 #define CL_M_E 2.7182818284590452354 #define CL_M_LOG2E 1.4426950408889634074 #define CL_M_LOG10E 0.43429448190325182765 #define CL_M_LN2 0.69314718055994530942 #define CL_M_LN10 2.30258509299404568402 #define CL_M_PI 3.14159265358979323846 #define CL_M_PI_2 1.57079632679489661923 #define CL_M_PI_4 0.78539816339744830962 #define CL_M_1_PI 0.31830988618379067154 #define CL_M_2_PI 0.63661977236758134308 #define CL_M_2_SQRTPI 1.12837916709551257390 #define CL_M_SQRT2 1.41421356237309504880 #define CL_M_SQRT1_2 
0.70710678118654752440 #define CL_M_E_F 2.718281828f #define CL_M_LOG2E_F 1.442695041f #define CL_M_LOG10E_F 0.434294482f #define CL_M_LN2_F 0.693147181f #define CL_M_LN10_F 2.302585093f #define CL_M_PI_F 3.141592654f #define CL_M_PI_2_F 1.570796327f #define CL_M_PI_4_F 0.785398163f #define CL_M_1_PI_F 0.318309886f #define CL_M_2_PI_F 0.636619772f #define CL_M_2_SQRTPI_F 1.128379167f #define CL_M_SQRT2_F 1.414213562f #define CL_M_SQRT1_2_F 0.707106781f #define CL_NAN (CL_INFINITY - CL_INFINITY) #define CL_HUGE_VALF ((cl_float) 1e50) #define CL_HUGE_VAL ((cl_double) 1e500) #define CL_MAXFLOAT CL_FLT_MAX #define CL_INFINITY CL_HUGE_VALF #else #include /* scalar types */ typedef int8_t cl_char; typedef uint8_t cl_uchar; typedef int16_t cl_short __attribute__((aligned(2))); typedef uint16_t cl_ushort __attribute__((aligned(2))); typedef int32_t cl_int __attribute__((aligned(4))); typedef uint32_t cl_uint __attribute__((aligned(4))); typedef int64_t cl_long __attribute__((aligned(8))); typedef uint64_t cl_ulong __attribute__((aligned(8))); typedef uint16_t cl_half __attribute__((aligned(2))); typedef float cl_float __attribute__((aligned(4))); typedef double cl_double __attribute__((aligned(8))); /* Macro names and corresponding values defined by OpenCL */ #define CL_CHAR_BIT 8 #define CL_SCHAR_MAX 127 #define CL_SCHAR_MIN (-127-1) #define CL_CHAR_MAX CL_SCHAR_MAX #define CL_CHAR_MIN CL_SCHAR_MIN #define CL_UCHAR_MAX 255 #define CL_SHRT_MAX 32767 #define CL_SHRT_MIN (-32767-1) #define CL_USHRT_MAX 65535 #define CL_INT_MAX 2147483647 #define CL_INT_MIN (-2147483647-1) #define CL_UINT_MAX 0xffffffffU #define CL_LONG_MAX ((cl_long) 0x7FFFFFFFFFFFFFFFLL) #define CL_LONG_MIN ((cl_long) -0x7FFFFFFFFFFFFFFFLL - 1LL) #define CL_ULONG_MAX ((cl_ulong) 0xFFFFFFFFFFFFFFFFULL) #define CL_FLT_DIG 6 #define CL_FLT_MANT_DIG 24 #define CL_FLT_MAX_10_EXP +38 #define CL_FLT_MAX_EXP +128 #define CL_FLT_MIN_10_EXP -37 #define CL_FLT_MIN_EXP -125 #define CL_FLT_RADIX 2 #define CL_FLT_MAX 340282346638528859811704183484516925440.0f #define CL_FLT_MIN 1.175494350822287507969e-38f #define CL_FLT_EPSILON 1.1920928955078125e-7f #define CL_HALF_DIG 3 #define CL_HALF_MANT_DIG 11 #define CL_HALF_MAX_10_EXP +4 #define CL_HALF_MAX_EXP +16 #define CL_HALF_MIN_10_EXP -4 #define CL_HALF_MIN_EXP -13 #define CL_HALF_RADIX 2 #define CL_HALF_MAX 65504.0f #define CL_HALF_MIN 6.103515625e-05f #define CL_HALF_EPSILON 9.765625e-04f #define CL_DBL_DIG 15 #define CL_DBL_MANT_DIG 53 #define CL_DBL_MAX_10_EXP +308 #define CL_DBL_MAX_EXP +1024 #define CL_DBL_MIN_10_EXP -307 #define CL_DBL_MIN_EXP -1021 #define CL_DBL_RADIX 2 #define CL_DBL_MAX 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0 #define CL_DBL_MIN 2.225073858507201383090e-308 #define CL_DBL_EPSILON 2.220446049250313080847e-16 #define CL_M_E 2.7182818284590452354 #define CL_M_LOG2E 1.4426950408889634074 #define CL_M_LOG10E 0.43429448190325182765 #define CL_M_LN2 0.69314718055994530942 #define CL_M_LN10 2.30258509299404568402 #define CL_M_PI 3.14159265358979323846 #define CL_M_PI_2 1.57079632679489661923 #define CL_M_PI_4 0.78539816339744830962 #define CL_M_1_PI 0.31830988618379067154 #define CL_M_2_PI 0.63661977236758134308 #define CL_M_2_SQRTPI 1.12837916709551257390 #define CL_M_SQRT2 1.41421356237309504880 #define 
CL_M_SQRT1_2        0.70710678118654752440

#define CL_M_E_F            2.718281828f
#define CL_M_LOG2E_F        1.442695041f
#define CL_M_LOG10E_F       0.434294482f
#define CL_M_LN2_F          0.693147181f
#define CL_M_LN10_F         2.302585093f
#define CL_M_PI_F           3.141592654f
#define CL_M_PI_2_F         1.570796327f
#define CL_M_PI_4_F         0.785398163f
#define CL_M_1_PI_F         0.318309886f
#define CL_M_2_PI_F         0.636619772f
#define CL_M_2_SQRTPI_F     1.128379167f
#define CL_M_SQRT2_F        1.414213562f
#define CL_M_SQRT1_2_F      0.707106781f

#if defined( __GNUC__ )
   #define CL_HUGE_VALF     __builtin_huge_valf()
   #define CL_HUGE_VAL      __builtin_huge_val()
   #define CL_NAN           __builtin_nanf( "" )
#else
   #define CL_HUGE_VALF     ((cl_float)  1e50)
   #define CL_HUGE_VAL      ((cl_double) 1e500)
   float nanf( const char * );
   #define CL_NAN           nanf( "" )
#endif
#define CL_MAXFLOAT         CL_FLT_MAX
#define CL_INFINITY         CL_HUGE_VALF

#endif

#include <stddef.h>

/* Mirror types to GL types. Mirror types allow us to avoid deciding which headers to load based on whether we are using GL or GLES here. */
typedef unsigned int cl_GLuint;
typedef int          cl_GLint;
typedef unsigned int cl_GLenum;

/*
 * Vector types
 *
 *  Note:   OpenCL requires that all types be naturally aligned.
 *          This means that vector types must be naturally aligned.
 *          For example, a vector of four floats must be aligned to
 *          a 16 byte boundary (calculated as 4 * the natural 4-byte
 *          alignment of the float).  The alignment qualifiers here
 *          will only function properly if your compiler supports them
 *          and if you don't actively work to defeat them.  For example,
 *          in order for a cl_float4 to be 16 byte aligned in a struct,
 *          the start of the struct must itself be 16-byte aligned.
 *
 *          Maintaining proper alignment is the user's responsibility.
 */

/* Define basic vector types */
#if defined( __VEC__ )
   #include <altivec.h>   /* may be omitted depending on compiler. AltiVec spec provides no way to detect whether the header is required.
*/ typedef vector unsigned char __cl_uchar16; typedef vector signed char __cl_char16; typedef vector unsigned short __cl_ushort8; typedef vector signed short __cl_short8; typedef vector unsigned int __cl_uint4; typedef vector signed int __cl_int4; typedef vector float __cl_float4; #define __CL_UCHAR16__ 1 #define __CL_CHAR16__ 1 #define __CL_USHORT8__ 1 #define __CL_SHORT8__ 1 #define __CL_UINT4__ 1 #define __CL_INT4__ 1 #define __CL_FLOAT4__ 1 #endif #if defined( __SSE__ ) #if defined( __MINGW64__ ) #include #else #include #endif #if defined( __GNUC__ ) typedef float __cl_float4 __attribute__((vector_size(16))); #else typedef __m128 __cl_float4; #endif #define __CL_FLOAT4__ 1 #endif #if defined( __SSE2__ ) #if defined( __MINGW64__ ) #include #else #include #endif #if defined( __GNUC__ ) typedef cl_uchar __cl_uchar16 __attribute__((vector_size(16))); typedef cl_char __cl_char16 __attribute__((vector_size(16))); typedef cl_ushort __cl_ushort8 __attribute__((vector_size(16))); typedef cl_short __cl_short8 __attribute__((vector_size(16))); typedef cl_uint __cl_uint4 __attribute__((vector_size(16))); typedef cl_int __cl_int4 __attribute__((vector_size(16))); typedef cl_ulong __cl_ulong2 __attribute__((vector_size(16))); typedef cl_long __cl_long2 __attribute__((vector_size(16))); typedef cl_double __cl_double2 __attribute__((vector_size(16))); #else typedef __m128i __cl_uchar16; typedef __m128i __cl_char16; typedef __m128i __cl_ushort8; typedef __m128i __cl_short8; typedef __m128i __cl_uint4; typedef __m128i __cl_int4; typedef __m128i __cl_ulong2; typedef __m128i __cl_long2; typedef __m128d __cl_double2; #endif #define __CL_UCHAR16__ 1 #define __CL_CHAR16__ 1 #define __CL_USHORT8__ 1 #define __CL_SHORT8__ 1 #define __CL_INT4__ 1 #define __CL_UINT4__ 1 #define __CL_ULONG2__ 1 #define __CL_LONG2__ 1 #define __CL_DOUBLE2__ 1 #endif #if defined( __MMX__ ) #include #if defined( __GNUC__ ) typedef cl_uchar __cl_uchar8 __attribute__((vector_size(8))); typedef cl_char __cl_char8 __attribute__((vector_size(8))); typedef cl_ushort __cl_ushort4 __attribute__((vector_size(8))); typedef cl_short __cl_short4 __attribute__((vector_size(8))); typedef cl_uint __cl_uint2 __attribute__((vector_size(8))); typedef cl_int __cl_int2 __attribute__((vector_size(8))); typedef cl_ulong __cl_ulong1 __attribute__((vector_size(8))); typedef cl_long __cl_long1 __attribute__((vector_size(8))); typedef cl_float __cl_float2 __attribute__((vector_size(8))); #else typedef __m64 __cl_uchar8; typedef __m64 __cl_char8; typedef __m64 __cl_ushort4; typedef __m64 __cl_short4; typedef __m64 __cl_uint2; typedef __m64 __cl_int2; typedef __m64 __cl_ulong1; typedef __m64 __cl_long1; typedef __m64 __cl_float2; #endif #define __CL_UCHAR8__ 1 #define __CL_CHAR8__ 1 #define __CL_USHORT4__ 1 #define __CL_SHORT4__ 1 #define __CL_INT2__ 1 #define __CL_UINT2__ 1 #define __CL_ULONG1__ 1 #define __CL_LONG1__ 1 #define __CL_FLOAT2__ 1 #endif #if defined( __AVX__ ) #if defined( __MINGW64__ ) #include #else #include #endif #if defined( __GNUC__ ) typedef cl_float __cl_float8 __attribute__((vector_size(32))); typedef cl_double __cl_double4 __attribute__((vector_size(32))); #else typedef __m256 __cl_float8; typedef __m256d __cl_double4; #endif #define __CL_FLOAT8__ 1 #define __CL_DOUBLE4__ 1 #endif /* Define capabilities for anonymous struct members. */ #if !defined(__cplusplus) && defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L #define __CL_HAS_ANON_STRUCT__ 1 #define __CL_ANON_STRUCT__ #elif defined( __GNUC__) && ! 
defined( __STRICT_ANSI__ ) #define __CL_HAS_ANON_STRUCT__ 1 #define __CL_ANON_STRUCT__ __extension__ #elif defined( _WIN32) && defined(_MSC_VER) #if _MSC_VER >= 1500 /* Microsoft Developer Studio 2008 supports anonymous structs, but * complains by default. */ #define __CL_HAS_ANON_STRUCT__ 1 #define __CL_ANON_STRUCT__ /* Disable warning C4201: nonstandard extension used : nameless * struct/union */ #pragma warning( push ) #pragma warning( disable : 4201 ) #endif #else #define __CL_HAS_ANON_STRUCT__ 0 #define __CL_ANON_STRUCT__ #endif /* Define alignment keys */ #if defined( __GNUC__ ) #define CL_ALIGNED(_x) __attribute__ ((aligned(_x))) #elif defined( _WIN32) && (_MSC_VER) /* Alignment keys neutered on windows because MSVC can't swallow function arguments with alignment requirements */ /* http://msdn.microsoft.com/en-us/library/373ak2y1%28VS.71%29.aspx */ /* #include */ /* #define CL_ALIGNED(_x) _CRT_ALIGN(_x) */ #define CL_ALIGNED(_x) #else #warning Need to implement some method to align data here #define CL_ALIGNED(_x) #endif /* Indicate whether .xyzw, .s0123 and .hi.lo are supported */ #if __CL_HAS_ANON_STRUCT__ /* .xyzw and .s0123...{f|F} are supported */ #define CL_HAS_NAMED_VECTOR_FIELDS 1 /* .hi and .lo are supported */ #define CL_HAS_HI_LO_VECTOR_FIELDS 1 #endif /* Define cl_vector types */ /* ---- cl_charn ---- */ typedef union { cl_char CL_ALIGNED(2) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_char lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2; #endif }cl_char2; typedef union { cl_char CL_ALIGNED(4) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_char2 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[2]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4; #endif }cl_char4; /* cl_char3 is identical in size, alignment and behavior to cl_char4. See section 6.1.5. 
*/ typedef cl_char4 cl_char3; typedef union { cl_char CL_ALIGNED(8) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_char4 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[4]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4[2]; #endif #if defined( __CL_CHAR8__ ) __cl_char8 v8; #endif }cl_char8; typedef union { cl_char CL_ALIGNED(16) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_char8 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[8]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4[4]; #endif #if defined( __CL_CHAR8__ ) __cl_char8 v8[2]; #endif #if defined( __CL_CHAR16__ ) __cl_char16 v16; #endif }cl_char16; /* ---- cl_ucharn ---- */ typedef union { cl_uchar CL_ALIGNED(2) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_uchar lo, hi; }; #endif #if defined( __cl_uchar2__) __cl_uchar2 v2; #endif }cl_uchar2; typedef union { cl_uchar CL_ALIGNED(4) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_uchar2 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[2]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4; #endif }cl_uchar4; /* cl_uchar3 is identical in size, alignment and behavior to cl_uchar4. See section 6.1.5. 
*/ typedef cl_uchar4 cl_uchar3; typedef union { cl_uchar CL_ALIGNED(8) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_uchar4 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[4]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4[2]; #endif #if defined( __CL_UCHAR8__ ) __cl_uchar8 v8; #endif }cl_uchar8; typedef union { cl_uchar CL_ALIGNED(16) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_uchar8 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[8]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4[4]; #endif #if defined( __CL_UCHAR8__ ) __cl_uchar8 v8[2]; #endif #if defined( __CL_UCHAR16__ ) __cl_uchar16 v16; #endif }cl_uchar16; /* ---- cl_shortn ---- */ typedef union { cl_short CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_short lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2; #endif }cl_short2; typedef union { cl_short CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_short2 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[2]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4; #endif }cl_short4; /* cl_short3 is identical in size, alignment and behavior to cl_short4. See section 6.1.5. 
*/ typedef cl_short4 cl_short3; typedef union { cl_short CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_short4 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[4]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4[2]; #endif #if defined( __CL_SHORT8__ ) __cl_short8 v8; #endif }cl_short8; typedef union { cl_short CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_short8 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[8]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4[4]; #endif #if defined( __CL_SHORT8__ ) __cl_short8 v8[2]; #endif #if defined( __CL_SHORT16__ ) __cl_short16 v16; #endif }cl_short16; /* ---- cl_ushortn ---- */ typedef union { cl_ushort CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_ushort lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2; #endif }cl_ushort2; typedef union { cl_ushort CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_ushort2 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[2]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4; #endif }cl_ushort4; /* cl_ushort3 is identical in size, alignment and behavior to cl_ushort4. See section 6.1.5. 
*/ typedef cl_ushort4 cl_ushort3; typedef union { cl_ushort CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_ushort4 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[4]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4[2]; #endif #if defined( __CL_USHORT8__ ) __cl_ushort8 v8; #endif }cl_ushort8; typedef union { cl_ushort CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_ushort8 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[8]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4[4]; #endif #if defined( __CL_USHORT8__ ) __cl_ushort8 v8[2]; #endif #if defined( __CL_USHORT16__ ) __cl_ushort16 v16; #endif }cl_ushort16; /* ---- cl_halfn ---- */ typedef union { cl_half CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_half lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2; #endif }cl_half2; typedef union { cl_half CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_half2 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[2]; #endif #if defined( __CL_HALF4__) __cl_half4 v4; #endif }cl_half4; /* cl_half3 is identical in size, alignment and behavior to cl_half4. See section 6.1.5. */ typedef cl_half4 cl_half3; typedef union { cl_half CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_half4 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[4]; #endif #if defined( __CL_HALF4__) __cl_half4 v4[2]; #endif #if defined( __CL_HALF8__ ) __cl_half8 v8; #endif }cl_half8; typedef union { cl_half CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_half8 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[8]; #endif #if defined( __CL_HALF4__) __cl_half4 v4[4]; #endif #if defined( __CL_HALF8__ ) __cl_half8 v8[2]; #endif #if defined( __CL_HALF16__ ) __cl_half16 v16; #endif }cl_half16; /* ---- cl_intn ---- */ typedef union { cl_int CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_int lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2; #endif }cl_int2; typedef union { cl_int CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_int2 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[2]; #endif #if defined( __CL_INT4__) __cl_int4 v4; #endif }cl_int4; /* cl_int3 is identical in size, alignment and behavior to cl_int4. 
See section 6.1.5. */ typedef cl_int4 cl_int3; typedef union { cl_int CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_int4 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[4]; #endif #if defined( __CL_INT4__) __cl_int4 v4[2]; #endif #if defined( __CL_INT8__ ) __cl_int8 v8; #endif }cl_int8; typedef union { cl_int CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_int8 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[8]; #endif #if defined( __CL_INT4__) __cl_int4 v4[4]; #endif #if defined( __CL_INT8__ ) __cl_int8 v8[2]; #endif #if defined( __CL_INT16__ ) __cl_int16 v16; #endif }cl_int16; /* ---- cl_uintn ---- */ typedef union { cl_uint CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_uint lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2; #endif }cl_uint2; typedef union { cl_uint CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_uint2 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[2]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4; #endif }cl_uint4; /* cl_uint3 is identical in size, alignment and behavior to cl_uint4. See section 6.1.5. */ typedef cl_uint4 cl_uint3; typedef union { cl_uint CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_uint4 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[4]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4[2]; #endif #if defined( __CL_UINT8__ ) __cl_uint8 v8; #endif }cl_uint8; typedef union { cl_uint CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_uint8 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[8]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4[4]; #endif #if defined( __CL_UINT8__ ) __cl_uint8 v8[2]; #endif #if defined( __CL_UINT16__ ) __cl_uint16 v16; #endif }cl_uint16; /* ---- cl_longn ---- */ typedef union { cl_long CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_long lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2; #endif }cl_long2; typedef union { cl_long CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_long2 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[2]; #endif #if defined( __CL_LONG4__) __cl_long4 v4; #endif }cl_long4; /* cl_long3 is identical in size, alignment and behavior to cl_long4. See section 6.1.5. 
*/ typedef cl_long4 cl_long3; typedef union { cl_long CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_long4 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[4]; #endif #if defined( __CL_LONG4__) __cl_long4 v4[2]; #endif #if defined( __CL_LONG8__ ) __cl_long8 v8; #endif }cl_long8; typedef union { cl_long CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_long8 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[8]; #endif #if defined( __CL_LONG4__) __cl_long4 v4[4]; #endif #if defined( __CL_LONG8__ ) __cl_long8 v8[2]; #endif #if defined( __CL_LONG16__ ) __cl_long16 v16; #endif }cl_long16; /* ---- cl_ulongn ---- */ typedef union { cl_ulong CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_ulong lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2; #endif }cl_ulong2; typedef union { cl_ulong CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_ulong2 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[2]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4; #endif }cl_ulong4; /* cl_ulong3 is identical in size, alignment and behavior to cl_ulong4. See section 6.1.5. 
*/ typedef cl_ulong4 cl_ulong3; typedef union { cl_ulong CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_ulong4 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[4]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4[2]; #endif #if defined( __CL_ULONG8__ ) __cl_ulong8 v8; #endif }cl_ulong8; typedef union { cl_ulong CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_ulong8 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[8]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4[4]; #endif #if defined( __CL_ULONG8__ ) __cl_ulong8 v8[2]; #endif #if defined( __CL_ULONG16__ ) __cl_ulong16 v16; #endif }cl_ulong16; /* --- cl_floatn ---- */ typedef union { cl_float CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_float lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2; #endif }cl_float2; typedef union { cl_float CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_float2 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[2]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4; #endif }cl_float4; /* cl_float3 is identical in size, alignment and behavior to cl_float4. See section 6.1.5. 
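   A minimal host-side sketch (an illustrative addition, assuming only this
   header; nothing below is part of the original text): every cl_<type>n union
   in this file offers the same three views, so a value can be read through the
   portable s[] array, through the named x/y/z/w components when
   __CL_HAS_ANON_STRUCT__ is enabled, or as lo/hi halves:

       cl_float4 v;
       v.s[0] = 1.0f; v.s[1] = 2.0f; v.s[2] = 3.0f; v.s[3] = 4.0f;
       cl_float2 low = v.lo;    // lanes 0..1, i.e. {1.0f, 2.0f}
       float last = v.s[3];     // indexed access works on every compiler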
*/ typedef cl_float4 cl_float3; typedef union { cl_float CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_float4 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[4]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4[2]; #endif #if defined( __CL_FLOAT8__ ) __cl_float8 v8; #endif }cl_float8; typedef union { cl_float CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_float8 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[8]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4[4]; #endif #if defined( __CL_FLOAT8__ ) __cl_float8 v8[2]; #endif #if defined( __CL_FLOAT16__ ) __cl_float16 v16; #endif }cl_float16; /* --- cl_doublen ---- */ typedef union { cl_double CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_double lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2; #endif }cl_double2; typedef union { cl_double CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_double2 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[2]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4; #endif }cl_double4; /* cl_double3 is identical in size, alignment and behavior to cl_double4. See section 6.1.5. */ typedef cl_double4 cl_double3; typedef union { cl_double CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_double4 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[4]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4[2]; #endif #if defined( __CL_DOUBLE8__ ) __cl_double8 v8; #endif }cl_double8; typedef union { cl_double CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_double8 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[8]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4[4]; #endif #if defined( __CL_DOUBLE8__ ) __cl_double8 v8[2]; #endif #if defined( __CL_DOUBLE16__ ) __cl_double16 v16; #endif }cl_double16; /* Macro to facilitate debugging * Usage: * Place CL_PROGRAM_STRING_DEBUG_INFO on the line before the first line of your source. * The first line ends with: CL_PROGRAM_STRING_DEBUG_INFO \" * Each line thereafter of OpenCL C source must end with: \n\ * The last line ends in "; * * Example: * * const char *my_program = CL_PROGRAM_STRING_DEBUG_INFO "\ * kernel void foo( int a, float * b ) \n\ * { \n\ * // my comment \n\ * *b[ get_global_id(0)] = a; \n\ * } \n\ * "; * * This should correctly set up the line, (column) and file information for your source * string so you can do source level debugging. 
 */
#define  __CL_STRINGIFY( _x )   # _x
#define  _CL_STRINGIFY( _x )    __CL_STRINGIFY( _x )
#define  CL_PROGRAM_STRING_DEBUG_INFO  "#line "  _CL_STRINGIFY(__LINE__) " \"" __FILE__ "\" \n\n"

#ifdef __cplusplus
}
#endif

#undef __CL_HAS_ANON_STRUCT__
#undef __CL_ANON_STRUCT__
#if defined( _WIN32) && defined(_MSC_VER)
#if _MSC_VER >=1500
#pragma warning( pop )
#endif
#endif

#endif  /* __CL_PLATFORM_H */
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.1/CL/opencl.h000066400000000000000000000037111450307266000232550ustar00rootroot00000000000000/*******************************************************************************
 * Copyright (c) 2008-2015 The Khronos Group Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and/or associated documentation files (the
 * "Materials"), to deal in the Materials without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Materials, and to
 * permit persons to whom the Materials are furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Materials.
 *
 * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
 * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
 * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
 *    https://www.khronos.org/registry/
 *
 * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
 ******************************************************************************/

/* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */

#ifndef __OPENCL_H
#define __OPENCL_H

#ifdef __cplusplus
extern "C" {
#endif

#ifdef __APPLE__
#include <OpenCL/cl.h>
#include <OpenCL/cl_gl.h>
#include <OpenCL/cl_gl_ext.h>
#include <OpenCL/cl_ext.h>
#else
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <CL/cl_gl_ext.h>
#include <CL/cl_ext.h>
#endif

#ifdef __cplusplus
}
#endif

#endif /* __OPENCL_H */
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/000077500000000000000000000000001450307266000213255ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/000077500000000000000000000000001450307266000216235ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl.h000066400000000000000000002264651450307266000224050ustar00rootroot00000000000000/*******************************************************************************
 * Copyright (c) 2008-2020 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 ******************************************************************************/

#ifndef __OPENCL_CL_H
#define __OPENCL_CL_H

#include <CL/cl_version.h>
#include <CL/cl_platform.h>

#ifdef __cplusplus
extern "C" {
#endif

/******************************************************************************/

typedef struct _cl_platform_id *    cl_platform_id;
typedef struct _cl_device_id *      cl_device_id;
typedef struct _cl_context *        cl_context;
typedef struct _cl_command_queue *  cl_command_queue;
typedef struct _cl_mem *            cl_mem;
typedef struct _cl_program *        cl_program;
typedef struct _cl_kernel *         cl_kernel;
typedef struct _cl_event *          cl_event;
typedef struct _cl_sampler *        cl_sampler;

typedef cl_uint             cl_bool;     /* WARNING! Unlike cl_ types in cl_platform.h,
                                            cl_bool is not guaranteed to be the same
                                            size as the bool in kernels. */
typedef cl_ulong            cl_bitfield;
typedef cl_bitfield         cl_device_type;
typedef cl_uint             cl_platform_info;
typedef cl_uint             cl_device_info;
typedef cl_bitfield         cl_device_fp_config;
typedef cl_uint             cl_device_mem_cache_type;
typedef cl_uint             cl_device_local_mem_type;
typedef cl_bitfield         cl_device_exec_capabilities;
#ifdef CL_VERSION_2_0
typedef cl_bitfield         cl_device_svm_capabilities;
#endif
typedef cl_bitfield         cl_command_queue_properties;
#ifdef CL_VERSION_1_2
typedef intptr_t            cl_device_partition_property;
typedef cl_bitfield         cl_device_affinity_domain;
#endif
typedef intptr_t            cl_context_properties;
typedef cl_uint             cl_context_info;
#ifdef CL_VERSION_2_0
typedef cl_bitfield         cl_queue_properties;
#endif
typedef cl_uint             cl_command_queue_info;
typedef cl_uint             cl_channel_order;
typedef cl_uint             cl_channel_type;
typedef cl_bitfield         cl_mem_flags;
#ifdef CL_VERSION_2_0
typedef cl_bitfield         cl_svm_mem_flags;
#endif
typedef cl_uint             cl_mem_object_type;
typedef cl_uint             cl_mem_info;
#ifdef CL_VERSION_1_2
typedef cl_bitfield         cl_mem_migration_flags;
#endif
typedef cl_uint             cl_image_info;
#ifdef CL_VERSION_1_1
typedef cl_uint             cl_buffer_create_type;
#endif
typedef cl_uint             cl_addressing_mode;
typedef cl_uint             cl_filter_mode;
typedef cl_uint             cl_sampler_info;
typedef cl_bitfield         cl_map_flags;
#ifdef CL_VERSION_2_0
typedef intptr_t            cl_pipe_properties;
typedef cl_uint             cl_pipe_info;
#endif
typedef cl_uint             cl_program_info;
typedef cl_uint             cl_program_build_info;
#ifdef CL_VERSION_1_2
typedef cl_uint             cl_program_binary_type;
#endif
typedef cl_int              cl_build_status;
typedef cl_uint             cl_kernel_info;
#ifdef CL_VERSION_1_2
typedef cl_uint             cl_kernel_arg_info;
typedef cl_uint             cl_kernel_arg_address_qualifier;
typedef cl_uint             cl_kernel_arg_access_qualifier;
typedef cl_bitfield         cl_kernel_arg_type_qualifier;
#endif
typedef cl_uint             cl_kernel_work_group_info;
#ifdef CL_VERSION_2_1
typedef cl_uint             cl_kernel_sub_group_info;
#endif
typedef cl_uint             cl_event_info;
typedef cl_uint             cl_command_type;
typedef cl_uint             cl_profiling_info;
#ifdef CL_VERSION_2_0
typedef cl_bitfield         cl_sampler_properties;
typedef cl_uint             cl_kernel_exec_info;
#endif
#ifdef CL_EXPERIMENTAL
typedef cl_bitfield         cl_device_atomic_capabilities;
typedef cl_uint             cl_khronos_vendor_id;
#endif

typedef struct _cl_image_format {
    cl_channel_order        image_channel_order;
    cl_channel_type         image_channel_data_type;
} cl_image_format;

#ifdef CL_VERSION_1_2

typedef struct _cl_image_desc {
    cl_mem_object_type      image_type;
    size_t                  image_width;
    size_t                  image_height;
    size_t                  image_depth;
    size_t                  image_array_size;
    size_t                  image_row_pitch;
    size_t                  image_slice_pitch;
    cl_uint                 num_mip_levels;
    cl_uint                 num_samples;
#ifdef CL_VERSION_2_0
#ifdef __GNUC__
    __extension__   /* Prevents warnings about anonymous union in -pedantic builds */
#endif
#ifdef
_MSC_VER #pragma warning( push ) #pragma warning( disable : 4201 ) /* Prevents warning about nameless struct/union in /W4 /Za builds */ #endif union { #endif cl_mem buffer; #ifdef CL_VERSION_2_0 cl_mem mem_object; }; #ifdef _MSC_VER #pragma warning( pop ) #endif #endif } cl_image_desc; #endif #ifdef CL_VERSION_1_1 typedef struct _cl_buffer_region { size_t origin; size_t size; } cl_buffer_region; #endif /******************************************************************************/ /* Error Codes */ #define CL_SUCCESS 0 #define CL_DEVICE_NOT_FOUND -1 #define CL_DEVICE_NOT_AVAILABLE -2 #define CL_COMPILER_NOT_AVAILABLE -3 #define CL_MEM_OBJECT_ALLOCATION_FAILURE -4 #define CL_OUT_OF_RESOURCES -5 #define CL_OUT_OF_HOST_MEMORY -6 #define CL_PROFILING_INFO_NOT_AVAILABLE -7 #define CL_MEM_COPY_OVERLAP -8 #define CL_IMAGE_FORMAT_MISMATCH -9 #define CL_IMAGE_FORMAT_NOT_SUPPORTED -10 #define CL_BUILD_PROGRAM_FAILURE -11 #define CL_MAP_FAILURE -12 #ifdef CL_VERSION_1_1 #define CL_MISALIGNED_SUB_BUFFER_OFFSET -13 #define CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST -14 #endif #ifdef CL_VERSION_1_2 #define CL_COMPILE_PROGRAM_FAILURE -15 #define CL_LINKER_NOT_AVAILABLE -16 #define CL_LINK_PROGRAM_FAILURE -17 #define CL_DEVICE_PARTITION_FAILED -18 #define CL_KERNEL_ARG_INFO_NOT_AVAILABLE -19 #endif #define CL_INVALID_VALUE -30 #define CL_INVALID_DEVICE_TYPE -31 #define CL_INVALID_PLATFORM -32 #define CL_INVALID_DEVICE -33 #define CL_INVALID_CONTEXT -34 #define CL_INVALID_QUEUE_PROPERTIES -35 #define CL_INVALID_COMMAND_QUEUE -36 #define CL_INVALID_HOST_PTR -37 #define CL_INVALID_MEM_OBJECT -38 #define CL_INVALID_IMAGE_FORMAT_DESCRIPTOR -39 #define CL_INVALID_IMAGE_SIZE -40 #define CL_INVALID_SAMPLER -41 #define CL_INVALID_BINARY -42 #define CL_INVALID_BUILD_OPTIONS -43 #define CL_INVALID_PROGRAM -44 #define CL_INVALID_PROGRAM_EXECUTABLE -45 #define CL_INVALID_KERNEL_NAME -46 #define CL_INVALID_KERNEL_DEFINITION -47 #define CL_INVALID_KERNEL -48 #define CL_INVALID_ARG_INDEX -49 #define CL_INVALID_ARG_VALUE -50 #define CL_INVALID_ARG_SIZE -51 #define CL_INVALID_KERNEL_ARGS -52 #define CL_INVALID_WORK_DIMENSION -53 #define CL_INVALID_WORK_GROUP_SIZE -54 #define CL_INVALID_WORK_ITEM_SIZE -55 #define CL_INVALID_GLOBAL_OFFSET -56 #define CL_INVALID_EVENT_WAIT_LIST -57 #define CL_INVALID_EVENT -58 #define CL_INVALID_OPERATION -59 #define CL_INVALID_GL_OBJECT -60 #define CL_INVALID_BUFFER_SIZE -61 #define CL_INVALID_MIP_LEVEL -62 #define CL_INVALID_GLOBAL_WORK_SIZE -63 #ifdef CL_VERSION_1_1 #define CL_INVALID_PROPERTY -64 #endif #ifdef CL_VERSION_1_2 #define CL_INVALID_IMAGE_DESCRIPTOR -65 #define CL_INVALID_COMPILER_OPTIONS -66 #define CL_INVALID_LINKER_OPTIONS -67 #define CL_INVALID_DEVICE_PARTITION_COUNT -68 #endif #ifdef CL_VERSION_2_0 #define CL_INVALID_PIPE_SIZE -69 #define CL_INVALID_DEVICE_QUEUE -70 #endif #ifdef CL_VERSION_2_2 #define CL_INVALID_SPEC_ID -71 #define CL_MAX_SIZE_RESTRICTION_EXCEEDED -72 #endif /* cl_bool */ #define CL_FALSE 0 #define CL_TRUE 1 #ifdef CL_VERSION_1_2 #define CL_BLOCKING CL_TRUE #define CL_NON_BLOCKING CL_FALSE #endif /* cl_platform_info */ #define CL_PLATFORM_PROFILE 0x0900 #define CL_PLATFORM_VERSION 0x0901 #define CL_PLATFORM_NAME 0x0902 #define CL_PLATFORM_VENDOR 0x0903 #define CL_PLATFORM_EXTENSIONS 0x0904 #ifdef CL_VERSION_2_1 #define CL_PLATFORM_HOST_TIMER_RESOLUTION 0x0905 #endif /* cl_device_type - bitfield */ #define CL_DEVICE_TYPE_DEFAULT (1 << 0) #define CL_DEVICE_TYPE_CPU (1 << 1) #define CL_DEVICE_TYPE_GPU (1 << 2) #define CL_DEVICE_TYPE_ACCELERATOR (1 << 3) 
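/* A minimal error-handling sketch (an illustrative addition, not part of the
 * original header; the CHECK_CL helper name and the use of stdio.h are
 * assumptions): every cl_int status in this header compares against
 * CL_SUCCESS (0), and all of the failure codes above are negative.
 *
 *     #define CHECK_CL(call)                                        \
 *         do {                                                      \
 *             cl_int err_ = (call);                                 \
 *             if (err_ != CL_SUCCESS)                               \
 *                 fprintf(stderr, "OpenCL error %d at %s:%d\n",     \
 *                         err_, __FILE__, __LINE__);                \
 *         } while (0)
 *
 *     CHECK_CL(clFinish(queue));   // e.g. -36 is CL_INVALID_COMMAND_QUEUE
 */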
#ifdef CL_VERSION_1_2 #define CL_DEVICE_TYPE_CUSTOM (1 << 4) #endif #define CL_DEVICE_TYPE_ALL 0xFFFFFFFF /* cl_device_info */ #define CL_DEVICE_TYPE 0x1000 #define CL_DEVICE_VENDOR_ID 0x1001 #define CL_DEVICE_MAX_COMPUTE_UNITS 0x1002 #define CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS 0x1003 #define CL_DEVICE_MAX_WORK_GROUP_SIZE 0x1004 #define CL_DEVICE_MAX_WORK_ITEM_SIZES 0x1005 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR 0x1006 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT 0x1007 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT 0x1008 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG 0x1009 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT 0x100A #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 0x100B #define CL_DEVICE_MAX_CLOCK_FREQUENCY 0x100C #define CL_DEVICE_ADDRESS_BITS 0x100D #define CL_DEVICE_MAX_READ_IMAGE_ARGS 0x100E #define CL_DEVICE_MAX_WRITE_IMAGE_ARGS 0x100F #define CL_DEVICE_MAX_MEM_ALLOC_SIZE 0x1010 #define CL_DEVICE_IMAGE2D_MAX_WIDTH 0x1011 #define CL_DEVICE_IMAGE2D_MAX_HEIGHT 0x1012 #define CL_DEVICE_IMAGE3D_MAX_WIDTH 0x1013 #define CL_DEVICE_IMAGE3D_MAX_HEIGHT 0x1014 #define CL_DEVICE_IMAGE3D_MAX_DEPTH 0x1015 #define CL_DEVICE_IMAGE_SUPPORT 0x1016 #define CL_DEVICE_MAX_PARAMETER_SIZE 0x1017 #define CL_DEVICE_MAX_SAMPLERS 0x1018 #define CL_DEVICE_MEM_BASE_ADDR_ALIGN 0x1019 #define CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE 0x101A #define CL_DEVICE_SINGLE_FP_CONFIG 0x101B #define CL_DEVICE_GLOBAL_MEM_CACHE_TYPE 0x101C #define CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE 0x101D #define CL_DEVICE_GLOBAL_MEM_CACHE_SIZE 0x101E #define CL_DEVICE_GLOBAL_MEM_SIZE 0x101F #define CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE 0x1020 #define CL_DEVICE_MAX_CONSTANT_ARGS 0x1021 #define CL_DEVICE_LOCAL_MEM_TYPE 0x1022 #define CL_DEVICE_LOCAL_MEM_SIZE 0x1023 #define CL_DEVICE_ERROR_CORRECTION_SUPPORT 0x1024 #define CL_DEVICE_PROFILING_TIMER_RESOLUTION 0x1025 #define CL_DEVICE_ENDIAN_LITTLE 0x1026 #define CL_DEVICE_AVAILABLE 0x1027 #define CL_DEVICE_COMPILER_AVAILABLE 0x1028 #define CL_DEVICE_EXECUTION_CAPABILITIES 0x1029 #define CL_DEVICE_QUEUE_PROPERTIES 0x102A /* deprecated */ #ifdef CL_VERSION_2_0 #define CL_DEVICE_QUEUE_ON_HOST_PROPERTIES 0x102A #endif #define CL_DEVICE_NAME 0x102B #define CL_DEVICE_VENDOR 0x102C #define CL_DRIVER_VERSION 0x102D #define CL_DEVICE_PROFILE 0x102E #define CL_DEVICE_VERSION 0x102F #define CL_DEVICE_EXTENSIONS 0x1030 #define CL_DEVICE_PLATFORM 0x1031 #ifdef CL_VERSION_1_2 #define CL_DEVICE_DOUBLE_FP_CONFIG 0x1032 #endif /* 0x1033 reserved for CL_DEVICE_HALF_FP_CONFIG which is already defined in "cl_ext.h" */ #ifdef CL_VERSION_1_1 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF 0x1034 #define CL_DEVICE_HOST_UNIFIED_MEMORY 0x1035 /* deprecated */ #define CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR 0x1036 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT 0x1037 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_INT 0x1038 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG 0x1039 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT 0x103A #define CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE 0x103B #define CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF 0x103C #define CL_DEVICE_OPENCL_C_VERSION 0x103D #endif #ifdef CL_VERSION_1_2 #define CL_DEVICE_LINKER_AVAILABLE 0x103E #define CL_DEVICE_BUILT_IN_KERNELS 0x103F #define CL_DEVICE_IMAGE_MAX_BUFFER_SIZE 0x1040 #define CL_DEVICE_IMAGE_MAX_ARRAY_SIZE 0x1041 #define CL_DEVICE_PARENT_DEVICE 0x1042 #define CL_DEVICE_PARTITION_MAX_SUB_DEVICES 0x1043 #define CL_DEVICE_PARTITION_PROPERTIES 0x1044 #define CL_DEVICE_PARTITION_AFFINITY_DOMAIN 0x1045 #define CL_DEVICE_PARTITION_TYPE 0x1046 #define CL_DEVICE_REFERENCE_COUNT 0x1047 
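/* A short usage sketch for the cl_device_info queries above (an illustrative
 * addition, not part of the original header; dev is assumed to come from a
 * prior clGetDeviceIDs call). Every clGet*Info entry point declared later in
 * this file follows the same pattern: pass a destination buffer and its size,
 * or pass a NULL param_value first to query the required size.
 *
 *     char    name[256];
 *     cl_uint cu = 0;
 *     clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, NULL);
 *     clGetDeviceInfo(dev, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cu), &cu, NULL);
 */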
#define CL_DEVICE_PREFERRED_INTEROP_USER_SYNC 0x1048 #define CL_DEVICE_PRINTF_BUFFER_SIZE 0x1049 #endif #ifdef CL_VERSION_2_0 #define CL_DEVICE_IMAGE_PITCH_ALIGNMENT 0x104A #define CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT 0x104B #define CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS 0x104C #define CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE 0x104D #define CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES 0x104E #define CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE 0x104F #define CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE 0x1050 #define CL_DEVICE_MAX_ON_DEVICE_QUEUES 0x1051 #define CL_DEVICE_MAX_ON_DEVICE_EVENTS 0x1052 #define CL_DEVICE_SVM_CAPABILITIES 0x1053 #define CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE 0x1054 #define CL_DEVICE_MAX_PIPE_ARGS 0x1055 #define CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS 0x1056 #define CL_DEVICE_PIPE_MAX_PACKET_SIZE 0x1057 #define CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT 0x1058 #define CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT 0x1059 #define CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT 0x105A #endif #ifdef CL_VERSION_2_1 #define CL_DEVICE_IL_VERSION 0x105B #define CL_DEVICE_MAX_NUM_SUB_GROUPS 0x105C #define CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS 0x105D #endif /* cl_device_fp_config - bitfield */ #define CL_FP_DENORM (1 << 0) #define CL_FP_INF_NAN (1 << 1) #define CL_FP_ROUND_TO_NEAREST (1 << 2) #define CL_FP_ROUND_TO_ZERO (1 << 3) #define CL_FP_ROUND_TO_INF (1 << 4) #define CL_FP_FMA (1 << 5) #ifdef CL_VERSION_1_1 #define CL_FP_SOFT_FLOAT (1 << 6) #endif #ifdef CL_VERSION_1_2 #define CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT (1 << 7) #endif /* cl_device_mem_cache_type */ #define CL_NONE 0x0 #define CL_READ_ONLY_CACHE 0x1 #define CL_READ_WRITE_CACHE 0x2 /* cl_device_local_mem_type */ #define CL_LOCAL 0x1 #define CL_GLOBAL 0x2 /* cl_device_exec_capabilities - bitfield */ #define CL_EXEC_KERNEL (1 << 0) #define CL_EXEC_NATIVE_KERNEL (1 << 1) /* cl_command_queue_properties - bitfield */ #define CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE (1 << 0) #define CL_QUEUE_PROFILING_ENABLE (1 << 1) #ifdef CL_VERSION_2_0 #define CL_QUEUE_ON_DEVICE (1 << 2) #define CL_QUEUE_ON_DEVICE_DEFAULT (1 << 3) #endif /* cl_context_info */ #define CL_CONTEXT_REFERENCE_COUNT 0x1080 #define CL_CONTEXT_DEVICES 0x1081 #define CL_CONTEXT_PROPERTIES 0x1082 #ifdef CL_VERSION_1_1 #define CL_CONTEXT_NUM_DEVICES 0x1083 #endif /* cl_context_properties */ #define CL_CONTEXT_PLATFORM 0x1084 #ifdef CL_VERSION_1_2 #define CL_CONTEXT_INTEROP_USER_SYNC 0x1085 #endif #ifdef CL_VERSION_1_2 /* cl_device_partition_property */ #define CL_DEVICE_PARTITION_EQUALLY 0x1086 #define CL_DEVICE_PARTITION_BY_COUNTS 0x1087 #define CL_DEVICE_PARTITION_BY_COUNTS_LIST_END 0x0 #define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN 0x1088 #endif #ifdef CL_VERSION_1_2 /* cl_device_affinity_domain */ #define CL_DEVICE_AFFINITY_DOMAIN_NUMA (1 << 0) #define CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE (1 << 1) #define CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE (1 << 2) #define CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE (1 << 3) #define CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE (1 << 4) #define CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE (1 << 5) #endif #ifdef CL_VERSION_2_0 /* cl_device_svm_capabilities */ #define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER (1 << 0) #define CL_DEVICE_SVM_FINE_GRAIN_BUFFER (1 << 1) #define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM (1 << 2) #define CL_DEVICE_SVM_ATOMICS (1 << 3) #endif /* cl_command_queue_info */ #define CL_QUEUE_CONTEXT 0x1090 #define CL_QUEUE_DEVICE 0x1091 #define CL_QUEUE_REFERENCE_COUNT 0x1092 #define CL_QUEUE_PROPERTIES 0x1093 #ifdef CL_VERSION_2_0 #define CL_QUEUE_SIZE 
0x1094 #endif #ifdef CL_VERSION_2_1 #define CL_QUEUE_DEVICE_DEFAULT 0x1095 #endif /* cl_mem_flags and cl_svm_mem_flags - bitfield */ #define CL_MEM_READ_WRITE (1 << 0) #define CL_MEM_WRITE_ONLY (1 << 1) #define CL_MEM_READ_ONLY (1 << 2) #define CL_MEM_USE_HOST_PTR (1 << 3) #define CL_MEM_ALLOC_HOST_PTR (1 << 4) #define CL_MEM_COPY_HOST_PTR (1 << 5) /* reserved (1 << 6) */ #ifdef CL_VERSION_1_2 #define CL_MEM_HOST_WRITE_ONLY (1 << 7) #define CL_MEM_HOST_READ_ONLY (1 << 8) #define CL_MEM_HOST_NO_ACCESS (1 << 9) #endif #ifdef CL_VERSION_2_0 #define CL_MEM_SVM_FINE_GRAIN_BUFFER (1 << 10) /* used by cl_svm_mem_flags only */ #define CL_MEM_SVM_ATOMICS (1 << 11) /* used by cl_svm_mem_flags only */ #define CL_MEM_KERNEL_READ_AND_WRITE (1 << 12) #endif #ifdef CL_VERSION_1_2 /* cl_mem_migration_flags - bitfield */ #define CL_MIGRATE_MEM_OBJECT_HOST (1 << 0) #define CL_MIGRATE_MEM_OBJECT_CONTENT_UNDEFINED (1 << 1) #endif /* cl_channel_order */ #define CL_R 0x10B0 #define CL_A 0x10B1 #define CL_RG 0x10B2 #define CL_RA 0x10B3 #define CL_RGB 0x10B4 #define CL_RGBA 0x10B5 #define CL_BGRA 0x10B6 #define CL_ARGB 0x10B7 #define CL_INTENSITY 0x10B8 #define CL_LUMINANCE 0x10B9 #ifdef CL_VERSION_1_1 #define CL_Rx 0x10BA #define CL_RGx 0x10BB #define CL_RGBx 0x10BC #endif #ifdef CL_VERSION_1_2 #define CL_DEPTH 0x10BD #define CL_DEPTH_STENCIL 0x10BE #endif #ifdef CL_VERSION_2_0 #define CL_sRGB 0x10BF #define CL_sRGBx 0x10C0 #define CL_sRGBA 0x10C1 #define CL_sBGRA 0x10C2 #define CL_ABGR 0x10C3 #endif /* cl_channel_type */ #define CL_SNORM_INT8 0x10D0 #define CL_SNORM_INT16 0x10D1 #define CL_UNORM_INT8 0x10D2 #define CL_UNORM_INT16 0x10D3 #define CL_UNORM_SHORT_565 0x10D4 #define CL_UNORM_SHORT_555 0x10D5 #define CL_UNORM_INT_101010 0x10D6 #define CL_SIGNED_INT8 0x10D7 #define CL_SIGNED_INT16 0x10D8 #define CL_SIGNED_INT32 0x10D9 #define CL_UNSIGNED_INT8 0x10DA #define CL_UNSIGNED_INT16 0x10DB #define CL_UNSIGNED_INT32 0x10DC #define CL_HALF_FLOAT 0x10DD #define CL_FLOAT 0x10DE #ifdef CL_VERSION_1_2 #define CL_UNORM_INT24 0x10DF #endif #ifdef CL_VERSION_2_1 #define CL_UNORM_INT_101010_2 0x10E0 #endif /* cl_mem_object_type */ #define CL_MEM_OBJECT_BUFFER 0x10F0 #define CL_MEM_OBJECT_IMAGE2D 0x10F1 #define CL_MEM_OBJECT_IMAGE3D 0x10F2 #ifdef CL_VERSION_1_2 #define CL_MEM_OBJECT_IMAGE2D_ARRAY 0x10F3 #define CL_MEM_OBJECT_IMAGE1D 0x10F4 #define CL_MEM_OBJECT_IMAGE1D_ARRAY 0x10F5 #define CL_MEM_OBJECT_IMAGE1D_BUFFER 0x10F6 #endif #ifdef CL_VERSION_2_0 #define CL_MEM_OBJECT_PIPE 0x10F7 #endif /* cl_mem_info */ #define CL_MEM_TYPE 0x1100 #define CL_MEM_FLAGS 0x1101 #define CL_MEM_SIZE 0x1102 #define CL_MEM_HOST_PTR 0x1103 #define CL_MEM_MAP_COUNT 0x1104 #define CL_MEM_REFERENCE_COUNT 0x1105 #define CL_MEM_CONTEXT 0x1106 #ifdef CL_VERSION_1_1 #define CL_MEM_ASSOCIATED_MEMOBJECT 0x1107 #define CL_MEM_OFFSET 0x1108 #endif #ifdef CL_VERSION_2_0 #define CL_MEM_USES_SVM_POINTER 0x1109 #endif /* cl_image_info */ #define CL_IMAGE_FORMAT 0x1110 #define CL_IMAGE_ELEMENT_SIZE 0x1111 #define CL_IMAGE_ROW_PITCH 0x1112 #define CL_IMAGE_SLICE_PITCH 0x1113 #define CL_IMAGE_WIDTH 0x1114 #define CL_IMAGE_HEIGHT 0x1115 #define CL_IMAGE_DEPTH 0x1116 #ifdef CL_VERSION_1_2 #define CL_IMAGE_ARRAY_SIZE 0x1117 #define CL_IMAGE_BUFFER 0x1118 #define CL_IMAGE_NUM_MIP_LEVELS 0x1119 #define CL_IMAGE_NUM_SAMPLES 0x111A #endif #ifdef CL_VERSION_2_0 /* cl_pipe_info */ #define CL_PIPE_PACKET_SIZE 0x1120 #define CL_PIPE_MAX_PACKETS 0x1121 #endif /* cl_addressing_mode */ #define CL_ADDRESS_NONE 0x1130 #define CL_ADDRESS_CLAMP_TO_EDGE 0x1131 #define 
CL_ADDRESS_CLAMP 0x1132 #define CL_ADDRESS_REPEAT 0x1133 #ifdef CL_VERSION_1_1 #define CL_ADDRESS_MIRRORED_REPEAT 0x1134 #endif /* cl_filter_mode */ #define CL_FILTER_NEAREST 0x1140 #define CL_FILTER_LINEAR 0x1141 /* cl_sampler_info */ #define CL_SAMPLER_REFERENCE_COUNT 0x1150 #define CL_SAMPLER_CONTEXT 0x1151 #define CL_SAMPLER_NORMALIZED_COORDS 0x1152 #define CL_SAMPLER_ADDRESSING_MODE 0x1153 #define CL_SAMPLER_FILTER_MODE 0x1154 #ifdef CL_VERSION_2_0 /* These enumerants are for the cl_khr_mipmap_image extension. They have since been added to cl_ext.h with an appropriate KHR suffix, but are left here for backwards compatibility. */ #define CL_SAMPLER_MIP_FILTER_MODE 0x1155 #define CL_SAMPLER_LOD_MIN 0x1156 #define CL_SAMPLER_LOD_MAX 0x1157 #endif /* cl_map_flags - bitfield */ #define CL_MAP_READ (1 << 0) #define CL_MAP_WRITE (1 << 1) #ifdef CL_VERSION_1_2 #define CL_MAP_WRITE_INVALIDATE_REGION (1 << 2) #endif /* cl_program_info */ #define CL_PROGRAM_REFERENCE_COUNT 0x1160 #define CL_PROGRAM_CONTEXT 0x1161 #define CL_PROGRAM_NUM_DEVICES 0x1162 #define CL_PROGRAM_DEVICES 0x1163 #define CL_PROGRAM_SOURCE 0x1164 #define CL_PROGRAM_BINARY_SIZES 0x1165 #define CL_PROGRAM_BINARIES 0x1166 #ifdef CL_VERSION_1_2 #define CL_PROGRAM_NUM_KERNELS 0x1167 #define CL_PROGRAM_KERNEL_NAMES 0x1168 #endif #ifdef CL_VERSION_2_1 #define CL_PROGRAM_IL 0x1169 #endif #ifdef CL_VERSION_2_2 #define CL_PROGRAM_SCOPE_GLOBAL_CTORS_PRESENT 0x116A #define CL_PROGRAM_SCOPE_GLOBAL_DTORS_PRESENT 0x116B #endif /* cl_program_build_info */ #define CL_PROGRAM_BUILD_STATUS 0x1181 #define CL_PROGRAM_BUILD_OPTIONS 0x1182 #define CL_PROGRAM_BUILD_LOG 0x1183 #ifdef CL_VERSION_1_2 #define CL_PROGRAM_BINARY_TYPE 0x1184 #endif #ifdef CL_VERSION_2_0 #define CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE 0x1185 #endif #ifdef CL_VERSION_1_2 /* cl_program_binary_type */ #define CL_PROGRAM_BINARY_TYPE_NONE 0x0 #define CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT 0x1 #define CL_PROGRAM_BINARY_TYPE_LIBRARY 0x2 #define CL_PROGRAM_BINARY_TYPE_EXECUTABLE 0x4 #endif /* cl_build_status */ #define CL_BUILD_SUCCESS 0 #define CL_BUILD_NONE -1 #define CL_BUILD_ERROR -2 #define CL_BUILD_IN_PROGRESS -3 /* cl_kernel_info */ #define CL_KERNEL_FUNCTION_NAME 0x1190 #define CL_KERNEL_NUM_ARGS 0x1191 #define CL_KERNEL_REFERENCE_COUNT 0x1192 #define CL_KERNEL_CONTEXT 0x1193 #define CL_KERNEL_PROGRAM 0x1194 #ifdef CL_VERSION_1_2 #define CL_KERNEL_ATTRIBUTES 0x1195 #endif #ifdef CL_VERSION_1_2 /* cl_kernel_arg_info */ #define CL_KERNEL_ARG_ADDRESS_QUALIFIER 0x1196 #define CL_KERNEL_ARG_ACCESS_QUALIFIER 0x1197 #define CL_KERNEL_ARG_TYPE_NAME 0x1198 #define CL_KERNEL_ARG_TYPE_QUALIFIER 0x1199 #define CL_KERNEL_ARG_NAME 0x119A #endif #ifdef CL_VERSION_1_2 /* cl_kernel_arg_address_qualifier */ #define CL_KERNEL_ARG_ADDRESS_GLOBAL 0x119B #define CL_KERNEL_ARG_ADDRESS_LOCAL 0x119C #define CL_KERNEL_ARG_ADDRESS_CONSTANT 0x119D #define CL_KERNEL_ARG_ADDRESS_PRIVATE 0x119E #endif #ifdef CL_VERSION_1_2 /* cl_kernel_arg_access_qualifier */ #define CL_KERNEL_ARG_ACCESS_READ_ONLY 0x11A0 #define CL_KERNEL_ARG_ACCESS_WRITE_ONLY 0x11A1 #define CL_KERNEL_ARG_ACCESS_READ_WRITE 0x11A2 #define CL_KERNEL_ARG_ACCESS_NONE 0x11A3 #endif #ifdef CL_VERSION_1_2 /* cl_kernel_arg_type_qualifier */ #define CL_KERNEL_ARG_TYPE_NONE 0 #define CL_KERNEL_ARG_TYPE_CONST (1 << 0) #define CL_KERNEL_ARG_TYPE_RESTRICT (1 << 1) #define CL_KERNEL_ARG_TYPE_VOLATILE (1 << 2) #ifdef CL_VERSION_2_0 #define CL_KERNEL_ARG_TYPE_PIPE (1 << 3) #endif #endif /* cl_kernel_work_group_info */ #define 
CL_KERNEL_WORK_GROUP_SIZE 0x11B0 #define CL_KERNEL_COMPILE_WORK_GROUP_SIZE 0x11B1 #define CL_KERNEL_LOCAL_MEM_SIZE 0x11B2 #define CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE 0x11B3 #define CL_KERNEL_PRIVATE_MEM_SIZE 0x11B4 #ifdef CL_VERSION_1_2 #define CL_KERNEL_GLOBAL_WORK_SIZE 0x11B5 #endif #ifdef CL_VERSION_2_1 /* cl_kernel_sub_group_info */ #define CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE 0x2033 #define CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE 0x2034 #define CL_KERNEL_LOCAL_SIZE_FOR_SUB_GROUP_COUNT 0x11B8 #define CL_KERNEL_MAX_NUM_SUB_GROUPS 0x11B9 #define CL_KERNEL_COMPILE_NUM_SUB_GROUPS 0x11BA #endif #ifdef CL_VERSION_2_0 /* cl_kernel_exec_info */ #define CL_KERNEL_EXEC_INFO_SVM_PTRS 0x11B6 #define CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM 0x11B7 #endif /* cl_event_info */ #define CL_EVENT_COMMAND_QUEUE 0x11D0 #define CL_EVENT_COMMAND_TYPE 0x11D1 #define CL_EVENT_REFERENCE_COUNT 0x11D2 #define CL_EVENT_COMMAND_EXECUTION_STATUS 0x11D3 #ifdef CL_VERSION_1_1 #define CL_EVENT_CONTEXT 0x11D4 #endif /* cl_command_type */ #define CL_COMMAND_NDRANGE_KERNEL 0x11F0 #define CL_COMMAND_TASK 0x11F1 #define CL_COMMAND_NATIVE_KERNEL 0x11F2 #define CL_COMMAND_READ_BUFFER 0x11F3 #define CL_COMMAND_WRITE_BUFFER 0x11F4 #define CL_COMMAND_COPY_BUFFER 0x11F5 #define CL_COMMAND_READ_IMAGE 0x11F6 #define CL_COMMAND_WRITE_IMAGE 0x11F7 #define CL_COMMAND_COPY_IMAGE 0x11F8 #define CL_COMMAND_COPY_IMAGE_TO_BUFFER 0x11F9 #define CL_COMMAND_COPY_BUFFER_TO_IMAGE 0x11FA #define CL_COMMAND_MAP_BUFFER 0x11FB #define CL_COMMAND_MAP_IMAGE 0x11FC #define CL_COMMAND_UNMAP_MEM_OBJECT 0x11FD #define CL_COMMAND_MARKER 0x11FE #define CL_COMMAND_ACQUIRE_GL_OBJECTS 0x11FF #define CL_COMMAND_RELEASE_GL_OBJECTS 0x1200 #ifdef CL_VERSION_1_1 #define CL_COMMAND_READ_BUFFER_RECT 0x1201 #define CL_COMMAND_WRITE_BUFFER_RECT 0x1202 #define CL_COMMAND_COPY_BUFFER_RECT 0x1203 #define CL_COMMAND_USER 0x1204 #endif #ifdef CL_VERSION_1_2 #define CL_COMMAND_BARRIER 0x1205 #define CL_COMMAND_MIGRATE_MEM_OBJECTS 0x1206 #define CL_COMMAND_FILL_BUFFER 0x1207 #define CL_COMMAND_FILL_IMAGE 0x1208 #endif #ifdef CL_VERSION_2_0 #define CL_COMMAND_SVM_FREE 0x1209 #define CL_COMMAND_SVM_MEMCPY 0x120A #define CL_COMMAND_SVM_MEMFILL 0x120B #define CL_COMMAND_SVM_MAP 0x120C #define CL_COMMAND_SVM_UNMAP 0x120D #endif /* command execution status */ #define CL_COMPLETE 0x0 #define CL_RUNNING 0x1 #define CL_SUBMITTED 0x2 #define CL_QUEUED 0x3 #ifdef CL_VERSION_1_1 /* cl_buffer_create_type */ #define CL_BUFFER_CREATE_TYPE_REGION 0x1220 #endif /* cl_profiling_info */ #define CL_PROFILING_COMMAND_QUEUED 0x1280 #define CL_PROFILING_COMMAND_SUBMIT 0x1281 #define CL_PROFILING_COMMAND_START 0x1282 #define CL_PROFILING_COMMAND_END 0x1283 #ifdef CL_VERSION_2_0 #define CL_PROFILING_COMMAND_COMPLETE 0x1284 #endif #ifdef CL_EXPERIMENTAL /* cl_device_atomic_capabilities - bitfield */ #define CL_DEVICE_ATOMIC_ORDER_RELAXED (1 << 0) #define CL_DEVICE_ATOMIC_ORDER_ACQ_REL (1 << 1) #define CL_DEVICE_ATOMIC_ORDER_SEQ_CST (1 << 2) #define CL_DEVICE_ATOMIC_SCOPE_WORK_ITEM (1 << 3) #define CL_DEVICE_ATOMIC_SCOPE_WORK_GROUP (1 << 4) #define CL_DEVICE_ATOMIC_SCOPE_DEVICE (1 << 5) #define CL_DEVICE_ATOMIC_SCOPE_ALL_SVM_DEVICES (1 << 6) /* cl_device_info */ #define CL_DEVICE_ATOMIC_MEMORY_CAPABILITIES 0x1063 #define CL_DEVICE_ATOMIC_FENCE_CAPABILITIES 0x1064 #define CL_DEVICE_NON_UNIFORM_WORK_GROUP_SUPPORT 0x1065 #define CL_DEVICE_OPENCL_C_VERSIONS 0x1066 #define CL_DEVICE_MAX_WRITE_IMAGE3D_ARGS 0x1067 #define CL_DEVICE_WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT 0x1068 #define 
CL_DEVICE_GENERIC_ADDRESS_SPACE_SUPPORT 0x1069 /* 0x106A to 0x106E - Reserved for upcoming KHR extension */ #define CL_DEVICE_OPENCL_C_FEATURES 0x106F /* cl_command_type */ #define CL_COMMAND_SVM_MIGRATE_MEM 0x120E #endif /* cl_khronos_vendor_id */ #define CL_KHRONOS_VENDOR_ID_CODEPLAY 0x10004 /********************************************************************************************************/ /* Platform API */ extern CL_API_ENTRY cl_int CL_API_CALL clGetPlatformIDs(cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetPlatformInfo(cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; /* Device APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDs(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceInfo(cl_device_id device, cl_device_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 extern CL_API_ENTRY cl_int CL_API_CALL clCreateSubDevices(cl_device_id in_device, const cl_device_partition_property * properties, cl_uint num_devices, cl_device_id * out_devices, cl_uint * num_devices_ret) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clRetainDevice(cl_device_id device) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseDevice(cl_device_id device) CL_API_SUFFIX__VERSION_1_2; #endif #ifdef CL_VERSION_2_1 extern CL_API_ENTRY cl_int CL_API_CALL clSetDefaultDeviceCommandQueue(cl_context context, cl_device_id device, cl_command_queue command_queue) CL_API_SUFFIX__VERSION_2_1; extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceAndHostTimer(cl_device_id device, cl_ulong* device_timestamp, cl_ulong* host_timestamp) CL_API_SUFFIX__VERSION_2_1; extern CL_API_ENTRY cl_int CL_API_CALL clGetHostTimer(cl_device_id device, cl_ulong * host_timestamp) CL_API_SUFFIX__VERSION_2_1; #endif /* Context APIs */ extern CL_API_ENTRY cl_context CL_API_CALL clCreateContext(const cl_context_properties * properties, cl_uint num_devices, const cl_device_id * devices, void (CL_CALLBACK * pfn_notify)(const char * errinfo, const void * private_info, size_t cb, void * user_data), void * user_data, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_context CL_API_CALL clCreateContextFromType(const cl_context_properties * properties, cl_device_type device_type, void (CL_CALLBACK * pfn_notify)(const char * errinfo, const void * private_info, size_t cb, void * user_data), void * user_data, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainContext(cl_context context) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseContext(cl_context context) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetContextInfo(cl_context context, cl_context_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; /* Command Queue APIs */ #ifdef CL_VERSION_2_0 extern CL_API_ENTRY cl_command_queue CL_API_CALL clCreateCommandQueueWithProperties(cl_context context, cl_device_id device, const cl_queue_properties * properties, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_2_0; #endif extern 
CL_API_ENTRY cl_int CL_API_CALL clRetainCommandQueue(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseCommandQueue(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetCommandQueueInfo(cl_command_queue command_queue, cl_command_queue_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; /* Memory Object APIs */ extern CL_API_ENTRY cl_mem CL_API_CALL clCreateBuffer(cl_context context, cl_mem_flags flags, size_t size, void * host_ptr, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_1 extern CL_API_ENTRY cl_mem CL_API_CALL clCreateSubBuffer(cl_mem buffer, cl_mem_flags flags, cl_buffer_create_type buffer_create_type, const void * buffer_create_info, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_1; #endif #ifdef CL_VERSION_1_2 extern CL_API_ENTRY cl_mem CL_API_CALL clCreateImage(cl_context context, cl_mem_flags flags, const cl_image_format * image_format, const cl_image_desc * image_desc, void * host_ptr, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; #endif #ifdef CL_VERSION_2_0 extern CL_API_ENTRY cl_mem CL_API_CALL clCreatePipe(cl_context context, cl_mem_flags flags, cl_uint pipe_packet_size, cl_uint pipe_max_packets, const cl_pipe_properties * properties, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_2_0; #endif extern CL_API_ENTRY cl_int CL_API_CALL clRetainMemObject(cl_mem memobj) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseMemObject(cl_mem memobj) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetSupportedImageFormats(cl_context context, cl_mem_flags flags, cl_mem_object_type image_type, cl_uint num_entries, cl_image_format * image_formats, cl_uint * num_image_formats) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetMemObjectInfo(cl_mem memobj, cl_mem_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetImageInfo(cl_mem image, cl_image_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_2_0 extern CL_API_ENTRY cl_int CL_API_CALL clGetPipeInfo(cl_mem pipe, cl_pipe_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_2_0; #endif #ifdef CL_VERSION_1_1 extern CL_API_ENTRY cl_int CL_API_CALL clSetMemObjectDestructorCallback(cl_mem memobj, void (CL_CALLBACK * pfn_notify)(cl_mem memobj, void * user_data), void * user_data) CL_API_SUFFIX__VERSION_1_1; #endif /* SVM Allocation APIs */ #ifdef CL_VERSION_2_0 extern CL_API_ENTRY void * CL_API_CALL clSVMAlloc(cl_context context, cl_svm_mem_flags flags, size_t size, cl_uint alignment) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY void CL_API_CALL clSVMFree(cl_context context, void * svm_pointer) CL_API_SUFFIX__VERSION_2_0; #endif /* Sampler APIs */ #ifdef CL_VERSION_2_0 extern CL_API_ENTRY cl_sampler CL_API_CALL clCreateSamplerWithProperties(cl_context context, const cl_sampler_properties * sampler_properties, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_2_0; #endif extern CL_API_ENTRY cl_int CL_API_CALL clRetainSampler(cl_sampler sampler) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseSampler(cl_sampler sampler) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL 
clGetSamplerInfo(cl_sampler sampler, cl_sampler_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; /* Program Object APIs */ extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithSource(cl_context context, cl_uint count, const char ** strings, const size_t * lengths, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBinary(cl_context context, cl_uint num_devices, const cl_device_id * device_list, const size_t * lengths, const unsigned char ** binaries, cl_int * binary_status, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBuiltInKernels(cl_context context, cl_uint num_devices, const cl_device_id * device_list, const char * kernel_names, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; #endif #ifdef CL_VERSION_2_1 extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithIL(cl_context context, const void* il, size_t length, cl_int* errcode_ret) CL_API_SUFFIX__VERSION_2_1; #endif extern CL_API_ENTRY cl_int CL_API_CALL clRetainProgram(cl_program program) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseProgram(cl_program program) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clBuildProgram(cl_program program, cl_uint num_devices, const cl_device_id * device_list, const char * options, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 extern CL_API_ENTRY cl_int CL_API_CALL clCompileProgram(cl_program program, cl_uint num_devices, const cl_device_id * device_list, const char * options, cl_uint num_input_headers, const cl_program * input_headers, const char ** header_include_names, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_program CL_API_CALL clLinkProgram(cl_context context, cl_uint num_devices, const cl_device_id * device_list, const char * options, cl_uint num_input_programs, const cl_program * input_programs, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; #endif #ifdef CL_VERSION_2_2 extern CL_API_ENTRY cl_int CL_API_CALL clSetProgramReleaseCallback(cl_program program, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data) CL_API_SUFFIX__VERSION_2_2; extern CL_API_ENTRY cl_int CL_API_CALL clSetProgramSpecializationConstant(cl_program program, cl_uint spec_id, size_t spec_size, const void* spec_value) CL_API_SUFFIX__VERSION_2_2; #endif #ifdef CL_VERSION_1_2 extern CL_API_ENTRY cl_int CL_API_CALL clUnloadPlatformCompiler(cl_platform_id platform) CL_API_SUFFIX__VERSION_1_2; #endif extern CL_API_ENTRY cl_int CL_API_CALL clGetProgramInfo(cl_program program, cl_program_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetProgramBuildInfo(cl_program program, cl_device_id device, cl_program_build_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; /* Kernel Object APIs */ extern CL_API_ENTRY cl_kernel CL_API_CALL clCreateKernel(cl_program program, const char * kernel_name, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; extern 
CL_API_ENTRY cl_int CL_API_CALL clCreateKernelsInProgram(cl_program program, cl_uint num_kernels, cl_kernel * kernels, cl_uint * num_kernels_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_2_1 extern CL_API_ENTRY cl_kernel CL_API_CALL clCloneKernel(cl_kernel source_kernel, cl_int* errcode_ret) CL_API_SUFFIX__VERSION_2_1; #endif extern CL_API_ENTRY cl_int CL_API_CALL clRetainKernel(cl_kernel kernel) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseKernel(cl_kernel kernel) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelArg(cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void * arg_value) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_2_0 extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelArgSVMPointer(cl_kernel kernel, cl_uint arg_index, const void * arg_value) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelExecInfo(cl_kernel kernel, cl_kernel_exec_info param_name, size_t param_value_size, const void * param_value) CL_API_SUFFIX__VERSION_2_0; #endif extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelInfo(cl_kernel kernel, cl_kernel_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelArgInfo(cl_kernel kernel, cl_uint arg_indx, cl_kernel_arg_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_2; #endif extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelWorkGroupInfo(cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_2_1 extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelSubGroupInfo(cl_kernel kernel, cl_device_id device, cl_kernel_sub_group_info param_name, size_t input_value_size, const void* input_value, size_t param_value_size, void* param_value, size_t* param_value_size_ret) CL_API_SUFFIX__VERSION_2_1; #endif /* Event Object APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clWaitForEvents(cl_uint num_events, const cl_event * event_list) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetEventInfo(cl_event event, cl_event_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_1 extern CL_API_ENTRY cl_event CL_API_CALL clCreateUserEvent(cl_context context, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_1; #endif extern CL_API_ENTRY cl_int CL_API_CALL clRetainEvent(cl_event event) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseEvent(cl_event event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_1 extern CL_API_ENTRY cl_int CL_API_CALL clSetUserEventStatus(cl_event event, cl_int execution_status) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clSetEventCallback(cl_event event, cl_int command_exec_callback_type, void (CL_CALLBACK * pfn_notify)(cl_event event, cl_int event_command_status, void * user_data), void * user_data) CL_API_SUFFIX__VERSION_1_1; #endif /* Profiling APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clGetEventProfilingInfo(cl_event event, cl_profiling_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; /* Flush and Finish APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clFlush(cl_command_queue command_queue) 
CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clFinish(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0; /* Enqueued Commands APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t size, void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_1 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBufferRect(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, const size_t * buffer_offset, const size_t * host_offset, const size_t * region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_1; #endif extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t size, const void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_1 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBufferRect(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, const size_t * buffer_offset, const size_t * host_offset, const size_t * region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, const void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_1; #endif #ifdef CL_VERSION_1_2 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillBuffer(cl_command_queue command_queue, cl_mem buffer, const void * pattern, size_t pattern_size, size_t offset, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #endif extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBuffer(cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, size_t src_offset, size_t dst_offset, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_1 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferRect(cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, const size_t * src_origin, const size_t * dst_origin, const size_t * region, size_t src_row_pitch, size_t src_slice_pitch, size_t dst_row_pitch, size_t dst_slice_pitch, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_1; #endif extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadImage(cl_command_queue command_queue, cl_mem image, cl_bool blocking_read, const size_t * origin, const size_t * region, size_t row_pitch, size_t slice_pitch, void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteImage(cl_command_queue command_queue, cl_mem image, cl_bool blocking_write, const size_t * origin, const size_t * region, size_t input_row_pitch, size_t input_slice_pitch, const void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillImage(cl_command_queue command_queue, cl_mem image, 
const void * fill_color, const size_t * origin, const size_t * region, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #endif extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImage(cl_command_queue command_queue, cl_mem src_image, cl_mem dst_image, const size_t * src_origin, const size_t * dst_origin, const size_t * region, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImageToBuffer(cl_command_queue command_queue, cl_mem src_image, cl_mem dst_buffer, const size_t * src_origin, const size_t * region, size_t dst_offset, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferToImage(cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_image, size_t src_offset, const size_t * dst_origin, const size_t * region, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY void * CL_API_CALL clEnqueueMapBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_map, cl_map_flags map_flags, size_t offset, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY void * CL_API_CALL clEnqueueMapImage(cl_command_queue command_queue, cl_mem image, cl_bool blocking_map, cl_map_flags map_flags, const size_t * origin, const size_t * region, size_t * image_row_pitch, size_t * image_slice_pitch, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueUnmapMemObject(cl_command_queue command_queue, cl_mem memobj, void * mapped_ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMigrateMemObjects(cl_command_queue command_queue, cl_uint num_mem_objects, const cl_mem * mem_objects, cl_mem_migration_flags flags, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #endif extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueNDRangeKernel(cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t * global_work_offset, const size_t * global_work_size, const size_t * local_work_size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueNativeKernel(cl_command_queue command_queue, void (CL_CALLBACK * user_func)(void *), void * args, size_t cb_args, cl_uint num_mem_objects, const cl_mem * mem_list, const void ** args_mem_loc, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMarkerWithWaitList(cl_command_queue command_queue, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueBarrierWithWaitList(cl_command_queue command_queue, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #endif 
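/* A minimal end-to-end sketch tying the enqueue APIs above together (an
 * illustrative addition, not part of the original header; ctx, queue, kernel,
 * host_in, host_out and n are assumed to exist already): write a buffer,
 * launch a 1-D kernel over it, then read the result back with a blocking read.
 *
 *     cl_int err;
 *     cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
 *                                 n * sizeof(float), NULL, &err);
 *     clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float),
 *                          host_in, 0, NULL, NULL);
 *     clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
 *     size_t gws = n;
 *     clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gws, NULL,
 *                            0, NULL, NULL);
 *     clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float),
 *                         host_out, 0, NULL, NULL);
 *     clReleaseMemObject(buf);
 */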
#ifdef CL_VERSION_2_0 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMFree(cl_command_queue command_queue, cl_uint num_svm_pointers, void * svm_pointers[], void (CL_CALLBACK * pfn_free_func)(cl_command_queue queue, cl_uint num_svm_pointers, void * svm_pointers[], void * user_data), void * user_data, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemcpy(cl_command_queue command_queue, cl_bool blocking_copy, void * dst_ptr, const void * src_ptr, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemFill(cl_command_queue command_queue, void * svm_ptr, const void * pattern, size_t pattern_size, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMap(cl_command_queue command_queue, cl_bool blocking_map, cl_map_flags flags, void * svm_ptr, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMUnmap(cl_command_queue command_queue, void * svm_ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_2_0; #endif #ifdef CL_VERSION_2_1 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMigrateMem(cl_command_queue command_queue, cl_uint num_svm_pointers, const void ** svm_pointers, const size_t * sizes, cl_mem_migration_flags flags, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_2_1; #endif #ifdef CL_VERSION_1_2 /* Extension function access * * Returns the extension function address for the given function name, * or NULL if a valid function can not be found. The client must * check to make sure the address is not NULL, before using or * calling the returned function address. */ extern CL_API_ENTRY void * CL_API_CALL clGetExtensionFunctionAddressForPlatform(cl_platform_id platform, const char * func_name) CL_API_SUFFIX__VERSION_1_2; #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_0_APIS /* * WARNING: * This API introduces mutable state into the OpenCL implementation. It has been REMOVED * to better facilitate thread safety. The 1.0 API is not thread safe. It is not tested by the * OpenCL 1.1 conformance test, and consequently may not work or may not work dependably. * It is likely to be non-performant. Use of this API is not advised. Use at your own risk. * * Software developers previously relying on this API are instructed to set the command queue * properties when creating the queue, instead. 
 */
extern CL_API_ENTRY cl_int CL_API_CALL
clSetCommandQueueProperty(cl_command_queue command_queue, cl_command_queue_properties properties, cl_bool enable, cl_command_queue_properties * old_properties) CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED;
#endif /* CL_USE_DEPRECATED_OPENCL_1_0_APIS */

/* Deprecated OpenCL 1.1 APIs */
extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL
clCreateImage2D(cl_context context, cl_mem_flags flags, const cl_image_format * image_format, size_t image_width, size_t image_height, size_t image_row_pitch, void * host_ptr, cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL
clCreateImage3D(cl_context context, cl_mem_flags flags, const cl_image_format * image_format, size_t image_width, size_t image_height, size_t image_depth, size_t image_row_pitch, size_t image_slice_pitch, void * host_ptr, cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL
clEnqueueMarker(cl_command_queue command_queue, cl_event * event) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL
clEnqueueWaitForEvents(cl_command_queue command_queue, cl_uint num_events, const cl_event * event_list) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL
clEnqueueBarrier(cl_command_queue command_queue) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL
clUnloadCompiler(void) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED void * CL_API_CALL
clGetExtensionFunctionAddress(const char * func_name) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

/* Deprecated OpenCL 2.0 APIs */
extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_command_queue CL_API_CALL
clCreateCommandQueue(cl_context context, cl_device_id device, cl_command_queue_properties properties, cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_sampler CL_API_CALL
clCreateSampler(cl_context context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_int CL_API_CALL
clEnqueueTask(cl_command_queue command_queue, cl_kernel kernel, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED;

#ifdef __cplusplus
}
#endif

#endif /* __OPENCL_CL_H */

clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl.hpp

/* Modifications Copyright(C)[2021-2022] Advanced Micro Devices, Inc.
 * All rights reserved.
 *
 */
/*******************************************************************************
 * Copyright (c) 2008-2020 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 ******************************************************************************/

/*! \file
 *
 *   \brief C++ bindings for OpenCL 1.0 (rev 48), OpenCL 1.1 (rev 33) and
 *       OpenCL 1.2 (rev 15)
 *   \author Benedict R. Gaster, Laurent Morichetti and Lee Howes
 *
 *   Additions and fixes from:
 *       Brian Cole, March 3rd 2010 and April 2012
 *       Matt Gruenke, April 2012.
 *       Bruce Merry, February 2013.
 *       Tom Deakin and Simon McIntosh-Smith, July 2013
 *
 *   \version 1.2.9
 *   \date December 2015
 *
 *   Optional extension support
 *
 *         cl
 *         cl_ext_device_fission
 *             #define USE_CL_DEVICE_FISSION
 */

/*! \mainpage
 * \section intro Introduction
 * For many large applications C++ is the language of choice and so it seems
 * reasonable to define C++ bindings for OpenCL.
 *
 * The interface is contained within a single C++ header file \em cl.hpp and all
 * definitions are contained within the namespace \em cl. There is no additional
 * requirement to include \em cl.h and to use either the C++ or original C
 * bindings it is enough to simply include \em cl.hpp.
 *
 * The bindings themselves are lightweight and correspond closely to the
 * underlying C API. Using the C++ bindings introduces no additional execution
 * overhead.
 *
 * For detailed documentation on the bindings see:
 *
 * The OpenCL C++ Wrapper API 1.2 (revision 09)
 *  http://www.khronos.org/registry/cl/specs/opencl-cplusplus-1.2.pdf
 *
 * \section example Example
 *
 * The following example shows a general use case for the C++
 * bindings, including support for the optional exception feature and
 * also the supplied vector and string classes, see following sections for
 * descriptions of these features.
 *
 * \code
 * #define __CL_ENABLE_EXCEPTIONS
 *
 * #if defined(__APPLE__) || defined(__MACOSX)
 * #include <OpenCL/cl.hpp>
 * #else
 * #include <CL/cl.hpp>
 * #endif
 * #include <cstdio>
 * #include <cstdlib>
 * #include <iostream>
 *
 *  const char * helloStr  = "__kernel void "
 *                           "hello(void) "
 *                           "{ "
 *                           "  "
 *                           "} ";
 *
 *  int
 *  main(void)
 *  {
 *     cl_int err = CL_SUCCESS;
 *     try {
 *
 *       std::vector<cl::Platform> platforms;
 *       cl::Platform::get(&platforms);
 *       if (platforms.size() == 0) {
 *           std::cout << "Platform size 0\n";
 *           return -1;
 *       }
 *
 *       cl_context_properties properties[] =
 *          { CL_CONTEXT_PLATFORM, (cl_context_properties)(platforms[0])(), 0};
 *       cl::Context context(CL_DEVICE_TYPE_CPU, properties);
 *
 *       std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
 *
 *       cl::Program::Sources source(1,
 *           std::make_pair(helloStr,strlen(helloStr)));
 *       cl::Program program_ = cl::Program(context, source);
 *       program_.build(devices);
 *
 *       cl::Kernel kernel(program_, "hello", &err);
 *
 *       cl::Event event;
 *       cl::CommandQueue queue(context, devices[0], 0, &err);
 *       queue.enqueueNDRangeKernel(
 *           kernel,
 *           cl::NullRange,
 *           cl::NDRange(4,4),
 *           cl::NullRange,
 *           NULL,
 *           &event);
 *
 *       event.wait();
 *     }
 *     catch (cl::Error err) {
 *        std::cerr
 *           << "ERROR: "
 *           << err.what()
 *           << "("
 *           << err.err()
 *           << ")"
 *           << std::endl;
 *     }
 *
 *    return EXIT_SUCCESS;
 *  }
 *
 * \endcode
 *
 */

#ifndef CL_HPP_
#define CL_HPP_

#ifdef _WIN32

#include <malloc.h>

#if defined(USE_DX_INTEROP)
#include <CL/cl_d3d10.h>
#include <CL/cl_dx9_media_sharing.h>
#endif
#endif // _WIN32

#if defined(_MSC_VER)
#include <intrin.h>
#endif // _MSC_VER

// 
#if defined(USE_CL_DEVICE_FISSION)
#include <CL/cl_ext.h>
#endif

#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/opencl.h>
#else
#include <CL/opencl.h>
#endif // !__APPLE__

#if (_MSC_VER >= 1700) || (__cplusplus >= 201103L)
#define CL_HPP_RVALUE_REFERENCES_SUPPORTED
#define CL_HPP_CPP11_ATOMICS_SUPPORTED
#include <atomic>
#endif

#if (__cplusplus >= 201103L)
#define CL_HPP_NOEXCEPT noexcept
#else
#define CL_HPP_NOEXCEPT
#endif

// To avoid accidentally taking ownership of core OpenCL types
// such as cl_kernel constructors are made explicit
// under OpenCL 1.2
#if defined(CL_VERSION_1_2) && !defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
#define __CL_EXPLICIT_CONSTRUCTORS explicit
#else // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
#define __CL_EXPLICIT_CONSTRUCTORS
#endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)

// Define deprecated prefixes and suffixes to ensure compilation
// in case they are not pre-defined
#if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)
#define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED
#endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)
#if !defined(CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED)
#define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
#endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED)

#if !defined(CL_CALLBACK)
#define CL_CALLBACK
#endif //CL_CALLBACK

#include <utility>
#include <limits>
#include <iterator>

#if defined(__CL_ENABLE_EXCEPTIONS)
#include <exception>
#endif // #if defined(__CL_ENABLE_EXCEPTIONS)

#if !defined(__NO_STD_VECTOR)
#include <vector>
#endif

#if !defined(__NO_STD_STRING)
#include <string>
#endif

#if defined(__ANDROID__) || defined(linux) || defined(__APPLE__) || defined(__MACOSX)
#include <alloca.h>
#endif // linux

#include <cstring>

/*! \namespace cl
 *
 * \brief The OpenCL C++ bindings are defined within this namespace.
 *
 */
namespace cl {

class Memory;

/**
 * Deprecated APIs for 1.2
 */
#if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2))
#define __INIT_CL_EXT_FCN_PTR(name) \
    if(!pfn_##name) { \
        pfn_##name = (PFN_##name) \
            clGetExtensionFunctionAddress(#name); \
        if(!pfn_##name) { \
        } \
    }
#endif // #if defined(CL_VERSION_1_1)

#if defined(CL_VERSION_1_2)
#define __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, name) \
    if(!pfn_##name) { \
        pfn_##name = (PFN_##name) \
            clGetExtensionFunctionAddressForPlatform(platform, #name); \
        if(!pfn_##name) { \
        } \
    }
#endif // #if defined(CL_VERSION_1_1)

class Program;
class Device;
class Context;
class CommandQueue;
class Memory;
class Buffer;

#if defined(__CL_ENABLE_EXCEPTIONS)
/*! \brief Exception class
 *
 *  This may be thrown by API functions when __CL_ENABLE_EXCEPTIONS is defined.
 */
class Error : public std::exception
{
private:
    cl_int err_;
    const char * errStr_;
public:
    /*! \brief Create a new CL error exception for a given error code
     *  and corresponding message.
     *
     *  \param err error code value.
     *
     *  \param errStr a descriptive string that must remain in scope until
     *                handling of the exception has concluded. If set, it
     *                will be returned by what().
     */
    Error(cl_int err, const char * errStr = NULL) : err_(err), errStr_(errStr)
    {}

    ~Error() throw() {}

    /*! \brief Get error string associated with exception
     *
     * \return A memory pointer to the error message string.
     */
    virtual const char * what() const throw ()
    {
        if (errStr_ == NULL) {
            return "empty";
        }
        else {
            return errStr_;
        }
    }

    /*! \brief Get error code associated with exception
     *
     *  \return The error code.
     */
    cl_int err(void) const { return err_; }
};

#define __ERR_STR(x) #x
#else
#define __ERR_STR(x) NULL
#endif // __CL_ENABLE_EXCEPTIONS


namespace detail
{
#if defined(__CL_ENABLE_EXCEPTIONS)
static inline cl_int errHandler (
    cl_int err,
    const char * errStr = NULL)
{
    if (err != CL_SUCCESS) {
        throw Error(err, errStr);
    }
    return err;
}
#else
static inline cl_int errHandler (cl_int err, const char * errStr = NULL)
{
    (void) errStr; // suppress unused variable warning
    return err;
}
#endif // __CL_ENABLE_EXCEPTIONS
}

//!
\cond DOXYGEN_DETAIL #if !defined(__CL_USER_OVERRIDE_ERROR_STRINGS) #define __GET_DEVICE_INFO_ERR __ERR_STR(clGetDeviceInfo) #define __GET_PLATFORM_INFO_ERR __ERR_STR(clGetPlatformInfo) #define __GET_DEVICE_IDS_ERR __ERR_STR(clGetDeviceIDs) #define __GET_PLATFORM_IDS_ERR __ERR_STR(clGetPlatformIDs) #define __GET_CONTEXT_INFO_ERR __ERR_STR(clGetContextInfo) #define __GET_EVENT_INFO_ERR __ERR_STR(clGetEventInfo) #define __GET_EVENT_PROFILE_INFO_ERR __ERR_STR(clGetEventProfileInfo) #define __GET_MEM_OBJECT_INFO_ERR __ERR_STR(clGetMemObjectInfo) #define __GET_IMAGE_INFO_ERR __ERR_STR(clGetImageInfo) #define __GET_SAMPLER_INFO_ERR __ERR_STR(clGetSamplerInfo) #define __GET_KERNEL_INFO_ERR __ERR_STR(clGetKernelInfo) #if defined(CL_VERSION_1_2) #define __GET_KERNEL_ARG_INFO_ERR __ERR_STR(clGetKernelArgInfo) #endif // #if defined(CL_VERSION_1_2) #define __GET_KERNEL_WORK_GROUP_INFO_ERR __ERR_STR(clGetKernelWorkGroupInfo) #define __GET_PROGRAM_INFO_ERR __ERR_STR(clGetProgramInfo) #define __GET_PROGRAM_BUILD_INFO_ERR __ERR_STR(clGetProgramBuildInfo) #define __GET_COMMAND_QUEUE_INFO_ERR __ERR_STR(clGetCommandQueueInfo) #define __CREATE_CONTEXT_ERR __ERR_STR(clCreateContext) #define __CREATE_CONTEXT_FROM_TYPE_ERR __ERR_STR(clCreateContextFromType) #define __GET_SUPPORTED_IMAGE_FORMATS_ERR __ERR_STR(clGetSupportedImageFormats) #define __CREATE_BUFFER_ERR __ERR_STR(clCreateBuffer) #define __COPY_ERR __ERR_STR(cl::copy) #define __CREATE_SUBBUFFER_ERR __ERR_STR(clCreateSubBuffer) #define __CREATE_GL_BUFFER_ERR __ERR_STR(clCreateFromGLBuffer) #define __CREATE_GL_RENDER_BUFFER_ERR __ERR_STR(clCreateFromGLBuffer) #define __GET_GL_OBJECT_INFO_ERR __ERR_STR(clGetGLObjectInfo) #if defined(CL_VERSION_1_2) #define __CREATE_IMAGE_ERR __ERR_STR(clCreateImage) #define __CREATE_GL_TEXTURE_ERR __ERR_STR(clCreateFromGLTexture) #define __IMAGE_DIMENSION_ERR __ERR_STR(Incorrect image dimensions) #endif // #if defined(CL_VERSION_1_2) #define __CREATE_SAMPLER_ERR __ERR_STR(clCreateSampler) #define __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR __ERR_STR(clSetMemObjectDestructorCallback) #define __CREATE_USER_EVENT_ERR __ERR_STR(clCreateUserEvent) #define __SET_USER_EVENT_STATUS_ERR __ERR_STR(clSetUserEventStatus) #define __SET_EVENT_CALLBACK_ERR __ERR_STR(clSetEventCallback) #define __WAIT_FOR_EVENTS_ERR __ERR_STR(clWaitForEvents) #define __CREATE_KERNEL_ERR __ERR_STR(clCreateKernel) #define __SET_KERNEL_ARGS_ERR __ERR_STR(clSetKernelArg) #define __CREATE_PROGRAM_WITH_SOURCE_ERR __ERR_STR(clCreateProgramWithSource) #define __CREATE_PROGRAM_WITH_BINARY_ERR __ERR_STR(clCreateProgramWithBinary) #if defined(CL_VERSION_1_2) #define __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR __ERR_STR(clCreateProgramWithBuiltInKernels) #endif // #if defined(CL_VERSION_1_2) #define __BUILD_PROGRAM_ERR __ERR_STR(clBuildProgram) #if defined(CL_VERSION_1_2) #define __COMPILE_PROGRAM_ERR __ERR_STR(clCompileProgram) #define __LINK_PROGRAM_ERR __ERR_STR(clLinkProgram) #endif // #if defined(CL_VERSION_1_2) #define __CREATE_KERNELS_IN_PROGRAM_ERR __ERR_STR(clCreateKernelsInProgram) #define __CREATE_COMMAND_QUEUE_ERR __ERR_STR(clCreateCommandQueue) #define __SET_COMMAND_QUEUE_PROPERTY_ERR __ERR_STR(clSetCommandQueueProperty) #define __ENQUEUE_READ_BUFFER_ERR __ERR_STR(clEnqueueReadBuffer) #define __ENQUEUE_READ_BUFFER_RECT_ERR __ERR_STR(clEnqueueReadBufferRect) #define __ENQUEUE_WRITE_BUFFER_ERR __ERR_STR(clEnqueueWriteBuffer) #define __ENQUEUE_WRITE_BUFFER_RECT_ERR __ERR_STR(clEnqueueWriteBufferRect) #define __ENQEUE_COPY_BUFFER_ERR 
__ERR_STR(clEnqueueCopyBuffer) #define __ENQEUE_COPY_BUFFER_RECT_ERR __ERR_STR(clEnqueueCopyBufferRect) #define __ENQUEUE_FILL_BUFFER_ERR __ERR_STR(clEnqueueFillBuffer) #define __ENQUEUE_READ_IMAGE_ERR __ERR_STR(clEnqueueReadImage) #define __ENQUEUE_WRITE_IMAGE_ERR __ERR_STR(clEnqueueWriteImage) #define __ENQUEUE_COPY_IMAGE_ERR __ERR_STR(clEnqueueCopyImage) #define __ENQUEUE_FILL_IMAGE_ERR __ERR_STR(clEnqueueFillImage) #define __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR __ERR_STR(clEnqueueCopyImageToBuffer) #define __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR __ERR_STR(clEnqueueCopyBufferToImage) #define __ENQUEUE_MAP_BUFFER_ERR __ERR_STR(clEnqueueMapBuffer) #define __ENQUEUE_MAP_IMAGE_ERR __ERR_STR(clEnqueueMapImage) #define __ENQUEUE_UNMAP_MEM_OBJECT_ERR __ERR_STR(clEnqueueUnMapMemObject) #define __ENQUEUE_NDRANGE_KERNEL_ERR __ERR_STR(clEnqueueNDRangeKernel) #define __ENQUEUE_TASK_ERR __ERR_STR(clEnqueueTask) #define __ENQUEUE_NATIVE_KERNEL __ERR_STR(clEnqueueNativeKernel) #if defined(CL_VERSION_1_2) #define __ENQUEUE_MIGRATE_MEM_OBJECTS_ERR __ERR_STR(clEnqueueMigrateMemObjects) #endif // #if defined(CL_VERSION_1_2) #define __ENQUEUE_ACQUIRE_GL_ERR __ERR_STR(clEnqueueAcquireGLObjects) #define __ENQUEUE_RELEASE_GL_ERR __ERR_STR(clEnqueueReleaseGLObjects) #define __RETAIN_ERR __ERR_STR(Retain Object) #define __RELEASE_ERR __ERR_STR(Release Object) #define __FLUSH_ERR __ERR_STR(clFlush) #define __FINISH_ERR __ERR_STR(clFinish) #define __VECTOR_CAPACITY_ERR __ERR_STR(Vector capacity error) /** * CL 1.2 version that uses device fission. */ #if defined(CL_VERSION_1_2) #define __CREATE_SUB_DEVICES __ERR_STR(clCreateSubDevices) #else #define __CREATE_SUB_DEVICES __ERR_STR(clCreateSubDevicesEXT) #endif // #if defined(CL_VERSION_1_2) /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2)) #define __ENQUEUE_MARKER_ERR __ERR_STR(clEnqueueMarker) #define __ENQUEUE_WAIT_FOR_EVENTS_ERR __ERR_STR(clEnqueueWaitForEvents) #define __ENQUEUE_BARRIER_ERR __ERR_STR(clEnqueueBarrier) #define __UNLOAD_COMPILER_ERR __ERR_STR(clUnloadCompiler) #define __CREATE_GL_TEXTURE_2D_ERR __ERR_STR(clCreateFromGLTexture2D) #define __CREATE_GL_TEXTURE_3D_ERR __ERR_STR(clCreateFromGLTexture3D) #define __CREATE_IMAGE2D_ERR __ERR_STR(clCreateImage2D) #define __CREATE_IMAGE3D_ERR __ERR_STR(clCreateImage3D) #endif // #if defined(CL_VERSION_1_1) #endif // __CL_USER_OVERRIDE_ERROR_STRINGS //! \endcond /** * CL 1.2 marker and barrier commands */ #if defined(CL_VERSION_1_2) #define __ENQUEUE_MARKER_WAIT_LIST_ERR __ERR_STR(clEnqueueMarkerWithWaitList) #define __ENQUEUE_BARRIER_WAIT_LIST_ERR __ERR_STR(clEnqueueBarrierWithWaitList) #endif // #if defined(CL_VERSION_1_2) #if !defined(__USE_DEV_STRING) && !defined(__NO_STD_STRING) typedef std::string STRING_CLASS; #elif !defined(__USE_DEV_STRING) /*! \class string * \brief Simple string class, that provides a limited subset of std::string * functionality but avoids many of the issues that come with that class. * \note Deprecated. Please use std::string as default or * re-define the string class to match the std::string * interface by defining STRING_CLASS */ class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED string { private: ::size_t size_; char * str_; public: //! \brief Constructs an empty string, allocating no memory. string(void) : size_(0), str_(NULL) { } /*! \brief Constructs a string populated from an arbitrary value of * specified size. * * An extra '\0' is added, in case none was contained in str. 
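 *
 * For example (illustrative): string("abc", 3) copies three characters
 * and stores "abc\0" internally.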
* * \param str the initial value of the string instance. Note that '\0' * characters receive no special treatment. If NULL, * the string is left empty, with a size of 0. * * \param size the number of characters to copy from str. */ string(const char * str, ::size_t size) : size_(size), str_(NULL) { if( size > 0 ) { str_ = new char[size_+1]; if (str_ != NULL) { memcpy(str_, str, size_ * sizeof(char)); str_[size_] = '\0'; } else { size_ = 0; } } } /*! \brief Constructs a string populated from a null-terminated value. * * \param str the null-terminated initial value of the string instance. * If NULL, the string is left empty, with a size of 0. */ string(const char * str) : size_(0), str_(NULL) { if( str ) { size_= ::strlen(str); } if( size_ > 0 ) { str_ = new char[size_ + 1]; if (str_ != NULL) { memcpy(str_, str, (size_ + 1) * sizeof(char)); } } } void resize( ::size_t n ) { if( size_ == n ) { return; } if (n == 0) { if( str_ ) { delete [] str_; } str_ = NULL; size_ = 0; } else { char *newString = new char[n + 1]; ::size_t copySize = n; if( size_ < n ) { copySize = size_; } size_ = n; if(str_) { memcpy(newString, str_, (copySize + 1) * sizeof(char)); } if( copySize < size_ ) { memset(newString + copySize, 0, size_ - copySize); } newString[size_] = '\0'; delete [] str_; str_ = newString; } } const char& operator[] ( ::size_t pos ) const { return str_[pos]; } char& operator[] ( ::size_t pos ) { return str_[pos]; } /*! \brief Copies the value of another string to this one. * * \param rhs the string to copy. * * \returns a reference to the modified instance. */ string& operator=(const string& rhs) { if (this == &rhs) { return *this; } if( str_ != NULL ) { delete [] str_; str_ = NULL; size_ = 0; } if (rhs.size_ == 0 || rhs.str_ == NULL) { str_ = NULL; size_ = 0; } else { str_ = new char[rhs.size_ + 1]; size_ = rhs.size_; if (str_ != NULL) { memcpy(str_, rhs.str_, (size_ + 1) * sizeof(char)); } else { size_ = 0; } } return *this; } /*! \brief Constructs a string by copying the value of another instance. * * \param rhs the string to copy. */ string(const string& rhs) : size_(0), str_(NULL) { *this = rhs; } //! \brief Destructor - frees memory used to hold the current value. ~string() { delete[] str_; str_ = NULL; } //! \brief Queries the length of the string, excluding any added '\0's. ::size_t size(void) const { return size_; } //! \brief Queries the length of the string, excluding any added '\0's. ::size_t length(void) const { return size(); } /*! \brief Returns a pointer to the private copy held by this instance, * or "" if empty/unset. */ const char * c_str(void) const { return (str_) ? str_ : "";} } CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; typedef cl::string STRING_CLASS; #endif // #elif !defined(__USE_DEV_STRING) #if !defined(__USE_DEV_VECTOR) && !defined(__NO_STD_VECTOR) #define VECTOR_CLASS std::vector #elif !defined(__USE_DEV_VECTOR) #define VECTOR_CLASS cl::vector #if !defined(__MAX_DEFAULT_VECTOR_SIZE) #define __MAX_DEFAULT_VECTOR_SIZE 10 #endif /*! \class vector * \brief Fixed sized vector implementation that mirroring * * \note Deprecated. Please use std::vector as default or * re-define the vector class to match the std::vector * interface by defining VECTOR_CLASS * \note Not recommended for use with custom objects as * current implementation will construct N elements * * std::vector functionality. * \brief Fixed sized vector compatible with std::vector. 
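 *
 * Illustrative use (an editorial sketch; N defaults to
 * __MAX_DEFAULT_VECTOR_SIZE):
 * \code
 * cl::vector<int, 4> v;  // capacity fixed at 4; storage lives inside the object
 * v.push_back(1);        // exceeding the capacity is reported via errHandler
 * \endcode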
* * \note * This differs from std::vector<> not just in memory allocation, * but also in terms of when members are constructed, destroyed, * and assigned instead of being copy constructed. * * \param T type of element contained in the vector. * * \param N maximum size of the vector. */ template class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED vector { private: T data_[N]; unsigned int size_; public: //! \brief Constructs an empty vector with no memory allocated. vector() : size_(static_cast(0)) {} //! \brief Deallocates the vector's memory and destroys all of its elements. ~vector() { clear(); } //! \brief Returns the number of elements currently contained. unsigned int size(void) const { return size_; } /*! \brief Empties the vector of all elements. * \note * This does not deallocate memory but will invoke destructors * on contained elements. */ void clear() { while(!empty()) { pop_back(); } } /*! \brief Appends an element after the last valid element. * Calling this on a vector that has reached capacity will throw an * exception if exceptions are enabled. */ void push_back (const T& x) { if (size() < N) { new (&data_[size_]) T(x); size_++; } else { detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR); } } /*! \brief Removes the last valid element from the vector. * Calling this on an empty vector will throw an exception * if exceptions are enabled. */ void pop_back(void) { if (size_ != 0) { --size_; data_[size_].~T(); } else { detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR); } } /*! \brief Constructs with a value copied from another. * * \param vec the vector to copy. */ vector(const vector& vec) : size_(vec.size_) { if (size_ != 0) { assign(vec.begin(), vec.end()); } } /*! \brief Constructs with a specified number of initial elements. * * \param size number of initial elements. * * \param val value of initial elements. */ vector(unsigned int size, const T& val = T()) : size_(0) { for (unsigned int i = 0; i < size; i++) { push_back(val); } } /*! \brief Overwrites the current content with that copied from another * instance. * * \param rhs vector to copy. * * \returns a reference to this. */ vector& operator=(const vector& rhs) { if (this == &rhs) { return *this; } if (rhs.size_ != 0) { assign(rhs.begin(), rhs.end()); } else { clear(); } return *this; } /*! \brief Tests equality against another instance. * * \param vec the vector against which to compare. */ bool operator==(vector &vec) { if (size() != vec.size()) { return false; } for( unsigned int i = 0; i < size(); ++i ) { if( operator[](i) != vec[i] ) { return false; } } return true; } //! \brief Conversion operator to T*. operator T* () { return data_; } //! \brief Conversion operator to const T*. operator const T* () const { return data_; } //! \brief Tests whether this instance has any elements. bool empty (void) const { return size_==0; } //! \brief Returns the maximum number of elements this instance can hold. unsigned int max_size (void) const { return N; } //! \brief Returns the maximum number of elements this instance can hold. unsigned int capacity () const { return N; } //! \brief Resizes the vector to the given size void resize(unsigned int newSize, T fill = T()) { if (newSize > N) { detail::errHandler(CL_MEM_OBJECT_ALLOCATION_FAILURE, __VECTOR_CAPACITY_ERR); } else { while (size_ < newSize) { new (&data_[size_]) T(fill); size_++; } while (size_ > newSize) { --size_; data_[size_].~T(); } } } /*! \brief Returns a reference to a given element. 
* * \param index which element to access. * * \note * The caller is responsible for ensuring index is >= 0 and < size(). */ T& operator[](int index) { return data_[index]; } /*! \brief Returns a const reference to a given element. * * \param index which element to access. * * \note * The caller is responsible for ensuring index is >= 0 and < size(). */ const T& operator[](int index) const { return data_[index]; } /*! \brief Assigns elements of the vector based on a source iterator range. * * \param start Beginning iterator of source range * \param end Enditerator of source range * * \note * Will throw an exception if exceptions are enabled and size exceeded. */ template void assign(I start, I end) { clear(); while(start != end) { push_back(*start); start++; } } /*! \class iterator * \brief Const iterator class for vectors */ class iterator { private: const vector *vec_; int index_; /** * Internal iterator constructor to capture reference * to the vector it iterates over rather than taking * the vector by copy. */ iterator (const vector &vec, int index) : vec_(&vec) { if( !vec.empty() ) { index_ = index; } else { index_ = -1; } } public: iterator(void) : index_(-1), vec_(NULL) { } iterator(const iterator& rhs) : vec_(rhs.vec_), index_(rhs.index_) { } ~iterator(void) {} static iterator begin(const cl::vector &vec) { iterator i(vec, 0); return i; } static iterator end(const cl::vector &vec) { iterator i(vec, vec.size()); return i; } bool operator==(iterator i) { return ((vec_ == i.vec_) && (index_ == i.index_)); } bool operator!=(iterator i) { return (!(*this==i)); } iterator& operator++() { ++index_; return *this; } iterator operator++(int) { iterator retVal(*this); ++index_; return retVal; } iterator& operator--() { --index_; return *this; } iterator operator--(int) { iterator retVal(*this); --index_; return retVal; } const T& operator *() const { return (*vec_)[index_]; } }; iterator begin(void) { return iterator::begin(*this); } iterator begin(void) const { return iterator::begin(*this); } iterator end(void) { return iterator::end(*this); } iterator end(void) const { return iterator::end(*this); } T& front(void) { return data_[0]; } T& back(void) { return data_[size_]; } const T& front(void) const { return data_[0]; } const T& back(void) const { return data_[size_-1]; } } CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; #endif // #if !defined(__USE_DEV_VECTOR) && !defined(__NO_STD_VECTOR) namespace detail { #define __DEFAULT_NOT_INITIALIZED 1 #define __DEFAULT_BEING_INITIALIZED 2 #define __DEFAULT_INITIALIZED 4 /* * Compare and exchange primitives are needed for handling of defaults */ #ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED inline int compare_exchange(std::atomic * dest, int exchange, int comparand) #else // !CL_HPP_CPP11_ATOMICS_SUPPORTED inline int compare_exchange(volatile int * dest, int exchange, int comparand) #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED { #ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED std::atomic_compare_exchange_strong(dest, &comparand, exchange); return comparand; #elif _MSC_VER return (int)(_InterlockedCompareExchange( (volatile long*)dest, (long)exchange, (long)comparand)); #else // !_MSC_VER && !CL_HPP_CPP11_ATOMICS_SUPPORTED return (__sync_val_compare_and_swap( dest, comparand, exchange)); #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED } inline void fence() { #ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED std::atomic_thread_fence(std::memory_order_seq_cst); #elif _MSC_VER // !CL_HPP_CPP11_ATOMICS_SUPPORTED _ReadWriteBarrier(); #else // !_MSC_VER && !CL_HPP_CPP11_ATOMICS_SUPPORTED 
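    // GCC/Clang builtin issuing a full memory barrier (pre-C++11 fallback)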
    __sync_synchronize();
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
}

} // namespace detail


/*! \brief class used to interface between C++ and
 *  OpenCL C calls that require arrays of size_t values, whose
 *  size is known statically.
 */
template <int N>
class size_t
{
private:
    ::size_t data_[N];

public:
    //! \brief Initialize size_t to all 0s
    size_t()
    {
        for( int i = 0; i < N; ++i ) {
            data_[i] = 0;
        }
    }

    ::size_t& operator[](int index)
    {
        return data_[index];
    }

    const ::size_t& operator[](int index) const
    {
        return data_[index];
    }

    //! \brief Conversion operator to T*.
    operator ::size_t* ()             { return data_; }

    //! \brief Conversion operator to const T*.
    operator const ::size_t* () const { return data_; }
};

namespace detail {

// Generic getInfoHelper. The final parameter is used to guide overload
// resolution: the actual parameter passed is an int, which makes this
// a worse conversion sequence than a specialization that declares the
// parameter as an int.
template<typename Functor, typename T>
inline cl_int getInfoHelper(Functor f, cl_uint name, T* param, long)
{
    return f(name, sizeof(T), param, NULL);
}

// Specialized getInfoHelper for VECTOR_CLASS params
template <typename Func, typename T>
inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS<T>* param, long)
{
    ::size_t required;
    cl_int err = f(name, 0, NULL, &required);
    if (err != CL_SUCCESS) {
        return err;
    }

    T* value = (T*) alloca(required);
    err = f(name, required, value, NULL);
    if (err != CL_SUCCESS) {
        return err;
    }

    param->assign(&value[0], &value[required/sizeof(T)]);
    return CL_SUCCESS;
}

/* Specialization for reference-counted types. This depends on the
 * existence of Wrapper<T>::cl_type, and none of the other types having the
 * cl_type member. Note that simply specifying the parameter as Wrapper<T>
 * does not work, because when using a derived type (e.g. Context) the generic
 * template will provide a better match.
*/ template inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS* param, int, typename T::cl_type = 0) { ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } typename T::cl_type * value = (typename T::cl_type *) alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } ::size_t elements = required / sizeof(typename T::cl_type); param->assign(&value[0], &value[elements]); for (::size_t i = 0; i < elements; i++) { if (value[i] != NULL) { err = (*param)[i].retain(); if (err != CL_SUCCESS) { return err; } } } return CL_SUCCESS; } // Specialized for getInfo template inline cl_int getInfoHelper(Func f, cl_uint name, VECTOR_CLASS* param, int) { cl_int err = f(name, param->size() * sizeof(char *), &(*param)[0], NULL); if (err != CL_SUCCESS) { return err; } return CL_SUCCESS; } // Specialized GetInfoHelper for STRING_CLASS params template inline cl_int getInfoHelper(Func f, cl_uint name, STRING_CLASS* param, long) { #if defined(__NO_STD_VECTOR) || defined(__NO_STD_STRING) ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } char* value = (char*)alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } *param = value; return CL_SUCCESS; #else ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } // std::string has a constant data member // a char vector does not VECTOR_CLASS value(required); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { param->assign(value.begin(), value.end()); } #endif return CL_SUCCESS; } // Specialized GetInfoHelper for cl::size_t params template inline cl_int getInfoHelper(Func f, cl_uint name, size_t* param, long) { ::size_t required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } ::size_t* value = (::size_t*) alloca(required); err = f(name, required, value, NULL); if (err != CL_SUCCESS) { return err; } for(int i = 0; i < N; ++i) { (*param)[i] = value[i]; } return CL_SUCCESS; } template struct ReferenceHandler; /* Specialization for reference-counted types. This depends on the * existence of Wrapper::cl_type, and none of the other types having the * cl_type member. Note that simplify specifying the parameter as Wrapper * does not work, because when using a derived type (e.g. Context) the generic * template will provide a better match. 
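 * For example (illustrative), CommandQueue::getInfo<CL_QUEUE_CONTEXT>()
 * resolves to this overload because cl::Context exposes a cl_type member.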
*/ template inline cl_int getInfoHelper(Func f, cl_uint name, T* param, int, typename T::cl_type = 0) { typename T::cl_type value; cl_int err = f(name, sizeof(value), &value, NULL); if (err != CL_SUCCESS) { return err; } *param = value; if (value != NULL) { err = param->retain(); if (err != CL_SUCCESS) { return err; } } return CL_SUCCESS; } #define __PARAM_NAME_INFO_1_0(F) \ F(cl_platform_info, CL_PLATFORM_PROFILE, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_VERSION, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_NAME, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_VENDOR, STRING_CLASS) \ F(cl_platform_info, CL_PLATFORM_EXTENSIONS, STRING_CLASS) \ \ F(cl_device_info, CL_DEVICE_TYPE, cl_device_type) \ F(cl_device_info, CL_DEVICE_VENDOR_ID, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_COMPUTE_UNITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_GROUP_SIZE, ::size_t) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_SIZES, VECTOR_CLASS< ::size_t>) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_CLOCK_FREQUENCY, cl_uint) \ F(cl_device_info, CL_DEVICE_ADDRESS_BITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_READ_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_MEM_ALLOC_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_WIDTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_HEIGHT, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_WIDTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_HEIGHT, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_DEPTH, ::size_t) \ F(cl_device_info, CL_DEVICE_IMAGE_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_MAX_PARAMETER_SIZE, ::size_t) \ F(cl_device_info, CL_DEVICE_MAX_SAMPLERS, cl_uint) \ F(cl_device_info, CL_DEVICE_MEM_BASE_ADDR_ALIGN, cl_uint) \ F(cl_device_info, CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_SINGLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, cl_device_mem_cache_type) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, cl_uint)\ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_TYPE, cl_device_local_mem_type) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_ERROR_CORRECTION_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_PROFILING_TIMER_RESOLUTION, ::size_t) \ F(cl_device_info, CL_DEVICE_ENDIAN_LITTLE, cl_bool) \ F(cl_device_info, CL_DEVICE_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_COMPILER_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_EXECUTION_CAPABILITIES, cl_device_exec_capabilities) \ F(cl_device_info, CL_DEVICE_QUEUE_PROPERTIES, cl_command_queue_properties) \ F(cl_device_info, CL_DEVICE_PLATFORM, cl_platform_id) \ F(cl_device_info, CL_DEVICE_NAME, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_VENDOR, STRING_CLASS) \ F(cl_device_info, 
CL_DRIVER_VERSION, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_PROFILE, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_VERSION, STRING_CLASS) \ F(cl_device_info, CL_DEVICE_EXTENSIONS, STRING_CLASS) \ \ F(cl_context_info, CL_CONTEXT_REFERENCE_COUNT, cl_uint) \ F(cl_context_info, CL_CONTEXT_DEVICES, VECTOR_CLASS) \ F(cl_context_info, CL_CONTEXT_PROPERTIES, VECTOR_CLASS) \ \ F(cl_event_info, CL_EVENT_COMMAND_QUEUE, cl::CommandQueue) \ F(cl_event_info, CL_EVENT_COMMAND_TYPE, cl_command_type) \ F(cl_event_info, CL_EVENT_REFERENCE_COUNT, cl_uint) \ F(cl_event_info, CL_EVENT_COMMAND_EXECUTION_STATUS, cl_int) \ \ F(cl_profiling_info, CL_PROFILING_COMMAND_QUEUED, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_SUBMIT, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_START, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_END, cl_ulong) \ \ F(cl_mem_info, CL_MEM_TYPE, cl_mem_object_type) \ F(cl_mem_info, CL_MEM_FLAGS, cl_mem_flags) \ F(cl_mem_info, CL_MEM_SIZE, ::size_t) \ F(cl_mem_info, CL_MEM_HOST_PTR, void*) \ F(cl_mem_info, CL_MEM_MAP_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_REFERENCE_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_CONTEXT, cl::Context) \ \ F(cl_image_info, CL_IMAGE_FORMAT, cl_image_format) \ F(cl_image_info, CL_IMAGE_ELEMENT_SIZE, ::size_t) \ F(cl_image_info, CL_IMAGE_ROW_PITCH, ::size_t) \ F(cl_image_info, CL_IMAGE_SLICE_PITCH, ::size_t) \ F(cl_image_info, CL_IMAGE_WIDTH, ::size_t) \ F(cl_image_info, CL_IMAGE_HEIGHT, ::size_t) \ F(cl_image_info, CL_IMAGE_DEPTH, ::size_t) \ \ F(cl_sampler_info, CL_SAMPLER_REFERENCE_COUNT, cl_uint) \ F(cl_sampler_info, CL_SAMPLER_CONTEXT, cl::Context) \ F(cl_sampler_info, CL_SAMPLER_NORMALIZED_COORDS, cl_bool) \ F(cl_sampler_info, CL_SAMPLER_ADDRESSING_MODE, cl_addressing_mode) \ F(cl_sampler_info, CL_SAMPLER_FILTER_MODE, cl_filter_mode) \ \ F(cl_program_info, CL_PROGRAM_REFERENCE_COUNT, cl_uint) \ F(cl_program_info, CL_PROGRAM_CONTEXT, cl::Context) \ F(cl_program_info, CL_PROGRAM_NUM_DEVICES, cl_uint) \ F(cl_program_info, CL_PROGRAM_DEVICES, VECTOR_CLASS) \ F(cl_program_info, CL_PROGRAM_SOURCE, STRING_CLASS) \ F(cl_program_info, CL_PROGRAM_BINARY_SIZES, VECTOR_CLASS< ::size_t>) \ F(cl_program_info, CL_PROGRAM_BINARIES, VECTOR_CLASS) \ \ F(cl_program_build_info, CL_PROGRAM_BUILD_STATUS, cl_build_status) \ F(cl_program_build_info, CL_PROGRAM_BUILD_OPTIONS, STRING_CLASS) \ F(cl_program_build_info, CL_PROGRAM_BUILD_LOG, STRING_CLASS) \ \ F(cl_kernel_info, CL_KERNEL_FUNCTION_NAME, STRING_CLASS) \ F(cl_kernel_info, CL_KERNEL_NUM_ARGS, cl_uint) \ F(cl_kernel_info, CL_KERNEL_REFERENCE_COUNT, cl_uint) \ F(cl_kernel_info, CL_KERNEL_CONTEXT, cl::Context) \ F(cl_kernel_info, CL_KERNEL_PROGRAM, cl::Program) \ \ F(cl_kernel_work_group_info, CL_KERNEL_WORK_GROUP_SIZE, ::size_t) \ F(cl_kernel_work_group_info, CL_KERNEL_COMPILE_WORK_GROUP_SIZE, cl::size_t<3>) \ F(cl_kernel_work_group_info, CL_KERNEL_LOCAL_MEM_SIZE, cl_ulong) \ \ F(cl_command_queue_info, CL_QUEUE_CONTEXT, cl::Context) \ F(cl_command_queue_info, CL_QUEUE_DEVICE, cl::Device) \ F(cl_command_queue_info, CL_QUEUE_REFERENCE_COUNT, cl_uint) \ F(cl_command_queue_info, CL_QUEUE_PROPERTIES, cl_command_queue_properties) #if defined(CL_VERSION_1_1) #define __PARAM_NAME_INFO_1_1(F) \ F(cl_context_info, CL_CONTEXT_NUM_DEVICES, cl_uint)\ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, cl_uint) \ 
F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_DOUBLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_HALF_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_HOST_UNIFIED_MEMORY, cl_bool) \ F(cl_device_info, CL_DEVICE_OPENCL_C_VERSION, STRING_CLASS) \ \ F(cl_mem_info, CL_MEM_ASSOCIATED_MEMOBJECT, cl::Memory) \ F(cl_mem_info, CL_MEM_OFFSET, ::size_t) \ \ F(cl_kernel_work_group_info, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, ::size_t) \ F(cl_kernel_work_group_info, CL_KERNEL_PRIVATE_MEM_SIZE, cl_ulong) \ \ F(cl_event_info, CL_EVENT_CONTEXT, cl::Context) #endif // CL_VERSION_1_1 #if defined(CL_VERSION_1_2) #define __PARAM_NAME_INFO_1_2(F) \ F(cl_image_info, CL_IMAGE_BUFFER, cl::Buffer) \ \ F(cl_program_info, CL_PROGRAM_NUM_KERNELS, ::size_t) \ F(cl_program_info, CL_PROGRAM_KERNEL_NAMES, STRING_CLASS) \ \ F(cl_program_build_info, CL_PROGRAM_BINARY_TYPE, cl_program_binary_type) \ \ F(cl_kernel_info, CL_KERNEL_ATTRIBUTES, STRING_CLASS) \ \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ADDRESS_QUALIFIER, cl_kernel_arg_address_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ACCESS_QUALIFIER, cl_kernel_arg_access_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_NAME, STRING_CLASS) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_NAME, STRING_CLASS) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_QUALIFIER, cl_kernel_arg_type_qualifier) \ \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE, cl_device_id) \ F(cl_device_info, CL_DEVICE_PARTITION_PROPERTIES, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPE, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, ::size_t) \ F(cl_device_info, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, cl_device_affinity_domain) \ F(cl_device_info, CL_DEVICE_BUILT_IN_KERNELS, STRING_CLASS) #endif // #if defined(CL_VERSION_1_2) #if defined(USE_CL_DEVICE_FISSION) #define __PARAM_NAME_DEVICE_FISSION(F) \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE_EXT, cl_device_id) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPES_EXT, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_AFFINITY_DOMAINS_EXT, VECTOR_CLASS) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT_EXT , cl_uint) \ F(cl_device_info, CL_DEVICE_PARTITION_STYLE_EXT, VECTOR_CLASS) #endif // USE_CL_DEVICE_FISSION template struct param_traits {}; #define __CL_DECLARE_PARAM_TRAITS(token, param_name, T) \ struct token; \ template<> \ struct param_traits \ { \ enum { value = param_name }; \ typedef T param_type; \ }; __PARAM_NAME_INFO_1_0(__CL_DECLARE_PARAM_TRAITS) #if defined(CL_VERSION_1_1) __PARAM_NAME_INFO_1_1(__CL_DECLARE_PARAM_TRAITS) #endif // CL_VERSION_1_1 #if defined(CL_VERSION_1_2) __PARAM_NAME_INFO_1_2(__CL_DECLARE_PARAM_TRAITS) #endif // CL_VERSION_1_1 #if defined(USE_CL_DEVICE_FISSION) __PARAM_NAME_DEVICE_FISSION(__CL_DECLARE_PARAM_TRAITS); #endif // USE_CL_DEVICE_FISSION #ifdef CL_PLATFORM_ICD_SUFFIX_KHR __CL_DECLARE_PARAM_TRAITS(cl_platform_info, CL_PLATFORM_ICD_SUFFIX_KHR, STRING_CLASS) #endif #ifdef CL_DEVICE_PROFILING_TIMER_OFFSET_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PROFILING_TIMER_OFFSET_AMD, cl_ulong) #endif #ifdef CL_DEVICE_GLOBAL_FREE_MEMORY_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_FREE_MEMORY_AMD, VECTOR_CLASS< ::size_t>) #endif #ifdef 
CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD, cl_uint) #endif #ifdef CL_DEVICE_SIMD_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_WAVEFRONT_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_WAVEFRONT_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD, cl_uint) #endif #ifdef CL_DEVICE_LOCAL_MEM_BANKS_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_LOCAL_MEM_BANKS_AMD, cl_uint) #endif #ifdef CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD, ::size_t) #endif #ifdef CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD, ::size_t) #endif #ifdef CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD, ::size_t) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV, cl_uint) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV, cl_uint) #endif #ifdef CL_DEVICE_REGISTERS_PER_BLOCK_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_REGISTERS_PER_BLOCK_NV, cl_uint) #endif #ifdef CL_DEVICE_WARP_SIZE_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_WARP_SIZE_NV, cl_uint) #endif #ifdef CL_DEVICE_GPU_OVERLAP_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_GPU_OVERLAP_NV, cl_bool) #endif #ifdef CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV, cl_bool) #endif #ifdef CL_DEVICE_INTEGRATED_MEMORY_NV __CL_DECLARE_PARAM_TRAITS(cl_device_info, CL_DEVICE_INTEGRATED_MEMORY_NV, cl_bool) #endif // Convenience functions template inline cl_int getInfo(Func f, cl_uint name, T* param) { return getInfoHelper(f, name, param, 0); } template struct GetInfoFunctor0 { Func f_; const Arg0& arg0_; cl_int operator ()( cl_uint param, ::size_t size, void* value, ::size_t* size_ret) { return f_(arg0_, param, size, value, size_ret); } }; template struct GetInfoFunctor1 { Func f_; const Arg0& arg0_; const Arg1& arg1_; cl_int operator ()( cl_uint param, ::size_t size, void* value, ::size_t* size_ret) { return f_(arg0_, arg1_, param, size, value, size_ret); } }; template inline cl_int getInfo(Func f, const Arg0& arg0, cl_uint name, T* param) { GetInfoFunctor0 f0 = { f, arg0 }; return getInfoHelper(f0, name, param, 0); } template inline cl_int getInfo(Func f, const Arg0& arg0, const Arg1& arg1, cl_uint name, T* param) { GetInfoFunctor1 f0 = { f, arg0, arg1 }; return getInfoHelper(f0, name, param, 0); } template struct ReferenceHandler { }; #if defined(CL_VERSION_1_2) /** * 
OpenCL 1.2 devices do have retain/release. */ template <> struct ReferenceHandler { /** * Retain the device. * \param device A valid device created using createSubDevices * \return * CL_SUCCESS if the function executed successfully. * CL_INVALID_DEVICE if device was not a valid subdevice * CL_OUT_OF_RESOURCES * CL_OUT_OF_HOST_MEMORY */ static cl_int retain(cl_device_id device) { return ::clRetainDevice(device); } /** * Retain the device. * \param device A valid device created using createSubDevices * \return * CL_SUCCESS if the function executed successfully. * CL_INVALID_DEVICE if device was not a valid subdevice * CL_OUT_OF_RESOURCES * CL_OUT_OF_HOST_MEMORY */ static cl_int release(cl_device_id device) { return ::clReleaseDevice(device); } }; #else // #if defined(CL_VERSION_1_2) /** * OpenCL 1.1 devices do not have retain/release. */ template <> struct ReferenceHandler { // cl_device_id does not have retain(). static cl_int retain(cl_device_id) { return CL_SUCCESS; } // cl_device_id does not have release(). static cl_int release(cl_device_id) { return CL_SUCCESS; } }; #endif // #if defined(CL_VERSION_1_2) template <> struct ReferenceHandler { // cl_platform_id does not have retain(). static cl_int retain(cl_platform_id) { return CL_SUCCESS; } // cl_platform_id does not have release(). static cl_int release(cl_platform_id) { return CL_SUCCESS; } }; template <> struct ReferenceHandler { static cl_int retain(cl_context context) { return ::clRetainContext(context); } static cl_int release(cl_context context) { return ::clReleaseContext(context); } }; template <> struct ReferenceHandler { static cl_int retain(cl_command_queue queue) { return ::clRetainCommandQueue(queue); } static cl_int release(cl_command_queue queue) { return ::clReleaseCommandQueue(queue); } }; template <> struct ReferenceHandler { static cl_int retain(cl_mem memory) { return ::clRetainMemObject(memory); } static cl_int release(cl_mem memory) { return ::clReleaseMemObject(memory); } }; template <> struct ReferenceHandler { static cl_int retain(cl_sampler sampler) { return ::clRetainSampler(sampler); } static cl_int release(cl_sampler sampler) { return ::clReleaseSampler(sampler); } }; template <> struct ReferenceHandler { static cl_int retain(cl_program program) { return ::clRetainProgram(program); } static cl_int release(cl_program program) { return ::clReleaseProgram(program); } }; template <> struct ReferenceHandler { static cl_int retain(cl_kernel kernel) { return ::clRetainKernel(kernel); } static cl_int release(cl_kernel kernel) { return ::clReleaseKernel(kernel); } }; template <> struct ReferenceHandler { static cl_int retain(cl_event event) { return ::clRetainEvent(event); } static cl_int release(cl_event event) { return ::clReleaseEvent(event); } }; // Extracts version number with major in the upper 16 bits, minor in the lower 16 static cl_uint getVersion(const char *versionInfo) { int highVersion = 0; int lowVersion = 0; int index = 7; while(versionInfo[index] != '.' 
) { highVersion *= 10; highVersion += versionInfo[index]-'0'; ++index; } ++index; while(versionInfo[index] != ' ' && versionInfo[index] != '\0') { lowVersion *= 10; lowVersion += versionInfo[index]-'0'; ++index; } return (highVersion << 16) | lowVersion; } static cl_uint getPlatformVersion(cl_platform_id platform) { ::size_t size = 0; clGetPlatformInfo(platform, CL_PLATFORM_VERSION, 0, NULL, &size); char *versionInfo = (char *) alloca(size); clGetPlatformInfo(platform, CL_PLATFORM_VERSION, size, &versionInfo[0], &size); return getVersion(versionInfo); } static cl_uint getDevicePlatformVersion(cl_device_id device) { cl_platform_id platform; clGetDeviceInfo(device, CL_DEVICE_PLATFORM, sizeof(platform), &platform, NULL); return getPlatformVersion(platform); } #if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) static cl_uint getContextPlatformVersion(cl_context context) { // The platform cannot be queried directly, so we first have to grab a // device and obtain its context ::size_t size = 0; clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &size); if (size == 0) return 0; cl_device_id *devices = (cl_device_id *) alloca(size); clGetContextInfo(context, CL_CONTEXT_DEVICES, size, devices, NULL); return getDevicePlatformVersion(devices[0]); } #endif // #if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) template class Wrapper { public: typedef T cl_type; protected: cl_type object_; public: Wrapper() : object_(NULL) { } Wrapper(const cl_type &obj) : object_(obj) { } ~Wrapper() { if (object_ != NULL) { release(); } } Wrapper(const Wrapper& rhs) { object_ = rhs.object_; if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); } } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT { object_ = rhs.object_; rhs.object_ = NULL; } #endif Wrapper& operator = (const Wrapper& rhs) { if (this != &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs.object_; if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); } } return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) Wrapper& operator = (Wrapper&& rhs) { if (this != &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs.object_; rhs.object_ = NULL; } return *this; } #endif Wrapper& operator = (const cl_type &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs; return *this; } cl_type operator ()() const { return object_; } cl_type& operator ()() { return object_; } protected: template friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type); cl_int retain() const { return ReferenceHandler::retain(object_); } cl_int release() const { return ReferenceHandler::release(object_); } }; template <> class Wrapper { public: typedef cl_device_id cl_type; protected: cl_type object_; bool referenceCountable_; static bool isReferenceCountable(cl_device_id device) { bool retVal = false; if (device != NULL) { int version = getDevicePlatformVersion(device); if(version > ((1 << 16) + 1)) { retVal = true; } } return retVal; } public: Wrapper() : object_(NULL), referenceCountable_(false) { } Wrapper(const cl_type &obj) : object_(obj), referenceCountable_(false) { referenceCountable_ = isReferenceCountable(obj); } ~Wrapper() { if (object_ != NULL) { release(); } } Wrapper(const Wrapper& rhs) { object_ = rhs.object_; referenceCountable_ = isReferenceCountable(object_); if (object_ != NULL) { detail::errHandler(retain(), 
__RETAIN_ERR); } } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT { object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; rhs.object_ = NULL; rhs.referenceCountable_ = false; } #endif Wrapper& operator = (const Wrapper& rhs) { if (this != &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; if (object_ != NULL) { detail::errHandler(retain(), __RETAIN_ERR); } } return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) Wrapper& operator = (Wrapper&& rhs) { if (this != &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; rhs.object_ = NULL; rhs.referenceCountable_ = false; } return *this; } #endif Wrapper& operator = (const cl_type &rhs) { if (object_ != NULL) { detail::errHandler(release(), __RELEASE_ERR); } object_ = rhs; referenceCountable_ = isReferenceCountable(object_); return *this; } cl_type operator ()() const { return object_; } cl_type& operator ()() { return object_; } protected: template friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type); template friend inline cl_int getInfoHelper(Func, cl_uint, VECTOR_CLASS*, int, typename U::cl_type); cl_int retain() const { if( referenceCountable_ ) { return ReferenceHandler::retain(object_); } else { return CL_SUCCESS; } } cl_int release() const { if( referenceCountable_ ) { return ReferenceHandler::release(object_); } else { return CL_SUCCESS; } } }; } // namespace detail //! \endcond /*! \stuct ImageFormat * \brief Adds constructors and member functions for cl_image_format. * * \see cl_image_format */ struct ImageFormat : public cl_image_format { //! \brief Default constructor - performs no initialization. ImageFormat(){} //! \brief Initializing constructor. ImageFormat(cl_channel_order order, cl_channel_type type) { image_channel_order = order; image_channel_data_type = type; } //! \brief Assignment operator. ImageFormat& operator = (const ImageFormat& rhs) { if (this != &rhs) { this->image_channel_data_type = rhs.image_channel_data_type; this->image_channel_order = rhs.image_channel_order; } return *this; } }; /*! \brief Class interface for cl_device_id. * * \note Copies of these objects are inexpensive, since they don't 'own' * any underlying resources or data structures. * * \see cl_device_id */ class Device : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Device() : detail::Wrapper() { } /*! \brief Constructor from cl_device_id. * * This simply copies the device ID value, which is an inexpensive operation. */ __CL_EXPLICIT_CONSTRUCTORS Device(const cl_device_id &device) : detail::Wrapper(device) { } /*! \brief Returns the first device on the default context. * * \see Context::getDefault() */ static Device getDefault(cl_int * err = NULL); /*! \brief Assignment operator from cl_device_id. * * This simply copies the device ID value, which is an inexpensive operation. */ Device& operator = (const cl_device_id& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Device(const Device& dev) : detail::Wrapper(dev) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. 
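 *
 * Usage sketch (illustrative, not from the original header; `d1' is a
 * hypothetical existing device):
 * \code
 * cl::Device d2;
 * d2 = d1;  // shallow handle copy; Wrapper<> manages retain/release
 * \endcode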
*/ Device& operator = (const Device &dev) { detail::Wrapper::operator=(dev); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Device(Device&& dev) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(dev)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Device& operator = (Device &&dev) { detail::Wrapper::operator=(std::move(dev)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) //! \brief Wrapper for clGetDeviceInfo(). template cl_int getInfo(cl_device_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetDeviceInfo, object_, name, param), __GET_DEVICE_INFO_ERR); } //! \brief Wrapper for clGetDeviceInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_device_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /** * CL 1.2 version */ #if defined(CL_VERSION_1_2) //! \brief Wrapper for clCreateSubDevicesEXT(). cl_int createSubDevices( const cl_device_partition_property * properties, VECTOR_CLASS* devices) { cl_uint n = 0; cl_int err = clCreateSubDevices(object_, properties, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES); } cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id)); err = clCreateSubDevices(object_, properties, n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES); } devices->assign(&ids[0], &ids[n]); return CL_SUCCESS; } #endif // #if defined(CL_VERSION_1_2) /** * CL 1.1 version that uses device fission. */ #if defined(CL_VERSION_1_1) #if defined(USE_CL_DEVICE_FISSION) cl_int createSubDevices( const cl_device_partition_property_ext * properties, VECTOR_CLASS* devices) { typedef CL_API_ENTRY cl_int ( CL_API_CALL * PFN_clCreateSubDevicesEXT)( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1; static PFN_clCreateSubDevicesEXT pfn_clCreateSubDevicesEXT = NULL; __INIT_CL_EXT_FCN_PTR(clCreateSubDevicesEXT); cl_uint n = 0; cl_int err = pfn_clCreateSubDevicesEXT(object_, properties, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES); } cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id)); err = pfn_clCreateSubDevicesEXT(object_, properties, n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES); } devices->assign(&ids[0], &ids[n]); return CL_SUCCESS; } #endif // #if defined(USE_CL_DEVICE_FISSION) #endif // #if defined(CL_VERSION_1_1) }; /*! \brief Class interface for cl_platform_id. * * \note Copies of these objects are inexpensive, since they don't 'own' * any underlying resources or data structures. * * \see cl_platform_id */ class Platform : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Platform() : detail::Wrapper() { } /*! \brief Constructor from cl_platform_id. * * This simply copies the platform ID value, which is an inexpensive operation. */ __CL_EXPLICIT_CONSTRUCTORS Platform(const cl_platform_id &platform) : detail::Wrapper(platform) { } /*! \brief Assignment operator from cl_platform_id. 
* * This simply copies the platform ID value, which is an inexpensive operation. */ Platform& operator = (const cl_platform_id& rhs) { detail::Wrapper::operator=(rhs); return *this; } //! \brief Wrapper for clGetPlatformInfo(). cl_int getInfo(cl_platform_info name, STRING_CLASS* param) const { return detail::errHandler( detail::getInfo(&::clGetPlatformInfo, object_, name, param), __GET_PLATFORM_INFO_ERR); } //! \brief Wrapper for clGetPlatformInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_platform_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Gets a list of devices for this platform. * * Wraps clGetDeviceIDs(). */ cl_int getDevices( cl_device_type type, VECTOR_CLASS* devices) const { cl_uint n = 0; if( devices == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR); } cl_int err = ::clGetDeviceIDs(object_, type, 0, NULL, &n); if (err != CL_SUCCESS && err != CL_DEVICE_NOT_FOUND) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } if (n > 0) { cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id)); err = ::clGetDeviceIDs(object_, type, n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } devices->assign(&ids[0], &ids[n]); } else { devices->clear(); } return CL_SUCCESS; } #if defined(USE_DX_INTEROP) /*! \brief Get the list of available D3D10 devices. * * \param d3d_device_source. * * \param d3d_object. * * \param d3d_device_set. * * \param devices returns a vector of OpenCL D3D10 devices found. The cl::Device * values returned in devices can be used to identify a specific OpenCL * device. If \a devices argument is NULL, this argument is ignored. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully. * * The application can query specific capabilities of the OpenCL device(s) * returned by cl::getDevices. This can be used by the application to * determine which device(s) to use. * * \note In the case that exceptions are enabled and a return value * other than CL_SUCCESS is generated, then cl::Error exception is * generated. */ cl_int getDevices( cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, VECTOR_CLASS* devices) const { typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clGetDeviceIDsFromD3D10KHR)( cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint* num_devices); if( devices == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR); } static PFN_clGetDeviceIDsFromD3D10KHR pfn_clGetDeviceIDsFromD3D10KHR = NULL; __INIT_CL_EXT_FCN_PTR_PLATFORM(object_, clGetDeviceIDsFromD3D10KHR); cl_uint n = 0; cl_int err = pfn_clGetDeviceIDsFromD3D10KHR( object_, d3d_device_source, d3d_object, d3d_device_set, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } cl_device_id* ids = (cl_device_id*) alloca(n * sizeof(cl_device_id)); err = pfn_clGetDeviceIDsFromD3D10KHR( object_, d3d_device_source, d3d_object, d3d_device_set, n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } devices->assign(&ids[0], &ids[n]); return CL_SUCCESS; } #endif /*! \brief Gets a list of available platforms. 
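 *
 *  A minimal usage sketch (hypothetical host code; assumes at least one
 *  OpenCL platform is installed):
 *  \code
 *  VECTOR_CLASS<cl::Platform> platforms;
 *  cl_int err = cl::Platform::get(&platforms);
 *  if (err == CL_SUCCESS && platforms.size() > 0) {
 *      STRING_CLASS name = platforms[0].getInfo<CL_PLATFORM_NAME>();
 *  }
 *  \endcode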
* * Wraps clGetPlatformIDs(). */ static cl_int get( VECTOR_CLASS* platforms) { cl_uint n = 0; if( platforms == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_PLATFORM_IDS_ERR); } cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } cl_platform_id* ids = (cl_platform_id*) alloca( n * sizeof(cl_platform_id)); err = ::clGetPlatformIDs(n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } platforms->assign(&ids[0], &ids[n]); return CL_SUCCESS; } /*! \brief Gets the first available platform. * * Wraps clGetPlatformIDs(), returning the first result. */ static cl_int get( Platform * platform) { cl_uint n = 0; if( platform == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_PLATFORM_IDS_ERR); } cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } cl_platform_id* ids = (cl_platform_id*) alloca( n * sizeof(cl_platform_id)); err = ::clGetPlatformIDs(n, ids, NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } *platform = ids[0]; return CL_SUCCESS; } /*! \brief Gets the first available platform, returning it by value. * * Wraps clGetPlatformIDs(), returning the first result. */ static Platform get( cl_int * errResult = NULL) { Platform platform; cl_uint n = 0; cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { detail::errHandler(err, __GET_PLATFORM_IDS_ERR); if (errResult != NULL) { *errResult = err; } return Platform(); } cl_platform_id* ids = (cl_platform_id*) alloca( n * sizeof(cl_platform_id)); err = ::clGetPlatformIDs(n, ids, NULL); if (err != CL_SUCCESS) { detail::errHandler(err, __GET_PLATFORM_IDS_ERR); if (errResult != NULL) { *errResult = err; } return Platform(); } return Platform(ids[0]); } static Platform getDefault( cl_int *errResult = NULL ) { return get(errResult); } #if defined(CL_VERSION_1_2) //! \brief Wrapper for clUnloadCompiler(). cl_int unloadCompiler() { return ::clUnloadPlatformCompiler(object_); } #endif // #if defined(CL_VERSION_1_2) }; // class Platform /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2)) /** * Unload the OpenCL compiler. * \note Deprecated for OpenCL 1.2. Use Platform::unloadCompiler instead. */ inline CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int UnloadCompiler() CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; inline cl_int UnloadCompiler() { return ::clUnloadCompiler(); } #endif // #if defined(CL_VERSION_1_1) /*! \brief Class interface for cl_context. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_context as the original. For details, see * clRetainContext() and clReleaseContext(). * * \see cl_context */ class Context : public detail::Wrapper { private: #ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED static std::atomic default_initialized_; #else // !CL_HPP_CPP11_ATOMICS_SUPPORTED static volatile int default_initialized_; #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED static Context default_; static volatile cl_int default_error_; public: /*! \brief Constructs a context including a list of specified devices. * * Wraps clCreateContext(). 
*/ Context( const VECTOR_CLASS& devices, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, ::size_t, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; ::size_t numDevices = devices.size(); cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id)); for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } object_ = ::clCreateContext( properties, (cl_uint) numDevices, deviceIDs, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (err != NULL) { *err = error; } } Context( const Device& device, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, ::size_t, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; cl_device_id deviceID = device(); object_ = ::clCreateContext( properties, 1, &deviceID, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a context including all or a subset of devices of a specified type. * * Wraps clCreateContextFromType(). */ Context( cl_device_type type, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, ::size_t, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; #if !defined(__APPLE__) && !defined(__MACOS) cl_context_properties prop[4] = {CL_CONTEXT_PLATFORM, 0, 0, 0 }; if (properties == NULL) { // Get a valid platform ID as we cannot send in a blank one VECTOR_CLASS platforms; error = Platform::get(&platforms); if (error != CL_SUCCESS) { detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } return; } // Check the platforms we found for a device of our specified type cl_context_properties platform_id = 0; for (unsigned int i = 0; i < platforms.size(); i++) { VECTOR_CLASS devices; #if defined(__CL_ENABLE_EXCEPTIONS) try { #endif error = platforms[i].getDevices(type, &devices); #if defined(__CL_ENABLE_EXCEPTIONS) } catch (Error) {} // Catch if exceptions are enabled as we don't want to exit if first platform has no devices of type // We do error checking next anyway, and can throw there if needed #endif // Only squash CL_SUCCESS and CL_DEVICE_NOT_FOUND if (error != CL_SUCCESS && error != CL_DEVICE_NOT_FOUND) { detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } } if (devices.size() > 0) { platform_id = (cl_context_properties)platforms[i](); break; } } if (platform_id == 0) { detail::errHandler(CL_DEVICE_NOT_FOUND, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = CL_DEVICE_NOT_FOUND; } return; } prop[1] = platform_id; properties = &prop[0]; } #endif object_ = ::clCreateContextFromType( properties, type, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Context(const Context& ctx) : detail::Wrapper(ctx) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Context& operator = (const Context &ctx) { detail::Wrapper::operator=(ctx); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. 
*/ Context(Context&& ctx) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(ctx)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Context& operator = (Context &&ctx) { detail::Wrapper::operator=(std::move(ctx)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Returns a singleton context including all devices of CL_DEVICE_TYPE_DEFAULT. * * \note All calls to this function return the same cl_context as the first. */ static Context getDefault(cl_int * err = NULL) { int state = detail::compare_exchange( &default_initialized_, __DEFAULT_BEING_INITIALIZED, __DEFAULT_NOT_INITIALIZED); if (state & __DEFAULT_INITIALIZED) { if (err != NULL) { *err = default_error_; } return default_; } if (state & __DEFAULT_BEING_INITIALIZED) { // Assume writes will propagate eventually... while(default_initialized_ != __DEFAULT_INITIALIZED) { detail::fence(); } if (err != NULL) { *err = default_error_; } return default_; } cl_int error; default_ = Context( CL_DEVICE_TYPE_DEFAULT, NULL, NULL, NULL, &error); detail::fence(); default_error_ = error; // Assume writes will propagate eventually... default_initialized_ = __DEFAULT_INITIALIZED; detail::fence(); if (err != NULL) { *err = default_error_; } return default_; } //! \brief Default constructor - initializes to NULL. Context() : detail::Wrapper() { } /*! \brief Constructor from cl_context - takes ownership. * * This effectively transfers ownership of a refcount on the cl_context * into the new Context object. */ __CL_EXPLICIT_CONSTRUCTORS Context(const cl_context& context) : detail::Wrapper(context) { } /*! \brief Assignment operator from cl_context - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseContext() on the value previously held by this instance. */ Context& operator = (const cl_context& rhs) { detail::Wrapper::operator=(rhs); return *this; } //! \brief Wrapper for clGetContextInfo(). template cl_int getInfo(cl_context_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetContextInfo, object_, name, param), __GET_CONTEXT_INFO_ERR); } //! \brief Wrapper for clGetContextInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_context_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Gets a list of supported image formats. * * Wraps clGetSupportedImageFormats(). 
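 *
 *  A minimal query sketch (hypothetical host code; assumes a valid
 *  context has already been constructed):
 *  \code
 *  VECTOR_CLASS<cl::ImageFormat> formats;
 *  cl_int err = context.getSupportedImageFormats(
 *      CL_MEM_READ_ONLY, CL_MEM_OBJECT_IMAGE2D, &formats);
 *  // On success, formats lists every supported read-only 2D image format.
 *  \endcode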
 */
    cl_int getSupportedImageFormats(
        cl_mem_flags flags,
        cl_mem_object_type type,
        VECTOR_CLASS<ImageFormat>* formats) const
    {
        cl_uint numEntries;

        if (!formats) {
            return CL_SUCCESS;
        }

        cl_int err = ::clGetSupportedImageFormats(
            object_,
            flags,
            type,
            0,
            NULL,
            &numEntries);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR);
        }

        if (numEntries > 0) {
            ImageFormat* value = (ImageFormat*)
                alloca(numEntries * sizeof(ImageFormat));
            err = ::clGetSupportedImageFormats(
                object_,
                flags,
                type,
                numEntries,
                (cl_image_format*)value,
                NULL);
            if (err != CL_SUCCESS) {
                return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR);
            }

            formats->assign(&value[0], &value[numEntries]);
        }
        else {
            formats->clear();
        }
        return CL_SUCCESS;
    }
};

inline Device Device::getDefault(cl_int * err)
{
    cl_int error;
    Device device;

    Context context = Context::getDefault(&error);
    detail::errHandler(error, __CREATE_CONTEXT_ERR);

    if (error != CL_SUCCESS) {
        if (err != NULL) {
            *err = error;
        }
    }
    else {
        device = context.getInfo<CL_CONTEXT_DEVICES>()[0];
        if (err != NULL) {
            *err = CL_SUCCESS;
        }
    }

    return device;
}

#ifdef _WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) std::atomic<int> Context::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) volatile int Context::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__declspec(selectany) Context Context::default_;
__declspec(selectany) volatile cl_int Context::default_error_ = CL_SUCCESS;
#else // !_WIN32
#ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) std::atomic<int> Context::default_initialized_;
#else // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) volatile int Context::default_initialized_ = __DEFAULT_NOT_INITIALIZED;
#endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED
__attribute__((weak)) Context Context::default_;
__attribute__((weak)) volatile cl_int Context::default_error_ = CL_SUCCESS;
#endif // !_WIN32

/*! \brief Class interface for cl_event.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_event as the original.  For details, see
 *        clRetainEvent() and clReleaseEvent().
 *
 *  \see cl_event
 */
class Event : public detail::Wrapper<cl_event>
{
public:
    //! \brief Default constructor - initializes to NULL.
    Event() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_event - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the cl_event
     *  into the new Event object.
     */
    __CL_EXPLICIT_CONSTRUCTORS Event(const cl_event& event) : detail::Wrapper<cl_type>(event) { }

    /*! \brief Assignment operator from cl_event - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseEvent() on the value previously held by this instance.
     */
    Event& operator = (const cl_event& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    //! \brief Wrapper for clGetEventInfo().
    template <typename T>
    cl_int getInfo(cl_event_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetEventInfo, object_, name, param),
            __GET_EVENT_INFO_ERR);
    }

    //! \brief Wrapper for clGetEventInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_event_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_event_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    //! \brief Wrapper for clGetEventProfilingInfo().
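    //! A hedged profiling sketch (hypothetical host code; requires a queue
    //! created with CL_QUEUE_PROFILING_ENABLE and an event that has completed):
    //! \code
    //! cl_ulong start = event.getProfilingInfo<CL_PROFILING_COMMAND_START>();
    //! cl_ulong end   = event.getProfilingInfo<CL_PROFILING_COMMAND_END>();
    //! cl_ulong nanoseconds = end - start;  // elapsed device time
    //! \endcode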
template cl_int getProfilingInfo(cl_profiling_info name, T* param) const { return detail::errHandler(detail::getInfo( &::clGetEventProfilingInfo, object_, name, param), __GET_EVENT_PROFILE_INFO_ERR); } //! \brief Wrapper for clGetEventProfilingInfo() that returns by value. template typename detail::param_traits::param_type getProfilingInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_profiling_info, name>::param_type param; cl_int result = getProfilingInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Blocks the calling thread until this event completes. * * Wraps clWaitForEvents(). */ cl_int wait() const { return detail::errHandler( ::clWaitForEvents(1, &object_), __WAIT_FOR_EVENTS_ERR); } #if defined(CL_VERSION_1_1) /*! \brief Registers a user callback function for a specific command execution status. * * Wraps clSetEventCallback(). */ cl_int setCallback( cl_int type, void (CL_CALLBACK * pfn_notify)(cl_event, cl_int, void *), void * user_data = NULL) { return detail::errHandler( ::clSetEventCallback( object_, type, pfn_notify, user_data), __SET_EVENT_CALLBACK_ERR); } #endif /*! \brief Blocks the calling thread until every event specified is complete. * * Wraps clWaitForEvents(). */ static cl_int waitForEvents(const VECTOR_CLASS& events) { return detail::errHandler( ::clWaitForEvents( (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL), __WAIT_FOR_EVENTS_ERR); } }; #if defined(CL_VERSION_1_1) /*! \brief Class interface for user events (a subset of cl_event's). * * See Event for details about copy semantics, etc. */ class UserEvent : public Event { public: /*! \brief Constructs a user event on a given context. * * Wraps clCreateUserEvent(). */ UserEvent( const Context& context, cl_int * err = NULL) { cl_int error; object_ = ::clCreateUserEvent( context(), &error); detail::errHandler(error, __CREATE_USER_EVENT_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. UserEvent() : Event() { } /*! \brief Sets the execution status of a user event object. * * Wraps clSetUserEventStatus(). */ cl_int setStatus(cl_int status) { return detail::errHandler( ::clSetUserEventStatus(object_,status), __SET_USER_EVENT_STATUS_ERR); } }; #endif /*! \brief Blocks the calling thread until every event specified is complete. * * Wraps clWaitForEvents(). */ inline static cl_int WaitForEvents(const VECTOR_CLASS& events) { return detail::errHandler( ::clWaitForEvents( (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL), __WAIT_FOR_EVENTS_ERR); } /*! \brief Class interface for cl_mem. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_mem as the original. For details, see * clRetainMemObject() and clReleaseMemObject(). * * \see cl_mem */ class Memory : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Memory() : detail::Wrapper() { } /*! \brief Constructor from cl_mem - takes ownership. * * This effectively transfers ownership of a refcount on the cl_mem * into the new Memory object. */ __CL_EXPLICIT_CONSTRUCTORS Memory(const cl_mem& memory) : detail::Wrapper(memory) { } /*! \brief Assignment operator from cl_mem - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseMemObject() on the value previously held by this instance. */ Memory& operator = (const cl_mem& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! 
\brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Memory(const Memory& mem) : detail::Wrapper(mem) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Memory& operator = (const Memory &mem) { detail::Wrapper::operator=(mem); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Memory(Memory&& mem) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(mem)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Memory& operator = (Memory &&mem) { detail::Wrapper::operator=(std::move(mem)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) //! \brief Wrapper for clGetMemObjectInfo(). template cl_int getInfo(cl_mem_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetMemObjectInfo, object_, name, param), __GET_MEM_OBJECT_INFO_ERR); } //! \brief Wrapper for clGetMemObjectInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_mem_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } #if defined(CL_VERSION_1_1) /*! \brief Registers a callback function to be called when the memory object * is no longer needed. * * Wraps clSetMemObjectDestructorCallback(). * * Repeated calls to this function, for a given cl_mem value, will append * to the list of functions called (in reverse order) when memory object's * resources are freed and the memory object is deleted. * * \note * The registered callbacks are associated with the underlying cl_mem * value - not the Memory class instance. */ cl_int setDestructorCallback( void (CL_CALLBACK * pfn_notify)(cl_mem, void *), void * user_data = NULL) { return detail::errHandler( ::clSetMemObjectDestructorCallback( object_, pfn_notify, user_data), __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR); } #endif }; // Pre-declare copy functions class Buffer; template< typename IteratorType > cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ); template< typename IteratorType > cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ); template< typename IteratorType > cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ); template< typename IteratorType > cl_int copy( const CommandQueue &queue, const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ); /*! \brief Class interface for Buffer Memory Objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Buffer : public Memory { public: /*! \brief Constructs a Buffer in a specified context. * * Wraps clCreateBuffer(). * * \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was * specified. Note alignment & exclusivity requirements. */ Buffer( const Context& context, cl_mem_flags flags, ::size_t size, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a Buffer in the default context. * * Wraps clCreateBuffer(). * * \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was * specified. 
Note alignment & exclusivity requirements. * * \see Context::getDefault() */ Buffer( cl_mem_flags flags, ::size_t size, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(err); object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } /*! * \brief Construct a Buffer from a host container via iterators. * IteratorType must be random access. * If useHostPtr is specified iterators must represent contiguous data. */ template< typename IteratorType > Buffer( IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if( readOnly ) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if( useHostPtr ) { flags |= CL_MEM_USE_HOST_PTR; } ::size_t size = sizeof(DataType)*(endIterator - startIterator); Context context = Context::getDefault(err); if( useHostPtr ) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if( !useHostPtr ) { error = cl::copy(startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } /*! * \brief Construct a Buffer from a host container via iterators using a specified context. * IteratorType must be random access. * If useHostPtr is specified iterators must represent contiguous data. */ template< typename IteratorType > Buffer(const Context &context, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL); /*! * \brief Construct a Buffer from a host container via iterators using a specified queue. * If useHostPtr is specified iterators must represent contiguous data. */ template< typename IteratorType > Buffer(const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL); //! \brief Default constructor - initializes to NULL. Buffer() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Buffer(const cl_mem& buffer) : Memory(buffer) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Buffer& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Buffer(const Buffer& buf) : Memory(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Buffer& operator = (const Buffer &buf) { Memory::operator=(buf); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Buffer(Buffer&& buf) CL_HPP_NOEXCEPT : Memory(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Buffer& operator = (Buffer &&buf) { Memory::operator=(std::move(buf)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) #if defined(CL_VERSION_1_1) /*! \brief Creates a new buffer object from this. * * Wraps clCreateSubBuffer(). 
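     *
     *  A minimal sub-buffer sketch (hypothetical host code; assumes an
     *  OpenCL 1.1+ buffer whose origin satisfies the device's
     *  CL_DEVICE_MEM_BASE_ADDR_ALIGN requirement):
     *  \code
     *  cl_buffer_region region = { 0, 1024 };  // origin, size in bytes
     *  cl_int err;
     *  cl::Buffer sub = buffer.createSubBuffer(
     *      CL_MEM_READ_WRITE, CL_BUFFER_CREATE_TYPE_REGION, &region, &err);
     *  \endcode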
 */
    Buffer createSubBuffer(
        cl_mem_flags flags,
        cl_buffer_create_type buffer_create_type,
        const void * buffer_create_info,
        cl_int * err = NULL)
    {
        Buffer result;
        cl_int error;
        result.object_ = ::clCreateSubBuffer(
            object_,
            flags,
            buffer_create_type,
            buffer_create_info,
            &error);

        detail::errHandler(error, __CREATE_SUBBUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }

        return result;
    }
#endif
};

#if defined (USE_DX_INTEROP)
/*! \brief Class interface for creating OpenCL buffers from ID3D10Buffer's.
 *
 *  This is provided to facilitate interoperability with Direct3D.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class BufferD3D10 : public Buffer
{
public:
    typedef CL_API_ENTRY cl_mem (CL_API_CALL *PFN_clCreateFromD3D10BufferKHR)(
        cl_context context, cl_mem_flags flags, ID3D10Buffer* buffer,
        cl_int* errcode_ret);

    /*! \brief Constructs a BufferD3D10, in a specified context, from a
     *         given ID3D10Buffer.
     *
     *  Wraps clCreateFromD3D10BufferKHR().
     */
    BufferD3D10(
        const Context& context,
        cl_mem_flags flags,
        ID3D10Buffer* bufobj,
        cl_int * err = NULL)
    {
        static PFN_clCreateFromD3D10BufferKHR pfn_clCreateFromD3D10BufferKHR = NULL;

#if defined(CL_VERSION_1_2)
        VECTOR_CLASS<cl_context_properties> props = context.getInfo<CL_CONTEXT_PROPERTIES>();
        // Walk the context properties for the platform, which is needed to
        // resolve the extension function pointer.
        cl_platform_id platform = NULL;
        for( ::size_t i = 0; i < props.size(); ++i ) {
            if( props[i] == CL_CONTEXT_PLATFORM ) {
                platform = (cl_platform_id)props[i+1];
            }
        }
        __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clCreateFromD3D10BufferKHR);
#endif
#if defined(CL_VERSION_1_1)
        __INIT_CL_EXT_FCN_PTR(clCreateFromD3D10BufferKHR);
#endif

        cl_int error;
        object_ = pfn_clCreateFromD3D10BufferKHR(
            context(),
            flags,
            bufobj,
            &error);

        detail::errHandler(error, __CREATE_GL_BUFFER_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    BufferD3D10() : Buffer() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     *  See Memory for further details.
     */
    __CL_EXPLICIT_CONSTRUCTORS BufferD3D10(const cl_mem& buffer) : Buffer(buffer) { }

    /*! \brief Assignment from cl_mem - performs shallow copy.
     *
     *  See Memory for further details.
     */
    BufferD3D10& operator = (const cl_mem& rhs)
    {
        Buffer::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    BufferD3D10(const BufferD3D10& buf) : Buffer(buf) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     *  Required for MSVC.
     */
    BufferD3D10& operator = (const BufferD3D10 &buf)
    {
        Buffer::operator=(buf);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     *  Required for MSVC.
     */
    BufferD3D10(BufferD3D10&& buf) CL_HPP_NOEXCEPT : Buffer(std::move(buf)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     *  Required for MSVC.
     */
    BufferD3D10& operator = (BufferD3D10 &&buf)
    {
        Buffer::operator=(std::move(buf));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
};
#endif

/*! \brief Class interface for GL Buffer Memory Objects.
 *
 *  This is provided to facilitate interoperability with OpenGL.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class BufferGL : public Buffer
{
public:
    /*! \brief Constructs a BufferGL in a specified context, from a given
     *         GL buffer.
     *
     *  Wraps clCreateFromGLBuffer().
*/ BufferGL( const Context& context, cl_mem_flags flags, cl_GLuint bufobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLBuffer( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferGL() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS BufferGL(const cl_mem& buffer) : Buffer(buffer) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferGL& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL(const BufferGL& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (const BufferGL &buf) { Buffer::operator=(buf); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferGL(BufferGL&& buf) CL_HPP_NOEXCEPT : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (BufferGL &&buf) { Buffer::operator=(std::move(buf)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) //! \brief Wrapper for clGetGLObjectInfo(). cl_int getObjectInfo( cl_gl_object_type *type, cl_GLuint * gl_object_name) { return detail::errHandler( ::clGetGLObjectInfo(object_,type,gl_object_name), __GET_GL_OBJECT_INFO_ERR); } }; /*! \brief C++ base class for Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image : public Memory { protected: //! \brief Default constructor - initializes to NULL. Image() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image(const cl_mem& image) : Memory(image) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image(const Image& img) : Memory(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image& operator = (const Image &img) { Memory::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image(Image&& img) CL_HPP_NOEXCEPT : Memory(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image& operator = (Image &&img) { Memory::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) public: //! \brief Wrapper for clGetImageInfo(). template cl_int getImageInfo(cl_image_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetImageInfo, object_, name, param), __GET_IMAGE_INFO_ERR); } //! \brief Wrapper for clGetImageInfo() that returns by value. 
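    //! A minimal query sketch (hypothetical host code; assumes img is a
    //! valid cl::Image2D):
    //! \code
    //! ::size_t width  = img.getImageInfo<CL_IMAGE_WIDTH>();
    //! ::size_t height = img.getImageInfo<CL_IMAGE_HEIGHT>();
    //! \endcode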
template typename detail::param_traits::param_type getImageInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_image_info, name>::param_type param; cl_int result = getImageInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } }; #if defined(CL_VERSION_1_2) /*! \brief Class interface for 1D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image1D : public Image { public: /*! \brief Constructs a 1D Image in a specified context. * * Wraps clCreateImage(). */ Image1D( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t width, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D, width, 0, 0, 0, 0, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image1D() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image1D(const cl_mem& image1D) : Image(image1D) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image1D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1D(const Image1D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1D& operator = (const Image1D &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1D(Image1D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1D& operator = (Image1D &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; /*! \class Image1DBuffer * \brief Image interface for 1D buffer images. */ class Image1DBuffer : public Image { public: Image1DBuffer( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t width, const Buffer &buffer, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_BUFFER, width, 0, 0, 0, 0, 0, 0, 0, buffer() }; object_ = ::clCreateImage( context(), flags, &format, &desc, NULL, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DBuffer() { } __CL_EXPLICIT_CONSTRUCTORS Image1DBuffer(const cl_mem& image1D) : Image(image1D) { } Image1DBuffer& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer(const Image1DBuffer& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer& operator = (const Image1DBuffer &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1DBuffer(Image1DBuffer&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
*/ Image1DBuffer& operator = (Image1DBuffer &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; /*! \class Image1DArray * \brief Image interface for arrays of 1D images. */ class Image1DArray : public Image { public: Image1DArray( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t arraySize, ::size_t width, ::size_t rowPitch, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_ARRAY, width, 0, 0, // height, depth (unused) arraySize, rowPitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DArray() { } __CL_EXPLICIT_CONSTRUCTORS Image1DArray(const cl_mem& imageArray) : Image(imageArray) { } Image1DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray(const Image1DArray& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (const Image1DArray &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1DArray(Image1DArray&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (Image1DArray &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif // #if defined(CL_VERSION_1_2) /*! \brief Class interface for 2D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image2D : public Image { public: /*! \brief Constructs a 1D Image in a specified context. * * Wraps clCreateImage(). */ Image2D( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t width, ::size_t height, ::size_t row_pitch = 0, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; bool useCreateImage; #if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) // Run-time decision based on the actual platform { cl_uint version = detail::getContextPlatformVersion(context()); useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above } #elif defined(CL_VERSION_1_2) useCreateImage = true; #else useCreateImage = false; #endif #if defined(CL_VERSION_1_2) if (useCreateImage) { cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D, width, height, 0, 0, // depth, array size (unused) row_pitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } #endif // #if defined(CL_VERSION_1_2) #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) if (!useCreateImage) { object_ = ::clCreateImage2D( context(), flags,&format, width, height, row_pitch, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE2D_ERR); if (err != NULL) { *err = error; } } #endif // #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) } //! \brief Default constructor - initializes to NULL. Image2D() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. 
*/ __CL_EXPLICIT_CONSTRUCTORS Image2D(const cl_mem& image2D) : Image(image2D) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image2D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2D(const Image2D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2D& operator = (const Image2D &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2D(Image2D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2D& operator = (Image2D &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #if !defined(CL_VERSION_1_2) /*! \brief Class interface for GL 2D Image Memory objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory * \note Deprecated for OpenCL 1.2. Please use ImageGL instead. */ class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED Image2DGL CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED : public Image2D { public: /*! \brief Constructs an Image2DGL in a specified context, from a given * GL Texture. * * Wraps clCreateFromGLTexture2D(). */ Image2DGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture2D( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_2D_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image2DGL() : Image2D() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image2DGL(const cl_mem& image) : Image2D(image) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image2DGL& operator = (const cl_mem& rhs) { Image2D::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2DGL(const Image2DGL& img) : Image2D(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2DGL& operator = (const Image2DGL &img) { Image2D::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2DGL(Image2DGL&& img) CL_HPP_NOEXCEPT : Image2D(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2DGL& operator = (Image2DGL &&img) { Image2D::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif // #if !defined(CL_VERSION_1_2) #if defined(CL_VERSION_1_2) /*! \class Image2DArray * \brief Image interface for arrays of 2D images. 
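 *
 *  A minimal construction sketch (hypothetical host code; assumes an
 *  OpenCL 1.2 context and a device that supports 2D image arrays):
 *  \code
 *  cl::ImageFormat format(CL_RGBA, CL_UNORM_INT8);
 *  cl_int err;
 *  cl::Image2DArray images(context, CL_MEM_READ_ONLY, format,
 *                          4,       // array size (number of slices)
 *                          512, 512,
 *                          0, 0,    // row/slice pitch (0 = tightly packed)
 *                          NULL, &err);
 *  \endcode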
*/ class Image2DArray : public Image { public: Image2DArray( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t arraySize, ::size_t width, ::size_t height, ::size_t rowPitch, ::size_t slicePitch, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D_ARRAY, width, height, 0, // depth (unused) arraySize, rowPitch, slicePitch, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image2DArray() { } __CL_EXPLICIT_CONSTRUCTORS Image2DArray(const cl_mem& imageArray) : Image(imageArray) { } Image2DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2DArray(const Image2DArray& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2DArray& operator = (const Image2DArray &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2DArray(Image2DArray&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2DArray& operator = (Image2DArray &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif // #if defined(CL_VERSION_1_2) /*! \brief Class interface for 3D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image3D : public Image { public: /*! \brief Constructs a 3D Image in a specified context. * * Wraps clCreateImage(). */ Image3D( const Context& context, cl_mem_flags flags, ImageFormat format, ::size_t width, ::size_t height, ::size_t depth, ::size_t row_pitch = 0, ::size_t slice_pitch = 0, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; bool useCreateImage; #if defined(CL_VERSION_1_2) && defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) // Run-time decision based on the actual platform { cl_uint version = detail::getContextPlatformVersion(context()); useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above } #elif defined(CL_VERSION_1_2) useCreateImage = true; #else useCreateImage = false; #endif #if defined(CL_VERSION_1_2) if (useCreateImage) { cl_image_desc desc = { CL_MEM_OBJECT_IMAGE3D, width, height, depth, 0, // array size (unused) row_pitch, slice_pitch, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } #endif // #if defined(CL_VERSION_1_2) #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) if (!useCreateImage) { object_ = ::clCreateImage3D( context(), flags, &format, width, height, depth, row_pitch, slice_pitch, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE3D_ERR); if (err != NULL) { *err = error; } } #endif // #if !defined(CL_VERSION_1_2) || defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) } //! \brief Default constructor - initializes to NULL. Image3D() : Image() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image3D(const cl_mem& image3D) : Image(image3D) { } /*! \brief Assignment from cl_mem - performs shallow copy. 
* * See Memory for further details. */ Image3D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image3D(const Image3D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image3D& operator = (const Image3D &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image3D(Image3D&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image3D& operator = (Image3D &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #if !defined(CL_VERSION_1_2) /*! \brief Class interface for GL 3D Image Memory objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image3DGL : public Image3D { public: /*! \brief Constructs an Image3DGL in a specified context, from a given * GL Texture. * * Wraps clCreateFromGLTexture3D(). */ Image3DGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture3D( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_3D_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image3DGL() : Image3D() { } /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ __CL_EXPLICIT_CONSTRUCTORS Image3DGL(const cl_mem& image) : Image3D(image) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image3DGL& operator = (const cl_mem& rhs) { Image3D::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image3DGL(const Image3DGL& img) : Image3D(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image3DGL& operator = (const Image3DGL &img) { Image3D::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image3DGL(Image3DGL&& img) CL_HPP_NOEXCEPT : Image3D(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image3DGL& operator = (Image3DGL &&img) { Image3D::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif // #if !defined(CL_VERSION_1_2) #if defined(CL_VERSION_1_2) /*! \class ImageGL * \brief general image interface for GL interop. * We abstract the 2D and 3D GL images into a single instance here * that wraps all GL sourced images on the grounds that setup information * was performed by OpenCL anyway. 
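 *
 *  A minimal interop sketch (hypothetical host code; assumes a context
 *  created against a current OpenGL context, GL headers included, and an
 *  existing GL texture object named texobj):
 *  \code
 *  cl_int err;
 *  cl::ImageGL clImage(context, CL_MEM_READ_WRITE,
 *                      GL_TEXTURE_2D, 0, texobj, &err);  // miplevel 0
 *  \endcode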
*/ class ImageGL : public Image { public: ImageGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_ERR); if (err != NULL) { *err = error; } } ImageGL() : Image() { } __CL_EXPLICIT_CONSTRUCTORS ImageGL(const cl_mem& image) : Image(image) { } ImageGL& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ ImageGL(const ImageGL& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ ImageGL& operator = (const ImageGL &img) { Image::operator=(img); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ ImageGL(ImageGL&& img) CL_HPP_NOEXCEPT : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ ImageGL& operator = (ImageGL &&img) { Image::operator=(std::move(img)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) }; #endif // #if defined(CL_VERSION_1_2) /*! \brief Class interface for GL Render Buffer Memory Objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferRenderGL : #if defined(CL_VERSION_1_2) public ImageGL #else // #if defined(CL_VERSION_1_2) public Image2DGL #endif //#if defined(CL_VERSION_1_2) { public: /*! \brief Constructs a BufferRenderGL in a specified context, from a given * GL Renderbuffer. * * Wraps clCreateFromGLRenderbuffer(). */ BufferRenderGL( const Context& context, cl_mem_flags flags, cl_GLuint bufobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLRenderbuffer( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_RENDER_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. #if defined(CL_VERSION_1_2) BufferRenderGL() : ImageGL() {}; #else // #if defined(CL_VERSION_1_2) BufferRenderGL() : Image2DGL() {}; #endif //#if defined(CL_VERSION_1_2) /*! \brief Constructor from cl_mem - takes ownership. * * See Memory for further details. */ #if defined(CL_VERSION_1_2) __CL_EXPLICIT_CONSTRUCTORS BufferRenderGL(const cl_mem& buffer) : ImageGL(buffer) { } #else // #if defined(CL_VERSION_1_2) __CL_EXPLICIT_CONSTRUCTORS BufferRenderGL(const cl_mem& buffer) : Image2DGL(buffer) { } #endif //#if defined(CL_VERSION_1_2) /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferRenderGL& operator = (const cl_mem& rhs) { #if defined(CL_VERSION_1_2) ImageGL::operator=(rhs); #else // #if defined(CL_VERSION_1_2) Image2DGL::operator=(rhs); #endif //#if defined(CL_VERSION_1_2) return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ #if defined(CL_VERSION_1_2) BufferRenderGL(const BufferRenderGL& buf) : ImageGL(buf) {} #else // #if defined(CL_VERSION_1_2) BufferRenderGL(const BufferRenderGL& buf) : Image2DGL(buf) {} #endif //#if defined(CL_VERSION_1_2) /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. 
*/ BufferRenderGL& operator = (const BufferRenderGL &rhs) { #if defined(CL_VERSION_1_2) ImageGL::operator=(rhs); #else // #if defined(CL_VERSION_1_2) Image2DGL::operator=(rhs); #endif //#if defined(CL_VERSION_1_2) return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ #if defined(CL_VERSION_1_2) BufferRenderGL(BufferRenderGL&& buf) CL_HPP_NOEXCEPT : ImageGL(std::move(buf)) {} #else // #if defined(CL_VERSION_1_2) BufferRenderGL(BufferRenderGL&& buf) CL_HPP_NOEXCEPT : Image2DGL(std::move(buf)) {} #endif //#if defined(CL_VERSION_1_2) /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferRenderGL& operator = (BufferRenderGL &&buf) { #if defined(CL_VERSION_1_2) ImageGL::operator=(std::move(buf)); #else // #if defined(CL_VERSION_1_2) Image2DGL::operator=(std::move(buf)); #endif //#if defined(CL_VERSION_1_2) return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) //! \brief Wrapper for clGetGLObjectInfo(). cl_int getObjectInfo( cl_gl_object_type *type, cl_GLuint * gl_object_name) { return detail::errHandler( ::clGetGLObjectInfo(object_, type, gl_object_name), __GET_GL_OBJECT_INFO_ERR); } }; /*! \brief Class interface for cl_sampler. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_sampler as the original. For details, see * clRetainSampler() and clReleaseSampler(). * * \see cl_sampler */ class Sampler : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Sampler() { } /*! \brief Constructs a Sampler in a specified context. * * Wraps clCreateSampler(). */ Sampler( const Context& context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int* err = NULL) { cl_int error; object_ = ::clCreateSampler( context(), normalized_coords, addressing_mode, filter_mode, &error); detail::errHandler(error, __CREATE_SAMPLER_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructor from cl_sampler - takes ownership. * * This effectively transfers ownership of a refcount on the cl_sampler * into the new Sampler object. */ __CL_EXPLICIT_CONSTRUCTORS Sampler(const cl_sampler& sampler) : detail::Wrapper(sampler) { } /*! \brief Assignment operator from cl_sampler - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseSampler() on the value previously held by this instance. */ Sampler& operator = (const cl_sampler& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Sampler(const Sampler& sam) : detail::Wrapper(sam) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Sampler& operator = (const Sampler &sam) { detail::Wrapper::operator=(sam); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Sampler(Sampler&& sam) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(sam)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Sampler& operator = (Sampler &&sam) { detail::Wrapper::operator=(std::move(sam)); return *this; } #endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) //! \brief Wrapper for clGetSamplerInfo(). 
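    //! A minimal query sketch (hypothetical host code; assumes sampler is a
    //! valid cl::Sampler):
    //! \code
    //! cl_filter_mode mode = sampler.getInfo<CL_SAMPLER_FILTER_MODE>();
    //! \endcode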
    template <typename T>
    cl_int getInfo(cl_sampler_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetSamplerInfo, object_, name, param),
            __GET_SAMPLER_INFO_ERR);
    }

    //! \brief Wrapper for clGetSamplerInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_sampler_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_sampler_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }
};

class Program;
class CommandQueue;
class Kernel;

//! \brief Class interface for specifying NDRange values.
class NDRange
{
private:
    size_t<3> sizes_;
    cl_uint dimensions_;

public:
    //! \brief Default constructor - resulting range has zero dimensions.
    NDRange()
        : dimensions_(0)
    { }

    //! \brief Constructs one-dimensional range.
    NDRange(::size_t size0)
        : dimensions_(1)
    {
        sizes_[0] = size0;
    }

    //! \brief Constructs two-dimensional range.
    NDRange(::size_t size0, ::size_t size1)
        : dimensions_(2)
    {
        sizes_[0] = size0;
        sizes_[1] = size1;
    }

    //! \brief Constructs three-dimensional range.
    NDRange(::size_t size0, ::size_t size1, ::size_t size2)
        : dimensions_(3)
    {
        sizes_[0] = size0;
        sizes_[1] = size1;
        sizes_[2] = size2;
    }

    /*! \brief Conversion operator to const ::size_t *.
     *
     *  \returns a pointer to the size of the first dimension.
     */
    operator const ::size_t*() const {
        return (const ::size_t*) sizes_;
    }

    //! \brief Queries the number of dimensions in the range.
    ::size_t dimensions() const { return dimensions_; }
};

//! \brief A zero-dimensional range.
static const NDRange NullRange;

//! \brief Local address wrapper for use with Kernel::setArg
struct LocalSpaceArg
{
    ::size_t size_;
};

namespace detail {

template <typename T>
struct KernelArgumentHandler
{
    static ::size_t size(const T&) { return sizeof(T); }
    static const T* ptr(const T& value) { return &value; }
};

template <>
struct KernelArgumentHandler<LocalSpaceArg>
{
    static ::size_t size(const LocalSpaceArg& value) { return value.size_; }
    static const void* ptr(const LocalSpaceArg&) { return NULL; }
};

}
//! \endcond

/*! __local
 * \brief Helper function for generating LocalSpaceArg objects.
 * Deprecated. Replaced with Local.
 */
inline CL_EXT_PREFIX__VERSION_1_1_DEPRECATED LocalSpaceArg
__local(::size_t size) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
inline LocalSpaceArg
__local(::size_t size)
{
    LocalSpaceArg ret = { size };
    return ret;
}

/*! Local
 * \brief Helper function for generating LocalSpaceArg objects.
 */
inline LocalSpaceArg
Local(::size_t size)
{
    LocalSpaceArg ret = { size };
    return ret;
}

//class KernelFunctor;

/*! \brief Class interface for cl_kernel.
 *
 *  \note Copies of these objects are shallow, meaning that the copy will refer
 *        to the same underlying cl_kernel as the original.  For details, see
 *        clRetainKernel() and clReleaseKernel().
 *
 *  \see cl_kernel
 */
class Kernel : public detail::Wrapper<cl_kernel>
{
public:
    inline Kernel(const Program& program, const char* name, cl_int* err = NULL);

    //! \brief Default constructor - initializes to NULL.
    Kernel() { }

    /*! \brief Constructor from cl_kernel - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the cl_kernel
     *  into the new Kernel object.
     */
    __CL_EXPLICIT_CONSTRUCTORS Kernel(const cl_kernel& kernel) : detail::Wrapper<cl_type>(kernel) { }

    /*! \brief Assignment operator from cl_kernel - takes ownership.
     *
     *  This effectively transfers ownership of a refcount on the rhs and calls
     *  clReleaseKernel() on the value previously held by this instance.
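 *
 *  A minimal usage sketch, assuming a successfully built Program `prog` and a
 *  cl::Buffer `buf`; the kernel name "vecadd" is a placeholder, not something
 *  defined by this header:
 *  \code
 *  cl_int err;
 *  cl::Kernel k(prog, "vecadd", &err);
 *  k.setArg(0, buf);              // forwarded through KernelArgumentHandler
 *  k.setArg(1, cl::Local(1024));  // 1 KiB of __local scratch space
 *  \endcode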
 */
    Kernel& operator = (const cl_kernel& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Kernel(const Kernel& kernel) : detail::Wrapper<cl_type>(kernel) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Kernel& operator = (const Kernel &kernel)
    {
        detail::Wrapper<cl_type>::operator=(kernel);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Kernel(Kernel&& kernel) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(kernel)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Kernel& operator = (Kernel &&kernel)
    {
        detail::Wrapper<cl_type>::operator=(std::move(kernel));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    template <typename T>
    cl_int getInfo(cl_kernel_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetKernelInfo, object_, name, param),
            __GET_KERNEL_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_kernel_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

#if defined(CL_VERSION_1_2)
    template <typename T>
    cl_int getArgInfo(cl_uint argIndex, cl_kernel_arg_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetKernelArgInfo, object_, argIndex, name, param),
            __GET_KERNEL_ARG_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_kernel_arg_info, name>::param_type
    getArgInfo(cl_uint argIndex, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_arg_info, name>::param_type param;
        cl_int result = getArgInfo(argIndex, name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }
#endif // #if defined(CL_VERSION_1_2)

    template <typename T>
    cl_int getWorkGroupInfo(
        const Device& device, cl_kernel_work_group_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(
                &::clGetKernelWorkGroupInfo, object_, device(), name, param),
                __GET_KERNEL_WORK_GROUP_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_kernel_work_group_info, name>::param_type
    getWorkGroupInfo(const Device& device, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_kernel_work_group_info, name>::param_type param;
        cl_int result = getWorkGroupInfo(device, name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    template <typename T>
    cl_int setArg(cl_uint index, const T &value)
    {
        return detail::errHandler(
            ::clSetKernelArg(
                object_,
                index,
                detail::KernelArgumentHandler<T>::size(value),
                detail::KernelArgumentHandler<T>::ptr(value)),
            __SET_KERNEL_ARGS_ERR);
    }

    cl_int setArg(cl_uint index, ::size_t size, const void* argPtr)
    {
        return detail::errHandler(
            ::clSetKernelArg(object_, index, size, argPtr),
            __SET_KERNEL_ARGS_ERR);
    }
};

/*! \class Program
 *  \brief Program interface that implements cl_program.
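 *
 *  A minimal build-and-report sketch (the kernel source string is a
 *  placeholder and error handling is abbreviated):
 *  \code
 *  cl::Program prog("__kernel void noop(void) { }");
 *  if (prog.build() != CL_SUCCESS) {
 *      VECTOR_CLASS<cl::Device> devs = prog.getInfo<CL_PROGRAM_DEVICES>();
 *      STRING_CLASS log = prog.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devs[0]);
 *      // inspect the build log here
 *  }
 *  \endcode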
 */
class Program : public detail::Wrapper<cl_program>
{
public:
    typedef VECTOR_CLASS<std::pair<const void*, ::size_t> > Binaries;
    typedef VECTOR_CLASS<std::pair<const char*, ::size_t> > Sources;

    Program(
        const STRING_CLASS& source,
        bool build = false,
        cl_int* err = NULL)
    {
        cl_int error;

        const char * strings = source.c_str();
        const ::size_t length = source.size();

        Context context = Context::getDefault(err);

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)1, &strings, &length, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);

        if (error == CL_SUCCESS && build) {
            error = ::clBuildProgram(
                object_,
                0,
                NULL,
                "",
                NULL,
                NULL);

            detail::errHandler(error, __BUILD_PROGRAM_ERR);
        }

        if (err != NULL) {
            *err = error;
        }
    }

    Program(
        const Context& context,
        const STRING_CLASS& source,
        bool build = false,
        cl_int* err = NULL)
    {
        cl_int error;

        const char * strings = source.c_str();
        const ::size_t length = source.size();

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)1, &strings, &length, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);

        if (error == CL_SUCCESS && build) {
            error = ::clBuildProgram(
                object_,
                0,
                NULL,
                "",
                NULL,
                NULL);

            detail::errHandler(error, __BUILD_PROGRAM_ERR);
        }

        if (err != NULL) {
            *err = error;
        }
    }

    Program(
        const Context& context,
        const Sources& sources,
        cl_int* err = NULL)
    {
        cl_int error;

        const ::size_t n = (::size_t)sources.size();
        ::size_t* lengths = (::size_t*) alloca(n * sizeof(::size_t));
        const char** strings = (const char**) alloca(n * sizeof(const char*));

        for (::size_t i = 0; i < n; ++i) {
            strings[i] = sources[(int)i].first;
            lengths[i] = sources[(int)i].second;
        }

        object_ = ::clCreateProgramWithSource(
            context(), (cl_uint)n, strings, lengths, &error);

        detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR);

        if (err != NULL) {
            *err = error;
        }
    }

    /**
     * Construct a program object from a list of devices and a per-device list of binaries.
     * \param context A valid OpenCL context in which to construct the program.
     * \param devices A vector of OpenCL device objects for which the program will be created.
     * \param binaries A vector of pairs of a pointer to a binary object and its length.
     * \param binaryStatus An optional vector that on completion will be resized to
     *   match the size of binaries and filled with values to specify if each binary
     *   was successfully loaded.
     *   Set to CL_SUCCESS if the binary was successfully loaded.
     *   Set to CL_INVALID_VALUE if the length is 0 or the binary pointer is NULL.
     *   Set to CL_INVALID_BINARY if the binary provided is not valid for the matching device.
     * \param err if non-NULL will be set to CL_SUCCESS on successful operation or one of the following errors:
     *   CL_INVALID_CONTEXT if context is not a valid context.
     *   CL_INVALID_VALUE if the length of devices is zero; or if the length of binaries does not match the length of devices;
     *     or if any entry in binaries is NULL or has length 0.
     *   CL_INVALID_DEVICE if OpenCL devices listed in devices are not in the list of devices associated with context.
     *   CL_INVALID_BINARY if an invalid program binary was encountered for any device. binaryStatus will return specific status for each device.
     *   CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
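 *
 *   A minimal sketch of loading a prebuilt binary, assuming a context `ctx`, a
 *   device list `devs`, and a byte vector `bits` populated elsewhere (all three
 *   names are placeholders):
 *   \code
 *   cl::Program::Binaries bins;
 *   bins.push_back(std::make_pair((const void*)&bits[0], bits.size()));
 *   VECTOR_CLASS<cl_int> status;
 *   cl_int err;
 *   cl::Program prog(ctx, devs, bins, &status, &err);
 *   \endcode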
*/ Program( const Context& context, const VECTOR_CLASS& devices, const Binaries& binaries, VECTOR_CLASS* binaryStatus = NULL, cl_int* err = NULL) { cl_int error; const ::size_t numDevices = devices.size(); // Catch size mismatch early and return if(binaries.size() != numDevices) { error = CL_INVALID_VALUE; detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR); if (err != NULL) { *err = error; } return; } ::size_t* lengths = (::size_t*) alloca(numDevices * sizeof(::size_t)); const unsigned char** images = (const unsigned char**) alloca(numDevices * sizeof(const unsigned char**)); for (::size_t i = 0; i < numDevices; ++i) { images[i] = (const unsigned char*)binaries[i].first; lengths[i] = binaries[(int)i].second; } cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id)); for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } if(binaryStatus) { binaryStatus->resize(numDevices); } object_ = ::clCreateProgramWithBinary( context(), (cl_uint) devices.size(), deviceIDs, lengths, images, (binaryStatus != NULL && numDevices > 0) ? &binaryStatus->front() : NULL, &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR); if (err != NULL) { *err = error; } } #if defined(CL_VERSION_1_2) /** * Create program using builtin kernels. * \param kernelNames Semi-colon separated list of builtin kernel names */ Program( const Context& context, const VECTOR_CLASS& devices, const STRING_CLASS& kernelNames, cl_int* err = NULL) { cl_int error; ::size_t numDevices = devices.size(); cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id)); for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } object_ = ::clCreateProgramWithBuiltInKernels( context(), (cl_uint) devices.size(), deviceIDs, kernelNames.c_str(), &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR); if (err != NULL) { *err = error; } } #endif // #if defined(CL_VERSION_1_2) Program() { } __CL_EXPLICIT_CONSTRUCTORS Program(const cl_program& program) : detail::Wrapper(program) { } Program& operator = (const cl_program& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Program(const Program& program) : detail::Wrapper(program) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Program& operator = (const Program &program) { detail::Wrapper::operator=(program); return *this; } #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED) /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Program(Program&& program) CL_HPP_NOEXCEPT : detail::Wrapper(std::move(program)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
 */
    Program& operator = (Program &&program)
    {
        detail::Wrapper<cl_type>::operator=(std::move(program));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    cl_int build(
        const VECTOR_CLASS<Device>& devices,
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        ::size_t numDevices = devices.size();
        cl_device_id* deviceIDs = (cl_device_id*) alloca(numDevices * sizeof(cl_device_id));
        for( ::size_t deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) {
            deviceIDs[deviceIndex] = (devices[deviceIndex])();
        }

        return detail::errHandler(
            ::clBuildProgram(
                object_,
                (cl_uint) devices.size(),
                deviceIDs,
                options,
                notifyFptr,
                data),
                __BUILD_PROGRAM_ERR);
    }

    cl_int build(
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        return detail::errHandler(
            ::clBuildProgram(
                object_,
                0,
                NULL,
                options,
                notifyFptr,
                data),
                __BUILD_PROGRAM_ERR);
    }

#if defined(CL_VERSION_1_2)
    cl_int compile(
        const char* options = NULL,
        void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
        void* data = NULL) const
    {
        return detail::errHandler(
            ::clCompileProgram(
                object_,
                0,
                NULL,
                options,
                0,
                NULL,
                NULL,
                notifyFptr,
                data),
                __COMPILE_PROGRAM_ERR);
    }
#endif

    template <typename T>
    cl_int getInfo(cl_program_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetProgramInfo, object_, name, param),
            __GET_PROGRAM_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_program_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_program_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    template <typename T>
    cl_int getBuildInfo(
        const Device& device, cl_program_build_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(
                &::clGetProgramBuildInfo, object_, device(), name, param),
                __GET_PROGRAM_BUILD_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_program_build_info, name>::param_type
    getBuildInfo(const Device& device, cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_program_build_info, name>::param_type param;
        cl_int result = getBuildInfo(device, name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    cl_int createKernels(VECTOR_CLASS<Kernel>* kernels)
    {
        cl_uint numKernels;
        cl_int err = ::clCreateKernelsInProgram(object_, 0, NULL, &numKernels);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR);
        }

        Kernel* value = (Kernel*) alloca(numKernels * sizeof(Kernel));
        err = ::clCreateKernelsInProgram(
            object_, numKernels, (cl_kernel*) value, NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR);
        }

        kernels->assign(&value[0], &value[numKernels]);
        return CL_SUCCESS;
    }
};

#if defined(CL_VERSION_1_2)
inline Program linkProgram(
    Program input1,
    Program input2,
    const char* options = NULL,
    void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
    void* data = NULL,
    cl_int* err = NULL)
{
    cl_int error_local = CL_SUCCESS;

    cl_program programs[2] = { input1(), input2() };

    Context ctx = input1.getInfo<CL_PROGRAM_CONTEXT>(&error_local);
    if(error_local!=CL_SUCCESS) {
        detail::errHandler(error_local, __LINK_PROGRAM_ERR);
    }

    cl_program prog = ::clLinkProgram(
        ctx(),
        0,
        NULL,
        options,
        2,
        programs,
        notifyFptr,
        data,
        &error_local);

    detail::errHandler(error_local,__COMPILE_PROGRAM_ERR);
    if (err != NULL) {
        *err = error_local;
    }

    return Program(prog);
}

inline Program linkProgram(
    VECTOR_CLASS<Program> inputPrograms,
    const char* options = NULL,
    void (CL_CALLBACK * notifyFptr)(cl_program,
void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error_local = CL_SUCCESS; cl_program * programs = (cl_program*) alloca(inputPrograms.size() * sizeof(cl_program)); if (programs != NULL) { for (unsigned int i = 0; i < inputPrograms.size(); i++) { programs[i] = inputPrograms[i](); } } Context ctx; if(inputPrograms.size() > 0) { ctx = inputPrograms[0].getInfo(&error_local); if(error_local!=CL_SUCCESS) { detail::errHandler(error_local, __LINK_PROGRAM_ERR); } } cl_program prog = ::clLinkProgram( ctx(), 0, NULL, options, (cl_uint)inputPrograms.size(), programs, notifyFptr, data, &error_local); detail::errHandler(error_local,__COMPILE_PROGRAM_ERR); if (err != NULL) { *err = error_local; } return Program(prog); } #endif template<> inline VECTOR_CLASS cl::Program::getInfo(cl_int* err) const { VECTOR_CLASS< ::size_t> sizes = getInfo(); VECTOR_CLASS binaries; for (VECTOR_CLASS< ::size_t>::iterator s = sizes.begin(); s != sizes.end(); ++s) { char *ptr = NULL; if (*s != 0) ptr = new char[*s]; binaries.push_back(ptr); } cl_int result = getInfo(CL_PROGRAM_BINARIES, &binaries); if (err != NULL) { *err = result; } return binaries; } inline Kernel::Kernel(const Program& program, const char* name, cl_int* err) { cl_int error; object_ = ::clCreateKernel(program(), name, &error); detail::errHandler(error, __CREATE_KERNEL_ERR); if (err != NULL) { *err = error; } } /*! \class CommandQueue * \brief CommandQueue interface for cl_command_queue. */ class CommandQueue : public detail::Wrapper { private: #ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED static std::atomic default_initialized_; #else // !CL_HPP_CPP11_ATOMICS_SUPPORTED static volatile int default_initialized_; #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED static CommandQueue default_; static volatile cl_int default_error_; public: CommandQueue( cl_command_queue_properties properties, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } } else { Device device = context.getInfo()[0]; object_ = ::clCreateCommandQueue( context(), device(), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } } } /*! * \brief Constructs a CommandQueue for an implementation defined device in the given context */ explicit CommandQueue( const Context& context, cl_command_queue_properties properties = 0, cl_int* err = NULL) { cl_int error; VECTOR_CLASS devices; error = context.getInfo(CL_CONTEXT_DEVICES, &devices); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } return; } object_ = ::clCreateCommandQueue(context(), devices[0](), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } } CommandQueue( const Context& context, const Device& device, cl_command_queue_properties properties = 0, cl_int* err = NULL) { cl_int error; object_ = ::clCreateCommandQueue( context(), device(), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ CommandQueue(const CommandQueue& queue) : detail::Wrapper(queue) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. 
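 *
 *  A minimal enqueue sketch, assuming a context `ctx`, a device `dev` and a
 *  ready-to-run kernel `k` (all placeholders, created elsewhere):
 *  \code
 *  cl_int err;
 *  cl::CommandQueue q(ctx, dev, 0, &err);
 *  err = q.enqueueNDRangeKernel(k, cl::NullRange, cl::NDRange(1024), cl::NullRange);
 *  err = q.finish();
 *  \endcode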
 */
    CommandQueue& operator = (const CommandQueue &queue)
    {
        detail::Wrapper<cl_type>::operator=(queue);
        return *this;
    }

#if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)
    /*! \brief Move constructor to forward move to the superclass correctly.
     * Required for MSVC.
     */
    CommandQueue(CommandQueue&& queue) CL_HPP_NOEXCEPT : detail::Wrapper<cl_type>(std::move(queue)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     * Required for MSVC.
     */
    CommandQueue& operator = (CommandQueue &&queue)
    {
        detail::Wrapper<cl_type>::operator=(std::move(queue));
        return *this;
    }
#endif // #if defined(CL_HPP_RVALUE_REFERENCES_SUPPORTED)

    static CommandQueue getDefault(cl_int * err = NULL)
    {
        int state = detail::compare_exchange(
            &default_initialized_,
            __DEFAULT_BEING_INITIALIZED, __DEFAULT_NOT_INITIALIZED);

        if (state & __DEFAULT_INITIALIZED) {
            if (err != NULL) {
                *err = default_error_;
            }
            return default_;
        }

        if (state & __DEFAULT_BEING_INITIALIZED) {
            // Assume writes will propagate eventually...
            while(default_initialized_ != __DEFAULT_INITIALIZED) {
                detail::fence();
            }

            if (err != NULL) {
                *err = default_error_;
            }
            return default_;
        }

        cl_int error;

        Context context = Context::getDefault(&error);
        detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);

        if (error != CL_SUCCESS) {
            if (err != NULL) {
                *err = error;
            }
        }
        else {
            Device device = context.getInfo<CL_CONTEXT_DEVICES>()[0];
            default_ = CommandQueue(context, device, 0, &error);

            detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR);
            if (err != NULL) {
                *err = error;
            }
        }

        detail::fence();

        default_error_ = error;
        // Assume writes will propagate eventually...
        default_initialized_ = __DEFAULT_INITIALIZED;

        detail::fence();

        if (err != NULL) {
            *err = default_error_;
        }
        return default_;
    }

    CommandQueue() { }

    __CL_EXPLICIT_CONSTRUCTORS CommandQueue(const cl_command_queue& commandQueue) : detail::Wrapper<cl_type>(commandQueue) { }

    CommandQueue& operator = (const cl_command_queue& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    template <typename T>
    cl_int getInfo(cl_command_queue_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(
                &::clGetCommandQueueInfo, object_, name, param),
                __GET_COMMAND_QUEUE_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_command_queue_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_command_queue_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    cl_int enqueueReadBuffer(
        const Buffer& buffer,
        cl_bool blocking,
        ::size_t offset,
        ::size_t size,
        void* ptr,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadBuffer(
                object_, buffer(), blocking, offset, size,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_READ_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueWriteBuffer(
        const Buffer& buffer,
        cl_bool blocking,
        ::size_t offset,
        ::size_t size,
        const void* ptr,
        const VECTOR_CLASS<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueWriteBuffer(
                object_, buffer(), blocking, offset, size,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ?
&tmp : NULL), __ENQUEUE_WRITE_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyBuffer( const Buffer& src, const Buffer& dst, ::size_t src_offset, ::size_t dst_offset, ::size_t size, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyBuffer( object_, src(), dst(), src_offset, dst_offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQEUE_COPY_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueReadBufferRect( const Buffer& buffer, cl_bool blocking, const size_t<3>& buffer_offset, const size_t<3>& host_offset, const size_t<3>& region, ::size_t buffer_row_pitch, ::size_t buffer_slice_pitch, ::size_t host_row_pitch, ::size_t host_slice_pitch, void *ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReadBufferRect( object_, buffer(), blocking, (const ::size_t *)buffer_offset, (const ::size_t *)host_offset, (const ::size_t *)region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_READ_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueWriteBufferRect( const Buffer& buffer, cl_bool blocking, const size_t<3>& buffer_offset, const size_t<3>& host_offset, const size_t<3>& region, ::size_t buffer_row_pitch, ::size_t buffer_slice_pitch, ::size_t host_row_pitch, ::size_t host_slice_pitch, const void *ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueWriteBufferRect( object_, buffer(), blocking, (const ::size_t *)buffer_offset, (const ::size_t *)host_offset, (const ::size_t *)region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_WRITE_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyBufferRect( const Buffer& src, const Buffer& dst, const size_t<3>& src_origin, const size_t<3>& dst_origin, const size_t<3>& region, ::size_t src_row_pitch, ::size_t src_slice_pitch, ::size_t dst_row_pitch, ::size_t dst_slice_pitch, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyBufferRect( object_, src(), dst(), (const ::size_t *)src_origin, (const ::size_t *)dst_origin, (const ::size_t *)region, src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQEUE_COPY_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if defined(CL_VERSION_1_2) /** * Enqueue a command to fill a buffer object with a pattern * of a given size. The pattern is specified a as vector. * \tparam PatternType The datatype of the pattern field. * The pattern type must be an accepted OpenCL data type. 
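 *
 * A minimal sketch, assuming a queue `q` and a cl::Buffer `buf` of at least
 * 1024 bytes (both placeholders); the fill size must be a multiple of
 * sizeof(PatternType):
 * \code
 * cl_float zero = 0.0f;
 * cl_int err = q.enqueueFillBuffer(buf, zero, 0, 1024);
 * \endcode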
*/ template cl_int enqueueFillBuffer( const Buffer& buffer, PatternType pattern, ::size_t offset, ::size_t size, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillBuffer( object_, buffer(), static_cast(&pattern), sizeof(PatternType), offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if defined(CL_VERSION_1_2) cl_int enqueueReadImage( const Image& image, cl_bool blocking, const size_t<3>& origin, const size_t<3>& region, ::size_t row_pitch, ::size_t slice_pitch, void* ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReadImage( object_, image(), blocking, (const ::size_t *) origin, (const ::size_t *) region, row_pitch, slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_READ_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueWriteImage( const Image& image, cl_bool blocking, const size_t<3>& origin, const size_t<3>& region, ::size_t row_pitch, ::size_t slice_pitch, const void* ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueWriteImage( object_, image(), blocking, (const ::size_t *) origin, (const ::size_t *) region, row_pitch, slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_WRITE_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyImage( const Image& src, const Image& dst, const size_t<3>& src_origin, const size_t<3>& dst_origin, const size_t<3>& region, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyImage( object_, src(), dst(), (const ::size_t *) src_origin, (const ::size_t *)dst_origin, (const ::size_t *) region, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if defined(CL_VERSION_1_2) /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA floating-point color value if * the image channel data type is not an unnormalized signed or * unsigned data type. */ cl_int enqueueFillImage( const Image& image, cl_float4 fillColor, const size_t<3>& origin, const size_t<3>& region, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), (const ::size_t *) origin, (const ::size_t *) region, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? 
&tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA signed integer color value if * the image channel data type is an unnormalized signed integer * type. */ cl_int enqueueFillImage( const Image& image, cl_int4 fillColor, const size_t<3>& origin, const size_t<3>& region, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), (const ::size_t *) origin, (const ::size_t *) region, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA unsigned integer color value if * the image channel data type is an unnormalized unsigned integer * type. */ cl_int enqueueFillImage( const Image& image, cl_uint4 fillColor, const size_t<3>& origin, const size_t<3>& region, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), (const ::size_t *) origin, (const ::size_t *) region, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if defined(CL_VERSION_1_2) cl_int enqueueCopyImageToBuffer( const Image& src, const Buffer& dst, const size_t<3>& src_origin, const size_t<3>& region, ::size_t dst_offset, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyImageToBuffer( object_, src(), dst(), (const ::size_t *) src_origin, (const ::size_t *) region, dst_offset, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyBufferToImage( const Buffer& src, const Image& dst, ::size_t src_offset, const size_t<3>& dst_origin, const size_t<3>& region, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyBufferToImage( object_, src(), dst(), src_offset, (const ::size_t *) dst_origin, (const ::size_t *) region, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } void* enqueueMapBuffer( const Buffer& buffer, cl_bool blocking, cl_map_flags flags, ::size_t offset, ::size_t size, const VECTOR_CLASS* events = NULL, Event* event = NULL, cl_int* err = NULL) const { cl_event tmp; cl_int error; void * result = ::clEnqueueMapBuffer( object_, buffer(), blocking, flags, offset, size, (events != NULL) ? 
(cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL, &error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } if (event != NULL && error == CL_SUCCESS) *event = tmp; return result; } void* enqueueMapImage( const Image& buffer, cl_bool blocking, cl_map_flags flags, const size_t<3>& origin, const size_t<3>& region, ::size_t * row_pitch, ::size_t * slice_pitch, const VECTOR_CLASS* events = NULL, Event* event = NULL, cl_int* err = NULL) const { cl_event tmp; cl_int error; void * result = ::clEnqueueMapImage( object_, buffer(), blocking, flags, (const ::size_t *) origin, (const ::size_t *) region, row_pitch, slice_pitch, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL, &error); detail::errHandler(error, __ENQUEUE_MAP_IMAGE_ERR); if (err != NULL) { *err = error; } if (event != NULL && error == CL_SUCCESS) *event = tmp; return result; } cl_int enqueueUnmapMemObject( const Memory& memory, void* mapped_ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueUnmapMemObject( object_, memory(), mapped_ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if defined(CL_VERSION_1_2) /** * Enqueues a marker command which waits for either a list of events to complete, * or all previously enqueued commands to complete. * * Enqueues a marker command which waits for either a list of events to complete, * or if the list is empty it waits for all commands previously enqueued in command_queue * to complete before it completes. This command returns an event which can be waited on, * i.e. this event can be waited on to insure that all events either in the event_wait_list * or all previously enqueued commands, queued before this command to command_queue, * have completed. */ cl_int enqueueMarkerWithWaitList( const VECTOR_CLASS *events = 0, Event *event = 0) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueMarkerWithWaitList( object_, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MARKER_WAIT_LIST_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * A synchronization point that enqueues a barrier operation. * * Enqueues a barrier command which waits for either a list of events to complete, * or if the list is empty it waits for all commands previously enqueued in command_queue * to complete before it completes. This command blocks command execution, that is, any * following commands enqueued after it do not execute until it completes. This command * returns an event which can be waited on, i.e. this event can be waited on to insure that * all events either in the event_wait_list or all previously enqueued commands, queued * before this command to command_queue, have completed. */ cl_int enqueueBarrierWithWaitList( const VECTOR_CLASS *events = 0, Event *event = 0) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueBarrierWithWaitList( object_, (events != NULL) ? 
(cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_BARRIER_WAIT_LIST_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command to indicate with which device a set of memory objects * should be associated. */ cl_int enqueueMigrateMemObjects( const VECTOR_CLASS &memObjects, cl_mem_migration_flags flags, const VECTOR_CLASS* events = NULL, Event* event = NULL ) const { cl_event tmp; cl_mem* localMemObjects = static_cast(alloca(memObjects.size() * sizeof(cl_mem))); for( int i = 0; i < (int)memObjects.size(); ++i ) { localMemObjects[i] = memObjects[i](); } cl_int err = detail::errHandler( ::clEnqueueMigrateMemObjects( object_, (cl_uint)memObjects.size(), static_cast(localMemObjects), flags, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if defined(CL_VERSION_1_2) cl_int enqueueNDRangeKernel( const Kernel& kernel, const NDRange& offset, const NDRange& global, const NDRange& local = NullRange, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueNDRangeKernel( object_, kernel(), (cl_uint) global.dimensions(), offset.dimensions() != 0 ? (const ::size_t*) offset : NULL, (const ::size_t*) global, local.dimensions() != 0 ? (const ::size_t*) local : NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_NDRANGE_KERNEL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueTask( const Kernel& kernel, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueTask( object_, kernel(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_TASK_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueNativeKernel( void (CL_CALLBACK *userFptr)(void *), std::pair args, const VECTOR_CLASS* mem_objects = NULL, const VECTOR_CLASS* mem_locs = NULL, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_mem * mems = (mem_objects != NULL && mem_objects->size() > 0) ? (cl_mem*) alloca(mem_objects->size() * sizeof(cl_mem)) : NULL; if (mems != NULL) { for (unsigned int i = 0; i < mem_objects->size(); i++) { mems[i] = ((*mem_objects)[i])(); } } cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueNativeKernel( object_, userFptr, args.first, args.second, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, mems, (mem_locs != NULL && mem_locs->size() > 0) ? (const void **) &mem_locs->front() : NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? 
&tmp : NULL), __ENQUEUE_NATIVE_KERNEL); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2)) CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueMarker(Event* event = NULL) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueMarker( object_, (event != NULL) ? &tmp : NULL), __ENQUEUE_MARKER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueWaitForEvents(const VECTOR_CLASS& events) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { return detail::errHandler( ::clEnqueueWaitForEvents( object_, (cl_uint) events.size(), events.size() > 0 ? (const cl_event*) &events.front() : NULL), __ENQUEUE_WAIT_FOR_EVENTS_ERR); } #endif // #if defined(CL_VERSION_1_1) cl_int enqueueAcquireGLObjects( const VECTOR_CLASS* mem_objects = NULL, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueAcquireGLObjects( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_ACQUIRE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueReleaseGLObjects( const VECTOR_CLASS* mem_objects = NULL, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReleaseGLObjects( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_RELEASE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if defined (USE_DX_INTEROP) typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueAcquireD3D10ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event); typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueReleaseD3D10ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event); cl_int enqueueAcquireD3D10Objects( const VECTOR_CLASS* mem_objects = NULL, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { static PFN_clEnqueueAcquireD3D10ObjectsKHR pfn_clEnqueueAcquireD3D10ObjectsKHR = NULL; #if defined(CL_VERSION_1_2) cl_context context = getInfo(); cl::Device device(getInfo()); cl_platform_id platform = device.getInfo(); __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clEnqueueAcquireD3D10ObjectsKHR); #endif #if defined(CL_VERSION_1_1) __INIT_CL_EXT_FCN_PTR(clEnqueueAcquireD3D10ObjectsKHR); #endif cl_event tmp; cl_int err = detail::errHandler( pfn_clEnqueueAcquireD3D10ObjectsKHR( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? 
(cl_uint) events->size() : 0, (events != NULL) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_ACQUIRE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueReleaseD3D10Objects( const VECTOR_CLASS* mem_objects = NULL, const VECTOR_CLASS* events = NULL, Event* event = NULL) const { static PFN_clEnqueueReleaseD3D10ObjectsKHR pfn_clEnqueueReleaseD3D10ObjectsKHR = NULL; #if defined(CL_VERSION_1_2) cl_context context = getInfo(); cl::Device device(getInfo()); cl_platform_id platform = device.getInfo(); __INIT_CL_EXT_FCN_PTR_PLATFORM(platform, clEnqueueReleaseD3D10ObjectsKHR); #endif // #if defined(CL_VERSION_1_2) #if defined(CL_VERSION_1_1) __INIT_CL_EXT_FCN_PTR(clEnqueueReleaseD3D10ObjectsKHR); #endif // #if defined(CL_VERSION_1_1) cl_event tmp; cl_int err = detail::errHandler( pfn_clEnqueueReleaseD3D10ObjectsKHR( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_RELEASE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) || (defined(CL_VERSION_1_1) && !defined(CL_VERSION_1_2)) CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueBarrier() const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { return detail::errHandler( ::clEnqueueBarrier(object_), __ENQUEUE_BARRIER_ERR); } #endif // #if defined(CL_VERSION_1_1) cl_int flush() const { return detail::errHandler(::clFlush(object_), __FLUSH_ERR); } cl_int finish() const { return detail::errHandler(::clFinish(object_), __FINISH_ERR); } }; #ifdef _WIN32 #ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED __declspec(selectany) std::atomic CommandQueue::default_initialized_; #else // !CL_HPP_CPP11_ATOMICS_SUPPORTED __declspec(selectany) volatile int CommandQueue::default_initialized_ = __DEFAULT_NOT_INITIALIZED; #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED __declspec(selectany) CommandQueue CommandQueue::default_; __declspec(selectany) volatile cl_int CommandQueue::default_error_ = CL_SUCCESS; #else // !_WIN32 #ifdef CL_HPP_CPP11_ATOMICS_SUPPORTED __attribute__((weak)) std::atomic CommandQueue::default_initialized_; #else // !CL_HPP_CPP11_ATOMICS_SUPPORTED __attribute__((weak)) volatile int CommandQueue::default_initialized_ = __DEFAULT_NOT_INITIALIZED; #endif // !CL_HPP_CPP11_ATOMICS_SUPPORTED __attribute__((weak)) CommandQueue CommandQueue::default_; __attribute__((weak)) volatile cl_int CommandQueue::default_error_ = CL_SUCCESS; #endif // !_WIN32 template< typename IteratorType > Buffer::Buffer( const Context &context, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr, cl_int* err) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if( readOnly ) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if( useHostPtr ) { flags |= CL_MEM_USE_HOST_PTR; } ::size_t size = sizeof(DataType)*(endIterator - startIterator); if( useHostPtr ) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if( !useHostPtr ) { CommandQueue queue(context, 
0, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } error = cl::copy(queue, startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } template< typename IteratorType > Buffer::Buffer( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr, cl_int* err) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if (readOnly) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if (useHostPtr) { flags |= CL_MEM_USE_HOST_PTR; } ::size_t size = sizeof(DataType)*(endIterator - startIterator); Context context = queue.getInfo(); if (useHostPtr) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if (!useHostPtr) { error = cl::copy(queue, startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } inline cl_int enqueueReadBuffer( const Buffer& buffer, cl_bool blocking, ::size_t offset, ::size_t size, void* ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueReadBuffer(buffer, blocking, offset, size, ptr, events, event); } inline cl_int enqueueWriteBuffer( const Buffer& buffer, cl_bool blocking, ::size_t offset, ::size_t size, const void* ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueWriteBuffer(buffer, blocking, offset, size, ptr, events, event); } inline void* enqueueMapBuffer( const Buffer& buffer, cl_bool blocking, cl_map_flags flags, ::size_t offset, ::size_t size, const VECTOR_CLASS* events = NULL, Event* event = NULL, cl_int* err = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } void * result = ::clEnqueueMapBuffer( queue(), buffer(), blocking, flags, offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (cl_event*) event, &error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } return result; } inline cl_int enqueueUnmapMemObject( const Memory& memory, void* mapped_ptr, const VECTOR_CLASS* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (error != CL_SUCCESS) { return error; } cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueUnmapMemObject( queue(), memory(), mapped_ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? 
&tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } inline cl_int enqueueCopyBuffer( const Buffer& src, const Buffer& dst, ::size_t src_offset, ::size_t dst_offset, ::size_t size, const VECTOR_CLASS* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyBuffer(src, dst, src_offset, dst_offset, size, events, event); } /** * Blocking copy operation between iterators and a buffer. * Host to Device. * Uses default command queue. */ template< typename IteratorType > inline cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) return error; return cl::copy(queue, startIterator, endIterator, buffer); } /** * Blocking copy operation between iterators and a buffer. * Device to Host. * Uses default command queue. */ template< typename IteratorType > inline cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) return error; return cl::copy(queue, buffer, startIterator, endIterator); } /** * Blocking copy operation between iterators and a buffer. * Host to Device. * Uses specified queue. */ template< typename IteratorType > inline cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ) { typedef typename std::iterator_traits::value_type DataType; cl_int error; ::size_t length = endIterator-startIterator; ::size_t byteLength = length*sizeof(DataType); DataType *pointer = static_cast(queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_WRITE, 0, byteLength, 0, 0, &error)); // if exceptions enabled, enqueueMapBuffer will throw if( error != CL_SUCCESS ) { return error; } #if defined(_MSC_VER) std::copy( startIterator, endIterator, stdext::checked_array_iterator( pointer, length)); #else std::copy(startIterator, endIterator, pointer); #endif Event endEvent; error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent); // if exceptions enabled, enqueueUnmapMemObject will throw if( error != CL_SUCCESS ) { return error; } endEvent.wait(); return CL_SUCCESS; } /** * Blocking copy operation between iterators and a buffer. * Device to Host. * Uses specified queue. 
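 *
 * A minimal sketch, assuming a queue `q` and a buffer `buf` holding at least
 * 256 floats (both placeholders):
 * \code
 * std::vector<float> host(256);
 * cl_int err = cl::copy(q, buf, host.begin(), host.end());
 * \endcode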
*/
template< typename IteratorType >
inline cl_int copy(
    const CommandQueue &queue,
    const cl::Buffer &buffer,
    IteratorType startIterator,
    IteratorType endIterator )
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    ::size_t length = endIterator - startIterator;
    ::size_t byteLength = length * sizeof(DataType);

    DataType *pointer = static_cast<DataType*>(
        queue.enqueueMapBuffer(buffer, CL_TRUE, CL_MAP_READ, 0, byteLength, 0, 0, &error));
    // if exceptions enabled, enqueueMapBuffer will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    std::copy(pointer, pointer + length, startIterator);
    Event endEvent;
    error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent);
    // if exceptions enabled, enqueueUnmapMemObject will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    endEvent.wait();
    return CL_SUCCESS;
}

#if defined(CL_VERSION_1_1)
inline cl_int enqueueReadBufferRect(
    const Buffer& buffer,
    cl_bool blocking,
    const size_t<3>& buffer_offset,
    const size_t<3>& host_offset,
    const size_t<3>& region,
    ::size_t buffer_row_pitch,
    ::size_t buffer_slice_pitch,
    ::size_t host_row_pitch,
    ::size_t host_slice_pitch,
    void *ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueReadBufferRect(
        buffer, blocking,
        buffer_offset, host_offset, region,
        buffer_row_pitch, buffer_slice_pitch,
        host_row_pitch, host_slice_pitch,
        ptr, events, event);
}

inline cl_int enqueueWriteBufferRect(
    const Buffer& buffer,
    cl_bool blocking,
    const size_t<3>& buffer_offset,
    const size_t<3>& host_offset,
    const size_t<3>& region,
    ::size_t buffer_row_pitch,
    ::size_t buffer_slice_pitch,
    ::size_t host_row_pitch,
    ::size_t host_slice_pitch,
    const void *ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueWriteBufferRect(
        buffer, blocking,
        buffer_offset, host_offset, region,
        buffer_row_pitch, buffer_slice_pitch,
        host_row_pitch, host_slice_pitch,
        ptr, events, event);
}

inline cl_int enqueueCopyBufferRect(
    const Buffer& src,
    const Buffer& dst,
    const size_t<3>& src_origin,
    const size_t<3>& dst_origin,
    const size_t<3>& region,
    ::size_t src_row_pitch,
    ::size_t src_slice_pitch,
    ::size_t dst_row_pitch,
    ::size_t dst_slice_pitch,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueCopyBufferRect(
        src, dst,
        src_origin, dst_origin, region,
        src_row_pitch, src_slice_pitch,
        dst_row_pitch, dst_slice_pitch,
        events, event);
}
#endif

inline cl_int enqueueReadImage(
    const Image& image,
    cl_bool blocking,
    const size_t<3>& origin,
    const size_t<3>& region,
    ::size_t row_pitch,
    ::size_t slice_pitch,
    void* ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueReadImage(
        image, blocking, origin, region,
        row_pitch, slice_pitch, ptr,
        events, event);
}

inline cl_int enqueueWriteImage(
    const Image& image,
    cl_bool blocking,
    const size_t<3>& origin,
    const size_t<3>& region,
    ::size_t row_pitch,
    ::size_t slice_pitch,
    const void* ptr,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueWriteImage(
        image, blocking, origin, region,
        row_pitch, slice_pitch, ptr,
        events, event);
}

inline cl_int enqueueCopyImage(
    const Image& src,
    const Image& dst,
    const size_t<3>& src_origin,
    const size_t<3>& dst_origin,
    const size_t<3>& region,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueCopyImage(
        src, dst, src_origin, dst_origin, region,
        events, event);
}

inline cl_int enqueueCopyImageToBuffer(
    const Image& src,
    const Buffer& dst,
    const size_t<3>& src_origin,
    const size_t<3>& region,
    ::size_t dst_offset,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueCopyImageToBuffer(
        src, dst, src_origin, region, dst_offset,
        events, event);
}

inline cl_int enqueueCopyBufferToImage(
    const Buffer& src,
    const Image& dst,
    ::size_t src_offset,
    const size_t<3>& dst_origin,
    const size_t<3>& region,
    const VECTOR_CLASS<Event>* events = NULL,
    Event* event = NULL)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.enqueueCopyBufferToImage(
        src, dst, src_offset, dst_origin, region,
        events, event);
}

inline cl_int flush(void)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.flush();
}

inline cl_int finish(void)
{
    cl_int error;
    CommandQueue queue = CommandQueue::getDefault(&error);
    if (error != CL_SUCCESS) {
        return error;
    }
    return queue.finish();
}

// Kernel Functor support
// New interface as of September 2011
// Requires the C++11 std::tr1::function (note do not support TR1)
// Visual Studio 2010 and GCC 4.2

struct EnqueueArgs
{
    CommandQueue queue_;
    const NDRange offset_;
    const NDRange global_;
    const NDRange local_;
    VECTOR_CLASS<Event> events_;

    EnqueueArgs(NDRange global) :
        queue_(CommandQueue::getDefault()),
        offset_(NullRange),
        global_(global),
        local_(NullRange)
    {
    }

    EnqueueArgs(NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()),
        offset_(NullRange),
        global_(global),
        local_(local)
    {
    }

    EnqueueArgs(NDRange offset, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()),
        offset_(offset),
        global_(global),
        local_(local)
    {
    }

    EnqueueArgs(Event e, NDRange global) :
        queue_(CommandQueue::getDefault()),
        offset_(NullRange),
        global_(global),
        local_(NullRange)
    {
        events_.push_back(e);
    }

    EnqueueArgs(Event e, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()),
        offset_(NullRange),
        global_(global),
        local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(Event e, NDRange offset, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()),
        offset_(offset),
        global_(global),
        local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange global) :
        queue_(CommandQueue::getDefault()),
        offset_(NullRange),
        global_(global),
        local_(NullRange),
        events_(events)
    {
    }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()),
        offset_(NullRange),
        global_(global),
        local_(local),
        events_(events)
    {
    }

    EnqueueArgs(const VECTOR_CLASS<Event> &events, NDRange offset, NDRange global, NDRange local) :
        queue_(CommandQueue::getDefault()),
        offset_(offset),
        global_(global),
        local_(local),
        events_(events)
    {
    }

    EnqueueArgs(CommandQueue &queue, NDRange global) :
        queue_(queue),
        offset_(NullRange),
        global_(global),
        local_(NullRange)
    {
    }

    EnqueueArgs(CommandQueue &queue, NDRange global, NDRange local) :
        queue_(queue),
        offset_(NullRange),
        global_(global),
        local_(local)
    {
    }

    EnqueueArgs(CommandQueue &queue, NDRange offset, NDRange global, NDRange local) :
        queue_(queue),
        offset_(offset),
        global_(global),
        local_(local)
    {
    }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange global) :
        queue_(queue),
        offset_(NullRange),
        global_(global),
        local_(NullRange)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange global, NDRange local) :
        queue_(queue),
        offset_(NullRange),
        global_(global),
        local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, Event e, NDRange offset, NDRange global, NDRange local) :
        queue_(queue),
        offset_(offset),
        global_(global),
        local_(local)
    {
        events_.push_back(e);
    }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange global) :
        queue_(queue),
        offset_(NullRange),
        global_(global),
        local_(NullRange),
        events_(events)
    {
    }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange global, NDRange local) :
        queue_(queue),
        offset_(NullRange),
        global_(global),
        local_(local),
        events_(events)
    {
    }

    EnqueueArgs(CommandQueue &queue, const VECTOR_CLASS<Event> &events, NDRange offset, NDRange global, NDRange local) :
        queue_(queue),
        offset_(offset),
        global_(global),
        local_(local),
        events_(events)
    {
    }
};

namespace detail {

class NullType {};

template<int index, typename T0>
struct SetArg
{
    static void set (Kernel kernel, T0 arg)
    {
        kernel.setArg(index, arg);
    }
};

template<int index>
struct SetArg<index, NullType>
{
    static void set (Kernel, NullType)
    {
    }
};

template <
    typename T0, typename T1, typename T2, typename T3,
    typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11,
    typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19,
    typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27,
    typename T28, typename T29, typename T30, typename T31
>
class KernelFunctorGlobal
{
private:
    Kernel kernel_;

public:
    KernelFunctorGlobal(
        Kernel kernel) :
        kernel_(kernel)
    {}

    KernelFunctorGlobal(
        const Program& program,
        const STRING_CLASS name,
        cl_int * err = NULL) :
        kernel_(program, name.c_str(), err)
    {}

    Event operator() (
        const EnqueueArgs& args,
        T0 t0,
        T1 t1 = NullType(), T2 t2 = NullType(),
        T3 t3 = NullType(), T4 t4 = NullType(),
        T5 t5 = NullType(), T6 t6 = NullType(),
        T7 t7 = NullType(), T8 t8 = NullType(),
        T9 t9 = NullType(), T10 t10 = NullType(),
        T11 t11 = NullType(), T12 t12 = NullType(),
        T13 t13 = NullType(), T14 t14 = NullType(),
        T15 t15 = NullType(), T16 t16 = NullType(),
        T17 t17 = NullType(), T18 t18 = NullType(),
        T19 t19 = NullType(), T20 t20 = NullType(),
        T21 t21 = NullType(), T22 t22 = NullType(),
        T23 t23 = NullType(), T24 t24 = NullType(),
        T25 t25 = NullType(), T26 t26 = NullType(),
        T27 t27 = NullType(), T28 t28 = NullType(),
        T29 t29 = NullType(), T30 t30 = NullType(),
        T31 t31 = NullType()
        )
    {
        Event event;
        SetArg<0, T0>::set(kernel_, t0);
        SetArg<1, T1>::set(kernel_, t1);
        SetArg<2, T2>::set(kernel_, t2);
        SetArg<3, T3>::set(kernel_, t3);
        SetArg<4, T4>::set(kernel_, t4);
        SetArg<5, T5>::set(kernel_, t5);
        SetArg<6, T6>::set(kernel_, t6);
        SetArg<7, T7>::set(kernel_, t7);
        SetArg<8, T8>::set(kernel_, t8);
        SetArg<9, T9>::set(kernel_, t9);
        SetArg<10, T10>::set(kernel_, t10);
        SetArg<11, T11>::set(kernel_, t11);
        SetArg<12, T12>::set(kernel_, t12);
        SetArg<13, T13>::set(kernel_, t13);
        SetArg<14, T14>::set(kernel_, t14);
        SetArg<15, T15>::set(kernel_, t15);
        SetArg<16, T16>::set(kernel_, t16);
        SetArg<17, T17>::set(kernel_, t17);
        SetArg<18, T18>::set(kernel_, t18);
        SetArg<19, T19>::set(kernel_, t19);
        SetArg<20, T20>::set(kernel_, t20);
        SetArg<21, T21>::set(kernel_, t21);
        SetArg<22, T22>::set(kernel_, t22);
        SetArg<23, T23>::set(kernel_, t23);
        SetArg<24, T24>::set(kernel_, t24);
        SetArg<25, T25>::set(kernel_, t25);
        SetArg<26, T26>::set(kernel_, t26);
        SetArg<27, T27>::set(kernel_, t27);
        SetArg<28, T28>::set(kernel_, t28);
        SetArg<29, T29>::set(kernel_, t29);
        SetArg<30, T30>::set(kernel_, t30);
        SetArg<31, T31>::set(kernel_, t31);

        args.queue_.enqueueNDRangeKernel(
            kernel_,
            args.offset_,
            args.global_,
            args.local_,
            &args.events_,
            &event);

        return event;
    }
};

//------------------------------------------------------------------------------------------------------

template<
    typename T0, typename T1, typename T2, typename T3,
    typename T4, typename T5, typename T6, typename T7,
    typename T8, typename T9, typename T10, typename T11,
    typename T12, typename T13, typename T14, typename T15,
    typename T16, typename T17, typename T18, typename T19,
    typename T20, typename T21, typename T22, typename T23,
    typename T24, typename T25, typename T26, typename T27,
    typename T28, typename T29, typename T30, typename T31>
struct functionImplementation_
{
    typedef detail::KernelFunctorGlobal<
        T0, T1, T2, T3, T4, T5, T6, T7,
        T8, T9, T10, T11, T12, T13, T14, T15,
        T16, T17, T18, T19, T20, T21, T22, T23,
        T24, T25, T26, T27, T28, T29, T30, T31> FunctorType;

    FunctorType functor_;

    functionImplementation_(const FunctorType &functor) :
        functor_(functor)
    {
        #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 32))
        // Fail variadic expansion for dev11
        static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it.");
        #endif
    }

    //! \brief Return type of the functor
    typedef Event result_type;

    //! \brief Function signature of kernel functor with no event dependency.
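
/*
 * Usage sketch (illustrative only, not part of the header itself): the
 * snippet below shows how the pieces defined above typically combine --
 * cl::copy() for buffer-to-host transfers via map/unmap, the default-queue
 * cl::finish() wrapper, and a cl::make_kernel functor invoked through
 * EnqueueArgs. The cl::Program "program", the kernel name "vadd", the
 * element count N, and the std::vector<float> hosts h_a/h_b/h_c are all
 * hypothetical placeholders.
 *
 *   cl::Buffer a(h_a.begin(), h_a.end(), true);   // read-only input buffer
 *   cl::Buffer b(h_b.begin(), h_b.end(), true);
 *   cl::Buffer c(CL_MEM_WRITE_ONLY, N * sizeof(float));
 *
 *   typedef cl::make_kernel<cl::Buffer, cl::Buffer, cl::Buffer> VaddKernel;
 *   VaddKernel vadd(program, "vadd");             // bind to a compiled kernel
 *   vadd(cl::EnqueueArgs(cl::NDRange(N)), a, b, c); // enqueue on the default queue
 *
 *   cl::finish();                                 // drain the default queue
 *   cl::copy(cl::CommandQueue::getDefault(), c,
 *            h_c.begin(), h_c.end());             // map, copy out, unmap
 */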
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29, T30 arg30, T31 arg31) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28, arg29, arg30, arg31); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25, typename T26, typename T27, typename T28, typename T29, typename T30> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 31)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
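
/*
 * Note on the _VARIADIC_MAX guards in the constructors above and below:
 * older Visual Studio releases expand std::function/std::tuple machinery
 * only up to a fixed arity, so a translation unit that needs more kernel
 * arguments than the compiler's default must raise the limit before any
 * standard header is included. A minimal sketch (MSVC only; the value 10
 * follows the static_assert message above):
 *
 *   #define _VARIADIC_MAX 10
 *   #include <CL/cl.hpp>
 */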
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29, T30 arg30) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28, arg29, arg30); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25, typename T26, typename T27, typename T28, typename T29> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 30)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28, T29 arg29) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28, arg29); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25, typename T26, typename T27, typename T28> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 29)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26, T27 arg27, T28 arg28) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27, arg28); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25, typename T26, typename T27> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 28)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26, T27 arg27) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26, arg27); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25, typename T26> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 27)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25, T26 arg26) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25, arg26); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24, typename T25> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 26)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24, T25 arg25) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24, arg25); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23, typename T24> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 25)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23, T24 arg24) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23, arg24); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22, typename T23> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 24)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22, T23 arg23) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22, arg23); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21, typename T22> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 23)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
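
// The remaining partial specializations repeat this shape mechanically:
// each one fixes one more trailing template parameter to NullType, walking
// the functor arity down to a single argument, while the _VARIADIC_MAX
// threshold in the constructor guard decreases in step.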
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21, T22 arg22) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21, arg22); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20, typename T21> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 22)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20, T21 arg21) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20, arg21); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19, typename T20> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 21)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19, T20 arg20) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19, arg20); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18, typename T19> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 20)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18, T19 arg19) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18, arg19); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17, typename T18> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 19)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. 
Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17, T18 arg18) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17, arg18); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16, typename T17> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 18)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16, T17 arg17) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16, arg17); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15, typename T16> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 17)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15, T16 arg16) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15, arg16); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14, typename T15> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 16)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! 
\brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14, T15 arg15) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14, arg15); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13, typename T14> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 15)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13, T14 arg14) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13, arg14); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12, typename T13> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 14)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! 
\brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12, T13 arg13) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12, arg13); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11, typename T12> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 13)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11, T12 arg12) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11, arg12); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10, typename T11> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 12)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! 
\brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10, T11 arg11) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10, arg11); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9, typename T10> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 11)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9, arg10); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8, typename T9> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 10)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7, typename T8> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, T8, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 9)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7, T8); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6, typename T7> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, T7, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 8)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6, T7); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5, typename T6> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, T6, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 7)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5, T6); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5, arg6); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4, typename T5> struct functionImplementation_ < T0, T1, T2, T3, T4, T5, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 6)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4, T5); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4, arg5); } }; template< typename T0, typename T1, typename T2, typename T3, typename T4> struct functionImplementation_ < T0, T1, T2, T3, T4, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 5)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3, T4); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3, arg4); } }; template< typename T0, typename T1, typename T2, typename T3> struct functionImplementation_ < T0, T1, T2, T3, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 4)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1, T2, T3); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2, T3 arg3) { return functor_( enqueueArgs, arg0, arg1, arg2, arg3); } }; template< typename T0, typename T1, typename T2> struct functionImplementation_ < T0, T1, T2, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, T2, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 3)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0, T1, T2); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1, T2 arg2) { return functor_( enqueueArgs, arg0, arg1, arg2); } }; template< typename T0, typename T1> struct functionImplementation_ < T0, T1, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, T1, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 2)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. 
typedef Event type_( const EnqueueArgs&, T0, T1); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0, T1 arg1) { return functor_( enqueueArgs, arg0, arg1); } }; template< typename T0> struct functionImplementation_ < T0, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> { typedef detail::KernelFunctorGlobal< T0, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType, NullType> FunctorType; FunctorType functor_; functionImplementation_(const FunctorType &functor) : functor_(functor) { #if (defined(_WIN32) && defined(_VARIADIC_MAX) && (_VARIADIC_MAX < 1)) // Fail variadic expansion for dev11 static_assert(0, "Visual Studio has a hard limit of argument count for a std::function expansion. Please define _VARIADIC_MAX to be 10. If you need more arguments than that VC12 and below cannot support it."); #endif } //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, T0); Event operator()( const EnqueueArgs& enqueueArgs, T0 arg0) { return functor_( enqueueArgs, arg0); } }; } // namespace detail //---------------------------------------------------------------------------------------------- template < typename T0, typename T1 = detail::NullType, typename T2 = detail::NullType, typename T3 = detail::NullType, typename T4 = detail::NullType, typename T5 = detail::NullType, typename T6 = detail::NullType, typename T7 = detail::NullType, typename T8 = detail::NullType, typename T9 = detail::NullType, typename T10 = detail::NullType, typename T11 = detail::NullType, typename T12 = detail::NullType, typename T13 = detail::NullType, typename T14 = detail::NullType, typename T15 = detail::NullType, typename T16 = detail::NullType, typename T17 = detail::NullType, typename T18 = detail::NullType, typename T19 = detail::NullType, typename T20 = detail::NullType, typename T21 = detail::NullType, typename T22 = detail::NullType, typename T23 = detail::NullType, typename T24 = detail::NullType, typename T25 = detail::NullType, typename T26 = detail::NullType, typename T27 = detail::NullType, typename T28 = detail::NullType, typename T29 = detail::NullType, typename T30 = detail::NullType, typename T31 = detail::NullType > struct make_kernel : public detail::functionImplementation_< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 > { public: typedef detail::KernelFunctorGlobal< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 > FunctorType; make_kernel( const Program& program, const STRING_CLASS name, cl_int * err = NULL) : detail::functionImplementation_< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 >( FunctorType(program, name, err)) {} make_kernel( const Kernel kernel) : 
detail::functionImplementation_< T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22, T23, T24, T25, T26, T27, T28, T29, T30, T31 >( FunctorType(kernel)) {} }; //---------------------------------------------------------------------------------------------------------------------- #undef __ERR_STR #if !defined(__CL_USER_OVERRIDE_ERROR_STRINGS) #undef __GET_DEVICE_INFO_ERR #undef __GET_PLATFORM_INFO_ERR #undef __GET_DEVICE_IDS_ERR #undef __GET_CONTEXT_INFO_ERR #undef __GET_EVENT_INFO_ERR #undef __GET_EVENT_PROFILE_INFO_ERR #undef __GET_MEM_OBJECT_INFO_ERR #undef __GET_IMAGE_INFO_ERR #undef __GET_SAMPLER_INFO_ERR #undef __GET_KERNEL_INFO_ERR #undef __GET_KERNEL_ARG_INFO_ERR #undef __GET_KERNEL_WORK_GROUP_INFO_ERR #undef __GET_PROGRAM_INFO_ERR #undef __GET_PROGRAM_BUILD_INFO_ERR #undef __GET_COMMAND_QUEUE_INFO_ERR #undef __CREATE_CONTEXT_ERR #undef __CREATE_CONTEXT_FROM_TYPE_ERR #undef __GET_SUPPORTED_IMAGE_FORMATS_ERR #undef __CREATE_BUFFER_ERR #undef __CREATE_SUBBUFFER_ERR #undef __CREATE_IMAGE2D_ERR #undef __CREATE_IMAGE3D_ERR #undef __CREATE_SAMPLER_ERR #undef __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR #undef __CREATE_USER_EVENT_ERR #undef __SET_USER_EVENT_STATUS_ERR #undef __SET_EVENT_CALLBACK_ERR #undef __SET_PRINTF_CALLBACK_ERR #undef __WAIT_FOR_EVENTS_ERR #undef __CREATE_KERNEL_ERR #undef __SET_KERNEL_ARGS_ERR #undef __CREATE_PROGRAM_WITH_SOURCE_ERR #undef __CREATE_PROGRAM_WITH_BINARY_ERR #undef __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR #undef __BUILD_PROGRAM_ERR #undef __CREATE_KERNELS_IN_PROGRAM_ERR #undef __CREATE_COMMAND_QUEUE_ERR #undef __SET_COMMAND_QUEUE_PROPERTY_ERR #undef __ENQUEUE_READ_BUFFER_ERR #undef __ENQUEUE_WRITE_BUFFER_ERR #undef __ENQUEUE_READ_BUFFER_RECT_ERR #undef __ENQUEUE_WRITE_BUFFER_RECT_ERR #undef __ENQEUE_COPY_BUFFER_ERR #undef __ENQEUE_COPY_BUFFER_RECT_ERR #undef __ENQUEUE_READ_IMAGE_ERR #undef __ENQUEUE_WRITE_IMAGE_ERR #undef __ENQUEUE_COPY_IMAGE_ERR #undef __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR #undef __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR #undef __ENQUEUE_MAP_BUFFER_ERR #undef __ENQUEUE_MAP_IMAGE_ERR #undef __ENQUEUE_UNMAP_MEM_OBJECT_ERR #undef __ENQUEUE_NDRANGE_KERNEL_ERR #undef __ENQUEUE_TASK_ERR #undef __ENQUEUE_NATIVE_KERNEL #undef __CL_EXPLICIT_CONSTRUCTORS #undef __UNLOAD_COMPILER_ERR #endif //__CL_USER_OVERRIDE_ERROR_STRINGS #undef __CL_FUNCTION_TYPE // Extensions /** * Deprecated APIs for 1.2 */ #if defined(CL_VERSION_1_1) #undef __INIT_CL_EXT_FCN_PTR #endif // #if defined(CL_VERSION_1_1) #undef __CREATE_SUB_DEVICES #if defined(USE_CL_DEVICE_FISSION) #undef __PARAM_NAME_DEVICE_FISSION #endif // USE_CL_DEVICE_FISSION #undef __DEFAULT_NOT_INITIALIZED #undef __DEFAULT_BEING_INITIALIZED #undef __DEFAULT_INITIALIZED #undef CL_HPP_RVALUE_REFERENCES_SUPPORTED #undef CL_HPP_NOEXCEPT } // namespace cl #endif // CL_HPP_ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl2.hpp000066400000000000000000011104031450307266000230140ustar00rootroot00000000000000/* Modifications Copyright(C)[2021-2022] Advanced Micro Devices, Inc. * All rights reserved. * */ /******************************************************************************* * Copyright (c) 2008-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 ******************************************************************************/

/*! \file
 *
 * \brief C++ bindings for OpenCL 1.0 (rev 48), OpenCL 1.1 (rev 33),
 *     OpenCL 1.2 (rev 15) and OpenCL 2.0 (rev 29)
 * \author Lee Howes and Bruce Merry
 *
 * Derived from the OpenCL 1.x C++ bindings written by
 * Benedict R. Gaster, Laurent Morichetti and Lee Howes
 * With additions and fixes from:
 *     Brian Cole, March 3rd 2010 and April 2012
 *     Matt Gruenke, April 2012.
 *     Bruce Merry, February 2013.
 *     Tom Deakin and Simon McIntosh-Smith, July 2013
 *     James Price, 2015-
 *
 * \version 2.0.10
 * \date 2016-07-20
 *
 * Optional extension support
 *
 *     cl_ext_device_fission
 *     #define CL_HPP_USE_CL_DEVICE_FISSION
 *     cl_khr_d3d10_sharing
 *     #define CL_HPP_USE_DX_INTEROP
 *     cl_khr_sub_groups
 *     #define CL_HPP_USE_CL_SUB_GROUPS_KHR
 *     cl_khr_image2d_from_buffer
 *     #define CL_HPP_USE_CL_IMAGE2D_FROM_BUFFER_KHR
 *
 * Doxygen documentation for this header is available here:
 *
 *     http://khronosgroup.github.io/OpenCL-CLHPP/
 *
 * The latest version of this header can be found on the GitHub releases page:
 *
 *     https://github.com/KhronosGroup/OpenCL-CLHPP/releases
 *
 * Bugs and patches can be submitted to the GitHub repository:
 *
 *     https://github.com/KhronosGroup/OpenCL-CLHPP
 */

/*! \mainpage
 * \section intro Introduction
 * For many large applications C++ is the language of choice and so it seems
 * reasonable to define C++ bindings for OpenCL.
 *
 * The interface is contained within a single C++ header file \em cl2.hpp and all
 * definitions are contained within the namespace \em cl. There is no additional
 * requirement to include \em cl.h and to use either the C++ or original C
 * bindings; it is enough to simply include \em cl2.hpp.
 *
 * The bindings themselves are lightweight and correspond closely to the
 * underlying C API. Using the C++ bindings introduces no additional execution
 * overhead.
 *
 * There are numerous compatibility, portability and memory management
 * fixes in the new header as well as additional OpenCL 2.0 features.
 * As a result the header is not directly backward compatible and for this
 * reason we release it as cl2.hpp rather than a new version of cl.hpp.
 *
 *
 * \section compatibility Compatibility
 * Due to the evolution of the underlying OpenCL API the 2.0 C++ bindings
 * include an updated approach to defining supported feature versions
 * and the range of valid underlying OpenCL runtime versions supported.
 *
 * The combination of preprocessor macros CL_HPP_TARGET_OPENCL_VERSION and
 * CL_HPP_MINIMUM_OPENCL_VERSION control this range. These are three digit
 * decimal values representing OpenCL runtime versions. The default for
 * the target is 200, representing OpenCL 2.0, and the minimum is also
 * defined as 200. These settings would use 2.0 API calls only.
 * If backward compatibility with a 1.2 runtime is required, the minimum
 * version may be set to 120.
 *
 * Note that this is a compile-time setting, and so affects linking against
 * a particular SDK version rather than the versioning of the loaded runtime.
 *
 * The earlier versions of the header included basic vector and string
 * classes based loosely on STL versions.
 * These were difficult to
 * maintain and very rarely used. For the 2.0 header we now assume
 * the presence of the standard library unless requested otherwise.
 * We use std::array, std::vector, std::shared_ptr and std::string
 * throughout to safely manage memory and reduce the chance of a
 * recurrence of earlier memory management bugs.
 *
 * These classes are used through typedefs in the cl namespace:
 * cl::array, cl::vector, cl::pointer and cl::string.
 * In addition cl::allocate_pointer forwards to std::allocate_shared
 * by default.
 * In all cases these standard library classes can be replaced with
 * custom interface-compatible versions using the CL_HPP_NO_STD_ARRAY,
 * CL_HPP_NO_STD_VECTOR, CL_HPP_NO_STD_UNIQUE_PTR and
 * CL_HPP_NO_STD_STRING macros.
 *
 * The OpenCL 1.x versions of the C++ bindings included a size_t wrapper
 * class to interface with kernel enqueue. This caused unpleasant interactions
 * with the standard size_t declaration and led to namespacing bugs.
 * In the 2.0 version we have replaced this with a std::array-based interface.
 * However, the old behaviour can be regained for backward compatibility
 * using the CL_HPP_ENABLE_SIZE_T_COMPATIBILITY macro.
 *
 * Finally, the program construction interface used a clumsy vector-of-pairs
 * design in the earlier versions. We have replaced that with a cleaner
 * vector-of-vectors and vector-of-strings design. However, for backward
 * compatibility old behaviour can be regained with the
 * CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY macro.
 *
 * In OpenCL 2.0 OpenCL C is not entirely backward compatible with
 * earlier versions. As a result a flag must be passed to the OpenCL C
 * compiler to request OpenCL 2.0 compilation of kernels, with 1.2 as
 * the default in the absence of the flag.
 * In some cases the C++ bindings automatically compile code for ease.
 * For those cases the compilation defaults to OpenCL C 2.0.
 * If this is not wanted, the CL_HPP_CL_1_2_DEFAULT_BUILD macro may
 * be specified to assume 1.2 compilation.
 * If more fine-grained decisions on a per-kernel basis are required
 * then explicit build operations that take the flag should be used.
 *
 *
 * \section parameterization Parameters
 * This header may be parameterized by a set of preprocessor macros
 * (an illustrative combined usage follows the list below).
 *
 * - CL_HPP_TARGET_OPENCL_VERSION
 *
 *   Defines the target OpenCL runtime version to build the header
 *   against. Defaults to 200, representing OpenCL 2.0.
 *
 * - CL_HPP_NO_STD_STRING
 *
 *   Do not use the standard library string class. cl::string is not
 *   defined and may be defined by the user before cl2.hpp is
 *   included.
 *
 * - CL_HPP_NO_STD_VECTOR
 *
 *   Do not use the standard library vector class. cl::vector is not
 *   defined and may be defined by the user before cl2.hpp is
 *   included.
 *
 * - CL_HPP_NO_STD_ARRAY
 *
 *   Do not use the standard library array class. cl::array is not
 *   defined and may be defined by the user before cl2.hpp is
 *   included.
 *
 * - CL_HPP_NO_STD_UNIQUE_PTR
 *
 *   Do not use the standard library unique_ptr class. cl::pointer and
 *   the cl::allocate_pointer functions are not defined and may be
 *   defined by the user before cl2.hpp is included.
 *
 * - CL_HPP_ENABLE_DEVICE_FISSION
 *
 *   Enables device fission for OpenCL 1.2 platforms.
 *
 * - CL_HPP_ENABLE_EXCEPTIONS
 *
 *   Enable exceptions for use in the C++ bindings header. This is the
 *   preferred error handling mechanism but is not required.
 *
 * - CL_HPP_ENABLE_SIZE_T_COMPATIBILITY
 *
 *   Backward compatibility option to support cl.hpp-style size_t
 *   class. Replaces the updated std::array derived version and
 *   removal of size_t from the namespace. Note that in this case the
 *   new size_t class is placed in the cl::compatibility namespace and
 *   thus requires an additional using declaration for direct backward
 *   compatibility.
 *
 * - CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY
 *
 *   Enable older vector of pairs interface for construction of
 *   programs.
 *
 * - CL_HPP_CL_1_2_DEFAULT_BUILD
 *
 *   Default to OpenCL C 1.2 compilation rather than OpenCL C 2.0;
 *   applies to use of cl::Program construction and other program
 *   build variants.
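 *
 * For illustration (an editorial sketch, not part of the original header
 * documentation): several of the macros above are typically combined, and
 * all of them must be defined before the header is included. For example,
 * to pin the bindings to an OpenCL 1.2 SDK with exceptions enabled:
 *
 * \code
 * #define CL_HPP_ENABLE_EXCEPTIONS
 * #define CL_HPP_MINIMUM_OPENCL_VERSION 120
 * #define CL_HPP_TARGET_OPENCL_VERSION 120
 * #include <CL/cl2.hpp>
 * \endcode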
 *
 *
 * \section example Example
 *
 * The following example shows a general use case for the C++
 * bindings, including support for the optional exception feature and
 * also the supplied vector and string classes; see the following sections
 * for descriptions of these features.
 *
 * \code
    #define CL_HPP_ENABLE_EXCEPTIONS
    #define CL_HPP_TARGET_OPENCL_VERSION 200

    #include <CL/cl2.hpp>
    #include <iostream>
    #include <vector>
    #include <memory>
    #include <algorithm>

    const int numElements = 32;

    int main(void)
    {
        // Filter for a 2.0 platform and set it as the default
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);
        cl::Platform plat;
        for (auto &p : platforms) {
            std::string platver = p.getInfo<CL_PLATFORM_VERSION>();
            if (platver.find("OpenCL 2.") != std::string::npos) {
                plat = p;
            }
        }
        if (plat() == 0) {
            std::cout << "No OpenCL 2.0 platform found.";
            return -1;
        }

        cl::Platform newP = cl::Platform::setDefault(plat);
        if (newP != plat) {
            std::cout << "Error setting default platform.";
            return -1;
        }

        // Use C++11 raw string literals for kernel source code
        std::string kernel1{R"CLC(
            global int globalA;
            kernel void updateGlobal()
            {
                globalA = 75;
            }
        )CLC"};
        std::string kernel2{R"CLC(
            typedef struct { global int *bar; } Foo;
            kernel void vectorAdd(global const Foo* aNum, global const int *inputA,
                                  global const int *inputB, global int *output,
                                  int val, write_only pipe int outPipe, queue_t childQueue)
            {
                output[get_global_id(0)] = inputA[get_global_id(0)] + inputB[get_global_id(0)] + val + *(aNum->bar);
                write_pipe(outPipe, &val);
                queue_t default_queue = get_default_queue();
                ndrange_t ndrange = ndrange_1D(get_global_size(0)/2, get_global_size(0)/2);

                // Have a child kernel write into third quarter of output
                enqueue_kernel(default_queue, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
                    ^{
                        output[get_global_size(0)*2 + get_global_id(0)] =
                            inputA[get_global_size(0)*2 + get_global_id(0)] +
                            inputB[get_global_size(0)*2 + get_global_id(0)] + globalA;
                    });

                // Have a child kernel write into last quarter of output
                enqueue_kernel(childQueue, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
                    ^{
                        output[get_global_size(0)*3 + get_global_id(0)] =
                            inputA[get_global_size(0)*3 + get_global_id(0)] +
                            inputB[get_global_size(0)*3 + get_global_id(0)] + globalA + 2;
                    });
            }
        )CLC"};

        // New simpler string interface style
        std::vector<std::string> programStrings {kernel1, kernel2};

        cl::Program vectorAddProgram(programStrings);
        try {
            vectorAddProgram.build("-cl-std=CL2.0");
        }
        catch (...)
        {
            // Print build info for all devices
            cl_int buildErr = CL_SUCCESS;
            auto buildInfo = vectorAddProgram.getBuildInfo<CL_PROGRAM_BUILD_LOG>(&buildErr);
            for (auto &pair : buildInfo) {
                std::cerr << pair.second << std::endl << std::endl;
            }
            return 1;
        }

        typedef struct { int *bar; } Foo;

        // Get and run kernel that initializes the program-scope global
        // A test for kernels that take no arguments
        auto program2Kernel =
            cl::KernelFunctor<>(vectorAddProgram, "updateGlobal");
        program2Kernel(
            cl::EnqueueArgs(
                cl::NDRange(1)));

        //////////////////
        // SVM allocations

        auto anSVMInt = cl::allocate_svm<int, cl::SVMTraitCoarse<>>();
        *anSVMInt = 5;
        cl::SVMAllocator<Foo, cl::SVMTraitCoarse<cl::SVMTraitReadOnly<>>> svmAllocReadOnly;
        auto fooPointer = cl::allocate_pointer(svmAllocReadOnly);
        fooPointer->bar = anSVMInt.get();
        cl::SVMAllocator<int, cl::SVMTraitCoarse<>> svmAlloc;
        std::vector<int, cl::SVMAllocator<int, cl::SVMTraitCoarse<>>> inputA(numElements, 1, svmAlloc);
        cl::coarse_svm_vector<int> inputB(numElements, 2, svmAlloc);

        //
        //////////////

        // Traditional cl_mem allocations
        std::vector<int> output(numElements, 0xdeadbeef);
        cl::Buffer outputBuffer(begin(output), end(output), false);
        cl::Pipe aPipe(sizeof(cl_int), numElements / 2);

        // Default command queue, also passed in as a parameter
        cl::DeviceCommandQueue defaultDeviceQueue = cl::DeviceCommandQueue::makeDefault(
            cl::Context::getDefault(), cl::Device::getDefault());

        auto vectorAddKernel =
            cl::KernelFunctor<
                decltype(fooPointer)&,
                int*,
                cl::coarse_svm_vector<int>&,
                cl::Buffer,
                int,
                cl::Pipe&,
                cl::DeviceCommandQueue
                >(vectorAddProgram, "vectorAdd");

        // Ensure that the additional SVM pointer is available to the kernel
        // This one was not passed as a parameter
        vectorAddKernel.setSVMPointers(anSVMInt);

        // Hand control of coarse allocations to runtime
        cl::enqueueUnmapSVM(anSVMInt);
        cl::enqueueUnmapSVM(fooPointer);
        cl::unmapSVM(inputB);
        cl::unmapSVM(output);

        cl_int error;
        vectorAddKernel(
            cl::EnqueueArgs(
                cl::NDRange(numElements/2),
                cl::NDRange(numElements/2)),
            fooPointer,
            inputA.data(),
            inputB,
            outputBuffer,
            3,
            aPipe,
            defaultDeviceQueue,
            error
            );

        cl::copy(outputBuffer, begin(output), end(output));

        // Grab the output vector using a map
        cl::mapSVM(output);

        cl::Device d = cl::Device::getDefault();

        std::cout << "Output:\n";
        for (int i = 0; i < numElements; ++i) {
            std::cout << "\t" << output[i] << "\n";
        }
        std::cout << "\n\n";

        return 0;
    }
 *
 * \endcode
 *
 */
#ifndef CL_HPP_
#define CL_HPP_

/* Handle deprecated preprocessor definitions. In each case, we only check for
 * the old name if the new name is not defined, so that user code can define
 * both and hence work with either version of the bindings.
 */
#if !defined(CL_HPP_USE_DX_INTEROP) && defined(USE_DX_INTEROP)
# pragma message("cl2.hpp: USE_DX_INTEROP is deprecated. Define CL_HPP_USE_DX_INTEROP instead")
# define CL_HPP_USE_DX_INTEROP
#endif
#if !defined(CL_HPP_USE_CL_DEVICE_FISSION) && defined(USE_CL_DEVICE_FISSION)
# pragma message("cl2.hpp: USE_CL_DEVICE_FISSION is deprecated. Define CL_HPP_USE_CL_DEVICE_FISSION instead")
# define CL_HPP_USE_CL_DEVICE_FISSION
#endif
#if !defined(CL_HPP_ENABLE_EXCEPTIONS) && defined(__CL_ENABLE_EXCEPTIONS)
# pragma message("cl2.hpp: __CL_ENABLE_EXCEPTIONS is deprecated. Define CL_HPP_ENABLE_EXCEPTIONS instead")
# define CL_HPP_ENABLE_EXCEPTIONS
#endif
#if !defined(CL_HPP_NO_STD_VECTOR) && defined(__NO_STD_VECTOR)
# pragma message("cl2.hpp: __NO_STD_VECTOR is deprecated. Define CL_HPP_NO_STD_VECTOR instead")
# define CL_HPP_NO_STD_VECTOR
#endif
#if !defined(CL_HPP_NO_STD_STRING) && defined(__NO_STD_STRING)
# pragma message("cl2.hpp: __NO_STD_STRING is deprecated.
Define CL_HPP_NO_STD_STRING instead") # define CL_HPP_NO_STD_STRING #endif #if defined(VECTOR_CLASS) # pragma message("cl2.hpp: VECTOR_CLASS is deprecated. Alias cl::vector instead") #endif #if defined(STRING_CLASS) # pragma message("cl2.hpp: STRING_CLASS is deprecated. Alias cl::string instead.") #endif #if !defined(CL_HPP_USER_OVERRIDE_ERROR_STRINGS) && defined(__CL_USER_OVERRIDE_ERROR_STRINGS) # pragma message("cl2.hpp: __CL_USER_OVERRIDE_ERROR_STRINGS is deprecated. Define CL_HPP_USER_OVERRIDE_ERROR_STRINGS instead") # define CL_HPP_USER_OVERRIDE_ERROR_STRINGS #endif /* Warn about features that are no longer supported */ #if defined(__USE_DEV_VECTOR) # pragma message("cl2.hpp: __USE_DEV_VECTOR is no longer supported. Expect compilation errors") #endif #if defined(__USE_DEV_STRING) # pragma message("cl2.hpp: __USE_DEV_STRING is no longer supported. Expect compilation errors") #endif /* Detect which version to target */ #if !defined(CL_HPP_TARGET_OPENCL_VERSION) # pragma message("cl2.hpp: CL_HPP_TARGET_OPENCL_VERSION is not defined. It will default to 200 (OpenCL 2.0)") # define CL_HPP_TARGET_OPENCL_VERSION 200 #endif #if CL_HPP_TARGET_OPENCL_VERSION != 100 && CL_HPP_TARGET_OPENCL_VERSION != 110 && CL_HPP_TARGET_OPENCL_VERSION != 120 && CL_HPP_TARGET_OPENCL_VERSION != 200 # pragma message("cl2.hpp: CL_HPP_TARGET_OPENCL_VERSION is not a valid value (100, 110, 120 or 200). It will be set to 200") # undef CL_HPP_TARGET_OPENCL_VERSION # define CL_HPP_TARGET_OPENCL_VERSION 200 #endif #if !defined(CL_HPP_MINIMUM_OPENCL_VERSION) # define CL_HPP_MINIMUM_OPENCL_VERSION 200 #endif #if CL_HPP_MINIMUM_OPENCL_VERSION != 100 && CL_HPP_MINIMUM_OPENCL_VERSION != 110 && CL_HPP_MINIMUM_OPENCL_VERSION != 120 && CL_HPP_MINIMUM_OPENCL_VERSION != 200 # pragma message("cl2.hpp: CL_HPP_MINIMUM_OPENCL_VERSION is not a valid value (100, 110, 120 or 200). 
It will be set to 100") # undef CL_HPP_MINIMUM_OPENCL_VERSION # define CL_HPP_MINIMUM_OPENCL_VERSION 100 #endif #if CL_HPP_MINIMUM_OPENCL_VERSION > CL_HPP_TARGET_OPENCL_VERSION # error "CL_HPP_MINIMUM_OPENCL_VERSION must not be greater than CL_HPP_TARGET_OPENCL_VERSION" #endif #if CL_HPP_MINIMUM_OPENCL_VERSION <= 100 && !defined(CL_USE_DEPRECATED_OPENCL_1_0_APIS) # define CL_USE_DEPRECATED_OPENCL_1_0_APIS #endif #if CL_HPP_MINIMUM_OPENCL_VERSION <= 110 && !defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) # define CL_USE_DEPRECATED_OPENCL_1_1_APIS #endif #if CL_HPP_MINIMUM_OPENCL_VERSION <= 120 && !defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS) # define CL_USE_DEPRECATED_OPENCL_1_2_APIS #endif #if CL_HPP_MINIMUM_OPENCL_VERSION <= 200 && !defined(CL_USE_DEPRECATED_OPENCL_2_0_APIS) # define CL_USE_DEPRECATED_OPENCL_2_0_APIS #endif #ifdef _WIN32 #include #if defined(CL_HPP_USE_DX_INTEROP) #include #include #endif #endif // _WIN32 #if defined(_MSC_VER) #include #endif // _MSC_VER // Check for a valid C++ version // Need to do both tests here because for some reason __cplusplus is not // updated in visual studio #if (!defined(_MSC_VER) && __cplusplus < 201103L) || (defined(_MSC_VER) && _MSC_VER < 1700) #error Visual studio 2013 or another C++11-supporting compiler required #endif // #if defined(CL_HPP_USE_CL_DEVICE_FISSION) || defined(CL_HPP_USE_CL_SUB_GROUPS_KHR) #include #endif #if defined(__APPLE__) || defined(__MACOSX) #include #else #include #endif // !__APPLE__ #if (__cplusplus >= 201103L) #define CL_HPP_NOEXCEPT_ noexcept #else #define CL_HPP_NOEXCEPT_ #endif #if defined(_MSC_VER) # define CL_HPP_DEFINE_STATIC_MEMBER_ __declspec(selectany) #else # define CL_HPP_DEFINE_STATIC_MEMBER_ __attribute__((weak)) #endif // !_MSC_VER // Define deprecated prefixes and suffixes to ensure compilation // in case they are not pre-defined #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED) #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED) #if !defined(CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED) #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #endif // #if !defined(CL_EXT_PREFIX__VERSION_1_1_DEPRECATED) #if !defined(CL_EXT_PREFIX__VERSION_1_2_DEPRECATED) #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #endif // #if !defined(CL_EXT_PREFIX__VERSION_1_2_DEPRECATED) #if !defined(CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED) #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #endif // #if !defined(CL_EXT_PREFIX__VERSION_1_2_DEPRECATED) #if !defined(CL_CALLBACK) #define CL_CALLBACK #endif //CL_CALLBACK #include #include #include #include #include #include // Define a size_type to represent a correctly resolved size_t #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY) namespace cl { using size_type = ::size_t; } // namespace cl #else // #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY) namespace cl { using size_type = size_t; } // namespace cl #endif // #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY) #if defined(CL_HPP_ENABLE_EXCEPTIONS) #include #endif // #if defined(CL_HPP_ENABLE_EXCEPTIONS) #if !defined(CL_HPP_NO_STD_VECTOR) #include namespace cl { template < class T, class Alloc = std::allocator > using vector = std::vector; } // namespace cl #endif // #if !defined(CL_HPP_NO_STD_VECTOR) #if !defined(CL_HPP_NO_STD_STRING) #include namespace cl { using string = std::string; } // namespace cl #endif // #if !defined(CL_HPP_NO_STD_STRING) #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if !defined(CL_HPP_NO_STD_UNIQUE_PTR) #include namespace cl { // Replace unique_ptr and allocate_pointer for 
internal use // to allow user to replace them template using pointer = std::unique_ptr; } // namespace cl #endif #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if !defined(CL_HPP_NO_STD_ARRAY) #include namespace cl { template < class T, size_type N > using array = std::array; } // namespace cl #endif // #if !defined(CL_HPP_NO_STD_ARRAY) // Define size_type appropriately to allow backward-compatibility // use of the old size_t interface class #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY) namespace cl { namespace compatibility { /*! \brief class used to interface between C++ and * OpenCL C calls that require arrays of size_t values, whose * size is known statically. */ template class size_t { private: size_type data_[N]; public: //! \brief Initialize size_t to all 0s size_t() { for (int i = 0; i < N; ++i) { data_[i] = 0; } } size_t(const array &rhs) { for (int i = 0; i < N; ++i) { data_[i] = rhs[i]; } } size_type& operator[](int index) { return data_[index]; } const size_type& operator[](int index) const { return data_[index]; } //! \brief Conversion operator to T*. operator size_type* () { return data_; } //! \brief Conversion operator to const T*. operator const size_type* () const { return data_; } operator array() const { array ret; for (int i = 0; i < N; ++i) { ret[i] = data_[i]; } return ret; } }; } // namespace compatibility template using size_t = compatibility::size_t; } // namespace cl #endif // #if defined(CL_HPP_ENABLE_SIZE_T_COMPATIBILITY) // Helper alias to avoid confusing the macros namespace cl { namespace detail { using size_t_array = array; } // namespace detail } // namespace cl /*! \namespace cl * * \brief The OpenCL C++ bindings are defined within this namespace. * */ namespace cl { class Memory; #define CL_HPP_INIT_CL_EXT_FCN_PTR_(name) \ if (!pfn_##name) { \ pfn_##name = (PFN_##name) \ clGetExtensionFunctionAddress(#name); \ if (!pfn_##name) { \ } \ } #define CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, name) \ if (!pfn_##name) { \ pfn_##name = (PFN_##name) \ clGetExtensionFunctionAddressForPlatform(platform, #name); \ if (!pfn_##name) { \ } \ } class Program; class Device; class Context; class CommandQueue; class DeviceCommandQueue; class Memory; class Buffer; class Pipe; #if defined(CL_HPP_ENABLE_EXCEPTIONS) /*! \brief Exception class * * This may be thrown by API functions when CL_HPP_ENABLE_EXCEPTIONS is defined. */ class Error : public std::exception { private: cl_int err_; const char * errStr_; public: /*! \brief Create a new CL error exception for a given error code * and corresponding message. * * \param err error code value. * * \param errStr a descriptive string that must remain in scope until * handling of the exception has concluded. If set, it * will be returned by what(). */ Error(cl_int err, const char * errStr = NULL) : err_(err), errStr_(errStr) {} ~Error() throw() {} /*! \brief Get error string associated with exception * * \return A memory pointer to the error message string. */ virtual const char * what() const throw () { if (errStr_ == NULL) { return "empty"; } else { return errStr_; } } /*! \brief Get error code associated with exception * * \return The error code. 
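*
* For illustration (an editorial sketch, not part of the original
* documentation): with CL_HPP_ENABLE_EXCEPTIONS defined, a failing API
* call throws cl::Error, and err() recovers the underlying OpenCL
* status code:
* \code
* try {
*     cl::Context context(CL_DEVICE_TYPE_GPU);
* } catch (const cl::Error &e) {
*     std::cerr << e.what() << " failed with error code " << e.err() << "\n";
* }
* \endcode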
*/ cl_int err(void) const { return err_; } }; #define CL_HPP_ERR_STR_(x) #x #else #define CL_HPP_ERR_STR_(x) NULL #endif // CL_HPP_ENABLE_EXCEPTIONS namespace detail { #if defined(CL_HPP_ENABLE_EXCEPTIONS) static inline cl_int errHandler ( cl_int err, const char * errStr = NULL) { if (err != CL_SUCCESS) { throw Error(err, errStr); } return err; } #else static inline cl_int errHandler (cl_int err, const char * errStr = NULL) { (void) errStr; // suppress unused variable warning return err; } #endif // CL_HPP_ENABLE_EXCEPTIONS } //! \cond DOXYGEN_DETAIL #if !defined(CL_HPP_USER_OVERRIDE_ERROR_STRINGS) #define __GET_DEVICE_INFO_ERR CL_HPP_ERR_STR_(clGetDeviceInfo) #define __GET_PLATFORM_INFO_ERR CL_HPP_ERR_STR_(clGetPlatformInfo) #define __GET_DEVICE_IDS_ERR CL_HPP_ERR_STR_(clGetDeviceIDs) #define __GET_PLATFORM_IDS_ERR CL_HPP_ERR_STR_(clGetPlatformIDs) #define __GET_CONTEXT_INFO_ERR CL_HPP_ERR_STR_(clGetContextInfo) #define __GET_EVENT_INFO_ERR CL_HPP_ERR_STR_(clGetEventInfo) #define __GET_EVENT_PROFILE_INFO_ERR CL_HPP_ERR_STR_(clGetEventProfileInfo) #define __GET_MEM_OBJECT_INFO_ERR CL_HPP_ERR_STR_(clGetMemObjectInfo) #define __GET_IMAGE_INFO_ERR CL_HPP_ERR_STR_(clGetImageInfo) #define __GET_SAMPLER_INFO_ERR CL_HPP_ERR_STR_(clGetSamplerInfo) #define __GET_KERNEL_INFO_ERR CL_HPP_ERR_STR_(clGetKernelInfo) #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __GET_KERNEL_ARG_INFO_ERR CL_HPP_ERR_STR_(clGetKernelArgInfo) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __GET_KERNEL_WORK_GROUP_INFO_ERR CL_HPP_ERR_STR_(clGetKernelWorkGroupInfo) #define __GET_PROGRAM_INFO_ERR CL_HPP_ERR_STR_(clGetProgramInfo) #define __GET_PROGRAM_BUILD_INFO_ERR CL_HPP_ERR_STR_(clGetProgramBuildInfo) #define __GET_COMMAND_QUEUE_INFO_ERR CL_HPP_ERR_STR_(clGetCommandQueueInfo) #define __CREATE_CONTEXT_ERR CL_HPP_ERR_STR_(clCreateContext) #define __CREATE_CONTEXT_FROM_TYPE_ERR CL_HPP_ERR_STR_(clCreateContextFromType) #define __GET_SUPPORTED_IMAGE_FORMATS_ERR CL_HPP_ERR_STR_(clGetSupportedImageFormats) #define __CREATE_BUFFER_ERR CL_HPP_ERR_STR_(clCreateBuffer) #define __COPY_ERR CL_HPP_ERR_STR_(cl::copy) #define __CREATE_SUBBUFFER_ERR CL_HPP_ERR_STR_(clCreateSubBuffer) #define __CREATE_GL_BUFFER_ERR CL_HPP_ERR_STR_(clCreateFromGLBuffer) #define __CREATE_GL_RENDER_BUFFER_ERR CL_HPP_ERR_STR_(clCreateFromGLBuffer) #define __GET_GL_OBJECT_INFO_ERR CL_HPP_ERR_STR_(clGetGLObjectInfo) #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __CREATE_IMAGE_ERR CL_HPP_ERR_STR_(clCreateImage) #define __CREATE_GL_TEXTURE_ERR CL_HPP_ERR_STR_(clCreateFromGLTexture) #define __IMAGE_DIMENSION_ERR CL_HPP_ERR_STR_(Incorrect image dimensions) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR CL_HPP_ERR_STR_(clSetMemObjectDestructorCallback) #define __CREATE_USER_EVENT_ERR CL_HPP_ERR_STR_(clCreateUserEvent) #define __SET_USER_EVENT_STATUS_ERR CL_HPP_ERR_STR_(clSetUserEventStatus) #define __SET_EVENT_CALLBACK_ERR CL_HPP_ERR_STR_(clSetEventCallback) #define __WAIT_FOR_EVENTS_ERR CL_HPP_ERR_STR_(clWaitForEvents) #define __CREATE_KERNEL_ERR CL_HPP_ERR_STR_(clCreateKernel) #define __SET_KERNEL_ARGS_ERR CL_HPP_ERR_STR_(clSetKernelArg) #define __CREATE_PROGRAM_WITH_SOURCE_ERR CL_HPP_ERR_STR_(clCreateProgramWithSource) #define __CREATE_PROGRAM_WITH_BINARY_ERR CL_HPP_ERR_STR_(clCreateProgramWithBinary) #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR CL_HPP_ERR_STR_(clCreateProgramWithBuiltInKernels) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #define 
__BUILD_PROGRAM_ERR CL_HPP_ERR_STR_(clBuildProgram) #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __COMPILE_PROGRAM_ERR CL_HPP_ERR_STR_(clCompileProgram) #define __LINK_PROGRAM_ERR CL_HPP_ERR_STR_(clLinkProgram) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __CREATE_KERNELS_IN_PROGRAM_ERR CL_HPP_ERR_STR_(clCreateKernelsInProgram) #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #define __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR CL_HPP_ERR_STR_(clCreateCommandQueueWithProperties) #define __CREATE_SAMPLER_WITH_PROPERTIES_ERR CL_HPP_ERR_STR_(clCreateSamplerWithProperties) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 200 #define __SET_COMMAND_QUEUE_PROPERTY_ERR CL_HPP_ERR_STR_(clSetCommandQueueProperty) #define __ENQUEUE_READ_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueReadBuffer) #define __ENQUEUE_READ_BUFFER_RECT_ERR CL_HPP_ERR_STR_(clEnqueueReadBufferRect) #define __ENQUEUE_WRITE_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueWriteBuffer) #define __ENQUEUE_WRITE_BUFFER_RECT_ERR CL_HPP_ERR_STR_(clEnqueueWriteBufferRect) #define __ENQEUE_COPY_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueCopyBuffer) #define __ENQEUE_COPY_BUFFER_RECT_ERR CL_HPP_ERR_STR_(clEnqueueCopyBufferRect) #define __ENQUEUE_FILL_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueFillBuffer) #define __ENQUEUE_READ_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueReadImage) #define __ENQUEUE_WRITE_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueWriteImage) #define __ENQUEUE_COPY_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueCopyImage) #define __ENQUEUE_FILL_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueFillImage) #define __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueCopyImageToBuffer) #define __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueCopyBufferToImage) #define __ENQUEUE_MAP_BUFFER_ERR CL_HPP_ERR_STR_(clEnqueueMapBuffer) #define __ENQUEUE_MAP_IMAGE_ERR CL_HPP_ERR_STR_(clEnqueueMapImage) #define __ENQUEUE_UNMAP_MEM_OBJECT_ERR CL_HPP_ERR_STR_(clEnqueueUnMapMemObject) #define __ENQUEUE_NDRANGE_KERNEL_ERR CL_HPP_ERR_STR_(clEnqueueNDRangeKernel) #define __ENQUEUE_NATIVE_KERNEL CL_HPP_ERR_STR_(clEnqueueNativeKernel) #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __ENQUEUE_MIGRATE_MEM_OBJECTS_ERR CL_HPP_ERR_STR_(clEnqueueMigrateMemObjects) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __ENQUEUE_ACQUIRE_GL_ERR CL_HPP_ERR_STR_(clEnqueueAcquireGLObjects) #define __ENQUEUE_RELEASE_GL_ERR CL_HPP_ERR_STR_(clEnqueueReleaseGLObjects) #define __CREATE_PIPE_ERR CL_HPP_ERR_STR_(clCreatePipe) #define __GET_PIPE_INFO_ERR CL_HPP_ERR_STR_(clGetPipeInfo) #define __RETAIN_ERR CL_HPP_ERR_STR_(Retain Object) #define __RELEASE_ERR CL_HPP_ERR_STR_(Release Object) #define __FLUSH_ERR CL_HPP_ERR_STR_(clFlush) #define __FINISH_ERR CL_HPP_ERR_STR_(clFinish) #define __VECTOR_CAPACITY_ERR CL_HPP_ERR_STR_(Vector capacity error) /** * CL 1.2 version that uses device fission. 
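* (Editorial note for clarity: the selection below maps the sub-device
* error string to the core clCreateSubDevices entry point when the header
* targets OpenCL 1.2 or newer, and to the cl_ext_device_fission
* extension's clCreateSubDevicesEXT otherwise.)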
*/ #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __CREATE_SUB_DEVICES_ERR CL_HPP_ERR_STR_(clCreateSubDevices) #else #define __CREATE_SUB_DEVICES_ERR CL_HPP_ERR_STR_(clCreateSubDevicesEXT) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) #define __ENQUEUE_MARKER_ERR CL_HPP_ERR_STR_(clEnqueueMarker) #define __ENQUEUE_WAIT_FOR_EVENTS_ERR CL_HPP_ERR_STR_(clEnqueueWaitForEvents) #define __ENQUEUE_BARRIER_ERR CL_HPP_ERR_STR_(clEnqueueBarrier) #define __UNLOAD_COMPILER_ERR CL_HPP_ERR_STR_(clUnloadCompiler) #define __CREATE_GL_TEXTURE_2D_ERR CL_HPP_ERR_STR_(clCreateFromGLTexture2D) #define __CREATE_GL_TEXTURE_3D_ERR CL_HPP_ERR_STR_(clCreateFromGLTexture3D) #define __CREATE_IMAGE2D_ERR CL_HPP_ERR_STR_(clCreateImage2D) #define __CREATE_IMAGE3D_ERR CL_HPP_ERR_STR_(clCreateImage3D) #endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /** * Deprecated APIs for 2.0 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS) #define __CREATE_COMMAND_QUEUE_ERR CL_HPP_ERR_STR_(clCreateCommandQueue) #define __ENQUEUE_TASK_ERR CL_HPP_ERR_STR_(clEnqueueTask) #define __CREATE_SAMPLER_ERR CL_HPP_ERR_STR_(clCreateSampler) #endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /** * CL 1.2 marker and barrier commands */ #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #define __ENQUEUE_MARKER_WAIT_LIST_ERR CL_HPP_ERR_STR_(clEnqueueMarkerWithWaitList) #define __ENQUEUE_BARRIER_WAIT_LIST_ERR CL_HPP_ERR_STR_(clEnqueueBarrierWithWaitList) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #endif // CL_HPP_USER_OVERRIDE_ERROR_STRINGS //! \endcond namespace detail { // Generic getInfoHelper. The final parameter is used to guide overload // resolution: the actual parameter passed is an int, which makes this // a worse conversion sequence than a specialization that declares the // parameter as an int. template inline cl_int getInfoHelper(Functor f, cl_uint name, T* param, long) { return f(name, sizeof(T), param, NULL); } // Specialized for getInfo // Assumes that the output vector was correctly resized on the way in template inline cl_int getInfoHelper(Func f, cl_uint name, vector>* param, int) { if (name != CL_PROGRAM_BINARIES) { return CL_INVALID_VALUE; } if (param) { // Create array of pointers, calculate total size and pass pointer array in size_type numBinaries = param->size(); vector binariesPointers(numBinaries); for (size_type i = 0; i < numBinaries; ++i) { binariesPointers[i] = (*param)[i].data(); } cl_int err = f(name, numBinaries * sizeof(unsigned char*), binariesPointers.data(), NULL); if (err != CL_SUCCESS) { return err; } } return CL_SUCCESS; } // Specialized getInfoHelper for vector params template inline cl_int getInfoHelper(Func f, cl_uint name, vector* param, long) { size_type required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } const size_type elements = required / sizeof(T); // Temporary to avoid changing param on an error vector localData(elements); err = f(name, required, localData.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { *param = std::move(localData); } return CL_SUCCESS; } /* Specialization for reference-counted types. This depends on the * existence of Wrapper::cl_type, and none of the other types having the * cl_type member. Note that simplify specifying the parameter as Wrapper * does not work, because when using a derived type (e.g. Context) the generic * template will provide a better match. 
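*
* (Editorial note for clarity: call sites invoke these helpers as
* getInfoHelper(f, name, param, 0). The literal 0 is an int, so a
* specialization whose final dummy parameter is declared int is an exact
* match, while the generic template taking long requires an integral
* conversion and therefore loses overload resolution whenever a
* specialization is viable.)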
*/ template inline cl_int getInfoHelper( Func f, cl_uint name, vector* param, int, typename T::cl_type = 0) { size_type required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } const size_type elements = required / sizeof(typename T::cl_type); vector value(elements); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { // Assign to convert CL type to T for each element param->resize(elements); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < elements; i++) { (*param)[i] = T(value[i], true); } } return CL_SUCCESS; } // Specialized GetInfoHelper for string params template inline cl_int getInfoHelper(Func f, cl_uint name, string* param, long) { size_type required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } // std::string has a constant data member // a char vector does not if (required > 0) { vector value(required); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } if (param) { param->assign(begin(value), prev(end(value))); } } else if (param) { param->assign(""); } return CL_SUCCESS; } // Specialized GetInfoHelper for clsize_t params template inline cl_int getInfoHelper(Func f, cl_uint name, array* param, long) { size_type required; cl_int err = f(name, 0, NULL, &required); if (err != CL_SUCCESS) { return err; } size_type elements = required / sizeof(size_type); vector value(elements, 0); err = f(name, required, value.data(), NULL); if (err != CL_SUCCESS) { return err; } // Bound the copy with N to prevent overruns // if passed N > than the amount copied if (elements > N) { elements = N; } for (size_type i = 0; i < elements; ++i) { (*param)[i] = value[i]; } return CL_SUCCESS; } template struct ReferenceHandler; /* Specialization for reference-counted types. This depends on the * existence of Wrapper::cl_type, and none of the other types having the * cl_type member. Note that simplify specifying the parameter as Wrapper * does not work, because when using a derived type (e.g. Context) the generic * template will provide a better match. 
*/ template inline cl_int getInfoHelper(Func f, cl_uint name, T* param, int, typename T::cl_type = 0) { typename T::cl_type value; cl_int err = f(name, sizeof(value), &value, NULL); if (err != CL_SUCCESS) { return err; } *param = value; if (value != NULL) { err = param->retain(); if (err != CL_SUCCESS) { return err; } } return CL_SUCCESS; } #define CL_HPP_PARAM_NAME_INFO_1_0_(F) \ F(cl_platform_info, CL_PLATFORM_PROFILE, string) \ F(cl_platform_info, CL_PLATFORM_VERSION, string) \ F(cl_platform_info, CL_PLATFORM_NAME, string) \ F(cl_platform_info, CL_PLATFORM_VENDOR, string) \ F(cl_platform_info, CL_PLATFORM_EXTENSIONS, string) \ \ F(cl_device_info, CL_DEVICE_TYPE, cl_device_type) \ F(cl_device_info, CL_DEVICE_VENDOR_ID, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_COMPUTE_UNITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WORK_GROUP_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_MAX_WORK_ITEM_SIZES, cl::vector) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_CLOCK_FREQUENCY, cl_uint) \ F(cl_device_info, CL_DEVICE_ADDRESS_BITS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_READ_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_MEM_ALLOC_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_WIDTH, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE2D_MAX_HEIGHT, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_WIDTH, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_HEIGHT, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE3D_MAX_DEPTH, size_type) \ F(cl_device_info, CL_DEVICE_IMAGE_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_MAX_PARAMETER_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_MAX_SAMPLERS, cl_uint) \ F(cl_device_info, CL_DEVICE_MEM_BASE_ADDR_ALIGN, cl_uint) \ F(cl_device_info, CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_SINGLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, cl_device_mem_cache_type) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, cl_uint)\ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_GLOBAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_MAX_CONSTANT_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_TYPE, cl_device_local_mem_type) \ F(cl_device_info, CL_DEVICE_LOCAL_MEM_SIZE, cl_ulong) \ F(cl_device_info, CL_DEVICE_ERROR_CORRECTION_SUPPORT, cl_bool) \ F(cl_device_info, CL_DEVICE_PROFILING_TIMER_RESOLUTION, size_type) \ F(cl_device_info, CL_DEVICE_ENDIAN_LITTLE, cl_bool) \ F(cl_device_info, CL_DEVICE_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_COMPILER_AVAILABLE, cl_bool) \ F(cl_device_info, CL_DEVICE_EXECUTION_CAPABILITIES, cl_device_exec_capabilities) \ F(cl_device_info, CL_DEVICE_PLATFORM, cl_platform_id) \ F(cl_device_info, CL_DEVICE_NAME, string) \ F(cl_device_info, CL_DEVICE_VENDOR, string) \ F(cl_device_info, CL_DRIVER_VERSION, string) \ F(cl_device_info, CL_DEVICE_PROFILE, string) \ F(cl_device_info, CL_DEVICE_VERSION, string) \ 
F(cl_device_info, CL_DEVICE_EXTENSIONS, string) \ \ F(cl_context_info, CL_CONTEXT_REFERENCE_COUNT, cl_uint) \ F(cl_context_info, CL_CONTEXT_DEVICES, cl::vector) \ F(cl_context_info, CL_CONTEXT_PROPERTIES, cl::vector) \ \ F(cl_event_info, CL_EVENT_COMMAND_QUEUE, cl::CommandQueue) \ F(cl_event_info, CL_EVENT_COMMAND_TYPE, cl_command_type) \ F(cl_event_info, CL_EVENT_REFERENCE_COUNT, cl_uint) \ F(cl_event_info, CL_EVENT_COMMAND_EXECUTION_STATUS, cl_int) \ \ F(cl_profiling_info, CL_PROFILING_COMMAND_QUEUED, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_SUBMIT, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_START, cl_ulong) \ F(cl_profiling_info, CL_PROFILING_COMMAND_END, cl_ulong) \ \ F(cl_mem_info, CL_MEM_TYPE, cl_mem_object_type) \ F(cl_mem_info, CL_MEM_FLAGS, cl_mem_flags) \ F(cl_mem_info, CL_MEM_SIZE, size_type) \ F(cl_mem_info, CL_MEM_HOST_PTR, void*) \ F(cl_mem_info, CL_MEM_MAP_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_REFERENCE_COUNT, cl_uint) \ F(cl_mem_info, CL_MEM_CONTEXT, cl::Context) \ \ F(cl_image_info, CL_IMAGE_FORMAT, cl_image_format) \ F(cl_image_info, CL_IMAGE_ELEMENT_SIZE, size_type) \ F(cl_image_info, CL_IMAGE_ROW_PITCH, size_type) \ F(cl_image_info, CL_IMAGE_SLICE_PITCH, size_type) \ F(cl_image_info, CL_IMAGE_WIDTH, size_type) \ F(cl_image_info, CL_IMAGE_HEIGHT, size_type) \ F(cl_image_info, CL_IMAGE_DEPTH, size_type) \ \ F(cl_sampler_info, CL_SAMPLER_REFERENCE_COUNT, cl_uint) \ F(cl_sampler_info, CL_SAMPLER_CONTEXT, cl::Context) \ F(cl_sampler_info, CL_SAMPLER_NORMALIZED_COORDS, cl_bool) \ F(cl_sampler_info, CL_SAMPLER_ADDRESSING_MODE, cl_addressing_mode) \ F(cl_sampler_info, CL_SAMPLER_FILTER_MODE, cl_filter_mode) \ \ F(cl_program_info, CL_PROGRAM_REFERENCE_COUNT, cl_uint) \ F(cl_program_info, CL_PROGRAM_CONTEXT, cl::Context) \ F(cl_program_info, CL_PROGRAM_NUM_DEVICES, cl_uint) \ F(cl_program_info, CL_PROGRAM_DEVICES, cl::vector) \ F(cl_program_info, CL_PROGRAM_SOURCE, string) \ F(cl_program_info, CL_PROGRAM_BINARY_SIZES, cl::vector) \ F(cl_program_info, CL_PROGRAM_BINARIES, cl::vector>) \ \ F(cl_program_build_info, CL_PROGRAM_BUILD_STATUS, cl_build_status) \ F(cl_program_build_info, CL_PROGRAM_BUILD_OPTIONS, string) \ F(cl_program_build_info, CL_PROGRAM_BUILD_LOG, string) \ \ F(cl_kernel_info, CL_KERNEL_FUNCTION_NAME, string) \ F(cl_kernel_info, CL_KERNEL_NUM_ARGS, cl_uint) \ F(cl_kernel_info, CL_KERNEL_REFERENCE_COUNT, cl_uint) \ F(cl_kernel_info, CL_KERNEL_CONTEXT, cl::Context) \ F(cl_kernel_info, CL_KERNEL_PROGRAM, cl::Program) \ \ F(cl_kernel_work_group_info, CL_KERNEL_WORK_GROUP_SIZE, size_type) \ F(cl_kernel_work_group_info, CL_KERNEL_COMPILE_WORK_GROUP_SIZE, cl::detail::size_t_array) \ F(cl_kernel_work_group_info, CL_KERNEL_LOCAL_MEM_SIZE, cl_ulong) \ \ F(cl_command_queue_info, CL_QUEUE_CONTEXT, cl::Context) \ F(cl_command_queue_info, CL_QUEUE_DEVICE, cl::Device) \ F(cl_command_queue_info, CL_QUEUE_REFERENCE_COUNT, cl_uint) \ F(cl_command_queue_info, CL_QUEUE_PROPERTIES, cl_command_queue_properties) #define CL_HPP_PARAM_NAME_INFO_1_1_(F) \ F(cl_context_info, CL_CONTEXT_NUM_DEVICES, cl_uint)\ F(cl_device_info, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, cl_uint) \ F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, cl_uint) \ 
F(cl_device_info, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, cl_uint) \ F(cl_device_info, CL_DEVICE_DOUBLE_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_HALF_FP_CONFIG, cl_device_fp_config) \ F(cl_device_info, CL_DEVICE_OPENCL_C_VERSION, string) \ \ F(cl_mem_info, CL_MEM_ASSOCIATED_MEMOBJECT, cl::Memory) \ F(cl_mem_info, CL_MEM_OFFSET, size_type) \ \ F(cl_kernel_work_group_info, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, size_type) \ F(cl_kernel_work_group_info, CL_KERNEL_PRIVATE_MEM_SIZE, cl_ulong) \ \ F(cl_event_info, CL_EVENT_CONTEXT, cl::Context) #define CL_HPP_PARAM_NAME_INFO_1_2_(F) \ F(cl_program_info, CL_PROGRAM_NUM_KERNELS, size_type) \ F(cl_program_info, CL_PROGRAM_KERNEL_NAMES, string) \ \ F(cl_program_build_info, CL_PROGRAM_BINARY_TYPE, cl_program_binary_type) \ \ F(cl_kernel_info, CL_KERNEL_ATTRIBUTES, string) \ \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ADDRESS_QUALIFIER, cl_kernel_arg_address_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_ACCESS_QUALIFIER, cl_kernel_arg_access_qualifier) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_NAME, string) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_NAME, string) \ F(cl_kernel_arg_info, CL_KERNEL_ARG_TYPE_QUALIFIER, cl_kernel_arg_type_qualifier) \ \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE, cl::Device) \ F(cl_device_info, CL_DEVICE_PARTITION_PROPERTIES, cl::vector) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPE, cl::vector) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, size_type) \ F(cl_device_info, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, cl_device_affinity_domain) \ F(cl_device_info, CL_DEVICE_BUILT_IN_KERNELS, string) \ \ F(cl_image_info, CL_IMAGE_ARRAY_SIZE, size_type) \ F(cl_image_info, CL_IMAGE_NUM_MIP_LEVELS, cl_uint) \ F(cl_image_info, CL_IMAGE_NUM_SAMPLES, cl_uint) #define CL_HPP_PARAM_NAME_INFO_2_0_(F) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_HOST_PROPERTIES, cl_command_queue_properties) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES, cl_command_queue_properties) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_ON_DEVICE_QUEUES, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_ON_DEVICE_EVENTS, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_PIPE_ARGS, cl_uint) \ F(cl_device_info, CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS, cl_uint) \ F(cl_device_info, CL_DEVICE_PIPE_MAX_PACKET_SIZE, cl_uint) \ F(cl_device_info, CL_DEVICE_SVM_CAPABILITIES, cl_device_svm_capabilities) \ F(cl_device_info, CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT, cl_uint) \ F(cl_device_info, CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT, cl_uint) \ F(cl_device_info, CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE, size_type) \ F(cl_device_info, CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS, cl_uint) \ F(cl_command_queue_info, CL_QUEUE_SIZE, cl_uint) \ F(cl_mem_info, CL_MEM_USES_SVM_POINTER, cl_bool) \ F(cl_program_build_info, CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE, size_type) \ F(cl_pipe_info, CL_PIPE_PACKET_SIZE, cl_uint) \ F(cl_pipe_info, CL_PIPE_MAX_PACKETS, cl_uint) #define CL_HPP_PARAM_NAME_DEVICE_FISSION_(F) \ F(cl_device_info, CL_DEVICE_PARENT_DEVICE_EXT, cl_device_id) \ F(cl_device_info, CL_DEVICE_PARTITION_TYPES_EXT, cl::vector) \ F(cl_device_info, CL_DEVICE_AFFINITY_DOMAINS_EXT, cl::vector) \ F(cl_device_info, CL_DEVICE_REFERENCE_COUNT_EXT , 
cl_uint) \ F(cl_device_info, CL_DEVICE_PARTITION_STYLE_EXT, cl::vector) template struct param_traits {}; #define CL_HPP_DECLARE_PARAM_TRAITS_(token, param_name, T) \ struct token; \ template<> \ struct param_traits \ { \ enum { value = param_name }; \ typedef T param_type; \ }; CL_HPP_PARAM_NAME_INFO_1_0_(CL_HPP_DECLARE_PARAM_TRAITS_) #if CL_HPP_TARGET_OPENCL_VERSION >= 110 CL_HPP_PARAM_NAME_INFO_1_1_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 #if CL_HPP_TARGET_OPENCL_VERSION >= 120 CL_HPP_PARAM_NAME_INFO_1_2_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 #if CL_HPP_TARGET_OPENCL_VERSION >= 200 CL_HPP_PARAM_NAME_INFO_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 // Flags deprecated in OpenCL 2.0 #define CL_HPP_PARAM_NAME_INFO_1_0_DEPRECATED_IN_2_0_(F) \ F(cl_device_info, CL_DEVICE_QUEUE_PROPERTIES, cl_command_queue_properties) #define CL_HPP_PARAM_NAME_INFO_1_1_DEPRECATED_IN_2_0_(F) \ F(cl_device_info, CL_DEVICE_HOST_UNIFIED_MEMORY, cl_bool) #define CL_HPP_PARAM_NAME_INFO_1_2_DEPRECATED_IN_2_0_(F) \ F(cl_image_info, CL_IMAGE_BUFFER, cl::Buffer) // Include deprecated query flags based on versions // Only include deprecated 1.0 flags if 2.0 not active as there is an enum clash #if CL_HPP_TARGET_OPENCL_VERSION > 100 && CL_HPP_MINIMUM_OPENCL_VERSION < 200 && CL_HPP_TARGET_OPENCL_VERSION < 200 CL_HPP_PARAM_NAME_INFO_1_0_DEPRECATED_IN_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 110 #if CL_HPP_TARGET_OPENCL_VERSION > 110 && CL_HPP_MINIMUM_OPENCL_VERSION < 200 CL_HPP_PARAM_NAME_INFO_1_1_DEPRECATED_IN_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 120 #if CL_HPP_TARGET_OPENCL_VERSION > 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 200 CL_HPP_PARAM_NAME_INFO_1_2_DEPRECATED_IN_2_0_(CL_HPP_DECLARE_PARAM_TRAITS_) #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 200 #if defined(CL_HPP_USE_CL_DEVICE_FISSION) CL_HPP_PARAM_NAME_DEVICE_FISSION_(CL_HPP_DECLARE_PARAM_TRAITS_); #endif // CL_HPP_USE_CL_DEVICE_FISSION #ifdef CL_PLATFORM_ICD_SUFFIX_KHR CL_HPP_DECLARE_PARAM_TRAITS_(cl_platform_info, CL_PLATFORM_ICD_SUFFIX_KHR, string) #endif #ifdef CL_DEVICE_PROFILING_TIMER_OFFSET_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_PROFILING_TIMER_OFFSET_AMD, cl_ulong) #endif #ifdef CL_DEVICE_GLOBAL_FREE_MEMORY_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_FREE_MEMORY_AMD, vector) #endif #ifdef CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD, cl_uint) #endif #ifdef CL_DEVICE_SIMD_WIDTH_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_SIMD_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_WAVEFRONT_WIDTH_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_WAVEFRONT_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD, cl_uint) #endif #ifdef CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD, cl_uint) #endif #ifdef CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, 
CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD, cl_uint) #endif #ifdef CL_DEVICE_LOCAL_MEM_BANKS_AMD CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_LOCAL_MEM_BANKS_AMD, cl_uint) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV, cl_uint) #endif #ifdef CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV, cl_uint) #endif #ifdef CL_DEVICE_REGISTERS_PER_BLOCK_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_REGISTERS_PER_BLOCK_NV, cl_uint) #endif #ifdef CL_DEVICE_WARP_SIZE_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_WARP_SIZE_NV, cl_uint) #endif #ifdef CL_DEVICE_GPU_OVERLAP_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_GPU_OVERLAP_NV, cl_bool) #endif #ifdef CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV, cl_bool) #endif #ifdef CL_DEVICE_INTEGRATED_MEMORY_NV CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_INTEGRATED_MEMORY_NV, cl_bool) #endif // Convenience functions template inline cl_int getInfo(Func f, cl_uint name, T* param) { return getInfoHelper(f, name, param, 0); } template struct GetInfoFunctor0 { Func f_; const Arg0& arg0_; cl_int operator ()( cl_uint param, size_type size, void* value, size_type* size_ret) { return f_(arg0_, param, size, value, size_ret); } }; template struct GetInfoFunctor1 { Func f_; const Arg0& arg0_; const Arg1& arg1_; cl_int operator ()( cl_uint param, size_type size, void* value, size_type* size_ret) { return f_(arg0_, arg1_, param, size, value, size_ret); } }; template inline cl_int getInfo(Func f, const Arg0& arg0, cl_uint name, T* param) { GetInfoFunctor0 f0 = { f, arg0 }; return getInfoHelper(f0, name, param, 0); } template inline cl_int getInfo(Func f, const Arg0& arg0, const Arg1& arg1, cl_uint name, T* param) { GetInfoFunctor1 f0 = { f, arg0, arg1 }; return getInfoHelper(f0, name, param, 0); } template struct ReferenceHandler { }; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * OpenCL 1.2 devices do have retain/release. */ template <> struct ReferenceHandler { /** * Retain the device. * \param device A valid device created using createSubDevices * \return * CL_SUCCESS if the function executed successfully. * CL_INVALID_DEVICE if device was not a valid subdevice * CL_OUT_OF_RESOURCES * CL_OUT_OF_HOST_MEMORY */ static cl_int retain(cl_device_id device) { return ::clRetainDevice(device); } /** * Retain the device. * \param device A valid device created using createSubDevices * \return * CL_SUCCESS if the function executed successfully. * CL_INVALID_DEVICE if device was not a valid subdevice * CL_OUT_OF_RESOURCES * CL_OUT_OF_HOST_MEMORY */ static cl_int release(cl_device_id device) { return ::clReleaseDevice(device); } }; #else // CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * OpenCL 1.1 devices do not have retain/release. */ template <> struct ReferenceHandler { // cl_device_id does not have retain(). static cl_int retain(cl_device_id) { return CL_SUCCESS; } // cl_device_id does not have release(). static cl_int release(cl_device_id) { return CL_SUCCESS; } }; #endif // ! (CL_HPP_TARGET_OPENCL_VERSION >= 120) template <> struct ReferenceHandler { // cl_platform_id does not have retain(). static cl_int retain(cl_platform_id) { return CL_SUCCESS; } // cl_platform_id does not have release(). 
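/* A minimal usage sketch of the machinery above (hedged; `raw` is a
 * hypothetical handle): the param_traits specializations map an info
 * token to its result type, so the typed wrappers defined later can be
 * called as, e.g.,
 *
 *     cl::Device dev(raw);
 *     cl_uint units = dev.getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>();
 *
 * while ReferenceHandler<T> supplies the retain/release pair that
 * detail::Wrapper<T> (below) invokes from its copy constructor and
 * destructor. For cl_platform_id both calls are deliberate no-ops,
 * as platforms are not reference counted.
 */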
static cl_int release(cl_platform_id) { return CL_SUCCESS; } }; template <> struct ReferenceHandler { static cl_int retain(cl_context context) { return ::clRetainContext(context); } static cl_int release(cl_context context) { return ::clReleaseContext(context); } }; template <> struct ReferenceHandler { static cl_int retain(cl_command_queue queue) { return ::clRetainCommandQueue(queue); } static cl_int release(cl_command_queue queue) { return ::clReleaseCommandQueue(queue); } }; template <> struct ReferenceHandler { static cl_int retain(cl_mem memory) { return ::clRetainMemObject(memory); } static cl_int release(cl_mem memory) { return ::clReleaseMemObject(memory); } }; template <> struct ReferenceHandler { static cl_int retain(cl_sampler sampler) { return ::clRetainSampler(sampler); } static cl_int release(cl_sampler sampler) { return ::clReleaseSampler(sampler); } }; template <> struct ReferenceHandler { static cl_int retain(cl_program program) { return ::clRetainProgram(program); } static cl_int release(cl_program program) { return ::clReleaseProgram(program); } }; template <> struct ReferenceHandler { static cl_int retain(cl_kernel kernel) { return ::clRetainKernel(kernel); } static cl_int release(cl_kernel kernel) { return ::clReleaseKernel(kernel); } }; template <> struct ReferenceHandler { static cl_int retain(cl_event event) { return ::clRetainEvent(event); } static cl_int release(cl_event event) { return ::clReleaseEvent(event); } }; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 // Extracts version number with major in the upper 16 bits, minor in the lower 16 static cl_uint getVersion(const vector &versionInfo) { int highVersion = 0; int lowVersion = 0; int index = 7; while(versionInfo[index] != '.' ) { highVersion *= 10; highVersion += versionInfo[index]-'0'; ++index; } ++index; while(versionInfo[index] != ' ' && versionInfo[index] != '\0') { lowVersion *= 10; lowVersion += versionInfo[index]-'0'; ++index; } return (highVersion << 16) | lowVersion; } static cl_uint getPlatformVersion(cl_platform_id platform) { size_type size = 0; clGetPlatformInfo(platform, CL_PLATFORM_VERSION, 0, NULL, &size); vector versionInfo(size); clGetPlatformInfo(platform, CL_PLATFORM_VERSION, size, versionInfo.data(), &size); return getVersion(versionInfo); } static cl_uint getDevicePlatformVersion(cl_device_id device) { cl_platform_id platform; clGetDeviceInfo(device, CL_DEVICE_PLATFORM, sizeof(platform), &platform, NULL); return getPlatformVersion(platform); } static cl_uint getContextPlatformVersion(cl_context context) { // The platform cannot be queried directly, so we first have to grab a // device and obtain its context size_type size = 0; clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &size); if (size == 0) return 0; vector devices(size/sizeof(cl_device_id)); clGetContextInfo(context, CL_CONTEXT_DEVICES, size, devices.data(), NULL); return getDevicePlatformVersion(devices[0]); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 template class Wrapper { public: typedef T cl_type; protected: cl_type object_; public: Wrapper() : object_(NULL) { } Wrapper(const cl_type &obj, bool retainObject) : object_(obj) { if (retainObject) { detail::errHandler(retain(), __RETAIN_ERR); } } ~Wrapper() { if (object_ != NULL) { release(); } } Wrapper(const Wrapper& rhs) { object_ = rhs.object_; detail::errHandler(retain(), __RETAIN_ERR); } Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT_ { object_ = rhs.object_; rhs.object_ = NULL; } Wrapper& operator 
= (const Wrapper& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; detail::errHandler(retain(), __RETAIN_ERR); } return *this; } Wrapper& operator = (Wrapper&& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; rhs.object_ = NULL; } return *this; } Wrapper& operator = (const cl_type &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs; return *this; } const cl_type& operator ()() const { return object_; } cl_type& operator ()() { return object_; } const cl_type get() const { return object_; } cl_type get() { return object_; } protected: template friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type); cl_int retain() const { if (object_ != nullptr) { return ReferenceHandler::retain(object_); } else { return CL_SUCCESS; } } cl_int release() const { if (object_ != nullptr) { return ReferenceHandler::release(object_); } else { return CL_SUCCESS; } } }; template <> class Wrapper { public: typedef cl_device_id cl_type; protected: cl_type object_; bool referenceCountable_; static bool isReferenceCountable(cl_device_id device) { bool retVal = false; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_MINIMUM_OPENCL_VERSION < 120 if (device != NULL) { int version = getDevicePlatformVersion(device); if(version > ((1 << 16) + 1)) { retVal = true; } } #else // CL_HPP_MINIMUM_OPENCL_VERSION < 120 retVal = true; #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 120 #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 return retVal; } public: Wrapper() : object_(NULL), referenceCountable_(false) { } Wrapper(const cl_type &obj, bool retainObject) : object_(obj), referenceCountable_(false) { referenceCountable_ = isReferenceCountable(obj); if (retainObject) { detail::errHandler(retain(), __RETAIN_ERR); } } ~Wrapper() { release(); } Wrapper(const Wrapper& rhs) { object_ = rhs.object_; referenceCountable_ = isReferenceCountable(object_); detail::errHandler(retain(), __RETAIN_ERR); } Wrapper(Wrapper&& rhs) CL_HPP_NOEXCEPT_ { object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; rhs.object_ = NULL; rhs.referenceCountable_ = false; } Wrapper& operator = (const Wrapper& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; detail::errHandler(retain(), __RETAIN_ERR); } return *this; } Wrapper& operator = (Wrapper&& rhs) { if (this != &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs.object_; referenceCountable_ = rhs.referenceCountable_; rhs.object_ = NULL; rhs.referenceCountable_ = false; } return *this; } Wrapper& operator = (const cl_type &rhs) { detail::errHandler(release(), __RELEASE_ERR); object_ = rhs; referenceCountable_ = isReferenceCountable(object_); return *this; } const cl_type& operator ()() const { return object_; } cl_type& operator ()() { return object_; } const cl_type get() const { return object_; } cl_type get() { return object_; } protected: template friend inline cl_int getInfoHelper(Func, cl_uint, U*, int, typename U::cl_type); template friend inline cl_int getInfoHelper(Func, cl_uint, vector*, int, typename U::cl_type); cl_int retain() const { if( object_ != nullptr && referenceCountable_ ) { return ReferenceHandler::retain(object_); } else { return CL_SUCCESS; } } cl_int release() const { if (object_ != nullptr && referenceCountable_) { return ReferenceHandler::release(object_); } else { return CL_SUCCESS; } } }; template inline bool operator==(const 
Wrapper<T> &lhs, const Wrapper<T> &rhs)
{
    return lhs() == rhs();
}

template <typename T>
inline bool operator!=(const Wrapper<T> &lhs, const Wrapper<T> &rhs)
{
    return !operator==(lhs, rhs);
}

} // namespace detail
//! \endcond

using BuildLogType = vector<std::pair<cl::Device, typename detail::param_traits<detail::cl_program_build_info, CL_PROGRAM_BUILD_LOG>::param_type>>;

#if defined(CL_HPP_ENABLE_EXCEPTIONS)
/**
 * Exception class for build errors to carry build info
 */
class BuildError : public Error
{
private:
    BuildLogType buildLogs;
public:
    BuildError(cl_int err, const char * errStr, const BuildLogType &vec) : Error(err, errStr), buildLogs(vec)
    {
    }

    BuildLogType getBuildLog() const
    {
        return buildLogs;
    }
};

namespace detail {
    static inline cl_int buildErrHandler(
        cl_int err,
        const char * errStr,
        const BuildLogType &buildLogs)
    {
        if (err != CL_SUCCESS) {
            throw BuildError(err, errStr, buildLogs);
        }
        return err;
    }
} // namespace detail

#else
namespace detail {
    static inline cl_int buildErrHandler(
        cl_int err,
        const char * errStr,
        const BuildLogType &buildLogs)
    {
        (void)buildLogs; // suppress unused variable warning
        (void)errStr;
        return err;
    }
} // namespace detail
#endif // #if defined(CL_HPP_ENABLE_EXCEPTIONS)


/*! \struct ImageFormat
 *  \brief Adds constructors and member functions for cl_image_format.
 *
 *  \see cl_image_format
 */
struct ImageFormat : public cl_image_format
{
    //! \brief Default constructor - performs no initialization.
    ImageFormat(){}

    //! \brief Initializing constructor.
    ImageFormat(cl_channel_order order, cl_channel_type type)
    {
        image_channel_order = order;
        image_channel_data_type = type;
    }

    //! \brief Assignment operator.
    ImageFormat& operator = (const ImageFormat& rhs)
    {
        if (this != &rhs) {
            this->image_channel_data_type = rhs.image_channel_data_type;
            this->image_channel_order     = rhs.image_channel_order;
        }
        return *this;
    }
};

/*! \brief Class interface for cl_device_id.
 *
 *  \note Copies of these objects are inexpensive, since they don't 'own'
 *        any underlying resources or data structures.
 *
 *  \see cl_device_id
 */
class Device : public detail::Wrapper<cl_device_id>
{
private:
    static std::once_flag default_initialized_;
    static Device default_;
    static cl_int default_error_;

    /*! \brief Create the default device.
     *
     * This sets @c default_ and @c default_error_. It does not throw
     * @c cl::Error.
     */
    static void makeDefault();

    /*! \brief Create the default device from a provided device.
     *
     * This sets @c default_. It does not throw
     * @c cl::Error.
     */
    static void makeDefaultProvided(const Device &p) {
        default_ = p;
    }

public:
#ifdef CL_HPP_UNIT_TEST_ENABLE
    /*! \brief Reset the default.
     *
     * This sets @c default_ to an empty value to support cleanup in
     * the unit test framework.
     * This function is not thread safe.
     */
    static void unitTestClearDefault() {
        default_ = Device();
    }
#endif // #ifdef CL_HPP_UNIT_TEST_ENABLE

    //! \brief Default constructor - initializes to NULL.
    Device() : detail::Wrapper<cl_type>() { }

    /*! \brief Constructor from cl_device_id.
     *
     *  This simply copies the device ID value, which is an inexpensive operation.
     */
    explicit Device(const cl_device_id &device, bool retainObject = false) :
        detail::Wrapper<cl_type>(device, retainObject) { }

    /*! \brief Returns the first device on the default context.
     *
     *  \see Context::getDefault()
     */
    static Device getDefault(
        cl_int *errResult = NULL)
    {
        std::call_once(default_initialized_, makeDefault);
        detail::errHandler(default_error_);
        if (errResult != NULL) {
            *errResult = default_error_;
        }
        return default_;
    }

    /**
     * Modify the default device to be used by
     * subsequent operations.
     * Will only set the default if no default was previously created.
     * @return updated default device.
* Should be compared to the passed value to ensure that it was updated. */ static Device setDefault(const Device &default_device) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_device)); detail::errHandler(default_error_); return default_; } /*! \brief Assignment operator from cl_device_id. * * This simply copies the device ID value, which is an inexpensive operation. */ Device& operator = (const cl_device_id& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Device(const Device& dev) : detail::Wrapper(dev) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Device& operator = (const Device &dev) { detail::Wrapper::operator=(dev); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Device(Device&& dev) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(dev)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Device& operator = (Device &&dev) { detail::Wrapper::operator=(std::move(dev)); return *this; } //! \brief Wrapper for clGetDeviceInfo(). template cl_int getInfo(cl_device_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetDeviceInfo, object_, name, param), __GET_DEVICE_INFO_ERR); } //! \brief Wrapper for clGetDeviceInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_device_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /** * CL 1.2 version */ #if CL_HPP_TARGET_OPENCL_VERSION >= 120 //! \brief Wrapper for clCreateSubDevices(). cl_int createSubDevices( const cl_device_partition_property * properties, vector* devices) { cl_uint n = 0; cl_int err = clCreateSubDevices(object_, properties, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } vector ids(n); err = clCreateSubDevices(object_, properties, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } // Cannot trivially assign because we need to capture intermediates // with safe construction if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { // We do not need to retain because this device is being created // by the runtime (*devices)[i] = Device(ids[i], false); } } return CL_SUCCESS; } #elif defined(CL_HPP_USE_CL_DEVICE_FISSION) /** * CL 1.1 version that uses device fission extension. 
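 *
 * A hedged sketch of partitioning with the extension tokens from cl_ext.h
 * (`dev` is an existing cl::Device; error handling elided):
 * \code
 * cl_device_partition_property_ext props[] =
 *     { CL_DEVICE_PARTITION_EQUALLY_EXT, 2, CL_PROPERTIES_LIST_END_EXT };
 * std::vector<cl::Device> subDevices;
 * dev.createSubDevices(props, &subDevices);
 * \endcode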
*/ cl_int createSubDevices( const cl_device_partition_property_ext * properties, vector* devices) { typedef CL_API_ENTRY cl_int ( CL_API_CALL * PFN_clCreateSubDevicesEXT)( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1; static PFN_clCreateSubDevicesEXT pfn_clCreateSubDevicesEXT = NULL; CL_HPP_INIT_CL_EXT_FCN_PTR_(clCreateSubDevicesEXT); cl_uint n = 0; cl_int err = pfn_clCreateSubDevicesEXT(object_, properties, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } vector ids(n); err = pfn_clCreateSubDevicesEXT(object_, properties, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __CREATE_SUB_DEVICES_ERR); } // Cannot trivially assign because we need to capture intermediates // with safe construction if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { // We do not need to retain because this device is being created // by the runtime (*devices)[i] = Device(ids[i], false); } } return CL_SUCCESS; } #endif // defined(CL_HPP_USE_CL_DEVICE_FISSION) }; CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag Device::default_initialized_; CL_HPP_DEFINE_STATIC_MEMBER_ Device Device::default_; CL_HPP_DEFINE_STATIC_MEMBER_ cl_int Device::default_error_ = CL_SUCCESS; /*! \brief Class interface for cl_platform_id. * * \note Copies of these objects are inexpensive, since they don't 'own' * any underlying resources or data structures. * * \see cl_platform_id */ class Platform : public detail::Wrapper { private: static std::once_flag default_initialized_; static Platform default_; static cl_int default_error_; /*! \brief Create the default context. * * This sets @c default_ and @c default_error_. It does not throw * @c cl::Error. */ static void makeDefault() { /* Throwing an exception from a call_once invocation does not do * what we wish, so we catch it and save the error. */ #if defined(CL_HPP_ENABLE_EXCEPTIONS) try #endif { // If default wasn't passed ,generate one // Otherwise set it cl_uint n = 0; cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { default_error_ = err; return; } if (n == 0) { default_error_ = CL_INVALID_PLATFORM; return; } vector ids(n); err = ::clGetPlatformIDs(n, ids.data(), NULL); if (err != CL_SUCCESS) { default_error_ = err; return; } default_ = Platform(ids[0]); } #if defined(CL_HPP_ENABLE_EXCEPTIONS) catch (cl::Error &e) { default_error_ = e.err(); } #endif } /*! \brief Create the default platform from a provided platform. * * This sets @c default_. It does not throw * @c cl::Error. */ static void makeDefaultProvided(const Platform &p) { default_ = p; } public: #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Reset the default. * * This sets @c default_ to an empty value to support cleanup in * the unit test framework. * This function is not thread safe. */ static void unitTestClearDefault() { default_ = Platform(); } #endif // #ifdef CL_HPP_UNIT_TEST_ENABLE //! \brief Default constructor - initializes to NULL. Platform() : detail::Wrapper() { } /*! \brief Constructor from cl_platform_id. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * This simply copies the platform ID value, which is an inexpensive operation. 
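 *
 * A short sketch of typical platform discovery with this class
 * (error handling elided):
 * \code
 * std::vector<cl::Platform> platforms;
 * cl::Platform::get(&platforms);
 * for (const cl::Platform &p : platforms) {
 *     std::string name = p.getInfo<CL_PLATFORM_NAME>();
 * }
 * \endcode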
*/ explicit Platform(const cl_platform_id &platform, bool retainObject = false) : detail::Wrapper(platform, retainObject) { } /*! \brief Assignment operator from cl_platform_id. * * This simply copies the platform ID value, which is an inexpensive operation. */ Platform& operator = (const cl_platform_id& rhs) { detail::Wrapper::operator=(rhs); return *this; } static Platform getDefault( cl_int *errResult = NULL) { std::call_once(default_initialized_, makeDefault); detail::errHandler(default_error_); if (errResult != NULL) { *errResult = default_error_; } return default_; } /** * Modify the default platform to be used by * subsequent operations. * Will only set the default if no default was previously created. * @return updated default platform. * Should be compared to the passed value to ensure that it was updated. */ static Platform setDefault(const Platform &default_platform) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_platform)); detail::errHandler(default_error_); return default_; } //! \brief Wrapper for clGetPlatformInfo(). cl_int getInfo(cl_platform_info name, string* param) const { return detail::errHandler( detail::getInfo(&::clGetPlatformInfo, object_, name, param), __GET_PLATFORM_INFO_ERR); } //! \brief Wrapper for clGetPlatformInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_platform_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Gets a list of devices for this platform. * * Wraps clGetDeviceIDs(). */ cl_int getDevices( cl_device_type type, vector* devices) const { cl_uint n = 0; if( devices == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR); } cl_int err = ::clGetDeviceIDs(object_, type, 0, NULL, &n); if (err != CL_SUCCESS && err != CL_DEVICE_NOT_FOUND) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } vector ids(n); if (n>0) { err = ::clGetDeviceIDs(object_, type, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } } // Cannot trivially assign because we need to capture intermediates // with safe construction // We must retain things we obtain from the API to avoid releasing // API-owned objects. if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { (*devices)[i] = Device(ids[i], true); } } return CL_SUCCESS; } #if defined(CL_HPP_USE_DX_INTEROP) /*! \brief Get the list of available D3D10 devices. * * \param d3d_device_source. * * \param d3d_object. * * \param d3d_device_set. * * \param devices returns a vector of OpenCL D3D10 devices found. The cl::Device * values returned in devices can be used to identify a specific OpenCL * device. If \a devices argument is NULL, this argument is ignored. * * \return One of the following values: * - CL_SUCCESS if the function is executed successfully. * * The application can query specific capabilities of the OpenCL device(s) * returned by cl::getDevices. This can be used by the application to * determine which device(s) to use. * * \note In the case that exceptions are enabled and a return value * other than CL_SUCCESS is generated, then cl::Error exception is * generated. 
*/ cl_int getDevices( cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, vector* devices) const { typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clGetDeviceIDsFromD3D10KHR)( cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint* num_devices); if( devices == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_DEVICE_IDS_ERR); } static PFN_clGetDeviceIDsFromD3D10KHR pfn_clGetDeviceIDsFromD3D10KHR = NULL; CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(object_, clGetDeviceIDsFromD3D10KHR); cl_uint n = 0; cl_int err = pfn_clGetDeviceIDsFromD3D10KHR( object_, d3d_device_source, d3d_object, d3d_device_set, 0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } vector ids(n); err = pfn_clGetDeviceIDsFromD3D10KHR( object_, d3d_device_source, d3d_object, d3d_device_set, n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_DEVICE_IDS_ERR); } // Cannot trivially assign because we need to capture intermediates // with safe construction // We must retain things we obtain from the API to avoid releasing // API-owned objects. if (devices) { devices->resize(ids.size()); // Assign to param, constructing with retain behaviour // to correctly capture each underlying CL object for (size_type i = 0; i < ids.size(); i++) { (*devices)[i] = Device(ids[i], true); } } return CL_SUCCESS; } #endif /*! \brief Gets a list of available platforms. * * Wraps clGetPlatformIDs(). */ static cl_int get( vector* platforms) { cl_uint n = 0; if( platforms == NULL ) { return detail::errHandler(CL_INVALID_ARG_VALUE, __GET_PLATFORM_IDS_ERR); } cl_int err = ::clGetPlatformIDs(0, NULL, &n); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } vector ids(n); err = ::clGetPlatformIDs(n, ids.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_PLATFORM_IDS_ERR); } if (platforms) { platforms->resize(ids.size()); // Platforms don't reference count for (size_type i = 0; i < ids.size(); i++) { (*platforms)[i] = Platform(ids[i]); } } return CL_SUCCESS; } /*! \brief Gets the first available platform. * * Wraps clGetPlatformIDs(), returning the first result. */ static cl_int get( Platform * platform) { cl_int err; Platform default_platform = Platform::getDefault(&err); if (platform) { *platform = default_platform; } return err; } /*! \brief Gets the first available platform, returning it by value. * * \return Returns a valid platform if one is available. * If no platform is available will return a null platform. * Throws an exception if no platforms are available * or an error condition occurs. * Wraps clGetPlatformIDs(), returning the first result. */ static Platform get( cl_int * errResult = NULL) { cl_int err; Platform default_platform = Platform::getDefault(&err); if (errResult) { *errResult = err; } return default_platform; } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 //! \brief Wrapper for clUnloadCompiler(). 
cl_int unloadCompiler() { return ::clUnloadPlatformCompiler(object_); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 }; // class Platform CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag Platform::default_initialized_; CL_HPP_DEFINE_STATIC_MEMBER_ Platform Platform::default_; CL_HPP_DEFINE_STATIC_MEMBER_ cl_int Platform::default_error_ = CL_SUCCESS; /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /** * Unload the OpenCL compiler. * \note Deprecated for OpenCL 1.2. Use Platform::unloadCompiler instead. */ inline CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int UnloadCompiler() CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; inline cl_int UnloadCompiler() { return ::clUnloadCompiler(); } #endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /*! \brief Class interface for cl_context. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_context as the original. For details, see * clRetainContext() and clReleaseContext(). * * \see cl_context */ class Context : public detail::Wrapper { private: static std::once_flag default_initialized_; static Context default_; static cl_int default_error_; /*! \brief Create the default context from the default device type in the default platform. * * This sets @c default_ and @c default_error_. It does not throw * @c cl::Error. */ static void makeDefault() { /* Throwing an exception from a call_once invocation does not do * what we wish, so we catch it and save the error. */ #if defined(CL_HPP_ENABLE_EXCEPTIONS) try #endif { #if !defined(__APPLE__) && !defined(__MACOS) const Platform &p = Platform::getDefault(); cl_platform_id defaultPlatform = p(); cl_context_properties properties[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties)defaultPlatform, 0 }; #else // #if !defined(__APPLE__) && !defined(__MACOS) cl_context_properties *properties = nullptr; #endif // #if !defined(__APPLE__) && !defined(__MACOS) default_ = Context( CL_DEVICE_TYPE_DEFAULT, properties, NULL, NULL, &default_error_); } #if defined(CL_HPP_ENABLE_EXCEPTIONS) catch (cl::Error &e) { default_error_ = e.err(); } #endif } /*! \brief Create the default context from a provided Context. * * This sets @c default_. It does not throw * @c cl::Error. */ static void makeDefaultProvided(const Context &c) { default_ = c; } public: #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Reset the default. * * This sets @c default_ to an empty value to support cleanup in * the unit test framework. * This function is not thread safe. */ static void unitTestClearDefault() { default_ = Context(); } #endif // #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Constructs a context including a list of specified devices. * * Wraps clCreateContext(). 
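 *
 * A minimal sketch of constructing a context over discovered devices
 * (error handling elided; assumes at least one GPU device is present):
 * \code
 * std::vector<cl::Device> devices;
 * cl::Platform::getDefault().getDevices(CL_DEVICE_TYPE_GPU, &devices);
 * cl::Context context(devices);
 * \endcode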
*/ Context( const vector& devices, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, size_type, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; size_type numDevices = devices.size(); vector deviceIDs(numDevices); for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } object_ = ::clCreateContext( properties, (cl_uint) numDevices, deviceIDs.data(), notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (err != NULL) { *err = error; } } Context( const Device& device, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, size_type, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; cl_device_id deviceID = device(); object_ = ::clCreateContext( properties, 1, &deviceID, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a context including all or a subset of devices of a specified type. * * Wraps clCreateContextFromType(). */ Context( cl_device_type type, cl_context_properties* properties = NULL, void (CL_CALLBACK * notifyFptr)( const char *, const void *, size_type, void *) = NULL, void* data = NULL, cl_int* err = NULL) { cl_int error; #if !defined(__APPLE__) && !defined(__MACOS) cl_context_properties prop[4] = {CL_CONTEXT_PLATFORM, 0, 0, 0 }; if (properties == NULL) { // Get a valid platform ID as we cannot send in a blank one vector platforms; error = Platform::get(&platforms); if (error != CL_SUCCESS) { detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } return; } // Check the platforms we found for a device of our specified type cl_context_properties platform_id = 0; for (unsigned int i = 0; i < platforms.size(); i++) { vector devices; #if defined(CL_HPP_ENABLE_EXCEPTIONS) try { #endif error = platforms[i].getDevices(type, &devices); #if defined(CL_HPP_ENABLE_EXCEPTIONS) } catch (Error) {} // Catch if exceptions are enabled as we don't want to exit if first platform has no devices of type // We do error checking next anyway, and can throw there if needed #endif // Only squash CL_SUCCESS and CL_DEVICE_NOT_FOUND if (error != CL_SUCCESS && error != CL_DEVICE_NOT_FOUND) { detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } } if (devices.size() > 0) { platform_id = (cl_context_properties)platforms[i](); break; } } if (platform_id == 0) { detail::errHandler(CL_DEVICE_NOT_FOUND, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = CL_DEVICE_NOT_FOUND; } return; } prop[1] = platform_id; properties = &prop[0]; } #endif object_ = ::clCreateContextFromType( properties, type, notifyFptr, data, &error); detail::errHandler(error, __CREATE_CONTEXT_FROM_TYPE_ERR); if (err != NULL) { *err = error; } } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Context(const Context& ctx) : detail::Wrapper(ctx) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Context& operator = (const Context &ctx) { detail::Wrapper::operator=(ctx); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Context(Context&& ctx) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(ctx)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
*/ Context& operator = (Context &&ctx) { detail::Wrapper::operator=(std::move(ctx)); return *this; } /*! \brief Returns a singleton context including all devices of CL_DEVICE_TYPE_DEFAULT. * * \note All calls to this function return the same cl_context as the first. */ static Context getDefault(cl_int * err = NULL) { std::call_once(default_initialized_, makeDefault); detail::errHandler(default_error_); if (err != NULL) { *err = default_error_; } return default_; } /** * Modify the default context to be used by * subsequent operations. * Will only set the default if no default was previously created. * @return updated default context. * Should be compared to the passed value to ensure that it was updated. */ static Context setDefault(const Context &default_context) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_context)); detail::errHandler(default_error_); return default_; } //! \brief Default constructor - initializes to NULL. Context() : detail::Wrapper() { } /*! \brief Constructor from cl_context - takes ownership. * * This effectively transfers ownership of a refcount on the cl_context * into the new Context object. */ explicit Context(const cl_context& context, bool retainObject = false) : detail::Wrapper(context, retainObject) { } /*! \brief Assignment operator from cl_context - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseContext() on the value previously held by this instance. */ Context& operator = (const cl_context& rhs) { detail::Wrapper::operator=(rhs); return *this; } //! \brief Wrapper for clGetContextInfo(). template cl_int getInfo(cl_context_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetContextInfo, object_, name, param), __GET_CONTEXT_INFO_ERR); } //! \brief Wrapper for clGetContextInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_context_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Gets a list of supported image formats. * * Wraps clGetSupportedImageFormats(). */ cl_int getSupportedImageFormats( cl_mem_flags flags, cl_mem_object_type type, vector* formats) const { cl_uint numEntries; if (!formats) { return CL_SUCCESS; } cl_int err = ::clGetSupportedImageFormats( object_, flags, type, 0, NULL, &numEntries); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR); } if (numEntries > 0) { vector value(numEntries); err = ::clGetSupportedImageFormats( object_, flags, type, numEntries, (cl_image_format*)value.data(), NULL); if (err != CL_SUCCESS) { return detail::errHandler(err, __GET_SUPPORTED_IMAGE_FORMATS_ERR); } formats->assign(begin(value), end(value)); } else { // If no values are being returned, ensure an empty vector comes back formats->clear(); } return CL_SUCCESS; } }; inline void Device::makeDefault() { /* Throwing an exception from a call_once invocation does not do * what we wish, so we catch it and save the error. 
*/ #if defined(CL_HPP_ENABLE_EXCEPTIONS) try #endif { cl_int error = 0; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { default_error_ = error; } else { default_ = context.getInfo()[0]; default_error_ = CL_SUCCESS; } } #if defined(CL_HPP_ENABLE_EXCEPTIONS) catch (cl::Error &e) { default_error_ = e.err(); } #endif } CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag Context::default_initialized_; CL_HPP_DEFINE_STATIC_MEMBER_ Context Context::default_; CL_HPP_DEFINE_STATIC_MEMBER_ cl_int Context::default_error_ = CL_SUCCESS; /*! \brief Class interface for cl_event. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_event as the original. For details, see * clRetainEvent() and clReleaseEvent(). * * \see cl_event */ class Event : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Event() : detail::Wrapper() { } /*! \brief Constructor from cl_event - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * This effectively transfers ownership of a refcount on the cl_event * into the new Event object. */ explicit Event(const cl_event& event, bool retainObject = false) : detail::Wrapper(event, retainObject) { } /*! \brief Assignment operator from cl_event - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseEvent() on the value previously held by this instance. */ Event& operator = (const cl_event& rhs) { detail::Wrapper::operator=(rhs); return *this; } //! \brief Wrapper for clGetEventInfo(). template cl_int getInfo(cl_event_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetEventInfo, object_, name, param), __GET_EVENT_INFO_ERR); } //! \brief Wrapper for clGetEventInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_event_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } //! \brief Wrapper for clGetEventProfilingInfo(). template cl_int getProfilingInfo(cl_profiling_info name, T* param) const { return detail::errHandler(detail::getInfo( &::clGetEventProfilingInfo, object_, name, param), __GET_EVENT_PROFILE_INFO_ERR); } //! \brief Wrapper for clGetEventProfilingInfo() that returns by value. template typename detail::param_traits::param_type getProfilingInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_profiling_info, name>::param_type param; cl_int result = getProfilingInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } /*! \brief Blocks the calling thread until this event completes. * * Wraps clWaitForEvents(). */ cl_int wait() const { return detail::errHandler( ::clWaitForEvents(1, &object_), __WAIT_FOR_EVENTS_ERR); } #if CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Registers a user callback function for a specific command execution status. * * Wraps clSetEventCallback(). */ cl_int setCallback( cl_int type, void (CL_CALLBACK * pfn_notify)(cl_event, cl_int, void *), void * user_data = NULL) { return detail::errHandler( ::clSetEventCallback( object_, type, pfn_notify, user_data), __SET_EVENT_CALLBACK_ERR); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Blocks the calling thread until every event specified is complete. 
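 *
 * A small sketch (`ev` would typically be filled in by an enqueue call):
 * \code
 * cl::Event ev;                      // e.g. returned by an enqueue call
 * cl::Event::waitForEvents({ev});    // blocks until ev completes
 * \endcode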
* * Wraps clWaitForEvents(). */ static cl_int waitForEvents(const vector& events) { return detail::errHandler( ::clWaitForEvents( (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL), __WAIT_FOR_EVENTS_ERR); } }; #if CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Class interface for user events (a subset of cl_event's). * * See Event for details about copy semantics, etc. */ class UserEvent : public Event { public: /*! \brief Constructs a user event on a given context. * * Wraps clCreateUserEvent(). */ UserEvent( const Context& context, cl_int * err = NULL) { cl_int error; object_ = ::clCreateUserEvent( context(), &error); detail::errHandler(error, __CREATE_USER_EVENT_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. UserEvent() : Event() { } /*! \brief Sets the execution status of a user event object. * * Wraps clSetUserEventStatus(). */ cl_int setStatus(cl_int status) { return detail::errHandler( ::clSetUserEventStatus(object_,status), __SET_USER_EVENT_STATUS_ERR); } }; #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Blocks the calling thread until every event specified is complete. * * Wraps clWaitForEvents(). */ inline static cl_int WaitForEvents(const vector& events) { return detail::errHandler( ::clWaitForEvents( (cl_uint) events.size(), (events.size() > 0) ? (cl_event*)&events.front() : NULL), __WAIT_FOR_EVENTS_ERR); } /*! \brief Class interface for cl_mem. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_mem as the original. For details, see * clRetainMemObject() and clReleaseMemObject(). * * \see cl_mem */ class Memory : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Memory() : detail::Wrapper() { } /*! \brief Constructor from cl_mem - takes ownership. * * Optionally transfer ownership of a refcount on the cl_mem * into the new Memory object. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * * See Memory for further details. */ explicit Memory(const cl_mem& memory, bool retainObject) : detail::Wrapper(memory, retainObject) { } /*! \brief Assignment operator from cl_mem - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseMemObject() on the value previously held by this instance. */ Memory& operator = (const cl_mem& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Memory(const Memory& mem) : detail::Wrapper(mem) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Memory& operator = (const Memory &mem) { detail::Wrapper::operator=(mem); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Memory(Memory&& mem) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(mem)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Memory& operator = (Memory &&mem) { detail::Wrapper::operator=(std::move(mem)); return *this; } //! \brief Wrapper for clGetMemObjectInfo(). template cl_int getInfo(cl_mem_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetMemObjectInfo, object_, name, param), __GET_MEM_OBJECT_INFO_ERR); } //! \brief Wrapper for clGetMemObjectInfo() that returns by value. 
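//!
//! A hedged sketch using a buffer in the default context:
//! \code
//! cl::Buffer buf(CL_MEM_READ_WRITE, 1024);
//! size_t bytes = buf.getInfo<CL_MEM_SIZE>();   // returns the value directly
//! \endcode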
template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_mem_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } #if CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Registers a callback function to be called when the memory object * is no longer needed. * * Wraps clSetMemObjectDestructorCallback(). * * Repeated calls to this function, for a given cl_mem value, will append * to the list of functions called (in reverse order) when memory object's * resources are freed and the memory object is deleted. * * \note * The registered callbacks are associated with the underlying cl_mem * value - not the Memory class instance. */ cl_int setDestructorCallback( void (CL_CALLBACK * pfn_notify)(cl_mem, void *), void * user_data = NULL) { return detail::errHandler( ::clSetMemObjectDestructorCallback( object_, pfn_notify, user_data), __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 }; // Pre-declare copy functions class Buffer; template< typename IteratorType > cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ); template< typename IteratorType > cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ); template< typename IteratorType > cl_int copy( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ); template< typename IteratorType > cl_int copy( const CommandQueue &queue, const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ); #if CL_HPP_TARGET_OPENCL_VERSION >= 200 namespace detail { class SVMTraitNull { public: static cl_svm_mem_flags getSVMMemFlags() { return 0; } }; } // namespace detail template class SVMTraitReadWrite { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_READ_WRITE | Trait::getSVMMemFlags(); } }; template class SVMTraitReadOnly { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_READ_ONLY | Trait::getSVMMemFlags(); } }; template class SVMTraitWriteOnly { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_WRITE_ONLY | Trait::getSVMMemFlags(); } }; template> class SVMTraitCoarse { public: static cl_svm_mem_flags getSVMMemFlags() { return Trait::getSVMMemFlags(); } }; template> class SVMTraitFine { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_SVM_FINE_GRAIN_BUFFER | Trait::getSVMMemFlags(); } }; template> class SVMTraitAtomic { public: static cl_svm_mem_flags getSVMMemFlags() { return CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS | Trait::getSVMMemFlags(); } }; // Pre-declare SVM map function template inline cl_int enqueueMapSVM( T* ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL); /** * STL-like allocator class for managing SVM objects provided for convenience. * * Note that while this behaves like an allocator for the purposes of constructing vectors and similar objects, * care must be taken when using with smart pointers. * The allocator should not be used to construct a unique_ptr if we are using coarse-grained SVM mode because * the coarse-grained management behaviour would behave incorrectly with respect to reference counting. * * Instead the allocator embeds a Deleter which may be used with unique_ptr and is used * with the allocate_shared and allocate_ptr supplied operations. 
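 *
 * A brief container sketch, assuming an OpenCL 2.0 device with
 * coarse-grained SVM support:
 * \code
 * cl::SVMAllocator<int, cl::SVMTraitCoarse<>> svmAlloc;
 * cl::coarse_svm_vector<int> v(16, 0, svmAlloc);   // SVM-backed std::vector
 * \endcode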
*/ template class SVMAllocator { private: Context context_; public: typedef T value_type; typedef value_type* pointer; typedef const value_type* const_pointer; typedef value_type& reference; typedef const value_type& const_reference; typedef std::size_t size_type; typedef std::ptrdiff_t difference_type; template struct rebind { typedef SVMAllocator other; }; template friend class SVMAllocator; SVMAllocator() : context_(Context::getDefault()) { } explicit SVMAllocator(cl::Context context) : context_(context) { } SVMAllocator(const SVMAllocator &other) : context_(other.context_) { } template SVMAllocator(const SVMAllocator &other) : context_(other.context_) { } ~SVMAllocator() { } pointer address(reference r) CL_HPP_NOEXCEPT_ { return std::addressof(r); } const_pointer address(const_reference r) CL_HPP_NOEXCEPT_ { return std::addressof(r); } /** * Allocate an SVM pointer. * * If the allocator is coarse-grained, this will take ownership to allow * containers to correctly construct data in place. */ pointer allocate( size_type size, typename cl::SVMAllocator::const_pointer = 0) { // Allocate memory with default alignment matching the size of the type void* voidPointer = clSVMAlloc( context_(), SVMTrait::getSVMMemFlags(), size*sizeof(T), 0); pointer retValue = reinterpret_cast( voidPointer); #if defined(CL_HPP_ENABLE_EXCEPTIONS) if (!retValue) { std::bad_alloc excep; throw excep; } #endif // #if defined(CL_HPP_ENABLE_EXCEPTIONS) // If allocation was coarse-grained then map it if (!(SVMTrait::getSVMMemFlags() & CL_MEM_SVM_FINE_GRAIN_BUFFER)) { cl_int err = enqueueMapSVM(retValue, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, size*sizeof(T)); if (err != CL_SUCCESS) { std::bad_alloc excep; throw excep; } } // If exceptions disabled, return null pointer from allocator return retValue; } void deallocate(pointer p, size_type) { clSVMFree(context_(), p); } /** * Return the maximum possible allocation size. * This is the minimum of the maximum sizes of all devices in the context. */ size_type max_size() const CL_HPP_NOEXCEPT_ { size_type maxSize = std::numeric_limits::max() / sizeof(T); for (const Device &d : context_.getInfo()) { maxSize = std::min( maxSize, static_cast(d.getInfo())); } return maxSize; } template< class U, class... Args > void construct(U* p, Args&&... args) { new(p)T(args...); } template< class U > void destroy(U* p) { p->~U(); } /** * Returns true if the contexts match. */ inline bool operator==(SVMAllocator const& rhs) { return (context_==rhs.context_); } inline bool operator!=(SVMAllocator const& a) { return !operator==(a); } }; // class SVMAllocator return cl::pointer(tmp, detail::Deleter{alloc, copies}); template class SVMAllocator { public: typedef void value_type; typedef value_type* pointer; typedef const value_type* const_pointer; template struct rebind { typedef SVMAllocator other; }; template friend class SVMAllocator; }; #if !defined(CL_HPP_NO_STD_UNIQUE_PTR) namespace detail { template class Deleter { private: Alloc alloc_; size_type copies_; public: typedef typename std::allocator_traits::pointer pointer; Deleter(const Alloc &alloc, size_type copies) : alloc_{ alloc }, copies_{ copies } { } void operator()(pointer ptr) const { Alloc tmpAlloc{ alloc_ }; std::allocator_traits::destroy(tmpAlloc, std::addressof(*ptr)); std::allocator_traits::deallocate(tmpAlloc, ptr, copies_); } }; } // namespace detail /** * Allocation operation compatible with std::allocate_ptr. * Creates a unique_ptr by default. 
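 *
 * A usage sketch, assuming an OpenCL 2.0 device with coarse-grained SVM:
 * \code
 * cl::SVMAllocator<int, cl::SVMTraitCoarse<>> alloc;
 * auto p = cl::allocate_pointer<int>(alloc, 42);   // cl::pointer<int, ...>
 * \endcode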
* This requirement is to ensure that the control block is not * allocated in memory inaccessible to the host. */ template cl::pointer> allocate_pointer(const Alloc &alloc_, Args&&... args) { Alloc alloc(alloc_); static const size_type copies = 1; // Ensure that creation of the management block and the // object are dealt with separately such that we only provide a deleter T* tmp = std::allocator_traits::allocate(alloc, copies); if (!tmp) { std::bad_alloc excep; throw excep; } try { std::allocator_traits::construct( alloc, std::addressof(*tmp), std::forward(args)...); return cl::pointer>(tmp, detail::Deleter{alloc, copies}); } catch (std::bad_alloc b) { std::allocator_traits::deallocate(alloc, tmp, copies); throw; } } template< class T, class SVMTrait, class... Args > cl::pointer>> allocate_svm(Args... args) { SVMAllocator alloc; return cl::allocate_pointer(alloc, args...); } template< class T, class SVMTrait, class... Args > cl::pointer>> allocate_svm(const cl::Context &c, Args... args) { SVMAllocator alloc(c); return cl::allocate_pointer(alloc, args...); } #endif // #if !defined(CL_HPP_NO_STD_UNIQUE_PTR) /*! \brief Vector alias to simplify contruction of coarse-grained SVM containers. * */ template < class T > using coarse_svm_vector = vector>>; /*! \brief Vector alias to simplify contruction of fine-grained SVM containers. * */ template < class T > using fine_svm_vector = vector>>; /*! \brief Vector alias to simplify contruction of fine-grained SVM containers that support platform atomics. * */ template < class T > using atomic_svm_vector = vector>>; #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Class interface for Buffer Memory Objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Buffer : public Memory { public: /*! \brief Constructs a Buffer in a specified context. * * Wraps clCreateBuffer(). * * \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was * specified. Note alignment & exclusivity requirements. */ Buffer( const Context& context, cl_mem_flags flags, size_type size, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a Buffer in the default context. * * Wraps clCreateBuffer(). * * \param host_ptr Storage to be used if the CL_MEM_USE_HOST_PTR flag was * specified. Note alignment & exclusivity requirements. * * \see Context::getDefault() */ Buffer( cl_mem_flags flags, size_type size, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(err); object_ = ::clCreateBuffer(context(), flags, size, host_ptr, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } /*! * \brief Construct a Buffer from a host container via iterators. * IteratorType must be random access. * If useHostPtr is specified iterators must represent contiguous data. 
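 *
 * For example (a sketch; copies through the default context and queue):
 * \code
 * std::vector<float> host(256, 0.0f);
 * cl::Buffer buf(host.begin(), host.end(), false);   // readOnly = false
 * \endcode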
*/ template< typename IteratorType > Buffer( IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if( readOnly ) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if( useHostPtr ) { flags |= CL_MEM_USE_HOST_PTR; } size_type size = sizeof(DataType)*(endIterator - startIterator); Context context = Context::getDefault(err); if( useHostPtr ) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if( !useHostPtr ) { error = cl::copy(startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } /*! * \brief Construct a Buffer from a host container via iterators using a specified context. * IteratorType must be random access. * If useHostPtr is specified iterators must represent contiguous data. */ template< typename IteratorType > Buffer(const Context &context, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL); /*! * \brief Construct a Buffer from a host container via iterators using a specified queue. * If useHostPtr is specified iterators must be random access. */ template< typename IteratorType > Buffer(const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr = false, cl_int* err = NULL); //! \brief Default constructor - initializes to NULL. Buffer() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with earlier versions. * * See Memory for further details. */ explicit Buffer(const cl_mem& buffer, bool retainObject = false) : Memory(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Buffer& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Buffer(const Buffer& buf) : Memory(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Buffer& operator = (const Buffer &buf) { Memory::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Buffer(Buffer&& buf) CL_HPP_NOEXCEPT_ : Memory(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Buffer& operator = (Buffer &&buf) { Memory::operator=(std::move(buf)); return *this; } #if CL_HPP_TARGET_OPENCL_VERSION >= 110 /*! \brief Creates a new buffer object from this. * * Wraps clCreateSubBuffer(). */ Buffer createSubBuffer( cl_mem_flags flags, cl_buffer_create_type buffer_create_type, const void * buffer_create_info, cl_int * err = NULL) { Buffer result; cl_int error; result.object_ = ::clCreateSubBuffer( object_, flags, buffer_create_type, buffer_create_info, &error); detail::errHandler(error, __CREATE_SUBBUFFER_ERR); if (err != NULL) { *err = error; } return result; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 }; #if defined (CL_HPP_USE_DX_INTEROP) /*! 
\brief Class interface for creating OpenCL buffers from ID3D10Buffer's. * * This is provided to facilitate interoperability with Direct3D. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferD3D10 : public Buffer { public: /*! \brief Constructs a BufferD3D10, in a specified context, from a * given ID3D10Buffer. * * Wraps clCreateFromD3D10BufferKHR(). */ BufferD3D10( const Context& context, cl_mem_flags flags, ID3D10Buffer* bufobj, cl_int * err = NULL) : pfn_clCreateFromD3D10BufferKHR(nullptr) { typedef CL_API_ENTRY cl_mem (CL_API_CALL *PFN_clCreateFromD3D10BufferKHR)( cl_context context, cl_mem_flags flags, ID3D10Buffer* buffer, cl_int* errcode_ret); PFN_clCreateFromD3D10BufferKHR pfn_clCreateFromD3D10BufferKHR; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 vector props = context.getInfo(); cl_platform platform = -1; for( int i = 0; i < props.size(); ++i ) { if( props[i] == CL_CONTEXT_PLATFORM ) { platform = props[i+1]; } } CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, clCreateFromD3D10BufferKHR); #elif CL_HPP_TARGET_OPENCL_VERSION >= 110 CL_HPP_INIT_CL_EXT_FCN_PTR_(clCreateFromD3D10BufferKHR); #endif cl_int error; object_ = pfn_clCreateFromD3D10BufferKHR( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferD3D10() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit BufferD3D10(const cl_mem& buffer, bool retainObject = false) : Buffer(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferD3D10& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferD3D10(const BufferD3D10& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferD3D10& operator = (const BufferD3D10 &buf) { Buffer::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferD3D10(BufferD3D10&& buf) CL_HPP_NOEXCEPT_ : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferD3D10& operator = (BufferD3D10 &&buf) { Buffer::operator=(std::move(buf)); return *this; } }; #endif /*! \brief Class interface for GL Buffer Memory Objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferGL : public Buffer { public: /*! \brief Constructs a BufferGL in a specified context, from a given * GL buffer. * * Wraps clCreateFromGLBuffer(). */ BufferGL( const Context& context, cl_mem_flags flags, cl_GLuint bufobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLBuffer( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferGL() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. 
* See Memory for further details. */ explicit BufferGL(const cl_mem& buffer, bool retainObject = false) : Buffer(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferGL& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL(const BufferGL& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (const BufferGL &buf) { Buffer::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferGL(BufferGL&& buf) CL_HPP_NOEXCEPT_ : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferGL& operator = (BufferGL &&buf) { Buffer::operator=(std::move(buf)); return *this; } //! \brief Wrapper for clGetGLObjectInfo(). cl_int getObjectInfo( cl_gl_object_type *type, cl_GLuint * gl_object_name) { return detail::errHandler( ::clGetGLObjectInfo(object_,type,gl_object_name), __GET_GL_OBJECT_INFO_ERR); } }; /*! \brief Class interface for GL Render Buffer Memory Objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class BufferRenderGL : public Buffer { public: /*! \brief Constructs a BufferRenderGL in a specified context, from a given * GL Renderbuffer. * * Wraps clCreateFromGLRenderbuffer(). */ BufferRenderGL( const Context& context, cl_mem_flags flags, cl_GLuint bufobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLRenderbuffer( context(), flags, bufobj, &error); detail::errHandler(error, __CREATE_GL_RENDER_BUFFER_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. BufferRenderGL() : Buffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit BufferRenderGL(const cl_mem& buffer, bool retainObject = false) : Buffer(buffer, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ BufferRenderGL& operator = (const cl_mem& rhs) { Buffer::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ BufferRenderGL(const BufferRenderGL& buf) : Buffer(buf) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ BufferRenderGL& operator = (const BufferRenderGL &buf) { Buffer::operator=(buf); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ BufferRenderGL(BufferRenderGL&& buf) CL_HPP_NOEXCEPT_ : Buffer(std::move(buf)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ BufferRenderGL& operator = (BufferRenderGL &&buf) { Buffer::operator=(std::move(buf)); return *this; } //! \brief Wrapper for clGetGLObjectInfo(). cl_int getObjectInfo( cl_gl_object_type *type, cl_GLuint * gl_object_name) { return detail::errHandler( ::clGetGLObjectInfo(object_,type,gl_object_name), __GET_GL_OBJECT_INFO_ERR); } }; /*! \brief C++ base class for Image Memory objects. 
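 *
 * A query sketch (editor's illustration): once an image subclass object
 * exists, per-image properties can be read through the templated
 * getImageInfo() wrappers defined below.
 * \code
 * cl_int err;
 * size_type w = img.getImageInfo<CL_IMAGE_WIDTH>(&err);  // img: any cl::Image
 * size_type h = img.getImageInfo<CL_IMAGE_HEIGHT>(&err);
 * \endcode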
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Image : public Memory
{
protected:
    //! \brief Default constructor - initializes to NULL.
    Image() : Memory() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     * \param retainObject will cause the constructor to retain its cl object.
     *                     Defaults to false to maintain compatibility with
     *                     earlier versions.
     * See Memory for further details.
     */
    explicit Image(const cl_mem& image, bool retainObject = false) :
        Memory(image, retainObject) { }

    /*! \brief Assignment from cl_mem - performs shallow copy.
     *
     *  See Memory for further details.
     */
    Image& operator = (const cl_mem& rhs)
    {
        Memory::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Image(const Image& img) : Memory(img) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Image& operator = (const Image &img)
    {
        Memory::operator=(img);
        return *this;
    }

    /*! \brief Move constructor to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Image(Image&& img) CL_HPP_NOEXCEPT_ : Memory(std::move(img)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Image& operator = (Image &&img)
    {
        Memory::operator=(std::move(img));
        return *this;
    }

public:
    //! \brief Wrapper for clGetImageInfo().
    template <typename T>
    cl_int getImageInfo(cl_image_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(&::clGetImageInfo, object_, name, param),
            __GET_IMAGE_INFO_ERR);
    }

    //! \brief Wrapper for clGetImageInfo() that returns by value.
    template <cl_int name> typename
    detail::param_traits<detail::cl_image_info, name>::param_type
    getImageInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_image_info, name>::param_type param;
        cl_int result = getImageInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }
};

#if CL_HPP_TARGET_OPENCL_VERSION >= 120
/*! \brief Class interface for 1D Image Memory objects.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 */
class Image1D : public Image
{
public:
    /*! \brief Constructs a 1D Image in a specified context.
     *
     *  Wraps clCreateImage().
     */
    Image1D(
        const Context& context,
        cl_mem_flags flags,
        ImageFormat format,
        size_type width,
        void* host_ptr = NULL,
        cl_int* err = NULL)
    {
        cl_int error;
        cl_image_desc desc =
        {
            CL_MEM_OBJECT_IMAGE1D,
            width,
            0, 0, 0, 0, 0, 0, 0, 0
        };
        object_ = ::clCreateImage(
            context(),
            flags,
            &format,
            &desc,
            host_ptr,
            &error);

        detail::errHandler(error, __CREATE_IMAGE_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    Image1D() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     * \param retainObject will cause the constructor to retain its cl object.
     *                     Defaults to false to maintain compatibility with
     *                     earlier versions.
     * See Memory for further details.
     */
    explicit Image1D(const cl_mem& image1D, bool retainObject = false) :
        Image(image1D, retainObject) { }

    /*! \brief Assignment from cl_mem - performs shallow copy.
     *
     *  See Memory for further details.
     */
    Image1D& operator = (const cl_mem& rhs)
    {
        Image::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Image1D(const Image1D& img) : Image(img) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Image1D& operator = (const Image1D &img)
    {
        Image::operator=(img);
        return *this;
    }

    /*!
\brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1D(Image1D&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1D& operator = (Image1D &&img) { Image::operator=(std::move(img)); return *this; } }; /*! \class Image1DBuffer * \brief Image interface for 1D buffer images. */ class Image1DBuffer : public Image { public: Image1DBuffer( const Context& context, cl_mem_flags flags, ImageFormat format, size_type width, const Buffer &buffer, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_BUFFER, width, 0, 0, 0, 0, 0, 0, 0, buffer() }; object_ = ::clCreateImage( context(), flags, &format, &desc, NULL, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DBuffer() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image1DBuffer(const cl_mem& image1D, bool retainObject = false) : Image(image1D, retainObject) { } Image1DBuffer& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer(const Image1DBuffer& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DBuffer& operator = (const Image1DBuffer &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image1DBuffer(Image1DBuffer&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1DBuffer& operator = (Image1DBuffer &&img) { Image::operator=(std::move(img)); return *this; } }; /*! \class Image1DArray * \brief Image interface for arrays of 1D images. */ class Image1DArray : public Image { public: Image1DArray( const Context& context, cl_mem_flags flags, ImageFormat format, size_type arraySize, size_type width, size_type rowPitch, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE1D_ARRAY, width, 0, 0, // height, depth (unused) arraySize, rowPitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image1DArray() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image1DArray(const cl_mem& imageArray, bool retainObject = false) : Image(imageArray, retainObject) { } Image1DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray(const Image1DArray& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (const Image1DArray &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. 
*/ Image1DArray(Image1DArray&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image1DArray& operator = (Image1DArray &&img) { Image::operator=(std::move(img)); return *this; } }; #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \brief Class interface for 2D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image2D : public Image { public: /*! \brief Constructs a 2D Image in a specified context. * * Wraps clCreateImage(). */ Image2D( const Context& context, cl_mem_flags flags, ImageFormat format, size_type width, size_type height, size_type row_pitch = 0, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; bool useCreateImage; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 // Run-time decision based on the actual platform { cl_uint version = detail::getContextPlatformVersion(context()); useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above } #elif CL_HPP_TARGET_OPENCL_VERSION >= 120 useCreateImage = true; #else useCreateImage = false; #endif #if CL_HPP_TARGET_OPENCL_VERSION >= 120 if (useCreateImage) { cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D, width, height, 0, 0, // depth, array size (unused) row_pitch, 0, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_MINIMUM_OPENCL_VERSION < 120 if (!useCreateImage) { object_ = ::clCreateImage2D( context(), flags,&format, width, height, row_pitch, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE2D_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 120 } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 || defined(CL_HPP_USE_CL_IMAGE2D_FROM_BUFFER_KHR) /*! \brief Constructs a 2D Image from a buffer. * \note This will share storage with the underlying buffer. * * Wraps clCreateImage(). */ Image2D( const Context& context, ImageFormat format, const Buffer &sourceBuffer, size_type width, size_type height, size_type row_pitch = 0, cl_int* err = nullptr) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D, width, height, 0, 0, // depth, array size (unused) row_pitch, 0, 0, 0, // Use buffer as input to image sourceBuffer() }; object_ = ::clCreateImage( context(), 0, // flags inherited from buffer &format, &desc, nullptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != nullptr) { *err = error; } } #endif //#if CL_HPP_TARGET_OPENCL_VERSION >= 200 || defined(CL_HPP_USE_CL_IMAGE2D_FROM_BUFFER_KHR) #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Constructs a 2D Image from an image. * \note This will share storage with the underlying image but may * reinterpret the channel order and type. * * The image will be created matching with a descriptor matching the source. * * \param order is the channel order to reinterpret the image data as. * The channel order may differ as described in the OpenCL * 2.0 API specification. * * Wraps clCreateImage(). 
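 *
 * A usage sketch (editor's illustration; rgbaImage is a hypothetical
 * existing CL_RGBA image in the same context):
 * \code
 * cl_int err;
 * // View the same storage with BGRA channel ordering.
 * cl::Image2D bgraView(context, CL_BGRA, rgbaImage, &err);
 * \endcode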
 */
    Image2D(
        const Context& context,
        cl_channel_order order,
        const Image &sourceImage,
        cl_int* err = nullptr)
    {
        cl_int error;

        // Descriptor fields have to match source image
        size_type sourceWidth =
            sourceImage.getImageInfo<CL_IMAGE_WIDTH>();
        size_type sourceHeight =
            sourceImage.getImageInfo<CL_IMAGE_HEIGHT>();
        size_type sourceRowPitch =
            sourceImage.getImageInfo<CL_IMAGE_ROW_PITCH>();
        cl_uint sourceNumMIPLevels =
            sourceImage.getImageInfo<CL_IMAGE_NUM_MIP_LEVELS>();
        cl_uint sourceNumSamples =
            sourceImage.getImageInfo<CL_IMAGE_NUM_SAMPLES>();
        cl_image_format sourceFormat =
            sourceImage.getImageInfo<CL_IMAGE_FORMAT>();

        // Update only the channel order.
        // Channel format inherited from source.
        sourceFormat.image_channel_order = order;
        cl_image_desc desc =
        {
            CL_MEM_OBJECT_IMAGE2D,
            sourceWidth,
            sourceHeight,
            0, 0, // depth (unused), array size (unused)
            sourceRowPitch,
            0, // slice pitch (unused)
            sourceNumMIPLevels,
            sourceNumSamples,
            // Use buffer as input to image
            sourceImage()
        };
        object_ = ::clCreateImage(
            context(),
            0, // flags should be inherited from mem_object
            &sourceFormat,
            &desc,
            nullptr,
            &error);

        detail::errHandler(error, __CREATE_IMAGE_ERR);
        if (err != nullptr) {
            *err = error;
        }
    }
#endif //#if CL_HPP_TARGET_OPENCL_VERSION >= 200

    //! \brief Default constructor - initializes to NULL.
    Image2D() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     * \param retainObject will cause the constructor to retain its cl object.
     *                     Defaults to false to maintain compatibility with
     *                     earlier versions.
     * See Memory for further details.
     */
    explicit Image2D(const cl_mem& image2D, bool retainObject = false) :
        Image(image2D, retainObject) { }

    /*! \brief Assignment from cl_mem - performs shallow copy.
     *
     *  See Memory for further details.
     */
    Image2D& operator = (const cl_mem& rhs)
    {
        Image::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Image2D(const Image2D& img) : Image(img) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    Image2D& operator = (const Image2D &img)
    {
        Image::operator=(img);
        return *this;
    }

    /*! \brief Move constructor to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Image2D(Image2D&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     * Required for MSVC.
     */
    Image2D& operator = (Image2D &&img)
    {
        Image::operator=(std::move(img));
        return *this;
    }
};


#if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
/*! \brief Class interface for GL 2D Image Memory objects.
 *
 *  This is provided to facilitate interoperability with OpenGL.
 *
 *  See Memory for details about copy semantics, etc.
 *
 *  \see Memory
 *  \note Deprecated for OpenCL 1.2. Please use ImageGL instead.
 */
class CL_EXT_PREFIX__VERSION_1_1_DEPRECATED Image2DGL : public Image2D
{
public:
    /*! \brief Constructs an Image2DGL in a specified context, from a given
     *         GL Texture.
     *
     *  Wraps clCreateFromGLTexture2D().
     */
    Image2DGL(
        const Context& context,
        cl_mem_flags flags,
        cl_GLenum target,
        cl_GLint miplevel,
        cl_GLuint texobj,
        cl_int * err = NULL)
    {
        cl_int error;
        object_ = ::clCreateFromGLTexture2D(
            context(),
            flags,
            target,
            miplevel,
            texobj,
            &error);

        detail::errHandler(error, __CREATE_GL_TEXTURE_2D_ERR);
        if (err != NULL) {
            *err = error;
        }
    }

    //! \brief Default constructor - initializes to NULL.
    Image2DGL() : Image2D() { }

    /*! \brief Constructor from cl_mem - takes ownership.
     *
     * \param retainObject will cause the constructor to retain its cl object.
     *                     Defaults to false to maintain compatibility with
     *                     earlier versions.
     * See Memory for further details.
*/ explicit Image2DGL(const cl_mem& image, bool retainObject = false) : Image2D(image, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. *c * See Memory for further details. */ Image2DGL& operator = (const cl_mem& rhs) { Image2D::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2DGL(const Image2DGL& img) : Image2D(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2DGL& operator = (const Image2DGL &img) { Image2D::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2DGL(Image2DGL&& img) CL_HPP_NOEXCEPT_ : Image2D(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2DGL& operator = (Image2DGL &&img) { Image2D::operator=(std::move(img)); return *this; } } CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; #endif // CL_USE_DEPRECATED_OPENCL_1_1_APIS #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \class Image2DArray * \brief Image interface for arrays of 2D images. */ class Image2DArray : public Image { public: Image2DArray( const Context& context, cl_mem_flags flags, ImageFormat format, size_type arraySize, size_type width, size_type height, size_type rowPitch, size_type slicePitch, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; cl_image_desc desc = { CL_MEM_OBJECT_IMAGE2D_ARRAY, width, height, 0, // depth (unused) arraySize, rowPitch, slicePitch, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } Image2DArray() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image2DArray(const cl_mem& imageArray, bool retainObject = false) : Image(imageArray, retainObject) { } Image2DArray& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image2DArray(const Image2DArray& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image2DArray& operator = (const Image2DArray &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image2DArray(Image2DArray&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image2DArray& operator = (Image2DArray &&img) { Image::operator=(std::move(img)); return *this; } }; #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \brief Class interface for 3D Image Memory objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image3D : public Image { public: /*! \brief Constructs a 3D Image in a specified context. * * Wraps clCreateImage(). 
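 *
 * A creation sketch (editor's illustration; assumes a valid context):
 * \code
 * cl_int err;
 * cl::ImageFormat fmt(CL_RGBA, CL_UNORM_INT8);
 * cl::Image3D vol(context, CL_MEM_READ_ONLY, fmt,
 *                 256, 256, 64, // width, height, depth
 *                 0, 0,         // row/slice pitch computed by the runtime
 *                 nullptr, &err);
 * \endcode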
*/ Image3D( const Context& context, cl_mem_flags flags, ImageFormat format, size_type width, size_type height, size_type depth, size_type row_pitch = 0, size_type slice_pitch = 0, void* host_ptr = NULL, cl_int* err = NULL) { cl_int error; bool useCreateImage; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 && CL_HPP_MINIMUM_OPENCL_VERSION < 120 // Run-time decision based on the actual platform { cl_uint version = detail::getContextPlatformVersion(context()); useCreateImage = (version >= 0x10002); // OpenCL 1.2 or above } #elif CL_HPP_TARGET_OPENCL_VERSION >= 120 useCreateImage = true; #else useCreateImage = false; #endif #if CL_HPP_TARGET_OPENCL_VERSION >= 120 if (useCreateImage) { cl_image_desc desc = { CL_MEM_OBJECT_IMAGE3D, width, height, depth, 0, // array size (unused) row_pitch, slice_pitch, 0, 0, 0 }; object_ = ::clCreateImage( context(), flags, &format, &desc, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_MINIMUM_OPENCL_VERSION < 120 if (!useCreateImage) { object_ = ::clCreateImage3D( context(), flags, &format, width, height, depth, row_pitch, slice_pitch, host_ptr, &error); detail::errHandler(error, __CREATE_IMAGE3D_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_MINIMUM_OPENCL_VERSION < 120 } //! \brief Default constructor - initializes to NULL. Image3D() : Image() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image3D(const cl_mem& image3D, bool retainObject = false) : Image(image3D, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image3D& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image3D(const Image3D& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image3D& operator = (const Image3D &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image3D(Image3D&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image3D& operator = (Image3D &&img) { Image::operator=(std::move(img)); return *this; } }; #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) /*! \brief Class interface for GL 3D Image Memory objects. * * This is provided to facilitate interoperability with OpenGL. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Image3DGL : public Image3D { public: /*! \brief Constructs an Image3DGL in a specified context, from a given * GL Texture. * * Wraps clCreateFromGLTexture3D(). */ Image3DGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture3D( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_3D_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Image3DGL() : Image3D() { } /*! \brief Constructor from cl_mem - takes ownership. 
* * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit Image3DGL(const cl_mem& image, bool retainObject = false) : Image3D(image, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Image3DGL& operator = (const cl_mem& rhs) { Image3D::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Image3DGL(const Image3DGL& img) : Image3D(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Image3DGL& operator = (const Image3DGL &img) { Image3D::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Image3DGL(Image3DGL&& img) CL_HPP_NOEXCEPT_ : Image3D(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Image3DGL& operator = (Image3DGL &&img) { Image3D::operator=(std::move(img)); return *this; } }; #endif // CL_USE_DEPRECATED_OPENCL_1_1_APIS #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /*! \class ImageGL * \brief general image interface for GL interop. * We abstract the 2D and 3D GL images into a single instance here * that wraps all GL sourced images on the grounds that setup information * was performed by OpenCL anyway. */ class ImageGL : public Image { public: ImageGL( const Context& context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texobj, cl_int * err = NULL) { cl_int error; object_ = ::clCreateFromGLTexture( context(), flags, target, miplevel, texobj, &error); detail::errHandler(error, __CREATE_GL_TEXTURE_ERR); if (err != NULL) { *err = error; } } ImageGL() : Image() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * See Memory for further details. */ explicit ImageGL(const cl_mem& image, bool retainObject = false) : Image(image, retainObject) { } ImageGL& operator = (const cl_mem& rhs) { Image::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ ImageGL(const ImageGL& img) : Image(img) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ ImageGL& operator = (const ImageGL &img) { Image::operator=(img); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ ImageGL(ImageGL&& img) CL_HPP_NOEXCEPT_ : Image(std::move(img)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ ImageGL& operator = (ImageGL &&img) { Image::operator=(std::move(img)); return *this; } }; #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Class interface for Pipe Memory Objects. * * See Memory for details about copy semantics, etc. * * \see Memory */ class Pipe : public Memory { public: /*! \brief Constructs a Pipe in a specified context. * * Wraps clCreatePipe(). * @param context Context in which to create the pipe. * @param flags Bitfield. Only CL_MEM_READ_WRITE and CL_MEM_HOST_NO_ACCESS are valid. * @param packet_size Size in bytes of a single packet of the pipe. 
* @param max_packets Number of packets that may be stored in the pipe. * */ Pipe( const Context& context, cl_uint packet_size, cl_uint max_packets, cl_int* err = NULL) { cl_int error; cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS; object_ = ::clCreatePipe(context(), flags, packet_size, max_packets, nullptr, &error); detail::errHandler(error, __CREATE_PIPE_ERR); if (err != NULL) { *err = error; } } /*! \brief Constructs a Pipe in a the default context. * * Wraps clCreatePipe(). * @param flags Bitfield. Only CL_MEM_READ_WRITE and CL_MEM_HOST_NO_ACCESS are valid. * @param packet_size Size in bytes of a single packet of the pipe. * @param max_packets Number of packets that may be stored in the pipe. * */ Pipe( cl_uint packet_size, cl_uint max_packets, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(err); cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS; object_ = ::clCreatePipe(context(), flags, packet_size, max_packets, nullptr, &error); detail::errHandler(error, __CREATE_PIPE_ERR); if (err != NULL) { *err = error; } } //! \brief Default constructor - initializes to NULL. Pipe() : Memory() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with earlier versions. * * See Memory for further details. */ explicit Pipe(const cl_mem& pipe, bool retainObject = false) : Memory(pipe, retainObject) { } /*! \brief Assignment from cl_mem - performs shallow copy. * * See Memory for further details. */ Pipe& operator = (const cl_mem& rhs) { Memory::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Pipe(const Pipe& pipe) : Memory(pipe) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Pipe& operator = (const Pipe &pipe) { Memory::operator=(pipe); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Pipe(Pipe&& pipe) CL_HPP_NOEXCEPT_ : Memory(std::move(pipe)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Pipe& operator = (Pipe &&pipe) { Memory::operator=(std::move(pipe)); return *this; } //! \brief Wrapper for clGetMemObjectInfo(). template cl_int getInfo(cl_pipe_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetPipeInfo, object_, name, param), __GET_PIPE_INFO_ERR); } //! \brief Wrapper for clGetMemObjectInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_pipe_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } }; // class Pipe #endif // CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief Class interface for cl_sampler. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_sampler as the original. For details, see * clRetainSampler() and clReleaseSampler(). * * \see cl_sampler */ class Sampler : public detail::Wrapper { public: //! \brief Default constructor - initializes to NULL. Sampler() { } /*! \brief Constructs a Sampler in a specified context. * * Wraps clCreateSampler(). 
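 *
 * A creation sketch (editor's illustration; assumes a valid context):
 * \code
 * cl_int err;
 * cl::Sampler sampler(context,
 *                     CL_FALSE,            // unnormalized coordinates
 *                     CL_ADDRESS_CLAMP_TO_EDGE,
 *                     CL_FILTER_NEAREST,
 *                     &err);
 * \endcode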
*/ Sampler( const Context& context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int* err = NULL) { cl_int error; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_sampler_properties sampler_properties[] = { CL_SAMPLER_NORMALIZED_COORDS, normalized_coords, CL_SAMPLER_ADDRESSING_MODE, addressing_mode, CL_SAMPLER_FILTER_MODE, filter_mode, 0 }; object_ = ::clCreateSamplerWithProperties( context(), sampler_properties, &error); detail::errHandler(error, __CREATE_SAMPLER_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateSampler( context(), normalized_coords, addressing_mode, filter_mode, &error); detail::errHandler(error, __CREATE_SAMPLER_ERR); if (err != NULL) { *err = error; } #endif } /*! \brief Constructor from cl_sampler - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * This effectively transfers ownership of a refcount on the cl_sampler * into the new Sampler object. */ explicit Sampler(const cl_sampler& sampler, bool retainObject = false) : detail::Wrapper(sampler, retainObject) { } /*! \brief Assignment operator from cl_sampler - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseSampler() on the value previously held by this instance. */ Sampler& operator = (const cl_sampler& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Sampler(const Sampler& sam) : detail::Wrapper(sam) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Sampler& operator = (const Sampler &sam) { detail::Wrapper::operator=(sam); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Sampler(Sampler&& sam) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(sam)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Sampler& operator = (Sampler &&sam) { detail::Wrapper::operator=(std::move(sam)); return *this; } //! \brief Wrapper for clGetSamplerInfo(). template cl_int getInfo(cl_sampler_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetSamplerInfo, object_, name, param), __GET_SAMPLER_INFO_ERR); } //! \brief Wrapper for clGetSamplerInfo() that returns by value. template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_sampler_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } }; class Program; class CommandQueue; class DeviceCommandQueue; class Kernel; //! \brief Class interface for specifying NDRange values. class NDRange { private: size_type sizes_[3]; cl_uint dimensions_; public: //! \brief Default constructor - resulting range has zero dimensions. NDRange() : dimensions_(0) { sizes_[0] = 0; sizes_[1] = 0; sizes_[2] = 0; } //! \brief Constructs one-dimensional range. NDRange(size_type size0) : dimensions_(1) { sizes_[0] = size0; sizes_[1] = 1; sizes_[2] = 1; } //! \brief Constructs two-dimensional range. NDRange(size_type size0, size_type size1) : dimensions_(2) { sizes_[0] = size0; sizes_[1] = size1; sizes_[2] = 1; } //! \brief Constructs three-dimensional range. 
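//! An NDRange sketch (editor's illustration): a 3D global range such as
//! \code cl::NDRange global(512, 512, 4); \endcode can be passed directly
//! to CommandQueue::enqueueNDRangeKernel(); as the 1D and 2D constructors
//! above show, unspecified trailing dimensions default to 1.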
NDRange(size_type size0, size_type size1, size_type size2) : dimensions_(3) { sizes_[0] = size0; sizes_[1] = size1; sizes_[2] = size2; } /*! \brief Conversion operator to const size_type *. * * \returns a pointer to the size of the first dimension. */ operator const size_type*() const { return sizes_; } //! \brief Queries the number of dimensions in the range. size_type dimensions() const { return dimensions_; } //! \brief Returns the size of the object in bytes based on the // runtime number of dimensions size_type size() const { return dimensions_*sizeof(size_type); } size_type* get() { return sizes_; } const size_type* get() const { return sizes_; } }; //! \brief A zero-dimensional range. static const NDRange NullRange; //! \brief Local address wrapper for use with Kernel::setArg struct LocalSpaceArg { size_type size_; }; namespace detail { template struct KernelArgumentHandler; // Enable for objects that are not subclasses of memory // Pointers, constants etc template struct KernelArgumentHandler::value>::type> { static size_type size(const T&) { return sizeof(T); } static const T* ptr(const T& value) { return &value; } }; // Enable for subclasses of memory where we want to get a reference to the cl_mem out // and pass that in for safety template struct KernelArgumentHandler::value>::type> { static size_type size(const T&) { return sizeof(cl_mem); } static const cl_mem* ptr(const T& value) { return &(value()); } }; // Specialization for DeviceCommandQueue defined later template <> struct KernelArgumentHandler { static size_type size(const LocalSpaceArg& value) { return value.size_; } static const void* ptr(const LocalSpaceArg&) { return NULL; } }; } //! \endcond /*! Local * \brief Helper function for generating LocalSpaceArg objects. */ inline LocalSpaceArg Local(size_type size) { LocalSpaceArg ret = { size }; return ret; } /*! \brief Class interface for cl_kernel. * * \note Copies of these objects are shallow, meaning that the copy will refer * to the same underlying cl_kernel as the original. For details, see * clRetainKernel() and clReleaseKernel(). * * \see cl_kernel */ class Kernel : public detail::Wrapper { public: inline Kernel(const Program& program, const char* name, cl_int* err = NULL); //! \brief Default constructor - initializes to NULL. Kernel() { } /*! \brief Constructor from cl_kernel - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. * This effectively transfers ownership of a refcount on the cl_kernel * into the new Kernel object. */ explicit Kernel(const cl_kernel& kernel, bool retainObject = false) : detail::Wrapper(kernel, retainObject) { } /*! \brief Assignment operator from cl_kernel - takes ownership. * * This effectively transfers ownership of a refcount on the rhs and calls * clReleaseKernel() on the value previously held by this instance. */ Kernel& operator = (const cl_kernel& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Kernel(const Kernel& kernel) : detail::Wrapper(kernel) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Kernel& operator = (const Kernel &kernel) { detail::Wrapper::operator=(kernel); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. 
*/ Kernel(Kernel&& kernel) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(kernel)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. */ Kernel& operator = (Kernel &&kernel) { detail::Wrapper::operator=(std::move(kernel)); return *this; } template cl_int getInfo(cl_kernel_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetKernelInfo, object_, name, param), __GET_KERNEL_INFO_ERR); } template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_kernel_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 template cl_int getArgInfo(cl_uint argIndex, cl_kernel_arg_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetKernelArgInfo, object_, argIndex, name, param), __GET_KERNEL_ARG_INFO_ERR); } template typename detail::param_traits::param_type getArgInfo(cl_uint argIndex, cl_int* err = NULL) const { typename detail::param_traits< detail::cl_kernel_arg_info, name>::param_type param; cl_int result = getArgInfo(argIndex, name, ¶m); if (err != NULL) { *err = result; } return param; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 template cl_int getWorkGroupInfo( const Device& device, cl_kernel_work_group_info name, T* param) const { return detail::errHandler( detail::getInfo( &::clGetKernelWorkGroupInfo, object_, device(), name, param), __GET_KERNEL_WORK_GROUP_INFO_ERR); } template typename detail::param_traits::param_type getWorkGroupInfo(const Device& device, cl_int* err = NULL) const { typename detail::param_traits< detail::cl_kernel_work_group_info, name>::param_type param; cl_int result = getWorkGroupInfo(device, name, ¶m); if (err != NULL) { *err = result; } return param; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if defined(CL_HPP_USE_CL_SUB_GROUPS_KHR) cl_int getSubGroupInfo(const cl::Device &dev, cl_kernel_sub_group_info name, const cl::NDRange &range, size_type* param) const { typedef clGetKernelSubGroupInfoKHR_fn PFN_clGetKernelSubGroupInfoKHR; static PFN_clGetKernelSubGroupInfoKHR pfn_clGetKernelSubGroupInfoKHR = NULL; CL_HPP_INIT_CL_EXT_FCN_PTR_(clGetKernelSubGroupInfoKHR); return detail::errHandler( pfn_clGetKernelSubGroupInfoKHR(object_, dev(), name, range.size(), range.get(), sizeof(size_type), param, nullptr), __GET_KERNEL_ARG_INFO_ERR); } template size_type getSubGroupInfo(const cl::Device &dev, const cl::NDRange &range, cl_int* err = NULL) const { size_type param; cl_int result = getSubGroupInfo(dev, name, range, ¶m); if (err != NULL) { *err = result; } return param; } #endif // #if defined(CL_HPP_USE_CL_SUB_GROUPS_KHR) #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief setArg overload taking a shared_ptr type */ template cl_int setArg(cl_uint index, const cl::pointer &argPtr) { return detail::errHandler( ::clSetKernelArgSVMPointer(object_, index, argPtr.get()), __SET_KERNEL_ARGS_ERR); } /*! \brief setArg overload taking a vector type. */ template cl_int setArg(cl_uint index, const cl::vector &argPtr) { return detail::errHandler( ::clSetKernelArgSVMPointer(object_, index, argPtr.data()), __SET_KERNEL_ARGS_ERR); } /*! 
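 * (Editor's note) For SVM arguments, the overloads above and below map to
 * clSetKernelArgSVMPointer() rather than clSetKernelArg(). A sketch,
 * assuming ptr and svmVector are hypothetical allocations obtained via
 * cl::allocate_svm and a cl::vector with an SVMAllocator respectively:
 * \code
 * kernel.setArg(0, ptr);        // cl::pointer (shared SVM) overload
 * kernel.setArg(1, svmVector);  // SVM-backed cl::vector overload
 * \endcode
 *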
\brief setArg overload taking a pointer type */ template typename std::enable_if::value, cl_int>::type setArg(cl_uint index, const T argPtr) { return detail::errHandler( ::clSetKernelArgSVMPointer(object_, index, argPtr), __SET_KERNEL_ARGS_ERR); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! \brief setArg overload taking a POD type */ template typename std::enable_if::value, cl_int>::type setArg(cl_uint index, const T &value) { return detail::errHandler( ::clSetKernelArg( object_, index, detail::KernelArgumentHandler::size(value), detail::KernelArgumentHandler::ptr(value)), __SET_KERNEL_ARGS_ERR); } cl_int setArg(cl_uint index, size_type size, const void* argPtr) { return detail::errHandler( ::clSetKernelArg(object_, index, size, argPtr), __SET_KERNEL_ARGS_ERR); } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /*! * Specify a vector of SVM pointers that the kernel may access in * addition to its arguments. */ cl_int setSVMPointers(const vector &pointerList) { return detail::errHandler( ::clSetKernelExecInfo( object_, CL_KERNEL_EXEC_INFO_SVM_PTRS, sizeof(void*)*pointerList.size(), pointerList.data())); } /*! * Specify a std::array of SVM pointers that the kernel may access in * addition to its arguments. */ template cl_int setSVMPointers(const std::array &pointerList) { return detail::errHandler( ::clSetKernelExecInfo( object_, CL_KERNEL_EXEC_INFO_SVM_PTRS, sizeof(void*)*pointerList.size(), pointerList.data())); } /*! \brief Enable fine-grained system SVM. * * \note It is only possible to enable fine-grained system SVM if all devices * in the context associated with kernel support it. * * \param svmEnabled True if fine-grained system SVM is requested. False otherwise. * \return CL_SUCCESS if the function was executed succesfully. CL_INVALID_OPERATION * if no devices in the context support fine-grained system SVM. * * \see clSetKernelExecInfo */ cl_int enableFineGrainedSystemSVM(bool svmEnabled) { cl_bool svmEnabled_ = svmEnabled ? CL_TRUE : CL_FALSE; return detail::errHandler( ::clSetKernelExecInfo( object_, CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM, sizeof(cl_bool), &svmEnabled_ ) ); } template void setSVMPointersHelper(std::array &pointerList, const pointer &t0, Ts... ts) { pointerList[index] = static_cast(t0.get()); setSVMPointersHelper(ts...); } template typename std::enable_if::value, void>::type setSVMPointersHelper(std::array &pointerList, T0 t0, Ts... ts) { pointerList[index] = static_cast(t0); setSVMPointersHelper(ts...); } template void setSVMPointersHelper(std::array &pointerList, const pointer &t0) { pointerList[index] = static_cast(t0.get()); } template typename std::enable_if::value, void>::type setSVMPointersHelper(std::array &pointerList, T0 t0) { pointerList[index] = static_cast(t0); } template cl_int setSVMPointers(const T0 &t0, Ts... ts) { std::array pointerList; setSVMPointersHelper<0, 1 + sizeof...(Ts)>(pointerList, t0, ts...); return detail::errHandler( ::clSetKernelExecInfo( object_, CL_KERNEL_EXEC_INFO_SVM_PTRS, sizeof(void*)*(1 + sizeof...(Ts)), pointerList.data())); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 }; /*! \class Program * \brief Program interface that implements cl_program. 
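 *
 * A build-and-run sketch (editor's illustration; kernelSource and
 * "vector_add" are hypothetical):
 * \code
 * cl_int err;
 * cl::Program program(kernelSource, false, &err);
 * err = program.build("-cl-std=CL2.0");
 * cl::Kernel kernel(program, "vector_add", &err);
 * \endcode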
*/ class Program : public detail::Wrapper { public: #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) typedef vector> Binaries; typedef vector Sources; #else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) typedef vector > Binaries; typedef vector > Sources; #endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) Program( const string& source, bool build = false, cl_int* err = NULL) { cl_int error; const char * strings = source.c_str(); const size_type length = source.size(); Context context = Context::getDefault(err); object_ = ::clCreateProgramWithSource( context(), (cl_uint)1, &strings, &length, &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR); if (error == CL_SUCCESS && build) { error = ::clBuildProgram( object_, 0, NULL, #if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD) "-cl-std=CL2.0", #else "", #endif // #if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD) NULL, NULL); detail::buildErrHandler(error, __BUILD_PROGRAM_ERR, getBuildInfo()); } if (err != NULL) { *err = error; } } Program( const Context& context, const string& source, bool build = false, cl_int* err = NULL) { cl_int error; const char * strings = source.c_str(); const size_type length = source.size(); object_ = ::clCreateProgramWithSource( context(), (cl_uint)1, &strings, &length, &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR); if (error == CL_SUCCESS && build) { error = ::clBuildProgram( object_, 0, NULL, #if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD) "-cl-std=CL2.0", #else "", #endif // #if !defined(CL_HPP_CL_1_2_DEFAULT_BUILD) NULL, NULL); detail::buildErrHandler(error, __BUILD_PROGRAM_ERR, getBuildInfo()); } if (err != NULL) { *err = error; } } /** * Create a program from a vector of source strings and the default context. * Does not compile or link the program. */ Program( const Sources& sources, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(err); const size_type n = (size_type)sources.size(); vector lengths(n); vector strings(n); for (size_type i = 0; i < n; ++i) { #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) strings[i] = sources[(int)i].data(); lengths[i] = sources[(int)i].length(); #else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) strings[i] = sources[(int)i].first; lengths[i] = sources[(int)i].second; #endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) } object_ = ::clCreateProgramWithSource( context(), (cl_uint)n, strings.data(), lengths.data(), &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR); if (err != NULL) { *err = error; } } /** * Create a program from a vector of source strings and a provided context. * Does not compile or link the program. 
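 *
 * A sketch combining several source fragments (editor's illustration;
 * assumes the default string-based Sources typedef, and the two string
 * variables are hypothetical):
 * \code
 * cl::Program::Sources sources;
 * sources.push_back(commonHelpersSrc);
 * sources.push_back(mainKernelSrc);
 * cl::Program program(context, sources);
 * program.build();
 * \endcode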
*/ Program( const Context& context, const Sources& sources, cl_int* err = NULL) { cl_int error; const size_type n = (size_type)sources.size(); vector lengths(n); vector strings(n); for (size_type i = 0; i < n; ++i) { #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) strings[i] = sources[(int)i].data(); lengths[i] = sources[(int)i].length(); #else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) strings[i] = sources[(int)i].first; lengths[i] = sources[(int)i].second; #endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) } object_ = ::clCreateProgramWithSource( context(), (cl_uint)n, strings.data(), lengths.data(), &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_SOURCE_ERR); if (err != NULL) { *err = error; } } /** * Construct a program object from a list of devices and a per-device list of binaries. * \param context A valid OpenCL context in which to construct the program. * \param devices A vector of OpenCL device objects for which the program will be created. * \param binaries A vector of pairs of a pointer to a binary object and its length. * \param binaryStatus An optional vector that on completion will be resized to * match the size of binaries and filled with values to specify if each binary * was successfully loaded. * Set to CL_SUCCESS if the binary was successfully loaded. * Set to CL_INVALID_VALUE if the length is 0 or the binary pointer is NULL. * Set to CL_INVALID_BINARY if the binary provided is not valid for the matching device. * \param err if non-NULL will be set to CL_SUCCESS on successful operation or one of the following errors: * CL_INVALID_CONTEXT if context is not a valid context. * CL_INVALID_VALUE if the length of devices is zero; or if the length of binaries does not match the length of devices; * or if any entry in binaries is NULL or has length 0. * CL_INVALID_DEVICE if OpenCL devices listed in devices are not in the list of devices associated with context. * CL_INVALID_BINARY if an invalid program binary was encountered for any device. binaryStatus will return specific status for each device. * CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host. 
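 *
 * A round-trip sketch (editor's illustration; assumes the default
 * Binaries typedef, and that built is a hypothetical program already
 * built for the same devices): binaries captured from one program can
 * be used to skip recompilation later.
 * \code
 * cl::Program::Binaries bins = built.getInfo<CL_PROGRAM_BINARIES>();
 * std::vector<cl_int> status;
 * cl::Program restored(context, devices, bins, &status, &err);
 * \endcode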
*/ Program( const Context& context, const vector& devices, const Binaries& binaries, vector* binaryStatus = NULL, cl_int* err = NULL) { cl_int error; const size_type numDevices = devices.size(); // Catch size mismatch early and return if(binaries.size() != numDevices) { error = CL_INVALID_VALUE; detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR); if (err != NULL) { *err = error; } return; } vector lengths(numDevices); vector images(numDevices); #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) for (size_type i = 0; i < numDevices; ++i) { images[i] = binaries[i].data(); lengths[i] = binaries[(int)i].size(); } #else // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) for (size_type i = 0; i < numDevices; ++i) { images[i] = (const unsigned char*)binaries[i].first; lengths[i] = binaries[(int)i].second; } #endif // #if !defined(CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY) vector deviceIDs(numDevices); for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } if(binaryStatus) { binaryStatus->resize(numDevices); } object_ = ::clCreateProgramWithBinary( context(), (cl_uint) devices.size(), deviceIDs.data(), lengths.data(), images.data(), (binaryStatus != NULL && numDevices > 0) ? &binaryStatus->front() : NULL, &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_BINARY_ERR); if (err != NULL) { *err = error; } } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Create program using builtin kernels. * \param kernelNames Semi-colon separated list of builtin kernel names */ Program( const Context& context, const vector& devices, const string& kernelNames, cl_int* err = NULL) { cl_int error; size_type numDevices = devices.size(); vector deviceIDs(numDevices); for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } object_ = ::clCreateProgramWithBuiltInKernels( context(), (cl_uint) devices.size(), deviceIDs.data(), kernelNames.c_str(), &error); detail::errHandler(error, __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR); if (err != NULL) { *err = error; } } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 Program() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. */ explicit Program(const cl_program& program, bool retainObject = false) : detail::Wrapper(program, retainObject) { } Program& operator = (const cl_program& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ Program(const Program& program) : detail::Wrapper(program) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ Program& operator = (const Program &program) { detail::Wrapper::operator=(program); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ Program(Program&& program) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(program)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
*/ Program& operator = (Program &&program) { detail::Wrapper::operator=(std::move(program)); return *this; } cl_int build( const vector& devices, const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL) const { size_type numDevices = devices.size(); vector deviceIDs(numDevices); for( size_type deviceIndex = 0; deviceIndex < numDevices; ++deviceIndex ) { deviceIDs[deviceIndex] = (devices[deviceIndex])(); } cl_int buildError = ::clBuildProgram( object_, (cl_uint) devices.size(), deviceIDs.data(), options, notifyFptr, data); return detail::buildErrHandler(buildError, __BUILD_PROGRAM_ERR, getBuildInfo()); } cl_int build( const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL) const { cl_int buildError = ::clBuildProgram( object_, 0, NULL, options, notifyFptr, data); return detail::buildErrHandler(buildError, __BUILD_PROGRAM_ERR, getBuildInfo()); } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_int compile( const char* options = NULL, void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL, void* data = NULL) const { cl_int error = ::clCompileProgram( object_, 0, NULL, options, 0, NULL, NULL, notifyFptr, data); return detail::buildErrHandler(error, __COMPILE_PROGRAM_ERR, getBuildInfo()); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 template cl_int getInfo(cl_program_info name, T* param) const { return detail::errHandler( detail::getInfo(&::clGetProgramInfo, object_, name, param), __GET_PROGRAM_INFO_ERR); } template typename detail::param_traits::param_type getInfo(cl_int* err = NULL) const { typename detail::param_traits< detail::cl_program_info, name>::param_type param; cl_int result = getInfo(name, ¶m); if (err != NULL) { *err = result; } return param; } template cl_int getBuildInfo( const Device& device, cl_program_build_info name, T* param) const { return detail::errHandler( detail::getInfo( &::clGetProgramBuildInfo, object_, device(), name, param), __GET_PROGRAM_BUILD_INFO_ERR); } template typename detail::param_traits::param_type getBuildInfo(const Device& device, cl_int* err = NULL) const { typename detail::param_traits< detail::cl_program_build_info, name>::param_type param; cl_int result = getBuildInfo(device, name, ¶m); if (err != NULL) { *err = result; } return param; } /** * Build info function that returns a vector of device/info pairs for the specified * info type and for all devices in the program. * On an error reading the info for any device, an empty vector of info will be returned. 
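 *
 * A diagnostic sketch (editor's illustration): dumping the build log for
 * every device after a failed build.
 * \code
 * cl_int err;
 * auto logs = program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(&err);
 * for (const auto &entry : logs) {
 *     std::cerr << entry.second << std::endl; // entry.first is the cl::Device
 * }
 * \endcode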
     */
    template <cl_int name>
    vector<std::pair<cl::Device, typename detail::param_traits<detail::cl_program_build_info, name>::param_type>>
        getBuildInfo(cl_int *err = NULL) const
    {
        cl_int result = CL_SUCCESS;

        auto devs = getInfo<CL_PROGRAM_DEVICES>(&result);
        vector<std::pair<cl::Device, typename detail::param_traits<detail::cl_program_build_info, name>::param_type>>
            devInfo;

        // If there was an initial error from getInfo return the error
        if (result != CL_SUCCESS) {
            if (err != NULL) {
                *err = result;
            }
            return devInfo;
        }

        for (const cl::Device &d : devs) {
            typename detail::param_traits<
                detail::cl_program_build_info, name>::param_type param;
            result = getBuildInfo(d, name, &param);
            devInfo.push_back(
                std::pair<cl::Device, typename detail::param_traits<detail::cl_program_build_info, name>::param_type>
                (d, param));
            if (result != CL_SUCCESS) {
                // On error, leave the loop and return the error code
                break;
            }
        }
        if (err != NULL) {
            *err = result;
        }
        if (result != CL_SUCCESS) {
            devInfo.clear();
        }
        return devInfo;
    }

    cl_int createKernels(vector<Kernel>* kernels)
    {
        cl_uint numKernels;
        cl_int err = ::clCreateKernelsInProgram(object_, 0, NULL, &numKernels);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR);
        }

        vector<cl_kernel> value(numKernels);

        err = ::clCreateKernelsInProgram(
            object_, numKernels, value.data(), NULL);
        if (err != CL_SUCCESS) {
            return detail::errHandler(err, __CREATE_KERNELS_IN_PROGRAM_ERR);
        }

        if (kernels) {
            kernels->resize(value.size());

            // Assign to param, constructing with retain behaviour
            // to correctly capture each underlying CL object
            for (size_type i = 0; i < value.size(); i++) {
                // We do not need to retain because this kernel is being created
                // by the runtime
                (*kernels)[i] = Kernel(value[i], false);
            }
        }
        return CL_SUCCESS;
    }
};

#if CL_HPP_TARGET_OPENCL_VERSION >= 120
inline Program linkProgram(
    Program input1,
    Program input2,
    const char* options = NULL,
    void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
    void* data = NULL,
    cl_int* err = NULL)
{
    cl_int error_local = CL_SUCCESS;

    cl_program programs[2] = { input1(), input2() };

    Context ctx = input1.getInfo<CL_PROGRAM_CONTEXT>(&error_local);
    if(error_local!=CL_SUCCESS) {
        detail::errHandler(error_local, __LINK_PROGRAM_ERR);
    }

    cl_program prog = ::clLinkProgram(
        ctx(),
        0,
        NULL,
        options,
        2,
        programs,
        notifyFptr,
        data,
        &error_local);

    detail::errHandler(error_local,__COMPILE_PROGRAM_ERR);
    if (err != NULL) {
        *err = error_local;
    }

    return Program(prog);
}

inline Program linkProgram(
    vector<Program> inputPrograms,
    const char* options = NULL,
    void (CL_CALLBACK * notifyFptr)(cl_program, void *) = NULL,
    void* data = NULL,
    cl_int* err = NULL)
{
    cl_int error_local = CL_SUCCESS;

    vector<cl_program> programs(inputPrograms.size());

    for (unsigned int i = 0; i < inputPrograms.size(); i++) {
        programs[i] = inputPrograms[i]();
    }

    Context ctx;
    if(inputPrograms.size() > 0) {
        ctx = inputPrograms[0].getInfo<CL_PROGRAM_CONTEXT>(&error_local);
        if(error_local!=CL_SUCCESS) {
            detail::errHandler(error_local, __LINK_PROGRAM_ERR);
        }
    }
    cl_program prog = ::clLinkProgram(
        ctx(),
        0,
        NULL,
        options,
        (cl_uint)inputPrograms.size(),
        programs.data(),
        notifyFptr,
        data,
        &error_local);

    detail::errHandler(error_local,__COMPILE_PROGRAM_ERR);
    if (err != NULL) {
        *err = error_local;
    }

    return Program(prog, false);
}
#endif // CL_HPP_TARGET_OPENCL_VERSION >= 120

// Template specialization for CL_PROGRAM_BINARIES
template <>
inline cl_int cl::Program::getInfo(cl_program_info name, vector<vector<unsigned char>>* param) const
{
    if (name != CL_PROGRAM_BINARIES) {
        return CL_INVALID_VALUE;
    }
    if (param) {
        // Resize the parameter array appropriately for each allocation
        // and pass down to the helper

        vector<size_type> sizes = getInfo<CL_PROGRAM_BINARY_SIZES>();
        size_type numBinaries = sizes.size();

        // Resize the parameter array and constituent arrays
        param->resize(numBinaries);
        for (size_type i = 0; i < numBinaries; ++i) {
            (*param)[i].resize(sizes[i]);
        }

        return detail::errHandler(
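/* Editor's note (not part of the original header): a minimal usage sketch of
 * the build/getBuildInfo/createKernels flow defined above, assuming exceptions
 * are disabled and the default cl::vector configuration. `src` is a
 * hypothetical kernel source string.
 *
 * \code
 * cl::Context ctx = cl::Context::getDefault();
 * cl::Program prog(ctx, src);
 * if (prog.build("-cl-std=CL1.2") != CL_SUCCESS) {
 *     // One (device, log) pair per device in the program:
 *     for (auto &devLog : prog.getBuildInfo<CL_PROGRAM_BUILD_LOG>()) {
 *         std::cerr << devLog.second << "\n";
 *     }
 * }
 * cl::vector<cl::Kernel> kernels;
 * prog.createKernels(&kernels); // one cl::Kernel per __kernel function
 * \endcode
 */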
detail::getInfo(&::clGetProgramInfo, object_, name, param), __GET_PROGRAM_INFO_ERR); } return CL_SUCCESS; } template<> inline vector> cl::Program::getInfo(cl_int* err) const { vector> binariesVectors; cl_int result = getInfo(CL_PROGRAM_BINARIES, &binariesVectors); if (err != NULL) { *err = result; } return binariesVectors; } inline Kernel::Kernel(const Program& program, const char* name, cl_int* err) { cl_int error; object_ = ::clCreateKernel(program(), name, &error); detail::errHandler(error, __CREATE_KERNEL_ERR); if (err != NULL) { *err = error; } } enum class QueueProperties : cl_command_queue_properties { None = 0, Profiling = CL_QUEUE_PROFILING_ENABLE, OutOfOrder = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, }; inline QueueProperties operator|(QueueProperties lhs, QueueProperties rhs) { return static_cast(static_cast(lhs) | static_cast(rhs)); } /*! \class CommandQueue * \brief CommandQueue interface for cl_command_queue. */ class CommandQueue : public detail::Wrapper { private: static std::once_flag default_initialized_; static CommandQueue default_; static cl_int default_error_; /*! \brief Create the default command queue returned by @ref getDefault. * * It sets default_error_ to indicate success or failure. It does not throw * @c cl::Error. */ static void makeDefault() { /* We don't want to throw an error from this function, so we have to * catch and set the error flag. */ #if defined(CL_HPP_ENABLE_EXCEPTIONS) try #endif { int error; Context context = Context::getDefault(&error); if (error != CL_SUCCESS) { default_error_ = error; } else { Device device = Device::getDefault(); default_ = CommandQueue(context, device, 0, &default_error_); } } #if defined(CL_HPP_ENABLE_EXCEPTIONS) catch (cl::Error &e) { default_error_ = e.err(); } #endif } /*! \brief Create the default command queue. * * This sets @c default_. It does not throw * @c cl::Error. */ static void makeDefaultProvided(const CommandQueue &c) { default_ = c; } public: #ifdef CL_HPP_UNIT_TEST_ENABLE /*! \brief Reset the default. * * This sets @c default_ to an empty value to support cleanup in * the unit test framework. * This function is not thread safe. */ static void unitTestClearDefault() { default_ = CommandQueue(); } #endif // #ifdef CL_HPP_UNIT_TEST_ENABLE /*! * \brief Constructs a CommandQueue based on passed properties. * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ CommandQueue( cl_command_queue_properties properties, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } } else { Device device = context.getInfo()[0]; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; if ((properties & CL_QUEUE_ON_DEVICE) == 0) { object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); } else { error = CL_INVALID_QUEUE_PROPERTIES; } detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } } /*! * \brief Constructs a CommandQueue based on passed properties. * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. 
*/ CommandQueue( QueueProperties properties, cl_int* err = NULL) { cl_int error; Context context = Context::getDefault(&error); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } } else { Device device = context.getInfo()[0]; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, static_cast(properties), 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), static_cast(properties), &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } } /*! * \brief Constructs a CommandQueue for an implementation defined device in the given context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ explicit CommandQueue( const Context& context, cl_command_queue_properties properties = 0, cl_int* err = NULL) { cl_int error; vector devices; error = context.getInfo(CL_CONTEXT_DEVICES, &devices); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } return; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; if ((properties & CL_QUEUE_ON_DEVICE) == 0) { object_ = ::clCreateCommandQueueWithProperties( context(), devices[0](), queue_properties, &error); } else { error = CL_INVALID_QUEUE_PROPERTIES; } detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), devices[0](), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } /*! * \brief Constructs a CommandQueue for an implementation defined device in the given context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ explicit CommandQueue( const Context& context, QueueProperties properties, cl_int* err = NULL) { cl_int error; vector devices; error = context.getInfo(CL_CONTEXT_DEVICES, &devices); detail::errHandler(error, __CREATE_CONTEXT_ERR); if (error != CL_SUCCESS) { if (err != NULL) { *err = error; } return; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, static_cast(properties), 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), devices[0](), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), devices[0](), static_cast(properties), &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } /*! * \brief Constructs a CommandQueue for a passed device and context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. 
*/ CommandQueue( const Context& context, const Device& device, cl_command_queue_properties properties = 0, cl_int* err = NULL) { cl_int error; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } /*! * \brief Constructs a CommandQueue for a passed device and context * Will return an CL_INVALID_QUEUE_PROPERTIES error if CL_QUEUE_ON_DEVICE is specified. */ CommandQueue( const Context& context, const Device& device, QueueProperties properties, cl_int* err = NULL) { cl_int error; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, static_cast(properties), 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } #else object_ = ::clCreateCommandQueue( context(), device(), static_cast(properties), &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_ERR); if (err != NULL) { *err = error; } #endif } static CommandQueue getDefault(cl_int * err = NULL) { std::call_once(default_initialized_, makeDefault); #if CL_HPP_TARGET_OPENCL_VERSION >= 200 detail::errHandler(default_error_, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); #else // CL_HPP_TARGET_OPENCL_VERSION >= 200 detail::errHandler(default_error_, __CREATE_COMMAND_QUEUE_ERR); #endif // CL_HPP_TARGET_OPENCL_VERSION >= 200 if (err != NULL) { *err = default_error_; } return default_; } /** * Modify the default command queue to be used by * subsequent operations. * Will only set the default if no default was previously created. * @return updated default command queue. * Should be compared to the passed value to ensure that it was updated. */ static CommandQueue setDefault(const CommandQueue &default_queue) { std::call_once(default_initialized_, makeDefaultProvided, std::cref(default_queue)); detail::errHandler(default_error_); return default_; } CommandQueue() { } /*! \brief Constructor from cl_mem - takes ownership. * * \param retainObject will cause the constructor to retain its cl object. * Defaults to false to maintain compatibility with * earlier versions. */ explicit CommandQueue(const cl_command_queue& commandQueue, bool retainObject = false) : detail::Wrapper(commandQueue, retainObject) { } CommandQueue& operator = (const cl_command_queue& rhs) { detail::Wrapper::operator=(rhs); return *this; } /*! \brief Copy constructor to forward copy to the superclass correctly. * Required for MSVC. */ CommandQueue(const CommandQueue& queue) : detail::Wrapper(queue) {} /*! \brief Copy assignment to forward copy to the superclass correctly. * Required for MSVC. */ CommandQueue& operator = (const CommandQueue &queue) { detail::Wrapper::operator=(queue); return *this; } /*! \brief Move constructor to forward move to the superclass correctly. * Required for MSVC. */ CommandQueue(CommandQueue&& queue) CL_HPP_NOEXCEPT_ : detail::Wrapper(std::move(queue)) {} /*! \brief Move assignment to forward move to the superclass correctly. * Required for MSVC. 
     */
    CommandQueue& operator = (CommandQueue &&queue)
    {
        detail::Wrapper<cl_type>::operator=(std::move(queue));
        return *this;
    }

    template <typename T>
    cl_int getInfo(cl_command_queue_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(
                &::clGetCommandQueueInfo, object_, name, param),
                __GET_COMMAND_QUEUE_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_command_queue_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_command_queue_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    cl_int enqueueReadBuffer(
        const Buffer& buffer,
        cl_bool blocking,
        size_type offset,
        size_type size,
        void* ptr,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadBuffer(
                object_, buffer(), blocking, offset, size,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQUEUE_READ_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueWriteBuffer(
        const Buffer& buffer,
        cl_bool blocking,
        size_type offset,
        size_type size,
        const void* ptr,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueWriteBuffer(
                object_, buffer(), blocking, offset, size,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
                __ENQUEUE_WRITE_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueCopyBuffer(
        const Buffer& src,
        const Buffer& dst,
        size_type src_offset,
        size_type dst_offset,
        size_type size,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueCopyBuffer(
                object_, src(), dst(), src_offset, dst_offset, size,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ? &tmp : NULL),
            __ENQEUE_COPY_BUFFER_ERR);

        if (event != NULL && err == CL_SUCCESS)
            *event = tmp;

        return err;
    }

    cl_int enqueueReadBufferRect(
        const Buffer& buffer,
        cl_bool blocking,
        const array<size_type, 3>& buffer_offset,
        const array<size_type, 3>& host_offset,
        const array<size_type, 3>& region,
        size_type buffer_row_pitch,
        size_type buffer_slice_pitch,
        size_type host_row_pitch,
        size_type host_slice_pitch,
        void *ptr,
        const vector<Event>* events = NULL,
        Event* event = NULL) const
    {
        cl_event tmp;
        cl_int err = detail::errHandler(
            ::clEnqueueReadBufferRect(
                object_,
                buffer(),
                blocking,
                buffer_offset.data(),
                host_offset.data(),
                region.data(),
                buffer_row_pitch,
                buffer_slice_pitch,
                host_row_pitch,
                host_slice_pitch,
                ptr,
                (events != NULL) ? (cl_uint) events->size() : 0,
                (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL,
                (event != NULL) ?
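/* Editor's note (not part of the original header): a short sketch of queue
 * creation and the enqueueRead/WriteBuffer members above. `ctx`, `q`, and the
 * host vector are hypothetical; CL_TRUE makes both transfers blocking.
 *
 * \code
 * cl::Context ctx = cl::Context::getDefault();
 * cl::CommandQueue q(ctx, cl::QueueProperties::Profiling);
 * std::vector<int> host(256, 42);
 * cl::Buffer buf(ctx, CL_MEM_READ_WRITE, host.size() * sizeof(int));
 * q.enqueueWriteBuffer(buf, CL_TRUE, 0, host.size() * sizeof(int), host.data());
 * q.enqueueReadBuffer (buf, CL_TRUE, 0, host.size() * sizeof(int), host.data());
 * \endcode
 */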
&tmp : NULL), __ENQUEUE_READ_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueWriteBufferRect( const Buffer& buffer, cl_bool blocking, const array& buffer_offset, const array& host_offset, const array& region, size_type buffer_row_pitch, size_type buffer_slice_pitch, size_type host_row_pitch, size_type host_slice_pitch, const void *ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueWriteBufferRect( object_, buffer(), blocking, buffer_offset.data(), host_offset.data(), region.data(), buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_WRITE_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyBufferRect( const Buffer& src, const Buffer& dst, const array& src_origin, const array& dst_origin, const array& region, size_type src_row_pitch, size_type src_slice_pitch, size_type dst_row_pitch, size_type dst_slice_pitch, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyBufferRect( object_, src(), dst(), src_origin.data(), dst_origin.data(), region.data(), src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQEUE_COPY_BUFFER_RECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Enqueue a command to fill a buffer object with a pattern * of a given size. The pattern is specified as a vector type. * \tparam PatternType The datatype of the pattern field. * The pattern type must be an accepted OpenCL data type. * \tparam offset Is the offset in bytes into the buffer at * which to start filling. This must be a multiple of * the pattern size. * \tparam size Is the size in bytes of the region to fill. * This must be a multiple of the pattern size. */ template cl_int enqueueFillBuffer( const Buffer& buffer, PatternType pattern, size_type offset, size_type size, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillBuffer( object_, buffer(), static_cast(&pattern), sizeof(PatternType), offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_int enqueueReadImage( const Image& image, cl_bool blocking, const array& origin, const array& region, size_type row_pitch, size_type slice_pitch, void* ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReadImage( object_, image(), blocking, origin.data(), region.data(), row_pitch, slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? 
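/* Editor's note (not part of the original header): a sketch of the
 * enqueueFillBuffer template above (OpenCL 1.2+). The pattern's C++ type fixes
 * the pattern size, and offset/size must be multiples of it; `q` and `buf`
 * are hypothetical.
 *
 * \code
 * cl_float zero = 0.0f;
 * q.enqueueFillBuffer(buf, zero, 0, 1024 * sizeof(cl_float));
 * q.finish();
 * \endcode
 */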
&tmp : NULL), __ENQUEUE_READ_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueWriteImage( const Image& image, cl_bool blocking, const array& origin, const array& region, size_type row_pitch, size_type slice_pitch, const void* ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueWriteImage( object_, image(), blocking, origin.data(), region.data(), row_pitch, slice_pitch, ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_WRITE_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyImage( const Image& src, const Image& dst, const array& src_origin, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyImage( object_, src(), dst(), src_origin.data(), dst_origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA floating-point color value if * the image channel data type is not an unnormalized signed or * unsigned data type. */ cl_int enqueueFillImage( const Image& image, cl_float4 fillColor, const array& origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA signed integer color value if * the image channel data type is an unnormalized signed integer * type. */ cl_int enqueueFillImage( const Image& image, cl_int4 fillColor, const array& origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueue a command to fill an image object with a specified color. * \param fillColor is the color to use to fill the image. * This is a four component RGBA unsigned integer color value if * the image channel data type is an unnormalized unsigned integer * type. 
*/ cl_int enqueueFillImage( const Image& image, cl_uint4 fillColor, const array& origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueFillImage( object_, image(), static_cast(&fillColor), origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_FILL_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_int enqueueCopyImageToBuffer( const Image& src, const Buffer& dst, const array& src_origin, const array& region, size_type dst_offset, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyImageToBuffer( object_, src(), dst(), src_origin.data(), region.data(), dst_offset, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueCopyBufferToImage( const Buffer& src, const Image& dst, size_type src_offset, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueCopyBufferToImage( object_, src(), dst(), src_offset, dst_origin.data(), region.data(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } void* enqueueMapBuffer( const Buffer& buffer, cl_bool blocking, cl_map_flags flags, size_type offset, size_type size, const vector* events = NULL, Event* event = NULL, cl_int* err = NULL) const { cl_event tmp; cl_int error; void * result = ::clEnqueueMapBuffer( object_, buffer(), blocking, flags, offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL, &error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } if (event != NULL && error == CL_SUCCESS) *event = tmp; return result; } void* enqueueMapImage( const Image& buffer, cl_bool blocking, cl_map_flags flags, const array& origin, const array& region, size_type * row_pitch, size_type * slice_pitch, const vector* events = NULL, Event* event = NULL, cl_int* err = NULL) const { cl_event tmp; cl_int error; void * result = ::clEnqueueMapImage( object_, buffer(), blocking, flags, origin.data(), region.data(), row_pitch, slice_pitch, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL, &error); detail::errHandler(error, __ENQUEUE_MAP_IMAGE_ERR); if (err != NULL) { *err = error; } if (event != NULL && error == CL_SUCCESS) *event = tmp; return result; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues a command that will allow the host to update a region of a coarse-grained SVM buffer. * This variant takes a raw SVM pointer. 
*/ template cl_int enqueueMapSVM( T* ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler(::clEnqueueSVMMap( object_, blocking, flags, static_cast(ptr), size, (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MAP_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will allow the host to update a region of a coarse-grained SVM buffer. * This variant takes a cl::pointer instance. */ template cl_int enqueueMapSVM( cl::pointer &ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler(::clEnqueueSVMMap( object_, blocking, flags, static_cast(ptr.get()), size, (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MAP_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will allow the host to update a region of a coarse-grained SVM buffer. * This variant takes a cl::vector instance. */ template cl_int enqueueMapSVM( cl::vector &container, cl_bool blocking, cl_map_flags flags, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler(::clEnqueueSVMMap( object_, blocking, flags, static_cast(container.data()), container.size(), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MAP_BUFFER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_int enqueueUnmapMemObject( const Memory& memory, void* mapped_ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueUnmapMemObject( object_, memory(), mapped_ptr, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues a command that will release a coarse-grained SVM buffer back to the OpenCL runtime. * This variant takes a raw SVM pointer. */ template cl_int enqueueUnmapSVM( T* ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueSVMUnmap( object_, static_cast(ptr), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will release a coarse-grained SVM buffer back to the OpenCL runtime. * This variant takes a cl::pointer instance. */ template cl_int enqueueUnmapSVM( cl::pointer &ptr, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueSVMUnmap( object_, static_cast(ptr.get()), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? 
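/* Editor's note (not part of the original header): a sketch of pairing the
 * coarse-grained SVM map/unmap members above (OpenCL 2.0+). The allocator,
 * container, and queue `q` are hypothetical; the host should only touch the
 * container between the map and the unmap.
 *
 * \code
 * using SvmAlloc = cl::SVMAllocator<int, cl::SVMTraitCoarse<>>;
 * cl::vector<int, SvmAlloc> vec(256, 0, SvmAlloc());
 * q.enqueueMapSVM(vec, CL_TRUE, CL_MAP_WRITE); // host may now write vec
 * std::fill(vec.begin(), vec.end(), 7);
 * q.enqueueUnmapSVM(vec);                      // hand ownership back to the device
 * q.finish();
 * \endcode
 */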
(cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command that will release a coarse-grained SVM buffer back to the OpenCL runtime. * This variant takes a cl::vector instance. */ template cl_int enqueueUnmapSVM( cl::vector &container, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueSVMUnmap( object_, static_cast(container.data()), (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if CL_HPP_TARGET_OPENCL_VERSION >= 120 /** * Enqueues a marker command which waits for either a list of events to complete, * or all previously enqueued commands to complete. * * Enqueues a marker command which waits for either a list of events to complete, * or if the list is empty it waits for all commands previously enqueued in command_queue * to complete before it completes. This command returns an event which can be waited on, * i.e. this event can be waited on to insure that all events either in the event_wait_list * or all previously enqueued commands, queued before this command to command_queue, * have completed. */ cl_int enqueueMarkerWithWaitList( const vector *events = 0, Event *event = 0) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueMarkerWithWaitList( object_, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_MARKER_WAIT_LIST_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * A synchronization point that enqueues a barrier operation. * * Enqueues a barrier command which waits for either a list of events to complete, * or if the list is empty it waits for all commands previously enqueued in command_queue * to complete before it completes. This command blocks command execution, that is, any * following commands enqueued after it do not execute until it completes. This command * returns an event which can be waited on, i.e. this event can be waited on to insure that * all events either in the event_wait_list or all previously enqueued commands, queued * before this command to command_queue, have completed. */ cl_int enqueueBarrierWithWaitList( const vector *events = 0, Event *event = 0) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueBarrierWithWaitList( object_, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_BARRIER_WAIT_LIST_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Enqueues a command to indicate with which device a set of memory objects * should be associated. */ cl_int enqueueMigrateMemObjects( const vector &memObjects, cl_mem_migration_flags flags, const vector* events = NULL, Event* event = NULL ) const { cl_event tmp; vector localMemObjects(memObjects.size()); for( int i = 0; i < (int)memObjects.size(); ++i ) { localMemObjects[i] = memObjects[i](); } cl_int err = detail::errHandler( ::clEnqueueMigrateMemObjects( object_, (cl_uint)memObjects.size(), localMemObjects.data(), flags, (events != NULL) ? 
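/* Editor's note (not part of the original header): a sketch distinguishing the
 * two 1.2 synchronization commands above. A marker only produces an event that
 * completes with prior work; a barrier additionally blocks commands enqueued
 * after it. `q` is a hypothetical queue.
 *
 * \code
 * cl::Event marker;
 * q.enqueueMarkerWithWaitList(NULL, &marker); // completes when earlier commands do
 * q.enqueueBarrierWithWaitList();             // later commands wait at this point
 * marker.wait();
 * \endcode
 */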
(cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_int enqueueNDRangeKernel( const Kernel& kernel, const NDRange& offset, const NDRange& global, const NDRange& local = NullRange, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueNDRangeKernel( object_, kernel(), (cl_uint) global.dimensions(), offset.dimensions() != 0 ? (const size_type*) offset : NULL, (const size_type*) global, local.dimensions() != 0 ? (const size_type*) local : NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_NDRANGE_KERNEL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS) CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_int enqueueTask( const Kernel& kernel, const vector* events = NULL, Event* event = NULL) const CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueTask( object_, kernel(), (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_TASK_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif // #if defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS) cl_int enqueueNativeKernel( void (CL_CALLBACK *userFptr)(void *), std::pair args, const vector* mem_objects = NULL, const vector* mem_locs = NULL, const vector* events = NULL, Event* event = NULL) const { size_type elements = 0; if (mem_objects != NULL) { elements = mem_objects->size(); } vector mems(elements); for (unsigned int i = 0; i < elements; i++) { mems[i] = ((*mem_objects)[i])(); } cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueNativeKernel( object_, userFptr, args.first, args.second, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, mems.data(), (mem_locs != NULL && mem_locs->size() > 0) ? (const void **) &mem_locs->front() : NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_NATIVE_KERNEL); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueMarker(Event* event = NULL) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueMarker( object_, (event != NULL) ? &tmp : NULL), __ENQUEUE_MARKER_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueWaitForEvents(const vector& events) const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { return detail::errHandler( ::clEnqueueWaitForEvents( object_, (cl_uint) events.size(), events.size() > 0 ? (const cl_event*) &events.front() : NULL), __ENQUEUE_WAIT_FOR_EVENTS_ERR); } #endif // defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) cl_int enqueueAcquireGLObjects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueAcquireGLObjects( object_, (mem_objects != NULL) ? 
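/* Editor's note (not part of the original header): a sketch of the
 * enqueueNDRangeKernel member above. `prog`, `buf`, and the sizes are
 * hypothetical; passing cl::NullRange for the local size lets the runtime
 * choose a work-group size.
 *
 * \code
 * cl::Kernel k(prog, "f");
 * k.setArg(0, buf);
 * q.enqueueNDRangeKernel(k,
 *                        cl::NullRange,     // no global offset
 *                        cl::NDRange(1024), // global work size
 *                        cl::NDRange(64));  // explicit local work size
 * \endcode
 */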
(cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_ACQUIRE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueReleaseGLObjects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueReleaseGLObjects( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_RELEASE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if defined (CL_HPP_USE_DX_INTEROP) typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueAcquireD3D10ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event); typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clEnqueueReleaseD3D10ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event); cl_int enqueueAcquireD3D10Objects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { static PFN_clEnqueueAcquireD3D10ObjectsKHR pfn_clEnqueueAcquireD3D10ObjectsKHR = NULL; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_context context = getInfo(); cl::Device device(getInfo()); cl_platform_id platform = device.getInfo(); CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, clEnqueueAcquireD3D10ObjectsKHR); #endif #if CL_HPP_TARGET_OPENCL_VERSION >= 110 CL_HPP_INIT_CL_EXT_FCN_PTR_(clEnqueueAcquireD3D10ObjectsKHR); #endif cl_event tmp; cl_int err = detail::errHandler( pfn_clEnqueueAcquireD3D10ObjectsKHR( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL) ? (cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_ACQUIRE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } cl_int enqueueReleaseD3D10Objects( const vector* mem_objects = NULL, const vector* events = NULL, Event* event = NULL) const { static PFN_clEnqueueReleaseD3D10ObjectsKHR pfn_clEnqueueReleaseD3D10ObjectsKHR = NULL; #if CL_HPP_TARGET_OPENCL_VERSION >= 120 cl_context context = getInfo(); cl::Device device(getInfo()); cl_platform_id platform = device.getInfo(); CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_(platform, clEnqueueReleaseD3D10ObjectsKHR); #endif // CL_HPP_TARGET_OPENCL_VERSION >= 120 #if CL_HPP_TARGET_OPENCL_VERSION >= 110 CL_HPP_INIT_CL_EXT_FCN_PTR_(clEnqueueReleaseD3D10ObjectsKHR); #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 cl_event tmp; cl_int err = detail::errHandler( pfn_clEnqueueReleaseD3D10ObjectsKHR( object_, (mem_objects != NULL) ? (cl_uint) mem_objects->size() : 0, (mem_objects != NULL && mem_objects->size() > 0) ? (const cl_mem *) &mem_objects->front(): NULL, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? 
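/* Editor's note (not part of the original header): a sketch of the
 * acquire/release bracketing that both the GL and D3D10 interop entry points
 * above require: shared objects must be acquired before any CL command touches
 * them and released before the other API reuses them. `glBuf` is a
 * hypothetical cl::BufferGL.
 *
 * \code
 * cl::vector<cl::Memory> shared = { glBuf };
 * q.enqueueAcquireGLObjects(&shared);
 * // ... enqueue kernels that read or write glBuf ...
 * q.enqueueReleaseGLObjects(&shared);
 * q.finish();
 * \endcode
 */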
(cl_event*) &events->front() : NULL, (event != NULL) ? &tmp : NULL), __ENQUEUE_RELEASE_GL_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #endif /** * Deprecated APIs for 1.2 */ #if defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int enqueueBarrier() const CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED { return detail::errHandler( ::clEnqueueBarrier(object_), __ENQUEUE_BARRIER_ERR); } #endif // CL_USE_DEPRECATED_OPENCL_1_1_APIS cl_int flush() const { return detail::errHandler(::clFlush(object_), __FLUSH_ERR); } cl_int finish() const { return detail::errHandler(::clFinish(object_), __FINISH_ERR); } }; // CommandQueue CL_HPP_DEFINE_STATIC_MEMBER_ std::once_flag CommandQueue::default_initialized_; CL_HPP_DEFINE_STATIC_MEMBER_ CommandQueue CommandQueue::default_; CL_HPP_DEFINE_STATIC_MEMBER_ cl_int CommandQueue::default_error_ = CL_SUCCESS; #if CL_HPP_TARGET_OPENCL_VERSION >= 200 enum class DeviceQueueProperties : cl_command_queue_properties { None = 0, Profiling = CL_QUEUE_PROFILING_ENABLE, }; inline DeviceQueueProperties operator|(DeviceQueueProperties lhs, DeviceQueueProperties rhs) { return static_cast(static_cast(lhs) | static_cast(rhs)); } /*! \class DeviceCommandQueue * \brief DeviceCommandQueue interface for device cl_command_queues. */ class DeviceCommandQueue : public detail::Wrapper { public: /*! * Trivial empty constructor to create a null queue. */ DeviceCommandQueue() { } /*! * Default construct device command queue on default context and device */ DeviceCommandQueue(DeviceQueueProperties properties, cl_int* err = NULL) { cl_int error; cl::Context context = cl::Context::getDefault(); cl::Device device = cl::Device::getDefault(); cl_command_queue_properties mergedProperties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | static_cast(properties); cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, mergedProperties, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } } /*! * Create a device command queue for a specified device in the passed context. */ DeviceCommandQueue( const Context& context, const Device& device, DeviceQueueProperties properties = DeviceQueueProperties::None, cl_int* err = NULL) { cl_int error; cl_command_queue_properties mergedProperties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | static_cast(properties); cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, mergedProperties, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } } /*! * Create a device command queue for a specified device in the passed context. */ DeviceCommandQueue( const Context& context, const Device& device, cl_uint queueSize, DeviceQueueProperties properties = DeviceQueueProperties::None, cl_int* err = NULL) { cl_int error; cl_command_queue_properties mergedProperties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | static_cast(properties); cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, mergedProperties, CL_QUEUE_SIZE, queueSize, 0 }; object_ = ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } } /*! 
    \brief Constructor from cl_command_queue - takes ownership.
     *
     * \param retainObject will cause the constructor to retain its cl object.
     *                     Defaults to false to maintain compatibility with
     *                     earlier versions.
     */
    explicit DeviceCommandQueue(const cl_command_queue& commandQueue, bool retainObject = false) :
        detail::Wrapper<cl_type>(commandQueue, retainObject) { }

    DeviceCommandQueue& operator = (const cl_command_queue& rhs)
    {
        detail::Wrapper<cl_type>::operator=(rhs);
        return *this;
    }

    /*! \brief Copy constructor to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    DeviceCommandQueue(const DeviceCommandQueue& queue) : detail::Wrapper<cl_type>(queue) {}

    /*! \brief Copy assignment to forward copy to the superclass correctly.
     * Required for MSVC.
     */
    DeviceCommandQueue& operator = (const DeviceCommandQueue &queue)
    {
        detail::Wrapper<cl_type>::operator=(queue);
        return *this;
    }

    /*! \brief Move constructor to forward move to the superclass correctly.
     * Required for MSVC.
     */
    DeviceCommandQueue(DeviceCommandQueue&& queue) CL_HPP_NOEXCEPT_ :
        detail::Wrapper<cl_type>(std::move(queue)) {}

    /*! \brief Move assignment to forward move to the superclass correctly.
     * Required for MSVC.
     */
    DeviceCommandQueue& operator = (DeviceCommandQueue &&queue)
    {
        detail::Wrapper<cl_type>::operator=(std::move(queue));
        return *this;
    }

    template <typename T>
    cl_int getInfo(cl_command_queue_info name, T* param) const
    {
        return detail::errHandler(
            detail::getInfo(
                &::clGetCommandQueueInfo, object_, name, param),
                __GET_COMMAND_QUEUE_INFO_ERR);
    }

    template <cl_int name> typename
    detail::param_traits<detail::cl_command_queue_info, name>::param_type
    getInfo(cl_int* err = NULL) const
    {
        typename detail::param_traits<
            detail::cl_command_queue_info, name>::param_type param;
        cl_int result = getInfo(name, &param);
        if (err != NULL) {
            *err = result;
        }
        return param;
    }

    /*!
     * Create a new default device command queue for the default device,
     * in the default context and of the default size.
     * If there is already a default queue for the specified device this
     * function will return the pre-existing queue.
     */
    static DeviceCommandQueue makeDefault(cl_int *err = nullptr)
    {
        cl_int error;
        cl::Context context = cl::Context::getDefault();
        cl::Device device = cl::Device::getDefault();

        cl_command_queue_properties properties =
            CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE |
            CL_QUEUE_ON_DEVICE_DEFAULT;
        cl_queue_properties queue_properties[] = {
            CL_QUEUE_PROPERTIES, properties, 0 };
        DeviceCommandQueue deviceQueue(
            ::clCreateCommandQueueWithProperties(
                context(), device(), queue_properties, &error));

        detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR);
        if (err != NULL) {
            *err = error;
        }

        return deviceQueue;
    }

    /*!
     * Create a new default device command queue for the specified device
     * and of the default size.
     * If there is already a default queue for the specified device this
     * function will return the pre-existing queue.
     */
    static DeviceCommandQueue makeDefault(
        const Context &context, const Device &device, cl_int *err = nullptr)
    {
        cl_int error;

        cl_command_queue_properties properties =
            CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE |
            CL_QUEUE_ON_DEVICE_DEFAULT;
        cl_queue_properties queue_properties[] = {
            CL_QUEUE_PROPERTIES, properties, 0 };
        DeviceCommandQueue deviceQueue(
            ::clCreateCommandQueueWithProperties(
                context(), device(), queue_properties, &error));

        detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR);
        if (err != NULL) {
            *err = error;
        }

        return deviceQueue;
    }

    /*!
     * Create a new default device command queue for the specified device
     * and of the requested size in bytes.
* If there is already a default queue for the specified device this * function will return the pre-existing queue. */ static DeviceCommandQueue makeDefault( const Context &context, const Device &device, cl_uint queueSize, cl_int *err = nullptr) { cl_int error; cl_command_queue_properties properties = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT; cl_queue_properties queue_properties[] = { CL_QUEUE_PROPERTIES, properties, CL_QUEUE_SIZE, queueSize, 0 }; DeviceCommandQueue deviceQueue( ::clCreateCommandQueueWithProperties( context(), device(), queue_properties, &error)); detail::errHandler(error, __CREATE_COMMAND_QUEUE_WITH_PROPERTIES_ERR); if (err != NULL) { *err = error; } return deviceQueue; } }; // DeviceCommandQueue namespace detail { // Specialization for device command queue template <> struct KernelArgumentHandler { static size_type size(const cl::DeviceCommandQueue&) { return sizeof(cl_command_queue); } static const cl_command_queue* ptr(const cl::DeviceCommandQueue& value) { return &(value()); } }; } // namespace detail #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 template< typename IteratorType > Buffer::Buffer( const Context &context, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr, cl_int* err) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if( readOnly ) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if( useHostPtr ) { flags |= CL_MEM_USE_HOST_PTR; } size_type size = sizeof(DataType)*(endIterator - startIterator); if( useHostPtr ) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if( !useHostPtr ) { CommandQueue queue(context, 0, &error); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } error = cl::copy(queue, startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } template< typename IteratorType > Buffer::Buffer( const CommandQueue &queue, IteratorType startIterator, IteratorType endIterator, bool readOnly, bool useHostPtr, cl_int* err) { typedef typename std::iterator_traits::value_type DataType; cl_int error; cl_mem_flags flags = 0; if (readOnly) { flags |= CL_MEM_READ_ONLY; } else { flags |= CL_MEM_READ_WRITE; } if (useHostPtr) { flags |= CL_MEM_USE_HOST_PTR; } size_type size = sizeof(DataType)*(endIterator - startIterator); Context context = queue.getInfo(); if (useHostPtr) { object_ = ::clCreateBuffer(context(), flags, size, static_cast(&*startIterator), &error); } else { object_ = ::clCreateBuffer(context(), flags, size, 0, &error); } detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } if (!useHostPtr) { error = cl::copy(queue, startIterator, endIterator, *this); detail::errHandler(error, __CREATE_BUFFER_ERR); if (err != NULL) { *err = error; } } } inline cl_int enqueueReadBuffer( const Buffer& buffer, cl_bool blocking, size_type offset, size_type size, void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueReadBuffer(buffer, blocking, offset, size, ptr, events, event); } inline cl_int enqueueWriteBuffer( const Buffer& buffer, cl_bool blocking, size_type 
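/* Editor's note (not part of the original header): a sketch of the
 * iterator-range Buffer constructors defined above. With useHostPtr=false the
 * data is copied into a fresh device allocation via cl::copy; the vector is
 * hypothetical.
 *
 * \code
 * std::vector<float> data(512, 1.0f);
 * cl_int err;
 * cl::Buffer devBuf(ctx, data.begin(), data.end(),
 *                   true,   // readOnly -> CL_MEM_READ_ONLY
 *                   false,  // useHostPtr -> copy rather than CL_MEM_USE_HOST_PTR
 *                   &err);
 * \endcode
 */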
offset, size_type size, const void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueWriteBuffer(buffer, blocking, offset, size, ptr, events, event); } inline void* enqueueMapBuffer( const Buffer& buffer, cl_bool blocking, cl_map_flags flags, size_type offset, size_type size, const vector* events = NULL, Event* event = NULL, cl_int* err = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } void * result = ::clEnqueueMapBuffer( queue(), buffer(), blocking, flags, offset, size, (events != NULL) ? (cl_uint) events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*) &events->front() : NULL, (cl_event*) event, &error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (err != NULL) { *err = error; } return result; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues to the default queue a command that will allow the host to * update a region of a coarse-grained SVM buffer. * This variant takes a raw SVM pointer. */ template inline cl_int enqueueMapSVM( T* ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events, Event* event) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); } return queue.enqueueMapSVM( ptr, blocking, flags, size, events, event); } /** * Enqueues to the default queue a command that will allow the host to * update a region of a coarse-grained SVM buffer. * This variant takes a cl::pointer instance. */ template inline cl_int enqueueMapSVM( cl::pointer ptr, cl_bool blocking, cl_map_flags flags, size_type size, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); } return queue.enqueueMapSVM( ptr, blocking, flags, size, events, event); } /** * Enqueues to the default queue a command that will allow the host to * update a region of a coarse-grained SVM buffer. * This variant takes a cl::vector instance. */ template inline cl_int enqueueMapSVM( cl::vector container, cl_bool blocking, cl_map_flags flags, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); } return queue.enqueueMapSVM( container, blocking, flags, events, event); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 inline cl_int enqueueUnmapMemObject( const Memory& memory, void* mapped_ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); detail::errHandler(error, __ENQUEUE_MAP_BUFFER_ERR); if (error != CL_SUCCESS) { return error; } cl_event tmp; cl_int err = detail::errHandler( ::clEnqueueUnmapMemObject( queue(), memory(), mapped_ptr, (events != NULL) ? (cl_uint)events->size() : 0, (events != NULL && events->size() > 0) ? (cl_event*)&events->front() : NULL, (event != NULL) ? 
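/* Editor's note (not part of the original header): a sketch of the free
 * functions above, which forward to CommandQueue::getDefault(). Convenient for
 * small tests; the buffer and host array are hypothetical.
 *
 * \code
 * std::vector<int> host(64);
 * cl::Buffer buf(cl::Context::getDefault(), CL_MEM_READ_WRITE, sizeof(int) * host.size());
 * cl::enqueueWriteBuffer(buf, CL_TRUE, 0, sizeof(int) * host.size(), host.data());
 * cl::enqueueReadBuffer (buf, CL_TRUE, 0, sizeof(int) * host.size(), host.data());
 * \endcode
 */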
&tmp : NULL), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); if (event != NULL && err == CL_SUCCESS) *event = tmp; return err; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 /** * Enqueues to the default queue a command that will release a coarse-grained * SVM buffer back to the OpenCL runtime. * This variant takes a raw SVM pointer. */ template inline cl_int enqueueUnmapSVM( T* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } return detail::errHandler(queue.enqueueUnmapSVM(ptr, events, event), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } /** * Enqueues to the default queue a command that will release a coarse-grained * SVM buffer back to the OpenCL runtime. * This variant takes a cl::pointer instance. */ template inline cl_int enqueueUnmapSVM( cl::pointer &ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } return detail::errHandler(queue.enqueueUnmapSVM(ptr, events, event), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } /** * Enqueues to the default queue a command that will release a coarse-grained * SVM buffer back to the OpenCL runtime. * This variant takes a cl::vector instance. */ template inline cl_int enqueueUnmapSVM( cl::vector &container, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return detail::errHandler(error, __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } return detail::errHandler(queue.enqueueUnmapSVM(container, events, event), __ENQUEUE_UNMAP_MEM_OBJECT_ERR); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 inline cl_int enqueueCopyBuffer( const Buffer& src, const Buffer& dst, size_type src_offset, size_type dst_offset, size_type size, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyBuffer(src, dst, src_offset, dst_offset, size, events, event); } /** * Blocking copy operation between iterators and a buffer. * Host to Device. * Uses default command queue. */ template< typename IteratorType > inline cl_int copy( IteratorType startIterator, IteratorType endIterator, cl::Buffer &buffer ) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) return error; return cl::copy(queue, startIterator, endIterator, buffer); } /** * Blocking copy operation between iterators and a buffer. * Device to Host. * Uses default command queue. */ template< typename IteratorType > inline cl_int copy( const cl::Buffer &buffer, IteratorType startIterator, IteratorType endIterator ) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) return error; return cl::copy(queue, buffer, startIterator, endIterator); } /** * Blocking copy operation between iterators and a buffer. * Host to Device. * Uses specified queue. 
 */
template< typename IteratorType >
inline cl_int copy(
    const CommandQueue &queue,
    IteratorType startIterator,
    IteratorType endIterator,
    cl::Buffer &buffer )
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    size_type length = endIterator - startIterator;
    size_type byteLength = length * sizeof(DataType);

    DataType *pointer = static_cast<DataType*>(queue.enqueueMapBuffer(
        buffer, CL_TRUE, CL_MAP_WRITE, 0, byteLength, 0, 0, &error));
    // if exceptions enabled, enqueueMapBuffer will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
#if defined(_MSC_VER)
    std::copy(
        startIterator,
        endIterator,
        stdext::checked_array_iterator<DataType*>(
            pointer, length));
#else
    std::copy(startIterator, endIterator, pointer);
#endif
    Event endEvent;
    error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent);
    // if exceptions enabled, enqueueUnmapMemObject will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    endEvent.wait();
    return CL_SUCCESS;
}

/**
 * Blocking copy operation between iterators and a buffer.
 * Device to Host.
 * Uses specified queue.
 */
template< typename IteratorType >
inline cl_int copy(
    const CommandQueue &queue,
    const cl::Buffer &buffer,
    IteratorType startIterator,
    IteratorType endIterator )
{
    typedef typename std::iterator_traits<IteratorType>::value_type DataType;
    cl_int error;

    size_type length = endIterator - startIterator;
    size_type byteLength = length * sizeof(DataType);

    DataType *pointer = static_cast<DataType*>(queue.enqueueMapBuffer(
        buffer, CL_TRUE, CL_MAP_READ, 0, byteLength, 0, 0, &error));
    // if exceptions enabled, enqueueMapBuffer will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    std::copy(pointer, pointer + length, startIterator);
    Event endEvent;
    error = queue.enqueueUnmapMemObject(buffer, pointer, 0, &endEvent);
    // if exceptions enabled, enqueueUnmapMemObject will throw
    if( error != CL_SUCCESS ) {
        return error;
    }
    endEvent.wait();
    return CL_SUCCESS;
}

#if CL_HPP_TARGET_OPENCL_VERSION >= 200
/**
 * Blocking SVM map operation - performs a blocking map underneath.
 */
template<typename T, class Alloc>
inline cl_int mapSVM(cl::vector<T, Alloc> &container)
{
    return enqueueMapSVM(container, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE);
}

/**
 * Blocking SVM map operation - performs a blocking map underneath.
*/ template inline cl_int unmapSVM(cl::vector &container) { return enqueueUnmapSVM(container); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 #if CL_HPP_TARGET_OPENCL_VERSION >= 110 inline cl_int enqueueReadBufferRect( const Buffer& buffer, cl_bool blocking, const array& buffer_offset, const array& host_offset, const array& region, size_type buffer_row_pitch, size_type buffer_slice_pitch, size_type host_row_pitch, size_type host_slice_pitch, void *ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueReadBufferRect( buffer, blocking, buffer_offset, host_offset, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, events, event); } inline cl_int enqueueWriteBufferRect( const Buffer& buffer, cl_bool blocking, const array& buffer_offset, const array& host_offset, const array& region, size_type buffer_row_pitch, size_type buffer_slice_pitch, size_type host_row_pitch, size_type host_slice_pitch, const void *ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueWriteBufferRect( buffer, blocking, buffer_offset, host_offset, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, events, event); } inline cl_int enqueueCopyBufferRect( const Buffer& src, const Buffer& dst, const array& src_origin, const array& dst_origin, const array& region, size_type src_row_pitch, size_type src_slice_pitch, size_type dst_row_pitch, size_type dst_slice_pitch, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyBufferRect( src, dst, src_origin, dst_origin, region, src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch, events, event); } #endif // CL_HPP_TARGET_OPENCL_VERSION >= 110 inline cl_int enqueueReadImage( const Image& image, cl_bool blocking, const array& origin, const array& region, size_type row_pitch, size_type slice_pitch, void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueReadImage( image, blocking, origin, region, row_pitch, slice_pitch, ptr, events, event); } inline cl_int enqueueWriteImage( const Image& image, cl_bool blocking, const array& origin, const array& region, size_type row_pitch, size_type slice_pitch, const void* ptr, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueWriteImage( image, blocking, origin, region, row_pitch, slice_pitch, ptr, events, event); } inline cl_int enqueueCopyImage( const Image& src, const Image& dst, const array& src_origin, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyImage( src, dst, src_origin, dst_origin, region, events, event); } inline cl_int enqueueCopyImageToBuffer( const Image& src, const Buffer& dst, const array& src_origin, const array& region, size_type dst_offset, const vector* events = NULL, Event* event = NULL) { cl_int 
error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyImageToBuffer( src, dst, src_origin, region, dst_offset, events, event); } inline cl_int enqueueCopyBufferToImage( const Buffer& src, const Image& dst, size_type src_offset, const array& dst_origin, const array& region, const vector* events = NULL, Event* event = NULL) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.enqueueCopyBufferToImage( src, dst, src_offset, dst_origin, region, events, event); } inline cl_int flush(void) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.flush(); } inline cl_int finish(void) { cl_int error; CommandQueue queue = CommandQueue::getDefault(&error); if (error != CL_SUCCESS) { return error; } return queue.finish(); } class EnqueueArgs { private: CommandQueue queue_; const NDRange offset_; const NDRange global_; const NDRange local_; vector events_; template friend class KernelFunctor; public: EnqueueArgs(NDRange global) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange) { } EnqueueArgs(NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local) { } EnqueueArgs(NDRange offset, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local) { } EnqueueArgs(Event e, NDRange global) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange) { events_.push_back(e); } EnqueueArgs(Event e, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(Event e, NDRange offset, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(const vector &events, NDRange global) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(NullRange), events_(events) { } EnqueueArgs(const vector &events, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(NullRange), global_(global), local_(local), events_(events) { } EnqueueArgs(const vector &events, NDRange offset, NDRange global, NDRange local) : queue_(CommandQueue::getDefault()), offset_(offset), global_(global), local_(local), events_(events) { } EnqueueArgs(CommandQueue &queue, NDRange global) : queue_(queue), offset_(NullRange), global_(global), local_(NullRange) { } EnqueueArgs(CommandQueue &queue, NDRange global, NDRange local) : queue_(queue), offset_(NullRange), global_(global), local_(local) { } EnqueueArgs(CommandQueue &queue, NDRange offset, NDRange global, NDRange local) : queue_(queue), offset_(offset), global_(global), local_(local) { } EnqueueArgs(CommandQueue &queue, Event e, NDRange global) : queue_(queue), offset_(NullRange), global_(global), local_(NullRange) { events_.push_back(e); } EnqueueArgs(CommandQueue &queue, Event e, NDRange global, NDRange local) : queue_(queue), offset_(NullRange), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(CommandQueue &queue, Event e, NDRange offset, NDRange global, NDRange local) : queue_(queue), offset_(offset), global_(global), local_(local) { events_.push_back(e); } EnqueueArgs(CommandQueue &queue, const vector &events, NDRange global) : 
queue_(queue), offset_(NullRange), global_(global), local_(NullRange), events_(events) { } EnqueueArgs(CommandQueue &queue, const vector &events, NDRange global, NDRange local) : queue_(queue), offset_(NullRange), global_(global), local_(local), events_(events) { } EnqueueArgs(CommandQueue &queue, const vector &events, NDRange offset, NDRange global, NDRange local) : queue_(queue), offset_(offset), global_(global), local_(local), events_(events) { } }; //---------------------------------------------------------------------------------------------- /** * Type safe kernel functor. * */ template class KernelFunctor { private: Kernel kernel_; template void setArgs(T0&& t0, T1s&&... t1s) { kernel_.setArg(index, t0); setArgs(std::forward(t1s)...); } template void setArgs(T0&& t0) { kernel_.setArg(index, t0); } template void setArgs() { } public: KernelFunctor(Kernel kernel) : kernel_(kernel) {} KernelFunctor( const Program& program, const string name, cl_int * err = NULL) : kernel_(program, name.c_str(), err) {} //! \brief Return type of the functor typedef Event result_type; /** * Enqueue kernel. * @param args Launch parameters of the kernel. * @param t0... List of kernel arguments based on the template type of the functor. */ Event operator() ( const EnqueueArgs& args, Ts... ts) { Event event; setArgs<0>(std::forward(ts)...); args.queue_.enqueueNDRangeKernel( kernel_, args.offset_, args.global_, args.local_, &args.events_, &event); return event; } /** * Enqueue kernel with support for error code. * @param args Launch parameters of the kernel. * @param t0... List of kernel arguments based on the template type of the functor. * @param error Out parameter returning the error code from the execution. */ Event operator() ( const EnqueueArgs& args, Ts... ts, cl_int &error) { Event event; setArgs<0>(std::forward(ts)...); error = args.queue_.enqueueNDRangeKernel( kernel_, args.offset_, args.global_, args.local_, &args.events_, &event); return event; } #if CL_HPP_TARGET_OPENCL_VERSION >= 200 cl_int setSVMPointers(const vector &pointerList) { return kernel_.setSVMPointers(pointerList); } template cl_int setSVMPointers(const T0 &t0, T1s... ts) { return kernel_.setSVMPointers(t0, ts...); } #endif // #if CL_HPP_TARGET_OPENCL_VERSION >= 200 Kernel getKernel() { return kernel_; } }; namespace compatibility { /** * Backward compatibility class to ensure that cl.hpp code works with cl2.hpp. * Please use KernelFunctor directly. */ template struct make_kernel { typedef KernelFunctor FunctorType; FunctorType functor_; make_kernel( const Program& program, const string name, cl_int * err = NULL) : functor_(FunctorType(program, name, err)) {} make_kernel( const Kernel kernel) : functor_(FunctorType(kernel)) {} //! \brief Return type of the functor typedef Event result_type; //! \brief Function signature of kernel functor with no event dependency. typedef Event type_( const EnqueueArgs&, Ts...); Event operator()( const EnqueueArgs& enqueueArgs, Ts... 
        args)
    {
        return functor_(
            enqueueArgs,
            args...);
    }
};
} // namespace compatibility

//----------------------------------------------------------------------------------------------------------------------

#undef CL_HPP_ERR_STR_
#if !defined(CL_HPP_USER_OVERRIDE_ERROR_STRINGS)
#undef __GET_DEVICE_INFO_ERR
#undef __GET_PLATFORM_INFO_ERR
#undef __GET_DEVICE_IDS_ERR
#undef __GET_CONTEXT_INFO_ERR
#undef __GET_EVENT_INFO_ERR
#undef __GET_EVENT_PROFILE_INFO_ERR
#undef __GET_MEM_OBJECT_INFO_ERR
#undef __GET_IMAGE_INFO_ERR
#undef __GET_SAMPLER_INFO_ERR
#undef __GET_KERNEL_INFO_ERR
#undef __GET_KERNEL_ARG_INFO_ERR
#undef __GET_KERNEL_WORK_GROUP_INFO_ERR
#undef __GET_PROGRAM_INFO_ERR
#undef __GET_PROGRAM_BUILD_INFO_ERR
#undef __GET_COMMAND_QUEUE_INFO_ERR
#undef __CREATE_CONTEXT_ERR
#undef __CREATE_CONTEXT_FROM_TYPE_ERR
#undef __GET_SUPPORTED_IMAGE_FORMATS_ERR
#undef __CREATE_BUFFER_ERR
#undef __CREATE_SUBBUFFER_ERR
#undef __CREATE_IMAGE2D_ERR
#undef __CREATE_IMAGE3D_ERR
#undef __CREATE_SAMPLER_ERR
#undef __SET_MEM_OBJECT_DESTRUCTOR_CALLBACK_ERR
#undef __CREATE_USER_EVENT_ERR
#undef __SET_USER_EVENT_STATUS_ERR
#undef __SET_EVENT_CALLBACK_ERR
#undef __SET_PRINTF_CALLBACK_ERR
#undef __WAIT_FOR_EVENTS_ERR
#undef __CREATE_KERNEL_ERR
#undef __SET_KERNEL_ARGS_ERR
#undef __CREATE_PROGRAM_WITH_SOURCE_ERR
#undef __CREATE_PROGRAM_WITH_BINARY_ERR
#undef __CREATE_PROGRAM_WITH_BUILT_IN_KERNELS_ERR
#undef __BUILD_PROGRAM_ERR
#undef __CREATE_KERNELS_IN_PROGRAM_ERR
#undef __CREATE_COMMAND_QUEUE_ERR
#undef __SET_COMMAND_QUEUE_PROPERTY_ERR
#undef __ENQUEUE_READ_BUFFER_ERR
#undef __ENQUEUE_WRITE_BUFFER_ERR
#undef __ENQUEUE_READ_BUFFER_RECT_ERR
#undef __ENQUEUE_WRITE_BUFFER_RECT_ERR
#undef __ENQEUE_COPY_BUFFER_ERR
#undef __ENQEUE_COPY_BUFFER_RECT_ERR
#undef __ENQUEUE_READ_IMAGE_ERR
#undef __ENQUEUE_WRITE_IMAGE_ERR
#undef __ENQUEUE_COPY_IMAGE_ERR
#undef __ENQUEUE_COPY_IMAGE_TO_BUFFER_ERR
#undef __ENQUEUE_COPY_BUFFER_TO_IMAGE_ERR
#undef __ENQUEUE_MAP_BUFFER_ERR
#undef __ENQUEUE_MAP_IMAGE_ERR
#undef __ENQUEUE_UNMAP_MEM_OBJECT_ERR
#undef __ENQUEUE_NDRANGE_KERNEL_ERR
#undef __ENQUEUE_TASK_ERR
#undef __ENQUEUE_NATIVE_KERNEL
#undef __UNLOAD_COMPILER_ERR
#undef __CREATE_SUB_DEVICES_ERR
#undef __CREATE_PIPE_ERR
#undef __GET_PIPE_INFO_ERR
#endif //CL_HPP_USER_OVERRIDE_ERROR_STRINGS

// Extensions
#undef CL_HPP_INIT_CL_EXT_FCN_PTR_
#undef CL_HPP_INIT_CL_EXT_FCN_PTR_PLATFORM_

#if defined(CL_HPP_USE_CL_DEVICE_FISSION)
#undef CL_HPP_PARAM_NAME_DEVICE_FISSION_
#endif // CL_HPP_USE_CL_DEVICE_FISSION

#undef CL_HPP_NOEXCEPT_
#undef CL_HPP_DEFINE_STATIC_MEMBER_

} // namespace cl

#endif // CL_HPP_
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_d3d10.h000066400000000000000000000103661450307266000232730ustar00rootroot00000000000000/*******************************************************************************
 * Copyright (c) 2008-2020 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
******************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_D3D10_H #define __OPENCL_CL_D3D10_H #include #include #include #ifdef __cplusplus extern "C" { #endif /****************************************************************************** * cl_khr_d3d10_sharing */ #define cl_khr_d3d10_sharing 1 typedef cl_uint cl_d3d10_device_source_khr; typedef cl_uint cl_d3d10_device_set_khr; /******************************************************************************/ /* Error Codes */ #define CL_INVALID_D3D10_DEVICE_KHR -1002 #define CL_INVALID_D3D10_RESOURCE_KHR -1003 #define CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR -1004 #define CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR -1005 /* cl_d3d10_device_source_nv */ #define CL_D3D10_DEVICE_KHR 0x4010 #define CL_D3D10_DXGI_ADAPTER_KHR 0x4011 /* cl_d3d10_device_set_nv */ #define CL_PREFERRED_DEVICES_FOR_D3D10_KHR 0x4012 #define CL_ALL_DEVICES_FOR_D3D10_KHR 0x4013 /* cl_context_info */ #define CL_CONTEXT_D3D10_DEVICE_KHR 0x4014 #define CL_CONTEXT_D3D10_PREFER_SHARED_RESOURCES_KHR 0x402C /* cl_mem_info */ #define CL_MEM_D3D10_RESOURCE_KHR 0x4015 /* cl_image_info */ #define CL_IMAGE_D3D10_SUBRESOURCE_KHR 0x4016 /* cl_command_type */ #define CL_COMMAND_ACQUIRE_D3D10_OBJECTS_KHR 0x4017 #define CL_COMMAND_RELEASE_D3D10_OBJECTS_KHR 0x4018 /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromD3D10KHR_fn)( cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10BufferKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Buffer * resource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10Texture2DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Texture2D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10Texture3DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Texture3D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireD3D10ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseD3D10ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_D3D10_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_d3d11.h000066400000000000000000000103601450307266000232660ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
* You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. ******************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_D3D11_H #define __OPENCL_CL_D3D11_H #include #include #include #ifdef __cplusplus extern "C" { #endif /****************************************************************************** * cl_khr_d3d11_sharing */ #define cl_khr_d3d11_sharing 1 typedef cl_uint cl_d3d11_device_source_khr; typedef cl_uint cl_d3d11_device_set_khr; /******************************************************************************/ /* Error Codes */ #define CL_INVALID_D3D11_DEVICE_KHR -1006 #define CL_INVALID_D3D11_RESOURCE_KHR -1007 #define CL_D3D11_RESOURCE_ALREADY_ACQUIRED_KHR -1008 #define CL_D3D11_RESOURCE_NOT_ACQUIRED_KHR -1009 /* cl_d3d11_device_source */ #define CL_D3D11_DEVICE_KHR 0x4019 #define CL_D3D11_DXGI_ADAPTER_KHR 0x401A /* cl_d3d11_device_set */ #define CL_PREFERRED_DEVICES_FOR_D3D11_KHR 0x401B #define CL_ALL_DEVICES_FOR_D3D11_KHR 0x401C /* cl_context_info */ #define CL_CONTEXT_D3D11_DEVICE_KHR 0x401D #define CL_CONTEXT_D3D11_PREFER_SHARED_RESOURCES_KHR 0x402D /* cl_mem_info */ #define CL_MEM_D3D11_RESOURCE_KHR 0x401E /* cl_image_info */ #define CL_IMAGE_D3D11_SUBRESOURCE_KHR 0x401F /* cl_command_type */ #define CL_COMMAND_ACQUIRE_D3D11_OBJECTS_KHR 0x4020 #define CL_COMMAND_RELEASE_D3D11_OBJECTS_KHR 0x4021 /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromD3D11KHR_fn)( cl_platform_id platform, cl_d3d11_device_source_khr d3d_device_source, void * d3d_object, cl_d3d11_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11BufferKHR_fn)( cl_context context, cl_mem_flags flags, ID3D11Buffer * resource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11Texture2DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D11Texture2D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11Texture3DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D11Texture3D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireD3D11ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseD3D11ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_D3D11_H */ 
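/*
 * Editorial sketch (not part of the Khronos header above): the D3D sharing
 * entry points in cl_d3d10.h / cl_d3d11.h are only declared as *_fn function
 * pointer typedefs because they are exported per platform rather than by the
 * ICD loader directly. A minimal sketch of obtaining and calling one of them
 * follows; `platform` and `d3d_device` (an ID3D11Device pointer) are assumed
 * to exist and be valid, and error handling is reduced to a single check.
 */
#if 0 /* illustrative only */
clGetDeviceIDsFromD3D11KHR_fn pfn_GetDeviceIDsFromD3D11 =
    (clGetDeviceIDsFromD3D11KHR_fn)clGetExtensionFunctionAddressForPlatform(
        platform, "clGetDeviceIDsFromD3D11KHR");

cl_device_id device = NULL;
cl_uint num_devices = 0;
if (pfn_GetDeviceIDsFromD3D11 != NULL) {
    /* Query the devices that can share resources with this D3D11 device. */
    cl_int err = pfn_GetDeviceIDsFromD3D11(
        platform,
        CL_D3D11_DEVICE_KHR,                 /* source is an ID3D11Device */
        d3d_device,
        CL_PREFERRED_DEVICES_FOR_D3D11_KHR,
        1, &device, &num_devices);
    (void)err;
}
#endif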
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_dx9_media_sharing.h000066400000000000000000000110411450307266000260250ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. ******************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_DX9_MEDIA_SHARING_H #define __OPENCL_CL_DX9_MEDIA_SHARING_H #include #include #ifdef __cplusplus extern "C" { #endif /******************************************************************************/ /* cl_khr_dx9_media_sharing */ #define cl_khr_dx9_media_sharing 1 typedef cl_uint cl_dx9_media_adapter_type_khr; typedef cl_uint cl_dx9_media_adapter_set_khr; #if defined(_WIN32) #include typedef struct _cl_dx9_surface_info_khr { IDirect3DSurface9 *resource; HANDLE shared_handle; } cl_dx9_surface_info_khr; #endif /******************************************************************************/ /* Error Codes */ #define CL_INVALID_DX9_MEDIA_ADAPTER_KHR -1010 #define CL_INVALID_DX9_MEDIA_SURFACE_KHR -1011 #define CL_DX9_MEDIA_SURFACE_ALREADY_ACQUIRED_KHR -1012 #define CL_DX9_MEDIA_SURFACE_NOT_ACQUIRED_KHR -1013 /* cl_media_adapter_type_khr */ #define CL_ADAPTER_D3D9_KHR 0x2020 #define CL_ADAPTER_D3D9EX_KHR 0x2021 #define CL_ADAPTER_DXVA_KHR 0x2022 /* cl_media_adapter_set_khr */ #define CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR 0x2023 #define CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR 0x2024 /* cl_context_info */ #define CL_CONTEXT_ADAPTER_D3D9_KHR 0x2025 #define CL_CONTEXT_ADAPTER_D3D9EX_KHR 0x2026 #define CL_CONTEXT_ADAPTER_DXVA_KHR 0x2027 /* cl_mem_info */ #define CL_MEM_DX9_MEDIA_ADAPTER_TYPE_KHR 0x2028 #define CL_MEM_DX9_MEDIA_SURFACE_INFO_KHR 0x2029 /* cl_image_info */ #define CL_IMAGE_DX9_MEDIA_PLANE_KHR 0x202A /* cl_command_type */ #define CL_COMMAND_ACQUIRE_DX9_MEDIA_SURFACES_KHR 0x202B #define CL_COMMAND_RELEASE_DX9_MEDIA_SURFACES_KHR 0x202C /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromDX9MediaAdapterKHR_fn)( cl_platform_id platform, cl_uint num_media_adapters, cl_dx9_media_adapter_type_khr * media_adapter_type, void * media_adapters, cl_dx9_media_adapter_set_khr media_adapter_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromDX9MediaSurfaceKHR_fn)( cl_context context, cl_mem_flags flags, cl_dx9_media_adapter_type_khr adapter_type, void * surface_info, cl_uint plane, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireDX9MediaSurfacesKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; typedef 
CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseDX9MediaSurfacesKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_DX9_MEDIA_SHARING_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_dx9_media_sharing_intel.h000066400000000000000000000151661450307266000272340ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. ******************************************************************************/ /*****************************************************************************\ Copyright (c) 2013-2019 Intel Corporation All Rights Reserved. THESE MATERIALS ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THESE MATERIALS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
File Name: cl_dx9_media_sharing_intel.h Abstract: Notes: \*****************************************************************************/ #ifndef __OPENCL_CL_DX9_MEDIA_SHARING_INTEL_H #define __OPENCL_CL_DX9_MEDIA_SHARING_INTEL_H #include #include #include #include #include #include #ifdef __cplusplus extern "C" { #endif /*************************************** * cl_intel_dx9_media_sharing extension * ****************************************/ #define cl_intel_dx9_media_sharing 1 typedef cl_uint cl_dx9_device_source_intel; typedef cl_uint cl_dx9_device_set_intel; /* error codes */ #define CL_INVALID_DX9_DEVICE_INTEL -1010 #define CL_INVALID_DX9_RESOURCE_INTEL -1011 #define CL_DX9_RESOURCE_ALREADY_ACQUIRED_INTEL -1012 #define CL_DX9_RESOURCE_NOT_ACQUIRED_INTEL -1013 /* cl_dx9_device_source_intel */ #define CL_D3D9_DEVICE_INTEL 0x4022 #define CL_D3D9EX_DEVICE_INTEL 0x4070 #define CL_DXVA_DEVICE_INTEL 0x4071 /* cl_dx9_device_set_intel */ #define CL_PREFERRED_DEVICES_FOR_DX9_INTEL 0x4024 #define CL_ALL_DEVICES_FOR_DX9_INTEL 0x4025 /* cl_context_info */ #define CL_CONTEXT_D3D9_DEVICE_INTEL 0x4026 #define CL_CONTEXT_D3D9EX_DEVICE_INTEL 0x4072 #define CL_CONTEXT_DXVA_DEVICE_INTEL 0x4073 /* cl_mem_info */ #define CL_MEM_DX9_RESOURCE_INTEL 0x4027 #define CL_MEM_DX9_SHARED_HANDLE_INTEL 0x4074 /* cl_image_info */ #define CL_IMAGE_DX9_PLANE_INTEL 0x4075 /* cl_command_type */ #define CL_COMMAND_ACQUIRE_DX9_OBJECTS_INTEL 0x402A #define CL_COMMAND_RELEASE_DX9_OBJECTS_INTEL 0x402B /******************************************************************************/ extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDsFromDX9INTEL( cl_platform_id platform, cl_dx9_device_source_intel dx9_device_source, void* dx9_object, cl_dx9_device_set_intel dx9_device_set, cl_uint num_entries, cl_device_id* devices, cl_uint* num_devices) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL* clGetDeviceIDsFromDX9INTEL_fn)( cl_platform_id platform, cl_dx9_device_source_intel dx9_device_source, void* dx9_object, cl_dx9_device_set_intel dx9_device_set, cl_uint num_entries, cl_device_id* devices, cl_uint* num_devices) CL_EXT_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromDX9MediaSurfaceINTEL( cl_context context, cl_mem_flags flags, IDirect3DSurface9* resource, HANDLE sharedHandle, UINT plane, cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromDX9MediaSurfaceINTEL_fn)( cl_context context, cl_mem_flags flags, IDirect3DSurface9* resource, HANDLE sharedHandle, UINT plane, cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireDX9ObjectsINTEL( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireDX9ObjectsINTEL_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseDX9ObjectsINTEL( cl_command_queue command_queue, cl_uint num_objects, cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseDX9ObjectsINTEL_fn)( cl_command_queue command_queue, cl_uint num_objects, cl_mem* mem_objects, 
cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_1; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_DX9_MEDIA_SHARING_INTEL_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_egl.h000066400000000000000000000106061450307266000232240ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. ******************************************************************************/ #ifndef __OPENCL_CL_EGL_H #define __OPENCL_CL_EGL_H #include #ifdef __cplusplus extern "C" { #endif /* Command type for events created with clEnqueueAcquireEGLObjectsKHR */ #define CL_COMMAND_EGL_FENCE_SYNC_OBJECT_KHR 0x202F #define CL_COMMAND_ACQUIRE_EGL_OBJECTS_KHR 0x202D #define CL_COMMAND_RELEASE_EGL_OBJECTS_KHR 0x202E /* Error type for clCreateFromEGLImageKHR */ #define CL_INVALID_EGL_OBJECT_KHR -1093 #define CL_EGL_RESOURCE_NOT_ACQUIRED_KHR -1092 /* CLeglImageKHR is an opaque handle to an EGLImage */ typedef void* CLeglImageKHR; /* CLeglDisplayKHR is an opaque handle to an EGLDisplay */ typedef void* CLeglDisplayKHR; /* CLeglSyncKHR is an opaque handle to an EGLSync object */ typedef void* CLeglSyncKHR; /* properties passed to clCreateFromEGLImageKHR */ typedef intptr_t cl_egl_image_properties_khr; #define cl_khr_egl_image 1 extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromEGLImageKHR(cl_context context, CLeglDisplayKHR egldisplay, CLeglImageKHR eglimage, cl_mem_flags flags, const cl_egl_image_properties_khr * properties, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromEGLImageKHR_fn)( cl_context context, CLeglDisplayKHR egldisplay, CLeglImageKHR eglimage, cl_mem_flags flags, const cl_egl_image_properties_khr * properties, cl_int * errcode_ret); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireEGLObjectsKHR(cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireEGLObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseEGLObjectsKHR(cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseEGLObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event); #define cl_khr_egl_event 1 extern CL_API_ENTRY cl_event CL_API_CALL clCreateEventFromEGLSyncKHR(cl_context context, CLeglSyncKHR 
sync, CLeglDisplayKHR display, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_event (CL_API_CALL *clCreateEventFromEGLSyncKHR_fn)( cl_context context, CLeglSyncKHR sync, CLeglDisplayKHR display, cl_int * errcode_ret); #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_EGL_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_ext.h000066400000000000000000001156741450307266000232700ustar00rootroot00000000000000/* Modifications Copyright (C) 2010-2021 Advanced Micro Devices, Inc. */ /******************************************************************************* * Copyright (c) 2008-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. ******************************************************************************/ /* cl_ext.h contains OpenCL extensions which don't have external */ /* (OpenGL, D3D) dependencies. */ #ifndef __CL_EXT_H #define __CL_EXT_H #ifdef __cplusplus extern "C" { #endif #include /* cl_khr_fp64 extension - no extension #define since it has no functions */ /* CL_DEVICE_DOUBLE_FP_CONFIG is defined in CL.h for OpenCL >= 120 */ #if CL_TARGET_OPENCL_VERSION <= 110 #define CL_DEVICE_DOUBLE_FP_CONFIG 0x1032 #endif /* cl_khr_fp16 extension - no extension #define since it has no functions */ #define CL_DEVICE_HALF_FP_CONFIG 0x1033 /* Memory object destruction * * Apple extension for use to manage externally allocated buffers used with cl_mem objects with CL_MEM_USE_HOST_PTR * * Registers a user callback function that will be called when the memory object is deleted and its resources * freed. Each call to clSetMemObjectCallbackFn registers the specified user callback function on a callback * stack associated with memobj. The registered user callback functions are called in the reverse order in * which they were registered. The user callback functions are called and then the memory object is deleted * and its resources freed. This provides a mechanism for the application (and libraries) using memobj to be * notified when the memory referenced by host_ptr, specified when the memory object is created and used as * the storage bits for the memory object, can be reused or freed. * * The application may not call CL api's with the cl_mem object passed to the pfn_notify. * * Please check for the "cl_APPLE_SetMemObjectDestructor" extension using clGetDeviceInfo(CL_DEVICE_EXTENSIONS) * before using. */ #define cl_APPLE_SetMemObjectDestructor 1 cl_int CL_API_ENTRY clSetMemObjectDestructorAPPLE( cl_mem memobj, void (* pfn_notify)(cl_mem memobj, void * user_data), void * user_data) CL_EXT_SUFFIX__VERSION_1_0; /* Context Logging Functions * * The next three convenience functions are intended to be used as the pfn_notify parameter to clCreateContext(). * Please check for the "cl_APPLE_ContextLoggingFunctions" extension using clGetDeviceInfo(CL_DEVICE_EXTENSIONS) * before using. 
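 *
 * Editor's sketch (not part of the original Apple text): each of these
 * loggers matches the pfn_notify signature of clCreateContext(), so one can
 * be passed straight through at context creation; `props` and `device` are
 * assumed to be valid here and error handling is elided:
 *
 *     cl_int err;
 *     cl_context ctx = clCreateContext(props, 1, &device,
 *                                      clLogMessagesToStdoutAPPLE,
 *                                      NULL, &err);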
* * clLogMessagesToSystemLog forwards on all log messages to the Apple System Logger */ #define cl_APPLE_ContextLoggingFunctions 1 extern void CL_API_ENTRY clLogMessagesToSystemLogAPPLE( const char * errstr, const void * private_info, size_t cb, void * user_data) CL_EXT_SUFFIX__VERSION_1_0; /* clLogMessagesToStdout sends all log messages to the file descriptor stdout */ extern void CL_API_ENTRY clLogMessagesToStdoutAPPLE( const char * errstr, const void * private_info, size_t cb, void * user_data) CL_EXT_SUFFIX__VERSION_1_0; /* clLogMessagesToStderr sends all log messages to the file descriptor stderr */ extern void CL_API_ENTRY clLogMessagesToStderrAPPLE( const char * errstr, const void * private_info, size_t cb, void * user_data) CL_EXT_SUFFIX__VERSION_1_0; /************************ * cl_khr_icd extension * ************************/ #define cl_khr_icd 1 /* cl_platform_info */ #define CL_PLATFORM_ICD_SUFFIX_KHR 0x0920 /* Additional Error Codes */ #define CL_PLATFORM_NOT_FOUND_KHR -1001 extern CL_API_ENTRY cl_int CL_API_CALL clIcdGetPlatformIDsKHR(cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms); typedef CL_API_ENTRY cl_int (CL_API_CALL *clIcdGetPlatformIDsKHR_fn)(cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms); /******************************* * cl_khr_il_program extension * *******************************/ #define cl_khr_il_program 1 /* New property to clGetDeviceInfo for retrieving supported intermediate * languages */ #define CL_DEVICE_IL_VERSION_KHR 0x105B /* New property to clGetProgramInfo for retrieving for retrieving the IL of a * program */ #define CL_PROGRAM_IL_KHR 0x1169 extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithILKHR(cl_context context, const void * il, size_t length, cl_int * errcode_ret); typedef CL_API_ENTRY cl_program (CL_API_CALL *clCreateProgramWithILKHR_fn)(cl_context context, const void * il, size_t length, cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_2; /* Extension: cl_khr_image2d_from_buffer * * This extension allows a 2D image to be created from a cl_mem buffer without * a copy. The type associated with a 2D image created from a buffer in an * OpenCL program is image2d_t. Both the sampler and sampler-less read_image * built-in functions are supported for 2D images and 2D images created from * a buffer. Similarly, the write_image built-ins are also supported for 2D * images created from a buffer. * * When the 2D image from buffer is created, the client must specify the * width, height, image format (i.e. channel order and channel data type) * and optionally the row pitch. * * The pitch specified must be a multiple of * CL_DEVICE_IMAGE_PITCH_ALIGNMENT_KHR pixels. * The base address of the buffer must be aligned to * CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT_KHR pixels. 
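 *
 * Editor's sketch (not part of the original extension text): assuming an
 * existing context and a buffer `buf` whose allocation satisfies the
 * alignment rules above, plus hypothetical `width`/`height`/`row_pitch`
 * values, a 2D image view of the buffer could be created roughly like this:
 *
 *     cl_image_format fmt = { CL_RGBA, CL_UNORM_INT8 };
 *     cl_image_desc desc = {0};
 *     desc.image_type = CL_MEM_OBJECT_IMAGE2D;
 *     desc.image_width = width;
 *     desc.image_height = height;
 *     desc.image_row_pitch = row_pitch;  // multiple of the pitch alignment
 *     desc.buffer = buf;                 // the backing cl_mem buffer
 *     cl_int err;
 *     cl_mem image2d = clCreateImage(context, CL_MEM_READ_ONLY,
 *                                    &fmt, &desc, NULL, &err);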
*/ #define CL_DEVICE_IMAGE_PITCH_ALIGNMENT_KHR 0x104A #define CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT_KHR 0x104B /************************************** * cl_khr_initialize_memory extension * **************************************/ #define CL_CONTEXT_MEMORY_INITIALIZE_KHR 0x2030 /************************************** * cl_khr_terminate_context extension * **************************************/ #define CL_DEVICE_TERMINATE_CAPABILITY_KHR 0x2031 #define CL_CONTEXT_TERMINATE_KHR 0x2032 #define cl_khr_terminate_context 1 extern CL_API_ENTRY cl_int CL_API_CALL clTerminateContextKHR(cl_context context) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clTerminateContextKHR_fn)(cl_context context) CL_EXT_SUFFIX__VERSION_1_2; /* * Extension: cl_khr_spir * * This extension adds support to create an OpenCL program object from a * Standard Portable Intermediate Representation (SPIR) instance */ #define CL_DEVICE_SPIR_VERSIONS 0x40E0 #define CL_PROGRAM_BINARY_TYPE_INTERMEDIATE 0x40E1 /***************************************** * cl_khr_create_command_queue extension * *****************************************/ #define cl_khr_create_command_queue 1 typedef cl_bitfield cl_queue_properties_khr; extern CL_API_ENTRY cl_command_queue CL_API_CALL clCreateCommandQueueWithPropertiesKHR(cl_context context, cl_device_id device, const cl_queue_properties_khr* properties, cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_command_queue (CL_API_CALL *clCreateCommandQueueWithPropertiesKHR_fn)(cl_context context, cl_device_id device, const cl_queue_properties_khr* properties, cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2; /****************************************** * cl_nv_device_attribute_query extension * ******************************************/ /* cl_nv_device_attribute_query extension - no extension #define since it has no functions */ #define CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV 0x4000 #define CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV 0x4001 #define CL_DEVICE_REGISTERS_PER_BLOCK_NV 0x4002 #define CL_DEVICE_WARP_SIZE_NV 0x4003 #define CL_DEVICE_GPU_OVERLAP_NV 0x4004 #define CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV 0x4005 #define CL_DEVICE_INTEGRATED_MEMORY_NV 0x4006 /********************************* * cl_amd_device_memory_flags * *********************************/ #define cl_amd_device_memory_flags 1 #define CL_MEM_USE_PERSISTENT_MEM_AMD (1 << 6) // Alloc from GPU's CPU visible heap /* cl_device_info */ #define CL_DEVICE_MAX_ATOMIC_COUNTERS_EXT 0x4032 /********************************* * cl_amd_device_attribute_query * *********************************/ #define CL_DEVICE_PROFILING_TIMER_OFFSET_AMD 0x4036 #define CL_DEVICE_TOPOLOGY_AMD 0x4037 #define CL_DEVICE_BOARD_NAME_AMD 0x4038 #define CL_DEVICE_GLOBAL_FREE_MEMORY_AMD 0x4039 #define CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD 0x4040 #define CL_DEVICE_SIMD_WIDTH_AMD 0x4041 #define CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD 0x4042 #define CL_DEVICE_WAVEFRONT_WIDTH_AMD 0x4043 #define CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD 0x4044 #define CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD 0x4045 #define CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD 0x4046 #define CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD 0x4047 #define CL_DEVICE_LOCAL_MEM_BANKS_AMD 0x4048 #define CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD 0x4049 #define CL_DEVICE_GFXIP_MAJOR_AMD 0x404A #define CL_DEVICE_GFXIP_MINOR_AMD 0x404B #define CL_DEVICE_AVAILABLE_ASYNC_QUEUES_AMD 0x404C #define CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD 0x4030 #define CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD 0x4031 #define 
CL_DEVICE_PREFERRED_CONSTANT_BUFFER_SIZE_AMD 0x4033 #define CL_DEVICE_PCIE_ID_AMD 0x4034 typedef union { struct { cl_uint type; cl_uint data[5]; } raw; struct { cl_uint type; cl_uchar unused[17]; cl_uchar bus; cl_uchar device; cl_uchar function; } pcie; } cl_device_topology_amd; #define CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD 1 /************************** * cl_amd_offline_devices * **************************/ #define CL_CONTEXT_OFFLINE_DEVICES_AMD 0x403F /******************************** * cl_amd_bus_addressable_memory * ********************************/ /* cl_mem flag - bitfield */ #define CL_MEM_BUS_ADDRESSABLE_AMD (1<<30) #define CL_MEM_EXTERNAL_PHYSICAL_AMD (1<<31) #define CL_COMMAND_WAIT_SIGNAL_AMD 0x4080 #define CL_COMMAND_WRITE_SIGNAL_AMD 0x4081 #define CL_COMMAND_MAKE_BUFFERS_RESIDENT_AMD 0x4082 typedef struct _cl_bus_address_amd { cl_ulong surface_bus_address; cl_ulong marker_bus_address; } cl_bus_address_amd; typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueWaitSignalAMD_fn)( cl_command_queue /*command_queue*/, cl_mem /*mem_object*/, cl_uint /*value*/, cl_uint /*num_events*/, const cl_event * /*event_wait_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueWriteSignalAMD_fn)( cl_command_queue /*command_queue*/, cl_mem /*mem_object*/, cl_uint /*value*/, cl_ulong /*offset*/, cl_uint /*num_events*/, const cl_event * /*event_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueMakeBuffersResidentAMD_fn)( cl_command_queue /*command_queue*/, cl_uint /*num_mem_objs*/, cl_mem * /*mem_objects*/, cl_bool /*blocking_make_resident*/, cl_bus_address_amd * /*bus_addresses*/, cl_uint /*num_events*/, const cl_event * /*event_list*/, cl_event * /*event*/) CL_EXT_SUFFIX__VERSION_1_2; /************************* * cl_amd_copy_buffer_p2p * **************************/ #define CL_DEVICE_NUM_P2P_DEVICES_AMD 0x4088 #define CL_DEVICE_P2P_DEVICES_AMD 0x4089 #define cl_amd_copy_buffer_p2p 1 typedef CL_API_ENTRY cl_int (CL_API_CALL * clEnqueueCopyBufferP2PAMD_fn)(cl_command_queue /*command_queue*/, cl_mem /*src_buffer*/, cl_mem /*dst_buffer*/, size_t /*src_offset*/, size_t /*dst_offset*/, size_t /*cb*/, cl_uint /*num_events_in_wait_list*/, const cl_event* /*event_wait_list*/, cl_event* /*event*/) CL_EXT_SUFFIX__VERSION_1_2; /*********************************** * cl_amd_assembly_program extension * ***********************************/ #define cl_amd_assembly_program 1 typedef CL_API_ENTRY cl_program (CL_API_CALL * clCreateProgramWithAssemblyAMD_fn) ( cl_context /* context */, cl_uint /* count */, const char** /* strings */, const size_t* /* lengths */, cl_int* /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_2; #ifdef CL_VERSION_2_0 /******************************** * cl_amd_planar_yuv * ********************************/ /* cl_mem flag - bitfield */ #define CL_YUV_IMAGE_Y_PLANE_AMD 0x0 #define CL_YUV_IMAGE_UV_PLANE_AMD 0x1 typedef CL_API_ENTRY cl_mem (CL_API_CALL * clGetPlaneFromImageAMD_fn)(cl_context /*context*/, cl_mem /*mem*/, cl_uint /*plane*/, cl_int * /*errcode_ret*/) CL_EXT_SUFFIX__VERSION_2_0; #endif // /************************** * cl_amd_command_queue_info * **************************/ #define CL_QUEUE_THREAD_HANDLE_AMD 0x403E /* cl_kernel_exec_info for DVR DOPP texture support */ #define CL_KERNEL_EXEC_INFO_NEW_VCOP_AMD 0x4120 #define CL_KERNEL_EXEC_INFO_PFPA_VCOP_AMD 0x4121 /************************* * cl_amd_object_metadata * **************************/ #define cl_amd_object_metadata 1 
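/*
 * Editorial sketch (not part of the original header): the
 * cl_device_topology_amd union defined above is filled in by a
 * clGetDeviceInfo query with CL_DEVICE_TOPOLOGY_AMD. Assuming a valid
 * cl_device_id `device` and <stdio.h> for printf, the PCIe location of the
 * device could be read out roughly like this:
 */
#if 0 /* illustrative only */
cl_device_topology_amd topo;
cl_int err = clGetDeviceInfo(device, CL_DEVICE_TOPOLOGY_AMD,
                             sizeof(topo), &topo, NULL);
if (err == CL_SUCCESS && topo.raw.type == CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD) {
    printf("PCIe location %02x:%02x.%x\n",
           topo.pcie.bus, topo.pcie.device, topo.pcie.function);
}
#endif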
typedef size_t cl_key_amd; #define CL_INVALID_OBJECT_AMD 0x403A #define CL_INVALID_KEY_AMD 0x403B #define CL_PLATFORM_MAX_KEYS_AMD 0x403C typedef CL_API_ENTRY cl_key_amd (CL_API_CALL * clCreateKeyAMD_fn)( cl_platform_id /* platform */, void (CL_CALLBACK * /* destructor */)( void* /* old_value */), cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL * clObjectGetValueForKeyAMD_fn)( void * /* object */, cl_key_amd /* key */, void ** /* ret_val */) CL_API_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL * clObjectSetValueForKeyAMD_fn)( void * /* object */, cl_key_amd /* key */, void * /* value */) CL_API_SUFFIX__VERSION_1_1; // /********************************* * cl_arm_printf extension *********************************/ #define CL_PRINTF_CALLBACK_ARM 0x40B0 #define CL_PRINTF_BUFFERSIZE_ARM 0x40B1 /*********************************** * cl_ext_device_fission extension ***********************************/ #define cl_ext_device_fission 1 extern CL_API_ENTRY cl_int CL_API_CALL clReleaseDeviceEXT(cl_device_id device) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL *clReleaseDeviceEXT_fn)(cl_device_id device) CL_EXT_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clRetainDeviceEXT(cl_device_id device) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL *clRetainDeviceEXT_fn)(cl_device_id device) CL_EXT_SUFFIX__VERSION_1_1; typedef cl_ulong cl_device_partition_property_ext; extern CL_API_ENTRY cl_int CL_API_CALL clCreateSubDevicesEXT(cl_device_id in_device, const cl_device_partition_property_ext * properties, cl_uint num_entries, cl_device_id * out_devices, cl_uint * num_devices) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL * clCreateSubDevicesEXT_fn)(cl_device_id in_device, const cl_device_partition_property_ext * properties, cl_uint num_entries, cl_device_id * out_devices, cl_uint * num_devices) CL_EXT_SUFFIX__VERSION_1_1; /* cl_device_partition_property_ext */ #define CL_DEVICE_PARTITION_EQUALLY_EXT 0x4050 #define CL_DEVICE_PARTITION_BY_COUNTS_EXT 0x4051 #define CL_DEVICE_PARTITION_BY_NAMES_EXT 0x4052 #define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN_EXT 0x4053 /* clDeviceGetInfo selectors */ #define CL_DEVICE_PARENT_DEVICE_EXT 0x4054 #define CL_DEVICE_PARTITION_TYPES_EXT 0x4055 #define CL_DEVICE_AFFINITY_DOMAINS_EXT 0x4056 #define CL_DEVICE_REFERENCE_COUNT_EXT 0x4057 #define CL_DEVICE_PARTITION_STYLE_EXT 0x4058 /* clGetImageInfo enum */ #define CL_IMAGE_BYTE_PITCH_AMD 0x4059 /* error codes */ #define CL_DEVICE_PARTITION_FAILED_EXT -1057 #define CL_INVALID_PARTITION_COUNT_EXT -1058 #define CL_INVALID_PARTITION_NAME_EXT -1059 /* CL_AFFINITY_DOMAINs */ #define CL_AFFINITY_DOMAIN_L1_CACHE_EXT 0x1 #define CL_AFFINITY_DOMAIN_L2_CACHE_EXT 0x2 #define CL_AFFINITY_DOMAIN_L3_CACHE_EXT 0x3 #define CL_AFFINITY_DOMAIN_L4_CACHE_EXT 0x4 #define CL_AFFINITY_DOMAIN_NUMA_EXT 0x10 #define CL_AFFINITY_DOMAIN_NEXT_FISSIONABLE_EXT 0x100 /* cl_device_partition_property_ext list terminators */ #define CL_PROPERTIES_LIST_END_EXT ((cl_device_partition_property_ext) 0) #define CL_PARTITION_BY_COUNTS_LIST_END_EXT ((cl_device_partition_property_ext) 0) #define CL_PARTITION_BY_NAMES_LIST_END_EXT ((cl_device_partition_property_ext) 0 - 1) /*********************************** * cl_ext_migrate_memobject extension definitions ***********************************/ #define cl_ext_migrate_memobject 1 typedef cl_bitfield cl_mem_migration_flags_ext; #define CL_MIGRATE_MEM_OBJECT_HOST_EXT 0x1 #define 
CL_COMMAND_MIGRATE_MEM_OBJECT_EXT 0x4040 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMigrateMemObjectEXT(cl_command_queue command_queue, cl_uint num_mem_objects, const cl_mem * mem_objects, cl_mem_migration_flags_ext flags, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event); typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueMigrateMemObjectEXT_fn)(cl_command_queue command_queue, cl_uint num_mem_objects, const cl_mem * mem_objects, cl_mem_migration_flags_ext flags, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event); /********************************* * cl_qcom_ext_host_ptr extension *********************************/ #define cl_qcom_ext_host_ptr 1 #define CL_MEM_EXT_HOST_PTR_QCOM (1 << 29) #define CL_DEVICE_EXT_MEM_PADDING_IN_BYTES_QCOM 0x40A0 #define CL_DEVICE_PAGE_SIZE_QCOM 0x40A1 #define CL_IMAGE_ROW_ALIGNMENT_QCOM 0x40A2 #define CL_IMAGE_SLICE_ALIGNMENT_QCOM 0x40A3 #define CL_MEM_HOST_UNCACHED_QCOM 0x40A4 #define CL_MEM_HOST_WRITEBACK_QCOM 0x40A5 #define CL_MEM_HOST_WRITETHROUGH_QCOM 0x40A6 #define CL_MEM_HOST_WRITE_COMBINING_QCOM 0x40A7 typedef cl_uint cl_image_pitch_info_qcom; extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceImageInfoQCOM(cl_device_id device, size_t image_width, size_t image_height, const cl_image_format *image_format, cl_image_pitch_info_qcom param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); typedef struct _cl_mem_ext_host_ptr { /* Type of external memory allocation. */ /* Legal values will be defined in layered extensions. */ cl_uint allocation_type; /* Host cache policy for this external memory allocation. */ cl_uint host_cache_policy; } cl_mem_ext_host_ptr; /******************************************* * cl_qcom_ext_host_ptr_iocoherent extension ********************************************/ /* Cache policy specifying io-coherence */ #define CL_MEM_HOST_IOCOHERENT_QCOM 0x40A9 /********************************* * cl_qcom_ion_host_ptr extension *********************************/ #define CL_MEM_ION_HOST_PTR_QCOM 0x40A8 typedef struct _cl_mem_ion_host_ptr { /* Type of external memory allocation. */ /* Must be CL_MEM_ION_HOST_PTR_QCOM for ION allocations. */ cl_mem_ext_host_ptr ext_host_ptr; /* ION file descriptor */ int ion_filedesc; /* Host pointer to the ION allocated memory */ void* ion_hostptr; } cl_mem_ion_host_ptr; /********************************* * cl_qcom_android_native_buffer_host_ptr extension *********************************/ #define CL_MEM_ANDROID_NATIVE_BUFFER_HOST_PTR_QCOM 0x40C6 typedef struct _cl_mem_android_native_buffer_host_ptr { /* Type of external memory allocation. */ /* Must be CL_MEM_ANDROID_NATIVE_BUFFER_HOST_PTR_QCOM for Android native buffers. 
*/ cl_mem_ext_host_ptr ext_host_ptr; /* Virtual pointer to the android native buffer */ void* anb_ptr; } cl_mem_android_native_buffer_host_ptr; /****************************************** * cl_img_yuv_image extension * ******************************************/ /* Image formats used in clCreateImage */ #define CL_NV21_IMG 0x40D0 #define CL_YV12_IMG 0x40D1 /****************************************** * cl_img_cached_allocations extension * ******************************************/ /* Flag values used by clCreateBuffer */ #define CL_MEM_USE_UNCACHED_CPU_MEMORY_IMG (1 << 26) #define CL_MEM_USE_CACHED_CPU_MEMORY_IMG (1 << 27) /****************************************** * cl_img_use_gralloc_ptr extension * ******************************************/ #define cl_img_use_gralloc_ptr 1 /* Flag values used by clCreateBuffer */ #define CL_MEM_USE_GRALLOC_PTR_IMG (1 << 28) /* To be used by clGetEventInfo: */ #define CL_COMMAND_ACQUIRE_GRALLOC_OBJECTS_IMG 0x40D2 #define CL_COMMAND_RELEASE_GRALLOC_OBJECTS_IMG 0x40D3 /* Error code from clEnqueueReleaseGrallocObjectsIMG */ #define CL_GRALLOC_RESOURCE_NOT_ACQUIRED_IMG 0x40D4 extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireGrallocObjectsIMG(cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseGrallocObjectsIMG(cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_EXT_SUFFIX__VERSION_1_2; /********************************* * cl_khr_subgroups extension *********************************/ #define cl_khr_subgroups 1 #if !defined(CL_VERSION_2_1) /* For OpenCL 2.1 and newer, cl_kernel_sub_group_info is declared in CL.h. In hindsight, there should have been a khr suffix on this type for the extension, but keeping it un-suffixed to maintain backwards compatibility. */ typedef cl_uint cl_kernel_sub_group_info; #endif /* cl_kernel_sub_group_info */ #define CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR 0x2033 #define CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE_KHR 0x2034 extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelSubGroupInfoKHR(cl_kernel in_kernel, cl_device_id in_device, cl_kernel_sub_group_info param_name, size_t input_value_size, const void * input_value, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED; typedef CL_API_ENTRY cl_int (CL_API_CALL * clGetKernelSubGroupInfoKHR_fn)(cl_kernel in_kernel, cl_device_id in_device, cl_kernel_sub_group_info param_name, size_t input_value_size, const void * input_value, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED; /********************************* * cl_khr_mipmap_image extension *********************************/ /* cl_sampler_properties */ #define CL_SAMPLER_MIP_FILTER_MODE_KHR 0x1155 #define CL_SAMPLER_LOD_MIN_KHR 0x1156 #define CL_SAMPLER_LOD_MAX_KHR 0x1157 /********************************* * cl_khr_priority_hints extension *********************************/ /* This extension define is for backwards compatibility. It shouldn't be required since this extension has no new functions. 
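 *
 * Editor's sketch (not part of the original text): the hint is passed as a
 * queue property at creation time; assuming a valid `context` and `device`,
 * a high-priority queue could be requested roughly like this:
 *
 *     cl_queue_properties props[] = {
 *         CL_QUEUE_PRIORITY_KHR, CL_QUEUE_PRIORITY_HIGH_KHR, 0 };
 *     cl_int err;
 *     cl_command_queue q = clCreateCommandQueueWithProperties(
 *         context, device, props, &err);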
*/
#define cl_khr_priority_hints 1

typedef cl_uint cl_queue_priority_khr;

/* cl_command_queue_properties */
#define CL_QUEUE_PRIORITY_KHR 0x1096

/* cl_queue_priority_khr */
#define CL_QUEUE_PRIORITY_HIGH_KHR (1<<0)
#define CL_QUEUE_PRIORITY_MED_KHR (1<<1)
#define CL_QUEUE_PRIORITY_LOW_KHR (1<<2)

/*********************************
* cl_khr_throttle_hints extension
*********************************/
/* This extension define is for backwards compatibility.
   It shouldn't be required since this extension has no new functions. */
#define cl_khr_throttle_hints 1

typedef cl_uint cl_queue_throttle_khr;

/* cl_command_queue_properties */
#define CL_QUEUE_THROTTLE_KHR 0x1097

/* cl_queue_throttle_khr */
#define CL_QUEUE_THROTTLE_HIGH_KHR (1<<0)
#define CL_QUEUE_THROTTLE_MED_KHR (1<<1)
#define CL_QUEUE_THROTTLE_LOW_KHR (1<<2)

/*********************************
* cl_khr_subgroup_named_barrier
*********************************/
/* This extension define is for backwards compatibility.
   It shouldn't be required since this extension has no new functions. */
#define cl_khr_subgroup_named_barrier 1

/* cl_device_info */
#define CL_DEVICE_MAX_NAMED_BARRIER_COUNT_KHR 0x2035

/*********************************
* cl_khr_extended_versioning
*********************************/
#define CL_VERSION_MAJOR_BITS_KHR (10)
#define CL_VERSION_MINOR_BITS_KHR (10)
#define CL_VERSION_PATCH_BITS_KHR (12)

#define CL_VERSION_MAJOR_MASK_KHR ((1 << CL_VERSION_MAJOR_BITS_KHR) - 1)
#define CL_VERSION_MINOR_MASK_KHR ((1 << CL_VERSION_MINOR_BITS_KHR) - 1)
#define CL_VERSION_PATCH_MASK_KHR ((1 << CL_VERSION_PATCH_BITS_KHR) - 1)

#define CL_VERSION_MAJOR_KHR(version) ((version) >> (CL_VERSION_MINOR_BITS_KHR + CL_VERSION_PATCH_BITS_KHR))
#define CL_VERSION_MINOR_KHR(version) (((version) >> CL_VERSION_PATCH_BITS_KHR) & CL_VERSION_MINOR_MASK_KHR)
#define CL_VERSION_PATCH_KHR(version) ((version) & CL_VERSION_PATCH_MASK_KHR)

#define CL_MAKE_VERSION_KHR(major, minor, patch) \
    ((((major) & CL_VERSION_MAJOR_MASK_KHR) << (CL_VERSION_MINOR_BITS_KHR + CL_VERSION_PATCH_BITS_KHR)) | \
    (((minor) & CL_VERSION_MINOR_MASK_KHR) << CL_VERSION_PATCH_BITS_KHR) | \
    ((patch) & CL_VERSION_PATCH_MASK_KHR))

typedef cl_uint cl_version_khr;

#define CL_NAME_VERSION_MAX_NAME_SIZE_KHR 64

typedef struct _cl_name_version_khr {
    cl_version_khr version;
    char name[CL_NAME_VERSION_MAX_NAME_SIZE_KHR];
} cl_name_version_khr;

/* cl_platform_info */
#define CL_PLATFORM_NUMERIC_VERSION_KHR 0x0906
#define CL_PLATFORM_EXTENSIONS_WITH_VERSION_KHR 0x0907

/* cl_device_info */
#define CL_DEVICE_NUMERIC_VERSION_KHR 0x105E
#define CL_DEVICE_OPENCL_C_NUMERIC_VERSION_KHR 0x105F
#define CL_DEVICE_EXTENSIONS_WITH_VERSION_KHR 0x1060
#define CL_DEVICE_ILS_WITH_VERSION_KHR 0x1061
#define CL_DEVICE_BUILT_IN_KERNELS_WITH_VERSION_KHR 0x1062

/**********************************
* cl_arm_import_memory extension *
**********************************/
#define cl_arm_import_memory 1

typedef intptr_t cl_import_properties_arm;

/* Default and valid property names for cl_arm_import_memory */
#define CL_IMPORT_TYPE_ARM 0x40B2

/* Host process memory type default value for CL_IMPORT_TYPE_ARM property */
#define CL_IMPORT_TYPE_HOST_ARM 0x40B3

/* DMA BUF memory type value for CL_IMPORT_TYPE_ARM property */
#define CL_IMPORT_TYPE_DMA_BUF_ARM 0x40B4

/* Protected memory property */
#define CL_IMPORT_TYPE_PROTECTED_ARM 0x40B5

/* Android hardware buffer type value for CL_IMPORT_TYPE_ARM property */
#define CL_IMPORT_TYPE_ANDROID_HARDWARE_BUFFER_ARM 0x41E2

/* Data consistency with host property */
#define CL_IMPORT_DMA_BUF_DATA_CONSISTENCY_WITH_HOST_ARM 0x41E3
/* Import memory size value to indicate a size for the whole buffer */
#define CL_IMPORT_MEMORY_WHOLE_ALLOCATION_ARM SIZE_MAX

/* This extension adds a new function that allows for direct memory import into
 * OpenCL via the clImportMemoryARM function.
 *
 * Memory imported through this interface will be mapped into the device's page
 * tables directly, providing zero copy access. It will never fall back to copy
 * operations or aliased buffers.
 *
 * Types of memory supported for import are specified as additional extension
 * strings.
 *
 * This extension produces cl_mem allocations which are compatible with all other
 * users of cl_mem in the standard API.
 *
 * This extension maps pages with the same properties as the normal buffer creation
 * function clCreateBuffer.
 */
extern CL_API_ENTRY cl_mem CL_API_CALL
clImportMemoryARM( cl_context context,
                   cl_mem_flags flags,
                   const cl_import_properties_arm *properties,
                   void *memory,
                   size_t size,
                   cl_int *errcode_ret) CL_EXT_SUFFIX__VERSION_1_0;

/******************************************
* cl_arm_shared_virtual_memory extension *
******************************************/
#define cl_arm_shared_virtual_memory 1

/* Used by clGetDeviceInfo */
#define CL_DEVICE_SVM_CAPABILITIES_ARM 0x40B6

/* Used by clGetMemObjectInfo */
#define CL_MEM_USES_SVM_POINTER_ARM 0x40B7

/* Used by clSetKernelExecInfoARM: */
#define CL_KERNEL_EXEC_INFO_SVM_PTRS_ARM 0x40B8
#define CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM_ARM 0x40B9

/* To be used by clGetEventInfo: */
#define CL_COMMAND_SVM_FREE_ARM 0x40BA
#define CL_COMMAND_SVM_MEMCPY_ARM 0x40BB
#define CL_COMMAND_SVM_MEMFILL_ARM 0x40BC
#define CL_COMMAND_SVM_MAP_ARM 0x40BD
#define CL_COMMAND_SVM_UNMAP_ARM 0x40BE

/* Flag values returned by clGetDeviceInfo with CL_DEVICE_SVM_CAPABILITIES_ARM as the param_name.
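 *
 * Illustrative usage sketch (editorial addition, not part of the upstream
 * header), assuming a valid cl_device_id named "device" and omitting error
 * handling:
 *
 *     cl_device_svm_capabilities_arm caps = 0;
 *     clGetDeviceInfo(device, CL_DEVICE_SVM_CAPABILITIES_ARM,
 *                     sizeof(caps), &caps, NULL);
 *     if (caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER_ARM) {
 *         ... fine-grain SVM buffers can be allocated via clSVMAllocARM ...
 *     }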
*/ #define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER_ARM (1 << 0) #define CL_DEVICE_SVM_FINE_GRAIN_BUFFER_ARM (1 << 1) #define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM_ARM (1 << 2) #define CL_DEVICE_SVM_ATOMICS_ARM (1 << 3) /* Flag values used by clSVMAllocARM: */ #define CL_MEM_SVM_FINE_GRAIN_BUFFER_ARM (1 << 10) #define CL_MEM_SVM_ATOMICS_ARM (1 << 11) typedef cl_bitfield cl_svm_mem_flags_arm; typedef cl_uint cl_kernel_exec_info_arm; typedef cl_bitfield cl_device_svm_capabilities_arm; extern CL_API_ENTRY void * CL_API_CALL clSVMAllocARM(cl_context context, cl_svm_mem_flags_arm flags, size_t size, cl_uint alignment) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY void CL_API_CALL clSVMFreeARM(cl_context context, void * svm_pointer) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMFreeARM(cl_command_queue command_queue, cl_uint num_svm_pointers, void * svm_pointers[], void (CL_CALLBACK * pfn_free_func)(cl_command_queue queue, cl_uint num_svm_pointers, void * svm_pointers[], void * user_data), void * user_data, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemcpyARM(cl_command_queue command_queue, cl_bool blocking_copy, void * dst_ptr, const void * src_ptr, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemFillARM(cl_command_queue command_queue, void * svm_ptr, const void * pattern, size_t pattern_size, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMapARM(cl_command_queue command_queue, cl_bool blocking_map, cl_map_flags flags, void * svm_ptr, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMUnmapARM(cl_command_queue command_queue, void * svm_ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelArgSVMPointerARM(cl_kernel kernel, cl_uint arg_index, const void * arg_value) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelExecInfoARM(cl_kernel kernel, cl_kernel_exec_info_arm param_name, size_t param_value_size, const void * param_value) CL_EXT_SUFFIX__VERSION_1_2; /******************************** * cl_arm_get_core_id extension * ********************************/ #ifdef CL_VERSION_1_2 #define cl_arm_get_core_id 1 /* Device info property for bitfield of cores present */ #define CL_DEVICE_COMPUTE_UNITS_BITFIELD_ARM 0x40BF #endif /* CL_VERSION_1_2 */ /********************************* * cl_arm_job_slot_selection *********************************/ #define cl_arm_job_slot_selection 1 /* cl_device_info */ #define CL_DEVICE_JOB_SLOTS_ARM 0x41E0 /* cl_command_queue_properties */ #define CL_QUEUE_JOB_SLOT_ARM 0x41E1 #ifdef __cplusplus } #endif #endif /* __CL_EXT_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_ext_intel.h000066400000000000000000000446601450307266000244570ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2020 The Khronos Group Inc. 
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
******************************************************************************/

/*****************************************************************************\

Copyright (c) 2013-2019 Intel Corporation

All Rights Reserved.

THESE MATERIALS ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS CONTRIBUTORS BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THESE
MATERIALS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

File Name: cl_ext_intel.h

Abstract:

Notes:

\*****************************************************************************/

#ifndef __CL_EXT_INTEL_H
#define __CL_EXT_INTEL_H

#include <CL/cl.h>
#include <CL/cl_platform.h>

#ifdef __cplusplus
extern "C" {
#endif

/***************************************
* cl_intel_thread_local_exec extension *
****************************************/
#define cl_intel_thread_local_exec 1

#define CL_QUEUE_THREAD_LOCAL_EXEC_ENABLE_INTEL (((cl_bitfield)1) << 31)

/***********************************************
* cl_intel_device_partition_by_names extension *
************************************************/
#define cl_intel_device_partition_by_names 1

#define CL_DEVICE_PARTITION_BY_NAMES_INTEL 0x4052
#define CL_PARTITION_BY_NAMES_LIST_END_INTEL -1

/************************************************
* cl_intel_accelerator extension                *
* cl_intel_motion_estimation extension          *
* cl_intel_advanced_motion_estimation extension *
*************************************************/
#define cl_intel_accelerator 1
#define cl_intel_motion_estimation 1
#define cl_intel_advanced_motion_estimation 1

typedef struct _cl_accelerator_intel* cl_accelerator_intel;
typedef cl_uint cl_accelerator_type_intel;
typedef cl_uint cl_accelerator_info_intel;

typedef struct _cl_motion_estimation_desc_intel {
    cl_uint mb_block_type;
    cl_uint subpixel_mode;
    cl_uint sad_adjust_mode;
    cl_uint search_path_type;
} cl_motion_estimation_desc_intel;

/* error codes */
#define CL_INVALID_ACCELERATOR_INTEL -1094
#define CL_INVALID_ACCELERATOR_TYPE_INTEL -1095
#define CL_INVALID_ACCELERATOR_DESCRIPTOR_INTEL -1096
#define CL_ACCELERATOR_TYPE_NOT_SUPPORTED_INTEL -1097

/* cl_accelerator_type_intel */
#define CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL 0x0

/* cl_accelerator_info_intel */
#define CL_ACCELERATOR_DESCRIPTOR_INTEL 0x4090
#define CL_ACCELERATOR_REFERENCE_COUNT_INTEL 0x4091
#define CL_ACCELERATOR_CONTEXT_INTEL 0x4092
#define CL_ACCELERATOR_TYPE_INTEL 0x4093

/* cl_motion_detect_desc_intel flags */
#define CL_ME_MB_TYPE_16x16_INTEL 0x0
#define
CL_ME_MB_TYPE_8x8_INTEL 0x1 #define CL_ME_MB_TYPE_4x4_INTEL 0x2 #define CL_ME_SUBPIXEL_MODE_INTEGER_INTEL 0x0 #define CL_ME_SUBPIXEL_MODE_HPEL_INTEL 0x1 #define CL_ME_SUBPIXEL_MODE_QPEL_INTEL 0x2 #define CL_ME_SAD_ADJUST_MODE_NONE_INTEL 0x0 #define CL_ME_SAD_ADJUST_MODE_HAAR_INTEL 0x1 #define CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL 0x0 #define CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL 0x1 #define CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL 0x5 #define CL_ME_SKIP_BLOCK_TYPE_16x16_INTEL 0x0 #define CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL 0x1 #define CL_ME_LUMA_INTRA_PREDICT_ENABLED_INTEL 0x2 #define CL_ME_SKIP_BLOCK_TYPE_8x8_INTEL 0x4 #define CL_ME_FORWARD_INPUT_MODE_INTEL 0x1 #define CL_ME_BACKWARD_INPUT_MODE_INTEL 0x2 #define CL_ME_BIDIRECTION_INPUT_MODE_INTEL 0x3 #define CL_ME_BIDIR_WEIGHT_QUARTER_INTEL 16 #define CL_ME_BIDIR_WEIGHT_THIRD_INTEL 21 #define CL_ME_BIDIR_WEIGHT_HALF_INTEL 32 #define CL_ME_BIDIR_WEIGHT_TWO_THIRD_INTEL 43 #define CL_ME_BIDIR_WEIGHT_THREE_QUARTER_INTEL 48 #define CL_ME_COST_PENALTY_NONE_INTEL 0x0 #define CL_ME_COST_PENALTY_LOW_INTEL 0x1 #define CL_ME_COST_PENALTY_NORMAL_INTEL 0x2 #define CL_ME_COST_PENALTY_HIGH_INTEL 0x3 #define CL_ME_COST_PRECISION_QPEL_INTEL 0x0 #define CL_ME_COST_PRECISION_HPEL_INTEL 0x1 #define CL_ME_COST_PRECISION_PEL_INTEL 0x2 #define CL_ME_COST_PRECISION_DPEL_INTEL 0x3 #define CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_INTEL 0x0 #define CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_INTEL 0x1 #define CL_ME_LUMA_PREDICTOR_MODE_DC_INTEL 0x2 #define CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_LEFT_INTEL 0x3 #define CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_RIGHT_INTEL 0x4 #define CL_ME_LUMA_PREDICTOR_MODE_PLANE_INTEL 0x4 #define CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_RIGHT_INTEL 0x5 #define CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_DOWN_INTEL 0x6 #define CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_LEFT_INTEL 0x7 #define CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_UP_INTEL 0x8 #define CL_ME_CHROMA_PREDICTOR_MODE_DC_INTEL 0x0 #define CL_ME_CHROMA_PREDICTOR_MODE_HORIZONTAL_INTEL 0x1 #define CL_ME_CHROMA_PREDICTOR_MODE_VERTICAL_INTEL 0x2 #define CL_ME_CHROMA_PREDICTOR_MODE_PLANE_INTEL 0x3 /* cl_device_info */ #define CL_DEVICE_ME_VERSION_INTEL 0x407E #define CL_ME_VERSION_LEGACY_INTEL 0x0 #define CL_ME_VERSION_ADVANCED_VER_1_INTEL 0x1 #define CL_ME_VERSION_ADVANCED_VER_2_INTEL 0x2 extern CL_API_ENTRY cl_accelerator_intel CL_API_CALL clCreateAcceleratorINTEL( cl_context context, cl_accelerator_type_intel accelerator_type, size_t descriptor_size, const void* descriptor, cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_accelerator_intel (CL_API_CALL *clCreateAcceleratorINTEL_fn)( cl_context context, cl_accelerator_type_intel accelerator_type, size_t descriptor_size, const void* descriptor, cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clGetAcceleratorInfoINTEL( cl_accelerator_intel accelerator, cl_accelerator_info_intel param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetAcceleratorInfoINTEL_fn)( cl_accelerator_intel accelerator, cl_accelerator_info_intel param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clRetainAcceleratorINTEL( cl_accelerator_intel accelerator) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clRetainAcceleratorINTEL_fn)( cl_accelerator_intel accelerator) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY 
cl_int CL_API_CALL clReleaseAcceleratorINTEL(
    cl_accelerator_intel accelerator) CL_EXT_SUFFIX__VERSION_1_2;

typedef CL_API_ENTRY cl_int (CL_API_CALL *clReleaseAcceleratorINTEL_fn)(
    cl_accelerator_intel accelerator) CL_EXT_SUFFIX__VERSION_1_2;

/******************************************
* cl_intel_simultaneous_sharing extension *
*******************************************/
#define cl_intel_simultaneous_sharing 1

#define CL_DEVICE_SIMULTANEOUS_INTEROPS_INTEL 0x4104
#define CL_DEVICE_NUM_SIMULTANEOUS_INTEROPS_INTEL 0x4105

/***********************************
* cl_intel_egl_image_yuv extension *
************************************/
#define cl_intel_egl_image_yuv 1

#define CL_EGL_YUV_PLANE_INTEL 0x4107

/********************************
* cl_intel_packed_yuv extension *
*********************************/
#define cl_intel_packed_yuv 1

#define CL_YUYV_INTEL 0x4076
#define CL_UYVY_INTEL 0x4077
#define CL_YVYU_INTEL 0x4078
#define CL_VYUY_INTEL 0x4079

/********************************************
* cl_intel_required_subgroup_size extension *
*********************************************/
#define cl_intel_required_subgroup_size 1

#define CL_DEVICE_SUB_GROUP_SIZES_INTEL 0x4108
#define CL_KERNEL_SPILL_MEM_SIZE_INTEL 0x4109
#define CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL 0x410A

/****************************************
* cl_intel_driver_diagnostics extension *
*****************************************/
#define cl_intel_driver_diagnostics 1

typedef cl_uint cl_diagnostics_verbose_level;

#define CL_CONTEXT_SHOW_DIAGNOSTICS_INTEL 0x4106

#define CL_CONTEXT_DIAGNOSTICS_LEVEL_ALL_INTEL ( 0xff )
#define CL_CONTEXT_DIAGNOSTICS_LEVEL_GOOD_INTEL ( 1 )
#define CL_CONTEXT_DIAGNOSTICS_LEVEL_BAD_INTEL ( 1 << 1 )
#define CL_CONTEXT_DIAGNOSTICS_LEVEL_NEUTRAL_INTEL ( 1 << 2 )

/********************************
* cl_intel_planar_yuv extension *
*********************************/
#define CL_NV12_INTEL 0x410E

#define CL_MEM_NO_ACCESS_INTEL ( 1 << 24 )
#define CL_MEM_ACCESS_FLAGS_UNRESTRICTED_INTEL ( 1 << 25 )

#define CL_DEVICE_PLANAR_YUV_MAX_WIDTH_INTEL 0x417E
#define CL_DEVICE_PLANAR_YUV_MAX_HEIGHT_INTEL 0x417F

/*******************************************************
* cl_intel_device_side_avc_motion_estimation extension *
********************************************************/
#define CL_DEVICE_AVC_ME_VERSION_INTEL 0x410B
#define CL_DEVICE_AVC_ME_SUPPORTS_TEXTURE_SAMPLER_USE_INTEL 0x410C
#define CL_DEVICE_AVC_ME_SUPPORTS_PREEMPTION_INTEL 0x410D

#define CL_AVC_ME_VERSION_0_INTEL 0x0 /* No support. */
#define CL_AVC_ME_VERSION_1_INTEL 0x1 /* First supported version. */
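/* Illustrative usage sketch (editorial addition, not part of the upstream
 * header): creating a motion-estimation accelerator with the
 * cl_intel_accelerator API declared above, assuming a valid cl_context named
 * "context" and omitting error handling:
 *
 *     cl_motion_estimation_desc_intel desc = {
 *         CL_ME_MB_TYPE_16x16_INTEL,
 *         CL_ME_SUBPIXEL_MODE_INTEGER_INTEL,
 *         CL_ME_SAD_ADJUST_MODE_NONE_INTEL,
 *         CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL };
 *     cl_int err = CL_SUCCESS;
 *     cl_accelerator_intel acc = clCreateAcceleratorINTEL(
 *         context, CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL,
 *         sizeof(desc), &desc, &err);
 *     ...
 *     clReleaseAcceleratorINTEL(acc);
 */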
#define CL_AVC_ME_MAJOR_16x16_INTEL 0x0 #define CL_AVC_ME_MAJOR_16x8_INTEL 0x1 #define CL_AVC_ME_MAJOR_8x16_INTEL 0x2 #define CL_AVC_ME_MAJOR_8x8_INTEL 0x3 #define CL_AVC_ME_MINOR_8x8_INTEL 0x0 #define CL_AVC_ME_MINOR_8x4_INTEL 0x1 #define CL_AVC_ME_MINOR_4x8_INTEL 0x2 #define CL_AVC_ME_MINOR_4x4_INTEL 0x3 #define CL_AVC_ME_MAJOR_FORWARD_INTEL 0x0 #define CL_AVC_ME_MAJOR_BACKWARD_INTEL 0x1 #define CL_AVC_ME_MAJOR_BIDIRECTIONAL_INTEL 0x2 #define CL_AVC_ME_PARTITION_MASK_ALL_INTEL 0x0 #define CL_AVC_ME_PARTITION_MASK_16x16_INTEL 0x7E #define CL_AVC_ME_PARTITION_MASK_16x8_INTEL 0x7D #define CL_AVC_ME_PARTITION_MASK_8x16_INTEL 0x7B #define CL_AVC_ME_PARTITION_MASK_8x8_INTEL 0x77 #define CL_AVC_ME_PARTITION_MASK_8x4_INTEL 0x6F #define CL_AVC_ME_PARTITION_MASK_4x8_INTEL 0x5F #define CL_AVC_ME_PARTITION_MASK_4x4_INTEL 0x3F #define CL_AVC_ME_SEARCH_WINDOW_EXHAUSTIVE_INTEL 0x0 #define CL_AVC_ME_SEARCH_WINDOW_SMALL_INTEL 0x1 #define CL_AVC_ME_SEARCH_WINDOW_TINY_INTEL 0x2 #define CL_AVC_ME_SEARCH_WINDOW_EXTRA_TINY_INTEL 0x3 #define CL_AVC_ME_SEARCH_WINDOW_DIAMOND_INTEL 0x4 #define CL_AVC_ME_SEARCH_WINDOW_LARGE_DIAMOND_INTEL 0x5 #define CL_AVC_ME_SEARCH_WINDOW_RESERVED0_INTEL 0x6 #define CL_AVC_ME_SEARCH_WINDOW_RESERVED1_INTEL 0x7 #define CL_AVC_ME_SEARCH_WINDOW_CUSTOM_INTEL 0x8 #define CL_AVC_ME_SEARCH_WINDOW_16x12_RADIUS_INTEL 0x9 #define CL_AVC_ME_SEARCH_WINDOW_4x4_RADIUS_INTEL 0x2 #define CL_AVC_ME_SEARCH_WINDOW_2x2_RADIUS_INTEL 0xa #define CL_AVC_ME_SAD_ADJUST_MODE_NONE_INTEL 0x0 #define CL_AVC_ME_SAD_ADJUST_MODE_HAAR_INTEL 0x2 #define CL_AVC_ME_SUBPIXEL_MODE_INTEGER_INTEL 0x0 #define CL_AVC_ME_SUBPIXEL_MODE_HPEL_INTEL 0x1 #define CL_AVC_ME_SUBPIXEL_MODE_QPEL_INTEL 0x3 #define CL_AVC_ME_COST_PRECISION_QPEL_INTEL 0x0 #define CL_AVC_ME_COST_PRECISION_HPEL_INTEL 0x1 #define CL_AVC_ME_COST_PRECISION_PEL_INTEL 0x2 #define CL_AVC_ME_COST_PRECISION_DPEL_INTEL 0x3 #define CL_AVC_ME_BIDIR_WEIGHT_QUARTER_INTEL 0x10 #define CL_AVC_ME_BIDIR_WEIGHT_THIRD_INTEL 0x15 #define CL_AVC_ME_BIDIR_WEIGHT_HALF_INTEL 0x20 #define CL_AVC_ME_BIDIR_WEIGHT_TWO_THIRD_INTEL 0x2B #define CL_AVC_ME_BIDIR_WEIGHT_THREE_QUARTER_INTEL 0x30 #define CL_AVC_ME_BORDER_REACHED_LEFT_INTEL 0x0 #define CL_AVC_ME_BORDER_REACHED_RIGHT_INTEL 0x2 #define CL_AVC_ME_BORDER_REACHED_TOP_INTEL 0x4 #define CL_AVC_ME_BORDER_REACHED_BOTTOM_INTEL 0x8 #define CL_AVC_ME_SKIP_BLOCK_PARTITION_16x16_INTEL 0x0 #define CL_AVC_ME_SKIP_BLOCK_PARTITION_8x8_INTEL 0x4000 #define CL_AVC_ME_SKIP_BLOCK_16x16_FORWARD_ENABLE_INTEL ( 0x1 << 24 ) #define CL_AVC_ME_SKIP_BLOCK_16x16_BACKWARD_ENABLE_INTEL ( 0x2 << 24 ) #define CL_AVC_ME_SKIP_BLOCK_16x16_DUAL_ENABLE_INTEL ( 0x3 << 24 ) #define CL_AVC_ME_SKIP_BLOCK_8x8_FORWARD_ENABLE_INTEL ( 0x55 << 24 ) #define CL_AVC_ME_SKIP_BLOCK_8x8_BACKWARD_ENABLE_INTEL ( 0xAA << 24 ) #define CL_AVC_ME_SKIP_BLOCK_8x8_DUAL_ENABLE_INTEL ( 0xFF << 24 ) #define CL_AVC_ME_SKIP_BLOCK_8x8_0_FORWARD_ENABLE_INTEL ( 0x1 << 24 ) #define CL_AVC_ME_SKIP_BLOCK_8x8_0_BACKWARD_ENABLE_INTEL ( 0x2 << 24 ) #define CL_AVC_ME_SKIP_BLOCK_8x8_1_FORWARD_ENABLE_INTEL ( 0x1 << 26 ) #define CL_AVC_ME_SKIP_BLOCK_8x8_1_BACKWARD_ENABLE_INTEL ( 0x2 << 26 ) #define CL_AVC_ME_SKIP_BLOCK_8x8_2_FORWARD_ENABLE_INTEL ( 0x1 << 28 ) #define CL_AVC_ME_SKIP_BLOCK_8x8_2_BACKWARD_ENABLE_INTEL ( 0x2 << 28 ) #define CL_AVC_ME_SKIP_BLOCK_8x8_3_FORWARD_ENABLE_INTEL ( 0x1 << 30 ) #define CL_AVC_ME_SKIP_BLOCK_8x8_3_BACKWARD_ENABLE_INTEL ( 0x2 << 30 ) #define CL_AVC_ME_BLOCK_BASED_SKIP_4x4_INTEL 0x00 #define CL_AVC_ME_BLOCK_BASED_SKIP_8x8_INTEL 0x80 #define CL_AVC_ME_INTRA_16x16_INTEL 
0x0 #define CL_AVC_ME_INTRA_8x8_INTEL 0x1 #define CL_AVC_ME_INTRA_4x4_INTEL 0x2 #define CL_AVC_ME_INTRA_LUMA_PARTITION_MASK_16x16_INTEL 0x6 #define CL_AVC_ME_INTRA_LUMA_PARTITION_MASK_8x8_INTEL 0x5 #define CL_AVC_ME_INTRA_LUMA_PARTITION_MASK_4x4_INTEL 0x3 #define CL_AVC_ME_INTRA_NEIGHBOR_LEFT_MASK_ENABLE_INTEL 0x60 #define CL_AVC_ME_INTRA_NEIGHBOR_UPPER_MASK_ENABLE_INTEL 0x10 #define CL_AVC_ME_INTRA_NEIGHBOR_UPPER_RIGHT_MASK_ENABLE_INTEL 0x8 #define CL_AVC_ME_INTRA_NEIGHBOR_UPPER_LEFT_MASK_ENABLE_INTEL 0x4 #define CL_AVC_ME_LUMA_PREDICTOR_MODE_VERTICAL_INTEL 0x0 #define CL_AVC_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_INTEL 0x1 #define CL_AVC_ME_LUMA_PREDICTOR_MODE_DC_INTEL 0x2 #define CL_AVC_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_LEFT_INTEL 0x3 #define CL_AVC_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_RIGHT_INTEL 0x4 #define CL_AVC_ME_LUMA_PREDICTOR_MODE_PLANE_INTEL 0x4 #define CL_AVC_ME_LUMA_PREDICTOR_MODE_VERTICAL_RIGHT_INTEL 0x5 #define CL_AVC_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_DOWN_INTEL 0x6 #define CL_AVC_ME_LUMA_PREDICTOR_MODE_VERTICAL_LEFT_INTEL 0x7 #define CL_AVC_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_UP_INTEL 0x8 #define CL_AVC_ME_CHROMA_PREDICTOR_MODE_DC_INTEL 0x0 #define CL_AVC_ME_CHROMA_PREDICTOR_MODE_HORIZONTAL_INTEL 0x1 #define CL_AVC_ME_CHROMA_PREDICTOR_MODE_VERTICAL_INTEL 0x2 #define CL_AVC_ME_CHROMA_PREDICTOR_MODE_PLANE_INTEL 0x3 #define CL_AVC_ME_FRAME_FORWARD_INTEL 0x1 #define CL_AVC_ME_FRAME_BACKWARD_INTEL 0x2 #define CL_AVC_ME_FRAME_DUAL_INTEL 0x3 #define CL_AVC_ME_SLICE_TYPE_PRED_INTEL 0x0 #define CL_AVC_ME_SLICE_TYPE_BPRED_INTEL 0x1 #define CL_AVC_ME_SLICE_TYPE_INTRA_INTEL 0x2 #define CL_AVC_ME_INTERLACED_SCAN_TOP_FIELD_INTEL 0x0 #define CL_AVC_ME_INTERLACED_SCAN_BOTTOM_FIELD_INTEL 0x1 #ifdef __cplusplus } #endif #endif /* __CL_EXT_INTEL_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_gl.h000066400000000000000000000145121450307266000230570ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
******************************************************************************/

#ifndef __OPENCL_CL_GL_H
#define __OPENCL_CL_GL_H

#include <CL/cl.h>

#ifdef __cplusplus
extern "C" {
#endif

typedef cl_uint cl_gl_object_type;
typedef cl_uint cl_gl_texture_info;
typedef cl_uint cl_gl_platform_info;
typedef struct __GLsync *cl_GLsync;

/* cl_gl_object_type = 0x2000 - 0x200F enum values are currently taken */
#define CL_GL_OBJECT_BUFFER 0x2000
#define CL_GL_OBJECT_TEXTURE2D 0x2001
#define CL_GL_OBJECT_TEXTURE3D 0x2002
#define CL_GL_OBJECT_RENDERBUFFER 0x2003
#ifdef CL_VERSION_1_2
#define CL_GL_OBJECT_TEXTURE2D_ARRAY 0x200E
#define CL_GL_OBJECT_TEXTURE1D 0x200F
#define CL_GL_OBJECT_TEXTURE1D_ARRAY 0x2010
#define CL_GL_OBJECT_TEXTURE_BUFFER 0x2011
#endif

/* cl_gl_texture_info */
#define CL_GL_TEXTURE_TARGET 0x2004
#define CL_GL_MIPMAP_LEVEL 0x2005
#ifdef CL_VERSION_1_2
#define CL_GL_NUM_SAMPLES 0x2012
#endif

extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromGLBuffer(cl_context context,
                     cl_mem_flags flags,
                     cl_GLuint bufobj,
                     cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0;

#ifdef CL_VERSION_1_2

extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromGLTexture(cl_context context,
                      cl_mem_flags flags,
                      cl_GLenum target,
                      cl_GLint miplevel,
                      cl_GLuint texture,
                      cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2;

#endif

extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromGLRenderbuffer(cl_context context,
                           cl_mem_flags flags,
                           cl_GLuint renderbuffer,
                           cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0;

extern CL_API_ENTRY cl_int CL_API_CALL
clGetGLObjectInfo(cl_mem memobj,
                  cl_gl_object_type * gl_object_type,
                  cl_GLuint * gl_object_name) CL_API_SUFFIX__VERSION_1_0;

extern CL_API_ENTRY cl_int CL_API_CALL
clGetGLTextureInfo(cl_mem memobj,
                   cl_gl_texture_info param_name,
                   size_t param_value_size,
                   void * param_value,
                   size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0;

extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueAcquireGLObjects(cl_command_queue command_queue,
                          cl_uint num_objects,
                          const cl_mem * mem_objects,
                          cl_uint num_events_in_wait_list,
                          const cl_event * event_wait_list,
                          cl_event * event) CL_API_SUFFIX__VERSION_1_0;

extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueReleaseGLObjects(cl_command_queue command_queue,
                          cl_uint num_objects,
                          const cl_mem * mem_objects,
                          cl_uint num_events_in_wait_list,
                          const cl_event * event_wait_list,
                          cl_event * event) CL_API_SUFFIX__VERSION_1_0;

/* Deprecated OpenCL 1.1 APIs */
extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL
clCreateFromGLTexture2D(cl_context context,
                        cl_mem_flags flags,
                        cl_GLenum target,
                        cl_GLint miplevel,
                        cl_GLuint texture,
                        cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL
clCreateFromGLTexture3D(cl_context context,
                        cl_mem_flags flags,
                        cl_GLenum target,
                        cl_GLint miplevel,
                        cl_GLuint texture,
                        cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;

/* cl_khr_gl_sharing extension */

#define cl_khr_gl_sharing 1

typedef cl_uint cl_gl_context_info;

/* Additional Error Codes */
#define CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR -1000

/* cl_gl_context_info */
#define CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR 0x2006
#define CL_DEVICES_FOR_GL_CONTEXT_KHR 0x2007

/* Additional cl_context_properties */
#define CL_GL_CONTEXT_KHR 0x2008
#define CL_EGL_DISPLAY_KHR 0x2009
#define CL_GLX_DISPLAY_KHR 0x200A
#define CL_WGL_HDC_KHR 0x200B
#define CL_CGL_SHAREGROUP_KHR 0x200C

extern CL_API_ENTRY cl_int CL_API_CALL
clGetGLContextInfoKHR(const cl_context_properties * properties,
                      cl_gl_context_info param_name,
                      size_t param_value_size,
                      void * param_value,
                      size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0;

typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetGLContextInfoKHR_fn)(
    const cl_context_properties * properties,
    cl_gl_context_info param_name,
    size_t param_value_size,
    void * param_value,
    size_t * param_value_size_ret);

#ifdef __cplusplus
}
#endif

#endif /* __OPENCL_CL_GL_H */
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_gl_ext.h000066400000000000000000000023631450307266000237400ustar00rootroot00000000000000/*******************************************************************************
 * Copyright (c) 2008-2020 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 ******************************************************************************/

#ifndef __OPENCL_CL_GL_EXT_H
#define __OPENCL_CL_GL_EXT_H

#ifdef __cplusplus
extern "C" {
#endif

#include <CL/cl_gl.h>

/*
 * cl_khr_gl_event extension
 */
#define CL_COMMAND_GL_FENCE_SYNC_OBJECT_KHR 0x200D

extern CL_API_ENTRY cl_event CL_API_CALL
clCreateEventFromGLsyncKHR(cl_context context,
                           cl_GLsync sync,
                           cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_1;

#ifdef __cplusplus
}
#endif

#endif /* __OPENCL_CL_GL_EXT_H */
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_icd.h000066400000000000000000001472461450307266000232270ustar00rootroot00000000000000/*******************************************************************************
 * Copyright (c) 2008-2020 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 ******************************************************************************/

#ifndef OPENCL_CL_ICD_H
#define OPENCL_CL_ICD_H

#include <CL/cl.h>
#include <CL/cl_egl.h>
#include <CL/cl_ext.h>
#include <CL/cl_gl.h>

#ifdef __cplusplus
extern "C" {
#endif

/*
 * This file contains pointer type definitions for each of the CL API calls as
 * well as a type definition for the dispatch table used by the Khronos ICD
 * loader (see cl_khr_icd extension specification for background).
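 *
 * Illustrative sketch (editorial addition, not part of the upstream header):
 * an ICD loader typically forwards each entry point through the dispatch
 * table stored at the start of every dispatchable object, roughly:
 *
 *     CL_API_ENTRY cl_int CL_API_CALL clRetainContext(cl_context context) {
 *         return context->dispatch->clRetainContext(context);
 *     }
 *
 * (The member name "dispatch" and the object layout shown here are loader
 * conventions assumed for illustration, not mandated by this header.)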
*/ /* API function pointer definitions */ // Platform APIs typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetPlatformIDs)( cl_uint num_entries, cl_platform_id *platforms, cl_uint *num_platforms) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetPlatformInfo)( cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; // Device APIs typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetDeviceIDs)( cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetDeviceInfo)( cl_device_id device, cl_device_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clCreateSubDevices)( cl_device_id in_device, const cl_device_partition_property *partition_properties, cl_uint num_entries, cl_device_id *out_devices, cl_uint *num_devices); typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clRetainDevice)( cl_device_id device) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clReleaseDevice)( cl_device_id device) CL_API_SUFFIX__VERSION_1_2; #else typedef void *cl_api_clCreateSubDevices; typedef void *cl_api_clRetainDevice; typedef void *cl_api_clReleaseDevice; #endif // Context APIs typedef CL_API_ENTRY cl_context(CL_API_CALL *cl_api_clCreateContext)( const cl_context_properties *properties, cl_uint num_devices, const cl_device_id *devices, void(CL_CALLBACK *pfn_notify)(const char *, const void *, size_t, void *), void *user_data, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_context(CL_API_CALL *cl_api_clCreateContextFromType)( const cl_context_properties *properties, cl_device_type device_type, void(CL_CALLBACK *pfn_notify)(const char *, const void *, size_t, void *), void *user_data, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clRetainContext)( cl_context context) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clReleaseContext)( cl_context context) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetContextInfo)( cl_context context, cl_context_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; // Command Queue APIs typedef CL_API_ENTRY cl_command_queue(CL_API_CALL *cl_api_clCreateCommandQueue)( cl_context context, cl_device_id device, cl_command_queue_properties properties, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_2_0 typedef CL_API_ENTRY cl_command_queue(CL_API_CALL *cl_api_clCreateCommandQueueWithProperties)( cl_context /* context */, cl_device_id /* device */, const cl_queue_properties * /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; #else typedef void *cl_api_clCreateCommandQueueWithProperties; #endif typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clRetainCommandQueue)( cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clReleaseCommandQueue)( cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetCommandQueueInfo)( cl_command_queue command_queue, cl_command_queue_info param_name, size_t param_value_size, void 
*param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; // Memory Object APIs typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateBuffer)( cl_context context, cl_mem_flags flags, size_t size, void *host_ptr, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateImage)( cl_context context, cl_mem_flags flags, const cl_image_format *image_format, const cl_image_desc *image_desc, void *host_ptr, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_2; #else typedef void *cl_api_clCreateImage; #endif typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clRetainMemObject)( cl_mem memobj) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clReleaseMemObject)( cl_mem memobj) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetSupportedImageFormats)( cl_context context, cl_mem_flags flags, cl_mem_object_type image_type, cl_uint num_entries, cl_image_format *image_formats, cl_uint *num_image_formats) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetMemObjectInfo)( cl_mem memobj, cl_mem_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetImageInfo)( cl_mem image, cl_image_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_2_0 typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreatePipe)( cl_context /* context */, cl_mem_flags /* flags */, cl_uint /* pipe_packet_size */, cl_uint /* pipe_max_packets */, const cl_pipe_properties * /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetPipeInfo)( cl_mem /* pipe */, cl_pipe_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY void *(CL_API_CALL *cl_api_clSVMAlloc)( cl_context /* context */, cl_svm_mem_flags /* flags */, size_t /* size */, unsigned int /* alignment */)CL_API_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY void(CL_API_CALL *cl_api_clSVMFree)( cl_context /* context */, void * /* svm_pointer */) CL_API_SUFFIX__VERSION_2_0; #else typedef void *cl_api_clCreatePipe; typedef void *cl_api_clGetPipeInfo; typedef void *cl_api_clSVMAlloc; typedef void *cl_api_clSVMFree; #endif // Sampler APIs typedef CL_API_ENTRY cl_sampler(CL_API_CALL *cl_api_clCreateSampler)( cl_context context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clRetainSampler)( cl_sampler sampler) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clReleaseSampler)( cl_sampler sampler) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetSamplerInfo)( cl_sampler sampler, cl_sampler_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_2_0 typedef CL_API_ENTRY cl_sampler(CL_API_CALL *cl_api_clCreateSamplerWithProperties)( cl_context /* context */, const cl_sampler_properties * /* sampler_properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; #else typedef void *cl_api_clCreateSamplerWithProperties; #endif // Program Object APIs typedef CL_API_ENTRY 
cl_program(CL_API_CALL *cl_api_clCreateProgramWithSource)( cl_context context, cl_uint count, const char **strings, const size_t *lengths, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_program(CL_API_CALL *cl_api_clCreateProgramWithBinary)( cl_context context, cl_uint num_devices, const cl_device_id *device_list, const size_t *lengths, const unsigned char **binaries, cl_int *binary_status, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 typedef CL_API_ENTRY cl_program(CL_API_CALL *cl_api_clCreateProgramWithBuiltInKernels)( cl_context context, cl_uint num_devices, const cl_device_id *device_list, const char *kernel_names, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_2; #else typedef void *cl_api_clCreateProgramWithBuiltInKernels; #endif typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clRetainProgram)( cl_program program) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clReleaseProgram)( cl_program program) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clBuildProgram)( cl_program program, cl_uint num_devices, const cl_device_id *device_list, const char *options, void(CL_CALLBACK *pfn_notify)(cl_program program, void *user_data), void *user_data) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clCompileProgram)( cl_program program, cl_uint num_devices, const cl_device_id *device_list, const char *options, cl_uint num_input_headers, const cl_program *input_headers, const char **header_include_names, void(CL_CALLBACK *pfn_notify)(cl_program program, void *user_data), void *user_data) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_program(CL_API_CALL *cl_api_clLinkProgram)( cl_context context, cl_uint num_devices, const cl_device_id *device_list, const char *options, cl_uint num_input_programs, const cl_program *input_programs, void(CL_CALLBACK *pfn_notify)(cl_program program, void *user_data), void *user_data, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_2; #else typedef void *cl_api_clCompileProgram; typedef void *cl_api_clLinkProgram; #endif #ifdef CL_VERSION_2_2 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clSetProgramSpecializationConstant)( cl_program program, cl_uint spec_id, size_t spec_size, const void *spec_value) CL_API_SUFFIX__VERSION_2_2; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clSetProgramReleaseCallback)( cl_program program, void(CL_CALLBACK *pfn_notify)(cl_program program, void *user_data), void *user_data) CL_API_SUFFIX__VERSION_2_2; #else typedef void *cl_api_clSetProgramSpecializationConstant; typedef void *cl_api_clSetProgramReleaseCallback; #endif #ifdef CL_VERSION_1_2 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clUnloadPlatformCompiler)( cl_platform_id platform) CL_API_SUFFIX__VERSION_1_2; #else typedef void *cl_api_clUnloadPlatformCompiler; #endif typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetProgramInfo)( cl_program program, cl_program_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetProgramBuildInfo)( cl_program program, cl_device_id device, cl_program_build_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; // Kernel Object APIs typedef CL_API_ENTRY cl_kernel(CL_API_CALL *cl_api_clCreateKernel)( cl_program program, const char *kernel_name, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef 
CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clCreateKernelsInProgram)( cl_program program, cl_uint num_kernels, cl_kernel *kernels, cl_uint *num_kernels_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clRetainKernel)( cl_kernel kernel) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clReleaseKernel)( cl_kernel kernel) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clSetKernelArg)( cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void *arg_value) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetKernelInfo)( cl_kernel kernel, cl_kernel_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetKernelArgInfo)( cl_kernel kernel, cl_uint arg_indx, cl_kernel_arg_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_2; #else typedef void *cl_api_clGetKernelArgInfo; #endif typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetKernelWorkGroupInfo)( cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_2_0 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clSetKernelArgSVMPointer)( cl_kernel /* kernel */, cl_uint /* arg_index */, const void * /* arg_value */) CL_API_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clSetKernelExecInfo)( cl_kernel /* kernel */, cl_kernel_exec_info /* param_name */, size_t /* param_value_size */, const void * /* param_value */) CL_API_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetKernelSubGroupInfoKHR)( cl_kernel /* in_kernel */, cl_device_id /*in_device*/, cl_kernel_sub_group_info /* param_name */, size_t /*input_value_size*/, const void * /*input_value*/, size_t /*param_value_size*/, void * /*param_value*/, size_t * /*param_value_size_ret*/) CL_EXT_SUFFIX__VERSION_2_0; #else typedef void *cl_api_clSetKernelArgSVMPointer; typedef void *cl_api_clSetKernelExecInfo; typedef void *cl_api_clGetKernelSubGroupInfoKHR; #endif // Event Object APIs typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clWaitForEvents)( cl_uint num_events, const cl_event *event_list) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetEventInfo)( cl_event event, cl_event_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clRetainEvent)(cl_event event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clReleaseEvent)(cl_event event) CL_API_SUFFIX__VERSION_1_0; // Profiling APIs typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetEventProfilingInfo)( cl_event event, cl_profiling_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; // Flush and Finish APIs typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clFlush)( cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clFinish)( cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0; // Enqueued Commands APIs typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueReadBuffer)( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, 
size_t cb, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_1 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueReadBufferRect)( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, const size_t *buffer_origin, const size_t *host_origin, const size_t *region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_1; #else typedef void *cl_api_clEnqueueReadBufferRect; #endif typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueWriteBuffer)( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t cb, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_1 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueWriteBufferRect)( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, const size_t *buffer_origin, const size_t *host_origin, const size_t *region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_1; #else typedef void *cl_api_clEnqueueWriteBufferRect; #endif #ifdef CL_VERSION_1_2 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueFillBuffer)( cl_command_queue command_queue, cl_mem buffer, const void *pattern, size_t pattern_size, size_t offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_2; #else typedef void *cl_api_clEnqueueFillBuffer; #endif typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueCopyBuffer)( cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, size_t src_offset, size_t dst_offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_1 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueCopyBufferRect)( cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, const size_t *src_origin, const size_t *dst_origin, const size_t *region, size_t src_row_pitch, size_t src_slice_pitch, size_t dst_row_pitch, size_t dst_slice_pitch, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_1; #else typedef void *cl_api_clEnqueueCopyBufferRect; #endif typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueReadImage)( cl_command_queue command_queue, cl_mem image, cl_bool blocking_read, const size_t *origin, const size_t *region, size_t row_pitch, size_t slice_pitch, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueWriteImage)( cl_command_queue command_queue, cl_mem image, cl_bool blocking_write, const size_t *origin, const size_t *region, size_t input_row_pitch, size_t input_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueFillImage)( cl_command_queue command_queue, cl_mem image, const void *fill_color, const size_t origin[3], 
const size_t region[3], cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_2; #else typedef void *cl_api_clEnqueueFillImage; #endif typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueCopyImage)( cl_command_queue command_queue, cl_mem src_image, cl_mem dst_image, const size_t *src_origin, const size_t *dst_origin, const size_t *region, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueCopyImageToBuffer)( cl_command_queue command_queue, cl_mem src_image, cl_mem dst_buffer, const size_t *src_origin, const size_t *region, size_t dst_offset, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueCopyBufferToImage)( cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_image, size_t src_offset, const size_t *dst_origin, const size_t *region, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY void *(CL_API_CALL *cl_api_clEnqueueMapBuffer)( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_map, cl_map_flags map_flags, size_t offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event, cl_int *errcode_ret)CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY void *(CL_API_CALL *cl_api_clEnqueueMapImage)( cl_command_queue command_queue, cl_mem image, cl_bool blocking_map, cl_map_flags map_flags, const size_t *origin, const size_t *region, size_t *image_row_pitch, size_t *image_slice_pitch, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event, cl_int *errcode_ret)CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueUnmapMemObject)( cl_command_queue command_queue, cl_mem memobj, void *mapped_ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueMigrateMemObjects)( cl_command_queue command_queue, cl_uint num_mem_objects, const cl_mem *mem_objects, cl_mem_migration_flags flags, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_2; #else typedef void *cl_api_clEnqueueMigrateMemObjects; #endif typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueNDRangeKernel)( cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t *global_work_offset, const size_t *global_work_size, const size_t *local_work_size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueTask)( cl_command_queue command_queue, cl_kernel kernel, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueNativeKernel)( cl_command_queue command_queue, void(CL_CALLBACK *user_func)(void *), void *args, size_t cb_args, cl_uint num_mem_objects, const cl_mem *mem_list, const void **args_mem_loc, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; #ifdef CL_VERSION_1_2 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueMarkerWithWaitList)( cl_command_queue 
command_queue, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueBarrierWithWaitList)( cl_command_queue command_queue, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY void *( CL_API_CALL *cl_api_clGetExtensionFunctionAddressForPlatform)( cl_platform_id platform, const char *function_name)CL_API_SUFFIX__VERSION_1_2; #else typedef void *cl_api_clEnqueueMarkerWithWaitList; typedef void *cl_api_clEnqueueBarrierWithWaitList; typedef void *cl_api_clGetExtensionFunctionAddressForPlatform; #endif // Shared Virtual Memory APIs #ifdef CL_VERSION_2_0 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueSVMFree)( cl_command_queue /* command_queue */, cl_uint /* num_svm_pointers */, void ** /* svm_pointers */, void(CL_CALLBACK *pfn_free_func)(cl_command_queue /* queue */, cl_uint /* num_svm_pointers */, void ** /* svm_pointers[] */, void * /* user_data */), void * /* user_data */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueSVMMemcpy)( cl_command_queue /* command_queue */, cl_bool /* blocking_copy */, void * /* dst_ptr */, const void * /* src_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueSVMMemFill)( cl_command_queue /* command_queue */, void * /* svm_ptr */, const void * /* pattern */, size_t /* pattern_size */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueSVMMap)( cl_command_queue /* command_queue */, cl_bool /* blocking_map */, cl_map_flags /* map_flags */, void * /* svm_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueSVMUnmap)( cl_command_queue /* command_queue */, void * /* svm_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; #else typedef void *cl_api_clEnqueueSVMFree; typedef void *cl_api_clEnqueueSVMMemcpy; typedef void *cl_api_clEnqueueSVMMemFill; typedef void *cl_api_clEnqueueSVMMap; typedef void *cl_api_clEnqueueSVMUnmap; #endif // Deprecated APIs typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clSetCommandQueueProperty)( cl_command_queue command_queue, cl_command_queue_properties properties, cl_bool enable, cl_command_queue_properties *old_properties) CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateImage2D)( cl_context context, cl_mem_flags flags, const cl_image_format *image_format, size_t image_width, size_t image_height, size_t image_row_pitch, void *host_ptr, cl_int *errcode_ret) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateImage3D)( cl_context context, cl_mem_flags flags, const cl_image_format *image_format, size_t image_width, size_t image_height, size_t image_depth, size_t image_row_pitch, size_t image_slice_pitch, void *host_ptr, cl_int *errcode_ret) 
CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clUnloadCompiler)(void) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueMarker)( cl_command_queue command_queue, cl_event *event) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueWaitForEvents)( cl_command_queue command_queue, cl_uint num_events, const cl_event *event_list) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueBarrier)( cl_command_queue command_queue) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; typedef CL_API_ENTRY void *(CL_API_CALL *cl_api_clGetExtensionFunctionAddress)( const char *function_name)CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; // GL and other APIs typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromGLBuffer)( cl_context context, cl_mem_flags flags, cl_GLuint bufobj, int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromGLTexture)( cl_context context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texture, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromGLTexture2D)( cl_context context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texture, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromGLTexture3D)( cl_context context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texture, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromGLRenderbuffer)( cl_context context, cl_mem_flags flags, cl_GLuint renderbuffer, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetGLObjectInfo)( cl_mem memobj, cl_gl_object_type *gl_object_type, cl_GLuint *gl_object_name) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetGLTextureInfo)( cl_mem memobj, cl_gl_texture_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueAcquireGLObjects)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueReleaseGLObjects)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; /* cl_khr_gl_sharing */ typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetGLContextInfoKHR)( const cl_context_properties *properties, cl_gl_context_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); /* cl_khr_gl_event */ typedef CL_API_ENTRY cl_event(CL_API_CALL *cl_api_clCreateEventFromGLsyncKHR)( cl_context context, cl_GLsync sync, cl_int *errcode_ret); #if defined(_WIN32) /* cl_khr_d3d10_sharing */ typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetDeviceIDsFromD3D10KHR)( cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source, void *d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem(CL_API_CALL 
*cl_api_clCreateFromD3D10BufferKHR)( cl_context context, cl_mem_flags flags, ID3D10Buffer *resource, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromD3D10Texture2DKHR)( cl_context context, cl_mem_flags flags, ID3D10Texture2D *resource, UINT subresource, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromD3D10Texture3DKHR)( cl_context context, cl_mem_flags flags, ID3D10Texture3D *resource, UINT subresource, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueAcquireD3D10ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueReleaseD3D10ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDsFromD3D10KHR( cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source, void *d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices); extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D10BufferKHR(cl_context context, cl_mem_flags flags, ID3D10Buffer *resource, cl_int *errcode_ret); extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D10Texture2DKHR( cl_context context, cl_mem_flags flags, ID3D10Texture2D *resource, UINT subresource, cl_int *errcode_ret); extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D10Texture3DKHR( cl_context context, cl_mem_flags flags, ID3D10Texture3D *resource, UINT subresource, cl_int *errcode_ret); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireD3D10ObjectsKHR( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseD3D10ObjectsKHR( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); /* cl_khr_d3d11_sharing */ typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetDeviceIDsFromD3D11KHR)( cl_platform_id platform, cl_d3d11_device_source_khr d3d_device_source, void *d3d_object, cl_d3d11_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromD3D11BufferKHR)( cl_context context, cl_mem_flags flags, ID3D11Buffer *resource, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromD3D11Texture2DKHR)( cl_context context, cl_mem_flags flags, ID3D11Texture2D *resource, UINT subresource, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromD3D11Texture3DKHR)( cl_context context, cl_mem_flags flags, ID3D11Texture3D *resource, UINT subresource, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueAcquireD3D11ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) 
CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueReleaseD3D11ObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_2; /* cl_khr_dx9_media_sharing */ typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetDeviceIDsFromDX9MediaAdapterKHR)( cl_platform_id platform, cl_uint num_media_adapters, cl_dx9_media_adapter_type_khr *media_adapters_type, void *media_adapters, cl_dx9_media_adapter_set_khr media_adapter_set, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromDX9MediaSurfaceKHR)( cl_context context, cl_mem_flags flags, cl_dx9_media_adapter_type_khr adapter_type, void *surface_info, cl_uint plane, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueAcquireDX9MediaSurfacesKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueReleaseDX9MediaSurfacesKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_2; /* cl_khr_d3d11_sharing */ extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDsFromD3D11KHR( cl_platform_id platform, cl_d3d11_device_source_khr d3d_device_source, void *d3d_object, cl_d3d11_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices); extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D11BufferKHR(cl_context context, cl_mem_flags flags, ID3D11Buffer *resource, cl_int *errcode_ret); extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D11Texture2DKHR( cl_context context, cl_mem_flags flags, ID3D11Texture2D *resource, UINT subresource, cl_int *errcode_ret); extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D11Texture3DKHR( cl_context context, cl_mem_flags flags, ID3D11Texture3D *resource, UINT subresource, cl_int *errcode_ret); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireD3D11ObjectsKHR( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseD3D11ObjectsKHR( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); /* cl_khr_dx9_media_sharing */ extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDsFromDX9MediaAdapterKHR( cl_platform_id platform, cl_uint num_media_adapters, cl_dx9_media_adapter_type_khr *media_adapter_type, void *media_adapters, cl_dx9_media_adapter_set_khr media_adapter_set, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices); extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromDX9MediaSurfaceKHR( cl_context context, cl_mem_flags flags, cl_dx9_media_adapter_type_khr adapter_type, void *surface_info, cl_uint plane, cl_int *errcode_ret); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireDX9MediaSurfacesKHR( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event 
*event); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseDX9MediaSurfacesKHR( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); #else /* cl_khr_d3d10_sharing */ typedef void *cl_api_clGetDeviceIDsFromD3D10KHR; typedef void *cl_api_clCreateFromD3D10BufferKHR; typedef void *cl_api_clCreateFromD3D10Texture2DKHR; typedef void *cl_api_clCreateFromD3D10Texture3DKHR; typedef void *cl_api_clEnqueueAcquireD3D10ObjectsKHR; typedef void *cl_api_clEnqueueReleaseD3D10ObjectsKHR; /* cl_khr_d3d11_sharing */ typedef void *cl_api_clGetDeviceIDsFromD3D11KHR; typedef void *cl_api_clCreateFromD3D11BufferKHR; typedef void *cl_api_clCreateFromD3D11Texture2DKHR; typedef void *cl_api_clCreateFromD3D11Texture3DKHR; typedef void *cl_api_clEnqueueAcquireD3D11ObjectsKHR; typedef void *cl_api_clEnqueueReleaseD3D11ObjectsKHR; /* cl_khr_dx9_media_sharing */ typedef void *cl_api_clCreateFromDX9MediaSurfaceKHR; typedef void *cl_api_clEnqueueAcquireDX9MediaSurfacesKHR; typedef void *cl_api_clEnqueueReleaseDX9MediaSurfacesKHR; typedef void *cl_api_clGetDeviceIDsFromDX9MediaAdapterKHR; #endif /* OpenCL 1.1 */ #ifdef CL_VERSION_1_1 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clSetEventCallback)( cl_event /* event */, cl_int /* command_exec_callback_type */, void(CL_CALLBACK * /* pfn_notify */)(cl_event, cl_int, void *), void * /* user_data */) CL_API_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateSubBuffer)( cl_mem /* buffer */, cl_mem_flags /* flags */, cl_buffer_create_type /* buffer_create_type */, const void * /* buffer_create_info */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clSetMemObjectDestructorCallback)( cl_mem /* memobj */, void(CL_CALLBACK * /*pfn_notify*/)(cl_mem /* memobj */, void * /*user_data*/), void * /*user_data */) CL_API_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_event(CL_API_CALL *cl_api_clCreateUserEvent)( cl_context /* context */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clSetUserEventStatus)( cl_event /* event */, cl_int /* execution_status */) CL_API_SUFFIX__VERSION_1_1; #else typedef void *cl_api_clSetEventCallback; typedef void *cl_api_clCreateSubBuffer; typedef void *cl_api_clSetMemObjectDestructorCallback; typedef void *cl_api_clCreateUserEvent; typedef void *cl_api_clSetUserEventStatus; #endif typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clCreateSubDevicesEXT)( cl_device_id in_device, const cl_device_partition_property_ext *partition_properties, cl_uint num_entries, cl_device_id *out_devices, cl_uint *num_devices); typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clRetainDeviceEXT)( cl_device_id device) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clReleaseDeviceEXT)( cl_device_id device) CL_API_SUFFIX__VERSION_1_0; /* cl_khr_egl_image */ typedef CL_API_ENTRY cl_mem(CL_API_CALL *cl_api_clCreateFromEGLImageKHR)( cl_context context, CLeglDisplayKHR display, CLeglImageKHR image, cl_mem_flags flags, const cl_egl_image_properties_khr *properties, cl_int *errcode_ret); typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueAcquireEGLObjectsKHR)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueReleaseEGLObjectsKHR)( 
cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); /* cl_khr_egl_event */ typedef CL_API_ENTRY cl_event(CL_API_CALL *cl_api_clCreateEventFromEGLSyncKHR)( cl_context context, CLeglSyncKHR sync, CLeglDisplayKHR display, cl_int *errcode_ret); #ifdef CL_VERSION_2_1 typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clSetDefaultDeviceCommandQueue)( cl_context context, cl_device_id device, cl_command_queue command_queue) CL_API_SUFFIX__VERSION_2_1; typedef CL_API_ENTRY cl_program(CL_API_CALL *cl_api_clCreateProgramWithIL)( cl_context context, const void *il, size_t length, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_2_1; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetKernelSubGroupInfo)( cl_kernel kernel, cl_device_id device, cl_kernel_sub_group_info param_name, size_t input_value_size, const void *input_value, size_t param_value_size, void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_2_1; typedef CL_API_ENTRY cl_kernel(CL_API_CALL *cl_api_clCloneKernel)( cl_kernel source_kernel, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_2_1; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clEnqueueSVMMigrateMem)( cl_command_queue command_queue, cl_uint num_svm_pointers, const void **svm_pointers, const size_t *sizes, cl_mem_migration_flags flags, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_2_1; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetDeviceAndHostTimer)( cl_device_id device, cl_ulong *device_timestamp, cl_ulong *host_timestamp) CL_API_SUFFIX__VERSION_2_1; typedef CL_API_ENTRY cl_int(CL_API_CALL *cl_api_clGetHostTimer)( cl_device_id device, cl_ulong *host_timestamp) CL_API_SUFFIX__VERSION_2_1; #else typedef void *cl_api_clSetDefaultDeviceCommandQueue; typedef void *cl_api_clCreateProgramWithIL; typedef void *cl_api_clGetKernelSubGroupInfo; typedef void *cl_api_clCloneKernel; typedef void *cl_api_clEnqueueSVMMigrateMem; typedef void *cl_api_clGetDeviceAndHostTimer; typedef void *cl_api_clGetHostTimer; #endif /* Vendor dispatch table struture */ typedef struct _cl_icd_dispatch { /* OpenCL 1.0 */ cl_api_clGetPlatformIDs clGetPlatformIDs; cl_api_clGetPlatformInfo clGetPlatformInfo; cl_api_clGetDeviceIDs clGetDeviceIDs; cl_api_clGetDeviceInfo clGetDeviceInfo; cl_api_clCreateContext clCreateContext; cl_api_clCreateContextFromType clCreateContextFromType; cl_api_clRetainContext clRetainContext; cl_api_clReleaseContext clReleaseContext; cl_api_clGetContextInfo clGetContextInfo; cl_api_clCreateCommandQueue clCreateCommandQueue; cl_api_clRetainCommandQueue clRetainCommandQueue; cl_api_clReleaseCommandQueue clReleaseCommandQueue; cl_api_clGetCommandQueueInfo clGetCommandQueueInfo; cl_api_clSetCommandQueueProperty clSetCommandQueueProperty; cl_api_clCreateBuffer clCreateBuffer; cl_api_clCreateImage2D clCreateImage2D; cl_api_clCreateImage3D clCreateImage3D; cl_api_clRetainMemObject clRetainMemObject; cl_api_clReleaseMemObject clReleaseMemObject; cl_api_clGetSupportedImageFormats clGetSupportedImageFormats; cl_api_clGetMemObjectInfo clGetMemObjectInfo; cl_api_clGetImageInfo clGetImageInfo; cl_api_clCreateSampler clCreateSampler; cl_api_clRetainSampler clRetainSampler; cl_api_clReleaseSampler clReleaseSampler; cl_api_clGetSamplerInfo clGetSamplerInfo; cl_api_clCreateProgramWithSource clCreateProgramWithSource; cl_api_clCreateProgramWithBinary clCreateProgramWithBinary; cl_api_clRetainProgram clRetainProgram; 
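/*
 * Editorial sketch, not part of the upstream header: by ICD convention every
 * object a vendor driver hands out begins with a pointer to this dispatch
 * table, so each loader-side entry point is a one-line forward. A minimal
 * sketch (the struct layout shown is the assumed ICD convention, not a
 * declaration from this header):
 *
 *     struct _cl_program { struct _cl_icd_dispatch *dispatch; };
 *
 *     cl_int clRetainProgram(cl_program program) {
 *         return program->dispatch->clRetainProgram(program);
 *     }
 */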
cl_api_clReleaseProgram clReleaseProgram; cl_api_clBuildProgram clBuildProgram; cl_api_clUnloadCompiler clUnloadCompiler; cl_api_clGetProgramInfo clGetProgramInfo; cl_api_clGetProgramBuildInfo clGetProgramBuildInfo; cl_api_clCreateKernel clCreateKernel; cl_api_clCreateKernelsInProgram clCreateKernelsInProgram; cl_api_clRetainKernel clRetainKernel; cl_api_clReleaseKernel clReleaseKernel; cl_api_clSetKernelArg clSetKernelArg; cl_api_clGetKernelInfo clGetKernelInfo; cl_api_clGetKernelWorkGroupInfo clGetKernelWorkGroupInfo; cl_api_clWaitForEvents clWaitForEvents; cl_api_clGetEventInfo clGetEventInfo; cl_api_clRetainEvent clRetainEvent; cl_api_clReleaseEvent clReleaseEvent; cl_api_clGetEventProfilingInfo clGetEventProfilingInfo; cl_api_clFlush clFlush; cl_api_clFinish clFinish; cl_api_clEnqueueReadBuffer clEnqueueReadBuffer; cl_api_clEnqueueWriteBuffer clEnqueueWriteBuffer; cl_api_clEnqueueCopyBuffer clEnqueueCopyBuffer; cl_api_clEnqueueReadImage clEnqueueReadImage; cl_api_clEnqueueWriteImage clEnqueueWriteImage; cl_api_clEnqueueCopyImage clEnqueueCopyImage; cl_api_clEnqueueCopyImageToBuffer clEnqueueCopyImageToBuffer; cl_api_clEnqueueCopyBufferToImage clEnqueueCopyBufferToImage; cl_api_clEnqueueMapBuffer clEnqueueMapBuffer; cl_api_clEnqueueMapImage clEnqueueMapImage; cl_api_clEnqueueUnmapMemObject clEnqueueUnmapMemObject; cl_api_clEnqueueNDRangeKernel clEnqueueNDRangeKernel; cl_api_clEnqueueTask clEnqueueTask; cl_api_clEnqueueNativeKernel clEnqueueNativeKernel; cl_api_clEnqueueMarker clEnqueueMarker; cl_api_clEnqueueWaitForEvents clEnqueueWaitForEvents; cl_api_clEnqueueBarrier clEnqueueBarrier; cl_api_clGetExtensionFunctionAddress clGetExtensionFunctionAddress; cl_api_clCreateFromGLBuffer clCreateFromGLBuffer; cl_api_clCreateFromGLTexture2D clCreateFromGLTexture2D; cl_api_clCreateFromGLTexture3D clCreateFromGLTexture3D; cl_api_clCreateFromGLRenderbuffer clCreateFromGLRenderbuffer; cl_api_clGetGLObjectInfo clGetGLObjectInfo; cl_api_clGetGLTextureInfo clGetGLTextureInfo; cl_api_clEnqueueAcquireGLObjects clEnqueueAcquireGLObjects; cl_api_clEnqueueReleaseGLObjects clEnqueueReleaseGLObjects; cl_api_clGetGLContextInfoKHR clGetGLContextInfoKHR; /* cl_khr_d3d10_sharing */ cl_api_clGetDeviceIDsFromD3D10KHR clGetDeviceIDsFromD3D10KHR; cl_api_clCreateFromD3D10BufferKHR clCreateFromD3D10BufferKHR; cl_api_clCreateFromD3D10Texture2DKHR clCreateFromD3D10Texture2DKHR; cl_api_clCreateFromD3D10Texture3DKHR clCreateFromD3D10Texture3DKHR; cl_api_clEnqueueAcquireD3D10ObjectsKHR clEnqueueAcquireD3D10ObjectsKHR; cl_api_clEnqueueReleaseD3D10ObjectsKHR clEnqueueReleaseD3D10ObjectsKHR; /* OpenCL 1.1 */ cl_api_clSetEventCallback clSetEventCallback; cl_api_clCreateSubBuffer clCreateSubBuffer; cl_api_clSetMemObjectDestructorCallback clSetMemObjectDestructorCallback; cl_api_clCreateUserEvent clCreateUserEvent; cl_api_clSetUserEventStatus clSetUserEventStatus; cl_api_clEnqueueReadBufferRect clEnqueueReadBufferRect; cl_api_clEnqueueWriteBufferRect clEnqueueWriteBufferRect; cl_api_clEnqueueCopyBufferRect clEnqueueCopyBufferRect; /* cl_ext_device_fission */ cl_api_clCreateSubDevicesEXT clCreateSubDevicesEXT; cl_api_clRetainDeviceEXT clRetainDeviceEXT; cl_api_clReleaseDeviceEXT clReleaseDeviceEXT; /* cl_khr_gl_event */ cl_api_clCreateEventFromGLsyncKHR clCreateEventFromGLsyncKHR; /* OpenCL 1.2 */ cl_api_clCreateSubDevices clCreateSubDevices; cl_api_clRetainDevice clRetainDevice; cl_api_clReleaseDevice clReleaseDevice; cl_api_clCreateImage clCreateImage; cl_api_clCreateProgramWithBuiltInKernels clCreateProgramWithBuiltInKernels; 
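/*
 * Editorial sketch, not part of the upstream header: slot order in this table
 * is ABI between loader and drivers, which is why new entries are only ever
 * appended and unsupported platforms keep the void* placeholders above. A
 * vendor driver would typically populate the table once, statically
 * (my_clGetPlatformIDs and my_clGetPlatformInfo are hypothetical driver
 * functions):
 *
 *     static const struct _cl_icd_dispatch vendor_dispatch = {
 *         .clGetPlatformIDs  = my_clGetPlatformIDs,
 *         .clGetPlatformInfo = my_clGetPlatformInfo,
 *         // entries the driver does not implement are left NULL
 *     };
 */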
cl_api_clCompileProgram clCompileProgram; cl_api_clLinkProgram clLinkProgram; cl_api_clUnloadPlatformCompiler clUnloadPlatformCompiler; cl_api_clGetKernelArgInfo clGetKernelArgInfo; cl_api_clEnqueueFillBuffer clEnqueueFillBuffer; cl_api_clEnqueueFillImage clEnqueueFillImage; cl_api_clEnqueueMigrateMemObjects clEnqueueMigrateMemObjects; cl_api_clEnqueueMarkerWithWaitList clEnqueueMarkerWithWaitList; cl_api_clEnqueueBarrierWithWaitList clEnqueueBarrierWithWaitList; cl_api_clGetExtensionFunctionAddressForPlatform clGetExtensionFunctionAddressForPlatform; cl_api_clCreateFromGLTexture clCreateFromGLTexture; /* cl_khr_d3d11_sharing */ cl_api_clGetDeviceIDsFromD3D11KHR clGetDeviceIDsFromD3D11KHR; cl_api_clCreateFromD3D11BufferKHR clCreateFromD3D11BufferKHR; cl_api_clCreateFromD3D11Texture2DKHR clCreateFromD3D11Texture2DKHR; cl_api_clCreateFromD3D11Texture3DKHR clCreateFromD3D11Texture3DKHR; cl_api_clCreateFromDX9MediaSurfaceKHR clCreateFromDX9MediaSurfaceKHR; cl_api_clEnqueueAcquireD3D11ObjectsKHR clEnqueueAcquireD3D11ObjectsKHR; cl_api_clEnqueueReleaseD3D11ObjectsKHR clEnqueueReleaseD3D11ObjectsKHR; /* cl_khr_dx9_media_sharing */ cl_api_clGetDeviceIDsFromDX9MediaAdapterKHR clGetDeviceIDsFromDX9MediaAdapterKHR; cl_api_clEnqueueAcquireDX9MediaSurfacesKHR clEnqueueAcquireDX9MediaSurfacesKHR; cl_api_clEnqueueReleaseDX9MediaSurfacesKHR clEnqueueReleaseDX9MediaSurfacesKHR; /* cl_khr_egl_image */ cl_api_clCreateFromEGLImageKHR clCreateFromEGLImageKHR; cl_api_clEnqueueAcquireEGLObjectsKHR clEnqueueAcquireEGLObjectsKHR; cl_api_clEnqueueReleaseEGLObjectsKHR clEnqueueReleaseEGLObjectsKHR; /* cl_khr_egl_event */ cl_api_clCreateEventFromEGLSyncKHR clCreateEventFromEGLSyncKHR; /* OpenCL 2.0 */ cl_api_clCreateCommandQueueWithProperties clCreateCommandQueueWithProperties; cl_api_clCreatePipe clCreatePipe; cl_api_clGetPipeInfo clGetPipeInfo; cl_api_clSVMAlloc clSVMAlloc; cl_api_clSVMFree clSVMFree; cl_api_clEnqueueSVMFree clEnqueueSVMFree; cl_api_clEnqueueSVMMemcpy clEnqueueSVMMemcpy; cl_api_clEnqueueSVMMemFill clEnqueueSVMMemFill; cl_api_clEnqueueSVMMap clEnqueueSVMMap; cl_api_clEnqueueSVMUnmap clEnqueueSVMUnmap; cl_api_clCreateSamplerWithProperties clCreateSamplerWithProperties; cl_api_clSetKernelArgSVMPointer clSetKernelArgSVMPointer; cl_api_clSetKernelExecInfo clSetKernelExecInfo; /* cl_khr_sub_groups */ cl_api_clGetKernelSubGroupInfoKHR clGetKernelSubGroupInfoKHR; /* OpenCL 2.1 */ cl_api_clCloneKernel clCloneKernel; cl_api_clCreateProgramWithIL clCreateProgramWithIL; cl_api_clEnqueueSVMMigrateMem clEnqueueSVMMigrateMem; cl_api_clGetDeviceAndHostTimer clGetDeviceAndHostTimer; cl_api_clGetHostTimer clGetHostTimer; cl_api_clGetKernelSubGroupInfo clGetKernelSubGroupInfo; cl_api_clSetDefaultDeviceCommandQueue clSetDefaultDeviceCommandQueue; /* OpenCL 2.2 */ cl_api_clSetProgramReleaseCallback clSetProgramReleaseCallback; cl_api_clSetProgramSpecializationConstant clSetProgramSpecializationConstant; } cl_icd_dispatch; #ifdef __cplusplus } #endif #endif /* #ifndef OPENCL_CL_ICD_H */
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_platform.h
/******************************************************************************* * Copyright (c) 2008-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License.
* You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. ******************************************************************************/ #ifndef __CL_PLATFORM_H #define __CL_PLATFORM_H #include #ifdef __cplusplus extern "C" { #endif #if defined(_WIN32) #define CL_API_ENTRY #define CL_API_CALL __stdcall #define CL_CALLBACK __stdcall #else #define CL_API_ENTRY #define CL_API_CALL #define CL_CALLBACK #endif /* * Deprecation flags refer to the last version of the header in which the * feature was not deprecated. * * E.g. VERSION_1_1_DEPRECATED means the feature is present in 1.1 without * deprecation but is deprecated in versions later than 1.1. */ #define CL_EXTENSION_WEAK_LINK #define CL_API_SUFFIX__VERSION_1_0 #define CL_EXT_SUFFIX__VERSION_1_0 #define CL_API_SUFFIX__VERSION_1_1 #define CL_EXT_SUFFIX__VERSION_1_1 #define CL_API_SUFFIX__VERSION_1_2 #define CL_EXT_SUFFIX__VERSION_1_2 #define CL_API_SUFFIX__VERSION_2_0 #define CL_EXT_SUFFIX__VERSION_2_0 #define CL_API_SUFFIX__VERSION_2_1 #define CL_EXT_SUFFIX__VERSION_2_1 #define CL_API_SUFFIX__VERSION_2_2 #define CL_EXT_SUFFIX__VERSION_2_2 #ifdef __GNUC__ #define CL_EXT_SUFFIX_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX_DEPRECATED #elif defined(_WIN32) #define CL_EXT_SUFFIX_DEPRECATED #define CL_EXT_PREFIX_DEPRECATED __declspec(deprecated) #else #define CL_EXT_SUFFIX_DEPRECATED #define CL_EXT_PREFIX_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED CL_EXT_SUFFIX_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED CL_EXT_PREFIX_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED CL_EXT_SUFFIX_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED CL_EXT_PREFIX_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_2_APIS #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED CL_EXT_SUFFIX_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED CL_EXT_PREFIX_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_2_0_APIS #define CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_2_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED CL_EXT_SUFFIX_DEPRECATED #define CL_EXT_PREFIX__VERSION_2_0_DEPRECATED CL_EXT_PREFIX_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_2_1_APIS #define CL_EXT_SUFFIX__VERSION_2_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_2_1_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_2_1_DEPRECATED CL_EXT_SUFFIX_DEPRECATED #define CL_EXT_PREFIX__VERSION_2_1_DEPRECATED CL_EXT_PREFIX_DEPRECATED #endif #if (defined (_WIN32) && defined(_MSC_VER)) /* scalar types */ typedef signed __int8 cl_char; typedef unsigned __int8 cl_uchar; typedef signed __int16 cl_short; typedef unsigned __int16 cl_ushort; typedef signed __int32 cl_int; typedef unsigned __int32 cl_uint; typedef signed __int64 cl_long; typedef unsigned __int64 cl_ulong; typedef unsigned __int16 
cl_half; typedef float cl_float; typedef double cl_double; /* Macro names and corresponding values defined by OpenCL */ #define CL_CHAR_BIT 8 #define CL_SCHAR_MAX 127 #define CL_SCHAR_MIN (-127-1) #define CL_CHAR_MAX CL_SCHAR_MAX #define CL_CHAR_MIN CL_SCHAR_MIN #define CL_UCHAR_MAX 255 #define CL_SHRT_MAX 32767 #define CL_SHRT_MIN (-32767-1) #define CL_USHRT_MAX 65535 #define CL_INT_MAX 2147483647 #define CL_INT_MIN (-2147483647-1) #define CL_UINT_MAX 0xffffffffU #define CL_LONG_MAX ((cl_long) 0x7FFFFFFFFFFFFFFFLL) #define CL_LONG_MIN ((cl_long) -0x7FFFFFFFFFFFFFFFLL - 1LL) #define CL_ULONG_MAX ((cl_ulong) 0xFFFFFFFFFFFFFFFFULL) #define CL_FLT_DIG 6 #define CL_FLT_MANT_DIG 24 #define CL_FLT_MAX_10_EXP +38 #define CL_FLT_MAX_EXP +128 #define CL_FLT_MIN_10_EXP -37 #define CL_FLT_MIN_EXP -125 #define CL_FLT_RADIX 2 #define CL_FLT_MAX 340282346638528859811704183484516925440.0f #define CL_FLT_MIN 1.175494350822287507969e-38f #define CL_FLT_EPSILON 1.1920928955078125e-7f #define CL_HALF_DIG 3 #define CL_HALF_MANT_DIG 11 #define CL_HALF_MAX_10_EXP +4 #define CL_HALF_MAX_EXP +16 #define CL_HALF_MIN_10_EXP -4 #define CL_HALF_MIN_EXP -13 #define CL_HALF_RADIX 2 #define CL_HALF_MAX 65504.0f #define CL_HALF_MIN 6.103515625e-05f #define CL_HALF_EPSILON 9.765625e-04f #define CL_DBL_DIG 15 #define CL_DBL_MANT_DIG 53 #define CL_DBL_MAX_10_EXP +308 #define CL_DBL_MAX_EXP +1024 #define CL_DBL_MIN_10_EXP -307 #define CL_DBL_MIN_EXP -1021 #define CL_DBL_RADIX 2 #define CL_DBL_MAX 1.7976931348623158e+308 #define CL_DBL_MIN 2.225073858507201383090e-308 #define CL_DBL_EPSILON 2.220446049250313080847e-16 #define CL_M_E 2.7182818284590452354 #define CL_M_LOG2E 1.4426950408889634074 #define CL_M_LOG10E 0.43429448190325182765 #define CL_M_LN2 0.69314718055994530942 #define CL_M_LN10 2.30258509299404568402 #define CL_M_PI 3.14159265358979323846 #define CL_M_PI_2 1.57079632679489661923 #define CL_M_PI_4 0.78539816339744830962 #define CL_M_1_PI 0.31830988618379067154 #define CL_M_2_PI 0.63661977236758134308 #define CL_M_2_SQRTPI 1.12837916709551257390 #define CL_M_SQRT2 1.41421356237309504880 #define CL_M_SQRT1_2 0.70710678118654752440 #define CL_M_E_F 2.718281828f #define CL_M_LOG2E_F 1.442695041f #define CL_M_LOG10E_F 0.434294482f #define CL_M_LN2_F 0.693147181f #define CL_M_LN10_F 2.302585093f #define CL_M_PI_F 3.141592654f #define CL_M_PI_2_F 1.570796327f #define CL_M_PI_4_F 0.785398163f #define CL_M_1_PI_F 0.318309886f #define CL_M_2_PI_F 0.636619772f #define CL_M_2_SQRTPI_F 1.128379167f #define CL_M_SQRT2_F 1.414213562f #define CL_M_SQRT1_2_F 0.707106781f #define CL_NAN (CL_INFINITY - CL_INFINITY) #define CL_HUGE_VALF ((cl_float) 1e50) #define CL_HUGE_VAL ((cl_double) 1e500) #define CL_MAXFLOAT CL_FLT_MAX #define CL_INFINITY CL_HUGE_VALF #else #include <stdint.h> /* scalar types */ typedef int8_t cl_char; typedef uint8_t cl_uchar; typedef int16_t cl_short; typedef uint16_t cl_ushort; typedef int32_t cl_int; typedef uint32_t cl_uint; typedef int64_t cl_long; typedef uint64_t cl_ulong; typedef uint16_t cl_half; typedef float cl_float; typedef double cl_double; /* Macro names and corresponding values defined by OpenCL */ #define CL_CHAR_BIT 8 #define CL_SCHAR_MAX 127 #define CL_SCHAR_MIN (-127-1) #define CL_CHAR_MAX CL_SCHAR_MAX #define CL_CHAR_MIN CL_SCHAR_MIN #define CL_UCHAR_MAX 255 #define CL_SHRT_MAX 32767 #define CL_SHRT_MIN (-32767-1) #define CL_USHRT_MAX 65535 #define CL_INT_MAX 2147483647 #define CL_INT_MIN (-2147483647-1) #define CL_UINT_MAX 0xffffffffU #define CL_LONG_MAX ((cl_long) 0x7FFFFFFFFFFFFFFFLL) #define
CL_LONG_MIN ((cl_long) -0x7FFFFFFFFFFFFFFFLL - 1LL) #define CL_ULONG_MAX ((cl_ulong) 0xFFFFFFFFFFFFFFFFULL) #define CL_FLT_DIG 6 #define CL_FLT_MANT_DIG 24 #define CL_FLT_MAX_10_EXP +38 #define CL_FLT_MAX_EXP +128 #define CL_FLT_MIN_10_EXP -37 #define CL_FLT_MIN_EXP -125 #define CL_FLT_RADIX 2 #define CL_FLT_MAX 340282346638528859811704183484516925440.0f #define CL_FLT_MIN 1.175494350822287507969e-38f #define CL_FLT_EPSILON 1.1920928955078125e-7f #define CL_HALF_DIG 3 #define CL_HALF_MANT_DIG 11 #define CL_HALF_MAX_10_EXP +4 #define CL_HALF_MAX_EXP +16 #define CL_HALF_MIN_10_EXP -4 #define CL_HALF_MIN_EXP -13 #define CL_HALF_RADIX 2 #define CL_HALF_MAX 65504.0f #define CL_HALF_MIN 6.103515625e-05f #define CL_HALF_EPSILON 9.765625e-04f #define CL_DBL_DIG 15 #define CL_DBL_MANT_DIG 53 #define CL_DBL_MAX_10_EXP +308 #define CL_DBL_MAX_EXP +1024 #define CL_DBL_MIN_10_EXP -307 #define CL_DBL_MIN_EXP -1021 #define CL_DBL_RADIX 2 #define CL_DBL_MAX 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0 #define CL_DBL_MIN 2.225073858507201383090e-308 #define CL_DBL_EPSILON 2.220446049250313080847e-16 #define CL_M_E 2.7182818284590452354 #define CL_M_LOG2E 1.4426950408889634074 #define CL_M_LOG10E 0.43429448190325182765 #define CL_M_LN2 0.69314718055994530942 #define CL_M_LN10 2.30258509299404568402 #define CL_M_PI 3.14159265358979323846 #define CL_M_PI_2 1.57079632679489661923 #define CL_M_PI_4 0.78539816339744830962 #define CL_M_1_PI 0.31830988618379067154 #define CL_M_2_PI 0.63661977236758134308 #define CL_M_2_SQRTPI 1.12837916709551257390 #define CL_M_SQRT2 1.41421356237309504880 #define CL_M_SQRT1_2 0.70710678118654752440 #define CL_M_E_F 2.718281828f #define CL_M_LOG2E_F 1.442695041f #define CL_M_LOG10E_F 0.434294482f #define CL_M_LN2_F 0.693147181f #define CL_M_LN10_F 2.302585093f #define CL_M_PI_F 3.141592654f #define CL_M_PI_2_F 1.570796327f #define CL_M_PI_4_F 0.785398163f #define CL_M_1_PI_F 0.318309886f #define CL_M_2_PI_F 0.636619772f #define CL_M_2_SQRTPI_F 1.128379167f #define CL_M_SQRT2_F 1.414213562f #define CL_M_SQRT1_2_F 0.707106781f #if defined( __GNUC__ ) #define CL_HUGE_VALF __builtin_huge_valf() #define CL_HUGE_VAL __builtin_huge_val() #define CL_NAN __builtin_nanf( "" ) #else #define CL_HUGE_VALF ((cl_float) 1e50) #define CL_HUGE_VAL ((cl_double) 1e500) float nanf( const char * ); #define CL_NAN nanf( "" ) #endif #define CL_MAXFLOAT CL_FLT_MAX #define CL_INFINITY CL_HUGE_VALF #endif #include <stddef.h> /* Mirror types to GL types. Mirror types allow us to avoid deciding which headers to load based on whether we are using GL or GLES here. */ typedef unsigned int cl_GLuint; typedef int cl_GLint; typedef unsigned int cl_GLenum; /* * Vector types * * Note: OpenCL requires that all types be naturally aligned. * This means that vector types must be naturally aligned. * For example, a vector of four floats must be aligned to * a 16 byte boundary (calculated as 4 * the natural 4-byte * alignment of the float). The alignment qualifiers here * will only function properly if your compiler supports them * and if you don't actively work to defeat them. For example, * in order for a cl_float4 to be 16 byte aligned in a struct, * the start of the struct must itself be 16-byte aligned. * * Maintaining proper alignment is the user's responsibility.
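 *
 * A short illustration (editorial addition, not upstream text): on a C11
 * host, heap storage for cl_float4 data can be kept on a 16-byte boundary
 * with aligned_alloc from <stdlib.h>:
 *
 *     cl_float4 *buf = aligned_alloc(16, n * sizeof(cl_float4));
 *
 * Stack objects and struct members instead rely on CL_ALIGNED below plus a
 * suitably aligned enclosing object, as described above.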
*/ /* Define basic vector types */ #if defined( __VEC__ ) #include <altivec.h> /* may be omitted depending on compiler. AltiVec spec provides no way to detect whether the header is required. */ typedef __vector unsigned char __cl_uchar16; typedef __vector signed char __cl_char16; typedef __vector unsigned short __cl_ushort8; typedef __vector signed short __cl_short8; typedef __vector unsigned int __cl_uint4; typedef __vector signed int __cl_int4; typedef __vector float __cl_float4; #define __CL_UCHAR16__ 1 #define __CL_CHAR16__ 1 #define __CL_USHORT8__ 1 #define __CL_SHORT8__ 1 #define __CL_UINT4__ 1 #define __CL_INT4__ 1 #define __CL_FLOAT4__ 1 #endif #if defined( __SSE__ ) #if defined( __MINGW64__ ) #include <intrin.h> #else #include <xmmintrin.h> #endif #if defined( __GNUC__ ) typedef float __cl_float4 __attribute__((vector_size(16))); #else typedef __m128 __cl_float4; #endif #define __CL_FLOAT4__ 1 #endif #if defined( __SSE2__ ) #if defined( __MINGW64__ ) #include <intrin.h> #else #include <emmintrin.h> #endif #if defined( __GNUC__ ) typedef cl_uchar __cl_uchar16 __attribute__((vector_size(16))); typedef cl_char __cl_char16 __attribute__((vector_size(16))); typedef cl_ushort __cl_ushort8 __attribute__((vector_size(16))); typedef cl_short __cl_short8 __attribute__((vector_size(16))); typedef cl_uint __cl_uint4 __attribute__((vector_size(16))); typedef cl_int __cl_int4 __attribute__((vector_size(16))); typedef cl_ulong __cl_ulong2 __attribute__((vector_size(16))); typedef cl_long __cl_long2 __attribute__((vector_size(16))); typedef cl_double __cl_double2 __attribute__((vector_size(16))); #else typedef __m128i __cl_uchar16; typedef __m128i __cl_char16; typedef __m128i __cl_ushort8; typedef __m128i __cl_short8; typedef __m128i __cl_uint4; typedef __m128i __cl_int4; typedef __m128i __cl_ulong2; typedef __m128i __cl_long2; typedef __m128d __cl_double2; #endif #define __CL_UCHAR16__ 1 #define __CL_CHAR16__ 1 #define __CL_USHORT8__ 1 #define __CL_SHORT8__ 1 #define __CL_INT4__ 1 #define __CL_UINT4__ 1 #define __CL_ULONG2__ 1 #define __CL_LONG2__ 1 #define __CL_DOUBLE2__ 1 #endif #if defined( __MMX__ ) #include <mmintrin.h> #if defined( __GNUC__ ) typedef cl_uchar __cl_uchar8 __attribute__((vector_size(8))); typedef cl_char __cl_char8 __attribute__((vector_size(8))); typedef cl_ushort __cl_ushort4 __attribute__((vector_size(8))); typedef cl_short __cl_short4 __attribute__((vector_size(8))); typedef cl_uint __cl_uint2 __attribute__((vector_size(8))); typedef cl_int __cl_int2 __attribute__((vector_size(8))); typedef cl_ulong __cl_ulong1 __attribute__((vector_size(8))); typedef cl_long __cl_long1 __attribute__((vector_size(8))); typedef cl_float __cl_float2 __attribute__((vector_size(8))); #else typedef __m64 __cl_uchar8; typedef __m64 __cl_char8; typedef __m64 __cl_ushort4; typedef __m64 __cl_short4; typedef __m64 __cl_uint2; typedef __m64 __cl_int2; typedef __m64 __cl_ulong1; typedef __m64 __cl_long1; typedef __m64 __cl_float2; #endif #define __CL_UCHAR8__ 1 #define __CL_CHAR8__ 1 #define __CL_USHORT4__ 1 #define __CL_SHORT4__ 1 #define __CL_INT2__ 1 #define __CL_UINT2__ 1 #define __CL_ULONG1__ 1 #define __CL_LONG1__ 1 #define __CL_FLOAT2__ 1 #endif #if defined( __AVX__ ) #if defined( __MINGW64__ ) #include <intrin.h> #else #include <immintrin.h> #endif #if defined( __GNUC__ ) typedef cl_float __cl_float8 __attribute__((vector_size(32))); typedef cl_double __cl_double4 __attribute__((vector_size(32))); #else typedef __m256 __cl_float8; typedef __m256d __cl_double4; #endif #define __CL_FLOAT8__ 1 #define __CL_DOUBLE4__ 1 #endif /* Define capabilities for anonymous struct members.
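 *
 * Editorial example, not upstream text: when __CL_HAS_ANON_STRUCT__ is 1 the
 * vector unions below expose named components alongside the always-available
 * s[] array:
 *
 *     cl_float4 v;
 *     v.s[0] = 1.0f;      component by index, always available
 *     v.x    = 2.0f;      named field, guarded by CL_HAS_NAMED_VECTOR_FIELDS
 *     v.lo   = v.hi;      cl_float2 halves, guarded by CL_HAS_HI_LO_VECTOR_FIELDS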
*/ #if !defined(__cplusplus) && defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L #define __CL_HAS_ANON_STRUCT__ 1 #define __CL_ANON_STRUCT__ #elif defined( __GNUC__) && ! defined( __STRICT_ANSI__ ) #define __CL_HAS_ANON_STRUCT__ 1 #define __CL_ANON_STRUCT__ __extension__ #elif defined( _WIN32) && defined(_MSC_VER) #if _MSC_VER >= 1500 /* Microsoft Developer Studio 2008 supports anonymous structs, but * complains by default. */ #define __CL_HAS_ANON_STRUCT__ 1 #define __CL_ANON_STRUCT__ /* Disable warning C4201: nonstandard extension used : nameless * struct/union */ #pragma warning( push ) #pragma warning( disable : 4201 ) #endif #else #define __CL_HAS_ANON_STRUCT__ 0 #define __CL_ANON_STRUCT__ #endif /* Define alignment keys */ #if defined( __GNUC__ ) #define CL_ALIGNED(_x) __attribute__ ((aligned(_x))) #elif defined( _WIN32) && (_MSC_VER) /* Alignment keys neutered on windows because MSVC can't swallow function arguments with alignment requirements */ /* http://msdn.microsoft.com/en-us/library/373ak2y1%28VS.71%29.aspx */ /* #include */ /* #define CL_ALIGNED(_x) _CRT_ALIGN(_x) */ #define CL_ALIGNED(_x) #else #warning Need to implement some method to align data here #define CL_ALIGNED(_x) #endif /* Indicate whether .xyzw, .s0123 and .hi.lo are supported */ #if __CL_HAS_ANON_STRUCT__ /* .xyzw and .s0123...{f|F} are supported */ #define CL_HAS_NAMED_VECTOR_FIELDS 1 /* .hi and .lo are supported */ #define CL_HAS_HI_LO_VECTOR_FIELDS 1 #endif /* Define cl_vector types */ /* ---- cl_charn ---- */ typedef union { cl_char CL_ALIGNED(2) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_char lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2; #endif }cl_char2; typedef union { cl_char CL_ALIGNED(4) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_char2 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[2]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4; #endif }cl_char4; /* cl_char3 is identical in size, alignment and behavior to cl_char4. See section 6.1.5. 
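 *
 * Editorial note, not upstream text: a practical consequence of this typedef
 * is that the 3-component type carries a 4-element footprint, e.g.
 *
 *     sizeof(cl_char3) == sizeof(cl_char4)   (both are 4 bytes)
 *
 * so host arrays of 3-component vectors must be allocated and indexed with
 * the 4-element stride.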
*/ typedef cl_char4 cl_char3; typedef union { cl_char CL_ALIGNED(8) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_char4 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[4]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4[2]; #endif #if defined( __CL_CHAR8__ ) __cl_char8 v8; #endif }cl_char8; typedef union { cl_char CL_ALIGNED(16) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_char8 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[8]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4[4]; #endif #if defined( __CL_CHAR8__ ) __cl_char8 v8[2]; #endif #if defined( __CL_CHAR16__ ) __cl_char16 v16; #endif }cl_char16; /* ---- cl_ucharn ---- */ typedef union { cl_uchar CL_ALIGNED(2) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_uchar lo, hi; }; #endif #if defined( __cl_uchar2__) __cl_uchar2 v2; #endif }cl_uchar2; typedef union { cl_uchar CL_ALIGNED(4) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_uchar2 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[2]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4; #endif }cl_uchar4; /* cl_uchar3 is identical in size, alignment and behavior to cl_uchar4. See section 6.1.5. 
*/ typedef cl_uchar4 cl_uchar3; typedef union { cl_uchar CL_ALIGNED(8) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_uchar4 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[4]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4[2]; #endif #if defined( __CL_UCHAR8__ ) __cl_uchar8 v8; #endif }cl_uchar8; typedef union { cl_uchar CL_ALIGNED(16) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_uchar8 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[8]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4[4]; #endif #if defined( __CL_UCHAR8__ ) __cl_uchar8 v8[2]; #endif #if defined( __CL_UCHAR16__ ) __cl_uchar16 v16; #endif }cl_uchar16; /* ---- cl_shortn ---- */ typedef union { cl_short CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_short lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2; #endif }cl_short2; typedef union { cl_short CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_short2 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[2]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4; #endif }cl_short4; /* cl_short3 is identical in size, alignment and behavior to cl_short4. See section 6.1.5. 
*/ typedef cl_short4 cl_short3; typedef union { cl_short CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_short4 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[4]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4[2]; #endif #if defined( __CL_SHORT8__ ) __cl_short8 v8; #endif }cl_short8; typedef union { cl_short CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_short8 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[8]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4[4]; #endif #if defined( __CL_SHORT8__ ) __cl_short8 v8[2]; #endif #if defined( __CL_SHORT16__ ) __cl_short16 v16; #endif }cl_short16; /* ---- cl_ushortn ---- */ typedef union { cl_ushort CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_ushort lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2; #endif }cl_ushort2; typedef union { cl_ushort CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_ushort2 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[2]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4; #endif }cl_ushort4; /* cl_ushort3 is identical in size, alignment and behavior to cl_ushort4. See section 6.1.5. 
*/ typedef cl_ushort4 cl_ushort3; typedef union { cl_ushort CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_ushort4 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[4]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4[2]; #endif #if defined( __CL_USHORT8__ ) __cl_ushort8 v8; #endif }cl_ushort8; typedef union { cl_ushort CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_ushort8 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[8]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4[4]; #endif #if defined( __CL_USHORT8__ ) __cl_ushort8 v8[2]; #endif #if defined( __CL_USHORT16__ ) __cl_ushort16 v16; #endif }cl_ushort16; /* ---- cl_halfn ---- */ typedef union { cl_half CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_half lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2; #endif }cl_half2; typedef union { cl_half CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_half2 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[2]; #endif #if defined( __CL_HALF4__) __cl_half4 v4; #endif }cl_half4; /* cl_half3 is identical in size, alignment and behavior to cl_half4. See section 6.1.5. */ typedef cl_half4 cl_half3; typedef union { cl_half CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_half4 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[4]; #endif #if defined( __CL_HALF4__) __cl_half4 v4[2]; #endif #if defined( __CL_HALF8__ ) __cl_half8 v8; #endif }cl_half8; typedef union { cl_half CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_half x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_half s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_half8 lo, hi; }; #endif #if defined( __CL_HALF2__) __cl_half2 v2[8]; #endif #if defined( __CL_HALF4__) __cl_half4 v4[4]; #endif #if defined( __CL_HALF8__ ) __cl_half8 v8[2]; #endif #if defined( __CL_HALF16__ ) __cl_half16 v16; #endif }cl_half16; /* ---- cl_intn ---- */ typedef union { cl_int CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_int lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2; #endif }cl_int2; typedef union { cl_int CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_int2 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[2]; #endif #if defined( __CL_INT4__) __cl_int4 v4; #endif }cl_int4; /* cl_int3 is identical in size, alignment and behavior to cl_int4. 
See section 6.1.5. */ typedef cl_int4 cl_int3; typedef union { cl_int CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_int4 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[4]; #endif #if defined( __CL_INT4__) __cl_int4 v4[2]; #endif #if defined( __CL_INT8__ ) __cl_int8 v8; #endif }cl_int8; typedef union { cl_int CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_int8 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[8]; #endif #if defined( __CL_INT4__) __cl_int4 v4[4]; #endif #if defined( __CL_INT8__ ) __cl_int8 v8[2]; #endif #if defined( __CL_INT16__ ) __cl_int16 v16; #endif }cl_int16; /* ---- cl_uintn ---- */ typedef union { cl_uint CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_uint lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2; #endif }cl_uint2; typedef union { cl_uint CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_uint2 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[2]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4; #endif }cl_uint4; /* cl_uint3 is identical in size, alignment and behavior to cl_uint4. See section 6.1.5. */ typedef cl_uint4 cl_uint3; typedef union { cl_uint CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_uint4 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[4]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4[2]; #endif #if defined( __CL_UINT8__ ) __cl_uint8 v8; #endif }cl_uint8; typedef union { cl_uint CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_uint8 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[8]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4[4]; #endif #if defined( __CL_UINT8__ ) __cl_uint8 v8[2]; #endif #if defined( __CL_UINT16__ ) __cl_uint16 v16; #endif }cl_uint16; /* ---- cl_longn ---- */ typedef union { cl_long CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_long lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2; #endif }cl_long2; typedef union { cl_long CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_long2 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[2]; #endif #if defined( __CL_LONG4__) __cl_long4 v4; #endif }cl_long4; /* cl_long3 is identical in size, alignment and behavior to cl_long4. See section 6.1.5. 
*/ typedef cl_long4 cl_long3; typedef union { cl_long CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_long4 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[4]; #endif #if defined( __CL_LONG4__) __cl_long4 v4[2]; #endif #if defined( __CL_LONG8__ ) __cl_long8 v8; #endif }cl_long8; typedef union { cl_long CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_long8 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[8]; #endif #if defined( __CL_LONG4__) __cl_long4 v4[4]; #endif #if defined( __CL_LONG8__ ) __cl_long8 v8[2]; #endif #if defined( __CL_LONG16__ ) __cl_long16 v16; #endif }cl_long16; /* ---- cl_ulongn ---- */ typedef union { cl_ulong CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_ulong lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2; #endif }cl_ulong2; typedef union { cl_ulong CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_ulong2 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[2]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4; #endif }cl_ulong4; /* cl_ulong3 is identical in size, alignment and behavior to cl_ulong4. See section 6.1.5. 
*/ typedef cl_ulong4 cl_ulong3; typedef union { cl_ulong CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_ulong4 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[4]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4[2]; #endif #if defined( __CL_ULONG8__ ) __cl_ulong8 v8; #endif }cl_ulong8; typedef union { cl_ulong CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_ulong8 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[8]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4[4]; #endif #if defined( __CL_ULONG8__ ) __cl_ulong8 v8[2]; #endif #if defined( __CL_ULONG16__ ) __cl_ulong16 v16; #endif }cl_ulong16; /* --- cl_floatn ---- */ typedef union { cl_float CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_float lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2; #endif }cl_float2; typedef union { cl_float CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_float2 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[2]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4; #endif }cl_float4; /* cl_float3 is identical in size, alignment and behavior to cl_float4. See section 6.1.5. 
*/ typedef cl_float4 cl_float3; typedef union { cl_float CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_float4 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[4]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4[2]; #endif #if defined( __CL_FLOAT8__ ) __cl_float8 v8; #endif }cl_float8; typedef union { cl_float CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_float8 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[8]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4[4]; #endif #if defined( __CL_FLOAT8__ ) __cl_float8 v8[2]; #endif #if defined( __CL_FLOAT16__ ) __cl_float16 v16; #endif }cl_float16; /* --- cl_doublen ---- */ typedef union { cl_double CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_double lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2; #endif }cl_double2; typedef union { cl_double CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_double2 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[2]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4; #endif }cl_double4; /* cl_double3 is identical in size, alignment and behavior to cl_double4. See section 6.1.5. */ typedef cl_double4 cl_double3; typedef union { cl_double CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_double4 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[4]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4[2]; #endif #if defined( __CL_DOUBLE8__ ) __cl_double8 v8; #endif }cl_double8; typedef union { cl_double CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_double8 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[8]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4[4]; #endif #if defined( __CL_DOUBLE8__ ) __cl_double8 v8[2]; #endif #if defined( __CL_DOUBLE16__ ) __cl_double16 v16; #endif }cl_double16; /* Macro to facilitate debugging * Usage: * Place CL_PROGRAM_STRING_DEBUG_INFO on the line before the first line of your source. * The first line ends with: CL_PROGRAM_STRING_DEBUG_INFO \" * Each line thereafter of OpenCL C source must end with: \n\ * The last line ends in "; * * Example: * * const char *my_program = CL_PROGRAM_STRING_DEBUG_INFO "\ * kernel void foo( int a, float * b ) \n\ * { \n\ * // my comment \n\ * *b[ get_global_id(0)] = a; \n\ * } \n\ * "; * * This should correctly set up the line, (column) and file information for your source * string so you can do source level debugging. 
*/ #define __CL_STRINGIFY( _x ) # _x #define _CL_STRINGIFY( _x ) __CL_STRINGIFY( _x ) #define CL_PROGRAM_STRING_DEBUG_INFO "#line " _CL_STRINGIFY(__LINE__) " \"" __FILE__ "\" \n\n" #ifdef __cplusplus } #endif #undef __CL_HAS_ANON_STRUCT__ #undef __CL_ANON_STRUCT__ #if defined( _WIN32) && defined(_MSC_VER) #if _MSC_VER >=1500 #pragma warning( pop ) #endif #endif #endif /* __CL_PLATFORM_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_va_api_media_sharing_intel.h000066400000000000000000000145751450307266000277720ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. ******************************************************************************/ /*****************************************************************************\ Copyright (c) 2013-2019 Intel Corporation All Rights Reserved. THESE MATERIALS ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THESE MATERIALS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
File Name: cl_va_api_media_sharing_intel.h Abstract: Notes: \*****************************************************************************/ #ifndef __OPENCL_CL_VA_API_MEDIA_SHARING_INTEL_H #define __OPENCL_CL_VA_API_MEDIA_SHARING_INTEL_H #include <CL/cl.h> #include <CL/cl_ext.h> #include <va/va.h> #ifdef __cplusplus extern "C" { #endif /****************************************** * cl_intel_va_api_media_sharing extension * *******************************************/ #define cl_intel_va_api_media_sharing 1 /* error codes */ #define CL_INVALID_VA_API_MEDIA_ADAPTER_INTEL -1098 #define CL_INVALID_VA_API_MEDIA_SURFACE_INTEL -1099 #define CL_VA_API_MEDIA_SURFACE_ALREADY_ACQUIRED_INTEL -1100 #define CL_VA_API_MEDIA_SURFACE_NOT_ACQUIRED_INTEL -1101 /* cl_va_api_device_source_intel */ #define CL_VA_API_DISPLAY_INTEL 0x4094 /* cl_va_api_device_set_intel */ #define CL_PREFERRED_DEVICES_FOR_VA_API_INTEL 0x4095 #define CL_ALL_DEVICES_FOR_VA_API_INTEL 0x4096 /* cl_context_info */ #define CL_CONTEXT_VA_API_DISPLAY_INTEL 0x4097 /* cl_mem_info */ #define CL_MEM_VA_API_MEDIA_SURFACE_INTEL 0x4098 /* cl_image_info */ #define CL_IMAGE_VA_API_PLANE_INTEL 0x4099 /* cl_command_type */ #define CL_COMMAND_ACQUIRE_VA_API_MEDIA_SURFACES_INTEL 0x409A #define CL_COMMAND_RELEASE_VA_API_MEDIA_SURFACES_INTEL 0x409B typedef cl_uint cl_va_api_device_source_intel; typedef cl_uint cl_va_api_device_set_intel; extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDsFromVA_APIMediaAdapterINTEL( cl_platform_id platform, cl_va_api_device_source_intel media_adapter_type, void* media_adapter, cl_va_api_device_set_intel media_adapter_set, cl_uint num_entries, cl_device_id* devices, cl_uint* num_devices) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL * clGetDeviceIDsFromVA_APIMediaAdapterINTEL_fn)( cl_platform_id platform, cl_va_api_device_source_intel media_adapter_type, void* media_adapter, cl_va_api_device_set_intel media_adapter_set, cl_uint num_entries, cl_device_id* devices, cl_uint* num_devices) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromVA_APIMediaSurfaceINTEL( cl_context context, cl_mem_flags flags, VASurfaceID* surface, cl_uint plane, cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL * clCreateFromVA_APIMediaSurfaceINTEL_fn)( cl_context context, cl_mem_flags flags, VASurfaceID* surface, cl_uint plane, cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireVA_APIMediaSurfacesINTEL( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireVA_APIMediaSurfacesINTEL_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseVA_APIMediaSurfacesINTEL( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseVA_APIMediaSurfacesINTEL_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem* mem_objects, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event) CL_EXT_SUFFIX__VERSION_1_2; #ifdef __cplusplus } #endif #endif 
/* __OPENCL_CL_VA_API_MEDIA_SHARING_INTEL_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/cl_version.h000066400000000000000000000054411450307266000241430ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. ******************************************************************************/ #ifndef __CL_VERSION_H #define __CL_VERSION_H /* Detect which version to target */ #if !defined(CL_TARGET_OPENCL_VERSION) #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)") #define CL_TARGET_OPENCL_VERSION 220 #endif #if CL_TARGET_OPENCL_VERSION != 100 && \ CL_TARGET_OPENCL_VERSION != 110 && \ CL_TARGET_OPENCL_VERSION != 120 && \ CL_TARGET_OPENCL_VERSION != 200 && \ CL_TARGET_OPENCL_VERSION != 210 && \ CL_TARGET_OPENCL_VERSION != 220 #pragma message("cl_version: CL_TARGET_OPENCL_VERSION is not a valid value (100, 110, 120, 200, 210, 220). Defaulting to 220 (OpenCL 2.2)") #undef CL_TARGET_OPENCL_VERSION #define CL_TARGET_OPENCL_VERSION 220 #endif /* OpenCL Version */ #if CL_TARGET_OPENCL_VERSION >= 220 && !defined(CL_VERSION_2_2) #define CL_VERSION_2_2 1 #endif #if CL_TARGET_OPENCL_VERSION >= 210 && !defined(CL_VERSION_2_1) #define CL_VERSION_2_1 1 #endif #if CL_TARGET_OPENCL_VERSION >= 200 && !defined(CL_VERSION_2_0) #define CL_VERSION_2_0 1 #endif #if CL_TARGET_OPENCL_VERSION >= 120 && !defined(CL_VERSION_1_2) #define CL_VERSION_1_2 1 #endif #if CL_TARGET_OPENCL_VERSION >= 110 && !defined(CL_VERSION_1_1) #define CL_VERSION_1_1 1 #endif #if CL_TARGET_OPENCL_VERSION >= 100 && !defined(CL_VERSION_1_0) #define CL_VERSION_1_0 1 #endif /* Allow deprecated APIs for older OpenCL versions. */ #if CL_TARGET_OPENCL_VERSION <= 210 && !defined(CL_USE_DEPRECATED_OPENCL_2_1_APIS) #define CL_USE_DEPRECATED_OPENCL_2_1_APIS #endif #if CL_TARGET_OPENCL_VERSION <= 200 && !defined(CL_USE_DEPRECATED_OPENCL_2_0_APIS) #define CL_USE_DEPRECATED_OPENCL_2_0_APIS #endif #if CL_TARGET_OPENCL_VERSION <= 120 && !defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS) #define CL_USE_DEPRECATED_OPENCL_1_2_APIS #endif #if CL_TARGET_OPENCL_VERSION <= 110 && !defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS) #define CL_USE_DEPRECATED_OPENCL_1_1_APIS #endif #if CL_TARGET_OPENCL_VERSION <= 100 && !defined(CL_USE_DEPRECATED_OPENCL_1_0_APIS) #define CL_USE_DEPRECATED_OPENCL_1_0_APIS #endif #endif /* __CL_VERSION_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CL/opencl.h000066400000000000000000000020671450307266000232610ustar00rootroot00000000000000/******************************************************************************* * Copyright (c) 2008-2021 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
* You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. ******************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_H #define __OPENCL_H #ifdef __cplusplus extern "C" { #endif #include <CL/cl.h> #include <CL/cl_gl.h> #include <CL/cl_gl_ext.h> #include <CL/cl_ext.h> #ifdef __cplusplus } #endif #endif /* __OPENCL_H */ clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/CODE_OF_CONDUCT.md000066400000000000000000000004301450307266000241210ustar00rootroot00000000000000A reminder that this issue tracker is managed by the Khronos Group. Interactions here should follow the Khronos Code of Conduct (https://www.khronos.org/developers/code-of-conduct), which prohibits aggressive or derogatory language. Please keep the discussion friendly and civil. clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/LICENSE000066400000000000000000000024351450307266000223360ustar00rootroot00000000000000Copyright (c) 2008-2015 The Khronos Group Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and/or associated documentation files (the "Materials"), to deal in the Materials without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Materials, and to permit persons to whom the Materials are furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Materials. MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT https://www.khronos.org/registry/ THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/LICENSE.txt000066400000000000000000000033341450307266000231530ustar00rootroot00000000000000Copyright (c) 2016 The Khronos Group Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software source and associated documentation files (the "Materials"), to deal in the Materials without restriction, including without limitation the rights to use, copy, modify, compile, merge, publish, distribute, sublicense, and/or sell copies of the Materials, and to permit persons to whom the Materials are furnished to do so, subject to the following terms and conditions: All modifications to the Materials used to create a binary that is distributed to third parties shall be provided to Khronos with an unrestricted license to use for the purposes of implementing bug fixes and enhancements to the Materials; If the binary is used as part of an OpenCL(TM) implementation, whether binary is distributed together with or separately to that implementation, then recipient must become an OpenCL Adopter and follow the published OpenCL conformance process for that implementation, details at: http://www.khronos.org/conformance/; The above copyright notice, the OpenCL trademark license, and this permission notice shall be included in all copies or substantial portions of the Materials. THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. OpenCL is a trademark of Apple Inc. used under license by Khronos. clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/README.md000066400000000000000000000030301450307266000226010ustar00rootroot00000000000000# OpenCL(TM) API Headers This repository contains C language headers for the OpenCL API. The authoritative public repository for these headers is located at: https://github.com/KhronosGroup/OpenCL-Headers Issues, proposed fixes for issues, and other suggested changes should be created using Github. ## Branch Structure The OpenCL API headers in this repository are Unified headers and are designed to work with all released OpenCL versions. This differs from previous OpenCL API headers, where version-specific API headers either existed in separate branches, or in separate folders in a branch. ## Compiling for a Specific OpenCL Version By default, the OpenCL API headers in this repository are for the latest OpenCL version (currently OpenCL 2.2). To use these API headers to target a different OpenCL version, an application may `#define` the preprocessor value `CL_TARGET_OPENCL_VERSION` before including the OpenCL API headers. The `CL_TARGET_OPENCL_VERSION` is a three digit decimal value representing the OpenCL API version. For example, to enforce usage of no more than the OpenCL 1.2 APIs, you may include the OpenCL API headers as follows: ``` #define CL_TARGET_OPENCL_VERSION 120 #include <CL/opencl.h> ``` ## Directory Structure ``` README.md This file LICENSE Source license for the OpenCL API headers CL/ Unified OpenCL API headers tree ``` ## License See [LICENSE](LICENSE). --- OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. 
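The `CL_TARGET_OPENCL_VERSION` mechanism described in the README above can be exercised with a small host program. Below is a minimal sketch, not part of the upstream tree; the file name `example.c` and the build line are illustrative. It pins the headers to OpenCL 1.2 and counts the installed platforms via `clGetPlatformIDs`:

```c
/* Illustrative example only -- not part of the OpenCL-Headers repository. */
#define CL_TARGET_OPENCL_VERSION 120  /* expose only OpenCL 1.2 (and older) APIs */
#include <CL/opencl.h>
#include <stdio.h>

int main(void)
{
    cl_uint num_platforms = 0;
    /* Passing num_entries = 0 and platforms = NULL just queries the count. */
    cl_int err = clGetPlatformIDs(0, NULL, &num_platforms);
    if (err != CL_SUCCESS) {
        printf("clGetPlatformIDs failed: %d\n", err);
        return 1;
    }
    printf("Found %u OpenCL platform(s).\n", num_platforms);
    return 0;
}
```

With the headers pinned to 120, accidental use of a 2.x-only API (for example `clCreateCommandQueueWithProperties`) fails at compile time rather than at run time. Build with something like `gcc example.c -lOpenCL`, assuming an ICD loader such as the one under `opencl/khronos/icd` below is installed.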
clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/000077500000000000000000000000001450307266000224675ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/CMakeLists.txt000066400000000000000000000027151450307266000252340ustar00rootroot00000000000000cmake_minimum_required (VERSION 2.8.11) project(OpenCL_Headers_Tests) enable_testing() include_directories(${PROJECT_SOURCE_DIR}/..) # Make sure headers do not produce warnings add_compile_options(-Wall -Wextra -Werror -pedantic -Wno-format) # Add a test for a given source file for each version of OpenCL function(add_header_test NAME SOURCE) foreach(VERSION 100 110 120 200 210 220) set(TEST_EXE ${NAME}_${VERSION}) add_executable(${TEST_EXE} ${SOURCE}) target_compile_definitions(${TEST_EXE} PUBLIC -DCL_TARGET_OPENCL_VERSION=${VERSION} ) add_test(NAME ${TEST_EXE} COMMAND ${TEST_EXE}) if(${VERSION} EQUAL 220) set(TEST_EXE ${NAME}_${VERSION}_EXPERIMENTAL) add_executable(${TEST_EXE} ${SOURCE}) target_compile_definitions(${TEST_EXE} PUBLIC -DCL_TARGET_OPENCL_VERSION=${VERSION} -DCL_EXPERIMENTAL ) add_test(NAME ${TEST_EXE} COMMAND ${TEST_EXE}) endif() endforeach(VERSION) endfunction(add_header_test) # Tests add_header_test(cl_h test_cl.h.c) add_header_test(cl_egl_h test_cl_egl.h.c) add_header_test(cl_ext_h test_cl_ext.h.c) add_header_test(cl_ext_intel_h test_cl_ext_intel.h.c) add_header_test(cl_gl_h test_cl_gl.h.c) add_header_test(cl_gl_ext_h test_cl_gl_ext.h.c) add_header_test(cl_icd_h test_cl_icd.h.c) add_header_test(cl_platform_h test_cl_platform.h.c) add_header_test(cl_opencl_h test_opencl.h.c) add_header_test(cl_version_h test_cl_version.h.c) add_header_test(headers test_headers.c) clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/README.md000066400000000000000000000023611450307266000237500ustar00rootroot00000000000000OpenCL-Headers/tests README =========================== The test_headers.c test is designed to make sure that the various cl_typen types work and conform to expectation for recent versions of cl_platform.h. Conforming to these expectations make use of these types practical for developers writing portable code. The various tests ending in .h.c are there to verify that the various OpenCL headers can compile stand alone. That is to ensure that they may be used a la carte. This provides developers a lifeline in the case that some unneeded part of OpenCL (e.g. cl/gl sharing) brings in a pile of symbols (e.g. all of OpenGL) that collides with other headers needed by the application. It is also poor form to require headers to be included in a particular order, especially if multiple systems require they be included in mutually incompatible order. So, here we require that each header can be used standalone so that the order is irrelevant. We also check to make sure that the headers don't cause spurious warnings. These tests are intended to be compiled using the most stringent compiler flags available for the platform, within reason. All warnings should be errors and extra warnings that it is expected developers are likely to use should be turned on. clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/test_cl.h.c000066400000000000000000000013261450307266000245200ustar00rootroot00000000000000// // Copyright (c) 2020 The Khronos Group Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. 
// You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // #include <stdio.h> #include "CL/cl.h" int main( void ) { printf("cl.h standalone test PASSED.\n"); return 0; } clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/test_cl_egl.h.c000066400000000000000000000013361450307266000253500ustar00rootroot00000000000000// // Copyright (c) 2020 The Khronos Group Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // #include <stdio.h> #include "CL/cl_egl.h" int main( void ) { printf("cl_egl.h standalone test PASSED.\n"); return 0; } clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/test_cl_ext.h.c000066400000000000000000000013361450307266000254010ustar00rootroot00000000000000// // Copyright (c) 2020 The Khronos Group Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // #include <stdio.h> #include "CL/cl_ext.h" int main( void ) { printf("cl_ext.h standalone test PASSED.\n"); return 0; } clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/test_cl_ext_intel.h.c000066400000000000000000000013521450307266000265720ustar00rootroot00000000000000// // Copyright (c) 2020 The Khronos Group Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // #include <stdio.h> #include "CL/cl_ext_intel.h" int main( void ) { printf("cl_ext_intel.h standalone test PASSED.\n"); return 0; } clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/test_cl_gl.h.c000066400000000000000000000013341450307266000252010ustar00rootroot00000000000000// // Copyright (c) 2020 The Khronos Group Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. 
// You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // #include <stdio.h> #include "CL/cl_gl.h" int main( void ) { printf("cl_gl.h standalone test PASSED.\n"); return 0; } clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/test_cl_gl_ext.h.c000066400000000000000000000013441450307266000260620ustar00rootroot00000000000000// // Copyright (c) 2020 The Khronos Group Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // #include <stdio.h> #include "CL/cl_gl_ext.h" int main( void ) { printf("cl_gl_ext.h standalone test PASSED.\n"); return 0; } clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/test_cl_icd.h.c000066400000000000000000000017321450307266000253400ustar00rootroot00000000000000// // Copyright (c) 2020 The Khronos Group Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // #include <stdio.h> #define CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_USE_DEPRECATED_OPENCL_1_2_APIS #define CL_USE_DEPRECATED_OPENCL_2_0_APIS #define CL_USE_DEPRECATED_OPENCL_2_1_APIS #define CL_USE_DEPRECATED_OPENCL_2_2_APIS #include "CL/cl_icd.h" int main( void ) { printf("cl_icd.h standalone test PASSED.\n"); return 0; } clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/test_cl_platform.h.c000066400000000000000000000013501450307266000264210ustar00rootroot00000000000000// // Copyright (c) 2020 The Khronos Group Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. 
// #include <stdio.h> #include "CL/cl_platform.h" int main( void ) { printf("cl_platform.h standalone test PASSED.\n"); return 0; } clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/test_cl_version.h.c000066400000000000000000000013461450307266000262670ustar00rootroot00000000000000// // Copyright (c) 2020 The Khronos Group Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // #include <stdio.h> #include "CL/cl_version.h" int main( void ) { printf("cl_version.h standalone test PASSED.\n"); return 0; } clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/test_headers.c000066400000000000000000000671721450307266000253160ustar00rootroot00000000000000// // Copyright (c) 2020 The Khronos Group Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // #include <stdio.h> #include "CL/cl.h" int test_char() { /* char */ /* Constructor */ cl_char a = 0; cl_char2 a2 = {{ 0, 1 }}; cl_char4 a4 = {{ 0, 1, 2, 3 }}; cl_char8 a8 = {{ 0, 1, 2, 3, 4, 5, 6, 7 }}; cl_char16 a16 = {{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }}; /* assignment */ cl_char b = a; cl_char2 b2 = a2; cl_char4 b4 = a4; cl_char8 b8 = a8; cl_char16 b16 = a16; printf("\nVerifying assignment:\n" ); printf("b: %d\n", b ); printf("b2: %d %d \n", b2.s[0], b2.s[1] ); printf("b4: %d %d %d %d\n", b4.s[0], b4.s[1], b4.s[2], b4.s[3] ); printf("b8: %d %d %d %d %d %d %d %d\n", b8.s[0], b8.s[1], b8.s[2], b8.s[3], b8.s[4], b8.s[5], b8.s[6], b8.s[7] ); printf("b16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d\n", b16.s[0], b16.s[1], b16.s[2], b16.s[3], b16.s[4], b16.s[5], b16.s[6], b16.s[7], b16.s[8], b16.s[9], b16.s[10], b16.s[11], b16.s[12], b16.s[13], b16.s[14], b16.s[15]); /* vector access */ printf("\nVerifying vector access:\n" ); #if defined( __CL_CHAR2__ ) __cl_char2 v2 = b2.v2; printf("__cl_char2: %d %d \n", ((cl_char*)&v2)[0], ((cl_char*)&v2)[1] ); #else printf( "__cl_char2 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_CHAR4__ ) __cl_char4 v4 = b4.v4; printf("__cl_char4: %d %d %d %d \n", ((cl_char*)&v4)[0], ((cl_char*)&v4)[1], ((cl_char*)&v4)[2], ((cl_char*)&v4)[3] ); #else printf( "__cl_char4 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_CHAR8__ ) __cl_char8 v8 = b8.v8; printf("__cl_char8: %d %d %d %d %d %d %d %d \n", ((cl_char*)&v8)[0], ((cl_char*)&v8)[1], ((cl_char*)&v8)[2], ((cl_char*)&v8)[3], ((cl_char*)&v8)[4], ((cl_char*)&v8)[5], ((cl_char*)&v8)[6], ((cl_char*)&v8)[7] ); #else printf( "__cl_char8 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_CHAR16__ )
__cl_char16 v16 = b16.v16; printf("__cl_char16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d \n", ((cl_char*)&v16)[0], ((cl_char*)&v16)[1], ((cl_char*)&v16)[2], ((cl_char*)&v16)[3], ((cl_char*)&v16)[4], ((cl_char*)&v16)[5], ((cl_char*)&v16)[6], ((cl_char*)&v16)[7], ((cl_char*)&v16)[8], ((cl_char*)&v16)[9], ((cl_char*)&v16)[10], ((cl_char*)&v16)[11], ((cl_char*)&v16)[12], ((cl_char*)&v16)[13], ((cl_char*)&v16)[14], ((cl_char*)&v16)[15]); #else printf( "__cl_char16 SIMD vectors not supported on this architecture.\n" ); #endif printf( "\n" ); return 0; } int test_uchar() { /* uchar */ /* Constructor */ cl_uchar a = 0; cl_uchar2 a2 = {{ 0, 1 }}; cl_uchar4 a4 = {{ 0, 1, 2, 3 }}; cl_uchar8 a8 = {{ 0, 1, 2, 3, 4, 5, 6, 7 }}; cl_uchar16 a16 = {{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }}; /* assignment */ cl_uchar b = a; cl_uchar2 b2 = a2; cl_uchar4 b4 = a4; cl_uchar8 b8 = a8; cl_uchar16 b16 = a16; printf("\nVerifying assignment:\n" ); printf("b: %d\n", b ); printf("b2: %d %d \n", b2.s[0], b2.s[1] ); printf("b4: %d %d %d %d\n", b4.s[0], b4.s[1], b4.s[2], b4.s[3] ); printf("b8: %d %d %d %d %d %d %d %d\n", b8.s[0], b8.s[1], b8.s[2], b8.s[3], b8.s[4], b8.s[5], b8.s[6], b8.s[7] ); printf("b16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d\n", b16.s[0], b16.s[1], b16.s[2], b16.s[3], b16.s[4], b16.s[5], b16.s[6], b16.s[7], b16.s[8], b16.s[9], b16.s[10], b16.s[11], b16.s[12], b16.s[13], b16.s[14], b16.s[15]); /* vector access */ printf("\nVerifying vector access:\n" ); #if defined( __CL_UCHAR2__ ) __cl_uchar2 v2 = b2.v2; printf("__cl_uchar2: %d %d \n", ((cl_uchar*)&v2)[0], ((cl_uchar*)&v2)[1] ); #else printf( "__cl_uchar2 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_UCHAR4__ ) __cl_uchar4 v4 = b4.v4; printf("__cl_uchar4: %d %d %d %d \n", ((cl_uchar*)&v4)[0], ((cl_uchar*)&v4)[1], ((cl_uchar*)&v4)[2], ((cl_uchar*)&v4)[3] ); #else printf( "__cl_uchar4 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_UCHAR8__ ) __cl_uchar8 v8 = b8.v8; printf("__cl_uchar8: %d %d %d %d %d %d %d %d \n", ((cl_uchar*)&v8)[0], ((cl_uchar*)&v8)[1], ((cl_uchar*)&v8)[2], ((cl_uchar*)&v8)[3], ((cl_uchar*)&v8)[4], ((cl_uchar*)&v8)[5], ((cl_uchar*)&v8)[6], ((cl_uchar*)&v8)[7] ); #else printf( "__cl_uchar8 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_UCHAR16__ ) __cl_uchar16 v16 = b16.v16; printf("__cl_uchar16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d \n", ((cl_uchar*)&v16)[0], ((cl_uchar*)&v16)[1], ((cl_uchar*)&v16)[2], ((cl_uchar*)&v16)[3], ((cl_uchar*)&v16)[4], ((cl_uchar*)&v16)[5], ((cl_uchar*)&v16)[6], ((cl_uchar*)&v16)[7], ((cl_uchar*)&v16)[8], ((cl_uchar*)&v16)[9], ((cl_uchar*)&v16)[10], ((cl_uchar*)&v16)[11], ((cl_uchar*)&v16)[12], ((cl_uchar*)&v16)[13], ((cl_uchar*)&v16)[14], ((cl_uchar*)&v16)[15]); #else printf( "__cl_uchar16 SIMD vectors not supported on this architecture.\n" ); #endif printf( "\n" ); return 0; } int test_short() { /* short */ /* Constructor */ cl_short a = 0; cl_short2 a2 = {{ 0, 1 }}; cl_short4 a4 = {{ 0, 1, 2, 3 }}; cl_short8 a8 = {{ 0, 1, 2, 3, 4, 5, 6, 7 }}; cl_short16 a16 = {{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }}; /* assignment */ cl_short b = a; cl_short2 b2 = a2; cl_short4 b4 = a4; cl_short8 b8 = a8; cl_short16 b16 = a16; printf("\nVerifying assignment:\n" ); printf("b: %d\n", b ); printf("b2: %d %d \n", b2.s[0], b2.s[1] ); printf("b4: %d %d %d %d\n", b4.s[0], b4.s[1], b4.s[2], b4.s[3] ); printf("b8: %d %d %d %d %d %d %d %d\n", b8.s[0], b8.s[1], b8.s[2], b8.s[3], 
b8.s[4], b8.s[5], b8.s[6], b8.s[7] ); printf("b16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d\n", b16.s[0], b16.s[1], b16.s[2], b16.s[3], b16.s[4], b16.s[5], b16.s[6], b16.s[7], b16.s[8], b16.s[9], b16.s[10], b16.s[11], b16.s[12], b16.s[13], b16.s[14], b16.s[15]); /* vector access */ printf("\nVerifying vector access:\n" ); #if defined( __CL_SHORT2__ ) __cl_short2 v2 = b2.v2; printf("__cl_short2: %d %d \n", ((cl_short*)&v2)[0], ((cl_short*)&v2)[1] ); #else printf( "__cl_short2 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_SHORT4__ ) __cl_short4 v4 = b4.v4; printf("__cl_short4: %d %d %d %d \n", ((cl_short*)&v4)[0], ((cl_short*)&v4)[1], ((cl_short*)&v4)[2], ((cl_short*)&v4)[3] ); #else printf( "__cl_short4 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_SHORT8__ ) __cl_short8 v8 = b8.v8; printf("__cl_short8: %d %d %d %d %d %d %d %d \n", ((cl_short*)&v8)[0], ((cl_short*)&v8)[1], ((cl_short*)&v8)[2], ((cl_short*)&v8)[3], ((cl_short*)&v8)[4], ((cl_short*)&v8)[5], ((cl_short*)&v8)[6], ((cl_short*)&v8)[7] ); #else printf( "__cl_short8 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_SHORT16__ ) __cl_short16 v16 = b16.v16; printf("__cl_short16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d \n", ((cl_short*)&v16)[0], ((cl_short*)&v16)[1], ((cl_short*)&v16)[2], ((cl_short*)&v16)[3], ((cl_short*)&v16)[4], ((cl_short*)&v16)[5], ((cl_short*)&v16)[6], ((cl_short*)&v16)[7], ((cl_short*)&v16)[8], ((cl_short*)&v16)[9], ((cl_short*)&v16)[10], ((cl_short*)&v16)[11], ((cl_short*)&v16)[12], ((cl_short*)&v16)[13], ((cl_short*)&v16)[14], ((cl_short*)&v16)[15]); #else printf( "__cl_short16 SIMD vectors not supported on this architecture.\n" ); #endif printf( "\n" ); return 0; } int test_ushort() { /* ushort */ /* Constructor */ cl_ushort a = 0; cl_ushort2 a2 = {{ 0, 1 }}; cl_ushort4 a4 = {{ 0, 1, 2, 3 }}; cl_ushort8 a8 = {{ 0, 1, 2, 3, 4, 5, 6, 7 }}; cl_ushort16 a16 = {{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }}; /* assignment */ cl_ushort b = a; cl_ushort2 b2 = a2; cl_ushort4 b4 = a4; cl_ushort8 b8 = a8; cl_ushort16 b16 = a16; printf("\nVerifying assignment:\n" ); printf("b: %d\n", b ); printf("b2: %d %d \n", b2.s[0], b2.s[1] ); printf("b4: %d %d %d %d\n", b4.s[0], b4.s[1], b4.s[2], b4.s[3] ); printf("b8: %d %d %d %d %d %d %d %d\n", b8.s[0], b8.s[1], b8.s[2], b8.s[3], b8.s[4], b8.s[5], b8.s[6], b8.s[7] ); printf("b16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d\n", b16.s[0], b16.s[1], b16.s[2], b16.s[3], b16.s[4], b16.s[5], b16.s[6], b16.s[7], b16.s[8], b16.s[9], b16.s[10], b16.s[11], b16.s[12], b16.s[13], b16.s[14], b16.s[15]); /* vector access */ printf("\nVerifying vector access:\n" ); #if defined( __CL_USHORT2__ ) __cl_ushort2 v2 = b2.v2; printf("__cl_ushort2: %d %d \n", ((unsigned short*)&v2)[0], ((unsigned short*)&v2)[1] ); #else printf( "__cl_ushort2 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_USHORT4__ ) __cl_ushort4 v4 = b4.v4; printf("__cl_ushort4: %d %d %d %d \n", ((unsigned short*)&v4)[0], ((unsigned short*)&v4)[1], ((unsigned short*)&v4)[2], ((unsigned short*)&v4)[3] ); #else printf( "__cl_ushort4 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_USHORT8__ ) __cl_ushort8 v8 = b8.v8; printf("__cl_ushort8: %d %d %d %d %d %d %d %d \n", ((unsigned short*)&v8)[0], ((unsigned short*)&v8)[1], ((unsigned short*)&v8)[2], ((unsigned short*)&v8)[3], ((unsigned short*)&v8)[4], ((unsigned short*)&v8)[5], ((unsigned 
short*)&v8)[6], ((unsigned short*)&v8)[7] ); #else printf( "__cl_ushort8 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_USHORT16__ ) __cl_ushort16 v16 = b16.v16; printf("__cl_ushort16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d \n", ((unsigned short*)&v16)[0], ((unsigned short*)&v16)[1], ((unsigned short*)&v16)[2], ((unsigned short*)&v16)[3], ((unsigned short*)&v16)[4], ((unsigned short*)&v16)[5], ((unsigned short*)&v16)[6], ((unsigned short*)&v16)[7], ((unsigned short*)&v16)[8], ((unsigned short*)&v16)[9], ((unsigned short*)&v16)[10], ((unsigned short*)&v16)[11], ((unsigned short*)&v16)[12], ((unsigned short*)&v16)[13], ((unsigned short*)&v16)[14], ((unsigned short*)&v16)[15]); #else printf( "__cl_ushort16 SIMD vectors not supported on this architecture.\n" ); #endif printf( "\n" ); return 0; } int test_int() { /* int */ /* Constructor */ cl_int a = 0; cl_int2 a2 = {{ 0, 1 }}; cl_int4 a4 = {{ 0, 1, 2, 3 }}; cl_int8 a8 = {{ 0, 1, 2, 3, 4, 5, 6, 7 }}; cl_int16 a16 = {{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }}; /* assignment */ cl_int b = a; cl_int2 b2 = a2; cl_int4 b4 = a4; cl_int8 b8 = a8; cl_int16 b16 = a16; printf("\nVerifying assignment:\n" ); printf("b: %d\n", b ); printf("b2: %d %d \n", b2.s[0], b2.s[1] ); printf("b4: %d %d %d %d\n", b4.s[0], b4.s[1], b4.s[2], b4.s[3] ); printf("b8: %d %d %d %d %d %d %d %d\n", b8.s[0], b8.s[1], b8.s[2], b8.s[3], b8.s[4], b8.s[5], b8.s[6], b8.s[7] ); printf("b16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d\n", b16.s[0], b16.s[1], b16.s[2], b16.s[3], b16.s[4], b16.s[5], b16.s[6], b16.s[7], b16.s[8], b16.s[9], b16.s[10], b16.s[11], b16.s[12], b16.s[13], b16.s[14], b16.s[15]); /* vector access */ printf("\nVerifying vector access:\n" ); #if defined( __CL_INT2__ ) __cl_int2 v2 = b2.v2; printf("__cl_int2: %d %d \n", ((cl_int*)&v2)[0], ((cl_int*)&v2)[1] ); #else printf( "__cl_int2 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_INT4__ ) __cl_int4 v4 = b4.v4; printf("__cl_int4: %d %d %d %d \n", ((cl_int*)&v4)[0], ((cl_int*)&v4)[1], ((cl_int*)&v4)[2], ((cl_int*)&v4)[3] ); #else printf( "__cl_int4 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_INT8__ ) __cl_int8 v8 = b8.v8; printf("__cl_int8: %d %d %d %d %d %d %d %d \n", ((cl_int*)&v8)[0], ((cl_int*)&v8)[1], ((cl_int*)&v8)[2], ((cl_int*)&v8)[3], ((cl_int*)&v8)[4], ((cl_int*)&v8)[5], ((cl_int*)&v8)[6], ((cl_int*)&v8)[7] ); #else printf( "__cl_int8 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_INT16__ ) __cl_int16 v16 = b16.v16; printf("__cl_int16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d \n", ((cl_int*)&v16)[0], ((cl_int*)&v16)[1], ((cl_int*)&v16)[2], ((cl_int*)&v16)[3], ((cl_int*)&v16)[4], ((cl_int*)&v16)[5], ((cl_int*)&v16)[6], ((cl_int*)&v16)[7], ((cl_int*)&v16)[8], ((cl_int*)&v16)[9], ((cl_int*)&v16)[10], ((cl_int*)&v16)[11], ((cl_int*)&v16)[12], ((cl_int*)&v16)[13], ((cl_int*)&v16)[14], ((cl_int*)&v16)[15]); #else printf( "__cl_int16 SIMD vectors not supported on this architecture.\n" ); #endif printf( "\n" ); return 0; } int test_uint() { /* uint */ /* Constructor */ cl_uint a = 0; cl_uint2 a2 = {{ 0, 1 }}; cl_uint4 a4 = {{ 0, 1, 2, 3 }}; cl_uint8 a8 = {{ 0, 1, 2, 3, 4, 5, 6, 7 }}; cl_uint16 a16 = {{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }}; /* assignment */ cl_uint b = a; cl_uint2 b2 = a2; cl_uint4 b4 = a4; cl_uint8 b8 = a8; cl_uint16 b16 = a16; printf("\nVerifying assignment:\n" ); printf("b: %d\n", b ); printf("b2: %d %d \n", 
b2.s[0], b2.s[1] ); printf("b4: %d %d %d %d\n", b4.s[0], b4.s[1], b4.s[2], b4.s[3] ); printf("b8: %d %d %d %d %d %d %d %d\n", b8.s[0], b8.s[1], b8.s[2], b8.s[3], b8.s[4], b8.s[5], b8.s[6], b8.s[7] ); printf("b16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d\n", b16.s[0], b16.s[1], b16.s[2], b16.s[3], b16.s[4], b16.s[5], b16.s[6], b16.s[7], b16.s[8], b16.s[9], b16.s[10], b16.s[11], b16.s[12], b16.s[13], b16.s[14], b16.s[15]); /* vector access */ printf("\nVerifying vector access:\n" ); #if defined( __CL_UINT2__ ) __cl_uint2 v2 = b2.v2; printf("__cl_uint2: %d %d \n", ((cl_uint*)&v2)[0], ((cl_uint*)&v2)[1] ); #else printf( "__cl_uint2 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_UINT4__ ) __cl_uint4 v4 = b4.v4; printf("__cl_uint4: %d %d %d %d \n", ((cl_uint*)&v4)[0], ((cl_uint*)&v4)[1], ((cl_uint*)&v4)[2], ((cl_uint*)&v4)[3] ); #else printf( "__cl_uint4 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_UINT8__ ) __cl_uint8 v8 = b8.v8; printf("__cl_uint8: %d %d %d %d %d %d %d %d \n", ((cl_uint*)&v8)[0], ((cl_uint*)&v8)[1], ((cl_uint*)&v8)[2], ((cl_uint*)&v8)[3], ((cl_uint*)&v8)[4], ((cl_uint*)&v8)[5], ((cl_uint*)&v8)[6], ((cl_uint*)&v8)[7] ); #else printf( "__cl_uint8 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_UINT16__ ) __cl_uint16 v16 = b16.v16; printf("__cl_uint16: %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d \n", ((cl_uint*)&v16)[0], ((cl_uint*)&v16)[1], ((cl_uint*)&v16)[2], ((cl_uint*)&v16)[3], ((cl_uint*)&v16)[4], ((cl_uint*)&v16)[5], ((cl_uint*)&v16)[6], ((cl_uint*)&v16)[7], ((cl_uint*)&v16)[8], ((cl_uint*)&v16)[9], ((cl_uint*)&v16)[10], ((cl_uint*)&v16)[11], ((cl_uint*)&v16)[12], ((cl_uint*)&v16)[13], ((cl_uint*)&v16)[14], ((cl_uint*)&v16)[15]); #else printf( "__cl_uint16 SIMD vectors not supported on this architecture.\n" ); #endif printf( "\n" ); return 0; } int test_long() { /* long */ /* Constructor */ cl_long a = 0; cl_long2 a2 = {{ 0, 1 }}; cl_long4 a4 = {{ 0, 1, 2, 3 }}; cl_long8 a8 = {{ 0, 1, 2, 3, 4, 5, 6, 7 }}; cl_long16 a16 = {{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }}; /* assignment */ cl_long b = a; cl_long2 b2 = a2; cl_long4 b4 = a4; cl_long8 b8 = a8; cl_long16 b16 = a16; printf("\nVerifying assignment:\n" ); printf("b: %lld\n", b ); printf("b2: %lld %lld \n", b2.s[0], b2.s[1] ); printf("b4: %lld %lld %lld %lld\n", b4.s[0], b4.s[1], b4.s[2], b4.s[3] ); printf("b8: %lld %lld %lld %lld %lld %lld %lld %lld\n", b8.s[0], b8.s[1], b8.s[2], b8.s[3], b8.s[4], b8.s[5], b8.s[6], b8.s[7] ); printf("b16: %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld\n", b16.s[0], b16.s[1], b16.s[2], b16.s[3], b16.s[4], b16.s[5], b16.s[6], b16.s[7], b16.s[8], b16.s[9], b16.s[10], b16.s[11], b16.s[12], b16.s[13], b16.s[14], b16.s[15]); /* vector access */ printf("\nVerifying vector access:\n" ); #if defined( __CL_LONG2__ ) __cl_long2 v2 = b2.v2; printf("__cl_long2: %lld %lld \n", ((cl_long*)&v2)[0], ((cl_long*)&v2)[1] ); #else printf( "__cl_long2 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_LONG4__ ) __cl_long4 v4 = b4.v4; printf("__cl_long4: %lld %lld %lld %lld \n", ((cl_long*)&v4)[0], ((cl_long*)&v4)[1], ((cl_long*)&v4)[2], ((cl_long*)&v4)[3] ); #else printf( "__cl_long4 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_LONG8__ ) __cl_long8 v8 = b8.v8; printf("__cl_long8: %lld %lld %lld %lld %lld %lld %lld %lld \n", ((cl_long*)&v8)[0], ((cl_long*)&v8)[1], ((cl_long*)&v8)[2], 
((cl_long*)&v8)[3], ((cl_long*)&v8)[4], ((cl_long*)&v8)[5], ((cl_long*)&v8)[6], ((cl_long*)&v8)[7] ); #else printf( "__cl_long8 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_LONG16__ ) __cl_long16 v16 = b16.v16; printf("__cl_long16: %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld \n", ((cl_long*)&v16)[0], ((cl_long*)&v16)[1], ((cl_long*)&v16)[2], ((cl_long*)&v16)[3], ((cl_long*)&v16)[4], ((cl_long*)&v16)[5], ((cl_long*)&v16)[6], ((cl_long*)&v16)[7], ((cl_long*)&v16)[8], ((cl_long*)&v16)[9], ((cl_long*)&v16)[10], ((cl_long*)&v16)[11], ((cl_long*)&v16)[12], ((cl_long*)&v16)[13], ((cl_long*)&v16)[14], ((cl_long*)&v16)[15]); #else printf( "__cl_long16 SIMD vectors not supported on this architecture.\n" ); #endif printf( "\n" ); return 0; } int test_ulong() { /* ulong */ /* Constructor */ cl_ulong a = 0; cl_ulong2 a2 = {{ 0, 1 }}; cl_ulong4 a4 = {{ 0, 1, 2, 3 }}; cl_ulong8 a8 = {{ 0, 1, 2, 3, 4, 5, 6, 7 }}; cl_ulong16 a16 = {{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }}; /* assignment */ cl_ulong b = a; cl_ulong2 b2 = a2; cl_ulong4 b4 = a4; cl_ulong8 b8 = a8; cl_ulong16 b16 = a16; printf("\nVerifying assignment:\n" ); printf("b: %lld\n", b ); printf("b2: %lld %lld \n", b2.s[0], b2.s[1] ); printf("b4: %lld %lld %lld %lld\n", b4.s[0], b4.s[1], b4.s[2], b4.s[3] ); printf("b8: %lld %lld %lld %lld %lld %lld %lld %lld\n", b8.s[0], b8.s[1], b8.s[2], b8.s[3], b8.s[4], b8.s[5], b8.s[6], b8.s[7] ); printf("b16: %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld\n", b16.s[0], b16.s[1], b16.s[2], b16.s[3], b16.s[4], b16.s[5], b16.s[6], b16.s[7], b16.s[8], b16.s[9], b16.s[10], b16.s[11], b16.s[12], b16.s[13], b16.s[14], b16.s[15]); /* vector access */ printf("\nVerifying vector access:\n" ); #if defined( __CL_ULONG2__ ) __cl_ulong2 v2 = b2.v2; printf("__cl_ulong2: %lld %lld \n", ((cl_ulong*)&v2)[0], ((cl_ulong*)&v2)[1] ); #else printf( "__cl_ulong2 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_ULONG4__ ) __cl_ulong4 v4 = b4.v4; printf("__cl_ulong4: %lld %lld %lld %lld \n", ((cl_ulong*)&v4)[0], ((cl_ulong*)&v4)[1], ((cl_ulong*)&v4)[2], ((cl_ulong*)&v4)[3] ); #else printf( "__cl_ulong4 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_ULONG8__ ) __cl_ulong8 v8 = b8.v8; printf("__cl_ulong8: %lld %lld %lld %lld %lld %lld %lld %lld \n", ((cl_ulong*)&v8)[0], ((cl_ulong*)&v8)[1], ((cl_ulong*)&v8)[2], ((cl_ulong*)&v8)[3], ((cl_ulong*)&v8)[4], ((cl_ulong*)&v8)[5], ((cl_ulong*)&v8)[6], ((cl_ulong*)&v8)[7] ); #else printf( "__cl_ulong8 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_ULONG16__ ) __cl_ulong16 v16 = b16.v16; printf("__cl_ulong16: %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld \n", ((cl_ulong*)&v16)[0], ((cl_ulong*)&v16)[1], ((cl_ulong*)&v16)[2], ((cl_ulong*)&v16)[3], ((cl_ulong*)&v16)[4], ((cl_ulong*)&v16)[5], ((cl_ulong*)&v16)[6], ((cl_ulong*)&v16)[7], ((cl_ulong*)&v16)[8], ((cl_ulong*)&v16)[9], ((cl_ulong*)&v16)[10], ((cl_ulong*)&v16)[11], ((cl_ulong*)&v16)[12], ((cl_ulong*)&v16)[13], ((cl_ulong*)&v16)[14], ((cl_ulong*)&v16)[15]); #else printf( "__cl_ulong16 SIMD vectors not supported on this architecture.\n" ); #endif printf( "\n" ); return 0; } int test_float() { /* float */ /* Constructor */ cl_float a = 0.0f; cl_float2 a2 = {{ 0.0f, 1.0f }}; cl_float4 a4 = {{ 0.0f, 1.0f, 2.0f, 3.0f }}; cl_float8 a8 = {{ 0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f }}; cl_float16 
a16 = {{ 0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f, 10.0f, 11.0f, 12.0f, 13.0f, 14.0f, 15.0f }}; /* assignment */ cl_float b = a; cl_float2 b2 = a2; cl_float4 b4 = a4; cl_float8 b8 = a8; cl_float16 b16 = a16; printf("\nVerifying assignment:\n" ); printf("b: %f\n", b ); printf("b2: %f %f \n", b2.s[0], b2.s[1] ); printf("b4: %f %f %f %f\n", b4.s[0], b4.s[1], b4.s[2], b4.s[3] ); printf("b8: %f %f %f %f %f %f %f %f\n", b8.s[0], b8.s[1], b8.s[2], b8.s[3], b8.s[4], b8.s[5], b8.s[6], b8.s[7] ); printf("b16: %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f\n", b16.s[0], b16.s[1], b16.s[2], b16.s[3], b16.s[4], b16.s[5], b16.s[6], b16.s[7], b16.s[8], b16.s[9], b16.s[10], b16.s[11], b16.s[12], b16.s[13], b16.s[14], b16.s[15]); /* vector access */ printf("\nVerifying vector access:\n" ); #if defined( __CL_FLOAT2__ ) __cl_float2 v2 = b2.v2; printf("__cl_float2: %f %f \n", ((cl_float*)&v2)[0], ((cl_float*)&v2)[1] ); #else printf( "__cl_float2 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_FLOAT4__ ) { __cl_float4 v4 = b4.v4; printf("__cl_float4: %f %f %f %f \n", ((cl_float*)&v4)[0], ((cl_float*)&v4)[1], ((cl_float*)&v4)[2], ((cl_float*)&v4)[3] ); } #else printf( "__cl_float4 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_FLOAT8__ ) __cl_float8 v8 = b8.v8; printf("__cl_float8: %f %f %f %f %f %f %f %f \n", ((cl_float*)&v8)[0], ((cl_float*)&v8)[1], ((cl_float*)&v8)[2], ((cl_float*)&v8)[3], ((cl_float*)&v8)[4], ((cl_float*)&v8)[5], ((cl_float*)&v8)[6], ((cl_float*)&v8)[7] ); #else printf( "__cl_float8 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_FLOAT16__ ) __cl_float16 v16 = b16.v16; printf("__cl_float16: %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f \n", ((cl_float*)&v16)[0], ((cl_float*)&v16)[1], ((cl_float*)&v16)[2], ((cl_float*)&v16)[3], ((cl_float*)&v16)[4], ((cl_float*)&v16)[5], ((cl_float*)&v16)[6], ((cl_float*)&v16)[7], ((cl_float*)&v16)[8], ((cl_float*)&v16)[9], ((cl_float*)&v16)[10], ((cl_float*)&v16)[11], ((cl_float*)&v16)[12], ((cl_float*)&v16)[13], ((cl_float*)&v16)[14], ((cl_float*)&v16)[15]); #else printf( "__cl_float16 SIMD vectors not supported on this architecture.\n" ); #endif printf( "\n" ); return 0; } int test_double() { /* double */ /* Constructor */ cl_double a = 0.0f; cl_double2 a2 = {{ 0.0f, 1.0f }}; cl_double4 a4 = {{ 0.0f, 1.0f, 2.0f, 3.0f }}; cl_double8 a8 = {{ 0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f }}; cl_double16 a16 = {{ 0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f, 10.0f, 11.0f, 12.0f, 13.0f, 14.0f, 15.0f }}; /* assignment */ cl_double b = a; cl_double2 b2 = a2; cl_double4 b4 = a4; cl_double8 b8 = a8; cl_double16 b16 = a16; printf("\nVerifying assignment:\n" ); printf("b: %f\n", b ); printf("b2: %f %f \n", b2.s[0], b2.s[1] ); printf("b4: %f %f %f %f\n", b4.s[0], b4.s[1], b4.s[2], b4.s[3] ); printf("b8: %f %f %f %f %f %f %f %f\n", b8.s[0], b8.s[1], b8.s[2], b8.s[3], b8.s[4], b8.s[5], b8.s[6], b8.s[7] ); printf("b16: %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f\n", b16.s[0], b16.s[1], b16.s[2], b16.s[3], b16.s[4], b16.s[5], b16.s[6], b16.s[7], b16.s[8], b16.s[9], b16.s[10], b16.s[11], b16.s[12], b16.s[13], b16.s[14], b16.s[15]); /* vector access */ printf("\nVerifying vector access:\n" ); #if defined( __CL_DOUBLE2__ ) __cl_double2 v2 = b2.v2; printf("__cl_double2: %f %f \n", ((cl_double*)&v2)[0], ((cl_double*)&v2)[1] ); #else printf( "__cl_double2 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( 
__CL_DOUBLE4__ ) __cl_double4 v4 = b4.v4; printf("__cl_double4: %f %f %f %f \n", ((cl_double*)&v4)[0], ((cl_double*)&v4)[1], ((cl_double*)&v4)[2], ((cl_double*)&v4)[3] ); #else printf( "__cl_double4 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_DOUBLE8__ ) __cl_double8 v8 = b8.v8; printf("__cl_double8: %f %f %f %f %f %f %f %f \n", ((cl_double*)&v8)[0], ((cl_double*)&v8)[1], ((cl_double*)&v8)[2], ((cl_double*)&v8)[3], ((cl_double*)&v8)[4], ((cl_double*)&v8)[5], ((cl_double*)&v8)[6], ((cl_double*)&v8)[7] ); #else printf( "__cl_double8 SIMD vectors not supported on this architecture.\n" ); #endif #if defined( __CL_DOUBLE16__ ) __cl_double16 v16 = b16.v16; printf("__cl_double16: %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f \n", ((cl_double*)&v16)[0], ((cl_double*)&v16)[1], ((cl_double*)&v16)[2], ((cl_double*)&v16)[3], ((cl_double*)&v16)[4], ((cl_double*)&v16)[5], ((cl_double*)&v16)[6], ((cl_double*)&v16)[7], ((cl_double*)&v16)[8], ((cl_double*)&v16)[9], ((cl_double*)&v16)[10], ((cl_double*)&v16)[11], ((cl_double*)&v16)[12], ((cl_double*)&v16)[13], ((cl_double*)&v16)[14], ((cl_double*)&v16)[15]); #else printf( "__cl_double16 SIMD vectors not supported on this architecture.\n" ); #endif printf( "\n" ); return 0; } int main(void) { printf( "\nChecking operations on cl_types.\nNumbers, where presented, should walk upward from 0, with step of 1:\n" ); test_char(); test_uchar(); test_short(); test_ushort(); test_int(); test_uint(); test_long(); test_ulong(); test_float(); test_double(); return 0; } clr-rocm-5.7.1/opencl/khronos/headers/opencl2.2/tests/test_opencl.h.c000066400000000000000000000013361450307266000254030ustar00rootroot00000000000000// // Copyright (c) 2020 The Khronos Group Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // #include <stdio.h> #include "CL/opencl.h" int main( void ) { printf("opencl.h standalone test PASSED.\n"); return 0; } clr-rocm-5.7.1/opencl/khronos/icd/000077500000000000000000000000001450307266000167475ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/icd/.gitignore000066400000000000000000000000411450307266000207320ustar00rootroot00000000000000inc/CL/ inc/EGL/ inc/KHR/ build/ clr-rocm-5.7.1/opencl/khronos/icd/CMakeLists.txt000066400000000000000000000137361450307266000215170ustar00rootroot00000000000000cmake_minimum_required (VERSION 2.8.11) project (OPENCL_ICD_LOADER) include (GNUInstallDirs) find_package (Threads REQUIRED) # The option below allows building the ICD Loader library as a shared library # (ON, default) or a static library (OFF). # # Khronos OpenCL Working Group strongly recommends building and using the ICD # loader as a shared library due to the following benefits: # # 1. The shared library can be updated independent of the application. This # allows releasing new fixes and features in the ICD loader without updating # the application. 
# # In rare cases when there are backward-incompatible changes to the ICD # loader (due to platform requirements, for instance), using a shared # library allows updating the library to make the transition seamless to # installed applications. # # 2. On platforms that require the ICD mechanism there are multiple vendors # shipping their OpenCL implementations. The vendor installers collaborate # to make sure that the installed ICD shared library version is suitable for # working with all vendor implementations installed on the system. # # If applications statically link to ICD Loader then that version of the ICD # loader may not work with one or more installed vendor implementations. # # Using the OpenCL ICD loader as a static library is NOT recommended for # end-user installations in general. However in some controlled environments it # may be useful to simplify the build and distribution of the application. E.g. # in test farms, or in cases where the end-user system configs are known in # advance. Use it with discretion. option (BUILD_SHARED_LIBS "Build shared libs" ON) include(CheckFunctionExists) check_function_exists(secure_getenv HAVE_SECURE_GETENV) check_function_exists(__secure_getenv HAVE___SECURE_GETENV) configure_file(${CMAKE_CURRENT_SOURCE_DIR}/loader/icd_cmake_config.h.in ${CMAKE_CURRENT_BINARY_DIR}/icd_cmake_config.h) set (OPENCL_ICD_LOADER_SOURCES loader/icd.c loader/icd.h loader/icd_dispatch.c loader/icd_dispatch.h loader/icd_envvars.h loader/icd_platform.h) if (WIN32) list (APPEND OPENCL_ICD_LOADER_SOURCES loader/windows/icd_windows.c loader/windows/icd_windows.h loader/windows/icd_windows_dxgk.c loader/windows/icd_windows_dxgk.h loader/windows/icd_windows_envvars.c loader/windows/icd_windows_hkr.c loader/windows/icd_windows_hkr.h loader/windows/OpenCL.def loader/windows/OpenCL.rc) # Only add the DXSDK include directory if the environment variable is # defined. Since the DXSDK has merged into the Windows SDK, this is # only required in rare cases. if (DEFINED ENV{DXSDK_DIR} AND NOT (MINGW OR MSYS OR CYGWIN)) include_directories ($ENV{DXSDK_DIR}/Include) endif () else () list (APPEND OPENCL_ICD_LOADER_SOURCES loader/linux/icd_linux.c loader/linux/icd_linux_envvars.c loader/linux/icd_exports.map) endif () set (OPENCL_ICD_LOADER_HEADERS_DIR ${CMAKE_CURRENT_SOURCE_DIR}/inc CACHE PATH "Path to OpenCL Headers") add_library (OpenCL ${OPENCL_ICD_LOADER_SOURCES}) set_target_properties (OpenCL PROPERTIES VERSION "1.2" SOVERSION "1") if (WIN32) target_link_libraries (OpenCL cfgmgr32.lib) option (OPENCL_ICD_LOADER_REQUIRE_WDK "Build with D3DKMT support, which requires the Windows WDK." ON) if (OPENCL_ICD_LOADER_REQUIRE_WDK) if(DEFINED ENV{WDKContentRoot}) file(GLOB D3DKMT_HEADER "$ENV{WDKContentRoot}/Include/*/km/d3dkmthk.h") else() file(GLOB D3DKMT_HEADER "$ENV{HOMEDRIVE}/Program Files */Windows Kits/10/Include/*/km/d3dkmthk.h") endif() if(D3DKMT_HEADER) list(GET D3DKMT_HEADER -1 LATEST_D3DKMT_HEADER) get_filename_component(WDK_DIRECTORY ${LATEST_D3DKMT_HEADER} DIRECTORY) get_filename_component(WDK_DIRECTORY ${WDK_DIRECTORY} DIRECTORY) message(STATUS "Found the Windows WDK in: ${WDK_DIRECTORY}") target_compile_definitions(OpenCL PRIVATE OPENCL_ICD_LOADER_REQUIRE_WDK) target_include_directories(OpenCL PRIVATE ${WDK_DIRECTORY}/um) target_include_directories(OpenCL PRIVATE ${WDK_DIRECTORY}/km) target_include_directories(OpenCL PRIVATE ${WDK_DIRECTORY}/shared) else() message(FATAL_ERROR "The Windows WDK was not found. Consider disabling OPENCL_ICD_LOADER_REQUIRE_WDK. 
Aborting.") endif() endif() if(NOT USE_DYNAMIC_VCXX_RUNTIME) string(REPLACE "/MD" "/MT" CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE}") string(REPLACE "/MD" "/MT" CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE}") string(REPLACE "/MD" "/MT" CMAKE_C_FLAGS_MINSIZEREL "${CMAKE_C_FLAGS_MINSIZEREL}") string(REPLACE "/MD" "/MT" CMAKE_CXX_FLAGS_MINSIZEREL "${CMAKE_CXX_FLAGS_MINSIZEREL}") string(REPLACE "/MD" "/MT" CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELWITHDEBINFO}") string(REPLACE "/MD" "/MT" CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO}") string(REPLACE "/MDd" "/MTd" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}") string(REPLACE "/MDd" "/MTd" CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG}") endif() else() if (APPLE) target_link_libraries (OpenCL ${CMAKE_THREAD_LIBS_INIT}) else () set_target_properties (OpenCL PROPERTIES LINK_FLAGS "-Wl,--version-script -Wl,${CMAKE_CURRENT_SOURCE_DIR}/loader/linux/icd_exports.map") target_link_libraries (OpenCL ${CMAKE_THREAD_LIBS_INIT}) endif () endif () include_directories (${OPENCL_ICD_LOADER_HEADERS_DIR}) add_definitions (-DCL_TARGET_OPENCL_VERSION=220) target_include_directories (OpenCL PRIVATE ${CMAKE_CURRENT_BINARY_DIR} loader) target_link_libraries (OpenCL ${CMAKE_DL_LIBS}) include (CTest) if (BUILD_TESTING) add_subdirectory (test) endif() install (TARGETS OpenCL RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}) clr-rocm-5.7.1/opencl/khronos/icd/CODE_OF_CONDUCT.md000066400000000000000000000004301450307266000215430ustar00rootroot00000000000000A reminder that this issue tracker is managed by the Khronos Group. Interactions here should follow the Khronos Code of Conduct (https://www.khronos.org/developers/code-of-conduct), which prohibits aggressive or derogatory language. Please keep the discussion friendly and civil. clr-rocm-5.7.1/opencl/khronos/icd/LICENSE000066400000000000000000000261351450307266000177630ustar00rootroot00000000000000 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. 
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. 
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. clr-rocm-5.7.1/opencl/khronos/icd/LICENSE.txt000066400000000000000000000033341450307266000205750ustar00rootroot00000000000000Copyright (c) 2016 The Khronos Group Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software source and associated documentation files (the "Materials"), to deal in the Materials without restriction, including without limitation the rights to use, copy, modify, compile, merge, publish, distribute, sublicense, and/or sell copies of the Materials, and to permit persons to whom the Materials are furnished to do so, subject the following terms and conditions: All modifications to the Materials used to create a binary that is distributed to third parties shall be provided to Khronos with an unrestricted license to use for the purposes of implementing bug fixes and enhancements to the Materials; If the binary is used as part of an OpenCL(TM) implementation, whether binary is distributed together with or separately to that implementation, then recipient must become an OpenCL Adopter and follow the published OpenCL conformance process for that implementation, details at: http://www.khronos.org/conformance/; The above copyright notice, the OpenCL trademark license, and this permission notice shall be included in all copies or substantial portions of the Materials. THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. OpenCL is a trademark of Apple Inc. used under license by Khronos. clr-rocm-5.7.1/opencl/khronos/icd/README.md000066400000000000000000000144411450307266000202320ustar00rootroot00000000000000# OpenCL ICD Loader This repo contains the source code and tests for the Khronos official OpenCL ICD Loader. ## CI Build Status [![Windows Build Status](https://ci.appveyor.com/api/projects/status/47uhjgp5h4de2f63/branch/master?svg=true)](https://ci.appveyor.com/project/Khronoswebmaster/opencl-icd-loader/branch/master) [![Linux OSX Build Status](https://travis-ci.com/KhronosGroup/opencl-icd-loader.svg?branch=master)](https://travis-ci.com/KhronosGroup/opencl-icd-loader) ## Introduction OpenCL defines an *Installable Client Driver* (ICD) mechanism to allow developers to build applications against an *Installable Client Driver* loader (ICD loader) rather than linking their applications against a specific OpenCL implementation. The ICD Loader is responsible for: * Exporting OpenCL API entry points * Enumerating OpenCL implementations * Forwarding OpenCL API calls to the correct implementation This repo contains the source code and tests for the Khronos official OpenCL ICD Loader. Note that this repo does not contain an OpenCL implementation (ICD). You will need to obtain and install an OpenCL implementation for your OpenCL device that supports the OpenCL ICD extension `cl_khr_icd` to run an application using the OpenCL ICD Loader. The OpenCL *Installable Client Driver* extension (`cl_khr_icd`) is described in the OpenCL extensions specification, which may be found on the [Khronos OpenCL Registry](https://www.khronos.org/registry/OpenCL/). ## Build Instructions ### Dependencies The OpenCL ICD Loader requires OpenCL Headers. To use system OpenCL Headers, please specify the OpenCL Header location using the CMake variable `OPENCL_ICD_LOADER_HEADERS_DIR`. 
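For example, a minimal sketch of such an invocation, assuming the headers were unpacked to a hypothetical `/opt/OpenCL-Headers` directory:

    cmake -D OPENCL_ICD_LOADER_HEADERS_DIR=/opt/OpenCL-Headers ..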
By default, the OpenCL ICD Loader will look for OpenCL Headers in the `inc` directory. By default, the OpenCL ICD Loader on Windows requires the Windows Driver Kit (WDK). To build OpenCL ICD Loader with WDK support - * Install recent Windows WDK currently at https://docs.microsoft.com/en-us/windows-hardware/drivers/download-the-wdk * Establish environment variable WDK to include directory. Ex: set WDK=C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0 An OpenCL ICD Loader may be built without the Windows Driver Kit using the CMake variable `OPENCL_ICD_LOADER_REQUIRE_WDK`, however this option should be used with caution since it may prevent the OpenCL ICD Loader from enumerating some OpenCL implementations. This dependency may be removed in the future. The OpenCL ICD Loader uses CMake for its build system. If CMake is not provided by your build system or OS package manager, please consult the [CMake website](https://cmake.org). ### Build and Install Directories A common convention is to place the `build` directory in the top directory of the repository and to place the `install` directory as a child of the `build` directory. The remainder of these instructions follow this convention, although you may place these directories in any location. ### Example Usage For most Windows and Linux usages, the following steps are sufficient to build the OpenCL ICD Loader: 1. Clone this repo: git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader 1. Obtain the OpenCL Headers, if you are not planning to use system OpenCL headers. Headers may be obtained from the [Khronos OpenCL Headers](https://github.com/KhronosGroup/OpenCL-Headers) repository. 1. Create a `build` directory: cd OpenCL-ICD-Loader mkdir build cd build 1. Invoke `cmake` to generate solution files, Makefiles, or files for other build systems. cmake .. 1. Build using the CMake-generated files. Notes: * For 64-bit Windows builds, you may need to specify a 64-bit generator manually, for example: cmake.exe -G "Visual Studio 14 2015 Win64" .. * Some users may prefer to use a CMake GUI frontend, such as `cmake-gui` or `ccmake`, vs. the command-line CMake. ## OpenCL ICD Loader Tests OpenCL ICD Loader Tests can be run using `ctest`, which is a companion to CMake. The OpenCL ICD Loader Tests can also be run directly by executing icd_loader_test(.exe) executable from the bin folder. ### Test Setup The OpenCL ICD Loader Tests use a "stub" ICD, which must be set up manually. The OpenCL ICD Loader Tests will "fail" if the "stub" ICD is not set up correctly. The method to install the "stub" ICD is operating system dependent. On Linux, install the "stub" ICD by creating a file with the full path to the "stub" ICD in `/etc/OpenCL/vendors`: echo full/path/to/libOpenCLDriverStub.so > /etc/OpenCL/vendors/test.icd On Windows, add the "stub" ICD by adding a `REG_DWORD` value to the registry keys: // For 32-bit operating systems, or 64-bit tests on a 64-bit operating system: HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors // For 32-bit tests on a 64-bit operating system: HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Khronos\OpenCL\Vendors // The name of the REG_DWORD value should be the full path to the "stub" ICD // OpenCLDriverStub.dll, and the data for this value should be 0. ### Running Tests To run the tests, invoke `ctest` from the `build` directory. The CMake-generated build files may be able to invoke the OpenCL ICD Loader tests as well. ### Test Cleanup Manually remove the file or registry keys added during Test Setup. 
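### Verifying the Loader

As an optional sanity check after Test Setup, the standalone program below (a sketch, not part of this repo; the file name `check_icd.c` is illustrative) enumerates the platforms visible through the ICD loader. Build it against the loader, for example with `gcc check_icd.c -lOpenCL`:

    // check_icd.c - list the OpenCL platforms visible through the ICD loader
    #define CL_TARGET_OPENCL_VERSION 220
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_uint count = 0;
        // The first clGetPlatformIDs call triggers ICD enumeration.
        cl_int err = clGetPlatformIDs(0, NULL, &count);
        if (err != CL_SUCCESS || count == 0) {
            printf("No OpenCL platforms found (error %d).\n", err);
            return 1;
        }
        cl_platform_id platforms[16];
        if (count > 16) {
            count = 16;
        }
        clGetPlatformIDs(count, platforms, NULL);
        for (cl_uint i = 0; i < count; ++i) {
            char name[256];
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);
            printf("Platform %u: %s\n", i, name);
        }
        return 0;
    }

If the "stub" ICD is installed correctly, its platform should appear in the list.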
## Support Please create a GitHub issue to report an issue or ask questions. ## Contributing Contributions to the OpenCL ICD Loader are welcomed and encouraged. You will be prompted with a one-time "click-through" CLA dialog as part of submitting your pull request or other contribution to GitHub. ## Table of Debug Environment Variables The following debug environment variables are available for use with the OpenCL ICD loader: | Environment Variable | Behavior | Example Format | |:---------------------------------:|---------------------|----------------------| | OCL_ICD_FILENAMES | Specifies a list of additional ICDs to load. The ICDs will be enumerated first, before any ICDs discovered via default mechanisms. | `export OCL_ICD_FILENAMES=libVendorA.so:libVendorB.so`
<br>
`set OCL_ICD_FILENAMES=vendor_a.dll;vendor_b.dll` | | OCL_ICD_VENDORS | On Linux and Android, specifies a directory to scan for ICDs to enumerate in place of the default `/etc/OpenCL/vendors'. | `export OCL_ICD_VENDORS=/my/local/icd/search/path` | clr-rocm-5.7.1/opencl/khronos/icd/inc/000077500000000000000000000000001450307266000175205ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/icd/inc/README.txt000066400000000000000000000005061450307266000212170ustar00rootroot00000000000000Copy or symlink OpenCL headers here, inside a CL directory, so that the structure of the inc directory looks something like this: inc/CL/cl_d3d10.h inc/CL/cl_d3d11.h inc/CL/cl_dx9_media_sharing.h inc/CL/cl_egl.h inc/CL/cl_ext.h inc/CL/cl_gl_ext.h inc/CL/cl_gl.h inc/CL/cl.h inc/CL/cl.hpp inc/CL/cl_platform.h inc/CL/opencl.h clr-rocm-5.7.1/opencl/khronos/icd/loader/000077500000000000000000000000001450307266000202155ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/icd/loader/icd.c000066400000000000000000000174751450307266000211360ustar00rootroot00000000000000/* Modifications Copyright(C) 2022 Advanced Micro Devices, Inc. * All rights reserved. */ /* * Copyright (c) 2016-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. * * OpenCL is a trademark of Apple Inc. used under license by Khronos. 
*/ #include "icd.h" #include "icd_dispatch.h" #include "icd_envvars.h" #include #include KHRicdVendor *khrIcdVendors = NULL; // entrypoint to initialize the ICD and add all vendors void khrIcdInitialize(void) { // enumerate vendors present on the system khrIcdOsVendorsEnumerateOnce(); } void khrIcdVendorAdd(const char *libraryName) { void *library = NULL; cl_int result = CL_SUCCESS; pfn_clGetExtensionFunctionAddress p_clGetExtensionFunctionAddress = NULL; pfn_clIcdGetPlatformIDs p_clIcdGetPlatformIDs = NULL; cl_uint i = 0; cl_uint platformCount = 0; cl_platform_id *platforms = NULL; KHRicdVendor *vendorIterator = NULL; // require that the library name be valid if (!libraryName) { goto Done; } KHR_ICD_TRACE("attempting to add vendor %s...\n", libraryName); // load its library and query its function pointers library = khrIcdOsLibraryLoad(libraryName); if (!library) { KHR_ICD_TRACE("failed to load library %s\n", libraryName); goto Done; } // get the library's file name const char *libName = libraryName; const char *c; for (c = libraryName; *c; ++c) { if ((*c == '\\') || (*c == '/')) { libName = c + 1; } } // ensure that we haven't already loaded this vendor for (vendorIterator = khrIcdVendors; vendorIterator; vendorIterator = vendorIterator->next) { if (vendorIterator->library == library) { KHR_ICD_TRACE("already loaded vendor %s, nothing to do here\n", libraryName); goto Done; } if (!strcmp(vendorIterator->libName, libName)) { KHR_ICD_TRACE("already loaded library %s, nothing to do here\n", libName); goto Done; } } // get the library's clGetExtensionFunctionAddress pointer p_clGetExtensionFunctionAddress = (pfn_clGetExtensionFunctionAddress)(size_t)khrIcdOsLibraryGetFunctionAddress(library, "clGetExtensionFunctionAddress"); if (!p_clGetExtensionFunctionAddress) { KHR_ICD_TRACE("failed to get function address clGetExtensionFunctionAddress\n"); goto Done; } // use that function to get the clIcdGetPlatformIDsKHR function pointer p_clIcdGetPlatformIDs = (pfn_clIcdGetPlatformIDs)(size_t)p_clGetExtensionFunctionAddress("clIcdGetPlatformIDsKHR"); if (!p_clIcdGetPlatformIDs) { KHR_ICD_TRACE("failed to get extension function address clIcdGetPlatformIDsKHR\n"); goto Done; } // query the number of platforms available and allocate space to store them result = p_clIcdGetPlatformIDs(0, NULL, &platformCount); if (CL_SUCCESS != result) { KHR_ICD_TRACE("failed clIcdGetPlatformIDs\n"); goto Done; } platforms = (cl_platform_id *)malloc(platformCount * sizeof(cl_platform_id) ); if (!platforms) { KHR_ICD_TRACE("failed to allocate memory\n"); goto Done; } memset(platforms, 0, platformCount * sizeof(cl_platform_id) ); result = p_clIcdGetPlatformIDs(platformCount, platforms, NULL); if (CL_SUCCESS != result) { KHR_ICD_TRACE("failed clIcdGetPlatformIDs\n"); goto Done; } // for each platform, add it for (i = 0; i < platformCount; ++i) { KHRicdVendor* vendor = NULL; char *suffix; size_t suffixSize; // call clGetPlatformInfo on the returned platform to get the suffix if (!platforms[i]) { continue; } result = platforms[i]->dispatch->clGetPlatformInfo( platforms[i], CL_PLATFORM_ICD_SUFFIX_KHR, 0, NULL, &suffixSize); if (CL_SUCCESS != result) { continue; } suffix = (char *)malloc(suffixSize); if (!suffix) { continue; } result = platforms[i]->dispatch->clGetPlatformInfo( platforms[i], CL_PLATFORM_ICD_SUFFIX_KHR, suffixSize, suffix, NULL); if (CL_SUCCESS != result) { free(suffix); continue; } // allocate a structure for the vendor vendor = (KHRicdVendor*)malloc(sizeof(*vendor) ); if (!vendor) { free(suffix); 
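        /* vendor record allocation failed: release the suffix string
           copied above and move on to the next enumerated platform */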
KHR_ICD_TRACE("failed to allocate memory\n"); continue; } memset(vendor, 0, sizeof(*vendor) ); // populate vendor data vendor->library = khrIcdOsLibraryLoad(libraryName); if (!vendor->library) { free(suffix); free(vendor); KHR_ICD_TRACE("failed get platform handle to library\n"); continue; } vendor->libName = (char *)malloc(strlen(libName) + 1); strcpy(vendor->libName, libName); vendor->clGetExtensionFunctionAddress = p_clGetExtensionFunctionAddress; vendor->platform = platforms[i]; vendor->suffix = suffix; // add this vendor to the list of vendors at the tail { KHRicdVendor **prevNextPointer = NULL; for (prevNextPointer = &khrIcdVendors; *prevNextPointer; prevNextPointer = &( (*prevNextPointer)->next) ); *prevNextPointer = vendor; } KHR_ICD_TRACE("successfully added vendor %s with suffix %s\n", libraryName, suffix); } Done: if (library) { khrIcdOsLibraryUnload(library); } if (platforms) { free(platforms); } } // Get next file or dirname given a string list or registry key path. // Note: the input string may be modified! static char *loader_get_next_path(char *path) { size_t len; char *next; if (path == NULL) return NULL; next = strchr(path, PATH_SEPARATOR); if (next == NULL) { len = strlen(path); next = path + len; } else { *next = '\0'; next++; } return next; } void khrIcdVendorsEnumerateEnv(void) { char* icdFilenames = khrIcd_secure_getenv("OCL_ICD_FILENAMES"); char* cur_file = NULL; char* next_file = NULL; if (icdFilenames) { KHR_ICD_TRACE("Found OCL_ICD_FILENAMES environment variable.\n"); next_file = icdFilenames; while (NULL != next_file && *next_file != '\0') { cur_file = next_file; next_file = loader_get_next_path(cur_file); khrIcdVendorAdd(cur_file); } khrIcd_free_getenv(icdFilenames); } } void khrIcdContextPropertiesGetPlatform(const cl_context_properties *properties, cl_platform_id *outPlatform) { if (properties == NULL && khrIcdVendors != NULL) { *outPlatform = khrIcdVendors[0].platform; } else { const cl_context_properties *property = (cl_context_properties *)NULL; *outPlatform = NULL; for (property = properties; property && property[0]; property += 2) { if ((cl_context_properties)CL_CONTEXT_PLATFORM == property[0]) { *outPlatform = (cl_platform_id)property[1]; } } } } clr-rocm-5.7.1/opencl/khronos/icd/loader/icd.h000066400000000000000000000122111450307266000211220ustar00rootroot00000000000000/* * Copyright (c) 2016-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. * * OpenCL is a trademark of Apple Inc. used under license by Khronos. 
*/ #ifndef _ICD_H_ #define _ICD_H_ #include "icd_platform.h" #ifndef CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_USE_DEPRECATED_OPENCL_1_0_APIS #endif #ifndef CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_USE_DEPRECATED_OPENCL_1_1_APIS #endif #ifndef CL_USE_DEPRECATED_OPENCL_1_2_APIS #define CL_USE_DEPRECATED_OPENCL_1_2_APIS #endif #include #include /* * type definitions */ typedef CL_API_ENTRY cl_int (CL_API_CALL *pfn_clIcdGetPlatformIDs)( cl_uint num_entries, cl_platform_id *platforms, cl_uint *num_platforms) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *pfn_clGetPlatformInfo)( cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY void *(CL_API_CALL *pfn_clGetExtensionFunctionAddress)( const char *function_name) CL_API_SUFFIX__VERSION_1_0; typedef struct KHRicdVendorRec KHRicdVendor; /* * KHRicdVendor * * Data for a single ICD vendor platform. */ struct KHRicdVendorRec { // the loaded library object (true type varies on Linux versus Windows) void *library; // the file name of the library char *libName; // the extension suffix for this platform char *suffix; // function pointer to the ICD platform IDs extracted from the library pfn_clGetExtensionFunctionAddress clGetExtensionFunctionAddress; // the platform retrieved from clGetIcdPlatformIDsKHR cl_platform_id platform; // next vendor in the list vendors KHRicdVendor *next; }; // the global state extern KHRicdVendor * khrIcdVendors; /* * khrIcd interface */ // read vendors from system configuration and store the data // loaded into khrIcdState. this will call the OS-specific // function khrIcdEnumerateVendors. this is called at every // dispatch function which may be a valid first call into the // API (e.g, getPlatformIDs, etc). void khrIcdInitialize(void); // go through the list of vendors (in /etc/OpenCL.conf or through // the registry) and call khrIcdVendorAdd for each vendor encountered // n.b, this call is OS-specific void khrIcdOsVendorsEnumerateOnce(void); // read vendors from environment variables void khrIcdVendorsEnumerateEnv(void); // add a vendor's implementation to the list of libraries void khrIcdVendorAdd(const char *libraryName); // dynamically load a library. returns NULL on failure // n.b, this call is OS-specific void *khrIcdOsLibraryLoad(const char *libraryName); // get a function pointer from a loaded library. returns NULL on failure. // n.b, this call is OS-specific void *khrIcdOsLibraryGetFunctionAddress(void *library, const char *functionName); // unload a library. // n.b, this call is OS-specific void khrIcdOsLibraryUnload(void *library); // parse properties and determine the platform to use from them void khrIcdContextPropertiesGetPlatform( const cl_context_properties *properties, cl_platform_id *outPlatform); // internal tracing macros #if 0 #include #define KHR_ICD_TRACE(...) \ do \ { \ fprintf(stderr, "KHR ICD trace at %s:%d: ", __FILE__, __LINE__); \ fprintf(stderr, __VA_ARGS__); \ } while (0) #ifdef _WIN32 #define KHR_ICD_WIDE_TRACE(...) \ do \ { \ fwprintf(stderr, L"KHR ICD trace at %hs:%d: ", __FILE__, __LINE__); \ fwprintf(stderr, __VA_ARGS__); \ } while (0) #else #define KHR_ICD_WIDE_TRACE(...) #endif #define KHR_ICD_ASSERT(x) \ do \ { \ if (!(x)) \ { \ fprintf(stderr, "KHR ICD assert at %s:%d: %s failed", __FILE__, __LINE__, #x); \ } \ } while (0) #else #define KHR_ICD_TRACE(...) #define KHR_ICD_WIDE_TRACE(...) 
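/* tracing disabled (the #if 0 branch above): these macros expand to nothing */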
#define KHR_ICD_ASSERT(x) #endif // if handle is NULL then return invalid_handle_error_code #define KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(handle,invalid_handle_error_code) \ do \ { \ if (!handle) \ { \ return invalid_handle_error_code; \ } \ } while (0) // if handle is NULL then set errcode_ret to invalid_handle_error and return NULL // (NULL being an invalid handle) #define KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(handle,invalid_handle_error) \ do \ { \ if (!handle) \ { \ if (errcode_ret) \ { \ *errcode_ret = invalid_handle_error; \ } \ return NULL; \ } \ } while (0) #endif clr-rocm-5.7.1/opencl/khronos/icd/loader/icd_cmake_config.h.in000066400000000000000000000001021450307266000242100ustar00rootroot00000000000000#cmakedefine HAVE_SECURE_GETENV #cmakedefine HAVE___SECURE_GETENV clr-rocm-5.7.1/opencl/khronos/icd/loader/icd_dispatch.c000066400000000000000000002536441450307266000230150ustar00rootroot00000000000000/* Modifications Copyright(C) 2022 Advanced Micro Devices, Inc. * All rights reserved. */ /* * Copyright (c) 2012-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. * * OpenCL is a trademark of Apple Inc. used under license by Khronos. */ #include "icd_dispatch.h" #include "icd.h" #include #include // Platform APIs CL_API_ENTRY cl_int CL_API_CALL clGetPlatformIDs(cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms) CL_API_SUFFIX__VERSION_1_0 { KHRicdVendor* vendor = NULL; cl_uint i; // initialize the platforms (in case they have not been already) khrIcdInitialize(); if (!num_entries && platforms) { return CL_INVALID_VALUE; } if (!platforms && !num_platforms) { return CL_INVALID_VALUE; } // set num_platforms to 0 and set all platform pointers to NULL if (num_platforms) { *num_platforms = 0; } for (i = 0; i < num_entries && platforms; ++i) { platforms[i] = NULL; } // return error if we have no platforms if (!khrIcdVendors) { return CL_PLATFORM_NOT_FOUND_KHR; } // otherwise enumerate all platforms for (vendor = khrIcdVendors; vendor; vendor = vendor->next) { if (num_entries && platforms) { *(platforms++) = vendor->platform; --num_entries; } if (num_platforms) { ++(*num_platforms); } } return CL_SUCCESS; } CL_API_ENTRY cl_int CL_API_CALL clGetPlatformInfo(cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { // initialize the platforms (in case they have not been already) khrIcdInitialize(); KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(platform, CL_INVALID_PLATFORM); return platform->dispatch->clGetPlatformInfo( platform, param_name, param_value_size, param_value, param_value_size_ret); } // Device APIs CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDs(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_0 { // initialize the platforms (in case they have not been already) khrIcdInitialize(); KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(platform, CL_INVALID_PLATFORM); return 
platform->dispatch->clGetDeviceIDs( platform, device_type, num_entries, devices, num_devices); } CL_API_ENTRY cl_int CL_API_CALL clGetDeviceInfo( cl_device_id device, cl_device_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(device, CL_INVALID_DEVICE); return device->dispatch->clGetDeviceInfo( device, param_name, param_value_size, param_value, param_value_size_ret); } CL_API_ENTRY cl_int CL_API_CALL clCreateSubDevices(cl_device_id in_device, const cl_device_partition_property * properties, cl_uint num_entries, cl_device_id * out_devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_2 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(in_device, CL_INVALID_DEVICE); return in_device->dispatch->clCreateSubDevices( in_device, properties, num_entries, out_devices, num_devices); } CL_API_ENTRY cl_int CL_API_CALL clRetainDevice(cl_device_id device) CL_API_SUFFIX__VERSION_1_2 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(device, CL_INVALID_DEVICE); return device->dispatch->clRetainDevice(device); } CL_API_ENTRY cl_int CL_API_CALL clReleaseDevice(cl_device_id device) CL_API_SUFFIX__VERSION_1_2 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(device, CL_INVALID_DEVICE); return device->dispatch->clReleaseDevice(device); } // Context APIs CL_API_ENTRY cl_context CL_API_CALL clCreateContext(const cl_context_properties * properties, cl_uint num_devices, const cl_device_id * devices, void (CL_CALLBACK *pfn_notify)(const char *, const void *, size_t, void *), void * user_data, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { // initialize the platforms (in case they have not been already) khrIcdInitialize(); if (!num_devices || !devices) { if (errcode_ret) { *errcode_ret = CL_INVALID_VALUE; } return NULL; } KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(devices[0], CL_INVALID_DEVICE); return devices[0]->dispatch->clCreateContext( properties, num_devices, devices, pfn_notify, user_data, errcode_ret); } CL_API_ENTRY cl_context CL_API_CALL clCreateContextFromType(const cl_context_properties * properties, cl_device_type device_type, void (CL_CALLBACK *pfn_notify)(const char *, const void *, size_t, void *), void * user_data, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { cl_platform_id platform = NULL; // initialize the platforms (in case they have not been already) khrIcdInitialize(); // determine the platform to use from the properties specified khrIcdContextPropertiesGetPlatform(properties, &platform); // validate the platform handle and dispatch KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(platform, CL_INVALID_PLATFORM); return platform->dispatch->clCreateContextFromType( properties, device_type, pfn_notify, user_data, errcode_ret); } CL_API_ENTRY cl_int CL_API_CALL clRetainContext(cl_context context) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(context, CL_INVALID_CONTEXT); return context->dispatch->clRetainContext(context); } CL_API_ENTRY cl_int CL_API_CALL clReleaseContext(cl_context context) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(context, CL_INVALID_CONTEXT); return context->dispatch->clReleaseContext(context); } CL_API_ENTRY cl_int CL_API_CALL clGetContextInfo(cl_context context, cl_context_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(context, CL_INVALID_CONTEXT); return context->dispatch->clGetContextInfo( context, param_name, param_value_size, param_value, 
param_value_size_ret); } // Command Queue APIs CL_API_ENTRY cl_command_queue CL_API_CALL clCreateCommandQueue(cl_context context, cl_device_id device, cl_command_queue_properties properties, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT); return context->dispatch->clCreateCommandQueue( context, device, properties, errcode_ret); } CL_API_ENTRY cl_int CL_API_CALL clRetainCommandQueue(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE); return command_queue->dispatch->clRetainCommandQueue(command_queue); } CL_API_ENTRY cl_int CL_API_CALL clReleaseCommandQueue(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE); return command_queue->dispatch->clReleaseCommandQueue(command_queue); } CL_API_ENTRY cl_int CL_API_CALL clGetCommandQueueInfo(cl_command_queue command_queue, cl_command_queue_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE); return command_queue->dispatch->clGetCommandQueueInfo( command_queue, param_name, param_value_size, param_value, param_value_size_ret); } // Memory Object APIs CL_API_ENTRY cl_mem CL_API_CALL clCreateBuffer(cl_context context, cl_mem_flags flags, size_t size, void * host_ptr, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT); return context->dispatch->clCreateBuffer( context, flags, size, host_ptr, errcode_ret); } CL_API_ENTRY cl_mem CL_API_CALL clCreateImage(cl_context context, cl_mem_flags flags, const cl_image_format * image_format, const cl_image_desc * image_desc, void * host_ptr, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2 { KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT); return context->dispatch->clCreateImage( context, flags, image_format, image_desc, host_ptr, errcode_ret); } CL_API_ENTRY cl_int CL_API_CALL clRetainMemObject(cl_mem memobj) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(memobj, CL_INVALID_MEM_OBJECT); return memobj->dispatch->clRetainMemObject(memobj); } CL_API_ENTRY cl_int CL_API_CALL clReleaseMemObject(cl_mem memobj) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(memobj, CL_INVALID_MEM_OBJECT); return memobj->dispatch->clReleaseMemObject(memobj); } CL_API_ENTRY cl_int CL_API_CALL clGetSupportedImageFormats(cl_context context, cl_mem_flags flags, cl_mem_object_type image_type, cl_uint num_entries, cl_image_format * image_formats, cl_uint * num_image_formats) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(context, CL_INVALID_CONTEXT); return context->dispatch->clGetSupportedImageFormats( context, flags, image_type, num_entries, image_formats, num_image_formats); } CL_API_ENTRY cl_int CL_API_CALL clGetMemObjectInfo(cl_mem memobj, cl_mem_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(memobj, CL_INVALID_MEM_OBJECT); return memobj->dispatch->clGetMemObjectInfo( memobj, param_name, param_value_size, param_value, param_value_size_ret); } CL_API_ENTRY cl_int CL_API_CALL clGetImageInfo(cl_mem image, cl_image_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) 
CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(image, CL_INVALID_MEM_OBJECT); return image->dispatch->clGetImageInfo( image, param_name, param_value_size, param_value, param_value_size_ret); } // Sampler APIs CL_API_ENTRY cl_sampler CL_API_CALL clCreateSampler(cl_context context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT); return context->dispatch->clCreateSampler( context, normalized_coords, addressing_mode, filter_mode, errcode_ret); } CL_API_ENTRY cl_int CL_API_CALL clRetainSampler(cl_sampler sampler) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(sampler, CL_INVALID_SAMPLER); return sampler->dispatch->clRetainSampler(sampler); } CL_API_ENTRY cl_int CL_API_CALL clReleaseSampler(cl_sampler sampler) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(sampler, CL_INVALID_SAMPLER); return sampler->dispatch->clReleaseSampler(sampler); } CL_API_ENTRY cl_int CL_API_CALL clGetSamplerInfo(cl_sampler sampler, cl_sampler_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(sampler, CL_INVALID_SAMPLER); return sampler->dispatch->clGetSamplerInfo( sampler, param_name, param_value_size, param_value, param_value_size_ret); } // Program Object APIs CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithSource(cl_context context, cl_uint count, const char ** strings, const size_t * lengths, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT); return context->dispatch->clCreateProgramWithSource( context, count, strings, lengths, errcode_ret); } CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBinary(cl_context context, cl_uint num_devices, const cl_device_id * device_list, const size_t * lengths, const unsigned char ** binaries, cl_int * binary_status, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT); return context->dispatch->clCreateProgramWithBinary( context, num_devices, device_list, lengths, binaries, binary_status, errcode_ret); } CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBuiltInKernels(cl_context context, cl_uint num_devices, const cl_device_id * device_list, const char * kernel_names, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2 { KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT); return context->dispatch->clCreateProgramWithBuiltInKernels( context, num_devices, device_list, kernel_names, errcode_ret); } CL_API_ENTRY cl_int CL_API_CALL clRetainProgram(cl_program program) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(program, CL_INVALID_PROGRAM); return program->dispatch->clRetainProgram(program); } CL_API_ENTRY cl_int CL_API_CALL clReleaseProgram(cl_program program) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(program, CL_INVALID_PROGRAM); return program->dispatch->clReleaseProgram(program); } CL_API_ENTRY cl_int CL_API_CALL clBuildProgram(cl_program program, cl_uint num_devices, const cl_device_id * device_list, const char * options, void (CL_CALLBACK *pfn_notify)(cl_program program, void * user_data), void * user_data) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(program, CL_INVALID_PROGRAM); return program->dispatch->clBuildProgram( program, 
num_devices, device_list, options, pfn_notify, user_data); } CL_API_ENTRY cl_int CL_API_CALL clCompileProgram(cl_program program, cl_uint num_devices, const cl_device_id * device_list, const char * options, cl_uint num_input_headers, const cl_program * input_headers, const char ** header_include_names, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data) CL_API_SUFFIX__VERSION_1_2 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(program, CL_INVALID_PROGRAM); return program->dispatch->clCompileProgram( program, num_devices, device_list, options, num_input_headers, input_headers, header_include_names, pfn_notify, user_data); } CL_API_ENTRY cl_program CL_API_CALL clLinkProgram(cl_context context, cl_uint num_devices, const cl_device_id * device_list, const char * options, cl_uint num_input_programs, const cl_program * input_programs, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2 { KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT); return context->dispatch->clLinkProgram( context, num_devices, device_list, options, num_input_programs, input_programs, pfn_notify, user_data, errcode_ret); } CL_API_ENTRY cl_int CL_API_CALL clSetProgramSpecializationConstant(cl_program program, cl_uint spec_id, size_t spec_size, const void* spec_value) CL_API_SUFFIX__VERSION_2_2 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(program, CL_INVALID_PROGRAM); return program->dispatch->clSetProgramSpecializationConstant( program, spec_id, spec_size, spec_value); } CL_API_ENTRY cl_int CL_API_CALL clSetProgramReleaseCallback(cl_program program, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data) CL_API_SUFFIX__VERSION_2_2 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(program, CL_INVALID_PROGRAM); return program->dispatch->clSetProgramReleaseCallback( program, pfn_notify, user_data); } CL_API_ENTRY cl_int CL_API_CALL clUnloadPlatformCompiler(cl_platform_id platform) CL_API_SUFFIX__VERSION_1_2 { // initialize the platforms (in case they have not been already) khrIcdInitialize(); KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(platform, CL_INVALID_PLATFORM); return platform->dispatch->clUnloadPlatformCompiler(platform); } CL_API_ENTRY cl_int CL_API_CALL clGetProgramInfo(cl_program program, cl_program_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(program, CL_INVALID_PROGRAM); return program->dispatch->clGetProgramInfo( program, param_name, param_value_size, param_value, param_value_size_ret); } CL_API_ENTRY cl_int CL_API_CALL clGetProgramBuildInfo(cl_program program, cl_device_id device, cl_program_build_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(program, CL_INVALID_PROGRAM); return program->dispatch->clGetProgramBuildInfo( program, device, param_name, param_value_size, param_value, param_value_size_ret); } // Kernel Object APIs CL_API_ENTRY cl_kernel CL_API_CALL clCreateKernel(cl_program program, const char * kernel_name, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(program, CL_INVALID_PROGRAM); return program->dispatch->clCreateKernel( program, kernel_name, errcode_ret); } CL_API_ENTRY cl_int CL_API_CALL clCreateKernelsInProgram(cl_program program, cl_uint num_kernels, cl_kernel * kernels, cl_uint * 
num_kernels_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(program, CL_INVALID_PROGRAM); return program->dispatch->clCreateKernelsInProgram( program, num_kernels, kernels, num_kernels_ret); } CL_API_ENTRY cl_int CL_API_CALL clRetainKernel(cl_kernel kernel) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(kernel, CL_INVALID_KERNEL); return kernel->dispatch->clRetainKernel(kernel); } CL_API_ENTRY cl_int CL_API_CALL clReleaseKernel(cl_kernel kernel) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(kernel, CL_INVALID_KERNEL); return kernel->dispatch->clReleaseKernel(kernel); } CL_API_ENTRY cl_int CL_API_CALL clSetKernelArg(cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void * arg_value) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(kernel, CL_INVALID_KERNEL); return kernel->dispatch->clSetKernelArg( kernel, arg_index, arg_size, arg_value); } CL_API_ENTRY cl_int CL_API_CALL clGetKernelInfo(cl_kernel kernel, cl_kernel_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(kernel, CL_INVALID_KERNEL); return kernel->dispatch->clGetKernelInfo( kernel, param_name, param_value_size, param_value, param_value_size_ret); } CL_API_ENTRY cl_int CL_API_CALL clGetKernelArgInfo(cl_kernel kernel, cl_uint arg_indx, cl_kernel_arg_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_2 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(kernel, CL_INVALID_KERNEL); return kernel->dispatch->clGetKernelArgInfo( kernel, arg_indx, param_name, param_value_size, param_value, param_value_size_ret); } CL_API_ENTRY cl_int CL_API_CALL clGetKernelWorkGroupInfo(cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(kernel, CL_INVALID_KERNEL); return kernel->dispatch->clGetKernelWorkGroupInfo( kernel, device, param_name, param_value_size, param_value, param_value_size_ret); } // Event Object APIs CL_API_ENTRY cl_int CL_API_CALL clWaitForEvents(cl_uint num_events, const cl_event * event_list) CL_API_SUFFIX__VERSION_1_0 { if (!num_events || !event_list) { return CL_INVALID_VALUE; } KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(event_list[0], CL_INVALID_EVENT); return event_list[0]->dispatch->clWaitForEvents( num_events, event_list); } CL_API_ENTRY cl_int CL_API_CALL clGetEventInfo(cl_event event, cl_event_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(event, CL_INVALID_EVENT); return event->dispatch->clGetEventInfo( event, param_name, param_value_size, param_value, param_value_size_ret); } CL_API_ENTRY cl_int CL_API_CALL clRetainEvent(cl_event event) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(event, CL_INVALID_EVENT); return event->dispatch->clRetainEvent(event); } CL_API_ENTRY cl_int CL_API_CALL clReleaseEvent(cl_event event) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(event, CL_INVALID_EVENT); return event->dispatch->clReleaseEvent(event); } // Profiling APIs CL_API_ENTRY cl_int CL_API_CALL clGetEventProfilingInfo(cl_event event, cl_profiling_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { 
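    /* Per cl_khr_icd, every OpenCL object begins with a pointer to its vendor's
       dispatch table; validate the handle, then forward through that table. */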
KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(event, CL_INVALID_EVENT); return event->dispatch->clGetEventProfilingInfo( event, param_name, param_value_size, param_value, param_value_size_ret); } // Flush and Finish APIs CL_API_ENTRY cl_int CL_API_CALL clFlush(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE); return command_queue->dispatch->clFlush(command_queue); } CL_API_ENTRY cl_int CL_API_CALL clFinish(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE); return command_queue->dispatch->clFinish(command_queue); } // Enqueued Commands APIs CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t cb, void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE); return command_queue->dispatch->clEnqueueReadBuffer( command_queue, buffer, blocking_read, offset, cb, ptr, num_events_in_wait_list, event_wait_list, event); } CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBufferRect( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, const size_t * buffer_origin, const size_t * host_origin, const size_t * region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_1 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE); return command_queue->dispatch->clEnqueueReadBufferRect( command_queue, buffer, blocking_read, buffer_origin, host_origin, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event); } CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t cb, const void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE); return command_queue->dispatch->clEnqueueWriteBuffer( command_queue, buffer, blocking_write, offset, cb, ptr, num_events_in_wait_list, event_wait_list, event); } CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBufferRect( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, const size_t * buffer_origin, const size_t * host_origin, const size_t * region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, const void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_1 { KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE); return command_queue->dispatch->clEnqueueWriteBufferRect( command_queue, buffer, blocking_read, buffer_origin, host_origin, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event); } CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillBuffer(cl_command_queue command_queue, cl_mem buffer, const void * pattern, size_t pattern_size, size_t offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * 
CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillBuffer(
    cl_command_queue command_queue, cl_mem buffer, const void *pattern,
    size_t pattern_size, size_t offset, size_t cb,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_1_2
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueFillBuffer(
        command_queue, buffer, pattern, pattern_size, offset, cb,
        num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBuffer(
    cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer,
    size_t src_offset, size_t dst_offset, size_t cb,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueCopyBuffer(
        command_queue, src_buffer, dst_buffer, src_offset, dst_offset, cb,
        num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferRect(
    cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer,
    const size_t *src_origin, const size_t *dst_origin, const size_t *region,
    size_t src_row_pitch, size_t src_slice_pitch, size_t dst_row_pitch,
    size_t dst_slice_pitch, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueCopyBufferRect(
        command_queue, src_buffer, dst_buffer, src_origin, dst_origin, region,
        src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch,
        num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadImage(
    cl_command_queue command_queue, cl_mem image, cl_bool blocking_read,
    const size_t *origin, const size_t *region, size_t row_pitch,
    size_t slice_pitch, void *ptr, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueReadImage(
        command_queue, image, blocking_read, origin, region, row_pitch,
        slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteImage(
    cl_command_queue command_queue, cl_mem image, cl_bool blocking_write,
    const size_t *origin, const size_t *region, size_t input_row_pitch,
    size_t input_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueWriteImage(
        command_queue, image, blocking_write, origin, region, input_row_pitch,
        input_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillImage(
    cl_command_queue command_queue, cl_mem image, const void *fill_color,
    const size_t origin[3], const size_t region[3],
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_1_2
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueFillImage(
        command_queue, image, fill_color, origin, region,
        num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImage(
    cl_command_queue command_queue, cl_mem src_image, cl_mem dst_image,
    const size_t *src_origin, const size_t *dst_origin, const size_t *region,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueCopyImage(
        command_queue, src_image, dst_image, src_origin, dst_origin, region,
        num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImageToBuffer(
    cl_command_queue command_queue, cl_mem src_image, cl_mem dst_buffer,
    const size_t *src_origin, const size_t *region, size_t dst_offset,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueCopyImageToBuffer(
        command_queue, src_image, dst_buffer, src_origin, region, dst_offset,
        num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferToImage(
    cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_image,
    size_t src_offset, const size_t *dst_origin, const size_t *region,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueCopyBufferToImage(
        command_queue, src_buffer, dst_image, src_offset, dst_origin, region,
        num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY void * CL_API_CALL clEnqueueMapBuffer(
    cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_map,
    cl_map_flags map_flags, size_t offset, size_t cb,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueMapBuffer(
        command_queue, buffer, blocking_map, map_flags, offset, cb,
        num_events_in_wait_list, event_wait_list, event, errcode_ret);
}

CL_API_ENTRY void * CL_API_CALL clEnqueueMapImage(
    cl_command_queue command_queue, cl_mem image, cl_bool blocking_map,
    cl_map_flags map_flags, const size_t *origin, const size_t *region,
    size_t *image_row_pitch, size_t *image_slice_pitch,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueMapImage(
        command_queue, image, blocking_map, map_flags, origin, region,
        image_row_pitch, image_slice_pitch, num_events_in_wait_list,
        event_wait_list, event, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueUnmapMemObject(
    cl_command_queue command_queue, cl_mem memobj, void *mapped_ptr,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueUnmapMemObject(
        command_queue, memobj, mapped_ptr, num_events_in_wait_list,
        event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueMigrateMemObjects(
    cl_command_queue command_queue, cl_uint num_mem_objects,
    const cl_mem *mem_objects, cl_mem_migration_flags flags,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_1_2
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueMigrateMemObjects(
        command_queue, num_mem_objects, mem_objects, flags,
        num_events_in_wait_list, event_wait_list, event);
}
CL_API_ENTRY cl_int CL_API_CALL clEnqueueNDRangeKernel(
    cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim,
    const size_t *global_work_offset, const size_t *global_work_size,
    const size_t *local_work_size, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueNDRangeKernel(
        command_queue, kernel, work_dim, global_work_offset, global_work_size,
        local_work_size, num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueTask(
    cl_command_queue command_queue, cl_kernel kernel,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueTask(
        command_queue, kernel, num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueNativeKernel(
    cl_command_queue command_queue, void (CL_CALLBACK *user_func)(void *),
    void *args, size_t cb_args, cl_uint num_mem_objects, const cl_mem *mem_list,
    const void **args_mem_loc, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueNativeKernel(
        command_queue, user_func, args, cb_args, num_mem_objects, mem_list,
        args_mem_loc, num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueMarkerWithWaitList(
    cl_command_queue command_queue, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_2
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueMarkerWithWaitList(
        command_queue, num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueBarrierWithWaitList(
    cl_command_queue command_queue, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_1_2
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueBarrierWithWaitList(
        command_queue, num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY void * CL_API_CALL clGetExtensionFunctionAddressForPlatform(
    cl_platform_id platform, const char *function_name) CL_API_SUFFIX__VERSION_1_2
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(function_name, NULL);

    // make sure the ICD is initialized
    khrIcdInitialize();

    // return any ICD-aware extensions

    // Most extensions, including multi-vendor KHR and EXT extensions,
    // do not need to be ICD-aware and do not require any ICD loader
    // modifications. The KHR and EXT extensions below were added for
    // backwards compatibility only.
#define CL_COMMON_EXTENSION_ENTRYPOINT_ADD(name) if (!strcmp(function_name, #name)) return (void *)(size_t)&name

    // Functions supporting the creation of OpenCL Memory Objects
    // from OpenGL Objects (cl_apple_gl_sharing, cl_khr_gl_sharing)
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromGLBuffer);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromGLTexture);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromGLTexture2D);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromGLTexture3D);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromGLRenderbuffer);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetGLObjectInfo);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetGLTextureInfo);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueAcquireGLObjects);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueReleaseGLObjects);

    // cl_khr_gl_sharing
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetGLContextInfoKHR);

    // cl_khr_gl_event
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateEventFromGLsyncKHR);

#if defined(_WIN32)
    // cl_khr_d3d10_sharing
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetDeviceIDsFromD3D10KHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D10BufferKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D10Texture2DKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D10Texture3DKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueAcquireD3D10ObjectsKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueReleaseD3D10ObjectsKHR);
    // cl_khr_d3d11_sharing
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetDeviceIDsFromD3D11KHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D11BufferKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D11Texture2DKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D11Texture3DKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueAcquireD3D11ObjectsKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueReleaseD3D11ObjectsKHR);
    // cl_khr_dx9_media_sharing
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetDeviceIDsFromDX9MediaAdapterKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromDX9MediaSurfaceKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueAcquireDX9MediaSurfacesKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueReleaseDX9MediaSurfacesKHR);
#endif

    // cl_ext_device_fission
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateSubDevicesEXT);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clRetainDeviceEXT);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clReleaseDeviceEXT);

    /* cl_khr_egl_image */
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromEGLImageKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueAcquireEGLObjectsKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueReleaseEGLObjectsKHR);

    /* cl_khr_egl_event */
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateEventFromEGLSyncKHR);

    /* cl_khr_sub_groups */
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetKernelSubGroupInfoKHR);

#undef CL_COMMON_EXTENSION_ENTRYPOINT_ADD

    // This is not an ICD-aware extension, so call into the implementation
    // to get the extension function address.
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(platform, NULL);
    return platform->dispatch->clGetExtensionFunctionAddressForPlatform(
        platform, function_name);
}
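/*
 * A hedged usage sketch for the entry point above: applications query
 * extension functions per platform and cache the result. The
 * clIcdGetPlatformIDsKHR entry point comes from the cl_khr_icd
 * extension; the wrapper function below is invented for illustration.
 */
#if 0
#include <CL/cl.h>

typedef cl_int (CL_API_CALL *clIcdGetPlatformIDsKHR_fn)(
    cl_uint num_entries, cl_platform_id *platforms, cl_uint *num_platforms);

static clIcdGetPlatformIDsKHR_fn get_icd_platforms_fn(cl_platform_id platform)
{
    /* Resolves through the ICD-aware entry point list above, or falls
     * through to the vendor's own implementation. */
    return (clIcdGetPlatformIDsKHR_fn)clGetExtensionFunctionAddressForPlatform(
        platform, "clIcdGetPlatformIDsKHR");
}
#endif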
// Deprecated APIs

CL_API_ENTRY cl_int CL_API_CALL clSetCommandQueueProperty(
    cl_command_queue command_queue, cl_command_queue_properties properties,
    cl_bool enable,
    cl_command_queue_properties *old_properties) CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clSetCommandQueueProperty(
        command_queue, properties, enable, old_properties);
}

CL_API_ENTRY cl_int CL_API_CALL clCreateSubDevicesEXT(
    cl_device_id in_device,
    const cl_device_partition_property_ext *partition_properties,
    cl_uint num_entries, cl_device_id *out_devices,
    cl_uint *num_devices) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(in_device, CL_INVALID_DEVICE);
    return in_device->dispatch->clCreateSubDevicesEXT(
        in_device, partition_properties, num_entries, out_devices, num_devices);
}

CL_API_ENTRY cl_int CL_API_CALL clRetainDeviceEXT(cl_device_id device) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(device, CL_INVALID_DEVICE);
    return device->dispatch->clRetainDeviceEXT(device);
}

CL_API_ENTRY cl_int CL_API_CALL clReleaseDeviceEXT(cl_device_id device) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(device, CL_INVALID_DEVICE);
    return device->dispatch->clReleaseDeviceEXT(device);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateImage2D(
    cl_context context, cl_mem_flags flags, const cl_image_format *image_format,
    size_t image_width, size_t image_height, size_t image_row_pitch,
    void *host_ptr, cl_int *errcode_ret) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateImage2D(
        context, flags, image_format, image_width, image_height,
        image_row_pitch, host_ptr, errcode_ret);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateImage3D(
    cl_context context, cl_mem_flags flags, const cl_image_format *image_format,
    size_t image_width, size_t image_height, size_t image_depth,
    size_t image_row_pitch, size_t image_slice_pitch, void *host_ptr,
    cl_int *errcode_ret) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateImage3D(
        context, flags, image_format, image_width, image_height, image_depth,
        image_row_pitch, image_slice_pitch, host_ptr, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clUnloadCompiler(void) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
{
    return CL_SUCCESS;
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueMarker(
    cl_command_queue command_queue, cl_event *event) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueMarker(command_queue, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueWaitForEvents(
    cl_command_queue command_queue, cl_uint num_events,
    const cl_event *event_list) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueWaitForEvents(
        command_queue, num_events, event_list);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueBarrier(cl_command_queue command_queue) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueBarrier(command_queue);
}

CL_API_ENTRY void * CL_API_CALL clGetExtensionFunctionAddress(
    const char *function_name) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED
{
    size_t function_name_length = 0;
    KHRicdVendor *vendor = NULL;

    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(function_name, NULL);

    // make sure the ICD is initialized
    khrIcdInitialize();

    function_name_length = strlen(function_name);

    // return any ICD-aware extensions

    // Most extensions, including multi-vendor KHR and EXT extensions,
    // do not need to be ICD-aware and do not require any ICD loader
    // modifications. The KHR and EXT extensions below were added for
    // backwards compatibility only.
#define CL_COMMON_EXTENSION_ENTRYPOINT_ADD(name) if (!strcmp(function_name, #name)) return (void *)(size_t)&name

    // Functions supporting the creation of OpenCL Memory Objects
    // from OpenGL Objects (cl_apple_gl_sharing, cl_khr_gl_sharing)
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromGLBuffer);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromGLTexture);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromGLTexture2D);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromGLTexture3D);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromGLRenderbuffer);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetGLObjectInfo);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetGLTextureInfo);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueAcquireGLObjects);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueReleaseGLObjects);

    // cl_khr_gl_sharing
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetGLContextInfoKHR);

    // cl_khr_gl_event
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateEventFromGLsyncKHR);

#if defined(_WIN32)
    // cl_khr_d3d10_sharing
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetDeviceIDsFromD3D10KHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D10BufferKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D10Texture2DKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D10Texture3DKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueAcquireD3D10ObjectsKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueReleaseD3D10ObjectsKHR);
    // cl_khr_d3d11_sharing
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetDeviceIDsFromD3D11KHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D11BufferKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D11Texture2DKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromD3D11Texture3DKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueAcquireD3D11ObjectsKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueReleaseD3D11ObjectsKHR);
    // cl_khr_dx9_media_sharing
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetDeviceIDsFromDX9MediaAdapterKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromDX9MediaSurfaceKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueAcquireDX9MediaSurfacesKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueReleaseDX9MediaSurfacesKHR);
#endif

    // cl_ext_device_fission
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateSubDevicesEXT);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clRetainDeviceEXT);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clReleaseDeviceEXT);

    /* cl_khr_egl_image */
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateFromEGLImageKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueAcquireEGLObjectsKHR);
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clEnqueueReleaseEGLObjectsKHR);

    /* cl_khr_egl_event */
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clCreateEventFromEGLSyncKHR);

    /* cl_khr_sub_groups */
    CL_COMMON_EXTENSION_ENTRYPOINT_ADD(clGetKernelSubGroupInfoKHR);

#undef CL_COMMON_EXTENSION_ENTRYPOINT_ADD

    // fall back to vendor extension detection
    for (vendor = khrIcdVendors; vendor; vendor = vendor->next)
    {
        size_t vendor_suffix_length = strlen(vendor->suffix);
        if (vendor_suffix_length <= function_name_length && vendor_suffix_length > 0)
        {
            const char *function_suffix =
                function_name + function_name_length - vendor_suffix_length;
            if (!strcmp(function_suffix, vendor->suffix))
            {
                return vendor->clGetExtensionFunctionAddress(function_name);
            }
        }
    }
    return NULL;
}
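/*
 * The loop above implements the cl_khr_icd suffix rule: a name such as
 * "clNewFunctionAMD" is routed to the vendor whose registered suffix
 * ("AMD" here) matches the end of the function name. A minimal sketch of
 * the same comparison in isolation (the names are made up for
 * illustration):
 */
#if 0
#include <string.h>

static int suffix_matches(const char *function_name, const char *suffix)
{
    size_t name_len = strlen(function_name);
    size_t suffix_len = strlen(suffix);
    if (suffix_len == 0 || suffix_len > name_len) return 0;
    /* Compare only the trailing suffix_len characters of the name. */
    return strcmp(function_name + name_len - suffix_len, suffix) == 0;
}
/* suffix_matches("clNewFunctionAMD", "AMD") -> 1 */
#endif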
// GL and other APIs

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLBuffer(
    cl_context context, cl_mem_flags flags, cl_GLuint bufobj,
    int *errcode_ret) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromGLBuffer(
        context, flags, bufobj, errcode_ret);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLTexture(
    cl_context context, cl_mem_flags flags, cl_GLenum target,
    cl_GLint miplevel, cl_GLuint texture, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_2
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromGLTexture(
        context, flags, target, miplevel, texture, errcode_ret);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLTexture2D(
    cl_context context, cl_mem_flags flags, cl_GLenum target,
    cl_GLint miplevel, cl_GLuint texture, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromGLTexture2D(
        context, flags, target, miplevel, texture, errcode_ret);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLTexture3D(
    cl_context context, cl_mem_flags flags, cl_GLenum target,
    cl_GLint miplevel, cl_GLuint texture, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromGLTexture3D(
        context, flags, target, miplevel, texture, errcode_ret);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLRenderbuffer(
    cl_context context, cl_mem_flags flags, cl_GLuint renderbuffer,
    cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromGLRenderbuffer(
        context, flags, renderbuffer, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clGetGLObjectInfo(
    cl_mem memobj, cl_gl_object_type *gl_object_type,
    cl_GLuint *gl_object_name) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(memobj, CL_INVALID_MEM_OBJECT);
    return memobj->dispatch->clGetGLObjectInfo(
        memobj, gl_object_type, gl_object_name);
}

CL_API_ENTRY cl_int CL_API_CALL clGetGLTextureInfo(
    cl_mem memobj, cl_gl_texture_info param_name, size_t param_value_size,
    void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(memobj, CL_INVALID_MEM_OBJECT);
    return memobj->dispatch->clGetGLTextureInfo(
        memobj, param_name, param_value_size, param_value, param_value_size_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireGLObjects(
    cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueAcquireGLObjects(
        command_queue, num_objects, mem_objects, num_events_in_wait_list,
        event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseGLObjects(
    cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_1_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueReleaseGLObjects(
        command_queue, num_objects, mem_objects, num_events_in_wait_list,
        event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clGetGLContextInfoKHR(
    const cl_context_properties *properties, cl_gl_context_info param_name,
    size_t param_value_size, void *param_value,
    size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0
{
    cl_platform_id platform = NULL;

    // initialize the platforms (in case they have not been already)
    khrIcdInitialize();

    // determine the platform to use from the properties specified
    khrIcdContextPropertiesGetPlatform(properties, &platform);

    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(platform, CL_INVALID_PLATFORM);
    return platform->dispatch->clGetGLContextInfoKHR(
        properties, param_name, param_value_size, param_value,
        param_value_size_ret);
}

CL_API_ENTRY cl_event CL_API_CALL clCreateEventFromGLsyncKHR(
    cl_context context, cl_GLsync sync, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateEventFromGLsyncKHR(
        context, sync, errcode_ret);
}
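/*
 * clGetGLContextInfoKHR above is unusual: it takes no OpenCL handle, so
 * the loader has to recover the platform from the CL_CONTEXT_PLATFORM
 * entry of the properties list before it can pick a dispatch table. A
 * hedged caller sketch; the entry point is looked up by name here because
 * it is an ICD-aware extension rather than a core export, and the
 * clGetGLContextInfoKHR_fn typedef is assumed to come from cl_gl.h:
 */
#if 0
#include <CL/cl.h>
#include <CL/cl_gl.h>

static cl_int query_gl_device(cl_platform_id platform, cl_device_id *dev)
{
    clGetGLContextInfoKHR_fn p = (clGetGLContextInfoKHR_fn)
        clGetExtensionFunctionAddressForPlatform(platform, "clGetGLContextInfoKHR");
    cl_context_properties props[] = {
        CL_CONTEXT_PLATFORM, (cl_context_properties)platform, 0};
    /* Without CL_CONTEXT_PLATFORM in props, the loader could not pick a
     * dispatch table and would fail with CL_INVALID_PLATFORM. */
    return p ? p(props, CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR,
                 sizeof(*dev), dev, NULL)
             : CL_INVALID_PLATFORM;
}
#endif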
#if defined(_WIN32)

/*
 * cl_d3d10_sharing_khr
 */

CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDsFromD3D10KHR(
    cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source,
    void *d3d_object, cl_d3d10_device_set_khr d3d_device_set,
    cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(platform, CL_INVALID_PLATFORM);
    return platform->dispatch->clGetDeviceIDsFromD3D10KHR(
        platform, d3d_device_source, d3d_object, d3d_device_set,
        num_entries, devices, num_devices);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D10BufferKHR(
    cl_context context, cl_mem_flags flags, ID3D10Buffer *resource,
    cl_int *errcode_ret)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromD3D10BufferKHR(
        context, flags, resource, errcode_ret);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D10Texture2DKHR(
    cl_context context, cl_mem_flags flags, ID3D10Texture2D *resource,
    UINT subresource, cl_int *errcode_ret)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromD3D10Texture2DKHR(
        context, flags, resource, subresource, errcode_ret);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D10Texture3DKHR(
    cl_context context, cl_mem_flags flags, ID3D10Texture3D *resource,
    UINT subresource, cl_int *errcode_ret)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromD3D10Texture3DKHR(
        context, flags, resource, subresource, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireD3D10ObjectsKHR(
    cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueAcquireD3D10ObjectsKHR(
        command_queue, num_objects, mem_objects, num_events_in_wait_list,
        event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseD3D10ObjectsKHR(
    cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueReleaseD3D10ObjectsKHR(
        command_queue, num_objects, mem_objects, num_events_in_wait_list,
        event_wait_list, event);
}

/*
 * cl_d3d11_sharing_khr
 */

CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDsFromD3D11KHR(
    cl_platform_id platform, cl_d3d11_device_source_khr d3d_device_source,
    void *d3d_object, cl_d3d11_device_set_khr d3d_device_set,
    cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(platform, CL_INVALID_PLATFORM);
    return platform->dispatch->clGetDeviceIDsFromD3D11KHR(
        platform, d3d_device_source, d3d_object, d3d_device_set,
        num_entries, devices, num_devices);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D11BufferKHR(
    cl_context context, cl_mem_flags flags, ID3D11Buffer *resource,
    cl_int *errcode_ret)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromD3D11BufferKHR(
        context, flags, resource, errcode_ret);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D11Texture2DKHR(
    cl_context context, cl_mem_flags flags, ID3D11Texture2D *resource,
    UINT subresource, cl_int *errcode_ret)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromD3D11Texture2DKHR(
        context, flags, resource, subresource, errcode_ret);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromD3D11Texture3DKHR(
    cl_context context, cl_mem_flags flags, ID3D11Texture3D *resource,
    UINT subresource, cl_int *errcode_ret)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromD3D11Texture3DKHR(
        context, flags, resource, subresource, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireD3D11ObjectsKHR(
    cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueAcquireD3D11ObjectsKHR(
        command_queue, num_objects, mem_objects, num_events_in_wait_list,
        event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseD3D11ObjectsKHR(
    cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueReleaseD3D11ObjectsKHR(
        command_queue, num_objects, mem_objects, num_events_in_wait_list,
        event_wait_list, event);
}

/*
 * cl_khr_dx9_media_sharing
 */

CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDsFromDX9MediaAdapterKHR(
    cl_platform_id platform, cl_uint num_media_adapters,
    cl_dx9_media_adapter_type_khr *media_adapters_type, void *media_adapters,
    cl_dx9_media_adapter_set_khr media_adapter_set, cl_uint num_entries,
    cl_device_id *devices, cl_uint *num_devices)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(platform, CL_INVALID_PLATFORM);
    return platform->dispatch->clGetDeviceIDsFromDX9MediaAdapterKHR(
        platform, num_media_adapters, media_adapters_type, media_adapters,
        media_adapter_set, num_entries, devices, num_devices);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateFromDX9MediaSurfaceKHR(
    cl_context context, cl_mem_flags flags,
    cl_dx9_media_adapter_type_khr adapter_type, void *surface_info,
    cl_uint plane, cl_int *errcode_ret)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromDX9MediaSurfaceKHR(
        context, flags, adapter_type, surface_info, plane, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireDX9MediaSurfacesKHR(
    cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueAcquireDX9MediaSurfacesKHR(
        command_queue, num_objects, mem_objects, num_events_in_wait_list,
        event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseDX9MediaSurfacesKHR(
    cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueReleaseDX9MediaSurfacesKHR(
        command_queue, num_objects, mem_objects, num_events_in_wait_list,
        event_wait_list, event);
}

#endif

CL_API_ENTRY cl_int CL_API_CALL clSetEventCallback(
    cl_event event, cl_int command_exec_callback_type,
    void (CL_CALLBACK *pfn_notify)(cl_event, cl_int, void *),
    void *user_data) CL_API_SUFFIX__VERSION_1_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(event, CL_INVALID_EVENT);
    return event->dispatch->clSetEventCallback(
        event, command_exec_callback_type, pfn_notify, user_data);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreateSubBuffer(
    cl_mem buffer, cl_mem_flags flags, cl_buffer_create_type buffer_create_type,
    const void *buffer_create_info, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(buffer, CL_INVALID_MEM_OBJECT);
    return buffer->dispatch->clCreateSubBuffer(
        buffer, flags, buffer_create_type, buffer_create_info, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clSetMemObjectDestructorCallback(
    cl_mem memobj, void (CL_CALLBACK *pfn_notify)(cl_mem, void *),
    void *user_data) CL_API_SUFFIX__VERSION_1_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(memobj, CL_INVALID_MEM_OBJECT);
    return memobj->dispatch->clSetMemObjectDestructorCallback(
        memobj, pfn_notify, user_data);
}

CL_API_ENTRY cl_event CL_API_CALL clCreateUserEvent(
    cl_context context, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_1_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateUserEvent(context, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clSetUserEventStatus(
    cl_event event, cl_int execution_status) CL_API_SUFFIX__VERSION_1_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(event, CL_INVALID_EVENT);
    return event->dispatch->clSetUserEventStatus(event, execution_status);
}
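/*
 * clCreateUserEvent / clSetUserEventStatus above form a pair: host code
 * can gate enqueued work on an event it completes manually. A brief,
 * hedged usage sketch (error handling elided; the function name is
 * invented for illustration):
 */
#if 0
#include <CL/cl.h>

static void gate_example(cl_context ctx, cl_command_queue queue,
                         cl_mem buf, void *host, size_t n)
{
    cl_event gate = clCreateUserEvent(ctx, NULL);
    /* The read is enqueued but cannot start yet... */
    clEnqueueReadBuffer(queue, buf, CL_FALSE, 0, n, host, 1, &gate, NULL);
    /* ...until the host releases it. */
    clSetUserEventStatus(gate, CL_COMPLETE);
    clReleaseEvent(gate);
}
#endif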
CL_API_ENTRY cl_mem CL_API_CALL clCreateFromEGLImageKHR(
    cl_context context, CLeglDisplayKHR display, CLeglImageKHR image,
    cl_mem_flags flags, const cl_egl_image_properties_khr *properties,
    cl_int *errcode_ret)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateFromEGLImageKHR(
        context, display, image, flags, properties, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireEGLObjectsKHR(
    cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueAcquireEGLObjectsKHR(
        command_queue, num_objects, mem_objects, num_events_in_wait_list,
        event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseEGLObjectsKHR(
    cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueReleaseEGLObjectsKHR(
        command_queue, num_objects, mem_objects, num_events_in_wait_list,
        event_wait_list, event);
}

/* cl_khr_egl_event */
CL_API_ENTRY cl_event CL_API_CALL clCreateEventFromEGLSyncKHR(
    cl_context context, CLeglSyncKHR sync, CLeglDisplayKHR display,
    cl_int *errcode_ret)
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateEventFromEGLSyncKHR(
        context, sync, display, errcode_ret);
}

CL_API_ENTRY cl_command_queue CL_API_CALL clCreateCommandQueueWithProperties(
    cl_context context, cl_device_id device, const cl_queue_properties *properties,
    cl_int *errcode_ret) CL_API_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateCommandQueueWithProperties(
        context, device, properties, errcode_ret);
}

CL_API_ENTRY cl_mem CL_API_CALL clCreatePipe(
    cl_context context, cl_mem_flags flags, cl_uint pipe_packet_size,
    cl_uint pipe_max_packets, const cl_pipe_properties *properties,
    cl_int *errcode_ret) CL_API_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreatePipe(
        context, flags, pipe_packet_size, pipe_max_packets, properties,
        errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clGetPipeInfo(
    cl_mem pipe, cl_pipe_info param_name, size_t param_value_size,
    void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(pipe, CL_INVALID_MEM_OBJECT);
    return pipe->dispatch->clGetPipeInfo(
        pipe, param_name, param_value_size, param_value, param_value_size_ret);
}

CL_API_ENTRY void * CL_API_CALL clSVMAlloc(
    cl_context context, cl_svm_mem_flags flags, size_t size,
    cl_uint alignment) CL_API_SUFFIX__VERSION_2_0
{
    if (!context)
    {
        return NULL;
    }
    return context->dispatch->clSVMAlloc(context, flags, size, alignment);
}

CL_API_ENTRY void CL_API_CALL clSVMFree(
    cl_context context, void *svm_pointer) CL_API_SUFFIX__VERSION_2_0
{
    if (!context || !svm_pointer)
    {
        return;
    }
    context->dispatch->clSVMFree(context, svm_pointer);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMFree(
    cl_command_queue command_queue, cl_uint num_svm_pointers,
    void *svm_pointers[],
    void (CL_CALLBACK *pfn_free_func)(cl_command_queue queue,
                                      cl_uint num_svm_pointers,
                                      void *svm_pointers[], void *user_data),
    void *user_data, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueSVMFree(
        command_queue, num_svm_pointers, svm_pointers, pfn_free_func,
        user_data, num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemcpy(
    cl_command_queue command_queue, cl_bool blocking_copy, void *dst_ptr,
    const void *src_ptr, size_t size, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueSVMMemcpy(
        command_queue, blocking_copy, dst_ptr, src_ptr, size,
        num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemFill(
    cl_command_queue command_queue, void *svm_ptr, const void *pattern,
    size_t pattern_size, size_t size, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueSVMMemFill(
        command_queue, svm_ptr, pattern, pattern_size, size,
        num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMap(
    cl_command_queue command_queue, cl_bool blocking_map, cl_map_flags flags,
    void *svm_ptr, size_t size, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueSVMMap(
        command_queue, blocking_map, flags, svm_ptr, size,
        num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMUnmap(
    cl_command_queue command_queue, void *svm_ptr,
    cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
    cl_event *event) CL_API_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueSVMUnmap(
        command_queue, svm_ptr, num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_sampler CL_API_CALL clCreateSamplerWithProperties(
    cl_context context, const cl_sampler_properties *sampler_properties,
    cl_int *errcode_ret) CL_API_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateSamplerWithProperties(
        context, sampler_properties, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clSetKernelArgSVMPointer(
    cl_kernel kernel, cl_uint arg_index,
    const void *arg_value) CL_API_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(kernel, CL_INVALID_KERNEL);
    return kernel->dispatch->clSetKernelArgSVMPointer(kernel, arg_index, arg_value);
}

CL_API_ENTRY cl_int CL_API_CALL clSetKernelExecInfo(
    cl_kernel kernel, cl_kernel_exec_info param_name, size_t param_value_size,
    const void *param_value) CL_API_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(kernel, CL_INVALID_KERNEL);
    return kernel->dispatch->clSetKernelExecInfo(
        kernel, param_name, param_value_size, param_value);
}

CL_API_ENTRY cl_int CL_API_CALL clGetKernelSubGroupInfoKHR(
    cl_kernel in_kernel, cl_device_id in_device,
    cl_kernel_sub_group_info param_name, size_t input_value_size,
    const void *input_value, size_t param_value_size, void *param_value,
    size_t *param_value_size_ret) CL_EXT_SUFFIX__VERSION_2_0
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(in_kernel, CL_INVALID_KERNEL);
    return in_kernel->dispatch->clGetKernelSubGroupInfoKHR(
        in_kernel, in_device, param_name, input_value_size, input_value,
        param_value_size, param_value, param_value_size_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clSetDefaultDeviceCommandQueue(
    cl_context context, cl_device_id device,
    cl_command_queue command_queue) CL_API_SUFFIX__VERSION_2_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(context, CL_INVALID_CONTEXT);
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(device, CL_INVALID_DEVICE);
    return context->dispatch->clSetDefaultDeviceCommandQueue(
        context, device, command_queue);
}
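/*
 * The SVM wrappers above follow the same dispatch pattern, with one
 * twist visible in clSVMAlloc/clSVMFree: a NULL context cannot be
 * reported through a return code, so the loader simply returns NULL or
 * does nothing. A hedged usage sketch of the coarse-grained SVM path
 * (the function name and sizes are invented for illustration):
 */
#if 0
#include <CL/cl.h>
#include <string.h>

static void svm_example(cl_context ctx, cl_command_queue queue)
{
    char *block = (char *)clSVMAlloc(ctx, CL_MEM_READ_WRITE, 4096, 0);
    if (!block) return;
    /* Coarse-grained SVM requires map/unmap around host access. */
    clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, block, 4096, 0, NULL, NULL);
    memset(block, 0, 4096);
    clEnqueueSVMUnmap(queue, block, 0, NULL, NULL);
    clFinish(queue);
    clSVMFree(ctx, block);
}
#endif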
clCreateProgramWithIL(
    cl_context context, const void *il, size_t length,
    cl_int *errcode_ret) CL_API_SUFFIX__VERSION_2_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(context, CL_INVALID_CONTEXT);
    return context->dispatch->clCreateProgramWithIL(
        context, il, length, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clGetKernelSubGroupInfo(
    cl_kernel kernel, cl_device_id device, cl_kernel_sub_group_info param_name,
    size_t input_value_size, const void *input_value, size_t param_value_size,
    void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_2_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(kernel, CL_INVALID_KERNEL);
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(device, CL_INVALID_DEVICE);
    return kernel->dispatch->clGetKernelSubGroupInfo(
        kernel, device, param_name, input_value_size, input_value,
        param_value_size, param_value, param_value_size_ret);
}

CL_API_ENTRY cl_kernel CL_API_CALL clCloneKernel(
    cl_kernel source_kernel, cl_int *errcode_ret) CL_API_SUFFIX__VERSION_2_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_HANDLE(source_kernel, CL_INVALID_KERNEL);
    return source_kernel->dispatch->clCloneKernel(source_kernel, errcode_ret);
}

CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMigrateMem(
    cl_command_queue command_queue, cl_uint num_svm_pointers,
    const void **svm_pointers, const size_t *sizes,
    cl_mem_migration_flags flags, cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list, cl_event *event) CL_API_SUFFIX__VERSION_2_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(command_queue, CL_INVALID_COMMAND_QUEUE);
    return command_queue->dispatch->clEnqueueSVMMigrateMem(
        command_queue, num_svm_pointers, svm_pointers, sizes, flags,
        num_events_in_wait_list, event_wait_list, event);
}

CL_API_ENTRY cl_int CL_API_CALL clGetDeviceAndHostTimer(
    cl_device_id device, cl_ulong *device_timestamp,
    cl_ulong *host_timestamp) CL_API_SUFFIX__VERSION_2_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(device, CL_INVALID_DEVICE);
    return device->dispatch->clGetDeviceAndHostTimer(
        device, device_timestamp, host_timestamp);
}

CL_API_ENTRY cl_int CL_API_CALL clGetHostTimer(
    cl_device_id device, cl_ulong *host_timestamp) CL_API_SUFFIX__VERSION_2_1
{
    KHR_ICD_VALIDATE_HANDLE_RETURN_ERROR(device, CL_INVALID_DEVICE);
    return device->dispatch->clGetHostTimer(device, host_timestamp);
}

/* ===== File: clr-rocm-5.7.1/opencl/khronos/icd/loader/icd_dispatch.h ===== */
/*
 * Copyright (c) 2016-2019 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */
#ifndef _ICD_DISPATCH_H_
#define _ICD_DISPATCH_H_

#ifndef CL_USE_DEPRECATED_OPENCL_1_0_APIS
#define CL_USE_DEPRECATED_OPENCL_1_0_APIS
#endif

#ifndef CL_USE_DEPRECATED_OPENCL_1_1_APIS
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#endif

#ifndef CL_USE_DEPRECATED_OPENCL_1_2_APIS
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS
#endif

#ifndef CL_USE_DEPRECATED_OPENCL_2_0_APIS
#define CL_USE_DEPRECATED_OPENCL_2_0_APIS
#endif

// cl.h
#include <CL/cl.h>

// cl_gl.h and required files
#ifdef _WIN32
#include <windows.h>
#include <d3d10.h>
#include <d3d11.h>
#include <CL/cl_d3d10.h>
#include <CL/cl_d3d11.h>
#include <CL/cl_dx9_media_sharing.h>
#endif
#include <CL/cl_gl.h>
#include <CL/cl_gl_ext.h>
#include <CL/cl_egl.h>
#include <CL/cl_ext.h>
#include <CL/cl_icd.h>

/*
 * vendor dispatch table structure
 */

struct _cl_platform_id   { cl_icd_dispatch *dispatch; };
struct _cl_device_id     { cl_icd_dispatch *dispatch; };
struct _cl_context       { cl_icd_dispatch *dispatch; };
struct _cl_command_queue { cl_icd_dispatch *dispatch; };
struct _cl_mem           { cl_icd_dispatch *dispatch; };
struct _cl_program       { cl_icd_dispatch *dispatch; };
struct _cl_kernel        { cl_icd_dispatch *dispatch; };
struct _cl_event         { cl_icd_dispatch *dispatch; };
struct _cl_sampler       { cl_icd_dispatch *dispatch; };

#endif // _ICD_DISPATCH_H_
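/*
 * Each handle type above is deliberately a single-member struct: the
 * loader only ever dereferences the leading cl_icd_dispatch pointer and
 * leaves everything behind it to the vendor. A hedged sketch of what a
 * vendor object might look like from the other side (the trailing
 * fields are invented for illustration):
 */
#if 0
#include <CL/cl_icd.h>

struct vendor_command_queue {
    cl_icd_dispatch *dispatch;  /* must come first; shared with the loader */
    /* ... vendor-private state follows ... */
    int hw_queue_index;
    void *ring_buffer;
};
#endif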
/* ===== File: clr-rocm-5.7.1/opencl/khronos/icd/loader/icd_envvars.h ===== */
/*
 * Copyright (c) 2016-2019 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */

#ifndef _ICD_ENVVARS_H_
#define _ICD_ENVVARS_H_

char *khrIcd_getenv(const char *name);
char *khrIcd_secure_getenv(const char *name);
void khrIcd_free_getenv(char *val);

#endif

/* ===== File: clr-rocm-5.7.1/opencl/khronos/icd/loader/icd_platform.h ===== */
/*
 * Copyright (c) 2016-2019 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */

#ifndef _ICD_PLATFORM_H_
#define _ICD_PLATFORM_H_

#if defined(__linux__) || defined(__APPLE__)

#define PATH_SEPARATOR   ':'
#define DIRECTORY_SYMBOL '/'

#ifdef __ANDROID__
#define ICD_VENDOR_PATH "/system/vendor/Khronos/OpenCL/vendors/";
#else
#define ICD_VENDOR_PATH "/etc/OpenCL/vendors/";
#endif // ANDROID

#elif defined(_WIN32)

#define PATH_SEPARATOR   ';'
#define DIRECTORY_SYMBOL '\\'

#endif

#endif
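/*
 * ICD_VENDOR_PATH above is where Linux driver packages drop their
 * registration files. Each *.icd file holds a single library name or
 * path; a hedged example of what such a file might contain (the file
 * and library names are illustrative, not shipped by this repository):
 *
 *     $ cat /etc/OpenCL/vendors/example.icd
 *     libexampleopencl.so.1
 *
 * The loader dlopen()s that string and binds the vendor through the
 * cl_khr_icd entry points.
 */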
/* ===== File: clr-rocm-5.7.1/opencl/khronos/icd/loader/linux/icd_exports.map ===== */
/*
 * Copyright (c) 2016-2019 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */

OPENCL_1.0 {
    global:
        clBuildProgram;
        clCreateBuffer;
        clCreateCommandQueue;
        clCreateContext;
        clCreateContextFromType;
        clCreateFromGLBuffer;
        clCreateFromGLRenderbuffer;
        clCreateFromGLTexture2D;
        clCreateFromGLTexture3D;
        clCreateImage2D;
        clCreateImage3D;
        clCreateKernel;
        clCreateKernelsInProgram;
        clCreateProgramWithBinary;
        clCreateProgramWithSource;
        clCreateSampler;
        clEnqueueAcquireGLObjects;
        clEnqueueBarrier;
        clEnqueueCopyBuffer;
        clEnqueueCopyBufferToImage;
        clEnqueueCopyImage;
        clEnqueueCopyImageToBuffer;
        clEnqueueMapBuffer;
        clEnqueueMapImage;
        clEnqueueMarker;
        clEnqueueNDRangeKernel;
        clEnqueueNativeKernel;
        clEnqueueReadBuffer;
        clEnqueueReadImage;
        clEnqueueReleaseGLObjects;
        clEnqueueTask;
        clEnqueueUnmapMemObject;
        clEnqueueWaitForEvents;
        clEnqueueWriteBuffer;
        clEnqueueWriteImage;
        clFinish;
        clFlush;
        clGetCommandQueueInfo;
        clGetContextInfo;
        clGetDeviceIDs;
        clGetDeviceInfo;
        clGetEventInfo;
        clGetEventProfilingInfo;
        clGetExtensionFunctionAddress;
        clGetGLObjectInfo;
        clGetGLTextureInfo;
        clGetImageInfo;
        clGetKernelInfo;
        clGetKernelWorkGroupInfo;
        clGetMemObjectInfo;
        clGetPlatformIDs;
        clGetPlatformInfo;
        clGetProgramBuildInfo;
        clGetProgramInfo;
        clGetSamplerInfo;
        clGetSupportedImageFormats;
        clReleaseCommandQueue;
        clReleaseContext;
        clReleaseEvent;
        clReleaseKernel;
        clReleaseMemObject;
        clReleaseProgram;
        clReleaseSampler;
        clRetainCommandQueue;
        clRetainContext;
        clRetainEvent;
        clRetainKernel;
        clRetainMemObject;
        clRetainProgram;
        clRetainSampler;
        clSetCommandQueueProperty;
        clSetKernelArg;
        clUnloadCompiler;
        clWaitForEvents;
    local:
        /* Everything else is local to ICD. */
        *;
};

OPENCL_1.1 {
    global:
        clCreateSubBuffer;
        clCreateUserEvent;
        clEnqueueCopyBufferRect;
        clEnqueueReadBufferRect;
        clEnqueueWriteBufferRect;
        clSetEventCallback;
        clSetMemObjectDestructorCallback;
        clSetUserEventStatus;
} OPENCL_1.0;

OPENCL_1.2 {
    global:
        clCompileProgram;
        clCreateFromGLTexture;
        clCreateImage;
        clCreateProgramWithBuiltInKernels;
        clCreateSubDevices;
        clEnqueueBarrierWithWaitList;
        clEnqueueFillBuffer;
        clEnqueueFillImage;
        clEnqueueMarkerWithWaitList;
        clEnqueueMigrateMemObjects;
        clGetExtensionFunctionAddressForPlatform;
        clGetKernelArgInfo;
        clLinkProgram;
        clReleaseDevice;
        clRetainDevice;
        clUnloadPlatformCompiler;
} OPENCL_1.1;

OPENCL_2.0 {
    global:
        clCreateCommandQueueWithProperties;
        clCreatePipe;
        clGetPipeInfo;
        clSVMAlloc;
        clSVMFree;
        clEnqueueSVMFree;
        clEnqueueSVMMemcpy;
        clEnqueueSVMMemFill;
        clEnqueueSVMMap;
        clEnqueueSVMUnmap;
        clCreateSamplerWithProperties;
        clSetKernelArgSVMPointer;
        clSetKernelExecInfo;
} OPENCL_1.2;

OPENCL_2.1 {
    global:
        clCloneKernel;
        clCreateProgramWithIL;
        clEnqueueSVMMigrateMem;
        clGetDeviceAndHostTimer;
        clGetHostTimer;
        clGetKernelSubGroupInfo;
        clSetDefaultDeviceCommandQueue;
} OPENCL_2.0;

OPENCL_2.2 {
    global:
        clSetProgramReleaseCallback;
        clSetProgramSpecializationConstant;
} OPENCL_2.1;
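/*
 * Each block above chains to its predecessor ("} OPENCL_1.1;" and so
 * on), so the dynamic linker tags every exported symbol with the OpenCL
 * version that introduced it, while "local: *;" in the 1.0 block hides
 * everything else. A hedged sketch of how a hypothetical future block
 * would extend the chain (the symbol name is invented):
 *
 *     OPENCL_3.0 {
 *         global:
 *             clSomeNewEntryPoint;
 *     } OPENCL_2.2;
 */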
/* ===== File: clr-rocm-5.7.1/opencl/khronos/icd/loader/linux/icd_linux.c ===== */
/* Modifications Copyright(C) 2022 Advanced Micro Devices, Inc.
 * All rights reserved.
 */
/*
 * Copyright (c) 2016-2020 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */

#include "icd.h"
#include "icd_envvars.h"
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <dlfcn.h>

static pthread_once_t initialized = PTHREAD_ONCE_INIT;

/*
 * Vendor enumeration functions
 */

// go through the list of vendors in the two configuration files
void khrIcdOsVendorsEnumerate(void)
{
    DIR *dir = NULL;
    struct dirent *dirEntry = NULL;
    char *vendorPath = ICD_VENDOR_PATH;
    char *envPath = NULL;

    khrIcdVendorsEnumerateEnv();

    envPath = khrIcd_secure_getenv("OCL_ICD_VENDORS");
    if (NULL != envPath)
    {
        vendorPath = envPath;
    }

    dir = opendir(vendorPath);
    if (NULL == dir)
    {
        KHR_ICD_TRACE("Failed to open path %s, continuing\n", vendorPath);
    }
    else
    {
        // attempt to load all files in the directory
        for (dirEntry = readdir(dir); dirEntry; dirEntry = readdir(dir))
        {
            switch (dirEntry->d_type)
            {
            case DT_UNKNOWN:
            case DT_REG:
            case DT_LNK:
                {
                    const char *extension = ".icd";
                    FILE *fin = NULL;
                    char *fileName = NULL;
                    char *buffer = NULL;
                    long bufferSize = 0;

                    // make sure the file name ends in .icd
                    if (strlen(extension) > strlen(dirEntry->d_name))
                    {
                        break;
                    }
                    if (strcmp(dirEntry->d_name + strlen(dirEntry->d_name) - strlen(extension), extension))
                    {
                        break;
                    }

                    // allocate space for the full path of the vendor library name
                    fileName = malloc(strlen(dirEntry->d_name) + strlen(vendorPath) + 1);
                    if (!fileName)
                    {
                        KHR_ICD_TRACE("Failed to allocate space for ICD file path\n");
                        break;
                    }
                    sprintf(fileName, "%s%s", vendorPath, dirEntry->d_name);

                    // open the file and read its contents
                    fin = fopen(fileName, "r");
                    if (!fin)
                    {
                        free(fileName);
                        break;
                    }
                    fseek(fin, 0, SEEK_END);
                    bufferSize = ftell(fin);

                    buffer = malloc(bufferSize + 1);
                    if (!buffer)
                    {
                        free(fileName);
                        fclose(fin);
                        break;
                    }
                    memset(buffer, 0, bufferSize + 1);
                    fseek(fin, 0, SEEK_SET);
                    if (bufferSize != (long)fread(buffer, 1, bufferSize, fin))
                    {
                        free(fileName);
                        free(buffer);
                        fclose(fin);
                        break;
                    }
                    // ignore a newline at the end of the file
                    if (buffer[bufferSize - 1] == '\n') buffer[bufferSize - 1] = '\0';

                    // load the string read from the file
                    khrIcdVendorAdd(buffer);

                    free(fileName);
                    free(buffer);
                    fclose(fin);
                }
                break;
            default:
                break;
            }
        }
        closedir(dir);

        KHRicdVendor *vendorIterator;
        for (vendorIterator = khrIcdVendors; vendorIterator; vendorIterator = vendorIterator->next)
        {
            if (vendorIterator->libName != NULL)
            {
                free(vendorIterator->libName);
                vendorIterator->libName = NULL;
            }
        }
    }

    if (NULL != envPath)
    {
        khrIcd_free_getenv(envPath);
    }
}

// go through the list of vendors only once
void khrIcdOsVendorsEnumerateOnce(void)
{
    pthread_once(&initialized, khrIcdOsVendorsEnumerate);
}

/*
 * Dynamic library loading functions
 */

// dynamically load a library. returns NULL on failure
void *khrIcdOsLibraryLoad(const char *libraryName)
{
    void *retVal = dlopen(libraryName, RTLD_NOW);
    if (NULL == retVal)
    {
        printf("dlerror: %s\n", dlerror());
    }
    return retVal;
}

// get a function pointer from a loaded library. returns NULL on failure.
void *khrIcdOsLibraryGetFunctionAddress(void *library, const char *functionName)
{
    return dlsym(library, functionName);
}

// unload a library
void khrIcdOsLibraryUnload(void *library)
{
    dlclose(library);
}
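/*
 * khrIcdOsVendorsEnumerate() above honors OCL_ICD_VENDORS before the
 * built-in ICD_VENDOR_PATH, which is handy for testing a driver without
 * installing it system-wide. A hedged sketch of driving the loader at an
 * alternate directory (the path and function name are illustrative):
 */
#if 0
#include <stdlib.h>

static void use_staging_vendors(void)
{
    /* Must run before the first OpenCL call, since enumeration happens
     * only once via pthread_once(). */
    setenv("OCL_ICD_VENDORS", "/tmp/staging-vendors/", 1);
}
#endif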
/* ===== File: clr-rocm-5.7.1/opencl/khronos/icd/loader/linux/icd_linux_envvars.c ===== */
/*
 * Copyright (c) 2016-2019 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */

// for secure_getenv():
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include "icd_cmake_config.h"

#include <stdlib.h>

char *khrIcd_getenv(const char *name)
{
    // No allocation of memory necessary for Linux.
    return getenv(name);
}

char *khrIcd_secure_getenv(const char *name)
{
#if defined(__APPLE__)
    // Apple does not appear to have a secure getenv implementation.
    // The main difference between secure getenv and getenv is that secure getenv
    // returns NULL if the process is being run with elevated privileges by a normal user.
    // The idea is to prevent the reading of malicious environment variables by a process
    // that can do damage.
    // This algorithm is derived from glibc code that sets an internal
    // variable (__libc_enable_secure) if the process is running under setuid or setgid.
    return geteuid() != getuid() || getegid() != getgid() ? NULL : khrIcd_getenv(name);
#else
// Linux
#ifdef HAVE_SECURE_GETENV
    return secure_getenv(name);
#elif defined(HAVE___SECURE_GETENV)
    return __secure_getenv(name);
#else
#pragma message( \
    "Warning: Falling back to non-secure getenv for environmental lookups! Consider" \
    " updating to a different libc.")
    return khrIcd_getenv(name);
#endif
#endif
}

void khrIcd_free_getenv(char *val)
{
    // No freeing of memory necessary for Linux, but we should at least touch
    // val to get rid of compiler warnings.
    (void)val;
}

; ===== File: clr-rocm-5.7.1/opencl/khronos/icd/loader/windows/OpenCL.def =====
;
; Copyright (c) 2016-2019 The Khronos Group Inc.
;
; Licensed under the Apache License, Version 2.0 (the "License");
; you may not use this file except in compliance with the License.
; You may obtain a copy of the License at
;
;     http://www.apache.org/licenses/LICENSE-2.0
;
; Unless required by applicable law or agreed to in writing, software
; distributed under the License is distributed on an "AS IS" BASIS,
; WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
; See the License for the specific language governing permissions and
; limitations under the License.
;
; OpenCL is a trademark of Apple Inc. used under license by Khronos.

EXPORTS

;
; Note:
;
; 1. Functions are grouped into blocks according to the OpenCL API version they
;    were introduced in.
;
; 2. Function blocks are sorted in ascending order of the API version.
;
; 3. Functions within a block are sorted alphabetically.
; ; OpenCL 1.0 API clBuildProgram clCreateBuffer clCreateCommandQueue clCreateContext clCreateContextFromType clCreateFromGLBuffer clCreateFromGLRenderbuffer clCreateFromGLTexture2D clCreateFromGLTexture3D clCreateImage2D clCreateImage3D clCreateKernel clCreateKernelsInProgram clCreateProgramWithBinary clCreateProgramWithSource clCreateSampler clEnqueueAcquireGLObjects clEnqueueBarrier clEnqueueCopyBuffer clEnqueueCopyBufferToImage clEnqueueCopyImage clEnqueueCopyImageToBuffer clEnqueueMapBuffer clEnqueueMapImage clEnqueueMarker clEnqueueNDRangeKernel clEnqueueNativeKernel clEnqueueReadBuffer clEnqueueReadImage clEnqueueReleaseGLObjects clEnqueueTask clEnqueueUnmapMemObject clEnqueueWaitForEvents clEnqueueWriteBuffer clEnqueueWriteImage clFinish clFlush clGetCommandQueueInfo clGetContextInfo clGetDeviceIDs clGetDeviceInfo clGetEventInfo clGetEventProfilingInfo clGetExtensionFunctionAddress clGetGLObjectInfo clGetGLTextureInfo clGetImageInfo clGetKernelInfo clGetKernelWorkGroupInfo clGetMemObjectInfo clGetPlatformIDs clGetPlatformInfo clGetProgramBuildInfo clGetProgramInfo clGetSamplerInfo clGetSupportedImageFormats clReleaseCommandQueue clReleaseContext clReleaseEvent clReleaseKernel clReleaseMemObject clReleaseProgram clReleaseSampler clRetainCommandQueue clRetainContext clRetainEvent clRetainKernel clRetainMemObject clRetainProgram clRetainSampler clSetCommandQueueProperty clSetKernelArg clUnloadCompiler clWaitForEvents ; OpenCL 1.1 API clCreateSubBuffer clCreateUserEvent clEnqueueCopyBufferRect clEnqueueReadBufferRect clEnqueueWriteBufferRect clSetEventCallback clSetMemObjectDestructorCallback clSetUserEventStatus ; OpenCL 1.2 API clCompileProgram clCreateFromGLTexture clCreateImage clCreateProgramWithBuiltInKernels clCreateSubDevices clEnqueueBarrierWithWaitList clEnqueueFillBuffer clEnqueueFillImage clEnqueueMarkerWithWaitList clEnqueueMigrateMemObjects clGetExtensionFunctionAddressForPlatform clGetKernelArgInfo clLinkProgram clReleaseDevice clRetainDevice clUnloadPlatformCompiler ; OpenCL 2.0 API clCreateCommandQueueWithProperties clCreatePipe clCreateSamplerWithProperties clEnqueueSVMFree clEnqueueSVMMap clEnqueueSVMMemcpy clEnqueueSVMMemFill clEnqueueSVMUnmap clGetPipeInfo clSetKernelArgSVMPointer clSetKernelExecInfo clSVMAlloc clSVMFree ; OpenCL 2.1 API clCloneKernel clCreateProgramWithIL clEnqueueSVMMigrateMem clGetDeviceAndHostTimer clGetHostTimer clGetKernelSubGroupInfo clSetDefaultDeviceCommandQueue ; OpenCL 2.2 API clSetProgramReleaseCallback clSetProgramSpecializationConstant clr-rocm-5.7.1/opencl/khronos/icd/loader/windows/OpenCL.rc000066400000000000000000000041651450307266000233630ustar00rootroot00000000000000/* * Copyright (c) 2016-2020 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. * * OpenCL is a trademark of Apple Inc. used under license by Khronos. 
*/

#include <winver.h>

#define OPENCL_ICD_LOADER_VERSION_MAJOR 2
#define OPENCL_ICD_LOADER_VERSION_MINOR 2
#define OPENCL_ICD_LOADER_VERSION_REV   6

#ifdef RC_INVOKED

#define OPENCL_ICD_LOADER_VAL(_v) #_v
#define OPENCL_ICD_LOADER_TOSTRING(_d) OPENCL_ICD_LOADER_VAL(_d)
#define OPENCL_ICD_LOADER_VERSION_STRING \
    OPENCL_ICD_LOADER_TOSTRING(OPENCL_ICD_LOADER_VERSION_MAJOR) "." \
    OPENCL_ICD_LOADER_TOSTRING(OPENCL_ICD_LOADER_VERSION_MINOR) "." \
    OPENCL_ICD_LOADER_TOSTRING(OPENCL_ICD_LOADER_VERSION_REV)

VS_VERSION_INFO VERSIONINFO
FILEVERSION    OPENCL_ICD_LOADER_VERSION_MAJOR,OPENCL_ICD_LOADER_VERSION_MINOR,OPENCL_ICD_LOADER_VERSION_REV,0
PRODUCTVERSION OPENCL_ICD_LOADER_VERSION_MAJOR,OPENCL_ICD_LOADER_VERSION_MINOR,OPENCL_ICD_LOADER_VERSION_REV,0
FILETYPE       VFT_DLL
BEGIN
    BLOCK "StringFileInfo"
    BEGIN
        BLOCK "040904E4"
        BEGIN
            VALUE "FileDescription" ,"OpenCL Client DLL"
            VALUE "ProductName"     ,"Khronos OpenCL ICD Loader"
            VALUE "LegalCopyright"  ,"Copyright \251 The Khronos Group Inc 2016-2019"
            VALUE "FileVersion"     ,OPENCL_ICD_LOADER_VERSION_STRING ".0"
            VALUE "CompanyName"     ,"Khronos Group"
            VALUE "InternalName"    ,"OpenCL"
            VALUE "OriginalFilename","OpenCL.dll"
        END
    END
    BLOCK "VarFileInfo"
    BEGIN
        // extend this line for localized versions
        VALUE "Translation", 0x0409, 0x04E4
    END
END

#endif
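/*
 * Illustrative sketch (ours, not part of the loader): the VERSION_STRING above
 * relies on the classic two-step stringification idiom. A bare #_v would
 * stringify the macro NAME; routing the argument through a second macro forces
 * it to be expanded first. All names below are hypothetical.
 */
#if 0 /* documentation sketch only */
#define SKETCH_VAL(_v)      #_v
#define SKETCH_TOSTRING(_d) SKETCH_VAL(_d)
#define SKETCH_MAJOR        2
/* SKETCH_VAL(SKETCH_MAJOR)      -> "SKETCH_MAJOR"  (no expansion)              */
/* SKETCH_TOSTRING(SKETCH_MAJOR) -> "2"             (expanded, then stringified) */
static const char sketch_version[] = SKETCH_TOSTRING(SKETCH_MAJOR) ".0";
#endif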
clr-rocm-5.7.1/opencl/khronos/icd/loader/windows/icd_windows.c
/* Modifications Copyright(C) 2022 Advanced Micro Devices, Inc.
 * All rights reserved.
 */
/*
 * Copyright (c) 2016-2020 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */

#include "icd.h"
#include "icd_windows.h"
#include "icd_windows_hkr.h"
#include "icd_windows_dxgk.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <dxgi.h>

typedef HRESULT (WINAPI *PFN_CREATE_DXGI_FACTORY)(REFIID, void **);

static INIT_ONCE initialized = INIT_ONCE_STATIC_INIT;

typedef struct WinAdapter
{
    char * szName;
    LUID luid;
} WinAdapter;

const LUID ZeroLuid = { 0, 0 };

static WinAdapter* pWinAdapterBegin = NULL;
static WinAdapter* pWinAdapterEnd = NULL;
static WinAdapter* pWinAdapterCapacity = NULL;

BOOL adapterAdd(const char* szName, LUID luid)
{
    BOOL result = TRUE;
    // grow the adapter array by doubling when it is full
    if (pWinAdapterEnd == pWinAdapterCapacity)
    {
        size_t oldCapacity = pWinAdapterCapacity - pWinAdapterBegin;
        size_t newCapacity = oldCapacity;
        if (0 == newCapacity)
        {
            newCapacity = 1;
        }
        else if(newCapacity < UINT_MAX/2)
        {
            newCapacity *= 2;
        }

        WinAdapter* pNewBegin = malloc(newCapacity * sizeof(*pWinAdapterBegin));
        if (!pNewBegin)
            result = FALSE;
        else
        {
            if (pWinAdapterBegin)
            {
                memcpy(pNewBegin, pWinAdapterBegin, oldCapacity * sizeof(*pWinAdapterBegin));
                free(pWinAdapterBegin);
            }
            pWinAdapterCapacity = pNewBegin + newCapacity;
            pWinAdapterEnd = pNewBegin + oldCapacity;
            pWinAdapterBegin = pNewBegin;
        }
    }
    if (pWinAdapterEnd != pWinAdapterCapacity)
    {
        size_t nameLen = (strlen(szName) + 1)*sizeof(szName[0]);
        pWinAdapterEnd->szName = malloc(nameLen);
        if (!pWinAdapterEnd->szName)
            result = FALSE;
        else
        {
            memcpy(pWinAdapterEnd->szName, szName, nameLen);
            pWinAdapterEnd->luid = luid;
            ++pWinAdapterEnd;
        }
    }
    return result;
}

void adapterFree(WinAdapter *pWinAdapter)
{
    free(pWinAdapter->szName);
    pWinAdapter->szName = NULL;
}
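/*
 * Illustrative sketch (ours, not part of the loader): the registry shape
 * consumed by khrIcdOsVendorsEnumerate() below. A vendor installer creates a
 * REG_DWORD value of zero under HKLM\SOFTWARE\Khronos\OpenCL\Vendors whose
 * NAME is the path to the ICD DLL. The path "c:\\example\\vendor_ocl64.dll"
 * is a hypothetical placeholder.
 */
#if 0 /* documentation sketch only */
static void sketch_register_vendor(void)
{
    HKEY key = NULL;
    DWORD zero = 0;
    if (ERROR_SUCCESS == RegCreateKeyExA(HKEY_LOCAL_MACHINE,
            "SOFTWARE\\Khronos\\OpenCL\\Vendors", 0, NULL, 0,
            KEY_SET_VALUE, NULL, &key, NULL))
    {
        // value name = full path of the vendor ICD; data must be a zero DWORD,
        // which is exactly what the enumeration loop below checks for
        RegSetValueExA(key, "c:\\example\\vendor_ocl64.dll", 0, REG_DWORD,
                       (const BYTE *)&zero, sizeof(zero));
        RegCloseKey(key);
    }
}
#endif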
/*
 *
 * Vendor enumeration functions
 *
 */

// go through the list of vendors in the registry and call khrIcdVendorAdd
// for each vendor encountered
BOOL CALLBACK khrIcdOsVendorsEnumerate(PINIT_ONCE InitOnce, PVOID Parameter, PVOID *lpContext)
{
    LONG result;
    BOOL status = FALSE;
    const char* platformsName = "SOFTWARE\\Khronos\\OpenCL\\Vendors";
    HKEY platformsKey = NULL;
    DWORD dwIndex;

    khrIcdVendorsEnumerateEnv();

    status |= khrIcdOsVendorsEnumerateDXGK();
    if (!status)
    {
        KHR_ICD_TRACE("Failed to load via DXGK interface on RS4, continuing\n");
        status |= khrIcdOsVendorsEnumerateHKR();
        if (!status)
        {
            KHR_ICD_TRACE("Failed to enumerate HKR entries, continuing\n");
        }
    }

    KHR_ICD_TRACE("Opening key HKLM\\%s...\n", platformsName);
    result = RegOpenKeyExA(
        HKEY_LOCAL_MACHINE,
        platformsName,
        0,
        KEY_READ,
        &platformsKey);
    if (ERROR_SUCCESS != result)
    {
        KHR_ICD_TRACE("Failed to open platforms key %s, continuing\n", platformsName);
    }
    else
    {
        // for each value
        for (dwIndex = 0;; ++dwIndex)
        {
            char cszLibraryName[1024] = {0};
            DWORD dwLibraryNameSize = sizeof(cszLibraryName);
            DWORD dwLibraryNameType = 0;
            DWORD dwValue = 0;
            DWORD dwValueSize = sizeof(dwValue);

            // read the value name
            KHR_ICD_TRACE("Reading value %d...\n", dwIndex);
            result = RegEnumValueA(
                platformsKey,
                dwIndex,
                cszLibraryName,
                &dwLibraryNameSize,
                NULL,
                &dwLibraryNameType,
                (LPBYTE)&dwValue,
                &dwValueSize);
            // if RegEnumKeyEx fails, we are done with the enumeration
            if (ERROR_SUCCESS != result)
            {
                KHR_ICD_TRACE("Failed to read value %d, done reading key.\n", dwIndex);
                break;
            }
            KHR_ICD_TRACE("Value %s found...\n", cszLibraryName);

            // Require that the value be a DWORD and equal zero
            if (REG_DWORD != dwLibraryNameType)
            {
                KHR_ICD_TRACE("Value not a DWORD, skipping\n");
                continue;
            }
            if (dwValue)
            {
                KHR_ICD_TRACE("Value not zero, skipping\n");
                continue;
            }

            // add the library
            status |= adapterAdd(cszLibraryName, ZeroLuid);
        }
    }

    // Add adapters according to DXGI's preference order
    HMODULE hDXGI = LoadLibrary("dxgi.dll");
    if (hDXGI)
    {
        IDXGIFactory* pFactory = NULL;
        PFN_CREATE_DXGI_FACTORY pCreateDXGIFactory = (PFN_CREATE_DXGI_FACTORY)GetProcAddress(hDXGI, "CreateDXGIFactory");
        if (pCreateDXGIFactory)
        {
            HRESULT hr = pCreateDXGIFactory(&IID_IDXGIFactory, &pFactory);
            if (SUCCEEDED(hr))
            {
                UINT i = 0;
                IDXGIAdapter* pAdapter = NULL;
                while (SUCCEEDED(pFactory->lpVtbl->EnumAdapters(pFactory, i++, &pAdapter)))
                {
                    DXGI_ADAPTER_DESC AdapterDesc;
                    if (SUCCEEDED(pAdapter->lpVtbl->GetDesc(pAdapter, &AdapterDesc)))
                    {
                        for (WinAdapter* iterAdapter = pWinAdapterBegin; iterAdapter != pWinAdapterEnd; ++iterAdapter)
                        {
                            if (iterAdapter->luid.LowPart == AdapterDesc.AdapterLuid.LowPart
                                && iterAdapter->luid.HighPart == AdapterDesc.AdapterLuid.HighPart)
                            {
                                khrIcdVendorAdd(iterAdapter->szName);
                                break;
                            }
                        }
                    }
                    pAdapter->lpVtbl->Release(pAdapter);
                }
                pFactory->lpVtbl->Release(pFactory);
            }
        }
        // free the module handle even when CreateDXGIFactory wasn't found
        FreeLibrary(hDXGI);
    }

    // Go through the list again, putting any remaining adapters at the end of the list in an undefined order
    for (WinAdapter* iterAdapter = pWinAdapterBegin; iterAdapter != pWinAdapterEnd; ++iterAdapter)
    {
        khrIcdVendorAdd(iterAdapter->szName);
        adapterFree(iterAdapter);
    }
    free(pWinAdapterBegin);
    pWinAdapterBegin = NULL;
    pWinAdapterEnd = NULL;
    pWinAdapterCapacity = NULL;

    result = RegCloseKey(platformsKey);
    if (ERROR_SUCCESS != result)
    {
        KHR_ICD_TRACE("Failed to close platforms key %s, ignoring\n", platformsName);
    }
    KHRicdVendor *vendorIterator;
    for (vendorIterator = khrIcdVendors; vendorIterator; vendorIterator = vendorIterator->next)
    {
        if (vendorIterator->libName != NULL)
        {
            free(vendorIterator->libName);
            vendorIterator->libName = NULL;
        }
    }
    return status;
}

// go through the list of vendors only once
void khrIcdOsVendorsEnumerateOnce()
{
    InitOnceExecuteOnce(&initialized, khrIcdOsVendorsEnumerate, NULL, NULL);
}

/*
 *
 * Dynamic library loading functions
 *
 */

// dynamically load a library. returns NULL on failure
void *khrIcdOsLibraryLoad(const char *libraryName)
{
    return (void *)LoadLibraryA(libraryName);
}

// get a function pointer from a loaded library. returns NULL on failure.
void *khrIcdOsLibraryGetFunctionAddress(void *library, const char *functionName)
{
    if (!library || !functionName)
    {
        return NULL;
    }
    return GetProcAddress( (HMODULE)library, functionName);
}

// unload a library.
void khrIcdOsLibraryUnload(void *library)
{
    FreeLibrary( (HMODULE)library);
}
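/*
 * Illustrative sketch (ours): how the three primitives above are typically
 * combined. The real binding logic lives in khrIcdVendorAdd() (icd.c, not
 * shown in this section); this simplified version only shows the shape of the
 * flow. Per the Khronos ICD extension, a vendor ICD exposes
 * clGetExtensionFunctionAddress, through which the loader then looks up
 * clIcdGetPlatformIDsKHR.
 */
#if 0 /* documentation sketch only */
typedef void * (*pfn_clGetExtensionFunctionAddress)(const char *);

static void sketch_bind_vendor(const char *libraryName)
{
    void *library = khrIcdOsLibraryLoad(libraryName);
    if (!library) return;

    pfn_clGetExtensionFunctionAddress p = (pfn_clGetExtensionFunctionAddress)
        khrIcdOsLibraryGetFunctionAddress(library, "clGetExtensionFunctionAddress");
    if (!p)
    {
        // not a conformant ICD; unload it again
        khrIcdOsLibraryUnload(library);
    }
}
#endif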
clr-rocm-5.7.1/opencl/khronos/icd/loader/windows/icd_windows.h
/*
 * Copyright (c) 2017-2019 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */

#include <stdbool.h>
#include <windows.h>

extern const LUID ZeroLuid;

BOOL adapterAdd(const char* szName, LUID luid);

// Do not free the memory returned by this function.
const char* getOpenCLRegKeyName(void);
clr-rocm-5.7.1/opencl/khronos/icd/loader/windows/icd_windows_dxgk.c
/*
 * Copyright (c) 2017-2020 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */

#include "icd.h"
#include "icd_windows_dxgk.h"

#if defined(OPENCL_ICD_LOADER_REQUIRE_WDK)
#include <windows.h>

#ifndef NTSTATUS
typedef LONG NTSTATUS;
#define STATUS_SUCCESS ((NTSTATUS)0x00000000L)
#define STATUS_BUFFER_TOO_SMALL ((NTSTATUS)0xC0000023)
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)
#endif
#include <d3dkmthk.h>
#endif

bool khrIcdOsVendorsEnumerateDXGK(void)
{
    bool ret = false;
    int result = 0;
#if defined(OPENCL_ICD_LOADER_REQUIRE_WDK)
#if defined(DXGKDDI_INTERFACE_VERSION_WDDM2_4) && (DXGKDDI_INTERFACE_VERSION >= DXGKDDI_INTERFACE_VERSION_WDDM2_4)
    // Get handle to GDI Runtime
    HMODULE h = LoadLibrary("gdi32.dll");
    if (h == NULL)
        return ret;

    if(GetProcAddress((HMODULE)h, "D3DKMTSubmitPresentBltToHwQueue")) // OS Version check
    {
        D3DKMT_ADAPTERINFO* pAdapterInfo = NULL;
        D3DKMT_ENUMADAPTERS2 EnumAdapters;
        NTSTATUS Status = STATUS_SUCCESS;
        char cszLibraryName[MAX_PATH] = { 0 };

        EnumAdapters.NumAdapters = 0;
        EnumAdapters.pAdapters = NULL;

        PFND3DKMT_ENUMADAPTERS2 pEnumAdapters2 = (PFND3DKMT_ENUMADAPTERS2)GetProcAddress((HMODULE)h, "D3DKMTEnumAdapters2");
        if (!pEnumAdapters2)
        {
            KHR_ICD_TRACE("GetProcAddress failed for D3DKMT_ENUMADAPTERS2\n");
            goto out;
        }
        while (1)
        {
            EnumAdapters.NumAdapters = 0;
            EnumAdapters.pAdapters = NULL;
            Status = pEnumAdapters2(&EnumAdapters);
            if (Status == STATUS_BUFFER_TOO_SMALL)
            {
                // Number of Adapters increased between calls, retry;
                continue;
            }
            else if (!NT_SUCCESS(Status))
            {
                KHR_ICD_TRACE("D3DKMT_ENUMADAPTERS2 status != SUCCESS\n");
                goto out;
            }
            break;
        }
        pAdapterInfo = (D3DKMT_ADAPTERINFO*)malloc(sizeof(D3DKMT_ADAPTERINFO)*(EnumAdapters.NumAdapters));
        if (pAdapterInfo == NULL)
        {
            KHR_ICD_TRACE("Allocation failure for AdapterInfo buffer\n");
            goto out;
        }
        EnumAdapters.pAdapters = pAdapterInfo;
        Status = pEnumAdapters2(&EnumAdapters);
        if (!NT_SUCCESS(Status))
        {
            KHR_ICD_TRACE("D3DKMT_ENUMADAPTERS2 status != SUCCESS\n");
            goto out;
        }
        const char* cszOpenCLRegKeyName = getOpenCLRegKeyName();
        const int szOpenCLRegKeyName = (int)(strlen(cszOpenCLRegKeyName) + 1)*sizeof(cszOpenCLRegKeyName[0]);
        for (UINT AdapterIndex = 0; AdapterIndex < EnumAdapters.NumAdapters; AdapterIndex++)
        {
            D3DDDI_QUERYREGISTRY_INFO queryArgs = {0};
            D3DDDI_QUERYREGISTRY_INFO* pQueryArgs = &queryArgs;
            D3DDDI_QUERYREGISTRY_INFO* pQueryBuffer = NULL;
            queryArgs.QueryType = D3DDDI_QUERYREGISTRY_ADAPTERKEY;
            queryArgs.QueryFlags.TranslatePath = TRUE;
            queryArgs.ValueType = REG_SZ;
            result = MultiByteToWideChar(
                CP_ACP,
                0,
                cszOpenCLRegKeyName,
                szOpenCLRegKeyName,
                queryArgs.ValueName,
                ARRAYSIZE(queryArgs.ValueName));
            if (!result)
            {
                KHR_ICD_TRACE("MultiByteToWideChar status != SUCCESS\n");
                continue;
            }
            D3DKMT_QUERYADAPTERINFO queryAdapterInfo = {0};
            queryAdapterInfo.hAdapter = pAdapterInfo[AdapterIndex].hAdapter;
            queryAdapterInfo.Type = KMTQAITYPE_QUERYREGISTRY;
            queryAdapterInfo.pPrivateDriverData = &queryArgs;
            queryAdapterInfo.PrivateDriverDataSize = sizeof(queryArgs);
            Status = D3DKMTQueryAdapterInfo(&queryAdapterInfo);
            if (!NT_SUCCESS(Status))
            {
                // Try a different value type. Some vendors write the key as a multi-string type.
                queryArgs.ValueType = REG_MULTI_SZ;
                Status = D3DKMTQueryAdapterInfo(&queryAdapterInfo);
                if (NT_SUCCESS(Status))
                {
                    KHR_ICD_TRACE("Accepting multi-string registry key type\n");
                }
                else
                {
                    // Continue trying to get as much info on each adapter as possible.
                    // It's too late to return FALSE and claim WDDM2_4 enumeration is not available here.
                    continue;
                }
            }
            if (NT_SUCCESS(Status) && pQueryArgs->Status == D3DDDI_QUERYREGISTRY_STATUS_BUFFER_OVERFLOW)
            {
                ULONG queryBufferSize = sizeof(D3DDDI_QUERYREGISTRY_INFO) + queryArgs.OutputValueSize;
                pQueryBuffer = (D3DDDI_QUERYREGISTRY_INFO*)malloc(queryBufferSize);
                if (pQueryBuffer == NULL)
                    continue;
                memcpy(pQueryBuffer, &queryArgs, sizeof(D3DDDI_QUERYREGISTRY_INFO));
                queryAdapterInfo.pPrivateDriverData = pQueryBuffer;
                queryAdapterInfo.PrivateDriverDataSize = queryBufferSize;
                Status = D3DKMTQueryAdapterInfo(&queryAdapterInfo);
                pQueryArgs = pQueryBuffer;
            }
            if (NT_SUCCESS(Status) && pQueryArgs->Status == D3DDDI_QUERYREGISTRY_STATUS_SUCCESS)
            {
                wchar_t* pWchar = pQueryArgs->OutputString;
                memset(cszLibraryName, 0, sizeof(cszLibraryName));
                {
                    size_t len = wcstombs(cszLibraryName, pWchar, sizeof(cszLibraryName));
                    KHR_ICD_ASSERT(len == (sizeof(cszLibraryName) - 1));
                    ret |= adapterAdd(cszLibraryName, pAdapterInfo[AdapterIndex].AdapterLuid);
                }
            }
            else if (Status == STATUS_INVALID_PARAMETER && pQueryArgs->Status == D3DDDI_QUERYREGISTRY_STATUS_FAIL)
            {
                free(pQueryBuffer);
                goto out;
            }
            free(pQueryBuffer);
        }
    out:
        free(pAdapterInfo);
    }
    FreeLibrary(h);
#endif
#endif
    return ret;
}
clr-rocm-5.7.1/opencl/khronos/icd/loader/windows/icd_windows_dxgk.h
/*
 * Copyright (c) 2017-2019 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */

#include <stdbool.h>
#include "icd_windows.h"

bool khrIcdOsVendorsEnumerateDXGK(void);
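/*
 * Illustrative sketch (ours): the two-phase query pattern used by
 * khrIcdOsVendorsEnumerateDXGK() above, reduced to its essentials. First call
 * with a fixed-size request; if the runtime reports an overflow along with the
 * required size, allocate a larger buffer carrying the same request fields and
 * repeat. sketch_query() and sketch_info are hypothetical stand-ins for
 * D3DKMTQueryAdapterInfo and its argument block.
 */
#if 0 /* documentation sketch only */
#include <stdlib.h>

typedef struct { int status; unsigned neededBytes; /* ...payload... */ } sketch_info;
extern int sketch_query(sketch_info *info, unsigned size); /* hypothetical */

static void sketch_two_phase(void)
{
    sketch_info small = {0};
    if (sketch_query(&small, sizeof(small)) == 0 && small.status == 1 /* overflow */)
    {
        unsigned size = sizeof(sketch_info) + small.neededBytes;
        sketch_info *big = (sketch_info *)malloc(size);
        if (big)
        {
            *big = small;                  /* keep the original request fields */
            (void)sketch_query(big, size); /* second call fills the payload    */
            free(big);
        }
    }
}
#endif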
clr-rocm-5.7.1/opencl/khronos/icd/loader/windows/icd_windows_envvars.c
/*
 * Copyright (c) 2016-2019 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */

#include <windows.h>
#include <stdlib.h>
#include <stdbool.h>

char *khrIcd_getenv(const char *name) {
    char *retVal;
    DWORD valSize;

    valSize = GetEnvironmentVariableA(name, NULL, 0);

    // valSize DOES include the null terminator, so for any set variable
    // will always be at least 1. If it's 0, the variable wasn't set.
    if (valSize == 0) return NULL;

    // Allocate the space necessary for the registry entry
    retVal = (char *)malloc(valSize);

    if (NULL != retVal) {
        GetEnvironmentVariableA(name, retVal, valSize);
    }

    return retVal;
}

static bool khrIcd_IsHighIntegrityLevel()
{
    bool isHighIntegrityLevel = false;

    HANDLE processToken;
    if (OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY | TOKEN_QUERY_SOURCE, &processToken))
    {
        // Maximum possible size of SID_AND_ATTRIBUTES is maximum size of a SID + size of attributes DWORD.
        char mandatoryLabelBuffer[SECURITY_MAX_SID_SIZE + sizeof(DWORD)] = {0};
        DWORD bufferSize;

        if (GetTokenInformation(processToken, TokenIntegrityLevel, mandatoryLabelBuffer, sizeof(mandatoryLabelBuffer),
                                &bufferSize) != 0)
        {
            const TOKEN_MANDATORY_LABEL* mandatoryLabel = (const TOKEN_MANDATORY_LABEL*)(mandatoryLabelBuffer);
            const DWORD subAuthorityCount = *GetSidSubAuthorityCount(mandatoryLabel->Label.Sid);
            const DWORD integrityLevel = *GetSidSubAuthority(mandatoryLabel->Label.Sid, subAuthorityCount - 1);

            isHighIntegrityLevel = integrityLevel > SECURITY_MANDATORY_MEDIUM_RID;
        }

        CloseHandle(processToken);
    }

    return isHighIntegrityLevel;
}

char *khrIcd_secure_getenv(const char *name)
{
    if (khrIcd_IsHighIntegrityLevel())
    {
        KHR_ICD_TRACE("Running at a high integrity level, so secure_getenv is returning NULL\n");
        return NULL;
    }
    return khrIcd_getenv(name);
}

void khrIcd_free_getenv(char *val)
{
    free((void *)val);
}
clr-rocm-5.7.1/opencl/khronos/icd/loader/windows/icd_windows_hkr.c
/*
 * Copyright (c) 2017-2019 The Khronos Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * OpenCL is a trademark of Apple Inc. used under license by Khronos.
 */

#include "icd.h"
#include "icd_windows_hkr.h"
#include <windows.h>
#include "icd_windows_dxgk.h"
#include <cfgmgr32.h>
#include <initguid.h>
#include <devguid.h>
#include <devpkey.h>
#include <stdbool.h>

// This GUID was only added to devguid.h on Windows SDK v10.0.16232 which
// corresponds to Windows 10 Redstone 3 (Windows 10 Fall Creators Update).
DEFINE_GUID(OCL_GUID_DEVCLASS_SOFTWARECOMPONENT, 0x5c4c3332, 0x344d, 0x483c, 0x87, 0x39, 0x25, 0x9e, 0x93, 0x4c, 0x9c, 0xc8);

typedef enum
{
    ProbeFailure,
    PendingReboot,
    Valid
} DeviceProbeResult;

#define KHR_SAFE_RELEASE(mem) \
    do                        \
    {                         \
        free(mem);            \
        mem = NULL;           \
    } while (0)

static const char OPENCL_REG_SUB_KEY[] = "OpenCLDriverName";
#ifndef _WIN64
static const char OPENCL_REG_SUB_KEY_WOW[] = "OpenCLDriverNameWow";
#endif

// Do not free the memory returned by this function.
const char* getOpenCLRegKeyName(void)
{
#ifdef _WIN64
    return OPENCL_REG_SUB_KEY;
#else
    // The suffix/substring "WoW" is meaningful only when a 32-bit
    // application is running on a 64-bit Windows OS.
A 32-bit application // running on a 32-bit OS uses non-WoW names. BOOL is_wow64; if (IsWow64Process(GetCurrentProcess(), &is_wow64) && is_wow64) { return OPENCL_REG_SUB_KEY_WOW; } return OPENCL_REG_SUB_KEY; #endif } static bool ReadOpenCLKey(DEVINST dnDevNode) { HKEY hkey = 0; CONFIGRET ret; bool bRet = false; DWORD dwLibraryNameType = 0; char *cszOclPath = NULL; DWORD dwOclPathSize = 0; LSTATUS result; ret = CM_Open_DevNode_Key( dnDevNode, KEY_QUERY_VALUE, 0, RegDisposition_OpenExisting, &hkey, CM_REGISTRY_SOFTWARE); if (CR_SUCCESS != ret) { KHR_ICD_TRACE("Failed with ret 0x%x\n", ret); goto out; } else { result = RegQueryValueExA( hkey, getOpenCLRegKeyName(), NULL, &dwLibraryNameType, NULL, &dwOclPathSize); if (ERROR_SUCCESS != result) { KHR_ICD_TRACE("Failed to open sub key 0x%x\n", result); goto out; } cszOclPath = malloc(dwOclPathSize); if (NULL == cszOclPath) { KHR_ICD_TRACE("Failed to allocate %u bytes for registry value\n", dwOclPathSize); goto out; } result = RegQueryValueExA( hkey, getOpenCLRegKeyName(), NULL, &dwLibraryNameType, (LPBYTE)cszOclPath, &dwOclPathSize); if (ERROR_SUCCESS != result) { KHR_ICD_TRACE("Failed to open sub key 0x%x\n", result); goto out; } if (REG_SZ != dwLibraryNameType) { if (REG_MULTI_SZ == dwLibraryNameType) { KHR_ICD_TRACE("Accepting multi-string registry key type\n"); } else { KHR_ICD_TRACE("Unexpected registry entry 0x%x! continuing\n", dwLibraryNameType); goto out; } } KHR_ICD_TRACE(" Path: %s\n", cszOclPath); bRet |= adapterAdd(cszOclPath, ZeroLuid); } out: free(cszOclPath); if (hkey) { result = RegCloseKey(hkey); if (ERROR_SUCCESS != result) { KHR_ICD_TRACE("WARNING: failed to close hkey 0x%x\n", result); } } return bRet; } static DeviceProbeResult ProbeDevice(DEVINST devnode) { CONFIGRET ret; ULONG ulStatus; ULONG ulProblem; ret = CM_Get_DevNode_Status( &ulStatus, &ulProblem, devnode, 0); if (CR_SUCCESS != ret) { KHR_ICD_TRACE(" WARNING: failed to probe the status of the device 0x%x\n", ret); return ProbeFailure; } // // Careful here, we need to check 2 scenarios: // 1. DN_NEED_RESTART // status flag indicates that a reboot is needed when an _already started_ // device cannot be stopped. This covers devices that are still started with their // old KMD (because they couldn't be stopped/restarted) while the UMD is updated // and possibly out of sync. // // 2. Status & DN_HAS_PROBLEM && Problem == CM_PROB_NEED_RESTART // indicates that a reboot is needed when a _stopped device_ cannot be (re)started. // if (((ulStatus & DN_HAS_PROBLEM) && ulProblem == CM_PROB_NEED_RESTART) || ulStatus & DN_NEED_RESTART) { KHR_ICD_TRACE(" WARNING: device is pending reboot (0x%x), skipping...\n", ulStatus); return PendingReboot; } return Valid; } // Tries to look for the OpenCL key under the display devices and // if not found, falls back to software component devices. bool khrIcdOsVendorsEnumerateHKR(void) { CONFIGRET ret; int iret; bool foundOpenCLKey = false; DEVINST devinst = 0; DEVINST devchild = 0; wchar_t *deviceIdList = NULL; ULONG szBuffer = 0; OLECHAR display_adapter_guid_str[MAX_GUID_STRING_LEN]; ULONG ulFlags = CM_GETIDLIST_FILTER_CLASS | CM_GETIDLIST_FILTER_PRESENT; iret = StringFromGUID2( &GUID_DEVCLASS_DISPLAY, display_adapter_guid_str, MAX_GUID_STRING_LEN); if (MAX_GUID_STRING_LEN != iret) { KHR_ICD_TRACE("StringFromGUID2 failed with %d\n", iret); goto out; } // Paranoia: we might have a new device added to the list between the call // to CM_Get_Device_ID_List_Size() and the call to CM_Get_Device_ID_List(). 
do { ret = CM_Get_Device_ID_List_SizeW( &szBuffer, display_adapter_guid_str, ulFlags); if (CR_SUCCESS != ret) { KHR_ICD_TRACE("CM_Get_Device_ID_List_size failed with 0x%x\n", ret); break; } // "pulLen [out] Receives a value representing the required buffer // size, in characters." // So we need to allocate the right size in bytes but we still need // to keep szBuffer as it was returned from CM_Get_Device_ID_List_Size so // the call to CM_Get_Device_ID_List will receive the correct size. deviceIdList = malloc(szBuffer * sizeof(wchar_t)); if (NULL == deviceIdList) { KHR_ICD_TRACE("Failed to allocate %u bytes for device ID strings\n", szBuffer); break; } ret = CM_Get_Device_ID_ListW( display_adapter_guid_str, deviceIdList, szBuffer, ulFlags); if (CR_SUCCESS != ret) { KHR_ICD_TRACE("CM_Get_Device_ID_List failed with 0x%x\n", ret); KHR_SAFE_RELEASE(deviceIdList); } } while (CR_BUFFER_SMALL == ret); if (NULL == deviceIdList) { goto out; } for (PWSTR deviceId = deviceIdList; *deviceId; deviceId += wcslen(deviceId) + 1) { DEVPROPTYPE devpropType; KHR_ICD_WIDE_TRACE(L"Device ID: %ls\n", deviceId); ret = CM_Locate_DevNodeW(&devinst, deviceId, 0); if (CR_SUCCESS == ret) { KHR_ICD_TRACE(" devinst: %d\n", devinst); } else { KHR_ICD_TRACE("CM_Locate_DevNode failed with 0x%x\n", ret); continue; } if (ProbeDevice(devinst) != Valid) { continue; } KHR_ICD_TRACE(" Trying to look for the key in the display adapter HKR...\n"); if (ReadOpenCLKey(devinst)) { foundOpenCLKey = true; continue; } KHR_ICD_TRACE(" Could not find the key, proceeding to children software components...\n"); ret = CM_Get_Child( &devchild, devinst, 0); if (CR_SUCCESS != ret) { KHR_ICD_TRACE(" CM_Get_Child returned 0x%x, skipping children...\n", ret); } else { do { wchar_t deviceInstanceID[MAX_DEVICE_ID_LEN] = { 0 }; GUID guid; ULONG szGuid = sizeof(guid); KHR_ICD_TRACE(" devchild: %d\n", devchild); ret = CM_Get_Device_IDW( devchild, deviceInstanceID, sizeof(deviceInstanceID), 0); if (CR_SUCCESS != ret) { KHR_ICD_TRACE(" CM_Get_Device_ID returned 0x%x, skipping device...\n", ret); continue; } else { KHR_ICD_WIDE_TRACE(L" deviceInstanceID: %ls\n", deviceInstanceID); } ret = CM_Get_DevNode_PropertyW( devchild, &DEVPKEY_Device_ClassGuid, &devpropType, (PBYTE)&guid, &szGuid, 0); if (CR_SUCCESS != ret || !IsEqualGUID(&OCL_GUID_DEVCLASS_SOFTWARECOMPONENT, &guid)) { continue; } if (ProbeDevice(devchild) != Valid) { continue; } if (ReadOpenCLKey(devchild)) { foundOpenCLKey = true; break; } } while (CM_Get_Sibling(&devchild, devchild, 0) == CR_SUCCESS); } } out: free(deviceIdList); return foundOpenCLKey; } clr-rocm-5.7.1/opencl/khronos/icd/loader/windows/icd_windows_hkr.h000066400000000000000000000014101450307266000252310ustar00rootroot00000000000000/* * Copyright (c) 2017-2019 The Khronos Group Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. * * OpenCL is a trademark of Apple Inc. used under license by Khronos. 
*/

#include <stdbool.h>
#include "icd_windows.h"

bool khrIcdOsVendorsEnumerateHKR(void);
clr-rocm-5.7.1/opencl/khronos/icd/test/
clr-rocm-5.7.1/opencl/khronos/icd/test/CMakeLists.txt
include_directories (./inc)

add_subdirectory (log)
add_subdirectory (driver_stub)
add_subdirectory (loader_test)

add_test (NAME opencl_icd_loader_test COMMAND icd_loader_test)
clr-rocm-5.7.1/opencl/khronos/icd/test/driver_stub/
clr-rocm-5.7.1/opencl/khronos/icd/test/driver_stub/CMakeLists.txt
set (OPENCL_DRIVER_STUB_SOURCES cl.c cl_ext.c cl_gl.c icd.c)

if (NOT "${CMAKE_SYSTEM_NAME}" STREQUAL "Linux")
    list (APPEND OPENCL_DRIVER_STUB_SOURCES driver_stub.def)
endif ()

add_library (OpenCLDriverStub ${OPENCL_DRIVER_STUB_SOURCES})
target_link_libraries (OpenCLDriverStub IcdLog)
clr-rocm-5.7.1/opencl/khronos/icd/test/driver_stub/cl.c
/* Modifications Copyright(C) 2022 Advanced Micro Devices, Inc.
 * All rights reserved.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef CL_USE_DEPRECATED_OPENCL_1_0_APIS
#define CL_USE_DEPRECATED_OPENCL_1_0_APIS
#endif

#ifndef CL_USE_DEPRECATED_OPENCL_1_1_APIS
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#endif

// Need to rename all CL API functions to prevent ICD loader functions calling
// themselves via the dispatch table. Include this before cl headers.
#include "rename_api.h"

#include <CL/cl.h>
#include <CL/cl_gl.h>

#include "icd_structs.h"

#define CL_PLATFORM_ICD_SUFFIX_KHR 0x0920

CL_API_ENTRY cl_int CL_API_CALL
clIcdGetPlatformIDsKHR(cl_uint, cl_platform_id *, cl_uint *);

struct _cl_platform_id
{
    CLIicdDispatchTable* dispatch;
    const char *profile;
    const char *version;
    const char *name;
    const char *vendor;
    const char *extensions;
    const char *suffix;
};

struct _cl_device_id
{
    CLIicdDispatchTable* dispatch;
};

struct _cl_context
{
    CLIicdDispatchTable* dispatch;
};

struct _cl_command_queue
{
    CLIicdDispatchTable* dispatch;
};

struct _cl_mem
{
    CLIicdDispatchTable* dispatch;
};

struct _cl_program
{
    CLIicdDispatchTable* dispatch;
};

struct _cl_kernel
{
    CLIicdDispatchTable* dispatch;
};

struct _cl_event
{
    CLIicdDispatchTable* dispatch;
};

struct _cl_sampler
{
    CLIicdDispatchTable* dispatch;
};

static CLIicdDispatchTable* dispatchTable = NULL;
static cl_platform_id platform = NULL;
static cl_bool initialized = CL_FALSE;
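/*
 * Illustrative sketch (ours, not part of the stub): why every struct above
 * begins with a CLIicdDispatchTable* as its first member. The ICD loader
 * treats any CL object as a pointer whose first pointer-sized field is the
 * vendor's dispatch table and forwards every API call through it. The member
 * layout and names below are hypothetical.
 */
#if 0 /* documentation sketch only */
typedef struct
{
    cl_int (CL_API_CALL *clRetainContext)(cl_context); /* hypothetical entry */
} sketch_dispatch;

static cl_int sketch_loader_clRetainContext(cl_context context)
{
    // the loader only dereferences the first pointer-sized field of the object
    return (*(sketch_dispatch **)context)->clRetainContext(context);
}
#endif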
CL_API_ENTRY cl_int CL_API_CALL
clGetPlatformIDs(cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms) CL_API_SUFFIX__VERSION_1_0
{
    cl_int return_value = CL_OUT_OF_RESOURCES;
    test_icd_stub_log("clGetPlatformIDs(%u, %p, %p)\n", num_entries, platforms, num_platforms);
    return_value = clIcdGetPlatformIDsKHR(num_entries, platforms, num_platforms);
    test_icd_stub_log("Value returned: %d\n", return_value);
    return return_value;
}

CL_API_ENTRY cl_int CL_API_CALL
clGetPlatformInfo(cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0
{
    cl_int ret = CL_SUCCESS;
    const char *returnString = NULL;
    size_t returnStringLength = 0;

    /*test_icd_stub_log("clGetPlatformInfo(%p, %u, %u, %p, %p)\n",
                        platform, param_name, param_value_size, param_value, param_value_size_ret);*/

    // validate the arguments
    if (param_value_size == 0 && param_value != NULL)
    {
        ret = CL_INVALID_VALUE;
        goto done;
    }
    // select the string to return
    switch(param_name)
    {
    case CL_PLATFORM_PROFILE:
        returnString = platform->profile;
        break;
    case CL_PLATFORM_VERSION:
        returnString = platform->version;
        break;
    case CL_PLATFORM_NAME:
        returnString = platform->name;
        break;
    case CL_PLATFORM_VENDOR:
        returnString = platform->vendor;
        break;
    case CL_PLATFORM_EXTENSIONS:
        returnString = platform->extensions;
        break;
    case CL_PLATFORM_ICD_SUFFIX_KHR:
        returnString = platform->suffix;
        break;
    default:
        ret = CL_INVALID_VALUE;
        goto done;
    }

    // make sure the buffer passed in is big enough for the result
    returnStringLength = strlen(returnString)+1;
    if (param_value_size && param_value_size < returnStringLength)
    {
        ret = CL_INVALID_VALUE;
        goto done;
    }

    // pass the data back to the user
    if (param_value)
    {
        memcpy(param_value, returnString, returnStringLength);
    }
    if (param_value_size_ret)
    {
        *param_value_size_ret = returnStringLength;
    }

done:
    /*test_icd_stub_log("Value returned: %d\n", return_value);*/
    return ret;
}

/* Device APIs */
CL_API_ENTRY cl_int CL_API_CALL
clGetDeviceIDs(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_0
{
    cl_int ret = CL_SUCCESS;

    // num_entries is unsigned, so the original (num_entries > 1 || num_entries < 0)
    // test could never see a negative value; this stub exposes exactly one
    // device, so reject any other count when a devices buffer is supplied
    if (num_entries != 1 && devices != NULL)
    {
        ret = CL_INVALID_VALUE;
        goto done;
    }

    if (devices != NULL)
    {
        cl_device_id obj = (cl_device_id) malloc(sizeof(*obj));
        obj->dispatch = dispatchTable;
        devices[0] = obj;
    }
    if (num_devices)
    {
        *num_devices = 1;
    }

done:
    test_icd_stub_log("clGetDeviceIDs(%p, %x, %u, %p, %p)\n", platform, device_type, num_entries, devices, num_devices);
    test_icd_stub_log("Value returned: %d\n", ret);
    return ret;
}

CL_API_ENTRY cl_int CL_API_CALL
clGetDeviceInfo(cl_device_id device, cl_device_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0
{
    cl_int return_value = CL_OUT_OF_RESOURCES;
    test_icd_stub_log("clGetDeviceInfo(%p, %u, %u, %p, %p)\n", device, param_name, param_value_size, param_value, param_value_size_ret);
    test_icd_stub_log("Value returned: %d\n", return_value);
    return return_value;
}

CL_API_ENTRY cl_int CL_API_CALL
clCreateSubDevices(cl_device_id in_device, const cl_device_partition_property *properties, cl_uint num_entries, cl_device_id *out_devices, cl_uint *num_devices) CL_API_SUFFIX__VERSION_1_2
{
    cl_int return_value = CL_OUT_OF_RESOURCES;
    test_icd_stub_log("clCreateSubDevices(%p, %p, %u, %p, %p)\n", in_device, properties, num_entries, out_devices, num_devices);
    test_icd_stub_log("Value returned: %d\n", return_value);
    return return_value;
}

CL_API_ENTRY cl_int CL_API_CALL
clRetainDevice(cl_device_id device) CL_API_SUFFIX__VERSION_1_2
{
    cl_int return_value = CL_OUT_OF_RESOURCES;
    test_icd_stub_log("clRetainDevice(%p)\n", device);
    test_icd_stub_log("Value returned: %d\n", return_value);
    return return_value;
}

CL_API_ENTRY cl_int CL_API_CALL
clReleaseDevice(cl_device_id device) CL_API_SUFFIX__VERSION_1_2
{
    cl_int return_value = CL_OUT_OF_RESOURCES;
    test_icd_stub_log("clReleaseDevice(%p)\n", device);
    test_icd_stub_log("Value returned: %d\n", return_value);
    return return_value;
}

/* Context APIs */
CL_API_ENTRY cl_context CL_API_CALL
clCreateContext(const cl_context_properties * properties, cl_uint num_devices, const cl_device_id * devices, void (CL_CALLBACK * pfn_notify)(const char *, const void *, size_t, void *), void * user_data, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0
{
    cl_context obj = (cl_context)
malloc(sizeof(struct _cl_context)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateContext(%p, %u, %p, %p, %p, %p)\n", properties, num_devices, devices, pfn_notify, user_data, errcode_ret); pfn_notify(NULL, NULL, 0, NULL); test_icd_stub_log("createcontext_callback(%p, %p, %u, %p)\n", NULL, NULL, 0, NULL); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_context CL_API_CALL clCreateContextFromType(const cl_context_properties * properties, cl_device_type device_type, void (CL_CALLBACK * pfn_notify)(const char *, const void *, size_t, void *), void * user_data, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { cl_context obj = (cl_context) malloc(sizeof(struct _cl_context)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateContextFromType(%p, %x, %p, %p, %p)\n", properties, device_type, pfn_notify, user_data, errcode_ret); pfn_notify(NULL, NULL, 0, NULL); test_icd_stub_log ("createcontext_callback(%p, %p, %u, %p)\n", NULL, NULL, 0, NULL); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_int CL_API_CALL clRetainContext(cl_context context) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clRetainContext(%p)\n", context); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clReleaseContext(cl_context context) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clReleaseContext(%p)\n", context); free(context); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clGetContextInfo(cl_context context, cl_context_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clGetContextInfo(%p, %u, %u, %p, %p)\n", context, param_name, param_value_size, param_value, param_value_size_ret); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } /* Command Queue APIs */ CL_API_ENTRY cl_command_queue CL_API_CALL clCreateCommandQueue(cl_context context, cl_device_id device, cl_command_queue_properties properties, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { cl_command_queue obj = (cl_command_queue) malloc(sizeof(struct _cl_command_queue)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateCommandQueue(%p, %p, %x, %p)\n", context, device, properties, errcode_ret); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_int CL_API_CALL clSetCommandQueueProperty(cl_command_queue command_queue , cl_command_queue_properties properties , cl_bool enable , cl_command_queue_properties * old_properties) CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clSetCommandQueueProperty(%p, %p, %u, %p)\n", command_queue, properties, enable, old_properties); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clRetainCommandQueue(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clRetainCommandQueue(%p)\n", command_queue); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clReleaseCommandQueue(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; 
test_icd_stub_log("clReleaseCommandQueue(%p)\n", command_queue); free(command_queue); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clGetCommandQueueInfo(cl_command_queue command_queue , cl_command_queue_info param_name , size_t param_value_size , void * param_value , size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clGetCommandQueueInfo(%p, %u, %u, %p, %p)\n", command_queue, param_name, param_value_size, param_value, param_value_size_ret); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } /* Memory Object APIs */ CL_API_ENTRY cl_mem CL_API_CALL clCreateBuffer(cl_context context , cl_mem_flags flags , size_t size , void * host_ptr , cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { cl_mem obj = (cl_mem) malloc(sizeof(struct _cl_mem)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateBuffer(%p, %x, %u, %p, %p)\n", context, flags, size, host_ptr, errcode_ret); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_mem CL_API_CALL clCreateSubBuffer(cl_mem buffer , cl_mem_flags flags , cl_buffer_create_type buffer_create_type , const void * buffer_create_info , cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_1 { cl_mem obj = (cl_mem) malloc(sizeof(struct _cl_mem)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateSubBuffer(%p, %x, %u, %p, %p)\n", buffer, flags, buffer_create_type, buffer_create_info, errcode_ret); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_mem CL_API_CALL clCreateImage(cl_context context, cl_mem_flags flags, const cl_image_format * image_format, const cl_image_desc * image_desc, void * host_ptr, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2 { cl_mem obj = (cl_mem) malloc(sizeof(struct _cl_mem)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateImage(%p, %x, %p, %p, %p, %p)\n", context, flags, image_format, image_desc, host_ptr, errcode_ret); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_mem CL_API_CALL clCreateImage2D(cl_context context , cl_mem_flags flags , const cl_image_format * image_format , size_t image_width , size_t image_height , size_t image_row_pitch , void * host_ptr , cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { cl_mem obj = (cl_mem) malloc(sizeof(struct _cl_mem)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateImage2D(%p, %x, %p, %u, %u, %u, %p, %p)\n", context, flags, image_format, image_width, image_height, image_row_pitch, host_ptr, errcode_ret); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_mem CL_API_CALL clCreateImage3D(cl_context context, cl_mem_flags flags, const cl_image_format * image_format, size_t image_width, size_t image_height , size_t image_depth , size_t image_row_pitch , size_t image_slice_pitch , void * host_ptr , cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { cl_mem obj = (cl_mem) malloc(sizeof(struct _cl_mem)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateImage3D(%p, %x, %p, %u, %u, %u, %u, %u, %p, %p)\n", context, flags, image_format, image_width, image_height, image_depth, image_row_pitch, image_slice_pitch, host_ptr, errcode_ret); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_int CL_API_CALL clRetainMemObject(cl_mem memobj) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clRetainMemObject(%p)\n", memobj); 
test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clReleaseMemObject(cl_mem memobj) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clReleaseMemObject(%p)\n", memobj); free(memobj); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clGetSupportedImageFormats(cl_context context, cl_mem_flags flags, cl_mem_object_type image_type , cl_uint num_entries , cl_image_format * image_formats , cl_uint * num_image_formats) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clGetSupportedImageFormats(%p, %x, %u, %u, %p, %p)\n", context, flags, image_type, num_entries, image_formats, num_image_formats); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clGetMemObjectInfo(cl_mem memobj , cl_mem_info param_name , size_t param_value_size , void * param_value , size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clGetMemObjectInfo(%p, %u, %u, %p, %p)\n", memobj, param_name, param_value_size, param_value, param_value_size_ret); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clGetImageInfo(cl_mem image , cl_image_info param_name , size_t param_value_size , void * param_value , size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clGetImageInfo(%p, %u, %u, %p, %p)\n", image, param_name, param_value_size, param_value, param_value_size_ret); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clSetMemObjectDestructorCallback(cl_mem memobj , void (CL_CALLBACK * pfn_notify)(cl_mem memobj , void* user_data), void * user_data) CL_API_SUFFIX__VERSION_1_1 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clSetMemObjectDestructorCallback(%p, %p, %p)\n", memobj, pfn_notify, user_data); pfn_notify(memobj, NULL); test_icd_stub_log("setmemobjectdestructor_callback(%p, %p)\n", memobj, NULL); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } /* Sampler APIs */ CL_API_ENTRY cl_sampler CL_API_CALL clCreateSampler(cl_context context , cl_bool normalized_coords , cl_addressing_mode addressing_mode , cl_filter_mode filter_mode , cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { cl_sampler obj = (cl_sampler) malloc(sizeof(struct _cl_sampler)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateSampler(%p, %u, %u, %u, %p)\n", context, normalized_coords, addressing_mode, filter_mode, errcode_ret); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_int CL_API_CALL clRetainSampler(cl_sampler sampler) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clRetainSampler(%p)\n", sampler); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clReleaseSampler(cl_sampler sampler) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clReleaseSampler(%p)\n", sampler); free(sampler); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clGetSamplerInfo(cl_sampler sampler , cl_sampler_info param_name , size_t param_value_size , void * param_value , 
size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clGetSamplerInfo(%p, %u, %u, %p, %p)\n", sampler, param_name, param_value_size, param_value, param_value_size_ret); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } /* Program Object APIs */ CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithSource(cl_context context , cl_uint count , const char ** strings , const size_t * lengths , cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { cl_program obj = (cl_program) malloc(sizeof(struct _cl_program)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateProgramWithSource(%p, %u, %p, %p, %p)\n", context, count, strings, lengths, errcode_ret); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBinary(cl_context context , cl_uint num_devices , const cl_device_id * device_list , const size_t * lengths , const unsigned char ** binaries , cl_int * binary_status , cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { cl_program obj = (cl_program) malloc(sizeof(struct _cl_program)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateProgramWithBinary(%p, %u, %p, %p, %p, %p, %p)\n", context, num_devices, device_list, lengths, binaries, binary_status, errcode_ret); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBuiltInKernels(cl_context context , cl_uint num_devices , const cl_device_id * device_list , const char * kernel_names , cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2 { cl_program obj = (cl_program) malloc(sizeof(struct _cl_program)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateProgramWithBuiltInKernels(%p, %u, %p, %p, %p)\n", context, num_devices, device_list, kernel_names, errcode_ret); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_int CL_API_CALL clRetainProgram(cl_program program) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clRetainProgram(%p)\n", program); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clReleaseProgram(cl_program program) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clReleaseProgram(%p)\n", program); free(program); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clBuildProgram(cl_program program , cl_uint num_devices , const cl_device_id * device_list , const char * options , void (CL_CALLBACK * pfn_notify)(cl_program program , void * user_data), void * user_data) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clBuildProgram(%p, %u, %p, %p, %p, %p)\n", program, num_devices, device_list, options, pfn_notify, user_data); pfn_notify(program, NULL); test_icd_stub_log("program_callback(%p, %p)\n", program, NULL); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clUnloadCompiler(void) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clUnloadCompiler()\n"); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clCompileProgram(cl_program program , cl_uint num_devices , const cl_device_id * device_list , const char * options , cl_uint num_input_headers , const 
cl_program * input_headers, const char ** header_include_names, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data) CL_API_SUFFIX__VERSION_1_2
{
    cl_int return_value = CL_OUT_OF_RESOURCES;
    test_icd_stub_log("clCompileProgram(%p, %u, %p, %p, %u, %p, %p, %p)\n",
                      program, num_devices, device_list, options, num_input_headers,
                      header_include_names, pfn_notify, user_data);
    pfn_notify(program, NULL);
    test_icd_stub_log("program_callback(%p, %p)\n", program, NULL);
    test_icd_stub_log("Value returned: %d\n", return_value);
    return return_value;
}

CL_API_ENTRY cl_program CL_API_CALL
clLinkProgram(cl_context context, cl_uint num_devices, const cl_device_id * device_list, const char * options, cl_uint num_input_programs, const cl_program * input_programs, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2
{
    // allocate the full object, as the other creation stubs do;
    // sizeof(cl_program) would only be the size of the handle pointer
    cl_program obj = (cl_program) malloc(sizeof(struct _cl_program));
    obj->dispatch = dispatchTable;
    test_icd_stub_log("clLinkProgram(%p, %u, %p, %p, %u, %p, %p, %p, %p)\n",
                      context, num_devices, device_list, options, num_input_programs,
                      input_programs, pfn_notify, user_data, errcode_ret);
    pfn_notify(obj, NULL);
    test_icd_stub_log("program_callback(%p, %p)\n", obj, NULL);
    test_icd_stub_log("Value returned: %p\n", obj);
    return obj;
}

CL_API_ENTRY cl_int CL_API_CALL
clUnloadPlatformCompiler(cl_platform_id platform) CL_API_SUFFIX__VERSION_1_2
{
    cl_int return_value = CL_OUT_OF_RESOURCES;
    test_icd_stub_log("clUnloadPlatformCompiler(%p)\n", platform);
    test_icd_stub_log("Value returned: %d\n", return_value);
    return return_value;
}

CL_API_ENTRY cl_int CL_API_CALL
clGetProgramInfo(cl_program program, cl_program_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0
{
    cl_int return_value = CL_OUT_OF_RESOURCES;
    test_icd_stub_log("clGetProgramInfo(%p, %u, %u, %p, %p)\n", program, param_name, param_value_size, param_value, param_value_size_ret);
    test_icd_stub_log("Value returned: %d\n", return_value);
    return return_value;
}

CL_API_ENTRY cl_int CL_API_CALL
clGetProgramBuildInfo(cl_program program, cl_device_id device, cl_program_build_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0
{
    cl_int return_value = CL_OUT_OF_RESOURCES;
    test_icd_stub_log("clGetProgramBuildInfo(%p, %p, %u, %u, %p, %p)\n", program, device, param_name, param_value_size, param_value, param_value_size_ret);
    test_icd_stub_log("Value returned: %d\n", return_value);
    return return_value;
}

/* Kernel Object APIs */
CL_API_ENTRY cl_kernel CL_API_CALL
clCreateKernel(cl_program program, const char * kernel_name, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0
{
    cl_kernel obj = (cl_kernel) malloc(sizeof(struct _cl_kernel));
    obj->dispatch = dispatchTable;
    test_icd_stub_log("clCreateKernel(%p, %p, %p)\n", program, kernel_name, errcode_ret);
    test_icd_stub_log("Value returned: %p\n", obj);
    return obj;
}

CL_API_ENTRY cl_int CL_API_CALL
clCreateKernelsInProgram(cl_program program, cl_uint num_kernels, cl_kernel * kernels, cl_uint * num_kernels_ret) CL_API_SUFFIX__VERSION_1_0
{
    cl_int return_value = CL_OUT_OF_RESOURCES;
    test_icd_stub_log("clCreateKernelsInProgram(%p, %u, %p, %p)\n", program, num_kernels, kernels, num_kernels_ret);
    test_icd_stub_log("Value returned: %d\n", return_value);
    return return_value;
}

CL_API_ENTRY cl_int CL_API_CALL
clRetainKernel(cl_kernel kernel)
CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clRetainKernel(%p)\n", kernel); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clReleaseKernel(cl_kernel kernel) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clReleaseKernel(%p)\n", kernel); free(kernel); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clSetKernelArg(cl_kernel kernel , cl_uint arg_index , size_t arg_size , const void * arg_value) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clSetKernelArg(%p, %u, %u, %p)\n", kernel, arg_index, arg_size, arg_value); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clGetKernelInfo(cl_kernel kernel , cl_kernel_info param_name , size_t param_value_size , void * param_value , size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clGetKernelInfo(%p, %u, %u, %p, %p)\n", kernel, param_name, param_value_size, param_value, param_value_size_ret); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clGetKernelArgInfo(cl_kernel kernel , cl_uint arg_indx , cl_kernel_arg_info param_name , size_t param_value_size , void * param_value , size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_2 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clGetKernelArgInfo(%p, %u, %u, %u, %p, %p)\n", kernel, arg_indx, param_name, param_value_size, param_value, param_value_size_ret); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clGetKernelWorkGroupInfo(cl_kernel kernel , cl_device_id device , cl_kernel_work_group_info param_name , size_t param_value_size , void * param_value , size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clGetKernelWorkGroupInfo(%p, %p, %u, %u, %p, %p)\n", kernel, device, param_name, param_value_size, param_value, param_value_size_ret); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } /* Event Object APIs */ CL_API_ENTRY cl_int CL_API_CALL clWaitForEvents(cl_uint num_events , const cl_event * event_list) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clWaitForEvents(%u, %p)\n", num_events, event_list); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clGetEventInfo(cl_event event , cl_event_info param_name , size_t param_value_size , void * param_value , size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clGetEventInfo(%p, %u, %u, %p, %p)\n", event, param_name, param_value_size, param_value, param_value_size_ret); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_event CL_API_CALL clCreateUserEvent(cl_context context , cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_1 { cl_event obj = (cl_event) malloc(sizeof(struct _cl_event)); obj->dispatch = dispatchTable; test_icd_stub_log("clCreateUserEvent(%p, %p)\n", context, errcode_ret); test_icd_stub_log("Value returned: %p\n", obj); return obj; } CL_API_ENTRY cl_int CL_API_CALL clRetainEvent(cl_event event) 
CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clRetainEvent(%p)\n", event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clReleaseEvent(cl_event event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clReleaseEvent(%p)\n", event); free(event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clSetUserEventStatus(cl_event event , cl_int execution_status) CL_API_SUFFIX__VERSION_1_1 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clSetUserEventStatus(%p, %d)\n", event, execution_status); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clSetEventCallback(cl_event event , cl_int command_exec_callback_type , void (CL_CALLBACK * pfn_notify)(cl_event, cl_int, void *), void * user_data) CL_API_SUFFIX__VERSION_1_1 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clSetEventCallback(%p, %d, %p, %p)\n", event, command_exec_callback_type, pfn_notify, user_data); pfn_notify(event, command_exec_callback_type, NULL); test_icd_stub_log("setevent_callback(%p, %d, %p)\n", event, command_exec_callback_type, NULL); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } /* Profiling APIs */ CL_API_ENTRY cl_int CL_API_CALL clGetEventProfilingInfo(cl_event event , cl_profiling_info param_name , size_t param_value_size , void * param_value , size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clGetEventProfilingInfo(%p, %u, %u, %p, %p)\n", event, param_name, param_value_size, param_value, param_value_size_ret); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } /* Flush and Finish APIs */ CL_API_ENTRY cl_int CL_API_CALL clFlush(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clFlush(%p)\n", command_queue); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clFinish(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clFinish(%p)\n", command_queue); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } /* Enqueued Commands APIs */ CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBuffer(cl_command_queue command_queue , cl_mem buffer , cl_bool blocking_read , size_t offset , size_t cb , void * ptr , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueReadBuffer(%p, %p, %u, %u, %u, %p, %u, %p, %p)\n", command_queue, buffer, blocking_read, offset, cb, ptr, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBufferRect(cl_command_queue command_queue , cl_mem buffer , cl_bool blocking_read , const size_t * buffer_origin , const size_t * host_origin , const size_t * region , size_t buffer_row_pitch , size_t buffer_slice_pitch , size_t host_row_pitch , size_t host_slice_pitch , void * ptr , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_1 { cl_int 
return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueReadBufferRect(%p, %p, %u, %p, %p, %p, %u, %u, %u, %u, %p, %u, %p, %p)\n", command_queue, buffer, blocking_read, buffer_origin, host_origin, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBuffer(cl_command_queue command_queue , cl_mem buffer , cl_bool blocking_write , size_t offset , size_t cb , const void * ptr , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueWriteBuffer(%p, %p, %u, %u, %u, %p, %u, %p, %p)\n", command_queue, buffer, blocking_write, offset, cb, ptr, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBufferRect(cl_command_queue command_queue , cl_mem buffer , cl_bool blocking_write , const size_t * buffer_origin , const size_t * host_origin , const size_t * region , size_t buffer_row_pitch , size_t buffer_slice_pitch , size_t host_row_pitch , size_t host_slice_pitch , const void * ptr , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_1 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueWriteBufferRect(%p, %p, %u, %p, %p, %p, %u, %u, %u, %u, %p, %u, %p, %p)\n", command_queue, buffer, blocking_write, buffer_origin, host_origin, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBuffer(cl_command_queue command_queue , cl_mem src_buffer , cl_mem dst_buffer , size_t src_offset , size_t dst_offset , size_t cb , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueCopyBuffer(%p, %p, %p, %u, %u, %u, %u, %p, %p)\n", command_queue, src_buffer, dst_buffer, src_offset, dst_offset, cb, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferRect(cl_command_queue command_queue , cl_mem src_buffer , cl_mem dst_buffer , const size_t * src_origin , const size_t * dst_origin , const size_t * region , size_t src_row_pitch , size_t src_slice_pitch , size_t dst_row_pitch , size_t dst_slice_pitch , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_1 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueCopyBufferRect(%p, %p, %p, %p, %p, %p, %u, %u, %u, %u, %u, %p, %p)\n", command_queue, src_buffer, dst_buffer, src_origin, dst_origin, region, src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillBuffer(cl_command_queue command_queue , cl_mem buffer , const void * pattern , size_t pattern_size , size_t offset , size_t cb , cl_uint num_events_in_wait_list , 
const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_2 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueFillBuffer(%p, %p, %p, %u, %u, %u, %u, %p, %p)\n", command_queue, buffer, pattern, pattern_size, offset, cb, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillImage(cl_command_queue command_queue , cl_mem image , const void * fill_color , const size_t * origin , const size_t * region , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_2 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueFillImage(%p, %p, %p, %p, %p, %u, %p, %p)\n", command_queue, image, fill_color, origin, region, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadImage(cl_command_queue command_queue , cl_mem image , cl_bool blocking_read , const size_t * origin , const size_t * region , size_t row_pitch , size_t slice_pitch , void * ptr , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueReadImage(%p, %p, %u, %p, %p, %u, %u, %p, %u, %p, %p)\n", command_queue, image, blocking_read, origin, region, row_pitch, slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteImage(cl_command_queue command_queue , cl_mem image , cl_bool blocking_write , const size_t * origin , const size_t * region , size_t input_row_pitch , size_t input_slice_pitch , const void * ptr , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueWriteImage(%p, %p, %u, %p, %p, %u, %u, %p, %u, %p, %p)\n", command_queue, image, blocking_write, origin, region, input_row_pitch, input_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImage(cl_command_queue command_queue , cl_mem src_image , cl_mem dst_image , const size_t * src_origin , const size_t * dst_origin , const size_t * region , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueCopyImage(%p, %p, %p, %p, %p, %p, %u, %p, %p)\n", command_queue, src_image, dst_image, src_origin, dst_origin, region, num_events_in_wait_list, event_wait_list , event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImageToBuffer(cl_command_queue command_queue , cl_mem src_image , cl_mem dst_buffer , const size_t * src_origin , const size_t * region , size_t dst_offset , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueCopyImageToBuffer(%p, %p, %p, %p, %p, %u, %u, %p, %p)\n", command_queue, src_image, dst_buffer, src_origin, region, dst_offset, 
num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferToImage(cl_command_queue command_queue , cl_mem src_buffer , cl_mem dst_image , size_t src_offset , const size_t * dst_origin , const size_t * region , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueCopyBufferToImage(%p, %p, %p, %u, %p, %p, %u, %p, %p)\n", command_queue, src_buffer, dst_image, src_offset, dst_origin, region, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY void * CL_API_CALL clEnqueueMapBuffer(cl_command_queue command_queue , cl_mem buffer , cl_bool blocking_map , cl_map_flags map_flags , size_t offset , size_t cb , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event , cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { void *return_value = (void *) malloc(sizeof(void *)); test_icd_stub_log("clEnqueueMapBuffer(%p, %p, %u, %x, %u, %u, %u, %p, %p, %p)\n", command_queue, buffer, blocking_map, map_flags, offset, cb, num_events_in_wait_list, event_wait_list, event, errcode_ret); test_icd_stub_log("Value returned: %p\n", return_value); return return_value; } CL_API_ENTRY void * CL_API_CALL clEnqueueMapImage(cl_command_queue command_queue , cl_mem image , cl_bool blocking_map , cl_map_flags map_flags , const size_t * origin , const size_t * region , size_t * image_row_pitch , size_t * image_slice_pitch , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event , cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0 { void *return_value = (void *) malloc(sizeof(void *)); test_icd_stub_log("clEnqueueMapImage(%p, %p, %u, %x, %p, %p, %p, %p, %u, %p, %p, %p)\n", command_queue, image, blocking_map, map_flags, origin, region, image_row_pitch, image_slice_pitch, num_events_in_wait_list, event_wait_list, event, errcode_ret); test_icd_stub_log("Value returned: %p\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueUnmapMemObject(cl_command_queue command_queue , cl_mem memobj , void * mapped_ptr , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueUnmapMemObject(%p, %p, %p, %u, %p, %p)\n", command_queue, memobj, mapped_ptr, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueMigrateMemObjects(cl_command_queue command_queue , cl_uint num_mem_objects , const cl_mem * mem_objects , cl_mem_migration_flags flags , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_2 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueMigrateMemObjects(%p, %u, %p, %x, %u, %p, %p)\n", command_queue, num_mem_objects, mem_objects, flags, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueNDRangeKernel(cl_command_queue command_queue , cl_kernel kernel , cl_uint work_dim , const size_t * global_work_offset , const size_t * global_work_size , const size_t * local_work_size , 
cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueNDRangeKernel(%p, %p, %u, %p, %p, %p, %u, %p, %p)\n", command_queue, kernel, work_dim, global_work_offset, global_work_size, local_work_size, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueTask(cl_command_queue command_queue , cl_kernel kernel , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueTask(%p, %p, %u, %p, %p)\n", command_queue, kernel, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueNativeKernel(cl_command_queue command_queue , void (CL_CALLBACK *user_func)(void *), void * args , size_t cb_args , cl_uint num_mem_objects , const cl_mem * mem_list , const void ** args_mem_loc , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueNativeKernel(%p, %p, %p, %u, %u, %p, %p, %u, %p, %p)\n", command_queue, user_func, args, cb_args, num_mem_objects, mem_list, args_mem_loc, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY void * CL_API_CALL clGetExtensionFunctionAddressForPlatform(cl_platform_id platform , const char * func_name) CL_API_SUFFIX__VERSION_1_2 { void *return_value = (void *) malloc(sizeof(void *)); test_icd_stub_log("clGetExtensionFunctionAddressForPlatform(%p, %p)\n", platform, func_name); test_icd_stub_log("Value returned: %p\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueMarkerWithWaitList(cl_command_queue command_queue , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_2 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueMarkerWithWaitList(%p, %u, %p, %p)\n", command_queue, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueBarrierWithWaitList(cl_command_queue command_queue , cl_uint num_events_in_wait_list , const cl_event * event_wait_list , cl_event * event) CL_API_SUFFIX__VERSION_1_2 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueBarrierWithWaitList(%p, %u, %p, %p)\n", command_queue, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } extern CL_API_ENTRY cl_int CL_API_CALL clSetPrintfCallback(cl_context context , void (CL_CALLBACK * pfn_notify)(cl_context program , cl_uint printf_data_len , char * printf_data_ptr , void * user_data), void * user_data) CL_API_SUFFIX__VERSION_1_2 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clSetPrintfCallback(%p, %p, %p)\n", context, pfn_notify, user_data); pfn_notify(context, 0, NULL, NULL); test_icd_stub_log("setprintf_callback(%p, %u, %p, %p)\n", context, 0, NULL, NULL); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL 
clEnqueueMarker(cl_command_queue command_queue , cl_event * event) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueMarker(%p, %p)\n", command_queue, event); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueWaitForEvents(cl_command_queue command_queue , cl_uint num_events , const cl_event * event_list) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueWaitForEvents(%p, %u, %p)\n", command_queue, num_events, event_list); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueBarrier(cl_command_queue command_queue) CL_API_SUFFIX__VERSION_1_0 { cl_int return_value = CL_OUT_OF_RESOURCES; test_icd_stub_log("clEnqueueBarrier(%p)\n", command_queue); test_icd_stub_log("Value returned: %d\n", return_value); return return_value; } extern cl_int cliIcdDispatchTableCreate(CLIicdDispatchTable **outDispatchTable); CL_API_ENTRY cl_int CL_API_CALL clIcdGetPlatformIDsKHR(cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms) { cl_int result = CL_SUCCESS; if (!initialized) { result = cliIcdDispatchTableCreate(&dispatchTable); platform = (cl_platform_id) malloc(sizeof(struct _cl_platform_id)); memset(platform, 0, sizeof(struct _cl_platform_id)); platform->dispatch = dispatchTable; platform->version = "OpenCL 1.2 Stub"; platform->vendor = "stubvendorxxx"; platform->profile = "stubprofilexxx"; platform->name = "ICD_LOADER_TEST_OPENCL_STUB"; platform->extensions = "cl_khr_icd cl_khr_gl cl_khr_d3d10"; platform->suffix = "ilts"; platform->dispatch = dispatchTable; initialized = CL_TRUE; } if ((platforms && num_entries >1) || (platforms && num_entries <= 0) || (!platforms && num_entries >= 1)) { result = CL_INVALID_VALUE; goto Done; } if (platforms && num_entries == 1) { platforms[0] = platform; } Done: if (num_platforms) { *num_platforms = 1; } return result; } clr-rocm-5.7.1/opencl/khronos/icd/test/driver_stub/cl_ext.c000066400000000000000000000014071450307266000237020ustar00rootroot00000000000000/* Modifications Copyright(C) 2022 Advanced Micro Devices, Inc. * All rights reserved. */ #include #define CL_USE_DEPRECATED_OPENCL_1_1_APIS #include "CL/cl.h" #include "CL/cl_ext.h" struct driverStubextFunc_st { const char *name; void *func; }; #define EXT_FUNC(name) { #name, (void*)(name) } static struct driverStubextFunc_st clExtensions[] = { EXT_FUNC(clIcdGetPlatformIDsKHR), }; static const int clExtensionCount = sizeof(clExtensions) / sizeof(clExtensions[0]); CL_API_ENTRY void * CL_API_CALL clGetExtensionFunctionAddress(const char *name) { int ii; for (ii = 0; ii < clExtensionCount; ii++) { if (!strcmp(name, clExtensions[ii].name)) { return clExtensions[ii].func; } } return NULL; } clr-rocm-5.7.1/opencl/khronos/icd/test/driver_stub/cl_gl.c000066400000000000000000000202521450307266000235030ustar00rootroot00000000000000/* Modifications Copyright(C) 2022 Advanced Micro Devices, Inc. * All rights reserved. */ #include #include #include // Need to rename all CL API functions to prevent ICD loader functions calling // themselves via the dispatch table. Include this before cl headers. 
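/*
 * How the rename works (a minimal sketch of the mechanism; rename_api.h
 * itself appears later in this tree): every CL entry point is redefined to a
 * ___-prefixed internal symbol before the CL headers are parsed, e.g.
 *
 *   #define clCreateFromGLBuffer ___clCreateFromGLBuffer
 *
 * so the stubs below actually compile as ___clCreateFromGLBuffer() and
 * friends. The ICD loader library exports the public names; an application
 * reaches this stub only through the dispatch table that icd.c builds.
 */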
#include "rename_api.h" #define SIZE_T_MAX (size_t) 0xFFFFFFFFFFFFFFFFULL CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLBuffer(cl_context context , cl_mem_flags flags , cl_GLuint bufret_mem , int * errcode_ret ) CL_API_SUFFIX__VERSION_1_0 { cl_mem ret_mem = (cl_mem)(SIZE_T_MAX); test_icd_stub_log("clCreateFromGLBuffer(%p, %x, %u, %p)\n", context, flags, bufret_mem, errcode_ret); test_icd_stub_log("Value returned: %p\n", ret_mem); return ret_mem; } CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLTexture(cl_context context , cl_mem_flags flags , cl_GLenum target , cl_GLint miplevel , cl_GLuint texture , cl_int * errcode_ret ) CL_API_SUFFIX__VERSION_1_2 { cl_mem ret_mem = (cl_mem)(SIZE_T_MAX); test_icd_stub_log("clCreateFromGLTexture(%p, %x, %d, %d, %u, %p)\n", context , flags , target , miplevel , texture , errcode_ret ); test_icd_stub_log("Value returned: %p\n", ret_mem); return ret_mem; } CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLTexture2D(cl_context context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texture, cl_int * errcode_ret ) CL_API_SUFFIX__VERSION_1_0 { cl_mem ret_mem = (cl_mem)(SIZE_T_MAX); test_icd_stub_log("clCreateFromGLTexture2D(%p, %x, %d, %d, %u, %p)\n", context, flags, target, miplevel, texture, errcode_ret ); test_icd_stub_log("Value returned: %p\n", ret_mem); return ret_mem; } CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLTexture3D(cl_context context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texture, cl_int * errcode_ret ) CL_API_SUFFIX__VERSION_1_0 { cl_mem ret_mem = (cl_mem)(SIZE_T_MAX); test_icd_stub_log("clCreateFromGLTexture3D(%p, %x, %d, %d, %u, %p)\n", context, flags, target, miplevel, texture, errcode_ret ); test_icd_stub_log("Value returned: %p\n", ret_mem); return ret_mem; } CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLRenderbuffer(cl_context context, cl_mem_flags flags, cl_GLuint renderbuffer, cl_int * errcode_ret ) CL_API_SUFFIX__VERSION_1_0 { cl_mem ret_mem = (cl_mem)(SIZE_T_MAX); test_icd_stub_log("clCreateFromGLRenderbuffer(%p, %x, %d, %p)\n", context, flags, renderbuffer, errcode_ret); test_icd_stub_log("Value returned: %p\n", ret_mem); return ret_mem; } CL_API_ENTRY cl_int CL_API_CALL clGetGLObjectInfo(cl_mem memobj, cl_gl_object_type * gl_object_type, cl_GLuint * gl_object_name ) CL_API_SUFFIX__VERSION_1_0 { cl_int ret_val = -5; test_icd_stub_log("clGetGLObjectInfo(%p, %p, %p)\n", memobj, gl_object_type, gl_object_name); test_icd_stub_log("Value returned: %p\n", ret_val); return ret_val; } CL_API_ENTRY cl_int CL_API_CALL clGetGLTextureInfo(cl_mem memobj, cl_gl_texture_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret ) CL_API_SUFFIX__VERSION_1_0 { cl_int ret_val = -5; test_icd_stub_log("clGetGLTextureInfo(%p, %u, %u, %p, %p)\n", memobj, param_name, param_value_size, param_value, param_value_size_ret ); test_icd_stub_log("Value returned: %p\n", ret_val); return ret_val; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireGLObjects(cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event ) CL_API_SUFFIX__VERSION_1_0 { cl_int ret_val = -5; test_icd_stub_log("clEnqueueAcquireGLObjects(%p, %u, %p, %u, %p, %p)\n", command_queue, num_objects, mem_objects, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %p\n", ret_val); return ret_val; } CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseGLObjects(cl_command_queue command_queue, 
cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event ) CL_API_SUFFIX__VERSION_1_0 { cl_int ret_val = -5; test_icd_stub_log("clEnqueueReleaseGLObjects(%p, %u, %p, %u, %p, %p)\n", command_queue, num_objects, mem_objects, num_events_in_wait_list, event_wait_list, event); test_icd_stub_log("Value returned: %p\n", ret_val); return ret_val; } CL_API_ENTRY cl_int CL_API_CALL clGetGLContextInfoKHR(const cl_context_properties * properties, cl_gl_context_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret ) CL_API_SUFFIX__VERSION_1_0 { cl_int ret_val = -5; test_icd_stub_log("clGetGLContextInfoKHR(%p, %u, %u, %p, %p)\n", properties, param_name, param_value_size, param_value, param_value_size_ret); test_icd_stub_log("Value returned: %p\n", ret_val); return ret_val; } CL_API_ENTRY cl_event CL_API_CALL clCreateEventFromGLsyncKHR(cl_context context , cl_GLsync cl_GLsync , cl_int * errcode_ret ) CL_EXT_SUFFIX__VERSION_1_1 { cl_event ret_event = (cl_event)(SIZE_T_MAX); test_icd_stub_log("clCreateEventFromGLsyncKHR(%p, %p, %p)\n", context, cl_GLsync, errcode_ret); test_icd_stub_log("Value returned: %p\n", ret_event); return ret_event; } clr-rocm-5.7.1/opencl/khronos/icd/test/driver_stub/driver_stub.def000066400000000000000000000000461450307266000252660ustar00rootroot00000000000000EXPORTS clGetExtensionFunctionAddress clr-rocm-5.7.1/opencl/khronos/icd/test/driver_stub/icd.c000066400000000000000000000213111450307266000231570ustar00rootroot00000000000000/* Modifications Copyright(C) 2022 Advanced Micro Devices, Inc. * All rights reserved. */ #include #include #include #include #include "icd_structs.h" #define CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_USE_DEPRECATED_OPENCL_1_2_APIS // Need to rename all CL API functions to prevent ICD loader functions calling // themselves via the dispatch table. Include this before cl headers. 
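/*
 * Dispatch-table background (a sketch, not part of the stub itself): the ICD
 * loader forwards calls into this driver through the table built by
 * cliIcdDispatchTableCreate() below, so entries must appear in exactly the
 * Khronos-defined slot order, and unsupported slots (e.g. the d3d10 block)
 * are padded with NULL to keep later indices aligned. Conceptually, each
 * public loader entry point dispatches like:
 *
 *   cl_int clRetainKernel(cl_kernel kernel)
 *   {
 *       return kernel->dispatch->clRetainKernel(kernel);
 *   }
 *
 * which is why every object this stub allocates stores `dispatch` as its
 * first member before doing anything else.
 */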
#include "rename_api.h" #include "CL/cl.h" #include "CL/cl_gl.h" #include "CL/cl_gl_ext.h" /* * Prototypes for deprecated functions no longer present in cl.h */ extern CL_API_ENTRY cl_int CL_API_CALL clSetCommandQueueProperty(cl_command_queue /* command_queue */, cl_command_queue_properties /* properties */, cl_bool /* enable */, cl_command_queue_properties * /* old_properties */); #define ICD_DISPATCH_TABLE_ENTRY(fn) \ assert(dispatchTable->entryCount < 256); \ dispatchTable->entries[dispatchTable->entryCount++] = (void*)(fn) cl_int cliIcdDispatchTableCreate(CLIicdDispatchTable **outDispatchTable) { CLIicdDispatchTable *dispatchTable = NULL; cl_int result = CL_SUCCESS; // allocate the public handle dispatchTable = (CLIicdDispatchTable *) malloc(sizeof(*dispatchTable)); if (!dispatchTable) { result = CL_OUT_OF_HOST_MEMORY; goto Error; } memset(dispatchTable, 0, sizeof(*dispatchTable)); // OpenCL 1.0 ICD_DISPATCH_TABLE_ENTRY ( clGetPlatformIDs ); ICD_DISPATCH_TABLE_ENTRY ( clGetPlatformInfo ); ICD_DISPATCH_TABLE_ENTRY ( clGetDeviceIDs ); ICD_DISPATCH_TABLE_ENTRY ( clGetDeviceInfo ); ICD_DISPATCH_TABLE_ENTRY ( clCreateContext ); ICD_DISPATCH_TABLE_ENTRY ( clCreateContextFromType ); ICD_DISPATCH_TABLE_ENTRY ( clRetainContext ); ICD_DISPATCH_TABLE_ENTRY ( clReleaseContext ); ICD_DISPATCH_TABLE_ENTRY ( clGetContextInfo ); ICD_DISPATCH_TABLE_ENTRY ( clCreateCommandQueue ); ICD_DISPATCH_TABLE_ENTRY ( clRetainCommandQueue ); ICD_DISPATCH_TABLE_ENTRY ( clReleaseCommandQueue ); ICD_DISPATCH_TABLE_ENTRY ( clGetCommandQueueInfo ); ICD_DISPATCH_TABLE_ENTRY ( clSetCommandQueueProperty ); ICD_DISPATCH_TABLE_ENTRY ( clCreateBuffer ); ICD_DISPATCH_TABLE_ENTRY ( clCreateImage2D ); ICD_DISPATCH_TABLE_ENTRY ( clCreateImage3D ); ICD_DISPATCH_TABLE_ENTRY ( clRetainMemObject ); ICD_DISPATCH_TABLE_ENTRY ( clReleaseMemObject ); ICD_DISPATCH_TABLE_ENTRY ( clGetSupportedImageFormats ); ICD_DISPATCH_TABLE_ENTRY ( clGetMemObjectInfo ); ICD_DISPATCH_TABLE_ENTRY ( clGetImageInfo ); ICD_DISPATCH_TABLE_ENTRY ( clCreateSampler ); ICD_DISPATCH_TABLE_ENTRY ( clRetainSampler ); ICD_DISPATCH_TABLE_ENTRY ( clReleaseSampler ); ICD_DISPATCH_TABLE_ENTRY ( clGetSamplerInfo ); ICD_DISPATCH_TABLE_ENTRY ( clCreateProgramWithSource ); ICD_DISPATCH_TABLE_ENTRY ( clCreateProgramWithBinary ); ICD_DISPATCH_TABLE_ENTRY ( clRetainProgram ); ICD_DISPATCH_TABLE_ENTRY ( clReleaseProgram ); ICD_DISPATCH_TABLE_ENTRY ( clBuildProgram ); ICD_DISPATCH_TABLE_ENTRY ( clUnloadCompiler ); ICD_DISPATCH_TABLE_ENTRY ( clGetProgramInfo ); ICD_DISPATCH_TABLE_ENTRY ( clGetProgramBuildInfo ); ICD_DISPATCH_TABLE_ENTRY ( clCreateKernel ); ICD_DISPATCH_TABLE_ENTRY ( clCreateKernelsInProgram ); ICD_DISPATCH_TABLE_ENTRY ( clRetainKernel ); ICD_DISPATCH_TABLE_ENTRY ( clReleaseKernel ); ICD_DISPATCH_TABLE_ENTRY ( clSetKernelArg ); ICD_DISPATCH_TABLE_ENTRY ( clGetKernelInfo ); ICD_DISPATCH_TABLE_ENTRY ( clGetKernelWorkGroupInfo ); ICD_DISPATCH_TABLE_ENTRY ( clWaitForEvents ); ICD_DISPATCH_TABLE_ENTRY ( clGetEventInfo ); ICD_DISPATCH_TABLE_ENTRY ( clRetainEvent ); ICD_DISPATCH_TABLE_ENTRY ( clReleaseEvent ); ICD_DISPATCH_TABLE_ENTRY ( clGetEventProfilingInfo ); ICD_DISPATCH_TABLE_ENTRY ( clFlush ); ICD_DISPATCH_TABLE_ENTRY ( clFinish ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueReadBuffer ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueWriteBuffer ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueCopyBuffer ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueReadImage ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueWriteImage ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueCopyImage ); ICD_DISPATCH_TABLE_ENTRY ( 
clEnqueueCopyImageToBuffer ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueCopyBufferToImage ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueMapBuffer ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueMapImage ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueUnmapMemObject ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueNDRangeKernel ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueTask ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueNativeKernel ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueMarker ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueWaitForEvents ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueBarrier ); ICD_DISPATCH_TABLE_ENTRY ( clGetExtensionFunctionAddress ); ICD_DISPATCH_TABLE_ENTRY ( clCreateFromGLBuffer ); ICD_DISPATCH_TABLE_ENTRY ( clCreateFromGLTexture2D ); ICD_DISPATCH_TABLE_ENTRY ( clCreateFromGLTexture3D ); ICD_DISPATCH_TABLE_ENTRY ( clCreateFromGLRenderbuffer ); ICD_DISPATCH_TABLE_ENTRY ( clGetGLObjectInfo ); ICD_DISPATCH_TABLE_ENTRY ( clGetGLTextureInfo ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueAcquireGLObjects ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueReleaseGLObjects ); // cl_khr_gl_sharing ICD_DISPATCH_TABLE_ENTRY ( clGetGLContextInfoKHR ); // cl_khr_d3d10_sharing (windows-only) #if 0 && defined(_WIN32) ICD_DISPATCH_TABLE_ENTRY ( clGetDeviceIDsFromD3D10KHR ); ICD_DISPATCH_TABLE_ENTRY ( clCreateFromD3D10BufferKHR ); ICD_DISPATCH_TABLE_ENTRY ( clCreateFromD3D10Texture2DKHR ); ICD_DISPATCH_TABLE_ENTRY ( clCreateFromD3D10Texture3DKHR ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueAcquireD3D10ObjectsKHR ); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueReleaseD3D10ObjectsKHR ); #else ICD_DISPATCH_TABLE_ENTRY( NULL ); ICD_DISPATCH_TABLE_ENTRY( NULL ); ICD_DISPATCH_TABLE_ENTRY( NULL ); ICD_DISPATCH_TABLE_ENTRY( NULL ); ICD_DISPATCH_TABLE_ENTRY( NULL ); ICD_DISPATCH_TABLE_ENTRY( NULL ); #endif // OpenCL 1.1 ICD_DISPATCH_TABLE_ENTRY ( clSetEventCallback); ICD_DISPATCH_TABLE_ENTRY ( clCreateSubBuffer); ICD_DISPATCH_TABLE_ENTRY ( clSetMemObjectDestructorCallback); ICD_DISPATCH_TABLE_ENTRY ( clCreateUserEvent); ICD_DISPATCH_TABLE_ENTRY ( clSetUserEventStatus); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueReadBufferRect); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueWriteBufferRect); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueCopyBufferRect); ICD_DISPATCH_TABLE_ENTRY ( /*clCreateSubDevicesEXT*/NULL); ICD_DISPATCH_TABLE_ENTRY ( /*clRetainDeviceEXT*/ NULL); ICD_DISPATCH_TABLE_ENTRY ( /*clReleaseDevice*/NULL); ICD_DISPATCH_TABLE_ENTRY ( clCreateEventFromGLsyncKHR); ICD_DISPATCH_TABLE_ENTRY ( clCreateSubDevices); ICD_DISPATCH_TABLE_ENTRY ( clRetainDevice); ICD_DISPATCH_TABLE_ENTRY ( clReleaseDevice); ICD_DISPATCH_TABLE_ENTRY ( clCreateImage); ICD_DISPATCH_TABLE_ENTRY ( clCreateProgramWithBuiltInKernels); ICD_DISPATCH_TABLE_ENTRY ( clCompileProgram); ICD_DISPATCH_TABLE_ENTRY ( clLinkProgram); ICD_DISPATCH_TABLE_ENTRY ( clUnloadPlatformCompiler); ICD_DISPATCH_TABLE_ENTRY ( clGetKernelArgInfo); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueFillBuffer); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueFillImage); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueMigrateMemObjects); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueMarkerWithWaitList); ICD_DISPATCH_TABLE_ENTRY ( clEnqueueBarrierWithWaitList); ICD_DISPATCH_TABLE_ENTRY ( clGetExtensionFunctionAddressForPlatform); ICD_DISPATCH_TABLE_ENTRY ( clCreateFromGLTexture); // return success *outDispatchTable = dispatchTable; return CL_SUCCESS; Error: return result; } void cliIcdDispatchTableDestroy(CLIicdDispatchTable *dispatchTable) { free(dispatchTable); } clr-rocm-5.7.1/opencl/khronos/icd/test/driver_stub/icd_driver_exports.map000066400000000000000000000001321450307266000266470ustar00rootroot00000000000000{ 
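/* Only the two bootstrap symbols are exported; `local: *` hides everything
   else, so the loader can discover this driver only via
   clGetExtensionFunctionAddress("clIcdGetPlatformIDsKHR") and must reach all
   other entry points through the dispatch table. */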
global: clGetExtensionFunctionAddress; clGetPlatformInfo; local: *; }; clr-rocm-5.7.1/opencl/khronos/icd/test/driver_stub/icd_structs.h000066400000000000000000000005051450307266000247550ustar00rootroot00000000000000#ifndef _ICD_STRUCTS_H_ #define _ICD_STRUCTS_H_ typedef struct CLIicdDispatchTable_st CLIicdDispatchTable; typedef struct CLIplatform_st CLIplatform; struct CLIicdDispatchTable_st { void *entries[256]; int entryCount; }; struct CLIplatform_st { CLIicdDispatchTable* dispatch; }; #endif /* _ICD_STRUCTS_H_ */ clr-rocm-5.7.1/opencl/khronos/icd/test/driver_stub/rename_api.h000066400000000000000000000161661450307266000245410ustar00rootroot00000000000000#ifndef _RENAME_API_H_ #define _RENAME_API_H_ #define clGetPlatformIDs ___clGetPlatformIDs #define clGetPlatformInfo ___clGetPlatformInfo #define clGetDeviceIDs ___clGetDeviceIDs #define clGetDeviceInfo ___clGetDeviceInfo #define clCreateSubDevices ___clCreateSubDevices #define clRetainDevice ___clRetainDevice #define clReleaseDevice ___clReleaseDevice #define clCreateContext ___clCreateContext #define clCreateContextFromType ___clCreateContextFromType #define clRetainContext ___clRetainContext #define clReleaseContext ___clReleaseContext #define clGetContextInfo ___clGetContextInfo #define clCreateCommandQueue ___clCreateCommandQueue #define clSetCommandQueueProperty ___clSetCommandQueueProperty #define clRetainCommandQueue ___clRetainCommandQueue #define clReleaseCommandQueue ___clReleaseCommandQueue #define clGetCommandQueueInfo ___clGetCommandQueueInfo #define clCreateBuffer ___clCreateBuffer #define clCreateSubBuffer ___clCreateSubBuffer #define clCreateImage ___clCreateImage #define clCreateImage2D ___clCreateImage2D #define clCreateImage3D ___clCreateImage3D #define clRetainMemObject ___clRetainMemObject #define clReleaseMemObject ___clReleaseMemObject #define clGetSupportedImageFormats ___clGetSupportedImageFormats #define clGetMemObjectInfo ___clGetMemObjectInfo #define clGetImageInfo ___clGetImageInfo #define clSetMemObjectDestructorCallback ___clSetMemObjectDestructorCallback #define clCreateSampler ___clCreateSampler #define clRetainSampler ___clRetainSampler #define clReleaseSampler ___clReleaseSampler #define clGetSamplerInfo ___clGetSamplerInfo #define clCreateProgramWithSource ___clCreateProgramWithSource #define clCreateProgramWithBinary ___clCreateProgramWithBinary #define clCreateProgramWithBuiltInKernels ___clCreateProgramWithBuiltInKernels #define clRetainProgram ___clRetainProgram #define clReleaseProgram ___clReleaseProgram #define clBuildProgram ___clBuildProgram #define clUnloadCompiler ___clUnloadCompiler #define clCompileProgram ___clCompileProgram #define clLinkProgram ___clLinkProgram #define clUnloadPlatformCompiler ___clUnloadPlatformCompiler #define clGetProgramInfo ___clGetProgramInfo #define clGetProgramBuildInfo ___clGetProgramBuildInfo #define clCreateKernel ___clCreateKernel #define clCreateKernelsInProgram ___clCreateKernelsInProgram #define clRetainKernel ___clRetainKernel #define clReleaseKernel ___clReleaseKernel #define clSetKernelArg ___clSetKernelArg #define clGetKernelInfo ___clGetKernelInfo #define clGetKernelArgInfo ___clGetKernelArgInfo #define clGetKernelWorkGroupInfo ___clGetKernelWorkGroupInfo #define clWaitForEvents ___clWaitForEvents #define clGetEventInfo ___clGetEventInfo #define clCreateUserEvent ___clCreateUserEvent #define clRetainEvent ___clRetainEvent #define clReleaseEvent ___clReleaseEvent #define clSetUserEventStatus ___clSetUserEventStatus #define clSetEventCallback 
___clSetEventCallback #define clGetEventProfilingInfo ___clGetEventProfilingInfo #define clFlush ___clFlush #define clFinish ___clFinish #define clEnqueueReadBuffer ___clEnqueueReadBuffer #define clEnqueueReadBufferRect ___clEnqueueReadBufferRect #define clEnqueueWriteBuffer ___clEnqueueWriteBuffer #define clEnqueueWriteBufferRect ___clEnqueueWriteBufferRect #define clEnqueueCopyBuffer ___clEnqueueCopyBuffer #define clEnqueueCopyBufferRect ___clEnqueueCopyBufferRect #define clEnqueueFillBuffer ___clEnqueueFillBuffer #define clEnqueueFillImage ___clEnqueueFillImage #define clEnqueueReadImage ___clEnqueueReadImage #define clEnqueueWriteImage ___clEnqueueWriteImage #define clEnqueueCopyImage ___clEnqueueCopyImage #define clEnqueueCopyImageToBuffer ___clEnqueueCopyImageToBuffer #define clEnqueueCopyBufferToImage ___clEnqueueCopyBufferToImage #define clEnqueueMapBuffer ___clEnqueueMapBuffer #define clEnqueueMapImage ___clEnqueueMapImage #define clEnqueueUnmapMemObject ___clEnqueueUnmapMemObject #define clEnqueueMigrateMemObjects ___clEnqueueMigrateMemObjects #define clEnqueueNDRangeKernel ___clEnqueueNDRangeKernel #define clEnqueueTask ___clEnqueueTask #define clEnqueueNativeKernel ___clEnqueueNativeKernel #define clGetExtensionFunctionAddressForPlatform ___clGetExtensionFunctionAddressForPlatform #define clEnqueueMarkerWithWaitList ___clEnqueueMarkerWithWaitList #define clEnqueueBarrierWithWaitList ___clEnqueueBarrierWithWaitList #define clSetPrintfCallback ___clSetPrintfCallback #define clEnqueueMarker ___clEnqueueMarker #define clEnqueueWaitForEvents ___clEnqueueWaitForEvents #define clEnqueueBarrier ___clEnqueueBarrier #define clCreateFromGLBuffer ___clCreateFromGLBuffer #define clCreateFromGLTexture ___clCreateFromGLTexture #define clCreateFromGLTexture2D ___clCreateFromGLTexture2D #define clCreateFromGLTexture3D ___clCreateFromGLTexture3D #define clCreateFromGLRenderbuffer ___clCreateFromGLRenderbuffer #define clGetGLObjectInfo ___clGetGLObjectInfo #define clGetGLTextureInfo ___clGetGLTextureInfo #define clEnqueueAcquireGLObjects ___clEnqueueAcquireGLObjects #define clEnqueueReleaseGLObjects ___clEnqueueReleaseGLObjects #define clGetGLContextInfoKHR ___clGetGLContextInfoKHR #define clCreateEventFromGLsyncKHR ___clCreateEventFromGLsyncKHR #endif /* __RENAME_API_H__ */ clr-rocm-5.7.1/opencl/khronos/icd/test/inc/000077500000000000000000000000001450307266000204775ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/icd/test/inc/platform/000077500000000000000000000000001450307266000223235ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/icd/test/inc/platform/icd_test_log.h000066400000000000000000000011011450307266000251240ustar00rootroot00000000000000#ifndef _ICD_TEST_LOG_H_ #define _ICD_TEST_LOG_H_ #if defined (_WIN32) #define DllExport __declspec( dllexport ) #else #define DllExport #endif DllExport int test_icd_initialize_app_log(void); DllExport void test_icd_app_log(const char *format, ...); DllExport void test_icd_close_app_log(void); DllExport char *test_icd_get_stub_log(void); DllExport int test_icd_initialize_stub_log(void); DllExport void test_icd_stub_log(const char *format, ...); DllExport void test_icd_close_stub_log(void); DllExport char *test_icd_get_app_log(void); #endif /* _ICD_TEST_LOG_H_ */ 
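/*
 * Usage sketch (hypothetical call site, assuming the logs were opened with
 * test_icd_initialize_app_log()/test_icd_initialize_stub_log()): the test
 * app records the call it is about to make, the driver stub records the call
 * it receives, and the two transcripts must come out byte-identical:
 *
 *   test_icd_app_log("clRetainKernel(%p)\n", kernel);
 *   ret = clRetainKernel(kernel);
 *   test_icd_app_log("Value returned: %d\n", ret);
 *
 * test_icd_match() in the loader test then strcmp()s the app log against the
 * stub log to prove every call crossed the ICD loader unmodified.
 */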
clr-rocm-5.7.1/opencl/khronos/icd/test/loader_test/000077500000000000000000000000001450307266000222335ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/khronos/icd/test/loader_test/CMakeLists.txt000066400000000000000000000005261450307266000247760ustar00rootroot00000000000000add_executable (icd_loader_test test_kernel.c main.c test_platforms.c icd_test_match.c test_program_objects.c test_sampler_objects.c test_buffer_object.c test_cl_runtime.c callbacks.c test_create_calls.c test_clgl.c test_image_objects.c ) target_link_libraries (icd_loader_test OpenCL IcdLog) clr-rocm-5.7.1/opencl/khronos/icd/test/loader_test/callbacks.c000066400000000000000000000021611450307266000243160ustar00rootroot00000000000000#include #include #include void CL_CALLBACK createcontext_callback(const char* _a, const void* _b, size_t _c, void* _d) { test_icd_app_log("createcontext_callback(%p, %p, %u, %p)\n", _a, _b, _c, _d); } void CL_CALLBACK setmemobjectdestructor_callback(cl_mem _a, void* _b) { test_icd_app_log("setmemobjectdestructor_callback(%p, %p)\n", _a, _b); } void CL_CALLBACK program_callback(cl_program _a, void* _b) { test_icd_app_log("program_callback(%p, %p)\n", _a, _b); } void CL_CALLBACK setevent_callback(cl_event _a, cl_int _b, void* _c) { test_icd_app_log("setevent_callback(%p, %d, %p)\n", _a, _b, _c); } void CL_CALLBACK setprintf_callback(cl_context _a, cl_uint _b, char* _c, void* _d ) { test_icd_app_log("setprintf_callback(%p, %u, %p, %p)\n", _a, _b, _c, _d); } clr-rocm-5.7.1/opencl/khronos/icd/test/loader_test/icd_test_match.c000066400000000000000000000013301450307266000253460ustar00rootroot00000000000000#include #include #ifndef __APPLE__ #include #endif #include int test_icd_match() { int error = 0; char *app_log = NULL, *stub_log = NULL; app_log = test_icd_get_app_log(); if (!app_log) { printf("ERROR: Could not retrieve app log\n"); error = 1; goto End; } stub_log = test_icd_get_stub_log(); if (!stub_log) { printf("ERROR: Could not retrieve stub log\n"); error = 1; goto End; } if (strcmp(app_log, stub_log)) { printf("ERROR: App log and stub log differ.\n"); error = 1; goto End; } End: free(app_log); free(stub_log); return error; } clr-rocm-5.7.1/opencl/khronos/icd/test/loader_test/main.c000066400000000000000000000021511450307266000233220ustar00rootroot00000000000000/* Modifications Copyright(C) 2022 Advanced Micro Devices, Inc. * All rights reserved. 
*/ #include #include #include #include "param_struct.h" extern int test_create_calls(); extern int test_platforms(); extern int test_cl_runtime(); extern int test_kernel(); extern int test_buffer_object(); extern int test_program_objects(); extern int test_image_objects(); extern int test_sampler_objects(); extern int test_OpenGL_share(); extern int test_release_calls(); extern int test_icd_match(); int main(int argc, char **argv) { test_icd_initialize_app_log(); test_icd_initialize_stub_log(); test_create_calls(); test_platforms(); test_cl_runtime(); test_kernel(); test_buffer_object(); test_program_objects(); test_image_objects(); test_sampler_objects(); test_OpenGL_share(); test_release_calls(); test_icd_close_app_log(); test_icd_close_stub_log(); if (test_icd_match()) { printf("ICD Loader Test FAILED\n"); return 1; } else { printf("ICD Loader Test PASSED\n"); return 0; } } clr-rocm-5.7.1/opencl/khronos/icd/test/loader_test/param_struct.h000066400000000000000000000574121450307266000251210ustar00rootroot00000000000000#ifndef _PARAM_STRUCT_H_ #define _PARAM_STRUCT_H_ #include #include #include struct clCreateCommandQueue_st { cl_context context; cl_device_id device; cl_command_queue_properties properties; cl_int *errcode_ret; }; struct clSetCommandQueueProperty_st { cl_command_queue command_queue; cl_command_queue_properties properties; cl_bool enable; cl_command_queue_properties *old_properties; }; struct clGetCommandQueueInfo_st { cl_command_queue command_queue; cl_command_queue_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; struct clCreateContext_st { const cl_context_properties *properties; cl_uint num_devices; const cl_device_id *devices; void (CL_CALLBACK*pfn_notify)(const char *errinfo, const void *private_info, size_t cb, void *user_data); void *user_data; cl_int *errcode_ret; }; struct clCreateContextFromType_st { const cl_context_properties *properties; cl_device_type device_type; void (CL_CALLBACK *pfn_notify)(const char *errinfo, const void *private_info, size_t cb,void *user_data); void *user_data; cl_int *errcode_ret; }; struct clRetainContext_st { cl_context context; }; struct clReleaseContext_st { cl_context context; }; struct clGetContextInfo_st { cl_context context; cl_context_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; struct clGetPlatformIDs_st { cl_uint num_entries; cl_platform_id *platforms; cl_uint *num_platforms; }; struct clGetPlatformInfo_st { cl_platform_id platform; cl_platform_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; struct clGetDeviceIDs_st { cl_platform_id platform; cl_device_type device_type; cl_uint num_entries; cl_device_id *devices; cl_uint *num_devices; }; struct clRetainCommandQueue_st { cl_command_queue command_queue; }; struct clReleaseCommandQueue_st { cl_command_queue command_queue; }; #define NUM_ITEMS_clCreateCommandQueue 1 #define NUM_ITEMS_clRetainCommandQueue 1 #define NUM_ITEMS_clReleaseCommandQueue 1 #define NUM_ITEMS_clGetCommandQueueInfo 1 #define NUM_ITEMS_clSetCommandQueueProperty 1 #define NUM_ITEMS_clCreateContext 1 #define NUM_ITEMS_clCreateContextFromType 1 #define NUM_ITEMS_clRetainContext 1 #define NUM_ITEMS_clReleaseContext 1 #define NUM_ITEMS_clGetContextInfo 1 #define NUM_ITEMS_clGetPlatformIDs 1 #define NUM_ITEMS_clGetPlatformInfo 1 #define NUM_ITEMS_clGetDeviceIDs 1 #define NUM_ITEMS_clGetDeviceInfo 1 #define NUM_ITEMS_clCreateSubDevices 1 #define NUM_ITEMS_clRetainDevice 1 
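/*
 * The NUM_ITEMS_* constants size per-API argument tables in the test sources
 * (a sketch of the presumed pattern; the real tables live in
 * test_create_calls.c and friends, and the values here are hypothetical):
 *
 *   struct clRetainDevice_st clRetainDeviceData[NUM_ITEMS_clRetainDevice] = {
 *       { NULL }   // argument values replayed through the ICD loader
 *   };
 *
 * Each *_st struct in this header mirrors one CL entry point's parameter
 * list so a call can be driven and logged field by field.
 */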
#define NUM_ITEMS_clReleaseDevice 1 struct clGetDeviceInfo_st { cl_device_id device; cl_device_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; struct clCreateSubDevices_st { cl_device_id in_device; cl_device_partition_property *properties; cl_uint num_entries; cl_device_id *out_devices; cl_uint *num_devices; }; struct clRetainDevice_st { cl_device_id device; }; struct clReleaseDevice_st { cl_device_id device; }; #define NUM_ITEMS_clCreateBuffer 1 #define NUM_ITEMS_clCreateSubBuffer 1 #define NUM_ITEMS_clEnqueueReadBuffer 1 #define NUM_ITEMS_clEnqueueWriteBuffer 1 #define NUM_ITEMS_clEnqueueReadBufferRect 1 #define NUM_ITEMS_clEnqueueWriteBufferRect 1 #define NUM_ITEMS_clEnqueueFillBuffer 1 #define NUM_ITEMS_clEnqueueCopyBuffer 1 #define NUM_ITEMS_clEnqueueCopyBufferRect 1 #define NUM_ITEMS_clEnqueueMapBuffer 1 #define NUM_ITEMS_clRetainMemObject 1 #define NUM_ITEMS_clReleaseMemObject 1 #define NUM_ITEMS_clSetMemObjectDestructorCallback 1 #define NUM_ITEMS_clEnqueueUnmapMemObject 1 #define NUM_ITEMS_clGetMemObjectInfo 1 struct clCreateBuffer_st { cl_context context; cl_mem_flags flags; size_t size; void *host_ptr; cl_int *errcode_ret; }; struct clCreateSubBuffer_st { cl_mem buffer; cl_mem_flags flags; cl_buffer_create_type buffer_create_type; const void *buffer_create_info; cl_int *errcode_ret; }; struct clEnqueueReadBuffer_st { cl_command_queue command_queue; cl_mem buffer; cl_bool blocking_read; size_t offset; size_t cb; void *ptr; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueWriteBuffer_st { cl_command_queue command_queue; cl_mem buffer; cl_bool blocking_write; size_t offset; size_t cb; const void *ptr; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueReadBufferRect_st { cl_command_queue command_queue; cl_mem buffer; cl_bool blocking_read; const size_t * buffer_offset; const size_t * host_offset; const size_t * region; size_t buffer_row_pitch; size_t buffer_slice_pitch; size_t host_row_pitch; size_t host_slice_pitch; void *ptr; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueWriteBufferRect_st { cl_command_queue command_queue; cl_mem buffer; cl_bool blocking_write; const size_t *buffer_offset; const size_t *host_offset; const size_t *region; size_t buffer_row_pitch; size_t buffer_slice_pitch; size_t host_row_pitch; size_t host_slice_pitch; void *ptr; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueFillBuffer_st { cl_command_queue command_queue; cl_mem buffer; const void *pattern; size_t pattern_size; size_t offset; size_t cb; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueCopyBuffer_st { cl_command_queue command_queue; cl_mem src_buffer; cl_mem dst_buffer; size_t src_offset; size_t dst_offset; size_t cb; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueCopyBufferRect_st { cl_command_queue command_queue; cl_mem src_buffer; cl_mem dst_buffer; const size_t *src_origin; const size_t *dst_origin; const size_t *region; size_t src_row_pitch; size_t src_slice_pitch; size_t dst_row_pitch; size_t dst_slice_pitch; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueMapBuffer_st { cl_command_queue command_queue; cl_mem buffer; cl_bool blocking_map; cl_map_flags map_flags; size_t 
offset; size_t cb; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; cl_int *errcode_ret; }; struct clRetainMemObject_st { cl_mem memobj; }; struct clReleaseMemObject_st { cl_mem memobj; }; struct clSetMemObjectDestructorCallback_st { cl_mem memobj; void (CL_CALLBACK *pfn_notify)(cl_mem memobj, void *user_data); void *user_data; }; struct clEnqueueUnmapMemObject_st { cl_command_queue command_queue; cl_mem memobj; void *mapped_ptr; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clGetMemObjectInfo_st { cl_mem memobj; cl_mem_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; #define NUM_ITEMS_clCreateProgramWithSource 1 #define NUM_ITEMS_clCreateProgramWithBinary 1 #define NUM_ITEMS_clCreateProgramWithBuiltInKernels 1 #define NUM_ITEMS_clRetainProgram 1 #define NUM_ITEMS_clReleaseProgram 1 #define NUM_ITEMS_clBuildProgram 1 #define NUM_ITEMS_clCompileProgram 1 #define NUM_ITEMS_clLinkProgram 1 #define NUM_ITEMS_clUnloadPlatformCompiler 1 #define NUM_ITEMS_clGetProgramInfo 1 #define NUM_ITEMS_clGetProgramBuildInfo 1 #define NUM_ITEMS_clUnloadCompiler 1 #define NUM_ITEMS_clGetExtensionFunctionAddress 1 #define NUM_ITEMS_clGetExtensionFunctionAddressForPlatform 1 struct clCreateProgramWithSource_st { cl_context context; cl_uint count; const char **strings; const size_t *lengths; cl_int *errcode_ret; }; struct clCreateProgramWithBinary_st { cl_context context; cl_uint num_devices; const cl_device_id *device_list; const size_t *lengths; const unsigned char **binaries; cl_int *binary_status; cl_int *errcode_ret; }; struct clCreateProgramWithBuiltInKernels_st { cl_context context; cl_uint num_devices; const cl_device_id *device_list; const char *kernel_names; cl_int *errcode_ret; }; struct clRetainProgram_st { cl_program program; }; struct clReleaseProgram_st { cl_program program; }; struct clBuildProgram_st { cl_program program; cl_uint num_devices; const cl_device_id *device_list; const char *options; void (CL_CALLBACK*pfn_notify)(cl_program program, void *user_data); void *user_data; }; struct clCompileProgram_st { cl_program program; cl_uint num_devices; const cl_device_id *device_list; const char *options; cl_uint num_input_headers; const cl_program *headers; const char **header_include_names; void (CL_CALLBACK *pfn_notify)(cl_program program, void * user_data); void *user_data; }; struct clLinkProgram_st { cl_context context; cl_uint num_devices; const cl_device_id *device_list; const char *options; cl_uint num_input_programs; const cl_program *input_programs; void (CL_CALLBACK *pfn_notify)(cl_program program, void *user_data); void *user_data; cl_int *errcode_ret; }; struct clUnloadPlatformCompiler_st { cl_platform_id platform; }; #if 0 struct clUnloadCompiler_st { void ; }; #endif struct clGetExtensionFunctionAddress_st { const char *func_name; }; struct clGetExtensionFunctionAddressForPlatform_st { cl_platform_id platform; const char *func_name; }; struct clGetProgramInfo_st { cl_program program; cl_program_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; struct clGetProgramBuildInfo_st { cl_program program; cl_device_id device; cl_program_build_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; #define NUM_ITEMS_clCreateImage2D 1 #define NUM_ITEMS_clCreateImage3D 1 #define NUM_ITEMS_clCreateImage 1 #define NUM_ITEMS_clGetSupportedImageFormats 1 #define 
NUM_ITEMS_clEnqueueCopyImageToBuffer 1 #define NUM_ITEMS_clEnqueueCopyBufferToImage 1 #define NUM_ITEMS_clEnqueueMapImage 1 #define NUM_ITEMS_clEnqueueReadImage 1 #define NUM_ITEMS_clEnqueueWriteImage 1 #define NUM_ITEMS_clEnqueueFillImage 1 #define NUM_ITEMS_clEnqueueCopyImage 1 #define NUM_ITEMS_clGetMemObjectInfo 1 #define NUM_ITEMS_clGetImageInfo 1 struct clCreateImage_st { cl_context context; cl_mem_flags flags; const cl_image_format *image_format; const cl_image_desc *image_desc; void *host_ptr; cl_int *errcode_ret; }; struct clCreateImage2D_st { cl_context context; cl_mem_flags flags; const cl_image_format *image_format; size_t image_width; size_t image_height; size_t image_row_pitch; void *host_ptr; cl_int *errcode_ret; }; struct clCreateImage3D_st { cl_context context; cl_mem_flags flags; const cl_image_format *image_format; size_t image_width; size_t image_height; size_t image_depth; size_t image_row_pitch; size_t image_slice_pitch; void *host_ptr; cl_int *errcode_ret; }; struct clGetSupportedImageFormats_st { cl_context context; cl_mem_flags flags; cl_mem_object_type image_type; cl_uint num_entries; cl_image_format *image_formats; cl_uint *num_image_formats; }; struct clEnqueueCopyImageToBuffer_st { cl_command_queue command_queue; cl_mem src_image; cl_mem dst_buffer; const size_t *src_origin; const size_t *region; size_t dst_offset; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueCopyBufferToImage_st { cl_command_queue command_queue; cl_mem src_buffer; cl_mem dst_image; size_t src_offset; const size_t *dst_origin; const size_t *region; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueMapImage_st { cl_command_queue command_queue; cl_mem image; cl_bool blocking_map; cl_map_flags map_flags; const size_t *origin; const size_t *region; size_t *image_row_pitch; size_t *image_slice_pitch; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; cl_int *errcode_ret; }; struct clEnqueueReadImage_st { cl_command_queue command_queue; cl_mem image; cl_bool blocking_read; const size_t *origin; const size_t *region; size_t row_pitch; size_t slice_pitch; void *ptr; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueWriteImage_st { cl_command_queue command_queue; cl_mem image; cl_bool blocking_write; const size_t *origin; const size_t *region; size_t input_row_pitch; size_t input_slice_pitch; const void *ptr; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueFillImage_st { cl_command_queue command_queue; cl_mem image; const void *fill_color; const size_t *origin; const size_t *region; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueCopyImage_st { cl_command_queue command_queue; cl_mem src_image; cl_mem dst_image; const size_t *src_origin; const size_t *dst_origin; const size_t *region; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; #if 0 struct clGetMemObjectInfo_st { cl_mem memobj; cl_mem_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; #endif struct clGetImageInfo_st { cl_mem image; cl_image_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; #define NUM_ITEMS_clCreateSampler 1 #define NUM_ITEMS_clRetainSampler 1 #define NUM_ITEMS_clReleaseSampler 1 #define NUM_ITEMS_clGetSamplerInfo 
1 struct clCreateSampler_st { cl_context context; cl_bool normalized_coords; cl_addressing_mode addressing_mode; cl_filter_mode filter_mode; cl_int *errcode_ret; }; struct clRetainSampler_st { cl_sampler sampler; }; struct clReleaseSampler_st { cl_sampler sampler; }; struct clGetSamplerInfo_st { cl_sampler sampler; cl_sampler_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; #define NUM_ITEMS_clCreateKernel 1 #define NUM_ITEMS_clCreateKernelsInProgram 1 #define NUM_ITEMS_clRetainKernel 1 #define NUM_ITEMS_clReleaseKernel 1 struct clCreateKernel_st { cl_program program; const char *kernel_name; cl_int *errcode_ret; }; struct clCreateKernelsInProgram_st { cl_program program; cl_uint num_kernels; cl_kernel *kernels; cl_uint *num_kernels_ret; }; struct clRetainKernel_st { cl_kernel kernel; }; struct clReleaseKernel_st { cl_kernel kernel; }; #define NUM_ITEMS_clSetKernelArg 1 #define NUM_ITEMS_clGetKernelInfo 1 #define NUM_ITEMS_clGetKernelArgInfo 1 #define NUM_ITEMS_clGetKernelWorkGroupInfo 1 struct clSetKernelArg_st { cl_kernel kernel; cl_uint arg_index; size_t arg_size; const void *arg_value; }; struct clGetKernelInfo_st { cl_kernel kernel; cl_kernel_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; struct clGetKernelArgInfo_st { cl_kernel kernel; cl_uint arg_indx; cl_kernel_arg_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; struct clGetKernelWorkGroupInfo_st { cl_kernel kernel; cl_device_id device; cl_kernel_work_group_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; #define NUM_ITEMS_clEnqueueMigrateMemObjects 1 #define NUM_ITEMS_clEnqueueNDRangeKernel 1 #define NUM_ITEMS_clEnqueueTask 1 #define NUM_ITEMS_clEnqueueNativeKernel 1 struct clEnqueueMigrateMemObjects_st { cl_command_queue command_queue; cl_uint num_mem_objects; const cl_mem *mem_objects; cl_mem_migration_flags flags; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueNDRangeKernel_st { cl_command_queue command_queue; cl_kernel kernel; cl_uint work_dim; const size_t *global_work_offset; const size_t *global_work_size; const size_t *local_work_size; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueTask_st { cl_command_queue command_queue; cl_kernel kernel; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueNativeKernel_st { cl_command_queue command_queue; void (CL_CALLBACK *user_func)(void *); void *args; size_t cb_args; cl_uint num_mem_objects; const cl_mem *mem_list; const void **args_mem_loc; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; #define NUM_ITEMS_clCreateUserEvent 1 #define NUM_ITEMS_clSetUserEventStatus 1 #define NUM_ITEMS_clWaitForEvents 1 #define NUM_ITEMS_clGetEventInfo 1 #define NUM_ITEMS_clSetEventCallback 1 #define NUM_ITEMS_clRetainEvent 1 #define NUM_ITEMS_clReleaseEvent 1 struct clCreateUserEvent_st { cl_context context; cl_int *errcode_ret; }; struct clSetUserEventStatus_st { cl_event event; cl_int execution_status; }; struct clWaitForEvents_st { cl_uint num_events; const cl_event *event_list; }; struct clGetEventInfo_st { cl_event event; cl_event_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; struct clSetEventCallback_st { cl_event event; cl_int command_exec_callback_type; void (CL_CALLBACK 
*pfn_event_notify)(cl_event event, cl_int event_command_exec_status,void *user_data); void *user_data; }; struct clRetainEvent_st { cl_event event; }; struct clReleaseEvent_st { cl_event event; }; #define NUM_ITEMS_clEnqueueMarker 1 #define NUM_ITEMS_clEnqueueWaitForEvents 1 #define NUM_ITEMS_clEnqueueBarrier 1 #define NUM_ITEMS_clEnqueueMarkerWithWaitList 1 #define NUM_ITEMS_clEnqueueBarrierWithWaitList 1 struct clEnqueueMarker_st { cl_command_queue command_queue; cl_event *event; }; struct clEnqueueWaitForEvents_st { cl_command_queue command_queue; cl_uint num_events; const cl_event *event_list; }; struct clEnqueueBarrier_st { cl_command_queue command_queue; }; struct clEnqueueMarkerWithWaitList_st { cl_command_queue command_queue; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueBarrierWithWaitList_st { cl_command_queue command_queue; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; #define NUM_ITEMS_clGetEventProfilingInfo 1 struct clGetEventProfilingInfo_st { cl_event event; cl_profiling_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; #define NUM_ITEMS_clFlush 1 #define NUM_ITEMS_clFinish 1 struct clFlush_st { cl_command_queue command_queue; }; struct clFinish_st { cl_command_queue command_queue; }; #define NUM_ITEMS_clCreateFromGLBuffer 1 struct clCreateFromGLBuffer_st { cl_context context; cl_mem_flags flags; cl_GLuint bufobj; int *errcode_ret; }; #define NUM_ITEMS_clCreateFromGLTexture 1 #define NUM_ITEMS_clCreateFromGLTexture2D 1 #define NUM_ITEMS_clCreateFromGLTexture3D 1 struct clCreateFromGLTexture_st { cl_context context; cl_mem_flags flags; cl_GLenum texture_target; cl_GLint miplevel; cl_GLuint texture; cl_int *errcode_ret; }; struct clCreateFromGLTexture2D_st { cl_context context; cl_mem_flags flags; cl_GLenum texture_target; cl_GLint miplevel; cl_GLuint texture; cl_int *errcode_ret; }; struct clCreateFromGLTexture3D_st { cl_context context; cl_mem_flags flags; cl_GLenum texture_target; cl_GLint miplevel; cl_GLuint texture; cl_int *errcode_ret; }; #define NUM_ITEMS_clCreateFromGLRenderbuffer 1 struct clCreateFromGLRenderbuffer_st { cl_context context; cl_mem_flags flags; cl_GLuint renderbuffer; cl_int *errcode_ret; }; // Query Information [9.8.5] #define NUM_ITEMS_clGetGLObjectInfo 1 #define NUM_ITEMS_clGetGLTextureInfo 1 struct clGetGLObjectInfo_st { cl_mem memobj; cl_gl_object_type *gl_object_type; cl_GLuint *gl_object_name; }; struct clGetGLTextureInfo_st { cl_mem memobj; cl_gl_texture_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; // Share Objects [9.8.6] #define NUM_ITEMS_clEnqueueAcquireGLObjects 1 #define NUM_ITEMS_clEnqueueReleaseGLObjects 1 struct clEnqueueAcquireGLObjects_st { cl_command_queue command_queue; cl_uint num_objects; const cl_mem *mem_objects; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; struct clEnqueueReleaseGLObjects_st { cl_command_queue command_queue; cl_uint num_objects; const cl_mem *mem_objects; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; // CL Event Objects > GL Sync Objects [9.9] #define NUM_ITEMS_clCreateEventFromGLsyncKHR 1 struct clCreateEventFromGLsyncKHR_st { cl_context context; cl_GLsync sync; cl_int *errcode_ret; }; // CL Context > GL Context; Sharegroup [9.7] #define NUM_ITEMS_clGetGLContextInfoKHR 1 struct clGetGLContextInfoKHR_st { const cl_context_properties *properties; 
cl_gl_context_info param_name; size_t param_value_size; void *param_value; size_t *param_value_size_ret; }; #if 0 // OpenCL/Direct3D 10 Sharing APIs [9.10] #define NUM_ITEMS_clGetDeviceIDsFromD3D10KHR 1 #define NUM_ITEMS_clCreateFromD3D10BufferKHR 1 #define NUM_ITEMS_clCreateFromD3D10Texture2DKHR 1 #define NUM_ITEMS_clCreateFromD3D10Texture3DKHR 1 #define NUM_ITEMS_clEnqueueAcquireD3D10ObjectsKHR 1 #define NUM_ITEMS_clEnqueueReleaseD3D10ObjectsKHR 1 struct clGetDeviceIDsFromD3D10KHR_st { cl_platform_id platform; cl_d3d10_device_source_khr d3d_device_source; void *d3d_object; cl_d3d10_device_set_khr d3d_device_set; cl_uint num_entries; cl_device_id *devices; cl_uint *num_devices; }; struct clCreateFromD3D10BufferKHR_st { cl_context context; cl_mem_flags flags; ID3D10Buffer *resource; cl_int *errcode_ret; }; struct clCreateFromD3D10Texture2DKHR_st { cl_context context; cl_mem_flags flags; ID3D10Texture2D *resource; UINT subresource; cl_int *errcode_ret; }; struct clCreateFromD3D10Texture3DKHR_st { cl_context context; cl_mem_flags flags; ID3D10Texture3D *resource; UINT subresource; cl_int *errcode_ret; }; struct clEnqueueAcquireD3D10ObjectsKHR_st { cl_command_queue command_queue; cl_uint num_objects; const cl_mem *mem_objects; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event;}; struct clEnqueueReleaseD3D10ObjectsKHR_st { cl_command_queue command_queue; cl_uint num_objects; const cl_mem *mem_objects; cl_uint num_events_in_wait_list; const cl_event *event_wait_list; cl_event *event; }; #endif #endif /* _PARAM_STRUCT_H_ */
/* ========== clr-rocm-5.7.1/opencl/khronos/icd/test/loader_test/test_buffer_object.c ========== */
/* The angle-bracket header names below were stripped during extraction; they are
   reconstructed from what this file visibly uses (CL types, free(), test_icd_app_log()). */
#include <CL/cl.h>
#include <stdlib.h>
#include "param_struct.h"
#include <platform/icd_test_log.h>
extern cl_mem buffer; extern cl_command_queue command_queue; extern cl_event event; static int ret_val; extern void CL_CALLBACK setmemobjectdestructor_callback(cl_mem _a, void* _b); const struct clEnqueueReadBuffer_st clEnqueueReadBufferData[NUM_ITEMS_clEnqueueReadBuffer] = { {NULL, NULL, 0, 0, 0, NULL, 0, NULL, NULL} }; const struct clEnqueueWriteBuffer_st clEnqueueWriteBufferData[NUM_ITEMS_clEnqueueWriteBuffer] = { {NULL, NULL, 0, 0, 0, NULL, 0, NULL, NULL} }; const struct clEnqueueReadBufferRect_st clEnqueueReadBufferRectData[NUM_ITEMS_clEnqueueReadBufferRect] = { {NULL, NULL, 0, NULL, NULL, NULL, 0, 0, 0, 0, NULL, 0, NULL, NULL} }; const struct clEnqueueWriteBufferRect_st clEnqueueWriteBufferRectData[NUM_ITEMS_clEnqueueWriteBufferRect] = { {NULL, NULL, 0, NULL, NULL, NULL, 0, 0, 0, 0, NULL, 0, NULL, NULL} }; const struct clEnqueueFillBuffer_st clEnqueueFillBufferData[NUM_ITEMS_clEnqueueFillBuffer] = { {NULL, NULL, NULL, 0, 0, 0, 0, NULL, NULL} }; const struct clEnqueueCopyBuffer_st clEnqueueCopyBufferData[NUM_ITEMS_clEnqueueCopyBuffer] = { {NULL, NULL, NULL, 0, 0, 0, 0, NULL, NULL} }; const struct clEnqueueCopyBufferRect_st clEnqueueCopyBufferRectData[NUM_ITEMS_clEnqueueCopyBufferRect] = { {NULL, NULL, NULL, NULL, NULL, NULL, 0, 0, 0, 0, 0, NULL, NULL} }; const struct clEnqueueMapBuffer_st clEnqueueMapBufferData[NUM_ITEMS_clEnqueueMapBuffer] = { {NULL, NULL, 0, 0, 0, 0, 0, NULL, NULL, NULL} }; const struct clRetainMemObject_st clRetainMemObjectData[NUM_ITEMS_clRetainMemObject] = { {NULL} }; const struct clSetMemObjectDestructorCallback_st clSetMemObjectDestructorCallbackData[NUM_ITEMS_clSetMemObjectDestructorCallback] = { {NULL, setmemobjectdestructor_callback, NULL} }; const struct clEnqueueUnmapMemObject_st
clEnqueueUnmapMemObjectData[NUM_ITEMS_clEnqueueUnmapMemObject] = { {NULL, NULL, NULL, 0, NULL, NULL} }; const struct clGetMemObjectInfo_st clGetMemObjectInfoData[NUM_ITEMS_clGetMemObjectInfo] = { {NULL, 0, 0, NULL, NULL} }; int test_clEnqueueReadBuffer(const struct clEnqueueReadBuffer_st *data) { test_icd_app_log("clEnqueueReadBuffer(%p, %p, %u, %u, %u, %p, %u, %p, %p)\n", command_queue, buffer, data->blocking_read, data->offset, data->cb, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueReadBuffer(command_queue, buffer, data->blocking_read, data->offset, data->cb, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueWriteBuffer(const struct clEnqueueWriteBuffer_st *data) { test_icd_app_log("clEnqueueWriteBuffer(%p, %p, %u, %u, %u, %p, %u, %p, %p)\n", command_queue, buffer, data->blocking_write, data->offset, data->cb, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueWriteBuffer(command_queue, buffer, data->blocking_write, data->offset, data->cb, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueReadBufferRect(const struct clEnqueueReadBufferRect_st *data) { test_icd_app_log("clEnqueueReadBufferRect(%p, %p, %u, %p, %p, %p, %u, %u, %u, %u, %p, %u, %p, %p)\n", command_queue, buffer, data->blocking_read, data->buffer_offset, data->host_offset, data->region, data->buffer_row_pitch, data->buffer_slice_pitch, data->host_row_pitch, data->host_slice_pitch, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueReadBufferRect(command_queue, buffer, data->blocking_read, data->buffer_offset, data->host_offset, data->region, data->buffer_row_pitch, data->buffer_slice_pitch, data->host_row_pitch, data->host_slice_pitch, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueWriteBufferRect(const struct clEnqueueWriteBufferRect_st *data) { test_icd_app_log("clEnqueueWriteBufferRect(%p, %p, %u, %p, %p, %p, %u, %u, %u, %u, %p, %u, %p, %p)\n", command_queue, buffer, data->blocking_write, data->buffer_offset, data->host_offset, data->region, data->buffer_row_pitch, data->buffer_slice_pitch, data->host_row_pitch, data->host_slice_pitch, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueWriteBufferRect(command_queue, buffer, data->blocking_write, data->buffer_offset, data->host_offset, data->region, data->buffer_row_pitch, data->buffer_slice_pitch, data->host_row_pitch, data->host_slice_pitch, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueFillBuffer(const struct clEnqueueFillBuffer_st *data) { test_icd_app_log("clEnqueueFillBuffer(%p, %p, %p, %u, %u, %u, %u, %p, %p)\n", command_queue, buffer, data->pattern, data->pattern_size, data->offset, data->cb, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueFillBuffer(command_queue, buffer, data->pattern, data->pattern_size, data->offset, data->cb, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueCopyBuffer(const struct clEnqueueCopyBuffer_st *data) { 
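/* Pattern note (added comment): every test_cl* wrapper in these files does the
   same three things -- log the entry point and the argument values about to be
   passed, invoke the entry point so the call is routed through the ICD loader's
   dispatch table, then log the returned value. The app-side log written via
   test_icd_app_log() can then be compared against the log written by the stub
   driver (see the log-mismatch note in test_clGetPlatformIDs further below),
   which is how the harness verifies the loader forwarded each call unmodified. */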
test_icd_app_log("clEnqueueCopyBuffer(%p, %p, %p, %u, %u, %u, %u, %p, %p)\n", command_queue, data->src_buffer, buffer, data->src_offset, data->dst_offset, data->cb, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueCopyBuffer(command_queue, data->src_buffer, buffer, data->src_offset, data->dst_offset, data->cb, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueCopyBufferRect(const struct clEnqueueCopyBufferRect_st *data) { test_icd_app_log("clEnqueueCopyBufferRect(%p, %p, %p, %p, %p, %p, %u, %u, %u, %u, %u, %p, %p)\n", command_queue, buffer, buffer, data->src_origin, data->dst_origin, data->region, data->src_row_pitch, data->src_slice_pitch, data->dst_row_pitch, data->dst_slice_pitch, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueCopyBufferRect(command_queue, buffer, buffer, data->src_origin, data->dst_origin, data->region, data->src_row_pitch, data->src_slice_pitch, data->dst_row_pitch, data->dst_slice_pitch, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueMapBuffer(const struct clEnqueueMapBuffer_st *data) { void * return_value; test_icd_app_log("clEnqueueMapBuffer(%p, %p, %u, %x, %u, %u, %u, %p, %p, %p)\n", command_queue, buffer, data->blocking_map, data->map_flags, data->offset, data->cb, data->num_events_in_wait_list, data->event_wait_list, &event, data->errcode_ret); return_value=clEnqueueMapBuffer(command_queue, buffer, data->blocking_map, data->map_flags, data->offset, data->cb, data->num_events_in_wait_list, data->event_wait_list, &event, data->errcode_ret); test_icd_app_log("Value returned: %p\n", return_value); free(return_value); return 0; } int test_clRetainMemObject(const struct clRetainMemObject_st *data) { test_icd_app_log("clRetainMemObject(%p)\n", buffer); ret_val=clRetainMemObject(buffer); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clSetMemObjectDestructorCallback(const struct clSetMemObjectDestructorCallback_st *data) { test_icd_app_log("clSetMemObjectDestructorCallback(%p, %p, %p)\n", buffer, data->pfn_notify, data->user_data); ret_val=clSetMemObjectDestructorCallback(buffer, data->pfn_notify, data->user_data); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueUnmapMemObject(const struct clEnqueueUnmapMemObject_st *data) { test_icd_app_log("clEnqueueUnmapMemObject(%p, %p, %p, %u, %p, %p)\n", command_queue, buffer, data->mapped_ptr, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueUnmapMemObject(command_queue, buffer, data->mapped_ptr, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clGetMemObjectInfo (const struct clGetMemObjectInfo_st *data) { test_icd_app_log("clGetMemObjectInfo(%p, %u, %u, %p, %p)\n", buffer, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val=clGetMemObjectInfo(buffer, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n",ret_val); return 0; } int test_buffer_object() { int i; for (i=0; i #include "param_struct.h" #include extern cl_command_queue command_queue; static cl_int ret_val; const struct clRetainCommandQueue_st clRetainCommandQueueData[NUM_ITEMS_clRetainCommandQueue] = { {NULL} }; const struct 
clGetCommandQueueInfo_st clGetCommandQueueInfoData[NUM_ITEMS_clGetCommandQueueInfo] = { {NULL, 0, 0, NULL, NULL} }; int test_clRetainCommandQueue(const struct clRetainCommandQueue_st *data) { test_icd_app_log("clRetainCommandQueue(%p)\n", command_queue); ret_val = clRetainCommandQueue(command_queue); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clGetCommandQueueInfo(const struct clGetCommandQueueInfo_st *data) { test_icd_app_log("clGetCommandQueueInfo(%p, %u, %u, %p, %p)\n", command_queue, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val = clGetCommandQueueInfo(command_queue, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_cl_runtime() { int i; for (i=0; i #include #include #include "param_struct.h" #include extern cl_context context; extern cl_mem buffer; extern cl_command_queue command_queue; extern cl_event event; extern cl_context_properties context_properties[3]; static cl_int ret_val; cl_mem ret_mem; struct clCreateFromGLBuffer_st clCreateFromGLBufferData[NUM_ITEMS_clCreateFromGLBuffer] = { {NULL, 0x0, 0, NULL} }; int test_clCreateFromGLBuffer(const struct clCreateFromGLBuffer_st* data) { test_icd_app_log("clCreateFromGLBuffer(%p, %x, %u, %p)\n", context, data->flags, data->bufobj, data->errcode_ret); ret_mem = clCreateFromGLBuffer(context, data->flags, data->bufobj, data->errcode_ret); test_icd_app_log("Value returned: %p\n", ret_mem); return 0; } struct clCreateFromGLTexture_st clCreateFromGLTextureData[NUM_ITEMS_clCreateFromGLTexture] = { {NULL, 0x0, 0, 0, 0, NULL} }; int test_clCreateFromGLTexture(const struct clCreateFromGLTexture_st* data) { test_icd_app_log("clCreateFromGLTexture(%p, %x, %d, %d, %u, %p)\n", context, data->flags, data->texture_target, data->miplevel, data->texture, data->errcode_ret); ret_mem = clCreateFromGLTexture(context, data->flags, data->texture_target, data->miplevel, data->texture, data->errcode_ret); test_icd_app_log("Value returned: %p\n", ret_mem); return 0; } struct clCreateFromGLTexture2D_st clCreateFromGLTexture2DData[NUM_ITEMS_clCreateFromGLTexture2D] = { {NULL, 0x0, 0, 0, 0, NULL} }; int test_clCreateFromGLTexture2D(const struct clCreateFromGLTexture2D_st* data) { test_icd_app_log("clCreateFromGLTexture2D(%p, %x, %d, %d, %u, %p)\n", context, data->flags, data->texture_target, data->miplevel, data->texture, data->errcode_ret); ret_mem = clCreateFromGLTexture2D(context, data->flags, data->texture_target, data->miplevel, data->texture, data->errcode_ret); test_icd_app_log("Value returned: %p\n", ret_mem); return 0; } struct clCreateFromGLTexture3D_st clCreateFromGLTexture3DData[NUM_ITEMS_clCreateFromGLTexture3D] = { {NULL, 0, 0, 0, 0, NULL} }; int test_clCreateFromGLTexture3D(const struct clCreateFromGLTexture3D_st* data) { test_icd_app_log("clCreateFromGLTexture3D(%p, %x, %d, %d, %u, %p)\n", context, data->flags, data->texture_target, data->miplevel, data->texture, data->errcode_ret); ret_mem = clCreateFromGLTexture3D(context, data->flags, data->texture_target, data->miplevel, data->texture, data->errcode_ret); test_icd_app_log("Value returned: %p\n", ret_mem); return 0; } struct clCreateFromGLRenderbuffer_st clCreateFromGLRenderbufferData[NUM_ITEMS_clCreateFromGLRenderbuffer] = { {NULL, 0x0, 0, NULL} }; int test_clCreateFromGLRenderbuffer(const struct clCreateFromGLRenderbuffer_st* data) { test_icd_app_log("clCreateFromGLRenderbuffer(%p, %x, %d, %p)\n", 
context, data->flags, data->renderbuffer, data->errcode_ret); ret_mem = clCreateFromGLRenderbuffer(context, data->flags, data->renderbuffer, data->errcode_ret); test_icd_app_log("Value returned: %p\n", ret_mem); return 0; } struct clGetGLObjectInfo_st clGetGLObjectInfoData[NUM_ITEMS_clGetGLObjectInfo] = { {NULL, NULL, NULL} }; int test_clGetGLObjectInfo(const struct clGetGLObjectInfo_st* data) { test_icd_app_log("clGetGLObjectInfo(%p, %p, %p)\n", buffer, data->gl_object_type, data->gl_object_name); ret_val = clGetGLObjectInfo(buffer, data->gl_object_type, data->gl_object_name); test_icd_app_log("Value returned: %p\n", ret_val); return ret_val; } struct clGetGLTextureInfo_st clGetGLTextureInfoData[NUM_ITEMS_clGetGLTextureInfo] = { {NULL, 0, 0, NULL, NULL} }; int test_clGetGLTextureInfo(const struct clGetGLTextureInfo_st* data) { test_icd_app_log("clGetGLTextureInfo(%p, %u, %u, %p, %p)\n", buffer, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val = clGetGLTextureInfo (buffer, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %p\n", ret_val); return 0; } struct clEnqueueAcquireGLObjects_st clEnqueueAcquireGLObjectsData[NUM_ITEMS_clEnqueueAcquireGLObjects] = { {NULL, 0, NULL, 0, NULL, NULL} }; int test_clEnqueueAcquireGLObjects(const struct clEnqueueAcquireGLObjects_st* data) { test_icd_app_log("clEnqueueAcquireGLObjects(%p, %u, %p, %u, %p, %p)\n", command_queue, data->num_objects, data->mem_objects, data->num_events_in_wait_list, &event, &event); ret_val = clEnqueueAcquireGLObjects (command_queue, data->num_objects, data->mem_objects, data->num_events_in_wait_list, &event, &event); test_icd_app_log("Value returned: %p\n", ret_val); return 0; } struct clEnqueueReleaseGLObjects_st clEnqueueReleaseGLObjectsData[NUM_ITEMS_clEnqueueReleaseGLObjects] = { {NULL, 0, NULL, 0, NULL, NULL} }; int test_clEnqueueReleaseGLObjects(const struct clEnqueueReleaseGLObjects_st* data) { test_icd_app_log("clEnqueueReleaseGLObjects(%p, %u, %p, %u, %p, %p)\n", command_queue, data->num_objects, data->mem_objects, data->num_events_in_wait_list, &event, &event); ret_val = clEnqueueReleaseGLObjects (command_queue, data->num_objects, data->mem_objects, data->num_events_in_wait_list, &event, &event); test_icd_app_log("Value returned: %p\n", ret_val); return 0; } struct clCreateEventFromGLsyncKHR_st clCreateEventFromGLsyncKHRData[NUM_ITEMS_clCreateEventFromGLsyncKHR] = { {NULL, NULL, NULL} }; typedef CL_API_ENTRY cl_event (CL_API_CALL *PFN_clCreateEventFromGLsyncKHR)(cl_context /* context */, cl_GLsync /* cl_GLsync */, cl_int * /* errcode_ret */); int test_clCreateEventFromGLsyncKHR(const struct clCreateEventFromGLsyncKHR_st* data) { cl_event ret_event; PFN_clCreateEventFromGLsyncKHR pfn_clCreateEventFromGLsyncKHR = NULL; test_icd_app_log("clCreateEventFromGLsyncKHR(%p, %p, %p)\n", context, data->sync, data->errcode_ret); pfn_clCreateEventFromGLsyncKHR = clGetExtensionFunctionAddress("clCreateEventFromGLsyncKHR"); if (!pfn_clCreateEventFromGLsyncKHR) { test_icd_app_log("clGetExtensionFunctionAddress failed!\n"); return 1; } ret_event = pfn_clCreateEventFromGLsyncKHR (context, data->sync, data->errcode_ret); test_icd_app_log("Value returned: %p\n", ret_event); return 0; } struct clGetGLContextInfoKHR_st clGetGLContextInfoKHRData[NUM_ITEMS_clGetGLContextInfoKHR] = { {NULL, 0, 0, NULL, NULL} }; typedef CL_API_ENTRY cl_int (CL_API_CALL *PFN_clGetGLContextInfoKHR)(const cl_context_properties * /* properties 
*/, cl_gl_context_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */); int test_clGetGLContextInfoKHR(const struct clGetGLContextInfoKHR_st* data) { PFN_clGetGLContextInfoKHR pfn_clGetGLContextInfoKHR = NULL; test_icd_app_log("clGetGLContextInfoKHR(%p, %u, %u, %p, %p)\n", context_properties, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); pfn_clGetGLContextInfoKHR = clGetExtensionFunctionAddress("clGetGLContextInfoKHR"); if (!pfn_clGetGLContextInfoKHR) { test_icd_app_log("clGetExtensionFunctionAddress failed!\n"); return 1; } ret_val = pfn_clGetGLContextInfoKHR(context_properties, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %p\n", ret_val); return 0; }
int test_OpenGL_share() { int i;
/* Loop bodies reconstructed; the originals were lost in extraction. */
for(i=0;i<NUM_ITEMS_clCreateFromGLBuffer;i++) test_clCreateFromGLBuffer(&clCreateFromGLBufferData[i]);
for(i=0;i<NUM_ITEMS_clCreateFromGLTexture;i++) test_clCreateFromGLTexture(&clCreateFromGLTextureData[i]);
for(i=0;i<NUM_ITEMS_clCreateFromGLTexture2D;i++) test_clCreateFromGLTexture2D(&clCreateFromGLTexture2DData[i]);
for(i=0;i<NUM_ITEMS_clCreateFromGLTexture3D;i++) test_clCreateFromGLTexture3D(&clCreateFromGLTexture3DData[i]);
for(i=0;i<NUM_ITEMS_clCreateFromGLRenderbuffer;i++) test_clCreateFromGLRenderbuffer(&clCreateFromGLRenderbufferData[i]);
for(i=0;i<NUM_ITEMS_clGetGLObjectInfo;i++) test_clGetGLObjectInfo(&clGetGLObjectInfoData[i]);
for(i=0;i<NUM_ITEMS_clGetGLTextureInfo;i++) test_clGetGLTextureInfo(&clGetGLTextureInfoData[i]);
for(i=0;i<NUM_ITEMS_clEnqueueAcquireGLObjects;i++) test_clEnqueueAcquireGLObjects(&clEnqueueAcquireGLObjectsData[i]);
for(i=0;i<NUM_ITEMS_clEnqueueReleaseGLObjects;i++) test_clEnqueueReleaseGLObjects(&clEnqueueReleaseGLObjectsData[i]);
for(i=0;i<NUM_ITEMS_clCreateEventFromGLsyncKHR;i++) test_clCreateEventFromGLsyncKHR(&clCreateEventFromGLsyncKHRData[i]);
for(i=0;i<NUM_ITEMS_clGetGLContextInfoKHR;i++) test_clGetGLContextInfoKHR(&clGetGLContextInfoKHRData[i]);
return 0; }
/* ========== [next file: create/release call tests; tar header and file name lost in extraction] ========== */
/* Includes reconstructed; the angle-bracket names were lost in extraction. The two
   headers before the deprecation defines are inferred from the malloc()/strcmp()
   calls below; the original set may differ. */
#include <stdlib.h>
#include <string.h>
#define CL_USE_DEPRECATED_OPENCL_1_0_APIS
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS
#include <CL/cl.h>
#include "param_struct.h"
#include <platform/icd_test_log.h>
extern void CL_CALLBACK createcontext_callback(const char* a, const void* b, size_t c, void* d); cl_platform_id* all_platforms; cl_platform_id platform; cl_uint num_platforms; cl_context context; cl_command_queue command_queue; cl_mem buffer; cl_mem subBuffer; cl_mem image; cl_sampler sampler; cl_program program; cl_kernel kernel; cl_event event; cl_device_id devices; cl_context_properties context_properties[3] = { (cl_context_properties)CL_CONTEXT_PLATFORM, 0, 0, }; const struct clGetDeviceIDs_st clGetDeviceIDsData[NUM_ITEMS_clGetDeviceIDs] = { {NULL, 0, 1, NULL, NULL} }; const struct clCreateSampler_st clCreateSamplerData[NUM_ITEMS_clCreateSampler] = { {NULL, 0x0, 0, 0, NULL}, }; const struct clCreateCommandQueue_st clCreateCommandQueueData[NUM_ITEMS_clCreateCommandQueue] = { {NULL, NULL, 0, NULL} }; const struct clCreateContext_st clCreateContextData[NUM_ITEMS_clCreateContext] = { {NULL, 1, NULL, NULL, NULL, NULL} }; const struct clCreateContextFromType_st clCreateContextFromTypeData[NUM_ITEMS_clCreateContextFromType] = { {NULL, 0, createcontext_callback, NULL, NULL} }; const struct clCreateBuffer_st clCreateBufferData[NUM_ITEMS_clCreateBuffer] = { {NULL, 0, 0, NULL, NULL} }; const struct clCreateSubBuffer_st clCreateSubBufferData[NUM_ITEMS_clCreateSubBuffer] = { {NULL, 0, 0, NULL, NULL} }; const struct clCreateImage_st clCreateImageData[NUM_ITEMS_clCreateImage] = { { NULL, 0x0, NULL, NULL, NULL, NULL} }; const struct clCreateImage2D_st clCreateImage2DData[NUM_ITEMS_clCreateImage2D] = { { NULL, 0x0, NULL, 0, 0, 0, NULL, NULL} }; const struct clCreateImage3D_st clCreateImage3DData[NUM_ITEMS_clCreateImage3D] = { { NULL, 0x0, NULL, 0, 0, 0, 0, 0, NULL, NULL } }; struct clReleaseMemObject_st clReleaseMemObjectData[NUM_ITEMS_clReleaseMemObject] = { {NULL} }; struct clReleaseMemObject_st clReleaseMemObjectDataImage[NUM_ITEMS_clReleaseMemObject] = { {NULL} }; const struct clCreateProgramWithSource_st clCreateProgramWithSourceData[NUM_ITEMS_clCreateProgramWithSource] = { {NULL, 0, NULL, NULL, NULL} }; const struct clCreateProgramWithBinary_st clCreateProgramWithBinaryData[NUM_ITEMS_clCreateProgramWithBinary] = { {NULL, 0, NULL, NULL, NULL, NULL, NULL} }; const struct clCreateProgramWithBuiltInKernels_st clCreateProgramWithBuiltInKernelsData[NUM_ITEMS_clCreateProgramWithBuiltInKernels] = { {NULL, 0, NULL, NULL, NULL} }; const struct clCreateKernel_st clCreateKernelData[NUM_ITEMS_clCreateKernel] = { {NULL, NULL, NULL} }; const struct
clCreateKernelsInProgram_st clCreateKernelsInProgramData[NUM_ITEMS_clCreateKernelsInProgram] = { {NULL, 0, NULL, NULL} }; const struct clCreateUserEvent_st clCreateUserEventData[NUM_ITEMS_clCreateUserEvent] = { {NULL, NULL} }; const struct clGetPlatformIDs_st clGetPlatformIDsData[NUM_ITEMS_clGetPlatformIDs] = { {0, NULL, 0} }; /* * Some log messages cause log mismatches when ICD loader calls a driver * function while initializing platforms. The functions clGetPlatform* are most * likely to be called at that time. But nothing stops an ICD loader from * calling a ICD driver function anytime. * * FIXME: Figure out a good way to handle this. */ #define ENABLE_MISMATCHING_PRINTS 0 int test_clGetPlatformIDs(const struct clGetPlatformIDs_st* data) { cl_int ret_val; size_t param_val_ret_size; #define PLATFORM_NAME_SIZE 40 char platform_name[PLATFORM_NAME_SIZE]; cl_uint i; #if ENABLE_MISMATCHING_PRINTS test_icd_app_log("clGetPlatformIDs(%u, %p, %p)\n", data->num_entries, &platforms, &num_platforms); #endif ret_val = clGetPlatformIDs(0, NULL, &num_platforms); if (ret_val != CL_SUCCESS){ return -1; } all_platforms = (cl_platform_id *) malloc (num_platforms * sizeof(cl_platform_id)); ret_val = clGetPlatformIDs(num_platforms, all_platforms, NULL); if (ret_val != CL_SUCCESS){ return -1; } for (i = 0; i < num_platforms; i++) { ret_val = clGetPlatformInfo(all_platforms[i], CL_PLATFORM_NAME, PLATFORM_NAME_SIZE, (void*)platform_name, &param_val_ret_size ); if (ret_val == CL_SUCCESS ){ if(!strcmp(platform_name, "ICD_LOADER_TEST_OPENCL_STUB")) { platform = all_platforms[i]; } } } #if ENABLE_MISMATCHING_PRINTS test_icd_app_log("Value returned: %d\n", ret_val); #endif return 0; } int test_clGetDeviceIDs(const struct clGetDeviceIDs_st* data) { int ret_val; test_icd_app_log("clGetDeviceIDs(%p, %x, %u, %p, %p)\n", platform, data->device_type, data->num_entries, &devices, data->num_devices); ret_val = clGetDeviceIDs(platform, data->device_type, data->num_entries, &devices, data->num_devices); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clCreateContext(const struct clCreateContext_st* data) { test_icd_app_log("clCreateContext(%p, %u, %p, %p, %p, %p)\n", data->properties, data->num_devices, &devices, &createcontext_callback, data->user_data, data->errcode_ret); context = clCreateContext(data->properties, data->num_devices, &devices, &createcontext_callback, data->user_data, data->errcode_ret); test_icd_app_log("Value returned: %p\n", context); return 0; } int test_clCreateContextFromType(const struct clCreateContextFromType_st* data) { test_icd_app_log("clCreateContextFromType(%p, %x, %p, %p, %p)\n", context_properties, data->device_type, data->pfn_notify, data->user_data, data->errcode_ret); context = clCreateContextFromType(context_properties, data->device_type, data->pfn_notify, data->user_data, data->errcode_ret); test_icd_app_log("Value returned: %p\n", context); return 0; } int test_clCreateCommandQueue(const struct clCreateCommandQueue_st *data) { test_icd_app_log("clCreateCommandQueue(%p, %p, %x, %p)\n", context, devices, data->properties, data->errcode_ret); command_queue = clCreateCommandQueue(context, devices, data->properties, data->errcode_ret); test_icd_app_log("Value returned: %p\n", command_queue); return 0; } int test_clCreateBuffer(const struct clCreateBuffer_st *data) { test_icd_app_log("clCreateBuffer(%p, %x, %u, %p, %p)\n", context, data->flags, data->size, data->host_ptr, data->errcode_ret); buffer = clCreateBuffer(context, data->flags, data->size, data->host_ptr,
data->errcode_ret); clReleaseMemObjectData->memobj = buffer; test_icd_app_log("Value returned: %p\n", buffer); return 0; } int test_clCreateSubBuffer(const struct clCreateSubBuffer_st *data) { test_icd_app_log("clCreateSubBuffer(%p, %x, %u, %p, %p)\n", buffer, data->flags, data->buffer_create_type, data->buffer_create_info, data->errcode_ret); subBuffer = clCreateSubBuffer(buffer, data->flags, data->buffer_create_type, data->buffer_create_info, data->errcode_ret); clReleaseMemObjectData->memobj = buffer; test_icd_app_log("Value returned: %p\n", subBuffer); return 0; } int test_clCreateImage(const struct clCreateImage_st *data) { test_icd_app_log("clCreateImage(%p, %x, %p, %p, %p, %p)\n", context, data->flags, data->image_format, data->image_desc, data->host_ptr, data->errcode_ret); image = clCreateImage(context, data->flags, data->image_format, data->image_desc, data->host_ptr, data->errcode_ret); clReleaseMemObjectDataImage[0].memobj = image; test_icd_app_log("Value returned: %p\n", image); return 0; } int test_clCreateImage2D(const struct clCreateImage2D_st *data) { test_icd_app_log("clCreateImage2D(%p, %x, %p, %u, %u, %u, %p, %p)\n", context, data->flags, data->image_format, data->image_width, data->image_height, data->image_row_pitch, data->host_ptr, data->errcode_ret); image = clCreateImage2D(context, data->flags, data->image_format, data->image_width, data->image_height, data->image_row_pitch, data->host_ptr, data->errcode_ret); clReleaseMemObjectDataImage[0].memobj = image; test_icd_app_log("Value returned: %p\n", image); return 0; } int test_clCreateImage3D(const struct clCreateImage3D_st *data) { test_icd_app_log("clCreateImage3D(%p, %x, %p, %u, %u, %u, %u, %u, %p, %p)\n", context, data->flags, data->image_format, data->image_width, data->image_height, data->image_depth, data->image_row_pitch, data->image_slice_pitch, data->host_ptr, data->errcode_ret); image = clCreateImage3D(context, data->flags, data->image_format, data->image_width, data->image_height, data->image_depth, data->image_row_pitch, data->image_slice_pitch, data->host_ptr, data->errcode_ret); clReleaseMemObjectDataImage[0].memobj = image; test_icd_app_log("Value returned: %p\n", image); return 0; } int test_clCreateSampler(const struct clCreateSampler_st *data) { test_icd_app_log("clCreateSampler(%p, %u, %u, %u, %p)\n", context, data->normalized_coords, data->addressing_mode, data->filter_mode, data->errcode_ret); sampler = clCreateSampler(context, data->normalized_coords, data->addressing_mode, data->filter_mode, data->errcode_ret); test_icd_app_log("Value returned: %p\n", sampler); return 0; } int test_clCreateProgramWithSource(const struct clCreateProgramWithSource_st *data) { test_icd_app_log("clCreateProgramWithSource(%p, %u, %p, %p, %p)\n", context, data->count, data->strings, data->lengths, data->errcode_ret); program = clCreateProgramWithSource(context, data->count, data->strings, data->lengths, data->errcode_ret); test_icd_app_log("Value returned: %p\n", program); return 0; } int test_clCreateProgramWithBinary(const struct clCreateProgramWithBinary_st *data) { test_icd_app_log("clCreateProgramWithBinary(%p, %u, %p, %p, %p, %p, %p)\n", context, data->num_devices, &devices, data->lengths, data->binaries, data->binary_status, data->errcode_ret); program = clCreateProgramWithBinary(context, data->num_devices, &devices, data->lengths, data->binaries, data->binary_status, data->errcode_ret); test_icd_app_log("Value returned: %p\n", program); return 0; } int test_clCreateProgramWithBuiltInKernels(const struct 
clCreateProgramWithBuiltInKernels_st *data) { test_icd_app_log("clCreateProgramWithBuiltInKernels(%p, %u, %p, %p, %p)\n", context, data->num_devices, &devices, data->kernel_names, data->errcode_ret); program = clCreateProgramWithBuiltInKernels(context, data->num_devices, &devices, data->kernel_names, data->errcode_ret); test_icd_app_log("Value returned: %p\n", program); return 0; } int test_clCreateKernel(const struct clCreateKernel_st* data) { test_icd_app_log("clCreateKernel(%p, %p, %p)\n", program, data->kernel_name, data->errcode_ret); kernel = clCreateKernel(program, data->kernel_name, data->errcode_ret); test_icd_app_log("Value returned: %p\n", kernel); return 0; } int test_clCreateKernelsInProgram(const struct clCreateKernelsInProgram_st* data) { int ret_val; test_icd_app_log("clCreateKernelsInProgram(%p, %u, %p, %p)\n", program, data->num_kernels, &kernel, data->num_kernels_ret); ret_val = clCreateKernelsInProgram(program, data->num_kernels, &kernel, data->num_kernels_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clCreateUserEvent(const struct clCreateUserEvent_st* data) { test_icd_app_log("clCreateUserEvent(%p, %p)\n", context, data->errcode_ret); event = clCreateUserEvent(context, data->errcode_ret); test_icd_app_log("Value returned: %p\n", event); return 0; } const struct clReleaseSampler_st clReleaseSamplerData[NUM_ITEMS_clReleaseSampler] = { { NULL } }; int test_clReleaseSampler(const struct clReleaseSampler_st *data) { int ret_val = CL_OUT_OF_RESOURCES; test_icd_app_log("clReleaseSampler(%p)\n", sampler); ret_val = clReleaseSampler(sampler); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clReleaseMemObject(const struct clReleaseMemObject_st *data) { int ret_val = -15; test_icd_app_log("clReleaseMemObject(%p)\n", data->memobj); ret_val = clReleaseMemObject(data->memobj); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } const struct clReleaseEvent_st clReleaseEventData[NUM_ITEMS_clReleaseEvent] = { {NULL} }; int test_clReleaseEvent(const struct clReleaseEvent_st* data) { int ret_val = CL_OUT_OF_RESOURCES; test_icd_app_log("clReleaseEvent(%p)\n", event); ret_val = clReleaseEvent(event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } const struct clReleaseKernel_st clReleaseKernelData[NUM_ITEMS_clReleaseKernel] = { {NULL} }; int test_clReleaseKernel(const struct clReleaseKernel_st* data) { int ret_val = CL_OUT_OF_RESOURCES; test_icd_app_log("clReleaseKernel(%p)\n", kernel); ret_val = clReleaseKernel(kernel); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } const struct clReleaseProgram_st clReleaseProgramData[NUM_ITEMS_clReleaseProgram] = { {NULL} }; int test_clReleaseProgram(const struct clReleaseProgram_st *data) { int ret_val = CL_OUT_OF_RESOURCES; test_icd_app_log("clReleaseProgram(%p)\n", program); ret_val = clReleaseProgram(program); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } const struct clReleaseCommandQueue_st clReleaseCommandQueueData[NUM_ITEMS_clReleaseCommandQueue] = { {NULL} }; int test_clReleaseCommandQueue(const struct clReleaseCommandQueue_st *data) { int ret_val = CL_OUT_OF_RESOURCES; test_icd_app_log("clReleaseCommandQueue(%p)\n", command_queue); ret_val = clReleaseCommandQueue(command_queue); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } const struct clReleaseContext_st clReleaseContextData[NUM_ITEMS_clReleaseContext] = { {NULL} }; int test_clReleaseContext(const struct clReleaseContext_st* data) { int ret_val = 
CL_OUT_OF_RESOURCES; test_icd_app_log("clReleaseContext(%p)\n", context); ret_val = clReleaseContext(context); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } const struct clReleaseDevice_st clReleaseDeviceData[NUM_ITEMS_clReleaseDevice] = { {NULL} }; int test_clReleaseDevice(const struct clReleaseDevice_st* data) { int ret_val = CL_OUT_OF_RESOURCES; test_icd_app_log("clReleaseDevice(%p)\n", devices); ret_val = clReleaseDevice(devices); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_create_calls() { test_clGetPlatformIDs(clGetPlatformIDsData); context_properties[1] = (cl_context_properties) platform; test_clGetDeviceIDs(clGetDeviceIDsData); test_clCreateContext(clCreateContextData); test_clReleaseContext(clReleaseContextData); test_clCreateContextFromType(clCreateContextFromTypeData); test_clCreateCommandQueue(clCreateCommandQueueData); test_clCreateBuffer(clCreateBufferData); test_clCreateSubBuffer(clCreateSubBufferData); test_clCreateImage(clCreateImageData); test_clReleaseMemObject(clReleaseMemObjectDataImage); test_clCreateImage2D(clCreateImage2DData); test_clReleaseMemObject(clReleaseMemObjectDataImage); test_clCreateImage3D(clCreateImage3DData); test_clCreateSampler(clCreateSamplerData); test_clCreateProgramWithSource(clCreateProgramWithSourceData); test_clReleaseProgram(clReleaseProgramData); test_clCreateProgramWithBinary(clCreateProgramWithBinaryData); test_clReleaseProgram(clReleaseProgramData); test_clCreateProgramWithBuiltInKernels(clCreateProgramWithBuiltInKernelsData); test_clCreateKernel(clCreateKernelData); test_clCreateKernelsInProgram(clCreateKernelsInProgramData); test_clCreateUserEvent(clCreateUserEventData); return 0; } int test_release_calls() { test_clReleaseSampler(clReleaseSamplerData); test_clReleaseMemObject(clReleaseMemObjectData); test_clReleaseMemObject(clReleaseMemObjectDataImage); test_clReleaseEvent(clReleaseEventData); test_clReleaseKernel(clReleaseKernelData); test_clReleaseProgram(clReleaseProgramData); test_clReleaseCommandQueue(clReleaseCommandQueueData); test_clReleaseContext(clReleaseContextData); test_clReleaseDevice(clReleaseDeviceData); return 0; }
/* ========== clr-rocm-5.7.1/opencl/khronos/icd/test/loader_test/test_image_objects.c ========== */
/* Modifications Copyright(C) 2022 Advanced Micro Devices, Inc.
 * All rights reserved.
 */
/* Includes reconstructed; the angle-bracket names were lost in extraction. */
#include <CL/cl.h>
#include <stdlib.h>
#include "param_struct.h"
#include <platform/icd_test_log.h>
extern cl_mem image; extern cl_context context; extern cl_command_queue command_queue; extern cl_event event; extern cl_mem buffer; static int ret_val; const struct clGetSupportedImageFormats_st clGetSupportedImageFormatsData[NUM_ITEMS_clGetSupportedImageFormats] = { { NULL, 0x0, 0, 0, NULL, NULL } }; const struct clEnqueueCopyImageToBuffer_st clEnqueueCopyImageToBufferData[NUM_ITEMS_clEnqueueCopyImageToBuffer] = { { NULL, NULL, NULL, NULL, NULL, 0, 0, NULL, NULL } }; const struct clEnqueueCopyBufferToImage_st clEnqueueCopyBufferToImageData[NUM_ITEMS_clEnqueueCopyBufferToImage] = { { NULL, NULL, NULL, 0, NULL, NULL, 0, NULL, NULL } }; const struct clEnqueueMapImage_st clEnqueueMapImageData[NUM_ITEMS_clEnqueueMapImage] = { { NULL, NULL, 0, 0x0, NULL, NULL, NULL, NULL,0, NULL, NULL} }; const struct clEnqueueReadImage_st clEnqueueReadImageData[NUM_ITEMS_clEnqueueReadImage] = { { NULL, NULL, 0, NULL, NULL, 0, 0, NULL, 0, NULL, NULL } }; const struct clEnqueueWriteImage_st clEnqueueWriteImageData[NUM_ITEMS_clEnqueueWriteImage] = { { NULL, NULL, 0, NULL, NULL, 0, 0, NULL, 0, NULL, NULL } }; const struct clEnqueueFillImage_st clEnqueueFillImageData[NUM_ITEMS_clEnqueueFillImage] = { { NULL, NULL, NULL, NULL, NULL, 0, NULL, NULL } }; const struct clEnqueueCopyImage_st clEnqueueCopyImageData[NUM_ITEMS_clEnqueueCopyImage] = { { NULL, NULL, NULL, NULL, NULL, NULL, 0, NULL, NULL } }; const struct clGetImageInfo_st clGetImageInfoData[NUM_ITEMS_clGetImageInfo] = { { NULL, 0, 0, NULL, NULL} }; int test_clGetSupportedImageFormats(const struct clGetSupportedImageFormats_st *data) { test_icd_app_log("clGetSupportedImageFormats(%p, %x, %u, %u, %p, %p)\n", context, data->flags, data->image_type, data->num_entries, data->image_formats, data->num_image_formats); ret_val = clGetSupportedImageFormats(context, data->flags, data->image_type, data->num_entries, data->image_formats, data->num_image_formats); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueCopyImageToBuffer(const struct clEnqueueCopyImageToBuffer_st *data) { test_icd_app_log("clEnqueueCopyImageToBuffer(%p, %p, %p, %p, %p, %u, %u, %p, %p)\n", command_queue, image, buffer, data->src_origin, data->region, data->dst_offset, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val = clEnqueueCopyImageToBuffer(command_queue, image, buffer, data->src_origin, data->region, data->dst_offset, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueCopyBufferToImage(const struct clEnqueueCopyBufferToImage_st *data) { test_icd_app_log("clEnqueueCopyBufferToImage(%p, %p, %p, %u, %p, %p, %u, %p, %p)\n", command_queue, buffer, image, data->src_offset, data->dst_origin, data->region, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val = clEnqueueCopyBufferToImage(command_queue, buffer, image, data->src_offset, data->dst_origin, data->region, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueMapImage(const struct clEnqueueMapImage_st *data) { void *return_value; test_icd_app_log("clEnqueueMapImage(%p, %p, %u, %x, %p, %p, %p, %p, %u, %p, %p, %p)\n", command_queue, image, data->blocking_map, data->map_flags, data->origin, data->region, data->image_row_pitch, data->image_slice_pitch, data->num_events_in_wait_list, data->event_wait_list, &event, data->errcode_ret);
return_value = clEnqueueMapImage(command_queue, image, data->blocking_map, data->map_flags, data->origin, data->region, data->image_row_pitch, data->image_slice_pitch, data->num_events_in_wait_list, data->event_wait_list, &event, data->errcode_ret); test_icd_app_log("Value returned: %p\n", return_value); free(return_value); return 0; } int test_clEnqueueReadImage(const struct clEnqueueReadImage_st *data) { test_icd_app_log("clEnqueueReadImage(%p, %p, %u, %p, %p, %u, %u, %p, %u, %p, %p)\n", command_queue, image, data->blocking_read, data->origin, data->region, data->row_pitch, data->slice_pitch, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val = clEnqueueReadImage(command_queue, image, data->blocking_read, data->origin, data->region, data->row_pitch, data->slice_pitch, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueWriteImage(const struct clEnqueueWriteImage_st *data) { test_icd_app_log("clEnqueueWriteImage(%p, %p, %u, %p, %p, %u, %u, %p, %u, %p, %p)\n", command_queue, image, data->blocking_write, data->origin, data->region, data->input_row_pitch, data->input_slice_pitch, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val = clEnqueueWriteImage(command_queue, image, data->blocking_write, data->origin, data->region, data->input_row_pitch, data->input_slice_pitch, data->ptr, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueFillImage(const struct clEnqueueFillImage_st *data) { test_icd_app_log("clEnqueueFillImage(%p, %p, %p, %p, %p, %u, %p, %p)\n", command_queue, image, data->fill_color, data->origin, data->region, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val = clEnqueueFillImage(command_queue, image, data->fill_color, data->origin, data->region, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clEnqueueCopyImage(const struct clEnqueueCopyImage_st *data) { test_icd_app_log("clEnqueueCopyImage(%p, %p, %p, %p, %p, %p, %u, %p, %p)\n", command_queue, image, image, data->src_origin, data->dst_origin, data->region, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val = clEnqueueCopyImage(command_queue, image, image, data->src_origin, data->dst_origin, data->region, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clGetImageInfo(const struct clGetImageInfo_st *data) { test_icd_app_log("clGetImageInfo(%p, %u, %u, %p, %p)\n", image, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val = clGetImageInfo(image, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; }
int test_image_objects() { int i;
/* Loop bodies reconstructed; the originals were lost in extraction. */
for (i = 0; i < NUM_ITEMS_clGetSupportedImageFormats; i++) test_clGetSupportedImageFormats(&clGetSupportedImageFormatsData[i]);
for (i = 0; i < NUM_ITEMS_clEnqueueCopyImageToBuffer; i++) test_clEnqueueCopyImageToBuffer(&clEnqueueCopyImageToBufferData[i]);
for (i = 0; i < NUM_ITEMS_clEnqueueCopyBufferToImage; i++) test_clEnqueueCopyBufferToImage(&clEnqueueCopyBufferToImageData[i]);
for (i = 0; i < NUM_ITEMS_clEnqueueMapImage; i++) test_clEnqueueMapImage(&clEnqueueMapImageData[i]);
for (i = 0; i < NUM_ITEMS_clEnqueueReadImage; i++) test_clEnqueueReadImage(&clEnqueueReadImageData[i]);
for (i = 0; i < NUM_ITEMS_clEnqueueWriteImage; i++) test_clEnqueueWriteImage(&clEnqueueWriteImageData[i]);
for (i = 0; i < NUM_ITEMS_clEnqueueFillImage; i++) test_clEnqueueFillImage(&clEnqueueFillImageData[i]);
for (i = 0; i < NUM_ITEMS_clEnqueueCopyImage; i++) test_clEnqueueCopyImage(&clEnqueueCopyImageData[i]);
for (i = 0; i < NUM_ITEMS_clGetImageInfo; i++) test_clGetImageInfo(&clGetImageInfoData[i]);
return 0; }
/* ========== [next file: kernel and event tests; tar header and file name lost in extraction] ========== */
/* Includes reconstructed; the angle-bracket names were lost in extraction. */
#include <CL/cl.h>
#include "param_struct.h"
#include <platform/icd_test_log.h>
extern cl_kernel kernel; extern cl_event event; extern cl_context context; extern cl_command_queue command_queue; extern cl_device_id devices; static int ret_val; extern void CL_CALLBACK setevent_callback(cl_event _a, cl_int _b, void* _c); extern void CL_CALLBACK setprintf_callback(cl_context _a, cl_uint _b, char* _c, void* _d ); struct clRetainKernel_st clRetainKernelData[NUM_ITEMS_clRetainKernel] = { {NULL} }; int test_clRetainKernel(const struct
clRetainKernel_st* data) { test_icd_app_log("clRetainKernel(%p)\n", kernel); ret_val=clRetainKernel(kernel); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clSetKernelArg_st clSetKernelArgData[NUM_ITEMS_clSetKernelArg] = { {NULL, 0, 0, NULL} }; int test_clSetKernelArg(const struct clSetKernelArg_st* data) { test_icd_app_log("clSetKernelArg(%p, %u, %u, %p)\n", kernel, data->arg_index, data->arg_size, data->arg_value); ret_val=clSetKernelArg(kernel, data->arg_index, data->arg_size, data->arg_value); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clGetKernelInfo_st clGetKernelInfoData[NUM_ITEMS_clGetKernelInfo] = { {NULL, 0, 0, NULL, NULL} }; int test_clGetKernelInfo(const struct clGetKernelInfo_st* data) { test_icd_app_log("clGetKernelInfo(%p, %u, %u, %p, %p)\n", kernel, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val=clGetKernelInfo(kernel, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clGetKernelArgInfo_st clGetKernelArgInfoData[NUM_ITEMS_clGetKernelArgInfo] = { {NULL, 0, 0, 0, NULL, NULL} }; int test_clGetKernelArgInfo(const struct clGetKernelArgInfo_st* data) { test_icd_app_log("clGetKernelArgInfo(%p, %u, %u, %u, %p, %p)\n", kernel, data->arg_indx, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val=clGetKernelArgInfo(kernel, data->arg_indx, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clGetKernelWorkGroupInfo_st clGetKernelWorkGroupInfoData[NUM_ITEMS_clGetKernelWorkGroupInfo] = { {NULL, NULL, 0, 0, NULL, NULL} }; int test_clGetKernelWorkGroupInfo(const struct clGetKernelWorkGroupInfo_st* data) { test_icd_app_log("clGetKernelWorkGroupInfo(%p, %p, %u, %u, %p, %p)\n", kernel, devices, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val=clGetKernelWorkGroupInfo(kernel, devices, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clEnqueueMigrateMemObjects_st clEnqueueMigrateMemObjectsData[NUM_ITEMS_clEnqueueMigrateMemObjects] = { {NULL, 0, NULL, 0x0, 0, NULL, NULL} }; int test_clEnqueueMigrateMemObjects(const struct clEnqueueMigrateMemObjects_st* data) { test_icd_app_log("clEnqueueMigrateMemObjects(%p, %u, %p, %x, %u, %p, %p)\n", command_queue, data->num_mem_objects, data->mem_objects, data->flags, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueMigrateMemObjects(command_queue, data->num_mem_objects, data->mem_objects, data->flags, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clEnqueueNDRangeKernel_st clEnqueueNDRangeKernelData[NUM_ITEMS_clEnqueueNDRangeKernel] = { {NULL, NULL, 0, NULL, NULL, NULL, 0, NULL} }; int test_clEnqueueNDRangeKernel(const struct clEnqueueNDRangeKernel_st* data) { test_icd_app_log("clEnqueueNDRangeKernel(%p, %p, %u, %p, %p, %p, %u, %p, %p)\n", command_queue, kernel, data->work_dim, data->global_work_offset, data->global_work_size, data->local_work_size, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueNDRangeKernel(command_queue, kernel, data->work_dim, data->global_work_offset, data->global_work_size, 
data->local_work_size, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clEnqueueTask_st clEnqueueTaskData[NUM_ITEMS_clEnqueueTask] = { {NULL, NULL, 0, NULL, NULL} }; int test_clEnqueueTask(const struct clEnqueueTask_st* data) { test_icd_app_log("clEnqueueTask(%p, %p, %u, %p, %p)\n", command_queue, kernel, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueTask(command_queue, kernel, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clEnqueueNativeKernel_st clEnqueueNativeKernelData[NUM_ITEMS_clEnqueueNativeKernel] = { {NULL, NULL, NULL, 0, 0, NULL, NULL, 0, NULL, NULL} }; int test_clEnqueueNativeKernel(const struct clEnqueueNativeKernel_st* data) { test_icd_app_log("clEnqueueNativeKernel(%p, %p, %p, %u, %u, %p, %p, %u, %p, %p)\n", command_queue, data->user_func, data->args, data->cb_args, data->num_mem_objects, data->mem_list, data->args_mem_loc, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueNativeKernel(command_queue, data->user_func, data->args, data->cb_args, data->num_mem_objects, data->mem_list, data->args_mem_loc, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clSetUserEventStatus_st clSetUserEventStatusData[NUM_ITEMS_clSetUserEventStatus] = { {NULL, 0} }; int test_clSetUserEventStatus(const struct clSetUserEventStatus_st* data) { test_icd_app_log("clSetUserEventStatus(%p, %d)\n", event, data->execution_status); ret_val=clSetUserEventStatus(event, data->execution_status); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clWaitForEvents_st clWaitForEventsData[NUM_ITEMS_clWaitForEvents] = { {1, NULL} }; int test_clWaitForEvents(const struct clWaitForEvents_st* data) { test_icd_app_log("clWaitForEvents(%u, %p)\n", data->num_events, &event); ret_val=clWaitForEvents(data->num_events, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clGetEventInfo_st clGetEventInfoData[NUM_ITEMS_clGetEventInfo] = { {NULL, 0, 0, NULL, NULL} }; int test_clGetEventInfo(const struct clGetEventInfo_st* data){ test_icd_app_log("clGetEventInfo(%p, %u, %u, %p, %p)\n", event, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val=clGetEventInfo(event, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clSetEventCallback_st clSetEventCallbackData[NUM_ITEMS_clSetEventCallback] = { {NULL, 0, setevent_callback, NULL} }; int test_clSetEventCallback(const struct clSetEventCallback_st* data) { test_icd_app_log("clSetEventCallback(%p, %d, %p, %p)\n", event, data->command_exec_callback_type, data->pfn_event_notify, data->user_data); ret_val=clSetEventCallback(event, data->command_exec_callback_type, data->pfn_event_notify, data->user_data); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clRetainEvent_st clRetainEventData[NUM_ITEMS_clRetainEvent] = { {NULL} }; int test_clRetainEvent(const struct clRetainEvent_st* data) { test_icd_app_log("clRetainEvent(%p)\n", event); ret_val=clRetainEvent(event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clEnqueueMarker_st clEnqueueMarkerData[NUM_ITEMS_clEnqueueMarker] = { {NULL, NULL} }; int test_clEnqueueMarker(const struct 
clEnqueueMarker_st* data) { test_icd_app_log("clEnqueueMarker(%p, %p)\n", command_queue, &event); ret_val = clEnqueueMarker(command_queue, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clEnqueueMarkerWithWaitList_st clEnqueueMarkerWithWaitListData[NUM_ITEMS_clEnqueueMarkerWithWaitList] = { {NULL, 0, NULL, NULL} }; int test_clEnqueueMarkerWithWaitList(const struct clEnqueueMarkerWithWaitList_st* data) { test_icd_app_log("clEnqueueMarkerWithWaitList(%p, %u, %p, %p)\n", command_queue, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueMarkerWithWaitList(command_queue, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clEnqueueBarrierWithWaitList_st clEnqueueBarrierWithWaitListData[NUM_ITEMS_clEnqueueBarrierWithWaitList] = { {NULL, 0, NULL, NULL} }; int test_clEnqueueBarrierWithWaitList(const struct clEnqueueBarrierWithWaitList_st* data) { test_icd_app_log("clEnqueueBarrierWithWaitList(%p, %u, %p, %p)\n", command_queue, data->num_events_in_wait_list, data->event_wait_list, &event); ret_val=clEnqueueBarrierWithWaitList(command_queue, data->num_events_in_wait_list, data->event_wait_list, &event); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clEnqueueWaitForEvents_st clEnqueueWaitForEventsData[NUM_ITEMS_clEnqueueWaitForEvents] = { {NULL, 0, NULL} }; int test_clEnqueueWaitForEvents(const struct clEnqueueWaitForEvents_st* data) { test_icd_app_log("clEnqueueWaitForEvents(%p, %u, %p)\n", command_queue, data->num_events, data->event_list); ret_val = clEnqueueWaitForEvents(command_queue, data->num_events, data->event_list); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clEnqueueBarrier_st clEnqueueBarrierData[NUM_ITEMS_clEnqueueBarrier] = { {NULL} }; int test_clEnqueueBarrier(const struct clEnqueueBarrier_st* data) { test_icd_app_log("clEnqueueBarrier(%p)\n", command_queue); ret_val = clEnqueueBarrier(command_queue); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clGetEventProfilingInfo_st clGetEventProfilingInfoData[NUM_ITEMS_clGetEventProfilingInfo] = { {NULL, 0, 0, NULL, NULL} }; int test_clGetEventProfilingInfo(const struct clGetEventProfilingInfo_st* data) { test_icd_app_log("clGetEventProfilingInfo(%p, %u, %u, %p, %p)\n", event, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val=clGetEventProfilingInfo(event, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clFlush_st clFlushData[NUM_ITEMS_clFlush] = { {NULL} }; int test_clFlush(const struct clFlush_st* data) { test_icd_app_log("clFlush(%p)\n", command_queue); ret_val=clFlush(command_queue); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } struct clFinish_st clFinishData[NUM_ITEMS_clFinish] = { {NULL} }; int test_clFinish(const struct clFinish_st* data) { test_icd_app_log("clFinish(%p)\n", command_queue); ret_val=clFinish(command_queue); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_kernel() { int i; for (i=0; i #include "param_struct.h" #include extern cl_context context; extern cl_platform_id platform; extern cl_device_id devices; static int ret_val; struct clRetainContext_st clRetainContextData[NUM_ITEMS_clRetainContext] = { {NULL} }; struct clGetContextInfo_st clGetContextInfoData[NUM_ITEMS_clGetContextInfo] = { {NULL, 0, 0, NULL, 
NULL} }; struct clGetPlatformInfo_st clGetPlatformInfoData[NUM_ITEMS_clGetPlatformInfo] = { {NULL, 0, 0, NULL, NULL} }; struct clGetDeviceInfo_st clGetDeviceInfoData[NUM_ITEMS_clGetDeviceInfo] = { {NULL, 0, 0, NULL, NULL} }; struct clCreateSubDevices_st clCreateSubDevicesData[NUM_ITEMS_clCreateSubDevices] = { {NULL, NULL, 0, NULL, NULL} }; struct clRetainDevice_st clRetainDeviceData[NUM_ITEMS_clRetainDevice] = { {NULL} }; int test_clRetainContext(const struct clRetainContext_st* data) { test_icd_app_log("clRetainContext(%p)\n", context); ret_val = clRetainContext(context); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clGetContextInfo(const struct clGetContextInfo_st* data) { test_icd_app_log("clGetContextInfo(%p, %u, %u, %p, %p)\n", context, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val = clGetContextInfo(context, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clGetPlatformInfo(const struct clGetPlatformInfo_st* data) { test_icd_app_log("clGetPlatformInfo(%p, %u, %u, %p, %p)\n", platform, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val = clGetPlatformInfo(platform, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clGetDeviceInfo(const struct clGetDeviceInfo_st* data) { test_icd_app_log("clGetDeviceInfo(%p, %u, %u, %p, %p)\n", devices, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val = clGetDeviceInfo(devices, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clCreateSubDevices(const struct clCreateSubDevices_st* data) { test_icd_app_log("clCreateSubDevices(%p, %p, %u, %p, %p)\n", devices, data->properties, data->num_entries, &devices, data->num_devices); ret_val = clCreateSubDevices(devices, data->properties, data->num_entries, &devices, data->num_devices); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clRetainDevice(const struct clRetainDevice_st* data) { test_icd_app_log("clRetainDevice(%p)\n", devices); ret_val = clRetainDevice(devices); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_platforms() { int i; for (i = 0;i<NUM_ITEMS_clRetainContext;i++) { test_clRetainContext(&clRetainContextData[i]); } for (i = 0;i<NUM_ITEMS_clGetContextInfo;i++) { test_clGetContextInfo(&clGetContextInfoData[i]); } for (i = 0;i<NUM_ITEMS_clGetPlatformInfo;i++) { test_clGetPlatformInfo(&clGetPlatformInfoData[i]); } for (i = 0;i<NUM_ITEMS_clGetDeviceInfo;i++) { test_clGetDeviceInfo(&clGetDeviceInfoData[i]); } for (i = 0;i<NUM_ITEMS_clCreateSubDevices;i++) { test_clCreateSubDevices(&clCreateSubDevicesData[i]); } for (i = 0;i<NUM_ITEMS_clRetainDevice;i++) { test_clRetainDevice(&clRetainDeviceData[i]); } return 0; } #include <CL/cl.h> #include "param_struct.h" #include <platform/icd_test_log.h> extern cl_context context; extern cl_program program; extern cl_platform_id platform; extern cl_device_id devices; static int ret_val; extern void CL_CALLBACK program_callback(cl_program _a, void* _b); const struct clRetainProgram_st clRetainProgramData[NUM_ITEMS_clRetainProgram]= { {NULL} }; const struct clBuildProgram_st clBuildProgramData[NUM_ITEMS_clBuildProgram]= { {NULL,0,NULL,NULL,program_callback,NULL} }; const struct clCompileProgram_st clCompileProgramData[NUM_ITEMS_clCompileProgram]= { {NULL,0,NULL,NULL,0,NULL,NULL,program_callback,NULL} }; const struct clLinkProgram_st clLinkProgramData[NUM_ITEMS_clLinkProgram]= { {NULL,0,NULL,NULL,0,NULL,program_callback,NULL,NULL} }; const struct clUnloadPlatformCompiler_st clUnloadPlatformCompilerData[NUM_ITEMS_clUnloadPlatformCompiler]= { {NULL} }; const struct clGetExtensionFunctionAddressForPlatform_st clGetExtensionFunctionAddressForPlatformData[NUM_ITEMS_clGetExtensionFunctionAddressForPlatform]= { {NULL, ""} }; const struct
clGetProgramInfo_st clGetProgramInfoData[NUM_ITEMS_clGetProgramInfo]= { {NULL,0,0,NULL,NULL} }; const struct clGetProgramBuildInfo_st clGetProgramBuildInfoData[NUM_ITEMS_clGetProgramBuildInfo]= { {NULL,NULL,0,0,NULL,NULL} }; int test_clRetainProgram(const struct clRetainProgram_st *data) { test_icd_app_log("clRetainProgram(%p)\n", program); ret_val=clRetainProgram(program); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clBuildProgram(const struct clBuildProgram_st *data) { test_icd_app_log("clBuildProgram(%p, %u, %p, %p, %p, %p)\n", program, data->num_devices, &devices, data->options, data->pfn_notify, data->user_data); ret_val=clBuildProgram(program, data->num_devices, &devices, data->options, data->pfn_notify, data->user_data); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clCompileProgram(const struct clCompileProgram_st *data) { test_icd_app_log("clCompileProgram(%p, %u, %p, %p, %u, %p, %p, %p)\n", program, data->num_devices, &devices, data->options, data->num_input_headers, data->header_include_names, data->pfn_notify, data->user_data); ret_val=clCompileProgram(program, data->num_devices, &devices, data->options, data->num_input_headers, data->headers, data->header_include_names, data->pfn_notify, data->user_data); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clLinkProgram(const struct clLinkProgram_st *data) { cl_program program; test_icd_app_log("clLinkProgram(%p, %u, %p, %p, %u, %p, %p, %p, %p)\n", context, data->num_devices, data->device_list, data->options, data->num_input_programs, data->input_programs, data->pfn_notify, data->user_data, data->errcode_ret); program=clLinkProgram(context, data->num_devices, data->device_list, data->options, data->num_input_programs, data->input_programs, data->pfn_notify, data->user_data, data->errcode_ret); test_icd_app_log("Value returned: %p\n", program); return 0; } int test_clUnloadPlatformCompiler(const struct clUnloadPlatformCompiler_st *data) { test_icd_app_log("clUnloadPlatformCompiler(%p)\n", platform); ret_val=clUnloadPlatformCompiler(platform); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clGetExtensionFunctionAddressForPlatform(const struct clGetExtensionFunctionAddressForPlatform_st *data) { void *return_value; test_icd_app_log("clGetExtensionFunctionAddressForPlatform(%p, %p)\n", platform, data->func_name); return_value=clGetExtensionFunctionAddressForPlatform(platform, data->func_name); test_icd_app_log("Value returned: %p\n", return_value); return 0; } int test_clGetProgramInfo(const struct clGetProgramInfo_st *data) { test_icd_app_log("clGetProgramInfo(%p, %u, %u, %p, %p)\n", program, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val=clGetProgramInfo(program, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clGetProgramBuildInfo(const struct clGetProgramBuildInfo_st *data) { test_icd_app_log("clGetProgramBuildInfo(%p, %p, %u, %u, %p, %p)\n", program, data->device, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val=clGetProgramBuildInfo(program, data->device, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_program_objects() { int i; for (i=0;i<NUM_ITEMS_clRetainProgram;i++) { test_clRetainProgram(&clRetainProgramData[i]); } for (i=0;i<NUM_ITEMS_clBuildProgram;i++) { test_clBuildProgram(&clBuildProgramData[i]); } for (i=0;i<NUM_ITEMS_clCompileProgram;i++) { test_clCompileProgram(&clCompileProgramData[i]); } for (i=0;i<NUM_ITEMS_clLinkProgram;i++) { test_clLinkProgram(&clLinkProgramData[i]); } for (i=0;i<NUM_ITEMS_clUnloadPlatformCompiler;i++) { test_clUnloadPlatformCompiler(&clUnloadPlatformCompilerData[i]); } for (i=0;i<NUM_ITEMS_clGetExtensionFunctionAddressForPlatform;i++) { test_clGetExtensionFunctionAddressForPlatform(&clGetExtensionFunctionAddressForPlatformData[i]); } for (i=0;i<NUM_ITEMS_clGetProgramInfo;i++) { test_clGetProgramInfo(&clGetProgramInfoData[i]); } for (i=0;i<NUM_ITEMS_clGetProgramBuildInfo;i++) { test_clGetProgramBuildInfo(&clGetProgramBuildInfoData[i]); } return 0; }
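The loader tests above all follow one pattern: a static array of parameter structs supplies one argument set per OpenCL entry point, each test_clX() wrapper logs the call through test_icd_app_log() and then dispatches it, and an aggregate function such as test_platforms() or test_program_objects() walks every array. A minimal sketch of how such a suite could be driven end to end is shown below; the driver function and the pass criterion are illustrative assumptions, with only the aggregate functions and the log helpers from icd_test_log.c (further below in this tree) taken from the source.

/* Illustrative driver sketch, not part of the repository. */
#include <stdlib.h>
#include <string.h>

extern int test_platforms(void);
extern int test_program_objects(void);
extern char *test_icd_get_app_log(void);   /* defined in icd_test_log.c */
extern char *test_icd_get_stub_log(void);  /* defined in icd_test_log.c */

int run_loader_tests(void) {
  int ok;
  char *app, *stub;
  /* Each wrapper logs its arguments and then calls the real entry point,
   * so the application-side log can be compared with what the stub ICD
   * recorded on the other side of the dispatch table. */
  test_platforms();
  test_program_objects();
  app = test_icd_get_app_log();
  stub = test_icd_get_stub_log();
  ok = app != NULL && stub != NULL &&
       strcmp(app, stub) == 0;  /* assumed pass criterion for this sketch */
  free(app);
  free(stub);
  return ok ? 0 : 1;
}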
#include <CL/cl.h> #include "param_struct.h" #include <platform/icd_test_log.h> extern cl_sampler sampler; static int ret_val; const struct clRetainSampler_st clRetainSamplerData[NUM_ITEMS_clRetainSampler]= { { NULL } }; const struct clGetSamplerInfo_st clGetSamplerInfoData[NUM_ITEMS_clGetSamplerInfo]= { { NULL, 0, 0, NULL, NULL } }; int test_clRetainSampler(const struct clRetainSampler_st *data) { test_icd_app_log("clRetainSampler(%p)\n", sampler); ret_val=clRetainSampler(sampler); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_clGetSamplerInfo(const struct clGetSamplerInfo_st *data) { test_icd_app_log("clGetSamplerInfo(%p, %u, %u, %p, %p)\n", sampler, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); ret_val=clGetSamplerInfo(sampler, data->param_name, data->param_value_size, data->param_value, data->param_value_size_ret); test_icd_app_log("Value returned: %d\n", ret_val); return 0; } int test_sampler_objects() { int i; for (i=0;i<NUM_ITEMS_clRetainSampler;i++) { test_clRetainSampler(&clRetainSamplerData[i]); } for (i=0;i<NUM_ITEMS_clGetSamplerInfo;i++) { test_clGetSamplerInfo(&clGetSamplerInfoData[i]); } return 0; } #include <stdio.h> #include <stdlib.h> #include <stdarg.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <platform/icd_test_log.h> #define APP_LOG_FILE "icd_test_app_log.txt" #define STUB_LOG_FILE "icd_test_stub_log.txt" static FILE *app_log_file; static FILE *stub_log_file; int test_icd_initialize_app_log(void) { app_log_file = fopen(APP_LOG_FILE, "w"); if (!app_log_file) { printf("Unable to open file %s\n", APP_LOG_FILE); return -1; } return 0; } void test_icd_close_app_log(void) { fclose(app_log_file); } void test_icd_app_log(const char *format, ...) { va_list args; va_start(args, format); vfprintf(app_log_file, format, args); va_end(args); } int test_icd_initialize_stub_log(void) { stub_log_file = fopen(STUB_LOG_FILE, "w"); if (!stub_log_file) { printf("Unable to open file %s\n", STUB_LOG_FILE); return -1; } return 0; } void test_icd_close_stub_log(void) { fclose(stub_log_file); } void test_icd_stub_log(const char *format, ...) { va_list args; va_start(args, format); vfprintf(stub_log_file, format, args); va_end(args); } static char *test_icd_get_log(const char *filename) { struct stat statbuf; FILE *fp; char *source = NULL; fp = fopen(filename, "rb"); if (fp) { size_t fsize = 0; stat(filename, &statbuf); fsize = statbuf.st_size; source = (char *)malloc(fsize+1); // +1 for NULL terminator if (source) { if (fsize) { if (fread(source, fsize, 1, fp) != 1) { free(source); source = NULL; } else { source[fsize] = '\0'; } } else { // Don't fail when fsize = 0, just return empty string source[fsize] = '\0'; } } fclose(fp); } return source; } char *test_icd_get_app_log(void) { return test_icd_get_log(APP_LOG_FILE); } char *test_icd_get_stub_log(void) { return test_icd_get_log(STUB_LOG_FILE); } clr-rocm-5.7.1/opencl/opencl-backward-compat.cmake000066400000000000000000000157001450307266000220470ustar00rootroot00000000000000# Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved. # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. cmake_minimum_required(VERSION 3.16.8) set(OPENCL ${PROJECT_NAME}) set(OPENCL_BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR}) set(OPENCL_WRAPPER_DIR ${OPENCL_BUILD_DIR}/wrapper_dir) set(OPENCL_WRAPPER_INC_DIR ${OPENCL_WRAPPER_DIR}/include/CL) set(OPENCL_WRAPPER_BIN_DIR ${OPENCL_WRAPPER_DIR}/bin) set(OPENCL_WRAPPER_LIB_DIR ${OPENCL_WRAPPER_DIR}/lib) #Function to generate header template file function(create_header_template) file(WRITE ${OPENCL_WRAPPER_DIR}/header.hpp.in "/* Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef @include_guard@ #define @include_guard@ #ifndef ROCM_HEADER_WRAPPER_WERROR #define ROCM_HEADER_WRAPPER_WERROR @deprecated_error@ #endif #if ROCM_HEADER_WRAPPER_WERROR /* ROCM_HEADER_WRAPPER_WERROR 1 */ #error \"This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with CL\" #else /* ROCM_HEADER_WRAPPER_WERROR 0 */ #if defined(__GNUC__) #warning \"This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with CL\" #else #pragma message(\"This file is deprecated. 
Use file from include path /opt/rocm-ver/include/ and prefix with CL\") #endif #endif /* ROCM_HEADER_WRAPPER_WERROR */ @include_statements@ #endif") endfunction() #use header template file and generate wrapper header files function(generate_wrapper_header) file(MAKE_DIRECTORY ${OPENCL_WRAPPER_INC_DIR}) set(HEADER_DIR ${CMAKE_CURRENT_SOURCE_DIR}/khronos/headers/opencl2.2/CL ) #find all header files from CL folder file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/khronos/headers/opencl2.2/CL/*) #remove files that are not required in package list(REMOVE_ITEM include_files ${HEADER_DIR}/cl_egl.h ${HEADER_DIR}/cl_dx9_media_sharing.h ${HEADER_DIR}/cl_d3d11.h ${HEADER_DIR}/cl_d3d10.h) #Generate wrapper header files foreach(header_file ${include_files}) # set include guard get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE) string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME) set(include_guard "${include_guard}OPENCL_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H") #set #include statement get_filename_component(file_name ${header_file} NAME) set(include_statements "${include_statements}#include \"../../../${CMAKE_INSTALL_INCLUDEDIR}/CL/${file_name}\"\n") configure_file(${OPENCL_WRAPPER_DIR}/header.hpp.in ${OPENCL_WRAPPER_INC_DIR}/${file_name}) unset(include_guard) unset(include_statements) endforeach() endfunction() #function to create symlink to binaries function(create_binary_symlink) file(MAKE_DIRECTORY ${OPENCL_WRAPPER_BIN_DIR}) set(file_name "clinfo") add_custom_target(link_${file_name} ALL WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} COMMAND ${CMAKE_COMMAND} -E create_symlink ../../${CMAKE_INSTALL_BINDIR}/${file_name} ${OPENCL_WRAPPER_BIN_DIR}/${file_name}) endfunction() #function to create symlink to libraries function(create_library_symlink) if(BUILD_ICD) file(MAKE_DIRECTORY ${OPENCL_WRAPPER_LIB_DIR}) set(LIB_OPENCL "libOpenCL.so") set(MAJ_VERSION "${OPENCL_LIB_VERSION_MAJOR}") set(SO_VERSION "${OPENCL_LIB_VERSION_STRING}") set(library_files "${LIB_OPENCL}" "${LIB_OPENCL}.${MAJ_VERSION}" "${LIB_OPENCL}.${SO_VERSION}") foreach(file_name ${library_files}) add_custom_target(link_${file_name} ALL WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} COMMAND ${CMAKE_COMMAND} -E create_symlink ../../${CMAKE_INSTALL_LIBDIR}/${file_name} ${OPENCL_WRAPPER_LIB_DIR}/${file_name}) endforeach() endif() if(BUILD_SHARED_LIBS) set(LIB_AMDDOC "libamdocl64.so") else() set(LIB_AMDDOC "libamdocl64.a") endif() set(file_name "${LIB_AMDDOC}") add_custom_target(link_${file_name} ALL WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} COMMAND ${CMAKE_COMMAND} -E create_symlink ../../${CMAKE_INSTALL_LIBDIR}/${file_name} ${OPENCL_WRAPPER_DIR}/${file_name}) endfunction() #Create a template for header file create_header_template() #Use template header file and generate wrapper header files generate_wrapper_header() install(DIRECTORY ${OPENCL_WRAPPER_INC_DIR} DESTINATION ${OPENCL}/include COMPONENT dev) # Create symlink to binaries create_binary_symlink() install(DIRECTORY ${OPENCL_WRAPPER_BIN_DIR} DESTINATION ${OPENCL} COMPONENT binary) option(BUILD_SHARED_LIBS "Build the shared library" ON) # Create symlink to libraries create_library_symlink() if(BUILD_ICD) install(DIRECTORY ${OPENCL_WRAPPER_LIB_DIR} DESTINATION ${OPENCL} COMPONENT icd) endif() if(BUILD_SHARED_LIBS) install(FILES ${OPENCL_WRAPPER_DIR}/libamdocl64.so DESTINATION ${OPENCL}/lib COMPONENT binary) else() install(FILES ${OPENCL_WRAPPER_DIR}/libamdocl64.a DESTINATION ${OPENCL}/lib COMPONENT binary) endif()
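For reference, the wrapper that generate_wrapper_header() emits for a header such as cl.h would look roughly like the following. This rendering is illustrative only: it assumes @deprecated_error@ was configured to 0 and that CMAKE_INSTALL_INCLUDEDIR is the default include, and the license banner is left out.

/* Illustrative rendering of header.hpp.in for CL/cl.h (assumptions noted above). */
#ifndef OPENCL_WRAPPER_INCLUDE_CL_H
#define OPENCL_WRAPPER_INCLUDE_CL_H
#ifndef ROCM_HEADER_WRAPPER_WERROR
#define ROCM_HEADER_WRAPPER_WERROR 0
#endif
#if ROCM_HEADER_WRAPPER_WERROR  /* ROCM_HEADER_WRAPPER_WERROR 1 */
#error "This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with CL"
#else  /* ROCM_HEADER_WRAPPER_WERROR 0 */
#if defined(__GNUC__)
#warning "This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with CL"
#else
#pragma message("This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with CL")
#endif
#endif /* ROCM_HEADER_WRAPPER_WERROR */
#include "../../../include/CL/cl.h"
#endif

Compiling against the wrapper directory therefore warns by default and fails hard when ROCM_HEADER_WRAPPER_WERROR is predefined to 1, which is exactly the knob the template exposes.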
clr-rocm-5.7.1/opencl/packaging/000077500000000000000000000000001450307266000164515ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/packaging/CMakeLists.txt000066400000000000000000000144421450307266000212160ustar00rootroot00000000000000cmake_minimum_required(VERSION 3.5.1) project(rocm-opencl) set(CPACK_COMPONENTS_ALL binary dev) if(BUILD_ICD) set(CPACK_COMPONENTS_ALL "${CPACK_COMPONENTS_ALL}" icd) endif() if(BUILD_TESTS) set(CPACK_COMPONENTS_ALL "${CPACK_COMPONENTS_ALL}" ocltst) endif() if(ENABLE_ASAN_PACKAGING) set(CPACK_COMPONENTS_ALL asan) endif() set(CPACK_DEB_COMPONENT_INSTALL ON) set(CPACK_RPM_COMPONENT_INSTALL ON) install(TARGETS clinfo DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT binary) install(TARGETS amdocl DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT binary) install(TARGETS amdocl DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan) install(FILES ${opencl_SOURCE_DIR}/LICENSE.txt DESTINATION ${CMAKE_INSTALL_DOCDIR} COMPONENT binary) install(FILES ${opencl_SOURCE_DIR}/LICENSE.txt DESTINATION ${CMAKE_INSTALL_DOCDIR}-asan COMPONENT asan) install(DIRECTORY ${opencl_SOURCE_DIR}/khronos/headers/opencl2.2/CL DESTINATION ${CMAKE_INSTALL_INCLUDEDIR} COMPONENT dev USE_SOURCE_PERMISSIONS PATTERN cl_d3d10.h EXCLUDE PATTERN cl_d3d11.h EXCLUDE PATTERN cl_dx9_media_sharing.h EXCLUDE ) if(BUILD_ICD) install(TARGETS OpenCL DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT icd ) install(TARGETS OpenCL DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan ) install(FILES ${opencl_SOURCE_DIR}/khronos/icd/LICENSE DESTINATION ${CMAKE_INSTALL_DATADIR}/doc/rocm-ocl-icd COMPONENT icd) endif() # Generic CPACK variables set(CPACK_GENERATOR "DEB;RPM" CACHE STRING "Package types to build") set(CPACK_PACKAGE_CONTACT "ROCm OpenCL Support ") set(CPACK_PACKAGE_VENDOR "Advanced Micro Devices, Inc.") if(ENABLE_ASAN_PACKAGING) set(CPACK_PACKAGE_DESCRIPTION "OpenCL: AddressSanitizer Instrumented libraries for Open Computing Language on ROCclr") else() set(CPACK_PACKAGE_DESCRIPTION "OpenCL: Open Computing Language on ROCclr") endif() #########Binary-Package############### set(CPACK_DEBIAN_BINARY_PACKAGE_NAME "rocm-opencl") set(CPACK_RPM_BINARY_PACKAGE_NAME "rocm-opencl") ## Set ASAN package name set(CPACK_DEBIAN_ASAN_PACKAGE_NAME "rocm-opencl-asan") set(CPACK_RPM_ASAN_PACKAGE_NAME "rocm-opencl-asan") # Debian CPACK variables set(CPACK_BINARY_DEB "ON") if(DEFINED ENV{CPACK_DEBIAN_PACKAGE_RELEASE}) set(CPACK_DEBIAN_PACKAGE_RELEASE $ENV{CPACK_DEBIAN_PACKAGE_RELEASE}) else() set(CPACK_DEBIAN_PACKAGE_RELEASE "local") endif() message("Using CPACK_DEBIAN_PACKAGE_RELEASE ${CPACK_DEBIAN_PACKAGE_RELEASE}") set(CPACK_DEBIAN_FILE_NAME "DEB-DEFAULT") if(BUILD_ICD) set(CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS "libelf-dev, comgr, hsa-rocr, rocm-core") set(CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS "libelf-dev, comgr-asan, hsa-rocr-asan, rocm-core-asan") else() set(CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS "libelf-dev, comgr, hsa-rocr, rocm-core") set(CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS "libelf-dev, comgr-asan, hsa-rocr-asan, rocm-core-asan") endif() # RPM CPACK variables set(CPACK_BINARY_RPM "ON") if(BUILD_ICD) set(CPACK_RPM_BINARY_PACKAGE_REQUIRES "comgr, hsa-rocr, rocm-core") set(CPACK_RPM_ASAN_PACKAGE_REQUIRES "comgr-asan, hsa-rocr-asan, rocm-core-asan") else() set(CPACK_RPM_BINARY_PACKAGE_REQUIRES "comgr, hsa-rocr, rocm-core") set(CPACK_RPM_ASAN_PACKAGE_REQUIRES "comgr-asan, hsa-rocr-asan, rocm-core-asan") endif() #Unable to set CPACK_RPM_BINARY_PACKAGE_LICENSE to control individual package license #Hence combining the license for BINARY,DEV,ICD to
one set(CPACK_RPM_PACKAGE_LICENSE "MIT and ASL 2.0") #########Dev-Package############### # DEBIAN CPACK variables set(CPACK_DEBIAN_DEV_PACKAGE_NAME "rocm-opencl-dev") set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS "mesa-common-dev, rocm-opencl, hsa-rocr-dev, rocm-core") # RPM CPACK variables set(CPACK_RPM_DEV_PACKAGE_NAME "rocm-opencl-devel") set(CPACK_RPM_DEV_PACKAGE_REQUIRES "rocm-opencl, hsa-rocr-devel, rocm-core") ############################# # ICD Loader ############################# # Debian CPACK variables if(BUILD_ICD) set(CPACK_ICD_DEB "ON") set(CPACK_DEBIAN_ICD_PACKAGE_NAME "rocm-ocl-icd") set(CPACK_DEBIAN_ICD_PACKAGE_CONTROL_EXTRA "${CMAKE_BINARY_DIR}/opencl/packages/rocm-ocl-icd/postinst;${CMAKE_BINARY_DIR}/opencl/packages/rocm-ocl-icd/prerm") set(CPACK_DEBIAN_ICD_PACKAGE_DEPENDS "rocm-core") # RPM CPACK variables set(CPACK_ICD_RPM "ON") set(CPACK_RPM_ICD_PACKAGE_NAME "rocm-ocl-icd") set(CPACK_RPM_ICD_POST_INSTALL_SCRIPT_FILE "${CMAKE_BINARY_DIR}/opencl/packages/rocm-ocl-icd/rpm_post") set(CPACK_RPM_ICD_POST_UNINSTALL_SCRIPT_FILE "${CMAKE_BINARY_DIR}/opencl/packages/rocm-ocl-icd/rpm_postun") set(CPACK_RPM_ICD_PACKAGE_REQUIRES "rocm-core") endif() if(BUILD_TESTS) set(CPACK_OCLTST_DEB "ON") set(CPACK_DEBIAN_OCLTST_PACKAGE_NAME "rocm-ocltst") set(CPACK_OCLTST_RPM "ON") set(CPACK_RPM_OCLTST_PACKAGE_NAME "rocm-ocltst") endif() if(DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE}) set(CPACK_RPM_PACKAGE_RELEASE $ENV{CPACK_RPM_PACKAGE_RELEASE}) else() set(CPACK_RPM_PACKAGE_RELEASE "local") endif() message("Using CPACK_RPM_PACKAGE_RELEASE ${CPACK_RPM_PACKAGE_RELEASE}") set(CPACK_RPM_FILE_NAME "RPM-DEFAULT") set(CPACK_RPM_PACKAGE_AUTOREQPROV " no") ## 'dist' breaks manual builds on debian systems due to empty Provides execute_process(COMMAND rpm --eval %{?dist} RESULT_VARIABLE PROC_RESULT OUTPUT_VARIABLE EVAL_RESULT OUTPUT_STRIP_TRAILING_WHITESPACE) message("RESULT_VARIABLE ${PROC_RESULT} OUTPUT_VARIABLE: ${EVAL_RESULT}") if(PROC_RESULT EQUAL "0" AND NOT EVAL_RESULT STREQUAL "") string(APPEND CPACK_RPM_PACKAGE_RELEASE "%{?dist}") endif() # Remove dependency on rocm-core if -DROCM_DEP_ROCMCORE=ON not given to cmake if(NOT ROCM_DEP_ROCMCORE) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_BINARY_PACKAGE_REQUIRES ${CPACK_RPM_BINARY_PACKAGE_REQUIRES}) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS ${CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS}) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DEV_PACKAGE_REQUIRES ${CPACK_RPM_DEV_PACKAGE_REQUIRES}) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DEV_PACKAGE_DEPENDS ${CPACK_DEBIAN_DEV_PACKAGE_DEPENDS}) if(BUILD_ICD) string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_ICD_PACKAGE_REQUIRES ${CPACK_RPM_ICD_PACKAGE_REQUIRES}) string(REGEX REPLACE ",? 
?rocm-core" "" CPACK_DEBIAN_ICD_PACKAGE_DEPENDS ${CPACK_DEBIAN_ICD_PACKAGE_DEPENDS}) endif() endif() include(CPack) clr-rocm-5.7.1/opencl/packaging/rocm-ocl-icd.postinst000066400000000000000000000007561450307266000225360ustar00rootroot00000000000000#!/bin/bash set -e INSTALL_PATH=@CPACK_PACKAGING_INSTALL_PREFIX@ do_ldconfig() { echo ${INSTALL_PATH}/@CMAKE_INSTALL_LIBDIR@ > /@CMAKE_INSTALL_SYSCONFDIR@/ld.so.conf.d/10-rocm-opencl.conf && ldconfig mkdir -p /@CMAKE_INSTALL_SYSCONFDIR@/OpenCL/vendors && (echo libamdocl64.so > /@CMAKE_INSTALL_SYSCONFDIR@/OpenCL/vendors/@OPENCL_AMD_ICD_FILE@) } case "$1" in abort-deconfigure|abort-remove|abort-upgrade) echo "$1" ;; configure) do_ldconfig ;; *) exit 0 ;; esac clr-rocm-5.7.1/opencl/packaging/rocm-ocl-icd.prerm000066400000000000000000000004411450307266000217870ustar00rootroot00000000000000#!/bin/bash set -e rm_ldconfig() { rm -f /@CMAKE_INSTALL_SYSCONFDIR@/ld.so.conf.d/10-rocm-opencl.conf && ldconfig rm -f /@CMAKE_INSTALL_SYSCONFDIR@/OpenCL/vendors/@OPENCL_AMD_ICD_FILE@ } case "$1" in purge) ;; remove | upgrade ) rm_ldconfig ;; *) exit 0 ;; esac clr-rocm-5.7.1/opencl/packaging/rocm-ocl-icd.rpm_post000066400000000000000000000004661450307266000225140ustar00rootroot00000000000000INSTALL_PATH=@CPACK_PACKAGING_INSTALL_PREFIX@ echo ${INSTALL_PATH}/@CMAKE_INSTALL_LIBDIR@ > /@CMAKE_INSTALL_SYSCONFDIR@/ld.so.conf.d/10-rocm-opencl.conf && ldconfig mkdir -p /@CMAKE_INSTALL_SYSCONFDIR@/OpenCL/vendors && (echo libamdocl64.so > /@CMAKE_INSTALL_SYSCONFDIR@/OpenCL/vendors/@OPENCL_AMD_ICD_FILE@) clr-rocm-5.7.1/opencl/packaging/rocm-ocl-icd.rpm_postun000066400000000000000000000004471450307266000230560ustar00rootroot00000000000000if [ $1 -eq 0 ]; then # Remove rocm-opencl.conf during remove/uninstall operation rm -f /@CMAKE_INSTALL_SYSCONFDIR@/ld.so.conf.d/10-rocm-opencl.conf && ldconfig fi # Remove icd file for uninstall and upgrade operation rm -f /@CMAKE_INSTALL_SYSCONFDIR@/OpenCL/vendors/@OPENCL_AMD_ICD_FILE@ clr-rocm-5.7.1/opencl/tests/000077500000000000000000000000001450307266000156675ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tests/ocltst/000077500000000000000000000000001450307266000171775ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tests/ocltst/CMakeLists.txt000066400000000000000000000006621450307266000217430ustar00rootroot00000000000000cmake_minimum_required(VERSION 3.5.1) set(OCLTST_DIR ${CMAKE_CURRENT_SOURCE_DIR}) if (WIN32) set(OCLTST_INSTALL_DIR "tests/ocltst") else() set(OCLTST_INSTALL_DIR "share/opencl/ocltst") endif() find_package(OpenGL) find_package(GLEW) add_subdirectory(module/common) add_subdirectory(env) if(OPENGL_FOUND AND GLEW_FOUND) add_subdirectory(module/gl) endif() add_subdirectory(module/perf) add_subdirectory(module/runtime) clr-rocm-5.7.1/opencl/tests/ocltst/env/000077500000000000000000000000001450307266000177675ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tests/ocltst/env/CMakeLists.txt000066400000000000000000000023071450307266000225310ustar00rootroot00000000000000add_executable(ocltst) target_sources(ocltst PRIVATE ${OCLTST_DIR}/env/oclTestLog.cpp ${OCLTST_DIR}/env/oclsysinfo.cpp ${OCLTST_DIR}/env/ocltst.cpp ${OCLTST_DIR}/env/pfm.cpp ${OCLTST_DIR}/env/Timer.cpp ${OCLTST_DIR}/module/common/BaseTestImp.cpp ${OCLTST_DIR}/module/common/OCLTestImp.cpp ${OCLTST_DIR}/module/common/OCLThread.cpp ${OCLTST_DIR}/module/common/OCLWrapper.cpp) # Windows compatibility logic if (WIN32) target_sources(ocltst PRIVATE ${OCLTST_DIR}/env/getopt.cpp ${OCLTST_DIR}/env/ServiceCode.cpp ${OCLTST_DIR}/env/window.cpp) endif()
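Returning briefly to the rocm-ocl-icd maintainer scripts above: installation registers the runtime by writing libamdocl64.so into the OpenCL vendors directory and adding the library path to the loader cache, and removal undoes both. A quick, illustrative way to confirm that the registration took effect (this check is not part of the packages) is to enumerate platforms through the ICD loader:

/* Illustrative post-install check; uses only the standard OpenCL C API. */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
  cl_uint num_platforms = 0;
  /* The ICD loader discovers drivers through the vendors directory the
   * postinst script populates. */
  cl_int err = clGetPlatformIDs(0, NULL, &num_platforms);
  if (err != CL_SUCCESS || num_platforms == 0) {
    fprintf(stderr, "no OpenCL platforms registered (err=%d)\n", err);
    return 1;
  }
  printf("found %u OpenCL platform(s)\n", num_platforms);
  return 0;
}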
set_target_properties(ocltst PROPERTIES CXX_STANDARD 14 CXX_STANDARD_REQUIRED ON CXX_EXTENSIONS OFF RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst) target_compile_definitions(ocltst PRIVATE $<TARGET_PROPERTY:OpenCL,INTERFACE_COMPILE_DEFINITIONS>) target_include_directories(ocltst PRIVATE $<TARGET_PROPERTY:OpenCL,INTERFACE_INCLUDE_DIRECTORIES>) target_link_libraries(ocltst PRIVATE OpenCL ) set_target_properties(ocltst PROPERTIES INSTALL_RPATH "$ORIGIN") INSTALL(TARGETS ocltst DESTINATION ${OCLTST_INSTALL_DIR} COMPONENT ocltst) clr-rocm-5.7.1/opencl/tests/ocltst/env/Module.h000066400000000000000000000033451450307266000213720ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef OCL_TEST_MODULE_H #define OCL_TEST_MODULE_H #include <string> #include "OCLTest.h" #include "OCLTestList.h" struct Module { std::string name; ModuleHandle hmodule; TestCountFuncPtr get_count; TestNameFuncPtr get_name; CreateTestFuncPtr create_test; DestroyTestFuncPtr destroy_test; TestVersionFuncPtr get_version; TestLibNameFuncPtr get_libname; OCLTest** cached_test; Module() : name(""), hmodule(0), get_count(0), get_name(0), create_test(0), destroy_test(0), get_version(0), get_libname(0), cached_test(0) { // EMPTY! } }; #endif // OCL_TEST_MODULE_H
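Module.h above is essentially a table of entry points that ocltst resolves from a test library at run time. The sketch below shows how such a record could be populated on Linux; the exported symbol names and the helper itself are assumptions for illustration, since the real typedefs and loading code live in OCLTestList.h and the harness sources.

// Illustrative sketch only; "amdTestCount" and "amdTestName" are hypothetical
// export names, not taken from the ocltst sources.
#include <dlfcn.h>
#include <string>

typedef unsigned int (*TestCountFuncPtr)(void);
typedef const char* (*TestNameFuncPtr)(unsigned int index);

struct ModuleSketch {
  std::string name;
  void* hmodule = nullptr;
  TestCountFuncPtr get_count = nullptr;
  TestNameFuncPtr get_name = nullptr;
};

static bool load_module(ModuleSketch& m, const std::string& path) {
  m.name = path;
  m.hmodule = dlopen(path.c_str(), RTLD_NOW);  // open the test library
  if (m.hmodule == nullptr) return false;
  m.get_count = reinterpret_cast<TestCountFuncPtr>(dlsym(m.hmodule, "amdTestCount"));
  m.get_name = reinterpret_cast<TestNameFuncPtr>(dlsym(m.hmodule, "amdTestName"));
  return m.get_count != nullptr && m.get_name != nullptr;  // all symbols resolved
}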
clr-rocm-5.7.1/opencl/tests/ocltst/env/ResultStruct.h000066400000000000000000000037661450307266000226330ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _RESULT_STRUCT_H_ #define _RESULT_STRUCT_H_ #include <string> struct IndicesRange { int startIndex; int endIndex; }; #define INDEX_ALL_TESTS -1 #define EXTREMELY_SMALL_VALUE -10000.0f #define EXTREMELY_LARGE_VALUE 10000.0f class TestResult { public: float value; std::string resultString; bool passed; TestResult(float val) : resultString("\n"), passed(true) { value = val; } void reset(float val) { value = val; passed = true; resultString.assign("\n"); } }; class Report { public: TestResult *max; TestResult *min; bool success; int numFailedTests; Report() : success(true), numFailedTests(0) { max = new TestResult(EXTREMELY_SMALL_VALUE); min = new TestResult(EXTREMELY_LARGE_VALUE); } void reset() { max->reset(EXTREMELY_SMALL_VALUE); min->reset(EXTREMELY_LARGE_VALUE); success = true; numFailedTests = 0; } ~Report() { delete max; delete min; } }; #endif // _RESULT_STRUCT_H_ clr-rocm-5.7.1/opencl/tests/ocltst/env/ServiceCode.cpp000066400000000000000000000236561450307266000227020ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/ #include <windows.h> #define CL_USE_DEPRECATED_OPENCL_2_0_APIS 1 #include "CL/cl.hpp" SERVICE_STATUS serviceStatus = {0}; SERVICE_STATUS_HANDLE serviceStatusHandle = 0; const wchar_t* CrossProcessEventName = L"Global\\OpenCL_Test_serviceEvent"; const wchar_t* successMessage = L"OpenCL Service Test Success\n"; const wchar_t* serviceName = L"OpenCL Test service"; // this event is set whenever the service thread is finished executing // all it's tasks HANDLE RetireServiceEvent = 0; DWORD WINAPI ThreadProc(LPVOID lpdwThreadParam); ////////////////////////// // log relate functions // ////////////////////////// void getLogFileName(wchar_t fileName[MAX_PATH]) { DWORD dwSize = GetModuleFileNameW(NULL, fileName, MAX_PATH); wchar_t* p = fileName + dwSize; while (*p != '\\' && p > fileName) p--; p++; wcscpy(p, L"result.txt"); } VOID WriteLog(const wchar_t* pMsg) { static wchar_t fileName[MAX_PATH] = {0}; if (fileName[0] == 0) getLogFileName(fileName); FILE* pLog = _wfopen(fileName, L"w"); if (NULL != pLog) { fwprintf(pLog, pMsg); fclose(pLog); } } VOID AppendLog(const wchar_t* pMsg) { static wchar_t fileName[MAX_PATH] = {0}; if (fileName[0] == 0) getLogFileName(fileName); FILE* pLog = _wfopen(fileName, L"a"); if (NULL != pLog) { fwprintf(pLog, pMsg); fclose(pLog); } } VOID AppendLog(const char* pMsg) { static wchar_t fileName[MAX_PATH] = {0}; if (fileName[0] == 0) getLogFileName(fileName); FILE* pLog = _wfopen(fileName, L"a"); if (NULL != pLog) { fprintf(pLog, pMsg); fclose(pLog); } } /////////////////////////////// // service related functions // /////////////////////////////// void WINAPI ServiceControlHandler(DWORD controlCode) { switch (controlCode) { case SERVICE_CONTROL_INTERROGATE: break; case SERVICE_CONTROL_SHUTDOWN: case SERVICE_CONTROL_STOP: serviceStatus.dwCurrentState = SERVICE_STOP_PENDING; if (!SetServiceStatus(serviceStatusHandle, &serviceStatus)) AppendLog(L"SetServiceStatus SERVICE_STOP_PENDING failed\n"); if (RetireServiceEvent) SetEvent(RetireServiceEvent); return; case SERVICE_CONTROL_PAUSE: break; case SERVICE_CONTROL_CONTINUE: break; default: if (controlCode >= 128 && controlCode <= 255) // user defined control code break; else // unrecognised control code break; } if (!SetServiceStatus(serviceStatusHandle, &serviceStatus)) AppendLog(L"SetServiceStatus SERVICE_STOP_PENDING failed\n"); } void WINAPI ServiceMain(DWORD /*argc*/, wchar_t* /*argv*/[]) { // initialise service status serviceStatus.dwServiceType = SERVICE_WIN32; serviceStatus.dwCurrentState = SERVICE_START_PENDING; serviceStatus.dwControlsAccepted = SERVICE_ACCEPT_SHUTDOWN; serviceStatus.dwWin32ExitCode = NO_ERROR; serviceStatus.dwServiceSpecificExitCode = NO_ERROR; serviceStatus.dwCheckPoint = 0; serviceStatus.dwWaitHint = 0; serviceStatusHandle = RegisterServiceCtrlHandlerW(serviceName, ServiceControlHandler); if (serviceStatusHandle) { // service is starting serviceStatus.dwCurrentState = SERVICE_START_PENDING; if (!SetServiceStatus(serviceStatusHandle, &serviceStatus)) AppendLog(L"SetServiceStatus SERVICE_START_PENDING failed\n"); // do initialisation here RetireServiceEvent = CreateEvent(0, FALSE, FALSE, 0); // running serviceStatus.dwControlsAccepted |= (SERVICE_ACCEPT_STOP | SERVICE_ACCEPT_SHUTDOWN); serviceStatus.dwCurrentState = SERVICE_RUNNING; if (!SetServiceStatus(serviceStatusHandle, &serviceStatus)) AppendLog(L"SetServiceStatus SERVICE_RUNNING failed\n"); // Create the thread that actually does the CL testing CreateThread(NULL, 0, ThreadProc, NULL, 0, NULL); // wait for the thread to finish
WaitForSingleObject(RetireServiceEvent, 60000); HANDLE crossProcessEvent = OpenEventW(EVENT_ALL_ACCESS, FALSE, CrossProcessEventName); if (NULL != crossProcessEvent) { SetEvent(crossProcessEvent); } else { AppendLog(L"cross process Event could not be opened\n"); } // service was stopped serviceStatus.dwCurrentState = SERVICE_STOP_PENDING; if (!SetServiceStatus(serviceStatusHandle, &serviceStatus)) AppendLog(L"SetServiceStatus SERVICE_STOP_PENDING failed\n"); // do cleanup here CloseHandle(crossProcessEvent); CloseHandle(RetireServiceEvent); RetireServiceEvent = 0; // service is now stopped serviceStatus.dwControlsAccepted &= ~(SERVICE_ACCEPT_STOP | SERVICE_ACCEPT_SHUTDOWN); serviceStatus.dwCurrentState = SERVICE_STOPPED; if (!SetServiceStatus(serviceStatusHandle, &serviceStatus)) AppendLog(L"SetServiceStatus SERVICE_STOPPED failed\n"); } } // This function services ocltst as a service when launched // by the OS. It registers the service routines. void serviceStubCall() { wchar_t serviceName[MAX_PATH]; wcscpy(serviceName, ::serviceName); SERVICE_TABLE_ENTRYW serviceTable[] = {{serviceName, ServiceMain}, {0, 0}}; DWORD session_id; BOOL retVal = ProcessIdToSessionId(GetCurrentProcessId(), &session_id); if (0 == session_id) { StartServiceCtrlDispatcherW(serviceTable); } } ///////////////////// // CL related code // ///////////////////// const char c_kernelCode[] = " __kernel void hello(__global char * theArray)" "{" " size_t i = get_global_id(0);" "if ( i < get_global_size(0))" "theArray[i] = 78;" "}"; const unsigned int c_bufferSize = 1024; DWORD WINAPI ThreadProc(LPVOID lpdwThreadParam) { cl_int err; // Platform info std::vector<cl::Platform> platforms; err = cl::Platform::get(&platforms); if (err != CL_SUCCESS) { AppendLog(L"Platform::get() failed\n"); return -1; } std::vector<cl::Platform>::iterator i; if (platforms.size() > 0) { for (i = platforms.begin(); i != platforms.end(); ++i) { if (!strcmp((*i).getInfo<CL_PLATFORM_VENDOR>(&err).c_str(), "Advanced Micro Devices, Inc.")) { break; } } } if (err != CL_SUCCESS) { AppendLog(L"Platform::getInfo() failed \n"); return -1; } cl_context_properties cps[3] = {CL_CONTEXT_PLATFORM, (cl_context_properties)(*i)(), 0}; cl::Context context(CL_DEVICE_TYPE_GPU, cps, NULL, NULL, &err); if (err != CL_SUCCESS) { AppendLog(L"Context::Context() failed \n"); return -1; } std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>(); if (err != CL_SUCCESS) { AppendLog(L"Context::getInfo() failed \n"); return -1; } if (devices.size() == 0) { AppendLog(L"No device available\n"); return -1; } cl::Program::Sources sources( 1, std::make_pair(c_kernelCode, sizeof(c_kernelCode))); cl::Program program = cl::Program(context, sources, &err); if (err != CL_SUCCESS) { AppendLog(L"Program::Program() failed\n"); } err = program.build(devices); if (err != CL_SUCCESS) { if (err == CL_BUILD_PROGRAM_FAILURE) { std::string str( (char*)program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices[0]) .c_str()); AppendLog(L" \n\t\t\tBUILD LOG\n\n"); AppendLog(L" ************************************************\n"); AppendLog(str.c_str()); AppendLog(L" ************************************************\n"); } AppendLog(L"Program::build() failed\n"); return -1; } cl::Kernel kernel(program, "hello", &err); if (err != CL_SUCCESS) { AppendLog(L"Kernel::Kernel() failed\n"); return -1; } cl::Buffer buffer = cl::Buffer(context, CL_MEM_READ_WRITE, c_bufferSize, 0, &err); if (err != CL_SUCCESS) { AppendLog(L"Buffer::Buffer() failed \n"); } cl::CommandQueue queue(context, devices[0], 0, &err); if (err != CL_SUCCESS) { AppendLog(L"CommandQueue::CommandQueue() failed \n"); return -1; } err =
kernel.setArg(0, buffer); if (err != CL_SUCCESS) { AppendLog(L"Kernel::setArg() failed \n"); return -1; } err = queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(c_bufferSize), cl::NullRange); if (err != CL_SUCCESS) { AppendLog(L"CommandQueue::enqueueNDRangeKernel()\n"); return -1; } err = queue.finish(); if (err != CL_SUCCESS) { AppendLog(L"CommandQueue::finish() failed \n"); } char* ptr = (char*)malloc(c_bufferSize); err = queue.enqueueReadBuffer(buffer, CL_TRUE, 0, c_bufferSize, ptr, NULL, NULL); if (err != CL_SUCCESS) { AppendLog(L"CommandQueue::enqueueReadBuffer()\n"); return -1; } bool validateSuccess = true; // validate the results for (int i = 0; i < c_bufferSize; i++) { if (ptr[i] != 78) validateSuccess = false; } free(ptr); if (validateSuccess) { WriteLog(successMessage); AppendLog(L"validate success"); } else { AppendLog(L"Validate fail"); return -1; } SetEvent(RetireServiceEvent); return 0; } clr-rocm-5.7.1/opencl/tests/ocltst/env/Timer.cpp000066400000000000000000000047571450307266000215670ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "Timer.h" #ifdef _WIN32 #include <windows.h> #endif #ifdef __linux__ #include <sys/time.h> #endif CPerfCounter::CPerfCounter() : _clocks(0), _start(0) { #ifdef _WIN32 QueryPerformanceFrequency((LARGE_INTEGER *)&_freq); #endif #ifdef __linux__ _freq = 1000; #endif } CPerfCounter::~CPerfCounter() { // EMPTY!
} void CPerfCounter::Start(void) { #ifdef _WIN32 if (_start) { MessageBox(NULL, "Bad Perf Counter Start", "Error", MB_OK); exit(0); } QueryPerformanceCounter((LARGE_INTEGER *)&_start); #endif #ifdef __linux__ struct timeval s; gettimeofday(&s, 0); _start = (i64)s.tv_sec * 1000 + (i64)s.tv_usec / 1000; #endif } void CPerfCounter::Stop(void) { i64 n; #ifdef _WIN32 if (!_start) { MessageBox(NULL, "Bad Perf Counter Stop", "Error", MB_OK); exit(0); } QueryPerformanceCounter((LARGE_INTEGER *)&n); #endif #ifdef __linux__ struct timeval s; gettimeofday(&s, 0); n = (i64)s.tv_sec * 1000 + (i64)s.tv_usec / 1000; #endif n -= _start; _start = 0; _clocks += n; } void CPerfCounter::Reset(void) { #ifdef _WIN32 if (_start) { MessageBox(NULL, "Bad Perf Counter Reset", "Error", MB_OK); exit(0); } #endif _clocks = 0; } double CPerfCounter::GetElapsedTime(void) { #ifdef _WIN32 if (_start) { MessageBox(NULL, "Trying to get time while still running.", "Error", MB_OK); exit(0); } #endif return (double)_clocks / (double)_freq; } clr-rocm-5.7.1/opencl/tests/ocltst/env/Timer.h000066400000000000000000000026671450307266000212330ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _TIMER_H_ #define _TIMER_H_ #ifdef _WIN32 typedef __int64 i64; #endif #ifdef __linux__ typedef long long i64; #endif class CPerfCounter { public: CPerfCounter(); ~CPerfCounter(); void Start(void); void Stop(void); void Reset(void); double GetElapsedTime(void); private: i64 _freq; i64 _clocks; i64 _start; }; #endif // _TIMER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/env/Worker.h000066400000000000000000000122261450307266000214140ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef OCL_TEST_WORKER_H #define OCL_TEST_WORKER_H ///////////////////////////////////////////////////////////////////////////// #include #include #include #include #include #include #include "Module.h" #include "OCLTest.h" #include "OCLTestList.h" #include "ResultStruct.h" #include "Timer.h" #include "getopt.h" #include "pfm.h" ///////////////////////////////////////////////////////////////////////////// typedef void* (*TestMethod)(void* param); ///////////////////////////////////////////////////////////////////////////// class Worker { public: Worker() : m_wrapper(0), m_module(0), m_run(0), m_id(0), m_subtest(0), m_testindex(0), m_dump(false), m_display(false), m_useCPU(false), m_window(0), m_width(0), m_height(0), m_buffer(0), m_perflab(false), m_deviceId(0), m_platform(0) { // EMPTY! } Worker(OCLWrapper* wrapper, Module* module, TestMethod run, unsigned int id, unsigned int subtest, unsigned int testindex, bool dump, bool view, bool useCPU, void* window, unsigned int x, unsigned int y, bool perflab, unsigned int deviceId = 0, unsigned int platform = 0) : m_wrapper(wrapper), m_module(module), m_run(run), m_id(id), m_subtest(subtest), m_testindex(testindex), m_dump(dump), m_display(view), m_useCPU(useCPU), m_window(window), m_width(x), m_height(y), m_buffer(0), m_perflab(perflab), m_deviceId(deviceId), m_platform(platform) { if (m_dump == true || m_display == true) { m_buffer = new float[4 * m_width * m_height]; if (m_buffer != 0) { memset(m_buffer, 0, 4 * m_width * m_height * sizeof(float)); } else { m_dump = false; m_display = false; } } m_result = new TestResult(0.0f); } Worker(const Worker& w) { if (this == &w) return; if (m_buffer) delete[] m_buffer; m_buffer = 0; m_wrapper = w.m_wrapper; m_module = w.m_module; m_run = w.m_run; m_id = w.m_id; m_subtest = w.m_subtest; m_testindex = w.m_testindex; m_dump = w.m_dump; m_display = w.m_display; m_useCPU = w.m_useCPU; m_window = w.m_window; m_width = w.m_width; m_height = w.m_height; m_perflab = w.m_perflab; m_deviceId = w.m_deviceId; m_result = w.m_result; m_platform = w.m_platform; if (w.m_buffer) { m_buffer = new float[4 * m_width * m_height]; if (m_buffer != 0) { memcpy(m_buffer, w.m_buffer, 4 * m_width * m_height * sizeof(float)); } } } ~Worker() { if (m_buffer) delete[] m_buffer; m_buffer = 0; delete m_result; m_result = 0; } OCLWrapper* getOCLWrapper() { return m_wrapper; } Module* getModule() { return m_module; } TestMethod getTestMethod() { return m_run; } unsigned int getId() { return m_id; } unsigned int getSubTest() { return m_subtest; } unsigned int getTestIndex() { return m_testindex; } bool isDumpEnabled() { return m_dump; } bool isDisplayEnabled() { return m_display; } bool isCPUEnabled() { return m_useCPU; } void* getWindow() { return m_window; } unsigned int getWidth() { return m_width; } unsigned int getHeight() { return m_height; } float* getBuffer() { return m_buffer; } bool getPerflab() { return m_perflab; } unsigned int getDeviceId() { return m_deviceId; } TestResult* getResult() { return m_result; } unsigned int getPlatformID() { return m_platform; } private: OCLWrapper* m_wrapper; Module* m_module; TestMethod m_run; unsigned int m_id; unsigned int m_subtest; unsigned int m_testindex; bool m_dump; bool m_display; bool m_useCPU; void* m_window; unsigned int 
m_width; unsigned int m_height; float* m_buffer; bool m_perflab; unsigned int m_deviceId; unsigned int m_platform; TestResult* m_result; }; ///////////////////////////////////////////////////////////////////////////// #endif // OCL_TEST_WORKER_H clr-rocm-5.7.1/opencl/tests/ocltst/env/getopt.cpp000066400000000000000000000031111450307266000217710ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "getopt.h" #include <string.h> char *optarg = nullptr; int optind = 1; int getopt(int argc, char *const argv[], const char *optstring) { if ((optind >= argc) || (argv[optind][0] != '-') || (argv[optind][0] == 0)) { return -1; } int opt = argv[optind][1]; const char *p = strchr(optstring, opt); if (p == nullptr) { return '?'; } if (p[1] == ':') { optind++; if (optind >= argc) { return '?'; } optarg = argv[optind]; } optind++; return opt; } clr-rocm-5.7.1/opencl/tests/ocltst/env/getopt.h000066400000000000000000000023201450307266000214400ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once extern char *optarg; extern int optind; extern "C" int getopt(int argc, char *const argv[], const char *optstring); clr-rocm-5.7.1/opencl/tests/ocltst/env/oclTestLog.cpp000066400000000000000000000061131450307266000225530ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "oclTestLog.h" #include <assert.h> #include <string.h> #include "OCLLog.h" oclLog::oclLog() : m_stdout_fp(stdout), m_filename(""), m_writeToFileIsEnabled(false) {} oclLog::~oclLog() { disable_write_to_file(); } void oclLog::enable_write_to_file(std::string filename) { m_writeToFileIsEnabled = true; m_filename = filename; FILE* fp = fopen(m_filename.c_str(), "w"); if (fp == NULL) { oclTestLog(OCLTEST_LOG_ALWAYS, "ERROR: Cannot open file %s. Disabling logging to file.\n", filename.c_str()); m_writeToFileIsEnabled = false; } else { fclose(fp); } } void oclLog::disable_write_to_file() { m_writeToFileIsEnabled = false; } void oclLog::vprint(char const* fmt, va_list args) { // hack for fixing the lnx64bit segfault and // garbage printing in file. XXX 4096 a magic number char buffer[4096]; memset(buffer, 0, sizeof(buffer)); int rc = vsnprintf(buffer, sizeof(buffer), fmt, args); assert(rc >= 0 && rc != sizeof(buffer)); fputs(buffer, m_stdout_fp); if (m_writeToFileIsEnabled) { FILE* fp = fopen(m_filename.c_str(), "a"); if (fp == NULL) { oclTestLog(OCLTEST_LOG_ALWAYS, "ERROR: Cannot open file %s. Disabling logging to file.\n", m_filename.c_str()); m_writeToFileIsEnabled = false; } else { fputs(buffer, fp); fclose(fp); } } } void oclLog::flush() { fflush(m_stdout_fp); } static oclLog& theLog() { static oclLog Log; return Log; } static oclLoggingLevel currentLevel = OCLTEST_LOG_ALWAYS; static float logcount = 0.0f; void oclTestLog(oclLoggingLevel logLevel, const char* fmt, ...) { logcount += 1.0f; if (logLevel <= currentLevel) { va_list args; va_start(args, fmt); theLog().vprint(fmt, args); theLog().flush(); va_end(args); } } void oclTestEnableLogToFile(const char* filename) { theLog().enable_write_to_file(filename); } void oclTestSetLogLevel(int level) { if (level >= 0) { currentLevel = static_cast<oclLoggingLevel>(level); } } clr-rocm-5.7.1/opencl/tests/ocltst/env/oclTestLog.h000066400000000000000000000030251450307266000222170ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef CALTESTLOG_H_ #define CALTESTLOG_H_ #include <stdarg.h> #include <stdio.h> #include <string> class oclLog { public: oclLog(); virtual ~oclLog(); virtual void vprint(char const* fmt, va_list args); virtual void flush(); virtual void enable_write_to_file(std::string filename); virtual void disable_write_to_file(); private: FILE* m_stdout_fp; std::string m_filename; bool m_writeToFileIsEnabled; }; #endif // CALTESTLOG_H_ clr-rocm-5.7.1/opencl/tests/ocltst/env/oclsysinfo.cpp000066400000000000000000000131131450307266000226620ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "oclsysinfo.h" #include <stdio.h> #include <stdlib.h> #include <CL/cl_ext.h> #ifndef MAX_DEVICES #define MAX_DEVICES 16 #endif // MAX_DEVICES int oclSysInfo(std::string &info_string, bool use_cpu, unsigned dev_id, unsigned int platformIndex) { /* * Have a look at the available platforms and pick the one * in the platforms vector in index "platformIndex".
*/ cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; int error = clGetPlatformIDs(0, NULL, &numPlatforms); if (CL_SUCCESS != error) { fprintf(stderr, "clGetPlatformIDs() failed"); return 0; } if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error = clGetPlatformIDs(numPlatforms, platforms, NULL); if (CL_SUCCESS != error) { fprintf(stderr, "clGetPlatformIDs() failed"); return 0; } #if 0 for (unsigned i = 0; i < numPlatforms; ++i) { /* Get the number of requested devices */ error = clGetDeviceIDs(platforms[i], (use_cpu) ? CL_DEVICE_TYPE_CPU : CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices ); #if 0 /* clGetDeviceIDs fails when no GPU devices are present */ if (error) { fprintf(stderr, "clGetDeviceIDs failed: %d\n", error ); return 0; } #endif #if 0 char pbuf[100]; error = clGetPlatformInfo( platforms[i], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { platform = platforms[i]; break; } #else /* Select platform with GPU devices present */ if (num_devices > 0) { platform = platforms[i]; break; } #endif } #endif error = clGetDeviceIDs(platforms[platformIndex], (use_cpu) ? CL_DEVICE_TYPE_CPU : CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices); if (error) { fprintf(stderr, "clGetDeviceIDs failed: %d\n", error); return 0; } platform = platforms[platformIndex]; delete[] platforms; } if (dev_id >= num_devices) { fprintf(stderr, "Device selected does not exist.\n"); return 0; } if (NULL == platform) { fprintf(stderr, "Couldn't find platform with GPU devices, cannot proceed.\n"); return 0; } devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); if (!devices) { fprintf(stderr, "no devices\n"); return 0; } /* Get the requested device */ error = clGetDeviceIDs(platform, (use_cpu) ? 
CL_DEVICE_TYPE_CPU : CL_DEVICE_TYPE_GPU, num_devices, devices, NULL); if (error) { fprintf(stderr, "clGetDeviceIDs failed: %d\n", error); return 0; } device = devices[dev_id]; char c[1024]; char tmpString[256]; static const char *no_yes[] = {"NO", "YES"}; sprintf(tmpString, "\nCompute Device info:\n"); info_string.append(tmpString); clGetPlatformInfo(platform, CL_PLATFORM_VERSION, sizeof(c), &c, NULL); sprintf(tmpString, "\tPlatform Version: %s\n", c); info_string.append(tmpString); clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(c), &c, NULL); sprintf(tmpString, "\tDevice Name: %s\n", c); info_string.append(tmpString); clGetDeviceInfo(device, CL_DEVICE_VENDOR, sizeof(c), &c, NULL); sprintf(tmpString, "\tVendor: %s\n", c); info_string.append(tmpString); clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(c), &c, NULL); sprintf(tmpString, "\tDevice Version: %s\n", c); info_string.append(tmpString); clGetDeviceInfo(device, CL_DRIVER_VERSION, sizeof(c), &c, NULL); sprintf(tmpString, "\tDriver Version: %s\n", c); info_string.append(tmpString); clGetDeviceInfo(device, CL_DEVICE_BOARD_NAME_AMD, sizeof(c), &c, NULL); sprintf(tmpString, "\tBoard Name: %s\n", c); info_string.append(tmpString); #if defined(__linux__) cl_device_topology_amd topology; clGetDeviceInfo(device, CL_DEVICE_TOPOLOGY_AMD, sizeof(topology), &topology, NULL); if (topology.raw.type == CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD) { sprintf(tmpString, "\tDevice Topology: PCI[ B#%d, D#%d, F#%d]\n", topology.pcie.bus, topology.pcie.device, topology.pcie.function); info_string.append(tmpString); } #endif free(devices); return 1; } clr-rocm-5.7.1/opencl/tests/ocltst/env/oclsysinfo.h000066400000000000000000000024371450307266000223360ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCLSYSINFO_H_ #define _OCLSYSINFO_H_ #include int oclSysInfo(std::string& info_string, bool useCPU, unsigned dev_id, unsigned int platformIndex = 0); #endif //_OCLSYSINFO_H_ clr-rocm-5.7.1/opencl/tests/ocltst/env/ocltst.cpp000066400000000000000000001400151450307266000220040ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ ///////////////////////////////////////////////////////////////////////////// #include #ifdef _WIN32 #include #include "Window.h" typedef HMODULE ModuleHandle; #endif ///////////////////////////////////////////////////////////////////////////// #ifdef __linux__ #include typedef void* ModuleHandle; #endif ///////////////////////////////////////////////////////////////////////////// #include "BaseTestImp.h" #include "Module.h" #include "OCLLog.h" #include "OCLTest.h" #include "OCLTestImp.h" #include "OCLTestList.h" #include "OCLWrapper.h" #include "Timer.h" #include "Worker.h" #include "getopt.h" #include "oclsysinfo.h" #include "pfm.h" //! Including OCLutilities Thread utility #include "OCL/Thread.h" //! Lock that needs to be obtained to access the global //! module variable static OCLutil::Lock moduleLock; #include #include #include #include #include #include ///////////////////////////////////////////////////////////////////////////// #ifdef _WIN32 static LONG WINAPI xFilter(LPEXCEPTION_POINTERS xEP); void serviceStubCall(); #endif #define MAX_DEVICES 16 #undef CHECK_RESULT #define CHECK_RESULT(test, msg) \ if ((test)) { \ printf("\n%s\n", msg); \ exit(1); \ } //! Declaration of a function that find devices of a specific type for the //! chosen platform int findAdapters(unsigned int platformIdx, bool useCPU, cl_platform_id*); //! class App that is used to run the tests on the system class App { public: static bool m_reRunFailed; static const char* m_svcMsg; //! 
Constructor for App App(unsigned int platform) : m_list(false), m_console(true), m_useCPU(false), m_dump(false), m_perflab(false), m_noSysInfoPrint(false), m_numItr(1), mp_testOrder(NULL), m_rndOrder(false), m_spawned(0), m_threads(1), m_runthread(0), m_width(512), m_height(512), m_window(0), m_platform(platform) { // initialize OCLWrapper reference m_wrapper = new OCLWrapper(); // m_workers = Set of worker objects that are used to run a subtest from a // module for (unsigned int i = 0; i < 256; i++) m_workers[i] = 0; // Setting the number of devices /* * Force caltst to use 1 thread at a time in Windows * only contextual calls are thread safe currently */ m_numDevices = findAdapters(m_platform, m_useCPU, NULL); // m_numDevices = 1; // Report structure used to store the results of the tests #if 0 testReport = (Report **)malloc(sizeof(Report *) * m_numDevices); for(unsigned int i = 0; i < m_numDevices; i++) { testReport[i] = new Report; } #else testReport = (Report**)malloc(sizeof(Report*)); testReport[0] = new Report; #endif } //! Destructor for App ~App() { // Deleting the Worker objects for (unsigned int i = 0; i < 256; i++) { if (m_workers[i]) { delete m_workers[i]; m_workers[i] = 0; } } // Deleting the report structures // for(unsigned int i = 0; i < m_numDevices; i++) for (unsigned int i = 0; i < 1; i++) { delete testReport[i]; } free(testReport); m_wrapper->clUnloadPlatformAMD(mpform_id); delete m_wrapper; } //! Function used to create a worker object corresponding to a subtest in a //! module void SetWorker(unsigned int index, OCLWrapper* wrapper, Module* module, TestMethod run, unsigned int id, unsigned int subtest, unsigned int test, bool dump, bool view, bool useCPU, void* window, unsigned int x, unsigned int y, bool perflab, unsigned int deviceId, unsigned int platform) { if (index >= 256) return; if (m_workers[index]) delete m_workers[index]; m_workers[index] = new Worker(wrapper, module, run, id, subtest, test, dump, view, useCPU, window, x, y, perflab, deviceId, platform); assert(m_workers[index] != 0); // oclTestLog(OCLTEST_LOG_ALWAYS, "Worker Device Id = %d\n", // m_workers[index]->getDeviceId()); } //! Function to return the 'index'th m_workers Worker* GetWorker(unsigned int index) { if (index >= 256) return 0; return m_workers[index]; } //! Create a thread to run the subtest void AddThread(unsigned int workerindex, unsigned int usage) { Worker* worker = GetWorker(workerindex); if (worker == 0) { return; } // usage = Whether to use threads or not if (usage != 0) { // Creating a thread // getTestMethod = runSubTest here // which takes a Worker object as an argument m_pool[workerindex].create(worker->getTestMethod(), (void*)(worker)); m_spawned++; } else { // Same as above without using threads TestMethod run = worker->getTestMethod(); if (run) { run(worker); UpdateTestReport(workerindex, worker->getResult()); } } return; } //! Function which waits for all threads to execute and also updates the //! report void WaitAllThreads() { for (unsigned int w = 0; w < m_spawned; w++) { m_pool[w].join(); UpdateTestReport(w, m_workers[w]->getResult()); } m_spawned = 0; } //! Function to add a worker thread so as to run a subtest of a module //! @param run = runSubtest function //! @param index = index of the module in m_modules //! @param subtest = the subtest number to run //! @param usage = whether to use threads or not //! 
@param test = The test in the module to be executed void AddWorkerThread(unsigned int index, unsigned int subtest, unsigned int test, unsigned int usage, TestMethod run) { if (m_spawned > m_threads) { WaitAllThreads(); } // Creating a worker thread for each device #if 0 for(unsigned int i = 0; i < m_numDevices; i++) { SetWorker(i, m_wrapper, &m_modules[index], run, m_spawned, subtest, test, m_dump, !m_console, m_useCPU, m_window, m_width, m_height, m_perflab, i, m_platform); } #else for (unsigned int i = 0; i < 1; i++) { SetWorker(i, m_wrapper, &m_modules[index], run, m_spawned, subtest, test, m_dump, !m_console, m_useCPU, m_window, m_width, m_height, m_perflab, m_deviceId, m_platform); } #endif // Creating and executing a thread for each device // for(unsigned int i = 0; i < m_numDevices; i++) for (unsigned int i = 0; i < 1; i++) { AddThread(i, usage); } } void printOCLinfo(void); //! Function to process the commandline arguments void CommandLine(unsigned int argc, char** argv); //! Function to scan for the different tests in the module void ScanForTests(); //! Function to run all the specified tests void RunAllTests(); //! Free memory void CleanUp(); //! Function to set the order in which test are executed. void SetTestRunOrder(int); //! Function to print the test order void PrintTestOrder(int); //! Function to get the number of iterations. int GetNumItr(void) { return m_numItr; } private: typedef std::vector TestIndexList; typedef std::vector StringList; void AddToList(StringList& strlist, const char* str); void LoadList(StringList& strlist, const char* filename); bool TestInList(StringList& strlist, const char* testname); //! Array storing the report for each device Report** testReport; //! Function to update the result of each device void UpdateTestReport(int index, TestResult* result) { if (result != NULL) { if (result->passed) { if (testReport[index]->max->value < result->value) { testReport[index]->max->value = result->value; testReport[index]->max->resultString = result->resultString; } if (testReport[index]->min->value > result->value) { testReport[index]->min->value = result->value; testReport[index]->min->resultString = result->resultString; } } else { testReport[index]->numFailedTests++; testReport[index]->success = false; } } else { testReport[index]->numFailedTests++; testReport[index]->success = false; } } //! Functions used to find the range of the tests to be run void GetTestIndexList(TestIndexList& testIndices, StringList& testList, const char* szModuleTestname, int maxIndex); void PruneTestIndexList(TestIndexList& testIndices, TestIndexList& avoidIndices, TestIndexList& erasedIndices); StringList m_paths; StringList m_tests; StringList m_avoid; std::vector m_modules; bool m_list; bool m_console; bool m_useCPU; bool m_dump; bool m_perflab; bool m_noSysInfoPrint; int m_numItr; int* mp_testOrder; bool m_rndOrder; //! m_pool = Various threads created to execute tests on multiple devices OCLutil::Thread m_pool[256]; Worker* m_workers[256]; //! Number of threads spawned unsigned int m_spawned; //! Upper limit on the number of threads that can be spawned unsigned int m_threads; unsigned int m_runthread; unsigned int m_width; unsigned int m_height; void* m_window; //! which index/platform id from the platforms vector returned by //! cl::Platform::get we should run on unsigned int m_platform; cl_platform_id mpform_id; //! Number of devices on the system unsigned int m_numDevices; // //! 
Device ID to use on the system unsigned int m_deviceId; // OCLWrapper reference OCLWrapper* m_wrapper; }; void App::printOCLinfo(void) { std::string calinfo; if (!m_noSysInfoPrint) { oclSysInfo(calinfo, m_useCPU, m_deviceId, m_platform); oclTestLog(OCLTEST_LOG_ALWAYS, calinfo.c_str()); } } /*----------------------------------------------------- Function to randomize the order in which tests are executed -------------------------------------------------------*/ #ifdef _WIN32 #include #endif // void App::SetTestRunOrder(int test_count) void App::SetTestRunOrder(int mod_index) { assert(mp_testOrder != NULL); unsigned int test_count = m_modules[mod_index].get_count(); StringList uniqueTests; for (unsigned int i = 0; i < m_tests.size(); ++i) { // see if the tests are being run using indices size_t nFirstBracket = m_tests[i].find("["); // set the test name std::string szTestName = m_tests[i]; // order of execution is set based on base name so get the base name if (nFirstBracket != std::string::npos) szTestName = szTestName.substr(0, nFirstBracket); bool bTestExists = false; for (unsigned int j = 0; j < uniqueTests.size(); ++j) { if (strcmp(szTestName.c_str(), uniqueTests[j].c_str()) == 0) { bTestExists = true; break; } } if (!bTestExists) { AddToList(uniqueTests, szTestName.c_str()); } } for (unsigned int i = 0; i < test_count && i < uniqueTests.size(); i++) { for (unsigned int j = 0; j < test_count; j++) { unsigned int index = i; // add all the prev test indices for (int k = 0; k < mod_index; k++) index += m_modules[k].get_count(); std::string szTestName = uniqueTests[index]; if (strcmp(szTestName.c_str(), m_modules[mod_index].get_name(j)) == 0) { mp_testOrder[i] = j; break; } } } if (m_rndOrder) { srand((unsigned int)time(NULL)); for (unsigned int i = 0; i < test_count; i++) { // find two random indices int index1 = (int)((float)test_count * (rand() / (RAND_MAX + 1.0))); int index2 = (int)((float)test_count * (rand() / (RAND_MAX + 1.0))); // swap the data int tmp = mp_testOrder[index1]; mp_testOrder[index1] = mp_testOrder[index2]; mp_testOrder[index2] = tmp; } } } ///////////////////////////////////////////////////////////////////////////// // Process device string. Returns true if there is a primary ATI Radeon device // adapter, false otherwise static bool procDevString(const char* devString) { // Search for the string "Radeon" inside the device string if (strstr(devString, "Radeon") || strstr(devString, "R600") || strstr(devString, "RV630") || strstr(devString, "RV670") || (strstr(devString, "Stream") && strstr(devString, "Processor"))) { // Ignore if the device is a secondary device, i.e., not an adapter if (strstr(devString, "Secondary")) { return false; } } else { return false; } return true; } //! //! Function to find the total number of adapters on the system //! 
int findAdapters(unsigned int platformIdx, bool useCPU, cl_platform_id* mpform) { unsigned int numOfAdapters = 0; cl_int error = 0; cl_uint numPlatforms = 0; error = clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT((error != CL_SUCCESS), "clGetPlatformIDs failed"); CHECK_RESULT((platformIdx >= numPlatforms), "Invalid platform"); cl_platform_id* platforms = new cl_platform_id[numPlatforms]; error = clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error != CL_SUCCESS, "clGetPlatformIDs failed"); cl_platform_id platform = 0; platform = platforms[platformIdx]; delete[] platforms; cl_device_type devType = CL_DEVICE_TYPE_GPU; if (useCPU) devType = CL_DEVICE_TYPE_CPU; error = clGetDeviceIDs(platform, devType, 0, 0, &numOfAdapters); CHECK_RESULT((error != CL_SUCCESS), "clGetDeviceIDs failed"); if (mpform) { (*mpform) = platform; } return (int)numOfAdapters; } int calibrate(OCLTest* test) { int n = 1; #if 0 while (1) { double timer = run(test, n); if (timer > 2.) { break; } n *= 2; } #endif return n; } void* dummyThread(void* argv) { unsigned int counter = 0; while (counter < 1000000) counter++; return argv; } //! Function used to run the test specified //! It would look something like OCLPerfInputspeed[0] double run(OCLTest* test, int passes) { CPerfCounter counter; counter.Reset(); counter.Start(); int i; for (i = 0; i < passes; i++) { test->run(); } counter.Stop(); double timer = counter.GetElapsedTime(); counter.Reset(); return timer; } //! Function to display the result after a test is finished //! It also stores the result in a TestResult object void report(Worker* w, const char* testname, int testnum, unsigned int crc, const char* errorMsg, float timer, TestResult* tr, const char* testDesc) { unsigned int thread = w->getId(); bool perflab = w->getPerflab(); unsigned int deviceId = w->getDeviceId(); char tmpUnits[256]; if (perflab) { oclTestLog(OCLTEST_LOG_ALWAYS, "%10.3f\n", timer); } else { const char* passedOrFailed[] = {"FAILED", "PASSED"}; // char teststring[256]; // sprintf(teststring, "%s[%d]", testname, testnum); // sprintf(tmpUnits, "Device[%d]:\t%-32s:\t%s\n", deviceId, teststring, // ((tr->passed) ? passedOrFailed[1] : passedOrFailed[0])); // If crc is not 0 or errorMsg is not empty, print the full stats if ((crc != 0) || (errorMsg && (errorMsg[0] != '\0'))) { sprintf(tmpUnits, "%s %s: %s[%d] T[%1d] [%3d], %10.3f %-20s (chksum 0x%08x)\n", testDesc, ((tr->passed) ? passedOrFailed[1] : passedOrFailed[0]), w->isCPUEnabled() ? "CPU" : "GPU", deviceId, thread, testnum, timer, errorMsg, crc); } else { sprintf(tmpUnits, "%s %s: %s[%d] T[%1d] [%3d], %10.3f\n", testDesc, ((tr->passed) ? passedOrFailed[1] : passedOrFailed[0]), w->isCPUEnabled() ? "CPU" : "GPU", deviceId, thread, testnum, timer); } oclTestLog(OCLTEST_LOG_ALWAYS, tmpUnits); tr->value = timer; tr->resultString.assign(tmpUnits); if (App::m_svcMsg && !tr->passed) { char escaped[2 * sizeof(tmpUnits)]; char* ptr = escaped; for (int i = 0; tmpUnits[i] != '\0'; ++i) { switch (tmpUnits[i]) { case '\n': *ptr++ = '|'; *ptr++ = 'n'; break; case '\r': *ptr++ = '|'; *ptr++ = 'r'; break; case '\'': case '|': case ']': case '[': *ptr++ = '|'; default: *ptr++ = tmpUnits[i]; } } *ptr = '\0'; oclTestLog(OCLTEST_LOG_ALWAYS, "##%s[testFailed name='%s.%s.%d' message='FAILED' " "details='%s']\n", App::m_svcMsg, w->getModule()->get_libname(), testname, testnum, escaped); } } } //! 
Thread Entry point void* runSubtest(void* worker) { char units[256]; double conversion; unsigned int crc = 0; bool second_run = false; // Getting the worker object that is running in this thread Worker* w = (Worker*)worker; if (w == 0) return NULL; unsigned int test = w->getTestIndex(); unsigned int subtest = w->getSubTest(); unsigned int deviceId = w->getDeviceId(); unsigned int platformIndex = w->getPlatformID(); TestResult* result = w->getResult(); RERUN_TEST: // Acquiring lock on the 'module' object common to all threads moduleLock.lock(); Module* m = w->getModule(); if (m == 0 || m->create_test == 0) return NULL; // If we can, used the cached version, // otherwise create the test. OCLTest* pt = (m->cached_test ? m->cached_test[subtest] : NULL); if (!pt) { pt = m->create_test(subtest); if (pt->cache_test() && m->cached_test) { m->cached_test[subtest] = pt; } } pt->clearError(); OCLTestImp* tmp = pt->toOCLTestImp(); if (tmp) { tmp->setOCLWrapper(w->getOCLWrapper()); } std::string subtestName = m->get_name(subtest); moduleLock.unlock(); if (pt == 0) return NULL; pt->resetDescString(); if (App::m_svcMsg) { oclTestLog(OCLTEST_LOG_ALWAYS, "##%s[testStarted name='%s.%s.%d' " "captureStandardOutput='true']\n", App::m_svcMsg, m->get_libname(), subtestName.c_str(), test); } // setting the type to CPU. if (w->isCPUEnabled()) { pt->useCPU(); } // Setting the device according to the worker thread pt->setDeviceId(w->getDeviceId()); pt->setPlatformIndex(w->getPlatformID()); // Opening the 'test'th subtest of 'pt' pt->open(test, units, conversion, deviceId); pt->clearPerfInfo(); char buffer[256]; sprintf(buffer, "%s[%3d]", subtestName.c_str(), test); oclTestLog(OCLTEST_LOG_ALWAYS, "%-32s", buffer); if (pt->hasErrorOccured()) { result->passed = false; report(w, subtestName.c_str(), test, crc, pt->getErrorMsg(), pt->getPerfInfo(), result, pt->testDescString.c_str()); } else { unsigned int n = calibrate(pt); double timer = run(pt, n); crc = pt->close(); if (pt->hasErrorOccured()) { // run second time if the test fails the first time. if (!second_run && App::m_reRunFailed && !App::m_svcMsg) { second_run = true; // Destroying a test object moduleLock.lock(); if (!pt->cache_test()) { m->destroy_test(pt); } moduleLock.unlock(); pt->clearError(); goto RERUN_TEST; } } result->passed = !pt->hasErrorOccured(); /// print conditional pass if it is passes the second time. if (second_run && result->passed) { report(w, subtestName.c_str(), test, crc, "Conditional PASS", pt->getPerfInfo(), result, pt->testDescString.c_str()); } else { report(w, subtestName.c_str(), test, crc, pt->getErrorMsg(), pt->getPerfInfo(), result, pt->testDescString.c_str()); } } if (App::m_svcMsg) { oclTestLog(OCLTEST_LOG_ALWAYS, "##%s[testFinished name='%s.%s.%d']\n", App::m_svcMsg, m->get_libname(), subtestName.c_str(), test); } // Make sure we clear the error after we report that there was an error. pt->clearError(); // Destroying a test object moduleLock.lock(); if (!pt->cache_test()) { m->destroy_test(pt); } moduleLock.unlock(); return NULL; } void App::PrintTestOrder(int mod_index) { oclTestLog(OCLTEST_LOG_ALWAYS, "Module: %s (%d tests)\n", m_modules[mod_index].name.c_str(), m_modules[mod_index].get_count()); for (unsigned int j = 0; j < m_modules[mod_index].get_count(); j++) { oclTestLog(OCLTEST_LOG_ALWAYS, "%s\n", m_modules[mod_index].get_name(mp_testOrder[j])); } } //! 
Function that runs all the tests specified in the command-line void App::RunAllTests() { #ifdef _WIN32 if (!m_console) m_window = new Window("Test", 100, 100, m_width, m_height, 0); #endif // // Add all tests to run list if none specified // if (m_tests.size() < 1) { for (unsigned int i = 0; i < m_modules.size(); i++) { for (unsigned int j = 0; j < m_modules[i].get_count(); j++) { AddToList(m_tests, m_modules[i].get_name(j)); } } } unsigned int num_passes = 0; unsigned int num_failures = 0; // // Run each test // for (unsigned int i = 0; i < m_modules.size(); i++) { oclTestLog(OCLTEST_LOG_ALWAYS, "\n-------------------------------------------------\n"); oclTestLog(OCLTEST_LOG_ALWAYS, "The OpenCL Testing Module %s Version = %d \n", m_modules[i].get_libname(), m_modules[i].get_version()); oclTestLog(OCLTEST_LOG_ALWAYS, "-------------------------------------------------\n"); if (App::m_svcMsg) { oclTestLog(OCLTEST_LOG_ALWAYS, "##%s[testSuiteStarted name='ocltst %s']\n", App::m_svcMsg, m_modules[i].get_libname()); } // array to keep track of order of test execution. int test_count = m_modules[i].get_count(); mp_testOrder = new int[test_count]; memset((void*)mp_testOrder, 0, sizeof(*mp_testOrder) * test_count); SetTestRunOrder(i); // // List all tests first if the option was turned on // if (m_list) { PrintTestOrder(i); delete[] mp_testOrder; continue; // return; } for (unsigned int itr_var = 0; itr_var < m_modules[i].get_count(); itr_var++) { // done for random order generation unsigned int subtest = mp_testOrder[itr_var]; const char* name = m_modules[i].get_name(subtest); if (itr_var < m_tests.size() && TestInList(m_tests, name)) { OCLTest* pt = NULL; if (m_modules[i].cached_test) { pt = m_modules[i].cached_test[subtest]; } // Try to use the cached version first! if (!pt) { pt = m_modules[i].create_test(subtest); if (pt->cache_test() && m_modules[i].cached_test) { m_modules[i].cached_test[subtest] = pt; } } int numSubTests = pt->getNumSubTests(); assert(numSubTests > 0); TestIndexList testIndices; GetTestIndexList(testIndices, m_tests, name, numSubTests - 1); TestIndexList avoidIndices; GetTestIndexList(avoidIndices, m_avoid, name, numSubTests - 1); TestIndexList erasedIndices; PruneTestIndexList(testIndices, avoidIndices, erasedIndices); int numTestsRun = 0; for (unsigned int j = 0; j < testIndices.size(); j++) { unsigned int test = testIndices[j]; WaitAllThreads(); AddWorkerThread(i, subtest, test, pt->getThreadUsage(), runSubtest); for (unsigned int thread = 1; (thread < m_threads) && (thread < m_modules.size()); thread++) { AddWorkerThread(thread, subtest, test, pt->getThreadUsage(), dummyThread); } numTestsRun++; } WaitAllThreads(); // Printing the test report // First checking whether the number of subtests is greater than 1. 
// No point printing report for a one subtest test if (numTestsRun > 0) { if (testReport[0]->success) { num_passes++; } else { num_failures++; } } if (App::m_svcMsg) { for (unsigned int j = 0; j < erasedIndices.size(); j++) { oclTestLog(OCLTEST_LOG_ALWAYS, "##%s[testIgnored name='%s.%s.%d']\n", App::m_svcMsg, m_modules[i].get_libname(), name, erasedIndices[j]); } } // Resetting the values of the test reports // for(unsigned int j = 0; j < m_numDevices; j++) for (unsigned int j = 0; j < 1; j++) { testReport[j]->reset(); } m_modules[i].destroy_test(pt); if (m_modules[i].cached_test) { m_modules[i].cached_test[subtest] = NULL; } } } if (App::m_svcMsg) { oclTestLog(OCLTEST_LOG_ALWAYS, "##%s[testSuiteFinished name='ocltst %s']\n", App::m_svcMsg, m_modules[i].get_libname()); } // print the order in which the test are executed if they are // randomized. if (m_rndOrder) { PrintTestOrder(i); } // deleting the test order delete[] mp_testOrder; } #ifdef _WIN32 if (!m_console && m_window) { ((Window*)m_window)->ConsumeEvents(); } #endif float total_tests = (float)(num_passes + num_failures); float percent_passed = 0.0f; float percent_failed = 0.0f; float percent_total = 0.0f; if (total_tests > 0) { percent_passed = 100.0f * ((float)num_passes / total_tests); percent_failed = 100.0f * ((float)num_failures / total_tests); percent_total = 100.0f * ((float)total_tests / total_tests); } oclTestLog(OCLTEST_LOG_ALWAYS, "\n\n"); oclTestLog(OCLTEST_LOG_ALWAYS, "----------------------------------------\n"); oclTestLog(OCLTEST_LOG_ALWAYS, "Total Passed Tests: %8d (%6.2f%s)\n", num_passes, percent_passed, "%"); oclTestLog(OCLTEST_LOG_ALWAYS, "Total Failed Tests: %8d (%6.2f%s)\n", num_failures, percent_failed, "%"); oclTestLog(OCLTEST_LOG_ALWAYS, "----------------------------------------\n"); oclTestLog(OCLTEST_LOG_ALWAYS, "Total Run Tests: %8d (%6.2f%s)\n", (int)total_tests, percent_total, "%"); oclTestLog(OCLTEST_LOG_ALWAYS, "\n\n"); } ///////////////////////////////////////////////////////////////////////////// void App::AddToList(StringList& strlist, const char* str) { std::string s(str); strlist.push_back(s); } void App::LoadList(StringList& strlist, const char* filename) { char buffer[1024]; FILE* fp = fopen(filename, "r"); if (fp == NULL) return; while (fgets(buffer, 1000, fp) != NULL) { size_t length = strlen(buffer); if (length > 0) { if (buffer[length - 1] != '\n') { length++; } buffer[length - 1] = 0; AddToList(strlist, buffer); } } fclose(fp); } static void Help(const char* name) { oclTestLog(OCLTEST_LOG_ALWAYS, "%s (-w | -V | -m | -M | -l | -t | -T | -p | -d | -x | -y | -g| " "-o | -n )\n", name); oclTestLog(OCLTEST_LOG_ALWAYS, " -w : enable window mode\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -V : enable TeamCity service messages\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -J : enable Jenkins service messages\n"); oclTestLog( OCLTEST_LOG_ALWAYS, " -d : dump test output to portable float map (pfm)\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -m : specify a DLL module with tests\n"); oclTestLog( OCLTEST_LOG_ALWAYS, " -M : specify a text file with one DLL module per line\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -l : list test names in DLL modules and exit\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -s : number of threads to spawn\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -t : run test\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -T : specify a text file with one test per line\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -a : specify a test to avoid\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -A : specify a text file of tests to avoid with " "one test 
per line\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -p : specify a platform to run on, 'amd','nvidia' " ",'intel' or 'ms'\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -h : this help text\n"); oclTestLog( OCLTEST_LOG_ALWAYS, " -x : x dimension for debug output image (and window)\n"); oclTestLog( OCLTEST_LOG_ALWAYS, " -y : y dimension for debug output image (and window)\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -P : Perflab mode (just print the result without " "any supplementary information)\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -n #number : run the tests specified with -m, -M, -t or -T " "options multiple times\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -r : Option to Randomize the order in which the " "tests are executed.\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -R : Option to ReRun failed tests for conditional " "pass.\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -i : Don't print system information\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -g : GPUid to run the tests on\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -o : dump the output to a specified file\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " -c : Run the test on the CPU device.\n"); oclTestLog(OCLTEST_LOG_ALWAYS, " : \n"); oclTestLog(OCLTEST_LOG_ALWAYS, " : To run only one subtest of a test, append the " "subtest to\n"); oclTestLog( OCLTEST_LOG_ALWAYS, " : the end of the test name in brackets. i.e. test[1]"); oclTestLog(OCLTEST_LOG_ALWAYS, "\n"); exit(0); } unsigned int getPlatformID(const char* str) { std::string strOfCLVendor(str); std::string strOfCLPlatformName; unsigned int platform = 0; // currently, the only input values amd,nvidia and intel are supported if (strOfCLVendor == "amd") { strOfCLPlatformName = "Advanced Micro Devices, Inc."; } else if (strOfCLVendor == "intel") { strOfCLPlatformName = "Intel(R) Corporation"; } else if (strOfCLVendor == "nvidia") { strOfCLPlatformName = "NVIDIA Corporation"; } else if (strOfCLVendor == "ms") { strOfCLPlatformName = "Microsoft"; } else { // fall-back on platform index 0 return platform; } cl_int status; cl_uint numPlatforms = 0; status = clGetPlatformIDs(0, NULL, &numPlatforms); if (status != CL_SUCCESS) { return platform; } cl_platform_id* platforms = new cl_platform_id[numPlatforms]; status = clGetPlatformIDs(numPlatforms, platforms, NULL); if (status == CL_SUCCESS) { unsigned int i; for (i = 0; i < numPlatforms; ++i) { char buff[200]; status = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(buff), buff, NULL); if (status != CL_SUCCESS) { break; } if (strcmp(buff, strOfCLPlatformName.c_str()) == 0) { platform = i; break; } } } delete[] platforms; return platform; } static const char* supported_options = "dg:lm:M:o:Ps:t:T:a:A:p:v:wxy:in:rcRVJ"; unsigned int parseCommandLineForPlatform(unsigned int argc, char** argv) { int c; unsigned int platform = 0; while ((c = getopt(argc, argv, supported_options)) != -1) { switch (c) { case 'p': platform = getPlatformID(optarg); break; default: break; } } return platform; } void App::CommandLine(unsigned int argc, char** argv) { unsigned int i = 1; int c; bool hasOption = false; unsigned int tmpNumDevices = 0; unsigned int tmpDeviceId = 0; m_deviceId = 0; int tmp; while ((c = getopt(argc, argv, supported_options)) != -1) { switch (c) { case 'c': m_useCPU = true; break; case 'p': break; case 'w': m_console = false; hasOption = true; break; case 'V': m_svcMsg = "teamcity"; break; case 'J': m_svcMsg = "jenkins"; break; case 'd': m_dump = true; hasOption = true; break; case 'm': AddToList(m_paths, optarg); hasOption = true; break; case 'M': LoadList(m_paths, optarg); hasOption = 
true; break; case 'a': AddToList(m_avoid, optarg); hasOption = true; break; case 'A': LoadList(m_avoid, optarg); hasOption = true; break; case 'l': m_list = true; hasOption = true; break; // command line switch to loop execution of any specified test or tests n // number of times case 'n': m_numItr = atoi(optarg); break; // command line switch to randomize the order of test execution in OCLTest case 'r': m_rndOrder = true; break; // command line switch to rerun the failed tests to see if they pass on // second run case 'R': { m_reRunFailed = true; break; } case 't': AddToList(m_tests, optarg); hasOption = true; break; case 'T': LoadList(m_tests, optarg); hasOption = true; break; case 's': m_threads = atoi(optarg); hasOption = true; break; case 'h': Help(argv[0]); break; case 'x': m_width = atoi(optarg); hasOption = true; break; case 'y': m_height = atoi(optarg); hasOption = true; break; case 'P': m_perflab = true; hasOption = true; break; case 'g': #if 0 tmpNumDevices = (unsigned int)atoi(optarg); if(m_numDevices < tmpNumDevices) { oclTestLog(OCLTEST_LOG_ALWAYS, "Number of Devices(%d) less than specified by the user(%d). Using %d devices.\n", m_numDevices, tmpNumDevices, m_numDevices); } else { m_numDevices = tmpNumDevices; } #else tmpDeviceId = (unsigned int)atoi(optarg); #endif break; case 'v': tmp = atoi(optarg); if (tmp >= 0 && tmp < 100) { oclTestSetLogLevel(atoi(optarg)); } else { oclTestLog(OCLTEST_LOG_ALWAYS, "Invalid verbose level\n"); } break; case 'o': { hasOption = true; oclTestEnableLogToFile(optarg); } break; case 'i': m_noSysInfoPrint = true; break; default: Help(argv[0]); break; } } // Reset devices in case user overrode defaults m_numDevices = findAdapters(m_platform, m_useCPU, &mpform_id); if (m_numDevices < (tmpDeviceId + 1)) { m_deviceId = 0; oclTestLog(OCLTEST_LOG_ALWAYS, "User specified deviceId(%d) exceedes the number of " "Devices(%d). Using device %d.\n", tmpDeviceId, m_numDevices, m_deviceId); } else { m_deviceId = tmpDeviceId; } if (!hasOption) { Help(argv[0]); } } bool App::TestInList(StringList& strlist, const char* szModuleTestname) { if (szModuleTestname == NULL) { return false; } for (unsigned int i = 0; i < strlist.size(); i++) { // check to see if an index is specified for this test name int nIndex = -1; std::string szTestName = strlist[i]; if (szTestName.find("[") != std::string::npos) { size_t nFirstBracket = szTestName.find("["); size_t nLastBracket = szTestName.find("]"); if ((nFirstBracket != std::string::npos) && (nLastBracket != std::string::npos) && (nLastBracket > nFirstBracket)) { szTestName = szTestName.substr(0, nFirstBracket); } } if (strcmp(szModuleTestname, szTestName.c_str()) == 0) { return true; } } return false; } void App::GetTestIndexList(TestIndexList& testIndices, StringList& testList, const char* szModuleTestname, int maxIndex) { for (unsigned int i = 0; i < testList.size(); i++) { IndicesRange nIndex = {0, maxIndex}; // If the test name string ends with [...] parse the text // between the brackets to determine the index range. 
std::string szTestName = testList[i]; if (szTestName.find("[") != std::string::npos) { size_t nFirstBracket = szTestName.find("["); size_t nLastBracket = szTestName.find("]"); if ((nFirstBracket != std::string::npos) && (nLastBracket != std::string::npos) && (nLastBracket > nFirstBracket)) { // Getting the string between the brackets '[' and ']' // The values can be one of the following:- // [a-b] - Run tests from a to b // [a-] - Run tests from subtest a to subtest total_tests // [-b] - Run tests from subtest 0 to subtest b // a and b are indices of the tests to run std::string nIndexString = szTestName.substr( nFirstBracket + 1, nLastBracket - nFirstBracket - 1); size_t nIntermediateHyphen = szTestName.find("-"); if ((nIntermediateHyphen != std::string::npos) && (nIntermediateHyphen < nLastBracket) && (nIntermediateHyphen > nFirstBracket)) { // Getting the start index if ((nIntermediateHyphen - 1) == nFirstBracket) { nIndex.startIndex = 0; } else { nIndex.startIndex = atoi(szTestName .substr(nFirstBracket + 1, nIntermediateHyphen - nFirstBracket - 1) .c_str()); } // Getting the end index if ((nIntermediateHyphen + 1) == nLastBracket) { nIndex.endIndex = maxIndex; } else { nIndex.endIndex = atoi(szTestName .substr(nIntermediateHyphen + 1, nLastBracket - nIntermediateHyphen - 1) .c_str()); } } else { nIndex.startIndex = atoi( szTestName .substr(nFirstBracket + 1, nLastBracket - nFirstBracket - 1) .c_str()); nIndex.endIndex = nIndex.startIndex; } } szTestName = szTestName.substr(0, nFirstBracket); } if (strcmp(szModuleTestname, szTestName.c_str()) == 0) { // If the values are out of order, swap them. if (nIndex.startIndex > nIndex.endIndex) { int tmp = nIndex.startIndex; nIndex.startIndex = nIndex.endIndex; nIndex.endIndex = tmp; } // Add the indices in the specified range to the list. for (int i = nIndex.startIndex; i <= nIndex.endIndex; ++i) { if (i <= maxIndex) { testIndices.push_back(i); } else { oclTestLog(OCLTEST_LOG_ALWAYS, "Error: Invalid test index for subtest: %s!\n", szModuleTestname); } } // Now sort and prune duplicates. 
std::sort(testIndices.begin(), testIndices.end()); std::unique(testIndices.begin(), testIndices.end()); } } } void App::PruneTestIndexList(TestIndexList& testIndices, TestIndexList& avoidIndices, TestIndexList& erasedIndices) { for (TestIndexList::iterator it = testIndices.begin(); it != testIndices.end();) { unsigned int index = *it; TestIndexList::iterator result = std::find(avoidIndices.begin(), avoidIndices.end(), index); if (result != avoidIndices.end()) { it = testIndices.erase(it); erasedIndices.push_back(index); } else { ++it; } } } void App::ScanForTests() { for (unsigned int i = 0; i < m_paths.size(); i++) { Module mod; #ifdef _WIN32 std::string::iterator myIter; myIter = m_paths[i].end(); myIter--; if (*myIter == 0x0a) m_paths[i].erase(myIter); mod.hmodule = LoadLibrary(m_paths[i].c_str()); #endif #ifdef __linux__ mod.hmodule = dlopen(m_paths[i].c_str(), RTLD_NOW); #endif if (mod.hmodule == NULL) { fprintf(stderr, "Could not load module: %s\n", m_paths[i].c_str()); #ifdef __linux__ fprintf(stderr, "Error : %s\n", dlerror()); #else #endif } else { mod.name = m_paths[i]; #ifdef _WIN32 mod.get_count = (TestCountFuncPtr)GetProcAddress(mod.hmodule, "OCLTestList_TestCount"); mod.get_name = (TestNameFuncPtr)GetProcAddress(mod.hmodule, "OCLTestList_TestName"); mod.create_test = (CreateTestFuncPtr)GetProcAddress( mod.hmodule, "OCLTestList_CreateTest"); mod.destroy_test = (DestroyTestFuncPtr)GetProcAddress( mod.hmodule, "OCLTestList_DestroyTest"); mod.get_version = (TestVersionFuncPtr)GetProcAddress( mod.hmodule, "OCLTestList_TestLibVersion"); mod.get_libname = (TestLibNameFuncPtr)GetProcAddress( mod.hmodule, "OCLTestList_TestLibName"); #endif #ifdef __linux__ mod.get_count = (TestCountFuncPtr)dlsym(mod.hmodule, "OCLTestList_TestCount"); mod.get_name = (TestNameFuncPtr)dlsym(mod.hmodule, "OCLTestList_TestName"); mod.create_test = (CreateTestFuncPtr)dlsym(mod.hmodule, "OCLTestList_CreateTest"); mod.destroy_test = (DestroyTestFuncPtr)dlsym(mod.hmodule, "OCLTestList_DestroyTest"); mod.get_version = (TestVersionFuncPtr)dlsym(mod.hmodule, "OCLTestList_TestLibVersion"); mod.get_libname = (TestLibNameFuncPtr)dlsym(mod.hmodule, "OCLTestList_TestLibName"); #endif mod.cached_test = new OCLTest*[mod.get_count()]; for (int x = 0, y = mod.get_count(); x < y; ++x) { mod.cached_test[x] = NULL; } m_modules.push_back(mod); } } } void App::CleanUp() { for (unsigned int i = 0; i < m_modules.size(); i++) { if (m_modules[i].cached_test) { delete[] m_modules[i].cached_test; } #ifdef _WIN32 FreeLibrary(m_modules[i].hmodule); #endif #ifdef __linux__ dlclose(m_modules[i].hmodule); #endif } #ifdef _WIN32 if (m_window) delete m_window; m_window = 0; #endif } extern int optind; ///////////////////////////////////////////////////////////////////////////// bool App::m_reRunFailed = false; const char* App::m_svcMsg = nullptr; int main(int argc, char** argv) { #if EMU_ENV printf("Built for Emulation Environment\n"); #endif // EMU_ENV unsigned int platform = 0; platform = parseCommandLineForPlatform(argc, argv); // reset optind as we really didn't parse the full command line optind = 1; App app(platform); #ifdef _WIN32 // this function is registers windows service routine when ocltst is launched // by the OS on service initialization. On other scenarios, this function does // nothing. 
serviceStubCall(); // SetErrorMode(SEM_NOGPFAULTERRORBOX); // const LPTOP_LEVEL_EXCEPTION_FILTER oldFilter = // SetUnhandledExceptionFilter(xFilter); #endif // _WIN32 #ifdef AUTO_REGRESS try { #endif /* AUTO_REGRESS */ app.CommandLine(argc, argv); app.printOCLinfo(); app.ScanForTests(); for (int i = 0; i < app.GetNumItr(); i++) { app.RunAllTests(); } app.CleanUp(); #ifdef AUTO_REGRESS } catch (...) { oclTestLog(OCLTEST_LOG_ALWAYS, "Exiting due to unhandled exception!\n"); return (-1); } #endif /* AUTO_REGRESS */ return 0; } #ifdef _WIN32 #include typedef unsigned int uint32; typedef size_t uintp; struct StackEntry { uintp addr; uint32 line; uint32 disp; char symbol[128]; char file[128]; }; static const unsigned int MAX_DEPTH_PER_NODE = 24; struct Info { bool operator==(const Info& b) const { return key == b.key; } uintp key; // pointer, handle, whatever StackEntry stack[MAX_DEPTH_PER_NODE]; }; static void dumpTraceBack(CONTEXT& context) { Info info; oclTestLog(OCLTEST_LOG_ALWAYS, "Exception: exiting!\n"); HANDLE process = GetCurrentProcess(); STACKFRAME64 stackframe; memset(&stackframe, 0, sizeof(STACKFRAME64)); #if defined(_WIN64) stackframe.AddrPC.Offset = context.Rip; stackframe.AddrPC.Mode = AddrModeFlat; stackframe.AddrStack.Offset = context.Rsp; stackframe.AddrStack.Mode = AddrModeFlat; stackframe.AddrFrame.Offset = context.Rbp; stackframe.AddrFrame.Mode = AddrModeFlat; #else stackframe.AddrPC.Offset = context.Eip; stackframe.AddrPC.Mode = AddrModeFlat; stackframe.AddrStack.Offset = context.Esp; stackframe.AddrStack.Mode = AddrModeFlat; stackframe.AddrFrame.Offset = context.Ebp; stackframe.AddrFrame.Mode = AddrModeFlat; #endif unsigned int depth = 0; if (SymInitialize(process, NULL, true)) { while ((depth < MAX_DEPTH_PER_NODE) && StackWalk64(IMAGE_FILE_MACHINE_I386, process, GetCurrentThread(), &stackframe, &context, NULL, SymFunctionTableAccess64, SymGetModuleBase64, NULL)) { if (stackframe.AddrPC.Offset != 0) { // // we don't want to evaluate the names/lines yet // so just record the address // info.stack[depth].addr = (uintp)stackframe.AddrPC.Offset; DWORD64 disp64; DWORD disp; IMAGEHLP_SYMBOL64* symInfo; IMAGEHLP_LINE64 lineInfo; uintp addr = (uintp)stackframe.AddrPC.Offset; char buffer[128]; symInfo = (IMAGEHLP_SYMBOL64*)&buffer[0]; symInfo->SizeOfStruct = sizeof(symInfo); symInfo->MaxNameLength = (sizeof(buffer) - sizeof(IMAGEHLP_SYMBOL64)); lineInfo.SizeOfStruct = sizeof(lineInfo); if (SymGetSymFromAddr64(process, addr, &disp64, symInfo)) { sprintf(info.stack[depth].symbol, "%s", symInfo->Name); info.stack[depth].disp = (uint32)disp64; } else { sprintf(info.stack[depth].symbol, ""); } if (SymGetLineFromAddr64(process, addr, &disp, &lineInfo)) { sprintf(info.stack[depth].file, "%s", lineInfo.FileName); info.stack[depth].line = lineInfo.LineNumber; } else { info.stack[depth].file[0] = '\0'; } depth++; } } } SymCleanup(process); int j = 0; while (j < MAX_DEPTH_PER_NODE && info.stack[j].addr != 0) { oclTestLog(OCLTEST_LOG_ALWAYS, " %s()+%d (0x%.8x) %s:%d\n", info.stack[j].symbol, info.stack[j].disp, info.stack[j].addr, info.stack[j].file, info.stack[j].line); j++; } } static LONG WINAPI xFilter(LPEXCEPTION_POINTERS xEP) { CONTEXT context; CONTEXT* xCtx = &context; memset(xCtx, 0, sizeof(CONTEXT)); context.ContextFlags = CONTEXT_FULL; memcpy(xCtx, xEP->ContextRecord, sizeof(CONTEXT)); dumpTraceBack(context); return (EXCEPTION_EXECUTE_HANDLER); } #undef CHECK_RESULT #endif // WIN_OS ///////////////////////////////////////////////////////////////////////////// 
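// ---------------------------------------------------------------------------
// Illustrative invocations (not part of the original source): a sketch of how
// the option parsing in CommandLine()/Help() above is typically driven. The
// module and test names used here are hypothetical placeholders.
//
//   ocltst -m liboclperf.so -l                     # list tests in a module
//   ocltst -m liboclperf.so -t MyTest              # run all subtests of MyTest
//   ocltst -m liboclperf.so -t MyTest[2]           # run only subtest 2
//   ocltst -m liboclperf.so -t MyTest[1-4] -g 1    # subtests 1..4 on GPU id 1
//   ocltst -M modules.txt -T tests.txt -o run.log  # batch run, log to a file
//   ocltst -m liboclperf.so -t MyTest -p amd -r    # AMD platform, random order
// ---------------------------------------------------------------------------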
clr-rocm-5.7.1/opencl/tests/ocltst/env/pfm.cpp000066400000000000000000000046551450307266000212660ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

#include "pfm.h"

#ifdef _WIN32
#include <windows.h>
#endif

#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned int SavePFM(const char* filename, const float* buffer,
                     unsigned int width, unsigned int height,
                     unsigned int components) {
  unsigned int error = 0;

  //
  // open the image file for writing
  //
  FILE* fh;
  if ((fh = fopen(filename, "wb")) == NULL) {
    return 1;
  }

  //
  // write the PFM header
  //
#define PFMEOL "\x0a"
  fprintf(fh, "PF" PFMEOL "%d %d" PFMEOL "-1" PFMEOL, width, height);
  fflush(fh);

  //
  // write each scanline (PFM stores rows bottom-up); line[] holds one
  // scanline of packed RGB floats, so reject images wider than its
  // 4096-pixel capacity instead of overflowing the stack
  //
  const unsigned int lineSize = width * 3;
  float line[3 * 4096];
  if (width > 4096) {
    fclose(fh);
    return 1;
  }
  for (unsigned int y = height; y > 0; y--) {
    const float* v = buffer + components * width * (y - 1);
    for (unsigned int x = 0; x < width; x++) {
      line[x * 3 + 0] = v[x * components + 0];
      line[x * 3 + 1] =
          (components > 1) ? v[x * components + 1] : v[x * components + 0];
      line[x * 3 + 2] =
          (components > 2) ? v[x * components + 2] : v[x * components + 0];
    }
    unsigned int written =
        (unsigned int)fwrite(line, (unsigned int)sizeof(float), lineSize, fh);
    if (written != lineSize) {
      error = 1;
      break;
    }
    fflush(fh);
  }
  fflush(fh);
  fclose(fh);
  return error;
}
clr-rocm-5.7.1/opencl/tests/ocltst/env/pfm.h000066400000000000000000000025041450307266000207230ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/

#ifndef _PFM_H_
#define _PFM_H_

extern unsigned int SavePFM(const char* filename, const float* buffer,
                            unsigned int width, unsigned int height,
                            unsigned int components);

#endif  // _PFM_H_
clr-rocm-5.7.1/opencl/tests/ocltst/env/window.cpp000066400000000000000000000115261450307266000220070ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

#ifdef _WIN32

#include <assert.h>
#include <stdlib.h>
#include <windows.h>

#include "Window.h"

HWND Window::_hWnd;
unsigned char* Window::_data;
unsigned int Window::_w;
unsigned int Window::_h;

void Window::OnPaint(void) {
  PAINTSTRUCT ps;
  HDC hDC = BeginPaint(_hWnd, &ps);
  if (_w && _h && _data) {
    BITMAPINFO bm;
    bm.bmiColors[0].rgbBlue = 0;
    bm.bmiColors[0].rgbGreen = 0;
    bm.bmiColors[0].rgbRed = 0;
    bm.bmiColors[0].rgbReserved = 0;
    bm.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bm.bmiHeader.biWidth = _w;
    bm.bmiHeader.biHeight = _h;
    bm.bmiHeader.biPlanes = 1;
    bm.bmiHeader.biBitCount = 32;
    bm.bmiHeader.biCompression = BI_RGB;
    bm.bmiHeader.biSizeImage = 0;
    bm.bmiHeader.biXPelsPerMeter = 0;
    bm.bmiHeader.biYPelsPerMeter = 0;
    bm.bmiHeader.biClrUsed = 0;
    bm.bmiHeader.biClrImportant = 0;
    int ret = SetDIBitsToDevice(hDC, 0, 0, _w, _h, 0, 0, 0, _h, _data, &bm,
                                DIB_RGB_COLORS);
    assert(ret);
  }
  EndPaint(_hWnd, &ps);
}

/*****************************************************************************
*****************************************************************************/

LRESULT WINAPI Window::DefWindowProc(HWND hWnd, UINT uMsg, WPARAM wParam,
                                     LPARAM lParam) {
  switch (uMsg) {
    case WM_CHAR:
      switch (wParam) {
        case 27:  // ESC
          exit(0);
          break;
      }
      return 0;
    case WM_PAINT:
      OnPaint();
      return 0;
  }
  return ::DefWindowProc(hWnd, uMsg, wParam, lParam);
}

Window::Window(const char* title, int x, int y, int width, int height,
               unsigned int uiStyle) {
  _data = NULL;
  _w = 0;
  _h = 0;
  WNDCLASS wc = {0,
                 (WNDPROC)Window::DefWindowProc,
                 0,
                 0,
                 GetModuleHandle(0),
                 LoadIcon(NULL, IDI_WINLOGO),
                 LoadCursor(NULL, IDC_ARROW),
                 NULL,
                 NULL,
                 "TST"};
  if (!RegisterClass(&wc)) {
    MessageBox(NULL, "RegisterClass() failed", "Error", MB_OK);
    exit(0);
  }
  if (uiStyle == 0) {
    uiStyle = WS_OVERLAPPEDWINDOW | WS_CLIPSIBLINGS | WS_CLIPCHILDREN;
  }
  RECT r = {x, y, x + width, y + height};
  AdjustWindowRect(&r, uiStyle, 0);
  _hWnd = CreateWindow("TST", title, uiStyle, r.left, r.top, r.right - r.left,
                       r.bottom - r.top, NULL, NULL, GetModuleHandle(0), this);
  if (_hWnd == NULL) {
    MessageBox(NULL, "CreateWindow() failed.", "Error", MB_OK);
    exit(0);
  }
  ShowWindow(_hWnd, SW_SHOW);
  UpdateWindow(_hWnd);
}
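// Illustrative sketch (not part of the original file): how a caller might
// drive this class. ShowImage() expects a row-major buffer with 4 floats
// (RGBA) per pixel, matching the conversion loop below; the window and
// buffer sizes here are hypothetical.
//
//   Window win("Test", 100, 100, 512, 512, 0);
//   std::vector<float> rgba(4 * 512 * 512, 0.5f);  // uniform mid-gray frame
//   win.ShowImage(512, 512, rgba.data());
//   win.ConsumeEvents();  // pumps WM_PAINT etc.; does not return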
Window::~Window(void) { DestroyWindow(_hWnd); if (_data) { delete[] _data; } UnregisterClass("TST", GetModuleHandle(NULL)); } void Window::ConsumeEvents(void) { while (1) { MSG msg; while (PeekMessage(&msg, NULL, 0, 0, PM_NOREMOVE)) { GetMessage(&msg, NULL, 0, 0); TranslateMessage(&msg); DispatchMessage(&msg); } } } void Window::ShowImage(unsigned int width, unsigned int height, float* data) { if (_data) { delete[] _data; } _data = new unsigned char[4 * width * height]; _w = width; _h = height; unsigned char* pb = _data; float* p = data; unsigned int i; for (i = 0; i < (unsigned int)(width * height); i++) { // // argb // float v = p[2] > 1.f ? 1.f : (p[2] < 0.f ? 0.f : p[2]); *pb++ = (unsigned char)(255.f * v); v = p[1] > 1.f ? 1.f : (p[1] < 0.f ? 0.f : p[1]); *pb++ = (unsigned char)(255.f * v); v = p[0] > 1.f ? 1.f : (p[0] < 0.f ? 0.f : p[0]); *pb++ = (unsigned char)(255.f * v); v = p[3] > 1.f ? 1.f : (p[3] < 0.f ? 0.f : p[3]); *pb++ = (unsigned char)(255.f * v); p += 4; } RedrawWindow(_hWnd, NULL, NULL, RDW_INVALIDATE); OnPaint(); } #endif // _WIN32 clr-rocm-5.7.1/opencl/tests/ocltst/env/window.h000066400000000000000000000034051450307266000214510ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _WINDOW_H_ #define _WINDOW_H_ #ifdef _WIN32 #include #include class Window { public: typedef LRESULT (*WindowProc)(HWND hW, UINT uMsg, WPARAM wP, LPARAM lP); public: Window(const char* title, int x, int y, int width, int height, unsigned int uiStyle); ~Window(); void ConsumeEvents(void); void ShowImage(unsigned int width, unsigned int height, float* data); private: static LRESULT WINAPI DefWindowProc(HWND hW, UINT uMsg, WPARAM wP, LPARAM lP); static void OnPaint(void); public: static HWND _hWnd; static unsigned char* _data; static unsigned int _w; static unsigned int _h; }; #endif // _WIN32 #endif // _WINDOW_H_ clr-rocm-5.7.1/opencl/tests/ocltst/include/000077500000000000000000000000001450307266000206225ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tests/ocltst/include/OCL/000077500000000000000000000000001450307266000212375ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tests/ocltst/include/OCL/Thread.h000066400000000000000000000070441450307266000226240ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef OCL_THREAD_H #define OCL_THREAD_H //! //! \file Thread.h //! #ifdef _WIN32 #ifndef _WIN32_WINNT #define _WIN32_WINNT 0x0501 #endif #include "windows.h" #else #include "pthread.h" #endif //! Entry point for the thread //! prototype of the entry point in windows typedef void *(*oclThreadFunc)(void *); namespace OCLutil { //! \class Lock //! \brief Provides a wrapper for locking primitives used to //! synchronize _CPU_ threads. //! //! Common usage would be: //! //! OCL::Lock lock; //! //! .... //! //! // Critical section begins //! //! lock.lock(); //! //! ..... //! //! // Critical section ends //! //! lock.unlock(); //! class Lock { public: //! Constructor for OCLLock Lock(); //! Destructor for OCLLock ~Lock(); //! Try to acquire the lock, if available continue, else wait on the lock void lock(); //! Try to acquire the lock, if available, hold it, else continue doing //! something else bool tryLock(); //! Unlock the lock and return void unlock(); private: ///////////////////////////////////////////////////////////// //! //! Private data members and methods //! //! System specific synchronization primitive #ifdef _WIN32 CRITICAL_SECTION _cs; #else pthread_mutex_t _lock; #endif }; ////////////////////////////////////////////////////////////// //! //! \class Thread //! \brief Provides a wrapper for creating a _CPU_ thread. //! //! This class provides a simple wrapper to a CPU thread/ //! The class name might be a bit confusing, esp considering //! the GPU has it's own threads as well. //! class Thread { public: //! Thread constructor and destructor. Note that the thread is //! NOT created in the constructor. The thread creation takes //! place in the create method Thread(); ~Thread(); //! Wrapper for pthread_create. Pass the thread's entry //! point and data to be passed to the routine bool create(oclThreadFunc func, void *arg); //! Wrapper for pthread_join. The calling thread //! will wait until _this_ thread exits bool join(); //! Get the thread data passed by the application void *getData() { return _data; } //! Get the thread ID static unsigned int getID(); private: ///////////////////////////////////////////////////////////// //! //! Private data members and methods //! #ifdef _WIN32 //! 
  //! Store the thread handle
  HANDLE _tid;
  unsigned int _ID;
#else
  pthread_t _tid;
  pthread_attr_t _attr;
#endif

  void *_data;
};

}  // namespace OCLutil

#endif  // OCL_THREAD_H
clr-rocm-5.7.1/opencl/tests/ocltst/include/OCLLog.h000066400000000000000000000032021450307266000220500ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#ifndef OCLLOG_H_
#define OCLLOG_H_

#ifdef _WIN32
#ifdef OCLTST_LOG_BUILD
#define DLLIMPORT __declspec(dllexport)
#else
#define DLLIMPORT __declspec(dllimport)
#endif  // OCLTST_LOG_BUILD
#else
#define DLLIMPORT
#endif  // _WIN32

enum oclLoggingLevel {
  OCLTEST_LOG_ALWAYS,
  OCLTEST_LOG_VERBOSE,
};

extern DLLIMPORT void oclTestLog(oclLoggingLevel logLevel, const char* fmt,
                                 ...);
extern DLLIMPORT void oclTestSetLogLevel(int level);
extern DLLIMPORT void oclTestEnableLogToFile(const char* filename);

#endif  // OCLLOG_H_
clr-rocm-5.7.1/opencl/tests/ocltst/include/OCLTest.h000066400000000000000000000054151450307266000222550ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
*/

#ifndef _OCLTEST_H_
#define _OCLTEST_H_

#include <string>

#include "OCLWrapper.h"

class BaseTestImp;
class OCLTestImp;

class OCLTest {
 public:
  virtual unsigned int getThreadUsage(void) = 0;
  virtual int getNumSubTests(void) = 0;

  virtual void open() = 0;
  virtual void open(unsigned int test, const char* deviceName,
                    unsigned int architecture) = 0;
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceId, unsigned int platformIndex) = 0;
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceId) = 0;
  virtual void run(void) = 0;
  virtual unsigned int close(void) = 0;

  virtual void setErrorMsg(const char* error) = 0;
  virtual const char* getErrorMsg(void) = 0;
  virtual bool hasErrorOccured(void) = 0;
  virtual void clearError() = 0;

  virtual void setDeviceId(unsigned int deviceId) = 0;
  virtual void setPlatformIndex(unsigned int platformIndex) = 0;

  virtual OCLTestImp* toOCLTestImp() = 0;
  virtual BaseTestImp* toBaseTestImp() = 0;

  virtual float getPerfInfo() = 0;
  virtual void clearPerfInfo(void) = 0;

  virtual void setIterationCount(int cnt) = 0;
  virtual void useCPU() = 0;

  // Returning true here lets the test object be cached between runs; the
  // object is then deleted only after all the tests have finished running.
  // Not many tests have been updated to take advantage of the caching yet.
  // FIXME: Switch all tests to support caching.
  virtual bool cache_test() { return true; }

  std::string testDescString;
  void resetDescString(void) { testDescString.clear(); }

  virtual ~OCLTest() {}
};

#endif  // _OCLTEST_H_
clr-rocm-5.7.1/opencl/tests/ocltst/include/OCLTestList.h000066400000000000000000000032111450307266000231010ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
*/

#ifndef _OCLMODULE_H_
#define _OCLMODULE_H_

#ifdef _WIN32
#define OCLLCONV __cdecl
#endif
#ifdef __linux__
#define OCLLCONV
#endif

class OCLTest;

//
// exported function pointer typedefs
//
typedef unsigned int(OCLLCONV *TestCountFuncPtr)(void);
typedef const char *(OCLLCONV *TestNameFuncPtr)(unsigned int);
typedef OCLTest *(OCLLCONV *CreateTestFuncPtr)(unsigned int);
typedef void(OCLLCONV *DestroyTestFuncPtr)(OCLTest *);
typedef unsigned int(OCLLCONV *TestVersionFuncPtr)(void);
typedef const char *(OCLLCONV *TestLibNameFuncPtr)(void);

#endif  // _OCLMODULE_H_
clr-rocm-5.7.1/opencl/tests/ocltst/include/OCLTestUtils.h000066400000000000000000000025571450307266000233020ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#ifndef OCLTESTUTILS_H_
#define OCLTESTUTILS_H_

#include <string>

// @brief  Load file to a string
// @param  FN Name of the file to be loaded
// @param  S  String to store the loaded file
// @return true on success
bool loadFile(const char* FN, std::string& S);

#endif /* OCLTESTUTILS_H_ */
clr-rocm-5.7.1/opencl/tests/ocltst/include/OCLWrapper.h000066400000000000000000000721661450307266000227630ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
*/ #ifndef __OCLWrapper_H #define __OCLWrapper_H #define CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_USE_DEPRECATED_OPENCL_2_0_APIS #include "CL/cl.h" #include "CL/cl_ext.h" #include "CL/cl_gl.h" #include "cl_profile_amd.h" typedef CL_API_ENTRY cl_int(CL_API_CALL *clUnloadPlatformAMD_fn)( cl_platform_id id); // Function Pointer Declarations for cl_khr_gl_sharing extension (missing in // cl_gl.h) typedef CL_API_ENTRY cl_int(CL_API_CALL *clGetGLContextInfoKHR_fn)( const cl_context_properties *properties, cl_gl_context_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); typedef CL_API_ENTRY cl_mem(CL_API_CALL *clCreateFromGLBuffer_fn)( cl_context context, cl_mem_flags flags, unsigned int bufobj, int *errcode_ret); typedef CL_API_ENTRY cl_mem(CL_API_CALL *clCreateFromGLTexture_fn)( cl_context context, cl_mem_flags flags, unsigned int texture_target, int miplevel, unsigned int texture, cl_int *errcode_ret); typedef CL_API_ENTRY cl_mem(CL_API_CALL *clCreateFromGLTexture2D_fn)( cl_context context, cl_mem_flags flags, unsigned int texture_target, int miplevel, unsigned int texture, cl_int *errcode_ret); typedef CL_API_ENTRY cl_mem(CL_API_CALL *clCreateFromGLRenderbuffer_fn)( cl_context context, cl_mem_flags flags, unsigned int renderbuffer, cl_int *errcode_ret); typedef CL_API_ENTRY cl_int(CL_API_CALL *clGetGLObjectInfo_fn)( cl_mem memobj, cl_gl_object_type *gl_object_type, unsigned int *gl_object_name); typedef CL_API_ENTRY cl_int(CL_API_CALL *clGetGLTextureInfo_fn)( cl_mem memobj, cl_gl_texture_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); typedef CL_API_ENTRY cl_int(CL_API_CALL *clEnqueueAcquireGLObjects_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); typedef CL_API_ENTRY cl_int(CL_API_CALL *clEnqueueReleaseGLObjects_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); // Function Pointer Declarations for performance counters typedef CL_API_ENTRY cl_perfcounter_amd(CL_API_CALL *clCreatePerfCounterAMD_fn)( cl_device_id device, cl_perfcounter_property *properties, cl_int *errcode_ret); typedef CL_API_ENTRY cl_int(CL_API_CALL *clEnqueueBeginPerfCounterAMD_fn)( cl_command_queue command_queue, cl_uint num_perf_counters, cl_perfcounter_amd *perf_counters, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); typedef CL_API_ENTRY cl_int(CL_API_CALL *clEnqueueEndPerfCounterAMD_fn)( cl_command_queue command_queue, cl_uint num_perf_counters, cl_perfcounter_amd *perf_counters, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); typedef CL_API_ENTRY cl_int(CL_API_CALL *clGetPerfCounterInfoAMD_fn)( cl_perfcounter_amd perf_counter, cl_perfcounter_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); typedef CL_API_ENTRY cl_int(CL_API_CALL *clReleasePerfCounterAMD_fn)( cl_perfcounter_amd perf_counter); typedef CL_API_ENTRY cl_int(CL_API_CALL *clRetainPerfCounterAMD_fn)( cl_perfcounter_amd perf_counter); typedef CL_API_ENTRY cl_int(CL_API_CALL *clSetDeviceClockModeAMD_fn)( cl_device_id device, cl_set_device_clock_mode_input_amd set_clock_mode_input, cl_set_device_clock_mode_output_amd *set_clock_mode_Output); class OCLWrapper { public: OCLWrapper(); ~OCLWrapper() {} // All OCL APIs are declared 
in the order they appear in cl.h cl_int clGetPlatformIDs(cl_uint num_entries, cl_platform_id *platforms, cl_uint *num_platforms); cl_int clGetPlatformInfo(cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_int clGetDeviceIDs(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices); cl_int clGetDeviceInfo(cl_device_id device, cl_device_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_context clCreateContext(cl_context_properties *properties, cl_uint num_devices, const cl_device_id *devices, void(CL_CALLBACK *pfn_notify)(const char *, const void *, size_t, void *), void *user_data, cl_int *errcode_ret); cl_context clCreateContextFromType( cl_context_properties *properties, cl_device_type device_type, void(CL_CALLBACK *pfn_notify)(const char *, const void *, size_t, void *), void *user_data, cl_int *errcode_ret); cl_int clRetainContext(cl_context context); cl_int clReleaseContext(cl_context context); cl_int clGetContextInfo(cl_context context, cl_context_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_command_queue clCreateCommandQueue(cl_context context, cl_device_id device, cl_command_queue_properties properties, cl_int *errcode_ret); cl_int clRetainCommandQueue(cl_command_queue command_queue); cl_int clReleaseCommandQueue(cl_command_queue command_queue); cl_int clGetCommandQueueInfo(cl_command_queue command_queue, cl_command_queue_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_mem clCreateBuffer(cl_context context, cl_mem_flags flags, size_t size, void *host_ptr, cl_int *errcode_ret); cl_mem clCreateImage2D(cl_context context, cl_mem_flags flags, const cl_image_format *image_format, size_t image_width, size_t image_height, size_t image_row_pitch, void *host_ptr, cl_int *errcode_ret); cl_mem clCreateImage3D(cl_context context, cl_mem_flags flags, const cl_image_format *image_format, size_t image_width, size_t image_height, size_t image_depth, size_t image_row_pitch, size_t image_slice_pitch, void *host_ptr, cl_int *errcode_ret); cl_int clRetainMemObject(cl_mem memobj); cl_int clReleaseMemObject(cl_mem memobj); cl_int clGetSupportedImageFormats(cl_context context, cl_mem_flags flags, cl_mem_object_type image_type, cl_uint num_entries, cl_image_format *image_formats, cl_uint *num_image_formats); cl_int clGetMemObjectInfo(cl_mem memobj, cl_mem_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_int clGetImageInfo(cl_mem image, cl_image_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_sampler clCreateSampler(cl_context context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int *errcode_ret); cl_int clRetainSampler(cl_sampler sampler); cl_int clReleaseSampler(cl_sampler sampler); cl_int clGetSamplerInfo(cl_sampler sampler, cl_sampler_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_program clCreateProgramWithSource(cl_context context, cl_uint count, const char **strings, const size_t *lengths, cl_int *errcode_ret); cl_program clCreateProgramWithBinary(cl_context context, cl_uint num_devices, const cl_device_id *device_list, const size_t *lengths, const unsigned char **binaries, cl_int *binary_status, cl_int *errcode_ret); cl_int 
clRetainProgram(cl_program program); cl_int clReleaseProgram(cl_program program); cl_int clBuildProgram(cl_program program, cl_uint num_devices, const cl_device_id *device_list, const char *options, void(CL_CALLBACK *pfn_notify)(cl_program program, void *user_data), void *user_data); cl_int clCompileProgram( cl_program program, cl_uint num_devices, const cl_device_id *device_list, const char *options, cl_uint num_input_headers, const cl_program *input_headers, const char **header_include_names, void(CL_CALLBACK *pfn_notify)(cl_program program, void *user_data), void *user_data); cl_program clLinkProgram(cl_context context, cl_uint num_devices, const cl_device_id *device_list, const char *options, cl_uint num_input_programs, const cl_program *input_programs, void(CL_CALLBACK *pfn_notify)(cl_program program, void *user_data), void *user_data, cl_int *errcode_ret); cl_int clUnloadCompiler(void); cl_int clUnloadPlatform(cl_platform_id); cl_int clGetProgramInfo(cl_program program, cl_program_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_int clGetProgramBuildInfo(cl_program program, cl_device_id device, cl_program_build_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_kernel clCreateKernel(cl_program program, const char *kernel_name, cl_int *errcode_ret); cl_int clCreateKernelsInProgram(cl_program program, cl_uint num_kernels, cl_kernel *kernels, cl_uint *num_kernels_ret); cl_int clRetainKernel(cl_kernel kernel); cl_int clReleaseKernel(cl_kernel kernel); cl_int clSetKernelArg(cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void *arg_value); cl_int clGetKernelInfo(cl_kernel kernel, cl_kernel_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_int clGetKernelWorkGroupInfo(cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_int clWaitForEvents(cl_uint num_events, const cl_event *event_list); cl_int clGetEventInfo(cl_event evnt, cl_event_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_int clRetainEvent(cl_event evnt); cl_int clReleaseEvent(cl_event evnt); cl_int clGetEventProfilingInfo(cl_event evnt, cl_profiling_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_int clFlush(cl_command_queue command_queue); cl_int clFinish(cl_command_queue command_queue); cl_int clEnqueueReadBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t cb, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueWriteBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t cb, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueCopyBuffer(cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, size_t src_offset, size_t dst_offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueReadImage(cl_command_queue command_queue, cl_mem image, cl_bool blocking_read, const size_t *origin, const size_t *region, size_t row_pitch, size_t slice_pitch, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueWriteImage(cl_command_queue command_queue, cl_mem image, cl_bool 
blocking_write, const size_t *origin, const size_t *region, size_t input_row_pitch, size_t input_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueCopyImage(cl_command_queue command_queue, cl_mem src_image, cl_mem dst_image, const size_t *src_origin, const size_t *dst_origin, const size_t *region, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueCopyImageToBuffer(cl_command_queue command_queue, cl_mem src_image, cl_mem dst_buffer, const size_t *src_origin, const size_t *region, size_t dst_offset, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueCopyBufferToImage(cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_image, size_t src_offset, const size_t *dst_origin, const size_t *region, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); void *clEnqueueMapBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_map, cl_map_flags map_flags, size_t offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt, cl_int *errcode_ret); void *clEnqueueMapImage(cl_command_queue command_queue, cl_mem image, cl_bool blocking_map, cl_map_flags map_flags, const size_t *origin, const size_t *region, size_t *image_row_pitch, size_t *image_slice_pitch, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt, cl_int *errcode_ret); cl_int clEnqueueUnmapMemObject(cl_command_queue command_queue, cl_mem memobj, void *mapped_ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueNDRangeKernel( cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t *global_work_offset, const size_t *global_work_size, const size_t *local_work_size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueTask(cl_command_queue command_queue, cl_kernel kernel, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueNativeKernel(cl_command_queue command_queue, void(CL_CALLBACK *user_func)(void *), void *args, size_t cb_args, cl_uint num_mem_objects, const cl_mem *mem_list, const void **args_mem_loc, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueMarker(cl_command_queue command_queue, cl_event *evnt); cl_int clEnqueueMarkerWithWaitList(cl_command_queue command_queue, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueWaitForEvents(cl_command_queue command_queue, cl_uint num_events, const cl_event *event_list); cl_int clEnqueueBarrier(cl_command_queue command_queue); void *clGetExtensionFunctionAddress(const char *func_name); cl_int clEnqueueReadBufferRect( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, const size_t *buffer_origin, const size_t *host_origin, const size_t *region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueWriteBufferRect( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, const size_t *buffer_origin, const size_t *host_origin, const size_t *region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, const void *ptr, 
cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clEnqueueCopyBufferRect( cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, const size_t *src_origin, const size_t *dst_origin, const size_t *region, size_t src_row_pitch, size_t src_slice_pitch, size_t dst_row_pitch, size_t dst_slice_pitch, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_mem clCreateImage(cl_context context, cl_mem_flags flags, const cl_image_format *image_format, const cl_image_desc *image_desc, void *host_ptr, cl_int *errcode_ret); cl_mem clCreateSubBuffer(cl_mem mem, cl_mem_flags flags, cl_buffer_create_type buffer_create_type, const void *buffer_create_info, cl_int *errcode_ret); cl_int clSetEventCallback( cl_event event, cl_int command_exec_callback_type, void(CL_CALLBACK *pfn_event_notify)(cl_event event, cl_int event_command_exec_status, void *user_data), void *user_data); cl_int clEnqueueFillImage(cl_command_queue command_queue, cl_mem image, void *ptr, const size_t *origin, const size_t *region, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt); cl_int clUnloadPlatformAMD(cl_platform_id id); cl_int clEnqueueWaitSignalAMD(cl_command_queue command_queue, cl_mem mem_object, cl_uint value, cl_uint num_events, const cl_event *event_wait_list, cl_event *event); cl_int clEnqueueWriteSignalAMD(cl_command_queue command_queue, cl_mem mem_object, cl_uint value, cl_ulong offset, cl_uint num_events, const cl_event *event_list, cl_event *event); cl_int clEnqueueMakeBuffersResidentAMD( cl_command_queue command_queue, cl_uint num_mem_objs, cl_mem *mem_objects, cl_bool blocking_make_resident, cl_bus_address_amd *bus_addresses, cl_uint num_events, const cl_event *event_list, cl_event *event); cl_int clEnqueueMigrateMemObjects(cl_command_queue command_queue, cl_uint num_mem_objects, const cl_mem *mem_objects, cl_mem_migration_flags flags, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); // CL-GL Extension: cl_khr_gl_sharing cl_int clGetGLContextInfoKHR(const cl_context_properties *properties, cl_gl_context_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_mem clCreateFromGLBuffer(cl_context context, cl_mem_flags flags, unsigned int bufobj, int *errcode_ret); cl_mem clCreateFromGLTexture(cl_context context, cl_mem_flags flags, unsigned int texture_target, int miplevel, unsigned int texture, cl_int *errcode_ret); cl_mem clCreateFromGLTexture2D(cl_context context, cl_mem_flags flags, unsigned int texture_target, int miplevel, unsigned int texture, cl_int *errcode_ret); cl_mem clCreateFromGLRenderbuffer(cl_context context, cl_mem_flags flags, unsigned int renderbuffer, cl_int *errcode_ret); cl_int clGetGLObjectInfo(cl_mem memobj, cl_gl_object_type *gl_object_type, unsigned int *gl_object_name); cl_int clGetGLTextureInfo(cl_mem memobj, cl_gl_texture_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_int clEnqueueAcquireGLObjects(cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); cl_int clEnqueueReleaseGLObjects(cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); #if defined(CL_VERSION_2_0) cl_command_queue clCreateCommandQueueWithProperties( cl_context context, 
cl_device_id device, const cl_queue_properties *properties, cl_int *errcode_ret); void *clSVMAlloc(cl_context context, cl_svm_mem_flags flags, size_t size, cl_uint alignment); void clSVMFree(cl_context context, void *svm_pointer); cl_int clEnqueueSVMMap(cl_command_queue command_queue, cl_bool blocking_map, cl_map_flags flags, void *svm_ptr, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); cl_int clEnqueueSVMUnmap(cl_command_queue command_queue, void *svm_ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); cl_int clEnqueueSVMMemFill(cl_command_queue command_queue, void *svm_ptr, const void *pattern, size_t pattern_size, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); cl_int clSetKernelArgSVMPointer(cl_kernel kernel, cl_uint arg_index, const void *arg_value); cl_mem clCreatePipe(cl_context context, cl_mem_flags flags, cl_uint packet_size, cl_uint num_packets, const cl_pipe_properties *properties, cl_int *errcode_ret); cl_int clGetPipeInfo(cl_mem pipe, cl_pipe_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); #endif cl_perfcounter_amd clCreatePerfCounterAMD(cl_device_id device, cl_perfcounter_property *properties, cl_int *errcode_ret); cl_int clEnqueueBeginPerfCounterAMD(cl_command_queue command_queue, cl_uint num_perf_counters, cl_perfcounter_amd *perf_counters, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); cl_int clEnqueueEndPerfCounterAMD(cl_command_queue command_queue, cl_uint num_perf_counters, cl_perfcounter_amd *perf_counters, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event); cl_int clGetPerfCounterInfoAMD(cl_perfcounter_amd perf_counter, cl_perfcounter_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); cl_int clReleasePerfCounterAMD(cl_perfcounter_amd perf_counter); cl_int clRetainPerfCounterAMD(cl_perfcounter_amd perf_counter); cl_int clSetDeviceClockModeAMD( cl_device_id device, cl_set_device_clock_mode_input_amd set_clock_mode_input, cl_set_device_clock_mode_output_amd *set_clock_mode_Output); private: clEnqueueWaitSignalAMD_fn clEnqueueWaitSignalAMD_ptr; clEnqueueWriteSignalAMD_fn clEnqueueWriteSignalAMD_ptr; clEnqueueMakeBuffersResidentAMD_fn clEnqueueMakeBuffersResidentAMD_ptr; // Unload the platform clUnloadPlatformAMD_fn clUnloadPlatformAMD_ptr; // CL-GL Extension: cl_khr_gl_sharing clGetGLContextInfoKHR_fn clGetGLContextInfoKHR_ptr; clCreateFromGLBuffer_fn clCreateFromGLBuffer_ptr; clCreateFromGLTexture_fn clCreateFromGLTexture_ptr; clCreateFromGLTexture2D_fn clCreateFromGLTexture2D_ptr; clCreateFromGLRenderbuffer_fn clCreateFromGLRenderbuffer_ptr; clGetGLObjectInfo_fn clGetGLObjectInfo_ptr; clGetGLTextureInfo_fn clGetGLTextureInfo_ptr; clEnqueueAcquireGLObjects_fn clEnqueueAcquireGLObjects_ptr; clEnqueueReleaseGLObjects_fn clEnqueueReleaseGLObjects_ptr; // Performance counters clCreatePerfCounterAMD_fn clCreatePerfCounterAMD_ptr; clEnqueueBeginPerfCounterAMD_fn clEnqueueBeginPerfCounterAMD_ptr; clEnqueueEndPerfCounterAMD_fn clEnqueueEndPerfCounterAMD_ptr; clGetPerfCounterInfoAMD_fn clGetPerfCounterInfoAMD_ptr; clReleasePerfCounterAMD_fn clReleasePerfCounterAMD_ptr; clRetainPerfCounterAMD_fn clRetainPerfCounterAMD_ptr; // Set clockMode clSetDeviceClockModeAMD_fn clSetDeviceClockModeAMD_ptr; }; #endif 
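The OCLWrapper class above is a thin dispatch layer: the core OpenCL entry points are forwarded directly, while the AMD-specific and GL-interop entry points are resolved at runtime through the *_fn pointers held in its private section. The following is a minimal illustrative sketch of how a caller might drive the wrapper, assuming the method definitions from OCLWrapper.cpp are linked in and a working OpenCL ICD is installed; listGpuDevices is a hypothetical helper written for this example and is not part of the test suite.

// Illustrative sketch only (not part of the archive): enumerate platforms
// through the wrapper and print the name of every GPU device found.
#include <cstdio>
#include <vector>

#include "OCLWrapper.h"

static void listGpuDevices(OCLWrapper &wrapper) {  // hypothetical helper
  cl_uint numPlatforms = 0;
  if (wrapper.clGetPlatformIDs(0, NULL, &numPlatforms) != CL_SUCCESS ||
      numPlatforms == 0) {
    return;  // no OpenCL platform available
  }
  std::vector<cl_platform_id> platforms(numPlatforms);
  wrapper.clGetPlatformIDs(numPlatforms, platforms.data(), NULL);

  for (cl_uint p = 0; p < numPlatforms; ++p) {
    cl_uint numDevices = 0;
    if (wrapper.clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 0, NULL,
                               &numDevices) != CL_SUCCESS ||
        numDevices == 0) {
      continue;  // this platform exposes no GPU devices
    }
    std::vector<cl_device_id> devices(numDevices);
    wrapper.clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, numDevices,
                           devices.data(), NULL);
    for (cl_uint d = 0; d < numDevices; ++d) {
      char name[256] = {0};
      wrapper.clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name,
                              NULL);
      printf("platform %u, device %u: %s\n", p, d, name);
    }
  }
}

This is the same enumeration pattern that OCLTestImp::open() uses further down in the archive, just reduced to its essentials.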
clr-rocm-5.7.1/opencl/tests/ocltst/module/000077500000000000000000000000001450307266000204645ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tests/ocltst/module/common/000077500000000000000000000000001450307266000217545ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tests/ocltst/module/common/BaseTestImp.cpp000066400000000000000000000126521450307266000246460ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#include "BaseTestImp.h"

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/////////////////////////////////////////////////////////////////////////////
static unsigned int crcinit(unsigned int crc);
static int initializeSeed(void);
/////////////////////////////////////////////////////////////////////////////

BaseTestImp::BaseTestImp()
    : _numSubTests(0), _openTest(0), _deviceName(NULL), _architecture(0) {
  _cpu = false;
  unsigned int i;
  for (i = 0; i < 256; i++) {
    _crctab[i] = crcinit(i << 24);
  }
  _crcword = ~0;
  _deviceId = 0;
  _platformIndex = 0;
  _perfInfo = 0.0f;
#ifdef __linux__
  _useThreads = 0;  // disable threads on linux
#else
  _useThreads = 1;  // if available on platform
#endif
  clearError();
}

void BaseTestImp::checkComplib(unsigned int test, const char *deviceName,
                               unsigned int architecture) {
  BaseTestImp::open();
  devices_ = 0;
  deviceCount_ = 0;
  context_ = 0;
  program_ = 0;
  kernel_ = 0;
  type_ = CL_DEVICE_TYPE_GPU;

  cl_uint numPlatforms = 0;
  error_ = clGetPlatformIDs(0, NULL, &numPlatforms);
  CHECK_RESULT((error_ != CL_SUCCESS), "clGetPlatformIDs failed");
  CHECK_RESULT((numPlatforms == 0), "No platform found");

  cl_platform_id *platforms = new cl_platform_id[numPlatforms];
  error_ = clGetPlatformIDs(numPlatforms, platforms, NULL);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");

  cl_platform_id platform = 0;
#if 0
  for(unsigned int i = 0; i < numPlatforms; ++i) {
    char buff[200];
    error_ = clGetPlatformInfo(platforms[i],CL_PLATFORM_VENDOR,
                               sizeof(buff), buff, NULL);
    CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed");
    if(strcmp(buff, "Advanced Micro Devices, Inc.") == 0) {
      platform = platforms[i];
      break;
    }
  }
#endif
  platform = platforms[_platformIndex];
  delete[] platforms;
  CHECK_RESULT((platform == 0), "AMD Platform not found");

  error_ = clGetDeviceIDs(platform, type_, 0, NULL, &deviceCount_);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs() failed");
  devices_ = new cl_device_id[deviceCount_];
  error_ = clGetDeviceIDs(platform, type_, deviceCount_, devices_, NULL);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs() failed");

  char
device_string[200]; clGetDeviceInfo(devices_[_deviceId], CL_DRIVER_VERSION, sizeof(device_string), &device_string, NULL); if (strstr(device_string, "LC")) { printf("Skipping test since it does not run with LC\n"); failed_ = true; return; } return; } BaseTestImp::~BaseTestImp() {} void BaseTestImp::open() { _crcword = 0; clearError(); } void BaseTestImp::open(unsigned int test, const char *deviceName, unsigned int architecture) { open(); } unsigned int BaseTestImp::close() { return _crcword; } unsigned int BaseTestImp::getThreadUsage(void) { return _useThreads; } int BaseTestImp::getNumSubTests(void) { return _numSubTests; } void BaseTestImp::setDeviceName(const char *name) { _deviceName = name; } const char *BaseTestImp::getDeviceName() { return _deviceName; } float BaseTestImp::getPerfInfo(void) { return _perfInfo; } void BaseTestImp::clearPerfInfo(void) { _perfInfo = 0.0; } void BaseTestImp::setDeviceId(unsigned int deviceId) { _deviceId = deviceId; } void BaseTestImp::setIterationCount(int cnt) { _iterationCnt = cnt; } unsigned int BaseTestImp::getDeviceId() { return _deviceId; } void BaseTestImp::setPlatformIndex(unsigned int platformIndex) { _platformIndex = platformIndex; } unsigned int BaseTestImp::getPlatformIndex() { return _platformIndex; } void BaseTestImp::setErrorMsg(const char *error) { _errorFlag = true; _errorMsg.assign((const char *)error); } const char *BaseTestImp::getErrorMsg() { return _errorMsg.c_str(); } bool BaseTestImp::hasErrorOccured() { return _errorFlag; } void BaseTestImp::clearError() { _errorFlag = false; _errorMsg.clear(); } ///////////////////////////////////////////////////////////////////////////// ///////////////////////////////////////////////////////////////////////////// // // Same CRC32 as used by ogtst // static const unsigned int CRCMASK = 0x04c11db7; static unsigned int crcinit(unsigned int crc) { int i; unsigned int ans = crc; for (i = 0; i < 8; i++) { if (ans & 0x80000000) { ans = (ans << 1) ^ CRCMASK; } else { ans <<= 1; } } return (ans); } clr-rocm-5.7.1/opencl/tests/ocltst/module/common/CMakeLists.txt000066400000000000000000000017421450307266000245200ustar00rootroot00000000000000set(COMMON_SOURCES ${OCLTST_DIR}/module/common/BaseTestImp.cpp ${OCLTST_DIR}/module/common/OCLTestImp.cpp ${OCLTST_DIR}/module/common/OCLTestListImp.cpp ${OCLTST_DIR}/module/common/OCLTestUtils.cpp ${OCLTST_DIR}/module/common/OCLThread.cpp ${OCLTST_DIR}/module/common/OCLWrapper.cpp ${OCLTST_DIR}/module/common/Timer.cpp) add_library(Common OBJECT ${COMMON_SOURCES}) set_target_properties(Common PROPERTIES CXX_STANDARD 14 CXX_STANDARD_REQUIRED ON CXX_EXTENSIONS OFF POSITION_INDEPENDENT_CODE ON) target_compile_definitions(Common PUBLIC CL_TARGET_OPENCL_VERSION=220) if(EMU_ENV) target_compile_definitions(Common PUBLIC EMU_ENV=1) endif() target_include_directories(Common PUBLIC ${OPENCL_ICD_LOADER_HEADERS_DIR} ${OCLTST_DIR}/include ${OCLTST_DIR}/module/common ${OCLTST_DIR}/module/include ${PROJECT_SOURCE_DIR}/amdocl) #TODO remove cl_profile_amd.h dependency clr-rocm-5.7.1/opencl/tests/ocltst/module/common/OCLGLCommon.cpp000066400000000000000000000124201450307266000244700ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE. */

#include "OCLGLCommon.h"

#include <math.h>
#include <stdio.h>
#include <string.h>

void OCLGLCommon::open(unsigned int test, char *units, double &conversion,
                       unsigned int deviceId) {
  // OpenCL Initialization
  OCLTestImp::open(test, units, conversion, deviceId);
  CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test (%d)", error_);

  char name[1024] = {0};
  size_t size = 0;

  if (deviceId >= deviceCount_) {
    _errorFlag = true;
    return;
  }

  // Check that the device supports the CL/GL interop extension
  _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_EXTENSIONS, 1024,
                            name, &size);
  if (!strstr(name, "cl_khr_gl_sharing")) {
    printf("KHR GL sharing extension is required for this test!\n");
    _errorFlag = true;
    return;
  }

  // OpenGL Initialization
  bool retVal = initializeGLContext(hGL_);
  CHECK_RESULT((retVal == false), "Error opening test (%d)", error_);

  createCLContextFromGLContext(hGL_);
}

bool OCLGLCommon::IsGLEnabled(unsigned int test, char *units,
                              double &conversion, unsigned int deviceId) {
  OCLTestImp::open(test, units, conversion, deviceId);
  bool bResult = initializeGLContext(hGL_);
  if (bResult) {
    deleteGLContext(hGL_);
  }
  OCLTestImp::close();
  return bResult;
}

void OCLGLCommon::gluPerspective(double fovy, double aspect, double zNear,
                                 double zFar) {
  double xmin, xmax, ymin, ymax;
  ymax = zNear * tan(fovy * 3.14159265358979 / 360.0);
  ymin = -ymax;
  xmin = ymin * aspect;
  xmax = ymax * aspect;
  glFrustum(xmin, xmax, ymin, ymax, zNear, zFar);
}

unsigned int OCLGLCommon::close(void) {
  makeCurrent(hGL_);
  unsigned int retVal = OCLTestImp::close();
  deleteGLContext(hGL_);
  return retVal;
}

void OCLGLCommon::dumpBuffer(float *pBuffer, const char fileName[],
                             unsigned int dimSize) {
  if (pBuffer) {
    FILE *f = fopen(fileName, "w");
    if (NULL != f) {
      unsigned int i, j;
      for (i = 0; i < dimSize; i++) {
        for (j = 0; j < dimSize; j++) {
          fprintf(f, "%e,\t", pBuffer[i * (dimSize) + j]);
        }
        fprintf(f, "\n");
      }
      fclose(f);
    }
  }
}

bool OCLGLCommon::createGLFragmentProgramFromSource(const char *source,
                                                    GLuint &shader,
                                                    GLuint &program) {
  shader = glCreateShader(GL_FRAGMENT_SHADER);
  glShaderSource(shader, 1, &source, NULL);
  glCompileShader(shader);
  printShaderInfoLog(shader);

  program = glCreateProgram();
  glAttachShader(program, shader);
  glLinkProgram(program);
  printProgramInfoLog(program);

  return program != 0;
}

int OCLGLCommon::printOglError(char *file, int line) {
  //
  // Returns 1 if an OpenGL error occurred, 0 otherwise.
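  // (Normally reached through the printOpenGLError() macro declared in
  // OCLGLCommon.h, which supplies __FILE__ and __LINE__.)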
// GLenum glErr; int retCode = 0; glErr = glGetError(); if (glErr != GL_NO_ERROR) { printf("glError in file %s @ line %d: %d\n", file, line, glErr); retCode = 1; } return retCode; } // // Print out the information log for a shader object // void OCLGLCommon::printShaderInfoLog(GLuint shader) { int infologLength = 0; int charsWritten = 0; GLchar *infoLog; glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &infologLength); if (infologLength > 0) { infoLog = (GLchar *)malloc(infologLength); if (infoLog == NULL) { printf("ERROR: Could not allocate InfoLog buffer\n"); return; } glGetShaderInfoLog(shader, infologLength, &charsWritten, infoLog); printf("Shader InfoLog:\n%s\n\n", infoLog); free(infoLog); } } void OCLGLCommon::printProgramInfoLog(GLuint program) { int infologLength = 0; int charsWritten = 0; GLchar *infoLog; // printOpenGLError(); // Check for OpenGL errors glGetProgramiv(program, GL_INFO_LOG_LENGTH, &infologLength); // printOpenGLError(); // Check for OpenGL errors if (infologLength > 0) { infoLog = (GLchar *)malloc(infologLength); if (infoLog == NULL) { printf("ERROR: Could not allocate InfoLog buffer\n"); exit(1); } glGetProgramInfoLog(program, infologLength, &charsWritten, infoLog); printf("Program InfoLog:\n%s\n\n", infoLog); free(infoLog); } // printOpenGLError(); // Check for OpenGL errors } clr-rocm-5.7.1/opencl/tests/ocltst/module/common/OCLGLCommon.h000066400000000000000000000061171450307266000241430ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_GL_COMMON_H_ #define _OCL_GL_COMMON_H_ #include #include #include #include #include #include "OCLTestImp.h" typedef struct OCLGLHandle_* OCLGLHandle; #define printOpenGLError() OCLGLCommon::printOglError(__FILE__, __LINE__) class OCLGLCommon : public OCLTestImp { public: ///////////////////////////////////////// // private initialization and clean-up // ///////////////////////////////////////// OCLGLCommon(); virtual ~OCLGLCommon(); /////////////////////// // virtual interface // /////////////////////// virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual unsigned int close(void); static void gluPerspective(double fovy, double aspect, double zNear, double zFar); static void dumpBuffer(float* pBuffer, const char fileName[], unsigned int dimSize); static int printOglError(char* file, int line); static bool createGLFragmentProgramFromSource(const char* source, GLuint& shader, GLuint& program); static void printShaderInfoLog(GLuint shader); static void printProgramInfoLog(GLuint program); protected: const OCLGLHandle getGLHandle() { return hGL_; } void makeCurrent(const OCLGLHandle hGL); void getCLContextPropertiesFromGLContext(const OCLGLHandle hGL, cl_context_properties properties[7]); bool createGLContext(OCLGLHandle& hGL); void destroyGLContext(OCLGLHandle& hGL); bool IsGLEnabled(unsigned int test, char* units, double& conversion, unsigned int deviceId); private: bool initializeGLContext(OCLGLHandle& hGL); void deleteGLContext(OCLGLHandle& hGL); bool checkAssociationDeviceWithGLContext(OCLGLHandle& hGL); void createCLContextFromGLContext(OCLGLHandle& hGL); OCLGLHandle hGL_; }; #endif // _OCL_GL_COMMON_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/common/OCLGLCommonLinux.cpp000066400000000000000000000173741450307266000255250ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLGLCommon.h" struct OCLGLHandle_ { static Display* display; static XVisualInfo* vInfo; static int referenceCount; GLXContext context; Window window; Colormap cmap; }; Display* OCLGLHandle_::display = NULL; XVisualInfo* OCLGLHandle_::vInfo = NULL; int OCLGLHandle_::referenceCount = 0; OCLGLCommon::OCLGLCommon() { hGL_ = new OCLGLHandle_; hGL_->context = NULL; hGL_->window = 0; hGL_->cmap = 0; } OCLGLCommon::~OCLGLCommon() { destroyGLContext(hGL_); } void OCLGLCommon::destroyGLContext(OCLGLHandle& hGL) { deleteGLContext(hGL); delete hGL; hGL = NULL; } void OCLGLCommon::deleteGLContext(OCLGLHandle& hGL) { if (hGL->display != NULL) { glXMakeCurrent(hGL->display, None, NULL); if (hGL->cmap) { XFreeColormap(hGL->display, hGL->cmap); hGL->cmap = 0; } if (hGL->window) { XDestroyWindow(hGL->display, hGL->window); hGL->window = 0; } if (hGL->context) { glXDestroyContext(hGL->display, hGL->context); hGL->context = NULL; } hGL->referenceCount--; if (hGL->referenceCount == 0) { XCloseDisplay(hGL->display); hGL->display = NULL; XFree(hGL->vInfo); hGL->vInfo = NULL; } } } bool OCLGLCommon::createGLContext(OCLGLHandle& hGL) { hGL = new OCLGLHandle_; return initializeGLContext(hGL); } bool OCLGLCommon::initializeGLContext(OCLGLHandle& hGL) { if (hGL->display == NULL) { hGL->display = XOpenDisplay(NULL); if (hGL->display == NULL) { printf("XOpenDisplay() failed\n"); return false; } } if (hGL->vInfo == NULL) { int dblBuf[] = {GLX_RGBA, GLX_RED_SIZE, 1, GLX_GREEN_SIZE, 1, GLX_BLUE_SIZE, 1, GLX_DEPTH_SIZE, 12, GLX_DOUBLEBUFFER, None}; hGL->vInfo = glXChooseVisual(hGL->display, DefaultScreen(hGL->display), dblBuf); if (hGL->vInfo == NULL) { printf("glXChooseVisual() failed\n"); return false; } } hGL->referenceCount++; hGL->context = glXCreateContext(hGL->display, hGL->vInfo, None, True); if (hGL->context == NULL) { printf("glXCreateContext() failed\n"); return false; } XSetWindowAttributes swa = {0}; hGL->cmap = XCreateColormap(hGL->display, RootWindow(hGL->display, hGL->vInfo->screen), hGL->vInfo->visual, AllocNone); swa.colormap = hGL->cmap; hGL->window = XCreateWindow( hGL->display, RootWindow(hGL->display, hGL->vInfo->screen), 0, 0, 640, 480, 0, hGL->vInfo->depth, InputOutput, hGL->vInfo->visual, CWBorderPixel | CWColormap | CWEventMask, &swa); Bool glErr = glXMakeCurrent(hGL->display, hGL->window, hGL->context); if (False == glErr) { return false; } if (!checkAssociationDeviceWithGLContext(hGL)) { deleteGLContext(hGL); return false; } return true; } bool OCLGLCommon::checkAssociationDeviceWithGLContext(OCLGLHandle& hGL) { bool ret = false; size_t devicesSize = 0; cl_context_properties properties[] = {CL_CONTEXT_PLATFORM, (cl_context_properties)platform_, CL_GL_CONTEXT_KHR, (cl_context_properties)hGL->context, CL_GLX_DISPLAY_KHR, (cl_context_properties)hGL->display, 0}; error_ = _wrapper->clGetGLContextInfoKHR( properties, CL_DEVICES_FOR_GL_CONTEXT_KHR, 0, NULL, &devicesSize); if (error_ != CL_SUCCESS) { printf("clGetGLContextInfoKHR failed (%d)\n", error_); return false; } cl_uint numDevices = (cl_uint)devicesSize / sizeof(cl_device_id); cl_device_id* interopDevices = (cl_device_id*)malloc(devicesSize); error_ = _wrapper->clGetGLContextInfoKHR(properties, CL_DEVICES_FOR_GL_CONTEXT_KHR, devicesSize, interopDevices, NULL); if (error_ != CL_SUCCESS) { printf("clGetGLContextInfoKHR failed (%d)\n", error_); free(interopDevices); return false; } // Check that current device can be associated with OpenGL context for (unsigned int i = 0; i < numDevices; i++) { if (interopDevices[i] == 
devices_[_deviceId]) { ret = true; break; } } free(interopDevices); return ret; } void OCLGLCommon::createCLContextFromGLContext(OCLGLHandle& hGL) { cl_context_properties properties[] = {CL_CONTEXT_PLATFORM, (cl_context_properties)platform_, CL_GL_CONTEXT_KHR, (cl_context_properties)hGL->context, CL_GLX_DISPLAY_KHR, (cl_context_properties)hGL->display, 0}; // Release current command queue if (cmdQueues_[_deviceId]) { error_ = _wrapper->clReleaseCommandQueue(cmdQueues_[_deviceId]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseCommandQueue() failed"); } // Release current context if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseContext() failed"); } // Create new CL context from GL context context_ = clCreateContext(properties, 1, &devices_[_deviceId], NULL, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateContext() failed (%d)", error_); // Create command queue for new context cmdQueues_[_deviceId] = _wrapper->clCreateCommandQueue(context_, devices_[_deviceId], 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueue() failed (%d)", error_); // GLEW versions 1.13.0 and earlier do not fetch all GL function pointers // without glewExperimental set. glewExperimental = GL_TRUE; GLenum glErr = glewInit(); CHECK_RESULT((glErr != GLEW_OK), "glewInit() failed: %s", glewGetErrorString(glErr)); } void OCLGLCommon::makeCurrent(OCLGLHandle hGL) { if (hGL == NULL) { if (hGL_ != NULL) { glXMakeCurrent(hGL_->display, None, NULL); } } else { bool ret = glXMakeCurrent(hGL->display, hGL->window, hGL->context); assert(ret && "glXMakeCurrent failed!"); } } void OCLGLCommon::getCLContextPropertiesFromGLContext( const OCLGLHandle hGL, cl_context_properties properties[7]) { if (!properties) return; properties[0] = CL_CONTEXT_PLATFORM; properties[1] = (cl_context_properties)platform_; properties[2] = CL_GL_CONTEXT_KHR; properties[3] = (cl_context_properties)hGL->context; properties[4] = CL_GLX_DISPLAY_KHR; properties[5] = (cl_context_properties)hGL->display; properties[6] = 0; } clr-rocm-5.7.1/opencl/tests/ocltst/module/common/OCLGLCommonWindows.cpp000066400000000000000000000164501450307266000260520ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLGLCommon.h" struct OCLGLHandle_ { HDC hdc; HGLRC hglrc; }; OCLGLCommon::OCLGLCommon() { hGL_ = new OCLGLHandle_; hGL_->hdc = NULL; hGL_->hglrc = NULL; } OCLGLCommon::~OCLGLCommon() { destroyGLContext(hGL_); } void OCLGLCommon::destroyGLContext(OCLGLHandle& hGL) { deleteGLContext(hGL); delete hGL; hGL = NULL; } void OCLGLCommon::deleteGLContext(OCLGLHandle& hGL) { wglMakeCurrent(NULL, NULL); if (hGL->hglrc) { wglDeleteContext(hGL->hglrc); hGL->hglrc = NULL; } if (hGL->hdc) { DeleteDC(hGL->hdc); hGL->hdc = NULL; } } bool OCLGLCommon::createGLContext(OCLGLHandle& hGL) { hGL = new OCLGLHandle_; return initializeGLContext(hGL); } bool OCLGLCommon::initializeGLContext(OCLGLHandle& hGL) { BOOL glErr = FALSE; DISPLAY_DEVICE dispDevice; DWORD deviceNum; int pfmt; PIXELFORMATDESCRIPTOR pfd; pfd.nSize = sizeof(PIXELFORMATDESCRIPTOR); pfd.nVersion = 1; pfd.dwFlags = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER; pfd.iPixelType = PFD_TYPE_RGBA; pfd.cColorBits = 24; pfd.cRedBits = 8; pfd.cRedShift = 0; pfd.cGreenBits = 8; pfd.cGreenShift = 0; pfd.cBlueBits = 8; pfd.cBlueShift = 0; pfd.cAlphaBits = 8; pfd.cAlphaShift = 0; pfd.cAccumBits = 0; pfd.cAccumRedBits = 0; pfd.cAccumGreenBits = 0; pfd.cAccumBlueBits = 0; pfd.cAccumAlphaBits = 0; pfd.cDepthBits = 24; pfd.cStencilBits = 8; pfd.cAuxBuffers = 0; pfd.iLayerType = PFD_MAIN_PLANE; pfd.bReserved = 0; pfd.dwLayerMask = 0; pfd.dwVisibleMask = 0; pfd.dwDamageMask = 0; dispDevice.cb = sizeof(DISPLAY_DEVICE); for (deviceNum = 0; EnumDisplayDevices(NULL, deviceNum, &dispDevice, 0); deviceNum++) { if (dispDevice.StateFlags & DISPLAY_DEVICE_MIRRORING_DRIVER) { continue; } hGL->hdc = CreateDC(NULL, dispDevice.DeviceName, NULL, NULL); if (!hGL->hdc) { continue; } pfmt = ChoosePixelFormat(hGL->hdc, &pfd); if (pfmt == 0) { printf("Failed choosing the requested PixelFormat.\n"); return false; } glErr = SetPixelFormat(hGL->hdc, pfmt, &pfd); if (glErr == FALSE) { printf("Failed to set the requested PixelFormat.\n"); return false; } hGL->hglrc = wglCreateContext(hGL->hdc); if (NULL == hGL->hglrc) { printf("wglCreateContext() failed\n"); return false; } glErr = wglMakeCurrent(hGL->hdc, hGL->hglrc); if (FALSE == glErr) { printf("wglMakeCurrent() failed\n"); return false; } if (!checkAssociationDeviceWithGLContext(hGL)) { deleteGLContext(hGL); return false; } return true; } // for (deviceNum = 0; EnumDisplayDevices(NULL, deviceNum, &dispDevice, // 0); deviceNum++) { return false; } bool OCLGLCommon::checkAssociationDeviceWithGLContext(OCLGLHandle& hGL) { bool ret = false; size_t devicesSize = 0; cl_context_properties properties[] = {CL_CONTEXT_PLATFORM, (cl_context_properties)platform_, CL_GL_CONTEXT_KHR, (cl_context_properties)hGL->hglrc, CL_WGL_HDC_KHR, (cl_context_properties)hGL->hdc, 0}; error_ = _wrapper->clGetGLContextInfoKHR( properties, CL_DEVICES_FOR_GL_CONTEXT_KHR, 0, NULL, &devicesSize); if (error_ != CL_SUCCESS) { printf("clGetGLContextInfoKHR failed (%d)\n", error_); return false; } cl_uint numDevices = (cl_uint)devicesSize / sizeof(cl_device_id); cl_device_id* interopDevices = (cl_device_id*)malloc(devicesSize); error_ = _wrapper->clGetGLContextInfoKHR(properties, CL_DEVICES_FOR_GL_CONTEXT_KHR, devicesSize, interopDevices, NULL); if (error_ != CL_SUCCESS) { printf("clGetGLContextInfoKHR failed (%d)\n", error_); free(interopDevices); return false; } // Check that current device can be associated with OpenGL context for (unsigned int i = 0; i < numDevices; i++) { if (interopDevices[i] == devices_[_deviceId]) { ret = true; break; } } 
free(interopDevices); return ret; } void OCLGLCommon::createCLContextFromGLContext(OCLGLHandle& hGL) { cl_context_properties properties[] = {CL_CONTEXT_PLATFORM, (cl_context_properties)platform_, CL_GL_CONTEXT_KHR, (cl_context_properties)hGL->hglrc, CL_WGL_HDC_KHR, (cl_context_properties)hGL->hdc, 0}; // Release current command queue if (cmdQueues_[_deviceId]) { error_ = _wrapper->clReleaseCommandQueue(cmdQueues_[_deviceId]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseCommandQueue() failed"); } // Release current context if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseContext() failed"); } // Create new CL context from GL context context_ = clCreateContext(properties, 1, &devices_[_deviceId], NULL, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateContext() failed (%d)", error_); // Create command queue for new context cmdQueues_[_deviceId] = _wrapper->clCreateCommandQueue(context_, devices_[_deviceId], 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueue() failed (%d)", error_); GLenum glErr = glewInit(); CHECK_RESULT((glErr != GLEW_OK), "glewInit() failed"); } void OCLGLCommon::makeCurrent(OCLGLHandle hGL) { if (hGL == NULL) { wglMakeCurrent(NULL, NULL); } else { wglMakeCurrent(hGL->hdc, hGL->hglrc); } } void OCLGLCommon::getCLContextPropertiesFromGLContext( const OCLGLHandle hGL, cl_context_properties properties[7]) { if (!properties) return; properties[0] = CL_CONTEXT_PLATFORM; properties[1] = (cl_context_properties)platform_; properties[2] = CL_GL_CONTEXT_KHR; properties[3] = (cl_context_properties)hGL->hglrc; properties[4] = CL_WGL_HDC_KHR; properties[5] = (cl_context_properties)hGL->hdc; properties[6] = 0; } clr-rocm-5.7.1/opencl/tests/ocltst/module/common/OCLTestImp.cpp000066400000000000000000000173221450307266000244100ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLTestImp.h" #include #include #include #include #include ///////////////////////////////////////////////////////////////////////////// static unsigned int crcinit(unsigned int crc); static int initializeSeed(void); ///////////////////////////////////////////////////////////////////////////// OCLutil::Lock OCLTestImp::openDeviceLock; OCLutil::Lock OCLTestImp::compileLock; OCLTestImp::OCLTestImp() : _wrapper(0), _seed(0), error_(0), type_(0), deviceCount_(0), devices_(0), platform_(0), context_(0), program_(0), kernel_(0) { unsigned int i; for (i = 0; i < 256; i++) { _crctab[i] = crcinit(i << 24); } _perfInfo = 0; _wrapper = 0; _iterationCnt = 0; _seed = initializeSeed(); _errorMsg = ""; _errorFlag = false; type_ = CL_DEVICE_TYPE_GPU; } OCLTestImp::~OCLTestImp() {} void OCLTestImp::useCPU() { type_ = CL_DEVICE_TYPE_CPU; } void OCLTestImp::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { devices_ = 0; context_ = 0; program_ = 0; kernel_ = 0; deviceCount_ = 0; open(test, units, conversion, deviceId, getPlatformIndex()); } void OCLTestImp::open(unsigned int test, char* units, double& conversion, unsigned int deviceId, unsigned int platformIndex) { BaseTestImp::open(); devices_ = 0; deviceCount_ = 0; context_ = 0; program_ = 0; kernel_ = 0; _deviceId = deviceId; _platformIndex = platformIndex; cl_uint numPlatforms = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT((error_ != CL_SUCCESS), "clGetPlatformIDs failed"); CHECK_RESULT((numPlatforms == 0), "No platform found"); cl_platform_id* platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); cl_platform_id platform = 0; #if 0 for(unsigned int i = 0; i < numPlatforms; ++i) { char buff[200]; error_ = _wrapper->clGetPlatformInfo(platforms[i],CL_PLATFORM_VENDOR, sizeof(buff), buff, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); if(strcmp(buff, "Advanced Micro Devices, Inc.") == 0) { platform = platforms[i]; break; } } #endif platform = platforms[_platformIndex]; delete[] platforms; CHECK_RESULT((platform == 0), "AMD Platform not found"); error_ = _wrapper->clGetDeviceIDs(platform, type_, 0, NULL, &deviceCount_); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs() failed"); devices_ = new cl_device_id[deviceCount_]; error_ = _wrapper->clGetDeviceIDs(platform, type_, deviceCount_, devices_, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs() failed"); cl_context_properties props[3] = {CL_CONTEXT_PLATFORM, (cl_context_properties)platform, 0}; context_ = _wrapper->clCreateContext(props, deviceCount_, devices_, NULL, 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateContext failed"); cl_command_queue cmdQueue; for (unsigned int i = 0; i < deviceCount_; ++i) { #ifndef CL_VERSION_2_0 cmdQueue = _wrapper->clCreateCommandQueue( context_, devices_[i], CL_QUEUE_PROFILING_ENABLE, &error_); #else cl_queue_properties prop[] = {CL_QUEUE_PROPERTIES, CL_QUEUE_PROFILING_ENABLE, 0}; cmdQueue = _wrapper->clCreateCommandQueueWithProperties( context_, devices_[i], prop, &error_); #endif CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueue() failed"); cmdQueues_.push_back(cmdQueue); } platform_ = platform; } unsigned int OCLTestImp::close() { for (unsigned int i = 0; i < buffers().size(); ++i) { error_ = _wrapper->clReleaseMemObject(buffers()[i]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseMemObject() failed"); } 
buffers_.clear(); if (kernel_ != 0) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseKernel() failed"); } if (program_ != 0) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseProgram() failed"); } for (unsigned int i = 0; i < cmdQueues_.size(); ++i) { error_ = _wrapper->clReleaseCommandQueue(cmdQueues_[i]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseCommandQueue() failed"); } cmdQueues_.clear(); if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseContext() failed"); } if (devices_) { delete[] devices_; } return BaseTestImp::close(); } int OCLTestImp::genBitRand(int n) { int rslt; if (n <= 0 || n > 32) { assert(0); rslt = 0; } else if (n < 32) { _seed = _seed * 1103515245 + 12345; /* * return the most-significant n bits; they are the random ones (see * Knuth, Vol 2) */ rslt = (_seed & 0x7fffffff) >> (31 - n); } else { rslt = (genBitRand(16) << 16) | genBitRand(16); } return rslt; } int OCLTestImp::genIntRand(int a, int b) { int r; int sign = 1; int mySmall; int delta; int bits = 0; int rslt; if (a > b) { mySmall = b; delta = a - b; } else { mySmall = a; delta = b - a; } if (delta == 0) { rslt = a; return (rslt); } else if (delta < 0) { sign = -1; delta = -delta; } delta &= 0x7fffffff; for (r = delta; r > 0; r >>= 1) { bits++; } do { r = genBitRand(bits); } while (r > delta); rslt = mySmall + r * sign; return (rslt); } void OCLTestImp::setOCLWrapper(OCLWrapper* wrapper) { _wrapper = wrapper; } ///////////////////////////////////////////////////////////////////////////// #ifdef _WIN32 #include static int initializeSeed(void) { __int64 val; QueryPerformanceCounter((LARGE_INTEGER*)&val); return (int)val; } #endif // _WIN32 ///////////////////////////////////////////////////////////////////////////// #ifdef __linux__ #include static int initializeSeed(void) { struct timeval t; gettimeofday(&t, 0); return (int)t.tv_usec; } #endif // __linux__ ///////////////////////////////////////////////////////////////////////////// // // Same CRC32 as used by ogtst // static const unsigned int CRCMASK = 0x04c11db7; static unsigned int crcinit(unsigned int crc) { int i; unsigned int ans = crc; for (i = 0; i < 8; i++) { if (ans & 0x80000000) { ans = (ans << 1) ^ CRCMASK; } else { ans <<= 1; } } return (ans); } clr-rocm-5.7.1/opencl/tests/ocltst/module/common/OCLTestListImp.cpp000066400000000000000000000044311450307266000252410ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLTestListImp.h" #include #include "OCLTest.h" // // OCLTestList_TestCount - retrieve the number of tests in the testing module // unsigned int OCL_CALLCONV OCLTestList_TestCount(void) { return TestListCount; } // // OCLTestList_TestLibVersion - retrieve the version of the test lib in the testing // module // unsigned int OCL_CALLCONV OCLTestList_TestLibVersion(void) { return TestLibVersion; } // // OCLTestList_TestLibName - retrieve the name of the test library // const char* OCL_CALLCONV OCLTestList_TestLibName(void) { return TestLibName; } // // OCLTestList_TestName - retrieve the name of the indexed test in the module // const char* OCL_CALLCONV OCLTestList_TestName(unsigned int testNum) { if (testNum >= OCLTestList_TestCount()) { return NULL; } return TestList[testNum].name; } // // OCLTestList_CreateTest - create a test by index // OCLTest* OCL_CALLCONV OCLTestList_CreateTest(unsigned int testNum) { if (testNum >= OCLTestList_TestCount()) { return NULL; } return reinterpret_cast<OCLTest*>((*TestList[testNum].create)()); } // // OCLTestList_DestroyTest - destroy a test object // void OCL_CALLCONV OCLTestList_DestroyTest(OCLTest* test) { delete test; } clr-rocm-5.7.1/opencl/tests/ocltst/module/common/OCLTestUtils.cpp000066400000000000000000000032501450307266000247560ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLTestUtils.h" #include <fstream> #include <iostream> bool loadFile(const char* filename, std::string& s) { size_t size; char* str; std::fstream f(filename, std::fstream::in | std::fstream::binary); if (f.is_open()) { size_t fileSize; f.seekg(0, std::fstream::end); size = fileSize = (size_t)f.tellg(); f.seekg(0, std::fstream::beg); str = new char[size + 1]; f.read(str, fileSize); f.close(); str[size] = '\0'; s = str; delete[] str; return true; } std::cerr << "Error: failed to open file: " << filename << '\n'; return false; } clr-rocm-5.7.1/opencl/tests/ocltst/module/common/OCLThread.cpp000066400000000000000000000122351450307266000242300ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ //! //! \file OCLThread.cpp //! #include #include #include "OCL/Thread.h" #ifdef _WIN32 #include #endif //! pack the function pointer and data inside this struct typedef struct __argsToThreadFunc { oclThreadFunc func; void *data; } argsToThreadFunc; #ifdef _WIN32 //! Windows thread callback - invokes the callback set by //! the application in OCLThread constructor unsigned _stdcall win32ThreadFunc(void *args) { argsToThreadFunc *ptr = (argsToThreadFunc *)args; OCLutil::Thread *obj = (OCLutil::Thread *)ptr->data; ptr->func(obj->getData()); delete ptr; return 0; } #endif //////////////////////////////////////////////////////////////////// //! //! Constructor for OCLLock //! OCLutil::Lock::Lock() { #ifdef _WIN32 InitializeCriticalSection(&_cs); #else pthread_mutex_init(&_lock, NULL); #endif } //////////////////////////////////////////////////////////////////// //! //! Destructor for OCLLock //! OCLutil::Lock::~Lock() { #ifdef _WIN32 DeleteCriticalSection(&_cs); #else pthread_mutex_destroy(&_lock); #endif } ////////////////////////////////////////////////////////////// //! //! Try to acquire the lock, wait for the lock if unavailable //! else hold the lock and enter the protected area //! void OCLutil::Lock::lock() { #ifdef _WIN32 EnterCriticalSection(&_cs); #else pthread_mutex_lock(&_lock); #endif } ////////////////////////////////////////////////////////////// //! //! Try to acquire the lock; returns true if it was acquired (the critical //! section is entered in that case) and false if it was unavailable. //! bool OCLutil::Lock::tryLock() { #ifdef _WIN32 return (TryEnterCriticalSection(&_cs) != 0); #else return !((bool)pthread_mutex_trylock(&_lock)); #endif } ////////////////////////////////////////////////////////////// //! //! Unlock the lock //! void OCLutil::Lock::unlock() { #ifdef _WIN32 LeaveCriticalSection(&_cs); #else pthread_mutex_unlock(&_lock); #endif } //////////////////////////////////////////////////////////////////// //! //! Constructor for OCLThread //! OCLutil::Thread::Thread() : _tid(0), _data(0) { #ifdef _WIN32 _ID = 0; #else #endif } //////////////////////////////////////////////////////////////////// //! //! Destructor for OCLThread //! OCLutil::Thread::~Thread() { #ifdef _WIN32 CloseHandle(_tid); #else #endif } ////////////////////////////////////////////////////////////// //! //! Create a new thread and return the status of the operation //!
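//!
//! Usage sketch (illustration only, not part of the original test code; it
//! assumes oclThreadFunc has the void* (*)(void*) shape implied by the
//! pthread_create() call below):
//!
//!   static void* worker(void* data) { /* do work */ return NULL; }
//!
//!   OCLutil::Thread t;
//!   bool ok = t.create(worker, NULL) && t.join();
//!
//! On Windows the argsToThreadFunc block handed to _beginthreadex() is owned
//! and freed by win32ThreadFunc() after the callback returns.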
bool OCLutil::Thread::create(oclThreadFunc func, void *arg) { // Save the data internally _data = arg; unsigned int retVal; bool verbose = getenv("VERBOSE") != NULL; #ifdef _WIN32 // Setup the callback struct for thread function and pass to the // begin thread routine; win32ThreadFunc takes ownership and frees it argsToThreadFunc *args = new argsToThreadFunc; args->func = func; args->data = this; _tid = (HANDLE)_beginthreadex(NULL, 0, win32ThreadFunc, args, 0, &retVal); if (verbose) { printf("Thread handle value = %p\n", _tid); printf("Done creating thread. Thread id value = %u\n", retVal); } // _beginthreadex() returns 0 on failure; retVal receives the new thread id if (_tid == 0) return false; #else //! Now create the thread with pointer to self as the data retVal = pthread_create(&_tid, NULL, func, arg); if (verbose) printf("Done creating thread. Ret value %d, Self = %u\n", retVal, (unsigned int)pthread_self()); if (retVal != 0) return false; #endif return true; } ////////////////////////////////////////////////////////////// //! //! Return the thread ID for the current OCLThread //! unsigned int OCLutil::Thread::getID() { #ifdef _WIN32 return GetCurrentThreadId(); // Type cast the thread handle to unsigned int and send it over #else return (unsigned int)pthread_self(); #endif } ////////////////////////////////////////////////////////////// //! //! Wait for this thread to join //! bool OCLutil::Thread::join() { #ifdef _WIN32 DWORD rc = WaitForSingleObject(_tid, INFINITE); if (rc == WAIT_FAILED) { printf("Bad call to function(invalid handle?)\n"); } #else int rc = pthread_join(_tid, NULL); #endif if (rc != 0) return false; return true; } clr-rocm-5.7.1/opencl/tests/ocltst/module/common/OCLWrapper.cpp000066400000000000000000001261251450307266000244450ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/ #include "OCLWrapper.h" OCLWrapper::OCLWrapper() { clEnqueueWaitSignalAMD_ptr = (clEnqueueWaitSignalAMD_fn)clGetExtensionFunctionAddress( "clEnqueueWaitSignalAMD"); clEnqueueWriteSignalAMD_ptr = (clEnqueueWriteSignalAMD_fn)clGetExtensionFunctionAddress( "clEnqueueWriteSignalAMD"); clEnqueueMakeBuffersResidentAMD_ptr = (clEnqueueMakeBuffersResidentAMD_fn)clGetExtensionFunctionAddress( "clEnqueueMakeBuffersResidentAMD"); clUnloadPlatformAMD_ptr = (clUnloadPlatformAMD_fn)clGetExtensionFunctionAddress( "clUnloadPlatformAMD"); // CL-GL function pointers clGetGLContextInfoKHR_ptr = (clGetGLContextInfoKHR_fn)clGetExtensionFunctionAddress( "clGetGLContextInfoKHR"); clCreateFromGLBuffer_ptr = (clCreateFromGLBuffer_fn)clGetExtensionFunctionAddress( "clCreateFromGLBuffer"); clCreateFromGLTexture_ptr = (clCreateFromGLTexture_fn)clGetExtensionFunctionAddress( "clCreateFromGLTexture"); clCreateFromGLTexture2D_ptr = (clCreateFromGLTexture2D_fn)clGetExtensionFunctionAddress( "clCreateFromGLTexture2D"); clCreateFromGLRenderbuffer_ptr = (clCreateFromGLRenderbuffer_fn)clGetExtensionFunctionAddress( "clCreateFromGLRenderbuffer"); clGetGLObjectInfo_ptr = (clGetGLObjectInfo_fn)clGetExtensionFunctionAddress("clGetGLObjectInfo"); clGetGLTextureInfo_ptr = (clGetGLTextureInfo_fn)clGetExtensionFunctionAddress( "clGetGLTextureInfo"); clEnqueueAcquireGLObjects_ptr = (clEnqueueAcquireGLObjects_fn)clGetExtensionFunctionAddress( "clEnqueueAcquireGLObjects"); clEnqueueReleaseGLObjects_ptr = (clEnqueueReleaseGLObjects_fn)clGetExtensionFunctionAddress( "clEnqueueReleaseGLObjects"); // Performance counter function pointers clCreatePerfCounterAMD_ptr = (clCreatePerfCounterAMD_fn)clGetExtensionFunctionAddress( "clCreatePerfCounterAMD"); clEnqueueBeginPerfCounterAMD_ptr = (clEnqueueBeginPerfCounterAMD_fn)clGetExtensionFunctionAddress( "clEnqueueBeginPerfCounterAMD"); clEnqueueEndPerfCounterAMD_ptr = (clEnqueueEndPerfCounterAMD_fn)clGetExtensionFunctionAddress( "clEnqueueEndPerfCounterAMD"); clGetPerfCounterInfoAMD_ptr = (clGetPerfCounterInfoAMD_fn)clGetExtensionFunctionAddress( "clGetPerfCounterInfoAMD"); clReleasePerfCounterAMD_ptr = (clReleasePerfCounterAMD_fn)clGetExtensionFunctionAddress( "clReleasePerfCounterAMD"); clRetainPerfCounterAMD_ptr = (clRetainPerfCounterAMD_fn)clGetExtensionFunctionAddress( "clRetainPerfCounterAMD"); clSetDeviceClockModeAMD_ptr = (clSetDeviceClockModeAMD_fn)clGetExtensionFunctionAddress( "clSetDeviceClockModeAMD"); } cl_int OCLWrapper::clGetPlatformIDs(cl_uint num_entries, cl_platform_id *platforms, cl_uint *num_platforms) { return ::clGetPlatformIDs(num_entries, platforms, num_platforms); } cl_int OCLWrapper::clGetPlatformInfo(cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetPlatformInfo(platform, param_name, param_value_size, param_value, param_value_size_ret); } cl_int OCLWrapper::clGetDeviceIDs(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices) { return ::clGetDeviceIDs(platform, device_type, num_entries, devices, num_devices); } cl_int OCLWrapper::clGetDeviceInfo(cl_device_id device, cl_device_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetDeviceInfo(device, param_name, param_value_size, param_value, param_value_size_ret); } cl_context OCLWrapper::clCreateContext( cl_context_properties *properties, cl_uint num_devices, const cl_device_id *devices, void(CL_CALLBACK 
*pfn_notify)(const char *, const void *, size_t, void *), void *user_data, cl_int *errcode_ret) { return ::clCreateContext(properties, num_devices, devices, pfn_notify, user_data, errcode_ret); } cl_context OCLWrapper::clCreateContextFromType( cl_context_properties *properties, cl_device_type device_type, void(CL_CALLBACK *pfn_notify)(const char *, const void *, size_t, void *), void *user_data, cl_int *errcode_ret) { return ::clCreateContextFromType(properties, device_type, pfn_notify, user_data, errcode_ret); } cl_int OCLWrapper::clRetainContext(cl_context context) { return ::clRetainContext(context); } cl_int OCLWrapper::clReleaseContext(cl_context context) { return ::clReleaseContext(context); } cl_int OCLWrapper::clGetContextInfo(cl_context context, cl_context_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetContextInfo(context, param_name, param_value_size, param_value, param_value_size_ret); } cl_command_queue OCLWrapper::clCreateCommandQueue( cl_context context, cl_device_id device, cl_command_queue_properties properties, cl_int *errcode_ret) { #if defined(CL_VERSION_2_0) cl_int err; cl_platform_id pid; bool version20 = true; err = ::clGetDeviceInfo(device, CL_DEVICE_PLATFORM, sizeof(cl_platform_id), &pid, NULL); if (err == CL_SUCCESS) { size_t size; char *ver; err = ::clGetPlatformInfo(pid, CL_PLATFORM_VERSION, 0, NULL, &size); if (err == CL_SUCCESS) { ver = new char[size]; if (ver) { err = ::clGetPlatformInfo(pid, CL_PLATFORM_VERSION, size, ver, NULL); if (err == CL_SUCCESS) { if (ver[8] == '1') { version20 = false; } } delete[] ver; } } } if (version20) { const cl_queue_properties cprops[] = { CL_QUEUE_PROPERTIES, static_cast(properties), 0}; return ::clCreateCommandQueueWithProperties( context, device, properties ? 
cprops : NULL, errcode_ret); } else { return ::clCreateCommandQueue(context, device, properties, errcode_ret); } #else return ::clCreateCommandQueue(context, device, properties, errcode_ret); #endif } cl_int OCLWrapper::clRetainCommandQueue(cl_command_queue command_queue) { return ::clRetainCommandQueue(command_queue); } cl_int OCLWrapper::clReleaseCommandQueue(cl_command_queue command_queue) { return ::clReleaseCommandQueue(command_queue); } cl_int OCLWrapper::clGetCommandQueueInfo(cl_command_queue command_queue, cl_command_queue_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetCommandQueueInfo(command_queue, param_name, param_value_size, param_value, param_value_size_ret); } cl_mem OCLWrapper::clCreateBuffer(cl_context context, cl_mem_flags flags, size_t size, void *host_ptr, cl_int *errcode_ret) { return ::clCreateBuffer(context, flags, size, host_ptr, errcode_ret); } cl_mem OCLWrapper::clCreateImage2D(cl_context context, cl_mem_flags flags, const cl_image_format *image_format, size_t image_width, size_t image_height, size_t image_row_pitch, void *host_ptr, cl_int *errcode_ret) { return ::clCreateImage2D(context, flags, image_format, image_width, image_height, image_row_pitch, host_ptr, errcode_ret); } cl_mem OCLWrapper::clCreateImage3D(cl_context context, cl_mem_flags flags, const cl_image_format *image_format, size_t image_width, size_t image_height, size_t image_depth, size_t image_row_pitch, size_t image_slice_pitch, void *host_ptr, cl_int *errcode_ret) { return ::clCreateImage3D(context, flags, image_format, image_width, image_height, image_depth, image_row_pitch, image_slice_pitch, host_ptr, errcode_ret); } cl_int OCLWrapper::clRetainMemObject(cl_mem memobj) { return ::clRetainMemObject(memobj); } cl_int OCLWrapper::clReleaseMemObject(cl_mem memobj) { return ::clReleaseMemObject(memobj); } cl_int OCLWrapper::clGetSupportedImageFormats(cl_context context, cl_mem_flags flags, cl_mem_object_type image_type, cl_uint num_entries, cl_image_format *image_formats, cl_uint *num_image_formats) { return ::clGetSupportedImageFormats(context, flags, image_type, num_entries, image_formats, num_image_formats); } cl_int OCLWrapper::clGetMemObjectInfo(cl_mem memobj, cl_mem_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetMemObjectInfo(memobj, param_name, param_value_size, param_value, param_value_size_ret); } cl_int OCLWrapper::clGetImageInfo(cl_mem image, cl_image_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetImageInfo(image, param_name, param_value_size, param_value, param_value_size_ret); } cl_sampler OCLWrapper::clCreateSampler(cl_context context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int *errcode_ret) { #ifdef CL_VERSION_2_0 const cl_sampler_properties sprops[] = { CL_SAMPLER_NORMALIZED_COORDS, static_cast(normalized_coords), CL_SAMPLER_ADDRESSING_MODE, static_cast(addressing_mode), CL_SAMPLER_FILTER_MODE, static_cast(filter_mode), 0}; return ::clCreateSamplerWithProperties(context, sprops, errcode_ret); #else return ::clCreateSampler(context, normalized_coords, addressing_mode, filter_mode, errcode_ret); #endif } cl_int OCLWrapper::clRetainSampler(cl_sampler sampler) { return ::clRetainSampler(sampler); } cl_int OCLWrapper::clReleaseSampler(cl_sampler sampler) { return ::clReleaseSampler(sampler); } cl_int OCLWrapper::clGetSamplerInfo(cl_sampler sampler, 
cl_sampler_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetSamplerInfo(sampler, param_name, param_value_size, param_value, param_value_size_ret); } cl_program OCLWrapper::clCreateProgramWithSource(cl_context context, cl_uint count, const char **strings, const size_t *lengths, cl_int *errcode_ret) { return ::clCreateProgramWithSource(context, count, strings, lengths, errcode_ret); } cl_program OCLWrapper::clCreateProgramWithBinary( cl_context context, cl_uint num_devices, const cl_device_id *device_list, const size_t *lengths, const unsigned char **binaries, cl_int *binary_status, cl_int *errcode_ret) { return ::clCreateProgramWithBinary(context, num_devices, device_list, lengths, binaries, binary_status, errcode_ret); } cl_int OCLWrapper::clRetainProgram(cl_program program) { return ::clRetainProgram(program); } cl_int OCLWrapper::clReleaseProgram(cl_program program) { return ::clReleaseProgram(program); } cl_int OCLWrapper::clBuildProgram( cl_program program, cl_uint num_devices, const cl_device_id *device_list, const char *options, void(CL_CALLBACK *pfn_notify)(cl_program program, void *user_data), void *user_data) { return ::clBuildProgram(program, num_devices, device_list, options, pfn_notify, user_data); } cl_int OCLWrapper::clCompileProgram( cl_program program, cl_uint num_devices, const cl_device_id *device_list, const char *options, cl_uint num_input_headers, const cl_program *input_headers, const char **header_include_names, void(CL_CALLBACK *pfn_notify)(cl_program program, void *user_data), void *user_data) { return ::clCompileProgram(program, num_devices, device_list, options, num_input_headers, input_headers, header_include_names, pfn_notify, user_data); } cl_program OCLWrapper::clLinkProgram( cl_context context, cl_uint num_devices, const cl_device_id *device_list, const char *options, cl_uint num_input_programs, const cl_program *input_programs, void(CL_CALLBACK *pfn_notify)(cl_program program, void *user_data), void *user_data, cl_int *errcode_ret) { return ::clLinkProgram(context, num_devices, device_list, options, num_input_programs, input_programs, pfn_notify, user_data, errcode_ret); } cl_int OCLWrapper::clUnloadCompiler(void) { return ::clUnloadCompiler(); } cl_int OCLWrapper::clGetProgramInfo(cl_program program, cl_program_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetProgramInfo(program, param_name, param_value_size, param_value, param_value_size_ret); } cl_int OCLWrapper::clGetProgramBuildInfo( cl_program program, cl_device_id device, cl_program_build_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetProgramBuildInfo(program, device, param_name, param_value_size, param_value, param_value_size_ret); } cl_kernel OCLWrapper::clCreateKernel(cl_program program, const char *kernel_name, cl_int *errcode_ret) { return ::clCreateKernel(program, kernel_name, errcode_ret); } cl_int OCLWrapper::clCreateKernelsInProgram(cl_program program, cl_uint num_kernels, cl_kernel *kernels, cl_uint *num_kernels_ret) { return ::clCreateKernelsInProgram(program, num_kernels, kernels, num_kernels_ret); } cl_int OCLWrapper::clRetainKernel(cl_kernel kernel) { return ::clRetainKernel(kernel); } cl_int OCLWrapper::clReleaseKernel(cl_kernel kernel) { return ::clReleaseKernel(kernel); } cl_int OCLWrapper::clSetKernelArg(cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void *arg_value) { return 
::clSetKernelArg(kernel, arg_index, arg_size, arg_value); } cl_int OCLWrapper::clGetKernelInfo(cl_kernel kernel, cl_kernel_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetKernelInfo(kernel, param_name, param_value_size, param_value, param_value_size_ret); } cl_int OCLWrapper::clGetKernelWorkGroupInfo( cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetKernelWorkGroupInfo(kernel, device, param_name, param_value_size, param_value, param_value_size_ret); } cl_int OCLWrapper::clWaitForEvents(cl_uint num_events, const cl_event *event_list) { return ::clWaitForEvents(num_events, event_list); } cl_int OCLWrapper::clGetEventInfo(cl_event evnt, cl_event_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetEventInfo(evnt, param_name, param_value_size, param_value, param_value_size_ret); } cl_int OCLWrapper::clRetainEvent(cl_event evnt) { return ::clRetainEvent(evnt); } cl_int OCLWrapper::clReleaseEvent(cl_event evnt) { return ::clReleaseEvent(evnt); } cl_int OCLWrapper::clGetEventProfilingInfo(cl_event evnt, cl_profiling_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetEventProfilingInfo(evnt, param_name, param_value_size, param_value, param_value_size_ret); } cl_int OCLWrapper::clFlush(cl_command_queue command_queue) { return ::clFlush(command_queue); } cl_int OCLWrapper::clFinish(cl_command_queue command_queue) { return ::clFinish(command_queue); } cl_int OCLWrapper::clEnqueueReadBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t cb, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueReadBuffer(command_queue, buffer, blocking_read, offset, cb, ptr, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueWriteBuffer( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t cb, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueWriteBuffer(command_queue, buffer, blocking_write, offset, cb, ptr, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueCopyBuffer(cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, size_t src_offset, size_t dst_offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueCopyBuffer(command_queue, src_buffer, dst_buffer, src_offset, dst_offset, cb, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueReadBufferRect( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, const size_t *buffer_origin, const size_t *host_origin, const size_t *region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueReadBufferRect( command_queue, buffer, blocking_read, buffer_origin, host_origin, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueWriteBufferRect( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, const size_t *buffer_origin, const 
size_t *host_origin, const size_t *region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueWriteBufferRect( command_queue, buffer, blocking_write, buffer_origin, host_origin, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueCopyBufferRect( cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, const size_t *src_origin, const size_t *dst_origin, const size_t *region, size_t src_row_pitch, size_t src_slice_pitch, size_t dst_row_pitch, size_t dst_slice_pitch, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueCopyBufferRect( command_queue, src_buffer, dst_buffer, src_origin, dst_origin, region, src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueReadImage( cl_command_queue command_queue, cl_mem image, cl_bool blocking_read, const size_t *origin, const size_t *region, size_t row_pitch, size_t slice_pitch, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueReadImage(command_queue, image, blocking_read, origin, region, row_pitch, slice_pitch, ptr, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueWriteImage( cl_command_queue command_queue, cl_mem image, cl_bool blocking_write, const size_t *origin, const size_t *region, size_t input_row_pitch, size_t input_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueWriteImage(command_queue, image, blocking_write, origin, region, input_row_pitch, input_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueCopyImage( cl_command_queue command_queue, cl_mem src_image, cl_mem dst_image, const size_t *src_origin, const size_t *dst_origin, const size_t *region, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueCopyImage(command_queue, src_image, dst_image, src_origin, dst_origin, region, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueCopyImageToBuffer( cl_command_queue command_queue, cl_mem src_image, cl_mem dst_buffer, const size_t *src_origin, const size_t *region, size_t dst_offset, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueCopyImageToBuffer( command_queue, src_image, dst_buffer, src_origin, region, dst_offset, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueCopyBufferToImage( cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_image, size_t src_offset, const size_t *dst_origin, const size_t *region, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueCopyBufferToImage( command_queue, src_buffer, dst_image, src_offset, dst_origin, region, num_events_in_wait_list, event_wait_list, evnt); } void *OCLWrapper::clEnqueueMapBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_map, cl_map_flags map_flags, size_t offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt, cl_int *errcode_ret) { return 
::clEnqueueMapBuffer(command_queue, buffer, blocking_map, map_flags, offset, cb, num_events_in_wait_list, event_wait_list, evnt, errcode_ret); } void *OCLWrapper::clEnqueueMapImage( cl_command_queue command_queue, cl_mem image, cl_bool blocking_map, cl_map_flags map_flags, const size_t *origin, const size_t *region, size_t *image_row_pitch, size_t *image_slice_pitch, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt, cl_int *errcode_ret) { return ::clEnqueueMapImage(command_queue, image, blocking_map, map_flags, origin, region, image_row_pitch, image_slice_pitch, num_events_in_wait_list, event_wait_list, evnt, errcode_ret); } cl_int OCLWrapper::clEnqueueUnmapMemObject(cl_command_queue command_queue, cl_mem memobj, void *mapped_ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueUnmapMemObject(command_queue, memobj, mapped_ptr, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueNDRangeKernel( cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t *global_work_offset, const size_t *global_work_size, const size_t *local_work_size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueNDRangeKernel( command_queue, kernel, work_dim, global_work_offset, global_work_size, local_work_size, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueTask(cl_command_queue command_queue, cl_kernel kernel, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { #if defined(CL_VERSION_2_0) static size_t const globalWorkSize[3] = {1, 0, 0}; static size_t const localWorkSize[3] = {1, 0, 0}; return ::clEnqueueNDRangeKernel( command_queue, kernel, 1, NULL, globalWorkSize, localWorkSize, num_events_in_wait_list, event_wait_list, evnt); #else return ::clEnqueueTask(command_queue, kernel, num_events_in_wait_list, event_wait_list, evnt); #endif } cl_int OCLWrapper::clEnqueueNativeKernel( cl_command_queue command_queue, void(CL_CALLBACK *user_func)(void *), void *args, size_t cb_args, cl_uint num_mem_objects, const cl_mem *mem_list, const void **args_mem_loc, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueNativeKernel( command_queue, user_func, args, cb_args, num_mem_objects, mem_list, args_mem_loc, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueMarker(cl_command_queue command_queue, cl_event *evnt) { return ::clEnqueueMarker(command_queue, evnt); } cl_int OCLWrapper::clEnqueueMarkerWithWaitList(cl_command_queue command_queue, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueMarkerWithWaitList(command_queue, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clEnqueueWaitForEvents(cl_command_queue command_queue, cl_uint num_events, const cl_event *event_list) { return ::clEnqueueWaitForEvents(command_queue, num_events, event_list); } cl_int OCLWrapper::clEnqueueBarrier(cl_command_queue command_queue) { return ::clEnqueueBarrier(command_queue); } void *OCLWrapper::clGetExtensionFunctionAddress(const char *func_name) { return ::clGetExtensionFunctionAddress(func_name); } cl_mem OCLWrapper::clCreateImage(cl_context context, cl_mem_flags flags, const cl_image_format *image_format, const cl_image_desc *image_desc, void *host_ptr, cl_int *errcode_ret) { return ::clCreateImage(context, flags, image_format, image_desc, host_ptr, 
errcode_ret); } cl_mem OCLWrapper::clCreateSubBuffer(cl_mem mem, cl_mem_flags flags, cl_buffer_create_type buffer_create_type, const void *buffer_create_info, cl_int *errcode_ret) { return ::clCreateSubBuffer(mem, flags, buffer_create_type, buffer_create_info, errcode_ret); } cl_int OCLWrapper::clSetEventCallback( cl_event event, cl_int command_exec_callback_type, void(CL_CALLBACK *pfn_event_notify)(cl_event event, cl_int event_command_exec_status, void *user_data), void *user_data) { return ::clSetEventCallback(event, command_exec_callback_type, pfn_event_notify, user_data); } cl_int OCLWrapper::clEnqueueFillImage( cl_command_queue command_queue, cl_mem image, void *ptr, const size_t *origin, const size_t *region, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *evnt) { return ::clEnqueueFillImage(command_queue, image, ptr, origin, region, num_events_in_wait_list, event_wait_list, evnt); } cl_int OCLWrapper::clUnloadPlatformAMD(cl_platform_id id) { if (clUnloadPlatformAMD_ptr) return clUnloadPlatformAMD_ptr(id); return CL_SUCCESS; } cl_int OCLWrapper::clEnqueueWaitSignalAMD(cl_command_queue command_queue, cl_mem mem_object, cl_uint value, cl_uint num_events, const cl_event *event_wait_list, cl_event *event) { return clEnqueueWaitSignalAMD_ptr(command_queue, mem_object, value, num_events, event_wait_list, event); } cl_int OCLWrapper::clEnqueueWriteSignalAMD(cl_command_queue command_queue, cl_mem mem_object, cl_uint value, cl_ulong offset, cl_uint num_events, const cl_event *event_list, cl_event *event) { return clEnqueueWriteSignalAMD_ptr(command_queue, mem_object, value, offset, num_events, event_list, event); } cl_int OCLWrapper::clEnqueueMakeBuffersResidentAMD( cl_command_queue command_queue, cl_uint num_mem_objs, cl_mem *mem_objects, cl_bool blocking_make_resident, cl_bus_address_amd *bus_addresses, cl_uint num_events, const cl_event *event_list, cl_event *event) { return clEnqueueMakeBuffersResidentAMD_ptr( command_queue, num_mem_objs, mem_objects, blocking_make_resident, bus_addresses, num_events, event_list, event); } cl_int OCLWrapper::clEnqueueMigrateMemObjects(cl_command_queue command_queue, cl_uint num_mem_objects, const cl_mem *mem_objects, cl_mem_migration_flags flags, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { return ::clEnqueueMigrateMemObjects( command_queue, num_mem_objects, mem_objects, flags, num_events_in_wait_list, event_wait_list, event); } cl_int OCLWrapper::clGetGLContextInfoKHR( const cl_context_properties *properties, cl_gl_context_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return (*clGetGLContextInfoKHR_ptr)(properties, param_name, param_value_size, param_value, param_value_size_ret); } cl_mem OCLWrapper::clCreateFromGLBuffer(cl_context context, cl_mem_flags flags, unsigned int bufobj, int *errcode_ret) { return (*clCreateFromGLBuffer_ptr)(context, flags, bufobj, errcode_ret); } cl_mem OCLWrapper::clCreateFromGLTexture(cl_context context, cl_mem_flags flags, unsigned int texture_target, int miplevel, unsigned int texture, cl_int *errcode_ret) { return (*clCreateFromGLTexture_ptr)(context, flags, texture_target, miplevel, texture, errcode_ret); } cl_mem OCLWrapper::clCreateFromGLTexture2D(cl_context context, cl_mem_flags flags, unsigned int texture_target, int miplevel, unsigned int texture, cl_int *errcode_ret) { return (*clCreateFromGLTexture2D_ptr)(context, flags, texture_target, miplevel, texture, errcode_ret); } cl_mem 
OCLWrapper::clCreateFromGLRenderbuffer(cl_context context, cl_mem_flags flags, unsigned int renderbuffer, cl_int *errcode_ret) { return (*clCreateFromGLRenderbuffer_ptr)(context, flags, renderbuffer, errcode_ret); } cl_int OCLWrapper::clGetGLObjectInfo(cl_mem memobj, cl_gl_object_type *gl_object_type, unsigned int *gl_object_name) { return (*clGetGLObjectInfo_ptr)(memobj, gl_object_type, gl_object_name); } cl_int OCLWrapper::clGetGLTextureInfo(cl_mem memobj, cl_gl_texture_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return (*clGetGLTextureInfo_ptr)(memobj, param_name, param_value_size, param_value, param_value_size_ret); } cl_int OCLWrapper::clEnqueueAcquireGLObjects(cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { return (*clEnqueueAcquireGLObjects_ptr)(command_queue, num_objects, mem_objects, num_events_in_wait_list, event_wait_list, event); } cl_int OCLWrapper::clEnqueueReleaseGLObjects(cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { return (*clEnqueueReleaseGLObjects_ptr)(command_queue, num_objects, mem_objects, num_events_in_wait_list, event_wait_list, event); } #if defined(CL_VERSION_2_0) cl_command_queue OCLWrapper::clCreateCommandQueueWithProperties( cl_context context, cl_device_id device, const cl_queue_properties *properties, cl_int *errcode_ret) { return ::clCreateCommandQueueWithProperties(context, device, properties, errcode_ret); } void *OCLWrapper::clSVMAlloc(cl_context context, cl_svm_mem_flags flags, size_t size, cl_uint alignment) { return ::clSVMAlloc(context, flags, size, alignment); } void OCLWrapper::clSVMFree(cl_context context, void *svm_pointer) { return ::clSVMFree(context, svm_pointer); } cl_int OCLWrapper::clEnqueueSVMMap(cl_command_queue command_queue, cl_bool blocking_map, cl_map_flags flags, void *svm_ptr, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { return ::clEnqueueSVMMap(command_queue, blocking_map, flags, svm_ptr, size, num_events_in_wait_list, event_wait_list, event); } cl_int OCLWrapper::clEnqueueSVMUnmap(cl_command_queue command_queue, void *svm_ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { return ::clEnqueueSVMUnmap(command_queue, svm_ptr, num_events_in_wait_list, event_wait_list, event); } cl_int OCLWrapper::clEnqueueSVMMemFill(cl_command_queue command_queue, void *svm_ptr, const void *pattern, size_t pattern_size, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { return ::clEnqueueSVMMemFill(command_queue, svm_ptr, pattern, pattern_size, size, num_events_in_wait_list, event_wait_list, event); } cl_int OCLWrapper::clSetKernelArgSVMPointer(cl_kernel kernel, cl_uint arg_index, const void *arg_value) { return ::clSetKernelArgSVMPointer(kernel, arg_index, arg_value); } cl_mem OCLWrapper::clCreatePipe(cl_context context, cl_mem_flags flags, cl_uint packet_size, cl_uint pipe_max_packets, const cl_pipe_properties *properties, cl_int *errcode_ret) { return ::clCreatePipe(context, flags, packet_size, pipe_max_packets, properties, errcode_ret); } cl_int OCLWrapper::clGetPipeInfo(cl_mem pipe, cl_pipe_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return ::clGetPipeInfo(pipe, param_name, 
param_value_size, param_value, param_value_size_ret); } #endif cl_perfcounter_amd OCLWrapper::clCreatePerfCounterAMD( cl_device_id device, cl_perfcounter_property *properties, cl_int *errcode_ret) { return (*clCreatePerfCounterAMD_ptr)(device, properties, errcode_ret); } cl_int OCLWrapper::clEnqueueBeginPerfCounterAMD( cl_command_queue command_queue, cl_uint num_perf_counters, cl_perfcounter_amd *perf_counters, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { return (*clEnqueueBeginPerfCounterAMD_ptr)( command_queue, num_perf_counters, perf_counters, num_events_in_wait_list, event_wait_list, event); } cl_int OCLWrapper::clEnqueueEndPerfCounterAMD(cl_command_queue command_queue, cl_uint num_perf_counters, cl_perfcounter_amd *perf_counters, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { return (*clEnqueueEndPerfCounterAMD_ptr)( command_queue, num_perf_counters, perf_counters, num_events_in_wait_list, event_wait_list, event); } cl_int OCLWrapper::clGetPerfCounterInfoAMD(cl_perfcounter_amd perf_counter, cl_perfcounter_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { return (*clGetPerfCounterInfoAMD_ptr)(perf_counter, param_name, param_value_size, param_value, param_value_size_ret); } cl_int OCLWrapper::clReleasePerfCounterAMD(cl_perfcounter_amd perf_counter) { return (*clReleasePerfCounterAMD_ptr)(perf_counter); } cl_int OCLWrapper::clRetainPerfCounterAMD(cl_perfcounter_amd perf_counter) { return (*clRetainPerfCounterAMD_ptr)(perf_counter); } cl_int OCLWrapper::clSetDeviceClockModeAMD( cl_device_id device, cl_set_device_clock_mode_input_amd set_clock_mode_input, cl_set_device_clock_mode_output_amd *set_clock_mode_output) { return (*clSetDeviceClockModeAMD_ptr)(device, set_clock_mode_input, set_clock_mode_output); } clr-rocm-5.7.1/opencl/tests/ocltst/module/common/Timer.cpp000066400000000000000000000051211450307266000235370ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "Timer.h" #ifdef _WIN32 #include #endif #ifdef __linux__ #include #define NANOSECONDS_PER_SEC 1000000000 #endif CPerfCounter::CPerfCounter() : _clocks(0), _start(0) { #ifdef _WIN32 QueryPerformanceFrequency((LARGE_INTEGER *)&_freq); #endif #ifdef __linux__ _freq = NANOSECONDS_PER_SEC; #endif } CPerfCounter::~CPerfCounter() { // EMPTY! 
} void CPerfCounter::Start(void) { #ifdef _WIN32 if (_start) { MessageBox(NULL, "Bad Perf Counter Start", "Error", MB_OK); exit(0); } QueryPerformanceCounter((LARGE_INTEGER *)&_start); #endif #ifdef __linux__ struct timespec s; clock_gettime(CLOCK_MONOTONIC, &s); _start = (i64)s.tv_sec * NANOSECONDS_PER_SEC + (i64)s.tv_nsec; #endif } void CPerfCounter::Stop(void) { i64 n; #ifdef _WIN32 if (!_start) { MessageBox(NULL, "Bad Perf Counter Stop", "Error", MB_OK); exit(0); } QueryPerformanceCounter((LARGE_INTEGER *)&n); #endif #ifdef __linux__ struct timespec s; clock_gettime(CLOCK_MONOTONIC, &s); n = (i64)s.tv_sec * NANOSECONDS_PER_SEC + (i64)s.tv_nsec; #endif n -= _start; _start = 0; _clocks += n; } void CPerfCounter::Reset(void) { #ifdef _WIN32 if (_start) { MessageBox(NULL, "Bad Perf Counter Reset", "Error", MB_OK); exit(0); } #endif _clocks = 0; } double CPerfCounter::GetElapsedTime(void) { #ifdef _WIN32 if (_start) { MessageBox(NULL, "Trying to get time while still running.", "Error", MB_OK); exit(0); } #endif return (double)_clocks / (double)_freq; } clr-rocm-5.7.1/opencl/tests/ocltst/module/common/Timer.h000066400000000000000000000026671450307266000232200ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _TIMER_H_ #define _TIMER_H_ #ifdef _WIN32 typedef __int64 i64; #endif #ifdef __linux__ typedef long long i64; #endif class CPerfCounter { public: CPerfCounter(); ~CPerfCounter(); void Start(void); void Stop(void); void Reset(void); double GetElapsedTime(void); private: i64 _freq; i64 _clocks; i64 _start; }; #endif // _TIMER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/dx/000077500000000000000000000000001450307266000210775ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tests/ocltst/module/dx/OCLDX11Common.cpp000066400000000000000000000216711450307266000237760ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLDX11Common.h" #define D3D_FEATURE_LEVEL_11_1 0xb100 #define INITPFN(x) \ x = (x##_fn)clGetExtensionFunctionAddressForPlatform(platform_, #x); \ if ((x) == NULL) { \ char* buf = (char*)malloc(4096); \ _errorFlag = true; \ int rc = snprintf(buf, 4096, "Failed to get function pointer for %s", #x); \ assert(rc >= 0 && rc < (int)4096); \ printf("%s:%d - %s\n", __FILE__, __LINE__, buf); \ _errorMsg = std::string(buf); \ _crcword += 1; \ free(buf); \ return; \ } OCLDX11Common::OCLDX11Common() : OCLTestImp() { clGetDeviceIDsFromD3D11KHR = NULL; clCreateFromD3D11BufferKHR = NULL; clCreateFromD3D11Texture2DKHR = NULL; clCreateFromD3D11Texture3DKHR = NULL; clEnqueueAcquireD3D11ObjectsKHR = NULL; clEnqueueReleaseD3D11ObjectsKHR = NULL; clGetPlaneFromImageAMD = NULL; } OCLDX11Common::~OCLDX11Common() {} void OCLDX11Common::ExtensionCheck() { cl_int result = CL_SUCCESS; char extensions[1024]; result = _wrapper->clGetPlatformInfo(platform_, CL_PLATFORM_EXTENSIONS, sizeof(extensions), extensions, NULL); CHECK_RESULT(result != CL_SUCCESS, "Failed to list platform extensions."); extensionsAvailable = strstr(extensions, "cl_khr_d3d11_sharing") ? true : false; if (!extensionsAvailable) { printf("cl_khr_d3d11_sharing extension is required for this test!\n"); } OSVERSIONINFOEX versionInfo = {0}; versionInfo.dwOSVersionInfoSize = sizeof(OSVERSIONINFOEX); versionInfo.dwMajorVersion = 6; DWORDLONG conditionMask = 0; VER_SET_CONDITION(conditionMask, VER_MAJORVERSION, VER_GREATER_EQUAL); if (VerifyVersionInfo(&versionInfo, VER_MAJORVERSION, conditionMask)) { CHECK_RESULT(!extensionsAvailable, "Extension should be exported on Windows >= 6"); } else { CHECK_RESULT(extensionsAvailable, "Extension should not be exported on Windows < 6"); } result = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_EXTENSIONS, sizeof(extensions), extensions, NULL); CHECK_RESULT(result != CL_SUCCESS, "Failed to list device extensions."); extensionsAvailable = strstr(extensions, "cl_amd_planar_yuv") ? 
true : false; if (!extensionsAvailable) { printf("cl_amd_planar_yuv extension is required for this test!\n"); } } void OCLDX11Common::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { // OpenCL Initialization // OCLTestImp::open(test, units, conversion, deviceId); BaseTestImp::open(); devices_ = 0; deviceCount_ = 0; context_ = 0; program_ = 0; kernel_ = 0; _queue = 0; _deviceId = deviceId; dxD3D11Context = NULL; dxD3D11Device = NULL; CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test (%d)", error_); cl_uint numPlatforms = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT((error_ != CL_SUCCESS), "clGetPlatformIDs failed"); CHECK_RESULT((numPlatforms == 0), "No platform found"); cl_platform_id* platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); platform_ = platforms[_platformIndex]; CHECK_RESULT((platform_ == 0), "AMD Platform not found"); delete[] platforms; error_ = _wrapper->clGetDeviceIDs(platform_, type_, 0, NULL, &deviceCount_); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs() failed"); devices_ = new cl_device_id[deviceCount_]; error_ = _wrapper->clGetDeviceIDs(platform_, type_, deviceCount_, devices_, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs() failed"); ExtensionCheck(); if (!extensionsAvailable) { return; } // extract function pointers for exported functions INITPFN(clGetDeviceIDsFromD3D11KHR); INITPFN(clCreateFromD3D11BufferKHR); INITPFN(clCreateFromD3D11Texture2DKHR); INITPFN(clCreateFromD3D11Texture3DKHR); INITPFN(clEnqueueAcquireD3D11ObjectsKHR); INITPFN(clEnqueueReleaseD3D11ObjectsKHR); INITPFN(clGetPlaneFromImageAMD); char name[1024] = {0}; size_t size = 0; if (deviceId >= deviceCount_) { _errorFlag = true; return; } HRESULT hr = S_OK; UINT createDeviceFlags = 0; D3D_FEATURE_LEVEL featureLevels[] = { (D3D_FEATURE_LEVEL)D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0 }; D3D_FEATURE_LEVEL featureLevel; // Create only the device, not the swapchain. We can't create the swapchain // anyways without a handle to a window we explicitly own hr = D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, createDeviceFlags, featureLevels, _countof(featureLevels), D3D11_SDK_VERSION, &dxD3D11Device, &featureLevel, &dxD3D11Context); if (FAILED(hr)) { hr = D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, createDeviceFlags, featureLevels + 1, _countof(featureLevels) - 1, D3D11_SDK_VERSION, &dxD3D11Device, &featureLevel, &dxD3D11Context); } if (FAILED(hr)) { hr = D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_SOFTWARE, NULL, createDeviceFlags, featureLevels, _countof(featureLevels), D3D11_SDK_VERSION, &dxD3D11Device, &featureLevel, &dxD3D11Context); } if (FAILED(hr)) { hr = D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_SOFTWARE, NULL, createDeviceFlags, featureLevels + 1, _countof(featureLevels) - 1, D3D11_SDK_VERSION, &dxD3D11Device, &featureLevel, &dxD3D11Context); } cl_int status = 0; cl_context_properties cps[7] = { CL_CONTEXT_D3D11_DEVICE_KHR, (cl_context_properties)(ID3D11Device*)dxD3D11Device, CL_CONTEXT_INTEROP_USER_SYNC, CL_FALSE, CL_CONTEXT_PLATFORM, (cl_context_properties)platform_, 0}; cl_context_properties* cprops = (NULL == platform_) ? 
NULL : cps; cl_uint deviceListSize = 0; clGetDeviceIDsFromD3D11KHR(platform_, CL_D3D11_DEVICE_KHR, dxD3D11Device, CL_PREFERRED_DEVICES_FOR_D3D11_KHR, 0, NULL, &deviceListSize); std::vector<cl_device_id> devices; devices.resize(deviceListSize); clGetDeviceIDsFromD3D11KHR(platform_, CL_D3D11_DEVICE_KHR, dxD3D11Device, CL_PREFERRED_DEVICES_FOR_D3D11_KHR, deviceListSize, &devices[0], NULL); bool ret = false; // Check that the current device can be associated with the D3D11 device for (unsigned int i = 0; i < deviceListSize; i++) { if (devices[i] == devices_[_deviceId]) { ret = true; break; } } if (ret) { char buf[2000]; _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_EXTENSIONS, sizeof(buf), buf, NULL); context_ = clCreateContext(cprops, 1, &devices_[_deviceId], NULL, NULL, &status); _queue = clCreateCommandQueue(context_, devices_[_deviceId], 0, &status); } CHECK_RESULT((ret != true), "Can't find D3D device!"); } unsigned int OCLDX11Common::close(void) { clReleaseCommandQueue(_queue); unsigned int retVal = OCLTestImp::close(); // deleteDXDevice(hDX_); if (dxD3D11Context) dxD3D11Context->Release(); if (dxD3D11Device) dxD3D11Device->Release(); return retVal; } clr-rocm-5.7.1/opencl/tests/ocltst/module/dx/OCLDX11Common.h000066400000000000000000000050111450307266000234310ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
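
OCLDX11Common::open() above is the standard cl_khr_d3d11_sharing bring-up: enumerate the CL devices that can share the D3D11 device, then create the context with CL_CONTEXT_D3D11_DEVICE_KHR. A minimal sketch of the same flow, assuming the KHR entry point was already fetched with clGetExtensionFunctionAddressForPlatform (as INITPFN does), that <CL/cl_d3d11.h> and <vector> are included, and that `platform` and `dxDevice` are hypothetical valid handles:

    // Condensed bring-up sketch; error checks omitted for brevity.
    cl_uint n = 0;
    clGetDeviceIDsFromD3D11KHR(platform, CL_D3D11_DEVICE_KHR, dxDevice,
                               CL_PREFERRED_DEVICES_FOR_D3D11_KHR, 0, NULL, &n);
    std::vector<cl_device_id> sharable(n);
    clGetDeviceIDsFromD3D11KHR(platform, CL_D3D11_DEVICE_KHR, dxDevice,
                               CL_PREFERRED_DEVICES_FOR_D3D11_KHR, n,
                               sharable.data(), NULL);
    cl_context_properties props[] = {
        CL_CONTEXT_D3D11_DEVICE_KHR, (cl_context_properties)dxDevice,
        CL_CONTEXT_INTEROP_USER_SYNC, CL_FALSE,
        CL_CONTEXT_PLATFORM, (cl_context_properties)platform, 0};
    cl_int err = CL_SUCCESS;
    cl_context ctx = clCreateContext(props, 1, &sharable[0], NULL, NULL, &err);
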
*/ #ifndef _OCL_DX11_COMMON_H_ #define _OCL_DX11_COMMON_H_ #include <CL/cl.h> #include <CL/cl_d3d11.h> #include "OCLTestImp.h" #include "d3d11.h" typedef CL_API_ENTRY cl_mem(CL_API_CALL* clGetPlaneFromImageAMD_fn)( cl_context /* context */, cl_mem /* mem */, cl_uint /* plane */, cl_int* /* errcode_ret */); class OCLDX11Common : public OCLTestImp { public: ///////////////////////////////////////// // private initialization and clean-up // ///////////////////////////////////////// OCLDX11Common(); virtual ~OCLDX11Common(); /////////////////////// // virtual interface // /////////////////////// virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual unsigned int close(void); protected: bool extensionsAvailable; ID3D11Device* dxD3D11Device; ID3D11DeviceContext* dxD3D11Context; ID3D11Texture2D* dxDX11Texture; cl_command_queue _queue; clGetDeviceIDsFromD3D11KHR_fn clGetDeviceIDsFromD3D11KHR; clCreateFromD3D11BufferKHR_fn clCreateFromD3D11BufferKHR; clCreateFromD3D11Texture2DKHR_fn clCreateFromD3D11Texture2DKHR; clCreateFromD3D11Texture3DKHR_fn clCreateFromD3D11Texture3DKHR; clEnqueueAcquireD3D11ObjectsKHR_fn clEnqueueAcquireD3D11ObjectsKHR; clEnqueueReleaseD3D11ObjectsKHR_fn clEnqueueReleaseD3D11ObjectsKHR; clGetPlaneFromImageAMD_fn clGetPlaneFromImageAMD; private: void ExtensionCheck(); }; #endif // _OCL_DX11_COMMON_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/dx/OCLDX11YUY2.cpp000066400000000000000000000357661450307266000233220ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
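
OCLDX11YUY2.cpp (next) validates NV12/P010 sharing. Its checks rely on the planar layout: a WIDTH x HEIGHT Y plane followed by a WIDTH x (HEIGHT/2) interleaved UV plane, which is why the CL image that aliases the whole surface is created WIDTH x (HEIGHT + HEIGHT/2). A small helper for the offset math, hypothetical and illustrative only (8-bit NV12 with a given row pitch; not part of the test):

    #include <cstddef>

    // Byte offset of the Y sample at (x, y).
    inline size_t nv12YOffset(size_t x, size_t y, size_t pitch) {
      return y * pitch + x;
    }

    // Byte offset of the U byte of the UV pair covering (x, y); V follows it.
    inline size_t nv12UVOffset(size_t x, size_t y, size_t pitch,
                               size_t height) {
      return height * pitch      // UV plane starts right after the Y plane
             + (y / 2) * pitch   // one UV row serves two Y rows
             + (x / 2) * 2;      // 2 bytes (U, V) per 2x2 luma block
    }
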
*/ #include "OCLDX11YUY2.h" #include #include #include #include #define DXGI_FORMAT_NV12 103 #define DXGI_FORMAT_P010 104 #define GROUP_SIZE 256 const static char strKernel[] = "__constant sampler_t imageSampler = CLK_NORMALIZED_COORDS_FALSE | " "CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST; \n" "__kernel void image2imageCopy( " " \n" " __read_only image2d_t input, " " \n" " __write_only image2d_t output) " " \n" "{ " " \n" " int2 coord = (int2)(get_global_id(0), get_global_id(1)); " " \n" " uint4 temp = read_imageui(input, imageSampler, coord); " " \n" " write_imageui(output, coord, temp); " " \n" "} " " \n"; OCLDX11YUY2::OCLDX11YUY2() : OCLDX11Common() { _numSubTests = 4; blockSizeX = GROUP_SIZE; blockSizeY = 1; } OCLDX11YUY2::~OCLDX11YUY2() {} void OCLDX11YUY2::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { dxDX11Texture = 0; clImage2DOut = 0; _openTest = test; // Initialize random number seed srand((unsigned int)time(NULL)); OCLDX11Common::open(test, units, conversion, deviceId); if (_errorFlag) return; if (!extensionsAvailable) { return; } if (_openTest < 2) { dxFormat = (DXGI_FORMAT)DXGI_FORMAT_NV12; extensionsAvailable = formatSupported(); if (!extensionsAvailable) { printf("DXGI_FORMAT_NV12 is required for this test!\n"); return; } } else { dxFormat = (DXGI_FORMAT)DXGI_FORMAT_P010; extensionsAvailable = formatSupported(); if (!extensionsAvailable) { printf("DXGI_FORMAT_P010 is required for this test!\n"); return; } } CompileKernel(); AllocateOpenCLImage(); } void OCLDX11YUY2::run(void) { if (_errorFlag) return; if (!extensionsAvailable) return; D3D11_TEXTURE2D_DESC Desc = {0}; Desc.ArraySize = 1; Desc.BindFlags = 0; Desc.Format = dxFormat; Desc.Width = OCLDX11YUY2::WIDTH; Desc.Height = OCLDX11YUY2::HEIGHT; Desc.MipLevels = 1; Desc.SampleDesc.Count = 1; // Desc.MiscFlags=D3D11_RESOURCE_MISC_SHARED; //MM for fast GPU interop // MM: these flags are incompatible with D3D11_RESOURCE_MISC_SHARED // now we allocate texture without CPU access and if needed use temp texture // (see FromSystemToDX11 and FromDX11ToSystem) Desc.Usage = D3D11_USAGE_STAGING; Desc.BindFlags = 0; Desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE | D3D11_CPU_ACCESS_READ; ID3D11Texture2D *pTextureTmp; HRESULT hr = dxD3D11Device->CreateTexture2D(&Desc, NULL, &pTextureTmp); // fill memory D3D11_MAPPED_SUBRESOURCE LockedRectD11; if (SUCCEEDED(hr)) { hr = dxD3D11Context->Map(pTextureTmp, 0, D3D11_MAP_WRITE, 0, &LockedRectD11); } if (SUCCEEDED(hr)) { // fill memory with something for (int y = 0; y < OCLDX11YUY2::HEIGHT; y++) { BYTE *pLine = (BYTE *)LockedRectD11.pData + y * LockedRectD11.RowPitch; BYTE *pLineUV = (BYTE *)LockedRectD11.pData + y * LockedRectD11.RowPitch + OCLDX11YUY2::HEIGHT * LockedRectD11.RowPitch; for (int x = 0; x < OCLDX11YUY2::WIDTH; x++) { *pLine++ = 0x7F; // Y if (y < OCLDX11YUY2::HEIGHT / 2 && x < OCLDX11YUY2::WIDTH / 2) { *pLineUV++ = 0x1F; // U *pLineUV++ = 0x2F; // V } } } dxD3D11Context->Unmap(pTextureTmp, 0); } Desc.BindFlags = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE; Desc.Usage = D3D11_USAGE_DEFAULT; Desc.CPUAccessFlags = 0; Desc.MiscFlags = (_openTest == 0) ? 
0 : D3D11_RESOURCE_MISC_SHARED; // MM for fast GPU interop hr = dxD3D11Device->CreateTexture2D(&Desc, NULL, &dxDX11Texture); if (pTextureTmp != NULL) { dxD3D11Context->CopySubresourceRegion(dxDX11Texture, 0, 0, 0, 0, pTextureTmp, 0, NULL); pTextureTmp->Release(); } testInterop(); } void OCLDX11YUY2::AllocateOpenCLImage() { cl_int status = 0; cl_image_format format{}; format.image_channel_order = CL_R; format.image_channel_data_type = (dxFormat == DXGI_FORMAT_NV12) ? CL_UNSIGNED_INT8 : CL_UNSIGNED_INT16; cl_image_desc descr{}; descr.image_type = CL_MEM_OBJECT_IMAGE2D; descr.image_width = WIDTH; descr.image_height = HEIGHT + HEIGHT / 2; clImage2DOut = clCreateImage(context_, CL_MEM_WRITE_ONLY, &format, &descr, NULL, &status); CHECK_RESULT((status != CL_SUCCESS), "AllocateOpenCLImage() failed"); } void OCLDX11YUY2::testInterop() { // alloc cl_int clStatus = 0; cl_mem clImage2D = clCreateFromD3D11Texture2DKHR(context_, 0, dxDX11Texture, 0, &clStatus); CHECK_RESULT((clStatus != CL_SUCCESS), "clCreateFromD3D11Texture2DKHR() failed"); // bring objects to the queue cl_event clEvent = NULL; clEnqueueAcquireD3D11ObjectsKHR(_queue, 1, &clImage2D, 0, NULL, &clEvent); clStatus = clWaitForEvents(1, &clEvent); clReleaseEvent(clEvent); CopyOpenCLImage(clImage2D); bool ImageReadWorks = CheckCLImage(clImage2D); bool bKernelWorks = CheckCLImage(clImage2DOut); CHECK_RESULT_NO_RETURN((ImageReadWorks != true), "CheckCLImage(clImage2D) failed"); CHECK_RESULT_NO_RETURN((bKernelWorks != true), "CheckCLImage(clImage2DOut) failed"); cl_mem planeY = clGetPlaneFromImageAMD(context_, clImage2D, 0, &clStatus); CHECK_RESULT((clStatus != CL_SUCCESS), "clGetPlaneFromImageAMD(context_,clImage2D,0,&clStatus) failed"); cl_mem planeUV = clGetPlaneFromImageAMD(context_, clImage2D, 1, &clStatus); CHECK_RESULT((clStatus != CL_SUCCESS), "clGetPlaneFromImageAMD(context_,clImage2D,1,&clStatus) failed"); bool ImageWorksY = CheckCLImageY(planeY); bool ImageWorksUV = CheckCLImageUV(planeUV); clReleaseMemObject(planeY); clReleaseMemObject(planeUV); // release clEvent = NULL; // release object from the queue clStatus = clEnqueueReleaseD3D11ObjectsKHR(_queue, 1, &clImage2D, 0, NULL, &clEvent); clStatus = clWaitForEvents(1, &clEvent); clReleaseEvent(clEvent); // release mem object clReleaseMemObject(clImage2D); CHECK_RESULT_NO_RETURN((ImageWorksY != true), "CheckCLImageY() failed"); CHECK_RESULT_NO_RETURN((ImageWorksUV != true), "CheckCLImageUV() failed"); } unsigned int OCLDX11YUY2::close(void) { if (clImage2DOut) clReleaseMemObject(clImage2DOut); if (dxDX11Texture) dxDX11Texture->Release(); return OCLDX11Common::close(); } bool OCLDX11YUY2::CheckCLImage(cl_mem clImage) { cl_int clStatus = 0; size_t pitch = 0; clStatus = clGetImageInfo(clImage, CL_IMAGE_ROW_PITCH, sizeof(pitch), &pitch, NULL); pitch *= 2; cl_image_format format; clStatus = clGetImageInfo(clImage, CL_IMAGE_FORMAT, sizeof(format), &format, NULL); size_t height; clStatus = clGetImageInfo(clImage, CL_IMAGE_HEIGHT, sizeof(height), &height, NULL); CHECK_RESULT_NO_RETURN(height != (HEIGHT + HEIGHT / 2), "CheckCLImage: height!=(HEIGHT+HEIGHT/2)"); char *pTempBuffer = new char[(HEIGHT + HEIGHT / 2) * pitch]; size_t origin[] = {0, 0, 0}; size_t region[] = {WIDTH, HEIGHT + HEIGHT / 2, 1}; clStatus = clEnqueueReadImage(_queue, clImage, 1, origin, region, pitch, 0, pTempBuffer, 0, 0, 0); ::clFinish(_queue); // test bool bBreak = false; for (int y = 0; y < HEIGHT && !bBreak; y++) { char *pLine = (char *)pTempBuffer + y * pitch; char *pLineUV = (char *)pTempBuffer + y * pitch + HEIGHT * 
pitch; for (int x = 0; x < WIDTH; x++) { if (*pLine != 0x7F) // Y { bBreak = true; break; } pLine++; if (y < HEIGHT / 2 && x < WIDTH / 2) { if (*pLineUV != 0x1F) // U { bBreak = true; break; } pLineUV++; if (*pLineUV != 0x2F) // V { bBreak = true; break; } pLineUV++; } } } delete[] pTempBuffer; return !bBreak; } bool OCLDX11YUY2::CheckCLImageY(cl_mem clImage) { cl_int clStatus = 0; size_t pitch = 0; clStatus = clGetImageInfo(clImage, CL_IMAGE_ROW_PITCH, sizeof(pitch), &pitch, NULL); pitch *= 2; cl_image_format format; clStatus = clGetImageInfo(clImage, CL_IMAGE_FORMAT, sizeof(format), &format, NULL); size_t height; clStatus = clGetImageInfo(clImage, CL_IMAGE_HEIGHT, sizeof(height), &height, NULL); CHECK_RESULT_NO_RETURN(height != HEIGHT, "CheckCLImageY: height!=HEIGHT"); char *pTempBuffer = new char[HEIGHT * pitch]; size_t origin[] = {0, 0, 0}; size_t region[] = {WIDTH, HEIGHT, 1}; clStatus = clEnqueueReadImage(_queue, clImage, 1, origin, region, pitch, 0, pTempBuffer, 0, 0, 0); ::clFinish(_queue); // test bool bBreak = false; for (int y = 0; y < HEIGHT && !bBreak; y++) { char *pLine = (char *)pTempBuffer + y * pitch; for (int x = 0; x < WIDTH; x++) { if (*pLine != 0x7F) // Y { bBreak = true; break; } pLine++; } } delete[] pTempBuffer; return !bBreak; } bool OCLDX11YUY2::CheckCLImageUV(cl_mem clImage) { cl_int clStatus = 0; size_t pitch = 0; clStatus = clGetImageInfo(clImage, CL_IMAGE_ROW_PITCH, sizeof(pitch), &pitch, NULL); pitch *= 2; size_t width = 0; clStatus = clGetImageInfo(clImage, CL_IMAGE_WIDTH, sizeof(width), &width, NULL); cl_image_format format; clStatus = clGetImageInfo(clImage, CL_IMAGE_FORMAT, sizeof(format), &format, NULL); size_t height; clStatus = clGetImageInfo(clImage, CL_IMAGE_HEIGHT, sizeof(height), &height, NULL); CHECK_RESULT_NO_RETURN(height != HEIGHT / 2, "CheckCLImageUV: height!=HEIGHT/2"); char *pTempBuffer = new char[(HEIGHT / 2) * pitch]; size_t origin[] = {0, 0, 0}; size_t region[] = {WIDTH / 2, HEIGHT / 2, 1}; clStatus = clEnqueueReadImage(_queue, clImage, 1, origin, region, pitch, 0, pTempBuffer, 0, 0, 0); ::clFinish(_queue); bool bBreak = false; for (int y = 0; y < HEIGHT / 2 && !bBreak; y++) { char *pLineUV = (char *)pTempBuffer + y * pitch; for (int x = 0; x < WIDTH / 2; x++) { if (*pLineUV != 0x1F) // U { bBreak = true; break; } pLineUV++; if (*pLineUV != 0x2F) // V { bBreak = true; break; } pLineUV++; } } delete[] pTempBuffer; return !bBreak; } void OCLDX11YUY2::CopyOpenCLImage(cl_mem clImageSrc) { cl_int status = 0; // Set appropriate arguments to the kernel2D // input buffer image status = clSetKernelArg(kernel_, 0, sizeof(cl_mem), &clImageSrc); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLImage() failed at " "clSetKernelArg(kernel_,0,sizeof(cl_mem),&clImageSrc)"); status = clSetKernelArg(kernel_, 1, sizeof(cl_mem), &clImage2DOut); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLImage() failed at " "clSetKernelArg(kernel_,1,sizeof(cl_mem),&clImage2DOut)"); // Enqueue a kernel run call. 
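  // A minimal sketch (illustrative only, not used by the launch below) of how
  // an explicit local size could be validated against the kernel's
  // work-group limit. The enqueue that follows intentionally passes NULL for
  // the local size and lets the runtime choose, because OpenCL 1.x requires
  // an explicit local size to divide the global size exactly.
  size_t sketchMaxWg = 0;
  clGetKernelWorkGroupInfo(kernel_, devices_[_deviceId],
                           CL_KERNEL_WORK_GROUP_SIZE, sizeof(sketchMaxWg),
                           &sketchMaxWg, NULL);
  size_t sketchLws[2] = {blockSizeX, blockSizeY};
  if (sketchLws[0] * sketchLws[1] > sketchMaxWg) {
    sketchLws[0] = sketchMaxWg;  // fall back to a 1-D group that fits
    sketchLws[1] = 1;
  }
  (void)sketchLws;  // illustration only; see the NULL local size below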
size_t global_work_offset[] = {0, 0}; size_t globalThreads[] = {WIDTH, HEIGHT + HEIGHT / 2}; size_t localThreads[] = {blockSizeX, blockSizeY}; // status = // clEnqueueNDRangeKernel(_queue,kernel_,2,NULL,globalThreads,localThreads,0,NULL,0); status = clEnqueueNDRangeKernel(_queue, kernel_, 2, NULL, globalThreads, NULL, 0, NULL, 0); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLImage() failed at clEnqueueNDRangeKernel"); status = clFinish(_queue); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLImage() failed at clFinish"); } void OCLDX11YUY2::CompileKernel() { cl_int status = 0; size_t kernelSize = sizeof(strKernel); const char *strs = (const char *)&strKernel[0]; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strs, &kernelSize, &status); status = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], NULL, NULL, NULL); if (status != CL_SUCCESS) { if (status == CL_BUILD_PROGRAM_FAILURE) { cl_int logStatus; size_t buildLogSize = 0; logStatus = clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, buildLogSize, NULL, &buildLogSize); std::string buildLog; buildLog.resize(buildLogSize); logStatus = clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, buildLogSize, &buildLog[0], NULL); printf("%s", buildLog.c_str()); } return; } // get a kernel object handle for a kernel with the given name kernel_ = _wrapper->clCreateKernel(program_, "image2imageCopy", &status); size_t kernel2DWorkGroupSize = 0; status = clGetKernelWorkGroupInfo(kernel_, devices_[_deviceId], CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &kernel2DWorkGroupSize, 0); if ((blockSizeX * blockSizeY) > kernel2DWorkGroupSize) { if (blockSizeX > kernel2DWorkGroupSize) { blockSizeX = kernel2DWorkGroupSize; blockSizeY = 1; } } } bool OCLDX11YUY2::formatSupported() { UINT supported = 0u; dxD3D11Device->CheckFormatSupport(dxFormat, (UINT *)&supported); return supported & D3D11_FORMAT_SUPPORT_TEXTURE2D; } clr-rocm-5.7.1/opencl/tests/ocltst/module/dx/OCLDX11YUY2.h000066400000000000000000000037631450307266000227650ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
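
CompileKernel() above retrieves the build log with the standard two-call pattern: query the size first, then fetch the contents. Reduced to its essentials as a hypothetical free function (assumes <CL/cl.h>, <cstdio>, and <string>):

    #include <CL/cl.h>
    #include <cstdio>
    #include <string>

    // Print the build log for prog on dev (sketch; error checks omitted).
    static void printBuildLog(cl_program prog, cl_device_id dev) {
      size_t len = 0;
      clGetProgramBuildInfo(prog, dev, CL_PROGRAM_BUILD_LOG, 0, NULL, &len);
      std::string log(len, '\0');
      clGetProgramBuildInfo(prog, dev, CL_PROGRAM_BUILD_LOG, len, &log[0],
                            NULL);
      printf("%s\n", log.c_str());
    }
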
*/ #ifndef _OCL_DX11_YUY2_H_ #define _OCL_DX11_YUY2_H_ #include "OCLDX11Common.h" class OCLDX11YUY2 : public OCLDX11Common { public: OCLDX11YUY2(); virtual ~OCLDX11YUY2(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual void run(void); virtual unsigned int close(void); protected: static const unsigned int WIDTH = 1280; static const unsigned int HEIGHT = 720; void testInterop(); void AllocateOpenCLImage(); bool CheckCLImage(cl_mem clImage); bool CheckCLImageY(cl_mem clImage); bool CheckCLImageUV(cl_mem clImage); void CopyOpenCLImage(cl_mem clImageSrc); void CompileKernel(); bool formatSupported(); void testFormat(); size_t blockSizeX; /**< Work-group size in x-direction */ size_t blockSizeY; /**< Work-group size in y-direction */ cl_mem clImage2DOut; DXGI_FORMAT dxFormat; }; #endif // _OCL_DX11_YUY2_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/dx/TestList.cpp000066400000000000000000000032261450307266000233610ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
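
TestList.cpp (next) registers each test through a small factory table: a function template instantiated per test class, paired with the stringized class name. The idiom in isolation, with illustrative stand-in types rather than the real ocltst ones:

    #include <cstddef>

    struct Entry { const char* name; void* (*create)(void); };

    template <typename T>
    static void* createTest(void) { return new T(); }

    struct MyTest {};  // stand-in for a concrete test class

    static Entry registry[] = {{"MyTest", &createTest<MyTest>}};
    static size_t registryCount = sizeof(registry) / sizeof(registry[0]);
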
*/ #include "OCLTestListImp.h" // // Includes for tests // #ifdef _WIN32 #include "OCLDX11YUY2.h" #endif // // Helper macro for adding tests // template <typename T> static void* dictionary_CreateTestFunc(void) { return new T(); } #define TEST(name) \ { #name, &dictionary_CreateTestFunc<name> } #ifdef _WIN32 TestEntry TestList[] = {TEST(OCLDX11YUY2)}; unsigned int TestListCount = sizeof(TestList) / sizeof(TestList[0]); #else TestEntry TestList[] = {{"void", 0}}; unsigned int TestListCount = 0; #endif unsigned int TestLibVersion = 0; const char* TestLibName = "ocldx"; clr-rocm-5.7.1/opencl/tests/ocltst/module/dx/ocldx.exclude000066400000000000000000000000141450307266000235680ustar00rootroot00000000000000# all clear clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/000077500000000000000000000000001450307266000210665ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/CMakeLists.txt000066400000000000000000000043241450307266000236310ustar00rootroot00000000000000set(TESTS OCLGLBuffer OCLGLBufferMultipleQueues OCLGLDepthBuffer OCLGLDepthTex OCLGLFenceSync OCLGLMsaaTexture OCLGLMultiContext OCLGLTexture OCLGLPerfSepia ) add_library(oclgl SHARED TestList.cpp $) foreach(TEST ${TESTS}) target_sources(oclgl PRIVATE ${TEST}.cpp) endforeach() target_sources(oclgl PRIVATE ${OCLTST_DIR}/module/common/OCLGLCommon.cpp ${OCLTST_DIR}/module/common/OCLGLCommonLinux.cpp) target_include_directories(oclgl PUBLIC ${OPENGL_INCLUDE_DIR} ${GLEW_INCLUDE_DIRS}) set_target_properties(oclgl PROPERTIES CXX_STANDARD 14 CXX_STANDARD_REQUIRED ON CXX_EXTENSIONS OFF RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst) target_compile_definitions(oclgl PRIVATE $) target_include_directories(oclgl PRIVATE $) target_link_libraries(oclgl PRIVATE OpenCL ${GLEW_LIBRARIES} ${OPENGL_LIBRARIES}) add_custom_command( TARGET oclgl POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_SOURCE_DIR}/oclgl.exclude ${CMAKE_BINARY_DIR}/tests/ocltst/ogl.exclude) add_custom_target(test.ocltst.oclgl COMMAND ${CMAKE_COMMAND} -E env "OCL_ICD_FILENAMES=$" $ -p 0 -m $ -A oclgl.exclude DEPENDS ocltst oclgl amdocl WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst USES_TERMINAL) foreach(TEST ${TESTS}) add_custom_target(test.ocltst.oclgl.${TEST} COMMAND ${CMAKE_COMMAND} -E env "OCL_ICD_FILENAMES=$" $ -p 0 -m $ -t ${TEST} DEPENDS ocltst oclgl amdocl WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst USES_TERMINAL) endforeach() INSTALL(TARGETS oclgl DESTINATION ${OCLTST_INSTALL_DIR} COMPONENT ocltst) INSTALL(FILES oclgl.exclude DESTINATION ${OCLTST_INSTALL_DIR} COMPONENT ocltst) clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLBuffer.cpp000066400000000000000000000211061450307266000235640ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
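
OCLGLBuffer.cpp, whose license begins above, exercises the core GL-buffer interop contract: GL work must be finished before CL acquires the object, and CL must release it (and drain the queue) before GL touches it again. The round trip in miniature, with hypothetical handles and error checks omitted:

    // Assumes ctx, q, kern, and glBuf already exist and gws is the work size.
    cl_int err = CL_SUCCESS;
    cl_mem clBuf = clCreateFromGLBuffer(ctx, CL_MEM_READ_WRITE, glBuf, &err);
    glFinish();                                  // finish pending GL work
    clEnqueueAcquireGLObjects(q, 1, &clBuf, 0, NULL, NULL);
    clEnqueueNDRangeKernel(q, kern, 1, NULL, &gws, NULL, 0, NULL, NULL);
    clEnqueueReleaseGLObjects(q, 1, &clBuf, 0, NULL, NULL);
    clFinish(q);                                 // now GL may use glBuf again
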
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLGLBuffer.h" #include #include #include #include const static char* strKernel = "__kernel void glbuffer_test( __global uint4 *source, __global uint4 " "*glDest, __global uint4 *clDest) \n" "{ " " \n" " int tid = get_global_id(0); " " \n" " clDest[ tid ] = source[ tid ] + (uint4)(1); " " \n" " glDest[ tid ] = source[ tid ] + (uint4)(2); " " \n" "} " " \n"; OCLGLBuffer::OCLGLBuffer() : inGLBuffer_(0), outGLBuffer_(0) { _numSubTests = 1; } OCLGLBuffer::~OCLGLBuffer() {} void OCLGLBuffer::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { // Initialize random number seed srand((unsigned int)time(NULL)); OCLGLCommon::open(test, units, conversion, deviceId); if (_errorFlag) return; // Build the kernel program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed (%d)", error_); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed (%d)", error_); kernel_ = _wrapper->clCreateKernel(program_, "glbuffer_test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed (%d)", error_); } void OCLGLBuffer::run(void) { if (_errorFlag) { return; } cl_mem buffer; cl_uint4 inData[c_numOfElements] = {{{0}}}; cl_uint4 outDataCL[c_numOfElements] = {{{0}}}; cl_uint4 outDataGL[c_numOfElements] = {{{0}}}; // Initialize input data with random values for (unsigned int i = 0; i < c_numOfElements; i++) { for (unsigned int j = 0; j < sizeof(cl_uint4) / sizeof(cl_uint); j++) { inData[i].s[j] = (unsigned int)rand(); } } // Generate and Bind in & out OpenGL buffers glGenBuffers(1, &inGLBuffer_); glGenBuffers(1, &outGLBuffer_); glBindBuffer(GL_ARRAY_BUFFER, inGLBuffer_); glBufferData(GL_ARRAY_BUFFER, c_numOfElements * sizeof(cl_uint4), inData, GL_STATIC_DRAW); glBindBuffer(GL_ARRAY_BUFFER, outGLBuffer_); glBufferData(GL_ARRAY_BUFFER, c_numOfElements * sizeof(cl_uint4), outDataGL, GL_STATIC_DRAW); glBindBuffer(GL_ARRAY_BUFFER, 0); glFinish(); // Create input buffer from GL input buffer buffer = _wrapper->clCreateFromGLBuffer(context_, CL_MEM_READ_ONLY, inGLBuffer_, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to create input GL buffer (%d)", error_); buffers_.push_back(buffer); // Create output buffer from GL output buffer buffer = _wrapper->clCreateFromGLBuffer(context_, CL_MEM_WRITE_ONLY, outGLBuffer_, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to create output GL buffer (%d)", error_); buffers_.push_back(buffer); // Create a CL output buffer buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, c_numOfElements * sizeof(cl_uint4), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed (%d)", error_); buffers_.push_back(buffer); // Assign args and execute for 
(unsigned int i = 0; i < buffers_.size(); i++) { error_ = _wrapper->clSetKernelArg(kernel_, i, sizeof(cl_mem), &buffers()[i]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed (%d)", error_); } error_ = _wrapper->clEnqueueAcquireGLObjects(cmdQueues_[_deviceId], 2, &buffers()[0], 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to acquire GL objects (%d)", error_); size_t gws[1] = {c_numOfElements}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed (%d)", error_); error_ = _wrapper->clEnqueueReleaseGLObjects(cmdQueues_[_deviceId], 2, &buffers()[0], 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReleaseGLObjects failed (%d)", error_); error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_RESULT((error_ != CL_SUCCESS), "clFinish() failed (%d)", error_); // Get the results from both CL and GL buffers error_ = _wrapper->clEnqueueReadBuffer( cmdQueues_[_deviceId], buffers()[2], CL_TRUE, 0, c_numOfElements * sizeof(cl_uint4), outDataCL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to read output CL array! (%d)", error_); glBindBuffer(GL_ARRAY_BUFFER, outGLBuffer_); void* glMem = glMapBuffer(GL_ARRAY_BUFFER, GL_READ_ONLY); memcpy(outDataGL, glMem, c_numOfElements * sizeof(cl_uint4)); glUnmapBuffer(GL_ARRAY_BUFFER); cl_uint4 expectedCL = {{0}}; cl_uint4 expectedGL = {{0}}; // Check output for (unsigned int i = 0; i < c_numOfElements; ++i) { // Calculate expected value in CL output buffer (input + 1) expectedCL = inData[i]; expectedCL.s[0]++; expectedCL.s[1]++; expectedCL.s[2]++; expectedCL.s[3]++; // Calculate expected value in GL output buffer (input + 2) expectedGL = inData[i]; expectedGL.s[0] += 2; expectedGL.s[1] += 2; expectedGL.s[2] += 2; expectedGL.s[3] += 2; // Compare expected output with actual data received for (unsigned int j = 0; j < sizeof(cl_uint4) / sizeof(cl_uint); j++) { CHECK_RESULT((outDataCL[i].s[j] != expectedCL.s[j]), "Element %d in CL output buffer is incorrect!\n\t \ expected:{%d, %d, %d, %d} differs from actual:{%d, %d, %d, %d}", i, expectedCL.s[0], expectedCL.s[1], expectedCL.s[2], expectedCL.s[3], outDataCL[i].s[0], outDataCL[i].s[1], outDataCL[i].s[2], outDataCL[i].s[3]); CHECK_RESULT((outDataGL[i].s[j] != expectedGL.s[j]), "Element %d in GL output buffer is incorrect!\n\t \ expected:{%d, %d, %d, %d} differs from actual:{%d, %d, %d, %d}", i, expectedGL.s[0], expectedGL.s[1], expectedGL.s[2], expectedGL.s[3], outDataGL[i].s[0], outDataGL[i].s[1], outDataGL[i].s[2], outDataGL[i].s[3]); } } } unsigned int OCLGLBuffer::close(void) { for (unsigned int i = 0; i < buffers().size(); ++i) { clReleaseMemObject(buffers()[i]); } buffers_.clear(); // Delete GL in & out buffers glBindBuffer(GL_ARRAY_BUFFER, 0); glDeleteBuffers(1, &inGLBuffer_); inGLBuffer_ = 0; glDeleteBuffers(1, &outGLBuffer_); outGLBuffer_ = 0; return OCLGLCommon::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLBuffer.h000066400000000000000000000030711450307266000232320ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_GL_BUFFER_H_ #define _OCL_GL_BUFFER_H_ #include "OCLGLCommon.h" class OCLGLBuffer : public OCLGLCommon { public: OCLGLBuffer(); virtual ~OCLGLBuffer(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual void run(void); virtual unsigned int close(void); private: static const unsigned int c_numOfElements = 1024; GLuint inGLBuffer_; GLuint outGLBuffer_; }; #endif // _OCL_GL_BUFFER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLBufferMultipleQueues.cpp000066400000000000000000000313031450307266000264700ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
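
The multiple-queues test that follows submits the same kernel on two queues of one context and flushes each queue so the submissions can overlap, then drains with blocking reads. The scheduling pattern by itself, sketched with hypothetical handles:

    // Two in-order queues on the same device and context.
    cl_int err = CL_SUCCESS;
    cl_command_queue q0 = clCreateCommandQueue(ctx, dev, 0, &err);
    cl_command_queue q1 = clCreateCommandQueue(ctx, dev, 0, &err);
    clEnqueueNDRangeKernel(q0, kern, 1, NULL, &gws, NULL, 0, NULL, NULL);
    clFlush(q0);  // submit without blocking so q1's work can start too
    clEnqueueNDRangeKernel(q1, kern, 1, NULL, &gws, NULL, 0, NULL, NULL);
    clFlush(q1);
    clEnqueueReadBuffer(q0, out0, CL_TRUE, 0, bytes, host0, 0, NULL, NULL);
    clEnqueueReadBuffer(q1, out1, CL_TRUE, 0, bytes, host1, 0, NULL, NULL);
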
*/ #include "OCLGLBufferMultipleQueues.h" #include #include #include #include const static char* strKernel = "__kernel void glbuffer_test( __global uint4 *source, __global uint4 " "*glDest, __global uint4 *clDest) \n" "{                                                                  " "    \n" "  int tid = get_global_id(0);                                      " "    \n" "  glDest[ tid ] = source[ tid ] + (uint4)(2);                      " "    \n" "  clDest[ tid ] = source[ tid ] + (uint4)(1);                      " "    \n" "}                                                                  " "    \n"; OCLGLBufferMultipleQueues::OCLGLBufferMultipleQueues() { _numSubTests = 1; } OCLGLBufferMultipleQueues::~OCLGLBufferMultipleQueues() {} void OCLGLBufferMultipleQueues::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { // Initialize random number seed srand((unsigned int)time(NULL)); OCLGLCommon::open(test, units, conversion, deviceId); if (_errorFlag) return; // Create multiple queues for the device (first add already created queue in // OCLGLCommon::open, then add a second queue) deviceCmdQueues_.resize(QUEUES_PER_DEVICE_COUNT); deviceCmdQueues_[0] = cmdQueues_[deviceId]; for (int queueIndex = 1; queueIndex < QUEUES_PER_DEVICE_COUNT; queueIndex++) { cl_command_queue cmdQueue = _wrapper->clCreateCommandQueue( context_, devices_[deviceId], 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueue() failed"); deviceCmdQueues_[queueIndex] = cmdQueue; } // Build the kernel program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed (%d)", error_); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed (%d)", error_); kernel_ = _wrapper->clCreateKernel(program_, "glbuffer_test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed (%d)", error_); } void OCLGLBufferMultipleQueues::run(void) { if (_errorFlag) { return; } inputGLBufferPerQueue_.resize(QUEUES_PER_DEVICE_COUNT, NULL); outputGLBufferPerQueue_.resize(QUEUES_PER_DEVICE_COUNT, NULL); outputCLBufferPerQueue_.resize(QUEUES_PER_DEVICE_COUNT, NULL); std::vector<std::vector<cl_uint4> > inData( QUEUES_PER_DEVICE_COUNT); // Input data per queue inGLBufferIDs_.resize(QUEUES_PER_DEVICE_COUNT, 0); outGLBufferIDs_.resize(QUEUES_PER_DEVICE_COUNT, 0); for (int queueIndex = 0; queueIndex < QUEUES_PER_DEVICE_COUNT; queueIndex++) { // Initialize input data with random values inData[queueIndex].resize(BUFFER_ELEMENTS_COUNT); for (int i = 0; i < BUFFER_ELEMENTS_COUNT; i++) { for (unsigned int j = 0; j < sizeof(cl_uint4) / sizeof(cl_uint); j++) { inData[queueIndex][i].s[j] = (unsigned int)rand(); } } // Generate and Bind in & out OpenGL buffers glGenBuffers(1, &inGLBufferIDs_[queueIndex]); glGenBuffers(1, &outGLBufferIDs_[queueIndex]); glBindBuffer(GL_ARRAY_BUFFER, inGLBufferIDs_[queueIndex]); glBufferData(GL_ARRAY_BUFFER, BUFFER_ELEMENTS_COUNT * sizeof(cl_uint4), &inData[queueIndex][0], GL_STATIC_DRAW); glBindBuffer(GL_ARRAY_BUFFER, outGLBufferIDs_[queueIndex]); glBufferData(GL_ARRAY_BUFFER, BUFFER_ELEMENTS_COUNT * sizeof(cl_uint4), NULL, GL_STATIC_DRAW); glBindBuffer(GL_ARRAY_BUFFER, 0); glFinish(); // Create input buffer from GL input buffer inputGLBufferPerQueue_[queueIndex] = _wrapper->clCreateFromGLBuffer( context_, CL_MEM_READ_ONLY, inGLBufferIDs_[queueIndex], &error_); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to create input GL 
buffer (%d)", error_); // Create output buffer from GL output buffer outputGLBufferPerQueue_[queueIndex] = _wrapper->clCreateFromGLBuffer( context_, CL_MEM_WRITE_ONLY, outGLBufferIDs_[queueIndex], &error_); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to create output GL buffer (%d)", error_); // Create a CL output buffer outputCLBufferPerQueue_[queueIndex] = _wrapper->clCreateBuffer( context_, CL_MEM_WRITE_ONLY, BUFFER_ELEMENTS_COUNT * sizeof(cl_uint4), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed (%d)", error_); } for (int queueIndex = 0; queueIndex < QUEUES_PER_DEVICE_COUNT; queueIndex++) { // Assign arguments to kernel according to queue index error_ = _wrapper->clSetKernelArg( kernel_, 0, sizeof(cl_mem), &inputGLBufferPerQueue_[queueIndex]); // Input source CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed (%d)", error_); error_ = _wrapper->clSetKernelArg( kernel_, 1, sizeof(cl_mem), &outputGLBufferPerQueue_[queueIndex]); // Output glDest CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed (%d)", error_); error_ = _wrapper->clSetKernelArg( kernel_, 2, sizeof(cl_mem), &outputCLBufferPerQueue_[queueIndex]); // Output clDest CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed (%d)", error_); // Acquire input GL buffer error_ = _wrapper->clEnqueueAcquireGLObjects( deviceCmdQueues_[queueIndex], 1, &inputGLBufferPerQueue_[queueIndex], 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to acquire GL objects (%d)", error_); // Acquire output GL buffer error_ = _wrapper->clEnqueueAcquireGLObjects( deviceCmdQueues_[queueIndex], 1, &outputGLBufferPerQueue_[queueIndex], 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to acquire GL objects (%d)", error_); // Enqueue the kernel size_t gws[1] = {BUFFER_ELEMENTS_COUNT}; error_ = _wrapper->clEnqueueNDRangeKernel(deviceCmdQueues_[queueIndex], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed (%d)", error_); // Release input GL buffer error_ = _wrapper->clEnqueueReleaseGLObjects( deviceCmdQueues_[queueIndex], 1, &inputGLBufferPerQueue_[queueIndex], 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReleaseGLObjects failed (%d)", error_); // Release output GL buffer error_ = _wrapper->clEnqueueReleaseGLObjects( deviceCmdQueues_[queueIndex], 1, &outputGLBufferPerQueue_[queueIndex], 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReleaseGLObjects failed (%d)", error_); // Flush commands in order to trigger the operations error_ = _wrapper->clFlush(deviceCmdQueues_[queueIndex]); CHECK_RESULT((error_ != CL_SUCCESS), "clFlush() failed (%d)", error_); } for (int queueIndex = 0; queueIndex < QUEUES_PER_DEVICE_COUNT; queueIndex++) { // Get the results from CL buffer (in a synchronous manner) cl_uint4 outDataCL[BUFFER_ELEMENTS_COUNT]; error_ = _wrapper->clEnqueueReadBuffer( deviceCmdQueues_[queueIndex], outputCLBufferPerQueue_[queueIndex], CL_TRUE, 0, BUFFER_ELEMENTS_COUNT * sizeof(cl_uint4), outDataCL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to read output CL array! 
(%d)", error_); cl_uint4 outDataGL[BUFFER_ELEMENTS_COUNT] = {{{0}}}; glBindBuffer(GL_ARRAY_BUFFER, outGLBufferIDs_[queueIndex]); // why again void* glMem = glMapBuffer(GL_ARRAY_BUFFER, GL_READ_ONLY); memcpy(outDataGL, glMem, BUFFER_ELEMENTS_COUNT * sizeof(cl_uint4)); glUnmapBuffer(GL_ARRAY_BUFFER); cl_uint4 expectedCL = {{0}}; cl_uint4 expectedGL = {{0}}; // Check output for (int i = 0; i < BUFFER_ELEMENTS_COUNT; ++i) { // Calculate expected value in CL output buffer (input + 1) expectedCL = inData[queueIndex][i]; expectedCL.s[0]++; expectedCL.s[1]++; expectedCL.s[2]++; expectedCL.s[3]++; // Calculate expected value in GL output buffer (input + 2) expectedGL = inData[queueIndex][i]; expectedGL.s[0] += 2; expectedGL.s[1] += 2; expectedGL.s[2] += 2; expectedGL.s[3] += 2; // Compare expected output with actual data received for (unsigned int j = 0; j < sizeof(cl_uint4) / sizeof(cl_uint); j++) { CHECK_RESULT((outDataCL[i].s[j] != expectedCL.s[j]), "Element %d in CL output buffer is incorrect!\n\t \ expected:{%d, %d, %d, %d} differs from actual:{%d, %d, %d, %d}", i, expectedCL.s[0], expectedCL.s[1], expectedCL.s[2], expectedCL.s[3], outDataCL[i].s[0], outDataCL[i].s[1], outDataCL[i].s[2], outDataCL[i].s[3]); CHECK_RESULT((outDataGL[i].s[j] != expectedGL.s[j]), "Element %d in GL output buffer is incorrect!\n\t \ expected:{%d, %d, %d, %d} differs from actual:{%d, %d, %d, %d}", i, expectedGL.s[0], expectedGL.s[1], expectedGL.s[2], expectedGL.s[3], outDataGL[i].s[0], outDataGL[i].s[1], outDataGL[i].s[2], outDataGL[i].s[3]); } } } } unsigned int OCLGLBufferMultipleQueues::close(void) { // Release cl buffers (must be done before releasing the associated GL // buffers) for (int bufferIndex = 0; bufferIndex < (int)inputGLBufferPerQueue_.size(); bufferIndex++) { error_ = _wrapper->clReleaseMemObject(inputGLBufferPerQueue_[bufferIndex]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseMemObject() failed"); } for (int bufferIndex = 0; bufferIndex < (int)outputGLBufferPerQueue_.size(); bufferIndex++) { error_ = _wrapper->clReleaseMemObject(outputGLBufferPerQueue_[bufferIndex]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseMemObject() failed"); } for (int bufferIndex = 0; bufferIndex < (int)outputCLBufferPerQueue_.size(); bufferIndex++) { error_ = _wrapper->clReleaseMemObject(outputCLBufferPerQueue_[bufferIndex]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseMemObject() failed"); } // Delete GL in & out buffers glBindBuffer(GL_ARRAY_BUFFER, 0); if (!inGLBufferIDs_.empty()) { glDeleteBuffers((int)inGLBufferIDs_.size(), &inGLBufferIDs_[0]); } if (!outGLBufferIDs_.empty()) { glDeleteBuffers((int)outGLBufferIDs_.size(), &outGLBufferIDs_[0]); } // Release queues created by open method, the first queue per device is // released by base class for (int queueIndex = 1; queueIndex < (int)deviceCmdQueues_.size(); queueIndex++) { error_ = _wrapper->clReleaseCommandQueue(deviceCmdQueues_[queueIndex]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseCommandQueue() failed"); } deviceCmdQueues_.clear(); return OCLGLCommon::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLBufferMultipleQueues.h000066400000000000000000000041631450307266000261410ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_GL_BUFFER_MULTIPLE_QUEUES_H_ #define _OCL_GL_BUFFER_MULTIPLE_QUEUES_H_ #include "OCLGLCommon.h" class OCLGLBufferMultipleQueues : public OCLGLCommon { public: OCLGLBufferMultipleQueues(); virtual ~OCLGLBufferMultipleQueues(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual void run(void); virtual unsigned int close(void); private: static const int BUFFER_ELEMENTS_COUNT = 1024; static const int QUEUES_PER_DEVICE_COUNT = 2; std::vector<cl_command_queue> deviceCmdQueues_; // Multiple queues per device (single device) std::vector<cl_mem> inputGLBufferPerQueue_; // Input GL buffer per queue std::vector<cl_mem> outputGLBufferPerQueue_; // Output GL buffer per queue std::vector<cl_mem> outputCLBufferPerQueue_; // Output CL buffer per queue std::vector<GLuint> inGLBufferIDs_; // Input GL buffers IDs std::vector<GLuint> outGLBufferIDs_; // Output GL buffers IDs }; #endif // _OCL_GL_BUFFER_MULTIPLE_QUEUES_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLDepthBuffer.cpp000066400000000000000000000223661450307266000245600ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
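
OCLGLDepthBuffer.cpp (next) shares a GL depth renderbuffer with CL, which needs the cl_khr_gl_depth_images extension the test probes for. The sharing step in isolation, with hypothetical handles and GLEW or an equivalent loader assumed for the GL entry points:

    // Create a 32-bit float depth renderbuffer and wrap it as a CL image.
    GLuint rb = 0;
    glGenRenderbuffers(1, &rb);
    glBindRenderbuffer(GL_RENDERBUFFER, rb);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT32F, w, h);
    cl_int err = CL_SUCCESS;
    cl_mem depthImg =
        clCreateFromGLRenderbuffer(ctx, CL_MEM_READ_ONLY, rb, &err);
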
*/ #include "OCLGLDepthBuffer.h" #include #include #include #include const static char* strKernel = "#pragma OPENCL EXTENSION cl_amd_printf : enable\n" "__kernel void gldepths_test( __global float *output, read_only image2d_t " "source, sampler_t sampler){ \n" " int tidX = get_global_id(0);\n" " int tidY = get_global_id(1);\n" " float4 value = read_imagef( source, sampler, (int2)( tidX, tidY ) );\n" " output[ tidY * get_image_width( source ) + tidX ] = value.z;\n" "}\n"; OCLGLDepthBuffer::OCLGLDepthBuffer() : glDepthBuffer_(0), frameBufferOBJ_(0), colorBuffer_(0), clOutputBuffer_(0), clDepth_(0), clSampler_(0), pGLOutput_(0), pCLOutput_(0), extensionSupported_(false) { _numSubTests = 2; _currentTest = 0; } OCLGLDepthBuffer::~OCLGLDepthBuffer() {} void OCLGLDepthBuffer::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLGLCommon::open(test, units, conversion, deviceId); if (_errorFlag) return; char* pExtensions = (char*)malloc(8192); size_t returnSize; _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_EXTENSIONS, 8192, pExtensions, &returnSize); // if extension if not supported if (!strstr(pExtensions, "cl_khr_gl_depth_images")) { printf("skipping test depth interop not supported\n"); free(pExtensions); return; } free(pExtensions); extensionSupported_ = true; _currentTest = test; // Build the kernel program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed (%d)", error_); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed (%d)", error_); kernel_ = _wrapper->clCreateKernel(program_, "gldepths_test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed (%d)", error_); } void OCLGLDepthBuffer::run(void) { if (_errorFlag || !extensionSupported_) { return; } bool retVal; switch (_currentTest) { case 0: retVal = testDepthRead(GL_DEPTH_COMPONENT32F, GL_DEPTH_ATTACHMENT); break; case 1: retVal = testDepthRead(GL_DEPTH_COMPONENT16, GL_DEPTH_ATTACHMENT); break; case 2: retVal = testDepthRead(GL_DEPTH24_STENCIL8, GL_DEPTH_STENCIL_ATTACHMENT); break; case 3: retVal = testDepthRead(GL_DEPTH32F_STENCIL8, GL_DEPTH_STENCIL_ATTACHMENT); break; default: CHECK_RESULT(true, "unsupported test number\n"); } CHECK_RESULT((retVal != true), "cl-gl depth test failed "); } bool OCLGLDepthBuffer::testDepthRead(GLint internalFormat, GLenum attachmentType) { cl_int error; size_t dimSizes[] = {c_dimSize, c_dimSize}; unsigned int bufferSize = c_dimSize * c_dimSize * 4; bool retVal = false; pGLOutput_ = (float*)malloc(bufferSize); pCLOutput_ = (float*)malloc(bufferSize); // create Frame buffer object glGenFramebuffers(1, &frameBufferOBJ_); // create textures glGenTextures(1, &colorBuffer_); glEnable(GL_TEXTURE_2D); glBindTexture(GL_TEXTURE_2D, colorBuffer_); glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, c_dimSize, c_dimSize, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0); glBindTexture(GL_TEXTURE_2D, 0); // create a renderbuffer for the depth/stencil buffer glGenRenderbuffers(1, &glDepthBuffer_); glBindRenderbuffer(GL_RENDERBUFFER, glDepthBuffer_); glRenderbufferStorage(GL_RENDERBUFFER, internalFormat, c_dimSize, c_dimSize); // glBindFramebuffer(GL_FRAMEBUFFER, frameBufferOBJ_); 
glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, colorBuffer_, 0); glFramebufferRenderbuffer(GL_FRAMEBUFFER, attachmentType, GL_RENDERBUFFER, glDepthBuffer_); GLenum status = glCheckFramebufferStatus(GL_FRAMEBUFFER); if (GL_FRAMEBUFFER_COMPLETE != status) { return false; } // set up gl state machine glViewport(0, 0, c_dimSize, c_dimSize); // Reset The Current Viewport glMatrixMode(GL_PROJECTION); // Select The Projection Matrix glLoadIdentity(); // Reset The Projection Matrix gluPerspective(30.0f, (GLfloat)c_dimSize / (GLfloat)c_dimSize, 0.1f, 100.0f); glMatrixMode(GL_MODELVIEW); // Select The Modelview Matrix glLoadIdentity(); glEnable(GL_DEPTH_TEST); // The Type Of Depth Testing To Do glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // Clear Screen And Depth Buffer glBegin(GL_QUADS); // Draw A Quad glVertex3f(-1.0f, 1.0f, -6.0f); // Top Left glVertex3f(1.0f, 1.0f, -6.0f); // Top Right glVertex3f(1.0f, -1.0f, -3.0f); // Bottom Right glVertex3f(-1.0f, -1.0f, -3.0f); // Bottom Left glEnd(); glFinish(); clDepth_ = _wrapper->clCreateFromGLRenderbuffer(context_, CL_MEM_READ_WRITE, glDepthBuffer_, &error); if (CL_SUCCESS != error) { printf("clCreateFromGLRenderbuffer failed\n"); return false; } clOutputBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, bufferSize, NULL, &error); if (CL_SUCCESS != error) return false; clSampler_ = _wrapper->clCreateSampler(context_, CL_FALSE, CL_ADDRESS_NONE, CL_FILTER_NEAREST, &error); if (CL_SUCCESS != error) return false; error = _wrapper->clEnqueueAcquireGLObjects(cmdQueues_[_deviceId], 1, &clDepth_, 0, NULL, NULL); _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &clOutputBuffer_); _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), &clDepth_); _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_sampler), &clSampler_); _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 2, NULL, dimSizes, NULL, 0, NULL, NULL); _wrapper->clEnqueueReleaseGLObjects(cmdQueues_[_deviceId], 1, &clDepth_, 0, NULL, NULL); _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], clOutputBuffer_, CL_TRUE, 0, bufferSize, pCLOutput_, 0, NULL, NULL); glReadPixels(0, 0, c_dimSize, c_dimSize, GL_DEPTH_COMPONENT, GL_FLOAT, pGLOutput_); // test that both resources are identical. if (0 == memcmp(pGLOutput_, pCLOutput_, bufferSize)) { retVal = true; // test successful } else { printf("expected results is different from actual results\n"); dumpBuffer(pGLOutput_, "GLDepth.csv", c_dimSize); dumpBuffer(pCLOutput_, "CLDepth.csv", c_dimSize); } return retVal; } unsigned int OCLGLDepthBuffer::close(void) { if (pGLOutput_) { free(pGLOutput_); pGLOutput_ = NULL; } if (pCLOutput_) { free(pCLOutput_); pCLOutput_ = NULL; } clReleaseMemObject(clDepth_); clReleaseMemObject(clOutputBuffer_); clReleaseSampler(clSampler_); // unbind the texture and frame buffer. 
glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, 0, 0); glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, 0, 0); glBindFramebuffer(GL_FRAMEBUFFER, 0); // clean gl resources glDeleteFramebuffers(1, &frameBufferOBJ_); frameBufferOBJ_ = 0; glDeleteTextures(1, &colorBuffer_); colorBuffer_ = 0; glDeleteTextures(1, &glDepthBuffer_); glDepthBuffer_ = 0; return OCLGLCommon::close(); } // helper functions unsigned int OCLGLDepthBuffer::formatToSize(GLint internalFormat) { switch (internalFormat) { case GL_DEPTH_COMPONENT32F: return 4; break; case GL_DEPTH_COMPONENT16: return 2; break; case GL_DEPTH24_STENCIL8: return 4; break; case GL_DEPTH32F_STENCIL8: return 8; break; default: return 0; } } clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLDepthBuffer.h000066400000000000000000000043571450307266000242270ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_GL_DEPTH_BUFFER_H_ #define _OCL_GL_DEPTH_BUFFER_H_ #include "OCLGLCommon.h" class OCLGLDepthBuffer : public OCLGLCommon { public: OCLGLDepthBuffer(); virtual ~OCLGLDepthBuffer(); static const unsigned int c_dimSize = 128; virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual void run(void); virtual unsigned int close(void); private: //////////////////// // test functions // //////////////////// bool testDepthRead(GLint internalFormat, GLenum attachmentType); unsigned int _currentTest; ///////////////////// // private members // ///////////////////// // GL resource identifiers GLuint glDepthBuffer_; GLuint frameBufferOBJ_; GLuint colorBuffer_; // CL identifiers cl_mem clOutputBuffer_; cl_mem clDepth_; cl_sampler clSampler_; // pointers to buffers float* pGLOutput_; float* pCLOutput_; bool extensionSupported_; ////////////////////////////// // private helper functions // ////////////////////////////// // returns element size in bytes. static unsigned int formatToSize(GLint internalFormat); }; #endif // _OCL_GL_BUFFER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLDepthTex.cpp000066400000000000000000000227561450307266000241140ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
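
Both depth tests compare the GL and CL readbacks bit-for-bit with memcmp, the strictest possible check for float data. If a tolerance were ever needed, say for formats that round-trip through 16 bits, a per-texel compare is the usual fallback (hypothetical helper, not used by the tests):

    #include <cmath>
    #include <cstddef>

    // True if every element of a and b agrees within eps.
    static bool nearlyEqual(const float* a, const float* b, size_t n,
                            float eps) {
      for (size_t i = 0; i < n; ++i) {
        if (std::fabs(a[i] - b[i]) > eps) return false;
      }
      return true;
    }
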
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLGLDepthTex.h" #include #include #include #include #include const static char* strKernel = "__kernel void gldepths_test( __global float *output, read_only image2d_t " "source, sampler_t sampler){ \n" " int tidX = get_global_id(0);\n" " int tidY = get_global_id(1);\n" " float4 value = read_imagef( source, sampler, (int2)( tidX, tidY ) );\n" " output[ tidY * get_image_width( source ) + tidX ] = value.z;\n" "}\n"; OCLGLDepthTex::OCLGLDepthTex() : glDepthBuffer_(0), frameBufferOBJ_(0), colorBuffer_(0), clOutputBuffer_(0), clDepth_(0), clSampler_(0), pGLOutput_(0), pCLOutput_(0), extensionSupported_(false) { _numSubTests = 8; _currentTest = 0; } OCLGLDepthTex::~OCLGLDepthTex() {} void OCLGLDepthTex::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLGLCommon::open(test, units, conversion, deviceId); if (_errorFlag) return; char* pExtensions = (char*)malloc(8192); size_t returnSize; _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_EXTENSIONS, 8192, pExtensions, &returnSize); // if extension if not supported if (!strstr(pExtensions, "cl_khr_gl_depth_images")) { free(pExtensions); printf("skipping test depth interop not supported\n"); return; } free(pExtensions); extensionSupported_ = true; static const char* OpenCL20Kernel = "-cl-std=CL2.0"; const char* options = OpenCL20Kernel; if (test < 4) { options = NULL; } _currentTest = test % 4; // Build the kernel program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed (%d)", error_); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], options, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed (%d)", error_); kernel_ = _wrapper->clCreateKernel(program_, "gldepths_test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed (%d)", error_); } void OCLGLDepthTex::run(void) { if (_errorFlag || !extensionSupported_) { return; } bool retVal; switch (_currentTest) { case 0: retVal = testDepthRead(GL_DEPTH24_STENCIL8, GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8); break; case 1: retVal = testDepthRead(GL_DEPTH_COMPONENT16, GL_DEPTH_COMPONENT, GL_FLOAT); break; case 2: retVal = testDepthRead(GL_DEPTH_COMPONENT32F, GL_DEPTH_COMPONENT, GL_FLOAT); break; 
case 3: retVal = testDepthRead(GL_DEPTH32F_STENCIL8, GL_DEPTH_STENCIL, GL_FLOAT_32_UNSIGNED_INT_24_8_REV); break; default: CHECK_RESULT(true, "unsupported test number\n"); } CHECK_RESULT((retVal != true), "cl-gl depth test failed "); } bool OCLGLDepthTex::testDepthRead(GLint internalFormat, GLenum format, GLenum type) { const unsigned int bufferSize = c_dimSize * c_dimSize * 4; pGLOutput_ = (float*)malloc(bufferSize); pCLOutput_ = (float*)malloc(bufferSize); size_t dimSizes[] = {c_dimSize, c_dimSize}; bool retVal = false; // create Frame buffer object glGenFramebuffers(1, &frameBufferOBJ_); glBindFramebuffer(GL_FRAMEBUFFER, frameBufferOBJ_); // create textures glGenTextures(1, &colorBuffer_); glBindTexture(GL_TEXTURE_2D, colorBuffer_); glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, c_dimSize, c_dimSize, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0); glGenTextures(1, &glDepthBuffer_); glBindTexture(GL_TEXTURE_2D, glDepthBuffer_); glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, c_dimSize, c_dimSize, 0, format, type, 0); GLint glError = glGetError(); // glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, colorBuffer_, 0); if (GL_DEPTH_COMPONENT == format) { glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, glDepthBuffer_, 0); } else { glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT, glDepthBuffer_, 0); } glBindFramebuffer(GL_FRAMEBUFFER, frameBufferOBJ_); GLenum status = glCheckFramebufferStatus(GL_FRAMEBUFFER); if (GL_FRAMEBUFFER_COMPLETE != status) { printf("frame buffer incomplete!\n"); return false; } // set up gl state machine glViewport(0, 0, c_dimSize, c_dimSize); // Reset The Current Viewport glMatrixMode(GL_PROJECTION); // Select The Projection Matrix glLoadIdentity(); // Reset The Projection Matrix gluPerspective(30.0f, (GLfloat)c_dimSize / (GLfloat)c_dimSize, 0.1f, 100.0f); glMatrixMode(GL_MODELVIEW); // Select The Modelview Matrix glLoadIdentity(); glEnable(GL_DEPTH_TEST); glBindFramebuffer(GL_FRAMEBUFFER, frameBufferOBJ_); cl_int error; clOutputBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, bufferSize, NULL, &error); if (CL_SUCCESS != error) return false; clSampler_ = _wrapper->clCreateSampler(context_, CL_FALSE, CL_ADDRESS_NONE, CL_FILTER_NEAREST, &error); if (CL_SUCCESS != error) return false; clDepth_ = _wrapper->clCreateFromGLTexture( context_, CL_MEM_READ_ONLY, GL_TEXTURE_2D, 0, glDepthBuffer_, &error); if (CL_SUCCESS != error) return false; for (int i = 0; i < 3; ++i) { // The Type Of Depth Testing To Do glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // Clear Screen And Depth Buffer const float zValues[3][2] = { {-6.f, -3.f}, {-5.f, -2.f}, {-4.f, -1.f}, }; glBegin(GL_QUADS); // Draw A Quad glVertex3f(-1.0f, 1.0f, zValues[i][0]); // Top Left glVertex3f(1.0f, 1.0f, zValues[i][0]); // Top Right glVertex3f(1.0f, -1.0f, zValues[i][1]); // Bottom Right glVertex3f(-1.0f, -1.0f, zValues[i][1]); // Bottom Left glEnd(); glFinish(); error = _wrapper->clEnqueueAcquireGLObjects(cmdQueues_[_deviceId], 1, &clDepth_, 0, NULL, NULL); _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &clOutputBuffer_); _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), &clDepth_); _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_sampler), &clSampler_); _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 2, NULL, dimSizes, NULL, 0, NULL, NULL); _wrapper->clEnqueueReleaseGLObjects(cmdQueues_[_deviceId], 1, &clDepth_, 0, NULL, NULL); _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], clOutputBuffer_, CL_TRUE, 0, bufferSize, pCLOutput_, 0, NULL, NULL); 
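    // The blocking CL read above drains the queue; glReadPixels below pulls
    // the same depth attachment back from GL so the two buffers can be
    // compared byte-for-byte with memcmp().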
    glReadPixels(0, 0, c_dimSize, c_dimSize, GL_DEPTH_COMPONENT, GL_FLOAT,
                 pGLOutput_);

    // test that both resources are identical.
    if (0 == memcmp(pGLOutput_, pCLOutput_, bufferSize)) {
      retVal = true;  // test successful
    } else {
      printf("expected results differ from actual results\n");
      dumpBuffer(pGLOutput_, "GLDepth.csv", c_dimSize);
      dumpBuffer(pCLOutput_, "clDepth_.csv", c_dimSize);
    }
  }
  return retVal;
}

unsigned int OCLGLDepthTex::close(void) {
  if (pGLOutput_) {
    free(pGLOutput_);
    pGLOutput_ = NULL;
  }
  if (pCLOutput_) {
    free(pCLOutput_);
    pCLOutput_ = NULL;
  }
  clReleaseMemObject(clDepth_);
  clReleaseMemObject(clOutputBuffer_);
  clReleaseSampler(clSampler_);
  // unbind the texture and frame buffer.
  glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, 0, 0);
  glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, 0, 0);
  glBindFramebuffer(GL_FRAMEBUFFER, 0);
  // clean gl resources
  glDeleteFramebuffers(1, &frameBufferOBJ_);
  frameBufferOBJ_ = 0;
  glDeleteTextures(1, &colorBuffer_);
  colorBuffer_ = 0;
  glDeleteTextures(1, &glDepthBuffer_);
  glDepthBuffer_ = 0;
  return OCLGLCommon::close();
}
clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLDepthTex.h000066400000000000000000000040451450307266000235500ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

#ifndef _OCL_GL_DEPTH_TEX_H_
#define _OCL_GL_DEPTH_TEX_H_

#include "OCLGLCommon.h"

class OCLGLDepthTex : public OCLGLCommon {
 public:
  OCLGLDepthTex();
  virtual ~OCLGLDepthTex();

  static const unsigned int c_dimSize = 128;

  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceId);
  virtual void run(void);
  virtual unsigned int close(void);

 private:
  ////////////////////
  // test functions //
  ////////////////////
  bool testDepthRead(GLint internalFormat, GLenum format, GLenum type);
  unsigned int _currentTest;

  /////////////////////
  // private members //
  /////////////////////
  // GL resource identifiers
  GLuint glDepthBuffer_;
  GLuint frameBufferOBJ_;
  GLuint colorBuffer_;
  // CL identifiers
  cl_mem clOutputBuffer_;
  cl_mem clDepth_;
  cl_sampler clSampler_;
  // pointers to buffers
  float* pGLOutput_;
  float* pCLOutput_;
  bool extensionSupported_;
};

#endif  // _OCL_GL_DEPTH_TEX_H_
clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLFenceSync.cpp000066400000000000000000000423301450307266000242320ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLGLFenceSync.h" #include #include #include #include #include "Timer.h" #ifndef WIN_OS #include #endif const static char *strKernel = "__kernel void glmulticontext_test( __global uint4 *source, __global uint4 " "*dest) \n" "{ " " \n" " int tid = get_global_id(0); " " \n" " dest[ tid ] = source [ tid ] + (uint4)(1); " " \n" "} " " \n"; OCLGLFenceSync::OCLGLFenceSync() { memset(contextData_, 0, sizeof(contextData_)); _numSubTests = 2; } OCLGLFenceSync::~OCLGLFenceSync() {} #ifdef WIN_OS typedef GLsync(__stdcall *glFenceSyncPtr)(GLenum condition, GLbitfield flags); typedef bool(__stdcall *glIsSyncPtr)(GLsync sync); typedef void(__stdcall *glDeleteSyncPtr)(GLsync sync); typedef GLenum(__stdcall *glClientWaitSyncPtr)(GLsync sync, GLbitfield flags, GLuint64 timeout); typedef void(__stdcall *glWaitSyncPtr)(GLsync sync, GLbitfield flags, GLuint64 timeout); typedef void(__stdcall *glGetInteger64vPtr)(GLenum pname, GLint64 *params); typedef void(__stdcall *glGetSyncivPtr)(GLsync sync, GLenum pname, GLsizei bufSize, GLsizei *length, GLint *values); #else typedef GLsync (*glFenceSyncPtr)(GLenum condition, GLbitfield flags); typedef bool (*glIsSyncPtr)(GLsync sync); typedef void (*glDeleteSyncPtr)(GLsync sync); typedef GLenum (*glClientWaitSyncPtr)(GLsync sync, GLbitfield flags, GLuint64 timeout); typedef void (*glWaitSyncPtr)(GLsync sync, GLbitfield flags, GLuint64 timeout); typedef void (*glGetInteger64vPtr)(GLenum pname, GLint64 *params); typedef void (*glGetSyncivPtr)(GLsync sync, GLenum pname, GLsizei bufSize, GLsizei *length, GLint *values); #endif typedef struct __GLsync *GLsync; glFenceSyncPtr glFenceSyncFunc; glIsSyncPtr glIsSyncFunc; glDeleteSyncPtr glDeleteSyncFunc; glClientWaitSyncPtr glClientWaitSyncFunc; glWaitSyncPtr glWaitSyncFunc; glGetInteger64vPtr glGetInteger64vFunc; glGetSyncivPtr glGetSyncivFunc; #define CHK_GL_ERR() printf("%s\n", gluErrorString(glGetError())) #define cl_khr_gl_event 1 static void InitSyncFns() { #ifdef WIN_OS glFenceSyncFunc = (glFenceSyncPtr)wglGetProcAddress("glFenceSync"); glIsSyncFunc = (glIsSyncPtr)wglGetProcAddress("glIsSync"); glDeleteSyncFunc = (glDeleteSyncPtr)wglGetProcAddress("glDeleteSync"); glClientWaitSyncFunc = (glClientWaitSyncPtr)wglGetProcAddress("glClientWaitSync"); glWaitSyncFunc = (glWaitSyncPtr)wglGetProcAddress("glWaitSync"); glGetInteger64vFunc = (glGetInteger64vPtr)wglGetProcAddress("glGetInteger64v"); glGetSyncivFunc = (glGetSyncivPtr)wglGetProcAddress("glGetSynciv"); #else glFenceSyncFunc = 
(glFenceSyncPtr)glXGetProcAddress((GLubyte *)"glFenceSync"); glIsSyncFunc = (glIsSyncPtr)glXGetProcAddress((GLubyte *)"glIsSync"); glDeleteSyncFunc = (glDeleteSyncPtr)glXGetProcAddress((GLubyte *)"glDeleteSync"); glClientWaitSyncFunc = (glClientWaitSyncPtr)glXGetProcAddress((GLubyte *)"glClientWaitSync"); glWaitSyncFunc = (glWaitSyncPtr)glXGetProcAddress((GLubyte *)"glWaitSync"); glGetInteger64vFunc = (glGetInteger64vPtr)glXGetProcAddress((GLubyte *)"glGetInteger64v"); glGetSyncivFunc = (glGetSyncivPtr)glXGetProcAddress((GLubyte *)"glGetSynciv"); #endif } #define USING_ARB_sync 1 typedef cl_event(CL_API_CALL *clCreateEventFromGLsyncKHR_fn)( cl_context context, GLsync sync, cl_int *errCode_ret); clCreateEventFromGLsyncKHR_fn clCreateEventFromGLsyncKHR_ptr; /* Helper to determine if an extension is supported by a device */ int is_extension_available(cl_device_id device, const char *extensionName) { char *extString; size_t size = 0; int err; int result = -1; if ((err = clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, NULL, &size))) { printf( "Error: failed to determine size of device extensions string (err = " "%d)\n", err); return -2; } if (0 == size) return -3; extString = (char *)malloc(size); if (NULL == extString) { printf( "Error: unable to allocate %ld byte buffer for extension string (err = " "%d)\n", (long)size, err); return -40; } if ((err = clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, size, extString, NULL))) { printf("Error: failed to obtain device extensions string (err = %d)\n", err); free(extString); return -5; } if (strstr(extString, extensionName)) result = 0; free(extString); return result; } void OCLGLFenceSync::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { _openTest = test; // Initialize random number seed srand((unsigned int)time(NULL)); OCLGLCommon::open(test, units, conversion, deviceId); if (_errorFlag) return; cl_context_properties properties[7] = {0}; for (unsigned int i = 0; i < c_glContextCount; i++) { error_ = is_extension_available(devices_[_deviceId], "cl_khr_gl_event"); if (error_ != CL_SUCCESS) { printf("Silent failure: cl_khr_gl_event extension not available (%d)\n", error_); extensionSupported_ = false; return; } extensionSupported_ = true; createGLContext(contextData_[i].glContext); getCLContextPropertiesFromGLContext(contextData_[i].glContext, properties); // Create new CL context from GL context contextData_[i].clContext = _wrapper->clCreateContext( properties, 1, &devices_[_deviceId], NULL, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateContext() failed (%d)", error_); // Create command queue for new context contextData_[i].clCmdQueue = _wrapper->clCreateCommandQueue( contextData_[i].clContext, devices_[_deviceId], 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueue() failed (%d)", error_); // Build the kernel contextData_[i].clProgram = _wrapper->clCreateProgramWithSource( contextData_[i].clContext, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed (%d)", error_); error_ = _wrapper->clBuildProgram(contextData_[i].clProgram, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(contextData_[i].clProgram, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed (%d)", error_); contextData_[i].clKernel = _wrapper->clCreateKernel( 
contextData_[i].clProgram, "glmulticontext_test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed (%d)", error_); } } void OCLGLFenceSync::run() { if (_errorFlag || !extensionSupported_) { return; } CPerfCounter timer; double sec; float perf; cl_uint4 inOutData[c_numOfElements] = {{{0}}}; cl_uint4 expectedData[c_numOfElements] = {{{0}}}; unsigned int m = sizeof(cl_uint4) / sizeof(cl_uint); int count = 0; // Initialize input data with random values for (unsigned int i = 0; i < c_numOfElements; i++) { for (unsigned int j = 0; j < m; j++) { inOutData[i].s[j] = (unsigned int)i; expectedData[i].s[j] = inOutData[i].s[j] + c_glContextCount; } } cl_event fenceEvent0 = NULL, fenceEvent = NULL; GLsync glFence0 = NULL, glFence = NULL; InitSyncFns(); clCreateEventFromGLsyncKHR_ptr = (clCreateEventFromGLsyncKHR_fn)clGetExtensionFunctionAddress( "clCreateEventFromGLsyncKHR"); if (clCreateEventFromGLsyncKHR_ptr == NULL) { printf( "ERROR: Unable to run fence_sync test (clCreateEventFromGLsyncKHR " "function not discovered!)\n"); return; } for (unsigned int i = 0; i < c_glContextCount; i++) { makeCurrent(contextData_[i].glContext); // Generate and Bind in & out OpenGL buffers GLuint inGLBuffer = 0, outGLBuffer = 0; glGenBuffers(1, &inGLBuffer); glGenBuffers(1, &outGLBuffer); glBindBuffer(GL_ARRAY_BUFFER, inGLBuffer); glBufferData(GL_ARRAY_BUFFER, c_numOfElements * sizeof(cl_uint4), inOutData, GL_STATIC_DRAW); glBindBuffer(GL_ARRAY_BUFFER, outGLBuffer); glBufferData(GL_ARRAY_BUFFER, c_numOfElements * sizeof(cl_uint4), NULL, GL_STATIC_DRAW); glBindBuffer(GL_ARRAY_BUFFER, 0); glFinish(); // Checking if clWaitForEvents works switch (_openTest) { case 0: // Using fence sync glFence0 = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0); CHECK_RESULT((glFence0 == NULL), "Unable to create GL fence"); fenceEvent0 = clCreateEventFromGLsyncKHR_ptr(contextData_[i].clContext, glFence0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to create CL event from GL fence (%d)", error_); error_ = clWaitForEvents(1, &fenceEvent0); CHECK_RESULT((error_ != CL_SUCCESS), "clWaitForEvents() failed (%d)", error_); break; default: glFinish(); break; } if (fenceEvent != NULL) { clReleaseEvent(fenceEvent0); glDeleteSync(glFence0); } cl_event acqEvent1 = 0, acqEvent2 = 0, kernelEvent = 0, relEvent1 = 0, relEvent2 = 0; // Create input buffer from GL input buffer contextData_[i].inputBuffer = _wrapper->clCreateFromGLBuffer( contextData_[i].clContext, CL_MEM_READ_ONLY, inGLBuffer, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to create input GL buffer (%d)", error_); // Create output buffer from GL output buffer contextData_[i].outputBuffer = _wrapper->clCreateFromGLBuffer( contextData_[i].clContext, CL_MEM_WRITE_ONLY, outGLBuffer, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to create output GL buffer (%d)", error_); timer.Reset(); switch (_openTest) { case 0: // Using fence sync timer.Start(); glFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0); timer.Stop(); CHECK_RESULT((glFence == NULL), "Unable to create GL fence"); timer.Start(); fenceEvent = clCreateEventFromGLsyncKHR_ptr(contextData_[i].clContext, glFence, &error_); timer.Stop(); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to create CL event from GL fence (%d)", error_); break; default: break; } error_ = _wrapper->clSetKernelArg(contextData_[i].clKernel, 0, sizeof(cl_mem), &(contextData_[i].inputBuffer)); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed (%d)", error_); error_ = 
_wrapper->clSetKernelArg(contextData_[i].clKernel, 1, sizeof(cl_mem), &(contextData_[i].outputBuffer)); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed (%d)", error_); switch (_openTest) { case 0: // Using fence sync timer.Start(); error_ = _wrapper->clEnqueueAcquireGLObjects( contextData_[i].clCmdQueue, 1, &(contextData_[i].inputBuffer), 1, &fenceEvent, &acqEvent1); timer.Stop(); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to acquire GL objects (%d)", error_); timer.Start(); error_ = _wrapper->clEnqueueAcquireGLObjects( contextData_[i].clCmdQueue, 1, &(contextData_[i].outputBuffer), 1, &fenceEvent, &acqEvent2); timer.Stop(); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to acquire GL objects (%d)", error_); break; case 1: // Using glFinish timer.Start(); glFinish(); timer.Stop(); timer.Start(); error_ = _wrapper->clEnqueueAcquireGLObjects( contextData_[i].clCmdQueue, 1, &(contextData_[i].inputBuffer), 0, NULL, &acqEvent1); timer.Stop(); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to acquire GL objects (%d)", error_); timer.Start(); error_ = _wrapper->clEnqueueAcquireGLObjects( contextData_[i].clCmdQueue, 1, &(contextData_[i].outputBuffer), 0, NULL, &acqEvent2); timer.Stop(); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to acquire GL objects (%d)", error_); break; default: break; } size_t gws[1] = {c_numOfElements}; cl_event evts[2] = {acqEvent1, acqEvent2}; error_ = _wrapper->clEnqueueNDRangeKernel(contextData_[i].clCmdQueue, contextData_[i].clKernel, 1, NULL, gws, NULL, 2, evts, &kernelEvent); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed (%d)", error_); error_ = _wrapper->clEnqueueReleaseGLObjects(contextData_[i].clCmdQueue, 1, &(contextData_[i].inputBuffer), 1, &kernelEvent, &relEvent1); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReleaseGLObjects failed (%d)", error_); error_ = _wrapper->clEnqueueReleaseGLObjects( contextData_[i].clCmdQueue, 1, &(contextData_[i].outputBuffer), 1, &kernelEvent, &relEvent2); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReleaseGLObjects failed (%d)", error_); evts[0] = relEvent1; evts[1] = relEvent2; error_ = clWaitForEvents(2, evts); CHECK_RESULT((error_ != CL_SUCCESS), "clWaitForEvents() failed (%d)", error_); glBindBuffer(GL_ARRAY_BUFFER, outGLBuffer); void *glMem = glMapBuffer(GL_ARRAY_BUFFER, GL_READ_ONLY); memcpy(inOutData, glMem, c_numOfElements * sizeof(cl_uint4)); glUnmapBuffer(GL_ARRAY_BUFFER); _wrapper->clReleaseMemObject(contextData_[i].inputBuffer); _wrapper->clReleaseMemObject(contextData_[i].outputBuffer); // Delete GL buffers glBindBuffer(GL_ARRAY_BUFFER, 0); glDeleteBuffers(1, &inGLBuffer); inGLBuffer = 0; glDeleteBuffers(1, &outGLBuffer); outGLBuffer = 0; } sec = timer.GetElapsedTime(); perf = (float)sec * 1000000; // in microseconds _perfInfo = (float)perf; if (fenceEvent != NULL) { clReleaseEvent(fenceEvent); glDeleteSync(glFence); } // Compare expected output with actual data received for (unsigned int i = 0; i < c_numOfElements; i++) { for (unsigned int j = 0; j < m; j++) { if (inOutData[i].s[j] != expectedData[i].s[j]) { printf( "Element %u is incorrect!\t expected:[ %u, %u, %u, %u ] differs " "from actual:{%u, %u, %u, %u}\n", i, expectedData[i].s[0], expectedData[i].s[1], expectedData[i].s[2], expectedData[i].s[3], inOutData[i].s[0], inOutData[i].s[1], inOutData[i].s[2], inOutData[i].s[3]); count++; } } } if (count) printf("Number of elements wrong: %d\n", count); } unsigned int OCLGLFenceSync::close() { error_ = is_extension_available(devices_[_deviceId], "cl_khr_gl_event"); if (error_ 
== CL_SUCCESS) { for (unsigned int i = 0; i < c_glContextCount; i++) { makeCurrent(contextData_[i].glContext); _wrapper->clReleaseKernel(contextData_[i].clKernel); _wrapper->clReleaseProgram(contextData_[i].clProgram); _wrapper->clReleaseCommandQueue(contextData_[i].clCmdQueue); _wrapper->clReleaseContext(contextData_[i].clContext); destroyGLContext(contextData_[i].glContext); } } return OCLGLCommon::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLFenceSync.h000066400000000000000000000036161450307266000237030ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_GL_FENCE_SYNC_H_ #define _OCL_GL_FENCE_SYNC_H_ #include "OCLGLCommon.h" class OCLGLFenceSync : public OCLGLCommon { public: OCLGLFenceSync(); virtual ~OCLGLFenceSync(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual void run(void); virtual unsigned int close(void); private: static const unsigned int c_glContextCount = 1; static const unsigned int c_numOfElements = 8192; struct GLContextDataSet { OCLGLHandle glContext; cl_context clContext; cl_command_queue clCmdQueue; cl_program clProgram; cl_kernel clKernel; cl_mem inputBuffer; cl_mem outputBuffer; }; GLContextDataSet contextData_[c_glContextCount]; bool failed_; bool extensionSupported_; }; #endif // _OCL_GL_FENCE_SYNC_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLMsaaTexture.cpp000066400000000000000000000240351450307266000246210ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLGLMsaaTexture.h" #include #include #include #include const static char* strKernel = "__kernel void gl_msaa_test( __global uint4 *output, read_only " "image2d_msaa_t source, unsigned int numSamples){ \n" " int tidX = get_global_id(0);\n" " int tidY = get_global_id(1);\n" " for (int i = 0 ; i < numSamples ; i++) {\n" " uint4 value = read_imageui( source, (int2)( tidX, tidY ) ,i);\n" " int index = (tidY * get_image_width( source ) + tidX)*numSamples + " "i;\n" " output[ index ] = value;\n" " }\n" "}\n"; const static char* glDownSampleShader = "uniform sampler2DMS MsaaTex;\n" "uniform int numSamples;\n" "uniform ivec2 resolution;\n" "\n" "varying vec4 gl_TexCoord[ ]; \n" "\n" "void main(void)\n" "{\n" " vec4 accum = vec4(0.0,0.0,0.0,0.0);\n" " ivec2 coord = ivec2(resolution * gl_TexCoord[0].xy) ;\n" " for ( int i = 0 ; i < numSamples ; i++)\n" " {\n" " accum += texelFetch(MsaaTex,coord,i);\n" " }\n" " accum /= numSamples;\n" " \n" " \n" " \n" " gl_FragColor = accum;\n" "}"; OCLGLMsaaTexture::OCLGLMsaaTexture() : msaaDepthBuffer_(0), msaaFrameBufferOBJ_(0), msaaColorBuffer_(0), glShader_(0), glprogram_(0), clOutputBuffer_(0), clMsaa_(0), pGLOutput_(0), pCLOutput_(0) { _numSubTests = 1; _currentTest = 0; } OCLGLMsaaTexture::~OCLGLMsaaTexture() {} void OCLGLMsaaTexture::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLGLCommon::open(test, units, conversion, deviceId); if (_errorFlag) return; _currentTest = test; // Build the kernel program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed (%d)", error_); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed (%d)", error_); kernel_ = _wrapper->clCreateKernel(program_, "gl_msaa_test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed (%d)", error_); } void OCLGLMsaaTexture::run(void) { if (_errorFlag) { return; } bool retVal; switch (_currentTest) { case 0: retVal = testMsaaRead(GL_RGBA, 2); break; default: CHECK_RESULT(true, "unsupported test number\n"); } CHECK_RESULT((retVal != true), "cl-gl depth test failed "); } unsigned int OCLGLMsaaTexture::close(void) { if (pGLOutput_) { free(pGLOutput_); pGLOutput_ = NULL; } if (pCLOutput_) { free(pCLOutput_); pCLOutput_ = NULL; } clReleaseMemObject(clMsaa_); clReleaseMemObject(clOutputBuffer_); glFinish(); // unbind the texture and frame buffer. 
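  // The cl_mem objects created from these GL textures were released above;
  // with cl_khr_gl_sharing the shared GL objects should outlive any CL
  // images created from them, so the GL cleanup below is done last.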
glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, 0, 0); glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, 0, 0); glBindFramebuffer(GL_FRAMEBUFFER, 0); glBindTexture(GL_TEXTURE_2D_MULTISAMPLE, 0); // clean gl resources glDeleteFramebuffers(1, &msaaFrameBufferOBJ_); msaaFrameBufferOBJ_ = 0; glDeleteTextures(1, &msaaColorBuffer_); msaaColorBuffer_ = 0; glDeleteTextures(1, &msaaDepthBuffer_); msaaDepthBuffer_ = 0; glDeleteProgram(glprogram_); glDeleteShader(glShader_); return OCLGLCommon::close(); } bool OCLGLMsaaTexture::testMsaaRead(GLint internalFormat, unsigned int numSamples) { size_t dimSizes[] = {c_dimSize, c_dimSize}; unsigned int bufferSize = c_dimSize * c_dimSize * 4; bool retVal = false; createGLFragmentProgramFromSource(glDownSampleShader, glShader_, glprogram_); ///////////////////// // create msaa FBO // ///////////////////// glGenFramebuffers(1, &msaaFrameBufferOBJ_); glBindFramebuffer(GL_FRAMEBUFFER, msaaFrameBufferOBJ_); // create textures glGenTextures(1, &msaaColorBuffer_); glBindTexture(GL_TEXTURE_2D_MULTISAMPLE, msaaColorBuffer_); glTexImage2DMultisample(GL_TEXTURE_2D_MULTISAMPLE, numSamples, GL_RGBA8, c_dimSize, c_dimSize, GL_TRUE); glGenTextures(1, &msaaDepthBuffer_); glBindTexture(GL_TEXTURE_2D_MULTISAMPLE, msaaDepthBuffer_); glTexImage2DMultisample(GL_TEXTURE_2D_MULTISAMPLE, numSamples, GL_DEPTH_COMPONENT24, c_dimSize, c_dimSize, GL_TRUE); // glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, msaaColorBuffer_, 0); glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, msaaDepthBuffer_, 0); // verify all resource allocations are well. GLenum status = glCheckFramebufferStatus(GL_FRAMEBUFFER); if (GL_FRAMEBUFFER_COMPLETE != status) { return false; } // set up gl state machine glViewport(0, 0, c_dimSize, c_dimSize); // Reset The Current Viewport glMatrixMode(GL_PROJECTION); // Select The Projection Matrix glLoadIdentity(); // Reset The Projection Matrix gluPerspective(30.0f, (GLfloat)c_dimSize / (GLfloat)c_dimSize, 0.1f, 100.0f); glMatrixMode(GL_MODELVIEW); // Select The Modelview Matrix glLoadIdentity(); glEnable(GL_DEPTH_TEST); // The Type Of Depth Testing To Do glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // Clear Screen And Depth Buffer glBegin(GL_QUADS); // Draw A Quad glVertex3f(-1.0f, 1.0f, -6.0f); // Top Left glVertex3f(1.0f, 1.0f, -6.0f); // Top Right glVertex3f(1.0f, -1.0f, -3.0f); // Bottom Right glVertex3f(-1.0f, -1.0f, -3.0f); // Bottom Left glEnd(); glFinish(); cl_int error; clOutputBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, bufferSize, NULL, &error); if (CL_SUCCESS != error) return false; clMsaa_ = _wrapper->clCreateFromGLTexture(context_, CL_MEM_READ_WRITE, GL_TEXTURE_2D_MULTISAMPLE, 0, msaaColorBuffer_, &error); if (CL_SUCCESS != error) return false; GLsizei samples; error = _wrapper->clGetGLTextureInfo(clMsaa_, CL_GL_NUM_SAMPLES, sizeof(samples), &samples, NULL); error = _wrapper->clEnqueueAcquireGLObjects(cmdQueues_[_deviceId], 1, &clMsaa_, 0, NULL, NULL); if (CL_SUCCESS != error) return false; _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &clOutputBuffer_); _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), &clMsaa_); _wrapper->clSetKernelArg(kernel_, 2, sizeof(unsigned int), &numSamples); _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 2, NULL, dimSizes, NULL, 0, NULL, NULL); _wrapper->clEnqueueReleaseGLObjects(cmdQueues_[_deviceId], 1, &clMsaa_, 0, NULL, NULL); pGLOutput_ = (unsigned int*)malloc(bufferSize); pCLOutput_ = (unsigned int*)malloc(bufferSize); 
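  // Read back the per-sample values written by the CL kernel, then let GL
  // resolve the same MSAA texture through the downsampling shader so the
  // two outputs can be compared channel-by-channel in absDiff().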
  _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], clOutputBuffer_,
                                CL_TRUE, 0, bufferSize, pCLOutput_, 0, NULL,
                                NULL);

  // down sample
  glBindFramebuffer(GL_FRAMEBUFFER, 0);
  glBindTexture(GL_TEXTURE_2D_MULTISAMPLE, msaaColorBuffer_);
  glUseProgram(glprogram_);
  glUniform1i(glGetUniformLocation(glprogram_, "numSamples"), numSamples);
  glUniform2i(glGetUniformLocation(glprogram_, "resolution"), c_dimSize,
              c_dimSize);
  glUniform1i(glGetUniformLocation(glprogram_, "MsaaTex"), 0);
  // printOpenGLError();
  glBegin(GL_QUADS);
  glVertex2f(-1.0f, 1.0f);
  glTexCoord2f(1.0f, 0.0f);
  glVertex2f(1.0f, 1.0f);
  glTexCoord2f(1.0f, 1.0f);
  glVertex2f(1.0f, -1.0f);
  glTexCoord2f(0.0f, 1.0f);
  glVertex2f(-1.0f, -1.0f);
  glTexCoord2f(0.0f, 0.0f);
  glEnd();
  glBindTexture(GL_TEXTURE_2D_MULTISAMPLE, 0);
  glUseProgram(0);
  glReadPixels(0, 0, c_dimSize, c_dimSize, GL_BGRA, GL_UNSIGNED_BYTE,
               pGLOutput_);
  if (absDiff(pGLOutput_, pCLOutput_, c_dimSize)) retVal = true;
  return retVal;
}

bool OCLGLMsaaTexture::absDiff(unsigned int* pGLBuffer,
                               unsigned int* pCLBuffer,
                               const unsigned int c_dimSize) {
  bool retVal = true;
  for (unsigned int i = 0; i < c_dimSize * c_dimSize; i++) {
    char clPixel[4];
    char glPixel[4];
    char diff[4] = {0};
    memcpy(clPixel, &(pCLBuffer[i]), sizeof(clPixel));
    memcpy(glPixel, &(pGLBuffer[i]), sizeof(glPixel));
    for (int j = 0; j < 4; j++) {
      // compare the matching channel of the CL and GL pixel
      diff[j] = abs(clPixel[j] - glPixel[j]);
      if (diff[j] > 10) retVal = false;
    }
  }
  return retVal;
}
clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLMsaaTexture.h000066400000000000000000000044451450307266000242700ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

#ifndef _OCL_GL_MSAA_TEXTURE_H_
#define _OCL_GL_MSAA_TEXTURE_H_

#include "OCLGLCommon.h"

class OCLGLMsaaTexture : public OCLGLCommon {
 public:
  OCLGLMsaaTexture();
  virtual ~OCLGLMsaaTexture();

  static const unsigned int c_dimSize = 128;

  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceId);
  virtual void run(void);
  virtual unsigned int close(void);

 private:
  ////////////////////
  // test functions //
  ////////////////////
  bool testMsaaRead(GLint internalFormat, unsigned int NumSamples);
  unsigned int _currentTest;

  //////////////////////////////
  // private helper functions //
  //////////////////////////////
  // returns true when every channel of every pixel matches within tolerance
static bool absDiff(unsigned int* pGLBuffer, unsigned int* pCLBuffer, const unsigned int dimSize); ///////////////////// // private members // ///////////////////// // GL resource identifiers GLuint msaaDepthBuffer_; GLuint msaaFrameBufferOBJ_; GLuint msaaColorBuffer_; GLuint glShader_; GLuint glprogram_; // CL identifiers cl_mem clOutputBuffer_; cl_mem clMsaa_; unsigned int* pGLOutput_; unsigned int* pCLOutput_; }; #endif // _OCL_GL_BUFFER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLMultiContext.cpp000066400000000000000000000226651450307266000250250ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLGLMultiContext.h" #include #include #include #include const static char* strKernel = "__kernel void glmulticontext_test( __global uint4 *source, __global uint4 " "*dest) \n" "{ " " \n" " int tid = get_global_id(0); " " \n" " dest[ tid ] = source[ tid ] + (uint4)(1); " " \n" "} " " \n"; OCLGLMultiContext::OCLGLMultiContext() { memset(contextData_, 0, sizeof(contextData_)); _numSubTests = 1; } OCLGLMultiContext::~OCLGLMultiContext() {} void OCLGLMultiContext::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { // Initialize random number seed srand((unsigned int)time(NULL)); OCLGLCommon::open(test, units, conversion, deviceId); if (_errorFlag) return; cl_context_properties properties[7] = {0}; for (unsigned int i = 0; i < c_glContextCount; i++) { createGLContext(contextData_[i].glContext); getCLContextPropertiesFromGLContext(contextData_[i].glContext, properties); // Create new CL context from GL context contextData_[i].clContext = _wrapper->clCreateContext( properties, 1, &devices_[_deviceId], NULL, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateContext() failed (%d)", error_); // Create command queue for new context contextData_[i].clCmdQueue = _wrapper->clCreateCommandQueue( contextData_[i].clContext, devices_[_deviceId], 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueue() failed (%d)", error_); // Build the kernel contextData_[i].clProgram = _wrapper->clCreateProgramWithSource( contextData_[i].clContext, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed (%d)", error_); error_ = _wrapper->clBuildProgram(contextData_[i].clProgram, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(contextData_[i].clProgram, devices_[deviceId], 
CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed (%d)", error_); contextData_[i].clKernel = _wrapper->clCreateKernel( contextData_[i].clProgram, "glmulticontext_test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed (%d)", error_); } } void OCLGLMultiContext::run() { if (_errorFlag) { return; } cl_uint4 inOutData[c_numOfElements] = {{{0}}}; cl_uint4 expectedData[c_numOfElements] = {{{0}}}; // Initialize input data with random values for (unsigned int i = 0; i < c_numOfElements; i++) { for (unsigned int j = 0; j < sizeof(cl_uint4) / sizeof(cl_uint); j++) { inOutData[i].s[j] = (unsigned int)rand(); expectedData[i].s[j] = inOutData[i].s[j] + c_glContextCount; } } for (unsigned int i = 0; i < c_glContextCount; i++) { makeCurrent(contextData_[i].glContext); // Generate and Bind in & out OpenGL buffers GLuint inGLBuffer = 0, outGLBuffer = 0; glGenBuffers(1, &inGLBuffer); glGenBuffers(1, &outGLBuffer); glBindBuffer(GL_ARRAY_BUFFER, inGLBuffer); glBufferData(GL_ARRAY_BUFFER, c_numOfElements * sizeof(cl_uint4), inOutData, GL_STATIC_DRAW); glBindBuffer(GL_ARRAY_BUFFER, outGLBuffer); glBufferData(GL_ARRAY_BUFFER, c_numOfElements * sizeof(cl_uint4), NULL, GL_STATIC_DRAW); glBindBuffer(GL_ARRAY_BUFFER, 0); glFinish(); // Create input buffer from GL input buffer contextData_[i].inputBuffer = _wrapper->clCreateFromGLBuffer( contextData_[i].clContext, CL_MEM_READ_ONLY, inGLBuffer, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to create input GL buffer (%d)", error_); // Create output buffer from GL output buffer contextData_[i].outputBuffer = _wrapper->clCreateFromGLBuffer( contextData_[i].clContext, CL_MEM_WRITE_ONLY, outGLBuffer, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to create output GL buffer (%d)", error_); error_ = _wrapper->clSetKernelArg(contextData_[i].clKernel, 0, sizeof(cl_mem), &(contextData_[i].inputBuffer)); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed (%d)", error_); error_ = _wrapper->clSetKernelArg(contextData_[i].clKernel, 1, sizeof(cl_mem), &(contextData_[i].outputBuffer)); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed (%d)", error_); error_ = _wrapper->clEnqueueAcquireGLObjects(contextData_[i].clCmdQueue, 1, &(contextData_[i].inputBuffer), 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to acquire GL objects (%d)", error_); error_ = _wrapper->clEnqueueAcquireGLObjects( contextData_[i].clCmdQueue, 1, &(contextData_[i].outputBuffer), 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to acquire GL objects (%d)", error_); size_t gws[1] = {c_numOfElements}; error_ = _wrapper->clEnqueueNDRangeKernel(contextData_[i].clCmdQueue, contextData_[i].clKernel, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed (%d)", error_); error_ = _wrapper->clEnqueueReleaseGLObjects(contextData_[i].clCmdQueue, 1, &(contextData_[i].inputBuffer), 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReleaseGLObjects failed (%d)", error_); error_ = _wrapper->clEnqueueReleaseGLObjects( contextData_[i].clCmdQueue, 1, &(contextData_[i].outputBuffer), 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReleaseGLObjects failed (%d)", error_); error_ = _wrapper->clFinish(contextData_[i].clCmdQueue); CHECK_RESULT((error_ != CL_SUCCESS), "clFinish() failed (%d)", error_); glBindBuffer(GL_ARRAY_BUFFER, outGLBuffer); void* glMem = 
glMapBuffer(GL_ARRAY_BUFFER, GL_READ_ONLY); memcpy(inOutData, glMem, c_numOfElements * sizeof(cl_uint4)); glUnmapBuffer(GL_ARRAY_BUFFER); _wrapper->clReleaseMemObject(contextData_[i].inputBuffer); _wrapper->clReleaseMemObject(contextData_[i].outputBuffer); // Delete GL buffers glBindBuffer(GL_ARRAY_BUFFER, 0); glDeleteBuffers(1, &inGLBuffer); inGLBuffer = 0; glDeleteBuffers(1, &outGLBuffer); outGLBuffer = 0; } // Compare expected output with actual data received for (unsigned int i = 0; i < c_numOfElements; i++) { for (unsigned int j = 0; j < sizeof(cl_uint4) / sizeof(cl_uint); j++) { CHECK_RESULT((inOutData[i].s[j] != expectedData[i].s[j]), "Element %d is incorrect!\n\t \ expected:{%d, %d, %d, %d} differs from actual:{%d, %d, %d, %d}", i, expectedData[i].s[0], expectedData[i].s[1], expectedData[i].s[2], expectedData[i].s[3], inOutData[i].s[0], inOutData[i].s[1], inOutData[i].s[2], inOutData[i].s[3]); } } } unsigned int OCLGLMultiContext::close() { for (unsigned int i = 0; i < c_glContextCount; i++) { makeCurrent(contextData_[i].glContext); _wrapper->clReleaseKernel(contextData_[i].clKernel); _wrapper->clReleaseProgram(contextData_[i].clProgram); _wrapper->clReleaseCommandQueue(contextData_[i].clCmdQueue); _wrapper->clReleaseContext(contextData_[i].clContext); destroyGLContext(contextData_[i].glContext); } return OCLGLCommon::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLMultiContext.h000066400000000000000000000036031450307266000244610ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_GL_MULTI_CONTEXT_H_ #define _OCL_GL_MULTI_CONTEXT_H_ #include "OCLGLCommon.h" class OCLGLMultiContext : public OCLGLCommon { public: OCLGLMultiContext(); virtual ~OCLGLMultiContext(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual void run(void); virtual unsigned int close(void); private: static const unsigned int c_glContextCount = 3; static const unsigned int c_numOfElements = 128; struct GLContextDataSet { OCLGLHandle glContext; cl_context clContext; cl_command_queue clCmdQueue; cl_program clProgram; cl_kernel clKernel; cl_mem inputBuffer; cl_mem outputBuffer; }; GLContextDataSet contextData_[c_glContextCount]; bool failed_; }; #endif // _OCL_GL_MULTI_CONTEXT_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLPerfSepia.cpp000066400000000000000000000504351450307266000242400ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLGLPerfSepia.h" #include #include #include #include #define WIDTH 1024 #define HEIGHT 1024 // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define MAX(a, b) (a > b ? a : b) const char *sepiaVertexProgram = "!!ARBvp1.0\n" "\n" "\n" "OPTION ARB_position_invariant;\n" "\n" "PARAM p0 = program.local[2];\n" "PARAM p1 = program.local[3];\n" "ATTRIB a0 = vertex.texcoord[0];\n" "OUTPUT o0 = result.texcoord[0];\n" "OUTPUT o1 = result.texcoord[1];\n" "TEMP r0, r1;\n" "\n" "MOV o0, a0;\n" "#SWZ r1, a0, x, y, 0, 0;\n" "#DPH r0.x, r1, p0;\n" "#DPH r0.y, r1, p1;\n" "#MOV o1, r0;\n" "MOV o1, a0;\n" "\n" "END\n"; const char *sepiaFragmentProgram = "!!ARBfp1.0\n" "\n" "\n" "PARAM p0 = {1e-4, 0.085, 0.0, 0.0};\n" "PARAM p1 = {0.2125, 0.7154, 0.0721, 0.0};\n" "PARAM p2 = {-3605.984, 0.1323156, 0.0, -0.1991615};\n" "PARAM p3 = {708.7939, -0.3903106, -0.05854013, 0.6621023};\n" "PARAM p4 = {-50.93341, 0.4654831, 1.027555, -0.9069088};\n" "PARAM p5 = {3.116672, 0.7926372, 0.03219686, 1.411847};\n" "PARAM p6 = {8.95663e-4, -0.001104567, -6.0827e-4, 0.03277428};\n" "PARAM p7 = program.local[0];\n" "PARAM p8 = program.local[1];\n" "ATTRIB a0 = fragment.texcoord[1];\n" "OUTPUT o0 = result.color;\n" "TEMP r0, r1, r2, r3;\n" "\n" "TEX r1, a0, texture[0], RECT;\n" "#MAX r0, p0.x, r1.w;\n" "#RCP r2, r0.x;\n" "#DP3 r3, r1, p1;\n" "#MUL r0, r3, r2;\n" "#MAD r2, r0, p2, p3;\n" "#MAD r2, r2, r0, p4;\n" "#MAD r0, r2, r0, p5;\n" "#MUL r2, r1.w, p6;\n" "#MAD r2, r0, r3, r2;\n" "#MAD r0, r1.w, p0.y, -r3;\n" "#CMP r2.x, -r0, r2.x, r2.w;\n" "#MAD r0, r3, r3, -r3;\n" "#CMP r0, r0.x, r2, r3;\n" "#MOV r0.w, r1;\n" "#MUL r0, r0, p7;\n" "#LRP o0, p8.x, r0, r1;\n" "MOV o0, r1;\n" "\n" "END\n"; const static char *strKernel = "\n" "__kernel void program(write_only image2d_t dest, int flipped, int4 dim, " "float2 st_origin, float4 st_delta, float4 l0, float4 l1, float4 l2, " "float4 l3, read_only image2d_t t0, sampler_t t_sampler0)\n" "{\n" " const sampler_t sam = CLK_NORMALIZED_COORDS_FALSE | " "CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;\n" "// const float4 p0 = (float4)( 0x1.b33334p-3, 0x1.6e48e8p-1, " "0x1.275254p-4, 0x0p+0 );\n" "// const float4 p1 = (float4)( 0x1.a36e2ep-14, 0x1.5c28f6p-4, 0x0p+0, " "0x0p+0 );\n" "// const float4 p2 = (float4)( 0x1.d595dap-11, -0x1.218e3cp-10, " "-0x1.3ee89ep-11, 0x1.0c7ca6p-5 );\n" "// const float4 p3 = (float4)( -0x1.c2bf7cp+11, 0x1.0efb7cp-3, " "0x0p+0, -0x1.97e1fcp-3 );\n" "// const float4 p4 = (float4)( 0x1.62659ep+9, 
-0x1.8fad94p-2, " "-0x1.df8f8cp-5, 0x1.52ff12p-1 );\n" "// const float4 p5 = (float4)( -0x1.9777ap+5, 0x1.dca79ap-2, " "0x1.070dd8p+0, -0x1.d0565ap-1 );\n" "// const float4 p6 = (float4)( 0x1.8eef1cp+1, 0x1.95d48cp-1, " "0x1.07c1b6p-5, 0x1.696ecep+0 );\n" "// int dest_width = dim.x;\n" "// int dest_height = dim.y;\n" " float4 o0, r0, r1, r2, r3, r4;\n" "// float4 false_vector = (float4) 0.0f;\n" "// float4 true_vector = (float4) 1.0f;\n" " int2 loc = (int2)( get_global_id(0), get_global_id(1) );\n" "// if ((loc.x >= dim.x) || loc.y >= dim.y) return;\n" "// float4 f0 = (float4)( st_origin.x + ((float)loc.x + 0.5f) * " "st_delta.x + ((float)loc.y + 0.5f) * st_delta.z, st_origin.y + " "((float)loc.x + 0.5f) * st_delta.y + ((float)loc.y + 0.5f) * st_delta.w, " "0.0f, 0.0f );\n" "// r2 = f0;\n" "// r0.x = dot(r2.xy,l2.xy) + l2.w;\n" "// r0.y = dot(r2.xy,l3.xy) + l3.w;\n" "// r4 = r0;\n" " r1 = read_imagef(t0, sam/*t_sampler0*/, r4.xy);\n" "// r3 = dot(r1.xyz,p0.xyz);\n" "// r2 = max(p1.xxxx, r1.wwww);\n" "// r0 = native_recip(r2.xxxx);\n" "// r4 = r3*r0;\n" "// r2 = r1.wwww*p2;\n" "// r0 = mad(r4,p3,p4);\n" "// r0 = mad(r0,r4,p5);\n" "// r0 = mad(r0,r4,p6);\n" "// r2 = mad(r0,r3,r2);\n" "// r0 = mad(r1.wwww,p1.yyyy,-r3);\n" "// r2.x = select(r2.w,r2.x, isless(-r0.x, 0.0f));\n" "// r0 = mad(r3,r3,-r3);\n" "// r0 = select(r3,r2, isless(r0.xxxx, 0.0f));\n" "// r0.w = r1.w;\n" "// r0 = r0*l0;\n" "// r0 = mix(r1,r0, l1.xxxx);\n" "// r0.xyz = min(r0.xyz, r0.www);\n" "// o0 = r0;\n" " write_imagef(dest, loc /*(int2)( loc.x + dim.z , flipped ? " "get_image_height(dest) - (loc.y + dim.w + 1) : loc.y + dim.w )*/, r1 " "/*o0*/);\n" "}\n"; OCLGLPerfSepia::OCLGLPerfSepia() { _numSubTests = 2; } OCLGLPerfSepia::~OCLGLPerfSepia() {} void OCLGLPerfSepia::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { bVerify_ = false; silentFailure_ = false; iterations_ = 50000; bpr_ = 0; data_ = 0; result_ = 0; width_ = 0; height_ = 0; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; texId = 0; format_.image_channel_order = CL_RGBA; format_.image_channel_data_type = CL_UNORM_INT8; srand(0x8956); // some constant instead of time() so that we get same random // numbers if (!IsGLEnabled(test, units, conversion, deviceId)) { silentFailure_ = true; return; } OCLGLCommon::open(test, units, conversion, deviceId); if (_errorFlag) return; if (test == 0) { // Build the kernel program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed (%d)", error_); const char *optionsGPU = "-cl-denorms-are-zero -cl-mad-enable"; error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], optionsGPU, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed (%d)", error_); kernel_ = _wrapper->clCreateKernel(program_, "program", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed (%d)", error_); } } void OCLGLPerfSepia::populateData(void) { width_ = WIDTH; height_ = HEIGHT; bpr_ = 4 * width_; data_ = (cl_uchar *)malloc(height_ * bpr_); for (unsigned int n = 0; n < (height_ * bpr_); n++) { data_[n] = (n & 3) ? 
(rand() % 256) : 0xFF; } } void OCLGLPerfSepia::runGL(void) { glDisable(GL_ALPHA_TEST); glDisable(GL_DEPTH_TEST); glDisable(GL_SCISSOR_TEST); glDisable(GL_BLEND); glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA); glDisable(GL_DITHER); glDisable(GL_CULL_FACE); glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE); glDepthMask(GL_FALSE); glStencilMask(0); glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE); // let's create the textures we need glEnable(GL_TEXTURE_RECTANGLE_EXT); glGenTextures(1, &texId); glBindTexture(GL_TEXTURE_RECTANGLE_EXT, texId); // have GL alloc memory for us for our destination texture which we will be // rendering into glTexImage2D(GL_TEXTURE_RECTANGLE_EXT, 0, GL_RGBA, width_, height_, 0, GL_BGRA /*RGBA*/, GL_UNSIGNED_INT_8_8_8_8_REV, NULL); glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MAG_FILTER, GL_NEAREST); // for the source texture we will provide a data ptr and hang on to it GLuint srcTexture; glGenTextures(1, &srcTexture); glBindTexture(GL_TEXTURE_RECTANGLE_EXT, srcTexture); glPixelStorei(GL_UNPACK_ROW_LENGTH, width_); glPixelStorei(GL_UNPACK_IMAGE_HEIGHT, height_); glPixelStorei(GL_UNPACK_ALIGNMENT, 8); // XXX Alex -- use optimal texture upload format. glTexImage2D(GL_TEXTURE_RECTANGLE_EXT, 0, GL_RGBA, width_, height_, 0, GL_BGRA, /* GL_RGBA,*/ format_.image_channel_order == CL_RGBA ? GL_UNSIGNED_INT_8_8_8_8 : GL_UNSIGNED_INT_8_8_8_8_REV, data_); glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MAG_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE); glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE); glPixelStorei(GL_UNPACK_SWAP_BYTES, 0); glPixelStorei(GL_UNPACK_LSB_FIRST, 0); glPixelStorei(GL_UNPACK_ROW_LENGTH, 0); glPixelStorei(GL_UNPACK_IMAGE_HEIGHT, 0); glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0); glPixelStorei(GL_UNPACK_SKIP_IMAGES, 0); glPixelStorei(GL_UNPACK_SKIP_ROWS, 0); glPixelStorei(GL_UNPACK_ALIGNMENT, 4); GLuint vertexProgram; GLuint fragmentProgram; glGenProgramsARB(1, &vertexProgram); glGenProgramsARB(1, &fragmentProgram); glBindProgramARB(GL_VERTEX_PROGRAM_ARB, vertexProgram); glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB, (GLsizei)strlen(sepiaVertexProgram), sepiaVertexProgram); glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, fragmentProgram); glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB, (GLsizei)strlen(sepiaFragmentProgram), sepiaFragmentProgram); GLfloat l0[] = {1.0f, 0.99f, 0.92f, 1.0f}; GLfloat l1[] = {0.5, 0, 0, 0}; GLfloat l2[] = {1, 0, 0, 0}; GLfloat l3[] = {0, -1, 0, (GLfloat)height_}; glProgramLocalParameter4fvARB(GL_VERTEX_PROGRAM_ARB, 0, l0); glProgramLocalParameter4fvARB(GL_VERTEX_PROGRAM_ARB, 1, l1); glProgramLocalParameter4fvARB(GL_VERTEX_PROGRAM_ARB, 2, l2); glProgramLocalParameter4fvARB(GL_VERTEX_PROGRAM_ARB, 3, l3); glProgramLocalParameter4fvARB(GL_FRAGMENT_PROGRAM_ARB, 0, l0); glProgramLocalParameter4fvARB(GL_FRAGMENT_PROGRAM_ARB, 1, l1); glProgramLocalParameter4fvARB(GL_FRAGMENT_PROGRAM_ARB, 2, l2); glProgramLocalParameter4fvARB(GL_FRAGMENT_PROGRAM_ARB, 3, l3); GLuint fbo; glGenFramebuffersEXT(1, &fbo); glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo); glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_RECTANGLE_ARB, texId, 0); glViewport(0, 0, width_, height_); glMatrixMode(GL_PROJECTION); glLoadIdentity(); 
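  // An orthographic projection spanning the FBO, combined with the matching
  // viewport set above, maps the rectangle-texture coordinates 1:1 onto
  // destination pixels.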
glOrtho(0, width_, 0, height_, -1, 1); glClearColor(0, 0, 0, 0); glClear(GL_COLOR_BUFFER_BIT); glDisable(GL_BLEND); glEnable(GL_VERTEX_PROGRAM_ARB); glEnable(GL_FRAGMENT_PROGRAM_ARB); // warm up for (unsigned int k = 0; k < (iterations_ / 10); k++) { glBegin(GL_QUADS); glTexCoord2f(0, 0); glVertex2f(0, (GLfloat)height_); glTexCoord2f((GLfloat)width_, 0); glVertex2f((GLfloat)width_, (GLfloat)height_); glTexCoord2f((GLfloat)width_, (GLfloat)height_); glVertex2f((GLfloat)width_, 0); glTexCoord2f(0, (GLfloat)height_); glVertex2f(0, 0); glEnd(); glFlush(); glFinish(); } // actual test for (unsigned int k = 0; k < iterations_; k++) { if (k == 1) { timer_.Reset(); timer_.Start(); } glBegin(GL_QUADS); glTexCoord2f(0, 0); glVertex2f(0, (GLfloat)height_); glTexCoord2f((GLfloat)width_, 0); glVertex2f((GLfloat)width_, (GLfloat)height_); glTexCoord2f((GLfloat)width_, (GLfloat)height_); glVertex2f((GLfloat)width_, 0); glTexCoord2f(0, (GLfloat)height_); glVertex2f(0, 0); glEnd(); } glFlush(); glFinish(); timer_.Stop(); glDisable(GL_VERTEX_PROGRAM_ARB); glDisable(GL_FRAGMENT_PROGRAM_ARB); // now let's read back the pixels result_ = (cl_uchar *)malloc(width_ * height_ * 4); glReadPixels(0, 0, width_, height_, GL_RGBA, GL_UNSIGNED_INT_8_8_8_8_REV, result_); // bind back default frame buffer glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0); glDeleteFramebuffersEXT(1, &fbo); glDeleteTextures(1, &srcTexture); glDeleteProgramsARB(1, &vertexProgram); glDeleteProgramsARB(1, &fragmentProgram); } void OCLGLPerfSepia::runCL(void) { cl_mem dst, src; cl_sampler nearestZero; glEnable(GL_TEXTURE_RECTANGLE_EXT); glGenTextures(1, &texId); glBindTexture(GL_TEXTURE_RECTANGLE_EXT, texId); // XXX Alex: have GL alloc memory for us ... glTexImage2D(GL_TEXTURE_RECTANGLE_EXT, 0, GL_RGBA, width_, height_, 0, GL_RGBA /*BGRA*/, GL_UNSIGNED_INT_8_8_8_8_REV, NULL); dst = _wrapper->clCreateFromGLTexture2D( context_, CL_MEM_READ_WRITE, GL_TEXTURE_RECTANGLE_EXT, 0, texId, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateFromGLTexture2D error (%d)", error_); nearestZero = _wrapper->clCreateSampler(context_, CL_FALSE, CL_ADDRESS_CLAMP, CL_FILTER_NEAREST, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateSampler error (%d)", error_); src = _wrapper->clCreateImage2D( context_, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, &format_, width_, height_, bpr_, data_, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateImage2D error (%d)", error_); int numArgs = 0; int dim[2] = {(int)width_, (int)height_}; int flipped[] = {1}; int dims[] = {(int)width_, (int)height_, 0, 0}; float st_origin[] = {0, 0}; float st_delta[] = {1, 0, 0, 1}; _wrapper->clSetKernelArg(kernel_, numArgs++, sizeof(cl_mem), &dst); // arg is a image2DGL named "dst" _wrapper->clSetKernelArg(kernel_, numArgs++, sizeof(int), &flipped); // arg is a int1 named "flipped" _wrapper->clSetKernelArg(kernel_, numArgs++, 4 * sizeof(int), &dims); // arg is a int4 named "dim" _wrapper->clSetKernelArg(kernel_, numArgs++, 2 * sizeof(float), &st_origin); // arg is a float2 named "st_origin" _wrapper->clSetKernelArg(kernel_, numArgs++, 4 * sizeof(float), &st_delta); // arg is a float4 named "st_delta" float l0[] = {1.0f, 0.99f, 0.92f, 1.0f}; float l1[] = {0.5f, 0.0f, 0.0f, 0.0f}; float l2[] = {1.0f, 0.0f, 0.0f, 0.0f}; float l3[] = {0.0f, -1.0f, 0.0f, (float)height_}; _wrapper->clSetKernelArg(kernel_, numArgs++, 4 * sizeof(float), &l0); // arg is a float4 named "l0" _wrapper->clSetKernelArg(kernel_, numArgs++, 4 * sizeof(float), &l1); // arg is a float4 named "l1" 
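  // l0..l3 mirror the program.local parameters fed to the ARB vertex and
  // fragment programs in runGL(), so the CL and GL paths consume identical
  // constants.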
_wrapper->clSetKernelArg(kernel_, numArgs++, 4 * sizeof(float), &l2); // arg is a float4 named "l2" _wrapper->clSetKernelArg(kernel_, numArgs++, 4 * sizeof(float), &l3); // arg is a float4 named "l3" _wrapper->clSetKernelArg(kernel_, numArgs++, sizeof(cl_mem), &src); // arg is a image2D named "t0" _wrapper->clSetKernelArg( kernel_, numArgs++, sizeof(cl_sampler), &nearestZero); // arg is a sampler named "t_sampler0" size_t execution_threads[2]; size_t execution_local[2]; cl_uint work_dim = 2; error_ = _wrapper->clGetKernelWorkGroupInfo( kernel_, devices_[_deviceId], CL_KERNEL_WORK_GROUP_SIZE, sizeof(execution_local[0]), &execution_local[0], 0); CHECK_RESULT((error_ != CL_SUCCESS), "clGetKernelWorkGroupInfo error (%d)", error_); execution_local[1] = 1; work_dim = 2; GetKernelExecDimsForImage((unsigned int)execution_local[0], dim[0], dim[1], execution_threads, execution_local); result_ = (cl_uchar *)malloc(height_ * bpr_); const size_t origin[] = {0, 0, 0}; const size_t region[] = {width_, height_, 1}; // warm up for (unsigned int k = 0; k < (iterations_ / 10); k++) { error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, work_dim, NULL, execution_threads, execution_local, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel error (%d)", error_); error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_RESULT((error_ != CL_SUCCESS), "clFinish error (%d)", error_); } // actual test for (unsigned int k = 0; k < iterations_; k++) { if (k == 1) { timer_.Reset(); timer_.Start(); } error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, work_dim, NULL, execution_threads, execution_local, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel error (%d)", error_); } error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_RESULT((error_ != CL_SUCCESS), "clFinish error (%d)", error_); timer_.Stop(); error_ = _wrapper->clEnqueueReadImage(cmdQueues_[_deviceId], dst, true, origin, region, bpr_, 0, result_, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadImage error (%d)", error_); _wrapper->clFinish(cmdQueues_[_deviceId]); _wrapper->clReleaseMemObject(src), src = NULL; _wrapper->clReleaseSampler(nearestZero); _wrapper->clReleaseMemObject(dst), dst = NULL; } void OCLGLPerfSepia::GetKernelExecDimsForImage(unsigned int work_group_size, unsigned int w, unsigned int h, size_t *global, size_t *local) { unsigned int a, b; static const unsigned int tile_size = 16; // local[0] and local[1] must be at least 1 local[0] = tile_size < work_group_size ? tile_size : work_group_size; local[1] = work_group_size / tile_size > tile_size ? tile_size : MAX(work_group_size / tile_size, 1); a = w; b = (unsigned int)local[0]; global[0] = ((a % b) != 0) ? (a / b + 1) : (a / b); global[0] *= local[0]; a = h; b = (unsigned int)local[1]; global[1] = ((a % b) != 0) ? (a / b + 1) : (a / b); global[1] *= local[1]; } void OCLGLPerfSepia::run(void) { if (_errorFlag || silentFailure_) { return; } populateData(); if (_openTest == 0) { runCL(); } else { runGL(); } if (bVerify_) { verifyResult(); } char buf[100]; SNPRINTF(buf, sizeof(buf), "%s iterations# %d", (_openTest == 0) ? 
"CL" : "GL", iterations_); testDescString = buf; _perfInfo = (float)timer_.GetElapsedTime(); } void OCLGLPerfSepia::verifyResult(void) { int r = 0, g = 0, b = 0, a = 0, d = 0; for (unsigned int k = 0; k < height_ * bpr_; k += 4) { a = a + result_[k + 0]; r = r + result_[k + 1]; g = g + result_[k + 2]; b = b + result_[k + 3]; } d = abs(r - 152797810) + abs(g - 125868080) + abs(b - 76147833) + abs(a - 267386880); CHECK_RESULT(d > 20000, "wrong result"); } unsigned int OCLGLPerfSepia::close(void) { if (silentFailure_) { return 0; } if (data_) { free(data_); } if (result_) { free(result_); } if (texId) { glDeleteTextures(1, &texId); } return OCLGLCommon::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLPerfSepia.h000066400000000000000000000037241450307266000237040ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PERF_SEPIA_H_ #define _OCL_PERF_SEPIA_H_ #include "OCLGLCommon.h" #include "Timer.h" class OCLGLPerfSepia : public OCLGLCommon { public: OCLGLPerfSepia(); virtual ~OCLGLPerfSepia(); virtual void open(unsigned int test, char *units, double &conversion, unsigned int deviceId); virtual void run(void); virtual unsigned int close(void); private: void runGL(void); void runCL(void); void populateData(void); void verifyResult(void); void GetKernelExecDimsForImage(unsigned int work_group_size, unsigned int w, unsigned int h, size_t *global, size_t *local); bool silentFailure_; cl_uint iterations_; cl_image_format format_; cl_uchar *data_; cl_uchar *result_; bool bVerify_; cl_uint width_; cl_uint height_; cl_uint bpr_; GLuint texId; CPerfCounter timer_; }; #endif // _OCL_PERF_SEPIA_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLTexture.cpp000066400000000000000000000123711450307266000240170ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

#include "OCLGLTexture.h"

// Assumed headers: the original angle-bracket includes did not survive
// flattening; these four are inferred from the symbols used below
// (printf/fflush, rand/srand, memset, time).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

const static char* strKernelui =
    "__kernel void gltexture_test(read_only image2d_t source, write_only "
    "image2d_t dest) \n"
    "{ \n"
    "  int tidX = get_global_id(0); \n"
    "  int tidY = get_global_id(1); \n"
    "  uint4 pixel = read_imageui(source, (int2)(tidX, tidY)); \n"
    "  write_imageui(dest, (int2)(tidX, tidY), pixel); \n"
    "}";

const static char* strKernelf =
    "__kernel void gltexture_test(read_only image2d_t source, write_only "
    "image2d_t dest) \n"
    "{ \n"
    "  int tidX = get_global_id(0); \n"
    "  int tidY = get_global_id(1); \n"
    "  float4 pixel = read_imagef(source, (int2)(tidX, tidY)); \n"
    "  write_imagef(dest, (int2)(tidX, tidY), pixel); \n"
    "} \n";

OCLGLTexture::OCLGLTexture()
    : inDataGL_(NULL), outDataGL_(NULL), inGLTexture_(0), outGLTexture_(0) {
  _numSubTests = 4 * 2;
}

OCLGLTexture::~OCLGLTexture() {}

void OCLGLTexture::open(unsigned int test, char* units, double& conversion,
                        unsigned int deviceId) {
  // Initialize random number seed
  srand((unsigned int)time(NULL));

  OCLGLCommon::open(test, units, conversion, deviceId);
  if (_errorFlag) return;

  currentTest_ = test % 4;
  testRender_ = ((test / 4) >= 1) ? true : false;

  // Build the kernel
  if (0 == currentTest_) {
    program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernelui,
                                                   NULL, &error_);
  } else {
    program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernelf,
                                                   NULL, &error_);
  }
  CHECK_RESULT((error_ != CL_SUCCESS),
               "clCreateProgramWithSource() failed (%d)", error_);

  error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL,
                                    NULL, NULL);
  if (error_ != CL_SUCCESS) {
    char programLog[1024];
    _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId],
                                    CL_PROGRAM_BUILD_LOG, 1024, programLog, 0);
    printf("\n%s\n", programLog);
    fflush(stdout);
  }
  CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed (%d)", error_);

  kernel_ = _wrapper->clCreateKernel(program_, "gltexture_test", &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed (%d)", error_);
}

void OCLGLTexture::run(void) {
  bool retVal = false;
  // Template arguments are restored by inference: each instantiation's host
  // element type must match the GL 'type' it passes (see the color switch in
  // runTextureTest).
  switch (currentTest_) {
    case 0:
      retVal = runTextureTest<cl_uint>(GL_RGBA32UI, GL_RGBA_INTEGER,
                                       GL_UNSIGNED_INT);
      break;
    case 1:
      retVal = runTextureTest<cl_uchar>(GL_RGBA8, GL_RGBA, GL_UNSIGNED_BYTE);
      break;
    case 2:
      retVal = runTextureTest<cl_short>(GL_RGBA16, GL_RGBA, GL_SHORT);
      break;
    case 3:
      retVal = runTextureTest<cl_float>(GL_RGBA32F, GL_RGBA, GL_FLOAT);
      break;
    default:
      CHECK_RESULT(true, "unsupported test number\n");
  }
  CHECK_RESULT((retVal != true), "cl-gl texture interop test failed ");
}

unsigned int OCLGLTexture::close(void) {
  clReleaseMemObject(buffers_[0]);
  clReleaseMemObject(buffers_[1]);
  buffers_.clear();

  // Delete GL in & out buffers
  glFinish();
  glBindTexture(GL_TEXTURE_2D, 0);
  glDeleteTextures(1, &inGLTexture_);
  inGLTexture_ = 0;
  glDeleteTextures(1, &outGLTexture_);
  outGLTexture_ = 0;

  free(inDataGL_);
  inDataGL_ = NULL;
  free(outDataGL_);
  outDataGL_ = NULL;

  return OCLGLCommon::close();
}
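/*
 * The CL/GL interop handshake exercised above, condensed into one helper for
 * reference. This is an illustrative sketch, not part of the test framework:
 * the function name and parameters are placeholders, and error handling is
 * reduced to early returns.
 */
static cl_int runKernelOnSharedTexture(cl_command_queue queue, cl_kernel kernel,
                                       cl_mem clImage, const size_t gws[2]) {
  // GL work touching the shared texture must finish before CL acquires it.
  glFinish();
  cl_int err = clEnqueueAcquireGLObjects(queue, 1, &clImage, 0, NULL, NULL);
  if (err != CL_SUCCESS) return err;
  err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &clImage);
  if (err == CL_SUCCESS) {
    err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, gws, NULL, 0, NULL,
                                 NULL);
  }
  // Release regardless of kernel status, then drain CL before GL reuses the
  // texture on its side.
  cl_int rel = clEnqueueReleaseGLObjects(queue, 1, &clImage, 0, NULL, NULL);
  clFinish(queue);
  return (err != CL_SUCCESS) ? err : rel;
}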
clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/OCLGLTexture.h000066400000000000000000000165101450307266000234630ustar00rootroot00000000000000
/*
Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

#ifndef _OCL_GL_TEXTURE_H_
#define _OCL_GL_TEXTURE_H_

// Assumed header: the original angle-bracket include was lost in flattening;
// <iostream> is inferred from the std::cout diagnostics below.
#include <iostream>

#include "OCLGLCommon.h"

class OCLGLTexture : public OCLGLCommon {
 public:
  static const unsigned int c_imageWidth = 512;
  static const unsigned int c_imageHeight = 512;
  static const unsigned int c_elementsPerPixel = 4;

  OCLGLTexture();
  virtual ~OCLGLTexture();

  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceId);
  virtual void run(void);
  virtual unsigned int close(void);

 private:
  unsigned int currentTest_;
  void* inDataGL_;
  void* outDataGL_;
  GLuint inGLTexture_;
  GLuint outGLTexture_;
  bool testRender_;

  // The template parameter (restored here) is the host element type matching
  // the GL 'type' argument: cl_uint, cl_uchar, cl_short, or cl_float.
  template <typename T>
  bool runTextureTest(GLint internalFormat, GLenum format, GLenum type);
};

template <typename T>
bool OCLGLTexture::runTextureTest(GLint internalFormat, GLenum format,
                                  GLenum type) {
  cl_mem image;
  inDataGL_ =
      malloc(c_imageWidth * c_imageHeight * c_elementsPerPixel * sizeof(T));
  outDataGL_ =
      malloc(c_imageWidth * c_imageHeight * c_elementsPerPixel * sizeof(T));

  // Initialize input data with random values
  T* inputIterator = (T*)inDataGL_;
  for (unsigned int i = 0;
       i < c_imageWidth * c_imageHeight * c_elementsPerPixel; i++) {
    inputIterator[i] = (T)(rand() % 255);
  }
  // Initialize output data with zeros
  memset(outDataGL_, 0,
         c_imageWidth * c_imageHeight * c_elementsPerPixel * sizeof(T));

  // Generate and Bind in & out OpenGL textures
  glGenTextures(1, &inGLTexture_);
  glGenTextures(1, &outGLTexture_);

  glBindTexture(GL_TEXTURE_2D, inGLTexture_);
  glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
  glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, (GLsizei)c_imageWidth,
               (GLsizei)c_imageHeight, 0, format, type, inDataGL_);

  glBindTexture(GL_TEXTURE_2D, outGLTexture_);
  glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
  glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, (GLsizei)c_imageWidth,
               (GLsizei)c_imageHeight, 0, format, type, outDataGL_);
  glFinish();

  // Create input buffer from GL input texture
  image = _wrapper->clCreateFromGLTexture(
      context_, CL_MEM_READ_ONLY, GL_TEXTURE_2D, 0, inGLTexture_, &error_);
  if (error_ != CL_SUCCESS) {
printf("Unable to create input buffer from GL texture (%d)", error_); return false; } buffers_.push_back(image); // Create output buffer from GL output texture image = _wrapper->clCreateFromGLTexture( context_, CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, outGLTexture_, &error_); if (error_ != CL_SUCCESS) { printf("Unable to create output buffer from GL texture (%d)", error_); return false; } buffers_.push_back(image); size_t gws[2] = {c_imageWidth, c_imageHeight}; // Assign args for (unsigned int i = 0; i < buffers_.size(); i++) { error_ = _wrapper->clSetKernelArg(kernel_, i, sizeof(cl_mem), &buffers()[i]); if (error_ != CL_SUCCESS) { printf("clSetKernelArg() failed (%d)", error_); return false; } } int loop = (testRender_) ? 2 : 1; for (int l = 0; l < loop; ++l) { if (testRender_ && (l == 0)) { GLuint FrameBufferName = 0; glGenFramebuffers(1, &FrameBufferName); glBindFramebuffer(GL_FRAMEBUFFER, FrameBufferName); glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, inGLTexture_, 0); glClearColor(.5f, 1.f, 1.0f, 0); glClear(GL_COLOR_BUFFER_BIT); glFinish(); } error_ = _wrapper->clEnqueueAcquireGLObjects(cmdQueues_[_deviceId], 2, &buffers()[0], 0, NULL, NULL); if (error_ != CL_SUCCESS) { printf("Unable to acquire GL objects (%d)", error_); return false; } error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 2, NULL, gws, NULL, 0, NULL, NULL); if (error_ != CL_SUCCESS) { printf("clEnqueueNDRangeKernel() failed (%d)", error_); return false; } error_ = _wrapper->clEnqueueReleaseGLObjects(cmdQueues_[_deviceId], 2, &buffers()[0], 0, NULL, NULL); if (error_ != CL_SUCCESS) { printf("clEnqueueReleaseGLObjects failed (%d)", error_); return false; } error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); if (error_ != CL_SUCCESS) { printf("clFinish() failed (%d)", error_); return false; } if (testRender_ && (l == 0)) { glClearColor(1.f, 1.f, 1.f, 1.f); glClear(GL_COLOR_BUFFER_BIT); glFinish(); } } // Get the results from GL texture glBindTexture(GL_TEXTURE_2D, outGLTexture_); glActiveTexture(GL_TEXTURE0); glGetTexImage(GL_TEXTURE_2D, 0, format, type, outDataGL_); // Check output texture data inputIterator = (T*)inDataGL_; T* outputIterator = (T*)outDataGL_; T color; switch (type) { case GL_UNSIGNED_INT: color = (T)0x3f800000; break; case GL_UNSIGNED_BYTE: color = (T)0xff; break; case GL_SHORT: color = (T)0x7fff; break; case GL_FLOAT: color = (T)1.f; break; default: return false; } for (unsigned int i = 0; i < c_imageWidth * c_imageHeight * c_elementsPerPixel; i++) { if (testRender_) { if (outputIterator[i] != color) { std::cout << "Element " << i << " in output texture is incorrect! (internal format = " << internalFormat << "\n\t expected:" << inputIterator[i] << " differs from actual clear color:" << color << std::endl; return false; } } else if (inputIterator[i] != outputIterator[i]) { std::cout << "Element " << i << " in output texture is incorrect! (internal format = " << internalFormat << "\n\t expected:" << inputIterator[i] << " differs from actual: " << outputIterator[i] << std::endl; return false; } } return true; } #endif // _OCL_GL_TEXTURE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/TestList.cpp000066400000000000000000000036601450307266000233520ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLTestListImp.h" // // Includes for tests // #include "OCLGLBuffer.h" #include "OCLGLBufferMultipleQueues.h" #include "OCLGLDepthBuffer.h" #include "OCLGLDepthTex.h" #include "OCLGLFenceSync.h" #include "OCLGLMsaaTexture.h" #include "OCLGLMultiContext.h" #include "OCLGLTexture.h" #include "OCLGLPerfSepia.h" // // Helper macro for adding tests // template static void* dictionary_CreateTestFunc(void) { return new T(); } #define TEST(name) \ { #name, &dictionary_CreateTestFunc < name> } TestEntry TestList[] = { TEST(OCLGLBuffer), TEST(OCLGLBufferMultipleQueues), TEST(OCLGLTexture), TEST(OCLGLMultiContext), TEST(OCLGLFenceSync), TEST(OCLGLDepthTex), TEST(OCLGLPerfSepia), }; unsigned int TestListCount = sizeof(TestList) / sizeof(TestList[0]); unsigned int TestLibVersion = 0; const char* TestLibName = "oclgl"; clr-rocm-5.7.1/opencl/tests/ocltst/module/gl/oclgl.exclude000066400000000000000000000000141450307266000235340ustar00rootroot00000000000000# all clear clr-rocm-5.7.1/opencl/tests/ocltst/module/include/000077500000000000000000000000001450307266000221075ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tests/ocltst/module/include/BaseTestImp.h000066400000000000000000000152611450307266000244450ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _BaseTestImp_H_ #define _BaseTestImp_H_ #include #include #include #include #include #include "OCLTest.h" #include "OCLWrapper.h" #define EXIT_SILENT_FAILURE 2 #define KERNEL(...) 
#__VA_ARGS__ #ifdef _MSC_VER #define NOMINMAX #define snprintf sprintf_s #endif #define CHECK_ERROR(error, msg) \ if (error != CL_SUCCESS) { \ _errorFlag = true; \ printf("\n\n%s\nError code: %d\n\n", msg, error); \ _errorMsg = msg; \ _crcword += 1; \ return; \ } #define CHECK_ERROR_NO_RETURN(error, msg) \ if (error != CL_SUCCESS) { \ _errorFlag = true; \ printf("\n\n%s\nError code: %d\n\n", msg, error); \ _errorMsg = msg; \ _crcword += 1; \ } #define CHECK_RESULT(test, msg, ...) \ if ((test)) { \ char* buf = (char*)malloc(4096); \ _errorFlag = true; \ int rc = snprintf(buf, 4096, msg, ##__VA_ARGS__); \ assert(rc >= 0 && rc < (int)4096); \ printf("%s:%d - %s\n", __FILE__, __LINE__, buf); \ _errorMsg = std::string(buf); \ _crcword += 1; \ free(buf); \ return; \ } #define CHECK_RESULT_ARGS CHECK_RESULT #define CHECK_RESULT_NO_RETURN(test, msg, ...) \ if ((test)) { \ char* buf = (char*)malloc(4096); \ _errorFlag = true; \ int rc = snprintf(buf, 4096, msg, ##__VA_ARGS__); \ assert(rc >= 0 && rc < (int)4096); \ printf("%s:%d - %s\n", __FILE__, __LINE__, buf); \ _errorMsg = std::string(msg); \ _crcword += 1; \ free(buf); \ } #define CHECK_RESULT_NO_RETURN_ARGS CHECK_RESULT_NO_RETURN #define CHECK_RESULT_SHUTDOWN(test, msg) \ if ((test)) { \ _errorFlag = true; \ printf("%s\n", msg); \ _errorMsg = msg; \ _crcword += 1; \ close(); \ return; \ } #define CHECK_RESULT_CL(test, msg) \ if ((test)) { \ _errorFlag = true; \ printf("%s\n", msg); \ _errorMsg = msg; \ _crcword += 1; \ return 1; \ } class BaseTestImp : public OCLTest { public: BaseTestImp(); virtual ~BaseTestImp(); public: virtual unsigned int getThreadUsage(void); virtual int getNumSubTests(void); //! Abstract functions being defined here virtual void open(); virtual void open(unsigned int test, const char* deviceName, unsigned int architecture); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId, unsigned int platformIndex) { return open(test, "Tahiti", platformIndex); } virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { return open(test, "Tahiti", 0); } virtual void run(void) = 0; virtual unsigned int close(void); //! 
Functions to set class members virtual void checkComplib(unsigned int test, const char* deviceName, unsigned int architecture); virtual void setDeviceName(const char*); virtual const char* getDeviceName(); virtual void setErrorMsg(const char* error); virtual const char* getErrorMsg(void); virtual bool hasErrorOccured(void); virtual void clearError(); BaseTestImp* toBaseTestImp() { return this; } virtual OCLTestImp* toOCLTestImp() { return NULL; } virtual void useCPU() { _cpu = true; } virtual void setIterationCount(int cnt); virtual void setDeviceId(unsigned int deviceId); virtual unsigned int getDeviceId(); virtual void setPlatformIndex(unsigned int platformIndex); virtual unsigned int getPlatformIndex(); virtual float getPerfInfo(); virtual void clearPerfInfo(); protected: unsigned int _numSubTests; unsigned int _openTest; unsigned int _useThreads; int _iterationCnt; float _perfInfo; bool _cpu; unsigned int _crcword; unsigned int _crctab[256]; bool _errorFlag; std::string _errorMsg; const char* _deviceName; unsigned int _architecture; unsigned int _deviceId; unsigned int _platformIndex; bool failed_ = false; cl_int error_; cl_uint type_; cl_uint deviceCount_; cl_device_id* devices_; cl_context context_; cl_program program_; cl_kernel kernel_; }; // enum to keep track of different memory types enum MemType { LOOCL, REMOTE_CACHED, REMOTE_UNCACHED }; class DataType { cl_image_format f; const char* str; unsigned int size; public: DataType() {} DataType(cl_image_format f, const char* str, unsigned int size) { this->f = f; this->str = str; this->size = size; } operator const char*() { return str; } operator unsigned int() { return size; } operator cl_image_format() { return f; } }; // useful for initialization of an array of data types for a test #define DTYPE(x, y) DataType(x, #x, (unsigned int)y) #endif clr-rocm-5.7.1/opencl/tests/ocltst/module/include/OCLTestImp.h000066400000000000000000000051271450307266000242100ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCLTestImp_H_ #define _OCLTestImp_H_ #include #include #include "BaseTestImp.h" #include "CL/cl.h" #include "OCL/Thread.h" #include "OCLTest.h" #include "OCLWrapper.h" class OCLTestImp : public BaseTestImp { public: OCLTestImp(); virtual ~OCLTestImp(); public: //! 
Abstract functions being defined here virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId, unsigned int platformIndex); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual void run(void) = 0; virtual unsigned int close(void); //! Functions to set class members public: void useCPU(); int genIntRand(int a, int b); int genBitRand(int n); void accumulateCRC(const void* buffer, int len); void setOCLWrapper(OCLWrapper* wrapper); OCLTestImp* toOCLTestImp() { return this; } static OCLutil::Lock openDeviceLock; static OCLutil::Lock compileLock; protected: const std::vector& buffers() const { return buffers_; } OCLWrapper* _wrapper; int _seed; // Common data of any CL program cl_int error_; cl_uint type_; cl_uint deviceCount_; cl_device_id* devices_; cl_platform_id platform_; std::vector cmdQueues_; cl_context context_; cl_program program_; cl_kernel kernel_; std::vector buffers_; }; // useful for initialization of an array of data types for a test #define DTYPE(x, y) DataType(x, #x, (unsigned int)y) #endif clr-rocm-5.7.1/opencl/tests/ocltst/module/include/OCLTestListImp.h000066400000000000000000000051421450307266000250410ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef __Dictionary_h__ #define __Dictionary_h__ // // Testing module (plugin) interface forward declarations // #ifdef _WIN32 #define OCL_DLLEXPORT __declspec(dllexport) #define OCL_CALLCONV __cdecl #endif #ifdef __linux__ #define OCL_DLLEXPORT #define OCL_CALLCONV #endif class OCLTest; // // OCLTestList_TestCount - retrieve the number of tests in the testing module // extern "C" OCL_DLLEXPORT unsigned int OCL_CALLCONV OCLTestList_TestCount(void); // // OCLTestList_TestLibVersion - retrieve the version of test lib in the testing // module // extern "C" OCL_DLLEXPORT unsigned int OCL_CALLCONV OCLTestList_TestLibVersion(void); // // OCLTestList_TestLibName - retrieve the name of test library // extern "C" OCL_DLLEXPORT const char* OCL_CALLCONV OCLTestList_TestLibName(void); // // OCLTestList_TestName - retrieve the name of the indexed test in the module // extern "C" OCL_DLLEXPORT const char* OCL_CALLCONV OCLTestList_TestName(unsigned int testNum); // // OCLTestList_CreateTest - create a test by index // extern "C" OCL_DLLEXPORT OCLTest* OCL_CALLCONV OCLTestList_CreateTest(unsigned int testNum); // // OCLTestList_DestroyTest - destroy a test object // extern "C" OCL_DLLEXPORT void OCL_CALLCONV OCLTestList_DestroyTest(OCLTest* test); // // internal global data that is populated in each dll // typedef struct _TestEntry { const char* name; void* (*create)(void); } TestEntry; extern TestEntry TestList[]; extern unsigned int TestListCount; extern unsigned int TestLibVersion; extern const char* TestLibName; #endif clr-rocm-5.7.1/opencl/tests/ocltst/module/include/OclIncludes.h000066400000000000000000000024041450307266000244640ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_INCLUDES_H #define _OCL_INCLUDES_H #ifdef _WIN32 #define POINTER_64 __ptr64 #include #include "d3d9.h" #endif #include "CL/cl.h" #endif //_OCL_INCLUDES_H clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/000077500000000000000000000000001450307266000214205ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/CMakeLists.txt000066400000000000000000000070001450307266000241550ustar00rootroot00000000000000set(TESTS OCLPerf3DImageWriteSpeed OCLPerfAES256 OCLPerfAtomicSpeed20 OCLPerfAtomicSpeed OCLPerfBufferCopyOverhead OCLPerfBufferCopySpeed OCLPerfBufferReadSpeed OCLPerfBufferWriteSpeed OCLPerfCommandQueue OCLPerfConcurrency OCLPerfCPUMemSpeed OCLPerfDeviceConcurrency OCLPerfDeviceEnqueue OCLPerfDeviceEnqueue2 OCLPerfDeviceEnqueueEvent OCLPerfDeviceEnqueueSier OCLPerfDevMemReadSpeed OCLPerfDevMemWriteSpeed OCLPerfDispatchSpeed OCLPerfDoubleDMA OCLPerfDoubleDMASeq OCLPerfFillBuffer OCLPerfFillImage OCLPerfFlush OCLPerfGenericBandwidth OCLPerfGenoilSiaMiner OCLPerfImageCopyCorners OCLPerfImageCopySpeed OCLPerfImageCreate OCLPerfImageMapUnmap OCLPerfImageReadSpeed OCLPerfImageReadsRGBA OCLPerfImageReadWrite OCLPerfImageSampleRate OCLPerfImageWriteSpeed OCLPerfKernelArguments OCLPerfKernelThroughput OCLPerfLDSLatency OCLPerfLDSReadSpeed OCLPerfMandelbrot OCLPerfMapBufferReadSpeed OCLPerfMapBufferWriteSpeed OCLPerfMapImageReadSpeed OCLPerfMapImageWriteSpeed OCLPerfMatrixTranspose OCLPerfMemCombine OCLPerfMemCreate OCLPerfMemLatency OCLPerfPinnedBufferReadSpeed OCLPerfPinnedBufferWriteSpeed OCLPerfPipeCopySpeed OCLPerfProgramGlobalRead OCLPerfProgramGlobalWrite OCLPerfSampleRate OCLPerfScalarReplArrayElem OCLPerfSdiP2PCopy OCLPerfSHA256 OCLPerfSVMAlloc OCLPerfSVMKernelArguments OCLPerfSVMMap OCLPerfSVMMemcpy OCLPerfSVMMemFill OCLPerfSVMSampleRate OCLPerfTextureMemLatency OCLPerfUAVReadSpeed OCLPerfUAVReadSpeedHostMem OCLPerfUAVWriteSpeedHostMem OCLPerfUncoalescedRead OCLPerfVerticalFetch ) add_library(oclperf SHARED TestList.cpp $) foreach(TEST ${TESTS}) target_sources(oclperf PRIVATE ${TEST}.cpp) endforeach() set_target_properties(oclperf PROPERTIES CXX_STANDARD 14 CXX_STANDARD_REQUIRED ON CXX_EXTENSIONS OFF RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst) target_compile_definitions(oclperf PRIVATE $) target_include_directories(oclperf PRIVATE $) target_link_libraries(oclperf PRIVATE OpenCL) add_custom_command( TARGET oclperf POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_SOURCE_DIR}/oclperf.exclude ${CMAKE_BINARY_DIR}/tests/ocltst/oclperf.exclude) add_custom_target(test.ocltst.oclperf COMMAND ${CMAKE_COMMAND} -E env "OCL_ICD_FILENAMES=$" $ -p 0 -m $ -A oclperf.exclude DEPENDS ocltst oclperf amdocl WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst USES_TERMINAL) foreach(TEST ${TESTS}) add_custom_target(test.ocltst.oclperf.${TEST} COMMAND ${CMAKE_COMMAND} -E env "OCL_ICD_FILENAMES=$" $ -p 0 -m $ -t ${TEST} DEPENDS ocltst oclperf amdocl WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst USES_TERMINAL) endforeach() INSTALL(TARGETS oclperf DESTINATION ${OCLTST_INSTALL_DIR} COMPONENT ocltst) INSTALL(FILES oclperf.exclude DESTINATION ${OCLTST_INSTALL_DIR} COMPONENT ocltst) clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerf3DImageWriteSpeed.cpp000066400000000000000000000163701450307266000265130ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerf3DImageWriteSpeed.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define KERNEL_CODE(...) #__VA_ARGS__ #define NUM_SIZES 4 static const unsigned int Sizes[NUM_SIZES] = {64, 128, 256, 512}; #define NUM_FORMATS 1 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}}; static const char *textFormats[NUM_FORMATS] = {"CL_RGBA , CL_UNSIGNED_INT8"}; static const unsigned int formatSize[NUM_FORMATS] = {sizeof(CL_UNSIGNED_INT8)}; const static char *strKernel = {KERNEL_CODE( \n __kernel void image_kernel(write_only image3d_t input) { size_t x = get_global_id(0); size_t y = get_global_id(1); size_t z = get_global_id(2); int4 coords = (int4)(x, y, z, 0); write_imageui(input, coords, (1, 1, 1, 1)); } \n)}; OCLPerf3DImageWriteSpeed::OCLPerf3DImageWriteSpeed() { _numSubTests = NUM_SIZES * NUM_FORMATS; } OCLPerf3DImageWriteSpeed::~OCLPerf3DImageWriteSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerf3DImageWriteSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { error_ = CL_SUCCESS; testId_ = test; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = 0; kernel_ = 0; cmd_queue_ = 0; imageBuffer_ = 0; skip_ = false; char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (!strstr(charbuf, "cl_khr_3d_image_writes")) { skip_ = true; testDescString = "3D Write not supported. 
Test Skipped."; return; } bufSize_ = Sizes[test % NUM_SIZES]; bufnum_ = (test / NUM_SIZES) % NUM_FORMATS; memSize_ = bufSize_ * bufSize_ * bufSize_ * formatSize[bufnum_]; cmd_queue_ = cmdQueues_[_deviceId]; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "image_kernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); imageBuffer_ = _wrapper->clCreateImage3D( context_, CL_MEM_WRITE_ONLY, &formats[bufnum_], bufSize_, bufSize_, bufSize_, 0, 0, NULL, &error_); CHECK_RESULT(imageBuffer_ == 0, "clCreateImage(imageBuffer_) failed"); // set kernel arguments error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &imageBuffer_); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); } void OCLPerf3DImageWriteSpeed::run(void) { if (skip_) { return; } CPerfCounter timer; unsigned int fmt_num = (testId_ / NUM_SIZES) % NUM_FORMATS; size_t gws[3] = {bufSize_, bufSize_, bufSize_}; size_t lws[3] = {8, 8, 4}; // warm up error_ = _wrapper->clEnqueueNDRangeKernel(cmd_queue_, kernel_, 3, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmd_queue_); // checkData char *bufptr = (char *)malloc(memSize_); size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSize_, bufSize_, bufSize_}; size_t image_row_pitch = bufSize_ * formatSize[bufnum_]; size_t image_slice_pitch = image_row_pitch * bufSize_; error_ = clEnqueueReadImage(cmd_queue_, imageBuffer_, true, origin, region, image_row_pitch, image_slice_pitch, bufptr, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadImage() failed"); for (size_t i = 0; i < bufSize_ * bufSize_ * bufSize_ * 4; ++i) { if (bufptr[i] != 1) { printf("(%4dx%4dx%4d) fmt:%s(%1u) checkData() fail, image_ptr[%u] = %d\n", bufSize_, bufSize_, bufSize_, textFormats[fmt_num], formatSize[bufnum_], (unsigned int)i, (int)bufptr[i]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); char buf[256]; SNPRINTF(buf, sizeof(buf), " (%4dx%4dx%4d) fmt:%s(%1d) checkData() FAILED! 
", bufSize_, bufSize_, bufSize_, textFormats[fmt_num], formatSize[bufnum_]); testDescString = buf; return; } } delete bufptr; // test begins unsigned int numIter = 5; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; ++i) { error_ = _wrapper->clEnqueueNDRangeKernel(cmd_queue_, kernel_, 3, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmd_queue_); } timer.Stop(); double sec = timer.GetElapsedTime(); // write_image speed in GB/s double perf = ((double)memSize_ * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%3dx%3dx%3d) fmt:%s(%1u) i: %2d (GB/s) ", bufSize_, bufSize_, bufSize_, textFormats[fmt_num], formatSize[bufnum_], numIter); testDescString = buf; } unsigned int OCLPerf3DImageWriteSpeed::close(void) { if (!skip_) { if (imageBuffer_) { error_ = _wrapper->clReleaseMemObject(imageBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(imageBuffer_) failed"); } } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerf3DImageWriteSpeed.h000066400000000000000000000033141450307266000261520ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_3DImageWriteSpeed_H_ #define _OCL_3DImageWriteSpeed_H_ #include "OCLTestImp.h" class OCLPerf3DImageWriteSpeed : public OCLTestImp { public: OCLPerf3DImageWriteSpeed(); virtual ~OCLPerf3DImageWriteSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); cl_command_queue cmd_queue_; cl_mem imageBuffer_; unsigned int bufSize_; unsigned int bufnum_; char* memptr; unsigned int memSize_; unsigned int testId_; bool skip_; }; #endif // _OCL_3DImageWriteSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfAES256.cpp000066400000000000000000000367451450307266000241630ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfAES256.h" #include #include #include #include "CL/cl.h" #include "Timer.h" static const char *aes256_kernel = "// NOTE: THIS KERNEL WAS ADOPTED FROM SISOFT SANDRA: DO NOT " "REDISTRIBUTE!!\n" "inline uint Load(__global uint* pData, const uint iX, const uint iY)\n" "{\n" " return pData[iX | (iY << 8)];\n" "}\n" "\n" "\n" "inline uint4 Load4(__global uint* pData, const uint4 uX, const uint iY)\n" "{\n" " uint uExtent = iY << 8;\n" " uint4 uNdx = uX + uExtent;\n" " \n" " return (uint4)(pData[uNdx.x], pData[uNdx.y], pData[uNdx.z], " "pData[uNdx.w]);\n" "}\n" "\n" "\n" "__kernel \n" "__attribute__((vec_type_hint(uint4))) \n" "void CryptThread(__global uint4* pInput, __global uint4* pOutput,\n" " __global uint* pTables,\n" " __global uint4* pKey, const uint iRounds)\n" "{\n" " const uint iNdx = get_global_id(0);\n" " \n" " uint4 state, istate, tstate;\n" " state = pInput[iNdx] ^ pKey[iRounds];\n" " \n" " for (uint i = iRounds-1; i; i--)\n" " {\n" " istate = state & 0xFF;\n" " tstate = Load4(pTables, istate.xyzw, 0);\n" "\n" " istate = (state >> 8) & 0xFF;\n" " tstate^= Load4(pTables, istate.wxyz, 1);\n" "\n" " istate = (state >> 16) & 0xFF;\n" " tstate^= Load4(pTables, istate.zwxy, 2);\n" "\n" " istate = state >> 24;\n" " tstate^= Load4(pTables, istate.yzwx, 3);\n" "\n" " state = tstate ^ pKey[i];\n" " }\n" "\n" " istate = state & 0xFF;\n" " tstate = Load4(pTables, istate.xyzw, 4);\n" "\n" " istate = (state >> 8) & 0xFF;\n" " tstate |= Load4(pTables, istate.wxyz, 4) << 8;\n" "\n" " istate = (state >> 16) & 0xFF;\n" " tstate |= Load4(pTables, istate.zwxy, 4) << 16;\n" "\n" " istate = state >> 24;\n" " tstate |= Load4(pTables, istate.yzwx, 4) << 24;\n" "\n" " pOutput[iNdx] = tstate ^ pKey[0];\n" "}\n"; static const char *aes256_kernel2 = "// NOTE: THIS KERNEL WAS ADOPTED FROM SISOFT SANDRA: DO NOT " "REDISTRIBUTE!!\n" "#define AES_BLOCK_SIZE 16\n" "#define AES_TABLE_SIZE 256\n" "\n" "#define AES_TABLE_MAX 5\n" "#define AES_CONST_SIZE (AES_TABLE_SIZE*AES_TABLE_MAX)\n" "\n" "#define AES_ROUND_128 10\n" "#define AES_ROUND_192 12\n" "#define AES_ROUND_256 14\n" "#define AES_ROUNDKEY_MAX (AES_BLOCK_SIZE/4*(AES_ROUND_256+1))\n" "#define _IS_GPU_\n" "\n" "\n" "inline uint Load(\n" "#ifdef _IS_GPU_\n" " __local uint* pData,\n" "#else\n" " __constant uint* pData,\n" "#endif\n" " const uint iX, const uint iY)\n" "{\n" " const uint uNdx = iX + iY*AES_TABLE_SIZE;\n" " return pData[uNdx];\n" "}\n" "\n" "\n" "inline uint4 Load4(\n" "#ifdef _IS_GPU_\n" " __local uint* pData,\n" "#else\n" " 
__constant uint* pData,\n" "#endif\n" " const uint4 uX, const uint iY)\n" "{\n" " const uint uExtent = iY*AES_TABLE_SIZE;\n" " const uint4 uNdx = uX + uExtent;\n" " \n" " return (uint4)(pData[uNdx.x], pData[uNdx.y], pData[uNdx.z], " "pData[uNdx.w]);\n" "}\n" "\n" "\n" "__kernel \n" "__attribute__((vec_type_hint(uint4)))\n" "#ifdef KERNEL_MAX_THREADS\n" "__attribute__((work_group_size_hint(KERNEL_MAX_THREADS, 1, 1)))\n" "#endif\n" "void CryptThread(__global const uint4* pInput, __global uint4* pOutput,\n" " __constant uint* pTables,\n" " __constant uint4* pKey, const uint iRounds)\n" "{\n" " const size_t iNdx = get_global_id(0);\n" "\n" "#ifdef _IS_GPU_\n" " #define Load4T(x, y) Load4(ulTables, x, y)\n" "\n" " __local uint ulTables[AES_CONST_SIZE];\n" "\n" " const uint iLdx = get_local_id(0);\n" " if (iLdx < AES_TABLE_SIZE) {\n" " const uint iGrps = get_local_size(0);\n" " const uint iLSize = min(iGrps, (uint)AES_TABLE_SIZE);\n" " const uint iBpL = AES_CONST_SIZE/iLSize;\n" "\n" " const uint iStart = iLdx*iBpL;\n" " const uint iEnd = iStart + iBpL;\n" "\n" " for (uint i=iStart; i> 8) & 0xFF;\n" " tstate^= Load4T(istate.yzwx, 1);\n" "\n" " istate = (state >> 16) & 0xFF;\n" " tstate^= Load4T(istate.zwxy, 2);\n" "\n" " istate = state >> 24;\n" " tstate^= Load4T(istate.wxyz, 3);\n" "\n" " state = tstate ^ pKey[i];\n" " }\n" "\n" " istate = state & 0xFF;\n" " tstate = Load4T(istate.xyzw, 4);\n" "\n" " istate = (state >> 8) & 0xFF;\n" " tstate |= Load4T(istate.yzwx, 4) << 8;\n" "\n" " istate = (state >> 16) & 0xFF;\n" " tstate |= Load4T(istate.zwxy, 4) << 16;\n" "\n" " istate = state >> 24;\n" " tstate |= Load4T(istate.wxyz, 4) << 24;\n" "\n" " pOutput[iNdx] = tstate ^ pKey[iRounds];\n" "}\n"; OCLPerfAES256::OCLPerfAES256() { _numSubTests = 2; } OCLPerfAES256::~OCLPerfAES256() {} void OCLPerfAES256::setData(cl_mem buffer, unsigned int val) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < bufSize_ / sizeof(unsigned int); i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); } void OCLPerfAES256::checkData(cl_mem buffer) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < bufSize_ / sizeof(unsigned int); i++) { } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfAES256::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; inBuffer_ = 0; outBuffer_ = 0; tableBuffer_ = 0; keyBuffer_ = 0; blockSize_ = 1024; maxIterations = 50; bufSize_ = 5592320 * sizeof(cl_uint4); error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs 
failed"); platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find platform with GPU devices, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); // Increase iterations for devices with many CUs error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(size_t), &numCUs, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); maxIterations *= (unsigned int)(1 + 10 * numCUs / 20); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); inBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY, bufSize_, NULL, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateBuffer(inBuffer) failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); tableBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY, 5120, NULL, &error_); CHECK_RESULT(tableBuffer_ == 0, "clCreateBuffer(tableBuffer) failed"); keyBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY, 240, NULL, &error_); CHECK_RESULT(keyBuffer_ == 0, "clCreateBuffer(keyBuffer) failed"); if (_openTest == 0) { program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&aes256_kernel, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); testDescString += "orig"; } else { program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&aes256_kernel2, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); testDescString += " new"; } const char *buildOps = NULL; error_ = _wrapper->clBuildProgram(program_, 1, &device, buildOps, NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "CryptThread", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); cl_uint rounds = 14; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_mem), (void *)&tableBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 3, 
sizeof(cl_mem), (void *)&keyBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 4, sizeof(cl_uint), (void *)&rounds); setData(inBuffer_, 0xdeadbeef); setData(outBuffer_, 0xdeadbeef); } void OCLPerfAES256::run(void) { int global = bufSize_ / sizeof(cl_uint4); int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < maxIterations; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); } CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // No idea what data should be in here // checkData(outBuffer_); // Compute GB/s double perf = ((double)bufSize_ * (double)maxIterations * (double)(1e-09)) / sec; _perfInfo = (float)perf; } unsigned int OCLPerfAES256::close(void) { _wrapper->clFinish(cmd_queue_); if (inBuffer_) { error_ = _wrapper->clReleaseMemObject(inBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (tableBuffer_) { error_ = _wrapper->clReleaseMemObject(tableBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(tableBuffer_) failed"); } if (keyBuffer_) { error_ = _wrapper->clReleaseMemObject(keyBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(keyBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfAES256.h000066400000000000000000000036001450307266000236100ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_AES256_H_ #define _OCL_AES256_H_ #include "OCLTestImp.h" class OCLPerfAES256 : public OCLTestImp { public: OCLPerfAES256(); virtual ~OCLPerfAES256(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void setData(cl_mem buffer, unsigned int data); void checkData(cl_mem buffer); cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem inBuffer_; cl_mem outBuffer_; cl_mem tableBuffer_; cl_mem keyBuffer_; cl_int error_; unsigned int width_; unsigned int bufSize_; unsigned int blockSize_; unsigned int maxIterations; size_t numCUs; }; #endif // _OCL_AES256_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfAtomicSpeed.cpp000066400000000000000000000704161450307266000255040ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfAtomicSpeed.h" #include <assert.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <time.h> #include "CL/cl.h" #include "OCLPerfAtomicSpeedKernels.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif // Define the test suite tests. testOCLPerfAtomicSpeedStruct testOCLPerfAtomicSpeedList[] = { {LocalHistogram, 1}, {LocalHistogram, 2}, {LocalHistogram, 4}, {GlobalHistogram, 1}, {GlobalHistogram, 2}, {GlobalHistogram, 4}, {Global4Histogram, 1}, {Global4Histogram, 2}, {Global4Histogram, 4}, {LocalReductionNoAtomics, 1}, {LocalReductionNoAtomics, 2}, {LocalReductionNoAtomics, 4}, {LocalReductionAtomics, 1}, {LocalReductionAtomics, 2}, {LocalReductionAtomics, 4}, {Local4ReductionNoAtomics, 1}, {Local4ReductionNoAtomics, 2}, {Local4ReductionNoAtomics, 4}, /* {Local4ReductionAtomics, 1}, {Local4ReductionAtomics, 2}, {Local4ReductionAtomics, 4},*/ {GlobalWGReduction, 1}, {GlobalWGReduction, 2}, {GlobalWGReduction, 4}, {GlobalAllToZeroReduction, 1}, {GlobalAllToZeroReduction, 2}, {GlobalAllToZeroReduction, 4}, {Global4WGReduction, 1}, {Global4WGReduction, 2}, {Global4WGReduction, 4}, {Global4AllToZeroReduction, 1}, {Global4AllToZeroReduction, 2}, {Global4AllToZeroReduction, 4}, }; /////////////////////////////////////////////////////////////////////////////// // OCLPerfAtomicSpeed implementation.
/////////////////////////////////////////////////////////////////////////////// OCLPerfAtomicSpeed::OCLPerfAtomicSpeed() { _atomicsSupported = false; _dataSizeTooBig = false; _numSubTests = sizeof(testOCLPerfAtomicSpeedList) / sizeof(testOCLPerfAtomicSpeedStruct); _numLoops = 10; _nCurrentInputScale = 1; _maxMemoryAllocationSize = 0; _input = NULL; _output = NULL; _inputBuffer = NULL; _outputBuffer = NULL; _workgroupSize = 256; _programs.clear(); _kernels.clear(); } OCLPerfAtomicSpeed::~OCLPerfAtomicSpeed() {} void OCLPerfAtomicSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_int status = CL_SUCCESS; device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; _cpuReductionSum = 0; _nCurrentInputScale = testOCLPerfAtomicSpeedList[_openTest].inputScale; AtomicType atomicType = testOCLPerfAtomicSpeedList[_openTest].atomicType; // Setup stuff... setupHistogram(); calculateHostBin(); context_ = 0; cmd_queue_ = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); // Get last for default #if 0 platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); #if 0 if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { platform = platforms[i]; break; } #endif num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { #if 0 if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } #endif platform = platforms[_platformIndex]; } #if 0 } #endif delete[] platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find platform with GPU devices, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, NULL, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); // Maximum single-allocation size error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(cl_ulong), &_maxMemoryAllocationSize, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo(CL_DEVICE_MAX_MEM_ALLOC_SIZE) failed"); // Check that the test size is not too big for the current GPU.
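// Note on the size check that follows: CL_DEVICE_MAX_MEM_ALLOC_SIZE is the
// largest single buffer the runtime guarantees (the OpenCL spec only promises
// about a quarter of global memory), not the total device memory; roughly
// 10 MB is subtracted below, presumably as headroom for other allocations.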
_dataSizeTooBig = false; cl_ulong tenMB = 1024 * 10240; if (_inputNBytes >= (_maxMemoryAllocationSize - tenMB)) { _dataSizeTooBig = true; return; } char *p = strstr(charbuf, "cl_khr_global_int32_base_atomics"); char *p2 = strstr(charbuf, "cl_khr_local_int32_base_atomics"); _atomicsSupported = false; if (p || p2) _atomicsSupported = true; // Verify atomics are supported. if (!_atomicsSupported) return; cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); // Create buffers... _inputBuffer = clCreateBuffer(context_, CL_MEM_READ_ONLY, _inputNBytes, 0, &status); CHECK_RESULT(status, "clCreateBuffer failed. (inputBuffer)"); // Create the programs/kernels for the current test type. CreateKernels(atomicType); _nThreadsPerGroup = _workgroupSize; _nGroups = _nThreads / _nThreadsPerGroup; _outputNBytes = _nGroups * NBINS * sizeof(cl_uint); if (IsReduction(atomicType)) _outputNBytes = _inputNBytes; _output = (cl_uint *)malloc(_outputNBytes); if (0 == _output) { _dataSizeTooBig = true; return; } // Create output Buffer _outputBuffer = clCreateBuffer(context_, CL_MEM_READ_WRITE, _outputNBytes, 0, &status); CHECK_RESULT(status, "clCreateBuffer failed. (outputBuffer)"); } // Create the programs/kernels for the current test type. void OCLPerfAtomicSpeed::CreateKernels(const AtomicType atomicType) { char log[16384]; cl_kernel kernel_; cl_program program_; char buildOptions[1000]; cl_int status = CL_SUCCESS; SNPRINTF(buildOptions, sizeof(buildOptions), "-D NBINS=%d -D BITS_PER_PIX=%d -D NBANKS=%d", NBINS, BITS_PER_PIX, NBANKS); // Create the programs. switch (atomicType) { case LocalHistogram: program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&local_atomics_histogram, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); _programs.push_back(program_); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&local_atomics_reduce, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); _programs.push_back(program_); break; case LocalReductionNoAtomics: program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&local_reduction, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); _programs.push_back(program_); break; case Local4ReductionNoAtomics: program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&local_vec4_reduction, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); _programs.push_back(program_); break; case LocalReductionAtomics: program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&local_atomics_reduction, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); _programs.push_back(program_); break; case Local4ReductionAtomics: program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&local_vec4_atomics_reduction, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); _programs.push_back(program_); break; case GlobalHistogram: case Global4Histogram: program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&global_atomics_histogram, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); _programs.push_back(program_); break; case GlobalWGReduction: case Global4WGReduction: program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&global_atomics_sum_reduction_workgroup, NULL, 
&error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); _programs.push_back(program_); break; case GlobalAllToZeroReduction: case Global4AllToZeroReduction: program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&global_atomics_sum_reduction_all_to_zero, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); _programs.push_back(program_); break; default: CHECK_RESULT(true, "Atomic type not supported (clCreateProgram)"); } // Build the programs. for (size_t i = 0; i < _programs.size(); i++) { error_ = _wrapper->clBuildProgram(_programs[i], 1, &device, buildOptions, NULL, NULL); if (error_ != CL_SUCCESS) { status = _wrapper->clGetProgramBuildInfo(_programs[i], device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } } switch (atomicType) { case LocalHistogram: kernel_ = _wrapper->clCreateKernel(_programs[0], "local_atomics_histogram", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); _kernels.push_back(kernel_); kernel_ = _wrapper->clCreateKernel(_programs[1], "local_atomics_reduce", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); _kernels.push_back(kernel_); break; case LocalReductionNoAtomics: case Local4ReductionNoAtomics: case LocalReductionAtomics: case Local4ReductionAtomics: kernel_ = _wrapper->clCreateKernel(_programs[0], "local_reduction", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); _kernels.push_back(kernel_); break; case GlobalHistogram: case Global4Histogram: kernel_ = _wrapper->clCreateKernel(_programs[0], "global_atomics_histogram", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); _kernels.push_back(kernel_); break; case GlobalWGReduction: case Global4WGReduction: kernel_ = _wrapper->clCreateKernel( _programs[0], "global_atomics_sum_reduction_workgroup", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); _kernels.push_back(kernel_); break; case GlobalAllToZeroReduction: case Global4AllToZeroReduction: kernel_ = _wrapper->clCreateKernel( _programs[0], "global_atomics_sum_reduction_all_to_zero", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); _kernels.push_back(kernel_); break; default: CHECK_RESULT(true, "Atomic type not supported (clCreateKernel)"); } } // Sets the kernel arguments based on the current test type. void OCLPerfAtomicSpeed::SetKernelArguments(const AtomicType atomicType) { int Arg = 0; int localSize = 0; int itemsPerThread = 1; cl_int status = CL_SUCCESS; switch (atomicType) { case LocalHistogram: // Set arguments for the local atomics histogram kernel status = _wrapper->clSetKernelArg(_kernels[0], Arg++, sizeof(cl_mem), (void *)&_inputBuffer); CHECK_RESULT(status, "clSetKernelArg failed. (inputBuffer)"); status |= _wrapper->clSetKernelArg(_kernels[0], Arg++, sizeof(cl_mem), (void *)&_outputBuffer); CHECK_RESULT(status, "clSetKernelArg failed. (outputBuffer)"); status |= _wrapper->clSetKernelArg(_kernels[0], Arg++, sizeof(_n4VectorsPerThread), (void *)&_n4VectorsPerThread); CHECK_RESULT(status, "clSetKernelArg failed. (n4VectorsPerThread)"); // Set arguments for the local atomics reduce kernel Arg = 0; status |= _wrapper->clSetKernelArg(_kernels[1], Arg++, sizeof(cl_mem), (void *)&_outputBuffer); CHECK_RESULT(status, "clSetKernelArg failed. (outputBuffer)"); status |= _wrapper->clSetKernelArg(_kernels[1], Arg++, sizeof(_nGroups), (void *)&_nGroups); CHECK_RESULT(status, "clSetKernelArg failed. 
(nGroups)"); break; case LocalReductionAtomics: case LocalReductionNoAtomics: case Local4ReductionNoAtomics: case Local4ReductionAtomics: status = _wrapper->clSetKernelArg(_kernels[0], Arg++, sizeof(cl_mem), (void *)&_inputBuffer); CHECK_RESULT(status, "clSetKernelArg failed. (inputBuffer)"); status |= _wrapper->clSetKernelArg(_kernels[0], Arg++, sizeof(cl_mem), (void *)&_outputBuffer); CHECK_RESULT(status, "clSetKernelArg failed. (outputBuffer)"); localSize = DEFAULT_WG_SIZE * sizeof(cl_uint); if ((Local4ReductionNoAtomics == atomicType) || (Local4ReductionAtomics == atomicType)) localSize *= 4; status = _wrapper->clSetKernelArg(_kernels[0], Arg++, localSize, NULL); CHECK_RESULT(status, "clSetKernelArg failed. (local memory)"); break; case GlobalHistogram: case Global4Histogram: case GlobalWGReduction: case Global4WGReduction: case GlobalAllToZeroReduction: case Global4AllToZeroReduction: // Set arguments for the global atomics histogram kernel if ((Global4Histogram == atomicType) || (Global4WGReduction == atomicType) || (Global4AllToZeroReduction == atomicType)) itemsPerThread = 4; status = _wrapper->clSetKernelArg( _kernels[0], Arg++, sizeof(itemsPerThread), (void *)&itemsPerThread); CHECK_RESULT(status, "clSetKernelArg failed. (itemsPerThread)"); status = _wrapper->clSetKernelArg(_kernels[0], Arg++, sizeof(cl_mem), (void *)&_inputBuffer); CHECK_RESULT(status, "clSetKernelArg failed. (inputBuffer)"); status |= _wrapper->clSetKernelArg(_kernels[0], Arg++, sizeof(cl_mem), (void *)&_outputBuffer); CHECK_RESULT(status, "clSetKernelArg failed. (outputBuffer)"); break; default: CHECK_RESULT(true, "Atomic type not supported (clSetKernelArg)"); } } // Since we write multiple times to the output in global atomics, need to // reset the content every time. void OCLPerfAtomicSpeed::ResetGlobalOutput() { cl_int status; memset(_output, 0, _outputNBytes); status = _wrapper->clEnqueueWriteBuffer(cmd_queue_, _outputBuffer, CL_TRUE, 0, _outputNBytes, _output, 0, NULL, NULL); CHECK_RESULT(status, "clEnqueueWriteBuffer failed."); status = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(status, "clFlush failed."); } // Run the local histogram kernels. void OCLPerfAtomicSpeed::RunLocalHistogram() { cl_uint status; cl_event events[2]; size_t globalThreads[3] = {1}; size_t localThreads[3] = {1}; size_t globalThreadsReduce = NBINS; size_t localThreadsReduce = _nThreadsPerGroup; globalThreads[0] = _nThreads; localThreads[0] = _nThreadsPerGroup; status = _wrapper->clEnqueueNDRangeKernel(cmd_queue_, _kernels[0], 1, NULL, globalThreads, localThreads, 0, NULL, &events[0]); CHECK_RESULT(status, "clEnqueueNDRangeKernel failed. (histogram)"); status = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, _kernels[1], 1, NULL, &globalThreadsReduce, &localThreadsReduce, 1, &events[0], &events[1]); CHECK_RESULT(status, "clEnqueueNDRangeKernel failed. (reduce)"); status = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(status, "clFlush failed."); status = _wrapper->clWaitForEvents(1, &events[0]); status |= _wrapper->clWaitForEvents(1, &events[1]); CHECK_RESULT(status, "clWaitForEvents failed."); } // Run the local reduction kernel. 
void OCLPerfAtomicSpeed::RunLocalReduction(const AtomicType atomicType) { cl_uint status; size_t globalThreads[3] = {1}; size_t localThreads[3] = {1}; globalThreads[0] = _inputNBytes / sizeof(cl_uint) / 2; localThreads[0] = _nThreadsPerGroup; if ((Local4ReductionNoAtomics == atomicType) || (Local4ReductionAtomics == atomicType)) globalThreads[0] /= 4; status = _wrapper->clEnqueueNDRangeKernel(cmd_queue_, _kernels[0], 1, NULL, globalThreads, localThreads, 0, NULL, NULL); CHECK_RESULT(status, "clEnqueueNDRangeKernel failed. (reduction)"); status = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(status, "clFlush failed."); } // Run the global histogram kernel. void OCLPerfAtomicSpeed::RunGlobalHistogram(AtomicType atomicType) { cl_uint status; size_t globalThreads[3] = {1}; size_t localThreads[3] = {1}; globalThreads[0] = _inputNBytes / sizeof(cl_uint); localThreads[0] = _nThreadsPerGroup; if ((Global4Histogram == atomicType) || (Global4WGReduction == atomicType) || (Global4AllToZeroReduction == atomicType)) globalThreads[0] /= 4; status = _wrapper->clEnqueueNDRangeKernel(cmd_queue_, _kernels[0], 1, NULL, globalThreads, localThreads, 0, NULL, NULL); CHECK_RESULT(status, "clEnqueueNDRangeKernel failed."); status = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(status, "clFlush failed."); } // Run the AtomicSpeed logic. void OCLPerfAtomicSpeed::run() { int Arg = 0; cl_uint status; AtomicType atomicType = testOCLPerfAtomicSpeedList[_openTest].atomicType; // Verify atomics are supported. if ((!_atomicsSupported) || (_dataSizeTooBig)) return; // Write data to the GPU status = _wrapper->clEnqueueWriteBuffer(cmd_queue_, _inputBuffer, CL_FALSE, 0, _inputNBytes, _input, 0, NULL, NULL); CHECK_RESULT(status, "clEnqueueWriteBuffer failed. (inputBuffer)"); status = _wrapper->clFlush(cmd_queue_); CHECK_RESULT(status, "clFlush failed."); // Set the current arguments based on the test type. SetKernelArguments(atomicType); // Run the kernels. CPerfCounter timer; double totalTime = 0.0f; for (unsigned int k = 0; k < _numLoops + 1; k++) { // Since we run multiple times using global atomics the output // would get accumulated therefore first clean it. ResetGlobalOutput(); timer.Reset(); timer.Start(); switch (atomicType) { case LocalHistogram: RunLocalHistogram(); break; case LocalReductionAtomics: case LocalReductionNoAtomics: case Local4ReductionNoAtomics: case Local4ReductionAtomics: RunLocalReduction(atomicType); break; case GlobalHistogram: case Global4Histogram: case GlobalWGReduction: case Global4WGReduction: case GlobalAllToZeroReduction: case Global4AllToZeroReduction: RunGlobalHistogram(atomicType); break; default: CHECK_RESULT(true, "Atomic type not supported"); } timer.Stop(); // Don't count the warm-up if (0 != k) totalTime += timer.GetElapsedTime(); } // Read the results back to the CPU - Only do it for the last run // of the test instead of for each iteration of _numLoops. status = _wrapper->clEnqueueReadBuffer(cmd_queue_, _outputBuffer, CL_FALSE, 0, _outputNBytes, _output, 0, NULL, NULL); CHECK_RESULT(status, "clEnqueueReadBuffer failed."); status = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(status, "clFlush failed."); // Print the results. PrintResults(atomicType, totalTime); // Check the results for the current test. 
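// VerifyResults, invoked just below, recomputes the expected value on the
// host. For the sum reductions the reference is every byte's low two bits
// (mask 0x3) accumulated over the whole input, exactly what calculateHostBin()
// stored in _cpuReductionSum. A minimal host-side sketch of that reference
// (illustration only, kept out of the build):
#if 0
  cl_uint referenceSum = 0;
  for (unsigned int n = 0; n < _inputNBytes / sizeof(cl_uint); n++)
    for (int b = 0; b < 32; b += 8) referenceSum += (_input[n] >> b) & 0x3;
  // referenceSum should equal _cpuReductionSum.
#endif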
_errorFlag = !(VerifyResults(atomicType)); } // Compare the results and see if they match bool OCLPerfAtomicSpeed::VerifyResults(const AtomicType atomicType) { cl_uint i = 0; bool flag = true; cl_uint calculatedValue = 0; cl_uint reductionElementCount = 0; switch (atomicType) { case LocalHistogram: case GlobalHistogram: case Global4Histogram: for (i = 0; i < NBINS; ++i) { if (_cpuhist[i] != _output[i]) { flag = false; break; } } break; case LocalReductionAtomics: case LocalReductionNoAtomics: case Local4ReductionNoAtomics: case Local4ReductionAtomics: case GlobalWGReduction: case Global4WGReduction: reductionElementCount = _inputNBytes / sizeof(cl_uint) / _nThreadsPerGroup; for (i = 0; i < reductionElementCount; i++) { calculatedValue += _output[i]; } flag = (calculatedValue == _cpuReductionSum); break; case GlobalAllToZeroReduction: case Global4AllToZeroReduction: flag = (_output[0] == _cpuReductionSum); break; default: CHECK_RESULT_NO_RETURN(true, "Atomic type not supported (VerifyResults)"); return false; } if (!flag) printf("WRONG VALUES!!!!!"); return flag; } unsigned int OCLPerfAtomicSpeed::close() { size_t i = 0; for (; i < _kernels.size(); i++) { error_ = _wrapper->clReleaseKernel(_kernels[i]); } for (i = 0; i < _programs.size(); i++) { error_ = _wrapper->clReleaseProgram(_programs[i]); } if (_inputBuffer) { error_ = clReleaseMemObject(_inputBuffer); CHECK_RESULT_NO_RETURN(error_, "clReleaseMemObject failed.(inputBuffer )"); } if (_outputBuffer) { error_ = clReleaseMemObject(_outputBuffer); CHECK_RESULT_NO_RETURN(error_, "clReleaseMemObject failed.(outputBuffer)"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } // Free host memory. free(_input); free(_output); // Reset everything. _kernels.clear(); _programs.clear(); _inputBuffer = NULL; _outputBuffer = NULL; cmd_queue_ = NULL; context_ = NULL; _input = NULL; _output = NULL; return _crcword; } /* Helper functions */ void OCLPerfAtomicSpeed::calculateHostBin() { // compute CPU histogram cl_int *p = (cl_int *)_input; memset(_cpuhist, 0, NBINS * sizeof(cl_uint)); _cpuReductionSum = 0; for (unsigned int i = 0; i < _inputNBytes / sizeof(cl_uint); i++) { _cpuhist[(p[i] >> 24) & 0xff]++; _cpuhist[(p[i] >> 16) & 0xff]++; _cpuhist[(p[i] >> 8) & 0xff]++; _cpuhist[(p[i] >> 0) & 0xff]++; _cpuReductionSum += ((p[i] >> 24) & 0x3) + ((p[i] >> 16) & 0x3) + ((p[i] >> 8) & 0x3) + ((p[i] >> 0) & 0x3); } } void OCLPerfAtomicSpeed::setupHistogram() { cl_int status = 0; _nThreads = 64 * 1024; #if defined(_WIN32) && !defined(_WIN64) _n4Vectors = 1024 * 1024; #else _n4Vectors = 2048 * 2048; #endif _n4Vectors *= _nCurrentInputScale; _n4VectorsPerThread = _n4Vectors / _nThreads; _inputNBytes = _n4Vectors * sizeof(cl_uint4); _input = (cl_uint *)malloc(_inputNBytes); if (0 == _input) { _dataSizeTooBig = true; return; } // random initialization of input time_t ltime; time(&ltime); cl_uint a = (cl_uint)ltime, b = (cl_uint)ltime; cl_uint *p = (cl_uint *)_input; for (unsigned int i = 0; i < _inputNBytes / sizeof(cl_uint); i++) p[i] = (b = (a * (b & 65535)) + (b >> 16)); } // Print the results of the current test.
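// The number PrintResults reports is a bandwidth: totalHistogramDataInGB
// (input gigabytes times 4, one bin update per input byte) divided by the
// mean per-pass time totalTime / _numLoops; run() already excluded the
// warm-up pass from totalTime.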
void OCLPerfAtomicSpeed::PrintResults(const AtomicType atomicType, double totalTime) { char buf[500]; char sAtomicType[100]; double inputInGB = (double)_inputNBytes * (double)(1e-09); // each cl_uint in _inputNBytes contributes 4 items. double totalHistogramDataInGB = (double)inputInGB * 4; double perf = totalTime / _numLoops; switch (atomicType) { case LocalHistogram: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Local histogram"); break; case GlobalHistogram: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Global histogram"); break; case Global4Histogram: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Global vec 4 histogram"); break; case LocalReductionNoAtomics: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Local reduction NO atomics"); break; case Local4ReductionNoAtomics: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Local vec 4 reduction NO atomics"); break; case LocalReductionAtomics: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Local reduction with atomics"); break; case Local4ReductionAtomics: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Local vec 4 reduction with atomics"); break; case GlobalWGReduction: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Global work-group reduction"); break; case Global4WGReduction: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Global vec 4 work-group reduction"); break; case GlobalAllToZeroReduction: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Global all to zero reduction"); break; case Global4AllToZeroReduction: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Global vec 4 all to zero reduction"); break; default: CHECK_RESULT(true, "Atomic type not supported (PrintResults)"); } SNPRINTF(buf, sizeof(buf), "%45s: Input [%.3f GB], Time [%.3f sec]: GB/s", sAtomicType, totalHistogramDataInGB, perf); _perfInfo = (float)(totalHistogramDataInGB / perf); testDescString = buf; } bool OCLPerfAtomicSpeed::IsReduction(const AtomicType atomicType) { return ((atomicType >= LocalReductionNoAtomics) && (atomicType <= GlobalAllToZeroReduction)); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfAtomicSpeed.h000066400000000000000000000065451450307266000251530ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_AtomicSpeed_H_ #define _OCL_AtomicSpeed_H_ #include #include #include #include #include "OCLTestImp.h" #define DEFAULT_WG_SIZE 256 #define NBINS 256 #define BITS_PER_PIX 8 #define NBANKS 16 // Define the atomic type to test. 
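// Histogram types bin whole bytes (BITS_PER_PIX = 8, NBINS = 256 bins); the
// reduction types sum only the low two bits of each byte. "Local" variants
// stage counts in LDS before a final merge, "Global" variants update device
// memory directly, and the "4" variants consume a uint4 per work-item.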
enum AtomicType { LocalHistogram = 0, GlobalHistogram, Global4Histogram, LocalReductionNoAtomics, Local4ReductionNoAtomics, LocalReductionAtomics, Local4ReductionAtomics, GlobalWGReduction, Global4WGReduction, GlobalAllToZeroReduction, Global4AllToZeroReduction, }; typedef struct { AtomicType atomicType; int inputScale; } testOCLPerfAtomicSpeedStruct; // Define the OCLPerfAtomicSpeed class. class OCLPerfAtomicSpeed : public OCLTestImp { public: OCLPerfAtomicSpeed(); virtual ~OCLPerfAtomicSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); cl_context context_; cl_command_queue cmd_queue_; std::vector<cl_program> _programs; std::vector<cl_kernel> _kernels; cl_device_id device; bool _atomicsSupported; bool _dataSizeTooBig; cl_uint _numLoops; // Histogram related stuff... private: cl_ulong _maxMemoryAllocationSize; cl_uint _inputNBytes; cl_uint _outputNBytes; cl_uint _nCurrentInputScale; cl_uint _workgroupSize; // cl_uint nLoops; cl_uint _nThreads; cl_uint _nThreadsPerGroup; cl_uint _nGroups; cl_uint _n4Vectors; cl_uint _n4VectorsPerThread; cl_uint _nBins; cl_uint _nBytesLDSPerGrp; cl_uint* _input; cl_uint* _output; cl_mem _inputBuffer; cl_mem _outputBuffer; cl_uint _cpuhist[NBINS]; cl_uint _cpuReductionSum; void calculateHostBin(); void setupHistogram(); bool VerifyResults(const AtomicType atomicType); void ResetGlobalOutput(); // Methods that do the actual NDRange. void RunLocalHistogram(); void RunLocalReduction(const AtomicType atomicType); void RunGlobalHistogram(const AtomicType atomicType); void CreateKernels(const AtomicType atomicType); bool IsReduction(const AtomicType atomicType); void SetKernelArguments(const AtomicType atomicType); void PrintResults(const AtomicType atomicType, double totalTime); }; #endif // _OCL_AtomicSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfAtomicSpeed20.cpp000066400000000000000000000402461450307266000256440ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfAtomicSpeed20.h" #include <assert.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <time.h> #include "CL/cl.h" #include "OCLPerfAtomicSpeed20Kernels.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif // Define the test suite tests.
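// Each entry pairs an AtomicType with an input-scale multiplier: the base data
// set built by setupHistogram() is 2048 * 2048 uint4 values (64 MiB), so
// scales 1, 2 and 4 exercise 64, 128 and 256 MiB inputs.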
testOCLPerfAtomicSpeed20Struct testOCLPerfAtomicSpeed20List[] = { {GlobalWGReduction, 1}, {GlobalWGReduction, 2}, {GlobalWGReduction, 4}, {GlobalAllToZeroReduction, 1}, {GlobalAllToZeroReduction, 2}, {GlobalAllToZeroReduction, 4}, {Global4WGReduction, 1}, {Global4WGReduction, 2}, {Global4WGReduction, 4}, {Global4AllToZeroReduction, 1}, {Global4AllToZeroReduction, 2}, {Global4AllToZeroReduction, 4}, }; /////////////////////////////////////////////////////////////////////////////// // OCLPerfAtomicSpeed20 implementation. /////////////////////////////////////////////////////////////////////////////// OCLPerfAtomicSpeed20::OCLPerfAtomicSpeed20() { _atomicsSupported = false; _dataSizeTooBig = false; _numSubTests = sizeof(testOCLPerfAtomicSpeed20List) / sizeof(testOCLPerfAtomicSpeed20Struct); _numLoops = 10; _nCurrentInputScale = 1; _maxMemoryAllocationSize = 0; _input = NULL; _output = NULL; _inputBuffer = NULL; _outputBuffer = NULL; skip_ = false; _workgroupSize = 256; _programs.clear(); _kernels.clear(); } OCLPerfAtomicSpeed20::~OCLPerfAtomicSpeed20() {} void OCLPerfAtomicSpeed20::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { error_ = CL_SUCCESS; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = 0; kernel_ = 0; #if defined(CL_VERSION_2_0) cl_device_id device; cl_int status = CL_SUCCESS; conversion = 1.0f; _openTest = test; _cpuReductionSum = 0; _nCurrentInputScale = testOCLPerfAtomicSpeed20List[_openTest].inputScale; AtomicType atomicType = testOCLPerfAtomicSpeed20List[_openTest].atomicType; // Setup stuff... setupHistogram(); calculateHostBin(); device = devices_[_deviceId]; cmd_queue_ = cmdQueues_[_deviceId]; char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); // Global memory size error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(cl_ulong), &_maxMemoryAllocationSize, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo(CL_DEVICE_MAX_MEM_ALLOC_SIZE) failed"); // Check that the test size is not too big for the current GPU. _dataSizeTooBig = false; cl_ulong tenMB = 1024 * 10240; if (_inputNBytes >= (_maxMemoryAllocationSize - tenMB)) { _dataSizeTooBig = true; return; } char *p = strstr(charbuf, "cl_khr_global_int32_base_atomics"); _atomicsSupported = false; if (p) _atomicsSupported = true; // Verify atomics are supported. if (!_atomicsSupported) return; cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); // Create buffers... _inputBuffer = clCreateBuffer(context_, CL_MEM_READ_ONLY, _inputNBytes, 0, &status); CHECK_RESULT(status, "clCreateBuffer failed. (inputBuffer)"); // Create the programs/kernels for the current test type. CreateKernels(atomicType); _nThreadsPerGroup = _workgroupSize; _nGroups = _nThreads / _nThreadsPerGroup; _outputNBytes = _inputNBytes; _output = (cl_uint *)malloc(_outputNBytes); if (0 == _output) { _dataSizeTooBig = true; return; } // Create output Buffer _outputBuffer = clCreateBuffer(context_, CL_MEM_READ_WRITE, _outputNBytes, 0, &status); CHECK_RESULT(status, "clCreateBuffer failed. (outputBuffer)"); #else skip_ = true; testDescString = "OpenCL verion < 2.0. Test Skipped."; return; #endif } // Create the programs/kernels for the current test type. 
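// The kernels used by this test are built with -cl-std=CL2.0 because
// atomic_fetch_add_explicit() on a __global atomic_int is an OpenCL C 2.0
// feature; OCLPerfAtomicSpeed covers the 1.x atom_add() path instead.
#if 0
// Untested alternative sketch (not used by this test): OpenCL C 2.0 also
// provides work_group_reduce_add(), which expresses the per-work-group sum
// with neither explicit atomics nor a hand-written local-memory tree:
static const char *wg_builtin_reduction =
    "__kernel void wg_builtin_reduction(__global const uint *in,\n"
    "                                   __global uint *out) {\n"
    "  uint s = work_group_reduce_add(in[get_global_id(0)]);\n"
    "  if (get_local_id(0) == 0) out[get_group_id(0)] = s;\n"
    "}\n";
#endif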
void OCLPerfAtomicSpeed20::CreateKernels(const AtomicType atomicType) { char log[16384]; cl_kernel kernel_; cl_program program_; char buildOptions[1000]; cl_int status = CL_SUCCESS; cl_device_id device = devices_[_deviceId]; SNPRINTF(buildOptions, sizeof(buildOptions), "-cl-std=CL2.0 -D NBINS=%d -D BITS_PER_PIX=%d -D NBANKS=%d", NBINS, BITS_PER_PIX, NBANKS); // Create the programs. switch (atomicType) { case GlobalWGReduction: case Global4WGReduction: program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&global_atomics_sum_reduction_workgroup, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); _programs.push_back(program_); break; case GlobalAllToZeroReduction: case Global4AllToZeroReduction: program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&global_atomics_sum_reduction_all_to_zero, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); _programs.push_back(program_); break; default: CHECK_RESULT(true, "Atomic type not supported (clCreateProgram)"); } // Build the programs. for (size_t i = 0; i < _programs.size(); i++) { error_ = _wrapper->clBuildProgram(_programs[i], 1, &device, buildOptions, NULL, NULL); if (error_ != CL_SUCCESS) { status = _wrapper->clGetProgramBuildInfo(_programs[i], device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } } switch (atomicType) { case GlobalWGReduction: case Global4WGReduction: kernel_ = _wrapper->clCreateKernel( _programs[0], "global_atomics_sum_reduction_workgroup", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); _kernels.push_back(kernel_); break; case GlobalAllToZeroReduction: case Global4AllToZeroReduction: kernel_ = _wrapper->clCreateKernel( _programs[0], "global_atomics_sum_reduction_all_to_zero", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); _kernels.push_back(kernel_); break; default: CHECK_RESULT(true, "Atomic type not supported (clCreateKernel)"); } } // Sets the kernel arguments based on the current test type. void OCLPerfAtomicSpeed20::SetKernelArguments(const AtomicType atomicType) { int Arg = 0; int localSize = 0; int itemsPerThread = 1; cl_int status = CL_SUCCESS; switch (atomicType) { case GlobalWGReduction: case Global4WGReduction: case GlobalAllToZeroReduction: case Global4AllToZeroReduction: // Set arguments for the global atomics histogram kernel if ((Global4WGReduction == atomicType) || (Global4AllToZeroReduction == atomicType)) itemsPerThread = 4; status = _wrapper->clSetKernelArg( _kernels[0], Arg++, sizeof(itemsPerThread), (void *)&itemsPerThread); CHECK_RESULT(status, "clSetKernelArg failed. (itemsPerThread)"); status = _wrapper->clSetKernelArg(_kernels[0], Arg++, sizeof(cl_mem), (void *)&_inputBuffer); CHECK_RESULT(status, "clSetKernelArg failed. (inputBuffer)"); status |= _wrapper->clSetKernelArg(_kernels[0], Arg++, sizeof(cl_mem), (void *)&_outputBuffer); CHECK_RESULT(status, "clSetKernelArg failed. (outputBuffer)"); break; default: CHECK_RESULT(true, "Atomic type not supported (clSetKernelArg)"); } } // Since we write multiple times to the output in global atomics, need to // reset the content every time. 
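// Each timed pass reruns the same atomic accumulation into _outputBuffer, so
// without this zero-fill the sums of pass k would leak into pass k + 1 and
// VerifyResults would fail; the blocking CL_TRUE write runs before
// timer.Start(), keeping the clear out of the measured time.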
void OCLPerfAtomicSpeed20::ResetGlobalOutput() { cl_int status; memset(_output, 0, _outputNBytes); status = _wrapper->clEnqueueWriteBuffer(cmd_queue_, _outputBuffer, CL_TRUE, 0, _outputNBytes, _output, 0, NULL, NULL); CHECK_RESULT(status, "clEnqueueWriteBuffer failed."); status = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(status, "clFlush failed."); } // Run the global histogram kernel. void OCLPerfAtomicSpeed20::RunGlobalHistogram(AtomicType atomicType) { cl_uint status; size_t globalThreads[3] = {1}; size_t localThreads[3] = {1}; globalThreads[0] = _inputNBytes / sizeof(cl_uint); localThreads[0] = _nThreadsPerGroup; if ((Global4WGReduction == atomicType) || (Global4AllToZeroReduction == atomicType)) globalThreads[0] /= 4; status = _wrapper->clEnqueueNDRangeKernel(cmd_queue_, _kernels[0], 1, NULL, globalThreads, localThreads, 0, NULL, NULL); CHECK_RESULT(status, "clEnqueueNDRangeKernel failed."); status = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(status, "clFlush failed."); } // Run the AtomicSpeed logic. void OCLPerfAtomicSpeed20::run() { if (skip_) { return; } #if defined(CL_VERSION_2_0) int Arg = 0; cl_uint status; AtomicType atomicType = testOCLPerfAtomicSpeed20List[_openTest].atomicType; // Verify atomics are supported. if ((!_atomicsSupported) || (_dataSizeTooBig)) return; // Write data to the GPU status = _wrapper->clEnqueueWriteBuffer(cmd_queue_, _inputBuffer, CL_FALSE, 0, _inputNBytes, _input, 0, NULL, NULL); CHECK_RESULT(status, "clEnqueueWriteBuffer failed. (inputBuffer)"); status = _wrapper->clFlush(cmd_queue_); CHECK_RESULT(status, "clFlush failed."); // Set the current arguments based on the test type. SetKernelArguments(atomicType); // Run the kernels. CPerfCounter timer; double totalTime = 0.0f; for (unsigned int k = 0; k < _numLoops + 1; k++) { // Since we run multiple times using global atomics the output // would get accumulated therefore first clean it. ResetGlobalOutput(); timer.Reset(); timer.Start(); switch (atomicType) { case GlobalWGReduction: case Global4WGReduction: case GlobalAllToZeroReduction: case Global4AllToZeroReduction: RunGlobalHistogram(atomicType); break; default: CHECK_RESULT(true, "Atomic type not supported"); } timer.Stop(); // Don't count the warm-up if (0 != k) totalTime += timer.GetElapsedTime(); } status = _wrapper->clEnqueueReadBuffer(cmd_queue_, _outputBuffer, CL_FALSE, 0, _outputNBytes, _output, 0, NULL, NULL); CHECK_RESULT(status, "clEnqueueReadBuffer failed."); status = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(status, "clFlush failed."); // Print the results. PrintResults(atomicType, totalTime); // Check the results for the current test. 
_errorFlag = !(VerifyResults(atomicType)); #endif } // Compare the results and see if they match bool OCLPerfAtomicSpeed20::VerifyResults(const AtomicType atomicType) { cl_uint i = 0; bool flag = true; cl_uint calculatedValue = 0; cl_uint reductionElementCount = 0; switch (atomicType) { case GlobalWGReduction: case Global4WGReduction: reductionElementCount = _inputNBytes / sizeof(cl_uint) / _nThreadsPerGroup; for (i = 0; i < reductionElementCount; i++) { calculatedValue += _output[i]; } flag = (calculatedValue == _cpuReductionSum); break; case GlobalAllToZeroReduction: case Global4AllToZeroReduction: flag = (_output[0] == _cpuReductionSum); break; default: CHECK_RESULT_NO_RETURN(true, "Atomic type not supported (VerifyResults)"); return false; } if (!flag) printf("WRONG VALUES!!!!!"); return flag; } unsigned int OCLPerfAtomicSpeed20::close() { size_t i = 0; for (; i < _kernels.size(); i++) { error_ = _wrapper->clReleaseKernel(_kernels[i]); } for (i = 0; i < _programs.size(); i++) { error_ = _wrapper->clReleaseProgram(_programs[i]); } if (_inputBuffer) { error_ = clReleaseMemObject(_inputBuffer); CHECK_RESULT_NO_RETURN(error_, "clReleaseMemObject failed.(inputBuffer )"); } if (_outputBuffer) { error_ = clReleaseMemObject(_outputBuffer); CHECK_RESULT_NO_RETURN(error_, "clReleaseMemObject failed.(outputBuffer)"); } // Free host memory. free(_input); free(_output); // Reset everything. _kernels.clear(); _programs.clear(); _inputBuffer = NULL; _outputBuffer = NULL; _input = NULL; _output = NULL; return OCLTestImp::close(); } /* Helper functions */ void OCLPerfAtomicSpeed20::calculateHostBin() { // compute CPU histogram cl_int *p = (cl_int *)_input; memset(_cpuhist, 0, NBINS * sizeof(cl_uint)); _cpuReductionSum = 0; for (unsigned int i = 0; i < _inputNBytes / sizeof(cl_uint); i++) { _cpuhist[(p[i] >> 24) & 0xff]++; _cpuhist[(p[i] >> 16) & 0xff]++; _cpuhist[(p[i] >> 8) & 0xff]++; _cpuhist[(p[i] >> 0) & 0xff]++; _cpuReductionSum += ((p[i] >> 24) & 0x3) + ((p[i] >> 16) & 0x3) + ((p[i] >> 8) & 0x3) + ((p[i] >> 0) & 0x3); } } void OCLPerfAtomicSpeed20::setupHistogram() { cl_int status = 0; _nThreads = 64 * 1024; _n4Vectors = 2048 * 2048; _n4Vectors *= _nCurrentInputScale; _n4VectorsPerThread = _n4Vectors / _nThreads; _inputNBytes = _n4Vectors * sizeof(cl_uint4); _input = (cl_uint *)malloc(_inputNBytes); if (0 == _input) { _dataSizeTooBig = true; return; } // random initialization of input time_t ltime; time(&ltime); cl_uint a = (cl_uint)ltime, b = (cl_uint)ltime; cl_uint *p = (cl_uint *)_input; for (unsigned int i = 0; i < _inputNBytes / sizeof(cl_uint); i++) p[i] = (b = (a * (b & 65535)) + (b >> 16)); } // Print the results of the current test. void OCLPerfAtomicSpeed20::PrintResults(const AtomicType atomicType, double totalTime) { char buf[500]; char sAtomicType[100]; double inputInGB = (double)_inputNBytes * (double)(1e-09); // each cl_uint in _inputNBytes contributes 4 items.
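// Worked example: at inputScale 1 the input is 2048 * 2048 uint4 = 64 MiB
// (~0.067 GB with the decimal 1e-9 convention used here), so
// totalHistogramDataInGB below comes out to ~0.268 GB per pass.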
double totalHistogramDataInGB = (double)inputInGB * 4; double perf = totalTime / _numLoops; switch (atomicType) { case GlobalWGReduction: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Global work-group reduction"); break; case Global4WGReduction: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Global vec 4 work-group reduction"); break; case GlobalAllToZeroReduction: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Global all to zero reduction"); break; case Global4AllToZeroReduction: SNPRINTF(sAtomicType, sizeof(sAtomicType), "Global vec 4 all to zero reduction"); break; default: CHECK_RESULT(true, "Atomic type not supported (PrintResults)"); } SNPRINTF(buf, sizeof(buf), "%45s: Input [%.3f GB], Time [%.3f sec]: GB/s", sAtomicType, totalHistogramDataInGB, perf); _perfInfo = (float)(totalHistogramDataInGB / perf); testDescString = buf; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfAtomicSpeed20.h000066400000000000000000000056561450307266000253140ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_AtomicSpeed20_H_ #define _OCL_AtomicSpeed20_H_ #include <assert.h> #include <stdio.h> #include <stdlib.h> #include <vector> #include "OCLTestImp.h" #define DEFAULT_WG_SIZE 256 #define NBINS 256 #define BITS_PER_PIX 8 #define NBANKS 16 #include "OCLPerfAtomicSpeed.h" typedef struct { AtomicType atomicType; int inputScale; } testOCLPerfAtomicSpeed20Struct; // Define the OCLPerfAtomicSpeed20 class. class OCLPerfAtomicSpeed20 : public OCLTestImp { public: OCLPerfAtomicSpeed20(); virtual ~OCLPerfAtomicSpeed20(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); cl_command_queue cmd_queue_; std::vector<cl_program> _programs; std::vector<cl_kernel> _kernels; bool _atomicsSupported; bool _dataSizeTooBig; cl_uint _numLoops; // Histogram related stuff... private: cl_ulong _maxMemoryAllocationSize; cl_uint _inputNBytes; cl_uint _outputNBytes; cl_uint _nCurrentInputScale; cl_uint _workgroupSize; // cl_uint nLoops; cl_uint _nThreads; cl_uint _nThreadsPerGroup; cl_uint _nGroups; cl_uint _n4Vectors; cl_uint _n4VectorsPerThread; cl_uint _nBins; cl_uint _nBytesLDSPerGrp; cl_uint* _input; cl_uint* _output; cl_mem _inputBuffer; cl_mem _outputBuffer; bool skip_; cl_uint _cpuhist[NBINS]; cl_uint _cpuReductionSum; void calculateHostBin(); void setupHistogram(); bool VerifyResults(const AtomicType atomicType); void ResetGlobalOutput(); // Methods that do the actual NDRange.
void RunGlobalHistogram(const AtomicType atomicType); void CreateKernels(const AtomicType atomicType); void SetKernelArguments(const AtomicType atomicType); void PrintResults(const AtomicType atomicType, double totalTime); }; #endif // _OCL_AtomicSpeed20_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfAtomicSpeed20Kernels.h000066400000000000000000000060601450307266000266310ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ static const char *global_atomics_sum_reduction_all_to_zero = "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable\n" " __kernel void global_atomics_sum_reduction_all_to_zero(uint " "ItemsPerThread, __global uint *Input, __global atomic_int *Output )\n" "{\n" " uint sum = 0;\n" " const uint msk = (uint)3;\n" " const uint shft = (uint)8;\n" " \n" " uint tid = get_global_id(0);\n" " uint Stride = get_global_size(0);\n" " for( int i = 0; i < ItemsPerThread; i++)\n" " {\n" " uint data = Input[tid];\n" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " tid += Stride;\n" " }\n" " atomic_fetch_add_explicit( &(Output[0]), sum, memory_order_relaxed, " "memory_scope_device);\n" "}\n"; static const char *global_atomics_sum_reduction_workgroup = "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable\n" " __kernel void global_atomics_sum_reduction_workgroup(uint " "ItemsPerThread, __global uint *Input, __global atomic_int *Output )\n" "{\n" " uint sum = 0;\n" " const uint msk = (uint)3;\n" " const uint shft = (uint)8;\n" " \n" " uint tid = get_global_id(0);\n" " uint Stride = get_global_size(0);\n" " for( int i = 0; i < ItemsPerThread; i++)\n" " {\n" " uint data = Input[tid];\n" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " tid += Stride;\n" " }\n" " atomic_fetch_add_explicit( &(Output[get_group_id(0)]), sum, " "memory_order_relaxed, memory_scope_device);\n" "}\n"; clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfAtomicSpeedKernels.h000066400000000000000000000361701450307266000264740ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ static const char *local_atomics_histogram = "#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable\n" "#define MIN(a,b) ((a) < (b)) ? (a) : (b) \n" "#define MAX(a,b) ((a) > (b)) ? (a) : (b) \n" "__kernel __attribute__((reqd_work_group_size(256,1,1)))\n" "void local_atomics_histogram(__global uint4 *Image,\n" "__global uint *Histogram,\n" "uint n4VectorsPerThread)\n" "{\n" " __local __attribute__((aligned(16))) uint subhists[NBANKS * NBINS];\n" "\n" " uint tid = get_global_id(0);\n" " uint ltid = get_local_id(0);\n" " uint Stride = get_global_size(0);\n" "\n" " uint i, idx;\n" " uint4 temp, temp2;\n" " const uint shft = (uint) BITS_PER_PIX;\n" " const uint msk = (uint) (NBINS-1);\n" " uint offset = (uint) ltid % (uint) (NBANKS);\n" "\n" " uint lmem_items = NBANKS * NBINS;\n" " uint lmem_items_per_thread;\n" " uint lmem_max_threads;\n" "\n" " // parallel LDS clear\n" " // first, calculate threads per item, at least 1:\n" " lmem_max_threads = MIN( 1, get_local_size(0) / lmem_items );\n" " // but no more than we have items:\n" " lmem_max_threads = MAX( 1, lmem_max_threads / lmem_items );\n" " // calculate threads total:\n" " lmem_max_threads = lmem_items / lmem_max_threads;\n" " // but no more than LDS banks:\n" " lmem_max_threads = MIN( get_local_size(0), lmem_max_threads );\n" "\n" " lmem_items_per_thread = lmem_items / lmem_max_threads;\n" "\n" " // now, clear LDS\n" " __local uint4 *p = (__local uint4 *) subhists;\n" "\n" " if( ltid < lmem_max_threads )\n" " {\n" " for(i=0, idx=ltid; i> shft;\n" " temp2 = (temp & msk) * (uint4) NBANKS + offset;\n" "\n" " (void) atom_inc( subhists + temp2.x );\n" " (void) atom_inc( subhists + temp2.y );\n" " (void) atom_inc( subhists + temp2.z );\n" " (void) atom_inc( subhists + temp2.w );\n" "\n" " temp = temp >> shft;\n" " temp2 = (temp & msk) * (uint4) NBANKS + offset;\n" "\n" " (void) atom_inc( subhists + temp2.x );\n" " (void) atom_inc( subhists + temp2.y );\n" " (void) atom_inc( subhists + temp2.z );\n" " (void) atom_inc( subhists + temp2.w );\n" "\n" " temp = temp >> shft;\n" " temp2 = (temp & msk) * (uint4) NBANKS + offset;\n" "\n" " (void) atom_inc( subhists + temp2.x );\n" " (void) atom_inc( subhists + temp2.y );\n" " (void) atom_inc( subhists + temp2.z );\n" " (void) atom_inc( subhists + temp2.w );\n" " }\n" "\n" " barrier( CLK_LOCAL_MEM_FENCE );\n" "\n" " // reduce __local banks to single histogram per work-group\n" "\n" " if( ltid < NBINS )\n" " {\n" " uint bin = 0;\n" " for( i=0; i> shft;\n" " atom_inc( &(Histogram[ (temp & 
msk) ]) );\n" " temp = temp >> shft;\n" " atom_inc( &(Histogram[ (temp & msk) ]) );\n" " temp = temp >> shft;\n" " atom_inc( &(Histogram[ (temp & msk) ]) );\n" " tid += Stride;" " }\n" "}\n"; static const char *global_vec4_atomics_histogram = "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable\n" "__kernel __attribute__((reqd_work_group_size(256,1,1)))\n" "void global_atomics_histogram(uint ItemsPerThread,\n" "__global uint4 *Input,\n" "__global uint *Histogram)\n" "{\n" " uint tid = get_global_id(0);\n" " const uint shft = (uint) BITS_PER_PIX;\n" " const uint msk = (uint) (NBINS-1);\n" " uint Stride = get_global_size(0);\n" " for( int i = 0; i < ItemsPerThread; i++)\n" " {\n" " uint4 temp = Input[tid];\n" " atom_inc( &(Histogram[ (temp.x & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.y & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.z & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.w & msk) ]) );\n" " temp = temp >> shft;\n" " atom_inc( &(Histogram[ (temp.x & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.y & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.z & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.w & msk) ]) );\n" " temp = temp >> shft;\n" " atom_inc( &(Histogram[ (temp.x & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.y & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.z & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.w & msk) ]) );\n" " temp = temp >> shft;\n" " atom_inc( &(Histogram[ (temp.x & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.y & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.z & msk) ]) );\n" " atom_inc( &(Histogram[ (temp.w & msk) ]) );\n" " tid += Stride;" " }\n" "}\n"; static const char *global_atomics_sum_reduction_all_to_zero = "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable\n" " __kernel void global_atomics_sum_reduction_all_to_zero(uint " "ItemsPerThread, __global uint *Input, __global int *Output )\n" "{\n" " uint sum = 0;\n" " const uint msk = (uint)3;\n" " const uint shft = (uint)8;\n" " \n" " uint tid = get_global_id(0);\n" " uint Stride = get_global_size(0);\n" " for( int i = 0; i < ItemsPerThread; i++)\n" " {\n" " uint data = Input[tid];\n" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " tid += Stride;\n" " }\n" " atom_add( &(Output[0]), sum);\n" "}\n"; static const char *global_atomics_sum_reduction_workgroup = "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable\n" " __kernel void global_atomics_sum_reduction_workgroup(uint " "ItemsPerThread, __global uint *Input, __global int *Output )\n" "{\n" " uint sum = 0;\n" " const uint msk = (uint)3;\n" " const uint shft = (uint)8;\n" " \n" " uint tid = get_global_id(0);\n" " uint Stride = get_global_size(0);\n" " for( int i = 0; i < ItemsPerThread; i++)\n" " {\n" " uint data = Input[tid];\n" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " data = data >> shft;" " sum += data & msk;\n" " tid += Stride;\n" " }\n" " atom_add( &(Output[get_group_id(0)]), sum);\n" "}\n"; static const char *local_reduction = "__kernel void local_reduction(__global uint* input, __global uint* " "output, __local uint* sdata)\n" "{\n" " // load shared mem\n" " const uint msk = (uint)3;\n" " const uint shft = (uint)8;\n" " unsigned int tid = get_local_id(0);\n" "\n" " unsigned int localSize = get_local_size(0);\n" " unsigned int stride = get_global_id(0) * 2;\n" " unsigned int data1 = input[stride];\n" " 
unsigned int data2 = input[stride + 1];\n" " unsigned int sum = 0;\n" " for( int i = 0; i < 4; i++)\n" " {\n" " sum += (data1 & msk) + (data2 & msk);\n" " data1 = data1 >> shft;\n" " data2 = data2 >> shft;\n" " }\n" " sdata[tid] = sum;" "\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " // do reduction in shared mem\n" " for(unsigned int s = localSize >> 1; s > 0; s >>= 1)\n" " {\n" " if(tid < s) \n" " {\n" " sdata[tid] += sdata[tid + s];\n" " }\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " }\n" "\n" " // write result for this block to global mem\n" " if(tid == 0) output[get_group_id(0)] = sdata[0];\n" "}\n"; static const char *local_vec4_reduction = "__kernel void local_reduction(__global uint4* input, __global uint4* " "output, __local uint4* sdata)\n" "{\n" " // load shared mem\n" " const uint msk = (uint)3;\n" " const uint shft = (uint)8;\n" " unsigned int tid = get_local_id(0);\n" "\n" " unsigned int localSize = get_local_size(0);\n" " unsigned int stride = get_global_id(0) * 2;\n" " uint4 data1 = input[stride];\n" " uint4 data2 = input[stride + 1];\n" " uint4 sum = 0;\n" " for( int i = 0; i < 4; i++)\n" " {\n" " sum += (data1 & msk) + (data2 & msk);\n" " data1 = data1 >> shft;\n" " data2 = data2 >> shft;\n" " }\n" " sdata[tid] = sum;" "\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " // do reduction in shared mem\n" " for(unsigned int s = localSize >> 1; s > 0; s >>= 1)\n" " {\n" " if(tid < s) \n" " {\n" " sdata[tid] += sdata[tid + s];\n" " }\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " }\n" "\n" " // write result for this block to global mem\n" " if(tid == 0) output[get_group_id(0)] = sdata[0];\n" "}\n"; static const char *local_atomics_reduction = "#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable\n" "__kernel void local_reduction(__global uint* input, __global uint* " "output, __local uint* sdata)\n" "{\n" " // load shared mem\n" " const uint msk = (uint)3;\n" " const uint shft = (uint)8;\n" " unsigned int tid = get_local_id(0);\n" "\n" " unsigned int localSize = get_local_size(0);\n" " unsigned int stride = get_global_id(0) * 2;\n" " unsigned int data1 = input[stride];\n" " unsigned int data2 = input[stride + 1];\n" " unsigned int sum = 0;\n" " for( int i = 0; i < 4; i++)\n" " {\n" " sum += (data1 & msk) + (data2 & msk);\n" " data1 = data1 >> shft;\n" " data2 = data2 >> shft;\n" " }\n" " sdata[tid] = sum;" "\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " // do reduction in shared mem\n" " for(unsigned int s = localSize >> 1; s > 0; s >>= 1)\n" " {\n" " if(tid < s) \n" " {\n" " atom_add( &(sdata[tid]), sdata[tid + s]);\n" " }\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " }\n" "\n" " // write result for this block to global mem\n" " if(tid == 0) output[get_group_id(0)] = sdata[0];\n" "}\n"; static const char *local_vec4_atomics_reduction = "#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable\n" "__kernel void local_reduction(__global uint4* input, __global uint4* " "output, __local uint4* sdata)\n" "{\n" " // load shared mem\n" " const uint msk = (uint)3;\n" " const uint shft = (uint)8;\n" " unsigned int tid = get_local_id(0);\n" "\n" " unsigned int localSize = get_local_size(0);\n" " unsigned int stride = get_global_id(0) * 2;\n" " uint4 data1 = input[stride];\n" " uint4 data2 = input[stride + 1];\n" " uint4 sum = 0;\n" " for( int i = 0; i < 4; i++)\n" " {\n" " sum += (data1 & msk) + (data2 & msk);\n" " data1 = data1 >> shft;\n" " data2 = data2 >> shft;\n" " }\n" " sdata[tid] = sum;" "\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " // do reduction in shared mem\n" " for(unsigned int s = localSize 
>> 1; s > 0; s >>= 1)\n" " {\n" " if(tid < s) \n" " {\n" " atom_add( &(sdata[tid]).x, sdata[tid + s].x);\n" " atom_add( &(sdata[tid]).y, sdata[tid + s].y);\n" " atom_add( &(sdata[tid]).z, sdata[tid + s].z);\n" " atom_add( &(sdata[tid]).w, sdata[tid + s].w);\n" " }\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " }\n" "\n" " // write result for this block to global mem\n" " if(tid == 0) output[get_group_id(0)] = sdata[0];\n" "}\n"; clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfBufferCopyOverhead.cpp000066400000000000000000000211061450307266000270210ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfBufferCopyOverhead.h" #include #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif typedef struct { unsigned int iterations; int flushEvery; } testStruct; static testStruct testList[] = { {1, -1}, {1, -1}, {10, 1}, {10, -1}, {100, 1}, {100, 10}, {100, -1}, {1000, 1}, {1000, 10}, {1000, 100}, {1000, -1}, {10000, 1}, {10000, 10}, {10000, 100}, {10000, 1000}, {10000, -1}, {100000, 1}, {100000, 10}, {100000, 100}, {100000, 1000}, {100000, 10000}, {100000, -1}, }; OCLPerfBufferCopyOverhead::OCLPerfBufferCopyOverhead() { _numSubTests = 2 * 2 * sizeof(testList) / sizeof(testStruct); } OCLPerfBufferCopyOverhead::~OCLPerfBufferCopyOverhead() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfBufferCopyOverhead::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test % (sizeof(testList) / sizeof(testStruct)); context_ = 0; cmd_queue_ = 0; srcBuffer_ = 0; dstBuffer_ = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices 
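   (the standard OpenCL two-call idiom: query the count with
   num_entries = 0 first, then fetch the IDs). A minimal standalone
   sketch of the same idiom, assuming a valid `platform` handle; the
   variable names are illustrative:

     cl_uint n = 0;
     clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &n);
     cl_device_id *ids = (cl_device_id *)malloc(n * sizeof(cl_device_id));
     if (n != 0) clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, n, ids, NULL);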
*/ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices delete platforms; } bufSize_ = 4; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_READ_ONLY; sleep = ((test / (sizeof(testList) / sizeof(testStruct))) % 2) > 0; if (test >= ((sizeof(testList) / sizeof(testStruct)) * 2)) { srcHost = true; flags |= CL_MEM_ALLOC_HOST_PTR; } else { srcHost = false; } srcBuffer_ = _wrapper->clCreateBuffer(context_, flags, bufSize_, NULL, &error_); CHECK_RESULT(srcBuffer_ == 0, "clCreateBuffer(srcBuffer) failed"); flags = CL_MEM_WRITE_ONLY; if (!srcHost) { flags |= CL_MEM_ALLOC_HOST_PTR; } dstBuffer_ = _wrapper->clCreateBuffer(context_, flags, bufSize_, NULL, &error_); CHECK_RESULT(dstBuffer_ == 0, "clCreateBuffer(dstBuffer) failed"); } void OCLPerfBufferCopyOverhead::run(void) { CPerfCounter timer; cl_event event; cl_int eventStatus; unsigned int iter = testList[_openTest].iterations; // Warm up error_ = _wrapper->clEnqueueCopyBuffer(cmd_queue_, srcBuffer_, dstBuffer_, 0, 0, bufSize_, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < iter; i++) { error_ = _wrapper->clEnqueueCopyBuffer(cmd_queue_, srcBuffer_, dstBuffer_, 0, 0, bufSize_, 0, NULL, &event); CHECK_RESULT(error_, "clEnqueueCopyBuffer failed"); if ((testList[_openTest].flushEvery > 0) && (((i + 1) % testList[_openTest].flushEvery) == 0)) { if (sleep) { _wrapper->clFinish(cmd_queue_); } else { _wrapper->clFlush(cmd_queue_); error_ = _wrapper->clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &eventStatus, NULL); while (eventStatus > 0) { error_ = _wrapper->clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &eventStatus, NULL); } } } if (i != (iter - 1)) { _wrapper->clReleaseEvent(event); } } if (sleep) { _wrapper->clFinish(cmd_queue_); } else { _wrapper->clFlush(cmd_queue_); error_ = _wrapper->clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &eventStatus, NULL); while (eventStatus > 0) { error_ = _wrapper->clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &eventStatus, NULL); } } _wrapper->clReleaseEvent(event); timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer copy time in us double perf = sec * 1000. * 1000. 
/ iter; const char *strSrc = NULL; const char *strDst = NULL; const char *strWait = NULL; if (srcHost) { strSrc = "host"; strDst = "dev"; } else { strSrc = "dev"; strDst = "host"; } if (sleep) { strWait = "sleep"; } else { strWait = "spin"; } _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " %5s, s:%4s d:%4s i:%6d (us) ", strWait, strSrc, strDst, iter); testDescString = buf; } unsigned int OCLPerfBufferCopyOverhead::close(void) { if (srcBuffer_) { error_ = _wrapper->clReleaseMemObject(srcBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(srcBuffer_) failed"); } if (dstBuffer_) { error_ = _wrapper->clReleaseMemObject(dstBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(dstBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfBufferCopyOverhead.h000066400000000000000000000033611450307266000264710ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_BufferCopyOverhead_H_ #define _OCL_BufferCopyOverhead_H_ #include "OCLTestImp.h" class OCLPerfBufferCopyOverhead : public OCLTestImp { public: OCLPerfBufferCopyOverhead(); virtual ~OCLPerfBufferCopyOverhead(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1000; cl_context context_; cl_command_queue cmd_queue_; cl_mem srcBuffer_; cl_mem dstBuffer_; cl_int error_; unsigned int bufSize_; bool sleep; bool srcHost; }; #endif // _OCL_BufferCopyOverhead_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfBufferCopySpeed.cpp000066400000000000000000000352231450307266000263310ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfBufferCopySpeed.h" #include #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 8 // 4KB, 8KB, 64KB, 256KB, 1 MB, 4MB, 16 MB, 16MB+10 static const unsigned int Sizes[NUM_SIZES] = { 4096, 8192, 65536, 262144, 1048576, 4194304, 16777216, 16777216 + 10}; static const unsigned int Iterations[2] = {1, OCLPerfBufferCopySpeed::NUM_ITER}; #define BUF_TYPES 4 // 16 ways to combine 4 different buffer types #define NUM_SUBTESTS (BUF_TYPES * BUF_TYPES) OCLPerfBufferCopySpeed::OCLPerfBufferCopySpeed() { _numSubTests = NUM_SIZES * NUM_SUBTESTS * 2; } OCLPerfBufferCopySpeed::~OCLPerfBufferCopySpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfBufferCopySpeed::setData(void *ptr, unsigned int size, unsigned int value) { unsigned int *ptr2 = (unsigned int *)ptr; value = 0; for (unsigned int i = 0; i < size >> 2; i++) { ptr2[i] = value; value++; } } void OCLPerfBufferCopySpeed::checkData(void *ptr, unsigned int size, unsigned int value) { unsigned int *ptr2 = (unsigned int *)ptr; value = 0; for (unsigned int i = 0; i < size >> 2; i++) { if (ptr2[i] != value) { printf("Data validation failed at %d! 
Got 0x%08x 0x%08x 0x%08x 0x%08x\n", i, ptr2[i], ptr2[i + 1], ptr2[i + 2], ptr2[i + 3]); printf("Expected 0x%08x 0x%08x 0x%08x 0x%08x\n", value, value, value, value); CHECK_RESULT(true, "Data validation failed!"); break; } value++; } } void OCLPerfBufferCopySpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; srcBuffer_ = 0; dstBuffer_ = 0; persistent[0] = false; persistent[1] = false; allocHostPtr[0] = false; allocHostPtr[1] = false; useHostPtr[0] = false; useHostPtr[1] = false; memptr[0] = NULL; memptr[1] = NULL; alignedmemptr[0] = NULL; alignedmemptr[1] = NULL; isAMD = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } char getVersion[128]; error_ = _wrapper->clGetPlatformInfo(platform, CL_PLATFORM_VERSION, sizeof(getVersion), getVersion, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); platformVersion[0] = getVersion[7]; platformVersion[1] = getVersion[8]; platformVersion[2] = getVersion[9]; platformVersion[3] = '\0'; bufSize_ = Sizes[_openTest % NUM_SIZES]; unsigned int srcTest = (_openTest / NUM_SIZES) % BUF_TYPES; unsigned int dstTest = (_openTest / (NUM_SIZES * BUF_TYPES)) % BUF_TYPES; if (srcTest == 3) { useHostPtr[0] = true; } else if ((srcTest == 2) && isAMD) { persistent[0] = true; } else if (srcTest == 1) { allocHostPtr[0] = true; } if ((dstTest == 1) && isAMD) { persistent[1] = true; } else if (dstTest == 2) { allocHostPtr[1] = true; } else if (dstTest == 3) { useHostPtr[1] = true; } numIter = Iterations[_openTest / (NUM_SIZES * NUM_SUBTESTS)]; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
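 *
 * Note on the CL_MEM_USE_HOST_PTR subtests set up below: the host
 * allocation must stay valid for the lifetime of the buffer created
 * over it, and a page-aligned address typically lets the runtime pin
 * or zero-copy it, so the test over-allocates and rounds up to a
 * 4 KB boundary. A sketch of that round-up (4096-byte pages assumed):
 *
 *   void *raw     = malloc(size + 4096);
 *   void *aligned = (void *)(((size_t)raw + 4095) & ~(size_t)4095);
 *   // e.g. raw == 0x1234567  ->  aligned == 0x1235000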
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_READ_ONLY; if (persistent[0]) { flags |= CL_MEM_USE_PERSISTENT_MEM_AMD; } else if (allocHostPtr[0]) { flags |= CL_MEM_ALLOC_HOST_PTR; } else if (useHostPtr[0]) { flags |= CL_MEM_USE_HOST_PTR; memptr[0] = malloc(bufSize_ + 4096); alignedmemptr[0] = (void *)(((size_t)memptr[0] + 4095) & ~4095); } srcBuffer_ = _wrapper->clCreateBuffer(context_, flags, bufSize_, alignedmemptr[0], &error_); CHECK_RESULT(srcBuffer_ == 0, "clCreateBuffer(srcBuffer) failed"); void *mem; mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, srcBuffer_, CL_TRUE, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); setData(mem, bufSize_, 0x600df00d); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, srcBuffer_, mem, 0, NULL, NULL); flags = CL_MEM_WRITE_ONLY; if (persistent[1]) { flags |= CL_MEM_USE_PERSISTENT_MEM_AMD; } else if (allocHostPtr[1]) { flags |= CL_MEM_ALLOC_HOST_PTR; } else if (useHostPtr[1]) { flags |= CL_MEM_USE_HOST_PTR; memptr[1] = malloc(bufSize_ + 4096); alignedmemptr[1] = (void *)(((size_t)memptr[1] + 4095) & ~4095); } dstBuffer_ = _wrapper->clCreateBuffer(context_, flags, bufSize_, alignedmemptr[1], &error_); CHECK_RESULT(dstBuffer_ == 0, "clCreateBuffer(dstBuffer) failed"); // Force persistent memory to be on GPU if (persistent[0]) { cl_mem memBuffer = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(memBuffer == 0, "clCreateBuffer(memBuffer) failed"); _wrapper->clEnqueueCopyBuffer(cmd_queue_, memBuffer, dstBuffer_, 0, 0, bufSize_, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); _wrapper->clReleaseMemObject(memBuffer); } if (persistent[1]) { cl_mem memBuffer = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(memBuffer == 0, "clCreateBuffer(memBuffer) failed"); _wrapper->clEnqueueCopyBuffer(cmd_queue_, srcBuffer_, memBuffer, 0, 0, bufSize_, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); _wrapper->clReleaseMemObject(memBuffer); } } void OCLPerfBufferCopySpeed::run(void) { CPerfCounter timer; // Warm up error_ = _wrapper->clEnqueueCopyBuffer(cmd_queue_, srcBuffer_, dstBuffer_, 0, 0, bufSize_, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { error_ = _wrapper->clEnqueueCopyBuffer(cmd_queue_, srcBuffer_, dstBuffer_, 0, 0, bufSize_, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBuffer failed"); } error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer copy bandwidth in GB/s double perf = ((double)bufSize_ * numIter * (double)(1e-09)) / sec; void *mem; mem = 
_wrapper->clEnqueueMapBuffer(cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); checkData(mem, bufSize_, 0x600df00d); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, dstBuffer_, mem, 0, NULL, NULL); const char *strSrc = NULL; const char *strDst = NULL; if (persistent[0]) strSrc = "per"; else if (allocHostPtr[0]) strSrc = "AHP"; else if (useHostPtr[0]) strSrc = "UHP"; else strSrc = "dev"; if (persistent[1]) strDst = "per"; else if (allocHostPtr[1]) strDst = "AHP"; else if (useHostPtr[1]) strDst = "UHP"; else strDst = "dev"; // Double results when src and dst are both on device if ((persistent[0] || (!allocHostPtr[0] && !useHostPtr[0])) && (persistent[1] || (!allocHostPtr[1] && !useHostPtr[1]))) perf *= 2.0; // Double results when src and dst are both in sysmem if ((allocHostPtr[0] || useHostPtr[0]) && (allocHostPtr[1] || useHostPtr[1])) perf *= 2.0; _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) s:%s d:%s i:%4d (GB/s) ", bufSize_, strSrc, strDst, numIter); testDescString = buf; } unsigned int OCLPerfBufferCopySpeed::close(void) { if (srcBuffer_) { error_ = _wrapper->clReleaseMemObject(srcBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(srcBuffer_) failed"); } if (dstBuffer_) { error_ = _wrapper->clReleaseMemObject(dstBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(dstBuffer_) failed"); } if (memptr[0]) { free(memptr[0]); } if (memptr[1]) { free(memptr[1]); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } void OCLPerfBufferCopyRectSpeed::run(void) { CPerfCounter timer; size_t width = static_cast(sqrt(static_cast(bufSize_))); size_t srcOrigin[3] = {0, 0, 0}; size_t dstOrigin[3] = {0, 0, 0}; size_t region[3] = {width, width, 1}; // Clamp iteration count for non-local writes to shorten test runtime unsigned int testNumIter = numIter; if (allocHostPtr[1]) { testNumIter = (numIter < 100 ? 
numIter : 100); } // Skip for 1.0 platforms if ((platformVersion[0] == '1') && (platformVersion[2] == '0')) { char buf[256]; SNPRINTF(buf, sizeof(buf), " SKIPPED "); testDescString = buf; return; } // Warm up error_ = _wrapper->clEnqueueCopyBufferRect(cmd_queue_, srcBuffer_, dstBuffer_, srcOrigin, dstOrigin, region, width, 0, width, 0, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBufferRect failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < testNumIter; i++) { error_ = _wrapper->clEnqueueCopyBufferRect( cmd_queue_, srcBuffer_, dstBuffer_, srcOrigin, dstOrigin, region, width, 0, width, 0, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBufferRect failed"); } error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer copy bandwidth in GB/s double perf = ((double)bufSize_ * testNumIter * (double)(1e-09)) / sec; const char *strSrc = NULL; const char *strDst = NULL; if (persistent[0]) strSrc = "per"; else if (allocHostPtr[0]) strSrc = "AHP"; else if (useHostPtr[0]) strSrc = "UHP"; else strSrc = "dev"; if (persistent[1]) strDst = "per"; else if (allocHostPtr[1]) strDst = "AHP"; else if (useHostPtr[1]) strDst = "UHP"; else strDst = "dev"; // Double results when src and dst are both on device if ((persistent[0] || (!allocHostPtr[0] && !useHostPtr[0])) && (persistent[1] || (!allocHostPtr[1] && !useHostPtr[1]))) perf *= 2.0; // Double results when src and dst are both in sysmem if ((allocHostPtr[0] || useHostPtr[0]) && (allocHostPtr[1] || useHostPtr[1])) perf *= 2.0; _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) s:%s d:%s i:%4d (GB/s) ", bufSize_, strSrc, strDst, testNumIter); testDescString = buf; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfBufferCopySpeed.h000066400000000000000000000042521450307266000257740ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
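
A walk-through of how OCLPerfBufferCopySpeed::open decodes one subtest
index with the existing arithmetic (illustrative only, test = 100):

    bufSize_ = Sizes[100 % 8]           ->  Sizes[4] = 1048576 (1 MB)
    srcTest  = (100 / 8)  % 4  =  0     ->  default device buffer source
    dstTest  = (100 / 32) % 4  =  3     ->  CL_MEM_USE_HOST_PTR destination
    numIter  = Iterations[100 / 128]    ->  Iterations[0] = 1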
*/ #ifndef _OCL_BufferCopySpeed_H_ #define _OCL_BufferCopySpeed_H_ #include "OCLTestImp.h" class OCLPerfBufferCopySpeed : public OCLTestImp { public: OCLPerfBufferCopySpeed(); virtual ~OCLPerfBufferCopySpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1000; cl_context context_; cl_command_queue cmd_queue_; cl_mem srcBuffer_; cl_mem dstBuffer_; cl_int error_; unsigned int bufSize_; bool persistent[2]; bool allocHostPtr[2]; bool useHostPtr[2]; unsigned int numIter; bool isAMD; char platformVersion[32]; void setData(void* ptr, unsigned int size, unsigned int value); void checkData(void* ptr, unsigned int size, unsigned int value); void* memptr[2]; void* alignedmemptr[2]; }; class OCLPerfBufferCopyRectSpeed : public OCLPerfBufferCopySpeed { public: OCLPerfBufferCopyRectSpeed() : OCLPerfBufferCopySpeed() {} public: virtual void run(void); }; #endif // _OCL_BufferCopySpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfBufferReadSpeed.cpp000066400000000000000000000260571450307266000262770ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
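
The figure these read tests report is plain bytes-moved over wall-clock
seconds, in decimal GB/s, exactly as computed in run():

    perf = ((double)bufSize_ * numIter * 1e-09) / sec;

For example, a 16777216-byte buffer read 1000 times in 2.0 s gives
16777216 * 1000 * 1e-9 / 2.0, roughly 8.39 GB/s.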
*/ #include "OCLPerfBufferReadSpeed.h" #include #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 8 // 256KB, 1 MB, 4MB, 16 MB static const unsigned int Sizes[NUM_SIZES] = { 1024, 32 * 1024, 64 * 1024, 128 * 1024, 262144, 1048576, 4194304, 16777216}; static cl_uint blockedSubtests; static const unsigned int Iterations[2] = {1, OCLPerfBufferReadSpeed::NUM_ITER}; #define NUM_OFFSETS 1 static const unsigned int offsets[NUM_OFFSETS] = {0}; #define NUM_SUBTESTS (3 + NUM_OFFSETS) extern const char *blkStr[2]; OCLPerfBufferReadSpeed::OCLPerfBufferReadSpeed() { _numSubTests = NUM_SIZES * NUM_SUBTESTS * 2; blockedSubtests = _numSubTests; _numSubTests += NUM_SIZES * NUM_SUBTESTS; } OCLPerfBufferReadSpeed::~OCLPerfBufferReadSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfBufferReadSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; outBuffer_ = 0; persistent = false; allocHostPtr = false; useHostPtr = false; hostMem = NULL; alignedMem = NULL; alignment = 4096; isAMD = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
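 *
 * The version parse just below relies on the CL_PLATFORM_VERSION
 * layout required by the spec, "OpenCL <major>.<minor> <vendor info>":
 * the prefix "OpenCL " is exactly 7 characters, so bytes 7..9 hold the
 * "X.Y" version (single-digit major and minor assumed). For example,
 * with an illustrative vendor suffix:
 *
 *   "OpenCL 2.1 AMD-APP (3354.13)"  ->  platformVersion = "2.1"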
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); char getVersion[128]; error_ = _wrapper->clGetPlatformInfo(platform, CL_PLATFORM_VERSION, sizeof(getVersion), getVersion, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); platformVersion[0] = getVersion[7]; platformVersion[1] = getVersion[8]; platformVersion[2] = getVersion[9]; platformVersion[3] = '\0'; bufSize_ = Sizes[_openTest % NUM_SIZES]; if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) > 2) { useHostPtr = true; offset = offsets[((_openTest / NUM_SIZES) % NUM_SUBTESTS) - 3]; } else if ((((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 2) && isAMD) { persistent = true; } else if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 1) { allocHostPtr = true; } if (_openTest < blockedSubtests) { numIter = Iterations[_openTest / (NUM_SIZES * NUM_SUBTESTS)]; } else { numIter = 4 * OCLPerfBufferReadSpeed::NUM_ITER / ((_openTest % NUM_SIZES) + 1); } devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_WRITE_ONLY; if (persistent) { flags |= CL_MEM_USE_PERSISTENT_MEM_AMD; } else if (allocHostPtr) { flags |= CL_MEM_ALLOC_HOST_PTR; } else if (useHostPtr) { flags |= CL_MEM_USE_HOST_PTR; hostMem = (char *)malloc(bufSize_ + alignment - 1 + offset); CHECK_RESULT(hostMem == 0, "malloc(hostMem) failed"); alignedMem = (char *)((((intptr_t)hostMem + alignment - 1) & ~(alignment - 1)) + offset); } outBuffer_ = _wrapper->clCreateBuffer(context_, flags, bufSize_, alignedMem, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); // Force memory to be on GPU if possible { cl_mem memBuffer = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(memBuffer == 0, "clCreateBuffer(memBuffer) failed"); _wrapper->clEnqueueCopyBuffer(cmd_queue_, memBuffer, outBuffer_, 0, 0, bufSize_, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); _wrapper->clReleaseMemObject(memBuffer); } } void OCLPerfBufferReadSpeed::run(void) { CPerfCounter timer; char *mem = new char[bufSize_]; cl_bool blocking = (_openTest < blockedSubtests) ? 
CL_TRUE : CL_FALSE; // Warm up error_ = _wrapper->clEnqueueReadBuffer(cmd_queue_, outBuffer_, CL_TRUE, 0, bufSize_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBuffer failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { error_ = _wrapper->clEnqueueReadBuffer(cmd_queue_, outBuffer_, blocking, 0, bufSize_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBuffer failed"); } if (blocking != CL_TRUE) { _wrapper->clFinish(cmd_queue_); } timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s double perf = ((double)bufSize_ * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char str[256]; if (persistent) { SNPRINTF(str, sizeof(str), "PERSISTENT (GB/s)"); } else if (allocHostPtr) { SNPRINTF(str, sizeof(str), "ALLOC_HOST_PTR (GB/s)"); } else if (useHostPtr) { SNPRINTF(str, sizeof(str), "off: %4d USE_HOST_PTR (GB/s)", offset); } else { SNPRINTF(str, sizeof(str), "(GB/s)"); } char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) %3s i: %4d %29s ", bufSize_, blkStr[blocking], numIter, str); testDescString = buf; delete mem; } unsigned int OCLPerfBufferReadSpeed::close(void) { if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } if (hostMem) { free(hostMem); } return _crcword; } void OCLPerfBufferReadRectSpeed::run(void) { CPerfCounter timer; char *mem = new char[bufSize_]; size_t width = static_cast(sqrt(static_cast(bufSize_))); size_t bufOrigin[3] = {0, 0, 0}; size_t hostOrigin[3] = {0, 0, 0}; size_t region[3] = {width, width, 1}; cl_bool blocking = (_openTest < blockedSubtests) ? CL_TRUE : CL_FALSE; // Clamp iterations to reduce run time unsigned int testNumIter; testNumIter = (numIter < 100 ? 
numIter : 100); // Skip for 1.0 platforms if ((platformVersion[0] == '1') && (platformVersion[2] == '0')) { char buf[256]; SNPRINTF(buf, sizeof(buf), " SKIPPED "); testDescString = buf; return; } // Warm up error_ = _wrapper->clEnqueueReadBufferRect( cmd_queue_, outBuffer_, CL_TRUE, bufOrigin, hostOrigin, region, width, 0, width, 0, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBufferRect failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < testNumIter; i++) { error_ = _wrapper->clEnqueueReadBufferRect( cmd_queue_, outBuffer_, blocking, bufOrigin, hostOrigin, region, width, 0, width, 0, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBufferRect failed"); } if (blocking != CL_TRUE) { _wrapper->clFinish(cmd_queue_); } timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s double perf = ((double)bufSize_ * testNumIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char str[256]; if (persistent) { SNPRINTF(str, sizeof(str), "PERSISTENT (GB/s)"); } else if (allocHostPtr) { SNPRINTF(str, sizeof(str), "ALLOC_HOST_PTR (GB/s)"); } else if (useHostPtr) { SNPRINTF(str, sizeof(str), "off: %4d USE_HOST_PTR (GB/s)", offset); } else { SNPRINTF(str, sizeof(str), "(GB/s)"); } char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) %3s i: %4d %29s ", bufSize_, blkStr[blocking], numIter, str); testDescString = buf; delete mem; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfBufferReadSpeed.h000066400000000000000000000040521450307266000257330ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
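
The OCLPerfBufferReadRectSpeed subclass declared below maps the linear
buffer size onto a square region (arithmetic from its run() override):

    width  = (size_t)sqrt((double)bufSize_);    e.g. 32768 -> 181
    region = {width, width, 1} with row pitch width

so one rect read moves width * width bytes (32761 for the 32 KB case).
Note that the GB/s result is still computed from bufSize_, so for sizes
that are not perfect squares the rect figure is slightly overstated.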
*/ #ifndef _OCL_BufferReadSpeed_H_ #define _OCL_BufferReadSpeed_H_ #include "OCLTestImp.h" class OCLPerfBufferReadSpeed : public OCLTestImp { public: OCLPerfBufferReadSpeed(); virtual ~OCLPerfBufferReadSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1000; cl_context context_; cl_command_queue cmd_queue_; cl_mem outBuffer_; cl_int error_; unsigned int bufSize_; bool persistent; bool allocHostPtr; bool useHostPtr; unsigned int numIter; char* hostMem; char* alignedMem; size_t alignment; unsigned int offset; bool isAMD; char platformVersion[32]; }; class OCLPerfBufferReadRectSpeed : public OCLPerfBufferReadSpeed { public: OCLPerfBufferReadRectSpeed() : OCLPerfBufferReadSpeed() {} public: virtual void run(void); }; #endif // _OCL_BufferReadSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfBufferWriteSpeed.cpp000066400000000000000000000257601450307266000265160ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
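
These tests time both blocking and non-blocking writes; in the
non-blocking case completion is drained once after the loop. A minimal
standalone sketch of that timing pattern (queue, buf, host and timer
assumed valid; error checks elided):

    timer.Start();
    for (unsigned int i = 0; i < iters; ++i) {
      clEnqueueWriteBuffer(queue, buf, CL_FALSE /* non-blocking */, 0,
                           size, host, 0, NULL, NULL);
    }
    clFinish(queue);   // count all queued work in the elapsed time
    timer.Stop();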
*/ #include "OCLPerfBufferWriteSpeed.h" #include #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 8 // 256KB, 1 MB, 4MB, 16 MB static const unsigned int Sizes[NUM_SIZES] = { 1024, 32 * 1024, 64 * 1024, 128 * 1024, 262144, 1048576, 4194304, 16777216}; static cl_uint blockedSubtests; static const unsigned int Iterations[2] = {1, OCLPerfBufferWriteSpeed::NUM_ITER}; #define NUM_OFFSETS 1 static const unsigned int offsets[NUM_OFFSETS] = {0}; #define NUM_SUBTESTS (3 + NUM_OFFSETS) extern const char *blkStr[2]; OCLPerfBufferWriteSpeed::OCLPerfBufferWriteSpeed() { _numSubTests = NUM_SIZES * NUM_SUBTESTS * 2; blockedSubtests = _numSubTests; _numSubTests += NUM_SIZES * NUM_SUBTESTS; } OCLPerfBufferWriteSpeed::~OCLPerfBufferWriteSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfBufferWriteSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; outBuffer_ = 0; persistent = false; allocHostPtr = false; useHostPtr = false; hostMem = NULL; alignedMem = NULL; alignment = 4096; isAMD = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); char getVersion[128]; error_ = _wrapper->clGetPlatformInfo(platform, CL_PLATFORM_VERSION, sizeof(getVersion), getVersion, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); platformVersion[0] = getVersion[7]; platformVersion[1] = getVersion[8]; platformVersion[2] = getVersion[9]; platformVersion[3] = '\0'; bufSize_ = Sizes[_openTest % NUM_SIZES]; if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) > 2) { useHostPtr = true; offset = offsets[((_openTest / NUM_SIZES) % NUM_SUBTESTS) - 3]; } else if ((((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 2) && isAMD) { persistent = true; } else if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 1) { allocHostPtr = true; } if (_openTest < blockedSubtests) { numIter = Iterations[_openTest / (NUM_SIZES * NUM_SUBTESTS)]; } else { numIter = 4 * OCLPerfBufferWriteSpeed::NUM_ITER / ((_openTest % NUM_SIZES) + 1); } devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_READ_ONLY; if (persistent) { flags |= CL_MEM_USE_PERSISTENT_MEM_AMD; } else if (allocHostPtr) { flags |= CL_MEM_ALLOC_HOST_PTR; } else if (useHostPtr) { flags |= CL_MEM_USE_HOST_PTR; hostMem = (char *)malloc(bufSize_ + alignment - 1 + offset); CHECK_RESULT(hostMem == 0, "malloc(hostMem) failed"); alignedMem = (char *)((((intptr_t)hostMem + alignment - 1) & ~(alignment - 1)) + offset); } outBuffer_ = _wrapper->clCreateBuffer(context_, flags, bufSize_, alignedMem, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); // Force memory to be on GPU if possible { cl_mem memBuffer = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(memBuffer == 0, "clCreateBuffer(memBuffer) failed"); _wrapper->clEnqueueCopyBuffer(cmd_queue_, outBuffer_, memBuffer, 0, 0, bufSize_, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); _wrapper->clReleaseMemObject(memBuffer); } } void OCLPerfBufferWriteSpeed::run(void) { CPerfCounter timer; char *mem = new char[bufSize_]; cl_bool blocking = (_openTest < blockedSubtests) ? 
CL_TRUE : CL_FALSE; // Warm up error_ = _wrapper->clEnqueueWriteBuffer(cmd_queue_, outBuffer_, CL_TRUE, 0, bufSize_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBuffer failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { error_ = _wrapper->clEnqueueWriteBuffer(cmd_queue_, outBuffer_, blocking, 0, bufSize_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBuffer failed"); } if (blocking != CL_TRUE) { _wrapper->clFinish(cmd_queue_); } timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer write bandwidth in GB/s double perf = ((double)bufSize_ * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char str[256]; if (persistent) { SNPRINTF(str, sizeof(str), "PERSISTENT (GB/s)"); } else if (allocHostPtr) { SNPRINTF(str, sizeof(str), "ALLOC_HOST_PTR (GB/s)"); } else if (useHostPtr) { SNPRINTF(str, sizeof(str), "off: %4d USE_HOST_PTR (GB/s)", offset); } else { SNPRINTF(str, sizeof(str), "(GB/s)"); } char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) %3s i: %4d %29s ", bufSize_, blkStr[blocking], numIter, str); testDescString = buf; delete mem; } unsigned int OCLPerfBufferWriteSpeed::close(void) { if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } if (hostMem) { free(hostMem); } return _crcword; } void OCLPerfBufferWriteRectSpeed::run(void) { CPerfCounter timer; char *mem = new char[bufSize_]; size_t width = static_cast(sqrt(static_cast(bufSize_))); size_t bufOrigin[3] = {0, 0, 0}; size_t hostOrigin[3] = {0, 0, 0}; size_t region[3] = {width, width, 1}; cl_bool blocking = (_openTest < blockedSubtests) ? 
CL_TRUE : CL_FALSE; // Skip for 1.0 platforms if ((platformVersion[0] == '1') && (platformVersion[2] == '0')) { char buf[256]; SNPRINTF(buf, sizeof(buf), " SKIPPED "); testDescString = buf; return; } // Warm up error_ = _wrapper->clEnqueueWriteBufferRect( cmd_queue_, outBuffer_, CL_TRUE, bufOrigin, hostOrigin, region, width, 0, width, 0, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBufferRect failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { error_ = _wrapper->clEnqueueWriteBufferRect( cmd_queue_, outBuffer_, blocking, bufOrigin, hostOrigin, region, width, 0, width, 0, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBufferRect failed"); } if (blocking != CL_TRUE) { _wrapper->clFinish(cmd_queue_); } timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer write bandwidth in GB/s double perf = ((double)bufSize_ * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char str[256]; if (persistent) { SNPRINTF(str, sizeof(str), "PERSISTENT (GB/s)"); } else if (allocHostPtr) { SNPRINTF(str, sizeof(str), "ALLOC_HOST_PTR (GB/s)"); } else if (useHostPtr) { SNPRINTF(str, sizeof(str), "off: %4d USE_HOST_PTR (GB/s)", offset); } else { SNPRINTF(str, sizeof(str), "(GB/s)"); } char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) %3s i: %4d %29s ", bufSize_, blkStr[blocking], numIter, str); testDescString = buf; delete mem; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfBufferWriteSpeed.h000066400000000000000000000040641450307266000261550ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
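
OCLPerfBufferWriteRectSpeed below inherits all setup from the base
class and overrides only run(), swapping the linear write for
clEnqueueWriteBufferRect. Annotated parameter layout of that call as
the override issues it (names from the .cpp):

    clEnqueueWriteBufferRect(cmd_queue_, outBuffer_, blocking,
        bufOrigin,   // {0,0,0}, x in bytes
        hostOrigin,  // {0,0,0}
        region,      // {width, width, 1}: width bytes per row, width rows
        width, 0,    // buffer row pitch, slice pitch (0 = derived)
        width, 0,    // host row pitch, slice pitch
        mem, 0, NULL, NULL);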
*/ #ifndef _OCL_BufferWriteSpeed_H_ #define _OCL_BufferWriteSpeed_H_ #include "OCLTestImp.h" class OCLPerfBufferWriteSpeed : public OCLTestImp { public: OCLPerfBufferWriteSpeed(); virtual ~OCLPerfBufferWriteSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1000; cl_context context_; cl_command_queue cmd_queue_; cl_mem outBuffer_; cl_int error_; unsigned int bufSize_; bool persistent; bool allocHostPtr; bool useHostPtr; unsigned int numIter; char* hostMem; char* alignedMem; size_t alignment; unsigned int offset; bool isAMD; char platformVersion[32]; }; class OCLPerfBufferWriteRectSpeed : public OCLPerfBufferWriteSpeed { public: OCLPerfBufferWriteRectSpeed() : OCLPerfBufferWriteSpeed() {} public: virtual void run(void); }; #endif // _OCL_BufferWriteSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfCPUMemSpeed.cpp000066400000000000000000000234451450307266000253560ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
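
Unlike the other buffer tests, this one times raw CPU access to a
mapped cl_mem rather than an OpenCL transfer call: map, run
memcpy/memset on the returned pointer, unmap. A minimal sketch of the
pattern (queue and buf assumed valid; error handling elided):

    cl_int err;
    void *p = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                 0, size, 0, NULL, NULL, &err);
    memcpy(p, hostSrc, size);    /* the timed host-side access */
    clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);
    clFinish(queue);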
*/ #include "OCLPerfCPUMemSpeed.h" #include #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 4 // 256KB, 1 MB, 4MB, 16 MB static const unsigned int Sizes[NUM_SIZES] = {262144, 1048576, 4194304, 16777216}; #define ITER_COUNT 2 static const unsigned int Iterations[2] = {1, OCLPerfCPUMemSpeed::NUM_ITER}; #define NUM_OFFSETS 1 static const unsigned int offsets[NUM_OFFSETS] = {0}; #define NUM_SUBTESTS (3 + NUM_OFFSETS) OCLPerfCPUMemSpeed::OCLPerfCPUMemSpeed() { _numSubTests = NUM_SIZES * NUM_SUBTESTS * ITER_COUNT * 3; } OCLPerfCPUMemSpeed::~OCLPerfCPUMemSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfCPUMemSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; outBuffer_ = 0; persistent = false; allocHostPtr = false; useHostPtr = false; hostMem = NULL; alignedMem = NULL; alignment = 4096; testMemset = false; isAMD = false; gpuSrc = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); CHECK_RESULT(num_devices == 0, "No devices found, cannot proceed"); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); bufSize_ = Sizes[_openTest % NUM_SIZES]; if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) > 2) { useHostPtr = true; offset = offsets[((_openTest / NUM_SIZES) % NUM_SUBTESTS) - 3]; } else if ((((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 2) && isAMD) { persistent = true; } else if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 1) { allocHostPtr = true; } numIter = Iterations[(_openTest / (NUM_SIZES * NUM_SUBTESTS)) % 2]; if (_openTest >= (NUM_SIZES * NUM_SUBTESTS * ITER_COUNT * 2)) testMemset = true; else if (_openTest >= (NUM_SIZES * NUM_SUBTESTS * ITER_COUNT)) { gpuSrc = true; numIter = std::min(numIter, 10u); } devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags; if (gpuSrc) { flags = CL_MEM_WRITE_ONLY; mapFlags = CL_MAP_READ; } else { flags = CL_MEM_READ_ONLY; mapFlags = CL_MAP_WRITE; } if (persistent) { flags |= CL_MEM_USE_PERSISTENT_MEM_AMD; } else if (allocHostPtr) { flags |= CL_MEM_ALLOC_HOST_PTR; } else if (useHostPtr) { flags |= CL_MEM_USE_HOST_PTR; hostMem = (char *)malloc(bufSize_ + alignment - 1 + offset); CHECK_RESULT(hostMem == 0, "malloc(hostMem) failed"); alignedMem = (char *)((((intptr_t)hostMem + alignment - 1) & ~(alignment - 1)) + offset); } outBuffer_ = _wrapper->clCreateBuffer(context_, flags, bufSize_, alignedMem, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); // Force memory to be on GPU if possible { cl_mem memBuffer = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(memBuffer == 0, "clCreateBuffer(memBuffer) failed"); _wrapper->clEnqueueCopyBuffer(cmd_queue_, memBuffer, outBuffer_, 0, 0, bufSize_, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); _wrapper->clReleaseMemObject(memBuffer); } } void OCLPerfCPUMemSpeed::run(void) { CPerfCounter timer; void *mem; // Warm up mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, outBuffer_, CL_TRUE, mapFlags, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, outBuffer_, CL_TRUE, mapFlags, 0, bufSize_, 0, NULL, NULL, &error_); char *cpumem = new char[bufSize_]; timer.Reset(); timer.Start(); if (testMemset) { for (unsigned int i = 0; i < numIter; i++) { memset(mem, 0, bufSize_); } } else { if (gpuSrc) { for (unsigned int i = 0; i < numIter; i++) { memcpy((void *)cpumem, mem, bufSize_); } } else { for (unsigned int i = 0; i < numIter; i++) { memcpy(mem, (void *)cpumem, bufSize_); } } } timer.Stop(); delete[] cpumem; CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, 
"clEnqueueUnmapBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); double sec = timer.GetElapsedTime(); // Map read bandwidth in GB/s double perf = ((double)bufSize_ * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char str[256]; if (persistent) { SNPRINTF(str, sizeof(str), "PERSISTENT (GB/s)"); } else if (allocHostPtr) { SNPRINTF(str, sizeof(str), "ALLOC_HOST_PTR (GB/s)"); } else if (useHostPtr) { SNPRINTF(str, sizeof(str), "off: %4d USE_HOST_PTR (GB/s)", offset); } else { SNPRINTF(str, sizeof(str), "(GB/s)"); } const char *str2 = NULL; if (testMemset) str2 = "memset to dev"; else { if (gpuSrc) str2 = "memcpy from dev"; else str2 = "memcpy to dev"; } char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) %15s i: %4d %29s ", bufSize_, str2, numIter, str); testDescString = buf; } unsigned int OCLPerfCPUMemSpeed::close(void) { if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } if (hostMem) { free(hostMem); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfCPUMemSpeed.h000066400000000000000000000035771450307266000250270ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_CPUMemSpeed_H_ #define _OCL_CPUMemSpeed_H_ #include "OCLTestImp.h" class OCLPerfCPUMemSpeed : public OCLTestImp { public: OCLPerfCPUMemSpeed(); virtual ~OCLPerfCPUMemSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 100; cl_context context_; cl_command_queue cmd_queue_; cl_mem outBuffer_; cl_int error_; unsigned int bufSize_; bool persistent; bool allocHostPtr; bool useHostPtr; unsigned int numIter; bool testMemset; char* hostMem; char* alignedMem; size_t alignment; unsigned int offset; bool isAMD; bool gpuSrc; cl_map_flags mapFlags; }; #endif // _OCL_CPUMemSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfCommandQueue.cpp000066400000000000000000000120441450307266000256630ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfCommandQueue.h" #include #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" static const size_t BufSize = 0x1000; static const size_t Iterations = 0x100; static const size_t TotalQueues = 4; static const size_t TotalBufs = 4; OCLPerfCommandQueue::OCLPerfCommandQueue() { _numSubTests = TotalQueues * TotalBufs; failed_ = false; } OCLPerfCommandQueue::~OCLPerfCommandQueue() {} void OCLPerfCommandQueue::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { cl_mem buffer; _deviceId = deviceId; CPerfCounter timer; timer.Reset(); timer.Start(); OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); timer.Stop(); if (test == 0) { printf("Runtime load/init time: %0.2f ms\n", static_cast(timer.GetElapsedTime() * 1000)); } test_ = test; cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } static const size_t MemObjects[] = {1, 100, 1000, 5000}; size_t numMems = MemObjects[test_ / TotalBufs]; size_t bufSize = BufSize * sizeof(cl_int4); for (size_t b = 0; b < numMems; ++b) { buffer = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, bufSize, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLPerfCommandQueue::run(void) { if (failed_) { return; } unsigned int* values; values = reinterpret_cast(new cl_int4[BufSize]); CPerfCounter timer; static const size_t Queues[] = {1, 2, 4, 8}; size_t numQueues = Queues[test_ % TotalQueues]; // Clear destination buffer memset(values, 0, BufSize * sizeof(cl_int4)); size_t iter = Iterations / (numQueues * ((size_t)1 << (test_ / TotalBufs + 1))); std::vector cmdQueues(numQueues); timer.Reset(); timer.Start(); for (size_t i = 0; i < iter; ++i) { for (size_t q = 0; q < numQueues; ++q) { cl_command_queue cmdQueue = _wrapper->clCreateCommandQueue( context_, devices_[_deviceId], 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueue() failed"); cmdQueues[q] = cmdQueue; } timer.Stop(); for (size_t q = 0; q < numQueues; ++q) { for (size_t b = 0; b < buffers_.size(); ++b) { error_ = _wrapper->clEnqueueWriteBuffer(cmdQueues[q], buffers_[b], CL_TRUE, 0, sizeof(cl_int4), values, 0, NULL, NULL); } } timer.Start(); for (size_t q = 0; q < numQueues; ++q) { error_ = _wrapper->clReleaseCommandQueue(cmdQueues[q]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseCommandQueue() failed"); } CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); } timer.Stop(); std::stringstream stream; stream << "Create+destroy time for " << numQueues << " queues and " << buffers_.size() << " buffers"; stream.precision(3); stream.width(5); stream.setf(std::ios::fixed, std::ios::floatfield); stream << "(ms)"; testDescString = stream.str(); _perfInfo = static_cast(timer.GetElapsedTime() * 1000 / (iter * numQueues)); delete[] values; } unsigned int OCLPerfCommandQueue::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfCommandQueue.h000066400000000000000000000030701450307266000253270ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 
Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PERF_COMMAND_QUEUE_H_ #define _OCL_PERF_COMMAND_QUEUE_H_ #include "OCLTestImp.h" class OCLPerfCommandQueue : public OCLTestImp { public: OCLPerfCommandQueue(); virtual ~OCLPerfCommandQueue(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; unsigned int test_; }; #endif // _OCL_PERF_COMMAND_QUEUE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfConcurrency.cpp000066400000000000000000000456171450307266000256060ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfConcurrency.h" #include #include #include #include "CL/cl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif typedef struct { double x; double y; double width; } coordRec; static coordRec coords[] = { {0.0, 0.0, 0.00001}, // All black }; static unsigned int numCoords = sizeof(coords) / sizeof(coordRec); static const char *float_mandel_vec = "__kernel void mandelbrot(__global uint *out, uint width, float xPos, " "float yPos, float xStep, float yStep, uint maxIter)\n" "{\n" " int tid = get_global_id(0);\n" " int i = tid % (width/4);\n" " int j = tid / (width/4);\n" " int4 veci = (int4)(4*i, 4*i+1, 4*i+2, 4*i+3);\n" " int4 vecj = (int4)(j, j, j, j);\n" " float4 x0;\n" " x0.s0 = (float)(xPos + xStep*veci.s0);\n" " x0.s1 = (float)(xPos + xStep*veci.s1);\n" " x0.s2 = (float)(xPos + xStep*veci.s2);\n" " x0.s3 = (float)(xPos + xStep*veci.s3);\n" " float4 y0;\n" " y0.s0 = (float)(yPos + yStep*vecj.s0);\n" " y0.s1 = (float)(yPos + yStep*vecj.s1);\n" " y0.s2 = (float)(yPos + yStep*vecj.s2);\n" " y0.s3 = (float)(yPos + yStep*vecj.s3);\n" "\n" " float4 x = x0;\n" " float4 y = y0;\n" "\n" " uint iter = 0;\n" " float4 tmp;\n" " int4 stay;\n" " int4 ccount = 0;\n" " float4 savx = x;\n" " float4 savy = y;\n" " stay = (x*x+y*y) <= (float4)(4.0f, 4.0f, 4.0f, 4.0f);\n" " for (iter = 0; (stay.s0 | stay.s1 | stay.s2 | stay.s3) && (iter < " "maxIter); iter+=16)\n" " {\n" " x = savx;\n" " y = savy;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " stay = (x*x+y*y) <= (float4)(4.0f, 4.0f, 4.0f, 4.0f);\n" " savx = (stay ? x : savx);\n" " savy = (stay ? y : savy);\n" " ccount -= stay*16;\n" " }\n" " // Handle remainder\n" " if (!(stay.s0 & stay.s1 & stay.s2 & stay.s3))\n" " {\n" " iter = 16;\n" " do\n" " {\n" " x = savx;\n" " y = savy;\n" " // More efficient to use scalar ops here: Why?\n" " stay.s0 = ((x.s0*x.s0+y.s0*y.s0) <= 4.0f) && (ccount.s0 < " "maxIter);\n" " stay.s1 = ((x.s1*x.s1+y.s1*y.s1) <= 4.0f) && (ccount.s1 < " "maxIter);\n" " stay.s2 = ((x.s2*x.s2+y.s2*y.s2) <= 4.0f) && (ccount.s2 < " "maxIter);\n" " stay.s3 = ((x.s3*x.s3+y.s3*y.s3) <= 4.0f) && (ccount.s3 < " "maxIter);\n" " tmp = x;\n" " x = x*x + x0 - y*y;\n" " y = 2.0f*tmp*y + y0;\n" " ccount += stay;\n" " iter--;\n" " savx.s0 = (stay.s0 ? x.s0 : savx.s0);\n" " savx.s1 = (stay.s1 ? x.s1 : savx.s1);\n" " savx.s2 = (stay.s2 ? x.s2 : savx.s2);\n" " savx.s3 = (stay.s3 ? 
x.s3 : savx.s3);\n" " savy.s0 = (stay.s0 ? y.s0 : savy.s0);\n" " savy.s1 = (stay.s1 ? y.s1 : savy.s1);\n" " savy.s2 = (stay.s2 ? y.s2 : savy.s2);\n" " savy.s3 = (stay.s3 ? y.s3 : savy.s3);\n" " } while ((stay.s0 | stay.s1 | stay.s2 | stay.s3) && iter);\n" " }\n" " __global uint4 *vecOut = (__global uint4 *)out;\n" " vecOut[tid] = convert_uint4(ccount);\n" "}\n"; OCLPerfConcurrency::OCLPerfConcurrency() { _numSubTests = 10 * numCoords; } OCLPerfConcurrency::~OCLPerfConcurrency() {} void OCLPerfConcurrency::setData(cl_mem buffer, unsigned int val) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_[0], buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < width_; i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_[0], buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_[0]); } void OCLPerfConcurrency::checkData(cl_mem buffer) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_[0], buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); totalIters = 0; for (unsigned int i = 0; i < width_; i++) { totalIters += data[i]; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_[0], buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_[0]); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfConcurrency::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; unsigned int i; if (type_ != CL_DEVICE_TYPE_GPU) { char msg[256]; SNPRINTF(msg, sizeof(msg), "No GPU devices present. Exiting!\t"); testDescString = msg; return; } _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; for (i = 0; i < MAX_ASYNC_QUEUES; i++) { cmd_queue_[i] = 0; program_[i] = 0; kernel_[i] = 0; outBuffer_[i] = 0; } // Maximum iteration count // NOTE: Some kernels are unrolled 16 times, so make sure maxIter is divisible // by 16 NOTE: Can increase to get better peak performance numbers, but be // sure not to TDR slow ASICs! NOTE:. for warmup run we use maxIter = 256 and // then for the actual run we use maxIter = 8388608 * (engine_clock / 1000). maxIter = 256; // NOTE: Width needs to be divisible by 4 because the float_mandel_vec kernel // processes 4 pixels at once NOTE: Can increase to get better peak // performance numbers, but be sure not to TDR slow ASICs! 
width_ = 256; // We compute a square domain bufSize_ = width_ * sizeof(cl_uint); error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); cl_uint numAsyncQueues; error_ = _wrapper->clGetDeviceInfo( device, CL_DEVICE_AVAILABLE_ASYNC_QUEUES_AMD, sizeof(numAsyncQueues), &numAsyncQueues, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); CHECK_RESULT(numAsyncQueues > MAX_ASYNC_QUEUES, "numAsyncQueues is too large for this test"); error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(size_t), &numCUs, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); switch (_openTest) { case 0: num_cmd_queues = num_programs = num_kernels = num_outbuffers = 1; break; case 1: num_cmd_queues = 1; num_programs = 1; num_kernels = 1; num_outbuffers = 2; break; case 2: num_cmd_queues = 1; num_programs = 2; num_kernels = 2; num_outbuffers = 2; break; case 3: num_cmd_queues = num_programs = num_kernels = num_outbuffers = 2; break; case 4: case 5: case 6: case 7: case 8: case 9: num_cmd_queues = num_programs = num_kernels = num_outbuffers = numAsyncQueues % 8; break; default: break; } for (i = 0; i < num_cmd_queues; i++) { cmd_queue_[i] = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_[i] == 0, "clCreateCommandQueue failed"); } for (i = 0; i < num_outbuffers; i++) { outBuffer_[i] = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_[i] == 0, "clCreateBuffer(outBuffer) failed"); } const char *tmp; tmp = float_mandel_vec; for (i = 0; i < num_programs; i++) { program_[i] = 
_wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_[i] == 0, "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_[i], 1, &device, "", NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo( program_[i], device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } } for (i = 0; i < num_kernels; i++) { kernel_[i] = _wrapper->clCreateKernel(program_[i], "mandelbrot", &error_); CHECK_RESULT(kernel_[i] == 0, "clCreateKernel failed"); } coordIdx = _openTest % numCoords; float xStep = (float)(coords[coordIdx].width / (double)width_); float yStep = (float)(-coords[coordIdx].width / (double)width_); float xPos = (float)(coords[coordIdx].x - 0.5 * coords[coordIdx].width); float yPos = (float)(coords[coordIdx].y + 0.5 * coords[coordIdx].width); for (i = 0; i < num_kernels; i++) { error_ = _wrapper->clSetKernelArg(kernel_[i], 0, sizeof(cl_mem), (void *)&outBuffer_[i]); error_ = _wrapper->clSetKernelArg(kernel_[i], 1, sizeof(cl_uint), (void *)&width_); error_ = _wrapper->clSetKernelArg(kernel_[i], 2, sizeof(cl_float), (void *)&xPos); error_ = _wrapper->clSetKernelArg(kernel_[i], 3, sizeof(cl_float), (void *)&yPos); error_ = _wrapper->clSetKernelArg(kernel_[i], 4, sizeof(cl_float), (void *)&xStep); error_ = _wrapper->clSetKernelArg(kernel_[i], 5, sizeof(cl_float), (void *)&yStep); error_ = _wrapper->clSetKernelArg(kernel_[i], 6, sizeof(cl_uint), (void *)&maxIter); } for (i = 0; i < num_outbuffers; i++) { setData(outBuffer_[i], 0xdeadbeef); } unsigned int clkFrequency = 0; error_ = clGetDeviceInfo(device, CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(clkFrequency), &clkFrequency, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); assert(clkFrequency > 0); maxIter = (unsigned int)(((8388608 * ((float)clkFrequency / 1000)) * numCUs) / 128); maxIter = (maxIter + 15) & ~15; } void OCLPerfConcurrency::run(void) { // Test runs only on GPU if (type_ != CL_DEVICE_TYPE_GPU) return; int global = width_ >> 2; // We handle 4 pixels per thread int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; unsigned int i; // Warmup for (i = 0; i < num_kernels; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_[i % num_cmd_queues], kernel_[i], 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } for (i = 0; i < num_cmd_queues; i++) { _wrapper->clFlush(cmd_queue_[i]); } for (i = 0; i < num_cmd_queues; i++) { _wrapper->clFinish(cmd_queue_[i]); } for (i = 0; i < num_kernels; i++) { error_ = _wrapper->clSetKernelArg(kernel_[i], 6, sizeof(cl_uint), (void *)&maxIter); } CPerfCounter timer; timer.Reset(); timer.Start(); for (i = 0; i < num_kernels; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_[i % num_cmd_queues], kernel_[i], 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } if (_openTest == 1) { error_ = _wrapper->clSetKernelArg(kernel_[0], 0, sizeof(cl_mem), (void *)&outBuffer_[1]); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_[0], kernel_[0], 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } for (i = 0; i < num_cmd_queues; 
i++) { _wrapper->clFlush(cmd_queue_[i]); } for (i = 0; i < num_cmd_queues; i++) { _wrapper->clFinish(cmd_queue_[i]); } timer.Stop(); double sec = timer.GetElapsedTime(); unsigned long long expected = (unsigned long long)width_ * (unsigned long long)maxIter; for (i = 0; i < num_outbuffers; i++) { checkData(outBuffer_[i]); CHECK_RESULT(totalIters != expected, "Incorrect iteration count detected!"); } _perfInfo = (float)sec; if (_openTest == 0) testDescString = "time for 1 kernel (s) "; else if (_openTest == 1) testDescString = "time for 2 kernels (s) (same kernel) "; else if (_openTest == 2) testDescString = "time for 2 kernels (s) (diff kernels)"; else { char buf[128]; SNPRINTF(buf, sizeof(buf), "time for %d kernels (s) ( %d queues) ", num_kernels, num_cmd_queues); testDescString = buf; } } unsigned int OCLPerfConcurrency::close(void) { unsigned int i; // Test runs only on GPU if (type_ != CL_DEVICE_TYPE_GPU) return 0; _wrapper->clFinish(cmd_queue_[0]); for (i = 0; i < num_outbuffers; i++) { error_ = _wrapper->clReleaseMemObject(outBuffer_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } for (i = 0; i < num_kernels; i++) { error_ = _wrapper->clReleaseKernel(kernel_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel(kernel_) failed"); } for (i = 0; i < num_programs; i++) { error_ = _wrapper->clReleaseProgram(program_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram(program_) failed"); } for (i = 0; i < num_cmd_queues; i++) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfConcurrency.h000066400000000000000000000041421450307266000252370ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_Perf_Concurrency_H_ #define _OCL_Perf_Concurrency_H_ #include "OCLTestImp.h" class OCLPerfConcurrency : public OCLTestImp { public: OCLPerfConcurrency(); virtual ~OCLPerfConcurrency(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void setData(cl_mem buffer, unsigned int data); void checkData(cl_mem buffer); #define MAX_ASYNC_QUEUES 8 cl_context context_; cl_command_queue cmd_queue_[MAX_ASYNC_QUEUES]; cl_program program_[MAX_ASYNC_QUEUES]; cl_kernel kernel_[MAX_ASYNC_QUEUES]; cl_mem outBuffer_[MAX_ASYNC_QUEUES]; cl_int error_; unsigned int num_cmd_queues; unsigned int num_programs; unsigned int num_kernels; unsigned int num_outbuffers; unsigned int width_; unsigned int bufSize_; unsigned int maxIter; unsigned int coordIdx; unsigned long long totalIters; size_t numCUs; }; #endif // _OCL_Perf_Concurrency_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDevMemReadSpeed.cpp000066400000000000000000000220351450307266000262330ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfDevMemReadSpeed.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 1 static const unsigned int Sizes[NUM_SIZES] = {256 * 1024 * 1024}; const static char *strKernel = "__kernel void read_kernel(__global uint16 *src, ulong size1, uint " "threads, __global uint* dst\n" " )\n" "{\n" " uint16 pval;\n" " int idx = get_global_id(0);\n" " __global uint16 *srcEnd = src + size1;\n" " uint tmp = 0;\n" " src = &src[idx];" " while (src < srcEnd) \n" " {\n" " pval = *src;\n" " src += threads;\n" " tmp += pval.s0 + pval.s1 + pval.s2 + pval.s3 + pval.s4 + pval.s5 + pval.s6 + \ pval.s7 + pval.s8 + pval.s9 + pval.sa + pval.sb + pval.sc + pval.sd + pval.se + pval.sf;\n" " }\n" " atomic_add(dst, tmp);\n" "}\n"; OCLPerfDevMemReadSpeed::OCLPerfDevMemReadSpeed() { _numSubTests = 1; } OCLPerfDevMemReadSpeed::~OCLPerfDevMemReadSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfDevMemReadSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { error_ = CL_SUCCESS; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = 0; kernel_ = 0; skip_ = false; dstBuffer_ = 0; nBytes = Sizes[0]; cl_ulong loopCnt = nBytes / (16 * sizeof(cl_uint)); cl_uint maxCUs; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cl_uint), &maxCUs, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); wgs = 64; const static cl_uint wavesPerCU = 8; nWorkItems = maxCUs * wavesPerCU * wgs; inputData = 0x1; nIter = 1000; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "read_kernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); srcBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY, nBytes, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer(srcBuffer) failed"); void *mem; mem = _wrapper->clEnqueueMapBuffer(cmdQueues_[_deviceId], srcBuffer_, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, nBytes, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); for (unsigned int i = 0; i < nBytes / sizeof(cl_uint); ++i) { reinterpret_cast(mem)[i] = inputData; } dstBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer(dstBuffer) failed"); _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], srcBuffer_, mem, 0, NULL, NULL); mem = _wrapper->clEnqueueMapBuffer(cmdQueues_[_deviceId], dstBuffer_, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, sizeof(cl_uint), 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memset(mem, 0, sizeof(cl_uint)); _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], dstBuffer_, mem, 0, NULL, NULL); error_ = _wrapper->clSetKernelArg(kernel_, 0, 
sizeof(cl_mem), &srcBuffer_); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_ulong), (void *)&loopCnt); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void *)&nWorkItems); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_mem), (void *)&dstBuffer_); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); } void OCLPerfDevMemReadSpeed::run(void) { if (skip_) { return; } CPerfCounter timer; size_t gws[1] = {nWorkItems}; size_t lws[1] = {wgs}; // warm up error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); cl_uint *memResult; memResult = (cl_uint *)malloc(sizeof(cl_uint)); if (0 == memResult) { CHECK_RESULT_NO_RETURN(0, "malloc failed!\n"); return; } memset(memResult, 0, sizeof(cl_uint)); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], dstBuffer_, CL_FALSE, 0, sizeof(cl_uint), memResult, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBuffer dstBuffer_ failed!"); _wrapper->clFinish(cmdQueues_[_deviceId]); if (memResult[0] != (nBytes / sizeof(cl_uint))) { CHECK_RESULT_NO_RETURN(0, "Data validation failed for warm up run!\n"); free(memResult); return; } free(memResult); timer.Reset(); timer.Start(); double sec2 = 0; cl_event *events = new cl_event[nIter]; for (unsigned int i = 0; i < nIter; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, &events[i]); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFinish(cmdQueues_[_deviceId]); timer.Stop(); for (unsigned int i = 0; i < nIter; i++) { cl_ulong startTime = 0, endTime = 0; error_ = _wrapper->clGetEventProfilingInfo( events[i], CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startTime, 0); CHECK_RESULT(error_, "clGetEventProfilingInfo failed"); error_ = _wrapper->clGetEventProfilingInfo( events[i], CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &endTime, 0); CHECK_RESULT(error_, "clGetEventProfilingInfo failed"); _wrapper->clReleaseEvent(events[i]); sec2 += endTime - startTime; } double sec = timer.GetElapsedTime(); delete[] events; // read speed in GB/s double perf = ((double)nBytes * nIter * (double)(1e-09)) / sec; double perf2 = ((double)nBytes * nIter) / sec2; _perfInfo = (float)perf2; float perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) i:%4d Wall time Perf: %.2f (GB/s)", nBytes, nIter, perfInfo); testDescString = buf; } unsigned int OCLPerfDevMemReadSpeed::close(void) { if (!skip_) { if (srcBuffer_) { error_ = _wrapper->clReleaseMemObject(srcBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(srcBuffer_) failed"); } if (dstBuffer_) { error_ = _wrapper->clReleaseMemObject(dstBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(srcBuffer_) failed"); } } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDevMemReadSpeed.h000066400000000000000000000035441450307266000257040ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_DevMemReadSpeed_H_ #define _OCL_DevMemReadSpeed_H_ #include "OCLTestImp.h" class OCLPerfDevMemReadSpeed : public OCLTestImp { public: OCLPerfDevMemReadSpeed(); virtual ~OCLPerfDevMemReadSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); cl_mem srcBuffer_; cl_mem dstBuffer_; unsigned int nWorkItems; // number of GPU work items unsigned int wgs; // work group size unsigned int nBytes; // input and output buffer size unsigned int nIter; // overall number of timing loops cl_uint inputData; // input data to fill the input buffer bool skip_; }; #endif // _OCL_DevMemReadSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDevMemWriteSpeed.cpp000066400000000000000000000167501450307266000264610ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfDevMemWriteSpeed.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 1 static const unsigned int Sizes[NUM_SIZES] = {256 * 1024 * 1024}; const static char *strKernel = "__kernel void write_kernel(__global uint16 *dst, ulong size1, uint " "threads\n" " )\n" "{\n" " uint16 pval = (uint16)(0xabababab, 0xabababab, 0xabababab, 0xabababab, 0xabababab, 0xabababab, 0xabababab, 0xabababab,\ 0xabababab, 0xabababab, 0xabababab, 0xabababab, 0xabababab, 0xabababab, 0xabababab, 0xabababab);\n" " int idx = get_global_id(0);\n" " __global uint16 *dstEnd = dst + size1;\n" " dst = &dst[idx];" " do\n" " {\n" " *dst = pval;\n" " dst += threads;\n" " }\n" " while (dst < dstEnd);\n" "}\n"; OCLPerfDevMemWriteSpeed::OCLPerfDevMemWriteSpeed() { _numSubTests = 1; } OCLPerfDevMemWriteSpeed::~OCLPerfDevMemWriteSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfDevMemWriteSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { error_ = CL_SUCCESS; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = 0; kernel_ = 0; skip_ = false; dstBuffer_ = 0; nBytes = Sizes[0]; cl_ulong loopCnt = nBytes / (16 * sizeof(cl_uint)); cl_uint maxCUs; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cl_uint), &maxCUs, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); wgs = 64; const static cl_uint wavesPerCU = 8; nWorkItems = maxCUs * wavesPerCU * wgs; inputData = 0xabababab; nIter = 1000; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "write_kernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); dstBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, nBytes, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer(dstBuffer) failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &dstBuffer_); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_ulong), (void *)&loopCnt); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void *)&nWorkItems); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); } void OCLPerfDevMemWriteSpeed::run(void) { if (skip_) { return; } CPerfCounter timer; size_t gws[1] = {nWorkItems}; size_t lws[1] = {wgs}; // warm up error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); cl_uint *memResult; memResult = (cl_uint *)malloc(nBytes); if (0 == memResult) { CHECK_RESULT_NO_RETURN(0, "malloc failed!\n"); return; } 
memset(memResult, 0, nBytes); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], dstBuffer_, CL_FALSE, 0, nBytes, memResult, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBuffer dstBuffer_ failed!"); _wrapper->clFinish(cmdQueues_[_deviceId]); for (unsigned int i = 0; i < nBytes / sizeof(cl_uint); i++) { if (((cl_uint *)memResult)[i] != inputData) { CHECK_RESULT_NO_RETURN(0, "Data validation failed for warm up run!\n"); free(memResult); return; } } free(memResult); timer.Reset(); timer.Start(); double sec2 = 0; cl_event *events = new cl_event[nIter]; for (unsigned int i = 0; i < nIter; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, &events[i]); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFinish(cmdQueues_[_deviceId]); timer.Stop(); for (unsigned int i = 0; i < nIter; i++) { cl_ulong startTime = 0, endTime = 0; error_ = _wrapper->clGetEventProfilingInfo( events[i], CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startTime, 0); CHECK_RESULT(error_, "clGetEventProfilingInfo failed"); error_ = _wrapper->clGetEventProfilingInfo( events[i], CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &endTime, 0); CHECK_RESULT(error_, "clGetEventProfilingInfo failed"); _wrapper->clReleaseEvent(events[i]); sec2 += endTime - startTime; } double sec = timer.GetElapsedTime(); delete[] events; // write speed in GB/s double perf = ((double)nBytes * nIter * (double)(1e-09)) / sec; double perf2 = ((double)nBytes * nIter) / sec2; _perfInfo = (float)perf2; float perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) i:%4d Wall time Perf: %.2f (GB/s)", nBytes, nIter, perfInfo); testDescString = buf; } unsigned int OCLPerfDevMemWriteSpeed::close(void) { if (!skip_) { if (dstBuffer_) { error_ = _wrapper->clReleaseMemObject(dstBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(dstBuffer_) failed"); } } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDevMemWriteSpeed.h000066400000000000000000000035131450307266000261170ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/ #ifndef _OCL_DevMemWriteSpeed_H_ #define _OCL_DevMemWriteSpeed_H_ #include "OCLTestImp.h" class OCLPerfDevMemWriteSpeed : public OCLTestImp { public: OCLPerfDevMemWriteSpeed(); virtual ~OCLPerfDevMemWriteSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); cl_mem dstBuffer_; unsigned int nWorkItems; // number of GPU work items unsigned int wgs; // work group size unsigned int nBytes; // output buffer size unsigned int nIter; // overall number of timing loops cl_uint inputData; // input data to fill the input buffer bool skip_; }; #endif // _OCL_DevMemWriteSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDeviceConcurrency.cpp000066400000000000000000000413401450307266000267130ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfDeviceConcurrency.h" #include #include #include #include "CL/cl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif typedef struct { double x; double y; double width; } coordRec; static coordRec coords[] = { {0.0, 0.0, 0.00001}, // All black }; static unsigned int numCoords = sizeof(coords) / sizeof(coordRec); static const char *float_mandel_vec = "__kernel void mandelbrot(__global uint *out, uint width, float xPos, " "float yPos, float xStep, float yStep, uint maxIter)\n" "{\n" " int tid = get_global_id(0);\n" " int i = tid % (width/4);\n" " int j = tid / (width/4);\n" " int4 veci = (int4)(4*i, 4*i+1, 4*i+2, 4*i+3);\n" " int4 vecj = (int4)(j, j, j, j);\n" " float4 x0;\n" " x0.s0 = (float)(xPos + xStep*veci.s0);\n" " x0.s1 = (float)(xPos + xStep*veci.s1);\n" " x0.s2 = (float)(xPos + xStep*veci.s2);\n" " x0.s3 = (float)(xPos + xStep*veci.s3);\n" " float4 y0;\n" " y0.s0 = (float)(yPos + yStep*vecj.s0);\n" " y0.s1 = (float)(yPos + yStep*vecj.s1);\n" " y0.s2 = (float)(yPos + yStep*vecj.s2);\n" " y0.s3 = (float)(yPos + yStep*vecj.s3);\n" "\n" " float4 x = x0;\n" " float4 y = y0;\n" "\n" " uint iter = 0;\n" " float4 tmp;\n" " int4 stay;\n" " int4 ccount = 0;\n" " float4 savx = x;\n" " float4 savy = y;\n" " stay = (x*x+y*y) <= (float4)(4.0f, 4.0f, 4.0f, 4.0f);\n" " for (iter = 0; (stay.s0 | stay.s1 | stay.s2 | stay.s3) && (iter < " "maxIter); iter+=16)\n" " {\n" " x = savx;\n" " y = savy;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " // Two iterations\n" " tmp = x*x + x0 - y*y;\n" " y = 2.0f * x * y + y0;\n" " x = tmp*tmp + x0 - y*y;\n" " y = 2.0f * tmp * y + y0;\n" "\n" " stay = (x*x+y*y) <= (float4)(4.0f, 4.0f, 4.0f, 4.0f);\n" " savx = (stay ? x : savx);\n" " savy = (stay ? y : savy);\n" " ccount -= stay*16;\n" " }\n" " // Handle remainder\n" " if (!(stay.s0 & stay.s1 & stay.s2 & stay.s3))\n" " {\n" " iter = 16;\n" " do\n" " {\n" " x = savx;\n" " y = savy;\n" " // More efficient to use scalar ops here: Why?\n" " stay.s0 = ((x.s0*x.s0+y.s0*y.s0) <= 4.0f) && (ccount.s0 < " "maxIter);\n" " stay.s1 = ((x.s1*x.s1+y.s1*y.s1) <= 4.0f) && (ccount.s1 < " "maxIter);\n" " stay.s2 = ((x.s2*x.s2+y.s2*y.s2) <= 4.0f) && (ccount.s2 < " "maxIter);\n" " stay.s3 = ((x.s3*x.s3+y.s3*y.s3) <= 4.0f) && (ccount.s3 < " "maxIter);\n" " tmp = x;\n" " x = x*x + x0 - y*y;\n" " y = 2.0f*tmp*y + y0;\n" " ccount += stay;\n" " iter--;\n" " savx.s0 = (stay.s0 ? x.s0 : savx.s0);\n" " savx.s1 = (stay.s1 ? x.s1 : savx.s1);\n" " savx.s2 = (stay.s2 ? x.s2 : savx.s2);\n" " savx.s3 = (stay.s3 ? 
x.s3 : savx.s3);\n" " savy.s0 = (stay.s0 ? y.s0 : savy.s0);\n" " savy.s1 = (stay.s1 ? y.s1 : savy.s1);\n" " savy.s2 = (stay.s2 ? y.s2 : savy.s2);\n" " savy.s3 = (stay.s3 ? y.s3 : savy.s3);\n" " } while ((stay.s0 | stay.s1 | stay.s2 | stay.s3) && iter);\n" " }\n" " __global uint4 *vecOut = (__global uint4 *)out;\n" " vecOut[tid] = convert_uint4(ccount);\n" "}\n"; OCLPerfDeviceConcurrency::OCLPerfDeviceConcurrency() { cl_uint numPlatforms; cl_platform_id platform = NULL; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); platform = platforms[_platformIndex]; num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); if (num_devices > MAX_DEVICES) { num_devices = MAX_DEVICES; } delete platforms; } _numSubTests = num_devices; } OCLPerfDeviceConcurrency::~OCLPerfDeviceConcurrency() {} void OCLPerfDeviceConcurrency::setData(cl_mem buffer, unsigned int idx, unsigned int val) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_[idx], buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < width_; i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_[idx], buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_[idx]); } void OCLPerfDeviceConcurrency::checkData(cl_mem buffer, unsigned int idx) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_[idx], buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); totalIters = 0; for (unsigned int i = 0; i < width_; i++) { totalIters += data[i]; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_[idx], buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_[idx]); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfDeviceConcurrency::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; num_devices = 0; cl_device_id *devices = NULL; unsigned int i; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; for (i = 0; i < MAX_DEVICES; i++) { cmd_queue_[i] = 0; program_[i] = 0; kernel_[i] = 0; outBuffer_[i] = 0; } // Maximum iteration count // NOTE: Some kernels are unrolled 16 times, so make sure maxIter is divisible // by 16 NOTE: Can increase to get better peak performance numbers, but be // sure not to TDR slow ASICs! NOTE:. for warmup run we use maxIter = 256 and // then for the actual run we use maxIter = 8388608 * (engine_clock / 1000). maxIter = 256; // NOTE: Width needs to be divisible by 4 because the float_mandel_vec kernel // processes 4 pixels at once NOTE: Can increase to get better peak // performance numbers, but be sure not to TDR slow ASICs! 
  width_ = 256;  // We compute a square domain
  bufSize_ = width_ * sizeof(cl_uint);

  error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
  if (0 < numPlatforms) {
    cl_platform_id *platforms = new cl_platform_id[numPlatforms];
    error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL);
    CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
    platform = platforms[_platformIndex];
    char pbuf[100];
    error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex],
                                         CL_PLATFORM_VENDOR, sizeof(pbuf),
                                         pbuf, NULL);
    num_devices = 0;
    /* Get the number of requested devices */
    error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0,
                                      NULL, &num_devices);
    if (num_devices > MAX_DEVICES) {
      num_devices = MAX_DEVICES;
    }
    delete[] platforms;  // array delete; plain delete on new[] is undefined behavior
  }

  /*
   * If we could find our platform, use it. If not, die as we need the AMD
   * platform for these extensions.
   */
  CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed");

  devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id));
  CHECK_RESULT(devices == 0, "no devices");

  /* Get the requested devices */
  error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed");

  context_ = _wrapper->clCreateContext(NULL, num_devices, devices,
                                       notify_callback, NULL, &error_);
  CHECK_RESULT(context_ == 0, "clCreateContext failed");

  cur_devices = _openTest + 1;
  for (i = 0; i < cur_devices; i++) {
    cmd_queue_[i] = _wrapper->clCreateCommandQueue(context_, devices[i], 0, NULL);
    CHECK_RESULT(cmd_queue_[i] == 0, "clCreateCommandQueue failed");
    outBuffer_[i] = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_);
    CHECK_RESULT(outBuffer_[i] == 0, "clCreateBuffer(outBuffer) failed");
  }

  const char *tmp;
  tmp = float_mandel_vec;
  for (i = 0; i < cur_devices; i++) {
    program_[i] = _wrapper->clCreateProgramWithSource(
        context_, 1, (const char **)&tmp, NULL, &error_);
    CHECK_RESULT(program_[i] == 0, "clCreateProgramWithSource failed");
    error_ = _wrapper->clBuildProgram(program_[i], 1, &devices[i], "", NULL, NULL);
    if (error_ != CL_SUCCESS) {
      cl_int intError;
      char log[16384];
      intError = _wrapper->clGetProgramBuildInfo(
          program_[i], devices[i], CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char),
          log, NULL);
      printf("Build error on device %d -> %s\n", i, log);
      CHECK_RESULT(0, "clBuildProgram failed");
    }
  }
  for (i = 0; i < cur_devices; i++) {
    kernel_[i] = _wrapper->clCreateKernel(program_[i], "mandelbrot", &error_);
    CHECK_RESULT(kernel_[i] == 0, "clCreateKernel failed");
  }

  coordIdx = _openTest % numCoords;
  float xStep = (float)(coords[coordIdx].width / (double)width_);
  float yStep = (float)(-coords[coordIdx].width / (double)width_);
  float xPos = (float)(coords[coordIdx].x - 0.5 * coords[coordIdx].width);
  float yPos = (float)(coords[coordIdx].y + 0.5 * coords[coordIdx].width);

  // Note: maxIter is only computed below, after the device clock is queried;
  // run() refreshes kernel argument 6 before the timed loop.
  for (i = 0; i < cur_devices; i++) {
    error_ = _wrapper->clSetKernelArg(kernel_[i], 0, sizeof(cl_mem),
                                      (void *)&outBuffer_[i]);
    error_ = _wrapper->clSetKernelArg(kernel_[i], 1, sizeof(cl_uint),
                                      (void *)&width_);
    error_ = _wrapper->clSetKernelArg(kernel_[i], 2, sizeof(cl_float),
                                      (void *)&xPos);
    error_ = _wrapper->clSetKernelArg(kernel_[i], 3, sizeof(cl_float),
                                      (void *)&yPos);
    error_ = _wrapper->clSetKernelArg(kernel_[i], 4, sizeof(cl_float),
                                      (void *)&xStep);
    error_ = _wrapper->clSetKernelArg(kernel_[i], 5, sizeof(cl_float),
                                      (void *)&yStep);
    error_ = _wrapper->clSetKernelArg(kernel_[i], 6, sizeof(cl_uint),
                                      (void *)&maxIter);
  }
  for (i = 0; i < cur_devices; i++) {
    setData(outBuffer_[i], i, 0xdeadbeef);
  }

  cl_uint clkFrequency = 0;
  error_ = clGetDeviceInfo(devices[0], CL_DEVICE_MAX_CLOCK_FREQUENCY,
                           sizeof(clkFrequency), &clkFrequency, NULL);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");
  assert(clkFrequency > 0);
  // Scale the iteration count with the engine clock, then round up to a
  // multiple of 16.
  maxIter = (unsigned int)(8388608 * ((float)clkFrequency / 1000));
  maxIter = (maxIter + 15) & ~15;
}

void OCLPerfDeviceConcurrency::run(void) {
  int global = width_ >> 2;  // We handle 4 pixels per thread
  int local = 64;
  size_t global_work_size[1] = {(size_t)global};
  size_t local_work_size[1] = {(size_t)local};
  unsigned int i;

  // Warmup. Flush every queue before finishing any of them, so that all
  // devices have work submitted and can run concurrently.
  for (i = 0; i < cur_devices; i++) {
    error_ = _wrapper->clEnqueueNDRangeKernel(
        cmd_queue_[i], kernel_[i], 1, NULL, (const size_t *)global_work_size,
        (const size_t *)local_work_size, 0, NULL, NULL);
    CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed");
  }
  for (i = 0; i < cur_devices; i++) {
    _wrapper->clFlush(cmd_queue_[i]);
  }
  for (i = 0; i < cur_devices; i++) {
    _wrapper->clFinish(cmd_queue_[i]);
  }

  for (i = 0; i < cur_devices; i++) {
    error_ = _wrapper->clSetKernelArg(kernel_[i], 6, sizeof(cl_uint),
                                      (void *)&maxIter);
  }

  CPerfCounter timer;
  timer.Reset();
  timer.Start();
  for (i = 0; i < cur_devices; i++) {
    error_ = _wrapper->clEnqueueNDRangeKernel(
        cmd_queue_[i], kernel_[i], 1, NULL, (const size_t *)global_work_size,
        (const size_t *)local_work_size, 0, NULL, NULL);
    CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed");
  }
  for (i = 0; i < cur_devices; i++) {
    _wrapper->clFlush(cmd_queue_[i]);
  }
  for (i = 0; i < cur_devices; i++) {
    _wrapper->clFinish(cmd_queue_[i]);
  }
  timer.Stop();
  double sec = timer.GetElapsedTime();

  unsigned long long expected =
      (unsigned long long)width_ * (unsigned long long)maxIter;
  for (i = 0; i < cur_devices; i++) {
    checkData(outBuffer_[i], i);
    CHECK_RESULT(totalIters != expected, "Incorrect iteration count detected!");
  }

  _perfInfo = (float)sec;
  char buf[128];
  SNPRINTF(buf, sizeof(buf), "time for %2d devices (s) (%2d queues) ",
           cur_devices, cur_devices);
  testDescString = buf;
}

unsigned int OCLPerfDeviceConcurrency::close(void) {
  unsigned int i;
  for (i = 0; i < cur_devices; i++) {
    error_ = _wrapper->clReleaseMemObject(outBuffer_[i]);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                           "clReleaseMemObject(outBuffer_) failed");
  }
  for (i = 0; i < cur_devices; i++) {
    error_ = _wrapper->clReleaseKernel(kernel_[i]);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                           "clReleaseKernel(kernel_) failed");
  }
  for (i = 0; i < cur_devices; i++) {
    error_ = _wrapper->clReleaseProgram(program_[i]);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                           "clReleaseProgram(program_) failed");
  }
  for (i = 0; i < cur_devices; i++) {
    error_ = _wrapper->clReleaseCommandQueue(cmd_queue_[i]);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                           "clReleaseCommandQueue failed");
  }
  if (context_) {
    error_ = _wrapper->clReleaseContext(context_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed");
  }
  return _crcword;
}

clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDeviceConcurrency.h

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.
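/*
 * Subtest N of OCLPerfDeviceConcurrency (above) runs the same Mandelbrot
 * kernel on N + 1 devices at once (cur_devices = _openTest + 1, capped by
 * MAX_DEVICES = 16 declared in this header). Each device gets its own queue,
 * program build, and output buffer; the reported figure is wall-clock seconds
 * for the whole group, so with ideal concurrency the time stays roughly flat
 * as devices are added.
 */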
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_Perf_DeviceConcurrency_H_ #define _OCL_Perf_DeviceConcurrency_H_ #include "OCLTestImp.h" class OCLPerfDeviceConcurrency : public OCLTestImp { public: OCLPerfDeviceConcurrency(); virtual ~OCLPerfDeviceConcurrency(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void setData(cl_mem buffer, unsigned int idx, unsigned int data); void checkData(cl_mem buffer, unsigned int idx); #define MAX_DEVICES 16 cl_context context_; cl_command_queue cmd_queue_[MAX_DEVICES]; cl_program program_[MAX_DEVICES]; cl_kernel kernel_[MAX_DEVICES]; cl_mem outBuffer_[MAX_DEVICES]; cl_int error_; cl_uint num_devices; cl_uint cur_devices; unsigned int width_; unsigned int bufSize_; unsigned int maxIter; unsigned int coordIdx; unsigned long long totalIters; }; #endif // _OCL_Perf_DeviceConcurrency_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDeviceEnqueue.cpp000066400000000000000000000170341450307266000260330ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfDeviceEnqueue.h" #include #include #include #include #include "CL/cl.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define KERNEL_CODE(...) 
#__VA_ARGS__

typedef struct {
  unsigned int threads;
} testStruct;

static testStruct testList[] = {
    {64}, {128}, {256}, {512}, {1024}, {2048}, {4096},
};

const static char* strKernel = {KERNEL_CODE(
    \n __kernel void childKernel(__global uint* buf) {
      int idx = get_global_id(0);
      if (idx < 0) {
        buf[idx] = 0;
      }
    } \n

    __kernel void parentKernel(__global uint* buf) {
      queue_t def_q = get_default_queue();
      ndrange_t ndrange = ndrange_1D(64, 64);
      int gid = get_global_id(0);
      int enq_res = enqueue_kernel(def_q, CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                                   ndrange, ^{ childKernel(buf); });
    } \n)};

OCLPerfDeviceEnqueue::OCLPerfDeviceEnqueue() {
  testListSize = sizeof(testList) / sizeof(testStruct);
  _numSubTests = 7 * testListSize;
  deviceQueue_ = NULL;
  failed_ = false;
  kernel2_ = NULL;
}

OCLPerfDeviceEnqueue::~OCLPerfDeviceEnqueue() {}

void OCLPerfDeviceEnqueue::open(unsigned int test, char* units,
                                double& conversion, unsigned int deviceId) {
  if (type_ == CL_DEVICE_TYPE_CPU) {
    return;
  }
  OCLTestImp::open(test, units, conversion, deviceId);
  CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test");

  testID_ = test;
  threads = testList[testID_ % testListSize].threads;

  size_t param_size = 0;
  char* strVersion = 0;
  error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0,
                                     0, &param_size);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");
  strVersion = new char[param_size];
  error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION,
                                     param_size, strVersion, 0);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");
  // The string is "OpenCL <major>.<minor> ..."; character 7 is the major digit.
  if (strVersion[7] < '2') {
    failed_ = true;
    delete[] strVersion;  // free on the early-exit path as well
    return;
  }
  delete[] strVersion;  // array delete to match new char[param_size]

  cl_uint maxDevQSize = 0;
#if defined(CL_VERSION_2_0)
  error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId],
                                     CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE,
                                     sizeof(cl_uint), &maxDevQSize, 0);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");
#endif

  program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL,
                                                 &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed");
  error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId],
                                    "-cl-std=CL2.0", NULL, NULL);
  if (error_ != CL_SUCCESS) {
    char programLog[1024];
    _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId],
                                    CL_PROGRAM_BUILD_LOG, 1024, programLog, 0);
    printf("\n%s\n", programLog);
    fflush(stdout);
  }
  CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed");

  kernel_ = _wrapper->clCreateKernel(program_, "parentKernel", &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed");
  kernel2_ = _wrapper->clCreateKernel(program_, "childKernel", &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed");

  cl_mem buffer;
  buffer = _wrapper->clCreateBuffer(context_, CL_MEM_ALLOC_HOST_PTR, 2048,
                                    NULL, &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed");
  buffers_.push_back(buffer);

  // Hardcoded for us
  if (testID_ >= testListSize) {
    queueSize = (1 << (testID_ / testListSize)) * 256 * 1024;
    queueSize = std::min(queueSize, maxDevQSize);
    threads *= (1 << (testID_ / testListSize - 1));
    threads = std::min(threads, queueSize / 128);
  } else {
    queueSize = std::max((cl_uint)threads * 128, (cl_uint)16384);
  }

#if defined(CL_VERSION_2_0)
  const cl_queue_properties cprops[] = {
      CL_QUEUE_PROPERTIES,
      static_cast<cl_queue_properties>(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
                                       CL_QUEUE_ON_DEVICE_DEFAULT |
                                       CL_QUEUE_ON_DEVICE),
      CL_QUEUE_SIZE, queueSize, 0};
  deviceQueue_ = _wrapper->clCreateCommandQueueWithProperties(
      context_, devices_[deviceId], cprops, &error_);
  CHECK_RESULT((error_ !=
CL_SUCCESS), "clCreateCommandQueueWithProperties() failed"); #endif } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLPerfDeviceEnqueue::run(void) { CPerfCounter timer; if (type_ == CL_DEVICE_TYPE_CPU) { return; } if (failed_) return; cl_mem buffer = buffers()[0]; size_t gws[1] = {threads}; size_t lws[1] = {64}; if (gws[0] >= 256) { lws[0] = 256; } error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); // Try to normalize the amount of work per test unsigned int repeats = (64 / threads) * 50; if (repeats == 0) repeats = 1; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < repeats; i++) { error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); } timer.Stop(); double sec = timer.GetElapsedTime(); _perfInfo = (float)(threads * repeats) / (float)(sec * 1000000.); char buf[256]; SNPRINTF(buf, sizeof(buf), "%7d threads spawning 64 threads, queue size %5dKB (Mdisp/s)", threads, queueSize / 1024); testDescString = buf; } unsigned int OCLPerfDeviceEnqueue::close(void) { // FIXME: Re-enable CPU test once bug 10143 is fixed. if (type_ == CL_DEVICE_TYPE_CPU) { return 0; } if (NULL != deviceQueue_) { _wrapper->clReleaseCommandQueue(deviceQueue_); } if (NULL != kernel2_) { _wrapper->clReleaseKernel(kernel2_); } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDeviceEnqueue.h000066400000000000000000000032761450307266000255030ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
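/*
 * OCLPerfDeviceEnqueue (above) measures dispatch throughput of kernels that
 * re-enqueue work from the device. The queue returned by get_default_queue()
 * inside parentKernel is the one created in open() with CL_QUEUE_ON_DEVICE |
 * CL_QUEUE_ON_DEVICE_DEFAULT. A standalone sketch of that creation (the
 * helper name is illustrative, not part of the test source):
 */
#include <CL/cl.h>
#if defined(CL_VERSION_2_0)
static cl_command_queue make_default_device_queue(cl_context ctx,
                                                  cl_device_id dev,
                                                  cl_uint size_bytes,
                                                  cl_int *err) {
  const cl_queue_properties props[] = {
      CL_QUEUE_PROPERTIES,
      (cl_queue_properties)(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
                            CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT),
      /* must not exceed CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE */
      CL_QUEUE_SIZE, size_bytes, 0};
  return clCreateCommandQueueWithProperties(ctx, dev, props, err);
}
#endif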
*/ #ifndef _OCLPERF_DEVICE_ENQUEUE_H_ #define _OCLPERF_DEVICE_ENQUEUE_H_ #include "OCLTestImp.h" class OCLPerfDeviceEnqueue : public OCLTestImp { public: OCLPerfDeviceEnqueue(); virtual ~OCLPerfDeviceEnqueue(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: cl_command_queue deviceQueue_; bool failed_; unsigned int testID_; cl_kernel kernel2_; unsigned int testListSize; unsigned int threads; cl_uint queueSize; }; #endif // _OCLPERF_DEVICE_ENQUEUE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDeviceEnqueue2.cpp000066400000000000000000000204721450307266000261150ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfDeviceEnqueue2.h" #include #include #include #include #include "CL/cl.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define KERNEL_CODE(...) 
#__VA_ARGS__ typedef struct { unsigned int threads; } testStruct; static testStruct testList[] = { {64}, {128}, {256}, {512}, {1024}, {2048}, {4096}, }; static unsigned int qsizeList[] = { 16, 32, 64, 128, 256, 512, }; static unsigned int levelList[] = { 1, 2, 4, 8, }; const static char* strKernel = {KERNEL_CODE( \n __kernel void childKernel(__global uint* buf, uint level) { if (level) { queue_t def_q = get_default_queue(); ndrange_t ndrange = ndrange_1D(64, 64); int gid = get_global_id(0); int lid = get_local_id(0); if (lid == 0) { int enq_res = enqueue_kernel(def_q, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange, ^{ childKernel(buf, level - 1); }); } } else { int idx = get_global_id(0); if (idx < 0) { buf[idx] = 0; } } } \n __kernel void parentKernel(__global uint* buf, uint level) { queue_t def_q = get_default_queue(); ndrange_t ndrange = ndrange_1D(64, 64); int gid = get_global_id(0); if (level) { int enq_res = enqueue_kernel(def_q, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange, ^{ childKernel(buf, level - 1); }); } } \n)}; OCLPerfDeviceEnqueue2::OCLPerfDeviceEnqueue2() { subTests_level = sizeof(levelList) / sizeof(unsigned int); subTests_qsize = (sizeof(qsizeList) / sizeof(unsigned int)); subTests_thread = sizeof(testList) / sizeof(testStruct); testListSize = subTests_thread; _numSubTests = subTests_level * subTests_qsize * subTests_thread; deviceQueue_ = NULL; failed_ = false; kernel2_ = NULL; level = 2; skip_ = false; } OCLPerfDeviceEnqueue2::~OCLPerfDeviceEnqueue2() {} void OCLPerfDeviceEnqueue2::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { if (type_ == CL_DEVICE_TYPE_CPU) { return; } OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); testID_ = test; threads = testList[testID_ / (subTests_qsize * subTests_level)].threads; queueSize = qsizeList[(testID_ / subTests_level) % subTests_qsize] * 1024; level = levelList[testID_ % subTests_level]; size_t param_size = 0; char* strVersion = 0; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = new char[param_size]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[7] < '2') { failed_ = true; return; } delete strVersion; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "parentKernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); kernel2_ = _wrapper->clCreateKernel(program_, "childKernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; buffer = _wrapper->clCreateBuffer(context_, CL_MEM_ALLOC_HOST_PTR, 2048, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); #if defined(CL_VERSION_2_0) const cl_queue_properties cprops[] = { CL_QUEUE_PROPERTIES, 
static_cast(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE_DEFAULT | CL_QUEUE_ON_DEVICE), CL_QUEUE_SIZE, queueSize, 0}; deviceQueue_ = _wrapper->clCreateCommandQueueWithProperties( context_, devices_[deviceId], cprops, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueueWithProperties() failed"); #else skip_ = true; testDescString = "DeviceEnqueue NOT supported for < 2.0 builds. Test Skipped."; return; #endif } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLPerfDeviceEnqueue2::run(void) { CPerfCounter timer; if (type_ == CL_DEVICE_TYPE_CPU) { return; } if (failed_) { return; } if (skip_) { return; } cl_mem buffer = buffers()[0]; size_t gws[1] = {threads}; size_t lws[1] = {64}; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(unsigned int), &level); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); // Try to normalize the amount of work per test // unsigned int repeats = (4096 / threads) * 100 ; unsigned int repeats = (4096 / threads) * 10; // unsigned int repeats = 100; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < repeats; i++) { error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); } timer.Stop(); double sec = timer.GetElapsedTime(); _perfInfo = (float)(threads * repeats * level) / (float)(sec * 1000000.); char buf[256]; SNPRINTF( buf, sizeof(buf), "%5d threads spawning 64 threads, queue size %3dKB (Mdisp/s), level=%2d", threads, queueSize / 1024, level); testDescString = buf; } unsigned int OCLPerfDeviceEnqueue2::close(void) { // FIXME: Re-enable CPU test once bug 10143 is fixed. if (type_ == CL_DEVICE_TYPE_CPU) { return 0; } if (deviceQueue_) { error_ = _wrapper->clReleaseCommandQueue(deviceQueue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (kernel2_) { error_ = _wrapper->clReleaseKernel(kernel2_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDeviceEnqueue2.h000066400000000000000000000035411450307266000255600ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
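/*
 * Subtest decoding used by OCLPerfDeviceEnqueue2 above, with 7 thread counts,
 * 6 queue sizes, and 4 recursion levels (168 subtests total):
 *   threads   = testList[test / 24].threads   // 24 = 6 queue sizes * 4 levels
 *   queueSize = qsizeList[(test / 4) % 6] KB
 *   level     = levelList[test % 4]
 * For example, test 27 selects 128 threads (27 / 24 = 1), a 16 KB device
 * queue ((27 / 4) % 6 = 0), and level 8 (27 % 4 = 3). The reported rate is
 * threads * repeats * level dispatches per microsecond (Mdisp/s).
 */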
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCLPERF_DEVICE_ENQUEUE2_H_ #define _OCLPERF_DEVICE_ENQUEUE2_H_ #include "OCLTestImp.h" class OCLPerfDeviceEnqueue2 : public OCLTestImp { public: OCLPerfDeviceEnqueue2(); virtual ~OCLPerfDeviceEnqueue2(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: cl_command_queue deviceQueue_; unsigned int testID_; cl_kernel kernel2_; unsigned int testListSize; unsigned int threads; cl_uint queueSize; unsigned int subTests_level; unsigned int subTests_qsize; unsigned int subTests_thread; unsigned int level; unsigned int lws_value; bool failed_; bool skip_; }; #endif // _OCLPERF_DEVICE_ENQUEUE2_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDeviceEnqueueEvent.cpp000066400000000000000000000212561450307266000270360ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfDeviceEnqueueEvent.h" #include #include #include #include #include "CL/cl.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define KERNEL_CODE(...) 
#__VA_ARGS__ typedef struct { unsigned int threads; } testStruct; static testStruct testList[] = { {64}, {128}, {256}, {512}, {1024}, {2048}, {4096}, }; static unsigned int qsizeList[] = { 16, 32, 64, 128, 256, 512, }; static unsigned int levelList[] = { 1, 2, 4, 8, }; const static char* strKernel = {KERNEL_CODE( \n __kernel void childKernel(__global uint* buf, uint level, clk_event_t wait_evt) { int idx = get_global_id(0); if (idx < 0) { buf[idx] = 0; } } \n __kernel void parentKernel(__global uint* buf, uint level) { if (level) { queue_t def_q = get_default_queue(); ndrange_t ndrange = ndrange_1D(64, 64); clk_event_t user_evt = create_user_event(); clk_event_t block_evt, wait_evt; wait_evt = user_evt; for (uint i = 0; i < level; i++) { int enq_res = enqueue_kernel(def_q, CLK_ENQUEUE_FLAGS_NO_WAIT, ndrange, 0, /*&user_evt*/ NULL, &block_evt, ^{ childKernel(buf, level - 1, block_evt); }); // wait_evt = block_evt; } if (is_valid_event(user_evt)) { set_user_event_status(user_evt, CL_COMPLETE); release_event(user_evt); } } else { int idx = get_global_id(0); if (idx < 0) { buf[idx] = 0; } } } \n)}; OCLPerfDeviceEnqueueEvent::OCLPerfDeviceEnqueueEvent() { subTests_level = sizeof(levelList) / sizeof(unsigned int); subTests_qsize = (sizeof(qsizeList) / sizeof(unsigned int)); subTests_thread = sizeof(testList) / sizeof(testStruct); testListSize = subTests_thread; //_numSubTests = 2*testListSize + subTests_level + subTests_qsize; _numSubTests = subTests_level * subTests_qsize * subTests_thread; deviceQueue_ = NULL; failed_ = false; skip_ = false; kernel2_ = NULL; level = 2; } OCLPerfDeviceEnqueueEvent::~OCLPerfDeviceEnqueueEvent() {} void OCLPerfDeviceEnqueueEvent::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { if (type_ == CL_DEVICE_TYPE_CPU) { return; } OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); testID_ = test; threads = testList[testID_ / (subTests_qsize * subTests_level)].threads; queueSize = qsizeList[(testID_ / subTests_level) % subTests_qsize] * 1024; level = levelList[testID_ % subTests_level]; lws_value = 64; size_t param_size = 0; char* strVersion = 0; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = new char[param_size]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[7] < '2') { failed_ = true; return; } delete strVersion; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "parentKernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); kernel2_ = _wrapper->clCreateKernel(program_, "childKernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; buffer = _wrapper->clCreateBuffer(context_, CL_MEM_ALLOC_HOST_PTR, 2048, NULL, &error_); CHECK_RESULT((error_ != 
CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); #if defined(CL_VERSION_2_0) const cl_queue_properties cprops[] = { CL_QUEUE_PROPERTIES, static_cast(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE_DEFAULT | CL_QUEUE_ON_DEVICE), CL_QUEUE_SIZE, queueSize, 0}; deviceQueue_ = _wrapper->clCreateCommandQueueWithProperties( context_, devices_[deviceId], cprops, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueueWithProperties() failed"); #else skip_ = true; testDescString = "DeviceEnqueue NOT supported for < 2.0 builds. Test Skipped."; return; #endif } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLPerfDeviceEnqueueEvent::run(void) { CPerfCounter timer; if (type_ == CL_DEVICE_TYPE_CPU) { return; } if (failed_) { return; } if (skip_) { return; } cl_mem buffer = buffers()[0]; size_t gws[1] = {threads}; size_t lws[1] = {lws_value}; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(unsigned int), &level); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); // Try to normalize the amount of work per test // unsigned int repeats = (4096 / threads) * 100 ; unsigned int repeats = (4096 / threads) * 10; // unsigned int repeats = 100; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < repeats; i++) { error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); } timer.Stop(); double sec = timer.GetElapsedTime(); _perfInfo = (float)(threads * repeats * level) / (float)(sec * 1000000.); char buf[256]; SNPRINTF( buf, sizeof(buf), "%5d threads spawning %2d threads, queue size %3dKB (Mdisp/s), level=%2d", threads, lws_value, queueSize / 1024, level); testDescString = buf; } unsigned int OCLPerfDeviceEnqueueEvent::close(void) { // FIXME: Re-enable CPU test once bug 10143 is fixed. if (type_ == CL_DEVICE_TYPE_CPU) { return 0; } if (deviceQueue_) { error_ = _wrapper->clReleaseCommandQueue(deviceQueue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (kernel2_) { error_ = _wrapper->clReleaseKernel(kernel2_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDeviceEnqueueEvent.h000066400000000000000000000035741450307266000265060ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
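/*
 * OCLPerfDeviceEnqueueEvent exercises device-side events: the parent creates
 * a user event, enqueues children with CLK_ENQUEUE_FLAGS_NO_WAIT, and must
 * release every event it obtains. A reduced OpenCL C sketch of that pattern
 * (the kernel body here is illustrative, not the test's kernel):
 *
 *   __kernel void chain(__global uint* buf) {
 *     queue_t q = get_default_queue();
 *     clk_event_t evt;
 *     enqueue_kernel(q, CLK_ENQUEUE_FLAGS_NO_WAIT, ndrange_1D(64, 64),
 *                    0, NULL, &evt, ^{ buf[get_global_id(0)] = 1u; });
 *     release_event(evt);  // returned events count against the event pool
 *   }
 */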
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCLPERF_DEVICE_ENQUEUE_EVENT_H_ #define _OCLPERF_DEVICE_ENQUEUE_EVENT_H_ #include "OCLTestImp.h" class OCLPerfDeviceEnqueueEvent : public OCLTestImp { public: OCLPerfDeviceEnqueueEvent(); virtual ~OCLPerfDeviceEnqueueEvent(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: cl_command_queue deviceQueue_; unsigned int testID_; cl_kernel kernel2_; unsigned int testListSize; unsigned int threads; cl_uint queueSize; unsigned int subTests_level; unsigned int subTests_qsize; unsigned int subTests_thread; unsigned int level; unsigned int lws_value; bool failed_; bool skip_; }; #endif // _OCLPERF_DEVICE_ENQUEUE_EVENT_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDeviceEnqueueSier.cpp000066400000000000000000000172431450307266000266600ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfDeviceEnqueueSier.h" #include #include #include #include #include #include "CL/cl.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define KERNEL_CODE(...) 
#__VA_ARGS__ typedef struct { unsigned int threads; } testStruct; static unsigned int sizeList[] = { 81, 243, 729, 2187, 6561, 19683, 59049, }; const static char* strKernel = {KERNEL_CODE( \n __kernel void parentKernel(__global uint* buf, int width, int offsetx, int offsety) { int x = get_global_id(0); int y = get_global_id(1); queue_t q = get_default_queue(); int one_third = get_global_size(0) / 3; int two_thirds = 2 * one_third; if (x >= one_third && x < two_thirds && y >= one_third && y < two_thirds) { int idx = get_global_id(0); if (idx < 0) { buf[idx] = 0; } } else { if (one_third > 1 && x % one_third == 0 && y % one_third == 0) { const size_t grid[2] = {one_third, one_third}; enqueue_kernel(q, 0, ndrange_2D(grid), ^{ parentKernel(buf, width, x + offsetx, y + offsety); }); } } } \n)}; OCLPerfDeviceEnqueueSier::OCLPerfDeviceEnqueueSier() { _numSubTests = sizeof(sizeList) / sizeof(unsigned int); deviceQueue_ = NULL; failed_ = false; skip_ = false; } OCLPerfDeviceEnqueueSier::~OCLPerfDeviceEnqueueSier() {} void OCLPerfDeviceEnqueueSier::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { if (type_ == CL_DEVICE_TYPE_CPU) { return; } OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); testID_ = test; size_t param_size = 0; char* strVersion = 0; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = new char[param_size]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[7] < '2') { failed_ = true; return; } delete strVersion; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "parentKernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; buffer = _wrapper->clCreateBuffer(context_, CL_MEM_ALLOC_HOST_PTR, 2048, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); queueSize = 512 * 1024; image_size = sizeList[testID_]; #if defined(CL_VERSION_2_0) const cl_queue_properties cprops[] = { CL_QUEUE_PROPERTIES, static_cast(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE_DEFAULT | CL_QUEUE_ON_DEVICE), CL_QUEUE_SIZE, queueSize, 0}; deviceQueue_ = _wrapper->clCreateCommandQueueWithProperties( context_, devices_[deviceId], cprops, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueueWithProperties() failed"); #else skip_ = true; testDescString = "DeviceEnqueue NOT supported for < 2.0 builds. 
Test Skipped."; return; #endif } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLPerfDeviceEnqueueSier::run(void) { CPerfCounter timer; if (type_ == CL_DEVICE_TYPE_CPU) { return; } if (failed_) { return; } if (skip_) { return; } cl_mem buffer = buffers()[0]; size_t gws[1] = {1}; size_t lws[1] = {0}; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); int width = image_size, offsetx = 0, offsety = 0; error_ |= _wrapper->clSetKernelArg(kernel_, 1, sizeof(int), (void*)&width); error_ |= _wrapper->clSetKernelArg(kernel_, 2, sizeof(int), (void*)&offsetx); error_ |= _wrapper->clSetKernelArg(kernel_, 3, sizeof(int), (void*)&offsety); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, 0, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); size_t global_work_size[2] = {image_size, image_size}; // Try to normalize the amount of work per test unsigned int repeats = 100; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < repeats; i++) { error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 2, NULL, global_work_size, 0, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); } timer.Stop(); double sec = timer.GetElapsedTime(); unsigned int numOfKernels = (int)pow(8.0, log(image_size) / log(3) - 1); _perfInfo = (float)(numOfKernels * repeats) / (float)(sec * 1000000.); char buf[256]; SNPRINTF(buf, sizeof(buf), "image_size = %5d, queue size %3dKB (Mdisp/s)", image_size, queueSize / 1024); testDescString = buf; } unsigned int OCLPerfDeviceEnqueueSier::close(void) { // FIXME: Re-enable CPU test once bug 10143 is fixed. if (type_ == CL_DEVICE_TYPE_CPU) { return 0; } if (deviceQueue_) { error_ = _wrapper->clReleaseCommandQueue(deviceQueue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDeviceEnqueueSier.h000066400000000000000000000033671450307266000263270ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
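/*
 * Grid sizes in sizeList above are powers of three: a grid of side N = 3^k
 * recurses k - 1 times, launching 8 child grids per subdivided block (every
 * block except the center one), so the test estimates the total as
 * 8^(log3(N) - 1). For N = 243 (k = 5) that is 8^4 = 4096 device-side
 * launches per host dispatch.
 */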
*/ #ifndef _OCLPERF_DEVICE_ENQUEUE_SIER_H_ #define _OCLPERF_DEVICE_ENQUEUE_SIER_H_ #include "OCLTestImp.h" class OCLPerfDeviceEnqueueSier : public OCLTestImp { public: OCLPerfDeviceEnqueueSier(); virtual ~OCLPerfDeviceEnqueueSier(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: cl_command_queue deviceQueue_; unsigned int testID_; unsigned int testListSize; // unsigned int threads; cl_uint queueSize; unsigned int image_size; bool failed_; bool skip_; }; #endif // _OCLPERF_DEVICE_ENQUEUE_SIER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDispatchSpeed.cpp000066400000000000000000000311301450307266000260150ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
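/*
 * Every perf test in this module follows the OCLTestImp lifecycle visible in
 * these headers: open() builds per-subtest state, run() times the workload,
 * close() releases the CL objects and returns the CRC word. A hypothetical
 * driver loop over one test object (the real harness lives elsewhere in the
 * tree) would look like:
 *
 *   for (unsigned int t = 0; t < numSubTests; ++t) {
 *     test.open(t, units, conversion, deviceId);
 *     test.run();
 *     crc = test.close();
 *   }
 */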
*/ #include "OCLPerfDispatchSpeed.h" #include #include #include #include "CL/cl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define CHAR_BUF_SIZE 512 typedef struct { unsigned int iterations; int flushEvery; } testStruct; testStruct testList[] = { {1, -1}, {1, -1}, {10, 1}, {10, -1}, {100, 1}, {100, 10}, {100, -1}, {1000, 1}, {1000, 10}, {1000, 100}, {1000, -1}, {10000, 1}, {10000, 10}, {10000, 100}, {10000, 1000}, {10000, -1}, {100000, 1}, {100000, 10}, {100000, 100}, {100000, 1000}, {100000, 10000}, {100000, -1}, }; unsigned int mapTestList[] = {1, 1, 10, 100, 1000, 10000, 100000}; void OCLPerfDispatchSpeed::genShader(void) { shader_.clear(); shader_ += "__kernel void _dispatchSpeed(__global float *outBuf)\n" "{\n" " int i = (int) get_global_id(0);\n" " if (i < 0)\n" " outBuf[i] = 0.0f;\n" "}\n"; } OCLPerfDispatchSpeed::OCLPerfDispatchSpeed() { testListSize = sizeof(testList) / sizeof(testStruct); _numSubTests = 2 * 2 * testListSize; } OCLPerfDispatchSpeed::~OCLPerfDispatchSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfDispatchSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test % testListSize; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; outBuffer_ = 0; sleep = false; doWarmup = false; if ((test / testListSize) % 2) { doWarmup = true; } if (test >= (testListSize * 2)) { sleep = true; } bufSize_ = 64 * sizeof(cl_float); error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete platforms; } else { CHECK_RESULT(numPlatforms == 0, "No platforms available!"); } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
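/*
 * OCLPerfDispatchSpeed runs each (iterations, flushEvery) pair from testList
 * four ways: test % testListSize picks the pair, (test / testListSize) % 2
 * enables a warmup dispatch, and test >= 2 * testListSize switches the wait
 * strategy from spinning on event status to a blocking clFinish ("sleep").
 */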
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); genShader(); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &device, "", NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "_dispatchSpeed", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer_); } void OCLPerfDispatchSpeed::run(void) { int global = bufSize_ / sizeof(cl_float); int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; CPerfCounter timer; cl_event event; cl_int eventStatus; if (doWarmup) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, &event); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); } timer.Reset(); timer.Start(); for (unsigned int i = 0; i < testList[_openTest].iterations; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, &event); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); if ((testList[_openTest].flushEvery > 0) && (((i + 1) % testList[_openTest].flushEvery) == 0)) { if (sleep) { _wrapper->clFinish(cmd_queue_); } else { _wrapper->clFlush(cmd_queue_); error_ = _wrapper->clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &eventStatus, NULL); while (eventStatus > 0) { error_ = _wrapper->clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &eventStatus, NULL); } } } if (i != (testList[_openTest].iterations - 1)) { _wrapper->clReleaseEvent(event); } } if (sleep) { _wrapper->clFinish(cmd_queue_); } else { _wrapper->clFlush(cmd_queue_); error_ = _wrapper->clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &eventStatus, NULL); while (eventStatus > 0) { error_ = _wrapper->clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &eventStatus, NULL); } } _wrapper->clReleaseEvent(event); timer.Stop(); double sec = timer.GetElapsedTime(); // microseconds per launch double perf = (1000000.f * sec / testList[_openTest].iterations); const char *waitType; const char *extraChar; 
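/*
 * The "spin" variant above polls the event instead of blocking in the
 * runtime. Factored out (helper name illustrative), the poll loop is:
 *
 *   static cl_int spin_wait(cl_event ev) {
 *     cl_int status = CL_QUEUED;
 *     while (status > CL_COMPLETE)  // CL_COMPLETE == 0; errors are negative
 *       clGetEventInfo(ev, CL_EVENT_COMMAND_EXECUTION_STATUS,
 *                      sizeof(status), &status, NULL);
 *     return status;
 *   }
 */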
const char *n; const char *warmup; if (sleep) { waitType = "sleep"; extraChar = ""; n = ""; } else { waitType = "spin"; n = "n"; extraChar = " "; } if (doWarmup) { warmup = "warmup"; } else { warmup = ""; } _perfInfo = (float)perf; char buf[256]; if (testList[_openTest].flushEvery > 0) { SNPRINTF(buf, sizeof(buf), " %7d dispatches %s%sing every %5d %6s (us/disp)", testList[_openTest].iterations, waitType, n, testList[_openTest].flushEvery, warmup); } else { SNPRINTF(buf, sizeof(buf), " %7d dispatches (%s%s) %6s (us/disp)", testList[_openTest].iterations, waitType, extraChar, warmup); } testDescString = buf; } unsigned int OCLPerfDispatchSpeed::close(void) { if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } OCLPerfMapDispatchSpeed::OCLPerfMapDispatchSpeed() { testListSize = sizeof(mapTestList) / sizeof(unsigned int); _numSubTests = 2 * testListSize; } void OCLPerfMapDispatchSpeed::run(void) { cl_mem outBuffer; outBuffer = _wrapper->clCreateBuffer(context_, CL_MEM_ALLOC_HOST_PTR, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer); int global = bufSize_ / sizeof(cl_float); int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; CPerfCounter timer; if (doWarmup) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); } timer.Reset(); timer.Start(); void *mem; for (unsigned int i = 0; i < mapTestList[_openTest]; i++) { mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, outBuffer, CL_TRUE, CL_MAP_WRITE_INVALIDATE_REGION, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // microseconds per launch double perf = (1000000.f * sec / mapTestList[_openTest]); const char *warmup; if (doWarmup) { warmup = "warmup"; } else { warmup = ""; } _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " %7d maps and dispatches %6s (us/disp)", mapTestList[_openTest], warmup); testDescString = buf; _wrapper->clReleaseMemObject(outBuffer); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDispatchSpeed.h000066400000000000000000000036131450307266000254670ustar00rootroot00000000000000/* Copyright (c) 
2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_DispatchSpeed_H_ #define _OCL_DispatchSpeed_H_ #include "OCLTestImp.h" class OCLPerfDispatchSpeed : public OCLTestImp { public: OCLPerfDispatchSpeed(); virtual ~OCLPerfDispatchSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void genShader(void); cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem outBuffer_; cl_int error_; bool doWarmup; unsigned int bufSize_; bool sleep; unsigned int testListSize; }; class OCLPerfMapDispatchSpeed : public OCLPerfDispatchSpeed { public: OCLPerfMapDispatchSpeed(); virtual void run(void); }; #endif // _OCL_DispatchSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDoubleDMA.cpp000066400000000000000000000356241450307266000250450ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
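/*
 * OCLPerfMapDispatchSpeed (above) interleaves a map/unmap cycle with each
 * dispatch. The buffer is created with CL_MEM_ALLOC_HOST_PTR and mapped with
 * CL_MAP_WRITE_INVALIDATE_REGION, so the runtime may service the map without
 * copying the buffer's previous contents back to the host. Reduced (names
 * illustrative):
 *
 *   void* p = clEnqueueMapBuffer(queue, buf, CL_TRUE,
 *                                CL_MAP_WRITE_INVALIDATE_REGION, 0, bytes,
 *                                0, NULL, NULL, &err);
 *   clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);
 *   clEnqueueNDRangeKernel(queue, kernel, 1, NULL, gws, lws, 0, NULL, NULL);
 */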
*/ #include "OCLPerfDoubleDMA.h" #include #include #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" const size_t blockX = 256; const size_t blockY = 256; const size_t blockZ = 512; const size_t chunk = 16; const size_t size_S = blockX * blockY * blockZ * sizeof(cl_float4); const size_t size_s = blockX * blockY * chunk * sizeof(cl_float4); static const int WindowWidth = 80; const size_t MaxQueues = 3; bool profEnable = false; static const char* strKernel = "__kernel void dummy(__global float4* out) \n" "{ \n" " uint id = get_global_id(0); \n" " float4 value = (float4)(1.0f, 2.0f, 3.0f, 4.0f); \n" " uint factorial = 1; \n" " for (uint i = 1; i < (id / 0x400); ++i)\n" " { \n" " factorial *= i; \n" " } \n" " out[id] = value * factorial; \n" "} \n"; class ProfileQueue { public: enum Operation { Write = 0, Execute, Read, Total }; static const char* OperationName[Total]; static const char StartCommand[Total]; static const char ExecCommand[Total]; ProfileQueue() {} ~ProfileQueue() { for (size_t op = 0; op < Total; ++op) { for (size_t idx = 0; idx < events_[op].size(); ++idx) { clReleaseEvent(events_[op][idx]); } } } void addEvent(Operation op, cl_event event) { events_[op].push_back(event); } void findMinMax(cl_long* min, cl_long* max) { // Find time min/max ranges for the frame scaling for (size_t op = 0; (op < ProfileQueue::Total); ++op) { cl_long time; if (events_[op].size() == 0) continue; clGetEventProfilingInfo(events_[op][0], CL_PROFILING_COMMAND_START, sizeof(cl_long), &time, NULL); if (0 == *min) { *min = time; } else { *min = std::min(*min, time); } clGetEventProfilingInfo(events_[op][events_[op].size() - 1], CL_PROFILING_COMMAND_END, sizeof(cl_long), &time, NULL); if (0 == *max) { *max = time; } else { *max = std::max(*max, time); } } } void display(cl_long start, cl_long finish) { std::string graph; graph.resize(WindowWidth + 1); graph[WindowWidth] = '\x0'; cl_long timeFrame = finish - start; cl_long interval = timeFrame / WindowWidth; // Find time min/max ranges for the frame scaling for (size_t op = 0; (op < Total); ++op) { if (events_[op].size() == 0) continue; cl_long timeStart, timeEnd; int begin = 0, end = 0; for (size_t idx = 0; idx < events_[op].size(); ++idx) { bool cutStart = false; clGetEventProfilingInfo(events_[op][idx], CL_PROFILING_COMMAND_START, sizeof(cl_long), &timeStart, NULL); clGetEventProfilingInfo(events_[op][idx], CL_PROFILING_COMMAND_END, sizeof(cl_long), &timeEnd, NULL); // Continue if out of the frame scope if (timeStart >= finish) continue; if (timeEnd <= start) continue; if (timeStart <= start) { timeStart = start; cutStart = true; } if (timeEnd >= finish) { timeEnd = finish; } // Readjust time to the frame timeStart -= start; timeEnd -= start; timeStart = static_cast( floor(static_cast(timeStart) / interval + 0.5f)); timeEnd = static_cast( floor(static_cast(timeEnd) / interval + 0.5f)); begin = static_cast(timeStart); // Idle from end to begin for (int c = end; c < begin; ++c) { graph[c] = '-'; } end = static_cast(timeEnd); for (int c = begin; c < end; ++c) { if ((c == begin) && !cutStart) { graph[c] = StartCommand[op]; } else { graph[c] = ExecCommand[op]; } } if ((begin == end) && (end < WindowWidth)) { graph[begin] = '+'; } } if (end < WindowWidth) { for (int c = end; c < WindowWidth; ++c) { graph[c] = '-'; } } printf("%s\n", graph.c_str()); } } private: // Profiling events std::vector events_[Total]; }; const char* ProfileQueue::OperationName[Total] = { "BufferWrite", "KernelExecution", "BufferRead"}; const char 
ProfileQueue::StartCommand[Total] = {'W', 'X', 'R'}; const char ProfileQueue::ExecCommand[Total] = {'>', '#', '<'}; class Profile { public: Profile(bool profEna, int numQueues) : profileEna_(profEna), numQueues_(numQueues), min_(0), max_(0), execTime_(0) {} ~Profile() {} void addEvent(int queue, ProfileQueue::Operation op, cl_event event) { if (profileEna_) { profQueue[queue].addEvent(op, event); } } cl_long findExecTime() { if (execTime_ != 0) return execTime_; for (int q = 0; q < numQueues_; ++q) { profQueue[q].findMinMax(&min_, &max_); } execTime_ = max_ - min_; return execTime_; } void display(cl_long start, cl_long finish) { if (!profileEna_) return; printf("\n ----------- Time frame %.3f (us), scale 1:%.0f\n", (float)(finish - start) / 1000, (float)(finish - start) / (1000 * WindowWidth)); for (size_t op = 0; (op < ProfileQueue::Total); ++op) { printf("%s - %c%c; ", ProfileQueue::OperationName[op], ProfileQueue::StartCommand[op], ProfileQueue::ExecCommand[op]); } printf("\n"); for (int q = 0; q < numQueues_; ++q) { printf("CommandQueue #%d\n", q); profQueue[q].display(min_ + start, min_ + finish); } } private: bool profileEna_; int numQueues_; //!< Total number of queues cl_long min_; //!< Min HW timestamp cl_long max_; //!< Max HW timestamp cl_long execTime_; //!< Profile time ProfileQueue profQueue[MaxQueues]; }; OCLPerfDoubleDMA::OCLPerfDoubleDMA() { _numSubTests = 2 * MaxQueues * 2; failed_ = false; } OCLPerfDoubleDMA::~OCLPerfDoubleDMA() {} void OCLPerfDoubleDMA::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { _deviceId = deviceId; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); test_ = test; cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "dummy", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); size_t bufSize = size_s; cl_mem buffer; if (test_ >= (2 * MaxQueues)) { profEnable = true; } test_ %= 2 * MaxQueues; size_t numBufs = (test_ % MaxQueues) + 1; for (size_t b = 0; b < numBufs; ++b) { buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, bufSize, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, size_S, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLPerfDoubleDMA::run(void) { if (failed_) { return; } CPerfCounter timer; const int numQueues = (test_ % MaxQueues) + 1; const bool useKernel = 
((test_ / MaxQueues) > 0); const int numBufs = numQueues; Profile profile(profEnable, numQueues); std::vector cmdQueues(numQueues); int q; cl_command_queue_properties qProp = (profEnable) ? CL_QUEUE_PROFILING_ENABLE : 0; for (q = 0; q < numQueues; ++q) { cl_command_queue cmdQueue = _wrapper->clCreateCommandQueue( context_, devices_[_deviceId], qProp, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueue() failed"); cmdQueues[q] = cmdQueue; } float* Data_s = (float*)_wrapper->clEnqueueMapBuffer( cmdQueues[0], buffers_[numBufs], CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, size_S, 0, NULL, NULL, &error_); size_t gws[1] = {size_s / (4 * sizeof(float))}; size_t lws[1] = {256}; // Warm-up for (q = 0; q < numQueues; ++q) { error_ |= _wrapper->clEnqueueWriteBuffer(cmdQueues[q], buffers_[q], CL_FALSE, 0, size_s, (char*)Data_s, 0, NULL, NULL); error_ |= _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&buffers_[q]); error_ |= _wrapper->clEnqueueNDRangeKernel(cmdQueues[q], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); error_ |= _wrapper->clEnqueueReadBuffer(cmdQueues[q], buffers_[q], CL_FALSE, 0, size_s, (char*)Data_s, 0, NULL, NULL); error_ |= _wrapper->clFinish(cmdQueues[q]); } CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "Execution failed"); size_t s_done = 0; cl_event r[MaxQueues] = {0}, w[MaxQueues] = {0}, x[MaxQueues] = {0}; /*---------- pass2: copy Data_s to and from GPU Buffers ----------*/ s_done = 0; timer.Reset(); timer.Start(); int idx = numBufs - 1; // Start from the last so read/write won't go to the same DMA when kernel is // executed q = numQueues - 1; size_t iter = 0; while (1) { if (0 == r[idx]) { error_ |= _wrapper->clEnqueueWriteBuffer( cmdQueues[q], buffers_[idx], CL_FALSE, 0, size_s, (char*)Data_s + s_done, 0, NULL, &w[idx]); } else { error_ |= _wrapper->clEnqueueWriteBuffer( cmdQueues[q], buffers_[idx], CL_FALSE, 0, size_s, (char*)Data_s + s_done, 1, &r[idx], &w[idx]); if (!profEnable) { error_ |= _wrapper->clReleaseEvent(r[idx]); } } _wrapper->clFlush(cmdQueues[q]); profile.addEvent(q, ProfileQueue::Write, w[idx]); if (useKernel) { // Change the queue ++q %= numQueues; // Implicit flush of DMA engine on kernel start, because memory dependency error_ |= _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&buffers_[idx]); error_ |= _wrapper->clEnqueueNDRangeKernel(cmdQueues[q], kernel_, 1, NULL, gws, lws, 1, &w[idx], &x[idx]); if (!profEnable) { error_ |= _wrapper->clReleaseEvent(w[idx]); } profile.addEvent(q, ProfileQueue::Execute, x[idx]); } _wrapper->clFlush(cmdQueues[q]); // Change the queue ++q %= numQueues; error_ |= _wrapper->clEnqueueReadBuffer( cmdQueues[q], buffers_[idx], CL_FALSE, 0, size_s, (char*)Data_s + s_done, 1, (useKernel) ? &x[idx] : &w[idx], &r[idx]); if (!profEnable) { error_ |= _wrapper->clReleaseEvent((useKernel) ? 
x[idx] : w[idx]);
    }
    profile.addEvent(q, ProfileQueue::Read, r[idx]);
    _wrapper->clFlush(cmdQueues[q]);

    if ((s_done += size_s) >= size_S) {
      if (!profEnable) {
        error_ |= _wrapper->clReleaseEvent(r[idx]);
      }
      break;
    }
    ++iter;
    ++idx %= numBufs;
    ++q %= numQueues;
  }

  for (q = 0; q < numQueues; ++q) {
    error_ |= _wrapper->clFinish(cmdQueues[q]);
  }
  timer.Stop();

  error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues[0], buffers_[numBufs],
                                             Data_s, 0, NULL, NULL);
  error_ |= _wrapper->clFinish(cmdQueues[0]);
  CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "Execution failed");

  cl_long gpuTimeFrame = profile.findExecTime();
  cl_long oneIter = gpuTimeFrame / iter;
  // Display 4 iterations in the middle
  cl_long startFrame = oneIter * (iter / 2 - 2);
  cl_long finishFrame = oneIter * (iter / 2 + 2);
  profile.display(startFrame, finishFrame);

  for (q = 0; q < numQueues; ++q) {
    error_ = _wrapper->clReleaseCommandQueue(cmdQueues[q]);
    CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS),
                           "clReleaseCommandQueue() failed");
  }

  double GBytes = (double)(2 * size_S) / (double)(1000 * 1000 * 1000);
  _perfInfo = static_cast<float>(GBytes / timer.GetElapsedTime());
  std::stringstream stream;
  if (useKernel) {
    stream << "Write/Kernel/Read operation ";
  } else {
    stream << "Write/Read operation ";
  }
  stream << numQueues << " queues; profiling "
         << ((profEnable) ? "enabled" : "disabled") << " [GB/s]";
  stream.flags(std::ios::right | std::ios::showbase);
  testDescString = stream.str();
}

unsigned int OCLPerfDoubleDMA::close(void) { return OCLTestImp::close(); }

clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDoubleDMA.h

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

(Permission notice identical to the one above.) */

#ifndef _OCL_PERF_DOUBLE_DMA_H_
#define _OCL_PERF_DOUBLE_DMA_H_

#include "OCLTestImp.h"

class OCLPerfDoubleDMA : public OCLTestImp {
 public:
  OCLPerfDoubleDMA();
  virtual ~OCLPerfDoubleDMA();

 public:
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceID);
  virtual void run(void);
  virtual unsigned int close(void);

 private:
  bool failed_;
  unsigned int test_;
};

#endif  // _OCL_PERF_DOUBLE_DMA_H_

clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDoubleDMASeq.cpp

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfDoubleDMASeq.h" #include #include #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" #ifdef _WIN32 const size_t blockX = 128; const size_t blockY = 128; const size_t blockZ = 256; #else const size_t blockX = 256; const size_t blockY = 256; const size_t blockZ = 512; #endif const size_t chunk = 16; const size_t size_S = blockX * blockY * blockZ * sizeof(cl_float4); const size_t size_s = blockX * blockY * chunk * sizeof(cl_float4); static const int WindowWidth = 80; const size_t MaxQueues = 3; static const char *strKernel = "__kernel void dummy(__global float4* out) \n" "{ \n" " uint id = get_global_id(0); \n" " float4 value = (float4)(1.0f, 2.0f, 3.0f, 4.0f); \n" " uint factorial = 1; \n" " for (uint i = 1; i < (id / 0x400); ++i)\n" " { \n" " factorial *= i; \n" " } \n" " out[id] = value * factorial; \n" "} \n"; OCLPerfDoubleDMASeq::OCLPerfDoubleDMASeq() { _numSubTests = MaxQueues * 2; failed_ = false; } OCLPerfDoubleDMASeq::~OCLPerfDoubleDMASeq() {} void OCLPerfDoubleDMASeq::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { _deviceId = deviceId; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); test_ = test; cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "dummy", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); size_t bufSize = size_s; cl_mem buffer; test_ %= MaxQueues; events_ = ((test / MaxQueues) == 0) ? 
false : true; size_t numBufs = (test_ % MaxQueues) + 1; for (size_t b = 0; b < numBufs; ++b) { buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, bufSize, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, size_S, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfDoubleDMASeq::run(void) { if (failed_) { return; } CPerfCounter timer; const int numQueues = (test_ % MaxQueues) + 1; const int numBufs = numQueues; std::vector cmdQueues(numQueues); int q; cl_command_queue_properties qProp = 0; for (q = 0; q < numQueues; ++q) { cl_command_queue cmdQueue = _wrapper->clCreateCommandQueue( context_, devices_[_deviceId], qProp, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueue() failed"); cmdQueues[q] = cmdQueue; } CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "Execution failed"); float *Data_s = (float *)_wrapper->clEnqueueMapBuffer( cmdQueues[0], buffers_[numBufs], CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, size_S, 0, NULL, NULL, &error_); size_t gws[1] = {size_s / (4 * sizeof(float))}; size_t lws[1] = {256}; // Warm-up for (q = 0; q < numQueues; ++q) { error_ |= _wrapper->clEnqueueWriteBuffer(cmdQueues[q], buffers_[q], CL_FALSE, 0, size_s, (char *)Data_s, 0, NULL, NULL); error_ |= _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&buffers_[q]); error_ |= _wrapper->clEnqueueNDRangeKernel(cmdQueues[q], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); error_ |= _wrapper->clEnqueueReadBuffer(cmdQueues[q], buffers_[q], CL_FALSE, 0, size_s, (char *)Data_s, 0, NULL, NULL); error_ |= _wrapper->clFinish(cmdQueues[q]); } CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "Execution failed"); size_t s_done = 0; cl_event x[MaxQueues] = {0}; /*---------- pass2: copy Data_s to and from GPU Buffers ----------*/ s_done = 0; timer.Reset(); timer.Start(); int idx = numBufs - 1; // Start from the last so read/write won't go to the same DMA when kernel is // executed q = numQueues - 1; size_t iter = 0; if (events_) { while (1) { error_ |= _wrapper->clEnqueueWriteBuffer( cmdQueues[q], buffers_[idx], CL_FALSE, 0, size_s, (char *)Data_s + s_done, 0, NULL, NULL); // Implicit flush of DMA engine on kernel start, because memory dependency error_ |= _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&buffers_[idx]); int prevQ; if (q == 0) { prevQ = numQueues - 1; } else { prevQ = q - 1; } if ((x[prevQ] != NULL) && (numQueues != 1)) { error_ |= _wrapper->clEnqueueNDRangeKernel( cmdQueues[q], kernel_, 1, NULL, gws, lws, 1, &x[prevQ], &x[q]); error_ |= _wrapper->clReleaseEvent(x[prevQ]); x[prevQ] = NULL; } else { error_ |= _wrapper->clEnqueueNDRangeKernel( cmdQueues[q], kernel_, 1, NULL, gws, lws, 0, NULL, &x[q]); if (numQueues == 1) { error_ |= _wrapper->clReleaseEvent(x[q]); x[q] = NULL; } } error_ |= _wrapper->clFlush(cmdQueues[q]); // Change the queue error_ |= _wrapper->clEnqueueReadBuffer( cmdQueues[q], buffers_[idx], CL_FALSE, 0, size_s, (char *)Data_s + s_done, 0, NULL, NULL); if ((s_done += size_s) >= size_S) { break; } error_ |= _wrapper->clFlush(cmdQueues[q]); ++iter; ++idx %= numBufs; ++q %= numQueues; } for (q = 0; q < numQueues; ++q) { if (x[q] != NULL) { error_ |= _wrapper->clReleaseEvent(x[q]); } } } else { while (1) { error_ |= 
_wrapper->clEnqueueWriteBuffer( cmdQueues[q], buffers_[idx], CL_FALSE, 0, size_s, (char *)Data_s + s_done, 0, NULL, NULL); // Implicit flush of DMA engine on kernel start, because memory dependency error_ |= _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&buffers_[idx]); error_ |= _wrapper->clEnqueueNDRangeKernel(cmdQueues[q], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); // Change the queue error_ |= _wrapper->clEnqueueReadBuffer( cmdQueues[q], buffers_[idx], CL_FALSE, 0, size_s, (char *)Data_s + s_done, 0, NULL, NULL); if ((s_done += size_s) >= size_S) { break; } error_ |= _wrapper->clFlush(cmdQueues[q]); ++iter; ++idx %= numBufs; ++q %= numQueues; } } for (q = 0; q < numQueues; ++q) { error_ |= _wrapper->clFinish(cmdQueues[q]); } timer.Stop(); error_ |= _wrapper->clEnqueueUnmapMemObject(cmdQueues[0], buffers_[numBufs], Data_s, 0, NULL, NULL); error_ |= _wrapper->clFinish(cmdQueues[0]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "Execution failed"); for (q = 0; q < numQueues; ++q) { error_ = _wrapper->clReleaseCommandQueue(cmdQueues[q]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseCommandQueue() failed"); } double GBytes = (double)(2 * size_S) / (double)(1000 * 1000 * 1000); _perfInfo = static_cast(GBytes / timer.GetElapsedTime()); std::stringstream stream; stream << "Write/Kernel/Read operation "; stream << numQueues << " queues "; if (events_) { stream << " (use events) "; } stream << " [GB/s]"; stream.flags(std::ios::right | std::ios::showbase); testDescString = stream.str(); } unsigned int OCLPerfDoubleDMASeq::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfDoubleDMASeq.h000066400000000000000000000031131450307266000251470ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PERF_DOUBLE_DMA_SEQ_H_ #define _OCL_PERF_DOUBLE_DMA_SEQ_H_ #include "OCLTestImp.h" class OCLPerfDoubleDMASeq : public OCLTestImp { public: OCLPerfDoubleDMASeq(); virtual ~OCLPerfDoubleDMASeq(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; unsigned int test_; bool events_; }; #endif // _OCL_PERF_DOUBLE_DMA_SEQ_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfFillBuffer.cpp000066400000000000000000000072311450307266000253220ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
(Permission notice identical to the one above.) */

#include "OCLPerfFillBuffer.h"

#include <stdio.h>   // SNPRINTF below
#include <stdlib.h>  // malloc/free

#include "CL/cl.h"
#include "CL/cl_ext.h"

// Quiet pesky warnings
#ifdef WIN_OS
#define SNPRINTF sprintf_s
#else
#define SNPRINTF snprintf
#endif

static size_t typeSizeList[] = {
    1,  // sizeof(cl_uchar)
    2, 4, 8, 16, 32, 64,
    128,  // sizeof(cl_ulong16)
};

static unsigned int eleNumList[] = {
    0x0020000, 0x0080000, 0x0200000, 0x0800000, 0x2000000,
};

OCLPerfFillBuffer::OCLPerfFillBuffer() {
  num_typeSize_ = sizeof(typeSizeList) / sizeof(size_t);
  num_elements_ = sizeof(eleNumList) / sizeof(unsigned int);
  _numSubTests = num_elements_ * num_typeSize_;
  failed_ = false;
  skip_ = false;
}

OCLPerfFillBuffer::~OCLPerfFillBuffer() {}

void OCLPerfFillBuffer::open(unsigned int test, char *units,
                             double &conversion, unsigned int deviceId) {
  OCLTestImp::open(test, units, conversion, deviceId);
  CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test");

  testTypeSize_ = typeSizeList[(test / num_elements_) % num_typeSize_];
  testNumEle_ = eleNumList[test % num_elements_];
  bufSize_ = testNumEle_ * 4;

  buffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, bufSize_, 0,
                                     &error_);
  CHECK_RESULT(buffer_ == 0, "clCreateBuffer(buffer_) failed");
  return;
}

static void CL_CALLBACK notify_callback(const char *errinfo,
                                        const void *private_info, size_t cb,
                                        void *user_data) {}

void OCLPerfFillBuffer::run(void) {
  CPerfCounter timer;
  size_t iter = 100;
  void *data = malloc(testTypeSize_);

  timer.Reset();
  timer.Start();
  for (size_t i = 0; i < iter; ++i) {
    error_ = clEnqueueFillBuffer(cmdQueues_[_deviceId], buffer_, data,
                                 testTypeSize_, 0, bufSize_, 0, NULL, NULL);
    CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueFillBuffer() failed");
  }
  _wrapper->clFinish(cmdQueues_[_deviceId]);
  timer.Stop();
  free(data);  // pattern buffer is only needed while enqueuing

  char buf[256];
  SNPRINTF(buf, sizeof(buf), "FillBuffer (GB/s) for %6d KB, typeSize:%3d",
           (int)bufSize_ / 1024, (int)testTypeSize_);
  testDescString = buf;
  double sec = timer.GetElapsedTime();
  _perfInfo = static_cast<float>((bufSize_ * iter * (double)(1e-09)) / sec);
}

unsigned int OCLPerfFillBuffer::close(void) {
  if (buffer_) {
    error_ = _wrapper->clReleaseMemObject(buffer_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                           "clReleaseMemObject(buffer) failed");
  }
  return OCLTestImp::close();
}
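/* Illustrative sketch (not part of the original test suite): the same
   measurement reduced to a standalone program, without the ocltst harness.
   Error checking is omitted, the first platform/device is assumed, and the
   16 MB size and 0xDEADBEEF pattern are arbitrary choices; the bandwidth
   formula matches the test above (bytes * iterations / seconds, 1e9 bytes
   per GB). */
#include <chrono>
#include <cstdio>
#include "CL/cl.h"

int main() {
  cl_platform_id plat;
  clGetPlatformIDs(1, &plat, NULL);
  cl_device_id dev;
  clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
  cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
  cl_command_queue queue = clCreateCommandQueue(ctx, dev, 0, NULL);

  const size_t bufSize = 16u << 20;    // 16 MB destination buffer
  const cl_uint pattern = 0xDEADBEEF;  // 4-byte fill pattern
  cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bufSize, NULL, NULL);

  const size_t iter = 100;
  auto t0 = std::chrono::steady_clock::now();
  for (size_t i = 0; i < iter; ++i) {
    clEnqueueFillBuffer(queue, buf, &pattern, sizeof(pattern), 0, bufSize, 0,
                        NULL, NULL);
  }
  clFinish(queue);  // make sure every fill is included in the timing
  auto t1 = std::chrono::steady_clock::now();

  double sec = std::chrono::duration<double>(t1 - t0).count();
  printf("FillBuffer: %.2f GB/s\n", (double)bufSize * iter * 1e-9 / sec);

  clReleaseMemObject(buf);
  clReleaseCommandQueue(queue);
  clReleaseContext(ctx);
  return 0;
}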
clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfFillBuffer.h

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

(Permission notice identical to the one above.) */

#ifndef _OCL_PERF_FILL_BUFFER_H_
#define _OCL_PERF_FILL_BUFFER_H_

#include "OCLTestImp.h"

class OCLPerfFillBuffer : public OCLTestImp {
 public:
  OCLPerfFillBuffer();
  virtual ~OCLPerfFillBuffer();

 public:
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceID);
  virtual void run(void);
  virtual unsigned int close(void);

 private:
  cl_mem buffer_;
  unsigned int bufSize_;
  unsigned int num_typeSize_;
  unsigned int num_elements_;
  size_t testTypeSize_;
  unsigned int testNumEle_;
  bool failed_;
  bool skip_;
};

#endif  // _OCL_PERF_FILL_BUFFER_H_
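/* Worked example (hypothetical helper, not in the original source): how the
   flat ocltst subtest index maps onto the pattern size and element count
   that OCLPerfFillBuffer::open() selects. With 8 pattern sizes and 5 element
   counts there are 40 subtests; e.g. subtest 13 uses a 4-byte pattern on a
   0x0800000-element (32 MB) buffer. */
#include <cstdio>

int main() {
  const size_t typeSizeList[] = {1, 2, 4, 8, 16, 32, 64, 128};
  const unsigned int eleNumList[] = {0x0020000, 0x0080000, 0x0200000,
                                     0x0800000, 0x2000000};
  const unsigned int numTypes = 8, numEle = 5;
  for (unsigned int test = 0; test < numTypes * numEle; ++test) {
    size_t typeSize = typeSizeList[(test / numEle) % numTypes];  // as in open()
    unsigned int numElements = eleNumList[test % numEle];
    printf("subtest %2u: pattern %3zu bytes, buffer %8u KB\n", test, typeSize,
           numElements * 4 / 1024);
  }
  return 0;
}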
clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfFillImage.cpp

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

(Permission notice identical to the one above.) */

#include "OCLPerfFillImage.h"

#include <stdio.h>  // SNPRINTF below

#include "CL/cl.h"
#include "CL/cl_ext.h"

// Quiet pesky warnings
#ifdef WIN_OS
#define SNPRINTF sprintf_s
#else
#define SNPRINTF snprintf
#endif

static unsigned int sizeList[] = {
    256, 512, 1024, 2048, 4096, 8192,
};

OCLPerfFillImage::OCLPerfFillImage() {
  num_sizes_ = sizeof(sizeList) / sizeof(unsigned int);
  _numSubTests = num_sizes_;
  failed_ = false;
  skip_ = false;
}

OCLPerfFillImage::~OCLPerfFillImage() {}

void OCLPerfFillImage::open(unsigned int test, char *units, double &conversion,
                            unsigned int deviceId) {
  OCLTestImp::open(test, units, conversion, deviceId);
  CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test");

  bufSize_ = sizeList[test % num_sizes_];
  cl_image_format format = {CL_RGBA, CL_UNSIGNED_INT8};
  buffer_ = _wrapper->clCreateImage2D(context_, CL_MEM_WRITE_ONLY, &format,
                                      bufSize_, bufSize_, 0, NULL, &error_);
  CHECK_RESULT(buffer_ == 0, "clCreateImage2D(imageBuffer_) failed");
  return;
}

static void CL_CALLBACK notify_callback(const char *errinfo,
                                        const void *private_info, size_t cb,
                                        void *user_data) {}

void OCLPerfFillImage::run(void) {
  CPerfCounter timer;
  size_t iter = 100;
  cl_uint4 fillColor = {1, 1, 1, 1};
  size_t origin[3] = {0, 0, 0};
  size_t region[3] = {bufSize_, bufSize_, 1};

  timer.Reset();
  timer.Start();
  for (size_t i = 0; i < iter; ++i) {
    error_ = clEnqueueFillImage(cmdQueues_[_deviceId], buffer_,
                                (const void *)&fillColor, origin, region, 0,
                                NULL, NULL);
    CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueFillImage() failed");
  }
  _wrapper->clFinish(cmdQueues_[_deviceId]);
  timer.Stop();

  char buf[256];
  SNPRINTF(buf, sizeof(buf), "FillImage (GB/s) for %4dx%4d ", (int)bufSize_,
           (int)bufSize_);
  testDescString = buf;
  double sec = timer.GetElapsedTime();
  _perfInfo = static_cast<float>(
      (bufSize_ * bufSize_ * 4 * iter * (double)(1e-09)) / sec);
}

unsigned int OCLPerfFillImage::close(void) {
  if (buffer_) {
    error_ = _wrapper->clReleaseMemObject(buffer_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                           "clReleaseMemObject(buffer) failed");
  }
  return OCLTestImp::close();
}
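/* Note on the fill color in run() above: clEnqueueFillImage always takes a
   four-component color whose element type follows the image channel data
   type, so this CL_RGBA / CL_UNSIGNED_INT8 image is filled from a cl_uint4
   (the runtime converts each channel to the channel format) rather than from
   four raw bytes. The reported rate counts 4 bytes per texel:
   bufSize_ * bufSize_ * 4 bytes per fill, over `iter` fills. */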
clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfFillImage.h

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

(Permission notice identical to the one above.) */

#ifndef _OCL_PERF_FILL_IMAGE_H_
#define _OCL_PERF_FILL_IMAGE_H_

#include "OCLTestImp.h"

class OCLPerfFillImage : public OCLTestImp {
 public:
  OCLPerfFillImage();
  virtual ~OCLPerfFillImage();

 public:
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceID);
  virtual void run(void);
  virtual unsigned int close(void);

 private:
  cl_mem buffer_;
  unsigned int bufSize_;
  unsigned int num_sizes_;
  bool failed_;
  bool skip_;
};

#endif  // _OCL_PERF_FILL_IMAGE_H_

clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfFlush.cpp

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

(Permission notice identical to the one above.) */

#include "OCLPerfFlush.h"

#include <stdio.h>  // printf, fflush
#include <sstream>  // std::stringstream used in run()

#include "CL/cl.h"

static const cl_uint Iterations = 0x10000;
static const cl_uint IterationDivider = 2;
static const size_t MaxBuffers = IterationDivider;
static size_t BufSize = 0x1000;

const static char* strKernel =
    "__kernel void factorial(__global uint* out)  \n"
    "{                                            \n"
    "    uint id = get_global_id(0);              \n"
    "    uint factorial = 1;                      \n"
    "    for (uint i = 1; i < (id / 0x10000); ++i)\n"
    "    {                                        \n"
    "        factorial *= i;                      \n"
    "    }                                        \n"
    "    out[id] = factorial;                     \n"
    "}                                            \n";

unsigned int NumTests = 3;

OCLPerfFlush::OCLPerfFlush() {
  _numSubTests = NumTests;
  failed_ = false;
}

OCLPerfFlush::~OCLPerfFlush() {}

void OCLPerfFlush::open(unsigned int test, char* units, double& conversion,
                        unsigned int deviceId) {
  OCLTestImp::open(test, units, conversion, deviceId);
  CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test");
  test_ = test;

  cl_device_type deviceType;
  error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE,
                                     sizeof(deviceType), &deviceType, NULL);
  CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed");
  if (!(deviceType & CL_DEVICE_TYPE_GPU)) {
    printf("GPU device is required for this test!\n");
    failed_ = true;
    return;
  }

  // Scale the buffer so a single dispatch can occupy the whole device.
  size_t maxWorkGroupSize = 1;
  cl_uint computePower = 1;
  error_ = _wrapper->clGetDeviceInfo(
      devices_[deviceId], CL_DEVICE_MAX_WORK_GROUP_SIZE,
      sizeof(maxWorkGroupSize), &maxWorkGroupSize, NULL);
  computePower *= static_cast<cl_uint>(maxWorkGroupSize);
  cl_uint maxComputeUnits = 1;
  error_ = _wrapper->clGetDeviceInfo(
      devices_[deviceId], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(maxComputeUnits),
      &maxComputeUnits, NULL);
  computePower *= 32 * maxComputeUnits;
  BufSize = (BufSize < static_cast<size_t>(computePower))
                ? static_cast<size_t>(computePower)
                : BufSize;

  program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL,
                                                 &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed");
  error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL,
                                    NULL, NULL);
  if (error_ != CL_SUCCESS) {
    char programLog[1024];
    _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId],
                                    CL_PROGRAM_BUILD_LOG, 1024, programLog, 0);
    printf("\n%s\n", programLog);
    fflush(stdout);
  }
  CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed");
  kernel_ = _wrapper->clCreateKernel(program_, "factorial", &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed");

  cl_mem buffer;
  for (size_t i = 0; i < MaxBuffers; ++i) {
    buffer =
        _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE,
                                 BufSize * sizeof(cl_uint), NULL, &error_);
    CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed");
    buffers_.push_back(buffer);
  }
}
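/* Worked example for the BufSize scaling above (hypothetical device, numbers
   not from the original source): with CL_DEVICE_MAX_WORK_GROUP_SIZE = 256
   and CL_DEVICE_MAX_COMPUTE_UNITS = 60, computePower = 256 * 32 * 60 =
   491520 work-items, so BufSize grows from the default 0x1000 (4096) to
   491520 elements and one dispatch can cover every wavefront slot. */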
void OCLPerfFlush::run(void) {
  if (failed_) {
    return;
  }

  // Warm-up dispatches, one per buffer.
  for (size_t y = 0; y < IterationDivider; ++y) {
    cl_mem buffer = buffers()[y];
    error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer);
    CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
    size_t gws[1] = {BufSize};
    error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_,
                                              1, NULL, gws, NULL, 0, NULL,
                                              NULL);
    CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed");
  }
  _wrapper->clFinish(cmdQueues_[_deviceId]);

  CPerfCounter timer;
  const char* descriptions[] = {"Single batch: ", "clFlush(): ",
                                "clFinish(): "};
  timer.Reset();
  timer.Start();
  cl_uint x;
  for (x = 0; x < Iterations / IterationDivider; x++) {
    for (size_t y = 0; y < IterationDivider; ++y) {
      cl_mem buffer = buffers()[y];
      error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer);
      CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
      size_t gws[1] = {BufSize};
      error_ = _wrapper->clEnqueueNDRangeKernel(
          cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL);
      CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed");
    }
    if (test_ == 1) {
      _wrapper->clFlush(cmdQueues_[_deviceId]);
    } else if (test_ == 2) {
      _wrapper->clFinish(cmdQueues_[_deviceId]);
    }
  }
  _wrapper->clFinish(cmdQueues_[_deviceId]);
  timer.Stop();

  std::stringstream stream;
  stream << "Loop[" << std::hex << Iterations << "], " << descriptions[test_];
  stream << "(sec)";
  testDescString = stream.str();
  _perfInfo = static_cast<float>(timer.GetElapsedTime());
}

unsigned int OCLPerfFlush::close(void) { return OCLTestImp::close(); }
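/* The three subtests above differ only in what happens after each pair of
   dispatches: subtest 0 keeps batching and flushes once at the very end,
   subtest 1 calls clFlush() (submit to the device without blocking the
   host), and subtest 2 calls clFinish() (submit and wait). Comparing the
   three reported times isolates the cost of eager submission versus full
   host/device synchronization across 0x10000 tiny dispatches. */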
clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfFlush.h

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

(Permission notice identical to the one above.) */

#ifndef _OCL_PERF_FLUSH_H_
#define _OCL_PERF_FLUSH_H_

#include "OCLTestImp.h"

class OCLPerfFlush : public OCLTestImp {
 public:
  OCLPerfFlush();
  virtual ~OCLPerfFlush();

 public:
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceID);
  virtual void run(void);
  virtual unsigned int close(void);

 private:
  bool failed_;
  unsigned int test_;
};

#endif  // _OCL_PERF_FLUSH_H_

clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfGenericBandwidth.cpp

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

(Permission notice identical to the one above.)
*/ #include "OCLPerfGenericBandwidth.h" #include #include #include #include "CL/cl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 4 // 256KB, 1 MB, 4MB, 16 MB static const unsigned int Sizes[NUM_SIZES] = {262144, 1048576, 4194304, 16777216}; void OCLPerfGenericBandwidth::genShader(unsigned int idx) { shader_.clear(); if (idx == 0) { shader_ += "__kernel __attribute__((reqd_work_group_size(64,1,1))) void " "_genericReadSpeed(global float *outBuf, global float *inBuf, local " "float *inLocal, float c, char useLocal)\n" "{\n" " int gid = (int) get_global_id(0);\n" " int lid = (int) get_local_id(0);\n" " float val0 = 0.0f;\n" " float val1 = 0.0f;\n" " float *localLocal;\n" " int hacklid = gid % 64;\n" " if (useLocal)\n" " localLocal = inLocal;\n" " else\n" " localLocal = inBuf;\n" " for (int i = 0; i < (768/64); i++) {\n" " localLocal[hacklid + i*64] = lid;\n" " }\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" "#pragma nounroll\n" " for (uint i = 0; i < 32;i++)\n" " {\n" " val0 += localLocal[lid+0];\n" " val1 += localLocal[lid+64];\n" " val0 += localLocal[lid+128];\n" " val1 += localLocal[lid+192];\n" " val0 += localLocal[lid+256];\n" " val1 += localLocal[lid+320];\n" " val0 += localLocal[lid+384];\n" " val1 += localLocal[lid+448];\n" " lid += 1;\n" " }\n" " val0 += val1;\n" " val1 = min(val0,1.0f);\n" " if ((lid + val1) < 0){\n" " outBuf[gid] = val0;\n" " }\n" "}\n"; dataSizeBytes_ = 768 * 4; } else { shader_ += "__kernel __attribute__((reqd_work_group_size(64,1,1))) void " "_genericReadSpeed(global float *outBuf, global float *inBuf, local " "float *inLocal, float c, char useLocal)\n" "{\n" " uint gid = (uint) get_global_id(0);\n" " int lid = (int) get_local_id(0);\n" " float val0 = 0.0f;\n" " float val1 = 0.0f;\n" " float *localLocal;\n" " uint hacklid = gid % 64;\n" " if (useLocal)\n" " localLocal = inLocal;\n" " else\n" " localLocal = inBuf;\n" " for (int i = 0; i < (256/64); i++) {\n" " localLocal[hacklid + i*64] = lid;\n" " }\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " #pragma nounroll\n" " for (uint i = 0; i < 32;i++)\n" " {\n" " val0 += localLocal[8*i+0];\n" " val1 += localLocal[8*i+1];\n" " val0 += localLocal[8*i+2];\n" " val1 += localLocal[8*i+3];\n" " val0 += localLocal[8*i+4];\n" " val1 += localLocal[8*i+5];\n" " val0 += localLocal[8*i+6];\n" " val1 += localLocal[8*i+7];\n" " }\n" " val0 += val1;\n" " val1 = min(val0,1.0f);\n" " if ((lid + val1) < 0){\n" " outBuf[gid] = val0;\n" " }\n" "}\n"; dataSizeBytes_ = 256 * 4; } } OCLPerfGenericBandwidth::OCLPerfGenericBandwidth() { _numSubTests = NUM_SIZES * 4; } OCLPerfGenericBandwidth::~OCLPerfGenericBandwidth() {} void OCLPerfGenericBandwidth::setData(cl_mem buffer, float val) { float *data = (float *)_wrapper->clEnqueueMapBuffer( cmdQueues_[_deviceId], buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < (bufSize_ >> 2); i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmdQueues_[_deviceId]); } void OCLPerfGenericBandwidth::checkData(cl_mem buffer) { float *data = (float *)_wrapper->clEnqueueMapBuffer( cmdQueues_[_deviceId], buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < (bufSize_ >> 2); i++) { if (data[i] != (float)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, 
(unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmdQueues_[_deviceId]); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfGenericBandwidth::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); _crcword = 0; conversion = 1.0f; failed = false; kernel_ = 0; inBuffer_ = 0; outBuffer_ = 0; useLDS_ = ((test / NUM_SIZES) % 2) == 0 ? 1 : 0; size_t param_size = 0; char *strVersion = 0; error_ = _wrapper->clGetDeviceInfo( devices_[_deviceId], CL_DEVICE_OPENCL_C_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = new char[param_size]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_OPENCL_C_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[9] < '2') { failed = true; return; } delete strVersion; numReads_ = 32; width_ = Sizes[test % NUM_SIZES]; shaderIdx_ = test / (NUM_SIZES * 2); bufSize_ = width_; inBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateBuffer(inBuffer) failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); genShader(shaderIdx_); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "_genericReadSpeed", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); float foo = 0; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&inBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 2, 1024 * sizeof(cl_float), (void *)NULL); error_ = _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_float), (void *)&foo); error_ = _wrapper->clSetKernelArg(kernel_, 4, sizeof(cl_char), (void *)&useLDS_); setData(outBuffer_, 1.2345678f); } void OCLPerfGenericBandwidth::run(void) { if (failed) return; int global = bufSize_ / sizeof(cl_float); int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < NUM_ITER; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmdQueues_[_deviceId], kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } _wrapper->clFinish(cmdQueues_[_deviceId]); timer.Stop(); double sec = timer.GetElapsedTime(); char buf[256]; const char *buf2; if (useLDS_) buf2 = "LDS"; else 
buf2 = "global"; const char *buf3; if (shaderIdx_ == 0) { buf3 = "reads"; numReads_ *= 8; } else { buf3 = "broadcast"; numReads_ *= 8; } // LDS bandwidth in GB/s // We have one extra write per LDS location to initialize LDS double perf = ((double)global * (numReads_ * sizeof(cl_float) + dataSizeBytes_ / 64) * NUM_ITER * (double)(1e-09)) / sec; _perfInfo = (float)perf; SNPRINTF(buf, sizeof(buf), " %6s %9s %8d threads, %3d reads (GB/s) ", buf2, buf3, global, numReads_); testDescString = buf; // checkData(outBuffer_); } unsigned int OCLPerfGenericBandwidth::close(void) { if (inBuffer_) { error_ = _wrapper->clReleaseMemObject(inBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfGenericBandwidth.h000066400000000000000000000036571450307266000261600ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_GenericBandwidth_H_ #define _OCL_GenericBandwidth_H_ #include "OCLTestImp.h" class OCLPerfGenericBandwidth : public OCLTestImp { public: OCLPerfGenericBandwidth(); virtual ~OCLPerfGenericBandwidth(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void genShader(unsigned int idx); void setData(cl_mem buffer, float data); void checkData(cl_mem buffer); static const unsigned int NUM_ITER = 100; cl_mem inBuffer_; cl_mem outBuffer_; unsigned int width_; unsigned int bufSize_; unsigned int vecSizeIdx_; unsigned int numReads_; unsigned int shaderIdx_; unsigned int dataSizeBytes_; cl_char useLDS_; bool failed; }; #endif // _OCL_GenericBandwidth_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfGenoilSiaMiner.cpp000066400000000000000000000440501450307266000261470ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfGenoilSiaMiner.h" #include #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_INTENSITY 15 static const unsigned int intensities[NUM_INTENSITY] = { DEFAULT_INTENSITY, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31}; static const char *siaKernel = " inline static uint2 ror64(const uint2 x, const uint y) " " \n" " { " " \n" " return " "(uint2)(((x).x>>y)^((x).y<<(32-y)),((x).y>>y)^((x).x<<(32-y))); " " \n" " } " " \n" " inline static uint2 ror64_2(const uint2 x, const uint y) " " \n" " { " " \n" " return " "(uint2)(((x).y>>(y-32))^((x).x<<(64-y)),((x).x>>(y-32))^((x).y<<(64-y))); " " \n" " } " " \n" " __constant static const uchar blake2b_sigma[12][16] = { " " \n" " { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 } " ", \n" " { 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 } " ", \n" " { 11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4 } " ", \n" " { 7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8 } " ", \n" " { 9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13 } " ", \n" " { 2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9 } " ", \n" " { 12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11 } " ", \n" " { 13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10 } " ", \n" " { 6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5 } " ", \n" " { 10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0 } " ", \n" " { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 } " ", \n" " { 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 } " "}; \n" " // Target is passed in via headerIn[32 - 29] " " \n" " __kernel void nonceGrind(__global ulong *headerIn, __global ulong " "*nonceOut) { \n" " ulong target = headerIn[4]; " " \n" " ulong m[16] = { headerIn[0], headerIn[1], " " \n" " headerIn[2], headerIn[3], " " \n" " (ulong)get_global_id(0), headerIn[5], " " \n" " headerIn[6], headerIn[7], " " \n" " headerIn[8], headerIn[9], 0, 0, 0, 0, 0, 0 }; " " \n" " ulong v[16] = { 0x6a09e667f2bdc928, 0xbb67ae8584caa73b, " "0x3c6ef372fe94f82b, 0xa54ff53a5f1d36f1, \n" " 0x510e527fade682d1, 0x9b05688c2b3e6c1f, " "0x1f83d9abfb41bd6b, 0x5be0cd19137e2179, \n" " 0x6a09e667f3bcc908, 0xbb67ae8584caa73b, " "0x3c6ef372fe94f82b, 0xa54ff53a5f1d36f1, \n" " 0x510e527fade68281, 0x9b05688c2b3e6c1f, " "0xe07c265404be4294, 0x5be0cd19137e2179 }; \n" " #define G(r,i,a,b,c,d) \\\n" " a = a + b + m[ blake2b_sigma[r][2*i] ]; \\\n" " ((uint2*)&d)[0] = 
((uint2*)&d)[0].yx ^ ((uint2*)&a)[0].yx; \\\n" " c = c + d; \\\n" " ((uint2*)&b)[0] = ror64( ((uint2*)&b)[0] ^ ((uint2*)&c)[0], 24U); " "\\\n" " a = a + b + m[ blake2b_sigma[r][2*i+1] ]; \\\n" " ((uint2*)&d)[0] = ror64( ((uint2*)&d)[0] ^ ((uint2*)&a)[0], 16U); " "\\\n" " c = c + d; \\\n" " ((uint2*)&b)[0] = ror64_2( ((uint2*)&b)[0] ^ ((uint2*)&c)[0], " "63U);\n" " #define ROUND(r) \\\n" " G(r,0,v[ 0],v[ 4],v[ 8],v[12]); \\\n" " G(r,1,v[ 1],v[ 5],v[ 9],v[13]); \\\n" " G(r,2,v[ 2],v[ 6],v[10],v[14]); \\\n" " G(r,3,v[ 3],v[ 7],v[11],v[15]); \\\n" " G(r,4,v[ 0],v[ 5],v[10],v[15]); \\\n" " G(r,5,v[ 1],v[ 6],v[11],v[12]); \\\n" " G(r,6,v[ 2],v[ 7],v[ 8],v[13]); \\\n" " G(r,7,v[ 3],v[ 4],v[ 9],v[14]); " " \n" " ROUND( 0 ); " " \n" " ROUND( 1 ); " " \n" " ROUND( 2 ); " " \n" " ROUND( 3 ); " " \n" " ROUND( 4 ); " " \n" " ROUND( 5 ); " " \n" " ROUND( 6 ); " " \n" " ROUND( 7 ); " " \n" " ROUND( 8 ); " " \n" " ROUND( 9 ); " " \n" " ROUND( 10 ); " " \n" " ROUND( 11 ); " " \n" " #undef G " " \n" " #undef ROUND " " \n" " if (as_ulong(as_uchar8(0x6a09e667f2bdc928 ^ v[0] ^ " "v[8]).s76543210) < target) { \n" " *nonceOut = m[4]; " " \n" " return; " " \n" " } " " \n" " }\n"; OCLPerfGenoilSiaMiner::OCLPerfGenoilSiaMiner() { _numSubTests = NUM_INTENSITY; } OCLPerfGenoilSiaMiner::~OCLPerfGenoilSiaMiner() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfGenoilSiaMiner::setHeader(uint32_t *ptr) { ptr[0] = 0x10; for (unsigned int i = 1; i < 9; i++) { ptr[i] = 0; } ptr[9] = 0x4a5e1e4b; ptr[10] = 0xaab89f3a; ptr[11] = 0x32518a88; ptr[12] = 0xc31bc87f; ptr[13] = 0x618f7667; ptr[14] = 0x3e2cc77a; ptr[15] = 0xb2127b7a; ptr[16] = 0xfdeda33b; ptr[17] = 0x495fab29; ptr[18] = 0x1d00ffff; ptr[19] = 0x7c2bac1d; } void OCLPerfGenoilSiaMiner::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; // Parse args. 
isAMD = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } char getVersion[128]; error_ = _wrapper->clGetPlatformInfo(platform, CL_PLATFORM_VERSION, sizeof(getVersion), getVersion, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); platformVersion[0] = getVersion[7]; platformVersion[1] = getVersion[8]; platformVersion[2] = getVersion[9]; platformVersion[3] = '\0'; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; // Make sure the device can handle our local item size. size_t max_group_size = 0; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &max_group_size, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); if (local_item_size > max_group_size) { char buf[256]; SNPRINTF(buf, sizeof(buf), "Selected device cannot handle work groups larger than %zu.\n", local_item_size); local_item_size = max_group_size; testDescString = buf; } context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); // Create Buffer Objects. blockHeadermobj_ = _wrapper->clCreateBuffer( context_, CL_MEM_READ_ONLY, 80 * sizeof(uint8_t), NULL, &error_); CHECK_RESULT(blockHeadermobj_ == 0, "clCreateBuffer(outBuffer) failed"); nonceOutmobj_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, 8 * sizeof(uint8_t), NULL, &error_); CHECK_RESULT(nonceOutmobj_ == 0, "clCreateBuffer(outBuffer) failed"); // Create kernel program from source file. 
program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&siaKernel, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &device, NULL, NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } // Create data parallel OpenCL kernel. kernel_ = _wrapper->clCreateKernel(program_, "nonceGrind", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); // Set OpenCL kernel arguments. error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&blockHeadermobj_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&nonceOutmobj_); } void OCLPerfGenoilSiaMiner::run(void) { CPerfCounter timer; uint8_t blockHeader[80]; uint8_t target[32] = {255}; uint8_t nonceOut[8] = {0}; setHeader((uint32_t *)blockHeader); intensity = intensities[_openTest % NUM_INTENSITY]; size_t global_item_size = 1ULL << intensity; timer.Reset(); timer.Start(); // By doing a bunch of low intensity calls, we prevent freezing // By splitting them up inside this function, we also avoid calling // get_block_for_work too often. for (unsigned int i = 0; i < cycles_per_iter; i++) { // Offset global ids so that each loop call tries a different set of // hashes. size_t globalid_offset = i * global_item_size; // Copy input data to the memory buffer. error_ = clEnqueueWriteBuffer(cmd_queue_, blockHeadermobj_, CL_TRUE, 0, 80 * sizeof(uint8_t), blockHeader, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueWriteBuffer failed"); error_ = clEnqueueWriteBuffer(cmd_queue_, nonceOutmobj_, CL_TRUE, 0, 8 * sizeof(uint8_t), nonceOut, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueWriteBuffer failed"); // Run the kernel. error_ = clEnqueueNDRangeKernel(cmd_queue_, kernel_, 1, &globalid_offset, &global_item_size, &local_item_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); // Copy result to host and see if a block was found. error_ = clEnqueueReadBuffer(cmd_queue_, nonceOutmobj_, CL_TRUE, 0, 8 * sizeof(uint8_t), nonceOut, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBuffer failed"); // if (nonceOut[0] != 0) { // // Copy nonce to header. 
// memcpy(blockHeader + 32, nonceOut, 8); // break; //} } _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // Hash rate calculation MH/s double hash_rate = cycles_per_iter * global_item_size / (sec * 1000000); _perfInfo = (float)hash_rate; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%4d cycles) Work_items:%10zu Intensity:%d (MH/s) ", cycles_per_iter, global_item_size, intensity); testDescString = buf; } unsigned int OCLPerfGenoilSiaMiner::close(void) { if (blockHeadermobj_) { error_ = _wrapper->clReleaseMemObject(blockHeadermobj_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(blockHeadermobj_) failed"); } if (nonceOutmobj_) { error_ = _wrapper->clReleaseMemObject(nonceOutmobj_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(nonceOutmobj_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfGenoilSiaMiner.h000066400000000000000000000054551450307266000256220ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_GenoilSiaMiner_H_ #define _OCL_GenoilSiaMiner_H_ #include "OCLTestImp.h" class OCLPerfGenoilSiaMiner : public OCLTestImp { public: OCLPerfGenoilSiaMiner(); virtual ~OCLPerfGenoilSiaMiner(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1000; // 2^intensity hashes are calculated each time the kernel is called // Minimum of 2^8 (256) because our default local_item_size is 256 // global_item_size (2^intensity) must be a multiple of local_item_size // Max of 2^32 so that people can't send an hour of work to the GPU at one // time #define MIN_INTENSITY 8 #define MAX_INTENSITY 32 #define DEFAULT_INTENSITY 16 // Number of times the GPU kernel is called between updating the command line // text #define MIN_CPI 1 // Must do one call per update #define MAX_CPI 65536 // 2^16 is a slightly arbitrary max #define DEFAULT_CPI 30 // The maximum size of the .cl file we read in and compile #define MAX_SOURCE_SIZE (0x200000) cl_context context_; cl_command_queue cmd_queue_; cl_int error_; cl_program program_; cl_kernel kernel_; // mem objects for storing our kernel parameters cl_mem blockHeadermobj_ = NULL; cl_mem nonceOutmobj_ = NULL; // More global variables that grindNonce needs to access size_t local_item_size = 256; // Size of local work groups. 256 is usually optimal unsigned int blocks_mined = 0; unsigned int intensity = DEFAULT_INTENSITY; unsigned cycles_per_iter = DEFAULT_CPI; bool isAMD; char platformVersion[32]; void setHeader(uint32_t* ptr); }; #endif // _OCL_GenoilSiaMiner_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageCopyCorners.cpp000066400000000000000000000340001450307266000265070ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/ #include "OCLPerfImageCopyCorners.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 2 static const unsigned int Sizes0[NUM_SIZES] = {512, 16384}; static const unsigned int Sizes1[NUM_SIZES] = {16384, 512}; #define NUM_FORMATS 3 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}, {CL_R, CL_UNSIGNED_INT32}, {CL_RGBA, CL_UNSIGNED_INT32}}; static const char *textFormats[NUM_FORMATS] = {"R8G8B8A8", "R32", "R32G32B32A32"}; static const unsigned int formatSize[NUM_FORMATS] = { 4 * sizeof(cl_uchar), 1 * sizeof(cl_uint), 4 * sizeof(cl_uint)}; static const unsigned int Iterations[2] = {1, OCLPerfImageCopyCorners::NUM_ITER}; #define NUM_SUBTESTS 3 OCLPerfImageCopyCorners::OCLPerfImageCopyCorners() { _numSubTests = NUM_SIZES * NUM_SUBTESTS * NUM_FORMATS * 2; } OCLPerfImageCopyCorners::~OCLPerfImageCopyCorners() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfImageCopyCorners::setData(void *ptr, unsigned int pitch, unsigned int size) { unsigned int *ptr2 = (unsigned int *)ptr; unsigned int value = 0; for (unsigned int i = 0; i > 2; i++) { ptr2[i] = value; value++; } } void OCLPerfImageCopyCorners::checkData(void *ptr, unsigned int pitch, unsigned int size) { unsigned int *ptr2 = (unsigned int *)ptr; unsigned int value = 0; for (unsigned int i = 0; i < size >> 2; i++) { if (ptr2[i] != value) { printf("Data validation failed at %d! Got 0x%08x 0x%08x 0x%08x 0x%08x\n", i, ptr2[i], ptr2[i + 1], ptr2[i + 2], ptr2[i + 3]); printf("Expected 0x%08x 0x%08x 0x%08x 0x%08x\n", value, value, value, value); CHECK_RESULT(true, "Data validation failed!"); break; } value++; } } void OCLPerfImageCopyCorners::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint typeOfDevice = type_; cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; size_t queryOut = 0; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; srcBuffer_ = 0; dstBuffer_ = 0; srcImage_ = false; dstImage_ = false; skip_ = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], typeOfDevice, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete platforms; } bufnum_ = (_openTest / (NUM_SIZES * NUM_SUBTESTS)) % NUM_FORMATS; if ((((_openTest / NUM_SIZES) % NUM_SUBTESTS) + 1) & 1) { 
srcImage_ = true; } if ((((_openTest / NUM_SIZES) % NUM_SUBTESTS) + 1) & 2) { dstImage_ = true; } numIter = Iterations[_openTest / (NUM_SIZES * NUM_SUBTESTS * NUM_FORMATS)]; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, typeOfDevice, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; size_t size; cl_bool imageSupport_ = false; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport_), &imageSupport_, &size); if (!imageSupport_) { printf("\n%s\n", "Image not supported, skipping this test!"); skip_ = true; return; } if (_openTest % NUM_SIZES) { error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE2D_MAX_WIDTH, sizeof(size_t), &queryOut, NULL); bufSizeW_ = (cl_uint)queryOut; bufSizeH_ = Sizes1[_openTest % NUM_SIZES]; } else { error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE2D_MAX_HEIGHT, sizeof(size_t), &queryOut, NULL); bufSizeW_ = Sizes0[_openTest % NUM_SIZES]; bufSizeH_ = (cl_uint)queryOut; } context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_WRITE_ONLY; void *mem; size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSizeW_, bufSizeH_, 1}; size_t image_row_pitch; size_t image_slice_pitch; unsigned int memSize; if (dstImage_) { dstBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSizeW_, bufSizeH_, 0, NULL, &error_); CHECK_RESULT(dstBuffer_ == 0, "clCreateImage(dstBuffer) failed"); mem = _wrapper->clEnqueueMapImage( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * bufSizeH_; } else { dstBuffer_ = _wrapper->clCreateBuffer( context_, flags, bufSizeW_ * bufSizeH_ * formatSize[bufnum_], NULL, &error_); CHECK_RESULT(dstBuffer_ == 0, "clCreateBuffer(dstBuffer) failed"); mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_WRITE, 0, bufSizeW_ * bufSizeH_ * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)bufSizeW_ * bufSizeH_ * formatSize[bufnum_]; image_row_pitch = 0; } setData(mem, (unsigned int)image_row_pitch, memSize); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, dstBuffer_, mem, 0, NULL, NULL); flags = CL_MEM_READ_ONLY; if (srcImage_) { srcBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSizeW_, bufSizeH_, 0, NULL, &error_); CHECK_RESULT(srcBuffer_ == 0, "clCreateImage(srcBuffer) failed"); mem = _wrapper->clEnqueueMapImage( cmd_queue_, srcBuffer_, CL_TRUE, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * bufSizeH_; } else { srcBuffer_ = _wrapper->clCreateBuffer( context_, flags, bufSizeW_ * 
bufSizeH_ * formatSize[bufnum_], NULL, &error_); CHECK_RESULT(srcBuffer_ == 0, "clCreateBuffer(srcBuffer) failed"); mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, srcBuffer_, CL_TRUE, CL_MAP_WRITE, 0, bufSizeW_ * bufSizeH_ * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)bufSizeW_ * bufSizeH_ * formatSize[bufnum_]; image_row_pitch = 0; } setData(mem, (unsigned int)image_row_pitch, memSize); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, srcBuffer_, mem, 0, NULL, NULL); } void OCLPerfImageCopyCorners::run(void) { if (skip_) { return; } size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSizeW_, bufSizeH_, 1}; // Warm up if (srcImage_ == false) { error_ = _wrapper->clEnqueueCopyBufferToImage( cmd_queue_, srcBuffer_, dstBuffer_, 0, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBufferToImage failed"); } else if (dstImage_ == false) { error_ = _wrapper->clEnqueueCopyImageToBuffer( cmd_queue_, srcBuffer_, dstBuffer_, origin, region, 0, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImageToBuffer failed"); } else { error_ = _wrapper->clEnqueueCopyImage(cmd_queue_, srcBuffer_, dstBuffer_, origin, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImage failed"); } error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { if (srcImage_ == false) { error_ = _wrapper->clEnqueueCopyBufferToImage( cmd_queue_, srcBuffer_, dstBuffer_, 0, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBufferToImage failed"); } else if (dstImage_ == false) { error_ = _wrapper->clEnqueueCopyImageToBuffer( cmd_queue_, srcBuffer_, dstBuffer_, origin, region, 0, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImageToBuffer failed"); } else { error_ = _wrapper->clEnqueueCopyImage(cmd_queue_, srcBuffer_, dstBuffer_, origin, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImage failed"); } } error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Stop(); double sec = timer.GetElapsedTime(); // Image copy bandwidth in GB/s double perf = ((double)bufSizeW_ * bufSizeH_ * formatSize[bufnum_] * 2 * numIter * (double)(1e-09)) / sec; const char *strSrc = NULL; const char *strDst = NULL; if (srcImage_) strSrc = "img"; else strSrc = "buf"; if (dstImage_) strDst = "img"; else strDst = "buf"; void *mem; size_t image_row_pitch; size_t image_slice_pitch; unsigned int memSize; if (dstImage_) { mem = _wrapper->clEnqueueMapImage( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_READ, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * bufSizeH_; } else { mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_READ, 0, bufSizeW_ * bufSizeH_ * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)bufSizeW_ * bufSizeH_ * formatSize[bufnum_]; image_row_pitch = 0; } checkData(mem, (unsigned int)image_row_pitch, memSize); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, dstBuffer_, mem, 0, NULL, NULL); _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%4dx%4d) fmt:%s src:%s dst:%s i: %4d (GB/s) ", bufSizeW_, bufSizeH_, textFormats[bufnum_], strSrc, strDst, numIter); testDescString = buf; } unsigned int OCLPerfImageCopyCorners::close(void) 
{ if (skip_) { return CL_SUCCESS; } _wrapper->clFinish(cmd_queue_); if (srcBuffer_) { error_ = _wrapper->clReleaseMemObject(srcBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(srcBuffer_) failed"); } if (dstBuffer_) { error_ = _wrapper->clReleaseMemObject(dstBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(dstBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageCopyCorners.h000066400000000000000000000037101450307266000261560ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ImageCopyCorners_H_ #define _OCL_ImageCopyCorners_H_ #include "OCLTestImp.h" class OCLPerfImageCopyCorners : public OCLTestImp { public: OCLPerfImageCopyCorners(); virtual ~OCLPerfImageCopyCorners(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 10; cl_context context_; cl_command_queue cmd_queue_; cl_mem srcBuffer_; cl_mem dstBuffer_; cl_int error_; bool skip_; unsigned int bufSizeW_; unsigned int bufSizeH_; unsigned int bufnum_; bool srcImage_; bool dstImage_; unsigned int numIter; void setData(void* ptr, unsigned int pitch, unsigned int size); void checkData(void* ptr, unsigned int pitch, unsigned int size); }; #endif // _OCL_ImageCopyCorners_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageCopySpeed.cpp000066400000000000000000000322171450307266000261420ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfImageCopySpeed.h" #include <stdio.h> #include <stdlib.h> #include <string.h> #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 4 static const unsigned int Sizes[NUM_SIZES] = {256, 512, 1024, 2048}; #define NUM_FORMATS 1 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}}; static const char *textFormats[NUM_FORMATS] = {"R8G8B8A8"}; static const unsigned int formatSize[NUM_FORMATS] = {4 * sizeof(cl_uchar)}; static const unsigned int Iterations[2] = {1, OCLPerfImageCopySpeed::NUM_ITER}; #define NUM_SUBTESTS 3 OCLPerfImageCopySpeed::OCLPerfImageCopySpeed() { _numSubTests = NUM_SIZES * NUM_SUBTESTS * NUM_FORMATS * 2; } OCLPerfImageCopySpeed::~OCLPerfImageCopySpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfImageCopySpeed::setData(void *ptr, unsigned int pitch, unsigned int size, unsigned int value) { unsigned int *ptr2 = (unsigned int *)ptr; for (unsigned int i = 0; i < size >> 2; i++) { ptr2[i] = value; } } void OCLPerfImageCopySpeed::checkData(void *ptr, unsigned int pitch, unsigned int size, unsigned int value) { unsigned int *ptr2 = (unsigned int *)ptr; for (unsigned int i = 0; i < size >> 2; i++) { if (ptr2[i] != value) { printf("Data validation failed at %d!
Got 0x%08x 0x%08x 0x%08x 0x%08x\n", i, ptr2[i], ptr2[i + 1], ptr2[i + 2], ptr2[i + 3]); printf("Expected 0x%08x 0x%08x 0x%08x 0x%08x\n", value, value, value, value); break; } } } void OCLPerfImageCopySpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint typeOfDevice = type_; cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; srcBuffer_ = 0; dstBuffer_ = 0; srcImage_ = false; dstImage_ = false; skip_ = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], typeOfDevice, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete[] platforms; } bufSize_ = Sizes[_openTest % NUM_SIZES]; bufnum_ = (_openTest / (NUM_SIZES * NUM_SUBTESTS)) % NUM_FORMATS; if ((((_openTest / NUM_SIZES) % NUM_SUBTESTS) + 1) & 1) { srcImage_ = true; } if ((((_openTest / NUM_SIZES) % NUM_SUBTESTS) + 1) & 2) { dstImage_ = true; } numIter = Iterations[_openTest / (NUM_SIZES * NUM_SUBTESTS * NUM_FORMATS)]; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions.
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, typeOfDevice, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; size_t size; cl_bool imageSupport_ = false; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport_), &imageSupport_, &size); if (!imageSupport_) { printf("\n%s\n", "Image not supported, skipping this test!"); skip_ = true; return; } context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_WRITE_ONLY; void *mem; size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSize_, bufSize_, 1}; size_t image_row_pitch; size_t image_slice_pitch; unsigned int memSize; if (dstImage_) { dstBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSize_, bufSize_, 0, NULL, &error_); CHECK_RESULT(dstBuffer_ == 0, "clCreateImage(dstBuffer) failed"); mem = _wrapper->clEnqueueMapImage( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * bufSize_; } else { dstBuffer_ = _wrapper->clCreateBuffer( context_, flags, bufSize_ * bufSize_ * formatSize[bufnum_], NULL, &error_); CHECK_RESULT(dstBuffer_ == 0, "clCreateBuffer(dstBuffer) failed"); mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_WRITE, 0, bufSize_ * bufSize_ * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)bufSize_ * bufSize_ * formatSize[bufnum_]; image_row_pitch = 0; } setData(mem, (unsigned int)image_row_pitch, memSize, 0xdeadbeef); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, dstBuffer_, mem, 0, NULL, NULL); flags = CL_MEM_READ_ONLY; if (srcImage_) { srcBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSize_, bufSize_, 0, NULL, &error_); CHECK_RESULT(srcBuffer_ == 0, "clCreateImage(srcBuffer) failed"); mem = _wrapper->clEnqueueMapImage( cmd_queue_, srcBuffer_, CL_TRUE, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * bufSize_; } else { srcBuffer_ = _wrapper->clCreateBuffer( context_, flags, bufSize_ * bufSize_ * formatSize[bufnum_], NULL, &error_); CHECK_RESULT(srcBuffer_ == 0, "clCreateBuffer(srcBuffer) failed"); mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, srcBuffer_, CL_TRUE, CL_MAP_WRITE, 0, bufSize_ * bufSize_ * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)bufSize_ * bufSize_ * formatSize[bufnum_]; image_row_pitch = 0; } setData(mem, (unsigned int)image_row_pitch, memSize, 0x600df00d); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, srcBuffer_, mem, 0, NULL, NULL); } void OCLPerfImageCopySpeed::run(void) { if (skip_) { return; } size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSize_, 
bufSize_, 1}; // Warm up if (srcImage_ == false) { error_ = _wrapper->clEnqueueCopyBufferToImage( cmd_queue_, srcBuffer_, dstBuffer_, 0, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBufferToImage failed"); } else if (dstImage_ == false) { error_ = _wrapper->clEnqueueCopyImageToBuffer( cmd_queue_, srcBuffer_, dstBuffer_, origin, region, 0, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImageToBuffer failed"); } else { error_ = _wrapper->clEnqueueCopyImage(cmd_queue_, srcBuffer_, dstBuffer_, origin, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImage failed"); } error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { if (srcImage_ == false) { error_ = _wrapper->clEnqueueCopyBufferToImage( cmd_queue_, srcBuffer_, dstBuffer_, 0, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBufferToImage failed"); } else if (dstImage_ == false) { error_ = _wrapper->clEnqueueCopyImageToBuffer( cmd_queue_, srcBuffer_, dstBuffer_, origin, region, 0, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImageToBuffer failed"); } else { error_ = _wrapper->clEnqueueCopyImage(cmd_queue_, srcBuffer_, dstBuffer_, origin, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImage failed"); } } error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Stop(); double sec = timer.GetElapsedTime(); // Image copy bandwidth in GB/s double perf = ((double)bufSize_ * bufSize_ * formatSize[bufnum_] * 2 * numIter * (double)(1e-09)) / sec; const char *strSrc = NULL; const char *strDst = NULL; if (srcImage_) strSrc = "img"; else strSrc = "buf"; if (dstImage_) strDst = "img"; else strDst = "buf"; void *mem; size_t image_row_pitch; size_t image_slice_pitch; unsigned int memSize; if (dstImage_) { mem = _wrapper->clEnqueueMapImage( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_READ, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * bufSize_; } else { mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_READ, 0, bufSize_ * bufSize_ * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)bufSize_ * bufSize_ * formatSize[bufnum_]; image_row_pitch = 0; } checkData(mem, (unsigned int)image_row_pitch, memSize, 0x600df00d); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, dstBuffer_, mem, 0, NULL, NULL); _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%4dx%4d) fmt:%s src:%s dst:%s i: %4d (GB/s) ", bufSize_, bufSize_, textFormats[bufnum_], strSrc, strDst, numIter); testDescString = buf; } unsigned int OCLPerfImageCopySpeed::close(void) { if (skip_) { return CL_SUCCESS; } _wrapper->clFinish(cmd_queue_); if (srcBuffer_) { error_ = _wrapper->clReleaseMemObject(srcBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(srcBuffer_) failed"); } if (dstBuffer_) { error_ = _wrapper->clReleaseMemObject(dstBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(dstBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } 
return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageCopySpeed.h000066400000000000000000000037521450307266000256110ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ImageCopySpeed_H_ #define _OCL_ImageCopySpeed_H_ #include "OCLTestImp.h" class OCLPerfImageCopySpeed : public OCLTestImp { public: OCLPerfImageCopySpeed(); virtual ~OCLPerfImageCopySpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1000; cl_context context_; cl_command_queue cmd_queue_; cl_mem srcBuffer_; cl_mem dstBuffer_; cl_int error_; bool skip_; unsigned int bufSize_; unsigned int bufnum_; bool srcImage_; bool dstImage_; unsigned int numIter; void setData(void* ptr, unsigned int pitch, unsigned int size, unsigned int value); void checkData(void* ptr, unsigned int pitch, unsigned int size, unsigned int value); }; #endif // _OCL_ImageCopySpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageCreate.cpp000066400000000000000000000150671450307266000254560ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfImageCreate.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 4 static const unsigned int Sizes[NUM_SIZES] = {256, 512, 1024, 2048}; #if defined(CL_VERSION_2_0) #define NUM_FORMATS 3 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}, {CL_sRGBA, CL_UNORM_INT8}, {CL_DEPTH, CL_UNORM_INT16}}; static const char *textFormats[NUM_FORMATS] = {"CL_RGBA , CL_UNSIGNED_INT8", "CL_sRGBA, CL_UNORM_INT8 ", "CL_DEPTH, CL_UNORM_INT16 "}; static const unsigned int formatSize[NUM_FORMATS] = { sizeof(CL_UNSIGNED_INT8), sizeof(CL_UNORM_INT8), sizeof(CL_UNORM_INT16)}; #else #define NUM_FORMATS 1 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}}; static const char *textFormats[NUM_FORMATS] = {"CL_RGBA, CL_UNSIGNED_INT8"}; static const unsigned int formatSize[NUM_FORMATS] = {sizeof(CL_UNSIGNED_INT8)}; #endif OCLPerfImageCreate::OCLPerfImageCreate() { _numSubTests = NUM_SIZES * NUM_FORMATS; } OCLPerfImageCreate::~OCLPerfImageCreate() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfImageCreate::setData(void *ptr, unsigned int size, unsigned int value) { unsigned int *ptr2 = (unsigned int *)ptr; for (unsigned int i = 0; i < size >> 2; i++) { ptr2[i] = value; value++; } } void OCLPerfImageCreate::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { error_ = CL_SUCCESS; testId_ = test; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = 0; kernel_ = 0; cmd_queue_ = 0; outBuffer_ = 0; skip_ = false; // check device version size_t param_size = 0; char *strVersion = 0; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = new char[param_size]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[7] < '2') { skip_ = true; testDescString = "sRGBA Image not supported for < 2.0 devices. 
Test Skipped."; delete strVersion; return; } delete strVersion; size_t size; cl_bool imageSupport_ = false; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport_), &imageSupport_, &size); if (!imageSupport_) { printf("\n%s\n", "Image not supported, skipping this test!"); skip_ = true; return; } bufSize_ = Sizes[test % NUM_SIZES]; bufnum_ = (test / NUM_SIZES) % NUM_FORMATS; memSize = bufSize_ * bufSize_ * formatSize[bufnum_]; numIter = 100; outBuffer_ = (cl_mem *)malloc(numIter * sizeof(cl_mem)); memptr = new char[memSize]; cmd_queue_ = cmdQueues_[_deviceId]; } void OCLPerfImageCreate::run(void) { if (skip_) { return; } CPerfCounter timer; cl_image_desc imageInfo; memset(&imageInfo, 0x0, sizeof(cl_image_desc)); imageInfo.image_type = CL_MEM_OBJECT_IMAGE2D; imageInfo.image_width = bufSize_; imageInfo.image_height = bufSize_; imageInfo.image_depth = 1; imageInfo.image_array_size = 1; imageInfo.image_row_pitch = bufSize_ * formatSize[bufnum_]; imageInfo.image_slice_pitch = imageInfo.image_row_pitch * (bufSize_); setData(memptr, memSize, 0xdeadbeef); char *dstmem = new char[memSize]; size_t origin[3] = {0, 0, 0}; size_t region[3] = {1, 1, 1}; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; ++i) { outBuffer_[i] = clCreateImage(context_, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, &formats[bufnum_], &imageInfo, memptr, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "Error clCreateImage()"); error_ = _wrapper->clEnqueueReadImage(cmd_queue_, outBuffer_[i], CL_TRUE, origin, region, 0, 0, dstmem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadImage failed"); _wrapper->clFinish(cmd_queue_); } timer.Stop(); delete dstmem; double sec = timer.GetElapsedTime(); // Image create in GB/s double perf = ((double)memSize * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; unsigned int fmt_num = (testId_ / NUM_SIZES) % NUM_FORMATS; SNPRINTF(buf, sizeof(buf), " (%4dx%4d) fmt:%s(%1d) i: %4d (GB/s) ", bufSize_, bufSize_, textFormats[fmt_num], formatSize[bufnum_], numIter); testDescString = buf; } unsigned int OCLPerfImageCreate::close(void) { if (skip_) { return CL_SUCCESS; } if (memptr) { delete memptr; } if (outBuffer_) { for (unsigned int i = 0; i < numIter; ++i) { if (outBuffer_[i]) { error_ = _wrapper->clReleaseMemObject(outBuffer_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_[i]) failed"); } } } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageCreate.h000066400000000000000000000034101450307266000251100ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ImageCreate_H_ #define _OCL_ImageCreate_H_ #include "OCLTestImp.h" class OCLPerfImageCreate : public OCLTestImp { public: OCLPerfImageCreate(); virtual ~OCLPerfImageCreate(); public: virtual void open(unsigned int test, char *units, double &conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); virtual void setData(void *ptr, unsigned int size, unsigned int value); cl_command_queue cmd_queue_; cl_mem *outBuffer_; unsigned int bufSize_; unsigned int bufnum_; unsigned int numIter; char *memptr; unsigned int memSize; unsigned int testId_; bool skip_; }; #endif // _OCL_ImageCreate_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageMapUnmap.cpp000066400000000000000000000311631450307266000257640ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfImageMapUnmap.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 1 static const unsigned int Sizes0[2] = {0xc0, 0x18a}; #define NUM_FORMATS 1 static const cl_image_format formats[NUM_FORMATS] = {{CL_R, CL_SNORM_INT16}}; static const char *textFormats[NUM_FORMATS] = {"R16"}; static const unsigned int formatSize[NUM_FORMATS] = {2 * sizeof(cl_uchar)}; static const unsigned int Iterations[2] = {1, OCLPerfImageMapUnmap::NUM_ITER}; #define NUM_SUBTESTS 1 OCLPerfImageMapUnmap::OCLPerfImageMapUnmap() { _numSubTests = NUM_SIZES * NUM_SUBTESTS * NUM_FORMATS * 1; } OCLPerfImageMapUnmap::~OCLPerfImageMapUnmap() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfImageMapUnmap::setData(void *ptr, unsigned int pitch, unsigned int size, unsigned int value) { unsigned int *ptr2 = (unsigned int *)ptr; value = 0; for (unsigned int i = 0; i < size >> 2; i++) { ptr2[i] = value; value++; } } void OCLPerfImageMapUnmap::checkData(void *ptr, unsigned int pitch, unsigned int size, unsigned int value) { unsigned int *ptr2 = (unsigned int *)ptr; value = 0; for (unsigned int i = 0; i < size >> 2; i++) { if (ptr2[i] != value) { printf("Data validation failed at %d! 
Got 0x%08x 0x%08x 0x%08x 0x%08x\n", i, ptr2[i], ptr2[i + 1], ptr2[i + 2], ptr2[i + 3]); printf("Expected 0x%08x 0x%08x 0x%08x 0x%08x\n", value, value, value, value); CHECK_RESULT(true, "Data validation failed!"); break; } value++; } } void OCLPerfImageMapUnmap::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint typeOfDevice = type_; cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; size_t queryOut = 0; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; skip_ = false; context_ = 0; cmd_queue_ = 0; srcBuffer_ = 0; dstBuffer_ = 0; srcImage_ = false; dstImage_ = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], typeOfDevice, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete[] platforms; } bufnum_ = (_openTest / (NUM_SIZES * NUM_SUBTESTS)) % NUM_FORMATS; srcImage_ = true; dstImage_ = false; numIter = Iterations[_openTest / (NUM_SIZES * NUM_SUBTESTS * NUM_FORMATS)]; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions.
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, typeOfDevice, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; size_t size; cl_bool imageSupport_ = false; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport_), &imageSupport_, &size); if (!imageSupport_) { printf("\n%s\n", "Image not supported, skipping this test!"); skip_ = true; return; } bufSizeW_ = Sizes0[0]; bufSizeH_ = Sizes0[1]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_WRITE_ONLY; cl_mem_flags flags2 = CL_MEM_WRITE_ONLY; void *mem; size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSizeW_, bufSizeH_, 1}; size_t image_row_pitch; size_t image_slice_pitch; cl_image_desc imageInfo; memset(&imageInfo, 0x0, sizeof(cl_image_desc)); imageInfo.image_type = CL_MEM_OBJECT_IMAGE2D; imageInfo.image_width = bufSizeW_; imageInfo.image_height = bufSizeH_; imageInfo.image_depth = 1; imageInfo.image_array_size = 1; imageInfo.image_row_pitch = bufSizeW_ * formatSize[bufnum_]; imageInfo.image_slice_pitch = imageInfo.image_row_pitch * (bufSizeH_); void *host_ptr = malloc(imageInfo.image_row_pitch * imageInfo.image_height); unsigned int memSize; if (dstImage_) { dstBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSizeW_, bufSizeH_, 0, host_ptr, &error_); CHECK_RESULT(dstBuffer_ == 0, "clCreateImage(dstBuffer) failed"); mem = _wrapper->clEnqueueMapImage( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * bufSizeH_; } else { dstBuffer_ = _wrapper->clCreateBuffer( context_, flags2, bufSizeW_ * bufSizeH_ * formatSize[bufnum_], NULL, &error_); CHECK_RESULT(dstBuffer_ == 0, "clCreateBuffer(dstBuffer) failed"); mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_WRITE, 0, bufSizeW_ * bufSizeH_ * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)bufSizeW_ * bufSizeH_ * formatSize[bufnum_]; image_row_pitch = 0; } setData(mem, (unsigned int)image_row_pitch, memSize, 0xdeadbeef); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, dstBuffer_, mem, 0, NULL, NULL); flags = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR; if (srcImage_) { srcBuffer_ = _wrapper->clCreateImage(context_, flags, &formats[bufnum_], &imageInfo, host_ptr, &error_); CHECK_RESULT(srcBuffer_ == 0, "clCreateImage(srcBuffer) failed"); mem = _wrapper->clEnqueueMapImage( cmd_queue_, srcBuffer_, CL_TRUE, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * bufSizeH_; error_ = _wrapper->clFinish(cmd_queue_); } else { srcBuffer_ = _wrapper->clCreateBuffer( context_, flags, bufSizeW_ * bufSizeH_ * formatSize[bufnum_], NULL, &error_); 
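// CL_MEM_USE_HOST_PTR (set in flags above) directs the runtime to use the
// application's allocation as the backing store, so the map/unmap exercised
// by this test can be zero-copy on platforms that support it. This else
// branch is effectively dead because srcImage_ is hard-coded to true in
// open(); if it ever ran, CL_MEM_USE_HOST_PTR with a NULL host pointer
// would make clCreateBuffer fail with CL_INVALID_HOST_PTR per the spec.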
CHECK_RESULT(srcBuffer_ == 0, "clCreateBuffer(srcBuffer) failed"); mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, srcBuffer_, CL_TRUE, CL_MAP_WRITE, 0, bufSizeW_ * bufSizeH_ * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)bufSizeW_ * bufSizeH_ * formatSize[bufnum_]; image_row_pitch = 0; } setData(mem, (unsigned int)image_row_pitch, memSize, 0x600df00d); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, srcBuffer_, mem, 0, NULL, NULL); error_ = _wrapper->clFinish(cmd_queue_); } void OCLPerfImageMapUnmap::run(void) { if (skip_) { return; } size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSizeW_, bufSizeH_, 1}; if (srcImage_ == false) { error_ = _wrapper->clEnqueueCopyBufferToImage( cmd_queue_, srcBuffer_, dstBuffer_, 0, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBufferToImage failed"); } else if (dstImage_ == false) { error_ = _wrapper->clEnqueueCopyImageToBuffer( cmd_queue_, srcBuffer_, dstBuffer_, origin, region, 0, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImageToBuffer failed"); } else { error_ = _wrapper->clEnqueueCopyImage(cmd_queue_, srcBuffer_, dstBuffer_, origin, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImage failed"); } error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); const char *strSrc = NULL; const char *strDst = NULL; if (srcImage_) strSrc = "img"; else strSrc = "buf"; if (dstImage_) strDst = "img"; else strDst = "buf"; void *mem; size_t image_row_pitch; size_t image_slice_pitch; unsigned int memSize; if (dstImage_) { mem = _wrapper->clEnqueueMapImage( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_READ, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * bufSizeH_; } else { mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_READ, 0, bufSizeW_ * bufSizeH_ * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)bufSizeW_ * bufSizeH_ * formatSize[bufnum_]; image_row_pitch = 0; } checkData(mem, (unsigned int)image_row_pitch, memSize, 0x600df00d); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, dstBuffer_, mem, 0, NULL, NULL); _perfInfo = 0; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%4dx%4d) fmt:%s src:%s dst:%s i: %4d (GB/s) ", bufSizeW_, bufSizeH_, textFormats[bufnum_], strSrc, strDst, numIter); testDescString = buf; } unsigned int OCLPerfImageMapUnmap::close(void) { if(skip_) { return CL_SUCCESS; } _wrapper->clFinish(cmd_queue_); if (srcBuffer_) { error_ = _wrapper->clReleaseMemObject(srcBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(srcBuffer_) failed"); } if (dstBuffer_) { error_ = _wrapper->clReleaseMemObject(dstBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(dstBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageMapUnmap.h000066400000000000000000000037751450307266000254410ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ImageMapUnmap_H_ #define _OCL_ImageMapUnmap_H_ #include "OCLTestImp.h" class OCLPerfImageMapUnmap : public OCLTestImp { public: OCLPerfImageMapUnmap(); virtual ~OCLPerfImageMapUnmap(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1; cl_context context_; cl_command_queue cmd_queue_; cl_mem srcBuffer_; cl_mem dstBuffer_; cl_int error_; bool skip_; unsigned int bufSizeW_; unsigned int bufSizeH_; unsigned int bufnum_; bool srcImage_; bool dstImage_; unsigned int numIter; void setData(void* ptr, unsigned int pitch, unsigned int size, unsigned int value); void checkData(void* ptr, unsigned int pitch, unsigned int size, unsigned int value); }; #endif // _OCL_ImageMapUnmap_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageReadSpeed.cpp000066400000000000000000000266601450307266000261100ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/

#include "OCLPerfImageReadSpeed.h"

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#include "CL/opencl.h"
#include "Timer.h"

// Quiet pesky warnings
#ifdef WIN_OS
#define SNPRINTF sprintf_s
#else
#define SNPRINTF snprintf
#endif

#define NUM_SIZES 4
static const unsigned int Sizes[NUM_SIZES] = {256, 512, 1024, 2048};

#define NUM_FORMATS 1
static const cl_image_format formats[NUM_FORMATS] = {
    {CL_RGBA, CL_UNSIGNED_INT8}};
static const char *textFormats[NUM_FORMATS] = {"R8G8B8A8"};
static const unsigned int formatSize[NUM_FORMATS] = {4};
static const unsigned int Iterations[2] = {1, OCLPerfImageReadSpeed::NUM_ITER};

OCLPerfImageReadSpeed::OCLPerfImageReadSpeed() {
  _numSubTests = NUM_SIZES * NUM_FORMATS * 2;
}

OCLPerfImageReadSpeed::~OCLPerfImageReadSpeed() {}

static void CL_CALLBACK notify_callback(const char *errinfo,
                                        const void *private_info, size_t cb,
                                        void *user_data) {}

void OCLPerfImageReadSpeed::open(unsigned int test, char *units,
                                 double &conversion, unsigned int deviceId) {
  cl_uint typeOfDevice = type_;
  cl_uint numPlatforms;
  cl_platform_id platform = NULL;
  cl_uint num_devices = 0;
  cl_device_id *devices = NULL;
  cl_device_id device = NULL;
  _crcword = 0;
  conversion = 1.0f;
  _deviceId = deviceId;
  _openTest = test;
  skip_ = false;
  context_ = 0;
  cmd_queue_ = 0;
  outBuffer_ = 0;
  memptr = NULL;

  error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
  if (0 < numPlatforms) {
    cl_platform_id *platforms = new cl_platform_id[numPlatforms];
    error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL);
    CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
    platform = platforms[_platformIndex];
    char pbuf[100];
    error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex],
                                         CL_PLATFORM_VENDOR, sizeof(pbuf),
                                         pbuf, NULL);
    num_devices = 0;
    /* Get the number of requested devices */
    error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], typeOfDevice,
                                      0, NULL, &num_devices);
    delete[] platforms;
  }

  bufSize_ = Sizes[_openTest % NUM_SIZES];
  bufnum_ = (_openTest / NUM_SIZES) % NUM_FORMATS;
  numIter = Iterations[_openTest / (NUM_SIZES * NUM_FORMATS)];

  CHECK_RESULT(platform == 0, "Couldn't find platform, cannot proceed");
  devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id));
  CHECK_RESULT(devices == 0, "no devices");

  /* Get the requested device */
  error_ = _wrapper->clGetDeviceIDs(platform, typeOfDevice, num_devices,
                                    devices, NULL);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed");
  CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available");
  device = devices[_deviceId];

  size_t size;
  cl_bool imageSupport_ = false;
  error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT,
                                     sizeof(imageSupport_), &imageSupport_,
                                     &size);
  if (!imageSupport_) {
    printf("\n%s\n", "Image not supported, skipping this test!");
    skip_ = true;
    return;
  }

  context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL,
                                       &error_);
  CHECK_RESULT(context_ == 0, "clCreateContext failed");
  cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL);
  CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed");

  cl_mem_flags flags = CL_MEM_WRITE_ONLY;
  outBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_],
                                         bufSize_, bufSize_, 0, NULL, &error_);
  CHECK_RESULT(outBuffer_ == 0, "clCreateImage(outBuffer) failed");
  memptr = new char[bufSize_ * bufSize_ * formatSize[bufnum_]];
}

void OCLPerfImageReadSpeed::run(void) {
  if (skip_) {
    return;
  }
  CPerfCounter timer;
  size_t origin[3] = {0, 0, 0};
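  // The timed loop below measures blocking-read bandwidth: every
  // clEnqueueReadImage call passes CL_TRUE, so each iteration includes the
  // full device-to-host copy plus the host-side wait, and the score is plain
  // bytes-moved / elapsed-seconds.
  // Illustrative arithmetic (ours, not from the source): the 2048x2048
  // R8G8B8A8 subtest moves 2048 * 2048 * 4 = 16,777,216 bytes (~16.8 MB) per
  // read; with numIter = 100 that is ~1.68 GB, so an elapsed time of 0.1 s
  // would report roughly 16.8 GB/s in _perfInfo.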
  size_t region[3] = {bufSize_, bufSize_, 1};

  // Warm up
  error_ = _wrapper->clEnqueueReadImage(cmd_queue_, outBuffer_, CL_TRUE,
                                        origin, region, 0, 0, memptr, 0, NULL,
                                        NULL);
  CHECK_RESULT(error_, "clEnqueueReadImage failed");

  timer.Reset();
  timer.Start();
  for (unsigned int i = 0; i < numIter; i++) {
    error_ = _wrapper->clEnqueueReadImage(cmd_queue_, outBuffer_, CL_TRUE,
                                          origin, region, 0, 0, memptr, 0,
                                          NULL, NULL);
    CHECK_RESULT(error_, "clEnqueueReadImage failed");
  }
  timer.Stop();
  double sec = timer.GetElapsedTime();

  // Image read bandwidth in GB/s
  double perf = ((double)bufSize_ * bufSize_ * formatSize[bufnum_] * numIter *
                 (double)(1e-09)) /
                sec;

  _perfInfo = (float)perf;
  char buf[256];
  SNPRINTF(buf, sizeof(buf), " (%4dx%4d) fmt:%s i: %4d (GB/s) ", bufSize_,
           bufSize_, textFormats[bufnum_], numIter);
  testDescString = buf;
}

unsigned int OCLPerfImageReadSpeed::close(void) {
  if (memptr) {
    delete[] memptr;
  }
  if (outBuffer_) {
    error_ = _wrapper->clReleaseMemObject(outBuffer_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                           "clReleaseMemObject(outBuffer_) failed");
  }
  if (cmd_queue_) {
    error_ = _wrapper->clReleaseCommandQueue(cmd_queue_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                           "clReleaseCommandQueue failed");
  }
  if (context_) {
    error_ = _wrapper->clReleaseContext(context_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed");
  }
  return _crcword;
}

OCLPerfPinnedImageReadSpeed::OCLPerfPinnedImageReadSpeed() {
  _numSubTests = NUM_SIZES * NUM_FORMATS * 2;
}

OCLPerfPinnedImageReadSpeed::~OCLPerfPinnedImageReadSpeed() {}

void OCLPerfPinnedImageReadSpeed::open(unsigned int test, char *units,
                                       double &conversion,
                                       unsigned int deviceId) {
  cl_uint typeOfDevice = type_;
  cl_uint numPlatforms;
  cl_platform_id platform = NULL;
  cl_uint num_devices = 0;
  cl_device_id *devices = NULL;
  cl_device_id device = NULL;
  _crcword = 0;
  conversion = 1.0f;
  _deviceId = deviceId;
  _openTest = test;
  context_ = 0;
  cmd_queue_ = 0;
  outBuffer_ = 0;
  inBuffer_ = 0;
  memptr = NULL;
  skip_ = false;

  error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
  if (0 < numPlatforms) {
    cl_platform_id *platforms = new cl_platform_id[numPlatforms];
    error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL);
    CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
    platform = platforms[_platformIndex];
    char pbuf[100];
    error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex],
                                         CL_PLATFORM_VENDOR, sizeof(pbuf),
                                         pbuf, NULL);
    num_devices = 0;
    /* Get the number of requested devices */
    error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], typeOfDevice,
                                      0, NULL, &num_devices);
    delete[] platforms;
  }

  bufSize_ = Sizes[_openTest % NUM_SIZES];
  bufnum_ = (_openTest / NUM_SIZES) % NUM_FORMATS;
  numIter = Iterations[_openTest / (NUM_SIZES * NUM_FORMATS)];

  CHECK_RESULT(platform == 0, "Couldn't find platform, cannot proceed");
  devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id));
  CHECK_RESULT(devices == 0, "no devices");

  /* Get the requested device */
  error_ = _wrapper->clGetDeviceIDs(platform, typeOfDevice, num_devices,
                                    devices, NULL);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed");
  CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available");
  device = devices[_deviceId];

  size_t size;
  cl_bool imageSupport_ = false;
  error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT,
                                     sizeof(imageSupport_), &imageSupport_,
                                     &size);
  if (!imageSupport_) {
    printf("\n%s\n", "Image not supported, skipping this test!");
    skip_ = true;
    return;
  }

  context_
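  // Pinned variant (our summary, not from the source): this subclass changes
  // only open()/close(); run() is inherited from OCLPerfImageReadSpeed. The
  // host pointer handed to clEnqueueReadImage comes from mapping a
  // CL_MEM_ALLOC_HOST_PTR buffer (see the clEnqueueMapBuffer call at the end
  // of this open()), which typically lets the runtime copy into pre-pinned
  // host memory instead of pinning a pageable allocation on every transfer.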
= _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR; inBuffer_ = _wrapper->clCreateBuffer( context_, flags, bufSize_ * bufSize_ * formatSize[bufnum_], NULL, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateBuffer(inBuffer) failed"); flags = CL_MEM_WRITE_ONLY; outBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSize_, bufSize_, 0, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateImage(outBuffer) failed"); memptr = (char *)_wrapper->clEnqueueMapBuffer( cmd_queue_, inBuffer_, CL_TRUE, CL_MAP_WRITE, 0, bufSize_ * bufSize_ * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); } unsigned int OCLPerfPinnedImageReadSpeed::close(void) { if (skip_) { return CL_SUCCESS; } if (memptr) { error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, inBuffer_, memptr, 0, NULL, NULL); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clEnqueueUnmapMemObject(inBuffer_) failed"); clFinish(cmd_queue_); } if (inBuffer_) { error_ = _wrapper->clReleaseMemObject(inBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageReadSpeed.h000066400000000000000000000040721450307266000255460ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_ImageReadSpeed_H_ #define _OCL_ImageReadSpeed_H_ #include "OCLTestImp.h" class OCLPerfImageReadSpeed : public OCLTestImp { public: OCLPerfImageReadSpeed(); virtual ~OCLPerfImageReadSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 100; cl_context context_; cl_command_queue cmd_queue_; cl_mem outBuffer_; cl_int error_; bool skip_; unsigned int bufSize_; unsigned int bufnum_; unsigned int numIter; char* memptr; }; class OCLPerfPinnedImageReadSpeed : public OCLPerfImageReadSpeed { public: OCLPerfPinnedImageReadSpeed(); virtual ~OCLPerfPinnedImageReadSpeed(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual unsigned int close(void); cl_mem inBuffer_; }; #endif // _OCL_ImageReadSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageReadWrite.cpp000066400000000000000000000175231450307266000261400ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfImageReadWrite.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define KERNEL_CODE(...) 
#__VA_ARGS__ #define NUM_SIZES 4 static const unsigned int Sizes[NUM_SIZES] = {256, 512, 1024, 2048}; #if defined(CL_VERSION_2_0) #define NUM_FORMATS 2 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}, {CL_sRGBA, CL_UNORM_INT8}}; static const char *textFormats[NUM_FORMATS] = {"CL_RGBA , CL_UNSIGNED_INT8", "CL_sRGBA, CL_UNORM_INT8 "}; static const unsigned int formatSize[NUM_FORMATS] = {sizeof(CL_UNSIGNED_INT8), sizeof(CL_UNORM_INT8)}; #else #define NUM_FORMATS 1 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}}; static const char *textFormats[NUM_FORMATS] = {"CL_RGBA , CL_UNSIGNED_INT8"}; static const unsigned int formatSize[NUM_FORMATS] = {sizeof(CL_UNSIGNED_INT8)}; #endif const static char *strKernel = {KERNEL_CODE( \n __constant sampler_t s_nearest = CLK_FILTER_NEAREST | CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE; \n __kernel void image_kernel(read_write image2d_t image, uint zero) { int x = get_global_id(0); int y = get_global_id(1); int offset = y * get_image_width(image) + x; int2 coords = (int2)(x, y); uint4 tmp = read_imageui(image, s_nearest, coords); write_imageui(image, coords, 1 + tmp * zero); } \n)}; OCLPerfImageReadWrite::OCLPerfImageReadWrite() { _numSubTests = NUM_SIZES * NUM_FORMATS; } OCLPerfImageReadWrite::~OCLPerfImageReadWrite() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfImageReadWrite::setData(void *ptr, unsigned int size, unsigned int value) { unsigned int *ptr2 = (unsigned int *)ptr; for (unsigned int i = 0; i < size >> 2; i++) { ptr2[i] = value; value++; } } void OCLPerfImageReadWrite::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { error_ = CL_SUCCESS; testId_ = test; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = 0; kernel_ = 0; cmd_queue_ = 0; imageBuffer_ = 0; skip_ = false; // check device version size_t param_size = 0; char *strVersion = 0; error_ = _wrapper->clGetDeviceInfo( devices_[_deviceId], CL_DEVICE_OPENCL_C_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = new char[param_size]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_OPENCL_C_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[9] < '2') { skip_ = true; testDescString = "Image read_write qualifier not supported in OpenCL C < 2.0. 
Test " "Skipped."; delete strVersion; return; } delete strVersion; size_t size; cl_bool imageSupport_ = false; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport_), &imageSupport_, &size); if (!imageSupport_) { printf("\n%s\n", "Image not supported, skipping this test!"); skip_ = true; return; } bufSize_ = Sizes[test % NUM_SIZES]; bufnum_ = (test / NUM_SIZES) % NUM_FORMATS; memSize = bufSize_ * bufSize_ * formatSize[bufnum_]; numIter = 100; memptr = new char[memSize]; cmd_queue_ = cmdQueues_[_deviceId]; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "image_kernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); // create image setData(memptr, memSize, 0x0); imageBuffer_ = _wrapper->clCreateImage2D( context_, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, &formats[bufnum_], bufSize_, bufSize_, 0, memptr, &error_); CHECK_RESULT(error_ != CL_SUCCESS, "clCreateImage2D() failed"); const unsigned int zero = 0; // set kernel arguments error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &imageBuffer_); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(unsigned int), &zero); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); } void OCLPerfImageReadWrite::run(void) { if (skip_) { return; } CPerfCounter timer; size_t gws[2] = {bufSize_, bufSize_}; size_t lws[2] = {8, 8}; error_ = _wrapper->clEnqueueNDRangeKernel(cmd_queue_, kernel_, 2, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmd_queue_); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; ++i) { error_ = _wrapper->clEnqueueNDRangeKernel(cmd_queue_, kernel_, 2, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmd_queue_); } timer.Stop(); double sec = timer.GetElapsedTime(); // speed in GB/s double perf = ((double)memSize * numIter * (double)(1e-09)) * 2 / sec; _perfInfo = (float)perf; char buf[256]; unsigned int fmt_num = (testId_ / NUM_SIZES) % NUM_FORMATS; SNPRINTF(buf, sizeof(buf), " (%4dx%4d) fmt:%s(%1d) i: %4d (GB/s) ", bufSize_, bufSize_, textFormats[fmt_num], formatSize[bufnum_], numIter); testDescString = buf; } unsigned int OCLPerfImageReadWrite::close(void) { if (!skip_) { if (memptr) { delete memptr; } if (imageBuffer_) { error_ = _wrapper->clReleaseMemObject(imageBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(imageBuffer_) failed"); } } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageReadWrite.h000066400000000000000000000034221450307266000255760ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ImageReadWrite #define _OCL_ImageReadWrite #include "OCLTestImp.h" class OCLPerfImageReadWrite : public OCLTestImp { public: OCLPerfImageReadWrite(); virtual ~OCLPerfImageReadWrite(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); virtual void setData(void* ptr, unsigned int size, unsigned int value); cl_command_queue cmd_queue_; cl_mem imageBuffer_; unsigned int bufSize_; unsigned int bufnum_; unsigned int numIter; char* memptr; unsigned int memSize; unsigned int testId_; bool skip_; }; #endif // _OCL_ImageReadWrite clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageReadsRGBA.cpp000066400000000000000000000211001450307266000257260ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfImageReadsRGBA.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define KERNEL_CODE(...) 
#__VA_ARGS__ #define NUM_SIZES 4 static const unsigned int Sizes[NUM_SIZES] = {256, 512, 1024, 2048}; #if defined(CL_VERSION_2_0) #define NUM_FORMATS 2 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}, {CL_sRGBA, CL_UNORM_INT8}}; static const char *textFormats[NUM_FORMATS] = {"CL_RGBA , CL_UNSIGNED_INT8", "CL_sRGBA, CL_UNORM_INT8 "}; static const unsigned int formatSize[NUM_FORMATS] = {sizeof(CL_UNSIGNED_INT8), sizeof(CL_UNORM_INT8)}; #else #define NUM_FORMATS 1 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}}; static const char *textFormats[NUM_FORMATS] = {"CL_RGBA , CL_UNSIGNED_INT8"}; static const unsigned int formatSize[NUM_FORMATS] = {sizeof(CL_UNSIGNED_INT8)}; #endif const static char *strKernel = {KERNEL_CODE( \n __constant sampler_t s_nearest = CLK_FILTER_NEAREST | CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE; \n // Read sRGBA image object (input) and convert it to linear RGB values // (results): __kernel void image_kernel(read_only image2d_t input, __global float4 *results) { int x = get_global_id(0); int y = get_global_id(1); int offset = y * get_image_width(input) + x; int2 coords = (int2)(x, y); float4 tmp = read_imagef(input, s_nearest, coords); if (x < 0 && tmp.x == 0.f) { results[offset] = tmp; } } \n)}; OCLPerfImageReadsRGBA::OCLPerfImageReadsRGBA() { _numSubTests = NUM_SIZES * NUM_FORMATS; } OCLPerfImageReadsRGBA::~OCLPerfImageReadsRGBA() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfImageReadsRGBA::setData(void *ptr, unsigned int size, float value) { unsigned int *ptr_i = (unsigned int *)ptr; for (unsigned int i = 0; i < size >> 2; i++) { ptr_i[i] = (int)value; value++; } } void OCLPerfImageReadsRGBA::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { error_ = CL_SUCCESS; testId_ = test; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = 0; kernel_ = 0; cmd_queue_ = 0; imageBuffer_ = 0; valueBuffer_ = 0; skip_ = false; // check device version size_t param_size = 0; char *strVersion = 0; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = new char[param_size]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[7] < '2') { skip_ = true; testDescString = "sRGBA Image not supported for < 2.0 devices. 
Test Skipped."; delete strVersion; return; } delete strVersion; size_t size; cl_bool imageSupport_ = false; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport_), &imageSupport_, &size); if (!imageSupport_) { printf("\n%s\n", "Image not supported, skipping this test!"); skip_ = true; return; } bufSize_ = Sizes[test % NUM_SIZES]; bufnum_ = (test / NUM_SIZES) % NUM_FORMATS; memSize = bufSize_ * bufSize_ * formatSize[bufnum_]; numIter = 100; memptr = new char[memSize]; cmd_queue_ = cmdQueues_[_deviceId]; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "image_kernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); setData(memptr, memSize, 0.f); size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSize_, bufSize_, 1}; // create image imageBuffer_ = _wrapper->clCreateImage2D( context_, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, &formats[bufnum_], bufSize_, bufSize_, 0, memptr, &error_); CHECK_RESULT(imageBuffer_ == 0, "clCreateImage2D(imageBuffer_) failed"); valueBuffer_ = clCreateBuffer( context_, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, memSize, 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "Error clCreateBuffer()"); // set kernel arguments error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &imageBuffer_); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), &valueBuffer_); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); } void OCLPerfImageReadsRGBA::run(void) { if (skip_) { return; } CPerfCounter timer; size_t gws[2] = {bufSize_, bufSize_}; size_t lws[2] = {8, 8}; // warm-up error_ = _wrapper->clEnqueueNDRangeKernel(cmd_queue_, kernel_, 2, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmd_queue_); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; ++i) { error_ = _wrapper->clEnqueueNDRangeKernel(cmd_queue_, kernel_, 2, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmd_queue_); } timer.Stop(); double sec = timer.GetElapsedTime(); // read_imagef from sRGB to linear RGB speed in GB/s double perf = ((double)memSize * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; unsigned int fmt_num = (testId_ / NUM_SIZES) % NUM_FORMATS; SNPRINTF(buf, sizeof(buf), " (%4dx%4d) fmt:%s(%1d) i: %4d (GB/s) ", bufSize_, bufSize_, textFormats[fmt_num], formatSize[bufnum_], numIter); testDescString = buf; } unsigned int OCLPerfImageReadsRGBA::close(void) { if (skip_) { return CL_SUCCESS; } if (memptr) { delete memptr; } if (imageBuffer_) { error_ = _wrapper->clReleaseMemObject(imageBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(imageBuffer_) failed"); } if (valueBuffer_) { error_ = _wrapper->clReleaseMemObject(valueBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, 
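  // (CHECK_RESULT_NO_RETURN, as the name suggests, reports a failure without
  // returning early, so this cleanup path keeps releasing the remaining
  // objects even when one release fails.)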
"clReleaseMemObject(valueBuffer_) failed"); } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageReadsRGBA.h000066400000000000000000000034531450307266000254060ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ImageReadsRGBA_H_ #define _OCL_ImageReadsRGBA_H_ #include "OCLTestImp.h" class OCLPerfImageReadsRGBA : public OCLTestImp { public: OCLPerfImageReadsRGBA(); virtual ~OCLPerfImageReadsRGBA(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); virtual void setData(void* ptr, unsigned int size, float value); cl_command_queue cmd_queue_; cl_mem imageBuffer_; cl_mem valueBuffer_; unsigned int bufSize_; unsigned int bufnum_; unsigned int numIter; char* memptr; unsigned int memSize; unsigned int testId_; bool skip_; }; #endif // _OCL_ImageReadsRGBA_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageSampleRate.cpp000066400000000000000000000300661450307266000263040ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfImageSampleRate.h" #include #include #include #include "CL/cl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_TYPES 6 static const cl_image_format formats[NUM_TYPES] = { {CL_R, CL_UNSIGNED_INT8}, {CL_RG, CL_UNSIGNED_INT8}, {CL_RGBA, CL_UNSIGNED_INT8}, {CL_R, CL_FLOAT}, {CL_RGBA, CL_HALF_FLOAT}, {CL_RGBA, CL_FLOAT}}; static const char *types[NUM_TYPES] = { "R8", "R8G8", "R8G8B8A8", "R32F", "R16G16B16A16F", "R32G32B32A32F"}; static const unsigned int typeSizes[NUM_TYPES] = {1, 2, 4, 4, 8, 16}; #define NUM_SIZES 12 static const unsigned int sizes[NUM_SIZES] = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048}; #define NUM_BUFS 6 #define MAX_BUFS (1 << (NUM_BUFS - 1)) OCLPerfImageSampleRate::OCLPerfImageSampleRate() { _numSubTests = NUM_TYPES * NUM_SIZES * NUM_BUFS; } OCLPerfImageSampleRate::~OCLPerfImageSampleRate() {} void OCLPerfImageSampleRate::setKernel(void) { shader_.clear(); shader_ += "kernel void sampleRate(global float4* outBuffer, unsigned int " "inBufSize, unsigned int writeIt,\n"; char buf[256]; for (unsigned int i = 0; i < numBufs_; i++) { SNPRINTF(buf, sizeof(buf), "read_only image2d_t inBuffer%d", i); shader_ += buf; if (i < (numBufs_ - 1)) { shader_ += ","; } shader_ += "\n"; } shader_ += ")\n"; shader_ += "{\n" " uint gid = get_global_id(0);\n" " uint inputIdx = gid % inBufSize;\n" " const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | " "CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;\n" " float4 tmp = (float4)0.0f;\n"; for (unsigned int i = 0; i < numBufs_; i++) { SNPRINTF(buf, sizeof(buf), " tmp += read_imagef(inBuffer%d, sampler, (int2)( gid %% " "inBufSize, (gid / inBufSize) %% inBufSize));\n", i); shader_ += buf; } shader_ += " if (writeIt*(unsigned int)tmp.x) outBuffer[gid] = tmp;\n" "}\n"; // printf("Shader -> %s\n", shader_.c_str()); } void OCLPerfImageSampleRate::setData(cl_mem buffer, unsigned int val) { size_t origin[3] = {0, 0, 0}; size_t region[3] = {width_, width_, 1}; size_t image_row_pitch; size_t image_slice_pitch; unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapImage( cmd_queue_, buffer, true, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < width_ * width_; i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } void OCLPerfImageSampleRate::checkData(cl_mem buffer) { #if 0 float* data = (float *)_wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_READ, 0, outBufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < outBufSize_/sizeof(float); i++) { if (data[i] != (float)numBufs_) { printf("Data validation failed at %d! 
Got %f, expected %f\n", i, data[i], (float)numBufs_); break; } } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); #endif } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfImageSampleRate::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; skip_ = false; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; inBuffer_ = 0; outBuffer_ = 0; // We compute a square domain width_ = sizes[test % NUM_SIZES]; numBufs_ = (1 << ((test / NUM_SIZES) % NUM_BUFS)); typeIdx_ = (test / (NUM_SIZES * NUM_BUFS)) % NUM_TYPES; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); platform = platforms[_platformIndex]; num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); delete platforms; } /* * If we could find a platform, use it. */ CHECK_RESULT(platform == 0, "Couldn't find platform with GPU devices, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; size_t size; cl_bool imageSupport_ = false; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport_), &imageSupport_, &size); if (!imageSupport_) { printf("\n%s\n", "Image not supported, skipping this test!"); skip_ = true; return; } context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); inBuffer_ = (cl_mem *)malloc(sizeof(cl_mem) * numBufs_); memset(inBuffer_, 0, sizeof(cl_mem) * numBufs_); for (unsigned int i = 0; i < numBufs_; i++) { inBuffer_[i] = _wrapper->clCreateImage2D(context_, CL_MEM_READ_ONLY, &formats[typeIdx_], width_, width_, 0, NULL, &error_); CHECK_RESULT(inBuffer_[i] == 0, "clCreateImage2D(inBuffer) failed"); } outBufSize_ = sizes[NUM_SIZES - 1] * sizes[NUM_SIZES - 1] * sizeof(cl_float4); outBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, outBufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); setKernel(); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); const char *buildOps = NULL; error_ = _wrapper->clBuildProgram(program_, 1, 
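  // For reference, a hand reconstruction (ours, not emitted by the test) of
  // the source that setKernel() generates for numBufs_ == 2:
  //
  //   kernel void sampleRate(global float4* outBuffer, unsigned int inBufSize,
  //                          unsigned int writeIt,
  //                          read_only image2d_t inBuffer0,
  //                          read_only image2d_t inBuffer1) {
  //     uint gid = get_global_id(0);
  //     uint inputIdx = gid % inBufSize;
  //     const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
  //                               CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
  //     float4 tmp = (float4)0.0f;
  //     tmp += read_imagef(inBuffer0, sampler,
  //                        (int2)(gid % inBufSize, (gid / inBufSize) % inBufSize));
  //     tmp += read_imagef(inBuffer1, sampler,
  //                        (int2)(gid % inBufSize, (gid / inBufSize) % inBufSize));
  //     if (writeIt * (unsigned int)tmp.x) outBuffer[gid] = tmp;
  //   }
  //
  // The writeIt kernel argument is set to 0 in open() and never changed, so
  // the store is never executed and the test measures image sampling alone.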
&device, buildOps, NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "sampleRate", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(outBuffer) failed"); unsigned int sizeDW = width_; error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(unsigned int), (void *)&sizeDW); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(sizeDW) failed"); unsigned int writeIt = 0; error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(unsigned int), (void *)&writeIt); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(writeIt) failed"); for (unsigned int i = 0; i < numBufs_; i++) { error_ = _wrapper->clSetKernelArg(kernel_, i + 3, sizeof(cl_mem), (void *)&inBuffer_[i]); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(inBuffer) failed"); // setData(inBuffer_[i], 0x3f800000); } // setData(outBuffer_, 0xdeadbeef); } void OCLPerfImageSampleRate::run(void) { if (skip_) { return; } int global = outBufSize_ / typeSizes[typeIdx_]; int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; unsigned int maxIter = MAX_ITERATIONS * (MAX_BUFS / numBufs_); CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < maxIter; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); } CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // checkData(outBuffer_); // Compute GB/s double perf = ((double)outBufSize_ * numBufs_ * (double)maxIter * (double)(1e-09)) / sec; char buf[256]; SNPRINTF(buf, sizeof(buf), "Domain %dx%d, %13s, %2d images,%4dx%4d (GB/s)", sizes[NUM_SIZES - 1], sizes[NUM_SIZES - 1], types[typeIdx_], numBufs_, width_, width_); _perfInfo = (float)perf; testDescString = buf; } unsigned int OCLPerfImageSampleRate::close(void) { if (skip_) { return CL_SUCCESS; } _wrapper->clFinish(cmd_queue_); if (inBuffer_) { for (unsigned int i = 0; i < numBufs_; i++) { if (inBuffer_[i]) { error_ = _wrapper->clReleaseMemObject(inBuffer_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } } free(inBuffer_); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageSampleRate.h000066400000000000000000000037521450307266000257530ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro 
Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_IMAGESAMPLERATE_H_ #define _OCL_IMAGESAMPLERATE_H_ #include "OCLTestImp.h" class OCLPerfImageSampleRate : public OCLTestImp { public: OCLPerfImageSampleRate(); virtual ~OCLPerfImageSampleRate(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void setData(cl_mem buffer, unsigned int data); void checkData(cl_mem buffer); void setKernel(void); cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem* inBuffer_; cl_mem outBuffer_; cl_int error_; unsigned int width_; unsigned int outBufWidth_; unsigned int outBufSize_; static const unsigned int MAX_ITERATIONS = 25; unsigned int numBufs_; unsigned int typeIdx_; bool skip_; }; #endif // _OCL_IMAGESAMPLERATE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageWriteSpeed.cpp000066400000000000000000000301111450307266000263110ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfImageWriteSpeed.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 4 static const unsigned int Sizes[NUM_SIZES] = {256, 512, 1024, 2048}; #define NUM_FORMATS 1 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}}; static const char *textFormats[NUM_FORMATS] = {"R8G8B8A8"}; static const unsigned int formatSize[NUM_FORMATS] = {4}; static const unsigned int Iterations[2] = {1, OCLPerfImageWriteSpeed::NUM_ITER}; OCLPerfImageWriteSpeed::OCLPerfImageWriteSpeed() { _numSubTests = NUM_SIZES * NUM_FORMATS * 2; } OCLPerfImageWriteSpeed::~OCLPerfImageWriteSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfImageWriteSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint typeOfDevice = type_; cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; outBuffer_ = 0; memptr = NULL; skip_ = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], typeOfDevice, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete platforms; } bufSize_ = Sizes[_openTest % NUM_SIZES]; bufnum_ = (_openTest / NUM_SIZES) % NUM_FORMATS; numIter = Iterations[_openTest / (NUM_SIZES * NUM_FORMATS)]; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
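   * (Concretely, "our platform" is platforms[_platformIndex], selected above;
   * "die" means the CHECK_RESULT(platform == 0, ...) just below fails the
   * subtest and returns, rather than aborting the process.)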
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, typeOfDevice, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; size_t size; cl_bool imageSupport_ = false; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport_), &imageSupport_, &size); if (!imageSupport_) { printf("\n%s\n", "Image not supported, skipping this test!"); skip_ = true; return; } context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_WRITE_ONLY; outBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSize_, bufSize_, 0, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateImage(outBuffer) failed"); memptr = new char[bufSize_ * bufSize_ * formatSize[bufnum_]]; } void OCLPerfImageWriteSpeed::run(void) { if (skip_) { return; } CPerfCounter timer; size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSize_, bufSize_, 1}; // Warm up error_ = _wrapper->clEnqueueWriteImage(cmd_queue_, outBuffer_, CL_TRUE, origin, region, 0, 0, memptr, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadImage failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { error_ = _wrapper->clEnqueueWriteImage(cmd_queue_, outBuffer_, CL_TRUE, origin, region, 0, 0, memptr, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadImage failed"); } timer.Stop(); double sec = timer.GetElapsedTime(); // Image write bandwidth in GB/s double perf = ((double)bufSize_ * bufSize_ * formatSize[bufnum_] * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%4dx%4d) fmt:%s i: %4d (GB/s) ", bufSize_, bufSize_, textFormats[bufnum_], numIter); testDescString = buf; } unsigned int OCLPerfImageWriteSpeed::close(void) { if(skip_) { return CL_SUCCESS; } if (memptr) { delete memptr; } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } OCLPerfPinnedImageWriteSpeed::OCLPerfPinnedImageWriteSpeed() { _numSubTests = NUM_SIZES * NUM_FORMATS * 2; } OCLPerfPinnedImageWriteSpeed::~OCLPerfPinnedImageWriteSpeed() {} void OCLPerfPinnedImageWriteSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint typeOfDevice = type_; cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; outBuffer_ = 0; memptr = NULL; skip_ = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs 
failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], typeOfDevice, 0, NULL, &num_devices); delete platforms; } bufSize_ = Sizes[_openTest % NUM_SIZES]; bufnum_ = (_openTest / NUM_SIZES) % NUM_FORMATS; numIter = Iterations[_openTest / (NUM_SIZES * NUM_FORMATS)]; CHECK_RESULT(platform == 0, "Couldn't find platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, typeOfDevice, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; size_t size; cl_bool imageSupport_ = false; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport_), &imageSupport_, &size); if (!imageSupport_) { printf("\n%s\n", "Image not supported, skipping this test!"); skip_ = true; return; } context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR; inBuffer_ = _wrapper->clCreateBuffer( context_, flags, bufSize_ * bufSize_ * formatSize[bufnum_], NULL, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateBuffer(inBuffer) failed"); flags = CL_MEM_WRITE_ONLY; outBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSize_, bufSize_, 0, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateImage(outBuffer) failed"); memptr = (char *)_wrapper->clEnqueueMapBuffer( cmd_queue_, inBuffer_, CL_TRUE, CL_MAP_WRITE, 0, bufSize_ * bufSize_ * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); } unsigned int OCLPerfPinnedImageWriteSpeed::close(void) { if (skip_) { return CL_SUCCESS; } if (memptr) { error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, inBuffer_, memptr, 0, NULL, NULL); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clEnqueueUnmapMemObject(inBuffer_) failed"); clFinish(cmd_queue_); } if (inBuffer_) { error_ = _wrapper->clReleaseMemObject(inBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfImageWriteSpeed.h000066400000000000000000000041061450307266000257630ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ImageWriteSpeed_H_ #define _OCL_ImageWriteSpeed_H_ #include "OCLTestImp.h" class OCLPerfImageWriteSpeed : public OCLTestImp { public: OCLPerfImageWriteSpeed(); virtual ~OCLPerfImageWriteSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 100; cl_context context_; cl_command_queue cmd_queue_; cl_mem outBuffer_; cl_int error_; unsigned int bufSize_; unsigned int bufnum_; unsigned int numIter; char* memptr; bool skip_; }; class OCLPerfPinnedImageWriteSpeed : public OCLPerfImageWriteSpeed { public: OCLPerfPinnedImageWriteSpeed(); virtual ~OCLPerfPinnedImageWriteSpeed(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual unsigned int close(void); cl_mem inBuffer_; }; #endif // _OCL_ImageWriteSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfKernelArguments.cpp000066400000000000000000000217261450307266000264150ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfKernelArguments.h" #include #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" static const size_t BufSize = 0x1000; static const size_t Iterations = 0x10000; static const size_t TotalQueues = 4; static const size_t NumBufCnts = 4; static const size_t TotalArgs = 4; #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif static const char* Arguments[TotalArgs] = { "__global uint* out", "__global uint* out, __global uint* buf0, __global uint* buf1, __global " "uint* buf2, __global uint* buf3", "__global uint* out, __global uint* buf0, __global uint* buf1, __global " "uint* buf2, __global uint* buf3, \n" "__global uint* buf4, __global uint* buf5, __global uint* buf6, __global " "uint* buf7, __global uint* buf8", "__global uint* out, __global uint* buf0, __global uint* buf1, __global " "uint* buf2, __global uint* buf3,\n" "__global uint* buf4, __global uint* buf5, __global uint* buf6, __global " "uint* buf7, __global uint* buf8,\n" "__global uint* buf9, __global uint* buf10, __global uint* buf11, __global " "uint* buf12, __global uint* buf13,\n" "__global uint* buf14, __global uint* buf15, __global uint* buf16, " "__global uint* buf17, __global uint* buf18"}; static const char* strKernel = "__kernel void dummy(%s) \n" "{ \n" " uint id = get_global_id(0); \n" " uint value = 1; \n" " out[id] = value; \n" "} \n"; OCLPerfKernelArguments::OCLPerfKernelArguments() { _numSubTests = TotalQueues * TotalArgs * NumBufCnts * 2; failed_ = false; } OCLPerfKernelArguments::~OCLPerfKernelArguments() {} void OCLPerfKernelArguments::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { cl_mem buffer; _deviceId = deviceId; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); test_ = test; cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } perBatch_ = test >= (TotalQueues * TotalArgs * NumBufCnts); size_t numArguments = (test_ / TotalQueues) % TotalArgs; char* program = new char[4096]; SNPRINTF(program, sizeof(char) * 4096, strKernel, Arguments[numArguments]); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char**)&program, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "dummy", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); delete[] program; static const size_t NumBuffs[NumBufCnts] = {0x20, 0x100, 0x800, 0x2000}; size_t numMems = NumBuffs[(test_ / (TotalQueues * TotalArgs)) % NumBufCnts]; size_t bufSize = BufSize * sizeof(cl_int4); for (size_t b = 0; b < numMems; ++b) { buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, bufSize, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } } static void CL_CALLBACK notify_callback(const char* 
errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLPerfKernelArguments::run(void) { if (failed_) { return; } unsigned int* values; values = reinterpret_cast(new cl_int4[BufSize]); CPerfCounter timer; static const size_t Queues[] = {1, 2, 4, 8}; size_t numQueues = Queues[test_ % TotalQueues]; cl_uint numArguments; _wrapper->clGetKernelInfo(kernel_, CL_KERNEL_NUM_ARGS, sizeof(cl_uint), &numArguments, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clGetKernelInfo() failed"); // Clear destination buffer memset(values, 0, BufSize * sizeof(cl_int4)); size_t iter = Iterations / numQueues / buffers_.size(); iter = (iter == 0) ? 1 : iter; std::vector cmdQueues(numQueues); for (size_t q = 0; q < numQueues; ++q) { cl_command_queue cmdQueue = _wrapper->clCreateCommandQueue( context_, devices_[_deviceId], 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueue() failed"); cmdQueues[q] = cmdQueue; } // Warm-up for (size_t b = 0; b < (buffers_.size() / numArguments); ++b) { for (size_t q = 0; q < numQueues; ++q) { for (cl_uint a = 0; a < numArguments; ++a) { cl_mem buffer = buffers()[(b * numArguments + a) % buffers_.size()]; error_ = _wrapper->clSetKernelArg(kernel_, a, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); } size_t gws[1] = {256}; size_t lws[1] = {256}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues[q], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } } for (size_t q = 0; q < numQueues; ++q) { _wrapper->clFinish(cmdQueues[q]); } size_t disp = 0; timer.Reset(); timer.Start(); for (size_t i = 0; i < iter; ++i) { for (size_t b = 0; b < buffers_.size(); ++b) { for (size_t q = 0; q < numQueues; ++q) { for (cl_uint a = 0; a < numArguments; ++a) { cl_mem buffer = buffers()[(b * numArguments + a) % buffers_.size()]; error_ = _wrapper->clSetKernelArg(kernel_, a, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); } size_t gws[1] = {256}; size_t lws[1] = {256}; error_ = _wrapper->clEnqueueNDRangeKernel( cmdQueues[q], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); disp++; if (perBatch_) { _wrapper->clFlush(cmdQueues[q]); } } if (perBatch_) { for (size_t q = 0; q < numQueues; ++q) { _wrapper->clFinish(cmdQueues[q]); } } } } for (size_t q = 0; q < numQueues; ++q) { _wrapper->clFinish(cmdQueues[q]); } timer.Stop(); for (size_t q = 0; q < numQueues; ++q) { error_ = _wrapper->clReleaseCommandQueue(cmdQueues[q]); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseCommandQueue() failed"); } std::stringstream stream; if (perBatch_) stream << "Time per batch (us) for " << numQueues << " queues, "; else stream << "Time per dispatch (us) for " << numQueues << " queues, "; stream.flags(std::ios::right | std::ios::showbase); stream.width(2); stream << numArguments; stream << " args, "; stream.flags(std::ios::right | std::ios::showbase); stream.width(4); stream << buffers_.size() << " bufs"; testDescString = stream.str(); _perfInfo = static_cast(timer.GetElapsedTime() * 1000000 / disp); delete[] values; } unsigned int OCLPerfKernelArguments::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfKernelArguments.h000066400000000000000000000031341450307266000260530ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PERF_KERNEL_ARGUMENTS_H_ #define _OCL_PERF_KERNEL_ARGUMENTS_H_ #include "OCLTestImp.h" class OCLPerfKernelArguments : public OCLTestImp { public: OCLPerfKernelArguments(); virtual ~OCLPerfKernelArguments(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; unsigned int test_; bool perBatch_; }; #endif // _OCL_PERF_KERNEL_ARGUMENTS_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfKernelThroughput.cpp000066400000000000000000001116361450307266000266210ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfKernelThroughput.h" #include #include #include #include #include #include "CL/cl.h" #include "Timer.h" #define DO_GPU_KERNELS 1 #if 0 #define ENTER(X) printf("Entering %s\n", X); #define EXIT(X) printf("Exiting %s\n", X); #define PKT(X) X #else #define ENTER(X) #define EXIT(X) #define PKT(X) #endif // work with multiples of 128 #define ROUND_MULT(VAL, MULT) ((VAL / MULT) * MULT) /* int roundUp( int numToRound, int multiple) { int r = numToRound % multiple; if (r == 0) { return numToRound; } else { return numToRound + multiple - remainder; } } */ // quiety warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define WORK_GROUP_SIZE 256 /******************************************************************************* * Enumerated Types for Tests ******************************************************************************/ // memory operations const LARGE_INT numKernelTypes = 2; static const char *kernelType[numKernelTypes] = {"MatMul", "Madds"}; // source/read memory locations const LARGE_INT numMemPaths = 2; static const char *memPath[numMemPaths] = {"Host", "Device"}; // buffer size const LARGE_INT numNumElements = 12; // 15; static const LARGE_INT numElements[numNumElements] = { 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216 //, // 67108864, // 268435456 }; // flops/byte const LARGE_INT numWorkSizes = 5; static const LARGE_INT workSize[numWorkSizes] = {1, 4, 16, 64, 256}; const float initFloat = 0.001f; const float zeroFloat = 0.0f; #define WORK_GROUP_SIZE 256 /******************************************************************************* * Write the Matrix Multiply Shader Kernel ******************************************************************************/ void OCLPerfKernelThroughput::genShaderMatrixMultiply() { ENTER("genShaderMatrixMultiply"); std::stringstream ss; ss.clear(); #if 0 printf("%ix%i * %ix%i = %ix%i:\n", matrixDim1_, matrixDim2_, matrixDim2_, matrixDim1_, matrixDim1_, matrixDim1_ ); #endif ss << "#define BLOCK_SIZE 16\n" "#define HA " << matrixDim1_ << "\n" "#define WA " << matrixDim2_ << "\n" "#define HB WA\n" "#define WB HA\n" "#define HC HA\n" "#define WC WB\n" "__kernel void\n" "__attribute__((reqd_work_group_size(16,16,1)))\n" "kernel1(\n" " __global float * restrict C,\n" " __global float * restrict A,\n" " __global float * restrict B )\n" "{\n" " int bx = get_group_id(0);\n" " int by = get_group_id(1);\n" " int tx = get_local_id(0);\n" " int ty = get_local_id(1);\n" " int aBegin = WA * BLOCK_SIZE * by;\n" " int aEnd = aBegin + WA - 1;\n" " int aStep = BLOCK_SIZE;\n" " int bBegin = BLOCK_SIZE * bx;\n" " int bStep = BLOCK_SIZE * WB;\n" " __private float c = 0.f;\n" " __local float localA[BLOCK_SIZE][BLOCK_SIZE];\n" " __local float localB[BLOCK_SIZE][BLOCK_SIZE];\n" " for (\n" " int a = aBegin, b = bBegin;\n" " a <= aEnd;\n" " a += aStep, b += bStep)\n" " {\n" " localA[ty][tx] = (get_global_id(0) < WA && get_global_id(1) < " "HA) ? A[a + WA * ty + tx] : 0;\n" " localB[ty][tx] = (get_global_id(0) < WB && get_global_id(1) < " "HB) ? 
B[b + WB * ty + tx] : 0;\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " for (int k = 0; k < BLOCK_SIZE; ++k)\n" " c += localA[ty][k] * localB[k][tx];\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " }\n" " int cIdx = WB * BLOCK_SIZE * by + BLOCK_SIZE * bx + WB * ty + tx;\n" " if (get_global_id(0) < WC && get_global_id(1) < WC)\n" " {\n" " C[cIdx] = c;\n" " }\n" "}\n"; shader_ = ss.str(); gold_ = 0.f; for (int i = 0; i < matrixDim2_; i++) gold_ += initFloat * initFloat; // gold_ = initFloat * initFloat * matrixDim2_; // printf("shader:\n%s\n", shader_.c_str()); // printf("gold_: %f\n", gold_); EXIT("genShaderMatrixMultiply"); } /******************************************************************************* * Write the Madds Shader Kernel ******************************************************************************/ void OCLPerfKernelThroughput::genShaderMadds() { ENTER("genShaderMadds"); int flopLoopIter = 2 * (flopsPerByte_ * 4 * 4) / 16; // bytes, flops std::stringstream ss; ss.clear(); float a, b; ss << // begin kernel "__kernel void\n" "__attribute__((reqd_work_group_size(" << 256 << ",1,1)))\n" "kernel1(\n" " __global float4 * restrict input,\n" " __global float4 * restrict output )\n" "{\n"; // begin loop ss << " for ( uint idx = get_global_id(0);\n" " idx < " << numElements[numElementsIdx_] << ";\n" " idx += get_global_size(0) )\n" " {\n"; // do load ss << " float4 prefetch = input[ idx ];\n" " float a0 = prefetch.x;\n" " float a1 = prefetch.y;\n" " float a2 = prefetch.z;\n" " float a3 = prefetch.w;\n" " float b0 = a0;\n" " float b1 = a1;\n" " float b2 = a2;\n" " float b3 = a3;\n"; a = initFloat; b = a; // do math for (int i = 0; i < flopLoopIter; i++) { ss << " a0 += b3*b1;\n" " a1 += b0*b2;\n" " a2 += b1*b3;\n" " a3 += b2*b0;\n" " b0 += a3*a1;\n" " b1 += a0*a2;\n" " b2 += a1*a3;\n" " b3 += a2*a0;\n"; // printf("a += b*b; %f += %f*%f\n", a, b, b); a += b * b; // printf("b += a*a; %f += %f*%f\n", b, a, a); b += a * a; } // do write or accumulate ss << " __private float4 tmp;\n" " tmp.x = b0;\n" " tmp.y = b1;\n" " tmp.z = b2;\n" " tmp.w = b3;\n" " output[ idx ] = tmp;\n"; gold_ = b; // printf("GPU gold_ Tmp: %f\n", gold_); // end loop ss << " } // end loop\n"; // end kernel ss << " } // end kernel\n\n"; shader_ = ss.str(); // printf("shader:\n%s\n", shader_.c_str()); // printf("gold_: %f\n", gold_); EXIT("genShaderMadds"); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} /******************************************************************************* * Constructor ******************************************************************************/ OCLPerfKernelThroughput::OCLPerfKernelThroughput() { ENTER("constructor"); _numSubTests = numKernelTypes * numMemPaths * numNumElements * numWorkSizes; cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; context_ = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); // Get last for default platform = platforms[numPlatforms - 1]; for (unsigned i = 0; i < numPlatforms; ++i) { char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number 
of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[i], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present // instead of just returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { // printf("NumDevices: %i\n", num_devices); platform = platforms[i]; break; } } delete platforms; } /* * If we could find our platform, use it, else die. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; // get gpu speed error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(maxClockFrequency_), &maxClockFrequency_, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(maxComputeUnits_), &maxComputeUnits_, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (maxComputeUnits_ > 8) { // printf("%i CUs reported; assuming 8 instead.", maxComputeUnits_); maxComputeUnits_ = 8; } // printf("Compute Units: %i\n", maxComputeUnits_); // printf("Subtests: %i\n", _numSubTests); // create context context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } cl_uint tmp; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(tmp), &tmp, NULL); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); // printf("NumComputeUnits: %u\n", tmp); maxComputeUnits_ = static_cast(tmp); // printf("NumComputeUnits: %lld\n", maxComputeUnits_); EXIT("constructor"); } OCLPerfKernelThroughput::~OCLPerfKernelThroughput() {} /******************************************************************************* * Open - initializes test, compile GPU kernel ******************************************************************************/ void OCLPerfKernelThroughput::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { ENTER("open"); /*********************************************************** * select subtest **********************************************************/ int testIdx = test + numKernelTypes * numMemPaths * numNumElements * numWorkSizes; memPathIdx_ = testIdx % numMemPaths; testIdx /= numMemPaths; numElementsIdx_ = testIdx % numNumElements; testIdx /= numNumElements; workSizeIdx_ = testIdx % numWorkSizes; testIdx /= numWorkSizes; kernelTypeIdx_ = testIdx % numKernelTypes; testIdx /= numKernelTypes; // float md1; // kernel values switch (kernelTypeIdx_) { case 0: // Matrix Multiply // md1 = sqrt(1.f*numElements[numElementsIdx_]); // printf("MD1: sqrt(%f) = %f\n", 1.f*numElements[numElementsIdx_],md1); matrixDim1_ = static_cast(sqrt(1.f * numElements[numElementsIdx_])); matrixDim2_ = 
matrixDim1_ * (int)workSize[workSizeIdx_]; genShaderMatrixMultiply(); work_dim_ = 2; global_work_size_ = new size_t[work_dim_]; global_work_size_[0] = ((matrixDim1_ - 1) / 16 + 1) * 16; // matrixDim1_ < 16 ? 16 : matrixDim1_; global_work_size_[1] = global_work_size_[0]; local_work_size_ = new size_t[work_dim_]; local_work_size_[0] = 16; local_work_size_[1] = local_work_size_[0]; /* printf("Global: %ix%i; Local: %ix%i; Matrix: %ix%i\n", global_work_size_[0], global_work_size_[1], local_work_size_[0], local_work_size_[1], matrixDim1_, matrixDim2_ ); */ input1BufferSize_ = static_cast(matrixDim1_ * matrixDim2_ * sizeof(float)); input2BufferSize_ = static_cast(matrixDim2_ * matrixDim1_ * sizeof(float)); output1BufferSize_ = static_cast(matrixDim1_ * matrixDim1_ * sizeof(float)); _reqDataSize = (1.0 * matrixDim1_ * matrixDim2_ * sizeof(float)) + (1.0 * matrixDim2_ * matrixDim1_ * sizeof(float)) + (1.0 * matrixDim1_ * matrixDim1_ * sizeof(float)); break; case 1: // Flops/Byte flopsPerByte_ = (int)workSize[workSizeIdx_]; // for kernelType == 0 genShaderMadds(); numWorkGroupsPerComputeUnit_ = 32; // TODO numThreads_ = numWorkGroupsPerComputeUnit_ * maxComputeUnits_ * WORK_GROUP_SIZE; work_dim_ = 1; global_work_size_ = new size_t[work_dim_]; local_work_size_ = new size_t[work_dim_]; global_work_size_[0] = numThreads_; local_work_size_[0] = WORK_GROUP_SIZE; input1BufferSize_ = static_cast(numElements[numElementsIdx_] * sizeof(float4)); input2BufferSize_ = 0; output1BufferSize_ = static_cast(numElements[numElementsIdx_] * sizeof(float4)); _reqDataSize = 2.0 * numElements[numElementsIdx_] * sizeof(float4); break; } PKT(printf("Test Parameters:\n" "\tkernelTypeIdx: %i\n" "\tmemPathIdx: %i\n" "\tnumElementsIdx: %i\n" "\tworkSizeIdx: %i\n" "\n\n", kernelTypeIdx_, memPathIdx_, numElementsIdx_, workSizeIdx_);) /*********************************************************** * get context and queue **********************************************************/ cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0; _deviceId = deviceId; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; input1Buffer_ = 0; output1Buffer_ = 0; _errorFlag = false; // Reset error code so a single error // doesn't prevent other subtests from running _errorMsg = ""; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present // instead of just returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); delete platforms; } /* * If we could find our platform, use it, else die. 
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* * Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); device = devices[0]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, CL_QUEUE_PROFILING_ENABLE, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); // Global memory size cl_ulong _maxMemoryAllocationSize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(cl_ulong), &_maxMemoryAllocationSize, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs(CL_DEVICE_GLOBAL_MEM_SIZE) failed"); #if 0 printf("Buffer Sizes: %i %i %i = %f\n", input1BufferSize_, input2BufferSize_, output1BufferSize_, _reqDataSize); #endif _dataSizeTooBig = (_reqDataSize > _maxMemoryAllocationSize); if (_dataSizeTooBig) { // printf("DATA TOO LARGE FOR DEVICE !!!"); return; } // create kernel char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); std::string args; args.clear(); error_ = _wrapper->clBuildProgram(program_, 1, &device, args.c_str(), NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "kernel1", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); /*********************************************************** * Allocate GPU Memory **********************************************************/ cl_mem_flags inputBufferFlags = 0; cl_mem_flags outputBufferFlags = 0; // choose gpu source buffer type switch (memPathIdx_) { case 0: // host memory // printf("Allocating Host Memories\n"); // allocate "device" memory inputBufferFlags = CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR; outputBufferFlags = CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR; input1Buffer_ = _wrapper->clCreateBuffer( context_, inputBufferFlags, input1BufferSize_, NULL, &error_); CHECK_RESULT(input1Buffer_ == 0, "clCreateBuffer Input failed"); if (input1Buffer_ == 0) printf("Error: %i\n", error_); if (input2BufferSize_) { input2Buffer_ = _wrapper->clCreateBuffer( context_, inputBufferFlags, input2BufferSize_, NULL, &error_); CHECK_RESULT(input2Buffer_ == 0, "clCreateBuffer Input failed"); } output1Buffer_ = _wrapper->clCreateBuffer( context_, outputBufferFlags, output1BufferSize_, NULL, &error_); CHECK_RESULT(output1Buffer_ == 0, "clCreateBuffer Input failed"); if (output1Buffer_ == 0) printf("Error: %i\n", error_); // map host memory input1Ptr_ = (float *)_wrapper->clEnqueueMapBuffer( cmd_queue_, input1Buffer_, true, CL_MAP_WRITE, 0, input1BufferSize_, 0, NULL, NULL, &error_); if (input2BufferSize_) { input2Ptr_ = (float *)_wrapper->clEnqueueMapBuffer( cmd_queue_, input2Buffer_, true, CL_MAP_WRITE, 0, input2BufferSize_, 0, NULL, NULL, &error_); } output1Ptr_ = (float *)_wrapper->clEnqueueMapBuffer( cmd_queue_, output1Buffer_, true, CL_MAP_READ, 0, output1BufferSize_, 0, NULL, NULL, &error_); 
_wrapper->clFinish(cmd_queue_); break; case 1: // device memory // printf("Allocating Device Memories\n"); // allocate device memory inputBufferFlags = CL_MEM_READ_WRITE; outputBufferFlags = CL_MEM_READ_WRITE; input1Buffer_ = _wrapper->clCreateBuffer( context_, inputBufferFlags, input1BufferSize_, NULL, &error_); CHECK_RESULT(input1Buffer_ == 0, "clCreateBuffer Input failed"); if (input2BufferSize_) { input2Buffer_ = _wrapper->clCreateBuffer( context_, inputBufferFlags, input2BufferSize_, NULL, &error_); CHECK_RESULT(input2Buffer_ == 0, "clCreateBuffer Input failed"); } output1Buffer_ = _wrapper->clCreateBuffer( context_, outputBufferFlags, output1BufferSize_, NULL, &error_); CHECK_RESULT(output1Buffer_ == 0, "clCreateBuffer Input failed"); // printf("\tDone Allocating Device Memory\n"); // allocate host memory input1Ptr_ = new float[input1BufferSize_ / sizeof(float)]; if (input2BufferSize_) { input2Ptr_ = new float[input2BufferSize_ / sizeof(float)]; } output1Ptr_ = new float[output1BufferSize_ / sizeof(float)]; // printf("\tDone Allocating Host Memory\n"); break; default: CHECK_RESULT(1, "Invalid Memory Path Idx"); // invalid } for (unsigned int i = 0; i < input1BufferSize_ / sizeof(float); i++) { input1Ptr_[i] = initFloat; } for (unsigned int i = 0; i < input2BufferSize_ / sizeof(float); i++) { input2Ptr_[i] = initFloat; } for (unsigned int i = 0; i < output1BufferSize_ / sizeof(float); i++) { output1Ptr_[i] = zeroFloat; } #if 0 printf("Allocating GPU: %.0fMB, %.0fMB\n", static_cast(1.f*input1BufferSize_/1024.f/1024.f), static_cast(1.f*output1BufferSize_/1024.f/1024.f)); input1Buffer_ = _wrapper->clCreateBuffer( context_, inputBufferFlags, input1BufferSize_, NULL, &error_); CHECK_RESULT(input1Buffer_ == 0, "clCreateBuffer Input failed"); output1Buffer_ = _wrapper->clCreateBuffer( context_, outputBufferFlags, output1BufferSize_, NULL, &error_); CHECK_RESULT(output1Buffer_ == 0, "clCreateBuffer Output failed"); error_ = /*_wrapper->*/clEnqueueFillBuffer( cmd_queue_, input1Buffer_, &initFloat, sizeof(initFloat), 0, input1BufferSize_, 0, NULL, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueFillBuffer failed"); error_ = /*_wrapper->*/clEnqueueFillBuffer( cmd_queue_, output1Buffer_, &zeroFloat, sizeof(zeroFloat), 0, output1BufferSize_, 0, NULL, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueFillBuffer failed"); /*********************************************************** * Set Kernel Args **********************************************************/ error_ = _wrapper->clSetKernelArg( kernel_, 0, sizeof(input1Buffer_), (void *) &input1Buffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg failed"); error_ = _wrapper->clSetKernelArg( kernel_, 1, sizeof(output1Buffer_), (void *) &output1Buffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg failed"); #endif EXIT("open"); } /******************************************************************************* * Run - execute full test once and return performance ******************************************************************************/ void OCLPerfKernelThroughput::run(void) { ENTER("run"); CPerfCounter timer; if (!_dataSizeTooBig) { // set kernel args #if 1 switch (kernelTypeIdx_) { case 0: // Matrix Multiply error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(output1Buffer_), (void *)&output1Buffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg failed"); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(input1Buffer_), (void *)&input1Buffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg failed"); error_ = 
_wrapper->clSetKernelArg(kernel_, 2, sizeof(input2Buffer_), (void *)&input2Buffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg failed"); break; case 1: // Flops/Byte error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(input1Buffer_), (void *)&input1Buffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg failed"); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(output1Buffer_), (void *)&output1Buffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg failed"); break; } #endif launchKernel(); timer.Reset(); timer.Start(); for (int i = 0; i < MAX_LOOP_ITER; i++) { launchKernel(); } timer.Stop(); } // data not too large double totalSec = _dataSizeTooBig ? 1 : timer.GetElapsedTime(); // printf("Total Time: %f seconds\n", totalSec); // printf("Average Kernel Time: %f seconds\n", totalSec / MAX_LOOP_ITER); // analyze performance avgKernelTime_ = (float)(totalSec / MAX_LOOP_ITER * 1000000); // microseconds double flopCount; switch (kernelTypeIdx_) { case 0: // Matrix Multiply flopCount = (2.0 * matrixDim1_ * matrixDim1_ * matrixDim2_); // printf("FlopCount = 2*%i*%i*%i=%f\n", // matrixDim1_,matrixDim1_,matrixDim2_,flopCount); bandwidth_ = (float)(1.f * _reqDataSize / 1024.f / 1024.f / 1024.f) * 1000000.f / avgKernelTime_; // GB/s gflops_ = (float)(1000000.f * flopCount / avgKernelTime_ / 1000000000.0); break; case 1: // Madds flopCount = _reqDataSize * flopsPerByte_; bandwidth_ = (float)(1.f * _reqDataSize / 1024.f / 1024.f / 1024.f) * 1000000.f / avgKernelTime_; // GB/s gflops_ = bandwidth_ * flopsPerByte_; break; } if (_dataSizeTooBig) { printf("REQUESTED DATA SIZE EXCEEDS GLOBAL MEMORY !!!\n"); bandwidth_ = 0; gflops_ = 0; avgKernelTime_ = 0; } // here print out details char buf[512]; int bytesWritten; bytesWritten = SNPRINTF( buf, sizeof(buf), "Kernel:%7s; " "Work:%4i; " "Buff:%11.0f; " "Path:%7s; " "%10.5e GB/s; " "%10.5e GFlop/s; ", kernelType[kernelTypeIdx_], static_cast(workSize[workSizeIdx_]), _reqDataSize, memPath[memPathIdx_], bandwidth_, gflops_); testDescString = buf; _perfInfo = avgKernelTime_; if (!_dataSizeTooBig) checkData(); EXIT("run"); } void OCLPerfKernelThroughput::launchKernel(void) { ENTER("launchKernel") /*********************************************************** * Copy Data To **********************************************************/ // printf("Copying Data To Device\n"); switch (memPathIdx_) { case 0: // zero copy // do nothing // void *inputPtr = _wrapper->clEnqueueMapBuffer( // cmd_queue_, input1Buffer_, true, CL_MAP_READ, // 0, input1BufferSize_, 0, NULL, NULL, &error_); // void *outputPtr = _wrapper->clEnqueueMapBuffer( // cmd_queue_, output1Buffer_, true, CL_MAP_READ, // 0, output1BufferSize_, 0, NULL, NULL, &error_); //_wrapper->clFinish(cmd_queue_); break; case 1: // explicit copy to device memory // printf("Queue: %p\n", &cmd_queue_); // printf("devBuffer: %i\n", input1Buffer_); // printf("hstBuffer: %p\n", input1Ptr_); // printf("bufSize: %i\n", input1BufferSize_); error_ = _wrapper->clEnqueueWriteBuffer( cmd_queue_, input1Buffer_, true, 0, input1BufferSize_, (const void *)input1Ptr_, 0, NULL, NULL); if (input2BufferSize_) { error_ = _wrapper->clEnqueueWriteBuffer( cmd_queue_, input2Buffer_, true, 0, input2BufferSize_, (const void *)input2Ptr_, 0, NULL, NULL); } // printf("Error: %i\n", error_); std::fflush(stdout); _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_ != CL_SUCCESS, "clWriteBuffer failed"); //_error = _wrapper->clEnqueueWriteBuffer( // cmd_queue_, output1Buffer_, true, 0, output1BufferSize_, // (const void *)output1Ptr_, 
0, NULL, NULL ); // CHECK_RESULT(error_ != CL_SUCCESS, "clWriteBuffer failed"); break; } /*********************************************************** * Set Kernel Args **********************************************************/ #if 0 error_ = _wrapper->clSetKernelArg( kernel_, 0, sizeof(input1Buffer_), (void *) &input1Buffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg failed"); error_ = _wrapper->clSetKernelArg( kernel_, 1, sizeof(output1Buffer_), (void *) &output1Buffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg failed"); #endif // printf("Launching Kernel: %ix%i threads\n", global_work_size_[0], // local_work_size_[0]); /*********************************************************** * Launch Kernel **********************************************************/ error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, work_dim_, NULL, (const size_t *)global_work_size_, (const size_t *)local_work_size_, 0, NULL, NULL); // printf("Error: %i\n", error_); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); /*********************************************************** * Copy Data From **********************************************************/ // printf("Copying Data From Device\n"); switch (memPathIdx_) { case 0: // zero copy // do nothing // void *inputPtr = _wrapper->clEnqueueMapBuffer( // cmd_queue_, input1Buffer_, true, CL_MAP_READ, // 0, input1BufferSize_, 0, NULL, NULL, &error_); // void *outputPtr = _wrapper->clEnqueueMapBuffer( // cmd_queue_, output1Buffer_, true, CL_MAP_READ, // 0, output1BufferSize_, 0, NULL, NULL, &error_); //_wrapper->clFinish(cmd_queue_); break; case 1: // explicit copy to device memory //_error = _wrapper->clEnqueueReadBuffer( // cmd_queue_, input1Buffer_, true, 0, input1BufferSize_, // (void *)input1Ptr_, 0, NULL, NULL ); // CHECK_RESULT(error_ != CL_SUCCESS, "clWriteBuffer failed"); // printf("VAL0 %p error_ = _wrapper->clEnqueueReadBuffer( cmd_queue_, output1Buffer_, true, 0, output1BufferSize_, (void *)output1Ptr_, 0, NULL, NULL); // printf("Error: %i\n", error_); CHECK_RESULT(error_ != CL_SUCCESS, "clWriteBuffer failed"); break; } EXIT("launchKernel") } /******************************************************************************* * Check Data ******************************************************************************/ void OCLPerfKernelThroughput::checkData() { _wrapper->clFinish(cmd_queue_); float errorThreshhold = 0.00001f; float eqMax = gold_ + errorThreshhold * gold_; float eqMin = gold_ - errorThreshhold * gold_; /* printf("%ix%i * %ix%i = %ix%i:\n", matrixDim1_, matrixDim2_, matrixDim2_, matrixDim1_, matrixDim1_, matrixDim1_ ); */ for (unsigned int i = 0; i < output1BufferSize_ / sizeof(float); i++) { float value = output1Ptr_[i]; bool equal = (value > eqMin && value < eqMax); if (!equal) { #if 0 printf("Output[%i] = %.6e; gold_ = %.6e; %s\n", i, value, gold_, equal ? 
"Equal" : "NOT Equal"); #endif // printf("FAILURE\n"); // CHECK_RESULT_NO_RETURN(1, "Data validation failed!\n"); _errorFlag = true; break; } else { // printf("M[%i] = %.6e\n", i, output1Ptr_[i]); } } } /******************************************************************************* * Close - delete all data and release opencl objects ******************************************************************************/ unsigned int OCLPerfKernelThroughput::close(void) { ENTER("close"); _wrapper->clFinish(cmd_queue_); if (global_work_size_) { delete[] global_work_size_; global_work_size_ = NULL; } if (local_work_size_) { delete[] local_work_size_; local_work_size_ = NULL; } // switch for memory type switch (memPathIdx_) { case 0: // zero copy // unmap ptr if (input1Ptr_) { error_ = /*_wrapper->*/ clEnqueueUnmapMemObject( cmd_queue_, input1Buffer_, input1Ptr_, 0, NULL, NULL); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clEnqueueUnmapMemObject(input_) failed"); _wrapper->clFinish(cmd_queue_); error_ = _wrapper->clReleaseMemObject(input1Buffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(input1Buffer_) failed"); input1Buffer_ = 0; } if (input2Ptr_) { error_ = /*_wrapper->*/ clEnqueueUnmapMemObject( cmd_queue_, input2Buffer_, input2Ptr_, 0, NULL, NULL); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clEnqueueUnmapMemObject(input_) failed"); _wrapper->clFinish(cmd_queue_); error_ = _wrapper->clReleaseMemObject(input2Buffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(input2Buffer_) failed"); input2Buffer_ = 0; } if (output1Ptr_) { error_ = /*_wrapper->*/ clEnqueueUnmapMemObject( cmd_queue_, output1Buffer_, output1Ptr_, 0, NULL, NULL); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clEnqueueUnmapMemObject(output_) failed"); _wrapper->clFinish(cmd_queue_); error_ = _wrapper->clReleaseMemObject(output1Buffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(input1Buffer_) failed"); output1Buffer_ = 0; } break; case 1: // explicit copy to device memory // release object if (input1Buffer_) { error_ = _wrapper->clReleaseMemObject(input1Buffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(input1Buffer_) failed"); input1Buffer_ = 0; } if (input2Buffer_) { error_ = _wrapper->clReleaseMemObject(input2Buffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(input2Buffer_) failed"); input2Buffer_ = 0; } if (output1Buffer_) { error_ = _wrapper->clReleaseMemObject(output1Buffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(input1Buffer_) failed"); output1Buffer_ = 0; } if (input1Ptr_) { delete[] input1Ptr_; input1Ptr_ = 0; } if (input2Ptr_) { delete[] input2Ptr_; input2Ptr_ = 0; } if (output1Ptr_) { delete[] output1Ptr_; output1Ptr_ = 0; } break; } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); kernel_ = 0; } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); program_ = 0; } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); cmd_queue_ = 0; } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); context_ = 0; } _wrapper->clFinish(cmd_queue_); EXIT("close"); return _crcword; } 
clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfKernelThroughput.h000066400000000000000000000065131450307266000262630ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ /******************************************************************************* * Kernel Throughput * * * * * * ******************************************************************************/ #ifndef _OCL_KernelThroughput_H_ #define _OCL_KernelThroughput_H_ #ifdef WIN32 #include "xmmintrin.h" #endif #include "OCLTestImp.h" //#include //#define WIN32_LEAN_AND_MEAN //Restricts windows.h to include only the core //API. #include "windows.h" #undef Yield #include #include // #include #include #define LARGE_INT long long #define UNSIGNED_LARGE_INT unsigned long long #define MAX_LOOP_ITER 10 typedef cl_float4 float4; typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int); class OCLPerfKernelThroughput : public OCLTestImp { public: OCLPerfKernelThroughput(); virtual ~OCLPerfKernelThroughput(); public: virtual void open(unsigned int test, char *units, double &conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void genShaderMadds(); void genShaderMatrixMultiply(); void checkData(); // void allocateBuffers(); void launchKernel(); // test parameters int kernelTypeIdx_; int memPathIdx_; int numElementsIdx_; int workSizeIdx_; float gold_; double _reqDataSize; bool _dataSizeTooBig; // device attributes cl_uint maxComputeUnits_; cl_uint maxClockFrequency_; LARGE_INT numComputeUnits_; LARGE_INT numWorkGroupsPerComputeUnit_; LARGE_INT numThreads_; cl_uint work_dim_; size_t *global_work_size_; size_t *local_work_size_; // opencl objects cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_int error_; // buffer sizes // kernel-specific values int flopsPerByte_; int matrixDim1_, matrixDim2_; // buffers size_t input1BufferSize_; size_t input2BufferSize_; size_t output1BufferSize_; cl_mem input1Buffer_; cl_mem input2Buffer_; cl_mem output1Buffer_; float *input1Ptr_; float *input2Ptr_; float *output1Ptr_; // performance results float bandwidth_; // GB/s float gflops_; // GFlop/s float avgKernelTime_; // microseconds }; #endif // _OCL_KernelThroughput_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfLDSLatency.cpp000066400000000000000000000347011450307266000252460ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfLDSLatency.h" #include #include #include #include "CL/cl.h" #include "Timer.h" static const unsigned int NUM_SIZES = 5; // 2k up to 64MB static const unsigned int Sizes[NUM_SIZES] = {2048, 4096, 8192, 16384, 32768}; // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif void OCLPerfLDSLatency::genShader() { shader_.clear(); // DO NOT PUBLISH // Adopted from SiSoft Sandra 2013's memory latency test shader_ += "__kernel\n" //"__attribute__((work_group_size_hint(1, 1, 1)))\n" "void MemWalker(\n" " global uint * restrict input,\n" " global uint * restrict output,\n" " const uint uCount, const uint uSize,\n" " const uint uOffset, const int bMem, const uint repeats)\n" "{\n" " uint o = uOffset;\n" " uint lid = get_local_id(0);\n" " uint x = lid*o;\n" " local uint lclData[8192];\n" "\n" " {\n" " uint i = uCount;\n" " while (i--) {\n" " uint oldX = x;\n" " x = input[x];\n" " lclData[oldX] = x;\n" " }\n" " }\n" "\n" " x = lid*uOffset;\n" " for (uint loop = 0; loop < repeats; loop++) {\n" " uint i = uCount;\n" " while (i--) {\n" " x = lclData[x] + o;\n" " }\n" " }\n" "\n" " output[0] = x;\n" "}\n"; // printf("shader:\n%s\n", shader_.c_str()); shader_ += "\n\n"; shader_ += "__kernel\n" //"__attribute__((work_group_size_hint(1, 1, 1)))\n" "void Overhead(\n" " global uint * restrict input,\n" " global uint * restrict output,\n" " const uint uCount, const uint uSize,\n" " const uint uOffset, const int bMem, const uint repeats)\n" "{\n" " local uint lclData[8192];\n" "#ifdef USE_FLOAT\n" " {\n" " uint x = 0;\n" " uint i = uCount;\n" " while (i--) {\n" " uint oldX = x;\n" " x = input[x] /* + o*/;\n" " lclData[oldX] = x;\n" " }\n" " }\n" " float x = (float)input[0];\n" " for (uint loop = 0; loop < repeats; loop++) {\n" " uint i = uCount;\n" " x = (float)uOffset*x;\n" " while (i--) {\n" " x += (float)i;\n" " }\n" " }\n" " output[0] = (uint)x + uOffset*lclData[8191];\n" "#else\n" " {\n" " uint x = 0;\n" " uint i = uCount;\n" " while (i--) {\n" " uint oldX = x;\n" " x = input[x] /* + o*/;\n" " lclData[oldX] = x;\n" " }\n" " }\n" " uint x = input[0];\n" " for (uint loop = 0; loop < repeats; loop++) {\n" " uint i = uCount;\n" " x = x*uOffset;\n" " while (i--) {\n" " x += i;\n" " }\n" " }\n" " output[0] = x + uOffset*lclData[8191];\n" "#endif\n" "}\n"; } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} OCLPerfLDSLatency::OCLPerfLDSLatency() { _numSubTests = NUM_SIZES * 2; 
maxSize_ = Sizes[NUM_SIZES - 1] * 2048; } OCLPerfLDSLatency::~OCLPerfLDSLatency() {} void OCLPerfLDSLatency::setData(cl_mem buffer, unsigned int val) { void *ptr = _wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_WRITE, 0, width_, 0, NULL, NULL, &error_); unsigned int *data = (unsigned int *)ptr; for (unsigned int i = 0; i < bufSizeDW_; i++) { data[(i * (1024 + 17)) % bufSizeDW_] = ((i + 1) * (1024 + 17)) % bufSizeDW_; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, ptr, 0, NULL, NULL); clFinish(cmd_queue_); } void OCLPerfLDSLatency::checkData(cl_mem buffer) { void *ptr = _wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_READ, 0, sizeof(cl_uint), 0, NULL, NULL, &error_); unsigned int *data = (unsigned int *)ptr; if (data[0] != 0) { printf("OutData= 0x%08x\n", data[0]); CHECK_RESULT_NO_RETURN(data[0] != 0, "Data validation failed!\n"); } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, ptr, 0, NULL, NULL); } void OCLPerfLDSLatency::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; moreThreads = false; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; inBuffer_ = 0; outBuffer_ = 0; _errorFlag = false; // Reset error code so a single error doesn't prevent // other subtests from running _errorMsg = ""; isAMD_ = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD_ = true; } } delete platforms; } width_ = Sizes[test % NUM_SIZES]; bufSizeDW_ = width_ / sizeof(cl_uint); moreThreads = ((test / NUM_SIZES) % 2) ? 
true : false; CHECK_RESULT(platform == 0, "Couldn't find OpenCL platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "Failed to allocate devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); device = devices[0]; free(devices); devices = NULL; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_uint flags; flags = 0; inBuffer_ = _wrapper->clCreateBuffer(context_, flags, width_, NULL, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateBuffer(inBuffer) failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, 0, 1 * sizeof(cl_uint), NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); genShader(); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); std::string args; args.clear(); if (isAMD_) args += " -D USE_FLOAT"; error_ = _wrapper->clBuildProgram(program_, 1, &device, args.c_str(), NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "MemWalker", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel(MemWalker) failed"); kernel2_ = _wrapper->clCreateKernel(program_, "Overhead", &error_); CHECK_RESULT(kernel2_ == 0, "clCreateKernel(Overhead) failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void *)&bufSizeDW_); error_ = _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_uint), (void *)&bufSizeDW_); unsigned int zero = 0; error_ = _wrapper->clSetKernelArg(kernel_, 4, sizeof(cl_uint), (void *)&zero); int bMem = 1; error_ = _wrapper->clSetKernelArg(kernel_, 5, sizeof(cl_int), (void *)&bMem); // Limit the repeats, large buffers will have more samples, but the test runs // for a long time repeats_ = std::max((maxSize_ >> 4) / bufSizeDW_, 1u); error_ = _wrapper->clSetKernelArg(kernel_, 6, sizeof(cl_uint), (void *)&repeats_); error_ = _wrapper->clSetKernelArg(kernel2_, 0, sizeof(cl_mem), (void *)&inBuffer_); error_ = _wrapper->clSetKernelArg(kernel2_, 1, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel2_, 2, sizeof(cl_uint), (void *)&bufSizeDW_); error_ = _wrapper->clSetKernelArg(kernel2_, 3, sizeof(cl_uint), (void *)&bufSizeDW_); error_ = _wrapper->clSetKernelArg(kernel2_, 4, sizeof(cl_uint), (void *)&zero); error_ = _wrapper->clSetKernelArg(kernel2_, 5, sizeof(cl_int), (void *)&bMem); error_ = _wrapper->clSetKernelArg(kernel2_, 6, sizeof(cl_uint), (void *)&repeats_); setData(inBuffer_, (int)1.0f); } void OCLPerfLDSLatency::run(void) { int global = 1; int local = 1; if (moreThreads) { if (isAMD_) { global *= 64; local *= 64; } else { global *= 32; local *= 32; } } size_t global_work_size[1] = {(size_t)global}; size_t 
local_work_size[1] = {(size_t)local}; // Warm-up unsigned int warmup = 128; error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void *)&warmup); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void *)&bufSizeDW_); _wrapper->clFinish(cmd_queue_); // Restore input buffer when finished as it may have been modified by RW test setData(inBuffer_, (int)1.0f); CPerfCounter timer, timer2; timer.Reset(); timer.Start(); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); timer.Stop(); checkData(outBuffer_); timer2.Reset(); timer2.Start(); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel2_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); timer2.Stop(); double sec = timer.GetElapsedTime() - timer2.GetElapsedTime(); // Read latency in ns double perf = sec * (double)(1e09) / ((double)bufSizeDW_ * (double)repeats_); _perfInfo = (float)perf; char buf[256]; char buf2[32]; buf2[0] = '\0'; SNPRINTF(buf, sizeof(buf), "%10s %2d threads, %8d reads, %5d repeats (ns)", buf2, global, bufSizeDW_, repeats_); testDescString = buf; } unsigned int OCLPerfLDSLatency::close(void) { _wrapper->clFinish(cmd_queue_); if (inBuffer_) { error_ = _wrapper->clReleaseMemObject(inBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (kernel2_) { error_ = _wrapper->clReleaseKernel(kernel2_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfLDSLatency.h000066400000000000000000000036471450307266000247200ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_LDSLATENCY_H_ #define _OCL_LDSLATENCY_H_ #include "OCLTestImp.h" class OCLPerfLDSLatency : public OCLTestImp { public: OCLPerfLDSLatency(); virtual ~OCLPerfLDSLatency(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void genShader(void); void setData(cl_mem buffer, unsigned int data); void checkData(cl_mem buffer); cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_kernel kernel2_; cl_mem inBuffer_; cl_mem outBuffer_; cl_int error_; unsigned int width_; unsigned int bufSizeDW_; unsigned int repeats_; unsigned int maxSize_; bool isAMD_; bool moreThreads; }; #endif // _OCL_LDSLATENCY_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfLDSReadSpeed.cpp000066400000000000000000000337711450307266000255110ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfLDSReadSpeed.h" #include #include #include #include "CL/cl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 4 // 256KB, 1 MB, 4MB, 16 MB static const unsigned int Sizes[NUM_SIZES] = {262144, 1048576, 4194304, 16777216}; void OCLPerfLDSReadSpeed::genShader(unsigned int idx) { shader_.clear(); if (idx == 0) { shader_ += "__kernel __attribute__((reqd_work_group_size(64,1,1))) void " "_ldsReadSpeed(__global float *outBuf, float c)\n" "{\n" " uint gid = (int) get_global_id(0);\n" " uint lid = (int) get_local_id(0);\n" " __local float localLocal[2048];\n" " float val1 = c;\n" " float val2 = c;\n" " float val3 = c;\n" " float val4 = c;\n" " uint hacklid = gid % 64;\n" " for (int i = 0; i < (2048/64); i++) {\n" " localLocal[hacklid + i*64] = lid;\n" " }\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " val1 += localLocal[lid+0];\n" " val2 += localLocal[lid+64];\n" " val3 += localLocal[lid+128];\n" " val4 += localLocal[lid+192];\n" " val1 += localLocal[lid+256];\n" " val2 += localLocal[lid+320];\n" " val3 += localLocal[lid+384];\n" " val4 += localLocal[lid+448];\n" " val1 += localLocal[lid+512];\n" " val2 += localLocal[lid+576];\n" " val3 += localLocal[lid+640];\n" " val4 += localLocal[lid+704];\n" " val1 += localLocal[lid+768];\n" " val2 += localLocal[lid+832];\n" " val3 += localLocal[lid+896];\n" " val4 += localLocal[lid+960];\n" " val1 += localLocal[lid+1024];\n" " val2 += localLocal[lid+1088];\n" " val3 += localLocal[lid+1152];\n" " val4 += localLocal[lid+1216];\n" " val1 += localLocal[lid+1280];\n" " val2 += localLocal[lid+1344];\n" " val3 += localLocal[lid+1408];\n" " val4 += localLocal[lid+1472];\n" " val1 += localLocal[lid+1536];\n" " val2 += localLocal[lid+1600];\n" " val3 += localLocal[lid+1664];\n" " val4 += localLocal[lid+1728];\n" " val1 += localLocal[lid+1792];\n" " val2 += localLocal[lid+1856];\n" " val3 += localLocal[lid+1920];\n" " val4 += localLocal[lid+1984];\n" " outBuf[gid] = val1+val2+val3+val4;\n" "}\n"; ldsSizeBytes_ = 2048 * 4; } else if (idx == 1) { shader_ += "__kernel __attribute__((reqd_work_group_size(64,1,1))) void " "_ldsReadSpeed(__global float *outBuf, float c)\n" "{\n" " uint gid = (uint) get_global_id(0);\n" " int lid = (int) get_local_id(0);\n" " __local float localLocal[768];\n" " float val0 = 0.0f;\n" " float val1 = 0.0f;\n" " uint hacklid = gid % 64;\n" " for (int i = 0; i < (768/64); i++) {\n" " localLocal[hacklid + i*64] = lid;\n" " }\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" "#pragma nounroll\n" "for (uint i = 0; i < 32;i++)\n" "{\n" " val0 += localLocal[lid+0];\n" " val1 += localLocal[lid+64];\n" " val0 += localLocal[lid+128];\n" " val1 += localLocal[lid+192];\n" " val0 += localLocal[lid+256];\n" " val1 += localLocal[lid+320];\n" " val0 += localLocal[lid+384];\n" " val1 += localLocal[lid+448];\n" " lid += 1;\n" "}\n" "val0 += val1;\n" "val1 = min(val0,1.0f);\n" "if ((lid + val1) < 0){\n" " outBuf[gid] = val0;\n" "}\n" "}\n"; ldsSizeBytes_ = 768 * 4; } else { shader_ += "__kernel __attribute__((reqd_work_group_size(64,1,1))) void " "_ldsReadSpeed(__global float *outBuf, float c)\n" "{\n" " uint gid = (uint) get_global_id(0);\n" " int lid = (int) get_local_id(0);\n" " __local float localLocal[256];\n" " float val0 = 0.0f;\n" " float val1 = 0.0f;\n" " uint hacklid = gid % 64;\n" " for (int i = 0; i < (256/64); i++) {\n" " localLocal[hacklid + i*64] = lid;\n" " }\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" "#pragma nounroll\n" "for (uint i = 0; i < 32;i++)\n" "{\n" 
" val0 += localLocal[8*i+0];\n" " val1 += localLocal[8*i+1];\n" " val0 += localLocal[8*i+2];\n" " val1 += localLocal[8*i+3];\n" " val0 += localLocal[8*i+4];\n" " val1 += localLocal[8*i+5];\n" " val0 += localLocal[8*i+6];\n" " val1 += localLocal[8*i+7];\n" "}\n" "val0 += val1;\n" "val1 = min(val0,1.0f);\n" "if ((lid + val1) < 0){\n" " outBuf[gid] = val0;\n" "}\n" "}\n"; ldsSizeBytes_ = 256 * 4; } } OCLPerfLDSReadSpeed::OCLPerfLDSReadSpeed() { _numSubTests = NUM_SIZES * 3; } OCLPerfLDSReadSpeed::~OCLPerfLDSReadSpeed() {} void OCLPerfLDSReadSpeed::setData(cl_mem buffer, float val) { float *data = (float *)_wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < (bufSize_ >> 2); i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); } void OCLPerfLDSReadSpeed::checkData(cl_mem buffer) { float *data = (float *)_wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < (bufSize_ >> 2); i++) { if (data[i] != (float)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfLDSReadSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; outBuffer_ = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete platforms; } numReads_ = 32; width_ = Sizes[test % NUM_SIZES]; shaderIdx_ = test / NUM_SIZES; bufSize_ = width_; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); genShader(shaderIdx_); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &device, "", NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "_ldsReadSpeed", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); float foo = 0; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_float), (void *)&foo); setData(outBuffer_, 1.2345678f); } void OCLPerfLDSReadSpeed::run(void) { int global = bufSize_ / sizeof(cl_float); int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < NUM_ITER; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); char buf[256]; const char *buf2; if (shaderIdx_ == 0) { buf2 = " def kernel"; } else if (shaderIdx_ == 1) { buf2 = "SI friendly"; numReads_ *= 8; } else { buf2 = " broadcast"; numReads_ *= 8; } // LDS bandwidth in GB/s // We have one extra write per LDS location to initialize LDS double perf = ((double)global * (numReads_ * sizeof(cl_float) + ldsSizeBytes_ / 64) * NUM_ITER * (double)(1e-09)) / sec; _perfInfo = (float)perf; SNPRINTF(buf, sizeof(buf), " %s %8d threads, %3d reads (GB/s) ", buf2, global, numReads_); testDescString = buf; // checkData(outBuffer_); } unsigned int OCLPerfLDSReadSpeed::close(void) { _wrapper->clFinish(cmd_queue_); if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != 
CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfLDSReadSpeed.h000066400000000000000000000037231450307266000251500ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_LDSReadSpeed_H_ #define _OCL_LDSReadSpeed_H_ #include "OCLTestImp.h" class OCLPerfLDSReadSpeed : public OCLTestImp { public: OCLPerfLDSReadSpeed(); virtual ~OCLPerfLDSReadSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void genShader(unsigned int idx); void setData(cl_mem buffer, float data); void checkData(cl_mem buffer); static const unsigned int NUM_ITER = 100; cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem outBuffer_; cl_int error_; unsigned int width_; unsigned int bufSize_; unsigned int vecSizeIdx_; unsigned int numReads_; unsigned int shaderIdx_; unsigned int ldsSizeBytes_; }; #endif // _OCL_LDSReadSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMandelbrot.cpp000066400000000000000000001043121450307266000253670ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfMandelbrot.h" #include #include #include #include "CL/cl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif typedef struct { double x; double y; double width; } coordRec; coordRec coords[] = { {0.0, 0.0, 4.0}, // Whole set {0.0, 0.0, 0.00001}, // All black {-0.0180789661868, 0.6424294066162, 0.00003824140}, // Hit detail }; static unsigned int numCoords = sizeof(coords) / sizeof(coordRec); static const char *float_mandel = "__kernel void mandelbrot(__global uint *out, uint width, float xPos, " "float yPos, float xStep, float yStep, uint maxIter)\n" "{\n" " int tid = get_global_id(0);\n" " int i = tid % width;\n" " int j = tid / width;\n" " float x0 = (float)(xPos + xStep*i);\n" " float y0 = (float)(yPos + yStep*j);\n" "\n" " float x = x0;\n" " float y = y0;\n" "\n" " uint iter = 0;\n" " float tmp;\n" " for (iter = 0; (x*x + y*y <= 4.0f) && (iter < maxIter); iter++)\n" " {\n" " tmp = x;\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" " }\n" " out[tid] = iter;\n" "}\n"; static const char *float_mandel_vec = "__kernel void mandelbrot(__global uint *out, uint width, float xPos, " "float yPos, float xStep, float yStep, uint maxIter)\n" "{\n" " int tid = get_global_id(0);\n" " int i = tid % (width/4);\n" " int j = tid / (width/4);\n" " int4 veci = (int4)(4*i, 4*i+1, 4*i+2, 4*i+3);\n" " int4 vecj = (int4)(j, j, j, j);\n" " float4 x0;\n" " x0.s0 = (float)(xPos + xStep*veci.s0);\n" " x0.s1 = (float)(xPos + xStep*veci.s1);\n" " x0.s2 = (float)(xPos + xStep*veci.s2);\n" " x0.s3 = (float)(xPos + xStep*veci.s3);\n" " float4 y0;\n" " y0.s0 = (float)(yPos + yStep*vecj.s0);\n" " y0.s1 = (float)(yPos + yStep*vecj.s1);\n" " y0.s2 = (float)(yPos + yStep*vecj.s2);\n" " y0.s3 = (float)(yPos + yStep*vecj.s3);\n" "\n" " float4 x = x0;\n" " float4 y = y0;\n" "\n" " uint iter = 0;\n" " float4 tmp;\n" " int4 stay;\n" " int4 ccount = 0;\n" " float4 savx = x;\n" " float4 savy = y;\n" " stay = (x*x+y*y) <= (float4)(4.0f, 4.0f, 4.0f, 4.0f);\n" " for (iter = 0; (stay.s0 | stay.s1 | stay.s2 | stay.s3) && (iter < " "maxIter); iter+=16)\n" " {\n" " x = savx;\n" " y = savy;\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = 
MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " stay = (x*x+y*y) <= (float4)(4.0f, 4.0f, 4.0f, 4.0f);\n" " savx = select(savx,x,stay);\n" " savy = select(savy,y,stay);\n" " ccount -= stay*16;\n" " }\n" " // Handle remainder\n" " if (!(stay.s0 & stay.s1 & stay.s2 & stay.s3))\n" " {\n" " iter = 16;\n" " do\n" " {\n" " x = savx;\n" " y = savy;\n" " // More efficient to use scalar ops here: Why?\n" " stay.s0 = ((x.s0*x.s0+y.s0*y.s0) <= 4.0f) && (ccount.s0 < " "maxIter);\n" " stay.s1 = ((x.s1*x.s1+y.s1*y.s1) <= 4.0f) && (ccount.s1 < " "maxIter);\n" " stay.s2 = ((x.s2*x.s2+y.s2*y.s2) <= 4.0f) && (ccount.s2 < " "maxIter);\n" " stay.s3 = ((x.s3*x.s3+y.s3*y.s3) <= 4.0f) && (ccount.s3 < " "maxIter);\n" " tmp = x;\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" " ccount += stay;\n" " iter--;\n" " savx.s0 = (stay.s0 ? x.s0 : savx.s0);\n" " savx.s1 = (stay.s1 ? x.s1 : savx.s1);\n" " savx.s2 = (stay.s2 ? x.s2 : savx.s2);\n" " savx.s3 = (stay.s3 ? x.s3 : savx.s3);\n" " savy.s0 = (stay.s0 ? y.s0 : savy.s0);\n" " savy.s1 = (stay.s1 ? y.s1 : savy.s1);\n" " savy.s2 = (stay.s2 ? y.s2 : savy.s2);\n" " savy.s3 = (stay.s3 ? y.s3 : savy.s3);\n" " } while ((stay.s0 | stay.s1 | stay.s2 | stay.s3) && iter);\n" " }\n" " __global uint4 *vecOut = (__global uint4 *)out;\n" " vecOut[tid] = convert_uint4(ccount);\n" "}\n"; static const char *float_mandel_unroll = "__kernel void mandelbrot(__global uint *out, uint width, float xPos, " "float yPos, float xStep, float yStep, uint maxIter)\n" "{\n" " int tid = get_global_id(0);\n" " int i = tid % width;\n" " int j = tid / width;\n" " float x0 = (float)(xPos + xStep*(float)i);\n" " float y0 = (float)(yPos + yStep*(float)j);\n" "\n" " float x = x0;\n" " float y = y0;\n" "\n" "#define FAST\n" " uint iter = 0;\n" " float tmp;\n" " int stay;\n" " int ccount = 0;\n" " stay = (x*x+y*y) <= 4.0;\n" " float savx = x;\n" " float savy = y;\n" "#ifdef FAST\n" " for (iter = 0; (iter < maxIter); iter+=16)\n" "#else\n" " for (iter = 0; stay && (iter < maxIter); iter+=16)\n" "#endif\n" " {\n" " x = savx;\n" " y = savy;\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" 
"\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " stay = (x*x+y*y) <= 4.0;\n" " savx = select(savx,x,stay);\n" " savy = select(savy,y,stay);\n" " ccount += stay*16;\n" "#ifdef FAST\n" " if (!stay)\n" " break;\n" "#endif\n" " }\n" " // Handle remainder\n" " if (!stay)\n" " {\n" " iter = 16;\n" " do\n" " {\n" " x = savx;\n" " y = savy;\n" " stay = ((x*x+y*y) <= 4.0) && (ccount < maxIter);\n" " tmp = x;\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" " ccount += stay;\n" " iter--;\n" " savx = select(savx,x,stay);\n" " savy = select(savy,y,stay);\n" " } while (stay && iter);\n" " }\n" " out[tid] = (uint)ccount;\n" "}\n"; static const char *double_mandel = "#ifdef USE_CL_AMD_FP64\n" "#pragma OPENCL EXTENSION cl_amd_fp64 : enable\n" "#endif\n" "#ifdef USE_CL_KHR_FP64\n" "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n" "#endif\n" "__kernel void mandelbrot(__global uint *out, uint width, double xPos, " "double yPos, double xStep, double yStep, uint maxIter)\n" "{\n" " int tid = get_global_id(0);\n" " int i = tid % width;\n" " int j = tid / width;\n" " double x0 = (double)(xPos + xStep*i);\n" " double y0 = (double)(yPos + yStep*j);\n" "\n" " double x = x0;\n" " double y = y0;\n" "\n" " uint iter = 0;\n" " double tmp;\n" " for (iter = 0; (x*x + y*y <= 4.0) && (iter < maxIter); iter++)\n" " {\n" " tmp = x;\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" " }\n" " out[tid] = iter;\n" "}\n"; static const char *double_mandel_unroll = "#ifdef USE_CL_AMD_FP64\n" "#pragma OPENCL EXTENSION cl_amd_fp64 : enable\n" "#endif\n" "#ifdef USE_CL_KHR_FP64\n" "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n" "#endif\n" "__kernel void mandelbrot(__global uint *out, uint width, double xPos, " "double yPos, double xStep, double yStep, uint maxIter)\n" "{\n" " int tid = get_global_id(0);\n" " int i = tid % width;\n" " int j = tid / width;\n" " double x0 = (double)(xPos + xStep*(double)i);\n" " double y0 = (double)(yPos + yStep*(double)j);\n" "\n" " double x = x0;\n" " double y = y0;\n" "\n" "#define FAST\n" " uint iter = 0;\n" " double tmp;\n" " int stay;\n" " int ccount = 0;\n" " stay = (x*x+y*y) <= 4.0;\n" " double savx = x;\n" " double savy = y;\n" "#ifdef FAST\n" " for (iter = 0; (iter < maxIter); iter+=16)\n" "#else\n" " for (iter = 0; stay && (iter < maxIter); iter+=16)\n" "#endif\n" " {\n" " x = savx;\n" " y = savy;\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = 
MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " // Two iterations\n" " tmp = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*x,y,y0);\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(tmp,tmp,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" "\n" " stay = (x*x+y*y) <= 4.0;\n" " savx = (stay ? x : savx);//select(savx,x,stay);\n" " savy = (stay ? y : savy);//select(savy,y,stay);\n" " ccount += stay*16;\n" "#ifdef FAST\n" " if (!stay)\n" " break;\n" "#endif\n" " }\n" " // Handle remainder\n" " if (!stay)\n" " {\n" " iter = 16;\n" " do\n" " {\n" " x = savx;\n" " y = savy;\n" " stay = ((x*x+y*y) <= 4.0) && (ccount < maxIter);\n" " tmp = x;\n" " x = MUL_ADD_INS(-y,y,MUL_ADD_INS(x,x,x0));\n" " y = MUL_ADD_INS(2.0f*tmp,y,y0);\n" " ccount += stay;\n" " iter--;\n" " savx = (stay ? x : savx);//select(savx,x,stay);\n" " savy = (stay ? y : savy);//select(savy,y,stay);\n" " } while (stay && iter);\n" " }\n" " out[tid] = (uint)ccount;\n" "}\n"; static const unsigned int FMA_EXPECTEDVALUES_INDEX = 15; // Expected results for each kernel run at each coord unsigned long long expectedIters[] = { 203277748ull, 2147483648ull, 120254651ull, 203277748ull, 2147483648ull, 120254651ull, 203277748ull, 2147483648ull, 120254651ull, 203315114ull, 2147483648ull, 120042599ull, 203315114ull, 2147483648ull, 120042599ull, 203280620ull, 2147483648ull, 120485704ull, 203280620ull, 2147483648ull, 120485704ull, 203280620ull, 2147483648ull, 120485704ull, 203315114ull, 2147483648ull, 120042599ull, 203315114ull, 2147483648ull, 120042599ull}; // nvidia supports CL_KHR_FP64, so they get better results for doubles. 
Not // sure why we differ in floats though unsigned long long expectedItersNV[] = { 203277748ull, 2147483648ull, 120254651ull, 203277748ull, 2147483648ull, 120254651ull, 203277748ull, 2147483648ull, 120254651ull, 203315226ull, 2147483648ull, 120091921ull, 203315226ull, 2147483648ull, 120091921ull, // end of mad 203280620ull, 2147483648ull, 120485704ull, 203280620ull, 2147483648ull, 120485704ull, 203280620ull, 2147483648ull, 120485704ull, 203315114ull, 2147483648ull, 120042599ull, 203315114ull, 2147483648ull, 120042599ull}; const char *shaderStr[] = {" float_mad", " float_vector_mad", " float_unroll_mad", " double_mad", "double_unroll_mad", " float_fma", " float_vector_fma", " float_unroll_fma", " double_fma", "double_unroll_fma"}; OCLPerfMandelbrot::OCLPerfMandelbrot() { _numSubTests = 10 * numCoords; } OCLPerfMandelbrot::~OCLPerfMandelbrot() {} void OCLPerfMandelbrot::setData(cl_mem buffer, unsigned int val) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < width_ * width_; i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } void OCLPerfMandelbrot::checkData(cl_mem buffer) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < width_ * width_; i++) { totalIters += data[i]; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfMandelbrot::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; skip = false; totalIters = 0; isAMD = false; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; outBuffer_ = 0; // Maximum iteration count // NOTE: Some kernels are unrolled 16 times, so make sure maxIter is divisible // by 16 NOTE: Can increase to get better peak performance numbers, but be // sure not to TDR slow ASICs! unsigned int maxIter = 32768; // NOTE: Width needs to be divisible by 4 because the float_mandel_vec kernel // processes 4 pixels at once NOTE: Can increase to get better peak // performance numbers, but be sure not to TDR slow ASICs! 
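// Worked sizing for the defaults chosen here: width_ = 256 gives a 256x256
// grid, so bufSize_ = 65536 * sizeof(cl_uint) = 256 KB and the scalar
// kernels launch 65536 work-items in run() (the vec4 kernel launches a
// quarter of that). maxIter = 32768 = 2048 * 16, which satisfies the
// divisible-by-16 requirement of the unrolled kernels.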
width_ = 256; // We compute a square domain bufSize_ = width_ * width_ * sizeof(cl_uint); error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); // Get last for default #if 0 platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); #if 0 if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { platform = platforms[i]; break; } #endif num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } platform = platforms[_platformIndex]; } #if 0 } #endif delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find platform with GPU devices, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); doubleSupport = false; char *p = strstr(charbuf, "cl_amd_fp64"); char *p2 = strstr(charbuf, "cl_khr_fp64"); if (p || p2) doubleSupport = true; else doubleSupport = false; cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); const char *tmp; shaderIdx = _openTest / numCoords; if ((doubleSupport != true) && ((shaderIdx == 3) || (shaderIdx == 4) || (shaderIdx == 8) || (shaderIdx == 9))) { // We don't support doubles, so skip those tests skip = true; _perfInfo = 0.0f; return; } if (shaderIdx == 0 || shaderIdx == 5) { tmp = float_mandel; } else if (shaderIdx == 1 || shaderIdx == 6) { tmp = float_mandel_vec; } else if (shaderIdx == 2 || shaderIdx == 7) { tmp = float_mandel_unroll; } else if (shaderIdx == 3 || shaderIdx == 8) { tmp = double_mandel; } else { tmp = double_mandel_unroll; } std::string curr(tmp); std::string searchString("MUL_ADD_INS"); std::string replaceString; if (shaderIdx < 5) { replaceString = "mad"; } else { replaceString = "fma"; } std::string::size_type pos = 0; while ((pos = curr.find(searchString, pos)) != std::string::npos) { curr.replace(pos, searchString.size(), 
replaceString); pos++; } tmp = curr.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); const char *buildOps = NULL; if (p) buildOps = "-DUSE_CL_AMD_FP64"; else if (p2) buildOps = "-DUSE_CL_KHR_FP64"; error_ = _wrapper->clBuildProgram(program_, 1, &device, buildOps, NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "mandelbrot", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); coordIdx = _openTest % numCoords; if ((shaderIdx == 0) || (shaderIdx == 1) || (shaderIdx == 2) || (shaderIdx == 5) || (shaderIdx == 6) || (shaderIdx == 7)) { float xStep = (float)(coords[coordIdx].width / (double)width_); float yStep = (float)(-coords[coordIdx].width / (double)width_); float xPos = (float)(coords[coordIdx].x - 0.5 * coords[coordIdx].width); float yPos = (float)(coords[coordIdx].y + 0.5 * coords[coordIdx].width); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_uint), (void *)&width_); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_float), (void *)&xPos); error_ = _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_float), (void *)&yPos); error_ = _wrapper->clSetKernelArg(kernel_, 4, sizeof(cl_float), (void *)&xStep); error_ = _wrapper->clSetKernelArg(kernel_, 5, sizeof(cl_float), (void *)&yStep); error_ = _wrapper->clSetKernelArg(kernel_, 6, sizeof(cl_uint), (void *)&maxIter); } else { double xStep = coords[coordIdx].width / (double)width_; double yStep = -coords[coordIdx].width / (double)width_; double xPos = coords[coordIdx].x - 0.5 * coords[coordIdx].width; double yPos = coords[coordIdx].y + 0.5 * coords[coordIdx].width; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_uint), (void *)&width_); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_double), (void *)&xPos); error_ = _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_double), (void *)&yPos); error_ = _wrapper->clSetKernelArg(kernel_, 4, sizeof(cl_double), (void *)&xStep); error_ = _wrapper->clSetKernelArg(kernel_, 5, sizeof(cl_double), (void *)&yStep); error_ = _wrapper->clSetKernelArg(kernel_, 6, sizeof(cl_uint), (void *)&maxIter); } setData(outBuffer_, 0xdeadbeef); } void OCLPerfMandelbrot::run(void) { if (skip) return; int global = width_ * width_; // We handle 4 pixels per thread if ((shaderIdx == 1) || (shaderIdx == 6)) global >>= 2; int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; // Warm-up error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); double totalTime = 0.0; for (unsigned int k = 0; k < numLoops; k++) { CPerfCounter timer; timer.Reset(); timer.Start(); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); 
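// clFinish() blocks until the kernel just enqueued has actually executed,
// so the interval captured when the timer stops below measures GPU
// execution time rather than just command-submission overhead.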
timer.Stop(); double sec = timer.GetElapsedTime(); totalTime += sec; } checkData(outBuffer_); // Compute GFLOPS. There are 7 FLOPs per iteration double perf = ((double)totalIters * 7 * (double)(1e-09)) / (totalTime / (double)numLoops); _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " %s (GFLOPS) ", shaderStr[shaderIdx]); testDescString = buf; // Dump iteration count // printf(" totalIter = %lld\n", totalIters); if (isAMD && (type_ == CL_DEVICE_TYPE_GPU)) { CHECK_RESULT((totalIters != expectedIters[_openTest]) && (totalIters != expectedIters[(_openTest < FMA_EXPECTEDVALUES_INDEX ? _openTest + FMA_EXPECTEDVALUES_INDEX : _openTest)]), "Incorrect iteration count detected!"); } else { CHECK_RESULT(totalIters != expectedItersNV[_openTest], "Incorrect iteration count detected!"); } } unsigned int OCLPerfMandelbrot::close(void) { if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } OCLPerfAsyncMandelbrot::OCLPerfAsyncMandelbrot() {} OCLPerfAsyncMandelbrot::~OCLPerfAsyncMandelbrot() {} void OCLPerfAsyncMandelbrot::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { // Create common items first OCLPerfMandelbrot::open(test, units, conversion, deviceId); // Create resources for async test cmd_queue2_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue2_ == 0, "clCreateCommandQueue failed"); outBuffer2_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer2) failed"); } void OCLPerfAsyncMandelbrot::run(void) { if (skip) return; int global = width_ * width_; // We handle 4 pixels per thread if ((shaderIdx == 1) || (shaderIdx == 6)) global >>= 2; int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; // Warm-up error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer2_); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue2_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue2_); double totalTime = 0.0; for (unsigned int k = 0; k < numLoops; k++) { CPerfCounter timer; timer.Reset(); timer.Start(); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, 
sizeof(cl_mem), (void *)&outBuffer2_); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue2_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFlush(cmd_queue_); _wrapper->clFlush(cmd_queue2_); _wrapper->clFinish(cmd_queue_); _wrapper->clFinish(cmd_queue2_); timer.Stop(); double sec = timer.GetElapsedTime(); totalTime += sec; } checkData(outBuffer_); checkData(outBuffer2_); // Compute GFLOPS. There are 7 FLOPs per iteration double perf = ((double)(totalIters * 7) * (double)(1e-09)) / (totalTime / (double)numLoops); _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " async %s (GFLOPS) ", shaderStr[shaderIdx]); testDescString = buf; // Dump iteration count // printf(" totalIter = %lld\n", totalIters); if (isAMD && (type_ == CL_DEVICE_TYPE_GPU)) { CHECK_RESULT( (totalIters != 2 * expectedIters[_openTest]) && (totalIters != 2 * expectedIters[(_openTest < FMA_EXPECTEDVALUES_INDEX ? _openTest + FMA_EXPECTEDVALUES_INDEX : _openTest)]), "Incorrect iteration count detected!"); } else { CHECK_RESULT(totalIters != 2 * expectedItersNV[_openTest], "Incorrect iteration count detected!"); } } unsigned int OCLPerfAsyncMandelbrot::close(void) { _wrapper->clFinish(cmd_queue_); _wrapper->clFinish(cmd_queue2_); // Clean up async test items if (outBuffer2_) { error_ = _wrapper->clReleaseMemObject(outBuffer2_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer2_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue2_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } // Clean up the rest return OCLPerfMandelbrot::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMandelbrot.h000066400000000000000000000045371450307266000250440ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_Mandelbrot_H_ #define _OCL_Mandelbrot_H_ #include "OCLTestImp.h" class OCLPerfMandelbrot : public OCLTestImp { public: OCLPerfMandelbrot(); virtual ~OCLPerfMandelbrot(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void setData(cl_mem buffer, unsigned int data); void checkData(cl_mem buffer); cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem outBuffer_; cl_int error_; cl_device_id device; unsigned int width_; unsigned int bufSize_; bool doubleSupport; bool skip; unsigned int maxIter; unsigned int shaderIdx; unsigned int coordIdx; unsigned long long totalIters; bool isAMD; static const unsigned int numLoops = 10; }; class OCLPerfAsyncMandelbrot : public OCLPerfMandelbrot { public: OCLPerfAsyncMandelbrot(); virtual ~OCLPerfAsyncMandelbrot(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); cl_command_queue cmd_queue2_; cl_mem outBuffer2_; }; #endif // _OCL_Mandelbrot_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMapBufferReadSpeed.cpp000066400000000000000000000220631450307266000267260ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfMapBufferReadSpeed.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 4 // 256KB, 1 MB, 4MB, 16 MB static const unsigned int Sizes[NUM_SIZES] = {262144, 1048576, 4194304, 16777216}; static const unsigned int Iterations[2] = {1, OCLPerfMapBufferReadSpeed::NUM_ITER}; #define NUM_OFFSETS 1 static const unsigned int offsets[NUM_OFFSETS] = {0}; #define NUM_SUBTESTS (3 + NUM_OFFSETS) OCLPerfMapBufferReadSpeed::OCLPerfMapBufferReadSpeed() { _numSubTests = NUM_SIZES * NUM_SUBTESTS * 2; } OCLPerfMapBufferReadSpeed::~OCLPerfMapBufferReadSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfMapBufferReadSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; outBuffer_ = 0; persistent = false; allocHostPtr = false; useHostPtr = false; hostMem = NULL; alignedMem = NULL; alignment = 4096; isAMD = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); bufSize_ = Sizes[_openTest % NUM_SIZES]; if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) > 2) { useHostPtr = true; offset = offsets[((_openTest / NUM_SIZES) % NUM_SUBTESTS) - 3]; } else if ((((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 2) && isAMD) { persistent = true; } else if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 1) { allocHostPtr = true; } numIter = Iterations[_openTest / (NUM_SIZES * NUM_SUBTESTS)]; devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_WRITE_ONLY; if (persistent) { flags |= CL_MEM_USE_PERSISTENT_MEM_AMD; } else if (allocHostPtr) { flags |= CL_MEM_ALLOC_HOST_PTR; } else if (useHostPtr) { flags |= CL_MEM_USE_HOST_PTR; hostMem = (char *)malloc(bufSize_ + alignment - 1 + offset); CHECK_RESULT(hostMem == 0, "malloc(hostMem) failed"); alignedMem = (char *)((((intptr_t)hostMem + alignment - 1) & ~(alignment - 1)) + offset); } outBuffer_ = _wrapper->clCreateBuffer(context_, flags, bufSize_, alignedMem, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); // Force memory to be on GPU, if possible { cl_mem memBuffer = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(memBuffer == 0, "clCreateBuffer(memBuffer) failed"); _wrapper->clEnqueueCopyBuffer(cmd_queue_, memBuffer, outBuffer_, 0, 0, bufSize_, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); _wrapper->clReleaseMemObject(memBuffer); } } void OCLPerfMapBufferReadSpeed::run(void) { CPerfCounter timer; void *mem; // Warm up mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, outBuffer_, CL_TRUE, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, outBuffer_, CL_TRUE, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); } timer.Stop(); double sec = timer.GetElapsedTime(); // Map read bandwidth in GB/s double perf = ((double)bufSize_ * numIter * (double)(1e-09)) / sec; if (persistent || allocHostPtr) { _perfInfo = (float)(sec / numIter) * 1000000.0f; // Get us per map } else { _perfInfo = (float)perf; } char str[256]; if (persistent) { SNPRINTF(str, sizeof(str), "PERSISTENT (us)"); } else if (allocHostPtr) { SNPRINTF(str, sizeof(str), "ALLOC_HOST_PTR (us)"); } else if (useHostPtr) { SNPRINTF(str, sizeof(str), "off: %4d 
USE_HOST_PTR (GB/s)", offset); } else { SNPRINTF(str, sizeof(str), "(GB/s)"); } char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) i: %4d %29s ", bufSize_, numIter, str); testDescString = buf; } unsigned int OCLPerfMapBufferReadSpeed::close(void) { if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } if (hostMem) { free(hostMem); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMapBufferReadSpeed.h000066400000000000000000000035571450307266000264020ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_MapBufferReadSpeed_H_ #define _OCL_MapBufferReadSpeed_H_ #include "OCLTestImp.h" class OCLPerfMapBufferReadSpeed : public OCLTestImp { public: OCLPerfMapBufferReadSpeed(); virtual ~OCLPerfMapBufferReadSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1000; cl_context context_; cl_command_queue cmd_queue_; cl_mem outBuffer_; cl_int error_; unsigned int bufSize_; bool persistent; bool allocHostPtr; bool useHostPtr; unsigned int numIter; char* hostMem; char* alignedMem; size_t alignment; unsigned int offset; bool isAMD; }; #endif // _OCL_MapBufferReadSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMapBufferWriteSpeed.cpp000066400000000000000000000240501450307266000271430ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfMapBufferWriteSpeed.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 4 // 256KB, 1 MB, 4MB, 16 MB static const unsigned int Sizes[NUM_SIZES] = {262144, 1048576, 4194304, 16777216}; static const unsigned int Iterations[2] = { 1, OCLPerfMapBufferWriteSpeed::NUM_ITER}; #define NUM_OFFSETS 1 static const unsigned int offsets[NUM_OFFSETS] = {0}; #define NUM_SUBTESTS (3 + NUM_OFFSETS) OCLPerfMapBufferWriteSpeed::OCLPerfMapBufferWriteSpeed() { _numSubTests = NUM_SIZES * NUM_SUBTESTS * 3; } OCLPerfMapBufferWriteSpeed::~OCLPerfMapBufferWriteSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfMapBufferWriteSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; outBuffer_ = 0; persistent = false; allocHostPtr = false; useHostPtr = false; hostMem = NULL; alignedMem = NULL; alignment = 4096; isAMD = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
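 *
 * The enumeration above follows the usual two-call OpenCL idiom: query
 * the count first, then allocate and fetch the IDs. A minimal sketch of
 * the pattern (illustrative only; the _wrapper indirection and error
 * checks used by this test are omitted):
 *
 *   cl_uint n = 0;
 *   clGetPlatformIDs(0, NULL, &n);        // first call: count only
 *   cl_platform_id *ids = new cl_platform_id[n];
 *   clGetPlatformIDs(n, ids, NULL);       // second call: fill the array
 *   // ... select a platform, repeat the pattern for its devices ...
 *   delete [] ids;                        // array new wants array delete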
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); char getVersion[128]; error_ = _wrapper->clGetPlatformInfo(platform, CL_PLATFORM_VERSION, sizeof(getVersion), getVersion, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); platformVersion[0] = getVersion[7]; platformVersion[1] = getVersion[8]; platformVersion[2] = getVersion[9]; platformVersion[3] = '\0'; bufSize_ = Sizes[_openTest % NUM_SIZES]; if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) > 2) { useHostPtr = true; offset = offsets[((_openTest / NUM_SIZES) % NUM_SUBTESTS) - 3]; } else if ((((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 2) && isAMD) { persistent = true; } else if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 1) { allocHostPtr = true; } numIter = Iterations[std::min(_openTest / (NUM_SIZES * NUM_SUBTESTS), 1u)]; if (_openTest < NUM_SIZES * NUM_SUBTESTS * 2) { mapFlags = CL_MAP_WRITE; } else { mapFlags = CL_MAP_WRITE_INVALIDATE_REGION; } devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_READ_ONLY; if (persistent) { flags |= CL_MEM_USE_PERSISTENT_MEM_AMD; } else if (allocHostPtr) { flags |= CL_MEM_ALLOC_HOST_PTR; } else if (useHostPtr) { flags |= CL_MEM_USE_HOST_PTR; hostMem = (char *)malloc(bufSize_ + alignment - 1 + offset); CHECK_RESULT(hostMem == 0, "malloc(hostMem) failed"); alignedMem = (char *)((((intptr_t)hostMem + alignment - 1) & ~(alignment - 1)) + offset); } outBuffer_ = _wrapper->clCreateBuffer(context_, flags, bufSize_, alignedMem, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); // Force memory to be on GPU if possible { cl_mem memBuffer = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(memBuffer == 0, "clCreateBuffer(memBuffer) failed"); _wrapper->clEnqueueCopyBuffer(cmd_queue_, outBuffer_, memBuffer, 0, 0, bufSize_, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); _wrapper->clReleaseMemObject(memBuffer); } } void OCLPerfMapBufferWriteSpeed::run(void) { CPerfCounter timer; if (_openTest >= NUM_SIZES * NUM_SUBTESTS * 2) { // Skip CL_MAP_WRITE_INVALIDATE_REGION testing for 1.0 and 1.1 platforms if ((platformVersion[0] == '1') && ((platformVersion[2] == '0') || (platformVersion[2] == '1'))) { char buf[256]; SNPRINTF(buf, sizeof(buf), " SKIPPED "); testDescString = buf; return; } } void *mem; // Warm up mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, outBuffer_, CL_TRUE, mapFlags, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, outBuffer_, CL_TRUE, mapFlags, 0, bufSize_, 0, NULL, NULL, &error_); 
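    // Note on the CL_MEM_USE_HOST_PTR path in open() above: it
    // over-allocates by (alignment - 1) bytes and rounds the pointer up
    // with the classic power-of-two mask trick (p + a - 1) & ~(a - 1).
    // A hypothetical standalone helper showing the same computation
    // (sketch only, not part of the test):
#if 0
    static char *align_up(char *p, uintptr_t a) {  // a must be a power of two
      // e.g. p = 0x1003, a = 0x1000: (0x1003 + 0xFFF) & ~0xFFF == 0x2000
      return (char *)(((uintptr_t)p + a - 1) & ~(a - 1));
    }
#endif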
CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); } timer.Stop(); double sec = timer.GetElapsedTime(); // Map write bandwidth in GB/s double perf = ((double)bufSize_ * numIter * (double)(1e-09)) / sec; if (persistent || allocHostPtr) { _perfInfo = (float)(sec / numIter) * 1000000.0f; // Get us per map } else { _perfInfo = (float)perf; } char str[256]; if (persistent) { SNPRINTF(str, sizeof(str), "PERSISTENT (us)"); } else if (allocHostPtr) { SNPRINTF(str, sizeof(str), "ALLOC_HOST_PTR (us)"); } else if (useHostPtr) { SNPRINTF(str, sizeof(str), "off: %4d USE_HOST_PTR (GB/s)", offset); } else { SNPRINTF(str, sizeof(str), "(GB/s)"); } char str2[256]; if (mapFlags == CL_MAP_WRITE_INVALIDATE_REGION) { SNPRINTF(str2, sizeof(str2), "INV_REG %29s", str); } else { SNPRINTF(str2, sizeof(str2), "%29s", str); } char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) i: %4d %37s ", bufSize_, numIter, str2); testDescString = buf; } unsigned int OCLPerfMapBufferWriteSpeed::close(void) { if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } if (hostMem) { free(hostMem); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMapBufferWriteSpeed.h000066400000000000000000000036521450307266000266150ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_MapBufferWriteSpeed_H_ #define _OCL_MapBufferWriteSpeed_H_ #include "OCLTestImp.h" class OCLPerfMapBufferWriteSpeed : public OCLTestImp { public: OCLPerfMapBufferWriteSpeed(); virtual ~OCLPerfMapBufferWriteSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1000; cl_context context_; cl_command_queue cmd_queue_; cl_mem outBuffer_; cl_int error_; unsigned int bufSize_; bool persistent; bool allocHostPtr; bool useHostPtr; unsigned int numIter; char* hostMem; char* alignedMem; size_t alignment; unsigned int offset; bool isAMD; cl_map_flags mapFlags; char platformVersion[32]; }; #endif // _OCL_MapBufferWriteSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMapImageReadSpeed.cpp000066400000000000000000000177251450307266000265500ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfMapImageReadSpeed.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 4 static const unsigned int Sizes[NUM_SIZES] = {256, 512, 1024, 2048}; #define NUM_FORMATS 1 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}}; static const char *textFormats[NUM_FORMATS] = {"R8G8B8A8"}; static const unsigned int formatSize[NUM_FORMATS] = {4}; static const unsigned int Iterations[2] = {1, OCLPerfMapImageReadSpeed::NUM_ITER}; OCLPerfMapImageReadSpeed::OCLPerfMapImageReadSpeed() { _numSubTests = NUM_SIZES * NUM_FORMATS * 2; } OCLPerfMapImageReadSpeed::~OCLPerfMapImageReadSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfMapImageReadSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint typeOfDevice = type_; cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; outBuffer_ = 0; skip_ = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], typeOfDevice, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete platforms; } bufSize_ = Sizes[_openTest % NUM_SIZES]; bufnum_ = (_openTest / NUM_SIZES) % NUM_FORMATS; numIter = Iterations[_openTest / (NUM_SIZES * NUM_FORMATS)]; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, typeOfDevice, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; size_t size; cl_bool imageSupport_ = false; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport_), &imageSupport_, &size); if (!imageSupport_) { printf("\n%s\n", "Image not supported, skipping this test!"); skip_ = true; return; } context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_WRITE_ONLY; outBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSize_, bufSize_, 0, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateImage(outBuffer) failed"); } void OCLPerfMapImageReadSpeed::run(void) { if(skip_) { return; } CPerfCounter timer; void *mem; size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSize_, bufSize_, 1}; size_t image_row_pitch; size_t image_slice_pitch; // Warm up mem = _wrapper->clEnqueueMapImage( cmd_queue_, outBuffer_, CL_TRUE, CL_MAP_READ, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { mem = _wrapper->clEnqueueMapImage( cmd_queue_, outBuffer_, CL_TRUE, CL_MAP_READ, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); } timer.Stop(); double sec = timer.GetElapsedTime(); // Image map read bandwidth in GB/s double perf = ((double)bufSize_ * bufSize_ * formatSize[bufnum_] * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%4dx%4d) fmt:%s i: %4d (GB/s) ", bufSize_, bufSize_, textFormats[bufnum_], numIter); testDescString = buf; } unsigned int OCLPerfMapImageReadSpeed::close(void) { if (skip_) { return CL_SUCCESS; } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMapImageReadSpeed.h000066400000000000000000000033651450307266000262100ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_MapImageReadSpeed_H_ #define _OCL_MapImageReadSpeed_H_ #include "OCLTestImp.h" class OCLPerfMapImageReadSpeed : public OCLTestImp { public: OCLPerfMapImageReadSpeed(); virtual ~OCLPerfMapImageReadSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 100; cl_context context_; cl_command_queue cmd_queue_; cl_mem outBuffer_; cl_int error_; unsigned int bufSize_; unsigned int bufnum_; unsigned int numIter; bool skip_; }; #endif // _OCL_MapImageReadSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMapImageWriteSpeed.cpp000066400000000000000000000200071450307266000267520ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfMapImageWriteSpeed.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 4 static const unsigned int Sizes[NUM_SIZES] = {256, 512, 1024, 2048}; #define NUM_FORMATS 1 static const cl_image_format formats[NUM_FORMATS] = { {CL_RGBA, CL_UNSIGNED_INT8}}; static const char *textFormats[NUM_FORMATS] = {"R8G8B8A8"}; static const unsigned int formatSize[NUM_FORMATS] = {4}; static const unsigned int Iterations[2] = {1, OCLPerfMapImageWriteSpeed::NUM_ITER}; OCLPerfMapImageWriteSpeed::OCLPerfMapImageWriteSpeed() { _numSubTests = NUM_SIZES * NUM_FORMATS * 2; } OCLPerfMapImageWriteSpeed::~OCLPerfMapImageWriteSpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfMapImageWriteSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint typeOfDevice = type_; cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; outBuffer_ = 0; skip_ = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], typeOfDevice, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete platforms; } bufSize_ = Sizes[_openTest % NUM_SIZES]; bufnum_ = (_openTest / NUM_SIZES) % NUM_FORMATS; numIter = Iterations[_openTest / (NUM_SIZES * NUM_FORMATS)]; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, typeOfDevice, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; size_t size; cl_bool imageSupport_ = false; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport_), &imageSupport_, &size); if (!imageSupport_) { printf("\n%s\n", "Image not supported, skipping this test!"); skip_ = true; return; } context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_READ_ONLY; outBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSize_, bufSize_, 0, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateImage(outBuffer) failed"); } void OCLPerfMapImageWriteSpeed::run(void) { if (skip_) { return; } CPerfCounter timer; void *mem; size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSize_, bufSize_, 1}; size_t image_row_pitch; size_t image_slice_pitch; // Warm up mem = _wrapper->clEnqueueMapImage( cmd_queue_, outBuffer_, CL_TRUE, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { mem = _wrapper->clEnqueueMapImage( cmd_queue_, outBuffer_, CL_TRUE, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); } timer.Stop(); double sec = timer.GetElapsedTime(); // Image map write bandwidth in GB/s double perf = ((double)bufSize_ * bufSize_ * formatSize[bufnum_] * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%4dx%4d) fmt:%s i: %4d (GB/s) ", bufSize_, bufSize_, textFormats[bufnum_], numIter); testDescString = buf; } unsigned int OCLPerfMapImageWriteSpeed::close(void) { if (skip_) { return CL_SUCCESS; } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMapImageWriteSpeed.h000066400000000000000000000033731450307266000264260ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_MapImageWriteSpeed_H_ #define _OCL_MapImageWriteSpeed_H_ #include "OCLTestImp.h" class OCLPerfMapImageWriteSpeed : public OCLTestImp { public: OCLPerfMapImageWriteSpeed(); virtual ~OCLPerfMapImageWriteSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 100; cl_context context_; cl_command_queue cmd_queue_; cl_mem outBuffer_; cl_int error_; unsigned int bufSize_; unsigned int bufnum_; unsigned int numIter; bool skip_; }; #endif // _OCL_MapImageWriteSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMatrixTranspose.cpp000066400000000000000000000277701450307266000264570ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfMatrixTranspose.h" #include #include #include #include "CL/cl.h" #include "Timer.h" static const unsigned int NUM_BLOCK_SIZES = 2; static const unsigned int blockSizes[NUM_BLOCK_SIZES] = {8, 16}; static const unsigned int NUM_MATRIX_DIMS = 2; static const unsigned int matrixDims[NUM_MATRIX_DIMS] = {1024, 1920}; static const char *matrixtranspose_kernel = "kernel void matrixTranspose(global uint *restrict inBuf, global uint " "*restrict outBuf, local uint *localBuf, uint blockSize, uint width, uint " "height)\n" "{\n" " uint globalIdx = get_global_id(0);\n" " uint globalIdy = get_global_id(1);\n" " uint localIdx = get_local_id(0);\n" " uint localIdy = get_local_id(1);\n" " /* copy from input to local memory */\n" " /* Note that we transpose the x and y coordinates when storing */\n" " localBuf[localIdx*blockSize + localIdy] = inBuf[globalIdy*width + " "globalIdx];\n" " /* wait until the whole block is filled */\n" " barrier(CLK_LOCAL_MEM_FENCE);\n" " uint groupIdx = get_group_id(0);\n" " uint groupIdy = get_group_id(1);\n" " /* calculate the corresponding target location for transpose by " "inverting x and y values*/\n" " /* Here we don't swap localIdx and localIdy, this is to get larger " "bursts when threads write to memory. */\n" " /* To make this work, we've swapped the coordinates when we write to " "local memory. */\n" " uint targetGlobalIdx = groupIdy*blockSize + localIdx;\n" " uint targetGlobalIdy = groupIdx*blockSize + localIdy;\n" " /* calculate the corresponding raster indices of source and target " "*/\n" " uint targetIndex = targetGlobalIdy*height + targetGlobalIdx;\n" " uint sourceIndex = localIdy * blockSize + localIdx;\n" " outBuf[targetIndex] = localBuf[sourceIndex];\n" "}\n"; OCLPerfMatrixTranspose::OCLPerfMatrixTranspose() { _numSubTests = NUM_BLOCK_SIZES * NUM_MATRIX_DIMS; } OCLPerfMatrixTranspose::~OCLPerfMatrixTranspose() {} void OCLPerfMatrixTranspose::setData(cl_mem buffer) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < height_; i++) { for (unsigned int j = 0; j < width_; j++) { *(data + i * width_ + j) = i * width_ + j; } } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } void OCLPerfMatrixTranspose::fillData(cl_mem buffer, unsigned int val) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < width_ * height_; i++) { data[i] = val; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } void OCLPerfMatrixTranspose::checkData(cl_mem buffer) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); bool err = false; for (unsigned int i = 0; (i < width_) && !err; i++) { for (unsigned int j = 0; (j < height_) && !err; j++) { if (*(data + i * height_ + j) != (j * width_ + i)) { printf("Data mismatch at (%d, %d)! 
Got %d, expected %d\n", j, i, *(data + i * height_ + j), j * width_ + i); err = true; break; } } break; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfMatrixTranspose::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; inBuffer_ = 0; outBuffer_ = 0; blockSize_ = blockSizes[_openTest % NUM_BLOCK_SIZES]; width_ = matrixDims[_openTest / NUM_BLOCK_SIZES]; height_ = width_; // We compute a square domain bufSize_ = width_ * height_ * sizeof(cl_uint); error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
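 *
 * A note on the matrixTranspose kernel defined above: each work-group
 * stages a blockSize x blockSize tile in local memory and stores it
 * transposed (localIdx*blockSize + localIdy), so the later global write
 * keeps consecutive work-items on consecutive addresses. Worked index
 * math for blockSize = 8: the work-item at group (gx,gy), local (lx,ly)
 * reads  inBuf[(gy*8 + ly)*width  + gx*8 + lx]  and writes
 * outBuf[(gx*8 + ly)*height + gy*8 + lx], i.e. the tile moves from
 * block (gx,gy) to block (gy,gx) while each row of the tile is still
 * written as one contiguous burst. Separately, note that checkData()
 * above breaks out of its outer loop unconditionally after the first
 * iteration, so only the i == 0 slice of the result is validated.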
*/ CHECK_RESULT(platform == 0, "Couldn't find platform with GPU devices, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); inBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY, bufSize_, NULL, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateBuffer(inBuffer) failed"); setData(inBuffer_); outBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); fillData(outBuffer_, 0xdeadbeef); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&matrixtranspose_kernel, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); char *buildOps = NULL; error_ = _wrapper->clBuildProgram(program_, 1, &device, buildOps, NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "matrixTranspose", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg( kernel_, 2, sizeof(cl_uint) * blockSize_ * blockSize_, NULL); error_ = _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_uint), (void *)&blockSize_); error_ = _wrapper->clSetKernelArg(kernel_, 4, sizeof(cl_uint), (void *)&width_); error_ = _wrapper->clSetKernelArg(kernel_, 5, sizeof(cl_uint), (void *)&height_); } void OCLPerfMatrixTranspose::run(void) { size_t global_work_size[2] = {width_, height_}; size_t local_work_size[2] = {blockSize_, blockSize_}; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < MAX_ITERATIONS; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 2, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); } CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); checkData(outBuffer_); // Compute GB/s double perf = ((double)bufSize_ * (double)MAX_ITERATIONS * (double)(1e-09)) / sec; _perfInfo = (float)perf; testDescString = ""; char str[64]; sprintf(str, "(%d,%d) matrix with (%2d,%2d) block size %fms (GB/s) ", width_, height_, blockSize_, blockSize_, (sec / (double)MAX_ITERATIONS) * 1000.); testDescString += str; } unsigned int OCLPerfMatrixTranspose::close(void) { _wrapper->clFinish(cmd_queue_); if (inBuffer_) { error_ = _wrapper->clReleaseMemObject(inBuffer_); 
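  // Note on the GB/s figure computed in run() above: it counts bufSize_
  // once per iteration, i.e. one direction of the transfer. A transpose
  // reads and writes the full matrix, so actual DRAM traffic is roughly
  // twice the reported value. Illustrative arithmetic, assuming the
  // 1024x1024 subtest (bufSize_ = 4 MiB) ran its 50 iterations in a
  // hypothetical 2 ms:
  //   reported:              4194304 * 50 * 1e-9 / 0.002 = ~104.9 GB/s
  //   read + write traffic:  ~209.7 GB/s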
CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMatrixTranspose.h000066400000000000000000000037071450307266000261160ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_MATRIX_TRANSPOSE_H_ #define _OCL_MATRIX_TRANSPOSE_H_ #include "OCLTestImp.h" class OCLPerfMatrixTranspose : public OCLTestImp { public: OCLPerfMatrixTranspose(); virtual ~OCLPerfMatrixTranspose(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void setData(cl_mem buffer); void fillData(cl_mem buffer, unsigned int data); void checkData(cl_mem buffer); cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem inBuffer_; cl_mem outBuffer_; cl_int error_; unsigned int width_; unsigned int height_; unsigned int bufSize_; unsigned int blockSize_; static const unsigned int MAX_ITERATIONS = 50; }; #endif // _OCL_MATRIX_TRANSPOSE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMemCombine.cpp000066400000000000000000000204631450307266000253170ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfMemCombine.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif struct TestParams { const char* type; unsigned int numCombine; unsigned int assignSize; }; TestParams testParams[] // char type causes shader compiler to crash. reenable once get a fix for // the shader compiler //= {{"char", 16}, {"short", 8}, {"int", 4}, {"long", 4}, {"float", 4}}; //= {{"char", 16, 1}, {"short", 8, 2}, {"int", 4, 4}, {"long", 4, 8}, = {{"short", 8, 2}, {"int", 4, 4}, {"long", 4, 8}, {"float", 4, 4}, {"char4", 4, 4}, {"uchar16", 4, 16}, {"short2", 4, 4}, {"int2", 4, 8}, {"uint4", 4, 16}, {"long2", 4, 16}, {"float2", 4, 8}}; const int numTests = sizeof(testParams) / sizeof(TestParams); // Generate a kernel that does array loads and stores, which should be combined // by MemCombine void genCombineVLoadVStores(const char* type, int loopSize, int numCombine, char* ret) { sprintf(ret, "__kernel void combine_vload_vstores(__global %s" " * restrict src, __global %s *result) {\n", type, type); strcat(ret, " int id = get_global_id(0);\n"); strcat(ret, " int gsize = get_global_size(0);\n"); char buf[256]; sprintf(buf, " for (int i = 0; i < %d; i+=gsize) {\n", loopSize); strcat(ret, buf); sprintf(buf, " int j = (i+id) * %d;\n", numCombine); strcat(ret, buf); for (int i = 0; i < numCombine; ++i) { sprintf(buf, " result[j+%d] = src[j+%d];\n", i, i); strcat(ret, buf); } strcat(ret, " }\n}\n"); } void OCLPerfMemCombine::setData(cl_mem buffer, unsigned int bufSize, unsigned char val) { unsigned char* data = (unsigned char*)_wrapper->clEnqueueMapBuffer( cmdQueues_[0], buffer, true, CL_MAP_WRITE, 0, bufSize, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < bufSize; ++i) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[0], buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmdQueues_[0]); } void print1Darray(unsigned char* buffer, unsigned int bufSize) { for (unsigned int i = 0; i < bufSize; ++i) { if (i % 32 == 0) printf("\n"); printf("%d ", buffer[i]); } printf("\n"); } void OCLPerfMemCombine::checkData(cl_mem buffer, unsigned int bufSize, unsigned int limit, unsigned char defVal) { unsigned char* data = (unsigned char*)_wrapper->clEnqueueMapBuffer( cmdQueues_[0], buffer, true, CL_MAP_READ, 0, bufSize, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < bufSize; i++) { unsigned char expected; if (i < limit) { expected = 1U; } else { expected = defVal; } if 
(data[i] != expected) { printf("at index %d:\n", i); print1Darray(&data[i], 16); CHECK_RESULT(1, "incorrect output data detected!"); break; } } error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[0], buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmdQueues_[0]); } OCLPerfMemCombine::OCLPerfMemCombine() { _numSubTests = numTests; } OCLPerfMemCombine::~OCLPerfMemCombine() {} static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLPerfMemCombine::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { _openTest = test; context_ = 0; kernel_ = NULL; program_ = NULL; OCLTestImp::open(test, units, conversion, deviceId); cl_mem inBuffer = _wrapper->clCreateBuffer(context_, 0, inSize_, NULL, &error_); CHECK_RESULT(inBuffer == 0, "clCreateBuffer(inBuffer) failed"); buffers_.push_back(inBuffer); cl_mem outBuffer = _wrapper->clCreateBuffer(context_, 0, outSize_, NULL, &error_); CHECK_RESULT(outBuffer == 0, "clCreateBuffer(outBuffer) failed"); buffers_.push_back(outBuffer); createKernel(testParams[test].type, testParams[test].numCombine); setData(inBuffer, inSize_, 1U); setData(outBuffer, outSize_, 0); dataRange_ = loopSize_ * numCombine_ * testParams[test].assignSize; } void OCLPerfMemCombine::createKernel(const char* type, int numCombine) { dataType_ = type; numCombine_ = numCombine; ///////////////////////////////////////////////////////////////// // Load CL file, build CL program object, create CL kernel object ///////////////////////////////////////////////////////////////// char source[1024]; genCombineVLoadVStores(type, loopSize_, numCombine, source); size_t sourceSize[] = {strlen(source)}; const char* src = &source[0]; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &src, sourceSize, &error_); CHECK_RESULT(error_ != CL_SUCCESS, "clCreateProgramWithSource failed"); /* create a cl program executable for all the devices specified */ error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); return; } /* get a kernel object handle for a kernel with the given name */ const char* kernelName = "combine_vload_vstores"; kernel_ = _wrapper->clCreateKernel(program_, kernelName, &error_); CHECK_RESULT(error_ != CL_SUCCESS, "clCreateProgramWithSource failed"); /*** Set appropriate arguments to the kernel ***/ /* the input array to the kernel */ error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&buffers()[0]); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg failed"); /* the output array to the kernel */ error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void*)&buffers()[1]); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg failed"); } void OCLPerfMemCombine::run(void) { size_t globalThreads[1]; size_t localThreads[1]; globalThreads[0] = 64; localThreads[0] = 64; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < NUM_ITER; ++i) { /* * Enqueue a kernel run call. 
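 *
 * For reference, genCombineVLoadVStores() above emits source like the
 * following for type "int" with numCombine = 4 (reconstructed from its
 * sprintf calls). The adjacent loads and stores give the compiler's
 * memory-combining pass a chance to merge them into wider accesses:
 *
 *   __kernel void combine_vload_vstores(__global int * restrict src,
 *                                       __global int *result) {
 *     int id = get_global_id(0);
 *     int gsize = get_global_size(0);
 *     for (int i = 0; i < 8192; i+=gsize) {
 *       int j = (i+id) * 4;
 *       result[j+0] = src[j+0];
 *       result[j+1] = src[j+1];
 *       result[j+2] = src[j+2];
 *       result[j+3] = src[j+3];
 *     }
 *   }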
*/ error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[0], kernel_, 1, NULL, globalThreads, localThreads, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } _wrapper->clFinish(cmdQueues_[0]); timer.Stop(); double sec = timer.GetElapsedTime(); char buf[256]; SNPRINTF(buf, sizeof(buf), "%d %-8s (sec)", numCombine_, dataType_); testDescString = buf; _perfInfo = (float)sec; checkData(buffers()[1], outSize_, dataRange_, 0); return; } unsigned int OCLPerfMemCombine::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMemCombine.h000066400000000000000000000040151450307266000247570ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_MemCombine_H_ #define _OCL_MemCombine_H_ #include "OCLTestImp.h" class OCLPerfMemCombine : public OCLTestImp { enum { inSize_ = 4096U * 1024U }; enum { outSize_ = 4096U * 1024U }; enum { loopSize_ = 8192 }; public: OCLPerfMemCombine(); virtual ~OCLPerfMemCombine(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1000; const char* dataType_; unsigned int numCombine_; unsigned int dataRange_; unsigned char input[inSize_]; unsigned char output[outSize_]; private: void createKernel(const char* type, int numCombine); void setData(cl_mem buffer, unsigned int bufSize, unsigned char val); void checkData(cl_mem buffer, unsigned int bufSize, unsigned int limit, unsigned char defVal); }; #endif // _OCL_MemCombine_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMemCreate.cpp000066400000000000000000000144001450307266000251400ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfMemCreate.h" #include #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" #if defined(_WIN32) && !defined(_WIN64) static const size_t BufSize = 0x200000; static const size_t BufSizeC = 0x100000; #else static const size_t BufSize = 0x400000; static const size_t BufSizeC = 0x200000; #endif static const size_t Iterations = 0x100; static const size_t IterationsC = 0x1000; static const char* strKernel = "__kernel void dummy(__global uint* out) \n" "{ \n" " uint id = get_global_id(0); \n" " uint value = 1; \n" " if ((int)get_local_id(0) < 0) \n" " out[id] = value; \n" "} \n"; #define NUM_TESTS 5 OCLPerfMemCreate::OCLPerfMemCreate() { _numSubTests = NUM_TESTS * 2; failed_ = false; } OCLPerfMemCreate::~OCLPerfMemCreate() {} void OCLPerfMemCreate::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { _deviceId = deviceId; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); test_ = test % NUM_TESTS; cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); useSubBuf_ = (test >= NUM_TESTS); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "dummy", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLPerfMemCreate::run(void) { if (failed_) { return; } cl_mem buffer, subBuf; cl_mem* bufptr; unsigned int* values; values = reinterpret_cast(new cl_int4[BufSize]); CPerfCounter timer; cl_mem_flags flags = CL_MEM_READ_ONLY; void* hostPtr = NULL; // Clear destination buffer memset(values, 0, BufSize * sizeof(cl_int4)); size_t bufSize = ((test_ % 2) == 0) ? BufSize * sizeof(cl_int4) : BufSizeC * sizeof(cl_int4); size_t iter = ((test_ % 2) == 0) ? Iterations : IterationsC; if (test_ == 4) { hostPtr = values; bufSize = 0x100000; flags = CL_MEM_USE_HOST_PTR; } else if ((test_ / 2) > 0) { iter = ((test_ % 2) == 0) ? 
Iterations / 10 : IterationsC; flags |= CL_MEM_ALLOC_HOST_PTR; } timer.Reset(); timer.Start(); for (size_t i = 0; i < iter; ++i) { buffer = _wrapper->clCreateBuffer(context_, flags, bufSize, hostPtr, &error_); bufptr = &buffer; CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); if (useSubBuf_) { cl_buffer_region reg; reg.origin = 0; reg.size = bufSize; subBuf = _wrapper->clCreateSubBuffer( buffer, flags, CL_BUFFER_CREATE_TYPE_REGION, ®, &error_); bufptr = &subBuf; CHECK_RESULT((error_ != CL_SUCCESS), "clCreateSubBuffer() failed"); } error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), bufptr); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t gws[1] = {64}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); if (useSubBuf_) _wrapper->clReleaseMemObject(subBuf); _wrapper->clReleaseMemObject(buffer); } timer.Stop(); std::stringstream stream; static const char* Message[] = {" create+destroy time [uncached] ", " create+destroy time [cached ] "}; static const char* Type[] = {"DEV", "AHP", "UHP"}; stream << Type[test_ / 2]; stream << Message[test_ % 2]; stream << " per allocation (ms) "; stream << bufSize / 1024 << " KB"; if (useSubBuf_) stream << " subbuf "; testDescString = stream.str(); _perfInfo = static_cast(timer.GetElapsedTime() * 1000 / iter); delete[] values; } unsigned int OCLPerfMemCreate::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMemCreate.h000066400000000000000000000030711450307266000246070ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PERF_MEM_CREATE_H_ #define _OCL_PERF_MEM_CREATE_H_ #include "OCLTestImp.h" class OCLPerfMemCreate : public OCLTestImp { public: OCLPerfMemCreate(); virtual ~OCLPerfMemCreate(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; unsigned int test_; bool useSubBuf_; }; #endif // _OCL_PERF_MEM_CREATE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMemLatency.cpp000066400000000000000000000342051450307266000253410ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMemLatency.cpp

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy of this
 software and associated documentation files (the "Software"), to deal in the Software
 without restriction, including without limitation the rights to use, copy, modify,
 merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
 permit persons to whom the Software is furnished to do so, subject to the following
 conditions:

 The above copyright notice and this permission notice shall be included in all copies
 or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */

#include "OCLPerfMemLatency.h"

#include <algorithm>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "CL/cl.h"
#include "Timer.h"

static const unsigned int NUM_SIZES = 16;
// 2k up to 64MB
static const unsigned int Sizes[NUM_SIZES] = {
    2048,    4096,     8192,     16384,    32768,    65536,
    131072,  262144,   524288,   1048576,  2097152,  4194304,
    8388608, 16777216, 33554432, 67108864};

// Quiet pesky warnings
#ifdef WIN_OS
#define SNPRINTF sprintf_s
#else
#define SNPRINTF snprintf
#endif

void OCLPerfMemLatency::genShader() {
  shader_.clear();
  // DO NOT PUBLISH
  // Adopted from SiSoft Sandra 2013's memory latency test
  shader_ +=
      "#ifdef MAKEVOLATILE\n"
      "#define VOLATILE volatile\n"
      "#else\n"
      "#define VOLATILE\n"
      "#endif\n"
      "__kernel\n"
      //"__attribute__((work_group_size_hint(1, 1, 1)))\n"
      "void MemWalker(\n"
      "    global VOLATILE uint * restrict input,\n"
      "    __global uint * restrict output,\n"
      "    const uint uCount, const uint uSize,\n"
      "    const uint uOffset, const int bMem, const uint repeats)\n"
      "{\n"
      "    uint o = uOffset;\n"
      "    uint lid = 0;//get_local_id(0)*o;\n"
      "    uint x = lid;\n"
      "\n"
      "    for (uint loop = 0; loop < repeats; loop++) {\n"
      "        uint i = uCount;\n"
      "        while (i--) {\n"
      "            x = input[x] /* + o*/;\n"
      "        }\n"
      "    }\n"
      "\n"
      "#ifdef MAKERW\n"
      "    input[0] = x;\n"
      "#endif\n"
      "    output[0] = x;\n"
      "}\n";
  // printf("shader:\n%s\n", shader_.c_str());
  shader_ += "\n\n";
  shader_ +=
      "__kernel\n"
      //"__attribute__((work_group_size_hint(1, 1, 1)))\n"
      "void Overhead(\n"
      "    __global uint * restrict input,\n"
      "    __global uint * restrict output,\n"
      "    const uint uCount, const uint uSize,\n"
      "    const uint uOffset, const int bMem, const uint repeats)\n"
      "{\n"
      "#ifdef USE_FLOAT\n"
      "    float x = (float)input[0];\n"
      "    for (uint loop = 0; loop < repeats; loop++) {\n"
      "        uint i = uCount;\n"
      "        x = (float)uOffset*x;\n"
      "        while (i--) {\n"
      "            x += (float)i;\n"
      "        }\n"
      "    }\n"
      "    output[0] = (uint)x;\n"
      "#else\n"
      "    uint x = input[0];\n"
      "    for (uint loop = 0; loop < repeats; loop++) {\n"
      "        uint i = uCount;\n"
      "        x = x*uOffset;\n"
      "        while (i--) {\n"
      "            x += i;\n"
      "        }\n"
      "    }\n"
      "    output[0] = x;\n"
      "#endif\n"
      "}\n";
}

static void CL_CALLBACK notify_callback(const char *errinfo,
                                        const void *private_info, size_t cb,
                                        void *user_data) {}

OCLPerfMemLatency::OCLPerfMemLatency() {
  _numSubTests = NUM_SIZES * 6;
  maxSize_ = Sizes[NUM_SIZES - 1];
}

OCLPerfMemLatency::~OCLPerfMemLatency() {}

void OCLPerfMemLatency::setData(cl_mem buffer, unsigned int val) {
  void *ptr = _wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true,
                                           CL_MAP_WRITE, 0, width_, 0, NULL,
                                           NULL, &error_);
  unsigned int *data = (unsigned int
*)ptr; for (unsigned int i = 0; i < bufSizeDW_; i++) { data[(i * (1024 + 17)) % bufSizeDW_] = ((i + 1) * (1024 + 17)) % bufSizeDW_; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, ptr, 0, NULL, NULL); clFinish(cmd_queue_); } void OCLPerfMemLatency::checkData(cl_mem buffer) { void *ptr = _wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_READ, 0, sizeof(cl_uint), 0, NULL, NULL, &error_); unsigned int *data = (unsigned int *)ptr; if (data[0] != 0) { printf("OutData= 0x%08x\n", data[0]); CHECK_RESULT_NO_RETURN(data[0] != 0, "Data validation failed!\n"); } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, ptr, 0, NULL, NULL); } void OCLPerfMemLatency::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; moreThreads = false; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; inBuffer_ = 0; outBuffer_ = 0; _errorFlag = false; // Reset error code so a single error doesn't prevent // other subtests from running _errorMsg = ""; isAMD_ = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD_ = true; } } delete platforms; } width_ = Sizes[test % NUM_SIZES]; bufSizeDW_ = width_ / sizeof(cl_uint); moreThreads = ((test / NUM_SIZES) % 2) ? true : false; makeVolatile = (test >= 2 * NUM_SIZES) ? true : false; makeRW = (test >= 4 * NUM_SIZES) ? 
true : false; CHECK_RESULT(platform == 0, "Couldn't find OpenCL platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "Failed to allocate devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); device = devices[0]; free(devices); devices = NULL; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); inBuffer_ = _wrapper->clCreateBuffer(context_, 0, width_, NULL, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateBuffer(inBuffer) failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, 0, 1 * sizeof(cl_uint), NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); genShader(); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); std::string args; args.clear(); if (isAMD_) args += " -D USE_FLOAT"; if (makeVolatile) args += " -D MAKEVOLATILE"; if (makeRW) args += " -D MAKERW"; error_ = _wrapper->clBuildProgram(program_, 1, &device, args.c_str(), NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "MemWalker", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel(MemWalker) failed"); kernel2_ = _wrapper->clCreateKernel(program_, "Overhead", &error_); CHECK_RESULT(kernel2_ == 0, "clCreateKernel(Overhead) failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void *)&bufSizeDW_); error_ = _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_uint), (void *)&bufSizeDW_); unsigned int zero = 0; error_ = _wrapper->clSetKernelArg(kernel_, 4, sizeof(cl_uint), (void *)&zero); int bMem = 1; error_ = _wrapper->clSetKernelArg(kernel_, 5, sizeof(cl_int), (void *)&bMem); // Limit the repeats, large buffers will have more samples, but the test runs // for a long time repeats_ = std::max((maxSize_ >> 4) / bufSizeDW_, 1u); error_ = _wrapper->clSetKernelArg(kernel_, 6, sizeof(cl_uint), (void *)&repeats_); error_ = _wrapper->clSetKernelArg(kernel2_, 0, sizeof(cl_mem), (void *)&inBuffer_); error_ = _wrapper->clSetKernelArg(kernel2_, 1, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel2_, 2, sizeof(cl_uint), (void *)&bufSizeDW_); error_ = _wrapper->clSetKernelArg(kernel2_, 3, sizeof(cl_uint), (void *)&bufSizeDW_); error_ = _wrapper->clSetKernelArg(kernel2_, 4, sizeof(cl_uint), (void *)&zero); error_ = _wrapper->clSetKernelArg(kernel2_, 5, sizeof(cl_int), (void *)&bMem); error_ = _wrapper->clSetKernelArg(kernel2_, 6, sizeof(cl_uint), (void *)&repeats_); setData(inBuffer_, (int)1.0f); } void OCLPerfMemLatency::run(void) { int global = 1; int local = 1; if (moreThreads) { if (isAMD_) { global *= 64; local *= 64; } else { global *= 32; local *= 32; } } size_t 
global_work_size[1] = {(size_t)global};
  size_t local_work_size[1] = {(size_t)local};

  // Warm-up
  unsigned int warmup = 128;
  error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint),
                                    (void *)&warmup);
  error_ = _wrapper->clEnqueueNDRangeKernel(
      cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size,
      (const size_t *)local_work_size, 0, NULL, NULL);
  CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed");
  error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint),
                                    (void *)&bufSizeDW_);
  _wrapper->clFinish(cmd_queue_);
  // Restore input buffer when finished as it may have been modified by RW test
  setData(inBuffer_, (int)1.0f);

  CPerfCounter timer, timer2;
  timer.Reset();
  timer.Start();
  error_ = _wrapper->clEnqueueNDRangeKernel(
      cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size,
      (const size_t *)local_work_size, 0, NULL, NULL);
  CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed");
  _wrapper->clFinish(cmd_queue_);
  timer.Stop();
  checkData(outBuffer_);

  timer2.Reset();
  timer2.Start();
  error_ = _wrapper->clEnqueueNDRangeKernel(
      cmd_queue_, kernel2_, 1, NULL, (const size_t *)global_work_size,
      (const size_t *)local_work_size, 0, NULL, NULL);
  CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed");
  _wrapper->clFinish(cmd_queue_);
  timer2.Stop();

  double sec = timer.GetElapsedTime() - timer2.GetElapsedTime();
  // Read latency in ns
  double perf = sec * (double)(1e09) / ((double)bufSizeDW_ * (double)repeats_);
  _perfInfo = (float)perf;
  char buf[256];
  char buf2[32];
  if (makeRW)
    SNPRINTF(buf2, sizeof(buf2), "volatileRW");
  else if (makeVolatile)
    SNPRINTF(buf2, sizeof(buf2), "volatile");
  else
    buf2[0] = '\0';
  SNPRINTF(buf, sizeof(buf), "%10s %2d threads, %8d reads, %5d repeats (ns)",
           buf2, global, bufSizeDW_, repeats_);
  testDescString = buf;
}

unsigned int OCLPerfMemLatency::close(void) {
  _wrapper->clFinish(cmd_queue_);
  if (inBuffer_) {
    error_ = _wrapper->clReleaseMemObject(inBuffer_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                           "clReleaseMemObject(inBuffer_) failed");
  }
  if (outBuffer_) {
    error_ = _wrapper->clReleaseMemObject(outBuffer_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                           "clReleaseMemObject(outBuffer_) failed");
  }
  if (kernel_) {
    error_ = _wrapper->clReleaseKernel(kernel_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed");
  }
  if (kernel2_) {
    error_ = _wrapper->clReleaseKernel(kernel2_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed");
  }
  if (program_) {
    error_ = _wrapper->clReleaseProgram(program_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed");
  }
  if (cmd_queue_) {
    error_ = _wrapper->clReleaseCommandQueue(cmd_queue_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                           "clReleaseCommandQueue failed");
  }
  if (context_) {
    error_ = _wrapper->clReleaseContext(context_);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed");
  }
  return _crcword;
}
clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfMemLatency.h

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy of this
 software and associated documentation files (the "Software"), to deal in the Software
 without restriction, including without limitation the rights to use, copy, modify,
 merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
 permit persons to whom the Software is furnished to do so, subject to the following
 conditions:

 The above copyright notice and this permission notice shall be included in all copies
 or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */

#ifndef _OCL_MEMLATENCY_H_
#define _OCL_MEMLATENCY_H_

#include "OCLTestImp.h"

class OCLPerfMemLatency : public OCLTestImp {
 public:
  OCLPerfMemLatency();
  virtual ~OCLPerfMemLatency();

 public:
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceID);
  virtual void run(void);
  virtual unsigned int close(void);

  std::string shader_;
  void genShader(void);
  void setData(cl_mem buffer, unsigned int data);
  void checkData(cl_mem buffer);

  cl_context context_;
  cl_command_queue cmd_queue_;
  cl_program program_;
  cl_kernel kernel_;
  cl_kernel kernel2_;
  cl_mem inBuffer_;
  cl_mem outBuffer_;
  cl_int error_;
  unsigned int width_;
  unsigned int bufSizeDW_;
  unsigned int repeats_;
  unsigned int maxSize_;
  bool isAMD_;
  bool moreThreads;
  bool makeVolatile;
  bool makeRW;
};

#endif  // _OCL_MEMLATENCY_H_
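Why MemWalker measures latency rather than bandwidth: each load's address is the
value returned by the previous load, so the hardware cannot overlap or prefetch the
accesses, and the (1024 + 17)-element stride written by setData scatters successive
hops across cache lines. A host-side sketch of the same pointer-chase idea follows;
the element count and the clock()-based timing are illustrative assumptions, not part
of the test.

/* Illustrative sketch only: the dependent-load chain behind MemWalker, on the CPU. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
  const unsigned int n = 1u << 20;  // elements; gcd(1041, 2^20) == 1 -> one big cycle
  unsigned int *a = (unsigned int *)malloc(n * sizeof(unsigned int));
  for (unsigned int i = 0; i < n; i++)
    a[(i * (1024 + 17)) % n] = ((i + 1) * (1024 + 17)) % n;  // same fill as setData

  unsigned int x = 0;
  clock_t t0 = clock();
  for (unsigned int i = 0; i < n; i++)
    x = a[x];  // each load depends on the last: pure latency, no overlap
  double ns = 1e9 * (clock() - t0) / CLOCKS_PER_SEC / n;
  printf("~%.1f ns per dependent load (x=%u)\n", ns, x);  // print x so the chain survives optimization
  free(a);
  return 0;
}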
clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfPinnedBufferReadSpeed.cpp

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy of this
 software and associated documentation files (the "Software"), to deal in the Software
 without restriction, including without limitation the rights to use, copy, modify,
 merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
 permit persons to whom the Software is furnished to do so, subject to the following
 conditions:

 The above copyright notice and this permission notice shall be included in all copies
 or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */

#include "OCLPerfPinnedBufferReadSpeed.h"

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "CL/opencl.h"
#include "Timer.h"

// Quiet pesky warnings
#ifdef WIN_OS
#define SNPRINTF sprintf_s
#else
#define SNPRINTF snprintf
#endif

#define NUM_SIZES 8
// 256KB, 1 MB, 4MB, 16 MB
static const unsigned int Sizes[NUM_SIZES] = {
    1024, 4 * 1024, 8 * 1024, 16 * 1024, 262144, 1048576, 4194304, 16777216};
static const unsigned int Iterations[2] = {
    1, OCLPerfPinnedBufferReadSpeed::NUM_ITER};

#define NUM_OFFSETS 2
static const unsigned int offsets[NUM_OFFSETS] = {0, 16};
#define NUM_SUBTESTS (1 + NUM_OFFSETS)

static cl_uint blockedSubtests;

OCLPerfPinnedBufferReadSpeed::OCLPerfPinnedBufferReadSpeed() {
  _numSubTests = NUM_SIZES * NUM_SUBTESTS * 2;
  blockedSubtests = _numSubTests;
  _numSubTests += NUM_SIZES * NUM_SUBTESTS;
}

OCLPerfPinnedBufferReadSpeed::~OCLPerfPinnedBufferReadSpeed() {}

static void CL_CALLBACK notify_callback(const char *errinfo,
                                        const void *private_info, size_t cb,
                                        void *user_data) {}

const char *blkStr[2] = {"n/b", "blk"};

void OCLPerfPinnedBufferReadSpeed::open(unsigned int test, char *units,
                                        double &conversion,
                                        unsigned int deviceId) {
  cl_uint numPlatforms;
  cl_platform_id platform = NULL;
  cl_uint num_devices = 0;
  cl_device_id *devices = NULL;
  cl_device_id device = NULL;
  _crcword = 0;
  conversion = 1.0f;
  _deviceId = deviceId;
  _openTest = test;

  context_ = 0;
  cmd_queue_ = 0;
  inBuffer_ = 0;
  outBuffer_ = 0;
  persistent = false;
  allocHostPtr = false;
  useHostPtr = false;
  hostMem = NULL;
  alignedMem = NULL;
  alignment = 4096;
  isAMD = false;

  error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
  if (0 < numPlatforms) {
    cl_platform_id *platforms = new cl_platform_id[numPlatforms];
    error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL);
    CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
#if 0
    // Get last for default
    platform = platforms[numPlatforms-1];
    for (unsigned i = 0; i < numPlatforms; ++i) {
#endif
    platform = platforms[_platformIndex];
    char pbuf[100];
    error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex],
                                         CL_PLATFORM_VENDOR, sizeof(pbuf),
                                         pbuf, NULL);
    num_devices = 0;
    /* Get the number of requested devices */
    error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0,
                                      NULL, &num_devices);
    // Runtime returns an error when no GPU devices are present instead of just
    // returning 0 devices
    // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed");
    // Choose platform with GPU devices
    if (num_devices > 0) {
      if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) {
        isAMD = true;
      }
      // platform = platforms[_platformIndex];
      // break;
    }
#if 0
    }
#endif
    delete[] platforms;
  }

  /*
   * If we could find our platform, use it. If not, die as we need the AMD
   * platform for these extensions.
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); char getVersion[128]; error_ = _wrapper->clGetPlatformInfo(platform, CL_PLATFORM_VERSION, sizeof(getVersion), getVersion, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); platformVersion[0] = getVersion[7]; platformVersion[1] = getVersion[8]; platformVersion[2] = getVersion[9]; platformVersion[3] = '\0'; bufSize_ = Sizes[_openTest % NUM_SIZES]; if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) > 0) { useHostPtr = true; offset = offsets[((_openTest / NUM_SIZES) % NUM_SUBTESTS) - 1]; } else if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 0) { allocHostPtr = true; } if (_openTest < blockedSubtests) { numIter = Iterations[_openTest / (NUM_SIZES * NUM_SUBTESTS)]; } else { numIter = 4 * OCLPerfPinnedBufferReadSpeed::NUM_ITER / ((_openTest % NUM_SIZES) + 1); } devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_WRITE_ONLY; if (allocHostPtr) { flags |= CL_MEM_ALLOC_HOST_PTR; } else if (useHostPtr) { flags |= CL_MEM_USE_HOST_PTR; hostMem = (char *)malloc(bufSize_ + alignment - 1 + offset); CHECK_RESULT(hostMem == 0, "malloc(hostMem) failed"); alignedMem = (char *)((((intptr_t)hostMem + alignment - 1) & ~(alignment - 1)) + offset); } inBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, 0, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, flags, bufSize_, alignedMem, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); // Force memory to be on GPU if possible { cl_mem memBuffer = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(memBuffer == 0, "clCreateBuffer(memBuffer) failed"); _wrapper->clEnqueueCopyBuffer(cmd_queue_, memBuffer, outBuffer_, 0, 0, bufSize_, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); _wrapper->clEnqueueCopyBuffer(cmd_queue_, memBuffer, inBuffer_, 0, 0, bufSize_, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); _wrapper->clReleaseMemObject(memBuffer); } } void OCLPerfPinnedBufferReadSpeed::run(void) { CPerfCounter timer; void *mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, outBuffer_, CL_TRUE, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); cl_bool blocking = (_openTest < blockedSubtests) ? 
CL_TRUE : CL_FALSE; // Warm up error_ = _wrapper->clEnqueueReadBuffer(cmd_queue_, inBuffer_, CL_TRUE, 0, bufSize_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBuffer failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { error_ = _wrapper->clEnqueueReadBuffer(cmd_queue_, inBuffer_, blocking, 0, bufSize_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBuffer failed"); } if (blocking != CL_TRUE) { _wrapper->clFinish(cmd_queue_); } timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s double perf = ((double)bufSize_ * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char str[256]; if (allocHostPtr) { SNPRINTF(str, sizeof(str), "ALLOC_HOST_PTR (GB/s)"); } else if (useHostPtr) { SNPRINTF(str, sizeof(str), "off: %4d USE_HOST_PTR (GB/s)", offset); } char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) %3s i: %4d %31s ", bufSize_, blkStr[blocking], numIter, str); testDescString = buf; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapMemObject failed"); } unsigned int OCLPerfPinnedBufferReadSpeed::close(void) { _wrapper->clFinish(cmd_queue_); if (inBuffer_) { error_ = _wrapper->clReleaseMemObject(inBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } if (hostMem) { free(hostMem); } return _crcword; } void OCLPerfPinnedBufferReadRectSpeed::run(void) { CPerfCounter timer; void *mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, outBuffer_, CL_TRUE, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); size_t width = static_cast(sqrt(static_cast(bufSize_))); cl_bool blocking = (_openTest < blockedSubtests) ? CL_TRUE : CL_FALSE; size_t bufOrigin[3] = {0, 0, 0}; size_t hostOrigin[3] = {0, 0, 0}; size_t region[3] = {width, width, 1}; // Clamp iteration count to reduce test run time unsigned int testNumIter; testNumIter = (numIter < 100 ? 
numIter : 100); // Skip for 1.0 platforms if ((platformVersion[0] == '1') && (platformVersion[2] == '0')) { testDescString = " SKIPPED "; return; } // Warm up error_ = _wrapper->clEnqueueReadBufferRect( cmd_queue_, inBuffer_, CL_TRUE, bufOrigin, hostOrigin, region, width, 0, width, 0, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBufferRect failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < testNumIter; i++) { error_ = _wrapper->clEnqueueReadBufferRect( cmd_queue_, inBuffer_, blocking, bufOrigin, hostOrigin, region, width, 0, width, 0, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBufferRect failed"); } if (blocking != CL_TRUE) { _wrapper->clFinish(cmd_queue_); } timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s double perf = ((double)bufSize_ * testNumIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char str[256]; if (allocHostPtr) { SNPRINTF(str, sizeof(str), "ALLOC_HOST_PTR (GB/s)"); } else if (useHostPtr) { SNPRINTF(str, sizeof(str), "off: %4d USE_HOST_PTR (GB/s)", offset); } char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) %3s i: %4d %31s ", bufSize_, blkStr[blocking], testNumIter, str); testDescString = buf; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, outBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapMemObject failed"); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfPinnedBufferReadSpeed.h000066400000000000000000000041721450307266000270740ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/

#ifndef _OCL_PinnedBufferReadSpeed_H_
#define _OCL_PinnedBufferReadSpeed_H_

#include "OCLTestImp.h"

class OCLPerfPinnedBufferReadSpeed : public OCLTestImp {
 public:
  OCLPerfPinnedBufferReadSpeed();
  virtual ~OCLPerfPinnedBufferReadSpeed();

 public:
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceID);
  virtual void run(void);
  virtual unsigned int close(void);

  static const unsigned int NUM_ITER = 1000;

  cl_context context_;
  cl_command_queue cmd_queue_;
  cl_mem inBuffer_;
  cl_mem outBuffer_;
  cl_int error_;
  unsigned int bufSize_;
  bool persistent;
  bool allocHostPtr;
  bool useHostPtr;
  unsigned int numIter;
  char* hostMem;
  char* alignedMem;
  size_t alignment;
  unsigned int offset;
  bool isAMD;
  char platformVersion[32];
};

class OCLPerfPinnedBufferReadRectSpeed : public OCLPerfPinnedBufferReadSpeed {
 public:
  OCLPerfPinnedBufferReadRectSpeed() : OCLPerfPinnedBufferReadSpeed() {}

 public:
  virtual void run(void);
};

#endif  // _OCL_PinnedBufferReadSpeed_H_
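The pattern both pinned-read tests exercise is: allocate a CL_MEM_ALLOC_HOST_PTR
staging buffer, map it once to obtain a pre-pinned host pointer, then repeatedly read
a device buffer into that pointer. A condensed sketch of that flow follows; the helper
name and parameters are illustrative, context/queue creation is assumed to have
happened elsewhere, and error checking is elided.

/* Illustrative sketch only, not part of the test sources. */
#include "CL/opencl.h"
#include "Timer.h"  // CPerfCounter, the helper used throughout these tests

static double pinned_read_gbps(cl_context ctx, cl_command_queue queue,
                               size_t bufSize, unsigned int iters) {
  cl_int err;
  cl_mem dev = clCreateBuffer(ctx, 0, bufSize, NULL, &err);
  cl_mem pinned = clCreateBuffer(
      ctx, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, bufSize, NULL, &err);
  // Mapping the ALLOC_HOST_PTR buffer yields a pre-pinned host pointer.
  void *host = clEnqueueMapBuffer(queue, pinned, CL_TRUE, CL_MAP_READ, 0,
                                  bufSize, 0, NULL, NULL, &err);
  // Warm up once, blocking, as the tests above do.
  clEnqueueReadBuffer(queue, dev, CL_TRUE, 0, bufSize, host, 0, NULL, NULL);
  CPerfCounter t;
  t.Reset();
  t.Start();
  for (unsigned int i = 0; i < iters; ++i)
    clEnqueueReadBuffer(queue, dev, CL_FALSE, 0, bufSize, host, 0, NULL, NULL);
  clFinish(queue);
  t.Stop();
  clEnqueueUnmapMemObject(queue, pinned, host, 0, NULL, NULL);
  clReleaseMemObject(pinned);
  clReleaseMemObject(dev);
  return (double)bufSize * iters * 1e-9 / t.GetElapsedTime();
}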
clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfPinnedBufferWriteSpeed.cpp

/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy of this
 software and associated documentation files (the "Software"), to deal in the Software
 without restriction, including without limitation the rights to use, copy, modify,
 merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
 permit persons to whom the Software is furnished to do so, subject to the following
 conditions:

 The above copyright notice and this permission notice shall be included in all copies
 or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */

#include "OCLPerfPinnedBufferWriteSpeed.h"

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "CL/opencl.h"
#include "Timer.h"

// Quiet pesky warnings
#ifdef WIN_OS
#define SNPRINTF sprintf_s
#else
#define SNPRINTF snprintf
#endif

#define NUM_SIZES 8
// 256KB, 1 MB, 4MB, 16 MB
static const unsigned int Sizes[NUM_SIZES] = {
    1024, 4 * 1024, 8 * 1024, 16 * 1024, 262144, 1048576, 4194304, 16777216};

static cl_uint blockedSubtests;
static const unsigned int Iterations[2] = {
    1, OCLPerfPinnedBufferWriteSpeed::NUM_ITER};

#define NUM_OFFSETS 2
static const unsigned int offsets[NUM_OFFSETS] = {0, 16};
#define NUM_SUBTESTS (1 + NUM_OFFSETS)

OCLPerfPinnedBufferWriteSpeed::OCLPerfPinnedBufferWriteSpeed() {
  _numSubTests = NUM_SIZES * NUM_SUBTESTS * 2;
  blockedSubtests = _numSubTests;
  _numSubTests += NUM_SIZES * NUM_SUBTESTS;
}

OCLPerfPinnedBufferWriteSpeed::~OCLPerfPinnedBufferWriteSpeed() {}

static void CL_CALLBACK notify_callback(const char *errinfo,
                                        const void *private_info, size_t cb,
                                        void *user_data) {}

extern const char *blkStr[2];

void OCLPerfPinnedBufferWriteSpeed::open(unsigned int test, char *units,
                                         double &conversion,
                                         unsigned int deviceId) {
  cl_uint numPlatforms;
  cl_platform_id platform = NULL;
  cl_uint num_devices = 0;
  cl_device_id *devices = NULL;
  cl_device_id device = NULL;
  _crcword = 0;
  conversion = 1.0f;
  _deviceId = deviceId;
  _openTest = test;

  context_ = 0;
  cmd_queue_ = 0;
  outBuffer_ = 0;
  persistent = false;
  allocHostPtr = false;
  useHostPtr = false;
  hostMem = NULL;
  alignedMem = NULL;
  alignment = 4096;
  isAMD = false;

  error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
  if (0 < numPlatforms) {
    cl_platform_id *platforms = new cl_platform_id[numPlatforms];
    error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL);
    CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
#if 0
    // Get last for default
    platform = platforms[numPlatforms-1];
    for (unsigned i = 0; i < numPlatforms; ++i) {
#endif
    platform = platforms[_platformIndex];
    char pbuf[100];
    error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex],
                                         CL_PLATFORM_VENDOR, sizeof(pbuf),
                                         pbuf, NULL);
    num_devices = 0;
    /* Get the number of requested devices */
    error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0,
                                      NULL, &num_devices);
    // Runtime returns an error when no GPU devices are present instead of just
    // returning 0 devices
    // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed");
    // Choose platform with GPU devices
    if (num_devices > 0) {
      if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) {
        isAMD = true;
      }
      // platform = platforms[_platformIndex];
      // break;
    }
#if 0
    }
#endif
    delete[] platforms;
  }

  /*
   * If we could find our platform, use it. If not, die as we need the AMD
   * platform for these extensions.
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); char getVersion[128]; error_ = _wrapper->clGetPlatformInfo(platform, CL_PLATFORM_VERSION, sizeof(getVersion), getVersion, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); platformVersion[0] = getVersion[7]; platformVersion[1] = getVersion[8]; platformVersion[2] = getVersion[9]; platformVersion[3] = '\0'; bufSize_ = Sizes[_openTest % NUM_SIZES]; if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) > 0) { useHostPtr = true; offset = offsets[((_openTest / NUM_SIZES) % NUM_SUBTESTS) - 1]; } else if (((_openTest / NUM_SIZES) % NUM_SUBTESTS) == 0) { allocHostPtr = true; } if (_openTest < blockedSubtests) { numIter = Iterations[_openTest / (NUM_SIZES * NUM_SUBTESTS)]; } else { numIter = 4 * OCLPerfPinnedBufferWriteSpeed::NUM_ITER / ((_openTest % NUM_SIZES) + 1); } devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_READ_ONLY; if (allocHostPtr) { flags |= CL_MEM_ALLOC_HOST_PTR; } else if (useHostPtr) { flags |= CL_MEM_USE_HOST_PTR; hostMem = (char *)malloc(bufSize_ + alignment - 1 + offset); CHECK_RESULT(hostMem == 0, "malloc(hostMem) failed"); alignedMem = (char *)((((intptr_t)hostMem + alignment - 1) & ~(alignment - 1)) + offset); } inBuffer_ = _wrapper->clCreateBuffer(context_, flags, bufSize_, alignedMem, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateBuffer(inBuffer) failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, 0, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); // Force memory to be on GPU if possible { cl_mem memBuffer = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(memBuffer == 0, "clCreateBuffer(memBuffer) failed"); _wrapper->clEnqueueCopyBuffer(cmd_queue_, memBuffer, inBuffer_, 0, 0, bufSize_, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); _wrapper->clEnqueueCopyBuffer(cmd_queue_, memBuffer, outBuffer_, 0, 0, bufSize_, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); _wrapper->clReleaseMemObject(memBuffer); } } void OCLPerfPinnedBufferWriteSpeed::run(void) { CPerfCounter timer; void *mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, inBuffer_, CL_TRUE, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); cl_bool blocking = (_openTest < blockedSubtests) ? 
CL_TRUE : CL_FALSE; // Warm up error_ = _wrapper->clEnqueueWriteBuffer(cmd_queue_, outBuffer_, CL_TRUE, 0, bufSize_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueWriteBuffer failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { error_ = _wrapper->clEnqueueWriteBuffer(cmd_queue_, outBuffer_, blocking, 0, bufSize_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueWriteBuffer failed"); } if (blocking != CL_TRUE) { _wrapper->clFinish(cmd_queue_); } timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s double perf = ((double)bufSize_ * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char str[256]; if (allocHostPtr) { SNPRINTF(str, sizeof(str), "ALLOC_HOST_PTR (GB/s)"); } else if (useHostPtr) { SNPRINTF(str, sizeof(str), "off: %4d USE_HOST_PTR (GB/s)", offset); } char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) %3s i: %4d %31s ", bufSize_, blkStr[blocking], numIter, str); testDescString = buf; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, inBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapMemObject failed"); } unsigned int OCLPerfPinnedBufferWriteSpeed::close(void) { _wrapper->clFinish(cmd_queue_); if (inBuffer_) { error_ = _wrapper->clReleaseMemObject(inBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } if (hostMem) { free(hostMem); } return _crcword; } void OCLPerfPinnedBufferWriteRectSpeed::run(void) { CPerfCounter timer; void *mem = _wrapper->clEnqueueMapBuffer(cmd_queue_, inBuffer_, CL_TRUE, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); size_t width = static_cast(sqrt(static_cast(bufSize_))); size_t bufOrigin[3] = {0, 0, 0}; size_t hostOrigin[3] = {0, 0, 0}; size_t region[3] = {width, width, 1}; // Clamp iteration count to reduce test run time unsigned int testNumIter; testNumIter = (numIter < 100 ? numIter : 100); cl_bool blocking = (_openTest < blockedSubtests) ? 
CL_TRUE : CL_FALSE; // Skip for 1.0 platforms if ((platformVersion[0] == '1') && (platformVersion[2] == '0')) { testDescString = " SKIPPED "; return; } // Warm up error_ = _wrapper->clEnqueueWriteBufferRect( cmd_queue_, outBuffer_, CL_TRUE, bufOrigin, hostOrigin, region, width, 0, width, 0, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueReadBufferRect failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < testNumIter; i++) { error_ = _wrapper->clEnqueueWriteBufferRect( cmd_queue_, outBuffer_, blocking, bufOrigin, hostOrigin, region, width, 0, width, 0, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueWriteBufferRect failed"); } if (blocking != CL_TRUE) { _wrapper->clFinish(cmd_queue_); } timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s double perf = ((double)bufSize_ * testNumIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char str[256]; if (allocHostPtr) { SNPRINTF(str, sizeof(str), "ALLOC_HOST_PTR (GB/s)"); } else if (useHostPtr) { SNPRINTF(str, sizeof(str), "off: %4d USE_HOST_PTR (GB/s)", offset); } char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) %3s i: %4d %31s ", bufSize_, blkStr[blocking], testNumIter, str); testDescString = buf; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, inBuffer_, mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapMemObject failed"); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfPinnedBufferWriteSpeed.h000066400000000000000000000042041450307266000273070ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_PinnedBufferWriteSpeed_H_ #define _OCL_PinnedBufferWriteSpeed_H_ #include "OCLTestImp.h" class OCLPerfPinnedBufferWriteSpeed : public OCLTestImp { public: OCLPerfPinnedBufferWriteSpeed(); virtual ~OCLPerfPinnedBufferWriteSpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1000; cl_context context_; cl_command_queue cmd_queue_; cl_mem inBuffer_; cl_mem outBuffer_; cl_int error_; unsigned int bufSize_; bool persistent; bool allocHostPtr; bool useHostPtr; unsigned int numIter; char* hostMem; char* alignedMem; size_t alignment; unsigned int offset; bool isAMD; char platformVersion[32]; }; class OCLPerfPinnedBufferWriteRectSpeed : public OCLPerfPinnedBufferWriteSpeed { public: OCLPerfPinnedBufferWriteRectSpeed() : OCLPerfPinnedBufferWriteSpeed() {} public: virtual void run(void); }; #endif // _OCL_PinnedBufferWriteSpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfPipeCopySpeed.cpp000066400000000000000000000466201450307266000260200ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfPipeCopySpeed.h" #include #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define KERNEL_CODE(...) 
#__VA_ARGS__ const static char * strKernel = { KERNEL_CODE( \n kernel void initPipe(global DATA_TYPE* inBuf, write_only pipe DATA_TYPE outPipe)\n {\n int gid = get_global_id(0);\n write_pipe(outPipe, &inBuf[gid]);\n }\n \n kernel void copyPipe(read_only pipe DATA_TYPE inPipe, write_only pipe DATA_TYPE outPipe)\n {\n DATA_TYPE tmp;\n read_pipe(inPipe, &tmp);\n write_pipe(outPipe, &tmp);\n }\n \n kernel void readPipe(read_only pipe DATA_TYPE inPipe, global DATA_TYPE* outBuf)\n {\n int gid = get_global_id(0);\n DATA_TYPE tmp;\n read_pipe(inPipe, &tmp);\n outBuf[gid] = tmp;\n }\n \n kernel void initPipe_reserve(global DATA_TYPE* inBuf, write_only pipe DATA_TYPE outPipe)\n {\n int gid = get_global_id(0);\n local reserve_id_t resId;\n resId = reserve_write_pipe(outPipe, 1);\n if (is_valid_reserve_id(resId)) {\n write_pipe(outPipe, resId, 0, &inBuf[gid]);\n commit_write_pipe(outPipe, resId);\n }\n }\n \n kernel void copyPipe_reserve(read_only pipe DATA_TYPE inPipe, write_only pipe DATA_TYPE outPipe)\n {\n local reserve_id_t resId;\n resId = reserve_read_pipe(inPipe, 1);\n if (is_valid_reserve_id(resId)) {\n DATA_TYPE tmp;\n read_pipe(inPipe, resId, 0, &tmp);\n commit_read_pipe(inPipe, resId);\n resId = reserve_write_pipe(outPipe, 1);\n if (is_valid_reserve_id(resId)) {\n write_pipe(outPipe, resId, 0, &tmp);\n commit_write_pipe(outPipe, resId);\n }\n }\n }\n \n kernel void readPipe_reserve(read_only pipe DATA_TYPE inPipe, global DATA_TYPE* outBuf)\n {\n int gid = get_global_id(0);\n local reserve_id_t resId;\n resId = reserve_read_pipe(inPipe, 1);\n if (is_valid_reserve_id(resId)) {\n DATA_TYPE tmp;\n read_pipe(inPipe, resId, 0, &tmp);\n commit_read_pipe(inPipe, resId);\n outBuf[gid] = tmp;\n }\n }\n \n kernel void initPipe_wg(global DATA_TYPE* inBuf, write_only pipe DATA_TYPE outPipe)\n {\n int gid = get_global_id(0);\n local reserve_id_t resId;\n resId = work_group_reserve_write_pipe(outPipe, get_local_size(0));\n if (is_valid_reserve_id(resId)) {\n write_pipe(outPipe, resId, get_local_id(0), &inBuf[gid]);\n work_group_commit_write_pipe(outPipe, resId);\n }\n }\n \n kernel void copyPipe_wg(read_only pipe DATA_TYPE inPipe, write_only pipe DATA_TYPE outPipe)\n {\n local reserve_id_t resId;\n resId = work_group_reserve_read_pipe(inPipe, get_local_size(0));\n if (is_valid_reserve_id(resId)) {\n DATA_TYPE tmp;\n read_pipe(inPipe, resId, get_local_id(0), &tmp);\n work_group_commit_read_pipe(inPipe, resId);\n resId = work_group_reserve_write_pipe(outPipe, get_local_size(0));\n if (is_valid_reserve_id(resId)) {\n write_pipe(outPipe, resId, get_local_id(0), &tmp);\n work_group_commit_write_pipe(outPipe, resId);\n }\n }\n }\n \n kernel void readPipe_wg(read_only pipe DATA_TYPE inPipe, global DATA_TYPE* outBuf)\n {\n int gid = get_global_id(0);\n local reserve_id_t resId;\n resId = work_group_reserve_read_pipe(inPipe, get_local_size(0));\n if (is_valid_reserve_id(resId)) {\n DATA_TYPE tmp;\n read_pipe(inPipe, resId, get_local_id(0), &tmp);\n work_group_commit_read_pipe(inPipe, resId);\n outBuf[gid] = tmp;\n }\n }\n \n \x23 ifdef SUBGROUPS\n \x23 pragma OPENCL EXTENSION cl_khr_subgroups : enable\n kernel __attribute__((reqd_work_group_size(64,1,1))) void initPipe_sg(global DATA_TYPE* inBuf, write_only pipe DATA_TYPE outPipe)\n {\n int gid = get_global_id(0);\n local reserve_id_t resId;\n resId = sub_group_reserve_write_pipe(outPipe, get_sub_group_size());\n if (is_valid_reserve_id(resId)) {\n write_pipe(outPipe, resId, get_sub_group_local_id(), &inBuf[gid]);\n sub_group_commit_write_pipe(outPipe, resId);\n 
}\n }\n \n kernel __attribute__((reqd_work_group_size(64,1,1))) void copyPipe_sg(read_only pipe DATA_TYPE inPipe, write_only pipe DATA_TYPE outPipe)\n {\n local reserve_id_t resId;\n resId = sub_group_reserve_read_pipe(inPipe, get_sub_group_size());\n if (is_valid_reserve_id(resId)) {\n DATA_TYPE tmp;\n read_pipe(inPipe, resId, get_sub_group_local_id(), &tmp);\n sub_group_commit_read_pipe(inPipe, resId);\n resId = sub_group_reserve_write_pipe(outPipe, get_sub_group_size());\n if (is_valid_reserve_id(resId)) {\n write_pipe(outPipe, resId, get_sub_group_local_id(), &tmp);\n sub_group_commit_write_pipe(outPipe, resId);\n }\n }\n }\n \n kernel __attribute__((reqd_work_group_size(64,1,1))) void readPipe_sg(read_only pipe DATA_TYPE inPipe, global DATA_TYPE* outBuf)\n {\n int gid = get_global_id(0);\n local reserve_id_t resId;\n resId = sub_group_reserve_read_pipe(inPipe, get_sub_group_size());\n if (is_valid_reserve_id(resId)) {\n DATA_TYPE tmp;\n read_pipe(inPipe, resId, get_sub_group_local_id(), &outBuf[gid]);\n sub_group_commit_read_pipe(inPipe, resId);\n outBuf[gid] = tmp;\n }\n }\n \x23 endif\n \n ) }; #define NUM_SIZES 6 // 4KB, 8KB, 64KB, 256KB, 1 MB, 4MB static const unsigned int Sizes[NUM_SIZES] = {4096, 8192, 65536, 262144, 1048576, 4194304}; #define NUM_TYPES 3 static const char *types[NUM_TYPES] = {"int", "int4", "int16"}; static const unsigned int typeSize[NUM_TYPES] = {4, 16, 64}; #define NUM_TESTS 4 OCLPerfPipeCopySpeed::OCLPerfPipeCopySpeed() { _numSubTests = NUM_TESTS * NUM_SIZES * NUM_TYPES; } OCLPerfPipeCopySpeed::~OCLPerfPipeCopySpeed() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfPipeCopySpeed::setData(cl_mem buffer) { int *mem; int dwTypeSize = (int)(typeSize[typeIdx_]) >> 2; mem = (int *)_wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, CL_TRUE, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); for (int i = 0; i < (int)numElements; i++) { for (int j = 0; j < dwTypeSize; j++) { mem[i * dwTypeSize + j] = i; } } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, (void *)mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); clFinish(cmd_queue_); } void OCLPerfPipeCopySpeed::checkData(cl_mem buffer) { int *mem; int dwTypeSize = (int)(typeSize[typeIdx_]) >> 2; char *histo; histo = (char *)malloc(numElements * sizeof(char)); memset(histo, 0, sizeof(char) * numElements); mem = (int *)_wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, CL_TRUE, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); int errCnt = 0; for (int i = 0; (i < (int)numElements) && (errCnt < 5); i++) { int tmp = mem[dwTypeSize * i]; for (int j = 1; (j < dwTypeSize) && (errCnt < 5); j++) { if (mem[i * dwTypeSize + j] != tmp) { // BAD DATA! 
printf("BAD DATA at element %d, ref %d, got %d\n", i, tmp, mem[i * dwTypeSize + j]); errCnt++; } } if (histo[tmp] == 1) { printf("BAD DATA at element %d, val %d already found!\n", i, tmp); errCnt++; } histo[tmp] = 1; } errCnt = 0; for (int i = 0; (i < (int)numElements) && (errCnt < 5); i++) { if (histo[i] != 1) { printf("BAD DATA at element %d, val not found!\n", i); errCnt++; } } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, (void *)mem, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueUnmapBuffer failed"); clFinish(cmd_queue_); free(histo); } void OCLPerfPipeCopySpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); _crcword = 0; conversion = 1.0f; cl_device_id device = devices_[deviceId]; cmd_queue_ = cmdQueues_[_deviceId]; program_ = 0; initPipe_ = 0; copyPipe_ = 0; readPipe_ = 0; srcBuffer_ = 0; dstBuffer_ = 0; pipe_[0] = 0; pipe_[1] = 0; failed_ = false; subgroupSupport_ = false; bufSize_ = Sizes[test % NUM_SIZES]; typeIdx_ = (test / NUM_SIZES) % NUM_TYPES; testIdx_ = test / (NUM_SIZES * NUM_TYPES); numIter = NUM_ITER; char getVersion[128]; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(getVersion), getVersion, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (getVersion[7] < '2') { failed_ = true; _errorMsg = "OpenCL 2.0 not supported"; return; } srcBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY, bufSize_, NULL, &error_); CHECK_RESULT(srcBuffer_ == 0, "clCreateBuffer(srcBuffer) failed"); numElements = bufSize_ / typeSize[typeIdx_]; char args[100]; #if defined(CL_VERSION_2_0) pipe_[0] = _wrapper->clCreatePipe(context_, CL_MEM_HOST_NO_ACCESS, typeSize[typeIdx_], numElements, NULL, &error_); CHECK_RESULT(pipe_[0] == 0, "clCreatePipe(pipe_[0]) failed"); pipe_[1] = _wrapper->clCreatePipe(context_, CL_MEM_HOST_NO_ACCESS, typeSize[typeIdx_], numElements, NULL, &error_); CHECK_RESULT(pipe_[1] == 0, "clCreatePipe(pipe_[1]) failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); char *p = strstr(charbuf, "cl_khr_subgroups"); if (p) { subgroupSupport_ = true; SNPRINTF(args, sizeof(args), "-cl-std=CL2.0 -D DATA_TYPE=%s -D SUBGROUPS", types[typeIdx_]); } else { if (test >= (NUM_SIZES * NUM_TYPES * 3)) { // No support for subgroups, so skip these tests failed_ = true; _errorMsg = "Subgroup extension not supported"; return; } SNPRINTF(args, sizeof(args), "-cl-std=CL2.0 -D DATA_TYPE=%s", types[typeIdx_]); } #endif dstBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, bufSize_, NULL, &error_); CHECK_RESULT(dstBuffer_ == 0, "clCreateBuffer(dstBuffer) failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &device, args, NULL, NULL); if (error_ != CL_SUCCESS) { printf("\nerror: %d\n", error_); cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } if (testIdx_ == 0) { initPipe_ = _wrapper->clCreateKernel(program_, "initPipe", &error_); CHECK_RESULT(initPipe_ == 0, "clCreateKernel(initPipe) failed"); 
copyPipe_ = _wrapper->clCreateKernel(program_, "copyPipe", &error_); CHECK_RESULT(copyPipe_ == 0, "clCreateKernel(copyPipe) failed"); readPipe_ = _wrapper->clCreateKernel(program_, "readPipe", &error_); CHECK_RESULT(readPipe_ == 0, "clCreateKernel(readPipe) failed"); testName_ = "r/w"; } else if (testIdx_ == 1) { initPipe_ = _wrapper->clCreateKernel(program_, "initPipe_reserve", &error_); CHECK_RESULT(initPipe_ == 0, "clCreateKernel(initPipe) failed"); copyPipe_ = _wrapper->clCreateKernel(program_, "copyPipe_reserve", &error_); CHECK_RESULT(copyPipe_ == 0, "clCreateKernel(copyPipe) failed"); readPipe_ = _wrapper->clCreateKernel(program_, "readPipe_reserve", &error_); CHECK_RESULT(readPipe_ == 0, "clCreateKernel(readPipe) failed"); numIter = 10; // Limit iteration count because this test is very slow testName_ = "r/w w/ reserve"; } else if (testIdx_ == 2) { initPipe_ = _wrapper->clCreateKernel(program_, "initPipe_wg", &error_); CHECK_RESULT(initPipe_ == 0, "clCreateKernel(initPipe) failed"); copyPipe_ = _wrapper->clCreateKernel(program_, "copyPipe_wg", &error_); CHECK_RESULT(copyPipe_ == 0, "clCreateKernel(copyPipe) failed"); readPipe_ = _wrapper->clCreateKernel(program_, "readPipe_wg", &error_); CHECK_RESULT(readPipe_ == 0, "clCreateKernel(readPipe) failed"); testName_ = "wg r/w w/ reserve"; } else if (testIdx_ == 3) { initPipe_ = _wrapper->clCreateKernel(program_, "initPipe_sg", &error_); CHECK_RESULT(initPipe_ == 0, "clCreateKernel(initPipe) failed"); copyPipe_ = _wrapper->clCreateKernel(program_, "copyPipe_sg", &error_); CHECK_RESULT(copyPipe_ == 0, "clCreateKernel(copyPipe) failed"); readPipe_ = _wrapper->clCreateKernel(program_, "readPipe_sg", &error_); CHECK_RESULT(readPipe_ == 0, "clCreateKernel(readPipe) failed"); testName_ = "sg r/w w/ reserve"; } else { CHECK_RESULT(1, "Invalid test index!"); } setData(srcBuffer_); } void OCLPerfPipeCopySpeed::run(void) { if (failed_) return; CPerfCounter timer; size_t global_work_size[1] = {(size_t)numElements}; size_t local_work_size[1] = {64}; error_ = _wrapper->clSetKernelArg(initPipe_, 0, sizeof(cl_mem), (void *)&srcBuffer_); error_ = _wrapper->clSetKernelArg(initPipe_, 1, sizeof(cl_mem), (void *)&pipe_[0]); // Warm up error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, initPipe_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); error_ = _wrapper->clSetKernelArg(copyPipe_, 0, sizeof(cl_mem), (void *)&pipe_[0]); error_ = _wrapper->clSetKernelArg(copyPipe_, 1, sizeof(cl_mem), (void *)&pipe_[1]); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, copyPipe_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < numIter; i++) { error_ = _wrapper->clSetKernelArg(copyPipe_, 0, sizeof(cl_mem), (void *)&pipe_[(i + 1) % 2]); error_ = _wrapper->clSetKernelArg(copyPipe_, 1, sizeof(cl_mem), (void *)&pipe_[i % 2]); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, copyPipe_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); timer.Stop(); // pipe[(numIter-1)%2 has the data error_ = _wrapper->clSetKernelArg(readPipe_, 0, sizeof(cl_mem), (void 
*)&pipe_[(numIter - 1) % 2]); error_ = _wrapper->clSetKernelArg(readPipe_, 1, sizeof(cl_mem), (void *)&dstBuffer_); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, readPipe_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel(readPipe) failed"); error_ = _wrapper->clFinish(cmd_queue_); checkData(dstBuffer_); double sec = timer.GetElapsedTime(); // Pipe copy total bandwidth in GB/s double perf = 2. * ((double)bufSize_ * numIter * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " %17s (%8d bytes) block size: %2d i:%4d (GB/s) ", testName_.c_str(), bufSize_, typeSize[typeIdx_], numIter); testDescString = buf; } unsigned int OCLPerfPipeCopySpeed::close(void) { if (srcBuffer_) { error_ = _wrapper->clReleaseMemObject(srcBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(srcBuffer_) failed"); } if (pipe_[0]) { error_ = _wrapper->clReleaseMemObject(pipe_[0]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(pipe_[0]) failed"); } if (pipe_[1]) { error_ = _wrapper->clReleaseMemObject(pipe_[1]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(pipe_[1]) failed"); } if (dstBuffer_) { error_ = _wrapper->clReleaseMemObject(dstBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(dstBuffer_) failed"); } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfPipeCopySpeed.h000066400000000000000000000037431450307266000254640ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_PipeCopySpeed_H_ #define _OCL_PipeCopySpeed_H_ #include "OCLTestImp.h" class OCLPerfPipeCopySpeed : public OCLTestImp { public: OCLPerfPipeCopySpeed(); virtual ~OCLPerfPipeCopySpeed(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 100; void setData(cl_mem buffer); void checkData(cl_mem buffer); cl_command_queue cmd_queue_; cl_mem srcBuffer_; cl_mem pipe_[2]; cl_mem dstBuffer_; cl_program program_; cl_kernel initPipe_; cl_kernel copyPipe_; cl_kernel readPipe_; unsigned int bufSize_; unsigned int typeIdx_; unsigned int numElements; unsigned int numIter; unsigned int testIdx_; std::string testName_; bool subgroupSupport_; bool failed_; }; #endif // _OCL_PipeCopySpeed_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfProgramGlobalRead.cpp000066400000000000000000000455101450307266000266300ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
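// Background for the generator that follows: OpenCL 2.0 program-scope
// variables must live in the __global (or __constant) address space and
// are visible to every kernel in the program. A hand-written instance of
// the kernel shape genShader() emits (sizes and types illustrative only):
//
//   __global float4 gp[4096];  // program-scope array, read by the kernel
//   __kernel void _ReadSpeed(__global float4 *restrict outBuf,
//                            constant uint *restrict constBuf) {
//     uint i = (uint)get_global_id(0);
//     outBuf[i] = gp[i % constBuf[0]];
//   }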
*/ #include "OCLPerfProgramGlobalRead.h" #include #include #include #include "CL/cl.h" #include "Timer.h" static const unsigned int NUM_SIZES = 4; static const unsigned int NUM_READ_MODES = 6; // Limit to 32 reads for now static const unsigned int MAX_READ_MODES = 4; static const unsigned int NumReads[NUM_READ_MODES] = {1, 4, 16, 32, 64, 128}; // 256KB, 1 MB, 4MB, 16 MB static const unsigned int Sizes[NUM_SIZES] = {262144, 1048576, 4194304, 16777216}; static const unsigned int MaxTypes = 6; static unsigned int NumTypes = MaxTypes; static const char *types[MaxTypes] = {"char", "short", "int", "long", "float", "double"}; static unsigned int StartType = 0; static const unsigned int NumVecWidths = 3; // 5; char8 global scope does not work; bug opened static const char *vecWidths[NumVecWidths] = {"", "2", "4"}; //, "8", "16"}; static const unsigned int vecWidths_int[NumVecWidths] = {1, 2, 4}; //, 8, 16}; static const unsigned int TypeSize[MaxTypes] = { sizeof(cl_char), sizeof(cl_short), sizeof(cl_int), sizeof(cl_long), sizeof(cl_float), sizeof(cl_double)}; #define CHAR_BUF_SIZE 512 // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif void OCLPerfProgramGlobalRead::genShader(unsigned int type, unsigned int vecWidth, unsigned int numReads, unsigned int bufSize) { char buf[CHAR_BUF_SIZE]; shader_.clear(); shader_ += "#ifdef USE_ARENA\n" "#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable\n" "#endif\n"; shader_ += "#ifdef USE_AMD_DOUBLES\n" "#pragma OPENCL EXTENSION cl_amd_fp64 : enable\n" "#endif\n"; shader_ += "#ifdef USE_KHR_DOUBLES\n" "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n" "#endif\n"; SNPRINTF(buf, CHAR_BUF_SIZE, "__global %s%s gp[%d];\n", types[type], vecWidths[vecWidth], bufSize); shader_.append(buf); SNPRINTF(buf, CHAR_BUF_SIZE, "__kernel void __attribute__((reqd_work_group_size(64,1,1))) " "_ReadSpeed(__global %s%s * restrict outBuf, constant uint * " "restrict constBuf)\n", types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += "{\n" " uint i = (uint) get_global_id(0);\n"; if (numReads == 1) { SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += " const unsigned int Max = constBuf[0];\n" " temp = *(gp + i % Max);\n"; shader_ += " *(outBuf + i) = temp;\n" "}\n"; } else { SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp0 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp1 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp2 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp3 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += " const unsigned int Max = constBuf[0];\n" " unsigned int idx0 = (i % Max) + constBuf[1];\n" " unsigned int idx1 = (i % Max) + constBuf[2];\n" " unsigned int idx2 = (i % Max) + constBuf[3];\n" " unsigned int idx3 = (i % Max) + constBuf[4];\n"; for (unsigned int i = 0; i < (numReads >> 2); i++) { shader_ += " temp0 += *(gp + idx0);\n"; shader_ += " temp1 += *(gp + idx1);\n"; shader_ += " temp2 += *(gp + idx2);\n"; shader_ += " temp3 += *(gp + idx3);\n"; shader_ += " idx0 += constBuf[5];\n"; shader_ += " idx1 += constBuf[5];\n"; shader_ += " idx2 += constBuf[5];\n"; shader_ += " idx3 += constBuf[5];\n"; } shader_ += " *(outBuf + i) = temp0 + temp1 + temp2 + temp3;\n" "}\n"; } } static void CL_CALLBACK notify_callback(const char *errinfo, const 
void *private_info, size_t cb, void *user_data) {} OCLPerfProgramGlobalRead::OCLPerfProgramGlobalRead() { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; context_ = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); // Get last for default platform = platforms[numPlatforms - 1]; for (unsigned i = 0; i < numPlatforms; ++i) { char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[i], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of // just returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { platform = platforms[i]; break; } } delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); char *p = strstr(charbuf, "cl_khr_byte_addressable_store"); char *p2 = strstr(charbuf, "cl_khr_fp64"); NumTypes = MaxTypes; if (!p) { // No arena ops NumTypes -= 2; StartType = 2; } if (!p2) { // Doubles not supported NumTypes--; } _numSubTests = NumTypes * NumVecWidths * NUM_SIZES * MAX_READ_MODES; if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } skip_ = false; } OCLPerfProgramGlobalRead::~OCLPerfProgramGlobalRead() {} // Fill with 1s of appropriate type void OCLPerfProgramGlobalRead::setData(cl_mem buffer, float val) { void *ptr = _wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); switch (typeIdx_) { case 0: // char { char *data = (char *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(char)); i++) data[i] = (char)val; break; } case 1: // short { short *data = (short *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(short)); i++) data[i] = (short)val; break; } case 2: // int { int *data = (int *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(int)); i++) data[i] = (int)val; break; } case 3: // long { cl_long *data = (cl_long *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(cl_long)); i++) data[i] = (cl_long)val; break; } case 4: // float { float *data = (float *)ptr; for (unsigned int i = 0; i < (bufSize_ / 
sizeof(float)); i++) data[i] = val; break; } case 5: // double { double *data = (double *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(double)); i++) data[i] = (double)val; break; } default: // oops break; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, ptr, 0, NULL, NULL); } void OCLPerfProgramGlobalRead::checkData(cl_mem buffer) { void *ptr = _wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); switch (typeIdx_) { case 0: // char { char *data = (char *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(char)); i++) { if (data[i] != (char)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } case 1: // short { short *data = (short *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(short)); i++) { if (data[i] != (short)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } case 2: // int { int *data = (int *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(int)); i++) { if (data[i] != (int)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } case 3: // long { cl_long *data = (cl_long *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(cl_long)); i++) { if (data[i] != (cl_long)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } case 4: // float { float *data = (float *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(float)); i++) { if (data[i] != (float)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } case 5: // double { double *data = (double *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(double)); i++) { if (data[i] != (double)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } default: // oops break; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, ptr, 0, NULL, NULL); } void OCLPerfProgramGlobalRead::open(unsigned int test, char *units, double &conversion, unsigned int 
deviceId) { error_ = CL_SUCCESS; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = 0; kernel_ = 0; cmd_queue_ = 0; outBuffer_ = 0; constBuffer_ = 0; #if defined(CL_VERSION_2_0) cl_device_id device; numReads_ = NumReads[test % MAX_READ_MODES]; width_ = Sizes[(test / MAX_READ_MODES) % NUM_SIZES]; vecSizeIdx_ = (test / (MAX_READ_MODES * NUM_SIZES)) % NumVecWidths; typeIdx_ = (test / (MAX_READ_MODES * NUM_SIZES * NumVecWidths)) % NumTypes + StartType; bufSize_ = width_; cmd_queue_ = cmdQueues_[_deviceId]; device = devices_[_deviceId]; outBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); constBuffer_ = _wrapper->clCreateBuffer(context_, 0, 16 * 2, NULL, &error_); CHECK_RESULT(constBuffer_ == 0, "clCreateBuffer(constBuffer) failed"); genShader(typeIdx_, vecSizeIdx_, numReads_, bufSize_ / (TypeSize[typeIdx_] * (1 << vecSizeIdx_))); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); std::string args; args.clear(); if (typeIdx_ < 2) { args += "-D USE_ARENA "; } args += "-cl-std=CL2.0"; error_ = _wrapper->clBuildProgram(program_, 1, &device, args.c_str(), NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "_ReadSpeed", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&constBuffer_); setData(outBuffer_, 1.2345678f); unsigned int *cBuf = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, constBuffer_, true, CL_MAP_WRITE, 0, 16 * 2, 0, NULL, NULL, &error_); // Force all wavefronts to fetch the same data. We are looking for peak speed // here. cBuf[0] = 64; // These values are chosen to assure there is no data reuse within a clause. // If caching is not working, then the uncached numbers will be low. cBuf[1] = 0; cBuf[2] = 64; cBuf[3] = 128; cBuf[4] = 192; cBuf[5] = 0; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, constBuffer_, cBuf, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); #else skip_ = true; testDescString = "Program scope globals not supported for < 2.0 builds. 
Test Skipped."; return; #endif } void OCLPerfProgramGlobalRead::run(void) { if (skip_) { return; } #if defined(CL_VERSION_2_0) int global = bufSize_ / (TypeSize[typeIdx_] * (1 << vecSizeIdx_)); int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < NUM_ITER; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // Program scope global read bandwidth in GB/s double perf = ((double)bufSize_ * numReads_ * NUM_ITER * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; char buf2[256]; SNPRINTF(buf, sizeof(buf), "%s%s", types[typeIdx_], vecWidths[vecSizeIdx_]); SNPRINTF(buf2, sizeof(buf2), " %-8s (%8d) %2d reads: (GB/s) ", buf, width_, numReads_); testDescString = buf2; // checkData(outBuffer_); #endif } unsigned int OCLPerfProgramGlobalRead::close(void) { #if defined(CL_VERSION_2_0) if (cmd_queue_) _wrapper->clFinish(cmd_queue_); if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (constBuffer_) { error_ = _wrapper->clReleaseMemObject(constBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(constBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } #endif return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfProgramGlobalRead.h000066400000000000000000000040421450307266000262700ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_PROGRAMGLOBALREAD_H #define _OCL_PROGRAMGLOBALREAD_H #include "OCLTestImp.h" class OCLPerfProgramGlobalRead : public OCLTestImp { public: OCLPerfProgramGlobalRead(); virtual ~OCLPerfProgramGlobalRead(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void genShader(unsigned int type, unsigned int vecWidth, unsigned int numReads, unsigned int bufSize); void setData(cl_mem buffer, float data); void checkData(cl_mem buffer); static const unsigned int NUM_ITER = 100; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem outBuffer_; cl_mem constBuffer_; unsigned int width_; unsigned int bufSize_; unsigned int vecSizeIdx_; unsigned int numReads_; unsigned int typeIdx_; bool skip_; }; #endif // _OCL_PROGRAMGLOBALREAD_H clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfProgramGlobalWrite.cpp000066400000000000000000000325751450307266000270560ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
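// The write-speed twin of the read test starts here. Its generated kernel
// stores to the program-scope array instead of accumulating from it;
// reconstructed from genShader() below, each unrolled step is:
//
//   *(gp + idx0) = idx0;  *(gp + idx1) = idx1;
//   *(gp + idx2) = idx2;  *(gp + idx3) = idx3;
//   idx0 += constBuf[5];  idx1 += constBuf[5];
//   idx2 += constBuf[5];  idx3 += constBuf[5];
//
// genShader() also emits a __dummyRead kernel that copies gp[] out to a
// buffer. run() never launches it, plausibly so the compiler cannot treat
// the program-scope array as write-only dead storage.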
*/ #include "OCLPerfProgramGlobalWrite.h" #include #include #include #include "CL/cl.h" #include "Timer.h" static const unsigned int NUM_SIZES = 4; static const unsigned int NUM_READ_MODES = 6; // Limit to 32 reads for now static const unsigned int MAX_READ_MODES = 4; static const unsigned int NumReads[NUM_READ_MODES] = {1, 4, 16, 32, 64, 128}; // 256KB, 1 MB, 4MB, 16 MB static const unsigned int Sizes[NUM_SIZES] = {262144, 1048576, 4194304, 16777216}; static const unsigned int MaxTypes = 6; static unsigned int NumTypes = MaxTypes; static const char *types[MaxTypes] = {"char", "short", "int", "long", "float", "double"}; static unsigned int StartType = 0; static const unsigned int NumVecWidths = 3; // 5; char8 global scope does not work; bug opened static const char *vecWidths[NumVecWidths] = {"", "2", "4"}; //, "8", "16"}; static const unsigned int vecWidths_int[NumVecWidths] = {1, 2, 4}; //, 8, 16}; static const unsigned int TypeSize[MaxTypes] = { sizeof(cl_char), sizeof(cl_short), sizeof(cl_int), sizeof(cl_long), sizeof(cl_float), sizeof(cl_double)}; #define CHAR_BUF_SIZE 512 // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif void OCLPerfProgramGlobalWrite::genShader(unsigned int type, unsigned int vecWidth, unsigned int numReads, unsigned int bufSize) { char buf[CHAR_BUF_SIZE]; shader_.clear(); shader_ += "#ifdef USE_ARENA\n" "#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable\n" "#endif\n"; shader_ += "#ifdef USE_AMD_DOUBLES\n" "#pragma OPENCL EXTENSION cl_amd_fp64 : enable\n" "#endif\n"; shader_ += "#ifdef USE_KHR_DOUBLES\n" "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n" "#endif\n"; SNPRINTF(buf, CHAR_BUF_SIZE, "__global %s%s gp[%d];\n", types[type], vecWidths[vecWidth], bufSize); shader_.append(buf); SNPRINTF(buf, CHAR_BUF_SIZE, "__kernel void __attribute__((reqd_work_group_size(64,1,1))) " "_WriteSpeed(constant uint * restrict constBuf)\n"); shader_.append(buf); shader_ += "{\n" " uint i = (uint) get_global_id(0);\n"; if (numReads == 1) { SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += " const unsigned int Max = constBuf[0];\n"; shader_ += " *(gp + i % Max) = 0;\n" "}\n"; } else { SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp0 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp1 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp2 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp3 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += " const unsigned int Max = constBuf[0];\n" " unsigned int idx0 = (i % Max) + constBuf[1];\n" " unsigned int idx1 = (i % Max) + constBuf[2];\n" " unsigned int idx2 = (i % Max) + constBuf[3];\n" " unsigned int idx3 = (i % Max) + constBuf[4];\n"; for (unsigned int i = 0; i < (numReads >> 2); i++) { shader_ += " *(gp + idx0) = idx0;\n"; shader_ += " *(gp + idx1) = idx1;\n"; shader_ += " *(gp + idx2) = idx2;\n"; shader_ += " *(gp + idx3) = idx3;\n"; shader_ += " idx0 += constBuf[5];\n"; shader_ += " idx1 += constBuf[5];\n"; shader_ += " idx2 += constBuf[5];\n"; shader_ += " idx3 += constBuf[5];\n"; } shader_ += "}\n"; } SNPRINTF(buf, CHAR_BUF_SIZE, "__kernel void __dummyRead(global %s%s *in)\n", types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += "{\n" " uint i = (uint) get_global_id(0);\n"; SNPRINTF(buf, CHAR_BUF_SIZE, " in[i] = 
gp[i];\n"); shader_.append(buf); shader_ += "}\n"; } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} OCLPerfProgramGlobalWrite::OCLPerfProgramGlobalWrite() { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; context_ = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); // Get last for default platform = platforms[numPlatforms - 1]; for (unsigned i = 0; i < numPlatforms; ++i) { char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[i], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of // just returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { platform = platforms[i]; break; } } delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); char *p = strstr(charbuf, "cl_khr_byte_addressable_store"); char *p2 = strstr(charbuf, "cl_khr_fp64"); NumTypes = MaxTypes; if (!p) { // No arena ops NumTypes -= 2; StartType = 2; } if (!p2) { // Doubles not supported NumTypes--; } _numSubTests = NumTypes * NumVecWidths * NUM_SIZES * MAX_READ_MODES; if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } skip_ = false; } OCLPerfProgramGlobalWrite::~OCLPerfProgramGlobalWrite() {} void OCLPerfProgramGlobalWrite::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { error_ = CL_SUCCESS; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = 0; kernel_ = 0; cmd_queue_ = 0; outBuffer_ = 0; constBuffer_ = 0; #if defined(CL_VERSION_2_0) cl_device_id device; numReads_ = NumReads[test % MAX_READ_MODES]; width_ = Sizes[(test / MAX_READ_MODES) % NUM_SIZES]; vecSizeIdx_ = (test / (MAX_READ_MODES * NUM_SIZES)) % NumVecWidths; typeIdx_ = (test / (MAX_READ_MODES * NUM_SIZES * NumVecWidths)) % NumTypes + StartType; bufSize_ = width_; cmd_queue_ = cmdQueues_[_deviceId]; device = devices_[_deviceId]; outBuffer_ = _wrapper->clCreateBuffer(context_, 0, 
bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); constBuffer_ = _wrapper->clCreateBuffer(context_, 0, 16 * 2, NULL, &error_); CHECK_RESULT(constBuffer_ == 0, "clCreateBuffer(constBuffer) failed"); genShader(typeIdx_, vecSizeIdx_, numReads_, bufSize_ / (TypeSize[typeIdx_] * (1 << vecSizeIdx_))); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); std::string args; args.clear(); if (typeIdx_ < 2) { args += "-D USE_ARENA "; } args += "-cl-std=CL2.0"; error_ = _wrapper->clBuildProgram(program_, 1, &device, args.c_str(), NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "_WriteSpeed", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&constBuffer_); unsigned int *cBuf = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, constBuffer_, true, CL_MAP_WRITE, 0, 16 * 2, 0, NULL, NULL, &error_); // Force all wavefronts to fetch the same data. We are looking for peak speed // here. cBuf[0] = 64; // These values are chosen to assure there is no data reuse within a clause. // If caching is not working, then the uncached numbers will be low. cBuf[1] = 0; cBuf[2] = 64; cBuf[3] = 128; cBuf[4] = 192; cBuf[5] = 0; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, constBuffer_, cBuf, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); #else skip_ = true; testDescString = "Program scope globals not supported for < 2.0 builds. 
Test Skipped."; return; #endif } void OCLPerfProgramGlobalWrite::run(void) { if (skip_) { return; } #if defined(CL_VERSION_2_0) int global = bufSize_ / (TypeSize[typeIdx_] * (1 << vecSizeIdx_)); int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < NUM_ITER; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // Program scope global write bandwidth in GB/s double perf = ((double)bufSize_ * numReads_ * NUM_ITER * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; char buf2[256]; SNPRINTF(buf, sizeof(buf), "%s%s", types[typeIdx_], vecWidths[vecSizeIdx_]); SNPRINTF(buf2, sizeof(buf2), " %-8s (%8d) %2d reads: (GB/s) ", buf, width_, numReads_); testDescString = buf2; #endif } unsigned int OCLPerfProgramGlobalWrite::close(void) { #if defined(CL_VERSION_2_0) if (cmd_queue_) _wrapper->clFinish(cmd_queue_); if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (constBuffer_) { error_ = _wrapper->clReleaseMemObject(constBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(constBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } #endif return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfProgramGlobalWrite.h000066400000000000000000000037371450307266000265210ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_PROGRAMGLOBALWRITE_H_ #define _OCL_PROGRAMGLOBALWRITE_H_ #include "OCLTestImp.h" class OCLPerfProgramGlobalWrite : public OCLTestImp { public: OCLPerfProgramGlobalWrite(); virtual ~OCLPerfProgramGlobalWrite(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void genShader(unsigned int type, unsigned int vecWidth, unsigned int numReads, unsigned int bufSize); static const unsigned int NUM_ITER = 100; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem outBuffer_; cl_mem constBuffer_; unsigned int width_; unsigned int bufSize_; unsigned int vecSizeIdx_; unsigned int numReads_; unsigned int typeIdx_; bool skip_; }; #endif // _OCL_PROGRAMGLOBALWRITE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSHA256.cpp000066400000000000000000000707231450307266000241600ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
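// The kernel strings below implement the standard FIPS 180-4 SHA-256
// compression function. The R(t) macro encodes the message-schedule
// recurrence, which in plain C (uint32_t W[64], t in 16..63) reads:
//
//   #define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
//   uint32_t s0 = ROTR(W[t - 15], 7) ^ ROTR(W[t - 15], 18) ^ (W[t - 15] >> 3);
//   uint32_t s1 = ROTR(W[t - 2], 17) ^ ROTR(W[t - 2], 19) ^ (W[t - 2] >> 10);
//   W[t] = s1 + W[t - 7] + s0 + W[t - 16];
//
// Each P(...) line in the kernel performs one of the 64 rounds with the
// corresponding round constant.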
*/ #include "OCLPerfSHA256.h" #include #include #include #include "CL/cl.h" #include "Timer.h" static const char *sha256_kernel = "typedef uint UINT;\n" "\n" "#define VECTOR_LEN 1\n" "\n" "#ifdef LITTLE_E\n" "\n" "inline UINT byteswap(UINT x)\n" "{\n" " UINT res = 0;\n" " \n" " for (uint i=0; i<4; i++)\n" " {\n" " res <<= 8;\n" " res |= (x & 0xff);\n" " x >>= 8;\n" " }\n" " \n" " return res;\n" "}\n" "\n" "#else\n" "\n" "inline UINT byteswap(const UINT x)\n" "{\n" " return x;\n" "}\n" "\n" "#endif\n" "\n" "\n" "void sha256_step( const UINT data[16], UINT *state )\n" "{\n" " UINT W[64], temp1, temp2;\n" " UINT A, B, C, D, E, F, G, H;\n" "\n" " for( int i = 0; i < 16; i++)\n" " {\n" " W[i] = byteswap(data[i]);\n" " }\n" "\n" "#define SHR(x,n) ((x & 0xFFFFFFFF) >> n)\n" "#define ROTR(x,n) (SHR(x,n) | (x << (32 - n)))\n" "\n" "#define S0(x) (ROTR(x, 7) ^ ROTR(x,18) ^ SHR(x, 3))\n" "#define S1(x) (ROTR(x,17) ^ ROTR(x,19) ^ SHR(x,10))\n" "\n" "#define S2(x) (ROTR(x, 2) ^ ROTR(x,13) ^ ROTR(x,22))\n" "#define S3(x) (ROTR(x, 6) ^ ROTR(x,11) ^ ROTR(x,25))\n" "\n" "#define F0(x,y,z) ((x & y) | (z & (x | y)))\n" "#define F1(x,y,z) (z ^ (x & (y ^ z)))\n" "\n" "#define R(t) \\\n" "( \\\n" " W[t] = S1(W[t - 2]) + W[t - 7] + \\\n" " S0(W[t - 15]) + W[t - 16] \\\n" ")\n" "\n" "#define P(a,b,c,d,e,f,g,h,x,K) \\\n" "{ \\\n" " temp1 = h + S3(e) + F1(e,f,g) + K + x; \\\n" " temp2 = S2(a) + F0(a,b,c); \\\n" " d += temp1; h = temp1 + temp2; \\\n" "}\n" "\n" " A = state[0];\n" " B = state[1];\n" " C = state[2];\n" " D = state[3];\n" " E = state[4];\n" " F = state[5];\n" " G = state[6];\n" " H = state[7];\n" "\n" " P( A, B, C, D, E, F, G, H, W[ 0], 0x428A2F98 );\n" " P( H, A, B, C, D, E, F, G, W[ 1], 0x71374491 );\n" " P( G, H, A, B, C, D, E, F, W[ 2], 0xB5C0FBCF );\n" " P( F, G, H, A, B, C, D, E, W[ 3], 0xE9B5DBA5 );\n" " P( E, F, G, H, A, B, C, D, W[ 4], 0x3956C25B );\n" " P( D, E, F, G, H, A, B, C, W[ 5], 0x59F111F1 );\n" " P( C, D, E, F, G, H, A, B, W[ 6], 0x923F82A4 );\n" " P( B, C, D, E, F, G, H, A, W[ 7], 0xAB1C5ED5 );\n" " P( A, B, C, D, E, F, G, H, W[ 8], 0xD807AA98 );\n" " P( H, A, B, C, D, E, F, G, W[ 9], 0x12835B01 );\n" " P( G, H, A, B, C, D, E, F, W[10], 0x243185BE );\n" " P( F, G, H, A, B, C, D, E, W[11], 0x550C7DC3 );\n" " P( E, F, G, H, A, B, C, D, W[12], 0x72BE5D74 );\n" " P( D, E, F, G, H, A, B, C, W[13], 0x80DEB1FE );\n" " P( C, D, E, F, G, H, A, B, W[14], 0x9BDC06A7 );\n" " P( B, C, D, E, F, G, H, A, W[15], 0xC19BF174 );\n" " P( A, B, C, D, E, F, G, H, R(16), 0xE49B69C1 );\n" " P( H, A, B, C, D, E, F, G, R(17), 0xEFBE4786 );\n" " P( G, H, A, B, C, D, E, F, R(18), 0x0FC19DC6 );\n" " P( F, G, H, A, B, C, D, E, R(19), 0x240CA1CC );\n" " P( E, F, G, H, A, B, C, D, R(20), 0x2DE92C6F );\n" " P( D, E, F, G, H, A, B, C, R(21), 0x4A7484AA );\n" " P( C, D, E, F, G, H, A, B, R(22), 0x5CB0A9DC );\n" " P( B, C, D, E, F, G, H, A, R(23), 0x76F988DA );\n" " P( A, B, C, D, E, F, G, H, R(24), 0x983E5152 );\n" " P( H, A, B, C, D, E, F, G, R(25), 0xA831C66D );\n" " P( G, H, A, B, C, D, E, F, R(26), 0xB00327C8 );\n" " P( F, G, H, A, B, C, D, E, R(27), 0xBF597FC7 );\n" " P( E, F, G, H, A, B, C, D, R(28), 0xC6E00BF3 );\n" " P( D, E, F, G, H, A, B, C, R(29), 0xD5A79147 );\n" " P( C, D, E, F, G, H, A, B, R(30), 0x06CA6351 );\n" " P( B, C, D, E, F, G, H, A, R(31), 0x14292967 );\n" " P( A, B, C, D, E, F, G, H, R(32), 0x27B70A85 );\n" " P( H, A, B, C, D, E, F, G, R(33), 0x2E1B2138 );\n" " P( G, H, A, B, C, D, E, F, R(34), 0x4D2C6DFC );\n" " P( F, G, H, A, B, C, D, E, R(35), 0x53380D13 );\n" " P( E, F, G, H, A, B, C, D, 
R(36), 0x650A7354 );\n" " P( D, E, F, G, H, A, B, C, R(37), 0x766A0ABB );\n" " P( C, D, E, F, G, H, A, B, R(38), 0x81C2C92E );\n" " P( B, C, D, E, F, G, H, A, R(39), 0x92722C85 );\n" " P( A, B, C, D, E, F, G, H, R(40), 0xA2BFE8A1 );\n" " P( H, A, B, C, D, E, F, G, R(41), 0xA81A664B );\n" " P( G, H, A, B, C, D, E, F, R(42), 0xC24B8B70 );\n" " P( F, G, H, A, B, C, D, E, R(43), 0xC76C51A3 );\n" " P( E, F, G, H, A, B, C, D, R(44), 0xD192E819 );\n" " P( D, E, F, G, H, A, B, C, R(45), 0xD6990624 );\n" " P( C, D, E, F, G, H, A, B, R(46), 0xF40E3585 );\n" " P( B, C, D, E, F, G, H, A, R(47), 0x106AA070 );\n" " P( A, B, C, D, E, F, G, H, R(48), 0x19A4C116 );\n" " P( H, A, B, C, D, E, F, G, R(49), 0x1E376C08 );\n" " P( G, H, A, B, C, D, E, F, R(50), 0x2748774C );\n" " P( F, G, H, A, B, C, D, E, R(51), 0x34B0BCB5 );\n" " P( E, F, G, H, A, B, C, D, R(52), 0x391C0CB3 );\n" " P( D, E, F, G, H, A, B, C, R(53), 0x4ED8AA4A );\n" " P( C, D, E, F, G, H, A, B, R(54), 0x5B9CCA4F );\n" " P( B, C, D, E, F, G, H, A, R(55), 0x682E6FF3 );\n" " P( A, B, C, D, E, F, G, H, R(56), 0x748F82EE );\n" " P( H, A, B, C, D, E, F, G, R(57), 0x78A5636F );\n" " P( G, H, A, B, C, D, E, F, R(58), 0x84C87814 );\n" " P( F, G, H, A, B, C, D, E, R(59), 0x8CC70208 );\n" " P( E, F, G, H, A, B, C, D, R(60), 0x90BEFFFA );\n" " P( D, E, F, G, H, A, B, C, R(61), 0xA4506CEB );\n" " P( C, D, E, F, G, H, A, B, R(62), 0xBEF9A3F7 );\n" " P( B, C, D, E, F, G, H, A, R(63), 0xC67178F2 );\n" "\n" " state[0] += A;\n" " state[1] += B;\n" " state[2] += C;\n" " state[3] += D;\n" " state[4] += E;\n" " state[5] += F;\n" " state[6] += G;\n" " state[7] += H;\n" "}\n" "\n" "\n" "#define choose_temp(x) ((x)/16)\n" "\n" "#define STORE_TO_TEMP(i) tb[((i)/16)][((i)%16)]\n" "\n" "\n" "__kernel void CryptThread(__global const uint *buffer, __global uint " "*state, const uint blockLen, const uint foo)\n" "{\n" " const uint init[8] = {\n" " 0x6a09e667,\n" " 0xbb67ae85,\n" " 0x3c6ef372,\n" " 0xa54ff53a,\n" " 0x510e527f,\n" " 0x9b05688c,\n" " 0x1f83d9ab,\n" " 0x5be0cd19\n" " };\n" " \n" " const uint id = get_global_id(0);\n" " uint len = blockLen;\n" " uint i, j;\n" " const uint startPosInDWORDs = (len*id*foo)/4;\n" " const uint msgLenInBitsl = len * 8;\n" " const uint msgLenInBitsh = (len) >> (32-3);\n" " UINT localState[8];\n" "\n" " for (j=0; j<8; j++) {\n" " localState[j] = init[j];\n" " }\n" "\n" " i = 0;\n" " while (len >=64)\n" " {\n" " UINT data[16];\n" " for (j=0; j<16; j++) {\n" " data[j] = buffer[j + startPosInDWORDs + i];\n" " }\n" "\n" " sha256_step(data, localState);\n" " i += 16;\n" " len -= 64;\n" " }\n" "\n" " len /= 4;\n" "\n" " UINT tb[2][16];\n" "\n" " for (j=0; j>= 8;\n" " }\n" " \n" " return res;\n" "}\n" "\n" "#else\n" "\n" "inline UINT byteswap(const UINT x)\n" "{\n" " return x;\n" "}\n" "\n" "#endif\n" "\n" "\n" "void sha256_step( const UINT data[16], UINT *state )\n" "{\n" " UINT W[64], temp1, temp2;\n" " UINT A, B, C, D, E, F, G, H;\n" "\n" " for( int i = 0; i < 16; i++)\n" " {\n" " W[i] = byteswap(data[i]);\n" " }\n" "\n" "#define SHR(x,n) ((x & 0xFFFFFFFF) >> n)\n" "#define ROTR(x,n) (SHR(x,n) | (x << (32 - n)))\n" "\n" "#define S0(x) (ROTR(x, 7) ^ ROTR(x,18) ^ SHR(x, 3))\n" "#define S1(x) (ROTR(x,17) ^ ROTR(x,19) ^ SHR(x,10))\n" "\n" "#define S2(x) (ROTR(x, 2) ^ ROTR(x,13) ^ ROTR(x,22))\n" "#define S3(x) (ROTR(x, 6) ^ ROTR(x,11) ^ ROTR(x,25))\n" "\n" "#define F0(x,y,z) ((x & y) | (z & (x | y)))\n" "#define F1(x,y,z) (z ^ (x & (y ^ z)))\n" "\n" "#define R(t) \\\n" "( \\\n" " W[t] = S1(W[t - 2]) + W[t - 7] + \\\n" " S0(W[t - 15]) + W[t - 16] 
\\\n" ")\n" "\n" "#define P(a,b,c,d,e,f,g,h,x,K) \\\n" "{ \\\n" " temp1 = h + S3(e) + F1(e,f,g) + K + x; \\\n" " temp2 = S2(a) + F0(a,b,c); \\\n" " d += temp1; h = temp1 + temp2; \\\n" "}\n" "\n" " A = state[0];\n" " B = state[1];\n" " C = state[2];\n" " D = state[3];\n" " E = state[4];\n" " F = state[5];\n" " G = state[6];\n" " H = state[7];\n" "\n" " P( A, B, C, D, E, F, G, H, W[ 0], 0x428A2F98 );\n" " P( H, A, B, C, D, E, F, G, W[ 1], 0x71374491 );\n" " P( G, H, A, B, C, D, E, F, W[ 2], 0xB5C0FBCF );\n" " P( F, G, H, A, B, C, D, E, W[ 3], 0xE9B5DBA5 );\n" " P( E, F, G, H, A, B, C, D, W[ 4], 0x3956C25B );\n" " P( D, E, F, G, H, A, B, C, W[ 5], 0x59F111F1 );\n" " P( C, D, E, F, G, H, A, B, W[ 6], 0x923F82A4 );\n" " P( B, C, D, E, F, G, H, A, W[ 7], 0xAB1C5ED5 );\n" " P( A, B, C, D, E, F, G, H, W[ 8], 0xD807AA98 );\n" " P( H, A, B, C, D, E, F, G, W[ 9], 0x12835B01 );\n" " P( G, H, A, B, C, D, E, F, W[10], 0x243185BE );\n" " P( F, G, H, A, B, C, D, E, W[11], 0x550C7DC3 );\n" " P( E, F, G, H, A, B, C, D, W[12], 0x72BE5D74 );\n" " P( D, E, F, G, H, A, B, C, W[13], 0x80DEB1FE );\n" " P( C, D, E, F, G, H, A, B, W[14], 0x9BDC06A7 );\n" " P( B, C, D, E, F, G, H, A, W[15], 0xC19BF174 );\n" " P( A, B, C, D, E, F, G, H, R(16), 0xE49B69C1 );\n" " P( H, A, B, C, D, E, F, G, R(17), 0xEFBE4786 );\n" " P( G, H, A, B, C, D, E, F, R(18), 0x0FC19DC6 );\n" " P( F, G, H, A, B, C, D, E, R(19), 0x240CA1CC );\n" " P( E, F, G, H, A, B, C, D, R(20), 0x2DE92C6F );\n" " P( D, E, F, G, H, A, B, C, R(21), 0x4A7484AA );\n" " P( C, D, E, F, G, H, A, B, R(22), 0x5CB0A9DC );\n" " P( B, C, D, E, F, G, H, A, R(23), 0x76F988DA );\n" " P( A, B, C, D, E, F, G, H, R(24), 0x983E5152 );\n" " P( H, A, B, C, D, E, F, G, R(25), 0xA831C66D );\n" " P( G, H, A, B, C, D, E, F, R(26), 0xB00327C8 );\n" " P( F, G, H, A, B, C, D, E, R(27), 0xBF597FC7 );\n" " P( E, F, G, H, A, B, C, D, R(28), 0xC6E00BF3 );\n" " P( D, E, F, G, H, A, B, C, R(29), 0xD5A79147 );\n" " P( C, D, E, F, G, H, A, B, R(30), 0x06CA6351 );\n" " P( B, C, D, E, F, G, H, A, R(31), 0x14292967 );\n" " P( A, B, C, D, E, F, G, H, R(32), 0x27B70A85 );\n" " P( H, A, B, C, D, E, F, G, R(33), 0x2E1B2138 );\n" " P( G, H, A, B, C, D, E, F, R(34), 0x4D2C6DFC );\n" " P( F, G, H, A, B, C, D, E, R(35), 0x53380D13 );\n" " P( E, F, G, H, A, B, C, D, R(36), 0x650A7354 );\n" " P( D, E, F, G, H, A, B, C, R(37), 0x766A0ABB );\n" " P( C, D, E, F, G, H, A, B, R(38), 0x81C2C92E );\n" " P( B, C, D, E, F, G, H, A, R(39), 0x92722C85 );\n" " P( A, B, C, D, E, F, G, H, R(40), 0xA2BFE8A1 );\n" " P( H, A, B, C, D, E, F, G, R(41), 0xA81A664B );\n" " P( G, H, A, B, C, D, E, F, R(42), 0xC24B8B70 );\n" " P( F, G, H, A, B, C, D, E, R(43), 0xC76C51A3 );\n" " P( E, F, G, H, A, B, C, D, R(44), 0xD192E819 );\n" " P( D, E, F, G, H, A, B, C, R(45), 0xD6990624 );\n" " P( C, D, E, F, G, H, A, B, R(46), 0xF40E3585 );\n" " P( B, C, D, E, F, G, H, A, R(47), 0x106AA070 );\n" " P( A, B, C, D, E, F, G, H, R(48), 0x19A4C116 );\n" " P( H, A, B, C, D, E, F, G, R(49), 0x1E376C08 );\n" " P( G, H, A, B, C, D, E, F, R(50), 0x2748774C );\n" " P( F, G, H, A, B, C, D, E, R(51), 0x34B0BCB5 );\n" " P( E, F, G, H, A, B, C, D, R(52), 0x391C0CB3 );\n" " P( D, E, F, G, H, A, B, C, R(53), 0x4ED8AA4A );\n" " P( C, D, E, F, G, H, A, B, R(54), 0x5B9CCA4F );\n" " P( B, C, D, E, F, G, H, A, R(55), 0x682E6FF3 );\n" " P( A, B, C, D, E, F, G, H, R(56), 0x748F82EE );\n" " P( H, A, B, C, D, E, F, G, R(57), 0x78A5636F );\n" " P( G, H, A, B, C, D, E, F, R(58), 0x84C87814 );\n" " P( F, G, H, A, B, C, D, E, R(59), 0x8CC70208 );\n" " P( E, F, G, H, A, 
B, C, D, R(60), 0x90BEFFFA );\n" " P( D, E, F, G, H, A, B, C, R(61), 0xA4506CEB );\n" " P( C, D, E, F, G, H, A, B, R(62), 0xBEF9A3F7 );\n" " P( B, C, D, E, F, G, H, A, R(63), 0xC67178F2 );\n" "\n" " state[0] += A;\n" " state[1] += B;\n" " state[2] += C;\n" " state[3] += D;\n" " state[4] += E;\n" " state[5] += F;\n" " state[6] += G;\n" " state[7] += H;\n" "}\n" "\n" "\n" "#define choose_temp(x) ((x)/16)\n" "\n" "#define STORE_TO_TEMP(i) tb[((i)/16)][((i)%16)]\n" "\n" "#define WAVEFRONT_SIZE 64\n" "\n" "__kernel void CryptThread(__global const uint *buffer, __global uint " "*state, const uint blockLen, const uint foo)\n" "{\n" " const uint init[8] = {\n" " 0x6a09e667,\n" " 0xbb67ae85,\n" " 0x3c6ef372,\n" " 0xa54ff53a,\n" " 0x510e527f,\n" " 0x9b05688c,\n" " 0x1f83d9ab,\n" " 0x5be0cd19\n" " };\n" " \n" " const uint id = get_global_id(0);\n" " const uint lid = get_local_id(0);\n" " uint len = blockLen;\n" " uint i, j;\n" " const uint startPosInDWORDs = (len*id*foo)/4;\n" "uint blockStartInDWORDs = (len*(id / WAVEFRONT_SIZE)*WAVEFRONT_SIZE)/4;\n" " const uint msgLenInBitsl = len * 8;\n" " const uint msgLenInBitsh = (len) >> (32-3);\n" " UINT localState[8];\n" "\n" " for (j=0; j<8; j++) {\n" " localState[j] = init[j];\n" " }\n" "\n" " i = 0;\n" " while (len >=64)\n" " {\n" " UINT data[16];\n" " for (j=0; j<16; j++) {\n" " //data[j] = buffer[j + startPosInDWORDs + i];\n" " data[j] = buffer[j*WAVEFRONT_SIZE + blockStartInDWORDs " "+ i*WAVEFRONT_SIZE + lid];\n" " }\n" "\n" " sha256_step(data, localState);\n" " i += 16;\n" " len -= 64;\n" " }\n" "\n" " len /= 4;\n" "\n" " UINT tb[2][16];\n" "\n" " for (j=0; jclEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); if (error_ != CL_SUCCESS) { printf("\nError code : %d\n", error_); } else { for (unsigned int i = 0; i < width_; i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); if (error_ == CL_SUCCESS) retVal = true; } return retVal; } void OCLPerfSHA256::checkData(cl_mem buffer) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < width_; i++) { } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfSHA256::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; inBuffer_ = 0; outBuffer_ = 0; num_input_buf_ = 1; num_output_buf_ = 1; blockSize_ = 1024; isAMD = false; width_ = 22347776; // We compute a square domain bufSize_ = width_ * sizeof(cl_uint); error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = 
_wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find platform with GPU devices, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); switch (_openTest % NUM_BUF_TYPES) { case 0: num_input_buf_ = 1; num_output_buf_ = 1; break; case 1: num_input_buf_ = 1; num_output_buf_ = 4; break; case 2: num_input_buf_ = 4; num_output_buf_ = 4; break; }; inBuffer_ = new cl_mem[num_input_buf_]; outBuffer_ = new cl_mem[num_output_buf_]; for (int i = 0; i < num_input_buf_; ++i) { inBuffer_[i] = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(inBuffer_[i] == 0, "clCreateBuffer(inBuffer) failed"); bool result = setData(inBuffer_[i], 0xdeadbeef); CHECK_RESULT(result != true, "clEnqueueMapBuffer buffer failed"); } for (int i = 0; i < num_output_buf_; ++i) { outBuffer_[i] = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_[i] == 0, "clCreateBuffer(outBuffer) failed"); bool result = setData(outBuffer_[i], 0xdeadbeef); CHECK_RESULT(result != true, "clEnqueueMapBuffer buffer failed"); } if (_openTest >= NUM_BUF_TYPES) { program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&sha256_opt_kernel, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); } else { program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&sha256_kernel, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); } const char *buildOps = NULL; if (isAMD) { // Enable caching buildOps = "-fno-alias"; } error_ = _wrapper->clBuildProgram(program_, 1, &device, buildOps, NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "CryptThread", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = 
_wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_[0]); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_[0]); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void *)&blockSize_); // Foo is not part of the original test, this can be used to see how much of // the performance is limited by fetch. Set foo to 0 and all threads will // fetch the same 1k block. This way they will all be in cache and hit max // fetch speed. unsigned int foo = 1; error_ = _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_uint), (void *)&foo); } void OCLPerfSHA256::run(void) { int global = bufSize_ / blockSize_; // 32 gives the best result due to memory thrashing. Need to optimize and // give feedback to SiSoft. int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; // Warm-up for (unsigned int i = 0; i < 10; i++) { if (num_input_buf_ > 1) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_[i % num_input_buf_]); } if (num_output_buf_ > 1) { error_ = _wrapper->clSetKernelArg( kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_[i % num_output_buf_]); } error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); } CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < MAX_ITERATIONS; i++) { if (num_input_buf_ > 1) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_[i % num_input_buf_]); } if (num_output_buf_ > 1) { error_ = _wrapper->clSetKernelArg( kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_[i % num_output_buf_]); } error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); } CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // No idea what data should be in here // checkData(outBuffer_); // Compute GB/s double perf = ((double)bufSize_ * (double)MAX_ITERATIONS * (double)(1e-09)) / sec; _perfInfo = (float)perf; if (_openTest >= NUM_BUF_TYPES) { testDescString = "opt "; } else { testDescString = "def "; } testDescString += "with "; char str[40]; sprintf(str, "%2d ip buff and %2d op buff ", num_input_buf_, num_output_buf_); testDescString += str; } unsigned int OCLPerfSHA256::close(void) { _wrapper->clFinish(cmd_queue_); if (inBuffer_) { for (int i = 0; i < num_input_buf_; ++i) { error_ = _wrapper->clReleaseMemObject(inBuffer_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } delete[] inBuffer_; } if (outBuffer_) { for (int i = 0; i < num_output_buf_; ++i) { error_ = _wrapper->clReleaseMemObject(outBuffer_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } delete[] outBuffer_; } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); 
CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSHA256.h000066400000000000000000000036321450307266000236200ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_SHA256_H_ #define _OCL_SHA256_H_ #include "OCLTestImp.h" class OCLPerfSHA256 : public OCLTestImp { public: OCLPerfSHA256(); virtual ~OCLPerfSHA256(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; bool setData(cl_mem buffer, unsigned int data); void checkData(cl_mem buffer); cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem* inBuffer_; cl_mem* outBuffer_; cl_int num_input_buf_; cl_int num_output_buf_; cl_int error_; unsigned int width_; unsigned int bufSize_; unsigned int blockSize_; static const unsigned int MAX_ITERATIONS = 100; bool isAMD; }; #endif // _OCL_SHA256_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMAlloc.cpp000066400000000000000000000215461450307266000247270ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
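// Minimal coarse-grain SVM round trip, for orientation; these are the
// standard OpenCL 2.0 entry points the test below times (error handling
// elided, variable names illustrative):
//
//   void *p = clSVMAlloc(ctx, CL_MEM_READ_WRITE, bytes, 0);  // 0: default alignment
//   clSetKernelArgSVMPointer(kernel, 0, p);
//   clEnqueueNDRangeKernel(q, kernel, 1, NULL, gws, lws, 0, NULL, NULL);
//   clFinish(q);
//   clSVMFree(ctx, p);
//
// For the fine-grain system sub-tests the test swaps clSVMAlloc/clSVMFree
// for plain malloc/free, since any host pointer is usable there.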
*/ #include "OCLPerfSVMAlloc.h" #include #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 5 #define NUM_CG_FLAGS 3 #define NUM_FG_FLAGS 3 static size_t sizeList[NUM_SIZES] = { 0x040000, 0x080000, 0x100000, 0x200000, 0x400000, }; #if defined(CL_VERSION_2_0) static const cl_svm_mem_flags CGFlags[NUM_CG_FLAGS] = { CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY, CL_MEM_READ_ONLY, }; static const cl_svm_mem_flags FGFlags[NUM_FG_FLAGS] = { 0, CL_MEM_SVM_FINE_GRAIN_BUFFER, CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS, }; #endif static const char *strKernel = "__kernel void dummy(__global uint* out) \n" "{ \n" " uint id = get_global_id(0); \n" " uint value = 1; \n" " if ((int)get_local_id(0) < 0) \n" " out[id] = value; \n" "} \n"; OCLPerfSVMAlloc::OCLPerfSVMAlloc() { _numSubTests = NUM_CG_FLAGS * NUM_FG_FLAGS * NUM_SIZES + NUM_SIZES; failed_ = false; skip_ = false; } OCLPerfSVMAlloc::~OCLPerfSVMAlloc() {} void OCLPerfSVMAlloc::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); #if defined(CL_VERSION_2_0) FGSystem_ = (test >= (NUM_CG_FLAGS * NUM_FG_FLAGS * NUM_SIZES)); testFGFlag_ = (test / (NUM_SIZES * NUM_CG_FLAGS)) % NUM_FG_FLAGS; testCGFlag_ = (test / NUM_SIZES) % NUM_CG_FLAGS; testSize_ = test % NUM_SIZES; cl_device_svm_capabilities caps; error_ = clGetDeviceInfo(devices_[deviceId], CL_DEVICE_SVM_CAPABILITIES, sizeof(cl_device_svm_capabilities), &caps, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if ((caps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER) == 0) { skip_ = true; // Should never happen as OCL 2.0 devices are required to // support coarse grain SVM testDescString = "Coarse Grain Buffer NOT supported. Test Skipped."; return; } else if (testFGFlag_ > 0 && (caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER) == 0) { skip_ = true; // No support for fine grain buffer SVM testDescString = "Fine Grain Buffer NOT supported. Test Skipped."; return; } else if (FGSystem_ && (caps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM) == 0) { skip_ = true; // No support for fine grain system SVM testDescString = "Fine Grain System NOT supported. Test Skipped."; return; } else if (testFGFlag_ == 2 && (caps & CL_DEVICE_SVM_ATOMICS) == 0) { skip_ = true; // No support for fine grain system SVM testDescString = "SVM Atomic NOT supported. 
Test Skipped."; return; } cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "dummy", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); return; #else skip_ = true; testDescString = "SVM NOT supported for < 2.0 builds. Test Skipped."; return; #endif } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfSVMAlloc::run(void) { if (skip_) { return; } if (failed_) { return; } #if defined(CL_VERSION_2_0) cl_uint *buffer = NULL; CPerfCounter timer; void *hostPtr = NULL; size_t bufSize = sizeList[testSize_] * sizeof(cl_int4); size_t iter = 100; cl_mem_flags flags = CGFlags[testCGFlag_] | FGFlags[testFGFlag_]; timer.Reset(); timer.Start(); size_t gws[1] = {bufSize / sizeof(cl_int4)}; size_t lws[1] = {64}; for (size_t i = 0; i < iter; ++i) { if (!FGSystem_) { buffer = (cl_uint *)clSVMAlloc(context_, flags, bufSize, 0); } else { buffer = (cl_uint *)malloc(bufSize); } CHECK_RESULT(buffer == 0, "Allocation failed"); error_ = _wrapper->clSetKernelArgSVMPointer(kernel_, 0, buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); if (!FGSystem_) { clSVMFree(context_, (void *)buffer); } else { free(buffer); } } timer.Stop(); CPerfCounter timer2; timer2.Reset(); size_t numN = 100; if (!FGSystem_) { buffer = (cl_uint *)clSVMAlloc(context_, flags, bufSize, 0); } else { buffer = (cl_uint *)malloc(bufSize); } CHECK_RESULT(buffer == 0, "Allocation failed"); timer2.Start(); for (size_t i = 0; i < numN; ++i) { error_ = _wrapper->clSetKernelArgSVMPointer(kernel_, 0, buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFinish(cmdQueues_[_deviceId]); timer2.Stop(); if (!FGSystem_) { clSVMFree(context_, (void *)buffer); } else { free(buffer); } char pFlags[5]; pFlags[0] = (testCGFlag_ == 0 || testCGFlag_ == 2) ? 'R' : '_'; // CL_MEM_READ_ONLY pFlags[1] = (testCGFlag_ == 0 || testCGFlag_ == 1) ? 'W' : '_'; // CL_MEM_WRITE_ONLY pFlags[2] = (testFGFlag_ == 1 || testFGFlag_ == 2) ? 'F' : '_'; // CL_MEM_SVM_FINE_GRAIN_BUFFER pFlags[3] = (testFGFlag_ == 2) ? 
'A' : '_'; // CL_MEM_SVM_ATOMICS char buf[256]; if (!FGSystem_ && (testFGFlag_ == 0)) { SNPRINTF(buf, sizeof(buf), "Coarse Grain Buffer Alloc + Free (GB/s) for %6d KB, flags=%4s", (int)bufSize / 1024, pFlags); } else if (!FGSystem_ && (testFGFlag_ > 0)) { SNPRINTF(buf, sizeof(buf), "Fine Grain Buffer Alloc + Free (GB/s) for %6d KB, flags=%4s", (int)bufSize / 1024, pFlags); } else if (FGSystem_) { SNPRINTF(buf, sizeof(buf), "Fine Grain System Alloc + Free (GB/s) for %6d KB, flags=N/A ", (int)bufSize / 1024); } testDescString = buf; double sec1 = timer.GetElapsedTime(); double sec2 = timer2.GetElapsedTime(); _perfInfo = static_cast((bufSize * (double)(1e-09)) / (sec1 / iter - sec2 / numN)); #endif } unsigned int OCLPerfSVMAlloc::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMAlloc.h000066400000000000000000000031741450307266000243710ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PERF_SVM_ALLOC_H_ #define _OCL_PERF_SVM_ALLOC_H_ #include "OCLTestImp.h" class OCLPerfSVMAlloc : public OCLTestImp { public: OCLPerfSVMAlloc(); virtual ~OCLPerfSVMAlloc(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; unsigned int testSize_; bool FGSystem_; unsigned int testCGFlag_; unsigned int testFGFlag_; bool skip_; }; #endif // _OCL_PERF_SVM_ALLOC_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMKernelArguments.cpp000066400000000000000000000223641450307266000270020ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
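// ---------------------------------------------------------------------------
// OCLPerfSVMAlloc::run() above isolates allocation cost by timing two loops:
// one doing clSVMAlloc + kernel launch + clSVMFree per iteration, and one
// launching the same kernel on a single persistent allocation. The
// per-iteration difference leaves (approximately) just the alloc/free time.
// A sketch of that arithmetic, with hypothetical variable names t_allocLoop
// and t_launchLoop standing in for the two timer readings:
//
//   double allocFreeSec = t_allocLoop / iter - t_launchLoop / numN;
//   _perfInfo = (float)((double)bufSize * 1e-9 / allocFreeSec);
//
// And the coarse-grain SVM calls themselves, under an OpenCL 2.0 build:
#if defined(CL_VERSION_2_0)
#include <CL/cl.h>
static void *svmAllocExample(cl_context ctx, size_t bytes) {
  // Alignment 0 requests the largest supported data-type alignment; the
  // flags here are the coarse-grain default (no fine-grain bits set).
  return clSVMAlloc(ctx, CL_MEM_READ_WRITE, bytes, 0);
}
static void svmFreeExample(cl_context ctx, void *p) {
  clSVMFree(ctx, p);  // immediate free; no queue synchronization is implied
}
#endif
// ---------------------------------------------------------------------------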
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfSVMKernelArguments.h" #include #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" static const size_t BufSize = 0x1000; static const size_t Iterations = 0x10000; static const size_t TotalQueues = 4; static const size_t TotalBufs = 4; static const size_t TotalArgs = 4; #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif static const char *Arguments[TotalArgs] = { "__global uint* out", "__global uint* out, __global uint* buf0, __global uint* buf1, __global " "uint* buf2, __global uint* buf3", "__global uint* out, __global uint* buf0, __global uint* buf1, __global " "uint* buf2, __global uint* buf3, \n" "__global uint* buf4, __global uint* buf5, __global uint* buf6, __global " "uint* buf7, __global uint* buf8", "__global uint* out, __global uint* buf0, __global uint* buf1, __global " "uint* buf2, __global uint* buf3,\n" "__global uint* buf4, __global uint* buf5, __global uint* buf6, __global " "uint* buf7, __global uint* buf8,\n" "__global uint* buf9, __global uint* buf10, __global uint* buf11, __global " "uint* buf12, __global uint* buf13,\n" "__global uint* buf14, __global uint* buf15, __global uint* buf16, " "__global uint* buf17, __global uint* buf18"}; static const char *strKernel = "__kernel void dummy(%s) \n" "{ \n" " uint id = get_global_id(0); \n" " uint value = 1; \n" " out[id] = value; \n" "} \n"; OCLPerfSVMKernelArguments::OCLPerfSVMKernelArguments() { _numSubTests = TotalQueues * TotalArgs; // * TotalBufs; failed_ = false; skip_ = false; } OCLPerfSVMKernelArguments::~OCLPerfSVMKernelArguments() {} void OCLPerfSVMKernelArguments::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { #if defined(CL_VERSION_2_0) // cl_mem buffer; _deviceId = deviceId; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); test_ = test; cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); cl_device_svm_capabilities caps; error_ = clGetDeviceInfo(devices_[deviceId], CL_DEVICE_SVM_CAPABILITIES, sizeof(cl_device_svm_capabilities), &caps, NULL); // check if CL_DEVICE_SVM_COARSE_GRAIN_BUFFER is set. Skip the test if not. if (!(caps & 0x1)) { skip_ = true; testDescString = "SVM NOT supported. 
Test Skipped."; return; } if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } size_t numArguments = (test_ / TotalQueues) % TotalArgs; char *program = new char[4096]; SNPRINTF(program, sizeof(char) * 4096, strKernel, Arguments[numArguments]); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&program, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "dummy", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); delete[] program; static const size_t NumBuffs[TotalBufs] = {0x20, 0x100, 0x800, 0x2000}; size_t bufSize = BufSize * sizeof(cl_int); numBufs_ = (unsigned int)NumBuffs[test_ / (TotalQueues * TotalArgs)]; inOutBuffer = (void **)malloc(sizeof(void *) * numBufs_); for (size_t b = 0; b < numBufs_; ++b) { inOutBuffer[b] = clSVMAlloc(context_, CL_MEM_READ_WRITE, bufSize, 0); CHECK_RESULT((error_ != CL_SUCCESS), "clSVMAlloc() failed"); } #else skip_ = true; testDescString = "SVM NOT supported for < 2.0 builds. Test Skipped."; return; #endif } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfSVMKernelArguments::run(void) { if (skip_) { return; } if (failed_) { return; } #if defined(CL_VERSION_2_0) CPerfCounter timer; static const size_t Queues[] = {1, 2, 4, 8}; size_t numQueues = Queues[test_ % TotalQueues]; cl_uint numArguments; _wrapper->clGetKernelInfo(kernel_, CL_KERNEL_NUM_ARGS, sizeof(cl_uint), &numArguments, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clGetKernelInfo() failed"); size_t iter = Iterations / numQueues / numBufs_; iter = (iter == 0) ? 
1 : iter;

  std::vector<cl_command_queue> cmdQueues(numQueues);
  for (size_t q = 0; q < numQueues; ++q) {
    cl_command_queue cmdQueue = _wrapper->clCreateCommandQueue(
        context_, devices_[_deviceId], 0, &error_);
    CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueue() failed");
    cmdQueues[q] = cmdQueue;
  }

  // Warm-up
  for (size_t b = 0; b < (numBufs_ / numArguments); ++b) {
    for (size_t q = 0; q < numQueues; ++q) {
      for (cl_uint a = 0; a < numArguments; ++a) {
        void *buffer = inOutBuffer[(b * numArguments + a) % numBufs_];
        error_ = _wrapper->clSetKernelArgSVMPointer(kernel_, a, buffer);
        CHECK_RESULT((error_ != CL_SUCCESS),
                     "clSetKernelArgSVMPointer() failed");
      }
      size_t gws[1] = {256};
      size_t lws[1] = {256};
      error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues[q], kernel_, 1, NULL,
                                                gws, lws, 0, NULL, NULL);
      CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed");
    }
  }
  for (size_t q = 0; q < numQueues; ++q) {
    _wrapper->clFinish(cmdQueues[q]);
  }

  size_t disp = 0;
  timer.Reset();
  timer.Start();
  for (size_t i = 0; i < iter; ++i) {
    for (size_t b = 0; b < numBufs_; ++b) {
      for (size_t q = 0; q < numQueues; ++q) {
        for (cl_uint a = 0; a < numArguments; ++a) {
          void *buffer = inOutBuffer[(b * numArguments + a) % numBufs_];
          error_ = _wrapper->clSetKernelArgSVMPointer(kernel_, a, buffer);
          CHECK_RESULT((error_ != CL_SUCCESS),
                       "clSetKernelArgSVMPointer() failed");
        }
        size_t gws[1] = {256};
        size_t lws[1] = {256};
        error_ = _wrapper->clEnqueueNDRangeKernel(
            cmdQueues[q], kernel_, 1, NULL, gws, lws, 0, NULL, NULL);
        CHECK_RESULT((error_ != CL_SUCCESS),
                     "clEnqueueNDRangeKernel() failed");
        disp++;
      }
    }
  }
  for (size_t q = 0; q < numQueues; ++q) {
    _wrapper->clFinish(cmdQueues[q]);
  }
  timer.Stop();

  for (size_t q = 0; q < numQueues; ++q) {
    error_ = _wrapper->clReleaseCommandQueue(cmdQueues[q]);
    CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS),
                           "clReleaseCommandQueue() failed");
  }

  std::stringstream stream;
  stream << "Setup time (us) for " << numQueues << " queues, ";
  stream.flags(std::ios::right | std::ios::showbase);
  stream.width(2);
  stream << numArguments;
  stream << " arguments, ";
  stream.flags(std::ios::right | std::ios::showbase);
  stream.width(4);
  stream << numBufs_ << " buffers";
  testDescString = stream.str();
  _perfInfo = static_cast<float>(timer.GetElapsedTime() * 1000000 / disp);
#endif
}

unsigned int OCLPerfSVMKernelArguments::close(void) {
#if defined(CL_VERSION_2_0)
  for (size_t b = 0; b < numBufs_; ++b) {
    _wrapper->clSVMFree(context_, inOutBuffer[b]);
    CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clSVMFree() failed");
  }
#endif
  return OCLTestImp::close();
}
clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMKernelArguments.h000066400000000000000000000032571450307266000264470ustar00rootroot00000000000000
/*
Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
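// ---------------------------------------------------------------------------
// The point of OCLPerfSVMKernelArguments is the cost of rebinding SVM
// pointers with clSetKernelArgSVMPointer before every dispatch. A minimal
// sketch of that binding step, assuming a kernel like the generated "dummy"
// above and buf obtained from clSVMAlloc (helper name is illustrative):
#if defined(CL_VERSION_2_0)
#include <CL/cl.h>
static cl_int launchWithSvmArg(cl_command_queue q, cl_kernel k, void *buf) {
  // SVM arguments bypass cl_mem: the raw pointer is passed, not a handle.
  cl_int err = clSetKernelArgSVMPointer(k, 0, buf);
  if (err != CL_SUCCESS) return err;
  size_t gws = 256, lws = 256;  // one workgroup, matching the test's launch
  return clEnqueueNDRangeKernel(q, k, 1, NULL, &gws, &lws, 0, NULL, NULL);
}
#endif
// ---------------------------------------------------------------------------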
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PERF_SVM_KERNEL_ARGUMENTS_H_ #define _OCL_PERF_SVM_KERNEL_ARGUMENTS_H_ #include #include "OCLTestImp.h" class OCLPerfSVMKernelArguments : public OCLTestImp { public: OCLPerfSVMKernelArguments(); virtual ~OCLPerfSVMKernelArguments(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; unsigned int test_; bool skip_; void** inOutBuffer; unsigned int numBufs_; }; #endif // _OCL_PERF_SVM_KERNEL_ARGUMENTS_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMMap.cpp000066400000000000000000000114441450307266000244060ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfSVMMap.h" #include #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 5 static size_t sizeList[] = { 0x040000, 0x080000, 0x100000, 0x200000, 0x400000, }; #define NUM_FLAGS 4 static const cl_map_flags Flags[NUM_FLAGS] = {CL_MAP_READ, CL_MAP_WRITE, CL_MAP_READ | CL_MAP_WRITE, CL_MAP_WRITE_INVALIDATE_REGION}; OCLPerfSVMMap::OCLPerfSVMMap() { _numSubTests = NUM_SIZES * NUM_FLAGS; failed_ = false; skip_ = false; } OCLPerfSVMMap::~OCLPerfSVMMap() {} void OCLPerfSVMMap::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { #if defined(CL_VERSION_2_0) _deviceId = deviceId; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); testFlag_ = test / NUM_SIZES; testSize_ = test % NUM_SIZES; cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); cl_device_svm_capabilities caps; error_ = clGetDeviceInfo(devices_[deviceId], CL_DEVICE_SVM_CAPABILITIES, sizeof(cl_device_svm_capabilities), &caps, NULL); // check if CL_DEVICE_SVM_COARSE_GRAIN_BUFFER is set. Skip the test if not. if (!(caps & 0x1)) { skip_ = true; testDescString = "SVM NOT supported. 
Test Skipped."; return; } if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } #else skip_ = true; testDescString = "SVM NOT supported for < 2.0 builds. Test Skipped."; return; #endif } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfSVMMap::run(void) { if (skip_) { return; } if (failed_) { return; } #if defined(CL_VERSION_2_0) void *buffer; CPerfCounter timer; void *hostPtr = NULL; const size_t bufSize = sizeList[testSize_] * sizeof(cl_int4); const cl_map_flags flag = Flags[testFlag_]; const size_t iter = 100; timer.Reset(); buffer = clSVMAlloc(context_, CL_MEM_READ_WRITE, bufSize, 0); CHECK_RESULT((error_ != CL_SUCCESS), "clSVMAlloc() failed"); for (size_t i = 0; i < iter; ++i) { timer.Start(); error_ = clEnqueueSVMMap(cmdQueues_[_deviceId], CL_FALSE, flag, buffer, bufSize, 0, 0, 0); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueSVMMap() failed"); error_ = clEnqueueSVMUnmap(cmdQueues_[_deviceId], buffer, 0, 0, 0); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueSVMUnmap() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); timer.Stop(); } clSVMFree(context_, (void *)buffer); char pFlags[4]; pFlags[0] = (testFlag_ == 0 || testFlag_ == 2) ? 'R' : '_'; // CL_MAP_READ pFlags[1] = (testFlag_ == 1 || testFlag_ == 2) ? 'W' : '_'; // CL_MAP_WRITE pFlags[2] = (testFlag_ == 3) ? 'I' : '_'; // CL_MAP_WRITE_INVALIDATE_REGION char buf[256]; SNPRINTF(buf, sizeof(buf), "Map + Unmap (GB/s) for %6d KB, flags=%3s", (int)bufSize / 1024, pFlags); testDescString = buf; double sec = timer.GetElapsedTime(); _perfInfo = static_cast((bufSize * iter * (double)(1e-09)) / sec); #endif } unsigned int OCLPerfSVMMap::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMMap.h000066400000000000000000000031001450307266000240410ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
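// ---------------------------------------------------------------------------
// Coarse-grain SVM requires map/unmap brackets around host access, which is
// exactly what OCLPerfSVMMap::run() times above (it enqueues a non-blocking
// map/unmap pair and bounds it with clFinish). A minimal sketch of the
// bracket, using a blocking map for simplicity and assuming p is a
// coarse-grain allocation of bufSize bytes:
#if defined(CL_VERSION_2_0)
#include <CL/cl.h>
static cl_int touchSvmOnHost(cl_command_queue q, void *p, size_t bufSize) {
  // Blocking map: p is safe to dereference on the host once this returns.
  cl_int err =
      clEnqueueSVMMap(q, CL_TRUE, CL_MAP_WRITE, p, bufSize, 0, NULL, NULL);
  if (err != CL_SUCCESS) return err;
  ((char *)p)[0] = 1;  // host access is legal only while mapped
  return clEnqueueSVMUnmap(q, p, 0, NULL, NULL);
}
#endif
// ---------------------------------------------------------------------------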
*/ #ifndef _OCL_PERF_SVM_MAP_H_ #define _OCL_PERF_SVM_MAP_H_ #include "OCLTestImp.h" class OCLPerfSVMMap : public OCLTestImp { public: OCLPerfSVMMap(); virtual ~OCLPerfSVMMap(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; unsigned int testSize_; unsigned int testFlag_; bool skip_; }; #endif // _OCL_PERF_SVM_MAP_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMMemFill.cpp000066400000000000000000000162061450307266000252170ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfSVMMemFill.h" #include #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_MODES 3 #define NUM_CG_FLAGS 2 #define NUM_FG_FLAGS 3 static size_t typeSizeList[] = { 1, // sizeof(cl_uchar) 2, 4, 8, 16, 32, 64, 128, // sizeof(cl_ulong16) }; static unsigned int eleNumList[] = { 0x0020000, 0x0080000, 0x0200000, 0x0800000, 0x2000000, }; #if defined(CL_VERSION_2_0) static const cl_svm_mem_flags CGFlags[NUM_CG_FLAGS] = { CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY, }; static const cl_svm_mem_flags FGFlags[NUM_FG_FLAGS] = { 0, CL_MEM_SVM_FINE_GRAIN_BUFFER, CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS, }; #endif OCLPerfSVMMemFill::OCLPerfSVMMemFill() { num_typeSize_ = sizeof(typeSizeList) / sizeof(size_t); num_elements_ = sizeof(eleNumList) / sizeof(unsigned int); _numSubTests = num_elements_ * num_typeSize_ * (NUM_FG_FLAGS * NUM_CG_FLAGS + 1); failed_ = false; skip_ = false; } OCLPerfSVMMemFill::~OCLPerfSVMMemFill() {} void OCLPerfSVMMemFill::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); #if defined(CL_VERSION_2_0) FGSystem_ = (test >= (num_elements_ * num_typeSize_ * NUM_FG_FLAGS * NUM_CG_FLAGS)); testFGFlag_ = (test / (num_elements_ * num_typeSize_ * NUM_CG_FLAGS)) % NUM_FG_FLAGS; testCGFlag_ = (test / (num_elements_ * num_typeSize_)) % NUM_CG_FLAGS; testTypeSize_ = typeSizeList[(test / num_elements_) % num_typeSize_]; testNumEle_ = eleNumList[test % num_elements_]; cl_device_svm_capabilities caps; error_ = clGetDeviceInfo(devices_[deviceId], CL_DEVICE_SVM_CAPABILITIES, sizeof(cl_device_svm_capabilities), &caps, NULL); CHECK_RESULT(error_ != 
CL_SUCCESS, "clGetDeviceInfo failed"); if ((caps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER) == 0) { skip_ = true; // Should never happen as OCL 2.0 devices are required to // support coarse grain SVM testDescString = "Coarse Grain Buffer NOT supported. Test Skipped."; return; } else if (testFGFlag_ > 0 && (caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER) == 0) { skip_ = true; // No support for fine grain buffer SVM testDescString = "Fine Grain Buffer NOT supported. Test Skipped."; return; } else if (FGSystem_ && (caps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM) == 0) { skip_ = true; // No support for fine grain system SVM testDescString = "Fine Grain System NOT supported. Test Skipped."; return; } else if (testFGFlag_ == 2 && ((caps & CL_DEVICE_SVM_ATOMICS) == 0)) { skip_ = true; // No support for SVM Atomic testDescString = "SVM Atomic NOT supported. Test Skipped."; return; } cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } return; #else skip_ = true; testDescString = "SVM NOT supported for < 2.0 builds. Test Skipped."; return; #endif } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfSVMMemFill::run(void) { if (skip_) { return; } if (failed_) { return; } #if defined(CL_VERSION_2_0) cl_uint *buffer = NULL; CPerfCounter timer; size_t iter = 100, bufSize = testNumEle_ * 4; cl_mem_flags flags = CGFlags[testCGFlag_] | FGFlags[testFGFlag_]; void *data = malloc(bufSize); timer.Reset(); if (!FGSystem_) { buffer = (cl_uint *)clSVMAlloc(context_, flags, bufSize, (cl_uint)testTypeSize_); CHECK_RESULT(buffer == 0, "Allocation failed"); } else { // FGSystem_ = true buffer = (cl_uint *)malloc(bufSize); CHECK_RESULT(buffer == 0, "Allocation failed"); } timer.Start(); for (size_t i = 0; i < iter; ++i) { error_ = clEnqueueSVMMemFill(cmdQueues_[_deviceId], buffer, data, testTypeSize_, bufSize, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueSVMMemFill() failed"); } _wrapper->clFinish(cmdQueues_[_deviceId]); timer.Stop(); if (!FGSystem_) { clSVMFree(context_, (void *)buffer); } else { free(buffer); } char pFlags[5]; pFlags[0] = (testCGFlag_ == 0 || testCGFlag_ == 2) ? 'R' : '_'; // CL_MEM_READ_ONLY pFlags[1] = (testCGFlag_ == 0 || testCGFlag_ == 1) ? 'W' : '_'; // CL_MEM_WRITE_ONLY pFlags[2] = (testFGFlag_ == 1 || testFGFlag_ == 2) ? 'F' : '_'; // CL_MEM_SVM_FINE_GRAIN_BUFFER pFlags[3] = (testFGFlag_ == 2) ? 
'A' : '_'; // CL_MEM_SVM_ATOMICS char buf[256]; if (!FGSystem_ && (testFGFlag_ == 0)) { SNPRINTF(buf, sizeof(buf), "Coarse Grain Buffer SVMMemFill (GB/s) for %6d KB, typeSize:%3d, " "flags=%4s", (int)bufSize / 1024, (int)testTypeSize_, pFlags); } else if (!FGSystem_ && (testFGFlag_ > 0)) { SNPRINTF(buf, sizeof(buf), "Fine Grain Buffer SVMMemFill (GB/s) for %6d KB, typeSize:%3d, " "flags=%4s", (int)bufSize / 1024, (int)testTypeSize_, pFlags); } else if (FGSystem_) { SNPRINTF(buf, sizeof(buf), "Fine Grain System SVMMemFill (GB/s) for %6d KB, typeSize:%3d, " "flags=%4s", (int)bufSize / 1024, (int)testTypeSize_, pFlags); } testDescString = buf; double sec = timer.GetElapsedTime(); _perfInfo = static_cast((bufSize * iter * (double)(1e-09)) / sec); #endif } unsigned int OCLPerfSVMMemFill::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMMemFill.h000066400000000000000000000033561450307266000246660ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PERF_SVM_MEMFILL_H_ #define _OCL_PERF_SVM_MEMFILL_H_ #include "OCLTestImp.h" class OCLPerfSVMMemFill : public OCLTestImp { public: OCLPerfSVMMemFill(); virtual ~OCLPerfSVMMemFill(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: unsigned int num_typeSize_; unsigned int num_elements_; bool FGSystem_; size_t testTypeSize_; unsigned int testCGFlag_; unsigned int testFGFlag_; unsigned int testNumEle_; bool atomic_; bool failed_; bool skip_; }; #endif // _OCL_PERF_SVM_MEMFILL_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMMemcpy.cpp000066400000000000000000000166371450307266000251340ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
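// ---------------------------------------------------------------------------
// OCLPerfSVMMemFill above sweeps pattern sizes from 1 to 128 bytes because
// clEnqueueSVMMemFill replicates a small host pattern across the whole
// allocation, and the pattern width can change which fill path the runtime
// picks. A minimal sketch of one fill, assuming p is an SVM allocation whose
// size is a multiple of the pattern size (an API requirement):
#if defined(CL_VERSION_2_0)
#include <CL/cl.h>
static cl_int fillSvm(cl_command_queue q, void *p, size_t bytes) {
  const cl_uint pattern = 0xDEADBEEF;         // 4-byte fill pattern
  return clEnqueueSVMMemFill(q, p, &pattern, sizeof(pattern), bytes,
                             0, NULL, NULL);  // async; clFinish(q) to time it
}
#endif
// ---------------------------------------------------------------------------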
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfSVMMemcpy.h" #include #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 5 #define NUM_SRC_FLAGS 2 #define NUM_DST_FLAGS 2 #define NUM_FG_FLAGS 3 static size_t sizeList[NUM_SIZES] = { 0x040000, 0x080000, 0x100000, 0x200000, 0x400000, }; #if defined(CL_VERSION_2_0) static const cl_svm_mem_flags srcFlagList[NUM_SRC_FLAGS] = {CL_MEM_READ_WRITE, CL_MEM_READ_ONLY}; static const cl_svm_mem_flags dstFlagList[NUM_DST_FLAGS] = {CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY}; static const cl_svm_mem_flags FGFlags[NUM_FG_FLAGS] = { 0, CL_MEM_SVM_FINE_GRAIN_BUFFER, CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS, }; #endif OCLPerfSVMMemcpy::OCLPerfSVMMemcpy() { _numSubTests = (NUM_SRC_FLAGS * NUM_DST_FLAGS * NUM_FG_FLAGS + 1) * NUM_SIZES; failed_ = false; skip_ = false; } OCLPerfSVMMemcpy::~OCLPerfSVMMemcpy() {} void OCLPerfSVMMemcpy::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); #if defined(CL_VERSION_2_0) FGSystem_ = (test >= (NUM_SIZES * NUM_SRC_FLAGS * NUM_DST_FLAGS * NUM_FG_FLAGS)); testFGFlag_ = (test / (NUM_SIZES * NUM_DST_FLAGS * NUM_SRC_FLAGS)) % (NUM_FG_FLAGS); testSrcFlag_ = (test / (NUM_SIZES * NUM_DST_FLAGS)) % (NUM_SRC_FLAGS); testDstFlag_ = (test / NUM_SIZES) % (NUM_DST_FLAGS); testSize_ = test % NUM_SIZES; cl_device_svm_capabilities caps; error_ = clGetDeviceInfo(devices_[deviceId], CL_DEVICE_SVM_CAPABILITIES, sizeof(cl_device_svm_capabilities), &caps, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if ((caps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER) == 0) { skip_ = true; // Should never happen as OCL 2.0 devices are required to // support coarse grain SVM testDescString = "Coarse Grain Buffer NOT supported. Test Skipped."; return; } else if ((testFGFlag_ > 0) && (caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER) == 0) { skip_ = true; // No support for fine grain buffer SVM testDescString = "Fine Grain Buffer NOT supported. Test Skipped."; return; } else if (FGSystem_ && (caps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM) == 0) { skip_ = true; // No support for fine grain system SVM testDescString = "Fine Grain System NOT supported. Test Skipped."; return; } else if ((testFGFlag_ == 2) && ((caps & CL_DEVICE_SVM_ATOMICS) == 0)) { skip_ = true; // No support for SVM Atomic testDescString = "SVM Atomic NOT supported. Test Skipped."; return; } cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } return; #else skip_ = true; testDescString = "SVM NOT supported for < 2.0 builds. 
Test Skipped."; return; #endif } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfSVMMemcpy::run(void) { if (skip_) { return; } if (failed_) { return; } #if defined(CL_VERSION_2_0) cl_uint *src = NULL, *dst = NULL; CPerfCounter timer; size_t bufSize = sizeList[testSize_] * sizeof(cl_int4); size_t iter = 100; cl_mem_flags srcFlags = srcFlagList[testSrcFlag_] | FGFlags[testFGFlag_]; cl_mem_flags dstFlags = dstFlagList[testDstFlag_] | FGFlags[testFGFlag_]; size_t gws[1] = {bufSize / sizeof(cl_int4)}; size_t lws[1] = {64}; if (!FGSystem_) { src = (cl_uint *)clSVMAlloc(context_, srcFlags, bufSize, 0); CHECK_RESULT(src == 0, "Allocation failed"); dst = (cl_uint *)clSVMAlloc(context_, dstFlags, bufSize, 0); CHECK_RESULT(dst == 0, "Allocation failed"); } else { // FGSystem_ == true src = (cl_uint *)malloc(bufSize); dst = (cl_uint *)malloc(bufSize); } timer.Reset(); timer.Start(); for (size_t i = 0; i < iter; ++i) { clEnqueueSVMMemcpy(cmdQueues_[_deviceId], false, dst, src, bufSize, 0, NULL, NULL); } _wrapper->clFinish(cmdQueues_[_deviceId]); timer.Stop(); if (!FGSystem_) { clSVMFree(context_, (void *)src); clSVMFree(context_, (void *)dst); } else { // FGSystem_ = true free(src); free(dst); } char pSrcFlags[5]; pSrcFlags[0] = (testSrcFlag_ == 0 || testSrcFlag_ == 1) ? 'R' : '_'; // CL_MEM_READ_ONLY pSrcFlags[1] = (testSrcFlag_ == 0) ? 'W' : '_'; // CL_MEM_WRITE_ONLY pSrcFlags[2] = (testFGFlag_ == 1 || testFGFlag_ == 2) ? 'F' : '_'; // CL_MEM_SVM_FINE_GRAIN_BUFFER pSrcFlags[3] = (testFGFlag_ == 2) ? 'A' : '_'; // CL_MEM_SVM_ATOMICS pSrcFlags[4] = '\0'; char pDstFlags[5]; pDstFlags[0] = (testDstFlag_ == 0) ? 'R' : '_'; pDstFlags[1] = (testDstFlag_ == 0 || testDstFlag_ == 1) ? 'W' : '_'; pDstFlags[2] = (testFGFlag_ == 1 || testFGFlag_ == 2) ? 'F' : '_'; pDstFlags[3] = (testFGFlag_ == 2) ? 'A' : '_'; pSrcFlags[4] = '\0'; char buf[256]; if (FGSystem_) { SNPRINTF(buf, sizeof(buf), "Fine Grain System SVMMemcpy (GB/s) for %6d KB, from:%4s to:%4s", (int)bufSize / 1024, pSrcFlags, pDstFlags); } else if (testFGFlag_ == 0) { SNPRINTF(buf, sizeof(buf), "Coarse Grain Buffer SVMMemcpy (GB/s) for %6d KB, from:%4s to:%4s", (int)bufSize / 1024, pSrcFlags, pDstFlags); } else { SNPRINTF(buf, sizeof(buf), "Fine Grain Buffer SVMMemcpy (GB/s) for %6d KB, from:%4s to:%4s", (int)bufSize / 1024, pSrcFlags, pDstFlags); } testDescString = buf; double sec = timer.GetElapsedTime(); _perfInfo = static_cast((bufSize * iter * (double)(1e-09)) / sec); #endif } unsigned int OCLPerfSVMMemcpy::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMMemcpy.h000066400000000000000000000032401450307266000245630ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PERF_SVM_MEMCPY_H_ #define _OCL_PERF_SVM_MEMCPY_H_ #include "OCLTestImp.h" class OCLPerfSVMMemcpy : public OCLTestImp { public: OCLPerfSVMMemcpy(); virtual ~OCLPerfSVMMemcpy(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; unsigned int testSize_; unsigned int testSrcFlag_; unsigned int testDstFlag_; unsigned int testFGFlag_; bool FGSystem_; bool skip_; }; #endif // _OCL_PERF_SVM_MEMCPY_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMSampleRate.cpp000066400000000000000000000276311450307266000257330ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
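// ---------------------------------------------------------------------------
// OCLPerfSVMMemcpy above enqueues non-blocking copies in a loop and bounds
// the whole batch with a single clFinish before stopping the timer. A
// minimal sketch of one copy; here the sketch finishes immediately for
// clarity, whereas the test finishes once after all iterations:
#if defined(CL_VERSION_2_0)
#include <CL/cl.h>
static cl_int copySvm(cl_command_queue q, void *dst, const void *src,
                      size_t bytes) {
  // CL_FALSE: submission returns immediately; completion needs a finish or
  // an event wait before dst may be read.
  cl_int err = clEnqueueSVMMemcpy(q, CL_FALSE, dst, src, bytes, 0, NULL, NULL);
  if (err != CL_SUCCESS) return err;
  return clFinish(q);
}
#endif
// ---------------------------------------------------------------------------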
*/ #include "OCLPerfSVMSampleRate.h" #include #include #include #include "CL/cl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_TYPES 3 static const char *types[NUM_TYPES] = {"float", "float2", "float4"}; static const unsigned int typeSizes[NUM_TYPES] = {4, 8, 16}; #define NUM_SIZES 12 static const unsigned int sizes[NUM_SIZES] = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048}; #define NUM_BUFS 6 #define MAX_BUFS (1 << (NUM_BUFS - 1)) #define NUM_READS numBufs_ OCLPerfSVMSampleRate::OCLPerfSVMSampleRate() { _numSubTests = NUM_TYPES * NUM_SIZES * NUM_BUFS * 3; skip_ = false; } OCLPerfSVMSampleRate::~OCLPerfSVMSampleRate() {} void OCLPerfSVMSampleRate::setKernel(void) { shader_.clear(); shader_ += "kernel void sampleRate(global DATATYPE* outBuffer, unsigned int " "inBufSize, unsigned int writeIt,\n"; char buf[256]; for (unsigned int i = 0; i < numBufs_; i++) { SNPRINTF(buf, sizeof(buf), "global DATATYPE* inBuffer%d", i); shader_ += buf; if (i < (numBufs_ - 1)) { shader_ += ","; } shader_ += "\n"; } shader_ += ")\n"; shader_ += "{\n" " uint gid = get_global_id(0);\n" " uint inputIdx = gid % inBufSize;\n" " DATATYPE tmp = (DATATYPE)0.0f;\n"; for (unsigned int j = 0; j < (NUM_READS / numBufs_); j++) { for (unsigned int i = 0; i < numBufs_; i++) { SNPRINTF(buf, sizeof(buf), " tmp += inBuffer%d[inputIdx];\n", i); shader_ += buf; } shader_ += " inputIdx += writeIt;\n"; // writeIt is 0, so we don't need // to add a modulo } if (typeSizes[typeIdx_] > 4) { shader_ += " if (writeIt*(unsigned int)tmp.x) outBuffer[gid] = tmp;\n" "}\n"; } else { shader_ += " if (writeIt*(unsigned int)tmp) outBuffer[gid] = tmp;\n" "}\n"; } // printf("Shader -> %s\n", shader_.c_str()); } void OCLPerfSVMSampleRate::setData(void *buffer, unsigned int val) { #if defined(CL_VERSION_2_0) error_ = _wrapper->clEnqueueSVMMemFill( cmd_queue_, buffer, &val, sizeof(unsigned int), bufSize_, 0, NULL, NULL); if ((error_ == CL_MEM_OBJECT_ALLOCATION_FAILURE) || (error_ == CL_OUT_OF_RESOURCES) || (error_ == CL_OUT_OF_HOST_MEMORY)) { error_ = CL_SUCCESS; skip_ = true; testDescString = "Not enough memory, skipped"; return; } _wrapper->clFinish(cmd_queue_); #endif } void OCLPerfSVMSampleRate::checkData(void *buffer) { #if defined(CL_VERSION_2_0) error_ = _wrapper->clEnqueueSVMMap(cmd_queue_, true, CL_MAP_READ, buffer, outBufSize_, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueSVMMap failed"); float *data = (float *)buffer; for (unsigned int i = 0; i < outBufSize_ / sizeof(float); i++) { if (data[i] != (float)numBufs_) { printf("Data validation failed at %d! 
Got %f, expected %f\n", i, data[i], (float)numBufs_); break; } } error_ = _wrapper->clEnqueueSVMUnmap(cmd_queue_, buffer, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); #endif } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfSVMSampleRate::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_device_id device; error_ = CL_SUCCESS; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = 0; kernel_ = 0; cmd_queue_ = 0; inBuffer_ = NULL; outBuffer_ = NULL; coarseGrainBuffer_ = false; fineGrainBuffer_ = false; fineGrainSystem_ = false; // We compute a square domain width_ = sizes[test % NUM_SIZES]; typeIdx_ = (test / NUM_SIZES) % NUM_TYPES; bufSize_ = width_ * width_ * typeSizes[typeIdx_]; numBufs_ = (1 << ((test / (NUM_SIZES * NUM_TYPES)) % NUM_BUFS)); svmMode_ = test / (NUM_SIZES * NUM_TYPES * NUM_BUFS); device = devices_[deviceId]; #if defined(CL_VERSION_2_0) cl_device_svm_capabilities caps; error_ = clGetDeviceInfo(device, CL_DEVICE_SVM_CAPABILITIES, sizeof(cl_device_svm_capabilities), &caps, NULL); if (svmMode_ == 0) { if (caps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER) { coarseGrainBuffer_ = true; testdesc = "crs"; } else { skip_ = true; // Should never happen as OCL 2.0 devices are required to // support coarse grain SVM testDescString = "Coarse grain SVM NOT supported. Test Skipped."; return; } } else if (svmMode_ == 1) { if (caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER) { fineGrainBuffer_ = true; testdesc = "fgb"; } else { skip_ = true; // No support for fine grain buffer SVM testDescString = "Fine grain buffer SVM NOT supported. Test Skipped."; return; } } else if (svmMode_ == 2) { if (caps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM) { fineGrainSystem_ = true; testdesc = "fgs"; } else { skip_ = true; // No support for fine grain system SVM testDescString = "Fine grain system SVM NOT supported. 
Test Skipped."; return; } } char charbuf[1024]; cmd_queue_ = cmdQueues_[_deviceId]; outBufSize_ = sizes[NUM_SIZES - 1] * sizes[NUM_SIZES - 1] * typeSizes[NUM_TYPES - 1]; if ((svmMode_ == 0) || (svmMode_ == 1)) { inBuffer_ = (void **)malloc(sizeof(void *) * numBufs_); memset(inBuffer_, 0, sizeof(void *) * numBufs_); cl_mem_flags flags; flags = CL_MEM_READ_ONLY; if (svmMode_ == 1) flags |= CL_MEM_SVM_FINE_GRAIN_BUFFER; for (unsigned int i = 0; i < numBufs_; i++) { inBuffer_[i] = _wrapper->clSVMAlloc(context_, flags, bufSize_, 0); CHECK_RESULT(inBuffer_[i] == NULL, "clCreateBuffer(inBuffer) failed"); } flags = CL_MEM_WRITE_ONLY; if (svmMode_ == 1) flags |= CL_MEM_SVM_FINE_GRAIN_BUFFER; outBuffer_ = _wrapper->clSVMAlloc(context_, flags, outBufSize_, 0); CHECK_RESULT(outBuffer_ == NULL, "clCreateBuffer(outBuffer) failed"); } else { inBuffer_ = (void **)malloc(sizeof(void *) * numBufs_); memset(inBuffer_, 0, sizeof(void *) * numBufs_); for (unsigned int i = 0; i < numBufs_; i++) { inBuffer_[i] = malloc(bufSize_); CHECK_RESULT(inBuffer_[i] == NULL, "malloc(inBuffer) failed"); } outBuffer_ = malloc(outBufSize_); CHECK_RESULT(outBuffer_ == NULL, "malloc(outBuffer) failed"); } setKernel(); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); const char *buildOps = NULL; // Have to force OCL 2.0 to use SVM SNPRINTF(charbuf, sizeof(charbuf), "-cl-std=CL2.0 -D DATATYPE=%s", types[typeIdx_]); buildOps = charbuf; error_ = _wrapper->clBuildProgram(program_, 1, &device, buildOps, NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "sampleRate", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArgSVMPointer(kernel_, 0, outBuffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(outBuffer) failed"); unsigned int sizeDW = width_ * width_; error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(unsigned int), (void *)&sizeDW); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(sizeDW) failed"); unsigned int writeIt = 0; error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(unsigned int), (void *)&writeIt); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(writeIt) failed"); for (unsigned int i = 0; i < numBufs_; i++) { error_ = _wrapper->clSetKernelArgSVMPointer(kernel_, i + 3, inBuffer_[i]); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(inBuffer) failed"); setData(inBuffer_[i], 0x3f800000); if (skip_) return; } setData(outBuffer_, 0xdeadbeef); #else skip_ = true; testDescString = "SVM NOT supported for < 2.0 builds. 
Test Skipped."; return; #endif } void OCLPerfSVMSampleRate::run(void) { int global = outBufSize_ / typeSizes[typeIdx_]; int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; unsigned int maxIter = MAX_ITERATIONS * (MAX_BUFS / numBufs_); if (skip_) return; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < maxIter; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); } CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // Test doesn't write anything, so nothing to check // checkData(outBuffer_); // Compute GB/s double perf = ((double)outBufSize_ * NUM_READS * (double)maxIter * (double)(1e-09)) / sec; char buf[256]; SNPRINTF(buf, sizeof(buf), "Domain %dx%d, %2d %s bufs, %6s, %4dx%4d (GB/s)", sizes[NUM_SIZES - 1], sizes[NUM_SIZES - 1], numBufs_, testdesc.c_str(), types[typeIdx_], width_, width_); _perfInfo = (float)perf; testDescString = buf; } unsigned int OCLPerfSVMSampleRate::close(void) { #if defined(CL_VERSION_2_0) if (cmd_queue_) _wrapper->clFinish(cmd_queue_); if ((svmMode_ == 0) || (svmMode_ == 1)) { if (inBuffer_) { for (unsigned int i = 0; i < numBufs_; i++) { if (inBuffer_[i]) { _wrapper->clSVMFree(context_, inBuffer_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clSVMFree(inBuffer_) failed"); } } free(inBuffer_); } if (outBuffer_) { _wrapper->clSVMFree(context_, outBuffer_); } } else { if (inBuffer_) { for (unsigned int i = 0; i < numBufs_; i++) { if (inBuffer_[i]) { free(inBuffer_[i]); } } free(inBuffer_); } if (outBuffer_) { free(outBuffer_); } } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } #endif return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSVMSampleRate.h000066400000000000000000000040551450307266000253730ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
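// ---------------------------------------------------------------------------
// For reference, OCLPerfSVMSampleRate::setKernel() above expands to source of
// the following shape when numBufs_ == 2 and DATATYPE is float. writeIt is 0
// at run time, so the guarded store never executes and the kernel is a pure
// read-rate test:
//
//   kernel void sampleRate(global float* outBuffer, unsigned int inBufSize,
//                          unsigned int writeIt,
//                          global float* inBuffer0,
//                          global float* inBuffer1)
//   {
//     uint gid = get_global_id(0);
//     uint inputIdx = gid % inBufSize;
//     float tmp = (float)0.0f;
//     tmp += inBuffer0[inputIdx];
//     tmp += inBuffer1[inputIdx];
//     inputIdx += writeIt;
//     if (writeIt*(unsigned int)tmp) outBuffer[gid] = tmp;
//   }
// ---------------------------------------------------------------------------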
*/ #ifndef _OCL_SVMSAMPLERATE_H_ #define _OCL_SVMSAMPLERATE_H_ #include "OCLTestImp.h" class OCLPerfSVMSampleRate : public OCLTestImp { public: OCLPerfSVMSampleRate(); virtual ~OCLPerfSVMSampleRate(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void setData(void* buffer, unsigned int data); void checkData(void* buffer); void setKernel(void); cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; void** inBuffer_; void* outBuffer_; unsigned int width_; unsigned int bufSize_; unsigned int outBufSize_; static const unsigned int MAX_ITERATIONS = 25; unsigned int numBufs_; unsigned int typeIdx_; unsigned int svmMode_; bool skip_; bool coarseGrainBuffer_; bool fineGrainBuffer_; bool fineGrainSystem_; std::string testdesc; }; #endif // _OCL_SVMSAMPLERATE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSampleRate.cpp000066400000000000000000000272031450307266000253400ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
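// ---------------------------------------------------------------------------
// The three svmMode_ values used by OCLPerfSVMSampleRate map onto the
// CL_DEVICE_SVM_CAPABILITIES bits queried in open(), and the "crs"/"fgb"/
// "fgs" labels come from that test's testdesc strings. A minimal sketch of
// the same decode, assuming an OpenCL 2.0 device (helper name illustrative):
#if defined(CL_VERSION_2_0)
#include <CL/cl.h>
static const char *svmModeName(cl_device_id dev, unsigned svmMode) {
  cl_device_svm_capabilities caps = 0;
  clGetDeviceInfo(dev, CL_DEVICE_SVM_CAPABILITIES, sizeof(caps), &caps, NULL);
  if (svmMode == 0)  // coarse grain: mandatory for OCL 2.0 devices
    return (caps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER) ? "crs" : "unsupported";
  if (svmMode == 1)  // fine grain buffer: optional
    return (caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER) ? "fgb" : "unsupported";
  // fine grain system: optional; plain malloc'd memory becomes usable
  return (caps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM) ? "fgs" : "unsupported";
}
#endif
// ---------------------------------------------------------------------------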
*/ #include "OCLPerfSampleRate.h" #include #include #include #include "CL/cl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_TYPES 3 static const char *types[NUM_TYPES] = {"float", "float2", "float4"}; static const unsigned int typeSizes[NUM_TYPES] = {4, 8, 16}; #define NUM_SIZES 12 static const unsigned int sizes[NUM_SIZES] = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048}; #define NUM_BUFS 6 #define MAX_BUFS (1 << (NUM_BUFS - 1)) OCLPerfSampleRate::OCLPerfSampleRate() { _numSubTests = NUM_TYPES * NUM_SIZES * NUM_BUFS; skip_ = false; } OCLPerfSampleRate::~OCLPerfSampleRate() {} void OCLPerfSampleRate::setKernel(void) { shader_.clear(); shader_ += "kernel void sampleRate(global DATATYPE* outBuffer, unsigned int " "inBufSize, unsigned int writeIt,\n"; char buf[256]; for (unsigned int i = 0; i < numBufs_; i++) { SNPRINTF(buf, sizeof(buf), "global DATATYPE* inBuffer%d", i); shader_ += buf; if (i < (numBufs_ - 1)) { shader_ += ","; } shader_ += "\n"; } shader_ += ")\n"; shader_ += "{\n" " uint gid = get_global_id(0);\n" " uint inputIdx = gid % inBufSize;\n" " DATATYPE tmp = (DATATYPE)0.0f;\n"; for (unsigned int i = 0; i < numBufs_; i++) { SNPRINTF(buf, sizeof(buf), " tmp += inBuffer%d[inputIdx];\n", i); shader_ += buf; } if (typeSizes[typeIdx_] > 4) { shader_ += " if (writeIt*(unsigned int)tmp.x) outBuffer[gid] = tmp;\n" "}\n"; } else { shader_ += " if (writeIt*(unsigned int)tmp) outBuffer[gid] = tmp;\n" "}\n"; } // printf("Shader -> %s\n", shader_.c_str()); } void OCLPerfSampleRate::setData(cl_mem buffer, unsigned int val) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); if (data == NULL) { if ((error_ == CL_MEM_OBJECT_ALLOCATION_FAILURE) || (error_ == CL_OUT_OF_RESOURCES) || (error_ == CL_OUT_OF_HOST_MEMORY)) { printf("WARNING: Not enough memory, skipped\n"); error_ = CL_SUCCESS; skip_ = true; } else { CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueMapBuffer failed"); } return; } for (unsigned int i = 0; i < bufSize_ / sizeof(unsigned int); i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } void OCLPerfSampleRate::checkData(cl_mem buffer) { float *data = (float *)_wrapper->clEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_READ, 0, outBufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < outBufSize_ / sizeof(float); i++) { if (data[i] != (float)numBufs_) { printf("Data validation failed at %d! 
void OCLPerfSampleRate::checkData(cl_mem buffer) {
  float *data = (float *)_wrapper->clEnqueueMapBuffer(
      cmd_queue_, buffer, true, CL_MAP_READ, 0, outBufSize_, 0, NULL, NULL,
      &error_);
  for (unsigned int i = 0; i < outBufSize_ / sizeof(float); i++) {
    if (data[i] != (float)numBufs_) {
      printf("Data validation failed at %d! Got %f, expected %f\n", i, data[i],
             (float)numBufs_);
      break;
    }
  }
  error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL,
                                             NULL);
}

static void CL_CALLBACK notify_callback(const char *errinfo,
                                        const void *private_info, size_t cb,
                                        void *user_data) {}

void OCLPerfSampleRate::open(unsigned int test, char *units,
                             double &conversion, unsigned int deviceId) {
  cl_uint numPlatforms;
  cl_platform_id platform = NULL;
  cl_uint num_devices = 0;
  cl_device_id *devices = NULL;
  cl_device_id device = NULL;
  _crcword = 0;
  conversion = 1.0f;
  _deviceId = deviceId;
  _openTest = test;
  context_ = 0;
  cmd_queue_ = 0;
  program_ = 0;
  kernel_ = 0;
  inBuffer_ = 0;
  outBuffer_ = 0;
  // We compute a square domain
  width_ = sizes[test % NUM_SIZES];
  typeIdx_ = (test / NUM_SIZES) % NUM_TYPES;
  bufSize_ = width_ * width_ * typeSizes[typeIdx_];
  numBufs_ = (1 << (test / (NUM_SIZES * NUM_TYPES)));

  error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
  if (0 < numPlatforms) {
    cl_platform_id *platforms = new cl_platform_id[numPlatforms];
    error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL);
    CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed");
    platform = platforms[_platformIndex];
    num_devices = 0;
    /* Get the number of requested devices */
    error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL,
                                      &num_devices);
    delete[] platforms;  // allocated with new[], so delete[] is required
  }
  /*
   * If we could find a platform, use it.
   */
  CHECK_RESULT(platform == 0,
               "Couldn't find platform with GPU devices, cannot proceed");
  devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id));
  CHECK_RESULT(devices == 0, "no devices");
  /* Get the requested device */
  error_ =
      _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed");
  CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available");
  device = devices[_deviceId];
  context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL,
                                       &error_);
  CHECK_RESULT(context_ == 0, "clCreateContext failed");
  char charbuf[1024];
  size_t retsize;
  error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024,
                                     charbuf, &retsize);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");
  cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL);
  CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed");

  inBuffer_ = (cl_mem *)malloc(sizeof(cl_mem) * numBufs_);
  memset(inBuffer_, 0, sizeof(cl_mem) * numBufs_);
  for (unsigned int i = 0; i < numBufs_; i++) {
    inBuffer_[i] = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY,
                                            bufSize_, NULL, &error_);
    CHECK_RESULT(inBuffer_[i] == 0, "clCreateBuffer(inBuffer) failed");
  }
  outBufSize_ =
      sizes[NUM_SIZES - 1] * sizes[NUM_SIZES - 1] * typeSizes[NUM_TYPES - 1];
  outBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY,
                                        outBufSize_, NULL, &error_);
  CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed");
  setKernel();
  char *tmp = (char *)shader_.c_str();
  program_ = _wrapper->clCreateProgramWithSource(
      context_, 1, (const char **)&tmp, NULL, &error_);
  CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed");
  const char *buildOps = NULL;
  SNPRINTF(charbuf, sizeof(charbuf), "-D DATATYPE=%s", types[typeIdx_]);
  buildOps = charbuf;
  error_ = _wrapper->clBuildProgram(program_, 1, &device, buildOps, NULL, NULL);
  if (error_ != CL_SUCCESS) {
    cl_int intError;
    char log[16384];
    intError = _wrapper->clGetProgramBuildInfo(program_, device,
                                               CL_PROGRAM_BUILD_LOG, 16384 *
sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "sampleRate", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer_); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(outBuffer) failed"); unsigned int sizeDW = width_ * width_; error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(unsigned int), (void *)&sizeDW); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(sizeDW) failed"); unsigned int writeIt = 0; error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(unsigned int), (void *)&writeIt); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(writeIt) failed"); for (unsigned int i = 0; i < numBufs_; i++) { error_ = _wrapper->clSetKernelArg(kernel_, i + 3, sizeof(cl_mem), (void *)&inBuffer_[i]); CHECK_RESULT(error_ != CL_SUCCESS, "clSetKernelArg(inBuffer) failed"); setData(inBuffer_[i], 0x3f800000); if (skip_) return; } setData(outBuffer_, 0xdeadbeef); } void OCLPerfSampleRate::run(void) { int global = outBufSize_ / typeSizes[typeIdx_]; int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; unsigned int maxIter = MAX_ITERATIONS * (MAX_BUFS / numBufs_); if (skip_) return; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < maxIter; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); } CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // checkData(outBuffer_); // Compute GB/s double perf = ((double)outBufSize_ * numBufs_ * (double)maxIter * (double)(1e-09)) / sec; char buf[256]; SNPRINTF(buf, sizeof(buf), "Domain %dx%d, %2d bufs, %6s, %4dx%4d (GB/s)", sizes[NUM_SIZES - 1], sizes[NUM_SIZES - 1], numBufs_, types[typeIdx_], width_, width_); _perfInfo = (float)perf; testDescString = buf; } unsigned int OCLPerfSampleRate::close(void) { _wrapper->clFinish(cmd_queue_); if (inBuffer_) { for (unsigned int i = 0; i < numBufs_; i++) { if (inBuffer_[i]) { error_ = _wrapper->clReleaseMemObject(inBuffer_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } } free(inBuffer_); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSampleRate.h000066400000000000000000000037111450307266000250030ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_SAMPLERATE_H_ #define _OCL_SAMPLERATE_H_ #include "OCLTestImp.h" class OCLPerfSampleRate : public OCLTestImp { public: OCLPerfSampleRate(); virtual ~OCLPerfSampleRate(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void setData(cl_mem buffer, unsigned int data); void checkData(cl_mem buffer); void setKernel(void); cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem* inBuffer_; cl_mem outBuffer_; cl_int error_; unsigned int width_; unsigned int bufSize_; unsigned int outBufSize_; static const unsigned int MAX_ITERATIONS = 25; unsigned int numBufs_; unsigned int typeIdx_; bool skip_; }; #endif // _OCL_SAMPLERATE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfScalarReplArrayElem.cpp000066400000000000000000000266251450307266000271440ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfScalarReplArrayElem.h" #include #include #include #include "CL/cl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 1 static const unsigned int Sizes[NUM_SIZES] = {16777216}; // 16 static void genKernelSource(const char *vtypeName, unsigned arrayLen, unsigned loopCount, char *source) { sprintf(source, "%s foo(uint lid, __local %s *localLocal)\n" "{\n" " %s val0 = 0.0f;\n" " %s val1 = 0.0f;\n" " for (int i = 0; i < %d; ++i) {\n" " val0 += localLocal[lid];\n" " lid += 16;\n" " }\n" " %s val = val0+val1;\n" " return val;\n" "}\n" "__kernel __attribute__((reqd_work_group_size(64,1,1)))" " void _ldsReadSpeed(__global %s *outBuf)\n" "{\n" " uint gid = (int) get_global_id(0);\n" " uint lid = (int) get_local_id(0);\n" " __local %s localLocal[%d];\n" " outBuf[gid] = foo(lid, localLocal);\n" "}\n", vtypeName, vtypeName, vtypeName, vtypeName, loopCount, vtypeName, vtypeName, vtypeName, arrayLen); } typedef struct { const char *name; unsigned nBytes; } ExplicitType; static const ExplicitType tyChar = {"char", 1}; static const ExplicitType tyShort = {"short", 2}; static const ExplicitType tyInt = {"int", 4}; static const ExplicitType tyLong = {"long", 8}; static const ExplicitType tyFloat = {"float", 4}; static const ExplicitType tyDouble = {"double", 8}; typedef struct { ExplicitType elemType; unsigned nElems; const char *name; unsigned getSize() const { return elemType.nBytes * nElems; } } VectorType; static const VectorType vecTypes[] = { {tyChar, 8, "char8"}, {tyShort, 4, "short4"}, {tyInt, 2, "int2"}, {tyFloat, 2, "float2"}, {tyLong, 1, "long"}, {tyChar, 16, "char16"}, {tyShort, 8, "short8"}, {tyInt, 4, "int4"}, {tyFloat, 4, "float4"}, {tyLong, 2, "long2"}, {tyShort, 16, "short16"}, {tyInt, 8, "int8"}, {tyFloat, 8, "float8"}, {tyLong, 4, "long4"}, {tyInt, 16, "int16"}, {tyFloat, 16, "float16"}, {tyLong, 8, "long8"}, {tyLong, 16, "long16"}}; static const unsigned ldsBytes = 4 * 4096; static const unsigned nVecTypes = sizeof(vecTypes) / sizeof(VectorType); void OCLPerfScalarReplArrayElem::genShader(unsigned int idx) { VectorType vecType = vecTypes[idx]; ExplicitType elemType = vecType.elemType; unsigned vecSize = vecType.nElems; unsigned arrayLen = ldsBytes / vecType.getSize(); unsigned loopCount = arrayLen / 16; char source[7192]; genKernelSource(vecType.name, arrayLen, loopCount, source); shader_ = std::string(source); numReads_ = loopCount; itemWidth_ = vecType.getSize(); } OCLPerfScalarReplArrayElem::OCLPerfScalarReplArrayElem() { _numSubTests = NUM_SIZES * nVecTypes; } OCLPerfScalarReplArrayElem::~OCLPerfScalarReplArrayElem() {} void OCLPerfScalarReplArrayElem::setData(cl_mem buffer, float val) { float *data = (float *)_wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < (bufSize_ >> 2); i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } void OCLPerfScalarReplArrayElem::checkData(cl_mem buffer) { float *data = (float *)_wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < (bufSize_ >> 2); i++) { if (data[i] != (float)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned 
int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfScalarReplArrayElem::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; outBuffer_ = 0; _openTest = test; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete platforms; } width_ = Sizes[test % NUM_SIZES]; shaderIdx_ = test / NUM_SIZES; bufSize_ = width_; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); genShader(shaderIdx_); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &device, "", NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "_ldsReadSpeed", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer_); // setData(outBuffer_, 1.2345678f); } void OCLPerfScalarReplArrayElem::run(void) { int global = bufSize_ / itemWidth_; int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < NUM_ITER; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // Constant bandwidth in GB/s double perf = ((double)global * numReads_ * itemWidth_ * NUM_ITER * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " %10s %8d threads, %4d reads (GB/s)", vecTypes[shaderIdx_].name, global, numReads_); testDescString = buf; // checkData(outBuffer_); } unsigned int OCLPerfScalarReplArrayElem::close(void) { if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfScalarReplArrayElem.h000066400000000000000000000040261450307266000266000ustar00rootroot00000000000000/* 
Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ScalarReplArrayElem_H_ #define _OCL_ScalarReplArrayElem_H_ #include "OCLTestImp.h" class OCLPerfScalarReplArrayElem : public OCLTestImp { public: OCLPerfScalarReplArrayElem(); virtual ~OCLPerfScalarReplArrayElem(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void genShader(unsigned int idx); void setData(cl_mem buffer, float data); void checkData(cl_mem buffer); static const unsigned int NUM_ITER = 100; cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem outBuffer_; cl_int error_; unsigned int width_; unsigned int bufSize_; unsigned int numReads_; unsigned int shaderIdx_; unsigned int itemWidth_; unsigned int vecTypeIdx_; unsigned int vecSizeIdx_; }; #endif // _OCL_ScalarReplArrayElem_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSdiP2PCopy.cpp000066400000000000000000000245151450307266000252020ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfSdiP2PCopy.h" #include #include "Timer.h" #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 5 // 64KB, 256KB, 1 MB, 4MB, 16 MB static const unsigned int Sizes[NUM_SIZES] = {65536, 262144, 1048576, 4194304, 16777216}; OCLPerfSdiP2PCopy::OCLPerfSdiP2PCopy() { // If there are two different gpus in the system, // we have to test each of them _numSubTests = 2 * NUM_SIZES; } OCLPerfSdiP2PCopy::~OCLPerfSdiP2PCopy() {} void OCLPerfSdiP2PCopy::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { cl_uint numPlatforms = 0; cl_platform_id platform = NULL; cl_uint num_devices = 0; _crcword = 0; conversion = 1.0f; _openTest = test % NUM_SIZES; bufSize_ = Sizes[_openTest]; error_ = 0; srcBuff_ = 0; inputArr_ = 0; outputArr_ = 0; extPhysicalBuff_ = 0; silentFailure = false; busAddressableBuff_ = 0; devices_[0] = devices_[1] = 0; contexts_[0] = contexts_[1] = 0; cmd_queues_[0] = cmd_queues_[1] = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(numPlatforms == 0, "clGetPlatformIDs failed"); error_ = _wrapper->clGetPlatformIDs(1, &platform, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); error_ = _wrapper->clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices); if (num_devices != 2) { printf( "\nSilent Failure: Two GPUs are required to run OCLPerfSdiP2PCopy " "test\n"); silentFailure = true; return; } error_ = _wrapper->clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, num_devices, devices_, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); if (test >= NUM_SIZES) { cl_device_id temp = devices_[0]; devices_[0] = devices_[1]; devices_[1] = temp; } size_t param_size = 0; char* strExtensions = 0; error_ = _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_EXTENSIONS, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strExtensions = (char*)malloc(param_size); error_ = _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_EXTENSIONS, param_size, strExtensions, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strstr(strExtensions, "cl_amd_bus_addressable_memory") == 0) { printf( "\nSilent Failure: cl_amd_bus_addressable_memory extension is not " "enabled on GPU 0\n"); silentFailure = true; free(strExtensions); return; } free(strExtensions); error_ = _wrapper->clGetDeviceInfo(devices_[1], CL_DEVICE_EXTENSIONS, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strExtensions = (char*)malloc(param_size); error_ = _wrapper->clGetDeviceInfo(devices_[1], CL_DEVICE_EXTENSIONS, param_size, strExtensions, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strstr(strExtensions, "cl_amd_bus_addressable_memory") == 0) { printf( "\nSilent Failure: cl_amd_bus_addressable_memory extension is not " "enabled on GPU 1\n"); silentFailure = true; free(strExtensions); return; } free(strExtensions); deviceNames_ = " ["; param_size = 0; char* strDeviceName = 0; error_ = _wrapper->clGetDeviceInfo(devices_[1], CL_DEVICE_NAME, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strDeviceName = (char*)malloc(param_size); error_ = _wrapper->clGetDeviceInfo(devices_[1], CL_DEVICE_NAME, param_size, strDeviceName, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); deviceNames_ = deviceNames_ + strDeviceName; free(strDeviceName); error_ = _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_NAME, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, 
"clGetDeviceInfo failed"); strDeviceName = (char*)malloc(param_size); error_ = _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_NAME, param_size, strDeviceName, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); deviceNames_ = deviceNames_ + "->"; deviceNames_ = deviceNames_ + strDeviceName; free(strDeviceName); deviceNames_ = deviceNames_ + "]"; cl_context_properties props[3] = {CL_CONTEXT_PLATFORM, (cl_context_properties)platform, 0}; contexts_[0] = _wrapper->clCreateContext(props, 1, &devices_[0], 0, 0, &error_); CHECK_RESULT(contexts_[0] == 0, "clCreateContext failed"); contexts_[1] = _wrapper->clCreateContext(props, 1, &devices_[1], 0, 0, &error_); CHECK_RESULT(contexts_[1] == 0, "clCreateContext failed"); cmd_queues_[0] = _wrapper->clCreateCommandQueue(contexts_[0], devices_[0], 0, NULL); CHECK_RESULT(cmd_queues_[0] == 0, "clCreateCommandQueue failed"); cmd_queues_[1] = _wrapper->clCreateCommandQueue(contexts_[1], devices_[1], 0, NULL); CHECK_RESULT(cmd_queues_[1] == 0, "clCreateCommandQueue failed"); busAddressableBuff_ = _wrapper->clCreateBuffer( contexts_[0], CL_MEM_BUS_ADDRESSABLE_AMD, bufSize_, 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); error_ = _wrapper->clEnqueueMakeBuffersResidentAMD( cmd_queues_[0], 1, &busAddressableBuff_, true, &busAddr_, 0, 0, 0); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueMakeBuffersResidentAMD failed"); extPhysicalBuff_ = _wrapper->clCreateBuffer( contexts_[1], CL_MEM_EXTERNAL_PHYSICAL_AMD, bufSize_, &busAddr_, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); srcBuff_ = _wrapper->clCreateBuffer(contexts_[1], CL_MEM_READ_WRITE, bufSize_, 0, &error_); CHECK_RESULT(error_ != CL_SUCCESS, "clCreateBuffer failed"); inputArr_ = (cl_uint*)malloc(bufSize_); outputArr_ = (cl_uint*)malloc(bufSize_); for (unsigned int i = 0; i < (bufSize_ / sizeof(cl_uint)); ++i) { inputArr_[i] = i + 1; outputArr_[i] = 0; } error_ = _wrapper->clEnqueueWriteBuffer(cmd_queues_[1], srcBuff_, CL_TRUE, 0, bufSize_, inputArr_, 0, 0, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteBuffer failed"); } void OCLPerfSdiP2PCopy::run(void) { if (silentFailure) { return; } CPerfCounter timer; // Warm up error_ = _wrapper->clEnqueueCopyBuffer(cmd_queues_[1], srcBuff_, extPhysicalBuff_, 0, 0, bufSize_, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBuffer failed"); error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); timer.Reset(); timer.Start(); for (unsigned int i = 0; i < NUM_ITER; i++) { error_ = _wrapper->clEnqueueCopyBuffer(cmd_queues_[1], srcBuff_, extPhysicalBuff_, 0, 0, bufSize_, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBuffer failed"); } error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); timer.Stop(); double sec = timer.GetElapsedTime(); error_ = _wrapper->clEnqueueReadBuffer(cmd_queues_[0], busAddressableBuff_, CL_TRUE, 0, bufSize_, outputArr_, 0, 0, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteBuffer failed"); CHECK_RESULT((memcmp(inputArr_, outputArr_, bufSize_) != 0), "copy failed"); // Buffer copy bandwidth in GB/s double perf = ((double)bufSize_ * NUM_ITER * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), " (%8d bytes) i:%4d (GB/s) %s", bufSize_, NUM_ITER, deviceNames_.c_str()); testDescString = buf; } unsigned int OCLPerfSdiP2PCopy::close(void) { if (srcBuff_) { error_ = _wrapper->clReleaseMemObject(srcBuff_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, 
"clReleaseMemObject failed"); } if (extPhysicalBuff_) { error_ = _wrapper->clReleaseMemObject(extPhysicalBuff_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject failed"); } if (busAddressableBuff_) { error_ = _wrapper->clReleaseMemObject(busAddressableBuff_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject failed"); } if (cmd_queues_[0]) { error_ = _wrapper->clReleaseCommandQueue(cmd_queues_[0]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (cmd_queues_[1]) { error_ = _wrapper->clReleaseCommandQueue(cmd_queues_[1]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (contexts_[0]) { error_ = _wrapper->clReleaseContext(contexts_[0]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } if (contexts_[1]) { error_ = _wrapper->clReleaseContext(contexts_[1]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } if (inputArr_) { free(inputArr_); } if (outputArr_) { free(outputArr_); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfSdiP2PCopy.h000066400000000000000000000035451450307266000246470ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_SdiP2PCopy_H_ #define _OCL_SdiP2PCopy_H_ #include "OCLTestImp.h" class OCLPerfSdiP2PCopy : public OCLTestImp { public: OCLPerfSdiP2PCopy(); virtual ~OCLPerfSdiP2PCopy(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: static const unsigned int NUM_ITER = 1024; bool silentFailure; cl_context contexts_[2]; cl_device_id devices_[2]; cl_command_queue cmd_queues_[2]; cl_mem srcBuff_; cl_mem extPhysicalBuff_; cl_mem busAddressableBuff_; cl_int error_; cl_bus_address_amd busAddr_; cl_uint* inputArr_; cl_uint* outputArr_; unsigned int bufSize_; std::string deviceNames_; }; #endif // _OCL_SdiP2PCopy_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfTextureMemLatency.cpp000066400000000000000000000344231450307266000267240ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

#include "OCLPerfTextureMemLatency.h"

#include <algorithm>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "CL/cl.h"
#include "Timer.h"

static const unsigned int NUM_SIZES = 13;
// 2k up to 64MB
static const cl_uint2 Dims[NUM_SIZES] = {
    {{32, 16}},   {{32, 32}},   {{64, 32}},    {{64, 64}},    {{128, 64}},
    {{128, 128}}, {{256, 128}}, {{256, 256}},  {{512, 256}},  {{512, 512}},
    {{1024, 512}}, {{1024, 1024}}, {{2048, 1024}}};

// Quiet pesky warnings
#ifdef WIN_OS
#define SNPRINTF sprintf_s
#else
#define SNPRINTF snprintf
#endif

void OCLPerfTextureMemLatency::genShader() {
  shader_.clear();
  // Adopted from SiSoft Sandra 2013's memory latency test
  shader_ +=
      "constant sampler_t insample = CLK_NORMALIZED_COORDS_FALSE | "
      "CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;\n"
      "__kernel\n"
      "__attribute__((work_group_size_hint(1, 1, 1)))\n"
      "void MemWalker(\n"
      " read_only image2d_t input,\n"
      " __global uint * restrict output,\n"
      " const uint uCount, const uint uSize,\n"
      " const uint4 uOffset, const int bMem, const uint repeats)\n"
      "{\n"
      " uint4 o = uOffset;\n"
      " uint lid = get_local_id(0);\n"
      " uint4 x = lid*o;\n"
      "\n"
      " for (uint loop = 0; (loop < repeats); loop++) {\n"
      " uint i = uCount;\n"
      " int2 nx = (int2)(0,0);\n"
      " nx = (int2)((x.y << 8) | x.x, (x.w << 8) | x.z);\n"
      " while (i--) {\n"
      " x = read_imageui(input, insample, nx);\n"
      " x.x += o.x;\n"
      " x.z += o.z;\n"
      " nx = (int2)((x.y << 8) | x.x, (x.w << 8) | x.z);\n"
      " }\n"
      " }\n"
      "\n"
      " output[0] = x.x + x.y;\n"
      "}\n";
  // printf("shader:\n%s\n", shader_.c_str());
  shader_ += "\n\n";
  shader_ +=
      "__kernel\n"
      "__attribute__((work_group_size_hint(1, 1, 1)))\n"
      "void Overhead(\n"
      " read_only image2d_t input,\n"
      " __global uint * restrict output,\n"
      " const uint uCount, const uint uSize,\n"
      " const uint4 uOffset, const int bMem, const uint repeats)\n"
      "{\n"
      " uint4 o = uOffset;\n"
      " uint lid = get_local_id(0);\n"
      " uint4 x = lid*o;\n"
      " x += o;\n"
      " int2 nx;\n"
      " for (uint loop = 0; loop < repeats; loop++) {\n"
      " uint i = uCount;\n"
      " nx = (int2)(0,0);\n"
      " nx = (int2)((x.y << 8) | x.x, (x.w << 8) | x.z);\n"
      " while (i--) {\n"
      " x.x = nx.x + o.x;\n"
      " x.z = nx.y + o.y;\n"
      " nx = (int2)((x.y << 8) | x.x, (x.w << 8) | x.z);\n"
      " }\n"
      " }\n"
      " output[0] = nx.x | nx.y;\n"
      "}\n";
}

static void CL_CALLBACK notify_callback(const char *errinfo,
                                        const void *private_info, size_t cb,
                                        void *user_data) {}

OCLPerfTextureMemLatency::OCLPerfTextureMemLatency() {
  _numSubTests = NUM_SIZES;
  maxSize_ = Dims[NUM_SIZES - 1].s[0] * Dims[NUM_SIZES - 1].s[1];
}

OCLPerfTextureMemLatency::~OCLPerfTextureMemLatency() {}
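// setData() below turns the image into a permutation chain: the element at
// the current position stores the packed (x, y) coordinates of the next
// element to visit, generated with the stride (1024 + 17) so consecutive hops
// land far apart. MemWalker then serializes on every read, because the next
// sample address depends on the value just loaded -- the classic
// pointer-chasing technique for exposing raw memory latency -- and run()
// subtracts the Overhead kernel's time to remove the address-arithmetic cost.
// The same idea on the host, as a minimal hypothetical sketch (chaseNs and
// g_sink are illustrative, not part of the test framework):
//
//   static volatile unsigned g_sink;  // consume the result; keeps the chain live
//   static double chaseNs(const unsigned *next, unsigned hops) {
//     unsigned idx = 0;
//     CPerfCounter t;  // the same timer class these tests use
//     t.Reset();
//     t.Start();
//     for (unsigned i = 0; i < hops; ++i) idx = next[idx];  // dependent loads
//     t.Stop();
//     g_sink = idx;
//     return t.GetElapsedTime() * 1e9 / (double)hops;  // average ns per hop
//   }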
void OCLPerfTextureMemLatency::setData(cl_mem buffer, unsigned int val) { size_t origin[3] = {0, 0, 0}; size_t region[3] = {width_, height_, 1}; void *ptr = _wrapper->clEnqueueMapImage( cmd_queue_, buffer, true, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed."); unsigned int *data = (unsigned int *)ptr; unsigned int nextOffset = 0; for (unsigned int i = 0; i < bufSizeDW_; i++) { unsigned int offset = ((1024 + 17) * (i + 1)) % bufSizeDW_; unsigned int x, y; x = offset % width_; y = offset / width_; unsigned int newx, newy; newx = nextOffset % width_; newy = nextOffset / width_; data[newy * image_row_pitch / sizeof(unsigned int) + newx] = (y << 16) | (x & 0xffff); nextOffset = offset; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, ptr, 0, NULL, NULL); clFinish(cmd_queue_); } void OCLPerfTextureMemLatency::checkData(cl_mem buffer) { void *ptr = _wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_READ, 0, sizeof(cl_uint), 0, NULL, NULL, &error_); unsigned int *data = (unsigned int *)ptr; if (data[0] != 0) { printf("OutData= 0x%08x\n", data[0]); CHECK_RESULT_NO_RETURN(data[0] != 0, "Data validation failed!\n"); } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, ptr, 0, NULL, NULL); clFinish(cmd_queue_); } void OCLPerfTextureMemLatency::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; inBuffer_ = 0; outBuffer_ = 0; _errorFlag = false; // Reset error code so a single error doesn't prevent // other subtests from running _errorMsg = ""; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { } } delete platforms; } width_ = Dims[test % NUM_SIZES].s[0]; height_ = Dims[test % NUM_SIZES].s[1]; bufSizeDW_ = width_ * height_; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); device = devices[0]; free(devices); context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_image_format format = {CL_RGBA, CL_UNSIGNED_INT8}; inBuffer_ = _wrapper->clCreateImage2D(context_, CL_MEM_READ_ONLY, &format, width_, height_, 0, NULL, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateImage(inBuffer) failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, 0, sizeof(cl_uint), NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); genShader(); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); std::string args; args.clear(); error_ = _wrapper->clBuildProgram(program_, 1, &device, args.c_str(), NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "MemWalker", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel(MemWalker) failed"); kernel2_ = _wrapper->clCreateKernel(program_, "Overhead", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel(Overhead) failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void *)&bufSizeDW_); error_ = _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_uint), (void *)&bufSizeDW_); cl_uint4 zero; zero.s[0] = 0; zero.s[1] = 0; zero.s[2] = 0; zero.s[3] = 0; error_ = _wrapper->clSetKernelArg(kernel_, 4, sizeof(cl_uint4), (void *)&zero); int bMem = 1; error_ = _wrapper->clSetKernelArg(kernel_, 5, sizeof(cl_int), (void *)&bMem); repeats_ = std::max((maxSize_ >> 2) / bufSizeDW_, 1u); error_ = _wrapper->clSetKernelArg(kernel_, 6, sizeof(cl_uint), (void *)&repeats_); error_ = _wrapper->clSetKernelArg(kernel2_, 0, sizeof(cl_mem), (void *)&inBuffer_); error_ = _wrapper->clSetKernelArg(kernel2_, 1, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel2_, 2, sizeof(cl_uint), (void *)&bufSizeDW_); error_ = _wrapper->clSetKernelArg(kernel2_, 3, sizeof(cl_uint), (void *)&bufSizeDW_); error_ = _wrapper->clSetKernelArg(kernel2_, 4, sizeof(cl_uint4), (void *)&zero); error_ = _wrapper->clSetKernelArg(kernel2_, 5, sizeof(cl_int), (void *)&bMem); error_ = _wrapper->clSetKernelArg(kernel2_, 6, sizeof(cl_uint), (void *)&repeats_); setData(inBuffer_, (int)1.0f); } void OCLPerfTextureMemLatency::run(void) { int global = 1; int local = 1; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; // Warm-up unsigned int warmup = 128; error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void *)&warmup); error_ = 
_wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void *)&bufSizeDW_); _wrapper->clFinish(cmd_queue_); CPerfCounter timer, timer2; timer.Reset(); timer.Start(); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); timer.Stop(); checkData(outBuffer_); timer2.Reset(); timer2.Start(); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel2_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); _wrapper->clFinish(cmd_queue_); timer2.Stop(); double sec = timer.GetElapsedTime() - timer2.GetElapsedTime(); // Read latency in ns double perf = sec * (double)(1e09) / ((double)bufSizeDW_ * (double)repeats_); _perfInfo = (float)perf; char buf[256]; SNPRINTF(buf, sizeof(buf), "%8d reads, %5d repeats (ns)", bufSizeDW_, repeats_); testDescString = buf; } unsigned int OCLPerfTextureMemLatency::close(void) { _wrapper->clFinish(cmd_queue_); if (inBuffer_) { error_ = _wrapper->clReleaseMemObject(inBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (kernel2_) { error_ = _wrapper->clReleaseKernel(kernel2_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfTextureMemLatency.h000066400000000000000000000037741450307266000263760ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_TEXTUREMEMLATENCY_H_ #define _OCL_TEXTUREMEMLATENCY_H_ #include "OCLTestImp.h" class OCLPerfTextureMemLatency : public OCLTestImp { public: OCLPerfTextureMemLatency(); virtual ~OCLPerfTextureMemLatency(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void genShader(void); void setData(cl_mem buffer, unsigned int data); void checkData(cl_mem buffer); cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_kernel kernel2_; cl_mem inBuffer_; cl_mem outBuffer_; cl_int error_; unsigned int width_; unsigned int height_; size_t image_row_pitch; size_t image_slice_pitch; unsigned int bufSizeDW_; unsigned int repeats_; unsigned int maxSize_; }; #endif // _OCL_TEXTUREMEMLATENCY_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfUAVReadSpeed.cpp000066400000000000000000000534331450307266000255170ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPerfUAVReadSpeed.h" #include #include #include #include "CL/cl.h" #include "Timer.h" static const unsigned int NUM_SIZES = 4; static const unsigned int NUM_READ_MODES = 6; // Limit to 32 reads for now static const unsigned int MAX_READ_MODES = 4; static const unsigned int NumReads[NUM_READ_MODES] = {1, 4, 16, 32, 64, 128}; // 256KB, 1 MB, 4MB, 16 MB static const unsigned int Sizes[NUM_SIZES] = {262144, 1048576, 4194304, 16777216}; static const unsigned int MaxTypes = 6; static unsigned int NumTypes = MaxTypes; static const char *types[MaxTypes] = {"char", "short", "int", "long", "float", "double"}; static unsigned int StartType = 0; static const unsigned int NumVecWidths = 5; static const char *vecWidths[NumVecWidths] = {"", "2", "4", "8", "16"}; static const unsigned int TypeSize[MaxTypes] = { sizeof(cl_char), sizeof(cl_short), sizeof(cl_int), sizeof(cl_long), sizeof(cl_float), sizeof(cl_double)}; #define CHAR_BUF_SIZE 512 // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif void OCLPerfUAVReadSpeed::genShader(unsigned int type, unsigned int vecWidth, unsigned int numReads) { char buf[CHAR_BUF_SIZE]; shader_.clear(); shader_ += "#ifdef USE_ARENA\n" "#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable\n" "#endif\n"; shader_ += "#ifdef USE_AMD_DOUBLES\n" "#pragma OPENCL EXTENSION cl_amd_fp64 : enable\n" "#endif\n"; shader_ += "#ifdef USE_KHR_DOUBLES\n" "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n" "#endif\n"; SNPRINTF(buf, CHAR_BUF_SIZE, "__kernel void __attribute__((reqd_work_group_size(64,1,1))) " "_uavReadSpeed(__global %s%s * restrict inBuf, __global %s%s * " "restrict outBuf, constant uint * restrict constBuf)\n", types[type], vecWidths[vecWidth], types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += "{\n" " uint i = (uint) get_global_id(0);\n"; if (numReads == 1) { SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += " const unsigned int Max = constBuf[0];\n" " temp = *(inBuf + i % Max);\n"; shader_ += " *(outBuf + i) = temp;\n" "}\n"; } else { SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp0 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp1 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp2 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp3 = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += " const unsigned int Max = constBuf[0];\n" " unsigned int idx0 = (i % Max) + constBuf[1];\n" " unsigned int idx1 = (i % Max) + constBuf[2];\n" " unsigned int idx2 = (i % Max) + constBuf[3];\n" " unsigned int idx3 = (i % Max) + constBuf[4];\n"; for (unsigned int i = 0; i < (numReads >> 2); i++) { shader_ += " temp0 += *(inBuf + idx0);\n"; shader_ += " temp1 += *(inBuf + idx1);\n"; shader_ += " temp2 += *(inBuf + idx2);\n"; shader_ += " temp3 += *(inBuf + idx3);\n"; shader_ += " idx0 += constBuf[5];\n"; shader_ += " idx1 += constBuf[5];\n"; shader_ += " idx2 += constBuf[5];\n"; shader_ += " idx3 += constBuf[5];\n"; } shader_ += " *(outBuf + i) = temp0 + temp1 + temp2 + temp3;\n" "}\n"; } // printf("shader:\n%s\n", shader_.c_str()); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} OCLPerfUAVReadSpeed::OCLPerfUAVReadSpeed() { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint 
num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; context_ = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); // Get last for default platform = platforms[numPlatforms - 1]; for (unsigned i = 0; i < numPlatforms; ++i) { char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[i], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of // just returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { platform = platforms[i]; break; } } delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); char *p = strstr(charbuf, "cl_khr_byte_addressable_store"); char *p2 = strstr(charbuf, "cl_khr_fp64"); char *p3 = strstr(charbuf, "cl_amd_fp64"); NumTypes = MaxTypes; if (!p) { // No arena ops NumTypes -= 2; StartType = 2; } if (!p2 && !p3) { // Doubles not supported NumTypes--; } _numSubTests = NumTypes * NumVecWidths * NUM_SIZES * MAX_READ_MODES * 2; if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } } OCLPerfUAVReadSpeed::~OCLPerfUAVReadSpeed() {} // Fill with 1s of appropriate type void OCLPerfUAVReadSpeed::setData(cl_mem buffer, float val) { void *ptr = _wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); switch (typeIdx_) { case 0: // char { char *data = (char *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(char)); i++) data[i] = (char)val; break; } case 1: // short { short *data = (short *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(short)); i++) data[i] = (short)val; break; } case 2: // int { int *data = (int *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(int)); i++) data[i] = (int)val; break; } case 3: // long { cl_long *data = (cl_long *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(cl_long)); i++) data[i] = (cl_long)val; break; } case 4: // float { float *data = (float *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(float)); i++) data[i] = val; break; } case 5: // double { double *data = (double *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(double)); i++) 
data[i] = (double)val; break; } default: // oops break; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, ptr, 0, NULL, NULL); } void OCLPerfUAVReadSpeed::checkData(cl_mem buffer) { void *ptr = _wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); switch (typeIdx_) { case 0: // char { char *data = (char *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(char)); i++) { if (data[i] != (char)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } case 1: // short { short *data = (short *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(short)); i++) { if (data[i] != (short)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } case 2: // int { int *data = (int *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(int)); i++) { if (data[i] != (int)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } case 3: // long { cl_long *data = (cl_long *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(cl_long)); i++) { if (data[i] != (cl_long)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } case 4: // float { float *data = (float *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(float)); i++) { if (data[i] != (float)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } case 5: // double { double *data = (double *)ptr; for (unsigned int i = 0; i < (bufSize_ / sizeof(double)); i++) { if (data[i] != (double)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } break; } default: // oops break; } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, ptr, 0, NULL, NULL); } void OCLPerfUAVReadSpeed::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 
1.0f; _deviceId = deviceId; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; inBuffer_ = 0; outBuffer_ = 0; constBuffer_ = 0; isAMD = false; _errorFlag = false; // Reset error code so a single error doesn't prevent // other subtests from running _errorMsg = ""; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } numReads_ = NumReads[test % MAX_READ_MODES]; width_ = Sizes[(test / MAX_READ_MODES) % NUM_SIZES]; vecSizeIdx_ = (test / (MAX_READ_MODES * NUM_SIZES)) % NumVecWidths; typeIdx_ = (test / (MAX_READ_MODES * NUM_SIZES * NumVecWidths)) % NumTypes + StartType; cached_ = (test >= (MAX_READ_MODES * NUM_SIZES * NumTypes * NumVecWidths)); bufSize_ = width_; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); device = devices[0]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); inBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateBuffer(inBuffer) failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); constBuffer_ = _wrapper->clCreateBuffer(context_, 0, 16 * 2, NULL, &error_); CHECK_RESULT(constBuffer_ == 0, "clCreateBuffer(constBuffer) failed"); genShader(typeIdx_, vecSizeIdx_, numReads_); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); std::string args; args.clear(); if (cached_ && isAMD) { args = "-fno-alias "; } if (typeIdx_ < 2) { args += "-D USE_ARENA "; } if (typeIdx_ == 5) { if (isAMD) { args += "-D USE_AMD_DOUBLES "; } else { args += "-D USE_KHR_DOUBLES "; } } #if 0 // This setting can dramatically boost the long16 perf results by avoiding spilling. 
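The subtest decode earlier in this open() unpacks the single linear `test` index into a read mode, buffer size, vector-width index, type index, and cached/uncached flag using mixed-radix arithmetic. A minimal standalone sketch of the same decode follows; the radix constants and the loop are illustrative stand-ins, not values from this file:

// Sketch: mixed-radix decode of a linear subtest index (illustrative values).
#include <cstdio>

int main() {
  const unsigned int kReadModes = 4, kSizes = 4, kVecWidths = 5, kTypes = 6;
  const unsigned int kBase = kReadModes * kSizes * kVecWidths * kTypes;
  for (unsigned int test = 0; test < 2 * kBase; ++test) {
    unsigned int readMode = test % kReadModes;
    unsigned int sizeIdx = (test / kReadModes) % kSizes;
    unsigned int vecIdx = (test / (kReadModes * kSizes)) % kVecWidths;
    unsigned int typeIdx = (test / (kReadModes * kSizes * kVecWidths)) % kTypes;
    bool cached = (test >= kBase);  // the highest "digit" toggles cached mode
    printf("%3u -> mode %u size %u vec %u type %u %s\n", test, readMode,
           sizeIdx, vecIdx, typeIdx, cached ? "cached" : "uncached");
  }
  return 0;
}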
if (isAMD) args += "-Wb,-pre-RA-sched=list-tdrr"; #endif error_ = _wrapper->clBuildProgram(program_, 1, &device, args.c_str(), NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "_uavReadSpeed", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_mem), (void *)&constBuffer_); setData(inBuffer_, 1.0f); setData(outBuffer_, 1.2345678f); unsigned int *cBuf = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, constBuffer_, true, CL_MAP_WRITE, 0, 16 * 2, 0, NULL, NULL, &error_); // Force all wavefronts to fetch the same data. We are looking for peak speed // here. cBuf[0] = 64; // These values are chosen to assure there is no data reuse within a clause. // If caching is not working, then the uncached numbers will be low. cBuf[1] = 0; cBuf[2] = 64; cBuf[3] = 128; cBuf[4] = 192; cBuf[5] = 0; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, constBuffer_, cBuf, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); } void OCLPerfUAVReadSpeed::run(void) { int global = bufSize_ / (TypeSize[typeIdx_] * (1 << vecSizeIdx_)); int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < NUM_ITER; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // Constant bandwidth in GB/s double perf = ((double)bufSize_ * numReads_ * NUM_ITER * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; char buf2[256]; SNPRINTF(buf, sizeof(buf), "%s%s", types[typeIdx_], vecWidths[vecSizeIdx_]); SNPRINTF(buf2, sizeof(buf2), " %-8s (%8d) %2d reads: %-8s (GB/s) ", buf, width_, numReads_, (cached_ ? 
"cached" : "uncached")); testDescString = buf2; checkData(outBuffer_); } unsigned int OCLPerfUAVReadSpeed::close(void) { _wrapper->clFinish(cmd_queue_); if (inBuffer_) { error_ = _wrapper->clReleaseMemObject(inBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (constBuffer_) { error_ = _wrapper->clReleaseMemObject(constBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(constBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfUAVReadSpeed.h000066400000000000000000000040741450307266000251610ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/

#ifndef _OCL_UAVReadSpeed_H_
#define _OCL_UAVReadSpeed_H_

#include "OCLTestImp.h"

class OCLPerfUAVReadSpeed : public OCLTestImp {
 public:
  OCLPerfUAVReadSpeed();
  virtual ~OCLPerfUAVReadSpeed();

 public:
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceID);
  virtual void run(void);
  virtual unsigned int close(void);

  std::string shader_;
  void genShader(unsigned int type, unsigned int vecWidth,
                 unsigned int numReads);
  void setData(cl_mem buffer, float data);
  void checkData(cl_mem buffer);

  static const unsigned int NUM_ITER = 100;

  cl_context context_;
  cl_command_queue cmd_queue_;
  cl_program program_;
  cl_kernel kernel_;
  cl_mem inBuffer_;
  cl_mem outBuffer_;
  cl_mem constBuffer_;
  cl_int error_;
  unsigned int width_;
  unsigned int bufSize_;
  unsigned int vecSizeIdx_;
  unsigned int numReads_;
  unsigned int typeIdx_;
  bool cached_;
  bool isAMD;
};

#endif  // _OCL_UAVReadSpeed_H_

clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfUAVReadSpeedHostMem.cpp

/*
 Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
*/ #include "OCLPerfUAVReadSpeedHostMem.h" #include #include #include #include "CL/cl.h" #include "Timer.h" const unsigned int NUM_SIZES = 4; const unsigned int NUM_READ_MODES = 1; const unsigned int MAX_READ_MODES = 1; static const unsigned int NumReads[NUM_READ_MODES] = {1}; // 256KB, 1 MB, 4MB, 16 MB and 64 MB static const unsigned int Sizes[NUM_SIZES] = {262144, 1048576, 4194304, 16777216}; static const unsigned int MaxTypes = 2; static unsigned int NumTypes = MaxTypes; static const char *types[MaxTypes] = {"float", "double"}; static const unsigned int TypeSize[MaxTypes] = {sizeof(cl_float), sizeof(cl_double)}; static const unsigned int NumVecWidths = 5; static const char *vecWidths[NumVecWidths] = {"", "2", "4", "8", "16"}; #define CHAR_BUF_SIZE 512 // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif void OCLPerfUAVReadSpeedHostMem::genShader(unsigned int type, unsigned int vecWidth, unsigned int numReads) { char buf[CHAR_BUF_SIZE]; shader_.clear(); shader_ += "#ifdef USE_AMD_DOUBLES\n" "#pragma OPENCL EXTENSION cl_amd_fp64 : enable\n" "#endif\n"; shader_ += "#ifdef USE_KHR_DOUBLES\n" "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n" "#endif\n"; SNPRINTF(buf, CHAR_BUF_SIZE, "__kernel void _uavReadSpeedHostMem(__global %s%s *inBuf, __global " "%s%s *outBuf, constant uint *constBuf)\n", types[type], vecWidths[vecWidth], types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += "{\n" " int i = (int) get_global_id(0);\n"; SNPRINTF(buf, CHAR_BUF_SIZE, " %s%s temp = 0;\n", types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += " temp = *(inBuf + i);\n"; if (vecWidth == 0) { shader_ += " if (temp < 0)\n" " *(outBuf + i) = temp;\n" "}\n"; } else { shader_ += " if (temp.s0 < 0)\n" " *(outBuf + i) = temp;\n" "}\n"; } // printf("shader:\n%s\n", shader_.c_str()); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} OCLPerfUAVReadSpeedHostMem::OCLPerfUAVReadSpeedHostMem() { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; context_ = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); // Get last for default platform = platforms[numPlatforms - 1]; for (unsigned i = 0; i < numPlatforms; ++i) { char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[i], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of // just returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { platform = platforms[i]; break; } } delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
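For reference, this is the kernel text the genShader() above emits for type "float" and vector width "4". The guarded store is what keeps the read alive: setData() fills the input with non-negative values, so the branch never fires, yet the compiler cannot prove that and must keep every load:

__kernel void _uavReadSpeedHostMem(__global float4 *inBuf,
                                   __global float4 *outBuf,
                                   constant uint *constBuf) {
  int i = (int)get_global_id(0);
  float4 temp = 0;
  temp = *(inBuf + i);
  if (temp.s0 < 0)        // never true for the generated input data
    *(outBuf + i) = temp; // dead store that defeats load elimination
}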
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); char *p = strstr(charbuf, "cl_khr_fp64"); char *p2 = strstr(charbuf, "cl_amd_fp64"); NumTypes = MaxTypes; if (!p && !p2) { // Doubles not supported NumTypes--; } _numSubTests = NumTypes * NumVecWidths * NUM_SIZES * MAX_READ_MODES; if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } } OCLPerfUAVReadSpeedHostMem::~OCLPerfUAVReadSpeedHostMem() {} void OCLPerfUAVReadSpeedHostMem::setData(cl_mem buffer, float val) { float *data = (float *)_wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < (bufSize_ >> 2); i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } void OCLPerfUAVReadSpeedHostMem::checkData(cl_mem buffer) { float *data = (float *)_wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < (bufSize_ >> 2); i++) { if (data[i] != (float)numReads_) { printf("Data validation failed at index %d!\n", i); printf("Expected %d %d %d %d\nGot %d %d %d %d\n", numReads_, numReads_, numReads_, numReads_, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } void OCLPerfUAVReadSpeedHostMem::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; inBuffer_ = 0; outBuffer_ = 0; constBuffer_ = 0; isAMD = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // 
returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } numReads_ = NumReads[test % MAX_READ_MODES]; width_ = Sizes[(test / MAX_READ_MODES) % NUM_SIZES]; vecSizeIdx_ = (test / (MAX_READ_MODES * NUM_SIZES)) % NumVecWidths; typeIdx_ = (test / (MAX_READ_MODES * NUM_SIZES * NumVecWidths)) % NumTypes; cached_ = true; bufSize_ = width_; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); device = devices[0]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); inBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, bufSize_, NULL, &error_); CHECK_RESULT(inBuffer_ == 0, "clCreateBuffer(inBuffer) failed"); outBuffer_ = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); constBuffer_ = _wrapper->clCreateBuffer(context_, 0, 16 * 2, NULL, &error_); CHECK_RESULT(constBuffer_ == 0, "clCreateBuffer(constBuffer) failed"); genShader(typeIdx_, vecSizeIdx_, numReads_); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); std::string args; args.clear(); if (cached_ && isAMD) { args = "-fno-alias "; } if (typeIdx_ == 1) { if (isAMD) { args += "-D USE_AMD_DOUBLES "; } else { args += "-D USE_KHR_DOUBLES "; } } error_ = _wrapper->clBuildProgram(program_, 1, &device, args.c_str(), NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "_uavReadSpeedHostMem", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_mem), (void *)&constBuffer_); setData(inBuffer_, 0.0f); setData(outBuffer_, 1.2345678f); unsigned int *cBuf = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, constBuffer_, true, CL_MAP_WRITE, 0, 16 * 2, 0, NULL, NULL, &error_); cBuf[0] = bufSize_ / (TypeSize[typeIdx_] * (1 << vecSizeIdx_)); cBuf[1] = 0; cBuf[2] = 1024; cBuf[3] = 2048; cBuf[4] = 3072; cBuf[5] = 0; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, constBuffer_, cBuf, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); } void OCLPerfUAVReadSpeedHostMem::run(void) { int global = bufSize_ / (TypeSize[typeIdx_] * (1 << 
vecSizeIdx_)); int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < NUM_ITER; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // Constant bandwidth in GB/s double perf = ((double)bufSize_ * numReads_ * NUM_ITER * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; char buf2[256]; SNPRINTF(buf, sizeof(buf), "%s%s", types[typeIdx_], vecWidths[vecSizeIdx_]); SNPRINTF(buf2, sizeof(buf2), " %-8s (%8d) (GB/s) ", buf, width_); testDescString = buf2; // Test doesn't write anything // checkData(outBuffer_); } unsigned int OCLPerfUAVReadSpeedHostMem::close(void) { _wrapper->clFinish(cmd_queue_); if (inBuffer_) { error_ = _wrapper->clReleaseMemObject(inBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } if (constBuffer_) { error_ = _wrapper->clReleaseMemObject(constBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(constBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfUAVReadSpeedHostMem.h000066400000000000000000000041461450307266000264560ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
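setData() and checkData() in this file follow the standard OpenCL map/modify/unmap pattern for host-visible buffers. A self-contained sketch of the write half; the fill_float name is illustrative and error handling is trimmed:

// Sketch: map/write/unmap fill, as setData() does above (name is illustrative).
#include <CL/cl.h>

static cl_int fill_float(cl_command_queue q, cl_mem buf, size_t bytes,
                         float v) {
  cl_int err = CL_SUCCESS;
  float* p = (float*)clEnqueueMapBuffer(q, buf, CL_TRUE /* blocking */,
                                        CL_MAP_WRITE, 0, bytes, 0, NULL, NULL,
                                        &err);
  if (err != CL_SUCCESS || p == NULL) return err;
  for (size_t i = 0; i < bytes / sizeof(float); ++i) p[i] = v;  // host write
  return clEnqueueUnmapMemObject(q, buf, p, 0, NULL, NULL);
}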
*/ #ifndef _OCL_UAVReadSpeedHostMem_H_ #define _OCL_UAVReadSpeedHostMem_H_ #include "OCLTestImp.h" class OCLPerfUAVReadSpeedHostMem : public OCLTestImp { public: OCLPerfUAVReadSpeedHostMem(); virtual ~OCLPerfUAVReadSpeedHostMem(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void genShader(unsigned int type, unsigned int vecWidth, unsigned int numReads); void setData(cl_mem buffer, float data); void checkData(cl_mem buffer); static const unsigned int NUM_ITER = 100; cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem inBuffer_; cl_mem outBuffer_; cl_mem constBuffer_; cl_int error_; unsigned int width_; unsigned int bufSize_; unsigned int vecSizeIdx_; unsigned int numReads_; unsigned int typeIdx_; bool isAMD; bool cached_; }; #endif // _OCL_UAVReadSpeedHostMem_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfUAVWriteSpeedHostMem.cpp000066400000000000000000000323061450307266000272270ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
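The test constructors trim the subtest count when neither double-precision extension string is reported by the device. A compact sketch mirroring that extension check; device_has_fp64 is an illustrative name:

// Sketch: fp64 capability probe, mirroring the constructor logic in this file.
#include <CL/cl.h>
#include <string.h>

static bool device_has_fp64(cl_device_id dev) {
  char ext[4096] = {0};
  if (clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, sizeof(ext) - 1, ext, NULL) !=
      CL_SUCCESS) {
    return false;  // treat a failed (or oversized) query as "no doubles"
  }
  return strstr(ext, "cl_khr_fp64") != NULL ||
         strstr(ext, "cl_amd_fp64") != NULL;
}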
*/ #include "OCLPerfUAVWriteSpeedHostMem.h" #include #include #include #include "CL/cl.h" #include "Timer.h" const unsigned int NUM_SIZES = 4; // 256KB, 1 MB, 4MB, 16 MB and 64 MB static const unsigned int Sizes[NUM_SIZES] = {262144, 1048576, 4194304, 16777216}; static const unsigned int MaxTypes = 2; static unsigned int NumTypes = 2; static const char *types[MaxTypes] = {"float", "double"}; static const unsigned int TypeSize[MaxTypes] = {sizeof(cl_float), sizeof(cl_double)}; static const unsigned int NumVecWidths = 5; static const char *vecWidths[NumVecWidths] = {"", "2", "4", "8", "16"}; #define CHAR_BUF_SIZE 512 // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif void OCLPerfUAVWriteSpeedHostMem::genShader(unsigned int type, unsigned int vecWidth) { char buf[CHAR_BUF_SIZE]; shader_.clear(); shader_ += "#ifdef USE_AMD_DOUBLES\n" "#pragma OPENCL EXTENSION cl_amd_fp64 : enable\n" "#endif\n"; shader_ += "#ifdef USE_KHR_DOUBLES\n" "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n" "#endif\n"; SNPRINTF(buf, CHAR_BUF_SIZE, "__kernel void _uavWriteSpeedHostMem(__global %s%s *outBuf)\n", types[type], vecWidths[vecWidth]); shader_.append(buf); shader_ += "{\n" " int i = (int) get_global_id(0);\n" " *(outBuf + i) = 0;\n" "}\n"; // printf("shader:\n%s\n", shader_.c_str()); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} OCLPerfUAVWriteSpeedHostMem::OCLPerfUAVWriteSpeedHostMem() { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; context_ = 0; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); // Get last for default platform = platforms[numPlatforms - 1]; for (unsigned i = 0; i < numPlatforms; ++i) { char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[i], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of // just returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { platform = platforms[i]; break; } } delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
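genShader() above produces a pure-store kernel; for type "float" and vector width "2" the emitted source is shown below. Each work-item zeroes one vector, and run() sizes the dispatch as bufSize_ / (TypeSize * vector width), so the whole buffer is written exactly once per launch:

__kernel void _uavWriteSpeedHostMem(__global float2 *outBuf) {
  int i = (int)get_global_id(0);
  *(outBuf + i) = 0;  // pure write: measures store bandwidth only
}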
*/ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); char *p = strstr(charbuf, "cl_khr_fp64"); char *p2 = strstr(charbuf, "cl_amd_fp64"); NumTypes = MaxTypes; if (!p && !p2) { // Doubles not supported NumTypes--; } _numSubTests = NumTypes * NumVecWidths * NUM_SIZES; if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } } OCLPerfUAVWriteSpeedHostMem::~OCLPerfUAVWriteSpeedHostMem() {} void OCLPerfUAVWriteSpeedHostMem::setData(cl_mem buffer, float val) { float *data = (float *)_wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < (bufSize_ >> 2); i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); } void OCLPerfUAVWriteSpeedHostMem::checkData(cl_mem buffer) { float *data = (float *)_wrapper->clEnqueueMapBuffer(cmd_queue_, buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < (bufSize_ >> 2); i++) { if (data[i] != 0.0f) { printf("Data validation failed at index %d!\n", i); printf("Expected %lf %lf %lf %lf\nGot %d %d %d %d\n", 0.0f, 0.0f, 0.0f, 0.0f, (unsigned int)data[i], (unsigned int)data[i + 1], (unsigned int)data[i + 2], (unsigned int)data[i + 3]); CHECK_RESULT_NO_RETURN(0, "Data validation failed!\n"); break; } } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); _wrapper->clFinish(cmd_queue_); } void OCLPerfUAVWriteSpeedHostMem::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; outBuffer_ = 0; isAMD = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 
devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } width_ = Sizes[test % NUM_SIZES]; vecSizeIdx_ = (test / NUM_SIZES) % NumVecWidths; typeIdx_ = (test / (NUM_SIZES * NumVecWidths)) % NumTypes; bufSize_ = width_; /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); device = devices[0]; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); outBuffer_ = _wrapper->clCreateBuffer( context_, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_ == 0, "clCreateBuffer(outBuffer) failed"); genShader(typeIdx_, vecSizeIdx_); char *tmp = (char *)shader_.c_str(); program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&tmp, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); std::string args; args.clear(); if (typeIdx_ == 1) { if (isAMD) { args += "-D USE_AMD_DOUBLES "; } else { args += "-D USE_KHR_DOUBLES "; } } error_ = _wrapper->clBuildProgram(program_, 1, &device, args.c_str(), NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "_uavWriteSpeedHostMem", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&outBuffer_); setData(outBuffer_, 1.2345678f); } void OCLPerfUAVWriteSpeedHostMem::run(void) { int global = bufSize_ / (TypeSize[typeIdx_] * (1 << vecSizeIdx_)); int local = 64; size_t global_work_size[1] = {(size_t)global}; size_t local_work_size[1] = {(size_t)local}; CPerfCounter timer; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < NUM_ITER; i++) { error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); } _wrapper->clFinish(cmd_queue_); timer.Stop(); double sec = timer.GetElapsedTime(); // Constant bandwidth in GB/s double perf = ((double)bufSize_ * NUM_ITER * (double)(1e-09)) / sec; _perfInfo = (float)perf; char buf[256]; char buf2[256]; SNPRINTF(buf, sizeof(buf), "%s%s", types[typeIdx_], vecWidths[vecSizeIdx_]); SNPRINTF(buf2, sizeof(buf2), " %-8s (%8d) (GB/s) ", buf, width_); testDescString = buf2; // Test just writes 0s checkData(outBuffer_); } unsigned int OCLPerfUAVWriteSpeedHostMem::close(void) { _wrapper->clFinish(cmd_queue_); if (outBuffer_) { error_ = _wrapper->clReleaseMemObject(outBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, 
"clReleaseMemObject(outBuffer_) failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfUAVWriteSpeedHostMem.h000066400000000000000000000037571450307266000267040ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_UAVWriteSpeedHostMem_H_ #define _OCL_UAVWriteSpeedHostMem_H_ #include "OCLTestImp.h" class OCLPerfUAVWriteSpeedHostMem : public OCLTestImp { public: OCLPerfUAVWriteSpeedHostMem(); virtual ~OCLPerfUAVWriteSpeedHostMem(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; void genShader(unsigned int type, unsigned int vecWidth); void setData(cl_mem buffer, float data); void checkData(cl_mem buffer); static const unsigned int NUM_ITER = 100; cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem outBuffer_; cl_int error_; unsigned int width_; unsigned int bufSize_; unsigned int vecSizeIdx_; unsigned int typeIdx_; bool isAMD; }; #endif // _OCL_UAVWriteSpeedHostMem_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfUncoalescedRead.cpp000066400000000000000000000246231450307266000263270ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfUncoalescedRead.h" #include #include #include #include "Timer.h" const char* OCLPerfUncoalescedRead::kernel_str = "#define NUM_READS 32\n\ __kernel void read_uncoalescing(__global float *input,__global float *output)\n\ {\n\ float val = (float)(0.0f);\n\ size_t gid = get_global_id(0);\n\ val = val + input[gid * NUM_READS + 0];\n\ val = val + input[gid * NUM_READS + 1];\n\ val = val + input[gid * NUM_READS + 2];\n\ val = val + input[gid * NUM_READS + 3];\n\ val = val + input[gid * NUM_READS + 4];\n\ val = val + input[gid * NUM_READS + 5];\n\ val = val + input[gid * NUM_READS + 6];\n\ val = val + input[gid * NUM_READS + 7];\n\ val = val + input[gid * NUM_READS + 8];\n\ val = val + input[gid * NUM_READS + 9];\n\ val = val + input[gid * NUM_READS + 10];\n\ val = val + input[gid * NUM_READS + 11];\n\ val = val + input[gid * NUM_READS + 12];\n\ val = val + input[gid * NUM_READS + 13];\n\ val = val + input[gid * NUM_READS + 14];\n\ val = val + input[gid * NUM_READS + 15];\n\ val = val + input[gid * NUM_READS + 16];\n\ val = val + input[gid * NUM_READS + 17];\n\ val = val + input[gid * NUM_READS + 18];\n\ val = val + input[gid * NUM_READS + 19];\n\ val = val + input[gid * NUM_READS + 20];\n\ val = val + input[gid * NUM_READS + 21];\n\ val = val + input[gid * NUM_READS + 22];\n\ val = val + input[gid * NUM_READS + 23];\n\ val = val + input[gid * NUM_READS + 24];\n\ val = val + input[gid * NUM_READS + 25];\n\ val = val + input[gid * NUM_READS + 26];\n\ val = val + input[gid * NUM_READS + 27];\n\ val = val + input[gid * NUM_READS + 28];\n\ val = val + input[gid * NUM_READS + 29];\n\ val = val + input[gid * NUM_READS + 30];\n\ val = val + input[gid * NUM_READS + 31];\n\ output[gid] = val;\n\ }\n"; OCLPerfUncoalescedRead::OCLPerfUncoalescedRead() { _numSubTests = 3; } OCLPerfUncoalescedRead::~OCLPerfUncoalescedRead() {} void OCLPerfUncoalescedRead::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "error_ opening test"); silentFailure = false; _openTest = test; program_ = 0; kernel_ = 0; input_buff = NULL; if (test > 0) { size_t param_size = 0; char* strVersion = 0; error_ = _wrapper->clGetDeviceInfo( devices_[_deviceId], CL_DEVICE_OPENCL_C_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); strVersion = (char*)malloc(param_size); error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_OPENCL_C_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); if (strVersion[9] < '2') { printf("\nOpenCL C 2.0 not supported\n"); silentFailure = true; } free(strVersion); if (silentFailure) return; } cl_mem buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY, SIZE * NUM_READS * sizeof(cl_float), 0, &error_); buffers_.push_back(buffer); buffer = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, SIZE * sizeof(cl_float), 0, &error_); buffers_.push_back(buffer); srand(0x8956); input_buff = (float*)malloc(SIZE * NUM_READS * 
sizeof(float)); for (unsigned int i = 0; i < SIZE * NUM_READS; ++i) { input_buff[i] = (float)rand(); } error_ = _wrapper->clEnqueueWriteBuffer( cmdQueues_[_deviceId], buffers_[0], CL_TRUE, 0, SIZE * NUM_READS * sizeof(cl_float), input_buff, 0, 0, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed"); float* buff = (float*)_wrapper->clEnqueueMapBuffer( cmdQueues_[_deviceId], buffers_[1], CL_TRUE, CL_MAP_WRITE, 0, SIZE * sizeof(cl_float), 0, 0, 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueMapBuffer failed"); memset(buff, 0, SIZE * sizeof(cl_float)); error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], buffers_[1], buff, 0, 0, 0); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueMapBuffer failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &kernel_str, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); std::string compileOptions = ""; if (test > 0) { compileOptions = "-cl-std=CL2.0"; } if (test > 1) { compileOptions += " -fsc-use-buffer-for-hsa-global "; } error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], compileOptions.c_str(), NULL, NULL); if (error_ != CL_SUCCESS) { char log[400]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 400, log, 0); printf("\n\n%s\n\n", log); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram failed"); kernel_ = _wrapper->clCreateKernel(program_, "read_uncoalescing", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void*)&buffers_[1]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); } void OCLPerfUncoalescedRead::validate(void) { bool success = true; float* buff = (float*)_wrapper->clEnqueueMapBuffer( cmdQueues_[_deviceId], buffers_[1], CL_TRUE, CL_MAP_READ, 0, SIZE * sizeof(cl_float), 0, 0, 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueMapBuffer failed"); for (unsigned int i = 0; i < SIZE; ++i) { volatile float val = 0; for (int j = 0; j < NUM_READS; ++j) { val += input_buff[i * NUM_READS + j]; } if (val != buff[i]) { success = false; std::string errorMsg = "Invalid result. 
Expected: "; errorMsg += std::to_string(val); errorMsg += " Actual result: "; errorMsg += std::to_string(buff[i]); CHECK_RESULT(true, errorMsg.c_str()); break; } } error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], buffers_[1], buff, 0, 0, 0); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueMapBuffer failed"); } void OCLPerfUncoalescedRead::run(void) { if (silentFailure) { return; } CPerfCounter timer; // Warm up size_t workGroupSize = SIZE; for (int i = 0; i < 50; ++i) { error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, &workGroupSize, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); _wrapper->clFinish(cmdQueues_[_deviceId]); } cl_event eventArr[NUM_ITER]; timer.Reset(); timer.Start(); for (unsigned int i = 0; i < NUM_ITER; i++) { error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, &workGroupSize, NULL, 0, NULL, &eventArr[i]); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); } error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_RESULT(error_, "clFinish failed"); timer.Stop(); double sec1 = timer.GetElapsedTime(); double sec2 = 0; for (unsigned int i = 0; i < NUM_ITER; ++i) { cl_ulong startTime = 0, endTime = 0; error_ = _wrapper->clGetEventProfilingInfo(eventArr[i], CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startTime, 0); CHECK_RESULT(error_, "clGetEventProfilingInfo failed"); error_ = _wrapper->clGetEventProfilingInfo( eventArr[i], CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &endTime, 0); CHECK_RESULT(error_, "clGetEventProfilingInfo failed"); sec2 += 1e-9 * (endTime - startTime); error_ = _wrapper->clReleaseEvent(eventArr[i]); CHECK_RESULT(error_, "clReleaseEvent failed"); } validate(); // Buffer copy bandwidth in GB/s double perf1 = ((double)SIZE * NUM_READS * NUM_ITER * sizeof(cl_float) * (double)(1e-09)) / sec1; double perf2 = ((double)SIZE * NUM_READS * NUM_ITER * sizeof(cl_float) * (double)(1e-09)) / sec2; _perfInfo = (float)perf2; std::ostringstream strStream; switch (_openTest) { case 0: strStream << "OCL1.2 "; break; case 1: strStream << "OCL2.0 "; break; case 2: strStream << "OCL2.0/flag "; break; } strStream << std::fixed << std::setprecision(2) << perf1 << " timer GB/s "; strStream << "time: " << std::setprecision(3) << sec1 << "s (profile GB/s)"; testDescString = strStream.str(); ; } unsigned int OCLPerfUncoalescedRead::close(void) { if (input_buff) { free(input_buff); } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfUncoalescedRead.h000066400000000000000000000033271450307266000257720ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
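run() above reports two rates: a wall-clock figure from CPerfCounter around the whole batch, and a per-kernel figure summed from event timestamps. A sketch of the event half; note that the command queue must have been created with CL_QUEUE_PROFILING_ENABLE for these queries to return valid times, which is assumed here:

// Sketch: GPU time of one kernel launch from its event, in seconds.
#include <CL/cl.h>

static double event_seconds(cl_event e) {
  cl_ulong t0 = 0, t1 = 0;
  clGetEventProfilingInfo(e, CL_PROFILING_COMMAND_START, sizeof(t0), &t0, NULL);
  clGetEventProfilingInfo(e, CL_PROFILING_COMMAND_END, sizeof(t1), &t1, NULL);
  return 1e-9 * (double)(t1 - t0);  // profiling timestamps are in nanoseconds
}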
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_UncoalescedRead_H_ #define _OCL_UncoalescedRead_H_ #include "OCLTestImp.h" #define NUM_READS 32 class OCLPerfUncoalescedRead : public OCLTestImp { public: OCLPerfUncoalescedRead(); virtual ~OCLPerfUncoalescedRead(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: static const unsigned int NUM_ITER = 1000; static const unsigned int SIZE = 250000; static const char* kernel_str; bool silentFailure; float* input_buff; void validate(void); }; #endif // _OCL_UncoalescedRead_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfVerticalFetch.cpp000066400000000000000000000303621450307266000260260ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfVerticalFetch.h" #include #include #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 1 #define WIDTH 4952 #define HEIGHT 3288 unsigned int Sizes[NUM_SIZES] = {WIDTH * HEIGHT * 4}; #define KERNEL_CODE(...) 
#__VA_ARGS__

const static char* strKernel = KERNEL_CODE(
  \n __kernel void ResizeVerticalFilter(
      const __global uint* inputImage, const unsigned int inputColumns,
      const unsigned int inputRows, __local uint* inputImageCache,
      const int numCachedPixels, __global uint* dst) {
    const unsigned int startY = get_group_id(1) * get_local_size(1);
    float scale = 0.5f;
    const float support = 0.5f;
    const int cacheRangeStartY =
        max((int)((startY + 0.5f) / 1.0f + support + 0.5f), (int)(0));
    const int cacheRangeEndY =
        min((int)(cacheRangeStartY + numCachedPixels), (int)inputRows);
    const unsigned int x = get_global_id(0);
    // Prefetch one strided column of the input image into the LDS cache.
    event_t e = async_work_group_strided_copy(
        inputImageCache, inputImage + cacheRangeStartY * inputColumns + x,
        cacheRangeEndY - cacheRangeStartY, inputColumns, 0);
    wait_group_events(1, &e);
    if (get_local_id(1) == 0) {
      // uint sum = 0;
      // for (unsigned int chunk = 0; chunk < numCachedPixels; chunk++) {
      //   sum += inputImageCache[chunk];
      // }
      atomic_add(dst, inputImageCache[0]);
    }
  } \n);

OCLPerfVerticalFetch::OCLPerfVerticalFetch() {
  ptr_ = nullptr;
  _numSubTests = 6;
}

OCLPerfVerticalFetch::~OCLPerfVerticalFetch() {}

static void CL_CALLBACK notify_callback(const char* errinfo,
                                        const void* private_info, size_t cb,
                                        void* user_data) {}

void OCLPerfVerticalFetch::open(unsigned int test, char* units,
                                double& conversion, unsigned int deviceId) {
  error_ = CL_SUCCESS;
  OCLTestImp::open(test, units, conversion, deviceId);
  CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test");
  program_ = 0;
  kernel_ = 0;
  skip_ = false;
  dstBuffer_ = 0;

  cl_ulong loopCnt = nBytes / (16 * sizeof(cl_uint));
  cl_uint maxCUs;
  error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId],
                                     CL_DEVICE_MAX_COMPUTE_UNITS,
                                     sizeof(cl_uint), &maxCUs, 0);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");
  wgs = 64;
  const static cl_uint wavesPerCU = 8;
  nWorkItems = maxCUs * wavesPerCU * wgs;

  uint32_t memLoc = CL_MEM_USE_HOST_PTR;
  inputData = 0x1;
  switch (test) {
    case 0: nIter = 1;    mem_type_ = "UHP"; break;
    case 1: nIter = 100;  mem_type_ = "UHP"; break;
    case 2: nIter = 1;    memLoc = CL_MEM_ALLOC_HOST_PTR; mem_type_ = "AHP"; break;
    case 3: nIter = 100;  memLoc = CL_MEM_ALLOC_HOST_PTR; mem_type_ = "AHP"; break;
    case 4: nIter = 1;    memLoc = 0; mem_type_ = "dev"; break;
    case 5: nIter = 1000; memLoc = 0; mem_type_ = "dev"; break;
  }

  // An optional "dim.ini" file overrides the default global/local sizes and
  // the number of cached pixels; lines containing '/' are skipped as comments.
  std::string nameFile("dim.ini");
  std::fstream is(nameFile.c_str(), std::fstream::in | std::fstream::binary);
  std::string line;
  if (is.is_open()) {
    size_t posStart = 0;
    do {
      std::getline(is, line);
    } while (line.find_first_of('/', posStart) != std::string::npos);
    // Find global/local
    posStart = 0;
    size_t posEnd = 1;
    std::string dimS = line.substr(posStart, posEnd - posStart);
    dim = std::stoi(dimS.c_str(), nullptr, 10);
    posStart = posEnd;
    posEnd = line.find_first_of('[', posStart);
    for (cl_uint i = 0; i < dim; ++i) {
      posStart = posEnd + 1;
      posEnd = line.find_first_of(',', posStart);
      std::string global = line.substr(posStart, posEnd - posStart);
      gws[i] = std::stoi(global.c_str(), nullptr, 10);
    }
    posEnd = line.find_first_of('[', posStart);
    for (cl_uint i = 0; i < dim; ++i) {
      posStart = posEnd + 1;
      posEnd = line.find_first_of(',', posStart);
      std::string global = line.substr(posStart, posEnd - posStart);
      lws[i] = std::stoi(global.c_str(), nullptr, 10);
    }
    posEnd = line.find_first_of('[', posStart);
    posStart = posEnd + 1;
    posEnd = line.find_first_of(',', posStart);
    std::string global = line.substr(posStart, posEnd - posStart);
    numCachedPixels_ = std::stoi(global.c_str(), nullptr, 10);
    is.close();
  } else {
    dim = 2;
    gws[0] = WIDTH;
    gws[1] = 512;
    lws[0] = 1;
    lws[1] = 256;
    numCachedPixels_ = 1676;
  }

  cl_uint width = static_cast<cl_uint>(gws[0]);
  cl_uint height = numCachedPixels_ * static_cast<cl_uint>(gws[1] / lws[1]);
  if (gws[1] > 512) {
    gws[1] = 512;
  }
  Sizes[0] = width * height * sizeof(int);
  nBytes = Sizes[0];

  program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL,
                                                 &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed");
  error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL,
                                    NULL, NULL);
  if (error_ != CL_SUCCESS) {
    char programLog[1024];
    _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId],
                                    CL_PROGRAM_BUILD_LOG, 1024, programLog, 0);
    printf("\n%s\n", programLog);
    fflush(stdout);
  }
  CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed");
  kernel_ = _wrapper->clCreateKernel(program_, "ResizeVerticalFilter", &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed");

  if (memLoc == CL_MEM_USE_HOST_PTR) {
    ptr_ = malloc(nBytes);
  }
  srcBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY | memLoc,
                                        nBytes, ptr_, &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer(srcBuffer) failed");

  void* mem;
  mem = _wrapper->clEnqueueMapBuffer(cmdQueues_[_deviceId], srcBuffer_, CL_TRUE,
                                     CL_MAP_READ | CL_MAP_WRITE, 0, nBytes, 0,
                                     NULL, NULL, &error_);
  CHECK_RESULT(error_, "clEnqueueMapBuffer failed");
  for (unsigned int i = 0; i < nBytes / sizeof(cl_uint); ++i) {
    reinterpret_cast<cl_uint*>(mem)[i] = inputData;
  }
  dstBuffer_ = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY,
                                        sizeof(cl_uint), NULL, &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer(dstBuffer) failed");
  _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], srcBuffer_, mem, 0,
                                    NULL, NULL);
  mem = _wrapper->clEnqueueMapBuffer(cmdQueues_[_deviceId], dstBuffer_, CL_TRUE,
                                     CL_MAP_READ | CL_MAP_WRITE, 0,
                                     sizeof(cl_uint), 0, NULL, NULL, &error_);
  CHECK_RESULT(error_, "clEnqueueMapBuffer failed");
  memset(mem, 0, sizeof(cl_uint));
  _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], dstBuffer_, mem, 0,
                                    NULL, NULL);

  error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &srcBuffer_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
  error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_uint), (void*)&width);
  CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
  error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void*)&height);
  CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
  // Argument 3 is the __local cache: only a size is passed, no data pointer.
  error_ = _wrapper->clSetKernelArg(kernel_, 3,
                                    numCachedPixels_ * sizeof(cl_uint), 0);
  CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
  error_ = _wrapper->clSetKernelArg(kernel_, 4, sizeof(cl_uint),
                                    (void*)&numCachedPixels_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
  error_ = _wrapper->clSetKernelArg(kernel_, 5, sizeof(cl_mem),
                                    (void*)&dstBuffer_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
}

void OCLPerfVerticalFetch::run(void) {
  if (skip_) {
    return;
  }
  CPerfCounter timer;

  // warm up
  error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, dim,
                                            NULL, gws, lws, 0, NULL, NULL);
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed");
  _wrapper->clFinish(cmdQueues_[_deviceId]);

  cl_uint* memResult;
  memResult = (cl_uint*)malloc(sizeof(cl_uint));
  if (0 == memResult) {
    CHECK_RESULT_NO_RETURN(0, "malloc failed!\n");
    return;
  }
  memset(memResult, 0, sizeof(cl_uint));
  error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], dstBuffer_,
                                         CL_FALSE, 0, sizeof(cl_uint),
                                         memResult, 0, NULL, NULL);
  CHECK_RESULT(error_, "clEnqueueReadBuffer dstBuffer_ failed!");
  _wrapper->clFinish(cmdQueues_[_deviceId]);
  // With lws[0] == 1 each work-group adds inputImageCache[0] (== 1) to dst
  // exactly once, so the expected sum is the number of work-groups launched.
  if (memResult[0] != ((gws[0] * gws[1]) / (lws[0] * lws[1]))) {
    CHECK_RESULT_NO_RETURN(0, "Data validation failed for warm up run!\n");
    // free(memResult);
    // return;
  }
  free(memResult);

  timer.Reset();
  timer.Start();
  double sec2 = 0;
  cl_event* events = new cl_event[nIter];
  for (unsigned int i = 0; i < nIter; i++) {
    error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_,
                                              dim, NULL, gws, lws, 0, NULL,
                                              &events[i]);
    CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed");
    _wrapper->clFinish(cmdQueues_[_deviceId]);
  }
  _wrapper->clFinish(cmdQueues_[_deviceId]);
  timer.Stop();

  for (unsigned int i = 0; i < nIter; i++) {
    cl_ulong startTime = 0, endTime = 0;
    error_ = _wrapper->clGetEventProfilingInfo(
        events[i], CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startTime, 0);
    CHECK_RESULT(error_, "clGetEventProfilingInfo failed");
    error_ = _wrapper->clGetEventProfilingInfo(
        events[i], CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &endTime, 0);
    CHECK_RESULT(error_, "clGetEventProfilingInfo failed");
    _wrapper->clReleaseEvent(events[i]);
    sec2 += endTime - startTime;
  }
  double sec = timer.GetElapsedTime();
  delete[] events;

  // read speed in GB/s
  double perf = ((double)nBytes * nIter * (double)(1e-09)) / sec;
  // Profiling timestamps are in nanoseconds, so bytes/ns is already GB/s.
  double perf2 = ((double)nBytes * nIter) / sec2;
  _perfInfo = (float)perf2;
  float perfInfo = (float)perf;
  char buf[256];
  SNPRINTF(buf, sizeof(buf),
           " (%8d bytes, %s) i:%4d Wall time Perf: %.2f (GB/s)", nBytes,
           mem_type_, nIter, perfInfo);
  testDescString = buf;
}

unsigned int OCLPerfVerticalFetch::close(void) {
  if (!skip_) {
    if (srcBuffer_) {
      error_ = _wrapper->clReleaseMemObject(srcBuffer_);
      CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                             "clReleaseMemObject(srcBuffer_) failed");
    }
    if (dstBuffer_) {
      error_ = _wrapper->clReleaseMemObject(dstBuffer_);
      CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS,
                             "clReleaseMemObject(dstBuffer_) failed");
    }
  }
  if (ptr_ != nullptr) {
    free(ptr_);
    ptr_ = nullptr;
  }
  return OCLTestImp::close();
}

clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/OCLPerfVerticalFetch.h

/*
 Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
*/

#pragma once

#include "OCLTestImp.h"

class OCLPerfVerticalFetch : public OCLTestImp {
 public:
  OCLPerfVerticalFetch();
  virtual ~OCLPerfVerticalFetch();

 public:
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceID);
  virtual void run(void);
  virtual unsigned int close(void);

  cl_mem srcBuffer_;
  cl_mem dstBuffer_;
  unsigned int nWorkItems;  // number of GPU work items
  unsigned int wgs;         // work group size
  unsigned int nBytes;      // input and output buffer size
  unsigned int nIter;       // overall number of timing loops
  cl_uint inputData;        // input data to fill the input buffer
  bool skip_;
  void* ptr_;
  const char* mem_type_;
  cl_uint dim;
  size_t gws[3];
  size_t lws[3];
  cl_uint numCachedPixels_;
};

clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/TestList.cpp

/*
 Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
*/

#include "OCLTestListImp.h"

//
// Includes for tests
//
#include "OCLPerfAES256.h"
#include "OCLPerfAtomicSpeed.h"
#include "OCLPerfBufferCopyOverhead.h"
#include "OCLPerfBufferCopySpeed.h"
#include "OCLPerfBufferReadSpeed.h"
#include "OCLPerfBufferWriteSpeed.h"
#include "OCLPerfCPUMemSpeed.h"
#include "OCLPerfCommandQueue.h"
#include "OCLPerfConcurrency.h"
#include "OCLPerfDevMemReadSpeed.h"
#include "OCLPerfDevMemWriteSpeed.h"
#include "OCLPerfDeviceConcurrency.h"
#include "OCLPerfDeviceEnqueue.h"
#include "OCLPerfDispatchSpeed.h"
#include "OCLPerfDoubleDMA.h"
#include "OCLPerfDoubleDMASeq.h"
#include "OCLPerfFillBuffer.h"
#include "OCLPerfFillImage.h"
#include "OCLPerfFlush.h"
#include "OCLPerfGenericBandwidth.h"
#include "OCLPerfGenoilSiaMiner.h"
#include "OCLPerfImageCopyCorners.h"
#include "OCLPerfImageCopySpeed.h"
#include "OCLPerfImageMapUnmap.h"
#include "OCLPerfImageReadSpeed.h"
#include "OCLPerfImageSampleRate.h"
#include "OCLPerfImageWriteSpeed.h"
#include "OCLPerfKernelArguments.h"
#include "OCLPerfLDSLatency.h"
#include "OCLPerfLDSReadSpeed.h"
#include "OCLPerfMandelbrot.h"
#include "OCLPerfMapBufferReadSpeed.h"
#include "OCLPerfMapBufferWriteSpeed.h"
#include "OCLPerfMapImageReadSpeed.h"
#include "OCLPerfMapImageWriteSpeed.h"
#include "OCLPerfMatrixTranspose.h"
#include "OCLPerfMemCombine.h"
#include "OCLPerfMemCreate.h"
#include "OCLPerfMemLatency.h"
#include "OCLPerfPinnedBufferReadSpeed.h"
#include "OCLPerfPinnedBufferWriteSpeed.h"
#include "OCLPerfPipeCopySpeed.h"
#include "OCLPerfSHA256.h"
#include "OCLPerfSampleRate.h"
#include "OCLPerfScalarReplArrayElem.h"
#include "OCLPerfSdiP2PCopy.h"
#include "OCLPerfTextureMemLatency.h"
#include "OCLPerfUAVReadSpeed.h"
#include "OCLPerfUAVReadSpeedHostMem.h"
#include "OCLPerfUAVWriteSpeedHostMem.h"
#include "OCLPerfVerticalFetch.h"
// 2.0
#include "OCLPerf3DImageWriteSpeed.h"
#include "OCLPerfAtomicSpeed20.h"
#include "OCLPerfDeviceEnqueue2.h"
#include "OCLPerfDeviceEnqueueEvent.h"
#include "OCLPerfDeviceEnqueueSier.h"
#include "OCLPerfImageCreate.h"
#include "OCLPerfImageReadWrite.h"
#include "OCLPerfImageReadsRGBA.h"
#include "OCLPerfProgramGlobalRead.h"
#include "OCLPerfProgramGlobalWrite.h"
#include "OCLPerfSVMAlloc.h"
#include "OCLPerfSVMKernelArguments.h"
#include "OCLPerfSVMMap.h"
#include "OCLPerfSVMMemFill.h"
#include "OCLPerfSVMMemcpy.h"
#include "OCLPerfSVMSampleRate.h"
#include "OCLPerfUncoalescedRead.h"

//
// Helper macro for adding tests
//
template <class T>
static void* dictionary_CreateTestFunc(void) {
  return new T();
}

#define TEST(name) \
  { #name, &dictionary_CreateTestFunc<name> }

TestEntry TestList[] = {
    TEST(OCLPerfUAVReadSpeed), TEST(OCLPerfUAVReadSpeedHostMem),
    TEST(OCLPerfUAVWriteSpeedHostMem), TEST(OCLPerfLDSReadSpeed),
    TEST(OCLPerfDispatchSpeed), TEST(OCLPerfMapBufferReadSpeed),
    TEST(OCLPerfMapBufferWriteSpeed), TEST(OCLPerfBufferReadSpeed),
    TEST(OCLPerfBufferReadRectSpeed), TEST(OCLPerfPinnedBufferReadSpeed),
    TEST(OCLPerfPinnedBufferReadRectSpeed), TEST(OCLPerfBufferWriteSpeed),
    TEST(OCLPerfBufferWriteRectSpeed), TEST(OCLPerfPinnedBufferWriteSpeed),
    TEST(OCLPerfPinnedBufferWriteRectSpeed), TEST(OCLPerfBufferCopySpeed),
    TEST(OCLPerfBufferCopyRectSpeed), TEST(OCLPerfMapImageReadSpeed),
    TEST(OCLPerfMapImageWriteSpeed), TEST(OCLPerfMemCombine),
    TEST(OCLPerfImageReadSpeed), TEST(OCLPerfPinnedImageReadSpeed),
    TEST(OCLPerfImageWriteSpeed), TEST(OCLPerfPinnedImageWriteSpeed),
    TEST(OCLPerfImageCopySpeed), TEST(OCLPerfCPUMemSpeed),
    TEST(OCLPerfMandelbrot), TEST(OCLPerfAsyncMandelbrot),
    TEST(OCLPerfConcurrency),
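    // Each TEST(name) entry expands, via the TEST macro above, to an
    // initializer of the form { "name", &dictionary_CreateTestFunc<name> },
    // e.g. { "OCLPerfConcurrency",
    //        &dictionary_CreateTestFunc<OCLPerfConcurrency> },
    // pairing the test's display name with a factory that returns a fresh
    // instance of the test class when the harness selects it by name.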
    TEST(OCLPerfDeviceConcurrency), TEST(OCLPerfAES256),
    TEST(OCLPerfSHA256), TEST(OCLPerfAtomicSpeed),
    TEST(OCLPerfMatrixTranspose), TEST(OCLPerfImageCopyCorners),
    TEST(OCLPerfScalarReplArrayElem), TEST(OCLPerfSdiP2PCopy),
    TEST(OCLPerfFlush), TEST(OCLPerfMemCreate),
    TEST(OCLPerfImageMapUnmap), TEST(OCLPerfCommandQueue),
    TEST(OCLPerfKernelArguments), TEST(OCLPerfDoubleDMA),
    TEST(OCLPerfDoubleDMASeq), TEST(OCLPerfMemLatency),
    TEST(OCLPerfTextureMemLatency), TEST(OCLPerfSampleRate),
    TEST(OCLPerfImageSampleRate), TEST(OCLPerfBufferCopyOverhead),
    TEST(OCLPerfMapDispatchSpeed), TEST(OCLPerfDeviceEnqueue),
    TEST(OCLPerfPipeCopySpeed), TEST(OCLPerfGenericBandwidth),
    TEST(OCLPerfLDSLatency), TEST(OCLPerfDeviceEnqueue2),
    TEST(OCLPerfSVMAlloc), TEST(OCLPerfSVMMap),
    TEST(OCLPerfDeviceEnqueueEvent), TEST(OCLPerfSVMKernelArguments),
    TEST(OCLPerfDeviceEnqueueSier), TEST(OCLPerfProgramGlobalRead),
    TEST(OCLPerfProgramGlobalWrite), TEST(OCLPerfAtomicSpeed20),
    TEST(OCLPerfSVMSampleRate), TEST(OCLPerfImageCreate),
    TEST(OCLPerfImageReadsRGBA), TEST(OCLPerf3DImageWriteSpeed),
    TEST(OCLPerfImageReadWrite), TEST(OCLPerfSVMMemcpy),
    TEST(OCLPerfSVMMemFill), TEST(OCLPerfFillBuffer),
    TEST(OCLPerfFillImage), TEST(OCLPerfUncoalescedRead),
    TEST(OCLPerfGenoilSiaMiner), TEST(OCLPerfDevMemReadSpeed),
    TEST(OCLPerfDevMemWriteSpeed), TEST(OCLPerfVerticalFetch),
};

unsigned int TestListCount = sizeof(TestList) / sizeof(TestList[0]);
unsigned int TestLibVersion = 0;
const char* TestLibName = "oclperf";

clr-rocm-5.7.1/opencl/tests/ocltst/module/perf/oclperf.exclude

# We don't need to run regressions on these tests, they are purely for performance testing and debugging
OCLPerfMemLatency
OCLPerfTextureMemLatency
OCLPerfSampleRate
OCLPerfImageSampleRate
OCLPerfBufferCopyOverhead
OCLPerfDeviceEnqueue
OCLPerfPipeCopySpeed
OCLPerfGenericBandwidth
OCLPerfLDSLatency
OCLPerfFillBuffer
OCLPerfDeviceEnqueue2
OCLPerfDeviceEnqueueEvent
OCLPerfDeviceEnqueueSier
OCLPerfSVMAlloc
OCLPerfSVMMap
OCLPerfSVMKernelArguments
OCLPerfProgramGlobalRead
OCLPerfProgramGlobalWrite
OCLPerfAtomicSpeed20
OCLPerfSVMSampleRate
OCLPerfImageCreate
OCLPerfImageReadsRGBA
OCLPerf3DImageWriteSpeed
OCLPerfImageReadWrite
OCLPerfSVMMemcpy
OCLPerfSVMMemFill
OCLPerfFillImage

clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/

clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/CMakeLists.txt

set(TESTS
  OCLAsyncMap OCLAsyncTransfer OCLAtomicCounter
  OCLBlitKernel OCLBufferFromImage OCLCPUGuardPages
  OCLCreateBuffer OCLCreateContext OCLCreateImage
  OCLDeviceAtomic OCLDeviceQueries OCLDynamic
  OCLDynamicBLines OCLGenericAddressSpace OCLGetQueueThreadID
  OCLGlobalOffset OCLImage2DFromBuffer OCLImageCopyPartial
  OCLKernelBinary OCLLDS32K OCLLinearFilter
  OCLMapCount OCLMemDependency OCLMemObjs
  OCLMemoryInfo OCLMultiQueue OCLOfflineCompilation
  OCLP2PBuffer OCLPartialWrkgrp OCLPerfCounters
  OCLPersistent OCLPinnedMemory OCLPlatformAtomics
  OCLProgramScopeVariables OCLReadWriteImage OCLRTQueue
  OCLSDI OCLSemaphore OCLStablePState
  OCLSVM OCLThreadTrace OCLUnalignedCopy
)

add_library(oclruntime SHARED TestList.cpp $)

foreach(TEST ${TESTS})
  target_sources(oclruntime PRIVATE ${TEST}.cpp)
endforeach()

set_target_properties(oclruntime PROPERTIES
  CXX_STANDARD 14
  CXX_STANDARD_REQUIRED ON
  CXX_EXTENSIONS OFF
  RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst
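  # Both output-directory properties are set because the generated test module
  # is a runtime artifact (.dll) on Windows and a library artifact (.so) on
  # Linux; pointing both at tests/ocltst keeps the module next to the ocltst
  # harness that loads it via -m.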
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst) target_compile_definitions(oclruntime PRIVATE $) target_include_directories(oclruntime PRIVATE $) target_link_libraries(oclruntime PRIVATE OpenCL) add_custom_command( TARGET oclruntime POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_SOURCE_DIR}/oclruntime.exclude ${CMAKE_BINARY_DIR}/tests/ocltst/oclruntime.exclude) add_custom_target(test.ocltst.oclruntime COMMAND ${CMAKE_COMMAND} -E env "OCL_ICD_FILENAMES=$" $ -p 0 -m $ -A oclruntime.exclude DEPENDS ocltst oclruntime amdocl WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst USES_TERMINAL) foreach(TEST ${TESTS}) add_custom_target(test.ocltst.oclruntime.${TEST} COMMAND ${CMAKE_COMMAND} -E env "OCL_ICD_FILENAMES=$" $ -p 0 -m $ -t ${TEST} DEPENDS ocltst oclruntime amdocl WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/tests/ocltst USES_TERMINAL) endforeach() INSTALL(TARGETS oclruntime DESTINATION ${OCLTST_INSTALL_DIR} COMPONENT ocltst) INSTALL(FILES oclruntime.exclude DESTINATION ${OCLTST_INSTALL_DIR} COMPONENT ocltst) clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLAsyncMap.cpp000066400000000000000000000075431450307266000247350ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLAsyncMap.h" #include #include #include #include #include "CL/cl.h" #if EMU_ENV static const size_t BufSize = 0x800; static const size_t MapRegion = 0x100; #else static const size_t BufSize = 0x800000; static const size_t MapRegion = 0x100000; #endif // EMU_ENV static const unsigned int NumMaps = BufSize / MapRegion; OCLAsyncMap::OCLAsyncMap() { _numSubTests = 1; } OCLAsyncMap::~OCLAsyncMap() {} void OCLAsyncMap::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); cl_mem buffer; buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, BufSize * sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLAsyncMap::run(void) { cl_uint* values[NumMaps]; cl_mem mapBuffer = buffers()[0]; size_t offset = 0; size_t region = MapRegion * sizeof(cl_uint); for (unsigned int i = 0; i < NumMaps; ++i) { values[i] = reinterpret_cast(_wrapper->clEnqueueMapBuffer( cmdQueues_[_deviceId], mapBuffer, CL_TRUE, (CL_MAP_READ | CL_MAP_WRITE), offset, region, 0, NULL, NULL, &error_)); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueMapBuffer() failed"); offset += region; } for (unsigned int i = 0; i < NumMaps; ++i) { for (unsigned int j = 0; j < MapRegion; ++j) { values[i][j] = i; } } for (unsigned int i = 0; i < NumMaps; ++i) { error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], mapBuffer, values[i], 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueMapBuffer() failed"); } values[0] = reinterpret_cast(_wrapper->clEnqueueMapBuffer( cmdQueues_[_deviceId], mapBuffer, CL_TRUE, CL_MAP_READ, 0, BufSize * sizeof(cl_uint), 0, NULL, NULL, &error_)); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueMapBuffer() failed"); for (unsigned int i = 0; i < NumMaps; ++i) { values[i] = values[0] + i * MapRegion; for (unsigned int j = 0; j < MapRegion; ++j) { CHECK_RESULT((values[i][j] != i), "validation failed"); } } error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], mapBuffer, values[0], 0, NULL, NULL); _wrapper->clFinish(cmdQueues_[_deviceId]); } unsigned int OCLAsyncMap::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLAsyncMap.h000066400000000000000000000027241450307266000243760ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ASYNC_MAP_H_ #define _OCL_ASYNC_MAP_H_ #include "OCLTestImp.h" class OCLAsyncMap : public OCLTestImp { public: OCLAsyncMap(); virtual ~OCLAsyncMap(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); }; #endif // _OCL_ASYNC_MAP_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLAsyncTransfer.cpp000066400000000000000000000140731450307266000260000ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLAsyncTransfer.h" #include #include #include #include #include "CL/cl.h" #if EMU_ENV static const size_t Iterations = 1; static const size_t IterationDivider = 1; static const size_t BufSize = 10; #else static const size_t Iterations = 0x100; static const size_t IterationDivider = 2; static const size_t BufSize = 0x800000; #endif // EMU_ENV static const size_t MaxBuffers = IterationDivider; const static char* strKernel = "__kernel void factorial(__global uint* out) \n" "{ \n" " uint id = get_global_id(0); \n" " uint factorial = 1; \n" #if EMU_ENV " for (uint i = 1; i < id; ++i) \n" #else " for (uint i = 1; i < (id / 0x10000); ++i) \n" #endif // EMU_ENV " { \n" " factorial *= i; \n" " } \n" " out[id] = factorial; \n" "} \n"; OCLAsyncTransfer::OCLAsyncTransfer() { _numSubTests = 1; } OCLAsyncTransfer::~OCLAsyncTransfer() {} void OCLAsyncTransfer::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "factorial", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; for (size_t i = 0; i < MaxBuffers; ++i) { buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, BufSize * sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } buffer = _wrapper->clCreateBuffer(context_, CL_MEM_ALLOC_HOST_PTR, BufSize * sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLAsyncTransfer::run(void) { void* values; CPerfCounter timer; cl_mem mapBuffer = buffers()[MaxBuffers]; values = _wrapper->clEnqueueMapBuffer( cmdQueues_[_deviceId], mapBuffer, true, (CL_MAP_READ | CL_MAP_WRITE), 0, BufSize * sizeof(cl_uint), 0, NULL, NULL, &error_); timer.Reset(); timer.Start(); size_t x; for (x = 0; x < Iterations / IterationDivider; x++) { for (size_t y = 0; y < IterationDivider; ++y) { cl_mem buffer = buffers()[y]; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t gws[1] = {BufSize}; error_ = _wrapper->clEnqueueNDRangeKernel( cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } cl_mem readBuffer = buffers()[0]; error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], readBuffer, false, 0, BufSize * sizeof(cl_uint), values, 0, NULL, NULL); _wrapper->clFlush(cmdQueues_[_deviceId]); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); } _wrapper->clFinish(cmdQueues_[_deviceId]); timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s double perf = ((double)BufSize * 
sizeof(cl_uint) * x * (double)(1e-09)) / sec; printf(" Time: %.2f sec, BW: %.2f GB/s ", sec, perf); error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], mapBuffer, values, 0, NULL, NULL); _wrapper->clFinish(cmdQueues_[_deviceId]); } unsigned int OCLAsyncTransfer::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLAsyncTransfer.h000066400000000000000000000027621450307266000254470ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ASYNC_TRANSFER_H_ #define _OCL_ASYNC_TRANSFER_H_ #include "OCLTestImp.h" class OCLAsyncTransfer : public OCLTestImp { public: OCLAsyncTransfer(); virtual ~OCLAsyncTransfer(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); }; #endif // _OCL_ASYNC_TRANSFER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLAtomicCounter.cpp000066400000000000000000000153121450307266000257670ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLAtomicCounter.h" #include #include #include #include "CL/cl.h" const static unsigned int MaxCounters = 2; const static char* strKernel = "#pragma OPENCL EXTENSION cl_ext_atomic_counters_32 : enable \n" "__kernel void atomic_test( \n" " counter32_t counter0, counter32_t counter1, global uint* out_val) \n" "{ \n" " if (!get_global_id(0)) { \n" " uint val0 = atomic_inc(counter0); \n" " uint val1 = atomic_dec(counter1); \n" " out_val[0] = val0; \n" " out_val[1] = val1; \n" " } \n" "} \n"; OCLAtomicCounter::OCLAtomicCounter() { _numSubTests = 1; failed_ = false; } OCLAtomicCounter::~OCLAtomicCounter() {} void OCLAtomicCounter::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening"); char name[1024] = {0}; size_t size = 0; if (deviceId >= deviceCount_) { failed_ = true; return; } _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_EXTENSIONS, 1024, name, &size); if (!strstr(name, "cl_ext_atomic_counter")) { printf("Atomic counter extension is required for this test!\n"); failed_ = true; return; } program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-legacy", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "atomic_test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; for (unsigned int i = 0; i < MaxCounters; ++i) { buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, MaxCounters * sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLAtomicCounter::run(void) { if (failed_) { return; } cl_uint initVal[2] = {5, 10}; for (unsigned int i = 0; i < MaxCounters; ++i) { error_ = _wrapper->clEnqueueWriteBuffer(cmdQueues_[_deviceId], buffers()[i], true, 0, sizeof(cl_uint), &initVal[i], 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); } for (unsigned int i = 0; i < MaxCounters + 1; ++i) { cl_mem buffer = buffers()[i]; error_ = _wrapper->clSetKernelArg(kernel_, i, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); } size_t gws[1] = {64}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); cl_uint outputV[MaxCounters] = {0}; // Find the new counter value initVal[0]++; initVal[1]--; for (unsigned int i = 0; i < MaxCounters; ++i) { cl_mem buffer = buffers()[i]; error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers()[i], true, 0, sizeof(cl_uint), &outputV[i], 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); if 
(initVal[i] != outputV[i]) { printf("%d != %d", initVal[i], outputV[i]); CHECK_RESULT(true, " - Incorrect result for counter!\n"); } } // Restore the original value to check the returned result in the kernel initVal[0]--; initVal[1]++; cl_mem buffer = buffers()[MaxCounters]; error_ = _wrapper->clEnqueueReadBuffer( cmdQueues_[_deviceId], buffers()[MaxCounters], true, 0, MaxCounters * sizeof(cl_uint), outputV, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); for (unsigned int i = 0; i < MaxCounters; ++i) { if (initVal[i] != outputV[i]) { printf("%d != %d", initVal[i], outputV[i]); CHECK_RESULT(true, " - Incorrect result for counter inside kernel. Returned " "value != original.\n"); } } } unsigned int OCLAtomicCounter::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLAtomicCounter.h000066400000000000000000000030151450307266000254310ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ATOMIC_COUNTER_H_ #define _OCL_ATOMIC_COUNTER_H_ #include "OCLTestImp.h" class OCLAtomicCounter : public OCLTestImp { public: OCLAtomicCounter(); virtual ~OCLAtomicCounter(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; }; #endif // _OCL_ATOMIC_COUNTER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLBlitKernel.cpp000066400000000000000000000555151450307266000252570ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLBlitKernel.h" #include #include #include #include #include "CL/cl.h" const static cl_uint Stages = 4; const static cl_uint ThreadsForCheck = 1 << Stages; #define KERNEL_CODE(...) #__VA_ARGS__ const static char* strKernel = KERNEL_CODE( \n \x23 if OCL20 \n extern void __amd_scheduler(__global void *, __global void *, uint); \n \x23 endif \n extern void __amd_copyBufferToImage( __global uint*, __write_only image2d_array_t, ulong4, int4, int4, uint4, ulong4); extern void __amd_copyImageToBuffer( __read_only image2d_array_t, __global uint*, __global ushort*, __global uchar*, int4, ulong4, int4, uint4, ulong4); extern void __amd_copyImage( __read_only image2d_array_t, __write_only image2d_array_t, int4, int4, int4); extern void __amd_copyImage1DA( __read_only image2d_array_t, __write_only image2d_array_t, int4, int4, int4); extern void __amd_copyBufferRect( __global uchar*, __global uchar*, ulong4, ulong4, ulong4); extern void __amd_copyBufferRectAligned( __global uint*, __global uint*, ulong4, ulong4, ulong4); extern void __amd_copyBuffer( __global uchar*, __global uchar*, ulong, ulong, ulong, uint); extern void __amd_copyBufferAligned( __global uint*, __global uint*, ulong, ulong, ulong, uint); extern void __amd_fillBuffer( __global uchar*, __global uint*, __constant uchar*, uint, ulong, ulong); extern void __amd_fillImage( __write_only image2d_array_t, float4, int4, uint4, int4, int4, uint); __kernel void copyBufferToImage( __global uint* src, __write_only image2d_array_t dst, ulong4 srcOrigin, int4 dstOrigin, int4 size, uint4 format, ulong4 pitch) { __amd_copyBufferToImage(src, dst, srcOrigin, dstOrigin, size, format, pitch); } __kernel void copyImageToBuffer( __read_only image2d_array_t src, __global uint* dstUInt, __global ushort* dstUShort, __global uchar* dstUChar, int4 srcOrigin, ulong4 dstOrigin, int4 size, uint4 format, ulong4 pitch) { __amd_copyImageToBuffer(src, dstUInt, dstUShort, dstUChar, srcOrigin, dstOrigin, size, format, pitch); } __kernel void copyImage( __read_only image2d_array_t src, __write_only image2d_array_t dst, int4 srcOrigin, int4 dstOrigin, int4 size) { __amd_copyImage(src, dst, srcOrigin, dstOrigin, size); } __kernel void copyImage1DA( __read_only image2d_array_t src, __write_only image2d_array_t dst, int4 srcOrigin, int4 dstOrigin, int4 size) { __amd_copyImage1DA(src, dst, srcOrigin, dstOrigin, size); } __kernel void copyBufferRect( __global uchar* src, __global uchar* dst, ulong4 srcRect, ulong4 dstRect, ulong4 size) { __amd_copyBufferRect(src, dst, srcRect, dstRect, size); } __kernel void copyBufferRectAligned( __global uint* src, __global uint* dst, ulong4 srcRect, ulong4 dstRect, ulong4 size) { __amd_copyBufferRectAligned(src, dst, srcRect, dstRect, size); } __kernel void copyBuffer( __global uchar* srcI, __global uchar* dstI, ulong srcOrigin, ulong dstOrigin, ulong size, uint remain) { __amd_copyBuffer(srcI, dstI, srcOrigin, dstOrigin, size, remain); } __kernel void copyBufferAligned( __global uint* src, __global uint* dst, ulong srcOrigin, ulong dstOrigin, ulong size, uint alignment) { __amd_copyBufferAligned(src, dst, srcOrigin, dstOrigin, size, alignment); } __kernel void fillBuffer( __global uchar* bufUChar, __global uint* bufUInt, __constant uchar* pattern, uint patternSize, ulong 
offset, ulong size) { __amd_fillBuffer(bufUChar, bufUInt, pattern, patternSize, offset, size); } __kernel void fillImage( __write_only image2d_array_t image, float4 patternFLOAT4, int4 patternINT4, uint4 patternUINT4, int4 origin, int4 size, uint type) { __amd_fillImage(image, patternFLOAT4, patternINT4, patternUINT4, origin, size, type); } \n \x23 if OCL20 \n typedef struct _HsaAqlDispatchPacket { uint mix; ushort workgroup_size[3]; ushort reserved2; uint grid_size[3]; uint private_segment_size_bytes; uint group_segment_size_bytes; ulong kernel_object_address; ulong kernel_arg_address; ulong reserved3; ulong completion_signal; } HsaAqlDispatchPacket; \n // This is an OpenCLized hsa_control_directives_t typedef struct _AmdControlDirectives { ulong enabled_control_directives; ushort enable_break_exceptions; ushort enable_detect_exceptions; uint max_dynamic_group_size; ulong max_flat_grid_size; uint max_flat_workgroup_size; uchar required_dim; uchar reserved1[3]; ulong required_grid_size[3]; uint required_workgroup_size[3]; uchar reserved2[60]; } AmdControlDirectives; \n // This is an OpenCLized amd_kernel_code_t typedef struct _AmdKernelCode { uint amd_kernel_code_version_major; uint amd_kernel_code_version_minor; ushort amd_machine_kind; ushort amd_machine_version_major; ushort amd_machine_version_minor; ushort amd_machine_version_stepping; long kernel_code_entry_byte_offset; long kernel_code_prefetch_byte_offset; ulong kernel_code_prefetch_byte_size; ulong max_scratch_backing_memory_byte_size; uint compute_pgm_rsrc1; uint compute_pgm_rsrc2; uint kernel_code_properties; uint workitem_private_segment_byte_size; uint workgroup_group_segment_byte_size; uint gds_segment_byte_size; ulong kernarg_segment_byte_size; uint workgroup_fbarrier_count; ushort wavefront_sgpr_count; ushort workitem_vgpr_count; ushort reserved_vgpr_first; ushort reserved_vgpr_count; ushort reserved_sgpr_first; ushort reserved_sgpr_count; ushort debug_wavefront_private_segment_offset_sgpr; ushort debug_private_segment_buffer_sgpr; uchar kernarg_segment_alignment; uchar group_segment_alignment; uchar private_segment_alignment; uchar wavefront_size; int call_convention; uchar reserved1[12]; ulong runtime_loader_kernel_symbol; AmdControlDirectives control_directives; } AmdKernelCode; \n typedef struct _HwDispatchHeader { uint writeData0; // CP WRITE_DATA write to rewind for memory uint writeData1; uint writeData2; uint writeData3; uint rewind; // REWIND execution uint startExe; // valid bit uint condExe0; // 0xC0032200 -- TYPE 3, COND_EXEC uint condExe1; // 0x00000204 ---- uint condExe2; // 0x00000000 ---- uint condExe3; // 0x00000000 ---- uint condExe4; // 0x00000000 ---- } HwDispatchHeader; \n typedef struct _HwDispatch { uint packet0; // 0xC0067602 -- TYPE 3, SET_SH_REG, TYPE:COMPUTE (6 values) uint offset0; // 0x00000204 ---- OFFSET uint startX; // 0x00000000 ---- COMPUTE_START_X: START = 0x0 uint startY; // 0x00000000 ---- COMPUTE_START_Y: START = 0x0 uint startZ; // 0x00000000 ---- COMPUTE_START_Z: START = 0x0 uint wrkGrpSizeX; // 0x00000000 ---- COMPUTE_NUM_THREAD_X: NUM_THREAD_FULL = 0x0, NUM_THREAD_PARTIAL = 0x0 uint wrkGrpSizeY; // 0x00000000 ---- COMPUTE_NUM_THREAD_Y: NUM_THREAD_FULL = 0x0, NUM_THREAD_PARTIAL = 0x0 uint wrkGrpSizeZ; // 0x00000000 ---- COMPUTE_NUM_THREAD_Z: NUM_THREAD_FULL = 0x0, NUM_THREAD_PARTIAL = 0x0 uint packet1; // 0xC0027602 -- TYPE 3, SET_SH_REG, TYPE:COMPUTE (2 values) uint offset1; // 0x0000020C ---- OFFSET uint isaLo; // 0x00000000 ---- COMPUTE_PGM_LO: DATA = 0x0 uint isaHi; // 0x00000000 
---- COMPUTE_PGM_HI: DATA = 0x0, INST_ATC__CI__VI = 0x0 uint packet2; // 0xC0027602 -- TYPE 3, SET_SH_REG, TYPE:COMPUTE (2 values) uint offset2; // 0x00000212 ---- OFFSET uint resource1; // 0x00000000 ---- COMPUTE_PGM_RSRC1 uint resource2; // 0x00000000 ---- COMPUTE_PGM_RSRC2 uint packet3; // 0xc0017602 -- TYPE 3, SET_SH_REG, TYPE:COMPUTE (1 value) uint offset3; // 0x00000215 ---- OFFSET uint pad31; // 0x000003ff ---- COMPUTE_RESOURCE_LIMITS uint packet31; // 0xC0067602 -- TYPE 3, SET_SH_REG, TYPE:COMPUTE (1 value) uint offset31; // 0x00000218 ---- OFFSET uint ringSize; // 0x00000000 ---- COMPUTE_TMPRING_SIZE: WAVES = 0x0, WAVESIZE = 0x0 uint user0; // 0xC0047602 -- TYPE 3, SET_SH_REG, TYPE:COMPUTE (4 values) uint offsUser0; // 0x00000240 ---- OFFSET uint scratchLo; // 0x00000000 ---- COMPUTE_USER_DATA_0: DATA = 0x0 uint scratchHi; // 0x80000000 ---- COMPUTE_USER_DATA_1: DATA = 0x80000000 uint scratchSize; // 0x00000000 ---- COMPUTE_USER_DATA_2: DATA = 0x0 uint padUser; // 0x00EA7FAC ---- COMPUTE_USER_DATA_3: DATA = 0xEA7FAC uint user1; // 0xC0027602 -- TYPE 3, SET_SH_REG, TYPE:COMPUTE (2 values) uint offsUser1; // 0x00000244 ---- OFFSET uint aqlPtrLo; // 0x00000000 ---- COMPUTE_USER_DATA_4: DATA = 0x0 uint aqlPtrHi; // 0x00000000 ---- COMPUTE_USER_DATA_5: DATA = 0x0 uint user2; // 0xC0027602 -- TYPE 3, SET_SH_REG, TYPE:COMPUTE (2 values) uint offsUser2; // 0x00000246 ---- OFFSET uint hsaQueueLo; // 0x00000000 ---- COMPUTE_USER_DATA_6: DATA = 0x0 uint hsaQueueHi; // 0x00000000 ---- COMPUTE_USER_DATA_7: DATA = 0x0 uint user3; // 0xC0027602 -- TYPE 3, SET_SH_REG, TYPE:COMPUTE (2 values) uint offsUser3; // 0x00000246 ---- OFFSET uint argsLo; // 0x00000000 ---- COMPUTE_USER_DATA_8: DATA = 0x0 uint argsHi; // 0x00000000 ---- COMPUTE_USER_DATA_9: DATA = 0x0 uint copyData; // 0xC0044000 -- TYPE 3, COPY_DATA uint copyDataFlags; // 0x00000405 ---- srcSel 0x5, destSel 0x4, countSel 0x0, wrConfirm 0x0, engineSel 0x0 uint scratchAddrLo; // 0x000201C4 ---- srcAddressLo uint scratchAddrHi; // 0x00000000 ---- srcAddressHi uint shPrivateLo; // 0x00002580 ---- dstAddressLo uint shPrivateHi; // 0x00000000 ---- dstAddressHi uint user4; // 0xC0027602 -- TYPE 3, SET_SH_REG, TYPE:COMPUTE (2 values) uint offsUser4; // 0x00000248 ---- OFFSET uint scratchOffs; // 0x00000000 ---- COMPUTE_USER_DATA_10: DATA = 0x0 uint privSize; // 0x00000030 ---- COMPUTE_USER_DATA_11: DATA = 0x30 uint packet4; // 0xC0031502 -- TYPE 3, DISPATCH_DIRECT, TYPE:COMPUTE uint glbSizeX; // 0x00000000 uint glbSizeY; // 0x00000000 uint glbSizeZ; // 0x00000000 uint padd41; // 0x00000021 } HwDispatch; \n static const uint WavefrontSize = 64; static const uint MaxWaveSize = 0x400; static const uint UsrRegOffset = 0x240; static const uint Pm4Nop = 0xC0001002; static const uint Pm4UserRegs = 0xC0007602; static const uint Pm4CopyReg = 0xC0044000; static const uint PrivateSegEna = 0x1; static const uint DispatchEna = 0x2; static const uint QueuePtrEna = 0x4; static const uint KernelArgEna = 0x8; static const uint FlatScratchEna = 0x20; \n uint GetCmdTemplateHeaderSize() { return sizeof(HwDispatchHeader); } \n uint GetCmdTemplateDispatchSize() { return sizeof(HwDispatch); } \n void EmptyCmdTemplateDispatch(ulong cmdBuf) { volatile __global HwDispatch* dispatch = (volatile __global HwDispatch*)cmdBuf; dispatch->glbSizeX = 0; dispatch->glbSizeY = 0; dispatch->glbSizeZ = 0; } \n void RunCmdTemplateDispatch( ulong cmdBuf, __global HsaAqlDispatchPacket* aqlPkt, ulong scratch, ulong hsaQueue, uint scratchSize, uint scratchOffset, uint numMaxWaves, uint 
useATC) \n { volatile __global HwDispatch* dispatch = (volatile __global HwDispatch*)cmdBuf; uint usrRegCnt = 0; // Program workgroup size dispatch->wrkGrpSizeX = aqlPkt->workgroup_size[0]; dispatch->wrkGrpSizeY = aqlPkt->workgroup_size[1]; dispatch->wrkGrpSizeZ = aqlPkt->workgroup_size[2]; // ISA address __global AmdKernelCode* kernelObj = (__global AmdKernelCode*)aqlPkt->kernel_object_address; ulong isa = aqlPkt->kernel_object_address + kernelObj->kernel_code_entry_byte_offset; dispatch->isaLo = (uint)(isa >> 8); dispatch->isaHi = (uint)(isa >> 40) | (useATC ? 0x100 : 0); // Program PGM resource registers dispatch->resource1 = kernelObj->compute_pgm_rsrc1; dispatch->resource2 = kernelObj->compute_pgm_rsrc2; uint flags = kernelObj->kernel_code_properties; uint privateSize = kernelObj->workitem_private_segment_byte_size; uint ldsSize = aqlPkt->group_segment_size_bytes + kernelObj->workgroup_group_segment_byte_size; // Align up the LDS blocks 128 * 4(in DWORDs) uint ldsBlocks = (ldsSize + 511) >> 9; dispatch->resource2 |= (ldsBlocks << 15); // Private/scratch segment was enabled if (flags & PrivateSegEna) { uint waveSize = privateSize * WavefrontSize; // 256 DWRODs is the minimum for SQ waveSize = max(MaxWaveSize, waveSize); uint numWaves = scratchSize / waveSize; numWaves = min(numWaves, numMaxWaves); dispatch->ringSize = numWaves; dispatch->ringSize |= (waveSize >> 10) << 12; dispatch->user0 = Pm4UserRegs | (4 << 16); dispatch->scratchLo = (uint)scratch; dispatch->scratchHi = ((uint)(scratch >> 32)) | 0x80000000; // Enables swizzle dispatch->scratchSize = scratchSize; usrRegCnt += 4; } else { dispatch->ringSize = 0; dispatch->user0 = Pm4Nop | (4 << 16); } // Pointer to the AQL dispatch packet dispatch->user1 = (flags & DispatchEna) ? (Pm4UserRegs | (2 << 16)) : (Pm4Nop | (2 << 16)); dispatch->offsUser1 = UsrRegOffset + usrRegCnt; usrRegCnt += (flags & DispatchEna) ? 2 : 0; ulong gpuAqlPtr = (ulong)aqlPkt; dispatch->aqlPtrLo = (uint)gpuAqlPtr; dispatch->aqlPtrHi = (uint)(gpuAqlPtr >> 32); // Pointer to the AQL queue header if (flags & QueuePtrEna) { dispatch->user2 = Pm4UserRegs | (2 << 16); dispatch->offsUser2 = UsrRegOffset + usrRegCnt; usrRegCnt += 2; dispatch->hsaQueueLo = (uint)hsaQueue; dispatch->hsaQueueHi = (uint)(hsaQueue >> 32); } else { dispatch->user2 = Pm4Nop | (2 << 16); } // Pointer to the AQL kernel arguments dispatch->user3 = (flags & KernelArgEna) ? (Pm4UserRegs | (2 << 16)) : (Pm4Nop | (2 << 16)); dispatch->offsUser3 = UsrRegOffset + usrRegCnt; usrRegCnt += (flags & KernelArgEna) ? 
                 2 : 0;
    dispatch->argsLo = (uint)aqlPkt->kernel_arg_address;
    dispatch->argsHi = (uint)(aqlPkt->kernel_arg_address >> 32);
    // Provide pointer to the private/scratch buffer for the flat address
    if (flags & FlatScratchEna) {
      dispatch->copyData = Pm4CopyReg;
      dispatch->scratchAddrLo = (uint)((scratch - scratchOffset) >> 16);
      dispatch->offsUser4 = UsrRegOffset + usrRegCnt;
      dispatch->scratchOffs = scratchOffset;
      dispatch->privSize = privateSize;
    } else {
      dispatch->copyData = Pm4Nop | (8 << 16);
    }
    // Update the global launch grid
    dispatch->glbSizeX = aqlPkt->grid_size[0];
    dispatch->glbSizeY = aqlPkt->grid_size[1];
    dispatch->glbSizeZ = aqlPkt->grid_size[2];
  } \n
  __kernel void scheduler(
      __global void * queue, __global void * params, uint paramIdx) {
    __amd_scheduler(queue, params, paramIdx);
  } \n
  \x23 endif \n
);

enum {
  BlitCopyImage = 0,
  BlitCopyImage1DA,
  BlitCopyImageToBuffer,
  BlitCopyBufferToImage,
  BlitCopyBufferRect,
  BlitCopyBufferRectAligned,
  BlitCopyBuffer,
  BlitCopyBufferAligned,
  FillBuffer,
  FillImage,
  Scheduler,
  BlitTotal
};

static const char* BlitName[BlitTotal] = {
    "copyImage",     "copyImage1DA",          "copyImageToBuffer",
    "copyBufferToImage", "copyBufferRect",    "copyBufferRectAligned",
    "copyBuffer",    "copyBufferAligned",     "fillBuffer",
    "fillImage",     "scheduler",
};

OCLBlitKernel::OCLBlitKernel() { _numSubTests = 1; }

OCLBlitKernel::~OCLBlitKernel() {}

void OCLBlitKernel::open(unsigned int test, char* units, double& conversion,
                         unsigned int deviceId) {
  OCLTestImp::open(test, units, conversion, deviceId);
  CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test");
  char dbuffer[1024] = {0};
  CPerfCounter timer;
  int sub = 0;
  std::string options = "-cl-std=CL2.0 -DOCL20=1";

  cl_device_type deviceType;
  error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE,
                                     sizeof(deviceType), &deviceType, NULL);
  CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed");
  if (!(deviceType & CL_DEVICE_TYPE_GPU)) {
    testDescString = "GPU device is required for this test!\n";
    return;
  }

  size_t param_size = 0;
  char* strVersion = 0;
  error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0,
                                     0, &param_size);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");
  strVersion = new char[param_size];
  error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION,
                                     param_size, strVersion, 0);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");
  if (strVersion[7] < '2') {
    options = "-DOCL20=0";
    sub = 1;
    delete[] strVersion;
    testDescString = "Currently it works for OCL20 devices only!\n";
    return;
  }
  delete[] strVersion;

  error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DRIVER_VERSION, 0,
                                     0, &param_size);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");
  strVersion = new char[param_size];
  error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DRIVER_VERSION,
                                     param_size, strVersion, 0);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");

  std::string sch = strKernel;
  static const char AmdScheduler[] = "amd_scheduler";
  static const char AmdSchedulerPal[] = "amd_scheduler_pal";
  static const char AmdSchedulerROCm[] = "amd_scheduler_rocm";
  const char* AmdSchedulerPatch = NULL;
  size_t loc = 0;
  if (NULL != strstr(strVersion, "LC")) {
    if (NULL != strstr(strVersion, "PAL")) {
      AmdSchedulerPatch = AmdSchedulerPal;
    } else if (NULL != strstr(strVersion, "HSA")) {
      AmdSchedulerPatch = AmdSchedulerROCm;
    }
  }
  delete[] strVersion;
  // Patch both occurrences of the scheduler entry point name (declaration and
  // call) to match the backend reported by the driver version string.
  if (NULL != AmdSchedulerPatch) {
    loc = sch.find(AmdScheduler);
    sch.replace(loc, strlen(AmdScheduler), AmdSchedulerPatch);
    loc = sch.find(AmdScheduler, (loc + strlen(AmdSchedulerPatch)));
    sch.replace(loc, strlen(AmdScheduler), AmdSchedulerPatch);
  }

  timer.Reset();
  timer.Start();
  const char* strProgram = sch.c_str();
  program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strProgram, NULL,
                                                 &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed");
  error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId],
                                    options.c_str(), NULL, NULL);
  if (error_ != CL_SUCCESS) {
    char programLog[1024];
    _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId],
                                    CL_PROGRAM_BUILD_LOG, 1024, programLog, 0);
    printf("\n%s\n", programLog);
    fflush(stdout);
  }
  CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed");

  cl_kernel kernels[BlitTotal];
  for (int i = 0; i < BlitTotal - sub; ++i) {
    kernels[i] = _wrapper->clCreateKernel(program_, BlitName[i], &error_);
    CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed");
  }
  timer.Stop();
  double sec = timer.GetElapsedTime();
  time_ = (float)sec * 1000.f;
  testDescString = "Blit kernel compilation time (ms):";
  for (int i = 0; i < BlitTotal - sub; ++i) {
    _wrapper->clReleaseKernel(kernels[i]);
  }
}

void OCLBlitKernel::run(void) { _perfInfo = time_; }

unsigned int OCLBlitKernel::close(void) { return OCLTestImp::close(); }

clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLBlitKernel.h

/*
 Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
*/

#ifndef _OCL_BLIT_KERNEL_H_
#define _OCL_BLIT_KERNEL_H_

#include "OCLTestImp.h"

class OCLBlitKernel : public OCLTestImp {
 public:
  OCLBlitKernel();
  virtual ~OCLBlitKernel();

 public:
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceID);
  virtual void run(void);
  virtual unsigned int close(void);

 private:
  float time_;
};

#endif  // _OCL_BLIT_KERNEL_H_

clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLBufferFromImage.cpp

/*
 Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLBufferFromImage.h" #include #include #include #include #define GROUP_SIZE 256 const static char strKernel[] = "__kernel void buffer2bufferCopy( " " \n" " __global char* input, " " \n" " __global char* output) " " \n" "{ " " \n" " int coord = (int)(get_global_id(0)); " " \n" " output[coord] = input[coord]; " " \n" "} " " \n"; typedef CL_API_ENTRY cl_mem(CL_API_CALL *clCreateBufferFromImageAMD_fn)( cl_context context, cl_mem image, cl_int *errcode_ret); clCreateBufferFromImageAMD_fn clCreateBufferFromImageAMD; OCLBufferFromImage::OCLBufferFromImage() : OCLTestImp() { _numSubTests = 2; blockSizeX = GROUP_SIZE; blockSizeY = 1; } OCLBufferFromImage::~OCLBufferFromImage() {} void OCLBufferFromImage::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { buffer = bufferImage = clImage2D = bufferOut = NULL; done = false; pitchAlignment = 0; bufferSize = 0; _openTest = test; // Initialize random number seed srand((unsigned int)time(NULL)); OCLTestImp::open(test, units, conversion, deviceId); if (_errorFlag) return; cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { testDescString = "GPU device is required for this test!\n"; done = true; return; } cl_bool imageSupport; size_t size; _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport), &imageSupport, &size); if (!imageSupport) { testDescString = "Image not supported, skipping this test! 
"; done = true; return; } clCreateBufferFromImageAMD = (clCreateBufferFromImageAMD_fn)clGetExtensionFunctionAddressForPlatform( platform_, "clCreateBufferFromImageAMD"); if (clCreateBufferFromImageAMD == NULL) { testDescString = "clCreateBufferFromImageAMD not found!\n"; done = true; return; } CompileKernel(); AllocateOpenCLBuffer(); } void OCLBufferFromImage::run(void) { if (_errorFlag || done) { return; } if ((_openTest % 2) == 0) { testReadBuffer(bufferImage); } else { testKernel(); } } void OCLBufferFromImage::AllocateOpenCLBuffer() { cl_int status = 0; size_t size = 0; pitchAlignment = 0; status = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_IMAGE_PITCH_ALIGNMENT, sizeof(cl_uint), &pitchAlignment, &size); pitchAlignment--; const unsigned int requiredPitch = ((imageWidth + pitchAlignment) & ~pitchAlignment); const unsigned int pitch = requiredPitch; bufferSize = pitch * imageHeight; unsigned char *sourceData = new unsigned char[bufferSize]; // init data for (unsigned int y = 0; y < bufferSize; y++) { *(sourceData + y) = y; } buffer = _wrapper->clCreateBuffer(context_, CL_MEM_COPY_HOST_PTR | CL_MEM_READ_WRITE, bufferSize, sourceData, &status); delete[] sourceData; const cl_image_format format = {CL_RGBA, CL_UNSIGNED_INT8}; #if defined(CL_VERSION_2_0) const cl_image_desc desc = {CL_MEM_OBJECT_IMAGE2D, imageWidth / 4, imageHeight, 0, 0, pitch, 0, 0, 0, {buffer}}; #else const cl_image_desc desc = {CL_MEM_OBJECT_IMAGE2D, imageWidth / 4, imageHeight, 0, 0, pitch, 0, 0, 0, buffer}; #endif clImage2D = _wrapper->clCreateImage(context_, CL_MEM_READ_WRITE, &format, &desc, NULL, &status); CHECK_RESULT(clImage2D == NULL || status != CL_SUCCESS, "AllocateOpenCLImage() failed"); bufferImage = clCreateBufferFromImageAMD(context_, clImage2D, &status); char c[1024]; _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DRIVER_VERSION, sizeof(c), &c, NULL); if (status == CL_INVALID_OPERATION) { testDescString = "clCreateBufferFromImageAMD not supported on this device!\n"; done = true; return; } CHECK_RESULT(bufferImage == NULL || status != CL_SUCCESS, "clCreateBufferFromImage(bufferOut) failed"); bufferOut = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, bufferSize, NULL, &status); CHECK_RESULT(bufferOut == NULL || status != CL_SUCCESS, "clCreateBuffer(bufferOut) failed"); } void OCLBufferFromImage::testReadBuffer(cl_mem buffer) { cl_int status = 0; unsigned char *dstData = new unsigned char[bufferSize]; status = clEnqueueReadBuffer(cmdQueues_[_deviceId], buffer, 1, 0, bufferSize, dstData, 0, 0, 0); ::clFinish(cmdQueues_[_deviceId]); for (unsigned int y = 0; y < bufferSize; y++) { if (*(dstData + y) != (unsigned char)y) { CHECK_RESULT_NO_RETURN(true, "CheckCLBuffer: *(dstData+y)!=y => %i != %i", *(dstData + y), y); goto cleanup; } } cleanup: delete[] dstData; } void OCLBufferFromImage::testKernel() { CopyOpenCLBuffer(bufferImage); testReadBuffer(bufferOut); } unsigned int OCLBufferFromImage::close(void) { if (bufferImage != NULL) clReleaseMemObject(bufferImage); if (clImage2D != NULL) clReleaseMemObject(clImage2D); if (buffer != NULL) clReleaseMemObject(buffer); if (bufferOut != NULL) clReleaseMemObject(bufferOut); return OCLTestImp::close(); } void OCLBufferFromImage::CopyOpenCLBuffer(cl_mem buffer) { cl_int status = 0; // Set appropriate arguments to the kernel2D // input buffer image status = clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLBuffer() failed at " "clSetKernelArg(kernel_,0,sizeof(cl_mem),&buffer)"); status = 
clSetKernelArg(kernel_, 1, sizeof(cl_mem), &bufferOut); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLBuffer() failed at " "clSetKernelArg(kernel_,1,sizeof(cl_mem),&bufferOut)"); // Enqueue a kernel run call. size_t global_work_offset[] = {0}; size_t globalThreads[] = {bufferSize}; size_t localThreads[] = {blockSizeX}; status = clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, globalThreads, NULL, 0, NULL, 0); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLBuffer() failed at clEnqueueNDRangeKernel"); status = clFinish(cmdQueues_[_deviceId]); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLBuffer() failed at clFinish"); } void OCLBufferFromImage::CompileKernel() { cl_int status = 0; size_t kernelSize = sizeof(strKernel); const char *strs = (const char *)&strKernel[0]; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strs, &kernelSize, &status); status = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], NULL, NULL, NULL); if (status != CL_SUCCESS) { if (status == CL_BUILD_PROGRAM_FAILURE) { cl_int logStatus; size_t buildLogSize = 0; logStatus = clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, buildLogSize, NULL, &buildLogSize); std::string buildLog; buildLog.resize(buildLogSize); logStatus = clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, buildLogSize, &buildLog[0], NULL); printf("%s", buildLog.c_str()); } return; } // get a kernel object handle for a kernel with the given name kernel_ = _wrapper->clCreateKernel(program_, "buffer2bufferCopy", &status); size_t kernel2DWorkGroupSize = 0; status = clGetKernelWorkGroupInfo(kernel_, devices_[_deviceId], CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &kernel2DWorkGroupSize, 0); if ((blockSizeX * blockSizeY) > kernel2DWorkGroupSize) { if (blockSizeX > kernel2DWorkGroupSize) { blockSizeX = kernel2DWorkGroupSize; blockSizeY = 1; } } } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLBufferFromImage.h000066400000000000000000000037721450307266000256670ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCLBufferFromImage_H_ #define _OCLBufferFromImage_H_ #include "OCLTestImp.h" class OCLBufferFromImage : public OCLTestImp { public: OCLBufferFromImage(); virtual ~OCLBufferFromImage(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual void run(void); virtual unsigned int close(void); protected: static const unsigned int imageWidth = 1920; static const unsigned int imageHeight = 1080; void testReadBuffer(cl_mem buffer); void testKernel(); void AllocateOpenCLBuffer(); void CopyOpenCLBuffer(cl_mem buffer); void CompileKernel(); bool done; size_t blockSizeX; /**< Work-group size in x-direction */ size_t blockSizeY; /**< Work-group size in y-direction */ size_t bufferSize; cl_mem buffer; cl_mem clImage2D; cl_mem bufferImage; cl_mem bufferOut; cl_uint pitchAlignment; }; #endif // _OCLBufferFromImage_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLCPUGuardPages.cpp000066400000000000000000000154361450307266000256140ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLCPUGuardPages.h" #include #include #include #include "CL/cl.h" #ifdef _WIN32 #include #include // for EXCEPTION_ACCESS_VIOLATION int filter(unsigned int code, struct _EXCEPTION_POINTERS* ep) { printf("In filter\n"); if (code == EXCEPTION_ACCESS_VIOLATION) { printf("caught AV as expected."); return EXCEPTION_EXECUTE_HANDLER; } else { printf("didn't catch AV, unexpected."); return EXCEPTION_CONTINUE_SEARCH; }; } #else #include #include #include #include void segfault_sigaction(int signal, siginfo_t *si, void *arg) { printf("Caught segfault at address %p\n", si->si_addr); exit(0); } #endif const static char* strKernel = "__kernel void simple_in_out_test( int in_offset, \n" " int out_offset, \n" " __global float4* in, \n" " __global float4* out) { \n" "unsigned int gid = get_global_id(0);\n" "out[gid + out_offset] = in[gid + in_offset] * -1.f;" "}"; testOCLCPUGuardPagesStruct testOCLCPUGuardPagesList[] = { {false, false, 1024, 0, 0}, {true, false, 1024, 0, 0}, {false, false, 1024, 0, 0}, {true, true, 1024, 0, 0}, {false, false, 1024, 0, 0}, {true, true, 1024, 0, 0}}; OCLCPUGuardPages::OCLCPUGuardPages() { _numSubTests = sizeof(testOCLCPUGuardPagesList) / sizeof(testOCLCPUGuardPagesStruct); /* struct sigaction sa; memset(&sa, 0, sizeof(sa)); sigemptyset(&sa.sa_mask); sa.sa_sigaction = segfault_sigaction; sa.sa_flags = SA_SIGINFO; sigaction(SIGSEGV, &sa, NULL); */ } OCLCPUGuardPages::~OCLCPUGuardPages() {} void OCLCPUGuardPages::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { // Initialize the current test parameters. testValues = testOCLCPUGuardPagesList[test]; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "simple_in_out_test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); // Create input and output buffers for the test. 
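// ---------------------------------------------------------------------------
// Aside (not part of the original test): the filter() helper defined at the
// top of this file is shaped for Microsoft structured exception handling,
// although run() below guards the enqueue with plain C++ try/catch. A minimal
// sketch of how such a filter is conventionally wired up with __try/__except;
// guardedEnqueue is a hypothetical name used only for illustration, and the
// whole block is kept under #if 0 so it does not affect the build:
#if 0
#ifdef _WIN32
void guardedEnqueue() {
  __try {
    // enqueue work that may fault, e.g. a kernel reading past the end of a
    // buffer via the in_offset/out_offset test parameters
  } __except (filter(GetExceptionCode(), GetExceptionInformation())) {
    // entered only when filter() returns EXCEPTION_EXECUTE_HANDLER,
    // i.e. for EXCEPTION_ACCESS_VIOLATION
    printf("access violation handled\n");
  }
}
#endif
#endif
// ---------------------------------------------------------------------------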
cl_mem inBuffer, outBuffer; cl_float4* dummyIn = new cl_float4[testValues.items]; for (int i = 0; i < testValues.items; i++) { dummyIn[i].s[0] = dummyIn[i].s[1] = dummyIn[i].s[2] = dummyIn[i].s[3] = i * 1.f; } inBuffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, testValues.items * sizeof(cl_float4), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); error_ = _wrapper->clEnqueueWriteBuffer(cmdQueues_[_deviceId], inBuffer, 1, 0, testValues.items * sizeof(cl_float4), dummyIn, 0, 0, 0); buffers_.push_back(inBuffer); outBuffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, testValues.items * sizeof(cl_float4), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(outBuffer); delete[] dummyIn; } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLCPUGuardPages::run(void) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_int), &testValues.in_offset); error_ |= _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_int), &testValues.out_offset); error_ |= _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_mem), &buffers()[0]); error_ |= _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_mem), &buffers()[1]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t globalThreads[1]; globalThreads[0] = testValues.items; size_t localThreads[1] = {256}; #ifdef _WIN32 // LPTOP_LEVEL_EXCEPTION_FILTER pOriginalFilter = // SetUnhandledExceptionFilter(MyUnhandledExceptionFilter); // AddVectoredExceptionHandler(1,MyVectorExceptionFilter); try { error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, globalThreads, localThreads, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } catch (...) { printf("exception caught in OCLTest...\n"); } #else error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, globalThreads, localThreads, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); #endif } unsigned int OCLCPUGuardPages::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLCPUGuardPages.h000066400000000000000000000032621450307266000252530ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_CPU_GUARD_PAGES_H_ #define _OCL_CPU_GUARD_PAGES_H_ #include "OCLTestImp.h" typedef struct { bool useGuardPages; bool shouldFail; int items; int in_offset; int out_offset; } testOCLCPUGuardPagesStruct; class OCLCPUGuardPages : public OCLTestImp { public: OCLCPUGuardPages(); virtual ~OCLCPUGuardPages(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: testOCLCPUGuardPagesStruct testValues; }; #endif // _OCL_CPU_GUARD_PAGES_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLCreateBuffer.cpp000066400000000000000000000131121450307266000255440ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLCreateBuffer.h" #include #include #include #include #include #ifdef __linux__ #include #endif #include "CL/cl.h" const static size_t MaxSubTests = 1; OCLCreateBuffer::OCLCreateBuffer() { _numSubTests = MaxSubTests; failed_ = false; maxSize_ = 0; } OCLCreateBuffer::~OCLCreateBuffer() {} void OCLCreateBuffer::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); testID_ = test; size_t size; _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(cl_ulong), &maxSize_, &size); //! Workaround out of range issue in Windows 32bit apps #if defined(_WIN32) && !defined(_WIN64) static const size_t MaxSizeLimit = 512 * 1024 * 1024; if (maxSize_ > MaxSizeLimit) { maxSize_ = MaxSizeLimit; } #endif #if EMU_ENV maxSize_ = 1000; #endif // EMU_ENV cl_mem buf = NULL; // Make sure to use a size that's multiple of 8 (64bit). 
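// (Clearing the low three bits rounds maxSize_ down to a multiple of 8
// bytes, so the 64-bit fill pattern used in run() tiles the buffer exactly
// and validation can walk it in whole cl_ulong elements.)
// A minimal illustrative sketch -- not used by the test, helper name is
// ours -- of the same round-down for any power-of-two alignment, kept under
// #if 0 so the build is unchanged:
#if 0
static inline cl_ulong roundDownPow2(cl_ulong size, cl_ulong alignment) {
  // Assumes alignment is a power of two: ~(alignment - 1) is the mask that
  // clears the low log2(alignment) bits, e.g. alignment = 8 gives ...FFF8.
  return size & ~(alignment - 1);
}
#endif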
  maxSize_ &= 0xFFFFFFFFFFFFFFF8;
  buf = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, maxSize_, NULL,
                                 &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed");
  buffers_.push_back(buf);
}

void OCLCreateBuffer::run(void) {
  CPerfCounter timer;
  cl_ulong pattern = PATTERN_20_64BIT;
  timer.Reset();
  timer.Start();
  error_ = /*_wrapper->*/ clEnqueueFillBuffer(
      cmdQueues_[_deviceId], buffers_[0], &pattern, sizeof(pattern), 0,
      maxSize_, 0, NULL, NULL);
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueFillBuffer() failed");
  _wrapper->clFinish(cmdQueues_[_deviceId]);

  size_t maxSteps = maxSize_;
#ifdef __linux__
  long pages = sysconf(_SC_PHYS_PAGES);
  long page_size = sysconf(_SC_PAGE_SIZE);
  if (maxSteps > (size_t)(pages * page_size / 2)) {
    maxSteps = (size_t)pages * page_size / 2;
  }
#endif
  void *resultBuf = NULL;
  // Reduce the buffer for the step transfers ahead of the allocation,
  // since huge buffers may cause paging and very low performance
  maxSteps /= 16;
  while ((resultBuf = malloc(maxSteps)) == NULL) {
    maxSteps /= 2;
  }

  checkResult(maxSteps, resultBuf, PATTERN_20_64BIT);
  memset(resultBuf, PATTERN_2A_08BIT, maxSteps);
  writeBuffer(maxSteps, resultBuf);
  memset(resultBuf, 0x00, maxSteps);
  checkResult(maxSteps, resultBuf, PATTERN_2A_64BIT);
  free(resultBuf);

  timer.Stop();
  double sec = timer.GetElapsedTime();
  _perfInfo = (float)sec * 1000.f;
  std::stringstream str;
  str << "Max single alloc (size of ";
  str << maxSize_;
  str << " bytes) ";
  testDescString = str.str();
  str << "Max single read/write (size of ";
  str << maxSize_;
  str << " bytes) create time (ms):";
  testDescString = str.str();
}

void OCLCreateBuffer::checkResult(size_t maxSteps, void *resultBuf,
                                  cl_ulong pattern) {
  size_t startPoint = 0;
  while (startPoint < maxSize_) {
    cl_event ee;
    size_t readSize = maxSteps;
    if ((startPoint + maxSteps) > maxSize_) {
      readSize = maxSize_ - startPoint;
    }
    error_ = /*wrapper->*/ clEnqueueReadBuffer(
        cmdQueues_[_deviceId], buffers_[0], CL_FALSE, startPoint, readSize,
        resultBuf, 0, NULL, &ee);
    CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed");
    _wrapper->clFinish(cmdQueues_[_deviceId]);
    size_t err_cnt = 0;
    size_t chk_cnt = readSize / sizeof(cl_ulong);
    cl_ulong *cc = reinterpret_cast<cl_ulong *>(resultBuf);
    for (size_t i = 0; i < chk_cnt; i++) {
      if (cc[i] != pattern) {
        err_cnt++;
      }
    }
    if (err_cnt != 0) {
      error_ = -1;
      CHECK_RESULT((error_ != CL_SUCCESS), "checkResult() failed");
      break;
    }
    startPoint += maxSteps;
  }
}

void OCLCreateBuffer::writeBuffer(size_t maxSteps, void *dataBuf) {
  size_t startPoint = 0;
  while (startPoint < maxSize_) {
    cl_event ee;
    size_t writeSize = maxSteps;
    if ((startPoint + maxSteps) > maxSize_) {
      writeSize = maxSize_ - startPoint;
    }
    error_ = /*wrapper->*/ clEnqueueWriteBuffer(
        cmdQueues_[_deviceId], buffers_[0], CL_FALSE, startPoint, writeSize,
        dataBuf, 0, NULL, &ee);
    CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed");
    _wrapper->clFinish(cmdQueues_[_deviceId]);
    startPoint += maxSteps;
  }
}

unsigned int OCLCreateBuffer::close(void) { return OCLTestImp::close(); }
clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLCreateBuffer.h000066400000000000000000000035301450307266000252140ustar00rootroot00000000000000/*
Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_CREATE_BUFFER_H_ #define _OCL_CREATE_BUFFER_H_ #include "OCLTestImp.h" #define PATTERN_20_08BIT 0x20 #define PATTERN_20_64BIT 0x2020202020202020 #define PATTERN_2A_08BIT 0x2a #define PATTERN_2A_64BIT 0x2a2a2a2a2a2a2a2a class OCLCreateBuffer : public OCLTestImp { public: OCLCreateBuffer(); virtual ~OCLCreateBuffer(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual void writeBuffer(size_t tmpMaxSize, void* dataBuf); virtual void checkResult(size_t tmpMaxSize, void* resultBuf, cl_ulong pattern); virtual unsigned int close(void); private: bool failed_; unsigned int testID_; cl_ulong maxSize_; }; #endif // _OCL_CREATE_BUFFER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLCreateContext.cpp000066400000000000000000000072561450307266000257730ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/

#include "OCLCreateContext.h"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "CL/cl.h"

OCLCreateContext::OCLCreateContext() { _numSubTests = 1; }

OCLCreateContext::~OCLCreateContext() {}

void OCLCreateContext::open(unsigned int test, char *units, double &conversion,
                            unsigned int deviceId) {
  _crcword = 0;
  conversion = 1.0f;
  _deviceId = deviceId;
}

static void CL_CALLBACK notify_callback(const char *errinfo,
                                        const void *private_info, size_t cb,
                                        void *user_data) {}

void OCLCreateContext::run(void) {
  cl_uint numPlatforms;
  cl_platform_id platform = NULL;
  cl_uint num_devices = 0;
  cl_device_id *devices = NULL;
  cl_device_id device = NULL;
  int error = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms);
  CHECK_RESULT(error != CL_SUCCESS, "clGetPlatformIDs failed");
  if (0 < numPlatforms) {
    cl_platform_id *platforms = new cl_platform_id[numPlatforms];
    error = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL);
    CHECK_RESULT(error != CL_SUCCESS, "clGetPlatformIDs failed");
    for (unsigned i = 0; i < numPlatforms; ++i) {
      char pbuf[100];
      error = _wrapper->clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,
                                          sizeof(pbuf), pbuf, NULL);
      if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) {
        platform = platforms[i];
        break;
      }
    }
    delete[] platforms;
  }

  /*
   * If we could find our platform, use it. If not, die as we need the AMD
   * platform for these extensions.
   */
  CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed");

  /* Get the number of requested devices */
  error = _wrapper->clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL,
                                   &num_devices);
  CHECK_RESULT(error != CL_SUCCESS, "clGetDeviceIDs failed");

  devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id));
  CHECK_RESULT(devices == 0, "no devices");

  /* Get the requested device */
  error = _wrapper->clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, num_devices,
                                   devices, NULL);
  CHECK_RESULT(error != CL_SUCCESS, "clGetDeviceIDs failed");
  device = devices[0];

  cl_context gContext = _wrapper->clCreateContext(NULL, 1, &device,
                                                  notify_callback, NULL,
                                                  &error);
  CHECK_RESULT(gContext == 0, "clCreateContext failed");

  error = _wrapper->clReleaseContext(gContext);
  CHECK_RESULT(error != CL_SUCCESS, "clReleaseContext failed");
  // Release the device list allocated above.
  free(devices);
}

unsigned int OCLCreateContext::close(void) { return _crcword; }
clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLCreateContext.h000066400000000000000000000027571450307266000254360ustar00rootroot00000000000000/*
Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/ #ifndef _OCL_CreateContext_H_ #define _OCL_CreateContext_H_ #include "OCLTestImp.h" class OCLCreateContext : public OCLTestImp { public: OCLCreateContext(); virtual ~OCLCreateContext(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); }; #endif // _OCL_CreateContext_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLCreateImage.cpp000066400000000000000000000463711450307266000253720ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLCreateImage.h" #include #include #include #include #ifdef __linux__ #include #include #endif #include "CL/cl.h" const static size_t ImageSize = 4; const static size_t MaxSubTests = 5; const static char *strKernel = "const sampler_t g_Sampler = CLK_FILTER_LINEAR | \n" " CLK_ADDRESS_CLAMP_TO_EDGE | \n" " CLK_NORMALIZED_COORDS_FALSE; \n" " \n" "__kernel void linear3D(__read_only image3d_t img3D, __global float4* " "f4Tata) \n" "{ \n" " float4 f4Index = { 2.25f, 1.75f, 0.5f, 0.0f }; \n" " // copy interpolated data in result buffer \n" " f4Tata[0] = read_imagef(img3D, g_Sampler, f4Index); \n" "} \n" " \n" "__kernel void linear2D(__read_only image2d_t img2D, __global float4* " "f4Tata) \n" "{ \n" " float2 f2Index = { 2.25f, 1.75f }; \n" " // copy interpolated data in result buffer \n" " f4Tata[0] = read_imagef(img2D, g_Sampler, f2Index); \n" "} \n" " \n" "__kernel void linear1DArray(__read_only image1d_array_t img1DA, __global " "float4* f4Tata) \n" "{ \n" " float2 f2Index = { 2.25f, 0 }; \n" " // copy interpolated data in result buffer \n" " f4Tata[0] = read_imagef(img1DA, g_Sampler, f2Index); \n" "} \n" " \n" "__kernel void linear2DArray(__read_only image2d_array_t img2DA, __global " "float4* f4Tata) \n" "{ \n" " float4 f4Index = { 2.25f, 1.75f, 0.0f, 0.0f }; \n" " // copy interpolated data in result buffer \n" " f4Tata[0] = read_imagef(img2DA, g_Sampler, f4Index); \n" "} \n" " \n" "__kernel void point1DBuffer(__read_only image1d_buffer_t img1DB, __global " "float4* f4Tata) \n" "{ \n" " int index = 2; \n" " // copy interpolated data in result buffer \n" " f4Tata[0] = read_imagef(img1DB, index); \n" "} \n" " \n"; OCLCreateImage::OCLCreateImage() { _numSubTests = MaxSubTests; done_ = false; ImageSizeX = ImageSize; ImageSizeY = ImageSize; ImageSizeZ = ImageSize; } OCLCreateImage::~OCLCreateImage() {} void OCLCreateImage::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { 
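  // Note on the expected values verified later in run() (ref = {1.75f,
  // 1.25f}): with CLK_NORMALIZED_COORDS_FALSE and CLK_FILTER_LINEAR, the
  // sampler treats texel centers as sitting at integer + 0.5, so a fetch at
  // coordinate u blends texels floor(u - 0.5) and floor(u - 0.5) + 1 with
  // weight frac(u - 0.5). The kernels sample at (2.25, 1.75) and open()
  // below writes data[x].x = x and data[y].y = y, hence:
  //   u = 2.25 -> 0.25 * 1 + 0.75 * 2 = 1.75   (ref[0])
  //   v = 1.75 -> 0.75 * 1 + 0.25 * 2 = 1.25   (ref[1])
  // A minimal illustrative sketch of that 1D blend (not used by the test;
  // the helper name is ours, and it needs <math.h> for floorf), kept under
  // #if 0 so the build is unchanged:
#if 0
  static float linearFetch1D(const float *texels, float u) {
    int i0 = (int)floorf(u - 0.5f);    // lower neighbor index
    float w = (u - 0.5f) - (float)i0;  // blend weight toward texel i0 + 1
    return (1.0f - w) * texels[i0] + w * texels[i0 + 1];
  }
#endif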
OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); testID_ = test; cl_bool imageSupport; size_t size; for (size_t i = 0; i < deviceCount_; ++i) { _wrapper->clGetDeviceInfo(devices_[i], CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport), &imageSupport, &size); if (!imageSupport) { testDescString = "Image not supported, skipping this test! "; done_ = true; return; } } cl_ulong max2DWidth; cl_ulong max2DHeight; cl_ulong max3DWidth; cl_ulong max3DHeight; cl_ulong max3DDepth; _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(cl_ulong), &maxSize_, &size); _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_IMAGE2D_MAX_WIDTH, sizeof(cl_ulong), &max2DWidth, &size); _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_IMAGE2D_MAX_HEIGHT, sizeof(cl_ulong), &max2DHeight, &size); _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_IMAGE3D_MAX_WIDTH, sizeof(cl_ulong), &max3DWidth, &size); _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_IMAGE3D_MAX_HEIGHT, sizeof(cl_ulong), &max3DHeight, &size); _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_IMAGE3D_MAX_DEPTH, sizeof(cl_ulong), &max3DDepth, &size); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); const char *kernels[] = {"linear3D", "linear2D", "linear2DArray", "linear1DArray", "point1DBuffer"}; unsigned int dimensions[] = {3, 2, 3, 2, 1}; kernel_ = _wrapper->clCreateKernel(program_, kernels[test], &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem memory; cl_mem buf = NULL; cl_image_desc desc; size_t offset[3] = {0, 0, 0}; cl_image_format imageFormat = {CL_RGBA, CL_FLOAT}; desc.image_type = CL_MEM_OBJECT_IMAGE3D; desc.image_array_size = 0; desc.image_row_pitch = 0; desc.image_slice_pitch = 0; desc.num_mip_levels = 0; desc.num_samples = 0; desc.buffer = (cl_mem)NULL; if (test == 0) { desc.image_type = CL_MEM_OBJECT_IMAGE3D; if (is64BitApp()) { ImageSizeX = max3DWidth; ImageSizeY = maxSize_ / (ImageSizeX * 16); if (ImageSizeY > (max3DHeight)) { ImageSizeY = max3DHeight; } ImageSizeZ = maxSize_ / (ImageSizeX * ImageSizeY * 16); #if EMU_ENV ImageSizeX = ImageSizeY = ImageSizeZ = 4; #endif // EMU_ENV } else { ImageSizeX = 4; ImageSizeY = 4; ImageSizeZ = 4; } desc.image_width = ImageSizeX; desc.image_height = ImageSizeY; desc.image_depth = ImageSizeZ; } if (test == 1) { desc.image_type = CL_MEM_OBJECT_IMAGE2D; if (is64BitApp()) { ImageSizeX = max2DWidth - 0x10; ImageSizeY = maxSize_ / (ImageSizeX * 16 * 2); if (ImageSizeY >= max2DHeight) { ImageSizeY = max2DHeight - 0x1000; } #ifdef __linux__ // On linux, if the size of total system memory is less than 4GB, // then, we can allocate much smaller image. 
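// (Likely because reading/writing an image this large also requires
// comparably sized host-side staging allocations: the image above is sized
// near maxAlloc / 2 bytes -- width * height * 16, with 16 bytes per
// CL_RGBA/CL_FLOAT pixel, i.e. 4 channels * 4 bytes -- so on a <= 4GB
// system the host copy cannot be backed by physical memory and the height
// is halved below. This is an assumption recorded alongside the TODO that
// follows, not a verified root cause.)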
// TODO, need to find the root cause struct sysinfo myinfo; unsigned long total_bytes; sysinfo(&myinfo); total_bytes = myinfo.mem_unit * myinfo.totalram; if ((total_bytes / (1024 * 1024)) <= 4096) { ImageSizeY /= 2; } #endif #if EMU_ENV ImageSizeX = ImageSizeY = 4; #endif // EMU_ENV } else { ImageSizeX = 4; ImageSizeY = 4; } ImageSizeZ = 0; desc.image_width = ImageSizeX; desc.image_height = ImageSizeY; desc.image_depth = 0; } else if (test == 2) { desc.image_type = CL_MEM_OBJECT_IMAGE2D_ARRAY; ImageSizeX = ImageSize; ImageSizeY = ImageSize; ImageSizeZ = ImageSize; desc.image_width = ImageSizeX; desc.image_height = ImageSizeY; desc.image_depth = 0; desc.image_array_size = ImageSize; } else if (test == 3) { desc.image_type = CL_MEM_OBJECT_IMAGE1D_ARRAY; ImageSizeX = ImageSize; ImageSizeY = ImageSize; ImageSizeZ = 0; desc.image_width = ImageSize; desc.image_height = ImageSize; desc.image_depth = 0; desc.image_array_size = ImageSize; } else if (test == 4) { ImageSizeX = ImageSize; desc.image_type = CL_MEM_OBJECT_IMAGE1D_BUFFER; buf = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, ImageSizeX * 4 * sizeof(cl_float), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); ImageSizeY = 0; ImageSizeZ = 0; desc.image_width = ImageSizeX; desc.image_height = 0; desc.image_depth = 0; desc.buffer = buf; } memory = _wrapper->clCreateImage(context_, CL_MEM_READ_ONLY, &imageFormat, &desc, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateImage() failed"); float fillColor[4] = {1.f, 1.f, 1.f, 1.f}; if (dimensions[test] == 1) { float data[4][ImageSize]; size_t region[3] = {ImageSize, 1, 1}; error_ = _wrapper->clEnqueueFillImage(cmdQueues_[_deviceId], memory, fillColor, offset, region, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueFillImage() failed"); error_ = _wrapper->clEnqueueReadImage(cmdQueues_[_deviceId], memory, true, offset, region, 0, 0, data, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadImage() failed"); for (size_t x = 0; x < ImageSize; ++x) { if (0 != memcmp(&data[x], fillColor, sizeof(fillColor))) { CHECK_RESULT(true, "Fill image validation failed"); } data[x][0] = (float)x; data[x][1] = data[x][2] = data[x][3] = 1.0f; } error_ = _wrapper->clEnqueueWriteImage(cmdQueues_[_deviceId], memory, true, offset, region, 0, 0, data, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteImage() failed"); } else if (dimensions[test] == 2) { size_t region[3] = {ImageSizeX, ImageSizeY, 1}; error_ = _wrapper->clEnqueueFillImage(cmdQueues_[_deviceId], memory, fillColor, offset, region, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueFillImage() failed"); float *data; size_t ActualImageSizeY = ImageSizeY; size_t maxImageSize = maxSize_; #ifdef __linux__ long pages = sysconf(_SC_PHYS_PAGES); long page_size = sysconf(_SC_PAGE_SIZE); if (maxImageSize > ((size_t)pages * page_size)) { maxImageSize = ((size_t)pages * page_size); } #endif while ((((ImageSizeX * ActualImageSizeY * sizeof(float) * 4) / (1024 * 1024)) >= (size_t)4 * 1024) || ((ImageSizeX * ActualImageSizeY * sizeof(float) * 4) >= (maxImageSize / 2))) { if (ActualImageSizeY == 1) { break; } ActualImageSizeY /= 2; } while ((data = (float *)malloc(ImageSizeX * ActualImageSizeY * sizeof(float) * 4)) == NULL) { if (ActualImageSizeY == 1) { break; } ActualImageSizeY /= 2; } if (data == NULL) { CHECK_RESULT(true, "malloc() failed"); } size_t remainSizeY = ImageSizeY; while (remainSizeY > 0) { ActualImageSizeY = (remainSizeY > ActualImageSizeY) ? 
ActualImageSizeY : remainSizeY; size_t tmpRange[3] = {ImageSizeX, ActualImageSizeY, 1}; error_ = _wrapper->clEnqueueReadImage(cmdQueues_[_deviceId], memory, true, offset, tmpRange, 0, 0, data, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadImage() failed"); for (size_t y = 0; y < ActualImageSizeY; ++y) { for (size_t x = 0; x < ImageSizeX; ++x) { size_t offsetData = (y * ImageSizeX + x) * 4; if (0 != memcmp(&data[offsetData], fillColor, sizeof(fillColor))) { CHECK_RESULT(true, "Fill image validation failed"); } data[offsetData + 0] = (float)x; data[offsetData + 1] = (float)y; data[offsetData + 2] = data[offsetData + 3] = 1.0f; } } error_ = _wrapper->clEnqueueWriteImage(cmdQueues_[_deviceId], memory, true, offset, tmpRange, 0, 0, data, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteImage() failed"); remainSizeY -= ActualImageSizeY; offset[1] += ActualImageSizeY; } free(data); } else if (dimensions[test] == 3) { float *data; float index = 0.f; size_t region[3] = {ImageSizeX, ImageSizeY, ImageSizeZ}; error_ = _wrapper->clEnqueueFillImage(cmdQueues_[_deviceId], memory, fillColor, offset, region, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueFillImage() failed"); size_t ActualImageSizeZ = ImageSizeZ; size_t maxImageSize = maxSize_; #ifdef __linux__ long pages = sysconf(_SC_PHYS_PAGES); long page_size = sysconf(_SC_PAGE_SIZE); if (maxImageSize > ((size_t)pages * page_size)) { maxImageSize = ((size_t)pages * page_size); } #endif while ((((ImageSizeX * ImageSizeY * ActualImageSizeZ * sizeof(float) * 4) / (1024 * 1024)) >= (size_t)4 * 1024) || ((ImageSizeX * ImageSizeY * ActualImageSizeZ * sizeof(float) * 4) >= (maxImageSize / 2))) { if (ActualImageSizeZ == 1) { break; } ActualImageSizeZ /= 2; } while ((data = (float *)malloc(ImageSizeX * ImageSizeY * ActualImageSizeZ * sizeof(float) * 4)) == NULL) { if (ActualImageSizeZ == 1) { break; } ActualImageSizeZ -= 1; } if (data == NULL) { CHECK_RESULT(true, "malloc() failed"); } size_t remainSizeZ = ImageSizeZ; while (remainSizeZ > 0) { ActualImageSizeZ = (remainSizeZ > ActualImageSizeZ) ? ActualImageSizeZ : remainSizeZ; size_t tmpRange[3] = {ImageSizeX, ImageSizeY, ActualImageSizeZ}; error_ = _wrapper->clEnqueueReadImage(cmdQueues_[_deviceId], memory, true, offset, tmpRange, 0, 0, data, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadImage() failed"); for (size_t z = 0; z < ActualImageSizeZ; ++z) { for (size_t y = 0; y < ImageSizeY; ++y) { for (size_t x = 0; x < ImageSizeX; ++x) { size_t offset = (((z * ImageSizeY) + y) * ImageSizeX + x) * 4; if (0 != memcmp(&data[offset], fillColor, sizeof(fillColor))) { CHECK_RESULT(true, "Fill image validation failed"); } data[offset + 0] = (float)x; data[offset + 1] = (float)y; data[offset + 2] = (float)z; data[offset + 3] = 1.0f; } } } error_ = _wrapper->clEnqueueWriteImage(cmdQueues_[_deviceId], memory, true, offset, tmpRange, 0, 0, data, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteImage() failed"); remainSizeZ -= ActualImageSizeZ; offset[2] += ActualImageSizeZ; } free(data); } buffers_.push_back(memory); memory = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, 4 * sizeof(cl_float), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(memory); if (buf != NULL) { buffers_.push_back(buf); } size_t imageSizebyte = (ImageSizeY != 0) ? ImageSizeY * ImageSizeX : ImageSizeX; imageSizebyte *= (ImageSizeZ != 0) ? 
ImageSizeZ : 1; imageSizebyte *= 16; // 16 bytes per pixel, imageFormat = {CL_RGBA,CL_FLOAT} char strImgSize[200]; if (imageSizebyte >= 1024 * 1024) { sprintf(strImgSize, "%5ld MB", (long)(imageSizebyte / (1024 * 1024))); } else { sprintf(strImgSize, "%6ld Bytes", (long)imageSizebyte); } std::stringstream str; str << " ("; str << ImageSizeX; str << ", "; str << ImageSizeY; str << ", "; str << ImageSizeZ; str << ") "; str << strImgSize; testDescString = str.str(); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLCreateImage::run(void) { if (done_) { return; } cl_float values[4] = {0.f, 0.f, 0.f, 0.f}; cl_float ref[2] = {1.75f, 1.25f}; cl_mem image = buffers()[0]; cl_mem buffer = buffers()[1]; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &image); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t gws[1] = {0x1}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffer, true, 0, 4 * sizeof(cl_float), values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); if (testID_ == 4) { ref[0] = 2.0f; } for (cl_uint i = 0; i < static_cast((testID_ >= 3) ? 1 : 2); ++i) { if (values[i] != ref[i]) { printf("%.2f != %.2f [ref]", values[i], ref[i]); CHECK_RESULT(true, " - Incorrect result for linear filtering!\n"); } } } unsigned int OCLCreateImage::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLCreateImage.h000066400000000000000000000032341450307266000250260ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_CREATE_IMAGE_H_ #define _OCL_CREATE_IMAGE_H_ #include "OCLTestImp.h" class OCLCreateImage : public OCLTestImp { public: OCLCreateImage(); virtual ~OCLCreateImage(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool done_; unsigned int testID_; size_t maxSize_; size_t ImageSizeX; size_t ImageSizeY; size_t ImageSizeZ; bool is64BitApp() { return sizeof(int*) == 8; } }; #endif // _OCL_CREATE_IMAGE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLDeviceAtomic.cpp000066400000000000000000000217651450307266000255600ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLDeviceAtomic.h" #include #include #include #include "CL/cl.h" #if EMU_ENV static const cl_uint TotalElements = 8 * 32 * 256; #else static const cl_uint TotalElements = 256 * 1024 * 1024; #endif static const cl_uint ArraySize = 256; static cl_uint hostArray[ArraySize]; #define KERNEL_CODE(...) 
#__VA_ARGS__ const static char* strKernel[] = { KERNEL_CODE( \n __kernel void atomic_test1(__global uint* res) { __global atomic_uint* inc = (__global atomic_uint*)res; atomic_fetch_add_explicit(inc, 1, memory_order_acq_rel, memory_scope_device); } \n __kernel void atomic_test2(__global uint* res) { __global atomic_uint* inc = (__global atomic_uint*)res; atomic_fetch_add_explicit(inc, 1, memory_order_acq_rel, memory_scope_device); } \n), #if EMU_ENV KERNEL_CODE( \n __kernel void atomic_test1(__global uint* res) { for (uint i = 0; i < 8 * 32; ++i) { for (uint j = 0; j < 256; ++j) { __global atomic_uint* inc = (__global atomic_uint*)&res[j]; uint val = atomic_load_explicit(inc, memory_order_acquire, memory_scope_device); if (0 != val) { res[1] = get_global_id(0); res[2] = i; return; } } } } \n __kernel void atomic_test2(__global uint* res) { if (get_global_id(0) == 8 * 20 * 100) { __global atomic_uint* inc = (__global atomic_uint*)res; // atomic_fetch_add_explicit(inc, 1, memory_order_acq_rel, // memory_scope_device); atomic_store_explicit(inc, get_global_id(0), memory_order_release, memory_scope_device); } } \n) #else KERNEL_CODE( \n __kernel void atomic_test1(__global uint* res) { for (uint i = 0; i < 256 * 1024; ++i) { for (uint j = 0; j < 256; ++j) { __global atomic_uint* inc = (__global atomic_uint*)&res[j]; uint val = atomic_load_explicit(inc, memory_order_acquire, memory_scope_device); if (0 != val) { res[1] = get_global_id(0); res[2] = i; return; } } } } \n __kernel void atomic_test2(__global uint* res) { if (get_global_id(0) == 64 * 1000 * 1000) { __global atomic_uint* inc = (__global atomic_uint*)res; // atomic_fetch_add_explicit(inc, 1, memory_order_acq_rel, // memory_scope_device); atomic_store_explicit(inc, get_global_id(0), memory_order_release, memory_scope_device); } } \n) #endif }; OCLDeviceAtomic::OCLDeviceAtomic() : hostQueue_(NULL), failed_(false), kernel2_(NULL) { _numSubTests = 2; } OCLDeviceAtomic::~OCLDeviceAtomic() {} void OCLDeviceAtomic::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); testID_ = test; size_t param_size = 0; char* strVersion = 0; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = new char[param_size]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[7] < '2') { failed_ = true; return; } delete strVersion; char dbuffer[1024] = {0}; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel[test], NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "atomic_test1", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); kernel2_ = _wrapper->clCreateKernel(program_, "atomic_test2", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; memset(hostArray, 0, 
sizeof(hostArray)); buffer = _wrapper->clCreateBuffer(context_, CL_MEM_COPY_HOST_PTR, sizeof(hostArray), &hostArray, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); #if defined(CL_VERSION_2_0) const cl_queue_properties cprops[] = {CL_QUEUE_PROPERTIES, static_cast(0), 0}; hostQueue_ = _wrapper->clCreateCommandQueueWithProperties( context_, devices_[deviceId], cprops, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueueWithProperties() failed"); #endif } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLDeviceAtomic::run(void) { if (failed_) return; cl_mem buffer = buffers()[0]; size_t gws[1] = {TotalElements}; size_t gws2[1] = {1}; size_t gws3[1] = {TotalElements}; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); if (testID_ == 0) { error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } else { error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws2, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } error_ = _wrapper->clSetKernelArg(kernel2_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); if (testID_ == 0) { error_ = _wrapper->clEnqueueNDRangeKernel(hostQueue_, kernel2_, 1, NULL, gws, NULL, 0, NULL, NULL); } else { error_ = _wrapper->clEnqueueNDRangeKernel(hostQueue_, kernel2_, 1, NULL, gws3, NULL, 0, NULL, NULL); } CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFlush(cmdQueues_[_deviceId]); _wrapper->clFlush(hostQueue_); _wrapper->clFinish(cmdQueues_[_deviceId]); _wrapper->clFinish(hostQueue_); error_ = _wrapper->clEnqueueReadBuffer(hostQueue_, buffer, CL_TRUE, 0, sizeof(hostArray), hostArray, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); if (testID_ == 0) { if (hostArray[0] != 2 * TotalElements) { printf("Counter: %d, expected: %d\n", hostArray[0], 2 * TotalElements); CHECK_RESULT(true, "Incorrect result for device atomic inc!\n"); } } else { printf("Value: %d, thread: %d, iter: %d\n", hostArray[0], hostArray[1], hostArray[2]); if (hostArray[0] == 0) { CHECK_RESULT(true, "Incorrect result for device atomic inc!\n"); } } } unsigned int OCLDeviceAtomic::close(void) { if (NULL != hostQueue_) { _wrapper->clReleaseCommandQueue(hostQueue_); } if (NULL != kernel2_) { _wrapper->clReleaseKernel(kernel2_); } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLDeviceAtomic.h000066400000000000000000000031241450307266000252120ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_DEVICE_ATOMIC_H_ #define _OCL_DEVICE_ATOMIC_H_ #include "OCLTestImp.h" class OCLDeviceAtomic : public OCLTestImp { public: OCLDeviceAtomic(); virtual ~OCLDeviceAtomic(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: cl_command_queue hostQueue_; bool failed_; cl_kernel kernel2_; unsigned int testID_; }; #endif // _OCL_DEVICE_ATOMIC_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLDeviceQueries.cpp000066400000000000000000000306731450307266000257570ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLDeviceQueries.h" #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" struct AMDDeviceInfo { const char* targetName_; //!< Target name const char* machineTarget_; //!< Machine target cl_uint simdPerCU_; //!< Number of SIMDs per CU cl_uint simdWidth_; //!< Number of workitems processed per SIMD cl_uint simdInstructionWidth_; //!< Number of instructions processed per SIMD cl_uint memChannelBankWidth_; //!< Memory channel bank width cl_uint localMemSizePerCU_; //!< Local memory size per CU cl_uint localMemBanks_; //!< Number of banks of local memory cl_uint gfxipMajor_; //!< GFXIP major number cl_uint gfxipMinor_; //!< GFXIP minor number }; static const cl_uint Ki = 1024; static const AMDDeviceInfo DeviceInfo[] = { /* CAL_TARGET_CAYMAN */ {"Cayman", "cayman", 1, 16, 4, 256, 32 * Ki, 32, 5, 0}, /* CAL_TARGET_TAHITI */ {"Tahiti", "tahiti", 4, 16, 1, 256, 64 * Ki, 32, 6, 0}, /* CAL_TARGET_PITCAIRN */ {"Pitcairn", "pitcairn", 4, 16, 1, 256, 64 * Ki, 32, 6, 0}, /* CAL_TARGET_CAPEVERDE */ {"Capeverde", "capeverde", 4, 16, 1, 256, 64 * Ki, 32, 6, 0}, /* CAL_TARGET_DEVASTATOR */ {"Devastator", "trinity", 1, 16, 4, 256, 32 * Ki, 32, 5, 0}, /* CAL_TARGET_SCRAPPER */ {"Scrapper", "trinity", 1, 16, 4, 256, 32 * Ki, 32, 5, 0}, /* CAL_TARGET_OLAND */ {"Oland", "oland", 4, 16, 1, 256, 64 * Ki, 32, 6, 0}, /* CAL_TARGET_BONAIRE */ {"Bonaire", "bonaire", 4, 16, 1, 256, 64 * Ki, 32, 7, 2}, /* CAL_TARGET_SPECTRE */ {"Spectre", "spectre", 4, 16, 1, 256, 64 * Ki, 32, 7, 1}, /* CAL_TARGET_SPOOKY */ {"Spooky", "spooky", 4, 16, 1, 256, 64 * Ki, 32, 7, 1}, /* CAL_TARGET_KALINDI */ {"Kalindi", "kalindi", 4, 16, 1, 256, 64 * Ki, 32, 7, 2}, /* CAL_TARGET_HAINAN */ {"Hainan", "hainan", 4, 16, 1, 256, 64 * Ki, 32, 6, 0}, /* CAL_TARGET_HAWAII */ {"Hawaii", "hawaii", 4, 16, 1, 256, 64 * Ki, 32, 7, 2}, /* CAL_TARGET_ICELAND */ {"Iceland", "iceland", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* CAL_TARGET_TONGA */ {"Tonga", "tonga", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* CAL_TARGET_MULLINS */ {"Mullins", "mullins", 4, 16, 1, 256, 64 * Ki, 32, 7, 2}, /* CAL_TARGET_FIJI */ {"Fiji", "fiji", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* CAL_TARGET_CARRIZO */ {"Carrizo", "carrizo", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* CAL_TARGET_CARRIZO */ {"Bristol Ridge", "carrizo", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* CAL_TARGET_Ellesmere */ {"Ellesmere", "ellesmere", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* CAL_TARGET_BAFFIN */ {"Baffin", "baffin", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* ROCM Kaveri */ {"gfx700", "gfx700", 4, 16, 1, 256, 64 * Ki, 32, 7, 1}, /* ROCM Hawaii */ {"gfx701", "gfx701", 4, 16, 1, 256, 64 * Ki, 32, 7, 2}, /* ROCM Kabini */ {"gfx703", "gfx703", 4, 16, 1, 256, 64 * Ki, 32, 7, 2}, /* ROCM Iceland */ {"gfx800", "gfx800", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* ROCM Carrizo */ {"gfx801", "gfx801", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* ROCM Tonga */ {"gfx802", "gfx802", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* ROCM Fiji */ {"gfx803", "gfx803", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* Vega10 */ {"gfx900", "gfx900", 4, 16, 1, 256, 64 * Ki, 32, 9, 0}, /* CAL_TARGET_STONEY */ {"Stoney", "stoney", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* CAL_TARGET_LEXA */ {"gfx804", "gfx804", 4, 16, 1, 256, 64 * Ki, 32, 8, 0}, /* Vega10_XNACK */ {"gfx901", "gfx901", 4, 16, 1, 256, 64 * Ki, 32, 9, 0}, /* Raven */ {"gfx902", "gfx902", 4, 16, 1, 256, 64 * Ki, 32, 9, 0}, /* Raven_XNACK */ {"gfx903", "gfx903", 4, 16, 1, 256, 64 * Ki, 32, 9, 0}, /* Vega12 */ {"gfx904", "gfx904", 4, 16, 1, 256, 64 * Ki, 32, 9, 0}, /* Vega12_XNACK */ {"gfx905", 
"gfx905", 4, 16, 1, 256, 64 * Ki, 32, 9, 0}, /* Vega20 */ {"gfx906", "gfx906", 4, 16, 1, 256, 64 * Ki, 32, 9, 0}, /* Vega20_XNACK */ {"gfx907", "gfx907", 4, 16, 1, 256, 64 * Ki, 32, 9, 0}, /* MI100 */ {"gfx908", "gfx908", 4, 16, 1, 256, 64 * Ki, 32, 9, 0}, /* MI200 */ {"gfx90a", "gfx90a", 4, 16, 1, 256, 64 * Ki, 32, 9, 0}, /* MI300 */ {"gfx940", "gfx940", 4, 16, 1, 256, 64 * Ki, 32, 9, 4}, /* Navi10 */ {"gfx1010", "gfx1010", 4, 32, 1, 256, 64 * Ki, 32, 10, 1}, /* Navi12 */ {"gfx1011", "gfx1011", 4, 32, 1, 256, 64 * Ki, 32, 10, 1}, /* Navi14 */ {"gfx1012", "gfx1012", 4, 32, 1, 256, 64 * Ki, 32, 10, 1}, /* Navi21 */ { "gfx1030", "gfx1030", 4, 32, 1, 256, 64 * Ki, 32, 10, 3 }, /* Navi22 */ { "gfx1031", "gfx1031", 4, 32, 1, 256, 64 * Ki, 32, 10, 3 }, /* Navi23 */ { "gfx1032", "gfx1032", 4, 32, 1, 256, 64 * Ki, 32, 10, 3 }, /* Van Gogh */ { "gfx1033", "gfx1033", 4, 32, 1, 256, 64 * Ki, 32, 10, 3 }, /* Navi24 */ { "gfx1034", "gfx1034", 4, 32, 1, 256, 64 * Ki, 32, 10, 3 }, /* Rembrandt */{ "gfx1035", "gfx1035", 4, 32, 1, 256, 64 * Ki, 32, 10, 3 }, /* Raphael */ { "gfx1036", "gfx1036", 4, 32, 1, 256, 64 * Ki, 32, 10, 3 }, /* Navi31*/ { "gfx1100", "gfx1100", 4, 32, 1, 256, 64 * Ki, 32, 11, 0 }, /* Navi32*/ { "gfx1101", "gfx1101", 4, 32, 1, 256, 64 * Ki, 32, 11, 0 }, /* Navi33*/ { "gfx1102", "gfx1102", 4, 32, 1, 256, 64 * Ki, 32, 11, 0 }, /* Phoenix */ { "gfx1103", "gfx1103", 4, 32, 1, 256, 64 * Ki, 32, 11, 0 }, }; const int DeviceInfoSize = sizeof(DeviceInfo) / sizeof(AMDDeviceInfo); OCLDeviceQueries::OCLDeviceQueries() { _numSubTests = 1; failed_ = false; } OCLDeviceQueries::~OCLDeviceQueries() {} void OCLDeviceQueries::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); char name[1024] = {0}; size_t size = 0; if (deviceId >= deviceCount_) { failed_ = true; return; } cl_uint value; cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_EXTENSIONS, 1024, name, &size); if (!strstr(name, "cl_amd_device_attribute_query")) { printf("AMD device attribute extension is required for this test!\n"); failed_ = true; return; } error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_NAME, sizeof(name), name, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_NAME failed"); std::string str = name; int id = 0; bool deviceFound = false; for (int i = 0; i < DeviceInfoSize; ++i) { if (0 == str.find(DeviceInfo[i].targetName_)) { deviceFound = true; id = i; break; } } CHECK_RESULT(deviceFound != true, "Device %s is not supported", name); error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD, sizeof(cl_uint), &value, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD failed"); CHECK_RESULT((value != DeviceInfo[id].simdPerCU_), "CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD failed"); error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_SIMD_WIDTH_AMD, sizeof(cl_uint), &value, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_SIMD_WIDTH_AMD failed"); CHECK_RESULT((value != DeviceInfo[id].simdWidth_), "CL_DEVICE_SIMD_WIDTH_AMD failed"); error_ = 
_wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD, sizeof(cl_uint), &value, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD failed"); CHECK_RESULT((value != DeviceInfo[id].simdInstructionWidth_), "CL_DEVICE_SIMD_INSTRUCTION_WIDTH_AMD failed"); error_ = _wrapper->clGetDeviceInfo( devices_[deviceId], CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD, sizeof(cl_uint), &value, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD failed"); CHECK_RESULT((value != DeviceInfo[id].memChannelBankWidth_), "CL_DEVICE_GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD failed"); error_ = _wrapper->clGetDeviceInfo( devices_[deviceId], CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD, sizeof(cl_uint), &value, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD failed"); CHECK_RESULT((value != DeviceInfo[id].localMemSizePerCU_), "CL_DEVICE_LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD failed"); error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_LOCAL_MEM_BANKS_AMD, sizeof(cl_uint), &value, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_LOCAL_MEM_BANKS_AMD failed"); CHECK_RESULT((value != DeviceInfo[id].localMemBanks_), "CL_DEVICE_LOCAL_MEM_BANKS_AMD failed"); error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_GFXIP_MAJOR_AMD, sizeof(cl_uint), &value, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_GFXIP_MAJOR_AMD failed"); CHECK_RESULT((value != DeviceInfo[id].gfxipMajor_), "CL_DEVICE_GFXIP_MAJOR_AMD failed"); error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_GFXIP_MINOR_AMD, sizeof(cl_uint), &value, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_GFXIP_MINOR_AMD failed"); error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD, sizeof(cl_uint), &value, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD failed"); CHECK_RESULT((value == 0), "CL_DEVICE_GLOBAL_MEM_CHANNEL_BANKS_AMD failed"); error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_WAVEFRONT_WIDTH_AMD, sizeof(cl_uint), &value, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_WAVEFRONT_WIDTH_AMD failed"); CHECK_RESULT((value == 0), "CL_DEVICE_WAVEFRONT_WIDTH_AMD failed"); error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD, sizeof(cl_uint), &value, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD failed"); CHECK_RESULT((value == 0), "CL_DEVICE_GLOBAL_MEM_CHANNELS_AMD failed"); } static void CL_CALLBACK notify_callback(cl_event event, cl_int event_command_exec_status, void* user_data) {} void OCLDeviceQueries::run(void) { if (failed_) { return; } } unsigned int OCLDeviceQueries::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLDeviceQueries.h000066400000000000000000000030151450307266000254120ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
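The OCLDeviceQueries test above cross-checks the AMD-specific clGetDeviceInfo tokens from the cl_amd_device_attribute_query extension against its DeviceInfo table. As a minimal standalone sketch (not part of the test sources; device selection and error handling are reduced to asserts, and only one token is queried), the same extension query looks like this:

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include "CL/cl.h"
#include "CL/cl_ext.h"

int main() {
  cl_platform_id platform;
  cl_device_id device;
  assert(clGetPlatformIDs(1, &platform, NULL) == CL_SUCCESS);
  assert(clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) ==
         CL_SUCCESS);
  // The AMD attribute tokens are only meaningful when the extension is listed.
  char ext[4096] = {0};
  clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, sizeof(ext), ext, NULL);
  if (!strstr(ext, "cl_amd_device_attribute_query")) {
    printf("cl_amd_device_attribute_query not supported\n");
    return 0;
  }
  cl_uint simdPerCU = 0;
  assert(clGetDeviceInfo(device, CL_DEVICE_SIMD_PER_COMPUTE_UNIT_AMD,
                         sizeof(simdPerCU), &simdPerCU, NULL) == CL_SUCCESS);
  printf("SIMDs per CU: %u\n", simdPerCU);
  return 0;
}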
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_DEVICE_QUERIES_H_ #define _OCL_DEVICE_QUERIES_H_ #include "OCLTestImp.h" class OCLDeviceQueries : public OCLTestImp { public: OCLDeviceQueries(); virtual ~OCLDeviceQueries(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; }; #endif // _OCL_DEVICE_QUERIES_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLDynamic.cpp000066400000000000000000000177151450307266000246100ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLDynamic.h" #include #include #include #include "CL/cl.h" #if EMU_ENV static const cl_uint TotalElements = 1; #else static const cl_uint TotalElements = 128; #endif // EMU_ENV static cl_uint hostArray[TotalElements]; #define KERNEL_CODE(...) 
#__VA_ARGS__

const static char* strKernel[] = {
    KERNEL_CODE( \n
        void block_fn(int tid, int mul, __global uint* res) {
          res[tid] = mul * 7 - 21;
        }
        __kernel void dynamic(__global uint* res) {
          int multiplier = 3;
          int tid = get_global_id(0);
          void (^kernelBlock)(void) = ^{ block_fn(tid, multiplier, res); };
          res[tid] = -1;
          queue_t def_q = get_default_queue();
          ndrange_t ndrange = ndrange_1D(1);
          int enq_res;
          do {
            enq_res = enqueue_kernel(def_q, CLK_ENQUEUE_FLAGS_NO_WAIT, ndrange,
                                     kernelBlock);
            if (enq_res != 0 /*CL_SUCCESS*/) {
              res[tid] = -2;
            }
          } while (enq_res != 0);
        } \n),
    KERNEL_CODE( \n
        void block_fn(int tid, int mul, __global uint* res) {
          res[tid] = mul * 7 - 21;
        }
        __kernel void dynamic(__global uint* res, queue_t def_q) {
          int multiplier = 3;
          int tid = get_global_id(0);
          void (^kernelBlock)(void) = ^{ block_fn(tid, multiplier, res); };
          res[tid] = -1;
          ndrange_t ndrange = ndrange_1D(1);
          // if (tid == 0) {
          int enq_res = enqueue_kernel(def_q, CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                                       ndrange, kernelBlock);
          if (enq_res != 0 /*CL_SUCCESS*/) {
            res[tid] = -2;
            return;
          }
          //}
        } \n)};

OCLDynamic::OCLDynamic() {
  _numSubTests = 2;
  deviceQueue_ = NULL;
  failed_ = false;
}

OCLDynamic::~OCLDynamic() {}

void OCLDynamic::open(unsigned int test, char* units, double& conversion,
                      unsigned int deviceId) {
  // FIXME: Re-enable CPU test once bug 10143 is fixed.
  if (type_ == CL_DEVICE_TYPE_CPU) {
    return;
  }
  OCLTestImp::open(test, units, conversion, deviceId);
  CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test");
  testID_ = test;
  size_t param_size = 0;
  char* strVersion = 0;
  error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0,
                                     0, &param_size);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");
  strVersion = new char[param_size];
  error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION,
                                     param_size, strVersion, 0);
  CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed");
  if (strVersion[7] < '2') {
    failed_ = true;
    return;
  }
  delete[] strVersion;
  char dbuffer[1024] = {0};
  program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel[test],
                                                 NULL, &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed");
  error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId],
                                    "-cl-std=CL2.0", NULL, NULL);
  if (error_ != CL_SUCCESS) {
    char programLog[1024];
    _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId],
                                    CL_PROGRAM_BUILD_LOG, 1024, programLog, 0);
    printf("\n%s\n", programLog);
    fflush(stdout);
  }
  CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed");
  kernel_ = _wrapper->clCreateKernel(program_, "dynamic", &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed");
  cl_mem buffer;
  memset(hostArray, 0xee, sizeof(hostArray));
  buffer = _wrapper->clCreateBuffer(
      context_, CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR,
      sizeof(hostArray), &hostArray, &error_);
  CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed");
  buffers_.push_back(buffer);
#if EMU_ENV
  cl_uint queueSize = 1;
#else
  cl_uint queueSize = (test == 0) ?
1 : 257 * 1024;
#endif  // EMU_ENV

#if defined(CL_VERSION_2_0)
  const cl_queue_properties cprops[] = {
      CL_QUEUE_PROPERTIES,
      static_cast<cl_queue_properties>(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
                                       CL_QUEUE_ON_DEVICE_DEFAULT |
                                       CL_QUEUE_ON_DEVICE),
      CL_QUEUE_SIZE, queueSize, 0};
  deviceQueue_ = _wrapper->clCreateCommandQueueWithProperties(
      context_, devices_[deviceId], cprops, &error_);
  CHECK_RESULT((error_ != CL_SUCCESS),
               "clCreateCommandQueueWithProperties() failed");
#endif
}

static void CL_CALLBACK notify_callback(const char* errinfo,
                                        const void* private_info, size_t cb,
                                        void* user_data) {}

void OCLDynamic::run(void) {
  // FIXME: Re-enable CPU test once bug 10143 is fixed.
  if (type_ == CL_DEVICE_TYPE_CPU) {
    return;
  }
  if (failed_) return;
  cl_mem buffer = buffers()[0];
  size_t gws[1] = {TotalElements};
  size_t lws[1] = {16};
  error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer);
  CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
  if (testID_ == 1) {
    error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_command_queue),
                                      &deviceQueue_);
    CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
  }
  size_t offset = 0;
  size_t region = TotalElements * sizeof(cl_uint);
  cl_uint* host = reinterpret_cast<cl_uint*>(_wrapper->clEnqueueMapBuffer(
      cmdQueues_[_deviceId], buffer, CL_TRUE, (CL_MAP_READ | CL_MAP_WRITE),
      offset, region, 0, NULL, NULL, &error_));
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueMapBuffer() failed");
  error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1,
#if EMU_ENV
                                            NULL, gws, NULL, 0, NULL, NULL);
#else
                                            NULL, gws, lws, 0, NULL, NULL);
#endif  // EMU_ENV
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed");
  _wrapper->clFinish(cmdQueues_[_deviceId]);
  for (unsigned int i = 0; i < TotalElements; ++i) {
    if (host[i] != 0) {
      printf("Bad value: a[%d] = %d\n", i, host[i]);
      CHECK_RESULT(true, "Incorrect result for dependency!\n");
    }
  }
  error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], buffer,
                                             host, 0, NULL, NULL);
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueUnmapBuffer() failed");
  _wrapper->clFinish(cmdQueues_[_deviceId]);
}

unsigned int OCLDynamic::close(void) {
  // FIXME: Re-enable CPU test once bug 10143 is fixed.
  if (type_ == CL_DEVICE_TYPE_CPU) {
    return 0;
  }
  if (NULL != deviceQueue_) {
    _wrapper->clReleaseCommandQueue(deviceQueue_);
  }
  return OCLTestImp::close();
}

clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLDynamic.h000066400000000000000000000030461450307266000242450ustar00rootroot00000000000000/*
Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
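The device-side default queue that OCLDynamic::open() creates above is what makes get_default_queue() and enqueue_kernel() work inside the kernels. A condensed sketch of just that setup (a hedged helper, not part of the test; makeDefaultDeviceQueue, ctx, and dev are stand-ins for the test's context_ and devices_[deviceId], and an OpenCL 2.0 device is assumed):

#if defined(CL_VERSION_2_0)
static cl_command_queue makeDefaultDeviceQueue(cl_context ctx,
                                               cl_device_id dev,
                                               cl_uint queueSize,
                                               cl_int* err) {
  const cl_queue_properties props[] = {
      CL_QUEUE_PROPERTIES,
      static_cast<cl_queue_properties>(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
                                       CL_QUEUE_ON_DEVICE |
                                       CL_QUEUE_ON_DEVICE_DEFAULT),
      CL_QUEUE_SIZE, queueSize, 0};
  // The DEFAULT flag is what lets kernels reach this queue through
  // get_default_queue() without it being passed as a kernel argument.
  return clCreateCommandQueueWithProperties(ctx, dev, props, err);
}
#endif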
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_DYNAMIC_H_ #define _OCL_DYNAMIC_H_ #include "OCLTestImp.h" class OCLDynamic : public OCLTestImp { public: OCLDynamic(); virtual ~OCLDynamic(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: cl_command_queue deviceQueue_; bool failed_; unsigned int testID_; }; #endif // _OCL_MEM_DEPENDENCY_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLDynamicBLines.cpp000066400000000000000000000305541450307266000257010ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLDynamicBLines.h" #include #include #include #include #include "CL/cl.h" const static cl_int nLines = 2048; const static cl_int blockDim = 64; #define MAX_TESSELLATION 64 #define KERNEL_CODE(...) 
#__VA_ARGS__ const static char* strKernel[] = { KERNEL_CODE( \n \x23 define MAX_TESSELLATION 64 \n struct BezierLine { float2 CP[3]; ulong vertexPos; int nVertices; int reserved; }; \n __kernel void computeBezierLinePositions(int lidx, __global struct BezierLine* bLines, int nTessPoints, __global char* buf) { int idx = get_global_id(0); if (idx < nTessPoints) { float u = (float)idx / (float)(nTessPoints-1); float omu = 1.0f - u; float B3u[3]; B3u[0] = omu * omu; B3u[1] = 2.0f * u * omu; B3u[2] = u * u; float2 position = {0, 0}; for (int i = 0; i < 3; i++) { position = position + B3u[i] * bLines[lidx].CP[i]; } ((__global float2*)(bLines[lidx].vertexPos))[idx] = position; } } \n __kernel void computeBezierLines(__global struct BezierLine* bLines, int nLines, __global char* buf) { int lidx = get_global_id(0); if (lidx < nLines) { float curvature = length(bLines[lidx].CP[1] - 0.5f * (bLines[lidx].CP[0] + bLines[lidx].CP[2])) / length(bLines[lidx].CP[2] - bLines[lidx].CP[0]); int nTessPoints = min(max((int)(curvature * 16.0f), 4), MAX_TESSELLATION); if (bLines[lidx].vertexPos == 0) { bLines[lidx].nVertices = nTessPoints; uint value = atomic_add((__global volatile uint*)buf, nTessPoints * sizeof(float2)); bLines[lidx].vertexPos = (ulong)(&buf[value]); } queue_t def_q = get_default_queue(); ndrange_t ndrange = ndrange_1D(bLines[lidx].nVertices, 64); int enq_res = enqueue_kernel(def_q, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange, ^{ computeBezierLinePositions(lidx, bLines, bLines[lidx].nVertices, buf); }); } } \n __kernel void computeBezierLines2(__global struct BezierLine* bLines, int nLines, __global char* buf) { int lidx = get_global_id(0); if (lidx < nLines) { float curvature = length(bLines[lidx].CP[1] - 0.5f * (bLines[lidx].CP[0] + bLines[lidx].CP[2])) / length(bLines[lidx].CP[2] - bLines[lidx].CP[0]); int nTessPoints = min(max((int)(curvature * 16.0f), 4), MAX_TESSELLATION); if (bLines[lidx].vertexPos == 0) { bLines[lidx].nVertices = nTessPoints; uint value = atomic_add((__global volatile uint*)buf, nTessPoints * sizeof(float2)); bLines[lidx].vertexPos = (ulong)(&buf[value]); } } } \n ) }; OCLDynamicBLines::OCLDynamicBLines() { _numSubTests = 1; deviceQueue_ = NULL; failed_ = false; bLines_ = NULL; hostArray_ = NULL; kernel2_ = NULL; kernel3_ = NULL; } OCLDynamicBLines::~OCLDynamicBLines() {} void OCLDynamicBLines::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { if (type_ == CL_DEVICE_TYPE_CPU) { return; } OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); testID_ = test; size_t param_size = 0; char* strVersion = 0; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = new char[param_size]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[7] < '2') { failed_ = true; return; } delete strVersion; char dbuffer[1024] = {0}; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel[test], NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", 
programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "computeBezierLines", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); kernel2_ = _wrapper->clCreateKernel(program_, "computeBezierLines2", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); kernel3_ = _wrapper->clCreateKernel(program_, "computeBezierLinePositions", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; bLines_ = new BezierLine[nLines]; cl_float2 last = {0, 0}; for (int i = 0; i < nLines; i++) { bLines_[i].CP[0] = last; for (int j = 1; j < 3; j++) { bLines_[i].CP[j].s[0] = (float)rand() / (float)RAND_MAX; bLines_[i].CP[j].s[1] = (float)rand() / (float)RAND_MAX; } last = bLines_[i].CP[2]; bLines_[i].vertexPos = 0; bLines_[i].nVertices = 0; bLines_[i].reserved = 0; } buffer = _wrapper->clCreateBuffer(context_, CL_MEM_USE_HOST_PTR, sizeof(BezierLine) * nLines, bLines_, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); hostArray_ = new cl_float2[nLines * (MAX_TESSELLATION + 1)]; ((unsigned int*)hostArray_)[0] = sizeof(cl_float2); buffer = _wrapper->clCreateBuffer( context_, CL_MEM_USE_HOST_PTR, sizeof(cl_float2) * nLines * MAX_TESSELLATION, hostArray_, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); cl_uint queueSize = 256 * 1024; #if defined(CL_VERSION_2_0) const cl_queue_properties cprops[] = { CL_QUEUE_PROPERTIES, static_cast(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE_DEFAULT | CL_QUEUE_ON_DEVICE), CL_QUEUE_SIZE, queueSize, 0}; deviceQueue_ = _wrapper->clCreateCommandQueueWithProperties( context_, devices_[deviceId], cprops, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueueWithProperties() failed"); #endif } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLDynamicBLines::run(void) { CPerfCounter timer; if (type_ == CL_DEVICE_TYPE_CPU) { return; } if (failed_) return; cl_mem buffer = buffers()[0]; cl_mem alloc = buffers()[1]; size_t gws[1] = {nLines}; size_t lws[1] = {blockDim}; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); error_ |= _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_int), &nLines); error_ |= _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_mem), &alloc); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); for (int i = 0; i < nLines; i++) { bLines_[i].vertexPos = 0; bLines_[i].nVertices = 0; bLines_[i].reserved = 0; } ((unsigned int*)hostArray_)[0] = sizeof(cl_float2); timer.Reset(); timer.Start(); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); timer.Stop(); double sec = timer.GetElapsedTime(); for (int i = 0; i < nLines; i++) { bLines_[i].vertexPos = 0; bLines_[i].nVertices = 0; bLines_[i].reserved = 0; } unsigned int allocSize = ((unsigned int*)hostArray_)[0]; ((unsigned int*)hostArray_)[0] = sizeof(cl_float2); // // Host emulation // timer.Reset(); timer.Start(); // Step 1. 
Fill the jobs
  error_ = _wrapper->clSetKernelArg(kernel2_, 0, sizeof(cl_mem), &buffer);
  error_ |= _wrapper->clSetKernelArg(kernel2_, 1, sizeof(cl_int), &nLines);
  error_ |= _wrapper->clSetKernelArg(kernel2_, 2, sizeof(cl_mem), &alloc);
  CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
  error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel2_, 1,
                                            NULL, gws, lws, 0, NULL, NULL);
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed");
  _wrapper->clFinish(cmdQueues_[_deviceId]);
  // Step 2. Run all jobs
  for (int lidx = 0; lidx < nLines; lidx++) {
    // Readback the new dimension.
    error_ = _wrapper->clSetKernelArg(kernel3_, 0, sizeof(cl_int), &lidx);
    error_ |= _wrapper->clSetKernelArg(kernel3_, 1, sizeof(cl_mem), &buffer);
    error_ |= _wrapper->clSetKernelArg(kernel3_, 2, sizeof(cl_int),
                                       &bLines_[lidx].nVertices);
    error_ |= _wrapper->clSetKernelArg(kernel3_, 3, sizeof(cl_mem), &alloc);
    CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed");
    size_t gwsL[1] = {static_cast<size_t>(bLines_[lidx].nVertices)};
    size_t lwsL[1] = {blockDim};
    error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel3_,
                                              1, NULL, gws, lws, 0, NULL, NULL);
    CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed");
  }
  _wrapper->clFinish(cmdQueues_[_deviceId]);
  timer.Stop();
  double sec2 = timer.GetElapsedTime();
  if (memcmp(&allocSize, hostArray_, sizeof(cl_uint)) != 0) {
    CHECK_RESULT(true, "Validation failed!");
  }
  if (sec >= sec2) {
    _perfInfo = (float)(sec2 - sec);
    CHECK_RESULT(true, "Device enqueue is slower than emulation (sec)");
    return;
  }
  _perfInfo = (float)(((sec2 - sec) / sec) * 100);
  testDescString = "Device enqueue is (%%) faster";
}

unsigned int OCLDynamicBLines::close(void) {
  // FIXME: Re-enable CPU test once bug 10143 is fixed.
  if (type_ == CL_DEVICE_TYPE_CPU) {
    return 0;
  }
  delete[] bLines_;
  delete[] hostArray_;
  if (NULL != deviceQueue_) {
    _wrapper->clReleaseCommandQueue(deviceQueue_);
  }
  if (NULL != kernel2_) {
    _wrapper->clReleaseKernel(kernel2_);
  }
  if (NULL != kernel3_) {
    _wrapper->clReleaseKernel(kernel3_);
  }
  return OCLTestImp::close();
}

clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLDynamicBLines.h000066400000000000000000000034221450307266000253400ustar00rootroot00000000000000/*
Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
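The tessellation kernels embedded above evaluate a quadratic Bézier curve: each vertex is the Bernstein-weighted sum of the three control points at u = idx / (nTessPoints - 1), with weights (1-u)^2, 2u(1-u), u^2. A host-side reference of that per-vertex math (added for clarity only; Float2 and evalQuadBezier are illustrative stand-ins, Float2 mirroring cl_float2):

// Host-side reference of the math in computeBezierLinePositions.
struct Float2 { float x, y; };

Float2 evalQuadBezier(const Float2 cp[3], float u) {
  float omu = 1.0f - u;
  // Degree-2 Bernstein basis: (1-u)^2, 2u(1-u), u^2 -- the weights sum to 1.
  float b[3] = {omu * omu, 2.0f * u * omu, u * u};
  Float2 p = {0.0f, 0.0f};
  for (int i = 0; i < 3; ++i) {
    p.x += b[i] * cp[i].x;  // weighted sum of control points
    p.y += b[i] * cp[i].y;
  }
  return p;
}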
*/ #ifndef _OCL_DYNAMIC_BLINES_H_ #define _OCL_DYNAMIC_BLINES_H_ #include "OCLTestImp.h" class OCLDynamicBLines : public OCLTestImp { public: OCLDynamicBLines(); virtual ~OCLDynamicBLines(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: struct BezierLine { cl_float2 CP[3]; long long vertexPos; int nVertices; int reserved; }; cl_command_queue deviceQueue_; bool failed_; unsigned int testID_; BezierLine* bLines_; cl_float2* hostArray_; cl_kernel kernel2_; cl_kernel kernel3_; }; #endif // _OCL_DYNAMIC_BLINES__H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLGenericAddressSpace.cpp000066400000000000000000001013321450307266000270470ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
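The OCLGenericAddressSpace tests that follow build on OpenCL 2.0's generic address space: an unqualified pointer may reference __global, __local, or __private storage, and to_global()/to_local()/to_private() return a non-NULL pointer only when the pointer actually lives in that space. An illustrative kernel sketching the idea (my example, not one of the embedded test kernels; like the tests below it must be built with "-cl-std=CL2.0"):

// 'demo' is a hypothetical kernel showing the generic-pointer rules.
static const char* genericDemo =
    "__kernel void demo(__global int* out) {                       \n"
    "  __local int l;                                              \n"
    "  l = 7;                                                      \n"
    "  // generic pointer: may alias global or local storage       \n"
    "  int* p = (get_global_id(0) & 1) ? (int*)out : (int*)&l;     \n"
    "  // at most one of the to_* casts yields non-NULL for p      \n"
    "  if (to_local(p))  out[get_global_id(0)] = *p;  // reads 7   \n"
    "  if (to_global(p)) out[get_global_id(0)] = 1;                \n"
    "}                                                             \n";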
*/ #include "OCLGenericAddressSpace.h" #include "CL/cl.h" #define TO_LOCAL_FAIL 0x000f0 #define TO_GLOBAL_FAIL 0x00e00 #define TO_PRIVATE_FAIL 0x0d000 #define WRONG_VALUE 0xc0000 OCLGenericAddressSpace::OCLGenericAddressSpace() { _numSubTests = 7; } OCLGenericAddressSpace::~OCLGenericAddressSpace() {} void OCLGenericAddressSpace::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "error_ opening test"); silentFailure = false; _openTest = test; size_t param_size = 0; program_ = 0; kernel_ = 0; char* strVersion = 0; #if EMU_ENV arrSize = 10; #else arrSize = 1000; #endif // EMU_ENV error_ = _wrapper->clGetDeviceInfo( devices_[_deviceId], CL_DEVICE_OPENCL_C_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); strVersion = (char*)malloc(param_size); error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_OPENCL_C_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformInfo failed"); if (strVersion[9] < '2') { printf("\nOpenCL C 2.0 not supported\n"); silentFailure = true; } free(strVersion); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLGenericAddressSpace::run(void) { if (silentFailure) return; switch (_openTest) { case 0: test0(); break; case 1: test1(); break; case 2: test2(); break; case 3: test3(); break; case 4: test4(); break; case 5: test5(); break; case 6: test6(); break; } return; } void OCLGenericAddressSpace::test6(void) { const char* kernel_str = "\n\ __global unsigned int gint = 1; \n\ __kernel void test(__global ulong *results) \n\ { \n\ uint tid = get_global_id(0); \n\ unsigned int *ptr; \n\ __private unsigned int pint = tid + 2; \n\ if ((tid % 2) == 0) { \n\ ptr = &pint; \n\ } \n\ else { \n\ ptr = &gint; \n\ } \n\ results[0] = *ptr;\n\ results[1] = pint;\n\ results[2] = (ulong)ptr;\n\ results[3] = (ulong)to_private(ptr);\n\ results[4] = (ulong)&pint;\n\ } \n"; const size_t global_work_size = 1; const size_t arrSize = global_work_size * 5; cl_ulong* output_arr = (cl_ulong*)malloc(arrSize * sizeof(cl_ulong)); memset(output_arr, 0, arrSize * sizeof(cl_ulong)); cl_mem buffer = _wrapper->clCreateBuffer( context_, CL_MEM_READ_WRITE, arrSize * sizeof(cl_ulong), 0, &error_); buffers_.push_back(buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &kernel_str, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char log[400]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 400, log, 0); printf("\n\n%s\n\n", log); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram failed"); kernel_ = _wrapper->clCreateKernel(program_, "test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); cl_event evt; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); _wrapper->clFinish(cmdQueues_[_deviceId]); error_ = 
_wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[0], CL_TRUE, 0, sizeof(cl_ulong) * arrSize, output_arr, 1, &evt, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed"); if (output_arr[0] != 2) { printf( "\n*ptr:0x%llx, pint:0x%llx, ptr:0x%llx, to_private(ptr):0x%llx, " "&pint:0x%llx", (unsigned long long)output_arr[0], (unsigned long long)output_arr[1], (unsigned long long)output_arr[2], (unsigned long long)output_arr[3], (unsigned long long)output_arr[4]); printf("\n\n"); error_ = 1; } free(output_arr); CHECK_RESULT((error_ != CL_SUCCESS), "Generic Address Space - test2 failed"); } void OCLGenericAddressSpace::test5(void) { const char* kernel_str = "\n\ __global unsigned int gint = 1; \n\ __kernel void test(__global ulong *results) \n\ { \n\ uint tid = get_global_id(0); \n\ results[tid] = 0; \n\ unsigned int *ptr; \n\ __local unsigned int lint; \n\ lint = 2; \n\ if ((tid % 2) == 0) { \n\ ptr = &lint; \n\ } \n\ else { \n\ ptr = &gint; \n\ } \n\ barrier(CLK_GLOBAL_MEM_FENCE); \n\ if ((tid % 2) == 0) { \n\ results[tid*5] = *ptr;\n\ results[tid*5+1] = lint;\n\ results[tid*5+2] = (ulong)ptr;\n\ results[tid*5+3] = (ulong)to_local(ptr);\n\ results[tid*5+4] = (ulong)&lint;\n\ } \n\ else { \n\ results[tid*5] = *ptr;\n\ results[tid*5+1] = gint;\n\ results[tid*5+2] = (ulong)ptr;\n\ results[tid*5+3] = (ulong)to_global(ptr);\n\ results[tid*5+4] = (ulong)&gint;\n\ } \n\ } \n"; const size_t global_work_size = 2; const size_t arrSize = global_work_size * 5; cl_ulong* output_arr = (cl_ulong*)malloc(arrSize * sizeof(cl_ulong)); memset(output_arr, 0, arrSize * sizeof(cl_ulong)); cl_mem buffer = _wrapper->clCreateBuffer( context_, CL_MEM_READ_WRITE, arrSize * sizeof(cl_ulong), 0, &error_); buffers_.push_back(buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &kernel_str, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char log[400]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 400, log, 0); printf("\n\n%s\n\n", log); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram failed"); kernel_ = _wrapper->clCreateKernel(program_, "test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); cl_event evt; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); _wrapper->clFinish(cmdQueues_[_deviceId]); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[0], CL_TRUE, 0, sizeof(cl_ulong) * arrSize, output_arr, 1, &evt, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed"); int error_cnt = 0; for (unsigned int i = 0; i < global_work_size; ++i) { if (((i % 2 == 0) && (output_arr[i * 5] != 2)) || ((i % 2 == 1) && (output_arr[i * 5] != 1))) { ++error_cnt; } } if (error_cnt) { printf("\nNumber of wrong results: %d/%d\n\n", error_cnt, (int)global_work_size); for (unsigned int i = 0; i < global_work_size; ++i) { if (i % 2 == 0) { printf( "\n*ptr:0x%llx, lint:0x%llx, ptr:0x%llx, to_local(ptr):0x%llx, " "&lint:0x%llx", (unsigned long long)output_arr[i * 5], (unsigned long 
long)output_arr[i * 5 + 1], (unsigned long long)output_arr[i * 5 + 2], (unsigned long long)output_arr[i * 5 + 3], (unsigned long long)output_arr[i * 5 + 4]); } else { printf( "\n*ptr:0x%llx, gint:0x%llx, ptr:0x%llx, to_global(ptr):0x%llx, " "&gint:0x%llx", (unsigned long long)output_arr[i * 5], (unsigned long long)output_arr[i * 5 + 1], (unsigned long long)output_arr[i * 5 + 2], (unsigned long long)output_arr[i * 5 + 3], (unsigned long long)output_arr[i * 5 + 4]); } } printf("\n\n"); } free(output_arr); CHECK_RESULT((error_cnt != 0), "Generic Address Space - test2 failed"); } void OCLGenericAddressSpace::test4(void) { const char* kernel_str = "\n\ __global unsigned int gint = 1; \n\ __kernel void test(__global ulong *results) \n\ { \n\ uint tid = get_global_id(0); \n\ results[tid] = 0; \n\ unsigned int *ptr; \n\ __private unsigned int pint = 2; \n\ if ((tid % 2) == 0) { \n\ ptr = &pint; \n\ } \n\ else { \n\ ptr = &gint; \n\ } \n\ barrier(CLK_GLOBAL_MEM_FENCE); \n\ if ((tid % 2) == 0) { \n\ results[tid*5] = *ptr;\n\ results[tid*5+1] = pint;\n\ results[tid*5+2] = (ulong)ptr;\n\ results[tid*5+3] = (ulong)to_private(ptr);\n\ results[tid*5+4] = (ulong)&pint;\n\ } \n\ else { \n\ results[tid*5] = *ptr;\n\ results[tid*5+1] = gint;\n\ results[tid*5+2] = (ulong)ptr;\n\ results[tid*5+3] = (ulong)to_global(ptr);\n\ results[tid*5+4] = (ulong)&gint;\n\ } \n\ } \n"; const size_t global_work_size = 2; const size_t arrSize = global_work_size * 5; cl_ulong* output_arr = (cl_ulong*)malloc(arrSize * sizeof(cl_ulong)); memset(output_arr, 0, arrSize * sizeof(cl_ulong)); cl_mem buffer = _wrapper->clCreateBuffer( context_, CL_MEM_READ_WRITE, arrSize * sizeof(cl_ulong), 0, &error_); buffers_.push_back(buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &kernel_str, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char log[400]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 400, log, 0); printf("\n\n%s\n\n", log); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram failed"); kernel_ = _wrapper->clCreateKernel(program_, "test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); cl_event evt; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); _wrapper->clFinish(cmdQueues_[_deviceId]); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[0], CL_TRUE, 0, sizeof(cl_ulong) * arrSize, output_arr, 1, &evt, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed"); int error_cnt = 0; for (unsigned int i = 0; i < global_work_size; ++i) { if (((i % 2 == 0) && (output_arr[i * 5] != 2)) || ((i % 2 == 1) && (output_arr[i * 5] != 1))) { ++error_cnt; } } if (error_cnt) { printf("\nNumber of wrong results: %d/%d\n\n", error_cnt, (int)global_work_size); for (unsigned int i = 0; i < global_work_size; ++i) { if (i % 2 == 0) { printf( "\n*ptr:0x%llx, pint:0x%llx, ptr:0x%llx, to_private(ptr):0x%llx, " "&pint:0x%llx", (unsigned long long)output_arr[i * 5], (unsigned long long)output_arr[i * 5 + 1], (unsigned long 
long)output_arr[i * 5 + 2], (unsigned long long)output_arr[i * 5 + 3], (unsigned long long)output_arr[i * 5 + 4]); } else { printf( "\n*ptr:0x%llx, gint:0x%llx, ptr:0x%llx, to_global(ptr):0x%llx, " "&gint:0x%llx", (unsigned long long)output_arr[i * 5], (unsigned long long)output_arr[i * 5 + 1], (unsigned long long)output_arr[i * 5 + 2], (unsigned long long)output_arr[i * 5 + 3], (unsigned long long)output_arr[i * 5 + 4]); } } printf("\n\n"); } free(output_arr); CHECK_RESULT((error_cnt != 0), "Generic Address Space - test2 failed"); } void OCLGenericAddressSpace::test3(void) { const char* kernel_str = "\n\ #define TO_LOCAL_FAIL 0x000f0\n\ #define TO_GLOBAL_FAIL 0x00e00\n\ #define TO_PRIVATE_FAIL 0x0d000\n\ #define WRONG_VALUE 0xc0000\n\ __global unsigned int gint = 1; \n\ __kernel void test(__global uint *results) \n\ { \n\ uint tid = get_global_id(0); \n\ results[tid] = 0; \n\ unsigned int *ptr; \n\ __local unsigned int lint; \n\ lint = 2; \n\ __private unsigned int pint = 3; \n\ switch (tid % 3) \n\ {\n\ case 0:\n\ ptr = &gint; break; \n\ case 1:\n\ ptr = &lint; break; \n\ case 2:\n\ ptr = &pint; break; \n\ }\n\ barrier(CLK_GLOBAL_MEM_FENCE); \n\ switch (tid % 3) \n\ {\n\ case 0:\n\ if(to_global(ptr) && (*ptr == 1))\n\ {\n\ results[tid] = *ptr;\n\ }\n\ else\n\ {\n\ if (*ptr != 1) results[tid] = WRONG_VALUE;\n\ if(!to_global(ptr)) results[tid] |= TO_GLOBAL_FAIL;\n\ }\n\ break; \n\ case 1:\n\ if(to_local(ptr) && (*ptr == 2))\n\ {\n\ results[tid] = *ptr;\n\ }\n\ else\n\ {\n\ if (*ptr != 2) results[tid] = WRONG_VALUE;\n\ if(!to_local(ptr)) results[tid] |= TO_LOCAL_FAIL;\n\ }\n\ break; \n\ case 2:\n\ if(to_private(ptr) && (*ptr == 3))\n\ {\n\ results[tid] = *ptr;\n\ }\n\ else\n\ {\n\ if (*ptr != 3) results[tid] = WRONG_VALUE;\n\ if(!to_private(ptr)) results[tid] |= TO_PRIVATE_FAIL;\n\ }\n\ break; \n\ }\n\ } \n"; cl_uint* output_arr = (cl_uint*)malloc(arrSize * sizeof(cl_uint)); memset(output_arr, 0, arrSize * sizeof(cl_uint)); cl_mem buffer = _wrapper->clCreateBuffer( context_, CL_MEM_READ_WRITE, arrSize * sizeof(cl_uint), 0, &error_); buffers_.push_back(buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &kernel_str, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char log[400]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 400, log, 0); printf("\n\n%s\n\n", log); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram failed"); kernel_ = _wrapper->clCreateKernel(program_, "test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); cl_event evt; size_t global_work_size = arrSize; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); _wrapper->clFinish(cmdQueues_[_deviceId]); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[0], CL_TRUE, 0, sizeof(cl_uint) * arrSize, output_arr, 1, &evt, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed"); int error_cnt = 0; int wrong_values = 0; int to_local_error = 0; int to_global_error = 0; int to_private_error = 0; for (unsigned int i 
= 0; i < arrSize; ++i) { switch (i % 3) { case 0: error_cnt += (output_arr[i] != 1); break; case 1: error_cnt += (output_arr[i] != 2); break; case 2: error_cnt += (output_arr[i] != 3); break; } if (output_arr[i] & WRONG_VALUE) ++wrong_values; if (output_arr[i] & TO_LOCAL_FAIL) ++to_local_error; if (output_arr[i] & TO_GLOBAL_FAIL) ++to_global_error; if (output_arr[i] & TO_PRIVATE_FAIL) ++to_private_error; } if (error_cnt) { printf("\nNumber of wrong results: %d/%d ", error_cnt, (int)arrSize); printf( "wrong values: %d to_local_error: %d, to_global_error: %d, " "to_private_error: %d\n", wrong_values, to_local_error, to_global_error, to_private_error); } free(output_arr); CHECK_RESULT((error_cnt != 0), "Generic Address Space - test3 failed"); } void OCLGenericAddressSpace::test2(void) { const char* kernel_str = "\n\ #define TO_LOCAL_FAIL 0x000f0\n\ #define TO_GLOBAL_FAIL 0x00e00\n\ #define TO_PRIVATE_FAIL 0x0d000\n\ #define WRONG_VALUE 0xc0000\n\ __global unsigned int gint = 1; \n\ __kernel void test(__global uint *results) \n\ { \n\ uint tid = get_global_id(0); \n\ results[tid] = 0; \n\ unsigned int *ptr; \n\ __private unsigned int pint = 2; \n\ if ((tid % 2) == 0) { \n\ ptr = &pint; \n\ } \n\ else { \n\ ptr = &gint; \n\ } \n\ barrier(CLK_GLOBAL_MEM_FENCE); \n\ if ((tid % 2) == 0) { \n\ if (to_private(ptr) && *ptr == 2) {\n\ results[tid] = *ptr;\n\ }\n\ else {\n\ if (*ptr != 2) results[tid] = WRONG_VALUE;\n\ if(!to_private(ptr)) results[tid] |= TO_PRIVATE_FAIL;\n\ }\n\ } \n\ else { \n\ if (to_global(ptr) && *ptr == 1) {\n\ results[tid] = *ptr;\n\ }\n\ else {\n\ if (*ptr != 1) results[tid] = WRONG_VALUE;\n\ if(!to_global(ptr)) results[tid] |= TO_GLOBAL_FAIL;\n\ }\n\ } \n\ } \n"; cl_uint* output_arr = (cl_uint*)malloc(arrSize * sizeof(cl_uint)); memset(output_arr, 0, arrSize * sizeof(cl_uint)); cl_mem buffer = _wrapper->clCreateBuffer( context_, CL_MEM_READ_WRITE, arrSize * sizeof(cl_uint), 0, &error_); buffers_.push_back(buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &kernel_str, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char log[400]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 400, log, 0); printf("\n\n%s\n\n", log); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram failed"); kernel_ = _wrapper->clCreateKernel(program_, "test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); cl_event evt; size_t global_work_size = arrSize; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); _wrapper->clFinish(cmdQueues_[_deviceId]); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[0], CL_TRUE, 0, sizeof(cl_uint) * arrSize, output_arr, 1, &evt, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed"); int error_cnt = 0; int wrong_values = 0; int to_local_error = 0; int to_global_error = 0; int to_private_error = 0; for (unsigned int i = 0; i < arrSize; ++i) { if (((i % 2 == 0) && (output_arr[i] != 2)) || ((i % 2 == 1) && (output_arr[i] != 1))) { if (output_arr[i] & 
WRONG_VALUE) ++wrong_values; if (output_arr[i] & TO_LOCAL_FAIL) ++to_local_error; if (output_arr[i] & TO_GLOBAL_FAIL) ++to_global_error; if (output_arr[i] & TO_PRIVATE_FAIL) ++to_private_error; ++error_cnt; } } free(output_arr); if (error_cnt) { printf("\nNumber of wrong results: %d/%d", error_cnt, (int)arrSize); printf( "wrong values: %d to_local_error: %d, to_global_error: %d, " "to_private_error: %d\n", wrong_values, to_local_error, to_global_error, to_private_error); } CHECK_RESULT((error_cnt != 0), "Generic Address Space - test2 failed"); } void OCLGenericAddressSpace::test1(void) { const char* kernel_str = "\n\ #define TO_LOCAL_FAIL 0x000f0\n\ #define TO_GLOBAL_FAIL 0x00e00\n\ #define TO_PRIVATE_FAIL 0x0d000\n\ #define WRONG_VALUE 0xc0000\n\ __global unsigned int gint1 = 1; \n\ __global unsigned int gint2 = 2; \n\ __kernel void test(__global uint *results) \n\ { \n\ uint tid = get_global_id(0); \n\ results[tid] = 0; \n\ unsigned int *ptr; \n\ if ((tid % 2) == 0) { \n\ ptr = &gint2; \n\ } \n\ else { \n\ ptr = &gint1; \n\ } \n\ barrier(CLK_GLOBAL_MEM_FENCE); \n\ if ((tid % 2) == 0) { \n\ if (to_global(ptr) && *ptr == 2) {\n\ results[tid] = *ptr;\n\ }\n\ else {\n\ if (*ptr != 2) results[tid] = WRONG_VALUE;\n\ if(!to_global(ptr)) results[tid] |= TO_GLOBAL_FAIL;\n\ }\n\ } \n\ else { \n\ if (to_global(ptr) && *ptr == 1) {\n\ results[tid] = *ptr;\n\ }\n\ else {\n\ if (*ptr != 1) results[tid] = WRONG_VALUE;\n\ if(!to_global(ptr)) results[tid] |= TO_GLOBAL_FAIL;\n\ }\n\ } \n\ } \n"; cl_uint* output_arr = (cl_uint*)malloc(arrSize * sizeof(cl_uint)); memset(output_arr, 0, arrSize * sizeof(cl_uint)); cl_mem buffer = _wrapper->clCreateBuffer( context_, CL_MEM_READ_WRITE, arrSize * sizeof(cl_uint), 0, &error_); buffers_.push_back(buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &kernel_str, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char log[400]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 400, log, 0); printf("\n\n%s\n\n", log); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram failed"); kernel_ = _wrapper->clCreateKernel(program_, "test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); cl_event evt; size_t global_work_size = arrSize; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); _wrapper->clFinish(cmdQueues_[_deviceId]); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[0], CL_TRUE, 0, sizeof(cl_uint) * arrSize, output_arr, 1, &evt, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed"); int error_cnt = 0; int wrong_values = 0; int to_local_error = 0; int to_global_error = 0; int to_private_error = 0; for (unsigned int i = 0; i < arrSize; ++i) { if (((i % 2 == 0) && (output_arr[i] != 2)) || ((i % 2 == 1) && (output_arr[i] != 1))) { if (output_arr[i] & WRONG_VALUE) ++wrong_values; if (output_arr[i] & TO_LOCAL_FAIL) ++to_local_error; if (output_arr[i] & TO_GLOBAL_FAIL) ++to_global_error; if (output_arr[i] & TO_PRIVATE_FAIL) ++to_private_error; 
++error_cnt; } } free(output_arr); if (error_cnt) { printf("\nNumber of wrong results: %d/%d", error_cnt, (int)arrSize); printf( "wrong values: %d to_local_error: %d, to_global_error: %d, " "to_private_error: %d\n", wrong_values, to_local_error, to_global_error, to_private_error); } CHECK_RESULT((error_cnt != 0), "Generic Address Space - test1 failed"); } void OCLGenericAddressSpace::test0(void) { const char* kernel_str = "\n\ #define TO_LOCAL_FAIL 0x000f0\n\ #define TO_GLOBAL_FAIL 0x00e00\n\ #define TO_PRIVATE_FAIL 0x0d000\n\ #define WRONG_VALUE 0xc0000\n\ __global unsigned int gint = 1; \n\ __kernel void test(__global uint *results) \n\ { \n\ uint tid = get_global_id(0); \n\ results[tid] = 0; \n\ unsigned int *ptr; \n\ __local unsigned int lint; \n\ lint = 2; \n\ if ((tid % 2) == 0) { \n\ ptr = &lint; \n\ } \n\ else { \n\ ptr = &gint; \n\ } \n\ barrier(CLK_GLOBAL_MEM_FENCE); \n\ if ((tid % 2) == 0) { \n\ if (to_local(ptr) && *ptr == 2) {\n\ results[tid] = *ptr;\n\ }\n\ else {\n\ if (*ptr != 2) results[tid] = WRONG_VALUE;\n\ if(!to_local(ptr)) results[tid] |= TO_LOCAL_FAIL;\n\ }\n\ } \n\ else { \n\ if (to_global(ptr) && *ptr == 1) {\n\ results[tid] = *ptr;\n\ }\n\ else {\n\ if (*ptr != 1) results[tid] = WRONG_VALUE;\n\ if(!to_global(ptr)) results[tid] |= TO_GLOBAL_FAIL;\n\ }\n\ } \n\ } \n"; cl_uint* output_arr = (cl_uint*)malloc(arrSize * sizeof(cl_uint)); memset(output_arr, 0, arrSize * sizeof(cl_uint)); cl_mem buffer = _wrapper->clCreateBuffer( context_, CL_MEM_READ_WRITE, arrSize * sizeof(cl_uint), 0, &error_); buffers_.push_back(buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &kernel_str, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char log[400]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 400, log, 0); printf("\n\n%s\n\n", log); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram failed"); kernel_ = _wrapper->clCreateKernel(program_, "test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); cl_event evt; size_t global_work_size = arrSize; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); _wrapper->clFinish(cmdQueues_[_deviceId]); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[0], CL_TRUE, 0, sizeof(cl_uint) * arrSize, output_arr, 1, &evt, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed"); int error_cnt = 0; int wrong_values = 0; int to_local_error = 0; int to_global_error = 0; int to_private_error = 0; for (unsigned int i = 0; i < arrSize; ++i) { if (((i % 2 == 0) && (output_arr[i] != 2)) || ((i % 2 == 1) && (output_arr[i] != 1))) { if (output_arr[i] & WRONG_VALUE) ++wrong_values; if (output_arr[i] & TO_LOCAL_FAIL) ++to_local_error; if (output_arr[i] & TO_GLOBAL_FAIL) ++to_global_error; if (output_arr[i] & TO_PRIVATE_FAIL) ++to_private_error; ++error_cnt; } } free(output_arr); if (error_cnt) { printf("\nNumber of wrong results: %d/%d", error_cnt, (int)arrSize); printf( "wrong values: %d to_local_error: %d, to_global_error: %d, " 
"to_private_error: %d\n", wrong_values, to_local_error, to_global_error, to_private_error); } CHECK_RESULT((error_cnt != 0), "Generic Address Space - test0 failed"); } unsigned int OCLGenericAddressSpace::close(void) { if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); kernel_ = 0; } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLGenericAddressSpace.h000066400000000000000000000033471450307266000265230ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_GenericAddressSpace_H_ #define _OCL_GenericAddressSpace_H_ #include "OCLTestImp.h" class OCLGenericAddressSpace : public OCLTestImp { public: OCLGenericAddressSpace(); virtual ~OCLGenericAddressSpace(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: void test0(void); void test1(void); void test2(void); void test3(void); void test4(void); void test5(void); void test6(void); bool silentFailure; cl_kernel kernel_; size_t arrSize; }; #endif // _OCL_GenericAddressSpace_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLGetQueueThreadID.cpp000066400000000000000000000075701450307266000263130ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLGetQueueThreadID.h" #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" #if !defined(__linux__) #include "WinBase.h" typedef DWORD(WINAPI* GetThreadId)(__in HANDLE Thread); #endif bool badThread = false; OCLGetQueueThreadID::OCLGetQueueThreadID() { _numSubTests = 1; failed_ = false; } OCLGetQueueThreadID::~OCLGetQueueThreadID() {} void OCLGetQueueThreadID::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); char name[1024] = {0}; size_t size = 0; if (deviceId >= deviceCount_) { failed_ = true; return; } cl_mem buffer; buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } static void CL_CALLBACK notify_callback(cl_event event, cl_int event_command_exec_status, void* user_data) { #if defined(__linux__) pthread_t id = (pthread_t)user_data; pthread_t handle = pthread_self(); #else HMODULE module = GetModuleHandle("kernel32.dll"); GetThreadId getThreadId = reinterpret_cast(GetProcAddress(module, "GetThreadId")); if (NULL == getThreadId) { return; } DWORD id = getThreadId((HANDLE)user_data); DWORD handle = GetCurrentThreadId(); #endif if (id != handle) { badThread = true; } } void OCLGetQueueThreadID::run(void) { if (failed_) { return; } void* handle; cl_event clEvent; cl_event userEvent = clCreateUserEvent(context_, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateUserEvent() failed"); cl_uint initVal[2] = {5, 10}; error_ = _wrapper->clGetCommandQueueInfo(cmdQueues_[_deviceId], CL_QUEUE_THREAD_HANDLE_AMD, sizeof(void*), &handle, NULL); error_ = _wrapper->clEnqueueWriteBuffer(cmdQueues_[_deviceId], buffers()[0], false, 0, sizeof(cl_uint), &initVal[0], 1, &userEvent, &clEvent); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); error_ = _wrapper->clSetEventCallback(clEvent, CL_SUBMITTED, notify_callback, handle); clSetUserEventStatus(userEvent, CL_COMPLETE); clFinish(cmdQueues_[_deviceId]); clReleaseEvent(clEvent); clReleaseEvent(userEvent); CHECK_RESULT(badThread, "Thread ID is incorrect!"); } unsigned int OCLGetQueueThreadID::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLGetQueueThreadID.h000066400000000000000000000030451450307266000257510ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_GET_QUEUE_THREAD_ID_H_ #define _OCL_GET_QUEUE_THREAD_ID_H_ #include "OCLTestImp.h" class OCLGetQueueThreadID : public OCLTestImp { public: OCLGetQueueThreadID(); virtual ~OCLGetQueueThreadID(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; }; #endif // _OCL_GET_QUEUE_THREAD_ID_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLGlobalOffset.cpp000066400000000000000000000133321450307266000255620ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLGlobalOffset.h" #include #include #include #include "CL/cl.h" const static cl_uint ThreadsForCheck = 2; const static cl_uint GlobalOffset = 64; const static char* strKernel = "__kernel void global_offset_test( \n" " global uint* out_val) \n" "{ \n" " // Check the first thread \n" " if (get_global_id(0) == get_global_offset(0)) { \n" " out_val[0] = (uint)get_global_offset(0); \n" " } \n" " // Check the last thread \n" " if (get_global_id(0) == (get_global_size(0) + get_global_offset(0) - " "1)) { \n" " out_val[1] = (uint)get_global_offset(0); \n" " } \n" "} \n"; OCLGlobalOffset::OCLGlobalOffset() { _numSubTests = 1; } OCLGlobalOffset::~OCLGlobalOffset() {} void OCLGlobalOffset::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); char dbuffer[1024] = {0}; _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_VERSION, 1024, dbuffer, NULL); if (strstr(dbuffer, "OpenCL 1.0")) { return; } program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "global_offset_test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, ThreadsForCheck * sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLGlobalOffset::run(void) { char dbuffer[1024] = {0}; _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 1024, dbuffer, NULL); if (strstr(dbuffer, "OpenCL 1.0")) { return; } cl_uint offsetValues[ThreadsForCheck] = {0xffffffff, 0xffffffff}; cl_mem buffer = buffers()[0]; error_ = _wrapper->clEnqueueWriteBuffer(cmdQueues_[_deviceId], buffer, true, 0, ThreadsForCheck * sizeof(cl_uint), offsetValues, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t gws[1] = {0x0800000}; size_t gwo[1] = {GlobalOffset}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, gwo, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffer, true, 0, ThreadsForCheck * sizeof(cl_uint), offsetValues, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); for (cl_uint i = 0; i < ThreadsForCheck; ++i) { if (offsetValues[i] != GlobalOffset) { printf("%d != %d", GlobalOffset, offsetValues[i]); CHECK_RESULT(true, " - Incorrect result for global offset!\n"); } } } unsigned int OCLGlobalOffset::close(void) { return OCLTestImp::close(); } 
clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLGlobalOffset.h000066400000000000000000000027541450307266000252350ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_GLOBAL_OFFSET_H_ #define _OCL_GLOBAL_OFFSET_H_ #include "OCLTestImp.h" class OCLGlobalOffset : public OCLTestImp { public: OCLGlobalOffset(); virtual ~OCLGlobalOffset(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); }; #endif // _OCL_GLOBAL_OFFSET_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLImage2DFromBuffer.cpp000066400000000000000000000343241450307266000264050ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLImage2DFromBuffer.h" #include #include #include #include #define GROUP_SIZE 256 const unsigned int OCLImage2DFromBuffer::imageWidth = 1920; const unsigned int OCLImage2DFromBuffer::imageHeight = 1080; const static char strKernel[] = "__constant sampler_t imageSampler = CLK_NORMALIZED_COORDS_FALSE | " "CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST; \n" "__kernel void image2imageCopy( " " \n" " __read_only image2d_t input, " " \n" " __write_only image2d_t output) " " \n" "{ " " \n" " int2 coord = (int2)(get_global_id(0), get_global_id(1)); " " \n" " uint4 temp = read_imageui(input, imageSampler, coord); " " \n" " write_imageui(output, coord, temp); " " \n" "} " " \n"; typedef CL_API_ENTRY cl_mem(CL_API_CALL *clConvertImageAMD_fn)( cl_context context, cl_mem image, const cl_image_format *image_format, cl_int *errcode_ret); clConvertImageAMD_fn clConvertImageAMD; OCLImage2DFromBuffer::OCLImage2DFromBuffer() : OCLTestImp() { _numSubTests = 6; blockSizeX = GROUP_SIZE; blockSizeY = 1; } OCLImage2DFromBuffer::~OCLImage2DFromBuffer() {} void OCLImage2DFromBuffer::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { buffer = clImage2DOriginal = clImage2D = clImage2DOut = NULL; done_ = false; pitchAlignment = 0; _openTest = test; // Initialize random number seed srand((unsigned int)time(NULL)); OCLTestImp::open(test, units, conversion, deviceId); if (_errorFlag) return; cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { testDescString = "GPU device is required for this test!\n"; done_ = true; return; } cl_bool imageSupport; size_t size; _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport), &imageSupport, &size); if (!imageSupport) { testDescString = "Image not supported, skipping this test! "; done_ = true; return; } if (_openTest >= 4) { clConvertImageAMD = (clConvertImageAMD_fn)clGetExtensionFunctionAddressForPlatform( platform_, "clConvertImageAMD"); if (clConvertImageAMD == NULL) { testDescString = "clConvertImageAMD not found!\n"; done_ = true; return; } } CompileKernel(); AllocateOpenCLImage(); } void OCLImage2DFromBuffer::run(void) { if (_errorFlag || done_) { return; } if ((_openTest % 2) == 0) { testReadImage(clImage2D); } else { testKernel(); } } void OCLImage2DFromBuffer::AllocateOpenCLImage() { const bool pitchTest = (_openTest == 2 || _openTest == 3); cl_int status = 0; size_t size = 0; pitchAlignment = 0; status = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_IMAGE_PITCH_ALIGNMENT, sizeof(cl_uint), &pitchAlignment, &size); if (pitchAlignment != 0) { pitchAlignment--; } const unsigned int requiredPitch = ((imageWidth + pitchAlignment) & ~pitchAlignment); const unsigned int pitch = (!pitchTest) ? 
requiredPitch : imageWidth; const size_t bufferSize = pitch * imageHeight; CHECK_RESULT(bufferSize == 0, "ERROR: calculated image size is zero"); unsigned char *sourceData = new unsigned char[bufferSize]; // init data for (unsigned int y = 0; y < imageHeight; y++) { for (unsigned int x = 0; x < imageWidth / 4; x++) { for (unsigned int p = 0; p < 4; p++) { *(sourceData + y * pitch + x * 4 + p) = p; } } } buffer = _wrapper->clCreateBuffer(context_, CL_MEM_COPY_HOST_PTR | CL_MEM_READ_WRITE, bufferSize, sourceData, &status); { // testing clConvertImageAMD if (_openTest == 4 || _openTest == 5) { const cl_image_format format = {CL_R, CL_UNSIGNED_INT8}; #if defined(CL_VERSION_2_0) const cl_image_desc desc = {CL_MEM_OBJECT_IMAGE2D, imageWidth, imageHeight, 0, 0, pitch, 0, 0, 0, {buffer}}; #else const cl_image_desc desc = {CL_MEM_OBJECT_IMAGE2D, imageWidth, imageHeight, 0, 0, pitch, 0, 0, 0, buffer}; #endif clImage2DOriginal = _wrapper->clCreateImage( context_, CL_MEM_READ_WRITE, &format, &desc, NULL, &status); CHECK_RESULT(status != CL_SUCCESS, "clCreateImage() failed"); const cl_image_format formatRGBA = {CL_RGBA, CL_UNSIGNED_INT8}; clImage2D = clConvertImageAMD(context_, clImage2DOriginal, &formatRGBA, &status); CHECK_RESULT(status != CL_SUCCESS, "clConvertImageAMD() failed"); cl_mem fishyBuffer = 0; status = clGetImageInfo(clImage2D, CL_IMAGE_BUFFER, sizeof(fishyBuffer), &fishyBuffer, 0); CHECK_RESULT(status != CL_SUCCESS, "clGetImageInfo(CL_IMAGE_BUFFER) failed"); CHECK_RESULT(fishyBuffer != buffer, "clGetImageInfo() failed, buffer != fishyBuffer"); } else { const cl_image_format format = {CL_RGBA, CL_UNSIGNED_INT8}; #if defined(CL_VERSION_2_0) const cl_image_desc desc = {CL_MEM_OBJECT_IMAGE2D, imageWidth / 4, imageHeight, 0, 0, pitch, 0, 0, 0, {buffer}}; #else const cl_image_desc desc = {CL_MEM_OBJECT_IMAGE2D, imageWidth / 4, imageHeight, 0, 0, pitch, 0, 0, 0, buffer}; #endif clImage2D = _wrapper->clCreateImage(context_, CL_MEM_READ_WRITE, &format, &desc, NULL, &status); } // testing pitch alignment correct check in the runtime if (pitchTest) { CHECK_RESULT(requiredPitch != pitch && (clImage2D != NULL || status != CL_INVALID_IMAGE_FORMAT_DESCRIPTOR), "AllocateOpenCLImage() failed: (clImage2D!=NULL || " "status!=CL_INVALID_IMAGE_FORMAT_DESCRIPTOR) <=> (%p, %x)", clImage2D, status); if (requiredPitch != pitch) { done_ = true; return; } } } delete[] sourceData; { const cl_image_format format = {CL_RGBA, CL_UNSIGNED_INT8}; #if defined(CL_VERSION_2_0) const cl_image_desc desc = {CL_MEM_OBJECT_IMAGE2D, imageWidth / 4, imageHeight, 0, 0, 0, 0, 0, 0, {NULL}}; #else const cl_image_desc desc = {CL_MEM_OBJECT_IMAGE2D, imageWidth / 4, imageHeight, 0, 0, 0, 0, 0, 0, NULL}; #endif clImage2DOut = _wrapper->clCreateImage(context_, CL_MEM_READ_WRITE, &format, &desc, NULL, &status); } CHECK_RESULT(clImage2D == NULL, "AllocateOpenCLImage() failed"); } void OCLImage2DFromBuffer::testReadImage(cl_mem image) { cl_int status = 0; size_t bufferSize = imageWidth * imageHeight; unsigned char *dstData = new unsigned char[bufferSize]; size_t origin[] = {0, 0, 0}; size_t region[] = {imageWidth / 4, imageHeight, 1}; status = clEnqueueReadImage(cmdQueues_[_deviceId], image, 1, origin, region, 0, 0, dstData, 0, 0, 0); ::clFinish(cmdQueues_[_deviceId]); for (unsigned int y = 0; y < imageHeight; y++) { for (unsigned int x = 0; x < imageWidth / 4; x++) { for (unsigned int p = 0; p < 4; p++) { if (*(dstData + y * imageWidth + x * 4 + p) != p) { CHECK_RESULT( true, "CheckCLImage: *(dstData+y*imageWidth+x*4+p)!=p => %i != %i", 
*(dstData + y * imageWidth + x * 4 + p), p); goto cleanup; } } } } cleanup: delete[] dstData; } void OCLImage2DFromBuffer::testKernel() { CopyOpenCLImage(clImage2D); testReadImage(clImage2DOut); } unsigned int OCLImage2DFromBuffer::close(void) { if (clImage2DOriginal != NULL) clReleaseMemObject(clImage2DOriginal); if (clImage2D != NULL) clReleaseMemObject(clImage2D); if (clImage2DOut != NULL) clReleaseMemObject(clImage2DOut); if (buffer != NULL) clReleaseMemObject(buffer); return OCLTestImp::close(); } void OCLImage2DFromBuffer::CopyOpenCLImage(cl_mem clImageSrc) { cl_int status = 0; // Set appropriate arguments to the kernel2D // input buffer image status = clSetKernelArg(kernel_, 0, sizeof(cl_mem), &clImageSrc); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLImage() failed at " "clSetKernelArg(kernel_,0,sizeof(cl_mem),&clImageSrc)"); status = clSetKernelArg(kernel_, 1, sizeof(cl_mem), &clImage2DOut); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLImage() failed at " "clSetKernelArg(kernel_,1,sizeof(cl_mem),&clImage2DOut)"); // Enqueue a kernel run call. size_t global_work_offset[] = {0, 0}; size_t globalThreads[] = {imageWidth / 4, imageHeight}; size_t localThreads[] = {blockSizeX, blockSizeY}; status = clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 2, NULL, globalThreads, NULL, 0, NULL, 0); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLImage() failed at clEnqueueNDRangeKernel"); status = clFinish(cmdQueues_[_deviceId]); CHECK_RESULT((status != CL_SUCCESS), "CopyOpenCLImage() failed at clFinish"); } void OCLImage2DFromBuffer::CompileKernel() { cl_int status = 0; size_t kernelSize = sizeof(strKernel); const char *strs = (const char *)&strKernel[0]; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strs, &kernelSize, &status); status = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], NULL, NULL, NULL); if (status != CL_SUCCESS) { if (status == CL_BUILD_PROGRAM_FAILURE) { cl_int logStatus; size_t buildLogSize = 0; logStatus = clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, buildLogSize, NULL, &buildLogSize); std::string buildLog; buildLog.resize(buildLogSize); logStatus = clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, buildLogSize, &buildLog[0], NULL); printf("%s", buildLog.c_str()); } return; } // get a kernel object handle for a kernel with the given name kernel_ = _wrapper->clCreateKernel(program_, "image2imageCopy", &status); size_t kernel2DWorkGroupSize = 0; status = clGetKernelWorkGroupInfo(kernel_, devices_[_deviceId], CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &kernel2DWorkGroupSize, 0); if ((blockSizeX * blockSizeY) > kernel2DWorkGroupSize) { if (blockSizeX > kernel2DWorkGroupSize) { blockSizeX = kernel2DWorkGroupSize; blockSizeY = 1; } } } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLImage2DFromBuffer.h000066400000000000000000000037551450307266000260560ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCLImage2DFromBuffer_H_ #define _OCLImage2DFromBuffer_H_ #include "OCLTestImp.h" class OCLImage2DFromBuffer : public OCLTestImp { public: OCLImage2DFromBuffer(); virtual ~OCLImage2DFromBuffer(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual void run(void); virtual unsigned int close(void); protected: static const unsigned int imageWidth; static const unsigned int imageHeight; void testReadImage(cl_mem image); void testKernel(); void AllocateOpenCLImage(); void CopyOpenCLImage(cl_mem clImageSrc); void CompileKernel(); bool done_; size_t blockSizeX; /**< Work-group size in x-direction */ size_t blockSizeY; /**< Work-group size in y-direction */ cl_mem buffer; cl_mem clImage2DOriginal; cl_mem clImage2D; cl_mem clImage2DOut; cl_uint pitchAlignment; }; #endif // _OCLImage2DFromBuffer_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLImageCopyPartial.cpp000066400000000000000000000307551450307266000264150ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLImageCopyPartial.h" #include #include #include #include "CL/opencl.h" #include "Timer.h" // Quiet pesky warnings #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif #define NUM_SIZES 2 static const unsigned int Sizes0[NUM_SIZES] = {16384, 16384}; #define NUM_FORMATS 1 static const cl_image_format formats[NUM_FORMATS] = {{CL_R, CL_UNSIGNED_INT16}}; static const char *textFormats[NUM_FORMATS] = {"R8"}; static const unsigned int formatSize[NUM_FORMATS] = {2 * sizeof(cl_uchar)}; static const unsigned int Iterations[2] = {1, OCLImageCopyPartial::NUM_ITER}; #define NUM_SUBTESTS 3 OCLImageCopyPartial::OCLImageCopyPartial() { _numSubTests = NUM_SIZES * NUM_SUBTESTS * NUM_FORMATS * 2; } OCLImageCopyPartial::~OCLImageCopyPartial() {} static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLImageCopyPartial::setData(void *ptr, unsigned int pitch, unsigned int size, unsigned int value) { unsigned int *ptr2 = (unsigned int *)ptr; value = 0; for (unsigned int i = 0; i < size >> 2; i++) { ptr2[i] = value; value++; } } void OCLImageCopyPartial::checkData(void *ptr, unsigned int pitch, unsigned int size, unsigned int value) { unsigned int *ptr2 = (unsigned int *)ptr; value = 0; for (unsigned int i = 0; i < size >> 2; i++) { if (ptr2[i] != value) { printf("Data validation failed at %d! Got 0x%08x 0x%08x 0x%08x 0x%08x\n", i, ptr2[i], ptr2[i + 1], ptr2[i + 2], ptr2[i + 3]); printf("Expected 0x%08x 0x%08x 0x%08x 0x%08x\n", value, value, value, value); CHECK_RESULT(true, "Data validation failed!"); break; } value++; } } void OCLImageCopyPartial::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint typeOfDevice = type_; cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; size_t queryOut = 0; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; srcBuffer_ = 0; dstBuffer_ = 0; srcImage_ = false; dstImage_ = false; error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], typeOfDevice, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices // if (num_devices > 0) //{ // platform = platforms[_platformIndex]; // break; //} #if 0 } #endif delete platforms; } bufnum_ = (_openTest / (NUM_SIZES * NUM_SUBTESTS)) % NUM_FORMATS; if ((((_openTest / NUM_SIZES) % NUM_SUBTESTS) + 1) & 1) { srcImage_ = true; } if ((((_openTest / NUM_SIZES) % NUM_SUBTESTS) + 1) & 2) { dstImage_ = true; } numIter = Iterations[_openTest / (NUM_SIZES * NUM_SUBTESTS * NUM_FORMATS)]; /* * If we could find our platform, use it. 
If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find AMD platform, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, typeOfDevice, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE2D_MAX_WIDTH, sizeof(size_t), &queryOut, NULL); bufSizeW_ = (cl_uint)queryOut; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_IMAGE2D_MAX_HEIGHT, sizeof(size_t), &queryOut, NULL); bufSizeH_ = (cl_uint)queryOut; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); cl_mem_flags flags = CL_MEM_WRITE_ONLY; void *mem; size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSizeW_, bufSizeH_, 1}; size_t image_row_pitch; size_t image_slice_pitch; unsigned int memSize; if (_openTest % NUM_SIZES) { origin[0] = bufSizeW_ - 16; region[0] = 16; } else { origin[1] = bufSizeH_ - 16; region[1] = 16; } if (dstImage_) { dstBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSizeW_, bufSizeH_, 0, NULL, &error_); CHECK_RESULT(dstBuffer_ == 0, "clCreateImage(dstBuffer) failed"); mem = _wrapper->clEnqueueMapImage( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * (unsigned int)region[1]; } else { dstBuffer_ = _wrapper->clCreateBuffer( context_, flags, region[0] * region[1] * formatSize[bufnum_], NULL, &error_); CHECK_RESULT(dstBuffer_ == 0, "clCreateBuffer(dstBuffer) failed"); mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_WRITE, 0, region[0] * region[1] * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)region[0] * (unsigned int)region[1] * formatSize[bufnum_]; image_row_pitch = 0; } unsigned int *ptr2 = (unsigned int *)mem; for (unsigned int i = 0; i < memSize >> 2; i++) { ptr2[i] = 0xdeadbeef; } _wrapper->clEnqueueUnmapMemObject(cmd_queue_, dstBuffer_, mem, 0, NULL, NULL); flags = CL_MEM_READ_ONLY; if (srcImage_) { srcBuffer_ = _wrapper->clCreateImage2D(context_, flags, &formats[bufnum_], bufSizeW_, bufSizeH_, 0, NULL, &error_); CHECK_RESULT(srcBuffer_ == 0, "clCreateImage(srcBuffer) failed"); mem = _wrapper->clEnqueueMapImage( cmd_queue_, srcBuffer_, CL_TRUE, CL_MAP_WRITE, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * (unsigned int)region[1]; } else { srcBuffer_ = _wrapper->clCreateBuffer( context_, flags, region[0] * region[1] * formatSize[bufnum_], NULL, &error_); CHECK_RESULT(srcBuffer_ == 0, "clCreateBuffer(srcBuffer) failed"); mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, srcBuffer_, CL_TRUE, CL_MAP_WRITE, 0, region[0] * region[1] * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)region[0] * (unsigned 
int)region[1] * formatSize[bufnum_]; image_row_pitch = 0; } setData(mem, (unsigned int)image_row_pitch, memSize, 0xdeadbeef); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, srcBuffer_, mem, 0, NULL, NULL); } void OCLImageCopyPartial::run(void) { size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSizeW_, bufSizeH_, 1}; if (_openTest % NUM_SIZES) { origin[0] = bufSizeW_ - 16; region[0] = 16; } else { origin[1] = bufSizeH_ - 16; region[1] = 16; } // Warm up if (srcImage_ == false) { error_ = _wrapper->clEnqueueCopyBufferToImage( cmd_queue_, srcBuffer_, dstBuffer_, 0, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBufferToImage failed"); } else if (dstImage_ == false) { error_ = _wrapper->clEnqueueCopyImageToBuffer( cmd_queue_, srcBuffer_, dstBuffer_, origin, region, 0, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImageToBuffer failed"); } else { error_ = _wrapper->clEnqueueCopyImage(cmd_queue_, srcBuffer_, dstBuffer_, origin, origin, region, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyImage failed"); } error_ = _wrapper->clFinish(cmd_queue_); CHECK_RESULT(error_, "clFinish failed"); const char *strSrc = NULL; const char *strDst = NULL; if (srcImage_) strSrc = "img"; else strSrc = "buf"; if (dstImage_) strDst = "img"; else strDst = "buf"; void *mem; size_t image_row_pitch; size_t image_slice_pitch; unsigned int memSize; if (dstImage_) { mem = _wrapper->clEnqueueMapImage( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_READ, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapImage failed"); memSize = (unsigned int)image_row_pitch * (unsigned int)region[1]; } else { mem = _wrapper->clEnqueueMapBuffer( cmd_queue_, dstBuffer_, CL_TRUE, CL_MAP_READ, 0, region[0] * region[1] * formatSize[bufnum_], 0, NULL, NULL, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); memSize = (unsigned int)region[0] * (unsigned int)region[1] * formatSize[bufnum_]; image_row_pitch = 0; } checkData(mem, (unsigned int)image_row_pitch, memSize, 0x600df00d); _wrapper->clEnqueueUnmapMemObject(cmd_queue_, dstBuffer_, mem, 0, NULL, NULL); char buf[256]; SNPRINTF(buf, sizeof(buf), " (%4dx%4d) fmt:%s src:%s dst:%s i: %4d (GB/s) ", bufSizeW_, bufSizeH_, textFormats[bufnum_], strSrc, strDst, numIter); testDescString = buf; } unsigned int OCLImageCopyPartial::close(void) { _wrapper->clFinish(cmd_queue_); if (srcBuffer_) { error_ = _wrapper->clReleaseMemObject(srcBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(srcBuffer_) failed"); } if (dstBuffer_) { error_ = _wrapper->clReleaseMemObject(dstBuffer_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(dstBuffer_) failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLImageCopyPartial.h000066400000000000000000000037651450307266000260630ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ImageCopyCorners_H_ #define _OCL_ImageCopyCorners_H_ #include "OCLTestImp.h" class OCLImageCopyPartial : public OCLTestImp { public: OCLImageCopyPartial(); virtual ~OCLImageCopyPartial(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); static const unsigned int NUM_ITER = 1; cl_context context_; cl_command_queue cmd_queue_; cl_mem srcBuffer_; cl_mem dstBuffer_; cl_int error_; unsigned int bufSizeW_; unsigned int bufSizeH_; unsigned int bufnum_; bool srcImage_; bool dstImage_; unsigned int numIter; void setData(void* ptr, unsigned int pitch, unsigned int size, unsigned int value); void checkData(void* ptr, unsigned int pitch, unsigned int size, unsigned int value); }; #endif // _OCL_ImageCopyPartial_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLKernelBinary.cpp000066400000000000000000000245661450307266000256130ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLKernelBinary.h" #include #include #include #include "CL/cl.h" const static char* strKernel12 = "typedef struct ST { \n" " int i0; \n" " int i1; \n" "} ST_t; \n" " \n" "__constant ST_t STCArray[2] = { \n" " { 1, 0 }, \n" " { 2, 1 } \n" "}; \n" " \n" "__kernel void foo (__global int *p, int n) \n" "{ \n" " int s = 0; \n" " int i; \n" " for (i=0; i < n; ++i) { \n" " s += STCArray[i].i0 + STCArray[i].i1; \n" " } \n" " *p = s; \n" "} \n"; const static char* strKernel20 = "typedef struct ST { \n" " int i0; \n" " int i1; \n" "} ST_t; \n" " \n" "__constant ST_t STCArray[2] = { \n" " { -1, 0 }, \n" " { 3, -1 } \n" "}; \n" " \n" "__global int var = 1; \n" " \n" "__kernel void foo (__global int *p, int n) \n" "{ \n" " int s = 0; \n" " int i; \n" " for (i=0; i < n; ++i) { \n" " s += STCArray[i].i0 + STCArray[i].i1 + var++; \n" " } \n" " p[get_global_id(0)] = s; \n" "} \n"; OCLKernelBinary::OCLKernelBinary() { _numSubTests = 2; } OCLKernelBinary::~OCLKernelBinary() {} void OCLKernelBinary::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); char strVersion[128]; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_VERSION, sizeof(strVersion), strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (test == 1 && strVersion[7] < '2') { program_ = NULL; return; } const char *options, *options0; const char* strKernel; switch (test) { case 0: options = ""; options0 = "-O0"; strKernel = strKernel12; break; case 1: options = "-cl-std=CL2.0"; options0 = "-cl-std=CL2.0 -O0"; strKernel = strKernel20; break; default: assert(false); return; } CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], options, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); size_t* sizes = new size_t[deviceCount_]; CHECK_RESULT(((sizes != NULL) ? false : true), "malloc()"); size_t* sizes1 = new size_t[deviceCount_]; CHECK_RESULT(((sizes1 != NULL) ? false : true), "malloc()"); size_t* sizes2 = new size_t[deviceCount_]; CHECK_RESULT(((sizes2 != NULL) ? false : true), "malloc()"); unsigned int programInfoDeviceIdIndex = 0; cl_device_id* programInfoDevices = new cl_device_id[deviceCount_]; CHECK_RESULT(((programInfoDevices != NULL) ? false : true), "malloc()"); // get an array of device Id's that relate to values order returned by // 'clGetProgramInfo' error_ = _wrapper->clGetProgramInfo(program_, CL_PROGRAM_DEVICES, sizeof(cl_device_id) * deviceCount_, programInfoDevices, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clGetProgramInfo()"); // map between the class devices_ array and the programInfoDeviceId array for (unsigned int i = 0; i < deviceCount_; i++) { if (devices_[deviceId] == programInfoDevices[i]) { programInfoDeviceIdIndex = i; } } delete[] programInfoDevices; error_ = _wrapper->clGetProgramInfo(program_, CL_PROGRAM_BINARY_SIZES, sizeof(size_t) * deviceCount_, sizes, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clGetProgramInfo()"); unsigned char** binaries = new unsigned char*[deviceCount_]; CHECK_RESULT(((binaries != NULL) ? 
false : true), "malloc()"); for (unsigned int i = 0; i < deviceCount_; i++) { if (sizes[i] > 0) { binaries[i] = new unsigned char[sizes[i]]; CHECK_RESULT(((binaries[i] != NULL) ? false : true), "malloc()"); } else { binaries[i] = NULL; } } error_ = _wrapper->clGetProgramInfo(program_, CL_PROGRAM_BINARIES, sizeof(unsigned char*) * deviceCount_, binaries, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clGetProgramInfo()"); error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT((error_ != CL_SUCCESS), "clReleaseProgram()"); const unsigned char* cBinary = binaries[programInfoDeviceIdIndex]; cl_int status; program_ = _wrapper->clCreateProgramWithBinary( context_, 1, &devices_[deviceId], &(sizes[programInfoDeviceIdIndex]), &cBinary, &status, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithBinary()"); for (unsigned int i = 0; i < deviceCount_; i++) { if (binaries[i] != NULL) delete[] binaries[i]; } delete[] binaries; error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], options0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clGetProgramInfo()"); error_ = _wrapper->clGetProgramInfo(program_, CL_PROGRAM_BINARY_SIZES, sizeof(size_t) * deviceCount_, sizes1, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "1st clGetProgramInfo()"); kernel_ = _wrapper->clCreateKernel(program_, "foo", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "1st clCreateKernel() failed"); _wrapper->clReleaseKernel(kernel_); CHECK_RESULT((error_ != CL_SUCCESS), "1st clReleaseKernel() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], options0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clGetProgramInfo()"); error_ = _wrapper->clGetProgramInfo(program_, CL_PROGRAM_BINARY_SIZES, sizeof(size_t) * deviceCount_, sizes2, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "2nd clGetProgramInfo()"); kernel_ = _wrapper->clCreateKernel(program_, "foo", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "2nd clCreateKernel() failed"); cl_mem buffer; buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, 2 * sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); delete[] sizes; delete[] sizes1; delete[] sizes2; } void OCLKernelBinary::run(void) { if (program_ == NULL) { return; } cl_mem buffer = buffers()[0]; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); cl_int num = 2; error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_int), &num); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t gws[1] = {2}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); cl_uint outputV[2] = {0}; error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffer, true, 0, 2 * sizeof(cl_uint), outputV, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); if (outputV[0] != 4) { CHECK_RESULT(true, "Incorrect result of kernel execution!"); } } unsigned int OCLKernelBinary::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLKernelBinary.h000066400000000000000000000027541450307266000252530ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_KERNEL_BINARY_H_ #define _OCL_KERNEL_BINARY_H_ #include "OCLTestImp.h" class OCLKernelBinary : public OCLTestImp { public: OCLKernelBinary(); virtual ~OCLKernelBinary(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); }; #endif // _OCL_KERNEL_BINARY_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLLDS32K.cpp000066400000000000000000000314261450307266000241210ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLLDS32K.h" #include #include #include #include #include "CL/cl.h" // #include #include typedef unsigned int uint32_t; #if EMU_ENV #define LDS_SIZE 1024 #define A_SIZE 1024 #else #define LDS_SIZE 32768 #define A_SIZE (8 * 1024 * 1024) #endif // EMU_ENV #define LOCAL_WORK_SIZE 64 // We'll do a 64MB transaction #define B_SIZE A_SIZE #define C_SIZE A_SIZE #define D_SIZE A_SIZE #define GLOBAL_WORK_SIZE (A_SIZE / LDS_SIZE * LOCAL_WORK_SIZE) #define TEST_NAME "lds 32K" // 32K has 8192 elements // 64 threads each handle 8192/64=128 values #if EMU_ENV static const char program_source[] = KERNEL( __kernel void the_kernel(__global const uint *a, __global const uint *b, __global const uint *c, __global uint *d, __global uint *e) { // Reduce size for the emulator __local uint lds[256]; uint gid = get_global_id(0); __global const uint* ta = a + 4 * gid; __global const uint* tb = b + 4 * gid; __global const uint* tc = c + 4 * gid; __global uint* td = d + 4 * gid; uint i; for (i = 0; i < 4; ++i) lds[ta[i]] = tc[i]; barrier(CLK_LOCAL_MEM_FENCE); for (i = 0; i < 4; ++i) td[i] = lds[tb[i]]; } __kernel void the_kernel2(__global uint* d) { __local uint lds[8192]; uint i; uint gid = get_global_id(0); for (i = 0; i < 128; ++i) lds[i] = d[gid]; barrier(CLK_LOCAL_MEM_FENCE); for (i = 0; i < 128; ++i) d[gid] = lds[i]; }); #else static const char program_source[] = KERNEL( __kernel void the_kernel(__global const uint *a, __global const uint *b, __global const uint *c, __global uint *d, __global uint *e) { __local uint lds[8192]; uint gid = get_global_id(0); __global const uint *ta = a + 128 * gid; __global const uint *tb = b + 128 * gid; __global const uint *tc = c + 128 * gid; __global uint *td = d + 128 * gid; uint i; for (i = 0; i < 128; ++i) lds[ta[i]] = tc[i]; barrier(CLK_LOCAL_MEM_FENCE); for (i = 0; i < 128; ++i) td[i] = lds[tb[i]]; } __kernel void the_kernel2(__global uint *d) { __local uint lds[8192]; uint i; uint gid = get_global_id(0); for (i = 0; i < 128; ++i) lds[i] = d[gid]; barrier(CLK_LOCAL_MEM_FENCE); for (i = 0; i < 128; ++i) d[gid] = lds[i]; }); #endif // EMU_ENV static void fill(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d, uint32_t *e) { uint32_t i, j, k, t; static uint32_t p[LDS_SIZE / 4]; static int is_set = 0; if (!is_set) { for (i = 0; i < LDS_SIZE / 4; ++i) p[i] = i; is_set = 1; } for (j = 0; j < A_SIZE / LDS_SIZE; ++j) { for (i = 0; i < LDS_SIZE / 4; ++i) { k = rand() % (LDS_SIZE / 4); t = p[i]; p[i] = p[k]; p[k] = t; c[i] = rand(); } memcpy(a, p, LDS_SIZE); for (i = 0; i < LDS_SIZE / 4; ++i) { k = rand() % (LDS_SIZE / 4); t = p[i]; p[i] = p[k]; p[k] = t; d[i] = 0xfeedbeefU; } memcpy(b, p, LDS_SIZE); a += LDS_SIZE / 4; b += LDS_SIZE / 4; c += LDS_SIZE / 4; d += LDS_SIZE / 4; } } static int check(const uint32_t *a, const uint32_t *b, const uint32_t *c, const uint32_t *d, const uint32_t *e) { uint32_t i, j, t; uint32_t lds[LDS_SIZE / 4]; for (j = 0; j < A_SIZE / LDS_SIZE; ++j) { for (i = 0; i < LDS_SIZE / 4; ++i) lds[i] = 0xdeadbeef; for (i = 0; i < LDS_SIZE / 4; ++i) lds[a[i]] = c[i]; for (i = 0; i < LDS_SIZE / 4; ++i) { t = lds[b[i]]; if (d[i] != t) { printf("mismatch group %u thread %u element %u: %u instead of %u\n", j, i / 128, i % 128, d[i], t); return EXIT_FAILURE; } } a += LDS_SIZE / 4; b += LDS_SIZE / 4; c += LDS_SIZE / 4; d += LDS_SIZE / 4; } return EXIT_SUCCESS; } #ifndef E_SIZE #define E_SIZE 32 #endif void OCLLDS32K::setup_run(const char *cmplr_opt) { cl_ulong lsize; const char *ps[2]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], 
CL_DEVICE_LOCAL_MEM_SIZE, sizeof(lsize), &lsize, NULL); if (lsize < LDS_SIZE) { fprintf(stderr, "Passed! Test does not support 32kb of lds space!"); return; } // create the program ps[0] = program_source; program_ = _wrapper->clCreateProgramWithSource(context_, 1, ps, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); // build the program error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], cmplr_opt, NULL, NULL); if (error_ != CL_SUCCESS) { char build_log[16384]; size_t log_sz; fprintf(stderr, "build program failed, err=%d\n", error_); error_ = _wrapper->clGetProgramBuildInfo( program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, sizeof(build_log), build_log, &log_sz); if (error_ != CL_SUCCESS) fprintf(stderr, "failed to get build log, err=%d\n", error_); else fprintf(stderr, "----- Build Log -----\n%s\n----- ----- --- -----\n", build_log); return; } // create the kernel kernel_ = _wrapper->clCreateKernel(program_, "the_kernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "create a kernel failed"); // create the kernel kernel2_ = _wrapper->clCreateKernel(program_, "the_kernel2", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "create a kernel failed"); // allocate the buffer memory objects a_buf_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY, A_SIZE, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "create a buffer a failed"); buffers_.push_back(a_buf_); b_buf_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY, B_SIZE, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "create a buffer b failed"); buffers_.push_back(b_buf_); c_buf_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY, C_SIZE, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "create a buffer c failed"); buffers_.push_back(c_buf_); d_buf_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, D_SIZE, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "create a buffer d failed"); buffers_.push_back(d_buf_); e_buf_ = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, E_SIZE, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "create a buffer e failed"); buffers_.push_back(e_buf_); // set the args values error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&a_buf_); error_ |= _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&b_buf_); error_ |= _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_mem), (void *)&c_buf_); error_ |= _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_mem), (void *)&d_buf_); error_ |= _wrapper->clSetKernelArg(kernel_, 4, sizeof(cl_mem), (void *)&e_buf_); CHECK_RESULT((error_ != CL_SUCCESS), "setkernelArg failed!"); error_ = _wrapper->clSetKernelArg(kernel2_, 0, sizeof(cl_mem), (void *)&d_buf_); CHECK_RESULT((error_ != CL_SUCCESS), "setkernelArg failed!"); } void OCLLDS32K::cleanup_run() { if (kernel2_) { _wrapper->clReleaseKernel(kernel2_); } } void OCLLDS32K::exec_kernel(void *a_mem, void *b_mem, void *c_mem, void *d_mem, void *e_mem) { size_t global_work_size[1]; size_t local_work_size[1]; // Send data to device error_ = _wrapper->clEnqueueWriteBuffer( cmdQueues_[_deviceId], a_buf_, CL_TRUE, 0, A_SIZE, a_mem, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueWritebuffer failed"); error_ = _wrapper->clEnqueueWriteBuffer( cmdQueues_[_deviceId], b_buf_, CL_TRUE, 0, B_SIZE, b_mem, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueWritebuffer failed"); error_ = _wrapper->clEnqueueWriteBuffer( cmdQueues_[_deviceId], c_buf_, CL_TRUE, 0, C_SIZE, c_mem, 0, NULL, NULL); 
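  // Added note: a and b hold the two index permutations (scatter indices for
  // the LDS write, gather indices for the LDS read) and c holds the payload;
  // all three are staged with blocking writes. exec_kernel then launches the
  // same LDS kernel three times back to back and reads d and e back
  // synchronously before the final clFinish.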
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueWritebuffer failed");

  // set work-item dimensions
  global_work_size[0] = GLOBAL_WORK_SIZE;
  local_work_size[0] = LOCAL_WORK_SIZE;

  // execute kernel
  error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1,
                                            NULL, global_work_size,
                                            local_work_size, 0, NULL, NULL);
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel failed");

  // execute kernel
  error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1,
                                            NULL, global_work_size,
                                            local_work_size, 0, NULL, NULL);
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel failed");

  // execute kernel
  error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1,
                                            NULL, global_work_size,
                                            local_work_size, 0, NULL, NULL);
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel failed");

  // read results
  error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], d_buf_,
                                         CL_TRUE, 0, D_SIZE, d_mem, 0, NULL,
                                         NULL);
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed");
  error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], e_buf_,
                                         CL_TRUE, 0, E_SIZE, e_mem, 0, NULL,
                                         NULL);
  CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed");
  error_ = _wrapper->clFinish(cmdQueues_[_deviceId]);
  CHECK_RESULT((error_ != CL_SUCCESS), "clFinish failed");
}

const char *OCLLDS32K::kernel_src = "";

static void CL_CALLBACK notify_callback(const char *errinfo,
                                        const void *private_info, size_t cb,
                                        void *user_data) {}

OCLLDS32K::OCLLDS32K() { _numSubTests = 1; }

OCLLDS32K::~OCLLDS32K() {}

void OCLLDS32K::open(unsigned int test, char *units, double &conversion,
                     unsigned int deviceId) {
  _deviceId = deviceId;
  testID_ = test;
  OCLTestImp::open(test, units, conversion, _deviceId);
}

void OCLLDS32K::run(void) {
  void *a;
  void *b;
  void *c;
  void *d;
  void *e;
  const char *cmplr_opt = NULL;
  int j, nj;
  double f, dj, p;

  nj = 5;
  setup_run(cmplr_opt);
  CHECK_RESULT((error_ != CL_SUCCESS), "setup_run failed!");

  p = 10.0;
  dj = 100.0 / (double)nj;

  a = malloc(A_SIZE);
  CHECK_RESULT((a == NULL), "malloc failed");
  memset(a, 0, A_SIZE);
  b = malloc(B_SIZE);
  CHECK_RESULT((b == NULL), "malloc failed");
  memset(b, 0, B_SIZE);
  c = malloc(C_SIZE);
  CHECK_RESULT((c == NULL), "malloc failed");
  memset(c, 0, C_SIZE);
  d = malloc(D_SIZE);
  CHECK_RESULT((d == NULL), "malloc failed");
  memset(d, 0, D_SIZE);
  e = malloc(E_SIZE);
  CHECK_RESULT((e == NULL), "malloc failed");
  memset(e, 0, E_SIZE);

  // printf("Testing " TEST_NAME " on %s\n", argv[1]);
  for (j = 0; j < nj; ++j) {
    fill((uint32_t *)a, (uint32_t *)b, (uint32_t *)c, (uint32_t *)d,
         (uint32_t *)e);
    // printf("%s Test %d: ", sDevice, j);
    exec_kernel(a, b, c, d, e);
    CHECK_RESULT((error_ != CL_SUCCESS), "exec_kernel failed!");
    // check() returns EXIT_SUCCESS or EXIT_FAILURE, both non-negative,
    // so compare against EXIT_SUCCESS rather than testing for a negative
    // return value
    CHECK_RESULT((check((uint32_t *)a, (uint32_t *)b, (uint32_t *)c,
                        (uint32_t *)d, (uint32_t *)e) != EXIT_SUCCESS),
                 " Failed!\n");
    f = (j + 1) * dj;
    if (nj > 1 && f >= p) {
      // printf("%.1lf%%...\n", f);
      // fflush(stdout);
      p += 10.0;
    }
  }

  // release the host staging buffers
  free(a);
  free(b);
  free(c);
  free(d);
  free(e);
}

unsigned int OCLLDS32K::close(void) {
  cleanup_run();
  return OCLTestImp::close();
}

clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLLDS32K.h000066400000000000000000000034321450307266000235620ustar00rootroot00000000000000/*
Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_LDS32K_H_ #define _OCL_LDS32K_H_ #include "OCLTestImp.h" class OCLLDS32K : public OCLTestImp { public: OCLLDS32K(); virtual ~OCLLDS32K(); public: virtual void open(unsigned int test, char *units, double &conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); void setup_run(const char *cmplr_opt); void cleanup_run(); void exec_kernel(void *a_mem, void *b_mem, void *c_mem, void *d_mem, void *e_mem); static const char *kernel_src; cl_kernel kernel2_; private: unsigned int testID_; cl_mem a_buf_; cl_mem b_buf_; cl_mem c_buf_; cl_mem d_buf_; cl_mem e_buf_; }; #endif // _OCL_LDS32K_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLLinearFilter.cpp000066400000000000000000000175441450307266000256040ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLLinearFilter.h" #include #include #include #include "CL/cl.h" const static size_t ImageSize = 4; const static char* strKernel = "const sampler_t g_Sampler = CLK_FILTER_LINEAR | \n" " CLK_ADDRESS_CLAMP_TO_EDGE | \n" " CLK_NORMALIZED_COORDS_FALSE; \n" " \n" "__kernel void linear3D(__read_only image3d_t img3D, __global float4* " "f4Tata) \n" "{ \n" " float4 f4Index = { 2.25f, 1.75f, 0.5f, 0.0f }; \n" " // copy interpolated data in result buffer \n" " f4Tata[0] = read_imagef(img3D, g_Sampler, f4Index); \n" "} \n" " \n" "__kernel void linear2D(__read_only image2d_t img2D, __global float4* " "f4Tata) \n" "{ \n" " float2 f2Index = { 2.25f, 1.75f }; \n" " // copy interpolated data in result buffer \n" " f4Tata[0] = read_imagef(img2D, g_Sampler, f2Index); \n" "} \n" " \n"; OCLLinearFilter::OCLLinearFilter() { done_ = false; _numSubTests = 2; } OCLLinearFilter::~OCLLinearFilter() {} void OCLLinearFilter::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); cl_bool imageSupport; size_t size; for (size_t i = 0; i < deviceCount_; ++i) { _wrapper->clGetDeviceInfo(devices_[i], CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport), &imageSupport, &size); if (!imageSupport) { testDescString = "Image not supported, skipping this test! "; done_ = true; return; } } program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); const char* kernels[2] = {"linear3D", "linear2D"}; kernel_ = _wrapper->clCreateKernel(program_, kernels[test], &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem memory; size_t offset[3] = {0, 0, 0}; cl_image_format imageFormat = {CL_RGBA, CL_FLOAT}; if (test == 0) { float data[ImageSize][ImageSize][ImageSize][4]; float index = 0.f; size_t region[3] = {ImageSize, ImageSize, ImageSize}; for (size_t z = 0; z < ImageSize; ++z) { for (size_t y = 0; y < ImageSize; ++y) { for (size_t x = 0; x < ImageSize; ++x) { data[z][y][x][0] = (float)x; data[z][y][x][1] = (float)y; data[z][y][x][2] = (float)z; data[z][y][x][3] = 1.0f; } } } memory = _wrapper->clCreateImage3D(context_, CL_MEM_READ_ONLY, &imageFormat, ImageSize, ImageSize, ImageSize, 0, 0, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateImage() failed"); error_ = _wrapper->clEnqueueWriteImage(cmdQueues_[_deviceId], memory, true, offset, region, 0, 0, data, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteImage() failed"); } else { float data[4][ImageSize][ImageSize]; size_t region[3] = {ImageSize, ImageSize, 1}; for (size_t y = 0; y < ImageSize; ++y) { for (size_t x = 0; x < ImageSize; ++x) { data[y][x][0] = (float)x; data[y][x][1] = (float)y; data[y][x][2] = data[y][x][3] = 1.0f; } } memory = _wrapper->clCreateImage2D(context_, CL_MEM_READ_ONLY, &imageFormat, ImageSize, ImageSize, 0, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateImage() failed"); error_ = _wrapper->clEnqueueWriteImage(cmdQueues_[_deviceId], memory, true, offset, region, 0, 0, data, 0, NULL, 
NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteImage() failed"); } buffers_.push_back(memory); memory = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, 4 * sizeof(cl_float), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(memory); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLLinearFilter::run(void) { if (done_) { return; } cl_float values[4] = {0.f, 0.f, 0.f, 0.f}; cl_float ref[2] = {1.75f, 1.25f}; cl_mem image = buffers()[0]; cl_mem buffer = buffers()[1]; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &image); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t gws[1] = {0x1}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffer, true, 0, 4 * sizeof(cl_float), values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); for (cl_uint i = 0; i < 2; ++i) { if (values[i] != ref[i]) { printf("%.2f != %.2f [ref]", values[i], ref[i]); CHECK_RESULT(true, " - Incorrect result for linear filtering!\n"); } } } unsigned int OCLLinearFilter::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLLinearFilter.h000066400000000000000000000030051450307266000252340ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_LINEAR_FILTER_H_ #define _OCL_LINEAR_FILTER_H_ #include "OCLTestImp.h" class OCLLinearFilter : public OCLTestImp { public: OCLLinearFilter(); virtual ~OCLLinearFilter(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool done_; }; #endif // _OCL_LINEAR_FILTER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLMapCount.cpp000066400000000000000000000071721450307266000247460ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
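// ---------------------------------------------------------------------------
// Hedged note on the reference values used by OCLLinearFilter::run() above.
// With CLK_NORMALIZED_COORDS_FALSE, OpenCL linear filtering samples
// coordinate u as i0 = floor(u - 0.5), a = (u - 0.5) - i0, then blends
// (1 - a) * texel[i0] + a * texel[i0 + 1]. The test images store texel[x] = x:
//   x: u = 2.25 -> i0 = 1, a = 0.75 -> 0.25 * 1 + 0.75 * 2 = 1.75
//   y: v = 1.75 -> i0 = 1, a = 0.25 -> 0.75 * 1 + 0.25 * 2 = 1.25
// which is exactly ref[] = {1.75f, 1.25f}. A host-side mirror of the blend
// (illustrative helper, requires <math.h> for floorf):
static float lerp_texel_1d(const float *texels, float u) {
  float uc = u - 0.5f;
  int i0 = (int)floorf(uc);  // lower texel index
  float a = uc - (float)i0;  // weight toward the upper texel
  return (1.0f - a) * texels[i0] + a * texels[i0 + 1];
}
// ---------------------------------------------------------------------------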
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLMapCount.h" #include #include #include #include "CL/cl.h" OCLMapCount::OCLMapCount() { _numSubTests = 1; } OCLMapCount::~OCLMapCount() {} void OCLMapCount::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); size_t size; clMemWrapper memObject; // Get the address alignment, so we can make sure the sub buffer test later // works properly cl_uint addressAlign; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_MEM_BASE_ADDR_ALIGN, sizeof(addressAlign), &addressAlign, NULL); if (addressAlign < 128) addressAlign = 128; void* void_buffer = malloc(addressAlign * 4); // Create a buffer to test against memObject = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, addressAlign * 4, void_buffer, &error_); if (error_) { free(void_buffer); printf("Unable to create buffer to test"); } // Map buffer void* mapped = _wrapper->clEnqueueMapBuffer( cmdQueues_[deviceId], memObject, true, CL_MAP_READ, 0, addressAlign * 4, 0, NULL, NULL, &error_); cl_uint mapCount; // Find the number of mappings on buffer after map error_ = _wrapper->clGetMemObjectInfo(memObject, CL_MEM_MAP_COUNT, sizeof(mapCount), &mapCount, &size); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to get mem object map count"); if (mapCount != 1) { printf( "ERROR: Returned mem object map count does not validate! (expected %d, " "got %d)\n", 1, mapCount); return; } // Unmap buffer error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[deviceId], memObject, mapped, 0, NULL, NULL); // Find the number of mappings on buffer after unmap error_ = _wrapper->clGetMemObjectInfo(memObject, CL_MEM_MAP_COUNT, sizeof(mapCount), &mapCount, &size); CHECK_RESULT((error_ != CL_SUCCESS), "Unable to get mem object map count"); if (mapCount != 0) { printf( "ERROR: Returned mem object map count does not validate! (expected %d, " "got %d)\n", 0, mapCount); return; } } void OCLMapCount::run(void) {} unsigned int OCLMapCount::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLMapCount.h000066400000000000000000000036111450307266000244050ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
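// ---------------------------------------------------------------------------
// Hedged sketch extending the CL_MEM_MAP_COUNT check above to nested maps.
// The spec describes the map count as informational/debug data, so the exact
// values are a runtime expectation rather than a guarantee; q, buf and size
// are placeholder handles, and error handling is elided for brevity.
//   void* p0 = clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_READ, 0, size,
//                                 0, NULL, NULL, NULL);
//   void* p1 = clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_READ, 0, size,
//                                 0, NULL, NULL, NULL);
//   cl_uint count = 0;
//   clGetMemObjectInfo(buf, CL_MEM_MAP_COUNT, sizeof(count), &count, NULL);
//   // expect 2: two outstanding mappings
//   clEnqueueUnmapMemObject(q, buf, p1, 0, NULL, NULL);
//   clEnqueueUnmapMemObject(q, buf, p0, 0, NULL, NULL);
//   clFinish(q);
//   clGetMemObjectInfo(buf, CL_MEM_MAP_COUNT, sizeof(count), &count, NULL);
//   // expect 0: all mappings released
// ---------------------------------------------------------------------------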
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

#ifndef _OCL_MAP_COUNT_H_
#define _OCL_MAP_COUNT_H_

#include "OCLTestImp.h"

class OCLMapCount : public OCLTestImp {
 public:
  OCLMapCount();
  virtual ~OCLMapCount();

 public:
  virtual void open(unsigned int test, char* units, double& conversion,
                    unsigned int deviceID);
  virtual void run(void);
  virtual unsigned int close(void);
};

// RAII helper that releases its cl_mem when it goes out of scope; defined
// inside the include guard so that a second inclusion of this header does not
// redefine the class.
class clMemWrapper {
 public:
  clMemWrapper() { mMem = NULL; }
  clMemWrapper(cl_mem mem) { mMem = mem; }
  ~clMemWrapper() {
    if (mMem != NULL) clReleaseMemObject(mMem);
  }
  clMemWrapper& operator=(const cl_mem& rhs) {
    mMem = rhs;
    return *this;
  }
  operator cl_mem() { return mMem; }
  cl_mem* operator&() { return &mMem; }
  bool operator==(const cl_mem& rhs) { return mMem == rhs; }

 protected:
  cl_mem mMem;
};

#endif  // _OCL_MAP_COUNT_H_
clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLMemDependency.cpp000066400000000000000000000144001450307266000257250ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. */

#include "OCLMemDependency.h"

#include #include #include

#include "CL/cl.h"

const static cl_uint Stages = 4;
const static cl_uint ThreadsForCheck = 1 << Stages;

#define KERNEL_CODE(...)
#__VA_ARGS__ const static char* strKernel = KERNEL_CODE( \n __kernel void bitonicSort(__global uint2* keys, uint stage, uint pass) { const uint thread = get_global_id(0); const uint pairDistance = 1 << (stage - pass); /* The purpose of this is to introduce an additional zero at stage - pass * bit*/ const uint leftID = (thread & (pairDistance - 1)) | ((thread & ~(pairDistance - 1)) << 1); /* Is the same as below */ const uint direction = ((thread >> stage) & 1) == 1 ? 0 : 1; const uint rightID = leftID + pairDistance; const uint2 left = keys[leftID]; const uint2 right = keys[rightID]; const uint2 larger = left.x > right.x ? left : right; const uint2 smaller = left.x > right.x ? right : left; keys[leftID] = direction ? smaller : larger; keys[rightID] = direction ? larger : smaller; } \n); OCLMemDependency::OCLMemDependency() { _numSubTests = 1; } OCLMemDependency::~OCLMemDependency() {} void OCLMemDependency::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); char dbuffer[1024] = {0}; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "bitonicSort", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, ThreadsForCheck * sizeof(cl_uint2), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); cl_buffer_region reg = {0, ThreadsForCheck * sizeof(cl_uint2)}; buffer = _wrapper->clCreateSubBuffer(buffers()[0], CL_MEM_READ_WRITE, CL_BUFFER_CREATE_TYPE_REGION, ®, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLMemDependency::run(void) { cl_uint2 values[ThreadsForCheck] = { {{3, 0}}, {{1, 5}}, {{4, 6}}, {{2, 4}}, {{0, 3}}, {{5, 10}}, {{15, 7}}, {{13, 8}}, {{10, 2}}, {{9, 1}}, {{7, 11}}, {{11, 9}}, {{14, 12}}, {{12, 14}}, {{6, 13}}, {{8, 15}}}; cl_uint2 reference[ThreadsForCheck] = { {{0, 3}}, {{1, 5}}, {{3, 0}}, {{2, 4}}, {{4, 6}}, {{5, 10}}, {{6, 13}}, {{8, 15}}, {{7, 11}}, {{9, 1}}, {{10, 2}}, {{11, 9}}, {{14, 12}}, {{12, 14}}, {{15, 7}}, {{13, 8}}}; cl_uint2 results[ThreadsForCheck]; cl_mem buffer = buffers()[0]; error_ = _wrapper->clEnqueueWriteBuffer(cmdQueues_[_deviceId], buffer, true, 0, sizeof(values), values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); size_t gws[1] = {ThreadsForCheck}; for (unsigned int i = 0; i < Stages; ++i) { buffer = buffers()[i % 2]; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); for (unsigned int j = 0; j < i; ++j) { error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(unsigned int), &i); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = 
_wrapper->clSetKernelArg(kernel_, 2, sizeof(unsigned int), &j); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel( cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } } buffer = buffers()[0]; error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffer, true, 0, sizeof(results), results, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); for (unsigned int i = 0; i < ThreadsForCheck; ++i) { if ((results[i].s[0] != reference[i].s[0]) || (results[i].s[1] != reference[i].s[1])) { CHECK_RESULT(true, "Incorrect result for dependency!\n"); } } } unsigned int OCLMemDependency::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLMemDependency.h000066400000000000000000000027621450307266000254020ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_MEM_DEPENDENCY_H_ #define _OCL_MEM_DEPENDENCY_H_ #include "OCLTestImp.h" class OCLMemDependency : public OCLTestImp { public: OCLMemDependency(); virtual ~OCLMemDependency(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); }; #endif // _OCL_MEM_DEPENDENCY_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLMemObjs.cpp000066400000000000000000000105051450307266000245460ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
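// ---------------------------------------------------------------------------
// Hedged sketch: a host-side mirror of the bitonicSort kernel's index
// arithmetic above, convenient for reasoning about the stage/pass loop in
// OCLMemDependency::run(). The arguments have the same meaning as the kernel
// parameters.
static void bitonic_ids(unsigned thread, unsigned stage, unsigned pass,
                        unsigned *leftID, unsigned *rightID,
                        unsigned *ascending) {
  unsigned pairDistance = 1u << (stage - pass);
  // splice an extra zero bit into 'thread' at bit position (stage - pass)
  *leftID = (thread & (pairDistance - 1)) |
            ((thread & ~(pairDistance - 1)) << 1);
  *rightID = *leftID + pairDistance;
  // blocks alternate sort direction from stage to stage
  *ascending = (((thread >> stage) & 1) == 1) ? 0u : 1u;
}
// ---------------------------------------------------------------------------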
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLMemObjs.h" #include #include #include #include #include #include #include const char* OCLMemObjs::kernel_src = ""; static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} OCLMemObjs::OCLMemObjs() { _numSubTests = 1; } OCLMemObjs::~OCLMemObjs() {} void OCLMemObjs::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { _crcword = 0; conversion = 1.0f; _deviceId = deviceId; } int OCLMemObjs::test(void) { cl_int err; std::vector platforms; cl::Platform::get(&platforms); if (platforms.empty()) { std::cerr << "Platform::get() failed \n"; return EXIT_FAILURE; } cl_context_properties properties[] = { CL_CONTEXT_PLATFORM, (cl_context_properties)(platforms[0])(), 0}; cl::Context context(CL_DEVICE_TYPE_ALL, properties, NULL, NULL, &err); if (err != CL_SUCCESS) { std::cerr << "Context::Context() failed (" << err << ")\n"; return EXIT_FAILURE; } std::vector devices = context.getInfo(); if (err != CL_SUCCESS) { std::cerr << "Context::getInfo() failed (" << err << ")\n"; return EXIT_FAILURE; } if (devices.size() == 0) { std::cerr << "No device available\n"; return EXIT_FAILURE; } const char source[] = "__kernel void test_memobjs(__global int* ptr) {}"; cl::Program::Sources sources(1, std::make_pair(source, 0)); cl::Program program(context, sources, &err); if (err != CL_SUCCESS) { std::cerr << "Program::Program() failed (" << err << ")\n"; return EXIT_FAILURE; } err = program.build(devices); if (err != CL_SUCCESS) { std::cerr << "Program::build() failed (" << err << ")\n"; return EXIT_FAILURE; } cl::Kernel kernel(program, "test_memobjs", &err); if (err != CL_SUCCESS) { std::cerr << "Kernel::Kernel() failed (" << err << ")\n"; return EXIT_FAILURE; } if (err != CL_SUCCESS) { std::cerr << "Kernel::setArg() failed (" << err << ")\n"; return EXIT_FAILURE; } cl::CommandQueue queue(context, devices[0], 0, &err); if (err != CL_SUCCESS) { std::cerr << "CommandQueue::CommandQueue() failed (" << err << ")\n"; return EXIT_FAILURE; } cl::Buffer buffer(context, (cl_mem_flags)0, 1024, NULL, &err); if (err != CL_SUCCESS) { std::cerr << "Buffer::Buffer() failed (" << err << ")\n"; return EXIT_FAILURE; } err = kernel.setArg(0, buffer); if (err != CL_SUCCESS) { std::cerr << "Kernel::setArg() failed (" << err << ")\n"; return EXIT_FAILURE; } err = queue.enqueueTask(kernel); if (err != CL_SUCCESS) { std::cerr << "CommandQueue::enqueueTask() failed (" << err << ")\n"; } // Force a clReleaseMemoryObject on buffer before dispatch. buffer = cl::Buffer(); err = queue.finish(); if (err != CL_SUCCESS) { std::cerr << "CommandQueue::finish() failed (" << err << ")\n"; } // std::cout << " Test: Pass!\n"; return EXIT_SUCCESS; } void OCLMemObjs::run(void) { CHECK_RESULT((test() != EXIT_SUCCESS), "test failed"); } unsigned int OCLMemObjs::close(void) { return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLMemObjs.h000066400000000000000000000030601450307266000242110ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
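// ---------------------------------------------------------------------------
// Hedged note + sketch: the dependency test alternates kernels between a
// parent buffer and a sub-buffer aliasing the parent's entire range, so the
// runtime must serialize accesses through either handle. Creating such an
// alias looks like this (parent, bytes and err are placeholders; an origin of
// 0 always satisfies the CL_DEVICE_MEM_BASE_ADDR_ALIGN requirement):
//   cl_buffer_region region = {0, bytes};
//   cl_mem alias = clCreateSubBuffer(parent, CL_MEM_READ_WRITE,
//                                    CL_BUFFER_CREATE_TYPE_REGION, &region,
//                                    &err);
// Kernels enqueued against 'parent' and 'alias' in any order must observe
// each other's writes; that is what the bitonic result check verifies.
// ---------------------------------------------------------------------------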
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_Mem_Objs_H_ #define _OCL_Mem_Objs_H_ #include "CL/cl.h" #include "OCLTestImp.h" class OCLMemObjs : public OCLTestImp { public: OCLMemObjs(); virtual ~OCLMemObjs(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); int test(void); static const char* kernel_src; private: cl_int error; }; #endif // _OCL_Mem_Objs_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLMemoryInfo.cpp000066400000000000000000000167571450307266000253150ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLMemoryInfo.h" #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" OCLMemoryInfo::OCLMemoryInfo() { // Run the second test with 64 bit only _numSubTests = (sizeof(int*) == 8) ? 
2 : 1; failed_ = false; } OCLMemoryInfo::~OCLMemoryInfo() {} void OCLMemoryInfo::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { _deviceId = deviceId; test_ = test; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } char name[1024] = {0}; size_t size = 0; _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_EXTENSIONS, 1024, name, &size); if (!strstr(name, "cl_amd_device_attribute_query")) { printf("AMD device attribute extension is required for this test!\n"); failed_ = true; return; } // Observed failures with APUs on GSL path due to incorrect available memory, // reported for visible heap cl_bool is_apu = false; error_ = clGetDeviceInfo(devices_[deviceId], CL_DEVICE_HOST_UNIFIED_MEMORY, sizeof(cl_bool), &is_apu, nullptr); if (is_apu && (test == 1)) { printf("Test not supported for apus, skipping...\n"); failed_ = true; return; } } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLMemoryInfo::run(void) { if (failed_) { return; } #if EMU_ENV size_t BufSize = 0x10000; #else size_t BufSize = 0x1000000; #endif // EMU_ENV bool succeed = false; bool done = false; if (test_ == 0) { // use multiple loops to make sure the failure case is not caused // by reusing the allocation from the cached memory pool for (int i = 0; i < 5 && !done; i++) { cl_mem buffer; size_t memoryInfo[2]; _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_GLOBAL_FREE_MEMORY_AMD, 2 * sizeof(size_t), memoryInfo, NULL); buffer = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, BufSize * sizeof(cl_int4), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); unsigned int* values; values = reinterpret_cast(new cl_int4[BufSize]); // Clear destination buffer memset(values, 0, BufSize * sizeof(cl_int4)); error_ = _wrapper->clEnqueueWriteBuffer( cmdQueues_[_deviceId], buffer, CL_TRUE, 0, BufSize * sizeof(cl_int4), values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); size_t memoryInfo2[2]; _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_GLOBAL_FREE_MEMORY_AMD, 2 * sizeof(size_t), memoryInfo2, NULL); size_t dif = memoryInfo[0] - memoryInfo2[0]; if (dif == 0) { // the buffer memory may come from the cached memory pool BufSize *= 2; // double the size and try again } else if ((dif >= (static_cast(BufSize * sizeof(cl_int4) * 1.5f) / 1024)) || (dif <= ((BufSize * sizeof(cl_int4) / 2) / 1024))) { done = true; } else { succeed = true; done = true; } delete[] values; } } else { int i = 0; size_t sizeAll; size_t memoryInfo[2]; _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_GLOBAL_FREE_MEMORY_AMD, 2 * sizeof(size_t), memoryInfo, NULL); unsigned int* values; values = reinterpret_cast(new cl_int4[BufSize]); memset(values, 0, BufSize * sizeof(cl_int4)); // Loop a few times to make sure the results are consistent for (int k = 0; k < 3; ++k) { sizeAll = 0; while (true) { cl_mem buffer; buffer = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, BufSize * sizeof(cl_int4), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), 
"clCreateBuffer() failed"); buffers_.push_back(buffer); // Clear destination buffer error_ = _wrapper->clEnqueueWriteBuffer( cmdQueues_[_deviceId], buffer, CL_TRUE, 0, BufSize * sizeof(cl_int4), values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); sizeAll += BufSize * sizeof(cl_int4) / 1024; size_t memoryInfo2[2]; _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_GLOBAL_FREE_MEMORY_AMD, 2 * sizeof(size_t), memoryInfo2, NULL); #if EMU_ENV // For testing on emulator with 2G RAM and buffer size of x10000 if (memoryInfo2[0] < (0x3e000 + (BufSize * sizeof(cl_int4) / 1024))) { #else if (memoryInfo2[0] < (0x50000 + (BufSize * sizeof(cl_int4) / 1024))) { #endif // EMU_ENV break; } size_t dif = memoryInfo[0] - memoryInfo2[0]; // extra memory could be allocated/destroyed in the driver if (dif == 0) { // the buffer memory may come from the cached memory pool } else if ((dif / sizeAll) == 1 || (sizeAll / dif) == 1) { succeed = true; } else { succeed = false; break; } ++i; } for (auto& it : buffers()) { error_ = _wrapper->clReleaseMemObject(it); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseMemObject() failed"); } buffers_.clear(); if (!succeed) { break; } } delete[] values; } if (!succeed) { CHECK_RESULT(true, "Reported free memory doesn't match allocated size!"); } } unsigned int OCLMemoryInfo::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLMemoryInfo.h000066400000000000000000000030151450307266000247410ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_MEMORY_INFO_H_ #define _OCL_MEMORY_INFO_H_ #include "OCLTestImp.h" class OCLMemoryInfo : public OCLTestImp { public: OCLMemoryInfo(); virtual ~OCLMemoryInfo(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; uint32_t test_; }; #endif // _OCL_MEMORY_INFO_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLMultiQueue.cpp000066400000000000000000000224451450307266000253170ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLMultiQueue.h" #include #include #include #include #include #include "CL/cl.h" const static char* strKernel = "__kernel void \n" "copyInc(__global uint* dst, __global uint* src) \n" "{ \n" " uint index = get_global_id(0); \n" " \n" " dst[index] = src[index] + 1; \n" "} \n"; static bool useGPU = true; static const cl_uint NumQueues = 8; // must be power of 2 static cl_uint NumElements = 4096; static const cl_uint NumRuns = 16384; static const cl_uint ExecutionsPerQueue = 256; std::stringstream lerror; class MemTransfer { public: MemTransfer(OCLWrapper* wrapper, cl_context context, cl_command_queue queue, cl_uint numElements) : wrapper_(wrapper), context_(context), queue_(queue), numElements_(numElements), count_(0) {} ~MemTransfer() { wrapper_->clReleaseMemObject(dst_); wrapper_->clReleaseMemObject(src_); } bool create() { cl_int err; size_t size = numElements_ * sizeof(cl_uint); cl_uint* data = new cl_uint[numElements_]; memset(data, 0, size); src_ = wrapper_->clCreateBuffer(context_, CL_MEM_COPY_HOST_PTR, size, data, &err); if (src_ == NULL) { lerror << "clReleaseContext failed"; delete[] data; return false; } dst_ = wrapper_->clCreateBuffer(context_, 0, size, NULL, &err); if (dst_ == NULL) { lerror << "clCreateBuffer() failed"; delete[] data; return false; } delete[] data; return true; } bool run(cl_kernel kernel) { size_t global_work_size[1]; size_t local_work_size[1]; size_t size = numElements_ * sizeof(cl_uint); global_work_size[0] = (numElements_ + 63) / 64 * 64; local_work_size[0] = 64; if (CL_SUCCESS != wrapper_->clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&dst_)) { return false; } if (CL_SUCCESS != wrapper_->clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*)&src_)) { return false; } if (CL_SUCCESS != wrapper_->clEnqueueNDRangeKernel( queue_, kernel, 1, NULL, (const size_t*)global_work_size, (const size_t*)local_work_size, 0, NULL, NULL)) { lerror << "clEnqueueNDRangeKernel() failed"; return false; } // Copy dst into src if (CL_SUCCESS != wrapper_->clEnqueueCopyBuffer(queue_, dst_, src_, 0, 0, size, 0, 0, NULL)) { lerror << "clEnqueueCopyBuffer() failed"; return false; } count_++; return true; } bool check() { size_t size = numElements_ * sizeof(cl_uint); cl_event event; void* ptr = wrapper_->clEnqueueMapBuffer(queue_, src_, CL_TRUE, CL_MAP_READ, 0, size, 0, NULL, NULL, NULL); cl_uint* data = reinterpret_cast(ptr); for (cl_uint i = 0; i < numElements_; ++i) { if (data[i] != count_) { return false; } } wrapper_->clEnqueueUnmapMemObject(queue_, src_, ptr, 0, NULL, 
&event); wrapper_->clWaitForEvents(1, &event); wrapper_->clReleaseEvent(event); return true; } void flush() { wrapper_->clFlush(queue_); } private: OCLWrapper* wrapper_; cl_context context_; cl_command_queue queue_; cl_uint numElements_; cl_uint count_; cl_mem dst_; cl_mem src_; }; MemTransfer* work[NumQueues]; bool test(cl_kernel, cl_uint, cl_uint); OCLMultiQueue::OCLMultiQueue() { _numSubTests = 0; for (cl_uint i = 1; i <= NumQueues; i <<= 1, _numSubTests++) ; failed_ = false; } OCLMultiQueue::~OCLMultiQueue() {} void OCLMultiQueue::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); test_ = test; cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { testDescString = "GPU device is required for this test!\n"; failed_ = true; return; } size_t maxWorkGroupSize = 1; cl_uint computePower = 1; error_ = _wrapper->clGetDeviceInfo( devices_[deviceId], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(maxWorkGroupSize), &maxWorkGroupSize, NULL); computePower *= static_cast(maxWorkGroupSize); cl_uint maxComputeUnits = 1; error_ = _wrapper->clGetDeviceInfo( devices_[deviceId], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(maxComputeUnits), &maxComputeUnits, NULL); computePower *= 32 * maxComputeUnits; NumElements = (NumElements < static_cast(computePower)) ? static_cast(computePower) : NumElements; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "copyInc", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); } void OCLMultiQueue::run(void) { if (failed_) { return; } // Run test cl_uint queues = 1 << test_; if (!test(kernel_, NumRuns / queues, queues)) { lerror << "We failed a test run!"; CHECK_RESULT(true, lerror.str().c_str()); } } unsigned int OCLMultiQueue::close(void) { return OCLTestImp::close(); } bool OCLMultiQueue::test(cl_kernel kernel, cl_uint numRuns, cl_uint numQueues) { cl_command_queue cmd_queue[NumQueues]; CPerfCounter timer; for (cl_uint i = 0; i < numQueues; ++i) { cmd_queue[i] = _wrapper->clCreateCommandQueue(context_, devices_[_deviceId], 0, &error_); if (cmd_queue[i] == (cl_command_queue)0) { _wrapper->clReleaseContext(context_); testDescString = "clCreateCommandQueue() failed"; return false; } work[i] = new MemTransfer(_wrapper, context_, cmd_queue[i], NumElements); if (work[i] == NULL || !work[i]->create()) { testDescString = "Test creation failed"; return false; } } timer.Reset(); timer.Start(); cl_uint dispatchCount = ExecutionsPerQueue / numQueues; for (cl_uint i = 0; i < numRuns; ++i) { for (cl_uint j = 0; j < numQueues; ++j) { if (!work[j]->run(kernel)) { testDescString = "Execution failed"; return false; } // Every queue should have a dispatch after 256 executions, // but the time for dispatch on each queue // will be shifted on 
dispatchCount if (((i % dispatchCount) == 0) && (((i / dispatchCount) % numQueues) == j)) { work[j]->flush(); } } } for (cl_uint i = 0; i < numQueues; ++i) { _wrapper->clFinish(cmd_queue[i]); } timer.Stop(); for (cl_uint j = 0; j < numQueues; ++j) { if (!work[j]->check()) { testDescString = "Result Check fails!"; return false; } } std::stringstream stream; stream << "Num Queues: " << numQueues << ", Executions Per Queue: "; stream.flags(std::ios::right | std::ios::showbase); stream.width(5); stream << numRuns; stream.precision(3); stream << ", Time: " << (float)(timer.GetElapsedTime()) << " seconds"; for (cl_uint i = 0; i < numQueues; ++i) { delete work[i]; _wrapper->clReleaseCommandQueue(cmd_queue[i]); } testDescString = stream.str(); return true; } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLMultiQueue.h000066400000000000000000000031271450307266000247600ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_MULTI_QUEUE_H_ #define _OCL_MULTI_QUEUE_H_ #include "OCLTestImp.h" class OCLMultiQueue : public OCLTestImp { public: OCLMultiQueue(); virtual ~OCLMultiQueue(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool test(cl_kernel kernel, cl_uint numRuns, cl_uint numQueues); bool failed_; unsigned int test_; }; #endif // _OCL_ASYNC_TRANSFER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLOfflineCompilation.cpp000066400000000000000000000174121450307266000267770ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
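// ---------------------------------------------------------------------------
// Hedged sketch of the staggered-flush schedule used by OCLMultiQueue::test()
// above: with N queues and a budget of ExecutionsPerQueue, each queue is
// flushed once every (ExecutionsPerQueue / N) iterations, offset per queue so
// submissions never bunch up. enqueue_work() and queue[] are hypothetical
// stand-ins for work[j]->run(kernel) and cmd_queue[j].
//   cl_uint dispatchCount = ExecutionsPerQueue / numQueues;
//   for (cl_uint i = 0; i < numRuns; ++i) {
//     for (cl_uint j = 0; j < numQueues; ++j) {
//       enqueue_work(j);
//       if (((i % dispatchCount) == 0) &&
//           (((i / dispatchCount) % numQueues) == j)) {
//         clFlush(queue[j]);  // submit this queue's backlog to the device
//       }
//     }
//   }
// ---------------------------------------------------------------------------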
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLOfflineCompilation.h" #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" #include "cl_kernel_info_amd.h" typedef CL_API_ENTRY cl_int(CL_API_CALL* clGetKernelInfoAMD_fn)( cl_kernel kernel, cl_device_id device, cl_kernel_info_amd param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret); clGetKernelInfoAMD_fn clGetKernelInfoAMDp; #define BLIT_KERNEL(...) #__VA_ARGS__ const char* strKernel12 = BLIT_KERNEL( \n const constant uint test = 1; __kernel void factorial(__global uint* out) { uint id = get_global_id(0); uint factorial = 1; out[id] = factorial + test; } \n); const char* strKernel20 = BLIT_KERNEL( \n const constant uint test = 1; global uint test2 = 0; __kernel void factorial(__global uint* out) { uint id = get_global_id(0); uint factorial = 1; out[id] = factorial + test; if (id == 0) { out[id] += test2++; } } \n); OCLOfflineCompilation::OCLOfflineCompilation() { _numSubTests = 1; } OCLOfflineCompilation::~OCLOfflineCompilation() {} void OCLOfflineCompilation::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { size_t nDevices = 0; cl_device_id* devices = NULL; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); _wrapper->clReleaseContext(context_); cl_context_properties cprops[5]; clGetKernelInfoAMDp = (clGetKernelInfoAMD_fn)clGetExtensionFunctionAddressForPlatform( platform_, "clGetKernelInfoAMD"); if (clGetKernelInfoAMDp == NULL) { testDescString = "clGetKernelInfoAMD not found!\n"; return; } // Utilize the CL_CONTEXT_OFFLINE_DEVICES_AMD platform option to allow for // the generation of binary kernel without target device installed in build // system. cprops[0] = CL_CONTEXT_PLATFORM; cprops[1] = (cl_context_properties)platform_; cprops[2] = CL_CONTEXT_OFFLINE_DEVICES_AMD; cprops[3] = (cl_context_properties)1; cprops[4] = (cl_context_properties)0; // end of options list marker // Create a context with all of the available devices. 
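// With CL_CONTEXT_OFFLINE_DEVICES_AMD enabled in cprops above, the context
  // also exposes devices the compiler can target but that are not physically
  // installed, which is what lets the build loop below generate binaries for
  // the whole offline device list.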
context_ = _wrapper->clCreateContextFromType(cprops, CL_DEVICE_TYPE_GPU, NULL, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateContextFromType() failed"); size_t deviceListSize = 0; error_ = _wrapper->clGetContextInfo(context_, CL_CONTEXT_NUM_DEVICES, sizeof(size_t), &deviceListSize, NULL); CHECK_RESULT(((error_ != CL_SUCCESS) || (deviceListSize == 0)), "clGetContextInfo() failed"); devices = (cl_device_id*)malloc(sizeof(cl_device_id) * deviceListSize); CHECK_RESULT((devices == NULL), "clGetContextInfo() failed"); memset(devices, 0, deviceListSize); error_ = _wrapper->clGetContextInfo(context_, CL_CONTEXT_DEVICES, sizeof(cl_device_id) * deviceListSize, devices, &nDevices); CHECK_RESULT((error_ != CL_SUCCESS), "clGetContextInfo() failed"); for (unsigned version = 1; version <= 2; ++version) { std::string options; const char* strKernel; switch (version) { case 1: options = ""; strKernel = strKernel12; break; case 2: options = "-cl-std=CL2.0"; strKernel = strKernel20; break; default: assert(false); return; } program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); for (unsigned int i = 0; i < deviceListSize; ++i) { char name[128]; char strVersion[128]; _wrapper->clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL); error_ = _wrapper->clGetDeviceInfo(devices[i], CL_DEVICE_VERSION, sizeof(strVersion), strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (version == 2 && strVersion[7] < '2') { continue; } // skipping the test on gfx9+ for now till we add compiler support for al // the gfx10+ subdevices cl_uint gfxip_major = 0; cl_uint gfxip_minor = 0; clGetDeviceInfo(devices[i], CL_DEVICE_GFXIP_MAJOR_AMD, sizeof(gfxip_major), &gfxip_major, NULL); clGetDeviceInfo(devices[i], CL_DEVICE_GFXIP_MINOR_AMD, sizeof(gfxip_minor), &gfxip_minor, NULL); printf("Building on %s, OpenCL version %s, (options '%s')\n", name, (version == 2 ? 
"2.0" : "1.2"), options.c_str()); error_ = _wrapper->clBuildProgram(program_, 1, &devices[i], options.c_str(), NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo( program_, devices[i], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); break; } kernel_ = _wrapper->clCreateKernel(program_, "factorial", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); size_t usedVGPRs = 0; error_ = clGetKernelInfoAMDp(kernel_, devices[i], CL_KERNELINFO_USED_VGPRS, sizeof(usedVGPRs), &usedVGPRs, NULL); CHECK_RESULT(((error_ != CL_SUCCESS) || (usedVGPRs == 0)), "clGetKernelInfoAMD() failed"); _wrapper->clReleaseKernel(kernel_); kernel_ = nullptr; size_t binSize; error_ = _wrapper->clGetProgramInfo(program_, CL_PROGRAM_BINARY_SIZES, sizeof(size_t), &binSize, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clGetProgramInfo() failed"); char* binary = new char[binSize]; error_ = _wrapper->clGetProgramInfo(program_, CL_PROGRAM_BINARIES, sizeof(char*), &binary, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clGetProgramInfo() failed"); delete[] binary; } if (version == 1) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT((error_ != CL_SUCCESS), "clReleaseProgram() failed"); } } free(devices); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLOfflineCompilation::run(void) {} unsigned int OCLOfflineCompilation::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLOfflineCompilation.h000066400000000000000000000030201450307266000264320ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_OFFLINE_COMPILATION_H_ #define _OCL_OFFLINE_COMPILATION_H_ #include "OCLTestImp.h" class OCLOfflineCompilation : public OCLTestImp { public: OCLOfflineCompilation(); virtual ~OCLOfflineCompilation(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); }; #endif // _OCL_OFFLINE_COMPILATION_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLP2PBuffer.cpp000066400000000000000000000232031450307266000247440ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLP2PBuffer.h" #include #include #include #include #include #include #include #include "CL/cl.h" const static size_t ChunkSize = 256 * 1024; const static int NumSizes = 5; const static int NumRuns = 4; const static int NumChunksArray[NumSizes] = {1, 4, 16, 32, 64}; const static size_t MaxSubTests = NumRuns * NumSizes; const static int NumIterArray[NumSizes] = {20, 15, 10, 10, 10}; OCLP2PBuffer::OCLP2PBuffer() { #ifdef CL_VERSION_2_0 _numSubTests = MaxSubTests; #else _numSubTests = 0; #endif failed_ = false; maxSize_ = 0; context0_ = nullptr; context1_ = nullptr; cmdQueue0_ = nullptr; cmdQueue1_ = nullptr; } OCLP2PBuffer::~OCLP2PBuffer() {} void OCLP2PBuffer::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { #ifdef CL_VERSION_2_0 cl_uint numPlatforms = 0; cl_platform_id platform = NULL; cl_uint num_devices = 0; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); if (deviceCount_ < 2) { printf("\nTwo GPUs are required to run P2P test\n"); failed_ = true; return; } testID_ = test; char name[1024] = {0}; size_t size = 0; _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_EXTENSIONS, 1024, name, &size); if (!strstr(name, "cl_amd_copy_buffer_p2p")) { printf("P2P extension is required for this test!\n"); failed_ = true; return; } _wrapper->clGetDeviceInfo(devices_[1], CL_DEVICE_EXTENSIONS, 1024, name, &size); if (!strstr(name, "cl_amd_copy_buffer_p2p")) { printf("P2P extension is required for this test!\n"); failed_ = true; return; } num_p2p_0_ = 0; _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_NUM_P2P_DEVICES_AMD, sizeof(num_p2p_0_), &num_p2p_0_, nullptr); if (num_p2p_0_ != 0) { cl_device_id* p2p = new cl_device_id[num_p2p_0_]; _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_P2P_DEVICES_AMD, sizeof(cl_device_id) * num_p2p_0_, p2p, nullptr); delete[] p2p; } num_p2p_1_ = 0; _wrapper->clGetDeviceInfo(devices_[1], CL_DEVICE_NUM_P2P_DEVICES_AMD, sizeof(num_p2p_1_), &num_p2p_1_, nullptr); if (num_p2p_1_ != 0) { cl_device_id* p2p = new cl_device_id[num_p2p_1_]; _wrapper->clGetDeviceInfo(devices_[1], CL_DEVICE_P2P_DEVICES_AMD, sizeof(cl_device_id) * num_p2p_1_, p2p, nullptr); delete[] p2p; } cl_context_properties props[3] = {CL_CONTEXT_PLATFORM, (cl_context_properties)platform, 0}; context0_ = _wrapper->clCreateContext(props, 1, &devices_[0], NULL, 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateContext#0 failed"); context1_ = _wrapper->clCreateContext(props, 1, &devices_[1], NULL, 0, 
&error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateContext#1 failed"); NumChunks = NumChunksArray[testID_ % NumSizes]; NumIter = NumIterArray[testID_ % NumSizes]; BufferSize = NumChunks * ChunkSize * sizeof(cl_uint); p2p_copy_ = (clEnqueueCopyBufferP2PAMD_fn)clGetExtensionFunctionAddressForPlatform( platform_, "clEnqueueCopyBufferP2PAMD"); if (p2p_copy_ == NULL) { testDescString = "Failed to initialize P2P extension!\n"; failed_ = true; return; } cl_queue_properties prop[] = {CL_QUEUE_PROPERTIES, 0, 0}; cmdQueue0_ = _wrapper->clCreateCommandQueueWithProperties( context0_, devices_[0], prop, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueueWithProperties() failed"); cmdQueue1_ = _wrapper->clCreateCommandQueueWithProperties( context1_, devices_[1], prop, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueueWithProperties() failed"); size_t chunkSize = ChunkSize; cl_mem buf = NULL; cl_uint memFlags = 0; buf = _wrapper->clCreateBuffer(context0_, CL_MEM_READ_ONLY | memFlags, BufferSize, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buf); buf = _wrapper->clCreateBuffer(context1_, memFlags, BufferSize, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buf); #endif } void OCLP2PBuffer::run(void) { #ifdef CL_VERSION_2_0 if (failed_) { return; } size_t finalBuf = 0; cl_uint subTest = (testID_ / NumSizes) % 2; cl_uint* buffer = new cl_uint[NumChunks * ChunkSize]; cl_uint* buffer2 = new cl_uint[NumChunks * ChunkSize]; cl_event event; memset(buffer, 0x23, BufferSize); error_ = _wrapper->clEnqueueWriteBuffer(cmdQueue1_, buffers_[1], CL_TRUE, 0, BufferSize, buffer, 0, nullptr, (subTest == 0) ? &event : nullptr); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); memset(buffer2, 0xEB, BufferSize); error_ = _wrapper->clEnqueueWriteBuffer(cmdQueue0_, buffers_[0], CL_TRUE, 0, BufferSize, buffer2, 0, nullptr, (subTest == 1) ? &event : nullptr); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); CPerfCounter timer; double sec = 0.; if (subTest == 0) { error_ = p2p_copy_(cmdQueue0_, buffers_[0], buffers_[1], 0, 0, BufferSize, 1, &event, nullptr); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueCopyBufferP2PAMD() failed"); _wrapper->clFinish(cmdQueue0_); } else { error_ = p2p_copy_(cmdQueue1_, buffers_[1], buffers_[0], 0, 0, BufferSize, 1, &event, nullptr); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueCopyBufferP2PAMD() failed"); _wrapper->clFinish(cmdQueue1_); } clReleaseEvent(event); cl_command_queue execQueue; if (((testID_ / NumSizes) == 0) || ((testID_ / NumSizes) == 3)) { execQueue = cmdQueue0_; } else { execQueue = cmdQueue1_; } for (int i = 0; i < NumIter; ++i) { timer.Reset(); timer.Start(); if (subTest == 0) { p2p_copy_(execQueue, buffers_[0], buffers_[1], 0, 0, BufferSize, 0, nullptr, nullptr); } else { p2p_copy_(execQueue, buffers_[1], buffers_[0], 0, 0, BufferSize, 0, nullptr, nullptr); } _wrapper->clFinish(execQueue); timer.Stop(); double cur = timer.GetElapsedTime(); if (i == 0) { sec = cur; } else { sec = std::min(cur, sec); } } memset(buffer, 0x20, BufferSize); if (subTest == 0) { error_ = _wrapper->clEnqueueReadBuffer(cmdQueue1_, buffers_[1], CL_TRUE, 0, BufferSize, buffer, 0, NULL, NULL); } else { error_ = _wrapper->clEnqueueReadBuffer(cmdQueue0_, buffers_[0], CL_TRUE, 0, BufferSize, buffer, 0, NULL, NULL); } CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed!"); cl_uint cmp_value = (subTest == 0) ? 
0xEBEBEBEB : 0x23232323; for (int c = 0; c < NumChunks; ++c) { for (cl_uint i = 0; i < ChunkSize; ++i) { if (buffer[c * ChunkSize + i] != cmp_value) { CHECK_RESULT(true, "Validation failed!"); } } } delete[] buffer; delete[] buffer2; cl_uint* p2p = ((subTest == 0) ? &num_p2p_0_ : &num_p2p_1_); static const char* MemTypeStr[] = {"Visible ", "Remote ", "Invisible", "Staging"}; _perfInfo = (float)BufferSize / ((float)sec * 1000.f * 1000.f * 1000.f); std::stringstream str; if ((testID_ / (2 * NumSizes)) == 0) { str << "Write dev" << ((subTest == 0) ? 0 : 1) << "->dev" << ((subTest == 0) ? 1 : 0) << ((*p2p != 0) ? " " : " ") << "("; } else { str << "Read dev" << ((subTest == 0) ? 1 : 0) << "<-dev" << ((subTest == 0) ? 0 : 1) << ((*p2p != 0) ? " " : " ") << "("; } str.width(2); str << BufferSize / (1000 * 1000); str << " MB " << ") transfer speed (GB/s):"; testDescString = str.str(); #endif } unsigned int OCLP2PBuffer::close(void) { #ifdef CL_VERSION_2_0 if (!failed_) { if (cmdQueue0_ != nullptr) { _wrapper->clReleaseCommandQueue(cmdQueue0_); } if (cmdQueue1_ != nullptr) { _wrapper->clReleaseCommandQueue(cmdQueue1_); } if (context0_ != nullptr) { _wrapper->clReleaseContext(context0_); } if (context1_ != nullptr) { _wrapper->clReleaseContext(context1_); } } return OCLTestImp::close(); #else return CL_SUCCESS; #endif } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLP2PBuffer.h000066400000000000000000000035111450307266000244110ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_P2P_BUFFER_H_ #define _OCL_P2P_BUFFER_H_ #include "OCLTestImp.h" class OCLP2PBuffer : public OCLTestImp { public: OCLP2PBuffer(); virtual ~OCLP2PBuffer(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; unsigned int testID_; cl_ulong maxSize_; size_t BufferSize; int NumChunks; int NumIter; int NumStages; cl_context context0_; cl_context context1_; cl_command_queue cmdQueue0_; cl_command_queue cmdQueue1_; cl_uint num_p2p_0_; cl_uint num_p2p_1_; #ifdef CL_VERSION_2_0 clEnqueueCopyBufferP2PAMD_fn p2p_copy_; #endif }; #endif // _OCL_P2P_BUFFER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLPartialWrkgrp.cpp000066400000000000000000000276431450307266000260160ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPartialWrkgrp.h" #include #include #include #include #include "CL/cl.h" static const size_t BufSize = 0x1000; const static char* strKernel = "__kernel void fillX(__global int4* out) \n" "{ \n" " int id = get_global_id(0); \n" " out[id].x = id; \n" "} \n" " \n" "__kernel void fillXY(__global int4* out) \n" "{ \n" " int id = get_global_id(0) + get_global_id(1) * get_global_size(0); \n" " out[id].x = get_global_id(0); \n" " out[id].y = get_global_id(1); \n" "} \n" " \n" "__kernel void fillXYZ(__global int4* out) \n" "{ \n" " int id = get_global_id(0) + get_global_id(1) * get_global_size(0) + \n" " get_global_id(2) * get_global_size(0) * get_global_size(1); \n" " out[id].x = get_global_id(0); \n" " out[id].y = get_global_id(1); \n" " out[id].z = get_global_id(2); \n" "} \n"; OCLPartialWrkgrp::OCLPartialWrkgrp() { _numSubTests = 3; isOCL2_ = true; } OCLPartialWrkgrp::~OCLPartialWrkgrp() {} void OCLPartialWrkgrp::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { _openTest = test; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); char version[128]; _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_VERSION, sizeof(version), version, NULL); if (_openTest != 0 && strstr(version, "OpenCL 2.0") == NULL) { isOCL2_ = false; return; } program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); switch (_openTest) { case 0: error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); break; case 1: error_ = _wrapper->clBuildProgram( program_, 1, &devices_[deviceId], "-cl-uniform-work-group-size -cl-std=CL2.0", NULL, NULL); break; case 2: error_ = _wrapper->clBuildProgram( program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); break; default: CHECK_RESULT(false, "Invalid test number > _numSubTests"); return; } if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "fillX", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; buffer = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, BufSize * sizeof(cl_int4), NULL, &error_); CHECK_RESULT((error_ != 
CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLPartialWrkgrp::run(void) { if (!isOCL2_) return; unsigned int* values; cl_mem buffer = buffers()[0]; values = reinterpret_cast(new cl_int4[BufSize]); // // Check unaligned workgroup in X dimension // // Clear destination buffer memset(values, 0, BufSize * sizeof(cl_int4)); error_ = _wrapper->clEnqueueWriteBuffer(cmdQueues_[_deviceId], buffer, CL_TRUE, 0, BufSize * sizeof(cl_int4), values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t gws[1] = {BufSize - 1}; size_t lws[1] = {256}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, lws, 0, NULL, NULL); switch (_openTest) { case 2: if (error_ != CL_SUCCESS) { CHECK_RESULT(false, "clEnqueueNDRangeKernel() failed"); return; } error_ = _wrapper->clEnqueueReadBuffer( cmdQueues_[_deviceId], buffer, CL_TRUE, 0, BufSize * sizeof(cl_int4), values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); for (size_t x = 0; x < BufSize; ++x) { if (x == (BufSize - 1)) { CHECK_RESULT((values[4 * x] != 0), "Comparison failed!"); } else { CHECK_RESULT((values[4 * x] != x), "Comparison failed!"); } } break; case 1: case 0: CHECK_RESULT((error_ != CL_INVALID_WORK_GROUP_SIZE), "clEnqueueNDRangeKernel(): " "Expected to fail for non-uniform work group sizes!"); break; default: CHECK_RESULT(false, "Invalid test number > _numSubTests"); return; } error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseKernel() failed"); // // Check unaligned workgroup in X and Y dimensions // kernel_ = _wrapper->clCreateKernel(program_, "fillXY", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); // Clear destination buffer memset(values, 0, BufSize * sizeof(cl_int4)); error_ = _wrapper->clEnqueueWriteBuffer(cmdQueues_[_deviceId], buffer, CL_TRUE, 0, BufSize * sizeof(cl_int4), values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t gws2[2] = {0x3f, 0x3f}; size_t lws2[2] = {16, 16}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 2, NULL, gws2, lws2, 0, NULL, NULL); switch (_openTest) { case 2: if (error_ != CL_SUCCESS) { CHECK_RESULT(false, "clEnqueueNDRangeKernel() failed"); return; } error_ = _wrapper->clEnqueueReadBuffer( cmdQueues_[_deviceId], buffer, CL_TRUE, 0, BufSize * sizeof(cl_int4), values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); for (size_t y = 0; y < 0x40; ++y) { for (size_t x = 0; x < 0x3f; ++x) { size_t id = x + y * 0x3f; if (y == 0x3f) { CHECK_RESULT((values[4 * id] != 0), "Comparison failed!"); CHECK_RESULT((values[4 * id + 1] != 0), "Comparison failed!"); } else { CHECK_RESULT((values[4 * id] != x), "Comparison failed!"); CHECK_RESULT((values[4 * id + 1] != y), "Comparison failed!"); } } } break; case 1: case 0: CHECK_RESULT((error_ != CL_INVALID_WORK_GROUP_SIZE), "clEnqueueNDRangeKernel(): " "Expected to fail for non-uniform work group sizes!"); break; default: CHECK_RESULT(false, "Invalid test number > 
_numSubTests"); return; } error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN((error_ != CL_SUCCESS), "clReleaseKernel() failed"); // // Check unaligned workgroup in X, Y and Z dimensions // kernel_ = _wrapper->clCreateKernel(program_, "fillXYZ", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); // Clear destination buffer memset(values, 0, BufSize * sizeof(cl_int4)); error_ = _wrapper->clEnqueueWriteBuffer(cmdQueues_[_deviceId], buffer, CL_TRUE, 0, BufSize * sizeof(cl_int4), values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t gws3[3] = {0xf, 0x10, 0xf}; size_t lws3[3] = {4, 4, 4}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 3, NULL, gws3, lws3, 0, NULL, NULL); switch (_openTest) { case 2: if (error_ != CL_SUCCESS) { CHECK_RESULT(false, "clEnqueueNDRangeKernel() failed"); return; } error_ = _wrapper->clEnqueueReadBuffer( cmdQueues_[_deviceId], buffer, CL_TRUE, 0, BufSize * sizeof(cl_int4), values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); for (size_t z = 0; z < 0x10; ++z) { for (size_t y = 0; y < 0x10; ++y) { for (size_t x = 0; x < 0xf; ++x) { size_t id = x + y * 0xf + z * 0xf0; if (z == 0xf) { CHECK_RESULT((values[4 * id] != 0), "Comparison failed!"); CHECK_RESULT((values[4 * id + 1] != 0), "Comparison failed!"); CHECK_RESULT((values[4 * id + 2] != 0), "Comparison failed!"); } else { CHECK_RESULT((values[4 * id] != x), "Comparison failed!"); CHECK_RESULT((values[4 * id + 1] != y), "Comparison failed!"); CHECK_RESULT((values[4 * id + 2] != z), "Comparison failed!"); } } } } break; case 1: case 0: CHECK_RESULT((error_ != CL_INVALID_WORK_GROUP_SIZE), "clEnqueueNDRangeKernel(): " "Expected fail for non-uniform work group sizes!"); break; default: CHECK_RESULT(false, "Invalid test number > _numSubTests"); return; } delete[] values; } unsigned int OCLPartialWrkgrp::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLPartialWrkgrp.h000066400000000000000000000030151450307266000254460ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_PARTIAL_WRKGRP_H_ #define _OCL_PARTIAL_WRKGRP_H_ #include "OCLTestImp.h" class OCLPartialWrkgrp : public OCLTestImp { public: OCLPartialWrkgrp(); virtual ~OCLPartialWrkgrp(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool isOCL2_; }; #endif // _OCL_PARTIAL_WRKGRP_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLPerfCounters.cpp000066400000000000000000000725731450307266000256460ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPerfCounters.h" #include #include #include #include "CL/cl.h" #include "Timer.h" #ifdef WIN_OS #define SNPRINTF sprintf_s #else #define SNPRINTF snprintf #endif struct PerfCounterInfo { cl_long blockIdx; //!< Block Index cl_long counterIdx; //!< Counter Index cl_long eventIdx; //!< Event Index }; struct DeviceCounterInfo { const char *deviceName_; //!< Device name unsigned int devId_; //!< Device id PerfCounterInfo perfCounter_[2]; //!< Perforamnce counter array }; static const DeviceCounterInfo DeviceInfo[]{ #ifdef _WIN32 // GFX11 supports performance counter on Windows only. 
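// Entry format: {deviceName, devId, {counter for subtest 0, counter for
// subtest 1}}, where each PerfCounterInfo triple is {blockIdx, counterIdx,
// eventIdx}. run() below copies the matched triple into the
// CL_PERFCOUNTER_GPU_BLOCK_INDEX / CL_PERFCOUNTER_GPU_COUNTER_INDEX /
// CL_PERFCOUNTER_GPU_EVENT_INDEX property values; with `info` standing for
// the matched PerfCounterInfo (a name used here for illustration only):
//   properties[0][1] = info.blockIdx;   // hardware block (SQ, CPC, GRBM, ...)
//   properties[1][1] = info.counterIdx; // counter register within the block
//   properties[2][1] = info.eventIdx;   // event the counter accumulates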
{"gfx1100", 11, {{139, 0, 4}, {74, 0, 13}}}, // {SQWGP, reg 0, SQ_PERF_SEL_WAVES}, {CPC, // reg 0, Me1 busy for packet decode} {"gfx1101", 11, {{139, 0, 4}, {74, 0, 13}}}, // {SQWGP, reg 0, SQ_PERF_SEL_WAVES}, {CPC, // reg 0, Me1 busy for packet decode} {"gfx1102", 11, {{139, 0, 4}, {74, 0, 13}}}, // {SQWGP, reg 0, SQ_PERF_SEL_WAVES}, {CPC, // reg 0, Me1 busy for packet decode} {"gfx1103", 11, {{139, 0, 4}, {74, 0, 13}}}, // {SQWGP, reg 0, SQ_PERF_SEL_WAVES}, {CPC, // reg 0, Me1 busy for packet decode} #endif // GFX10 {"gfx1000", 10, {{15, 0, 4}, {74, 0, 13}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {CPC, // reg 0, Me1 busy for packet decode} {"gfx1010", 10, {{15, 0, 4}, {74, 0, 13}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {CPC, // reg 0, Me1 busy for packet decode} {"gfx1011", 10, {{15, 0, 4}, {74, 0, 13}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {CPC, // reg 0, Me1 busy for packet decode} {"gfx1012", 10, {{15, 0, 4}, {74, 0, 13}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {CPC, // reg 0, Me1 busy for packet decode} // GFX9 {"gfx900", 9, {{14, 0, 4}, {97, 1, 2}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {MCVML2_l, // reg 0, BigK bank 0 hits} {"gfx901", 9, {{14, 0, 4}, {97, 1, 2}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {MCVML2_l, // reg 0, BigK bank 0 hits} {"gfx902", 9, {{14, 0, 4}, {97, 1, 2}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {MCVML2_l, // reg 0, BigK bank 0 hits} {"gfx903", 9, {{14, 0, 4}, {97, 1, 2}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {MCVML2_l, // reg 0, BigK bank 0 hits} {"gfx904", 9, {{14, 0, 4}, {97, 1, 2}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {MCVML2_l, // reg 0, BigK bank 0 hits} {"gfx905", 9, {{14, 0, 4}, {97, 1, 2}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {MCVML2_l, // reg 0, BigK bank 0 hits} {"gfx906", 9, {{14, 0, 4}, {97, 1, 2}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {MCVML2_l, // reg 0, BigK bank 0 hits} {"gfx907", 9, {{14, 0, 4}, {97, 1, 2}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {MCVML2_l, // reg 0, BigK bank 0 hits} // Sea Islands, GFX8 {"Bonaire", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Hawaii", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Maui", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Casper", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Spectre", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Slimer", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Spooky", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Kalindi", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Mullins", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Iceland", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Tonga", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Bermuda", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Fiji", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Carrizo", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // 
GRBM_PERF_SEL_CP_BUSY} {"Ellesmere", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Baffin", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Stoney", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"gfx804", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"gfx803", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Bristol Ridge", 0, {{14, 0, 4}, {9, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} // Southern Islands {"Tahiti", 0, {{10, 0, 4}, {5, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Pitcairn", 0, {{10, 0, 4}, {5, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Capeverde", 0, {{10, 0, 4}, {5, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Oland", 0, {{10, 0, 4}, {5, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} {"Hainan", 0, {{10, 0, 4}, {5, 0, 3}}}, // {SQ, reg 0, SQ_PERF_SEL_WAVES}, {GRBM, reg 0, // GRBM_PERF_SEL_CP_BUSY} }; const int DeviceCounterSize = sizeof(DeviceInfo) / sizeof(DeviceCounterInfo); static const char *sha256_kernel = "typedef uint UINT;\n" "\n" "#define VECTOR_LEN 1\n" "\n" "#ifdef LITTLE_E\n" "\n" "inline UINT byteswap(UINT x)\n" "{\n" " UINT res = 0;\n" " \n" " for (uint i=0; i<4; i++)\n" " {\n" " res <<= 8;\n" " res |= (x & 0xff);\n" " x >>= 8;\n" " }\n" " \n" " return res;\n" "}\n" "\n" "#else\n" "\n" "inline UINT byteswap(const UINT x)\n" "{\n" " return x;\n" "}\n" "\n" "#endif\n" "\n" "\n" "void sha256_step( const UINT data[16], UINT *state )\n" "{\n" " UINT W[64], temp1, temp2;\n" " UINT A, B, C, D, E, F, G, H;\n" "\n" " for( int i = 0; i < 16; i++)\n" " {\n" " W[i] = byteswap(data[i]);\n" " }\n" "\n" "#define SHR(x,n) ((x & 0xFFFFFFFF) >> n)\n" "#define ROTR(x,n) (SHR(x,n) | (x << (32 - n)))\n" "\n" "#define S0(x) (ROTR(x, 7) ^ ROTR(x,18) ^ SHR(x, 3))\n" "#define S1(x) (ROTR(x,17) ^ ROTR(x,19) ^ SHR(x,10))\n" "\n" "#define S2(x) (ROTR(x, 2) ^ ROTR(x,13) ^ ROTR(x,22))\n" "#define S3(x) (ROTR(x, 6) ^ ROTR(x,11) ^ ROTR(x,25))\n" "\n" "#define F0(x,y,z) ((x & y) | (z & (x | y)))\n" "#define F1(x,y,z) (z ^ (x & (y ^ z)))\n" "\n" "#define R(t) \\\n" "( \\\n" " W[t] = S1(W[t - 2]) + W[t - 7] + \\\n" " S0(W[t - 15]) + W[t - 16] \\\n" ")\n" "\n" "#define P(a,b,c,d,e,f,g,h,x,K) \\\n" "{ \\\n" " temp1 = h + S3(e) + F1(e,f,g) + K + x; \\\n" " temp2 = S2(a) + F0(a,b,c); \\\n" " d += temp1; h = temp1 + temp2; \\\n" "}\n" "\n" " A = state[0];\n" " B = state[1];\n" " C = state[2];\n" " D = state[3];\n" " E = state[4];\n" " F = state[5];\n" " G = state[6];\n" " H = state[7];\n" "\n" " P( A, B, C, D, E, F, G, H, W[ 0], 0x428A2F98 );\n" " P( H, A, B, C, D, E, F, G, W[ 1], 0x71374491 );\n" " P( G, H, A, B, C, D, E, F, W[ 2], 0xB5C0FBCF );\n" " P( F, G, H, A, B, C, D, E, W[ 3], 0xE9B5DBA5 );\n" " P( E, F, G, H, A, B, C, D, W[ 4], 0x3956C25B );\n" " P( D, E, F, G, H, A, B, C, W[ 5], 0x59F111F1 );\n" " P( C, D, E, F, G, H, A, B, W[ 6], 0x923F82A4 );\n" " P( B, C, D, E, F, G, H, A, W[ 7], 0xAB1C5ED5 );\n" " P( A, B, C, D, E, F, G, H, W[ 8], 0xD807AA98 );\n" " P( H, A, B, C, D, E, F, G, W[ 9], 0x12835B01 );\n" " P( G, H, A, B, C, D, E, F, W[10], 0x243185BE );\n" " P( F, G, H, A, B, C, D, E, W[11], 0x550C7DC3 
);\n" " P( E, F, G, H, A, B, C, D, W[12], 0x72BE5D74 );\n" " P( D, E, F, G, H, A, B, C, W[13], 0x80DEB1FE );\n" " P( C, D, E, F, G, H, A, B, W[14], 0x9BDC06A7 );\n" " P( B, C, D, E, F, G, H, A, W[15], 0xC19BF174 );\n" " P( A, B, C, D, E, F, G, H, R(16), 0xE49B69C1 );\n" " P( H, A, B, C, D, E, F, G, R(17), 0xEFBE4786 );\n" " P( G, H, A, B, C, D, E, F, R(18), 0x0FC19DC6 );\n" " P( F, G, H, A, B, C, D, E, R(19), 0x240CA1CC );\n" " P( E, F, G, H, A, B, C, D, R(20), 0x2DE92C6F );\n" " P( D, E, F, G, H, A, B, C, R(21), 0x4A7484AA );\n" " P( C, D, E, F, G, H, A, B, R(22), 0x5CB0A9DC );\n" " P( B, C, D, E, F, G, H, A, R(23), 0x76F988DA );\n" " P( A, B, C, D, E, F, G, H, R(24), 0x983E5152 );\n" " P( H, A, B, C, D, E, F, G, R(25), 0xA831C66D );\n" " P( G, H, A, B, C, D, E, F, R(26), 0xB00327C8 );\n" " P( F, G, H, A, B, C, D, E, R(27), 0xBF597FC7 );\n" " P( E, F, G, H, A, B, C, D, R(28), 0xC6E00BF3 );\n" " P( D, E, F, G, H, A, B, C, R(29), 0xD5A79147 );\n" " P( C, D, E, F, G, H, A, B, R(30), 0x06CA6351 );\n" " P( B, C, D, E, F, G, H, A, R(31), 0x14292967 );\n" " P( A, B, C, D, E, F, G, H, R(32), 0x27B70A85 );\n" " P( H, A, B, C, D, E, F, G, R(33), 0x2E1B2138 );\n" " P( G, H, A, B, C, D, E, F, R(34), 0x4D2C6DFC );\n" " P( F, G, H, A, B, C, D, E, R(35), 0x53380D13 );\n" " P( E, F, G, H, A, B, C, D, R(36), 0x650A7354 );\n" " P( D, E, F, G, H, A, B, C, R(37), 0x766A0ABB );\n" " P( C, D, E, F, G, H, A, B, R(38), 0x81C2C92E );\n" " P( B, C, D, E, F, G, H, A, R(39), 0x92722C85 );\n" " P( A, B, C, D, E, F, G, H, R(40), 0xA2BFE8A1 );\n" " P( H, A, B, C, D, E, F, G, R(41), 0xA81A664B );\n" " P( G, H, A, B, C, D, E, F, R(42), 0xC24B8B70 );\n" " P( F, G, H, A, B, C, D, E, R(43), 0xC76C51A3 );\n" " P( E, F, G, H, A, B, C, D, R(44), 0xD192E819 );\n" " P( D, E, F, G, H, A, B, C, R(45), 0xD6990624 );\n" " P( C, D, E, F, G, H, A, B, R(46), 0xF40E3585 );\n" " P( B, C, D, E, F, G, H, A, R(47), 0x106AA070 );\n" " P( A, B, C, D, E, F, G, H, R(48), 0x19A4C116 );\n" " P( H, A, B, C, D, E, F, G, R(49), 0x1E376C08 );\n" " P( G, H, A, B, C, D, E, F, R(50), 0x2748774C );\n" " P( F, G, H, A, B, C, D, E, R(51), 0x34B0BCB5 );\n" " P( E, F, G, H, A, B, C, D, R(52), 0x391C0CB3 );\n" " P( D, E, F, G, H, A, B, C, R(53), 0x4ED8AA4A );\n" " P( C, D, E, F, G, H, A, B, R(54), 0x5B9CCA4F );\n" " P( B, C, D, E, F, G, H, A, R(55), 0x682E6FF3 );\n" " P( A, B, C, D, E, F, G, H, R(56), 0x748F82EE );\n" " P( H, A, B, C, D, E, F, G, R(57), 0x78A5636F );\n" " P( G, H, A, B, C, D, E, F, R(58), 0x84C87814 );\n" " P( F, G, H, A, B, C, D, E, R(59), 0x8CC70208 );\n" " P( E, F, G, H, A, B, C, D, R(60), 0x90BEFFFA );\n" " P( D, E, F, G, H, A, B, C, R(61), 0xA4506CEB );\n" " P( C, D, E, F, G, H, A, B, R(62), 0xBEF9A3F7 );\n" " P( B, C, D, E, F, G, H, A, R(63), 0xC67178F2 );\n" "\n" " state[0] += A;\n" " state[1] += B;\n" " state[2] += C;\n" " state[3] += D;\n" " state[4] += E;\n" " state[5] += F;\n" " state[6] += G;\n" " state[7] += H;\n" "}\n" "\n" "\n" "#define choose_temp(x) ((x)/16)\n" "\n" "#define STORE_TO_TEMP(i) tb[((i)/16)][((i)%16)]\n" "\n" "\n" "__kernel void CryptThread(__global const uint *buffer, __global uint " "*state, const uint blockLen, const uint foo)\n" "{\n" " const uint init[8] = {\n" " 0x6a09e667,\n" " 0xbb67ae85,\n" " 0x3c6ef372,\n" " 0xa54ff53a,\n" " 0x510e527f,\n" " 0x9b05688c,\n" " 0x1f83d9ab,\n" " 0x5be0cd19\n" " };\n" " \n" " const uint id = get_global_id(0);\n" " uint len = blockLen;\n" " uint i, j;\n" " const uint startPosInDWORDs = (len*id*foo)/4;\n" " const uint msgLenInBitsl = len * 8;\n" " const uint msgLenInBitsh = 
(len) >> (32-3);\n" " UINT localState[8];\n" "\n" " for (j=0; j<8; j++) {\n" " localState[j] = init[j];\n" " }\n" "\n" " i = 0;\n" " while (len >=64)\n" " {\n" " UINT data[16];\n" " for (j=0; j<16; j++) {\n" " data[j] = buffer[j + startPosInDWORDs + i];\n" " }\n" "\n" " sha256_step(data, localState);\n" " i += 16;\n" " len -= 64;\n" " }\n" "\n" " len /= 4;\n" "\n" " UINT tb[2][16];\n" "\n" " for (j=0; jclEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_WRITE, 0, bufSize_, 0, NULL, NULL, &error_); if (error_ != CL_SUCCESS) { printf("\nError code : %d\n", error_); } else { for (unsigned int i = 0; i < width_; i++) data[i] = val; error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); if (error_ == CL_SUCCESS) retVal = true; } return retVal; } void OCLPerfCounters::checkData(cl_mem buffer) { unsigned int *data = (unsigned int *)_wrapper->clEnqueueMapBuffer( cmd_queue_, buffer, true, CL_MAP_READ, 0, bufSize_, 0, NULL, NULL, &error_); for (unsigned int i = 0; i < width_; i++) { } error_ = _wrapper->clEnqueueUnmapMemObject(cmd_queue_, buffer, data, 0, NULL, NULL); } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLPerfCounters::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id *devices = NULL; cl_device_id device = NULL; _crcword = 0; conversion = 1.0f; _deviceId = deviceId; _openTest = test; context_ = 0; cmd_queue_ = 0; program_ = 0; kernel_ = 0; inBuffer_ = 0; outBuffer_ = 0; num_input_buf_ = 1; num_output_buf_ = 1; blockSize_ = 1024; isAMD = false; if (type_ != CL_DEVICE_TYPE_GPU) { char msg[256]; SNPRINTF(msg, sizeof(msg), "No GPU devices present. Exiting!\t"); testDescString = msg; return; } width_ = 22347776; // We compute a square domain bufSize_ = width_ * sizeof(cl_uint); error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id *platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms-1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); // Runtime returns an error when no GPU devices are present instead of just // returning 0 devices // CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); // Choose platform with GPU devices if (num_devices > 0) { if (!strcmp(pbuf, "Advanced Micro Devices, Inc.")) { isAMD = true; } // platform = platforms[_platformIndex]; // break; } #if 0 } #endif delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. 
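 * (For reference: the loop above matched CL_PLATFORM_VENDOR against
 * "Advanced Micro Devices, Inc." to set isAMD, and run() relies on the
 * AMD-only entry points clCreatePerfCounterAMD, clSetDeviceClockModeAMD,
 * clEnqueueBeginPerfCounterAMD and clEnqueueEndPerfCounterAMD, so a
 * non-AMD platform cannot execute this test.)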
*/ CHECK_RESULT(platform == 0, "Couldn't find platform with GPU devices, cannot proceed"); devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; global_device = device; context_ = _wrapper->clCreateContext(NULL, 1, &device, notify_callback, NULL, &error_); CHECK_RESULT(context_ == 0, "clCreateContext failed"); char charbuf[1024]; size_t retsize; error_ = _wrapper->clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 1024, charbuf, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); cmd_queue_ = _wrapper->clCreateCommandQueue(context_, device, 0, NULL); CHECK_RESULT(cmd_queue_ == 0, "clCreateCommandQueue failed"); inBuffer_ = new cl_mem[4]; outBuffer_ = new cl_mem[4]; for (int i = 0; i < num_input_buf_; ++i) { inBuffer_[i] = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(inBuffer_[i] == 0, "clCreateBuffer(inBuffer) failed"); bool result = setData(inBuffer_[i], 0xdeadbeef); CHECK_RESULT(result != true, "clEnqueueMapBuffer buffer failed"); } for (int i = 0; i < num_output_buf_; ++i) { outBuffer_[i] = _wrapper->clCreateBuffer(context_, 0, bufSize_, NULL, &error_); CHECK_RESULT(outBuffer_[i] == 0, "clCreateBuffer(outBuffer) failed"); bool result = setData(outBuffer_[i], 0xdeadbeef); CHECK_RESULT(result != true, "clEnqueueMapBuffer buffer failed"); } program_ = _wrapper->clCreateProgramWithSource( context_, 1, (const char **)&sha256_kernel, NULL, &error_); CHECK_RESULT(program_ == 0, "clCreateProgramWithSource failed"); const char *buildOps = NULL; if (isAMD) { // Enable caching buildOps = "-fno-alias"; } error_ = _wrapper->clBuildProgram(program_, 1, &device, buildOps, NULL, NULL); if (error_ != CL_SUCCESS) { cl_int intError; char log[16384]; intError = _wrapper->clGetProgramBuildInfo(program_, device, CL_PROGRAM_BUILD_LOG, 16384 * sizeof(char), log, NULL); printf("Build error -> %s\n", log); CHECK_RESULT(0, "clBuildProgram failed"); } kernel_ = _wrapper->clCreateKernel(program_, "CryptThread", &error_); CHECK_RESULT(kernel_ == 0, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_[0]); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_[0]); error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_uint), (void *)&blockSize_); // Foo is not part of the original test, this can be used to see how much of // the performance is limited by fetch. Set foo to 0 and all threads will // fetch the same 1k block. This way they will all be in cache and hit max // fetch speed. unsigned int foo = 1; error_ = _wrapper->clSetKernelArg(kernel_, 3, sizeof(cl_uint), (void *)&foo); } void OCLPerfCounters::run(void) { // Test runs only on GPU if (type_ != CL_DEVICE_TYPE_GPU) return; size_t global = bufSize_ / blockSize_; // 32 gives the best result due to memory thrashing. Need to optimize and // give feedback to SiSoft. 
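// For reference, the counter session below uses the AMD perf-counter
// extension in this order (every call named here appears later in this
// function):
//   clCreatePerfCounterAMD(device, properties)            -> perfCounter
//   clSetDeviceClockModeAMD(CL_DEVICE_CLOCK_MODE_PROFILING_AMD)
//   clEnqueueBeginPerfCounterAMD(queue, 1, &perfCounter)
//   MAX_ITERATIONS x clEnqueueNDRangeKernel(...)
//   clEnqueueEndPerfCounterAMD(queue, 1, &perfCounter, &perfEvent)
//   clWaitForEvents(1, &perfEvent)
//   clSetDeviceClockModeAMD(CL_DEVICE_CLOCK_MODE_DEFAULT_AMD)
//   clGetPerfCounterInfoAMD(perfCounter, CL_PERFCOUNTER_DATA, &result)
//   clReleasePerfCounterAMD(perfCounter)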
size_t local = 64; char buf[256]; size_t global_work_size[1] = {global}; size_t local_work_size[1] = {local}; cl_int err = 0; cl_perfcounter_amd perfCounter; cl_perfcounter_property properties[4][2]; cl_event perfEvent; cl_ulong result; char deviceName[1024]; properties[0][0] = CL_PERFCOUNTER_GPU_BLOCK_INDEX; properties[1][0] = CL_PERFCOUNTER_GPU_COUNTER_INDEX; properties[2][0] = CL_PERFCOUNTER_GPU_EVENT_INDEX; properties[3][0] = CL_PERFCOUNTER_NONE; err = _wrapper->clGetDeviceInfo(global_device, CL_DEVICE_NAME, 1024, deviceName, NULL); CHECK_RESULT(err != CL_SUCCESS, "clGetDeviceInfo failed"); // Remove target ID features char* targetIdColon = strchr(deviceName, ':'); if (targetIdColon != nullptr) { size_t idx = targetIdColon - deviceName; deviceName[idx] = '\0'; } // Begin: to be removed when crash on Kabini is fixed if (strcmp(deviceName, "Kalindi") == 0) { char msg[256]; SNPRINTF(msg, sizeof(msg), "Exiting as device is Kabini!\t"); testDescString = msg; return; } // End: to be removed when crash on Kabini is fixed bool found = false; unsigned int devId = 0; for (int idx = 0; !found && idx < DeviceCounterSize; idx++) { if (strcmp(deviceName, DeviceInfo[idx].deviceName_) == 0) { devId = DeviceInfo[idx].devId_; properties[0][1] = DeviceInfo[idx].perfCounter_[_openTest].blockIdx; properties[1][1] = DeviceInfo[idx].perfCounter_[_openTest].counterIdx; properties[2][1] = DeviceInfo[idx].perfCounter_[_openTest].eventIdx; found = true; } } if (!found) { char msg[256]; SNPRINTF(msg, sizeof(msg), "Unsupported device(%s) for the test!\t", deviceName); testDescString = msg; return; } perfCounter = _wrapper->clCreatePerfCounterAMD(global_device, &properties[0][0], &err); CHECK_RESULT(err != CL_SUCCESS, "Create PerfCounter failed\n"); // set clock mode cl_set_device_clock_mode_input_amd setClockModeInput; setClockModeInput.clock_mode = CL_DEVICE_CLOCK_MODE_PROFILING_AMD; cl_set_device_clock_mode_output_amd setClockModeOutput = {}; _wrapper->clSetDeviceClockModeAMD(global_device, setClockModeInput, &setClockModeOutput); _wrapper->clEnqueueBeginPerfCounterAMD(cmd_queue_, 1, &perfCounter, 0, NULL, NULL); for (unsigned int i = 0; i < MAX_ITERATIONS; i++) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void *)&inBuffer_[i % num_input_buf_]); error_ = _wrapper->clSetKernelArg(kernel_, 1, sizeof(cl_mem), (void *)&outBuffer_[i % num_output_buf_]); error_ = _wrapper->clEnqueueNDRangeKernel( cmd_queue_, kernel_, 1, NULL, (const size_t *)global_work_size, (const size_t *)local_work_size, 0, NULL, NULL); } CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueNDRangeKernel failed"); _wrapper->clEnqueueEndPerfCounterAMD(cmd_queue_, 1, &perfCounter, 0, NULL, &perfEvent); clWaitForEvents(1, &perfEvent); // set clock mode to default setClockModeInput.clock_mode = CL_DEVICE_CLOCK_MODE_DEFAULT_AMD; _wrapper->clSetDeviceClockModeAMD(global_device, setClockModeInput, &setClockModeOutput); _wrapper->clGetPerfCounterInfoAMD(perfCounter, CL_PERFCOUNTER_DATA, sizeof(cl_ulong), &result, NULL); err = _wrapper->clReleasePerfCounterAMD(perfCounter); CHECK_RESULT(err != CL_SUCCESS, "Release PerfCounter failed\n"); switch (_openTest) { case 0: SNPRINTF(buf, sizeof(buf), "SQ Number of Waves: %lu ", (long)result); break; case 1: if (devId >= 9) { SNPRINTF(buf, sizeof(buf), "Me1 busy for packet decode: %lu ", (long)result); } else { SNPRINTF(buf, sizeof(buf), "GRBM CP Busy: %lu ", (long)result); } break; } testDescString = buf; CHECK_RESULT(!(result > 0), "Perf counter value read is zero!\n"); } unsigned int 
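// close() below releases resources in reverse order of creation (buffers,
// kernel, program, command queue, context); release failures are reported
// through CHECK_RESULT_NO_RETURN so cleanup continues past an error.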
OCLPerfCounters::close(void) { _wrapper->clFinish(cmd_queue_); if (inBuffer_) { for (int i = 0; i < num_input_buf_; ++i) { error_ = _wrapper->clReleaseMemObject(inBuffer_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(inBuffer_) failed"); } delete[] inBuffer_; } if (outBuffer_) { for (int i = 0; i < num_output_buf_; ++i) { error_ = _wrapper->clReleaseMemObject(outBuffer_[i]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject(outBuffer_) failed"); } delete[] outBuffer_; } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (cmd_queue_) { error_ = _wrapper->clReleaseCommandQueue(cmd_queue_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (context_) { error_ = _wrapper->clReleaseContext(context_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } return _crcword; } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLPerfCounters.h000066400000000000000000000035211450307266000252760ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLTestImp.h" class OCLPerfCounters : public OCLTestImp { public: OCLPerfCounters(); virtual ~OCLPerfCounters(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); std::string shader_; bool setData(cl_mem buffer, unsigned int data); void checkData(cl_mem buffer); cl_context context_; cl_command_queue cmd_queue_; cl_program program_; cl_kernel kernel_; cl_mem* inBuffer_; cl_mem* outBuffer_; cl_int num_input_buf_; cl_int num_output_buf_; cl_int error_; unsigned int width_; unsigned int bufSize_; unsigned int blockSize_; static const unsigned int MAX_ITERATIONS = 1; bool isAMD; }; clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLPersistent.cpp000066400000000000000000000122211450307266000253470ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPersistent.h" #include #include #include #include #include const static char* strKernel = "__kernel void persistentImage( write_only image2d_t source){ \n" " int tidX = get_global_id(0);\n" " int tidY = get_global_id(1);\n" " write_imagei( source, (int2)( tidX, tidY ),(int4)( tidX, tidY,0,0 ) " ");\n" "}\n"; OCLPersistent::OCLPersistent() : clImage_(0) { _numSubTests = 1; } OCLPersistent::~OCLPersistent() {} void OCLPersistent::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); if (_errorFlag) return; // Build the kernel program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed!"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed!"); kernel_ = _wrapper->clCreateKernel(program_, "persistentImage", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed!"); cl_image_format format; format.image_channel_data_type = CL_SIGNED_INT32; format.image_channel_order = CL_RG; cl_image_desc desc = {0}; desc.image_type = CL_MEM_OBJECT_IMAGE2D; desc.image_width = c_dimSize; desc.image_height = c_dimSize; desc.image_depth = 1; desc.image_array_size = 1; // CL_MEM_USE_PERSISTENT_MEM_AMD clImage_ = clCreateImage(context_, CL_MEM_USE_PERSISTENT_MEM_AMD | CL_MEM_WRITE_ONLY, &format, &desc, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateImage() failed"); } void OCLPersistent::run(void) { _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &clImage_); size_t dimSizes[] = {c_dimSize, c_dimSize}; size_t origin[] = {0, 0, 0}; size_t region[] = {c_dimSize, c_dimSize, 1}; size_t pitch, slice; cl_event event; error_ = _wrapper->clEnqueueNDRangeKernel( cmdQueues_[_deviceId], kernel_, 2, NULL, dimSizes, NULL, 0, NULL, NULL); error_ = _wrapper->clEnqueueMarkerWithWaitList(cmdQueues_[_deviceId], 0, NULL, &event); _wrapper->clFlush(cmdQueues_[_deviceId]); cl_uint status; _wrapper->clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_uint), &status, NULL); while (status != CL_COMPLETE) { _wrapper->clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_uint), &status, 
NULL); } unsigned int* image = (unsigned int*)_wrapper->clEnqueueMapImage( cmdQueues_[_deviceId], clImage_, CL_TRUE, CL_MAP_READ, origin, region, &pitch, &slice, 0, NULL, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueMapImage() failed"); bool result = validateImage(image, pitch, c_dimSize); CHECK_RESULT(!result, "Validation failed!"); _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], clImage_, image, 0, NULL, NULL); } unsigned int OCLPersistent::close(void) { _wrapper->clReleaseMemObject(clImage_); return OCLTestImp::close(); } bool OCLPersistent::validateImage(unsigned int* image, size_t pitch, unsigned int dimSize) { unsigned int x, y; int idx = 0; for (y = 0; y < dimSize; y++) { for (x = 0; x < dimSize; x++) { if ((image[idx] != x) || (image[idx + 1] != y)) { printf("Failed at coordinate (%5d, %5d) - R:%d, G:%d value\n", x, y, image[idx], image[idx + 1]); return false; } idx += 2; } image += pitch / sizeof(int); idx = 0; } return true; } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLPersistent.h000066400000000000000000000034171450307266000250230ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PERSISTENT_H_ #define _OCL_PERSISTENT_H_ #include "OCLTestImp.h" class OCLPersistent : public OCLTestImp { public: OCLPersistent(); virtual ~OCLPersistent(); static const unsigned int c_dimSize = 510; virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceId); virtual void run(void); virtual unsigned int close(void); private: //////////////////// // test functions // //////////////////// bool validateImage(unsigned int* image, size_t pitch, unsigned int dimSize); ///////////////////// // private members // ///////////////////// // CL identifiers cl_mem clImage_; }; #endif // _OCL_GL_BUFFER_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLPinnedMemory.cpp000066400000000000000000000166051450307266000256270ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLPinnedMemory.h" #ifdef _WIN32 #include <VersionHelpers.h> // Pick up from OCLSVM size_t getTotalSystemMemory(); #else #include <sys/sysinfo.h> size_t getTotalSystemMemory() { struct sysinfo info; sysinfo(&info); return info.totalram; } #endif #include <algorithm> #include <cmath> #include <numeric> OCLPinnedMemory::OCLPinnedMemory() { _numSubTests = 2; } OCLPinnedMemory::~OCLPinnedMemory() {} void OCLPinnedMemory::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_ERROR(error_, "Error opening test"); _openTest = test; host_memory_ = nullptr; #ifdef _WIN32 // Observed failures on Win7 if (!IsWindows8OrGreater()) { printf("Test requires Win10, skipping...\n"); _openTest = -1; return; } #endif cl_int status; // Observed failures with Carrizo on GSL path cl_bool is_apu; status = clGetDeviceInfo(devices_[deviceId], CL_DEVICE_HOST_UNIFIED_MEMORY, sizeof(cl_bool), &is_apu, nullptr); CHECK_ERROR(status, "clGetDeviceInfo failed."); if (is_apu) { printf("Test not supported for apus, skipping...\n"); _openTest = -1; return; } cl_uint address_bits; status = clGetDeviceInfo(devices_[deviceId], CL_DEVICE_ADDRESS_BITS, sizeof(cl_uint), &address_bits, nullptr); CHECK_ERROR(status, "clGetDeviceInfo failed."); if (address_bits < 64u) { printf("GPU VA range size below 4GB, skipping...\n"); _openTest = -1; return; } row_size_ = getTotalSystemMemory(); if (row_size_ <= (1ull << 32u)) { printf("System memory below 4GB, skipping...\n"); _openTest = -1; return; } row_size_ *= ratio_; #if EMU_ENV if (row_size_ > 5000) { row_size_ = 5000; } #endif row_size_ = floor(sqrt(row_size_)); row_size_ = (row_size_ + row_data_size_ - 1) & ~(row_data_size_ - 1); pin_size_ = row_size_ * row_size_ / row_data_size_; host_memory_ = new row_data_t[pin_size_]; } void OCLPinnedMemory::runNoPrepinnedMemory() { cl_int status; row_data_t* tmp = new row_data_t[row_size_]; std::iota(tmp, tmp + row_size_, 0); std::fill_n(host_memory_, pin_size_, 0); cl_mem tmp_buffer = clCreateBuffer(context_, CL_MEM_USE_HOST_PTR, row_size_ * row_data_size_, tmp, &status); CHECK_ERROR(status, "clCreateBuffer failed."); cl_mem buffer = clCreateBuffer(context_, CL_MEM_READ_WRITE, row_size_ * row_data_size_, nullptr, &status); CHECK_ERROR(status, "clCreateBuffer failed."); status = clEnqueueCopyBuffer(cmdQueues_[_deviceId], tmp_buffer, buffer, 0, 0, row_size_ * row_data_size_, 0, nullptr, nullptr); CHECK_ERROR(status, "clEnqueueCopyBuffer failed."); clFinish(cmdQueues_[_deviceId]); size_t buffer_offset[3] = {0, 0,
0}; size_t host_offset[3] = {0, 0, 0}; size_t region[3] = {row_data_size_, row_size_, 1}; status = clEnqueueReadBufferRect( cmdQueues_[_deviceId], buffer, CL_TRUE, buffer_offset, host_offset, region, 0, 0, row_size_, 0, host_memory_, 0, nullptr, nullptr); CHECK_ERROR(status, "clEnqueueReadBufferRect failed."); status = clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(status, "clFinish failed."); for (uint64_t i = 0; i < row_size_; i++) { if (tmp[i] != host_memory_[i * row_size_ / row_data_size_]) { status = -1; break; } } CHECK_RESULT(status == -1, "Error when reading data."); status = clReleaseMemObject(buffer); CHECK_ERROR(status, "clReleaseMemObject failed."); status = clReleaseMemObject(tmp_buffer); CHECK_ERROR(status, "clReleaseMemObject failed."); delete[] tmp; } void OCLPinnedMemory::runPrepinnedMemory() { cl_int status; row_data_t* tmp = new row_data_t[row_size_]; std::iota(tmp, tmp + row_size_, 0); std::fill_n(host_memory_, pin_size_, 0); cl_mem tmp_buffer = clCreateBuffer(context_, CL_MEM_USE_HOST_PTR, row_size_ * row_data_size_, tmp, &status); CHECK_ERROR(status, "clCreateBuffer failed."); cl_mem buffer = clCreateBuffer(context_, CL_MEM_READ_WRITE, row_size_ * row_data_size_, nullptr, &status); CHECK_ERROR(status, "clCreateBuffer failed."); status = clEnqueueCopyBuffer(cmdQueues_[_deviceId], tmp_buffer, buffer, 0, 0, row_size_ * row_data_size_, 0, nullptr, nullptr); CHECK_ERROR(status, "clEnqueueCopyBuffer failed."); cl_mem pinned_buffer = clCreateBuffer(context_, CL_MEM_USE_HOST_PTR, pin_size_ * row_data_size_, host_memory_, &status); CHECK_ERROR(status, "clCreateBuffer failed."); clEnqueueMapBuffer(cmdQueues_[_deviceId], pinned_buffer, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, pin_size_ * row_data_size_, 0, nullptr, nullptr, &status); CHECK_ERROR(status, "clEnqueueMapBuffer failed."); size_t buffer_offset[3] = {0, 0, 0}; size_t host_offset[3] = {0, 0, 0}; size_t region[3] = {row_data_size_, row_size_, 1}; status = clEnqueueReadBufferRect( cmdQueues_[_deviceId], buffer, CL_TRUE, buffer_offset, host_offset, region, 0, 0, row_size_, 0, host_memory_, 0, nullptr, nullptr); CHECK_ERROR(status, "clEnqueueReadBufferRect failed."); for (uint64_t i = 0; i < row_size_; i++) { if (tmp[i] != host_memory_[i * row_size_ / row_data_size_]) { status = -1; break; } } CHECK_RESULT(status == -1, "Error when reading data."); status = clEnqueueUnmapMemObject(cmdQueues_[_deviceId], pinned_buffer, host_memory_, 0, nullptr, nullptr); CHECK_ERROR(status, "clEnqueueUnmap failed.") status = clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(status, "clFinish failed."); status = clReleaseMemObject(pinned_buffer); CHECK_ERROR(status, "clReleaseMemObject failed."); status = clReleaseMemObject(buffer); CHECK_ERROR(status, "clReleaseMemObject failed."); status = clReleaseMemObject(tmp_buffer); CHECK_ERROR(status, "clReleaseMemObject failed."); delete[] tmp; } void OCLPinnedMemory::run() { switch (_openTest) { case 0: runNoPrepinnedMemory(); break; case 1: runPrepinnedMemory(); break; } } unsigned int OCLPinnedMemory::close() { delete[] host_memory_; return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLPinnedMemory.h000066400000000000000000000033521450307266000252670ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PINNED_MEMORY_H_ #define _OCL_PINNED_MEMORY_H_ #include #include "OCLTestImp.h" class OCLPinnedMemory : public OCLTestImp { public: OCLPinnedMemory(); ~OCLPinnedMemory(); void open(unsigned int test, char* units, double& conversion, unsigned int deviceId) override; void run() override; unsigned int close() override; private: void runNoPrepinnedMemory(); void runPrepinnedMemory(); static constexpr const float ratio_ = 0.4f; using row_data_t = uint64_t; row_data_t* host_memory_; size_t row_data_size_ = sizeof(row_data_t); size_t row_size_; size_t pin_size_; }; #endif // _OCL_PINNED_MEMORY_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLPlatformAtomics.cpp000066400000000000000000000156661450307266000263330ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLPlatformAtomics.h" #include #include #include #include "CL/cl.h" const static char* strKernel = "__kernel void test_atomic_kernel(volatile __global atomic_int *pSync, " "volatile __global atomic_int *ptr, int numIterations)\n" "{ " " \n" " while(atomic_load_explicit(pSync, memory_order_acquire, " "memory_scope_all_svm_devices) == 0); \n" " for (int i = 0; i < numIterations; i++) { " " \n" " atomic_fetch_add_explicit(ptr, 1, memory_order_acq_rel, " "memory_scope_all_svm_devices); \n" " } " " \n" "} " " \n"; OCLPlatformAtomics::OCLPlatformAtomics() { _numSubTests = 1; failed_ = false; svmCaps_ = 0; } OCLPlatformAtomics::~OCLPlatformAtomics() {} void OCLPlatformAtomics::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); size_t param_size = 0; char* strVersion = 0; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = new char[param_size]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[7] < '2') { failed_ = true; return; } delete strVersion; program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "test_atomic_kernel", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); } static int AtomicLoad(volatile cl_int* object) { #if defined(_MSC_VER) || defined(__INTEL_COMPILER) return InterlockedExchangeAdd((volatile long*)object, 0); #elif defined(__GNUC__) return __sync_add_and_fetch(object, 0); #else printf("Atomic load not supported, aborting..."); return 0; #endif } static int AtomicIncrement(volatile cl_int* object) { #if defined(_MSC_VER) || defined(__INTEL_COMPILER) return _InterlockedIncrement((volatile long*)object); #elif defined(__GNUC__) return __sync_fetch_and_add(object, 1); #endif printf("Atomic increment not supported, aborting..."); return 0; } void OCLPlatformAtomics::run(void) { if (failed_) return; #ifdef CL_VERSION_2_0 error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_SVM_CAPABILITIES, sizeof(svmCaps_), &svmCaps_, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clGetDeviceInfo() failed"); if (!(svmCaps_ & CL_DEVICE_SVM_ATOMICS)) { printf("SVM atomics not supported, skipping test...\n"); return; } volatile cl_int* pSyncBuf = (volatile cl_int*)_wrapper->clSVMAlloc( context_, CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS, sizeof(cl_int), 0); CHECK_RESULT(!pSyncBuf, "clSVMAlloc() failed"); *pSyncBuf = 0; volatile cl_int* pAtomicBuf = (volatile cl_int*)_wrapper->clSVMAlloc( context_, CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS, sizeof(cl_int), 0); CHECK_RESULT(!pAtomicBuf, "clSVMAlloc() failed"); *pAtomicBuf = 0; error_ = _wrapper->clSetKernelArgSVMPointer(kernel_, 0, (const void*)pSyncBuf); CHECK_RESULT((error_ != CL_SUCCESS), 
"clSetKernelArgSVMPointer() failed"); error_ = _wrapper->clSetKernelArgSVMPointer(kernel_, 1, (const void*)pAtomicBuf); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArgSVMPointer() failed"); cl_int numIterations = 0x100000; error_ = _wrapper->clSetKernelArg(kernel_, 2, sizeof(cl_int), &numIterations); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t globalWorkSize[1] = {1}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, globalWorkSize, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); clFlush(cmdQueues_[_deviceId]); AtomicIncrement(pSyncBuf); // wait until we see some activity from a device (try to run host side // simultaneously). while (AtomicLoad(pAtomicBuf /*, memory_order_relaxed*/) == 0) ; for (int i = 0; i < numIterations; i++) { AtomicIncrement(pAtomicBuf); } error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(error_, "clFinish() failed"); int expected = numIterations * 2; CHECK_RESULT(*pAtomicBuf != expected, "Expected: 0x%x, found: 0x%x", expected, *pAtomicBuf); _wrapper->clSVMFree(context_, (void*)pSyncBuf); _wrapper->clSVMFree(context_, (void*)pAtomicBuf); #endif } unsigned int OCLPlatformAtomics::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLPlatformAtomics.h000066400000000000000000000030531450307266000257630ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_PLATFORM_ATOMICS_H_ #define _OCL_PLATFORM_ATOMICS_H_ #include "OCLTestImp.h" class OCLPlatformAtomics : public OCLTestImp { public: OCLPlatformAtomics(); virtual ~OCLPlatformAtomics(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); bool failed_; unsigned long long svmCaps_; }; #endif // _OCL_KERNEL_BINARY_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLProgramScopeVariables.cpp000066400000000000000000000300761450307266000274510ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLProgramScopeVariables.h" #include "CL/cl.h" OCLProgramScopeVariables::OCLProgramScopeVariables() { _numSubTests = 3; } OCLProgramScopeVariables::~OCLProgramScopeVariables() {} void OCLProgramScopeVariables::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); silentFailure = false; _openTest = test; size_t param_size = 0; program_ = 0; kernel1_ = kernel2_ = 0; char* strVersion = 0; error_ = _wrapper->clGetDeviceInfo( devices_[_deviceId], CL_DEVICE_OPENCL_C_VERSION, 0, 0, &param_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = (char*)malloc(param_size); error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_OPENCL_C_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[9] < '2') { printf("\nOpenCL C 2.0 not supported\n"); silentFailure = true; } free(strVersion); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLProgramScopeVariables::run(void) { if (silentFailure) return; switch (_openTest) { case 0: test0(); break; case 1: test1(); break; case 2: test2(); break; } return; } void OCLProgramScopeVariables::test0(void) { const char* kernel_str = "global int g[1000] = {0}; \n\ __kernel void test1 (global unsigned int * A) \n\ { \n\ int id = get_global_id(0); \n\ g[id] = id; \n\ } \n\ __kernel void test2 (global unsigned int * A) \n\ { \n\ int id = get_global_id(0); \n\ A[id] = g[id]; \n\ } \n"; const size_t arrSize = 1000; cl_uint* output_arr = (cl_uint*)malloc(arrSize * sizeof(cl_uint)); cl_mem buffer = _wrapper->clCreateBuffer( context_, CL_MEM_READ_WRITE, arrSize * sizeof(cl_uint), 0, &error_); buffers_.push_back(buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &kernel_str, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char log[400]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 400, log, 0); printf("\n\n%s\n\n", log); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram failed"); kernel1_ = _wrapper->clCreateKernel(program_, "test1", &error_); CHECK_RESULT((error_
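// The strVersion[9] probe in open() above is brittle: it assumes the device
// reports exactly "OpenCL C <major>.<minor> ...". A sturdier check, sketched
// with sscanf (assumes <stdio.h>; illustrative, not what the test ships):
//
//   int major = 0, minor = 0;
//   if (sscanf(strVersion, "OpenCL C %d.%d", &major, &minor) != 2 || major < 2) {
//     silentFailure = true;  // program-scope variables need OpenCL C 2.0+
//   }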
!= CL_SUCCESS), "clCreateKernel1 failed"); kernel2_ = _wrapper->clCreateKernel(program_, "test2", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel2 failed"); error_ = _wrapper->clSetKernelArg(kernel1_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); error_ = _wrapper->clSetKernelArg(kernel2_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); cl_event evt; size_t global_work_size = arrSize; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel1_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); _wrapper->clFinish(cmdQueues_[_deviceId]); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel2_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[0], CL_TRUE, 0, sizeof(cl_uint) * arrSize, output_arr, 1, &evt, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed"); bool bResult = true; for (unsigned int i = 0; i < arrSize; ++i) { if (output_arr[i] != i) { bResult = false; break; } } free(output_arr); CHECK_RESULT((bResult == false), "Program Scope Variables - test0 failed"); } void OCLProgramScopeVariables::test1(void) { const char* kernel_str = "global int temp = 0; \n\ __kernel void test1 (global unsigned int * A) \n\ { \n\ int id = get_global_id(0); \n\ if (id == 0) temp = 55; \n\ } \n\ __kernel void test2 (global unsigned int * A) \n\ { \n\ int id = get_global_id(0); \n\ if (id == 0) A[0] = temp; \n\ } \n"; cl_uint* output_arr = (cl_uint*)malloc(sizeof(cl_uint)); cl_mem buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, sizeof(cl_uint), 0, &error_); buffers_.push_back(buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &kernel_str, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char log[400]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 400, log, 0); printf("\n\n%s\n\n", log); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram failed"); kernel1_ = _wrapper->clCreateKernel(program_, "test1", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel1 failed"); kernel2_ = _wrapper->clCreateKernel(program_, "test2", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel2 failed"); error_ = _wrapper->clSetKernelArg(kernel1_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); error_ = _wrapper->clSetKernelArg(kernel2_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); cl_event evt; size_t global_work_size = 1; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel1_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); _wrapper->clFinish(cmdQueues_[_deviceId]); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel2_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[0], CL_TRUE, 0, 
sizeof(cl_uint), output_arr, 1, &evt, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed"); bool bResult = (output_arr[0] == 55); free(output_arr); CHECK_RESULT((bResult == false), "Program Scope Variables - test1 failed"); } void OCLProgramScopeVariables::test2(void) { const char* kernel_str = "global int temp = 0; \n\ global int* ptr[] = {&temp}; \n\ __kernel void test1 (global unsigned int * A) \n\ { \n\ int id = get_global_id(0); \n\ if (id == 0) temp = 65; \n\ } \n\ __kernel void test2 (global unsigned int * A) \n\ { \n\ int id = get_global_id(0); \n\ if (id == 0) A[0] = *ptr[0]; \n\ } \n"; cl_uint* output_arr = (cl_uint*)malloc(sizeof(cl_uint)); cl_mem buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, sizeof(cl_uint), 0, &error_); buffers_.push_back(buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &kernel_str, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], "-cl-std=CL2.0", NULL, NULL); if (error_ != CL_SUCCESS) { char log[400]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 400, log, 0); printf("\n\n%s\n\n", log); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram failed"); kernel1_ = _wrapper->clCreateKernel(program_, "test1", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel1 failed"); kernel2_ = _wrapper->clCreateKernel(program_, "test2", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel2 failed"); error_ = _wrapper->clSetKernelArg(kernel1_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); error_ = _wrapper->clSetKernelArg(kernel2_, 0, sizeof(cl_mem), (void*)&buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed"); cl_event evt; size_t global_work_size = 1; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel1_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); _wrapper->clFinish(cmdQueues_[_deviceId]); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel2_, 1, NULL, &global_work_size, NULL, 0, NULL, &evt); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel"); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[0], CL_TRUE, 0, sizeof(cl_uint), output_arr, 1, &evt, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer failed"); bool bResult = (output_arr[0] == 65); free(output_arr); CHECK_RESULT((bResult == false), "Program Scope Variables - test2 failed"); } unsigned int OCLProgramScopeVariables::close(void) { if (kernel1_) { error_ = _wrapper->clReleaseKernel(kernel1_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel1 failed"); kernel1_ = 0; } if (kernel2_) { error_ = _wrapper->clReleaseKernel(kernel2_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel2 failed"); kernel2_ = 0; } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLProgramScopeVariables.h000066400000000000000000000032501450307266000271100ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_ProgramScopeVariables_H_ #define _OCL_ProgramScopeVariables_H_ #include "OCLTestImp.h" class OCLProgramScopeVariables : public OCLTestImp { public: OCLProgramScopeVariables(); virtual ~OCLProgramScopeVariables(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: void test0(void); void test1(void); void test2(void); bool silentFailure; cl_kernel kernel1_; cl_kernel kernel2_; }; #endif // _OCL_ProgramScopeVariables_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLRTQueue.cpp000066400000000000000000000412201450307266000245420ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLRTQueue.h" #include #include #include #include #include "CL/cl.h" static const size_t Iterations = 0x100; static const size_t IterationDivider = 2; static const size_t MaxBuffers = IterationDivider; static const size_t BufSize = 0x800000; const static char* strKernel = "__kernel void factorial(__global uint* out) \n" "{ \n" " uint id = get_global_id(0); \n" " uint factorial = 1; \n" " for (uint i = 1; i < (id / 0x400); ++i) \n" " { \n" " factorial *= i; \n" " } \n" " out[id] = factorial; \n" "} \n"; OCLRTQueue::OCLRTQueue(): rtQueue_(NULL), rtQueue1_(NULL), kernel2_(NULL) { #ifndef CL_VERSION_2_0 _numSubTests = 0; testID_ = 0; failed_ = false; #else _numSubTests = 2; testID_ = 0; failed_ = false; #endif } OCLRTQueue::~OCLRTQueue() {} void OCLRTQueue::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { #ifdef CL_VERSION_2_0 OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); testID_ = test; size_t param_size = 0; char* strVersion = 0; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strVersion = new char[param_size]; error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_VERSION, param_size, strVersion, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strVersion[7] < '2') { failed_ = true; return; } cl_uint rtQueues; #define CL_DEVICE_MAX_REAL_TIME_COMPUTE_QUEUES_AMD 0x404D #define CL_DEVICE_MAX_REAL_TIME_COMPUTE_UNITS_AMD 0x404E #define CL_DEVICE_MAX_REAL_TIME_COMPUTE_UNITS_GRANULARITY_AMD 0x403A error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_MAX_REAL_TIME_COMPUTE_QUEUES_AMD, sizeof(rtQueues), &rtQueues, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (rtQueues < 2) { failed_ = true; return; } error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_MAX_REAL_TIME_COMPUTE_UNITS_AMD, sizeof(rtCUs_), &rtCUs_, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(maxCUs_), &maxCUs_, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); error_ = _wrapper->clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_MAX_REAL_TIME_COMPUTE_UNITS_GRANULARITY_AMD, sizeof(rtCUsGranularity_), &rtCUsGranularity_, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "factorial", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; for (size_t i = 0; i < MaxBuffers; ++i) { buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, BufSize * sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } buffer = _wrapper->clCreateBuffer(context_, CL_MEM_ALLOC_HOST_PTR, BufSize * sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), 
"clCreateBuffer() failed"); buffers_.push_back(buffer); #endif } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLRTQueue::run(void) { #ifdef CL_VERSION_2_0 if (failed_) { return; } void* values; CPerfCounter timer; cl_mem mapBuffer = buffers()[MaxBuffers]; values = _wrapper->clEnqueueMapBuffer(cmdQueues_[_deviceId], mapBuffer, true, (CL_MAP_READ | CL_MAP_WRITE), 0, BufSize * sizeof(cl_uint), 0, NULL, NULL, &error_); cl_mem buffer = buffers()[0]; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); // SubTest: 1 size_t gws[1] = {BufSize}; size_t x; error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(cmdQueues_[_deviceId]); timer.Reset(); timer.Start(); for (x = 0; x < 1; x++) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFinish(cmdQueues_[_deviceId]); timer.Stop(); double sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s double perf = ((double)BufSize * sizeof(cl_uint) * x * (double)(1e-09)) / sec; printf("\n Generic Queue(CUs: %d) Time: %.3fs\n", maxCUs_, sec); // SubTest: 2 bool test_rtq1 = true; if (testID_ == 0) { cu_ = rtCUs_ >> 1; } else { cu_ = rtCUs_; test_rtq1 = false; } if (cu_ == 0) { cu_ = rtCUs_; test_rtq1 = false; } if (cu_ < rtCUsGranularity_) { printf("The num of CUs is less than granularity, skipping...\n"); return; } // Create a real time queue #define CL_QUEUE_REAL_TIME_COMPUTE_UNITS_AMD 0x404f const cl_queue_properties cprops[] = { CL_QUEUE_PROPERTIES, static_cast(0), CL_QUEUE_REAL_TIME_COMPUTE_UNITS_AMD, cu_, 0}; rtQueue_ = _wrapper->clCreateCommandQueueWithProperties( context_, devices_[_deviceId], cprops, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateCommandQueueWithProperties() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(rtQueue_, kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(rtQueue_); timer.Reset(); timer.Start(); for (x = 0; x < 1; x++) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(rtQueue_, kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFinish(rtQueue_); timer.Stop(); sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s perf = ((double)BufSize * sizeof(cl_uint) * x * (double)(1e-09)) / sec; printf(" RT Queue0 (CUs: %2d) Time: %.3fs\n", cu_, sec); rtQueue1_ = nullptr; if (test_rtq1) { #define CL_QUEUE_MEDIUM_PRIORITY_AMD 0x4050 const cl_queue_properties cprops2[] = {CL_QUEUE_PROPERTIES, static_cast(0), CL_QUEUE_MEDIUM_PRIORITY_AMD, cu_, 0}; rtQueue1_ = _wrapper->clCreateCommandQueueWithProperties( context_, devices_[_deviceId], cprops2, &error_); CHECK_RESULT((error_ != CL_SUCCESS), 
"clCreateCommandQueueWithProperties() failed"); } if (rtQueue1_) { error_ = _wrapper->clEnqueueNDRangeKernel(rtQueue1_, kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); _wrapper->clFinish(rtQueue1_); timer.Reset(); timer.Start(); for (x = 0; x < 1; x++) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(rtQueue1_, kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFinish(rtQueue1_); timer.Stop(); sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s perf = ((double)BufSize * sizeof(cl_uint) * x * (double)(1e-09)) / sec; printf(" RT Queue1 (CUs: %2d) Time: %.3fs\n", cu_, sec); } else { if (testID_ == 0) { printf(" RT Queue1 test was skipped. Not enough CUs - %2d)", cu_); } } // SubTest: 3 timer.Reset(); timer.Start(); for (x = 0; x < 1; x++) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFinish(cmdQueues_[_deviceId]); timer.Stop(); sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s perf = ((double)BufSize * sizeof(cl_uint) * x * (double)(1e-09)) / sec; printf(" Generic Queue(CUs: %d) Time: %.3fs\n", maxCUs_ - rtCUs_, sec); // SubTest: 4 for (x = 0; x < Iterations / 10; x++) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFlush(cmdQueues_[_deviceId]); timer.Reset(); timer.Start(); for (x = 0; x < 1; x++) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(rtQueue_, kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFinish(rtQueue_); timer.Stop(); _wrapper->clFinish(cmdQueues_[_deviceId]); sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s perf = ((double)BufSize * sizeof(cl_uint) * x * (double)(1e-09)) / sec; printf(" Async RT0(CUs: %d) + Generic(CUs: %d) Time: %.3fs\n", cu_, maxCUs_ - rtCUs_, sec); // SubTest: 5 if (rtQueue1_) { for (x = 0; x < Iterations / 10; x++) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFlush(cmdQueues_[_deviceId]); timer.Reset(); timer.Start(); for (x = 0; x < 1; x++) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(rtQueue1_, kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } 
_wrapper->clFinish(rtQueue1_); timer.Stop(); _wrapper->clFinish(cmdQueues_[_deviceId]); sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s perf = ((double)BufSize * sizeof(cl_uint) * x * (double)(1e-09)) / sec; printf(" Async RT1(CUs: %d) + Generic(CUs: %d) Time: %.3fs\n", cu_, maxCUs_ - rtCUs_, sec); } else { if (testID_ == 0) { printf(" RT Queue1 test was skipped. Not enough CUs - %2d)", cu_); } } // SubTest: 6 if (rtQueue1_) { for (x = 0; x < Iterations / 10; x++) { error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFlush(cmdQueues_[_deviceId]); timer.Reset(); timer.Start(); for (x = 0; x < 1; x++) { error_ = _wrapper->clEnqueueNDRangeKernel(rtQueue_, kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFlush(rtQueue_); for (x = 0; x < 1; x++) { error_ = _wrapper->clEnqueueNDRangeKernel(rtQueue1_, kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); } _wrapper->clFlush(rtQueue1_); _wrapper->clFinish(rtQueue_); _wrapper->clFinish(rtQueue1_); timer.Stop(); _wrapper->clFlush(cmdQueues_[_deviceId]); sec = timer.GetElapsedTime(); // Buffer read bandwidth in GB/s perf = ((double)BufSize * sizeof(cl_uint) * x * (double)(1e-09)) / sec; printf(" Async RT0(CUs: %d) + RT1(CUs: %d) + Generic(CUs: %d) Time: %.3fs\n", cu_, cu_, maxCUs_ - rtCUs_, sec); error_ = _wrapper->clEnqueueUnmapMemObject(cmdQueues_[_deviceId], mapBuffer, values, 0, NULL, NULL); _wrapper->clFinish(cmdQueues_[_deviceId]); } else { if (testID_ == 0) { printf(" RT Queue1 test was skipped. Not enough CUs - %2d)", cu_); } } #endif } unsigned int OCLRTQueue::close(void) { #ifdef CL_VERSION_2_0 if (NULL != rtQueue_) { _wrapper->clReleaseCommandQueue(rtQueue_); } if (NULL != rtQueue1_) { _wrapper->clReleaseCommandQueue(rtQueue1_); } if (NULL != kernel2_) { _wrapper->clReleaseKernel(kernel2_); } return OCLTestImp::close(); #else return CL_SUCCESS; #endif } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLRTQueue.h000066400000000000000000000032431450307266000242120ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _OCL_RT_QUEUE_H_ #define _OCL_RT_QUEUE_H_ #include "OCLTestImp.h" class OCLRTQueue : public OCLTestImp { public: OCLRTQueue(); virtual ~OCLRTQueue(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: cl_command_queue rtQueue_; cl_command_queue rtQueue1_; cl_kernel kernel2_; unsigned int testID_; bool failed_; cl_uint cu_; cl_uint maxCUs_; cl_uint rtCUs_; cl_uint rtCUsGranularity_; }; #endif // _OCL_RT_QUEUE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLReadWriteImage.cpp000066400000000000000000000363371450307266000260560ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLReadWriteImage.h" #include #include #include #include #ifdef __linux__ #include #include #endif #include "CL/cl.h" const static size_t imageSize = 4; const static size_t MaxSubTests = 4; static const char *rgba8888_kernel_read = "\n" "__kernel void read_rgba8888(read_only image2d_t srcimg, __global uchar4 " "*dst, sampler_t sampler)\n" "{\n" " int tid_x = get_global_id(0);\n" " int tid_y = get_global_id(1);\n" " int indx = tid_y * get_image_width(srcimg) + tid_x;\n" " float4 color;\n" "\n" " color = read_imagef(srcimg, sampler, (int2)(tid_x, tid_y)) * 255.0f;\n" " dst[indx] = convert_uchar4_rte(color);\n" "\n" "}\n"; static const char *rgba8888_kernel_write = "\n" "__kernel void write_rgba8888(__global unsigned char *src, write_only " "image2d_t dstimg)\n" "{\n" " int tid_x = get_global_id(0);\n" " int tid_y = get_global_id(1);\n" " int indx = tid_y * get_image_width(dstimg) + tid_x;\n" " float4 color;\n" "\n" " indx *= 4;\n" " color = (float4)((float)src[indx+0], (float)src[indx+1], " "(float)src[indx+2], (float)src[indx+3]);\n" " color /= (float4)(255.0f, 255.0f, 255.0f, 255.0f);\n" " write_imagef(dstimg, (int2)(tid_x, tid_y), color);\n" "\n" "}\n"; OCLReadWriteImage::OCLReadWriteImage() { _numSubTests = MaxSubTests; done_ = false; imageWidth = imageSize; imageHeight = imageSize; imageDepth = imageSize; } OCLReadWriteImage::~OCLReadWriteImage() {} bool OCLReadWriteImage::verifyImageData(unsigned char *inputImageData, unsigned char *output, size_t width, size_t height) { for (unsigned int i = 0; i < 4 * width * height; i++) { if (output[i] != inputImageData[i]) { printf( "Verification failed at byte %u in the output image => %x != %x " "[reference]\n", i, output[i], inputImageData[i]); return false; } } return true; } void 
OCLReadWriteImage::open(unsigned int test, char *units, double &conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); testID_ = test; cl_bool imageSupport; size_t size; for (size_t i = 0; i < deviceCount_; ++i) { _wrapper->clGetDeviceInfo(devices_[i], CL_DEVICE_IMAGE_SUPPORT, sizeof(imageSupport), &imageSupport, &size); if (!imageSupport) { testDescString = "Image not supported, skipping this test! "; done_ = true; return; } } if (test == 1) { program_ = _wrapper->clCreateProgramWithSource( context_, 1, &rgba8888_kernel_read, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "read_rgba8888", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); } else if ((test == 2) || (test == 3)) { program_ = _wrapper->clCreateProgramWithSource( context_, 1, &rgba8888_kernel_write, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "write_rgba8888", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); } cl_mem memory; cl_image_format imgageFormat; imgageFormat.image_channel_order = CL_RGBA; imgageFormat.image_channel_data_type = CL_UNORM_INT8; bufferSize = imageWidth * imageHeight * 4 * sizeof(unsigned char); memory = _wrapper->clCreateImage2D(context_, CL_MEM_READ_WRITE, &imgageFormat, imageWidth, imageHeight, 0, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateImage() failed"); buffers_.push_back(memory); if ((test == 1) || (test == 2) || (test == 3)) { memory = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, bufferSize, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(memory); } } static void CL_CALLBACK notify_callback(const char *errinfo, const void *private_info, size_t cb, void *user_data) {} void OCLReadWriteImage::run(void) { if (done_) { return; } const unsigned int inputImageData[imageSize][imageSize] = { {0xc0752fac, 0x67c3fb43, 0xf215d309, 0xd8465724}, {0xc13a8c58, 0xae5727e6, 0x19a55158, 0x9409484d}, {0xc5f3d073, 0xc0af4ffe, 0xb1d86352, 0x93931df3}, {0xc120a78e, 0x207fb909, 0x97f4ca1f, 0x72cbfea3}}; unsigned char *outputPtr = (unsigned char *)malloc(bufferSize); size_t origin[3] = {0, 0, 0}; size_t region[3] = {imageWidth, imageHeight, 1}; bool validation; size_t threads[2]; switch (testID_) { case 0: // ImageWrite (w/ sDMA) and ImageRead (w/ sDMA) error_ = _wrapper->clEnqueueWriteImage(cmdQueues_[_deviceId], buffers_[0], true, origin, region, 0, 0, inputImageData, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteImage() failed"); error_ = _wrapper->clEnqueueReadImage(cmdQueues_[_deviceId], 
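// clEnqueue{Write,Read}Image address the image through origin/region triples;
// for a 2D image the region's third component must be 1, and a row pitch of 0
// means the host data is tightly packed. A sketch mirroring the case 0 path
// (q, img, width, height, and hostPtr are illustrative names):
//
//   size_t origin[3] = {0, 0, 0};
//   size_t region[3] = {width, height, 1};
//   err = clEnqueueWriteImage(q, img, CL_TRUE, origin, region,
//                             0 /* row pitch: packed */, 0 /* slice pitch */,
//                             hostPtr, 0, NULL, NULL);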
buffers_[0], true, origin, region, 0, 0, outputPtr, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadImage() failed"); validation = verifyImageData((unsigned char *)&inputImageData, outputPtr, imageWidth, imageHeight); if (validation) { printf("ImageWrite (w/ sDMA) -> ImageRead (w/ sDMA) passed!\n"); } else { CHECK_RESULT(true, "ImageWrite (w/ sDMA) -> ImageRead (w/ sDMA) failed!\n"); } break; case 1: // ImageWrite (w/ sDMA) and ImageRead (w/ kernel) error_ = _wrapper->clEnqueueWriteImage(cmdQueues_[_deviceId], buffers_[0], true, origin, region, 0, 0, inputImageData, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteImage() failed"); cl_sampler sampler; sampler = _wrapper->clCreateSampler(context_, CL_FALSE, CL_ADDRESS_CLAMP_TO_EDGE, CL_FILTER_NEAREST, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateSampler failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof buffers_[0], &buffers_[0]); error_ |= clSetKernelArg(kernel_, 1, sizeof buffers_[1], &buffers_[1]); error_ |= clSetKernelArg(kernel_, 2, sizeof sampler, &sampler); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed\n"); threads[0] = (unsigned int)imageWidth; threads[1] = (unsigned int)imageHeight; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 2, NULL, threads, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[1], CL_TRUE, 0, bufferSize, outputPtr, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); validation = verifyImageData((unsigned char *)&inputImageData, outputPtr, imageWidth, imageHeight); if (validation) { printf("ImageWrite (w/ sDMA) -> ImageRead (w/ kernel) passed!\n"); } else { CHECK_RESULT(true, "ImageWrite (w/ sDMA) -> ImageRead (w/ kernel) failed!\n"); } break; case 2: // ImageWrite (w/ kernel) and ImageRead (w/ sDMA) error_ = _wrapper->clEnqueueWriteBuffer( cmdQueues_[_deviceId], buffers_[1], CL_TRUE, 0, bufferSize, inputImageData, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof buffers_[1], &buffers_[1]); error_ |= clSetKernelArg(kernel_, 1, sizeof buffers_[0], &buffers_[0]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed\n"); threads[0] = (unsigned int)imageWidth; threads[1] = (unsigned int)imageHeight; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 2, NULL, threads, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clEnqueueReadImage(cmdQueues_[_deviceId], buffers_[0], true, origin, region, 0, 0, outputPtr, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadImage() failed"); validation = verifyImageData((unsigned char *)&inputImageData, outputPtr, imageWidth, imageHeight); if (validation) { printf("ImageWrite (w/ kernel) -> ImageRead (w/ sDMA) passed!\n"); } else { CHECK_RESULT(true, "ImageWrite (w/ kernel) -> ImageRead (w/ sDMA) failed!\n"); } break; case 3: // ImageWrite (w/ kernel) and ImageRead (w/ kernel) error_ = _wrapper->clEnqueueWriteBuffer( cmdQueues_[_deviceId], buffers_[1], CL_TRUE, 0, bufferSize, inputImageData, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof buffers_[1], &buffers_[1]); error_ |= clSetKernelArg(kernel_, 1, sizeof buffers_[0], &buffers_[0]); 
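// Note: case 3 below recreates program_ and kernel_ in place to switch to the
// read kernel, overwriting the handles created in open() without a
// clReleaseProgram/clReleaseKernel, so those leak for this subtest. A sketch
// of the tidier order (assuming the base class only releases the handles that
// are current at close()):
//
//   if (kernel_)  { clReleaseKernel(kernel_);   kernel_  = NULL; }
//   if (program_) { clReleaseProgram(program_); program_ = NULL; }
//   program_ = clCreateProgramWithSource(context_, 1, &rgba8888_kernel_read,
//                                        NULL, &error_);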
CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed\n"); threads[0] = (unsigned int)imageWidth; threads[1] = (unsigned int)imageHeight; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 2, NULL, threads, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); // recreate the program_ to use the read kernel program_ = _wrapper->clCreateProgramWithSource( context_, 1, &rgba8888_kernel_read, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[_deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[_deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "read_rgba8888", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); sampler = _wrapper->clCreateSampler(context_, CL_FALSE, CL_ADDRESS_CLAMP_TO_EDGE, CL_FILTER_NEAREST, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateSampler failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof buffers_[0], &buffers_[0]); error_ |= clSetKernelArg(kernel_, 1, sizeof buffers_[1], &buffers_[1]); error_ |= clSetKernelArg(kernel_, 2, sizeof sampler, &sampler); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg failed\n"); threads[0] = (unsigned int)imageWidth; threads[1] = (unsigned int)imageHeight; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 2, NULL, threads, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[1], CL_TRUE, 0, bufferSize, outputPtr, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); validation = verifyImageData((unsigned char *)&inputImageData, outputPtr, imageWidth, imageHeight); if (validation) { printf("ImageWrite (w/ kernel) -> ImageRead (w/ kernel) passed!\n"); } else { CHECK_RESULT( true, "ImageWrite (w/ kernel) -> ImageRead (w/ kernel) failed!\n"); } break; } free(outputPtr); } unsigned int OCLReadWriteImage::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLReadWriteImage.h000066400000000000000000000034541450307266000255150ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_READ_WRITE_IMAGE_H_ #define _OCL_READ_WRITE_IMAGE_H_ #include "OCLTestImp.h" class OCLReadWriteImage : public OCLTestImp { public: OCLReadWriteImage(); virtual ~OCLReadWriteImage(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool done_; unsigned int testID_; size_t maxSize_; size_t imageWidth; size_t imageHeight; size_t imageDepth; size_t bufferSize; cl_sampler sampler; bool verifyImageData(unsigned char* inputImageData, unsigned char* output, size_t width, size_t height); }; #endif // _OCL_READ_WRITE_IMAGE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLSDI.cpp000066400000000000000000000513551450307266000236410ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLSDI.h" #include "Timer.h" #define NUM_TESTS 6 #include typedef struct _threadInfo { int threadID_; OCLSDI* testObj_; } ThreadInfo; const char* kernel_str_ = "__kernel void test_kernel(global unsigned int * A) \ { \ int id = get_global_id(0); \ A[id] = id + 2;\ } "; const char* testNames[NUM_TESTS] = { "WriteBuffer", "CopyBuffer", "NDRangeKernel", "MapBuffer", "WriteBufferRect", "CopyImageToBuffer", }; void* ThreadMain(void* data) { if (data == NULL) { return 0; } ThreadInfo* threadData = (ThreadInfo*)data; threadData->testObj_->threadEntry(threadData->threadID_); return NULL; } OCLSDI::OCLSDI() { // If there are two different gpus in the system, // we have to test each of them as sender and receiver _numSubTests = 2 * NUM_TESTS; } OCLSDI::~OCLSDI() {} void OCLSDI::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { cl_uint numPlatforms = 0; cl_platform_id platform = NULL; cl_uint num_devices = 0; _crcword = 0; conversion = 1.0f; program_ = 0; kernel_ = 0; srcBuff_ = 0; _openTest = test % NUM_TESTS; bufSize_ = 0x10000; error_ = 0; markerValue_ = 0x12345; inputArr_ = 0; outputArr_ = 0; success_ = true; extPhysicalBuff_ = 0; silentFailure = false; busAddressableBuff_ = 0; devices_[0] = devices_[1] = 0; contexts_[0] = contexts_[1] = 0; cmd_queues_[0] = cmd_queues_[1] = 0; image_ = 0; inputArr_ = (cl_uint*)malloc(bufSize_); outputArr_ = (cl_uint*)malloc(bufSize_); for (unsigned int i = 0; i < (bufSize_ / sizeof(cl_uint)); ++i) { inputArr_[i] = i + 1; outputArr_[i] = 0; } error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(numPlatforms == 0, "clGetPlatformIDs failed"); error_ = _wrapper->clGetPlatformIDs(1, &platform, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); error_ = _wrapper->clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices); if (num_devices < 2) { printf("\nSilent Failure: Two GPUs are required to run OCLSdi test\n"); silentFailure = true; return; } error_ = _wrapper->clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, devices_, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); if (test >= NUM_TESTS) { cl_device_id temp = devices_[0]; devices_[0] = devices_[1]; devices_[1] = temp; } size_t param_size = 0; char* strExtensions = 0; error_ = _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_EXTENSIONS, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strExtensions = (char*)malloc(param_size); error_ = _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_EXTENSIONS, param_size, strExtensions, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strstr(strExtensions, "cl_amd_bus_addressable_memory") == 0) { printf( "\nSilent Failure: cl_amd_bus_addressable_memory extension is not " "enabled on GPU 0\n"); silentFailure = true; free(strExtensions); return; } free(strExtensions); error_ = _wrapper->clGetDeviceInfo(devices_[1], CL_DEVICE_EXTENSIONS, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strExtensions = (char*)malloc(param_size); error_ = _wrapper->clGetDeviceInfo(devices_[1], CL_DEVICE_EXTENSIONS, param_size, strExtensions, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (strstr(strExtensions, "cl_amd_bus_addressable_memory") == 0) { printf( "\nSilent Failure: cl_amd_bus_addressable_memory extension is not " "enabled on GPU 1\n"); silentFailure = true; free(strExtensions); return; } free(strExtensions); deviceNames_ = " ["; param_size = 0; char* strDeviceName = 0; error_ = 
_wrapper->clGetDeviceInfo(devices_[1], CL_DEVICE_NAME, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strDeviceName = (char*)malloc(param_size); error_ = _wrapper->clGetDeviceInfo(devices_[1], CL_DEVICE_NAME, param_size, strDeviceName, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); deviceNames_ = deviceNames_ + strDeviceName; free(strDeviceName); error_ = _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_NAME, 0, 0, ¶m_size); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); strDeviceName = (char*)malloc(param_size); error_ = _wrapper->clGetDeviceInfo(devices_[0], CL_DEVICE_NAME, param_size, strDeviceName, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); deviceNames_ = deviceNames_ + "->"; deviceNames_ = deviceNames_ + strDeviceName; free(strDeviceName); deviceNames_ = deviceNames_ + "]"; cl_context_properties props[3] = {CL_CONTEXT_PLATFORM, (cl_context_properties)platform, 0}; contexts_[0] = _wrapper->clCreateContext(props, 1, &devices_[0], 0, 0, &error_); CHECK_RESULT(contexts_[0] == 0, "clCreateContext failed"); contexts_[1] = _wrapper->clCreateContext(props, 1, &devices_[1], 0, 0, &error_); CHECK_RESULT(contexts_[1] == 0, "clCreateContext failed"); cmd_queues_[0] = _wrapper->clCreateCommandQueue(contexts_[0], devices_[0], 0, NULL); CHECK_RESULT(cmd_queues_[0] == 0, "clCreateCommandQueue failed"); cmd_queues_[1] = _wrapper->clCreateCommandQueue(contexts_[1], devices_[1], 0, NULL); CHECK_RESULT(cmd_queues_[1] == 0, "clCreateCommandQueue failed"); busAddressableBuff_ = _wrapper->clCreateBuffer( contexts_[0], CL_MEM_BUS_ADDRESSABLE_AMD, bufSize_, 0, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); error_ = _wrapper->clEnqueueMakeBuffersResidentAMD( cmd_queues_[0], 1, &busAddressableBuff_, true, &busAddr_, 0, 0, 0); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueMakeBuffersResidentAMD failed"); extPhysicalBuff_ = _wrapper->clCreateBuffer( contexts_[1], CL_MEM_EXTERNAL_PHYSICAL_AMD, bufSize_, &busAddr_, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer failed"); error_ = _wrapper->clEnqueueWriteSignalAMD(cmd_queues_[1], extPhysicalBuff_, 0, 0, 0, 0, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteSignalAMD failed"); error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); srcBuff_ = _wrapper->clCreateBuffer(contexts_[1], CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, bufSize_, inputArr_, &error_); CHECK_RESULT(error_ != CL_SUCCESS, "clCreateBuffer failed"); error_ = _wrapper->clEnqueueMigrateMemObjects(cmd_queues_[1], 1, &extPhysicalBuff_, 0, 0, 0, 0); CHECK_RESULT(error_, "clEnqueueMigrateMemObjects failed"); error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); error_ = _wrapper->clEnqueueMigrateMemObjects(cmd_queues_[1], 1, &srcBuff_, 0, 0, 0, 0); CHECK_RESULT(error_, "clEnqueueMigrateMemObjects failed"); error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); if (_openTest == 2) { program_ = _wrapper->clCreateProgramWithSource(contexts_[1], 1, &kernel_str_, NULL, &error_); CHECK_RESULT(error_, "clCreateProgramWithSource failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[1], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char* errorstr; size_t size; _wrapper->clGetProgramBuildInfo(program_, devices_[1], CL_PROGRAM_BUILD_LOG, 0, NULL, &size); errorstr = new char[size]; _wrapper->clGetProgramBuildInfo( program_, devices_[1], CL_PROGRAM_BUILD_LOG, size, errorstr, &size); 
printf("\n%s\n", errorstr); delete[] errorstr; } CHECK_RESULT(error_, "clBuildProgram failed"); kernel_ = _wrapper->clCreateKernel(program_, "test_kernel", &error_); CHECK_RESULT(error_, "clCreateKernel failed"); error_ = _wrapper->clSetKernelArg(kernel_, 0, sizeof(cl_mem), (void*)&extPhysicalBuff_); CHECK_RESULT(error_, "clSetKernelArg failed"); } if (_openTest == 5) { cl_image_format format = {CL_R, CL_UNSIGNED_INT32}; cl_image_desc desc; desc.image_type = CL_MEM_OBJECT_IMAGE1D; desc.image_width = bufSize_ / sizeof(cl_uint); desc.image_height = 0; desc.image_depth = 0; desc.image_array_size = 0; desc.image_row_pitch = 0; desc.image_slice_pitch = 0; desc.num_mip_levels = 0; desc.num_samples = 0; desc.buffer = (cl_mem)NULL; image_ = _wrapper->clCreateImage(contexts_[1], CL_MEM_READ_ONLY, &format, &desc, 0, &error_); CHECK_RESULT(error_, "clCreateImage failed"); } } void OCLSDI::run(void) { if (silentFailure) { return; } ++markerValue_; OCLutil::Thread threads[2]; ThreadInfo threadInfo[2]; threadInfo[0].testObj_ = threadInfo[1].testObj_ = this; threadInfo[0].threadID_ = 0; threadInfo[1].threadID_ = 1; threads[0].create(ThreadMain, &threadInfo[0]); threads[1].create(ThreadMain, &threadInfo[1]); threads[0].join(); threads[1].join(); char* descString = (char*)malloc(25 + deviceNames_.size()); sprintf(descString, "%-20s%s", testNames[_openTest], deviceNames_.c_str()); testDescString = descString; free(descString); if (!success_) { _errorFlag = true; _crcword += 1; } } void OCLSDI::threadEntry(int threadID) { if (silentFailure) { return; } switch (_openTest) { case 0: testEnqueueWriteBuffer(threadID); break; case 1: testEnqueueCopyBuffer(threadID); break; case 2: testEnqueueNDRangeKernel(threadID); break; case 3: testEnqueueMapBuffer(threadID); break; case 4: testEnqueueWriteBufferRect(threadID); break; case 5: testEnqueueCopyImageToBuffer(threadID); break; } } unsigned int OCLSDI::close(void) { if (srcBuff_) { error_ = _wrapper->clReleaseMemObject(srcBuff_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject failed"); } if (extPhysicalBuff_) { error_ = _wrapper->clReleaseMemObject(extPhysicalBuff_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject failed"); } if (busAddressableBuff_) { error_ = _wrapper->clReleaseMemObject(busAddressableBuff_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject failed"); } if (cmd_queues_[0]) { error_ = _wrapper->clReleaseCommandQueue(cmd_queues_[0]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (cmd_queues_[1]) { error_ = _wrapper->clReleaseCommandQueue(cmd_queues_[1]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseCommandQueue failed"); } if (contexts_[0]) { error_ = _wrapper->clReleaseContext(contexts_[0]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } if (contexts_[1]) { error_ = _wrapper->clReleaseContext(contexts_[1]); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseContext failed"); } if (program_) { error_ = _wrapper->clReleaseProgram(program_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseProgram failed"); } if (kernel_) { error_ = _wrapper->clReleaseKernel(kernel_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseKernel failed"); } if (image_) { error_ = _wrapper->clReleaseMemObject(image_); CHECK_RESULT_NO_RETURN(error_ != CL_SUCCESS, "clReleaseMemObject failed"); } if (inputArr_) { free(inputArr_); } if (outputArr_) { free(outputArr_); } return _crcword; } void OCLSDI::readAndVerifyResult() { 
memset(outputArr_, 0, bufSize_); error_ = _wrapper->clEnqueueWaitSignalAMD(cmd_queues_[0], busAddressableBuff_, markerValue_, 0, 0, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWaitSignalAMD failed"); error_ = _wrapper->clEnqueueReadBuffer(cmd_queues_[0], busAddressableBuff_, CL_TRUE, 0, bufSize_, outputArr_, 0, 0, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueReadBuffer failed"); success_ = (memcmp(inputArr_, outputArr_, bufSize_) == 0); } void OCLSDI::testEnqueueCopyImageToBuffer(int threadID) { if (threadID == 0) { size_t origin[3] = {0, 0, 0}; size_t region[3] = {bufSize_ / sizeof(cl_uint), 1, 1}; memset(inputArr_, (_openTest + 1), bufSize_); error_ = _wrapper->clEnqueueWriteImage(cmd_queues_[1], image_, CL_TRUE, origin, region, 0, 0, inputArr_, 0, 0, 0); CHECK_RESULT(error_, "clEnqueueWriteImage failed"); _wrapper->clFinish(cmd_queues_[1]); error_ = _wrapper->clEnqueueCopyImageToBuffer( cmd_queues_[1], image_, extPhysicalBuff_, origin, region, 0, 0, 0, 0); CHECK_RESULT(error_, "clEnqueueCopyImageToBuffer failed"); _wrapper->clFinish(cmd_queues_[1]); error_ = _wrapper->clEnqueueWriteSignalAMD(cmd_queues_[1], extPhysicalBuff_, markerValue_, 0, 0, 0, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteSignalAMD failed"); error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); } else { readAndVerifyResult(); } } void OCLSDI::testEnqueueWriteBufferRect(int threadID) { size_t width = (size_t)sqrt((float)bufSize_); size_t bufOrigin[3] = {0, 0, 0}; size_t hostOrigin[3] = {0, 0, 0}; size_t region[3] = {width, width, 1}; if (threadID == 0) { memset(inputArr_, (_openTest + 1), bufSize_); error_ = _wrapper->clEnqueueWriteBufferRect( cmd_queues_[1], extPhysicalBuff_, CL_TRUE, bufOrigin, hostOrigin, region, width, 0, width, 0, inputArr_, 0, 0, 0); CHECK_RESULT(error_, "clEnqueueWriteBufferRect failed"); error_ = _wrapper->clEnqueueWriteSignalAMD(cmd_queues_[1], extPhysicalBuff_, markerValue_, 0, 0, 0, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteSignalAMD failed"); error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); } else { memset(outputArr_, 0, bufSize_); error_ = _wrapper->clEnqueueWaitSignalAMD( cmd_queues_[0], busAddressableBuff_, markerValue_, 0, 0, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWaitSignalAMD failed"); error_ = _wrapper->clEnqueueReadBufferRect( cmd_queues_[0], busAddressableBuff_, CL_TRUE, bufOrigin, hostOrigin, region, width, 0, width, 0, outputArr_, 0, 0, 0); CHECK_RESULT(error_, "clEnqueueReadBufferRect failed"); success_ = (memcmp(inputArr_, outputArr_, bufSize_) == 0); } } void OCLSDI::testEnqueueMapBuffer(int threadID) { if (threadID == 0) { memset(inputArr_, (_openTest + 1), bufSize_); error_ = _wrapper->clEnqueueWriteBuffer(cmd_queues_[1], extPhysicalBuff_, CL_TRUE, 0, bufSize_, inputArr_, 0, 0, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteBuffer failed"); error_ = _wrapper->clEnqueueWriteSignalAMD(cmd_queues_[1], extPhysicalBuff_, markerValue_, 0, 0, 0, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteSignalAMD failed"); error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); } else { error_ = _wrapper->clEnqueueWaitSignalAMD( cmd_queues_[0], busAddressableBuff_, markerValue_, 0, 0, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWaitSignalAMD failed"); void* ptr = _wrapper->clEnqueueMapBuffer( cmd_queues_[0], busAddressableBuff_, CL_TRUE, CL_MAP_READ, 0, bufSize_, 0, 0, 0, &error_); CHECK_RESULT(error_, "clEnqueueMapBuffer failed"); 
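// The blocking map gives the host direct visibility into the bus-addressable buffer, so its contents can be compared against the data the peer device wrote through the external physical buffer.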
success_ = (memcmp(inputArr_, ptr, bufSize_) == 0); error_ = _wrapper->clEnqueueUnmapMemObject( cmd_queues_[0], busAddressableBuff_, ptr, 0, 0, 0); CHECK_RESULT(error_, "clEnqueueUnmapMemObject failed"); error_ = _wrapper->clFinish(cmd_queues_[0]); CHECK_RESULT(error_, "clFinish failed"); } } void OCLSDI::testEnqueueNDRangeKernel(int threadID) { if (threadID == 0) { size_t global_work_size = bufSize_ / sizeof(cl_uint); error_ = _wrapper->clEnqueueNDRangeKernel(cmd_queues_[1], kernel_, 1, NULL, &global_work_size, NULL, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueNDRangeKernel failed"); error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); error_ = _wrapper->clEnqueueWriteSignalAMD(cmd_queues_[1], extPhysicalBuff_, markerValue_, 0, 0, 0, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteSignalAMD failed"); error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); } else { memset(outputArr_, 0, bufSize_); error_ = _wrapper->clEnqueueWaitSignalAMD( cmd_queues_[0], busAddressableBuff_, markerValue_, 0, 0, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWaitSignalAMD failed"); error_ = _wrapper->clEnqueueReadBuffer(cmd_queues_[0], busAddressableBuff_, CL_TRUE, 0, bufSize_, outputArr_, 0, 0, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueReadBuffer failed"); success_ = true; for (cl_uint i = 0; i < bufSize_ / sizeof(cl_uint); ++i) { success_ &= (outputArr_[i] == i + 2); } } } void OCLSDI::testEnqueueCopyBuffer(int threadID) { if (threadID == 0) { memset(inputArr_, (_openTest + 1), bufSize_); error_ = _wrapper->clEnqueueWriteBuffer(cmd_queues_[1], srcBuff_, CL_TRUE, 0, bufSize_, inputArr_, 0, 0, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteBuffer failed"); error_ = _wrapper->clEnqueueCopyBuffer(cmd_queues_[1], srcBuff_, extPhysicalBuff_, 0, 0, bufSize_, 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBuffer failed"); error_ = _wrapper->clEnqueueWriteSignalAMD(cmd_queues_[1], extPhysicalBuff_, markerValue_, 0, 0, 0, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteSignalAMD failed"); error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); } else { readAndVerifyResult(); } } void OCLSDI::testEnqueueWriteBuffer(int threadID) { if (threadID == 0) { memset(inputArr_, (_openTest + 1), bufSize_); error_ = _wrapper->clEnqueueWriteBuffer(cmd_queues_[1], extPhysicalBuff_, CL_TRUE, 0, bufSize_, inputArr_, 0, 0, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteBuffer failed"); error_ = _wrapper->clEnqueueWriteSignalAMD(cmd_queues_[1], extPhysicalBuff_, markerValue_, 0, 0, 0, 0); CHECK_RESULT(error_ != CL_SUCCESS, "clEnqueueWriteSignalAMD failed"); error_ = _wrapper->clFinish(cmd_queues_[1]); CHECK_RESULT(error_, "clFinish failed"); } else { readAndVerifyResult(); } } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLSDI.h000066400000000000000000000043321450307266000232770ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_OCLSDI_H_ #define _OCL_OCLSDI_H_ #include <string> #include "OCLTestImp.h" class OCLSDI : public OCLTestImp { public: OCLSDI(); virtual ~OCLSDI(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); void threadEntry(int threadID); private: void testEnqueueWriteBuffer(int threadID); void testEnqueueCopyBuffer(int threadID); void testEnqueueNDRangeKernel(int threadID); void testEnqueueMapBuffer(int threadID); void testEnqueueWriteBufferRect(int threadID); void testEnqueueCopyImageToBuffer(int threadID); void readAndVerifyResult(); bool silentFailure; cl_context contexts_[2]; cl_device_id devices_[2]; cl_command_queue cmd_queues_[2]; cl_mem extPhysicalBuff_; cl_mem busAddressableBuff_; cl_int error_; cl_bus_address_amd busAddr_; cl_uint* inputArr_; cl_uint* outputArr_; unsigned int bufSize_; bool success_; cl_uint markerValue_; cl_mem srcBuff_; cl_program program_; cl_kernel kernel_; cl_mem image_; std::string deviceNames_; }; #endif // _OCL_OCLSDI_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLSVM.cpp000066400000000000000000000531741450307266000236700ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/ #include "OCLSVM.h" #include #include #include #ifdef _WIN32 #include #include #endif #include #define NUM_SIZES 6 #define OCL_CHECK(error) \ if (error != CL_SUCCESS) { \ fprintf(stderr, "OpenCL API invocation failed at %s:%d\n", __FILE__, \ __LINE__); \ exit(-1); \ } #define STR(__macro__) #__macro__ #ifdef _WIN32 size_t getTotalSystemMemory() { MEMORYSTATUSEX status; status.dwLength = sizeof(status); GlobalMemoryStatusEx(&status); return status.ullTotalPhys; } #endif template static unsigned countOf(const T (&)[N]) { return N; } const static char* sources[] = { STR(__kernel void test(__global int* ptr) { ptr[get_global_id(0)] = 0xDEADBEEF; }), STR(__kernel void test(__global int* ptr, __global int* ptr2) { ptr[get_global_id(0)] = 0xDEADBEEF; ptr2[get_global_id(0)] = 0xDEADF00D; }), STR(__kernel void test(__global long* ptr) { ptr[get_global_id(0) * 1024] = 0xBAADF00D; }), STR(__kernel void test(__global ulong* ptr) { while (ptr) { *ptr = 0xDEADBEEF; ptr = *((__global ulong* __global*)(ptr + 1)); } }), STR(__kernel void test(__global volatile int* ptr, int numIterations) { for (int i = 0; i < numIterations; i++) { // This should be: // atomic_fetch_add_explicit(ptr, 1, memory_order_relaxed, // memory_scope_all_svm_devices); // But using device atomics is mapped to the same ISA and compiles // in OpenCL 1.2 atomic_inc(ptr); } }), STR(__kernel void test(){ // dummy }), STR(__kernel void test(int8 arg0, __global int* arg1, int arg2, __global int* arg3, __global float* arg4){ // dummy }), STR(__kernel void test(__global int* ptr, int to) { // dummy kernel that takes a long time to complete for (int i = 0; i < to; ++i) { // avoid compiler optimizations if (ptr[get_global_id(0)] != 17) { ptr[get_global_id(0)]++; } else { ptr[get_global_id(0)] += 2; } } }), STR(__kernel void test(){ // dummy })}; OCLSVM::OCLSVM() { _numSubTests = countOf(sources); } OCLSVM::~OCLSVM() {} void OCLSVM::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_ERROR(error_, "Error opening test"); _openTest = test; if (!isOpenClSvmAvailable(devices_[_deviceId])) { printf("Device does not support any SVM features, skipping...\n"); return; } program_ = _wrapper->clCreateProgramWithSource( context_, 1, sources + _openTest, NULL, &error_); CHECK_ERROR(error_, "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], "-cl-std=CL2.0", NULL, NULL); CHECK_ERROR(error_, "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "test", &error_); CHECK_ERROR(error_, "clCreateKernel() failed"); } #ifndef CL_VERSION_2_0 // make sure the tests compile in OpenCL <= 1.2 void OCLSVM::runFineGrainedBuffer() {} void OCLSVM::runFineGrainedSystem() {} void OCLSVM::runFineGrainedSystemLargeAllocations() {} void OCLSVM::runLinkedListSearchUsingFineGrainedSystem() {} void OCLSVM::runPlatformAtomics() {} void OCLSVM::runEnqueueOperations() {} void OCLSVM::runSvmArgumentsAreRecognized() {} void OCLSVM::runSvmCommandsExecutedInOrder() {} void OCLSVM::runIdentifySvmBuffers() {} #else void OCLSVM::runFineGrainedBuffer() { if (!(svmCaps_ & CL_DEVICE_SVM_FINE_GRAIN_BUFFER)) { printf( "Device does not support fined-grained buffer sharing, skipping " "test...\n"); return; } const size_t numElements = 256; int* ptr = (int*)clSVMAlloc(context_, CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER, numElements * sizeof(int), 0); CHECK_RESULT(!ptr, "clSVMAlloc() failed"); error_ = 
clSetKernelArgSVMPointer(kernel_, 0, ptr); CHECK_ERROR(error_, "clSetKernelArgSVMPointer() failed"); size_t gws[1] = {numElements}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_ERROR(error_, "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(error_, "Queue::finish() failed"); size_t matchingElements = std::count(ptr, ptr + numElements, (int)0xDEADBEEF); CHECK_RESULT(matchingElements != numElements, "Expected: %zd, found:%zd", numElements, matchingElements); clSVMFree(context_, ptr); } void OCLSVM::runFineGrainedSystem() { if (!(svmCaps_ & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM)) { printf( "Device does not support fine-grained system sharing, skipping " "test...\n"); return; } const size_t numElements = 256; int* ptr = new int[numElements]; int* ptr2 = new int[numElements]; error_ = clSetKernelArgSVMPointer(kernel_, 0, ptr); CHECK_ERROR(error_, "clSetKernelArgSVMPointer() failed"); error_ = clSetKernelArgSVMPointer(kernel_, 1, ptr2); CHECK_ERROR(error_, "clSetKernelArgSVMPointer() failed"); size_t gws[1] = {numElements}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_ERROR(error_, "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(error_, "Queue::finish() failed"); size_t matchingElements = std::count(ptr, ptr + numElements, (int)0xDEADBEEF); size_t matchingElements2 = std::count(ptr2, ptr2 + numElements, (int)0xDEADF00D); CHECK_RESULT(matchingElements + matchingElements2 != 2 * numElements, "Expected: %zd, found:%zd", numElements * 2, matchingElements + matchingElements2); delete[] ptr; delete[] ptr2; } void OCLSVM::runFineGrainedSystemLargeAllocations() { #ifdef _WIN32 if (!(svmCaps_ & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM)) { printf( "Device does not support fine-grained system sharing on Linux, skipping " "test...\n"); return; } // Max allowed multiplier for malloc size_t allowedMemSize = getTotalSystemMemory() >> 12; size_t numElements = 256; char* s = getenv("OCLSVM_MALLOC_GB_SIZE"); char* s2 = getenv("OCLSVM_MEMSET_ALLOC"); for (int j = 1; j <= NUM_SIZES; j++) { numElements = 131072 * j; if (s != NULL) numElements = 131072 * atoi(s); if (numElements > allowedMemSize) break; void* ptr = malloc(numElements * 1024 * sizeof(uint64_t)); CHECK_ERROR(ptr == NULL, "malloc failure"); if (s2 != NULL) memset(ptr, 0, numElements * 1024 * sizeof(uint64_t)); error_ = clSetKernelArgSVMPointer(kernel_, 0, ptr); CHECK_ERROR(error_, "clSetKernelArgSVMPointer() failed"); size_t gws[1] = {numElements}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_ERROR(error_, "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(error_, "Queue::finish() failed"); uint64_t* ptr64 = reinterpret_cast<uint64_t*>(ptr); // Do a check for (int i = 0; i < numElements; i++) { if ((int)ptr64[i * 1024] != 0xBAADF00D) { uint64_t temp = ptr64[i * 1024]; free(ptr); CHECK_RESULT(temp != 0xBAADF00D, "Found: 0x%llx, Expected: 0xBAADF00D", (unsigned long long)temp); } } free(ptr); } #endif } void OCLSVM::runLinkedListSearchUsingFineGrainedSystem() { if (!(svmCaps_ & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM)) { printf( "Device does not support fine-grained system sharing, skipping " "test...\n"); return; } uint64_t input[] = {34, 6, 0, 11, 89, 34, 6, 6, 6, 0xDEADBEEF}; int inputSize = countOf(input); Node* ptr = NULL; for (int i = 0; i
< inputSize; i++) { ptr = new Node(input[i], ptr); } error_ = clSetKernelArgSVMPointer(kernel_, 0, ptr); CHECK_ERROR(error_, "clSetKernelArgSVMPointer() failed"); size_t gws[1] = {1}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_ERROR(error_, "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(error_, "Queue::finish() failed"); int matchingElements = 0; // verify result while deallocating resources at the same time while (ptr) { if (ptr->value_ == 0xDEADBEEF) { matchingElements++; } Node* tmp = ptr; ptr = (Node*)ptr->next_; delete tmp; } CHECK_RESULT(matchingElements != inputSize, "Expected: %d, found:%d", inputSize, matchingElements); } static int atomicIncrement(volatile int* loc) { #if defined(_MSC_VER) return _InterlockedIncrement((volatile long*)loc); #elif defined(__GNUC__) return __sync_fetch_and_add(loc, 1); #endif printf("Atomic increment not supported, aborting..."); std::abort(); return 0; } void OCLSVM::runPlatformAtomics() { if (!(svmCaps_ & CL_DEVICE_SVM_ATOMICS)) { printf("SVM atomics not supported, skipping test...\n"); return; } volatile int* value = (volatile int*)clSVMAlloc( context_, CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS, sizeof(int), 0); CHECK_RESULT(!value, "clSVMAlloc() failed"); *value = 0; const int numIterations = 1000000; error_ = clSetKernelArgSVMPointer(kernel_, 0, (const void*)value); CHECK_ERROR(error_, "clSetKernelArgSVMPointer() failed"); error_ = clSetKernelArg(kernel_, 1, sizeof(numIterations), &numIterations); CHECK_ERROR(error_, "clSetKernelArg() failed"); size_t gws[1] = {1}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_ERROR(error_, "clEnqueueNDRangeKernel() failed"); for (int i = 0; i < numIterations; i++) { atomicIncrement(value); } error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(error_, "Queue::finish() failed"); int expected = numIterations * 2; CHECK_RESULT(*value != expected, "Expected: %d, found:%d", expected, *value); clSVMFree(context_, (void*)value); } void OCLSVM::runEnqueueOperations() { size_t numElements = 32; size_t size = numElements * 4; int* ptr0 = (int*)clSVMAlloc(context_, 0, size, 0); CHECK_RESULT(!ptr0, "clSVMAlloc() failed"); int* ptr1 = (int*)clSVMAlloc(context_, 0, size, 0); CHECK_RESULT(!ptr1, "clSVMAlloc() failed"); cl_event userEvent = clCreateUserEvent(context_, &error_); CHECK_ERROR(error_, "clCreateUserEvent() failed"); cl_command_queue queue = cmdQueues_[_deviceId]; // coarse-grained buffer semantics: the SVM pointer needs to be mapped // before the pointer can write to it error_ = clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, ptr0, size, 0, NULL, NULL); CHECK_ERROR(error_, "clEnqueueSVMMap() failed"); std::fill(ptr0, ptr0 + numElements, 1); error_ = clEnqueueSVMUnmap(queue, ptr0, 0, NULL, NULL); CHECK_ERROR(error_, "clEnqueueSVMUnmap() failed"); // we copy the 1st buffer into the 2nd buffer error_ = clEnqueueSVMMemcpy(queue, true, ptr1, ptr0, size, 0, NULL, NULL); CHECK_ERROR(error_, "clEnqueueSVMMemcpy() failed"); // verification: the 2nd buffer should be identical to the 1st error_ = clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_READ, ptr1, size, 0, NULL, &userEvent); CHECK_ERROR(error_, "clEnqueueSVMMap() failed"); error_ = clWaitForEvents(1, &userEvent); CHECK_ERROR(error_, "clWaitForEvents() failed"); size_t observed = std::count(ptr1, ptr1 + numElements, 1); size_t expected = numElements; 
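// After the clEnqueueSVMMemcpy above, every element of the second buffer should read back as 1.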
CHECK_RESULT(observed != expected, "Expected: %zd, found:%zd", expected, observed); void* ptrs[2] = {ptr0, ptr1}; error_ = clEnqueueSVMFree(queue, countOf(ptrs), ptrs, NULL, NULL, 0, NULL, NULL); CHECK_ERROR(error_, "clEnqueueSVMFree() failed"); error_ = clFinish(queue); CHECK_ERROR(error_, "clFinish() failed"); } /** * Simple test to ensure that SVM pointer arguments are identified properly in * the runtime, since kernel arguments of pointer type can be bound to either * SVM pointers or cl_mem objects. */ void OCLSVM::runSvmArgumentsAreRecognized() { cl_int8 arg0; error_ = clSetKernelArg(kernel_, 0, sizeof(arg0), &arg0); CHECK_ERROR(error_, "clSetKernelArg() failed"); error_ = clSetKernelArgSVMPointer(kernel_, 1, NULL); CHECK_ERROR(error_, "clSetKernelArgSVMPointer() failed"); cl_int arg2; error_ = clSetKernelArg(kernel_, 2, sizeof(arg2), &arg2); CHECK_ERROR(error_, "clSetKernelArg() failed"); error_ = clSetKernelArgSVMPointer(kernel_, 3, NULL); CHECK_ERROR(error_, "clSetKernelArgSVMPointer() failed"); cl_mem arg4 = NULL; error_ = clSetKernelArg(kernel_, 4, sizeof(arg4), &arg4); CHECK_ERROR(error_, "clSetKernelArg() failed"); size_t gws[1] = {1}; // run dummy kernel error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_ERROR(error_, "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(error_, "Queue::finish() failed"); // now we bind a pointer argument to a standard buffer instead of a SVM one cl_mem buffer = NULL; error_ = clSetKernelArg(kernel_, 1, sizeof(buffer), &buffer); CHECK_ERROR(error_, "clSetKernelArg() failed"); // re-execute the dummy kernel using different actual parameters error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_ERROR(error_, "clEnqueueNDRangeKernel() failed"); error_ = _wrapper->clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(error_, "Queue::finish() failed"); } void OCLSVM::runSvmCommandsExecutedInOrder() { #if EMU_ENV // Small number is enough to verify functionality in Emu environment const int numElements = 5000; #else const int numElements = 100000; #endif // EMU_ENV size_t size = numElements * sizeof(int); // allocate SVM memory int* data = (int*)clSVMAlloc(context_, CL_MEM_READ_WRITE, size, 0); CHECK_RESULT(!data, "clSVMAlloc failed"); // map the SVM buffer to host cl_int status = clEnqueueSVMMap(cmdQueues_[_deviceId], CL_TRUE, CL_MAP_WRITE, data, size, 0, NULL, NULL); CHECK_ERROR(status, "Error when mapping SVM buffer"); // fill buffer with 0s std::fill(data, data + numElements, 0); // unmap the SVM buffer to host status = clEnqueueSVMUnmap(cmdQueues_[_deviceId], data, 0, NULL, NULL); CHECK_ERROR(status, "Error when unmapping SVM buffer"); // enqueue kernel status = clSetKernelArgSVMPointer(kernel_, 0, data); CHECK_ERROR(status, "Error when setting kernel argument"); status = clSetKernelArg(kernel_, 1, sizeof(int), &numElements); CHECK_ERROR(status, "clSetKernelArg() failed"); cl_event event; size_t overallSize = (size_t)numElements; status = clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, &overallSize, NULL, 0, NULL, &event); CHECK_ERROR(status, "Error when enqueuing kernel"); status = clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(status, "clFinish() failed"); // map the SVM buffer to host status = clEnqueueSVMMap(cmdQueues_[_deviceId], CL_TRUE, CL_MAP_READ, data, size, 0, NULL, NULL); CHECK_ERROR(status, "Error when mapping SVM buffer"); bool pass = true; // verify the data.
Using descending order might increase the chance of // finding an error since the GPU (when used) might not have finished // updating the data array by the time we do the verification for (int i = numElements - 1; i >= 0; i--) { if (data[i] != numElements + 1) { pass = false; break; } } // unmap the SVM buffer to host status = clEnqueueSVMUnmap(cmdQueues_[_deviceId], data, 0, NULL, NULL); CHECK_ERROR(status, "Error when unmapping SVM buffer"); // free the SVM buffer status = clEnqueueSVMFree(cmdQueues_[_deviceId], 1, (void**)&data, NULL, NULL, 0, NULL, NULL); CHECK_ERROR(status, "Error when freeing the SVM buffer"); error_ = clFinish(cmdQueues_[_deviceId]); CHECK_ERROR(error_, "clFinish() failed"); CHECK_RESULT(!pass, "Wrong result"); } void OCLSVM::runIdentifySvmBuffers() { size_t size = 1024 * 1024; // dummy allocation to force the runtime to track several SVM buffers clSVMAlloc(context_, CL_MEM_READ_WRITE, size * 10, 0); void* ptr = clSVMAlloc(context_, CL_MEM_READ_WRITE, size, 0); cl_int status; cl_bool usesSVMpointer = CL_FALSE; // dummy allocation to force the runtime to track several SVM buffers clSVMAlloc(context_, CL_MEM_READ_WRITE, size * 4, 0); // buffer using the entire SVM region should be identified as such cl_mem buf1 = clCreateBuffer(context_, CL_MEM_USE_HOST_PTR, size, ptr, &status); CHECK_ERROR(status, "clCreateBuffer failed."); size_t paramSize = 0; status = clGetMemObjectInfo(buf1, CL_MEM_USES_SVM_POINTER, 0, 0, &paramSize); CHECK_ERROR(status, "clGetMemObjectInfo failed"); CHECK_RESULT(paramSize != sizeof(cl_bool), "clGetMemObjectInfo(CL_MEM_USES_SVM_POINTER) " "returned wrong size."); status = clGetMemObjectInfo(buf1, CL_MEM_USES_SVM_POINTER, sizeof(cl_bool), &usesSVMpointer, 0); CHECK_ERROR(status, "clGetMemObjectInfo failed"); CHECK_RESULT(usesSVMpointer != CL_TRUE, "clGetMemObjectInfo(CL_MEM_USES_SVM_POINTER) " "returned CL_FALSE for buffer created from SVM pointer."); // Buffer that uses random region within SVM buffers cl_mem buf2 = clCreateBuffer(context_, CL_MEM_USE_HOST_PTR, 256, (char*)ptr + size - 256, &status); CHECK_ERROR(status, "clCreateBuffer failed."); status = clGetMemObjectInfo(buf2, CL_MEM_USES_SVM_POINTER, sizeof(cl_bool), &usesSVMpointer, 0); CHECK_ERROR(status, "clGetMemObjectInfo failed"); CHECK_RESULT(usesSVMpointer != CL_TRUE, "clGetMemObjectInfo(CL_MEM_USES_SVM_POINTER) " "returned CL_FALSE for buffer created from SVM pointer."); // for any other pointer the query should return false void* randomPtr = malloc(size); cl_mem buf3 = clCreateBuffer(context_, CL_MEM_USE_HOST_PTR, size, randomPtr, &status); CHECK_ERROR(status, "clCreateBuffer failed."); status = clGetMemObjectInfo(buf3, CL_MEM_USES_SVM_POINTER, sizeof(cl_bool), &usesSVMpointer, 0); CHECK_ERROR(status, "clGetMemObjectInfo failed"); CHECK_RESULT(usesSVMpointer == CL_TRUE, "clGetMemObjectInfo(CL_MEM_USES_SVM_POINTER) " "returned CL_TRUE for buffer not created from SVM pointer."); clReleaseMemObject(buf3); clReleaseMemObject(buf2); clReleaseMemObject(buf1); free(randomPtr); clSVMFree(context_, ptr); } #endif cl_bool OCLSVM::isOpenClSvmAvailable(cl_device_id device_id) { #ifdef CL_VERSION_2_0 error_ = clGetDeviceInfo(devices_[_deviceId], CL_DEVICE_SVM_CAPABILITIES, sizeof(svmCaps_), &svmCaps_, NULL); CHECK_ERROR_NO_RETURN(error_, "clGetDeviceInfo() failed"); if (!(svmCaps_ & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER)) { return CL_FALSE; } else { return CL_TRUE; } #endif // -Device does not support OpenCL >= 2.0 // -Device supports OpenCL >= 2.0, but available headers are <= 1.2 return CL_FALSE; } void OCLSVM::run()
{ if (!isOpenClSvmAvailable(devices_[_deviceId])) { printf("Device does not support any SVM features, skipping...\n"); return; } if (_openTest == 0) { runFineGrainedBuffer(); } else if (_openTest == 1) { runFineGrainedSystem(); } else if (_openTest == 2) { runFineGrainedSystemLargeAllocations(); } else if (_openTest == 3) { runLinkedListSearchUsingFineGrainedSystem(); } else if (_openTest == 4) { runPlatformAtomics(); } else if (_openTest == 5) { runEnqueueOperations(); } else if (_openTest == 6) { runSvmArgumentsAreRecognized(); } else if (_openTest == 7) { runSvmCommandsExecutedInOrder(); } else if (_openTest == 8) { runIdentifySvmBuffers(); } } unsigned int OCLSVM::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLSVM.h000066400000000000000000000037771450307266000233410ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_SVM_H_ #define _OCL_SVM_H_ #include <CL/cl.h> #include "OCLTestImp.h" #include "stdint.h" class OCLSVM : public OCLTestImp { public: OCLSVM(); virtual ~OCLSVM(); virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: void runFineGrainedBuffer(); void runFineGrainedSystem(); void runFineGrainedSystemLargeAllocations(); void runLinkedListSearchUsingFineGrainedSystem(); void runPlatformAtomics(); void runEnqueueOperations(); void runSvmArgumentsAreRecognized(); void runSvmCommandsExecutedInOrder(); void runIdentifySvmBuffers(); cl_bool isOpenClSvmAvailable(cl_device_id device_id); uint64_t svmCaps_; }; struct Node { Node(uint64_t value, Node* next) : value_(value), next_((uint64_t)next) {} uint64_t value_; uint64_t next_; }; #endif // _OCL_SVM_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLSemaphore.cpp000066400000000000000000000205441450307266000251410ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "OCLSemaphore.h" #include <stdio.h> #include <stdlib.h> #include <string.h> #include "CL/cl.h" #ifndef CL_DEVICE_MAX_SEMAPHORES_AMD #define CL_DEVICE_MAX_SEMAPHORES_AMD 0x1041 #else #error "CL_DEVICE_MAX_SEMAPHORES_AMD is defined somewhere, remove this define!" #endif #ifndef CL_DEVICE_MAX_SEMAPHORE_SIZE_AMD #define CL_DEVICE_MAX_SEMAPHORE_SIZE_AMD 0x1042 #else #error \ "CL_DEVICE_MAX_SEMAPHORE_SIZE_AMD is defined somewhere, remove this define!" #endif #ifndef CL_KERNEL_MAX_SEMAPHORE_SIZE_AMD #define CL_KERNEL_MAX_SEMAPHORE_SIZE_AMD 0x1043 #else #error \ "CL_KERNEL_MAX_SEMAPHORE_SIZE_AMD is defined somewhere, remove this define!" #endif const static unsigned int MaxSemaphores = 1; const static char* strKernel = "#ifdef cl_amd_semaphore\n" "#pragma OPENCL EXTENSION cl_amd_semaphore : enable \n" "kernel void sema_test(sema_t lock, global int* a, global int* b, int " "value)\n" " {\n" " size_t idx = get_global_id(0);\n" " size_t gdx = get_group_id(0);\n" " size_t ng = get_num_groups(0);\n" " size_t ssize = get_max_semaphore_size();\n" " a[1] = true;\n" " if (gdx >= ssize) {\n" " return;\n" " }\n" " barrier(CLK_GLOBAL_MEM_FENCE);\n" " semaphore_init(lock, ng);\n" " while (a[1]) {\n" " atom_add(a, b[idx]);\n" " atom_inc(a + 2);\n" " if (gdx == (ssize - 1)) {\n" " semaphore_signal(lock);\n" " if (a[0] >= value) {\n" " a[1] = false;\n" " }\n" " } else {\n" " semaphore_wait(lock);\n" " idx += get_global_size(0);\n" " }\n" " }\n" " semaphore_signal(lock);\n" " }\n" "#endif\n"; OCLSemaphore::OCLSemaphore() { _numSubTests = 1; hasSemaphore = false; } OCLSemaphore::~OCLSemaphore() {} void OCLSemaphore::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); char name[1024] = {0}; size_t size = 0; _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_EXTENSIONS, 1024, name, &size); if (!strstr(name, "cl_amd_semaphore")) { error_ = CL_DEVICE_NOT_FOUND; hasSemaphore = false; printf("Semaphore extension is required for this test!\n"); return; } else { hasSemaphore = true; } _wrapper->clGetDeviceInfo(devices_[deviceId], (cl_device_info)CL_DEVICE_MAX_SEMAPHORES_AMD, sizeof(size), &size, NULL); _wrapper->clGetDeviceInfo(devices_[deviceId], (cl_device_info)CL_DEVICE_MAX_SEMAPHORE_SIZE_AMD, sizeof(size), &size, NULL); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ =
_wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "sema_test", &error_); _wrapper->clGetKernelInfo(kernel_, (cl_kernel_info)CL_KERNEL_MAX_SEMAPHORE_SIZE_AMD, sizeof(size), &size, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; for (unsigned int i = 0; i < MaxSemaphores; ++i) { buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, 1024 * size * sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, 1024 * size * sizeof(cl_uint), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLSemaphore::run(void) { if (!hasSemaphore) { return; } cl_uint initVal[2] = {5, 10}; for (unsigned int i = 0; i < MaxSemaphores; ++i) { cl_mem buffer = buffers()[i]; error_ = _wrapper->clSetKernelArg(kernel_, i, sizeof(cl_uint), &initVal[i]); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); } cl_mem buffer = buffers()[MaxSemaphores]; error_ = _wrapper->clSetKernelArg(kernel_, MaxSemaphores, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); buffer = buffers()[MaxSemaphores + 1]; error_ = _wrapper->clSetKernelArg(kernel_, MaxSemaphores + 1, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); cl_int val = 64; error_ = _wrapper->clSetKernelArg(kernel_, MaxSemaphores + 2, sizeof(val), &val); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); size_t gws[1] = {64}; error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[0], kernel_, 1, NULL, gws, NULL, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); cl_uint outputV[MaxSemaphores] = {0}; // Find the new counter value initVal[0]++; initVal[1]--; for (unsigned int i = 0; i < MaxSemaphores; ++i) { cl_mem buffer = buffers()[i]; error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[0], buffers()[i], true, 0, sizeof(cl_uint), &outputV[i], 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); if (initVal[i] != outputV[i]) { printf("%u != %u", initVal[i], outputV[i]); CHECK_RESULT(true, " - Incorrect result for counter!\n"); } } // Restore the original value to check the returned result in the kernel initVal[0]--; initVal[1]++; buffer = buffers()[MaxSemaphores]; error_ = _wrapper->clEnqueueReadBuffer( cmdQueues_[0], buffers()[MaxSemaphores], true, 0, MaxSemaphores * sizeof(cl_uint), outputV, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); for (unsigned int i = 0; i < MaxSemaphores; ++i) { if (initVal[i] != outputV[i]) { printf("%u != %u", initVal[i], outputV[i]); CHECK_RESULT(true, " - Incorrect result for counter inside kernel. 
Returned " "value != original.\n"); } } } unsigned int OCLSemaphore::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLSemaphore.h000066400000000000000000000027541450307266000246110ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_SEMAPHORE_H_ #define _OCL_SEMAPHORE_H_ #include "OCLTestImp.h" class OCLSemaphore : public OCLTestImp { public: OCLSemaphore(); virtual ~OCLSemaphore(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); bool hasSemaphore; }; #endif // _OCL_SEMAPHORE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLStablePState.cpp000066400000000000000000000110341450307266000255430ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLStablePState.h" #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" cl_device_id gpu_device; OCLStablePState::OCLStablePState() { _numSubTests = 1; failed_ = false; } OCLStablePState::~OCLStablePState() {} void OCLStablePState::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { cl_uint numPlatforms; cl_platform_id platform = NULL; cl_uint num_devices = 0; cl_device_id* devices = NULL; cl_device_id device = NULL; _deviceId = deviceId; if (type_ != CL_DEVICE_TYPE_GPU) { error_ = CL_DEVICE_NOT_FOUND; printf("GPU device is required for this test!\n"); return; } error_ = _wrapper->clGetPlatformIDs(0, NULL, &numPlatforms); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); if (0 < numPlatforms) { cl_platform_id* platforms = new cl_platform_id[numPlatforms]; error_ = _wrapper->clGetPlatformIDs(numPlatforms, platforms, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetPlatformIDs failed"); #if 0 // Get last for default platform = platforms[numPlatforms - 1]; for (unsigned i = 0; i < numPlatforms; ++i) { #endif platform = platforms[_platformIndex]; char pbuf[100]; error_ = _wrapper->clGetPlatformInfo(platforms[_platformIndex], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL); num_devices = 0; /* Get the number of requested devices */ error_ = _wrapper->clGetDeviceIDs(platforms[_platformIndex], type_, 0, NULL, &num_devices); #if 0 } #endif delete platforms; } /* * If we could find our platform, use it. If not, die as we need the AMD * platform for these extensions. */ CHECK_RESULT(platform == 0, "Couldn't find platform with GPU devices, cannot proceed"); devices = (cl_device_id*)malloc(num_devices * sizeof(cl_device_id)); CHECK_RESULT(devices == 0, "no devices"); /* Get the requested device */ error_ = _wrapper->clGetDeviceIDs(platform, type_, num_devices, devices, NULL); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceIDs failed"); CHECK_RESULT(_deviceId >= num_devices, "Requested deviceID not available"); device = devices[_deviceId]; gpu_device = device; } static void CL_CALLBACK notify_callback(cl_event event, cl_int event_command_exec_status, void* user_data) {} void OCLStablePState::run(void) { if (failed_) { return; } cl_set_device_clock_mode_input_amd setClockModeInput; setClockModeInput.clock_mode = CL_DEVICE_CLOCK_MODE_PROFILING_AMD; cl_set_device_clock_mode_output_amd setClockModeOutput = {}; error_ = _wrapper->clSetDeviceClockModeAMD(gpu_device, setClockModeInput, &setClockModeOutput); #ifdef _WIN32 CHECK_RESULT(error_ != CL_SUCCESS, "SetClockMode profiling failed\n"); #else error_ = CL_SUCCESS; #endif setClockModeInput.clock_mode = CL_DEVICE_CLOCK_MODE_DEFAULT_AMD; setClockModeOutput = {}; error_ = _wrapper->clSetDeviceClockModeAMD(gpu_device, setClockModeInput, &setClockModeOutput); #ifdef _WIN32 CHECK_RESULT(error_ != CL_SUCCESS, "SetClockMode default failed\n"); #else error_ = CL_SUCCESS; #endif } unsigned int OCLStablePState::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLStablePState.h000066400000000000000000000030071450307266000252110ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_STABLE_PSTATE_H_ #define _OCL_STABLE_PSTATE_H_ #include "OCLTestImp.h" class OCLStablePState : public OCLTestImp { public: OCLStablePState(); virtual ~OCLStablePState(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; }; #endif // _OCL_STABLE_PSTATE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLThreadTrace.cpp000066400000000000000000000313101450307266000253750ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLThreadTrace.h" #include #include #include #include "CL/cl.h" const static unsigned int IOThreadTrace = 3; // number of input/oputput buffers static size_t SeNum = 1; // number of SEs // size of thread trace buffer #if EMU_ENV const static unsigned int ttBufSize = 5000; #else const static unsigned int ttBufSize = 30000; #endif const static unsigned int InputElements = 2048; // elements in each vector const static char* strKernel = "__kernel void thread_trace_test( \n" " __global int *A,__global int *B,__global int *C) \n" "{ \n" " int idx = get_global_id(0); \n" " C[idx] = A[idx] + B[idx]; \n" "} \n"; OCLThreadTrace::OCLThreadTrace() { _numSubTests = 1; failed_ = false; clCreateThreadTraceAMD_ = 0; clReleaseThreadTraceAMD_ = 0; clRetainThreadTraceAMD_ = 0; clGetThreadTraceInfoAMD_ = 0; clSetThreadTraceParamAMD_ = 0; clEnqueueThreadTraceCommandAMD_ = 0; clEnqueueBindThreadTraceBufferAMD_ = 0; ioBuf_ = 0; ttBuf_ = 0; } OCLThreadTrace::~OCLThreadTrace() {} void OCLThreadTrace::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening"); if (deviceId >= deviceCount_) { failed_ = true; return; } cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } size_t threadTraceEnabled; size_t retsize; error_ = _wrapper->clGetDeviceInfo( devices_[deviceId], CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD, sizeof(threadTraceEnabled), &threadTraceEnabled, &retsize); CHECK_RESULT(error_ != CL_SUCCESS, "clGetDeviceInfo failed"); if (!threadTraceEnabled) { failed_ = true; testDescString = "Not supported"; return; } unsigned int datasize = sizeof(unsigned int) * InputElements; ioBuf_ = (unsigned int**)malloc(IOThreadTrace * sizeof(unsigned int*)); CHECK_RESULT((ioBuf_ == NULL), "malloc failed"); memset(ioBuf_, 0, IOThreadTrace * sizeof(unsigned int*)); for (unsigned i = 0; i < IOThreadTrace; ++i) { ioBuf_[i] = (unsigned int*)malloc(datasize); CHECK_RESULT((ioBuf_[i] == NULL), "malloc failed"); for (unsigned j = 0; j < InputElements; ++j) { ioBuf_[i][j] = j; } } clCreateThreadTraceAMD_ = (fnp_clCreateThreadTraceAMD)_wrapper->clGetExtensionFunctionAddress( "clCreateThreadTraceAMD"); CHECK_RESULT((clCreateThreadTraceAMD_ == 0), "clGetExtensionFunctionAddress(clCreateThreadTraceAMD) failed"); clGetThreadTraceInfoAMD_ = (fnp_clGetThreadTraceInfoAMD)_wrapper->clGetExtensionFunctionAddress( "clGetThreadTraceInfoAMD"); CHECK_RESULT((clGetThreadTraceInfoAMD_ == 0), "clGetExtensionFunctionAddress(clGetThreadTraceInfoAMD) failed"); threadTrace_ = clCreateThreadTraceAMD_(devices_[_deviceId], &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateThreadTraceAMD() failed"); // Get number of shader engines clGetThreadTraceInfoAMD_(threadTrace_, CL_THREAD_TRACE_SE, sizeof(SeNum), &SeNum, NULL); ttBuf_ = (unsigned int**)malloc(SeNum * sizeof(unsigned int*)); CHECK_RESULT((ttBuf_ == NULL), "malloc failed"); memset(ttBuf_, 0, SeNum * sizeof(unsigned int*)); program_ = _wrapper->clCreateProgramWithSource(context_, 1, &strKernel, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateProgramWithSource() failed"); error_ = _wrapper->clBuildProgram(program_, 1, &devices_[deviceId], NULL, NULL, NULL); if (error_ != CL_SUCCESS) { char 
programLog[1024]; _wrapper->clGetProgramBuildInfo(program_, devices_[deviceId], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); printf("\n%s\n", programLog); fflush(stdout); } CHECK_RESULT((error_ != CL_SUCCESS), "clBuildProgram() failed"); kernel_ = _wrapper->clCreateKernel(program_, "thread_trace_test", &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateKernel() failed"); cl_mem buffer; for (unsigned int i = 0; i < IOThreadTrace; ++i) { buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, datasize, ioBuf_[i], &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } for (unsigned int i = 0; i < SeNum; ++i) { buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_WRITE, ttBufSize, NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); } clReleaseThreadTraceAMD_ = (fnp_clReleaseThreadTraceAMD)_wrapper->clGetExtensionFunctionAddress( "clReleaseThreadTraceAMD"); CHECK_RESULT((clReleaseThreadTraceAMD_ == 0), "clGetExtensionFunctionAddress(clReleaseThreadTraceAMD) failed"); clRetainThreadTraceAMD_ = (fnp_clRetainThreadTraceAMD)_wrapper->clGetExtensionFunctionAddress( "clRetainThreadTraceAMD"); CHECK_RESULT((clRetainThreadTraceAMD_ == 0), "clGetExtensionFunctionAddress(clRetainThreadTraceAMD) failed"); clSetThreadTraceParamAMD_ = (fnp_clSetThreadTraceParamAMD)_wrapper->clGetExtensionFunctionAddress( "clSetThreadTraceParamAMD"); CHECK_RESULT( (clSetThreadTraceParamAMD_ == 0), "clGetExtensionFunctionAddress(clSetThreadTraceParamAMD) failed"); clEnqueueThreadTraceCommandAMD_ = (fnp_clEnqueueThreadTraceCommandAMD) _wrapper->clGetExtensionFunctionAddress( "clEnqueueThreadTraceCommandAMD"); CHECK_RESULT( (clEnqueueThreadTraceCommandAMD_ == 0), "clGetExtensionFunctionAddress(clEnqueueThreadTraceCommandAMD) failed"); clEnqueueBindThreadTraceBufferAMD_ = (fnp_clEnqueueBindThreadTraceBufferAMD)_wrapper ->clGetExtensionFunctionAddress("clEnqueueBindThreadTraceBufferAMD"); CHECK_RESULT((clEnqueueBindThreadTraceBufferAMD_ == 0), "clGetExtensionFunctionAddress(" "clEnqueueBindThreadTraceBufferAMD) failed"); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} static void DumpTraceSI(unsigned int index, cl_ushort* tracePtr, size_t numOfBytes) { FILE* outFile; char file_name[16] = {0}; static unsigned int iii = 0; sprintf(file_name, "TTrace%d%d.out", index, iii++); outFile = fopen(file_name, "w"); for (size_t i = 0; i < numOfBytes / 2; i++) { fprintf(outFile, "%04x\n", (cl_ushort)(*tracePtr)); tracePtr++; } fclose(outFile); } #define DUMPTRACE 0 void OCLThreadTrace::run(void) { cl_mem* ttArrBuf = 0; unsigned int* ttBufRecordedSizes = 0; unsigned int i = 0, j = 0; if (failed_) { return; } for (i = 0; i < IOThreadTrace; ++i) { cl_mem buffer = buffers()[i]; error_ = _wrapper->clSetKernelArg(kernel_, i, sizeof(cl_mem), &buffer); CHECK_RESULT((error_ != CL_SUCCESS), "clSetKernelArg() failed"); } size_t globalWorkSize[1]; size_t localWorkSize[1]; globalWorkSize[0] = InputElements; localWorkSize[0] = 32; ttArrBuf = (cl_mem*)malloc(sizeof(cl_mem) * SeNum); for (i = 0; i < SeNum; i++) ttArrBuf[i] = buffers()[IOThreadTrace + i]; cl_event clEvent; error_ = clEnqueueBindThreadTraceBufferAMD_( cmdQueues_[_deviceId], threadTrace_, ttArrBuf, (cl_uint)SeNum, ttBufSize, 0, NULL, &clEvent); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueBindThreadTraceBufferAMD() failed"); error_ =
clEnqueueThreadTraceCommandAMD_(cmdQueues_[_deviceId], threadTrace_, CL_THREAD_TRACE_BEGIN_COMMAND, 0, NULL, &clEvent); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueThreadTraceCommandAMD() failed"); error_ = _wrapper->clEnqueueNDRangeKernel(cmdQueues_[_deviceId], kernel_, 1, NULL, globalWorkSize, localWorkSize, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueNDRangeKernel() failed"); clFinish(cmdQueues_[_deviceId]); error_ = clEnqueueThreadTraceCommandAMD_(cmdQueues_[_deviceId], threadTrace_, CL_THREAD_TRACE_END_COMMAND, 0, NULL, &clEvent); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueThreadTraceCommandAMD() failed"); ttBufRecordedSizes = (unsigned int*)malloc(sizeof(unsigned int) * SeNum); memset(ttBufRecordedSizes, 0, sizeof(unsigned int) * SeNum); size_t ttBufRecordedSize; error_ = clGetThreadTraceInfoAMD_(threadTrace_, CL_THREAD_TRACE_BUFFERS_SIZE, 1, NULL, &ttBufRecordedSize); CHECK_RESULT((error_ != CL_SUCCESS), "clGetThreadTraceInfoAMD() failed"); if (ttBufRecordedSize > sizeof(unsigned int) * SeNum) { free(ttBufRecordedSizes); ttBufRecordedSizes = (unsigned int*)malloc(ttBufRecordedSize); memset(ttBufRecordedSizes, 0, ttBufRecordedSize); } error_ = clGetThreadTraceInfoAMD_(threadTrace_, CL_THREAD_TRACE_BUFFERS_SIZE, ttBufRecordedSize, ttBufRecordedSizes, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clGetThreadTraceInfoAMD() failed"); for (i = 0; i < SeNum; ++i) { ttBuf_[i] = (cl_uint*)malloc(ttBufRecordedSizes[i] * sizeof(cl_uint)); CHECK_RESULT((ttBuf_[i] == NULL), "malloc failed"); } for (i = 0; i < SeNum; ++i) { if (ttBufRecordedSizes[i] != 0) { error_ = _wrapper->clEnqueueReadBuffer( cmdQueues_[_deviceId], buffers()[IOThreadTrace + i], CL_TRUE, 0, ttBufRecordedSizes[i], ttBuf_[i], 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); #if DUMPTRACE DumpTraceSI(i, (cl_ushort*)ttBuf_[i], ttBufRecordedSizes[i]); #endif } } bool validRes = true; for (i = 0; i < SeNum; ++i) { unsigned j; for (j = 0; j < ttBufRecordedSizes[i]; ++j) { if (ttBuf_[i][j] != 0) { break; } } if (j >= ttBufRecordedSizes[i] && ttBufRecordedSizes[i] > 0) { validRes = false; break; } } if (!validRes) { CHECK_RESULT( true, " - Incorrect result for thread trace. no output data was recorded.\n"); } if (ttArrBuf) free(ttArrBuf); if (ttBufRecordedSizes) free(ttBufRecordedSizes); } unsigned int OCLThreadTrace::close(void) { if (clReleaseThreadTraceAMD_ && threadTrace_) clReleaseThreadTraceAMD_(threadTrace_); if (ioBuf_) { for (unsigned i = 0; i < IOThreadTrace; ++i) { if (ioBuf_[i]) { free(ioBuf_[i]); } } free(ioBuf_); } if (ttBuf_) { for (unsigned i = 0; i < SeNum; ++i) { if (ttBuf_[i]) { free(ttBuf_[i]); } } free(ttBuf_); } return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLThreadTrace.h000066400000000000000000000057611450307266000250550ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_THREAD_TRACE_H_ #define _OCL_THREAD_TRACE_H_ #include "OCLTestImp.h" #include "cl_thread_trace_amd.h" // Thread Trace API typedef CL_API_ENTRY cl_threadtrace_amd( CL_API_CALL *fnp_clCreateThreadTraceAMD)(cl_device_id, cl_int *); typedef CL_API_ENTRY cl_int(CL_API_CALL *fnp_clReleaseThreadTraceAMD)( cl_threadtrace_amd); typedef CL_API_ENTRY cl_int(CL_API_CALL *fnp_clRetainThreadTraceAMD)( cl_threadtrace_amd); typedef CL_API_ENTRY cl_int(CL_API_CALL *fnp_clGetThreadTraceInfoAMD)( cl_threadtrace_amd, cl_threadtrace_info, size_t, void *, size_t *); typedef CL_API_ENTRY cl_int(CL_API_CALL *fnp_clSetThreadTraceParamAMD)( cl_threadtrace_amd, cl_thread_trace_param, cl_uint); typedef CL_API_ENTRY cl_int(CL_API_CALL *fnp_clEnqueueThreadTraceCommandAMD)( cl_command_queue, cl_threadtrace_amd, cl_threadtrace_command_name_amd, cl_uint, const cl_event *, cl_event *); typedef CL_API_ENTRY cl_int(CL_API_CALL *fnp_clEnqueueBindThreadTraceBufferAMD)( cl_command_queue, cl_threadtrace_amd, cl_mem *, cl_uint, cl_uint, cl_uint, const cl_event *, cl_event *); class OCLThreadTrace : public OCLTestImp { public: OCLThreadTrace(); virtual ~OCLThreadTrace(); public: virtual void open(unsigned int test, char *units, double &conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; cl_uint **ioBuf_; cl_uint **ttBuf_; cl_threadtrace_amd threadTrace_; fnp_clCreateThreadTraceAMD clCreateThreadTraceAMD_; fnp_clReleaseThreadTraceAMD clReleaseThreadTraceAMD_; fnp_clRetainThreadTraceAMD clRetainThreadTraceAMD_; fnp_clGetThreadTraceInfoAMD clGetThreadTraceInfoAMD_; fnp_clSetThreadTraceParamAMD clSetThreadTraceParamAMD_; fnp_clEnqueueThreadTraceCommandAMD clEnqueueThreadTraceCommandAMD_; fnp_clEnqueueBindThreadTraceBufferAMD clEnqueueBindThreadTraceBufferAMD_; }; #endif // _OCL_THREAD_TRACE_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLUnalignedCopy.cpp000066400000000000000000000111331450307266000257510ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLUnalignedCopy.h" #include #include #include #include #include "CL/cl.h" #include "CL/cl_ext.h" static const int BufSize = 64; OCLUnalignedCopy::OCLUnalignedCopy() { _numSubTests = 1; failed_ = false; } OCLUnalignedCopy::~OCLUnalignedCopy() {} void OCLUnalignedCopy::open(unsigned int test, char* units, double& conversion, unsigned int deviceId) { _deviceId = deviceId; OCLTestImp::open(test, units, conversion, deviceId); CHECK_RESULT((error_ != CL_SUCCESS), "Error opening test"); cl_device_type deviceType; error_ = _wrapper->clGetDeviceInfo(devices_[deviceId], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "CL_DEVICE_TYPE failed"); if (!(deviceType & CL_DEVICE_TYPE_GPU)) { printf("GPU device is required for this test!\n"); failed_ = true; return; } cl_mem buffer; buffer = _wrapper->clCreateBuffer(context_, CL_MEM_READ_ONLY, BufSize * sizeof(cl_int4), NULL, &error_); CHECK_RESULT((error_ != CL_SUCCESS), "clCreateBuffer() failed"); buffers_.push_back(buffer); buffer = _wrapper->clCreateBuffer(context_, CL_MEM_WRITE_ONLY, BufSize * sizeof(cl_int4), NULL, &error_); buffers_.push_back(buffer); } static void CL_CALLBACK notify_callback(const char* errinfo, const void* private_info, size_t cb, void* user_data) {} void OCLUnalignedCopy::run(void) { if (failed_) { return; } char* values = new char[BufSize]; char* results = new char[BufSize]; for (int i = 0; i < BufSize; ++i) { values[i] = i; } static const char TestCnt = 7; char sizes[TestCnt][3] = { {5, 7, 13}, {5, 7, 12}, {4, 9, 12}, {4, 9, 15}, {27, 16, 15}, {27, 16, 13}, {32, 16, 13}, }; for (int i = 0; i < TestCnt; ++i) { error_ = _wrapper->clEnqueueWriteBuffer(cmdQueues_[_deviceId], buffers_[0], CL_FALSE, 0, BufSize, values, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueWriteBuffer() failed"); cl_uint pattern = 0; error_ = /*_wrapper->*/ clEnqueueFillBuffer( cmdQueues_[_deviceId], buffers_[1], &pattern, sizeof(pattern), 0, BufSize, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueFillBuffer() failed"); error_ = _wrapper->clEnqueueCopyBuffer( cmdQueues_[_deviceId], buffers_[0], buffers_[1], sizes[i][0], sizes[i][1], sizes[i][2], 0, NULL, NULL); CHECK_RESULT(error_, "clEnqueueCopyBuffer failed"); error_ = _wrapper->clEnqueueReadBuffer(cmdQueues_[_deviceId], buffers_[1], CL_TRUE, 0, BufSize, results, 0, NULL, NULL); CHECK_RESULT((error_ != CL_SUCCESS), "clEnqueueReadBuffer() failed"); for (int j = 0; j < sizes[i][1]; ++j) { CHECK_RESULT(results[j] != 0, "Comparison failed"); } for (int j = sizes[i][1], k = 0; j < (sizes[i][1] + sizes[i][2]); ++j, ++k) { CHECK_RESULT(results[j] != sizes[i][0] + k, "Comparison failed"); } for (int j = (sizes[i][1] + sizes[i][2]); j < BufSize; ++j) { CHECK_RESULT(results[j] != 0, "Comparison failed"); } } delete[] values; delete[] results; } unsigned int OCLUnalignedCopy::close(void) { return OCLTestImp::close(); } clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/OCLUnalignedCopy.h000066400000000000000000000030151450307266000254160ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _OCL_UNALIGNED_COPY_H_ #define _OCL_UNALIGNED_COPY_H_ #include "OCLTestImp.h" class OCLUnalignedCopy : public OCLTestImp { public: OCLUnalignedCopy(); virtual ~OCLUnalignedCopy(); public: virtual void open(unsigned int test, char* units, double& conversion, unsigned int deviceID); virtual void run(void); virtual unsigned int close(void); private: bool failed_; }; #endif // _OCL_UNALIGNED_COPY_H_ clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/TestList.cpp000066400000000000000000000075261450307266000244400ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include "OCLTestListImp.h" // // Includes for tests // #include "OCLAsyncMap.h" #include "OCLAsyncTransfer.h" #include "OCLAtomicCounter.h" #include "OCLBlitKernel.h" #include "OCLBufferFromImage.h" #include "OCLCPUGuardPages.h" #include "OCLCreateBuffer.h" #include "OCLCreateContext.h" #include "OCLCreateImage.h" #include "OCLDeviceAtomic.h" #include "OCLDeviceQueries.h" #include "OCLDynamic.h" #include "OCLDynamicBLines.h" #include "OCLGenericAddressSpace.h" #include "OCLGetQueueThreadID.h" #include "OCLGlobalOffset.h" #include "OCLImage2DFromBuffer.h" #include "OCLImageCopyPartial.h" #include "OCLKernelBinary.h" #include "OCLLDS32K.h" #include "OCLLinearFilter.h" #include "OCLMapCount.h" #include "OCLMemDependency.h" #include "OCLMemObjs.h" #include "OCLMemoryInfo.h" #include "OCLMultiQueue.h" #include "OCLOfflineCompilation.h" #include "OCLP2PBuffer.h" #include "OCLPartialWrkgrp.h" #include "OCLPerfCounters.h" #include "OCLPersistent.h" #include "OCLPinnedMemory.h" #include "OCLPlatformAtomics.h" #include "OCLProgramScopeVariables.h" #include "OCLRTQueue.h" #include "OCLReadWriteImage.h" #include "OCLSDI.h" #include "OCLSVM.h" #include "OCLSemaphore.h" #include "OCLStablePState.h" #include "OCLThreadTrace.h" #include "OCLUnalignedCopy.h" // // Helper macro for adding tests // template static void* dictionary_CreateTestFunc(void) { return new T(); } #define TEST(name) \ { #name, &dictionary_CreateTestFunc < name> } TestEntry TestList[] = { TEST(OCLCreateContext), TEST(OCLAtomicCounter), TEST(OCLKernelBinary), TEST(OCLGlobalOffset), TEST(OCLLinearFilter), TEST(OCLAsyncTransfer), TEST(OCLLDS32K), TEST(OCLMemObjs), TEST(OCLSemaphore), TEST(OCLPartialWrkgrp), TEST(OCLCreateBuffer), TEST(OCLCreateImage), TEST(OCLCPUGuardPages), TEST(OCLMapCount), TEST(OCLMemoryInfo), TEST(OCLOfflineCompilation), TEST(OCLMemDependency), TEST(OCLGetQueueThreadID), TEST(OCLDeviceQueries), TEST(OCLSDI), TEST(OCLThreadTrace), TEST(OCLMultiQueue), TEST(OCLImage2DFromBuffer), TEST(OCLBufferFromImage), TEST(OCLPerfCounters), TEST(OCLSVM), TEST(OCLProgramScopeVariables), TEST(OCLGenericAddressSpace), TEST(OCLDynamic), TEST(OCLPlatformAtomics), TEST(OCLDeviceAtomic), TEST(OCLDynamicBLines), TEST(OCLUnalignedCopy), TEST(OCLBlitKernel), TEST(OCLRTQueue), TEST(OCLAsyncMap), TEST(OCLPinnedMemory), TEST(OCLReadWriteImage), TEST(OCLStablePState), TEST(OCLP2PBuffer), // Failures in Linux. 
// IOL doesn't support tiling aperture and Cypress linear
    // image writes
    TEST(OCLPersistent),
};

unsigned int TestListCount = sizeof(TestList) / sizeof(TestList[0]);
unsigned int TestLibVersion = 0;
const char* TestLibName = "oclruntime";
clr-rocm-5.7.1/opencl/tests/ocltst/module/runtime/oclruntime.exclude000066400000000000000000000001731450307266000257040ustar00rootroot00000000000000# all clear
OCLImageCopyPartial
# EPR 362715
OCLCPUGuardPages
OCLRegionDeviceQueries
OCLDynamicBLines
OCLAtomicCounter
clr-rocm-5.7.1/opencl/tools/000077500000000000000000000000001450307266000156655ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tools/clinfo/000077500000000000000000000000001450307266000191375ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tools/clinfo/CMakeLists.txt000066400000000000000000000004621450307266000217010ustar00rootroot00000000000000add_executable(clinfo clinfo.cpp)

target_compile_definitions(clinfo PRIVATE CL_TARGET_OPENCL_VERSION=220 HAVE_CL2_HPP)
target_include_directories(clinfo PRIVATE ${OPENCL_ICD_LOADER_HEADERS_DIR})
target_link_libraries(clinfo OpenCL)

INSTALL(TARGETS clinfo RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
clr-rocm-5.7.1/opencl/tools/clinfo/clinfo.cpp000066400000000000000000001040641450307266000211220ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE. */

// NOTE: the angle-bracket header names below were lost in extraction and are
// reconstructed from usage (strcmp, atoi, iostream, map, make_pair, vector).
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

#if !defined(_WIN32)
#include <unistd.h>
#endif

#ifdef _MSC_VER
#pragma warning(disable: 4290)
#endif

#if defined(HAVE_CL2_HPP)
#define CL_HPP_ENABLE_EXCEPTIONS
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_TARGET_OPENCL_VERSION 200
#define CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY
#include "CL/cl2.hpp"
#else // !HAVE_CL2_HPP
#define __CL_ENABLE_EXCEPTIONS
#define __MAX_DEFAULT_VECTOR_SIZE 50
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#define CL_USE_DEPRECATED_OPENCL_2_0_APIS
#include "CL/cl.hpp"
#endif // !HAVE_CL2_HPP

bool verbose = false;

/// Returns EXIT_SUCCESS on success, EXIT_FAILURE on failure.
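// The queries in main() below all follow one pattern from the cl2.hpp
// bindings included above: enumerate platforms, enumerate each platform's
// devices, then read one property at a time with getInfo<CL_..._*>().
// A minimal sketch of that pattern, assuming the same bindings (the loop
// below is illustrative only and is not part of this tool):
//
//   std::vector<cl::Platform> platforms;
//   cl::Platform::get(&platforms);
//   for (const cl::Platform& p : platforms) {
//     std::vector<cl::Device> devices;
//     p.getDevices(CL_DEVICE_TYPE_ALL, &devices);
//     for (const cl::Device& d : devices)
//       std::cout << d.getInfo<CL_DEVICE_NAME>() << std::endl;
//   }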
int main(int argc, char** argv) {
  /* Error flag */
  cl_int err;

  // parse input
  for (int i = 1; i < argc; i++) {
    if ((strcmp(argv[i], "-v") == 0) || (strcmp(argv[i], "--verbose") == 0)) {
      verbose = true;
    } else if ((strcmp(argv[i], "-h") == 0) ||
               (strcmp(argv[i], "--help") == 0)) {
      std::cout << "Usage is: " << argv[0] << " [-v|--verbose]" << std::endl;
      return EXIT_FAILURE;
    }
  }

  // Platform info
  std::vector<cl::Platform> platforms;
  try {
    err = cl::Platform::get(&platforms);

    // Iterate over platforms
    std::cout << "Number of platforms:\t\t\t\t " << platforms.size()
              << std::endl;
    for (std::vector<cl::Platform>::iterator i = platforms.begin();
         i != platforms.end(); ++i) {
      const cl::Platform& platform = *i;
      std::cout << " Platform Profile:\t\t\t\t "
                << platform.getInfo<CL_PLATFORM_PROFILE>().c_str() << std::endl;
      std::cout << " Platform Version:\t\t\t\t "
                << platform.getInfo<CL_PLATFORM_VERSION>().c_str() << std::endl;
      std::cout << " Platform Name:\t\t\t\t "
                << platform.getInfo<CL_PLATFORM_NAME>().c_str() << std::endl;
      std::cout << " Platform Vendor:\t\t\t\t "
                << platform.getInfo<CL_PLATFORM_VENDOR>().c_str() << std::endl;
      if (platform.getInfo<CL_PLATFORM_EXTENSIONS>().size() > 0) {
        std::cout << " Platform Extensions:\t\t\t\t "
                  << platform.getInfo<CL_PLATFORM_EXTENSIONS>().c_str()
                  << std::endl;
      }
    }
    std::cout << std::endl << std::endl;

    // Now Iterate over each platform and its devices
    for (std::vector<cl::Platform>::iterator p = platforms.begin();
         p != platforms.end(); ++p) {
      const cl::Platform& platform = *p;
      std::cout << " Platform Name:\t\t\t\t "
                << platform.getInfo<CL_PLATFORM_NAME>().c_str() << std::endl;
      std::vector<cl::Device> devices;
      platform.getDevices(CL_DEVICE_TYPE_ALL, &devices);

      // Get OpenCL version
      std::string platformVersionStr = platform.getInfo<CL_PLATFORM_VERSION>();
      std::string openclVersionStr(platformVersionStr.c_str());
      size_t vStart = openclVersionStr.find(" ", 0);
      size_t vEnd = openclVersionStr.find(" ", vStart + 1);
      std::string vStrVal = openclVersionStr.substr(vStart + 1,
                                                    vEnd - vStart - 1);

      std::cout << "Number of devices:\t\t\t\t " << devices.size() << std::endl;
      for (std::vector<cl::Device>::iterator i = devices.begin();
           i != devices.end(); ++i) {
        const cl::Device& device = *i;
        /* Get device name */
        std::string deviceName = device.getInfo<CL_DEVICE_NAME>();
        cl_device_type dtype = device.getInfo<CL_DEVICE_TYPE>();
        /* Get CAL driver version in int */
        std::string driverVersion = device.getInfo<CL_DRIVER_VERSION>();
        std::string calVersion(driverVersion.c_str());
        calVersion = calVersion.substr(calVersion.find_last_of(".") + 1);
        int version = atoi(calVersion.c_str());

        std::cout << " Device Type:\t\t\t\t\t ";
        switch (dtype) {
          case CL_DEVICE_TYPE_ACCELERATOR:
            std::cout << "CL_DEVICE_TYPE_ACCELERATOR" << std::endl;
            break;
          case CL_DEVICE_TYPE_CPU:
            std::cout << "CL_DEVICE_TYPE_CPU" << std::endl;
            break;
          case CL_DEVICE_TYPE_DEFAULT:
            std::cout << "CL_DEVICE_TYPE_DEFAULT" << std::endl;
            break;
          case CL_DEVICE_TYPE_GPU:
            std::cout << "CL_DEVICE_TYPE_GPU" << std::endl;
            break;
        }
        std::cout << " Vendor ID:\t\t\t\t\t " << std::hex
                  << device.getInfo<CL_DEVICE_VENDOR_ID>() << "h" << std::dec
                  << std::endl;
        bool isAMDPlatform =
            (strcmp(platform.getInfo<CL_PLATFORM_NAME>().c_str(),
                    "AMD Accelerated Parallel Processing") == 0) ?
true : false; if (isAMDPlatform) { std::string boardName; device.getInfo(CL_DEVICE_BOARD_NAME_AMD, &boardName); std::cout << " Board name:\t\t\t\t\t " << boardName.c_str() << std::endl; cl_device_topology_amd topology; err = device.getInfo(CL_DEVICE_TOPOLOGY_AMD, &topology); if (topology.raw.type == CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD) { std::cout << " Device Topology:\t\t\t\t " << "PCI[ B#" << (int)topology.pcie.bus << ", D#" << (int)topology.pcie.device << ", F#" << (int)topology.pcie.function << " ]" << std::endl; } } std::cout << " Max compute units:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Max work items dimensions:\t\t\t " << device.getInfo() << std::endl; std::vector< ::size_t> witems = device.getInfo(); for (unsigned int x = 0; x < device.getInfo(); x++) { std::cout << " Max work items[" << x << "]:\t\t\t\t " << witems[x] << std::endl; } std::cout << " Max work group size:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Preferred vector width char:\t\t\t " << device.getInfo() << std::endl; std::cout << " Preferred vector width short:\t\t\t " << device.getInfo() << std::endl; std::cout << " Preferred vector width int:\t\t\t " << device.getInfo() << std::endl; std::cout << " Preferred vector width long:\t\t\t " << device.getInfo() << std::endl; std::cout << " Preferred vector width float:\t\t\t " << device.getInfo() << std::endl; std::cout << " Preferred vector width double:\t\t " << device.getInfo() << std::endl; #ifdef CL_VERSION_1_1 if(vStrVal.compare("1.0") > 0) { std::cout << " Native vector width char:\t\t\t " << device.getInfo() << std::endl; std::cout << " Native vector width short:\t\t\t " << device.getInfo() << std::endl; std::cout << " Native vector width int:\t\t\t " << device.getInfo() << std::endl; std::cout << " Native vector width long:\t\t\t " << device.getInfo() << std::endl; std::cout << " Native vector width float:\t\t\t " << device.getInfo() << std::endl; std::cout << " Native vector width double:\t\t\t " << device.getInfo() << std::endl; } #endif // CL_VERSION_1_1 std::cout << " Max clock frequency:\t\t\t\t " << device.getInfo() << "Mhz" << std::endl; std::cout << " Address bits:\t\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Max memory allocation:\t\t\t " << device.getInfo() << std::endl; std::cout << " Image support:\t\t\t\t " << (device.getInfo() ? 
"Yes" : "No") << std::endl; if (device.getInfo()) { std::cout << " Max number of images read arguments:\t\t " << device.getInfo() << std::endl; std::cout << " Max number of images write arguments:\t\t " << device.getInfo() << std::endl; std::cout << " Max image 2D width:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Max image 2D height:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Max image 3D width:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Max image 3D height:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Max image 3D depth:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Max samplers within kernel:\t\t\t " << device.getInfo() << std::endl; if (verbose) { std::cout << " Image formats supported:" << std::endl; std::vector formats; cl_context_properties cps[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties)(*p)(), 0 }; std::vector device; device.push_back(*i); cl::Context context(device, cps, NULL, NULL, &err); std::map channelOrder; channelOrder[CL_R] = "CL_R"; channelOrder[CL_A] = "CL_A"; channelOrder[CL_RG] = "CL_RG"; channelOrder[CL_RA] = "CL_RA"; channelOrder[CL_RGB] = "CL_RGB"; channelOrder[CL_RGBA] = "CL_RGBA"; channelOrder[CL_BGRA] = "CL_BGRA"; channelOrder[CL_ARGB] = "CL_ARGB"; channelOrder[CL_INTENSITY] = "CL_INTENSITY"; channelOrder[CL_LUMINANCE] = "CL_LUMINANCE"; channelOrder[CL_Rx] = "CL_Rx"; channelOrder[CL_RGx] = "CL_RGx"; channelOrder[CL_RGBx] = "CL_RGBx"; std::map > channelType; channelType[CL_SNORM_INT8] = std::make_pair("snorm", "int8"); channelType[CL_SNORM_INT16] = std::make_pair("snorm", "int16"); channelType[CL_UNORM_INT8] = std::make_pair("unorm", "int8"); channelType[CL_UNORM_INT16] = std::make_pair("unorm", "int16"); channelType[CL_UNORM_SHORT_565] = std::make_pair("unorm", "short_565"); channelType[CL_UNORM_SHORT_555] = std::make_pair("unorm", "short_555"); channelType[CL_UNORM_INT_101010] = std::make_pair("unorm", "int_101010"); channelType[CL_SIGNED_INT8] = std::make_pair("signed", "int8"); channelType[CL_SIGNED_INT16] = std::make_pair("signed", "int16"); channelType[CL_SIGNED_INT32] = std::make_pair("signed", "int32"); channelType[CL_UNSIGNED_INT8] = std::make_pair("unsigned", "int8"); channelType[CL_UNSIGNED_INT16] = std::make_pair("unsigned", "int16"); channelType[CL_UNSIGNED_INT32] = std::make_pair("unsigned", "int32"); channelType[CL_HALF_FLOAT] = std::make_pair("half_float", ""); channelType[CL_FLOAT] = std::make_pair("float", ""); std::vector > imageDimensions; imageDimensions.push_back(std::make_pair(CL_MEM_OBJECT_IMAGE2D, std::string("2D "))); imageDimensions.push_back(std::make_pair(CL_MEM_OBJECT_IMAGE3D, std::string("3D "))); for(std::vector >::iterator id = imageDimensions.begin(); id != imageDimensions.end(); id++){ struct imageAccessStruct { std::string name; int access; std::vector formats; } imageAccess[] = {{std::string("Read-Write/Read-Only/Write-Only"), CL_MEM_READ_WRITE, std::vector()}, {std::string("Read-Only"), CL_MEM_READ_ONLY, std::vector()}, {std::string("Write-Only"), CL_MEM_WRITE_ONLY, std::vector()}}; for(size_t ia=0; ia < sizeof(imageAccess)/sizeof(imageAccessStruct); ia++){ context.getSupportedImageFormats(imageAccess[ia].access, (*id).first, &(imageAccess[ia].formats)); bool printTopHeader = true; for (std::map::iterator o = channelOrder.begin(); o != channelOrder.end(); o++) { bool printHeader = true; for (std::vector::iterator it = imageAccess[ia].formats.begin(); it != imageAccess[ia].formats.end(); ++it) { if ( (*o).first == (int)(*it).image_channel_order) { 
bool printedAlready = false; //see if this was already print in RW/RO/WO if (ia !=0) { for (std::vector::iterator searchIt = imageAccess[0].formats.begin(); searchIt != imageAccess[0].formats.end(); searchIt++) { if ( ((*searchIt).image_channel_data_type == (*it).image_channel_data_type) && ((*searchIt).image_channel_order == (*it).image_channel_order)) { printedAlready = true; break; } } } if (printedAlready) { continue; } if (printTopHeader) { std::cout << " " << (*id).second << imageAccess[ia].name << std::endl; printTopHeader = false; } if (printHeader) { std::cout << " " << (*o).second << ": "; printHeader = false; } std::cout << channelType[(*it).image_channel_data_type].first; if (channelType[(*it).image_channel_data_type].second != "") { std::cout << "-" << channelType[(*it).image_channel_data_type].second; } if (it != (imageAccess[ia].formats.end() - 1)) { std::cout << " "; } } } if (printHeader == false) { std::cout << std::endl; } } } } } } std::cout << " Max size of kernel argument:\t\t\t " << device.getInfo() << std::endl; std::cout << " Alignment (bits) of base address:\t\t " << device.getInfo() << std::endl; std::cout << " Minimum alignment (bytes) for any datatype:\t " << device.getInfo() << std::endl; std::cout << " Single precision floating point capability" << std::endl; std::cout << " Denorms:\t\t\t\t\t " << (device.getInfo() & CL_FP_DENORM ? "Yes" : "No") << std::endl; std::cout << " Quiet NaNs:\t\t\t\t\t " << (device.getInfo() & CL_FP_INF_NAN ? "Yes" : "No") << std::endl; std::cout << " Round to nearest even:\t\t\t " << (device.getInfo() & CL_FP_ROUND_TO_NEAREST ? "Yes" : "No") << std::endl; std::cout << " Round to zero:\t\t\t\t " << (device.getInfo() & CL_FP_ROUND_TO_ZERO ? "Yes" : "No") << std::endl; std::cout << " Round to +ve and infinity:\t\t\t " << (device.getInfo() & CL_FP_ROUND_TO_INF ? "Yes" : "No") << std::endl; std::cout << " IEEE754-2008 fused multiply-add:\t\t " << (device.getInfo() & CL_FP_FMA ? 
"Yes" : "No") << std::endl; std::cout << " Cache type:\t\t\t\t\t " ; switch (device.getInfo()) { case CL_NONE: std::cout << "None" << std::endl; break; case CL_READ_ONLY_CACHE: std::cout << "Read only" << std::endl; break; case CL_READ_WRITE_CACHE: std::cout << "Read/Write" << std::endl; break; } std::cout << " Cache line size:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Cache size:\t\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Global memory size:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Constant buffer size:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Max number of constant args:\t\t\t " << device.getInfo() << std::endl; std::cout << " Local memory type:\t\t\t\t " ; switch (device.getInfo()) { case CL_LOCAL: std::cout << "Scratchpad" << std::endl; break; case CL_GLOBAL: std::cout << "Global" << std::endl; break; } std::cout << " Local memory size:\t\t\t\t " << device.getInfo() << std::endl; #if defined(CL_VERSION_2_0) if(vStrVal.compare("2") > 0) { std::cout << " Max pipe arguments:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Max pipe active reservations:\t\t\t " << device.getInfo() << std::endl; std::cout << " Max pipe packet size:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Max global variable size:\t\t\t " << device.getInfo() << std::endl; std::cout << " Max global variable preferred total size:\t " << device.getInfo() << std::endl; std::cout << " Max read/write image args:\t\t\t " << device.getInfo() << std::endl; std::cout << " Max on device events:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Queue on device max size:\t\t\t " << device.getInfo() << std::endl; std::cout << " Max on device queues:\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Queue on device preferred size:\t\t " << device.getInfo() << std::endl; std::cout << " SVM capabilities:\t\t\t\t " << std::endl; std::cout << " Coarse grain buffer:\t\t\t " << (device.getInfo() & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER ? "Yes" : "No") << std::endl; std::cout << " Fine grain buffer:\t\t\t\t " << (device.getInfo() & CL_DEVICE_SVM_FINE_GRAIN_BUFFER ? "Yes" : "No") << std::endl; std::cout << " Fine grain system:\t\t\t\t " << (device.getInfo() & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM ? "Yes" : "No") << std::endl; std::cout << " Atomics:\t\t\t\t\t " << (device.getInfo() & CL_DEVICE_SVM_ATOMICS ? 
"Yes" : "No") << std::endl; std::cout << " Preferred platform atomic alignment:\t\t " << device.getInfo() << std::endl; std::cout << " Preferred global atomic alignment:\t\t " << device.getInfo() << std::endl; std::cout << " Preferred local atomic alignment:\t\t " << device.getInfo() << std::endl; } #endif // CL_VERSION_2_0 #if defined(CL_VERSION_1_1) && !defined(ATI_ARCH_ARM) if(vStrVal.compare("1.0") > 0) { cl_context_properties cps[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties)(*p)(), 0 }; std::vector device; device.push_back(*i); cl::Context context(device, cps, NULL, NULL, &err); if (err != CL_SUCCESS) { std::cerr << "Context::Context() failed (" << err << ")\n"; return EXIT_FAILURE; } std::string kernelStr("__kernel void hello(){ size_t i = get_global_id(0); size_t j = get_global_id(1);}"); cl::Program::Sources sources(1, std::make_pair(kernelStr.data(), kernelStr.size())); cl::Program program = cl::Program(context, sources, &err); if (err != CL_SUCCESS) { std::cerr << "Program::Program() failed (" << err << ")\n"; return EXIT_FAILURE; } err = program.build(device); if (err != CL_SUCCESS) { if(err == CL_BUILD_PROGRAM_FAILURE) { std::string str = program.getBuildInfo((*i)); std::cout << " \n\t\t\tBUILD LOG\n"; std::cout << " ************************************************\n"; std::cout << str.c_str() << std::endl; std::cout << " ************************************************\n"; } std::cerr << "Program::build() failed (" << err << ")\n"; return EXIT_FAILURE; } cl::Kernel kernel(program, "hello", &err); if (err != CL_SUCCESS) { std::cerr << "Kernel::Kernel() failed (" << err << ")\n"; return EXIT_FAILURE; } std::cout << " Kernel Preferred work group size multiple:\t " << kernel.getWorkGroupInfo((*i), &err) << std::endl; } #endif // CL_VERSION_1_1 std::cout << " Error correction support:\t\t\t " << device.getInfo() << std::endl; #ifdef CL_VERSION_1_1 if(vStrVal.compare("1.0") > 0) { std::cout << " Unified memory for Host and Device:\t\t " << device.getInfo() << std::endl; } #endif // CL_VERSION_1_1 std::cout << " Profiling timer resolution:\t\t\t " << device.getInfo() << std::endl; std::cout << " Device endianess:\t\t\t\t " << (device.getInfo() ? "Little" : "Big") << std::endl; std::cout << " Available:\t\t\t\t\t " << (device.getInfo() ? "Yes" : "No") << std::endl; std::cout << " Compiler available:\t\t\t\t " << (device.getInfo() ? "Yes" : "No") << std::endl; std::cout << " Execution capabilities:\t\t\t\t " << std::endl; std::cout << " Execute OpenCL kernels:\t\t\t " << (device.getInfo() & CL_EXEC_KERNEL ? "Yes" : "No") << std::endl; std::cout << " Execute native function:\t\t\t " << (device.getInfo() & CL_EXEC_NATIVE_KERNEL ? "Yes" : "No") << std::endl; std::cout << " Queue on Host properties:\t\t\t\t " << std::endl; std::cout << " Out-of-Order:\t\t\t\t " << (device.getInfo() & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE ? "Yes" : "No") << std::endl; std::cout << " Profiling :\t\t\t\t\t " << (device.getInfo() & CL_QUEUE_PROFILING_ENABLE ? "Yes" : "No") << std::endl; #ifdef CL_VERSION_2_0 if(vStrVal.compare("2") > 0) { std::cout << " Queue on Device properties:\t\t\t\t " << std::endl; std::cout << " Out-of-Order:\t\t\t\t " << (device.getInfo() & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE ? "Yes" : "No") << std::endl; std::cout << " Profiling :\t\t\t\t\t " << (device.getInfo() & CL_QUEUE_PROFILING_ENABLE ? 
"Yes" : "No") << std::endl; } #endif std::cout << " Platform ID:\t\t\t\t\t " << device.getInfo() << std::endl; std::cout << " Name:\t\t\t\t\t\t " << device.getInfo().c_str() << std::endl; std::cout << " Vendor:\t\t\t\t\t " << device.getInfo().c_str() << std::endl; #ifdef CL_VERSION_1_1 if(vStrVal.compare("1.0") > 0) { std::cout << " Device OpenCL C version:\t\t\t " << device.getInfo().c_str() << std::endl; } #endif // CL_VERSION_1_1 std::cout << " Driver version:\t\t\t\t " << device.getInfo().c_str() << std::endl; std::cout << " Profile:\t\t\t\t\t " << device.getInfo().c_str() << std::endl; std::cout << " Version:\t\t\t\t\t " << device.getInfo().c_str() << std::endl; std::cout << " Extensions:\t\t\t\t\t " << device.getInfo().c_str() << std::endl; std::cout << std::endl << std::endl; } } } catch (cl::Error err) { std::cerr << "ERROR: " << err.what() << "(" << err.err() << ")" << std::endl; return EXIT_FAILURE; } return EXIT_SUCCESS; } clr-rocm-5.7.1/opencl/tools/cltrace/000077500000000000000000000000001450307266000173025ustar00rootroot00000000000000clr-rocm-5.7.1/opencl/tools/cltrace/CMakeLists.txt000066400000000000000000000011321450307266000220370ustar00rootroot00000000000000add_library(cltrace SHARED cltrace.cpp) if(WIN32) target_sources(cltrace PRIVATE cltrace.def) else() target_link_libraries(cltrace PRIVATE "-Wl,--version-script=${CMAKE_CURRENT_LIST_DIR}/cltrace.map") set_target_properties(cltrace PROPERTIES LINK_DEPENDS "${CMAKE_CURRENT_LIST_DIR}/cltrace.map") endif() target_include_directories(cltrace PRIVATE ${CMAKE_SOURCE_DIR}/opencl ${OPENCL_ICD_LOADER_HEADERS_DIR} ${ROCCLR_INCLUDE_DIR}) INSTALL(TARGETS cltrace RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}) clr-rocm-5.7.1/opencl/tools/cltrace/cltrace.cpp000066400000000000000000003444351450307266000214400ustar00rootroot00000000000000// // Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. All rights reserved. 
//

// NOTE: the angle-bracket header names below were lost in extraction and are
// reconstructed from usage; the two CL headers in particular are assumptions.
#include <CL/cl.h>
#include <CL/cl_ext.h>

#if defined(CL_VERSION_2_0)
/* Deprecated in OpenCL 2.0 */
# define CL_DEVICE_QUEUE_PROPERTIES 0x102A
# define CL_DEVICE_HOST_UNIFIED_MEMORY 0x1035
#endif

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

#ifdef _MSC_VER
#include <windows.h>
#include <process.h>
#include <stdint.h>
#else
#include <pthread.h>
#include <unistd.h>
#endif

#define CASE(x) case x: return #x;

std::ofstream clTraceLog;
std::streambuf *cerrStreamBufSave;

// A call record with links for the checker
struct Rec {
  Rec *next;
  Rec *prev;
  std::ostringstream *sp;
  int visits;
  Rec() : sp(0) { }
  Rec(std::ostringstream *ps) : sp(ps), visits(0) { }
};

// This is the head of the checker Rec list
static Rec recs;

// About how many times per second the checker runs
static const int checks_per_second = 10;

// Some OS independent synchronization for the checker Rec list
#ifdef _MSC_VER
#define CHECKERTYPE static void
#define CHECKERRETURN return
static CRITICAL_SECTION recsCS[1];
static inline void initRecs(void) {
  InitializeCriticalSection(recsCS);
  recs.next = &recs;
  recs.prev = &recs;
}
static inline void lockRecs(void) { EnterCriticalSection(recsCS); }
static inline void unlockRecs(void) { LeaveCriticalSection(recsCS); }
static inline void waitRecs(void) { Sleep(1000/checks_per_second); }
#else
#define CHECKERTYPE static void *
#define CHECKERRETURN return NULL
static pthread_mutex_t recsMtx = PTHREAD_MUTEX_INITIALIZER;
static inline void initRecs(void) {
  recs.next = &recs;
  recs.prev = &recs;
}
static inline void lockRecs(void) { pthread_mutex_lock(&recsMtx); }
static inline void unlockRecs(void) { pthread_mutex_unlock(&recsMtx); }
static inline void waitRecs(void) { usleep(1000000/checks_per_second); }
#endif

// Link into checker Rec list
static inline void addRec(Rec *r) {
  lockRecs();
  r->next = recs.next;
  r->prev = &recs;
  recs.next->prev = r;
  recs.next = r;
  unlockRecs();
}

// unlink from checker Rec list
static inline void delRec(Rec *r) {
  lockRecs();
  r->next->prev = r->prev;
  r->prev->next = r->next;
  unlockRecs();
}

// This is the checker thread function
CHECKERTYPE checker(void *) {
  Rec *b;
  Rec *e = &recs;
  for (;;) {
    // Wait for a while
    waitRecs();
    std::ostringstream ss;
    int go = 0;
    lockRecs();
    for (b=recs.next; b!=e; b=b->next) {
      ++b->visits;
      if (b->visits == 2) {
        // This record has been on the list for a while
        // we'll log it in case the thread has hung
        ss << "Waiting for " << b->sp->str() << std::endl;
        go = 1;
      }
    }
    unlockRecs();
    if (go) std::cerr << ss.str();
  }
  CHECKERRETURN;
}

#ifdef _MSC_VER
static cl_int startChecker(void) {
  uintptr_t h = _beginthread(checker, 0, NULL);
  return h == 0;
}
#else
static cl_int startChecker(void) {
  int e;
  pthread_t tid;
  pthread_attr_t pa;
  e = pthread_attr_init(&pa);
  if (e) return e;
  e = pthread_attr_setdetachstate(&pa, PTHREAD_CREATE_DETACHED);
  if (e) return e;
  e = pthread_create(&tid, &pa, checker, NULL);
  return e;
}
#endif

template <typename T>
std::string getDecimalString(T value) {
  std::ostringstream ss;
  ss << value;
  return ss.str();
}

template <typename T>
std::string getDecimalString(T* value) {
  if (value == NULL) { return "NULL"; }
  std::ostringstream ss;
  ss << '&' << *value;
  return ss.str();
}

template <typename T>
std::string getHexString(T value) {
  std::ostringstream ss;
  ss << "0x" << std::hex << value;
  return ss.str();
}

template <typename T>
std::string getHexString(T* value) {
  if (value == NULL) { return "NULL"; }
  std::ostringstream ss;
  ss << "&0x" << std::hex << *value;
  return ss.str();
}

template <typename T>
std::string getHexString(T** value) {
  if (value == NULL) { return "NULL"; }
  std::ostringstream ss;
  ss << "&" << *value;
  return ss.str();
}

template <>
std::string getHexString(void *value) { return getHexString(reinterpret_cast(value)); } static std::string getMemoryString(const void* ptr, size_t size) { switch (size) { case 1: return getHexString((const char*)ptr); case 2: return getHexString((const short*)ptr); case 4: return getHexString((const int*)ptr); case 8: return getHexString((const long long*)ptr); default: break; } std::ostringstream ss; ss << "&" << ptr; return ss.str(); } static std::string getBoolString(cl_bool b) { return (b == CL_TRUE) ? "CL_TRUE" : "CL_FALSE"; } static std::string getNDimString(const size_t* nd, size_t dims) { if (nd == NULL) { return "NULL"; } if (dims == 0) { return "[]"; } std::ostringstream ss; ss << '[' << nd[0]; if (dims > 1) { ss << ',' << nd[1]; if (dims > 2) { ss << ',' << nd[2]; } } ss << ']'; return ss.str(); } static std::string getErrorString(cl_int errcode) { switch(errcode) { CASE(CL_SUCCESS); CASE(CL_DEVICE_NOT_FOUND); CASE(CL_DEVICE_NOT_AVAILABLE); CASE(CL_COMPILER_NOT_AVAILABLE); CASE(CL_MEM_OBJECT_ALLOCATION_FAILURE); CASE(CL_OUT_OF_RESOURCES); CASE(CL_OUT_OF_HOST_MEMORY); CASE(CL_PROFILING_INFO_NOT_AVAILABLE); CASE(CL_MEM_COPY_OVERLAP); CASE(CL_IMAGE_FORMAT_MISMATCH); CASE(CL_IMAGE_FORMAT_NOT_SUPPORTED); CASE(CL_BUILD_PROGRAM_FAILURE); CASE(CL_MAP_FAILURE); CASE(CL_MISALIGNED_SUB_BUFFER_OFFSET); CASE(CL_INVALID_VALUE); CASE(CL_INVALID_DEVICE_TYPE); CASE(CL_INVALID_PLATFORM); CASE(CL_INVALID_DEVICE); CASE(CL_INVALID_CONTEXT); CASE(CL_INVALID_QUEUE_PROPERTIES); CASE(CL_INVALID_COMMAND_QUEUE); CASE(CL_INVALID_HOST_PTR); CASE(CL_INVALID_MEM_OBJECT); CASE(CL_INVALID_IMAGE_FORMAT_DESCRIPTOR); CASE(CL_INVALID_IMAGE_SIZE); CASE(CL_INVALID_SAMPLER); CASE(CL_INVALID_BINARY); CASE(CL_INVALID_BUILD_OPTIONS); CASE(CL_INVALID_PROGRAM); CASE(CL_INVALID_PROGRAM_EXECUTABLE); CASE(CL_INVALID_KERNEL_NAME); CASE(CL_INVALID_KERNEL_DEFINITION); CASE(CL_INVALID_KERNEL); CASE(CL_INVALID_ARG_INDEX); CASE(CL_INVALID_ARG_VALUE); CASE(CL_INVALID_ARG_SIZE); CASE(CL_INVALID_KERNEL_ARGS); CASE(CL_INVALID_WORK_DIMENSION); CASE(CL_INVALID_WORK_GROUP_SIZE); CASE(CL_INVALID_WORK_ITEM_SIZE); CASE(CL_INVALID_GLOBAL_OFFSET); CASE(CL_INVALID_EVENT_WAIT_LIST); CASE(CL_INVALID_EVENT); CASE(CL_INVALID_OPERATION); CASE(CL_INVALID_GL_OBJECT); CASE(CL_INVALID_BUFFER_SIZE); CASE(CL_INVALID_MIP_LEVEL); CASE(CL_INVALID_GLOBAL_WORK_SIZE); default: return getDecimalString(errcode); } } static std::string getMemObjectTypeString(cl_mem_object_type type) { switch(type) { CASE(CL_MEM_OBJECT_BUFFER); CASE(CL_MEM_OBJECT_IMAGE2D); CASE(CL_MEM_OBJECT_IMAGE3D); default: return getHexString(type); } } static std::string getMemInfoString(cl_mem_info param_name) { switch(param_name) { CASE(CL_MEM_TYPE); CASE(CL_MEM_FLAGS); CASE(CL_MEM_SIZE); CASE(CL_MEM_HOST_PTR); CASE(CL_MEM_MAP_COUNT); CASE(CL_MEM_REFERENCE_COUNT); CASE(CL_MEM_CONTEXT); CASE(CL_MEM_ASSOCIATED_MEMOBJECT); CASE(CL_MEM_OFFSET); default: return getHexString(param_name); } } static std::string getImageInfoString(cl_image_info param_name) { switch(param_name) { CASE(CL_IMAGE_FORMAT); CASE(CL_IMAGE_ELEMENT_SIZE); CASE(CL_IMAGE_ROW_PITCH); CASE(CL_IMAGE_SLICE_PITCH); CASE(CL_IMAGE_WIDTH); CASE(CL_IMAGE_HEIGHT); CASE(CL_IMAGE_DEPTH); default: return getHexString(param_name); } } static std::string getErrorString(cl_int* errcode_ret) { if (errcode_ret == NULL) { return "NULL"; } std::ostringstream ss; ss << '&' << getErrorString(*errcode_ret); return ss.str(); } static std::string getHandlesString(const void* handles, cl_uint num_handles) { if (handles == NULL) { return "NULL"; } if 
(num_handles == 0) { return "[]"; } const cl_event* p = reinterpret_cast(handles); std::ostringstream ss; ss << '['; while (true) { ss << *p++; if (--num_handles == 0) { break; } ss << ','; } ss << ']'; return ss.str(); } static std::string getContextPropertyString(cl_context_properties cprop) { switch(cprop) { CASE(CL_CONTEXT_PLATFORM); default: return getHexString(cprop); } } static std::string getContextPropertiesString(const cl_context_properties* cprops) { if (cprops == NULL) { return "NULL"; } std::ostringstream ss; ss << '{'; while (*cprops != 0) { ss << getContextPropertyString(cprops[0]) << ',' << getHexString(cprops[1]) << ","; cprops += 2; } ss << "NULL}"; return ss.str(); } static std::string getCommandQueuePropertyString(cl_command_queue_properties property) { if (property == 0) { return "0"; } std::ostringstream ss; while (property) { if (property & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) { ss << "CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE"; property &= ~CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE; } else if (property & CL_QUEUE_PROFILING_ENABLE) { ss << "CL_QUEUE_PROFILING_ENABLE"; property &= ~CL_QUEUE_PROFILING_ENABLE; } else { ss << "0x" << std::hex << (int)property; property = 0; } if (property != 0) { ss << '|'; } } return ss.str(); } static std::string getQueuePropertyString(const cl_queue_properties* qprops) { if (qprops == NULL) { return "NULL"; } std::ostringstream ss; cl_command_queue_properties property = 0; unsigned int queueSize = 0; const struct QueueProperty { cl_queue_properties name; union { cl_queue_properties raw; cl_uint size; } value; } *p = reinterpret_cast(qprops); if (p != NULL) while(p->name != 0) { switch(p->name) { case CL_QUEUE_PROPERTIES: property = static_cast(p->value.raw); if (property & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) { ss << "CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE"; property &= ~CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE; } else if (property & CL_QUEUE_PROFILING_ENABLE) { ss << "CL_QUEUE_PROFILING_ENABLE"; property &= ~CL_QUEUE_PROFILING_ENABLE; } else { ss << "0x" << std::hex << (int)property; property = 0; } if (property != 0) { ss << '|'; } break; case CL_QUEUE_SIZE: // Unimplemented queueSize = p->value.size; ss << "QUEUE_SIZE " << queueSize; break; #define CL_QUEUE_REAL_TIME_COMPUTE_UNITS_AMD 0x404f case CL_QUEUE_REAL_TIME_COMPUTE_UNITS_AMD: queueSize = p->value.size; ss << " RT_QUEUE " << queueSize; break; #define CL_QUEUE_MEDIUM_PRIORITY_AMD 0x4050 case CL_QUEUE_MEDIUM_PRIORITY_AMD: queueSize = p->value.size; ss << " MEDIUM_PRIORITY " << queueSize; break; default: break; } ++p; } return ss.str(); } static std::string getMemFlagsString(cl_mem_flags flags) { if (flags == 0) { return "0"; } std::ostringstream ss; while (flags) { if (flags & CL_MEM_READ_WRITE) { ss << "CL_MEM_READ_WRITE"; flags &= ~CL_MEM_READ_WRITE; } else if (flags & CL_MEM_WRITE_ONLY) { ss << "CL_MEM_WRITE_ONLY"; flags &= ~CL_MEM_WRITE_ONLY; } else if (flags & CL_MEM_READ_ONLY) { ss << "CL_MEM_READ_ONLY"; flags &= ~CL_MEM_READ_ONLY; } else if (flags & CL_MEM_USE_HOST_PTR) { ss << "CL_MEM_USE_HOST_PTR"; flags &= ~CL_MEM_USE_HOST_PTR; } else if (flags & CL_MEM_ALLOC_HOST_PTR) { ss << "CL_MEM_ALLOC_HOST_PTR"; flags &= ~CL_MEM_ALLOC_HOST_PTR; } else if (flags & CL_MEM_COPY_HOST_PTR) { ss << "CL_MEM_COPY_HOST_PTR"; flags &= ~CL_MEM_COPY_HOST_PTR; } else { ss << "0x" << std::hex << (int)flags; flags = 0; } if (flags != 0) { ss << '|'; } } return ss.str(); } static std::string getMapFlagsString(cl_map_flags flags) { if (flags == 0) { return "0"; } std::ostringstream ss; while 
(flags) { if (flags & CL_MAP_READ) { ss << "CL_MAP_READ"; flags &= ~CL_MAP_READ; } else if (flags & CL_MAP_WRITE) { ss << "CL_MAP_WRITE"; flags &= ~CL_MAP_WRITE; } else { ss << "0x" << std::hex << (int)flags; flags = 0; } if (flags != 0) { ss << '|'; } } return ss.str(); } static std::string getBufferCreateString( cl_buffer_create_type type, const void* info) { std::ostringstream ss; if (type == CL_BUFFER_CREATE_TYPE_REGION) { const cl_buffer_region* region = (const cl_buffer_region*)info; ss << "CL_BUFFER_CREATE_TYPE_REGION,{"; ss << region->origin << ',' << region->size << '}'; } else { ss << getHexString(type) << ',' << info; } return ss.str(); } static std::string getChannelOrderString(cl_channel_order order) { switch(order) { CASE(CL_R); CASE(CL_A); CASE(CL_RG); CASE(CL_RA); CASE(CL_RGB); CASE(CL_RGBA); CASE(CL_BGRA); CASE(CL_ARGB); CASE(CL_INTENSITY); CASE(CL_LUMINANCE); CASE(CL_Rx); CASE(CL_RGx); CASE(CL_RGBx); default: return getHexString(order); } } static std::string getChannelTypeString(cl_channel_type type) { switch(type) { CASE(CL_SNORM_INT8); CASE(CL_SNORM_INT16); CASE(CL_UNORM_INT8); CASE(CL_UNORM_INT16); CASE(CL_UNORM_SHORT_565); CASE(CL_UNORM_SHORT_555); CASE(CL_UNORM_INT_101010); CASE(CL_SIGNED_INT8); CASE(CL_SIGNED_INT16); CASE(CL_SIGNED_INT32); CASE(CL_UNSIGNED_INT8); CASE(CL_UNSIGNED_INT16); CASE(CL_UNSIGNED_INT32); CASE(CL_HALF_FLOAT); CASE(CL_FLOAT); default: return getHexString(type); } } static std::string getImageFormatsString(const cl_image_format* format, size_t num_entries) { if (format == NULL) { return "NULL"; } std::ostringstream ss; ss << '['; while (true) { ss << '{' << getChannelOrderString(format->image_channel_order) << ','; ss << getChannelTypeString(format->image_channel_data_type) << '}'; if (--num_entries == 0) { break; } ss << ','; } ss << ']'; return ss.str(); } static std::string getImageDescString(const cl_image_desc* image_desc) { if (image_desc == NULL) { return "NULL"; } std::ostringstream ss; ss << '{' << getMemObjectTypeString(image_desc->image_type) << ','; ss << image_desc->image_width << ','; ss << image_desc->image_height << ','; ss << image_desc->image_depth << ','; ss << image_desc->image_array_size << ','; ss << image_desc->image_row_pitch << ','; ss << image_desc->image_slice_pitch << ','; ss << image_desc->num_mip_levels << ','; ss << image_desc->num_samples << ','; ss << image_desc->mem_object << '}'; return ss.str(); } static std::string getAddressingModeString(cl_addressing_mode mode) { switch(mode) { CASE(CL_ADDRESS_NONE); CASE(CL_ADDRESS_CLAMP_TO_EDGE); CASE(CL_ADDRESS_CLAMP); CASE(CL_ADDRESS_REPEAT); CASE(CL_ADDRESS_MIRRORED_REPEAT); default: return getHexString(mode); } } std::string getFilterModeString(cl_filter_mode mode) { switch(mode) { CASE(CL_FILTER_NEAREST); CASE(CL_FILTER_LINEAR); default: return getHexString(mode); } } static std::string getSamplerInfoString(cl_sampler_info param_name) { switch(param_name) { CASE(CL_SAMPLER_REFERENCE_COUNT); CASE(CL_SAMPLER_CONTEXT); CASE(CL_SAMPLER_NORMALIZED_COORDS); CASE(CL_SAMPLER_ADDRESSING_MODE); CASE(CL_SAMPLER_FILTER_MODE); default: return getHexString(param_name); } } std::string getDeviceTypeString(cl_device_type type) { if (type == CL_DEVICE_TYPE_ALL) { return "CL_DEVICE_TYPE_ALL"; } std::ostringstream ss; while (type) { if (type & CL_DEVICE_TYPE_CPU) { ss << "CL_DEVICE_TYPE_CPU"; type &= ~CL_DEVICE_TYPE_CPU; } else if (type & CL_DEVICE_TYPE_GPU) { ss << "CL_DEVICE_TYPE_GPU"; type &= ~CL_DEVICE_TYPE_GPU; } else if (type & CL_DEVICE_TYPE_ACCELERATOR) { ss << 
"CL_DEVICE_TYPE_ACCELERATOR"; type &= ~CL_DEVICE_TYPE_ACCELERATOR; } else { ss << "0x" << std::hex << (int)type; type = 0; } if (type != 0) { ss << '|'; } } return ss.str(); } static std::string getPlatformInfoString(cl_platform_info param_name) { switch (param_name) { CASE(CL_PLATFORM_PROFILE); CASE(CL_PLATFORM_VERSION); CASE(CL_PLATFORM_NAME); CASE(CL_PLATFORM_VENDOR); CASE(CL_PLATFORM_EXTENSIONS); CASE(CL_PLATFORM_ICD_SUFFIX_KHR); default: return getHexString(param_name); } } static std::string getKernelArgInfoString(cl_kernel_arg_info param_name) { switch (param_name) { CASE(CL_KERNEL_ARG_ADDRESS_QUALIFIER); CASE(CL_KERNEL_ARG_ACCESS_QUALIFIER); CASE(CL_KERNEL_ARG_TYPE_NAME); CASE(CL_KERNEL_ARG_TYPE_QUALIFIER); CASE(CL_KERNEL_ARG_NAME); default: return getHexString(param_name); } } static std::string getDeviceInfoString(cl_device_info param_name) { switch (param_name) { CASE(CL_DEVICE_TYPE); CASE(CL_DEVICE_VENDOR_ID); CASE(CL_DEVICE_MAX_COMPUTE_UNITS); CASE(CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS); CASE(CL_DEVICE_MAX_WORK_GROUP_SIZE); CASE(CL_DEVICE_MAX_WORK_ITEM_SIZES); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE); CASE(CL_DEVICE_MAX_CLOCK_FREQUENCY); CASE(CL_DEVICE_ADDRESS_BITS); CASE(CL_DEVICE_MAX_READ_IMAGE_ARGS); CASE(CL_DEVICE_MAX_WRITE_IMAGE_ARGS); CASE(CL_DEVICE_MAX_MEM_ALLOC_SIZE); CASE(CL_DEVICE_IMAGE2D_MAX_WIDTH); CASE(CL_DEVICE_IMAGE2D_MAX_HEIGHT); CASE(CL_DEVICE_IMAGE3D_MAX_WIDTH); CASE(CL_DEVICE_IMAGE3D_MAX_HEIGHT); CASE(CL_DEVICE_IMAGE3D_MAX_DEPTH); CASE(CL_DEVICE_IMAGE_SUPPORT); CASE(CL_DEVICE_MAX_PARAMETER_SIZE); CASE(CL_DEVICE_MAX_SAMPLERS); CASE(CL_DEVICE_MEM_BASE_ADDR_ALIGN); CASE(CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE); CASE(CL_DEVICE_SINGLE_FP_CONFIG); CASE(CL_DEVICE_GLOBAL_MEM_CACHE_TYPE); CASE(CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE); CASE(CL_DEVICE_GLOBAL_MEM_CACHE_SIZE); CASE(CL_DEVICE_GLOBAL_MEM_SIZE); CASE(CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE); CASE(CL_DEVICE_MAX_CONSTANT_ARGS); CASE(CL_DEVICE_LOCAL_MEM_TYPE); CASE(CL_DEVICE_LOCAL_MEM_SIZE); CASE(CL_DEVICE_ERROR_CORRECTION_SUPPORT); CASE(CL_DEVICE_PROFILING_TIMER_RESOLUTION); CASE(CL_DEVICE_ENDIAN_LITTLE); CASE(CL_DEVICE_AVAILABLE); CASE(CL_DEVICE_COMPILER_AVAILABLE); CASE(CL_DEVICE_EXECUTION_CAPABILITIES); CASE(CL_DEVICE_QUEUE_PROPERTIES); CASE(CL_DEVICE_NAME); CASE(CL_DEVICE_VENDOR); CASE(CL_DRIVER_VERSION); CASE(CL_DEVICE_PROFILE); CASE(CL_DEVICE_VERSION); CASE(CL_DEVICE_EXTENSIONS); CASE(CL_DEVICE_PLATFORM); CASE(CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF); CASE(CL_DEVICE_HOST_UNIFIED_MEMORY); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_INT); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE); CASE(CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF); CASE(CL_DEVICE_OPENCL_C_VERSION); default: return getHexString(param_name); } } static std::string getContextInfoString(cl_context_info param_name) { switch (param_name) { CASE(CL_CONTEXT_REFERENCE_COUNT); CASE(CL_CONTEXT_DEVICES); CASE(CL_CONTEXT_PROPERTIES); CASE(CL_CONTEXT_NUM_DEVICES); default: return getHexString(param_name); } } static std::string getCommandQueueInfoString(cl_command_queue_info param_name) { switch (param_name) { CASE(CL_QUEUE_CONTEXT); CASE(CL_QUEUE_DEVICE); CASE(CL_QUEUE_REFERENCE_COUNT); 
CASE(CL_QUEUE_PROPERTIES); default: return getHexString(param_name); } } static std::string getProgramInfoString(cl_program_info param_name) { switch (param_name) { CASE(CL_PROGRAM_REFERENCE_COUNT); CASE(CL_PROGRAM_CONTEXT); CASE(CL_PROGRAM_NUM_DEVICES); CASE(CL_PROGRAM_DEVICES); CASE(CL_PROGRAM_SOURCE); CASE(CL_PROGRAM_BINARY_SIZES); CASE(CL_PROGRAM_BINARIES); default: return getHexString(param_name); } } static std::string getKernelInfoString(cl_kernel_info param_name) { switch (param_name) { CASE(CL_KERNEL_FUNCTION_NAME); CASE(CL_KERNEL_NUM_ARGS); CASE(CL_KERNEL_REFERENCE_COUNT); CASE(CL_KERNEL_CONTEXT); CASE(CL_KERNEL_PROGRAM); default: return getHexString(param_name); } } static std::string getKernelExecInfoString(cl_kernel_exec_info param_name) { switch (param_name) { CASE(CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM); CASE(CL_KERNEL_EXEC_INFO_SVM_PTRS); CASE(CL_KERNEL_EXEC_INFO_NEW_VCOP_AMD); CASE(CL_KERNEL_EXEC_INFO_PFPA_VCOP_AMD); default: return getHexString(param_name); } } static std::string getKernelWorkGroupInfoString(cl_kernel_work_group_info param_name) { switch (param_name) { CASE(CL_KERNEL_WORK_GROUP_SIZE); CASE(CL_KERNEL_COMPILE_WORK_GROUP_SIZE); CASE(CL_KERNEL_LOCAL_MEM_SIZE); CASE(CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE); CASE(CL_KERNEL_PRIVATE_MEM_SIZE); default: return getHexString(param_name); } } static std::string getProgramBuildInfoString(cl_program_build_info param_name) { switch (param_name) { CASE(CL_PROGRAM_BUILD_STATUS); CASE(CL_PROGRAM_BUILD_OPTIONS); CASE(CL_PROGRAM_BUILD_LOG); default: return getHexString(param_name); } } static std::string getEventInfoString(cl_event_info param_name) { switch (param_name) { CASE(CL_EVENT_COMMAND_QUEUE); CASE(CL_EVENT_COMMAND_TYPE); CASE(CL_EVENT_REFERENCE_COUNT); CASE(CL_EVENT_COMMAND_EXECUTION_STATUS); CASE(CL_EVENT_CONTEXT); default: return getHexString(param_name); } } static std::string getProfilingInfoString(cl_profiling_info param_name) { switch (param_name) { CASE(CL_PROFILING_COMMAND_QUEUED); CASE(CL_PROFILING_COMMAND_SUBMIT); CASE(CL_PROFILING_COMMAND_START); CASE(CL_PROFILING_COMMAND_END); default: return getHexString(param_name); } } static std::string getCommandExecutionStatusString(cl_int param_name) { switch (param_name) { CASE(CL_COMPLETE); CASE(CL_RUNNING); CASE(CL_SUBMITTED); CASE(CL_QUEUED); default: return getHexString(param_name); } } static std::string getStringString(const char* src) { if (src == NULL) { return "NULL"; } std::string str(src); if (str.length() > 60) { str = str.substr(0, 60).append("..."); } size_t found = 0; while (true) { found = str.find_first_of("\n\r\t\"", found); if (found == std::string::npos) { break; } char subst[] = { '\\', '\0', '\0' }; switch (str[found]) { case '\n': subst[1] = 'n'; break; case '\r': subst[1] = 'r'; break; case '\t': subst[1] = 't'; break; case '\"': subst[1] = '\"'; break; default: ++found; continue; } str.replace(found, 1, subst); found += 2; } str.insert(size_t(0), size_t(1), '\"').append(1, '\"'); return str; } static std::string getProgramSourceString( const char** strings, const size_t* lengths, cl_uint count) { if (strings == NULL) { return "NULL"; } if (count == 0) { return "[]"; } std::ostringstream ss; ss << '['; for (cl_uint i = 0; i < count; ++i) { std::string src; if (lengths != NULL && lengths[i] != 0) { src = std::string(strings[i], lengths[i]); } else { src = strings[i]; } if (i != 0) { ss << ','; } ss << getStringString(src.c_str()); } ss << ']'; return ss.str(); } static cl_icd_dispatch_table original_dispatch; static cl_int 
CL_API_CALL GetPlatformIDs( cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms) { std::ostringstream ss; Rec r(&ss); ss << "clGetPlatformIDs(" << num_entries << ','; addRec(&r); cl_int ret = original_dispatch.GetPlatformIDs( num_entries, platforms, num_platforms); delRec(&r); ss << getHandlesString(platforms, num_entries) << ','; ss << getHexString(num_platforms) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetPlatformInfo( cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetPlatformInfo(" << platform << ','; ss << getPlatformInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetPlatformInfo( platform, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetDeviceIDs( cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) { std::ostringstream ss; Rec r(&ss); ss << "clGetDeviceIDs(" << platform << ','; ss << getDeviceTypeString(device_type) << ','; ss << num_entries << ','; addRec(&r); cl_int ret = original_dispatch.GetDeviceIDs( platform, device_type, num_entries, devices, num_devices); delRec(&r); ss << getHandlesString(devices, num_entries) << ','; ss << getDecimalString(num_devices) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetDeviceInfo( cl_device_id device, cl_device_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetDeviceInfo(" << device << ','; ss << getDeviceInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetDeviceInfo( device, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_context CL_API_CALL CreateContext( const cl_context_properties * properties, cl_uint num_devices, const cl_device_id * devices, void (CL_CALLBACK * pfn_notify)(const char *, const void *, size_t, void *), void * user_data, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateContext("; ss << getContextPropertiesString(properties) << ','; ss << num_devices << ','; ss << getHandlesString(devices, num_devices) << ','; ss << pfn_notify << ',' << user_data << ','; addRec(&r); cl_context ret = original_dispatch.CreateContext( properties, num_devices, devices, pfn_notify, user_data, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_context CL_API_CALL CreateContextFromType( const cl_context_properties * properties, cl_device_type device_type, void (CL_CALLBACK * pfn_notify)(const char *, const void *, size_t, void *), void * user_data, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateContextFromType("; ss << getContextPropertiesString(properties) << ','; ss << getDeviceTypeString(device_type) << 
','; ss << pfn_notify << ',' << user_data << ','; addRec(&r); cl_context ret = original_dispatch.CreateContextFromType( properties, device_type, pfn_notify, user_data, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL RetainContext(cl_context context) { std::ostringstream ss; Rec r(&ss); ss << "clRetainContext(" << context; addRec(&r); cl_int ret = original_dispatch.RetainContext(context); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL ReleaseContext(cl_context context) { std::ostringstream ss; Rec r(&ss); ss << "clReleaseContext(" << context; addRec(&r); cl_int ret = original_dispatch.ReleaseContext(context); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetContextInfo( cl_context context, cl_context_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetContextInfo(" << context << ','; ss << getContextInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetContextInfo( context, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_command_queue CL_API_CALL CreateCommandQueue( cl_context context, cl_device_id device, cl_command_queue_properties properties, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateCommandQueue(" << context << ',' << device << ','; ss << getCommandQueuePropertyString(properties) << ','; addRec(&r); cl_command_queue ret = original_dispatch.CreateCommandQueue( context, device, properties, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_command_queue CL_API_CALL CreateCommandQueueWithProperties( cl_context context, cl_device_id device, const cl_queue_properties * properties, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateCommandQueueWithProperties(" << context << ',' << device << ','; ss << getQueuePropertyString(properties) << ','; addRec(&r); cl_command_queue ret = original_dispatch.CreateCommandQueueWithProperties( context, device, properties, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL RetainCommandQueue(cl_command_queue command_queue) { std::ostringstream ss; Rec r(&ss); ss << "clRetainCommandQueue(" << command_queue; addRec(&r); cl_int ret = original_dispatch.RetainCommandQueue(command_queue); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL ReleaseCommandQueue(cl_command_queue command_queue) { std::ostringstream ss; Rec r(&ss); ss << "clReleaseCommandQueue(" << command_queue; addRec(&r); cl_int ret = original_dispatch.ReleaseCommandQueue(command_queue); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetCommandQueueInfo( cl_command_queue command_queue, cl_command_queue_info param_name, size_t param_value_size, void * param_value, size_t * 
param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetCommandQueueInfo(" << command_queue << ','; ss << getCommandQueueInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetCommandQueueInfo( command_queue, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL SetCommandQueueProperty( cl_command_queue command_queue, cl_command_queue_properties properties, cl_bool enable, cl_command_queue_properties * old_properties) { std::ostringstream ss; Rec r(&ss); ss << "clSetCommandQueueProperty(" << command_queue << ','; ss << getCommandQueuePropertyString(properties) << ','; ss << enable << ','; addRec(&r); cl_int ret = original_dispatch.SetCommandQueueProperty( command_queue, properties, enable, old_properties); delRec(&r); ss << getHexString(old_properties) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_mem CL_API_CALL CreateBuffer( cl_context context, cl_mem_flags flags, size_t size, void * host_ptr, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateBuffer(" << context << ','; ss << getMemFlagsString(flags) << ',' << size << ',' << host_ptr << ','; addRec(&r); cl_mem ret = original_dispatch.CreateBuffer( context, flags, size, host_ptr, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_mem CL_API_CALL CreateSubBuffer( cl_mem buffer, cl_mem_flags flags, cl_buffer_create_type buffer_create_type, const void * buffer_create_info, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateSubBuffer(" << buffer << ','; ss << getMemFlagsString(flags) << ','; ss << getBufferCreateString(buffer_create_type, buffer_create_info) << ','; addRec(&r); cl_mem ret = original_dispatch.CreateSubBuffer( buffer, flags, buffer_create_type, buffer_create_info, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_mem CL_API_CALL CreateImage2D( cl_context context, cl_mem_flags flags, const cl_image_format * image_format, size_t image_width, size_t image_height, size_t image_row_pitch, void * host_ptr, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateImage2D(" << context << ','; ss << getMemFlagsString(flags) << ','; ss << getImageFormatsString(image_format, 1) << ','; ss << image_width << ',' << image_height << ',' << image_row_pitch << ','; ss << host_ptr << ','; addRec(&r); cl_mem ret = original_dispatch.CreateImage2D( context, flags, image_format, image_width, image_height, image_row_pitch, host_ptr, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_mem CL_API_CALL CreateImage3D( cl_context context, cl_mem_flags flags, const cl_image_format * image_format, size_t image_width, size_t image_height, size_t image_depth, size_t image_row_pitch, size_t image_slice_pitch, void * host_ptr, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateImage3D(" << context << ','; ss << getMemFlagsString(flags) << ','; ss << getImageFormatsString(image_format, 1) << ','; ss << image_width << ',' << image_height << ',' << image_depth << ','; ss 
<< image_row_pitch << ',' << image_slice_pitch << ','; ss << host_ptr << ','; addRec(&r); cl_mem ret = original_dispatch.CreateImage3D( context, flags, image_format, image_width, image_height, image_depth, image_row_pitch, image_slice_pitch, host_ptr, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL RetainMemObject(cl_mem memobj) { std::ostringstream ss; Rec r(&ss); ss << "clRetainMemObject(" << memobj; addRec(&r); cl_int ret = original_dispatch.RetainMemObject(memobj); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL ReleaseMemObject(cl_mem memobj) { std::ostringstream ss; Rec r(&ss); ss << "clReleaseMemObject(" << memobj; addRec(&r); cl_int ret = original_dispatch.ReleaseMemObject(memobj); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetSupportedImageFormats( cl_context context, cl_mem_flags flags, cl_mem_object_type image_type, cl_uint num_entries, cl_image_format * image_formats, cl_uint * num_image_formats) { std::ostringstream ss; Rec r(&ss); ss << "clGetSupportedImageFormats(" << context << ','; ss << getMemFlagsString(flags) << ','; ss << getMemObjectTypeString(image_type) << ','; ss << num_entries << ','; addRec(&r); cl_int ret = original_dispatch.GetSupportedImageFormats( context, flags, image_type, num_entries, image_formats, num_image_formats); delRec(&r); ss << getImageFormatsString(image_formats, num_entries) << ','; ss << getDecimalString(num_image_formats); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetMemObjectInfo( cl_mem memobj, cl_mem_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetMemObjectInfo(" << memobj << ','; ss << getMemInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetMemObjectInfo( memobj, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetImageInfo( cl_mem image, cl_image_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetImageInfo(" << image << ','; ss << getImageInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetImageInfo( image, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL SetMemObjectDestructorCallback( cl_mem memobj, void (CL_CALLBACK * pfn_notify)( cl_mem memobj, void* user_data), void * user_data) { std::ostringstream ss; Rec r(&ss); ss << "clSetMemObjectDestructorCallback(" << memobj << ','; ss << pfn_notify << ',' << user_data; addRec(&r); cl_int ret = original_dispatch.SetMemObjectDestructorCallback( memobj, pfn_notify, user_data); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_sampler CL_API_CALL 
CreateSampler( cl_context context, cl_bool normalized_coords, cl_addressing_mode addressing_mode, cl_filter_mode filter_mode, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateSampler(" << context << ','; ss << normalized_coords << ','; ss << getAddressingModeString(addressing_mode) << ','; ss << getFilterModeString(filter_mode) << ','; addRec(&r); cl_sampler ret = original_dispatch.CreateSampler( context, normalized_coords, addressing_mode, filter_mode, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL RetainSampler(cl_sampler sampler) { std::ostringstream ss; Rec r(&ss); ss << "clRetainSampler(" << sampler; addRec(&r); cl_int ret = original_dispatch.RetainSampler(sampler); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL ReleaseSampler(cl_sampler sampler) { std::ostringstream ss; Rec r(&ss); ss << "clReleaseSampler(" << sampler; addRec(&r); cl_int ret = original_dispatch.ReleaseSampler(sampler); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetSamplerInfo( cl_sampler sampler, cl_sampler_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetSamplerInfo(" << sampler << ','; ss << getSamplerInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetSamplerInfo( sampler, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_program CL_API_CALL CreateProgramWithSource( cl_context context, cl_uint count, const char ** strings, const size_t * lengths, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateProgramWithSource(" << context << ',' << count << ','; ss << getProgramSourceString(strings, lengths, count) << ','; ss << lengths << ','; addRec(&r); cl_program ret = original_dispatch.CreateProgramWithSource( context, count, strings, lengths, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_program CL_API_CALL CreateProgramWithBinary( cl_context context, cl_uint num_devices, const cl_device_id * device_list, const size_t * lengths, const unsigned char ** binaries, cl_int * binary_status, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateProgramWithBinary(" << context << ','; ss << num_devices << ',' << getHandlesString(device_list, num_devices); ss << ',' << lengths << ',' << binaries << ','; ss << binary_status << ','; addRec(&r); cl_program ret = original_dispatch.CreateProgramWithBinary( context, num_devices, device_list, lengths, binaries, binary_status, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL RetainProgram(cl_program program) { std::ostringstream ss; Rec r(&ss); ss << "clRetainProgram(" << program; addRec(&r); cl_int ret = original_dispatch.RetainProgram(program); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL 
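// A minimal sketch of how a tracing layer like this is typically installed
// (an assumption for illustration only; the real hookup code lives elsewhere):
// the loader's dispatch table is saved into original_dispatch and each slot
// is then repointed at the wrapper defined in this file, e.g.
//
//   original_dispatch = *dispatch;          // keep the real entry points
//   dispatch->BuildProgram = BuildProgram;  // install the tracing wrapper
//   dispatch->CreateKernel = CreateKernel;
//
// where `dispatch` is the hypothetical cl_icd_dispatch_table being hooked.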
ReleaseProgram(cl_program program) { std::ostringstream ss; Rec r(&ss); ss << "clReleaseProgram(" << program; addRec(&r); cl_int ret = original_dispatch.ReleaseProgram(program); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL BuildProgram( cl_program program, cl_uint num_devices, const cl_device_id * device_list, const char * options, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data) { std::ostringstream ss; Rec r(&ss); ss << "clBuildProgram(" << program << ','; ss << num_devices << ',' << getHandlesString(device_list, num_devices); ss << ',' << getStringString(options) << ','; ss << pfn_notify << ',' << user_data; addRec(&r); cl_int ret = original_dispatch.BuildProgram( program, num_devices, device_list, options, pfn_notify, user_data); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL UnloadCompiler(void) { std::ostringstream ss; Rec r(&ss); ss << "clUnloadCompiler("; addRec(&r); cl_int ret = original_dispatch.UnloadCompiler(); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetProgramInfo( cl_program program, cl_program_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetProgramInfo(" << program << ','; ss << getProgramInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetProgramInfo( program, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetProgramBuildInfo( cl_program program, cl_device_id device, cl_program_build_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetProgramBuildInfo(" << program << ',' << device << ','; ss << getProgramBuildInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetProgramBuildInfo( program, device, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_kernel CL_API_CALL CreateKernel( cl_program program, const char * kernel_name, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateKernel(" << program << ','; ss << getStringString(kernel_name) << ','; addRec(&r); cl_kernel ret = original_dispatch.CreateKernel( program, kernel_name, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL CreateKernelsInProgram( cl_program program, cl_uint num_kernels, cl_kernel * kernels, cl_uint * num_kernels_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateKernelsInProgram(" << program << ','; ss << num_kernels << ',' << kernels << ','; addRec(&r); cl_int ret = original_dispatch.CreateKernelsInProgram( program, num_kernels, kernels, num_kernels_ret); delRec(&r); ss << getDecimalString(num_kernels_ret); ss << ") = " << getErrorString(ret); ss <<
std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL RetainKernel(cl_kernel kernel) { std::ostringstream ss; Rec r(&ss); ss << "clRetainKernel(" << kernel; addRec(&r); cl_int ret = original_dispatch.RetainKernel(kernel); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL ReleaseKernel(cl_kernel kernel) { std::ostringstream ss; Rec r(&ss); ss << "clReleaseKernel(" << kernel; addRec(&r); cl_int ret = original_dispatch.ReleaseKernel(kernel); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL SetKernelArg( cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void * arg_value) { std::ostringstream ss; Rec r(&ss); ss << "clSetKernelArg(" << kernel << ','; ss << arg_index << ',' << arg_size << ','; ss << getMemoryString(arg_value, arg_size); addRec(&r); cl_int ret = original_dispatch.SetKernelArg( kernel, arg_index, arg_size, arg_value); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetKernelInfo( cl_kernel kernel, cl_kernel_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetKernelInfo(" << kernel << ','; ss << getKernelInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetKernelInfo( kernel, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetKernelWorkGroupInfo( cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetKernelWorkGroupInfo(" << kernel << ',' << device << ','; ss << getKernelWorkGroupInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetKernelWorkGroupInfo( kernel, device, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL WaitForEvents( cl_uint num_events, const cl_event * event_list) { std::ostringstream ss; Rec r(&ss); ss << "clWaitForEvents(" << num_events << ','; ss << getHandlesString(event_list, num_events); addRec(&r); cl_int ret = original_dispatch.WaitForEvents( num_events, event_list); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetEventInfo( cl_event event, cl_event_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetEventInfo(" << event << ','; ss << getEventInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetEventInfo( event, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_event 
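// Object-creating wrappers such as clCreateUserEvent below follow the same
// convention as the clCreate* wrappers above: errcode_ret is an output
// parameter, so it is decoded with getErrorString() only after the forwarded
// call, and the created handle is reported as the traced call's return
// value. Illustrative output (handle values made up):
//
//   clCreateUserEvent(0x7f00c0,CL_SUCCESS) = 0x7f00d0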
CL_API_CALL CreateUserEvent( cl_context context, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateUserEvent(" << context << ','; addRec(&r); cl_event ret = original_dispatch.CreateUserEvent( context, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL RetainEvent(cl_event event) { std::ostringstream ss; Rec r(&ss); ss << "clRetainEvent(" << event; addRec(&r); cl_int ret = original_dispatch.RetainEvent(event); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL ReleaseEvent(cl_event event) { std::ostringstream ss; Rec r(&ss); ss << "clReleaseEvent(" << event; addRec(&r); cl_int ret = original_dispatch.ReleaseEvent(event); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL SetUserEventStatus( cl_event event, cl_int execution_status) { std::ostringstream ss; Rec r(&ss); ss << "clSetUserEventStatus(" << event << ',' << execution_status; addRec(&r); cl_int ret = original_dispatch.SetUserEventStatus( event, execution_status); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL SetEventCallback( cl_event event, cl_int command_exec_callback_type, void (CL_CALLBACK * pfn_notify)(cl_event, cl_int, void *), void * user_data) { std::ostringstream ss; Rec r(&ss); ss << "clSetEventCallback(" << event << ','; ss << getCommandExecutionStatusString(command_exec_callback_type) << ','; ss << pfn_notify << ',' << user_data; addRec(&r); cl_int ret = original_dispatch.SetEventCallback( event, command_exec_callback_type, pfn_notify, user_data); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetEventProfilingInfo( cl_event event, cl_profiling_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetEventProfilingInfo(" << event << ','; ss << getProfilingInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetEventProfilingInfo( event, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL Flush(cl_command_queue command_queue) { std::ostringstream ss; Rec r(&ss); ss << "clFlush(" << command_queue; addRec(&r); cl_int ret = original_dispatch.Flush(command_queue); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL Finish(cl_command_queue command_queue) { std::ostringstream ss; Rec r(&ss); ss << "clFinish(" << command_queue; addRec(&r); cl_int ret = original_dispatch.Finish(command_queue); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueReadBuffer( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t cb, void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueReadBuffer(" << command_queue << ','; ss << buffer << ',' << 
getBoolString(blocking_read) << ','; ss << offset << ',' << cb << ',' << ptr << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueReadBuffer( command_queue, buffer, blocking_read, offset, cb, ptr, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueReadBufferRect( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, const size_t * buffer_offset, const size_t * host_offset, const size_t * region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueReadBufferRect(" << command_queue << ','; ss << buffer << ',' << getBoolString(blocking_read) << ','; ss << getNDimString(buffer_offset, 3) << ','; ss << getNDimString(host_offset, 3) << ','; ss << getNDimString(region, 3) << ','; ss << buffer_row_pitch << ',' << buffer_slice_pitch << ','; ss << host_row_pitch << ',' << host_slice_pitch << ','; ss << ptr << ',' << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueReadBufferRect( command_queue, buffer, blocking_read, buffer_offset, host_offset, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueWriteBuffer( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t cb, const void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueWriteBuffer(" << command_queue << ','; ss << buffer << ',' << getBoolString(blocking_write) << ','; ss << offset << ',' << cb << ',' << ptr << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueWriteBuffer( command_queue, buffer, blocking_write, offset, cb, ptr, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueWriteBufferRect( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, const size_t * buffer_offset, const size_t * host_offset, const size_t * region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, const void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueWriteBufferRect(" << command_queue << ','; ss << buffer << ',' << getBoolString(blocking_write) << ','; ss << getNDimString(buffer_offset, 3) << ','; ss << getNDimString(host_offset, 3) << ','; ss << getNDimString(region, 3) << ','; ss << buffer_row_pitch << ',' << buffer_slice_pitch << ','; ss << host_row_pitch << ',' << host_slice_pitch << ','; ss << ptr << ',' << num_events_in_wait_list << ','; ss << 
getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueWriteBufferRect( command_queue, buffer, blocking_write, buffer_offset, host_offset, region, buffer_row_pitch, buffer_slice_pitch, host_row_pitch, host_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueCopyBuffer( cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, size_t src_offset, size_t dst_offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueCopyBuffer(" << command_queue << ','; ss << src_buffer << ',' << dst_buffer << ','; ss << src_offset << ',' << dst_offset << ',' << cb << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueCopyBuffer( command_queue, src_buffer, dst_buffer, src_offset, dst_offset, cb, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueCopyBufferRect( cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, const size_t * src_origin, const size_t * dst_origin, const size_t * region, size_t src_row_pitch, size_t src_slice_pitch, size_t dst_row_pitch, size_t dst_slice_pitch, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueCopyBufferRect(" << command_queue << ','; ss << src_buffer << ',' << dst_buffer << ','; ss << getNDimString(src_origin, 3) << ','; ss << getNDimString(dst_origin, 3) << ','; ss << getNDimString(region, 3) << ','; ss << src_row_pitch << ',' << src_slice_pitch << ','; ss << dst_row_pitch << ',' << dst_slice_pitch << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueCopyBufferRect( command_queue, src_buffer, dst_buffer, src_origin, dst_origin, region, src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueReadImage( cl_command_queue command_queue, cl_mem image, cl_bool blocking_read, const size_t * origin, const size_t * region, size_t row_pitch, size_t slice_pitch, void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueReadImage(" << command_queue << ','; ss << image << ',' << getBoolString(blocking_read) << ','; ss << getNDimString(origin, 3) << ','; ss << getNDimString(region, 3) << ','; ss << row_pitch << ',' << slice_pitch << ','; ss << ptr << ',' << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueReadImage( command_queue, image, blocking_read, origin, region, row_pitch, slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; 
std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueWriteImage( cl_command_queue command_queue, cl_mem image, cl_bool blocking_write, const size_t * origin, const size_t * region, size_t input_row_pitch, size_t input_slice_pitch, const void * ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueWriteImage(" << command_queue << ','; ss << image << ',' << getBoolString(blocking_write) << ','; ss << getNDimString(origin, 3) << ','; ss << getNDimString(region, 3) << ','; ss << input_row_pitch << ',' << input_slice_pitch << ','; ss << ptr << ',' << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueWriteImage( command_queue, image, blocking_write, origin, region, input_row_pitch, input_slice_pitch, ptr, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueCopyImage( cl_command_queue command_queue, cl_mem src_image, cl_mem dst_image, const size_t * src_origin, const size_t * dst_origin, const size_t * region, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueCopyImage(" << command_queue << ','; ss << src_image << ',' << dst_image << ','; ss << getNDimString(src_origin, 3) << ','; ss << getNDimString(dst_origin, 3) << ','; ss << getNDimString(region, 3) << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueCopyImage( command_queue, src_image, dst_image, src_origin, dst_origin, region, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueCopyImageToBuffer( cl_command_queue command_queue, cl_mem src_image, cl_mem dst_buffer, const size_t * src_origin, const size_t * region, size_t dst_offset, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueCopyImageToBuffer(" << command_queue << ','; ss << src_image << ',' << dst_buffer << ','; ss << getNDimString(src_origin, 3) << ','; ss << getNDimString(region, 3) << ','; ss << dst_offset << ',' << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueCopyImageToBuffer( command_queue, src_image, dst_buffer, src_origin, region, dst_offset, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueCopyBufferToImage( cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_image, size_t src_offset, const size_t * dst_origin, const size_t * region, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueCopyBufferToImage(" << command_queue << ','; ss << src_buffer << ',' << dst_image << ',' << src_offset << ','; ss << getNDimString(dst_origin, 3) << ','; ss << getNDimString(region, 3) << ','; ss << 
num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueCopyBufferToImage( command_queue, src_buffer, dst_image, src_offset, dst_origin, region, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static void * CL_API_CALL EnqueueMapBuffer( cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_map, cl_map_flags map_flags, size_t offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueMapBuffer(" << command_queue << ','; ss << buffer << ',' << getBoolString(blocking_map) << ','; ss << getMapFlagsString(map_flags) << ','; ss << offset << ',' << cb << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); void* ret = original_dispatch.EnqueueMapBuffer( command_queue, buffer, blocking_map, map_flags, offset, cb, num_events_in_wait_list, event_wait_list, event, errcode_ret); delRec(&r); ss << getHexString(event) << ',' << getErrorString(errcode_ret); ss << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static void * CL_API_CALL EnqueueMapImage( cl_command_queue command_queue, cl_mem image, cl_bool blocking_map, cl_map_flags map_flags, const size_t * origin, const size_t * region, size_t * image_row_pitch, size_t * image_slice_pitch, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueMapImage(" << command_queue << ','; ss << image << ',' << getBoolString(blocking_map) << ','; ss << getMapFlagsString(map_flags) << ','; ss << getNDimString(origin, 3) << ','; ss << getNDimString(region, 3) << ','; ss << image_row_pitch << ',' << image_slice_pitch << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); void* ret = original_dispatch.EnqueueMapImage( command_queue, image, blocking_map, map_flags, origin, region, image_row_pitch, image_slice_pitch, num_events_in_wait_list, event_wait_list, event, errcode_ret); delRec(&r); ss << getHexString(event) << ',' << getErrorString(errcode_ret); ss << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueUnmapMemObject( cl_command_queue command_queue, cl_mem memobj, void * mapped_ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueUnmapMemObject(" << command_queue << ','; ss << memobj << ',' << mapped_ptr << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueUnmapMemObject( command_queue, memobj, mapped_ptr, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueNDRangeKernel( cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t * global_work_offset, const size_t * global_work_size, const size_t * local_work_size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { 
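// The launch geometry below is printed with getNDimString(..., work_dim) so
// that only the dimensions the caller actually supplied appear; NULL
// global_work_offset/local_work_size pointers are presumably rendered by
// that helper as well. An illustrative line for a 2D launch (handles and
// exact formatting are approximate):
//
//   clEnqueueNDRangeKernel(0x7f00e0,0x7f00f0,2,{0,0},{1024,768},{16,16},0,[],0x7ffd2c) = CL_SUCCESS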
std::ostringstream ss; Rec r(&ss); ss << "clEnqueueNDRangeKernel(" << command_queue << ','; ss << kernel << ',' << work_dim << ','; ss << getNDimString(global_work_offset, work_dim) << ','; ss << getNDimString(global_work_size, work_dim) << ','; ss << getNDimString(local_work_size, work_dim) << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueNDRangeKernel( command_queue, kernel, work_dim, global_work_offset, global_work_size, local_work_size, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueTask(cl_command_queue command_queue, cl_kernel kernel, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueTask(" << command_queue << ',' << kernel << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueTask( command_queue, kernel, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueNativeKernel( cl_command_queue command_queue, void (CL_CALLBACK *user_func)(void *), void * args, size_t cb_args, cl_uint num_mem_objects, const cl_mem * mem_list, const void ** args_mem_loc, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueNativeKernel(" << command_queue << ',' << user_func << ','; ss << args << ',' << cb_args << ',' << num_mem_objects << ','; ss << getHandlesString(mem_list, num_mem_objects) << ','; ss << args_mem_loc << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueNativeKernel( command_queue, user_func, args, cb_args, num_mem_objects, mem_list, args_mem_loc, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueMarker( cl_command_queue command_queue, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueMarker(" << command_queue << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueMarker(command_queue, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueWaitForEvents( cl_command_queue command_queue, cl_uint num_events, const cl_event * event_list) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueWaitForEvents(" << command_queue << ','; ss << num_events << ','; ss << getHandlesString(event_list, num_events); addRec(&r); cl_int ret = original_dispatch.EnqueueWaitForEvents( command_queue, num_events, event_list); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueBarrier(cl_command_queue command_queue) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueBarrier(" << command_queue; addRec(&r); cl_int ret = original_dispatch.EnqueueBarrier(command_queue); delRec(&r); ss << ") = " << 
getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static void * CL_API_CALL GetExtensionFunctionAddress(const char * func_name) { std::ostringstream ss; Rec r(&ss); ss << "clGetExtensionFunctionAddress(" << func_name; addRec(&r); void* ret = original_dispatch.GetExtensionFunctionAddress(func_name); delRec(&r); ss << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_mem CL_API_CALL CreateFromGLBuffer( cl_context context, cl_mem_flags flags, cl_GLuint bufobj, int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateFromGLBuffer(" << context << ','; ss << getMemFlagsString(flags) << ',' << bufobj << ','; addRec(&r); cl_mem ret = original_dispatch.CreateFromGLBuffer( context, flags, bufobj, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_mem CL_API_CALL CreateFromGLTexture2D( cl_context context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texture, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateFromGLTexture2D(" << context << ','; ss << getMemFlagsString(flags) << ',' << target << ','; ss << miplevel << ',' << texture << ','; addRec(&r); cl_mem ret = original_dispatch.CreateFromGLTexture2D( context, flags, target, miplevel, texture, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_mem CL_API_CALL CreateFromGLTexture3D( cl_context context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texture, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateFromGLTexture3D(" << context << ','; ss << getMemFlagsString(flags) << ',' << target << ','; ss << miplevel << ',' << texture << ','; addRec(&r); cl_mem ret = original_dispatch.CreateFromGLTexture3D( context, flags, target, miplevel, texture, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_mem CL_API_CALL CreateFromGLRenderbuffer( cl_context context, cl_mem_flags flags, cl_GLuint renderbuffer, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateFromGLRenderbuffer(" << context << ','; ss << getMemFlagsString(flags) << ',' << renderbuffer << ','; addRec(&r); cl_mem ret = original_dispatch.CreateFromGLRenderbuffer( context, flags, renderbuffer, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetGLObjectInfo( cl_mem memobj, cl_gl_object_type * gl_object_type, cl_GLuint * gl_object_name) { std::ostringstream ss; Rec r(&ss); ss << "clGetGLObjectInfo(" << memobj << ','; addRec(&r); cl_int ret = original_dispatch.GetGLObjectInfo( memobj, gl_object_type, gl_object_name); delRec(&r); ss << getHexString(gl_object_type) << ','; ss << getDecimalString(gl_object_name) << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetGLTextureInfo( cl_mem memobj, cl_gl_texture_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetGLTextureInfo(" << memobj << ','; ss << param_name << ',' << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetGLTextureInfo( memobj, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss <<
getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetGLContextInfoKHR( const cl_context_properties * properties, cl_gl_context_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetGLContextInfoKHR("; ss << getContextPropertiesString(properties) << ','; ss << param_name << ',' << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetGLContextInfoKHR( properties, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueAcquireGLObjects( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueAcquireGLObjects(" << command_queue << ','; ss << num_objects << ',' << getHandlesString(mem_objects, num_objects); ss << ',' << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueAcquireGLObjects( command_queue, num_objects, mem_objects, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueReleaseGLObjects( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueReleaseGLObjects(" << command_queue << ','; ss << num_objects << ',' << getHandlesString(mem_objects, num_objects); ss << ',' << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueReleaseGLObjects( command_queue, num_objects, mem_objects, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL RetainDevice( cl_device_id device) { std::ostringstream ss; Rec r(&ss); ss << "clRetainDevice(" << device; addRec(&r); cl_int ret = original_dispatch.RetainDevice( device); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL ReleaseDevice( cl_device_id device) { std::ostringstream ss; Rec r(&ss); ss << "clReleaseDevice(" << device; addRec(&r); cl_int ret = original_dispatch.ReleaseDevice( device); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_mem CL_API_CALL CreateImage( cl_context context, cl_mem_flags flags, const cl_image_format * image_format, const cl_image_desc * image_desc, void * host_ptr, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateImage(" << context << ','; ss << getMemFlagsString(flags) << ','; ss << getImageFormatsString(image_format, 1) << ','; ss << getImageDescString(image_desc) << ','; ss << host_ptr << ','; addRec(&r); cl_mem ret = original_dispatch.CreateImage( context, flags,
image_format, image_desc, host_ptr, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_program CL_API_CALL CreateProgramWithBuiltInKernels( cl_context context, cl_uint num_devices, const cl_device_id * device_list, const char * kernel_names, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateProgramWithBuiltInKernels(" << context << ','; ss << num_devices << ',' << getHandlesString(device_list, num_devices); ss << ',' << getStringString(kernel_names) << ','; addRec(&r); cl_program ret = original_dispatch.CreateProgramWithBuiltInKernels( context, num_devices, device_list, kernel_names, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL CompileProgram( cl_program program, cl_uint num_devices, const cl_device_id * device_list, const char * options, cl_uint num_input_headers, const cl_program * input_headers, const char ** header_include_names, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data) { std::ostringstream ss; Rec r(&ss); ss << "clCompileProgram(" << program << ','; ss << num_devices << ',' << getHandlesString(device_list, num_devices) << ','; ss << getStringString(options) << ','; ss << num_input_headers << ',' << getHandlesString(input_headers, num_input_headers) << ','; ss << header_include_names << ','; ss << pfn_notify << ',' << user_data; addRec(&r); cl_int ret = original_dispatch.CompileProgram( program, num_devices, device_list, options, num_input_headers, input_headers, header_include_names, pfn_notify, user_data); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_program CL_API_CALL LinkProgram( cl_context context, cl_uint num_devices, const cl_device_id * device_list, const char * options, cl_uint num_input_programs, const cl_program * input_programs, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clLinkProgram(" << context << ','; ss << num_devices << ',' << getHandlesString(device_list, num_devices) << ','; ss << getStringString(options) << ','; ss << num_input_programs << ',' << getHandlesString(input_programs, num_input_programs) << ','; ss << pfn_notify << ',' << user_data << ','; addRec(&r); cl_program ret = original_dispatch.LinkProgram( context, num_devices, device_list, options, num_input_programs, input_programs, pfn_notify, user_data, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL UnloadPlatformCompiler( cl_platform_id platform) { std::ostringstream ss; Rec r(&ss); ss << "clUnloadPlatformCompiler(" << platform; addRec(&r); cl_int ret = original_dispatch.UnloadPlatformCompiler( platform); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetKernelArgInfo( cl_kernel kernel, cl_uint arg_indx, cl_kernel_arg_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetKernelArgInfo(" << kernel << ','; ss << arg_indx << ','; ss << getKernelArgInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetKernelArgInfo( kernel, arg_indx, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss
<< getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueFillBuffer( cl_command_queue command_queue, cl_mem buffer, const void * pattern, size_t pattern_size, size_t offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueFillBuffer(" << command_queue << ','; ss << buffer << ',' << pattern << ',' << pattern_size << ','; ss << offset << ',' << cb << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueFillBuffer( command_queue, buffer, pattern, pattern_size, offset, cb, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueFillImage( cl_command_queue command_queue, cl_mem image, const void * fill_color, const size_t origin[3], const size_t region[3], cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueFillImage(" << command_queue << ','; ss << image << ',' << fill_color << ','; ss << getNDimString(origin, 3) << ','; ss << getNDimString(region, 3) << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueFillImage( command_queue, image, fill_color, origin, region, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueMigrateMemObjects( cl_command_queue command_queue, cl_uint num_mem_objects, const cl_mem * mem_objects, cl_mem_migration_flags flags, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueMigrateMemObjects(" << command_queue << ','; ss << num_mem_objects << ','; ss << getHandlesString(mem_objects, num_mem_objects) << ',' << flags << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueMigrateMemObjects( command_queue, num_mem_objects, mem_objects, flags, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueMarkerWithWaitList( cl_command_queue command_queue, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueMarkerWithWaitList(" << command_queue << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueMarkerWithWaitList( command_queue, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueBarrierWithWaitList( cl_command_queue command_queue, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) {
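// clEnqueueMarkerWithWaitList above and clEnqueueBarrierWithWaitList below
// are the OpenCL 1.2 replacements for the 1.0-era clEnqueueMarker,
// clEnqueueWaitForEvents, and clEnqueueBarrier traced earlier in this file;
// both flavors are wrapped because an application may reach the driver
// through either set of dispatch slots.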
std::ostringstream ss; Rec r(&ss); ss << "clEnqueueBarrierWithWaitList(" << command_queue << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueBarrierWithWaitList( command_queue, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static void * CL_API_CALL GetExtensionFunctionAddressForPlatform( cl_platform_id platform, const char * function_name) { std::ostringstream ss; Rec r(&ss); ss << "clGetExtensionFunctionAddressForPlatform(" << platform << ','; ss << function_name; addRec(&r); void* ret = original_dispatch.GetExtensionFunctionAddressForPlatform( platform, function_name); delRec(&r); ss << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_mem CL_API_CALL CreateFromGLTexture( cl_context context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texture, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreateFromGLTexture(" << context << ','; ss << getMemFlagsString(flags) << ',' << target << ','; ss << miplevel << ',' << texture << ','; addRec(&r); cl_mem ret = original_dispatch.CreateFromGLTexture( context, flags, target, miplevel, texture, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_mem CL_API_CALL CreatePipe( cl_context context, cl_mem_flags flags, cl_uint pipePacketSize, cl_uint pipeMaxPackets, const cl_pipe_properties * props, cl_int * errcode_ret) { std::ostringstream ss; Rec r(&ss); ss << "clCreatePipe(" << context << ','; ss << getMemFlagsString(flags) << ',' << pipePacketSize << ',' << pipeMaxPackets << ',' << props << ','; addRec(&r); cl_mem ret = original_dispatch.CreatePipe( context, flags, pipePacketSize, pipeMaxPackets, props, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL GetPipeInfo( cl_mem memobj, cl_pipe_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { std::ostringstream ss; Rec r(&ss); ss << "clGetPipeInfo(" << memobj << ','; ss << getMemInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.GetPipeInfo( memobj, param_name, param_value_size, param_value, param_value_size_ret); delRec(&r); ss << getHexString(param_value) << ','; ss << getHexString(param_value_size_ret) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static void* CL_API_CALL SVMAlloc( cl_context context, cl_svm_mem_flags flags, size_t size, cl_uint alignment) { std::ostringstream ss; Rec r(&ss); ss << "clSVMAlloc(" << context << ','; ss << getHexString(flags) << ','; ss << getHexString(size) << ','; ss << getHexString(alignment) << ") = "; addRec(&r); void* ret = original_dispatch.SVMAlloc(context, flags, size, alignment); delRec(&r); ss << ret << std::endl; std::cerr << ss.str(); return ret; } static void CL_API_CALL SVMFree(cl_context context, void* svm_pointer) { std::ostringstream ss; Rec r(&ss); ss << "clSVMFree(" << context << ','; ss << svm_pointer << ')'; addRec(&r); original_dispatch.SVMFree(context, svm_pointer); delRec(&r); ss << std::endl; std::cerr << ss.str(); } static cl_int CL_API_CALL EnqueueSVMFree( cl_command_queue
command_queue, cl_uint num_svm_pointers, void * svm_pointers[], void (CL_CALLBACK * pfn_free_func)(cl_command_queue /*queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers */, void * /* user_data */), void * user_data, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueSVMFree(" << command_queue << ','; ss << num_svm_pointers << ','; ss << '['; for (cl_uint i = 0; i < num_svm_pointers; ++i) { ss << svm_pointers[i] << ','; } ss << "],"; ss << pfn_free_func << ','; ss << user_data << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueSVMFree( command_queue, num_svm_pointers, svm_pointers, pfn_free_func, user_data, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueSVMMemcpy( cl_command_queue command_queue, cl_bool blocking_copy, void * dst_ptr, const void * src_ptr, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueSVMMemcpy(" << command_queue << ','; ss << getBoolString(blocking_copy) << ','; ss << dst_ptr << ','; ss << src_ptr << ',' << getHexString(size) << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueSVMMemcpy( command_queue, blocking_copy, dst_ptr, src_ptr, size, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueSVMMemFill( cl_command_queue command_queue, void * svm_ptr, const void * pattern, size_t pattern_size, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_2_0 { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueSVMMemFill(" << command_queue << ','; ss << svm_ptr << ','; ss << pattern << ','; ss << getHexString(pattern_size) << ',' << getHexString(size) << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueSVMMemFill( command_queue, svm_ptr, pattern, pattern_size, size, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL EnqueueSVMMap( cl_command_queue command_queue, cl_bool blocking_map, cl_map_flags flags, void * svm_ptr, size_t size, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueSVMMap(" << command_queue << ','; ss << getBoolString(blocking_map) << ','; ss << getMapFlagsString(flags) << ','; ss << svm_ptr << ',' << getHexString(size) << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueSVMMap( command_queue, blocking_map, flags, svm_ptr, size, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr
<< ss.str(); return ret; } static cl_int CL_API_CALL EnqueueSVMUnmap( cl_command_queue command_queue, void * svm_ptr, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) { std::ostringstream ss; Rec r(&ss); ss << "clEnqueueSVMUnmap(" << command_queue << ','; ss << svm_ptr << ','; ss << num_events_in_wait_list << ','; ss << getHandlesString(event_wait_list, num_events_in_wait_list) << ','; addRec(&r); cl_int ret = original_dispatch.EnqueueSVMUnmap( command_queue, svm_ptr, num_events_in_wait_list, event_wait_list, event); delRec(&r); ss << getHexString(event); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_sampler CL_API_CALL CreateSamplerWithProperties( cl_context context, const cl_sampler_properties * sampler_properties, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_2_0 { std::ostringstream ss; Rec r(&ss); ss << "clCreateSamplerWithProperties(" << context << ','; ss << "["; const struct SamplerProperty { cl_sampler_properties name; union { cl_sampler_properties raw; cl_bool normalizedCoords; cl_addressing_mode addressingMode; cl_filter_mode filterMode; cl_float lod; } value; } *p = reinterpret_cast<const SamplerProperty*>(sampler_properties); if (p != NULL) while (p->name != 0) { ss << getSamplerInfoString((cl_sampler_info)p->name) << ':'; switch (p->name) { case CL_SAMPLER_NORMALIZED_COORDS: ss << getBoolString(p->value.normalizedCoords) << ','; break; case CL_SAMPLER_ADDRESSING_MODE: ss << getAddressingModeString(p->value.addressingMode) << ','; break; case CL_SAMPLER_FILTER_MODE: ss << getFilterModeString(p->value.filterMode) << ','; break; case CL_SAMPLER_MIP_FILTER_MODE: ss << getFilterModeString(p->value.filterMode) << ','; break; case CL_SAMPLER_LOD_MIN: ss << p->value.lod << ','; break; case CL_SAMPLER_LOD_MAX: ss << p->value.lod << ','; break; default: break; } ++p; } addRec(&r); cl_sampler ret = original_dispatch.CreateSamplerWithProperties( context, sampler_properties, errcode_ret); delRec(&r); ss << getErrorString(errcode_ret) << ") = " << ret; ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL SetKernelArgSVMPointer(cl_kernel kernel, cl_uint arg_index, const void *arg_value) { std::ostringstream ss; Rec r(&ss); ss << "clSetKernelArgSVMPointer(" << kernel << ','; ss << arg_index << ','; ss << arg_value; addRec(&r); cl_int ret = original_dispatch.SetKernelArgSVMPointer( kernel, arg_index, arg_value); delRec(&r); ss << ") = " << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_int CL_API_CALL SetKernelExecInfo( cl_kernel kernel, cl_kernel_exec_info param_name, size_t param_value_size, const void* param_value) { std::ostringstream ss; Rec r(&ss); ss << "clSetKernelExecInfo(" << kernel << ','; ss << getKernelExecInfoString(param_name) << ','; ss << param_value_size << ','; addRec(&r); cl_int ret = original_dispatch.SetKernelExecInfo( kernel, param_name, param_value_size, param_value); delRec(&r); ss << getHexString(const_cast<void*>(param_value)) << ") = "; ss << getErrorString(ret); ss << std::endl; std::cerr << ss.str(); return ret; } static cl_icd_dispatch_table modified_dispatch = { /* OpenCL 1.0 */ GetPlatformIDs, GetPlatformInfo, GetDeviceIDs, GetDeviceInfo, CreateContext, CreateContextFromType, RetainContext, ReleaseContext, GetContextInfo, CreateCommandQueue, RetainCommandQueue, ReleaseCommandQueue, GetCommandQueueInfo, SetCommandQueueProperty, CreateBuffer, CreateImage2D, CreateImage3D, RetainMemObject, ReleaseMemObject, GetSupportedImageFormats,
GetMemObjectInfo, GetImageInfo, CreateSampler, RetainSampler, ReleaseSampler, GetSamplerInfo, CreateProgramWithSource, CreateProgramWithBinary, RetainProgram, ReleaseProgram, BuildProgram, UnloadCompiler, GetProgramInfo, GetProgramBuildInfo, CreateKernel, CreateKernelsInProgram, RetainKernel, ReleaseKernel, SetKernelArg, GetKernelInfo, GetKernelWorkGroupInfo, WaitForEvents, GetEventInfo, RetainEvent, ReleaseEvent, GetEventProfilingInfo, Flush, Finish, EnqueueReadBuffer, EnqueueWriteBuffer, EnqueueCopyBuffer, EnqueueReadImage, EnqueueWriteImage, EnqueueCopyImage, EnqueueCopyImageToBuffer, EnqueueCopyBufferToImage, EnqueueMapBuffer, EnqueueMapImage, EnqueueUnmapMemObject, EnqueueNDRangeKernel, EnqueueTask, EnqueueNativeKernel, EnqueueMarker, EnqueueWaitForEvents, EnqueueBarrier, GetExtensionFunctionAddress, CreateFromGLBuffer, CreateFromGLTexture2D, CreateFromGLTexture3D, CreateFromGLRenderbuffer, GetGLObjectInfo, GetGLTextureInfo, EnqueueAcquireGLObjects, EnqueueReleaseGLObjects, GetGLContextInfoKHR, { NULL, NULL, NULL, NULL, NULL, NULL }, /* _reservedForD3D10KHR[6] */ /* OpenCL 1.1 */ SetEventCallback, CreateSubBuffer, SetMemObjectDestructorCallback, CreateUserEvent, SetUserEventStatus, EnqueueReadBufferRect, EnqueueWriteBufferRect, EnqueueCopyBufferRect, { NULL, NULL, NULL }, /* _reservedForDeviceFissionEXT[3] */ NULL, /* CreateEventFromGLsyncKHR */ /* OpenCL 1.2 */ NULL, /* CreateSubDevices */ RetainDevice, ReleaseDevice, CreateImage, CreateProgramWithBuiltInKernels, CompileProgram, LinkProgram, UnloadPlatformCompiler, GetKernelArgInfo, EnqueueFillBuffer, EnqueueFillImage, EnqueueMigrateMemObjects, EnqueueMarkerWithWaitList, EnqueueBarrierWithWaitList, GetExtensionFunctionAddressForPlatform, CreateFromGLTexture, { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL }, /* _reservedD3DExtensions[10] */ { NULL, NULL, NULL, NULL }, /* _reservedEGLExtensions[4] */ /* OpenCL 2.0 */ CreateCommandQueueWithProperties, CreatePipe, GetPipeInfo, SVMAlloc, SVMFree, EnqueueSVMFree, EnqueueSVMMemcpy, EnqueueSVMMemFill, EnqueueSVMMap, EnqueueSVMUnmap, CreateSamplerWithProperties, SetKernelArgSVMPointer, SetKernelExecInfo, NULL, /* clGetKernelSubGroupInfoKHR */ /* OpenCL 2.1 */ NULL, /* clCloneKernel */ NULL, /* clCreateProgramWithILKHR */ NULL, /* clEnqueueSVMMigrateMem */ NULL, /* clGetDeviceAndHostTimer */ NULL, /* clGetHostTimer */ NULL, /* clGetKernelSubGroupInfo */ NULL, /* clSetDefaultDeviceCommandQueue */ /* OpenCL 2.2 */ NULL, /* clSetProgramReleaseCallback */ NULL, /* clSetProgramSpecializationConstant */ }; static void cleanup(void) { std::cerr.rdbuf(cerrStreamBufSave); } #define SET_ORIGINAL_EXTENSION(DISPATCH) \ memcpy(modified_dispatch._reservedFor##DISPATCH, \ original_dispatch._reservedFor##DISPATCH, \ sizeof(original_dispatch._reservedFor##DISPATCH)); #define SET_ORIGINAL(DISPATCH) \ modified_dispatch.DISPATCH = original_dispatch.DISPATCH; int32_t CL_CALLBACK vdiAgent_OnLoad(vdi_agent * agent) { char *clTraceLogEnv; int32_t err = agent->GetICDDispatchTable( agent, &original_dispatch, sizeof(original_dispatch)); if (err != CL_SUCCESS) { return err; } clTraceLogEnv = getenv("CL_TRACE_OUTPUT"); if(clTraceLogEnv!=NULL) { std::string clTraceLogStr = clTraceLogEnv; const std::size_t pidPos = clTraceLogStr.find("%pid%"); if (pidPos != std::string::npos) { #if defined(_WIN32) const std::int32_t pid = _getpid(); #else const std::int32_t pid = getpid(); #endif clTraceLogStr.replace(pidPos, 5, std::to_string(pid)); } clTraceLog.open(clTraceLogStr); cerrStreamBufSave = 
std::cerr.rdbuf(clTraceLog.rdbuf()); std::atexit(cleanup); } cl_platform_id platform; err = agent->GetPlatform(agent, &platform); if (err != CL_SUCCESS) { return err; } char version[256]; err = original_dispatch.GetPlatformInfo( platform, CL_PLATFORM_VERSION, sizeof(version), version, NULL); if (err != CL_SUCCESS) { return err; } std::cerr << "!!!" << std::endl << "!!! API trace for \"" << version << "\"" << std::endl << "!!!" << std::endl; SET_ORIGINAL_EXTENSION(D3D10KHR); SET_ORIGINAL_EXTENSION(DeviceFissionEXT); SET_ORIGINAL(CreateEventFromGLsyncKHR); SET_ORIGINAL(CreateSubDevices); SET_ORIGINAL_EXTENSION(D3DExtensions); SET_ORIGINAL_EXTENSION(EGLExtensions); SET_ORIGINAL(GetKernelSubGroupInfoKHR); SET_ORIGINAL(CloneKernel); SET_ORIGINAL(CreateProgramWithILKHR); SET_ORIGINAL(EnqueueSVMMigrateMem); SET_ORIGINAL(GetDeviceAndHostTimer); SET_ORIGINAL(GetHostTimer); SET_ORIGINAL(GetKernelSubGroupInfo); SET_ORIGINAL(SetDefaultDeviceCommandQueue); SET_ORIGINAL(SetProgramReleaseCallback); SET_ORIGINAL(SetProgramSpecializationConstant); err = agent->SetICDDispatchTable( agent, &modified_dispatch, sizeof(modified_dispatch)); if (err != CL_SUCCESS) { return err; } initRecs(); err = startChecker(); return err; } void CL_CALLBACK vdiAgent_OnUnload(vdi_agent * agent) { clTraceLog.close(); }
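// Usage sketch (illustrative; the log path and application name below are
// placeholders, not part of this file): vdiAgent_OnLoad above redirects the
// trace from std::cerr to the file named by the CL_TRACE_OUTPUT environment
// variable, replacing a literal "%pid%" in its value with the process id, so
// a per-process trace can be captured like this:
//
//   CL_TRACE_OUTPUT=/tmp/cltrace.%pid%.log ./my_opencl_app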
clr-rocm-5.7.1/opencl/tools/cltrace/cltrace.def000066400000000000000000000000521450307266000213740ustar00rootroot00000000000000EXPORTS vdiAgent_OnLoad vdiAgent_OnUnload clr-rocm-5.7.1/opencl/tools/cltrace/cltrace.map000066400000000000000000000001231450307266000214120ustar00rootroot00000000000000CLTRACE_1.0 { global: vdiAgent_OnLoad; vdiAgent_OnUnload; local: *; }; clr-rocm-5.7.1/rocclr/000077500000000000000000000000001450307266000145315ustar00rootroot00000000000000clr-rocm-5.7.1/rocclr/.clang-format000066400000000000000000000004031450307266000171010ustar00rootroot00000000000000Language: Cpp BasedOnStyle: Google AlignEscapedNewlinesLeft: false AlignOperands: false ColumnLimit: 100 AlwaysBreakTemplateDeclarations: false DerivePointerAlignment: false IndentFunctionDeclarationAfterType: false MaxEmptyLinesToKeep: 2 SortIncludes: false clr-rocm-5.7.1/rocclr/.gitattributes000066400000000000000000000012041450307266000174210ustar00rootroot00000000000000# Set the default behavior, in case people don't have core.autocrlf set. * text=auto # Explicitly declare text files you want to always be normalized and converted # to have LF line endings on checkout. *.c text eol=lf *.cpp text eol=lf *.cc text eol=lf *.h text eol=lf *.hpp text eol=lf *.txt text eol=lf *.asm text eol=lf # Define files to support auto-remove trailing white space # Need to run the command below, before add modified file(s) to the staging area # git config filter.trimspace.clean 'sed -e "s/[[:space:]]*$//g"' *.cpp filter=trimspace *.c filter=trimspace *.h filter=trimspace *.hpp filter=trimspace *.md filter=trimspace clr-rocm-5.7.1/rocclr/.gitignore000066400000000000000000000001171450307266000165200ustar00rootroot00000000000000.* !.gitignore *.d *.o *.obj *.gch *.pch *.so *.dll *.a *.lib *.exe *.out buildclr-rocm-5.7.1/rocclr/CMakeLists.txt000066400000000000000000000026501450307266000172740ustar00rootroot00000000000000# Copyright (c) 2017 - 2021 Advanced Micro Devices, Inc. All Rights Reserved.
# # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. cmake_minimum_required(VERSION 3.5) project(ROCclr) if (${CMAKE_SOURCE_DIR} STREQUAL ${CMAKE_CURRENT_SOURCE_DIR}) message(AUTHOR_WARNING "ROCclr is being built as a standalone project. This isn't supported anymore.") endif() list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake) include(ROCclr) clr-rocm-5.7.1/rocclr/LICENSE.txt000066400000000000000000000020701450307266000163530ustar00rootroot00000000000000Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. clr-rocm-5.7.1/rocclr/README.md000066400000000000000000000077421450307266000160170ustar00rootroot00000000000000# ROCclr - Radeon Open Compute Common Language Runtime ROCclr is a virtual device interface through which compute runtimes interact with different backends such as ROCr or PAL. This abstraction allows runtimes to work on Windows as well as on Linux without much effort. # DISCLAIMER The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like.
Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. © 2021 Advanced Micro Devices, Inc. All Rights Reserved. ## Repository branches The repository maintains several branches. The branches that are of importance are: - Main branch: This is the stable branch. It is up to date with the latest release branch; for example, if the latest ROCM release is rocm-4.1.x, the main branch will be based on this release. - Develop branch: This is the default branch, on which the new features are still under development and visible. While this may be of interest to many, it should be noted that this branch and the features under development might not be stable. - Release branches. These are branches corresponding to each ROCM release, listed with release tags, such as rocm-4.0.x, rocm-4.1.x, etc. ## Building ### Prerequisites - Install mesa-common-dev - Either build or install [COMGR](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport), [CLANG](https://github.com/RadeonOpenCompute/llvm-project) and [Device Library](https://github.com/RadeonOpenCompute/ROCm-Device-Libs) ### Getting the source code ```bash git clone -b main https://github.com/ROCm-Developer-Tools/ROCclr.git git clone -b main https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime.git ``` ### Set the environment variables ```bash export ROCclr_DIR="$(readlink -f ROCclr)" export OPENCL_DIR="$(readlink -f ROCm-OpenCL-Runtime)" ``` ### Build ROCclr Here are the commands to build ROCclr: ```bash cd "$ROCclr_DIR" mkdir -p build; cd build cmake -DOPENCL_DIR="$OPENCL_DIR" -DCMAKE_INSTALL_PREFIX=/opt/rocm/rocclr .. make -j$(nproc) sudo make install ``` ### Optional steps to build HIP runtime Enter the directory where you cloned ROCclr and OpenCL, then run the following commands: ```bash git clone -b main https://github.com/ROCm-Developer-Tools/HIP.git export HIP_DIR="$(readlink -f HIP)" cd "$HIP_DIR" mkdir -p build; cd build cmake -DCMAKE_PREFIX_PATH="$ROCclr_DIR/build;/opt/rocm/" .. make -j$(nproc) ``` ### Release build For a release build, add "-DCMAKE_BUILD_TYPE=Release" to the cmake command line.
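For example, to reconfigure the build tree above for a release build and rebuild:

```bash
cd "$ROCclr_DIR/build"
cmake -DOPENCL_DIR="$OPENCL_DIR" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/rocm/rocclr ..
make -j$(nproc)
```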
clr-rocm-5.7.1/rocclr/cmake/000077500000000000000000000000001450307266000156115ustar00rootroot00000000000000clr-rocm-5.7.1/rocclr/cmake/FindAMD_HSA_LOADER.cmake000066400000000000000000000047011450307266000215400ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. if(AMD_HSA_LOADER_FOUND) return() endif() find_path(AMD_LIBELF_INCLUDE_DIR libelf.h HINTS ${AMD_LIBELF_PATH} PATHS ${CMAKE_SOURCE_DIR}/hsail-compiler/lib/loaders/elf/utils/libelf ${CMAKE_SOURCE_DIR}/../hsail-compiler/lib/loaders/elf/utils/libelf ${CMAKE_SOURCE_DIR}/../../hsail-compiler/lib/loaders/elf/utils/libelf NO_DEFAULT_PATH) find_path(AMD_HSAIL_INCLUDE_DIR hsa.h HINTS ${AMD_SC_PATH} PATHS ${CMAKE_SOURCE_DIR}/sc ${CMAKE_SOURCE_DIR}/../sc ${CMAKE_SOURCE_DIR}/../../sc PATH_SUFFIXES HSAIL/include) include(FindPackageHandleStandardArgs) find_package_handle_standard_args(AMD_HSA_LOADER "\nHSA Loader not found" AMD_LIBELF_INCLUDE_DIR AMD_HSAIL_INCLUDE_DIR) mark_as_advanced(AMD_LIBELF_INCLUDE_DIR AMD_HSAIL_INCLUDE_DIR) set(USE_AMD_LIBELF "yes" CACHE STRING "" FORCE) # TODO compiler team requested supporting sp3 disassembly set(NO_SI_SP3 "yes" CACHE STRING "" FORCE) set(HSAIL_COMPILER_SOURCE_DIR "${AMD_LIBELF_INCLUDE_DIR}/../../../../..") set(HSAIL_ELFTOOLCHAIN_DIR ${HSAIL_COMPILER_SOURCE_DIR}/lib/loaders/elf/utils) add_subdirectory("${AMD_LIBELF_INCLUDE_DIR}" ${CMAKE_CURRENT_BINARY_DIR}/libelf) add_subdirectory("${AMD_HSAIL_INCLUDE_DIR}/../ext/libamdhsacode" ${CMAKE_CURRENT_BINARY_DIR}/libamdhsacode) add_subdirectory("${AMD_HSAIL_INCLUDE_DIR}/../ext/loader" ${CMAKE_CURRENT_BINARY_DIR}/loader) clr-rocm-5.7.1/rocclr/cmake/FindAMD_OPENCL.cmake000066400000000000000000000052111450307266000210540ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. if(AMD_OPENCL_FOUND) return() endif() find_path(AMD_OPENCL_INCLUDE_DIR cl.h HINTS ${AMD_OPENCL_PATH} PATHS # gerrit repo name ${CMAKE_SOURCE_DIR}/opencl ${CMAKE_SOURCE_DIR}/../opencl ${CMAKE_SOURCE_DIR}/../../opencl # github repo name ${CMAKE_SOURCE_DIR}/ROCm-OpenCL-Runtime ${CMAKE_SOURCE_DIR}/../ROCm-OpenCL-Runtime ${CMAKE_SOURCE_DIR}/../../ROCm-OpenCL-Runtime # jenkins repo name ${CMAKE_SOURCE_DIR}/opencl-on-vdi ${CMAKE_SOURCE_DIR}/../opencl-on-vdi ${CMAKE_SOURCE_DIR}/../../opencl-on-vdi ${CMAKE_SOURCE_DIR}/opencl-on-rocclr ${CMAKE_SOURCE_DIR}/../opencl-on-rocclr ${CMAKE_SOURCE_DIR}/../../opencl-on-rocclr PATH_SUFFIXES khronos/headers/opencl2.2/CL NO_DEFAULT_PATH) include(FindPackageHandleStandardArgs) find_package_handle_standard_args(AMD_OPENCL "\nAMD OpenCL not found" AMD_OPENCL_INCLUDE_DIR) mark_as_advanced(AMD_OPENCL_INCLUDE_DIR) set(AMD_OPENCL_DEFS -DHAVE_CL2_HPP -DOPENCL_MAJOR=2 -DOPENCL_MINOR=1 -DOPENCL_C_MAJOR=2 -DOPENCL_C_MINOR=0 -DCL_TARGET_OPENCL_VERSION=220 -DCL_USE_DEPRECATED_OPENCL_1_0_APIS -DCL_USE_DEPRECATED_OPENCL_1_1_APIS -DCL_USE_DEPRECATED_OPENCL_1_2_APIS -DCL_USE_DEPRECATED_OPENCL_2_0_APIS) mark_as_advanced(AMD_OPENCL_DEFS) set(AMD_OPENCL_INCLUDE_DIRS ${AMD_OPENCL_INCLUDE_DIR} ${AMD_OPENCL_INCLUDE_DIR}/.. ${AMD_OPENCL_INCLUDE_DIR}/../.. ${AMD_OPENCL_INCLUDE_DIR}/../../.. ${AMD_OPENCL_INCLUDE_DIR}/../../../.. ${AMD_OPENCL_INCLUDE_DIR}/../../../../amdocl) mark_as_advanced(AMD_OPENCL_INCLUDE_DIRS) clr-rocm-5.7.1/rocclr/cmake/FindAMD_PAL.cmake000066400000000000000000000045701450307266000205170ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. 
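# Illustrative usage sketch (not part of the original module; the paths are
# placeholders): the find_path() calls below take their HINTS from cache
# variables, so a non-default source layout can be supplied explicitly at
# configure time:
#   cmake -DAMD_PAL_PATH=/path/to/pal -DAMD_DRIVERS_PATH=/path/to/drivers \
#         -DAMD_SC_PATH=/path/to/sc ..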
if(AMD_PAL_FOUND) return() endif() find_path(AMD_ASIC_REG_INCLUDE_DIR nv_id.h HINTS ${AMD_DRIVERS_PATH} PATHS # p4 repo layout ${CMAKE_SOURCE_DIR}/drivers ${CMAKE_SOURCE_DIR}/../drivers ${CMAKE_SOURCE_DIR}/../../drivers # github ent repo layout ${CMAKE_SOURCE_DIR}/drivers/drivers ${CMAKE_SOURCE_DIR}/../drivers/drivers ${CMAKE_SOURCE_DIR}/../../drivers/drivers PATH_SUFFIXES inc/asic_reg) find_path(AMD_HSAIL_INCLUDE_DIR hsa.h HINTS ${AMD_SC_PATH} PATHS ${CMAKE_SOURCE_DIR}/sc ${CMAKE_SOURCE_DIR}/../sc ${CMAKE_SOURCE_DIR}/../../sc PATH_SUFFIXES HSAIL/include) find_path(AMD_PAL_INCLUDE_DIR pal.h HINTS ${AMD_PAL_PATH} PATHS ${CMAKE_SOURCE_DIR}/pal ${CMAKE_SOURCE_DIR}/../pal ${CMAKE_SOURCE_DIR}/../../pal PATH_SUFFIXES inc/core) include(FindPackageHandleStandardArgs) find_package_handle_standard_args(AMD_PAL "\nPAL not found" AMD_ASIC_REG_INCLUDE_DIR AMD_HSAIL_INCLUDE_DIR AMD_PAL_INCLUDE_DIR) mark_as_advanced(AMD_ASIC_REG_INCLUDE_DIR AMD_HSAIL_INCLUDE_DIR AMD_PAL_INCLUDE_DIR) set(GLOBAL_ROOT_SRC_DIR "${AMD_ASIC_REG_INCLUDE_DIR}/../../..") set(PAL_SC_PATH "${AMD_HSAIL_INCLUDE_DIR}/../..") add_subdirectory("${AMD_PAL_INCLUDE_DIR}/../.." ${CMAKE_CURRENT_BINARY_DIR}/pal) clr-rocm-5.7.1/rocclr/cmake/FindNUMA.cmake000066400000000000000000000025511450307266000201570ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. find_path(NUMA_INCLUDE_DIR numa.h) find_library(NUMA_LIBRARIES numa) include(FindPackageHandleStandardArgs) find_package_handle_standard_args(NUMA DEFAULT_MSG NUMA_LIBRARIES NUMA_INCLUDE_DIR) mark_as_advanced(NUMA_LIBRARIES NUMA_INCLUDE_DIR) clr-rocm-5.7.1/rocclr/cmake/ROCclr.cmake000066400000000000000000000116741450307266000177500ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. 
# # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. cmake_minimum_required(VERSION 3.5) # ROCclr abstracts the usage of multiple AMD compilers and runtimes. # It is possible to support multiple backends concurrently in the same binary. option(ROCCLR_ENABLE_HSAIL "Enable support for HSAIL compiler" OFF) option(ROCCLR_ENABLE_LC "Enable support for LC compiler" ON) option(ROCCLR_ENABLE_HSA "Enable support for HSA runtime" ON) option(ROCCLR_ENABLE_PAL "Enable support for PAL runtime" OFF) if((NOT ROCCLR_ENABLE_HSAIL) AND (NOT ROCCLR_ENABLE_LC)) message(FATAL_ERROR "Support for at least one compiler needs to be enabled!") endif() if((NOT ROCCLR_ENABLE_HSA) AND (NOT ROCCLR_ENABLE_PAL)) message(FATAL_ERROR "Support for at least one runtime needs to be enabled!") endif() set(THREADS_PREFER_PTHREAD_FLAG ON) find_package(Threads REQUIRED) find_package(AMD_OPENCL) add_library(rocclr STATIC) include(ROCclrCompilerOptions) # To fix a path issue due to the current dir (cmake folder - cmake/../) in debug info get_filename_component(_ROCCLR_SRC_DIR_PATH "${CMAKE_CURRENT_LIST_DIR}/../" REALPATH) set(ROCCLR_SRC_DIR "${_ROCCLR_SRC_DIR_PATH}") mark_as_advanced(ROCCLR_SRC_DIR) set(ROCCLR_INCLUDE_DIR "${ROCCLR_SRC_DIR}/include" PARENT_SCOPE) mark_as_advanced(ROCCLR_INCLUDE_DIR) set_target_properties(rocclr PROPERTIES CXX_STANDARD 17 CXX_STANDARD_REQUIRED ON CXX_EXTENSIONS OFF POSITION_INDEPENDENT_CODE ON) target_sources(rocclr PRIVATE ${ROCCLR_SRC_DIR}/compiler/lib/utils/options.cpp ${ROCCLR_SRC_DIR}/device/appprofile.cpp ${ROCCLR_SRC_DIR}/device/blit.cpp ${ROCCLR_SRC_DIR}/device/blitcl.cpp ${ROCCLR_SRC_DIR}/device/comgrctx.cpp ${ROCCLR_SRC_DIR}/device/devhcmessages.cpp ${ROCCLR_SRC_DIR}/device/devhcprintf.cpp ${ROCCLR_SRC_DIR}/device/devhostcall.cpp ${ROCCLR_SRC_DIR}/device/device.cpp ${ROCCLR_SRC_DIR}/device/devkernel.cpp ${ROCCLR_SRC_DIR}/device/devprogram.cpp ${ROCCLR_SRC_DIR}/device/devwavelimiter.cpp ${ROCCLR_SRC_DIR}/device/hsailctx.cpp ${ROCCLR_SRC_DIR}/elf/elf.cpp ${ROCCLR_SRC_DIR}/os/alloc.cpp ${ROCCLR_SRC_DIR}/os/os_posix.cpp ${ROCCLR_SRC_DIR}/os/os_win32.cpp ${ROCCLR_SRC_DIR}/os/os.cpp ${ROCCLR_SRC_DIR}/platform/activity.cpp ${ROCCLR_SRC_DIR}/platform/agent.cpp ${ROCCLR_SRC_DIR}/platform/command.cpp ${ROCCLR_SRC_DIR}/platform/commandqueue.cpp ${ROCCLR_SRC_DIR}/platform/context.cpp ${ROCCLR_SRC_DIR}/platform/kernel.cpp ${ROCCLR_SRC_DIR}/platform/memory.cpp ${ROCCLR_SRC_DIR}/platform/ndrange.cpp ${ROCCLR_SRC_DIR}/platform/program.cpp ${ROCCLR_SRC_DIR}/platform/runtime.cpp ${ROCCLR_SRC_DIR}/platform/interop_gl.cpp ${ROCCLR_SRC_DIR}/thread/monitor.cpp ${ROCCLR_SRC_DIR}/thread/semaphore.cpp ${ROCCLR_SRC_DIR}/thread/thread.cpp ${ROCCLR_SRC_DIR}/utils/debug.cpp ${ROCCLR_SRC_DIR}/utils/flags.cpp) if(WIN32) target_sources(rocclr PRIVATE ${ROCCLR_SRC_DIR}/platform/interop_d3d9.cpp ${ROCCLR_SRC_DIR}/platform/interop_d3d10.cpp ${ROCCLR_SRC_DIR}/platform/interop_d3d11.cpp) target_compile_definitions(rocclr PUBLIC ATI_OS_WIN) else() target_compile_definitions(rocclr PUBLIC ATI_OS_LINUX) endif() if(CMAKE_SIZEOF_VOID_P EQUAL 4) target_compile_definitions(rocclr PUBLIC ATI_BITS_32) endif()
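# Illustrative sketch (assumed client code, not part of this file): the static
# "rocclr" target defined above is meant to be consumed by an in-tree runtime
# target, e.g.
#   target_link_libraries(my_runtime PRIVATE rocclr)
# where "my_runtime" is a placeholder for the OpenCL or HIP runtime target.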
target_compile_definitions(rocclr PUBLIC LITTLEENDIAN_CPU ${AMD_OPENCL_DEFS}) target_include_directories(rocclr PUBLIC ${ROCCLR_SRC_DIR} ${ROCCLR_SRC_DIR}/compiler/lib ${ROCCLR_SRC_DIR}/compiler/lib/include ${ROCCLR_SRC_DIR}/compiler/lib/backends/common ${ROCCLR_SRC_DIR}/device ${ROCCLR_SRC_DIR}/elf ${ROCCLR_SRC_DIR}/include ${AMD_OPENCL_INCLUDE_DIRS}) target_link_libraries(rocclr PUBLIC Threads::Threads) # IPC on Windows is not supported if(UNIX) target_link_libraries(rocclr PUBLIC rt) endif() if(ROCCLR_ENABLE_HSAIL) include(ROCclrHSAIL) endif() if(ROCCLR_ENABLE_LC) include(ROCclrLC) endif() if(ROCCLR_ENABLE_HSA) include(ROCclrHSA) endif() if(ROCCLR_ENABLE_PAL) include(ROCclrPAL) endif() clr-rocm-5.7.1/rocclr/cmake/ROCclrCompilerOptions.cmake000066400000000000000000000027721450307266000230140ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. include_guard() if (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC") if (CMAKE_VERSION VERSION_LESS "3.20") # This code is necessary to avoid this command line warning: # "Overriding /GR with /GR- cl: command line warning D9025" # # /GR is implied by MSVC anyway. So getting rid of it doesn't matter. string(REPLACE "/GR" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS}) endif() endif() clr-rocm-5.7.1/rocclr/cmake/ROCclrHSA.cmake000066400000000000000000000045561450307266000203030ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. find_package(hsa-runtime64 1.11 REQUIRED CONFIG PATHS /opt/rocm/ ${ROCM_INSTALL_PATH} PATH_SUFFIXES cmake/hsa-runtime64 lib/cmake/hsa-runtime64 lib64/cmake/hsa-runtime64) target_link_libraries(rocclr PUBLIC hsa-runtime64::hsa-runtime64) find_package(NUMA) if(NUMA_FOUND) target_compile_definitions(rocclr PUBLIC ROCCLR_SUPPORT_NUMA_POLICY) target_include_directories(rocclr PUBLIC ${NUMA_INCLUDE_DIR}) target_link_libraries(rocclr PUBLIC ${NUMA_LIBRARIES}) endif() find_package(OpenGL REQUIRED) target_sources(rocclr PRIVATE ${ROCCLR_SRC_DIR}/device/rocm/rocappprofile.cpp ${ROCCLR_SRC_DIR}/device/rocm/rocblit.cpp ${ROCCLR_SRC_DIR}/device/rocm/rocblitcl.cpp ${ROCCLR_SRC_DIR}/device/rocm/roccounters.cpp ${ROCCLR_SRC_DIR}/device/rocm/rocdevice.cpp ${ROCCLR_SRC_DIR}/device/rocm/rocglinterop.cpp ${ROCCLR_SRC_DIR}/device/rocm/rockernel.cpp ${ROCCLR_SRC_DIR}/device/rocm/rocmemory.cpp ${ROCCLR_SRC_DIR}/device/rocm/rocprintf.cpp ${ROCCLR_SRC_DIR}/device/rocm/rocprogram.cpp ${ROCCLR_SRC_DIR}/device/rocm/rocsettings.cpp ${ROCCLR_SRC_DIR}/device/rocm/rocsignal.cpp ${ROCCLR_SRC_DIR}/device/rocm/rocvirtual.cpp ${ROCCLR_SRC_DIR}/device/rocm/rocurilocator.cpp) target_compile_definitions(rocclr PUBLIC WITH_HSA_DEVICE) clr-rocm-5.7.1/rocclr/cmake/ROCclrHSAIL.cmake000066400000000000000000000022731450307266000205240ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. target_compile_definitions(rocclr PUBLIC WITH_COMPILER_LIB HSAIL_DYN_DLL) clr-rocm-5.7.1/rocclr/cmake/ROCclrLC.cmake000066400000000000000000000035161450307266000201630ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. 
# # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. find_package(amd_comgr REQUIRED CONFIG PATHS /opt/rocm/ ${ROCM_INSTALL_PATH} PATH_SUFFIXES cmake/amd_comgr lib/cmake/amd_comgr) target_compile_definitions(rocclr PUBLIC WITH_LIGHTNING_COMPILER USE_COMGR_LIBRARY) if(BUILD_SHARED_LIBS) target_compile_definitions(rocclr PUBLIC COMGR_DYN_DLL) endif() target_link_libraries(rocclr PUBLIC amd_comgr) if(CLR_BUILD_HIP) # Temporary hack for versioned comgr needed by hiprtc file(STRINGS ${HIP_COMMON_DIR}/VERSION VERSION_LIST REGEX "^[0-9]+") list(GET VERSION_LIST 0 HIP_VERSION_MAJOR) list(GET VERSION_LIST 1 HIP_VERSION_MINOR) add_definitions(-DHIP_MAJOR_VERSION=${HIP_VERSION_MAJOR}) add_definitions(-DHIP_MINOR_VERSION=${HIP_VERSION_MINOR}) endif() clr-rocm-5.7.1/rocclr/cmake/ROCclrPAL.cmake000066400000000000000000000061701450307266000203000ustar00rootroot00000000000000# Copyright (c) 2020 - 2021 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. 
set(PAL_CLIENT "OCL") set(PAL_CLIENT_INTERFACE_MAJOR_VERSION 792) set(GPUOPEN_CLIENT_INTERFACE_MAJOR_VERSION 42) set(GPUOPEN_CLIENT_INTERFACE_MINOR_VERSION 0) set(PAL_CLOSED_SOURCE ON) set(PAL_DEVELOPER_BUILD OFF) set(PAL_BUILD_GPUOPEN ON) set(PAL_BUILD_SCPC OFF) set(PAL_BUILD_VIDEO OFF) set(PAL_BUILD_DTIF OFF) set(PAL_BUILD_OSS ON) set(PAL_BUILD_SECURITY OFF) set(PAL_SPPAP_CLOSED_SOURCE OFF) set(PAL_BUILD_GFX ON) set(PAL_BUILD_NULL_DEVICE OFF) set(PAL_BUILD_GFX6 ON) set(PAL_BUILD_GFX9 ON) set(PAL_BUILD_GFX11 ON) set(PAL_BUILD_NAVI31 ON) set(PAL_BUILD_NAVI32 ON) set(PAL_BUILD_NAVI33 ON) set(PAL_BUILD_PHOENIX1 ON) find_package(AMD_PAL) find_package(AMD_HSA_LOADER) target_sources(rocclr PRIVATE ${ROCCLR_SRC_DIR}/device/pal/palappprofile.cpp ${ROCCLR_SRC_DIR}/device/pal/palblit.cpp ${ROCCLR_SRC_DIR}/device/pal/palconstbuf.cpp ${ROCCLR_SRC_DIR}/device/pal/palcounters.cpp ${ROCCLR_SRC_DIR}/device/pal/paldevice.cpp ${ROCCLR_SRC_DIR}/device/pal/paldeviced3d10.cpp ${ROCCLR_SRC_DIR}/device/pal/paldeviced3d11.cpp ${ROCCLR_SRC_DIR}/device/pal/paldeviced3d9.cpp ${ROCCLR_SRC_DIR}/device/pal/paldevicegl.cpp ${ROCCLR_SRC_DIR}/device/pal/palgpuopen.cpp ${ROCCLR_SRC_DIR}/device/pal/palkernel.cpp ${ROCCLR_SRC_DIR}/device/pal/palmemory.cpp ${ROCCLR_SRC_DIR}/device/pal/palprintf.cpp ${ROCCLR_SRC_DIR}/device/pal/palprogram.cpp ${ROCCLR_SRC_DIR}/device/pal/palresource.cpp ${ROCCLR_SRC_DIR}/device/pal/palblitcl.cpp ${ROCCLR_SRC_DIR}/device/pal/palsettings.cpp ${ROCCLR_SRC_DIR}/device/pal/palsignal.cpp ${ROCCLR_SRC_DIR}/device/pal/palthreadtrace.cpp ${ROCCLR_SRC_DIR}/device/pal/paltimestamp.cpp ${ROCCLR_SRC_DIR}/device/pal/palvirtual.cpp) target_compile_definitions(rocclr PUBLIC WITH_PAL_DEVICE PAL_GPUOPEN_OCL) target_link_libraries(rocclr PUBLIC pal amdhsaloader) # support for OGL/D3D interop if(WIN32) target_link_libraries(rocclr PUBLIC opengl32.lib dxguid.lib) endif() clr-rocm-5.7.1/rocclr/compiler/000077500000000000000000000000001450307266000163435ustar00rootroot00000000000000clr-rocm-5.7.1/rocclr/compiler/lib/000077500000000000000000000000001450307266000171115ustar00rootroot00000000000000clr-rocm-5.7.1/rocclr/compiler/lib/backends/000077500000000000000000000000001450307266000206635ustar00rootroot00000000000000clr-rocm-5.7.1/rocclr/compiler/lib/backends/common/000077500000000000000000000000001450307266000221535ustar00rootroot00000000000000clr-rocm-5.7.1/rocclr/compiler/lib/backends/common/library.hpp000066400000000000000000000043271450307266000243360ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef LIBRARY_H_ #define LIBRARY_H_ #include #include namespace amd { typedef enum _library_selector { LibraryUndefined = 0, GPU_Library_7xx, GPU_Library_Evergreen, GPU_Library_SI, CPU_Library_Generic, CPU_Library_AVX, CPU_Library_FMA4, GPU_Library_Generic, CPU64_Library_Generic, CPU64_Library_AVX, CPU64_Library_FMA4, GPU64_Library_Evergreen, GPU64_Library_SI, GPU64_Library_Generic, GPU_Library_CI, GPU64_Library_CI, GPU_Library_HSAIL, LibraryTotal } LibrarySelector; /** Integrated Bitcode Libararies **/ class LibraryDescriptor { public: enum {MAX_NUM_LIBRARY_DESCS = 11}; const char* start; size_t size; }; int getLibDescs ( LibrarySelector LibType, // input LibraryDescriptor* LibDesc, // output int& LibDescSize // output -- LibDesc[0:LibDescSize-1] ); static constexpr const char* amdRTFuns[] = { "__amdrt_div_i64", "__amdrt_div_u64", "__amdrt_mod_i64", "__amdrt_mod_u64", "__amdrt_cvt_f64_to_u64", "__amdrt_cvt_f32_to_u64" }; } //amd #endif // LIBRARY_H_ clr-rocm-5.7.1/rocclr/compiler/lib/include/000077500000000000000000000000001450307266000205345ustar00rootroot00000000000000clr-rocm-5.7.1/rocclr/compiler/lib/include/acl.h000066400000000000000000000257211450307266000214530ustar00rootroot00000000000000/* Copyright (c) 2012 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _ACL_0_8_H_ #define _ACL_0_8_H_ #ifdef __cplusplus extern "C" { #endif #include "aclTypes.h" //!--------------------------------------------------------------------------!// // Functions that deal with aclCompiler objects. //!--------------------------------------------------------------------------!// aclCompiler* ACL_API_ENTRY aclCompilerInit(aclCompilerOptions *opts, acl_error *error_code) ACL_API_0_8; acl_error ACL_API_ENTRY aclCompilerFini(aclCompiler *cl) ACL_API_0_8; aclCLVersion ACL_API_ENTRY aclCompilerVersion(aclCompiler *cl, acl_error *error_code) ACL_API_0_8; uint32_t ACL_API_ENTRY aclVersionSize(aclCLVersion num, acl_error *error_code) ACL_API_0_8; const char* ACL_API_ENTRY aclGetErrorString(acl_error error_code) ACL_API_0_8; //!--------------------------------------------------------------------------!// // Functions that deal with target specific information. //!--------------------------------------------------------------------------!// //! 
Returns in the names argument, if non-NULL, a pointer to each of the arch // names that the compiler supports. If names is NULL and arch_size is // non-NULL, returns the number of arch entries that are required. acl_error ACL_API_ENTRY aclGetArchInfo(const char** arch_names, size_t *arch_size) ACL_API_0_8; //! Returns in the arch argument, if non-NULL, a pointer to each device // name that the compiler supports. If device_size is non-NULL, // returns the number of device entries that are used. acl_error ACL_API_ENTRY aclGetDeviceInfo(const char* arch, const char **names, size_t *device_size) ACL_API_0_8; //! Function that returns a correctly filled out aclTargetInfo structure based // on the information passed into the kernel. aclTargetInfo ACL_API_ENTRY aclGetTargetInfo(const char *arch, const char *device, acl_error *error_code) ACL_API_0_8; //! Function that returns a correctly filled out aclTargetInfo structure based // on the information passed into the kernel. aclTargetInfo ACL_API_ENTRY aclGetTargetInfoFromChipID(const char *arch, const uint32_t chip_id, acl_error *error_code) ACL_API_0_8; //! Function that returns a string representation of the target architecture. const char* ACL_API_ENTRY aclGetArchitecture(const aclTargetInfo &target) ACL_API_0_8; //! Function that returns a string representation of the target chip options. const uint64_t ACL_API_ENTRY aclGetChipOptions(const aclTargetInfo &target) ACL_API_0_8; //! Function that returns a string representation of the target family. const char* ACL_API_ENTRY aclGetFamily(const aclTargetInfo &target) ACL_API_0_8; //! Function that returns a string representation of the target chip. const char* ACL_API_ENTRY aclGetChip(const aclTargetInfo &target) ACL_API_0_8; //!--------------------------------------------------------------------------!// // Functions that deal with aclBinary objects. 
//!--------------------------------------------------------------------------!// aclBinary* ACL_API_ENTRY aclBinaryInit( size_t struct_version, const aclTargetInfo *target, const aclBinaryOptions *options, acl_error *error_code) ACL_API_0_8; acl_error ACL_API_ENTRY aclBinaryFini(aclBinary *bin) ACL_API_0_8; aclBinary* ACL_API_ENTRY aclReadFromFile(const char *str, acl_error *error_code) ACL_API_0_8; aclBinary* ACL_API_ENTRY aclReadFromMem(const void *mem, size_t size, acl_error *error_code) ACL_API_0_8; acl_error ACL_API_ENTRY aclWriteToFile(aclBinary *bin, const char *str) ACL_API_0_8; acl_error ACL_API_ENTRY aclWriteToMem(aclBinary *bin, void **mem, size_t *size) ACL_API_0_8; aclBinary* ACL_API_ENTRY aclCreateFromBinary(const aclBinary *binary, aclBIFVersion version) ACL_API_0_8; aclBIFVersion ACL_API_ENTRY aclBinaryVersion(const aclBinary *binary) ACL_API_0_8; acl_error ACL_API_ENTRY aclInsertSection(aclCompiler *cl, aclBinary *binary, const void *data, size_t data_size, aclSections id) ACL_API_0_8; acl_error ACL_API_ENTRY aclInsertSymbol(aclCompiler *cl, aclBinary *binary, const void *data, size_t data_size, aclSections id, const char *symbol) ACL_API_0_8; const void* ACL_API_ENTRY aclExtractSection(aclCompiler *cl, const aclBinary *binary, size_t *size, aclSections id, acl_error *error_code) ACL_API_0_8; const void* ACL_API_ENTRY aclExtractSymbol(aclCompiler *cl, const aclBinary *binary, size_t *size, aclSections id, const char *symbol, acl_error *error_code) ACL_API_0_8; acl_error ACL_API_ENTRY aclRemoveSection(aclCompiler *cl, aclBinary *binary, aclSections id) ACL_API_0_8; acl_error ACL_API_ENTRY aclRemoveSymbol(aclCompiler *cl, aclBinary *binary, aclSections id, const char *symbol) ACL_API_0_8; //!--------------------------------------------------------------------------!// // Functions that deal with debug/metdata. //!--------------------------------------------------------------------------!// acl_error ACL_API_ENTRY aclQueryInfo(aclCompiler *cl, const aclBinary *binary, aclQueryType query, const char *kernel, void *data_ptr, size_t *ptr_size) ACL_API_0_8; acl_error ACL_API_ENTRY aclDbgAddArgument(aclCompiler *cl, aclBinary *binary, const char* kernel, const char* name, bool byVal) ACL_API_0_8; acl_error ACL_API_ENTRY aclDbgRemoveArgument(aclCompiler *cl, aclBinary *binary, const char* kernel, const char* name) ACL_API_0_8; //!--------------------------------------------------------------------------!// // Functions that deal with various compilation phases. 
//!--------------------------------------------------------------------------!// aclBinary* ACL_API_ENTRY aclBinaryInit( size_t struct_version, const aclTargetInfo *target, const aclBinaryOptions *options, acl_error *error_code) ACL_API_0_8; acl_error ACL_API_ENTRY aclBinaryFini(aclBinary *bin) ACL_API_0_8; aclBinary* ACL_API_ENTRY aclReadFromFile(const char *str, acl_error *error_code) ACL_API_0_8; aclBinary* ACL_API_ENTRY aclReadFromMem(const void *mem, size_t size, acl_error *error_code) ACL_API_0_8; acl_error ACL_API_ENTRY aclWriteToFile(aclBinary *bin, const char *str) ACL_API_0_8; acl_error ACL_API_ENTRY aclWriteToMem(aclBinary *bin, void **mem, size_t *size) ACL_API_0_8; aclBinary* ACL_API_ENTRY aclCreateFromBinary(const aclBinary *binary, aclBIFVersion version) ACL_API_0_8; aclBIFVersion ACL_API_ENTRY aclBinaryVersion(const aclBinary *binary) ACL_API_0_8; acl_error ACL_API_ENTRY aclInsertSection(aclCompiler *cl, aclBinary *binary, const void *data, size_t data_size, aclSections id) ACL_API_0_8; acl_error ACL_API_ENTRY aclInsertSymbol(aclCompiler *cl, aclBinary *binary, const void *data, size_t data_size, aclSections id, const char *symbol) ACL_API_0_8; const void* ACL_API_ENTRY aclExtractSection(aclCompiler *cl, const aclBinary *binary, size_t *size, aclSections id, acl_error *error_code) ACL_API_0_8; const void* ACL_API_ENTRY aclExtractSymbol(aclCompiler *cl, const aclBinary *binary, size_t *size, aclSections id, const char *symbol, acl_error *error_code) ACL_API_0_8; acl_error ACL_API_ENTRY aclRemoveSection(aclCompiler *cl, aclBinary *binary, aclSections id) ACL_API_0_8; acl_error ACL_API_ENTRY aclRemoveSymbol(aclCompiler *cl, aclBinary *binary, aclSections id, const char *symbol) ACL_API_0_8; //!--------------------------------------------------------------------------!// // Functions that deal with debug/metadata. //!--------------------------------------------------------------------------!// acl_error ACL_API_ENTRY aclQueryInfo(aclCompiler *cl, const aclBinary *binary, aclQueryType query, const char *kernel, void *data_ptr, size_t *ptr_size) ACL_API_0_8; acl_error ACL_API_ENTRY aclDbgAddArgument(aclCompiler *cl, aclBinary *binary, const char* kernel, const char* name, bool byVal) ACL_API_0_8; acl_error ACL_API_ENTRY aclDbgRemoveArgument(aclCompiler *cl, aclBinary *binary, const char* kernel, const char* name) ACL_API_0_8; //!--------------------------------------------------------------------------!// // Functions that deal with various compilation phases.
Define hardware info constants for SI and above devices static constexpr unsigned SI_sgprs_avail = 102; static constexpr unsigned SI_vgprs_avail = 256; static constexpr unsigned SI_ldssize_avail = 32*1024; //!--------------------------------------------------------------------------!// // Functions that deal with memory. // Free memory allocated by aclWriteToMem //!--------------------------------------------------------------------------!// acl_error ACL_API_ENTRY aclFreeMem(aclBinary *bin, void *mem); #ifdef __cplusplus } #endif #endif // _ACL_0_8_H_ clr-rocm-5.7.1/rocclr/compiler/lib/include/aclDefs.h000066400000000000000000000031531450307266000222500ustar00rootroot00000000000000/* Copyright (c) 2011 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _ACL_DEFS_0_8_H_ #define _ACL_DEFS_0_8_H_ #ifndef ACL_API_ENTRY #if defined(_WIN32) || defined(__CYGWIN__) #define ACL_API_ENTRY __stdcall #else #define ACL_API_ENTRY #endif #endif #ifndef ACL_API_0_8 #define ACL_API_0_8 #endif #ifndef BIF_API_2_0 #define BIF_API_2_0 #endif #ifndef BIF_API_2_1 #define BIF_API_2_1 #endif #ifndef BIF_API_3_0 #define BIF_API_3_0 #endif #ifndef MAX_HIDDEN_KERNARGS_NUM #define MAX_HIDDEN_KERNARGS_NUM 6 #else #error "MAX_HIDDEN_KERNARGS_NUM is already defined" #endif #endif // _ACL_DEFS_0_8_H_ clr-rocm-5.7.1/rocclr/compiler/lib/include/aclEnums.h000066400000000000000000000272261450307266000224650ustar00rootroot00000000000000/* Copyright (c) 2012 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _ACL_ENUMS_0_8_H_ #define _ACL_ENUMS_0_8_H_ typedef enum _acl_error_enum_0_8 { ACL_SUCCESS = 0, ACL_ERROR = 1, ACL_INVALID_ARG = 2, ACL_OUT_OF_MEM = 3, ACL_SYS_ERROR = 4, ACL_UNSUPPORTED = 5, ACL_ELF_ERROR = 6, ACL_INVALID_FILE = 7, ACL_INVALID_COMPILER= 8, ACL_INVALID_TARGET = 9, ACL_INVALID_BINARY = 10, ACL_INVALID_OPTION = 11, ACL_INVALID_TYPE = 12, ACL_INVALID_SECTION = 13, ACL_INVALID_SYMBOL = 14, ACL_INVALID_QUERY = 15, ACL_FRONTEND_FAILURE= 16, ACL_INVALID_BITCODE = 17, ACL_LINKER_ERROR = 18, ACL_OPTIMIZER_ERROR = 19, ACL_CODEGEN_ERROR = 20, ACL_ISAGEN_ERROR = 21, ACL_INVALID_SOURCE = 22, ACL_LIBRARY_ERROR = 23, ACL_INVALID_SPIR = 24, ACL_LWVERIFY_FAIL = 25, ACL_HWVERIFY_FAIL = 26, ACL_SPIRV_LOAD_FAIL = 27, ACL_SPIRV_SAVE_FAIL = 28, ACL_LAST_ERROR = 29 } acl_error_0_8; typedef enum _comp_device_caps_enum_0_8 { capError = 0, capFMA = 1, capImageSupport = 2, capSaveSOURCE = 3, // input source capSaveLLVMIR = 4, // output LLVMIR from frontend capSaveCG = 5, // output from LLVM-BE capSaveEXE = 6, // output executable capSaveAMDIL = 7, // Save per-kernel AMDIL capSaveHSAIL = 8, // Save per-kernel HSAIL capEncrypted = 9, capSaveDISASM = 10, capSaveAS = 11, capSaveSPIR = 12, capDumpLast = 13 } compDeviceCaps_0_8; typedef enum _comp_opt_settings_enum_0_8 { optO0 = 0, // No optimization setting. optO1 = 1, optO2 = 2, optO3 = 3, optO4 = 4, optOs = 5, optError = 6, // Invalid optimization set optLast = 7 } compOptSettings_0_8; #define FLAG_SHIFT_VALUE 5 #define FLAG_MASK_VALUE ((1 << capDumpLast) - 1) #define FLAG_BITLOC(A) (1 << ((A) & FLAG_MASK_VALUE)) #define FLAG_ARRAY_SIZE 4 //! An enumeration that defines the possible valid device types that // can be compiled for. typedef enum _acl_dev_type_enum_0_8 { aclError = 0, // aclDevType of 0 is an error. aclX86 = 1, // Targeting a 32bit X86 CPU device. aclAMDIL = 2, // Targeting an AMDIL GPU device. aclHSAIL = 3, // Targeting an HSAIL GPU device. aclX64 = 4, // Targeting a 64bit X86 CPU device. aclHSAIL64= 5, // Targeting a 64bit HSAIL GPU device. aclAMDIL64= 6, // Targeting a 64bit AMDIL GPU device aclLast = 7 } aclDevType_0_8; //! Enum that represents the versions of the compiler typedef enum _acl_cl_version_enum_0_8 { ACL_VERSION_ERROR = 0, ACL_VERSION_0_7 = 1, ACL_VERSION_0_8 = 2, ACL_VERSION_0_8_1 = 3, ACL_VERSION_0_9 = 4, ACL_VERSION_1_0 = 5, ACL_VERSION_LAST = 6 } aclCLVersion_0_8; //! Enum of the various aclTypes that are supported typedef enum _acl_type_enum_0_8 { ACL_TYPE_DEFAULT = 0, ACL_TYPE_OPENCL = 1, ACL_TYPE_LLVMIR_TEXT = 2, ACL_TYPE_LLVMIR_BINARY = 3, ACL_TYPE_SPIR_TEXT = 4, ACL_TYPE_SPIR_BINARY = 5, ACL_TYPE_AMDIL_TEXT = 6, ACL_TYPE_AMDIL_BINARY = 7, ACL_TYPE_HSAIL_TEXT = 8, ACL_TYPE_HSAIL_BINARY = 9, ACL_TYPE_X86_TEXT = 10, ACL_TYPE_X86_BINARY = 11, ACL_TYPE_CG = 12, ACL_TYPE_SOURCE = 13, ACL_TYPE_ISA = 14, ACL_TYPE_HEADER = 15, ACL_TYPE_RSLLVMIR_BINARY = 16, ACL_TYPE_SPIRV_BINARY = 17, ACL_TYPE_ASM_TEXT = 18, ACL_TYPE_LAST = 19 } aclType_0_8; //! Enum of the various loader types that are supported. 
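//!--------------------------------------------------------------------------!//
// Editor's note: a hedged sketch of one plausible reading of the FLAG_*
// macros defined above -- a capability index from compDeviceCaps_0_8 selects
// a 32-bit word via FLAG_SHIFT_VALUE and a bit within it via FLAG_BITLOC.
// The helper names below are illustrative, not part of this API.
//!--------------------------------------------------------------------------!//
#if 0 // not part of the original header
static int capIsSet(const unsigned flags[FLAG_ARRAY_SIZE],
                    compDeviceCaps_0_8 cap) {
  // Word index from the high bits, bit position from the low bits.
  return (flags[cap >> FLAG_SHIFT_VALUE] & FLAG_BITLOC(cap)) != 0;
}
static void capSet(unsigned flags[FLAG_ARRAY_SIZE], compDeviceCaps_0_8 cap) {
  flags[cap >> FLAG_SHIFT_VALUE] |= FLAG_BITLOC(cap);
}
#endif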
typedef enum _acl_loader_type_enum_0_8 { ACL_LOADER_COMPLIB = 0, ACL_LOADER_FRONTEND = 1, ACL_LOADER_LINKER = 2, ACL_LOADER_OPTIMIZER= 3, ACL_LOADER_CODEGEN = 4, ACL_LOADER_BACKEND = 5, ACL_LOADER_SC = 6, ACL_LOADER_LAST = 7 } aclLoaderType_0_8; // Enumeration for the various acl versions typedef enum _bif_version_enum_0_8 { aclBIFVersionError = 0, // Error aclBIFVersion20 = 1, // Version 2.0 of the OpenCL BIF aclBIFVersion21 = 2, // Version 2.1 of the OpenCL BIF aclBIFVersion30 = 3, // Version 3.0 of the OpenCL BIF aclBIFVersion31 = 4, // Version 3.1 of the OpenCL BIF aclBIFVersionLatest = aclBIFVersion31, // Most recent version of the BIF aclBIFVersionCAL = 5, aclBIFVersionLast = 6 } aclBIFVersion_0_8; // Enumeration for the various platform types typedef enum _bif_platform_enum_0_8 { aclPlatformCAL = 0, // For BIF 2.0 backward compatibility aclPlatformCPU = 1, // For BIF 2.0 backward compatibility aclPlatformCompLib = 2, aclPlatformLast = 3 } aclPlatform_0_8; // Enumeration for the various bif sections typedef enum _bif_sections_enum_0_8 { aclLLVMIR = 0, aclSOURCE = 1, aclILTEXT = 2, // For BIF 2.0 backward compatibility aclASTEXT = 3, // For BIF 2.0 backward compatibility aclCAL = 4, // For BIF 2.0 backward compatibility aclDLL = 5, // For BIF 2.0 backward compatibility aclSTRTAB = 6, aclSYMTAB = 7, aclRODATA = 8, aclSHSTRTAB = 9, aclNOTES = 10, aclCOMMENT = 11, aclILDEBUG = 12, // For BIF 2.0 backward compatibility aclDEBUG_INFO = 13, aclDEBUG_ABBREV = 14, aclDEBUG_LINE = 15, aclDEBUG_PUBNAMES = 16, aclDEBUG_PUBTYPES = 17, aclDEBUG_LOC = 18, aclDEBUG_ARANGES = 19, aclDEBUG_RANGES = 20, aclDEBUG_MACINFO = 21, aclDEBUG_STR = 22, aclDEBUG_FRAME = 23, aclJITBINARY = 24, // For BIF 2.0 backward compatibility aclCODEGEN = 25, aclTEXT = 26, aclINTERNAL = 27, aclSPIR = 28, aclHEADER = 29, aclBRIG = 30, aclBRIGxxx1 = 31, aclBRIGxxx2 = 32, aclBRIGxxx3 = 33, aclHSADEBUG = 34, aclKSTATS = 35, // For storing kernel statistics aclSPIRV = 36, aclLAST = 37 } aclSections_0_8; //! An enumeration that defines what are valid queries for aclQueryInfo. typedef enum _rt_query_types_enum_0_8 { RT_ABI_VERSION = 0, RT_DEVICE_NAME = 1, RT_MEM_SIZES = 2, RT_GPU_FUNC_CAPS = 3, RT_GPU_FUNC_ID = 4, RT_GPU_DEFAULT_ID = 5, RT_WORK_GROUP_SIZE = 6, RT_WORK_REGION_SIZE = 7, RT_ARGUMENT_ARRAY = 8, RT_GPU_PRINTF_ARRAY = 9, RT_CPU_BARRIER_NAMES = 10, RT_DEVICE_ENQUEUE = 11, RT_KERNEL_INDEX = 12, RT_KERNEL_NAME = 13, RT_KERNEL_NAMES = 14, RT_CONTAINS_LLVMIR = 15, RT_CONTAINS_OPTIONS = 16, RT_CONTAINS_BRIG = 17, RT_CONTAINS_HSAIL = 18, RT_CONTAINS_ISA = 19, RT_CONTAINS_LOADER_MAP = 20, RT_CONTAINS_SPIR = 21, RT_NUM_KERNEL_HIDDEN_ARGS = 22, RT_CONTAINS_SPIRV = 23, RT_WAVES_PER_SIMD_HINT = 24, RT_WORK_GROUP_SIZE_HINT = 25, RT_VEC_TYPE_HINT = 26, RT_LAST_TYPE = 27 } aclQueryType_0_8; //! An enumeration for the various GPU capabilities typedef enum _rt_gpu_caps_enum_0_8 { RT_COMPILER_WRITE = 1 << 0, RT_DATA_SECTION = 1 << 1, RT_WGS = 1 << 2, RT_LIMIT_WGS = 1 << 3, RT_PACKED_REGS = 1 << 4, RT_64BIT_ABI = 1 << 5, RT_PRINTF = 1 << 6, RT_ARENA_UAV = 1 << 7, RT_LRP_MEM = 1 << 8, // Local/Region/Private Memory RT_INDEX_TEMPS = 1 << 9, RT_WRS = 1 << 10, RT_GWS = 1 << 11, RT_SWGWS = 1 << 12, RT_GPU_CAPS_MASK = 0xFFF } aclGPUCaps_0_8; //! An enumeration for the various CPU capabilities. typedef enum _rt_cpu_caps_enum_0_8 { RT_KERNEL_BARRIER = 1 << 0, RT_PROGRAM_BARRIER = 1 << 1, RT_CPU_CAPS_MASK = 0x3 } aclCPUCaps_0_8; //! 
An enumeration that maps Resource type to index values typedef enum _rt_gpu_resource_enum_0_8 { RT_RES_UAV = 0, // UAV resources RT_RES_PRI = 1, // Private resources RT_RES_LDS = 2, // LDS resources RT_RES_GDS = 3, // GDS resources RT_RES_CON = 4, // Constant resources RT_RES_LAST = 5 } aclGPUResource_0_8; //! An enumeration that maps memory types to index values typedef enum _rt_gpu_mem_sizes_enum_0_8 { RT_MEM_HW_LOCAL = 0, RT_MEM_SW_LOCAL = 1, RT_MEM_HW_PRIVATE = 2, RT_MEM_SW_PRIVATE = 3, RT_MEM_HW_REGION = 4, RT_MEM_SW_REGION = 5, RT_MEM_LAST = 6 } aclGPUMemSizes_0_8; // Enumerations for the various argument types. typedef enum _acl_arg_type_enum_0_8 { ARG_TYPE_ERROR = 0, ARG_TYPE_SAMPLER = 1, ARG_TYPE_IMAGE = 2, ARG_TYPE_COUNTER = 3, ARG_TYPE_VALUE = 4, ARG_TYPE_POINTER = 5, ARG_TYPE_SEMAPHORE = 6, ARG_TYPE_QUEUE = 7, // enum for device enqueue ARG_TYPE_LAST = 8 } aclArgType_0_8; // Enumerations of the valid data types for pass by value and // pass by pointer kernel arguments. typedef enum _acl_data_type_enum_0_8 { DATATYPE_ERROR = 0, DATATYPE_i1 = 1, DATATYPE_i8 = 2, DATATYPE_i16 = 3, DATATYPE_i32 = 4, DATATYPE_i64 = 5, DATATYPE_u8 = 6, DATATYPE_u16 = 7, DATATYPE_u32 = 8, DATATYPE_u64 = 9, DATATYPE_f16 = 10, DATATYPE_f32 = 11, DATATYPE_f64 = 12, DATATYPE_f80 = 13, DATATYPE_f128 = 14, DATATYPE_struct = 15, DATATYPE_union = 16, DATATYPE_event = 17, DATATYPE_opaque = 18, DATATYPE_unknown = 19, DATATYPE_LAST = 20 } aclArgDataType_0_8; // Enumerations of the valid memory types for pass by pointer // kernel arguments typedef enum _acl_memory_type_enum_0_8 { PTR_MT_ERROR = 0, // Error PTR_MT_GLOBAL = 1, // global buffer PTR_MT_SCRATCH_EMU = 2, // SW emulated private memory PTR_MT_LDS_EMU = 3, // SW emulated local memory PTR_MT_UAV = 4, // uniformed access vector memory PTR_MT_CONSTANT_EMU = 5, // SW emulated constant memory PTR_MT_GDS_EMU = 6, // SW emulated region memory PTR_MT_LDS = 7, // HW local memory PTR_MT_SCRATCH = 8, // HW private memory PTR_MT_CONSTANT = 9, // HW constant memory PTR_MT_GDS = 10, // HW region memory PTR_MT_UAV_SCRATCH = 11, // SI and later HW private memory PTR_MT_UAV_CONSTANT = 12, // SI and later HW constant memory PTR_MT_LAST = 13 } aclMemoryType_0_8; // Enumeration that specifies the various access types for a pointer/image. typedef enum _acl_access_type_enum_0_8 { ACCESS_TYPE_ERROR = 0, ACCESS_TYPE_RO = 1, ACCESS_TYPE_WO = 2, ACCESS_TYPE_RW = 3, ACCESS_TYPE_LAST = 4 } aclAccessType_0_8; // Enumeration that specifies the binary types. typedef enum _acl_binary_image_type_enum_0_8 { BINARY_TYPE_ELF = 1, BINARY_TYPE_LLVM = 2, BINARY_TYPE_SPIRV = 4, } aclBinaryImageType_0_8; #endif // _ACL_ENUMS_0_8_H_ clr-rocm-5.7.1/rocclr/compiler/lib/include/aclFunctors.h000066400000000000000000000157241450307266000232010ustar00rootroot00000000000000/* Copyright (c) 2012 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _ACL_FUNCTORS_0_8_H_ #define _ACL_FUNCTORS_0_8_H_ //! Callback function pointer for the log function that many // API calls take so that the calling application can receive // information on what errors occur. typedef void (*aclLogFunction_0_8)(const char *msg, size_t size); typedef bool (*aclJITSymbolCallback)(const char*, const void*, void*); typedef void* aclJITObjectImage; typedef const void* constAclJITObjectImage; typedef acl_error (ACL_API_ENTRY *InsertSec_0_8)(aclCompiler *cl, aclBinary *binary, const void *data, size_t data_size, aclSections id) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *InsertSym_0_8)(aclCompiler *cl, aclBinary *binary, const void *data, size_t data_size, aclSections id, const char *symbol) ACL_API_0_8; typedef const void * (ACL_API_ENTRY *ExtractSec_0_8)(aclCompiler *cl, const aclBinary *binary, size_t *size, aclSections id, acl_error *error_code) ACL_API_0_8; typedef const void * (ACL_API_ENTRY *ExtractSym_0_8)(aclCompiler *cl, const aclBinary *binary, size_t *size, aclSections id, const char *symbol, acl_error *error_code) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *RemoveSec_0_8)(aclCompiler *cl, aclBinary *binary, aclSections id) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *RemoveSym_0_8)(aclCompiler *cl, aclBinary *binary, aclSections id, const char *symbol) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *QueryInfo_0_8)(aclCompiler *cl, const aclBinary *binary, aclQueryType query, const char *kernel, void *data_ptr, size_t *ptr_size) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *AddDbgArg_0_8)(aclCompiler *cl, aclBinary *bin, const char *kernel, const char *name, bool byVal) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *RemoveDbgArg_0_8)(aclCompiler *cl, aclBinary *bin, const char *kernel, const char *name) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *Compile_0_8)(aclCompiler *cl, aclBinary *bin, const char *options, aclType from, aclType to, aclLogFunction_0_8 compile_callback) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *Link_0_8)(aclCompiler *cl, aclBinary *src_bin, unsigned int num_libs, aclBinary **libs, aclType link_mode, const char *options, aclLogFunction_0_8 link_callback) ACL_API_0_8; typedef const char * (ACL_API_ENTRY *CompLog_0_8)(aclCompiler *cl) ACL_API_0_8; typedef const void * (ACL_API_ENTRY *RetrieveType_0_8)(aclCompiler *cl, const aclBinary *bin, const char *name, size_t *data_size, aclType type, acl_error *error_code) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *SetType_0_8)(aclCompiler *cl, aclBinary *bin, const char *name, aclType type, const void *data, size_t size) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *ConvertType_0_8)(aclCompiler *cl, aclBinary *bin, const char *name, aclType type) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *Disassemble_0_8)(aclCompiler *cl, aclBinary *bin, const char *kernel, aclLogFunction_0_8 disasm_callback) ACL_API_0_8; typedef const void * (ACL_API_ENTRY *GetDevBinary_0_8)(aclCompiler *cl, const aclBinary *bin, const char *kernel, size_t *size, acl_error *error_code) ACL_API_0_8; typedef aclLoaderData * (ACL_API_ENTRY
*LoaderInit_0_8)(aclCompiler *cl, aclBinary *bin, aclLogFunction_0_8 callback, acl_error *error); typedef acl_error (ACL_API_ENTRY *LoaderFini_0_8)(aclLoaderData *data); typedef aclModule * (ACL_API_ENTRY *FEToIR_0_8)(aclLoaderData *ald, const char *source, size_t data_size, aclContext *ctx, acl_error *error) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *SourceToISA_0_8)(aclLoaderData *ald, const char *source, size_t data_size) ACL_API_0_8; typedef aclModule * (ACL_API_ENTRY *IRPhase_0_8)(aclLoaderData *data, aclModule *ir, aclContext *ctx, acl_error *error) ACL_API_0_8; typedef aclModule * (ACL_API_ENTRY *LinkPhase_0_8)(aclLoaderData *data, aclModule *ir, unsigned int num_libs, aclModule **libs, aclContext *ctx, acl_error *error) ACL_API_0_8; typedef const void * (ACL_API_ENTRY *CGPhase_0_8)(aclLoaderData *data, aclModule *ir, aclContext *ctx, acl_error *error) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *DisasmISA_0_8)(aclLoaderData *data, const char *kernel, const void *isa_code, size_t isa_size) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *SetupLoaderObject_0_8)(aclCompiler *cl) ACL_API_0_8; typedef aclJITObjectImage (ACL_API_ENTRY *JITObjectImageCreate_0_8)(const void* buffer, size_t length, aclBinary* bin, acl_error* error_code) ACL_API_0_8; typedef aclJITObjectImage (ACL_API_ENTRY *JITObjectImageCopy_0_8)(const void* buffer, size_t length, acl_error* error_code) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *JITObjectImageDestroy_0_8)(aclJITObjectImage image) ACL_API_0_8; typedef size_t (ACL_API_ENTRY *JITObjectImageSize_0_8)(aclJITObjectImage image, acl_error* error_code) ACL_API_0_8; typedef const char * (ACL_API_ENTRY *JITObjectImageData_0_8)(aclJITObjectImage image, acl_error* error_code) ACL_API_0_8; typedef acl_error (ACL_API_ENTRY *JITObjectImageFinalize_0_8)(aclJITObjectImage image) ACL_API_0_8; typedef size_t (ACL_API_ENTRY *JITObjectImageGetGlobalsSize_0_8)(aclJITObjectImage image, acl_error* error_code) ACL_API_0_8; typedef bool (*JITSymbolCallback_0_8)(const char*, const void*, void*); typedef acl_error (ACL_API_ENTRY *JITObjectImageIterateSymbols_0_8)(aclJITObjectImage image, JITSymbolCallback_0_8 jit_callback, void* data) ACL_API_0_8; typedef char* (ACL_API_ENTRY *JITObjectImageDisassembleKernel_0_8)(constAclJITObjectImage image, const char* kernel, acl_error* error_code) ACL_API_0_8; typedef void* (*AllocFunc_0_8)(size_t size) ACL_API_0_8; typedef void (*FreeFunc_0_8)(void *ptr) ACL_API_0_8; #endif // _ACL_FUNCTORS_0_8_H_ clr-rocm-5.7.1/rocclr/compiler/lib/include/aclStructs.h000066400000000000000000000302251450307266000230360ustar00rootroot00000000000000/* Copyright (c) 2012 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _ACL_STRUCTS_0_8_H_ #define _ACL_STRUCTS_0_8_H_ #define ACL_STRUCT_HEADER \ size_t struct_size //! A structure that holds information on the various types of arguments // The format in memory of this structure is // ------------- // | aclArgData | // ------------- // |->argStr | // ------------- // |->typeStr | // ------------- typedef struct _acl_md_arg_type_0_8 { ACL_STRUCT_HEADER; size_t argNameSize; size_t typeStrSize; const char *argStr; const char *typeStr; union { struct { // Struct for sampler arguments unsigned ID; unsigned isKernelDefined; unsigned value; } sampler; struct { // Struct for image arguments unsigned resID; unsigned cbNum; unsigned cbOffset; aclAccessType type; bool is2D; bool is1D; bool isArray; bool isBuffer; } image; struct { // struct for atomic counter arguments unsigned is32bit; unsigned resID; unsigned cbNum; unsigned cbOffset; } counter; struct { // struct for semaphore arguments unsigned resID; unsigned cbNum; unsigned cbOffset; } sema; struct { // struct for pass by value arguments unsigned numElements; unsigned cbNum; unsigned cbOffset; aclArgDataType data; } value; struct { // struct for pass by pointer arguments unsigned numElements; unsigned cbNum; unsigned cbOffset; unsigned bufNum; unsigned align; aclArgDataType data; aclMemoryType memory; aclAccessType type; bool isVolatile; bool isRestrict; bool isPipe; } pointer; struct { // Struct for queue arguments unsigned numElements; unsigned cbNum; unsigned cbOffset; aclArgDataType data; aclMemoryType memory; } queue; } arg; aclArgType type; bool isConst; } aclArgData_0_8; //! A structure that holds information for printf // The format in memory of this structure is // -------------- // | aclPrintfFmt| // -------------- // |->argSizes | // -------------- // |->fmtStr | // -------------- typedef struct _acl_md_printf_fmt_0_8 { ACL_STRUCT_HEADER; unsigned ID; size_t numSizes; size_t fmtStrSize; uint32_t *argSizes; const char *fmtStr; } aclPrintfFmt_0_8; //! A structure that holds the metadata in the RODATA section. typedef struct _acl_metadata_0_8 { ACL_STRUCT_HEADER; // This holds the size of the structure itself for versioning. size_t data_size; // This holds the size of all the memory allocated for this structure. uint32_t major, minor, revision; // RT_ABI_VERSION uint32_t gpuCaps; // RT_GPU_FUNC_CAPS uint32_t funcID; // RT_GPU_FUNC_ID uint32_t gpuRes[5]; // RT_GPU_DEFAULT_ID size_t wgs[3]; // RT_WORK_GROUP_SIZE uint32_t wrs[3]; // RT_WORK_REGION_SIZE size_t kernelNameSize; size_t deviceNameSize; size_t mem[6]; // RT_MEM_SIZES size_t numArgs; size_t numPrintf; aclArgData_0_8 *args; // RT_ARGUMENT_ARRAY aclPrintfFmt_0_8 *printf; // RT_GPU_PRINTF_ARRAY const char *kernelName; // RT_KERNEL_NAME const char *deviceName; // RT_DEVICE_NAME bool enqueue_kernel; // RT_DEVICE_ENQUEUE uint32_t kernel_index; // RT_KERNEL_INDEX uint32_t numHiddenKernelArgs; // RT_NUM_KERNEL_HIDDEN_ARGS size_t wavesPerSimdHint; // RT_WAVES_PER_SIMD_HINT size_t wsh[3]; // RT_WORK_GROUP_SIZE_HINT size_t vecTypeHintSize; const char *vth; // RT_VEC_TYPE_HINT } aclMetadata_0_8; //! A structure that holds information on the capabilities of the bif device. typedef struct _acl_device_caps_rec_0_8 { ACL_STRUCT_HEADER; uint32_t flags[4]; uint32_t encryptCode; } aclDevCaps_0_8; //!
Structure that holds information on the target that the source is // being compiled for. typedef struct _acl_target_info_rec_0_8 { ACL_STRUCT_HEADER; aclDevType arch_id; // An identifier for the architecture. uint32_t chip_id; // An identifier for the chip. } aclTargetInfo_0_8; // Structure for the version 0.8 of the structure. typedef struct _acl_binary_opts_rec_0_8 { ACL_STRUCT_HEADER; uint32_t elfclass; uint32_t bitness; const char *temp_file; uint32_t kernelArgAlign; } aclBinaryOptions_0_8; // Structure for the version 0.8.1 of the structure. // This version adds alloc/dealloc functions. typedef struct _acl_binary_opts_rec_0_8_1 { ACL_STRUCT_HEADER; uint32_t elfclass; uint32_t bitness; const char *temp_file; uint32_t kernelArgAlign; AllocFunc_0_8 alloc; FreeFunc_0_8 dealloc; } aclBinaryOptions_0_8_1; //! Structure that holds the OpenCL binary information. typedef struct _acl_bif_rec_0_8 { ACL_STRUCT_HEADER; aclTargetInfo_0_8 target; // Information about the target device. aclBIF* bin; // Pointer to the acl. aclOptions* options; // Pointer to acl options. aclBinaryOptions_0_8 binOpts; // Pointer to the binary options. aclDevCaps_0_8 caps; // Capabilities of the BIF. } aclBinary_0_8; //! Version of the aclBinary that uses the 0_8_1 version of the aclBinaryOptions. typedef struct _acl_bif_rec_0_8_1 { ACL_STRUCT_HEADER; aclTargetInfo_0_8 target; // Information about the target device. aclBIF* bin; // Pointer to the acl. aclOptions* options; // Pointer to acl options. aclBinaryOptions_0_8_1 binOpts; // Pointer to the binary options. aclDevCaps_0_8 caps; // Capabilities of the BIF. } aclBinary_0_8_1; #define ACL_LOADER_COMMON\ ACL_STRUCT_HEADER; \ bool isBuiltin; \ const char *libName; \ void *handle; \ LoaderInit init; \ LoaderFini fini; // Struct that maps to the common structure between all loaders. typedef struct _acl_common_loader_rec_0_8 { ACL_LOADER_COMMON; } aclCommonLoader_0_8; typedef struct _acl_cl_loader_rec_0_8 { ACL_LOADER_COMMON; Compile compile; Link link; CompLog getLog; RetrieveType_0_8 retrieveType; SetType_0_8 setType; ConvertType_0_8 convertType; Disassemble disassemble; GetDevBinary_0_8 devBinary; InsertSec insSec; ExtractSec extSec; RemoveSec remSec; InsertSym insSym; ExtractSym extSym; RemoveSym remSym; QueryInfo getInfo; AddDbgArg addDbg; RemoveDbgArg removeDbg; SetupLoaderObject setupLoaderObject; JITObjectImageCreate jitOICreate; JITObjectImageCopy jitOICopy; JITObjectImageDestroy jitOIDestroy; JITObjectImageSize jitOISize; JITObjectImageData jitOIData; JITObjectImageFinalize jitOIFinalize; JITObjectImageGetGlobalsSize jitOIGlobalSize; JITObjectImageIterateSymbols jitOIIterateSymbols; JITObjectImageDisassembleKernel jitOIDisassembleKernel; } aclCLLoader_0_8; //! Structure that holds the required functions // that sc exports for the SCDLL infrastructure. typedef struct _acl_sc_loader_rec_0_8 { ACL_LOADER_COMMON; uint32_t /*SC_UINT32*/ sc_interface_version; void /**SC_EXPORT_FUNCTIONS**/ *scef; // Any version specific fields go here.
} aclSCLoader_0_8; typedef struct _acl_fe_loader_rec_0_8 { ACL_LOADER_COMMON; FEToIR toIR; // Used for Source to aclModule containing LLVMIR FEToIR toModule; // Used to convert raw SPIR/LLVM-IR to aclModule SourceToISA toISA; // Used for Source to ISA } aclFELoader_0_8; typedef struct _acl_opt_loader_rec_0_8 { ACL_LOADER_COMMON; IRPhase optimize; // Used for IR to IR transformation } aclOptLoader_0_8; typedef struct _acl_link_loader_rec_0_8 { ACL_LOADER_COMMON; LinkPhase link; // Used for Linking in IR modules IRPhase toLLVMIR; // Used for converting SPIR to LLVMIR IRPhase toSPIR; // Used for converting LLVMIR to SPIR } aclLinkLoader_0_8; typedef struct _acl_cg_loader_rec_0_8 { ACL_LOADER_COMMON; CGPhase codegen; // Used for converting from LLVMIR to target ASM. } aclCGLoader_0_8; typedef struct _acl_be_loader_rec_0_8 { ACL_LOADER_COMMON; SourceToISA finalize; // Used for converting from target source to target ISA. SourceToISA assemble; // Used for converting from target text to target binary. DisasmISA disassemble; // Used for converting from target binary to target ISA. } aclBELoader_0_8; typedef struct _acl_compiler_opts_rec_0_8 { ACL_STRUCT_HEADER; // Size of the structure for version checking. const char *clLib; const char *feLib; const char *optLib; const char *linkLib; const char *cgLib; const char *beLib; const char *scLib; } aclCompilerOptions_0_8; typedef struct _acl_compiler_opts_rec_0_8_1 { ACL_STRUCT_HEADER; // Size of the structure for version checking. const char* clLib; const char *feLib; const char *optLib; const char *linkLib; const char *cgLib; const char *beLib; const char *scLib; // Name or path to the shader compiler shared library AllocFunc alloc; FreeFunc dealloc; } aclCompilerOptions_0_8_1; //! Structure that holds the OpenCL compiler and various loaders. typedef struct _acl_compiler_rec_0_8 { ACL_STRUCT_HEADER; // Size of structure for version checking. aclCLLoader clAPI; // Pointer to the compiler API. aclFELoader feAPI; // Pointer to the FE Loader API. aclOptLoader optAPI; // Pointer to the Opt Loader API. aclLinkLoader linkAPI; // Pointer to the Link Loader API. aclCGLoader cgAPI; // Pointer to the CG Loader API. aclBELoader beAPI; // Pointer to the BE Loader API. aclSCLoader scAPI; // Pointer to the SC Loader API. aclCompilerOptions *opts; // The options structure for the compiler. void *llvm_shutdown; // Pointer to the llvm shutdown object. char *buildLog; // Pointer to the current build log. unsigned logSize; // Size of the current build log. aclLoaderData *apiData; // pointer to data store for the compiler API loader. } aclCompilerHandle_0_8; //! Structure that holds the OpenCL compiler and various loaders. typedef struct _acl_compiler_rec_0_8_1 { ACL_STRUCT_HEADER; aclCLLoader clAPI; // Pointer to the compiler API. aclFELoader feAPI; // Pointer to the FE Loader API. aclOptLoader optAPI; // Pointer to the Opt Loader API. aclLinkLoader linkAPI; // Pointer to the Link Loader API. aclCGLoader cgAPI; // Pointer to the CG Loader API. aclBELoader beAPI; // Pointer to the BE Loader API. aclSCLoader scAPI; // Pointer to the SC Loader API. AllocFunc alloc; FreeFunc dealloc; aclCompilerOptions *opts; // The options structure for the compiler. void *llvm_shutdown; // Pointer to the llvm shutdown object. char *buildLog; // Pointer to the current build log. unsigned logSize; // Size of the current build log. aclLoaderData *apiData; // pointer to data store for the compiler API loader. } aclCompilerHandle_0_8_1; //! 
Structure to hold kernel statistics obtained from kernel typedef struct _acl_kernel_stats_0_8_1 { unsigned int scratchRegs; unsigned int scratchSize; unsigned int availablevgprs; unsigned int availablesgprs; unsigned int usedvgprs; unsigned int usedsgprs; unsigned int availableldssize; unsigned int usedldssize; unsigned int availablestacksize; unsigned int usedstacksize; unsigned int wavefrontsize; unsigned int wavefrontpersimd; unsigned int threadsperworkgroup; unsigned int reqdworkgroup_x; unsigned int reqdworkgroup_y; unsigned int reqdworkgroup_z; } aclKernelStats; #endif // _ACL_STRUCTS_0_8_H_ clr-rocm-5.7.1/rocclr/compiler/lib/include/aclTypes.h000066400000000000000000000130351450307266000224730ustar00rootroot00000000000000/* Copyright (c) 2012 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _ACL_API_TYPES_0_8_H_ #define _ACL_API_TYPES_0_8_H_ #include "aclDefs.h" #include <stddef.h> #include <stdint.h> // Typedefs that always point to the most recent versions of the objects.
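//!--------------------------------------------------------------------------!//
// Editor's note: the struct typedefs just below track the newest _0_8_x
// revisions, and each revision leads with ACL_STRUCT_HEADER (a struct_size
// field) so the library can detect which layout a caller built against.
// A minimal sketch of that convention, assuming aclBinaryInit (declared in
// acl.h) takes sizeof(aclBinary) as its struct_version argument:
//!--------------------------------------------------------------------------!//
#if 0 // not part of the original header
#include <string.h>

static aclBinary* makeBinary(const aclTargetInfo *tgt, acl_error *err) {
  aclBinaryOptions opts;
  memset(&opts, 0, sizeof(opts));
  opts.struct_size = sizeof(opts); // stamp the size so the library can
                                   // pick the matching struct revision
  return aclBinaryInit(sizeof(aclBinary), tgt, &opts, err);
}
#endif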
typedef struct _acl_md_arg_type_0_8 aclArgData; typedef struct _acl_md_printf_fmt_0_8 aclPrintfFmt; typedef struct _acl_metadata_0_8 aclMetadata; typedef struct _acl_device_caps_rec_0_8 aclDevCaps; typedef struct _acl_target_info_rec_0_8 aclTargetInfo; typedef struct _acl_bif_rec_0_8_1 aclBinary; typedef struct _acl_binary_opts_rec_0_8_1 aclBinaryOptions; typedef struct _acl_compiler_rec_0_8_1 aclCompiler; typedef struct _acl_compiler_opts_rec_0_8_1 aclCompilerOptions; typedef struct _acl_options_0_8* aclOptions; // Opaque pointer to amd::Options typedef struct _acl_binary_0_8* aclBIF; // Opaque pointer to bifbase typedef struct _acl_common_loader_rec_0_8 aclCommonLoader; typedef struct _acl_cl_loader_rec_0_8 aclCLLoader; typedef struct _acl_sc_loader_rec_0_8 aclSCLoader; typedef struct _acl_fe_loader_rec_0_8 aclFELoader; typedef struct _acl_link_loader_rec_0_8 aclLinkLoader; typedef struct _acl_opt_loader_rec_0_8 aclOptLoader; typedef struct _acl_cg_loader_rec_0_8 aclCGLoader; typedef struct _acl_be_loader_rec_0_8 aclBELoader; typedef struct _acl_llvm_module_0_8* aclModule; // Opaque pointer to llvm::Module typedef struct _acl_llvm_context_0_8* aclContext; // Opaque pointer to llvm::Context typedef struct _acl_loader_data_0_8* aclLoaderData; // Opaque pointer to loader data #include "aclEnums.h" // Typedefs for enumerations typedef enum _acl_error_enum_0_8 acl_error; typedef enum _comp_device_caps_enum_0_8 compDeviceCaps; typedef enum _comp_opt_settings_enum_0_8 compOptSettings; typedef enum _acl_dev_type_enum_0_8 aclDevType; typedef enum _acl_cl_version_enum_0_8 aclCLVersion; typedef enum _acl_type_enum_0_8 aclType; typedef enum _rt_query_types_enum_0_8 aclQueryType; typedef enum _rt_gpu_caps_enum_0_8 aclGPUCaps; typedef enum _rt_gpu_resource_enum_0_8 aclGPUResource; typedef enum _rt_gpu_mem_sizes_enum_0_8 aclGPUMemSizes; typedef enum _acl_arg_type_enum_0_8 aclArgType; typedef enum _acl_data_type_enum_0_8 aclArgDataType; typedef enum _acl_memory_type_enum_0_8 aclMemoryType; typedef enum _acl_access_type_enum_0_8 aclAccessType; typedef enum _bif_version_enum_0_8 aclBIFVersion; typedef enum _bif_platform_enum_0_8 aclPlatform; typedef enum _bif_sections_enum_0_8 aclSections; typedef enum _acl_loader_type_enum_0_8 aclLoaderType; typedef enum _acl_binary_image_type_enum_0_8 aclBinaryImageType; #include "aclFunctors.h" // Typedefs for function pointers typedef aclLogFunction_0_8 aclLogFunction; typedef InsertSec_0_8 InsertSec; typedef RemoveSec_0_8 RemoveSec; typedef ExtractSec_0_8 ExtractSec; typedef InsertSym_0_8 InsertSym; typedef RemoveSym_0_8 RemoveSym; typedef ExtractSym_0_8 ExtractSym; typedef QueryInfo_0_8 QueryInfo; typedef Compile_0_8 Compile; typedef Link_0_8 Link; typedef AddDbgArg_0_8 AddDbgArg; typedef RemoveDbgArg_0_8 RemoveDbgArg; typedef SetupLoaderObject_0_8 SetupLoaderObject; typedef CompLog_0_8 CompLog; typedef RetrieveType_0_8 RetrieveType; typedef SetType_0_8 SetType; typedef ConvertType_0_8 ConvertType; typedef Disassemble_0_8 Disassemble; typedef GetDevBinary_0_8 GetDevBinary; typedef LoaderInit_0_8 LoaderInit; typedef LoaderFini_0_8 LoaderFini; typedef FEToIR_0_8 FEToIR; typedef SourceToISA_0_8 SourceToISA; typedef IRPhase_0_8 IRPhase; typedef LinkPhase_0_8 LinkPhase; typedef CGPhase_0_8 CGPhase; typedef DisasmISA_0_8 DisasmISA; typedef AllocFunc_0_8 AllocFunc; typedef FreeFunc_0_8 FreeFunc; typedef JITObjectImageCreate_0_8 JITObjectImageCreate; typedef JITObjectImageCopy_0_8 JITObjectImageCopy; typedef JITObjectImageDestroy_0_8 JITObjectImageDestroy; typedef 
JITObjectImageSize_0_8 JITObjectImageSize; typedef JITObjectImageData_0_8 JITObjectImageData; typedef JITObjectImageFinalize_0_8 JITObjectImageFinalize; typedef JITObjectImageGetGlobalsSize_0_8 JITObjectImageGetGlobalsSize; typedef JITSymbolCallback_0_8 JITSymbolCallback; typedef JITObjectImageIterateSymbols_0_8 JITObjectImageIterateSymbols; typedef JITObjectImageDisassembleKernel_0_8 JITObjectImageDisassembleKernel; #include "aclStructs.h" #endif // _ACL_API_TYPES_0_8_H_ clr-rocm-5.7.1/rocclr/compiler/lib/spirv/000077500000000000000000000000001450307266000202545ustar00rootroot00000000000000clr-rocm-5.7.1/rocclr/compiler/lib/spirv/spirvUtils.h000066400000000000000000000024231450307266000226120ustar00rootroot00000000000000/* Copyright (c) 2008 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _COMPLIB_SPIRV_UTILS_H #define _COMPLIB_SPIRV_UTILS_H #include <cstddef> bool validateSPIRV(const void *image, size_t length); bool isSPIRVMagic(const void* image, size_t length); #endif clr-rocm-5.7.1/rocclr/compiler/lib/utils/000077500000000000000000000000001450307266000202515ustar00rootroot00000000000000clr-rocm-5.7.1/rocclr/compiler/lib/utils/OPTIONS.def000066400000000000000000001302631450307266000220710ustar00rootroot00000000000000/* Copyright (c) 2010 - 2021 Advanced Micro Devices, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ /* * Description: * * This is the file that contains definitions of all options to clBuildProgram(). * And any changes to option (add/remove/modify) should be done here.
This option * processing is thread-safe, that is, each option is not implemented as a static * variable that is accessible to all threads but as a local variable that is * accessible only to its defining thread. For example, * * option::Options localOptions; * option::parseAllOptions(cppstr, localOptions); * * where 'localOptions' is an option object that contains all option variables and any * code that needs to access the option has to do so via 'localOptions'. (That is why * 'localOptions' has been passed down to many parts that need to access option * variables.) For instance, the following will be used to check if -g is present: * * if (localOptions.oVariables->EnableDebug) { * <-g is present>; * } * * * MACROS for making changes to this file: * * Two macros: OPTION for runtime options that have option variables and NOPTION for others * that do Not have option variables. The OPTION are ones that are referenced by the runtime * via their option variables like the one shown above. The NOPTION are ones that are either * passed into component option processors directly, or alias runtime options. An alias runtime * option is one that refers to another option or a group of others and has no corresponding * option variable in the above option object. For example, * -D NAME= is passed into the front end and the runtime has no variable for it in the * above option object. Another example is -cl-opt-disable, which is a runtime alias option. * It is equivalent to -O0 and used to set -O0. It has no variable in the option object either. * * Here are these two macros: * OPTION(type, attr, sn, ln, var, ideft, imix, imax, sdeft, desc) * NOPTION(type, attr, sn, ln, var, ideft, imix, imax, sdeft, desc) * * For convenience, FLAG macro is provided as well. FLAG macro is very close to a flag * used in flags.hpp. It is defined as below: * FLAG(type, vis, sn, var, deft, desc) * * type: option type defined as OptionType enum: * OT_BOOL : bool * OT_INT32 : int32 * OT_UINT32 : uint32 * OT_CSTRING : char* * OT_uCHAR : unsigned char * * attr: option attributes, divided in several groups: * OptionValue : value attribute. Use exactly one. * OVA_OPTIONAL : value is optional * OVA_REQUIRED : value is required to appear * OVA_DISALLOWED : value may not be specified * * OptionForm : form attribute. Use exactly one. * OFA_NORMAL : normal form, no prefix * OFA_PREFIX_F : -f, machine-independent (-f[no-]